
Commit

Merge pull request #112 from RTIInternational/npsas_main
Merge npsas_main into master
AstridKery committed Sep 8, 2021
2 parents d87d6c2 + 2ea1711 commit c95eaa0
Showing 35 changed files with 7,556 additions and 2,808 deletions.
2 changes: 1 addition & 1 deletion backend/django/.flake8
@@ -1,3 +1,3 @@
[flake8]
exclude = manage.py
ignore = E501,W503,E203, W605, E231
ignore = E501,W503,E203,W605,E231
4 changes: 4 additions & 0 deletions backend/django/core/data/example-labels.csv
@@ -0,0 +1,4 @@
Label,Description
about cats,this card has some content about cats
not about cats,
unclear,
1,019 changes: 988 additions & 31 deletions backend/django/core/data/example.csv

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions backend/django/core/data/example_old.csv
@@ -0,0 +1,31 @@
Text,Label
The feds are still planning to put a woman on the $10 dollar bill #abouttime #equalityforall #historychanged #SemST,
"If a man wants abortion but the woman wants to keep it, why on earth should he pay for child support? #doublestandards #SemST",
"Sleezy men: DO NOT catcall at me and @emmalizrizz while we're out walking. It's NOT welcome, attractive or harmless. #SemST",
"Men are from Earth, women are from Earth. Deal with it. - George Carlin #QuotesOfTheDay #EqualityForAll #equality #SemST",
"Celebrities, please stop declaring yourselves ""not a feminist"" and coloring it with a different term. #equality #samething #SemST",
"Women in the middle east get stoned to death for being raped but hey, check out the new friend icon in facebook! #SemST",
"There really are days when I'm ashamed to admit I share gene's with some people, today is one of those days. #SemST",
"I hate it when ignorant losers say ""another feminist cured."" I don't need to be cured for wanting to be treated like a person. #SemST",
#GamerGate tag is about games Other things? They have their own tags: #SJWlogic #SaveTheCover #ShirtStorm #SemST,
The shaming we do around celebrity choices in photoshoots is highly related to the principles behind #SemST,
Why are you afraid of me when I open my mouth but not when I open my legs? #SemST,
@Paula0422 btw if you like Chris Brown I WILL SET YOU TO FLAMES WITH MY SPARK FARTS! #rhianna #SemST,
"The anti-gamergate people have behaved abominably, and have no moral ground from which to stand in judgement. #gamergate #antiSJW #SemST",
"Unless u have a health or physiological problem, u should not be proud of ur overweight body! #health #SemST",
What young woman doesn't want to be #sugarbaby? It's far better than slaving away in a cubicle for nothing. #sugardaddy #SemST,
"Since when did #feminists get to decide what men are, and are not, allowed to be turned on by? #SemST",AGAINST
We celebrate the company of 50 great people & organisations. From #fiji-lovers #perfumers #ecowarrior #vegan #humanrights #healers #SemST,NONE
@NARAL Complaining that your boss won't pay for your birth control but insisting that he 'stays out of my bedroom.' #birthcontrol #SemST,AGAINST
"The true woman is as yet a dream of the future - Elizabeth Cady Stanton, 1888 #letsbedreamyfuturewomen #quote #womenforwomen #SemST",FAVOR
"Please tell me more about how UTI commercials are icky, so I won't forget to livetweet my next one in person. #SemST",NONE
"@fuckboyusagi : ATTENTION ALL FEMINISTS, FOLLOW #TumblrPosts #yesallwomen #THANKYOUELLENPAO #reddit #GamerGate #KILLALLMEN #LGBT #SemST",FAVOR
Today's roundup of pro-circ trolls: @cbpolis @AnimalRightsJen @ynkutner @DrJohnChisholm @monaeltahawy @hibowardere #i2 #SemST,NONE
i need feminism because i cant even walk to dairy queen without a male yelling nice ass out of his window #SemST,FAVOR
@femfreq I don't understand why all these nerds hate you so much. I would love to do the nasty with you sexy #GamerGate #feminist #SemST,FAVOR
"The more you push me, the more I resist. Don't bother #strongwomen #women #SemST",FAVOR
"@Maisie_Williams is our hero with her #LikeAGirl campaign! What a brilliant leader she is, so badass just like Arya! #GoT #SemST",FAVOR
"Rather be an ""ugly"" feminist then be these sad people that throws hat on people that believes in equality! #SemST",FAVOR
iamNovaah: RT ChrzOC: Bitches be running wild... # # # # # #feminazi #feminonsence #brosb4hos #F4F #R... #SemST,AGAINST
@angerelle you disagree that people should strive to be stronger? #empowering #SemST,FAVOR
"#Rapeculture is basically a FABLE. It has almost no reason on its side, but plenty of emotion. #rape #women #antifeminism #antiSJW #SemST",AGAINST
988 changes: 988 additions & 0 deletions backend/django/core/data/test_files/test_no_labels_with_metadata.csv

Large diffs are not rendered by default.

56 changes: 29 additions & 27 deletions backend/django/core/forms.py
@@ -13,7 +13,7 @@
from .models import Label, Project, ProjectPermissions


def clean_data_helper(data, supplied_labels):
def clean_data_helper(data, supplied_labels, metadata_fields=[]):
ALLOWED_TYPES = [
"text/csv",
"text/tab-separated-values",
@@ -25,8 +25,7 @@ def clean_data_helper(data, supplied_labels):
"application/vnd.ms-excel.addin.macroenabled.12",
"application/vnd.ms-excel.sheet.binary.macroenabled.12",
]
ALLOWED_HEADER = ["Text", "Label"]
ALLOWED_HEADER_ID = ["ID", "Text", "Label"]
REQUIRED_HEADERS = ["Text", "Label"]
MAX_FILE_SIZE = 500 * 1000 * 1000

if data.size > MAX_FILE_SIZE:
@@ -76,33 +75,31 @@ def clean_data_helper(data, supplied_labels):
"Unable to read the file. Please ensure that the file is encoded in UTF-8."
)

if (len(data.columns) != len(ALLOWED_HEADER)) and len(data.columns) != len(
ALLOWED_HEADER_ID
):
raise ValidationError(
"File has incorrect number of columns. Received {0} but expected {1} or {2}.".format(
len(data.columns), len(ALLOWED_HEADER), len(ALLOWED_HEADER_ID)
)
)
for col in REQUIRED_HEADERS:
if col not in data.columns:
raise ValidationError(f"File is missing required field {col}.")

if len(data) < 1:
raise ValidationError("File should contain some data.")

if (data.columns.tolist() != ALLOWED_HEADER) and (
data.columns.tolist() != ALLOWED_HEADER_ID
found_metadata_fields = [
c for c in data.columns if c.lower() not in ["text", "label", "id"]
]
if metadata_fields is not None and (
len(metadata_fields) > 0
and (set(metadata_fields) != set(found_metadata_fields))
):
raise ValidationError(
"File headers are incorrect. Received {0} but header must be {1} or {2}.".format(
", ".join(data.columns),
", ".join(ALLOWED_HEADER),
", ".join(ALLOWED_HEADER_ID),
)
"There were metadata fields provided in the "
"initial data upload that are missing from this data."
f" Original fields: {', '.join(metadata_fields)}."
f" Found fields: {', '.join(found_metadata_fields)}."
)

if len(data) < 1:
raise ValidationError("File should contain some data.")

labels_in_data = data["Label"].dropna(inplace=False).unique()
if len(labels_in_data) > 0 and set(labels_in_data) != set(supplied_labels):
if len(labels_in_data) > 0 and len(set(labels_in_data) - set(supplied_labels)) > 0:
raise ValidationError(
"Labels in file do not match labels created in step 2. File supplied {0} "
"There are extra labels in the file which were not created in step 2. File supplied {0} "
"but step 2 was given {1}".format(
", ".join(labels_in_data), ", ".join(supplied_labels)
)
@@ -115,7 +112,7 @@ def clean_data_helper(data, supplied_labels):
"to do active learning. Please upload a file that has less labels."
)

if len(data.columns) == len(ALLOWED_HEADER_ID):
if "ID" in data.columns:
# there should be no null values
if data["ID"].isnull().sum() > 0:
raise ValidationError("Unique ID field cannot have missing values.")
@@ -153,14 +150,16 @@ class Meta:

def __init__(self, *args, **kwargs):
self.project_labels = kwargs.pop("labels", None)
self.project_metadata = kwargs.pop("metadata", None)
super(ProjectUpdateForm, self).__init__(*args, **kwargs)

def clean_data(self):
data = self.cleaned_data.get("data", False)
labels = self.project_labels
metadata_fields = self.project_metadata
cb_data = self.cleaned_data.get("cb_data", False)
if data:
return clean_data_helper(data, labels)
return clean_data_helper(data, labels, metadata_fields)
if cb_data:
return cleanCodebookDataHelper(cb_data)

@@ -242,6 +241,7 @@ def __init__(self, *args, **kwargs):
validate_min=True,
extra=0,
can_delete=True,
absolute_max=10000,
)
LabelDescriptionFormSet = forms.inlineformset_factory(
Project, Label, form=LabelDescriptionForm, can_delete=False, extra=0
@@ -282,7 +282,7 @@ class Meta:
percentage_irr = forms.FloatField(initial=10.0, min_value=0.0, max_value=100.0)
num_users_irr = forms.IntegerField(initial=2, min_value=2)
use_default_batch_size = forms.BooleanField(initial=True, required=False)
batch_size = forms.IntegerField(initial=30, min_value=10, max_value=1000)
batch_size = forms.IntegerField(initial=30, min_value=10, max_value=10000)

use_model = forms.BooleanField(initial=True, required=False)
classifier = forms.ChoiceField(
@@ -319,12 +319,14 @@ class DataWizardForm(forms.Form):

def __init__(self, *args, **kwargs):
self.supplied_labels = kwargs.pop("labels", None)
self.supplied_metadata = kwargs.pop("metadata", None)
super(DataWizardForm, self).__init__(*args, **kwargs)

def clean_data(self):
data = self.cleaned_data.get("data", False)
labels = self.supplied_labels
return clean_data_helper(data, labels)
metadata = self.supplied_metadata
return clean_data_helper(data, labels, metadata)


class CodeBookWizardForm(forms.Form):
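The `forms.py` hunks above replace the fixed two- or three-column header check with looser rules: `Text` and `Label` are required, every other column except `ID` is treated as metadata, a later upload must carry the same metadata columns as the project's original upload (enforced only when the project already has metadata fields), and a file may use any subset of the labels created in step 2 rather than exactly that set. The following is a minimal, stdlib-only sketch of those rules; it works on plain lists instead of a pandas DataFrame and raises `ValueError` where the real code raises Django's `ValidationError`. The name `check_upload` and its signature are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence

REQUIRED_HEADERS = ["Text", "Label"]


def check_upload(
    columns: Sequence[str],
    labels_in_data: Iterable[str],
    supplied_labels: Iterable[str],
    metadata_fields: Optional[Sequence[str]] = None,
) -> List[str]:
    """Validate an upload's header and labels; return the metadata columns found.

    Hypothetical sketch of the rules in clean_data_helper, not SMART's code.
    labels_in_data should already exclude blank cells (the diff uses dropna()).
    """
    # Text and Label must both be present (case-sensitive, as in the diff).
    for col in REQUIRED_HEADERS:
        if col not in columns:
            raise ValueError(f"File is missing required field {col}.")

    # Every column other than text/label/id (compared case-insensitively) is metadata.
    found = [c for c in columns if c.lower() not in ("text", "label", "id")]

    # Only enforced when the project already has metadata fields.
    if metadata_fields and set(metadata_fields) != set(found):
        raise ValueError(
            f"Original fields: {', '.join(metadata_fields)}. "
            f"Found fields: {', '.join(found)}."
        )

    # Labels in the file must be a subset of the labels created in step 2.
    extra = set(labels_in_data) - set(supplied_labels)
    if extra:
        raise ValueError(
            f"Extra labels not created in step 2: {', '.join(sorted(extra))}"
        )

    return found


print(check_upload(["ID", "Text", "Label", "source"], ["about cats"], ["about cats", "unclear"]))
# ['source']
```

Using `None` as the default for `metadata_fields` also sidesteps the mutable `metadata_fields=[]` default in the new `clean_data_helper` signature. That default is harmless there because the list is only read, never modified, but `None` is the more common Python idiom.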
66 changes: 66 additions & 0 deletions backend/django/core/migrations/0053_metadata_metadatafield.py
@@ -0,0 +1,66 @@
# Generated by Django 3.2.3 on 2021-08-04 20:55

import django.db.models.deletion
from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("core", "0052_alter_model_cv_metrics"),
]

operations = [
migrations.CreateModel(
name="MetaDataField",
fields=[
(
"id",
models.AutoField(
auto_created=True,
primary_key=True,
serialize=False,
verbose_name="ID",
),
),
("field_name", models.TextField()),
(
"project",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="core.project"
),
),
],
),
migrations.CreateModel(
name="MetaData",
fields=[
(
"id",
models.AutoField(
auto_created=True,
primary_key=True,
serialize=False,
verbose_name="ID",
),
),
("value", models.TextField()),
(
"data",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="core.data"
),
),
(
"metadata_field",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE,
to="core.metadatafield",
),
),
],
options={
"unique_together": {("data", "metadata_field")},
},
),
]
23 changes: 23 additions & 0 deletions backend/django/core/migrations/0054_alter_metadata_data.py
@@ -0,0 +1,23 @@
# Generated by Django 3.2.3 on 2021-08-05 18:12

import django.db.models.deletion
from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("core", "0053_metadata_metadatafield"),
]

operations = [
migrations.AlterField(
model_name="metadata",
name="data",
field=models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE,
related_name="metadata",
to="core.data",
),
),
]
18 changes: 18 additions & 0 deletions backend/django/core/migrations/0055_alter_metadata_value.py
@@ -0,0 +1,18 @@
# Generated by Django 3.2.3 on 2021-08-09 19:41

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("core", "0054_alter_metadata_data"),
]

operations = [
migrations.AlterField(
model_name="metadata",
name="value",
field=models.TextField(null=True),
),
]
18 changes: 18 additions & 0 deletions backend/django/core/migrations/0056_alter_metadata_value.py
@@ -0,0 +1,18 @@
# Generated by Django 3.2.3 on 2021-08-09 20:01

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("core", "0055_alter_metadata_value"),
]

operations = [
migrations.AlterField(
model_name="metadata",
name="value",
field=models.TextField(blank=True, null=True),
),
]
20 changes: 20 additions & 0 deletions backend/django/core/models.py
@@ -130,6 +130,26 @@ def __str__(self):
return self.text


class MetaDataField(models.Model):
project = models.ForeignKey("Project", on_delete=models.CASCADE)
field_name = models.TextField()

def __str__(self):
return self.field_name


class MetaData(models.Model):
class Meta:
unique_together = ("data", "metadata_field")

data = models.ForeignKey("Data", on_delete=models.CASCADE, related_name="metadata")
metadata_field = models.ForeignKey("MetaDataField", on_delete=models.CASCADE)
value = models.TextField(null=True, blank=True)

def __str__(self):
return f"{str(self.metadata_field)}: {self.value}"


class Label(models.Model):
class Meta:
unique_together = ("name", "project")
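The two new models form an entity-attribute-value layout: `MetaDataField` records each extra column name per project, and `MetaData` stores one value per (`Data`, `MetaDataField`) pair, with `unique_together` preventing a second value for the same pair. Migrations 0055 and 0056 make `value` nullable and blank-able so empty cells can be stored. The sketch below shows, in plain Python without the ORM, how one uploaded record might map onto such rows. Storing blank cells as `None` is an assumption (the diff only shows that nulls are permitted), and `MetaDataRow` and `rows_from_upload` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class MetaDataRow:
    """Plain stand-in for one core.MetaData row (hypothetical, not the ORM model)."""

    data_id: int
    field_name: str
    value: Optional[str]  # may be None, like TextField(null=True, blank=True)


def rows_from_upload(
    data_id: int, row: Mapping[str, str], metadata_fields: List[str]
) -> List[MetaDataRow]:
    """Build one MetaDataRow per metadata field for a single uploaded record."""
    seen: Dict[Tuple[int, str], MetaDataRow] = {}
    for field in metadata_fields:
        key = (data_id, field)
        # Mirrors unique_together = ("data", "metadata_field").
        if key in seen:
            raise ValueError(f"duplicate metadata field {field!r} for data {data_id}")
        raw = row.get(field) or ""
        # Assumption: blank or whitespace-only cells are stored as None.
        seen[key] = MetaDataRow(data_id, field, raw if raw.strip() else None)
    return list(seen.values())


rows = rows_from_upload(7, {"Text": "hi", "source": "twitter", "region": " "}, ["source", "region"])
print(rows)
```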
12 changes: 11 additions & 1 deletion backend/django/core/serializers.py
@@ -56,9 +56,19 @@ class Meta:


class DataSerializer(serializers.ModelSerializer):
metadata = serializers.StringRelatedField(many=True, read_only=True)

class Meta:
model = Data
fields = ("pk", "text", "project", "irr_ind", "hash", "upload_id_hash")
fields = (
"pk",
"text",
"project",
"irr_ind",
"hash",
"upload_id_hash",
"metadata",
)


class DataLabelSerializer(serializers.HyperlinkedModelSerializer):
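Django REST Framework's `StringRelatedField` is read-only and renders each related object with `str()`, so the new `metadata` field on `DataSerializer` emits the strings produced by `MetaData.__str__` rather than separate field and value keys. The sketch below approximates that output with a plain stand-in class; `FakeMetaData` and `serialize_data` are hypothetical, and the real serializer's other fields are omitted.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeMetaData:
    """Stand-in for core.MetaData; only __str__ matters to StringRelatedField."""

    field_name: str
    value: Optional[str]

    def __str__(self) -> str:
        # Same f-string as MetaData.__str__ in the diff.
        return f"{self.field_name}: {self.value}"


def serialize_data(pk: int, text: str, metadata: List[FakeMetaData]) -> dict:
    """Approximate the metadata part of DataSerializer's output."""
    # StringRelatedField(many=True) renders each related object with str().
    return {"pk": pk, "text": text, "metadata": [str(m) for m in metadata]}


record = serialize_data(
    1, "a tweet", [FakeMetaData("source", "twitter"), FakeMetaData("region", None)]
)
print(json.dumps(record))
# {"pk": 1, "text": "a tweet", "metadata": ["source: twitter", "region: None"]}
```

One consequence of this design is that a null `value` reaches the client as the literal text `None`. A nested serializer exposing `field_name` and `value` separately would be needed if the frontend had to tell empty values apart or format them itself.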
6 changes: 3 additions & 3 deletions backend/django/core/templates/projects/admin/admin_label.html
@@ -1,13 +1,13 @@
{% block label_tab %}
<div id="label" class="tab-pane fade in active">
<div class="row">
<div class="col-md-6">
<div class="col-md-12">
<h3>Label Distribution</h3>
<div id="distribution_chart" style="height:300px">
<div id="distribution_chart" style="height:400px">
<svg class="nvd3-svg"></svg>
</div>
</div>
<div class="col-md-6">
<div class="col-md-12">
<h3>Time To Label</h3>
<div id="timer_chart" style="height:300px">
<svg class="nvd3-svg"></svg>
@@ -32,15 +32,15 @@ <h3>Description</h3>
<p>Time to upload your data! Please upload a data file that contains text (and optionally labels) for your project. To upload, the file <strong>must pass the following checks:</strong></p>
<ul class="list-group">
<li class="list-group-item">The file needs to have either a <code>.csv</code>, <code>.tsv</code>, or <code>.xlsx</code> file extension.</li>
<li class="list-group-item">The file requires the data to be formatted into two columns, with header names <code>Text</code> and <code>Label</code> OR three columns with header names <code>ID</code> <code>Text</code> and
<code>Label</code>.</li>
<li class="list-group-item">The file requires the data to have at least two columns, with header names <code>Text</code> and <code>Label</code>. It can also contain a unique id column named <code>ID</code>.</li>
<li class="list-group-item">The largest file size supported is 500 MB.</li>
</ul>
<p>The (optional) <code>ID</code> column should contain a <b>unique</b> identifier for your data. The identifiers should be no more than 128 characters.</p>
<p>The <code>Text</code> column should contain the text you wish users to label. For example, if you are building a sentiment analysis classifier to predict whether a tweet is positive, negative, or neutral, the <code>Text</code>
column would contain the tweets.</p>
<p>The <code>Label</code> column should contain any pre-existing labels for the corresponding text. If none of your data contains existing labels, then this column can be left blank. Extending our sentiment analysis example, if a
lead coder has already annotated some tweets as positive, negative, or neutral, this column would contain those labeled records.</p>
<p><b>All additional columns in the data file will be saved as metadata and the values will be displayed along with the text for labeling. These fields can be empty for some or all data. Currently SMART does not support using metadata values in its models.</b></p>
<p><i>Data Upload Notes:</i></p>
<ul class="list-group">
<li class="list-group-item">SMART restricts your project to having two million unique records.</li>
@@ -94,4 +94,4 @@ <h3>Description</h3>
}, 30000)
}
</script>
{% endblock %}
{% endblock %}
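The upload instructions state that the optional `ID` column must hold unique identifiers of at most 128 characters, and the `forms.py` hunk rejects missing values whenever an `ID` column is present. A user could catch these problems before uploading with a local check like the one below. It is a hypothetical sketch using only the `csv` module, not part of SMART; it assumes a UTF-8 CSV whose quoted fields do not span multiple lines (otherwise the reported line numbers drift), and stripping surrounding whitespace from IDs is likewise an assumption.

```python
import csv
import io
from typing import List, TextIO

MAX_ID_LENGTH = 128  # limit stated in the upload instructions


def check_ids(handle: TextIO) -> List[str]:
    """Return problems found in the optional ID column of a CSV upload.

    Hypothetical pre-upload check; SMART performs its own validation server-side.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or "ID" not in reader.fieldnames:
        return []  # the ID column is optional
    problems: List[str] = []
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        # Short rows yield None for missing cells; treat them as empty.
        value = (row.get("ID") or "").strip()
        if not value:
            problems.append(f"line {line_no}: missing ID")
        elif len(value) > MAX_ID_LENGTH:
            problems.append(f"line {line_no}: ID longer than {MAX_ID_LENGTH} characters")
        elif value in seen:
            problems.append(f"line {line_no}: duplicate ID {value!r}")
        seen.add(value)
    return problems


sample = "ID,Text,Label\n1,hello,\n1,again,\n,blank id,\n"
print(check_ids(io.StringIO(sample)))
# ["line 3: duplicate ID '1'", 'line 4: missing ID']
```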
