# Overview

use PB2020 api data to create a training dataset
- add category labels for type of force
- use dataset with categories training data on logistic regression
- pickle trained model to upload


- **Presence**: police show up and their presence is enough to de-escalate. This is ideal
- **verbalization**: police use voice commands, force is non-physical
- **empty-hand control soft technique**: Officers use grabs, holds and joint locks to restrain an individual. shove, chase, spit, raid, push
- **empty-hand control hard technique**: Officers use punches and kicks to restrain an individual. injured, charge at, kneel, drag, beat, tackle,  
- **blunt impact**: Officers may use a baton to immobilize a combative person, struck, shield, beat 
- **projectiles**: bean bags, rubber bullets, water hose, deadly weapons such as firearms, shot, munitions, shoot, explosives, throw, launched, fired at, flashbangs, gun
- **chemical**: Officers may use chemical sprays or projectiles embedded with chemicals to restrain an individual (e.g., pepper spray) fire, gassed, smoke, 
- **conducted energy devices**: Officers may use CEDs to immobilize an individual. CEDs discharge a high-voltage, low-amperage jolt of electricity at a distance.
- **miscillaneous**: LRD long range audio device, lrad,  sound cannon, sonic weapon

In [483]:
!pip install snorkel



In [484]:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Human_Rights_First-TeamC-DS/main/Data/pv_incidents.csv', na_values=False)

In [485]:
df2 = df[['name']]

In [486]:
df2 = df2.rename(columns={'name':'text'})

In [487]:
df2['text'] = df2['text'].astype(str)

## presence category

In [488]:
PRESENCE = 1
NOT_PRESENCE = 0
ABSTAIN = -1

In [489]:
from snorkel.labeling import labeling_function

In [490]:
@labeling_function()
def lf_keyword_swarm(x):
  return PROJECTILE if 'swarm' in x.text.lower() else ABSTAIN

In [491]:
@labeling_function()
def lf_keyword_show(x):
  return PROJECTILE if 'show' in x.text.lower() else ABSTAIN

In [492]:
@labeling_function()
def lf_keyword_arrive(x):
  return PROJECTILE if 'arrive' in x.text.lower() else ABSTAIN

In [493]:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs)
lfs = [lf_keyword_swarm, lf_keyword_show,lf_keyword_arrive]

# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df2)

# Train the label model and compute the training labels
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, log_freq=50, seed=123)
df2["presence_label"] = label_model.predict(L=L_train, tie_break_policy="abstain")

100%|██████████| 1703/1703 [00:00<00:00, 14624.47it/s]


In [494]:
df2[df2['presence_label']==1]

Unnamed: 0,text,presence_label
305,Photojournalist shows wounds from less-lethal ...,1
306,Photojournalist shows wounds from less-lethal ...,1
307,Photojournalist shows wounds from less-lethal ...,1
344,Protester shows wound left by less-lethal round,1
345,Protester shows wound left by less-lethal round,1
346,Protester shows wound left by less-lethal round,1
356,Medic shows rubber bullet wound,1
357,Medic shows rubber bullet wound,1
358,Medic shows rubber bullet wound,1
407,Journalist shows wound from impact munition,1


## verbalization category

In [495]:
PRESENCE = 1
NOT_PRESENCE = 0
ABSTAIN = -1

In [496]:
@labeling_function()
def lf_keyword_shout(x):
  return PROJECTILE if 'shout' in x.text.lower() else ABSTAIN

In [497]:
@labeling_function()
def lf_keyword_order(x):
  return PROJECTILE if 'order' in x.text.lower() else ABSTAIN

In [498]:
@labeling_function()
def lf_keyword_loudspeaker(x):
  return PROJECTILE if 'loudspeaker' in x.text.lower() else ABSTAIN

In [499]:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs)
lfs = [lf_keyword_shout, lf_keyword_order,lf_keyword_loudspeaker]

# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df2)

# Train the label model and compute the training labels
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, log_freq=50, seed=123)
df2["verbal_label"] = label_model.predict(L=L_train, tie_break_policy="abstain")

100%|██████████| 1703/1703 [00:00<00:00, 15139.74it/s]


In [500]:
lf_keyword_shout, lf_keyword_order, lf_keyword_loudspeaker = (L_train != ABSTAIN).mean(axis=0)
print(f"lf_keyword_shout coverage: {lf_keyword_shout * 100:.1f}%")
print(f"lf_keyword_order coverage: {lf_keyword_order * 100:.1f}%")
print(f"lf_keyword_loudspeaker coverage: {lf_keyword_loudspeaker * 100:.1f}%")

lf_keyword_shout coverage: 0.1%
lf_keyword_order coverage: 0.5%
lf_keyword_loudspeaker coverage: 0.0%


In [501]:
df2[df2['verbal_label']==1]

Unnamed: 0,text,presence_label,verbal_label
170,Police apply no-assembly order to journalists,-1,1
171,Police apply no-assembly order to journalists,-1,1
172,Police apply no-assembly order to journalists,-1,1
812,Officers shove press during dispersal order,-1,1
813,Officers shove press during dispersal order,-1,1
814,Officers shove press during dispersal order,-1,1
1082,Police selectively enforce curfew and dispersa...,-1,1
1083,Police selectively enforce curfew and dispersa...,-1,1
1444,"Police charge into peaceful crowd shouting ""gr...",-1,1


## empty-hand control soft technique

In [502]:
@labeling_function()
def lf_keyword_shove(x):
  return PROJECTILE if 'shove' in x.text.lower() else ABSTAIN

In [503]:
@labeling_function()
def lf_keyword_grabs(x):
  return PROJECTILE if 'grabs' in x.text.lower() else ABSTAIN

In [504]:
@labeling_function()
def lf_keyword_holds(x):
  return PROJECTILE if 'holds' in x.text.lower() else ABSTAIN

In [576]:
@labeling_function()
def lf_keyword_arrest(x):
  return PROJECTILE if 'arrest' in x.text.lower() else ABSTAIN

In [577]:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs)
lfs = [lf_keyword_shove, lf_keyword_grabs,lf_keyword_holds, lf_keyword_arrest]

# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df2)

# Train the label model and compute the training labels
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, log_freq=50, seed=123)
df2["ehc-soft_technique"] = label_model.predict(L=L_train, tie_break_policy="abstain")

100%|██████████| 1703/1703 [00:00<00:00, 13372.11it/s]


In [506]:
df2[df2['ehc-soft_technique']==1]

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique
47,Police shove and pepper spray protesters,-1,-1,1
48,Police shove and pepper spray protesters,-1,-1,1
49,Police shove and pepper spray protesters,-1,-1,1
67,Officers shove a woman to the pavement,-1,-1,1
68,Officers shove a woman to the pavement,-1,-1,1
...,...,...,...,...
1646,San Francisco law enforcement shove man off th...,-1,-1,1
1674,Protesters shoved down stairs and arrested,-1,-1,1
1685,Police shove woman and then fire pepper balls ...,-1,-1,1
1701,Police officer shoves protester on bike; polic...,-1,-1,1


## empty-hand control hard technique

beat, tackle, punch

In [507]:
@labeling_function()
def lf_keyword_beat(x):
  return PROJECTILE if 'beat' in x.text.lower() else ABSTAIN

In [508]:
@labeling_function()
def lf_keyword_tackle(x):
  return PROJECTILE if 'tackle' in x.text.lower() else ABSTAIN

In [509]:
@labeling_function()
def lf_keyword_punch(x):
  return PROJECTILE if 'punch' in x.text.lower() else ABSTAIN

In [510]:
@labeling_function()
def lf_keyword_assault(x):
  return PROJECTILE if 'assault' in x.text.lower() else ABSTAIN

In [511]:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs)
lfs = [lf_keyword_beat, lf_keyword_tackle,lf_keyword_punch,lf_keyword_assault]

# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df2)

# Train the label model and compute the training labels
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, log_freq=50, seed=123)
df2["ehc-hard_technique"] = label_model.predict(L=L_train, tie_break_policy="abstain")

100%|██████████| 1703/1703 [00:00<00:00, 12587.43it/s]


In [512]:
df2[df2['ehc-hard_technique']==1]

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique
2,Police assault protesters,-1,-1,-1,1
3,Police assault protesters,-1,-1,-1,1
4,Police assault protesters,-1,-1,-1,1
41,Police punch arrestee on ground,-1,-1,-1,1
42,Police punch arrestee on ground,-1,-1,-1,1
...,...,...,...,...,...
1612,Police beat and pepper spray protesters,-1,-1,-1,1
1680,5 police officers use batons to beat protester,-1,-1,-1,1
1681,Louisville police swarm and beat a man screami...,1,-1,-1,1
1686,"Police tackle protester, then target witness",-1,-1,-1,1


## blunt impact

baton, club, shield

In [513]:
@labeling_function()
def lf_keyword_baton(x):
  return PROJECTILE if 'baton' in x.text.lower() else ABSTAIN

In [514]:
@labeling_function()
def lf_keyword_club(x):
  return PROJECTILE if 'club' in x.text.lower() else ABSTAIN

In [515]:
@labeling_function()
def lf_keyword_shield(x):
  return PROJECTILE if 'shield' in x.text.lower() else ABSTAIN

In [516]:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs)
lfs = [lf_keyword_baton, lf_keyword_club,lf_keyword_shield]

# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df2)

# Train the label model and compute the training labels
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, log_freq=50, seed=123)
df2["blunt_impact"] = label_model.predict(L=L_train, tie_break_policy="abstain")

100%|██████████| 1703/1703 [00:00<00:00, 16304.45it/s]


In [517]:
df2[df2['blunt_impact']==1]

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique,blunt_impact
71,"Police beat protester with batons, then pepper...",-1,-1,-1,1,1
72,"Police beat protester with batons, then pepper...",-1,-1,-1,1,1
128,Officer attacks photographer with baton and pe...,-1,-1,-1,-1,1
129,Officer attacks photographer with baton and pe...,-1,-1,-1,-1,1
130,Officer attacks photographer with baton and pe...,-1,-1,-1,-1,1
140,"Officer chases man filming, beats with baton a...",-1,-1,-1,1,1
141,"Officer chases man filming, beats with baton a...",-1,-1,-1,1,1
142,"Officer chases man filming, beats with baton a...",-1,-1,-1,1,1
167,Officer strikes journalist with baton,-1,-1,-1,-1,1
168,Officer strikes journalist with baton,-1,-1,-1,-1,1


## projectiles category

In [518]:
from snorkel.labeling import labeling_function

In [519]:
PROJECTILE = 1
NOT_PROJECTILE = 0
ABSTAIN = -1

In [520]:
@labeling_function()
def lf_keyword_gas(x):
  return PROJECTILE if 'gas' in x.text else ABSTAIN

In [521]:
@labeling_function()
def lf_keyword_pepper(x):
  return PROJECTILE if 'pepper' in x.text else ABSTAIN

In [522]:
@labeling_function()
def lf_keyword_rubber(x):
  return PROJECTILE if 'rubber' in x.text else ABSTAIN

In [523]:
@labeling_function()
def lf_keyword_bean(x):
  return PROJECTILE if 'bean' in x.text else ABSTAIN

In [524]:
@labeling_function()
def lf_keyword_shoot(x):
  return PROJECTILE if 'shoot' in x.text else ABSTAIN

In [525]:
@labeling_function()
def lf_keyword_shot(x):
  return PROJECTILE if 'shot' in x.text else ABSTAIN

In [569]:
@labeling_function()
def lf_keyword_fire(x):
  return PROJECTILE if 'fire' in x.text else ABSTAIN

In [570]:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs)
lfs = [lf_keyword_gas, lf_keyword_pepper, lf_keyword_rubber, lf_keyword_bean,lf_keyword_shoot,lf_keyword_shot, lf_keyword_fire]

# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df2)

# Train the label model and compute the training labels
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, log_freq=50, seed=123)
df2["projectile"] = label_model.predict(L=L_train, tie_break_policy="abstain")

100%|██████████| 1703/1703 [00:00<00:00, 5665.43it/s]


In [528]:

df2[df2['projectile'] == -1]

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique,blunt_impact,projectile
2,Police assault protesters,-1,-1,-1,1,-1,-1
3,Police assault protesters,-1,-1,-1,1,-1,-1
4,Police assault protesters,-1,-1,-1,1,-1,-1
17,Police use horses as weapons,-1,-1,-1,-1,-1,-1
18,Police use horses as weapons,-1,-1,-1,-1,-1,-1
...,...,...,...,...,...,...,...
1694,Police arrest protesters leaving scene,-1,-1,-1,-1,-1,-1
1699,Peaceful protesters arrested for breaking curfew,-1,-1,-1,-1,-1,-1
1700,Peaceful protesters arrested for breaking curfew,-1,-1,-1,-1,-1,-1
1701,Police officer shoves protester on bike; polic...,-1,-1,1,-1,-1,-1


In [529]:
from snorkel.labeling import LFAnalysis

LFAnalysis(L=L_train, lfs=lfs).lf_summary()

Unnamed: 0,j,Polarity,Coverage,Overlaps,Conflicts
lf_keyword_gas,0,[1],0.221961,0.04815,0.0
lf_keyword_pepper,1,[1],0.156195,0.027011,0.0
lf_keyword_rubber,2,[1],0.038168,0.027011,0.0
lf_keyword_bean,3,[1],0.004698,0.004698,0.0
lf_keyword_shoot,4,[1],0.074574,0.031122,0.0
lf_keyword_shot,5,[1],0.081033,0.025837,0.0


In [530]:
df2.iloc[L_train[:, 1] == PROJECTILE].sample(10, random_state=1)

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique,blunt_impact,projectile
799,Officer singles out and pepper sprays proteste...,-1,-1,-1,-1,-1,0
128,Officer attacks photographer with baton and pe...,-1,-1,-1,-1,1,0
1252,Police fire pepper balls at car with pregnant ...,-1,-1,-1,-1,-1,0
1077,Police officer pepper-sprays three people on t...,-1,-1,-1,-1,-1,0
491,Protester bowing on sidewalk is pepper sprayed,-1,-1,-1,-1,-1,0
1436,Protests at Trump rally met with pepper spray,-1,-1,-1,-1,-1,0
1391,Police shove and pepper spray protesters,-1,-1,1,-1,-1,0
1153,Police casually pepper spray passers by,-1,-1,-1,-1,-1,0
755,Journalist shoved and pepper-sprayed,-1,-1,1,-1,-1,0
1141,Police pepper spray two kneeling protesters,-1,-1,-1,-1,-1,0


In [531]:
from snorkel.analysis import get_label_buckets

buckets = get_label_buckets(L_train[:, 0], L_train[:, 1])
df2.iloc[buckets[(ABSTAIN, PROJECTILE)]].sample(10, random_state=1)

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique,blunt_impact,projectile
1659,Police fire pepper bullets into apartment,-1,-1,-1,-1,-1,0
1359,Police pepper spray protesters during arrest,-1,-1,-1,-1,-1,0
825,"Individual held on the ground, pepper sprayed,...",-1,-1,-1,1,-1,0
976,Police pepper spray protesters on sidewalk,-1,-1,-1,-1,-1,0
1511,Police pepper spray crowd,-1,-1,-1,-1,-1,0
72,"Police beat protester with batons, then pepper...",-1,-1,-1,1,1,0
974,Police pepper spray protesters on sidewalk,-1,-1,-1,-1,-1,0
326,Police pepper spray protesters with hands up,-1,-1,-1,-1,-1,0
1250,Police fire pepper balls at car with pregnant ...,-1,-1,-1,-1,-1,0
1693,Protester pepper sprayed through open door,-1,-1,-1,-1,-1,0


In [532]:
from snorkel.labeling.model import MajorityLabelVoter

majority_model = MajorityLabelVoter()
preds_train = majority_model.predict(L=L_train)

In [533]:
preds_train

array([ 1,  1, -1, ..., -1, -1, -1])

In [534]:
from sklearn.model_selection import train_test_split

df_train, df_test = train_test_split(df2, test_size=0.33, random_state=42)

In [535]:
from snorkel.labeling.model import LabelModel

label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train=L_train, n_epochs=500, log_freq=100, seed=123)

In [536]:
from snorkel.labeling import filter_unlabeled_dataframe

df_train_filtered, probs_train_filtered = filter_unlabeled_dataframe(
    X=df2, y=preds_train, L=L_train
)

In [537]:
df_train_filtered

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique,blunt_impact,projectile
0,Police throw tear-gas at protesters on a bridge.,-1,-1,-1,-1,-1,1
1,Police throw tear-gas at protesters on a bridge.,-1,-1,-1,-1,-1,1
5,Police shoot non-violent protester in the head,-1,-1,-1,-1,-1,1
6,Police shoot non-violent protester in the head,-1,-1,-1,-1,-1,1
7,Police shoot non-violent protester in the head,-1,-1,-1,-1,-1,1
...,...,...,...,...,...,...,...
1693,Protester pepper sprayed through open door,-1,-1,-1,-1,-1,0
1695,Reporter shows tear gas canister fired at him ...,1,-1,-1,-1,-1,1
1696,Woman bleeding from face after being shot by p...,-1,-1,-1,-1,-1,0
1697,"Police Mace, shoot pepper bullets at protester...",-1,-1,-1,-1,-1,1


In [538]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(ngram_range=(1, 5))
X_train = vectorizer.fit_transform(df_train_filtered.text.tolist())
X_test = vectorizer.transform(df2.text.tolist())

## chemical

pepper, gas, smoke

In [539]:
@labeling_function()
def lf_keyword_pepper(x):
  return PROJECTILE if 'pepper' in x.text else ABSTAIN

In [540]:
@labeling_function()
def lf_keyword_gas(x):
  return PROJECTILE if 'gas' in x.text else ABSTAIN

In [541]:
@labeling_function()
def lf_keyword_smoke(x):
  return PROJECTILE if 'pepper' in x.text else ABSTAIN

In [542]:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs)
lfs = [lf_keyword_pepper, lf_keyword_gas,lf_keyword_smoke]

# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df2)

# Train the label model and compute the training labels
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, log_freq=50, seed=123)
df2["chemical"] = label_model.predict(L=L_train, tie_break_policy="abstain")

100%|██████████| 1703/1703 [00:00<00:00, 15238.64it/s]


In [543]:
df2[df2['chemical']==1]

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique,blunt_impact,projectile,chemical
47,Police shove and pepper spray protesters,-1,-1,1,-1,-1,0,1
48,Police shove and pepper spray protesters,-1,-1,1,-1,-1,0,1
49,Police shove and pepper spray protesters,-1,-1,1,-1,-1,0,1
71,"Police beat protester with batons, then pepper...",-1,-1,-1,1,1,0,1
72,"Police beat protester with batons, then pepper...",-1,-1,-1,1,1,0,1
...,...,...,...,...,...,...,...,...
1684,Protesters in St. Matthews shot with pepper ro...,-1,-1,-1,-1,-1,0,1
1685,Police shove woman and then fire pepper balls ...,-1,-1,1,-1,-1,0,1
1693,Protester pepper sprayed through open door,-1,-1,-1,-1,-1,0,1
1697,"Police Mace, shoot pepper bullets at protester...",-1,-1,-1,-1,-1,1,1


## conducted energy devices

tazer, stun, taser, stungun

In [544]:
@labeling_function()
def lf_keyword_taser(x):
  return PROJECTILE if 'taser' in x.text else ABSTAIN

In [545]:
@labeling_function()
def lf_keyword_stun(x):
  return PROJECTILE if 'stun' in x.text else ABSTAIN

In [546]:
@labeling_function()
def lf_keyword_stungun(x):
  return PROJECTILE if 'stungun' in x.text else ABSTAIN

In [547]:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs)
lfs = [lf_keyword_taser, lf_keyword_stun,lf_keyword_stungun]

# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df2)

# Train the label model and compute the training labels
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, log_freq=50, seed=123)
df2["ced_category"] = label_model.predict(L=L_train, tie_break_policy="abstain")

100%|██████████| 1703/1703 [00:00<00:00, 14740.94it/s]


In [548]:
df2[df2['ced_category']==1]

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique,blunt_impact,projectile,chemical,ced_category
746,Officers deploy tear gas and stun grenades aga...,-1,-1,-1,-1,-1,1,0,1
747,Officers deploy tear gas and stun grenades aga...,-1,-1,-1,-1,-1,1,0,1
748,Officers deploy tear gas and stun grenades aga...,-1,-1,-1,-1,-1,1,0,1
914,Police throw stun grenade at retreating protes...,-1,-1,-1,-1,-1,-1,-1,1
915,Police throw stun grenade at retreating protes...,-1,-1,-1,-1,-1,-1,-1,1
916,Police throw stun grenade at retreating protes...,-1,-1,-1,-1,-1,-1,-1,1
948,Officer repeatedly uses stun gun on suspect wh...,-1,-1,-1,-1,-1,-1,-1,1
949,Officer repeatedly uses stun gun on suspect wh...,-1,-1,-1,-1,-1,-1,-1,1


## summary

In [549]:
df2.columns

Index(['text', 'presence_label', 'verbal_label', 'ehc-soft_technique',
       'ehc-hard_technique', 'blunt_impact', 'projectile', 'chemical',
       'ced_category'],
      dtype='object')

In [585]:
def add_label_names(row):
  tags = []
  if row['presence_label'] == 1:
    tags.append('Presence')
  if row['verbal_label'] == 1:
    tags.append('Verbalization')
  if row['ehc-soft_technique'] == 1:
    tags.append('EHC Soft Technique')
  if row['ehc-hard_technique'] == 1:
    tags.append('EHC Hard Technique')
  if row['blunt_impact'] == 1:
    tags.append('Blunt Impact')
  if row['projectile'] == 1 or row['projectile'] == 0:
    tags.append('Projectiles')
  if row['chemical'] == 1:
    tags.append('Chemical')
  if row['ced_category'] == 1:
    tags.append('Conductive Energy')
  if not tags:
    tags.append('Other')
  return tags
    

In [586]:
df2['tags'] = df2.apply(add_label_names,axis=1)

In [592]:
def join_tags(content):
  return ', '.join(content)

In [595]:
df['tags_str'] = df2['tags'].apply(join_tags)

In [588]:
df2[['text','tags']].head(3)

Unnamed: 0,text,tags
0,Police throw tear-gas at protesters on a bridge.,[Projectiles]
1,Police throw tear-gas at protesters on a bridge.,[Projectiles]
2,Police assault protesters,[EHC Hard Technique]


In [596]:
df['tags_str'].value_counts()

Projectiles                                                                    596
Other                                                                          351
EHC Soft Technique                                                             275
Projectiles, Chemical                                                          199
EHC Hard Technique                                                              77
EHC Soft Technique, Projectiles, Chemical                                       41
EHC Soft Technique, EHC Hard Technique                                          28
EHC Soft Technique, Projectiles                                                 24
EHC Hard Technique, Blunt Impact                                                18
Blunt Impact                                                                    16
Presence                                                                        14
EHC Hard Technique, Projectiles, Chemical                                       11
Verb

In [590]:
df2[df2['tags_str'] == '[]'][20:30]

Unnamed: 0,text,presence_label,verbal_label,ehc-soft_technique,ehc-hard_technique,blunt_impact,projectile,chemical,ced_category,tags,tags_str


In [599]:
df = df.rename(columns={'name':'text'})

In [601]:
df[['date_text','text','tags_str','LATITUDE','LONGITUDE','Link 1','Link 2']].to_csv('training_data.csv')