## Data cleaning for Syria Conflict database

This dataset comes from the Humanitarian Data Exchange (HDX).

Source:<br>
*Remember to add to presentation!*<br>
"Armed Conflict Location & Event Data Project (ACLED)"<br>
[Source](https://acleddata.com)<br>
[HDX Source](https://data.humdata.org/dataset/acled-data-for-syrian-arab-republic)<br>

The matching codebook is the one published by the Armed Conflict Location & Event Data Project (ACLED).

In [1]:
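import numpy as np
import pandas as pd
from textblob import TextBlob

# Use full option names: abbreviations such as "max_r" are pattern-matched
# and can become ambiguous between pandas versions.
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", None)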


In [2]:
conflict_syr = pd.read_csv('../data/HDX-data/syr/conflict_data_syr.csv', low_memory=False)

In [3]:
conflict_syr.head()

Unnamed: 0,data_id,iso,event_id_cnty,event_id_no_cnty,event_date,year,time_precision,event_type,sub_event_type,actor1,assoc_actor_1,inter1,actor2,assoc_actor_2,inter2,interaction,region,country,admin1,admin2,admin3,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,iso3
0,,,#event+code,,#date+occurred,#date+year,,#event+type,,#group+name+first,#group+name+first+assoc,,#group+name+second,#group+name+second+assoc,,,#region+name,#country+name,#adm1+name,#adm2+name,#adm3+name,#loc+name,#geo+lat,#geo+lon,,#meta+source,,#description,#affected+killed,,#country+code
1,6993667.0,760.0,SYR76938,76938.0,2020-03-14,2020,1.0,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),,3.0,,,0.0,30.0,Middle East,Syria,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,35.9428,39.0519,1.0,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,SYR
2,6993669.0,760.0,SYR76939,76939.0,2020-03-14,2020,1.0,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),,3.0,,,0.0,30.0,Middle East,Syria,Ar-Raqqa,Ar-Raqqa,Karama,Karama,35.8699,39.2813,2.0,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,SYR
3,6993742.0,760.0,SYR76930,76930.0,2020-03-14,2020,1.0,Battles,Armed clash,Unidentified Armed Group (Syria),,3.0,QSD: Syrian Democratic Forces - Intelligence,,2.0,23.0,Middle East,Syria,Deir-ez-Zor,Deir-ez-Zor,Basira,Takihi,35.1751,40.4547,2.0,SOHR,Other,"On 14 March 2020, an unknown gunman set fire t...",0,1584396000.0,SYR
4,6993746.0,760.0,SYR76937,76937.0,2020-03-14,2020,1.0,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),,3.0,Operation Peace Spring,Opposition Rebels (Syria),2.0,23.0,Middle East,Syria,Aleppo,Al Bab,Ar-Ra'ee,Ar-Ra'ee,36.6125,37.4464,1.0,SOHR,Other,"On 14 March 2020, an IED planted in a car by a...",1,1584396000.0,SYR


In [4]:
print(conflict_syr.iloc[0,:])

data_id                                  NaN
iso                                      NaN
event_id_cnty                    #event+code
event_id_no_cnty                         NaN
event_date                    #date+occurred
year                              #date+year
time_precision                           NaN
event_type                       #event+type
sub_event_type                           NaN
actor1                     #group+name+first
assoc_actor_1        #group+name+first+assoc
inter1                                   NaN
actor2                    #group+name+second
assoc_actor_2       #group+name+second+assoc
inter2                                   NaN
interaction                              NaN
region                          #region+name
country                        #country+name
admin1                            #adm1+name
admin2                            #adm2+name
admin3                            #adm3+name
location                           #loc+name
latitude  

In [5]:
# Row 0 holds the HXL hashtags (e.g. #event+code), not an event, so drop it.
conflict_syr = conflict_syr.drop(0)
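
The row dropped here is the HXL hashtag row that sits directly under the header in the export. A minimal sketch, assuming the hashtag row is always the second line of the file, is to skip it while loading so that pandas can infer numeric dtypes on its own. The in-memory CSV below is made-up illustration data, not the real file.

```python
import io

import pandas as pd

# Made-up three-column sample that mimics the layout of the export:
# a header line, an HXL hashtag line, then the data rows.
sample_csv = io.StringIO(
    "event_date,year,fatalities\n"
    "#date+occurred,#date+year,#affected+killed\n"
    "2020-03-14,2020,0\n"
    "2020-03-14,2020,1\n"
)

# skiprows=[1] skips the second line of the file (0-based line index 1),
# so the hashtag row never reaches the dataframe.
sample = pd.read_csv(sample_csv, skiprows=[1])

print(sample.dtypes)
```

Applied to the real file, the same `skiprows=[1]` argument might make some of the later `astype` conversions unnecessary, but that has not been checked against the full dataset.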

In [6]:
print(conflict_syr['year'].max())
print(conflict_syr['year'].min())

2020
2017


In [7]:
null_cols = conflict_syr.isnull().sum()
null_cols

data_id                 0
iso                     0
event_id_cnty           0
event_id_no_cnty        0
event_date              0
year                    0
time_precision          0
event_type              0
sub_event_type          0
actor1                  0
assoc_actor_1       57825
inter1                  0
actor2              32623
assoc_actor_2       67046
inter2                  0
interaction             0
region                  0
country                 0
admin1                  0
admin2                  0
admin3                  0
location                0
latitude                0
longitude               0
geo_precision           0
source                  0
source_scale            0
notes                   0
fatalities              0
timestamp               0
iso3                    0
dtype: int64

In [8]:
conflict_syr.shape

(74883, 31)
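
With 74,883 rows, the null counts above work out to roughly 77% missing for `assoc_actor_1`, 90% for `assoc_actor_2`, and 44% for `actor2`. A small sketch of how such shares could be computed directly; the toy frame and the 70% cut-off are assumptions made for illustration only.

```python
import pandas as pd

# Toy frame standing in for conflict_syr; the values are invented.
toy = pd.DataFrame({
    "actor1": ["A", "B", "C", "D"],
    "actor2": ["X", None, None, "Y"],
    "assoc_actor_1": [None, None, None, "Z"],
})

# isnull().mean() gives the fraction of missing values per column.
null_share = toy.isnull().mean().sort_values(ascending=False)
print(null_share)

# Hypothetical cut-off: list columns where more than 70% of rows are empty.
mostly_empty = null_share[null_share > 0.7].index.tolist()
print(mostly_empty)
```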

In [9]:
# Drop the two associated-actor columns, which are empty for most of the 74,883 rows.
conflict_syr = conflict_syr.drop(columns=['assoc_actor_1', 'assoc_actor_2'])

In [10]:
conflict_syr.head()

Unnamed: 0,data_id,iso,event_id_cnty,event_id_no_cnty,event_date,year,time_precision,event_type,sub_event_type,actor1,inter1,actor2,inter2,interaction,region,country,admin1,admin2,admin3,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,iso3
1,6993667.0,760.0,SYR76938,76938.0,2020-03-14,2020,1.0,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3.0,,0.0,30.0,Middle East,Syria,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,35.9428,39.0519,1.0,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,SYR
2,6993669.0,760.0,SYR76939,76939.0,2020-03-14,2020,1.0,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3.0,,0.0,30.0,Middle East,Syria,Ar-Raqqa,Ar-Raqqa,Karama,Karama,35.8699,39.2813,2.0,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,SYR
3,6993742.0,760.0,SYR76930,76930.0,2020-03-14,2020,1.0,Battles,Armed clash,Unidentified Armed Group (Syria),3.0,QSD: Syrian Democratic Forces - Intelligence,2.0,23.0,Middle East,Syria,Deir-ez-Zor,Deir-ez-Zor,Basira,Takihi,35.1751,40.4547,2.0,SOHR,Other,"On 14 March 2020, an unknown gunman set fire t...",0,1584396000.0,SYR
4,6993746.0,760.0,SYR76937,76937.0,2020-03-14,2020,1.0,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3.0,Operation Peace Spring,2.0,23.0,Middle East,Syria,Aleppo,Al Bab,Ar-Ra'ee,Ar-Ra'ee,36.6125,37.4464,1.0,SOHR,Other,"On 14 March 2020, an IED planted in a car by a...",1,1584396000.0,SYR
5,6993757.0,760.0,SYR76943,76943.0,2020-03-14,2020,1.0,Riots,Violent demonstration,Rioters (Syria),5.0,,0.0,50.0,Middle East,Syria,Idleb,Ariha,Ariha,Ariha,35.814,36.6102,2.0,SOHR,Other,"On 14 March 2020, people of Idlib have been st...",0,1584396000.0,SYR


In [11]:
conflict_syr['fatalities'].count()

74883

In [12]:
# 'fatalities' is still stored as strings here, so sum() concatenates instead of adding.
conflict_syr.aggregate({"fatalities": ['sum']})

Unnamed: 0,fatalities
sum,0001000000110005300300000000100002000000000011...


In [13]:
conflict_syr.dtypes

data_id             float64
iso                 float64
event_id_cnty        object
event_id_no_cnty    float64
event_date           object
year                 object
time_precision      float64
event_type           object
sub_event_type       object
actor1               object
inter1              float64
actor2               object
inter2              float64
interaction         float64
region               object
country              object
admin1               object
admin2               object
admin3               object
location             object
latitude             object
longitude            object
geo_precision       float64
source               object
source_scale         object
notes                object
fatalities           object
timestamp           float64
iso3                 object
dtype: object

In [14]:
conflict_syr = conflict_syr.astype({"notes": str, "fatalities": int})

In [15]:
conflict_syr.aggregate({"fatalities":['sum']}) 

Unnamed: 0,fatalities
sum,102582
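
The earlier run of digits came from summing strings. As a hedged alternative to `astype(int)`, `pd.to_numeric` can do the conversion and will stop on any value that is not a number. The toy column below is invented for the demonstration.

```python
import pandas as pd

# Toy column stored as strings, the way 'fatalities' was after loading.
toy = pd.Series(["0", "1", "3", "12"], dtype=object, name="fatalities")

# On strings, sum() joins the pieces together rather than adding them.
text_sum = toy.sum()

# errors="raise" (the default) stops on any value that is not a number,
# which surfaces stray text instead of silently hiding it.
numeric = pd.to_numeric(toy, errors="raise")
numeric_sum = numeric.sum()

print(repr(text_sum), numeric_sum)
```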


In [16]:
print(conflict_syr.country.unique())
print(conflict_syr.iso.unique())
print(conflict_syr.iso3.unique())
print(conflict_syr.region.unique())

['Syria']
[760.]
['SYR']
['Middle East']


In [17]:
syr_data = conflict_syr.drop(['iso', 'country', 'region', 'iso3','event_id_cnty','event_id_no_cnty'], axis = 1)

In [18]:
# These columns hold whole-number codes but were read in as floats.
int_cols = ['data_id', 'time_precision', 'inter1', 'inter2', 'interaction', 'geo_precision']
syr_data[int_cols] = syr_data[int_cols].astype(int)

In [19]:
syr_data.dtypes
# 'notes' still shows as object; pandas stores Python strings as object dtype here, so this is expected.

data_id             int64
event_date         object
year               object
time_precision      int64
event_type         object
sub_event_type     object
actor1             object
inter1              int64
actor2             object
inter2              int64
interaction         int64
admin1             object
admin2             object
admin3             object
location           object
latitude           object
longitude          object
geo_precision       int64
source             object
source_scale       object
notes              object
fatalities          int64
timestamp         float64
dtype: object
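
The listing shows `event_date`, `year`, `latitude` and `longitude` still stored as `object`. A minimal sketch of how they could be converted if date or coordinate arithmetic is needed later; the toy values mirror the format seen in the rows above, and the date format string is an assumption based on those rows.

```python
import pandas as pd

# Toy frame with the same string formats as the real columns.
toy = pd.DataFrame({
    "event_date": ["2020-03-14", "2020-03-13"],
    "year": ["2020", "2020"],
    "latitude": ["35.9428", "36.6125"],
    "longitude": ["39.0519", "37.4464"],
})

# Parse ISO-style dates; an explicit format fails loudly on odd values.
toy["event_date"] = pd.to_datetime(toy["event_date"], format="%Y-%m-%d")
toy["year"] = toy["year"].astype(int)

# Convert both coordinate columns to floats in one step.
coord_cols = ["latitude", "longitude"]
toy[coord_cols] = toy[coord_cols].apply(pd.to_numeric)

print(toy.dtypes)
```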

### Text analysis

The `notes` column holds a short description of each event. This is the text I want to analyse.

[StackOverflow select string](https://stackoverflow.com/questions/11350770/select-by-partial-string-from-a-pandas-dataframe)<br>
[StackOverflow new column based on string selection](https://stackoverflow.com/questions/43399435/new-column-based-on-specific-string-info-from-two-different-columns-python-panda)<br>
[StackOverflow contains statement](https://stackoverflow.com/questions/54484435/python-create-new-column-with-condition-and-contains-statement)<br>

**Step 1:** Create two indicator columns that flag whether a note mentions men or women.<br>
**Step 2:** Find a way to get the words associated with each of these (TextBlob has a `noun_phrases` property; maybe that will get me somewhere).

In [20]:
#syr_data[['men' in x for x in syr_data['notes']]]

#def count_men(row):
 #   if syr_data[['men' in x for x in syr_data['notes']]]:
  #      val = 1
   # else:
    #    val = 0
    #return val

#syr_data['men'] = syr_data.apply(count_men, axis=1)

#syr_data['men'] = np.where(syr_data.notes.str.contains(r'\bmen\b'), 1, else 0)

#syr_data['men'] = syr_data['notes'].apply(lambda x : 1 if x.str.contains(r'\bmen\b') else 0)

#print(syr_data)

In [21]:
syr_data['men'] = np.where(syr_data['notes'].str.contains(r'\b(?:men|man|male|males)\b'), 1, 0)


In [22]:
syr_data.head()

Unnamed: 0,data_id,event_date,year,time_precision,event_type,sub_event_type,actor1,inter1,actor2,inter2,interaction,admin1,admin2,admin3,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,men
1,6993667,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,35.9428,39.0519,1,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0
2,6993669,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Karama,Karama,35.8699,39.2813,2,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0
3,6993742,2020-03-14,2020,1,Battles,Armed clash,Unidentified Armed Group (Syria),3,QSD: Syrian Democratic Forces - Intelligence,2,23,Deir-ez-Zor,Deir-ez-Zor,Basira,Takihi,35.1751,40.4547,2,SOHR,Other,"On 14 March 2020, an unknown gunman set fire t...",0,1584396000.0,0
4,6993746,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,Operation Peace Spring,2,23,Aleppo,Al Bab,Ar-Ra'ee,Ar-Ra'ee,36.6125,37.4464,1,SOHR,Other,"On 14 March 2020, an IED planted in a car by a...",1,1584396000.0,0
5,6993757,2020-03-14,2020,1,Riots,Violent demonstration,Rioters (Syria),5,,0,50,Idleb,Ariha,Ariha,Ariha,35.814,36.6102,2,SOHR,Other,"On 14 March 2020, people of Idlib have been st...",0,1584396000.0,0


In [23]:
print(syr_data.men.unique())
syr_data['men'].value_counts()

[0 1]


0    71123
1     3760
Name: men, dtype: int64
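
The pattern used for the `men` flag is case-sensitive, so a note that starts with "Men" would not be counted. Whether that matters for the real notes has not been checked; the sketch below only shows the difference on three invented sentences.

```python
import pandas as pd

toy_notes = pd.Series([
    "Men were injured in the shelling.",
    "A woman was killed by a landmine.",
    "Government forces shelled the town.",
])

# Raw string so \b reaches the regex engine unchanged; (?:...) is a
# non-capturing group, which avoids pandas' warning about match groups.
pattern = r"\b(?:men|man|male|males)\b"

case_sensitive = toy_notes.str.contains(pattern).astype(int)
case_insensitive = toy_notes.str.contains(pattern, case=False).astype(int)

print(case_sensitive.tolist(), case_insensitive.tolist())
```

The word boundaries keep "woman" and "Government" from matching, which is the behaviour the original pattern relies on as well.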

In [24]:
syr_data['women'] = np.where(syr_data['notes'].str.contains(r'\b(?:women|woman|female|females)\b'), 1, 0)
syr_data.head()

Unnamed: 0,data_id,event_date,year,time_precision,event_type,sub_event_type,actor1,inter1,actor2,inter2,interaction,admin1,admin2,admin3,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,men,women
1,6993667,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,35.9428,39.0519,1,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0,0
2,6993669,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Karama,Karama,35.8699,39.2813,2,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0,0
3,6993742,2020-03-14,2020,1,Battles,Armed clash,Unidentified Armed Group (Syria),3,QSD: Syrian Democratic Forces - Intelligence,2,23,Deir-ez-Zor,Deir-ez-Zor,Basira,Takihi,35.1751,40.4547,2,SOHR,Other,"On 14 March 2020, an unknown gunman set fire t...",0,1584396000.0,0,0
4,6993746,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,Operation Peace Spring,2,23,Aleppo,Al Bab,Ar-Ra'ee,Ar-Ra'ee,36.6125,37.4464,1,SOHR,Other,"On 14 March 2020, an IED planted in a car by a...",1,1584396000.0,0,0
5,6993757,2020-03-14,2020,1,Riots,Violent demonstration,Rioters (Syria),5,,0,50,Idleb,Ariha,Ariha,Ariha,35.814,36.6102,2,SOHR,Other,"On 14 March 2020, people of Idlib have been st...",0,1584396000.0,0,0


In [25]:
syr_data['women'].value_counts()

0    72729
1     2154
Name: women, dtype: int64

#### For the love of all that is good, please remember your research question and your goal! 

Goal: to approximate gender-disaggregated data through notes/mentions/news headlines (the latter refers to the DRC dataset).

Question: Can we approximate gender disaggregation for vulnerability in violent events? 
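
One possible first look at that question, sketched on invented data: because the `men` and `women` columns are 0/1 flags, their mean per event type is the share of events whose note mentions each group. This measures how events are described, not who was actually affected, so it is only a rough proxy.

```python
import pandas as pd

# Invented rows; the column names follow the notebook.
toy = pd.DataFrame({
    "event_type": [
        "Battles",
        "Battles",
        "Violence against civilians",
        "Violence against civilians",
    ],
    "men": [1, 0, 1, 1],
    "women": [0, 0, 1, 0],
})

# The mean of a 0/1 flag is the share of events whose note mentions the group.
mention_share = toy.groupby("event_type")[["men", "women"]].mean()
print(mention_share)
```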

In [26]:
def extract_nouns(text):
    # Return the noun phrases TextBlob finds in one note as a plain list.
    blob = TextBlob(text)
    return list(blob.noun_phrases)

syr_data['nouns'] = syr_data['notes'].apply(extract_nouns)

In [27]:
syr_data.head()

Unnamed: 0,data_id,event_date,year,time_precision,event_type,sub_event_type,actor1,inter1,actor2,inter2,interaction,admin1,admin2,admin3,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,men,women,nouns
1,6993667,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,35.9428,39.0519,1,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0,0,"[march, ied, tawsi, ar-raqqa]"
2,6993669,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Karama,Karama,35.8699,39.2813,2,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0,0,"[march, ied, qatirji, oil trucks, minkhar, kar..."
3,6993742,2020-03-14,2020,1,Battles,Armed clash,Unidentified Armed Group (Syria),3,QSD: Syrian Democratic Forces - Intelligence,2,23,Deir-ez-Zor,Deir-ez-Zor,Basira,Takihi,35.1751,40.4547,2,SOHR,Other,"On 14 March 2020, an unknown gunman set fire t...",0,1584396000.0,0,0,"[march, unknown gunman, military vehicle, qsd,..."
4,6993746,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,Operation Peace Spring,2,23,Aleppo,Al Bab,Ar-Ra'ee,Ar-Ra'ee,36.6125,37.4464,1,SOHR,Other,"On 14 March 2020, an IED planted in a car by a...",1,1584396000.0,0,0,"[march, ied, ar-ra'ee, aleppo]"
5,6993757,2020-03-14,2020,1,Riots,Violent demonstration,Rioters (Syria),5,,0,50,Idleb,Ariha,Ariha,Ariha,35.814,36.6102,2,SOHR,Other,"On 14 March 2020, people of Idlib have been st...",0,1584396000.0,0,0,"[march, idlib, jisr ariha, aleppo-latakia, int..."


In [28]:
# For future reference: 
# https://stackoverflow.com/questions/56980515/how-to-extract-all-adjectives-from-a-strings-of-text-in-a-pandas-dataframe
# https://stackoverflow.com/questions/53155803/textblob-for-extracting-noun-phrases-wordlist-issue

In [29]:
print(syr_data['nouns'].str.len().max())
print(syr_data['nouns'].str.len().min())

83
0
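
For Step 2 (words associated with each group), one option is to flatten the per-row lists and count the phrases within the flagged rows. A minimal sketch on invented lists, assuming the `nouns` column holds Python lists as produced above:

```python
import pandas as pd

# Invented rows shaped like the notebook's 'women' flag and 'nouns' lists.
toy = pd.DataFrame({
    "women": [1, 1, 0],
    "nouns": [["shelling", "idlib"], ["shelling", "aleppo"], ["ied"]],
})

# explode() turns each list element into its own row; value_counts() then
# tallies how often each phrase appears among the flagged events.
women_nouns = toy.loc[toy["women"] == 1, "nouns"].explode().value_counts()
print(women_nouns)
```

Rows with an empty list become missing values after `explode()` and are left out of the counts.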


In [30]:
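def get_verbs(text):
    # Penn Treebank tags covering all verb forms (base, present, past, participles, gerund).
    verb_tags = {"VB", "VBZ", "VBP", "VBD", "VBN", "VBG"}
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag in verb_tags]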

syr_data['verbs'] = syr_data['notes'].apply(get_verbs)

In [31]:
syr_data['children'] = np.where(syr_data['notes'].str.contains(r'\b(?:child|children)\b'), 1, 0)
syr_data['civilians'] = np.where(syr_data['notes'].str.contains(r'\b(?:civilian|civilians)\b'), 1, 0)

In [32]:
syr_data.head()

Unnamed: 0,data_id,event_date,year,time_precision,event_type,sub_event_type,actor1,inter1,actor2,inter2,interaction,admin1,admin2,admin3,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,men,women,nouns,verbs,children,civilians
1,6993667,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,35.9428,39.0519,1,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0,0,"[march, ied, tawsi, ar-raqqa]","[planted, exploded, were, reported]",0,0
2,6993669,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Karama,Karama,35.8699,39.2813,2,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0,0,"[march, ied, qatirji, oil trucks, minkhar, kar...","[planted, exploded, were, reported]",0,0
3,6993742,2020-03-14,2020,1,Battles,Armed clash,Unidentified Armed Group (Syria),3,QSD: Syrian Democratic Forces - Intelligence,2,23,Deir-ez-Zor,Deir-ez-Zor,Basira,Takihi,35.1751,40.4547,2,SOHR,Other,"On 14 March 2020, an unknown gunman set fire t...",0,1584396000.0,0,0,"[march, unknown gunman, military vehicle, qsd,...","[set, affiliated, were, reported]",0,0
4,6993746,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,Operation Peace Spring,2,23,Aleppo,Al Bab,Ar-Ra'ee,Ar-Ra'ee,36.6125,37.4464,1,SOHR,Other,"On 14 March 2020, an IED planted in a car by a...",1,1584396000.0,0,0,"[march, ied, ar-ra'ee, aleppo]","[planted, exploded, killing, backed, injured]",0,1
5,6993757,2020-03-14,2020,1,Riots,Violent demonstration,Rioters (Syria),5,,0,50,Idleb,Ariha,Ariha,Ariha,35.814,36.6102,2,SOHR,Other,"On 14 March 2020, people of Idlib have been st...",0,1584396000.0,0,0,"[march, idlib, jisr ariha, aleppo-latakia, int...","[have, been, holding, overlooking, cut, setting]",0,0


I'm going down an NLP path now, which wasn't the original intention.

I'm going to see if I can make some of the descriptive data numeric so I can analyse it better.

**Interesting columns:**<br>
1) type of event<br>
2) sub-type<br>
3) Actors have already been given numeric codes in the original dataframe (`inter1`, `inter2`, `interaction`), so I'll adopt the ACLED codebook for those.
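
A sketch of how the actor codes could be turned back into readable labels. The label text below is an assumption based on the ACLED codebook of that period and should be checked against the codebook before use; the label for `0` is a hypothetical placeholder for "no second actor recorded".

```python
import pandas as pd

# ASSUMPTION: labels taken from the ACLED codebook as remembered, not from
# this dataset. Verify against the codebook before relying on them.
INTER_LABELS = {
    0: "None recorded",  # hypothetical placeholder label
    1: "State forces",
    2: "Rebel groups",
    3: "Political militias",
    4: "Identity militias",
    5: "Rioters",
    6: "Protesters",
    7: "Civilians",
    8: "External/other forces",
}

# Invented rows using code values that also appear in the notebook's output.
toy = pd.DataFrame({"inter1": [3, 3, 5], "inter2": [0, 2, 0]})

toy["inter1_label"] = toy["inter1"].map(INTER_LABELS)
toy["inter2_label"] = toy["inter2"].map(INTER_LABELS)
print(toy)
```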

In [33]:
print(len(conflict_syr.event_type.unique()))
print(len(conflict_syr.sub_event_type.unique()))
print(conflict_syr.event_type.unique())
print(conflict_syr.sub_event_type.unique())

6
25
['Explosions/Remote violence' 'Battles' 'Riots' 'Strategic developments'
 'Violence against civilians' 'Protests']
['Remote explosive/landmine/IED' 'Armed clash' 'Violent demonstration'
 'Shelling/artillery/missile attack' 'Disrupted weapons use'
 'Change to group/activity' 'Attack' 'Looting/property destruction'
 'Headquarters or base established' 'Peaceful protest' 'Air/drone strike'
 'Non-violent transfer of territory' 'Arrests'
 'Abduction/forced disappearance' 'Other' 'Government regains territory'
 'Non-state actor overtakes territory' 'Agreement' 'Grenade'
 'Mob violence' 'Suicide bomb' 'Excessive force against protesters'
 'Protest with intervention' 'Sexual violence' 'Chemical weapon']


[StackOverflow Assign id](https://stackoverflow.com/questions/33283086/assign-unique-id-to-columns-pandas-data-frame)

In [34]:
event_types = conflict_syr.event_type.unique()

# Number the event types in order of first appearance.
event_types_num = pd.Series(np.arange(len(event_types)), index=event_types)
print(event_types_num)

# Series.map does the lookup directly; applymap is deprecated in recent pandas.
syr_data['event_id'] = syr_data['event_type'].map(event_types_num)

# TODO: save the event_id mapping to the codebook

Explosions/Remote violence    0
Battles                       1
Riots                         2
Strategic developments        3
Violence against civilians    4
Protests                      5
dtype: int64
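
An equivalent numbering can be produced with `pd.factorize`, which also returns the labels needed for a codebook entry. A short sketch on an invented series:

```python
import pandas as pd

toy_types = pd.Series(["Battles", "Riots", "Battles", "Protests"])

# factorize numbers the values in order of first appearance, the same
# ordering that unique() produces.
codes, uniques = pd.factorize(toy_types)

# A plain dict of id -> label is easy to write into a codebook later.
codebook = dict(enumerate(uniques))

print(codes.tolist())
print(codebook)
```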


In [35]:
syr_data.head()

Unnamed: 0,data_id,event_date,year,time_precision,event_type,sub_event_type,actor1,inter1,actor2,inter2,interaction,admin1,admin2,admin3,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,men,women,nouns,verbs,children,civilians,event_id
1,6993667,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,Ar-Raqqa,35.9428,39.0519,1,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0,0,"[march, ied, tawsi, ar-raqqa]","[planted, exploded, were, reported]",0,0,0
2,6993669,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,,0,30,Ar-Raqqa,Ar-Raqqa,Karama,Karama,35.8699,39.2813,2,SHAM,National,"On 14 March 2020, an IED planted by an unknown...",0,1584396000.0,0,0,"[march, ied, qatirji, oil trucks, minkhar, kar...","[planted, exploded, were, reported]",0,0,0
3,6993742,2020-03-14,2020,1,Battles,Armed clash,Unidentified Armed Group (Syria),3,QSD: Syrian Democratic Forces - Intelligence,2,23,Deir-ez-Zor,Deir-ez-Zor,Basira,Takihi,35.1751,40.4547,2,SOHR,Other,"On 14 March 2020, an unknown gunman set fire t...",0,1584396000.0,0,0,"[march, unknown gunman, military vehicle, qsd,...","[set, affiliated, were, reported]",0,0,1
4,6993746,2020-03-14,2020,1,Explosions/Remote violence,Remote explosive/landmine/IED,Unidentified Armed Group (Syria),3,Operation Peace Spring,2,23,Aleppo,Al Bab,Ar-Ra'ee,Ar-Ra'ee,36.6125,37.4464,1,SOHR,Other,"On 14 March 2020, an IED planted in a car by a...",1,1584396000.0,0,0,"[march, ied, ar-ra'ee, aleppo]","[planted, exploded, killing, backed, injured]",0,1,0
5,6993757,2020-03-14,2020,1,Riots,Violent demonstration,Rioters (Syria),5,,0,50,Idleb,Ariha,Ariha,Ariha,35.814,36.6102,2,SOHR,Other,"On 14 March 2020, people of Idlib have been st...",0,1584396000.0,0,0,"[march, idlib, jisr ariha, aleppo-latakia, int...","[have, been, holding, overlooking, cut, setting]",0,0,2


In [36]:
print(conflict_syr.location.unique())
print(len(conflict_syr.location.unique()))
# Not assigning numeric values to locations.
print(conflict_syr.admin1.unique())
print(len(conflict_syr.admin1.unique()))
print(conflict_syr.admin2.unique())
print(len(conflict_syr.admin2.unique()))
print(conflict_syr.admin3.unique())
print(len(conflict_syr.admin3.unique()))

# Due to time constraints and the amount of data here, I won't focus on precise locations right now.
# I might do something with admin1 if time allows, because it has only 14 unique values. I may be able to distinguish front-line or conflict-heavy areas from 'calmer' areas at the time of the event.

['Ar-Raqqa' 'Karama' 'Takihi' ... 'Al-Nasiriyyah Military Airport'
 'Himarayn' 'Rasm Abbud']
3328
['Ar-Raqqa' 'Deir-ez-Zor' 'Aleppo' 'Idleb' 'Al-Hasakeh' 'Lattakia' 'Hama'
 'Damascus' "Dar'a" 'Homs' 'Rural Damascus' 'As-Sweida' 'Quneitra'
 'Tartous']
14
['Ar-Raqqa' 'Deir-ez-Zor' 'Al Bab' 'Ariha' 'Afrin' 'Ras Al Ain'
 'Al-Qardaha' 'Jablah' 'Idleb' 'Al Mayadin' 'Al-Hasakeh' 'Harim'
 'Al-Haffa' 'As-Suqaylabiyah' "A'zaz" 'Damascus' 'Tell Abiad' "Dar'a"
 "Al Ma'ra" 'Homs' 'Jebel Saman' 'Abu Kamal' 'Quamishli' 'Duma' 'At Tall'
 'Rural Damascus' "Izra'" 'Al-Malikeyyeh' 'As-Sweida' 'Lattakia' 'Hama'
 'Jisr-Ash-Shugur' 'Ar-Rastan' 'Menbij' 'Ath-Thawrah' 'Quneitra'
 'Al-Qusayr' 'As-Sanamayn' 'Ain Al Arab' 'Jarablus' 'Qatana' 'As-Safira'
 'Muhradah' 'Masyaf' 'Tadmor' 'An Nabk' 'Darayya' 'Shahba' 'Al Makhrim'
 'Yabroud' 'As-Salamiyeh' 'Al Qutayfah' 'Tartous' 'Az-Zabdani'
 'Tall Kalakh' 'Salkhad' 'Banyas' 'Al Fiq']
58
['Ar-Raqqa' 'Karama' 'Basira' "Ar-Ra'ee" 'Ariha' 'Afrin' 'Ras Al Ain'
 'Al-Qardah
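
If the `admin1` idea is picked up later, a per-governorate summary of event counts and fatalities would be one starting point for separating conflict-heavy areas from calmer ones. A minimal sketch on invented numbers:

```python
import pandas as pd

# Invented rows; governorate names are taken from the admin1 list above.
toy = pd.DataFrame({
    "admin1": ["Idleb", "Idleb", "Aleppo", "Tartous"],
    "fatalities": [4, 0, 2, 0],
})

# Named aggregation: count the events and total the fatalities per governorate.
by_admin1 = (
    toy.groupby("admin1")
       .agg(events=("fatalities", "size"), fatalities=("fatalities", "sum"))
       .sort_values("events", ascending=False)
)
print(by_admin1)
```

Any threshold for what counts as "conflict-heavy" would still have to be chosen and justified separately.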

### Final cleaning:

While all these columns are interesting and I would love to analyse all of them, I am creating a stripped-down copy of the current dataframe containing only the columns relevant to my current goal. This is subject to change (I might need more data down the line), but right now keeping all the current columns is distracting.

In [37]:
syr_data_new = syr_data.filter(['year', 'event_type', 'sub_event_type', 'actor1', 'inter1', 'actor2', 'inter2', 'interaction', 'admin1', 'admin2', 'admin3', 'location', 'notes', 'fatalities', 'men' , 'women', 'nouns', 'verbs', 'event_id', 'children', 'civilians'], axis=1)
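
One thing to keep in mind with `filter`: names that do not exist in the dataframe are skipped without an error, so a typo in the list goes unnoticed. A small sketch with a deliberately misspelled column name on an invented frame:

```python
import pandas as pd

toy = pd.DataFrame({"year": [2020], "notes": ["sample note"], "men": [0]})

# "woman" is a deliberate typo for this demonstration.
wanted = ["year", "notes", "woman"]

# filter(items=...) keeps the columns it finds and silently skips the rest.
filtered = toy.filter(items=wanted, axis=1)

# Checking for missing names first makes a typo visible.
missing = [col for col in wanted if col not in toy.columns]

print(list(filtered.columns), missing)
```

Selecting with `toy[wanted]` instead would raise a `KeyError` for the missing name, which may be preferable when the column list is fixed.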

In [None]:
#with open('../data/HDX-data/syr/conflict_data_syr.csv') as f:
#    print(f)

In [None]:
#syr_data.to_csv('../data/syr_data_cleaned.csv', encoding='utf-8', index=False)

In [None]:
#syr_data_new.to_csv('../data/syr_data_new.csv', encoding='utf-8', index=False)
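
A caveat for when these exports are enabled: list columns such as `nouns` and `verbs` are written to CSV as text, so they come back as strings unless they are parsed on load. A hedged sketch of a round trip through an in-memory buffer, using `ast.literal_eval` as the converter; the toy frame is invented.

```python
import ast
import io

import pandas as pd

toy = pd.DataFrame({"fatalities": [0], "nouns": [["march", "ied"]]})

# Write to an in-memory buffer instead of a file for the demonstration.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# literal_eval turns the text "['march', 'ied']" back into a Python list.
restored = pd.read_csv(buffer, converters={"nouns": ast.literal_eval})

print(type(restored["nouns"].iloc[0]))
```

Without the converter, `restored["nouns"].iloc[0]` would be a single string, and list operations on it would behave character by character.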