# Political Violence Targeting Women & Demonstrations Featuring Women

#### All violence targeting women, as well as demonstrations featuring women, are included in the data file below. The data in this file cover all events in which women were specifically targeted by political violence, not all events involving women in any way; the file also covers all demonstration events in which women were specifically featured, not all demonstrations involving women. 

#### This dataset was extracted from the ACLED Access Portal; we perform exploratory data analysis (EDA) on it using Python.


### Data Collection & Data Exploration

We import pandas and necessary libraries for dataframe manipulation and analysis.

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, date


In [2]:
# Load the dataset from a local CSV file
df_gpv = pd.read_csv("gender_Sep27-1.csv")
# Drop the columns not used in this analysis (commented-out names are kept)
df_gpv.drop([
    'event_id_cnty',
    #'event_date', 
    'year', 
    'time_precision',
    'disorder_type', 
    #'event_type', 
    #'sub_event_type', 
    'actor1',
    'assoc_actor_1', 
    #'inter1', 
    'actor2', 
    'assoc_actor_2', 
    'inter2',
    #'interaction', 
    'civilian_targeting', 
    #'iso',
    #'region', 
    #'country',
    'admin1', 
    'admin2', 
    'admin3', 
    #'location', 
    'latitude', 
    'longitude',
    'geo_precision', 
    'source', 
    'source_scale', 
    #'notes', 
    #'fatalities',
    'tags',
    'timestamp'                           
], axis=1, inplace=True)


# Parse the ISO-formatted event_date strings into datetime.date objects
df_gpv["date"] = df_gpv["event_date"].apply(date.fromisoformat)
#df_gpv['timestamp'] = df_gpv['timestamp'].apply(datetime.fromtimestamp)
display(df_gpv.shape)

(75563, 12)

In [4]:
df_gpv['date'][0]

datetime.date(2024, 9, 27)

In [5]:
df_gpv['date'][0].year

2024

In [6]:
df_gpv['date'][0].month

9

In [7]:
df_gpv['date'][0].day

27
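
The `year`, `month`, and `day` attributes shown above make it easy to group events by period. The following is a minimal standard-library sketch, assuming a handful of made-up ISO date strings in place of the real `event_date` column; the names `event_dates` and `events_per_month` are illustrative and do not appear in the notebook.

```python
from collections import Counter
from datetime import date

# Hypothetical ISO date strings standing in for the `event_date` column.
event_dates = ["2024-09-27", "2024-09-27", "2024-08-15", "2023-12-01"]

# Same parsing step as in the notebook: ISO string -> datetime.date.
parsed = [date.fromisoformat(s) for s in event_dates]

# Count events per (year, month) pair.
events_per_month = Counter((d.year, d.month) for d in parsed)

for (year, month), n in sorted(events_per_month.items()):
    print(f"{year}-{month:02d}: {n}")
```

On the full DataFrame, the same idea could be expressed by grouping on values derived from the `date` column.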

In [3]:
# Check the first two rows
display(df_gpv.head(2))


| | event_date | event_type | sub_event_type | inter1 | interaction | iso | region | country | location | notes | fatalities | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-09-27 | Violence against civilians | Sexual violence | Political militia | Political militia-Civilians | 76 | South America | Brazil | Sorriso | On 27 September 2024, in Sorriso (Mato Grosso)... | 0 | 2024-09-27 |
| 1 | 2024-09-27 | Violence against civilians | Attack | Political militia | Political militia-Civilians | 218 | South America | Ecuador | Guayaquil | Around 27 September 2024 (as reported), in Gua... | 1 | 2024-09-27 |


#### Checking unique values

In [9]:
df_gpv['region'].unique()

array(['South America', 'North America', 'Middle East', 'Southern Africa',
       'Europe', 'East Asia', 'South Asia', 'Northern Africa',
       'Western Africa', 'Middle Africa', 'Central America',
       'Caucasus and Central Asia', 'Southeast Asia', 'Eastern Africa',
       'Caribbean', 'Oceania'], dtype=object)

In [8]:
df_gpv['event_type'].unique()

array(['Violence against civilians', 'Protests', 'Riots',
       'Strategic developments', 'Explosions/Remote violence'],
      dtype=object)

In [None]:
df_gpv['inter1'].unique()


In [None]:
df_gpv['interaction'].unique()

In [None]:
df_gpv['sub_event_type'].unique()

In [None]:
# Check the last two rows

display(df_gpv.tail(2))

In [None]:
# Check the column names
display(df_gpv.keys())

In [None]:
# Structural Overview of the DataFrame
df_gpv.info()


In [None]:
# Basic statistical description (numerical columns)
df_gpv.describe()

In [None]:
# Display the event ID column (may change).
# Note: 'event_id_cnty' is dropped in In [2]; it must be kept there for this cell to run.
df_gpv['event_id_cnty']

#### Handling missing data

In [None]:
# Count the missing values in each column
df_gpv.isnull().sum()

### Pre-processing data

In [None]:
len(df_gpv['event_id_cnty'].unique())


In [None]:
# Print the unique values of 'actor1'
unique_actors = df_gpv['actor1'].unique()
for i in unique_actors:
    print(i)

In [None]:
# Read the notes of the first event recorded with 750 fatalities
df_gpv[df_gpv['fatalities'] == 750].iloc[0]['notes']

In [None]:
df_gpv['actor1'].value_counts()

### Suggested challenge
- Iterate over every row in the "actor1" column.
- For each string that is not NaN, check that ALL of them have the form "xxxxx (country)".
- If so, scan the string from right to left; at the first "(" you find, delete everything from that point to the right, leaving "xxxxx " (note the trailing space).
- Call .strip() on the resulting string to remove any whitespace at the beginning and end.
- Store the result in a new column named "actor1_formatted".
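
The steps in this challenge can be sketched as a small helper. This is one possible approach, not a definitive solution: the function name `strip_country_suffix` and the sample values are hypothetical and are not taken from the dataset. It uses `str.rfind` to locate the rightmost "(" and leaves non-string values such as NaN untouched.

```python
import math

def strip_country_suffix(name):
    """Return `name` without its trailing "(...)" part; pass non-strings through."""
    # Missing values (NaN) are floats, not strings, so leave them as they are.
    if not isinstance(name, str):
        return name
    # rfind scans from the right, so it returns the position of the last "(".
    idx = name.rfind("(")
    if idx == -1:
        # No parenthesis at all: only trim the surrounding whitespace.
        return name.strip()
    # Keep everything left of the "(" and trim the leftover spaces.
    return name[:idx].strip()

# Hypothetical sample values, made up for illustration.
samples = ["Unidentified Armed Group (Brazil)", "  Rioters (Ecuador) ", "Protesters", math.nan]
formatted = [strip_country_suffix(s) for s in samples]
print(formatted)
```

Assuming the `actor1` column is still present in the DataFrame, the new column could then be created with `df_gpv['actor1_formatted'] = df_gpv['actor1'].apply(strip_country_suffix)`.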

In [None]:
# Print actor names that contain no "("; skip NaN, which is a float and not a string
for value in df_gpv['actor1'].dropna():
    if '(' not in value:
        print(value)