## Goals
This notebook contains an analysis of Violence Against Women and Girls data. The goals of this project were to:
* Get acquainted with the data
* Clean the data so it is ready for analysis
* Develop some questions for analysis
* Analyze variables within the data to find patterns and insights related to these questions

## Data 
The data for this project was downloaded from Kaggle:

https://www.kaggle.com/datasets/andrewmvd/violence-against-women-and-girls?resource=download

Information about each feature is available in the `Column` section of the Kaggle page.

## Loading Data

In [44]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import re
from IPython.display import display

In [46]:
violence_data = pd.read_csv('violence_data.csv')
violence_data.head(10)

|  | RecordID | Country | Gender | Demographics Question | Demographics Response | Question | Survey Year | Value |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Afghanistan | F | Marital status | Never married | ... if she burns the food | 01/01/2015 | NaN |
| 1 | 1 | Afghanistan | F | Education | Higher | ... if she burns the food | 01/01/2015 | 10.1 |
| 2 | 1 | Afghanistan | F | Education | Secondary | ... if she burns the food | 01/01/2015 | 13.7 |
| 3 | 1 | Afghanistan | F | Education | Primary | ... if she burns the food | 01/01/2015 | 13.8 |
| 4 | 1 | Afghanistan | F | Marital status | Widowed, divorced, separated | ... if she burns the food | 01/01/2015 | 13.8 |
| 5 | 1 | Afghanistan | F | Employment | Employed for kind | ... if she burns the food | 01/01/2015 | 17.0 |
| 6 | 1 | Afghanistan | F | Age | 15-24 | ... if she burns the food | 01/01/2015 | 17.3 |
| 7 | 1 | Afghanistan | F | Employment | Unemployed | ... if she burns the food | 01/01/2015 | 18.0 |
| 8 | 1 | Afghanistan | F | Residence | Rural | ... if she burns the food | 01/01/2015 | 18.1 |
| 9 | 1 | Afghanistan | F | Age | 25-34 | ... if she burns the food | 01/01/2015 | 18.2 |


In [48]:
violence_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12600 entries, 0 to 12599
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   RecordID               12600 non-null  int64  
 1   Country                12600 non-null  object 
 2   Gender                 12600 non-null  object 
 3   Demographics Question  12600 non-null  object 
 4   Demographics Response  12600 non-null  object 
 5   Question               12600 non-null  object 
 6   Survey Year            12600 non-null  object 
 7   Value                  11187 non-null  float64
dtypes: float64(1), int64(1), object(6)
memory usage: 787.6+ KB


In [50]:
violence_data.nunique()

RecordID                 420
Country                   70
Gender                     2
Demographics Question      5
Demographics Response     15
Question                   6
Survey Year               18
Value                    757
dtype: int64

In [52]:
violence_data.isnull().sum()

RecordID                    0
Country                     0
Gender                      0
Demographics Question       0
Demographics Response       0
Question                    0
Survey Year                 0
Value                    1413
dtype: int64

In [54]:
violence_data.columns

Index(['RecordID', 'Country', 'Gender', 'Demographics Question',
       'Demographics Response', 'Question', 'Survey Year', 'Value'],
      dtype='object')

In [56]:
violence_data.describe()

|  | RecordID | Value |
|---|---|---|
| count | 12600.0 | 11187.0 |
| mean | 210.5 | 19.762537 |
| std | 121.248024 | 16.986437 |
| min | 1.0 | 0.0 |
| 25% | 105.75 | 6.2 |
| 50% | 210.5 | 14.9 |
| 75% | 315.25 | 29.2 |
| max | 420.0 | 86.9 |


## Data Information 
Some immediate insights are:
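* There are 12,600 rows and 8 columns.
* Six of the eight columns hold strings (`object` dtype), including `Survey Year`, which stores dates such as `01/01/2015`; `RecordID` is `int64` and `Value` is `float64`.
* The only column with missing data is `Value` (11,187 non-null entries, i.e. 1,413 missing).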
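The facts listed above can also be collected programmatically. Below is a minimal sketch; `summarize_frame` is a hypothetical helper (not part of the original notebook), and the three-row frame only mimics the column layout of `violence_data`, reusing values from the first rows shown earlier.

```python
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> dict:
    """Collect basic facts about a DataFrame: shape, dtype counts and missing values."""
    missing = df.isnull().sum()
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        # Number of columns per dtype, e.g. {'object': 6, 'int64': 1, 'float64': 1} on pandas 2.x
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        # Keep only the columns that actually contain missing entries
        "missing_by_column": missing[missing > 0].to_dict(),
    }


# Small stand-in frame with the same columns as violence_data (illustrative only)
sample = pd.DataFrame({
    "RecordID": [1, 1, 1],
    "Country": ["Afghanistan"] * 3,
    "Gender": ["F"] * 3,
    "Demographics Question": ["Marital status", "Education", "Education"],
    "Demographics Response": ["Never married", "Higher", "Secondary"],
    "Question": ["... if she burns the food"] * 3,
    "Survey Year": ["01/01/2015"] * 3,
    "Value": [None, 10.1, 13.7],
})
result = summarize_frame(sample)
print(result)
```

On the full dataset, `summarize_frame(violence_data)` should report the same row count, column count and missing-value count as the `info()` and `isnull().sum()` cells above.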

## Data Cleaning
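As mentioned above, the `Value` column has missing data: exactly 1,413 of the 12,600 values are missing, approximately 11.2%. Although this is a sizeable share of missing data, we decided to use listwise deletion (dropping every row with a missing `Value`), because we treat these values as Missing At Random (MAR). In general, listwise deletion yields unbiased estimates only when the data are Missing Completely At Random (MCAR), so applying it to MAR data is a simplifying assumption.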
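The 11.2% figure is simply 1413 / 12600. The sketch below (a minimal illustration, not the notebook's own code) computes that share and the missing share within each group of another column; if the share differs a lot between groups, the missingness is related to that column and is not MCAR. `missing_report` is a hypothetical helper and the four-row frame is made up for illustration.

```python
import pandas as pd


def missing_report(df: pd.DataFrame, column: str, by: str) -> tuple[float, pd.Series]:
    """Return the overall share of missing values in `column`
    and the share of missing values within each group of `by`."""
    is_missing = df[column].isna()
    overall = is_missing.mean()
    # Fraction of missing values per group; large differences between groups
    # suggest the missingness depends on `by` (i.e. it is not MCAR).
    per_group = is_missing.groupby(df[by]).mean().sort_values(ascending=False)
    return overall, per_group


print(f"{1413 / 12600:.1%}")  # 11.2%, the share quoted in the text

# Hypothetical toy data: country A has more gaps than country B
sample = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Value": [None, None, 5.0, None],
})
overall, per_country = missing_report(sample, "Value", "Country")
print(overall)  # 0.75
print(per_country)

# Listwise deletion restricted to the column that actually has gaps
cleaned = sample.dropna(subset=["Value"])
```

Passing `subset=["Value"]` to `dropna` makes explicit which column drives the deletion; on this dataset it removes the same rows as a plain `dropna()`, since `Value` is the only column with missing data.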



In [60]:
violence_data.dropna(inplace=True)

In [62]:
violence_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 11187 entries, 1 to 12599
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   RecordID               11187 non-null  int64  
 1   Country                11187 non-null  object 
 2   Gender                 11187 non-null  object 
 3   Demographics Question  11187 non-null  object 
 4   Demographics Response  11187 non-null  object 
 5   Question               11187 non-null  object 
 6   Survey Year            11187 non-null  object 
 7   Value                  11187 non-null  float64
dtypes: float64(1), int64(1), object(6)
memory usage: 786.6+ KB


For this project we are interested in women's attitudes, so we keep only the answers given by female respondents.
To do that, we filter the `Gender` column for the value `'F'`.
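A minimal sketch of the same filter, written as a reusable helper. `filter_gender` is a hypothetical name, and the small frame (including the `'M'` row) is illustrative only. Using `.loc` with `.copy()` makes the result an independent DataFrame, a common pandas idiom when the filtered data will be modified later.

```python
import pandas as pd


def filter_gender(df: pd.DataFrame, gender: str = "F") -> pd.DataFrame:
    """Keep only the rows for one respondent gender.

    `.copy()` returns an independent DataFrame, so later column edits on the
    result do not raise SettingWithCopyWarning (in pandas versions without
    Copy-on-Write enabled).
    """
    return df.loc[df["Gender"] == gender].copy()


# Illustrative data only
sample = pd.DataFrame({
    "Gender": ["F", "M", "F"],
    "Value": [10.1, 7.0, 13.7],
})
female_only = filter_gender(sample)
print(female_only)
```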

In [65]:
female = violence_data[violence_data['Gender'] == 'F']
female.head(20)

|  | RecordID | Country | Gender | Demographics Question | Demographics Response | Question | Survey Year | Value |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Afghanistan | F | Education | Higher | ... if she burns the food | 01/01/2015 | 10.1 |
| 2 | 1 | Afghanistan | F | Education | Secondary | ... if she burns the food | 01/01/2015 | 13.7 |
| 3 | 1 | Afghanistan | F | Education | Primary | ... if she burns the food | 01/01/2015 | 13.8 |
| 4 | 1 | Afghanistan | F | Marital status | Widowed, divorced, separated | ... if she burns the food | 01/01/2015 | 13.8 |
| 5 | 1 | Afghanistan | F | Employment | Employed for kind | ... if she burns the food | 01/01/2015 | 17.0 |
| 6 | 1 | Afghanistan | F | Age | 15-24 | ... if she burns the food | 01/01/2015 | 17.3 |
| 7 | 1 | Afghanistan | F | Employment | Unemployed | ... if she burns the food | 01/01/2015 | 18.0 |
| 8 | 1 | Afghanistan | F | Residence | Rural | ... if she burns the food | 01/01/2015 | 18.1 |
| 9 | 1 | Afghanistan | F | Age | 25-34 | ... if she burns the food | 01/01/2015 | 18.2 |
| 10 | 1 | Afghanistan | F | Marital status | Married or living together | ... if she burns the food | 01/01/2015 | 18.3 |


Next, we split the female data by the `Demographics Question` column, creating one DataFrame per demographic group (age, education, marital status, employment and residence), so that each group can be analyzed separately.
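Instead of one variable per group (`female_age`, `female_edu`, and so on), the split can also be done in a single pass with `DataFrame.groupby`. This is a sketch of an alternative, not the approach the notebook uses; `split_by_group` is a hypothetical helper and the toy rows reuse values shown in this notebook.

```python
import pandas as pd


def split_by_group(df: pd.DataFrame, column: str) -> dict[str, pd.DataFrame]:
    """Return one independent DataFrame per distinct value of `column`."""
    return {name: group.copy() for name, group in df.groupby(column)}


# Toy rows taken from the outputs shown in this notebook
sample = pd.DataFrame({
    "Demographics Question": ["Age", "Education", "Age", "Residence"],
    "Demographics Response": ["15-24", "Higher", "25-34", "Rural"],
    "Value": [17.3, 10.1, 18.2, 18.1],
})
groups = split_by_group(sample, "Demographics Question")
print(sorted(groups))
print(groups["Age"])
```

With the notebook's data, `split_by_group(female, "Demographics Question")["Residence"]` would hold the same rows as `female_residence`. A dictionary keeps all groups together, which is convenient when the same analysis is looped over every group.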

In [68]:
female['Demographics Question'].unique()

array(['Education', 'Marital status', 'Employment', 'Age', 'Residence'],
      dtype=object)

In [72]:
female_age = female[female['Demographics Question'] == 'Age']
female_edu = female[female['Demographics Question'] == 'Education']
female_marital = female[female['Demographics Question'] == 'Marital status']
female_employ = female[female['Demographics Question'] == 'Employment']
female_residence = female[female['Demographics Question'] == 'Residence']
display(female_age.head())
display(female_edu.head())
display(female_marital.head())
display(female_employ.head())
display(female_residence.head())

|  | RecordID | Country | Gender | Demographics Question | Demographics Response | Question | Survey Year | Value |
|---|---|---|---|---|---|---|---|---|
| 6 | 1 | Afghanistan | F | Age | 15-24 | ... if she burns the food | 01/01/2015 | 17.3 |
| 9 | 1 | Afghanistan | F | Age | 25-34 | ... if she burns the food | 01/01/2015 | 18.2 |
| 12 | 1 | Afghanistan | F | Age | 35-49 | ... if she burns the food | 01/01/2015 | 18.8 |
| 30 | 351 | Afghanistan | F | Age | 15-24 | ... for at least one specific reason | 01/01/2015 | 80.1 |
| 31 | 351 | Afghanistan | F | Age | 25-34 | ... for at least one specific reason | 01/01/2015 | 81.5 |


|  | RecordID | Country | Gender | Demographics Question | Demographics Response | Question | Survey Year | Value |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Afghanistan | F | Education | Higher | ... if she burns the food | 01/01/2015 | 10.1 |
| 2 | 1 | Afghanistan | F | Education | Secondary | ... if she burns the food | 01/01/2015 | 13.7 |
| 3 | 1 | Afghanistan | F | Education | Primary | ... if she burns the food | 01/01/2015 | 13.8 |
| 13 | 1 | Afghanistan | F | Education | No education | ... if she burns the food | 01/01/2015 | 19.1 |
| 45 | 351 | Afghanistan | F | Education | Higher | ... for at least one specific reason | 01/01/2015 | 61.1 |


|  | RecordID | Country | Gender | Demographics Question | Demographics Response | Question | Survey Year | Value |
|---|---|---|---|---|---|---|---|---|
| 4 | 1 | Afghanistan | F | Marital status | Widowed, divorced, separated | ... if she burns the food | 01/01/2015 | 13.8 |
| 10 | 1 | Afghanistan | F | Marital status | Married or living together | ... if she burns the food | 01/01/2015 | 18.3 |
| 80 | 351 | Afghanistan | F | Marital status | Married or living together | ... for at least one specific reason | 01/01/2015 | 80.6 |
| 82 | 351 | Afghanistan | F | Marital status | Widowed, divorced, separated | ... for at least one specific reason | 01/01/2015 | 67.6 |
| 83 | 71 | Afghanistan | F | Marital status | Married or living together | ... if she argues with him | 01/01/2015 | 59.5 |


|  | RecordID | Country | Gender | Demographics Question | Demographics Response | Question | Survey Year | Value |
|---|---|---|---|---|---|---|---|---|
| 5 | 1 | Afghanistan | F | Employment | Employed for kind | ... if she burns the food | 01/01/2015 | 17.0 |
| 7 | 1 | Afghanistan | F | Employment | Unemployed | ... if she burns the food | 01/01/2015 | 18.0 |
| 14 | 1 | Afghanistan | F | Employment | Employed for cash | ... if she burns the food | 01/01/2015 | 20.8 |
| 65 | 351 | Afghanistan | F | Employment | Employed for cash | ... for at least one specific reason | 01/01/2015 | 80.2 |
| 66 | 351 | Afghanistan | F | Employment | Employed for kind | ... for at least one specific reason | 01/01/2015 | 86.9 |


|  | RecordID | Country | Gender | Demographics Question | Demographics Response | Question | Survey Year | Value |
|---|---|---|---|---|---|---|---|---|
| 8 | 1 | Afghanistan | F | Residence | Rural | ... if she burns the food | 01/01/2015 | 18.1 |
| 11 | 1 | Afghanistan | F | Residence | Urban | ... if she burns the food | 01/01/2015 | 18.3 |
| 95 | 351 | Afghanistan | F | Residence | Rural | ... for at least one specific reason | 01/01/2015 | 82.1 |
| 96 | 351 | Afghanistan | F | Residence | Urban | ... for at least one specific reason | 01/01/2015 | 74.0 |
| 97 | 71 | Afghanistan | F | Residence | Rural | ... if she argues with him | 01/01/2015 | 60.6 |


In [76]:
display(female_age['Demographics Response'].unique())
display(female_edu['Demographics Response'].unique())
display(female_marital['Demographics Response'].unique())
display(female_employ['Demographics Response'].unique())
display(female_residence['Demographics Response'].unique())

array(['15-24', '25-34', '35-49'], dtype=object)

array(['Higher', 'Secondary', 'Primary', 'No education'], dtype=object)

array(['Widowed, divorced, separated', 'Married or living together',
       'Never married'], dtype=object)

array(['Employed for kind', 'Unemployed', 'Employed for cash'],
      dtype=object)

array(['Rural', 'Urban'], dtype=object)

### For this project, we decided to work with the `female_residence` DataFrame, to compare the attitudes of women living in rural and urban areas across different regions of the world.
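One way to start the rural/urban comparison is to pivot the residence rows so that the Rural and Urban values for each country and survey question sit side by side, then take their difference. This is a minimal sketch, assuming each (Country, Question, Demographics Response) combination has a single `Value`; `rural_urban_gap` is a hypothetical helper, and the sample rows are the Afghanistan residence values shown above.

```python
import pandas as pd


def rural_urban_gap(df: pd.DataFrame) -> pd.DataFrame:
    """Place Rural and Urban values side by side for each country/question
    and add their difference (Rural minus Urban)."""
    wide = df.pivot_table(
        index=["Country", "Question"],
        columns="Demographics Response",
        values="Value",
        aggfunc="mean",  # assumes one value per cell; mean only matters if duplicates exist
    )
    wide["Rural - Urban"] = wide["Rural"] - wide["Urban"]
    return wide


# Afghanistan residence rows copied from the output above
sample = pd.DataFrame({
    "Country": ["Afghanistan"] * 4,
    "Question": ["... if she burns the food"] * 2 + ["... for at least one specific reason"] * 2,
    "Demographics Response": ["Rural", "Urban", "Rural", "Urban"],
    "Value": [18.1, 18.3, 82.1, 74.0],
})
gap = rural_urban_gap(sample)
print(gap)
```

On the notebook's data this would be called as `rural_urban_gap(female_residence)`. A positive `Rural - Urban` entry means a higher `Value` among rural respondents; if a country is missing one of the two groups for a question, its difference is `NaN`.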