### Loading and inspecting the data

In [1]:
#Import the pandas library as pd
import pandas as pd

In [2]:
#Load our csv file
filepath = "DataFiles/violence_data.csv"
df = pd.read_csv(filepath)

In [3]:
#Display the first 5 rows of our data set
df.head()

Unnamed: 0,RecordID,Country,Gender,Demographics Question,Demographics Response,Question,Survey Year,Value
0,1,Afghanistan,F,Marital status,Never married,... if she burns the food,01/01/2015,
1,1,Afghanistan,F,Education,Higher,... if she burns the food,01/01/2015,10.1
2,1,Afghanistan,F,Education,Secondary,... if she burns the food,01/01/2015,13.7
3,1,Afghanistan,F,Education,Primary,... if she burns the food,01/01/2015,13.8
4,1,Afghanistan,F,Marital status,"Widowed, divorced, separated",... if she burns the food,01/01/2015,13.8


In [4]:
#Checking the number of rows and columns in our dataset
df.shape

(12600, 8)

In [5]:
#Checking the columns we have to determine which ones to keep and which ones to remove
df.columns

Index(['RecordID', 'Country', 'Gender', 'Demographics Question',
       'Demographics Response', 'Question', 'Survey Year', 'Value'],
      dtype='object')

### Cleaning and preparing the data

In [117]:
#Check for null values
df.isnull().any()

RecordID                 False
Country                  False
Gender                   False
Demographics Question    False
Demographics Response    False
Question                 False
dtype: bool

#### Our dataset contains no null values, so we proceed to check for duplicate values
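The output above lists only six columns, which suggests the check ran after Survey Year and Value had been dropped. The first row of df.head() shows an empty Value, so that column likely held missing entries. A minimal sketch, on a hypothetical toy frame (the name `toy` and its values are illustrative, not taken from the dataset), of counting missing values per column instead of only flagging them:

```python
import pandas as pd

# Hypothetical toy frame mimicking the survey layout; values are illustrative only.
toy = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Kenya"],
    "Gender": ["F", "F", "M"],
    "Value": [None, 10.1, 13.7],
})

# Number of missing values in each column.
null_counts = toy.isnull().sum()
print(null_counts)

# Fraction of missing values in each column (between 0 and 1).
null_share = toy.isnull().mean()
print(null_share)
```

Running such a count before dropping any column shows how much data each column is actually missing.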

In [122]:
#Check for duplicate rows
df.duplicated().any()

False

#### Our dataset contains no duplicates either
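When a duplicate check does come back True, it helps to know how many rows are affected and which ones. A small sketch on a hypothetical toy frame (the names `toy`, `n_dupes` and `all_copies` are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical toy frame: the first two rows are exact copies of each other.
toy = pd.DataFrame({
    "Country": ["Kenya", "Kenya", "Kenya"],
    "Gender": ["F", "F", "M"],
    "Question": ["... if she burns the food"] * 3,
})

# Rows that repeat an earlier row (the first occurrence is not counted).
n_dupes = toy.duplicated().sum()
print(n_dupes)

# keep=False marks every member of a duplicated group, so all copies can be inspected.
all_copies = toy[toy.duplicated(keep=False)]
print(all_copies)
```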

In [109]:
#Drop the Value column; errors="ignore" keeps the cell safe to re-run once the column is gone
df = df.drop("Value", axis=1, errors="ignore")

In [9]:
#Drop the Survey Year column
df = df.drop("Survey Year", axis=1)
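Notebook cells are often re-run out of order, and calling drop on a column that has already been removed raises a KeyError. A minimal sketch, on a hypothetical toy frame, of dropping both columns in one call that can be repeated safely (`toy`, `cols_to_drop` and `trimmed` are illustrative names):

```python
import pandas as pd

# Hypothetical toy frame with the two columns the notebook removes.
toy = pd.DataFrame({
    "RecordID": [1, 1],
    "Country": ["Afghanistan", "Afghanistan"],
    "Survey Year": ["01/01/2015", "01/01/2015"],
    "Value": [None, 10.1],
})

cols_to_drop = ["Value", "Survey Year"]

# errors="ignore" skips labels that are not present instead of raising KeyError.
trimmed = toy.drop(columns=cols_to_drop, errors="ignore")

# A second run is a no-op because the columns are already gone.
trimmed_again = trimmed.drop(columns=cols_to_drop, errors="ignore")
print(list(trimmed_again.columns))
```

The trade-off is that errors="ignore" also hides misspelled column names, so it is worth checking the resulting columns afterwards.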

In [10]:
df

Unnamed: 0,RecordID,Country,Gender,Demographics Question,Demographics Response,Question
0,1,Afghanistan,F,Marital status,Never married,... if she burns the food
1,1,Afghanistan,F,Education,Higher,... if she burns the food
2,1,Afghanistan,F,Education,Secondary,... if she burns the food
3,1,Afghanistan,F,Education,Primary,... if she burns the food
4,1,Afghanistan,F,Marital status,"Widowed, divorced, separated",... if she burns the food
...,...,...,...,...,...,...
12595,210,Zimbabwe,M,Residence,Urban,... if she goes out without telling him
12596,280,Zimbabwe,M,Residence,Rural,... if she neglects the children
12597,280,Zimbabwe,M,Residence,Urban,... if she neglects the children
12598,350,Zimbabwe,M,Residence,Rural,... if she refuses to have sex with him


### Data manipulation

In [126]:
#Check the counts of females and males
df["Gender"].value_counts()

Gender
F    6300
M    6300
Name: count, dtype: int64

In [128]:
#Number of records per country
df["Country"].value_counts()

Country
Afghanistan    180
Mozambique     180
Nigeria        180
Niger          180
Nicaragua      180
              ... 
Haiti          180
Honduras       180
India          180
Indonesia      180
Zimbabwe       180
Name: count, Length: 70, dtype: int64

In [143]:
#Total number of countries in the dataset
df["Country"].nunique()

70
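The value_counts output above is truncated, so it does not show every country. The totals are at least consistent with a uniform layout: 70 countries × 180 rows = 12600, which matches df.shape. A sketch of checking this directly, using a hypothetical toy frame with three countries of three rows each:

```python
import pandas as pd

# Hypothetical toy frame: three countries with three rows each.
toy = pd.DataFrame({"Country": ["Kenya"] * 3 + ["Niger"] * 3 + ["Haiti"] * 3})

per_country = toy["Country"].value_counts()
n_countries = toy["Country"].nunique()

# True when every country has the same number of rows.
same_size = per_country.nunique() == 1

# True when countries x rows-per-country accounts for every row in the frame.
total_matches = n_countries * per_country.iloc[0] == len(toy)

print(same_size, total_matches)
```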

In [145]:
#Filter the data for Kenya
Kenya = df[df["Country"] == "Kenya"]

In [147]:
#Inspect the data
Kenya["Gender"].value_counts()

Gender
F    90
M    90
Name: count, dtype: int64

In [66]:
#Filter based on Gender where Gender is Female
KenyaFemales = Kenya[Kenya["Gender"] == "F"]
KenyaFemales

Unnamed: 0,RecordID,Country,Gender,Demographics Question,Demographics Response,Question
6120,385,Kenya,F,Age,15-24,... for at least one specific reason
6121,385,Kenya,F,Age,25-34,... for at least one specific reason
6122,385,Kenya,F,Age,35-49,... for at least one specific reason
6123,105,Kenya,F,Age,15-24,... if she argues with him
6124,105,Kenya,F,Age,25-34,... if she argues with him
...,...,...,...,...,...,...
6205,175,Kenya,F,Residence,Urban,... if she goes out without telling him
6206,245,Kenya,F,Residence,Rural,... if she neglects the children
6207,245,Kenya,F,Residence,Urban,... if she neglects the children
6208,315,Kenya,F,Residence,Rural,... if she refuses to have sex with him


In [70]:
KenyaFemales["Demographics Question"].value_counts()

Demographics Question
Education         24
Age               18
Employment        18
Marital status    18
Residence         12
Name: count, dtype: int64

The KenyaFemales subset has a "Demographics Question" column whose categories appear with the following counts:

- Education: 24 rows
- Age: 18 rows
- Employment: 18 rows
- Marital status: 18 rows
- Residence: 12 rows

Education has the most rows and Residence the fewest, while Age, Employment, and Marital status are tied. Every count is a multiple of 6, the number of distinct survey questions in this dataset (24 = 4 × 6, 18 = 3 × 6, 12 = 2 × 6). The counts therefore appear to reflect how many response options each demographic question has, not how respondents answered.
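One way to check whether row counts simply follow the survey layout is to compare them with the number of distinct responses per demographic question multiplied by the number of survey questions. A minimal sketch on a hypothetical toy frame (two survey questions, two demographic questions; all names and values are illustrative):

```python
import pandas as pd

# Hypothetical survey layout: every response level is recorded once per question.
questions = ["... if she burns the food", "... if she argues with him"]
responses = {
    "Education": ["Higher", "Primary", "Secondary"],
    "Residence": ["Rural", "Urban"],
}
rows = [
    {"Demographics Question": dq, "Demographics Response": dr, "Question": q}
    for q in questions
    for dq, levels in responses.items()
    for dr in levels
]
toy = pd.DataFrame(rows)

# Observed rows per demographic question.
row_counts = toy["Demographics Question"].value_counts()

# Distinct response levels per demographic question.
levels_per_question = toy.groupby("Demographics Question")["Demographics Response"].nunique()

# Expected rows if each level appears exactly once per survey question.
n_questions = toy["Question"].nunique()
expected = levels_per_question * n_questions

print(row_counts)
print(expected)
```

If the observed and expected counts agree on the real data, the value counts carry information about the questionnaire design only.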

In [73]:
KenyaFemales["Question"].value_counts()

Question
... for at least one specific reason       15
... if she argues with him                 15
... if she burns the food                  15
... if she goes out without telling him    15
... if she neglects the children           15
... if she refuses to have sex with him    15
Name: count, dtype: int64

In [151]:
#Filter based on Gender where Gender is Male
KenyaMales = Kenya[Kenya["Gender"]=="M"]
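The two-step filtering used for KenyaFemales and KenyaMales can also be written with a single boolean mask, or with a groupby that splits the country subset by gender in one pass. A sketch on a hypothetical toy frame (`toy`, `kenya_females` and `by_gender` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Hypothetical toy frame with two countries and both genders.
toy = pd.DataFrame({
    "Country": ["Kenya", "Kenya", "Kenya", "Zimbabwe"],
    "Gender": ["F", "M", "F", "F"],
    "Question": ["... if she burns the food"] * 4,
})

# Combine both conditions with &; each condition needs its own parentheses.
mask = (toy["Country"] == "Kenya") & (toy["Gender"] == "F")

# .copy() gives an independent frame, avoiding SettingWithCopyWarning on later edits.
kenya_females = toy[mask].copy()

# Alternative: split the Kenya subset into one frame per gender.
kenya = toy[toy["Country"] == "Kenya"]
by_gender = {gender: group for gender, group in kenya.groupby("Gender")}

print(len(kenya_females), sorted(by_gender))
```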

In [157]:
KenyaMales

Unnamed: 0,RecordID,Country,Gender,Demographics Question,Demographics Response,Question
6210,385,Kenya,M,Age,15-24,... for at least one specific reason
6211,385,Kenya,M,Age,25-34,... for at least one specific reason
6212,385,Kenya,M,Age,35-49,... for at least one specific reason
6213,105,Kenya,M,Age,15-24,... if she argues with him
6214,105,Kenya,M,Age,25-34,... if she argues with him
...,...,...,...,...,...,...
6295,175,Kenya,M,Residence,Urban,... if she goes out without telling him
6296,245,Kenya,M,Residence,Rural,... if she neglects the children
6297,245,Kenya,M,Residence,Urban,... if she neglects the children
6298,315,Kenya,M,Residence,Rural,... if she refuses to have sex with him


In [153]:
KenyaMales["Demographics Question"].value_counts()

Demographics Question
Education         24
Age               18
Employment        18
Marital status    18
Residence         12
Name: count, dtype: int64

In [155]:
KenyaMales["Question"].value_counts()

Question
... for at least one specific reason       15
... if she argues with him                 15
... if she burns the food                  15
... if she goes out without telling him    15
... if she neglects the children           15
... if she refuses to have sex with him    15
Name: count, dtype: int64

### Summary
The value counts are identical for females and males, so the dataset appears to have the same structure for both groups. Comparative analysis based on these counts alone may therefore be difficult.
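The side-by-side comparison of counts can be done in one step with pd.crosstab. The sketch below uses a hypothetical toy frame and only illustrates the technique. The counts describe the survey layout, so comparing attitudes would presumably need a numeric measure such as the Value column that was dropped earlier.

```python
import pandas as pd

# Hypothetical toy frame with the same layout for both genders.
toy = pd.DataFrame({
    "Gender": ["F", "F", "F", "M", "M", "M"],
    "Demographics Question": ["Age", "Age", "Residence", "Age", "Age", "Residence"],
})

# Rows: demographic question; columns: gender; cells: number of records.
table = pd.crosstab(toy["Demographics Question"], toy["Gender"])
print(table)

# True when both genders have exactly the same count in every category.
identical = (table["F"] == table["M"]).all()
print(identical)
```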