### **PREDICTION OF JUSTIFICATION FOR VIOLENCE AGAINST WOMEN**

### ***LIFE CYCLE OF MACHINE LEARNING PROJECT***
- Understanding Problem Statement
- Data Collection
- Data checks to perform
- Exploratory Data Analysis
- Data pre-processing
- Model training
- Choose the best model

### **1. Problem Statement**
This project predicts public justification of violence against women based on demographic and cultural factors using machine learning.
The study identifies the most influential demographic factors, evaluates multiple regression models, and enhances prediction accuracy through model tuning.

### **2. Data Collection**

- Source : https://www.kaggle.com/code/gpreda/violence-against-women-and-girls
- The dataset has 8 columns and 12600 rows

### 2.1 Import Data and Required Packages

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


### Import CSV Data as Python DataFrame

In [None]:
df=pd.read_csv("data/violence_data.csv")
df.head()

### Shape Of Dataset

In [None]:
df.shape

#### 2.2 Dataset Information
There are 5 demographic questions:

- Age (15-24, 25-34, 35-49)
- Education (No Education, Primary, Secondary, Higher)
- Employment (Unemployed, Employed for cash, Employed for kind)
- Marital status (Married or living together, Widowed, divorced, separated, Never married)
- Residence (Rural, Urban)

### **3. Data Checks To Perform**
- Check missing value
- Check duplicates
- Check Data Type
- Check Number of Unique Values for each column
- Check statistics of dataset
- Check various categories present in different categorical columns
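
The checks listed above can also be gathered into a single helper so they are easy to rerun after each cleaning step. The following is a minimal sketch on a made-up toy DataFrame; the function name `run_data_checks` and the toy values are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def run_data_checks(frame: pd.DataFrame) -> dict:
    """Collect basic data-quality checks for a DataFrame (hypothetical helper)."""
    return {
        "missing_per_column": frame.isnull().sum().to_dict(),
        "duplicate_rows": int(frame.duplicated().sum()),
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "unique_per_column": frame.nunique().to_dict(),
    }


# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {
        "Gender": ["F", "M", "F", "F"],
        "Value": [10.0, np.nan, 10.0, 25.5],
    }
)

report = run_data_checks(toy)
for name, result in report.items():
    print(f"{name}: {result}")
```

Calling `run_data_checks(df)` on the real data would return the same four summaries that sections 3.1 to 3.4 compute one cell at a time.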

### 3.1 Check Missing Value

In [None]:
df.isnull().sum()

#### Insight
- Only the Value column has missing values; every other column is complete.

#### Filling the missing values with the median of the rest of the values

In [None]:
df['Value']=df['Value'].fillna(df['Value'].median())
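
To make the effect of the median fill concrete, here is a small sketch on invented toy values. It also shows a per-group median fill as one possible alternative; that variant is an assumption added for comparison and is not what this notebook uses.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {
        "Demographics Question": ["Age", "Age", "Age", "Residence", "Residence", "Residence"],
        "Value": [10.0, 20.0, np.nan, 40.0, np.nan, 60.0],
    }
)

# Global median fill, as in the cell above: Series.median() skips NaN by default.
global_fill = toy["Value"].fillna(toy["Value"].median())

# Alternative (an assumption, not the notebook's method): use each group's own median.
group_median = toy.groupby("Demographics Question")["Value"].transform("median")
group_fill = toy["Value"].fillna(group_median)

print(pd.DataFrame({"global": global_fill, "per_group": group_fill}))
```

The global fill pulls every missing entry toward the overall centre of the data, while the per-group variant keeps each imputed value closer to its own demographic question. Which one is preferable depends on how different the groups are.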

### 3.2 Check Duplicates


In [None]:
df.duplicated().sum()

#### Insight
- There are no duplicate rows in the dataset.


### 3.3 Check Data Types

In [None]:
df.info()

### 3.4 Checking the number of unique values of each column

In [None]:
df.nunique()

### 3.5 Statistics of data

In [None]:
df.describe()

### 3.6 Exploring Data


#### Violence justification distribution per demographic group


In [None]:
demographics_df = df.groupby(["Demographics Question", "Demographics Response"])["Value"].agg(["median", "max", "min", "mean"]).reset_index()
demographics_df.columns = ["Question", "Response", "Median", "Max", "Min", "Mean"]
print("Violence % median, min, max, and mean per demographic group")
demographics_df.sort_values(["Question", "Median"])

#### Insights 
- The 15-24 age group has the highest median among age groups (14.90), but the absolute maximum (81.5) occurs in the 25-34 age group.
- The education level of girls and women is a good predictor of justification for violence against women: the higher the education, the lower the justification.
- Employment also matters: from unemployed to employed for kind, the median ranges from 14.90 to 16.60.
- Married individuals tend to normalize or accept the violence, possibly due to societal conditioning.
- Urban residents tend to justify less, possibly due to awareness, education and gender equality exposure.


In [None]:
question_df = df.groupby(["Question"])["Value"].agg(["median", "max", "min", "mean"]).reset_index()
question_df.columns = ["Question", "Median", "Max", "Min", "Mean"]
print("Violence % median, min, max, and mean per question asked")
question_df.sort_values(["Median"])


#### Insights
- Very few respondents justify violence for burning food.
- Justification is slightly higher for refusing sex, which reflects patriarchal norms.
- Justification increases for going out without telling her husband, which reflects a controlling attitude toward women's autonomy.
- Arguing follows a similar trend, with violence seen as a response to "disobedience".
- Violence is also seen as justified for neglecting the children, which reflects traditional gender roles.
- Overall, about one-third of respondents justify violence in at least one situation, which is a major social concern.

In [None]:
# Define numerical features and categorical features
numerical_features=[feat for feat in df.dtypes[df.dtypes!='object'].index]

categorical_features=[feat for feat in df.dtypes[df.dtypes=='object'].index]

print(f"Numerical features : {numerical_features}")
print(f"Categorical features  : {categorical_features}")
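
An equivalent way to split the columns is pandas' `select_dtypes`, which avoids comparing dtypes by hand. The sketch below runs on a made-up toy DataFrame that only borrows three column names from this notebook; the values are invented.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {
        "Gender": ["F", "M"],
        "Question": ["... if she burns the food", "... if she argues with him"],
        "Value": [12.5, 30.1],
    }
)

# Numeric columns versus everything else.
numerical_features = toy.select_dtypes(include="number").columns.tolist()
categorical_features = toy.select_dtypes(exclude="number").columns.tolist()

print(f"Numerical features : {numerical_features}")
print(f"Categorical features : {categorical_features}")
```

Selecting on `"number"` rather than on `'object'` keeps the split stable even if pandas stores text columns under a dedicated string dtype.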

#### 3.7 Dropping Irrelevant Columns 

In [None]:
df=df.drop(columns=['RecordID','Survey Year'])

In [None]:
df.head()

### **4. Visualization**


In [None]:
plt.figure(figsize=(8,6))
sns.histplot(data=df,x='Value',kde=True,bins=30,hue='Demographics Question')

#### Insights
- Most values cluster around 20%, meaning that justification is generally low across all demographics.
- Education shows a first peak at 0% and a second around 15%, which suggests that more educated respondents justify violence less, while less educated respondents justify it more.
- Residence has a smoother, broader distribution extending further right, suggesting urban–rural differences where one group may justify violence more often.
- Age, Employment and Marital status have similar curves centred around 10%, indicating that these factors influence justification less than residence and education do.
- The thin tail toward the higher values represents outliers where justification is unusually high.

#### 4.1 Justification Level By Demographics

#### 4.1.1 By Education Level

In [None]:
edu_df=df[df['Demographics Question']=='Education']
edu_df

In [None]:
plt.figure(figsize=(6,5))
sns.boxplot(data=edu_df, x='Demographics Response', y='Value', palette='viridis')
plt.title('Justification of Violence by Education Level')
plt.xlabel('Education Level')
plt.ylabel('Justification Value')
plt.xticks(rotation=45)
plt.show()

#### Insight
- Justification decreases as the education level increases.
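
The trend read off the boxplot can also be checked numerically by ordering the education levels and comparing group medians. This is a minimal sketch on invented values; the level labels follow the list in section 2.2, and their exact spelling in the CSV is an assumption.

```python
import pandas as pd

# Toy values, invented for illustration only.
toy = pd.DataFrame(
    {
        "Demographics Response": [
            "Higher", "No Education", "Secondary", "Primary",
            "Higher", "No Education", "Secondary", "Primary",
        ],
        "Value": [5.0, 40.0, 15.0, 30.0, 7.0, 44.0, 19.0, 26.0],
    }
)

# Assumed ordering from lowest to highest level; adjust labels to match the dataset.
education_order = ["No Education", "Primary", "Secondary", "Higher"]
toy["Demographics Response"] = pd.Categorical(
    toy["Demographics Response"], categories=education_order, ordered=True
)

# Grouping on an ordered categorical returns the medians in level order.
medians = toy.groupby("Demographics Response", observed=True)["Value"].median()
print(medians)

# True when no step up in education raises the median.
print(medians.is_monotonic_decreasing)
```

Applied to `edu_df`, the same two steps would confirm or refute the insight above without relying on visual inspection.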

#### 4.1.2 By Age

In [None]:
age_df=df[df['Demographics Question']=='Age']
age_df

In [None]:
plt.figure(figsize=(6,5))
sns.boxplot(data=age_df, x='Demographics Response', y='Value', palette='viridis')
plt.title('Justification of Violence by Age')
plt.xlabel('Age')
plt.ylabel('Justification Value')
plt.xticks(rotation=45)
plt.show()


#### Insight
- Justification is fairly consistent across all age groups, with only a slight increase in the 15-24 age group.

#### 4.1.3 By Marital Status

In [None]:
mar_df=df[df['Demographics Question']=='Marital status']


In [None]:
plt.figure(figsize=(10,5))
sns.boxplot(data=mar_df, x='Demographics Response', y='Value', palette='viridis')
plt.title('Justification of Violence by Marital Status')
plt.xlabel('Marital Status')
plt.ylabel('Justification Value')
plt.xticks(rotation=45)
plt.show()


#### Insight
- The justification level is consistent across marital statuses, with a slight increase among respondents who are married or living together and among those who are widowed, divorced, or separated.

#### 4.1.4 By Employment

In [None]:
emp_df = df[df['Demographics Question'] == 'Employment']
plt.figure(figsize=(8,5))
sns.boxplot(data=emp_df, x='Demographics Response', y='Value', palette='coolwarm')
plt.title('Justification of Violence by Employment Status')
plt.xlabel('Employment Status')
plt.ylabel('Justification Value')
plt.xticks(rotation=45)
plt.show()


#### Insight
- Respondents who are employed for kind tend to justify violence more than the other employment groups.

#### 4.1.5 By Residence

In [None]:
res_df = df[df['Demographics Question'] == 'Residence']
plt.figure(figsize=(8,5))
sns.boxplot(data=res_df, x='Demographics Response', y='Value', palette='cubehelix')
plt.title('Justification of Violence by Residence Type')
plt.xlabel('Residence Type')
plt.ylabel('Justification Value')
plt.xticks(rotation=45)
plt.show()


#### Insight
- There are potential urban-rural differences in the justification pattern.
- Rural residents tend to justify violence more, possibly due to less awareness.



#### 4.2 Justification level based on Gender

In [None]:
plt.figure(figsize=(8,5))
sns.boxplot(x='Gender',y='Value',data=df,palette='Set2')
plt.title('Justification of Violence by Gender')
plt.ylabel('Justification Value')
plt.xlabel('Gender')
plt.show()

#### Insight
- Women tend to justify violence against women more than men do, which suggests internalized patriarchal norms.
- This suggests that over time, cultural conditioning, lack of empowerment, and societal acceptance of male authority may lead women to perceive violence as a "normal" part of marital life.

In [None]:
df.columns=df.columns.str.strip().str.replace(" ","_")

In [None]:
df.to_csv("data/cleaned_violence_data.csv",index=False)
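
For reference, the renaming step before the export can be previewed on the column names alone. The sketch below assumes the five columns referenced in this notebook after `RecordID` and `Survey Year` are dropped; the trailing space in `"Value "` is a made-up example of stray whitespace.

```python
import pandas as pd

# Assumed column names; "Value " carries an invented trailing space.
columns = pd.Index(
    ["Demographics Question", "Demographics Response", "Question", "Gender", "Value "]
)

# Same chain as above: trim outer whitespace, then replace inner spaces with underscores.
cleaned = columns.str.strip().str.replace(" ", "_")
print(list(cleaned))
```

Any code that later reads `cleaned_violence_data.csv` would need to use the underscore names, for example `Demographics_Question` instead of `Demographics Question`.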