<a href="https://www.kaggle.com/code/pratul007/fatalities-in-the-israeli-palestinian?scriptVersionId=147951107" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Libraries

In [1]:
import pandas as pd
import folium
import numpy as np
import plotly.express as px

# Dataset Import

In [2]:
# Load the CSV into a dataframe
data_df = pd.read_csv('/kaggle/input/fatalities-in-the-israeli-palestinian/fatalities_isr_pse_conflict_2000_to_2023.csv')

# Display the first few rows to understand the structure
data_df.head()

Unnamed: 0,name,date_of_event,age,citizenship,event_location,event_location_district,event_location_region,date_of_death,gender,took_part_in_the_hostilities,place_of_residence,place_of_residence_district,type_of_injury,ammunition,killed_by,notes
0,'Abd a-Rahman Suleiman Muhammad Abu Daghash,2023-09-24,32.0,Palestinian,Nur Shams R.C.,Tulkarm,West Bank,2023-09-24,M,,Nur Shams R.C.,Tulkarm,gunfire,live ammunition,Israeli security forces,Fatally shot by Israeli forces while standing ...
1,Usayed Farhan Muhammad 'Ali Abu 'Ali,2023-09-24,21.0,Palestinian,Nur Shams R.C.,Tulkarm,West Bank,2023-09-24,M,,Nur Shams R.C.,Tulkarm,gunfire,live ammunition,Israeli security forces,Fatally shot by Israeli forces while trying to...
2,'Abdallah 'Imad Sa'ed Abu Hassan,2023-09-22,16.0,Palestinian,Kfar Dan,Jenin,West Bank,2023-09-22,M,,al-Yamun,Jenin,gunfire,live ammunition,Israeli security forces,Fatally shot by soldiers while firing at them ...
3,Durgham Muhammad Yihya al-Akhras,2023-09-20,19.0,Palestinian,'Aqbat Jaber R.C.,Jericho,West Bank,2023-09-20,M,,'Aqbat Jaber R.C.,Jericho,gunfire,live ammunition,Israeli security forces,Shot in the head by Israeli forces while throw...
4,Raafat 'Omar Ahmad Khamaisah,2023-09-19,15.0,Palestinian,Jenin R.C.,Jenin,West Bank,2023-09-19,M,,Jenin,Jenin,gunfire,live ammunition,Israeli security forces,Wounded by soldiers’ gunfire after running awa...


# Fatality Trends from 2000 to 2023

In [3]:
# Convert date columns to datetime format
data_df['date_of_event'] = pd.to_datetime(data_df['date_of_event'])
data_df['date_of_death'] = pd.to_datetime(data_df['date_of_death'])

# Group by year and count the fatalities
fatality_by_year = data_df.groupby(data_df['date_of_event'].dt.year).size().reset_index(name='fatalities')

# Plot the trends in fatalities over time using Plotly
fig = px.line(fatality_by_year, x='date_of_event', y='fatalities', 
              title='Fatality Trends from 2000 to 2023',
              labels={'date_of_event': 'Year', 'fatalities': 'Number of Fatalities'},
              markers=True)

# Show the plot
fig.show()


**The graph above depicts the trends in fatalities from the year 2000 to 2023:**

* There was a significant spike in fatalities around the year 2002.
* A decline in fatalities was observed after 2002, reaching a low point around 2005.
* From 2005 to 2008, there was another increase in fatalities.
* After 2008, the fatalities decreased and remained relatively stable until around 2018.
* From 2018 to 2023, we see a gradual increase in the number of fatalities.
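The peaks and declines listed above are read off the chart by eye. As a rough cross-check, the same yearly counts can be inspected numerically. The sketch below is only an illustration: `toy_df` and its dates are made up and stand in for `data_df`, and the names `yearly`, `peak_year`, and `yoy_change` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for data_df; the dates are illustrative only.
toy_df = pd.DataFrame({
    'date_of_event': pd.to_datetime([
        '2001-03-01', '2002-04-10', '2002-06-15', '2002-11-30',
        '2005-01-20', '2008-12-28', '2008-12-29',
    ])
})

# Same grouping idea as in the cell above: one count per calendar year.
yearly = (toy_df.groupby(toy_df['date_of_event'].dt.year)
                .size()
                .rename('fatalities'))

# Fill in years with no records so that differences compare consecutive years.
all_years = range(yearly.index.min(), yearly.index.max() + 1)
yearly = yearly.reindex(all_years, fill_value=0)

# Year with the highest count, and the change from each year to the next.
peak_year = int(yearly.idxmax())
yoy_change = yearly.diff()

print(peak_year)
print(yoy_change)
```

Reindexing over the full year range matters here: without it, `diff()` would compare neighbouring rows rather than neighbouring years whenever a year has no records.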

# Age Distribution of Fatalities

In [4]:
# Creating a histogram for age distribution using Plotly
fig = px.histogram(data_df, x='age', nbins=30, title='Age Distribution of Fatalities',
                   labels={'age': 'Age'}, opacity=0.75)

# Updating the layout
fig.update_layout(xaxis_title='Age', yaxis_title='Number of Fatalities',
                  bargap=0.1, bargroupgap=0.1)

# Show the plot
fig.show()

**The histogram above showcases the age distribution of the fatalities:**

* Fatalities are most concentrated among people in their late teens and early twenties.
* The number of fatalities declines steadily as age increases, with far fewer among older individuals.
* There is also a smaller peak among young children.

# Gender Distribution of Fatalities

In [5]:
# Gender Distribution
gender_distribution = data_df['gender'].value_counts()

# Resetting index to get 'gender' as a column
gender_df = gender_distribution.reset_index()
gender_df.columns = ['Gender', 'Number of Fatalities']

# Creating a bar chart for gender distribution using Plotly
fig = px.bar(gender_df, x='Gender', y='Number of Fatalities',
             title='Gender Distribution of Fatalities',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale=['lightcoral', 'lightskyblue'])

# Updating the layout
fig.update_layout(xaxis_title='Gender', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array'), bargap=0.2)

# Show the plot
fig.show()

**The bar chart illustrates the gender distribution of fatalities:**

* A majority of the fatalities were male.
* The number of female fatalities is significantly lower in comparison.

# Citizenship Distribution of Fatalities

In [6]:
# Calculate citizenship distribution
citizenship_distribution = data_df['citizenship'].value_counts().reset_index()
citizenship_distribution.columns = ['Citizenship', 'Number of Fatalities']

# Create a bar chart using Plotly
fig = px.bar(citizenship_distribution, x='Citizenship', y='Number of Fatalities',
             title='Citizenship Distribution of Fatalities',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale='Viridis')

# Update the layout
fig.update_layout(xaxis_title='Citizenship', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array', tickangle=45), bargap=0.2)

# Show the plot
fig.show()

**The bar chart above provides insights into the citizenship distribution of fatalities:**

* A vast majority of the fatalities were Palestinians.

# Distribution of Fatalities by Region

In [7]:
# Distribution by Region
region_distribution = data_df['event_location_region'].value_counts()

# Resetting index to get 'event_location_region' as a column
region_df = region_distribution.reset_index()
region_df.columns = ['Region', 'Number of Fatalities']

# Creating a bar chart for region distribution using Plotly
fig = px.bar(region_df, x='Region', y='Number of Fatalities',
             title='Distribution of Fatalities by Region',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale='Teal')

# Updating the layout
fig.update_layout(xaxis_title='Region', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array', tickangle=45), bargap=0.2)

# Show the plot
fig.show()


**The bar chart showcases the distribution of fatalities by region:**

* The Gaza Strip has the highest number of fatalities, followed by the West Bank.
* Other regions have considerably fewer fatalities in comparison.

# Distribution of Fatalities by District

In [8]:
# Distribution by District
district_distribution = data_df['event_location_district'].value_counts()

# Resetting index to get 'event_location_district' as a column
district_df = district_distribution.reset_index()
district_df.columns = ['District', 'Number of Fatalities']

# Creating a bar chart for district distribution using Plotly
fig = px.bar(district_df, x='District', y='Number of Fatalities',
             title='Distribution of Fatalities by District',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale='Viridis')

# Updating the layout
fig.update_layout(xaxis_title='District', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array', tickangle=45), bargap=0.2)

# Show the plot
fig.show()


**The bar chart presents the distribution of fatalities by district:**

* The districts of Gaza, Hebron, and Jenin have the highest number of fatalities.
* Other districts also have significant numbers, with Nablus, Ramallah, and Bethlehem following closely.
* Some districts, like Jericho, have fewer fatalities compared to others.

# Distribution Based on Participation in Hostilities

In [9]:
# Participation Distribution
participation_distribution = data_df['took_part_in_the_hostilities'].value_counts()

# Resetting index to get 'took_part_in_the_hostilities' as a column
participation_df = participation_distribution.reset_index()
participation_df.columns = ['Took Part in the Hostilities', 'Number of Fatalities']

# Creating a bar chart for participation distribution using Plotly
fig = px.bar(participation_df, x='Took Part in the Hostilities', y='Number of Fatalities',
             title='Distribution Based on Participation in Hostilities',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale='sunset')

# Updating the layout
fig.update_layout(xaxis_title='Took Part in the Hostilities', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array'), bargap=0.2)

# Show the plot
fig.show()


**The bar chart displays the distribution based on whether individuals took part in hostilities:**

* A significant number of fatalities involved individuals who did not take part in the hostilities.
* The number of fatalities among those who did participate in hostilities is lower.
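One caveat when reading this chart: `value_counts()` drops missing values by default, and the first rows of the dataset show `took_part_in_the_hostilities` as empty. The sketch below shows, on made-up values, how to count the missing entries explicitly. The category labels in `toy` are hypothetical and may not match the labels used in the real column.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the real column may use different labels.
toy = pd.Series(
    ['No', 'Yes', np.nan, 'No', np.nan, 'Unknown'],
    name='took_part_in_the_hostilities',
)

# Default behaviour: rows with missing values are silently excluded.
counts_default = toy.value_counts()

# dropna=False keeps missing values as their own category.
counts_with_missing = toy.value_counts(dropna=False)

# Direct count of missing entries.
n_missing = int(toy.isna().sum())

print(counts_default)
print(counts_with_missing)
print(n_missing)
```

If the share of missing entries is large, comparisons between the remaining categories should be interpreted with that gap in mind.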

# Type of Injury Based on Participation in Hostilities

In [10]:
# Relationship with Type of Injury
participation_injury = data_df.groupby('took_part_in_the_hostilities')['type_of_injury'].value_counts().unstack().fillna(0)

# Reshaping the DataFrame to long format
participation_injury_long = participation_injury.reset_index().melt(id_vars='took_part_in_the_hostilities', var_name='type_of_injury')

# Creating a stacked bar chart for type of injury based on participation in hostilities
fig = px.bar(participation_injury_long, 
             x='took_part_in_the_hostilities', 
             y='value', 
             color='type_of_injury',
             title='Type of Injury Based on Participation in Hostilities',
             labels={'value': 'Number of Fatalities', 'type_of_injury': 'Type of Injury'},
             color_discrete_sequence=px.colors.sequential.Viridis)

# Updating the layout
fig.update_layout(barmode='stack', xaxis_title='Took Part in the Hostilities', yaxis_title='Number of Fatalities', legend_title='Type of Injury')

# Show the plot
fig.show()

**The stacked bar chart illustrates the types of injuries sustained by individuals based on whether they took part in hostilities:**

* For both groups (those who participated in hostilities and those who did not), gunfire injuries are predominant.
* There's a notable presence of other injury types, but their proportions are smaller in comparison to gunfire.
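Because the two groups differ in size, stacked counts make it hard to compare proportions between them. One possible approach, sketched here on made-up rows, is a row-normalized cross-tabulation with `pd.crosstab`. The frame `toy` and its values are hypothetical and only mirror the two column names used above.

```python
import pandas as pd

# Hypothetical rows mirroring the two columns used in the cell above.
toy = pd.DataFrame({
    'took_part_in_the_hostilities': ['No', 'No', 'No', 'Yes', 'Yes', 'No'],
    'type_of_injury': ['gunfire', 'gunfire', 'explosion',
                       'gunfire', 'explosion', 'gunfire'],
})

# normalize='index' divides each row by its total, so each row sums to 1
# and injury-type shares can be compared across participation groups.
shares = pd.crosstab(
    toy['took_part_in_the_hostilities'],
    toy['type_of_injury'],
    normalize='index',
)

print(shares)
```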

# Distribution of Types of Injuries

In [11]:
# Types of Injuries
injury_distribution = data_df['type_of_injury'].value_counts()

# Converting the injury distribution to a DataFrame
injury_df = injury_distribution.reset_index()
injury_df.columns = ['Type of Injury', 'Number of Fatalities']

# Creating a bar chart for the distribution of types of injuries
fig = px.bar(injury_df, x='Type of Injury', y='Number of Fatalities',
             title='Distribution of Types of Injuries',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale='teal')

# Updating the layout
fig.update_layout(xaxis_title='Type of Injury', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array', tickangle=45), bargap=0.2)

# Show the plot
fig.show()

**The bar chart showcases the distribution of different types of injuries:**

* Gunfire is the predominant type of injury, accounting for a significant majority of the fatalities.
* Other types of injuries, such as those from explosions or missile strikes, are present but in much smaller numbers.

# Distribution Based on Type of Ammunition Used

In [12]:
# Ammunition Distribution
ammunition_distribution = data_df['ammunition'].value_counts()

# Converting the ammunition distribution to a DataFrame
ammunition_df = ammunition_distribution.reset_index()
ammunition_df.columns = ['Type of Ammunition', 'Number of Fatalities']

# Creating a bar chart for the distribution of types of ammunition
fig = px.bar(ammunition_df, x='Type of Ammunition', y='Number of Fatalities',
             title='Distribution Based on Type of Ammunition Used',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale='YlOrBr')

# Updating the layout
fig.update_layout(xaxis_title='Type of Ammunition', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array', tickangle=45), bargap=0.2)

# Show the plot
fig.show()

**The bar chart displays the distribution based on the type of ammunition used:**

* Live ammunition is by far the most commonly used type, resulting in a significant majority of the fatalities.
* Other types of ammunition, such as rubber-coated metal bullets and tear gas, have caused fatalities but are less prevalent in comparison.

# Distribution of Entities Responsible for Fatalities

In [13]:
# Distribution of Entities Responsible for Fatalities
killed_by_distribution = data_df['killed_by'].value_counts()

# Converting the killed_by distribution to a DataFrame for Plotly
killed_by_df = killed_by_distribution.reset_index()
killed_by_df.columns = ['Entity Responsible', 'Number of Fatalities']

# Creating a bar chart for the distribution of entities responsible for fatalities
fig = px.bar(killed_by_df, x='Entity Responsible', y='Number of Fatalities',
             title='Distribution of Entities Responsible for Fatalities',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale='Blues')

# Updating the layout
fig.update_layout(xaxis_title='Entity Responsible', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array', tickangle=45), bargap=0.2)

# Show the plot
fig.show()

**The bar chart displays the distribution of entities responsible for fatalities:**

* Israeli security forces are responsible for a significant majority of the fatalities.
* Other entities, such as Palestinian civilians, Palestinian groups, and Israeli civilians, have also been involved in fatal incidents, but their numbers are comparatively lower.

# Distribution of Fatalities by Age Group

In [14]:
# Categorizing age data into age groups
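# Bin edges are right-inclusive: (0, 12], (12, 19], (19, 29], (29, 59], (59, inf]
# include_lowest=True keeps age 0 in the first bin; np.inf keeps ages above 100 in the last bin
bins = [0, 12, 19, 29, 59, np.inf]
labels = ['Children (0-12)', 'Teenagers (13-19)', 'Young Adults (20-29)', 'Adults (30-59)', 'Elderly (60+)']

data_df['age_group'] = pd.cut(data_df['age'], bins=bins, labels=labels, right=True, include_lowest=True)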


# Calculate age group distribution
age_group_distribution = data_df['age_group'].value_counts().reset_index()
age_group_distribution.columns = ['Age Group', 'Number of Fatalities']

# Create a bar chart using Plotly
fig = px.bar(age_group_distribution, x='Age Group', y='Number of Fatalities',
             title='Distribution of Fatalities by Age Group',
             labels={'Number of Fatalities': 'Number of Fatalities'},
             color='Number of Fatalities', color_continuous_scale='Purples')

# Update the layout
fig.update_layout(xaxis_title='Age Group', yaxis_title='Number of Fatalities',
                  xaxis=dict(tickmode='array', tickangle=45), bargap=0.2)

# Show the plot
fig.show()

**The bar chart showcases the distribution of fatalities based on age groups:**

* Young Adults (20-29 years) account for the highest number of fatalities.
* Adults (30-59 years) and Teenagers (13-19 years) follow closely, indicating significant fatalities in these age groups.
* Children (0-12 years) and Elderly (60+ years) have lower numbers of fatalities in comparison.
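The group counts depend on how `pd.cut` treats ages that fall exactly on a bin edge or outside the outer edges. The sketch below uses made-up ages to show the difference between a closed upper edge of 100 without `include_lowest`, and an open-ended last bin with `include_lowest=True`. The names `ages`, `default_cut`, and `inclusive_cut` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up ages chosen to sit on or outside the bin edges.
ages = pd.Series([0, 12, 13, 59, 60, 105])
labels = ['Children (0-12)', 'Teenagers (13-19)', 'Young Adults (20-29)',
          'Adults (30-59)', 'Elderly (60+)']

# With right=True the first bin is (0, 12], so age 0 is excluded,
# and an upper edge of 100 leaves age 105 without a group.
default_cut = pd.cut(ages, bins=[0, 12, 19, 29, 59, 100],
                     labels=labels, right=True)

# include_lowest=True closes the first bin on the left,
# and np.inf as the last edge covers any age above 59.
inclusive_cut = pd.cut(ages, bins=[0, 12, 19, 29, 59, np.inf],
                       labels=labels, right=True, include_lowest=True)

print(pd.DataFrame({'age': ages,
                    'default': default_cut,
                    'inclusive': inclusive_cut}))
```

Rows that receive no group are dropped by `value_counts()`, so any ages outside the bins would be missing from the bar chart without warning.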

# Distribution of Fatalities by Gender and Citizenship

In [15]:
# Gender and Citizenship Combination
gender_citizenship_distribution = data_df.groupby(['gender', 'citizenship']).size().unstack().fillna(0)

# Resetting the index to use the 'gender' and 'citizenship' columns as variables in the plot
gender_citizenship_long = gender_citizenship_distribution.reset_index().melt(id_vars='gender')

# Creating a stacked bar chart for the distribution of fatalities by gender and citizenship
fig = px.bar(gender_citizenship_long, 
             x='gender', 
             y='value', 
             color='citizenship',
             title='Distribution of Fatalities by Gender and Citizenship',
             labels={'value': 'Number of Fatalities', 'citizenship': 'Citizenship', 'gender': 'Gender'},
             color_discrete_sequence=px.colors.sequential.Viridis)

# Updating the layout
fig.update_layout(barmode='stack', xaxis_title='Gender', yaxis_title='Number of Fatalities', legend_title='Citizenship')

# Show the plot
fig.show()

**The stacked bar chart depicts the distribution of fatalities based on gender and citizenship:**

* For both male and female fatalities, the overwhelming majority are Palestinians.
* There is a small number of other citizenships involved, but the disparity with Palestinian fatalities is evident.

# Top 10 Common Places of Residence Among Victims

In [16]:
# Getting the distribution of the top 10 common places of residence
residence_distribution = data_df['place_of_residence'].value_counts().head(10)

# Creating the bar chart using Plotly
fig = px.bar(
    x=residence_distribution.index,
    y=residence_distribution.values,
    title='Top 10 Common Places of Residence Among Victims',
    labels={'x': 'Place of Residence', 'y': 'Number of Fatalities'},
    color=residence_distribution.index,
    color_discrete_sequence=px.colors.qualitative.Plotly,
)

# Updating the layout
fig.update_layout(xaxis_tickangle=-45, xaxis_title='Place of Residence', yaxis_title='Number of Fatalities')

# Show the plot
fig.show()

**The bar chart highlights the top 10 common places of residence among the victims:**

* Gaza stands out as the place of residence with the highest number of fatalities.
* Other places, such as Hebron, Jenin, and Nablus, also have significant numbers of fatalities.
* The list provides a snapshot of areas that have been notably affected based on the residence of the victims.

# Map of Fatalities in the Gaza Strip and the West Bank

In [17]:
# Approximate coordinates (latitude, longitude) for major districts
district_coords = {
    'Gaza': [31.5, 34.466667],
    'Hebron': [31.532569, 35.095388],
    'Jenin': [32.457336, 35.286865],
    'Nablus': [32.221481, 35.254417],
    'Ramallah': [31.902922, 35.206209],
    'Bethlehem': [31.705791, 35.200657],
    'Tulkarm': [32.308628, 35.028537],
    'Jericho': [31.857163, 35.444362],
    'Rafah': [31.296866, 34.245536],
    'Khan Yunis': [31.346201, 34.306286]
}

# Get fatality counts for each district
district_fatalities = data_df.groupby('event_location_district').size()

# Function to determine the color of the circle based on the number of fatalities
def get_color(fatalities):
    if fatalities > 500:
        return 'darkred'
    elif fatalities > 100:
        return 'red'
    elif fatalities > 50:
        return 'orange'
    else:
        return 'green'

# Create a base map centered around the region
m = folium.Map(location=[31.5, 34.75], zoom_start=8)

# Add markers and circles for districts
for district, coords in district_coords.items():
    fatalities = district_fatalities.get(district, 0)
    folium.Marker(
        location=coords,
        tooltip=f'{district}: {fatalities} fatalities',
        icon=None
    ).add_to(m)
    folium.Circle(
        location=coords,
        radius=np.sqrt(fatalities) * 1000,  # scale radius for better visualization
        color=get_color(fatalities),
        fill=True,
        fill_color=get_color(fatalities),
        fill_opacity=0.6,
    ).add_to(m)

# Add layer control
folium.LayerControl().add_to(m)

m

**Here's a basic map highlighting the Gaza Strip and the West Bank based on the number of fatalities:**

* The radius of each circle is proportional to the square root of the number of fatalities, so the circle's area grows linearly with the count.
* The color of the circle represents the severity: green for 50 or fewer fatalities, orange for 51 to 100, red for 101 to 500, and dark red for more than 500.
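The square-root scaling can be checked with a few lines of arithmetic. The sketch below mirrors the `np.sqrt(fatalities) * 1000` expression from the map cell using only the standard library. The helper name `circle_radius_m` and the sample counts are hypothetical.

```python
import math

def circle_radius_m(fatalities, scale=1000):
    """Radius in metres, mirroring sqrt(fatalities) * 1000 from the map cell."""
    return math.sqrt(fatalities) * scale

# Quadrupling the count doubles the radius...
r_100 = circle_radius_m(100)
r_400 = circle_radius_m(400)

# ...which quadruples the area, since area grows with the radius squared.
area_ratio = (r_400 / r_100) ** 2

print(r_100, r_400, area_ratio)
```

Scaling the radius directly by the count would instead make the area grow with the square of the count, visually exaggerating districts with more fatalities.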
