# Juveniles and Crimes | What the Heck is going on?
##### Let's explore what the data says.

**PS: This is my first Kaggle notebook. I'm new to this field. Your feedback is very valuable and will keep motivating me to come forward with new notebooks. If you find any mistakes, please comment down below. If you found this notebook useful, don't forget to give it an Upvote.**

> Throughout this notebook, I've used the words crime and offence interchangeably.
#### 1 Upvote = 1x10<sup>100</sup> support

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas_profiling import ProfileReport

print("Imported necessary libraries.")

In [None]:
DATASET_PATH = "../input/us-juvenile-arrests-by-crime/arrests_national_juvenile.csv"
# Importing dataset into pandas dataframe
jvc_df = pd.read_csv(DATASET_PATH)
jvc_df.head()

### Before Diving deep into the data
##### Let's take a look at what the columns mean

> This dataset contains information on the number of juvenile arrests in the US per crime category for each year between 1995 and 2016. The number of arrests is further broken down by sex, age group, and race for each crime category. This data is collected by the FBI as part of the Uniform Crime Reporting Program.

**Columns**
* state_abbr: State Abbreviations

**All the others are self-explanatory**

Source: [Kaggle US Juvenile Arrest By Crime](https://www.kaggle.com/tjkyner/us-juvenile-arrests-by-crime)

## Data Planning

1. **Cleaning**
2. **Analysing**
3. **Visualising**
4. **Answering**

In [None]:
# Let's start cleaning the data
# Let's get the description of data
jvc_df.info()

In [None]:
# From the above description, we have to remove the state_abbr column, as it doesn't have a single non-null value
# Dropping unwanted columns
jvc_df = jvc_df.drop('state_abbr', axis=1)
jvc_df.columns
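
Instead of reading the `info()` output by eye, the fully empty columns can also be found programmatically. Here is a minimal sketch on a small made-up frame; `toy_df` and its values are assumptions for illustration, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for jvc_df; the values are made up
toy_df = pd.DataFrame({
    "year": [1994, 1995, 1996],
    "state_abbr": [np.nan, np.nan, np.nan],  # entirely empty, like in the notebook
    "total_male": [10.0, np.nan, 7.0],       # only partly empty
})

# A column is fully empty when every one of its values is null
empty_columns = toy_df.columns[toy_df.isna().all()].tolist()
print(empty_columns)

# Drop only those columns; partly filled columns are kept
cleaned_df = toy_df.drop(columns=empty_columns)
print(cleaned_df.columns.tolist())
```

Partly empty columns such as `total_male` survive this step, so they still need to be filled or dropped separately.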

In [None]:
# Now that the state_abbr column is removed, let's handle the other NaN values
jvc_df.isna().sum()

In [None]:
jvc_df.head()

In [None]:
# Short summary of the data
jvc_df.describe()

In [None]:
# Let's sort the dataset by year
jvc_df1 = jvc_df.sort_values("year")
jvc_df1.head()

In [None]:
jvc_df1.offense_code.value_counts()

In [None]:
# Now we have to handle the NaN values. We can either drop the rows with NaN values or fill them with approximate values.
# As this dataset is very small, we can't afford to lose that much data (almost 96 rows). So let's fill
# the NaN values with the value from a neighbouring year.

# For that we have to follow these steps:

# 1. Sort the rows by offense_code and year
# 2. Use backfill, which fills each gap with the next valid value (the following year's value)

#     OR

# 1. Simply fill them with the mean of the data

# Let's go with the bfill method
jvc_df2 = jvc_df1.sort_values(["offense_code", "year"]).bfill()

# Resetting the index of the dataframe
jvc_df2.reset_index(inplace=True)
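
One thing to keep in mind: a plain backfill over the whole frame can copy a value from the first row of one offense into the last row of the previous offense. A possible alternative, sketched here on made-up data (the offense codes and counts in `toy_df` are assumptions), is to backfill inside each offense group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; offense codes and counts are made up
toy_df = pd.DataFrame({
    "offense_code": ["A", "A", "A", "B", "B"],
    "year": [1994, 1995, 1996, 1994, 1995],
    "total_male": [np.nan, 20.0, 30.0, 5.0, np.nan],
})

# Sort so that the rows of each offense are in year order
toy_sorted = toy_df.sort_values(["offense_code", "year"])

# Backfill inside each offense only, so values never leak between offenses
filled = toy_sorted.copy()
filled["total_male"] = toy_sorted.groupby("offense_code")["total_male"].bfill()
print(filled)
```

In this sketch the 1994 gap of offense A takes the 1995 value, while the trailing gap of offense B stays empty because there is no later year to copy from. Such leftovers would need another rule, for example a forward fill.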

### Now, our data is ready for Analysis.
### We have filled the na values using bfill and dropped unwanted columns.
### Now, let's jump straight into EDA and Questions!!!!

In [None]:
# We forgot something! We just reset the index and haven't dropped the old index
# PS: we also forgot to remove the unwanted id column, so let's do both.
jvc_df3 = jvc_df2.drop(["id", "index"], axis=1)

In [None]:
# Now, it's perfect. Let's take a look at the first 5 rows of dataframe
jvc_df3.head()

In [None]:
# Neat and clean! Now let's get into EDA. This time, no excuses!!

# Let's take a look at the various offense_names
print("Various crimes: ", jvc_df3["offense_name"].unique())
print("Total number of various crimes: ", jvc_df3["offense_name"].nunique())

In [None]:
# Hmm, there are 30 different crime categories. Let's take a look at the total number of males and females involved in crime per year
# Let's plot a line plot.
male_in_crime = jvc_df3.groupby("year")["total_male"].agg(["sum"])
female_in_crime = jvc_df3.groupby("year")["total_female"].agg(["sum"])

%matplotlib inline

plt.figure(figsize=(10, 6))
sns.set_style('darkgrid')
sns.set_context('notebook', font_scale=1.05)

plt.title("Crime Trend: Male and Female")
sns.lineplot(x=male_in_crime.index, y=male_in_crime["sum"], label="Male", color='#eb3734')
sns.lineplot(x=female_in_crime.index, y=female_in_crime["sum"], label="Female", color="#3452eb")
plt.xlabel("Year")
plt.ylabel("Count")
plt.legend()
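
To put a number on the trend in a plot like this, the first and last year can be compared directly. A minimal sketch, assuming a made-up frame `toy_df` with the same column names as the notebook:

```python
import pandas as pd

# Hypothetical toy data with two years and two rows per year; counts are made up
toy_df = pd.DataFrame({
    "year": [1994, 1994, 2016, 2016],
    "total_male": [100, 300, 50, 150],
    "total_female": [40, 60, 20, 30],
})

# Both yearly totals in a single groupby
yearly = toy_df.groupby("year")[["total_male", "total_female"]].sum()

# Percentage change between the first and the last year in the index
change_pct = (yearly.iloc[-1] - yearly.iloc[0]) / yearly.iloc[0] * 100
print(change_pct)
```

A negative value means the count fell over the period. In this toy example both columns drop by half.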

In [None]:
# Wow, that's a downward trend. The total number of males and females aged 17 and under involved in crime is getting lower
# That's good. Juvenile involvement in crime is decreasing!!!

# Now let's check which crimes each sex is involved in the most
crime_involves_male_the_most = jvc_df3.groupby(by=["offense_name", "year"])["total_male"].sum().groupby('offense_name').max().sort_values(ascending=False)
crime_involves_female_the_most = jvc_df3.groupby(by=["offense_name", "year"])["total_female"].sum().groupby('offense_name').max().sort_values(ascending=False)

plt.figure(figsize=(8, 10))
sns.barplot(y=crime_involves_male_the_most.index, x=crime_involves_male_the_most, color='#eb3734', label="Male", alpha=0.5)
sns.barplot(y=crime_involves_female_the_most.index, x=crime_involves_female_the_most, color='#3452eb', label="Female", alpha=0.5)
plt.xlabel("Number of crimes committed")
plt.legend()
plt.show()
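
The chained groupby used for this bar plot does two things: it first sums the rows per offense and year, then keeps the highest yearly total of each offense. A small worked example on made-up numbers (`toy_df` is an assumption, not the real data) shows both steps:

```python
import pandas as pd

# Hypothetical toy data; Larceny has two rows in 1994 to show the summing step
toy_df = pd.DataFrame({
    "offense_name": ["Larceny", "Larceny", "Larceny", "Runaway", "Runaway"],
    "year": [1994, 1994, 1995, 1994, 1995],
    "total_male": [10, 5, 12, 7, 9],
})

# Step 1: one total per (offense, year) pair
yearly_totals = toy_df.groupby(["offense_name", "year"])["total_male"].sum()

# Step 2: the peak yearly total of each offense, largest first
peak_per_offense = yearly_totals.groupby("offense_name").max().sort_values(ascending=False)
print(peak_per_offense)
```

So each bar shows the peak year of an offense, not the total over all years.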

In [None]:
# This barplot shows that the most common offences are Larceny, Runaway, Simple assault and all other offenses.
# I'm so surprised to see that Manslaughter, Rape and Gambling are very low (Juveniles, ha!).

# Let's analyse the crimes committed by juveniles of different races
# We have white, black, asian_pacific_islander and american_indians.

jvc_df_race = jvc_df3.drop(["m_0_9", "f_0_9", "m_10_12", "f_10_12", "m_13_14", "f_13_14", "f_15", "m_15", "m_16", "f_16", "m_17", "f_17"], axis=1)
jvc_df_race.head()

In [None]:
# Let's plot the number of juveniles involved in crime with respect to their race

race_data = jvc_df_race.groupby("year")[["white", "black", "asian_pacific_islander", "american_indian", "total_male", "total_female"]].sum()
race_data["total_juvenile_in_crime"] = race_data["total_male"] +  race_data["total_female"]
race_data.head()

In [None]:
sns.set_style('whitegrid')
sns.set_context('notebook', font_scale=1.05)
plt.figure(figsize=(10, 8))
colors = ["#cf1d4c", "#ffc000", "#1d38cf", "#1dcf35"]
# Plot one line per race column, scaled down by 100
race_columns = ["white", "black", "asian_pacific_islander", "american_indian"]
for idx, race in enumerate(race_columns):
    sns.lineplot(x=race_data.index, y=race_data[race] / 100, label=race, color=colors[idx])
plt.xlabel("Year")
plt.ylabel("Crime count")
plt.title("Crime count by race through the years (divided by 100)")
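
Raw counts make the large groups dominate the plot. One way to compare the groups on the same scale is to look at each group's share of the yearly total. This is only a sketch on made-up numbers (`toy_race` is an assumption), and a share of arrests is not a per-capita rate, which would need population figures for each group.

```python
import pandas as pd

# Hypothetical yearly counts for two groups; the numbers are made up
toy_race = pd.DataFrame(
    {"white": [60, 30], "black": [40, 20]},
    index=pd.Index([1994, 2016], name="year"),
)

# Divide every row by its own total to get percentage shares per year
shares = toy_race.div(toy_race.sum(axis=1), axis=0) * 100
print(shares)
```

In this toy example the counts halve between the two years, but the shares stay the same, which is something a raw-count plot cannot show.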

In [None]:
# This plot shows that at the beginning there were almost twice as many white juvenile cases as black juvenile cases
# But both of them have dropped drastically, which is awesome.
# The other two races, asian_pacific_islander and american_indian, are relatively low in number compared to black and white juveniles.
# Keep in mind that these are raw counts, so they are influenced by the size of each group's population.

# Now let's explore juvenile cases in different age groups. I believe we will find something interesting there.
# Let's create a new dataframe with features we need.
jvc_df_age = jvc_df3.drop(["population","offense_code", "race_agencies", "race_population", "white", "black", "asian_pacific_islander", "american_indian", "agencies"], axis=1)
jvc_df_age.head()

In [None]:
# Now that we dropped unwanted columns and created new dataframe with columns that we want, let's jump into Analysis

# First let's look at the age group of both males and female and observe the trend on number of juvenile cases per year
# Let's plot them using line plot

# numeric_only=True keeps the text column offense_name out of the sum
jvc_df_age.groupby("year").sum(numeric_only=True).drop(["total_male", "total_female"], axis=1).plot(kind="line", xlabel="Year", ylabel="Number of juveniles involved in crime", title="Juveniles involved in crime by year")

In [None]:
# Holy cow! That visualization looks like a mess. Let's divide the data into groups and visualize it in a much better way.
jvc_df_age_grouped = jvc_df_age.groupby("year").sum(numeric_only=True).drop(["total_male", "total_female"], axis=1)

jvc_age_0_9 = jvc_df_age_grouped[["m_0_9", "f_0_9"]]
jvc_age_10_12 = jvc_df_age_grouped[["m_10_12", "f_10_12"]]
jvc_age_13_14 = jvc_df_age_grouped[["m_13_14", "f_13_14"]]
jvc_age_15 = jvc_df_age_grouped[["m_15", "f_15"]]
jvc_age_16 = jvc_df_age_grouped[["m_16", "f_16"]]
jvc_age_17 = jvc_df_age_grouped[["m_17", "f_17"]]

In [None]:
# Let's plot the data that we just divided by age group. This time, let's use multiple plots, or subplots!!
# age_grouped_dfs =[jvc_age_0_9, jvc_age_10_12, jvc_age_13_14, jvc_age_15, jvc_age_16, jvc_age_17]

# Create a new figure object for subplots
fig, axarr = plt.subplots(2, 3, figsize=(18, 12))
fig.tight_layout(pad=4.0)
fig.suptitle("Number of crimes committed by juveniles per year", fontsize=25)
# Plotting each df manually; you can use a loop to automate this. But for now let's keep it simple
jvc_age_0_9.plot(kind="line", ax=axarr[0, 0], xlabel="Year",ylabel="Number of Juveniles", title="Age group 0-9")
jvc_age_10_12.plot(kind="line", ax=axarr[0, 1], xlabel="Year",ylabel="Number of Juveniles", title="Age group 10-12")
jvc_age_13_14.plot(kind="line", ax=axarr[0, 2], xlabel="Year",ylabel="Number of Juveniles", title="Age group 13-14")
jvc_age_15.plot(kind="line", ax=axarr[1, 0], xlabel="Year",ylabel="Number of Juveniles", title="Age group 15")
jvc_age_16.plot(kind="line", ax=axarr[1, 1], xlabel="Year",ylabel="Number of Juveniles", title="Age group 16")
jvc_age_17.plot(kind="line", ax=axarr[1, 2], xlabel="Year",ylabel="Number of Juveniles", title="Age group 17")
plt.show()
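
The manual subplot calls can also be written as a loop. The following is a minimal sketch on made-up yearly totals; `toy_grouped` and the `age_groups` mapping are assumptions for illustration, cut down to two age groups to keep it short.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly totals for two age groups; the numbers are made up
toy_grouped = pd.DataFrame(
    {"m_15": [30, 20], "f_15": [10, 8], "m_16": [40, 25], "f_16": [12, 9]},
    index=pd.Index([1994, 2016], name="year"),
)

# Map each subplot title to the columns it should show
age_groups = {
    "Age group 15": ["m_15", "f_15"],
    "Age group 16": ["m_16", "f_16"],
}

fig, axarr = plt.subplots(1, 2, figsize=(12, 5))

# Pair every axes object with one age group and draw it
for ax, (title, columns) in zip(axarr.flat, age_groups.items()):
    toy_grouped[columns].plot(kind="line", ax=ax, title=title)
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of Juveniles")

fig.tight_layout()
```

With the full data, the same loop would only need a 2 x 3 grid and six entries in the mapping.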

In [None]:
# Now it's much better. The visualization is clean and we can see the trends clearly.
# All age groups show a downward trend, which is actually good. The number of crimes committed by juveniles is getting lower
# But why and how? Maybe better education? Food? Educated parents? Support from family?
# Comment your opinion down below 🙂

# Last but not least, let's take a look at what kind of crime juveniles got into the most throughout these years

age_group_columns = ["m_0_9", "m_10_12", "m_13_14", "m_15", "m_16", "m_17", "f_0_9", "f_10_12", "f_13_14", "f_15", "f_16", "f_17"]
jvc_df3.groupby(["year", "offense_name"])[age_group_columns].sum().idxmax()
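
Since the grouped frame is indexed by year and offense, `idxmax()` returns one `(year, offense_name)` pair per column. A small sketch on made-up numbers (`toy_df` is an assumption) shows what the result looks like:

```python
import pandas as pd

# Hypothetical toy data with two offenses and two years; counts are made up
toy_df = pd.DataFrame({
    "year": [1994, 1994, 1995, 1995],
    "offense_name": ["Larceny", "Runaway", "Larceny", "Runaway"],
    "m_17": [50, 20, 40, 25],
    "f_17": [10, 30, 12, 28],
})

# One total per (year, offense) pair for every age/sex column
totals = toy_df.groupby(["year", "offense_name"])[["m_17", "f_17"]].sum()

# For each column, the index label (year, offense) of its largest value
peak = totals.idxmax()
print(peak)
```

Each column yields only its single highest year and offense combination, not the top offense of every year.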

#### The table above shows, for each juvenile age group, the year and offense with the highest count

#### From this table we can see that juveniles are mostly involved in Larceny, Runaway and all other offenses.
#### Now Our EDA is over. Let's get into Summary and Questions part.

# Summary

* Juvenile crimes were at their peak between 1994 and 2000
* Throughout these years, we can see that the number of juvenile crimes is decreasing
* White juveniles account for approximately 2x the crimes of Black juveniles (the numbers are influenced by population)
* Larceny and Runaway are the most common offenses among juveniles
* The 0-9 age group committed the fewest crimes, and males are generally involved in crime more than females
* Throughout the years, crimes committed by juveniles decreased, but the 13-17 age group had a small rise between 2005 and 2010. Does this have anything to do with the Great Recession of 2008? I don't know; it's worth looking up.
* Rape, Gambling and Manslaughter are rare among the crimes.
* The decreasing crime is a green flag. Maybe they are getting a good education and care, so that they don't have to get into bad things to earn money.
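
A claim like the small rise between 2005 and 2010 can be checked with year-over-year differences instead of reading it off the plot. A minimal sketch, assuming a made-up yearly series (`toy_yearly` and its values are not the real data):

```python
import pandas as pd

# Hypothetical yearly totals for one age group; the numbers are made up
toy_yearly = pd.Series(
    [100, 90, 95, 97, 80],
    index=pd.Index([2004, 2005, 2006, 2007, 2008], name="year"),
    name="m_13_14",
)

# Difference to the previous year; the first year has no predecessor
yearly_change = toy_yearly.diff()

# Years in which the count went up compared to the year before
rising_years = yearly_change[yearly_change > 0].index.tolist()
print(rising_years)
```

Running the same steps on the real yearly totals would list the years where the count actually increased.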

# What I realised from this data?
**In most of the web series/movies I've watched, Black juveniles commit more crimes than White juveniles. But this data shows that, in raw numbers, that's wrong. These series and movies also show heavy drug use and drug dealing, but surprisingly, juvenile arrests for drug-related crimes are low compared to Larceny and Runaway. That also changed what I thought about US juveniles.**

# What do you think? Did I make any mistakes? What do you think about the data and the insights?
**Comment down below😃**

# It's Question time!!!
**Let's answer some questions to see what you learned from this data and how we can improve lives.**

1. In your opinion, what are the reasons for the rapid decrease in juvenile crime?
2. Is there anything you can suggest to the Govt. to bring an end to these crimes?
3. What do you think about movies and web series framing black kids as criminals?
4. On a scale of 0 to 10, how would you rate my notebook🤔🤔🤔?

# In the end
Once again, this is my first notebook. There might be a lot of mistakes. I believe you guys will spot them and comment down below. Let me know whether there's anything I can do to make my notebooks better. Please Upvote my notebook if you like the way I presented it. Connect with me on [LinkedIn](https://www.linkedin.com/in/muhammed-rajab/).