# ***Project Name: EDA on "Punjab Crime Stats | 2002 - 2015"***

### Team Mentor: Muhammad Abdul Raheem

### Team Members:

* Muhammad Umer Mayal
* Mohsin Abbas
* Ms. Uniba
* Ibrahim Azeem
* Masood Ullah
* Amir Raja
* Ali Khan

### Punjab Crime Stats | 2002 - 2015


- This dataset records the crimes reported in Punjab province, Pakistan, from 2002 to 2015 across all divisions, broken down by district, crime type, and year, together with population figures. It gives an overview of the crimes committed and lets us examine the law-and-order situation in each area.
  
- By analyzing crime patterns across Punjab, we will try to identify what may have driven people to commit these crimes. Our team will also look for hidden patterns, such as whether a particular crime peaks in particular years (the data is recorded per year). Such findings could help the Punjab Police target preventive measures, maintain law and order more effectively, and better ensure its citizens' safety.
  
## ***Dataset:***

The dataset can be downloaded from Kaggle:
https://www.kaggle.com/datasets/sayyazahmad/punjab-crime-stats-2002-2015

### 1. Importing Libraries

In [3]:
%pip install researchpy  # run once; required by the researchpy import


In [2]:
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import researchpy as rp
from warnings import filterwarnings
filterwarnings('ignore')


### 2. Read the dataset

In [None]:
df=pd.read_csv('dataset.csv')
df.head()

: 

In [None]:
df.tail()

: 

### 3. Basic Information about the dataset

In [None]:
df.info()

: 

### 4. Checking for null values

In [None]:
df.isnull().sum()

: 

### 5. Type Casting

In [None]:
df['Population']=df['Population'].astype('int64')
df.info()

: 
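`astype('int64')` raises an error if `Population` contains any missing values, because NumPy integer columns cannot hold NaN. A minimal sketch, using a hypothetical toy column rather than the real dataset, of two safer options: pandas' nullable `Int64` dtype, or filling the gaps before casting (the fill value 0 is only a placeholder choice).

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value (not taken from the dataset)
population = pd.Series([1200000.0, np.nan, 3400000.0], name="Population")

try:
    population.astype("int64")  # NaN cannot be stored in a NumPy int64 column
except (ValueError, TypeError) as err:
    print("int64 cast failed:", err)

# Option 1: nullable integer dtype keeps the gap as <NA>
nullable = population.astype("Int64")

# Option 2: fill the gap first, then cast (0 is a placeholder, not a recommendation)
filled = population.fillna(0).astype("int64")

print(nullable.dtype, filled.dtype)
```

Which option fits depends on whether a missing population should be treated as unknown (option 1) or imputed (option 2).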

### 6. Description of Dataset

In [None]:
df.describe()

: 

### 7. Unique Values

In [None]:
df.nunique()

: 

### 8. Value Counts by Division

In [None]:
df['Division'].value_counts()

: 

### 9. Total Crime Count by Crime Type

In [None]:
# Sum CrimeCount per crime type (value_counts() would only count rows)
df.groupby('CrimeType')['CrimeCount'].sum().sort_values(ascending=False).head(10)

: 
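`value_counts()` counts rows per crime type, not the number of crimes; ranking crime types by reported crimes requires summing `CrimeCount`. A sketch on hypothetical toy rows (not the real data) shows how the two rankings can disagree:

```python
import pandas as pd

# Hypothetical toy rows, one per district and crime type (illustrative only)
toy = pd.DataFrame({
    "District": ["A", "B", "A", "B", "C"],
    "CrimeType": ["Murder", "Murder", "Theft", "Theft", "Theft"],
    "CrimeCount": [10, 30, 5, 7, 2],
})

# Number of rows per crime type
rows_per_type = toy["CrimeType"].value_counts()

# Total reported crimes per crime type, largest first
totals = (toy.groupby("CrimeType")["CrimeCount"]
             .sum()
             .sort_values(ascending=False))

print(rows_per_type)
print(totals)
```

Here "Theft" has more rows, but "Murder" has more reported crimes.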

In [None]:
df.duplicated().sum()

: 

In [None]:
numeric_df=df[['CrimeCount','Population']]
categoric_df=df[['Division','District','CrimeType','Year']]

: 

In [None]:
numeric_df.head()

: 

In [None]:
categoric_df.head()

: 

**Data Visualization / Insights**
---
Univariate Analysis:
---
 
**Checking the normal distribution**

**Interpretation**

The distribution is right-skewed (positively skewed): most values are small, with a long tail of large values, although the skewness is relatively mild.
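The sign of the skewness gives the direction of the tail: positive means a long right tail, negative a long left tail. A minimal sketch, assuming hypothetical toy values, of the adjusted Fisher-Pearson sample skewness (to our understanding the estimator behind pandas' `Series.skew()`), written with the standard library only:

```python
from math import sqrt

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (needs at least 3 values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * sqrt(n * (n - 1)) / (n - 2)        # small-sample adjustment

right_tail = [1, 2, 2, 3, 3, 3, 40]   # one large value -> long right tail
left_tail = [-v for v in right_tail]  # mirror image -> long left tail

print(sample_skewness(right_tail))  # positive: right-skewed
print(sample_skewness(left_tail))   # negative: left-skewed
```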

## Skewness and Kurtosis Before and After Normalization

In [None]:
sns.histplot(x='Population',data=numeric_df)
print('Skewness and Kurtosis Before Normalization')
df['Population'].skew(), df['Population'].kurtosis()

: 

In [None]:
cbrt_transformation = np.cbrt(df['Population'])  # cube-root transform of Population
sns.histplot(cbrt_transformation)

: 

In [None]:
Skewness,Kurtosis=cbrt_transformation.skew(),cbrt_transformation.kurtosis()
print(' Skewness after Normalization =', Skewness)
print(' Kurtosis after Normalization =', Kurtosis)

: 

### Histogram of Original and Transformed Data

In [None]:
#define grid of plots
fig, axs = plt.subplots(nrows=1, ncols=2)

#create histograms
axs[0].hist(df['Population'], edgecolor='black')
axs[1].hist(cbrt_transformation, edgecolor='black')
#add title to each histogram
axs[0].set_title('Original Data')
axs[1].set_title('Transformed Data')

: 

In [None]:
sns.histplot(x='CrimeCount',data=df)
print('Skewness and Kurtosis Before Normalization')
df['CrimeCount'].skew(), df['CrimeCount'].kurtosis()

: 

## Log Transformation

In [None]:
# log(1 + x) keeps zero counts finite; np.log(0) would return -inf
log_transformation = np.log1p(df['CrimeCount'])
sns.histplot(log_transformation)

: 

In [None]:
Skewness_Crime,Kurtosis_Crime=log_transformation.skew(),log_transformation.kurtosis()
print(' Skewness after Normalization =', Skewness_Crime)
print(' Kurtosis after Normalization =', Kurtosis_Crime)

: 
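`np.log` returns `-inf` for zero counts, which breaks skewness statistics and histograms, while `log(1 + x)` (NumPy's `np.log1p`) and the cube root are both finite at zero. A minimal sketch on hypothetical right-skewed toy counts, standard library only, comparing the skewness before and after each transform:

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (needs at least 3 values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)

# Hypothetical right-skewed crime counts, including a zero (illustrative only)
counts = [0, 1, 2, 3, 5, 8, 13, 40, 120, 600]

cube_root = [c ** (1 / 3) for c in counts]  # fine for non-negative counts
log1p = [math.log1p(c) for c in counts]     # log(1 + x): equals 0 at x = 0

for name, values in [("original", counts), ("cube root", cube_root), ("log1p", log1p)]:
    print(f"{name:>9}: skewness = {sample_skewness(values):.3f}")
```

The log transform compresses large values more strongly than the cube root, so it usually removes more of the right skew.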

***Bivariate Analysis:***
  ---

Bivariate analysis examines the relationship between two variables or attributes.

In [None]:
# Summary statistics of CrimeCount for each Division
rp.summary_cont(df['CrimeCount'].groupby(df['Division']))

: 

In [None]:
df['Division'].value_counts()

: 

In [None]:
agg=df.groupby(['Year', 'CrimeType'])['CrimeCount'].sum().unstack().fillna(0)
agg

: 
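One stated goal is to see whether a crime peaks in particular years. With a `Year` by `CrimeType` table like `agg`, `DataFrame.idxmax()` returns, for each column, the index label (year) of the largest value. A sketch on a hypothetical toy table of the same shape:

```python
import pandas as pd

# Hypothetical toy table shaped like `agg` (index: Year, columns: CrimeType)
toy_agg = pd.DataFrame(
    {"Murder": [120, 150, 130], "Theft": [900, 850, 1100]},
    index=pd.Index([2002, 2003, 2004], name="Year"),
)

peak_year = toy_agg.idxmax()  # year with the largest count, per crime type
peak_count = toy_agg.max()    # the corresponding count

print(pd.DataFrame({"PeakYear": peak_year, "PeakCount": peak_count}))
```

On the notebook's table, `agg.idxmax()` gives the peak year for each crime type.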

In [None]:
# Pandas stacked bar chart of total crime counts per year
import matplotlib.pyplot as plt

# One-liner using the agg DataFrame built above
agg.plot(kind='bar', stacked=True)

# Add a title and axis labels, and keep the x-axis labels horizontal
plt.title('Crime Count Based On Year')
plt.xlabel("Year")
plt.ylabel("CrimeCount")
plt.xticks(rotation=0, ha='center')

: 

In [None]:
# Pearson correlation between the numeric columns (measures association, not causation)
correlations = df.corr(method='pearson', numeric_only=True)
sns.heatmap(correlations, cmap="coolwarm", annot=True)
plt.show()

: 

### ANOVA

In [None]:
# Conduct a one-way ANOVA: does the mean CrimeCount differ across divisions?
from scipy.stats import f_oneway

# Exclude the aggregate "All Reported" rows, then collect CrimeCount per division
crimes = df[df['CrimeType'] != 'All Reported']
groups = [group['CrimeCount'].values for _, group in crimes.groupby('Division')]
f_oneway(*groups)

: 
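One-way ANOVA asks whether the mean of one numeric variable differs across the levels of a categorical variable, for example `CrimeCount` across `Division`. Its F statistic is the between-group mean square divided by the within-group mean square. A minimal sketch computing F by hand on hypothetical toy groups (standard library only); `scipy.stats.f_oneway` computes the same classical statistic and also returns a p-value.

```python
def one_way_anova_f(*groups):
    """Classical one-way ANOVA F statistic for two or more groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within

# Hypothetical crime counts for three divisions (illustrative only)
div_a = [10, 12, 11, 13]
div_b = [20, 22, 21, 23]
div_c = [15, 14, 16, 15]

f_stat = one_way_anova_f(div_a, div_b, div_c)
print(f_stat)
```

A large F means the group means differ by much more than the spread inside each group would explain.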

## Probability Distribution: Gaussian Samples and PDFs

In [None]:
# Compare random samples from two Gaussian distributions with their theoretical PDFs
import numpy as np
import matplotlib.pyplot as plt
from math import ceil, floor, sqrt

def pdf(x, mu=0, sigma=1):
    """
    Calculates the normal distribution's probability density
    function (PDF).
    """
    term1 = 1.0 / (sqrt(2 * np.pi) * sigma)
    term2 = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return term1 * term2


# Draw sample data points
##################################################

# Random Gaussian data, sorted for the cumulative plots.
# Separate names keep the crime DataFrame `df` from being overwritten.
sample1 = np.sort(np.random.normal(loc=0, scale=5.0, size=30))  # mean=0, stdev=5
sample2 = np.sort(np.random.normal(loc=2, scale=7.0, size=30))  # mean=2, stdev=7

# x-axis range covering both samples
combined = np.concatenate([sample1, sample2])
min_val = floor(combined.min())
max_val = ceil(combined.max())

# Raw strings so that \m and \s are not treated as escape sequences
label1 = r'$\mu=0, \sigma=5$'
label2 = r'$\mu=2, \sigma=7$'

##################################################

fig = plt.gcf()
fig.set_size_inches(12, 11)

# Cumulative distributions, stepwise:
plt.subplot(2, 2, 1)
plt.step(np.concatenate([sample1, sample1[[-1]]]), np.arange(sample1.size + 1), label=label1)
plt.step(np.concatenate([sample2, sample2[[-1]]]), np.arange(sample2.size + 1), label=label2)

plt.title('30 samples from a random Gaussian distribution (cumulative)')
plt.ylabel('Count')
plt.xlabel('X-value')
plt.legend(loc='upper left')
plt.xlim([min_val, max_val])
plt.ylim([0, sample1.size + 1])
plt.grid()

# Cumulative distributions, smooth:
plt.subplot(2, 2, 2)

plt.plot(np.concatenate([sample1, sample1[[-1]]]), np.arange(sample1.size + 1), label=label1)
plt.plot(np.concatenate([sample2, sample2[[-1]]]), np.arange(sample2.size + 1), label=label2)

plt.title('30 samples from a random Gaussian (cumulative)')
plt.ylabel('Count')
plt.xlabel('X-value')
plt.legend(loc='upper left')
plt.xlim([min_val, max_val])
plt.ylim([0, sample1.size + 1])
plt.grid()


# Probability densities at the sample points (each sample under its own distribution)
plt.subplot(2, 2, 3)

plt.plot(sample1, pdf(sample1, mu=0, sigma=5), label=label1)
plt.plot(sample2, pdf(sample2, mu=2, sigma=7), label=label2)

plt.title('30 samples from a random Gaussian')
plt.legend(loc='upper left')
plt.xlabel('X-value')
plt.ylabel('probability density')
plt.xlim([min_val, max_val])
plt.grid()


# Theoretical probability density functions on a fine grid
plt.subplot(2, 2, 4)

x = np.arange(min_val, max_val, 0.05)

plt.plot(x, pdf(x, mu=0, sigma=5), label=label1)
plt.plot(x, pdf(x, mu=2, sigma=7), label=label2)

plt.title('PDFs of Gaussian distributions')
plt.legend(loc='upper left')
plt.xlabel('X-value')
plt.ylabel('probability density')
plt.xlim([min_val, max_val])
plt.grid()

plt.show()

: 
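The hand-written `pdf` function implements the normal density `f(x) = exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2 * pi))`. A quick sanity check, re-implemented with the standard library only, compares that formula against `statistics.NormalDist.pdf` (Python 3.8+) for the two parameter pairs used in the plots:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def pdf(x, mu=0, sigma=1):
    """Normal probability density, same formula as the notebook's pdf()."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sqrt(2 * pi) * sigma)

# Largest absolute difference from the standard-library reference
max_diff = max(
    abs(pdf(x, mu, sigma) - NormalDist(mu, sigma).pdf(x))
    for mu, sigma in [(0, 5), (2, 7)]
    for x in [-10.0, 0.0, 2.0, 9.5]
)
print(max_diff)
```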

In [None]:
df1=pd.read_csv("dataset.csv")
df1.columns

: 

### Distribution Plot (histplot)

In [None]:
# Histogram with a kernel density estimate (sns.distplot is deprecated)
sns.histplot(df1['CrimeCount'], kde=True)

: 

In [None]:
sns.histplot(df1['CrimeCount'], bins=10)

: 

### Joint plot

In [None]:
#joint plot
sns.jointplot(x='Year', y='CrimeCount', data=df1)

: 

In [None]:
sns.jointplot(x='Year', y='CrimeCount', data=df1, kind='hex')

: 

In [None]:
sns.rugplot(df1['CrimeCount'])


: 

In [None]:
df1.columns

: 

### Boxplot

In [None]:
sns.boxplot(x='Year', y='Population', data=df1)

: 

### Violin Plot

In [None]:
sns.violinplot(x='Year', y='Division', data=df1)

: 

### Strip Plot

In [None]:
sns.stripplot(x='Division', y='Year', data=df1)

: 

### Displot

In [None]:
sns.displot(data = df1, kind = 'kde', x = 'Year', hue = 'Division', height = 5, aspect = 1.75, palette = 'flare')

: 

In [None]:
df = pd.read_csv("dataset.csv")  # reload the same file read in section 2

: 

In [None]:
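import plotly.express as px

# Drop the "All Reported" rows so only individual crime types remain
df2 = df[df.CrimeType != "All Reported"]
# Keep only the Sargodha division
df3 = df2[df2["Division"] == "Sargodha"]
df3.head()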

: 
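The `"All Reported"` rows appear to hold totals across crime types (an assumption based on the filter above), so keeping them next to the individual types would roughly double any sum. A toy illustration with hypothetical rows for one district and year:

```python
import pandas as pd

# Hypothetical toy rows; "All Reported" is assumed to be the total of the other rows
toy = pd.DataFrame({
    "District": ["District X"] * 3,
    "CrimeType": ["Murder", "Theft", "All Reported"],
    "CrimeCount": [40, 160, 200],
})

with_aggregate = toy["CrimeCount"].sum()  # every crime counted twice
without_aggregate = toy.loc[toy["CrimeType"] != "All Reported", "CrimeCount"].sum()

print(with_aggregate, without_aggregate)
```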

In [None]:
import seaborn as sns
sns.set_theme(style="ticks", palette="pastel")

# Distribution of CrimeCount per district in the Sargodha division
sns.boxplot(x="District", y="CrimeCount", data=df3, color="m")
sns.despine(offset=10, trim=True)

: 

In [None]:
# Crime counts per crime type, by district (colour) and year (line style), Sargodha division
sns.lineplot(x="CrimeCount", y="CrimeType",
             hue="District", style="Year",
             data=df3)

: 

In [None]:
# Murder cases reported from 2002 to 2015 in each district of the Sargodha division
murders = df3[df3["CrimeType"] == "Murder"]
sns.lineplot(x="Year", y="CrimeCount", hue="District", data=murders)

: 

In [None]:
# Crime counts by crime type in each district (Sargodha division)
import plotly.express as px
fig = px.bar(df3, x="District", y="CrimeCount", color="CrimeType", barmode="group")
fig.show()

: 

In [None]:
#number of cases
import plotly.express as px
data = df

: 

### Funnel Plot

In [None]:
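# Total crimes per crime type, excluding the "All Reported" rows, largest first
totals = (df2.groupby('CrimeType', as_index=False)['CrimeCount'].sum()
             .sort_values('CrimeCount', ascending=False))
fig = px.funnel(totals, x='CrimeCount', y='CrimeType')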
fig.show()

: 

### Pairplot

In [None]:
import seaborn as sns
sns.set_theme(style="ticks")
sns.pairplot(df3, hue="District")

: 

____________________________________________________________________________________________________________________