Part 1 and Part 2 crimes are classifications used by law enforcement agencies, including the Los Angeles Police Department (LAPD), to categorize different types of crimes.

# Part 1 Crimes:


Part 1 crimes, also known as index crimes, are serious offenses tracked by the FBI's Uniform Crime Reporting (UCR) program, and law enforcement generally gives them higher priority. They are divided into two main categories:


a. Violent Crimes: Violent crimes involve the use or threat of force against a person. Examples of violent crimes include homicide, rape, robbery, and aggravated assault.

b. Property Crimes: Property crimes involve the unlawful taking or destruction of someone's property. Examples of property crimes include burglary, theft, motor vehicle theft, and arson.

# Part 2 Crimes:


Part 2 crimes cover the broader range of offenses that are not classified as Part 1. They are generally less serious offenses or violations that do not meet the UCR program's criteria for Part 1 crimes. The exact list varies by jurisdiction but typically includes simple assault, fraud, drug offenses, vandalism, disorderly conduct, and other non-index crimes.
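As a rough illustration of the two-way split described above, the sketch below labels an offense name as Part 1 or Part 2 using only the example offenses listed in this section. The names `PART_1_OFFENSES` and `classify_offense` are hypothetical helpers introduced here; they are not part of the UCR program or the LAPD dataset, and the real classification rules are more detailed.

```python
# Illustrative lookup built only from the example offenses listed above.
# PART_1_OFFENSES and classify_offense are hypothetical helper names.
PART_1_OFFENSES = {
    # Violent crimes
    "homicide", "rape", "robbery", "aggravated assault",
    # Property crimes
    "burglary", "theft", "motor vehicle theft", "arson",
}


def classify_offense(offense: str) -> int:
    """Return 1 for a Part 1 (index) offense, otherwise 2."""
    return 1 if offense.strip().lower() in PART_1_OFFENSES else 2


examples = ["Robbery", "Vandalism", "Arson", "Simple Assault"]
parts = {name: classify_offense(name) for name in examples}
print(parts)
```

In the notebook itself this label is not derived by hand; it is read directly from the dataset's `Part 1-2` column.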

# Importing relevant libraries: 

In [1]:
import pandas as pd
import numpy as np
import pickle
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

In [2]:
# ignore warnings
from warnings import filterwarnings
filterwarnings('ignore')


In [3]:
data=pd.read_csv("C:/Users/skris/Downloads/Crime_Data_from_2020_to_Present.csv")


In [5]:
X = data[['AREA', 'Vict Age', 'Vict Sex', 'Vict Descent', 'Premis Cd', 'Status']].copy()
y = data['Part 1-2'].copy()

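# Fill missing values in the numeric columns with the column mean.
# numeric_only=True is needed on pandas 2.x, where mean() on mixed dtypes raises a TypeError.
X.fillna(X.mean(numeric_only=True), inplace=True)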
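Filling with the column mean only affects numeric columns, so missing values in string columns such as `Vict Sex` are left in place. A minimal sketch of handling both kinds of column is shown below on a made-up three-row frame. The `"Unknown"` placeholder is an assumption for illustration, not a code defined by the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real feature frame; the values are made up.
toy = pd.DataFrame({
    "Vict Age": [25.0, np.nan, 40.0],
    "Vict Sex": ["F", None, "M"],
})

# Numeric columns: fill with the column mean.
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())

# Non-numeric columns: fill with an explicit placeholder category.
# "Unknown" is an assumed label, not a value defined by the dataset.
other_cols = toy.columns.difference(numeric_cols)
toy[other_cols] = toy[other_cols].fillna("Unknown")

print(toy)
```

Whether a placeholder category or the most frequent value is the better choice depends on why the values are missing, so this is one option rather than a recommendation.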

# Applying Logistic Regression:

In [6]:

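# Encode categorical variables
# (scikit-learn >= 1.2: `sparse` was renamed `sparse_output`, and
# `get_feature_names` was replaced by `get_feature_names_out`)
categorical_features = ['AREA', 'Vict Sex', 'Vict Descent', 'Premis Cd', 'Status']
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
X_encoded = pd.DataFrame(encoder.fit_transform(X[categorical_features]))
X_encoded.columns = encoder.get_feature_names_out(categorical_features)

# Concatenate encoded categorical variables with numerical variables
# Note: select_dtypes also picks up any categorical column stored as numbers
# (for example AREA, if it is an integer code), so such columns appear twice:
# once one-hot encoded and once as raw values.
X_final = pd.concat([X_encoded, X.select_dtypes(include='number')], axis=1)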

# Encode the response variable
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_final, y_encoded, test_size=0.2, random_state=42)

# Create and train the logistic regression classifier
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Decode the predicted labels
y_pred_decoded = label_encoder.inverse_transform(y_pred)

# Decode the true labels for evaluation
y_test_decoded = label_encoder.inverse_transform(y_test)

# Evaluate the classifier
print(classification_report(y_test_decoded, y_pred_decoded))

              precision    recall  f1-score   support

           1       0.74      0.80      0.77     86289
           2       0.68      0.61      0.64     61649

    accuracy                           0.72    147938
   macro avg       0.71      0.70      0.70    147938
weighted avg       0.72      0.72      0.72    147938
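The notebook imports `Pipeline`, `ColumnTransformer`, `SimpleImputer`, and `StandardScaler` but does the preprocessing by hand. As a hedged alternative, the sketch below shows how the same kind of steps could be bundled into one pipeline so that imputation and encoding are fitted on the training split only. It runs on randomly generated stand-in data with a reduced set of columns, not on the LAPD file, and `max_iter=1000` is an assumed setting rather than the one used for the report above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; the values are random and carry no meaning.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "AREA": rng.integers(1, 22, n),
    "Vict Age": rng.integers(0, 90, n).astype(float),
    "Vict Sex": rng.choice(["F", "M", "X"], n),
    "Status": rng.choice(["IC", "AA", "AO"], n),
})
demo.loc[::10, "Vict Age"] = np.nan  # inject some missing ages
target = rng.integers(1, 3, n)       # labels 1 or 2, like 'Part 1-2'

categorical = ["AREA", "Vict Sex", "Status"]
numeric = ["Vict Age"]

preprocess = ColumnTransformer([
    # One-hot encode categorical columns; unseen categories become all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    # Mean-impute, then standardize the numeric column.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ]), numeric),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_tr, X_te, y_tr, y_te = train_test_split(
    demo, target, test_size=0.2, random_state=42
)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(pred[:10])
```

Because the labels here are random, the predictions are meaningless; the point is only the structure, in which every preprocessing step is learned inside `model.fit`.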



# Using Random Forest Classifier:

In [7]:
# Reuse the predictor frame X and response y defined in the earlier cell;
# missing numeric values were already filled there.

# One-hot encode the string-typed columns; get_dummies leaves numeric columns unchanged
X_encoded = pd.get_dummies(X, drop_first=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)

# Create and train the random forest classifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate the classifier
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           1       0.75      0.77      0.76     86289
           2       0.67      0.65      0.66     61649

    accuracy                           0.72    147938
   macro avg       0.71      0.71      0.71    147938
weighted avg       0.72      0.72      0.72    147938
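A random forest trained without a fixed seed gives slightly different results on each run, and it also exposes per-feature importances that can help interpret the model. The sketch below illustrates both points on randomly generated stand-in data. The column subset, `n_estimators=50`, and `random_state=42` are assumptions for illustration and were not used for the report above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the values are random and carry no meaning.
rng = np.random.default_rng(0)
n = 300
toy_X = pd.DataFrame({
    "AREA": rng.integers(1, 22, n),
    "Vict Age": rng.integers(0, 90, n),
    "Vict Sex": rng.choice(["F", "M"], n),
})
toy_y = rng.integers(1, 3, n)  # labels 1 or 2, like 'Part 1-2'

# Same encoding call as in the cell above: only string columns are expanded.
toy_encoded = pd.get_dummies(toy_X, drop_first=True)

# Fixing random_state makes the fitted forest reproducible across runs.
forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(toy_encoded, toy_y)

# Impurity-based importances, one value per encoded column.
importances = pd.Series(forest.feature_importances_, index=toy_encoded.columns)
print(importances.sort_values(ascending=False))
```

On the real data, the same `feature_importances_` attribute would indicate which encoded columns the forest relies on most, with the usual caveat that impurity-based importances tend to favor columns with many distinct values.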



# Using Decision Tree Classifier:

In [8]:
# Create and train the decision tree classifier
# (reuses X_train/X_test from the train/test split in the previous cell)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate the classifier
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           1       0.73      0.78      0.76     86289
           2       0.66      0.60      0.63     61649

    accuracy                           0.71    147938
   macro avg       0.70      0.69      0.70    147938
weighted avg       0.71      0.71      0.71    147938
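To put the three accuracy figures in context, it helps to compare them with the accuracy of always predicting the more frequent class. The short calculation below uses only the support counts and accuracies printed in the reports above.

```python
# Test-set support per class, copied from the classification reports above.
support = {1: 86289, 2: 61649}
total = sum(support.values())

# Accuracy obtained by always predicting the most frequent class.
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

# Accuracies reported for the three classifiers above.
reported = {
    "logistic regression": 0.72,
    "random forest": 0.72,
    "decision tree": 0.71,
}

print(f"majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.3f}")
for name, acc in reported.items():
    print(f"{name}: {acc - baseline_accuracy:+.3f} over baseline")
```

The baseline comes out at roughly 0.58, so all three models improve on it by about 13 to 14 percentage points, and the differences between the models themselves are small by comparison.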

