# <center>Traffic Situation Prediction</center>

## Importing the libraries

1. Imports the NumPy package under the alias np. NumPy is a library for numerical computing in Python.

2. Imports the pandas package under the alias pd. pandas is a library for data manipulation and analysis.

3. Imports seaborn (alias sns) and matplotlib.pyplot (alias plt). Both are plotting libraries.

4. Imports the MinMaxScaler and LabelEncoder classes from the sklearn.preprocessing module. MinMaxScaler rescales numeric features to a fixed range (0 to 1 by default), and LabelEncoder maps categorical labels to integer codes.

5. Imports the train_test_split function from the sklearn.model_selection module. This function splits data into training and testing sets.

6. Imports the DecisionTreeRegressor class from the sklearn.tree module. This class builds a decision tree regression model; in this notebook it is fitted on the label-encoded traffic situation.

7. Imports the accuracy_score, precision_score, f1_score, and recall_score functions from the sklearn.metrics module. These classification metrics are used to evaluate the model's predictions.

8. Imports the joblib module. joblib is a library used for saving and loading Python objects such as trained machine learning models.

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

from sklearn.metrics import accuracy_score, precision_score, f1_score, recall_score

from sklearn.tree import DecisionTreeRegressor

import joblib


## Importing Dataset

This dataset contains traffic data collected from various roads at different times. It includes information about the number of vehicles (cars, bikes, buses, and trucks) passing through specific roads, as well as the overall traffic situation.

The dataset consists of the following columns:

Time: The time of the recorded traffic data.

Date: The day of the month on which the traffic data was recorded (for example, 10).

Day of week: The day of the week when the data was collected.

CarRoad1: The number of cars passing through Road 1.

CarRoad2: The number of cars passing through Road 2.

BikeRoad1: The number of bikes passing through Road 1.

BikeRoad2: The number of bikes passing through Road 2.

BusRoad1: The number of buses passing through Road 1.

BusRoad2: The number of buses passing through Road 2.

TruckRoad1: The number of trucks passing through Road 1.

TruckRoad2: The number of trucks passing through Road 2.

TotalRoad1: The total number of vehicles passing through Road 1.

TotalRoad2: The total number of vehicles passing through Road 2.

Traffic Situation: The overall traffic situation, given as a category label such as "BothSideLow", "BothSideNormal", or "Side1High". This is the target variable.
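As a quick sanity check on the column descriptions above, the totals can be compared with the sum of the four per-vehicle counts for each road. The sketch below uses the first three rows from the `df.head()` output, typed in by hand; the helper name `check_totals` is hypothetical and not part of the notebook.

```python
import pandas as pd

# First three rows of the dataset, copied by hand from the df.head() output.
sample = pd.DataFrame({
    "CarRoad1": [52, 64, 61], "CarRoad2": [27, 29, 17],
    "BikeRoad1": [0, 1, 2], "BikeRoad2": [2, 3, 2],
    "BusRoad1": [6, 4, 5], "BusRoad2": [2, 2, 3],
    "TruckRoad1": [17, 14, 25], "TruckRoad2": [50, 72, 64],
    "TotalRoad1": [75, 83, 93], "TotalRoad2": [81, 106, 86],
})


def check_totals(frame: pd.DataFrame, road: int) -> pd.Series:
    """Return True for each row whose total equals the sum of the four vehicle counts."""
    parts = [f"{kind}Road{road}" for kind in ("Car", "Bike", "Bus", "Truck")]
    return frame[parts].sum(axis=1) == frame[f"TotalRoad{road}"]


road1_ok = check_totals(sample, 1)
road2_ok = check_totals(sample, 2)
print(road1_ok.all(), road2_ok.all())
```

On the full dataset, the same helper could be called on `df` directly to flag rows where the totals and the individual counts disagree.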

In [3]:
df = pd.read_csv('total_data.csv')

In [4]:
df.head()

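| | Time | Date | Day of week | CarRoad1 | CarRoad2 | BikeRoad1 | BikeRoad2 | BusRoad1 | BusRoad2 | TruckRoad1 | TruckRoad2 | TotalRoad1 | TotalRoad2 | Traffic Situation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12:00:00 AM | 10 | Tuesday | 52 | 27 | 0 | 2 | 6 | 2 | 17 | 50 | 75 | 81 | BothSideLow |
| 1 | 12:15:00 AM | 10 | Tuesday | 64 | 29 | 1 | 3 | 4 | 2 | 14 | 72 | 83 | 106 | BothSideLow |
| 2 | 12:30:00 AM | 10 | Tuesday | 61 | 17 | 2 | 2 | 5 | 3 | 25 | 64 | 93 | 86 | BothSideLow |
| 3 | 12:45:00 AM | 10 | Tuesday | 75 | 21 | 1 | 2 | 2 | 2 | 33 | 56 | 111 | 81 | BothSideLow |
| 4 | 1:00:00 AM | 10 | Tuesday | 75 | 22 | 8 | 3 | 15 | 3 | 32 | 54 | 130 | 82 | BothSideLow |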


### Columns

In [5]:
df.columns

Index(['Time', 'Date', 'Day of week', 'CarRoad1', 'CarRoad2', 'BikeRoad1',
       'BikeRoad2', 'BusRoad1', 'BusRoad2', 'TruckRoad1', 'TruckRoad2',
       'TotalRoad1', 'TotalRoad2', 'Traffic Situation'],
      dtype='object')

### Creating the `midday` column

In [6]:
df['midday'] = ''

# Record whether each timestamp falls in the AM or PM half of the day
for i in range(len(df['Time'])):

    if df['Time'][i][-2:] == 'AM':
        df.loc[i, 'midday'] = 'AM'

    elif df['Time'][i][-2:] == 'PM':
        df.loc[i, 'midday'] = 'PM'

# Remove the 'AM' or 'PM' suffix from the Time column
df['Time'] = df['Time'].str[:-2]
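The row-by-row loop above can also be expressed with vectorized string methods. The following is a minimal sketch on a hypothetical sample in the same `H:MM:SS AM/PM` format, not the notebook's own code; it splits each value into its clock part and its AM/PM marker in one pass with `Series.str.extract`.

```python
import pandas as pd

# Hypothetical sample in the same "H:MM:SS AM/PM" format as the Time column.
times = pd.Series(["12:00:00 AM", "1:00:00 AM", "12:15:00 PM", "11:45:00 PM"])

# Capture the clock part and the AM/PM marker as two named columns.
parts = times.str.extract(
    r"^\s*(?P<clock>\d{1,2}:\d{2}:\d{2})\s*(?P<midday>AM|PM)\s*$"
)

print(parts["midday"].tolist())
print(parts["clock"].tolist())
```

Rows that do not match the pattern come back as missing values, which makes malformed timestamps easy to spot.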

### Converting the Time column from `H:M:S` format to seconds

In [7]:
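# Parse the clock time once, then convert it to seconds.
# The AM/PM suffix was removed above, so the values are on a 12-hour clock
# and the 'midday' column records the half of the day.
time_parsed = pd.to_datetime(df['Time'])
df['Time'] = time_parsed.dt.hour * 3600 + \
             time_parsed.dt.minute * 60 + \
             time_parsed.dt.second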

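Because the conversion above runs after the AM/PM suffix has been removed, the result is a number of seconds on a 12-hour clock: 12:00:00 AM becomes 43200 and 1:00:00 AM becomes 3600, as the next `df.head()` output shows. If a single seconds-since-midnight value is preferred instead, one option is to parse the original strings with an explicit 12-hour format. The sketch below illustrates this on a hypothetical sample; it is an alternative, not what the notebook does.

```python
import pandas as pd

# Hypothetical sample of raw Time values, with the AM/PM suffix still attached.
raw = pd.Series(
    ["12:00:00 AM", "12:15:00 AM", "1:00:00 AM", "12:00:00 PM", "11:45:00 PM"]
)

# %I is the 12-hour clock hour and %p the AM/PM marker.
parsed = pd.to_datetime(raw, format="%I:%M:%S %p")

# Seconds elapsed since midnight on a 24-hour clock.
seconds = parsed.dt.hour * 3600 + parsed.dt.minute * 60 + parsed.dt.second
print(seconds.tolist())
```

With this encoding, midnight maps to 0 and the values increase monotonically through the day, so a separate AM/PM feature becomes optional.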


In [8]:
df['Traffic Situation'].value_counts()

Traffic Situation
BothSideNormal    898
BothSideLow       792
BothSideHeavy     254
Side1High         238
Side1Heavy        236
BothSideHigh      226
Side2Heavy        128
Side2High          95
Side1Normal        48
Side2Normal        14
Name: count, dtype: int64
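The class counts above are clearly uneven, which matters when reading the accuracy later on: a model that always predicts the most common class is already right for that class's share of the rows. The sketch below computes the shares from the ten counts listed in the output, typed in by hand; the shares are relative to the sum of those listed counts.

```python
import pandas as pd

# Counts copied by hand from the value_counts() output.
counts = pd.Series({
    "BothSideNormal": 898, "BothSideLow": 792, "BothSideHeavy": 254,
    "Side1High": 238, "Side1Heavy": 236, "BothSideHigh": 226,
    "Side2Heavy": 128, "Side2High": 95, "Side1Normal": 48, "Side2Normal": 14,
})

# Share of each class among the listed counts.
shares = counts / counts.sum()
majority_label = shares.idxmax()
majority_share = shares.max()

print(shares.round(3))
print(majority_label, round(majority_share, 3))
```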

In [9]:
df.head()

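| | Time | Date | Day of week | CarRoad1 | CarRoad2 | BikeRoad1 | BikeRoad2 | BusRoad1 | BusRoad2 | TruckRoad1 | TruckRoad2 | TotalRoad1 | TotalRoad2 | Traffic Situation | midday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43200 | 10 | Tuesday | 52 | 27 | 0 | 2 | 6 | 2 | 17 | 50 | 75 | 81 | BothSideLow | AM |
| 1 | 44100 | 10 | Tuesday | 64 | 29 | 1 | 3 | 4 | 2 | 14 | 72 | 83 | 106 | BothSideLow | AM |
| 2 | 45000 | 10 | Tuesday | 61 | 17 | 2 | 2 | 5 | 3 | 25 | 64 | 93 | 86 | BothSideLow | AM |
| 3 | 45900 | 10 | Tuesday | 75 | 21 | 1 | 2 | 2 | 2 | 33 | 56 | 111 | 81 | BothSideLow | AM |
| 4 | 3600 | 10 | Tuesday | 75 | 22 | 8 | 3 | 15 | 3 | 32 | 54 | 130 | 82 | BothSideLow | AM |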


## Data Preprocessing

In [10]:
# Separate the features and target variable
features = df.drop(['Traffic Situation'], axis=1)
target = df['Traffic Situation']

# Normalize the numeric features using MinMaxScaler
numeric_columns = ['Time', 'Date']
scaler = MinMaxScaler()
features[numeric_columns] = scaler.fit_transform(features[numeric_columns])

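# Encode the categorical features 'midday' and 'Day of week' using LabelEncoder
# (one encoder per column, so that each mapping stays available afterwards)
le_midday = LabelEncoder()
features['midday'] = le_midday.fit_transform(features['midday'])
le_day = LabelEncoder()
features['Day of week'] = le_day.fit_transform(features['Day of week'])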

# Encode the target variable 'Traffic Situation' using LabelEncoder
le_target = LabelEncoder()
target = le_target.fit_transform(target)

# Concatenate the features and target variable
normalized_encoded_data = pd.concat([features, pd.Series(target, name='Traffic Situation')], axis=1)
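To make the encoded values in the following outputs easier to read: LabelEncoder assigns integer codes in sorted order of the labels, and MinMaxScaler maps each value to `(x - min) / (max - min)`. The sketch below reproduces the codes seen in this notebook under two assumptions: that the ten labels from the `value_counts()` output are the full set of classes, and that `Date` runs from 1 to 31. The variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Class labels copied from the value_counts() output (assumed to be complete).
situations = [
    "BothSideNormal", "BothSideLow", "BothSideHeavy", "Side1High", "Side1Heavy",
    "BothSideHigh", "Side2Heavy", "Side2High", "Side1Normal", "Side2Normal",
]
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

le_target_demo = LabelEncoder().fit(situations)
le_day_demo = LabelEncoder().fit(days)

# Codes follow the sorted order of the labels.
low_code = int(le_target_demo.transform(["BothSideLow"])[0])
tuesday_code = int(le_day_demo.transform(["Tuesday"])[0])

# inverse_transform maps integer codes back to the original labels.
decoded = le_target_demo.inverse_transform([2, 3])

# Min-max scaling of the day of month, assuming days 1..31 are present.
date_scaler = MinMaxScaler().fit(np.arange(1, 32).reshape(-1, 1))
scaled_day_10 = float(date_scaler.transform([[10]])[0, 0])

print(low_code, tuesday_code, list(decoded), round(scaled_day_10, 6))
```

Keeping the fitted encoder for the target around is what allows numeric predictions to be turned back into labels such as "BothSideLow".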

In [11]:
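df = normalized_encoded_data

# Keep only the time-related features and the target by dropping the vehicle-count columns
count_columns = ['CarRoad1', 'CarRoad2', 'BikeRoad1', 'BikeRoad2', 'BusRoad1', 'BusRoad2',
                 'TruckRoad1', 'TruckRoad2', 'TotalRoad1', 'TotalRoad2']
traffic_df = df.drop(count_columns, axis=1)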

In [12]:
traffic_df

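| | Time | Date | Day of week | midday | Traffic Situation |
|---|---|---|---|---|---|
| 0 | 0.936170 | 0.300000 | 5 | 0 | 2 |
| 1 | 0.957447 | 0.300000 | 5 | 0 | 2 |
| 2 | 0.978723 | 0.300000 | 5 | 0 | 2 |
| 3 | 1.000000 | 0.300000 | 5 | 0 | 2 |
| 4 | 0.000000 | 0.300000 | 5 | 0 | 2 |
| ... | ... | ... | ... | ... | ... |
| 2971 | 0.829787 | 0.266667 | 4 | 1 | 2 |
| 2972 | 0.851064 | 0.266667 | 4 | 1 | 2 |
| 2973 | 0.872340 | 0.266667 | 4 | 1 | 2 |
| 2974 | 0.893617 | 0.266667 | 4 | 1 | 2 |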


### Splitting the Data into Training and Testing Sets

In [13]:
X = traffic_df.drop(['Traffic Situation'], axis=1)
y = traffic_df['Traffic Situation']


In [14]:
X

| | Time | Date | Day of week | midday |
|---|---|---|---|---|
| 0 | 0.936170 | 0.300000 | 5 | 0 |
| 1 | 0.957447 | 0.300000 | 5 | 0 |
| 2 | 0.978723 | 0.300000 | 5 | 0 |
| 3 | 1.000000 | 0.300000 | 5 | 0 |
| 4 | 0.000000 | 0.300000 | 5 | 0 |
| ... | ... | ... | ... | ... |
| 2971 | 0.829787 | 0.266667 | 4 | 1 |
| 2972 | 0.851064 | 0.266667 | 4 | 1 |
| 2973 | 0.872340 | 0.266667 | 4 | 1 |
| 2974 | 0.893617 | 0.266667 | 4 | 1 |


In [15]:
y

0       2
1       2
2       2
3       2
4       2
       ..
2971    2
2972    2
2973    2
2974    2
2975    2
Name: Traffic Situation, Length: 2976, dtype: int64

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.1, random_state=42)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((2678, 4), (298, 4), (2678,), (298,))
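The shapes above follow from how `train_test_split` interprets a fractional `test_size`: the test set gets the fraction of the rows rounded up, and the training set gets the remainder. The sketch below checks this on placeholder arrays with the same number of rows and columns as `X`; the array contents are dummy values.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 2976  # number of rows reported for y above

# Placeholder data with the same shape as X and y; the values do not matter here.
X_demo = np.zeros((n_samples, 4))
y_demo = np.zeros(n_samples, dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42
)

# 10% of 2976 is 297.6, which is rounded up for the test set.
expected_test = math.ceil(0.1 * n_samples)
print(X_tr.shape, X_te.shape, expected_test)
```

Given the uneven class counts in this dataset, passing `stratify=y` to `train_test_split` is one way to keep the class proportions similar in both sets.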

## Training Model

In [17]:
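# Fit a decision tree on the integer-encoded 'Traffic Situation' labels.
# DecisionTreeRegressor is a regression model, so it predicts numeric values rather than class labels.
model = DecisionTreeRegressor().fit(X_train, y_train)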

In [18]:
y_pred = model.predict(X_test)
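Since the target is a set of encoded class labels, a classification tree is a natural alternative to the regressor used here. The sketch below shows the pattern with `DecisionTreeClassifier` on synthetic data; the data, the three-class target, and the variable names are assumptions made for illustration, and the results reported in this notebook come from the regressor, not from this code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the four features; not the traffic data.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 4))

# Synthetic three-class target (0, 1, or 2) derived from the first two features.
y_demo = (X_demo[:, 0] > 0.5).astype(int) + (X_demo[:, 1] > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42
)

# A classifier predicts one of the class codes seen during training.
clf = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(pred[:10])
```

A classifier always returns one of the training class codes, whereas a regression tree returns the mean target value of a leaf, which is only a valid class code when the leaf is pure.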

### Evaluating Model

In [19]:
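# scikit-learn metric functions are documented as metric(y_true, y_pred).
# Accuracy and macro F1 do not depend on the argument order, but with the
# arguments passed as below the macro precision and recall values are interchanged.
print('Accuracy: ', accuracy_score(y_pred, y_test))
print('precision: ', precision_score(y_pred, y_test, average='macro'))
print('recall: ', recall_score(y_pred, y_test, average='macro'))
print('F1 Score: ', f1_score(y_pred, y_test, average='macro'))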

Accuracy:  0.587248322147651
precision:  0.3239325326090032
recall:  0.31916774755265714
F1 Score:  0.320499754929666
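The scikit-learn metric functions take the true labels first and the predictions second. The evaluation cell passes them in the opposite order, which leaves accuracy and macro F1 unchanged but interchanges macro precision and recall. The toy example below, with made-up labels, illustrates the effect.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 1]
y_hat = [0, 1, 1, 1, 1]

# Documented order: (y_true, y_pred).
p = precision_score(y_true, y_hat, average="macro")
r = recall_score(y_true, y_hat, average="macro")
f = f1_score(y_true, y_hat, average="macro")

# Swapped order: precision and recall trade places, F1 stays the same.
p_swapped = precision_score(y_hat, y_true, average="macro")
r_swapped = recall_score(y_hat, y_true, average="macro")
f_swapped = f1_score(y_hat, y_true, average="macro")

print(p, r, f)
print(p_swapped, r_swapped, f_swapped)
```

Read this way, the value printed as precision in the output above corresponds to macro recall under the documented argument order, and the value printed as recall corresponds to macro precision.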


### Saving Trained Model

In [20]:
joblib.dump(model, 'after15Situation.joblib')

['after15Situation.joblib']
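A saved model can be restored later with `joblib.load`. The sketch below shows a save-and-load round trip on a tiny stand-in model written to a temporary directory; the file name `demo_model.joblib` and the toy data are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Tiny stand-in model; not the traffic model trained above.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0, 0, 1, 1])
model_demo = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.joblib")
    joblib.dump(model_demo, path)   # write the fitted model to disk
    restored = joblib.load(path)    # read it back into memory

# The restored model should give the same predictions as the original.
same = np.array_equal(model_demo.predict(X_demo), restored.predict(X_demo))
print(same)
```

To apply a restored model to new raw data, the fitted scaler and encoders would need to be saved and reloaded in the same way, so that the inputs are transformed exactly as they were during training.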