# **CONGESTION PREDICTION MODEL USING TREE ENSEMBLE**




## **Import Libraries/Dependencies** ##


---

*Note: This notebook uses the traffic_processed.csv file, which is available on this project's GitHub page.*

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import pandas as pd


Below we engineer the target y: a row is labelled 1 when the speed three rows later (`future_speed`) is below 30, and 0 otherwise.

In [None]:
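df = pd.read_csv('traffic_processed.csv')

# Speed observed three rows after the current row
df['future_speed'] = df['current_speed'].shift(-3)
# 1 if that future speed is below the congestion threshold of 30, else 0
df['will_congest_change'] = (df['future_speed'] < 30).astype(int)

# Rows with no future reading (e.g. the last three) cannot be labelled
df.dropna(subset=['future_speed'], inplace=True)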
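
To make the effect of `shift(-3)` concrete, here is a minimal sketch on a made-up six-row frame (the speed values are hypothetical and not taken from traffic_processed.csv). It applies the same three steps as the cell above.

```python
import pandas as pd

# Hypothetical readings, one per row, ordered in time.
toy = pd.DataFrame({"current_speed": [45, 40, 35, 28, 25, 33]})

# shift(-3) moves the value from three rows later into the current row.
toy["future_speed"] = toy["current_speed"].shift(-3)

# Label is 1 when the speed three rows ahead is below 30.
toy["will_congest_change"] = (toy["future_speed"] < 30).astype(int)

# The last three rows have no future value, so they are dropped.
toy = toy.dropna(subset=["future_speed"])
print(toy)
```

One caveat, stated as an assumption about the data layout: if the file interleaves rows from several roads, a plain `shift` would pair a row with a reading from a different road. In that case a per-road shift such as `df.groupby('road')['current_speed'].shift(-3)` may be the safer choice.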

The raw columns are fairly basic, so we derive two more informative features: the day of the week and the change in speed from the previous row.

In [None]:
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['day_of_week'] = df['timestamp'].dt.dayofweek
df['speed_diff'] = df['current_speed'].diff()
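
As a small illustration of what these two features look like, the sketch below runs the same calls on three made-up rows (timestamps and speeds are hypothetical). Note that `diff()` leaves the first row as NaN because it has no predecessor.

```python
import pandas as pd

# Hypothetical readings; 2024-01-01 is a Monday.
toy = pd.DataFrame({
    "timestamp": ["2024-01-01 08:00", "2024-01-01 08:05", "2024-01-01 08:10"],
    "current_speed": [40.0, 34.0, 37.0],
})

toy["timestamp"] = pd.to_datetime(toy["timestamp"])
toy["day_of_week"] = toy["timestamp"].dt.dayofweek  # Monday=0 ... Sunday=6
toy["speed_diff"] = toy["current_speed"].diff()     # current row minus previous row

print(toy[["day_of_week", "speed_diff"]])
```

Depending on the scikit-learn version, an estimator may reject missing values, so it can be worth handling that first NaN explicitly, for example with `fillna(0)`.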

Below we one-hot encode the `road` feature, because it is categorical data stored as strings.

In [None]:
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(df[['road']])
# Reuse df's index so that concat lines the rows up by label
encoded_df = pd.DataFrame(encoded, columns=encoder.get_feature_names_out(['road']), index=df.index)
df = pd.concat([df.drop('road', axis=1), encoded_df], axis=1)
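
`pd.concat` with `axis=1` aligns rows by index label, not by position, so the encoded frame has to carry the same index as `df`. The sketch below uses two tiny made-up frames to show what happens when the indices differ and when they match.

```python
import pandas as pd

# Hypothetical frame whose index has a gap, as can happen after dropping rows.
left = pd.DataFrame({"current_speed": [40, 35, 30]}, index=[0, 2, 3])

# A freshly built frame gets a default 0..n-1 index.
right_default = pd.DataFrame({"road_Boulevard": [1.0, 0.0, 1.0]})
misaligned = pd.concat([left, right_default], axis=1)

# Passing the existing index keeps the rows paired one-to-one.
right_aligned = pd.DataFrame({"road_Boulevard": [1.0, 0.0, 1.0]}, index=left.index)
aligned = pd.concat([left, right_aligned], axis=1)

print(misaligned)  # extra row and NaN values where labels do not match
print(aligned)     # three rows, no missing values
```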


We will now define the feature matrix X and the target y

In [None]:
features = ['current_speed',
            'hour',
            'congestion_percent',
            'day_of_week',
            'speed_diff',
            'road_Boulevard',
            'road_Pettarani',
            'road_Sultan Alauddin',
            'rush_hour',
            ]
X = df[features]
y = df['will_congest_change']

To measure how well the model generalizes to unseen data, we split the data set into two parts: a training set with the first 80% of the rows and a test set with the remaining 20%. `shuffle=False` keeps the rows in their original order, so the test set is the last 20% of the file.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
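
As a quick check of what `shuffle=False` does, the sketch below splits a made-up ten-row data set (hypothetical values) and prints which rows land in each part.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ordered data: ten rows with index 0..9.
X_toy = pd.DataFrame({"current_speed": range(10)})
y_toy = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 0, 1])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, shuffle=False
)

print(X_tr.index.tolist())  # first 80% of the rows
print(X_te.index.tolist())  # last 20% of the rows
```

Keeping the order matters for time-ordered data: with shuffling, rows adjacent in time to test rows would end up in the training set, which tends to make the score look better than it would on genuinely future data.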

Finally, we train the model in the code block below. Training uses only the training set (80% of the data set).

In [None]:
classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train, y_train)
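
A fitted random forest exposes `feature_importances_`, which can help to see which inputs drive the predictions. The sketch below is only an illustration on synthetic data: the `noise` column, the random values, and the names `X_demo`, `y_demo`, and `model` are assumptions, not part of this project's data set.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic data: the label depends only on current_speed, not on noise.
X_demo = pd.DataFrame({
    "current_speed": rng.uniform(5, 60, size=200),
    "noise": rng.normal(size=200),
})
y_demo = (X_demo["current_speed"] < 30).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_demo, y_demo)

# Importances are normalized to sum to 1 across features.
importances = pd.Series(model.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

The same two lines at the end could be run against `classifier` and `X_train.columns` from this notebook.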

Once the model is trained, we predict on the test set and print a classification report with precision, recall, and F1 score for each class.

In [None]:
y_pred = classifier.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.90      0.90      0.90        52
           1       0.97      0.97      0.97       193

    accuracy                           0.96       245
   macro avg       0.94      0.94      0.94       245
weighted avg       0.96      0.96      0.96       245



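As the report shows, the model reaches 96% accuracy on the 245 test rows when predicting congestion 15 minutes ahead, with an F1 score of 0.97 for the congested class (1) and 0.90 for the non-congested class (0). The test set is imbalanced (193 of the 245 rows are class 1), so the per-class scores are more informative than accuracy alone.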
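
To put the accuracy in context, a majority-class baseline can be computed directly from the support column of the report. This is a minimal sketch using only the numbers printed above; the variable names are illustrative.

```python
# Support per class, copied from the classification report above.
support = {0: 52, 1: 193}

total = sum(support.values())
majority_class = max(support, key=support.get)

# Accuracy of a model that always predicts the most frequent class.
baseline_accuracy = support[majority_class] / total

print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
```

The reported 0.96 accuracy is well above this baseline, which suggests the model is doing more than predicting the dominant class.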