# Exam on Artificial Neural Networks (ANN)

Welcome to the Artificial Neural Networks (ANN) practical exam. In this exam, you will work on a binary classification task: predicting whether a bus incident was a breakdown or a delay. You are provided with a dataset that records breakdowns and delays in bus operations. Your task is to build, train, and evaluate an ANN model.

---

## Dataset Overview

### **Dataset:**
* Run the commands in the `Load Data` section to download the dataset, or access it directly [here](https://drive.google.com/file/d/1Flvj3qDkV2rPw7GGi5zOR-WGJgEBtRk-/view?usp=sharing).

### **Dataset Name:** Bus Breakdown and Delays

### **Description:**  
The dataset contains records of incidents involving buses that were either running late or experienced a breakdown. Your task is to predict whether the bus was delayed or had a breakdown based on the features provided.

### **Features:**
The dataset contains the following columns:

- `School_Year`
- `Busbreakdown_ID`
- `Run_Type`
- `Bus_No`
- `Route_Number`
- `Reason`
- `Schools_Serviced`
- `Occurred_On`
- `Created_On`
- `Boro`
- `Bus_Company_Name`
- `How_Long_Delayed`
- `Number_Of_Students_On_The_Bus`
- `Has_Contractor_Notified_Schools`
- `Has_Contractor_Notified_Parents`
- `Have_You_Alerted_OPT`
- `Informed_On`
- `Incident_Number`
- `Last_Updated_On`
- `Breakdown_or_Running_Late` (Target Column)
- `School_Age_or_PreK`

## Load Data

In [578]:
#https://drive.google.com/file/d/1Flvj3qDkV2rPw7GGi5zOR-WGJgEBtRk-/view?usp=sharing
!pip install gdown
!gdown --id 1Flvj3qDkV2rPw7GGi5zOR-WGJgEBtRk-

Failed to retrieve file url:

	Too many users have viewed or downloaded this file recently. Please
	try accessing the file again later. If the file you are trying to
	access is particularly large or is shared with many people, it may
	take up to 24 hours to be able to view or download the file. If you
	still can't access a file after 24 hours, contact your domain
	administrator.

You may still be able to access the file from the browser:

	https://drive.google.com/uc?id=1Flvj3qDkV2rPw7GGi5zOR-WGJgEBtRk-

but Gdown can't. Please check connections and permissions.


## Importing Libraries

In [579]:
import pandas as pd
from sklearn import preprocessing
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Exploratory Data Analysis (EDA)
This could include:
* **Inspect the dataset**

* **Dataset structure**

* **Summary statistics**

* **Check for missing values**

* **Distribution of features**

* **Categorical feature analysis**

* **Correlation matrix**

* **Outlier detection**

And add more as needed!

In [580]:
df = pd.read_csv('/content/Bus_Breakdown_and_Delays.csv')
df

Unnamed: 0,School_Year,Busbreakdown_ID,Run_Type,Bus_No,Route_Number,Reason,Schools_Serviced,Occurred_On,Created_On,Boro,...,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Incident_Number,Last_Updated_On,Breakdown_or_Running_Late,School_Age_or_PreK
0,2015-2016,1224901,Pre-K/EI,811,1,Other,C353,10/26/2015 08:30:00 AM,10/26/2015 08:40:00 AM,Bronx,...,10MINUTES,5,Yes,Yes,No,10/26/2015 08:40:00 AM,,10/26/2015 08:40:39 AM,Running Late,Pre-K
1,2015-2016,1225098,Pre-K/EI,9302,1,Heavy Traffic,C814,10/27/2015 07:10:00 AM,10/27/2015 07:11:00 AM,Bronx,...,25 MINUTES,3,Yes,Yes,No,10/27/2015 07:11:00 AM,,10/27/2015 07:11:22 AM,Running Late,Pre-K
2,2015-2016,1215800,Pre-K/EI,358,2,Heavy Traffic,C195,09/18/2015 07:36:00 AM,09/18/2015 07:38:00 AM,Bronx,...,15 MINUTES,12,Yes,Yes,Yes,09/18/2015 07:38:00 AM,,09/18/2015 07:38:44 AM,Running Late,Pre-K
3,2015-2016,1215511,Pre-K/EI,331,2,Other,C178,09/17/2015 08:08:00 AM,09/17/2015 08:12:00 AM,Bronx,...,10 minutes,11,Yes,Yes,Yes,09/17/2015 08:12:00 AM,,09/17/2015 08:12:08 AM,Running Late,Pre-K
4,2015-2016,1215828,Pre-K/EI,332,2,Other,S176,09/18/2015 07:39:00 AM,09/18/2015 07:45:00 AM,Bronx,...,10MINUTES,12,Yes,Yes,No,09/18/2015 07:45:00 AM,,09/18/2015 07:56:40 AM,Running Late,Pre-K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
147967,2016-2017,1338452,Pre-K/EI,9345,2,Heavy Traffic,C530,04/05/2017 08:00:00 AM,04/05/2017 08:10:00 AM,Bronx,...,15-20,7,Yes,Yes,No,04/05/2017 08:10:00 AM,,04/05/2017 08:10:15 AM,Running Late,Pre-K
147968,2016-2017,1341521,Pre-K/EI,0001,5,Heavy Traffic,C579,04/24/2017 07:42:00 AM,04/24/2017 07:44:00 AM,Bronx,...,20 MINS,0,Yes,Yes,No,04/24/2017 07:44:00 AM,,04/24/2017 07:44:15 AM,Running Late,Pre-K
147969,2016-2017,1353044,Special Ed PM Run,GC0112,X928,Heavy Traffic,09003,05/25/2017 04:22:00 PM,05/25/2017 04:28:00 PM,Bronx,...,20-25MINS,0,Yes,Yes,Yes,05/25/2017 04:28:00 PM,90323827,05/25/2017 04:34:36 PM,Running Late,School-Age
147970,2016-2017,1353045,Special Ed PM Run,5525D,Q920,Won`t Start,24457,05/25/2017 04:27:00 PM,05/25/2017 04:30:00 PM,Queens,...,,0,Yes,Yes,No,05/25/2017 04:30:00 PM,,05/25/2017 04:30:07 PM,Breakdown,School-Age


In [581]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147972 entries, 0 to 147971
Data columns (total 21 columns):
 #   Column                           Non-Null Count   Dtype 
---  ------                           --------------   ----- 
 0   School_Year                      147972 non-null  object
 1   Busbreakdown_ID                  147972 non-null  int64 
 2   Run_Type                         147883 non-null  object
 3   Bus_No                           147972 non-null  object
 4   Route_Number                     147884 non-null  object
 5   Reason                           147870 non-null  object
 6   Schools_Serviced                 147972 non-null  object
 7   Occurred_On                      147972 non-null  object
 8   Created_On                       147972 non-null  object
 9   Boro                             141654 non-null  object
 10  Bus_Company_Name                 147972 non-null  object
 11  How_Long_Delayed                 126342 non-null  object
 12  Number_Of_Studen

In [582]:
df.head()

Unnamed: 0,School_Year,Busbreakdown_ID,Run_Type,Bus_No,Route_Number,Reason,Schools_Serviced,Occurred_On,Created_On,Boro,...,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Incident_Number,Last_Updated_On,Breakdown_or_Running_Late,School_Age_or_PreK
0,2015-2016,1224901,Pre-K/EI,811,1,Other,C353,10/26/2015 08:30:00 AM,10/26/2015 08:40:00 AM,Bronx,...,10MINUTES,5,Yes,Yes,No,10/26/2015 08:40:00 AM,,10/26/2015 08:40:39 AM,Running Late,Pre-K
1,2015-2016,1225098,Pre-K/EI,9302,1,Heavy Traffic,C814,10/27/2015 07:10:00 AM,10/27/2015 07:11:00 AM,Bronx,...,25 MINUTES,3,Yes,Yes,No,10/27/2015 07:11:00 AM,,10/27/2015 07:11:22 AM,Running Late,Pre-K
2,2015-2016,1215800,Pre-K/EI,358,2,Heavy Traffic,C195,09/18/2015 07:36:00 AM,09/18/2015 07:38:00 AM,Bronx,...,15 MINUTES,12,Yes,Yes,Yes,09/18/2015 07:38:00 AM,,09/18/2015 07:38:44 AM,Running Late,Pre-K
3,2015-2016,1215511,Pre-K/EI,331,2,Other,C178,09/17/2015 08:08:00 AM,09/17/2015 08:12:00 AM,Bronx,...,10 minutes,11,Yes,Yes,Yes,09/17/2015 08:12:00 AM,,09/17/2015 08:12:08 AM,Running Late,Pre-K
4,2015-2016,1215828,Pre-K/EI,332,2,Other,S176,09/18/2015 07:39:00 AM,09/18/2015 07:45:00 AM,Bronx,...,10MINUTES,12,Yes,Yes,No,09/18/2015 07:45:00 AM,,09/18/2015 07:56:40 AM,Running Late,Pre-K


In [583]:
df.tail()

Unnamed: 0,School_Year,Busbreakdown_ID,Run_Type,Bus_No,Route_Number,Reason,Schools_Serviced,Occurred_On,Created_On,Boro,...,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Incident_Number,Last_Updated_On,Breakdown_or_Running_Late,School_Age_or_PreK
147967,2016-2017,1338452,Pre-K/EI,9345,2,Heavy Traffic,C530,04/05/2017 08:00:00 AM,04/05/2017 08:10:00 AM,Bronx,...,15-20,7,Yes,Yes,No,04/05/2017 08:10:00 AM,,04/05/2017 08:10:15 AM,Running Late,Pre-K
147968,2016-2017,1341521,Pre-K/EI,0001,5,Heavy Traffic,C579,04/24/2017 07:42:00 AM,04/24/2017 07:44:00 AM,Bronx,...,20 MINS,0,Yes,Yes,No,04/24/2017 07:44:00 AM,,04/24/2017 07:44:15 AM,Running Late,Pre-K
147969,2016-2017,1353044,Special Ed PM Run,GC0112,X928,Heavy Traffic,09003,05/25/2017 04:22:00 PM,05/25/2017 04:28:00 PM,Bronx,...,20-25MINS,0,Yes,Yes,Yes,05/25/2017 04:28:00 PM,90323827.0,05/25/2017 04:34:36 PM,Running Late,School-Age
147970,2016-2017,1353045,Special Ed PM Run,5525D,Q920,Won`t Start,24457,05/25/2017 04:27:00 PM,05/25/2017 04:30:00 PM,Queens,...,,0,Yes,Yes,No,05/25/2017 04:30:00 PM,,05/25/2017 04:30:07 PM,Breakdown,School-Age
147971,2016-2017,1353046,Project Read PM Run,2530,K617,Other,21436,05/25/2017 04:36:00 PM,05/25/2017 04:37:00 PM,Brooklyn,...,45min,7,Yes,Yes,Yes,05/25/2017 04:37:00 PM,,05/25/2017 04:37:37 PM,Running Late,School-Age


In [584]:
df.sample()

Unnamed: 0,School_Year,Busbreakdown_ID,Run_Type,Bus_No,Route_Number,Reason,Schools_Serviced,Occurred_On,Created_On,Boro,...,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Incident_Number,Last_Updated_On,Breakdown_or_Running_Late,School_Age_or_PreK
29236,2015-2016,1249211,Special Ed AM Run,V71,X687,Heavy Traffic,10005,02/09/2016 06:25:00 AM,02/09/2016 06:26:00 AM,Bronx,...,10 minutes,0,Yes,No,No,02/09/2016 06:26:00 AM,,02/09/2016 06:26:30 AM,Running Late,School-Age


In [585]:
df.shape

(147972, 21)

In [586]:
df.describe()

Unnamed: 0,Busbreakdown_ID,Number_Of_Students_On_The_Bus
count,147972.0,147972.0
mean,1287779.0,3.590071
std,43243.38,55.365859
min,1212681.0,0.0
25%,1250438.0,0.0
50%,1287844.0,0.0
75%,1325191.0,4.0
max,1362605.0,9007.0
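
The summary above reports a maximum of 9007 students on a single bus, while the 75th percentile is 4, which points to data-entry outliers in `Number_Of_Students_On_The_Bus`. A minimal sketch of one way to cap such values with the IQR rule follows. The helper name `cap_outliers_iqr`, the multiplier of 1.5, and the toy values are assumptions for illustration, not part of the exam material.

```python
import pandas as pd

def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Toy values standing in for df['Number_Of_Students_On_The_Bus'].
students = pd.Series([0, 0, 0, 4, 5, 12, 9007])
capped = cap_outliers_iqr(students)
print(capped.tolist())
```

Capping keeps every row, which may be preferable to dropping rows when the other columns of those records still look plausible.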


In [587]:
df.duplicated().sum()

0

In [588]:
df.isna().sum()

Unnamed: 0,0
School_Year,0
Busbreakdown_ID,0
Run_Type,89
Bus_No,0
Route_Number,88
Reason,102
Schools_Serviced,0
Occurred_On,0
Created_On,0
Boro,6318


## Data Preprocessing
This could include:

* **Handle Missing Values**
    * Impute missing values or drop them.

* **Encode Categorical Variables**
    * One-hot encoding
    * Label encoding

* **Scale and Normalize Data**
    * Standardization (Z-score)
    * Min-Max scaling

* **Feature Engineering**
    * Create new features
    * Feature selection

* **Handle Imbalanced Data**
    * Oversampling
    * Undersampling

* **Handle Outliers**
    * Remove outliers
    * Transform outliers

* **Remove Duplicates**
    * Remove redundant or duplicate data


And add more as needed!

Please treat these as suggestions. Feel free to use your judgment for the rest.
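
As one feature-engineering example, `How_Long_Delayed` is free text in the samples shown earlier (`10MINUTES`, `25 MINUTES`, `15-20`, `20-25MINS`, `45min`). The sketch below shows one possible way to turn such strings into a numeric minutes value. The helper `parse_delay_minutes` is hypothetical, it averages ranges, and it assumes every number is expressed in minutes, so entries given in hours would need extra handling.

```python
import re
from typing import Optional

def parse_delay_minutes(text) -> Optional[float]:
    """Best-effort conversion of a free-text delay to minutes (hypothetical helper).

    A range such as '15-20' is replaced by the average of its two bounds.
    Returns None for missing values or text without any number.
    """
    if not isinstance(text, str):
        return None
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return None
    bounds = numbers[:2]  # ignore anything after the first two numbers
    return sum(bounds) / len(bounds)

for raw in ["10MINUTES", "25 MINUTES", "15-20", "20-25MINS", "45min", None]:
    print(repr(raw), "->", parse_delay_minutes(raw))
```

With pandas, this could be applied as `df['How_Long_Delayed'].apply(parse_delay_minutes)`, after which the missing values can be imputed numerically rather than with the text mode.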

In [589]:
df.drop(['Incident_Number','School_Year','Busbreakdown_ID','Route_Number','School_Age_or_PreK','Schools_Serviced'], axis=1, inplace=True)

In [590]:
# Fill missing values in each categorical column with that column's most frequent value.
for col in ['Run_Type', 'Reason', 'Boro', 'How_Long_Delayed']:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

In [591]:
df.isna().sum()

Unnamed: 0,0
Run_Type,0
Bus_No,0
Reason,0
Occurred_On,0
Created_On,0
Boro,0
Bus_Company_Name,0
How_Long_Delayed,0
Number_Of_Students_On_The_Bus,0
Has_Contractor_Notified_Schools,0


In [592]:
df_one_hot_encoded = pd.get_dummies(df, columns=['Run_Type', 'Boro', 'Reason', 'Bus_Company_Name'])

In [593]:
print(df.dtypes)

Run_Type                           object
Bus_No                             object
Reason                             object
Occurred_On                        object
Created_On                         object
Boro                               object
Bus_Company_Name                   object
How_Long_Delayed                   object
Number_Of_Students_On_The_Bus       int64
Has_Contractor_Notified_Schools    object
Has_Contractor_Notified_Parents    object
Have_You_Alerted_OPT               object
Informed_On                        object
Last_Updated_On                    object
Breakdown_or_Running_Late          object
dtype: object


In [594]:
df_one_hot_encoded

Unnamed: 0,Bus_No,Occurred_On,Created_On,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Last_Updated_On,...,"Bus_Company_Name_THOMAS BUSES, INC. (B2321",Bus_Company_Name_TWENTY FIRST AV TRANSP (B,Bus_Company_Name_VAN TRANS LLC,Bus_Company_Name_VAN TRANS LLC (B2192),Bus_Company_Name_VINNY`S BUS SERVICES (B23,Bus_Company_Name_Y & M TRANSIT CORP (B2192,Bus_Company_Name_Y & M TRANSIT CORP (B2192),Bus_Company_Name_Y & M TRANSIT CORP (B2321,Bus_Company_Name_alina,Bus_Company_Name_phillip bus service
0,811,10/26/2015 08:30:00 AM,10/26/2015 08:40:00 AM,10MINUTES,5,Yes,Yes,No,10/26/2015 08:40:00 AM,10/26/2015 08:40:39 AM,...,False,False,False,False,False,False,False,False,False,False
1,9302,10/27/2015 07:10:00 AM,10/27/2015 07:11:00 AM,25 MINUTES,3,Yes,Yes,No,10/27/2015 07:11:00 AM,10/27/2015 07:11:22 AM,...,False,False,False,False,False,False,False,False,False,False
2,358,09/18/2015 07:36:00 AM,09/18/2015 07:38:00 AM,15 MINUTES,12,Yes,Yes,Yes,09/18/2015 07:38:00 AM,09/18/2015 07:38:44 AM,...,False,False,False,False,False,False,False,False,False,False
3,331,09/17/2015 08:08:00 AM,09/17/2015 08:12:00 AM,10 minutes,11,Yes,Yes,Yes,09/17/2015 08:12:00 AM,09/17/2015 08:12:08 AM,...,False,False,False,False,False,False,False,False,False,False
4,332,09/18/2015 07:39:00 AM,09/18/2015 07:45:00 AM,10MINUTES,12,Yes,Yes,No,09/18/2015 07:45:00 AM,09/18/2015 07:56:40 AM,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
147967,9345,04/05/2017 08:00:00 AM,04/05/2017 08:10:00 AM,15-20,7,Yes,Yes,No,04/05/2017 08:10:00 AM,04/05/2017 08:10:15 AM,...,False,False,False,False,False,False,False,False,False,False
147968,0001,04/24/2017 07:42:00 AM,04/24/2017 07:44:00 AM,20 MINS,0,Yes,Yes,No,04/24/2017 07:44:00 AM,04/24/2017 07:44:15 AM,...,False,False,False,False,False,False,False,False,False,False
147969,GC0112,05/25/2017 04:22:00 PM,05/25/2017 04:28:00 PM,20-25MINS,0,Yes,Yes,Yes,05/25/2017 04:28:00 PM,05/25/2017 04:34:36 PM,...,False,False,False,False,False,False,False,False,False,False
147970,5525D,05/25/2017 04:27:00 PM,05/25/2017 04:30:00 PM,20 MINS,0,Yes,Yes,No,05/25/2017 04:30:00 PM,05/25/2017 04:30:07 PM,...,False,False,False,False,False,False,False,False,False,False


In [595]:
bool_columns = [col for col in df_one_hot_encoded.columns if df_one_hot_encoded[col].dtype == 'bool']
for col in bool_columns:
    df_one_hot_encoded[col] = df_one_hot_encoded[col].astype(int)

In [596]:
df_one_hot_encoded

Unnamed: 0,Bus_No,Occurred_On,Created_On,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Last_Updated_On,...,"Bus_Company_Name_THOMAS BUSES, INC. (B2321",Bus_Company_Name_TWENTY FIRST AV TRANSP (B,Bus_Company_Name_VAN TRANS LLC,Bus_Company_Name_VAN TRANS LLC (B2192),Bus_Company_Name_VINNY`S BUS SERVICES (B23,Bus_Company_Name_Y & M TRANSIT CORP (B2192,Bus_Company_Name_Y & M TRANSIT CORP (B2192),Bus_Company_Name_Y & M TRANSIT CORP (B2321,Bus_Company_Name_alina,Bus_Company_Name_phillip bus service
0,811,10/26/2015 08:30:00 AM,10/26/2015 08:40:00 AM,10MINUTES,5,Yes,Yes,No,10/26/2015 08:40:00 AM,10/26/2015 08:40:39 AM,...,0,0,0,0,0,0,0,0,0,0
1,9302,10/27/2015 07:10:00 AM,10/27/2015 07:11:00 AM,25 MINUTES,3,Yes,Yes,No,10/27/2015 07:11:00 AM,10/27/2015 07:11:22 AM,...,0,0,0,0,0,0,0,0,0,0
2,358,09/18/2015 07:36:00 AM,09/18/2015 07:38:00 AM,15 MINUTES,12,Yes,Yes,Yes,09/18/2015 07:38:00 AM,09/18/2015 07:38:44 AM,...,0,0,0,0,0,0,0,0,0,0
3,331,09/17/2015 08:08:00 AM,09/17/2015 08:12:00 AM,10 minutes,11,Yes,Yes,Yes,09/17/2015 08:12:00 AM,09/17/2015 08:12:08 AM,...,0,0,0,0,0,0,0,0,0,0
4,332,09/18/2015 07:39:00 AM,09/18/2015 07:45:00 AM,10MINUTES,12,Yes,Yes,No,09/18/2015 07:45:00 AM,09/18/2015 07:56:40 AM,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
147967,9345,04/05/2017 08:00:00 AM,04/05/2017 08:10:00 AM,15-20,7,Yes,Yes,No,04/05/2017 08:10:00 AM,04/05/2017 08:10:15 AM,...,0,0,0,0,0,0,0,0,0,0
147968,0001,04/24/2017 07:42:00 AM,04/24/2017 07:44:00 AM,20 MINS,0,Yes,Yes,No,04/24/2017 07:44:00 AM,04/24/2017 07:44:15 AM,...,0,0,0,0,0,0,0,0,0,0
147969,GC0112,05/25/2017 04:22:00 PM,05/25/2017 04:28:00 PM,20-25MINS,0,Yes,Yes,Yes,05/25/2017 04:28:00 PM,05/25/2017 04:34:36 PM,...,0,0,0,0,0,0,0,0,0,0
147970,5525D,05/25/2017 04:27:00 PM,05/25/2017 04:30:00 PM,20 MINS,0,Yes,Yes,No,05/25/2017 04:30:00 PM,05/25/2017 04:30:07 PM,...,0,0,0,0,0,0,0,0,0,0


In [598]:
df["Created_On"] = pd.to_datetime(df["Created_On"])
df["Occurred_On"] = pd.to_datetime(df["Occurred_On"])
df["Informed_On"] = pd.to_datetime(df["Informed_On"])
df["Last_Updated_On"] = pd.to_datetime(df["Last_Updated_On"])

In [599]:
df['Created_On_day'] = df['Created_On'].dt.day
df['Created_On_month'] = df['Created_On'].dt.month
df['Created_On_year'] = df['Created_On'].dt.year
df['Created_On_hour'] = df['Created_On'].dt.hour
df['Created_On_minute'] = df['Created_On'].dt.minute

In [600]:

df['Occurred_On_day'] = df['Occurred_On'].dt.day
df['Occurred_On_month'] = df['Occurred_On'].dt.month
df['Occurred_On_year'] = df['Occurred_On'].dt.year
df['Occurred_On_hour'] = df['Occurred_On'].dt.hour
df['Occurred_On_minute'] = df['Occurred_On'].dt.minute

In [601]:
df['Informed_On_day'] = df['Informed_On'].dt.day
df['Informed_On_month'] = df['Informed_On'].dt.month
df['Informed_On_year'] = df['Informed_On'].dt.year
df['Informed_On_hour'] = df['Informed_On'].dt.hour
df['Informed_On_minute'] = df['Informed_On'].dt.minute

In [602]:
df['Last_Updated_On_day'] = df['Last_Updated_On'].dt.day
df['Last_Updated_On_month'] = df['Last_Updated_On'].dt.month
df['Last_Updated_On_year'] = df['Last_Updated_On'].dt.year
df['Last_Updated_On_hour'] = df['Last_Updated_On'].dt.hour
df['Last_Updated_On_minute'] = df['Last_Updated_On'].dt.minute

In [603]:
df['Breakdown_or_Running_Late'].value_counts()

Unnamed: 0_level_0,count
Breakdown_or_Running_Late,Unnamed: 1_level_1
Running Late,130857
Breakdown,17115
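
The counts above show a strong imbalance: roughly 88% of incidents are `Running Late`. One common way to account for this without resampling is to weight the classes inversely to their frequency. The sketch below uses the "balanced" heuristic, `n_samples / (n_classes * class_count)`; the helper name `balanced_class_weights` is an assumption made for illustration.

```python
# Class counts copied from the value_counts() output above.
counts = {"Running Late": 130857, "Breakdown": 17115}

def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count) (hypothetical helper)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count) for label, count in class_counts.items()}

weights = balanced_class_weights(counts)
print(weights)
```

Keras accepts such weights through the `class_weight` argument of `model.fit`, keyed by the integer class index rather than by the label text.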


In [604]:
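from sklearn.preprocessing import LabelEncoder

# The same encoder object is refitted on each column in turn.
label_encoder = LabelEncoder()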
df['Has_Contractor_Notified_Schools'] = label_encoder.fit_transform(df['Has_Contractor_Notified_Schools'])
df['Has_Contractor_Notified_Parents'] = label_encoder.fit_transform(df['Has_Contractor_Notified_Parents'])
df['Have_You_Alerted_OPT'] = label_encoder.fit_transform(df['Have_You_Alerted_OPT'])
df['Breakdown_or_Running_Late'] = label_encoder.fit_transform(df['Breakdown_or_Running_Late'])

In [605]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147972 entries, 0 to 147971
Data columns (total 35 columns):
 #   Column                           Non-Null Count   Dtype         
---  ------                           --------------   -----         
 0   Run_Type                         147972 non-null  object        
 1   Bus_No                           147972 non-null  object        
 2   Reason                           147972 non-null  object        
 3   Occurred_On                      147972 non-null  datetime64[ns]
 4   Created_On                       147972 non-null  datetime64[ns]
 5   Boro                             147972 non-null  object        
 6   Bus_Company_Name                 147972 non-null  object        
 7   How_Long_Delayed                 147972 non-null  object        
 8   Number_Of_Students_On_The_Bus    147972 non-null  int64         
 9   Has_Contractor_Notified_Schools  147972 non-null  int64         
 10  Has_Contractor_Notified_Parents  147972 non-

In [606]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().set_output(transform="pandas")

## Split the Dataset
Next, split the dataset into training, validation, and testing sets.

In [607]:
from sklearn.model_selection import train_test_split

In [608]:
X = df.drop('Breakdown_or_Running_Late', axis=1)
y = df['Breakdown_or_Running_Late']

In [609]:
X_train, X_test, y_train, y_test= train_test_split(X, y, test_size=0.2, random_state=42)
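
The split above produces only training and test sets. A possible way to also obtain a validation set, while keeping the class ratio in every part, is to call `train_test_split` twice with `stratify`. The sketch below runs on synthetic data; the 60/20/20 proportions and the `_demo` names are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a minority class of roughly 12%.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = (rng.random(1000) < 0.12).astype(int)

# Step 1: hold out 20% of all rows for testing.
X_tmp, X_te, y_tmp, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)
# Step 2: take 25% of the remaining 80% (20% overall) for validation.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42
)

print(len(X_tr), len(X_val), len(X_te))
```

With an explicit validation set, `model.fit` can take `validation_data=(X_val, y_val)` instead of `validation_split`.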

In [615]:
print(X_train.dtypes)
print(X_test.dtypes)

Run_Type                                   object
Bus_No                                     object
Reason                                     object
Occurred_On                        datetime64[ns]
Created_On                         datetime64[ns]
Boro                                       object
Bus_Company_Name                           object
How_Long_Delayed                           object
Number_Of_Students_On_The_Bus               int64
Has_Contractor_Notified_Schools             int64
Has_Contractor_Notified_Parents             int64
Have_You_Alerted_OPT                        int64
Informed_On                        datetime64[ns]
Last_Updated_On                    datetime64[ns]
Created_On_day                              int32
Created_On_month                            int32
Created_On_year                             int32
Created_On_hour                             int32
Created_On_minute                           int32
Occurred_On_day                             int32


## Building the ANN Model
In this section, define the architecture of the ANN by specifying the number of layers, neurons, and activation functions.

In [610]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Small feed-forward binary classifier: two hidden ReLU layers and one sigmoid output.
model = Sequential()
model.add(Dense(units=6, input_dim=X_train.shape[1], kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=6, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


## Compile the Model
Compile the ANN model by defining the optimizer, loss function, and evaluation metrics.

In [611]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=['accuracy'],
)

In [612]:
model.summary()

## Training the Model
Train the ANN model using the training data.

In [613]:
model.fit(X_train, y_train, epochs=50, batch_size=20, validation_split=0.2)

ValueError: Invalid dtype: datetime64[ns]
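
This error arises because `X_train` still contains `datetime64[ns]` and `object` columns, as the dtype listing earlier shows, and Keras cannot convert those into a tensor. The one-hot encoded frame built during preprocessing is also never used when `X` is defined. The sketch below shows one possible way to obtain a purely numeric feature matrix. The helper `to_numeric_matrix` and the toy frame are hypothetical, and it assumes the day/month/hour parts have already been extracted from the timestamps.

```python
import pandas as pd

def to_numeric_matrix(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a float32 frame suitable as Keras input (hypothetical helper)."""
    # Drop the raw timestamp columns; their extracted parts are assumed to exist already.
    no_dates = frame.select_dtypes(exclude=["datetime"])
    # One-hot encode the remaining text columns (get_dummies picks them automatically).
    encoded = pd.get_dummies(no_dates)
    return encoded.astype("float32")

# Toy frame mimicking a few columns of the dataset.
demo = pd.DataFrame({
    "Boro": ["Bronx", "Queens", "Bronx"],
    "Occurred_On": pd.to_datetime(["2015-10-26 08:30", "2017-05-25 16:27", "2015-09-18 07:36"]),
    "Occurred_On_hour": [8, 16, 7],
    "Number_Of_Students_On_The_Bus": [5, 0, 12],
})
X_numeric = to_numeric_matrix(demo)
print(X_numeric.dtypes)
```

Applying the encoding before the train/test split keeps the dummy columns identical in both parts. High-cardinality text columns such as `Bus_No` would produce a very wide matrix this way, so dropping or grouping them first may be preferable.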

## Evaluate the Model
Evaluate the performance of the model on the test set.

In [614]:
model.evaluate(X_test, y_test)

ValueError: Invalid dtype: datetime64[ns]
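
Once evaluation runs, accuracy alone can be misleading here: about 88% of the rows are `Running Late`, so a model that always predicts that class would already reach roughly 88% accuracy. A minimal sketch of precision, recall, and F1 for the minority class follows. It assumes `Breakdown` is encoded as `0`, which is what `LabelEncoder` yields because it sorts labels alphabetically; the helper name and the toy labels are made up.

```python
def precision_recall_f1(y_true, y_pred, positive=0):
    """Compute precision, recall and F1 for one class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy labels: 0 = Breakdown (minority), 1 = Running Late.
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
y_pred_demo = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true_demo, y_pred_demo)
accuracy = sum(t == p for t, p in zip(y_true_demo, y_pred_demo)) / len(y_true_demo)
print(accuracy, precision, recall, f1)
```

In this toy case accuracy is 0.7 while recall on the minority class is only one third, which illustrates why per-class metrics are worth reporting.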

## Make Predictions
Use the trained model to make predictions on new or unseen data.
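
With a single sigmoid output, the model returns a probability for class `1` per row. The sketch below shows how such probabilities could be turned back into readable labels. The 0.5 threshold, the helper name, and the sample values are assumptions; the class order follows the alphabetical sorting that `LabelEncoder` applies.

```python
# Index-to-label mapping in the order LabelEncoder assigns (alphabetical).
CLASS_NAMES = {0: "Breakdown", 1: "Running Late"}

def probabilities_to_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs to class names (hypothetical helper)."""
    return [CLASS_NAMES[int(p >= threshold)] for p in probabilities]

# Made-up probabilities standing in for model.predict(X_new).ravel().
sample_outputs = [0.93, 0.08, 0.51]
labels = probabilities_to_labels(sample_outputs)
print(labels)
```

Lowering the threshold makes the model predict `Running Late` more often, so the value is worth tuning on the validation set if the two kinds of error have different costs.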

## Model Performance Visualization
Visualize the performance metrics such as accuracy and loss over the epochs.
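
`model.fit` returns a `History` object whose `.history` attribute is a dictionary of per-epoch values. With `metrics=['accuracy']` and validation data, it contains the keys `loss`, `accuracy`, `val_loss`, and `val_accuracy`. The following sketch plots such a dictionary; the helper `plot_history` is hypothetical and the numbers are invented placeholders, not results from this notebook.

```python
import matplotlib.pyplot as plt

def plot_history(history: dict):
    """Plot loss and accuracy curves from a Keras-style history dict (hypothetical helper)."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history["loss"], label="train")
    ax_loss.plot(history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()
    ax_acc.plot(history["accuracy"], label="train")
    ax_acc.plot(history["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()
    fig.tight_layout()
    return fig

# Invented placeholder values, used only to exercise the function.
fake_history = {
    "loss": [0.60, 0.45, 0.38],
    "val_loss": [0.62, 0.50, 0.47],
    "accuracy": [0.70, 0.82, 0.86],
    "val_accuracy": [0.68, 0.80, 0.82],
}
fig = plot_history(fake_history)
```

In the notebook, the call would look like `history = model.fit(...)` followed by `plot_history(history.history)`. A widening gap between the training and validation curves is a typical sign of overfitting.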

## Save the Model
Save the trained model for submission.
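
A minimal sketch of saving and reloading a Keras model in the native `.keras` format follows. The tiny `demo_model`, its input width of 4, and the file name `bus_ann.keras` are placeholders chosen for illustration; in the notebook the trained `model` would be saved instead.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Small stand-in model; replace with the trained model from the notebook.
demo_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(6, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
demo_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Save to a temporary directory (hypothetical file name), then load it back.
save_path = os.path.join(tempfile.mkdtemp(), "bus_ann.keras")
demo_model.save(save_path)
restored = keras.models.load_model(save_path)

# Check that the restored model gives the same predictions as the original.
x = np.zeros((2, 4), dtype="float32")
same = np.allclose(demo_model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(save_path, same)
```

The single `.keras` file stores the architecture, weights, and compile configuration together, which makes it convenient to submit.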

## Project Questions:

1. **Data Preprocessing**: Explain why you chose your specific data preprocessing techniques (e.g., normalization, encoding). How did these techniques help prepare the data for training the model?
2. **Model Architecture**: Describe the reasoning behind your model’s architecture (e.g., the number of layers, type of layers, number of neurons, and activation functions). Why did you believe this architecture was appropriate for the problem at hand?
3. **Training Process**: Discuss why you chose your batch size, number of epochs, and optimizer. How did these choices affect the training process? Did you experiment with different values, and what were the outcomes?
4. **Loss Function and Metrics**: Why did you choose the specific loss function and evaluation metrics? How do they align with the objective of the task (e.g., regression vs classification)?
5. **Regularization Techniques**: If you used regularization techniques such as dropout or weight decay, explain why you implemented them and how they influenced the model's performance.
6. **Model Evaluation**: Justify your approach to evaluating the model. Why did you choose the specific performance metrics, and how do they reflect the model's success in solving the task?
7. **Model Tuning (If Done)**: Describe any tuning you performed (e.g., hyperparameter tuning) and why you felt it was necessary. How did these adjustments improve model performance?
8. **Overfitting and Underfitting**: Analyze whether the model encountered any overfitting or underfitting during training. What strategies could you implement to mitigate these issues?

### Answer Here: