# Exam on Artificial Neural Networks (ANN)

Welcome the Artificial Neural Networks (ANN) practical exam. In this exam, you will work on a classification task to predict the outcome of incidents involving buses. You are provided with a dataset that records breakdowns and delays in bus operations. Your task is to build, train, and evaluate an ANN model.

---

## Dataset Overview

### **Dataset:**
* Just run the command under the `Load Data` section to get the data downloaded and unzipped or you can access it [here](www.kaggle.com/datasets/khaledzsa/bus-breakdown-and-delays)

### **Dataset Name:** Bus Breakdown and Delays

### **Description:**  
The dataset contains records of incidents involving buses that were either running late or experienced a breakdown. Your task is to predict whether the bus was delayed or had a breakdown based on the features provided.

### **Features:**
The dataset contains the following columns:

- `School_Year`
- `Busbreakdown_ID`
- `Run_Type`
- `Bus_No`
- `Route_Number`
- `Reason`
- `Schools_Serviced`
- `Occurred_On`
- `Created_On`
- `Boro`
- `Bus_Company_Name`
- `How_Long_Delayed`
- `Number_Of_Students_On_The_Bus`
- `Has_Contractor_Notified_Schools`
- `Has_Contractor_Notified_Parents`
- `Have_You_Alerted_OPT`
- `Informed_On`
- `Incident_Number`
- `Last_Updated_On`
- `Breakdown_or_Running_Late` (Target Column)
- `School_Age_or_PreK`

## Load Data

In [1]:
!kaggle datasets download -d khaledzsa/bus-breakdown-and-delays
!unzip bus-breakdown-and-delays.zip

Dataset URL: https://www.kaggle.com/datasets/khaledzsa/bus-breakdown-and-delays
License(s): unknown
Downloading bus-breakdown-and-delays.zip to /content
  0% 0.00/4.75M [00:00<?, ?B/s]
100% 4.75M/4.75M [00:00<00:00, 190MB/s]
Archive:  bus-breakdown-and-delays.zip
  inflating: Bus_Breakdown_and_Delays.csv  


## Importing Libraries

In [2]:
import pandas as pd

## Exploratory Data Analysis (EDA)
This could include:
* **Inspect the dataset**

* **Dataset structure**

* **Summary statistics**

* **Check for missing values**

* **Distribution of features**

* **Categorical feature analysis**

* **Correlation matrix**

* **Outlier detection**

And add more as needed!

In [3]:
df=pd.read_csv('Bus_Breakdown_and_Delays.csv')

In [4]:
df.head()

Unnamed: 0,School_Year,Busbreakdown_ID,Run_Type,Bus_No,Route_Number,Reason,Schools_Serviced,Occurred_On,Created_On,Boro,...,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Incident_Number,Last_Updated_On,Breakdown_or_Running_Late,School_Age_or_PreK
0,2015-2016,1224901,Pre-K/EI,811,1,Other,C353,10/26/2015 08:30:00 AM,10/26/2015 08:40:00 AM,Bronx,...,10MINUTES,5,Yes,Yes,No,10/26/2015 08:40:00 AM,,10/26/2015 08:40:39 AM,Running Late,Pre-K
1,2015-2016,1225098,Pre-K/EI,9302,1,Heavy Traffic,C814,10/27/2015 07:10:00 AM,10/27/2015 07:11:00 AM,Bronx,...,25 MINUTES,3,Yes,Yes,No,10/27/2015 07:11:00 AM,,10/27/2015 07:11:22 AM,Running Late,Pre-K
2,2015-2016,1215800,Pre-K/EI,358,2,Heavy Traffic,C195,09/18/2015 07:36:00 AM,09/18/2015 07:38:00 AM,Bronx,...,15 MINUTES,12,Yes,Yes,Yes,09/18/2015 07:38:00 AM,,09/18/2015 07:38:44 AM,Running Late,Pre-K
3,2015-2016,1215511,Pre-K/EI,331,2,Other,C178,09/17/2015 08:08:00 AM,09/17/2015 08:12:00 AM,Bronx,...,10 minutes,11,Yes,Yes,Yes,09/17/2015 08:12:00 AM,,09/17/2015 08:12:08 AM,Running Late,Pre-K
4,2015-2016,1215828,Pre-K/EI,332,2,Other,S176,09/18/2015 07:39:00 AM,09/18/2015 07:45:00 AM,Bronx,...,10MINUTES,12,Yes,Yes,No,09/18/2015 07:45:00 AM,,09/18/2015 07:56:40 AM,Running Late,Pre-K


In [5]:
df.shape

(147972, 21)

In [6]:
for col in df.columns:
  print('name',col,'value',df[col].unique())
  print('//////')

name School_Year value ['2015-2016' '2016-2017' '2017-2018' '2019-2020']
//////
name Busbreakdown_ID value [1224901 1225098 1215800 ... 1353044 1353045 1353046]
//////
name Run_Type value ['Pre-K/EI' 'Special Ed AM Run' 'General Ed AM Run' 'Special Ed PM Run'
 'General Ed PM Run' 'Special Ed Field Trip' 'General Ed Field Trip' nan
 'Project Read PM Run' 'Project Read AM Run' 'Project Read Field Trip']
//////
name Bus_No value ['811' '9302' '358' ... '0096' 'GVC510' 'K9345']
//////
name Route_Number value ['1' '2' 'P640' ... '012' '29AM' '1409B']
//////
name Reason value ['Other' 'Heavy Traffic' 'Flat Tire' 'Mechanical Problem'
 'Delayed by School' 'Problem Run' 'Late return from Field Trip'
 'Won`t Start' 'Weather Conditions' 'Accident' nan]
//////
name Schools_Serviced value ['C353' 'C814' 'C195' ... 'C148' '02654, 02721,' '04377, 04454, 04658']
//////
name Occurred_On value ['10/26/2015 08:30:00 AM' '10/27/2015 07:10:00 AM'
 '09/18/2015 07:36:00 AM' ... '05/25/2017 04:22:00 PM'
 '05/

In [7]:
df.sample(8)

Unnamed: 0,School_Year,Busbreakdown_ID,Run_Type,Bus_No,Route_Number,Reason,Schools_Serviced,Occurred_On,Created_On,Boro,...,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Incident_Number,Last_Updated_On,Breakdown_or_Running_Late,School_Age_or_PreK
119893,2016-2017,1341108,Special Ed PM Run,2545,K555,Heavy Traffic,20187,04/21/2017 02:08:00 PM,04/21/2017 02:09:00 PM,Brooklyn,...,30min,0,Yes,Yes,Yes,04/21/2017 02:09:00 PM,,04/21/2017 02:09:04 PM,Running Late,School-Age
82531,2016-2017,1303288,Special Ed PM Run,9944,M023,Heavy Traffic,06004,11/29/2016 01:32:00 PM,11/29/2016 01:33:00 PM,Manhattan,...,60 MIN,0,Yes,Yes,No,11/29/2016 01:33:00 PM,,11/29/2016 01:35:41 PM,Running Late,School-Age
120635,2016-2017,1341877,Special Ed AM Run,1358,M217,Mechanical Problem,02507,04/25/2017 07:00:00 AM,04/25/2017 07:15:00 AM,Manhattan,...,1 HR,0,Yes,Yes,Yes,04/25/2017 07:15:00 AM,,04/25/2017 07:15:04 AM,Running Late,School-Age
100081,2016-2017,1321160,Special Ed AM Run,265,X888,Heavy Traffic,1106811514,02/06/2017 06:45:00 AM,02/06/2017 06:48:00 AM,Bronx,...,30MNS,0,Yes,Yes,No,02/06/2017 06:48:00 AM,,02/06/2017 06:48:04 AM,Running Late,School-Age
42495,2015-2016,1262770,Special Ed AM Run,NI4496,K299,Heavy Traffic,16308,04/15/2016 07:10:00 AM,04/15/2016 07:20:00 AM,Brooklyn,...,15,5,Yes,Yes,No,04/15/2016 07:20:00 AM,,04/15/2016 07:20:54 AM,Running Late,School-Age
38074,2015-2016,1258232,Special Ed AM Run,NI3185,K014,Other,13093,03/28/2016 06:30:00 AM,03/28/2016 06:43:00 AM,Brooklyn,...,20,0,Yes,Yes,No,03/28/2016 06:43:00 AM,,03/28/2016 06:43:06 AM,Running Late,School-Age
94515,2016-2017,1315352,Pre-K/EI,9815,2,Other,C329,01/12/2017 02:00:00 PM,01/12/2017 02:23:00 PM,Bronx,...,25-45,0,No,Yes,No,01/12/2017 02:23:00 PM,,01/12/2017 02:23:40 PM,Running Late,Pre-K
68711,2016-2017,1289412,Special Ed AM Run,4510,L783,Mechanical Problem,21736,09/29/2016 07:55:00 AM,09/29/2016 07:56:00 AM,,...,,1,Yes,Yes,Yes,09/29/2016 07:56:00 AM,,09/29/2016 07:56:51 AM,Breakdown,School-Age


In [8]:
#Busbreakdown_ID Informed_On value Bus_Company_Name

In [9]:
df1=df

In [10]:

df1.head()

Unnamed: 0,School_Year,Busbreakdown_ID,Run_Type,Bus_No,Route_Number,Reason,Schools_Serviced,Occurred_On,Created_On,Boro,...,How_Long_Delayed,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Informed_On,Incident_Number,Last_Updated_On,Breakdown_or_Running_Late,School_Age_or_PreK
0,2015-2016,1224901,Pre-K/EI,811,1,Other,C353,10/26/2015 08:30:00 AM,10/26/2015 08:40:00 AM,Bronx,...,10MINUTES,5,Yes,Yes,No,10/26/2015 08:40:00 AM,,10/26/2015 08:40:39 AM,Running Late,Pre-K
1,2015-2016,1225098,Pre-K/EI,9302,1,Heavy Traffic,C814,10/27/2015 07:10:00 AM,10/27/2015 07:11:00 AM,Bronx,...,25 MINUTES,3,Yes,Yes,No,10/27/2015 07:11:00 AM,,10/27/2015 07:11:22 AM,Running Late,Pre-K
2,2015-2016,1215800,Pre-K/EI,358,2,Heavy Traffic,C195,09/18/2015 07:36:00 AM,09/18/2015 07:38:00 AM,Bronx,...,15 MINUTES,12,Yes,Yes,Yes,09/18/2015 07:38:00 AM,,09/18/2015 07:38:44 AM,Running Late,Pre-K
3,2015-2016,1215511,Pre-K/EI,331,2,Other,C178,09/17/2015 08:08:00 AM,09/17/2015 08:12:00 AM,Bronx,...,10 minutes,11,Yes,Yes,Yes,09/17/2015 08:12:00 AM,,09/17/2015 08:12:08 AM,Running Late,Pre-K
4,2015-2016,1215828,Pre-K/EI,332,2,Other,S176,09/18/2015 07:39:00 AM,09/18/2015 07:45:00 AM,Bronx,...,10MINUTES,12,Yes,Yes,No,09/18/2015 07:45:00 AM,,09/18/2015 07:56:40 AM,Running Late,Pre-K


In [11]:
df1=df1.drop(columns='Busbreakdown_ID')

In [12]:
df1=df1.drop(columns='Informed_On')

In [13]:
df1=df1.drop(columns='Bus_Company_Name')

In [None]:
vlue=df1[[]]

## Data Preprocessing
This could include:

* **Handle Missing Values**
    * Impute missing values or drop them.

* **Encode Categorical Variables**
    * One-hot encoding
    * Label encoding

* **Scale and Normalize Data**
    * Standardization (Z-score)
    * Min-Max scaling

* **Feature Engineering**
    * Create new features
    * Feature selection

* **Handle Imbalanced Data**
    * Oversampling
    * Undersampling

* **Handle Outliers**
    * Remove outliers
    * Transform outliers

* **Remove Duplicates**
    * Remove redundant or duplicate data


And add more as needed!

Please treat these as suggestions. Feel free to use your judgment for the rest.

In [14]:
df1.isnull().sum()

Unnamed: 0,0
School_Year,0
Run_Type,89
Bus_No,0
Route_Number,88
Reason,102
Schools_Serviced,0
Occurred_On,0
Created_On,0
Boro,6318
How_Long_Delayed,21630


In [15]:
df1=df1.drop(columns='How_Long_Delayed')

In [16]:
df1['Run_Type']=df1['Run_Type'].dropna

In [17]:
df1['Route_Number']=df1['Route_Number'].dropna

In [18]:
df1['Reason']=df1['Reason'].dropna

In [19]:
df1['Boro']=df1['Boro'].fillna(df1['Boro'].mode)

In [55]:
df1.isnull().sum()

Unnamed: 0,0
School_Year,0
Run_Type,0
Bus_No,0
Route_Number,0
Reason,0
Schools_Serviced,0
Occurred_On,0
Created_On,0
Boro,0
Number_Of_Students_On_The_Bus,0


In [21]:
df1.duplicated().sum()

35

In [22]:
#df1.drop_duplicated()

## Split the Dataset
Next, split the dataset into training, validation, and testing sets.

In [23]:
from sklearn.preprocessing import LabelEncoder

In [24]:
le=LabelEncoder()
for col in df1.columns:
  if df1[col].dtype=='object':
    df1[col]=le.fit_transform(df[col])

In [None]:
#from sklearn.model_selection import train_test_split
#x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,random_state=42)

In [25]:
x=df1.drop(columns='Breakdown_or_Running_Late')
y=df1['Breakdown_or_Running_Late']

In [26]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,random_state=42)

## Building the ANN Model
In this section, define the architecture of the ANN by specifying the number of layers, neurons, and activation functions.

In [27]:
import tensorflow
import keras
from tensorflow.keras import Sequential
from keras.layers import Dense

In [28]:
model=Sequential([
    Dense(3,input_dim=x_train.shape[1],kernel_initializer='uniform',activation='relu'),
    Dense(8,activation='relu'),
    Dense(1,activation='sigmoid')
])

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


In [29]:
model.summary()

## Compile the Model
Compile the ANN model by defining the optimizer, loss function, and evaluation metrics.

In [30]:
model.compile(optimizer='adam',loss='BinaryCrossentropy',metrics=['accuracy'])

## Training the Model
Train the ANN model using the training data.

In [31]:
x_train

Unnamed: 0,School_Year,Run_Type,Bus_No,Route_Number,Reason,Schools_Serviced,Occurred_On,Created_On,Boro,Number_Of_Students_On_The_Bus,Has_Contractor_Notified_Schools,Has_Contractor_Notified_Parents,Have_You_Alerted_OPT,Incident_Number,Last_Updated_On,School_Age_or_PreK
144656,1,3,8127,447,3,5021,59425,64283,1,12,1,1,0,4666,123037,0
26428,0,9,1165,4492,3,584,6067,6599,4,1,1,1,1,133,14511,1
110260,1,7,8821,11139,3,1914,16780,18189,11,4,1,1,1,4666,37388,1
9490,0,7,8033,4082,3,47,51473,55467,4,3,1,1,1,4666,106009,1
58164,1,3,76,215,6,4953,43219,46390,2,11,1,1,0,4666,88870,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119879,1,9,4760,4228,3,634,24581,26513,4,0,1,0,0,4666,53592,1
103694,1,2,3668,10863,4,1739,10576,11528,1,0,0,1,0,4666,24687,1
131932,1,0,1061,6788,5,4278,34412,36985,7,0,0,0,0,4666,73729,1
146867,1,7,5586,7120,3,4047,35242,37910,7,0,1,1,1,4666,75329,1


In [32]:
model.fit(x_train, y_train, epochs=5, batch_size=5, validation_split=0.15)

Epoch 1/5
[1m18867/18867[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m35s[0m 2ms/step - accuracy: 0.8772 - loss: 6.0731 - val_accuracy: 0.8875 - val_loss: 0.3516
Epoch 2/5
[1m18867/18867[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m33s[0m 2ms/step - accuracy: 0.8827 - loss: 0.3617 - val_accuracy: 0.8875 - val_loss: 0.3517
Epoch 3/5
[1m18867/18867[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m33s[0m 2ms/step - accuracy: 0.8818 - loss: 0.3637 - val_accuracy: 0.8875 - val_loss: 0.3517
Epoch 4/5
[1m18867/18867[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 2ms/step - accuracy: 0.8814 - loss: 0.3645 - val_accuracy: 0.8875 - val_loss: 0.3517
Epoch 5/5
[1m18867/18867[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 2ms/step - accuracy: 0.8819 - loss: 0.3634 - val_accuracy: 0.8875 - val_loss: 0.3525


<keras.src.callbacks.history.History at 0x78fba63c8f40>

In [51]:
model.fit(x_train, y_train, epochs=10, batch_size=30, validation_split=0.15)

Epoch 1/10
[1m3145/3145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 2ms/step - accuracy: 0.8824 - loss: 0.3625 - val_accuracy: 0.8875 - val_loss: 0.3516
Epoch 2/10
[1m3145/3145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 2ms/step - accuracy: 0.8836 - loss: 0.3598 - val_accuracy: 0.8875 - val_loss: 0.3518
Epoch 3/10
[1m3145/3145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 2ms/step - accuracy: 0.8829 - loss: 0.3612 - val_accuracy: 0.8875 - val_loss: 0.3516
Epoch 4/10
[1m3145/3145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 2ms/step - accuracy: 0.8838 - loss: 0.3593 - val_accuracy: 0.8875 - val_loss: 0.3516
Epoch 5/10
[1m3145/3145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 2ms/step - accuracy: 0.8845 - loss: 0.3578 - val_accuracy: 0.8875 - val_loss: 0.3516
Epoch 6/10
[1m3145/3145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 2ms/step - accuracy: 0.8835 - loss: 0.3599 - val_accuracy: 0.8875 - val_loss: 0.3517
Epoch 7/10


<keras.src.callbacks.history.History at 0x78fba53fcaf0>

## Evaluate the Model
Evaluate the performance of the model on the test set.

In [33]:
results = model.evaluate(x_test, y_test)

[1m1157/1157[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - accuracy: 0.8860 - loss: 0.3559


## Make Predictions
Use the trained model to make predictions on new or unseen data.

In [34]:
y_pre=model.predict(x_train)

[1m3469/3469[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 1ms/step


## Model Performance Visualization
Visualize the performance metrics such as accuracy and loss over the epochs.

In [54]:
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import OneHotEncoder
hot=OneHotEncoder()
x_train=hot.fit_transform(x_train)
cm=confusion_matrix(y_train,y_pre)
sns.heatmap(cm, annot=True, fmt=".1f")

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

## Save the Model
Save the trained model for submission.

In [None]:
model.summary()


In [50]:
tensorflow.saved_model.save(model, 'Bus_Breakdown_and_Delays')

## Project Questions:

1. **Data Preprocessing**: Explain why you chose your specific data preprocessing techniques (e.g., normalization, encoding). How did these techniques help prepare the data for training the model?
2. **Model Architecture**: Describe the reasoning behind your model’s architecture (e.g., the number of layers, type of layers, number of neurons, and activation functions). Why did you believe this architecture was appropriate for the problem at hand?
3. **Training Process**: Discuss why you chose your batch size, number of epochs, and optimizer. How did these choices affect the training process? Did you experiment with different values, and what were the outcomes?
4. **Loss Function and Metrics**: Why did you choose the specific loss function and evaluation metrics? How do they align with the objective of the task (e.g., regression vs classification)?
5. **Regularization Techniques**: If you used regularization techniques such as dropout or weight decay, explain why you implemented them and how they influenced the model's performance.
6. **Model Evaluation**: Justify your approach to evaluating the model. Why did you choose the specific performance metrics, and how do they reflect the model's success in solving the task?
7. **Model Tuning (If Done)**: Describe any tuning you performed (e.g., hyperparameter tuning) and why you felt it was necessary. How did these adjustments improve model performance?
8. **Overfitting and Underfitting**: Analyze whether the model encountered any overfitting or underfitting during training. What strategies could you implement to mitigate these issues?

In [None]:
#Data Preprocessing: I use label incoding for the catogracal fetuar
#Model Architecture:one input layer with 3 unit one hiden with 8 unit output layer with 1 unit
#Training Process:I  use tow traing with defrent  batch size, number of epochs the best accuracy whene i use big number
#specific loss function and evaluation metrics: for the target Binar
#Model Evaluation:The confusion matrix is always
#Model Tuning (If Done): nothing
#The data needs some adjustments and has extreme values ​​that may lead to some errors.

### Answer Here: