<a href="https://colab.research.google.com/github/ZygimantasRudys/Baigiamasis_darbas/blob/main/Baigiamasis_darbas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Network with Keras tutorial
In this tutorial I use a feedforward neural network (FNN), also known as a fully connected network or multi-layer perceptron (MLP).

Architecture: an input layer, one or more hidden layers and an output layer. Every neuron in a layer is connected to every neuron in the previous and the next layer.

Usage: commonly applied to tabular data, in both regression and classification tasks.
This notebook implements an FNN with two hidden layers.
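Concretely, each fully connected layer computes activation(inputs @ weights + biases). The minimal NumPy sketch below runs one forward pass through a network with the same shape as the one built in this notebook (4 inputs, hidden layers of 64 and 32 ReLU units, 1 sigmoid output). The weights are random and untrained (drawn with scale 0.1, an arbitrary choice; Keras' Dense layers default to Glorot-uniform initialization), so the outputs mean nothing beyond their shape and range:

```python
import numpy as np

def relu(z):
    # ReLU: keep positive values, replace negatives with 0
    return np.maximum(z, 0.0)

def sigmoid(z):
    # Squash any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
n_features = 4  # raceId, driverId, points, wins

# Hypothetical untrained parameters: small random weights, zero biases
W1, b1 = rng.normal(scale=0.1, size=(n_features, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 32)), np.zeros(32)
W3, b3 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)

def forward(X):
    # Each Dense layer computes activation(X @ W + b)
    h1 = relu(X @ W1 + b1)        # hidden layer 1: 64 units
    h2 = relu(h1 @ W2 + b2)       # hidden layer 2: 32 units
    return sigmoid(h2 @ W3 + b3)  # output layer: one probability per row

X = rng.normal(size=(3, n_features))  # three hypothetical input rows
probs = forward(X)
print(probs.shape)  # (3, 1)
```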

Step 0: download the dataset.

In [27]:
# Download the dataset with the Kaggle CLI. Kaggle downloads require authentication:
# upload your API token (kaggle.json) to ~/.kaggle/ first.
# (Fetching the dataset page URL with wget only saves an HTML page, not the CSV.)
!kaggle datasets download -d rohanrao/formula-1-world-championship-1950-2020 --unzip

# Load the dataset and display the first few rows
import pandas as pd
df = pd.read_csv('driver_standings.csv')
print(df.head())

   driverStandingsId  raceId  driverId  points  position positionText  wins  \
0                  1      18         1    10.0         1            1     1   
1                  2      18         2     8.0         2            2     0   
2                  3      18         3     6.0         3            3     0   
3                  4      18         4     5.0         4            4     0   
4                  5      18         5     4.0    

Step 1: import the pandas library for data manipulation.

Then load the CSV file containing the dataset into a DataFrame called df.

In [1]:
# Prepare the tools required for a dataset
import pandas as pd

# Load the dataset
df = pd.read_csv('driver_standings.csv')

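Step 2: create a new column called 'win' in the DataFrame df.

This column holds a binary value: 1 if the driver's position is 1, otherwise 0. In driver_standings.csv, position is the driver's championship standing after the race (points and wins are season totals up to that race), so 'win' = 1 marks the championship leader after that race rather than the race winner.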

In [2]:
# Create the target column 'win': 1 if the driver leads the championship standings after this race, else 0
df['win'] = (df['position'] == 1).astype(int)

Step 3: define the list of feature columns used for training (features).

Create X as a DataFrame containing the selected feature columns.

Create y as a Series containing the target variable ('win').

In [3]:
# Selection of relevant features for training
features = ['raceId', 'driverId', 'points', 'wins']  # You can adjust features as needed based on dataset
X = df[features]
y = df['win']
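Because the standings contain only one driver in position 1 after each race, the 'win' target is heavily imbalanced. A quick way to see this is the class share of the target; in the notebook that would be y.value_counts(normalize=True). The sketch below runs the same check on a small hypothetical table (the IDs and positions are made up for illustration):

```python
import pandas as pd

# Hypothetical mini-standings table: after each race exactly one driver is in position 1
df = pd.DataFrame({
    'raceId':   [18, 18, 18, 18, 19, 19, 19, 19],
    'driverId': [1, 2, 3, 4, 1, 2, 3, 4],
    'position': [1, 2, 3, 4, 2, 1, 3, 4],
})
df['win'] = (df['position'] == 1).astype(int)

# Share of each class (0 = not leading, 1 = leading) in the target
class_share = df['win'].value_counts(normalize=True)
print(class_share)
```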

Step 4: import the train_test_split function from scikit-learn for splitting the data.

Then import the classes for creating and defining a neural network from tensorflow.keras.

In [4]:
# Import necessary libraries for model training
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

Step 5: split the dataset into training and testing sets (80% for training, 20% for testing) using train_test_split.

In [5]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# random_state fixes the random shuffle so the split is reproducible; any integer works.
# 42 is a cultural reference to Douglas Adams' science fiction series "The Hitchhiker's Guide
# to the Galaxy", where 42 is humorously given as the "answer to the ultimate question of
# life, the universe, and everything."
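With a rare positive class, a purely random split can leave the test set with noticeably more or fewer positives than the training set. scikit-learn's train_test_split accepts a stratify argument that preserves the class proportions in both parts. The original notebook does not use it; this is a sketch on synthetic data (100 rows, 10% positives) showing the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: 100 rows, 10 of them positive
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 10 + [0] * 90)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# With stratify=y, both splits keep the 10% positive rate
print(y_train.sum(), y_test.sum())
```

In the notebook this would read train_test_split(X, y, test_size=0.2, random_state=42, stratify=y).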

Step 6: create a Sequential model. It stacks layers linearly, which suits a simple FNN where each layer has exactly one input tensor and one output tensor.

Add three dense (fully connected) layers to the model:

- The first layer has 64 neurons and uses the ReLU activation function.
- The second layer has 32 neurons and also uses ReLU.
- The output layer has 1 neuron and uses the sigmoid activation function for binary classification.

In [6]:
# Step 6: Define the neural network model
model = Sequential()  # layers are applied in the order they are added
model.add(Dense(64, input_dim=len(features), activation='relu'))  # 1st hidden layer; input_dim = number of feature columns
model.add(Dense(32, activation='relu'))  # 2nd hidden layer: fewer neurons reduce model complexity gradually
model.add(Dense(1, activation='sigmoid'))  # Sigmoid activation for binary classification
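The size of this network follows directly from the layer widths: a Dense layer with n inputs and u units has n·u weights plus u biases. A short calculation for the 4 → 64 → 32 → 1 architecture (4 being the number of feature columns) gives the totals that model.summary() should report:

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

layers = [(4, 64), (64, 32), (32, 1)]  # (inputs, units) for the three Dense layers
counts = [dense_params(i, u) for i, u in layers]
print(counts, sum(counts))  # [320, 2080, 33] 2433
```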

Step 7: compile the model, specifying the loss function (binary_crossentropy), the optimizer (adam) and the metric (accuracy).

In [7]:
# Compiling the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
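binary_crossentropy averages the per-example loss −[y·log(p) + (1 − y)·log(1 − p)], where y is the true label and p the predicted probability. A minimal NumPy sketch (the clipping constant eps=1e-7 is an assumption mirroring Keras' default fuzz factor) shows what the number means: confident correct predictions give a small loss, a 50/50 guess gives ln 2:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 so log() stays finite
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Average of -[y*log(p) + (1-y)*log(1-p)] over all examples
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0])
loss_guess = binary_crossentropy(y_true, np.array([0.5, 0.5]))  # ln 2, about 0.6931
loss_good = binary_crossentropy(y_true, np.array([0.9, 0.1]))   # -ln 0.9, about 0.1054
print(loss_guess, loss_good)
```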

Step 8: train the model on the training data (X_train, y_train) for 50 epochs with a batch size of 32.

Pass the test data (X_test, y_test) as validation data to monitor the model's performance on data it is not trained on.

In [8]:
# Training the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))
# 50 epochs is a heuristic choice: compare training and validation loss to judge whether the model under- or overfits
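Fifty epochs is a fixed budget rather than a guarantee against overfitting. Keras can stop training automatically once the validation loss stops improving, via tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True) passed as callbacks=[...] to model.fit. The sketch below is a simplified, framework-free illustration of that patience rule (not Keras' actual implementation), applied to a hypothetical list of validation losses:

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float('inf')
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # never triggered: ran all epochs

# Hypothetical validation losses that bottom out at epoch 3
print(early_stopping_epoch([0.5, 0.4, 0.35, 0.36, 0.37, 0.38], patience=3))  # (6, 3)
```

With restore_best_weights=True, Keras would then roll the model back to the weights from the best epoch (epoch 3 in this example).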


Step 9: evaluate the trained model on the testing data (X_test, y_test).

Print the model's accuracy on the testing data.

In [9]:
# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Model accuracy: {accuracy * 100:.2f}%")

Model accuracy: 97.32%
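Accuracy on its own can flatter a model when one class dominates: a classifier that always predicts 0 already scores the share of negative rows. A useful sanity check is to compare the 97.32% above with that majority-class baseline, (y_test == 0).mean() in the notebook, and to look at precision and recall for the positive class, e.g. using (model.predict(X_test).ravel() > 0.5).astype(int) as predictions. The sketch below shows the baseline metrics on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical test labels: 2 positives out of 20 (championship leaders are rare)
y_true = np.array([1] * 2 + [0] * 18)

# Majority-class baseline: always predict 0 ("not leading")
y_baseline = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_baseline)
prec = precision_score(y_true, y_baseline, zero_division=0)  # no predicted positives -> 0
rec = recall_score(y_true, y_baseline, zero_division=0)      # misses every positive -> 0
print(f'accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}')
```

A high accuracy paired with a recall of 0 means the model never identifies the positive class, which is why the baseline comparison matters here.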


Step 10: save the trained model to a file named 'f1_model.keras' for later use.

In [10]:
# Save the model in the native Keras format
# (saving to 'f1_model.h5' also works, but Keras warns that HDF5 is a legacy format)
model.save('f1_model.keras')


Step 11: prompt the user to enter values for:

- raceId
- driverId
- points
- wins

raceId, driverId and wins are converted to integers; points is converted to a float, because the points column holds decimal values (e.g. 10.0).

In [23]:
# Input fields for user data
raceId = int(input('Enter Race ID: '))
driverId = int(input('Enter Driver ID: '))
points = float(input('Enter Points: '))  # points can be fractional, so keep decimals
wins = int(input('Enter Wins: '))

Enter Race ID: 2
Enter Driver ID: 12
Enter Points: 50
Enter Wins: 1


Step 12: create a dictionary with the user input values.

Convert this dictionary into a DataFrame (new_race_df) whose columns match the features used for training, in the same order.

In [24]:
# Creating a dataframe for new (user) inputs
new_race_data = {
    'raceId': [raceId],
    'driverId': [driverId],
    'points': [points],
    'wins': [wins]
}
new_race_df = pd.DataFrame(new_race_data)

Step 13: use the trained model to predict the win probability for the new (user) input data.

In [25]:
# Making a prediction
new_race_predictions = model.predict(new_race_df)



Step 14: extract the predicted win probability from the model's output.

Then print it for the specified driver and race.

In [26]:
# Show predicted result
win_probability = new_race_predictions[0][0]
print(f'Predicted win probability for driver {driverId} in race {raceId}: {win_probability * 100:.2f}%')

Predicted win probability for driver 12 in race 2: 0.00%
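The model is trained on raw feature values with very different scales (raceId and driverId are large identifiers, while wins is a small count), and dense networks usually train more reliably on standardized inputs. Feature scaling is not part of the original notebook, and this sketch does not show whether it changes predictions like the 0.00% above; it only illustrates the pattern with scikit-learn's StandardScaler on hypothetical rows: fit on the training data only, then apply the same transformation to any new input.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training rows: raceId, driverId, points, wins (very different ranges)
X_train = np.array([
    [18, 1, 10.0, 1],
    [18, 2, 8.0, 0],
    [900, 30, 200.0, 10],
    [1050, 12, 50.0, 1],
])
new_row = np.array([[2, 12, 50.0, 1]])  # same values as the user input in step 11

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn per-column mean/std from training data only
new_row_scaled = scaler.transform(new_row)      # reuse those statistics for new input

print(X_train_scaled.mean(axis=0).round(6))  # close to 0 for every column
print(X_train_scaled.std(axis=0).round(6))   # close to 1 for every column
print(new_row_scaled)
```

In the notebook, the equivalent would be fitting the scaler on X_train, transforming X_train, X_test and new_race_df, and retraining the model.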


Example of potential user inputs to test the model:

- Race ID: 8
- Driver ID: 30
- Points: 200
- Wins: 10