### Project Imports 

In [65]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn

### Load in dataset

In [66]:
games_df = pd.read_csv("flames.csv")
games_df.isnull().sum()  # Count missing values per column to see whether any cleaning is needed

size         0
fuel         0
distance     0
decibel      0
airflow      0
frequency    0
status       0
dtype: int64

#### Dataset retrieved from: https://www.kaggle.com/datasets/muratkokludataset/acoustic-extinguisher-fire-dataset

"The dataset was obtained as a result of the extinguishing tests of four different fuel flames with a sound wave extinguishing system. The sound wave fire-extinguishing system consists of 4 subwoofers with a total power of 4,000 Watt placed in the collimator cabinet. There are two amplifiers that enable the sound come to these subwoofers as boosted. Power supply that powers the system and filter circuit ensuring that the sound frequencies are properly transmitted to the system is located within the control unit. While computer is used as frequency source, anemometer was used to measure the airflow resulted from sound waves during the extinguishing phase of the flame, and a decibel meter to measure the sound intensity. An infrared thermometer was used to measure the temperature of the flame and the fuel can, and a camera is installed to detect the extinction time of the flame. 

A total of 17,442 tests were conducted with this experimental setup. The experiments are planned as follows:
1. Three different liquid fuels and LPG fuel were used to create the flame.
2. 5 different sizes of liquid fuel cans are used to achieve different size of flames.
3. Half and full gas adjustment is used for LPG fuel.
4. While carrying out each experiment, the fuel container, at 10 cm distance, was moved forward up to 190 cm by increasing the distance by 10 cm each time.
5. Along with the fuel container, anemometer and decibel meter were moved forward in the same dimensions.
6. Fire extinguishing experiments was conducted with 54 different frequency sound waves at each distance and flame size.

Throughout the flame extinguishing experiments, the data obtained from each measurement device was recorded and a dataset was created. The dataset includes the features of fuel container size representing the flame size, fuel type, frequency, decibel, distance, airflow and flame extinction. Accordingly, 6 input features and 1 output feature will be used in models. The explanation of a total of seven features for liquid fuels in the dataset is given in Table 1, and the explanation of 7 features for LPG fuel is given in Table 2.
The status property (flame extinction or non-extinction states) can be predicted by using six features in the dataset. Status and fuel features are categorical, while other features are numerical. 8,759 of the 17,442 test results are the non-extinguishing state of the flame. 8,683 of them are the extinction state of the flame. According to these numbers, it can be said that the class distribution of the dataset is almost equal."				
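
The test count and class balance quoted above can be sanity-checked with a little arithmetic. The sketch below assumes one reading of the description: each of the three liquid fuels was tested at all five can sizes, LPG at two gas settings, and every flame configuration at every distance and frequency. The variable names are illustrative and not part of the dataset.

```python
# Experimental grid as read from the quoted description (an interpretation, not an official formula).
liquid_fuels = 3
can_sizes = 5
lpg_settings = 2                          # half and full gas adjustment
distances = len(range(10, 191, 10))       # 10 cm to 190 cm in 10 cm steps
frequencies = 54

flame_configs = liquid_fuels * can_sizes + lpg_settings
total_tests = flame_configs * distances * frequencies
print(total_tests)  # 17442

# Class counts quoted in the description.
non_extinction, extinction = 8759, 8683
assert non_extinction + extinction == total_tests
print(f"non-extinction share: {non_extinction / total_tests:.4f}")
print(f"extinction share:     {extinction / total_tests:.4f}")
```

Under this reading the grid reproduces the stated total, and the two classes differ by well under one percentage point, which is consistent with the "almost equal" claim.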


In [67]:
games_df.head()

Unnamed: 0,size,fuel,distance,decibel,airflow,frequency,status
0,1,gasoline,10,96,0.0,75,0
1,1,gasoline,10,96,0.0,72,1
2,1,gasoline,10,96,2.6,70,1
3,1,gasoline,10,96,3.2,68,1
4,1,gasoline,10,109,4.5,67,1


### Extract the Feature Matrix and Labels


In [68]:
X = games_df[['decibel', 'airflow', 'frequency']].to_numpy()  # feature matrix; X is capitalized by convention

y = games_df[['status']].to_numpy().flatten()  # double brackets give a 2D (n, 1) array, but sklearn expects 1D labels, so flatten
# Fuel type was originally used as the label; status (extinguished or not) is the better target.
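
The need for `.flatten()` comes from how pandas indexing works: a list of column names returns a DataFrame (2D), while a single name returns a Series (1D). A minimal sketch on a made-up frame, where `demo_df` and its values are purely illustrative:

```python
import pandas as pd

# Tiny stand-in frame; the values are invented for illustration.
demo_df = pd.DataFrame({"airflow": [0.0, 2.6, 3.2], "status": [0, 1, 1]})

y_2d = demo_df[["status"]].to_numpy()   # list of columns -> DataFrame -> shape (3, 1)
y_flat = y_2d.flatten()                 # collapse to shape (3,)
y_1d = demo_df["status"].to_numpy()     # single column -> Series -> already shape (3,)

print(y_2d.shape, y_flat.shape, y_1d.shape)
```

Selecting the column with single brackets would give the same 1D labels without the extra flattening step.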

In [69]:
y

array([0, 1, 1, ..., 0, 0, 0])

In [70]:
y.shape #check if all the data works

(17442,)

### Train Test Split

In [71]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13) #split data into test and train set

In [72]:
print(f"X_train: {X_train.shape}, y_train: {y_train.shape}")
print(f"X_test: {X_test.shape}, y_test: {y_test.shape}")  # check the number of rows in each split

X_train: (13953, 3), y_train: (13953,)
X_test: (3489, 3), y_test: (3489,)
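
The row counts above follow from how `train_test_split` handles a fractional `test_size`: the test set gets the ceiling of `test_size * n_samples`, and the training set gets the remainder. A small sketch with placeholder arrays of the same length (the zeros stand in for the real features and labels):

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 17442
X_demo = np.zeros((n_rows, 3))            # placeholder features
y_demo = np.zeros(n_rows, dtype=int)      # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=13
)

expected_test = math.ceil(0.2 * n_rows)   # 3488.4 rounds up to 3489
print(X_tr.shape, X_te.shape, expected_test)
```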


### Training the Model

In [73]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier  # Not used yet; may try it on this dataset if time allows
from sklearn.metrics import accuracy_score, f1_score
# n_neighbors is the hyperparameter k: the number of nearest neighbors that take a majority vote on the class.
# best_index is defined in the Hyperparameter Tuning section below, so that cell must have been run first.
model = KNeighborsClassifier(n_neighbors=best_index)
model.fit(X_train, y_train)

In [74]:
model

### Hyperparameter Tuning

In [75]:
X_train2, X_val, y_train2, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=13)


In [76]:
accuracies = []
for i in range(1, 100):
    print(f"Training a model with k = {i}.")
    model = KNeighborsClassifier(n_neighbors=i)
    model.fit(X_train2, y_train2)
    predictions = model.predict(X_val)
    accuracy = accuracy_score(y_val, predictions)
    accuracies.append(accuracy)
print(max(accuracies))

Training a model with k = 1.
Training a model with k = 2.
Training a model with k = 3.
Training a model with k = 4.
Training a model with k = 5.
Training a model with k = 6.
Training a model with k = 7.
Training a model with k = 8.
Training a model with k = 9.
Training a model with k = 10.
Training a model with k = 11.
Training a model with k = 12.
Training a model with k = 13.
Training a model with k = 14.
Training a model with k = 15.
Training a model with k = 16.
Training a model with k = 17.
Training a model with k = 18.
Training a model with k = 19.
Training a model with k = 20.
Training a model with k = 21.
Training a model with k = 22.
Training a model with k = 23.
Training a model with k = 24.
Training a model with k = 25.
Training a model with k = 26.
Training a model with k = 27.
Training a model with k = 28.
Training a model with k = 29.
Training a model with k = 30.
Training a model with k = 31.
Training a model with k = 32.
Training a model with k = 33.
Training a model with k = 34.
Training a model w

In [77]:
best_index = accuracies.index(max(accuracies))  # 0-based list position; the loop starts at k = 1, so the matching k is best_index + 1
best_index

8
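
One way to make the tuning step less error-prone is to keep the list of candidate k values next to the accuracies, use a separate variable for the loop's models, and refit a final model with the chosen k. The sketch below is an assumption-laden illustration, not the notebook's original procedure: it runs on synthetic data from `make_classification`, tries only k = 1 to 30 to stay fast, and the names `k_values`, `candidate`, `best_k`, and `final_model` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the three numeric features (assumption: not the flames data).
X_syn, y_syn = make_classification(
    n_samples=600, n_features=3, n_informative=3, n_redundant=0, random_state=13
)

# Hold out a test set, then carve a validation set out of the training portion.
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=13)
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, test_size=0.2, random_state=13)

k_values = list(range(1, 31))
val_accuracies = []
for k in k_values:
    candidate = KNeighborsClassifier(n_neighbors=k)   # loop variable kept separate from the final model
    candidate.fit(X_fit, y_fit)
    val_accuracies.append(accuracy_score(y_val, candidate.predict(X_val)))

best_index = val_accuracies.index(max(val_accuracies))  # 0-based position in the list
best_k = k_values[best_index]                            # map the position back to the actual k

# Refit on the full training portion with the chosen k before touching the test set.
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(X_tr, y_tr)
test_accuracy = accuracy_score(y_te, final_model.predict(X_te))
print(f"best k = {best_k}, test accuracy = {test_accuracy:.3f}")
```

Looking up `k_values[best_index]` avoids the off-by-one between list position and k, and the separate `final_model` name means the test evaluation cannot accidentally pick up the last model from the loop.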

### Test Model Performance

In [78]:
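# Caution: if the cells ran in the order shown, `model` is the last model fitted in the
# tuning loop (k = 99, trained on X_train2), because the loop reassigns `model`.
y_pred = model.predict(X_test)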

In [79]:
y_test

array([0, 0, 1, ..., 1, 1, 0])

### Model Evaluation

In [81]:
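accuracy = accuracy_score(y_test, y_pred)
# With average='micro' on a single-label problem, F1 equals accuracy, which is why the two numbers below match.
F1 = f1_score(y_test, y_pred, average='micro')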
print(f"Accuracy: {accuracy}, F1 Score: {F1}")

Accuracy: 0.87847520779593, F1 Score: 0.87847520779593
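
The accuracy and F1 values above are identical because micro-averaged F1 on a single-label problem reduces to accuracy. A small sketch with hand-made labels (the arrays are invented for illustration) shows the difference between `average='micro'` and `average='binary'`, the latter being the F1 of the positive class only:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hand-made labels: 6 of 8 predictions are correct.
y_true_demo = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred_demo = [0, 1, 0, 0, 1, 1, 1, 1]

acc = accuracy_score(y_true_demo, y_pred_demo)
f1_micro = f1_score(y_true_demo, y_pred_demo, average="micro")
f1_binary = f1_score(y_true_demo, y_pred_demo, average="binary")

print(f"accuracy: {acc}, micro F1: {f1_micro}, binary F1: {f1_binary:.2f}")
```

Here the positive class has 4 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.8 and the binary F1 is 0.8, while accuracy and micro F1 are both 0.75. Reporting the binary (or macro) F1 alongside accuracy would give a second, independent view of the classifier.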
