### Naive Bayes Classification
We build a Naive Bayes model that predicts whether a finalist finishes in the top 10 (`top_10`),
using the features `final_draw_position`, `final_televote_points` and `final_jury_points`.

#### Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import model_selection
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import GaussianNB

#### Load Data

In [None]:
df = pd.read_csv('Data/finalists_cleaned.csv')

#### Handle Missing Values

In [None]:
df.isnull().sum()

Since we already know which features the model will use, we only inspect those columns for missing values.

In [None]:
# Show rows where final_televote_points or final_jury_points is NaN
df[df['final_televote_points'].isna() | df['final_jury_points'].isna()]


This shows that `final_televote_points` and `final_jury_points` are missing for every country in 2013, so we remove that whole year. As a side note, we first ran the model with 2013 included and got an F1-score of 0.91; after removing it, the F1-score rose to 0.96.

In [None]:
df = df[df['year'] != 2013]
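A minimal sketch of how one could confirm that a year is missing *entirely* (rather than in just a few rows) before dropping it, instead of reading it off the filtered table. The small DataFrame is synthetic and made up for illustration; on the real data the same `groupby` would run on `df`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the finalists data (values are made up)
df_demo = pd.DataFrame({
    'year': [2012, 2012, 2013, 2013, 2014, 2014],
    'final_televote_points': [100, 40, np.nan, np.nan, 80, np.nan],
    'final_jury_points': [90, 30, np.nan, np.nan, 70, 20],
})
cols = ['final_televote_points', 'final_jury_points']

# Per year: True only if every value in every column is missing
fully_missing = df_demo[cols].isna().groupby(df_demo['year']).all().all(axis=1)
years_to_drop = fully_missing[fully_missing].index.tolist()
print(years_to_drop)

# Drop only those years; isolated NaNs in other years are left for imputation
df_demo = df_demo[~df_demo['year'].isin(years_to_drop)]
```

Here 2014 has one missing televote value but is kept, which matches the notebook's approach of dropping 2013 and imputing the remaining gaps.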

#### Create Binary Target Variable

In [None]:
# Binary classification: 1 = Top 10, 0 = Not Top 10
df['top_10'] = df['final_place'].apply(lambda x: 1 if x <= 10 else 0)
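A possible vectorized alternative to `apply` with a lambda, shown on made-up places. Both versions give the same labels; one detail worth knowing is that a missing `final_place` compares as `False` in both, so such a row would silently be labelled 0 ("Not Top 10") rather than flagged.

```python
import pandas as pd

# Hypothetical final places, including one missing value
places = pd.DataFrame({'final_place': [1, 10, 11, 26, None]})

# Row-by-row approach, as in the cell above
via_apply = places['final_place'].apply(lambda x: 1 if x <= 10 else 0)

# Vectorized equivalent: compare the whole column, then cast bool -> int
via_vector = (places['final_place'] <= 10).astype(int)

print(via_vector.tolist())
```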

#### Select Features and Target

In [None]:
features = [
    'final_draw_position',
    'final_televote_points', 'final_jury_points'
]
X = df[features]
y = df['top_10']

In [None]:
print(X.isna().sum())

In [None]:
X = X.fillna(X.mean())

Since there are only 6 NaN values in `final_televote_points` and `final_jury_points`, we fill them with the column mean.
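Filling with `X.mean()` before the split lets the test rows influence the imputed values. A sketch of one leakage-free variant, assuming scikit-learn's `SimpleImputer` inside a pipeline so the means are learned from the training split only. The data is synthetic (made up) and only reuses the three feature names.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
# Synthetic data with the same feature names (values are made up)
X_demo = pd.DataFrame({
    'final_draw_position': rng.integers(1, 27, size=200),
    'final_televote_points': np.clip(rng.normal(100, 60, size=200), 0, None),
    'final_jury_points': np.clip(rng.normal(100, 60, size=200), 0, None),
})
# Knock out a few values to mimic the missing points
X_demo.loc[[3, 50, 120], 'final_televote_points'] = np.nan
y_demo = ((X_demo['final_televote_points'].fillna(0)
           + X_demo['final_jury_points']) > 200).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=7)

# The imputer learns column means from X_tr only, then GaussianNB is fitted
pipe = make_pipeline(SimpleImputer(strategy='mean'), GaussianNB())
pipe.fit(X_tr, y_tr)
print(pipe.score(X_te, y_te))
```

With only a handful of missing values the effect on the score is likely small, but the pipeline keeps the evaluation honest.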

In [None]:
X.shape

In [None]:
X.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()

The box plots show many outliers. We keep them, because they reflect real results: some countries score a very large number of points and some almost none.
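To put a number on "many outliers", one could count the values outside the 1.5 × IQR fences, which is matplotlib's default whisker rule for box plots. A small sketch on made-up points; `count_iqr_outliers` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def count_iqr_outliers(frame: pd.DataFrame, whis: float = 1.5) -> pd.Series:
    """Count values per column outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    # Boolean mask per cell (aligned on column names), then count per column
    outside = frame.lt(lower, axis=1) | frame.gt(upper, axis=1)
    return outside.sum()

# Made-up points: most entries are modest, one is very large
demo = pd.DataFrame({
    'final_televote_points': [0, 5, 10, 12, 15, 20, 25, 300],
    'final_jury_points': [3, 8, 10, 14, 18, 22, 26, 30],
})
result = count_iqr_outliers(demo)
print(result)
```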

#### Train/Test Split

In [None]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=7)
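Because top-10 rows are the minority class, a plain random split can leave the test set with a noticeably different class ratio. A sketch of the `stratify` option of `train_test_split`, which keeps the ratio roughly equal in both splits; the labels below are made up with a 10-in-26 ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 10 top-10 rows for every 26 finalists
y_demo = np.array([1] * 40 + [0] * 64)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7, stratify=y_demo
)
# Share of top-10 rows in each split stays close to the overall share
print(y_demo.mean(), y_tr.mean(), y_te.mean())
```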

#### Train Naive Bayes Model

In [None]:
model = GaussianNB()
model.fit(X_train, y_train)

#### Evaluate Model Performance

In [None]:
y_pred = model.predict(X_test)

# Accuracy and report
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.savefig('Images/NB_confusion_matrix.png')
plt.show()

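The F1-score of 0.96 indicates that the Naive Bayes model separates top-10 from non-top-10 finalists very well on the test set. Keep in mind that a finalist's placing is closely tied to the televote and jury points, two of the three input features, so a high score is to be expected here.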
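F1 is the harmonic mean of precision and recall, not an accuracy measure, and `classification_report` lists it per class together with macro and weighted averages. A sketch of how it follows from the binary confusion matrix, using made-up labels and checked against `f1_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Made-up labels and predictions for illustration
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1])
y_hat = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 1])

# scikit-learn's binary confusion matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
precision = tp / (tp + fp)  # of the predicted top-10 rows, how many were top 10
recall = tp / (tp + fn)     # of the actual top-10 rows, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```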

#### Prediction on a New Sample

In [None]:
sample = [[2, 20, 13]]  # draw, televote pts, jury pts
sample_df = pd.DataFrame(sample, columns=features)
prediction = model.predict(sample_df)
print("Top 10 prediction:", "Yes" if prediction[0] == 1 else "No")
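`predict` only returns the hard yes/no label. `predict_proba` also shows how confident the model is, although Naive Bayes probabilities are often poorly calibrated (pushed towards 0 or 1), so they are better read as a ranking than as exact chances. A sketch on synthetic training data (made up); `nb_demo` and `sample_demo` are hypothetical names chosen so the notebook's own `model` and `sample_df` are not overwritten.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

features_demo = ['final_draw_position', 'final_televote_points', 'final_jury_points']
rng = np.random.default_rng(1)
# Synthetic training data: class 1 rows get far more points (values are made up)
X_syn = pd.DataFrame(
    np.column_stack([
        rng.integers(1, 27, size=100),
        np.r_[rng.normal(40, 20, 60), rng.normal(180, 50, 40)],
        np.r_[rng.normal(40, 20, 60), rng.normal(160, 50, 40)],
    ]),
    columns=features_demo,
)
y_syn = np.r_[np.zeros(60, dtype=int), np.ones(40, dtype=int)]
nb_demo = GaussianNB().fit(X_syn, y_syn)

sample_demo = pd.DataFrame([[2, 20, 13]], columns=features_demo)
# One probability per class, in the order of nb_demo.classes_
proba = nb_demo.predict_proba(sample_demo)[0]
p_top10 = proba[list(nb_demo.classes_).index(1)]
print(f"P(top 10) = {p_top10:.3f}")
```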

#### Saving The Model

In [None]:
import joblib

In [None]:
# Store the model in a file
model_file = 'Models/bayes.pkl'

In [None]:
joblib.dump(model, model_file)
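A quick round-trip check that the saved file loads back into an equivalent model, sketched on a tiny made-up training set and a temporary directory standing in for `Models/` (note that `joblib.dump` does not create missing folders). scikit-learn's documentation advises loading a pickled model with the same library version that saved it.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny made-up training set: draw position, televote points, jury points
X_syn = np.array([[1, 10, 5], [5, 200, 180], [20, 15, 8], [3, 250, 210]], dtype=float)
y_syn = np.array([0, 1, 0, 1])
nb_demo = GaussianNB().fit(X_syn, y_syn)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bayes.pkl')  # stands in for 'Models/bayes.pkl'
    joblib.dump(nb_demo, path)
    restored = joblib.load(path)

# The restored model should give identical predictions
same = np.array_equal(restored.predict(X_syn), nb_demo.predict(X_syn))
print(same)
```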