# F1 Score

* Evaluate classification models using F1 score

* The F1 score is also known as the balanced F-score or F-measure

* F1 score combines precision and recall relative to a specific positive class

* The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 and its worst value at 0. 
   Precision and recall contribute equally to the F1 score.
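To make the harmonic-mean interpretation concrete, here is a minimal sketch that compares F1 with the plain arithmetic mean. The helper `f1_from_pr` and the precision/recall pairs are hypothetical, chosen only for illustration.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision/recall pairs, not taken from any real model.
pairs = [(0.8, 0.8), (0.9, 0.5), (1.0, 0.1)]

for p, r in pairs:
    arithmetic = (p + r) / 2
    print("precision={:.1f} recall={:.1f} arithmetic={:.2f} F1={:.2f}".format(
        p, r, arithmetic, f1_from_pr(p, r)))
```

For the pair (1.0, 0.1) the arithmetic mean is 0.55 while F1 is about 0.18: the harmonic mean is pulled toward the smaller of the two values, so a model cannot score well on F1 by maximising only one of them.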


https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html

In [1]:
# FORMULA

# F1 = 2 * (precision * recall) / (precision + recall)

Precision is the ability of the classifier not to label a negative sample as positive, and 
recall is the ability of the classifier to find all the positive samples.

In [2]:
# tp = true positives, fp = false positives, fn = false negatives

# precision = tp / (tp + fp)

# recall = tp / (tp + fn)
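The definitions above can be sketched in plain Python by counting tp, fp and fn directly. The function `precision_recall_f1` and the label lists are hypothetical names and data introduced for this illustration. Returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Count tp, fp, fn for the positive class and derive the three metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (assumption: report 0.0 in that case).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: tp = 3, fp = 2, fn = 1
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print("precision={:.2f} recall={:.2f} f1={:.2f}".format(precision, recall, f1))
```

With these labels, precision is 3/5, recall is 3/4, and F1 is 2/3.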

In [3]:
# imports 

import pandas as pd

In [4]:
pwd

'C:\\Users\\anura\\Machine_Learning'

In [5]:
# load dataset

data = pd.read_csv(r"C:\Users\anura\Desktop\Data_set\TitanicDataset\titanic_data.csv")
data.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S


In [6]:
# keep only the numeric columns as features

data = data.select_dtypes(include='number')

data.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
0,1,0,3,22.0,1,0,7.25
1,2,1,1,38.0,1,0,71.2833
2,3,1,3,26.0,0,0,7.925


In [7]:
# create response vector target

target = data.Survived

target.head(3)

0    0
1    1
2    1
Name: Survived, dtype: int64

In [8]:
# delete 'Survived', the response vector (Series)

data.drop('Survived', axis=1, inplace=True)

# we drop 'Age' for the sake of this example because it contains NaN in some rows

data.drop('Age', axis=1, inplace=True)
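Many scikit-learn estimators raise an error when the input contains NaN, so it helps to check which columns have missing values before deciding what to drop. The following is a minimal sketch on a small hypothetical frame (`toy` is not the Titanic data).

```python
import pandas as pd

# Hypothetical three-row frame with one missing Age value.
toy = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Age": [22.0, None, 26.0],
    "Fare": [7.25, 71.2833, 7.925],
})

# Count missing values per column, then keep the names of affected columns.
missing_per_column = toy.isna().sum()
columns_with_nan = missing_per_column[missing_per_column > 0].index.tolist()

print(columns_with_nan)
```

Running the same check on `data` would list the columns that need to be dropped or imputed.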

In [9]:
# check delete

data.head()

Unnamed: 0,PassengerId,Pclass,SibSp,Parch,Fare
0,1,3,1,0,7.25
1,2,1,1,0,71.2833
2,3,3,0,0,7.925
3,4,1,1,0,53.1
4,5,3,0,0,8.05


In [10]:
# imports for classifiers and metrics

from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import f1_score

In [11]:
# train/test split

from sklearn.model_selection import train_test_split

data_train, data_test, target_train, target_test = train_test_split(data, target, random_state=0)

In [12]:
# Decision Tree Classifier

# instantiate
dtc = DecisionTreeClassifier()

# fit
dtc.fit(data_train, target_train)

# predict
target_pred = dtc.predict(data_test)

# f1 score (true labels first, then predictions)
score = f1_score(target_test, target_pred)

# print
print("Decision Tree F1 score: {:.2f}".format(score))

Decision Tree F1 score: 0.56
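A single F1 value does not show whether precision or recall is the weaker component, or which class is treated as positive. The sketch below uses small hypothetical label lists (not the Titanic predictions) to show the `(y_true, y_pred)` argument order and the `pos_label` parameter of scikit-learn's metric functions.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels, chosen only for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

# scikit-learn expects the true labels first and the predictions second.
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Swapping the arguments swaps precision and recall; binary F1 is unchanged.
precision_swapped = precision_score(y_pred, y_true)
f1_swapped = f1_score(y_pred, y_true)

# pos_label selects the class that counts as positive (default is 1).
f1_negative = f1_score(y_true, y_pred, pos_label=0)

print("precision={:.2f} recall={:.2f} f1={:.2f}".format(precision, recall, f1))
print("f1 with class 0 as positive: {:.2f}".format(f1_negative))
```

Because binary F1 is symmetric in precision and recall, a reversed argument order goes unnoticed in the F1 value but gives wrong precision and recall, so keeping the true labels first is the safer habit.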


In [13]:
# Gaussian Naive Bayes

from sklearn.naive_bayes import GaussianNB

In [14]:
# instantiate
gnb = GaussianNB()

# fit
gnb.fit(data_train, target_train)

# predict
target_pred_2 = gnb.predict(data_test)

# f1 score (true labels first, then predictions)
score_2 = f1_score(target_test, target_pred_2)

# print
print("GaussianNB F1 score: {:.2f}".format(score_2))

GaussianNB F1 score: 0.53
