## Classification models

Use the following data sets:
1. [Semeion Handwritten Digit Data Set](http://archive.ics.uci.edu/ml/datasets/Semeion+Handwritten+Digit)
1. [Wireless Indoor Localization Data Set](http://archive.ics.uci.edu/ml/datasets/Wireless+Indoor+Localization)
1. [Spambase Data Set](http://archive.ics.uci.edu/ml/datasets/Spambase)
1. [Smartphone Dataset for Human Activity Recognition (HAR) in Ambient Assisted Living (AAL) Data Set](http://archive.ics.uci.edu/ml/datasets/Smartphone+Dataset+for+Human+Activity+Recognition+%28HAR%29+in+Ambient+Assisted+Living+%28AAL%29)

plus two more classification data sets of your choice, from the repositories listed in Course 4.

1. If necessary, apply a missing value imputation method or remove the records/columns that contain missing values; document the method used.
1. For each data set, apply 5 classification models from scikit-learn. For each model, report accuracy and F1 score (see [sklearn.metrics](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics)) using 5-fold cross-validation. Report the mean results for both the training folds and the test folds. Check later versions of this document for details on the content of the reported results and the grading scheme.
1. Document each model you use in the Jupyter notebook, in Romanian. You may create a separate section that documents the algorithms and link to it from each algorithm.
1. For each model, search for the optimal hyperparameters using grid search and random search.
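The imputation and hyperparameter search steps in the list above can be sketched as follows. This is a minimal sketch on synthetic data, not a reference solution: the names `X_demo`, `y_demo`, `pipe`, the median imputation strategy, and the parameter ranges are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; swap in one of the data sets above).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=5, random_state=0
)

# Knock out a few values to imitate missing data.
rng = np.random.default_rng(0)
X_demo[rng.integers(0, 200, size=20), rng.integers(0, 8, size=20)] = np.nan

# Imputation and scaling live inside the pipeline, so they are refit on
# each training fold and never see the held-out fold.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Grid search: every combination of the listed values is tried.
grid = GridSearchCV(
    pipe,
    param_grid={
        "knn__n_neighbors": [1, 3, 5, 7],
        "knn__weights": ["uniform", "distance"],
    },
    cv=5,
    scoring="accuracy",
)
grid.fit(X_demo, y_demo)

# Random search: n_iter combinations are sampled from the distributions.
rand = RandomizedSearchCV(
    pipe,
    param_distributions={
        "knn__n_neighbors": randint(1, 20),
        "knn__weights": ["uniform", "distance"],
    },
    n_iter=8,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X_demo, y_demo)

print("Grid search best:", grid.best_params_, grid.best_score_)
print("Random search best:", rand.best_params_, rand.best_score_)
```

The same pattern should carry over to the other classifiers by replacing the last pipeline step and the parameter names prefixed with the step name.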

Examples of classification models:
1. [Multi-layer Perceptron classifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier)
1. [KNN](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier)
1. [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
1. [Gaussian processes](https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessClassifier.html#sklearn.gaussian_process.GaussianProcessClassifier)
1. [RBF](https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.kernels.RBF.html#sklearn.gaussian_process.kernels.RBF)
1. [Decision tree](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)
1. [Random forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier)
1. [Gaussian Naive bayes](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB) 

For each data set, report the results obtained by each model.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import sklearn 
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
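import warnings
# ConvergenceWarning must be imported before it can be used as a filter category.
from sklearn.exceptions import ConvergenceWarning
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=ConvergenceWarning)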
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import f1_score
from sklearn.metrics import make_scorer

# kNeighbors

In [63]:
def kNeighbors(x,y,k):
    k_neighbor=KNeighborsClassifier()
    accuracy=cross_val_score(k_neighbor,x,y,cv=k,scoring='accuracy')
    # Use the same number of folds (k) as for accuracy instead of a hard-coded 5.
    f1=cross_val_score(k_neighbor,x,y,cv=k,scoring=make_scorer(f1_score,average='weighted'))
    print("Accuratete: ", accuracy.mean())
    print("F1: ",f1.mean())

In [79]:
def mlp(x,y,k):
    mlp_classifier=MLPClassifier(hidden_layer_sizes=(30, ),max_iter=400)    
    accuracy=cross_val_score(mlp_classifier,x,y,cv=k,scoring='accuracy')
    f1=cross_val_score(mlp_classifier,x,y,cv=k,scoring=make_scorer(f1_score,average='weighted'))
    print("Accuratete: ", accuracy.mean())
    print("F1: ",f1.mean())

In [65]:
def svc(x,y,k):
    svc_classifier=SVC()
    accuracy=cross_val_score(svc_classifier,x,y,cv=k,scoring='accuracy')
    f1=cross_val_score(svc_classifier,x,y,cv=k,scoring=make_scorer(f1_score,average='weighted'))
    print("Accuratete: ", accuracy.mean())
    print("F1: ",f1.mean())

In [71]:
def gauss(x,y,k):
    gaussian_classifier=GaussianProcessClassifier(max_iter_predict=50)
    accuracy=cross_val_score(gaussian_classifier,x,y,cv=k,scoring='accuracy')
    f1=cross_val_score(gaussian_classifier,x,y,cv=k,scoring=make_scorer(f1_score,average='weighted'))
    print("Accuratete: ", accuracy.mean())
    print("F1: ",f1.mean())

In [67]:
def tree(x,y,k):
    tree_classifier=DecisionTreeClassifier()
    accuracy=cross_val_score(tree_classifier,x,y,cv=k,scoring='accuracy')
    f1=cross_val_score(tree_classifier,x,y,cv=k,scoring=make_scorer(f1_score,average='weighted'))
    print("Accuratete: ", accuracy.mean())
    print("F1: ",f1.mean())
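The five helpers above share the same body and report only the test-fold means, while the assignment also asks for training-fold means. A possible consolidation, assuming a hypothetical helper name `evaluate` and synthetic demo data, uses `cross_validate` so that both metrics and both fold types come from a single cross-validation run:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, x, y, k=5):
    """Return mean train/test accuracy and weighted F1 over k folds."""
    scores = cross_validate(
        model,
        x,
        y,
        cv=k,
        scoring={"accuracy": "accuracy", "f1": "f1_weighted"},
        return_train_score=True,  # also score each model on its training folds
    )
    keys = ("train_accuracy", "test_accuracy", "train_f1", "test_f1")
    return {name: scores[name].mean() for name in keys}


# Synthetic stand-in data (an assumption, not one of the assignment data sets).
x_demo, y_demo = make_classification(n_samples=150, n_features=6, random_state=0)

summary = evaluate(DecisionTreeClassifier(random_state=0), x_demo, y_demo, k=5)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because each model is fitted once per fold rather than once per fold per metric, this should also roughly halve the running time compared with two separate `cross_val_score` calls.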

# Semeion Handwritten Digit Data Set

In [None]:
k=5
data_semeion= pd.read_csv('semeion.csv', header=None, sep=' ')
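# Column 266 is presumably an empty column produced by a trailing separator.
del data_semeion[266]
# Columns 256..265 hold the one-hot encoded digit label.
digits = data_semeion.values[:, 256:266].copy()
# The column position of the 1 in each row is the digit itself.
indices=np.where(digits==1)
columns = np.arange(256,266)
# Replace the ten one-hot columns with a single integer label column.
data_semeion.drop(data_semeion.columns[columns],axis=1, inplace=True)
data_semeion[256]=indices[1]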
npdata_semeion=data_semeion.values
x=npdata_semeion[:,:-1]
y=npdata_semeion[:,-1]
kNeighbors(x,y,k)
mlp(x,y,k)
svc(x,y,k)
gauss(x,y,k)
tree(x,y,k)

Accuratete:  0.9032869312296843
F1:  0.9020031731070397
Accuratete:  0.9164063557467035
F1:  0.9231465804645891
Accuratete:  0.9315139094632986
F1:  0.9317859763349526
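The label decoding in the cell above turns ten one-hot columns into one integer label. An equivalent and shorter route, sketched here on a tiny made-up matrix (the array `rows` and the names `x_small`, `y_small` are illustrative assumptions), is `argmax` along the label columns:

```python
import numpy as np

# Tiny made-up matrix: 2 "pixel" columns followed by a 3-class one-hot label.
rows = np.array([
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0],
])
n_features = 2

x_small = rows[:, :n_features]
# argmax returns the position of the single 1 in each one-hot row.
y_small = rows[:, n_features:].argmax(axis=1)

print(y_small)  # [0 2 1]
```

For the Semeion layout described above, `n_features` would be 256 and the label block would be the ten columns that follow.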


# Wireless Indoor Localization Data Set 

In [69]:
# Pass the separator by keyword; recent pandas versions no longer accept it positionally.
wifi_dataframe= pd.read_csv('wifi_localization.csv',sep='\t',names=np.arange(0,8))
np_wifi=wifi_dataframe.values
wifi_dataframe.head()
x=np_wifi[:,:-1]
y=np_wifi[:,-1]
kNeighbors(x,y,k)
mlp(x,y,k)
svc(x,y,k)
gauss(x,y,k)
tree(x,y,k)

Accuratete:  0.9765
F1:  0.9764979677189622




Accuratete:  0.9765
F1:  0.9749313379907996
Accuratete:  0.6679999999999999
F1:  0.6826532379244802


KeyboardInterrupt: 
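The `KeyboardInterrupt` above indicates that the run was stopped by hand, most likely while `gauss` was still running, since only three score pairs were printed. Training a Gaussian process classifier scales roughly cubically with the number of samples, so one option is to evaluate it on a stratified subsample. The sketch below uses synthetic data, and the subsample size of 120 is an arbitrary assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic 4-class stand-in with 7 features (an assumption).
x_full, y_full = make_classification(
    n_samples=600,
    n_features=7,
    n_informative=5,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Keep a stratified subsample so every class stays represented.
x_sub, _, y_sub, _ = train_test_split(
    x_full, y_full, train_size=120, stratify=y_full, random_state=0
)

scores = cross_val_score(
    GaussianProcessClassifier(max_iter_predict=50),
    x_sub,
    y_sub,
    cv=5,
    scoring="accuracy",
)
print(x_sub.shape, scores.mean())
```

Scores obtained on a subsample are not directly comparable with those of models evaluated on the full data, so the subsample size should be reported alongside the result.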

# Spambase Data Set

In [55]:
# Pass the separator by keyword; recent pandas versions no longer accept it positionally.
spam_dataframe=pd.read_csv('spambase.csv',sep=',',names=np.arange(0,58),index_col=False)
np_spam=spam_dataframe.values
spam_dataframe.head()

x=np_spam[:,:-1]
y=np_spam[:,-1]
kNeighbors(x,y,k)
mlp(x,y,k)
svc(x,y,k)
gauss(x,y,k)
tree(x,y,k)
# Reference signature (y_true and y_pred are not defined in this cell):
# f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

Accuratete:  0.7728652798502834
Accuratete:  0.9141263163118104
Accuratete:  0.7969903083533889
Accuratete:  0.7622043504304707
Accuratete:  0.8884797831755472
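The `f1_score` signature at the end of the cell above needs `y_true` and `y_pred`, and the recorded output contains accuracy only. One way to obtain predictions for a binary F1 score is `cross_val_predict`, which returns an out-of-fold prediction for every sample. This is a sketch on synthetic binary data; `x_bin` and the choice of a decision tree are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary stand-in data (an assumption, not the Spambase file).
x_bin, y_true = make_classification(n_samples=200, n_features=10, random_state=0)

# Each sample is predicted by the model trained on the folds that exclude it.
y_pred = cross_val_predict(DecisionTreeClassifier(random_state=0), x_bin, y_true, cv=5)

f1 = f1_score(y_true, y_pred, pos_label=1, average="binary")
print("F1 (binary):", f1)
```

Note that this pools all out-of-fold predictions into one score, which can differ slightly from the mean of per-fold F1 scores returned by `cross_val_score`.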


# Smartphone Data Set

In [None]:
#smart_dataframe=pd.read_csv('')

# Glass Data Set

In [29]:
glass_dataframe=pd.read_csv('glass.csv',names=np.arange(0,11))
np_glass=glass_dataframe.values
# Column 0 is the sample Id, not a measurement, so it is excluded from the features.
x=np_glass[:,1:-1]
y=np_glass[:,-1]
kNeighbors(x,y,k)
mlp(x,y,k)
svc(x,y,k)
gauss(x,y,k)
tree(x,y,k)

# Abalone Data Set

In [None]:
abalone=pd.read_csv('abalone.csv',names=np.arange(0,9))
# Encode the nominal sex column (M/F/I) as integers.
abalone[0]=abalone[0].map({"M":1,"F":2,"I":3})
np_abalone=abalone.values
abalone.head()
x=np_abalone[:,:-1]
y=np_abalone[:,-1]
kNeighbors(x,y,k)
mlp(x,y,k)
svc(x,y,k)
gauss(x,y,k)
tree(x,y,k)
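The cell above encodes the sex column as 1, 2, 3, which imposes an order on categories that have none. A possible alternative is one-hot encoding with `pd.get_dummies`, sketched here on a made-up four-row frame (`toy` and its values are assumptions that only mimic the column layout):

```python
import pandas as pd

# Made-up rows: column 0 is the category, column 1 a measurement, column 2 the target.
toy = pd.DataFrame({
    0: ["M", "F", "I", "M"],
    1: [0.455, 0.530, 0.330, 0.440],
    2: [15, 9, 7, 10],
})

# Replace column 0 with one indicator column per category.
encoded = pd.get_dummies(toy, columns=[0], prefix="sex", dtype=int)

x_toy = encoded.drop(columns=[2]).values  # features: measurement + indicators
y_toy = encoded[2].values                 # target column

print(encoded.columns.tolist())
print(x_toy.shape)
```

Distance-based models such as kNN and SVM are usually the most sensitive to this choice, since integer codes make some categories look closer to each other than others.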