### Star Classification
In this example we classify stars by their `Type` using the `Stars` data set.


### Import modules

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import pandas as pd

In [3]:
stars = pd.read_csv('Stars.csv')
stars.head()

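| | Temperature | L | R | A_M | Color | Spectral_Class | Type |
|---|---|---|---|---|---|---|---|
| 0 | 3068 | 0.0024 | 0.17 | 16.12 | Red | M | 0 |
| 1 | 3042 | 0.0005 | 0.1542 | 16.6 | Red | M | 0 |
| 2 | 2600 | 0.0003 | 0.102 | 18.7 | Red | M | 0 |
| 3 | 2800 | 0.0002 | 0.16 | 16.65 | Red | M | 0 |
| 4 | 1939 | 0.000138 | 0.103 | 20.06 | Red | M | 0 |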


> Looking at the data, a star's `Type` is the dependent (target) variable, and `Temperature`, `L` and `R` are the independent variables (features) we use to predict it.

### Getting data

In [45]:
X = stars.values[:, :3]  # Temperature, L, R (object dtype, because .values also includes the string columns)
y = stars.Type.values    # target: the star Type
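Because `.values` converts the whole DataFrame, including the string columns `Color` and `Spectral_Class`, into a single array, `X` ends up with `dtype=object`, as the `X_train` output below shows. Selecting the feature columns by name keeps them numeric. A minimal sketch, using the five rows from `stars.head()` above as a stand-in for the full CSV:

```python
import pandas as pd

# Stand-in for Stars.csv: the five rows shown by stars.head() above.
stars = pd.DataFrame({
    "Temperature": [3068, 3042, 2600, 2800, 1939],
    "L": [0.0024, 0.0005, 0.0003, 0.0002, 0.000138],
    "R": [0.17, 0.1542, 0.102, 0.16, 0.103],
    "A_M": [16.12, 16.6, 18.7, 16.65, 20.06],
    "Color": ["Red"] * 5,
    "Spectral_Class": ["M"] * 5,
    "Type": [0, 0, 0, 0, 0],
})

# Positional slicing of .values: the string columns force a common object dtype.
X_positional = stars.values[:, :3]

# Selecting by name keeps only the numeric feature columns.
feature_cols = ["Temperature", "L", "R"]
X = stars[feature_cols].to_numpy(dtype=float)
y = stars["Type"].to_numpy()

print(X_positional.dtype, X.dtype)  # object float64
```

Selecting by name also keeps working if the column order in the CSV ever changes.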

### Splitting data

In [46]:
from sklearn.model_selection import train_test_split

In [47]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=.2)

In [48]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((192, 3), (192,), (48, 3), (48,))
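With 240 stars and `test_size=.2`, the split holds out 240 × 0.2 = 48 stars and keeps 192 for training, matching the shapes above. Since `Type` has six classes, passing `stratify=y` to `train_test_split` keeps each class's share the same in both splits. A sketch on synthetic labels (40 samples per class is an assumption for illustration, not the real class counts):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 240 samples, six classes (0-5) with 40 samples each.
y = np.repeat(np.arange(6), 40)
X = np.arange(len(y), dtype=float).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=33, test_size=.2, stratify=y
)

print(X_train.shape, X_test.shape)  # (192, 1) (48, 1)
# Stratification puts 40 * 0.2 = 8 samples of every class in the test split.
print(np.bincount(y_test))  # [8 8 8 8 8 8]
```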

In [49]:
X_train[:2], y_train[:7]

(array([[2890, 0.0034, 0.24],
        [3324, 0.0065, 0.471]], dtype=object),
 array([1, 1, 2, 2, 5, 3, 2], dtype=int64))

In [59]:
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

### Model Selection

In [80]:
model = SVC(kernel="linear") 
model.fit(X_train, y_train)

SVC(kernel='linear')

In [81]:
model.score(X_test, y_test)

0.9375
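For classifiers, `score` returns mean accuracy, so 0.9375 means 45 of the 48 test stars were labeled correctly (0.9375 × 48 = 45). A small check that `score` matches `accuracy_score`, using scikit-learn's bundled iris data as a stand-in since `Stars.csv` is not bundled:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Iris is used here only as a stand-in dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=.2)

model = SVC(kernel="linear").fit(X_train, y_train)
y_pred = model.predict(X_test)

score = model.score(X_test, y_test)      # mean accuracy on the test split
acc = accuracy_score(y_test, y_pred)     # the same metric, computed explicitly
n_correct = int(np.sum(y_pred == y_test))

print(score == acc, n_correct, len(y_test))
```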

In [58]:
model2 = DecisionTreeClassifier()
model2.fit(X_train, y_train)
model2.score(X_test, y_test)

0.9791666666666666

In [61]:
model3 = GaussianNB()
model3.fit(X_train, y_train)
model3.score(X_test, y_test)

0.8125

In [62]:
model4 = KNeighborsClassifier(n_neighbors=2)
model4.fit(X_train, y_train)
model4.score(X_test, y_test)

0.5208333333333334
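The weak KNN score may be partly a scaling issue: KNN compares raw Euclidean distances, and the features live on very different scales (compare `Temperature` in the thousands with `R` around 0.1 in the rows above), so the largest-scale features dominate the neighbor search. `StandardScaler` and `Pipeline` are imported at the top but not used so far; a sketch of how they could be combined, on synthetic data where one informative feature sits next to a huge-scale noise feature (the data and `n_neighbors=5` are assumptions, not the notebook's setup):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)                 # two synthetic classes
informative = y + rng.normal(0, 0.3, n)   # small-scale feature that separates the classes
noise = rng.normal(0, 1e5, n)             # huge-scale feature unrelated to the label
X = np.column_stack([informative, noise])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=.25)

# Without scaling, distances are dominated by the noise column.
raw_knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# In a Pipeline the scaler is fit on the training split only,
# then KNN runs on the standardized features.
scaled_knn = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]).fit(X_train, y_train)

raw_acc = raw_knn.score(X_test, y_test)
scaled_acc = scaled_knn.score(X_test, y_test)
print(raw_acc, scaled_acc)
```

Wrapping the scaler in the pipeline keeps test-set statistics out of the scaling step, and the pipeline can be passed anywhere a plain estimator is accepted.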

> The decision tree (`model2`) has the highest test score so far (0.979), so let's use it.
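This ranking rests on a single 48-star test split, where one extra mistake moves accuracy by about 0.021 (1/48). K-fold cross-validation averages over several splits and gives a steadier comparison. A sketch using `cross_val_score`, again with iris as a bundled stand-in for `Stars.csv` (the `random_state=0` on the tree is an added assumption for reproducibility):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; swap in the notebook's X, y to compare models on the stars.
X, y = load_iris(return_X_y=True)

models = {
    "SVC (linear)": SVC(kernel="linear"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
    "KNN (k=2)": KNeighborsClassifier(n_neighbors=2),
}

results = {}
for name, model in models.items():
    # 5-fold CV (stratified for classifiers): one accuracy score per held-out fold.
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Reporting the spread across folds alongside the mean makes it easier to tell whether a gap between two models is larger than the split-to-split noise.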

In [68]:
clf = DecisionTreeClassifier(criterion="entropy")  # split on information gain instead of the default Gini impurity
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

0.9791666666666666
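`criterion="entropy"` makes the tree choose splits by information gain (the drop in Shannon entropy) instead of the default Gini impurity; here both criteria reach the same test score. For a node with class proportions $p_k$, entropy is $H = -\sum_k p_k \log_2 p_k$ and Gini impurity is $G = 1 - \sum_k p_k^2$. A small sketch computing both from class counts (the helper names are hypothetical, not scikit-learn functions):

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (base 2) of a node with the given class counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()                  # drop empty classes (0 * log 0 is taken as 0)
    return float(np.sum(p * np.log2(1.0 / p)))  # equals -sum(p * log2(p))

def gini(counts):
    """Gini impurity of a node with the given class counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return float(1.0 - np.sum(p ** 2))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(sum(c) for c in children)
    weighted = sum(sum(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

print(entropy([5, 5]), gini([5, 5]))    # 1.0 0.5 (maximally mixed two-class node)
print(entropy([10, 0]), gini([10, 0]))  # 0.0 0.0 (pure node)
print(information_gain([5, 5], [[5, 0], [0, 5]]))  # 1.0: a perfect split removes all uncertainty
```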

### Making Predictions

In [71]:
clf.predict(X_test[:4]), y_test[:4]

(array([1, 2, 3, 2], dtype=int64), array([1, 2, 3, 2], dtype=int64))
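Four matching predictions are only a spot check. Comparing predictions with the true labels over the whole test split shows exactly which samples the model gets wrong; with a score of 0.979, that is one of the 48 stars. A sketch on the bundled iris data as a stand-in (`random_state=0` is an added assumption for reproducibility):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the stars features and Type labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=.2)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Indices of test samples whose prediction disagrees with the true label.
wrong = np.flatnonzero(y_pred != y_test)
for i in wrong:
    print(f"sample {i}: true={y_test[i]} predicted={y_pred[i]}")

# The fraction of correct predictions reproduces clf.score(X_test, y_test).
print(1 - len(wrong) / len(y_test))
```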

> **Awesome**: the first four predictions match the true labels, which fits the model's 0.979 test accuracy.

### Evaluation

In [73]:
from sklearn.metrics import f1_score

In [75]:
f1_score(y_test, clf.predict(X_test), average=None, labels=[0, 1, 2, 3, 4, 5])  # one F1 score per Type

array([1.        , 0.94117647, 1.        , 0.94117647, 1.        ,
       1.        ])
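Each entry is the F1 score for one `Type` class, $F_1 = 2TP / (2TP + FP + FN)$. The two values of 0.94117647 are exactly 16/17, which is consistent with the single error implied by the 47/48 accuracy: one star of class 1 or 3 predicted as the other adds one false negative to its true class and one false positive to the predicted class, each class keeping TP = 8. That breakdown is inferred from the numbers above, not printed by the notebook. A quick arithmetic check, plus the macro average that `average="macro"` would report:

```python
import numpy as np

def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

# One misclassified star: its true class gets FN = 1, the predicted class gets FP = 1.
print(f1_from_counts(tp=8, fp=0, fn=1), f1_from_counts(tp=8, fp=1, fn=0))  # both 16/17

# Per-class F1 from the output above, for labels 0..5.
per_class_f1 = np.array([1.0, 16 / 17, 1.0, 16 / 17, 1.0, 1.0])

# Macro F1 is the unweighted mean over classes: 100/102, about 0.980.
print(round(per_class_f1.mean(), 3))
```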