# **Gamma Telescope (MAGIC) Report**

##### This dataset has been cleaned to contain four telescope measurements (length, width, size, and distance) and the event class. Two predictive models, Gaussian Naive Bayes and Logistic Regression, are trained on this data to classify each event as Gamma (g) or Hadron (h).

#### Link to Dataset: https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope

In [302]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn.linear_model import SGDClassifier
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import RidgeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, log_loss

import numpy as np
import pandas as pd

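dataSet=pd.read_csv('GammaTelescopeDataCleaned.csv')
# Features: every column except the last (length, width, size, distance)
X = dataSet.iloc[:,:-1]
# Target: the last column, class ('g' = gamma, 'h' = hadron)
y = dataSet.iloc[:,-1]
dataSet.head()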



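|   | length | width | size | distance | class |
|---|--------|-------|------|----------|-------|
| 0 | 28.7967 | 16.0021 | 2.6449 | 81.8828 | g |
| 1 | 31.6036 | 11.7235 | 2.5185 | 205.261 | g |
| 2 | 162.052 | 136.031 | 4.0612 | 256.788 | g |
| 3 | 23.8172 | 9.5728 | 2.3385 | 116.737 | g |
| 4 | 75.1362 | 30.9205 | 3.1611 | 356.462 | g |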


### Using 80% of the dataset for training and 20% for testing the model

In [303]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state=42)

### Setting up the Gaussian Naive Bayes Model

In [304]:
NBModel=GaussianNB()
NBModel.fit(X_train,y_train)

y_predicted=NBModel.predict(X_test)
y_predicted_proba=NBModel.predict_proba(X_test)

y_train_predicted=NBModel.predict(X_train)
y_train_predicted_proba=NBModel.predict_proba(X_train)

### Evaluation Metrics: Test Data

In [305]:
print("Classification Report")
print(metrics.classification_report(y_test,y_predicted))

print("Confusion Matrix")
print(metrics.confusion_matrix(y_test,y_predicted))

Classification Report
              precision    recall  f1-score   support

           g       0.71      0.91      0.80      2460
           h       0.65      0.32      0.43      1344

    accuracy                           0.70      3804
   macro avg       0.68      0.61      0.61      3804
weighted avg       0.69      0.70      0.67      3804

Confusion Matrix
[[2227  233]
 [ 915  429]]


In [306]:
print("Accuracy = ",accuracy_score(y_test,y_predicted)*100)
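# Sensitivity and specificity treat 'h' (hadron) as the positive class;
# the counts are copied from the confusion matrix above.
print("Sensitivity = ", 429/(429+915))
print("Specificity = ", 2227/(2227+233))
# With average="micro", the F1 score equals the overall accuracy.
print("f1-score = ",metrics.f1_score(y_test,y_predicted, average="micro"))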
print("log loss = ", metrics.log_loss(y_test,y_predicted_proba))

Accuracy =  69.82124079915877
Sensitivity =  0.31919642857142855
Specificity =  0.9052845528455284
f1-score =  0.6982124079915878
log loss =  0.8496213023668047
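
The sensitivity and specificity above are typed in by hand from the confusion matrix, which can drift out of sync when the model is re-run. A minimal sketch of deriving the same rates directly from the matrix, assuming 'h' is the positive class as in the formulas above; the helper names `binary_rates` and `micro_f1` are hypothetical and not part of the original notebook:

```python
# Test-set confusion matrix from the Gaussian Naive Bayes output above.
# Rows are true labels and columns are predicted labels, both ordered ['g', 'h'].
cm = [[2227, 233],
      [915, 429]]

def binary_rates(cm):
    """Return rates for a 2x2 matrix, treating the second label ('h') as positive."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "sensitivity": tp / (tp + fn),   # recall of 'h'
        "specificity": tn / (tn + fp),   # recall of 'g'
        "accuracy": (tp + tn) / total,
    }

def micro_f1(cm):
    """Micro-averaged F1: pool TP, FP and FN over both classes before dividing."""
    (gg, gh), (hg, hh) = cm              # gh = true 'g' predicted as 'h', and so on
    pooled_tp = gg + hh                  # correct predictions for either class
    pooled_fp = hg + gh                  # false positives for 'g' plus those for 'h'
    pooled_fn = gh + hg                  # false negatives for 'g' plus those for 'h'
    precision = pooled_tp / (pooled_tp + pooled_fp)
    recall = pooled_tp / (pooled_tp + pooled_fn)
    return 2 * precision * recall / (precision + recall)

rates = binary_rates(cm)
for name, value in rates.items():
    print(f"{name} = {value:.4f}")
print(f"micro f1 = {micro_f1(cm):.4f}")
```

Because every misclassified event counts once as a false positive and once as a false negative, the pooled precision and recall both reduce to the accuracy. This is why the `f1-score` line above repeats the accuracy value.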


### Evaluation Metrics: Training Data

In [307]:
print("Classification Report")
print(metrics.classification_report(y_train,y_train_predicted))

print("Confusion Matrix")
print(metrics.confusion_matrix(y_train,y_train_predicted))

Classification Report
              precision    recall  f1-score   support

           g       0.71      0.90      0.80      9872
           h       0.64      0.33      0.43      5344

    accuracy                           0.70     15216
   macro avg       0.68      0.61      0.62     15216
weighted avg       0.69      0.70      0.67     15216

Confusion Matrix
[[8900  972]
 [3592 1752]]


In [308]:
print("Accuracy = ",accuracy_score(y_train,y_train_predicted)*100)
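# Sensitivity and specificity treat 'h' (hadron) as the positive class;
# the counts are copied from the confusion matrix above.
print("Sensitivity = ", 1752/(1752+3592))
print("Specificity = ", 8900/(8900+972))
# With average="micro", the F1 score equals the overall accuracy.
print("f1-score = ",metrics.f1_score(y_train,y_train_predicted, average="micro"))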
print("log loss = ", metrics.log_loss(y_train,y_train_predicted_proba))

Accuracy =  70.00525762355416
Sensitivity =  0.3278443113772455
Specificity =  0.9015397082658023
f1-score =  0.7000525762355415
log loss =  0.8396647741269704
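
The log loss reported above is the mean negative log-probability that the model assigned to each event's true class. A minimal sketch of that calculation in plain Python, using three made-up predictions rather than the MAGIC data; the function name `log_loss_manual` and the clipping constant `eps` are assumptions of this sketch:

```python
import math

def log_loss_manual(y_true, proba, labels, eps=1e-15):
    """Mean negative log-probability assigned to each row's true class."""
    column = {label: i for i, label in enumerate(labels)}
    total = 0.0
    for true_label, row in zip(y_true, proba):
        # Clip so that a probability of exactly 0 or 1 cannot produce log(0).
        p = min(max(row[column[true_label]], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)

# Hypothetical predictions: each row is [P(g), P(h)] for one event.
demo_true = ["g", "h", "g"]
demo_proba = [[0.9, 0.1],
              [0.2, 0.8],
              [0.6, 0.4]]

demo_loss = log_loss_manual(demo_true, demo_proba, labels=["g", "h"])
print("log loss =", demo_loss)

# A confident wrong prediction is penalised far more than a hesitant one.
confident_wrong = log_loss_manual(["h"], [[0.99, 0.01]], labels=["g", "h"])
hesitant_wrong = log_loss_manual(["h"], [[0.60, 0.40]], labels=["g", "h"])
print("confident wrong =", confident_wrong, "hesitant wrong =", hesitant_wrong)
```

A classifier that always predicts 0.5 for both classes scores ln 2, about 0.693, so values above that level suggest that some probabilities are confidently wrong.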


# Logistic Regression 

In [310]:
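# Note: SGDClassifier defaults to loss="hinge", which fits a linear SVM rather than
# logistic regression; loss="log_loss" (scikit-learn 1.1+) would select logistic regression.
# No random_state is set, so results vary slightly between runs.
clf = make_pipeline(StandardScaler(), SGDClassifier(max_iter=1000, tol=1e-3))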
clf.fit(X_train, y_train)
clf_predicted = clf.predict(X_test)
clf_train_predicted=clf.predict(X_train)

print("Test score: ",clf.score(X_test,y_test))
print("Training score: ",clf.score(X_train,y_train))

Test score:  0.7381703470031545
Training score:  0.7383017875920084
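
`SGDClassifier` uses the hinge loss by default, so the pipeline above trains a linear SVM and does not provide `predict_proba`. A minimal sketch of one alternative, assuming scikit-learn's `LogisticRegression` is an acceptable substitute for SGD training. It runs on synthetic data from `make_classification`, not the MAGIC data, and the `*_demo` and `logreg` names are hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with four numeric features (not the telescope data).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=42
)
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42
)

# Scale the features, then fit a logistic regression model.
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(X_demo_train, y_demo_train)

# Logistic regression exposes class probabilities, so log loss can be
# computed from this model's own predictions.
logreg_proba = logreg.predict_proba(X_demo_test)
logreg_log_loss = log_loss(y_demo_test, logreg_proba)

print("Test score: ", logreg.score(X_demo_test, y_demo_test))
print("log loss = ", logreg_log_loss)
```

On the real data, the same pipeline could be fitted on `X_train` and `y_train`. The scores would then differ from the ones reported in this section, which come from the hinge-loss model.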


### Evaluation Metrics: Test Data

In [311]:
print("Classification Report")
print(metrics.classification_report(y_test,clf_predicted))

print("Confusion Matrix")
print(metrics.confusion_matrix(y_test,clf_predicted))

Classification Report
              precision    recall  f1-score   support

           g       0.72      0.98      0.83      2460
           h       0.89      0.30      0.44      1344

    accuracy                           0.74      3804
   macro avg       0.80      0.64      0.64      3804
weighted avg       0.78      0.74      0.69      3804

Confusion Matrix
[[2411   49]
 [ 947  397]]


In [312]:
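print("Accuracy = ",accuracy_score(y_test,clf_predicted)*100)
# Recall of 'h' (sensitivity) and of 'g' (specificity), computed from the predictions
# so that the values always match the confusion matrix above.
print("Sensitivity = ", round(metrics.recall_score(y_test,clf_predicted,pos_label='h'),4))
print("Specificity = ", round(metrics.recall_score(y_test,clf_predicted,pos_label='g'),4))
print("f1-score = ",metrics.f1_score(y_test,clf_predicted, average="micro"))
# Log loss is omitted: SGDClassifier with its default hinge loss has no predict_proba,
# and y_predicted_proba holds the Naive Bayes probabilities.

Accuracy =  73.81703470031546
Sensitivity =  0.2954
Specificity =  0.9801
f1-score =  0.7381703470031546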


### Evaluation Metrics: Training Data

In [313]:
print("Classification Report")
print(metrics.classification_report(y_train,clf_train_predicted))

print("Confusion Matrix")
print(metrics.confusion_matrix(y_train,clf_train_predicted))

Classification Report
              precision    recall  f1-score   support

           g       0.72      0.98      0.83      9872
           h       0.88      0.29      0.44      5344

    accuracy                           0.74     15216
   macro avg       0.80      0.64      0.64     15216
weighted avg       0.78      0.74      0.69     15216

Confusion Matrix
[[9660  212]
 [3770 1574]]


In [314]:
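print("Accuracy = ",accuracy_score(y_train,clf_train_predicted)*100)
# Recall of 'h' (sensitivity) and of 'g' (specificity), computed from the predictions
# so that the values always match the confusion matrix above.
print("Sensitivity = ", round(metrics.recall_score(y_train,clf_train_predicted,pos_label='h'),4))
print("Specificity = ", round(metrics.recall_score(y_train,clf_train_predicted,pos_label='g'),4))
print("f1-score = ",metrics.f1_score(y_train,clf_train_predicted, average="micro"))
# Log loss is omitted: SGDClassifier with its default hinge loss has no predict_proba,
# and y_train_predicted_proba holds the Naive Bayes probabilities.

Accuracy =  73.83017875920083
Sensitivity =  0.2945
Specificity =  0.9785
f1-score =  0.7383017875920085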
