# Tic-Tac-Toe Endgame database

1. Title: Tic-Tac-Toe Endgame database

2. Source Information
   -- Creator: David W. Aha (aha@cs.jhu.edu)
   -- Donor: David W. Aha (aha@cs.jhu.edu)
   -- Date: 19 August 1991
 
3. Known Past Usage: 
   1. Matheus, C. J., & Rendell, L. A. (1989).  Constructive
      induction on decision trees.  In Proceedings of the
      Eleventh International Joint Conference on Artificial Intelligence
      (pp. 645-650).  Detroit, MI: Morgan Kaufmann.
      -- CITRE was applied to 100-instance training and 200-instance test
         sets.  In a study using various amounts of domain-specific
         knowledge, its highest average accuracy was 76.7% (using the
         final decision tree created for testing).

   2. Matheus, C. J. (1990). Adding domain knowledge to SBL through
      feature construction.  In Proceedings of the Eighth National
      Conference on Artificial Intelligence (pp. 803-808).
      Boston, MA: AAAI Press.
      -- Similar experiments with CITRE, includes learning curves up
         to 500-instance training sets but used _all_ instances in the
         database for testing.  Accuracies reached above 90%, but specific
         values are not given (see Chris's dissertation for more details).

   3. Aha, D. W. (1991). Incremental constructive induction: An instance-based
      approach.  In Proceedings of the Eighth International Workshop
      on Machine Learning (pp. 117-121).  Evanston, IL: Morgan Kaufmann.
      -- Used 70% for training, 30% of the instances for testing, evaluated
         over 10 trials.  Results reported for six algorithms:
         -- NewID:   84.0%
         -- CN2:     98.1%  
         -- MBRtalk: 88.4%
         -- IB1:     98.1% 
         -- IB3:     82.0%
         -- IB3-CI:  99.1%
      -- Results also reported when adding an additional 10 irrelevant 
         ternary-valued attributes; similar _relative_ results except that
         IB1's performance degraded more quickly than the others.

4. Relevant Information:

   This database encodes the complete set of possible board configurations
   at the end of tic-tac-toe games, where "x" is assumed to have played
   first.  The target concept is "win for x" (i.e., true when "x" has one
   of 8 possible ways to create a "three-in-a-row").  

   Interestingly, this raw database gives a stripped-down decision tree
   algorithm (e.g., ID3) considerable difficulty.  However, the rule-based
   CN2 algorithm, the simple IB1 instance-based learning algorithm, and the
   CITRE feature-constructing decision tree algorithm perform well on it.
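
The target concept described above ("x" completes one of the 8 possible lines) can be sketched in a few lines of Python. This is only an illustration, not part of the original database documentation: the names `WIN_LINES` and `x_wins` are hypothetical, and the board is assumed to be listed row by row in the attribute order given in section 7.

```python
# The 8 winning lines, as index triples into a row-major 3x3 board
# (index 0 = top-left-square, index 8 = bottom-right-square).
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def x_wins(board):
    """Return True if 'x' occupies all three squares of any winning line."""
    return any(all(board[i] == "x" for i in line) for line in WIN_LINES)


# Two rows taken from the data preview shown later in this notebook.
first_row = "x,x,x,x,o,o,x,o,o".split(",")   # labelled positive
row_953 = "o,x,x,x,o,o,o,x,x".split(",")     # labelled negative

print(x_wins(first_row))  # True
print(x_wins(row_953))    # False
```

The check reproduces the labels of both sample rows, which is what one would expect if the class column is a deterministic function of the nine squares.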

5. Number of Instances: 958 (legal tic-tac-toe endgame boards)

6. Number of Attributes: 9, each corresponding to one tic-tac-toe square

7. Attribute Information: (x=player x has taken, o=player o has taken, b=blank)

    1. top-left-square: {x,o,b}
    2. top-middle-square: {x,o,b}
    3. top-right-square: {x,o,b}
    4. middle-left-square: {x,o,b}
    5. middle-middle-square: {x,o,b}
    6. middle-right-square: {x,o,b}
    7. bottom-left-square: {x,o,b}
    8. bottom-middle-square: {x,o,b}
    9. bottom-right-square: {x,o,b}
   10. Class: {positive,negative}

8. Missing Attribute Values: None

9. Class Distribution: About 65.3% are positive (i.e., wins for "x")


# Importing Basic Libraries

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

# Reading Data

In [4]:
data = pd.read_csv(r"C:\Users\Pragya\Downloads\tic+tac+toe+endgame\tic-tac-toe.data",header = None)
data

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,x,x,x,x,o,o,x,o,o,positive
1,x,x,x,x,o,o,o,x,o,positive
2,x,x,x,x,o,o,o,o,x,positive
3,x,x,x,x,o,o,o,b,b,positive
4,x,x,x,x,o,o,b,o,b,positive
...,...,...,...,...,...,...,...,...,...,...
953,o,x,x,x,o,o,o,x,x,negative
954,o,x,o,x,x,o,x,o,x,negative
955,o,x,o,x,o,x,x,o,x,negative
956,o,x,o,o,x,x,x,o,x,negative


# DATA PREPROCESSING

**Assigning column names to the DataFrame**

In [5]:
data.columns = [
    "top-left-square",
    "top-middle-square",
    "top-right-square",
    "middle-left-square",
    "middle-middle-square",
    "middle-right-square",
    "bottom-left-square",
    "bottom-middle-square",
    "bottom-right-square",
    "Class",
]

In [6]:
data.head()

Unnamed: 0,top-left-square,top-middle-square,top-right-square,middle-left-square,middle-middle-square,middle-right-square,bottom-left-square,bottom-middle-square,bottom-right-square,Class
0,x,x,x,x,o,o,x,o,o,positive
1,x,x,x,x,o,o,o,x,o,positive
2,x,x,x,x,o,o,o,o,x,positive
3,x,x,x,x,o,o,o,b,b,positive
4,x,x,x,x,o,o,b,o,b,positive
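
The column names can also be supplied while the file is being read, through the `names` parameter of `pd.read_csv`. The sketch below assumes the same comma-separated layout as the file above; to stay self-contained it reads two rows copied from the preview out of an in-memory buffer instead of the real file, and `COLUMN_NAMES` is a name introduced only for this example.

```python
import io

import pandas as pd

COLUMN_NAMES = [
    "top-left-square", "top-middle-square", "top-right-square",
    "middle-left-square", "middle-middle-square", "middle-right-square",
    "bottom-left-square", "bottom-middle-square", "bottom-right-square",
    "Class",
]

# Two rows copied from the preview stand in for the real file here.
sample = io.StringIO(
    "x,x,x,x,o,o,x,o,o,positive\n"
    "o,x,x,x,o,o,o,x,x,negative\n"
)

# header=None says the file has no header row; names= labels the columns.
df = pd.read_csv(sample, header=None, names=COLUMN_NAMES)
print(df.shape)
print(df.columns.tolist())
```

Doing both steps in one call avoids a window in which the DataFrame carries the default integer column labels.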


**SHAPE**

In [7]:
data.shape

(958, 10)

There are 958 rows and 10 columns in the data set.

**COLUMNS**

In [8]:
data.columns

Index(['top-left-square', 'top-middle-square', 'top-right-square',
       'middle-left-square', 'middle-middle-square', 'middle-right-square',
       'bottom-left-square', 'bottom-middle-square', 'bottom-right-square',
       'Class'],
      dtype='object')

**DESCRIPTIVE STATISTICS**

In [9]:
data.describe()

Unnamed: 0,top-left-square,top-middle-square,top-right-square,middle-left-square,middle-middle-square,middle-right-square,bottom-left-square,bottom-middle-square,bottom-right-square,Class
count,958,958,958,958,958,958,958,958,958,958
unique,3,3,3,3,3,3,3,3,3,2
top,x,x,x,x,x,x,x,x,x,positive
freq,418,378,418,378,458,378,418,378,418,626


Total Rows: 958

Total Columns: 10 (all categorical)

Column Types: All columns are of type object (categorical data).

Unique Values per Column:

Each of the first 9 columns (representing Tic-Tac-Toe board positions) has 3 unique values (x, o, and b (blank)).

The last column (target variable) has 2 unique values: positive (626 occurrences) and negative (332 occurrences).

Most Frequent Values:

The x symbol is the most frequent value in all nine board positions, and is most common in the centre square (458 of 958 rows).

The dataset has more positive outcomes than negative ones.

In [10]:
data.describe(include="all")

Unnamed: 0,top-left-square,top-middle-square,top-right-square,middle-left-square,middle-middle-square,middle-right-square,bottom-left-square,bottom-middle-square,bottom-right-square,Class
count,958,958,958,958,958,958,958,958,958,958
unique,3,3,3,3,3,3,3,3,3,2
top,x,x,x,x,x,x,x,x,x,positive
freq,418,378,418,378,458,378,418,378,418,626


**SUMMARY INFORMATION**

In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 958 entries, 0 to 957
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   top-left-square       958 non-null    object
 1   top-middle-square     958 non-null    object
 2   top-right-square      958 non-null    object
 3   middle-left-square    958 non-null    object
 4   middle-middle-square  958 non-null    object
 5   middle-right-square   958 non-null    object
 6   bottom-left-square    958 non-null    object
 7   bottom-middle-square  958 non-null    object
 8   bottom-right-square   958 non-null    object
 9   Class                 958 non-null    object
dtypes: object(10)
memory usage: 75.0+ KB


**checking for missing values**

In [12]:
data.isnull().sum()

top-left-square         0
top-middle-square       0
top-right-square        0
middle-left-square      0
middle-middle-square    0
middle-right-square     0
bottom-left-square      0
bottom-middle-square    0
bottom-right-square     0
Class                   0
dtype: int64

There are no missing values in the dataset

**class distribution of target variable**

In [13]:
data["Class"].value_counts()

Class
positive    626
negative    332
Name: count, dtype: int64

The dataset is small, and the negative class (332 rows) is only about half the size of the positive class (626 rows), so the classes are moderately imbalanced.
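
The class counts above also fix a simple reference point: the accuracy of a model that always predicts the majority class. The arithmetic is sketched below using only the two counts printed by `value_counts()`; the variable names are chosen for this example.

```python
# Class counts as printed by data["Class"].value_counts() above.
counts = {"positive": 626, "negative": 332}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the majority class is right this often.
majority_label = max(counts, key=counts.get)
baseline_accuracy = shares[majority_label]

print(total)                        # 958
print(majority_label)               # positive
print(round(baseline_accuracy, 3))  # 0.653
```

Any model worth keeping should beat this figure of roughly 65% on comparable data.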

**DATA TYPES**

In [15]:
data.dtypes

top-left-square         object
top-middle-square       object
top-right-square        object
middle-left-square      object
middle-middle-square    object
middle-right-square     object
bottom-left-square      object
bottom-middle-square    object
bottom-right-square     object
Class                   object
dtype: object

**DUPLICATED**

In [17]:
data.duplicated().sum()

0

There are no duplicated entries in this data set

In [14]:
for i in data.columns:
    print({i:data[i].unique()})


{'top-left-square': array(['x', 'o', 'b'], dtype=object)}
{'top-middle-square': array(['x', 'o', 'b'], dtype=object)}
{'top-right-square': array(['x', 'o', 'b'], dtype=object)}
{'middle-left-square': array(['x', 'o', 'b'], dtype=object)}
{'middle-middle-square': array(['o', 'b', 'x'], dtype=object)}
{'middle-right-square': array(['o', 'b', 'x'], dtype=object)}
{'bottom-left-square': array(['x', 'o', 'b'], dtype=object)}
{'bottom-middle-square': array(['o', 'x', 'b'], dtype=object)}
{'bottom-right-square': array(['o', 'x', 'b'], dtype=object)}
{'Class': array(['positive', 'negative'], dtype=object)}


The unique categories in the nine board columns are x, o, and b.

**Converting categorical variables to numerical variables**

In [31]:
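from sklearn.preprocessing import LabelEncoder

# collect the names of all object-typed (categorical) columns
colname = []
for x in data.columns:
    if data[x].dtype == 'object':
        colname.append(x)
print(colname)

le = LabelEncoder()

# replace each categorical column with integer codes;
# the encoder is refit on every column, so only the last mapping is retained
for x in colname:
    data[x] = le.fit_transform(data[x])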

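[]

The printed list is empty, presumably because this cell was re-run after the columns had already been encoded; on a first run it would list all ten column names.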


In [32]:
data.head()

Unnamed: 0,top-left-square,top-middle-square,top-right-square,middle-left-square,middle-middle-square,middle-right-square,bottom-left-square,bottom-middle-square,bottom-right-square,Class
0,2,2,2,2,1,1,2,1,1,1
1,2,2,2,2,1,1,1,2,1,1
2,2,2,2,2,1,1,1,1,2,1
3,2,2,2,2,1,1,1,0,0,1
4,2,2,2,2,1,1,0,1,0,1
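
The integer codes shown above follow from the fact that `LabelEncoder` numbers the distinct labels in sorted order. The plain-Python sketch below imitates that rule without scikit-learn; `sorted_label_codes` is a hypothetical helper written for this illustration, not a library function.

```python
def sorted_label_codes(values):
    """Map each distinct label to an integer, numbering labels in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


square_codes = sorted_label_codes(["x", "o", "b"])
class_codes = sorted_label_codes(["positive", "negative"])

print(square_codes)  # {'b': 0, 'o': 1, 'x': 2}
print(class_codes)   # {'negative': 0, 'positive': 1}

# Encoding the first row of the preview reproduces the first encoded row.
first_row = "x,x,x,x,o,o,x,o,o".split(",")
print([square_codes[v] for v in first_row])  # [2, 2, 2, 2, 1, 1, 2, 1, 1]
```

Note that these codes impose an order (b < o < x) that has no meaning on the board. Tree-based models can split around it, but a linear model treats the codes as magnitudes, so one-hot encoding is a common alternative for such features.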


**SEPARATING FEATURES AND TARGET**

In [20]:
X = data.values[:,:-1]
Y = data.values[:,-1]

**SPLITTING DATA INTO TRAIN AND TEST**

In [21]:
# split the data into test and train
from sklearn.model_selection import train_test_split

X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2,random_state = 10)  

**APPLYING SMOTE**

In [22]:
print("Before OverSampling, counts of label '1': ", (sum(Y_train == 1)))
print("Before OverSampling, counts of label '0': ", (sum(Y_train == 0)))
  
# import SMOTE from imblearn library
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 10,k_neighbors=5)
X_train_res, Y_train_res = sm.fit_resample(X_train, Y_train)
  
print('After OverSampling, the shape of train_X: ', (X_train_res.shape))
print('After OverSampling, the shape of train_y: ', (Y_train_res.shape))
  
print("After OverSampling, counts of label '1': ", (sum(Y_train_res == 1)))
print("After OverSampling, counts of label '0': ", (sum(Y_train_res == 0)))

Before OverSampling, counts of label '1':  500
Before OverSampling, counts of label '0':  266
After OverSampling, the shape of train_X:  (1000, 9)
After OverSampling, the shape of train_y:  (1000,)
After OverSampling, counts of label '1':  500
After OverSampling, counts of label '0':  500
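
SMOTE creates each synthetic row by interpolating between a minority-class row and one of its nearest neighbours: `new = sample + lam * (neighbour - sample)` with `lam` drawn from [0, 1]. The sketch below applies that formula by hand to show what it does to label-encoded boards. The two rows are taken from the encoded preview only because their codes are visible there (SMOTE itself would pair rows of the negative class), and `lam = 0.5` is an arbitrary choice.

```python
def interpolate(sample, neighbour, lam):
    """SMOTE-style interpolation: a point on the segment from sample to neighbour."""
    return [a + lam * (b - a) for a, b in zip(sample, neighbour)]


# Encoded boards (b=0, o=1, x=2), copied from rows 3 and 4 of the preview.
row_3 = [2, 2, 2, 2, 1, 1, 1, 0, 0]
row_4 = [2, 2, 2, 2, 1, 1, 0, 1, 0]

synthetic = interpolate(row_3, row_4, lam=0.5)
print(synthetic)

# Number of synthetic rows SMOTE had to add to balance the training set.
added_rows = 500 - 266
print(added_rows)  # 234
```

The interpolated row contains the value 0.5 in two squares, which corresponds to neither blank nor "o". For purely categorical features, imbalanced-learn also offers `SMOTEN`, which may be a better fit; whether it changes the results here has not been tested.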


# BUILDING MODEL

# LOGISTIC REGRESSION

**APPLYING LOGISTIC REGRESSION**

**CREATING MODEL OBJECT**

In [59]:
from sklearn.linear_model import LogisticRegression
# create a model
classifier=LogisticRegression(random_state=10)

**FITTING THE TRAINING DATASET**

In [60]:
classifier.fit(X_train_res,Y_train_res)


**FINDING BETA PARAMETERS**

In [61]:
print(classifier.intercept_)   # intercept is beta-naught (beta_0)
print(classifier.coef_)     # coefficients beta_1 ... beta_9, one per board square

[-2.09678202]
[[ 0.26846947 -0.05358319  0.31160424 -0.01274917  0.75508552 -0.00739829
   0.29497477 -0.01122891  0.18625504]]
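
To see how these parameters turn a board into a prediction, the sketch below recomputes the logistic model by hand: the linear score `z = beta_0 + sum(beta_i * x_i)` is passed through the sigmoid function. The coefficients are copied from the output above; the input is the first encoded row of the data preview, used purely as an example.

```python
import math

# Parameters copied from the printed intercept_ and coef_ above.
intercept = -2.09678202
coef = [0.26846947, -0.05358319, 0.31160424, -0.01274917, 0.75508552,
        -0.00739829, 0.29497477, -0.01122891, 0.18625504]

# First encoded row of the preview (b=0, o=1, x=2).
first_row = [2, 2, 2, 2, 1, 1, 2, 1, 1]

# Linear score followed by the sigmoid gives P(class = 1).
z = intercept + sum(w * v for w, v in zip(coef, first_row))
p_positive = 1.0 / (1.0 + math.exp(-z))

print(round(z, 3), round(p_positive, 3))  # roughly 0.44 and 0.61
```

The centre square carries by far the largest weight, while the four edge squares have coefficients close to zero.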


**PREDICTING THE VALUES**

In [62]:
Y_pred=classifier.predict(X_test)
print(Y_pred)

[1 1 0 0 1 0 1 1 1 1 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 1 1 1 1
 1 1 1 1 1 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 1 1 0 1 0 1 1 0 1
 0 0 0 1 1 1 0 1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 1 1 0 0 1 0
 1 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 1 0 1 0 0 1 1 0 1 1
 1 1 1 1 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 0 0 1 1 0
 0 1 1 1 1 0 1]


**EVALUATION METRICS**

In [63]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
 
# confusion matrix
cfm=confusion_matrix(Y_test,Y_pred)
print(cfm)
 
print("Classification report: ")
 
print(classification_report(Y_test,Y_pred))


# accuracy_score
acc=accuracy_score(Y_test, Y_pred)
print("Accuracy of the model: ",acc)

[[36 30]
 [43 83]]
Classification report: 
              precision    recall  f1-score   support

           0       0.46      0.55      0.50        66
           1       0.73      0.66      0.69       126

    accuracy                           0.62       192
   macro avg       0.60      0.60      0.60       192
weighted avg       0.64      0.62      0.63       192

Accuracy of the model:  0.6197916666666666
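
Every number in the classification report can be derived from the four cells of the confusion matrix. The sketch below redoes that arithmetic for the matrix printed above, assuming scikit-learn's layout (rows are true classes, columns are predicted classes).

```python
# Confusion matrix printed above: rows = true class, columns = predicted class.
cfm = [[36, 30],
       [43, 83]]

tn, fp = cfm[0]   # true class 0: predicted 0, predicted 1
fn, tp = cfm[1]   # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(round(accuracy, 4))                            # 0.6198
print(round(precision_1, 2), round(recall_1, 2))     # 0.73 0.66
print(round(precision_0, 2), round(recall_0, 2))     # 0.46 0.55

# Accuracy of always predicting class 1 on this test set.
print(round((fn + tp) / (tn + fp + fn + tp), 3))     # 0.656
```

Since 126 of the 192 test rows are positive, always predicting class 1 would score about 65.6%, so this logistic regression does slightly worse than the majority-class baseline.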


**TUNING LOGISTIC REGRESSION**

In [64]:
y_pred_prob = classifier.predict_proba(X_test)
#print(y_pred_prob)


y_pred_class = []
for value in y_pred_prob[:,1]:
    if value >0.5:                         # if the probability is greater than 0.5, predict class 1
        y_pred_class.append(1)
    else:
        y_pred_class.append(0)              # otherwise (0.5 or less), predict class 0


from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
 
# confusion matrix
cfm=confusion_matrix(Y_test,y_pred_class)
print(cfm)
 
print("Classification report: ")
 
print(classification_report(Y_test,y_pred_class))


# accuracy_score
acc=accuracy_score(Y_test, y_pred_class)
print("Accuracy of the model: ",acc)


[[36 30]
 [43 83]]
Classification report: 
              precision    recall  f1-score   support

           0       0.46      0.55      0.50        66
           1       0.73      0.66      0.69       126

    accuracy                           0.62       192
   macro avg       0.60      0.60      0.60       192
weighted avg       0.64      0.62      0.63       192

Accuracy of the model:  0.6197916666666666


Tuning the threshold changes nothing here: a cutoff of 0.5 reproduces the default predictions exactly.
A different cutoff is also hard to justify, because neither outcome (a win for x or not) is treated as the class of interest.
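
For reference, trying other cutoffs only requires turning the thresholding loop into a function and calling it with several values. The sketch below does this on a handful of made-up probabilities; `apply_threshold` and `example_probs` are hypothetical and stand in for `y_pred_prob[:,1]`.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Predict class 1 when the probability is strictly above the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical class-1 probabilities, not taken from the fitted model.
example_probs = [0.91, 0.48, 0.62, 0.35, 0.50]

for t in (0.4, 0.5, 0.6):
    print(t, apply_threshold(example_probs, t))
```

Lowering the cutoff moves borderline rows into class 1 (higher recall, lower precision for that class), and raising it does the opposite.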

# APPLYING DECISION TREE

**CALLING MODEL OBJECT**

In [85]:
from sklearn.tree import DecisionTreeClassifier

# create a model
model_DT = DecisionTreeClassifier(random_state=10, criterion="gini", max_depth=15, min_samples_leaf=1)

**FITTING MODEL OBJECT**

In [86]:
# fitting training data to the model
model_DT.fit(X_train_res,Y_train_res)


**PREDICTING VALUES OF X_TEST**

In [87]:
Y_pred = model_DT.predict(X_test)
print(Y_pred)

[1 0 0 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 0 0
 0 1 1 0 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 1 1 1 0 1 0 1 1 1 0 1 1
 1 1 1 1 1 1 0 1 1 0 1 1 0 0 0 1 0 1 1 1 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1
 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 0 1 0 0 1 0 1 0 1 0 1
 1 1 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 0 1 1 0 1 1 1 1 0 1 0 0 0 1 0 0 0 1 1 1
 0 1 0 1 1 1 0]


**CHECKING MODEL IS OVER FITTED OR NOT**

In [88]:
model_DT.score(X_train_res,Y_train_res)

1.0

The training accuracy is 100%, which suggests the tree may be overfitted to the resampled training data.

**EVALUATION METRICS**

In [70]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
 
# confusion matrix
cfm=confusion_matrix(Y_test,Y_pred)
print(cfm)
 
print("Classification report: ")
 
print(classification_report(Y_test,Y_pred))


# accuracy_score
acc=accuracy_score(Y_test, Y_pred)
print("Accuracy of the model: ",acc)

[[ 62   4]
 [ 11 115]]
Classification report: 
              precision    recall  f1-score   support

           0       0.85      0.94      0.89        66
           1       0.97      0.91      0.94       126

    accuracy                           0.92       192
   macro avg       0.91      0.93      0.92       192
weighted avg       0.93      0.92      0.92       192

Accuracy of the model:  0.921875


# PRUNING DECISION TREE

**CREATING MODEL OBJECT**

In [89]:
from sklearn.tree import DecisionTreeClassifier

# create a model
model_DT = DecisionTreeClassifier(random_state=10, criterion="gini", max_depth=15, min_samples_leaf=1)

**fitting training data to the model**

In [90]:
model_DT.fit(X_train,Y_train)

**PREDICTING VALUES OF X_TEST**

In [91]:
Y_pred = model_DT.predict(X_test)
print(Y_pred)

[1 0 0 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0
 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1
 1 0 1 1 1 1 0 1 1 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 1 0 0 0 1 1 1 1
 0 0 1 1 0 1 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 1 1 1 1 0 1 0 1 0 1 1 1
 1 1 1 1 1 1 0 1 0 0 1 0 1 0 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 0 1 0 0 0 1 1 1
 0 1 0 1 1 1 0]


**CHECKING MODEL IS OVERFITTED OR NOT**

In [92]:
model_DT.score(X_train_res,Y_train_res)

0.976

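The score is 97.6% rather than 100%, but this does not by itself show that the model is free of overfitting: the model was fitted on X_train, while the score is computed on X_train_res, which also contains SMOTE-generated rows the model never saw. Note also that max_depth and min_samples_leaf are the same as in the first tree, so the difference between the two runs comes from the training data (with or without SMOTE) rather than from pruning.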

**EVALUATION METRICS**

In [74]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
 
# confusion matrix
cfm=confusion_matrix(Y_test,Y_pred)
print(cfm)
 
print("Classification report: ")
 
print(classification_report(Y_test,Y_pred))


# accuracy_score
acc=accuracy_score(Y_test, Y_pred)
print("Accuracy of the model: ",acc)

[[ 59   7]
 [  6 120]]
Classification report: 
              precision    recall  f1-score   support

           0       0.91      0.89      0.90        66
           1       0.94      0.95      0.95       126

    accuracy                           0.93       192
   macro avg       0.93      0.92      0.92       192
weighted avg       0.93      0.93      0.93       192

Accuracy of the model:  0.9322916666666666


# APPLYING RANDOM FOREST

**CREATING RANDOM FOREST OBJECT**

In [75]:
from sklearn.ensemble import RandomForestClassifier

model_RandomForest=RandomForestClassifier(n_estimators=500,
                                          random_state=10, bootstrap=True,
                                         n_jobs=-1)

**FITTING MODEL OBJECT**

In [76]:
#fit the model on the data and predict the values
model_RandomForest.fit(X_train_res,Y_train_res)

**PREDICTING VALUES OF X_TEST**

In [77]:
Y_pred=model_RandomForest.predict(X_test)
print(Y_pred)

[1 0 0 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 0 0
 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1
 1 1 1 1 1 1 0 1 1 0 1 1 0 0 0 1 0 1 1 1 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1
 1 0 1 1 0 0 0 0 1 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1 1 1 1 0 1 1 1 0 1 1 1
 1 1 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1 1 1
 0 1 0 1 1 1 0]


**EVALUATION METRICS**

In [78]:
# confusion matrix
cfm=confusion_matrix(Y_test,Y_pred)
print(cfm)
 
print("Classification report: ")
 
print(classification_report(Y_test,Y_pred))


# accuracy_score
acc=accuracy_score(Y_test, Y_pred)
print("Accuracy of the model: ",acc)

[[ 61   5]
 [  0 126]]
Classification report: 
              precision    recall  f1-score   support

           0       1.00      0.92      0.96        66
           1       0.96      1.00      0.98       126

    accuracy                           0.97       192
   macro avg       0.98      0.96      0.97       192
weighted avg       0.97      0.97      0.97       192

Accuracy of the model:  0.9739583333333334


# PRUNING RANDOM FOREST

**CREATING RANDOM FOREST MODEL OBJECT**

In [79]:
from sklearn.ensemble import RandomForestClassifier

model_RandomForest=RandomForestClassifier(n_estimators=500,
                                          random_state=10, bootstrap=True,
                                         n_jobs=-1,max_depth=15, min_samples_leaf=1)

**FITTING MODEL OBJECT**

In [80]:
model_RandomForest.fit(X_train,Y_train)

**PREDICTING VALUES OF X_TEST**

In [81]:
Y_pred=model_RandomForest.predict(X_test)
print(Y_pred)

[1 0 0 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 0 0
 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 1 0 1 1 1 0 1 1
 1 1 1 1 1 1 0 1 1 0 1 1 0 0 0 1 0 1 1 1 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1
 1 0 1 1 0 1 0 0 1 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1 1 1 1 0 1 1 1 0 1 1 1
 1 1 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1 1 1
 0 1 0 1 1 1 0]


**EVALUATION METRICS**

In [82]:
# confusion matrix
cfm=confusion_matrix(Y_test,Y_pred)
print(cfm)
 
print("Classification report: ")
 
print(classification_report(Y_test,Y_pred))


# accuracy_score
acc=accuracy_score(Y_test, Y_pred)
print("Accuracy of the model: ",acc)

[[ 58   8]
 [  1 125]]
Classification report: 
              precision    recall  f1-score   support

           0       0.98      0.88      0.93        66
           1       0.94      0.99      0.97       126

    accuracy                           0.95       192
   macro avg       0.96      0.94      0.95       192
weighted avg       0.95      0.95      0.95       192

Accuracy of the model:  0.953125


# FINDING THE BEST PARAMETERS WITH GRID SEARCH

In [None]:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV  # RandomizedSearchCV is an alternative

# fixed parameters are passed to the estimator here
model_EXT = ExtraTreesClassifier(random_state=10, bootstrap=True)

# parameters to be searched by trial and error are listed here
parameter_space = {
    'n_estimators': [100, 300, 500, 1000],       # np.arange(100, 1001, 50),
    'max_depth': [10, 15, 8, 12],
    'min_samples_leaf': [1, 3, 4, 5, 6, 7]
    }
clf = GridSearchCV(model_EXT, parameter_space, n_jobs=-1, cv=5)

# run 5-fold cross-validation for every parameter combination
clf.fit(X_train, Y_train)

print('Best parameters found:\n', clf.best_params_)

# mean cross-validated accuracy of the best combination
clf.best_score_
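
The cost of this search can be estimated before running it, since `GridSearchCV` tries every combination in the grid once per cross-validation fold. The sketch below counts the combinations for the grid above with `itertools.product`; it only does the arithmetic and does not fit any model.

```python
from itertools import product

# Same grid as in the cell above.
parameter_space = {
    'n_estimators': [100, 300, 500, 1000],
    'max_depth': [10, 15, 8, 12],
    'min_samples_leaf': [1, 3, 4, 5, 6, 7],
}
cv_folds = 5

# Every combination of one value per parameter is one candidate model.
candidates = list(product(*parameter_space.values()))
n_candidates = len(candidates)
n_cv_fits = n_candidates * cv_folds

print(n_candidates)  # 96
print(n_cv_fits)     # 480
```

With the default `refit=True`, one more fit on the full training set follows the 480 cross-validation fits. Note that the search tunes an `ExtraTreesClassifier`, so its best parameters do not transfer directly to the random forest models above.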

# CONCLUSION

In [None]:
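Of all the models, the random forests give the best results. The depth-limited ("pruned") random forest trained on the original data reaches 95.3% test accuracy, shown below; the random forest trained on the SMOTE-resampled data scored higher on the same test split, at 97.4%.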

In [None]:
[[ 58   8]
 [  1 125]]
Classification report: 
              precision    recall  f1-score   support

           0       0.98      0.88      0.93        66
           1       0.94      0.99      0.97       126

    accuracy                           0.95       192
   macro avg       0.96      0.94      0.95       192
weighted avg       0.95      0.95      0.95       192

Accuracy of the model:  0.953125
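
To compare all five runs side by side, the test accuracies can be recomputed from the confusion matrices printed in the sections above. The sketch below does only that; the model labels are descriptions chosen for this summary, and the `accuracy` helper is written for this example.

```python
# Confusion matrices copied from the evaluation outputs above
# (rows = true class, columns = predicted class).
confusion_matrices = {
    "Logistic regression (SMOTE)": [[36, 30], [43, 83]],
    "Decision tree (SMOTE)": [[62, 4], [11, 115]],
    "Decision tree (original data)": [[59, 7], [6, 120]],
    "Random forest (SMOTE)": [[61, 5], [0, 126]],
    "Random forest (max_depth=15, original data)": [[58, 8], [1, 125]],
}


def accuracy(cfm):
    """Share of test rows on the diagonal of a 2x2 confusion matrix."""
    correct = cfm[0][0] + cfm[1][1]
    return correct / sum(sum(row) for row in cfm)


scores = {name: accuracy(cfm) for name, cfm in confusion_matrices.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.4f}  {name}")

best_model = max(scores, key=scores.get)
```

All five figures come from the same 192-row test split and a single random seed, so small differences between the tree-based models should be read with caution.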


In [None]:
As this basic model gives the g