# Loading Data

Here we use the Pima Indians diabetes data set.

In [4]:
import pandas as pd

# list for column headers
names = ['preg','plas','pres','skin','test','mass','pedi','age','class']

# Open file with pd.read_csv
df = pd.read_csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv", names=names)
                
print(df.shape)

# print head of data set  
print(df.head())                

(768, 9)
   preg  plas  pres  skin  test  mass   pedi  age  class
0     6   148    72    35     0  33.6  0.627   50      1
1     1    85    66    29     0  26.6  0.351   31      0
2     8   183    64     0     0  23.3  0.672   32      1
3     1    89    66    23    94  28.1  0.167   21      0
4     0   137    40    35   168  43.1  2.288   33      1


# Creating a Random Forest Model

We are trying to predict whether a patient has diabetes. This corresponds to the ‘class’ column, 
which will be our dependent (target) variable. 
We’ll use all the other columns as features for our model.

In [5]:
X = df.drop('class',axis = 1)
y = df['class']

In [7]:
# Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split

# implementing train-test-split
X_train,X_test,y_train,y_test = train_test_split(X,y,
test_size = 0.33,random_state = 66)
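
As a rough check on the split above, the sizes of the two sets can be worked out by hand. This is a minimal sketch, assuming that train_test_split rounds the test set size up to a whole number of rows and gives the rest to training; the variable names are illustrative only.

```python
import math

n_samples = 768   # number of rows, from df.shape
test_size = 0.33  # fraction passed to train_test_split

# Assumption: the test set size is rounded up, the remainder is used for training.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

print(n_train, n_test)  # 514 254
```

The 254 test rows agree with the total support shown in the classification report further down.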

In [11]:
from sklearn.ensemble import RandomForestClassifier

# create the random forest model and fit it on the training set
# (no random_state is set, so results will vary slightly between runs)
rfc = RandomForestClassifier()
rfc.fit(X_train,y_train)

# predictions
rfc_predict = rfc.predict(X_test)



# Evaluating Performance

In [12]:
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report,confusion_matrix

In [13]:
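# Running 10-fold cross validation on the full data set, scored by ROC AUC
# (cross_val_score refits a fresh copy of the model on each fold)
rfc_cv_score = cross_val_score(rfc,X,y,cv = 10,scoring = 'roc_auc')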

In [14]:
# printing the results
print("=== Confusion matrix ===")
print(confusion_matrix(y_test,rfc_predict))

print("=== Classification Report ===")
print(classification_report(y_test,rfc_predict))

print("=== All AUC Scores ===")

print(rfc_cv_score)
print('\n')

print("=== Mean AUC Score ===")
print("Mean AUC Score - Random Forest: ",rfc_cv_score.mean())

=== Confusion matrix ===
[[149  27]
 [ 35  43]]
=== Classification Report ===
              precision    recall  f1-score   support

           0       0.81      0.85      0.83       176
           1       0.61      0.55      0.58        78

   micro avg       0.76      0.76      0.76       254
   macro avg       0.71      0.70      0.70       254
weighted avg       0.75      0.76      0.75       254

=== All AUC Scores ===
[0.81296296 0.83296296 0.78407407 0.67740741 0.77592593 0.7962963
 0.8437037  0.88666667 0.76423077 0.81615385]


=== Mean AUC Score ===
Mean AUC Score - Random Forest:  0.7990384615384616


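Here, the mean AUC score across the 10 cross-validation folds is about 0.799. Note that AUC is not the same as accuracy: it measures how well the model ranks positive cases above negative ones. The accuracy on the held-out test set, from the classification report, is 0.76.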
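
As a quick sanity check, the test-set metrics can be recomputed by hand from the confusion matrix printed above, and the mean AUC from the ten fold scores. The sketch below is plain Python that copies the numbers from the output; the variable names are illustrative and not part of the original notebook.

```python
# Counts copied from the confusion matrix printed above.
# Rows are true classes (0, 1); columns are predicted classes (0, 1).
tn, fp = 149, 27
fn, tp = 35, 43

total = tn + fp + fn + tp          # size of the test set
accuracy = (tn + tp) / total       # share of correct predictions
precision_1 = tp / (tp + fp)       # precision for class 1
recall_1 = tp / (tp + fn)          # recall for class 1

# Fold scores copied from the "All AUC Scores" output.
auc_scores = [0.81296296, 0.83296296, 0.78407407, 0.67740741, 0.77592593,
              0.7962963, 0.8437037, 0.88666667, 0.76423077, 0.81615385]
mean_auc = sum(auc_scores) / len(auc_scores)

print(f"accuracy:    {accuracy:.2f}")     # 0.76
print(f"precision 1: {precision_1:.2f}")  # 0.61
print(f"recall 1:    {recall_1:.2f}")     # 0.55
print(f"mean AUC:    {mean_auc:.3f}")     # 0.799
```

The accuracy (0.76) and the mean AUC (0.799) come from different procedures: the first from a single train/test split, the second from 10-fold cross validation on the full data set, so they should not be read as the same quantity.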