<h1 align=center style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Diabetes Prediction for Pima Women
</font>
</h1>


<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=2.5>
    The goal is to build a predictive model for diagnosing diabetes in female patients who are at least 21 years old and of Pima Indian heritage. The model should predict whether a patient has diabetes (Diagnosis = 1) or does not have diabetes (Diagnosis = 0) from several diagnostic measurements: number of pregnancies, glucose level, blood pressure, skin thickness, insulin level, BMI, diabetes pedigree function, and age.
</font>
</p>

<h2 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Libraries
</font>
</h2>
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    Import the required libraries:
</font>
</p>

In [25]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

<h2 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Read data
</font>
</h2>
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    Read the diabetes data from the CSV file:
</font>
</p>

In [4]:
df = pd.read_csv('Diabetes_prediction.csv')
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Diagnosis
0,2,115.863387,56.410731,24.336736,94.385783,26.45594,0.272682,20.100494,0
1,2,92.490122,70.61552,23.443591,138.652426,23.910167,0.66516,44.912281,0
2,1,88.141469,63.262618,23.404364,149.358082,21.94825,0.676022,48.247873,1
3,2,108.453101,67.793632,20.75158,108.751638,24.209304,0.289636,42.749868,0
4,1,127.849443,94.725685,22.603078,25.269987,32.997477,0.601315,32.797789,0
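
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    Before transforming the data it can help to check its shape, missing values, and class balance. The sketch below is only an illustration: it rebuilds the five rows printed above by hand (the name <code>sample</code> is hypothetical) instead of reading the CSV file, so the same calls would need to be run on <code>df</code> to say anything about the full dataset.
</font>
</p>

```python
import pandas as pd

# The five rows shown by df.head() above, typed in by hand for illustration
sample = pd.DataFrame({
    'Pregnancies': [2, 2, 1, 2, 1],
    'Glucose': [115.863387, 92.490122, 88.141469, 108.453101, 127.849443],
    'BloodPressure': [56.410731, 70.61552, 63.262618, 67.793632, 94.725685],
    'SkinThickness': [24.336736, 23.443591, 23.404364, 20.75158, 22.603078],
    'Insulin': [94.385783, 138.652426, 149.358082, 108.751638, 25.269987],
    'BMI': [26.45594, 23.910167, 21.94825, 24.209304, 32.997477],
    'DiabetesPedigreeFunction': [0.272682, 0.66516, 0.676022, 0.289636, 0.601315],
    'Age': [20.100494, 44.912281, 48.247873, 42.749868, 32.797789],
    'Diagnosis': [0, 0, 1, 0, 0],
})

# Number of rows and columns
n_rows, n_cols = sample.shape

# Count of missing values in each column
missing_per_column = sample.isna().sum()

# How many rows fall into each class of the target column
class_counts = sample['Diagnosis'].value_counts()

print(n_rows, n_cols)
print(missing_per_column)
print(class_counts)
```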


<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    Round the float values to integers, except for DiabetesPedigreeFunction:
</font>
</p>

In [6]:
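def to_int(column):
    # df.apply passes one column at a time; round each value to the nearest integer
    return column.apply(lambda value: round(value))

# Keep the original float values of DiabetesPedigreeFunction
DiabetesPedigreeFunction = df['DiabetesPedigreeFunction']
df = df.apply(to_int)
# Restore the unrounded column after rounding everything else
df['DiabetesPedigreeFunction'] = DiabetesPedigreeFunction
df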

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Diagnosis
0,2,116,56,24,94,26,0.272682,20,0
1,2,92,71,23,139,24,0.665160,45,0
2,1,88,63,23,149,22,0.676022,48,1
3,2,108,68,21,109,24,0.289636,43,0
4,1,128,95,23,25,33,0.601315,33,0
...,...,...,...,...,...,...,...,...,...
995,1,103,41,25,44,26,0.455884,20,0
996,1,61,64,25,112,19,0.250560,44,1
997,0,98,64,22,108,23,0.761463,59,1
998,0,67,56,25,220,32,0.382877,47,0
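
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    The rounding step above can also be written without a helper function. The following is a minimal sketch, assuming a small hand-typed frame (<code>raw</code>, a hypothetical name) that holds the first two rows of four of the columns; it uses <code>DataFrame.round</code> and <code>DataFrame.astype</code> with per-column dictionaries so that DiabetesPedigreeFunction is left untouched.
</font>
</p>

```python
import pandas as pd

# First two rows of the raw data shown earlier, restricted to four columns
raw = pd.DataFrame({
    'Glucose': [115.863387, 92.490122],
    'BMI': [26.45594, 23.910167],
    'DiabetesPedigreeFunction': [0.272682, 0.66516],
    'Age': [20.100494, 44.912281],
})

# Every column except DiabetesPedigreeFunction gets rounded
int_cols = raw.columns.drop('DiabetesPedigreeFunction')

# Round the selected columns to 0 decimals, then cast only those columns to int
rounded = (
    raw.round({col: 0 for col in int_cols})
       .astype({col: int for col in int_cols})
)

print(rounded)
```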


<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    Convert the DataFrame to NumPy arrays. The first eight columns are the features (x) and the last column, Diagnosis, is the target (y):
</font>
</p>

In [28]:
x = np.array(df.iloc[:, 0:8])
y = np.array(df.iloc[:,8])
x

array([[  2.        , 116.        ,  56.        , ...,  26.        ,
          0.2726819 ,  20.        ],
       [  2.        ,  92.        ,  71.        , ...,  24.        ,
          0.66515966,  45.        ],
       [  1.        ,  88.        ,  63.        , ...,  22.        ,
          0.67602165,  48.        ],
       ...,
       [  0.        ,  98.        ,  64.        , ...,  23.        ,
          0.76146319,  59.        ],
       [  0.        ,  67.        ,  56.        , ...,  32.        ,
          0.38287689,  47.        ],
       [  0.        ,  88.        ,  69.        , ...,  31.        ,
          0.60582821,  42.        ]])
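
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    Positional slicing depends on the column order of the file. As a possible alternative, the features and target can be selected by name. The sketch below assumes a hand-typed frame (<code>df_demo</code>, a hypothetical name) holding the first three rounded rows shown earlier.
</font>
</p>

```python
import pandas as pd

# First three rows of the rounded data, typed in by hand for illustration
df_demo = pd.DataFrame({
    'Pregnancies': [2, 2, 1],
    'Glucose': [116, 92, 88],
    'BloodPressure': [56, 71, 63],
    'SkinThickness': [24, 23, 23],
    'Insulin': [94, 139, 149],
    'BMI': [26, 24, 22],
    'DiabetesPedigreeFunction': [0.272682, 0.665160, 0.676022],
    'Age': [20, 45, 48],
    'Diagnosis': [0, 0, 1],
})

target = 'Diagnosis'

# Everything except the target column becomes the feature matrix
x_demo = df_demo.drop(columns=target).to_numpy()

# The target column becomes the label vector
y_demo = df_demo[target].to_numpy()

print(x_demo.shape, y_demo.shape)
```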

<h2 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Normalize
</font>
</h2>
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    We standardize (z-score normalize) every feature to zero mean and unit variance with StandardScaler from sklearn:
</font>
</p>

In [29]:
scaler = StandardScaler()
x = scaler.fit_transform(X=x)
x

array([[ 0.16916344,  0.85112643, -1.16387648, ...,  0.15374092,
        -0.88689813, -1.60935961],
       [ 0.16916344, -0.38253913, -0.08461417, ..., -0.3885551 ,
         1.08302816,  0.11927585],
       [-0.56954153, -0.58815006, -0.66022073, ..., -0.93085112,
         1.13754671,  0.3267121 ],
       ...,
       [-1.30824649, -0.07412274, -0.58826991, ..., -0.65970311,
         1.56639535,  1.0873117 ],
       [-1.30824649, -1.66760742, -1.16387648, ...,  1.78062898,
        -0.33380688,  0.25756668],
       [-1.30824649, -0.58815006, -0.22851581, ...,  1.50948097,
         0.78523145, -0.08816041]])
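
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    To make the scaling step concrete, the sketch below compares StandardScaler with the same computation done by hand: subtract each column's mean and divide by its population standard deviation. The toy matrix <code>features</code> is an assumption for illustration; it holds the Glucose and Age values of the first three rounded rows.
</font>
</p>

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Glucose and Age of the first three rounded rows, used as a toy feature matrix
features = np.array([[116.0, 20.0],
                     [92.0, 45.0],
                     [88.0, 48.0]])

# Scaling with sklearn
scaled = StandardScaler().fit_transform(features)

# Same computation by hand: subtract the column mean,
# then divide by the population standard deviation (ddof=0)
manual = (features - features.mean(axis=0)) / features.std(axis=0)

print(np.allclose(scaled, manual))
```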

<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    We split the data into training (80%) and test (20%) sets with train_test_split from sklearn:
</font>
</p>

In [30]:
X_train, X_test, y_train, y_test = train_test_split(x,y,test_size=0.2)
X_test.shape

(200, 8)
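
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    The cells above fit the scaler on all rows before splitting, so the test rows influence the mean and standard deviation used for training. A common alternative, sketched here under stated assumptions, is to split first and fit the scaler on the training rows only. The data below is random placeholder data with the same shape as the notebook's arrays, and the <code>random_state</code> and <code>stratify</code> arguments are additions that the notebook does not use.
</font>
</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data with the same shape as the notebook's arrays (1000 rows, 8 features)
rng = np.random.default_rng(0)
x_raw = rng.normal(loc=50.0, scale=10.0, size=(1000, 8))
y_all = rng.integers(0, 2, size=1000)

# random_state makes the split reproducible;
# stratify keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    x_raw, y_all, test_size=0.2, random_state=42, stratify=y_all
)

# Fit the scaler on the training rows only,
# then reuse its statistics to transform the test rows
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```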

<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    Save the arrays to NumPy .npy files:
</font>
</p>

In [31]:
np.save('X_train.npy',X_train)
np.save('y_train.npy',y_train)
np.save('X_test.npy',X_test)
np.save('y_test.npy',y_test)
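
<p dir=rtl style="direction: ltr ;text-align: justify;line-height:200%;font-family:vazir;font-size:medium">
<font face="vazir" size=3>
    The saved files can be read back with <code>np.load</code>. The sketch below shows a save and load round trip; it uses small stand-in arrays and a temporary directory (both assumptions for illustration) so that it does not overwrite the files written above.
</font>
</p>

```python
import os
import tempfile

import numpy as np

# Small stand-in arrays; the notebook's real arrays would be handled the same way
X_demo = np.arange(16, dtype=float).reshape(2, 8)
y_demo = np.array([0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    x_path = os.path.join(tmp_dir, 'X_train.npy')
    y_path = os.path.join(tmp_dir, 'y_train.npy')

    # Write both arrays to .npy files
    np.save(x_path, X_demo)
    np.save(y_path, y_demo)

    # Read them back into memory
    X_loaded = np.load(x_path)
    y_loaded = np.load(y_path)

print(X_loaded.shape, y_loaded.shape)
```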