# Steps
1. Import the required libraries
2. Load the dataset
3. Standardize the data
4. Split the data into training and testing sets
5. Train the model with the SVM algorithm
6. Evaluate the model's accuracy
7. Build a predictive system
8. Save the model

# 1. Import the Required Libraries

In [1]:
%pip install scikit-learn




In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

# 2. Load the Dataset
The dataset is the Pima Indians Diabetes Database from Kaggle: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database?resource=download

In [3]:
df = pd.read_csv("diabetes.csv")
df.head()

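| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |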


In [4]:
# Separate the feature columns (X) from the label column (Y)
X = df.drop(columns='Outcome')
Y = df['Outcome']

In [5]:
X.head()

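| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |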


In [6]:
Y.head()

0    1
1    0
2    1
3    0
4    1
Name: Outcome, dtype: int64

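# 3. Standardize the Data
The feature columns have very different ranges. For example, Glucose takes values like 148, while DiabetesPedigreeFunction takes values like 0.627. SVMs are sensitive to feature scale, so we standardize every column to a comparable scale before training.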
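`StandardScaler` rescales each column with z = (x - mean) / std. The mean and the population standard deviation are learned during `fit`. The sketch below checks this on a small made-up array rather than the diabetes data. It confirms that the scaler's output matches the formula by hand, and that every scaled column ends up with mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up array: two features on very different scales
# (loosely shaped like Glucose and DiabetesPedigreeFunction)
data = np.array([
    [148.0, 0.627],
    [85.0, 0.351],
    [183.0, 0.672],
    [89.0, 0.167],
])

scaler = StandardScaler()
scaler.fit(data)                 # learns per-column mean_ and scale_
scaled = scaler.transform(data)  # applies (x - mean_) / scale_

# Manual version: StandardScaler uses the population std (ddof=0)
manual = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.allclose(scaled, manual))                                # True
print(np.allclose(scaled, StandardScaler().fit_transform(data)))  # True
```

`fit_transform` is a shortcut for calling `fit` and then `transform` on the same data.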

In [7]:
scaler = StandardScaler()

In [8]:
scaler.fit(X)

In [9]:
standarized_data = scaler.transform(X)

In [10]:
print(standarized_data)

[[ 0.63994726  0.84832379  0.14964075 ...  0.20401277  0.46849198
   1.4259954 ]
 [-0.84488505 -1.12339636 -0.16054575 ... -0.68442195 -0.36506078
  -0.19067191]
 [ 1.23388019  1.94372388 -0.26394125 ... -1.10325546  0.60439732
  -0.10558415]
 ...
 [ 0.3429808   0.00330087  0.14964075 ... -0.73518964 -0.68519336
  -0.27575966]
 [-0.84488505  0.1597866  -0.47073225 ... -0.24020459 -0.37110101
   1.17073215]
 [-0.84488505 -0.8730192   0.04624525 ... -0.20212881 -0.47378505
  -0.87137393]]


The output above shows that all features are now on a comparable scale, with each column centered at 0 and having a standard deviation of 1.

In [11]:
X = standarized_data
Y = df['Outcome']

In [12]:
print(X)

[[ 0.63994726  0.84832379  0.14964075 ...  0.20401277  0.46849198
   1.4259954 ]
 [-0.84488505 -1.12339636 -0.16054575 ... -0.68442195 -0.36506078
  -0.19067191]
 [ 1.23388019  1.94372388 -0.26394125 ... -1.10325546  0.60439732
  -0.10558415]
 ...
 [ 0.3429808   0.00330087  0.14964075 ... -0.73518964 -0.68519336
  -0.27575966]
 [-0.84488505  0.1597866  -0.47073225 ... -0.24020459 -0.37110101
   1.17073215]
 [-0.84488505 -0.8730192   0.04624525 ... -0.20212881 -0.47378505
  -0.87137393]]


In [13]:
print(Y)

0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64


# 4. Split the Data into Training and Testing Sets

In [14]:
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2,stratify = Y, random_state=2)
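Passing `stratify=Y` tells `train_test_split` to keep the proportion of each class (0 = no diabetes, 1 = diabetes) the same in the training and testing sets. The sketch below shows the effect on a made-up, imbalanced label vector rather than the real `Outcome` column.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 65 negatives and 35 positives (100 samples)
y = np.array([0] * 65 + [1] * 35)
X = np.arange(100).reshape(-1, 1)  # dummy single feature

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# With stratify=y, the share of positives is preserved in both splits
print("overall:", y.mean())
print("train:  ", y_train.mean())
print("test:   ", y_test.mean())
```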

In [15]:
print(X.shape, X_train.shape, X_test.shape)

(768, 8) (614, 8) (154, 8)


In [16]:
print(Y.shape, Y_train.shape, Y_test.shape)

(768,) (614,) (154,)
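In step 3, the scaler is fitted on all 768 rows before the split. As a result, the test rows influence the mean and standard deviation used for scaling, which is a mild form of data leakage. A common way to avoid this is to split the unscaled features first. A scikit-learn `Pipeline` then fits the scaler on the training rows only.

The sketch below assumes the same shape as the diabetes features (768 rows, 8 columns). It substitutes random stand-in data from `make_classification` so it runs on its own, which means its printed accuracy says nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data with the same shape as the diabetes features.
# In the notebook you would use df.drop(columns='Outcome') and df['Outcome'] instead.
X_raw, y = make_classification(n_samples=768, n_features=8, random_state=2)

# Split the *unscaled* features first
X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.2, stratify=y, random_state=2
)

# The pipeline fits StandardScaler on X_train only, then trains the linear SVM
model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
model.fit(X_train, y_train)

# At scoring time the pipeline reuses the training mean and std for X_test
print("test accuracy:", model.score(X_test, y_test))
```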


# 5. Train the Model with the SVM Algorithm

In [17]:
classifier = svm.SVC(kernel='linear')

In [18]:
classifier.fit(X_train,Y_train)
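With `kernel='linear'`, `SVC` learns a weight vector w (`coef_`) and a bias b (`intercept_`). It then classifies a sample x by the sign of w·x + b. The sketch below uses small made-up 2-D data, not the diabetes features. It checks that `decision_function` equals this linear formula and that `predict` returns class 1 exactly when the score is positive.

```python
import numpy as np
from sklearn import svm

# Made-up, linearly separable 2-D data (not the diabetes features)
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [3.0, 3.0], [4.0, 3.5], [3.5, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear')
clf.fit(X, y)

# Decision score computed by hand: w . x + b
manual_scores = X @ clf.coef_.ravel() + clf.intercept_[0]

# Compare with scikit-learn's own decision function
print(np.allclose(manual_scores, clf.decision_function(X)))
# A positive score maps to class 1, otherwise class 0
print(np.array_equal((manual_scores > 0).astype(int), clf.predict(X)))
```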

# 6. Evaluate the Model's Accuracy

In [21]:
X_train_prediction = classifier.predict(X_train)
# accuracy_score expects the true labels first, then the predictions
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print("Training data accuracy = ", training_data_accuracy)

Training data accuracy =  0.7866449511400652


In [22]:
X_test_prediction = classifier.predict(X_test)
# accuracy_score expects the true labels first, then the predictions
testing_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print("Testing data accuracy = ", testing_data_accuracy)

Testing data accuracy =  0.7727272727272727
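`accuracy_score(y_true, y_pred)` is the fraction of samples whose predicted label equals the true label. The training accuracy (about 0.787) and testing accuracy (about 0.773) above are close, which suggests the linear SVM is not strongly overfitting. The sketch below uses made-up labels for ten patients to show the computation. It also builds a confusion matrix, which breaks accuracy down by class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true labels and predictions for 10 patients (1 = diabetes)
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 0, 1, 1, 0, 0])

acc = accuracy_score(y_true, y_pred)    # true labels come first
manual_acc = np.mean(y_true == y_pred)  # fraction of matching labels
print(acc, manual_acc)                  # 0.8 0.8

# Rows = true class (0, 1), columns = predicted class (0, 1)
cm = confusion_matrix(y_true, y_pred)
print(cm)
```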


# 7. Build a Predictive System

In [23]:
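# One patient's measurements, in the same column order as the training features:
# Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI,
# DiabetesPedigreeFunction, Age (these are the values of row 0 in the dataset)
input_data = (6,148,72,35,0,33.6,0.627,50)

# Convert to a NumPy array and reshape to (1, n_features): one sample, eight features
input_data_array = np.array(input_data)
input_data_reshape = input_data_array.reshape(1,-1)

# Scale with the scaler fitted earlier, so the input matches the training data
standarized_input_data = scaler.transform(input_data_reshape)
print(standarized_input_data)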

[[ 0.63994726  0.84832379  0.14964075  0.90726993 -0.69289057  0.20401277
   0.46849198  1.4259954 ]]




In [24]:
prediction = classifier.predict(standarized_input_data)
print(prediction)

# Outcome 0 = no diabetes, 1 = diabetes
if prediction[0] == 0:
    print('The patient does not have diabetes')
else:
    print('The patient has diabetes')

[1]
The patient has diabetes


# 8. Save the Model

In [25]:
import pickle

In [26]:
filename = 'diabetes_svm.sav'
# Use a with-block so the file is closed after writing
with open(filename, 'wb') as f:
    pickle.dump(classifier, f)
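New inputs must pass through the same fitted `StandardScaler` before they reach the classifier, so in practice it is useful to save the scaler alongside the model. The sketch below assumes a hypothetical file name `diabetes_svm_bundle.sav`. It uses small made-up training data so the block runs on its own.

```python
import pickle

import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

# Made-up stand-in training data (Pregnancies, Glucose); not the real dataset
X = np.array([[1.0, 85.0], [8.0, 183.0], [1.0, 89.0],
              [0.0, 137.0], [6.0, 148.0], [2.0, 90.0]])
y = np.array([0, 1, 0, 1, 1, 0])

scaler = StandardScaler().fit(X)
classifier = svm.SVC(kernel='linear').fit(scaler.transform(X), y)

# Save both fitted objects together in one file
filename = 'diabetes_svm_bundle.sav'  # hypothetical file name
with open(filename, 'wb') as f:
    pickle.dump({'scaler': scaler, 'classifier': classifier}, f)

# Later, e.g. in another script: load both and predict a new sample.
# Only unpickle files from sources you trust.
with open(filename, 'rb') as f:
    bundle = pickle.load(f)

new_sample = np.array([[6.0, 148.0]])
pred = bundle['classifier'].predict(bundle['scaler'].transform(new_sample))
print(pred)
```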