# Mini Project #1: Decision Tree
Since tuning the model's parameters may produce a better Decision Tree model, Sunyi wants to try building a model for every combination of the following parameter values.

- `max_depth`: 24, 28, 32, 36
- `min_samples_split`: 6, 7, 8, 9, 10
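Because every `max_depth` value is paired with every `min_samples_split` value, the search trains 4 × 5 = 20 Decision Tree models. A minimal sketch that enumerates the same grid with the standard library's `itertools.product`:

```python
from itertools import product

# Values taken from the grid above
max_depth_values = [24, 28, 32, 36]
min_samples_split_values = [6, 7, 8, 9, 10]

# product() yields every (max_depth, min_samples_split) pair exactly once
combinations = list(product(max_depth_values, min_samples_split_values))

print(len(combinations))  # 20
print(combinations[:3])   # [(24, 6), (24, 7), (24, 8)]
```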

In [1]:
# Code from the previous steps
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

pd.set_option('display.max_columns', 20)

In [2]:
df = pd.read_excel('https://storage.googleapis.com/dqlab-dataset/cth_churn_analysis_train.xlsx')
# drop the customer ID and the price columns (harga_per_bulan = monthly price,
# jumlah_harga_langganan = total subscription charges); they are not used as features
df = df.drop(columns=['ID_Customer', 'harga_per_bulan', 'jumlah_harga_langganan'])

In [3]:
# take the 'churn' column out of df as the target, then encode 'Yes' as 1 and anything else as 0
y = df.pop('churn').to_list()
y = [1 if label == 'Yes' else 0 for label in y]
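The list comprehension maps `'Yes'` to 1 and every other value to 0. A vectorized pandas equivalent, shown on a small hypothetical Series standing in for the `churn` column:

```python
import pandas as pd

# Hypothetical labels standing in for the 'churn' column
churn = pd.Series(['Yes', 'No', 'No', 'Yes'])

# Boolean comparison, then cast True/False to 1/0
y = (churn == 'Yes').astype(int).to_list()
print(y)  # [1, 0, 0, 1]
```

As in the original comprehension, any value that is not exactly `'Yes'` (for example `'yes'` or a missing value) becomes 0.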

In [4]:
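labelers = {}
column_categorical_non_binary = []
# text columns with exactly two values are label-encoded to 0/1 in place;
# columns with more than two values are collected for one-hot encoding later
for col in df.select_dtypes(include=['object']):
    if len(df[col].unique()) == 2:
        labelers[col] = LabelEncoder()
        df[col] = labelers[col].fit_transform(df[col])
    else:
        column_categorical_non_binary.append(col)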
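`LabelEncoder` assigns codes in sorted order of the class labels, and the fitted encoder keeps that mapping, which is presumably why the notebook stores one encoder per column in `labelers`. A small sketch on a hypothetical binary column:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical binary column
gender = pd.Series(['Male', 'Female', 'Female', 'Male'])

encoder = LabelEncoder()
codes = encoder.fit_transform(gender)

# Classes are sorted, so 'Female' -> 0 and 'Male' -> 1
print(list(encoder.classes_))  # ['Female', 'Male']
print(codes.tolist())          # [1, 0, 0, 1]

# The fitted encoder can map codes back to the original labels
print(list(encoder.inverse_transform([0, 1])))  # ['Female', 'Male']
```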

In [5]:
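# one-hot encode the categorical columns that have more than two values
df = pd.get_dummies(df, columns=column_categorical_non_binary)
X = df.to_numpy()
# hold out 10% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=23)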
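To see what this cell does, here is a sketch on a small hypothetical frame (the `tenure` and `contract` columns are invented for illustration): `get_dummies` replaces a non-binary column with one indicator column per category, and `test_size=0.1` holds out 10% of the rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame: one numeric column and one non-binary categorical column
df = pd.DataFrame({
    'tenure': list(range(20)),
    'contract': ['Monthly', 'Yearly', 'Two-year', 'Monthly'] * 5,
})

# One 0/1 indicator column per category replaces 'contract'
encoded = pd.get_dummies(df, columns=['contract'], dtype=int)
print(encoded.columns.tolist())
# ['tenure', 'contract_Monthly', 'contract_Two-year', 'contract_Yearly']

X = encoded.to_numpy()
y = [i % 2 for i in range(20)]

# test_size=0.1 holds out 10% of the rows: 2 of 20 here
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=23)
print(len(X_train), len(X_test))  # 18 2
```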

In [6]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

In [7]:
# values of 'min_samples_split' to try
min_samples_split_search = [6, 7, 8, 9, 10]
# values of 'max_depth' to try
max_depth_search = [24, 28, 32, 36]

In [8]:
max_score = 0
best_model = None

In [9]:
for ms in min_samples_split_search:
    for md in max_depth_search:
        model = DecisionTreeClassifier(min_samples_split=ms, max_depth=md, random_state=57)

        model.fit(X_train, y_train)

        # predict the labels of X_test
        y_pred = model.predict(X_test)

        # compute the accuracy from the actual values (y_test) and the predictions (y_pred)
        score = accuracy_score(y_test, y_pred)

        # if this model scores higher than the best score recorded so far (max_score),
        # keep it as the best model and update max_score
        if max_score < score:
            best_model = model
            max_score = score
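Because the comparison `max_score < score` is strict, a later combination replaces the current best only when it scores strictly higher; on a tie, the combination visited first is kept. A small sketch with made-up scores (hypothetical values, not results from this notebook):

```python
# Hypothetical ((min_samples_split, max_depth), score) pairs in loop order
results = [
    ((6, 24), 0.44),
    ((6, 28), 0.44),
    ((7, 24), 0.40),
]

max_score = 0
best_params = None
for params, score in results:
    # Strict '<': a tie does not replace the current best
    if max_score < score:
        best_params = params
        max_score = score

print(best_params, max_score)  # (6, 24) 0.44
```

With `<=` instead, the last of the tied combinations would be kept.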

In [10]:
print("Best test score: ", max_score)
print("Model parameters: max_depth=",
      best_model.get_params()['max_depth'],
      ", min_samples_split=",
      best_model.get_params()['min_samples_split'])

Best test score:  0.44
Model parameters: max_depth= 24 , min_samples_split= 6
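The loop above picks the parameters that score best on `X_test` itself, so the reported score is likely optimistic for unseen data. A common alternative is to tune with cross-validation on the training data only and use the test set once at the end. A minimal sketch with scikit-learn's `GridSearchCV`, assuming synthetic data from `make_classification` in place of the churn data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn features and labels (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=23)

param_grid = {
    'max_depth': [24, 28, 32, 36],
    'min_samples_split': [6, 7, 8, 9, 10],
}

# 5-fold cross-validation on the training set only
search = GridSearchCV(
    DecisionTreeClassifier(random_state=57),
    param_grid,
    scoring='accuracy',
    cv=5,
)
search.fit(X_train, y_train)

print('Best parameters:', search.best_params_)
print('Mean CV accuracy:', search.best_score_)

# The held-out test set is used once, after tuning is finished
test_score = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print('Test accuracy:', test_score)
```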


# Mini Project #2: Random Forest
Still curious about the accuracy of the Random Forest model being developed, Sunyi quietly wants to try a different set of parameter values, listed below.

- `max_depth`: 6, 8, 10, 12, 16
- `min_samples_split`: 4, 5, 6, 7, 8
- `n_estimators`: 20, 30, 40, 50, 60
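This grid has 5 × 5 × 5 = 125 combinations, and each combination trains a whole forest, so the search is considerably more expensive than the 20-tree Decision Tree search. A sketch using scikit-learn's `ParameterGrid` to count both the forests and the individual trees:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'max_depth': [6, 8, 10, 12, 16],
    'min_samples_split': [4, 5, 6, 7, 8],
    'n_estimators': [20, 30, 40, 50, 60],
}

grid = ParameterGrid(param_grid)
print(len(grid))  # 125

# Each n_estimators value appears 25 times: 25 * (20 + 30 + 40 + 50 + 60)
total_trees = sum(params['n_estimators'] for params in grid)
print(total_trees)  # 5000
```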

In [11]:
from sklearn.ensemble import RandomForestClassifier

In [12]:
# parameters that control each Decision Tree built inside the Random Forest model
min_samples_split_search = [4, 5, 6, 7, 8]
max_depth_search = [6, 8, 10, 12, 16]

In [13]:
# parameter that sets how many Decision Trees the Random Forest model builds
n_estimators_search = [20, 30, 40, 50, 60]

In [14]:
max_score = 0
best_model = None
for ms in min_samples_split_search:
    for md in max_depth_search:
        for ne in n_estimators_search:
            model = RandomForestClassifier(n_estimators=ne, min_samples_split=ms, max_depth=md, random_state=57)
            model.fit(X_train, y_train)
            y_pred = model.predict(X_test)
            score = accuracy_score(y_test, y_pred)
            # keep the model if it beats the best test accuracy so far
            if max_score < score:
                best_model = model
                max_score = score
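The three nested loops can be flattened by iterating over scikit-learn's `ParameterGrid`, which yields one dict per combination and works for any number of parameters. A sketch of a hypothetical helper `find_best_model` (not part of the original notebook), demonstrated on synthetic data with a reduced grid to keep it fast:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ParameterGrid, train_test_split


def find_best_model(model_cls, param_grid, X_train, y_train, X_test, y_test, **fixed):
    """Hypothetical helper: fit one model per combination, keep the best by test accuracy."""
    max_score = 0
    best_model = None
    for params in ParameterGrid(param_grid):
        # Combine the searched parameters with fixed ones such as random_state
        model = model_cls(**params, **fixed)
        model.fit(X_train, y_train)
        score = accuracy_score(y_test, model.predict(X_test))
        # Same strict comparison as the notebook loop: ties keep the earlier model
        if max_score < score:
            best_model = model
            max_score = score
    return best_model, max_score


# Synthetic stand-in data and a reduced grid (assumptions for illustration)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=23)
small_grid = {'max_depth': [6, 12], 'min_samples_split': [4, 8], 'n_estimators': [20, 40]}

best_model, max_score = find_best_model(
    RandomForestClassifier, small_grid, X_train, y_train, X_test, y_test, random_state=57
)
print(max_score, best_model.get_params()['n_estimators'])
```

`ParameterGrid` visits parameters in sorted key order (`max_depth` outermost), which differs from the notebook's loop order, so ties may resolve to a different combination.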

In [15]:
print("Best test score: ", max_score)
print("Model parameters: max_depth=",
      best_model.get_params()['max_depth'],
      ", min_samples_split=",
      best_model.get_params()['min_samples_split'],
      ", n_estimators=",
      best_model.get_params()['n_estimators']
      )

Best test score:  0.54
Model parameters: max_depth= 12 , min_samples_split= 5 , n_estimators= 40
