# Customer Churn Prediction Project
**Author:** [Lannnn]

**Goal:** Predict whether a customer will stop subscribing (churn) using Machine Learning (Random Forest).

## 1. Import Libraries & Load Data


In [1]:
# Library setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report


In [2]:
# Load Dataset
df = pd.read_csv('Customer-Churn.csv')

In [4]:
# Inspect the first rows of the raw data
print("--- Top 5 Rows ---")
display(df.head())


--- Top 5 Rows ---


| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [5]:
# Check the data type of each column
print("\n--- Data Type Information ---")
df.info()


--- Data Type Information ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBill

## 2. Data Cleaning
Clean up the raw data:
1. Convert `TotalCharges` from text to numbers.
2. Fill the resulting empty values (NaN) with 0.
3. Drop the `customerID` column, which is not relevant for prediction.
4. Encode the target `Churn` as numbers (1/0).

In [6]:
# 1. Convert TotalCharges to numbers
# errors='coerce' turns values that cannot be parsed (blank strings) into NaN (Not a Number)
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# 2. Fill those NaN values with 0 (assumption: new customers have not accumulated any total charges yet)
df['TotalCharges'] = df['TotalCharges'].fillna(0)
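Before filling with 0, the "new customer" assumption can be checked by looking at the `tenure` of the rows that became NaN. Below is a minimal sketch on a hypothetical toy DataFrame `toy` that reuses the notebook's column names (the values are made up). On the real data, the same `isna()` mask would be applied to `df` between the `to_numeric` and `fillna` steps.

```python
import pandas as pd

# Hypothetical toy data mimicking the raw CSV: TotalCharges is read as text,
# and a blank string " " appears where no total has been billed yet.
toy = pd.DataFrame({
    "tenure": [1, 34, 0, 2],
    "MonthlyCharges": [29.85, 56.95, 52.55, 70.70],
    "TotalCharges": ["29.85", "1889.5", " ", "151.65"],
})

# Same conversion as in the notebook: unparseable strings become NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Rows that failed conversion, inspected BEFORE filling them with 0.
missing = toy[toy["TotalCharges"].isna()]
print(missing[["tenure", "MonthlyCharges"]])

# If every missing total belongs to a customer with tenure 0,
# filling with 0 is consistent with the "new customer" assumption.
all_new = bool((missing["tenure"] == 0).all())
print("All missing TotalCharges have tenure 0:", all_new)

toy["TotalCharges"] = toy["TotalCharges"].fillna(0)
```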

In [7]:
# 3. Drop unneeded columns
# customerID is unique per customer, so it has no pattern the model can learn
if 'customerID' in df.columns:
    df.drop(columns=['customerID'], inplace=True)

In [8]:
# 4. Encode the target 'Churn' as numbers (binary)
# 1 = Yes (churned), 0 = No, so the positive class is explicit in the metrics
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})
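`Series.map` with a dictionary silently turns any label that is not an exact key into NaN, so a stray space or a lowercase `yes` in the raw file would slip through unnoticed. Here is a small sketch on hypothetical labels showing that failure mode and a more defensive variant; the strip/capitalize normalization is an assumption, not part of the original notebook.

```python
import pandas as pd

# Hypothetical labels; note the stray whitespace/case in the last two values.
churn = pd.Series(["Yes", "No", "No", " Yes", "yes"])

# Same mapping as the notebook: anything not exactly 'Yes' or 'No' becomes NaN.
mapped = churn.map({"Yes": 1, "No": 0})
print(mapped.isna().sum())  # number of labels the mapping did not recognise

# A more defensive variant (an assumption, not the notebook's code):
# normalise the text first, then fail loudly if anything is still unmapped.
cleaned = churn.str.strip().str.capitalize().map({"Yes": 1, "No": 0})
assert cleaned.notna().all(), "Unexpected Churn labels found"
print(cleaned.tolist())
```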

In [9]:
# Re-check the data
print("--- Info After Cleaning ---")
df.info()

--- Info After Cleaning ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   gender            7043 non-null   object 
 1   SeniorCitizen     7043 non-null   int64  
 2   Partner           7043 non-null   object 
 3   Dependents        7043 non-null   object 
 4   tenure            7043 non-null   int64  
 5   PhoneService      7043 non-null   object 
 6   MultipleLines     7043 non-null   object 
 7   InternetService   7043 non-null   object 
 8   OnlineSecurity    7043 non-null   object 
 9   OnlineBackup      7043 non-null   object 
 10  DeviceProtection  7043 non-null   object 
 11  TechSupport       7043 non-null   object 
 12  StreamingTV       7043 non-null   object 
 13  StreamingMovies   7043 non-null   object 
 14  Contract          7043 non-null   object 
 15  PaperlessBilling  7043 non-null   object 
 16  PaymentMetho

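## 3. Preprocessing (One-Hot Encoding)
Convert categorical (text) columns into binary indicator columns so the Machine Learning model can use them.
Example: `gender` (Male/Female) becomes `gender_Male` (True/False, i.e. 1/0). With `drop_first=True`, the first category (`Female`) gets no column of its own, since `gender_Male = False` already implies it.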
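A minimal illustration on a hypothetical three-row DataFrame (not the churn data). With `drop_first=True`, the first category of each column in sorted order is dropped (`Female`, `Month-to-month`). In pandas 2.0 and later the dummy columns are boolean by default, which matches the `True`/`False` values in the notebook output; passing `dtype=int` gives 1/0 instead.

```python
import pandas as pd

toy = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "tenure": [1, 34, 2],  # numeric columns are left untouched
})

# Default: boolean dummies (pandas >= 2.0), first category of each column dropped.
encoded_bool = pd.get_dummies(toy, drop_first=True)
print(encoded_bool.columns.tolist())
# ['tenure', 'gender_Male', 'Contract_One year', 'Contract_Two year']

# Same encoding, but as 0/1 integers.
encoded_int = pd.get_dummies(toy, drop_first=True, dtype=int)
print(encoded_int)
```

Non-encoded columns come first, followed by the dummy columns, which is the same ordering visible in the encoded `df.head()` output.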


In [10]:
# CONVERT TEXT INTO NUMBERS
print("Number of columns before encoding:", df.shape[1])
df = pd.get_dummies(df, drop_first=True)
print("Number of columns after encoding:", df.shape[1])

# Check the final result
df.head()

Number of columns before encoding: 20
Number of columns after encoding: 31


| | SeniorCitizen | tenure | MonthlyCharges | TotalCharges | Churn | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | ... | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 29.85 | 29.85 | 0 | False | True | False | False | True | ... | False | False | False | False | False | False | True | False | True | False |
| 1 | 0 | 34 | 56.95 | 1889.5 | 0 | True | False | False | True | False | ... | False | False | False | False | True | False | False | False | False | True |
| 2 | 0 | 2 | 53.85 | 108.15 | 1 | True | False | False | True | False | ... | False | False | False | False | False | False | True | False | False | True |
| 3 | 0 | 45 | 42.3 | 1840.75 | 0 | True | False | False | False | True | ... | False | False | False | False | True | False | False | False | False | False |
| 4 | 0 | 2 | 70.7 | 151.65 | 1 | False | False | False | True | False | ... | False | False | False | False | False | False | True | False | True | False |


## 4. Splitting & Modeling
1. Split the data into a **Train Set (80%)** for training and a **Test Set (20%)** for evaluation.
2. Train the model using the **Random Forest Classifier** algorithm.
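The 5634/1409 sizes reported by the split cell follow from how `train_test_split` rounds: with a float `test_size`, the test set gets `ceil(test_size * n_samples)` rows and the train set gets the remainder. The sketch below checks this on a dummy array with the dataset's 7043 rows and hypothetical alternating labels. The `stratify=` argument is an optional variant not used in this notebook; it keeps the class ratio equal in both splits, which can matter when churners are the minority class.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 7043          # rows in the churn dataset (from df.info())
test_size = 0.2

# Rounding rule: test rows are rounded up, train rows are the remainder.
n_test = math.ceil(test_size * n_samples)   # ceil(1408.6) = 1409
n_train = n_samples - n_test                # 7043 - 1409 = 5634
print(n_train, n_test)

# Confirm with scikit-learn on a dummy feature matrix of the same length.
X_dummy = np.arange(n_samples).reshape(-1, 1)
y_dummy = np.array([0, 1] * (n_samples // 2) + [0])   # hypothetical labels
X_tr, X_te, y_tr, y_te = train_test_split(
    X_dummy, y_dummy, test_size=test_size, random_state=42,
    stratify=y_dummy,   # optional: keep the 0/1 ratio the same in both splits
)
print(len(X_tr), len(X_te))
```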

In [14]:
# Split the data
# X = features, the "exam questions" (all columns EXCEPT Churn)
X = df.drop('Churn', axis=1)

# y = target, the "answer key" (only the Churn column)
y = df['Churn']

# Split: 80% for training (train), 20% for testing (test)
# random_state=42 fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training samples:", X_train.shape[0])
print("Test samples:", X_test.shape[0])

Training samples: 5634
Test samples: 1409


## 5. Model Training & Evaluation


In [15]:
# Initialize the Random Forest algorithm
model = RandomForestClassifier(random_state=42)
print("\nTraining the model...")
model.fit(X_train, y_train)
print("Done!")


Training the model...
Done!
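A Random Forest is an ensemble of decision trees. In scikit-learn's `RandomForestClassifier`, the forest's class probabilities are the average of the individual trees' probabilities (the fitted trees are in `estimators_`), and `predict` returns the class with the highest averaged probability. The sketch below verifies this on synthetic data from `make_classification` (not the churn data); `n_estimators=51` is an arbitrary odd number chosen so that no sample ends in an exact 50/50 tie.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (stand-in for the churn features).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

rf = RandomForestClassifier(n_estimators=51, random_state=42)
rf.fit(X, y)

# Probability from each individual tree, shape (n_trees, n_samples, n_classes).
per_tree = np.stack([tree.predict_proba(X) for tree in rf.estimators_])

# The forest's probability is the mean over trees (soft voting).
avg_proba = per_tree.mean(axis=0)
print(np.allclose(avg_proba, rf.predict_proba(X)))

# predict() picks the class with the highest averaged probability.
manual_pred = rf.classes_[np.argmax(avg_proba, axis=1)]
print((manual_pred == rf.predict(X)).all())
```

Because `predict` is effectively a 0.5 threshold on `predict_proba` in the binary case, the probabilities are also the natural place to start if a different trade-off between precision and recall is wanted.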


In [16]:
# Have the model answer the "exam questions" (X_test)
y_pred = model.predict(X_test)

# Compare the model's answers (y_pred) with the true answer key (y_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy * 100:.2f}%")
print("\n--- Detailed Report ---")
print(classification_report(y_test, y_pred))

Model Accuracy: 78.50%

--- Detailed Report ---
              precision    recall  f1-score   support

           0       0.82      0.91      0.86      1036
           1       0.64      0.44      0.52       373

    accuracy                           0.78      1409
   macro avg       0.73      0.67      0.69      1409
weighted avg       0.77      0.78      0.77      1409
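The per-class numbers in the report come from the confusion matrix. For the churn class (1), precision = TP / (TP + FP) and recall = TP / (TP + FN), so a recall of 0.44 means the model caught only about 44% of the customers in the test set who actually churned. The sketch below recomputes both metrics from `confusion_matrix` on a small set of hypothetical labels and checks them against scikit-learn's own functions.

```python
import math

from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions (1 = churn, 0 = stay).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# For labels [0, 1], ravel() returns TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

precision_churn = tp / (tp + fp)   # of predicted churners, how many really churned
recall_churn = tp / (tp + fn)      # of real churners, how many were caught
print(precision_churn, recall_churn)

# Same values as the "1" row of classification_report.
assert math.isclose(precision_churn, precision_score(y_true, y_pred))
assert math.isclose(recall_churn, recall_score(y_true, y_pred))
```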



## 6. Save & Download the Model
Save the model (`.pkl`) and the list of feature columns so they can be reused later in a Streamlit app (developed in VS Code).

In [17]:
# SAVE THE MODEL & COLUMNS

import joblib

# Save the model
joblib.dump(model, 'model_churn_rf.pkl')

# Save the column names (IMPORTANT for aligning the app's inputs later)
joblib.dump(X_train.columns, 'model_columns.pkl')

print("Model and columns saved successfully!")

# ==========================================
# DOWNLOAD THE FILES TO YOUR LOCAL MACHINE
# ==========================================
# google.colab is only available when running inside Google Colab
from google.colab import files

# Download the files to your computer
files.download('model_churn_rf.pkl')
files.download('model_columns.pkl')

Model and columns saved successfully!
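On the app side, the saved column list is what keeps new inputs in sync with training. One-hot encoding a single customer's raw answers produces only the dummy columns for the values that are present, so the row has to be reindexed to the saved columns, with missing dummies filled with 0. The sketch below is a hypothetical, self-contained version of that step: it trains a tiny stand-in model on toy data, saves it with `joblib` to a temporary folder, reloads it, and aligns one new input row. The file names mirror the notebook; the toy columns and the `prepare_input` helper are assumptions, not the author's app code.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy training data standing in for the notebook's df (hypothetical values).
train = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22],
    "gender": ["Female", "Male", "Male", "Male", "Female", "Female"],
    "Contract": ["Month-to-month", "One year", "Month-to-month",
                 "Two year", "Month-to-month", "One year"],
    "Churn": [1, 0, 1, 0, 1, 0],
})
X_toy = pd.get_dummies(train.drop(columns="Churn"), drop_first=True)
model_toy = RandomForestClassifier(n_estimators=10, random_state=42)
model_toy.fit(X_toy, train["Churn"])

# Save exactly like the notebook: the model plus its column list.
folder = tempfile.mkdtemp()
joblib.dump(model_toy, os.path.join(folder, "model_churn_rf.pkl"))
joblib.dump(X_toy.columns, os.path.join(folder, "model_columns.pkl"))

# --- App side (e.g. inside a Streamlit script) ---
loaded_model = joblib.load(os.path.join(folder, "model_churn_rf.pkl"))
model_columns = joblib.load(os.path.join(folder, "model_columns.pkl"))

def prepare_input(raw: dict) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode a single raw input and align columns."""
    row = pd.get_dummies(pd.DataFrame([raw]))  # no drop_first for a single row
    # Add dummies that are missing (as 0) and drop ones unseen in training.
    return row.reindex(columns=model_columns, fill_value=0)

new_customer = {"tenure": 5, "gender": "Male", "Contract": "Two year"}
aligned = prepare_input(new_customer)
print(aligned)
print("Churn prediction:", loaded_model.predict(aligned)[0])
```

`drop_first=True` is deliberately not used on the single input row: with only one row, every column has a single category, so `drop_first` would remove all dummy columns; the `reindex` call already discards any column the model was not trained on.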

