**Library Installation**

In [1]:
!pip install Sastrawi

Collecting Sastrawi
  Downloading Sastrawi-1.0.1-py2.py3-none-any.whl.metadata (909 bytes)
Downloading Sastrawi-1.0.1-py2.py3-none-any.whl (209 kB)
Installing collected packages: Sastrawi
Successfully installed Sastrawi-1.0.1


**Data Acquisition & Dataset Loading**

In [2]:
import pandas as pd
import os

# 1. DATA ACQUISITION

dataset_path = '/kaggle/input/datasets/gevabriel/indonesian-sms-spam/sms_spam_indo.csv'

if os.path.exists(dataset_path):
    print(f"Dataset found at: {dataset_path}")
    df = pd.read_csv(dataset_path)

    print("\n--- Top 5 Rows ---")
    print(df.head())

    # Print the class distribution if the file has a 'label' column.
    # (This dataset names its columns 'Kategori' and 'Pesan', so the check is False here.)
    if 'label' in df.columns:
        print("\n--- Label Distribution ---")
        print(df['label'].value_counts())
else:
    print(f"Error: File not found at {dataset_path}")
    print("Make sure the dataset folder name in the right-hand panel is correct.")

Dataset found at: /kaggle/input/datasets/gevabriel/indonesian-sms-spam/sms_spam_indo.csv

--- Top 5 Rows ---
  Kategori                                              Pesan
0     spam  Plg Yth: Simcard anda mendptkan bonus poin plu...
1      ham    Iya ih ko sedih sih gtau kapan lg ke bandung :(
2      ham  Kalau mau bikin model/controller mending per a...
3      ham  Selamat nama1. Semoga selalu menempuh hidup ya...
4     spam  Tingkatkan nilai isi ulang Anda selanjutnya mi...
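
The `head()` output above shows the columns `Kategori` and `Pesan`, while the cell looks for a column named `label`, so no label distribution is printed. A minimal sketch of a more tolerant lookup follows. The helper `find_label_column`, its candidate list, and the rows in `df_demo` are all hypothetical and made up for illustration; they are not part of the notebook or the dataset.

```python
import pandas as pd

# Made-up rows that only mimic the column layout shown in the output above
df_demo = pd.DataFrame({
    "Kategori": ["spam", "ham", "ham", "spam", "ham"],
    "Pesan": ["pesan a", "pesan b", "pesan c", "pesan d", "pesan e"],
})

def find_label_column(frame, candidates=("label", "Kategori", "kategori")):
    """Hypothetical helper: return the first candidate column present, else None."""
    for name in candidates:
        if name in frame.columns:
            return name
    return None

label_col = find_label_column(df_demo)
if label_col is not None:
    # Count how many rows fall into each class
    counts = df_demo[label_col].value_counts()
    print(counts)
```

Checking the class balance this way before splitting helps decide whether a stratified split is worth considering.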


**Text Preprocessing**

In [3]:
# 2. PREPROCESSING
import re
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

# Normalize column names to lowercase ('Kategori' -> 'kategori', 'Pesan' -> 'pesan')
df.columns = df.columns.str.lower()

factory = StemmerFactory()
stemmer = factory.create_stemmer()

def preprocess_text(text):
    text = str(text).lower()              # case folding
    text = re.sub(r'[^a-z\s]', '', text)  # drop digits, punctuation, and other non-letters
    text = stemmer.stem(text)             # reduce Indonesian words to their root form
    return text

print("\nRunning preprocessing (Sastrawi may take a few minutes)...")

df['teks_bersih'] = df['pesan'].apply(preprocess_text)

print("Preprocessing complete!")
print(df[['pesan', 'teks_bersih']].head())


Running preprocessing (Sastrawi may take a few minutes)...
Preprocessing complete!
                                               pesan  \
0  Plg Yth: Simcard anda mendptkan bonus poin plu...   
1    Iya ih ko sedih sih gtau kapan lg ke bandung :(   
2  Kalau mau bikin model/controller mending per a...   
3  Selamat nama1. Semoga selalu menempuh hidup ya...   
4  Tingkatkan nilai isi ulang Anda selanjutnya mi...   

                                         teks_bersih  
0  plg yth simcard anda mendptkan bonus poin plus...  
1       iya ih ko sedih sih gtau kapan lg ke bandung  
2  kalau mau bikin modelcontroller mending per apa y  
3  selamat nama moga selalu tempuh hidup yang bah...  
4  tingkat nilai isi ulang anda lanjut minimal rp...  
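
Row 2 of the output above shows `model/controller` turning into `modelcontroller`: the regex deletes the slash rather than replacing it, so the two words fuse into one token. The sketch below compares the notebook's deletion approach with a hypothetical variant that substitutes a space instead. `space_non_letters` and the sample sentence are illustrative assumptions, and stemming is left out so the example runs without Sastrawi.

```python
import re

def strip_non_letters(text: str) -> str:
    """Same cleanup as the notebook: lowercase, then delete every non-letter, non-space character."""
    return re.sub(r'[^a-z\s]', '', str(text).lower())

def space_non_letters(text: str) -> str:
    """Hypothetical variant: replace non-letters with a space, then collapse repeated whitespace."""
    text = re.sub(r'[^a-z\s]', ' ', str(text).lower())
    return re.sub(r'\s+', ' ', text).strip()

# Made-up sample sentence for illustration
sample = "Kalau mau bikin model/controller, isi ulang Rp10.000!"

print(strip_non_letters(sample))  # words around the slash are fused
print(space_non_letters(sample))  # words around the slash stay separate
```

Whether the variant helps accuracy on this dataset is untested here; it only changes how tokens are separated before stemming and vectorization.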


**Feature Extraction & Splitting**

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

# 3. FEATURE EXTRACTION & SPLITTING

X = df['teks_bersih']
y = df['kategori'] 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

print(f"Training data shape: {X_train_tfidf.shape}")

Training data shape: (914, 3445)
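
The cell above calls `fit_transform` on the training split and only `transform` on the test split, so the vocabulary (3445 terms here) is learned from training data alone. A small sketch of what that implies, using made-up toy sentences rather than the real dataset: a word that appears only in the test text gets no column and is ignored.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy, made-up texts for illustration only
train_texts = ["bonus pulsa gratis", "kapan ke bandung", "isi ulang pulsa bonus"]
test_texts = ["bonus hadiah pulsa"]  # "hadiah" never appears in the training texts

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learns vocabulary and IDF weights
X_test = vectorizer.transform(test_texts)        # reuses them, learns nothing new

print(X_train.shape, X_test.shape)               # same number of columns
print("hadiah" in vectorizer.vocabulary_)        # the unseen word has no column
print(X_test.nnz)                                # non-zero entries in the test row
```

Fitting on the training split alone keeps test-set vocabulary and document frequencies out of the features, which is what makes the later evaluation a fair estimate.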


**Modeling, Evaluation, & Export Model**

In [5]:

from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import joblib

# 4. MODELING
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)

y_pred = model.predict(X_test_tfidf)

# 5. EVALUATION
print("\n=== MODEL EVALUATION RESULTS ===")
print("Accuracy Score:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# 6. EXPORT
# Save both artifacts: the vectorizer is needed to transform new text before prediction
joblib.dump(model, '/kaggle/working/model_spam.pkl')
joblib.dump(vectorizer, '/kaggle/working/vectorizer_spam.pkl')
print("\nModel and Vectorizer successfully saved to /kaggle/working/!")


=== MODEL EVALUATION RESULTS ===
Accuracy Score: 0.9606986899563319

Classification Report:
               precision    recall  f1-score   support

         ham       0.96      0.95      0.96       111
        spam       0.96      0.97      0.96       118

    accuracy                           0.96       229
   macro avg       0.96      0.96      0.96       229
weighted avg       0.96      0.96      0.96       229


Model and Vectorizer successfully saved to /kaggle/working/!
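
The notebook saves the model and the vectorizer as two separate files, and both must be loaded together at inference time. One possible alternative, sketched below under stated assumptions, is to bundle them in a scikit-learn `Pipeline` and persist a single artifact. The toy messages, the file name `spam_pipeline.pkl`, and the temporary directory are made up for illustration; this is not the notebook's own export step.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, made-up messages standing in for already-preprocessed text
texts = [
    "bonus pulsa gratis klik sekarang",
    "menang hadiah pulsa gratis",
    "kapan ke bandung lagi",
    "nanti malam jadi makan bareng",
]
labels = ["spam", "spam", "ham", "ham"]

# Vectorizer and classifier travel together as one object
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(texts, labels)

# Round-trip through joblib using a throwaway directory (hypothetical file name)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "spam_pipeline.pkl")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

new_messages = ["bonus pulsa gratis", "kapan ke bandung"]
original_pred = list(pipeline.predict(new_messages))
restored_pred = list(restored.predict(new_messages))
print(restored_pred)
```

With either approach, new messages should go through the same `preprocess_text` step used for training before being passed to the model, since the vocabulary was built from stemmed, lowercased text.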
