# Name  : Abdullah Azam
# Student ID (NIM) : 2209106056

Dataset link: [Sentiment Analysis Word Lists Dataset](https://www.kaggle.com/datasets/prajwalkanade/sentiment-analysis-word-lists-dataset)

This dataset contains English words in two classes, positive and negative, each stored in a separate .txt file.

# Import Libraries

In [1]:
import pandas as pd
import numpy as np
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

# Load Dataset
Read the positive and negative word lists, skipping blank lines and lines that start with a semicolon.

In [None]:
def load_words(file_path):
    with open(file_path, 'r', encoding='ISO-8859-1') as file:
        words = file.read().splitlines()
    words = [w for w in words if w and not w.startswith(';')]
    return words

positive_words = load_words("word_class/positive-words.txt")
negative_words = load_words("word_class/negative-words.txt")
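
To make the filtering step concrete, here is a minimal sketch that runs the same loading logic on a tiny hypothetical file. The sample contents are invented for illustration, and the assumption that the real files contain `;` comment lines comes only from the filter in the function itself.

```python
import os
import tempfile

def load_words(file_path):
    # Same logic as the notebook cell: keep non-empty lines that are not ';' comments.
    with open(file_path, 'r', encoding='ISO-8859-1') as file:
        words = file.read().splitlines()
    return [w for w in words if w and not w.startswith(';')]

# Hypothetical miniature word list: two comment lines, one blank line, two words.
sample = "; header comment\n;\n\ngood\ngreat\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='ISO-8859-1') as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

demo_words = load_words(tmp_path)
os.remove(tmp_path)  # clean up the temporary file
print(demo_words)
```

Only the two actual words survive; the comment lines and the blank line are dropped.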

# DataFrame
Label each word in the dataset as positive or negative, then combine the labeled data into a single DataFrame.

In [3]:
df_pos = pd.DataFrame(positive_words, columns=["word"])
df_pos["label"] = "positive"

df_neg = pd.DataFrame(negative_words, columns=["word"])
df_neg["label"] = "negative"

df = pd.concat([df_pos, df_neg], ignore_index=True)
print("Data shape:", df.shape)

Data shape: (6789, 2)


# Preprocessing
Each word is wrapped in a short carrier sentence to give it sentence context, because every entry in the dataset is only a single word. The text is then cleaned so that only lowercase letters and whitespace remain.

In [None]:
df["sentence"] = "This is " + df["word"]

def clean_text(text):
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)
    return text

df["sentence"] = df["sentence"].apply(clean_text)
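
The effect of the cleaning step is easiest to see on a few inputs. The sketch below repeats `clean_text` and applies it to hypothetical strings chosen for illustration; they are not taken from the dataset.

```python
import re

def clean_text(text):
    # Lowercase, then drop every character that is not a-z or whitespace.
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)
    return text

# Hypothetical inputs, not taken from the dataset.
examples = ["This is a+", "This is well-known", "I can't believe it!"]
cleaned = [clean_text(s) for s in examples]
for before, after in zip(examples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Note that punctuation is deleted rather than replaced with a space, so a hyphenated word collapses into a single token and "can't" becomes "cant". The same transformation is applied to user input at prediction time, so the two sides stay consistent.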

# Feature Extraction
TF-IDF (Term Frequency - Inverse Document Frequency) is a method that weights each word based on:
1. How often the word appears in a sentence.
2. How rarely the word appears across the whole dataset.

In [5]:
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["sentence"])
y = df["label"]
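
As a small illustration of the two factors above, the sketch below fits a `TfidfVectorizer` with default settings on three hypothetical carrier sentences and prints the learned IDF weight of each term. The toy sentences and variable names are assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sentences in the same "this is <word>" shape as the notebook.
toy_sentences = ["this is good", "this is bad", "this is awful"]

toy_vectorizer = TfidfVectorizer()
toy_X = toy_vectorizer.fit_transform(toy_sentences)

# vocabulary_ maps each term to its column index; idf_ holds the learned IDF per column.
idf = {term: toy_vectorizer.idf_[idx]
       for term, idx in toy_vectorizer.vocabulary_.items()}

for term, weight in sorted(idf.items()):
    print(f"{term:>6}: idf = {weight:.3f}")
```

Because "this" and "is" occur in every sentence, they receive the lowest IDF, while the word that differs between sentences receives a higher one. In the notebook this means the carrier words contribute little to telling sentences apart.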

# Split Data
The dataset is split into two parts, one for training and one for testing:
1. 80% training data
2. 20% test data

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
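
The split above is purely random. Since the two classes are not the same size, one optional variation is a stratified split, which keeps the class proportions the same in both parts. The sketch below uses hypothetical toy labels and the `stratify` argument of `train_test_split`; it is not what the notebook does, only an alternative to consider.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 70 negative, 30 positive.
toy_y = np.array(["negative"] * 70 + ["positive"] * 30)
toy_X = np.arange(len(toy_y)).reshape(-1, 1)  # placeholder features

# stratify=toy_y asks for the same class ratio in train and test.
_, _, y_train_strat, y_test_strat = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42, stratify=toy_y
)

n_pos_test = int((y_test_strat == "positive").sum())
print("test size:", len(y_test_strat), "positives in test:", n_pos_test)
```

With 100 samples and a 20% test share, the stratified test set holds 14 negative and 6 positive samples, matching the 70/30 ratio of the full data.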

# KNN Classifier
Train the model: a k-nearest neighbors classifier with k = 3 is fitted on the training split.

In [7]:
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
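
To show what a 3-nearest-neighbors prediction involves, here is a minimal sketch on hypothetical one-dimensional data. It looks up the three closest training points with `kneighbors` and compares their labels with the output of `predict`. The toy points and labels are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D training data: three negatives near 0, two positives near 1.
toy_X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
toy_y = np.array(["negative", "negative", "negative", "positive", "positive"])

toy_knn = KNeighborsClassifier(n_neighbors=3)
toy_knn.fit(toy_X, toy_y)

query = np.array([[0.9]])
distances, indices = toy_knn.kneighbors(query)  # the 3 closest training points
neighbor_labels = toy_y[indices[0]]
prediction = toy_knn.predict(query)[0]          # majority vote over those labels

print("neighbor labels:", list(neighbor_labels))
print("prediction:", prediction)
```

Two of the three nearest neighbors are positive, so the vote returns positive. When many training points are equally close to a query, the vote depends on which of the tied neighbors are picked, which is worth keeping in mind for very short sentences that share most of their tokens.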

# Model Evaluation
Measure the model's accuracy on the held-out test data, which is 20% of the dataset.

In [9]:
y_pred = knn.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))

Classification Report:
               precision    recall  f1-score   support

    negative       0.70      1.00      0.82       945
    positive       0.80      0.01      0.02       413

    accuracy                           0.70      1358
   macro avg       0.75      0.50      0.42      1358
weighted avg       0.73      0.70      0.58      1358

Accuracy: 0.6980854197349042
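
To put the accuracy in context, it helps to compare it with a majority-class baseline, that is, a model that always predicts the most frequent label. The sketch below uses only the support counts and the accuracy printed in the report above; the variable names are chosen for this example.

```python
# Support values and accuracy copied from the classification report above.
support = {"negative": 945, "positive": 413}
reported_accuracy = 0.6980854197349042

total = sum(support.values())
majority_label = max(support, key=support.get)

# Accuracy of a model that always predicts the majority label.
baseline_accuracy = support[majority_label] / total

print(f"majority class     : {majority_label}")
print(f"baseline accuracy  : {baseline_accuracy:.4f}")
print(f"reported accuracy  : {reported_accuracy:.4f}")
print(f"gain over baseline : {reported_accuracy - baseline_accuracy:.4f}")
```

The baseline is about 0.696, so the reported accuracy is only marginally higher. Together with the positive recall of 0.01, this suggests the classifier labels almost every test sentence as negative, and that accuracy alone overstates how well it separates the two classes.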


# Sentence Prediction
Predict the sentiment of sentences entered by the user.

In this example, sentences 1-10 are negative and sentences 11-20 are positive.
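
As background for the predictions below, it is useful to know how a fitted `TfidfVectorizer` treats words it did not see during fitting: they are silently dropped. The sketch below demonstrates this on a hypothetical two-sentence vocabulary; the sentences and names are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical vocabulary learned from two carrier sentences.
toy_vectorizer = TfidfVectorizer()
toy_vectorizer.fit(["this is good", "this is bad"])

# "really" was never seen during fit, so it contributes nothing.
known = toy_vectorizer.transform(["this is really good"])
# No word in this input is in the vocabulary, so the vector is all zeros.
unknown = toy_vectorizer.transform(["zzz qqq"])

print("non-zero features (partly known):", known.nnz)
print("non-zero features (all unknown) :", unknown.nnz)
```

For the notebook, this implies that a user sentence is classified only through the words it shares with the word lists. A sentence whose sentiment is carried by words outside the vocabulary gives the classifier little to work with.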

In [13]:
def predict_sentiment(text):
    text = clean_text(text)
    text_vec = vectorizer.transform([text])
    prediction = knn.predict(text_vec)
    return prediction[0]

test_sentences = [
    "This is absolutely dreadful, I can't believe how bad it is",
    "I utterly despise this and everything it stands for",
    "The entire experience was a complete disaster from start to finish",
    "I'm so angry and frustrated with this situation",
    "This is the most disappointing thing I've encountered in a long time",
    "The thought of doing that fills me with dread",
    "I feel so let down and betrayed by this",
    "This is simply unacceptable and completely inadequate",
    "My patience has worn thin; I can't tolerate this anymore",
    "Everything about this has been a massive disappointment",
    "This is absolutely wonderful, it brings me so much joy",
    "I'm so thrilled and delighted with this amazing outcome",
    "The team's performance was outstanding and truly inspiring",
    "This news has filled me with immense happiness and excitement",
    "I'm so grateful and appreciative of this incredible gesture",
    "The beauty of this is simply breathtaking and stunning",
    "I feel so much love and affection for this",
    "This is absolutely perfect and exactly what I was hoping for",
    "My heart is filled with happiness and contentment",
    "Everything about this has been a truly positive and uplifting experience",
]

print("\nSentence predictions:")
for i, sent in enumerate(test_sentences, start=1):
    print(f"{i}. '{sent}' => {predict_sentiment(sent)}")


Sentence predictions:
1. 'This is absolutely dreadful, I can't believe how bad it is' => negative
2. 'I utterly despise this and everything it stands for' => negative
3. 'The entire experience was a complete disaster from start to finish' => negative
4. 'I'm so angry and frustrated with this situation' => negative
5. 'This is the most disappointing thing I've encountered in a long time' => negative
6. 'The thought of doing that fills me with dread' => negative
7. 'I feel so let down and betrayed by this' => negative
8. 'This is simply unacceptable and completely inadequate' => negative
9. 'My patience has worn thin; I can't tolerate this anymore' => positive
10. 'Everything about this has been a massive disappointment' => negative
11. 'This is absolutely wonderful, it brings me so much joy' => positive
12. 'I'm so thrilled and delighted with this amazing outcome' => positive
13. 'The team's performance was outstanding and truly inspiring' => positive
14. 'This news has filled me with imme