# Practicum 2 - News Classification with a Perceptron

**Step 1 - Import Libraries**

In [1]:
from sklearn.datasets import fetch_20newsgroups # download dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.metrics import f1_score, classification_report

**Step 2 - Select Labels and Split the Data**

In [2]:
categories = ['rec.sport.hockey', 'rec.sport.baseball', 'rec.autos']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories, remove=('headers', 'footers', 'quotes'))

Explanation:
1. "categories = ['rec.sport.hockey', 'rec.sport.baseball', 'rec.autos']" - This line lists the three newsgroups we want to classify.
2. "newsgroups_train = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))" - This line fetches the training portion of the 20 Newsgroups dataset with the fetch_20newsgroups function from the sklearn.datasets module. 'subset' is set to 'train' to select the training split, and 'categories' restricts the download to the list defined above. 'remove' strips headers, footers, and quoted replies from each post, so the classifier has to rely on the message body rather than on metadata.
3. "newsgroups_test = fetch_20newsgroups(subset='test', categories=categories, remove=('headers', 'footers', 'quotes'))" - This line fetches the test portion in the same way, with 'subset' set to 'test'.

By executing these lines of code, we will have the training and testing data for the specified categories from the 20 Newsgroups dataset.
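
The object returned by fetch_20newsgroups exposes `data`, `target`, and `target_names`. The integer labels in `target` index into `target_names`, and that order does not have to match the order of our `categories` list, so it is worth checking the mapping before reading any per-class results. The sketch below does this on a small stand-in object so that it runs without downloading anything. The helper `label_summary` and the toy documents are hypothetical and only serve as illustration.

```python
from collections import Counter

from sklearn.utils import Bunch


def label_summary(bunch):
    """Map each integer label to (category name, number of documents)."""
    counts = Counter(int(t) for t in bunch.target)
    return {i: (name, counts.get(i, 0)) for i, name in enumerate(bunch.target_names)}


# Hypothetical stand-in with the same attributes as the fetched dataset
toy = Bunch(
    data=["engine oil", "home run", "slap shot", "new tires"],
    target=[0, 1, 2, 0],
    target_names=["rec.autos", "rec.sport.baseball", "rec.sport.hockey"],
)

summary = label_summary(toy)
print(summary)
```

Calling `label_summary(newsgroups_train)` on the real data should give the same kind of dictionary, one entry per category.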

**Step 3 - Extract Features and Build the Perceptron Model**

In [3]:
# Feature extraction
vectorizer = TfidfVectorizer()

# Fit the vectorizer on the training text, then transform both sets
X_train = vectorizer.fit_transform(newsgroups_train.data)
X_test = vectorizer.transform(newsgroups_test.data)

# Fit the model
clf = Perceptron(random_state=11)
clf.fit(X_train, newsgroups_train.target)

# Predict
predictions = clf.predict(X_test)
print(classification_report(newsgroups_test.target, predictions))

              precision    recall  f1-score   support

           0       0.88      0.88      0.88       396
           1       0.82      0.83      0.83       397
           2       0.88      0.87      0.87       399

    accuracy                           0.86      1192
   macro avg       0.86      0.86      0.86      1192
weighted avg       0.86      0.86      0.86      1192
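
The "macro avg" row is the unweighted mean of the per-class scores, while "weighted avg" weights each class by its support. In the report above the two coincide because the three classes have almost the same support (396, 397, 399). The following sketch uses f1_score, which is imported in Step 1, on a small made-up set of labels with unequal support to show the difference. The label arrays are invented for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up labels with unequal class sizes (4, 2, 2)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

per_class = f1_score(y_true, y_pred, average=None)      # one F1 score per class
macro = f1_score(y_true, y_pred, average="macro")        # plain mean of per_class
weighted = f1_score(y_true, y_pred, average="weighted")  # mean weighted by support

support = np.bincount(y_true)  # number of true samples per class
print(per_class, macro, weighted)
```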



Explanation:
1. "vectorizer = TfidfVectorizer()" - This line creates an instance of the TfidfVectorizer class from the sklearn.feature_extraction.text module. TfidfVectorizer converts a collection of raw documents into a sparse matrix of TF-IDF features.
2. "X_train = vectorizer.fit_transform(newsgroups_train.data)" - This line learns the vocabulary and IDF weights from the training documents and, in the same call, transforms those documents into a TF-IDF matrix.
3. "X_test = vectorizer.transform(newsgroups_test.data)" - This line transforms the test documents into a TF-IDF matrix using the vocabulary and IDF weights learned from the training data. Nothing is re-fitted here, so no information from the test set leaks into the features.
4. "clf = Perceptron(random_state=11)" - This line creates an instance of the Perceptron class from the sklearn.linear_model module. Perceptron is a linear classifier; scikit-learn trains it with the same stochastic gradient descent routine that backs SGDClassifier. Setting random_state fixes the shuffling of the training data, which makes the results reproducible.
5. "clf.fit(X_train, newsgroups_train.target)" - This line trains the Perceptron on the input features (X_train) and the target labels (newsgroups_train.target).
6. "predictions = clf.predict(X_test)" - This line applies the trained model to the test features (X_test) and returns one predicted label per document.
7. "print(classification_report(newsgroups_test.target, predictions))" - This line prints the classification report, which summarizes precision, recall, F1-score, and support for each class. The row labels 0, 1, and 2 are the integer class ids in newsgroups_test.target; newsgroups_test.target_names gives the corresponding category names.
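
The difference between fit_transform and transform in items 2 and 3 is easier to see on a tiny corpus. In the sketch below, the sentences are invented for illustration, and the test sentence contains a word that never appears in training. That word is simply ignored, and the test matrix keeps the same number of columns as the training matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented toy documents; "zamboni" does not occur in the training text
train_docs = ["the goalie stopped the puck", "the pitcher threw a strike"]
test_docs = ["the goalie threw the zamboni"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # learns vocabulary and IDF weights
X_test = vectorizer.transform(test_docs)        # reuses them, learns nothing new

vocab = sorted(vectorizer.vocabulary_)  # words the vectorizer knows about
print(vocab)
print(X_train.shape, X_test.shape)
```

Note that the default tokenizer only keeps tokens of two or more characters, so the single-letter word "a" does not end up in the vocabulary.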

By executing these lines of code, we extract TF-IDF features, train a Perceptron model on them, predict labels for the test data, and print the classification report.
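
To give some intuition for what the Perceptron learns, here is a minimal NumPy sketch of the classic perceptron update rule for two classes labeled -1 and +1. This is not scikit-learn's implementation, which adds shuffling, stopping criteria, and a one-vs-rest scheme for more than two classes. The function `train_perceptron` and the toy points are hypothetical.

```python
import numpy as np


def train_perceptron(X, y, epochs=10):
    """Classic perceptron rule for labels in {-1, +1}: update only on mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # A non-positive margin means the point is misclassified
            # (or lies exactly on the decision boundary)
            if yi * (xi @ w + b) <= 0:
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:  # every point is classified correctly, stop early
            break
    return w, b


# Linearly separable toy data
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, b = train_perceptron(X, y)
preds = np.sign(X @ w + b)
print(w, b, preds)
```

The weights only change when a point is misclassified, so on linearly separable data the loop stops as soon as one full pass produces no mistakes.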