# Fake News Detector

## Import Library

In [1]:
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
import re
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler
from imblearn.pipeline import Pipeline as ImbPipeline
import os

In [2]:
files = os.listdir()

for file in files:
    print(f"{file}")

.git
Fake.csv
Progress Laporan STKI Kelompok 2.docx
Progress Laporan STKI Kelompok 2.pdf
STKI_Fake_News_Detector.ipynb
True.csv


## Import Data

In [3]:
true = pd.read_csv('True.csv', engine='python', encoding='utf-8', on_bad_lines='skip')
fake = pd.read_csv('Fake.csv', engine='python', encoding='utf-8', on_bad_lines='skip')

In [4]:
true

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"
...,...,...,...,...
21412,'Fully committed' NATO backs new U.S. approach...,BRUSSELS (Reuters) - NATO allies on Tuesday we...,worldnews,"August 22, 2017"
21413,LexisNexis withdrew two products from Chinese ...,"LONDON (Reuters) - LexisNexis, a provider of l...",worldnews,"August 22, 2017"
21414,Minsk cultural hub becomes haven from authorities,MINSK (Reuters) - In the shadow of disused Sov...,worldnews,"August 22, 2017"
21415,Vatican upbeat on possibility of Pope Francis ...,MOSCOW (Reuters) - Vatican Secretary of State ...,worldnews,"August 22, 2017"


In [5]:
fake

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"
...,...,...,...,...
23476,McPain: John McCain Furious That Iran Treated ...,21st Century Wire says As 21WIRE reported earl...,Middle-east,"January 16, 2016"
23477,JUSTICE? Yahoo Settles E-mail Privacy Class-ac...,21st Century Wire says It s a familiar theme. ...,Middle-east,"January 16, 2016"
23478,Sunnistan: US and Allied ‘Safe Zone’ Plan to T...,Patrick Henningsen 21st Century WireRemember ...,Middle-east,"January 15, 2016"
23479,How to Blow $700 Million: Al Jazeera America F...,21st Century Wire says Al Jazeera America will...,Middle-east,"January 14, 2016"


In [6]:
true['label'] = 1
fake['label'] = 0

# Data Preprocessing

## Data Integration

In [7]:
news = pd.concat([fake, true], axis=0)

In [8]:
news.head()

Unnamed: 0,title,text,subject,date,label
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0


In [9]:
news.tail()

Unnamed: 0,title,text,subject,date,label
21412,'Fully committed' NATO backs new U.S. approach...,BRUSSELS (Reuters) - NATO allies on Tuesday we...,worldnews,"August 22, 2017",1
21413,LexisNexis withdrew two products from Chinese ...,"LONDON (Reuters) - LexisNexis, a provider of l...",worldnews,"August 22, 2017",1
21414,Minsk cultural hub becomes haven from authorities,MINSK (Reuters) - In the shadow of disused Sov...,worldnews,"August 22, 2017",1
21415,Vatican upbeat on possibility of Pope Francis ...,MOSCOW (Reuters) - Vatican Secretary of State ...,worldnews,"August 22, 2017",1
21416,Indonesia to buy $1.14 billion worth of Russia...,JAKARTA (Reuters) - Indonesia will buy 11 Sukh...,worldnews,"August 22, 2017",1


## Data Cleaning

### Checking Null Values

In [10]:
news.isnull().sum()

title      0
text       0
subject    0
date       0
label      0
dtype: int64

### Dropping Unnecessary Column

In [11]:
news = news.drop(['title', 'subject', 'date'], axis=1)

In [12]:
news

Unnamed: 0,text,label
0,Donald Trump just couldn t wish all Americans ...,0
1,House Intelligence Committee Chairman Devin Nu...,0
2,"On Friday, it was revealed that former Milwauk...",0
3,"On Christmas day, Donald Trump announced that ...",0
4,Pope Francis used his annual Christmas Day mes...,0
...,...,...
21412,BRUSSELS (Reuters) - NATO allies on Tuesday we...,1
21413,"LONDON (Reuters) - LexisNexis, a provider of l...",1
21414,MINSK (Reuters) - In the shadow of disused Sov...,1
21415,MOSCOW (Reuters) - Vatican Secretary of State ...,1


### Scramble Data

In [13]:
news = news.sample(frac=1)
news.reset_index(inplace=True)
news.drop(['index'], axis = 1, inplace=True)

In [14]:
news

Unnamed: 0,text,label
0,"MANCHESTER, England (Reuters) - Britain is wor...",1
1,WASHINGTON (Reuters) - The U.S. State Departme...,1
2,We ve already discussed Barack Obama s many ac...,0
3,WASHINGTON President Trump said on Thursday ...,0
4,Just another nice immigrant family trying to a...,0
...,...,...
44893,In an outdoor ceremony to celebrate the super ...,0
44894,21st Century Wire says One TV personality Tru...,0
44895,SEOUL (Reuters) - North Korean leader Kim Jong...,1
44896,"On Wednesday, the GOP in an effort to deflec...",0


### WordOPT

In [15]:
def wordopt(text):
    # Convert into lowercase
    text = text.lower()

    # Remove URLs
    text = re.sub(r'https?://\S+|\www\.\S+', '', text)

    # Remove HTML tags
    text = re.sub(r'<.*?>', '', text)

    # Remove punctuation
    text = re.sub(r'[^\w\s]', '', text)

    # Remove digits
    text = re.sub(r'\d', '', text)

    # Remove newline characters
    text = re.sub(r'\n', ' ', text)

    return text


In [16]:
news['text'] = news['text'].apply(wordopt)

In [17]:
news['text']

0        manchester england reuters  britain is worried...
1        washington reuters  the us state department on...
2        we ve already discussed barack obama s many ac...
3        washington   president trump said on thursday ...
4        just another nice immigrant family trying to a...
                               ...                        
44893    in an outdoor ceremony to celebrate the super ...
44894     st century wire says one tv personality trump...
44895    seoul reuters  north korean leader kim jong un...
44896    on wednesday the gop   in an effort to deflect...
44897    the la times reported yesterday that hillary c...
Name: text, Length: 44898, dtype: object

## Stopwords and Stemming

In [18]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Kalea\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [19]:
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

In [20]:
def preprocess_text(text):
    words = text.split()

    # Hapus stopwords dan stem kata
    processed_words = [stemmer.stem(word) for word in words if word.lower() not in stop_words]

    # Gabungkan kembali kata yang telah diproses
    return ' '.join(processed_words)

In [21]:
X_stem = [preprocess_text(sentence) for sentence in news['text']]

## Split Data

In [22]:
x = news['text']
y = news['label']

### Split Data: TF-IDF

In [23]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=10)

### Split Data: TF IDF & Stemming & Stopwords

In [24]:
x_train_stem, x_test_stem, y_train_stem, y_test_stem = train_test_split(X_stem, y, test_size=0.3, random_state=10)

# Model

## Logistic Regression Model

### Model Logistic Regression

In [25]:
tf_idf_LR = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('log_reg', LogisticRegression())
])

stem_LR = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('log_reg', LogisticRegression())
]) 

full_LR = ImbPipeline([
    ('tfidf', TfidfVectorizer()),
    ('smote', SMOTE()),
    ('scaler', StandardScaler(with_mean=False)),
    ('log_reg', LogisticRegression(class_weight='balanced'))
])

In [26]:
tf_idf_LR.fit(x_train, y_train)
stem_LR.fit(x_train_stem, y_train_stem)
full_LR.fit(x_train_stem, y_train_stem)

In [27]:
full_LR.fit(x_train_stem, y_train_stem)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


### Model Evaluation

#### TF-IDF Score

In [28]:
tf_idf_train_pred_lr = tf_idf_LR.predict(x_train)
tf_idf_test_pred_lr = tf_idf_LR.predict(x_test)

In [29]:
tf_idf_train_score_LR = tf_idf_LR.score(x_train, y_train)
tf_idf_test_score_LR = tf_idf_LR.score(x_test, y_test)

In [30]:
print(classification_report(y_train, tf_idf_train_pred_lr))
print(classification_report(y_test, tf_idf_test_pred_lr))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99     16483
           1       0.99      0.99      0.99     14945

    accuracy                           0.99     31428
   macro avg       0.99      0.99      0.99     31428
weighted avg       0.99      0.99      0.99     31428

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6998
           1       0.99      0.99      0.99      6472

    accuracy                           0.99     13470
   macro avg       0.99      0.99      0.99     13470
weighted avg       0.99      0.99      0.99     13470



In [31]:
print(f"Logistic Regression TF IDF Train Score: {tf_idf_train_score_LR}")
print(f"Logistic Regression TF IDF Test Score: {tf_idf_test_score_LR}")

Logistic Regression TF IDF Train Score: 0.9932226040473463
Logistic Regression TF IDF Test Score: 0.9874536005939124


#### TF-IDF with Stemming and Stopwords Score

In [32]:
stem_train_pred_lr = stem_LR.predict(x_train_stem)
stem_test_pred_lr = stem_LR.predict(x_test_stem)

In [33]:
stem_train_score_LR = stem_LR.score(x_train_stem, y_train_stem)
stem_test_score_LR = stem_LR.score(x_test_stem, y_test_stem)

In [34]:
print(classification_report(y_train_stem, stem_train_pred_lr))
print(classification_report(y_test_stem, stem_test_pred_lr))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99     16483
           1       0.99      0.99      0.99     14945

    accuracy                           0.99     31428
   macro avg       0.99      0.99      0.99     31428
weighted avg       0.99      0.99      0.99     31428

              precision    recall  f1-score   support

           0       0.99      0.98      0.98      6998
           1       0.98      0.98      0.98      6472

    accuracy                           0.98     13470
   macro avg       0.98      0.98      0.98     13470
weighted avg       0.98      0.98      0.98     13470



In [35]:
print(f"Stemming, Stopwords, TF IDF Train Score: {stem_train_score_LR}")
print(f"Stemming, Stopwords, TF IDF Test Score: {stem_test_score_LR}")

Stemming, Stopwords, TF IDF Train Score: 0.9915998472699503
Stemming, Stopwords, TF IDF Test Score: 0.9825538233110617


#### TF-IDF with Stemming, Stopwords, Class Balancing, and Standard Scaler Score

In [36]:
full_train_pred_lr = full_LR.predict(x_train_stem)
full_test_pred_lr = full_LR.predict(x_test_stem)

In [37]:
full_train_score_LR = full_LR.score(x_train_stem, y_train_stem)
full_test_score_LR = full_LR.score(x_test_stem, y_test_stem)

In [38]:
print(classification_report(y_train_stem, full_train_pred_lr))
print(classification_report(y_test_stem, full_test_pred_lr))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     16483
           1       1.00      1.00      1.00     14945

    accuracy                           1.00     31428
   macro avg       1.00      1.00      1.00     31428
weighted avg       1.00      1.00      1.00     31428

              precision    recall  f1-score   support

           0       0.99      0.95      0.97      6998
           1       0.95      0.99      0.97      6472

    accuracy                           0.97     13470
   macro avg       0.97      0.97      0.97     13470
weighted avg       0.97      0.97      0.97     13470



In [39]:
print(f"Stemming, Stopwords, Class Balancing, and Standard Scaler Train Score: {full_train_score_LR}")
print(f"Stemming, Stopwords, Class Balancing, and Standard Scaler Test Score: {full_test_score_LR}")

Stemming, Stopwords, Class Balancing, and Standard Scaler Train Score: 0.9999681812396589
Stemming, Stopwords, Class Balancing, and Standard Scaler Test Score: 0.9691907943578322


#### Model Comparison (Logistic Regression)

In [40]:
# Model 1: Logistic Regression TF IDF Scores
print("Model 1: Logistic Regression TF IDF Scores Scores:")
print(f"Train Score: {tf_idf_train_score_LR}")
print(f"Test Score: {tf_idf_test_score_LR}")

# Model 2: Logistic Regression Stemming, Stopwords, TF IDF Scores
print("\nModel 2: Logistic Regression Stemming, Stopwords, TF IDF Scores:")
print(f"Train Score: {stem_train_score_LR}")
print(f"Test Score: {stem_test_score_LR}")

# Model 3: Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Scores
print("\nModel 3: Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Scores:")
print(f"Train Score: {full_train_score_LR}")
print(f"Test Score: {full_test_score_LR}")

print("")

# Percentage Differences between Model 1 and Model 2 on train and test data
print("Percentage Differences between Model 1 and Model 2:")
train_score_diff_1_2 = tf_idf_train_score_LR - stem_train_score_LR
test_score_diff_1_2 = tf_idf_test_score_LR - stem_test_score_LR
train_score_percent_diff_1_2 = (train_score_diff_1_2 / tf_idf_train_score_LR) * 100
test_score_percent_diff_1_2 = (test_score_diff_1_2 / tf_idf_test_score_LR) * 100
print(f"Train Data: {train_score_diff_1_2} = {train_score_percent_diff_1_2:.2f}%")
print(f"Test Data: {test_score_diff_1_2} = {test_score_percent_diff_1_2:.2f}%")

# Percentage Differences between Model 1 and Model 3 on train and test data
print("\nPercentage Differences between Model 1 and Model 3:")
train_score_diff_1_3 = tf_idf_train_score_LR - full_train_score_LR
test_score_diff_1_3 = tf_idf_test_score_LR - full_test_score_LR
train_score_percent_diff_1_3 = (train_score_diff_1_3 / tf_idf_train_score_LR) * 100
test_score_percent_diff_1_3 = (test_score_diff_1_3 / tf_idf_test_score_LR) * 100
print(f"Train Data: {train_score_diff_1_3} = {train_score_percent_diff_1_3:.2f}%")
print(f"Test Data: {test_score_diff_1_3} = {test_score_percent_diff_1_3:.2f}%")

# Percentage Differences between Model 2 and Model 3 on train and test data
print("\nPercentage Differences between Model 2 and Model 3:")
train_score_diff_2_3 = stem_train_score_LR - full_train_score_LR
test_score_diff_2_3 = stem_test_score_LR - full_test_score_LR
train_score_percent_diff_2_3 = (train_score_diff_2_3 / stem_train_score_LR) * 100
test_score_percent_diff_2_3 = (test_score_diff_2_3 / stem_test_score_LR) * 100
print(f"Train Data: {train_score_diff_2_3} = {train_score_percent_diff_2_3:.2f}%")
print(f"Test Data: {test_score_diff_2_3} = {test_score_percent_diff_2_3:.2f}%")


Model 1: Logistic Regression TF IDF Scores Scores:
Train Score: 0.9932226040473463
Test Score: 0.9874536005939124

Model 2: Logistic Regression Stemming, Stopwords, TF IDF Scores:
Train Score: 0.9915998472699503
Test Score: 0.9825538233110617

Model 3: Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Scores:
Train Score: 0.9999681812396589
Test Score: 0.9691907943578322

Percentage Differences between Model 1 and Model 2:
Train Data: 0.001622756777395984 = 0.16%
Test Data: 0.004899777282850737 = 0.50%

Percentage Differences between Model 1 and Model 3:
Train Data: -0.006745577192312613 = -0.68%
Test Data: 0.01826280623608023 = 1.85%

Percentage Differences between Model 2 and Model 3:
Train Data: -0.008368333969708597 = -0.84%
Test Data: 0.013363028953229494 = 1.36%


### Logistic Regression Hyperparameter Optimization

#### GridSearchCV

In [40]:
# Definisikan parameter grid
LR_param_grid = {
    'tfidf__max_df': [0.75, 1.0],
    'tfidf__ngram_range': [(1, 1), (1, 2)],
    'log_reg__C': [0.1, 1.0, 10.0],
    'log_reg__penalty': ['l1', 'l2'],
    'log_reg__solver': ['liblinear', 'saga'],
    'log_reg__max_iter': [1000, 5000]
}


In [41]:
# Definisikan GridSearchCV
tf_idf_LR_random_search = GridSearchCV(estimator=tf_idf_LR, param_grid=LR_param_grid, cv=5, scoring='accuracy', n_jobs=4)
tf_idf_LR_random_search.fit(x_train, y_train)

In [42]:
# Definisikan GridSearchCV
stem_LR_random_search = GridSearchCV(estimator=stem_LR, param_grid=LR_param_grid, cv=5, scoring='accuracy', n_jobs=4)
stem_LR_random_search.fit(x_train_stem, y_train_stem)

In [42]:
# Definisikan GridSearchCV
full_LR_random_search = GridSearchCV(estimator=full_LR, param_grid=LR_param_grid, cv=5, scoring='accuracy', n_jobs=4)
full_LR_random_search.fit(x_train_stem, y_train_stem)



In [45]:
# Hasil terbaik
print(f"Best parameters for TF IDF Logistic Regression: {tf_idf_LR_random_search.best_params_}")
print(f"Best parameters for Stemming and Stopwords Logistic Regression: {stem_LR_random_search.best_params_}")
print(f"Best parameters for Stemming, Stopwords, Class Balancing, and Standard Scaler Logistic Regression: {full_LR_random_search.best_params_}")

Best parameters for TF IDF Logistic Regression: {'log_reg__C': 10.0, 'log_reg__max_iter': 5000, 'log_reg__penalty': 'l1', 'log_reg__solver': 'liblinear', 'tfidf__max_df': 0.75, 'tfidf__ngram_range': (1, 2)}
Best parameters for Stemming and Stopwords Logistic Regression: {'log_reg__C': 10.0, 'log_reg__max_iter': 1000, 'log_reg__penalty': 'l1', 'log_reg__solver': 'liblinear', 'tfidf__max_df': 1.0, 'tfidf__ngram_range': (1, 2)}
Best parameters for Stemming, Stopwords, Class Balancing, and Standard Scaler Logistic Regression: {'log_reg__C': 10.0, 'log_reg__max_iter': 1000, 'log_reg__penalty': 'l1', 'log_reg__solver': 'liblinear', 'tfidf__max_df': 1.0, 'tfidf__ngram_range': (1, 2)}


### Logistic Regression Model Setelah Optimisasi (GridSearchCV)

In [43]:
tf_idf_LR.set_params(
    tfidf__max_df=0.75,
    tfidf__ngram_range=(1, 2),
    log_reg__C=10.0,
    log_reg__max_iter=5000,
    log_reg__penalty='l1',
    log_reg__solver='liblinear'
)
optimized_tf_idf_LR = tf_idf_LR

stem_LR.set_params(
    tfidf__max_df=1.0,
    tfidf__ngram_range=(1, 2),
    log_reg__C=10.0,
    log_reg__max_iter=1000,
    log_reg__penalty='l1',
    log_reg__solver='liblinear'
)

optimized_stem_LR = stem_LR

full_LR.set_params(
    tfidf__max_df=1.0,
    tfidf__ngram_range=(1, 2),
    log_reg__C=10.0,
    log_reg__max_iter=1000,
    log_reg__penalty='l1',
    log_reg__solver='liblinear'
)

optimized_full_LR = full_LR

# optimized_tf_idf_LR = tf_idf_LR_random_search.best_estimator_
# optimized_stem_LR = stem_LR_random_search.best_estimator_
# optimized_full_LR = full_LR_random_search.best_estimator_

#### TF IDF

In [44]:
optimized_tf_idf_train_pred_LR = optimized_tf_idf_LR.predict(x_train)
optimized_tf_idf_test_pred_LR = optimized_tf_idf_LR.predict(x_test)

In [45]:
optimized_tf_idf_train_score_LR = optimized_tf_idf_LR.score(x_train, y_train)
optimized_tf_idf_test_score_LR = optimized_tf_idf_LR.score(x_test, y_test)

In [46]:
print(classification_report(y_train, optimized_tf_idf_train_pred_LR))
print(classification_report(y_test, optimized_tf_idf_test_pred_LR))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99     16483
           1       0.99      0.99      0.99     14945

    accuracy                           0.99     31428
   macro avg       0.99      0.99      0.99     31428
weighted avg       0.99      0.99      0.99     31428

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6998
           1       0.99      0.99      0.99      6472

    accuracy                           0.99     13470
   macro avg       0.99      0.99      0.99     13470
weighted avg       0.99      0.99      0.99     13470



In [47]:
print(f"Optimized Logistic Regression TF IDF Train Score: {optimized_tf_idf_train_score_LR}")
print(f"Optimized Logistic Regression TF IDF Test Score: {optimized_tf_idf_test_score_LR}")

Optimized Logistic Regression TF IDF Train Score: 0.9932226040473463
Optimized Logistic Regression TF IDF Test Score: 0.9874536005939124


#### TF IDF with Stemming and Stopwords

In [48]:
optimized_stem_train_pred_LR = optimized_stem_LR.predict(x_train_stem)
optimized_stem_test_pred_LR = optimized_stem_LR.predict(x_test_stem)

In [49]:
optimized_stem_train_score_LR = optimized_stem_LR.score(x_train_stem, y_train_stem)
optimized_stem_test_score_LR = optimized_stem_LR.score(x_test_stem, y_test_stem)

In [50]:
print(classification_report(y_train_stem, optimized_stem_train_pred_LR))
print(classification_report(y_test_stem, optimized_stem_test_pred_LR))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99     16483
           1       0.99      0.99      0.99     14945

    accuracy                           0.99     31428
   macro avg       0.99      0.99      0.99     31428
weighted avg       0.99      0.99      0.99     31428

              precision    recall  f1-score   support

           0       0.99      0.98      0.98      6998
           1       0.98      0.98      0.98      6472

    accuracy                           0.98     13470
   macro avg       0.98      0.98      0.98     13470
weighted avg       0.98      0.98      0.98     13470



In [51]:
print(f"Optimized Logistic Regression Stemming, Stopwords, TF IDF Train Score: {optimized_stem_train_score_LR}")
print(f"Optimized Logistic Regression Stemming, Stopwords, TF IDF Test Score: {optimized_stem_test_score_LR}")

Optimized Logistic Regression Stemming, Stopwords, TF IDF Train Score: 0.9915998472699503
Optimized Logistic Regression Stemming, Stopwords, TF IDF Test Score: 0.9825538233110617


#### TF-IDF with Stemming, Stopwords, Class Balancing, and Standard Scaler

In [52]:
optimized_full_train_pred_LR = optimized_full_LR.predict(x_train_stem)
optimized_full_test_pred_LR = optimized_full_LR.predict(x_test_stem)

In [53]:
optimized_full_train_score_LR = optimized_full_LR.score(x_train_stem, y_train_stem)
optimized_full_test_score_LR = optimized_full_LR.score(x_test_stem, y_test_stem)


In [54]:
print(classification_report(y_train_stem, optimized_stem_train_pred_LR))
print(classification_report(y_test_stem, optimized_stem_test_pred_LR))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99     16483
           1       0.99      0.99      0.99     14945

    accuracy                           0.99     31428
   macro avg       0.99      0.99      0.99     31428
weighted avg       0.99      0.99      0.99     31428

              precision    recall  f1-score   support

           0       0.99      0.98      0.98      6998
           1       0.98      0.98      0.98      6472

    accuracy                           0.98     13470
   macro avg       0.98      0.98      0.98     13470
weighted avg       0.98      0.98      0.98     13470



In [55]:
print(f"Optimized Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score: {optimized_full_train_score_LR}")
print(f"Optimized Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: {optimized_full_test_score_LR}")

Optimized Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score: 0.9999681812396589
Optimized Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: 0.9691907943578322


### Score Differential

#### TF IDF

In [56]:
# Train vs Train Comparison for TF-IDF Logistic Regression
print("Train vs Train Comparison for TF-IDF Logistic Regression:")
print(f"TF-IDF Train Score vs Optimized Train Score: {tf_idf_train_score_LR} vs {optimized_tf_idf_train_score_LR}")
print("")

# Test vs Test Comparison for TF-IDF Logistic Regression
print("Test vs Test Comparison for TF-IDF Logistic Regression:")
print(f"TF-IDF Test Score vs Optimized Test Score: {tf_idf_test_score_LR} vs {optimized_tf_idf_test_score_LR}")
print("")

# Percentage Differences
print("Percentage Differences:")
print(f"Percentage Difference (Train Score): TF-IDF vs Optimized: {((optimized_tf_idf_train_score_LR - tf_idf_train_score_LR) / tf_idf_train_score_LR) * 100:.2f}%")
print(f"Percentage Difference (Test Score): TF-IDF vs Optimized: {((optimized_tf_idf_test_score_LR - tf_idf_test_score_LR) / tf_idf_test_score_LR) * 100:.2f}%")


Train vs Train Comparison for TF-IDF Logistic Regression:
TF-IDF Train Score vs Optimized Train Score: 0.9932226040473463 vs 0.9932226040473463

Test vs Test Comparison for TF-IDF Logistic Regression:
TF-IDF Test Score vs Optimized Test Score: 0.9874536005939124 vs 0.9874536005939124

Percentage Differences:
Percentage Difference (Train Score): TF-IDF vs Optimized: 0.00%
Percentage Difference (Test Score): TF-IDF vs Optimized: 0.00%


#### TF IDF with Stemming and Stopwords

In [57]:
# Train vs Train Comparison for Stemming, Stopwords, TF-IDF Logistic Regression
print("Train vs Train Comparison for Stemming, Stopwords, TF-IDF Logistic Regression:")
print(f"Stemming, Stopwords, TF-IDF Train Score vs Optimized Train Score: {stem_train_score_LR} vs {optimized_stem_train_score_LR}")
print("")

# Test vs Test Comparison for Stemming, Stopwords, TF-IDF Logistic Regression
print("Test vs Test Comparison for Stemming, Stopwords, TF-IDF Logistic Regression:")
print(f"Stemming, Stopwords, TF-IDF Test Score vs Optimized Test Score: {stem_test_score_LR} vs {optimized_stem_test_score_LR}")
print("")

# Percentage Differences
print("Percentage Differences:")
print(f"Percentage Difference (Train Score): Stemming, Stopwords, TF-IDF vs Optimized: {((optimized_stem_train_score_LR - stem_train_score_LR) / stem_train_score_LR) * 100:.2f}%")
print(f"Percentage Difference (Test Score): Stemming, Stopwords, TF-IDF vs Optimized: {((optimized_stem_test_score_LR - stem_test_score_LR) / stem_test_score_LR) * 100:.2f}%")


Train vs Train Comparison for Stemming, Stopwords, TF-IDF Logistic Regression:
Stemming, Stopwords, TF-IDF Train Score vs Optimized Train Score: 0.9915998472699503 vs 0.9915998472699503

Test vs Test Comparison for Stemming, Stopwords, TF-IDF Logistic Regression:
Stemming, Stopwords, TF-IDF Test Score vs Optimized Test Score: 0.9825538233110617 vs 0.9825538233110617

Percentage Differences:
Percentage Difference (Train Score): Stemming, Stopwords, TF-IDF vs Optimized: 0.00%
Percentage Difference (Test Score): Stemming, Stopwords, TF-IDF vs Optimized: 0.00%


#### Optimized Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler

In [58]:
# Train vs Train Comparison for Stemming, Stopwords, TF-IDF Logistic Regression
print("Train vs Train Comparison for Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Logistic Regression:")
print(f"Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score vs Optimized Train Score: {full_train_score_LR} vs {optimized_full_train_score_LR}")
print("")

# Test vs Test Comparison for Stemming, Stopwords, TF-IDF Logistic Regression
print("Test vs Test Comparison for Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Logistic Regression:")
print(f"Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: {full_test_score_LR} vs {optimized_full_test_score_LR}")
print("")

# Percentage Differences
print("Percentage Differences:")
print(f"Percentage Difference (Train Score): Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler vs Optimized: {((optimized_full_train_score_LR - full_train_score_LR) / full_train_score_LR) * 100:.2f}%")
print(f"Percentage Difference (Test Score): Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler vs Optimized: {((optimized_full_test_score_LR - full_test_score_LR) / full_test_score_LR) * 100:.2f}%")


Train vs Train Comparison for Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Logistic Regression:
Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score vs Optimized Train Score: 0.9999681812396589 vs 0.9999681812396589

Test vs Test Comparison for Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Logistic Regression:
Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: 0.9691907943578322 vs 0.9691907943578322

Percentage Differences:
Percentage Difference (Train Score): Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler vs Optimized: 0.00%
Percentage Difference (Test Score): Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler vs Optimized: 0.00%


#### Optimized Model Comparison

In [59]:
# Model 1: Optimized Logistic Regression TF IDF Scores
print("Model TF IDF Scores:")
print(f"Train Score: {optimized_tf_idf_train_score_LR}")
print(f"Test Score: {optimized_tf_idf_test_score_LR}")

# Model 2: Optimized Logistic Regression Stemming, Stopwords, TF IDF Scores
print("\nStem & Stopwords Model Scores:")
print(f"Train Score: {optimized_stem_train_score_LR}")
print(f"Test Score: {optimized_stem_test_score_LR}")

# Model 3: Optimized Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Scores
print("\nFull Model Scores:")
print(f"Train Score: {optimized_full_train_score_LR}")
print(f"Test Score: {optimized_full_test_score_LR}")

print("")

# Percentage Differences between Model 1 and Model 2 on train and test data
print("Percentage Differences between Model TF IDF and Stem & Stopwords Model:")
train_score_diff_1_2 = optimized_tf_idf_train_score_LR - optimized_stem_train_score_LR
test_score_diff_1_2 = optimized_tf_idf_test_score_LR - optimized_stem_test_score_LR
train_score_percent_diff_1_2 = (train_score_diff_1_2 / optimized_tf_idf_train_score_LR) * 100
test_score_percent_diff_1_2 = (test_score_diff_1_2 / optimized_tf_idf_test_score_LR) * 100
print(f"Train Data: {train_score_diff_1_2} = {train_score_percent_diff_1_2:.2f}%")
print(f"Test Data: {test_score_diff_1_2} = {test_score_percent_diff_1_2:.2f}%")

# Percentage Differences between Model 1 and Model 3 on train and test data
print("\nPercentage Differences between Model TF IDF and Full Model:")
train_score_diff_1_3 = optimized_tf_idf_train_score_LR - optimized_full_train_score_LR
test_score_diff_1_3 = optimized_tf_idf_test_score_LR - optimized_full_test_score_LR
train_score_percent_diff_1_3 = (train_score_diff_1_3 / optimized_tf_idf_train_score_LR) * 100
test_score_percent_diff_1_3 = (test_score_diff_1_3 / optimized_tf_idf_test_score_LR) * 100
print(f"Train Data: {train_score_diff_1_3} = {train_score_percent_diff_1_3:.2f}%")
print(f"Test Data: {test_score_diff_1_3} = {test_score_percent_diff_1_3:.2f}%")

# Percentage Differences between Model 2 and Model 3 on train and test data
print("\nPercentage Differences between Stem & Stopwords Model and Full Model:")
train_score_diff_2_3 = optimized_stem_train_score_LR - optimized_full_train_score_LR
test_score_diff_2_3 = optimized_stem_test_score_LR - optimized_full_test_score_LR
train_score_percent_diff_2_3 = (train_score_diff_2_3 / optimized_stem_train_score_LR) * 100
test_score_percent_diff_2_3 = (test_score_diff_2_3 / optimized_stem_test_score_LR) * 100
print(f"Train Data: {train_score_diff_2_3} = {train_score_percent_diff_2_3:.2f}%")
print(f"Test Data: {test_score_diff_2_3} = {test_score_percent_diff_2_3:.2f}%")

Model TF IDF Scores:
Train Score: 0.9932226040473463
Test Score: 0.9874536005939124

Stem & Stopwords Model Scores:
Train Score: 0.9915998472699503
Test Score: 0.9825538233110617

Full Model Scores:
Train Score: 0.9999681812396589
Test Score: 0.9691907943578322

Percentage Differences between Model TF IDF and Stem & Stopwords Model:
Train Data: 0.001622756777395984 = 0.16%
Test Data: 0.004899777282850737 = 0.50%

Percentage Differences between Model TF IDF and Full Model:
Train Data: -0.006745577192312613 = -0.68%
Test Data: 0.01826280623608023 = 1.85%

Percentage Differences between Stem & Stopwords Model and Full Model:
Train Data: -0.008368333969708597 = -0.84%
Test Data: 0.013363028953229494 = 1.36%
Model Ranking 


### Logistic Regression Models Ranking

In [181]:
models_train_LR = [
    ('TF IDF Model', tf_idf_train_score_LR, tf_idf_LR),
    ('Stem & Stopwords Model', stem_train_score_LR, stem_LR),
    ('Full Model', full_train_score_LR, full_LR),
    ('Optimized TF IDF Model', optimized_tf_idf_train_score_LR, optimized_tf_idf_LR),
    ('Optimized Stem & Stopwords Model', optimized_stem_train_score_LR, optimized_stem_LR),
    ('Optimized Full Model', optimized_full_train_score_LR, optimized_full_LR)
]

models_test_LR = [
    ('TF IDF Model', tf_idf_test_score_LR, tf_idf_LR),
    ('Stem & Stopwords Model', stem_test_score_LR, stem_LR),
    ('Full Model', full_test_score_LR, full_LR),
    ('Optimized TF IDF Model', optimized_tf_idf_test_score_LR, optimized_tf_idf_LR),
    ('Optimized Stem & Stopwords Model', optimized_stem_test_score_LR, optimized_stem_LR),
    ('Optimized Full Model', optimized_full_test_score_LR, optimized_full_LR)
]

models_train_sorted = sorted(models_train_LR, key=lambda x: x[1], reverse=True)
models_test_sorted = sorted(models_test_LR, key=lambda x: x[1], reverse=True)

print("Model Rankings based on Train Accuracy:")
for rank, (desc, score, model) in enumerate(models_train_sorted, start=1):
    print(f"{rank}. {desc} with accuracy {score:.5f}")

print("\nModel Rankings based on Test Accuracy:")
for rank, (desc, score, model) in enumerate(models_test_sorted, start=1):
    print(f"{rank}. {desc} with accuracy {score:.5f}")

Model Rankings based on Train Accuracy:
1. Full Model with accuracy 0.99997
2. Optimized Full Model with accuracy 0.99997
3. TF IDF Model with accuracy 0.99322
4. Optimized TF IDF Model with accuracy 0.99322
5. Stem & Stopwords Model with accuracy 0.99160
6. Optimized Stem & Stopwords Model with accuracy 0.99160

Model Rankings based on Test Accuracy:
1. TF IDF Model with accuracy 0.98745
2. Optimized TF IDF Model with accuracy 0.98745
3. Stem & Stopwords Model with accuracy 0.98255
4. Optimized Stem & Stopwords Model with accuracy 0.98255
5. Full Model with accuracy 0.96919
6. Optimized Full Model with accuracy 0.96919


## Pembuatan Model Decision Tree Classifier

### Model Decision Tree Classifier

In [61]:
tf_idf_DTC = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('dtc', DecisionTreeClassifier())
])

stem_DTC = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('dtc', DecisionTreeClassifier())
])

full_DTC = ImbPipeline([
    ('tfidf', TfidfVectorizer()),
    ('smote', SMOTE()),
    ('dtc', DecisionTreeClassifier(class_weight='balanced'))
])

In [62]:
tf_idf_DTC.fit(x_train, y_train)
stem_DTC.fit(x_train_stem, y_train_stem)
full_DTC.fit(x_train_stem, y_train_stem)

### Model Evaluation

#### TF IDF

In [63]:
tf_idf_train_pred_DTC = tf_idf_DTC.predict(x_train)
tf_idf_test_pred_DTC = tf_idf_DTC.predict(x_test)

In [64]:
tf_idf_train_score_DTC = tf_idf_DTC.score(x_train, y_train)
tf_idf_test_score_DTC = tf_idf_DTC.score(x_test, y_test)

In [65]:
print(classification_report(y_train, tf_idf_train_pred_DTC))
print(classification_report(y_test, tf_idf_test_pred_DTC))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     16483
           1       1.00      1.00      1.00     14945

    accuracy                           1.00     31428
   macro avg       1.00      1.00      1.00     31428
weighted avg       1.00      1.00      1.00     31428

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6998
           1       1.00      1.00      1.00      6472

    accuracy                           1.00     13470
   macro avg       1.00      1.00      1.00     13470
weighted avg       1.00      1.00      1.00     13470



In [66]:
print(f"Decision Tree Classifier TF IDF Train Score: {tf_idf_train_score_DTC}")
print(f"Decision Tree Classifier TF IDF Test Score: {tf_idf_test_score_DTC}")

Decision Tree Classifier TF IDF Train Score: 0.9999681812396589
Decision Tree Classifier TF IDF Test Score: 0.996362286562732


#### TF IDF with Stemming and Stopwords Score

In [67]:
stem_train_pred_DTC = stem_DTC.predict(x_train_stem)
stem_test_pred_DTC = stem_DTC.predict(x_test_stem)

In [68]:
stem_train_score_DTC = stem_DTC.score(x_train_stem, y_train_stem)
stem_test_score_DTC = stem_DTC.score(x_test_stem, y_test_stem)

In [69]:
print(classification_report(y_train_stem, stem_train_pred_DTC))
print(classification_report(y_test_stem, stem_test_pred_DTC))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     16483
           1       1.00      1.00      1.00     14945

    accuracy                           1.00     31428
   macro avg       1.00      1.00      1.00     31428
weighted avg       1.00      1.00      1.00     31428

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6998
           1       1.00      1.00      1.00      6472

    accuracy                           1.00     13470
   macro avg       1.00      1.00      1.00     13470
weighted avg       1.00      1.00      1.00     13470



In [70]:
print(f"Decision Tree Classifier Stemming, Stopwords, TF IDF Train Score: {stem_train_score_DTC}")
print(f"Decision Tree Classifier Stemming, Stopwords, TF IDF Test Score: {stem_test_score_DTC}")

Decision Tree Classifier Stemming, Stopwords, TF IDF Train Score: 0.9999681812396589
Decision Tree Classifier Stemming, Stopwords, TF IDF Test Score: 0.996362286562732


#### Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Score

In [71]:
full_train_pred_DTC = full_DTC.predict(x_train)
full_test_pred_DTC = full_DTC.predict(x_test)

In [72]:
full_train_score_DTC = full_DTC.score(x_train_stem, y_train_stem)
full_test_score_DTC = full_DTC.score(x_test_stem, y_test_stem)

In [73]:
print(classification_report(y_train_stem, stem_train_pred_DTC))
print(classification_report(y_test_stem, stem_test_pred_DTC))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     16483
           1       1.00      1.00      1.00     14945

    accuracy                           1.00     31428
   macro avg       1.00      1.00      1.00     31428
weighted avg       1.00      1.00      1.00     31428

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6998
           1       1.00      1.00      1.00      6472

    accuracy                           1.00     13470
   macro avg       1.00      1.00      1.00     13470
weighted avg       1.00      1.00      1.00     13470



In [74]:

print(f"Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score: {full_train_score_DTC}")
print(f"Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: {full_test_score_DTC}")

Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score: 0.9999681812396589
Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: 0.9962880475129918


#### Model Comparison

In [75]:
# Model 1: Decision Tree Classifier TF IDF Scores
print("Model 1: Decision Tree Classifier TF IDF Scores Scores:")
print(f"Train Score: {tf_idf_train_score_DTC}")
print(f"Test Score: {tf_idf_test_score_DTC}")

# Model 2: Decision Tree Classifier Stemming, Stopwords, TF IDF Scores
print("\nModel 2: Decision Tree Classifier Stemming, Stopwords, TF IDF Scores:")
print(f"Train Score: {stem_train_score_DTC}")
print(f"Test Score: {stem_test_score_DTC}")

# Model 3: Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Scores
print("\nModel 3: Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Scores:")
print(f"Train Score: {full_train_score_DTC}")
print(f"Test Score: {full_test_score_DTC}")

print("")

# Percentage Differences between Model 1 and Model 2 on train and test data
print("Percentage Differences between Model 1 and Model 2:")
train_score_diff_1_2 = tf_idf_train_score_DTC - stem_train_score_DTC
test_score_diff_1_2 = tf_idf_test_score_DTC - stem_test_score_DTC
train_score_percent_diff_1_2 = (train_score_diff_1_2 / tf_idf_train_score_DTC) * 100
test_score_percent_diff_1_2 = (test_score_diff_1_2 / tf_idf_test_score_DTC) * 100
print(f"Train Data: {train_score_diff_1_2} = {train_score_percent_diff_1_2:.2f}%")
print(f"Test Data: {test_score_diff_1_2} = {test_score_percent_diff_1_2:.2f}%")

# Percentage Differences between Model 1 and Model 3 on train and test data
print("\nPercentage Differences between Model 1 and Model 3:")
train_score_diff_1_3 = tf_idf_train_score_DTC - full_train_score_DTC
test_score_diff_1_3 = tf_idf_test_score_DTC - full_test_score_DTC
train_score_percent_diff_1_3 = (train_score_diff_1_3 / tf_idf_train_score_DTC) * 100
test_score_percent_diff_1_3 = (test_score_diff_1_3 / tf_idf_test_score_DTC) * 100
print(f"Train Data: {train_score_diff_1_3} = {train_score_percent_diff_1_3:.2f}%")
print(f"Test Data: {test_score_diff_1_3} = {test_score_percent_diff_1_3:.2f}%")

# Percentage Differences between Model 2 and Model 3 on train and test data
print("\nPercentage Differences between Model 2 and Model 3:")
train_score_diff_2_3 = stem_train_score_DTC - full_train_score_DTC
test_score_diff_2_3 = stem_test_score_DTC - full_test_score_DTC
train_score_percent_diff_2_3 = (train_score_diff_2_3 / stem_train_score_DTC) * 100
test_score_percent_diff_2_3 = (test_score_diff_2_3 / stem_test_score_DTC) * 100
print(f"Train Data: {train_score_diff_2_3} = {train_score_percent_diff_2_3:.2f}%")
print(f"Test Data: {test_score_diff_2_3} = {test_score_percent_diff_2_3:.2f}%")


Model 1: Decision Tree Classifier TF IDF Scores Scores:
Train Score: 0.9999681812396589
Test Score: 0.996362286562732

Model 2: Decision Tree Classifier Stemming, Stopwords, TF IDF Scores:
Train Score: 0.9999681812396589
Test Score: 0.996362286562732

Model 3: Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Scores:
Train Score: 0.9999681812396589
Test Score: 0.9962880475129918

Percentage Differences between Model 1 and Model 2:
Train Data: 0.0 = 0.00%
Test Data: 0.0 = 0.00%

Percentage Differences between Model 1 and Model 3:
Train Data: 0.0 = 0.00%
Test Data: 7.42390497401324e-05 = 0.01%

Percentage Differences between Model 2 and Model 3:
Train Data: 0.0 = 0.00%
Test Data: 7.42390497401324e-05 = 0.01%


### Decision Tree Classifier Hyperparameter Optimization

In [90]:
# Definisikan parameter grid
param_grid_DTC = {
    'dtc__max_depth': [None, 5, 10, 20, 30, 40, 50],
    'dtc__min_samples_split': [2, 5, 10, 15, 20, 25, 30],
    'dtc__min_samples_leaf': [1, 2, 3, 4, 5, 6, 7]
}


In [92]:
# Definisikan GridSearchCV
tf_idf_grid_search_DTC = GridSearchCV(estimator=tf_idf_DTC, param_grid=param_grid_DTC, cv=5, scoring='accuracy', n_jobs=4)

# Fit model
tf_idf_grid_search_DTC.fit(x_train, y_train)

In [93]:
# Definisikan GridSearchCV
stem_grid_search_DTC = GridSearchCV(estimator=stem_DTC, param_grid=param_grid_DTC, cv=5, scoring='accuracy', n_jobs=4)

# Fit model
stem_grid_search_DTC.fit(x_train_stem, y_train_stem)

In [95]:
# Definisikan GridSearchCV
full_grid_search_DTC = GridSearchCV(estimator=full_DTC, param_grid=param_grid_DTC, cv=5, scoring='accuracy', n_jobs=4)

# Fit model
full_grid_search_DTC.fit(x_train_stem, y_train_stem)

In [96]:
print(f"Best parameters for TF IDF Decision Tree Classifier: {tf_idf_grid_search_DTC.best_params_}")
print(f"Best parameters for Stem and Stopwords Decision Tree Classifier: {stem_grid_search_DTC.best_params_}")
print(f"Best parameters for Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler: {full_grid_search_DTC.best_params_}")

Best parameters for TF IDF Decision Tree Classifier: {'dtc__max_depth': None, 'dtc__min_samples_leaf': 1, 'dtc__min_samples_split': 2}
Best parameters for Stem and Stopwords Decision Tree Classifier: {'dtc__max_depth': None, 'dtc__min_samples_leaf': 1, 'dtc__min_samples_split': 2}
Best parameters for Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler: {'dtc__max_depth': None, 'dtc__min_samples_leaf': 1, 'dtc__min_samples_split': 2}


### Decision Tree Classifier Model Setelah Optimisasi

In [76]:
tf_idf_DTC.set_params(
    dtc__max_depth=None,
    dtc__min_samples_leaf=1,
    dtc__min_samples_split=2
)

optimized_tf_idf_DTC = tf_idf_DTC

stem_DTC.set_params(
    dtc__max_depth=None,
    dtc__min_samples_leaf=1,
    dtc__min_samples_split=2
)

optimized_stem_DTC = stem_DTC

full_DTC.set_params(
    dtc__max_depth=None,
    dtc__min_samples_leaf=1,
    dtc__min_samples_split=2
)

optimized_full_DTC = full_DTC

# optimized_tf_idf_DTC = tf_idf_grid_search_DTC.best_estimator_
# optimized_stem_DTC = stem_grid_search_DTC.best_estimator_
# optimized_full_DTC = full_grid_search_DTC.best_estimator_

#### TF IDF

In [77]:
optimized_tf_idf_train_pred_DTC = optimized_tf_idf_DTC.predict(x_train)
optimized_tf_idf_test_pred_DTC = optimized_tf_idf_DTC.predict(x_test)

In [78]:
optimized_tf_idf_train_score_DTC = optimized_tf_idf_DTC.score(x_train, y_train)

In [79]:
optimized_tf_idf_test_score_DTC = optimized_tf_idf_DTC.score(x_test, y_test)

In [80]:
print(classification_report(y_train, tf_idf_train_pred_lr))
print(classification_report(y_test, tf_idf_test_pred_lr))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99     16483
           1       0.99      0.99      0.99     14945

    accuracy                           0.99     31428
   macro avg       0.99      0.99      0.99     31428
weighted avg       0.99      0.99      0.99     31428

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6998
           1       0.99      0.99      0.99      6472

    accuracy                           0.99     13470
   macro avg       0.99      0.99      0.99     13470
weighted avg       0.99      0.99      0.99     13470



In [81]:
print(f"Optimized Decision Tree Classifier TF IDF Train Score: {optimized_tf_idf_train_score_DTC}")
print(f"Optimized Decision Tree Classifier TF IDF Test Score: {optimized_tf_idf_test_score_DTC}")

Optimized Decision Tree Classifier TF IDF Train Score: 0.9999681812396589
Optimized Decision Tree Classifier TF IDF Test Score: 0.996362286562732


#### TF IDF With Stemming & Stopwords

In [82]:
optimized_stem_train_pred_DTC = optimized_stem_DTC.predict(x_train_stem)
optimized_stem_test_pred_DTC = optimized_stem_DTC.predict(x_test_stem)

In [83]:
optimized_stem_train_score_DTC = optimized_stem_DTC.score(x_train_stem, y_train_stem)

In [84]:
optimized_stem_test_score_DTC = optimized_stem_DTC.score(x_test_stem, y_test_stem)

In [85]:
print(classification_report(y_train_stem, optimized_stem_train_pred_DTC))
print(classification_report(y_test_stem, optimized_stem_test_pred_DTC))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     16483
           1       1.00      1.00      1.00     14945

    accuracy                           1.00     31428
   macro avg       1.00      1.00      1.00     31428
weighted avg       1.00      1.00      1.00     31428

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6998
           1       1.00      1.00      1.00      6472

    accuracy                           1.00     13470
   macro avg       1.00      1.00      1.00     13470
weighted avg       1.00      1.00      1.00     13470



In [86]:
print(f"Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF Train Score: {optimized_stem_train_score_DTC}")
print(f"Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF Test Score: {optimized_stem_test_score_DTC}")

Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF Train Score: 0.9999681812396589
Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF Test Score: 0.996362286562732


#### TF IDF with Stemming and Stopwords 

In [87]:
optimized_full_train_pred_DTC = optimized_full_DTC.predict(x_train_stem)
optimized_full_test_pred_DTC = optimized_full_DTC.predict(x_test_stem)

In [88]:
optimized_full_train_score_DTC = optimized_full_DTC.score(x_train_stem, y_train_stem)

In [89]:
optimized_full_test_score_DTC = optimized_full_DTC.score(x_test_stem, y_test_stem)

In [90]:
print(classification_report(y_train_stem, optimized_full_train_pred_DTC))
print(classification_report(y_test_stem, optimized_full_test_pred_DTC))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     16483
           1       1.00      1.00      1.00     14945

    accuracy                           1.00     31428
   macro avg       1.00      1.00      1.00     31428
weighted avg       1.00      1.00      1.00     31428

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6998
           1       1.00      1.00      1.00      6472

    accuracy                           1.00     13470
   macro avg       1.00      1.00      1.00     13470
weighted avg       1.00      1.00      1.00     13470



In [91]:
print(f"Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score: {optimized_full_train_score_DTC}")
print(f"Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: {optimized_full_test_score_DTC}")

Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score: 0.9999681812396589
Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: 0.9962880475129918


### Score Differential

#### TF IDF

In [92]:
# Train vs Train Comparison for TF-IDF Logistic Regression
print("Train vs Train Comparison for TF-IDF Logistic Regression:")
print(f"TF-IDF Train Score vs Optimized Train Score: {tf_idf_train_score_DTC} vs {optimized_tf_idf_train_score_DTC}")
print("")

# Test vs Test Comparison for TF-IDF Logistic Regression
print("Test vs Test Comparison for TF-IDF Logistic Regression:")
print(f"TF-IDF Test Score vs Optimized Test Score: {tf_idf_test_score_DTC} vs {optimized_tf_idf_test_score_DTC}")
print("")

# Percentage Differences
print("Percentage Differences:")
print(f"Percentage Difference (Train Score): TF-IDF vs Optimized: {((optimized_tf_idf_train_score_DTC - tf_idf_train_score_DTC) / tf_idf_train_score_DTC) * 100:.2f}%")
print(f"Percentage Difference (Test Score): TF-IDF vs Optimized: {((optimized_tf_idf_test_score_DTC - tf_idf_test_score_DTC) / tf_idf_test_score_DTC) * 100:.2f}%")


Train vs Train Comparison for TF-IDF Logistic Regression:
TF-IDF Train Score vs Optimized Train Score: 0.9999681812396589 vs 0.9999681812396589

Test vs Test Comparison for TF-IDF Logistic Regression:
TF-IDF Test Score vs Optimized Test Score: 0.996362286562732 vs 0.996362286562732

Percentage Differences:
Percentage Difference (Train Score): TF-IDF vs Optimized: 0.00%
Percentage Difference (Test Score): TF-IDF vs Optimized: 0.00%


#### TF IDF With Stemming & Stopwords

In [93]:
# Train vs Train Comparison for Stemming, Stopwords, TF-IDF Logistic Regression
print("Train vs Train Comparison for Stemming, Stopwords, TF-IDF Logistic Regression:")
print(f"Stemming, Stopwords, TF-IDF Train Score vs Optimized Train Score: {stem_train_score_DTC} vs {optimized_stem_train_score_DTC}")
print("")

# Test vs Test Comparison for Stemming, Stopwords, TF-IDF Logistic Regression
print("Test vs Test Comparison for Stemming, Stopwords, TF-IDF Logistic Regression:")
print(f"Stemming, Stopwords, TF-IDF Test Score vs Optimized Test Score: {stem_test_score_DTC} vs {optimized_stem_test_score_DTC}")
print("")

# Percentage Differences
print("Percentage Differences:")
print(f"Percentage Difference (Train Score): Stemming, Stopwords, TF-IDF vs Optimized: {((optimized_stem_train_score_DTC - stem_train_score_DTC) / stem_train_score_DTC) * 100:.2f}%")
print(f"Percentage Difference (Test Score): Stemming, Stopwords, TF-IDF vs Optimized: {((optimized_stem_test_score_DTC - stem_test_score_DTC) / stem_test_score_DTC) * 100:.2f}%")


Train vs Train Comparison for Stemming, Stopwords, TF-IDF Logistic Regression:
Stemming, Stopwords, TF-IDF Train Score vs Optimized Train Score: 0.9999681812396589 vs 0.9999681812396589

Test vs Test Comparison for Stemming, Stopwords, TF-IDF Logistic Regression:
Stemming, Stopwords, TF-IDF Test Score vs Optimized Test Score: 0.996362286562732 vs 0.996362286562732

Percentage Differences:
Percentage Difference (Train Score): Stemming, Stopwords, TF-IDF vs Optimized: 0.00%
Percentage Difference (Test Score): Stemming, Stopwords, TF-IDF vs Optimized: 0.00%


#### Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler

In [94]:
# Train vs Train Comparison for Stemming, Stopwords, TF-IDF Logistic Regression
print("Train vs Train Comparison for Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Logistic Regression:")
print(f"Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score vs Optimized Train Score: {full_train_score_DTC} vs {optimized_full_train_score_DTC}")
print("")

# Test vs Test Comparison for Stemming, Stopwords, TF-IDF Logistic Regression
print("Test vs Test Comparison for Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Logistic Regression:")
print(f"Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: {full_test_score_DTC} vs {optimized_full_test_score_DTC}")
print("")

# Percentage Differences
print("Percentage Differences:")
print(f"Percentage Difference (Train Score): Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler vs Optimized: {((optimized_full_train_score_DTC - full_train_score_DTC) / full_train_score_DTC) * 100:.2f}%")
print(f"Percentage Difference (Test Score): Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler vs Optimized: {((optimized_full_test_score_DTC - full_test_score_DTC) / full_test_score_DTC) * 100:.2f}%")


Train vs Train Comparison for Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Logistic Regression:
Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Train Score vs Optimized Train Score: 0.9999681812396589 vs 0.9999681812396589

Test vs Test Comparison for Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Logistic Regression:
Logistic Regression Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Test Score: 0.9962880475129918 vs 0.9962880475129918

Percentage Differences:
Percentage Difference (Train Score): Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler vs Optimized: 0.00%
Percentage Difference (Test Score): Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler vs Optimized: 0.00%


#### Optimized Model Comparison

In [95]:
# Model 1: Optimized Decision Tree Classifier TF IDF Scores
print("Model 1 Scores:")
print(f"Train Score: {optimized_tf_idf_train_score_DTC}")
print(f"Test Score: {optimized_tf_idf_test_score_DTC}")

# Model 2: Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF Scores
print("\nModel 2 Scores:")
print(f"Train Score: {optimized_stem_train_score_DTC}")
print(f"Test Score: {optimized_stem_test_score_DTC}")

# Model 3: Optimized Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler Scores
print("\nModel 3 Scores:")
print(f"Train Score: {optimized_full_train_score_DTC}")
print(f"Test Score: {optimized_full_test_score_DTC}")

print("")

# Percentage Differences between Model 1 and Model 2 on train and test data
print("Percentage Differences between Model 1 and Model 2:")
train_score_diff_1_2 = optimized_tf_idf_train_score_DTC - optimized_stem_train_score_DTC
test_score_diff_1_2 = optimized_tf_idf_test_score_DTC - optimized_stem_test_score_DTC
train_score_percent_diff_1_2 = (train_score_diff_1_2 / optimized_tf_idf_train_score_DTC) * 100
test_score_percent_diff_1_2 = (test_score_diff_1_2 / optimized_tf_idf_test_score_DTC) * 100
print(f"Train Data: {train_score_diff_1_2} = {train_score_percent_diff_1_2:.2f}%")
print(f"Test Data: {test_score_diff_1_2} = {test_score_percent_diff_1_2:.2f}%")

# Percentage Differences between Model 1 and Model 3 on train and test data
print("\nPercentage Differences between Model 1 and Model 3:")
train_score_diff_1_3 = optimized_tf_idf_train_score_DTC - optimized_full_train_score_DTC
test_score_diff_1_3 = optimized_tf_idf_test_score_DTC - optimized_full_test_score_DTC
train_score_percent_diff_1_3 = (train_score_diff_1_3 / optimized_tf_idf_train_score_DTC) * 100
test_score_percent_diff_1_3 = (test_score_diff_1_3 / optimized_tf_idf_test_score_DTC) * 100
print(f"Train Data: {train_score_diff_1_3} = {train_score_percent_diff_1_3:.2f}%")
print(f"Test Data: {test_score_diff_1_3} = {test_score_percent_diff_1_3:.2f}%")

# Percentage Differences between Model 2 and Model 3 on train and test data
print("\nPercentage Differences between Model 2 and Model 3:")
train_score_diff_2_3 = optimized_stem_train_score_DTC - optimized_full_train_score_DTC
test_score_diff_2_3 = optimized_stem_test_score_DTC - optimized_full_test_score_DTC
train_score_percent_diff_2_3 = (train_score_diff_2_3 / optimized_stem_train_score_DTC) * 100
test_score_percent_diff_2_3 = (test_score_diff_2_3 / optimized_stem_test_score_DTC) * 100
print(f"Train Data: {train_score_diff_2_3} = {train_score_percent_diff_2_3:.2f}%")
print(f"Test Data: {test_score_diff_2_3} = {test_score_percent_diff_2_3:.2f}%")


Model 1 Scores:
Train Score: 0.9999681812396589
Test Score: 0.996362286562732

Model 2 Scores:
Train Score: 0.9999681812396589
Test Score: 0.996362286562732

Model 3 Scores:
Train Score: 0.9999681812396589
Test Score: 0.9962880475129918

Percentage Differences between Model 1 and Model 2:
Train Data: 0.0 = 0.00%
Test Data: 0.0 = 0.00%

Percentage Differences between Model 1 and Model 3:
Train Data: 0.0 = 0.00%
Test Data: 7.42390497401324e-05 = 0.01%

Percentage Differences between Model 2 and Model 3:
Train Data: 0.0 = 0.00%
Test Data: 7.42390497401324e-05 = 0.01%


### Decision Tree Classifier Models Ranking

In [179]:
models_train_DTC = [
    ('Decision Tree Classifier TF IDF', tf_idf_train_score_DTC, tf_idf_DTC),
    ('Decision Tree Classifier Stemming, Stopwords, TF IDF', stem_train_score_DTC, stem_DTC),
    ('Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler', full_train_score_DTC, full_DTC),
    ('Optimized Decision Tree Classifier TF IDF', optimized_tf_idf_train_score_DTC, optimized_tf_idf_DTC),
    ('Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF', optimized_stem_train_score_DTC, optimized_stem_DTC),
    ('Optimized Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler', optimized_full_train_score_DTC, optimized_full_DTC)
]

models_test_DTC = [
    ('Decision Tree Classifier TF IDF', tf_idf_test_score_DTC, tf_idf_DTC),
    ('Decision Tree Classifier Stemming, Stopwords, TF IDF', stem_test_score_DTC, stem_DTC),
    ('Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler', full_test_score_DTC, full_DTC),
    ('Optimized Decision Tree Classifier TF IDF', optimized_tf_idf_test_score_DTC, optimized_tf_idf_DTC),
    ('Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF', optimized_stem_test_score_DTC, optimized_stem_DTC),
    ('Optimized Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler', optimized_full_test_score_DTC, optimized_full_DTC)
]

models_train_sorted_DTC = sorted(models_train_DTC, key=lambda x: x[1], reverse=True)
models_test_sorted_DTC = sorted(models_test_DTC, key=lambda x: x[1], reverse=True)

print("\nModel Rankings based on Train Accuracy:")
for rank, (desc, score, model) in enumerate(models_train_sorted_DTC, start=1):
    print(f"{rank}. {desc} with accuracy {score:.5f}")

print("\nModel Rankings based on Test Accuracy:")
for rank, (desc, score, model) in enumerate(models_test_sorted_DTC, start=1):
    print(f"{rank}. {desc} with accuracy {score:.5f}")


Model Rankings based on Train Accuracy:
1. Decision Tree Classifier TF IDF with accuracy 0.99997
2. Decision Tree Classifier Stemming, Stopwords, TF IDF with accuracy 0.99997
3. Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler with accuracy 0.99997
4. Optimized Decision Tree Classifier TF IDF with accuracy 0.99997
5. Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF with accuracy 0.99997
6. Optimized Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler with accuracy 0.99997

Model Rankings based on Test Accuracy:
1. Decision Tree Classifier TF IDF with accuracy 0.99636
2. Decision Tree Classifier Stemming, Stopwords, TF IDF with accuracy 0.99636
3. Optimized Decision Tree Classifier TF IDF with accuracy 0.99636
4. Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF with accuracy 0.99636
5. Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler with accuracy 0.99629
6. Optimized Stemming, Stopwords, TF IDF, Class Balancin

## Models Ranking

In [183]:
models_train = models_train_LR + models_train_DTC
models_test = models_test_LR + models_test_DTC

models_train_sorted = sorted(models_train, key=lambda x: x[1], reverse=True)
models_test_sorted = sorted(models_test, key=lambda x: x[1], reverse=True)

print("\nModel Rankings based on Train Accuracy:")
for rank, (desc, score, model) in enumerate(models_train_sorted, start=1):
    print(f"{rank}. {desc} with accuracy {score:.5f}")

print("\nModel Rankings based on Test Accuracy:")
for rank, (desc, score, model) in enumerate(models_test_sorted, start=1):
    print(f"{rank}. {desc} with accuracy {score:.5f}")


Model Rankings based on Train Accuracy:
1. Full Model with accuracy 0.99997
2. Optimized Full Model with accuracy 0.99997
3. Decision Tree Classifier TF IDF with accuracy 0.99997
4. Decision Tree Classifier Stemming, Stopwords, TF IDF with accuracy 0.99997
5. Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler with accuracy 0.99997
6. Optimized Decision Tree Classifier TF IDF with accuracy 0.99997
7. Optimized Decision Tree Classifier Stemming, Stopwords, TF IDF with accuracy 0.99997
8. Optimized Stemming, Stopwords, TF IDF, Class Balancing, and Standard Scaler with accuracy 0.99997
9. TF IDF Model with accuracy 0.99322
10. Optimized TF IDF Model with accuracy 0.99322
11. Stem & Stopwords Model with accuracy 0.99160
12. Optimized Stem & Stopwords Model with accuracy 0.99160

Model Rankings based on Test Accuracy:
1. Decision Tree Classifier TF IDF with accuracy 0.99636
2. Decision Tree Classifier Stemming, Stopwords, TF IDF with accuracy 0.99636
3. Optimized Decision Tre

# Model Implementation

In [112]:
def output_label(n):
  if n==0:
    return "It Is Fake News"
  elif n==1:
    return "It Is Genuine News"


In [192]:
best_models = models_test_sorted[0]

In [193]:
print(f"Best Models\n{best_models[0]} -> {best_models[1]}")

Best Models
Decision Tree Classifier TF IDF -> 0.996362286562732


In [218]:
def manual_testing(news):
    testing_news = {"text": [news]}
    new_def_test = pd.DataFrame(testing_news)
    new_x_test = new_def_test["text"].apply(wordopt)

    # Model predictions
    pred = best_models[2].predict(new_x_test)

    return print(f"Text: {news}\nPrediction: {output_label(pred[0])}")


In [221]:
news_article = str(input())
manual_testing(news_article)

Text: WASHINGTON (Reuters) - The head of a conservative Republican faction in the U.S. Congress, who voted this month for a huge expansion of the national debt to pay for tax cuts, called himself a “fiscal conservative” on Sunday and urged budget restraint in 2018. In keeping with a sharp pivot under way among Republicans, U.S. Representative Mark Meadows, speaking on CBS’ “Face the Nation,” drew a hard line on federal spending, which lawmakers are bracing to do battle over in January. When they return from the holidays on Wednesday, lawmakers will begin trying to pass a federal budget in a fight likely to be linked to other issues, such as immigration policy, even as the November congressional election campaigns approach in which Republicans will seek to keep control of Congress. President Donald Trump and his Republicans want a big budget increase in military spending, while Democrats also want proportional increases for non-defense “discretionary” spending on programs that support e

In [222]:
news_article = str(input())
manual_testing(news_article)

Text: It almost seems like Donald Trump is trolling America at this point. In the beginning, when he tried to gaslight the country by insisting that the crowd at his inauguration was the biggest ever   or that it was even close to the last couple of inaugurations   we all kind of scratched our heads and wondered what kind of bullshit he was playing at. Then when he started appointing people to positions they had no business being in, we started to worry that this was going to go much worse than we had expected.After 11 months of Donald Trump pulling the rhetorical equivalent of whipping his dick out and slapping it on every table he gets near, I think it s time we address what s happening: Dude is a straight-up troll. He gets pleasure out of making other people uncomfortable or even seeing them in distress. He actively thinks up ways to piss off people he doesn t like.Let s set aside just for a moment the fact that that s the least presidential  behavior anyone s ever heard of   it s d