## Fake News Classification
Description:
This project builds a machine learning model that classifies news articles as either true or fake. A dataset of news articles and their labels (true or fake) is used to train and evaluate three machine learning algorithms: Decision Tree Classifier, Random Forest Classifier, and Logistic Regression. The article text is converted to token-count vectors with CountVectorizer before it is passed to the algorithms. The project also includes a graphical user interface (GUI) built with the Tkinter library, which lets users enter new text and classify it as true or fake with the trained Logistic Regression model. Together, these pieces demonstrate a complete text-classification workflow and a practical application for identifying fake news.

In [2]:
import pandas as pd

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

## Data preprocessing
The code reads a CSV file containing news articles and their labels. The 'id', 'title', and 'author' columns are dropped from the DataFrame, and CountVectorizer converts the 'text' column into a sparse matrix of token counts. The vectorized text and the labels are then split into training (80%) and testing (20%) sets using train_test_split.
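To make the vectorization step concrete, here is a minimal sketch on a hypothetical two-sentence corpus (the sentences and the `toy_` names are illustrative and not part of the project data). It shows how CountVectorizer turns each document into a row of token counts, one column per vocabulary word.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical two-document corpus, used only to illustrate the transformation.
toy_docs = ["fake news spreads fast", "real news spreads slowly"]

toy_vectorizer = CountVectorizer()
toy_counts = toy_vectorizer.fit_transform(toy_docs)  # sparse matrix: documents x vocabulary

# Column indices follow the alphabetically sorted vocabulary.
vocabulary = sorted(toy_vectorizer.vocabulary_)
dense_counts = toy_counts.toarray()

print(vocabulary)
print(dense_counts)
```

Each row has one entry per vocabulary word, which is why the real dataset produces a very wide matrix: every distinct token in the corpus becomes its own column.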

In [3]:
data = pd.read_csv("train.csv")
print(data.head())

   id                                              title              author  \
0   0  House Dem Aide: We Didn’t Even See Comey’s Let...       Darrell Lucus   
1   1  FLYNN: Hillary Clinton, Big Woman on Campus - ...     Daniel J. Flynn   
2   2                  Why the Truth Might Get You Fired  Consortiumnews.com   
3   3  15 Civilians Killed In Single US Airstrike Hav...     Jessica Purkiss   
4   4  Iranian woman jailed for fictional unpublished...      Howard Portnoy   

                                                text  label  
0  House Dem Aide: We Didn’t Even See Comey’s Let...      1  
1  Ever get the feeling your life circles the rou...      0  
2  Why the Truth Might Get You Fired October 29, ...      1  
3  Videos 15 Civilians Killed In Single US Airstr...      1  
4  Print \nAn Iranian woman has been sentenced to...      1  


In [4]:
# Drop all unused columns in one call. Calling data.drop() once per column
# and reassigning df each time keeps only the last drop ('author').
df = data.drop(['id', 'title', 'author'], axis=1)
print(df.head())

                                                text  label  
0  House Dem Aide: We Didn’t Even See Comey’s Let...      1  
1  Ever get the feeling your life circles the rou...      0  
2  Why the Truth Might Get You Fired October 29, ...      1  
3  Videos 15 Civilians Killed In Single US Airstr...      1  
4  Print \nAn Iranian woman has been sentenced to...      1  


In [5]:
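# Create a CountVectorizer for the 'text' column
vectorizer_text = CountVectorizer()

# Cast the column to Unicode strings, then learn the vocabulary and count tokens
X_text = vectorizer_text.fit_transform(df['text'].astype('U').values)

X = X_text
y = df['label']

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape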

(16640, 180445)

In [6]:
X_test.shape

(4160, 180445)
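The two shapes above follow from `test_size=0.2`: of the 20,800 rows, 4,160 go to the test set and 16,640 to the training set. The sketch below reproduces that arithmetic on hypothetical stand-in data of 10 rows. The `stratify` argument is an addition for illustration and is not used in the notebook; it keeps the class proportions equal in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples with 2 features, 5 samples per class.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

# test_size=0.2 holds out 2 of the 10 rows; stratify=y_demo (an assumption,
# not in the notebook) preserves the 50/50 class balance in each split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
```

Whether stratification matters here depends on how balanced the labels are; the support counts in the classification reports (2132 vs. 2028) suggest the unstratified split is already close to balanced.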

## Machine learning model building and evaluation
Three machine learning algorithms are trained on the training set and evaluated on the testing set: Decision Tree Classifier, Random Forest Classifier, and Logistic Regression. For each algorithm, the accuracy and the classification report are printed to the console.
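The same train-and-evaluate pattern can be written as a loop over candidate models. The sketch below is one possible arrangement, not the notebook's own code: it uses a hypothetical miniature dataset (0 = true, 1 = fake) and wraps each classifier in a scikit-learn pipeline together with CountVectorizer. One difference from the notebook: a pipeline fits the vectorizer on the training texts only, whereas the notebook fits it on all rows before splitting.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical miniature dataset (0 = true, 1 = fake), for illustration only.
train_texts = [
    "senate passes budget bill after long debate",
    "city council approves new transit plan",
    "scientists publish peer reviewed climate study",
    "shocking miracle cure doctors hate revealed",
    "aliens secretly control world governments claims insider",
    "you will not believe this one weird trick",
]
train_labels = [0, 0, 0, 1, 1, 1]
test_texts = ["council passes new budget plan", "shocking weird trick revealed"]
test_labels = [0, 1]

candidate_models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in candidate_models.items():
    # The pipeline learns the vocabulary from the training texts,
    # then reuses it when transforming the test texts.
    pipeline = make_pipeline(CountVectorizer(), model)
    pipeline.fit(train_texts, train_labels)
    scores[name] = accuracy_score(test_labels, pipeline.predict(test_texts))

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Scores on a dataset this small say nothing about real performance; the point is only the structure, which keeps the three evaluations identical apart from the model.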

In [7]:
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9


In [8]:
print(classification_report(y_test , y_pred))

              precision    recall  f1-score   support

           0       0.90      0.90      0.90      2132
           1       0.90      0.90      0.90      2028

    accuracy                           0.90      4160
   macro avg       0.90      0.90      0.90      4160
weighted avg       0.90      0.90      0.90      4160



In [9]:
rfc = RandomForestClassifier()
rfc.fit(X_train , y_train)
pred_rfc = rfc.predict(X_test)

accuracy = accuracy_score(y_test, pred_rfc)
print("Accuracy:", accuracy)

Accuracy: 0.9117788461538462


In [10]:
print(classification_report(y_test , pred_rfc))

              precision    recall  f1-score   support

           0       0.89      0.94      0.92      2132
           1       0.94      0.88      0.91      2028

    accuracy                           0.91      4160
   macro avg       0.91      0.91      0.91      4160
weighted avg       0.91      0.91      0.91      4160



In [11]:
lr = LogisticRegression()
lr.fit(X_train , y_train)
pred_lr = lr.predict(X_test)

accuracy = accuracy_score(y_test, pred_lr)
print("Accuracy:", accuracy)

Accuracy: 0.9622596153846154


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [12]:
print(classification_report(y_test , pred_lr))

              precision    recall  f1-score   support

           0       0.97      0.96      0.96      2132
           1       0.96      0.97      0.96      2028

    accuracy                           0.96      4160
   macro avg       0.96      0.96      0.96      4160
weighted avg       0.96      0.96      0.96      4160



## GUI
The GUI is built with the Tkinter library. The user enters new text into a text box and clicks a button to classify it as either true or fake using the trained Logistic Regression classifier. If the text is classified as fake, a warning message box is displayed. If the text is classified as true, an info message box is displayed.
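The decision logic behind the button can also be kept in a plain function, which makes it testable without opening a window. The sketch below is a hypothetical refactoring, not the notebook's code: `describe_prediction` and the toy training data are illustrative names and values, and the label convention (0 = true, anything else = fake) follows the GUI code in this section.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def describe_prediction(text, vectorizer, model):
    """Return the message to show for `text`, or None if the input is blank."""
    if not text.strip():
        return None
    label = model.predict(vectorizer.transform([text]))[0]
    return "This news is true." if label == 0 else "This news is fake."


# Hypothetical toy model so the function can be exercised on its own.
demo_texts = [
    "council approves new budget",
    "senate passes health bill",
    "miracle cure shocks doctors",
    "aliens built the pyramids",
]
demo_labels = [0, 0, 1, 1]
demo_vectorizer = CountVectorizer()
demo_model = LogisticRegression(max_iter=1000)
demo_model.fit(demo_vectorizer.fit_transform(demo_texts), demo_labels)

print(describe_prediction("   ", demo_vectorizer, demo_model))
print(describe_prediction("senate approves budget", demo_vectorizer, demo_model))
```

A Tkinter callback could then call this function and choose `messagebox.showinfo` or `messagebox.showwarning` based on the returned message.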

In [13]:
import tkinter as tk
from tkinter import messagebox


# Create the GUI
window = tk.Tk()
window.title("Fake News")

# Define a function to classify the input text
def classify():
    # Read the whole text box, excluding the trailing newline Tkinter appends
    input_text = text_input.get("1.0", "end-1c")

    # Treat empty or whitespace-only input as missing
    if not input_text.strip():
        messagebox.showwarning("Warning", "Please enter some text.")
    else:
        # Vectorize the input text and predict its label
        X_input_text = vectorizer_text.transform([input_text])
        y_pred = lr.predict(X_input_text)

        # Label 0 is reported as true news; any other label is reported as fake
        if y_pred[0] == 0:
            messagebox.showinfo("Result", "This news is true.")
        else:
            messagebox.showwarning("Result", "This news is fake.")

# Create the input text box and the classify button
text_label = tk.Label(window, text="Enter the news text:")
text_label.pack()
text_input = tk.Text(window, height=30)
text_input.pack()
classify_button = tk.Button(window, text="Classify", command = classify)
classify_button.pack()

# Run the GUI
window.mainloop()