# Dataset Description

This dataset is designed for fake news classification and contains two CSV files: `True.csv` for articles labeled as true and `Fake.csv` for articles labeled as fake. Both files share the same structure, with columns for `title`, `text`, `subject`, and `date`. It consists of approximately 45,000 news articles in total, with around 23,500 labeled as fake and 21,400 as true, making it a substantial resource for training and evaluating machine learning models. The dataset was created by Clément Bisaillon and is publicly available on Kaggle. You can download it from [here](https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset/data).

## Improting required libraries

In [89]:
import pandas as pd
import string
import joblib
import re 
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

### Read the both datasets

In [90]:
data_fake = pd.read_csv('Fake.csv')
data_true = pd.read_csv('True.csv')

### Adding the target/response variable column to both datasets

In [91]:
data_fake['label'] = 0
data_true['label'] = 1

### Display the first 5 rows of both datasets

In [92]:
data_true.head()

Unnamed: 0,title,text,subject,date,label
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017",1
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017",1
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017",1
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017",1
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017",1


In [93]:
data_fake.head()

Unnamed: 0,title,text,subject,date,label
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0


### Checking the shape of the dataset

In [94]:
data_fake.shape, data_true.shape

((23481, 5), (21417, 5))

### Checking the information of the both dataset (Column Name, Non-Null Count, DataType)

In [95]:
data_fake.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23481 entries, 0 to 23480
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   title    23481 non-null  object
 1   text     23481 non-null  object
 2   subject  23481 non-null  object
 3   date     23481 non-null  object
 4   label    23481 non-null  int64 
dtypes: int64(1), object(4)
memory usage: 917.4+ KB


In [96]:
data_true.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21417 entries, 0 to 21416
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   title    21417 non-null  object
 1   text     21417 non-null  object
 2   subject  21417 non-null  object
 3   date     21417 non-null  object
 4   label    21417 non-null  int64 
dtypes: int64(1), object(4)
memory usage: 836.7+ KB


### Mergeing the both datasets into one for further processing

In [97]:
data = pd.concat([data_true, data_fake], axis=0, ignore_index=True)
data.shape

(44898, 5)

### Taking 5 sample from the dataset

In [98]:
data.sample(5)

Unnamed: 0,title,text,subject,date,label
12289,"Arms supplied by U.S., Saudi ended up with Isl...",BAGHDAD (Reuters) - Arms provided by the Unite...,worldnews,"December 14, 2017",1
32130,Liberal #BillMaher Uses The “N-word” on Live T...,"On Friday night s episode of HBO s Real Time, ...",politics,"Jun 3, 2017",0
3607,Trump budget cuts may stir backlash in rural A...,WASHINGTON (Reuters) - President Donald Trump’...,politicsNews,"May 23, 2017",1
5363,Trump's defense chief visits UAE in first Midd...,ABU DHABI - U.S. President Donald Trump’s defe...,politicsNews,"February 18, 2017",1
9490,U.S. extends overtime pay to 4.2 million salar...,(Reuters) - The Obama administration on Tuesda...,politicsNews,"May 18, 2016",1


In [99]:
data.head(-5)

Unnamed: 0,title,text,subject,date,label
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017",1
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017",1
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017",1
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017",1
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017",1
...,...,...,...,...,...
44888,Seven Iranians freed in the prisoner swap have...,"21st Century Wire says This week, the historic...",Middle-east,"January 20, 2016",0
44889,#Hashtag Hell & The Fake Left,By Dady Chery and Gilbert MercierAll writers ...,Middle-east,"January 19, 2016",0
44890,Astroturfing: Journalist Reveals Brainwashing ...,Vic Bishop Waking TimesOur reality is carefull...,Middle-east,"January 19, 2016",0
44891,The New American Century: An Era of Fraud,Paul Craig RobertsIn the last years of the 20t...,Middle-east,"January 19, 2016",0


### Dropping the 'subject' & 'date' column from the dataset

In [100]:
data = data.drop(['date', 'subject'], axis=1)

In [101]:
data.columns

Index(['title', 'text', 'label'], dtype='object')

In [102]:
def regularExp(text):
    # Convert to lowercase
    text = text.lower()
    # Remove text within square brackets
    text = re.sub(r'\[.*?\]', '', text)
    # Remove non-word characters (fixed the \W replacement)
    text = re.sub(r'\W', ' ', text)
    # Remove URLs
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    # Remove HTML tags
    text = re.sub(r'<.*?>', '', text)
    # Remove punctuation
    text = re.sub(f'[{re.escape(string.punctuation)}]', '', text)
    # Remove newlines
    text = re.sub(r'\n', '', text)
    # Remove words containing numbers
    text = re.sub(r'\w*\d\w*', '', text)
    # Remove extra spaces
    text = re.sub(r'\s+', ' ', text).strip()
    return text

### Apply text cleaning to title and text columns

In [103]:
data['title'] = data['title'].apply(regularExp)
data['text'] = data['text'].apply(regularExp)

### Combine title and text columns into a single feature

In [104]:
data['combined'] = data['title'] + " " + data['text']

##### Setting the response variable 'y' and explanatory variables 'x' also splitting the dataset into training and testing sets

In [105]:
x = data['combined']
y = data['label']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

### Converting Text Data to Numerical Features Using TfidfVectorizer

- **TfidfVectorizer** is used to convert text data into numerical features based on the Term Frequency-Inverse Document Frequency (TF-IDF) representation.

In [106]:
vect = TfidfVectorizer()
train_x = vect.fit_transform(x_train)
test_x = vect.transform(x_test)

## Using various classification algorithms to test the model performance

- **Multinomial Naive Bayes**

In [107]:
MNB = MultinomialNB()
MNB.fit(train_x, y_train)
pred_MNB = MNB.predict(test_x)
print("Accuracy of Multinomial Naive Bayes:", accuracy_score(y_test, pred_MNB))

Accuracy of Multinomial Naive Bayes: 0.9367483296213809


- **Logistic Regression**

In [108]:
LR = LogisticRegression()
LR.fit(train_x, y_train)
pred_LR = LR.predict(test_x)
print("Accuracy of Logistic Regression:", accuracy_score(y_test, pred_LR))

Accuracy of Logistic Regression: 0.988641425389755


- **Random Forest Classifier**

In [109]:
RFC = RandomForestClassifier()
RFC.fit(train_x, y_train)
pred_RFC = RFC.predict(test_x)
print("Accuracy of Random Forest Classifier:", accuracy_score(y_test, pred_RFC))

Accuracy of Random Forest Classifier: 0.9917594654788419


- **Decision Tree Classifier**

In [110]:
DTC = DecisionTreeClassifier()
DTC.fit(train_x, y_train)
pred_DTC = DTC.predict(test_x)
print("Accuracy of Decision Tree Classifier:", accuracy_score(y_test, pred_DTC))

Accuracy of Decision Tree Classifier: 0.994988864142539


In [111]:
XGB = XGBClassifier()
XGB.fit(train_x, y_train)
pred_XGB = XGB.predict(test_x)
print("Accuracy of XGBoost Classifier:", accuracy_score(y_test, pred_XGB))

Accuracy of XGBoost Classifier: 0.9971046770601336


### Identifying model is Overfitting or not. Testing Decision Tree Classifier and XGB model performance because it is giving the best accuracy score and using it forther.
- If training accuracy is very high, but testing accuracy is significantly lower then it is overfitting if not then good to go.

In [112]:
train_accuracy = DTC.score(train_x, y_train)
test_accuracy = DTC.score(test_x, y_test)

print("Training Accuracy:", train_accuracy)
print("Testing Accuracy:", test_accuracy)

Training Accuracy: 1.0
Testing Accuracy: 0.994988864142539


In [113]:
train_accuracy = XGB.score(train_x, y_train)
test_accuracy = XGB.score(test_x, y_test)

print("Training Accuracy:", train_accuracy)
print("Testing Accuracy:", test_accuracy)

Training Accuracy: 0.9999721588061696
Testing Accuracy: 0.9971046770601336


In [114]:
joblib.dump(vect, open('vect.joblib', 'wb'))
joblib.dump(XGB, open('DTC.joblib', 'wb'))

In [115]:
load_vector = joblib.load(open('vect.joblib', 'rb'))
load_model = joblib.load(open('DTC.joblib', 'rb'))

In [116]:
def output_label(n):
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Not A Fake News"

In [117]:
def manual_testing(news):
    # Create a DataFrame with the input news
    testing_news = {"text": [news]}
    new_def_test = pd.DataFrame(testing_news)

    # Apply preprocessing function 'regularExp' to the text
    new_def_test["text"] = new_def_test["text"].apply(regularExp)

    # Extract the text for vectorization
    new_x_test = new_def_test["text"]

    # Transform the text using the loaded vectorizer
    new_xv_test = load_vector.transform(new_x_test)

    # Prediction using the loaded Decision Tree Classifier model
    prediction = load_model.predict(new_xv_test)
    probabilities = load_model.predict_proba(new_xv_test)
    confidence_score = max(probabilities[0]) * 100

    # Display the prediction
    print("\nPrediction: {}".format(output_label(prediction[0])))
    print("Confidence Score: {:.2f}%".format(confidence_score))

In [118]:
news = str(input())
manual_testing("BOGOTA (Reuters) - Pope Francis arrived in Colombia on Wednesday with a message of unity for a nation deeply divided by a peace deal that ended a five-decade war with Marxist FARC rebels but left many victims of the bloodshed wary of the fraught healing process. Francis, making his 20th foreign trip since becoming pontiff in 2013 and his fifth to his native Latin America, started his visit in Colombian capital Bogota. He will travel later in the week to the cities of Villavicencio, Medellin and Cartagena. Greeted at the airport by President Juan Manuel Santos as attendees waved white handkerchiefs, the Argentine pope hopes his presence will help build bridges in a nation torn apart by bitter feuding over a peace accord with the Revolutionary Armed Forces of Colombia (FARC). Speaking to reporters on the Bogota-bound plane, Francis said the trip was  a bit special because it is being made to help Colombia go forward on its path to peace.  Francis will encourage reconciliation as Colombians prepare to receive 7,000 former FARC fighters into society and repair divisions after a war that killed more than 220,000 people and displaced millions over five decades. References to the recent peace deal were immediate. A teenage boy, born in 2004 to vice presidential candidate Clara Rojas when she was held captive in the jungle by the FARC, handed Francis a white porcelain dove as a welcome present. On his drive to the Vatican Embassy in central Bogota, the leader of the world s Roman Catholics was mobbed in the  pope mobile  by screaming crowds tossing flowers and holding up children to be kissed.  Peace is what Colombia has been seeking for a long time and is working to achieve,  the pope said in a video message ahead of his arrival.  A stable, lasting peace, so that we see and treat each other as brothers, never as enemies.       The FARC, which began as a peasant revolt in 1964 and battled more than a dozen governments, has formed a political party and now hopes to use words instead of weapons to effect changes in Colombia s social and economic model. But many Colombians are furious that the 2016 peace deal with the government granted fighters amnesty and some will be rewarded with seats in congress. A referendum on the deal last year was narrowly rejected, before being later modified and passed by congress. Trumpet players, singing children and white-clad rappers greeted the pope - wearing a traditional woolen poncho - at the embassy where he urged young people to  keep smiling  and then led the crowd in the Hail Mary prayer.  Don t let anyone steal your hope,  he said. People lined up all day to see the pope pass by, queues stretched around the cathedral in Bogota as residents sought passes for his events, and street vendors sold t-shirts, baseball caps and posters carrying Francis s image.  Pope Francis coming to Colombia has to unite the people. We cannot continue to be polarized. We must learn to live in peace and respect our differences,  Lucia Camargo, a pensioner, said as she lined up for a glimpse of the pontiff. Although most church leaders have voiced support for the accord, some politicians and Catholic bishops have criticized the deal for being too lenient on the guerrillas. The pope is expected to urge them to set aside their differences.  The visit will leave us a sense of union, of forgiveness,   Bogota Mayor Enrique Penalosa told Reuters.  Colombia is very polarized at the moment. There are many passions, many hatreds.  Reconciliation will be the emphasis for events on Friday in the city of Villavicencio, south of Bogota, where the pope will listen to testimonials from people whose lives were affected by the violence and then deliver a homily. Victims and former rebels who demobilized prior to the accord will attend. The pope will not meet FARC leaders or the opposition. He also had a message of dialogue and forgiveness for neighboring Venezuela, wracked by months of protests against President Nicolas Maduro, who has tightened his hold on power as an economic crisis has escalated. As his plane flew over the socialist nation, the pope sent  cordial greetings  in a telegram to Maduro and Venezuelans.   Praying that all in the nation may promote paths of solidarity, justice and harmony, I willingly invoke upon all of you God s blessings of peace,  he said.")


Prediction: Not A Fake News
Confidence Score: 99.99%
