# Fake News Prediction

This notebook demonstrates how to use the pre-trained Fake News Detection model (`CNN.model`) and the corresponding TF-IDF vectorizer (`vectorizer.joblib`) to predict whether new, unseen news articles are real or fake.

It requires the `vectorizer.joblib` and `CNN.model` files generated by the `Fake news detection.ipynb` notebook to be present in the same directory.
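
A quick pre-flight check can catch a missing file before any loading is attempted. The following is a minimal sketch, assuming both artifacts sit in the current working directory; the helper name `missing_artifacts` is hypothetical and not part of either notebook.

```python
from pathlib import Path

# File names taken from the requirement stated above
REQUIRED_FILES = ["vectorizer.joblib", "CNN.model"]

def missing_artifacts(directory="."):
    """Return the required artifact names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).exists()]

missing = missing_artifacts()
if missing:
    print("Missing files, run the training notebook first:", missing)
else:
    print("All required files found.")
```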

In [1]:

from joblib import load

## Preprocessing Setup

It's essential to preprocess the new text data in exactly the same way as the training data. This section imports necessary libraries (NLTK, string) and defines the set of stopwords and punctuation to be removed.

In [2]:
import string
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = set(stopwords.words('english'))
punctuation = list(string.punctuation)
stop.update(punctuation)

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\amro7\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


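Define the `clean_text` function. It converts text to lowercase, splits it on whitespace, and drops every token that is a stopword or a standalone punctuation character from the set defined above. Punctuation attached to a word (for example, `said.`) is left in place, because the comparison is made on whole tokens. This function must be identical to the one used during training.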

In [3]:
def clean_text(text):
    text = text.lower()
    text = text.split()
    text = [word for word in text if word not in stop]
    text = " ".join(text)
    return text
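
To see what this cleaning step does and does not remove, here is a small self-contained sketch. It assumes a tiny hand-picked stopword set (`demo_stop`) in place of NLTK's English list so that it runs without a download; `demo_clean_text` is a hypothetical copy of the logic above, not the notebook's own function.

```python
import string

# Assumption: a few illustrative stopwords stand in for NLTK's full English list
demo_stop = {"the", "is", "a", "of"} | set(string.punctuation)

def demo_clean_text(text):
    # Same steps as clean_text: lowercase, split on whitespace, filter whole tokens
    tokens = text.lower().split()
    return " ".join(token for token in tokens if token not in demo_stop)

sample = "The report is a summary of the probe , officials said."
cleaned = demo_clean_text(sample)
print(cleaned)  # report summary probe officials said.
```

The standalone comma is dropped, while the full stop in `said.` survives because it is part of a larger token.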

## Load Pre-trained Model and Vectorizer

Load the saved TF-IDF vectorizer and the trained Keras neural network model from the files created by the training notebook.

In [4]:
vectorizer = load('vectorizer.joblib') 
loaded_model = load('CNN.model')


Keras model archive loading:
File Name                                             Modified             Size
config.json                                    2023-01-25 03:58:50         2692
metadata.json                                  2023-01-25 03:58:50           64
variables.h5                                   2023-01-25 03:58:50      2272352
Keras weights file (<HDF5 file "variables.h5" (mode r)>) loading:
...layers\dense
......vars
.........0
.........1
...layers\dense_1
......vars
.........0
.........1
...layers\dense_2
......vars
.........0
.........1
...layers\dense_3
......vars
.........0
.........1
...layers\dense_4
......vars
.........0
.........1
...metrics\mean
......vars
.........0
.........1
...metrics\mean_metric_wrapper
......vars
.........0
.........1
...optimizer
......vars
.........0
.........1
.........10
.........11
.........12
.........13
.........14
.........15
.........16
.........17
.........18
.........19
.........2
.........20
.........3
.........4
........

## Input Articles for Prediction

Define a list containing sample news articles to test the prediction model. You can replace the text within the triple quotes (`"""..."""`) with any other articles you want to classify.

In [1]:
articles= ["""A judge in Georgia will soon decide whether to release a grand jury report on ex-President Donald Trump's efforts to overturn his 2020 election loss.

The report contains the findings of an eight-month criminal probe into Mr Trump's pressure campaign to challenge his narrow defeat in the state.

The grand jury, dissolved two weeks ago, did not have indictment powers but may have recommended charges.

No former president has been indicted in US history.

Fulton County District Attorney Fani Willis, who received the body's findings two weeks ago, will appear in court on Tuesday and likely call for the report's full or partial release.

Ms Willis convened the 26-member grand jury in January 2022 to investigate the attempts to reverse Mr Trump's 11,779-vote loss to Joe Biden as well as the efforts to send an "alternate" slate of Republican presidential electors from the state.

Among the potential crimes it looked into were solicitation of election fraud, making false statements to government officials and racketeering.

That includes an infamous phone call in January 2021 between Mr Trump and Georgia Secretary of State Brad Raffensperger, in which the then-president said he wanted "to find 11,780 votes, which is one more than we have".

Mr Trump has described the investigation as a "strictly political witch hunt" and has repeatedly characterised the call as "perfect".

Ahead of Tuesday's hearing, he insisted on his Truth Social platform that his Democratic opponents had cheated to win, writing: "Many people, including lawyers for both sides, were knowingly on the line. I was protesting a RIGGED & STOLEN Election, which evidence proves it was."

Attorneys for Mr Trump have said they "will not be present nor participating in Tuesday's hearing".

"To date, we have never been a part of this process. The grand jury compelled the testimony of dozens of other, often high-ranking, officials during the investigation, but never found it important to speak with the President," they wrote in a emailed statement to CBS News, the BBC's US partner.

But potential targets of the investigation may appear in the Fulton County courtroom.

That includes a group of 16 Georgia Republicans who participated in the alternate elector scheme, and Mr Trump's one-time personal attorney Rudy Giuliani.""",
"""
A viral post falsely claimed the climate activist being held by police was "all set up for the cameras".

Ms Thunberg and other activists were seeking to stop the abandoned village of Lützerath from being demolished for the expansion of a coal mine.

The video of her being removed by police has gained millions of views.

"We would never give ourselves to make such recordings," a spokesperson for local police told the BBC, denying allegations that Ms Thunberg's detainment was fake.

But it is important that the police enable reporting and guarantee the protection of media workers, they added.

The viral video shows the climate campaigner flanked by police officers on either side.

Meanwhile a few photographers can be seen snapping photos and moving around her, as Ms Thunberg smiles.

Several other police officers who were also standing nearby appear to be waiting with her before walking her away from the scene.

Some online have jumped onto these moments of officers and Ms Thunberg waiting around, to falsely claim that it is part of a staged photo opportunity.

However the interior ministry of the western state of North Rhine-Westphalia told the BBC that the police officers and Ms Thunberg were waiting for logistical reasons.

"They had to wait for a couple of minutes before they could bring her to a certain police car," said the spokesperson.

They added that "the whole situation has been used by those with political motives and the real reason is entirely practical and mundane."

A Twitter post falsely claiming that the detainment of Greta Thunberg was fake.
IMAGE SOURCE,TWITTER/SCREENSHOT
Image caption,
The viral post with the video of Greta Thunberg and police officers at the protest
Christian Wernicke, a journalist from German news outlet Süddeutsche Zeitung who was there at the time, said the police officers "were deciding how they would proceed with the identity check and waiting to take Greta to the police vehicle."

"My impression was that there was confusion. Greta was not the first protester who had been taken away from the sit-in," Mr Wernicke added.

"I've seen different reactions to the video. Some say that the footage looks like the police are setting her up to embarrass her and others say that it is all part of some propaganda.

"People are interpreting and using this footage for their own motives."

Many online also falsely claimed it was a "fake arrest" but police clarified that Ms Thunberg had not been arrested but had been briefly detained.

Greta Thunberg detained at German coal protest
Thunberg joins 'Pinky' and 'Brain' tunnel protest
The group of activists were detained after they "rushed towards the ledge" of the Garzweiler 2 mine, police had said on Tuesday.

Officers also confirmed all of those detained would not be charged.

Ms Thunberg has frequently been the target of conspiracy theories and false claims online, often by those who deny the existence of man-made climate change.

She tweeted: "Yesterday I was part of a group that peacefully protested the expansion of a coal mine in Germany. We were kettled by police and then detained but were let go later that evening.

"Climate protection is not a crime"
"""]

In [6]:
articles = [clean_text(text) for text in articles]
print(len(articles))

2


## Vectorize and Predict

1.  Use the **loaded** TF-IDF vectorizer's `.transform()` method (note: *not* `.fit_transform()`) to convert the cleaned text into numerical vectors based on the vocabulary learned during training.
2.  Use the **loaded** Keras model's `.predict()` method to get the probability of each article being 'Real' news (category 1).

In [8]:
X = vectorizer.transform(articles).toarray()
y_pred = loaded_model.predict(X)

print(y_pred)

[[7.3692799e-01]
 [4.2051708e-04]]
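
The reason for calling `.transform()` rather than `.fit_transform()` can be illustrated with a toy corpus. This is a sketch under stated assumptions: the documents below are made up for the demonstration, and a freshly fitted `TfidfVectorizer` with default settings stands in for the saved one, whose actual parameters are not shown in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the training corpus
train_docs = [
    "judge releases grand jury report",
    "viral post claims arrest staged",
    "police confirm activists detained",
]
new_docs = ["police release jury report"]

# Fit once on the training data, as the training notebook would have done
demo_vectorizer = TfidfVectorizer()
demo_vectorizer.fit(train_docs)

# transform() reuses the learned vocabulary, so the column count matches training
X_new = demo_vectorizer.transform(new_docs)

# fit_transform() on the new text learns a different, smaller vocabulary
X_refit = TfidfVectorizer().fit_transform(new_docs)

print("training vocabulary size:", len(demo_vectorizer.vocabulary_))
print("transform() shape:", X_new.shape)
print("fit_transform() shape:", X_refit.shape)
```

A model trained on the first feature layout cannot accept the second, because its input layer expects one column per training vocabulary term.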


## Interpretation of Results

The output above shows the predicted probabilities for each article in the input `articles` list.
*   A value close to **1.0** suggests the model classifies the article as **Real**.
*   A value close to **0.0** suggests the model classifies the article as **Fake**.

You can set a threshold (e.g., 0.5 or the 0.7 used in the training notebook's evaluation) to make a final binary classification if needed. For example:
```python
# Example classification based on a threshold
threshold = 0.7
# y_pred has shape (n_articles, 1); ravel() flattens it so each p is a scalar
final_predictions = ["Real" if p > threshold else "Fake" for p in y_pred.ravel()]
print(final_predictions)