First, read both CSV files and add labels:

Fake news → label 1
True news → label 0

In [2]:
import pandas as pd

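# Count physical lines (minus the header) to estimate the number of rows.
# Quoted fields that contain newlines make this an overestimate.
with open("Fake.csv") as f:
    total_rows = sum(1 for _ in f) - 1

# Read roughly the first half of the fake-news file
df_fake = pd.read_csv("Fake.csv", nrows=int(0.5 * total_rows))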
df_true = pd.read_csv("True.csv")

df_fake["label"] = 1  # Fake news
df_true["label"] = 0  # True news


# Combine both sets, shuffle the rows, and rebuild the index
df = (
    pd.concat([df_fake, df_true], axis=0)
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)


df

Unnamed: 0,title,text,subject,date,label
0,Kremlin says Russia not accused in U.S. case a...,MOSCOW (Reuters) - The Kremlin said on Tuesday...,worldnews,"October 31, 2017",0
1,ICC reports Jordan to U.N. Security Council fo...,AMSTERDAM (Reuters) - The International Crimin...,worldnews,"December 11, 2017",0
2,Supreme Court gives Trump more time to file tr...,WASHINGTON (Reuters) - The U.S. Supreme Court ...,politicsNews,"June 13, 2017",0
3,WATCH: What Trump Just Said About 9/11 Proves...,Sprinting towards a likely win in New York s R...,News,"April 18, 2016",1
4,Coal CEO pressing Trump to speak up for miners...,WASHINGTON (Reuters) - Coal mining executive R...,politicsNews,"December 9, 2016",0
...,...,...,...,...,...
33180,Senate revokes Obama federal land-planning rule,WASHINGTON (Reuters) - The U.S. Senate on Tues...,politicsNews,"March 7, 2017",0
33181,Hillary’s Message To Former Miss Universe Cal...,Miss Universe 1996 Alicia Machado is now an Am...,News,"May 20, 2016",1
33182,UNREAL! CBS’S TED KOPPEL Tells Sean Hannity He...,,politics,"Mar 27, 2017",1
33183,Trump Stole An Idea From North Korean Propaga...,Jesus f*cking Christ our President* is a moron...,News,"July 14, 2017",1
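
The row estimate in the cell above counts physical lines, which can differ from the number of CSV records when a quoted field contains line breaks. The following is a minimal sketch of that difference on a made-up two-record CSV string (the data and variable names are illustrative, not taken from Fake.csv):

```python
import csv
import io

# Made-up CSV in which the first record's text field spans two physical lines.
raw = 'title,text\n"Headline A","first line\nsecond line"\n"Headline B","single line"\n'

# Physical line count minus the header (the approach used in the cell above)
line_count = sum(1 for _ in io.StringIO(raw)) - 1

# Logical record count: csv.reader keeps a quoted newline inside one row
reader = csv.reader(io.StringIO(raw))
next(reader)  # skip the header row
row_count = sum(1 for _ in reader)

print(line_count, row_count)
```

If an exact 50% sample matters, one option is to read the whole file and slice it with `df.iloc[: len(df) // 2]`, at the cost of loading everything first.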


In [3]:
df.info()  # info() prints its summary directly and returns None
print(df.isnull().sum())  # Check for missing values


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33185 entries, 0 to 33184
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   title    33185 non-null  object
 1   text     33185 non-null  object
 2   subject  33185 non-null  object
 3   date     33185 non-null  object
 4   label    33185 non-null  int64 
dtypes: int64(1), object(4)
memory usage: 1.3+ MB
title      0
text       0
subject    0
date       0
label      0
dtype: int64
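
The null check reports zero missing values, yet row 33182 in the earlier preview has an empty text field. A likely explanation is that such fields hold blank strings rather than NaN. The sketch below uses a small made-up DataFrame (not the real data) to show how blank articles could be counted and dropped:

```python
import pandas as pd

# Toy stand-in for df; the notebook would apply this to the combined DataFrame.
toy = pd.DataFrame({
    "text": ["MOSCOW (Reuters) - ...", " ", "", "Some article body"],
    "label": [0, 1, 1, 0],
})

# isnull() only flags NaN/None, so blank strings go unnoticed.
n_null = int(toy["text"].isnull().sum())

# Strip whitespace and compare with the empty string to find blank articles.
is_blank = toy["text"].str.strip().eq("")
n_blank = int(is_blank.sum())

# Optionally drop the blank rows before vectorizing.
toy_nonblank = toy[~is_blank].reset_index(drop=True)

print(n_null, n_blank, len(toy_nonblank))
```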


In [4]:
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")
stop_words = set(stopwords.words("english"))

def clean_text(text):
    text = re.sub(r'\W', ' ', text)  # Remove special characters
    text = text.lower()  # Convert to lowercase
    text = " ".join([word for word in text.split() if word not in stop_words])  # Remove stopwords
    return text

df["clean_text"] = df["text"].astype(str).apply(clean_text)


[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/nikhilverma/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [5]:
df

Unnamed: 0,title,text,subject,date,label,clean_text
0,Kremlin says Russia not accused in U.S. case a...,MOSCOW (Reuters) - The Kremlin said on Tuesday...,worldnews,"October 31, 2017",0,moscow reuters kremlin said tuesday u charges ...
1,ICC reports Jordan to U.N. Security Council fo...,AMSTERDAM (Reuters) - The International Crimin...,worldnews,"December 11, 2017",0,amsterdam reuters international criminal court...
2,Supreme Court gives Trump more time to file tr...,WASHINGTON (Reuters) - The U.S. Supreme Court ...,politicsNews,"June 13, 2017",0,washington reuters u supreme court tuesday all...
3,WATCH: What Trump Just Said About 9/11 Proves...,Sprinting towards a likely win in New York s R...,News,"April 18, 2016",1,sprinting towards likely win new york republic...
4,Coal CEO pressing Trump to speak up for miners...,WASHINGTON (Reuters) - Coal mining executive R...,politicsNews,"December 9, 2016",0,washington reuters coal mining executive rober...
...,...,...,...,...,...,...
33180,Senate revokes Obama federal land-planning rule,WASHINGTON (Reuters) - The U.S. Senate on Tues...,politicsNews,"March 7, 2017",0,washington reuters u senate tuesday revoked ru...
33181,Hillary’s Message To Former Miss Universe Cal...,Miss Universe 1996 Alicia Machado is now an Am...,News,"May 20, 2016",1,miss universe 1996 alicia machado american cit...
33182,UNREAL! CBS’S TED KOPPEL Tells Sean Hannity He...,,politics,"Mar 27, 2017",1,
33183,Trump Stole An Idea From North Korean Propaga...,Jesus f*cking Christ our President* is a moron...,News,"July 14, 2017",1,jesus f cking christ president moron satisfied...
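
To see what the cleaning step does to a single string, here is a self-contained sketch. It assumes a tiny hand-picked stopword set in place of NLTK's full English list, and the function name `clean_text_demo` is illustrative:

```python
import re

# Tiny illustrative stopword set (an assumption; the notebook uses NLTK's list).
demo_stop_words = {"the", "on", "a", "is", "in", "of", "s"}

def clean_text_demo(text, stop_words=demo_stop_words):
    text = re.sub(r"\W", " ", text)  # replace each non-word character with a space
    text = text.lower()              # normalize case
    # Drop stopwords; split() also collapses the runs of spaces created above
    return " ".join(w for w in text.split() if w not in stop_words)

sample = "MOSCOW (Reuters) - The Kremlin said on Tuesday the U.S. charges..."
cleaned = clean_text_demo(sample)
print(cleaned)
```

Note that "U.S." is split into "u" and "s", and "s" is then removed as a stopword. This is why the cleaned column above shows a lone "u".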


In [6]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=50000)  # Keep the 50,000 most frequent terms
X = vectorizer.fit_transform(df["clean_text"]).toarray()
y = df["label"]


In [7]:
X

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], shape=(33185, 50000))
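
Nearly every entry in the matrix above is zero, and a dense float64 array of this shape is large. The sketch below uses a three-document toy corpus (made up for illustration) to compare the sparse matrix returned by `fit_transform` with its dense copy. It then estimates the dense size at the notebook's shape:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "moscow reuters kremlin said tuesday",
    "washington reuters supreme court tuesday",
    "sprinting towards likely win new york",
]

demo_vectorizer = TfidfVectorizer()
X_sparse = demo_vectorizer.fit_transform(docs)  # SciPy sparse matrix in CSR format

# Memory held by the sparse representation versus a dense copy
sparse_bytes = X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
dense_bytes = X_sparse.toarray().nbytes
print(X_sparse.shape, X_sparse.nnz, sparse_bytes, dense_bytes)

# Back-of-the-envelope dense size at the notebook's shape (float64 = 8 bytes)
full_dense_gb = 33185 * 50000 * 8 / 1e9
print(f"{full_dense_gb:.1f} GB")
```

`LogisticRegression` accepts SciPy sparse input, so dropping `.toarray()` is one way to reduce memory use. The results may differ slightly from the dense run.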

In [8]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Split dataset (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, y_pred))
 

Model Accuracy: 98.90%
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      4256
           1       0.99      0.97      0.98      2381

    accuracy                           0.99      6637
   macro avg       0.99      0.99      0.99      6637
weighted avg       0.99      0.99      0.99      6637
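
One caveat about the score above: the vectorizer was fitted on all rows before the split, so the test articles contributed to the vocabulary and IDF weights. The following is a hedged sketch of an alternative on a made-up toy corpus. It splits the raw text first and wraps both steps in a scikit-learn pipeline, so TF-IDF is fitted on the training portion only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy corpus (made up for illustration); 1 = fake, 0 = true.
texts = [
    "reuters senate passes budget bill tuesday",
    "reuters court rules on immigration appeal",
    "reuters minister announces trade talks",
    "reuters central bank holds interest rates",
    "watch shocking video destroys senator",
    "unreal host humiliates guest on live tv",
    "you will not believe what happened next",
    "busted leaked tape exposes secret plan",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Split the raw text first; stratify keeps the class ratio equal in both parts.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF on the training texts only, then the classifier.
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipe.fit(train_texts, y_train)

score = pipe.score(test_texts, y_test)  # accuracy on the held-out texts
print(score)
```

On eight sentences the accuracy itself is meaningless. The point is only the order of operations.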



In [9]:
# clean_text and stop_words are reused from the preprocessing cell above,
# so the input is cleaned exactly as the training data was.

def predict_news(news_text, vectorizer, model):
    """
    Predict whether a given news article is fake or true.

    Parameters:
    news_text (str): The input news text.
    vectorizer (TfidfVectorizer): The trained TF-IDF vectorizer.
    model (LogisticRegression): The trained logistic regression model.

    Returns:
    str: "Fake News" or "True News"
    """

    # Preprocess input text
    cleaned_text = clean_text(news_text)

    # Transform using the already trained vectorizer (IMPORTANT!)
    text_vector = vectorizer.transform([cleaned_text])  # Use .transform(), NOT .fit_transform()

    # Predict using the trained model
    prediction = model.predict(text_vector)[0]

    # Return the result
    return "Fake News" if prediction == 1 else "True News"


# Example usage:
news_sample = """These have been inventorised — at its core, air pollution is about combustion or the fuel we burn and the emissions from this. Vehicles are among the main sources — even though we have moved to cleaner fuel, the sheer number of vehicles on Indian roads is negating gains. There is also industrial pollution — much comes from burning coal which industry resorts to as the cost of electricity is high. Even though gas is available at lower prices in Delhi, for instance, industry is no longer based only in official industrial areas but has spread into non-authorised areas where the use of polluting fuel continues. Although coal has been banned in the entire National Capital Region (NCR), alternative fuel or natural gas isn’t under GST and hence, it is taxed heavily. Gas prices are thrice the cost of coal and given how electricity prices are also substantial, polluting fuels continue as industries see no alternative.

There is also the burning of waste — unlike other regions in India, where waste management practices have improved significantly, the NCR is failing to deal with the mountains of waste it generates. Consequently, whether it is landfill fires or smaller heaps being lit, waste burning is a major cause of air pollution now. Dust from construction, etc., also becomes a pollutant when solid particles from combustion are lifted into the wind, creating airborne particles which are inhaled. Household-related emissions are a cause as well — economically weaker sections in the outlying areas of large cities often have no option but to still use firewood, etc., to cook."""
result = predict_news(news_sample, vectorizer, model)
print(result)

True News
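
A hard label hides how confident the model is. The sketch below is a hypothetical companion helper, `predict_news_proba` (not part of the original notebook), that returns the estimated probability of the fake class. It is trained on a made-up toy corpus and skips the cleaning step for brevity:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def predict_news_proba(news_text, vectorizer, model):
    """Return the model's estimated probability that the text is fake (label 1)."""
    vec = vectorizer.transform([news_text])
    # predict_proba columns follow the order of model.classes_
    fake_idx = list(model.classes_).index(1)
    return float(model.predict_proba(vec)[0, fake_idx])

# Toy training data (made up for illustration); 1 = fake, 0 = true.
texts = [
    "reuters senate passes budget bill",
    "reuters court rules on appeal",
    "watch shocking video destroys senator",
    "unreal host humiliates guest",
]
labels = [0, 0, 1, 1]

demo_vectorizer = TfidfVectorizer()
demo_model = LogisticRegression().fit(demo_vectorizer.fit_transform(texts), labels)

p_true_like = predict_news_proba(texts[0], demo_vectorizer, demo_model)
p_fake_like = predict_news_proba(texts[2], demo_vectorizer, demo_model)
print(round(p_true_like, 3), round(p_fake_like, 3))
```

A probability close to 0.5 could be reported as "uncertain" rather than forced into one of the two labels.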
