# Fake News Detection Using Machine Learning  

## Introduction  
### Background  
Fake news has become a significant challenge in today's digital age, with misinformation spreading rapidly across social media and other platforms. Identifying and mitigating fake news is essential to ensuring the reliability of information available to the public.  

### Motivation  
News shapes public opinion, so automated detection of fake news can help limit the spread of misinformation. This project explores machine learning models for classifying news articles as real or fake.  

### Objectives  
- To preprocess a dataset containing real and fake news articles.  
- To implement machine learning models for classifying news articles as real or fake.  
- To evaluate the performance of the models using appropriate metrics.  


In [1]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np



## Dataset Description  

### Source  
The dataset used in this project is publicly available and was sourced from [Kaggle/Dataset Source Name]. It contains labeled fake and real news articles.  

### Features  
The dataset includes the following features:  
- **`title`**: The headline of the news article.  
- **`text`**: The main content of the news article.  
- **`subject`**: The category of the news article (e.g., `politicsNews`, `US_News`).  
- **`date`**: The publication date of the article, stored as text (e.g., `February 13, 2017`).  
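
Because the publication date is stored as text, any time-based analysis would first need it parsed. The following is a minimal sketch, assuming the "Month day, year" pattern with English month names seen in the sample rows holds for the values being parsed; `raw_dates` and `parsed_dates` are hypothetical names, and the last toy value is deliberately invalid to show the fallback.

```python
import pandas as pd

# Toy values in the style of the sample rows; the last one is deliberately invalid.
raw_dates = pd.Series(["February 13, 2017", "April 5, 2017 ", "not a date"])

# Strip stray whitespace, then parse "Month day, year".
# errors="coerce" turns anything that does not match into NaT instead of raising.
parsed_dates = pd.to_datetime(
    raw_dates.str.strip(), format="%B %d, %Y", errors="coerce"
)

print(parsed_dates)
```

Counting the `NaT` entries afterwards (`parsed_dates.isna().sum()`) is a quick way to see how many values do not follow the assumed pattern.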

### Target Variable  
The target variable is `label`, a column added during preprocessing that indicates whether the news article is:  
- **1**: Fake  
- **0**: Real  
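
With this encoding, fake news (1) is the natural positive class when the models are evaluated. The sketch below shows how accuracy, precision, recall, and F1 follow from that choice. It is an illustration only: `binary_metrics` is a hypothetical helper, the labels and predictions are made up, and the source does not state which metrics the project actually uses.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions (1 = fake, 0 = real), for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here precision answers "of the articles flagged as fake, how many were fake?" and recall answers "of the fake articles, how many were flagged?".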


### Preprocessing: Handling Separate Datasets  
The dataset consists of two separate files: one for fake news and another for real news. These will be loaded individually and combined into a single dataset for analysis.  

In [2]:
# Load the fake and true news datasets
fake_data = pd.read_csv('data/Fake.csv') 
true_data = pd.read_csv('data/True.csv')

# Add a label column to each dataset
fake_data['label'] = 1  # Label 1 for fake news
true_data['label'] = 0  # Label 0 for real news

# Combine the datasets
data = pd.concat([fake_data, true_data], axis=0).reset_index(drop=True)

# Shuffle the combined dataset to mix fake and real news
data = data.sample(frac=1, random_state=42).reset_index(drop=True)

# Display combined dataset info
print("Combined Dataset Shape:", data.shape)
print(data.head())


Combined Dataset Shape: (44898, 5)
                                               title  \
0  Ben Stein Calls Out 9th Circuit Court: Committ...   
1  Trump drops Steve Bannon from National Securit...   
2  Puerto Rico expects U.S. to lift Jones Act shi...   
3   OOPS: Trump Just Accidentally Confirmed He Le...   
4  Donald Trump heads for Scotland to reopen a go...   

                                                text       subject  \
0  21st Century Wire says Ben Stein, reputable pr...       US_News   
1  WASHINGTON (Reuters) - U.S. President Donald T...  politicsNews   
2  (Reuters) - Puerto Rico Governor Ricardo Rosse...  politicsNews   
3  On Monday, Donald Trump once again embarrassed...          News   
4  GLASGOW, Scotland (Reuters) - Most U.S. presid...  politicsNews   

                  date  label  
0    February 13, 2017      1  
1       April 5, 2017       0  
2  September 27, 2017       0  
3         May 22, 2017      1  
4       June 24, 2016       0  
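
After combining the files, it can be worth confirming that every row received a label and checking how balanced the two classes are. The sketch below repeats the label, concatenate, and shuffle steps on tiny made-up frames so the result is easy to inspect; `combine_labeled`, `toy_fake`, and `toy_true` are hypothetical names, not part of the notebook.

```python
import pandas as pd


def combine_labeled(fake_df, true_df, seed=42):
    """Label two frames (1 = fake, 0 = real), stack them, and shuffle the rows."""
    labeled = pd.concat(
        [fake_df.assign(label=1), true_df.assign(label=0)],
        ignore_index=True,
    )
    # frac=1 samples every row once, which amounts to a reproducible shuffle.
    return labeled.sample(frac=1, random_state=seed).reset_index(drop=True)


# Tiny made-up frames standing in for the fake and real news files.
toy_fake = pd.DataFrame({"title": ["f1", "f2"], "text": ["a", "b"]})
toy_true = pd.DataFrame({"title": ["t1", "t2", "t3"], "text": ["c", "d", "e"]})

toy = combine_labeled(toy_fake, toy_true)
print(toy.shape)
print(toy["label"].value_counts().sort_index())
```

Using `assign` returns labeled copies, so the input frames are left unchanged; applied to the real `data` frame, `data["label"].value_counts()` gives the class balance.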


### Data Cleaning  
To ensure the dataset is suitable for machine learning, the following cleaning steps will be performed:  
1. Check for and handle missing values.  
2. Remove duplicate rows.  
3. Drop unnecessary columns, if any.  
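
The three steps above can be sketched on a toy frame. This is a minimal illustration under stated assumptions: `clean_news` and its `drop_columns` parameter are hypothetical, dropping `date` is only an example of an "unnecessary" column, and treating whitespace-only text as missing is an extra assumption, since `isnull()` does not flag empty strings.

```python
import pandas as pd


def clean_news(df, drop_columns=()):
    """Apply the three cleaning steps and return a new frame."""
    cleaned = df.dropna()                                 # 1. missing values
    # Assumption: whitespace-only text also counts as missing.
    cleaned = cleaned[cleaned["text"].str.strip() != ""]
    cleaned = cleaned.drop_duplicates()                   # 2. duplicate rows
    cleaned = cleaned.drop(columns=list(drop_columns))    # 3. unneeded columns
    return cleaned.reset_index(drop=True)


# Made-up rows: one exact duplicate and one article with blank text.
toy = pd.DataFrame({
    "title": ["a", "a", "b", "c"],
    "text": ["x", "x", "   ", "z"],
    "date": ["d1", "d1", "d2", "d3"],
    "label": [1, 1, 0, 0],
})

cleaned = clean_news(toy, drop_columns=["date"])
print(cleaned)
```

The function returns a new frame rather than modifying its input, which keeps the original available for comparing shapes before and after cleaning.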

In [3]:
# Report the shape and per-column missing-value counts before cleaning
print("Dataset before cleaning:")
print("Shape:", data.shape)
print("Missing Values:\n", data.isnull().sum())

# Drop rows with missing values (if any)
data.dropna(inplace=True)

# Remove duplicate rows, then renumber the index so it has no gaps
data.drop_duplicates(inplace=True)
data.reset_index(drop=True, inplace=True)

# Display updated dataset info
print("Dataset after cleaning:")
print("Shape:", data.shape)
print(data.head())


Dataset before cleaning:
Shape: (44898, 5)
Missing Values:
 title      0
text       0
subject    0
date       0
label      0
dtype: int64
Dataset after cleaning:
Shape: (44689, 5)
                                               title  \
0  Ben Stein Calls Out 9th Circuit Court: Committ...   
1  Trump drops Steve Bannon from National Securit...   
2  Puerto Rico expects U.S. to lift Jones Act shi...   
3   OOPS: Trump Just Accidentally Confirmed He Le...   
4  Donald Trump heads for Scotland to reopen a go...   

                                                text       subject  \
0  21st Century Wire says Ben Stein, reputable pr...       US_News   
1  WASHINGTON (Reuters) - U.S. President Donald T...  politicsNews   
2  (Reuters) - Puerto Rico Governor Ricardo Rosse...  politicsNews   
3  On Monday, Donald Trump once again embarrassed...          News   
4  GLASGOW, Scotland (Reuters) - Most U.S. presid...  politicsNews   

                  date  label  
0    February 13, 2017      1  
1       April 5, 2017       0  
2  Septem