#Project Name: Fake News Detection Using NLP and Machine Learning

##Project Description
This project detects fake news articles using natural language processing (NLP) and machine learning. Each article is converted into a numerical vector with a pre-trained spaCy model, and two classifiers (Naive Bayes and K-Nearest Neighbors) are trained to distinguish real news from fake news.

##Dataset
The dataset (`Fake_Real_Data.csv`) contains 9,900 news articles (5,000 fake and 4,900 real) with the following columns:

* `Text`: The content of the news article.
* `label`: The classification of the news article as either "Real" or "Fake".

##Libraries and Tools
* pandas: For data manipulation and analysis.
* spaCy: For natural language processing tasks.
* NumPy: For stacking the document vectors into 2D arrays.
* scikit-learn: For machine learning algorithms and evaluation metrics.

##Project Workflow

1. Data Loading and Exploration:

* Load the dataset using pandas.
* Explore the dataset to understand its structure and content.

2. Text Vectorization:

* Use spaCy's pre-trained model (en_core_web_lg) to convert each article into a vector.

3. Data Preprocessing:

* Split the data into training and test sets (80/20).
* Stack the per-article vectors into 2D NumPy arrays.

4. Model Training and Evaluation:

* Normalize the vectors with a Min-Max scaler, since Multinomial Naive Bayes requires non-negative features, then train a Naive Bayes classifier and evaluate it with classification metrics.
* Train a K-Nearest Neighbors (KNN) classifier on the unscaled vectors and evaluate its performance.
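
The workflow above can be sketched end to end on synthetic data. This is a minimal illustration, not the notebook's actual run: the vectors are random stand-ins for spaCy output, 300 dimensions is assumed to match en_core_web_lg, and `make_pipeline` with `MinMaxScaler(clip=True)` is one possible way to bundle scaling and classification so the scaler is only ever fitted on training data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for document vectors (assumption: 300 dimensions,
# each class drawn around its own random mean). Not the real dataset.
rng = np.random.default_rng(0)
n_per_class, n_dims = 200, 300
mean_fake = rng.normal(0.0, 0.5, size=n_dims)
mean_real = rng.normal(0.0, 0.5, size=n_dims)
X = np.vstack([
    rng.normal(mean_fake, 1.0, size=(n_per_class, n_dims)),
    rng.normal(mean_real, 1.0, size=(n_per_class, n_dims)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = Fake, 1 = Real

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training split only and reuses it at
# prediction time; clip=True keeps test values inside [0, 1].
pipeline = make_pipeline(MinMaxScaler(clip=True), MultinomialNB())
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here says nothing about the real dataset; the sketch only shows how the steps connect.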

##1. Load Dataset

In [None]:
import pandas as pd


In [None]:
df = pd.read_csv('Fake_Real_Data.csv')
df.head()

|   | Text | label |
|---|---|---|
| 0 | Top Trump Surrogate BRUTALLY Stabs Him In The... | Fake |
| 1 | U.S. conservative leader optimistic of common ... | Real |
| 2 | Trump proposes U.S. tax overhaul, stirs concer... | Real |
| 3 | Court Forces Ohio To Allow Millions Of Illega... | Fake |
| 4 | Democrats say Trump agrees to work on immigrat... | Real |


In [None]:
df.shape

(9900, 2)

In [None]:
df.label.value_counts()

label
Fake    5000
Real    4900
Name: count, dtype: int64

In [None]:
df['label_num'] = df.label.map({'Real': 1, 'Fake': 0})
df.head()

|   | Text | label | label_num |
|---|---|---|---|
| 0 | Top Trump Surrogate BRUTALLY Stabs Him In The... | Fake | 0 |
| 1 | U.S. conservative leader optimistic of common ... | Real | 1 |
| 2 | Trump proposes U.S. tax overhaul, stirs concer... | Real | 1 |
| 3 | Court Forces Ohio To Allow Millions Of Illega... | Fake | 0 |
| 4 | Democrats say Trump agrees to work on immigrat... | Real | 1 |
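
`Series.map` with a dictionary returns `NaN` for any value that is not a key, so an unexpected label spelling would pass through silently. The sketch below uses a small hypothetical frame (the `toy` data and the lowercase `"fake"` entry are invented for illustration) to show one way to check for that after mapping.

```python
import pandas as pd

# Hypothetical frame with one label spelled differently from the dict keys.
toy = pd.DataFrame({"label": ["Real", "Fake", "fake", "Real"]})
label_to_num = {"Real": 1, "Fake": 0}
toy["label_num"] = toy["label"].map(label_to_num)

# Rows whose label was not found in the dictionary end up as NaN.
unmapped = toy[toy["label_num"].isna()]
print(unmapped)
```

On the real dataset, `df.label_num.isna().sum()` would serve the same purpose.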


##2. Text Vectorization

In [None]:
import spacy
# The model must be installed first: python -m spacy download en_core_web_lg
nlp = spacy.load('en_core_web_lg')



In [None]:
# Run each article through the spaCy pipeline and keep its document vector.
df['vector'] = df.Text.apply(lambda text: nlp(text).vector)
df.head()

|   | Text | label | label_num | vector |
|---|---|---|---|---|
| 0 | Top Trump Surrogate BRUTALLY Stabs Him In The... | Fake | 0 | [-0.6759837, 1.4263071, -2.318466, -0.451093, ... |
| 1 | U.S. conservative leader optimistic of common ... | Real | 1 | [-1.8355803, 1.3101058, -2.4919677, 1.0268308,... |
| 2 | Trump proposes U.S. tax overhaul, stirs concer... | Real | 1 | [-1.9851209, 0.14389805, -2.4221718, 0.9133005... |
| 3 | Court Forces Ohio To Allow Millions Of Illega... | Fake | 0 | [-2.7812982, -0.16120885, -1.609772, 1.3624227... |
| 4 | Democrats say Trump agrees to work on immigrat... | Real | 1 | [-2.2010763, 0.9961637, -2.4088492, 1.128273, ... |
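
For models that ship with word vectors, spaCy's `Doc.vector` defaults to the average of the token vectors, which is why every article maps to a single fixed-length vector regardless of its length. The sketch below reproduces that averaging with NumPy on a hypothetical 4-dimensional embedding table; the `toy_embeddings` values and the `document_vector` helper are invented for illustration and are not part of spaCy.

```python
import numpy as np

# Hypothetical 4-dimensional embedding table (the real model uses more dimensions).
toy_embeddings = {
    "court": np.array([1.0, 0.0, 2.0, 0.0]),
    "blocks": np.array([0.0, 1.0, 0.0, 1.0]),
    "ruling": np.array([2.0, 2.0, 1.0, 0.0]),
}


def document_vector(tokens, embeddings, n_dims=4):
    """Average the token vectors; tokens missing from the table count as zeros."""
    vectors = [embeddings.get(tok, np.zeros(n_dims)) for tok in tokens]
    if not vectors:
        return np.zeros(n_dims)
    return np.mean(vectors, axis=0)


doc_vec = document_vector(["court", "blocks", "ruling"], toy_embeddings)
print(doc_vec)
```

Averaging discards word order, so two articles with the same words in a different order get the same vector.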


##3. Data Preprocessing

In [None]:
from sklearn.model_selection import train_test_split
# Hold out 20% of the articles for testing; random_state=42 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(df.vector.values, df.label_num, test_size=0.2, random_state=42)
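
The split above is random, so the class proportions in the two sets can drift slightly apart. If matching proportions matter, `train_test_split` accepts a `stratify` argument. The sketch below is a possible variant rather than what the notebook does: it uses placeholder features and labels that mirror the reported class counts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels mirroring the reported class counts (5000 Fake = 0, 4900 Real = 1).
y = np.array([0] * 5000 + [1] * 4900)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features, not real vectors

# stratify=y keeps the Fake/Real ratio nearly identical in both splits.
_, _, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("train share of Real:", y_train_s.mean())
print("test share of Real: ", y_test_s.mean())
```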

In [None]:
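import numpy as np
# Each element of X_train is one article's vector; stack them into 2D arrays of shape (n_samples, n_features).
X_train_2d = np.stack(X_train)
X_test_2d = np.stack(X_test)
X_train_2d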

array([[-2.0152547 ,  0.98641217, -2.3584037 , ..., -1.0675061 ,
        -1.9569906 ,  0.93659127],
       [-1.0012243 ,  1.4828775 , -2.1455953 , ...,  0.3300781 ,
        -1.799691  ,  0.7323975 ],
       [-1.9412339 ,  1.0061342 , -1.2211968 , ..., -1.0381267 ,
        -1.6678015 ,  1.0008049 ],
       ...,
       [-1.1860418 ,  0.9153    , -2.4448311 , ..., -0.32731298,
        -2.97908   ,  1.1330711 ],
       [-2.0719678 ,  1.015092  , -2.2288282 , ..., -0.98319465,
        -1.6227347 ,  0.66743565],
       [-1.7736759 ,  0.6012506 , -1.835393  , ..., -0.8052035 ,
        -2.0403569 ,  0.6292755 ]], dtype=float32)

##4. Model Training and Evaluation

In [None]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report

# MultinomialNB requires non-negative features, so rescale each dimension to [0, 1].
scaler = MinMaxScaler()
X_train_2d_scale = scaler.fit_transform(X_train_2d)
# Reuse the scaler fitted on the training data; refitting on the test set would scale it differently.
X_test_2d_scale = scaler.transform(X_test_2d)

model = MultinomialNB()
model.fit(X_train_2d_scale, y_train)
y_pred = model.predict(X_test_2d_scale)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.94      0.96      0.95       973
           1       0.96      0.94      0.95      1007

    accuracy                           0.95      1980
   macro avg       0.95      0.95      0.95      1980
weighted avg       0.95      0.95      0.95      1980
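
Min-max scaling maps each feature to [0, 1] using the minimum and maximum seen when the scaler is fitted. The sketch below, on hypothetical one-feature data, shows how scaling a test set with the training-fitted scaler differs from fitting a new scaler on the test set itself.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical one-feature data: the test values cover a narrower range than train.
X_train_toy = np.array([[0.0], [5.0], [10.0]])
X_test_toy = np.array([[2.0], [4.0]])

scaler = MinMaxScaler()
scaler.fit(X_train_toy)                    # learns min=0, max=10 from train only
consistent = scaler.transform(X_test_toy)  # reuses the training range

refit = MinMaxScaler().fit_transform(X_test_toy)  # learns min=2, max=4 from test

print("train-fitted scaler:", consistent.ravel())
print("refit on test set:  ", refit.ravel())
```

With the training-fitted scaler, a value of 2.0 means the same thing in both sets. After refitting, the test set's smallest value is always mapped to 0 and its largest to 1, whatever they were originally.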



In [None]:
from sklearn.neighbors import KNeighborsClassifier

# KNN with the default n_neighbors=5, trained on the unscaled vectors.
model = KNeighborsClassifier()
model.fit(X_train_2d, y_train)
y_pred = model.predict(X_test_2d)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.99      0.99       973
           1       0.99      1.00      0.99      1007

    accuracy                           0.99      1980
   macro avg       0.99      0.99      0.99      1980
weighted avg       0.99      0.99      0.99      1980
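
The precision, recall, and F1 columns in the reports are derived from per-class counts of true positives, false positives, and false negatives. A short worked example follows; the counts are hypothetical and are not taken from the runs above.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts for a single class.
tp, fp, fn = 90, 10, 30
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The `support` column is the number of true instances of each class in the test set.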



##Results
* The Naive Bayes model achieved an accuracy of 95% on the 1,980-article test set.
* The KNN model achieved an accuracy of 99% on the same test set.

##Conclusion
The project demonstrates how NLP and machine learning techniques can be applied to fake news detection. On this dataset, the KNN model outperforms Naive Bayes on accuracy (99% vs. 95%) as well as on precision, recall, and F1-score for both classes.