# **Fake News Detection**
***

## **Introduction**


In the digital era, encountering deceptive or inaccurate content has become increasingly common. This has been particularly evident during events such as the ongoing COVID-19 pandemic, US elections, and recent military conflicts like Russia-Ukraine and Israel-Hamas. This type of misinformation, often called **"fake news"**, is intentionally fabricated or deceptive content presented as authentic news. It spreads through media outlets and social networks in ways that mimic how legitimate news is distributed.
Fake news is a far larger problem in today's digital landscape than it was under traditional news channels. In the past, news reached the public through a limited number of channels, mainly newspapers and television, where editorial fact-checking was typically stricter. Now that the internet has become a primary source of information, content is much easier to share, and the volume of unverified information has grown accordingly.

This surge in fake news is a serious problem because it can deceive and manipulate public opinion. False narratives can affect political processes and public health, and can even contribute to social unrest. The lack of strict editorial oversight and the rapid dissemination of information on digital platforms make it harder to distinguish genuine content from misleading content.

Fortunately, advancements in data science offer a promising solution to the problem of fake news. Today, data analytics tools and sophisticated algorithms can analyze vast amounts of data to detect patterns and anomalies that may indicate misinformation.  
My project aims to use these algorithms to build a predictive model that can determine whether a piece of news is **true** or **false**.  
The goal of developing such a tool is not only to improve the ability to identify fake news but also to contribute to larger efforts to combat the spread of misinformation.  
This project aligns with the growing need for technologically driven solutions to address the challenges posed by fake news in the contemporary information landscape.

## **Data Collection**

**Source**: kaggle.com  
**Link**: https://www.kaggle.com/datasets/c010104/fakenewsdetectiondataset?select=Fake.csv

In [6]:
# Import libraries for data manipulation
import pandas as pd

# Import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Import libraries for text preprocessing (NLP)
from wordcloud import WordCloud, STOPWORDS
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import re

# Import libraries for transformation (NLP)
from sklearn.feature_extraction.text import TfidfVectorizer, TfidfTransformer

# Import libraries for ML
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
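
The imports above point toward a TF-IDF plus logistic regression workflow. As a rough illustration of how those pieces can fit together, here is a minimal sketch on a handful of made-up headlines. The toy texts, the label encoding (1 = true, 0 = fake), and the use of `make_pipeline` are assumptions for illustration only, not the project's actual data or model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up headlines used only to illustrate the workflow (assumed encoding: 1 = true, 0 = fake)
toy_texts = [
    "government publishes annual budget report",
    "central bank holds interest rates steady",
    "city council approves new transit plan",
    "miracle pill cures every disease overnight",
    "celebrity secretly replaced by alien clone",
    "scientists admit the moon is a hologram",
]
toy_labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each text into a weighted term vector;
# logistic regression then learns a linear decision boundary over those vectors
model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
model.fit(toy_texts, toy_labels)

# Predicted class and class probabilities for each input text
toy_preds = model.predict(toy_texts)
toy_probs = model.predict_proba(toy_texts)
```

Wrapping both steps in one pipeline keeps the vectorizer's vocabulary tied to the training data, which helps avoid leaking information from a test split once real data is used.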

In [7]:
# Read the fake and true news datasets from CSV files
df_fake = pd.read_csv('Project1/Fake.csv')
df_true = pd.read_csv('Project1/True.csv')
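
With the two files loaded into separate frames, a common next step is to tag each frame with a class label and stack them into one dataset. The sketch below shows one way this might look. It uses small stand-in frames rather than the real CSVs, and the column names (`title`, `text`), the helper `label_and_combine`, and the label encoding are all hypothetical.

```python
import pandas as pd

# Stand-in frames with made-up rows; the real df_fake and df_true come from the CSV files
df_fake_demo = pd.DataFrame(
    {"title": ["Fake headline A", "Fake headline B"], "text": ["body a", "body b"]}
)
df_true_demo = pd.DataFrame({"title": ["True headline C"], "text": ["body c"]})


def label_and_combine(fake, true):
    """Tag each frame with a binary label and stack them into a single frame."""
    fake = fake.assign(label=0)  # assumed encoding: 0 = fake
    true = true.assign(label=1)  # assumed encoding: 1 = true
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat([fake, true], ignore_index=True)


df_demo = label_and_combine(df_fake_demo, df_true_demo)
```

Because `assign` returns a new frame, the original inputs are left unchanged, so the raw data stays available for inspection.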