Skip to content

HayatiYrtgl/Fake_News_Detection_using_sklearn_pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 

Repository files navigation

This Python code is a machine learning pipeline for classifying news articles as fake or real. Let's break down the analysis step by step:

  1. Importing Libraries: The code begins by importing necessary libraries including pandas, numpy, and scikit-learn modules.

  2. Read Data: It reads a CSV file containing news data into a pandas DataFrame.

  3. Data Cleaning:

    • It checks for null values in the DataFrame.
    • It replaces empty strings in the 'text' column with NaN values.
    • It drops rows with NaN values.
  4. Data Exploration:

    • It visualizes the distribution of labels ('fake' or 'real') using seaborn.
    • It prints the counts of each label.
  5. Train-Test Split: It splits the data into training and testing sets using 70% for training and 30% for testing.

  6. Pipeline Creation:

    • It creates a machine learning pipeline consisting of two steps:
      • TF-IDF Vectorization: Converts text data into numerical features using TF-IDF (Term Frequency-Inverse Document Frequency).
      • Linear Support Vector Classification (LinearSVC): A linear SVM classifier is used for text classification.
  7. Model Training: It fits the pipeline to the training data.

  8. Prediction: It predicts the labels for the test data.

  9. Evaluation:

    • It computes confusion matrix and prints it.
    • It prints a classification report containing precision, recall, F1-score, and support for each class.

Overall, this code performs text classification on news articles using TF-IDF features and a LinearSVC classifier, and evaluates the model's performance using metrics such as precision, recall, and F1-score.