This project implements a multimodal classification model for detecting fake news using the Fakeddit dataset. The model combines BERT for text processing and ResNet for image analysis, leveraging both textual and visual features to improve classification accuracy. By integrating these modalities, the system aims to provide a robust solution for identifying fake news in diverse content formats.
- Multimodal approach combining text and image analysis.
- Utilizes BERT for text processing and ResNet for image feature extraction (a minimal fusion sketch appears after this list).
- Trained and evaluated on the Fakeddit dataset.
- Supports fine-tuning for specific use cases.
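
To make the architecture concrete, here is a minimal sketch of how a BERT text encoder and a ResNet image encoder can be fused into a single classifier. It assumes PyTorch, Hugging Face `transformers`, and `torchvision`; the ResNet-50 variant, layer sizes, and concatenation-based fusion are illustrative assumptions, not necessarily what `train_model.ipynb` uses.

```python
# Illustrative BERT + ResNet fusion classifier.
# The real model is defined in train_model.ipynb and may differ in fusion strategy and sizes.
import torch
import torch.nn as nn
from transformers import BertModel
from torchvision import models

class FakeNewsClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])  # drop the final fc layer
        # BERT pooled output is 768-dim; ResNet-50 global features are 2048-dim.
        self.classifier = nn.Sequential(
            nn.Linear(768 + 2048, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, input_ids, attention_mask, images):
        text_feat = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        img_feat = self.resnet(images).flatten(1)
        fused = torch.cat([text_feat, img_feat], dim=1)
        return self.classifier(fused)
```

Concatenating the pooled text embedding with the global image features is the simplest late-fusion strategy; attention-based fusion is another common choice.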
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/fake-news-classification.git
  cd fake-news-classification
  ```
- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate
  ```
- Download the Fakeddit dataset and place it in the `data/` directory. Ensure the dataset is preprocessed as required by the model.
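
The preprocessing the model expects is defined in `train_model.ipynb`; as a hedged sketch, pairing the TSV text with the downloaded images might look like the following. The column names (`clean_title`, `id`, `2_way_label`) and the `data/images` path are assumptions based on the Fakeddit release, not guarantees about this repository.

```python
# Illustrative Dataset pairing Fakeddit TSV rows with downloaded images.
# Column names and paths are assumptions; adjust them to the actual files.
import os
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
from transformers import BertTokenizer

class FakedditDataset(Dataset):
    def __init__(self, tsv_path: str, image_dir: str = "data/images"):
        self.df = pd.read_csv(tsv_path, sep="\t")
        self.image_dir = image_dir
        self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        enc = self.tokenizer(
            str(row["clean_title"]), truncation=True, padding="max_length",
            max_length=128, return_tensors="pt",
        )
        image = Image.open(os.path.join(self.image_dir, f"{row['id']}.jpg")).convert("RGB")
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "image": self.transform(image),
            "label": torch.tensor(int(row["2_way_label"])),
        }
```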
- Download the images:

  ```bash
  python get_data.py --tsv_path multimodal_train.tsv
  ```
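
`get_data.py` is provided by the repository, so the command above is normally all you need; if you have to adapt it, the core of such a download script is roughly the loop below. The `image_url` and `id` column names and the default `--image_dir` value are assumptions, not the script's actual interface.

```python
# Rough sketch of downloading the images referenced by a Fakeddit TSV file.
# Column names and defaults are assumptions; see get_data.py for the real logic.
import argparse
import os
import pandas as pd
import requests

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--tsv_path", required=True)
    parser.add_argument("--image_dir", default="images")
    args = parser.parse_args()

    os.makedirs(args.image_dir, exist_ok=True)
    df = pd.read_csv(args.tsv_path, sep="\t")
    for _, row in df.iterrows():
        url = row.get("image_url")
        if not isinstance(url, str) or not url.startswith("http"):
            continue  # skip rows without a usable image URL
        out_path = os.path.join(args.image_dir, f"{row['id']}.jpg")
        if os.path.exists(out_path):
            continue  # already downloaded
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            with open(out_path, "wb") as f:
                f.write(resp.content)
        except requests.RequestException:
            pass  # skip broken or unreachable links

if __name__ == "__main__":
    main()
```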
- Train the model: run all the cells in the train_model.ipynb notebook.
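
The actual training code lives in the notebook; as a condensed, illustrative outline of one fine-tuning epoch over the fused model, something like the loop below. The hyperparameters, batch size, checkpoint name, and the `FakeNewsClassifier` / `FakedditDataset` names come from the sketches above, not from the notebook itself.

```python
# Condensed, illustrative training loop; the project's real training is in train_model.ipynb.
import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FakeNewsClassifier(num_classes=2).to(device)   # from the fusion sketch above
train_ds = FakedditDataset("multimodal_train.tsv")     # from the dataset sketch above
loader = DataLoader(train_ds, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for batch in loader:
    optimizer.zero_grad()
    logits = model(
        batch["input_ids"].to(device),
        batch["attention_mask"].to(device),
        batch["image"].to(device),
    )
    loss = criterion(logits, batch["label"].to(device))
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "model.pt")  # checkpoint name is an assumption
```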
- Download the validation images:

  ```bash
  python get_data.py --tsv_path multimodal_validate.tsv --output_csv val_output.csv --image_dir val_images
  ```
- Validate the model:

  ```bash
  python validate.py
  ```
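
`validate.py` contains the project's evaluation logic and reports the metrics listed below; a minimal version of that evaluation using scikit-learn could look like this. The `model.pt` checkpoint name, the pairing of `multimodal_validate.tsv` with `val_images`, and the reuse of the sketches above are assumptions.

```python
# Minimal, illustrative evaluation loop; validate.py holds the project's real logic.
import torch
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score, precision_score, recall_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FakeNewsClassifier(num_classes=2).to(device)                  # from the fusion sketch above
model.load_state_dict(torch.load("model.pt", map_location=device))   # checkpoint name is an assumption
model.eval()

val_ds = FakedditDataset("multimodal_validate.tsv", image_dir="val_images")
loader = DataLoader(val_ds, batch_size=32)

preds, labels = [], []
with torch.no_grad():
    for batch in loader:
        logits = model(
            batch["input_ids"].to(device),
            batch["attention_mask"].to(device),
            batch["image"].to(device),
        )
        preds.extend(logits.argmax(dim=1).cpu().tolist())
        labels.extend(batch["label"].tolist())

print("Accuracy :", accuracy_score(labels, preds))
print("Precision:", precision_score(labels, preds))
print("Recall   :", recall_score(labels, preds))
```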
The model achieves the following performance metrics on the Fakeddit dataset:
- Accuracy: 85%
- Precision: 85%
- Recall: 85%
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Commit your changes and push the branch.
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.