This project detects deepfake videos using a Vision Transformer (ViT) model, classifying frames as real or manipulated with high accuracy.
- Dataset Preparation
- Model Architecture
- Training Process
- Validation and Metrics
- Video Prediction
- Installation and Setup
- Results
- Website Usage
- Real Videos: `/DFD_original_sequences`
- Manipulated Videos: `/DFD_manipulated_sequences`
Extract frames at 1 frame per second for model input.
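
The 1 frame-per-second sampling can be sketched as follows. This is a minimal illustration, not the project's actual extraction script: it assumes OpenCV (`opencv-python`) is installed, and the names `frame_step`, `frame_indices_at_1fps`, and `extract_frames`, as well as the output file naming, are hypothetical.

```python
from pathlib import Path
from typing import List


def frame_step(fps: float) -> int:
    """Number of frames to advance to move roughly one second forward."""
    # Fall back to a step of 1 if the container reports no usable frame rate.
    return max(1, round(fps)) if fps > 0 else 1


def frame_indices_at_1fps(total_frames: int, fps: float) -> List[int]:
    """Indices of the frames kept when sampling one frame per second."""
    if total_frames <= 0:
        return []
    return list(range(0, total_frames, frame_step(fps)))


def extract_frames(video_path: str, out_dir: str) -> int:
    """Save one frame per second of `video_path` as JPEGs; return the count."""
    import cv2  # assumption: opencv-python is available

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    cap = cv2.VideoCapture(video_path)
    step = frame_step(cap.get(cv2.CAP_PROP_FPS))

    saved = 0
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or read error
            break
        if index % step == 0:
            name = f"{Path(video_path).stem}_{saved:05d}.jpg"  # hypothetical naming
            cv2.imwrite(str(out / name), frame)
            saved += 1
        index += 1

    cap.release()
    return saved
```

Reading frames sequentially and keeping every `step`-th one avoids seeking, which can be imprecise for some codecs. Running `extract_frames` once per video in each of the two dataset folders would yield separate real and manipulated frame sets.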
- Base Model: ViT (`vit_base_patch16_224`)
- Input Size: 224x224 pixels
- Classes: 2 (Real, Manipulated)
- Pretrained Weights: Yes (ImageNet)
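
The configuration above could be instantiated roughly as shown below. This is a sketch under the assumption that the model is loaded through the `timm` library, where `vit_base_patch16_224` is a registered model name; the constants and the `build_model` helper are hypothetical names introduced for illustration.

```python
MODEL_NAME = "vit_base_patch16_224"
INPUT_SIZE = 224                       # pixels per side, from the list above
PATCH_SIZE = 16                        # implied by "patch16" in the model name
CLASS_NAMES = ("Real", "Manipulated")  # the two output classes


def build_model(pretrained: bool = True):
    """Create a ViT-Base classifier with a 2-class head."""
    import timm  # assumption: the project uses timm (with PyTorch)

    # num_classes replaces the ImageNet head with a freshly initialised
    # linear layer sized for the Real/Manipulated labels.
    return timm.create_model(
        MODEL_NAME,
        pretrained=pretrained,
        num_classes=len(CLASS_NAMES),
    )
```

With these settings each 224x224 frame is split into a 14x14 grid of 16x16 patches. Calling `build_model()` downloads the pretrained weights on first use; passing `pretrained=False` skips the download, which is convenient for quick shape checks.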



