GitHub - Nagamedha/CSC8370_Data_Security

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Code		Code
Output		Output
Group5_DS Final Project Report.pdf		Group5_DS Final Project Report.pdf
Readme		Readme

Repository files navigation

CSC8370 Data Security Project: README
Project Overview
This project implements machine learning solutions for MNIST handwritten digit classification. The project is divided into three levels, each with increasing complexity:
• Level 1: Centralized Learning
• Level 2: Federated Learning
• Level 3: Robust Federated Learning with Malicious Client Detection
The models are evaluated on accuracy, and output files are generated to facilitate further analysis.

Team Members
• Nagamedha Sakhamuri – 002828574 – nsakhamuri1@student.gsu.edu
• Pranav Chowdary Nagothu – 002852538 – pnagothu1@student.gsu.edu
Contributions: Both team members collaborated equally on all levels, including coding, testing, and documentation.

Project Structure
The project consists of three Python scripts:
1 Level1_Final.ipynb: Implements centralized learning for MNIST classification.
2 Level2_Final.ipynb: Implements federated learning with 10 clients.
3 Level3_Final.ipynb: Implements federated learning with malicious client detection and robust aggregation.

Requirements
The project requires the following Python libraries:
• torch
• torchvision
• matplotlib
• random
• copy
• google.colab (for file downloads in Colab)
Install missing libraries using:
!pip install torch torchvision matplotlib

Instructions
Setup
1 Open Google Colab.
2 Upload the desired level script (Level1_Final.ipynb, Level2_Final.ipynb, or Level3_Final.ipynb).
3 Run the script step by step or all at once.
4 Ensure the runtime type is set to GPU for faster execution:
◦ Navigate to Runtime > Change runtime type.
◦ Set Hardware Accelerator to GPU.

Level 1: Centralized Learning
• Overview: All MNIST data is centralized, and a Convolutional Neural Network (CNN) is trained to classify the digits.
• Execution:
1 Open Level1_Final.ipynb in Google Colab.
2 Run the script to train the model.
• Output Files:
◦ mnist_training.log: Logs training and testing accuracy for each epoch.
◦ best_mnist_model.pth: Stores the best-performing model's parameters.
◦ mnist_performance_graph.png: Visualizes accuracy over epochs.

Level 2: Federated Learning
• Overview: Implements federated learning with 10 clients, each holding a partition of the MNIST dataset. Models are trained locally and aggregated on a central server.
• Execution:
1 Open Level1_Final.ipynb in Google Colab.
2 Run the script to simulate federated learning.
• Output Files:
◦ level2_training.log: Logs accuracy values for each global epoch.
◦ federated_mnist_model.pth: Stores the aggregated global model's parameters.
◦ performance_graph_level2.png: Visualizes accuracy over federated learning rounds.

Level 3: Robust Federated Learning with Malicious Client Detection
• Overview: Introduces malicious clients that submit falsified updates. Implements mechanisms to detect and mitigate the effects of such clients using robust aggregation.
• Execution:
1 Open Level1_Final.ipynb in Google Colab.
2 Run the script to simulate federated learning with malicious client detection.
• Output Files:
◦ federated_mnist_level3_model.pth: Stores the final model parameters after robust aggregation.
◦ level3_accuracy_graph.png: Visualizes accuracy during the training process.
◦ federated_learning_level3.log: Logs accuracy values and detected malicious clients per epoch.

Performance Analysis
The performance of all three levels is analyzed and visualized in a combined graph:
1 Level 1: Centralized Learning
◦ Accuracy consistently improves over 20 epochs, reaching 99.36%.
2 Level 2: Federated Learning
◦ Accuracy improves steadily over 10 rounds, reaching 98.01%.
3 Level 3: Robust Federated Learning
◦ With malicious client detection, accuracy improves over 15 rounds, reaching 98.52%.
• Output File:
◦ accuracy_comparison_graph.png: Shows accuracy comparison across all three levels.
Instructions to Hardcode Data: Update the level1_accuracies, level2_accuracies, and level3_accuracies lists with your performance results in the script to regenerate the graph.

Notes
• Ensure that output files are downloaded after execution using the integrated download functionality.
• Use GPU acceleration to avoid long training times.
• For Level 3, modify thresholds or parameters to experiment with detecting and mitigating malicious clients.
• Duplicate or replace output files automatically if they already exist during multiple runs.