InfoGuard AI is an automated monitoring system that tracks Wikipedia pages in real time, detects content changes, analyzes semantic drift using NLP models, and flags potentially suspicious or high-risk edits.
## 🔍 The System Combines

- Web scraping via the Wikipedia API
- NLP-based semantic similarity analysis
- Heuristic risk detection (username + content)
- Anomaly detection (z-score analysis)
- Results saved to a CSV file
- Cloud persistence with MongoDB
- CI/CD automation using GitHub Actions
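The semantic similarity step above can be sketched with plain cosine similarity over embedding vectors. This is a minimal illustration, not the project's actual code: the repository uses Sentence Transformers to produce the embeddings, and the vectors below are made-up placeholders.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Semantic change between revisions is typically measured as
# 1 - similarity(old embedding, new embedding).
old_emb = [0.2, 0.7, 0.1]   # hypothetical embedding of the previous revision
new_emb = [0.1, 0.8, 0.05]  # hypothetical embedding of the edited revision
drift = 1.0 - cosine_similarity(old_emb, new_emb)
```

In practice the embeddings come from a sentence-embedding model rather than hand-written vectors; the drift value feeds into the risk formula described below.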
Use cases include:

- Misinformation detection
- Public knowledge auditing
- Historical data integrity
- AI-assisted moderation
- Research on content evolution
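The z-score anomaly detection listed among the system's components can be sketched as follows. The function names, the example history, and the threshold of 3.0 are illustrative assumptions, not the repository's actual implementation.

```python
import statistics

def z_scores(values):
    """Standard z-score for each value: (x - mean) / stdev."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(x - mean) / stdev for x in values]

def is_anomalous(value, history, threshold=3.0):
    # Flag a value whose z-score against recent history exceeds the threshold.
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(value - mean) / stdev > threshold

# Hypothetical per-edit sizes (e.g. characters changed) for one page:
edit_sizes = [10, 12, 11, 13, 12]
is_anomalous(100, edit_sizes)  # a sudden 100-character edit stands out
```

Any per-edit metric (bytes changed, edit frequency, similarity drop) could be monitored this way against its recent history.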
| Purpose | Tools / Libraries |
|---|---|
| Language | Python 3.8+ |
| Web Scraping | mwparserfromhell |
| NLP | Sentence Transformers |
| ML | Cosine Similarity |
| Database | MongoDB |
| CI/CD | GitHub Actions |
| Containerization | Docker |
Final edit risk is computed using:
```
Final Risk = 0.5 × Semantic Change + 0.3 × Content Risk + 0.2 × Username Risk
```
Edits are flagged if:
- Semantic similarity drops significantly
- Risk score crosses the threshold
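The weighted formula and flagging rules above can be sketched as a small scoring function. The weights are taken from the README; the component score names and the threshold values are assumptions for illustration.

```python
# Weights from the README's risk formula.
SEMANTIC_W, CONTENT_W, USERNAME_W = 0.5, 0.3, 0.2

def final_risk(semantic_change, content_risk, username_risk):
    """Combine component scores (each assumed to be in [0, 1]) into a final risk score."""
    return (SEMANTIC_W * semantic_change
            + CONTENT_W * content_risk
            + USERNAME_W * username_risk)

def is_flagged(semantic_similarity, risk, sim_threshold=0.7, risk_threshold=0.6):
    # An edit is flagged when similarity drops sharply OR risk crosses the threshold.
    # Both threshold values here are hypothetical defaults.
    return semantic_similarity < sim_threshold or risk > risk_threshold

score = final_risk(0.8, 0.5, 0.2)  # ≈ 0.59
```

Because the weights sum to 1.0, the final score stays in the same [0, 1] range as its inputs, which keeps a single flagging threshold meaningful.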
- **Clone the Repository**

  ```bash
  git clone https://github.com/SiddheshCodeMaster/InfoGuard-AI.git
  cd InfoGuard-AI
  ```

- **Install Dependencies**

  ```bash
  pip install -r requirements.txt
  ```

- **Configure Environment**

  Create a `.env` file:

  ```
  MONGODB_URI=your_mongodb_connection_string
  ```

- **Run Locally**

  ```bash
  python services/scraper/wiki_scrapper.py
  ```
## 🐳 Docker Usage

```bash
docker build -t infoguard-ai .
docker run --env MONGODB_URI=your_uri infoguard-ai
```
## ⏱️ Automated Monitoring

The system runs automatically every 30 minutes using GitHub Actions.

Workflow file: `.github/workflows/monitor.yml`
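A 30-minute schedule in GitHub Actions looks roughly like the sketch below. The `cron` syntax and the `actions/checkout` / `actions/setup-python` actions are standard GitHub Actions features; the job name and steps are assumptions, and the actual `monitor.yml` in the repository may differ.

```yaml
# Hypothetical sketch of .github/workflows/monitor.yml
name: InfoGuard Monitor
on:
  schedule:
    - cron: "*/30 * * * *"   # every 30 minutes (UTC)
  workflow_dispatch:          # also allow manual runs
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.8"
      - run: pip install -r requirements.txt
      - run: python services/scraper/wiki_scrapper.py
        env:
          MONGODB_URI: ${{ secrets.MONGODB_URI }}
```

Note that GitHub documents scheduled workflows as best-effort: runs may start a few minutes after the scheduled time under load.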
- Real-time alerting system
- Web dashboard
- Editor behavior profiling
- Advanced explainable AI
- Multilingual monitoring
**Siddhesh Shankar**
Data Science | NLP | DevOps | Backend Engineering
Portfolio: https://datasavywithsiddhesh.onrender.com/
MIT License