InfoGuard AI is an automated monitoring system that tracks Wikipedia pages in real time, detects content changes, analyzes semantic drift using NLP models, and flags potentially suspicious or high-risk edits.
## 🔍 The System Combines

- Web scraping via the Wikipedia API
- NLP-based semantic similarity analysis
- Heuristic risk detection (username + content)
- Anomaly detection (z-score analysis)
- Results saved to a CSV file
- Cloud persistence with MongoDB
- CI/CD automation using GitHub Actions
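The semantic similarity step above can be sketched with plain cosine similarity over embedding vectors. This is a minimal illustration, not the project's actual code: the repository uses Sentence Transformers to produce the embeddings, and the vectors below are made-up placeholders.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Semantic change between revisions is typically measured as
# 1 - similarity(old embedding, new embedding).
old_emb = [0.2, 0.7, 0.1]   # hypothetical embedding of the previous revision
new_emb = [0.1, 0.8, 0.05]  # hypothetical embedding of the edited revision
drift = 1.0 - cosine_similarity(old_emb, new_emb)
```

In practice the embeddings come from a sentence-embedding model rather than hand-written vectors; the drift value feeds into the risk formula described below.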
Use cases include:

- Misinformation detection
- Public knowledge auditing
- Historical data integrity
- AI-assisted moderation
- Research on content evolution
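The z-score anomaly detection listed among the system's components can be sketched as follows. The function names, the example history, and the threshold of 3.0 are illustrative assumptions, not the repository's actual implementation.

```python
import statistics

def z_scores(values):
    """Standard z-score for each value: (x - mean) / stdev."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(x - mean) / stdev for x in values]

def is_anomalous(value, history, threshold=3.0):
    # Flag a value whose z-score against recent history exceeds the threshold.
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(value - mean) / stdev > threshold

# Hypothetical per-edit sizes (e.g. characters changed) for one page:
edit_sizes = [10, 12, 11, 13, 12]
is_anomalous(100, edit_sizes)  # a sudden 100-character edit stands out
```

Any per-edit metric (bytes changed, edit frequency, similarity drop) could be monitored this way against its recent history.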
| Purpose | Tools / Libraries |
|---|---|
| Language | Python 3.8+ |
| Web Scraping | mwparserfromhell |
| NLP | Sentence Transformers |
| ML | Cosine Similarity |
| Database | MongoDB |
| CI/CD | GitHub Actions |
| Containerization | Docker |
Final edit risk is computed using:
```
Final Risk = 0.5 × Semantic Change + 0.3 × Content Risk + 0.2 × Username Risk
```
Edits are flagged if:
- Semantic similarity drops significantly
- Risk score crosses the threshold
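The weighted formula and flagging rules above can be sketched as a small scoring function. The weights are taken from the README; the component score names and the threshold values are assumptions for illustration.

```python
# Weights from the README's risk formula.
SEMANTIC_W, CONTENT_W, USERNAME_W = 0.5, 0.3, 0.2

def final_risk(semantic_change, content_risk, username_risk):
    """Combine component scores (each assumed to be in [0, 1]) into a final risk score."""
    return (SEMANTIC_W * semantic_change
            + CONTENT_W * content_risk
            + USERNAME_W * username_risk)

def is_flagged(semantic_similarity, risk, sim_threshold=0.7, risk_threshold=0.6):
    # An edit is flagged when similarity drops sharply OR risk crosses the threshold.
    # Both threshold values here are hypothetical defaults.
    return semantic_similarity < sim_threshold or risk > risk_threshold

score = final_risk(0.8, 0.5, 0.2)  # ≈ 0.59
```

Because the weights sum to 1.0, the final score stays in the same [0, 1] range as its inputs, which keeps a single flagging threshold meaningful.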
- **Clone the Repository**

  ```bash
  git clone https://github.com/SiddheshCodeMaster/InfoGuard-AI.git
  cd InfoGuard-AI
  ```

- **Install Dependencies**

  ```bash
  pip install -r requirements.txt
  ```

- **Configure Environment**

  Create a `.env` file:

  ```
  MONGODB_URI=your_mongodb_connection_string
  ```

- **Run Locally**

  ```bash
  python services/scraper/wiki_scrapper.py
  ```
## 🐳 Docker Usage

```bash
docker build -t infoguard-ai .
docker run --env MONGODB_URI=your_uri infoguard-ai
```
## ⏱️ Automated Monitoring

The system runs automatically every 30 minutes using GitHub Actions.

Workflow file: `.github/workflows/monitor.yml`
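A 30-minute schedule in GitHub Actions looks roughly like the sketch below. The `cron` syntax and the `actions/checkout` / `actions/setup-python` actions are standard GitHub Actions features; the job name and steps are assumptions, and the actual `monitor.yml` in the repository may differ.

```yaml
# Hypothetical sketch of .github/workflows/monitor.yml
name: InfoGuard Monitor
on:
  schedule:
    - cron: "*/30 * * * *"   # every 30 minutes (UTC)
  workflow_dispatch:          # also allow manual runs
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.8"
      - run: pip install -r requirements.txt
      - run: python services/scraper/wiki_scrapper.py
        env:
          MONGODB_URI: ${{ secrets.MONGODB_URI }}
```

Note that GitHub documents scheduled workflows as best-effort: runs may start a few minutes after the scheduled time under load.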
- Real-time alerting system
- Web dashboard
- Editor behavior profiling
- Advanced explainable AI
- Multilingual monitoring
**Siddhesh Shankar**
Data Science | NLP | DevOps | Backend Engineering
Portfolio: https://datasavywithsiddhesh.onrender.com/
MIT License