This project implements a reasoning-based system to check if character backstories are consistent with novel content. Unlike traditional ML classification, this system uses semantic analysis and pattern matching to determine consistency.
- Robust Pipeline (`src/robust_pipeline.py`) - Main system (Windows-compatible)
- Lightweight Pipeline (`src/lightweight_pipeline.py`) - Simple version
- Main Pipeline (`src/main_pipeline.py`) - Entry point with multiple modes
- View Results (`src/view_results.py`) - Analysis and visualization
# Activate virtual environment
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the robust pipeline (no hanging)
cd src
python main_pipeline.py --mode robust
# View results
python view_results.py

# If you have Linux/WSL and want to use Pathway
cd src
python main_pipeline.py --mode ingest # Only works on Linux

| Pipeline | Accuracy | System Hanging | Learning | Windows Support |
|---|---|---|---|---|
| Robust | 93.3% | No | Yes | Full |
| Lightweight | 60% | No | No | Full |
| Main (Pathway) | N/A | Yes | No | Limited |
- No System Hanging - Safe for Windows
- Learning System - Improves with each run
- Error Handling - Graceful degradation
- Progress Tracking - Real-time feedback
- Persistent Cache - Remembers patterns
- 93.3% Accuracy - High performance
- Semantic Analysis - Keyword and entity matching
- Pattern Learning - Remembers consistent/inconsistent patterns
- Contradiction Detection - Identifies conflicting statements
- Entity Memory - Tracks character names and places
- Confidence Scoring - Provides prediction confidence
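The pattern learning and persistent cache described above could be sketched roughly as follows. This is a minimal illustration, not the code in `src/robust_pipeline.py`: the `PatternCache` class, its methods, and the per-keyword count layout are all hypothetical, and only the idea of a pickle file that survives between runs comes from this README.

```python
import pickle
import tempfile
from pathlib import Path


class PatternCache:
    """Hypothetical cache: counts how often a keyword appeared with each label."""

    def __init__(self, path):
        self.path = Path(path)
        # counts[keyword] = [seen in consistent rows, seen in inconsistent rows]
        self.counts = {}
        if self.path.exists():
            # Reload what earlier runs learned.
            with self.path.open("rb") as fh:
                self.counts = pickle.load(fh)

    def learn(self, keywords, consistent):
        """Record one labelled example."""
        for kw in keywords:
            pair = self.counts.setdefault(kw, [0, 0])
            pair[0 if consistent else 1] += 1

    def learning_score(self, keywords):
        """Fraction of known keywords that lean consistent; 0.0 if none are known."""
        known = [self.counts[kw] for kw in keywords if kw in self.counts]
        if not known:
            return 0.0
        leaning = sum(1 for good, bad in known if good > bad)
        return leaning / len(known)

    def save(self):
        """Persist the counts so the next run starts from them."""
        with self.path.open("wb") as fh:
            pickle.dump(self.counts, fh)


# Demo: learn in one "run", reload in another.
cache_file = Path(tempfile.mkdtemp()) / "demo_cache.pkl"
first_run = PatternCache(cache_file)
first_run.learn({"patagonia", "guide"}, consistent=True)
first_run.save()

second_run = PatternCache(cache_file)
score = second_run.learning_score({"patagonia", "guide", "unseen"})
```

Unknown keywords are ignored rather than counted against the score, which is one possible design choice; a stricter variant could treat them as neutral evidence instead.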
kharagpur_hackathon/
├── src/
│   ├── robust_pipeline.py # Main system (Windows-compatible)
│   ├── lightweight_pipeline.py # Simple version
│   ├── main_pipeline.py # Entry point
│   ├── view_results.py # Analysis tool
│   ├── ingest.py # Novel processing (Linux only)
│   └── retrieve.py # Retrieval system
├── data/
│   ├── train.csv # Training data
│   ├── test.csv # Test data
│   ├── In search of the castaways.txt
│   ├── The Count of Monte Cristo.txt
│   └── robust_cache.pkl # Learning cache
├── results/ # Output files
├── report/ # Analysis reports
├── requirements.txt
└── README.md
# Run robust pipeline (default 20 rows)
python main_pipeline.py --mode robust
# Process more rows
python main_pipeline.py --mode robust --max-rows 50
# Process specific file
python main_pipeline.py --mode robust --csv-file train.csv

# View detailed results
python view_results.py
# Check prediction accuracy
cat ../data/train_robust_predictions.csv

- Accuracy: 93.3% (14/15 predictions correct)
- Novels Processed: "In Search of the Castaways", "The Count of Monte Cristo"
- Learning Impact: 100% accuracy when learning score > 0.5
- Processing Speed: ~1 second per row (no hanging)
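The accuracy figure above is simply correct predictions divided by total predictions. A minimal sketch of that calculation is shown here, assuming a predictions file with `actual` and `predicted` columns; those column names and the `accuracy` helper are assumptions for illustration, not the documented layout of `train_robust_predictions.csv`.

```python
import csv
import io


def accuracy(rows, actual_col="actual", predicted_col="predicted"):
    """Share of rows where the predicted label equals the actual label."""
    rows = list(rows)
    if not rows:
        raise ValueError("no predictions to score")
    correct = sum(
        1
        for row in rows
        if row[actual_col].strip().lower() == row[predicted_col].strip().lower()
    )
    return correct / len(rows)


# Two rows taken from the sample predictions in this README (IDs 46 and 137).
sample_csv = io.StringIO(
    "id,actual,predicted\n"
    "46,CONSISTENT,CONSISTENT\n"
    "137,INCONSISTENT,INCONSISTENT\n"
)
sample_accuracy = accuracy(csv.DictReader(sample_csv))

# The reported figure: 14 correct out of 15, as a percentage.
reported_percent = round(14 / 15 * 100, 1)
```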
ID 46: Thalcave (In Search of the Castaways)
Actual: CONSISTENT | Predicted: CONSISTENT
Confidence: 0.575 | Learning: 1.000

ID 137: Faria (The Count of Monte Cristo)
Actual: INCONSISTENT | Predicted: INCONSISTENT
Confidence: 0.150 | Learning: 0.000
- CSV-based Storage - Instead of Pathway, uses CSV files
- Lightweight Embeddings - Keyword-based instead of heavy models
- Memory Management - Caching and limits prevent hanging
- Error Recovery - Graceful fallbacks when errors occur
- Pattern Recognition: Learns from consistent/inconsistent examples
- Entity Memory: Remembers character names and relationships
- Adaptive Scoring: Adjusts thresholds based on learned patterns
- Persistent Cache: Saves learning between runs
- Keyword Overlap: Matches important terms
- Entity Matching: Checks character/place names
- Contradiction Detection: Looks for negative indicators
- Pattern Analysis: Considers sentence structure and sentiment
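The keyword overlap, entity matching, and contradiction checks listed above can be combined into a single score along these lines. This is a sketch under stated assumptions: the stopword list, the negative-indicator words, the capitalized-word entity heuristic, and the 0.6 / 0.4 / 0.3 weights are all invented for illustration and are not taken from the project's pipelines.

```python
import re

# Hypothetical word lists; the real pipeline's lists are not documented here.
STOPWORDS = frozenset({"a", "an", "the", "as", "in", "of", "who", "was", "and", "to", "is"})
NEGATIVE_INDICATORS = frozenset({"never", "not", "no", "without"})


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def keywords(text):
    """Content words: drop stopwords, negations, and very short tokens."""
    skip = STOPWORDS | NEGATIVE_INDICATORS
    return {t for t in tokenize(text) if t not in skip and len(t) > 2}


def keyword_overlap(backstory, passage):
    """Fraction of backstory keywords that also occur in the passage."""
    claim = keywords(backstory)
    return len(claim & keywords(passage)) / len(claim) if claim else 0.0


def entities(text):
    """Crude entity guess: capitalized words that are not stopwords."""
    return {w for w in re.findall(r"\b[A-Z][a-z]+\b", text) if w.lower() not in STOPWORDS}


def has_negation(text):
    return any(t in NEGATIVE_INDICATORS for t in tokenize(text))


def consistency_score(backstory, passage):
    """Weighted overlap and entity match, penalized when only one side negates."""
    names = entities(backstory)
    entity_match = len(names & entities(passage)) / len(names) if names else 0.0
    penalty = 0.3 if has_negation(backstory) != has_negation(passage) else 0.0
    score = 0.6 * keyword_overlap(backstory, passage) + 0.4 * entity_match - penalty
    return max(0.0, min(1.0, score))


passage = "Thalcave was a Patagonian guide who led the travellers across the pampas."
score_ok = consistency_score("Thalcave worked as a guide in the pampas.", passage)
score_bad = consistency_score("Thalcave never worked as a guide.", passage)
```

In this toy example the second backstory scores lower because it introduces a negation that the passage does not share. A real check would need to look at what the negation applies to, which this sketch does not attempt.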
- Runnable Code - Works on Windows and Linux
- Clean Environment - Self-contained with requirements.txt
- Novel Analysis - Processes full novel texts
- Consistency Checking - Determines backstory validity
- Learning Component - Improves from examples
- No System Hanging - Safe for all environments
Since Pathway doesn't work on Windows, this project uses:
- CSV Storage - Same functionality, Windows-compatible
- Pandas - Data manipulation and analysis
- Custom Caching - Persistent learning system
- Error Handling - Robust fallbacks
- Enhanced Learning - More sophisticated pattern recognition
- Better Entity Extraction - Named entity recognition
- Sentiment Analysis - Emotional consistency checking
- Timeline Validation - Chronological consistency
- Cross-Novel Analysis - Character development tracking
For issues or questions:
- Check the Windows compatibility notes above
- Use the robust pipeline for Windows systems
- View results with `python view_results.py`
- Check error logs in the console output
Note: This project is designed to work on Windows while maintaining the same functionality as Linux-based Pathway implementations.