Novel Backstory Consistency Checker

🎯 Hackathon Project Overview

This project implements a reasoning-based system that checks whether a character backstory is consistent with the text of a novel. Rather than training a conventional ML classifier, it relies on keyword and entity matching combined with patterns learned from labelled examples.

🏗️ Architecture

Windows Compatibility Note

⚠️ Pathway is NOT natively supported on Windows. This project replaces it with CSV-based storage, pandas, and a custom persistent cache, all of which run on Windows, while aiming to keep the same functionality.

Pipeline Components

Robust Pipeline (src/robust_pipeline.py) - Main system (Windows-compatible)
Lightweight Pipeline (src/lightweight_pipeline.py) - Simple version
Main Pipeline (src/main_pipeline.py) - Entry point with multiple modes
View Results (src/view_results.py) - Analysis and visualization

🚀 Quick Start

For Windows Users (Recommended)

# Activate virtual environment
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the robust pipeline (no hanging)
cd src
python main_pipeline.py --mode robust

# View results
python view_results.py

For Linux/WSL Users (with Pathway)

# If you have Linux/WSL and want to use Pathway
cd src
python main_pipeline.py --mode ingest  # Only works on Linux

📊 Performance

Pipeline	Accuracy	Hangs System	Learns Across Runs	Windows Support
Robust	93.3%	❌ No	✅ Yes	✅ Full
Lightweight	60%	❌ No	❌ No	✅ Full
Main (Pathway)	N/A	✅ Yes	❌ No	❌ Limited

🧠 Features

Robust Pipeline (Recommended)

✅ No System Hanging - Safe for Windows
✅ Learning System - Improves with each run
✅ Error Handling - Graceful degradation
✅ Progress Tracking - Real-time feedback
✅ Persistent Cache - Remembers patterns
✅ 93.3% Accuracy - High performance

Key Capabilities

Semantic Analysis - Keyword and entity matching
Pattern Learning - Remembers consistent/inconsistent patterns
Contradiction Detection - Identifies conflicting statements
Entity Memory - Tracks character names and places
Confidence Scoring - Provides prediction confidence

📁 Project Structure

kharagpur_hackathon/
├── src/
│   ├── robust_pipeline.py      # Main system (Windows-compatible)
│   ├── lightweight_pipeline.py # Simple version
│   ├── main_pipeline.py         # Entry point
│   ├── view_results.py          # Analysis tool
│   ├── ingest.py               # Novel processing (Linux only)
│   └── retrieve.py             # Retrieval system
├── data/
│   ├── train.csv               # Training data
│   ├── test.csv                # Test data
│   ├── In search of the castaways.txt
│   ├── The Count of Monte Cristo.txt
│   └── robust_cache.pkl        # Learning cache
├── results/                    # Output files
├── report/                     # Analysis reports
├── requirements.txt
└── README.md

🎮 Usage Examples

Basic Usage

# Run robust pipeline (default 20 rows)
python main_pipeline.py --mode robust

# Process more rows
python main_pipeline.py --mode robust --max-rows 50

# Process specific file
python main_pipeline.py --mode robust --csv-file train.csv
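
The flags used above could be parsed along the following lines. This is a minimal sketch, not the contents of main_pipeline.py: the function name build_parser and the default for --csv-file are assumptions, and only the two modes named in this README are listed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the commands above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Backstory consistency checker")
    # Only "robust" and "ingest" appear in this README; further modes are not assumed.
    parser.add_argument("--mode", choices=["robust", "ingest"], default="robust")
    # The default of 20 follows the "default 20 rows" note above.
    parser.add_argument("--max-rows", type=int, default=20)
    # Assumed default: the README does not say which file is used when the flag is omitted.
    parser.add_argument("--csv-file", default="train.csv")
    return parser
```

With this sketch, build_parser().parse_args(["--mode", "robust", "--max-rows", "50"]) gives an object whose max_rows attribute is 50, because argparse converts the dash in --max-rows to an underscore.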

Analysis

# View detailed results
python view_results.py

# Inspect the raw predictions (in Windows cmd, use: type ..\data\train_robust_predictions.csv)
cat ../data/train_robust_predictions.csv

📈 Results

Current Performance

Accuracy: 93.3% (14/15 predictions correct)
Novels Processed: "In Search of the Castaways", "The Count of Monte Cristo"
Learning Impact: 100% accuracy when learning score > 0.5
Processing Speed: ~1 second per row (no hanging)
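
The accuracy figure is the share of rows where the predicted label equals the actual label. The sketch below reproduces the arithmetic for 14 matches out of 15; the label lists are placeholders built only to give that ratio, not the project's real predictions, and the function name accuracy is an assumption.

```python
def accuracy(actual: list[str], predicted: list[str]) -> float:
    """Return the fraction of positions where the two label lists agree."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not actual:
        return 0.0
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Placeholder labels: 15 rows, 14 of which match.
actual = ["CONSISTENT"] * 15
predicted = ["CONSISTENT"] * 14 + ["INCONSISTENT"]
print(f"{accuracy(actual, predicted):.1%}")  # prints 93.3%
```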

Sample Results

✅ ID 46: Thalcave (In Search of the Castaways)
   Actual: CONSISTENT | Predicted: CONSISTENT
   Confidence: 0.575 | Learning: 1.000

✅ ID 137: Faria (The Count of Monte Cristo)
   Actual: INCONSISTENT | Predicted: INCONSISTENT
   Confidence: 0.150 | Learning: 0.000

🔧 Technical Details

Windows Compatibility Solutions

CSV-based Storage - Instead of Pathway, uses CSV files
Lightweight Embeddings - Keyword-based instead of heavy models
Memory Management - Caching and limits prevent hanging
Error Recovery - Graceful fallbacks when errors occur
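
As an illustration of the row limit and the graceful fallback, a CSV loader might look like the sketch below. It uses only the standard library csv module; the function name load_rows and the choice to return an empty list on failure are assumptions, not the project's actual code.

```python
import csv
from pathlib import Path


def load_rows(csv_path: str, max_rows: int = 20) -> list[dict]:
    """Read at most max_rows rows; return [] instead of raising if the file is unreadable."""
    path = Path(csv_path)
    rows = []
    try:
        with path.open(newline="", encoding="utf-8") as handle:
            for index, row in enumerate(csv.DictReader(handle)):
                if index >= max_rows:
                    break  # hard cap keeps memory use and runtime bounded
                rows.append(row)
    except (OSError, csv.Error, UnicodeDecodeError) as exc:
        # Fallback: report the problem and continue with no rows.
        print(f"Could not read {path}: {exc}")
        return []
    return rows
```

Capping the number of rows up front is one simple way to keep a run short and predictable on a machine with limited memory.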

Learning Algorithm

Pattern Recognition: Learns from consistent/inconsistent examples
Entity Memory: Remembers character names and relationships
Adaptive Scoring: Adjusts thresholds based on learned patterns
Persistent Cache: Saves learning between runs
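
One way such a persistent learning cache could work is sketched below with pickle and keyword counters. The cache layout, the function names, and the definition of the learning score (share of remembered keyword occurrences that came from consistent examples) are all assumptions made for illustration; the real format of robust_cache.pkl is not described here.

```python
import pickle
from collections import Counter
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Load the cache, or start empty if the file is missing or unreadable."""
    empty = {"consistent": Counter(), "inconsistent": Counter()}
    if not path.exists():
        return empty
    try:
        with path.open("rb") as handle:
            # pickle should only be used on files you created yourself.
            return pickle.load(handle)
    except (OSError, pickle.UnpicklingError, EOFError):
        return empty


def learn(cache: dict, keywords: list[str], is_consistent: bool) -> None:
    """Record the keywords of one labelled example."""
    bucket = "consistent" if is_consistent else "inconsistent"
    cache[bucket].update(keywords)


def learning_score(cache: dict, keywords: list[str]) -> float:
    """Share of remembered occurrences of these keywords seen in consistent examples."""
    seen_ok = sum(cache["consistent"][word] for word in keywords)
    seen_bad = sum(cache["inconsistent"][word] for word in keywords)
    total = seen_ok + seen_bad
    return seen_ok / total if total else 0.0  # 0.0 when nothing has been learned yet


def save_cache(cache: dict, path: Path) -> None:
    """Write the cache to disk so the next run starts from it."""
    with path.open("wb") as handle:
        pickle.dump(cache, handle)
```

Calling load_cache at start-up and save_cache at the end of a run is what lets later runs benefit from earlier ones in this sketch.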

Consistency Checking

Keyword Overlap: Matches important terms
Entity Matching: Checks character/place names
Contradiction Detection: Looks for negative indicators
Pattern Analysis: Considers sentence structure and sentiment
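
The first three checks can be combined into a single score roughly as follows. This is a simplified sketch under stated assumptions: the stopword and negation lists, the equal weighting, the halving penalty, and all function names are invented for illustration, capitalised words stand in for real entity extraction, and the sentence structure and sentiment step is left out.

```python
import re

# Small illustrative word lists (assumptions, not the project's actual lists).
STOPWORDS = {"the", "and", "was", "with", "that", "his", "her", "for", "from"}
NEGATIONS = {"never", "not", "no", "nor", "neither", "without"}


def tokens(text: str) -> list[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keywords(text: str) -> set[str]:
    """Tokens longer than two letters that are not stopwords."""
    return {word for word in tokens(text) if len(word) > 2 and word not in STOPWORDS}


def entities(text: str) -> set[str]:
    """Crude stand-in for entity extraction: capitalised words that are not stopwords."""
    return {w for w in re.findall(r"\b[A-Z][a-z]+\b", text) if w.lower() not in STOPWORDS}


def consistency_score(backstory: str, passage: str) -> float:
    """Score in [0, 1]: keyword overlap and entity match, penalised on negation mismatch."""
    story_words = keywords(backstory)
    overlap = len(story_words & keywords(passage)) / len(story_words) if story_words else 0.0
    story_entities = entities(backstory)
    matched = len(story_entities & entities(passage)) / len(story_entities) if story_entities else 0.0
    score = 0.5 * overlap + 0.5 * matched
    # Negative indicator: exactly one of the two texts contains a negation word.
    story_negated = bool(NEGATIONS & set(tokens(backstory)))
    passage_negated = bool(NEGATIONS & set(tokens(passage)))
    if story_negated != passage_negated:
        score *= 0.5
    return score
```

A backstory would then be labelled consistent when its best score over the retrieved passages exceeds a chosen threshold; the threshold value itself is a tuning decision not fixed by this sketch.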

🏆 Hackathon Compliance

Requirements Met

✅ Runnable Code - Works on Windows and Linux
✅ Clean Environment - Self-contained with requirements.txt
✅ Novel Analysis - Processes full novel texts
✅ Consistency Checking - Determines backstory validity
✅ Learning Component - Improves from examples
✅ No System Hanging - Safe for all environments

Pathway Alternative

Since Pathway does not run natively on Windows, this project uses:

CSV Storage - Same functionality, Windows-compatible
Pandas - Data manipulation and analysis
Custom Caching - Persistent learning system
Error Handling - Robust fallbacks

🎯 Future Improvements

Enhanced Learning - More sophisticated pattern recognition
Better Entity Extraction - Named entity recognition
Sentiment Analysis - Emotional consistency checking
Timeline Validation - Chronological consistency
Cross-Novel Analysis - Character development tracking

📞 Support

For issues or questions:

Check the Windows compatibility notes above
Use the robust pipeline for Windows systems
View results with python view_results.py
Check error logs in the console output

Note: This project is designed to work on Windows while maintaining the same functionality as Linux-based Pathway implementations.

Code-r4Life/KDSH---Binary-Classification

Folders and files

Latest commit

History

Repository files navigation

Novel Backstory Consistency Checker

🎯 Hackathon Project Overview

🏗️ Architecture

Windows Compatibility Note

Pipeline Components

🚀 Quick Start

For Windows Users (Recommended)

For Linux/WSL Users (with Pathway)

📊 Performance

🧠 Features

Robust Pipeline (Recommended)

Key Capabilities

📁 Project Structure

🎮 Usage Examples

Basic Usage

Analysis

📈 Results

Current Performance

Sample Results

🔧 Technical Details

Windows Compatibility Solutions

Learning Algorithm

Consistency Checking

🏆 Hackathon Compliance

Requirements Met

Pathway Alternative

🎯 Future Improvements

📞 Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages