Code-r4Life/KDSH---Binary-Classification

Novel Backstory Consistency Checker

🎯 Hackathon Project Overview

This project implements a reasoning-based system to check if character backstories are consistent with novel content. Unlike traditional ML classification, this system uses semantic analysis and pattern matching to determine consistency.

🏗️ Architecture

Windows Compatibility Note

⚠️ Pathway is NOT natively supported on Windows. This project uses alternative approaches that work on Windows while maintaining the same functionality.

Pipeline Components

  1. Robust Pipeline (src/robust_pipeline.py) - Main system (Windows-compatible)
  2. Lightweight Pipeline (src/lightweight_pipeline.py) - Simple version
  3. Main Pipeline (src/main_pipeline.py) - Entry point with multiple modes
  4. View Results (src/view_results.py) - Analysis and visualization

🚀 Quick Start

For Windows Users (Recommended)

# Create the virtual environment (first run only)
python -m venv venv

# Activate virtual environment
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the robust pipeline (no hanging)
cd src
python main_pipeline.py --mode robust

# View results
python view_results.py

For Linux/WSL Users (with Pathway)

# If you have Linux/WSL and want to use Pathway
cd src
python main_pipeline.py --mode ingest  # Only works on Linux

📊 Performance

| Pipeline | Accuracy | System Hanging | Learning | Windows Support |
|---|---|---|---|---|
| Robust | 93.3% | ❌ No | ✅ Yes | ✅ Full |
| Lightweight | 60% | ❌ No | ❌ No | ✅ Full |
| Main (Pathway) | N/A | ✅ Yes | ❌ No | ❌ Limited |

🧠 Features

Robust Pipeline (Recommended)

  • ✅ No System Hanging - Safe for Windows
  • ✅ Learning System - Improves with each run
  • ✅ Error Handling - Graceful degradation
  • ✅ Progress Tracking - Real-time feedback
  • ✅ Persistent Cache - Remembers patterns
  • ✅ 93.3% Accuracy - High performance

Key Capabilities

  1. Semantic Analysis - Keyword and entity matching
  2. Pattern Learning - Remembers consistent/inconsistent patterns
  3. Contradiction Detection - Identifies conflicting statements
  4. Entity Memory - Tracks character names and places
  5. Confidence Scoring - Provides prediction confidence
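
As a rough illustration of the entity matching and entity memory ideas above, the sketch below treats capitalized words as a stand-in for character and place names. The function names, the `_NON_ENTITIES` stop list, and the scoring rule are assumptions made for this example; the project's actual logic in src/robust_pipeline.py may differ.

```python
import re

# Hypothetical stop list: capitalized words that are usually not names.
_NON_ENTITIES = {"The", "A", "An", "He", "She", "It", "They", "In", "On", "At", "His", "Her"}


def extract_entities(text: str) -> set[str]:
    """Return capitalized tokens as a crude proxy for character/place names."""
    tokens = re.findall(r"\b[A-Z][a-z]+\b", text)
    return {t for t in tokens if t not in _NON_ENTITIES}


def entity_match_score(backstory: str, novel_passage: str) -> float:
    """Fraction of backstory entities that also appear in the novel passage."""
    wanted = extract_entities(backstory)
    if not wanted:
        return 0.0
    found = extract_entities(novel_passage)
    return len(wanted & found) / len(wanted)


score = entity_match_score(
    "Thalcave guided Glenarvan across Patagonia.",
    "The Patagonian guide Thalcave led Lord Glenarvan over the plains.",
)
print(f"{score:.2f}")  # 0.67: two of the three backstory entities are found
```

A capitalization heuristic like this confuses sentence-initial words with names, which is one reason named entity recognition is listed under future improvements.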

📁 Project Structure

kharagpur_hackathon/
├── src/
│   ├── robust_pipeline.py      # Main system (Windows-compatible)
│   ├── lightweight_pipeline.py # Simple version
│   ├── main_pipeline.py        # Entry point
│   ├── view_results.py         # Analysis tool
│   ├── ingest.py               # Novel processing (Linux only)
│   └── retrieve.py             # Retrieval system
├── data/
│   ├── train.csv               # Training data
│   ├── test.csv                # Test data
│   ├── In search of the castaways.txt
│   ├── The Count of Monte Cristo.txt
│   └── robust_cache.pkl        # Learning cache
├── results/                    # Output files
├── report/                     # Analysis reports
├── requirements.txt
└── README.md

🎮 Usage Examples

Basic Usage

# Run robust pipeline (default 20 rows)
python main_pipeline.py --mode robust

# Process more rows
python main_pipeline.py --mode robust --max-rows 50

# Process specific file
python main_pipeline.py --mode robust --csv-file train.csv
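
For readers who want to see how flags like these are typically wired up, here is a minimal argparse sketch. Only `--mode robust`, `--mode ingest`, `--max-rows`, `--csv-file`, and the default of 20 rows come from this README; the `lightweight` mode value and the `train.csv` default are assumptions, and the real main_pipeline.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Backstory consistency checker")
    # "lightweight" is an assumed mode name; "robust" and "ingest" appear in the README.
    parser.add_argument("--mode", choices=["robust", "lightweight", "ingest"], default="robust")
    # The README states a default of 20 rows for the robust pipeline.
    parser.add_argument("--max-rows", type=int, default=20)
    # Assumed default file name.
    parser.add_argument("--csv-file", default="train.csv")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--mode", "robust", "--max-rows", "50"])
print(args.mode, args.max_rows, args.csv_file)  # robust 50 train.csv
```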

Analysis

# View detailed results
python view_results.py

# Inspect the saved predictions (in cmd.exe, use `type` instead of `cat`)
cat ../data/train_robust_predictions.csv
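
Printing the predictions file shows the raw rows but not an accuracy figure. The sketch below shows one way accuracy could be computed from such a file. The column names `label` and `prediction` are assumptions (check the actual CSV header), and the three sample rows are made up for illustration.

```python
import csv
import io


def accuracy_from_csv(handle, actual_col="label", predicted_col="prediction"):
    """Return the share of rows whose predicted value equals the actual value."""
    rows = list(csv.DictReader(handle))
    if not rows:
        return 0.0
    correct = sum(1 for row in rows if row[actual_col] == row[predicted_col])
    return correct / len(rows)


# Made-up rows standing in for a predictions file.
sample = io.StringIO(
    "id,label,prediction\n"
    "1,consistent,consistent\n"
    "2,inconsistent,inconsistent\n"
    "3,consistent,inconsistent\n"
)
print(f"{accuracy_from_csv(sample):.3f}")  # 0.667
```

To run it on a real file, pass an open file handle, for example `open(path, newline="")`, in place of the `StringIO` object.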

📈 Results

Current Performance

  • Accuracy: 93.3% (14/15 predictions correct)
  • Novels Processed: "In Search of the Castaways", "The Count of Monte Cristo"
  • Learning Impact: 100% accuracy when learning score > 0.5
  • Processing Speed: ~1 second per row (no hanging)

Sample Results

✅ ID 46: Thalcave (In Search of the Castaways)
   Actual: CONSISTENT | Predicted: CONSISTENT
   Confidence: 0.575 | Learning: 1.000

✅ ID 137: Faria (The Count of Monte Cristo)
   Actual: INCONSISTENT | Predicted: INCONSISTENT
   Confidence: 0.150 | Learning: 0.000

🔧 Technical Details

Windows Compatibility Solutions

  1. CSV-based Storage - Instead of Pathway, uses CSV files
  2. Lightweight Embeddings - Keyword-based instead of heavy models
  3. Memory Management - Caching and limits prevent hanging
  4. Error Recovery - Graceful fallbacks when errors occur
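
One common way to realize "keyword-based instead of heavy models" is a bag-of-words vector compared with cosine similarity. The sketch below is an illustration under that assumption, not the project's confirmed method; `STOPWORDS`, `keyword_vector`, and `cosine_similarity` are names invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; a real system would likely use a longer one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "is", "he", "she", "his", "her"}


def keyword_vector(text: str) -> Counter:
    """Count the non-stopword tokens of a text (a sparse bag-of-words vector)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


backstory = keyword_vector("The abbe was imprisoned in the fortress for years")
passage = keyword_vector("Faria, the abbe, had been imprisoned for many years")
print(round(cosine_similarity(backstory, passage), 3))
```

Because the vectors are plain word counts, this needs no model download and only as much memory as the vocabulary of the texts being compared.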

Learning Algorithm

  • Pattern Recognition: Learns from consistent/inconsistent examples
  • Entity Memory: Remembers character names and relationships
  • Adaptive Scoring: Adjusts thresholds based on learned patterns
  • Persistent Cache: Saves learning between runs
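
The learning loop described above could be sketched as keyword counts kept per label and saved with pickle between runs. The `.pkl` extension of data/robust_cache.pkl suggests pickle, but the cache layout, the function names, and the neutral score of 0.5 for unseen keywords are all assumptions made for this example.

```python
import pickle
import tempfile
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Load the cache from disk, or start with an empty one on the first run."""
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return {"consistent": {}, "inconsistent": {}}


def learn(cache: dict, keywords: list[str], label: str) -> None:
    """Record that these keywords appeared in an example with the given label."""
    bucket = cache[label]
    for kw in keywords:
        bucket[kw] = bucket.get(kw, 0) + 1


def learning_score(cache: dict, keywords: list[str]) -> float:
    """Share of past sightings of these keywords that were labeled consistent."""
    pos = sum(cache["consistent"].get(k, 0) for k in keywords)
    neg = sum(cache["inconsistent"].get(k, 0) for k in keywords)
    total = pos + neg
    return pos / total if total else 0.5  # assumed neutral value for unseen keywords


def save_cache(cache: dict, path: Path) -> None:
    """Persist the cache so the next run starts from what was learned."""
    with path.open("wb") as fh:
        pickle.dump(cache, fh)


# Round trip through a temporary file to show that learning survives a "restart".
with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "demo_cache.pkl"
    cache = load_cache(cache_path)
    learn(cache, ["guide", "patagonia"], "consistent")
    save_cache(cache, cache_path)
    restored = load_cache(cache_path)

print(learning_score(restored, ["guide"]))  # 1.0
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.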

Consistency Checking

  1. Keyword Overlap: Matches important terms
  2. Entity Matching: Checks character/place names
  3. Contradiction Detection: Looks for negative indicators
  4. Pattern Analysis: Considers sentence structure and sentiment
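
To make the first and third checks concrete, the sketch below combines a keyword-overlap score with a simple negation-cue test. The cue list, the 0.5 penalty, and the 0.5 decision threshold are hypothetical values chosen for illustration, and sentence structure and sentiment (step 4) are not modeled here.

```python
import re

# Hypothetical list of "negative indicators".
NEGATION_CUES = {"never", "not", "no", "without", "neither", "nor"}


def tokens(text: str) -> list[str]:
    """Lowercase word tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_overlap(backstory: str, passage: str) -> float:
    """Share of longer backstory words (4+ letters) that occur in the passage."""
    important = {w for w in tokens(backstory) if len(w) > 3}
    if not important:
        return 0.0
    return len(important & set(tokens(passage))) / len(important)


def negation_mismatch(backstory: str, passage: str) -> bool:
    """True when exactly one of the two texts contains a negation cue."""
    in_backstory = bool(NEGATION_CUES & set(tokens(backstory)))
    in_passage = bool(NEGATION_CUES & set(tokens(passage)))
    return in_backstory != in_passage


def check_consistency(backstory: str, passage: str, threshold: float = 0.5):
    """Return (label, score); a negation mismatch halves the overlap score."""
    score = keyword_overlap(backstory, passage)
    if negation_mismatch(backstory, passage):
        score *= 0.5  # assumed penalty
    label = "CONSISTENT" if score >= threshold else "INCONSISTENT"
    return label, score


passage = "Faria was imprisoned in the Chateau d'If for years"
print(check_consistency("Faria was imprisoned in the Chateau", passage))
print(check_consistency("Faria was never imprisoned in the Chateau", passage))
```

In the second call the overlap is still high, and it is the negation cue that pushes the score below the threshold, which is the role contradiction detection plays in the list above.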

🏆 Hackathon Compliance

Requirements Met

  • ✅ Runnable Code - Works on Windows and Linux
  • ✅ Clean Environment - Self-contained with requirements.txt
  • ✅ Novel Analysis - Processes full novel texts
  • ✅ Consistency Checking - Determines backstory validity
  • ✅ Learning Component - Improves from examples
  • ✅ No System Hanging - Safe for all environments

Pathway Alternative

Because Pathway does not run natively on Windows, this project uses:

  • CSV Storage - Same functionality, Windows-compatible
  • Pandas - Data manipulation and analysis
  • Custom Caching - Persistent learning system
  • Error Handling - Robust fallbacks
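
As a sketch of the CSV storage idea, the example below splits text into fixed-size word chunks, writes them as CSV rows, and reads them back. It uses the standard-library `csv` module to stay self-contained, whereas the project lists Pandas for this role; the column names and chunk size are assumptions, not the project's actual schema.

```python
import csv
import io


def chunk_text(text: str, size: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def write_chunks(handle, novel: str, chunks: list[str]) -> None:
    """Write one CSV row per chunk under an assumed (novel, chunk_id, text) schema."""
    writer = csv.writer(handle)
    writer.writerow(["novel", "chunk_id", "text"])
    for i, chunk in enumerate(chunks):
        writer.writerow([novel, i, chunk])


def read_chunks(handle) -> list[dict]:
    """Read the chunk rows back as dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
write_chunks(buffer, "demo", chunk_text("one two three four five", size=2))
buffer.seek(0)
rows = read_chunks(buffer)
print([row["text"] for row in rows])  # ['one two', 'three four', 'five']
```

When writing to a real file, open it with `newline=""` as the csv module documentation recommends, so that rows are not separated by blank lines on Windows.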

🎯 Future Improvements

  1. Enhanced Learning - More sophisticated pattern recognition
  2. Better Entity Extraction - Named entity recognition
  3. Sentiment Analysis - Emotional consistency checking
  4. Timeline Validation - Chronological consistency
  5. Cross-Novel Analysis - Character development tracking

📞 Support

For issues or questions:

  1. Check the Windows compatibility notes above
  2. Use the robust pipeline for Windows systems
  3. View results with python view_results.py
  4. Check error logs in the console output

Note: This project is designed to work on Windows while maintaining the same functionality as Linux-based Pathway implementations.
