# Hadith Processing Workflow

This notebook shows how to use the hadith processing package to analyze hadith texts with the Together API. The workflow covers:

1. Loading and setting up dependencies
2. Configuring the API client
3. Processing a single file
4. Processing an entire directory
5. Viewing and analyzing results

In [1]:
import os

from dotenv import load_dotenv

from src import HadithProcessor

# Load API key from environment
load_dotenv()
api_key = os.getenv('TOGETHER_API_KEY')

if not api_key:
    raise ValueError("TOGETHER_API_KEY is not set; add it to your .env file or environment")

# Initialize processor
processor = HadithProcessor(api_key)

## Process a Single File

Let's start by processing a single hadith file to verify everything works correctly.

In [None]:
# Define input and output paths
DATA_DIR = "../DATA"
input_file = os.path.join(DATA_DIR, "ProcessingOutput/Sahih_bukhari/1.json")
output_file = os.path.join(DATA_DIR, "DistilitionOutput/Sahih_bukhari/1.json")

# Create output directory if needed
os.makedirs(os.path.dirname(output_file), exist_ok=True)

# Process single file
processed_count, tokens_before, tokens_after = processor.process_single_file(
    input_file=input_file,
    output_file=output_file
)

print(f"Processed {processed_count} hadiths")
print(f"Tokens before: {tokens_before}")
print(f"Tokens after: {tokens_after}")
if tokens_after > 0:
    print(f"Reduction ratio: {tokens_before/tokens_after:.2f}x")
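
The reduction ratio above is one way to read the token counts. As a minimal sketch, the same numbers can also be reported as tokens saved and percentage saved. The helper name `token_reduction` and the sample counts are hypothetical and not part of the hadith processing package.

```python
def token_reduction(tokens_before: int, tokens_after: int) -> dict:
    """Summarize how much a processing run shrank the token count."""
    if tokens_before < 0 or tokens_after < 0:
        raise ValueError("Token counts must be non-negative")

    saved = tokens_before - tokens_after
    # The ratio is undefined when no tokens remain after processing.
    ratio = tokens_before / tokens_after if tokens_after > 0 else None
    # The percentage is undefined when there were no tokens to begin with.
    percent_saved = 100 * saved / tokens_before if tokens_before > 0 else None
    return {"saved": saved, "ratio": ratio, "percent_saved": percent_saved}


# Illustrative counts only, not real results from the pipeline.
summary = token_reduction(1200, 400)
print(summary)
```

Returning `None` instead of raising on a zero count keeps the summary usable for files where nothing was processed.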

## Process an Entire Directory

Now let's process all files in a directory. This will:
1. Find all JSON files in the input directory
2. Process each file and save results to output directory
3. Print summary statistics

In [None]:
# Define input and output directories
input_dir = os.path.join(DATA_DIR, "ProcessingOutput/Sahih_bukhari")
output_dir = os.path.join(DATA_DIR, "DistilitionOutput/Sahih_bukhari")

# Process entire directory
processor.process_directory(
    input_dir=input_dir,
    output_dir=output_dir
)
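
The first two steps listed above (finding the JSON files and mapping each one to an output path) can be sketched as follows. This is an illustration under assumptions, not the actual implementation of `process_directory`: the helper name `plan_directory_run` is hypothetical, and it assumes a flat directory with output files that keep the input file names.

```python
import tempfile
from pathlib import Path


def plan_directory_run(input_dir, output_dir):
    """Pair every JSON file in input_dir with its target path in output_dir."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    pairs = []
    # sorted() keeps the order stable across runs (lexicographic by path).
    for source in sorted(input_path.glob("*.json")):
        pairs.append((source, output_path / source.name))
    return pairs


# Demonstrate on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    in_dir = Path(tmp) / "in"
    in_dir.mkdir()
    for name in ("2.json", "1.json", "notes.txt"):
        (in_dir / name).write_text("[]", encoding="utf-8")

    plan = plan_directory_run(in_dir, Path(tmp) / "out")
    print([(src.name, dst.name) for src, dst in plan])
```

Building the list of pairs first makes it easy to preview what a run would touch before any API calls are made.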

## Review Results

Let's look at an example of the processed output to verify the quality.

In [None]:
import json

# Load a processed file
with open(output_file, 'r', encoding='utf-8') as f:
    data = json.load(f)

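# Show the first entry's results, failing early if the file has no entries
if not data:
    raise ValueError(f"No entries found in {output_file}")
example = data[0]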
print("Original Hadith:")
print(example['hadith'])
print("\nLessons Learned:")
print('\n'.join(example['hadith_lessons']))
print("\nPractical Applications:")
print('\n'.join(example['hadith_application']))
print("\nQuestion-Answer Pairs:")
for qa in example['FT_Pairs']:
    print(f"\nQuestion: {qa['question']}")
    print(f"Answer: {qa['answer']}")
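
Inspecting one entry by eye is a useful spot check. To cover a whole file, a small structural check can flag entries that lack any of the four fields read above. This is a sketch: the helper name `find_incomplete_entries` and the sample entries are hypothetical, and it only assumes that each entry is a dict with the keys used in the previous cell.

```python
REQUIRED_KEYS = ("hadith", "hadith_lessons", "hadith_application", "FT_Pairs")


def find_incomplete_entries(entries):
    """Return (index, missing_keys) for every entry lacking an expected field."""
    problems = []
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems.append((index, missing))
    return problems


# Placeholder entries for illustration; pass `data` from the cell above instead.
sample = [
    {"hadith": "placeholder text", "hadith_lessons": [],
     "hadith_application": [], "FT_Pairs": []},
    {"hadith": "placeholder text"},
]
print(find_incomplete_entries(sample))
```

An empty result means every entry has all four fields; it says nothing about the quality of their contents.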