# Phase 2: Story2Audio Pipeline

This notebook implements the Story2Audio pipeline for Phase 2 of the NLP project. It:
- Preprocesses a story into chunks.
- Enhances each chunk with tiiuae/falcon-rw-1b to give it a storytelling tone.
- Generates audio with Kokoro TTS.
- Stitches audio into a final .mp3 file.

**Requirements**:
- Python 3.8+
- FFmpeg installed and added to PATH
- Dependencies: transformers, torch, bark, pydub, scipy
- Hardware: CPU (GPU recommended for faster inference)

**Output**: outputs/final_story.mp3
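
The requirements above can be checked before running the pipeline. The following is a minimal sketch, not part of the project code: `check_environment` and `REQUIRED_MODULES` are hypothetical names, and the module list is an assumption derived from the dependency list (the TTS package is left out because its import name is not shown in this notebook).

```python
import importlib.util
import shutil
import sys

# Assumption: import names match the dependency list above; the installed
# package name can differ from the import name for some libraries.
REQUIRED_MODULES = ["transformers", "torch", "pydub", "scipy"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    for name in modules:
        # find_spec returns None for a top-level module that cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

An empty result only means the prerequisites were found; it does not verify library versions or GPU availability.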


In [1]:
import numpy as np
print(np.__version__)
print(np.array([1, 2, 3]))


1.24.4
[1 2 3]


In [3]:
import os
import logging
from src.preprocess import chunk_story
from src.enhancer_local import StoryEnhancer
from src.kokoro_tts import text_to_coqui_audio
from src.utils import combine_audio

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


# Step 1: Load and Chunk Story



In [4]:
try:
    # Load sample story
    with open('sample_story.txt', 'r', encoding='utf-8') as f:
        story_text = f.read()
    logger.info('✅Story loaded successfully')

    # Chunk story (~150 words per chunk)
    chunks = chunk_story(story_text, chunk_size=150)
    logger.info(f'✅Story split into {len(chunks)} chunks')
except Exception as e:
    logger.error(f'Error in preprocessing: {e}')
    raise


INFO:__main__:✅Story loaded successfully
INFO:__main__:✅Story split into 1 chunks
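
The implementation of `chunk_story` lives in `src/preprocess.py` and is not shown in this notebook. As a rough illustration of what fixed-size word chunking might look like, here is a sketch under that assumption; `chunk_by_words` is a hypothetical name, and the real function may split on sentence boundaries instead.

```python
def chunk_by_words(text, chunk_size=150):
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    # Step through the word list in strides of chunk_size
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


demo = " ".join(f"w{i}" for i in range(320))
print([len(c.split()) for c in chunk_by_words(demo)])  # [150, 150, 20]
```

With this scheme, any story of 150 words or fewer yields a single chunk, which would be consistent with the "1 chunks" log line above.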


# Step 2: Enhance Chunks with tiiuae/falcon-rw-1b


In [5]:
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch


model_id = "tiiuae/falcon-rw-1b"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",              # Let accelerate handle device assignment
    offload_folder="offload"        # Needed if model is too large for GPU
)

# Do not pass `device` here: placement is already handled by device_map="auto" above
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

prompt = "Once upon a time in a forest full of mysteries,"
results = text_generator(prompt, max_length=150, do_sample=True)
print(results[0]["generated_text"])


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Once upon a time in a forest full of mysteries, there was a story of a boy and his cat. At first, the boy was afraid and he thought it would be a disaster to go out into the forest. When finally he was on the place he decided to test a magical cat, he caught her by her tail and the cat was very scared. She jumped away and disappeared immediately. Then the boy came closer and caught her by her neck. He looked into the eyes of the cat in silence and that was the beginning of a great friendship between them. The boy kept doing experiments in the forest and his friends were watching very patiently. Then a magic book fell into the forest accidentally and the cat went to get it. The boys started to


In [6]:
import logging
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


In [None]:
try:
    # Initialize enhancer
    enhancer = StoryEnhancer()
    logger.info('✅ StoryEnhancer initialized locally')

    # Enhance each chunk
    enhanced_chunks = []
    for idx, chunk in enumerate(chunks):
        enhanced = enhancer.enhance_chunk(chunk)
        enhanced_chunks.append(enhanced)
        logger.info(f'✅Enhanced chunk {idx + 1}/{len(chunks)}')

except Exception as e:
    logger.error(f'Error in enhancement: {e}')
    raise


Device set to use cpu
INFO:src.enhancer_local:Initialized StoryEnhancer locally with model: tiiuae/falcon-rw-1b
INFO:__main__:StoryEnhancer initialized with Hugging Face API
INFO:src.enhancer_local:Tokenized input length: 164 tokens
INFO:__main__:Enhanced chunk 1/1


# Step 3: Generate Audio with Kokoro TTS


In [7]:
try:
    os.makedirs('outputs/temp', exist_ok=True)
    audio_files = text_to_coqui_audio(enhanced_chunks, output_dir='outputs/temp')
    logger.info(f'Generated audio files: {audio_files}')
except Exception as e:
    logger.error(f'Error in audio generation: {e}')
    raise

ERROR:__main__:Error in audio generation: name 'enhanced_chunks' is not defined


NameError: name 'enhanced_chunks' is not defined
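
The `NameError` above most likely means the enhancement cell had not completed in the current kernel session, so `enhanced_chunks` was never assigned. One way to fail with a clearer message is to check for upstream variables at the top of a cell. This is a minimal sketch; `missing_names` is a hypothetical helper, not part of the project.

```python
def missing_names(namespace, required):
    """Return the names in `required` that are absent from `namespace`."""
    return [name for name in required if name not in namespace]


# Hypothetical usage at the top of the Step 3 cell
missing = missing_names(globals(), ["enhanced_chunks"])
if missing:
    print(f"Run the earlier cells first; undefined: {', '.join(missing)}")
```

Re-running the notebook top to bottom (Restart and Run All) avoids this class of error altogether.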

# Step 4: Stitch Audio into Final MP3


In [None]:
try:
    # Combine audio files
    output_path = 'outputs/final_story.mp3'
    combine_audio(audio_files, output_path)
    logger.info(f'Audio generated: {output_path}')
except Exception as e:
    logger.error(f'Error in audio stitching: {e}')
    raise
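
`combine_audio` is defined in `src/utils.py` and is not shown here. To illustrate the stitching step, the sketch below concatenates WAV files using only the standard-library `wave` module. This is a swapped-in technique, not the project's method: it assumes the per-chunk files are WAV with identical channel count, sample width, and sample rate, and it writes WAV rather than MP3 (MP3 export would need FFmpeg, for example through pydub). `concat_wavs` is a hypothetical name.

```python
import wave


def concat_wavs(paths, out_path):
    """Concatenate WAV files that share the same audio format."""
    if not paths:
        raise ValueError("no input files")
    fmt = None
    frames = []
    for path in paths:
        with wave.open(path, "rb") as src:
            current = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if fmt is None:
                fmt = current
            elif current != fmt:
                raise ValueError(f"{path} has a different audio format")
            # Read the whole file; fine for short clips, costly for long ones
            frames.append(src.readframes(src.getnframes()))
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(fmt[0])
        dst.setsampwidth(fmt[1])
        dst.setframerate(fmt[2])
        for chunk in frames:
            dst.writeframes(chunk)
    return out_path
```

Input order determines playback order, so the list passed in should follow the chunk order from Step 1.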


# Step 5: Verify Output


In [None]:
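# An existence and size check only; it does not prove the file is playable
if os.path.exists(output_path) and os.path.getsize(output_path) > 0:
    logger.info('✅ Verification: Final audio file exists and is non-empty')
else:
    logger.error('❌ Verification: Final audio file not found or empty')
    raise FileNotFoundError(f'Output file {output_path} not found or empty')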