# Story Generation
We remember things better as stories. The plan here is to pick a subset of our phrases, extract the vocabulary, and generate a story based on it. We can then pull in more flashcards / phrases to ensure more complete phrase coverage.

The story name will take the form story_some_title. When added as a tag in Anki, it produces a hyperlink to a Google Cloud Storage bucket, using the path format bucket/language/story_name/story_name.html

This makes it easy to add new stories to an existing flashcard deck, and the links update as soon as you add the tags.
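As a rough illustration of the tag-to-link convention described above, the sketch below turns a title into a tag and then into a URL. The helper names `clean_story_name` and `story_url_from_tag`, the bucket name, the language value, and the use of the public `https://storage.googleapis.com/` host are all assumptions made for this example; the project's own link generation may differ.

```python
def clean_story_name(title: str) -> str:
    """Turn a human-readable title into a tag such as 'story_some_title'."""
    return "story_" + title.strip().lower().replace(" ", "_")


def story_url_from_tag(tag: str, bucket: str, language: str) -> str:
    """Build bucket/language/story_name/story_name.html for a story tag.

    Assumption: objects are served from the public storage.googleapis.com host.
    """
    if not tag.startswith("story_"):
        raise ValueError(f"not a story tag: {tag!r}")
    return f"https://storage.googleapis.com/{bucket}/{language}/{tag}/{tag}.html"


# Hypothetical bucket and language names, for illustration only.
tag = clean_story_name("Underwater Community Centre")
url = story_url_from_tag(tag, bucket="my-public-bucket", language="french")
print(url)
```

Because the URL is derived purely from the tag, adding the tag to a card is enough to make the link resolvable, provided the HTML file exists at that path.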

In [1]:
%load_ext autoreload
%autoreload 2
import os
import sys

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from google.auth import default
credentials, project = default()

In [2]:

from src.config_loader import config
from src.nlp import (
    get_vocab_dictionary_from_phrases,
)
from src.gcs_storage import get_phrase_path, upload_to_gcs, read_from_gcs, get_phrase_index_path, get_story_dialogue_path


### Add directories
Story images can be reused between languages, but audio files are language-specific. We therefore structure the story directory as story_name/language, with audio files in 'language/' and the images and the English JSON file in the story_name directory.
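The layout described above could be expressed with a small path helper like the one below. This is only a sketch: the function `story_paths` and the JSON file name are hypothetical, and the real paths come from helpers in `src.gcs_storage`.

```python
from pathlib import PurePosixPath


def story_paths(story_name: str, language: str) -> dict:
    """Return illustrative object paths for one story.

    Shared assets (images, English JSON) sit in story_name/;
    language-specific audio sits in story_name/language/.
    """
    root = PurePosixPath(story_name)
    return {
        "dialogue_json": str(root / f"{story_name}.json"),  # hypothetical file name
        "image_dir": str(root),
        "audio_dir": str(root / language),
    }


paths = story_paths("story_underwater_community_centre", "french")
print(paths["audio_dir"])
```

Keeping images at the story level means a second language only needs new audio, not new images.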

We already have flashcards generated for some phrases. A flashcard index lets us select flashcards that cover a specific vocabulary range. It is quite computationally expensive to build, and is generated using create_flashcard_index.
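To make the idea of a flashcard index concrete, here is a minimal sketch of one possible approach: an inverted index from words to phrase positions, plus a greedy selection of phrases that cover a target vocabulary. It is not the implementation behind create_flashcard_index, whose structure is not shown in this notebook; the function names and the naive whitespace tokenisation are assumptions.

```python
from collections import defaultdict


def build_word_index(phrases: list[str]) -> dict[str, set[int]]:
    """Map each lowercase word to the positions of the phrases containing it."""
    index = defaultdict(set)
    for i, phrase in enumerate(phrases):
        for word in phrase.lower().split():
            index[word.strip(".,!?")].add(i)
    return dict(index)


def phrases_covering(vocab: set[str], index: dict[str, set[int]]) -> list[int]:
    """Greedily pick phrase positions until every indexed vocab word is covered."""
    remaining = {w for w in vocab if w in index}
    chosen = []
    while remaining:
        # Count how many still-uncovered words each candidate phrase contains.
        counts = defaultdict(int)
        for word in remaining:
            for i in index[word]:
                counts[i] += 1
        # Ties are broken in favour of the lowest phrase position.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = {w for w in remaining if best not in index[w]}
    return chosen


example_phrases = ["I want to swim", "She wants a book", "We swim and read a book"]
selected = phrases_covering({"swim", "book", "want"}, build_word_index(example_phrases))
print([example_phrases[i] for i in selected])
```

Building the index is the expensive step, since it touches every word of every phrase; once it exists, lookups for a given vocabulary are cheap, which is why it makes sense to generate it once and store it.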

In [3]:
COLLECTION = "WarmUp150"
phrases = read_from_gcs(bucket_name=config.GCS_PRIVATE_BUCKET,
                        file_path=get_phrase_path(collection=COLLECTION))
phrase_index = read_from_gcs(bucket_name=config.GCS_PRIVATE_BUCKET,
                             file_path=get_phrase_index_path(collection=COLLECTION))



## If generating a new story - randomly sample some new phrases

We want to sample from phrases that have no tags.
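A minimal sketch of that sampling step is shown below. It assumes tags are available as a mapping from phrase to a list of tags (`tags_by_phrase`), which is a hypothetical structure; in practice the tags live in the Anki deck and would need to be exported first.

```python
import random


def sample_untagged(phrases, tags_by_phrase, k, seed=None):
    """Randomly pick up to k phrases that have no tags yet."""
    # A phrase counts as untagged if it is missing from the mapping
    # or mapped to an empty list.
    untagged = [p for p in phrases if not tags_by_phrase.get(p)]
    rng = random.Random(seed)
    return rng.sample(untagged, min(k, len(untagged)))


example_phrases = ["a", "b", "c", "d"]
example_tags = {"a": ["story_x"], "b": []}
picked = sample_untagged(example_phrases, example_tags, k=2, seed=0)
print(picked)
```

Passing a fixed `seed` keeps the selection reproducible when the cell is re-run.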

In [None]:
len(phrases)
# Interactive cell: go through the phrases, generate a story, adjust the story name, then upload.


In [None]:
vocab_dict_flashcards = get_vocab_dictionary_from_phrases(phrases[210:])  # 75 phrases should give a decent amount of vocab
print(f"{len(vocab_dict_flashcards['verbs'])} verbs and {len(vocab_dict_flashcards['vocab'])} vocab words")

In [None]:
vocab_dict_flashcards

Now generate the story

In [None]:
from src.dialogue_generation import generate_story

story_name, story_dialogue = generate_story(vocab_dict_flashcards)
print(f"story_name is {story_name} for {COLLECTION}")
print(f"Story parts are {story_dialogue.keys()}")


In [None]:
story_dialogue

In [None]:
# Adjust the generated title if needed, then convert it to the story_some_title form.
story_name = "Underwater Community Centre"
clean_story_name = f"story_{story_name.lower().replace(' ', '_')}"
upload_to_gcs(obj=story_dialogue, bucket_name=config.GCS_PRIVATE_BUCKET,
              file_name=get_story_dialogue_path(clean_story_name, COLLECTION))

Image files for each part of the story:

In [None]:
# --- Generate and upload images for all stories in the collection ---
from src.gcs_storage import get_story_names, get_story_dialogue_path, read_from_gcs
from src.images import generate_and_save_story_images
from src.config_loader import config

all_story_names = get_story_names(collection=COLLECTION, bucket_name=config.GCS_PRIVATE_BUCKET)
print(f"Found {len(all_story_names)} stories in collection '{COLLECTION}':", all_story_names)

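# Note: [1:] skips the first story in the list; iterate over all_story_names to process every story.
for story_name in all_story_names[1:]: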
    print(f"\nProcessing story: {story_name}")
    dialogue_path = get_story_dialogue_path(story_name, collection=COLLECTION)
    try:
        story_dialogue = read_from_gcs(
            bucket_name=config.GCS_PRIVATE_BUCKET,
            file_path=dialogue_path,
            expected_type="json"
        )
    except Exception as e:
        print(f"  ❌ Failed to load dialogue for {story_name}: {e}")
        continue
    try:
        image_results = generate_and_save_story_images(
            story_dict=story_dialogue,
            story_name=story_name,
            collection=COLLECTION,
            model_order=["deepai", "stability"]
        )
        print(f"  ✅ Images generated and uploaded for {story_name}: {image_results}")
    except Exception as e:
        print(f"  ❌ Failed to generate/upload images for {story_name}: {e}")
