# Story Generation
We remember things better as stories. The plan here is to pick a subset of our phrases, extract their vocabulary, and generate a story from it. We can then pull in more flashcards / phrases to cover the story's vocabulary more completely.

In [2]:
%load_ext autoreload
%autoreload 2

In [47]:
from dotenv import load_dotenv
import sys
import os
import pickle
from pathlib import Path
load_dotenv()
# Add the parent directory of 'src' to the Python path
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [90]:
from src.utils import load_text_file, save_json, load_json
from src.nlp import get_vocab_dictionary_from_phrases, get_vocab_dict_from_dialogue, compare_vocab_overlap, create_flashcard_index
from src.config_loader import config
from pprint import pprint
import random

filepath = "../data/longman_1000_phrases.txt"
phrases = load_text_file(filepath)
pprint(f"First few phrases {phrases[:10]}")

# We already have flashcards generated for some phrases.
# A flashcard index lets us select flashcards that cover a specific
# vocabulary range. It is computationally expensive to build and is
# generated using create_flashcard_index.



("First few phrases ['Do you want to become a famous writer?', 'Let me show "
 "you around the city', 'We need to handle this situation carefully', 'Stop "
 'wasting time on this\', \'Do you like playing the guitar at night?\', "I\'m '
 'taking a vacation next month", "Don\'t forget to wear a helmet while '
 'cycling", "Let\'s cut unnecessary expenses this year", "We\'re producing a '
 'new product soon", \'Did you remember to turn off the stove?\']')


## Create the flashcard index

In [93]:
# Building the index is a long process, so only create it if it doesn't exist
notebook_dir = Path().absolute()  # This gives src/notebooks
index_path = notebook_dir.parent / "data" / "longman_1000_phrase_index.json"

if index_path.exists():
    phrase_index = load_json(index_path)
else:
    phrase_index = create_flashcard_index(phrases)
    save_json(phrase_index, index_path)


Indexes phrases...: 100%|██████████| 841/841 [17:26<00:00,  1.24s/it]
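
To make the idea of a flashcard index concrete, here is a minimal sketch of an inverted index that maps each word to the phrases containing it. The real structure returned by `create_flashcard_index` is not shown in this notebook, so `tokenize`, `build_toy_index`, and the index layout are hypothetical, and the sketch uses plain lowercase tokens rather than lemmas.

```python
import re
from collections import defaultdict


def tokenize(phrase):
    """Lowercase a phrase and split it into simple word tokens (no lemmatisation)."""
    return re.findall(r"[a-z']+", phrase.lower())


def build_toy_index(phrases):
    """Map each word to the set of positions of the phrases that contain it.

    Hypothetical layout: the real flashcard index may store different data.
    """
    index = defaultdict(set)
    for position, phrase in enumerate(phrases):
        for word in tokenize(phrase):
            index[word].add(position)
    return dict(index)


toy_phrases = [
    "Let me show you around the city",
    "Stop wasting time on this",
    "Did you remember to turn off the stove?",
]
toy_index = build_toy_index(toy_phrases)

# Look up which phrases contain the word "you"
print(sorted(toy_index["you"]))
```

With an index like this, finding candidate flashcards for a word is a dictionary lookup instead of a scan over every phrase, which is why paying the build cost once can be worthwhile.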


## Sample some phrases to generate the story from
This will pin the story to the vocab found in some pre-existing phrases

In [7]:
vocab_dict_flashcards = get_vocab_dictionary_from_phrases(phrases[:50])

Now generate the story

In [None]:
from src.dialogue_generation import generate_story

story_path = notebook_dir.parent / "data" / "stories" / "test_story" / "story_community_park.json"

if story_path.exists():
    story_50_phrases = load_json(story_path)
else:
    story_50_phrases = generate_story(vocab_dict_flashcards)
    save_json(story_50_phrases, story_path)
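
The "load if it exists, otherwise generate and save" pattern appears for both the index and the story, so it could be factored into a small helper. The sketch below is one possible version that uses the standard `json` module directly; `load_or_create` is a hypothetical name and is not part of `src.utils`. Creating the parent directory is an extra step added here, since it is not clear from the notebook whether `save_json` does this.

```python
import json
import tempfile
from pathlib import Path


def load_or_create(path, create_fn):
    """Return the JSON content at `path`, creating it with `create_fn` if missing."""
    path = Path(path)
    if path.exists():
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    result = create_fn()  # the expensive step, e.g. an LLM call
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    return result


# Demonstration with a stand-in for the expensive call
calls = []


def expensive():
    calls.append(1)
    return {"title": "demo"}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "stories" / "demo.json"
    first = load_or_create(target, expensive)   # generates and saves
    second = load_or_create(target, expensive)  # loads from disk

print(len(calls))
```

In the notebook this would be called with something like `lambda: generate_story(vocab_dict_flashcards)` as `create_fn`, assuming the story is JSON-serialisable.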


The LLM goes somewhat beyond the vocab found in the flashcards, so the story contains words that no existing flashcard covers.

In [134]:
from src.nlp import get_vocab_dict_from_dialogue
vocab_dict_story = get_vocab_dict_from_dialogue(story_50_phrases, limit_story_parts=None)

In [135]:
from src.nlp import find_missing_vocabulary

vocab_overlap = find_missing_vocabulary(vocab_dict_flashcards, vocab_dict_story)

=== VOCABULARY COVERAGE ANALYSIS ===
Target verbs covered by flashcards: 46.3%
Target vocabulary covered by flashcards: 31.1%

Verbs needing new flashcards:
['give', 'could', 'cycle', 'love', 'involve'] ...

Vocabulary needing new flashcards:
['absolutely', 'work', 'forward', 'since', 'lot'] ...
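
The coverage figures above can be thought of as simple set arithmetic: the share of the story's words that also appear in the flashcards. The sketch below assumes both vocab dictionaries have the `'verbs'` and `'vocab'` list keys seen in `missing_vocab` later in this notebook. `coverage_report` is a hypothetical stand-in, not the actual implementation of `find_missing_vocabulary`.

```python
def coverage_report(flashcard_vocab, story_vocab):
    """Compute per-category coverage of the story vocab by the flashcard vocab."""
    report = {}
    for key in ("verbs", "vocab"):
        known = set(flashcard_vocab.get(key, []))
        target = set(story_vocab.get(key, []))
        missing = sorted(target - known)
        covered = len(target) - len(missing)
        # An empty target counts as fully covered to avoid dividing by zero
        pct = 100.0 * covered / len(target) if target else 100.0
        report[key] = {"coverage_pct": round(pct, 1), "missing": missing}
    return report


# Toy data, not taken from the real phrase list
flashcards = {"verbs": ["want", "show", "need"], "vocab": ["city", "time"]}
story = {"verbs": ["want", "give", "show", "cycle"], "vocab": ["city", "lot"]}

report = coverage_report(flashcards, story)
print(report["verbs"])
print(report["vocab"])
```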


In [136]:
from src.nlp import get_matching_flashcards_indexed
# Pull all the existing phrases we need to cover the vocab in our story
results = get_matching_flashcards_indexed(vocab_dict_story, phrase_index)
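
Selecting a small set of flashcards that covers a target vocabulary is a set-cover problem. One common approach is a greedy heuristic: repeatedly pick the card that covers the most still-uncovered words. The sketch below illustrates that heuristic on toy data. It is an assumption about how such a selection could work, not a description of `get_matching_flashcards_indexed`, and `greedy_cover` and the card-to-words mapping are hypothetical.

```python
def greedy_cover(target_words, card_words):
    """Greedily pick cards until no card covers any remaining target word.

    card_words: dict mapping a phrase to the set of words it contains.
    Returns (selected phrases in pick order, words no card could cover).
    """
    remaining = set(target_words)
    selected = []
    while remaining and card_words:
        # Pick the card covering the most uncovered words
        best = max(card_words, key=lambda phrase: len(card_words[phrase] & remaining))
        gained = card_words[best] & remaining
        if not gained:
            break  # nothing left that any card can cover
        selected.append(best)
        remaining -= gained
    return selected, remaining


# Toy data with hand-written word sets
cards = {
    "Let me show you around the city": {"let", "show", "city"},
    "Stop wasting time on this": {"stop", "waste", "time"},
    "We need to show more": {"need", "show"},
}
target = {"show", "city", "time", "potluck"}

selected, uncovered = greedy_cover(target, cards)
print(selected)
print(uncovered)
```

The greedy heuristic does not guarantee the smallest possible set of cards, but it is simple and usually close enough for this purpose. Words left in `uncovered` correspond to the vocabulary that needs newly generated flashcards.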

In [137]:
proposed_flashcard_phrases = [card.get('phrase') for card in results['selected_cards']]
vocab_from_new_flashcards = get_vocab_dictionary_from_phrases(proposed_flashcard_phrases)
new_overlap = find_missing_vocabulary(vocab_from_new_flashcards, vocab_dict_story)

=== VOCABULARY COVERAGE ANALYSIS ===
Target verbs covered by flashcards: 88.1%
Target vocabulary covered by flashcards: 83.3%

Verbs needing new flashcards:
['cycle', 'plant', 'involve', 'delay', 'brainstorm'] ...

Vocabulary needing new flashcards:
['potluck', 'hey', 'spot', 'charge', 'able'] ...


In [138]:
# We can fill the gap by generating flashcards for the missing vocab:

missing_vocab_dict = new_overlap['missing_vocab']
missing_vocab_dict

{'verbs': ['cycle',
  'plant',
  'involve',
  'delay',
  'brainstorm',
  'ride',
  'accomplish',
  'create'],
 'vocab': ['potluck',
  'hey',
  'spot',
  'charge',
  'able',
  'outdoors',
  'sore',
  'perfect',
  '6',
  'empty',
  'productive',
  'campaign',
  'maybe',
  'fundraiser',
  'construction',
  'handiwork',
  'fundraising',
  'support',
  'finger',
  'glad',
  'wow',
  'downtown',
  'snack',
  'alright',
  'disappointing',
  'proud',
  'agreed',
  'mini',
  'apparently',
  'hmm']}

In [139]:
from src.phrase import generate_phrases_from_vocab_dict

missing_phrases = generate_phrases_from_vocab_dict(missing_vocab_dict)
missing_phrases

Function that called this one: generate_minimal_phrases_with_llm. Sleeping for 20 seconds
Iteration 1/10
Generated 11 phrases - with minimal phrase prompt
We have 0 verbs and 0 vocab words left
All words have been used. Phrase generation complete.


["Hey, let's brainstorm ideas for our fundraising campaign downtown.",
 'Wow, your handiwork on this mini plant is perfect!',
 'Maybe we can create a productive cycle to accomplish more.',
 'Did you spot that disappointing construction delay downtown?',
 "I'm glad you're able to support the potluck fundraiser.",
 'Shall we ride our bikes outdoors this afternoon?',
 'Hmm, apparently my finger is sore from typing.',
 "Don't forget to charge your phone before leaving.",
 'We agreed to involve 6 people in the project.',
 "Are you proud of what we've been able to accomplish?",
 "Alright, let's have a snack and empty our minds."]

In [140]:
num_cards = len(results["selected_cards"])
print(f"We need {num_cards + len(missing_phrases)} flashcards to cover the story")

We need 99 flashcards to cover the story


In [None]:
from src.utils import save_text_file

save_text_file(proposed_flashcard_phrases + missing_phrases, "../data/stories/test_story/test_phrases.txt")

We will need to generate images for the missing phrases; then we can create an Anki deck for that particular story.

In [145]:
from src.images import add_images_to_phrases
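PAY_FOR_API = True  # image generation calls a paid API, so guard it explicitly

output_dir = notebook_dir.parent / "data" / "longman_phrase_images" / "longman1000"

# Skip the paid calls if the output directory is not where we expect it
if not output_dir.exists():
    print(f"Output directory not found: {output_dir}")
    PAY_FOR_API = False

if PAY_FOR_API:
    image_files_and_prompts = add_images_to_phrases(phrases=missing_phrases, output_dir=output_dir)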



  0%|          | 0/11 [00:00<?, ?it/s]

Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 16/16 [00:16<00:00,  1.01s/it]


No image generated using imagen-3.0-generate-001 with prompt: A group of diverse people huddled around a table in a bustling downtown setting, their heads filled with glowing lightbulbs, while a large piggy bank sits centerpiece on the table. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 4/4 [00:04<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 17/17 [00:17<00:00,  1.01s/it]


No image generated using imagen-3.0-generate-001 with prompt: A person with an excited expression admiring a tiny, intricately crafted miniature garden in a small pot, their hands positioned as if they just finished working on it. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 5/5 [00:05<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 16/16 [00:16<00:00,  1.01s/it]


No image generated using imagen-3.0-generate-001 with prompt: A group of diverse people forming a human chain around a giant spinning wheel, each person adding a gear or tool to the wheel as it rotates, with completed projects and achievements falling from the wheel onto a growing pile below. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 6/6 [00:06<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 16/16 [00:16<00:00,  1.01s/it]


No image generated using imagen-3.0-generate-001 with prompt: A frustrated person pointing at a half-finished building in a city center, with construction signs and equipment scattered around, while other pedestrians look disappointed and shake their heads. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 4/4 [00:04<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 17/17 [00:17<00:00,  1.01s/it]


No image generated using imagen-3.0-generate-001 with prompt: A cheerful person holding a dish for a potluck, surrounded by a diverse group of supporters, with a festive fundraiser banner and donation jar nearby. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 6/6 [00:06<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 17/17 [00:17<00:00,  1.01s/it]


No image generated using imagen-3.0-generate-001 with prompt: Two friends on colorful bicycles, smiling and gesturing towards a sunny outdoor park path, with an afternoon sun visible in the sky and trees lining the route. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 7/7 [00:07<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 17/17 [00:17<00:00,  1.01s/it]


No image generated using imagen-3.0-generate-001 with prompt: A frustrated person with an exaggerated, swollen red finger hovering over a keyboard, surrounded by thought bubbles containing question marks and exclamation points. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 2/2 [00:02<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 16/16 [00:16<00:00,  1.01s/it]


No image generated using imagen-3.0-generate-001 with prompt: A stressed person frantically searching for a charger in a messy bag near an open door, with a faded phone battery icon hovering above them. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 6/6 [00:06<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 17/17 [00:17<00:00,  1.00s/it]


No image generated using imagen-3.0-generate-001 with prompt: Six diverse people huddled around a large project blueprint, enthusiastically pointing and discussing, with a handshake between two of them symbolizing agreement. in the style of a children's book illustration, Axel Scheffler style, thick brushstrokes, colored pencil texture, expressive characters, bold outlines, textured shading, pastel color palette


Waiting for API cooldown: 100%|██████████████| 5/5 [00:05<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.00s/it]
Waiting for API cooldown: 100%|████████████| 17/17 [00:17<00:00,  1.00s/it]
Waiting for API cooldown: 100%|██████████████| 9/9 [00:09<00:00,  1.01s/it]


Function that called this one: create_image_generation_prompt. Sleeping for 20 seconds


Waiting for API cooldown: 100%|████████████| 19/19 [00:19<00:00,  1.01s/it]
Waiting for API cooldown: 100%|████████████| 16/16 [00:16<00:00,  1.00s/it]
100%|██████████| 11/11 [10:34<00:00, 57.67s/it]
