# Existing Story in a new language
Here we load an existing story JSON file (which contains the English text) and process it into a new target language.

In [1]:
%load_ext autoreload
%autoreload 2
from dotenv import load_dotenv
load_dotenv()

PAY_FOR_API = True  # set to True to run cells that cost money via API calls
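
The flag above is not referenced again in this notebook. One way to put it to use, sketched here with a hypothetical `run_if_paid` helper (not part of `src`), is to wrap any call that hits a paid API so it is skipped when the flag is off:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_if_paid(step: Callable[[], T], enabled: bool, label: str = "step") -> Optional[T]:
    """Run `step` only when `enabled` is True; otherwise skip it and return None.

    Hypothetical helper for illustration; it is not part of the project.
    """
    if not enabled:
        print(f"Skipping {label}: paid API calls are disabled")
        return None
    return step()


# Stand-in for a paid call; in the notebook this could wrap add_translations(...)
result = run_if_paid(lambda: "translated", enabled=False, label="translation")
```

With `enabled=PAY_FOR_API`, re-running the whole notebook would no longer trigger translation or text-to-speech charges by accident.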

In [2]:
from pathlib import Path
from src.config_loader import config
from PIL import Image
from src.utils import load_json, load_text_file, save_json, save_pickle, upload_to_gcs, upload_story_to_gcs
from src.generate import add_audio, add_translations
from src.story import generate_index_html
# Add the parent directory of 'src' to the Python path


FFmpeg path added to system PATH: C:\Program Files\ffmpeg-7.0-essentials_build\bin


### Add directories
Story images can be reused between languages, but audio files are language-specific. We therefore structure the story directory as `story_name/language`: audio files go in the `language/` subdirectory, while the images and the English JSON file stay in the `story_name` directory.
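
As a rough sketch of that layout, the hypothetical `story_paths` helper below (not part of `src`) collects the shared and language-specific locations in one place. The file names mirror the ones used later in this notebook; the example root and story name are placeholders.

```python
from pathlib import Path


def story_paths(story_root: Path, story_name: str, language: str) -> dict:
    """Return the shared and language-specific locations for one story.

    Hypothetical helper for illustration only.
    """
    shared_dir = story_root / story_name   # images + English JSON
    language_dir = shared_dir / language   # audio, pickle, HTML
    return {
        "shared_dir": shared_dir,
        "english_json": shared_dir / f"{story_name}.json",
        "language_dir": language_dir,
        "pickle": language_dir / f"{story_name}.pkl",
        "html": language_dir / f"{story_name}.html",
    }


paths = story_paths(Path("outputs/stories"), "story_example", "French")
print(paths["html"].as_posix())
```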

In [3]:
notebook_dir = Path().absolute()  # current working directory (the notebook's folder, one level below the repo root)
phrase_dir = notebook_dir.parent / "data" / "phrases" #where we store text files of phrases
story_dir = notebook_dir.parent / "outputs" / "stories" # where we store our stories

In [4]:
story_name = "dining_dilemma_at_local_restaurant" #omit the leading story_
clean_story_name = f"story_{story_name.lower().replace(' ', '_')}"
story_path = story_dir / clean_story_name / f"{clean_story_name}.json"

story_dict = load_json(story_path)
print(config.TARGET_LANGUAGE_NAME)
config.get_voice_models()

French


(VoiceInfo(name='en-GB-Studio-B', provider=<VoiceProvider.GOOGLE: 'google'>, voice_type=<VoiceType.STUDIO: 'studio'>, gender='MALE', language_code='en-GB', country_code='GB', voice_id='en-GB-Studio-B'),
 VoiceInfo(name='fr-FR-Studio-A', provider=<VoiceProvider.GOOGLE: 'google'>, voice_type=<VoiceType.STUDIO: 'studio'>, gender='FEMALE', language_code='fr-FR', country_code='FR', voice_id='fr-FR-Studio-A'),
 VoiceInfo(name='fr-FR-Studio-D', provider=<VoiceProvider.GOOGLE: 'google'>, voice_type=<VoiceType.STUDIO: 'studio'>, gender='MALE', language_code='fr-FR', country_code='FR', voice_id='fr-FR-Studio-D'))
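
The name cleaning in `In [4]` assumes the `story_` prefix has been omitted; passing a name that already has it would produce `story_story_...`. A minimal sketch of a more forgiving version, using a hypothetical `clean_name` helper (not part of `src`):

```python
def clean_name(story_name: str) -> str:
    """Lower-case the name, replace spaces with underscores,
    and make sure it carries exactly one 'story_' prefix.

    Hypothetical helper for illustration only.
    """
    slug = story_name.strip().lower().replace(" ", "_")
    return slug if slug.startswith("story_") else f"story_{slug}"


print(clean_name("Dining Dilemma at Local Restaurant"))
```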

## Generate the story files
Once you are happy with the flashcard coverage, you can:
* translate and add audio
* create the story images
* create the story album files (M4A files with synced lyrics)
* create the story HTML file using those previous files, and upload to Google Cloud Storage
* tag the flashcards with the story name, so that you can link to the story from within Anki (the template uses tags to auto-create hyperlinks)

In [5]:
story_dialogue_audio = add_translations(story_dict)
story_dialogue_audio = add_audio(story_dialogue_audio)

adding translations:   0%|          | 0/2 [00:00<?, ?it/s]

Beginning translation for setup


adding translations:  50%|█████     | 1/2 [00:02<00:02,  2.16s/it]

Translated dialogue
Beginning translation for resolution


adding translations: 100%|██████████| 2/2 [00:04<00:00,  2.07s/it]


Translated dialogue


adding audio:   0%|          | 0/2 [00:00<?, ?it/s]

Beginning text-to-speech for setup


Generating dialogue audio: 100%|██████████| 7/7 [00:23<00:00,  3.41s/it]
adding audio:  50%|█████     | 1/2 [00:40<00:40, 40.87s/it]

Text-to-speech for dialogue done
Beginning text-to-speech for resolution


Generating dialogue audio: 100%|██████████| 6/6 [00:20<00:00,  3.36s/it]
adding audio: 100%|██████████| 2/2 [01:01<00:00, 30.71s/it]

Text-to-speech for dialogue done





In [6]:
# the data now contains target-language content, so we save it in the language directory
save_pickle(data=story_dialogue_audio, file_path=story_dir / clean_story_name / config.TARGET_LANGUAGE_NAME / f"{clean_story_name}.pkl")
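
Saving at this point means the translation and text-to-speech steps need not be paid for again. The sketch below shows one way to reload the file in a later session. It assumes `save_pickle` writes a standard Python pickle, which the notebook does not confirm, and `load_pickle_file` is a hypothetical name; the round trip uses stand-in data in a temporary directory.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle_file(file_path: Path):
    """Load a pickled object from disk (assumes a standard pickle file)."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Round trip with stand-in data, mirroring the story_name/language layout
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "French" / "story_example.pkl"
    demo_path.parent.mkdir(parents=True, exist_ok=True)
    with open(demo_path, "wb") as f:
        pickle.dump({"setup": {"dialogue": []}}, f)
    restored = load_pickle_file(demo_path)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.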

Next we create M4A audio files, which you can download and play in a media player.
They have synced lyrics, which can be viewed in the Oto Music Player app.

In [7]:
from src.story import create_album_files
FIRST_STORY_PART = list(story_dialogue_audio.keys())[0]
#may need to change depending on size of story made and what parts there are
album_image = Image.open(story_dir / clean_story_name / f"{clean_story_name}_{FIRST_STORY_PART}.png")
#create m4a file:
create_album_files(story_data_dict=story_dialogue_audio, cover_image=album_image, output_dir=story_dir / clean_story_name / config.TARGET_LANGUAGE_NAME, story_name=clean_story_name)

creating album:  50%|█████     | 1/2 [00:03<00:03,  3.37s/it]

Saved M4A file track number 1


creating album: 100%|██████████| 2/2 [00:05<00:00,  2.97s/it]

Saved M4A file track number 2
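
How `create_album_files` encodes the synced lyrics is not shown here. A common plain-text representation is the LRC format, where each line carries a `[mm:ss.xx]` start time. The sketch below illustrates that format only, under the assumption that each dialogue line has a known audio duration; `to_lrc` is a hypothetical helper and may not match what the project does.

```python
def to_lrc(lines):
    """Build LRC-style lyrics from (text, duration_seconds) pairs.

    Each line is stamped with the time at which it starts, i.e. the
    summed duration of all preceding lines. Illustrative only.
    """
    out = []
    elapsed = 0.0
    for text, duration in lines:
        total_cs = round(elapsed * 100)            # work in centiseconds
        minutes, rem = divmod(total_cs, 6000)
        seconds, centis = divmod(rem, 100)
        out.append(f"[{minutes:02d}:{seconds:02d}.{centis:02d}]{text}")
        elapsed += duration
    return "\n".join(out)


print(to_lrc([("Bonjour", 3.5), ("Merci", 2.0)]))
```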





Now we generate the main HTML file. It embeds the M4A files and image files, so it is self-contained.

In [8]:
from src.story import create_html_story

create_html_story(
            story_data_dict=story_dialogue_audio,
            image_dir=story_dir / clean_story_name, #the language sub-folders will be picked up automatically
            story_name=clean_story_name,
        )

Preparing HTML data: 100%|██████████| 2/2 [02:11<00:00, 65.62s/it]

HTML story created at: y:\Python Scripts\audio-language-trainer\outputs\stories\story_dining_dilemma_at_local_restaurant\French\story_dining_dilemma_at_local_restaurant.html





WindowsPath('y:/Python Scripts/audio-language-trainer/outputs/stories/story_dining_dilemma_at_local_restaurant/French/story_dining_dilemma_at_local_restaurant.html')
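
One standard way to make an HTML file self-contained is to embed binary assets as base64 `data:` URIs. Whether `create_html_story` does exactly this is an assumption; the helpers below (`to_data_uri`, `audio_tag`) are hypothetical and only illustrate the technique.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI usable in src/href attributes."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def audio_tag(m4a_bytes: bytes) -> str:
    """Return an <audio> element with the M4A bytes embedded inline."""
    uri = to_data_uri(m4a_bytes, "audio/mp4")
    return f'<audio controls src="{uri}"></audio>'


# Stand-in bytes; in practice these would come from Path(...).read_bytes()
print(to_data_uri(b"hi", "image/png"))
```

Base64 encoding inflates each asset by roughly a third, so a file that embeds every track and image this way can become large.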

Upload to a public Google Cloud Storage bucket

In [9]:
html_story_path = story_dir / clean_story_name / config.TARGET_LANGUAGE_NAME / f"{clean_story_name}.html"
assert html_story_path.exists()
upload_story_to_gcs(html_file_path=html_story_path)

'https://storage.googleapis.com/audio-language-trainer-stories/french/story_dining_dilemma_at_local_restaurant/story_dining_dilemma_at_local_restaurant.html'
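
The returned URL suggests the object path is `language (lower-cased)/story_name/story_name.html` inside the bucket. The sketch below rebuilds the URL on that assumption, which is inferred from this single output; `public_story_url` is a hypothetical helper and `upload_story_to_gcs` may behave differently for other inputs.

```python
from urllib.parse import quote


def public_story_url(bucket: str, language: str, story_name: str) -> str:
    """Build the public URL for a story's HTML file.

    Assumes the object path pattern seen in the output above.
    """
    object_path = f"{language.lower()}/{story_name}/{story_name}.html"
    # quote() percent-encodes unsafe characters but leaves '/' intact
    return f"https://storage.googleapis.com/{bucket}/{quote(object_path)}"


url = public_story_url(
    "audio-language-trainer-stories",
    "French",
    "story_dining_dilemma_at_local_restaurant",
)
print(url)
```

A helper like this can be handy for generating the links that the Anki card template points to without re-uploading anything.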

Update the index webpage

In [10]:
generate_index_html()
#will default to public GCS bucket
upload_to_gcs(
    file_path="../outputs/stories/index.html",
    content_type="text/html"
)

'https://storage.googleapis.com/audio-language-trainer-stories/index.html'