<a href="https://colab.research.google.com/github/JaganK2Commit/Copilot/blob/main/StoryGenerator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### What is the project about?


This project uses several AI tools to automatically create videos that narrate moral stories for children.

I use multiple models to produce the video content:
- OpenAI gpt-3.5-turbo writes an appropriate moral storyline.
- Detailed scene descriptions are then composed into prompts for ReplicateAI, which generates the relevant images and scenes.
- ResembleAI creates an expressive voiceover in a friendly storyteller tone.

These model calls are chained together using LangChain's SequentialChain, which keeps the scenes continuous by tracking the context and history of the story. The visuals, audio narration, and background music are then combined using MoviePy to produce the final animated video.

Automating the video creation process makes good-quality moral stories quickly accessible to parents and children.

### AI tools and models used

| Task                      | Service          | Model                  |
|---------------------------|------------------|------------------------|
| Story script              | OpenAI API       | gpt-3.5-turbo          |
| Text to image             | Replicate AI API | ai-forever/kandinsky-2 |
| Background music          | Replicate AI API | riffusion/riffusion    |
| Ending-quote audio        | Replicate AI API | suno-ai/bark           |
| Text to audio (narration) | Resemble AI API  | -                      |


### Install the dependencies

In [94]:
!pip install replicate
!pip install requests
!pip install openai
!pip install langchain
!pip install moviepy
!pip install ffmpeg --upgrade
!pip install pydub
!pip install resemble
from google.colab import output
output.clear()

In [71]:
import os
import openai
import time
import numpy as np
import replicate
import json

In [72]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain, SequentialChain, TransformChain
from langchain.llms import OpenAI, Replicate
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from pydub import AudioSegment

### Acquire tokens for Replicate, OpenAI, and Resemble

In [73]:
# get your token from https://replicate.com/account
from getpass import getpass

REPLICATE_API_TOKEN = getpass()
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN

··········


In [74]:
# get your key from https://platform.openai.com/account/api-keys
OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")

··········


In [75]:
# get your key from https://app.resemble.ai/account/api
from resemble import Resemble

RESEMBLE_API_KEY = getpass()
os.environ["RESEMBLE_API_KEY"] = RESEMBLE_API_KEY
Resemble.api_key(os.environ['RESEMBLE_API_KEY'])

··········


### LLMChain for the story line generation.

In [175]:
### Use openAI model to generate a moral story line in a predefined output json format.

#### Fine tune the output - Fill the characters_descriptions with the last known description for each paragraph for story continuity

def fill_character_descriptions(input_data):
    character_descriptions = {}

    for item in input_data["story_lines"]:
        current_character_descriptions = item.get("character_descriptions", [])
        for character_description in current_character_descriptions:
            character_name = character_description["name"]
            character_description_text = character_description.get("description", '')
            if character_description_text != '':
                character_descriptions[character_name] = character_description_text

    for item in input_data["story_lines"]:
        updated_character_descriptions = []
        current_character_descriptions = item.get("character_descriptions", [])
        for character_description in current_character_descriptions:
            character_name = character_description["name"]
            if character_name in character_descriptions:
                updated_character_description = {
                    "name": character_name,
                    "description": character_descriptions[character_name]
                }
                updated_character_descriptions.append(updated_character_description)
        item["character_descriptions"] = updated_character_descriptions

    return input_data

# LLMChain for the story line generation
topic_template = """
You are a story teller, with wide knowledge of fairy tales, bedtime stories, kids stories etc.

Write a moral story for children. The story should be long and teach a positive lesson related to one of the following values:
[honesty, kindness, generosity, gratitude, forgiveness, empathy, respect, responsibility, courage].
Do not use any human names or profanity. Make the main character an animal like a dog, cat, rabbit, mouse, or bird.
Give the animal a descriptive name relating to their personality or appearance, like Happy Puppy, Busy Bunny, Clever Raven etc.
Tell a creative story that is fun and engaging for young children, while illustrating the moral theme.
Make sure to include a beginning, middle, and end. Provide 2-3 sentence descriptions of the scenes that could be illustrated.
The story should be different each time with varied characters, settings, and plots

The story should be in JSON format with the following keys:

{{
"title": Title for the story, not exceeding 15 words
"description":compose a description for the story, not exceeding 50 words
"story_lines" :
[{{
  "paragraphs": A paragraph that make up the story, not exceeding 30 words,
  "scene_description" : visual descriptions corresponding to each paragraph, so that this allow AI image tools to generate appropriate picture,
  "character_descriptions" : [{{
    "name" : Name of the character in this paragraph, name contain no adjectives,
    "description": appearance of the character, so that it allows to generate appropriate picture
  }}]
}}]
}}

story_lines: Should contain 20-30 story lines.
Story should not include more than 2 characters.
Validate the output to be of json format, please include comma if it is being missed

Thank you!
"""
message_prompt = HumanMessagePromptTemplate(prompt=PromptTemplate(
                                                  template=topic_template,
                                                  input_variables=[]
                                                  ))

chat = ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo")
chat_prompt_template = ChatPromptTemplate.from_messages([message_prompt])
script_chain = LLMChain(llm=chat, prompt=chat_prompt_template, output_key='script')

output_script = script_chain.run({})
script = fill_character_descriptions(json.loads(output_script))
print(script)

{'title': 'Kindness of Clever Bunny', 'description': 'A story about a clever bunny who learns the value of kindness and helps a lonely bird', 'story_lines': [{'paragraphs': 'Once upon a time, in a beautiful meadow, lived a clever bunny named Clever Bunny. She was known for her quick thinking and problem-solving skills.', 'scene_description': 'A sunny meadow with colorful flowers and a clever bunny sitting under a shady tree.', 'character_descriptions': [{'name': 'Clever Bunny', 'description': 'A small, brown bunny with bright, intelligent eyes and long ears.'}]}, {'paragraphs': 'One day, while Clever Bunny was hopping around, she heard a sad chirping sound. She followed the sound and found a lonely bird named Lonely Bird sitting on a branch.', 'scene_description': 'Clever Bunny standing near a tree, looking up at a sad Lonely Bird sitting on a branch.', 'character_descriptions': [{'name': 'Clever Bunny', 'description': 'A small, brown bunny with bright, intelligent eyes and long ears.'
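
Chat models sometimes add a sentence before or after the JSON they were asked for, in which case the `json.loads(output_script)` call above raises an error. A minimal sketch of a more forgiving parser follows. `extract_json` is a hypothetical helper (it is not part of LangChain or the OpenAI client), and it only covers the simple case of a single top-level JSON object.

```python
import json


def extract_json(text: str) -> dict:
    """Parse the first top-level JSON object found in a model response.

    Hypothetical helper: it slices from the first '{' to the last '}',
    which skips any stray prose around the object.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


# Made-up reply for illustration; a real reply would come from script_chain.
reply = 'Sure! Here is the story: {"title": "Kind Bunny", "story_lines": []} Enjoy!'
story = extract_json(reply)
print(story["title"])
```

If the object itself is malformed (for example a missing comma), `json.loads` still raises, so a retry of the model call may be needed in that case.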

### LLM Chain for visual scenes


In [176]:
# TransformChain to create the Replicate predictions for our text-to-image model
import re

previous_character_descriptions = {}

def update_scene_descriptions(input_data):
    character_descriptions = input_data["character_descriptions"]
    updated_scene_description = input_data["scene_description"]

    updated_scene_descriptions = []

    if character_descriptions:
        for character_description in character_descriptions:
            character_name = character_description["name"]
            character_description_text = character_description["description"]
            previous_character_descriptions[character_name] = character_description_text


    # Check if the character name appears in the scene description
    for character_name, character_description_text in previous_character_descriptions.items():
        # Append the character description to the scene description
        updated_scene_description = re.sub(re.escape(character_name), character_name + "," + character_description_text, updated_scene_description, flags=re.IGNORECASE)

    updated_scene_descriptions.append(updated_scene_description)

    return updated_scene_descriptions


def transform_func(inputs: dict) -> dict:
  video_model = replicate.models.get('ai-forever/kandinsky-2')
  #video_model = replicate.models.get('stability-ai/sdxl')

  video_version = video_model.versions.get("601eea49d49003e6ea75a11527209c4f510a93e2112c969d548fbb45b9c4f19f")
  #video_version = video_model.versions.get("2b017d9b67edd2ee1401238df49d75da53c523f36e363881e057f5dc3ed3c5b2")
  descriptions = inputs['script']['story_lines']

  predictions = []

  for index,description in enumerate(descriptions):
      print(description)
      updated_description = update_scene_descriptions(description)[0]
      inputs['script']['story_lines'][index]['scene_description'] = updated_description
      print(f"Creating video prediction for '{updated_description}'...")
      video_prediction = replicate.predictions.create(version=video_version,
                                                      input={"prompt": updated_description, "prior_steps": '5', "guidance_scale": 4, "num_inference_steps": 100, "prior_cf_scale":4,
                                                             "scheduler": "p_sampler", "width": 1024, "height":768})
      # video_prediction = replicate.predictions.create(version=video_version,
      #                                                 input={"prompt": updated_description, "guidance_scale": 8, "num_inference_steps": 100, "scheduler": "DDIM",
      #                                                        "width": 1024, "height":768})
      predictions.append(video_prediction)
  return {'video_predictions': predictions}

video_predictions_chain = TransformChain(input_variables=['script'], output_variables=['video_predictions'], transform=transform_func)

# video_predictions = video_predictions_chain.run({"script": script})
# print(video_predictions)
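
The inline substitution in `update_scene_descriptions` places each description in the middle of the sentence, right after the character's name. One possible alternative, sketched below, leaves the scene text intact and appends the appearance notes at the end. `enrich_scene` is a hypothetical name, and whether this phrasing yields better images from the text-to-image model has not been tested here.

```python
import re


def enrich_scene(scene: str, characters: dict) -> str:
    """Append an appearance note for every character named in the scene."""
    notes = []
    for name, description in characters.items():
        # Case-insensitive match, mirroring the re.IGNORECASE flag used above.
        if re.search(re.escape(name), scene, flags=re.IGNORECASE):
            notes.append(f"{name}: {description.rstrip('.')}")
    if not notes:
        return scene
    return f"{scene.rstrip('.')}. Characters: " + "; ".join(notes) + "."


known = {
    "Clever Bunny": "A small, brown bunny.",
    "Lonely Bird": "A small, colorful bird.",
}
print(enrich_scene("Clever Bunny standing near a tree.", known))
```

Because the scene sentence is never rewritten, a name that happens to appear inside another character's description cannot trigger a second substitution.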

In [178]:
# TransformChain to create the Replicate predictions for our bark model
def transform_func(inputs: dict) -> dict:
  audio_model = replicate.models.get("suno-ai/bark")
  audio_version = audio_model.versions.get("b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787")
  # Each story line stores its narration text under the 'paragraphs' key
  parsed_script = [story_line['paragraphs'] for story_line in inputs['script']['story_lines']]

  predictions = []

  for line in parsed_script:
      print(f"Creating audio prediction for '{line}'...")
      audio_prediction = replicate.predictions.create(version=audio_version,
                                                      input={"prompt": line, "history_prompt": "announcer", "text_temp": 0.7, "waveform_temp":0.8})
      predictions.append(audio_prediction)
  return {'audio_predictions': predictions}

audio_predictions_chain = TransformChain(input_variables=['script'], output_variables=['audio_predictions'], transform=transform_func)

# audio_predictions = audio_predictions_chain.run({"script": script})
# print(audio_predictions)

In [179]:
# LLMChain to write the thank you note at the end of our video
template = """Please come up with a creative and zany ending quote from our narrator.
The script is what the narrator just read. We want to close things out in less than 15 words.

Add "Please like and subscribe for more such stories!" to the end of your output.

Script:
{script}
Ending quote:
"""

message_prompt = HumanMessagePromptTemplate(prompt=PromptTemplate(
                                                  template=template,
                                                  input_variables=["script"]))
chat_prompt_template = ChatPromptTemplate.from_messages([message_prompt])
ending_quote_chain = LLMChain(llm=chat, prompt=chat_prompt_template, output_key='ending_quote')

# ending_quote = ending_quote_chain.run({"script": script})
# print(ending_quote)

In [180]:
# TransformChain to create the prediction that generates the audio for the thank you note
def transform_func(inputs: dict) -> dict:
  audio_model = replicate.models.get("suno-ai/bark")
  audio_version = audio_model.versions.get("b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787")
  ending_quote_prediction = replicate.predictions.create(version=audio_version,
                                                      input={"prompt": inputs['ending_quote'], "history_prompt": "announcer", "text_temp": 0.7, "waveform_temp":0.8}) #en_speaker_6 female, loud
  return {'ending_quote_prediction': ending_quote_prediction}

ending_quote_prediction_chain = TransformChain(input_variables=['ending_quote'], output_variables=['ending_quote_prediction'], transform=transform_func)

# ending_quote_prediction = ending_quote_prediction_chain.run({"ending_quote": ending_quote})
# print(ending_quote_prediction)

#chain_output['ending_quote_prediction'].output['audio_out']


In [181]:
# TransformChain to create the prediction that generates the background music
def transform_func(inputs: dict) -> dict:
  model = replicate.models.get("riffusion/riffusion")
  version = model.versions.get("8cf61ea6c56afd61d8f5b9ffd14d7c216c0a93844ce2d82ac1c9ecc9c7f24e05")
  music_prediction = replicate.predictions.create(version=version, input={"prompt": inputs['music_style']})

  return {'music_prediction': music_prediction}

music_prediction_chain = TransformChain(input_variables=['music_style'], output_variables=['music_prediction'], transform=transform_func)

# music_prediction = music_prediction_chain.run({"music_style": "Kids friendly random music"})
# print(music_prediction)

### Execute chains using LangChain

Now, let's execute the chain we created. This is relatively fast, because the chains that create long-running predictions (like the video_predictions_chain) make asynchronous calls to the Replicate API.

In [182]:
# Run the chain
overall_chain = SequentialChain(chains=[
                                        video_predictions_chain,
                                        ending_quote_chain,
                                        ending_quote_prediction_chain,
                                        music_prediction_chain
                                        ], input_variables=['script', 'music_style'], output_variables=[
                                            'video_predictions', 'ending_quote', 'ending_quote_prediction', 'music_prediction'], verbose=True)
chain_output = overall_chain({"script":script, "music_style": "Kids friendly random music"})



> Entering new SequentialChain chain...
{'paragraphs': 'Once upon a time, in a beautiful meadow, lived a clever bunny named Clever Bunny. She was known for her quick thinking and problem-solving skills.', 'scene_description': 'A sunny meadow with colorful flowers and a clever bunny sitting under a shady tree.', 'character_descriptions': [{'name': 'Clever Bunny', 'description': 'A small, brown bunny with bright, intelligent eyes and long ears.'}]}
Creating video prediction for 'A sunny meadow with colorful flowers and a Clever Bunny,A small, brown bunny with bright, intelligent eyes and long ears. sitting under a shady tree.'...
{'paragraphs': 'One day, while Clever Bunny was hopping around, she heard a sad chirping sound. She followed the sound and found a lonely bird named Lonely Bird sitting on a branch.', 'scene_description': 'Clever Bunny standing near a tree, looking up at a sad Lonely Bird sitting on a branch.', 'character_descriptions': [{'name': 'Clever Bunny', 'descr
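
Conceptually, a sequential chain keeps a dictionary of named values: each step reads the keys it needs and adds its own output keys, so later steps can use anything produced earlier. The sketch below is a simplified plain-Python model of that data flow, not LangChain's implementation. `run_sequential`, `make_quote`, and `make_quote_audio` are hypothetical names, and the steps are stand-ins that make no API calls.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict[str, object]], Dict[str, object]]


def run_sequential(steps: List[Step], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run steps in order; each step sees every key produced so far."""
    state = dict(inputs)
    for step in steps:
        state.update(step(state))
    return state


def make_quote(state):
    # Stand-in for the LLM call that writes the ending quote.
    return {"ending_quote": f"The end of {state['script']['title']}!"}


def make_quote_audio(state):
    # Stand-in for a remote prediction; it only records what would be sent.
    return {"ending_quote_prediction": {"prompt": state["ending_quote"]}}


result = run_sequential(
    [make_quote, make_quote_audio],
    {"script": {"title": "Kindness of Clever Bunny"}},
)
print(result["ending_quote_prediction"]["prompt"])
```

The second step never receives the script directly; it only needs the `ending_quote` key that the first step added, which mirrors how `ending_quote_prediction_chain` depends on `ending_quote_chain`.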

In [183]:
# unpack outputs
script = chain_output['script']
title = script['title']
split_script = script['story_lines']
video_descriptions = script['story_lines']
video_predictions = chain_output['video_predictions']
#audio_predictions = chain_output['audio_predictions']

# print(title)
print(len(split_script))
print(len(video_descriptions))
# print(video_predictions)
# print(audio_predictions)

9
9


In [184]:
# sanity check
assert len(split_script) == len(video_descriptions)

In [185]:
# response = Resemble.v2.voices.all(page, page_size)
# voice = list(filter(lambda x: x['name'] == 'Jagan-StoryNarrator', response['items']))[0]
# voice

### ⏳ Wait for our async predictions to complete
Here's a helper to check in on our predictions. This usually takes a minute or two.

In [186]:
def all_done(predictions):
    return set([p.status for p in predictions]) == {'succeeded'}

In [187]:
all_predictions = chain_output['video_predictions'] + \
                  [chain_output['ending_quote_prediction']] + \
                  [chain_output['music_prediction']]

In [188]:
done = False

while not done:
  [p.reload() for p in all_predictions]
  for p in all_predictions:
    print(f'https://replicate.com/p/{p.id}', p.status)
  done = all_done(all_predictions)
  time.sleep(2)
  output.clear()

print("Predictions complete")


Predictions complete
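
The `all_done` helper only returns true when every status is `succeeded`, so a single failed or canceled prediction would keep the polling loop running indefinitely. A hedged sketch of a variant that stops on any terminal status and gives up after a timeout is shown below. `wait_for` and its parameters are hypothetical names, and the stand-in objects replace real Replicate predictions so the example runs offline.

```python
import time
from types import SimpleNamespace

# Statuses after which a Replicate prediction no longer changes.
TERMINAL = {"succeeded", "failed", "canceled"}


def wait_for(predictions, reload=lambda p: p.reload(), timeout_s=600.0, poll_s=2.0):
    """Poll until every prediction is terminal; return those that did not succeed."""
    deadline = time.monotonic() + timeout_s
    while True:
        for p in predictions:
            if p.status not in TERMINAL:
                reload(p)
        if all(p.status in TERMINAL for p in predictions):
            break
        if time.monotonic() >= deadline:
            raise TimeoutError("predictions did not finish in time")
        time.sleep(poll_s)
    return [p for p in predictions if p.status != "succeeded"]


# Stand-in objects so the sketch runs without calling Replicate.
fake = [
    SimpleNamespace(id="a", status="starting"),
    SimpleNamespace(id="b", status="failed"),
]


def fake_reload(p):
    p.status = "succeeded"  # pretend the job finished on the next poll


not_ok = wait_for(fake, reload=fake_reload, timeout_s=5, poll_s=0)
print([p.id for p in not_ok])
```

Returning the unsuccessful predictions lets the caller decide whether to retry them or continue with the scenes that did render.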


### Stitch them together for final video!

In [189]:
import requests
from PIL import Image
from io import BytesIO

video_urls = [v.output for v in video_predictions]
#audio_urls = [a.output['audio_out'] for a in audio_predictions]
music_url = chain_output['music_prediction'].output['audio']
subtitles = split_script

for index, url in enumerate(video_urls):
  try:

    # Send a GET request to fetch the image content
    response = requests.get(url[0])

    # Read the image content from the response
    image = Image.open(BytesIO(response.content))

    # Display the image
    image.show()

    print(script['story_lines'][index]['scene_description'])

  except Exception as e:
    print("Error displaying image:")
    print(e)
    # Display the text


A sunny meadow with colorful flowers and a Clever Bunny,A small, brown bunny with bright, intelligent eyes and long ears. sitting under a shady tree.
Clever Bunny,A small, brown bunny with bright, intelligent eyes and long ears. standing near a tree, looking up at a sad Lonely Bird,A small, colorful bird with fluffy feathers and a happy expression. sitting on a branch.
Clever Bunny,A small, brown bunny with bright, intelligent eyes and long ears. sitting next to Lonely Bird,A small, colorful bird with fluffy feathers and a happy expression. on the branch, listening attentively.
Clever Bunny,A small, brown bunny with bright, intelligent eyes and long ears. and Lonely Bird,A small, colorful bird with fluffy feathers and a happy expression. sitting on the branch, discussing and planning.
Clever Bunny,A small, brown bunny with bright, intelligent eyes and long ears. leading Lonely Bird,A small, colorful bird with fluffy feathers and a happy expression. through a path towards a beautiful la

In [190]:
from getpass import getpass

INPUT_TEXT = getpass()

··········


In [191]:
# page = 1
# page_size = 100
# response = Resemble.v2.voices.all(page, page_size)
# response
#voice = list(filter(lambda x: x['name'] == 'josh', response['items']))[0]

In [192]:
#page = 1
# page_size = 100

# response = Resemble.v2.voices.all(page, page_size)
# voice = list(filter(lambda x: x['name'] == 'Jagan', response['items']))[0]

## josh - 987c99e9
## Jagan - f0426afb

# response = Resemble.v2.projects.all(page, page_size)
# projects = response['items']

# response = Resemble.v2.clips.all(project_uuid, page, page_size)
# clips = response['items']

# create a new clip

audio_descriptions = script['story_lines']
print(audio_descriptions)

audio_clips = []
for i, story_line in enumerate(audio_descriptions):
    print(f"Creating audio prediction for {story_line['paragraphs']}")
    project_uuid = '2a577ebf'
    voice_uuid = '987c99e9' #'fa25749e'
    callback_uri = 'https://example.com/callback/resemble-clip'
    body = story_line['paragraphs']
    print(body)
    response = Resemble.v2.clips.create_async(
        project_uuid,
        voice_uuid,
        callback_uri,
        body,
        title=f"My clip {i}",
        sample_rate=None,
        output_format=None,
        precision=None,
        include_timestamps=None,
        is_public=False,
        is_archived=False
    )

    audio_clips.append(response)


[{'paragraphs': 'Once upon a time, in a beautiful meadow, lived a clever bunny named Clever Bunny. She was known for her quick thinking and problem-solving skills.', 'scene_description': 'A sunny meadow with colorful flowers and a Clever Bunny,A small, brown bunny with bright, intelligent eyes and long ears. sitting under a shady tree.', 'character_descriptions': [{'name': 'Clever Bunny', 'description': 'A small, brown bunny with bright, intelligent eyes and long ears.'}]}, {'paragraphs': 'One day, while Clever Bunny was hopping around, she heard a sad chirping sound. She followed the sound and found a lonely bird named Lonely Bird sitting on a branch.', 'scene_description': 'Clever Bunny,A small, brown bunny with bright, intelligent eyes and long ears. standing near a tree, looking up at a sad Lonely Bird,A small, colorful bird with fluffy feathers and a happy expression. sitting on a branch.', 'character_descriptions': [{'name': 'Clever Bunny', 'description': 'A small, brown bunny wi

In [193]:
end_text ='''Please like and subscribe for more such beautiful stories'''

print("Creating audio prediction for end_text")
project_uuid = '2a577ebf'
voice_uuid = '987c99e9' #'fa25749e'
callback_uri = 'https://example.com/callback/resemble-clip'
body = end_text
print(body)
response = Resemble.v2.clips.create_async(
    project_uuid,
    voice_uuid,
    callback_uri,
    body,
    title="My ending clip",
    sample_rate=None,
    output_format=None,
    precision=None,
    include_timestamps=None,
    is_public=False,
    is_archived=False
)
audio_clips.append(response)

Creating audio prediction for end_text
Please like and subscribe for more such beautiful stories


In [194]:
done = False
while not done:
  done = True
  audio_urls = []
  for clip in audio_clips:
    response = Resemble.v2.clips.get(project_uuid, clip['item']['uuid'])
    time.sleep(5)
    if 'audio_src' in response['item']:
      print(response['item']['audio_src'])
      audio_urls.append(response['item']['audio_src'])
    else:
      print("Audio not ready yet for clip", clip['item']['uuid'])
      done = False
  if not done:
    print("Waiting for all audio clips...")
    time.sleep(10)
print("All audio clips processed")

audio_urls

Audio not ready yet for clip 8b83923a
Audio not ready yet for clip 5f361023
Audio not ready yet for clip d474ef82
Audio not ready yet for clip 88272c5d
Audio not ready yet for clip 90402365
Audio not ready yet for clip 93d9d0c8
Audio not ready yet for clip 91e7c741
Audio not ready yet for clip 7559f737
Audio not ready yet for clip 5dfbe6b3
Audio not ready yet for clip 4fb1db8f
Waiting for all audio clips...
https://app.resemble.ai/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCSm1PTlEwPSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--b0890ebfbc2f3c505d91a01c14015e4c92828967/My+clip+0-f398700a.wav
https://app.resemble.ai/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCSkNPTlEwPSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--c16ca3ffbf9896fc471a9d7550ef9f74193449a6/My+clip+1-fdc379a0.wav
https://app.resemble.ai/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCSmVPTlEwPSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--1ad2774ea33

['https://app.resemble.ai/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCSm1PTlEwPSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--b0890ebfbc2f3c505d91a01c14015e4c92828967/My+clip+0-f398700a.wav',
 'https://app.resemble.ai/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCSkNPTlEwPSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--c16ca3ffbf9896fc471a9d7550ef9f74193449a6/My+clip+1-fdc379a0.wav',
 'https://app.resemble.ai/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCSmVPTlEwPSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--1ad2774ea339fa97ad82f66e60f0fdc91a1fae2b/My+clip+2-81a8bc63.wav',
 'https://app.resemble.ai/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCSk9PTlEwPSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--95c4e189a44555e3ce6989105501f82d3f11491e/My+clip+3-d7b7fe48.wav',
 'https://app.resemble.ai/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCSkdPTlEwPSIsImV4cCI6bnVsbCwicH

In [195]:
## slow down the audio urls
import requests
import os

def slow_down_audio(input_file, output_file, slowdown_factor):
    # Load the audio file
    audio = AudioSegment.from_wav(input_file)

    # Reinterpret the same samples at a lower frame rate so they play
    # slowdown_factor times longer (this also lowers the pitch)
    slowed_audio = audio._spawn(audio.raw_data, overrides={
        "frame_rate": int(audio.frame_rate / slowdown_factor)
    })

    # Resample back to the original frame rate so players handle the file normally
    slowed_audio = slowed_audio.set_frame_rate(audio.frame_rate)

    # Export the modified audio to a file
    slowed_audio.export(output_file, format="mp3")
    print("Slowdown complete. Output file:", output_file)

slowdown_factor = 1
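
Playing the same samples at a different frame rate changes the duration, since duration equals the number of frames divided by the frame rate. The arithmetic can be sketched as follows; `slowed_frame_rate` and `duration_seconds` are hypothetical helpers for illustration, and the 44.1 kHz rate is an assumed example value rather than the rate of the Resemble clips.

```python
def slowed_frame_rate(frame_rate: int, slowdown_factor: float) -> int:
    """Frame rate at which the same samples play `slowdown_factor` times longer."""
    if slowdown_factor <= 0:
        raise ValueError("slowdown_factor must be positive")
    return int(frame_rate / slowdown_factor)


def duration_seconds(n_frames: int, frame_rate: int) -> float:
    """Playback time of n_frames samples at the given frame rate."""
    return n_frames / frame_rate


n_frames = 44_100 * 10  # ten seconds of audio at an assumed 44.1 kHz
rate = slowed_frame_rate(44_100, 1.25)
print(rate, duration_seconds(n_frames, rate))
```

A factor of 1 leaves the rate, and therefore the duration, unchanged. Because the samples themselves are not modified, this kind of slowdown lowers the pitch along with the speed.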

In [196]:
audio_files = []

for i, url in enumerate(audio_urls):
    response = requests.get(url)
    audio_filename = f"temp_audio{i}.wav"
    with open(audio_filename, "wb") as audio_file:
        audio_file.write(response.content)

    # Generate the output file name
    output_file = os.path.join(audio_filename.replace(".wav", ".mp3"))

    # Call the slow_down_audio function
    slow_down_audio(audio_filename, output_file, slowdown_factor)

    # Append the output file name to the array
    audio_files.append(output_file)
    #audio_files.append(audio_filename)

Slowdown complete. Output file: temp_audio0.mp3
Slowdown complete. Output file: temp_audio1.mp3
Slowdown complete. Output file: temp_audio2.mp3
Slowdown complete. Output file: temp_audio3.mp3
Slowdown complete. Output file: temp_audio4.mp3
Slowdown complete. Output file: temp_audio5.mp3
Slowdown complete. Output file: temp_audio6.mp3
Slowdown complete. Output file: temp_audio7.mp3
Slowdown complete. Output file: temp_audio8.mp3
Slowdown complete. Output file: temp_audio9.mp3


In [197]:
from moviepy.editor import *

def animate_image(image_url, output_file):
  # Download the image
  image = ImageClip(image_url, duration=10)

  # Get the initial size of the image
  initial_size = image.size

  # Define the zoom effect function
  # time "t" varies from 0 to 10; zoom starts at 1, rises to 1.2 at t = 5, then returns to 1
  def zoom_effect(t):
      zoom_level = 1 + 0.2 * (1 - abs(t - 5) / 5)
      return [int(s * zoom_level) for s in initial_size]

  # Apply the zoom effect
  final_clip = CompositeVideoClip([image.set_position("center").resize(lambda t: zoom_effect(t))])

  # Save as MP4
  final_clip.write_videofile(output_file, fps=24)
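
The zoom curve in `zoom_effect` is a triangle wave over the clip: the scale grows linearly until the midpoint and shrinks linearly afterwards. The standalone sketch below evaluates the same formula at a few timestamps. `zoom_level` and its `duration` and `max_extra` parameters are hypothetical generalizations of the constants 10 and 0.2 used above.

```python
def zoom_level(t: float, duration: float = 10.0, max_extra: float = 0.2) -> float:
    """Triangular zoom: 1.0 at the start and end, 1.0 + max_extra at the midpoint."""
    half = duration / 2
    return 1 + max_extra * (1 - abs(t - half) / half)


for t in (0, 2.5, 5, 7.5, 10):
    print(t, round(zoom_level(t), 3))
```

Raising `max_extra` makes the zoom more pronounced, while `duration` should match the clip length so the zoom returns to 1.0 exactly at the end.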

In [198]:
import requests
import os
import moviepy.editor as mp
import moviepy.video.fx.all as vfx
import textwrap
from moviepy.editor import *
from PIL import Image, ImageDraw, ImageFont
from io import BytesIO
import numpy as np


# Download video and audio files
video_files = []
audio_files = []
# for i, url in enumerate(video_urls):
#     response = requests.get(url)
#     video_filename = f"temp_video{i}.mp4"
#     with open(video_filename, "wb") as video_file:
#         video_file.write(response.content)
#     video_files.append(video_filename)
fps = 12.0
for i, url in enumerate(video_urls):
    response = requests.get(url[0])
    image = Image.open(BytesIO(response.content))
    image_np = np.array(image)
    clip = mp.ImageSequenceClip([image_np], fps=fps)
    video_filename = f"temp_video{i}.mp4"
    # animate_image(url[0],video_filename)

    clip.write_videofile(video_filename, codec='libx264', fps=fps)
    #with open(video_filename, "wb") as video_file:
        #video_file.write(response.content)
    video_files.append(video_filename)

for i, url in enumerate(audio_urls):
    response = requests.get(url)
    audio_filename = f"temp_audio{i}.wav"
    with open(audio_filename, "wb") as audio_file:
        audio_file.write(response.content)

    # Generate the output file name
    output_file = os.path.join(audio_filename.replace(".wav", ".mp3"))

    # Call the slow_down_audio function
    slow_down_audio(audio_filename, output_file, slowdown_factor)

    # Append the output file name to the array
    audio_files.append(output_file)
    #audio_files.append(audio_filename)

# Load and process video and audio files

processed_videos = []
for i, audio_file in enumerate(audio_files):
    try:
      video = mp.VideoFileClip(video_files[i])
    except IndexError:
      video = mp.VideoFileClip(video_files[i-1])
    #video = mp.VideoFileClip(video_files[i])
    audio = mp.AudioFileClip(audio_file)

    # Loop the video for the duration of the audio
    looped_video = mp.concatenate_videoclips([video] * int(audio.duration // video.duration + 1))

    # Set the audio of the video to the audio file
    video_with_audio = looped_video.set_audio(audio)
    processed_videos.append(video_with_audio)

# Concatenate all the processed videos
final_video = mp.concatenate_videoclips(processed_videos)

## The following adds the title image / narration to the video.
# Add this function to create the text image
def txt_image(img, txt, font_size, color):
    image = img.copy()
    draw = ImageDraw.Draw(image)
    draw.text((50, 50), txt, fill=(255, 255, 0))
    return image

# Download and create the image clip
image_url = video_urls[-1][0]
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))

# Resize the image to match the video dimensions
img_resized = img.resize((1200, 900))

# Download the audio file
audio_url = chain_output['ending_quote_prediction'].output['audio_out']
response = requests.get(audio_url)
audio_filename = "temp_audio_ending.wav"
with open(audio_filename, "wb") as audio_file:
    audio_file.write(response.content)

# Generate the output file name
output_file = os.path.join(audio_filename.replace(".wav", ".mp3"))

# Call the slow_down_audio function
slow_down_audio(audio_filename, output_file, slowdown_factor)

# Create the audio clip
audio_ending = AudioFileClip(output_file)

# make title empty for now, couldn't figure out how to get it bigger
text = ''
img_text = ImageClip(np.asarray(txt_image(img_resized, txt=text, font_size=72, color="white")), duration=4)

# Repeat the image clip until it covers the audio, then attach the audio to the repeated clip
img_text_audio_ending = mp.concatenate_videoclips([img_text] * int(audio_ending.duration // img_text.duration + 1))
img_text_audio_ending = img_text_audio_ending.set_audio(audio_ending)

# Resize the ending clip to match the scene clips, then append it to them
width, height = processed_videos[0].size
ending_video = img_text_audio_ending.resize((width, height))

final_video = concatenate_videoclips(processed_videos + [ending_video])

# Download the background audio file
bg_audio_url = music_url
response = requests.get(bg_audio_url)
with open("temp_bg_audio.mp3", "wb") as audio_file:
    audio_file.write(response.content)

# Create the background audio clip
bg_audio = AudioFileClip("temp_bg_audio.mp3")

# Calculate the duration of the final video
video_duration = final_video.duration

# Loop the background audio to match the final video's duration
bg_audio_looped = bg_audio.fx(afx.audio_loop, duration=video_duration)
bg_audio_looped = bg_audio_looped.volumex(0.3)

# Overlay the background audio with the audio from the final video
final_audio = CompositeAudioClip([final_video.audio, bg_audio_looped])

# Set the audio of the final video to the combined audio
final_video_with_bg_audio = final_video.set_audio(final_audio)

# Save the final video
final_video_with_bg_audio.write_videofile("how_to.mp4", codec='libx264', audio_codec='aac')

# Clean up temporary files
for temp_file in video_files + audio_files:
    os.remove(temp_file)

Moviepy - Building video temp_video0.mp4.
Moviepy - Writing video temp_video0.mp4
Moviepy - Done !
Moviepy - video ready temp_video0.mp4
Moviepy - Building video temp_video1.mp4.
Moviepy - Writing video temp_video1.mp4
Moviepy - Done !
Moviepy - video ready temp_video1.mp4
Moviepy - Building video temp_video2.mp4.
Moviepy - Writing video temp_video2.mp4
Moviepy - Done !
Moviepy - video ready temp_video2.mp4
Moviepy - Building video temp_video3.mp4.
Moviepy - Writing video temp_video3.mp4
Moviepy - Done !
Moviepy - video ready temp_video3.mp4
Moviepy - Building video temp_video4.mp4.
Moviepy - Writing video temp_video4.mp4
Moviepy - Done !
Moviepy - video ready temp_video4.mp4
Moviepy - Building video temp_video5.mp4.
Moviepy - Writing video temp_video5.mp4
Moviepy - Done !
Moviepy - video ready temp_video5.mp4
Moviepy - Building video temp_video6.mp4.
Moviepy - Writing video temp_video6.mp4
Moviepy - Done !
Moviepy - video ready temp_video6.mp4
Moviepy - Building video temp_video7.mp4.
Moviepy - Writing video temp_video7.mp4
Moviepy - Done !
Moviepy - video ready temp_video7.mp4
Moviepy - Building video temp_video8.mp4.
Moviepy - Writing video temp_video8.mp4
Moviepy - Done !
Moviepy - video ready temp_video8.mp4
Slowdown complete. Output file: temp_audio0.mp3
Slowdown complete. Output file: temp_audio1.mp3
Slowdown complete. Output file: temp_audio2.mp3
Slowdown complete. Output file: temp_audio3.mp3
Slowdown complete. Output file: temp_audio4.mp3
Slowdown complete. Output file: temp_audio5.mp3
Slowdown complete. Output file: temp_audio6.mp3
Slowdown complete. Output file: temp_audio7.mp3
Slowdown complete. Output file: temp_audio8.mp3
Slowdown complete. Output file: temp_audio9.mp3
Slowdown complete. Output file: temp_audio_ending.mp3
Moviepy - Building video how_to.mp4.
MoviePy - Writing audio in how_toTEMP_MPY_wvf_snd.mp4
MoviePy - Done.
Moviepy - Writing video how_to.mp4
Moviepy - Done !
Moviepy - video ready how_to.mp4
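
The loop count used when stretching each scene clip over its narration can be checked with plain arithmetic. The sketch below is a stand-alone illustration, not part of the pipeline: `loop_plan` is a hypothetical helper and the durations are made-up numbers. It reproduces the `int(audio.duration // video.duration + 1)` expression from the cell above and reports how much video would run past the end of the narration if the looped clip were left untrimmed.

```python
def loop_plan(audio_duration, video_duration):
    """Return (repeats, overhang_seconds) for looping a clip under a narration.

    Hypothetical helper: mirrors int(audio.duration // video.duration + 1).
    """
    if video_duration <= 0:
        raise ValueError("video_duration must be positive")
    repeats = int(audio_duration // video_duration + 1)
    # Seconds of looped video left over once the narration has finished
    overhang = repeats * video_duration - audio_duration
    return repeats, overhang

# Made-up durations in seconds, for illustration only
print(loop_plan(10.0, 4.0))  # (3, 2.0)
print(loop_plan(8.0, 4.0))   # (3, 4.0)
```

The second case shows that the expression always adds one repeat, so a narration that is an exact multiple of the clip length gets a full extra clip of silent video unless the looped clip is cut to the narration's duration.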


In [199]:
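#@title Watch the video
from IPython.display import HTML
from base64 import b64encode

# Read the rendered file and embed it in the page as a base64 data URL
with open('how_to.mp4', 'rb') as video_file:
    mp4 = video_file.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()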
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

In [200]:
chain_output['ending_quote_prediction'].output['audio_out']

'https://replicate.delivery/pbxt/TX3W7LOvr3b4AZBeetffZGa5aDxxUAKtoSJqet2ioE5MAXpKC/audio.wav'

In [201]:
video_urls

[['https://pbxt.replicate.delivery/wqheJHj0KkVbSiZEPCyLlUIlTzYSLdEdD4LayX5Z6GsxXlqIA/out_0.png'],
 ['https://pbxt.replicate.delivery/PQJ1zHuUcbqlMpOVnnTlGHSzNPQzYYTpiYTXCvMqE955rSVE/out_0.png'],
 ['https://pbxt.replicate.delivery/ZXy9zVLU9sZQFNSkVrDslnHKZ0ckeLDxh00rKJ2ieC4wvKVRA/out_0.png'],
 ['https://pbxt.replicate.delivery/mZXVDLCpvY78Ghk7CPiFxkphYCfzf9MrxeGBDXK887LofqUFB/out_0.png'],
 ['https://pbxt.replicate.delivery/e84wdG4IFtyXYqiuN23gSeWG1ZGqdd8jThJ8qSJ10TF9vKVRA/out_0.png'],
 ['https://pbxt.replicate.delivery/ME2TIfA2TvULPKFC6mGeK8ZYCr7whOGDAwdWHTGE7xZCwKVRA/out_0.png'],
 ['https://pbxt.replicate.delivery/h3bXsue2ekgjNkTFN7ENnlk1O0fcmV3kI2TQMjBAOAfpArUFB/out_0.png'],
 ['https://pbxt.replicate.delivery/mVlICysfJvUbdKXlHyYxH23Ue32auf22RvUTtns9jwqeArUFB/out_0.png'],
 ['https://pbxt.replicate.delivery/fw76GfhxcEmcuEYeQlGOP3TxHl4PdaT8szmTaqlJhAsugVqiA/out_0.png']]
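
`video_urls` is a list of one-element lists, which is why the assembly code reads the last image as `video_urls[-1][0]`. A small sketch of flattening that shape into a plain list of URLs, assuming every inner list holds exactly one URL; the `example.com` addresses are placeholders, not real outputs.

```python
# Placeholder data shaped like video_urls: one inner list per entry
example_urls = [
    ["https://example.com/scene0/out_0.png"],
    ["https://example.com/scene1/out_0.png"],
]

# Take the first (and only) URL from each inner list
flat_urls = [urls[0] for urls in example_urls]

print(flat_urls[-1])  # https://example.com/scene1/out_0.png
```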