# Building an App with Aria and Allegro: Turning Travel Photos into Fun Fact Videos

### Install Dependencies and Set Up the Environment

In [1]:
!pip -qqq install openai requests

Import `userdata` and load the Aria API key from Colab secrets.

In [3]:
from google.colab import userdata

base_url = 'https://api.rhymes.ai/v1'
api_key = userdata.get('ARIA_API_KEY')
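The cell above depends on `google.colab.userdata`, which is only importable inside Colab. A minimal sketch of a fallback for other environments follows. It assumes the key is also exported as an environment variable with the same name (`ARIA_API_KEY`); the helper name `load_secret` is hypothetical and not part of any library.

```python
import os

def load_secret(name: str) -> str:
    """Return a secret from Colab's userdata if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # Secret missing or access not granted: fall through to the environment
            value = None
        if value:
            return value

    # Assumption: outside Colab the key is stored in an environment variable
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value
```

With this helper, `api_key = load_secret('ARIA_API_KEY')` would work both in Colab and in a local Jupyter session.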

Initialize the OpenAI client with the stored base URL and API key.

In [4]:
from openai import OpenAI

client = OpenAI(
    base_url=base_url,
    api_key=api_key
)

### Image Analysis with ARIA API

In [5]:
import base64

def image_to_base64(image_path):
    """
    Converts an image to a base64-encoded string.

    Args:
        image_path (str): The path to the image file.

    Returns:
        str: The base64-encoded string of the image.

    Raises:
        FileNotFoundError: If the image file does not exist.
        OSError: If the file cannot be read.
    """
    # Let I/O errors propagate: returning an error message here would be
    # mistaken for base64 data and sent to the API by the caller.
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
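The request in the next cell embeds the image as a data URL, whose MIME type should match the file being sent. The sketch below builds the whole data URL from the file path. The helper name `image_to_data_url` and the extension-to-MIME table are illustrative assumptions, and the table covers only a few common formats.

```python
import base64
from pathlib import Path

# Small extension-to-MIME map (an assumption; extend as needed)
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".gif": "image/gif",
}

def image_to_data_url(image_path: str) -> str:
    """Read an image and return it as a base64 data URL with a matching MIME type."""
    path = Path(image_path)
    suffix = path.suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{MIME_BY_SUFFIX[suffix]};base64,{encoded}"
```

For a file such as `/content/image.webp`, the returned string starts with `data:image/webp;base64,` and can be passed directly as the `url` value of an `image_url` content part.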



Analyze the provided image and get the scenes to be shown by **Allegro**.

In [6]:
from textwrap import dedent
base64_image = image_to_base64('/content/image.webp')

response = client.chat.completions.create(
    model="aria",
    messages=[
        {
            "role": "user",
            "content": [
                # The file is a .webp image, so the data URL declares image/webp
                {"type": "image_url", "image_url": {"url": f"data:image/webp;base64,{base64_image}"}},
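                # Keep every prompt line at the same indentation so dedent() can strip it;
                # an inline "\n" escape would start an unindented line and defeat dedent().
                {"type": "text", "text": dedent("""\
                <image>
                This is an image of a place. Give three scenes and descriptions to bring it to life. Format:

                Scene <number>: <engaging description>

                Return 3 scenes in that format only.
                """)}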
            ]
        }
    ],
    stream=False,
    temperature=0.6,
    max_tokens=1024,
    top_p=1,
    stop=["<|im_end|>"]
)


result = response.choices[0].message.content

print(result)

Scene 1: A cozy corner of a modern living room, where a young person in a blue hoodie lounges comfortably on a dark blue couch, holding a steaming cup of coffee. The vibrant red wall behind them is adorned with the motivational words "EAT SLEEP CODE REPEAT" in bold white letters, creating a lively and inspiring backdrop. The scene exudes a sense of calm and focus, perfect for a coding marathon or a relaxed coffee break.

Scene 2: The individual leans back, taking a sip of their coffee, looking relaxed and contemplative. The repetitive pattern on the wall emphasizes the importance of balance between work and rest, suggesting a lifestyle centered around coding, sleep, and coffee. The room's ambiance is both motivating and soothing, ideal for productivity and creativity.

Scene 3: The person shifts slightly, showing a thoughtful expression. The bright red wall with its motivational message serves as a constant reminder of their dedication to coding, while the comfortable couch and warm be
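Because the prompt asks for a fixed `Scene <number>: <description>` format, the response text can be split into individual scenes, for example to inspect them or to send them to Allegro one at a time. The following is a minimal sketch that assumes the model followed the requested format; `parse_scenes` is a hypothetical helper, and the notebook itself passes the whole text on unchanged.

```python
import re

# Matches "Scene <n>: <text>" up to the next scene header or the end of the string
SCENE_PATTERN = re.compile(
    r"Scene\s+(\d+):\s*(.+?)(?=\n\s*Scene\s+\d+:|\Z)",
    re.DOTALL,
)

def parse_scenes(text: str) -> list[tuple[int, str]]:
    """Return (scene number, description) pairs found in the model output."""
    return [(int(num), desc.strip()) for num, desc in SCENE_PATTERN.findall(text)]
```

Calling `parse_scenes(result)` returns an empty list when the output does not follow the format, which is a cheap way to detect a malformed response before starting a video task.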

### Create Video Task

Define a function that starts a video task and returns the API response, which contains the request ID for the task.

In [7]:
import requests

def generate_video(token: str, result_scenes: str):
    url = "https://api.rhymes.ai/v1/generateVideoSyn"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    data = {
        "refined_prompt": result_scenes,
        "num_step": 100,
        "cfg_scale": 7.5,
        "user_prompt": result_scenes,
        "rand_seed": 12345
    }

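    try:
        response = requests.post(url, headers=headers, json=data)
        # Raise an exception for 4xx/5xx responses
        response.raise_for_status()
        return response.json()  # Return the parsed JSON response
    except requests.exceptions.RequestException as e:
        # Raise instead of returning a message string: callers expect a dict
        # and call .get() on the result.
        raise RuntimeError(f"Video generation request failed: {e}") from e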



Call the function with the result returned by the ARIA multimodal API.

In [8]:
from google.colab import userdata
# Load the Allegro bearer token from Colab secrets
bearer_token = userdata.get('ALLEGRO_API_KEY')
response_data = generate_video(bearer_token, result)
request_id = response_data.get('data')
print(request_id)

5e4a5911-a28f-4ca1-9300-9af9baab321f
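The cell above assumes the response is a JSON object carrying the request ID under `data`, as the printed output suggests. A small guard makes that assumption explicit and fails early when it does not hold. This is a sketch only: `extract_request_id` is a hypothetical helper, and nothing beyond the `data` field is assumed about the response.

```python
def extract_request_id(response_data) -> str:
    """Return the request ID from a video-task response, or raise if it is missing."""
    if not isinstance(response_data, dict):
        raise TypeError(
            f"Expected a JSON object, got {type(response_data).__name__}"
        )
    request_id = response_data.get("data")
    if not isinstance(request_id, str) or not request_id:
        raise ValueError(f"No request ID in response: {response_data!r}")
    return request_id
```

Used as `request_id = extract_request_id(response_data)`, it replaces a later, harder-to-trace failure in the status query with an immediate error.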


### Query Video Task Status

Define a function that fetches the video link for a given request ID. The function sleeps for **2 minutes** before sending the query.

In [9]:
import requests
import time

def query_video_status(token, request_id):
    # Wait for at least 2 minutes (120 seconds)
    time.sleep(120)

    url = "https://api.rhymes.ai/v1/videoQuery"
    headers = {
        "Authorization": f"Bearer {token}",
    }
    params = {
        "requestId": request_id  # Add the requestId as a query parameter
    }

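    try:
        response = requests.get(url, headers=headers, params=params)
        # Raise an exception for 4xx/5xx responses
        response.raise_for_status()
        return response.json()  # Return the parsed JSON response
    except requests.exceptions.RequestException as e:
        # Raise instead of returning a message string: callers expect a dict
        # and call .get() on the result.
        raise RuntimeError(f"Video status query failed: {e}") from e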

Video generation takes time, so wait at least **2 minutes** before querying for the video link. When the video is ready, the response contains a link to the video file in an S3 bucket; otherwise, an empty string is returned.

In [12]:
response_data = query_video_status(bearer_token, request_id)
video_link = response_data.get('data')
print(video_link)

https://apiplatform-rhymes-prod-va.s3.amazonaws.com/20241102145602.mp4


In [13]:
video_link

'https://apiplatform-rhymes-prod-va.s3.amazonaws.com/20241102145602.mp4'
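Since the status query returns an empty string until the video is ready, a single query after a fixed wait can come back empty. A polling loop is one way to handle that. The sketch below is generic: `poll_for_link`, its parameters, and the default interval and attempt count are illustrative assumptions, and `fetch_link` stands for any callable that performs one status query and returns the link or an empty string.

```python
import time
from typing import Callable, Optional

def poll_for_link(
    fetch_link: Callable[[], str],
    interval_seconds: float = 30.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch_link until it returns a non-empty link or attempts run out."""
    for attempt in range(max_attempts):
        link = fetch_link()
        if link:
            return link
        # Wait between attempts, but not after the final one
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return None
```

Note that `query_video_status` as defined earlier already sleeps 120 seconds on every call, so wrapping it directly would add that delay to each attempt; a variant without the built-in sleep would suit this loop better.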