# Text-to-Video Generation

**CCAI-413: Deep Learning project.**



------------------------------------------------------------------


# About 
We build on **text-to-video-ms-1.7b**, a pre-trained text-to-video generation model, and extend it to accept Arabic prompts by translating them to English before generation.

Baseline model link: https://huggingface.co/damo-vilab/text-to-video-ms-1.7b/tree/main

# Install the resources

In [1]:
!pip install diffusers transformers accelerate torch
!pip install googletrans==4.0.0-rc1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting diffusers
  Downloading diffusers-0.16.1-py3-none-any.whl (934 kB)
Collecting transformers
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
Collecting accelerate
  Downloading accelerate-0.19.0-py3-none-any.whl (219 kB)
Collecting huggingface-hub>=0.13.2 (from diffusers)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
Collecting importlib-metadata (from diffuser

# Import libraries

In [2]:
import time
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
from moviepy.editor import VideoFileClip
from googletrans import Translator
from requests.exceptions import ReadTimeout

# Initialize the baseline code

In [3]:
# create the Translator
translator = Translator()

# handle ReadTimeout exception
def retry(func, max_retries=3, delay=1):
    for i in range(max_retries):
        try:
            return func()
        except ReadTimeout:
            print(f"Request timed out. Retrying in {delay} seconds...")
            time.sleep(delay)
    raise Exception("Exceeded maximum number of retries.")

# Creates DiffusionPipeline class and loads a pre-trained model 
# with specified parameters
pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", 
                                         torch_dtype=torch.float16,
                                         variant="fp16")

# Configures the model's scheduler using the DPMSolverMultistepScheduler
# class with the configuration
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Enables offloading of model
pipe.enable_model_cpu_offload()



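
The `retry` helper in the cell above waits a fixed `delay` between attempts and only catches `ReadTimeout`. The following is a minimal sketch of a more general variant, not the notebook's own code: the name `retry_call`, the `backoff` factor, and the injectable `sleep` argument are assumptions added for illustration. It doubles the wait after each failure and is demonstrated with a simulated flaky function, so no network access is needed.

```python
import time


def retry_call(func, exceptions, max_retries=3, delay=1.0, backoff=2.0,
               sleep=time.sleep):
    """Call func() until it succeeds or max_retries attempts have failed.

    exceptions: an exception class (or tuple of classes) that triggers a retry.
    sleep: injectable wait function (hypothetical parameter, eases testing).
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except exceptions as error:
            last_error = error
            if attempt < max_retries:
                sleep(delay)      # wait before the next attempt
                delay *= backoff  # grow the wait for the attempt after that
    raise RuntimeError(f"Gave up after {max_retries} attempts") from last_error


# Demonstration with a function that fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "a dog dancing"


waits = []  # record the requested waits instead of actually sleeping
result = retry_call(flaky, TimeoutError, max_retries=3, delay=1.0,
                    sleep=waits.append)
print(result, waits)
```

In the notebook, a helper like this could wrap `translator.translate(text, src='ar', dest='en')` with `exceptions=(ReadTimeout, AttributeError)`, which would cover both failure modes the generation cell handles.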

# Text-to-Video Generation

In [4]:
# Take the Arabic prompt from the user
text = input("Enter the text ")

# Translate the input text from Arabic to English
# using googletrans, an unofficial Google Translate client.
translation = None
while translation is None:
    try:
        translation = translator.translate(text, src='ar', dest='en')
    except AttributeError:
        translation = retry(lambda: translator.translate(text, src='ar',
                                                         dest='en'))

# Extracts the text
prompt = translation.text

# Generate the video frames from the English prompt.
# num_inference_steps sets the number of denoising steps;
# more steps take longer to run.
video_frames = pipe(prompt, num_inference_steps=100).frames

#Exports the generated video frames to a video file
video_path = export_to_video(video_frames)

#Loads the video file into a VideoFileClip object
clip = VideoFileClip(video_path)

#Displays the video
clip.ipython_display(width=300)


Enter the text كلب يرقص


Moviepy - Building video __temp__.mp4.
Moviepy - Writing video __temp__.mp4

Moviepy - Done !
Moviepy - video ready __temp__.mp4
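
The generation cell always translates with `src='ar'`, so a prompt that is already in English is also sent through the translator. One possible guard, sketched here under stated assumptions, is to translate only when the text contains Arabic script. The helper names `contains_arabic` and `to_english_prompt`, and the stand-in `fake_translate`, are hypothetical and not part of the notebook; the check covers only the basic Arabic Unicode block (U+0600 to U+06FF).

```python
def contains_arabic(text):
    """Return True if any character lies in the basic Arabic block (U+0600 to U+06FF)."""
    return any("\u0600" <= ch <= "\u06FF" for ch in text)


def to_english_prompt(text, translate):
    """Translate only when Arabic script is present; otherwise return text unchanged.

    translate: hypothetical callable that maps an Arabic string to English text.
    """
    if contains_arabic(text):
        return translate(text)
    return text


# Stand-in translator so the sketch runs without network access (assumption).
def fake_translate(text):
    return "a dog dancing"


arabic_prompt = to_english_prompt("كلب يرقص", fake_translate)
english_prompt = to_english_prompt("a cat surfing", fake_translate)
print(arabic_prompt)
print(english_prompt)
```

In the notebook, `translate` could be `lambda t: translator.translate(t, src='ar', dest='en').text`, with the result passed to `pipe` as the prompt.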


