# Enhancing Accessibility: Making Silent Films Accessible to the Blind

This notebook explores how to make silent or dialogue-free films accessible to blind and visually impaired audiences: a Gemini model describes the video scene by scene, and Google Text-to-Speech (gTTS) turns that description into spoken narration.


## Table of Contents
1. [Introduction](#Introduction)
2. [Installation of Required Libraries](#Installation-of-Required-Libraries)
3. [System Instructions](#System-Instructions)
4. [Analyzing Text Output from Video](#Analyzing-Text-Output-from-Video)
5. [Converting Text to Audio](#Converting-Text-to-Audio)
6. [Conclusion](#Conclusion)


## Installation of Required Libraries

In this section, we install the required libraries, chiefly gTTS (Google Text-to-Speech), which converts the generated text into audio.

In [1]:

Collecting gTTS
  Downloading gTTS-2.5.4-py3-none-any.whl.metadata (4.1 kB)
Collecting uuid
  Downloading uuid-1.30.tar.gz (5.8 kB)
  Preparing metadata (setup.py) ... done
Downloading gTTS-2.5.4-py3-none-any.whl (29 kB)
Building wheels for collected packages: uuid
  Building wheel for uuid (setup.py) ... done
  Created wheel for uuid: filename=uuid-1.30-py3-none-any.whl size=6479 sha256=3866c7bb18d787931c1c66a3bfff11aaaf8e147eaa60aedfc0f9b99f9a41633e
  Stored in directory: /root/.cache/pip/wheels/ed/08/9e/f0a977dfe55051a07e21af89200125d65f1efa60cbac61ed88
Successfully built uuid
Installing collected packages: uuid, gTTS
Successfully installed gTTS-2.5.4 uuid-1.30

## System Instructions

This section includes the core instructions or guidelines on how the system processes input video files to extract meaningful text.
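
As a hedged illustration, the guidelines for the describing model could be passed as a system instruction. The sketch below assumes a `google-generativeai` version whose `GenerativeModel` accepts a `system_instruction` argument. The instruction wording and the helper name `build_narration_model` are hypothetical and are not part of the original notebook.

```python
# Hypothetical instruction text: an assumption for illustration, not the author's prompt.
NARRATION_INSTRUCTION = (
    "You are an audio describer for blind and visually impaired viewers. "
    "Describe the video scene by scene, in chronological order, "
    "covering actions, characters, settings, and any on-screen text."
)


def build_narration_model(genai_module, model_name="gemini-1.5-pro"):
    """Create a generative model with the narration instruction attached.

    genai_module is expected to be the imported google.generativeai module.
    It is passed in as an argument so this sketch has no import-time
    dependency on the library.
    """
    return genai_module.GenerativeModel(
        model_name=model_name,
        system_instruction=NARRATION_INSTRUCTION,
    )
```

In this notebook, the helper would be called as `build_narration_model(genai)` after `genai.configure(...)` has run.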


## Analyzing Text Output from Video

Here, we examine the text description that the model generates from the video. This text serves as the script for the audio narration.

In [2]:
## Converting Text to Audio

Finally, the extracted text is converted into speech using gTTS, making it accessible to visually impaired individuals. The audio is saved as a file for playback.

### Model output
The text produced by the model's analysis of the video.

In [3]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
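
`kaggle_secrets` is only available inside Kaggle. For running the notebook elsewhere, a minimal sketch of a fallback is shown below. The secret label `"GEMINI API KEY"` matches the one this notebook passes to `get_secret`, while the helper name `load_api_key` and the environment variable name `GEMINI_API_KEY` are assumptions made for illustration.

```python
import os


def load_api_key(label: str = "GEMINI API KEY",
                 env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from Kaggle secrets, or from an environment variable.

    env_var is a hypothetical name chosen for this sketch; use whatever
    variable your own environment defines.
    """
    try:
        # Only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        key = os.environ.get(env_var)
    else:
        key = UserSecretsClient().get_secret(label)

    if not key:
        raise RuntimeError(
            f"No API key found: add a Kaggle secret named '{label}' "
            f"or set the {env_var} environment variable."
        )
    return key
```

The returned value can then be passed to `genai.configure(api_key=...)`.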

In [4]:
%cd /kaggle/working
!mkdir -p movie

/kaggle/working


In [5]:
import os
import time
import google.generativeai as genai
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()

# Read the Gemini API key from the Kaggle secret store
api_key = user_secrets.get_secret("GEMINI API KEY")

genai.configure(api_key=api_key)

def upload_to_gemini(path, mime_type=None):
  """Uploads the given file to Gemini.

  See https://ai.google.dev/gemini-api/docs/prompting_with_media
  """
  file = genai.upload_file(path, mime_type=mime_type)
  print(f"Uploaded file '{file.display_name}' as: {file.uri}")
  return file

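def wait_for_files_active(files):
  """Polls each uploaded file every 10 seconds until it leaves the PROCESSING state.

  Raises an exception if a file ends in any state other than ACTIVE.
  """
  print("Waiting for file processing...")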
  for name in (file.name for file in files):
    file = genai.get_file(name)
    while file.state.name == "PROCESSING":
      print(".", end="", flush=True)
      time.sleep(10)
      file = genai.get_file(name)
    if file.state.name != "ACTIVE":
      raise Exception(f"File {file.name} failed to process")
  print("...all files ready")
  print()

# Create the model
generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 40,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
  model_name="gemini-1.5-pro",
  generation_config=generation_config,
)

# TODO Make these files available on the local file system
# You may need to update the file paths
# Only the first entry is uploaded; the commented-out entries are alternative inputs.
files = [
    upload_to_gemini("/kaggle/input/pink-panther/001 The Pink Phink (Dec 18 1964).ia.mp4", mime_type="video/mp4"),
    # upload_to_gemini("American Museum of Natural History Tour - 5 Min", mime_type="video/mp4"),
    # upload_to_gemini("/kaggle/input/charlie-chaplin/A Woman of Paris (1923).mp4", mime_type="video/mp4"),
    # upload_to_gemini("/kaggle/input/charlie-chaplin/Charlie Chaplin In The Gold Rush 1925 Full Movie.mp4", mime_type="video/mp4"),
    # upload_to_gemini("/kaggle/input/charlie-chaplin/Charlie Chaplin _  Luzes da Cidade (City Lights) - 1931 - Legendado.mp4", mime_type="video/mp4"),
]

# Some files have a processing delay. Wait for them to be ready.
wait_for_files_active(files)

chat_session = model.start_chat(
  history=[
    {
      "role": "user",
      "parts": [
        files[0],
      ],
    },
  ]
)

response = chat_session.send_message("Describe the video in detail, scene by scene, be as expressive as possible")
print(response.text)

Uploaded file '001 The Pink Phink (Dec 18 1964).ia.mp4' as: https://generativelanguage.googleapis.com/v1beta/files/k67dtnj7qu8p
Waiting for file processing...
.....all files ready

The animated short begins with the appearance of “Mirisch Films Inc. Presents” on a black screen. A blush-toned prism appears and then expands, becoming lighter as it moves toward the viewer. This fades out to a pink background with the words, “Blake Edwards’ Pink Panther” superimposed. The Pink Panther sits upright in profile and blows on a long paintbrush, which emits puffs of bright yellow smoke. He winks his eye. 


The pink background changes to a bright purple square with the title, “The Pink Phink.” This fades into another strip of paper with the text “Pink Panther Theme by Henry Mancini.” Then, a similar rectangle displays, “Produced by David H. DePatie & Fritz Freleng.”


The Pink Panther strolls from the right of the frame, carrying a small paintbrush between his lips. He nonchalantly paints a stri
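
The description above arrives as paragraphs separated by blank lines, so one option is to split it into per-scene passages before narration, for example to narrate or review scenes individually. The sketch below assumes that blank lines mark scene boundaries, which the model does not guarantee. The function name `split_into_scenes` is hypothetical.

```python
import re


def split_into_scenes(description: str) -> list[str]:
    """Split a description into passages at blank lines.

    Whitespace inside each passage is collapsed to single spaces,
    and empty passages are dropped.
    """
    blocks = re.split(r"\n\s*\n", description)
    return [" ".join(block.split()) for block in blocks if block.strip()]
```

With this notebook's variables, `split_into_scenes(response.text)` would return one string per paragraph of the model output.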

### Text to audio

In [6]:
from gtts import gTTS
import uuid

def text_to_speech_file(text: str) -> str:
    """Convert text to speech with gTTS and save it as a uniquely named MP3."""
    tts = gTTS(text=text, lang='en', slow=False)

    # Kept as a global because a later cell reads save_file_path for playback.
    global save_file_path
    save_file_path = f"{uuid.uuid4()}.mp3"

    # gTTS.save() opens and writes the file itself, so no separate open() is needed.
    tts.save(save_file_path)

    print(f"{save_file_path}: A new audio file was saved successfully!")
    return save_file_path

In [7]:
text_to_speech_file(response.text)

05260a89-d972-4c3b-a134-e4229768150e.mp3: A new audio file was saved successfully!


'05260a89-d972-4c3b-a134-e4229768150e.mp3'

In [8]:
from IPython.display import Audio, display
# Play the MP3 written by text_to_speech_file (its path is stored in save_file_path)
audio = Audio(save_file_path)
display(audio)
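
Model output sometimes contains Markdown markers such as `**bold**`, headings, or bullets, which a text-to-speech engine may read aloud or stumble over. As a minimal sketch, and assuming such markers can appear in the description, the text could be cleaned before it is passed to `text_to_speech_file`. The function name `clean_for_narration` is hypothetical, and the patterns cover only a few common markers.

```python
import re


def clean_for_narration(text: str) -> str:
    """Strip a few common Markdown markers so they are not spoken aloud."""
    # Remove heading markers (#, ##, ...) and bullet markers at line starts.
    text = re.sub(r"^[ \t]*(?:#+|[-*•])[ \t]+", "", text, flags=re.MULTILINE)
    # Remove emphasis and inline-code markers.
    text = re.sub(r"\*\*|__|\*|`", "", text)
    # Collapse runs of spaces and tabs, leaving line breaks intact.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

It would be used as `text_to_speech_file(clean_for_narration(response.text))`.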

## Conclusion

This notebook demonstrates how silent films can be made accessible to blind and visually impaired audiences. A multimodal model describes the video as text, and a text-to-speech tool turns that description into audio narration. Combining video analysis with audio conversion in this way helps close the accessibility gap for films that rely entirely on visuals.