# Project

In this project, you will bring together many of the tools and techniques that you have learned throughout this course. You can choose from many different paths to reach a solution.

### Business scenario

You work for a training organization that recently developed an introductory course about machine learning (ML). The course includes more than 40 videos that cover a broad range of ML topics. You have been asked to create an application that students can use to quickly locate and view video content by searching for topics and key phrases.

You have downloaded all of the videos to an Amazon Simple Storage Service (Amazon S3) bucket. Your assignment is to produce a dashboard that meets your supervisor’s requirements.

## Project steps

To complete this project, you will follow these steps:

1. [Viewing the video files](#1.-Viewing-the-video-files)
2. [Transcribing the videos](#2.-Transcribing-the-videos)
3. [Normalizing the text](#3.-Normalizing-the-text)
4. [Extracting key phrases and topics](#4.-Extracting-key-phrases-and-topics)
5. [Creating the dashboard](#5.-Creating-the-dashboard)

## Useful information

The following cell contains some information that might be useful as you complete this project.

In [7]:
bucket = "c56161a939430l3396553t1w744137092661-labbucket-rn642jaq01e9"
job_data_access_role = 'arn:aws:iam::744137092661:role/service-role/c56161a939430l3396553t1w7-ComprehendDataAccessRole-1P24MSS91ADHP'

## 1. Viewing the video files
([Go to top](#Project))


The source video files are located in the following shared Amazon Simple Storage Service (Amazon S3) bucket.

In [8]:
!aws s3 ls s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/



2021-04-26 20:17:33  410925369 Mod01_Course Overview.mp4
2021-04-26 20:10:02   39576695 Mod02_Intro.mp4
2021-04-26 20:31:23  302994828 Mod02_Sect01.mp4
2021-04-26 20:17:33  416563881 Mod02_Sect02.mp4
2021-04-26 20:17:33  318685583 Mod02_Sect03.mp4
2021-04-26 20:17:33  255877251 Mod02_Sect04.mp4
2021-04-26 20:23:51   99988046 Mod02_Sect05.mp4
2021-04-26 20:24:54   50700224 Mod02_WrapUp.mp4
2021-04-26 20:26:27   60627667 Mod03_Intro.mp4
2021-04-26 20:26:28  272229844 Mod03_Sect01.mp4
2021-04-26 20:27:06  309127124 Mod03_Sect02_part1.mp4
2021-04-26 20:27:06  195635527 Mod03_Sect02_part2.mp4
2021-04-26 20:28:03  123924818 Mod03_Sect02_part3.mp4
2021-04-26 20:31:28  171681915 Mod03_Sect03_part1.mp4
2021-04-26 20:32:07  285200083 Mod03_Sect03_part2.mp4
2021-04-26 20:33:17  105470345 Mod03_Sect03_part3.mp4
2021-04-26 20:35:10  157185651 Mod03_Sect04_part1.mp4
2021-04-26 20:36:27  187435635 Mod03_Sect04_part2.mp4
2021-04-26 20:36:40  280720369 Mod03_Sect04_part3.mp4
2021-04-26 20:40:01  443479
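
Each object line in the listing above appears to hold four fields: date, time, size in bytes, and object key. The following is a minimal sketch, assuming that layout, for turning such a line into a record. The names `S3Entry` and `parse_s3_ls_line` are hypothetical helpers, not part of the AWS CLI.

```python
from typing import NamedTuple


class S3Entry(NamedTuple):
    date: str
    time: str
    size_bytes: int
    key: str


def parse_s3_ls_line(line: str) -> S3Entry:
    """Parse one object line of `aws s3 ls` output (date, time, size, key)."""
    # maxsplit=3 keeps any spaces inside the object key intact.
    date, time, size, key = line.strip().split(maxsplit=3)
    return S3Entry(date, time, int(size), key)


entry = parse_s3_ls_line("2021-04-26 20:17:33  410925369 Mod01_Course Overview.mp4")
print(entry.key, round(entry.size_bytes / 2**20, 1), "MiB")
```

Splitting at most three times matters here because at least one key (`Mod01_Course Overview.mp4`) contains a space.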

## 2. Transcribing the videos
([Go to top](#Project))

Use this section to implement your solution to transcribe the videos. 

In [9]:
import boto3

# Initialize S3 client
s3_client = boto3.client('s3')

# Your S3 bucket name
bucket = "arya-geetha-nlp"

# List all video files
response = s3_client.list_objects_v2(Bucket=bucket)

# Extract filenames of all videos
if 'Contents' in response:
    video_files = [file['Key'] for file in response['Contents'] if file['Key'].endswith('.mp4')]
    print(f"Found {len(video_files)} videos:", video_files)
else:
    print("No video files found in the S3 bucket.")


Found 46 videos: ['Mod01_Course Overview.mp4', 'Mod02_Intro.mp4', 'Mod02_Sect01.mp4', 'Mod02_Sect02.mp4', 'Mod02_Sect03.mp4', 'Mod02_Sect04.mp4', 'Mod02_Sect05.mp4', 'Mod02_WrapUp.mp4', 'Mod03_Intro.mp4', 'Mod03_Sect01.mp4', 'Mod03_Sect02_part1.mp4', 'Mod03_Sect02_part2.mp4', 'Mod03_Sect02_part3.mp4', 'Mod03_Sect03_part1.mp4', 'Mod03_Sect03_part2.mp4', 'Mod03_Sect03_part3.mp4', 'Mod03_Sect04_part1.mp4', 'Mod03_Sect04_part2.mp4', 'Mod03_Sect04_part3.mp4', 'Mod03_Sect05.mp4', 'Mod03_Sect06.mp4', 'Mod03_Sect07_part1.mp4', 'Mod03_Sect07_part2.mp4', 'Mod03_Sect07_part3.mp4', 'Mod03_Sect08.mp4', 'Mod03_WrapUp.mp4', 'Mod04_Intro.mp4', 'Mod04_Sect01.mp4', 'Mod04_Sect02_part1.mp4', 'Mod04_Sect02_part2.mp4', 'Mod04_Sect02_part3.mp4', 'Mod04_WrapUp.mp4', 'Mod05_Intro.mp4', 'Mod05_Sect01_ver2.mp4', 'Mod05_Sect02_part1_ver2.mp4', 'Mod05_Sect02_part2.mp4', 'Mod05_Sect03_part1.mp4', 'Mod05_Sect03_part2.mp4', 'Mod05_Sect03_part3.mp4', 'Mod05_Sect03_part4_ver2.mp4', 'Mod05_WrapUp_ver2.mp4', 'Mod06_Intr
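
A single `list_objects_v2` call returns at most 1,000 keys, which is enough for the 46 videos here. For larger buckets, a paginator can follow the continuation tokens. The sketch below assumes a boto3 S3 client is passed in; the helper name `list_video_keys` is hypothetical.

```python
def list_video_keys(s3_client, bucket_name, suffix=".mp4"):
    """Return all object keys ending with `suffix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket_name):
        # "Contents" is absent from a page that holds no objects.
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith(suffix):
                keys.append(obj["Key"])
    return keys


# Usage in the notebook (requires AWS credentials):
# import boto3
# video_files = list_video_keys(boto3.client("s3"), "arya-geetha-nlp")
```

Passing the client as an argument keeps the function easy to exercise with a stub object when no AWS access is available.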

In [10]:
!conda install -c conda-forge ffmpeg -y




Retrieving notices: done
Channels:
 - conda-forge
 - nvidia
 - pytorch
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done


    current version: 24.11.3
    latest version: 25.1.1

Please update conda by running

    $ conda update -n base -c conda-forge conda



## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/python3

  added / updated specs:
    - ffmpeg


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    aws-c-auth-0.8.0           |      hb921021_15         105 KB  conda-forge
    aws-c-s3-0.7.7             |       hf454442_0         111 KB  conda-forge
    aws-c-sdkutils-0.2.1       |       h4e1184b_4          55 KB  conda-forge
    aws-crt-cpp-0.29.7         |       hd92328a_7         346 KB  conda-forge
    aws-sdk-cpp-1.11.458       |       hc430e4a_4         2.9 MB  conda-forge
    ffmpeg-7.1.0               | gpl_h4c1

In [11]:
!pip install openai-whisper torch






In [12]:
# Generate temporary links for each video
presigned_urls = {}
for video in video_files:
    url = s3_client.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': video},
        ExpiresIn=86400  # Link valid for 24 hours (86,400 seconds)
    )
    presigned_urls[video] = url

print("Generated pre-signed URLs for videos.")


Generated pre-signed URLs for videos.


In [13]:
import os
import boto3

def download_mp4_files_from_s3(bucket_name, local_directory):
    """
    Downloads all .mp4 files from the specified S3 bucket to a local directory.

    Args:
        bucket_name (str): Name of the S3 bucket.
        local_directory (str): Local directory to save the downloaded files.
    """
    # Initialize the S3 client
    s3_client = boto3.client('s3')

    # Create the local directory if it doesn't exist
    if not os.path.exists(local_directory):
        os.makedirs(local_directory)
        print(f"Created local directory: {local_directory}")

    try:
        # List all objects in the S3 bucket
        bucket_objects = s3_client.list_objects_v2(Bucket=bucket_name)

        # Check if the bucket contains any objects
        if 'Contents' not in bucket_objects:
            print(f"The bucket '{bucket_name}' is empty or does not exist.")
            return

        # Iterate through each object in the bucket
        for obj in bucket_objects['Contents']:
            file_key = obj['Key']

            # Check if the file is a .mp4 file
            if file_key.lower().endswith('.mp4'):
                # Construct the local file path
                file_name = os.path.basename(file_key)
                local_file_path = os.path.join(local_directory, file_name)

                # Download the file
                s3_client.download_file(bucket_name, file_key, local_file_path)
                print(f"Downloaded '{file_key}' to '{local_file_path}'")

    except Exception as e:
        print(f"An error occurred while downloading files: {e}")

# Example usage
if __name__ == "__main__":
    # Specify the S3 bucket name and local directory
    s3_bucket = 'arya-geetha-nlp'
    local_folder = 'videos'

    # Call the function to download .mp4 files
    download_mp4_files_from_s3(s3_bucket, local_folder)

Downloaded 'Mod01_Course Overview.mp4' to 'videos/Mod01_Course Overview.mp4'
Downloaded 'Mod02_Intro.mp4' to 'videos/Mod02_Intro.mp4'
Downloaded 'Mod02_Sect01.mp4' to 'videos/Mod02_Sect01.mp4'
Downloaded 'Mod02_Sect02.mp4' to 'videos/Mod02_Sect02.mp4'
Downloaded 'Mod02_Sect03.mp4' to 'videos/Mod02_Sect03.mp4'
Downloaded 'Mod02_Sect04.mp4' to 'videos/Mod02_Sect04.mp4'
Downloaded 'Mod02_Sect05.mp4' to 'videos/Mod02_Sect05.mp4'
Downloaded 'Mod02_WrapUp.mp4' to 'videos/Mod02_WrapUp.mp4'
Downloaded 'Mod03_Intro.mp4' to 'videos/Mod03_Intro.mp4'
Downloaded 'Mod03_Sect01.mp4' to 'videos/Mod03_Sect01.mp4'
Downloaded 'Mod03_Sect02_part1.mp4' to 'videos/Mod03_Sect02_part1.mp4'
Downloaded 'Mod03_Sect02_part2.mp4' to 'videos/Mod03_Sect02_part2.mp4'
Downloaded 'Mod03_Sect02_part3.mp4' to 'videos/Mod03_Sect02_part3.mp4'
Downloaded 'Mod03_Sect03_part1.mp4' to 'videos/Mod03_Sect03_part1.mp4'
Downloaded 'Mod03_Sect03_part2.mp4' to 'videos/Mod03_Sect03_part2.mp4'
Downloaded 'Mod03_Sect03_part3.mp4' to 'v

In [14]:
s3_uri = os.path.join("videos")
output_uri = os.path.join("transcribed")

In [16]:
import os
import subprocess
import whisper
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

class VideoTranscriber:
    def __init__(self, model_name="base"):
        """Initializes the VideoTranscriber with a Whisper model."""
        self.model = whisper.load_model(model_name)
        logging.info(f"Loaded Whisper model: {model_name}")

    def extract_audio(self, video_path, audio_path):
        """Extracts audio from a video file using ffmpeg."""
        try:
            # Global options such as -y (overwrite existing output) are placed first
            command = [
                'ffmpeg', '-y', '-i', video_path, '-q:a', '0', '-map', 'a', audio_path
            ]
            subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
            logging.info(f"Extracted audio from {video_path} to {audio_path}")
            return audio_path
        except subprocess.CalledProcessError as e:
            logging.error(f"Failed to extract audio from {video_path}: {e}")
            return None

    def transcribe_audio(self, audio_path):
        """Transcribes an audio file using Whisper."""
        try:
            result = self.model.transcribe(audio_path)
            logging.info(f"Transcribed audio from {audio_path}")
            return result["text"]
        except Exception as e:
            logging.error(f"Failed to transcribe audio from {audio_path}: {e}")
            return None

    def process_video(self, video_path):
        """Processes a video file by extracting audio and transcribing it."""
        audio_path = video_path.rsplit('.', 1)[0] + ".mp3"
        if self.extract_audio(video_path, audio_path):
            transcript = self.transcribe_audio(audio_path)
            if transcript:
                return transcript
        return None

def main():
    video_dir = "videos"  # Directory containing video files
    transcriber = VideoTranscriber(model_name="base")
    transcripts = {}

    # Process each video file in the directory
    for video_file in os.listdir(video_dir):
        if video_file.endswith(('.mp4', '.mkv', '.avi', '.mov')):  # Add more formats if needed
            video_path = os.path.join(video_dir, video_file)
            logging.info(f"Processing {video_file}...")
            transcript = transcriber.process_video(video_path)
            if transcript:
                transcripts[video_file] = transcript
            else:
                logging.warning(f"Skipped {video_file} due to errors.")

    # Save all transcripts to a text file
    with open("transcriptions.txt", "w", encoding="utf-8") as f:
        for video, text in transcripts.items():
            f.write(f"{video}\n{text}\n\n")

    logging.info("Transcription complete. Saved to transcriptions.txt.")

if __name__ == "__main__":
    main()

2025-03-15 00:16:42,078 - INFO - Loaded Whisper model: base
2025-03-15 00:16:42,079 - INFO - Processing Mod04_Intro.mp4...
2025-03-15 00:16:43,061 - INFO - Extracted audio from videos/Mod04_Intro.mp4 to videos/Mod04_Intro.mp3
2025-03-15 00:16:56,249 - INFO - Transcribed audio from videos/Mod04_Intro.mp3
2025-03-15 00:16:56,250 - INFO - Processing Mod03_Intro.mp4...
2025-03-15 00:16:58,312 - INFO - Extracted audio from videos/Mod03_Intro.mp4 to videos/Mod03_Intro.mp3
2025-03-15 00:17:22,755 - INFO - Transcribed audio from videos/Mod03_Intro.mp3
2025-03-15 00:17:22,757 - INFO - Processing Mod06_Intro.mp4...
2025-03-15 00:17:23,979 - INFO - Extracted audio from videos/Mod06_Intro.mp4 to videos/Mod06_Intro.mp3
2025-03-15 00:17:36,802 - INFO - Transcribed audio from videos/Mod06_Intro.mp3
2025-03-15 00:17:36,804 - INFO - Processing Mod02_Intro.mp4...
2025-03-15 00:17:38,345 - INFO - Extracted audio from videos/Mod02_Intro.mp4 to videos/Mod02_Intro.mp3
2025-03-15 00:17:54,500 - INFO - Transc
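
The log above shows the videos being processed in the order that `os.listdir` returns them, which is arbitrary. The following is a small sketch, assuming a reproducible order is wanted, that collects the video files sorted by name and derives the audio path with `pathlib`. The helper names `find_videos` and `audio_path_for` are hypothetical.

```python
import tempfile
from pathlib import Path

VIDEO_SUFFIXES = {".mp4", ".mkv", ".avi", ".mov"}


def find_videos(video_dir):
    """Return the video files in `video_dir`, sorted by name."""
    return sorted(
        path for path in Path(video_dir).iterdir()
        if path.is_file() and path.suffix.lower() in VIDEO_SUFFIXES
    )


def audio_path_for(video_path):
    """Return the .mp3 path that sits next to the given video file."""
    return Path(video_path).with_suffix(".mp3")


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Mod04_Intro.mp4", "Mod02_Intro.mp4", "notes.txt"]:
        (Path(tmp) / name).touch()
    names = [path.name for path in find_videos(tmp)]

print(names)
print(audio_path_for("videos/Mod04_Intro.mp4"))
```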

In [17]:
import csv

# Path to the input text file and output CSV file
input_file = "transcriptions.txt"
output_file = "transcriptions.csv"

# Open the input text file and output CSV file
with open(input_file, "r", encoding="utf-8") as txt_file, open(output_file, mode="w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    
    # Write the header row
    writer.writerow(["video", "transcription"])
    
    # Read the text file line by line
    video_name = None
    transcription = []
    
    for line in txt_file:
        line = line.strip()  # Remove leading/trailing whitespace
        
        # Check if the line is a video name
        if line.endswith((".mp4", ".mkv", ".avi", ".mov")):
            # If we have a previous video, write it to the CSV
            if video_name and transcription:
                writer.writerow([video_name, " ".join(transcription)])
                transcription = []  # Reset transcription for the next video
            
            # Set the current video name
            video_name = line
        
        # Otherwise, add the line to the transcription
        else:
            transcription.append(line)
    
    # Write the last video and transcription to the CSV
    if video_name and transcription:
        writer.writerow([video_name, " ".join(transcription)])

print(f"✅ Converted {input_file} to {output_file}")

✅ Converted transcriptions.txt to transcriptions.csv
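
The conversion above detects record boundaries by looking for lines that end in a video extension, so a transcript line that happens to end in `.mp4` would be misread as a file name. One possible alternative, sketched here under the assumption that the `{video: transcript}` dictionary is still in memory, is to write the CSV file directly. The function name, sample data, and output file name are hypothetical.

```python
import csv


def write_transcripts_csv(transcripts, csv_path):
    """Write a {video_name: transcript} mapping to a two-column CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["video", "transcription"])
        for video, text in transcripts.items():
            # Collapse newlines and repeated spaces into single spaces.
            writer.writerow([video, " ".join(text.split())])


# Hypothetical sample data, for illustration only.
sample = {"Mod02_Intro.mp4": "Welcome to module two.\nLet's begin."}
write_transcripts_csv(sample, "sample_transcriptions.csv")
```

This skips the intermediate text file, so no parsing step is needed to recover the video names.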


## 3. Normalizing the text
([Go to top](#Project))

Use this section to perform any text normalization steps that are necessary for your solution.

In [44]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import re
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import nltk


# Download NLTK data (run once)
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download('punkt_tab')

nltk.download("averaged_perceptron_tagger")  # For POS tagging

# Initialize lemmatizer
lemmatizer = WordNetLemmatizer()

# Normalization function
def normalize_text(text):
    """
    Normalizes text by:
    1. Converting to lowercase
    2. Removing punctuation
    3. Tokenizing
    4. Removing stopwords
    5. Lemmatizing words
    6. Removing extra whitespace
    """
    # Convert to lowercase
    text = text.lower()
    
    # Remove punctuation
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)
    
    # Tokenize the text
    tokens = word_tokenize(text)
    
    # Remove stopwords and lemmatize
    stop_words = set(stopwords.words("english"))
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]
    
    # Join tokens back into a string
    text = " ".join(tokens)
    
    # Remove extra whitespace
    text = " ".join(text.split())
    
    return text

# Path to the input CSV file and output CSV file
input_csv = "transcriptions.csv"
output_csv = "normalized_transcriptions.csv"

# Read the CSV file
df = pd.read_csv(input_csv)

# Normalize the 'transcription' column
df["transcription"] = df["transcription"].apply(normalize_text)

# Save the normalized data to a new CSV file
df.to_csv(output_csv, index=False)

print("Normalized input!")

[nltk_data] Downloading package punkt to /home/ec2-user/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/ec2-user/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /home/ec2-user/nltk_data...
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/ec2-user/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/ec2-user/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


Normalized input!
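
To make the effect of the normalization steps concrete, here is a simplified, dependency-free sketch of the same pipeline applied to one sentence. It uses a tiny hand-picked stopword set instead of the NLTK list and omits lemmatization, so its output will not match `normalize_text` exactly. The sample sentence is made up for illustration.

```python
import re
import string

# Tiny illustrative stopword set; the notebook uses NLTK's full English list.
STOP_WORDS = {"the", "is", "a", "of", "to", "and", "in", "this", "we", "will"}


def normalize_simple(text):
    """Lowercase, strip punctuation, drop stopwords, and collapse whitespace."""
    text = text.lower()
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)
    tokens = [word for word in text.split() if word not in STOP_WORDS]
    return " ".join(tokens)


print(normalize_simple("In this module, we will discuss the basics of Machine Learning!"))
# prints: module discuss basics machine learning
```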


## 4. Extracting key phrases and topics
([Go to top](#Project))

Use this section to extract the key phrases and topics from the videos.

In [22]:
pip install rake-nltk



Collecting rake-nltk
  Downloading rake_nltk-1.0.6-py3-none-any.whl.metadata (6.4 kB)
Downloading rake_nltk-1.0.6-py3-none-any.whl (9.1 kB)
Installing collected packages: rake-nltk
Successfully installed rake-nltk-1.0.6
Note: you may need to restart the kernel to use updated packages.


In [24]:
import pandas as pd
from rake_nltk import Rake

# Load the normalized transcriptions
df = pd.read_csv("normalized_transcriptions.csv")

# Initialize RAKE
rake = Rake()

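# Extract key phrases for each transcription
def extract_key_phrases(text):
    """Returns RAKE's ranked phrases for one transcript (empty list if the text is missing)."""
    if not isinstance(text, str):
        return []
    rake.extract_keywords_from_text(text)  # Updates the Rake object in place and returns None
    return rake.get_ranked_phrases()

df["key_phrases"] = df["transcription"].apply(extract_key_phrases)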

# Save the updated CSV file
df.to_csv("transcriptions_with_key_phrases.csv", index=False)
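
Because `to_csv` stores each list in `key_phrases` as its string representation, reading the file back yields strings such as `"['a', 'b']"` rather than lists. The following sketch, assuming the cells were written from Python lists of strings, converts a cell back into a list. The helper name `parse_key_phrases` is hypothetical.

```python
import ast


def parse_key_phrases(cell):
    """Convert a CSV cell such as "['a', 'b']" back into a list of strings."""
    if not isinstance(cell, str) or not cell.strip():
        return []  # Missing values (for example NaN) become an empty list.
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return [cell]  # Not a Python literal: treat the cell as one phrase.
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]


print(parse_key_phrases("['machine learning', 'training data']"))
# prints: ['machine learning', 'training data']
```

After `pd.read_csv`, this could be applied with `df["key_phrases"].apply(parse_key_phrases)` wherever real lists are needed. Code that searches the column with `.str` methods expects the original strings.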

## 5. Creating the dashboard
([Go to top](#Project))

Use this section to create the dashboard for your solution.

In [1]:
pip install dash pandas

Collecting dash
  Downloading dash-2.18.2-py3-none-any.whl.metadata (10 kB)
Collecting Flask<3.1,>=1.0.4 (from dash)
  Downloading flask-3.0.3-py3-none-any.whl.metadata (3.2 kB)
Collecting Werkzeug<3.1 (from dash)
  Downloading werkzeug-3.0.6-py3-none-any.whl.metadata (3.7 kB)
Collecting dash-html-components==2.0.0 (from dash)
  Downloading dash_html_components-2.0.0-py3-none-any.whl.metadata (3.8 kB)
Collecting dash-core-components==2.0.0 (from dash)
  Downloading dash_core_components-2.0.0-py3-none-any.whl.metadata (2.9 kB)
Collecting dash-table==5.0.0 (from dash)
  Downloading dash_table-5.0.0-py3-none-any.whl.metadata (2.4 kB)
Collecting retrying (from dash)
  Downloading retrying-1.3.4-py3-none-any.whl.metadata (6.9 kB)
Downloading dash-2.18.2-py3-none-any.whl (7.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 92.2 MB/s eta 0:00:00
Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)
Downloading dash_html_compo

In [7]:
pip install jupyter-dash

Collecting jupyter-dash
  Downloading jupyter_dash-0.4.2-py3-none-any.whl.metadata (3.6 kB)
Collecting ansi2html (from jupyter-dash)
  Downloading ansi2html-1.9.2-py3-none-any.whl.metadata (3.7 kB)
Downloading jupyter_dash-0.4.2-py3-none-any.whl (23 kB)
Downloading ansi2html-1.9.2-py3-none-any.whl (17 kB)
Installing collected packages: ansi2html, jupyter-dash
Successfully installed ansi2html-1.9.2 jupyter-dash-0.4.2
Note: you may need to restart the kernel to use updated packages.


In [45]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output
import os

def load_data():
    try:
        return pd.read_csv('transcriptions_with_key_phrases.csv')
    except Exception as e:
        print(f"Error loading data: {e}")
        # Return empty DataFrame with expected columns if file not found
        return pd.DataFrame(columns=['video', 'transcription', 'key_phrases'])

# search function
def search_videos(df, query):
    if not query:
        return df
    
    query = query.lower()
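    # regex=False matches the query literally, so input such as "c++" or "(" does not raise an error
    mask = (
        df['video'].str.lower().str.contains(query, na=False, regex=False) | 
        df['key_phrases'].str.lower().str.contains(query, na=False, regex=False)
    )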
    return df[mask]

# display a video item
def display_video_item(video, on_view_details):
    # Create a button for viewing details
    view_details_button = widgets.Button(
        description='View Details',
        button_style='primary',
        layout=widgets.Layout(width='100px')
    )
    
    # Set the button's click handler
    def on_view_details_clicked(b):
        on_view_details(video['video'])
    
    view_details_button.on_click(on_view_details_clicked)
    
    # Create a widget for the video item
    video_item = widgets.VBox([
        widgets.HTML(f"<h3 style='color: #2c3e50;'>{video['video']}</h3>"),
        widgets.HTML(f"<p style='color: #34495e; max-height: 80px; overflow: hidden;'>{video['key_phrases'][:200]}...</p>"),
        view_details_button
    ], layout=widgets.Layout(
        border='1px solid #ddd',
        border_radius='8px',
        padding='15px',
        margin='10px 0',
        background_color='#f9f9f9'
    ))
    
    return video_item

#display video details
def display_video_details(video):
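    # Path to the local video folder (used only to check that the file was downloaded)
    video_folder = './videos'  # Replace with the path to your video folder
    video_file = os.path.join(video_folder, video['video'])
    # Percent-encode the file name so that names containing spaces produce a valid URL
    from urllib.parse import quote
    video_url = "https://arya-geetha-nlp.s3.us-east-1.amazonaws.com/" + quote(video['video'])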
    
    # Check if the video file exists
    if not os.path.exists(video_file):
        video_html = "<p style='color: red;'>Video file not found.</p>"
    else:
        # Embed the video player
        video_html = f"""
        <div style="border: 1px solid #ddd; border-radius: 8px; padding: 20px; background-color: #fff;">
            <h2 style="color: #2c3e50; margin-bottom: 15px;">{video['video']}</h2>
            
            <div style="background-color: #ecf0f1; height: 240px; display: flex; align-items: center; 
                        justify-content: center; margin-bottom: 20px; border-radius: 8px;">
                <video width="100%" height="240" controls>
<source src="{video_url}" type="video/mp4">
                </video>
            </div>
            
            <div style="display: flex; justify-content: space-between; color: #7f8c8d; 
                        font-size: 14px; margin-bottom: 20px;">
                <div>Video ID: {video_url}</div>
            </div>
            
            <h3 style="color: #2c3e50; margin-bottom: 10px;">Key phrases</h3>
            <div style="background-color: #f8f9fa; padding: 15px; border-radius: 8px; 
                        max-height: 200px; overflow-y: auto; font-size: 14px; line-height: 1.5;">
                {video['key_phrases']}
            </div>
            
            <div style="margin-top: 20px;">
               <!-- <button class="back-btn" 
                        style="background-color: #95a5a6; color: white; border: none; 
                               padding: 8px 15px; border-radius: 4px; cursor: pointer;">
                    Back to Results
                </button> -->
                <button style="background-color: #3498db; color: white; border: none; 
                               padding: 8px 15px; border-radius: 4px; cursor: pointer; margin-left: 10px;">
                    Download Transcript
                </button>
            </div>
        </div>
        """
    
    return widgets.HTML(video_html)

# dashboard
def create_dashboard():
    # Load data
    df = load_data()
    
    # Create search widget
    search_input = widgets.Text(
        value='',
        placeholder='Search videos by content or title...',
        description='Search:',
        layout=widgets.Layout(width='70%')
    )
    
    search_button = widgets.Button(
        description='Search',
        button_style='primary',
        layout=widgets.Layout(width='100px')
    )
    
    results_output = widgets.Output()
    details_output = widgets.Output()
    
    # Function to show results
    def show_results(df):
        results_output.clear_output()
        with results_output:
            if df.empty:
                display(widgets.HTML("<p style='color: #7f8c8d;'>No videos found matching your search.</p>"))
            else:
                display(widgets.HTML(f"<h2>Search Results ({len(df)} videos)</h2>"))
                for _, video in df.iterrows():
                    display(display_video_item(video, show_video_details))
    
    # Function to show video details
    def show_video_details(video_id):
        results_output.layout.display = 'none'
        details_output.layout.display = ''
        details_output.clear_output()
        with details_output:
            video = df[df['video'] == video_id].iloc[0]
            display(display_video_details(video))
            
            # Add a back button
            back_button = widgets.Button(
                description='Back to Results',
                button_style='warning',
                layout=widgets.Layout(width='150px')
            )
            
            def on_back_button_clicked(b):
                results_output.layout.display = ''
                details_output.layout.display = 'none'
                show_results(df)
            
            back_button.on_click(on_back_button_clicked)
            display(back_button)
    
    # Search button click
    def on_search_button_clicked(b):
        query = search_input.value
        results = search_videos(df, query)
        show_results(results)
    
    search_button.on_click(on_search_button_clicked)
    
    # Search on enter key
    def on_enter(sender):
        if sender.value:
            on_search_button_clicked(None)
    
    search_input.on_submit(on_enter)
    
    # Layout
    header = widgets.HTML(
        value="<h1 style='color: #2c3e50;'>AWS Video Search Dashboard</h1>"
    )
    
    search_box = widgets.HBox([search_input, search_button])
    
    # Initialize with all results
    show_results(df)
    
    # Initially hide details panel
    details_output.layout.display = 'none'
    
    # Display dashboard
    display(header)
    display(search_box)
    display(results_output)
    display(details_output)

# Run the dashboard
create_dashboard()

HTML(value="<h1 style='color: #2c3e50;'>AWS Video Search Dashboard</h1>")

HBox(children=(Text(value='', description='Search:', layout=Layout(width='70%'), placeholder='Search videos by…

Output()

Output(layout=Layout(display='none'))
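
The dashboard interpolates file names and key phrases directly into HTML strings. The following is a hedged sketch of how those values could be escaped first, and how the video URL could be percent-encoded, using only the standard library. The base URL is the one used in the cell above, and the sketch assumes the objects are publicly readable. The helper names and the second sample file name are hypothetical.

```python
import html
from urllib.parse import quote

# Base URL taken from the dashboard cell; assumes publicly readable objects.
BASE_URL = "https://arya-geetha-nlp.s3.us-east-1.amazonaws.com/"


def video_source_tag(video_name, base_url=BASE_URL):
    """Build a <source> tag with a percent-encoded, HTML-escaped URL."""
    url = base_url + quote(video_name)  # Spaces become %20; "/" is kept.
    return f'<source src="{html.escape(url, quote=True)}" type="video/mp4">'


def video_title_html(video_name):
    """Escape the file name before placing it inside an HTML heading."""
    return f"<h3>{html.escape(video_name)}</h3>"


print(video_source_tag("Mod01_Course Overview.mp4"))
print(video_title_html("Mod02 <Intro> & Overview.mp4"))
```

Escaping matters mostly for text that the author does not control, such as transcript-derived key phrases, which could contain characters like `<` or `&`.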

In [1]:
!pip install flask



In [31]:
!python appF.py

Flask app is ready to run!
1. Make sure your CSV file is named 'transcriptions_with_key_phrases.csv' in the same directory
2. Run the app with: python appF.py
3. Access the dashboard at: http://127.0.0.1:8080/
 * Serving Flask app 'appF'
 * Debug mode: on
 * Running on http://127.0.0.1:8080
Press CTRL+C to quit
 * Restarting with watchdog (inotify)
Flask app is ready to run!
1. Make sure your CSV file is named 'transcriptions_with_key_phrases.csv' in the same directory
2. Run the app with: python appF.py
3. Access the dashboard at: http://127.0.0.1:8080/
 * Debugger is active!
 * Debugger PIN: 102-876-032
Loading data from transcriptions_with_key_phrases.csv...
Successfully loaded 46 rows
Columns: ['video', 'transcription', 'key_phrases']

Sample video names:
0           Mod04_Intro.mp4
1           Mod03_Intro.mp4
2           Mod06_Intro.mp4
3           Mod02_Intro.mp4
4    Mod03_Sect07_part1.mp4
Name: video, dtype: object

Unique video prefixes:
['Mod04' 'Mod03' 'Mod06' 'Mod0

In [30]:
!python dashboard.py

Flask app is ready to run!
1. Make sure your CSV file is named 'transcriptions_with_key_phrases.csv' in the same directory
2. Run the app with: python appF.py
3. Access the dashboard at: http://127.0.0.1:8080/
 * Serving Flask app 'dashboard'
 * Debug mode: on
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8080
 * Running on http://172.16.75.118:8080
Press CTRL+C to quit
 * Restarting with watchdog (inotify)
Flask app is ready to run!
1. Make sure your CSV file is named 'transcriptions_with_key_phrases.csv' in the same directory
2. Run the app with: python appF.py
3. Access the dashboard at: http://127.0.0.1:8080/
 * Debugger is active!
 * Debugger PIN: 102-876-032
Loading data from transcriptions_with_key_phrases.csv...
Successfully loaded 46 rows
Columns: ['video', 'transcription', 'key_phrases']

Sample video names:
0           Mod04_Intro.mp4
1           Mod03_Intro.mp4
2           Mod06_Intro.mp4
3           Mod02_Intro.mp4
4    Mod03_Sect07_part1.mp

In [None]:
!python appF.py

Flask app is ready to run!
1. Make sure your CSV file is named 'transcriptions_with_key_phrases.csv' in the same directory
2. Run the app with: python appF.py
3. Access the dashboard at: http://127.0.0.1:8080/
 * Serving Flask app 'appF'
 * Debug mode: on
 * Running on http://127.0.0.1:8080
Press CTRL+C to quit
 * Restarting with watchdog (inotify)
Flask app is ready to run!
1. Make sure your CSV file is named 'transcriptions_with_key_phrases.csv' in the same directory
2. Run the app with: python appF.py
3. Access the dashboard at: http://127.0.0.1:8080/
 * Debugger is active!
 * Debugger PIN: 295-155-872
Loading data from transcriptions_with_key_phrases.csv...
Successfully loaded 46 rows
Columns: ['video', 'transcription', 'key_phrases']

Sample video names:
0           Mod04_Intro.mp4
1           Mod03_Intro.mp4
2           Mod06_Intro.mp4
3           Mod02_Intro.mp4
4    Mod03_Sect07_part1.mp4
Name: video, dtype: object

Unique video prefixes:
['Mod04' 'Mod03' 'Mod06' 'Mod0