# YouTube Video Summarizer with MLflow Integration

This notebook demonstrates how to:
1. Create a YouTube video summarization chain
2. Track the chain and prompts using MLflow
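3. Load and use the tracked model
4. Monitor latency, resource usage, and text metrics
5. Evaluate summaries with ROUGE, BLEU, and an LLM-based score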

## Setup and Requirements

In Databricks:

In [1]:
%pip install youtube-transcript-api groq python-dotenv rouge-score psutil plotly nltk

Note: you may need to restart the kernel to use updated packages.


Locally:

In [2]:
!pip install youtube-transcript-api groq python-dotenv mlflow pandas numpy matplotlib rouge-score psutil plotly nltk -q

In [3]:
import mlflow
from youtube_transcript_api import YouTubeTranscriptApi
import groq
import os
from dotenv import load_dotenv
from urllib.parse import urlparse, parse_qs
import json
from typing import Dict, Any, List
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

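# Detect Databricks via the environment variable set by the Databricks runtime.
# In Databricks notebooks, `dbutils` is already available as a global, so it is not imported here.
IS_DATABRICKS = "DATABRICKS_RUNTIME_VERSION" in os.environ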

from pathlib import Path
import tempfile


# Load environment variables
load_dotenv()

True
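
Because the Groq client is only created when `summarize_text` runs, a missing API key surfaces late. Below is a minimal sketch of an early check for the local case, assuming the key lives in `GROQ_API_KEY` as it does in this notebook. `require_env` is a hypothetical helper, not part of any library, and the demo uses a throwaway variable so it runs anywhere.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file")
    return value


# Demonstration with a throwaway variable (hypothetical name) instead of a real key
os.environ["DEMO_API_KEY"] = "dummy-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook itself the call would be `require_env("GROQ_API_KEY")`, placed right after `load_dotenv()`.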

## Define Chain Components

In [4]:
class YouTubeSummaryChain:
    def __init__(
        self,
        model_name: str = "llama-3.2-90b-vision-preview",
        temperature: float = 0.3,
        prompt_template: str = "",
        language: str = "spanish",
    ):
        self.model_name = model_name
        self.temperature = temperature
        self.language = language
        if prompt_template == "":
            self.prompt_template = """
            Please provide a comprehensive summary of the following video transcript in {language}. 
            Focus on the main points, key insights, and important conclusions:

            {text}

            Please structure the summary with:
            1. Main Topic/Theme
            2. Key Points
            3. Important Details
            4. Conclusions
            """
        else:
            self.prompt_template = prompt_template
    
    def summarize_text(self, text):
        """Summarize text using Groq"""
        # Initialize client only when needed
        if IS_DATABRICKS:
            groq_api_key = dbutils.secrets.get(scope="your-scope", key="groq-api-key")
        else:
            groq_api_key = os.getenv('GROQ_API_KEY')
            
        client = groq.Groq(api_key=groq_api_key)
        prompt = self.prompt_template.format(text=text, language=self.language)

        try:
            completion = client.chat.completions.create(
                model=self.model_name,  # use the configured model so the logged parameter matches what runs
                messages=[
                    {"role": "user", "content": prompt}
                ],
                temperature=self.temperature,
                max_tokens=2048
            )
            return completion.choices[0].message.content
        except Exception as e:
            print(f"Error in summarization: {e}")
            return None
    
    def extract_video_id(self, url: str) -> str:
        """Extract YouTube video ID from URL"""
        parsed_url = urlparse(url)
        if parsed_url.hostname == 'youtu.be':
            return parsed_url.path[1:]
        if parsed_url.hostname in ('www.youtube.com', 'youtube.com'):
            if parsed_url.path == '/watch':
                # .get avoids a KeyError when the URL has no "v" query parameter
                return parse_qs(parsed_url.query).get('v', [None])[0]
        return None

    def get_transcript(self, video_id: str) -> str:
        """Get transcript for a YouTube video"""
        try:
            transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=['en', 'es'])
            return ' '.join([t['text'] for t in transcript_list])
        except Exception as e:
            print(f"Error getting transcript: {e}")
            return None

    def __call__(self, url: str):
        """Process a YouTube URL.

        Returns a (summary, transcript) tuple on success, or an error message string on failure.
        """
        video_id = self.extract_video_id(url)
        if not video_id:
            return "Invalid YouTube URL"
        
        transcript = self.get_transcript(video_id)
        if not transcript:
            return "Could not retrieve transcript"
        
        summary = self.summarize_text(transcript)
        if not summary:
            return "Could not generate summary"
        
        return summary, transcript

    def get_config(self) -> Dict[str, Any]:
        """Get chain configuration for MLflow tracking"""
        return {
            "model_name": self.model_name,
            "temperature": self.temperature,
            "prompt_template": self.prompt_template,
            "language": self.language
        }
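
`extract_video_id` only recognizes `youtu.be` links and `/watch` URLs. The sketch below runs the same parsing logic as a standalone function. As an assumption beyond the original, it also accepts `m.youtube.com` and the `/shorts/<id>` and `/embed/<id>` path forms. The function name `video_id_from_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if it is not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # parse_qs returns a list per key; .get avoids a KeyError when "v" is absent
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Assumed extra path forms, not handled by the original method
        for prefix in ("/shorts/", "/embed/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None


print(video_id_from_url("https://www.youtube.com/watch?v=AgvBh3YC6fs&pp=ygUMZXBtIGNvbG9tYmlh"))
print(video_id_from_url("https://youtu.be/AgvBh3YC6fs"))
print(video_id_from_url("https://example.com/watch?v=AgvBh3YC6fs"))
```

The first two calls print `AgvBh3YC6fs`; the third prints `None` because the host is not a YouTube domain.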

## MLflow Integration

In [5]:
# Set up MLflow tracking

def setup_mlflow():
    """
    Set up MLflow tracking with support for both Databricks and local environments
    
    :return: None
    :raises: MlflowException if tracking setup fails
    """
    try:
        if IS_DATABRICKS:
            # Databricks automatically configures tracking URI
            print("Running in Databricks environment")
        else:
            mlflow_dir = Path("mlruns")
            mlflow_dir.mkdir(exist_ok=True)
            mlflow.set_tracking_uri("sqlite:///mlflow.db")
    except Exception as e:
        print(f"Error setting up MLflow tracking: {e}")
        raise

def log_chain_to_mlflow(chain: YouTubeSummaryChain, experiment_name: str = "youtube-summarizer"):
    """
    Log the chain configuration and prompt to MLflow with Databricks support
    
    :param chain: YouTubeSummaryChain instance to log
    :param experiment_name: Name of the MLflow experiment
    :return: MLflow run ID
    :raises: MlflowException if logging fails
    """
    try:
        if IS_DATABRICKS:
            # Use workspace path for Databricks
            experiment_path = f"/Shared/{experiment_name}"
            try:
                experiment = mlflow.get_experiment_by_name(experiment_path)
                if experiment is None:
                    mlflow.create_experiment(experiment_path)
                mlflow.set_experiment(experiment_path)
            except Exception as e:
                print(f"Error setting up Databricks experiment: {e}")
                raise
        else:
            # Local experiment setup
            experiment = mlflow.get_experiment_by_name(experiment_name)
            if experiment is None:
                mlflow.create_experiment(experiment_name)
            mlflow.set_experiment(experiment_name)
        
        with mlflow.start_run() as run:
            # Log parameters
            config = chain.get_config()
            mlflow.log_params({
                "model_name": config["model_name"],
                "temperature": config["temperature"],
                "environment": "databricks" if IS_DATABRICKS else "local"
            })
            
            # Create temporary file for prompt template
            prompt_path = None
            with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
                f.write(config["prompt_template"])
                prompt_path = f.name
            
            try:
                # Create wrapper
                wrapper = YouTubeSummarizerWrapper(chain)
                
                # Log the model with requirements
                requirements = [
                    "youtube-transcript-api",
                    "groq",
                    "python-dotenv",
                    "pandas"
                ]
                if IS_DATABRICKS:
                    requirements.append("databricks-mlflow")
                
                mlflow.pyfunc.log_model(
                    artifact_path="youtube_summarizer",
                    python_model=wrapper,
                    artifacts={"prompt_template": prompt_path},
                    pip_requirements=requirements
                )
            finally:
                # Clean up temp file after logging
                if prompt_path and os.path.exists(prompt_path):
                    os.unlink(prompt_path)
            
            return run.info.run_id
            
    except Exception as e:
        print(f"Error logging chain to MLflow: {e}")
        raise

def load_chain_from_mlflow(run_id: str) -> mlflow.pyfunc.PyFuncModel:
    """
    Load the logged summarizer from MLflow (the same URI scheme works on Databricks and locally)
    
    :param run_id: MLflow run ID to load
    :return: Loaded pyfunc model wrapping the chain; call .predict() on it
    :raises: MlflowException if loading fails
    """
    try:
        model_uri = f"runs:/{run_id}/youtube_summarizer"
        return mlflow.pyfunc.load_model(model_uri)
    except Exception as e:
        print(f"Error loading chain from MLflow: {e}")
        raise


class YouTubeSummarizerWrapper(mlflow.pyfunc.PythonModel):
    """
    MLflow wrapper for YouTube summarizer
    
    :param chain: Instance of YouTubeSummaryChain
    """
    def __init__(self, chain=None):
        self.chain = chain or YouTubeSummaryChain()
        
    def predict(self, context, model_input):
        """
        :param context: MLflow model context
        :param model_input: Series of YouTube URLs, or DataFrame with a 'url' column
        :return: List with one (summary, transcript) tuple per URL, or an error string where processing failed
        """
        if isinstance(model_input, pd.Series):
            urls = model_input.tolist()
        else:
            urls = model_input['url'].tolist()
            
        return [self.chain(url) for url in urls]
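
`log_chain_to_mlflow` writes the prompt to a temporary file, hands the path to MLflow, and deletes the file afterwards. Here is a minimal sketch of that write, use, clean-up pattern with the standard library only. `with_temp_text_file` and the `consume` callback are hypothetical stand-ins; `consume` takes the place of the `mlflow.pyfunc.log_model` call.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_text_file(content: str, consume: Callable[[str], T]) -> T:
    """Write content to a temp file, pass its path to consume, then delete the file."""
    # delete=False lets us close the file before another function reopens it by path
    with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt", encoding="utf-8") as f:
        f.write(content)
        path = f.name
    try:
        return consume(path)
    finally:
        # Runs whether consume succeeded or raised
        if os.path.exists(path):
            os.unlink(path)


def read_back(path: str) -> str:
    """Stand-in consumer: read the file contents back."""
    with open(path, encoding="utf-8") as f:
        return f.read()


print(with_temp_text_file("Summarize {text} in {language}.", read_back))
```

Closing the file before handing over the path matters on platforms where an open temporary file cannot be reopened by name.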


## Example Usage

In [6]:
# Example usage
setup_mlflow()

chain = YouTubeSummaryChain()
run_id = log_chain_to_mlflow(chain)
print(f"Chain logged with run_id: {run_id}")

# Load the chain
loaded_chain = load_chain_from_mlflow(run_id)

# Use the loaded chain
youtube_url = "https://www.youtube.com/watch?v=AgvBh3YC6fs&pp=ygUMZXBtIGNvbG9tYmlh"
# Convert single URL to pandas Series
input_data = pd.Series([youtube_url])
summary, transcript = loaded_chain.predict(input_data)[0]  # Get first result from the list
print(summary)

Chain logged with run_id: 3353294078c140ef8b9ebaee52ff842c
**Resumen del video sobre la venta de acciones de UNE por parte de EPM**

**1. Tema principal/Tema**

El gerente de EPM, Joh Maya, habla sobre la decisión de vender las acciones de UNE, una empresa de telecomunicaciones, y explica las razones detrás de esta decisión.

**2. Puntos clave**

* La industria de las telecomunicaciones es muy intensiva en capital y requiere grandes inversiones para mantenerse actualizada con la tecnología.
* EPM tiene otros negocios estratégicos más orientados a la prestación de servicios públicos, como la generación y distribución de energía, agua y gas.
* La venta de las acciones de UNE se debe a la necesidad de invertir en proyectos más estratégicos para EPM.
* El valor en libros de las acciones de UNE es de 1,6 billones de pesos, pero el precio de venta dependerá de la valoración del mercado.

**3. Detalles importantes**

* La venta de las acciones de UNE no implica una pérdida para EPM, sino una 

# Model monitoring


In [7]:
def log_metrics(metrics_dict, model_name="llama-3.2-90b-vision-preview"):
    """Log metrics to MLflow in a new run, along with the name of the model that produced them"""
    with mlflow.start_run():
        mlflow.log_metrics(metrics_dict)
        mlflow.log_param("model_name", model_name)

In [8]:
from rouge_score import rouge_scorer
import psutil
import plotly.express as px
import plotly.graph_objects as go
import time

class PerformanceMonitor:
    def __init__(self):
        self.scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
        self.metrics_history = []
    
    def measure_latency(self, func, *args, **kwargs):
        """
        Measure execution time of a function
        
        :param func: Function to measure
        :param args: Positional arguments for the function
        :param kwargs: Keyword arguments for the function
        :return: tuple of (results, execution_time)
        """
        start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
        results = func(*args, **kwargs)
        end_time = time.perf_counter()
        return results, end_time - start_time
    
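    def measure_resource_usage(self):
        """Measure CPU and memory usage"""
        # With the default interval=None, psutil compares against the previous call,
        # so the very first reading is a meaningless 0.0
        cpu_percent = psutil.cpu_percent()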
        memory_info = psutil.Process().memory_info()
        return {
            'cpu_percent': cpu_percent,
            'memory_mb': memory_info.rss / 1024 / 1024
        }
    
    def calculate_text_metrics(self, summary: str, transcript: str) -> dict:
        """
        Calculate text-based metrics like reduction percentage and lengths
        
        :param summary: Generated summary text
        :param transcript: Original transcript text
        :return: Dictionary containing text metrics
        """
        summary_length = len(summary.split())
        transcript_length = len(transcript.split())
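        # Guard against an empty transcript to avoid a ZeroDivisionError
        reduction_percentage = (
            (transcript_length - summary_length) / transcript_length * 100
            if transcript_length > 0 else 0.0
        )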
        
        return {
            'summary_length': summary_length,
            'transcript_length': transcript_length,
            'reduction_percentage': reduction_percentage
        }
    
    def log_performance(self, latency, summary: str, transcript: str, resource_usage: dict):
        """
        Log all performance metrics including text metrics
        
        :param latency: Processing time
        :param summary: Generated summary text
        :param transcript: Original transcript text
        :param resource_usage: Dictionary containing resource usage metrics
        :return: Combined metrics dictionary
        """
        text_metrics = self.calculate_text_metrics(summary, transcript)
        metrics = {
            'latency': latency,
            **resource_usage,
            **text_metrics
        }
        self.metrics_history.append(metrics)
        log_metrics(metrics)
        return metrics
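
The sketch below is a standalone version of the timing helper, followed by the reduction-percentage arithmetic from `calculate_text_metrics` on a toy input. It assumes `time.perf_counter` is acceptable for timing (it is monotonic and intended for measuring short durations). The names `timed` and `reduction_percentage` are illustrative and not part of the notebook.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def reduction_percentage(summary: str, transcript: str) -> float:
    """Percentage by which the summary is shorter than the transcript, in words."""
    transcript_words = len(transcript.split())
    if transcript_words == 0:
        return 0.0  # nothing to compare against
    return (transcript_words - len(summary.split())) / transcript_words * 100


result, elapsed = timed(sum, range(1_000_000))
print(result, f"{elapsed:.4f}s")

# 10-word transcript, 3-word summary: (10 - 3) / 10 * 100, i.e. a 70% reduction
print(reduction_percentage("EPM sells shares", "a b c d e f g h i j"))
```

A summary longer than its transcript produces a negative value, which can be a useful signal that the model padded a very short video.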

In [9]:
def plot_metrics_over_time(metrics_history):
    """
    Create interactive plots for metrics over time including text metrics
    
    :param metrics_history: List of dictionaries containing metrics data
    """
    df = pd.DataFrame(metrics_history)
    
    # Latency plot
    fig_latency = px.line(df, y='latency', title='Inference Latency Over Time')
    fig_latency.show()
    
    # Resource usage plot
    fig_resources = go.Figure()
    fig_resources.add_trace(go.Scatter(y=df['cpu_percent'], name='CPU %'))
    fig_resources.add_trace(go.Scatter(y=df['memory_mb'], name='Memory (MB)'))
    fig_resources.update_layout(title='Resource Usage Over Time')
    fig_resources.show()
    
    # Text metrics plot
    fig_text = go.Figure()
    text_metrics = ['summary_length', 'transcript_length', 'reduction_percentage']
    for metric in text_metrics:
        if metric in df.columns:
            fig_text.add_trace(go.Scatter(y=df[metric], name=metric))
    fig_text.update_layout(title='Text Metrics Over Time')
    fig_text.show()

In [10]:
# Initialize the performance monitor
monitor = PerformanceMonitor()

summarizer = load_chain_from_mlflow(run_id)

def process_videos(video_ids):
    """
    Process multiple videos with monitoring
    
    :param video_ids: List of YouTube video IDs or URLs
    :return: tuple of (list of summaries, list of metrics)
    """
    # If video_ids is a single string, wrap it in a list
    if isinstance(video_ids, str):
        video_ids = [video_ids]
    
    # Prepare input data
    input_data = []
    for vid in video_ids:
        if "youtube.com" in vid or "youtu.be" in vid:
            input_data.append(vid)
        else:
            input_data.append(f"https://youtube.com/watch?v={vid}")
    
    input_series = pd.Series(input_data)
    
    # Process all videos
    summaries = []
    metrics_list = []
    
    # Generate summaries with latency measurement
    results, total_latency = monitor.measure_latency(
        lambda: summarizer.predict(input_series)
    )
    
    # Calculate average latency per video
    avg_latency = total_latency / len(video_ids)
    
    # Process each result
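    for result in results:
        if isinstance(result, str):
            # The chain returns an error message string instead of a tuple on failure
            print(f"Skipping video: {result}")
            continue
        summary, transcript = result  # Unpack (summary, transcript)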
        
        # Measure resource usage
        resource_usage = monitor.measure_resource_usage()

        # Log metrics for each video
        metrics = monitor.log_performance(avg_latency, summary, transcript, resource_usage)
        
        summaries.append(summary)
        metrics_list.append(metrics)
    
    return summaries, metrics_list
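
`process_videos` accepts either one string or a list, and either bare video IDs or full URLs. The sketch below shows that normalization step in isolation; `normalize_video_inputs` is a hypothetical name used only for this illustration.

```python
from typing import List, Union


def normalize_video_inputs(video_ids: Union[str, List[str]]) -> List[str]:
    """Turn a single ID/URL or a list of them into a list of full URLs."""
    if isinstance(video_ids, str):
        video_ids = [video_ids]
    urls = []
    for vid in video_ids:
        if "youtube.com" in vid or "youtu.be" in vid:
            urls.append(vid)  # already a URL, keep as-is
        else:
            urls.append(f"https://youtube.com/watch?v={vid}")
    return urls


print(normalize_video_inputs("v2DGDwOjdIk"))
print(normalize_video_inputs(["https://youtu.be/mACcTs5YsMM", "2LU9KKnI4Do"]))
```

Note that the latency recorded per video in `process_videos` is the batch total divided by the number of inputs, so it is an average and not a per-video measurement.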


In [11]:
videos_ids = ["https://www.youtube.com/watch?v=v2DGDwOjdIk&pp=ygUMZXBtIGNvbG9tYmlh",
              "https://www.youtube.com/watch?v=mACcTs5YsMM&pp=ygUMZXBtIGNvbG9tYmlh",
              "https://www.youtube.com/watch?v=2LU9KKnI4Do&pp=ygUMZXBtIGNvbG9tYmlh"]

summary, metrics = process_videos(videos_ids)
plot_metrics_over_time(monitor.metrics_history)

# Evaluate the model

## Evaluation Pipeline

In [12]:
from nltk.translate.bleu_score import sentence_bleu
import nltk
# Download required NLTK data
nltk.download('punkt')

[nltk_data] Downloading package punkt to /Users/ganga/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [20]:
def prepare_test_data() -> List[Dict[str, str]]:
    """Prepare test data with YouTube videos and reference summaries"""
    # Replace with your actual test data
    return [
        {
            "video_url": "https://www.youtube.com/watch?v=v2DGDwOjdIk&pp=ygUMZXBtIGNvbG9tYmlh",
            "reference_summary": "This source is a transcript of a video detailing the history of Empresas Públicas de Medellín (EPM), a public service company in Medellín, Colombia. It highlights the pivotal moment when Medellín was first illuminated by electric light in 1898, a time when gas lamps were the norm, and emphasizes the impact of this innovation on the city and its residents. The text also emphasizes the importance of EPM’s continued service in bringing electricity to remote areas in Colombia even in the 21st century, demonstrating the company's commitment to providing essential services to all. The transcript showcases the evolution of EPM from its origins in private companies to its current status as a public utility, highlighting its historical significance and its ongoing role in improving the lives of people in Medellín and beyond."
        },
        {
            "video_url": "https://www.youtube.com/watch?v=mACcTs5YsMM&pp=ygUMZXBtIGNvbG9tYmlh",
            "reference_summary": "This source is a transcript of a radio interview with John Maya, the manager of Empresas Públicas de Medellín (EPM), a public utility company in Colombia. The interview focuses on the financial difficulties faced by Afinia, a subsidiary of EPM that provides electricity to the Colombian Caribbean coast. Maya explains that Afinia's problems stem from a combination of factors, including a lack of government subsidies, delayed payments for a tariff option, and a rise in energy losses due to increased consumption and unpaid bills. He asserts that the root of the problem lies not with Afinia's management, but with the high cost of electricity in the region, which is driven by a combination of factors including high electricity generation costs and energy losses. Maya proposes a solution involving the government taking over a portion of the costs, but expresses concern about the slow pace of progress and the potential for the government to intervene and permanently take control of Afinia, a scenario he considers problematic. He also criticizes the previous administration's decision to replace Afinia's experienced manager with someone less qualified, a move that negatively impacted the company's performance. Overall, the interview provides insight into the complex financial and operational challenges faced by Afinia and the broader electricity sector in the Colombian Caribbean coast, highlighting the role of government policy, energy market dynamics, and internal management in shaping the current situation."
        },
        {
            "video_url": "https://www.youtube.com/watch?v=2LU9KKnI4Do&pp=ygUMZXBtIGNvbG9tYmlh",
            "reference_summary": "This excerpt is a transcript of a radio interview with Federico Gutiérrez, the mayor of Medellín, Colombia. The interview focuses on a looming crisis in Colombia’s natural gas supply, with Gutiérrez warning of potential rationing and price increases due to a nationwide shortage. He emphasizes the urgency of the situation, noting that existing supply contracts are expiring soon and that there are insufficient offers to meet projected demand. Gutiérrez argues that the government's energy policies have contributed to this crisis by neglecting exploration and exploitation. He also criticizes President Petro for using inflammatory language to describe a police intervention that involved removing an individual who was allegedly exposing himself to children, characterizing it as an instance of fascism. Gutiérrez contrasts this with his own focus on governance and addressing the city's pressing needs."
        }
    ]

test_data = prepare_test_data()

In [21]:
import re
from typing import Optional

def evaluate_transcript_with_llm(transcript: str, predicted_summary: str, reference_summary: str) -> Optional[int]:
    """Score a predicted summary against a reference with an LLM (0-100); returns None on failure"""
    if IS_DATABRICKS:
        groq_api_key = dbutils.secrets.get(scope="your-scope", key="groq-api-key")
    else:
        groq_api_key = os.getenv('GROQ_API_KEY')
    client = groq.Groq(api_key=groq_api_key)
    prompt_template = """
    You are a professional editor, give me a score for the following summary based on the reference summary:
    
    transcript: {transcript}
    reference summary: {reference_summary}
    predicted summary: {predicted_summary}
    
    give me a score between 0 and 100 for the predicted summary
    """
    
    prompt = prompt_template.format(transcript=transcript, predicted_summary=predicted_summary, reference_summary=reference_summary)

    try:
        completion = client.chat.completions.create(
            model="llama-3.2-90b-vision-preview",
            messages=[
                {"role": "user", "content": prompt}
            ],
            temperature=0.3,
            max_tokens=2048
        )
        result = completion.choices[0].message.content
        # Extract the first integer in the reply; the model may wrap the score in prose
        match = re.search(r'\d+', result)
        return int(match.group()) if match else None
    except Exception as e:
        print(f"Error in LLM evaluation: {e}")
        return None

def calculate_metrics(transcript: str, predicted_summary: str, reference_summary: str) -> Dict[str, float]:
    """Calculate various evaluation metrics"""
    # ROUGE scores
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    rouge_scores = scorer.score(reference_summary, predicted_summary)
    
    # BLEU score
    reference = [reference_summary.split()]
    candidate = predicted_summary.split()
    bleu = sentence_bleu(reference, candidate)
    
    # Summary length metrics
    pred_length = len(predicted_summary.split())
    ref_length = len(reference_summary.split())
    length_ratio = pred_length / ref_length if ref_length > 0 else 0
    
    # Evaluation with LLM
    score_llm = evaluate_transcript_with_llm(transcript, predicted_summary, reference_summary)
    
    return {
        'rouge1_precision': rouge_scores['rouge1'].precision,
        'rouge1_recall': rouge_scores['rouge1'].recall,
        'rouge1_f1': rouge_scores['rouge1'].fmeasure,
        'rouge2_f1': rouge_scores['rouge2'].fmeasure,
        'rougeL_f1': rouge_scores['rougeL'].fmeasure,
        'bleu_score': bleu,
        'summary_length_ratio': length_ratio,
        'predicted_length': pred_length,
        'reference_length': ref_length,
        'llm_score': score_llm
    }
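
To make the ROUGE-1 numbers easier to read, here is a simplified unigram-overlap computation. It is a sketch under stated assumptions (lowercased whitespace tokens, no stemming), so it will not reproduce `rouge_score` values exactly. `rouge1_sketch` is a hypothetical name.

```python
from collections import Counter
from typing import Dict


def rouge1_sketch(reference: str, prediction: str) -> Dict[str, float]:
    """Unigram overlap precision, recall, and F1 between two texts."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((ref_counts & pred_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1_sketch(
    reference="EPM plans to sell its UNE shares",
    prediction="EPM plans to sell its shares in UNE to fund other strategic projects",
)
print(scores)
```

In this toy case every reference word appears in the prediction, so recall is 1.0, while precision is only 7/13 because the prediction is almost twice as long. Predictions that are much longer than their references tend to show this high-recall, low-precision pattern.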

In [22]:
def create_and_log_visualizations(metrics_list: List[Dict[str, float]]):
    """Create and log visualizations to MLflow"""
    # Convert metrics to DataFrame
    df = pd.DataFrame(metrics_list)
    
    # ROUGE scores comparison
    plt.figure(figsize=(10, 6))
    rouge_metrics = ['rouge1_f1', 'rouge2_f1', 'rougeL_f1']
    df[rouge_metrics].mean().plot(kind='bar')
    plt.title('Average ROUGE Scores')
    plt.ylabel('Score')
    plt.tight_layout()
    plt.savefig('rouge_scores.png')
    mlflow.log_artifact('rouge_scores.png')
    plt.close()
    
    # Summary length analysis
    plt.figure(figsize=(10, 6))
    plt.scatter(df['reference_length'], df['predicted_length'])
    plt.plot([0, max(df['reference_length'])], [0, max(df['reference_length'])], '--', color='red')
    plt.xlabel('Reference Summary Length')
    plt.ylabel('Predicted Summary Length')
    plt.title('Summary Length Comparison')
    plt.tight_layout()
    plt.savefig('length_comparison.png')
    mlflow.log_artifact('length_comparison.png')
    plt.close()
    
    # Metrics distribution
    plt.figure(figsize=(12, 6))
    metrics_to_plot = ['rouge1_f1', 'rouge2_f1', 'rougeL_f1', 'bleu_score']
    df[metrics_to_plot].boxplot()
    plt.title('Distribution of Evaluation Metrics')
    plt.ylabel('Score')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig('metrics_distribution.png')
    mlflow.log_artifact('metrics_distribution.png')
    plt.close()

In [23]:
def evaluate_model_with_mlflow(model, test_data: List[Dict[str, str]]):
    """Evaluate model and log results to MLflow"""
    mlflow.set_experiment("youtube-summarizer-evaluation")
    
    with mlflow.start_run(run_name="model_evaluation") as run:
        all_metrics = []
        
        # Log model parameters if available
        model_params = model.get_config() if hasattr(model, 'get_config') else {}
        mlflow.log_params(model_params)
        
        # Evaluate each test example
        for i, example in enumerate(test_data):
            try:
                # Generate summary using predict method
                input_data = pd.Series([example['video_url']])
                result = model.predict(input_data)[0]  # Get first result
                
                # Check the type of result and handle accordingly
                if isinstance(result, tuple):
                    summary, transcript = result
                elif isinstance(result, str):
                    summary = result
                    transcript = "Transcript not available"  # fallback
                else:
                    print(f"Unexpected result type: {type(result)}")
                    continue
                
                # Calculate metrics
                metrics = calculate_metrics(transcript, summary, example['reference_summary'])
                all_metrics.append(metrics)
                
                # Log metrics for each example
                for metric_name, value in metrics.items():
                    if value is not None:  # llm_score is None when the LLM call fails
                        mlflow.log_metric(f"example_{i}_{metric_name}", value)
                
                # Log summaries as artifacts
                example_dir = f"example_{i}"
                os.makedirs(example_dir, exist_ok=True)
                
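                with open(f"{example_dir}/predicted_summary.txt", "w", encoding="utf-8") as f:
                    f.write(summary)
                with open(f"{example_dir}/reference_summary.txt", "w", encoding="utf-8") as f:
                    f.write(example['reference_summary'])
                
                # Log under a per-example artifact path so files from different examples do not overwrite each other
                mlflow.log_artifacts(example_dir, artifact_path=example_dir)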
                
            except Exception as e:
                print(f"Error processing example {i}: {str(e)}")
                continue
        
        if not all_metrics:
            print("No successful evaluations completed")
            return None, None
            
        # Calculate and log average metrics
        avg_metrics = {}
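        for metric in all_metrics[0].keys():
            # Skip None values (llm_score is None when the LLM call fails)
            values = [m[metric] for m in all_metrics if m[metric] is not None]
            if not values:
                continue
            avg_value = float(np.mean(values))
            avg_metrics[f"avg_{metric}"] = avg_value
            mlflow.log_metric(f"avg_{metric}", avg_value)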
        
        # Create and log visualizations
        create_and_log_visualizations(all_metrics)
        
        return run.info.run_id, avg_metrics
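
`llm_score` can be `None` when the scoring call fails, and averaging a list that contains `None` raises a `TypeError`. Here is a sketch of per-metric averaging that skips missing values, using only the standard library. `average_metrics` is a hypothetical helper, not part of the notebook, and the numbers in `rows` are made up for illustration.

```python
from statistics import mean
from typing import Dict, List, Optional


def average_metrics(rows: List[Dict[str, Optional[float]]]) -> Dict[str, float]:
    """Average each metric across rows, ignoring None entries."""
    averages: Dict[str, float] = {}
    keys = rows[0].keys() if rows else []
    for key in keys:
        values = [row[key] for row in rows if row.get(key) is not None]
        if values:  # skip metrics with no usable values at all
            averages[f"avg_{key}"] = mean(values)
    return averages


# Illustrative values only, not results from this notebook
rows = [
    {"rouge1_f1": 0.40, "llm_score": 80},
    {"rouge1_f1": 0.30, "llm_score": None},  # LLM scoring failed for this example
    {"rouge1_f1": 0.50, "llm_score": 90},
]
print(average_metrics(rows))
```

With this approach the average for `llm_score` is taken over the two examples that produced a score, so it helps to also log how many values went into each average.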

## Baseline Evaluation

Log the experiment to MLflow:

In [17]:
# Example usage
setup_mlflow()

prompt_template = """
Please provide a comprehensive summary of the following video transcript in {language}. 
Focus on the main points, key insights, and important conclusions:

{text}

Please structure the summary with:
1. Main Topic/Theme
2. Key Points
3. Important Details
4. Conclusions
"""
model_name = "llama-3.2-90b-vision-preview"

chain = YouTubeSummaryChain(model_name=model_name, prompt_template=prompt_template, language="english")
run_id = log_chain_to_mlflow(chain)
print(f"Chain logged with run_id: {run_id}")

# Load the chain
loaded_chain = load_chain_from_mlflow(run_id)

Chain logged with run_id: a3e0cf31be6941898f90784e623e20ae


In [18]:
# Use the loaded chain
youtube_url = "https://www.youtube.com/watch?v=AgvBh3YC6fs&pp=ygUMZXBtIGNvbG9tYmlh"
# Convert single URL to pandas Series
input_data = pd.Series([youtube_url])
summary, transcript = loaded_chain.predict(input_data)[0]  # Get first result from the list
print(summary)

**Main Topic/Theme:**
The possible sale of EPM's shares in UNE, a telecommunications company, and the reasons behind this decision.

**Key Points:**

* EPM is considering selling its shares in UNE due to the high capital intensity of the telecommunications industry and the need to focus on its core businesses.
* The company has other strategic priorities, such as energy, water, and gas, and wants to allocate its resources accordingly.
* The sale of the shares would not go to the municipality of Medellín, but rather to EPM, which would use the funds for specific purposes.

**Important Details:**

* The value of EPM's shares in UNE is estimated to be around 1.6 billion pesos, based on the company's books and a recent capitalization of 600,000 million pesos.
* The sale of the shares would require the approval of the Council of Medellín, as the municipality is the owner of EPM.
* If the sale is approved, the funds would be used for four specific purposes: education and scholarships, innova

In [24]:
# Prepare test data
test_data = prepare_test_data()

# Run evaluation
run_id, avg_metrics = evaluate_model_with_mlflow(loaded_chain, test_data)

print("\nEvaluation Results:")
print("==================")
for metric, value in avg_metrics.items():
    print(f"{metric}: {value:.4f}")

print(f"\nMLflow run ID: {run_id}")
print("View detailed results in the MLflow UI")



The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()




Evaluation Results:
avg_rouge1_precision: 0.2968
avg_rouge1_recall: 0.5928
avg_rouge1_f1: 0.3869
avg_rouge2_f1: 0.1300
avg_rougeL_f1: 0.2096
avg_bleu_score: 0.0277
avg_summary_length_ratio: 2.1712
avg_predicted_length: 330.6667
avg_reference_length: 165.3333
avg_llm_score: 83.3333

MLflow run ID: fb73d241322d4a6ea53ae6c70635cf83
View detailed results in the MLflow UI
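
The NLTK warning above appears because unsmoothed BLEU is a geometric mean of the 1-gram to 4-gram precisions, so a single order with zero matches drives the whole score to 0. The sketch below computes simplified modified n-gram precisions (whitespace tokens, one reference, no brevity penalty). It illustrates the mechanism and is not NLTK's implementation; the sentences are toy examples.

```python
from collections import Counter
from typing import List


def ngram_precision(reference: List[str], candidate: List[str], n: int) -> float:
    """Clipped n-gram precision of candidate against a single reference."""
    def ngrams(tokens: List[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "EPM plans to sell its UNE shares".split()
candidate = "EPM wants to sell the UNE shares soon".split()

precisions = [ngram_precision(reference, candidate, n) for n in range(1, 5)]
print(precisions)

# Unsmoothed geometric mean: any zero precision makes the product zero
product = 1.0
for p in precisions:
    product *= p
print(product ** 0.25)
```

Here the candidate shares 5 of 8 unigrams and 2 of 7 bigrams with the reference but no trigram or 4-gram, so the combined score is 0.0. NLTK's `SmoothingFunction` (for example `sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)`) avoids the hard zero, although smoothed scores are not directly comparable with unsmoothed ones.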


## View MLflow Experiment Results

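When running locally, you can view the tracked experiments by pointing the MLflow UI at the SQLite store configured in `setup_mlflow`:
```bash
mlflow ui --backend-store-uri sqlite:///mlflow.db
```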

This will start the MLflow UI server where you can see:
1. All experiment runs
2. Chain configurations
3. Prompt templates
4. Performance and evaluation metrics