[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Amyssjj/Agent_Influcencer/blob/main/Agent_Youtube_Insights_Fetcher.ipynb)

### **From Natural Language to Key Insights: The 3-Step YouTube Data Pipeline**

##### **Motivation**
<a name="Motivation"></a>

In a fast-paced world where the volume of information online keeps exploding, staying connected, forming my own perspective, and staying sharp is incredibly challenging. Still, I am passionate about learning and growth, and I am always striving to improve.


<hr>        

In [1]:
## import langfun and related packages
import langfun as lf
import pyglove as pg
from langfun.core.structured import function_generation
import pandas as pd
import json
from typing import Literal, Annotated
from datetime import datetime
from IPython import display


## optional: this is where the API keys are set up
from dotenv import load_dotenv
import os

## load the keys
load_dotenv()  # This loads your .env file
youtube_key = os.environ.get('YOUTUBE_API_KEY')
claude_key = os.environ.get('CLAUDE_API_KEY')
gemini_key = os.environ.get('GEMINI_API_KEY')
openai_key = os.environ.get('OPENAI_API_KEY')

## build the LLM crew
lm_claude = lf.llms.Claude35Sonnet(api_key=claude_key, temperature=0.0)
lm_openai = lf.llms.Gpt4o(api_key=openai_key, temperature=0.6)
lm_gemini = lf.llms.GeminiExp_20241114(api_key=gemini_key, temperature=0.6)


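
Because `os.environ.get` returns `None` for a variable that is missing, a misspelled or absent key only surfaces later as an API error. A minimal sketch of an early check is shown below. The helper name `missing_keys` and the demo environment are hypothetical and are not part of the notebook or of langfun.

```python
import os
from typing import Iterable, Mapping, Optional


def missing_keys(names: Iterable[str],
                 env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical demo environment; in the notebook, omit `env` to check os.environ.
demo_env = {"CLAUDE_API_KEY": "sk-demo", "OPENAI_API_KEY": ""}
print(missing_keys(["CLAUDE_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY"], demo_env))
# → ['OPENAI_API_KEY', 'GEMINI_API_KEY']
```

Running a check like this right after `load_dotenv()` makes a typo in a variable name visible before any model or API client is constructed.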


#### **Step1: Use an LLM to generate a function described purely in natural language.**

In [49]:
@function_generation.function_gen(lm=lm_claude, cache_filename='./utils/functions_cache.json')
def youtube_fetcher(API_KEY:str, youtube_id:str) -> dict:
    """Retrieve the YouTube video's basic info, captions, and comments through the API.

    Returns:
        The video's basic info, captions, and comments.
    """
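
The decorator above takes a signature and a docstring, obtains an implementation, and stores it in a cache file. The sketch below illustrates that general idea only and is not langfun's implementation. `cached_codegen`, `fake_generator`, and `declare_add` are hypothetical names, and the stub generator stands in for the LLM call.

```python
import inspect
import json
import os
import tempfile
from typing import Callable


def cached_codegen(generate: Callable[[str, str], str], cache_path: str):
    """Decorator factory: replace a stub function with generated source.

    `generate(signature, docstring)` stands in for the LLM call and must
    return the full source of a function with the same name as the stub.
    """
    def decorator(stub):
        signature = f"{stub.__name__}{inspect.signature(stub)}"
        key = f"{signature}::{stub.__doc__}"
        cache = {}
        if os.path.exists(cache_path):
            with open(cache_path) as f:
                cache = json.load(f)
        if key not in cache:  # Only call the generator on a cache miss.
            cache[key] = generate(signature, stub.__doc__ or "")
            with open(cache_path, "w") as f:
                json.dump(cache, f)
        namespace = {}
        exec(cache[key], namespace)  # Trusts the generated code; review it first.
        return namespace[stub.__name__]
    return decorator


calls = []  # Records every generator invocation.


def fake_generator(signature: str, docstring: str) -> str:
    """Stub generator: records the call and returns hard-coded source."""
    calls.append(signature)
    return "def add(a, b):\n    return a + b\n"


cache_file = os.path.join(tempfile.mkdtemp(), "functions_cache.json")


def declare_add():
    @cached_codegen(fake_generator, cache_file)
    def add(a: int, b: int) -> int:
        """Add two integers."""
    return add


add = declare_add()        # Cache miss: the generator is called once.
add_again = declare_add()  # Cache hit: source is read from the JSON file.
print(add(2, 3), add_again(4, 5), len(calls))  # → 5 9 1
```

Keying the cache on the signature and docstring means that editing either one triggers a fresh generation, while an unchanged declaration reuses the stored source.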

#### **Step2: Invoke the function directly, with no manual coding involved.**

In [50]:
## Once generated, the function is cached, so later runs reuse it without calling the LLM again.
result = youtube_fetcher(API_KEY=youtube_key, youtube_id='jIm2T7h_a0M')

Creating a new cache as cache file './utils/functions_cache.json' does not exist.


In [51]:
print(result.keys())

dict_keys(['basic_info', 'captions', 'comments'])
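
Since the function body is generated, the shape of each value is not fixed in advance, so it can help to inspect the result before passing it on. The sketch below is one way to do that. `summarize` is a hypothetical helper and the sample dictionary is made up for illustration; only the three top-level key names come from the output above.

```python
from typing import Any


def summarize(result: dict[str, Any]) -> dict[str, str]:
    """Describe each top-level value by type and, where it applies, length."""
    summary = {}
    for key, value in result.items():
        kind = type(value).__name__
        if isinstance(value, (str, list, dict, tuple)):
            summary[key] = f"{kind}[{len(value)}]"
        else:
            summary[key] = kind
    return summary


# Hypothetical sample; the real shapes depend on the generated code.
sample = {
    "basic_info": {"title": "Demo", "description": "..."},
    "captions": None,
    "comments": ["Great video", "Thanks"],
}
print(summarize(sample))
# → {'basic_info': 'dict[2]', 'captions': 'NoneType', 'comments': 'list[2]'}
```

In the notebook, `summarize(result)` would give a quick overview without printing the full captions or comments.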


In [52]:
result['basic_info']

{'title': 'AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution',
 'description': "Calmest before the storm? Whatever analogy you want to use, things had gotten quiet toward the end of 2024. But then tonight we got Genie 2, and a series of scheduled announcements from OpenAI... \n\nAssembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained \n\nSora is soon here, and o1, but I dive deeper into what it all means and whether reliability is on a path to being solved, ft: two recent papers. Plus Kling Motion Brush, Simple Bench QwQ update and much more.\n\n\nAI Insiders: https://www.patreon.com/AIExplained\n\nChapters:\n00:43 - OpenAI 12 Days, Sora Turbo, o1\n03:06 - Genie 2\n08:26 - Jensen Huang and Altman Hallucination Predictions\n09:45 - Bag of Heuristics Paper\n11:40 - Procedural Knowledge Paper\n13:02 - AssemblyAI Universal 2\n13:45 - SimpleBench QwQ and Chinese Models\n14:42 - Kling Motion Brush

#### **Step3: Produce key insights with another LLM**

In [40]:
my_question = '''What are the key points and insights in this content: {{content}}? How are users feeling, and what is the overall sentiment?'''

## Schema for the structured answer returned by the LLM
class Insights(pg.Object):
    key_points: Annotated[str, "Please be specific and use bullet points to lay them out"]
    users_feeling: str
    users_sentiment: str

fast_take = lf.query(prompt=my_question, content=result, schema=Insights, lm=lm_openai)

In [41]:
fast_take
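
The `{{content}}` placeholder in the prompt is filled from the `content=` keyword argument passed to `lf.query`. The sketch below mimics that substitution with a regular expression, purely for illustration. The real rendering is done by the library's own template engine, and `render` is a hypothetical helper.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, **values: object) -> str:
    """Replace each {{name}} with str(values[name]); raise on unknown names."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(values[name])
    return _PLACEHOLDER.sub(substitute, template)


print(render("Summarize this content: {{content}}", content={"title": "Demo"}))
# → Summarize this content: {'title': 'Demo'}
```

Seen this way, the whole `result` dictionary ends up inside the prompt text, so a very long caption track or comment list directly increases the prompt size.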

#### Appendix

In [54]:
## Print the Python code that was generated for execution
with lf.use_settings(debug=True):
    print(youtube_fetcher.source())

```python
def youtube_fetcher(API_KEY: str, youtube_id: str) -> dict:
    """Retrieve the youtube video basic info, caption, and comments through API.
    
    Args:
        API_KEY: YouTube Data API v3 key
        youtube_id: YouTube video ID
        
    Returns:
        dict: Video information containing:
            - basic_info: Title, description, view count, etc.
            - captions: Video captions/subtitles if available
            - comments: Top level comments on the video
            
    Raises:
        HttpError: If the API request fails
        ValueError: If API_KEY or youtube_id is invalid
    """
    from googleapiclient.discovery import build
    from googleapiclient.errors import HttpError
    
    if not API_KEY or not youtube_id:
        raise ValueError("API_KEY and youtube_id must not be empty")
        
    try:
        # Initialize YouTube API client
        youtube = build('youtube', 'v3', developerKey=API_KEY)
        
        # Get video basic info
      