# Overall Plan
# 1. Define the Enhanced MVP Scope
Core Features for the Netherlands-Focused MVP:

## 1.1 User Input Interface

Inputs:
- Destination within the Netherlands (e.g., Amsterdam, Rotterdam)
- Preferences (e.g., museums, outdoor activities, nightlife, culinary experiences)
- Budget (e.g., budget-friendly, mid-range, luxury)
- Duration (e.g., 3 days, 7 days)
- Travel style (e.g., solo, family, romantic, adventure)
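
The inputs above could be collected into a single validated object before the pipeline runs. The following is only a sketch: the `TripRequest` class, its field names, and the allowed-value sets are hypothetical and simply mirror the examples in the list.

```python
from dataclasses import dataclass, field

# Hypothetical allowed values, taken from the examples listed above.
BUDGETS = {"budget-friendly", "mid-range", "luxury"}
TRAVEL_STYLES = {"solo", "family", "romantic", "adventure"}


@dataclass
class TripRequest:
    """User inputs for one itinerary request (illustrative structure)."""
    destination: str
    preferences: list = field(default_factory=list)
    budget: str = "mid-range"
    duration_days: int = 3
    travel_style: str = "solo"

    def __post_init__(self):
        # Normalize free-text fields so later comparisons are case-insensitive.
        self.destination = self.destination.strip()
        self.budget = self.budget.strip().lower()
        self.travel_style = self.travel_style.strip().lower()
        if not self.destination:
            raise ValueError("destination must not be empty")
        if self.budget not in BUDGETS:
            raise ValueError(f"unknown budget: {self.budget!r}")
        if self.travel_style not in TRAVEL_STYLES:
            raise ValueError(f"unknown travel style: {self.travel_style!r}")
        if self.duration_days < 1:
            raise ValueError("duration_days must be at least 1")


request = TripRequest(" Amsterdam ", ["museums", "outdoor activities"], "Mid-Range", 3, "family")
print(request)
```

Validating once at the boundary means the later steps (search, summarization, itinerary) can assume clean values.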

## 1.2 YouTube Video Collection

Fetch relevant YouTube videos specifically about traveling in the Netherlands based on user inputs.

## 1.3 Video Transcription

Convert video audio to text using open-source speech-to-text models.

## 1.4 Content Summarization

Summarize transcribed content to extract key travel tips and recommendations.

## 1.5 Basic Itinerary Generation

Create a day-by-day travel plan based on summarized content.

## 1.6 Presentation of Itinerary

Display the generated itinerary in a user-friendly format (e.g., web page).
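
Itinerary generation and presentation can be sketched together. The example below is an assumption about how the step might work, not the notebook's implementation: `build_itinerary` and `render_markdown` are hypothetical helpers, and the round-robin assignment is just one simple way to balance the days.

```python
def build_itinerary(recommendations, days):
    """Spread recommendations over `days` days, keeping their original order."""
    if days < 1:
        raise ValueError("days must be at least 1")
    itinerary = {day: [] for day in range(1, days + 1)}
    for position, item in enumerate(recommendations):
        # Round-robin assignment keeps the days roughly balanced.
        itinerary[position % days + 1].append(item)
    return itinerary


def render_markdown(itinerary):
    """Format the itinerary as Markdown, one heading per day."""
    lines = []
    for day, items in itinerary.items():
        lines.append(f"### Day {day}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).strip()


tips = ["Rijksmuseum", "Van Gogh Museum", "Vondelpark", "Canal cruise", "Jordaan walk"]
plan = build_itinerary(tips, days=3)
print(render_markdown(plan))
```

In a Streamlit app the returned string could be passed to `st.markdown`; in a notebook, to `display(Markdown(...))`.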

Out of Scope for MVP:
- Advanced multimodal integration (video analysis, sentiment analysis)
- Real-time interactive features (quizzes, live Q&A)
- Detailed budget optimization with real-time pricing data
- Visual map-based itineraries

# 2. Choose the Technology Stack
## 2.1 Backend:
- Programming Language: Python
- Web Framework: Streamlit (for simplicity and rapid development)
- AI Frameworks:
  - Hugging Face Transformers: for leveraging pre-trained language models.
  - LangChain: for orchestrating and managing the AI workflows.
- APIs:
  - YouTube Data API: to fetch relevant videos.

## 2.2 Frontend:
Framework: Streamlit (handles both frontend and backend for MVP)

## 2.3 Database:
Option: SQLite or a simple JSON-based storage for MVP
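
As a rough illustration of the SQLite option, the sketch below caches fetched videos and their summaries with Python's built-in `sqlite3` module. The `videos` table, its columns, and the helper names are assumptions made for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) `videos` table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS videos (
               link TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               views INTEGER,
               summary TEXT
           )"""
    )
    return conn


def save_video(conn, link, title, views, summary=None):
    # INSERT OR REPLACE keeps one row per video link across repeated runs.
    conn.execute(
        "INSERT OR REPLACE INTO videos (link, title, views, summary) VALUES (?, ?, ?, ?)",
        (link, title, views, summary),
    )
    conn.commit()


conn = open_store()
save_video(conn, "https://www.youtube.com/watch?v=abcd1234EFG", "Amsterdam in 3 days", 25000)
print(conn.execute("SELECT title, views FROM videos").fetchall())
```

Passing a file path instead of `":memory:"` would persist the cache between sessions.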

## 2.4 Deployment:
Platform: Streamlit Sharing, Heroku, or a cloud platform with GPU support if needed

## 2.5 Additional Tools:
- Transcription: Open-source models like Whisper (available on Hugging Face) or Coqui STT
- Summarization & Q&A: LLaMA 3 via Hugging Face or other open-source LMs

# **Role of PyTorch in Hugging Face Transformers**
The Hugging Face Transformers library is a popular framework for natural language processing (NLP) tasks. It supports multiple deep learning backends, with PyTorch being the primary one. Here's how PyTorch integrates with Transformers:

## 2.1. Model Implementation
Most transformer models in Hugging Face's library, including LLaMA 3, are implemented in PyTorch. When you load a model via AutoModelForCausalLM or AutoModelForSeq2SeqLM, PyTorch is used under the hood to:

* Define Model Architecture: Layers, attention mechanisms, etc.
* Handle Parameters: Loading pre-trained weights, managing model parameters.
* Execute Forward Passes: Processing input data through the model to generate outputs.

## 2.2. Tokenization and Encoding
The tokenizer converts raw text into token IDs that the model can understand. Tokenization itself does not depend on PyTorch; when the tokenizer is called with return_tensors='pt', it returns the IDs as PyTorch tensors that can be passed directly to the model.

## 2.3. Inference and Summarization
When you perform tasks like summarization:

* Input Processing: The input text is tokenized and converted into tensors.
* Model Execution: These tensors are passed through the model to generate summaries.
* Output Decoding: The generated tokens are decoded back into human-readable text.

The model execution step relies on PyTorch's tensor operations and, when a GPU is available, GPU acceleration; tokenization and decoding are handled by the tokenizer.



Why is PyTorch Essential Here?
* Model Execution: Transformers rely on PyTorch for running models efficiently, especially large ones like LLaMA 3.
* GPU Acceleration: PyTorch enables leveraging GPUs to significantly speed up computations, making real-time or large-scale processing feasible.
* Tensor Operations: The model's inputs (token IDs, attention masks) and outputs (logits, generated token IDs) are PyTorch tensors; the tokenizer converts between text and these IDs.
* Memory Management: Options such as half-precision weights and running inference under torch.no_grad() reduce memory use, which helps when loading and running large models.

# 1. Incorporating Google Maps for Geographical Context:
Strategy:

* Dynamic Map Embedding: For location-specific questions (like “What are the must-see attractions?” or “Where should I stay in Amsterdam?”), dynamically embed a Google Maps view with pinned locations for attractions, hotels, restaurants, etc.
* Directions and Route Planning: Integrate the Google Maps API to show routes for getting around Amsterdam (e.g., from the user's current location to an attraction or from one attraction to another). You could provide visual maps with public transit options, biking paths, or walking directions.
* Street View Integration: Allow users to explore attractions virtually with Google Street View to give them a more immersive experience.
Example Use Case: When the app answers the question “What’s the best way to get around Amsterdam?”, it could display a map showing popular biking routes or public transit options with route suggestions.
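
For the route-planning idea, a first step that needs no API key is to link out to Google Maps with a directions URL. The sketch below assumes the Maps URLs format (`api=1` with `origin`, `destination`, and `travelmode` parameters); the `directions_url` helper is hypothetical.

```python
from urllib.parse import urlencode


def directions_url(origin, destination, travel_mode="bicycling"):
    """Build a Google Maps directions link (Maps URLs format, api=1)."""
    allowed = {"driving", "walking", "bicycling", "transit"}
    if travel_mode not in allowed:
        raise ValueError(f"travel_mode must be one of {sorted(allowed)}")
    params = {
        "api": 1,
        "origin": origin,
        "destination": destination,
        "travelmode": travel_mode,
    }
    # urlencode escapes spaces and special characters in place names.
    return "https://www.google.com/maps/dir/?" + urlencode(params)


print(directions_url("Amsterdam Centraal", "Rijksmuseum", "transit"))
```

Embedding an interactive map inside the app, as described above, would instead require the Google Maps APIs and an API key.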

# 2. Using Computer Vision for Interactive Exploration:
Strategy:

* Image Classification for Amsterdam Landmarks: Integrate a computer vision model (e.g., ResNet, MobileNet) to allow users to upload or take photos, and the app identifies landmarks or attractions in Amsterdam (e.g., Rijksmuseum, Van Gogh Museum). This could be a fun feature for users to identify places they've visited or plan to visit.
* Landmark Recommendation from Images: Let users upload a picture, and based on computer vision analysis, the app can suggest nearby attractions or provide information about the identified place.
* Tour Planning Using Vision: Provide an option to recommend places based on images of landmarks or nature that users prefer, helping them create a personalized tour of the city.
Example Use Case: If a user uploads a picture of an iconic canal, the app can identify it as a part of Amsterdam’s canal system and suggest nearby attractions or cafes with map integration.

# 3. Integrating Text-to-Video Models like Flux:
Strategy:

* Personalized Travel Videos: For each answer (e.g., "What are the must-see attractions in Amsterdam?"), use Flux or another text-to-video model to generate short video clips summarizing the answer. You could feature snippets about places like the Van Gogh Museum, with animations based on the text answers.
* Augmented Travel Guides: For each question, instead of just providing a text-based answer, generate a video guide. This video could include visuals of the locations in Amsterdam, interactive elements like maps, and recommendations that are visually appealing.
* Visualizing Travel Routes: When answering questions about how to get from one place to another, generate videos that visually depict the route. The video could show a map, a simulated walking or biking tour, or transit routes.
Example Use Case: For the question, “What’s the best time of year to visit Amsterdam?”, a video could be generated that shows the city in different seasons, highlighting festivals in the summer and Christmas markets in the winter.

# 4. Multimodal Question Answering (QA) System:
Strategy:

* Multimodal Inputs for Travel Recommendations: Build a system where users can input not only text but also images or locations, and the app provides a combined answer. For instance, if a user provides a picture of a museum, the system can respond with historical information about the museum, the best time to visit, and even provide a Google Maps location.
* Image-to-Text Summarization for Travel Tips: Use computer vision to extract key visual elements from an image (e.g., a picture of a busy street in Amsterdam) and combine it with text-based context to generate travel recommendations or tips.
* Example Use Case: When users upload a photo of a place in Amsterdam, the app could suggest nearby attractions, provide historical information, and generate video content that matches the visual and textual information.

# 5. Interactive Itinerary Building with Visual Elements:
Strategy:

* Interactive Maps with Itinerary Recommendations: Based on the user’s preferences and the top 10 questions, provide a drag-and-drop interactive map where they can build their itinerary. The app can suggest popular attractions, routes, and places to eat.
* Photo-Based Itineraries: Allow users to upload photos of attractions they want to visit, and based on these, the app can generate an itinerary with accompanying visual and video content (using text-to-video models).
* Auto-Generated Travel Videos: After users finalize their itinerary, create a personalized travel video summarizing their trip, using video content from Flux or other models, which includes key highlights and tips based on their selected locations.
Example Use Case: After answering the question “What are the top 5 attractions in Amsterdam?”, allow users to click on each attraction, add it to their itinerary, and then generate a short personalized video that showcases the highlights of their upcoming trip.

# 6. Fun and Interactive Quizzes:
Strategy:

* Computer Vision-Based Quizzes: After providing the top 10 travel questions, create a fun visual quiz where users guess the name of the landmarks or attractions from images, using a computer vision model for validation.
* Interactive Map Quiz: Use Google Maps to create a quiz where users need to guess the location of famous attractions, and the app validates their choices using the Google Maps API.
Example Use Case: After answering the question, “Where should I stay in Amsterdam?”, show a series of images of different neighborhoods and ask the user to guess the correct neighborhood.

# 7. Voice Interactions and Text-to-Speech:
Strategy:

* Text-to-Speech for Personalized Guides: Convert the answers to speech using ElevenLabs or any text-to-speech model. This adds a more interactive element, allowing users to listen to their travel guide.
* Voice-Based Travel Assistant: Instead of typing in questions, allow users to ask their questions via voice, and respond with both spoken and visual answers (e.g., maps or text-to-video content).
Example Use Case: When a user asks, “What’s the best way to get around Amsterdam?”, the app provides a spoken response and visualizes the options on a map.

# Set Up

In [2]:
#@title Install Libraries
%pip install --upgrade --quiet elevenlabs==0.2.27
%pip install --quiet ollama
%pip install --quiet youtube-search-python youtube_transcript_api langchain langchain_community langchain_huggingface transformers torch faiss-cpu


Note: you may need to restart the kernel to use updated packages.


In [None]:
!curl -fsSL https://ollama.ai/install.sh | sh
!pip show ollama

In [None]:
!ollama serve > ollama.log 2>&1 &
!ollama pull llama3.1:8b

In [3]:
#@title Load Libraries

# Ollama
import ollama

# DL
import torch

# Vector Databases
import faiss

# General
import os
import re
import webbrowser
import numpy as np
import pandas as pd
from IPython.display import display, HTML, Markdown

# Youtube
from youtubesearchpython import VideosSearch
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound

# Hugging Face
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM
from transformers import DPRQuestionEncoder, DPRContextEncoder, DPRQuestionEncoderTokenizer, DPRContextEncoderTokenizer

# Langchain | Elevenlabs | Langchain Agents
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_community.tools.eleven_labs.text2speech import ElevenLabsText2SpeechTool



In [None]:
#@title Load Constants
DESTINATION = "Amsterdam"
PREFERENCES = ["Museums", "Outdoor Activities"]
MAX_RESULTS = 20  # Number of videos to fetch
MIN_VIEWS = 10000  # Minimum number of views (10,000) a video needs to be kept
LLM = "facebook/bart-large-cnn"  # Summarization model on Hugging Face
LOCAL_LLM = "llama3.1:8b"  # Local model pulled with Ollama
MAX_TOKENS = 1000  # Chunk size in tokens (BART accepts at most 1024)

# Building a Pipeline Step by Step

## **1. Search YouTube Videos**

In [None]:
# Search YouTube videos

def parse_views(views_str):
    """
    Parses the views string from YouTube and converts it to an integer.

    "5,909 views" -> 5909
    Returns 0 when the text contains no number (e.g. "No views" or None).
    """
    # Keep only the digits; this drops "views", commas, and whitespace
    digits = re.sub(r"[^\d]", "", views_str or "")
    return int(digits) if digits else 0

def fetch_youtube_videos(destination, preferences, min_views=MIN_VIEWS, max_results=MAX_RESULTS):
  """
  Fetches relevant YouTube videos based on the destination and user preferences.

  Parameters:
  - destination (str): The travel destination (e.g., 'Amsterdam').
  - preferences (list): List of user-selected preferences (e.g., ['Museums', 'Outdoor Activities']).
  - min_views (int): Minimum number of views a video needs to be kept.
  - max_results (int): Maximum number of videos to fetch.

  Returns:
  - videos (list): List of dictionaries containing video details with views parsed as integers.
  """
  # Combine preferences into a search query
  preference_query = " ".join(preferences)
  search_query = f"{destination} travel guide {preference_query} Netherlands"

  # Initialize the search and execute it
  video_search = VideosSearch(search_query, limit=max_results)
  search_results = video_search.result()

  videos = []
  for video in search_results['result']:
      # The view count text may be missing for some results
      views = parse_views((video.get('viewCount') or {}).get('text'))

      # Filter out videos with fewer than min_views
      if views < min_views:
          continue  # Skip this video

      video_data = {
          'Title': video['title'],
          'Duration': video['duration'],
          'Views': views,  # Store as integer
          'Channel': video['channel']['name'],
          'Link': video['link']
      }
      videos.append(video_data)

  return videos

In [None]:
#@title Videos List
print("Fetching relevant YouTube videos about traveling in Amsterdam...\n")
videos = fetch_youtube_videos(DESTINATION, PREFERENCES, MIN_VIEWS, MAX_RESULTS)

In [None]:
len(videos)

In [None]:
videos
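
Before transcription, it can help to deduplicate the fetched list and keep only the most-viewed entries. This is a small sketch rather than part of the original pipeline: `top_videos` is a hypothetical helper, and it assumes the dictionary keys (`Title`, `Views`, `Link`) produced by `fetch_youtube_videos`.

```python
def top_videos(videos, limit=5):
    """Return the `limit` most-viewed videos, dropping duplicate links."""
    seen_links = set()
    unique = []
    for video in videos:
        if video["Link"] not in seen_links:
            seen_links.add(video["Link"])
            unique.append(video)
    # Sort by view count, highest first, then truncate.
    return sorted(unique, key=lambda video: video["Views"], reverse=True)[:limit]


sample = [
    {"Title": "Cycling in Amsterdam", "Views": 45000, "Link": "https://www.youtube.com/watch?v=bbbbbbbbbbb"},
    {"Title": "Amsterdam museums guide", "Views": 120000, "Link": "https://www.youtube.com/watch?v=aaaaaaaaaaa"},
    {"Title": "Amsterdam museums guide", "Views": 120000, "Link": "https://www.youtube.com/watch?v=aaaaaaaaaaa"},
]
for video in top_videos(sample, limit=2):
    print(video["Views"], video["Title"])
```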

In [None]:
#@title Display Videos

def display_videos(videos):
  """
  Displays the list of videos as a table, followed by clickable links.

  Parameters:
  - videos (list): List of dictionaries containing video details.

  Returns:
  - videos_df (pd.DataFrame): The videos as a DataFrame, or None if the list is empty.
  """
  if not videos:
    print(f"No videos found with at least {MIN_VIEWS:,} views.")
    return None

  # Create a DataFrame for better display
  videos_df = pd.DataFrame(videos)
  print(f"\n Fetched YouTube Videos (filtered by >= {MIN_VIEWS:,} views):")
  print(videos_df[["Title", "Duration", "Views", "Channel"]].to_string(index=False))

  # Display clickable links
  print("\n Click the links below to watch the videos:\n")
  for idx, video in enumerate(videos, 1):
    # Display as an HTML link that opens in a new tab
    display(HTML(f"{idx}. <a href='{video['Link']}' target='_blank'>{video['Title']}</a>"))
  return videos_df

videos_df = display_videos(videos)

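Note: YouTube videos cannot be opened with `webbrowser` from Google Colab, because the notebook runs on a remote machine that has no local browser.


In [None]:
# This call has no visible effect in Colab; use the clickable links above instead
webbrowser.open('https://www.youtube.com/watch?v=zFABm07RtXk')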

In [None]:
videos_df

## **2. Transcribe YouTube Videos**

In [None]:
# 2. Transcribe YouTube videos

def extract_video_id(youtube_url):
  """
  Extracts the video ID from a YouTube URL.

    Parameters:
    - youtube_url (str): The full YouTube video URL.

    Returns:
    - video_id (str): The extracted video ID.
  """

  # Regular expression to extract video ID
  video_id_match = re.search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", youtube_url)
  if video_id_match:
    return video_id_match.group(1)
  else:
    return None

In [None]:
# Example
extract_video_id("https://www.youtube.com/watch?v=abcd1234EFG")

# Output: "abcd1234EFG"
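
The regular expression above accepts any 11 URL-safe characters that follow a slash, so it would also match the start of a long path segment such as a channel ID. A stricter alternative, sketched here with the standard library, parses the URL first. The function name `extract_video_id_strict` and the set of accepted hosts and path prefixes are assumptions for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs

VIDEO_ID_PATTERN = re.compile(r"^[0-9A-Za-z_-]{11}$")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id_strict(youtube_url):
    """Return the 11-character video ID, or None if the URL has no valid one."""
    parsed = urlparse(youtube_url)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Paths of the form /embed/<id> or /shorts/<id>.
            parts = [part for part in parsed.path.split("/") if part]
            if len(parts) == 2 and parts[0] in {"embed", "shorts"}:
                candidate = parts[1]
    if candidate and VIDEO_ID_PATTERN.match(candidate):
        return candidate
    return None


print(extract_video_id_strict("https://www.youtube.com/watch?v=abcd1234EFG&t=10s"))
print(extract_video_id_strict("https://www.youtube.com/channel/UCabcdefghijk"))
```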

In [None]:
#@title Transcription

def extract_transcripts(videos_df):
  # Initializing a new column for Transcripts
  videos_df['Transcript'] = None

  # Iterate over each video and fetch the transcript
  for index, row in videos_df.iterrows():
    youtube_url = row['Link']
    video_title = row['Title']
    video_id = extract_video_id(youtube_url)

    if video_id:
      try:
        # Fetch the transcript using the video ID
        transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages = ['en'])

        # Combine the transcript segments into a single string
        transcript = ' '.join([segment['text'] for segment in transcript_list])

        # Assign the transcript to the DataFrame
        videos_df.at[index, "Transcript"] = transcript
        print(f"Transcript fetched for video: '{video_title}'")

      except TranscriptsDisabled:
        print(f"Transcripts are disabled for this video: '{video_title}'.")
        videos_df.at[index, "Transcript"] = "Transcripts are disabled for this video."

      except NoTranscriptFound:
        print(f"No transcript found for video: '{video_title}'.")
        videos_df.at[index, "Transcript"] = "No transcript found for this video."

      except Exception as e:
        print(f"An error occurred while fetching the transcript for video: '{video_title}'. Error: {e}")
        videos_df.at[index, "Transcript"] = f"An error occurred while fetching the transcript for this video. Error: {e}"
    else:
      print(f"Could not extract video ID from URL: '{youtube_url}'.")
      videos_df.at[index, "Transcript"] = "Invalid YouTube URL."

  return videos_df

In [None]:
videos_df = extract_transcripts(videos_df)

In [None]:
videos_df

## **3. Summarization Using Facebook LLM via Hugging Face**

In [None]:
# 3. Summarization using facebook/bart-large-cnn via Hugging Face

# Initialize the tokenizer and the summarization pipeline
tokenizer = AutoTokenizer.from_pretrained(LLM)
summarizer = pipeline("summarization", model=LLM)

# Clean Text Function
def clean_text(text):
  text = re.sub(r'[^\x00-\x7F]+', ' ', text)  # Replace non-ASCII characters
  text = re.sub(r'\s+', ' ', text).strip()     # Collapse whitespace, including newlines
  return text

def split_text_into_chunks(text, max_tokens=MAX_TOKENS):
  # Encode without special tokens so they do not leak into the decoded chunks
  tokens = tokenizer.encode(text, add_special_tokens=False)
  chunks = [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]
  return [tokenizer.decode(chunk) for chunk in chunks]

# Split the text of the first transcript into chunks
t = clean_text(videos_df['Transcript'][0])
chunks = split_text_into_chunks(t)

# Summarize each chunk
summaries = [
    summarizer(chunk, max_length=130, min_length=30, do_sample=False)[0]['summary_text']
    for chunk in chunks
]

# Combine the chunk summaries into one summary
final_summary = " ".join(summaries)
final_summary

In [None]:
t

In [None]:
len(t)

In [None]:
tokens = tokenizer.encode(t)
print(f"Number of Tokens: {len(tokens)}")
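
Fixed-size chunks can cut a sentence in half at each boundary. One common mitigation is to let consecutive chunks share a few tokens. The sketch below works on any list of token IDs; `chunk_with_overlap` and its default sizes are hypothetical and not part of the notebook's pipeline.

```python
def chunk_with_overlap(token_ids, max_tokens=1000, overlap=50):
    """Split a token list into windows of `max_tokens` that share `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # The last window already reaches the end
    return chunks


fake_tokens = list(range(25))  # Stand-in for tokenizer output
for chunk in chunk_with_overlap(fake_tokens, max_tokens=10, overlap=2):
    print(chunk[0], chunk[-1], len(chunk))
```

The trade-off is that overlapping regions are summarized twice, so the combined summary may repeat a point.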

In [None]:
videos_df.iloc[0:1]

In [None]:
#@title Summarizer

def summarize_text(transcript, summarizer_pipeline):
  """
  Summarizes the provided transcript using the BART model.

  Parameters:
  - transcript (str): The video transcript to summarize.
  - summarizer_pipeline: The Hugging Face summarization pipeline to use.

  Returns:
  - summary (str): The summarized text or an error message.
  """
  try:
    # Clean the transcript and split it into chunks the model can handle
    chunks = split_text_into_chunks(clean_text(transcript))

    # Summarize each chunk
    summaries = [
        summarizer_pipeline(chunk, max_length=130, min_length=30, do_sample=False)[0]['summary_text']
        for chunk in chunks
    ]

    # Combine the chunk summaries
    result = ' '.join(summaries)
    return result if result else "No summary generated."
  except Exception as e:
    # Return an error message in case of failure
    return f"Error summarizing transcript: {e}"

# Placeholder texts written by extract_transcripts when no transcript is available
TRANSCRIPT_ERRORS = (
    "transcripts are disabled",
    "no transcript found",
    "an error occurred",
    "invalid youtube url",
)

def has_valid_transcript(transcript):
  return bool(transcript) and not transcript.lower().startswith(TRANSCRIPT_ERRORS)

# Initialize a new column for summaries
videos_df['Summary'] = None
summarizer_pipeline = pipeline("summarization", model=LLM)

# Generate summaries (only the first video here; iterate over videos_df to summarize all)
for index, row in videos_df.iloc[0:1].iterrows():
  transcript = row['Transcript']
  video_title = row['Title']

  if has_valid_transcript(transcript):
    print(f"Summarizing transcript for video: {video_title}")
    summary = summarize_text(transcript, summarizer_pipeline)
    videos_df.at[index, 'Summary'] = summary
    print(f"Summary generated for video: {video_title}")
  else:
    print(f"No valid transcript found for video: '{video_title}'. Skipping summarization.\n")
    videos_df.at[index, 'Summary'] = "No transcript found for this video."

In [None]:
summary

In [None]:
# Display the updated DataFrame with summaries
videos_df

## 4. Building Dense Passage Retrieval (DPR)
Dense Passage Retrieval lets us efficiently retrieve the parts of the transcripts that are most relevant to a travel question about Amsterdam. The retrieved passages can then be processed or summarized.

### 4.1 Set Up DPR Models (Query Encoder and Passage Encoder)

Hugging Face provides DPR models with two main components:
- Query Encoder: Encodes the input query.
- Passage Encoder: Encodes the passages to be searched.

In [None]:
# Load the DPR question encoder and tokenizer for the query
query_encoder = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
query_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')

# Load the DPR context encoder and tokenizer for the passages (i.e., transcript sections)
passage_encoder = DPRContextEncoder.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')
passage_tokenizer = DPRContextEncoderTokenizer.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')

### 4.2 Encode Passages (Transcripts)
We will encode the passages (i.e., parts of the transcript) and store their embeddings for efficient retrieval.

In [None]:
# Encode passage (transcripts) using the DPR context encoder

def encode_passage(videos_df):
  passages = videos_df['Transcript'].tolist()
  passage_embeddings = []

  for passage in passages:
    inputs = passage_tokenizer(passage, return_tensors = 'pt', max_length = 512, truncation = True, padding = True)
    with torch.no_grad():
      embedding = passage_encoder(**inputs).pooler_output
    passage_embeddings.append(embedding.numpy())

  # Convert to a Numpy Array
  passage_embeddings = np.vstack(passage_embeddings)

  return passage_embeddings

# Encode all the transcripts in the DataFrame
passage_embeddings = encode_passage(videos_df)

In [None]:
passage_embeddings

### 4.3 Build a FAISS Index for Fast Retrieval

We'll use FAISS to index the encoded passages, making it faster to retrieve relevant passages based on the query.

In [None]:
# Initialize FAISS index for similarity search
dimension = passage_embeddings.shape[1] # DPR embedding size is 768
faiss_index = faiss.IndexFlatIP(dimension) # Inner Product (dot product) for similarity

# Add the passage embeddings to the index
faiss_index.add(passage_embeddings)
faiss_index
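
Conceptually, a flat inner-product index scores every stored vector against the query by dot product and returns the highest-scoring entries. The pure-Python sketch below illustrates that idea on toy vectors. It is not how FAISS is implemented, and `top_k_inner_product` is a hypothetical helper.

```python
def top_k_inner_product(query_vector, passage_vectors, top_k=3):
    """Exhaustively score every passage vector against the query by dot product."""
    scores = [
        (sum(q * p for q, p in zip(query_vector, passage)), index)
        for index, passage in enumerate(passage_vectors)
    ]
    # Highest score first; ties keep the lower index first.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return scores[:top_k]


toy_passages = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [2.0, 1.0, 0.0]
for score, index in top_k_inner_product(toy_query, toy_passages, top_k=2):
    print(index, score)
```

Because the inner product is not normalized, vectors with larger magnitude score higher; normalizing embeddings first would turn the score into cosine similarity.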

### 4.4 Encode the Query and Retrieve Relevant Passages
Next, we will encode the user's query and use it to retrieve the most relevant passages from the indexed transcripts.

In [None]:
# Function to encode the query and search for the most relevant passages
def search_relevant_passage(query, faiss_index, top_k=3):
  # Encode the query using the DPR question encoder
  query_inputs = query_tokenizer(query, return_tensors='pt', max_length=128, truncation=True, padding=True)

  with torch.no_grad():
    query_embedding = query_encoder(**query_inputs).pooler_output.numpy()

  # Ensure query_embedding is 2D, i.e. shape (1, embedding_dimension)
  query_embedding = query_embedding.reshape(1, -1)

  # Search for the top-k most similar passages
  distances, indices = faiss_index.search(query_embedding, top_k)

  print(f"distances: {distances}")
  print(f"indices: {indices}")

  # Retrieve the corresponding transcripts and titles
  retrieved_passages = []
  for score, idx in zip(distances[0], indices[0]):
    if idx == -1:
      continue  # FAISS pads with -1 when the index holds fewer than top_k vectors
    retrieved_passages.append({
        'Title': videos_df.iloc[idx]['Title'],            # Title of the matched video
        'Transcript': videos_df.iloc[idx]['Transcript'],  # Transcript of the matched video
        'Similarity': score                               # Inner-product similarity score
    })

  return retrieved_passages

# Example query
query = "What are the best museums in Amsterdam?"

# Search for the most relevant passages
retrieved_passages = search_relevant_passage(query, faiss_index, top_k=2)

# Display the retrieved results
for passage in retrieved_passages:
  print(f"Title: {passage['Title']}")
  print(f"Similarity: {passage['Similarity']}")
  print(f"Transcript: {passage['Transcript']}\n")

In [None]:
# Function to encode the query and search for the most relevant passages
def search_relevant_passages(videos_df, faiss_index, query, top_k=3):
    # Encode the query using the DPR question encoder
    query_inputs = query_tokenizer(query, return_tensors='pt', max_length=128, truncation=True, padding=True)

    with torch.no_grad():
        query_embedding = query_encoder(**query_inputs).pooler_output.numpy()

    # Ensure query_embedding is 2D (i.e., shape (1, embedding_dimension))
    query_embedding = query_embedding.reshape(1, -1)  # Reshaping to (1, embedding_dimension)

    # Search for the top-k most similar passages
    distances, indices = faiss_index.search(query_embedding, top_k)

    # Filter the DataFrame based on the retrieved indices
    top_k_indices = indices[0][:top_k]
    top_k_videos = videos_df.iloc[top_k_indices].copy()
    top_k_videos['Similarity Score'] = distances[0][:top_k]

    # Add the query as a new column in the DataFrame
    top_k_videos['Query'] = query

    return top_k_videos

# Example query
query = "What are the best museums in Amsterdam?"

# Search for the most relevant passages and filter videos from videos_df
top_k_videos = search_relevant_passages(videos_df, faiss_index, query, top_k = 3)

# Display the Top_k filtered videos along with the similarity scores and query
print(top_k_videos[['Title', 'Similarity Score', 'Query']])
top_k_videos[['Title','Transcript','Similarity Score','Query']]

## 5. Building Travel Agent to Generate Top 10 Queries


In [None]:
!ollama list

In [None]:
# Function to generate questions using Llama 3.1 via Ollama's chat API
def generate_questions(city):
  prompt = f"Act as a travel agent and an expert {city} tour guide. Generate a list of the top 10 questions that a first-time traveler might ask about visiting {city}."

  # Use the chat API to generate the response
  response = ollama.chat(model = "llama3.1:8b", messages = [{'role':"user","content":prompt}])
  return response

# Example usage
city = 'Amsterdam'
top_10_questions = generate_questions(city)
print(top_10_questions)


In [None]:
display(Markdown(top_10_questions['message']['content']))

In [None]:
# Function to generate questions using Llama 3.1 via Ollama with a refined prompt
def generate_questions(city):
    # Refined prompt to ask specifically for only the questions, without extra text
    prompt = f"""
    As a travel guide expert, generate a list of the top 10 questions that a first-time traveler might ask about visiting {city}.
    Please provide only the questions, numbered 1 to 10, without any additional descriptions.
    """

    # Use Ollama's Chat API to generate the questions
    response = ollama.chat(model="llama3.1:8b", messages=[{"role": "user", "content": prompt}])

    return response['message']['content']

# Function to display the questions beautifully with Markdown
def display_questions_with_markdown(city):
    # Generate the top 10 questions
    questions_text = generate_questions(city)

    # Convert the text into a Markdown-friendly format
    markdown_output = f"### Top 10 Questions for First-Time Travelers to {city}:\n\n{questions_text}"

    # Display the formatted text using Markdown
    display(Markdown(markdown_output))

# Example usage
city = "Amsterdam"
display_questions_with_markdown(city)
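
To reuse the generated questions later (for example as retrieval queries), the model's reply has to be split into individual questions. The sketch below assumes the reply uses numbered lines as the refined prompt requests; `parse_numbered_questions` is a hypothetical helper and the sample reply is made up for illustration.

```python
import re


def parse_numbered_questions(text):
    """Extract the question text from lines such as '1. ...' or '2) ...'."""
    questions = []
    for line in text.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            # Strip Markdown bold markers that some models wrap around list items.
            questions.append(match.group(1).strip("*").strip())
    return questions


sample_reply = """Here are the questions:
1. What is the best way to get around Amsterdam?
2) Is Amsterdam safe for tourists?
3. **Can I drink the tap water in Amsterdam?**
"""
print(parse_numbered_questions(sample_reply))
```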


## 6. Create Text-to-Speech Agent

In [None]:
import os
from google.colab import userdata
os.environ["ELEVEN_API_KEY"] = userdata.get('ELEVEN_API_KEY')

In [None]:
#@title ElevenLabs without Agents

text_to_speak = '''Top 10 Questions for First-Time Travelers to Amsterdam:
Here are the top 10 questions that a first-time traveler might ask about visiting Amsterdam:

What is the best way to get around Amsterdam?
Is Amsterdam safe for tourists?
What are some must-see attractions in Amsterdam?
Can I drink the tap water in Amsterdam?
Are there any specific dress code or cultural norms I should be aware of?
How much money do I need to budget for food and activities?
Is Amsterdam a good place for solo travelers or couples?
What are some popular neighborhoods or areas to stay in?
Can I bring my own bike or rent one in Amsterdam?
Are there any unique or quirky experiences I should have while visiting Amsterdam?'''

tts = ElevenLabsText2SpeechTool()
print(tts.name)

In [None]:
speech_file = tts.run(text_to_speak)
speech_file

In [None]:
tts.play(speech_file)

In [None]:
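#@title 6.2 ElevenLabs with Agents
from langchain_community.chat_models import ChatOllama

tools = load_tools(["eleven_labs_text2speech"])

# The agent needs a chat model object, not the model-name string stored in LLM.
# Assumption: reuse the local Llama 3.1 model served by Ollama.
agent_llm = ChatOllama(model="llama3.1:8b")

agent = initialize_agent(
    tools=tools,
    llm=agent_llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
audio_file = agent.run(text_to_speak)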