<a href="https://colab.research.google.com/github/atilatech/atila-core-service/blob/add_long_form_answering/atlas/notebooks/question_answering_youtube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Answer Questions Using YouTube

This notebook shows how to give long-form answers to questions using YouTube videos.

Inspired by [Abstractive Question Answering](https://docs.pinecone.io/docs/abstractive-question-answering) and [Long Form Question Answering in Haystack](https://www.pinecone.io/learn/haystack-lfqa/).

This tutorial builds on the previous tutorial, Create an Atlas service[todo add link], which showed how to index YouTube videos and return the sections of a video that match a search term.

This tutorial covers how to take those matching sections and combine them into a long-form answer.

At a high level, it is a two-step process:

1. Find the transcript sections that are relevant to the question

2. Combine the sections together to form a coherent answer
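
The two steps above can be sketched as one small function. This is a minimal illustration, not the notebook's implementation: `answer_question`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the stand-ins take the place of the Pinecone query and the generator model that later sections set up.

```python
from typing import Callable, Dict, List


def answer_question(
    query: str,
    retrieve: Callable[[str], List[Dict]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Two-step long-form QA: retrieve context, then generate an answer."""
    # Step 1: find the transcript sections most relevant to the query.
    matches = retrieve(query)
    contexts = [match["metadata"]["text"] for match in matches]
    # Step 2: hand the query and the retrieved text to a generator.
    return generate(query, contexts)


# Stand-ins so the sketch runs without a vector index or a model.
def fake_retrieve(query: str) -> List[Dict]:
    return [{"metadata": {"text": "placeholder transcript text"}}]


def fake_generate(query: str, contexts: List[str]) -> str:
    return f"{query}: " + " ".join(contexts)


print(answer_question("best exercises for longevity", fake_retrieve, fake_generate))
```

Passing the retriever and generator in as arguments keeps the two steps independent, so either one can be swapped out without touching the other.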

## Get Relevant Context

First, we send the query "best exercises for longevity" to the index, which returns the five transcript sections (`top_k=5`) that best match the topics of exercise and longevity.

In [None]:
%pip install pinecone-client requests

## Get API Keys

1. You will need a [Pinecone API key (free)](https://app.pinecone.io/).
2. Deploy [this model](https://huggingface.co/tomiwa1a/openai-whisper-endpoint) as a Hugging Face inference endpoint. You will need the endpoint URL and a Hugging Face API key to call it.

In [None]:
from getpass import getpass
# getpass tip: https://stackoverflow.com/a/54577734/5405197
PINECONE_API_KEY = getpass('Enter PINECONE_API_KEY')
HUGGING_FACE_API_KEY = getpass('Enter HUGGING_FACE_API_KEY')
# replace this with your HUGGING_FACE_ENDPOINT_URL
HUGGING_FACE_ENDPOINT_URL = "https://rl2hxotyspedkt19.us-east-1.aws.endpoints.huggingface.cloud"

In [3]:
import requests
import pinecone
import json
from typing import Union

pinecone_index_id = "youtube-search"

pinecone.init(
    api_key=PINECONE_API_KEY,
    environment="us-west1-gcp"
)

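def send_encoding_request(query: Union[str, list]):
    """Ask the Hugging Face endpoint to encode a query (or list of queries) into vectors."""
    payload = json.dumps({
        "inputs": "",  # inputs key is not used but our endpoint expects it
        "query": query,
    })
    headers = {
        'Authorization': f'Bearer {HUGGING_FACE_API_KEY}',
        'Content-Type': 'application/json'
    }

    response = requests.post(HUGGING_FACE_ENDPOINT_URL, headers=headers, data=payload)
    # fail loudly on HTTP errors (e.g. a bad API key) instead of parsing an error body
    response.raise_for_status()
    return response.json()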

pinecone_index = pinecone.Index(pinecone_index_id)


def query_model(query, video_id=""):
    """Encode the query and return the 5 best-matching transcript sections from Pinecone."""
    encoded_query = send_encoding_request(query)
    # optionally restrict the search to a single video
    metadata_filter = {"video_id": {"$eq": video_id}} if video_id else None
    vectors = encoded_query['encoded_segments'][0]['vectors']
    return pinecone_index.query(vector=vectors, top_k=5,
                                include_metadata=True,
                                filter=metadata_filter).to_dict()

In [None]:
query = "best exercises for longevity"
results = query_model(query)
results['matches'][3]
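
Inspecting one match at a time gets tedious, so a small helper that prints one line per match can help. The sketch below assumes the result shape used in this notebook (a `matches` list whose items have a `score` and a `metadata` dict with `title` and `url`); `summarize_matches` and the sample data are hypothetical and only illustrate that shape.

```python
from typing import Dict, List


def summarize_matches(results: Dict) -> List[str]:
    """Return one 'score | title | url' line per match, best match first."""
    lines = []
    for match in sorted(results["matches"], key=lambda m: m["score"], reverse=True):
        meta = match["metadata"]
        lines.append(f"{match['score']:.2f} | {meta['title']} | {meta['url']}")
    return lines


# Hand-written sample in the shape query_model returns (not real results).
sample_results = {
    "matches": [
        {"score": 20.5, "metadata": {"title": "Video B", "url": "https://youtu.be/bbb?t=10"}},
        {"score": 26.6, "metadata": {"title": "Video A", "url": "https://youtu.be/aaa?t=38"}},
    ]
}
for line in summarize_matches(sample_results):
    print(line)
```

With real data, `summarize_matches(results)` would take the dictionary returned by `query_model(query)`.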

## Create Generator Model

Next, we create our generator, which will take the given paragraphs and combine them together to give an answer.

> Generators are sequence-to-sequence (Seq2Seq) models that take the query and retrieved contexts as input and use them to generate an output, the answer.

[Long-Form Question-Answering](https://www.pinecone.io/learn/haystack-lfqa/#:~:text=Generators%20are%20sequence%2Dto%2Dsequence%20(Seq2Seq)%20models%20that%20take%20the%20query%20and%20retrieved%20contexts%20as%20input%20and%20use%20them%20to%20generate%20an%20output%2C%20the%20answer.)

You can think of it as a model that takes a piece of text, transforms it, and generates another piece of text. We will use the [bart_lfqa model](https://towardsdatascience.com/long-form-qa-beyond-eli5-an-updated-dataset-and-approach-319cb841aabb), which [can be found on Hugging Face](https://huggingface.co/vblagoje/bart_lfqa).
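
To make "takes a piece of text" concrete, the sketch below builds the single input string that the generator receives. It mirrors the formatting used by `generate_answer` in this notebook (a `question:` prefix, a `context:` prefix, and a `<P>` marker before each passage); `build_lfqa_input` is a hypothetical helper name and the passages are placeholders.

```python
from typing import List


def build_lfqa_input(query: str, documents: List[str]) -> str:
    """Format a query and its context passages as one generator input string."""
    # Each context passage is introduced by the "<P>" marker.
    conditioned_doc = "<P> " + " <P> ".join(documents)
    return f"question: {query} context: {conditioned_doc}"


example = build_lfqa_input("what is egcg", ["first passage", "second passage"])
print(example)
```

The model never sees the query and the passages as separate arguments; everything it knows about the question arrives through this one string.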

In [None]:
%pip install -U transformers torch

In [27]:
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "vblagoje/bart_lfqa"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model = model.to(device)


In [35]:
def generate_answer(query, documents):
    """Generate a long-form answer to `query` from a list of context passages."""
    # concatenate question and support documents into BART input
    conditioned_doc = "<P> " + " <P> ".join(documents)
    query_and_docs = "question: {} context: {}".format(query, conditioned_doc)

    # truncate so that long contexts cannot exceed the model's maximum input length
    model_input = tokenizer(query_and_docs, truncation=True, padding=True, return_tensors="pt")

    generated_answers_encoded = model.generate(input_ids=model_input["input_ids"].to(device),
                                            attention_mask=model_input["attention_mask"].to(device),
                                            min_length=64,
                                            max_length=256,
                                            do_sample=False, 
                                            early_stopping=True,
                                            num_beams=8,
                                            temperature=1.0,
                                            top_k=None,
                                            top_p=None,
                                            eos_token_id=tokenizer.eos_token_id,
                                            no_repeat_ngram_size=3,
                                            num_return_sequences=1)
    answer = tokenizer.batch_decode(generated_answers_encoded, skip_special_tokens=True,clean_up_tokenization_spaces=True)
    return answer

In [36]:
query = "what is egcg"
context_results = query_model(query)

answer_context = [sentence['metadata']['text'] for sentence in context_results['matches']]

generate_answer(query, answer_context)

query_and_docs question: what is egcg context: <P>  grain. EGCG is a polyphenol found in green tea and a potent antioxidant that has shown effectiveness  against various conditions, including androgenic alopecia. Combating hair loss is not just about looks,  understanding the mechanisms of senescent alopecia and ways to reverse it can provide insights into  other aspects of aging. In this new study, the researchers used an emerging micro needle technology  to deliver drugs directly to the inner layers of the skin. Cone like micro needles were loaded  with nanoparticles containing rapamycin, EGCG, or a combination. The micro needles were applied to <P>  using dissolvable micro needles loaded with brappa mice in and epi-galocatican galate or EGCG  and active ingredients in green tea. Studies have found that rapamycin, one of the most promising  general protective drugs, not only stimulates hair regrow, but can also partially reverse hair  grain. EGCG is a polyphenol found in green tea an

['Epi-Galocatican Galate or EGCG is a polyphenol found in green tea and a potent antioxidant that has shown effectiveness  against various conditions, including androgenic alopecia. In a study, the researchers used an emerging micro needle technology  to deliver drugs directly to the inner layers of the skin. The micro needles were applied to a dissolvable micro needles loaded with brappa mice in and epi-galocatican galate  and active ingredients in Green tea. The results were dose-dependent, with moderate doses of rapamycin being the  most effective. The researchers also confirmed that the treatment resulted in increased  autophagy in follicular regions, and promoting Autophagy is currently thought to be the central mechanism of action. This study reiterates the health potential of two  molecules popular in the longevity field. Additionally, this micro needle-based  drug delivery method could potentially be used to treat various other skin conditions.']

In [39]:
context_results['matches'][0]

{'id': 'GK5YNAJrRWc-t38',
 'score': 26.6459885,
 'values': [],
 'sparseValues': {},
 'metadata': {'end': 45.0,
  'id': 'GK5YNAJrRWc-t38',
  'length': 252.0,
  'start': 38.0,
  'text': ' grain. EGCG is a polyphenol found in green tea and a potent antioxidant that has shown effectiveness  against various conditions, including androgenic alopecia. Combating hair loss is not just about looks,  understanding the mechanisms of senescent alopecia and ways to reverse it can provide insights into  other aspects of aging. In this new study, the researchers used an emerging micro needle technology  to deliver drugs directly to the inner layers of the skin. Cone like micro needles were loaded  with nanoparticles containing rapamycin, EGCG, or a combination. The micro needles were applied to',
  'thumbnail': 'https://i.ytimg.com/vi/GK5YNAJrRWc/sddefault.jpg',
  'title': '"Longevity Molecules" Preserve Hair & Hearing in Mice',
  'url': 'https://youtu.be/GK5YNAJrRWc?t=38',
  'video_id': 'GK5YNAJrRWc'
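
The `score` in the match above is well above 1, which is consistent with a dot-product similarity rather than a cosine similarity. That is an assumption about how the index was configured, although the encoder used in this project (`multi-qa-mpnet-base-dot-v1`) is named for dot-product scoring. The toy sketch below shows how such a ranking works; the vectors and the `rank_by_dot_product` helper are made up for illustration and are far smaller than real embeddings.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot_product(
    query_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the top_k highest."""
    scored = [(segment_id, dot(query_vec, vec)) for segment_id, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Three made-up segment vectors (real embeddings have hundreds of dimensions).
toy_candidates = [
    ("seg-a", [1.0, 0.0, 2.0]),
    ("seg-b", [0.5, 3.0, 0.0]),
    ("seg-c", [2.0, 1.0, 0.0]),
]
ranked = rank_by_dot_product([1.0, 2.0, 3.0], toy_candidates, top_k=2)
print(ranked)
```

A vector database does the same ranking, but with an approximate index so it does not have to score every stored vector.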

## Deploy to Hugging Face

Next, we take the `generate_answer` function, combine it with the transcription and encoding functions from the previous tutorial, and wrap them in an endpoint handler that can be deployed on Hugging Face.
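
Once the handler is deployed, a client would call it over HTTP. The sketch below shows one way to build and send a long-form request, assuming the deployed handler reads the same `inputs`, `query`, `long_form_answer`, and `context` keys as the `EndpointHandler` in this notebook. The helper names are hypothetical, and the timeout value is an arbitrary choice.

```python
import json
from typing import Dict, List


def build_long_form_payload(query: str, matches: List[Dict]) -> str:
    """Serialize the request body for the handler's long-form answer branch."""
    return json.dumps({
        "inputs": "",  # unused, but the handler rejects requests without it
        "query": query,
        "long_form_answer": True,
        "context": [match["metadata"]["text"] for match in matches],
    })


def request_long_form_answer(endpoint_url: str, api_key: str, body: str) -> Dict:
    """POST the payload to the endpoint and return the decoded JSON response."""
    import requests  # imported here so the payload helper has no third-party dependency

    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    response = requests.post(endpoint_url, headers=headers, data=body, timeout=120)
    response.raise_for_status()
    return response.json()


# Placeholder match in the shape returned by query_model (not real data).
body = build_long_form_payload("what is egcg", [{"metadata": {"text": "placeholder passage"}}])
print(body)
```

In practice, `matches` would be `query_model(query)['matches']`, and the endpoint URL and API key would be the values entered at the top of the notebook.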

## Install Dependencies

In [None]:
!pip install transformers sentence_transformers pytube

# optional: install pytorch explicitly. The extra index below hosts CPU-only wheels;
# to use a gpu for faster transcription, or to install on mac and windows, see: https://pytorch.org/get-started/locally/
!pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

!pip install git+https://github.com/openai/whisper.git -q
!apt install ffmpeg # https://stackoverflow.com/questions/51856340/how-to-install-package-ffmpeg-in-google-colab

In [49]:
from typing import Dict

from sentence_transformers import SentenceTransformer
from tqdm import tqdm
import whisper
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import pytube
import time


class EndpointHandler():
    # load the model
    WHISPER_MODEL_NAME = "tiny.en"
    SENTENCE_TRANSFORMER_MODEL_NAME = "multi-qa-mpnet-base-dot-v1"
    QUESTION_ANSWER_MODEL_NAME = "vblagoje/bart_lfqa"
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def __init__(self, path=""):

        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f'whisper and question_answer_model will use: {device}')

        t0 = time.time()
        self.whisper_model = whisper.load_model(self.WHISPER_MODEL_NAME).to(device)
        t1 = time.time()

        total = t1 - t0
        print(f'Finished loading whisper_model in {total} seconds')

        t0 = time.time()
        self.sentence_transformer_model = SentenceTransformer(self.SENTENCE_TRANSFORMER_MODEL_NAME)
        t1 = time.time()

        total = t1 - t0
        print(f'Finished loading sentence_transformer_model in {total} seconds')
        
        self.question_answer_tokenizer = AutoTokenizer.from_pretrained(self.QUESTION_ANSWER_MODEL_NAME)
        t0 = time.time()
        self.question_answer_model = AutoModelForSeq2SeqLM.from_pretrained(self.QUESTION_ANSWER_MODEL_NAME).to(device)
        t1 = time.time()
        total = t1 - t0
        print(f'Finished loading question_answer_model in {total} seconds')

    def __call__(self, data: Dict[str, str]) -> Dict:
        """
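        Args:
            data (:obj:`dict`):
                request payload. Must contain an 'inputs' key, plus either
                'video_url' (transcribe and optionally encode a video) or
                'query' (encode a query, or generate a long-form answer when
                'long_form_answer' and 'context' are also given)
        Return:
            A :obj:`dict`: the transcript, the encoded segments, or the answer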
        """
        # process input
        print('data', data)

        if "inputs" not in data:
            raise Exception(f"data is missing 'inputs' key which EndpointHandler expects. Received: {data}"
                            f" See: https://huggingface.co/docs/inference-endpoints/guides/custom_handler#2-create-endpointhandler-cp")
        video_url = data.pop("video_url", None)
        query = data.pop("query", None)
        long_form_answer = data.pop("long_form_answer", None)
        encoded_segments = {}
        if video_url:
            video_with_transcript = self.transcribe_video(video_url)
            video_with_transcript['transcript']['transcription_source'] = f"whisper_{self.WHISPER_MODEL_NAME}"
            encode_transcript = data.pop("encode_transcript", True)
            if encode_transcript:
                encoded_segments = self.combine_transcripts(video_with_transcript)
                encoded_segments = {
                    "encoded_segments": self.encode_sentences(encoded_segments)
                }
            return {
                **video_with_transcript,
                **encoded_segments
            }
        elif query:
            if long_form_answer:
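                context = data.pop("context", None)
                if not context:
                    # generate_answer joins the context passages, so it needs at least one
                    return {
                        "error": "'context' must be provided when 'long_form_answer' is set"
                    }
                answer = self.generate_answer(query, context)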
                response = {
                    "answer": answer
                }

                return response
            else:
                query = [{"text": query, "id": ""}] if isinstance(query, str) else query
                encoded_segments = self.encode_sentences(query)

                response = {
                    "encoded_segments": encoded_segments
                }

                return response

        else:
            return {
                "error": "'video_url' or 'query' must be provided"
            }

    def transcribe_video(self, video_url):
        decode_options = {
            # Set language to None to support multilingual,
            # but it will take longer to process while it detects the language.
            # Realized this by running in verbose mode and seeing how much time
            # was spent on the decoding language step
            "language": "en",
            "verbose": True
        }
        yt = pytube.YouTube(video_url)
        video_info = {
            'id': yt.video_id,
            'thumbnail': yt.thumbnail_url,
            'title': yt.title,
            'views': yt.views,
            'length': yt.length,
            # Although this might seem redundant since we already have the id,
            # it allows the link to the video to be accessed in 1-click in the API response
            'url': f"https://www.youtube.com/watch?v={yt.video_id}"
        }
        stream = yt.streams.filter(only_audio=True)[0]
        path_to_audio = f"{yt.video_id}.mp3"
        stream.download(filename=path_to_audio)
        t0 = time.time()
        transcript = self.whisper_model.transcribe(path_to_audio, **decode_options)
        t1 = time.time()
        for segment in transcript['segments']:
            # Remove the tokens array, it makes the response too verbose
            segment.pop('tokens', None)

        total = t1 - t0
        print(f'Finished transcription in {total} seconds')

        # postprocess the prediction
        return {"transcript": transcript, 'video': video_info}

    def encode_sentences(self, transcripts, batch_size=64):
        """
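        Encoding all of our segments at once or storing them locally would require too much compute or memory.
        So we do it in batches (64 segments per batch by default).
        :param transcripts: list of dicts, each with a 'text' key plus any metadata to keep
        :param batch_size: number of segments to encode per batch
        :return: list of the input dicts, each extended with a 'vectors' key
        """
        # loop through in batches of batch_size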
        all_batches = []
        for i in tqdm(range(0, len(transcripts), batch_size)):
            # find end position of batch (for when we hit end of data)
            i_end = min(len(transcripts), i + batch_size)
            # extract the metadata like text, start/end positions, etc
            batch_meta = [{
                **row
            } for row in transcripts[i:i_end]]
            # extract only text to be encoded by embedding model
            batch_text = [
                row['text'] for row in batch_meta
            ]
            # create the embedding vectors
            batch_vectors = self.sentence_transformer_model.encode(batch_text).tolist()

            batch_details = [
                {
                    **batch_meta[x],
                    'vectors': batch_vectors[x]
                } for x in range(0, len(batch_meta))
            ]
            all_batches.extend(batch_details)

        return all_batches

    def generate_answer(self, query, documents):

        # concatenate question and support documents into BART input
        conditioned_doc = "<P> " + " <P> ".join(documents)
        query_and_docs = "question: {} context: {}".format(query, conditioned_doc)

        # truncate so that long contexts cannot exceed the model's maximum input length
        model_input = self.question_answer_tokenizer(query_and_docs, truncation=True, padding=True, return_tensors="pt")

        generated_answers_encoded = self.question_answer_model.generate(input_ids=model_input["input_ids"].to(self.device),
                                                attention_mask=model_input["attention_mask"].to(self.device),
                                                min_length=64,
                                                max_length=256,
                                                do_sample=False, 
                                                early_stopping=True,
                                                num_beams=8,
                                                temperature=1.0,
                                                top_k=None,
                                                top_p=None,
                                                eos_token_id=self.question_answer_tokenizer.eos_token_id,
                                                no_repeat_ngram_size=3,
                                                num_return_sequences=1)
        answer = self.question_answer_tokenizer.batch_decode(generated_answers_encoded, skip_special_tokens=True,clean_up_tokenization_spaces=True)
        return answer

    @staticmethod
    def combine_transcripts(video, window=6, stride=3):
        """

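        Combine consecutive transcript segments into overlapping windows of text.

        :param video: dict with 'video' (metadata) and 'transcript' (whisper output) keys
        :param window: number of sentences to combine
        :param stride: number of sentences to 'stride' over, used to create overlap
        :return: list of combined segments, each carrying the video metadata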
        """
        new_transcript_segments = []

        video_info = video['video']
        transcript_segments = video['transcript']['segments']
        for i in tqdm(range(0, len(transcript_segments), stride)):
            i_end = min(len(transcript_segments), i + window)
            text = ' '.join(transcript['text']
                            for transcript in
                            transcript_segments[i:i_end])
            # TODO: Should int (float to seconds) conversion happen at the API level?
            start = int(transcript_segments[i]['start'])
            # the combined text runs to the last segment in the window, so use its end time
            end = int(transcript_segments[i_end - 1]['end'])
            new_transcript_segments.append({
                **video_info,
                **{
                    'start': start,
                    'end': end,
                    'title': video_info['title'],
                    'text': text,
                    'id': f"{video_info['id']}-t{start}",
                    'url': f"https://youtu.be/{video_info['id']}?t={start}",
                    'video_id': video_info['id'],
                }
            })
        return new_transcript_segments


In [50]:
my_handler = EndpointHandler(path="")

whisper and question_answer_model will use: cuda
Finished loading whisper_model in 0.6368415355682373 seconds
Finished loading sentence_transformer_model in 0.9580769538879395 seconds
Finished loading question_answer_model in 4.88532018661499 seconds
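
The `combine_transcripts` method merges short transcript segments into overlapping windows before they are encoded. The toy sketch below isolates that sliding-window idea with the same defaults (`window=6`, `stride=3`). `window_segments` is a hypothetical stand-alone helper, the segments are made up, and taking each window's end time from its last segment is an assumption about the intended behaviour.

```python
from typing import Dict, List


def window_segments(segments: List[Dict], window: int = 6, stride: int = 3) -> List[Dict]:
    """Merge consecutive segments into overlapping windows of text."""
    combined = []
    for i in range(0, len(segments), stride):
        chunk = segments[i:i + window]
        combined.append({
            "start": int(chunk[0]["start"]),  # start of the first segment in the window
            "end": int(chunk[-1]["end"]),     # end of the last segment in the window
            "text": " ".join(seg["text"] for seg in chunk),
        })
    return combined


# Eight toy segments, each two seconds long, labelled s0..s7.
toy_segments = [{"start": 2 * n, "end": 2 * n + 2, "text": f"s{n}"} for n in range(8)]
windows = window_segments(toy_segments)
for row in windows:
    print(row)
```

Because the stride is half the window, each window shares half of its segments with the next one, so a sentence that falls on a window boundary still appears with its surrounding context in at least one window.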


In [52]:
query = "what is egcg"
context_results = query_model(query)

answer_context = [sentence['metadata']['text'] for sentence in context_results['matches']]

payload = {"query": "what is egcg", "inputs": "",
           'long_form_answer': True, 'context': answer_context}
payload_pred=my_handler(payload)
payload_pred

data {'query': 'what is egcg', 'inputs': '', 'long_form_answer': True, 'context': [' grain. EGCG is a polyphenol found in green tea and a potent antioxidant that has shown effectiveness  against various conditions, including androgenic alopecia. Combating hair loss is not just about looks,  understanding the mechanisms of senescent alopecia and ways to reverse it can provide insights into  other aspects of aging. In this new study, the researchers used an emerging micro needle technology  to deliver drugs directly to the inner layers of the skin. Cone like micro needles were loaded  with nanoparticles containing rapamycin, EGCG, or a combination. The micro needles were applied to', ' using dissolvable micro needles loaded with brappa mice in and epi-galocatican galate or EGCG  and active ingredients in green tea. Studies have found that rapamycin, one of the most promising  general protective drugs, not only stimulates hair regrow, but can also partially reverse hair  grain. EGCG is a 

{'answer': ['Epi-Galocatican Galate or EGCG is a polyphenol found in green tea and a potent antioxidant that has shown effectiveness  against various conditions, including androgenic alopecia. In a study, the researchers used an emerging micro needle technology  to deliver drugs directly to the inner layers of the skin. The micro needles were applied to a dissolvable micro needles loaded with brappa mice in and epi-galocatican galate  and active ingredients in Green tea. The results were dose-dependent, with moderate doses of rapamycin being the  most effective. The researchers also confirmed that the treatment resulted in increased  autophagy in follicular regions, and promoting Autophagy is currently thought to be the central mechanism of action. This study reiterates the health potential of two  molecules popular in the longevity field. Additionally, this micro needle-based  drug delivery method could potentially be used to treat various other skin conditions.']}

In [None]:
payload = {"video_url": "https://www.youtube.com/watch?v=ciKdF97JWpU", 'inputs': ''} # Jimmy Butler Reveals What Made Him Leave the Philadelphia 76ers | The JJ Redick Podcast | The Ringer


# test the handler
payload_pred=my_handler(payload) # note: this line might raise "AttributeError: 'OutStream' object has no attribute 'buffer'"
payload_pred 

In [None]:
payload = {"query": "basketball", 'inputs': ''} # encode a search query instead of transcribing a video


# test the handler
payload_pred=my_handler(payload)
payload_pred