# GPT - Loading YouTube videos using LlamaIndex and LlamaHub connectors

In this example, rather than training a model from scratch, we take a pre-trained model and extend its knowledge with information extracted from a YouTube video, using LlamaHub connectors and LlamaIndex. This approach, known as in-context learning, supplies the extracted text to the model at query time instead of updating its weights, which reduces resource consumption while still giving answers tailored to geoscience.

Here we use an LLM from OpenAI, so an OpenAI API key is required.

In [1]:
import os
os.environ['OPENAI_API_KEY'] = "insert your openai key "
os.environ["CUDA_VISIBLE_DEVICES"] = "0" 

from llama_index import (
    GPTVectorStoreIndex,
    GPTEmptyIndex,
    GPTTreeIndex,
    GPTListIndex,
    SimpleDirectoryReader,  # simple reader for text files
    ServiceContext,
    StorageContext,
    download_loader,  # fetches loaders (connectors) from LlamaHub
)
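
Hard-coding a key in a notebook makes it easy to leak by accident. One possible alternative, sketched below, is to export the key in the shell beforehand and only check for it in the notebook. The helper name `require_api_key` is hypothetical and not part of any library.

```python
import os


def require_api_key(name="OPENAI_API_KEY"):
    """Return the key stored in the environment, or fail with a clear message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before running the notebook."
        )
    return key
```

Calling `require_api_key()` at the top of the notebook stops execution early if the key is missing, before any request is sent.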


## From Context


Load the video transcript with the YouTube reader from the LlamaHub connectors.

In [2]:
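# Download the YouTube transcript loader (installs its dependencies on first use)
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")

# Fetch the transcript of the video as a list of documents
loader = YoutubeTranscriptReader()
documents = loader.load_data(ytlinks=['https://www.youtube.com/watch?v=-irZQY9H6mg'])

# Build a list index over the transcript; a vector index is a possible alternative
# new_index = GPTVectorStoreIndex.from_documents(documents)
new_index = GPTListIndex.from_documents(documents)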

Collecting youtube_transcript_api~=0.5.0 (from -r /home/alfarhmy/miniconda3/envs/geodude/lib/python3.9/site-packages/llama_index/readers/llamahub_modules/youtube_transcript/requirements.txt (line 1))
  Downloading youtube_transcript_api-0.5.0-py3-none-any.whl (23 kB)
Installing collected packages: youtube_transcript_api
Successfully installed youtube_transcript_api-0.5.0
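
Before passing links to `ytlinks`, it can help to check that each one actually contains a video ID. The sketch below does this with the standard library only. `extract_video_id` is a hypothetical helper, and how the loader itself parses links is not shown in this notebook.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=-irZQY9H6mg"))  # -irZQY9H6mg
```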


Creating the query and the response

In [4]:
# Query the index; tree_summarize builds the answer by summarizing the chunks hierarchically
query_engine = new_index.as_query_engine(
    response_mode='tree_summarize',
    verbose=False,
)
response = query_engine.query("What deep learning methods are used to interpolate seismic data")
print(response)


Auto encoders and generative adversarial neural networks are two deep learning methods used to interpolate seismic data. Additionally, a multi-dimensional adversarial gun (MDA gun) method has been introduced for 3D reconstruction of missing traces in seismic data.
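
The `tree_summarize` response mode used above merges text chunks level by level until a single answer remains. The sketch below shows that general shape only. It is not LlamaIndex's implementation: `tree_summarize`, `fan_in`, and `combine` are names chosen for this example, and `combine` stands in for what would be an LLM call.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Merge groups of `fan_in` texts repeatedly until one summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = list(chunks)
    # Each pass shrinks the list; a single chunk is returned unchanged.
    while len(level) > 1:
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def combine(texts):
    # Stand-in for an LLM call: show which texts were merged together.
    return "(" + " + ".join(texts) + ")"


print(tree_summarize(["a", "b", "c", "d"], combine))  # ((a + b) + (c + d))
```

With four chunks and `fan_in=2`, the stand-in is called three times: twice on the leaves and once at the root. A long transcript therefore costs several LLM calls per query.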


In [5]:
# Same response mode, this time asking for an overall summary of the video
query_engine = new_index.as_query_engine(response_mode='tree_summarize')
response_2 = query_engine.query("What is the video about")
print(response_2)


The video is about how cutting-edge machine learning techniques are revolutionizing the way we handle missing tracers in seismic data, leading to a more accurate interpretation. It also discusses the two main categories of machine learning for seismic data interpolation (Auto encoders and generative adversarial neural networks) and introduces a new method called Multi-Dimensional Adversarial GAN (MDA GAN) for reconstruction of missing traces in seismic data.
