<a href="https://colab.research.google.com/github/claudio1975/Medium-blog/blob/master/Langchain_%26_OpenAI_in_Action/Video_Data_Analysis_with_LangChain_%26_OpenAI_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Import Libraries and Prepare Workspace

Data source: https://www.youtube.com/watch?v=mnoCy0j7DNs

"Future Food | The Menu of 2030" is a video retrieved from YouTube.

The video introduces the topic of future food and the challenges of feeding a growing population by 2050.

It explores possible food sources that could be on our menu in 2030: insects, lab meat, algae, farmed fish, and GMOs.




In [None]:
# Install the langchain package
!pip install langchain &> /dev/null

In [None]:
# Install the langchain-openai integration package
!pip install -U langchain-openai &> /dev/null

In [None]:
# Install chromadb
!pip install chromadb &> /dev/null

In [None]:
# Install tiktoken
!pip install tiktoken &> /dev/null

In [None]:
# Install youtube-transcript-api
!pip install youtube-transcript-api &> /dev/null


In [None]:
# Install pytube
!pip install pytube &> /dev/null

In [None]:
# Import the os package
import os
# Import the openai package
import openai
# Set the OPENAI_API_KEY environment variable (paste your key between the quotes)
os.environ["OPENAI_API_KEY"] = ""
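
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched below, prompts for the key only when the environment variable is not already set. `ensure_api_key` and its `prompt` and `env` parameters are hypothetical names introduced for illustration.

```python
import os
from getpass import getpass


def ensure_api_key(prompt=getpass, env=os.environ) -> None:
    """Ask for the key only if OPENAI_API_KEY is not already set.

    `prompt` and `env` are injectable so the helper can be tried out
    without touching the real environment (an assumption of this sketch).
    """
    if not env.get("OPENAI_API_KEY"):
        env["OPENAI_API_KEY"] = prompt("OpenAI API key: ")


# In a notebook, call ensure_api_key() once before creating the chat model.
```

With this approach the key is typed at run time and never stored in the notebook file.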

In [None]:
# Import the langchain package and modules
import langchain as lc
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import YoutubeLoader
from langchain_community.document_loaders import GoogleApiYoutubeLoader
from langchain.chains import RetrievalQA
from langchain.chains import create_tagging_chain
from langchain.chains import create_extraction_chain
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

# From the langchain.schema module, import AIMessage, HumanMessage, SystemMessage
from langchain.schema import AIMessage, HumanMessage, SystemMessage

In [None]:
# From the IPython.display package, import display and Markdown
from IPython.display import display, Markdown

In [None]:
# Create a ChatOpenAI object. Assign to chat.
chat = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [None]:
lc.__version__

'0.1.6'

In [None]:
openai.__version__

'1.12.0'

### Have a Look at the Video Metadata

In [None]:
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=mnoCy0j7DNs",
    add_video_info=True,
    language=["en", "id"],
    translation="en",
)
# Load the transcript (and video info) as a list of Document objects
video = loader.load()

In [None]:
video

[Document(page_content="future food the menu of 2030 the world's  population has been increasing faster  than food production even with modern  agricultural technology there will be  nine billion people to feed by 2050  researchers have been looking at new  food sources tweaking existing ones and  even creating entirely new foods we  examine what could be on our dinner  table 20 to 30 years from now  critters a 2013 UN Food and Agricultural  Organization report reminds us that  there are 1,900 arable insect species  out there that some 2 billion Earthlings  already regularly consume beetles  butterflies moths bees and locusts  insects are abundantly available and  rich in low-fat protein fiber and  minerals lab meat scientists came up  with synthetic meat grown in the lab as  early as 2013 scientists have already  cultured ground beef from cows stem  cells although that lab patty cost three  hundred and thirty thousand dollars to  make and tasted quite bland  experts predict it will on

### Chunking Text

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=20, add_start_index=True
)
text = text_splitter.split_documents(video)
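
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch of overlapping character windows. It is a simplified stand-in, not LangChain's algorithm: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and spaces, so its chunk boundaries will differ. The helper name `sliding_chunks` and the filler text are assumptions made for illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 20):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: no separator-aware splitting is done here.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "future food the menu of 2030 " * 50  # filler text, 1,450 characters
pieces = sliding_chunks(sample, chunk_size=500, chunk_overlap=20)
for piece in pieces:
    print(piece["start_index"], len(piece["text"]))
```

Each window starts 480 characters after the previous one, so the last 20 characters of one chunk reappear at the start of the next. The recorded `start_index` plays the same role as the offset requested with `add_start_index=True`.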

### Embed and Upload the Data into a VectorStore

In [None]:
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(text, embeddings)
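
As a rough intuition for what the vector store does at query time, the sketch below ranks chunks by cosine similarity to a question. It is a toy under stated assumptions: `embed` is a bag-of-words counter rather than an OpenAI embedding, and `top_k` is a hypothetical helper, not part of Chroma's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (NOT a learned embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "insects are abundantly available and rich in low-fat protein fiber and minerals",
    "algae farming could become the world's biggest crop industry",
    "scientists have already cultured ground beef from cows stem cells",
]
print(top_k("which insects are rich in protein", chunks, k=1))
```

Learned embeddings capture meaning beyond exact word overlap, but the retrieval step follows the same pattern: embed the question, score every stored chunk, and return the closest ones.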

### Natural Language Retrieval

In [None]:
qa_chain = RetrievalQA.from_chain_type(
    chat,
    retriever=vectorstore.as_retriever(),
    chain_type='stuff'
)
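
The `'stuff'` chain type places all retrieved chunks into one prompt. A minimal sketch of that idea follows; `build_stuff_prompt` and its template wording are hypothetical, and LangChain's actual default prompt text differs.

```python
def build_stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Concatenate ("stuff") every retrieved chunk into a single prompt."""
    context = "\n\n".join(retrieved_chunks)
    # Hypothetical template, written only to illustrate the idea.
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What could be on our menu in 2030?",
    [
        "insects are abundantly available",
        "algae farming could become the world's biggest crop industry",
    ],
)
print(prompt)
```

Because everything goes into one request, this approach is simple but only works while the combined chunks fit in the model's context window.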

#### Q&A Analysis

In [None]:
# Possible Food sources
question = "Could you tell me the possible food sources on our menu in 2030 ?"
result = qa_chain.invoke({"query": question})
display(result['result'].split('\n'))

['Possible food sources on our menu in 2030 could include:',
 '',
 '1. Insects: With over 1,900 arable insect species, insects are seen as a potential solution for food shortages. Beetles, butterflies, moths, bees, and locusts are already consumed by 2 billion people and are rich in low-fat protein, fiber, and minerals.',
 '',
 "2. Algae: Algae, which is already used as a biofuel, is being explored as a food source. It can be grown in both oceans and freshwater and is considered the fastest-growing plant on Earth. Algae farming has the potential to become the world's biggest crop industry.",
 '',
 '3. Lab-grown meat: Scientists have been working on growing synthetic meat in the lab. While it was initially expensive and lacked flavor, experts predict that it will become more affordable and tastier in the future.',
 '',
 '4. Customized food: Advancements in technology, such as 3D printing, may allow for the customization of food shapes, textures, tastes, and forms. This could enable indi

#### Length Analysis

In [None]:
# Length of text
question = "Could you tell me how many words are in the 'text'?"
result = qa_chain.invoke({"query": question})
display(result['result'].split('\n'))

['The number of words in the given text is 165.']
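
The figure above should be read with caution: the chain only passes the retrieved chunks to the model, not the whole transcript, and language models are not reliable at counting. A direct count is a simple alternative. The sketch assumes a hypothetical helper `count_words`; in this notebook it could be applied to `[doc.page_content for doc in video]`.

```python
def count_words(pages: list[str]) -> int:
    """Count whitespace-separated words across a list of text pages."""
    return sum(len(page.split()) for page in pages)


# Stand-in data; in the notebook use: [doc.page_content for doc in video]
sample_pages = [
    "future food the menu of 2030",
    "the world's population has been increasing",
]
print(count_words(sample_pages))
```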

#### Keyword Analysis

In [None]:
# Keyword details
question = "Could you provide the key words in the 'Future Food | The Menu of 2030' video?"
result = qa_chain.invoke({"query": question})
display(result['result'].split('\n'))

["The key words in the 'Future Food | The Menu of 2030' video are:",
 '',
 '- future food',
 '- menu of 2030',
 "- world's population",
 '- increasing faster',
 '- food production',
 '- modern agricultural technology',
 '- nine billion people',
 '- feed by 2050',
 '- new food sources',
 '- tweaking existing ones',
 '- creating entirely new foods',
 '- dinner table',
 '- critters',
 '- UN Food and Agricultural Organization',
 '- arable insect species',
 '- customize food',
 '- shapes',
 '- textures',
 '- tastes',
 '- forms',
 '- order online',
 '- chocolate bar',
 '- snack',
 '- 3D printing',
 '- affordable product',
 '- ground beef algae',
 '- biofuel',
 '- solution for food shortages',
 '- feed humans and animals',
 '- algae farming',
 '- biggest crop industry',
 '- oceans',
 '- freshwater',
 '- insect species',
 '- Earthlings',
 '- consume',
 '- beetles',
 '- butterflies',
 '- moths',
 '- bees',
 '- locusts',
 '- low-fat protein',
 '- fiber',
 '- minerals',
 '- lab meat',
 '- synthet

#### Key Sentence Analysis

In [None]:
# Key sentences details
question = "Could you provide the key sentences in the 'Future Food | The Menu of 2030' video?"
result = qa_chain.invoke({"query": question})
display(result['result'].split('\n'))

['1. "The world\'s population has been increasing faster than food production."',
 '2. "Even with modern agricultural technology, there will be nine billion people to feed by 2050."',
 '3. "Researchers have been looking at new food sources, tweaking existing ones, and even creating entirely new foods."',
 '4. "We examine what could be on our dinner table 20 to 30 years from now."',
 '5. "There are 1,900 arable insect species."',
 '6. "You will be able to fully customize food shapes, textures, tastes, and forms."',
 '7. "Algae is seen as a solution for the problem of food shortages."',
 '8. "Algae farming could become the world\'s biggest crop industry."',
 '9. "Insects are abundantly available and rich in low-fat protein, fiber, and minerals."',
 '10. "Scientists have already cultured ground beef from cow stem cells."']

#### Tagging Data Analysis

In [None]:
# Schema
schema = {
    "properties": {
        "sentiment": {"type": "string"},
        "aggressiveness": {"type": "integer"},
        "language": {"type": "string"},
    }
}

# LLM
tagging = create_tagging_chain(schema, chat)

In [None]:
# Tagging of a sentence
video_sentence = "fortunately humans are aware of this and have implemented sustainable commercial fishing practices and turned to cultivating fish aquaculture is going big with 35 countries producing more farmed fish than fish caught in the wild"
display(tagging.invoke(video_sentence))

{'input': 'fortunately humans are aware of this and have implemented sustainable commercial fishing practices and turned to cultivating fish aquaculture is going big with 35 countries producing more farmed fish than fish caught in the wild',
 'text': {'sentiment': 'positive', 'language': 'English'}}

In [None]:
# Tagging of a sentence
video_sentence="biofuel algae is seen as a solution for the problem of food shortages as it can feed humans and animals alike algae is the fastest growing plant on earth and has long been cultivated in Asia food experts predicts algae farming could become the world's biggest crop industry"
display(tagging.invoke(video_sentence))

{'input': "biofuel algae is seen as a solution for the problem of food shortages as it can feed humans and animals alike algae is the fastest growing plant on earth and has long been cultivated in Asia food experts predicts algae farming could become the world's biggest crop industry",
 'text': {'sentiment': 'positive', 'language': 'English'}}

In [None]:
# Tagging of a sentence
video_sentence="UN Food and Agricultural Organization report reminds us that there are 1,900 arable insect species out there that some 2 billion Earthlings already regularly consume beetles butterflies moths bees and locusts insects are abundantly available and rich in low-fat protein fiber and minerals"
display(tagging.invoke(video_sentence))

{'input': 'UN Food and Agricultural Organization report reminds us that there are 1,900 arable insect species out there that some 2 billion Earthlings already regularly consume beetles butterflies moths bees and locusts insects are abundantly available and rich in low-fat protein fiber and minerals',
 'text': {'sentiment': 'positive', 'language': 'English'}}
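
None of the three results above includes `aggressiveness`, even though the schema declares it; the tagging schema does not list any property as required. A small check like the following can flag such gaps. `missing_properties` is a hypothetical helper, and `tagging_schema` and `tagged` simply copy the schema and the `'text'` value shown above.

```python
def missing_properties(schema: dict, tagged: dict) -> list[str]:
    """Return schema properties that are absent from a tagging result."""
    return [name for name in schema["properties"] if name not in tagged]


tagging_schema = {
    "properties": {
        "sentiment": {"type": "string"},
        "aggressiveness": {"type": "integer"},
        "language": {"type": "string"},
    }
}
tagged = {"sentiment": "positive", "language": "English"}
print(missing_properties(tagging_schema, tagged))  # ['aggressiveness']
```

Adding a `"required"` list to the schema, as the extraction schema in the next section does, is one way to ask for every field, although whether the model always complies is not guaranteed.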

#### NER extraction

In [None]:
# Schema
schema = {
    "properties": {
        "name": {"type": "string"},
        "adjective": {"type": "string"},
        "adverb": {"type": "string"}
        },
    "required": ["name","adjective","adverb"],
}

extraction = create_extraction_chain(schema, chat)


In [None]:
# Named Entity Recognition on a sentence
video_sentence="UN Food and Agricultural Organization report reminds us that there are 1,900 arable insect species out there that some 2 billion Earthlings already regularly consume beetles butterflies moths bees and locusts insects are abundantly available and rich in low-fat protein fiber and minerals"
display(extraction.invoke(video_sentence))

{'input': 'UN Food and Agricultural Organization report reminds us that there are 1,900 arable insect species out there that some 2 billion Earthlings already regularly consume beetles butterflies moths bees and locusts insects are abundantly available and rich in low-fat protein fiber and minerals',
 'text': [{'name': 'insect',
   'adjective': 'abundantly available',
   'adverb': 'regularly'},
  {'name': 'beetle',
   'adjective': 'abundantly available',
   'adverb': 'regularly'},
  {'name': 'butterfly',
   'adjective': 'abundantly available',
   'adverb': 'regularly'},
  {'name': 'moth', 'adjective': 'abundantly available', 'adverb': 'regularly'},
  {'name': 'bee', 'adjective': 'abundantly available', 'adverb': 'regularly'},
  {'name': 'locust',
   'adjective': 'abundantly available',
   'adverb': 'regularly'}]}

In [None]:
# Named Entity Recognition on a sentence
video_sentence="biofuel algae is seen as a solution for the problem of food shortages as it can feed humans and animals alike algae is the fastest growing plant on earth and has long been cultivated in Asia food experts predicts algae farming could become the world's biggest crop industry"
display(extraction.invoke(video_sentence))

{'input': "biofuel algae is seen as a solution for the problem of food shortages as it can feed humans and animals alike algae is the fastest growing plant on earth and has long been cultivated in Asia food experts predicts algae farming could become the world's biggest crop industry",
 'text': [{'name': 'biofuel algae', 'adjective': 'solution', 'adverb': ''},
  {'name': 'food shortages', 'adjective': 'problem', 'adverb': ''},
  {'name': 'humans', 'adjective': '', 'adverb': 'feed'},
  {'name': 'animals', 'adjective': '', 'adverb': 'feed'},
  {'name': 'algae', 'adjective': 'fastest growing', 'adverb': ''},
  {'name': 'Asia', 'adjective': 'cultivated', 'adverb': ''},
  {'name': 'algae farming', 'adjective': '', 'adverb': 'predicts'},
  {'name': "world's biggest crop industry",
   'adjective': '',
   'adverb': 'become'}]}

In [None]:
# Named Entity Recognition on a sentence
video_sentence = "fortunately humans are aware of this and have implemented sustainable commercial fishing practices and turned to cultivating fish aquaculture is going big with 35 countries producing more farmed fish than fish caught in the wild"
display(extraction.invoke(video_sentence))

{'input': 'fortunately humans are aware of this and have implemented sustainable commercial fishing practices and turned to cultivating fish aquaculture is going big with 35 countries producing more farmed fish than fish caught in the wild',
 'text': [{'name': 'humans',
   'adjective': 'sustainable',
   'adverb': 'fortunately'},
  {'name': 'aquaculture', 'adjective': 'cultivating', 'adverb': ''}]}
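
Several extraction results above contain empty strings, and some values are not really adjectives or adverbs (for example `'adverb': 'feed'`), so the output is worth post-processing before use. As one possible cleanup, the sketch below drops empty fields; `drop_empty_fields` is a hypothetical helper and the sample data copies the last result.

```python
def drop_empty_fields(entities: list[dict]) -> list[dict]:
    """Remove keys whose value is an empty string from each extracted entity."""
    return [{key: value for key, value in entity.items() if value} for entity in entities]


entities = [
    {"name": "humans", "adjective": "sustainable", "adverb": "fortunately"},
    {"name": "aquaculture", "adjective": "cultivating", "adverb": ""},
]
for entity in drop_empty_fields(entities):
    print(entity)
```

This only removes blanks; checking that a value really is an adjective or adverb would need a separate step, such as a part-of-speech tagger.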