#### 1. Can algorithms like SMOTE/ROSE which synthesize new examples from the training data be called Generative AI?

SMOTE and ROSE are oversampling techniques used to address class imbalance. SMOTE generates synthetic examples for the minority class by interpolating between an existing instance and one of its nearest minority-class neighbours, while ROSE randomly selects a minority instance and generates a synthetic example by adding noise to it. Although both synthesize new data points, they are not considered true generative models: each applies a fixed local rule to existing training points rather than learning a model of the data. Generative AI models, on the other hand, learn patterns from the training data and create new samples from what they have learned.
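To make the contrast concrete, here is a minimal sketch of the two ideas using only the standard library. The function names `smote_sample` and `rose_like_sample`, the toy points, and the parameters `k` and `scale` are assumptions made for illustration; the ROSE variant in particular is a simplification that adds Gaussian noise to a chosen point. Neither function learns anything from the data, which is the point being made above.

```python
import random


def smote_sample(minority, k=2, rng=random):
    """Interpolate between a random minority point and one of its k nearest minority neighbours."""
    base = rng.choice(minority)
    # Rank the remaining minority points by squared Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))


def rose_like_sample(minority, scale=0.1, rng=random):
    """Pick a minority point and jitter it with Gaussian noise (simplified ROSE-style sampling)."""
    base = rng.choice(minority)
    return tuple(a + rng.gauss(0.0, scale) for a in base)


# Hypothetical 2-D minority-class points, for illustration only.
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.1)]
rng = random.Random(0)
print(smote_sample(minority_points, rng=rng))
print(rose_like_sample(minority_points, rng=rng))
```

A SMOTE sample always lies on the line segment between two existing points, so it can never leave the region already covered by the minority class.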

#### 2. Implement WikipediaQueryRun tool from LangChain and search for Generative artificial intelligence and obtain the content of the page

In [1]:
from langchain_community.document_loaders import WikipediaLoader

In [2]:
search_term = "Constantinople"
pages = WikipediaLoader(query=search_term, load_max_docs=1).load()

In [3]:
print(pages[0].page_content)

Constantinople (see other names) became the capital of the Roman Empire during the reign of Constantine the Great in 330. Following the collapse of the Western Roman Empire in the late 5th century, Constantinople remained the capital of the Eastern Roman Empire (also known as the Byzantine Empire; 330–1204 and 1261–1453), the Latin Empire (1204–1261), and the Ottoman Empire (1453–1922). Following the Turkish War of Independence, the Turkish capital then moved to Ankara. Officially renamed Istanbul in 1930, the city is today the largest city in Europe, straddling the Bosporus strait and lying in both Europe and Asia, and the financial centre of Turkey.
In 324, after the Western and Eastern Roman Empires were reunited, the ancient city of Byzantium was selected to serve as the new capital of the Roman Empire, and the city was renamed Nova Roma, or 'New Rome', by Emperor Constantine the Great. On 11 May 330, it was renamed Constantinople and dedicated to Constantine. Constantinople is gen

#### 3. Implement a tool using YouTubeSearchTool and YoutubeLoader to search and obtain transcript of a video obtained on searching 'LangChain' in YouTube 

In [6]:
from langchain_community.document_loaders import YoutubeLoader
from langchain_community.tools import YouTubeSearchTool

In [5]:
! pip install --upgrade --quiet  youtube-transcript-api

In [9]:
! pip install --upgrade --quiet  youtube_search

In [10]:
tool = YouTubeSearchTool()

In [11]:
links=tool.run('Sidemen')

In [12]:
links

"['https://www.youtube.com/watch?v=T-Ky46HVkd0&pp=ygUHU2lkZW1lbg%3D%3D', 'https://www.youtube.com/watch?v=w_SWVBwcpdE&pp=ygUHU2lkZW1lbg%3D%3D']"

In [13]:
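import re

# tool.run() returns the links as one string, so pull the URLs out with a regex.
# "%" is allowed in the last character class so the URL-encoded "%3D%3D" suffix is not cut off.
url_pattern = r"https://www\.youtube\.com/watch\?v=[\w-]+&pp=[\w%-]+"
urls = re.findall(url_pattern, links)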
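Because the search result is the string form of a Python list, an alternative to the regex is to parse it directly. The sketch below uses only the standard library; the variable names `raw_links`, `url_list`, and `video_ids` are illustrative, and the string is copied from the output shown above.

```python
import ast
from urllib.parse import parse_qs, urlparse

# The string returned by the search tool, copied from the cell output above.
raw_links = (
    "['https://www.youtube.com/watch?v=T-Ky46HVkd0&pp=ygUHU2lkZW1lbg%3D%3D', "
    "'https://www.youtube.com/watch?v=w_SWVBwcpdE&pp=ygUHU2lkZW1lbg%3D%3D']"
)

# literal_eval safely turns the list literal into a real list of strings.
url_list = ast.literal_eval(raw_links)

# The video ID is the "v" query parameter of each URL.
video_ids = [parse_qs(urlparse(url).query)["v"][0] for url in url_list]
print(video_ids)
```

This avoids having to keep a regex in sync with whatever extra query parameters YouTube appends to the links.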

In [16]:
# Load the transcript of the first search result
loader = YoutubeLoader.from_youtube_url(
    urls[0],
    add_video_info=False,  # do not fetch extra video metadata
)
print("\n\n".join(map(repr, loader.load())))



#### 4. XML Parser

In [21]:
from typing import List

from langchain.output_parsers import XMLOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import AzureChatOpenAI

In [22]:
!pip install defusedxml



In [23]:
import os

from langchain_openai import AzureChatOpenAI

# Read the API key from the environment instead of hard-coding it in the notebook.
model = AzureChatOpenAI(
    temperature=0,
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint="https://dono-rag-demo-resource-instance.openai.azure.com",
    model="GPT_35_TURBO_DEMO_RAG_DEPLOYMENT_DONO"
)

In [24]:
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


In [30]:
actor_query = "Generate the filmography for RajKapoor."
parser = XMLOutputParser(pydantic_object=Actor, tags=["movies", "actor", "film", "name", "genre"])

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

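# invoke() returns the fully parsed result in a single call
chain.invoke({"query": actor_query})

# stream() yields partial results as the XML is parsed
for s in chain.stream({"query": actor_query}):
    print(s)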

{'movies': [{'actor': [{'name': 'RajKapoor'}]}]}
{'movies': [{'actor': [{'film': [{'name': 'Awaara'}]}]}]}
{'movies': [{'actor': [{'film': [{'genre': 'Drama'}]}]}]}
{'movies': [{'actor': [{'film': [{'name': 'Shree 420'}]}]}]}
{'movies': [{'actor': [{'film': [{'genre': 'Drama'}]}]}]}
{'movies': [{'actor': [{'film': [{'name': 'Mera Naam Joker'}]}]}]}
{'movies': [{'actor': [{'film': [{'genre': 'Drama'}]}]}]}
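Each streamed chunk carries only one leaf of the XML tree, so the pieces have to be stitched together if a single record is wanted. The sketch below does this for the chunks printed above, using plain Python. It assumes, as in this output, that every `genre` chunk arrives right after the `name` chunk of the same film; the names `chunks`, `actor_name`, and `films` are illustrative.

```python
# The streamed chunks, copied from the output above.
chunks = [
    {'movies': [{'actor': [{'name': 'RajKapoor'}]}]},
    {'movies': [{'actor': [{'film': [{'name': 'Awaara'}]}]}]},
    {'movies': [{'actor': [{'film': [{'genre': 'Drama'}]}]}]},
    {'movies': [{'actor': [{'film': [{'name': 'Shree 420'}]}]}]},
    {'movies': [{'actor': [{'film': [{'genre': 'Drama'}]}]}]},
    {'movies': [{'actor': [{'film': [{'name': 'Mera Naam Joker'}]}]}]},
    {'movies': [{'actor': [{'film': [{'genre': 'Drama'}]}]}]},
]

actor_name = None
films = []

for chunk in chunks:
    for actor_part in chunk["movies"][0]["actor"]:
        if "name" in actor_part:
            # Actor-level <name> tag
            actor_name = actor_part["name"]
        for film_part in actor_part.get("film", []):
            if "name" in film_part:
                # A new film starts with its <name> tag
                films.append({"name": film_part["name"]})
            elif "genre" in film_part and films:
                # Assumption: the genre belongs to the most recently seen film
                films[-1]["genre"] = film_part["genre"]

print(actor_name)
print(films)
```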


#### 4. YAML Parser

In [31]:
from typing import List

from langchain.output_parsers import YamlOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import AzureChatOpenAI

In [33]:
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")

In [36]:
actor_query = "Generate the filmography for RajKapoor."
parser = YamlOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": actor_query})
for s in chain.stream({"query": actor_query}):
    print(s)

name='RajKapoor' film_names=['Awaara', 'Shree 420', 'Mera Naam Joker']



#### 5. N-Gram Overlap Selector

In [40]:
!pip install langchain-chroma
!pip install langchain-openai
!pip install --upgrade langchain-community

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-openai 0.1.3 requires langchain-core<0.2.0,>=0.1.42, but you have langchain-core 0.2.5 which is incompatible.


In [41]:
from langchain_community.example_selectors import NGramOverlapExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

In [63]:
example_prompt = PromptTemplate(
    input_variables=["input", "verb"],
    template="Input: {input}\nOutput: {verb}",
)

# Examples of verbs
examples = [
    {"input": "Ronaldo scores a magnificent goal", "verb": "scores"},
    {"input": "The striker scores a hat-trick in the championship game.", "verb": "scores"},
    {"input": "The team captain scores a crucial penalty.", "verb": "scores"},
    {"input": "Messi scores a stunning free kick", "verb": "scores"},
    {"input": "Dog barks at the strangers", "verb": "barks"},
    {"input": "The lion roars at the approaching herd.", "verb": "roars"},
    {"input": "The puppy yelps at the unknown noises", "verb": "yelps"},
    {"input": "Usain runs the fastest", "verb": "runs"},
    {"input": "Michael jumps the highest in the athletics meet", "verb": "jumps"},
    {"input": "Sarah cycles the longest distance in the race", "verb": "cycles"},
]


In [64]:
# threshold=-1.0 excludes no examples; they are only ordered by n-gram overlap with the input
example_selector = NGramOverlapExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    threshold=-1.0,
)
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the verb of every input sentence",
    suffix="Input: {sentence}\nOutput:",
    input_variables=["sentence"],
)

In [65]:
print(dynamic_prompt.format(sentence="Spot can run fast."))

Give the verb of every input sentence

Input: Ronaldo scores a magnificent goal
Output: scores

Input: The striker scores a hat-trick in the championship game.
Output: scores

Input: The team captain scores a crucial penalty.
Output: scores

Input: Messi scores a stunning free kick
Output: scores

Input: Dog barks at the strangers
Output: barks

Input: The lion roars at the approaching herd.
Output: roars

Input: The puppy yelps at the unknown noises
Output: yelps

Input: Usain runs the fastest
Output: runs

Input: Michael jumps the highest in the athletics meet
Output: jumps

Input: Sarah cycles the longest distance in the race
Output: cycles

Input: Spot can run fast.
Output:
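All ten examples come back in their original order because "Spot can run fast." shares no word with any of them ("run" does not match "runs"), so every overlap score is the same. The sketch below illustrates the idea with a simplified unigram overlap in plain Python. It is not the scoring LangChain uses internally; the functions `tokenize`, `unigram_overlap`, and `select_by_overlap` and the shortened example list are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[\w-]+", text.lower())


def unigram_overlap(query, example):
    """Fraction of the example's tokens that also occur in the query."""
    query_tokens = set(tokenize(query))
    example_tokens = tokenize(example)
    if not example_tokens:
        return 0.0
    shared = sum(1 for tok in example_tokens if tok in query_tokens)
    return shared / len(example_tokens)


def select_by_overlap(examples, query, threshold=-1.0):
    """Keep examples scoring above the threshold, highest overlap first."""
    scored = [(unigram_overlap(query, ex), ex) for ex in examples]
    kept = [(score, ex) for score, ex in scored if score > threshold]
    # The sort is stable, so examples with equal scores keep their original order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in kept]


sentences = [
    "Ronaldo scores a magnificent goal",
    "Messi scores a stunning free kick",
    "Dog barks at the strangers",
    "Usain runs the fastest",
]

# No shared words: nothing is excluded and the order is unchanged.
print(select_by_overlap(sentences, "Spot can run fast."))

# With threshold 0.0, examples that share no word with the query are dropped.
print(select_by_overlap(sentences, "Messi scores a goal", threshold=0.0))
```

Raising the threshold is what turns the selector from a pure re-ranker into a filter.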


#### 6. Semantic Similarity Selector

In [66]:
from langchain_chroma import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_openai import AzureOpenAIEmbeddings

In [67]:
import os

# Initialize the embedding model; the API key is read from the environment
# instead of being hard-coded in the notebook.
embeddings = AzureOpenAIEmbeddings(
    model="ADA_RAG_DONO_DEMO",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint="https://dono-rag-demo-resource-instance.openai.azure.com"
)

In [68]:
example_prompt = PromptTemplate(
    input_variables=["input", "verb"],
    template="Input: {input}\nOutput: {verb}",
)

# Examples of verbs
examples = [
    {"input": "Ronaldo scores a magnificent goal", "verb": "scores"},
    {"input": "The striker scores a hat-trick in the championship game.", "verb": "scores"},
    {"input": "The team captain scores a crucial penalty.", "verb": "scores"},
    {"input": "Messi scores a stunning free kick", "verb": "scores"},
    {"input": "Dog barks at the strangers", "verb": "barks"},
    {"input": "The lion roars at the approaching herd.", "verb": "roars"},
    {"input": "The puppy yelps at the unknown noises", "verb": "yelps"},
    {"input": "Usain runs the fastest", "verb": "runs"},
    {"input": "Michael jumps the highest in the athletics meet", "verb": "jumps"},
    {"input": "Sarah cycles the longest distance in the race", "verb": "cycles"},

]


In [69]:
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    Chroma,
    k=2,
)

similar_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the verb of every input sentence",
    suffix="Input: {sentence}\nOutput:",
    input_variables=["sentence"],
)

In [70]:
# Result
print(similar_prompt.format(sentence="The match winning goal was scored by Leo Messi"))


Give the verb of every input sentence

Input: Messi scores a stunning free kick
Output: scores

Input: Ronaldo scores a magnificent goal
Output: scores

Input: The match winning goal was scored by Leo Messi
Output:
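The selector returns the two football sentences because their embeddings lie closest to the embedding of the input. The sketch below shows the ranking step in isolation with plain Python. The 3-dimensional vectors are hand-made stand-ins for real embeddings, cosine similarity is used only for illustration (the actual metric depends on the vector store), and the names `cosine`, `top_k`, and `toy_index` are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, index, k=2):
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical embeddings: the first axis loosely stands for "football",
# the second for "animals", the third for "athletics".
toy_index = [
    ("Messi scores a stunning free kick", (0.9, 0.1, 0.0)),
    ("Ronaldo scores a magnificent goal", (0.8, 0.2, 0.1)),
    ("Dog barks at the strangers", (0.0, 0.9, 0.2)),
    ("Usain runs the fastest", (0.1, 0.1, 0.9)),
]

# Hypothetical embedding of "The match winning goal was scored by Leo Messi".
query_vec = (1.0, 0.1, 0.0)

print(top_k(query_vec, toy_index, k=2))
```

Unlike the n-gram selector, this ranking does not need any shared words: "scored" and "scores" end up close together because the embedding model places them near each other.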
