# Day 4: Hugging Face API & LangChain

## 1. LangChain

In [1]:
!pip install langchain
!pip install openai



In [2]:
import dotenv
import langchain
from langchain.llms import OpenAI
dotenv.load_dotenv()

import os

secret_key = os.getenv("OPENAI_API_KEY")
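
If `OPENAI_API_KEY` is not set, `os.getenv` quietly returns `None`, and the problem only surfaces later when the client is first called. A minimal sketch of failing early instead; `require_env` is a hypothetical helper written for this note, not part of `dotenv` or LangChain:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Hypothetical policy: treat a missing or empty value as a hard error.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

It could replace the lookup above as `secret_key = require_env("OPENAI_API_KEY")`, called after `dotenv.load_dotenv()`.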

In [3]:
client = OpenAI(openai_api_key=secret_key)



In [7]:
# Zero shot prompting

prompt = "List me 10 largest cities in the United States of America"

In [9]:
client.predict(prompt).strip().split("\n")

['1. New York City, New York',
 '2. Los Angeles, California',
 '3. Chicago, Illinois',
 '4. Houston, Texas',
 '5. Phoenix, Arizona',
 '6. Philadelphia, Pennsylvania',
 '7. San Antonio, Texas',
 '8. San Diego, California',
 '9. Dallas, Texas',
 '10. San Jose, California']
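
The items above still carry their `1.`, `2.` numbering. If only the names are needed, a small post-processing step can strip it. This is a sketch that assumes the model replies with one numbered item per line, which is not guaranteed; `parse_numbered_list` is a hypothetical helper:

```python
import re


def parse_numbered_list(text: str) -> list[str]:
    """Split an LLM reply into items, dropping leading '1.' style numbering."""
    items = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Remove a leading number followed by '.' or ')' and any whitespace.
        items.append(re.sub(r"^\d+[.)]\s*", "", line))
    return items


reply = "\n\n1. New York City, New York\n2. Los Angeles, California\n3. Chicago, Illinois"
print(parse_numbered_list(reply))
# ['New York City, New York', 'Los Angeles, California', 'Chicago, Illinois']
```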

## 2. Prompt Templates

In [4]:
from langchain.prompts import PromptTemplate

In [11]:
prompt_template_name = PromptTemplate(
    input_variables=["country"],
    template="Can you list me the 10 largest cities in {country}"
)

In [12]:
prompt_template_name.format(country="France")

'Can you list me the 10 largest cities in France'

In [13]:
prompt_template_name.format(country="China")

'Can you list me the 10 largest cities in China'

A natural next step is a Flask or Streamlit application that asks the user for a country name and retrieves information about that country. Instead of requiring users to type a complete prompt, the application uses a `PromptTemplate` to build the prompt from a single user-supplied variable. The OpenAI API itself does not provide prompt templating; the substitution could be done with plain string formatting, but LangChain's `PromptTemplate` makes it explicit and reusable in chains and agents.
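
Conceptually, a prompt template is a string with named placeholders that are filled in at call time. A minimal standard-library sketch of that idea follows; the `SimplePromptTemplate` class is hypothetical and only imitates the `format` call used above, it is not LangChain's implementation:

```python
class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template, built on str.format."""

    def __init__(self, template: str, input_variables: list[str]):
        self.template = template
        self.input_variables = input_variables

    def format(self, **kwargs: str) -> str:
        # Fail with a readable message if a declared variable was not supplied.
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"Missing prompt variables: {missing}")
        return self.template.format(**kwargs)


cities_prompt = SimplePromptTemplate(
    template="Can you list me the 10 largest cities in {country}",
    input_variables=["country"],
)
print(cities_prompt.format(country="France"))
# Can you list me the 10 largest cities in France
```

In a web form, the only value taken from the user would be `country`; the rest of the prompt stays under the application's control.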

In [14]:
client.predict(prompt_template_name.format(country="China")).strip().split("\n")

['1. Shanghai',
 '2. Beijing',
 '3. Guangzhou',
 '4. Shenzhen',
 '5. Chongqing',
 '6. Tianjin',
 '7. Chengdu',
 '8. Nanjing',
 '9. Wuhan',
 "10. Xi'an"]

## 3. Agents

In [18]:
prompt2 = "Whats the current GDP of the United States of America and tell me which year you are referring to?"

In [19]:
client.predict(prompt2).strip().split("\n")

['As of 2021, the current GDP of the United States of America is approximately $22.68 trillion. This figure is based on the latest data from the World Bank.']

The answer refers to 2021, so the underlying model (presumably a GPT-3.5-family model) appears to have a training cutoff around 2021.

**To get up-to-date information, I will use SerpAPI, which queries the Google search engine in real time.**

In [5]:
!pip install google-search-results



In [6]:
# https://serpapi.com/
# Free Plan Limited to 100 Searches
serpapi_key = os.environ.get("SERPAPI_API_KEY")

In [13]:
from langchain.agents import AgentType
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI

In [8]:
client = OpenAI(openai_api_key=secret_key)

In [9]:
tool = load_tools(["serpapi"], serpapi_key=serpapi_key, llm=client)

In [11]:
agent = initialize_agent(tool, client, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)



In [12]:
agent.run("Whats the current GDP of the United States of America and tell me which year you are referring to?")





> Entering new AgentExecutor chain...
I should always think about what to do
Action: Search
Action Input: "US GDP by year"
Observation: ["The U.S. is a country of 50 states covering a vast swath of North America, with Alaska in the northwest and Hawaii extending the nation’s presence into the Pacific Ocean. Major Atlantic Coast cities are New York, a global finance and culture center, and capital Washington, DC. Midwestern metropolis Chicago is known for influential architecture and on the west coast, Los Angeles' Hollywood is famed for filmmaking. ― Google", 'United States type: Country in North America.', 'United States entity_type: countries, locations, travel, competitions, military, travel.', 'United States kgmid: /m/09c7w0.', 'United States place_id: ChIJCzYy5IS16lQRQrfeQ5K5Oxw.', 'Gross domestic product (GDP), the featured measure of U.S. output, is the market value of the goods and services produced by labor and property located in the .

'The current GDP of the United States in 2023 is $27.36 trillion.'
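
With `verbose=True`, the trace shows the model writing an `Action:` line and an `Action Input:` line; the agent reads these to decide which tool to call and with what argument. The sketch below illustrates that parsing step on the text from the trace above. `parse_action` and its regular expression are hypothetical and much simpler than LangChain's own output parser:

```python
import re
from typing import Optional, Tuple

# Matches "Action: <tool>" followed on the next line by "Action Input: <input>".
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(?P<tool>.*?)\s*\n\s*Action Input\s*:\s*(?P<tool_input>.*)",
    re.DOTALL,
)


def parse_action(llm_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool name, tool input), or None if the text requests no tool."""
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        return None
    tool = match.group("tool").strip()
    tool_input = match.group("tool_input").strip().strip('"')
    return tool, tool_input


step = 'I should always think about what to do\nAction: Search\nAction Input: "US GDP by year"'
print(parse_action(step))
# ('Search', 'US GDP by year')
```

The agent then runs the named tool, appends the result as an `Observation:`, and asks the model again until it produces a `Final Answer:`.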

In [13]:
agent.run("Tell me the 5 top current affairs in the United States of America")



> Entering new AgentExecutor chain...
You should always think about what to do
Action: Search
Action Input: "Top current affairs United States of America"
Observation: ['Journalist Mina Al-Oraibi on how the Arab world perceives White House policy on Israel, Hamas, and the Gaza humanitarian crisis.', 'Get the latest news about politics, the economy, health care, the environment, education, Congress and state, local and federal government.', 'Reuters.com is your online source for the latest US news stories and current events, ensuring our readers up to date with any breaking news developments.', 'View the latest national news from The Associated Press. Stay tuned in to all of the current events within the U.S. borders and breaking U.S. news stories.', 'USA TODAY delivers current national and local news, sports, entertainment, finance, technology, and more through award-winning journalism, photos, ...', 'Get the latest news headlines and top storie

"The top 5 current affairs in the United States of America are the Arab world's perception of White House policy on Israel, Hamas, and the Gaza humanitarian crisis, updates on politics, the economy, healthcare, environment, and education, breaking news developments from Reuters and The Associated Press, national news from USA TODAY, and current news headlines from NBC News."

**So far we used the Google search engine; now let's try Wikipedia:**

In [12]:
!pip install wikipedia



In [15]:
tool = load_tools(["wikipedia"], llm=client)

In [18]:
agent = initialize_agent(tool, client, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

In [19]:
agent.run("Find me an overview of the YOLO You Only Look Once algorithm")



> Entering new AgentExecutor chain...
I should read about YOLO You Only Look Once on Wikipedia
Action: wikipedia
Action Input: YOLO You Only Look Once
Observation: Page: Timeline of algorithms
Summary: The following timeline of algorithms outlines the development of algorithms (mainly "mathematical recipes") since their inception.

Page: You Only Live Once (book)
Summary: You Only Live Once: The Roadmap to Financial Wellness and a Purposeful Life is a 2016 book written by Jason Vitug. The book sets to redefine the YOLO mantra that has been used to define Generation Y or millennials, to set a mindset shift to cultivate a healthy and wealthy lifestyle for a lifetime. The book focuses on a three step process called ACT, an acronym for awareness, creating a plan, and taking control steps, outlined by the author. The books aim is to make readers define the life they want to live before setting financial goals. You Only Live Once: The Roadmap to Financial Wellness and a Purposeful Life is written in a simple conversation tone based on Vitug's financial experiences and conversations with others
Thought: I shou

Observation: Page: Timeline of algorithms
Summary: The following timeline of algorithms outlines the development of algorithms (mainly "mathematical recipes") since their inception.

Page: Small object detection
Summary: Small object detection is a particular case of object detection where various techniques are employed to detect small objects in digital images and videos. "Small objects" are objects having a small pixel footprint in the input image. In areas such as aerial imagery, state-of-the-art object detection techniques under performed because of small objects.

Thought: I should read about the YOLO You Only Look Once algorithm on Wikipedia
Action: wikipedia
Action Input: YOLO You Only Look Once (algorithm)
Observation: Page: Timeline of algorithms
Summary: The following timeline of algorithms outlines the development of algorithms (mainly "mathematical recipes") since their inception.

Page: Small object detection
Summary: Small object detection is a particular case of object detection where various techniques are employed to detect small objects in digital images and videos. "Small objects" are objects having a small pixel footprint in the input image. In areas such as aerial imagery, state-of-the-art object detection techniques under performed because of small objects.

Thought: I now know the final answer
Final Answer: YOLO You Only Look Once (algorithm) is a real-time object detection algorithm that uses a single convolutional neural network to detect objects in images and videos. It has been widely used in the computer vision community and is known for its speed and accuracy.

> Finished chain.


'YOLO You Only Look Once (algorithm) is a real-time object detection algorithm that uses a single convolutional neural network to detect objects in images and videos. It has been widely used in the computer vision community and is known for its speed and accuracy.'

In [20]:
agent.run("Give me a education overview of the current president of the United States of America")



> Entering new AgentExecutor chain...
The best way to answer this question is to use Wikipedia to get accurate information about the current president's education.
Action: wikipedia
Action Input: Joe Biden
Observation: Page: Joe Biden
Summary: Joseph Robinette Biden Jr. ( , BY-dən; born November 20, 1942) is an American politician who is the 46th and current president of the United States. A member of the Democratic Party, he previously served as the 47th vice president from 2009 to 2017 under President Barack Obama and represented Delaware in the United States Senate from 1973 to 2009.
Born in Scranton, Pennsylvania, Biden moved with his family to Delaware in 1953. He graduated from the University of Delaware before earning his law degree from Syracuse University. He was elected to the New Castle County Council in 1970 and to the U.S. Senate in 1972. As a senator, Biden drafted and led the effort to pass the Violent Crime Control and Law Enforc

'Joe Biden attended the University of Delaware and Syracuse University for law school.'

## 4. Chains

**Definition:** <br>
Chains are a core LangChain component: they connect one or more large language model (LLM) calls, together with their prompts, into a single pipeline. In more sophisticated applications it becomes necessary to chain LLM calls together, either to improve the quality of the generated text or to produce text that a single LLM call could not.
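
At its simplest, a chain pairs a prompt template with a model call. The sketch below shows that pairing with a stub in place of a real LLM, so it runs without an API key; `fake_llm` and `make_chain` are hypothetical names used only for illustration:

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Stub model for illustration: echoes the prompt instead of calling an API."""
    return f"[model reply to: {prompt}]"


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[..., str]:
    """Return a function that formats the template and passes it to the model."""
    def run(**variables: str) -> str:
        return llm(template.format(**variables))
    return run


name_chain_sketch = make_chain("What is a good name for a company that makes {product}", fake_llm)
print(name_chain_sketch(product="AI SaaS for White Papers"))
# [model reply to: What is a good name for a company that makes AI SaaS for White Papers]
```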

In [21]:
client

OpenAI(client=<openai.resources.completions.Completions object at 0x00000297D2AF0D60>, async_client=<openai.resources.completions.AsyncCompletions object at 0x00000297D2B04FD0>, openai_api_key='<redacted>', openai_proxy='')

#### Example using Chains

In [10]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("What is a good name for a company that makes {product}")

In [11]:
from langchain.chains import LLMChain

In [28]:
chain = LLMChain(llm=client, prompt=prompt)

In [31]:
chain.run("AI SaaS for White Papers")

'\n\n"IntelliWrite Solutions" '

#### Example using Chains 2

In [34]:
prompt_template = PromptTemplate(
    input_variables=["service"],
    template="I want to open an AI as a Service company that offers this {service}. What should I name it?"
)

In [35]:
prompt_template

PromptTemplate(input_variables=['service'], template='I want to open an AI as a Service company that offers this {service}. What should I name it?')

In [36]:
chain2 = LLMChain(llm=client, prompt=prompt_template)

In [38]:
chain2.run("White Papers").strip().split("\n")

['1. "CortexAI Solutions"',
 '2. "MindHive AI Services"',
 '3. "NeuralNet Innovations"',
 '4. "IntelliServe AI"',
 '5. "Cognitive Cloud Co."',
 '6. "AI Nexus Services"',
 '7. "SmartBrain Solutions"',
 '8. "DeepMind Industries"',
 '9. "InnovAIte Services"',
 '10. "VirtualBrain Co."']

In [39]:
chain3 = LLMChain(llm=client, prompt=prompt_template, verbose=True)

In [42]:
chain3.run("Legal Documents")



> Entering new LLMChain chain...
Prompt after formatting:
I want to open an AI as a Service company that offers this Legal Documents. What should I name it?

> Finished chain.


'\n\n1. "AI LegalDocs"\n2. "DocuAI"\n3. "LegalLeverage AI"\n4. "SmartDocs Solutions"\n5. "AI Law Co."\n6. "LegalMind AI"\n7. "DocuTech AI"\n8. "AI LegalEase"\n9. "LegalGenius AI"\n10. "SmartLegal Docs"'

**To combine multiple chains and run them in a fixed order, we can use a `SimpleSequentialChain`:**

In [43]:
prompt_template_name2 = PromptTemplate(
    input_variables=["service"],
    template="I want to open an AI as a platform as a service company that offers this {service}. What should I name it?"
)

name_chain = LLMChain(llm=client, prompt=prompt_template_name2)

prompt_template_items= PromptTemplate(
    input_variables=["name"],
    template="Suggest some strategies for {name}"
)

strategies_chain = LLMChain(llm=client, prompt=prompt_template_items)

In [45]:
# Combining both chains
from langchain.chains import SimpleSequentialChain

In [47]:
chain=SimpleSequentialChain(chains = [name_chain, strategies_chain])

In [49]:
print(chain.run("Legal AI"))

 

As for strategies, here are a few suggestions:

1. Define your target audience: Identify the specific segment of the legal industry that you want to cater to, such as law firms, corporate legal departments, or individual lawyers. This will help you tailor your marketing and sales efforts accordingly.

2. Highlight the benefits: Showcase how your platform can make legal processes more efficient, accurate, and cost-effective for your potential clients. This could include features such as document review, contract analysis, or legal research.

3. Partner with established legal organizations: Collaborate with bar associations, legal technology companies, or law schools to gain credibility and reach a wider audience.

4. Offer free trials and demos: Allow potential clients to test out your platform for a certain period or offer live demonstrations to showcase its capabilities. This can help build trust and increase conversions.

5. Utilize social media and content marketing: Use social m

**Drawback:** <br>
`SimpleSequentialChain` returns only the output of the last chain (the strategies); the company name generated by the first chain is not included in the result.
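
One way to keep every intermediate result is to give each step a named output and collect all of them in a dictionary. LangChain's `SequentialChain` is built around named output keys in a similar spirit; the sketch below only imitates the idea with stub functions in place of LLM calls. `run_sequence`, `name_step`, and `strategy_step` are hypothetical:

```python
from typing import Callable, Dict, List, Tuple

# A step is an output key plus a function that reads the shared state.
Step = Tuple[str, Callable[[Dict[str, str]], str]]


def run_sequence(steps: List[Step], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order; each step reads the state and adds one named output."""
    state = dict(inputs)
    for output_key, step in steps:
        state[output_key] = step(state)
    return state


def name_step(state: Dict[str, str]) -> str:
    # Stub standing in for the name-generating LLM call.
    return f"{state['service']} Cloud"


def strategy_step(state: Dict[str, str]) -> str:
    # Stub standing in for the strategy-generating LLM call.
    return f"Offer free trials of {state['name']}"


result = run_sequence(
    [("name", name_step), ("strategies", strategy_step)],
    {"service": "Legal AI"},
)
print(result["name"])        # Legal AI Cloud
print(result["strategies"])  # Offer free trials of Legal AI Cloud
```

Because the state dictionary is returned whole, both the name and the strategies remain available to the caller.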

#### Understanding the Sequential Chain

In [54]:
prompt_template_name = PromptTemplate(
    input_variables=["service"],
    template="I want to open an AI as a platform as a service company that offers this {service}. What should I name it?"
)

name_chain = LLMChain(llm=client, prompt=prompt_template_name, output_key="name of the company")

In [55]:
prompt_template_items = PromptTemplate(
    input_variables=["name of the company"],
    template="Suggest some services for {name of the company}"
)

items_chain = LLMChain(llm=client, prompt=prompt_template_items, output_key="elevator pitch")

In [57]:
from langchain.chains import SimpleSequentialChain

In [59]:
chain = SimpleSequentialChain(chains=[name_chain, items_chain])


In [81]:
chain.run("Input").strip().split("\n")

['1. AI development and consulting services',
 '2. AI incubator program for startups and entrepreneurs',
 '3. AI strategy and implementation consulting',
 '4. AI software development and customization',
 '5. AI training and education programs',
 '6. AI data management and analytics services',
 '7. AI integration and deployment services',
 '8. AI networking and collaboration platform',
 '9. AI marketing and branding services',
 '10. AI ethical and responsible AI consulting and training.']

## 5. Document Loader 

Reference: https://python.langchain.com/docs/modules/data_connection/document_loaders/

Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text with associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.

Document loaders provide a "load" method for loading data as documents from a configured source. They optionally implement a "lazy load" as well for lazily loading data into memory.
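
To make the `load` versus "lazy load" distinction concrete, here is a minimal standard-library sketch. The `Document` dataclass and `LineTextLoader` are hypothetical stand-ins that mirror the description above (text plus metadata, an eager method and a lazy one); they are not LangChain's classes:

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator, List


@dataclass
class Document:
    """A piece of text plus metadata describing where it came from."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


class LineTextLoader:
    """Hypothetical loader: one Document per non-empty line of a text file."""

    def __init__(self, path: str):
        self.path = Path(path)

    def lazy_load(self) -> Iterator[Document]:
        # Read line by line so documents are produced one at a time.
        with self.path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if line.strip():
                    yield Document(
                        line.strip(),
                        {"source": str(self.path), "line": line_number},
                    )

    def load(self) -> List[Document]:
        # Eager variant: materialize every document in memory at once.
        return list(self.lazy_load())
```

Iterating over `lazy_load()` suits large sources where holding everything in memory is wasteful; `load()` is convenient when the source is small.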

In [14]:
!pip install pypdf


In [17]:
from langchain.document_loaders import PyPDFLoader

In [22]:
loader = PyPDFLoader(r"C:\Users\ganad\Desktop\Learning with Projects\GenerativeAI_FreeCodeCamp-1\𝐃𝐀𝐘_𝟒_𝐇𝐮𝐠𝐠𝐢𝐧𝐠𝐅𝐚𝐜𝐞_𝐀𝐏𝐈_&_𝐋𝐚𝐧𝐠𝐜𝐡𝐚𝐢𝐧\EEP567_FFMMMTI_Final_Report (1).pdf")

In [23]:
loader

<langchain_community.document_loaders.pdf.PyPDFLoader at 0x1f0245083a0>

In [26]:
pages = loader.load_and_split()

In [27]:
pages

[Document(page_content='1 Friend or Foe: Multi -Modal Military Target Identification  \nAndrew Jeon • Bassam Halabiya • Naif Ganadily • Zachary Saunders  \nUniversity of Washington  \nEE P 567: Machine Learning for Cybersecurity  \nProfessor Radha Poovendran  • Lead  TA Qifan Lu  \nRoles and Responsibilities  \nWe constructed our custom dataset together in equal parts, taking images mainly from Google , following \nthe split below to create our initial 1100 image dataset.  \n● Andrew - US/RU Navy (250 Images) + Landscape/Urban “Null” (25 images)  \n● Bassam - US/RU Marines (250 Images) + Paintball “Null” (25 images)  \n● Naif - US/RU Army (250 Images) + Desert “Null” (25 images)  \n● Zachary - US/RU Airforce (250 Images) + Civilians “Null” (25 images)  \n \nBassam took a leadership role in dataset construction , leveraging his expertise with Roboflow and dataset \ncreation. He oversaw the quality control process for our dataset, meticulously providing feedback for \nrefinement upon sub