# LLM App with LangChain

* [Datacamp Tutorial](https://www.datacamp.com/tutorial/how-to-build-llm-applications-with-langchain)

## What are Large Language Models (LLMs)?

* LLMs are AI systems designed to understand and generate human-like text.
* They are trained on large amounts of text data, enabling them to:
    * Understand the context of a given text.
    * Generate text that is coherent and relevant to the context.
    * Perform language-related tasks such as: 
        * translation
        * summarization
        * question answering
        * text completion

## What is LangChain?

* LangChain is an open-source framework, available as a Python library, that makes it easier to build applications on top of LLMs.
* It comes with a collection of APIs that allow you to:
    * generate text
    * perform language-related tasks
    * connect LLMs to your own data
* LangChain is not tied to a single model library; it integrates with model providers such as OpenAI and Hugging Face (including the [Hugging Face Transformers library](https://huggingface.co/transformers/)).
* LangChain can be used to:
    * tailor prompts
    * construct chains that link components together
    * integrate models for use in production (eg: GPT, Hugging Face, etc.)
    * manipulate context for precision and recall

Read More: [Introduction to LangChain for Data Engineering & Data Applications](https://www.datacamp.com/tutorial/introduction-to-lanchain-for-data-engineering-and-data-applications)

## Key Components of LangChain

### **Components and Chains**
* **Components** = building blocks of LangChain.
    * these modules perform specific functions in the language processing pipeline
    * they can be combined to create **chains** for tailored workflows and apps:
        * eg: sentiment analysis, intent recognition, text generation, etc.
* **Chains** = sequences of components that together perform a specific task.
* **Chain Links** = the individual components that make up a chain.
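
The idea of linking components into a chain can be sketched in plain Python. This is only an illustration and not LangChain's API: `make_chain`, `normalize`, and `label_sentiment` are hypothetical names, and the sentiment rule is a toy stand-in for a model call.

```python
from typing import Callable, List

# Assumption for this sketch: a component is any function from text to text.
Component = Callable[[str], str]


def make_chain(links: List[Component]) -> Component:
    """Combine components so each link receives the previous link's output."""
    def run(text: str) -> str:
        for link in links:
            text = link(text)
        return text
    return run


def normalize(text: str) -> str:
    # Lowercase the text and collapse repeated whitespace.
    return " ".join(text.lower().split())


def label_sentiment(text: str) -> str:
    # Toy keyword rule standing in for a real model call.
    positive_words = {"great", "good", "love"}
    label = "positive" if positive_words & set(text.split()) else "neutral"
    return f"{label}: {text}"


sentiment_chain = make_chain([normalize, label_sentiment])
print(sentiment_chain("  I LOVE   this library "))
```

Each link only needs to agree on its input and output type, which is what lets the same component be reused in different chains.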

### **Prompt Templates**
* reusable, predefined prompts that can be shared across chains
    * a template becomes dynamic and adaptable when specific "values" are inserted into it
        * eg: a prompt that greets a user could be personalized by inserting the user's name
    * this is useful for generating prompts from dynamic resources
* are used to:
    * tailor prompts to a task or user
    * keep prompt wording consistent across chains
    * separate fixed instructions from the values supplied at run time
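
A minimal sketch of what inserting values into a template means, using only the standard library. `SimplePromptTemplate` is a hypothetical stand-in written for this note, not LangChain's `PromptTemplate` class.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template (not LangChain's class)."""

    def __init__(self, template: str) -> None:
        self.template = template
        # Collect the placeholder names that appear in the template text.
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **values: str) -> str:
        # Fail early with a clear message if a placeholder has no value.
        missing = [name for name in self.input_variables if name not in values]
        if missing:
            raise KeyError(f"missing values for: {missing}")
        return self.template.format(**values)


greeting = SimplePromptTemplate("Hello {name}, welcome to {city}!")
print(greeting.input_variables)
print(greeting.format(name="Ada", city="Paris"))
```

The fixed wording lives in one place, and only the values change from call to call.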

### **Vector Stores**
* store numerical representations (embeddings) of document meanings
* serve as a storage facility for these embeddings, allowing efficient search based on semantic similarity
* used to:
    * store and search information via embeddings
    * find the documents whose meaning is closest to a query
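
Search by semantic similarity can be illustrated with a toy in-memory store. The three-dimensional embeddings below are made up by hand for the example; a real system would get them from an embedding model, and `ToyVectorStore` is a hypothetical class, not a LangChain one.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps text -> embedding pairs and ranks them against a query vector."""

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def add(self, text: str, embedding: List[float]) -> None:
        self._items[text] = embedding

    def search(self, query_embedding: List[float], k: int = 1) -> List[Tuple[str, float]]:
        scored = [
            (text, cosine_similarity(query_embedding, embedding))
            for text, embedding in self._items.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
# Hand-made embeddings (assumption): first axis ~ "pets", last axis ~ "finance".
store.add("cats are pets", [0.9, 0.1, 0.0])
store.add("stocks fell today", [0.0, 0.2, 0.9])
store.add("dogs are pets", [0.8, 0.3, 0.0])

top = store.search([1.0, 0.0, 0.0], k=2)
print([text for text, _ in top])
```

This version scans every stored vector, which is fine for a handful of documents; dedicated vector stores use indexing structures to keep search fast at scale.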

### **Indexes and retrievers**
* **Indexes** act as databases storing details and metadata about the documents made available to the model (for example, your own data), rather than the data the model was trained on
* **Retrievers** swiftly search this index for specific information
* **Indexes and retrievers** are used to:
    * store details and metadata about those documents
    * swiftly search the index for specific information
    * improve the model's responses by providing context and related information
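
As a rough illustration of the index/retriever split, the sketch below builds a tiny keyword index and a retriever that ranks documents by matching query words, then places the result in a prompt as context. `ToyIndex` and `retrieve` are hypothetical names, and keyword matching is a simplification swapped in for the embedding-based search a real setup would likely use.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyIndex:
    """Stores documents plus a word -> document-id lookup table."""

    def __init__(self) -> None:
        self.documents: List[str] = []
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, document: str) -> None:
        doc_id = len(self.documents)
        self.documents.append(document)
        for word in document.lower().split():
            self.postings[word].add(doc_id)


def retrieve(index: ToyIndex, query: str, k: int = 2) -> List[str]:
    """Return up to k documents ranked by how many query words they contain."""
    counts: Dict[int, int] = defaultdict(int)
    for word in query.lower().split():
        for doc_id in index.postings.get(word, set()):
            counts[doc_id] += 1
    # Highest match count first; ties fall back to insertion order.
    ranked = sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))
    return [index.documents[doc_id] for doc_id in ranked[:k]]


index = ToyIndex()
index.add("LangChain chains combine components")
index.add("Vector stores hold embeddings")
index.add("Retrievers search an index for context")

question = "how do retrievers search an index"
context = retrieve(index, question)
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
print(prompt)
```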

### **Output Parsers**
* used to:
    * manage and refine the responses generated by the model
    * eliminate undesired content
    * tailor the output format
    * add extra data to the response
    * extract structured results, like JSON objects, from the language model's responses
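
Extracting a JSON object from a free-text reply might look like the sketch below. It uses only the standard library; `parse_json_reply` is a hypothetical helper, and the reply string is invented for the example rather than taken from a real model.

```python
import json
import re
from typing import Any, Dict


def parse_json_reply(reply: str) -> Dict[str, Any]:
    """Decode the span from the first '{' to the last '}' in a reply as JSON."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))


# Invented reply: the model wrapped the JSON in extra chatter.
raw_reply = (
    "Sure! Here is the result:\n"
    '{"city": "Paris", "activities": ["Louvre", "Eiffel Tower"]}\n'
    "Hope that helps."
)

parsed = parse_json_reply(raw_reply)
print(parsed["city"], len(parsed["activities"]))
```

The parser drops the surrounding text and hands the rest of the program a dictionary, which covers both "eliminate undesired content" and "extract structured results" in a small way.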

### **Example selectors**
* used to:
    * pick suitable examples from a pool of candidate examples to include in the prompt (few-shot prompting)
    * improve the precision and pertinence of the generated responses
    * adjust to favor certain types of examples or filter out unrelated ones
    * provide a tailored AI response based on user input
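
One simple way to picture example selection is to rank a small pool of examples by word overlap with the user's input and put the best matches into the prompt. The pool, `select_examples`, and `build_few_shot_prompt` are all hypothetical and written for this note; real selectors often use length limits or embedding similarity instead.

```python
from typing import Dict, List

# Invented example pool for an "antonym" task.
EXAMPLES: List[Dict[str, str]] = [
    {"input": "happy", "output": "sad"},
    {"input": "tall building", "output": "short building"},
    {"input": "fast car", "output": "slow car"},
]


def select_examples(query: str, examples: List[Dict[str, str]], k: int = 1) -> List[Dict[str, str]]:
    """Pick the k examples whose input shares the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(example: Dict[str, str]) -> int:
        return len(query_words & set(example["input"].lower().split()))

    # sorted() is stable, so tied examples keep their original order.
    return sorted(examples, key=overlap, reverse=True)[:k]


def build_few_shot_prompt(query: str, k: int = 1) -> str:
    chosen = select_examples(query, EXAMPLES, k)
    blocks = [f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in chosen]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


print(build_few_shot_prompt("fast train"))
```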

### **Agents**
* use an LLM to decide which actions to take (for example, which tool to call), each with specific prompts, memory, and chain for a particular use case.
* can be deployed on various platforms, including web, mobile, and chatbots, catering to a wide audience.
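
An agent is often pictured as a model that picks a tool and then acts on the result. The sketch below shows that dispatch step only, under strong assumptions: `fake_llm_decide` is a hard-coded stand-in for a model call, and the tools and names are hypothetical rather than LangChain's agent API.

```python
from typing import Callable, Dict, Tuple


def calculator(expression: str) -> str:
    # Only handles "a+b" with integers; enough for this sketch.
    left, right = expression.split("+")
    return str(int(left) + int(right))


def echo(text: str) -> str:
    return text


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator, "echo": echo}


def fake_llm_decide(question: str) -> Tuple[str, str]:
    """Stand-in for a model call that chooses a tool and its input."""
    if "+" in question:
        expression = "".join(ch for ch in question if ch.isdigit() or ch == "+")
        return "calculator", expression
    return "echo", question


def run_agent(question: str) -> str:
    tool_name, tool_input = fake_llm_decide(question)
    return TOOLS[tool_name](tool_input)


print(run_agent("What is 2 + 40?"))
```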



## Set Up LangChain in Python

1. Activate the environment: 
    - Conda: `conda activate envpy39`
    - venv: `source venv/bin/activate`
    - VS Code: `Ctrl+Shift+P` > `Python: Select Interpreter` > `Python 3.9.7 64-bit ('envpy39': conda)`


2. Install required packages:

In [1]:
%pip install langchain openai python-dotenv -q
# or
# conda install langchain -c conda-forge

Note: you may need to restart the kernel to use updated packages.


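3. Set up the key as an environment variable:
    - Create a file called `.env`
    - Get an OpenAI API key
    - Set the variable in the `.env` file: `OPENAI_API_KEY="..."`

4. Load the key so it can be passed to the relevant class:

In [2]:
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # os.getenv alone does not read .env files
API_KEY = os.getenv("OPENAI_API_KEY")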
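
`os.getenv` returns `None` when a variable is missing, and the failure then shows up later as a confusing authentication error. A small guard like the one below makes the problem visible immediately. `require_env` is a hypothetical helper, and `DEMO_API_KEY` is a made-up variable name used so the example runs without a real key.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your environment or .env file")
    return value


# Made-up variable so the demonstration needs no real credentials.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```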


## Build an LLM Powered App with Langchain

In [3]:
# Using OpenAI API

from langchain.llms import OpenAI

llm = OpenAI(model_name="text-ada-001", openai_api_key=API_KEY)

print(llm("Tell me a joke about data scientist"))



What data scientist did the data scientist?

The data scientist did the data scientist?

Because the data scientist knew what he was doing, he was able to churn out data that featsed almost every analysis imaginable.


In [None]:
# Using the Hugging Face Hub API
# Retrieve a Hugging Face token and set it in the .env file,
# e.g. HUGGINGFACEHUB_API_TOKEN="..." (the OpenAI key will not work here)

from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    huggingfacehub_api_token=os.getenv("HUGGINGFACEHUB_API_TOKEN"),
)

print(llm("Tell me a joke about data scientist"))

In [12]:
# Experiment with multiple prompts

llm_response = llm.generate([
    'Tell me a joke about data scientist',
    'Tell me a joke about recruiter',
    'Tell me a joke about psychologist',
])


[Generation(text='\n\nWhat data scientist did the data scientist?\n\nThe data scientist did the data scientist?\n\nBecause the data scientist knew how to use data to make sense of it all, they were also able to make sense of it all and come up with something better.', generation_info={'finish_reason': 'stop', 'logprobs': None})]


In [14]:
print(llm_response)

generations=[[Generation(text='\n\nWhat data scientist did the data scientist?\n\nThe data scientist did the data scientist?\n\nBecause the data scientist knew how to use data to make sense of it all, they were also able to make sense of it all and come up with something better.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nWhy did the recruiter find a busy person?\n\nBecause they were too busy to ask for help.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nWhy did the psychologist say that he never needed a new approach to his work?\n\nBecause he had already found several old ones that worked well.', generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'completion_tokens': 111, 'prompt_tokens': 20, 'total_tokens': 131}, 'model_name': 'text-ada-001'}


In [15]:
print(llm_response.generations)

[[Generation(text='\n\nWhat data scientist did the data scientist?\n\nThe data scientist did the data scientist?\n\nBecause the data scientist knew how to use data to make sense of it all, they were also able to make sense of it all and come up with something better.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nWhy did the recruiter find a busy person?\n\nBecause they were too busy to ask for help.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nWhy did the psychologist say that he never needed a new approach to his work?\n\nBecause he had already found several old ones that worked well.', generation_info={'finish_reason': 'stop', 'logprobs': None})]]


In [16]:
print(llm_response.generations[0])

[Generation(text='\n\nWhat data scientist did the data scientist?\n\nThe data scientist did the data scientist?\n\nBecause the data scientist knew how to use data to make sense of it all, they were also able to make sense of it all and come up with something better.', generation_info={'finish_reason': 'stop', 'logprobs': None})]
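
The printed result is a list of lists: one inner list per prompt, each holding the candidate generations for that prompt. The sketch below shows one way to pull out just the text. It uses stand-in dataclasses that mirror only the `generations` and `text` fields visible in the output above, so it runs without an API key; `FakeLLMResult` is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Generation:
    # Stand-in with only the `text` field seen in the printed output.
    text: str


@dataclass
class FakeLLMResult:
    # Hypothetical stand-in for the object returned by llm.generate(...).
    generations: List[List[Generation]]


fake_response = FakeLLMResult(
    generations=[
        [Generation(text="\n\nJoke one.")],
        [Generation(text="\n\nJoke two.")],
    ]
)

# Take the first candidate for each prompt and trim the leading newlines.
jokes = [candidates[0].text.strip() for candidates in fake_response.generations]
print(jokes)
```

With the real response object, the same comprehension over `llm_response.generations` should give one cleaned-up string per prompt.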


## Managing Prompt Templates for LLMs in LangChain

In [17]:
USER_INPUT = 'Paris'

from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(model_name="text-davinci-003", openai_api_key=API_KEY)

template = """ I am travelling to {location}. What are the top 3 things I can do while I am there. Be very specific and respond as three bullet points """

prompt = PromptTemplate(
    input_variables=["location"],
    template=template,
)

final_prompt = prompt.format(location=USER_INPUT)

print(f"LLM Output: {llm(final_prompt)}")

LLM Output: 

1. Visit the Louvre Museum and see the Mona Lisa and other famous artworks
2. Climb the Eiffel Tower and take in the incredible views of Paris
3. Enjoy a leisurely stroll along the Champs-Élysées and admire the iconic architecture


In [29]:

# Prompt the user for the location at run time

from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(model_name="text-davinci-003", openai_api_key=API_KEY)

template = """ I am travelling to {location}. What are the top 3 things I can do while I am there. Be very specific and respond as three bullet points """

prompt = PromptTemplate(
    input_variables=["location"],
    template=template,
)

# input() waits for the user to type a location
final_prompt = prompt.format(location=input())

print(f"LLM Output: {llm(final_prompt)}")



LLM Output: 

1. Visit the Great Barrier Reef; take a boat tour and snorkel or scuba dive to explore the incredible marine life. 

2. Go on a road trip along the Great Ocean Road; witness the stunning coastline, rainforest, and wildlife. 

3. Take a walk to the top of Sydney Harbour Bridge; enjoy the incredible views of the harbour and the city skyline.


## Combining LLMs and Prompts in Multi-Step Workflows

* **Multi-step workflows** = a series of steps that are executed in a specific order to achieve a particular goal.
    * examples include:
        * Sequentially combining multiple LLM calls by using the output of the first call as input for the second (as in the example below)
        * Integrating LLMs with prompt templates
        * Merging LLMs with external data, such as for question answering
        * Incorporating LLMs with long-term memory, like chat history

* EXAMPLE - we create a chain with two components:
    * The first component identifies the most popular city in a country supplied by the user.
    * The second component takes that city and lists the top three activities or attractions for tourists visiting it.
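
The data flow of this two-step example can be sketched without any API calls. `fake_llm` returns canned replies in place of a real model, and `make_step` and `run_sequential` are hypothetical helpers written for this note; they imitate the "output of step one becomes input of step two" behaviour and are not LangChain classes.

```python
from typing import Callable, List

seen_prompts: List[str] = []  # records every prompt sent to the fake model


def fake_llm(prompt: str) -> str:
    """Canned replies standing in for a real model call."""
    seen_prompts.append(prompt)
    if "most popular city" in prompt:
        return "\n\nToronto"
    return "- CN Tower\n- Distillery District\n- Royal Ontario Museum"


def make_step(template: str, variable: str) -> Callable[[str], str]:
    """Build a step that fills one placeholder and calls the model."""
    def step(value: str) -> str:
        return fake_llm(template.format(**{variable: value})).strip()
    return step


def run_sequential(steps: List[Callable[[str], str]], first_input: str) -> str:
    value = first_input
    for step in steps:
        value = step(value)  # each output feeds the next step
    return value


city_step = make_step("What is the most popular city in {country} for tourists?", "country")
todo_step = make_step("What are the top three things to do in {city}?", "city")

result = run_sequential([city_step, todo_step], "Canada")
print(result)
```

Stripping the first reply before passing it on matters here: stray newlines around the city name would otherwise end up inside the second prompt.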

In [30]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain import PromptTemplate

llm = OpenAI(
    model_name="text-davinci-003", 
    openai_api_key=API_KEY
    )

# first step in chain

template = "What is the most popular city in {country} for tourists? Just return the name of the city"

first_prompt = PromptTemplate(
    input_variables=["country"],
    template=template
    )

chain_one = LLMChain(
    llm = llm, 
    prompt = first_prompt
    )

# second step in chain

second_prompt = PromptTemplate(
    input_variables=["city"],
    template="What are the top three things to do in this: {city} for tourists. Just return the answer as three bullet points.",
    )

chain_two = LLMChain(
    llm=llm, 
    prompt=second_prompt
    )

# Combine the first and the second chain

overall_chain = SimpleSequentialChain(
    chains=[
        chain_one, 
        chain_two
        ], 
    verbose=True
    )

final_answer = overall_chain.run("Canada")



> Entering new SimpleSequentialChain chain...

Toronto

• Explore the CN Tower
• Take a stroll through the Distillery District
• Check out the Royal Ontario Museum

> Finished chain.
