# LangChain: Tools & Chains

In LangChain, a **tool** is a **function or external service wrapped in a common interface** that an agent or chain can call to perform **a specific task the language model cannot do reliably on its own**, such as running code, searching the web, or doing exact calculations. Each tool has a name and a description, which the model uses to decide when to call it.

**Python REPL**: A tool that executes Python code in a Read-Eval-Print Loop (REPL). An agent can use it to run code that the model generates, for example to perform a calculation, and then read back the printed output.

**SerpAPI**: A wrapper around the SerpAPI service, which returns search-engine results through an API. It lets an agent look up current information on the web.

**Wolfram Alpha**: A wrapper around the Wolfram Alpha computational knowledge engine, which answers mathematical, scientific, and factual queries. It is useful when a question needs exact computation rather than generated text.

The following components come from the Hugging Face ecosystem rather than from LangChain's own tool set, but they are often used alongside LangChain:

**Transformers**: A popular open-source library for natural language processing (NLP) tasks, providing pre-trained models and a simple interface for using them.

**Trainer**: A class in the Transformers library for training and fine-tuning models, allowing you to customize the training process and experiment with different hyperparameters.

**Tokenizer**: A library for tokenizing text data, providing a range of tokenization strategies and pre-trained tokenizers for specific models and languages.

**Dataset**: A library for working with datasets, providing tools for loading, processing, and manipulating datasets for NLP tasks.

**Model Hub**: A repository of pre-trained models, allowing you to easily access and use a wide range of language models and NLP models.
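To make the idea of a tool concrete, here is a minimal sketch in plain Python. It does not use LangChain's own classes: `SimpleTool` and `word_count` are hypothetical names chosen for illustration, and the only assumption is that a tool is a named, described callable that takes a string and returns a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: a name, a description, and a callable."""
    name: str
    description: str  # shown to the model so it can decide when to use the tool
    func: Callable[[str], str]

    def run(self, tool_input: str) -> str:
        return self.func(tool_input)


def word_count(text: str) -> str:
    # This sketch assumes tools take a string and return a string.
    return str(len(text.split()))


tools: Dict[str, SimpleTool] = {
    "word_count": SimpleTool(
        name="word_count",
        description="Counts the words in the input text.",
        func=word_count,
    )
}

# An agent would pick the tool by name; here the choice is hard-coded.
print(tools["word_count"].run("eco friendly water bottles"))
```

In a real agent the model reads each tool's description and emits the name of the tool it wants to call, which is why clear descriptions matter.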

# Chains

In **LangChain**, a **chain** is an **end-to-end wrapper around multiple individual components, providing a way to accomplish a common use case by combining these components in a specific sequence**. The most commonly used type of chain is the **LLMChain**, which consists of a **PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser**.

# Components of LLMChain (Generic Chain):

**PromptTemplate**: This is a template that helps format input variables into a structured prompt. It provides a way to organize and structure the information before passing it to the model.

**Model (LLM or ChatModel)**: The model, either a **Large Language Model (LLM) or a ChatModel**, is the core component that processes the formatted prompt and generates an output based on its training data.

**Optional Output Parser**: This component is not always required. If provided, it takes the model's raw output and converts it into a final format, such as a list or a structured object. This step is useful for cleaning up or structuring the model's output.

# How LLMChain Works:

**Input Variables**: The LLMChain takes one or more input variables, which could be anything relevant to the task at hand. In the company-name example below, they are details about the company, such as its focus on eco-friendly water bottles.

**PromptTemplate Formatting**: The input variables are passed through the PromptTemplate, which structures them into a formatted prompt. This step ensures that the information is presented in a way that the model can understand and process.

**Model Processing**: The formatted prompt is then given to the model (LLM or ChatModel). The model uses its training data and algorithms to generate an output based on the input prompt.

**Optional Output Parsing**: If an output parser is provided, it takes the output from the model and further refines it into a final format. This step can involve cleaning up the output, extracting specific information, or organizing it in a particular way.
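The four steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not LangChain's implementation: `fake_model` is a placeholder that returns a canned reply instead of calling an LLM, and `run_chain` is a hypothetical helper name.

```python
from typing import Callable, Optional


def fake_model(prompt: str) -> str:
    """Placeholder for an LLM call; it echoes the prompt with stray whitespace."""
    return f"  ECHO: {prompt}  "


def run_chain(
    template: str,
    inputs: dict,
    model: Callable[[str], str],
    output_parser: Optional[Callable[[str], str]] = None,
) -> str:
    prompt = template.format(**inputs)   # 1. format the input variables
    raw_output = model(prompt)           # 2. call the model
    if output_parser is not None:        # 3. optionally parse the output
        return output_parser(raw_output)
    return raw_output


result = run_chain(
    template="Suggest a name for a company that makes {product}.",
    inputs={"product": "eco-friendly water bottles"},
    model=fake_model,
    output_parser=str.strip,  # the simplest possible parser: trim whitespace
)
print(result)
```

Swapping `fake_model` for a real model call changes only step 2, which is the point of keeping the three components separate.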

**Example: Generating a Company Name**:

Let's say we want to create a company that produces eco-friendly water bottles.

We would define a PromptTemplate that includes placeholders for input variables, such as "{CompanyType}", "{Product}", etc.

Set the input variables with values relevant to our company, like "EcoTech" for {CompanyType} and "Water Bottles" for {Product}.

The PromptTemplate, for example "Create a name for a company that produces {Product} for {CompanyType}.", substitutes these values to produce the prompt "Create a name for a company that produces Water Bottles for EcoTech." This structured prompt is then passed to the model.

The model generates an output based on the prompt, perhaps suggesting names like "EcoAqua" or "GreenWave Bottles."

If an output parser is provided, it might further refine the output into a final format, such as selecting the most creative or catchy name from the model's suggestions.
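As a rough illustration of the last step, the sketch below parses a canned model reply and picks one name. The reply format (comma-separated names) and the selection rule (shortest name) are assumptions made for this example; judging which name is "most catchy" is not something a simple parser can do.

```python
template = "Create a name for a company that produces {Product} for {CompanyType}."
prompt = template.format(Product="Water Bottles", CompanyType="EcoTech")

# Canned text standing in for a model reply (assumed to be comma-separated).
model_output = "EcoAqua, GreenWave Bottles"


def parse_names(text: str) -> list:
    """Split a comma-separated reply into a list of trimmed names."""
    return [name.strip() for name in text.split(",") if name.strip()]


names = parse_names(model_output)
# Stand-in selection rule: prefer the shortest suggestion.
chosen = min(names, key=len)

print(prompt)
print(names)
print(chosen)
```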

In [1]:
!pip -q install openai==0.28 langchain==0.0.208 huggingface_hub


In [None]:
import os

os.environ['OPENAI_API_KEY'] = 'your_key'
os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'your_token'

In the context of LangChain, **a chain refers to a sequence of operations or tasks that are performed on a piece of text or data. Chains can be used to automate complex workflows, perform multiple tasks on a single piece of text, or to create custom workflows for specific use cases**.

Here are some categories of chains with examples:

# Generic Chains

**Text Processing Chains**: These chains perform basic text processing tasks such as tokenization, stemming, and lemmatization.

* Example: A chain that performs the following tasks:

  *Tokenization*: Breaks the text into individual words or characters.

  *Stemming*: Reduces words to a root form by stripping suffixes.

  *Lemmatization*: Reduces words to their dictionary base form.

  Use case: A chatbot that needs to process and analyze user input.

# Utility Chains

**Data Processing Chains**: These chains perform data processing tasks such as filtering, sorting, and aggregating data.

  * Example: A chain that performs the following tasks:

  Filtering: Filters out data based on specific criteria.

  Sorting: Sorts data based on specific criteria.

  Aggregating: Aggregates data based on specific criteria.

  Use case: A company that needs to process and analyze large datasets.

# Asynchronous Chains

**API Integration Chains**: These chains integrate with external APIs to perform tasks such as data retrieval and processing.
  * Example: A chain that performs the following tasks:

  API call: Makes an API call to retrieve data.

  Data processing: Processes the retrieved data.

  Data storage: Stores the processed data.

  Use case: A company that needs to integrate with external APIs to retrieve and process data.
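The three tasks above (API call, processing, storage) can be sketched with `asyncio`. This is plain Python rather than a LangChain chain, and everything in it is hypothetical: `fetch_data` sleeps briefly instead of calling a real API, and "storage" is an in-memory list.

```python
import asyncio


async def fetch_data(source: str) -> dict:
    """Hypothetical API call; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return {"source": source, "values": [1, 2, 3]}


def process(payload: dict) -> dict:
    return {"source": payload["source"], "total": sum(payload["values"])}


async def run_api_chain(source: str, store: list) -> dict:
    payload = await fetch_data(source)   # 1. API call
    result = process(payload)            # 2. data processing
    store.append(result)                 # 3. data storage (in-memory list)
    return result


async def main() -> list:
    store: list = []
    # Run two chains concurrently; while one waits on I/O the other proceeds.
    await asyncio.gather(run_api_chain("a", store), run_api_chain("b", store))
    return store


store = asyncio.run(main())
print(sorted(item["source"] for item in store))
```

The benefit of the asynchronous form shows up when several slow API calls can overlap instead of running back to back.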


# Machine Learning Chains:
These chains perform machine learning tasks such as training and testing models.

  * Example: A chain that performs the following tasks:

  Data preparation: Prepares data for machine learning.

  Model training: Trains a machine learning model.

  Model testing: Tests the trained model.

  Use case: A company that needs to train and test machine learning models.



## Basic LLMChain - Fact Extraction

In [12]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

In [19]:
llm = OpenAI(model_name='gpt-3.5-turbo',
             temperature=0,
             max_tokens = 256)



In [20]:
article = '''Coinbase, the second-largest crypto exchange by trading volume, released its Q4 2022 earnings on Tuesday, giving shareholders and market players alike an updated look into its financials. In response to the report, the company's shares are down modestly in early after-hours trading.In the fourth quarter of 2022, Coinbase generated $605 million in total revenue, down sharply from $2.49 billion in the year-ago quarter. Coinbase's top line was not enough to cover its expenses: The company lost $557 million in the three-month period on a GAAP basis (net income) worth -$2.46 per share, and an adjusted EBITDA deficit of $124 million.Wall Street expected Coinbase to report $581.2 million in revenue and earnings per share of -$2.44 with adjusted EBITDA of -$201.8 million driven by 8.4 million monthly transaction users (MTUs), according to data provided by Yahoo Finance.Before its Q4 earnings were released, Coinbase's stock had risen 86% year-to-date. Even with that rally, the value of Coinbase when measured on a per-share basis is still down significantly from its 52-week high of $206.79.That Coinbase beat revenue expectations is notable in that it came with declines in trading volume; Coinbase historically generated the bulk of its revenues from trading fees, making Q4 2022 notable. Consumer trading volumes fell from $26 billion in the third quarter of last year to $20 billion in Q4, while institutional volumes across the same timeframe fell from $133 billion to $125 billion.The overall crypto market capitalization fell about 64%, or $1.5 trillion during 2022, which resulted in Coinbase's total trading volumes and transaction revenues to fall 50% and 66% year-over-year, respectively, the company reported.As you would expect with declines in trading volume, trading revenue at Coinbase fell in Q4 compared to the third quarter of last year, dipping from $365.9 million to $322.1 million. 
(TechCrunch is comparing Coinbase's Q4 2022 results to Q3 2022 instead of Q4 2021, as the latter comparison would be less useful given how much the crypto market has changed in the last year; we're all aware that overall crypto activity has fallen from the final months of 2021.)There were bits of good news in the Coinbase report. While Coinbase's trading revenues were less than exuberant, the company's other revenues posted gains. What Coinbase calls its "subscription and services revenue" rose from $210.5 million in Q3 2022 to $282.8 million in Q4 of the same year, a gain of just over 34% in a single quarter.And even as the crypto industry faced a number of catastrophic events, including the Terra/LUNA and FTX collapses to name a few, there was still growth in other areas. The monthly active developers in crypto have more than doubled since 2020 to over 20,000, while major brands like Starbucks, Nike and Adidas have dived into the space alongside social media platforms like Instagram and Reddit.With big players getting into crypto, industry players are hoping this move results in greater adoption both for product use cases and trading volumes. Although there was a lot of movement from traditional retail markets and Web 2.0 businesses, trading volume for both consumer and institutional users fell quarter-over-quarter for Coinbase.Looking forward, it'll be interesting to see if these pieces pick back up and trading interest reemerges in 2023, or if platforms like Coinbase will have to keep looking elsewhere for revenue (like its subscription service) if users continue to shy away from the market.
'''

In [21]:
len(article)

3533

In [22]:
fact_extraction_prompt = PromptTemplate(
    input_variables=["text_input"],
    template="Extract the key facts out of this text. Don't include opinions. Give each fact a number and keep them short sentences. :\n\n {text_input}"
)

In [23]:
fact_extraction_chain = LLMChain(llm=llm, prompt=fact_extraction_prompt)

facts = fact_extraction_chain.run(article)

print(facts)

1. Coinbase released its Q4 2022 earnings, showing $605 million in revenue.
2. Coinbase's stock is down in after-hours trading after reporting a loss of $557 million in Q4.
3. Wall Street expected Coinbase to report $581.2 million in revenue.
4. Coinbase's trading volumes and transaction revenues fell due to a 64% decline in the overall crypto market capitalization.
5. Coinbase's trading revenue fell from $365.9 million to $322.1 million in Q4.
6. Coinbase's "subscription and services revenue" rose from $210.5 million to $282.8 million in Q4.
7. Monthly active developers in crypto have more than doubled since 2020.
8. Major brands like Starbucks, Nike, and Adidas have entered the crypto space.
9. Trading volume for both consumer and institutional users fell quarter-over-quarter for Coinbase.
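Because the prompt asks for numbered facts, the output can be split into individual facts with a regular expression. The sketch below is an optional post-processing step, not part of the original notebook; it uses the first three facts from the output as sample text and assumes each fact sits on its own line in the form `N. sentence`.

```python
import re

facts_text = """1. Coinbase released its Q4 2022 earnings, showing $605 million in revenue.
2. Coinbase's stock is down in after-hours trading after reporting a loss of $557 million in Q4.
3. Wall Street expected Coinbase to report $581.2 million in revenue."""

# Each match is (number, sentence); re.MULTILINE lets ^ and $ match per line.
pairs = re.findall(r"^(\d+)\.\s+(.*)$", facts_text, flags=re.MULTILINE)
facts_by_number = {int(number): sentence for number, sentence in pairs}

print(len(facts_by_number))
print(facts_by_number[3])
```

A structure like this makes it easy to check individual facts against the article before passing them to the next prompt.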



## Rewrite as a summary from the facts

In [25]:
investor_update_prompt = PromptTemplate(
    input_variables=["facts"],
    template="You are a Goldman Sachs analyst. Take the following list of facts and use them to write a short paragraph for investors. Don't leave out key info:\n\n {facts}"
)

In [26]:
investor_update_chain = LLMChain(llm=llm, prompt=investor_update_prompt)

investor_update = investor_update_chain.run(facts)

print(investor_update)
len(investor_update)

Investors, Coinbase recently released its Q4 2022 earnings, reporting $605 million in revenue, slightly below Wall Street's expectations of $581.2 million. However, the company's stock is down in after-hours trading after revealing a loss of $557 million in Q4. This decline can be attributed to a 64% drop in the overall crypto market capitalization, leading to decreased trading volumes and transaction revenues for Coinbase. Despite this, the company saw growth in its "subscription and services revenue," which rose from $210.5 million to $282.8 million in Q4. It is worth noting that monthly active developers in crypto have more than doubled since 2020, and major brands like Starbucks, Nike, and Adidas have entered the crypto space. While trading volume for both consumer and institutional users fell quarter-over-quarter, Coinbase continues to adapt to the evolving crypto landscape.


892

In [27]:
triples_prompt = PromptTemplate(
    input_variables=["facts"],
    template="Take the following list of facts and turn them into triples for a knowledge graph:\n\n {facts}"
)

In [28]:
triples_chain = LLMChain(llm=llm, prompt=triples_prompt)

triples = triples_chain.run(facts)

print(triples)
len(triples)

1. (Coinbase, hasRevenue, $605 million)
2. (Coinbase, hasStock, down)
3. (Coinbase, expectedRevenue, $581.2 million)
4. (Coinbase, hasDecline, 64%)
5. (Coinbase, tradingRevenue, $322.1 million)
6. (Coinbase, subscriptionRevenue, $282.8 million)
7. (Crypto, hasMonthlyActiveDevelopers, doubled)
8. (Starbucks, enteredCryptoSpace, true)
9. (Coinbase, tradingVolume, fell)


369
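The triples come back as text, so they need parsing before they can be loaded into a graph. The following is a minimal sketch, assuming the model keeps the `(subject, predicate, object)` format shown above; `parse_triples` is a hypothetical helper and the sample text is copied from three of the output lines.

```python
import re
from typing import List, Tuple

triples_text = """1. (Coinbase, hasRevenue, $605 million)
2. (Coinbase, hasStock, down)
7. (Crypto, hasMonthlyActiveDevelopers, doubled)"""


def parse_triples(text: str) -> List[Tuple[str, str, str]]:
    triples = []
    # Find every parenthesised group, then split it into at most three parts
    # so that commas inside the object value are preserved.
    for match in re.finditer(r"\(([^()]*)\)", text):
        parts = [part.strip() for part in match.group(1).split(",", 2)]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


parsed = parse_triples(triples_text)
print(parsed)
```

Lines that do not contain exactly three parts are skipped, which is a simple guard against malformed model output.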

## Chaining these together

In LangChain, a **sequential chain** runs several chains one after another, passing the output of each chain to the next as its input. Here's a breakdown of how it works:

**Key characteristics:**

1. **Fixed order**: Each chain runs once, in the order in which the chains are listed.
2. **Chain-like structure**: The output of each step becomes the input for the next step.
3. **Two variants**: `SimpleSequentialChain` passes a single string from step to step, while `SequentialChain` supports multiple named inputs and outputs.

**How it works:**

1. **Initial input**: You provide an initial input to the first chain.
2. **First response**: The first chain formats its prompt with that input, and the model generates a response.
3. **Next prompt**: The generated response becomes the input for the next chain's prompt.
4. **Next response**: The model generates a response to that prompt.
5. **Repeat**: Steps 3-4 are repeated for each remaining chain, and the output of the last chain is returned.

**Use cases:**

1. **Storytelling**: Generate a story by providing a starting prompt and then using the generated text as the input for the next prompt.
2. **Dialogue**: Create a conversation by generating responses to previous responses.
3. **Summarization**: Generate a summary of a text by providing the original text as the initial prompt and then using the generated summary as the input for the next prompt.
4. **Creative writing**: Use sequential chains to generate creative writing, such as poetry or short stories.

**Benefits:**

1. **Modularity**: Each step has its own prompt, so it can be written, tested, and reused independently.
2. **Stepwise refinement**: Each step builds on the previous output, for example extracting facts first and then rewriting them for a specific audience.
3. **Flexibility**: Sequential chains can be used for a wide range of applications, from storytelling to summarization.

By leveraging sequential chains in LangChain, you can break a complex task into smaller prompts that are easier to control and debug.
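The mechanics can be illustrated without an LLM. In this sketch the two "chains" are ordinary functions standing in for the fact-extraction and investor-update steps, and `run_sequential` is a hypothetical helper that mimics the single-string hand-off; it is not LangChain's implementation.

```python
from typing import Callable, List


def extract_facts(text: str) -> str:
    """Stand-in for the first chain: one numbered 'fact' per sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return "\n".join(f"{i}. {s}." for i, s in enumerate(sentences, start=1))


def write_update(facts: str) -> str:
    """Stand-in for the second chain: wraps the facts in a short note."""
    return "Investor update:\n" + facts


def run_sequential(steps: List[Callable[[str], str]], first_input: str) -> str:
    value = first_input
    for step in steps:       # each step runs exactly once, in order
        value = step(value)  # the output of one step is the next step's input
    return value


print(run_sequential([extract_facts, write_update], "Revenue fell. Losses grew."))
```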

In [33]:
from langchain.chains import SimpleSequentialChain, SequentialChain

full_chain = SimpleSequentialChain(chains=[fact_extraction_chain, investor_update_chain], verbose=False)

**Verbose mode**:

When you set the `verbose` parameter to `True`, LangChain prints the chain's intermediate steps to the console as it runs. It does not change what the model generates. This mode is useful for debugging, because you can see what is passed from one step to the next.

**Characteristics of verbose mode**:

**Intermediate output**: For a `SimpleSequentialChain`, the output of each step is printed as it is produced.

**No effect on the result**: The `verbose` flag does not alter the prompts or the value returned by `run`.

**Logging only**: The extra text goes to the console and is not sent to the model.
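A small plain-Python sketch shows what a verbose flag of this kind does: it logs intermediate values and leaves the result untouched. `run_steps` is a hypothetical helper, and the log format is invented for the example; LangChain's actual console output looks different.

```python
import io
from contextlib import redirect_stdout


def run_steps(steps, first_input, verbose=False):
    value = first_input
    for step in steps:
        value = step(value)
        if verbose:
            # Logging only: the value passed to the next step is unchanged.
            print(f"[{step.__name__}] -> {value!r}")
    return value


steps = [str.strip, str.upper]
quiet = run_steps(steps, "  hello  ")

# Capture the verbose log so it can be compared with the quiet run.
buffer = io.StringIO()
with redirect_stdout(buffer):
    loud = run_steps(steps, "  hello  ", verbose=True)

print(quiet == loud)
print(buffer.getvalue())
```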

In [34]:
response = full_chain.run(article)

In [35]:
response

"Investors, the recent release of Coinbase's Q4 2022 earnings has brought mixed results. While the company reported a strong $605 million in revenue, it also posted a net loss of $557 million, disappointing Wall Street expectations. The stock is down in after-hours trading, despite a previous 86% year-to-date increase. Consumer and institutional trading volumes fell in Q4, reflecting the overall decline in the crypto market capitalization by $1.5 trillion in 2022. However, other revenues at Coinbase increased, and the company saw a significant rise in monthly active developers in the crypto space. Major brands like Starbucks and Nike entering the crypto market indicate potential growth opportunities. It is important to note that trading revenue at Coinbase fell in Q4 compared to Q3, highlighting the need for continued monitoring of market trends and diversification strategies."

## PAL Math Chain

#### What is PAL Chain?

PAL Chain, short for Program-Aided Language Models Chain, is a feature of the LangChain library. It is designed to decompose natural language problems into executable steps, generate corresponding code, and then execute this code to yield a result. As with other chains, a run starts with the `on_chain_start` callback and ends with either `on_chain_end` or `on_chain_error`.

PAL Chain requires a base language model (LLM) and a corresponding prompt to parse a user's question written in natural language. **It is particularly useful for reading complex math problems described in natural language, generating programs for solving these problems as intermediate reasoning steps, and then offloading the solution step to a runtime such as a Python interpreter**.

#### Key Features of PAL Chain

One of the key features of PAL Chain is its ability to perform validations on the generated code using `PALValidation`. It also uses an expression, typically `print(solution())`, to get the answer from the generated code.

PAL Chain also has a memory object that gets called at the start and end of every chain. This memory object is optional and defaults to None.

#### Important Considerations

While PAL Chain provides some basic guardrails such as limiting available locals/globals and inspecting the generated Python AST using `PALValidation`, these measures are not a replacement for a proper sandbox. Therefore, it is advised not to use this class on untrusted inputs, with elevated permissions, or without consulting your security team about proper sandboxing.
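To give a feel for what "inspecting the AST and limiting globals" means, here is a deliberately simplified sketch. It is not `PALValidation` and not LangChain code: `run_generated` is a hypothetical helper that rejects import statements and runs the code with no builtins. As the paragraph above stresses, checks like these are not a sandbox.

```python
import ast

# Example of code a model might generate for a word problem.
generated_code = '''
def solution():
    cindy_pets = 4
    marcia_pets = cindy_pets + 2
    jan_pets = 3 * marcia_pets
    return cindy_pets + marcia_pets + jan_pets
'''


def run_generated(code: str) -> str:
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # Reject imports outright; a real validator would check much more.
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed in generated code")
    # Expose no builtins to the generated code. This is NOT a sandbox.
    namespace = {"__builtins__": {}}
    exec(compile(tree, "<generated>", "exec"), namespace)
    return str(namespace["solution"]())


print(run_generated(generated_code))
```

Determined attackers can work around restrictions of this kind, so untrusted input still calls for process-level isolation.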

#### Examples of PAL Chain in Use

An example of PAL Chain in use can be seen in a math problem scenario. Given the problem "Jan has three times the number of pets as Marcia. Marcia has two more pets than Cindy. If Cindy has four pets, how many total pets do the three have?", PAL Chain can be used to generate and execute the necessary code to solve this problem.

#### Potential Issues

Users sometimes encounter errors when importing PAL Chain. In later LangChain releases, `PALChain` was moved out of the core package into the separate `langchain_experimental` package, so the import used here applies to the version pinned at the top of this notebook (0.0.208). In some cases, the generated code may also not be valid Python, leading to errors when trying to run the chain.

In [36]:
from langchain.chains import PALChain

In [None]:
llm = OpenAI(model_name='gpt-3.5-turbo',
             temperature=0,
             max_tokens=512)

In [43]:
pal_chain = PALChain.from_math_prompt(llm, verbose=True)

In [44]:
question = "Jan has three times the number of pets as Marcia. Marcia has two more pets than Cindy. If Cindy has four pets, how many total pets do the three have?"

In [45]:
question_02= "The cafeteria had 23 apples. If they used 20 for lunch and bought 6 more, how many apples do they have?"

In [47]:
pal_chain.run(question)



> Entering new  chain...
def solution():
    """Jan has three times the number of pets as Marcia. Marcia has two more pets than Cindy. If Cindy has four pets, how many total pets do the three have?"""
    cindy_pets = 4
    marcia_pets = cindy_pets + 2
    jan_pets = 3 * marcia_pets
    total_pets = cindy_pets + marcia_pets + jan_pets
    result = total_pets
    return result

> Finished chain.


'28'

## API Chains - OpenMeteo - Weather information

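Note: `APIChain` can throw errors depending on the length of the API response, typically when the response is too long to fit in the model's context window alongside the prompt.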


#### What is OpenMeteo?

**OpenMeteo is a free, open-source platform that provides a weather forecast API** for non-commercial use. **It allows users to retrieve weather data for various applications. OpenMeteo is particularly useful for developers, as it provides a simple and user-friendly way to consume weather data in Python applications via a client Python wrapper library known as openmeteopy**.

#### Key Features of OpenMeteo

One of the key features of OpenMeteo is its ability to provide weather forecasts for multiple locations at once, up to 1000 locations in a single go. This makes it a powerful tool for applications that require large-scale weather data.

OpenMeteo also maintains a blog where they share the latest developments around weather forecasts. This blog is hosted on Substack, a platform for independent writing, and has hundreds of readers.

#### OpenMeteo on AWS

OpenMeteo also has a presence on AWS Open Data, where users can contribute to the development of Open-Meteo's open data. This suggests that OpenMeteo is not just a platform for consuming weather data, but also a community-driven project where users can contribute to the improvement and expansion of the available data.

#### OpenMeteo's Python Library: openmeteopy

For Python developers, OpenMeteo provides a client Python wrapper library called openmeteopy. This library allows for quick and easy consumption of Open-Meteo data from Python applications. It provides a simple object model and is designed to be user-friendly, making it a valuable tool for developers working with weather data in Python.

#### Conclusion

In summary, OpenMeteo is a valuable resource for anyone needing to access weather data, especially for non-commercial use. Its open-source nature, coupled with its user-friendly Python library and ability to provide data for multiple locations at once, make it a versatile tool for developers and data enthusiasts alike.

In [56]:
from langchain import OpenAI
from langchain.chains.api.prompt import API_RESPONSE_PROMPT

from langchain.chains import APIChain
from langchain.prompts.prompt import PromptTemplate


In [55]:
llm = OpenAI(model_name="gpt-3.5-turbo",temperature=0,
             max_tokens=100)



In [57]:
from langchain.chains.api import open_meteo_docs
chain_new = APIChain.from_llm_and_api_docs(llm,
                                           open_meteo_docs.OPEN_METEO_DOCS,
                                           verbose=True)

In [60]:
chain_new.run('What is the temperature like right now in Sierra Leone, Africa in degrees Celsius?')



> Entering new  chain...
https://api.open-meteo.com/v1/forecast?latitude=8.460555&longitude=-11.779889&hourly=temperature_2m&current_weather=true&temperature_unit=celsius&timezone=Africa/Freetown
{"latitude":8.5,"longitude":-11.75,"generationtime_ms":0.0629425048828125,"utc_offset_seconds":0,"timezone":"Africa/Freetown","timezone_abbreviation":"GMT","elevation":170.0,"current_weather_units":{"time":"iso8601","interval":"seconds","temperature":"°C","windspeed":"km/h","winddirection":"°","is_day":"","weathercode":"wmo code"},"current_weather":{"time":"2024-05-10T14:30","interval":900,"temperature":32.3,"windspeed":7.1,"winddirection":221,"is_day":1,"weathercode":2},"hourly_units":{"time":"iso8601","temperature_2m":"°C"},"hourly":{"time":["2024-05-10T00:00","2024-05-10T01:00","2024-05-10T02:00","2024-05-10T03:00","2024-05-10T04:00","2024-05-10T05:00","2024-05-10T06:00","2024-05-10T07:00","2024-05-10T08:00","2024-05-10T09:00","2024-05-10T10:00","2024

'The current temperature in Sierra Leone, Africa is 32.3 degrees Celsius.'
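The chain's two visible steps (build a request URL, then read the answer out of the JSON) can be reproduced by hand. This sketch makes no network call: the endpoint and parameter names are taken from the URL logged above, `sample_response` is a shortened copy of the logged JSON, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build a request URL using parameters seen in the chain's log."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current_weather": "true",
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Shortened copy of the logged response, kept offline for the example.
sample_response = '{"current_weather": {"temperature": 32.3, "windspeed": 7.1}}'


def current_temperature(response_text: str) -> float:
    return json.loads(response_text)["current_weather"]["temperature"]


url = build_forecast_url(8.460555, -11.779889)
print(url)
print(current_temperature(sample_response))
```

In `APIChain` the model writes the URL from the API documentation and then summarizes the response, whereas here both steps are hard-coded.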

In [61]:
chain_new.run('Is it raining in Singapore?')



> Entering new  chain...
https://api.open-meteo.com/v1/forecast?latitude=1.3521&longitude=103.8198&hourly=rain&current_weather=true
{"latitude":1.375,"longitude":103.875,"generationtime_ms":0.08094310760498047,"utc_offset_seconds":0,"timezone":"GMT","timezone_abbreviation":"GMT","elevation":46.0,"current_weather_units":{"time":"iso8601","interval":"seconds","temperature":"°C","windspeed":"km/h","winddirection":"°","is_day":"","weathercode":"wmo code"},"current_weather":{"time":"2024-05-10T14:30","interval":900,"temperature":27.9,"windspeed":2.2,"winddirection":171,"is_day":0,"weathercode":3},"hourly_units":{"time":"iso8601","rain":"mm"},"hourly":{"time":["2024-05-10T00:00","2024-05-10T01:00","2024-05-10T02:00","2024-05-10T03:00","2024-05-10T04:00","2024-05-10T05:00","2024-05-10T06:00","2024-05-10T07:00","2024-05-10T08:00","2024-05-10T09:00","2024-05-10T10:00","2024-05-10T11:00","2024-05-10T12:00","2024-05-10T13:00","2024-05-10T14:00","2024-05-10T

'Based on the API response, it is not currently raining in Singapore. The hourly rain values for the next 7 days are all 0.00 mm, indicating no precipitation is expected.'
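Long hourly arrays are the main reason responses like the one above grow large. One possible mitigation, sketched here in plain Python, is to keep only the fields the question needs before the text reaches the model. `trim_response` is a hypothetical helper, not an `APIChain` option, and the sample data is made up to resemble the logged response.

```python
import json


def trim_response(response_text: str, keep=("current_weather",), max_chars=500) -> str:
    """Keep only selected top-level keys, then cap the length as a last resort."""
    data = json.loads(response_text)
    trimmed = {key: data[key] for key in keep if key in data}
    # Slicing may cut JSON mid-value; acceptable here because the text is
    # only read by a language model, not parsed again.
    return json.dumps(trimmed)[:max_chars]


# Made-up sample: one current reading plus a week of hourly values.
sample = json.dumps({
    "current_weather": {"temperature": 27.9, "weathercode": 3},
    "hourly": {"rain": [0.0] * 168},
})

short = trim_response(sample)
print(len(sample), len(short))
print(short)
```

The trade-off is that questions about the forecast, rather than current conditions, would need the `hourly` key kept as well.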
