
# Basic Prompting with LangChain

LangChain is a popular framework that allows users to quickly build apps and pipelines around **L**arge **L**anguage **M**odels. It can be used for chatbots, **G**enerative **Q**uestion-**A**nswering (GQA), summarization, and much more.


* **Prompt templates**: Reusable prompt strings with named placeholders (such as `{question}`) that are filled in at run time.

* **LLMs**: The large language model used here is FLAN-T5-large (`google/flan-t5-large`).


**Objectives**: We use a large language model to explore different styles of prompting with **LangChain prompt templates**:
1. Logical Thinking
2. Chain of Thought Prompting
3. Prompt Chaining

**Note**: LangChain is a comprehensive toolchain for prompting. These are basic examples that demonstrate chaining concepts. In the future I plan to write more on advanced prompt chaining with LangChain.

In [1]:
!pip install -qU langchain

## Setting up Hugging Face

- We need a Hugging Face account and a Hugging Face Hub API token. To get these, first create an account at huggingface.co and log in.
- To create the token, click the profile icon in the top-right corner > Settings > Access Tokens > New Token > set Role to write > Generate, then copy and paste the token below:

In [2]:
import os
# Set the Hugging Face API token so LangChain can call the Hugging Face Hub
os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'YOUR_HF_API_TOKEN'  # replace with your own token (keep it private)
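
Hardcoding a token in a notebook makes it easy to leak. As an alternative, here is a minimal sketch that reads the token from the environment and only prompts for it when it is missing. The helper name `ensure_hf_token` is hypothetical and not part of LangChain or Hugging Face:

```python
import os
from getpass import getpass


def ensure_hf_token(env_var: str = "HUGGINGFACEHUB_API_TOKEN") -> str:
    """Return the token stored in `env_var`, prompting for it if it is not set.

    `ensure_hf_token` is a hypothetical helper written for this example.
    """
    token = os.environ.get(env_var)
    if not token:
        # getpass hides the typed value, so the token does not appear in the cell output
        token = getpass(f"Enter value for {env_var}: ")
        os.environ[env_var] = token
    return token
```

Calling `ensure_hf_token()` once before creating the model sets the same environment variable as the cell above.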

We can then generate text with a Hugging Face Hub model (we'll use `google/flan-t5-large`) using LangChain's `PromptTemplate` format below.

_(The default Inference API doesn't use specialized hardware, so it can be slow and cannot run larger models like `google/flan-t5-xxl`.)_

In [12]:
from langchain.chains import LLMChain
from langchain.prompts.prompt import PromptTemplate
from langchain.llms import HuggingFaceHub


# initialize the Hugging Face Hub LLM
model = HuggingFaceHub(
    repo_id="google/flan-t5-large",
    # a very small positive temperature, intended to make the output as deterministic as possible
    model_kwargs={"temperature": 1e-50}
)

# build prompt template for simple question-answering
template = """Question: {question}

Answer: """
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(
    prompt=prompt,
    llm=model
)
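
To make the role of the template concrete, the sketch below fills the same template text with plain Python string formatting. It only illustrates the substitution step and does not call LangChain or the model. `fill_template` is a hypothetical helper, and LangChain's `PromptTemplate.format(question=...)` is expected to give the same string for this template.

```python
# Same template text as in the cell above
template = """Question: {question}

Answer: """


def fill_template(template: str, **variables: str) -> str:
    """Hypothetical helper: substitute named placeholders in `template`."""
    return template.format(**variables)


filled = fill_template(template, question="What is the capital of Germany?")
print(filled)
```

The filled string is what is actually sent to the model, which is why the wording around the placeholder matters as much as the question itself.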



In [19]:
# Prompting type 1 : logical thinking
question = """
I am riding a bicycle. The pedals are moving fast. I look into the
mirror and I am not moving. Why is this?
"""
print(llm_chain.run(question))

I am not moving because I am not moving.


In [14]:
# Prompting type 2:  Chain of Thought Prompting 
question="Answer the following by reasoning step-by-step: A Cafeteria had 10 bananas. If they used 4 for lunch. how many bananas do they have ?"
print(llm_chain.run(question))

They had 10 - 4 = 6 bananas left. Therefore, the final answer is 6.


If we'd like to ask multiple questions, we can pass a list of dictionary objects, where each dictionary must contain the input variable set in our prompt template (`"question"`), mapped to the question we'd like to ask.
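
A sketch of how that list of dictionaries can be built. The questions are illustrative, and the final commented line assumes the legacy `LLMChain.generate` method, which accepts such a list. It is left commented out because it needs a valid API token and network access:

```python
# The key of every dictionary must match the template's input variable
input_variable = "question"

questions = [
    "What is the capital of Germany?",
    "Who was the 1st person on the moon?",
]

# One dictionary per question
qs = [{input_variable: q} for q in questions]
print(qs)

# With the single-question chain defined earlier (assumed legacy LangChain API):
# result = llm_chain.generate(qs)
```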

**Grouping Questions Together**: Since this is an LLM, we can also try feeding in all the questions at once:

In [18]:
multi_template = """Answer the following questions one at a time.

Questions:
{questions}

Answers:
"""
long_prompt = PromptTemplate(
    template=multi_template,
    input_variables=["questions"]
)

llm_chain = LLMChain(
    prompt=long_prompt,
    llm=model
)

qs_str = (
    "What is the capital of Germany?\n" +
    "Who was the 1st person on the moon?"
)

print(llm_chain.run(qs_str))

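Berlin is the capital of Germany. The first person to walk on the moon was Buzz Aldrin

Note that the model's second answer is incorrect: the first person to walk on the moon was Neil Armstrong, not Buzz Aldrin.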
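
Prompt chaining, the third style listed in the objectives, means passing the output of one prompt into the next prompt. Below is a minimal sketch of that data flow. It assumes a hypothetical stand-in function `echo_llm` in place of a real model call (in this notebook, a call such as `llm_chain.run` would take its place) and a hypothetical helper `run_prompt_chain`:

```python
from typing import Callable, List


def run_prompt_chain(llm: Callable[[str], str], templates: List[str], first_input: str) -> str:
    """Fill each template's {input} slot with the previous reply and return the last reply."""
    value = first_input
    for template in templates:
        prompt = template.format(input=value)
        value = llm(prompt)
    return value


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: wraps the prompt in angle brackets
    so the flow of data through the chain is visible."""
    return f"<{prompt}>"


templates = [
    "Name the capital of {input}.",
    "Give one fact about {input}.",
]
result = run_prompt_chain(echo_llm, templates, "Germany")
print(result)
```

Because the stand-in simply echoes its prompt, the printed result shows the first prompt nested inside the second, which is exactly the dependency a real chain would have.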


In [22]:
# Release memory and invoke the garbage collector before moving on to other notebooks
import gc
import torch
# Uncomment to delete the objects created above
# del model
# del llm_chain
# del long_prompt
# del prompt

gc.collect()
torch.cuda.empty_cache()


In [24]:
# Monitor GPU memory
import torch

# Peak GPU memory allocated by PyTorch so far
max_memory_allocated = torch.cuda.max_memory_allocated()
# Total memory of GPU 0
total_memory = torch.cuda.get_device_properties(0).total_memory
# Memory not currently allocated by PyTorch (other processes are not counted)
available_memory = total_memory - torch.cuda.memory_allocated()

# Print the result
print(f"total_memory: {total_memory / 1024**3:.2f} GB")
# print(f"Peak GPU memory allocated by PyTorch: {max_memory_allocated / 1024**3:.2f} GB")
print(f"Available GPU memory: {available_memory / 1024**3:.2f} GB")


## Make sure the available GPU memory is about 14 GB before moving to the next assignment; otherwise, restart the session

total_memory: 14.76 GB
Available GPU memory: 14.76 GB
