## Question Answering Template using HuggingFaceHub (Google FLAN-T5 model)

In [1]:
from langchain import HuggingFaceHub
from langchain.chains import LLMChain



In [309]:
llm=HuggingFaceHub(repo_id='google/flan-t5-large', model_kwargs={'temperature':0})

In [3]:
from langchain.prompts import PromptTemplate

In [4]:
template=PromptTemplate(
    input_variables=['question'],
    template='''Question: {question}
    Answer:'''
)
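
As a rough check of what the model will actually receive, the template string can be rendered with plain `str.format`. This is only a sketch: it assumes that `PromptTemplate`'s default formatting behaves like `str.format` for this simple template, and `render_prompt` is a hypothetical helper, not part of LangChain. It shows that the indentation inside the triple-quoted string ends up in the prompt.

```python
# Same template text as in the cell above, kept as a plain string.
TEMPLATE = '''Question: {question}
    Answer:'''


def render_prompt(question: str) -> str:
    """Hypothetical helper: fill the {question} placeholder with str.format."""
    return TEMPLATE.format(question=question)


rendered = render_prompt('What is the capital of France?')
# repr() makes the newline and the leading spaces before "Answer:" visible.
print(repr(rendered))
```

If the extra spaces before `Answer:` are not wanted, the second line of the template can be written without indentation.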

In [5]:
llmchain=LLMChain(llm=llm,prompt=template)

#### Answering one question

In [6]:
llmchain.run(question='What is the capital of France?')

'paris'

#### Answering multiple questions in a single call

In [8]:
questions=[
    {'question':'What is the capital of France?'},
    {'question':'Where can we observe whale sharks in South America?'},
    {'question':"Which gas is most abundant in Earth's atmosphere?"},
    {'question':'What is the most efficient fuel for rockets to the International Space Station?'}
    
]

In [8]:
result=llmchain.generate(questions)

In [9]:
print(result)

generations=[[Generation(text='paris', generation_info=None)], [Generation(text='pacific', generation_info=None)], [Generation(text='nitrogen', generation_info=None)], [Generation(text='kerosene', generation_info=None)]] llm_output=None run=RunInfo(run_id=UUID('e3ef3c59-2d28-4bcc-9542-d106b7f8306d'))


In [10]:
result.dict()['generations'][0]

[{'text': 'paris', 'generation_info': None}]

In [11]:
for q,a in zip(questions, result.dict()['generations']):
    print(f'Question: {q["question"]}')
    print(f'Answer: {a[0]["text"]}')
    print('**************')

Question: What is the capital of France?
Answer: paris
**************
Question: Where can we observe whale sharks in South America?
Answer: pacific
**************
Question: Which gas is most abundant in Earth's atmosphere?
Answer: nitrogen
**************
Question: What is the most efficient fuel for rockets to the International Space Station?
Answer: kerosene
**************
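
The loop above relies on the nested shape of `result.dict()['generations']`: one inner list per question, each holding a dict with a `text` key. A minimal sketch of that extraction follows. It uses a hand-written dict copied from the output shown earlier instead of a live model call, and `first_answers` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def first_answers(result_dict: Dict[str, Any]) -> List[str]:
    """Return the text of the first generation for each input question."""
    return [gens[0]['text'] for gens in result_dict['generations']]


# Shape copied from the printed result above (no model call is made here).
sample = {
    'generations': [
        [{'text': 'paris', 'generation_info': None}],
        [{'text': 'pacific', 'generation_info': None}],
        [{'text': 'nitrogen', 'generation_info': None}],
        [{'text': 'kerosene', 'generation_info': None}],
    ]
}

answers = first_answers(sample)
print(answers)
```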


#### Trying to pass all the questions in llmchain.run()

In [12]:
result2=llmchain.run(questions)

In [13]:
result2

'helium'

#### Interestingly, we get only one answer when we send all 4 questions in a single chain.run call <br>
The main reason is that the whole list is pushed into the prompt as a single question, which does not make sense to the model. The answer 'helium' is also irrelevant to all four questions :)<br>
To sort this out, we need to update the template and hence the prompt.
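
One way to see why a single answer comes back is to render the original one-variable template with the whole list as its value. The sketch below uses plain `str.format` and assumes that `run()` binds its single positional argument to the template's only input variable. It does not call the model, and only two of the questions are included to keep it short.

```python
# The original one-question template from earlier in the notebook.
TEMPLATE = '''Question: {question}
    Answer:'''

questions = [
    {'question': 'What is the capital of France?'},
    {'question': "Which gas is most abundant in Earth's atmosphere?"},
]

# The list is converted to its string form and dropped into ONE prompt,
# so the model sees a single, oddly formatted "question".
single_prompt = TEMPLATE.format(question=questions)
print(single_prompt)
```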

In [310]:
template = """Answer the following questions one at a time.

Questions:
{questions}

Answers:
"""
input_variables=['questions']

In [311]:
prompt=PromptTemplate(input_variables=input_variables, template=template)

In [312]:
llmchain=LLMChain(llm=llm, prompt=prompt)

In [313]:
questions=[
    {'question':"What is the capital of France?"},
    {'question':"Where can we observe sharks?"},
    {'question':'What is the most efficient fuel for rockets to the International Space Station?'}
    
]

In [314]:
questions

[{'question': 'What is the capital of France?'},
 {'question': 'Where can we observe sharks?'},
 {'question': 'What is the most efficient fuel for rockets to the International Space Station?'}]

In [260]:
q_list=[q['question'] for q in questions]
q_list

['What is the capital of France?',
 'Where can we observe sharks?',
 'What is the most efficient fuel for rockets to the International Space Station?']

In [316]:
q_str='\n'.join([q['question'] for q in questions])
q_str

'What is the capital of France?\nWhere can we observe sharks?\nWhat is the most efficient fuel for rockets to the International Space Station?'

In [317]:
llmchain.run(questions=q_str,verbose=True)

'Paris is the capital of France. Sharks live in the ocean. Sharks live in the'

Hence, we see that prompting for multiple questions at once is not as robust as chain.generate().<br>
The chain starts repeating the second answer instead of answering the third question. <br>
The third question is completely ignored, and the output stops mid-sentence. Could this be because of a limit on the number of generated tokens? <br>

#### Now, modifying the questions slightly...

In [328]:
questions2=[
    {'question':"What is the capital of France?"},
    {'question':"Where can we observe sharks?"},
    {'question':'What is the most efficient fuel for rockets?'}
    
]

In [329]:
q_str2=''.join([q['question'] for q in questions2])
q_str2

'What is the capital of France?Where can we observe sharks?What is the most efficient fuel for rockets?'
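
Note that this string was built with `''.join`, whereas the earlier `q_str` used `'\n'.join`, so the questions now run together with nothing between them. A small sketch of the difference in plain Python:

```python
questions2 = [
    {'question': 'What is the capital of France?'},
    {'question': 'Where can we observe sharks?'},
    {'question': 'What is the most efficient fuel for rockets?'},
]
texts = [q['question'] for q in questions2]

with_newlines = '\n'.join(texts)     # one question per line
without_separator = ''.join(texts)   # questions run together

print(with_newlines)
print(without_separator)
```

Keeping one question per line makes the boundaries between questions explicit in the prompt, which is worth holding constant when comparing runs.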

In [330]:
llmchain.predict(questions=q_str2,verbose=True)

'The capital of France is Paris. The most efficient fuel for rockets is hydrogen. Hydrogen'

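Again, prompting with several questions at once is less robust than chain.generate().<br>
This time the chain skips the second question entirely.<br>
The third question gets a different answer ('hydrogen' rather than the 'kerosene' returned by chain.generate()), most likely because we dropped the "International Space Station" part. Note that q_str2 was also joined without a newline separator, which may have contributed.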

<b>Hence, it makes sense to segregate the questions and pass them individually to the model.</b>
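
A minimal sketch of that approach: call the model once per question and collect the answers in order. `answer_each` and `fake_model` are hypothetical names. `fake_model` is only a stand-in so the example runs without an API key; in the notebook it would be replaced by a real call, for example `llmchain.run` on the one-question template.

```python
from typing import Callable, Dict, List


def answer_each(questions: List[Dict[str, str]],
                ask: Callable[[str], str]) -> List[str]:
    """Send each question to `ask` separately and return the answers in order."""
    return [ask(q['question']) for q in questions]


def fake_model(question: str) -> str:
    """Stand-in for a real model call; returns canned text for illustration."""
    canned = {'What is the capital of France?': 'paris'}
    return canned.get(question, '(no canned answer)')


questions = [
    {'question': 'What is the capital of France?'},
    {'question': 'Where can we observe sharks?'},
]

answers = answer_each(questions, fake_model)
for q, a in zip(questions, answers):
    print(f'Question: {q["question"]}')
    print(f'Answer: {a}')
```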

## Text Summarization

#### Text summarization can be done by simply prompting the model to summarize the data provided.

In [272]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

In [274]:
llm=ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [284]:
prompt=PromptTemplate(
    input_variables=['text'],
    template='Summarize the following text to one sentence: {text}'
)

In [277]:
sample_text='LangChain provides many modules that can be used to build language model applications. Modules can be combined to create more complex applications, or be used individually for simple applications. The most basic building block of LangChain is calling an LLM on some input. Let’s walk through a simple example of how to do this. For this purpose, let’s pretend we are building a service that generates a company name based on what the company makes.'

In [293]:
print('Length of original text:',len(sample_text))

Length of original text: 448


In [285]:
prompt.format(text=sample_text)

'Summarize the following text to one sentence: LangChain provides many modules that can be used to build language model applications. Modules can be combined to create more complex applications, or be used individually for simple applications. The most basic building block of LangChain is calling an LLM on some input. Let’s walk through a simple example of how to do this. For this purpose, let’s pretend we are building a service that generates a company name based on what the company makes.'

In [289]:
llm.predict(prompt.format(text=sample_text))

'LangChain offers various modules for building language model applications, allowing users to combine them for more complex applications or use them individually for simpler ones, with the basic building block being calling an LLM on input, as demonstrated in the example of creating a company name based on its product.'

In [331]:
from langchain.chains import LLMChain

In [290]:
llmchain=LLMChain(prompt=prompt,llm=llm)

In [294]:
summ=llmchain.run(sample_text)

In [295]:
print('Length of summarized text:',len(summ))

Length of summarized text: 319


#### Hence, we can see that summarization reduced the length of the text from 448 to 319 characters, a reduction of around 29%
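
The percentage can be reproduced from the two character counts printed above. `reduction_percent` is a hypothetical helper name used only for this sketch.

```python
def reduction_percent(original_len: int, summary_len: int) -> float:
    """Relative reduction in length, expressed as a percentage of the original."""
    return (original_len - summary_len) / original_len * 100


# 448 and 319 are the character counts printed by the cells above.
pct = reduction_percent(448, 319)
print(f'{pct:.1f}%')  # 28.8%
```

These are character counts from `len()`, not token counts, so the figure describes text length rather than model cost.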

## Language Translation

In [296]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

In [298]:
llm=ChatOpenAI(model='gpt-3.5-turbo',temperature=0)

In [302]:
prompt=PromptTemplate(
    template='Translate the sentence {sentence} from {source_language} to {target_language}',
    input_variables=['sentence','source_language','target_language']
)

In [303]:
llmchain=LLMChain(prompt=prompt, llm=llm)

In [305]:
llmchain.run(sentence='Le traitement du langage naturel est la nouvelle interface PC',source_language="French",target_language='English')

'The natural language processing is the new PC interface.'