<a href="https://colab.research.google.com/github/Crisitunity-Lab/ARDC-Project/blob/main/Sandbox/Prototype_OpenAI_OutputParsers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using LangChain with Output Parsers
Output parsers specify the schema a language model should use when responding to a request, and then parse the model's reply into that structure. To use one, define a schema that describes each piece of information required.

## LLM Used
This notebook uses an OpenAI chat model as the base foundation model.

In [1]:
!pip -q install langchain openai tiktoken cohere


In [2]:
!pip show langchain

Name: langchain
Version: 0.0.286
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: aiohttp, async-timeout, dataclasses-json, langsmith, numexpr, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


In [3]:
# Insert OpenAI API Key
api_key = "<insert key>"
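Pasting the key into the notebook works, but it is easy to commit by accident. A minimal sketch of an alternative, assuming the key is stored in an environment variable: the helper name `load_api_key` is ours, and `OPENAI_API_KEY` is the conventional variable name rather than anything this notebook sets up.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value so it is not echoed into the notebook
        key = getpass(f"Enter {env_var}: ")
    return key
```

With this in place, the cell above could become `api_key = load_api_key()`.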

In [5]:
from langchain.chat_models import ChatOpenAI

# Set temp to zero to limit the variability of output from the model
chat_llm = ChatOpenAI(temperature=0.0, openai_api_key=api_key)

Create a template string with a placeholder for the tweet. The prompt gives the model some background on how to "think" about the question being posed.

In [6]:
template_string = """You are crisis response person that needs to understand the relevance of tweets. \
You help identify relevant information in a crisis.

Take the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.

tweet: ```{tweet}```
"""

In [7]:
from langchain.prompts import ChatPromptTemplate

# Create the prompt template that takes the raw blob with instructions
prompt_template = ChatPromptTemplate.from_template(template_string)

In [8]:
# What does the prompt look like?
prompt_template.messages[0].prompt

PromptTemplate(input_variables=['tweet'], output_parser=None, partial_variables={}, template='You are crisis response person that needs to understand the relevance of tweets. You help identify relevant information in a crisis.\n\nTake the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.\n\ntweet: ```{tweet}```\n', template_format='f-string', validate_template=True)
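The output above reports `template_format='f-string'`, meaning `{tweet}` is a named placeholder filled in at format time. As a rough illustration of the idea (plain Python, not LangChain's code), `str.format` does the same substitution on a shortened stand-in template:

```python
# Shortened stand-in for the notebook's template; only the placeholder matters here.
demo_template = "Classify the tweet below.\n\ntweet: {tweet}\n"

# str.format replaces the named placeholder with the supplied value.
filled = demo_template.format(tweet="Canmore declares state of emergency")
print(filled)
```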

In [9]:
# Pass in the first message from the 2013 Alberta Floods
msg = "RT @CBCAlerts: Canmore, Alta. declares state of emergency due to flooding  - with some residents being moved to community centre #Alberta"

In [10]:
# Add message to prompt template
tweet_response = prompt_template.format_messages(tweet=msg)

In [11]:
tweet_response

[HumanMessage(content='You are crisis response person that needs to understand the relevance of tweets. You help identify relevant information in a crisis.\n\nTake the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.\n\ntweet: ```RT @CBCAlerts: Canmore, Alta. declares state of emergency due to flooding  - with some residents being moved to community centre #Alberta```\n', additional_kwargs={}, example=False)]

In [12]:
# Run the model on the prompt template
consultant_response = chat_llm(tweet_response)

In [13]:
# Print output from the model
print(consultant_response.content)

Sentiment: The sentiment of the tweet is not explicitly mentioned. However, based on the information provided, it can be inferred that the sentiment is likely negative or concerning due to the declaration of a state of emergency and the need to move residents to a community centre.

Crisis Type: The crisis type mentioned in the tweet is flooding. Canmore, Alberta has declared a state of emergency specifically due to flooding.

Country: The tweet relates to Canada, specifically the province of Alberta. Canmore is a town located in Alberta.


The output from the model is reasonable, but it is wordy and free-form, so it may not be useful for our use case.

Next, we'll use the Structured Output Parser to get structured output from the model.

In [14]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

# Set the output Response Schema
sentiment_schema = ResponseSchema(name="sentiment",
                             description="This is the senitment of the tweet")

crisis_type_schema = ResponseSchema(name="crisis_type",
                                      description="This is type of crisis (e.g. flooding, bushfire, typhoon)")

country_schema = ResponseSchema(name="country",
                                    description="This is the country where the crisis is located")

response_schemas = [sentiment_schema,
                    crisis_type_schema,
                    country_schema]

In [15]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [16]:
# Set the format instructions to be passed to the model
format_instructions = output_parser.get_format_instructions()

In [17]:
# Show the format instructions
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"sentiment": string  // This is the senitment of the tweet
	"crisis_type": string  // This is type of crisis (e.g. flooding, bushfire, typhoon)
	"country": string  // This is the country where the crisis is located
}
```


In [18]:
# Now add the format instructions to the template string
template_string = """You are crisis response person that needs to understand the relevance of tweets. \
You help identify relevant information in a crisis.

Take the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.

tweet: ```{tweet}```

{format_instructions}
"""

In [20]:
# Create prompt template with the format instructions included
prompt = ChatPromptTemplate.from_template(template=template_string)

messages = prompt.format_messages(tweet=msg,
                                format_instructions=format_instructions)

messages

[HumanMessage(content='You are crisis response person that needs to understand the relevance of tweets. You help identify relevant information in a crisis.\n\nTake the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.\n\ntweet: ```RT @CBCAlerts: Canmore, Alta. declares state of emergency due to flooding  - with some residents being moved to community centre #Alberta```\n\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"sentiment": string  // This is the senitment of the tweet\n\t"crisis_type": string  // This is type of crisis (e.g. flooding, bushfire, typhoon)\n\t"country": string  // This is the country where the crisis is located\n}\n```\n', additional_kwargs={}, example=False)]

In [21]:
# Get response from model
response = chat_llm(messages)
response

AIMessage(content='```json\n{\n\t"sentiment": "neutral",\n\t"crisis_type": "flooding",\n\t"country": "Canada"\n}\n```', additional_kwargs={}, example=False)

In [22]:
# Return response as a dict to allow for easy extraction of formatted data
response_as_dict = output_parser.parse(response.content)
response_as_dict

{'sentiment': 'neutral', 'crisis_type': 'flooding', 'country': 'Canada'}
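To make the parse step less of a black box, here is a rough standard-library sketch of what turning the fenced reply into a dict involves. It is an approximation for illustration, not LangChain's implementation; `parse_fenced_json` and `FENCE` are names made up for this example.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example stays readable


def parse_fenced_json(text: str) -> dict:
    """Extract and decode the first fenced json block found in `text`."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no fenced json block found in model reply")
    return json.loads(match.group(1))


# Same content as the AIMessage shown above.
reply = (
    FENCE
    + 'json\n{\n\t"sentiment": "neutral",\n\t"crisis_type": "flooding",\n\t"country": "Canada"\n}\n'
    + FENCE
)
print(parse_fenced_json(reply))
```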

## Results
The output is nicely formatted and returns values that seem to be valid and may be useful. However, this has happened before, only for the next prompt to return something that doesn't match the required output.
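One way to guard against a reply that doesn't match the required output is to wrap the parse call and check the keys before using the result. This is a sketch under assumptions: `safe_parse` and `REQUIRED_KEYS` are hypothetical names, and the broad `except` is deliberate because we don't assume which exception type the parser raises.

```python
import json
from typing import Callable, Optional

REQUIRED_KEYS = {"sentiment", "crisis_type", "country"}


def safe_parse(parse: Callable[[str], dict], text: str) -> Optional[dict]:
    """Run `parse` on `text`; return None if parsing fails or keys are missing."""
    try:
        result = parse(text)
    except Exception:  # broad on purpose: the parser's exception type is not assumed
        return None
    if not isinstance(result, dict) or not REQUIRED_KEYS.issubset(result):
        return None
    return result


# json.loads stands in for the real parser in this demo.
good = '{"sentiment": "neutral", "crisis_type": "flooding", "country": "Canada"}'
print(safe_parse(json.loads, good))
print(safe_parse(json.loads, "Sentiment: likely negative"))  # not JSON, so None
```

In this notebook the call would look like `safe_parse(output_parser.parse, response.content)`, with a `None` result flagging the tweet for a retry or manual review.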

## Next Steps
Try more tweets from the 2013 Alberta floods, as well as the 2012 Colorado wildfires.

In [23]:
# Next message from the Alberta Floods
msg_2 = "RT @GlobalCalgary: If you are in #Canmore and need help, an emergency line has been set up. Pls call 403-678-1551. #abstorm #abflood"

In [24]:
prompt = ChatPromptTemplate.from_template(template=template_string)

messages = prompt.format_messages(tweet=msg_2,
                                format_instructions=format_instructions)

messages

[HumanMessage(content='You are crisis response person that needs to understand the relevance of tweets. You help identify relevant information in a crisis.\n\nTake the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.\n\ntweet: ```RT @GlobalCalgary: If you are in #Canmore and need help, an emergency line has been set up. Pls call 403-678-1551. #abstorm #abflood```\n\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"sentiment": string  // This is the senitment of the tweet\n\t"crisis_type": string  // This is type of crisis (e.g. flooding, bushfire, typhoon)\n\t"country": string  // This is the country where the crisis is located\n}\n```\n', additional_kwargs={}, example=False)]

In [25]:
response = chat_llm(messages)
response

AIMessage(content='```json\n{\n\t"sentiment": "neutral",\n\t"crisis_type": "flooding",\n\t"country": "Canada"\n}\n```', additional_kwargs={}, example=False)

In [26]:
response_as_dict = output_parser.parse(response.content)
response_as_dict

{'sentiment': 'neutral', 'crisis_type': 'flooding', 'country': 'Canada'}

This looks useful, but it is the same output as for the first tweet. That may be right, since it's the same crisis, but we want to make sure it's working correctly.

In [27]:
# Another message from the Alberta Floods
msg_3 = "RT @CgyCA: UPDATE: Erlton, Victoria Park, Cliff Bungalow and Inglewood added to #yyc evacuation order because of flooding. #abflood"

In [28]:
prompt = ChatPromptTemplate.from_template(template=template_string)

messages = prompt.format_messages(tweet=msg_3,
                                format_instructions=format_instructions)

messages

[HumanMessage(content='You are crisis response person that needs to understand the relevance of tweets. You help identify relevant information in a crisis.\n\nTake the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.\n\ntweet: ```RT @CgyCA: UPDATE: Erlton, Victoria Park, Cliff Bungalow and Inglewood added to #yyc evacuation order because of flooding. #abflood```\n\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"sentiment": string  // This is the senitment of the tweet\n\t"crisis_type": string  // This is type of crisis (e.g. flooding, bushfire, typhoon)\n\t"country": string  // This is the country where the crisis is located\n}\n```\n', additional_kwargs={}, example=False)]

In [29]:
response = chat_llm(messages)
response

AIMessage(content='```json\n{\n\t"sentiment": "neutral",\n\t"crisis_type": "flooding",\n\t"country": "Canada"\n}\n```', additional_kwargs={}, example=False)

In [30]:
response_as_dict = output_parser.parse(response.content)
response_as_dict

{'sentiment': 'neutral', 'crisis_type': 'flooding', 'country': 'Canada'}

Tweets from the 2013 Alberta floods are returning structured and predictable (maybe too predictable) dicts. Let's try another crisis to see whether the model picks up a different crisis type in a different country.
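Each run so far repeats the same steps: format the prompt, call the model, parse the reply. A small helper could bundle them. This is a sketch, with `classify_tweet` as a made-up name; it only uses calls that already appear in this notebook (`format_messages`, `get_format_instructions`, calling the model, and `parse`).

```python
def classify_tweet(tweet, prompt, llm, parser):
    """Format the prompt, call the model, and parse the reply into a dict."""
    messages = prompt.format_messages(
        tweet=tweet,
        format_instructions=parser.get_format_instructions(),
    )
    response = llm(messages)
    return parser.parse(response.content)


# Intended use with the objects defined earlier in this notebook:
# classify_tweet(msg_3, prompt, chat_llm, output_parser)
```

The cells below keep the steps spelled out so the intermediate messages stay visible.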

In [31]:
# Message from the 2012 Colorado wildfires
msg_4 = "#Media Large wildfire in N. Colorado prompts evacuations: Crews are battling a fast-moving wildf... http://t.co/ju1BGTKH #Politics #News"

In [32]:
prompt = ChatPromptTemplate.from_template(template=template_string)

messages = prompt.format_messages(tweet=msg_4,
                                format_instructions=format_instructions)

messages

[HumanMessage(content='You are crisis response person that needs to understand the relevance of tweets. You help identify relevant information in a crisis.\n\nTake the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.\n\ntweet: ```#Media Large wildfire in N. Colorado prompts evacuations: Crews are battling a fast-moving wildf... http://t.co/ju1BGTKH #Politics #News```\n\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"sentiment": string  // This is the senitment of the tweet\n\t"crisis_type": string  // This is type of crisis (e.g. flooding, bushfire, typhoon)\n\t"country": string  // This is the country where the crisis is located\n}\n```\n', additional_kwargs={}, example=False)]

In [33]:
response = chat_llm(messages)
response

AIMessage(content='```json\n{\n\t"sentiment": "neutral",\n\t"crisis_type": "wildfire",\n\t"country": "United States"\n}\n```', additional_kwargs={}, example=False)

In [34]:
response_as_dict = output_parser.parse(response.content)
response_as_dict

{'sentiment': 'neutral', 'crisis_type': 'wildfire', 'country': 'United States'}

The output looks useful, and the country and crisis type have changed to align with the new information.

In [35]:
# Another message from the 2012 Colorado wildfires
# This tweet is interesting as it mentions "Belgium". Will the model identify the US as the country?
msg_5 = "RT @TheUmno: MT @KellyNehls: #photo Unedited photo from New Belgium Brewery #highparkfire #fortcollins #loc New Belgium Brewery # http:/ ..."

prompt = ChatPromptTemplate.from_template(template=template_string)

messages = prompt.format_messages(tweet=msg_5,
                                format_instructions=format_instructions)

messages

[HumanMessage(content='You are crisis response person that needs to understand the relevance of tweets. You help identify relevant information in a crisis.\n\nTake the tweet below delimited by triple backticks and use it to understand the sentiment, the crisis type and country it relates to.\n\ntweet: ```RT @TheUmno: MT @KellyNehls: #photo Unedited photo from New Belgium Brewery #highparkfire #fortcollins #loc New Belgium Brewery # http:/ ...```\n\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"sentiment": string  // This is the senitment of the tweet\n\t"crisis_type": string  // This is type of crisis (e.g. flooding, bushfire, typhoon)\n\t"country": string  // This is the country where the crisis is located\n}\n```\n', additional_kwargs={}, example=False)]

In [36]:
response = chat_llm(messages)
response

AIMessage(content='```json\n{\n\t"sentiment": "neutral",\n\t"crisis_type": "wildfire",\n\t"country": "United States"\n}\n```', additional_kwargs={}, example=False)

In [37]:
response_as_dict = output_parser.parse(response.content)
response_as_dict

{'sentiment': 'neutral', 'crisis_type': 'wildfire', 'country': 'United States'}

The country has been predicted correctly.

## Results
The method seems to produce predictable results that look correct.
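To put a number on "predictable", the five parsed dicts recorded above can be tallied per field. The list below is copied from this notebook's outputs; `tally_fields` is a helper name made up for the example.

```python
from collections import Counter

# The five parsed outputs recorded earlier in this notebook, in order.
results = [
    {"sentiment": "neutral", "crisis_type": "flooding", "country": "Canada"},
    {"sentiment": "neutral", "crisis_type": "flooding", "country": "Canada"},
    {"sentiment": "neutral", "crisis_type": "flooding", "country": "Canada"},
    {"sentiment": "neutral", "crisis_type": "wildfire", "country": "United States"},
    {"sentiment": "neutral", "crisis_type": "wildfire", "country": "United States"},
]


def tally_fields(rows):
    """Count how often each value appears for every field."""
    fields = rows[0].keys()
    return {field: Counter(row[field] for row in rows) for field in fields}


for field, counts in tally_fields(results).items():
    print(field, dict(counts))
```

Crisis type and country track the event, while sentiment is "neutral" for all five tweets, so it may be worth testing tweets where a different sentiment would be expected.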

## Next steps
- Can Falcon 7B/Llama be used with the StructuredOutputParser? If so, will it work as well?

- Can this process be scaled?

- Can it predict informativeness?
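As a starting point for the scaling question, a batch loop could record failures instead of stopping at the first bad reply. This is only a sketch: `classify_batch` is a hypothetical name, `classify` is any function that maps a tweet to a dict, and rate limits and API cost are not handled.

```python
from typing import Callable, Iterable


def classify_batch(tweets: Iterable[str], classify: Callable[[str], dict]) -> list:
    """Classify each tweet, recording failures instead of stopping the run."""
    rows = []
    for tweet in tweets:
        try:
            rows.append({"tweet": tweet, "result": classify(tweet), "error": None})
        except Exception as exc:  # keep going; one bad reply should not end the batch
            rows.append({"tweet": tweet, "result": None, "error": repr(exc)})
    return rows
```

The list of rows can then be loaded into a table to see how often the model's reply failed to parse.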

## Questions/Thoughts
Have these tweets been fed into the model during the training process? If so, is the model just regurgitating what it already knows? Would it be as effective on an unseen dataset?