Thanks to the starter code: https://colab.research.google.com/drive/14GQw8HW8TllB_S3enqotM3dXU7Pav9e_


In [None]:
import os

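# Restrict this process to one GPU ('0' is the first, '1' the second, ...).
# CUDA renumbers the visible devices from 0, so GPU '1' is addressed as cuda:0.
# This must be set before torch initialises CUDA.
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number GPUs as nvidia-smi does
os.environ['CUDA_VISIBLE_DEVICES'] = '1'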

# To disable CUDA from seeing any GPUs, set it to an empty string
# os.environ['CUDA_VISIBLE_DEVICES'] = ''

# TODO: set your Hugging Face cache directory. On the dgxrise cluster, download large models to raid/ rather than home/.
# os.environ["HF_HOME"] = ""
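CUDA only enumerates the devices listed in `CUDA_VISIBLE_DEVICES` and renumbers them from 0, so with `'1'` above the selected GPU appears to PyTorch as `cuda:0`; an empty string hides every GPU. By default CUDA orders GPUs fastest-first, which can differ from `nvidia-smi`, and that is what the commented-out `CUDA_DEVICE_ORDER="PCI_BUS_ID"` line changes. The helper below is a hypothetical, stdlib-only sketch of the renumbering; it does not model every rule CUDA applies to malformed lists.

```python
import os

def visible_gpu_mapping(env_value):
    """Hypothetical helper: map logical CUDA indices (what PyTorch sees)
    to the entries listed in CUDA_VISIBLE_DEVICES."""
    if env_value is None:
        return None  # variable unset: all GPUs visible, no remapping
    ids = [part.strip() for part in env_value.split(",") if part.strip()]
    # Visible devices are renumbered from 0 in the order they are listed
    return {logical: physical for logical, physical in enumerate(ids)}

print(visible_gpu_mapping(os.environ.get("CUDA_VISIBLE_DEVICES")))
```

Run after the cell above, this prints `{0: '1'}`: the only visible GPU is the one listed as `1`, now addressed as index 0.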

In [None]:
from langchain.chains import LLMChain, SequentialChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFacePipeline

from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer, pipeline

from datetime import datetime

import torch
import pandas as pd

In [None]:
# TODO: set your huggingface access token
access_token = ''

In [None]:
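from transformers import BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id, token=access_token)

# 4-bit NF4 quantization: the bnb_4bit_* options belong in a BitsAndBytesConfig
quant_config = BitsAndBytesConfig(load_in_4bit=True,
                                  bnb_4bit_quant_type="nf4",
                                  bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(model_id,
                                             device_map='auto',
                                             torch_dtype=torch.float16,
                                             quantization_config=quant_config,
                                             token=access_token)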

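# The model is already loaded and placed on the GPU, so no dtype/device_map here
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                # Upper bound only; generation normally stops at EOS. Llama-2 was
                # trained with a 4096-token context (prompt + output combined).
                max_new_tokens=5000,
                # do_sample=True with top_k=1 always picks the most likely token,
                # i.e. greedy decoding, so request greedy decoding directly
                do_sample=False,
                num_return_sequences=1,
                eos_token_id=tokenizer.eos_token_id
                )

# model_kwargs is not applied to an existing pipeline, so all generation
# settings live on `pipe` above
llm = HuggingFacePipeline(pipeline=pipe)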
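Why `do_sample=True` combined with `top_k=1` amounts to greedy decoding: top-k filtering keeps only the k highest-scoring tokens before sampling, and temperature only rescales the scores without changing their order, so with k=1 the single survivor is always the argmax. The sketch below is a pure-Python illustration of temperature plus top-k sampling over made-up scores, not the transformers implementation.

```python
import math
import random

def sample_top_k(logits, k, temperature, rng):
    """Temperature scaling, top-k filtering, then sampling (illustrative)."""
    scaled = [x / temperature for x in logits]
    # Indices of the k highest scaled scores
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax over the surviving candidates (max subtracted for stability)
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

logits = [1.2, 3.5, 0.3, 3.4]  # made-up scores for a 4-token vocabulary
greedy = max(range(len(logits)), key=lambda i: logits[i])

rng = random.Random(0)
top1_draws = {sample_top_k(logits, k=1, temperature=0.7, rng=rng) for _ in range(1000)}
top4_draws = {sample_top_k(logits, k=4, temperature=0.7, rng=rng) for _ in range(1000)}
print("greedy pick:", greedy)
print("top_k=1 draws:", top1_draws)
print("top_k=4 draws:", top4_draws)
```

Every `top_k=1` draw equals the greedy pick, while `top_k=4` produces several different tokens, which is where a temperature such as 0.7 would start to matter.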

In [None]:
instruction = """
Your task is to carefully read the provided Wikipedia article and identify specific instances of impacts related to the climate event described. 
For every distinct impact mentioned, regardless of how many times similar impacts are noted, record each as a separate JSON object. 
**Ensure that the locations and perils for each impact are specific to the incident being described, not just general to the entire event.** 
It is crucial that you only use information explicitly mentioned in the article: do not infer, assume, or invent any data. Copy the exact values and details as they are provided in the text for each of the following fields:

Instructions for each impact:
- Record each impact as a separate JSON object whenever the article mentions any impacts that fall into the categories listed below.
- The "Perils" should be specific to the cause of the impact mentioned (e.g., flood, storm).
- The "Locations" must be as specific as possible, relating directly to where the impact occurred (e.g., a city or town if mentioned, not just the country or region).
- For dates, use 'Single_Date' for impacts mentioned on a specific day, and 'Start_Date' and 'End_Date' for impacts occurring over a period.

Impact Categories (read each description carefully and make sure you understand what each category covers):
- Total_Death: all people killed in the event, directly or indirectly
- Num_Injured: all people injured in the event, directly or indirectly
- Num_Displaced: all people displaced by the event. Note: only extract this when the text explicitly uses a form of "displace". If the text refers to groups, such as "500 families displaced", return the result as "500 families"
- Num_Homeless: all people left homeless by the event. Note: only extract this when the text explicitly says "homeless". If the text refers to groups, such as "500 families homeless", return the result as "500 families"
- Num_Affected: all people affected by the event. Note: only extract this when the text explicitly uses a form of "affect", "impact" or "influence". If the text refers to groups, such as "500 families affected", return the result as "500 families"
- Insured_Damage: the insured losses or damage
- Total_Damage: the total economic loss or damage
- Buildings_Damaged: the total number of buildings damaged by the event

Format your response as follows for each impact:

```
{{
  "impact 1": {{
    "Impact_Category": "",
    "Impact_Data": "",
    "Perils": "",
    "Locations": "",
    "Single_Date": "",
    "Start_Date": "",
    "End_Date": ""
  }},
  "impact 2": {{
    # Another JSON object for the next distinct impact
  }},
  ...
}}
```
Fill in "Impact_Category" with the specific category (e.g., "Total_Death", "Num_Affected") and "Impact_Data" with the actual data extracted from the article. 
Remember, each JSON object should represent a unique impact mention from the article, with all relevant details transcribed exactly as presented. Do not fill in any field with assumed or inferred information.

Note: Do not generalize the 'Perils' and 'Locations' fields; they must directly correspond to the specific impact described. If the article does not provide explicit information for a field, leave it as null instead of 0.
If the article does not mention some of the impact categories, do not create JSON objects for them.
"""
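The JSON braces in the instruction are doubled (`{{ ... }}`) because the text later becomes a `PromptTemplate`, whose default "f-string" format follows Python's `str.format` brace rules: `{name}` is a placeholder and `{{`/`}}` produce literal braces. A small stdlib sketch with a stripped-down stand-in template (not the real instruction) shows both cases:

```python
# Literal braces are doubled; {wiki_input} stays a placeholder
template = 'Article: {wiki_input}\nReturn JSON like {{"impact 1": {{"Impact_Category": ""}}}}'
rendered = template.format(wiki_input="The storms resulted in 2 fatalities.")
print(rendered)

# A single, unescaped brace is parsed as a placeholder, so formatting fails
unescaped_error = None
try:
    '{"Impact_Category": ""}'.format()
except KeyError as err:
    unescaped_error = err
print("unescaped braces raise:", repr(unescaped_error))
```

Any new literal `{` or `}` added to the instruction therefore needs to be doubled as well.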

In [None]:

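def get_prompt_template(instruction=instruction):
    """Build the prompt: the article goes in the <<SYS>> block and the
    extraction instruction in the [INST] block."""
    # Llama-2 chat markers
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

    # System prompt carrying the article; PromptTemplate fills in {wiki_input}
    SYSTEM_PROMPT = B_SYS + "\"Article Text\": {wiki_input}" + E_SYS

    # Complete prompt with instructions. Note: the standard Llama-2 chat format
    # nests the <<SYS>> block inside the first [INST] ... [/INST]; here it precedes it.
    prompt_template = SYSTEM_PROMPT + B_INST + instruction + E_INST
    
    # print(f"prompt_template:{prompt_template}")

    return prompt_template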
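For comparison with the layout above, this is a sketch of the standard single-turn Llama-2 chat layout, where the system block sits inside the first `[INST] ... [/INST]` pair. `build_llama2_chat_prompt` is a hypothetical helper; the BOS token `<s>` is left out on the assumption that the tokenizer prepends it, which the Llama-2 tokenizer normally does.

```python
def build_llama2_chat_prompt(system_prompt, user_message):
    """Hypothetical helper: standard single-turn Llama-2 chat layout."""
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt.strip()}\n"
        "<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )

# {wiki_input} is inserted as literal text, so the result can still be
# passed to PromptTemplate as a template string
example = build_llama2_chat_prompt('"Article Text": {wiki_input}', "Extract the impacts.")
print(example)
```

Swapping this layout in for the one in `get_prompt_template` would make it possible to check whether placing the system block outside `[INST]` affects the extraction quality.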

In [None]:
# PromptTemplate fills each name in input_variables (a list of strings) into
# the matching {placeholder} of the template string
prompt_template = get_prompt_template()
prompt = PromptTemplate(template=prompt_template, input_variables=['wiki_input'])

llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=False)

In [None]:
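def generate_exp_log(out_dir, file_name, instruction, wiki_input, response):
    """Write the instruction, article and model response of one run to
    <out_dir>/<MMDDHHMM>-<file_name>.txt."""
    os.makedirs(out_dir, exist_ok=True)  # create the results folder if missing
    timestamp = datetime.now().strftime("%m%d%H%M")
    path = os.path.join(out_dir, f"{timestamp}-{file_name}.txt")
    with open(path, 'w', encoding='utf-8') as file: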
        file.write("="*20+"Instruction"+"="*20)
        file.write(f"{instruction}\n\n")
        file.write("="*20+"Wiki Input"+"="*20)
        file.write(f"\n{wiki_input}\n\n")
        file.write("="*20+"Response"+"="*20)
        file.write(f"\n{response}\n\n")
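To compare runs later, the logs can be split back into their three sections using the `"=" * 20` headers that `generate_exp_log` writes. `parse_exp_log` is a hypothetical stdlib-only helper; it assumes the headers are intact and that the logged text does not itself contain them.

```python
import re

# Matches the "=" * 20 + name + "=" * 20 section headers
SECTION_RE = re.compile(r"={20}(Instruction|Wiki Input|Response)={20}")

def parse_exp_log(text):
    """Hypothetical helper: split a log into {section name: body}."""
    parts = SECTION_RE.split(text)
    # re.split with one capture group gives [preamble, name1, body1, name2, body2, ...]
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}

sample_log = (
    "=" * 20 + "Instruction" + "=" * 20 + "\nExtract impacts.\n\n"
    + "=" * 20 + "Wiki Input" + "=" * 20 + "\nSome article text.\n\n"
    + "=" * 20 + "Response" + "=" * 20 + "\n{}\n\n"
)
print(parse_exp_log(sample_log))
```

For a real log, pass the file contents, e.g. the result of reading it with `open(path, encoding="utf-8")`.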

In [None]:
# A single manual input (for testing)
wiki_input = """The 2008 Queensland storms were a series of three thunderstorms that struck South East Queensland, Australia. The first storm hit on 16 November 2008 and was followed by two further storms on 19 and 20 November. The storms resulted in 2 fatalities.

Background
First incident
Winds reached 130 kilometres (81 mi)/h,[2] equivalent to a Category 3 cyclone. Electricity was cut to about 230,000 homes and about 4,000 homes were left in need of serious repair, and at least 30 houses were beyond repair. Some of the worst hit areas included The Gap, Kenmore, Arana Hills and Albany Creek. Forecasters at the Bureau of Meteorology rated the storm the worst to hit south-east Queensland since 2004 & The Biggest to The South east Queensland region since 1985.[3]

Second incident
Torrential rain affected areas of Brisbane, Ipswich and Toowoomba, with rainfall reaching more than 250 mm in some locations. The Ipswich and Marburg areas were the worst affected, whilst four homes in inner-city Paddington were unroofed. The Inner City Bypass was flooded and forced to close, as was the Moggill Ferry.

Third incident
Blackwater, near Emerald in Central Queensland, was hit hard by the third storm, with 20 houses sustaining roof damage and up to 100 more were damaged in some way. Gregory MP Vaughan Johnson said the storm was terrifying. He said he was driving to an appointment when the storm slammed into Blackwater, forcing him to turn back to Emerald. Hail was described as the size of golf balls and witnesses saw a lot of damage."""

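# LLMChain.invoke returns a dict; the generated text is stored under "text"
response = llm_chain.invoke({"wiki_input": wiki_input})["text"]
print("Response generated!")
print(response)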

generate_exp_log('./results/reversed_prompt/', 'reversed_prompt', instruction, wiki_input, response)
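A possible next step is to check the reply against the format the instruction asks for. The sketch below is one minimal approach, assuming the reply contains a single JSON object: it takes everything from the first `{` to the last `}`, parses it, and flags unknown category names and impacts that set both `Single_Date` and a date range. `extract_impacts` and `ALLOWED_CATEGORIES` are hypothetical names, and the sample reply is hand-written, not real model output.

```python
import json
import re

# Category names exactly as listed in the instruction prompt
ALLOWED_CATEGORIES = {
    "Total_Death", "Num_Injured", "Num_Displaced", "Num_Homeless",
    "Num_Affected", "Insured_Damage", "Total_Damage", "Buildings_Damaged",
}

def extract_impacts(reply):
    """Parse the JSON object in a model reply; return (impacts, problems)."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        return {}, ["no JSON object found"]
    try:
        impacts = json.loads(match.group(0))
    except json.JSONDecodeError as err:
        return {}, [f"invalid JSON: {err}"]
    if not isinstance(impacts, dict):
        return {}, ["top-level JSON value is not an object"]

    problems = []
    for key, impact in impacts.items():
        if not isinstance(impact, dict):
            problems.append(f"{key}: not a JSON object")
            continue
        category = impact.get("Impact_Category")
        if category not in ALLOWED_CATEGORIES:
            problems.append(f"{key}: unknown category {category!r}")
        # The prompt asks for either Single_Date or a Start_Date/End_Date range
        if impact.get("Single_Date") and (impact.get("Start_Date") or impact.get("End_Date")):
            problems.append(f"{key}: both Single_Date and a date range are set")
    return impacts, problems

# Hand-written example reply (not real model output)
sample_reply = """Here are the impacts:
{"impact 1": {"Impact_Category": "Total_Death", "Impact_Data": "2",
              "Perils": "thunderstorm", "Locations": "South East Queensland",
              "Single_Date": null, "Start_Date": "16 November 2008",
              "End_Date": "20 November 2008"},
 "impact 2": {"Impact_Category": "Total_Deaths", "Impact_Data": "2"}}"""

impacts, problems = extract_impacts(sample_reply)
print(len(impacts), problems)
```

In the notebook, `extract_impacts` can be applied to the generated text (if `response` is still the dict returned by `LLMChain.invoke`, use `response["text"]`). Models often wrap JSON in extra prose or code fences, which the first-to-last-brace match tolerates, but a reply with several separate JSON objects would need a different strategy.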