<a href="https://colab.research.google.com/github/Crisitunity-Lab/ARDC-Project/blob/main/Notebooks/Llama2_prod_r6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Llama2 Production

SUMA 08/10/2023

# Instructions

Running code on Google Colab
1.   Create HuggingFace account
2.   Gain access to model: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
3.   Create folder 'data' in project folder on personal Google Drive
4.   Save data set CrisisLexT26 in folder 'data'
5.   Data sourced from: https://crisislex.org/data-collections.html#CrisisLexT26
6.   Mount Google Drive
7.   Update code with Paths
8.   Set system up as per steps below
9.   Run Experiments
10.  Evaluate
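
Before running the notebook, it can help to confirm that the data folder from steps 3 and 4 has the layout the later cells expect. The following is a minimal sketch, not part of the original notebook: `summarise_dataset` is a hypothetical helper, and the demo file names are made up. It only roughly mirrors the filters used further down (CSV files whose names contain "labeled" or "period", plus JSON files).

```python
import os
import tempfile


def summarise_dataset(main_folder):
    """Count, per crisis subfolder, the kinds of files the notebook reads later."""
    summary = {}
    for entry in sorted(os.listdir(main_folder)):
        subfolder_path = os.path.join(main_folder, entry)
        if not os.path.isdir(subfolder_path):
            continue
        counts = {"labeled_csv": 0, "period_csv": 0, "json": 0}
        for _, _, files in os.walk(subfolder_path):
            for name in files:
                lower = name.lower()
                if lower.endswith(".csv") and "labeled" in lower:
                    counts["labeled_csv"] += 1
                elif lower.endswith(".csv") and "period" in lower:
                    counts["period_csv"] += 1
                elif lower.endswith(".json"):
                    counts["json"] += 1
        summary[entry] = counts
    return summary


# Demo on a temporary folder; the file names here are illustrative only.
with tempfile.TemporaryDirectory() as root:
    crisis_dir = os.path.join(root, "2013_Alberta_floods")
    os.makedirs(crisis_dir)
    for name in ("demo-labeled.csv", "demo-period.csv", "demo-event.json"):
        open(os.path.join(crisis_dir, name), "w").close()
    demo_summary = summarise_dataset(root)

print(demo_summary)
```

On Colab, the same function could be pointed at the mounted Drive path once it has been updated in step 7; a subfolder with zero counts would suggest the data set was not copied completely.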

## Setup

In [1]:
# Install packages
!pip install -q transformers einops accelerate langchain bitsandbytes
!pip install sentencepiece
!pip install pycountry


In [2]:
# Import Packages
from langchain import PromptTemplate,  LLMChain
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
import transformers
import torch
import pandas as pd
import numpy as np
import json
import os
import csv
import time
import re
import matplotlib.pyplot as plt
import textwrap
import pycountry



In [3]:
# Mounting GDrive
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


### Prerequisites

To load our desired model, `meta-llama/Llama-2-7b-chat-hf`, we first need to authenticate ourselves on Hugging Face. This ensures we have the correct permissions to fetch the model.

1. Gain access to the model on Hugging Face: [Link](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
2. Use the Hugging Face CLI to login and verify your authentication status.


In [4]:
# Logging onto HuggingFace
!huggingface-cli login



    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


## Preparing data

In [5]:
# Read (timestamps)

# Path to the main folder
main_folder = '/content/gdrive/MyDrive/iLab2/data/CrisisLexT26'

# Function to process CSV files in a folder
def process_folder(folder_path):
    dfs = []
    for root, _, files in os.walk(folder_path):
        for file in files:
            if file.endswith('.csv') and 'period' in file.lower():
                file_path = os.path.join(root, file)
                df = pd.read_csv(file_path)
                dfs.append(df)
    return dfs

# Read and process each subfolder
combined_data = []
for subfolder in os.listdir(main_folder):
    subfolder_path = os.path.join(main_folder, subfolder)
    if os.path.isdir(subfolder_path):
        subfolder_data = process_folder(subfolder_path)
        combined_data.extend(subfolder_data)

# Concatenate all data into one DataFrame
combined_df_p = pd.concat(combined_data, ignore_index=True)


# Remove spaces from column names
combined_df_p.columns = combined_df_p.columns.str.replace(' ', '')

# Rename Columns in prep for left join
combined_df_p.rename(columns={'Tweet-ID': 'Tweet ID'}, inplace=True)

# # Save the combined DataFrame to a CSV file
# output_file = 'combined_data.csv'
# combined_df_p.to_csv(output_file, index=False)

print("Combined timestamps read")


Combined timestamps read


In [6]:
# Read tweets

# Path to the main folder
main_folder = '/content/gdrive/MyDrive/iLab2/data/CrisisLexT26'

# Function to process CSV files in a folder, add folder name as label, and add subfolder name as a column
def process_folder(folder_path, label, subfolder_name):
    dfs = []
    for root, _, files in os.walk(folder_path):
        for file in files:
            if file.endswith('.csv') and 'labeled' in file.lower():
                file_path = os.path.join(root, file)
                df = pd.read_csv(file_path)
                df['Label'] = label
                df['subfolder_name'] = subfolder_name  # Add subfolder name as a column
                dfs.append(df)
    return dfs

# Read and process each subfolder
combined_data = []
for subfolder in os.listdir(main_folder):
    subfolder_path = os.path.join(main_folder, subfolder)
    if os.path.isdir(subfolder_path):
        label = subfolder
        subfolder_data = process_folder(subfolder_path, label, subfolder)
        combined_data.extend(subfolder_data)

# Concatenate all data into one DataFrame
combined_df_l = pd.concat(combined_data, ignore_index=True)

# Confirm the read and show the first few rows of the combined DataFrame
print("Combined tweets read")
combined_df_l.head(10)


Combined tweets read


Unnamed: 0,Tweet ID,Tweet Text,Information Source,Information Type,Informativeness,Label,subfolder_name
0,347686624563429378,"RT @CBCAlerts: Canmore, Alta. declares state o...",Media,Affected individuals,Related and informative,2013_Alberta_floods,2013_Alberta_floods
1,347766337344503808,RT @GlobalCalgary: If you are in #Canmore and ...,Media,Caution and advice,Related and informative,2013_Alberta_floods,2013_Alberta_floods
2,347779159327637504,RT @metrocalgary: UPDATE: Latest from the @cit...,Government,Caution and advice,Related and informative,2013_Alberta_floods,2013_Alberta_floods
3,347783236191129600,RT @GlobalCalgary: GALLERY: Incredible photos ...,Media,Not applicable,Related and informative,2013_Alberta_floods,2013_Alberta_floods
4,347793432514801665,RT @nenshi: Major risk of flooding in Calgary....,Government,Caution and advice,Related and informative,2013_Alberta_floods,2013_Alberta_floods
5,347793818403344384,RT @weathernetwork: RT: Residents of #HighRive...,Media,Affected individuals,Related and informative,2013_Alberta_floods,2013_Alberta_floods
6,347802211222433792,RT @BrockWHarrison: . @ElectDanielle and other...,NGOs,Affected individuals,Related and informative,2013_Alberta_floods,2013_Alberta_floods
7,347804916514951168,Lots of #abflood updates on our liveblog http:...,Media,Other Useful Information,Related and informative,2013_Alberta_floods,2013_Alberta_floods
8,347808955663257600,RT @aldjohnmar: Our first Emergency Shelter wi...,NGOs,Infrastructure and utilities,Related and informative,2013_Alberta_floods,2013_Alberta_floods
9,347812738900320256,'Move to high ground:' Rapid flooding forces e...,Media,Caution and advice,Related and informative,2013_Alberta_floods,2013_Alberta_floods


In [7]:
# Extract earliest timestamp

# Convert it to datetime first
combined_df_p['Timestamp'] = pd.to_datetime(combined_df_p['Timestamp'])

# Group by 'Tweet ID' and aggregate to find the earliest timestamp
earliest_timestamps = combined_df_p.groupby('Tweet ID')['Timestamp'].agg('min').reset_index()

# Display or use the resulting DataFrame 'earliest_timestamps'
# print(earliest_timestamps)

In [8]:
# Left join

# Perform a left join on 'Tweet ID'
result_df = combined_df_l.merge(earliest_timestamps, on='Tweet ID', how='left')

# Display
# print(result_df)
# result_df.shape
result_df.head()

Unnamed: 0,Tweet ID,Tweet Text,Information Source,Information Type,Informativeness,Label,subfolder_name,Timestamp
0,347686624563429378,"RT @CBCAlerts: Canmore, Alta. declares state o...",Media,Affected individuals,Related and informative,2013_Alberta_floods,2013_Alberta_floods,2013-06-20 12:05:25+00:00
1,347766337344503808,RT @GlobalCalgary: If you are in #Canmore and ...,Media,Caution and advice,Related and informative,2013_Alberta_floods,2013_Alberta_floods,2013-06-20 17:22:10+00:00
2,347779159327637504,RT @metrocalgary: UPDATE: Latest from the @cit...,Government,Caution and advice,Related and informative,2013_Alberta_floods,2013_Alberta_floods,2013-06-20 18:13:07+00:00
3,347783236191129600,RT @GlobalCalgary: GALLERY: Incredible photos ...,Media,Not applicable,Related and informative,2013_Alberta_floods,2013_Alberta_floods,2013-06-20 18:29:19+00:00
4,347793432514801665,RT @nenshi: Major risk of flooding in Calgary....,Government,Caution and advice,Related and informative,2013_Alberta_floods,2013_Alberta_floods,2013-06-20 19:09:50+00:00
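
The two steps above (earliest timestamp per tweet, then a left join onto the labeled tweets) can be checked on a toy example. This is a sketch with made-up rows, not the notebook's data. The `validate` and `indicator` arguments of `DataFrame.merge` are optional additions that the original cells do not use: the first asserts that each tweet ID appears at most once on the right-hand side, the second records which rows found a match.

```python
import pandas as pd

# Toy "period" data: tweet 1 appears twice with different timestamps.
periods = pd.DataFrame({
    "Tweet ID": [1, 1, 2],
    "Timestamp": [
        "2013-06-20 12:05:25+00:00",
        "2013-06-20 11:00:00+00:00",
        "2013-06-21 09:30:00+00:00",
    ],
})
periods["Timestamp"] = pd.to_datetime(periods["Timestamp"])

# Keep the earliest timestamp for each tweet.
earliest = periods.groupby("Tweet ID")["Timestamp"].min().reset_index()

# Toy "labeled" data: tweet 3 has no timestamp record.
labeled = pd.DataFrame({"Tweet ID": [1, 2, 3], "Tweet Text": ["a", "b", "c"]})

# Left join keeps every labeled tweet; unmatched rows get NaT as timestamp.
joined = labeled.merge(
    earliest,
    on="Tweet ID",
    how="left",
    validate="many_to_one",  # fail loudly if the right side has duplicate IDs
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)
print(joined)
```

Counting the `left_only` rows in `_merge` gives a quick measure of how many labeled tweets lack a timestamp after the join.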


In [9]:
# Read event annotations (JSON metadata)

# Define the main folder path
main_folder = '/content/gdrive/MyDrive/iLab2/data/CrisisLexT26'

# Initialize empty lists to store the extracted data
subfolder_names = []
names = []
start_days = []
durations = []
countries = []
location_descriptions = []
sub_categories = []
types = []

# Loop through the subfolders
for subfolder in os.listdir(main_folder):
    subfolder_path = os.path.join(main_folder, subfolder)

    # Check if it's a directory
    if os.path.isdir(subfolder_path):
        json_files = [f for f in os.listdir(subfolder_path) if f.endswith('.json')]

        # Loop through JSON files in the subfolder
        for json_file in json_files:
            json_path = os.path.join(subfolder_path, json_file)

            # Read the JSON file
            with open(json_path, 'r') as f:
                data = json.load(f)

                # Extract the required information
                subfolder_names.append(subfolder)
                names.append(data['name'])
                start_days.append(data['time']['start_day'])
                durations.append(data['time']['duration'])
                countries.append(data['location']['country'])
                location_descriptions.append(data['location']['location_description'])
                sub_categories.append(data['categorization']['sub_category'])
                types.append(data['categorization']['type'])

# Create a DataFrame from the extracted data
df_annotation = pd.DataFrame({
    'subfolder_name': subfolder_names,
    'name': names,
    'start_day': start_days,
    'duration': durations,
    'country': countries,
    'location_description': location_descriptions,
    'sub_category': sub_categories,
    'type': types
})

# Display the DataFrame
df_annotation.head()


Unnamed: 0,subfolder_name,name,start_day,duration,country,location_description,sub_category,type
0,2013_Alberta_floods,Alberta Floods,17/06/2013,25,Canada,Alberta,Hydrological,Floods
1,2013_Glasgow_helicopter_crash,Glasgow helicopter crash,2013-11-29,30,UK,Glasgow,Unintentional,Crash
2,2012_Costa_Rica_earthquake,Costa Rica earthquake,2012-09-04,13,Costa Rica,Costa Rica,Geophysical,Earthquake
3,2013_Manila_floods,Manila Floods,2013-08-17,11,Phillipines,Manila,Hydrological,Floods
4,2013_LA_airport_shootings,LA Airport Shootings,2013-11-01,12,US,Los Angeles,Intentional,Shootings


In [10]:
# Left join

# Perform a left join on 'subfolder_name'
result_df = result_df.merge(df_annotation, on='subfolder_name', how='left')

# Drop 'subfolder_name', which duplicates 'Label'
result_df.drop(columns=['subfolder_name'], inplace=True)


In [11]:
# Count the number of rows in result_df
row_count = result_df.shape[0]

# Print the row count
print(f"Number of rows in result_df: {row_count}")

result_df.head()


Number of rows in result_df: 27933


Unnamed: 0,Tweet ID,Tweet Text,Information Source,Information Type,Informativeness,Label,Timestamp,name,start_day,duration,country,location_description,sub_category,type
0,347686624563429378,"RT @CBCAlerts: Canmore, Alta. declares state o...",Media,Affected individuals,Related and informative,2013_Alberta_floods,2013-06-20 12:05:25+00:00,Alberta Floods,17/06/2013,25,Canada,Alberta,Hydrological,Floods
1,347766337344503808,RT @GlobalCalgary: If you are in #Canmore and ...,Media,Caution and advice,Related and informative,2013_Alberta_floods,2013-06-20 17:22:10+00:00,Alberta Floods,17/06/2013,25,Canada,Alberta,Hydrological,Floods
2,347779159327637504,RT @metrocalgary: UPDATE: Latest from the @cit...,Government,Caution and advice,Related and informative,2013_Alberta_floods,2013-06-20 18:13:07+00:00,Alberta Floods,17/06/2013,25,Canada,Alberta,Hydrological,Floods
3,347783236191129600,RT @GlobalCalgary: GALLERY: Incredible photos ...,Media,Not applicable,Related and informative,2013_Alberta_floods,2013-06-20 18:29:19+00:00,Alberta Floods,17/06/2013,25,Canada,Alberta,Hydrological,Floods
4,347793432514801665,RT @nenshi: Major risk of flooding in Calgary....,Government,Caution and advice,Related and informative,2013_Alberta_floods,2013-06-20 19:09:50+00:00,Alberta Floods,17/06/2013,25,Canada,Alberta,Hydrological,Floods


## Cleaning

In [12]:
# Country coding

# Function: map a country name to its ISO 3166-1 alpha-2 code
def clean_country_name(country_name):
    try:
        country = pycountry.countries.get(name=country_name)
        if country:
            return country.alpha_2
        else:
            return "Unknown"
    except LookupError:
        return "Unknown"

# Clean
result_df['country'] = result_df['country'].apply(clean_country_name)
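
The annotation table above contains country values such as `UK`, `US` and `Phillipines`, which are not official country names, so an exact-name lookup is likely to return "Unknown" for them. One possible workaround is a small alias table consulted when the exact lookup fails. The sketch below is an assumption, not the notebook's method: `COUNTRY_ALIASES`, `resolve_country` and `demo_lookup` are hypothetical names, and `demo_lookup` merely stands in for the `pycountry` call so that the example runs on its own.

```python
# Hypothetical alias table: non-standard spellings -> ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {
    "uk": "GB",
    "us": "US",
    "phillipines": "PH",  # misspelling as it appears in the annotations
}


def resolve_country(country_name, exact_lookup):
    """Try the exact lookup first, then the alias table, else 'Unknown'."""
    code = exact_lookup(country_name)
    if code:
        return code
    return COUNTRY_ALIASES.get(country_name.strip().lower(), "Unknown")


def demo_lookup(name):
    """Stand-in for an exact-name lookup; returns None when nothing matches."""
    return {"Canada": "CA", "Costa Rica": "CR"}.get(name)


for raw in ["Canada", "UK", "US", "Phillipines", "Atlantis"]:
    print(raw, "->", resolve_country(raw, demo_lookup))
```

In the notebook, the stand-in could be replaced by a wrapper around the existing `clean_country_name` logic that returns `None` instead of "Unknown" when no match is found.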


In [13]:
# Replace underscores
result_df['Label'] = result_df['Label'].str.replace('_', ' ')


In [14]:
# Show
result_df.head()

Unnamed: 0,Tweet ID,Tweet Text,Information Source,Information Type,Informativeness,Label,Timestamp,name,start_day,duration,country,location_description,sub_category,type
0,347686624563429378,"RT @CBCAlerts: Canmore, Alta. declares state o...",Media,Affected individuals,Related and informative,2013 Alberta floods,2013-06-20 12:05:25+00:00,Alberta Floods,17/06/2013,25,CA,Alberta,Hydrological,Floods
1,347766337344503808,RT @GlobalCalgary: If you are in #Canmore and ...,Media,Caution and advice,Related and informative,2013 Alberta floods,2013-06-20 17:22:10+00:00,Alberta Floods,17/06/2013,25,CA,Alberta,Hydrological,Floods
2,347779159327637504,RT @metrocalgary: UPDATE: Latest from the @cit...,Government,Caution and advice,Related and informative,2013 Alberta floods,2013-06-20 18:13:07+00:00,Alberta Floods,17/06/2013,25,CA,Alberta,Hydrological,Floods
3,347783236191129600,RT @GlobalCalgary: GALLERY: Incredible photos ...,Media,Not applicable,Related and informative,2013 Alberta floods,2013-06-20 18:29:19+00:00,Alberta Floods,17/06/2013,25,CA,Alberta,Hydrological,Floods
4,347793432514801665,RT @nenshi: Major risk of flooding in Calgary....,Government,Caution and advice,Related and informative,2013 Alberta floods,2013-06-20 19:09:50+00:00,Alberta Floods,17/06/2013,25,CA,Alberta,Hydrological,Floods


## Model

In [15]:
def generate_answer(text, system_prompt, instruction):
    prompt_template = get_prompt(instruction, system_prompt)
    prompt = PromptTemplate(template=prompt_template, input_variables=['text'])
    llm_chain = LLMChain(prompt=prompt, llm=llm)
    text = llm_chain.run(text)
    return text

In [16]:
# Tokenizer
model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = transformers.pipeline(
    "text-generation", #task
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=1000,
    do_sample=True,
    top_k=10,

    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)


In [17]:
# Load model
llm = HuggingFacePipeline(pipeline = pipeline, model_kwargs = {'temperature':0})


In [21]:
# Prompt Template Formatting
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""

def get_prompt(instruction, new_system_prompt=DEFAULT_SYSTEM_PROMPT ):
    SYSTEM_PROMPT = B_SYS + new_system_prompt + E_SYS
    prompt_template =  B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt_template

def parse_text(text):
    # Wrap long model output to 100 characters per line and print it
    wrapped_text = textwrap.fill(text, width=100)
    print(wrapped_text + '\n\n')
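
To see what the template looks like once it is assembled, the sketch below rebuilds the Llama 2 chat layout (`[INST]`, `<<SYS>>` ... `<</SYS>>`, `[/INST]`) with plain string formatting. It is an illustration only: `build_prompt` is a hypothetical stand-in for `get_prompt`, `str.format` replaces LangChain's `PromptTemplate`, and the system prompt, instruction and tweet text are made up.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_prompt(instruction, system_prompt):
    """Wrap a system prompt and an instruction in the Llama 2 chat markers."""
    return B_INST + B_SYS + system_prompt + E_SYS + instruction + E_INST


# The instruction keeps a {text} placeholder that is filled in per tweet.
demo_template = build_prompt(
    "Give the ISO country code for this text: {text}",
    "Answer with one country code only.",
)
demo_prompt = demo_template.format(text="Flooding reported in Calgary.")
print(demo_prompt)
```

Printing one filled-in prompt before running a full experiment is a cheap way to spot missing markers or placeholders that were not substituted.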

In [18]:
# Generate an answer for a single text. Apply this function to each row of the
# DataFrame and store the output in a prediction column.
def generate_answer(text, system_prompt, instruction):
    # Define the prompt template
    prompt_template = get_prompt(instruction, system_prompt)
    # Create the prompt
    prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
    # Create the language model chain
    llm_chain = LLMChain(prompt=prompt, llm=llm)
    # Generate text using the prompt template
    text = llm_chain.run(text)
    # Return the generated text
    return text


## Experiments

In [45]:
# Grab 50 random samples
# filtered_df = result_df.query("Label == '2013_Colorado_floods'")
testing_df = result_df.sample(n=50, random_state=42)
testing_df.columns = testing_df.columns.str.replace(' ', '')
testing_df['TweetText'] = testing_df['TweetText'].astype(str)


# testing_df.head(12)


In [46]:
# Experiment 1: Crisis location code (country)

system_prompt = "You are an advance assistant that excels at analysing text that contains useful information during or after a crisis. A crisis can include bushfires, wildfires, floods, hurricanes, earthquakes, covid-19, pandemic etc. Stop after giving one answer. If the text is ambigous return  '''Not applicable'''. Only return one code that is the most appropriate without explaination. Return one label without comment or dialogugue. If the location is a state, return the country in which the state is. "
instruction = "For the following text for me the ISO country code. {text}"

# Create an empty 'country_pred' column in testing_df
testing_df['country_pred'] = ''

# Loop over the data
for index, row in testing_df.iterrows():
    TweetText = row['TweetText']

    # Store the output from generate_answer in the 'country_pred' column
    testing_df.loc[index, 'country_pred'] = generate_answer(TweetText, system_prompt, instruction)





In [41]:
# Experiment 2: crisis informativeness type labels

system_prompt = "You are an advance assistant that excels at classifying whether the text that contains useful information during or after a crisis. A crisis can include bushfires, wildfires, floods, hurricanes, earthquakes, covid-19, pandemic etc. Stop after giving one answer. If it is not related to a crisis return '''Not applicable'''. Only return one label that is the most appropriate without explaination. Return one label without comment or dialogugue."
instruction = "Classify {TweetText} for me whether it is informatie and whether it is related to {Label}, using these crisis information labels [ \"Related - but not informative\", \"Related and informative\", \"Not related\"]."

# Create an empty 'Informativeness_pred' column in testing_df
testing_df['Informativeness_pred'] = ''

# Build the prompt and the chain once; they do not change between rows
prompt_template = get_prompt(instruction, system_prompt)
prompt = PromptTemplate(template=prompt_template, input_variables=["TweetText", "Label"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

# Loop over the data
for index, row in testing_df.iterrows():

    # Input variables for this row
    input_variables = {
        "TweetText": row['TweetText'],
        "Label": row['Label']
    }

    # Generate text using the prompt template
    text_inf = llm_chain.run(input_variables)

    # Store the model output in the 'Informativeness_pred' column
    testing_df.loc[index, 'Informativeness_pred'] = text_inf




In [42]:
# Check
testing_df.head(20)


Unnamed: 0,TweetID,TweetText,InformationSource,InformationType,Informativeness,Label,Timestamp,name,start_day,duration,country,location_description,sub_category,type,country_pred,Informativeness_pred
17799,239986309865807872,RT @garcilasop: #Lafuncióndebecontinuar es una...,Outsiders,Other Useful Information,Related - but not informative,2012 Venezuela refinery,2012-08-27 07:22:49+00:00,Venezuela refinery explosion,2012-08-24,12,Unknown,Falcon,Unintentional,Explosion,Not applicable,Related and informative.
9419,275898683109801984,Typhoon Bopha Pounds Southern Philippines: A p...,Media,Caution and advice,Related and informative,2012 Typhoon Pablo,2012-12-04 09:45:46+00:00,Typhoon Pablo,2012-11-24,21,Unknown,Phillipines,Meteorological,Typhoon,PHP,"Based on the information provided, I would c..."
1972,412933108561108992,#helicopter #parachute #drop #sarmaaaan http:/...,Not labeled,Not labeled,Not related,2013 Glasgow helicopter crash,2013-12-17 13:11:39+00:00,Glasgow helicopter crash,2013-11-29,30,Unknown,Glasgow,Unintentional,Crash,"Based on the text you provided, the most app...",Based on the information provided in the twe...
24446,392992116478996481,RT @MarkDiStef: Defence Dept's been investigat...,Outsiders,Caution and advice,Related and informative,2013 Australia bushfire,2013-10-23 12:33:16+00:00,Australia wildfires,2013-10-12,21,AU,New South Wales,Climatological,Wildfire,AUS,Related and informative
12125,327400365765046272,Tak Ada WNI Korban Runtuhnya Rana Plaza Bangla...,Media,Not applicable,Related and informative,2013 Savar building collapse,2013-04-25 12:35:04+00:00,Savar building collapse,2013-04-23,36,BD,Savar,Unintentional,Collapse,Bangladesh (BD),"Based on the provided URL, I would classify ..."
27873,404735664329351168,RT @nicola_pinna: A #Olbia c'è bisogno di cib...,Outsiders,Donations and volunteering,Related and informative,2013 Sardinia floods,2013-11-24 22:17:56+00:00,Sardinia Floods,2013-11-16,13,IT,Sardinia,Hydrological,Floods,IT,Related and informative.
2055,415154554549075968,RT @BreahnaZhane: RIP To Our Fort Riley Soldie...,Not labeled,Not labeled,Not related,2013 Glasgow helicopter crash,2013-12-23 16:18:53+00:00,Glasgow helicopter crash,2013-11-29,30,Unknown,Glasgow,Unintentional,Crash,US,"Based on the information provided, I would c..."
15053,390643910214946817,#Earthquake 7.2 tuesday 15 oct destroyed a lot...,Media,Infrastructure and utilities,Related and informative,2013 Bohol earthquake,2013-10-17 01:02:20+00:00,Bohol earthquake,2013-10-14,12,Unknown,Phillipines,Geophysical,Earthquake,"Based on the text you provided, the most app...",Related and informative
4667,396330455303475200,RT @jackmooring: Praying for the victims at LA...,Outsiders,Sympathy and support,Related - but not informative,2013 LA airport shootings,2013-11-01 17:38:38+00:00,LA Airport Shootings,2013-11-01,12,Unknown,Los Angeles,Intentional,Shootings,USA,Related and informative
8365,302496073392459777,"RT @TeamCoco: OMG guyz, did you see that meteo...",Media,Not applicable,Related - but not informative,2013 Russia meteor,2013-02-15 19:14:18+00:00,Russian meteor,2013-02-14,19,Unknown,Chelyabinsk,Others,Meteorite,Not applicable,Related and informative


In [44]:
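# Compare the predicted and the annotated informativeness label for the second sampled row
predicted = testing_df.iloc[1]['Informativeness_pred']
print(predicted)


actual = testing_df.iloc[1]['Informativeness']
print(actual)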


  Based on the information provided, I would classify the text as "Related and informative". The text mentions a powerful typhoon pounding the southern Philippines, which is a clear indication of a crisis situation. The mention of Typhoon Pablo in the tweet also suggests that the text is related to a previous crisis event. Therefore, I would label the text as "Related and informative".
Related and informative
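
The sample output shows that the model often wraps the label in a full sentence or appends a full stop, so a direct string comparison with the annotated label would undercount matches. One way to handle this, sketched below as an assumption rather than the notebook's method, is to map each raw answer to the first known label it mentions. `LABELS` and `normalise_label` are hypothetical names; the label list combines the three labels from the Experiment 2 instruction with the "Not applicable" fallback from the system prompt.

```python
LABELS = [
    "Related - but not informative",
    "Related and informative",
    "Not related",
    "Not applicable",
]


def normalise_label(raw_answer, labels=LABELS, default="Unparsed"):
    """Return the known label mentioned earliest in the answer, else `default`."""
    text = raw_answer.lower()
    # Collect (position, label) pairs for every label found in the answer.
    hits = [(text.find(label.lower()), label) for label in labels
            if label.lower() in text]
    if not hits:
        return default
    # The smallest position corresponds to the first label mentioned.
    return min(hits)[1]


verbose = ('Based on the information provided, I would classify the text as '
           '"Related and informative". The text mentions a powerful typhoon.')
print(normalise_label(verbose))
print(normalise_label("Related and informative."))
print(normalise_label("I am not sure."))
```

Answers that fall back to the default are worth inspecting by hand, since they indicate the model ignored the requested output format.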


## Performance Evaluation