# Event Classification using NLP

This notebook classifies events by comparing the text of their descriptions against a fixed list of topics, using a pre-trained SentenceTransformer model. The process involves loading the data, parsing the stored lists, encoding the text, and assigning each item the topic with the highest cosine similarity.

In [None]:
# Import the necessary libraries
from sentence_transformers import SentenceTransformer, util
import numpy as np
import pandas as pd
import ast

## Step 1: Load Data

First, we load the event data from a CSV file.

In [None]:
file_path = 'events.csv'
data = pd.read_csv(file_path)

data.head()

## Step 2: Preprocess Data

We need to convert the string representations of lists in the 'Response' column to actual Python lists. This is done using the ast.literal_eval function.

In [None]:
def convert_string_to_list(string_list):
    """
    Convert a string representation of a list to an actual list.
    
    Parameters:
    string_list (str): The string representation of the list.
    
    Returns:
    list: The actual list if conversion is successful, otherwise an empty list.
    """
    try:
        return ast.literal_eval(string_list)
    except Exception:
        return []

# Apply the conversion function to the 'Response' column
data['Response'] = data['Response'].apply(convert_string_to_list)

# Display the first few rows of the data to check the conversion
data.head()
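
To illustrate what the conversion does, here is a minimal sketch on made-up inputs. The sample strings and the helper name `parse_list` are hypothetical and not taken from events.csv. As an assumption about what should count as a failure, it catches only the errors that `ast.literal_eval` raises on malformed input and also rejects values that parse to something other than a list.

```python
import ast

def parse_list(value):
    """Parse a string such as "['a', 'b']" into a list; return [] on failure."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        # Malformed strings and non-string cells (e.g. NaN) end up here
        return []
    # Guard against valid literals that are not lists, such as "42"
    return parsed if isinstance(parsed, list) else []

# Hypothetical cell values: a valid list, free text, an empty list, a missing value
samples = ["['Storm hits coast', 'Oil prices rise']", "not a list", "[]", float("nan")]
results = [parse_list(s) for s in samples]
print(results)
```

Only the first sample yields a non-empty list; the others fall back to `[]`, which is why later steps need to cope with empty inputs.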


## Step 3: Encode and Classify Descriptions

Using a pre-trained SentenceTransformer model, we encode each headline in the 'Response' column and assign it the predefined topic whose embedding has the highest cosine similarity to the headline's embedding.

In [None]:
# Load a pre-trained SentenceTransformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Define a list of topics
topics = ["General", "Weather", "Political", "Economy", "Energy", "Business"]
topic_embeddings = model.encode(topics, convert_to_tensor=True)

def find_closest_topics(headlines):
    """
    Find the closest topics for a list of headlines using cosine similarity.
    
    Parameters:
    headlines (list): A list of headlines.
    
    Returns:
    list: The closest topic for each headline (empty if there are no headlines).
    """
    # Rows whose 'Response' could not be parsed hold an empty list; skip encoding them
    if not headlines:
        return []

    # Encode the headlines
    headline_embeddings = model.encode(headlines, convert_to_tensor=True)
    
    # Compute cosine similarities between headline embeddings and topic embeddings
    # (one row per headline, one column per topic)
    similarities = util.pytorch_cos_sim(headline_embeddings, topic_embeddings)
    
    # Pick the topic with the maximum similarity score for each headline
    closest_topics = [topics[int(sim.argmax())] for sim in similarities]
    return closest_topics

# Apply the classification function to the 'Response' column
data['Event Type NLP'] = data['Response'].apply(find_closest_topics)

# Check the results
print(data[['Response', 'Event Type NLP']].head())
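
The selection rule can be seen in isolation with a small NumPy sketch. The vectors below are hypothetical three-dimensional stand-ins for real embeddings (which have far more dimensions), and `cosine_similarity_matrix` is an illustrative helper, not part of sentence_transformers.

```python
import numpy as np

topics = ["Weather", "Economy"]

# Hypothetical toy embeddings, one row per topic / headline
topic_vecs = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
headline_vecs = np.array([[0.9, 0.1, 0.0],
                          [0.2, 0.8, 0.1]])

def cosine_similarity_matrix(a, b):
    """Return the matrix of cosine similarities between rows of a and rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Shape: (number of headlines, number of topics)
similarities = cosine_similarity_matrix(headline_vecs, topic_vecs)

# For each headline, take the topic in the column with the highest similarity
closest = [topics[i] for i in similarities.argmax(axis=1)]
print(closest)
```

Because the argmax always returns some topic, every headline receives a label even when all similarities are low.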


## Step 4: Save Classified Data

Finally, we save the classified data to a new CSV file.

In [None]:
# Write the labeled data; index=False keeps the DataFrame's row index out of the file
data.to_csv("Labeled_Events.csv", index=False)
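
One thing to keep in mind when reading the saved file back: CSV stores list-valued cells as text, so they need the same `ast.literal_eval` treatment as in Step 2. The sketch below demonstrates this with a small made-up DataFrame and an in-memory buffer instead of a real file; the sample rows are hypothetical.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for the labeled data
df = pd.DataFrame({
    "Response": [["Storm hits coast"], []],
    "Event Type NLP": [["Weather"], []],
})

# Write to an in-memory buffer rather than to disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)

# List columns come back as strings such as "['Weather']"
first_raw = reloaded["Event Type NLP"][0]
print(type(first_raw))

# Convert the strings back into Python lists
reloaded["Event Type NLP"] = reloaded["Event Type NLP"].apply(ast.literal_eval)
print(reloaded["Event Type NLP"].tolist())
```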