# Classifications

This notebook demonstrates a simple embedding-based text classification workflow using the OpenAI API.  
It creates embeddings for predefined topic labels, embeds a sample news headline, compares vectors with cosine distance, and assigns the closest matching label.
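
To make the comparison step concrete, here is a minimal sketch of cosine distance on toy 2-D vectors, written in plain Python so it runs without an API key. The helper name `cosine_distance` and the vectors are illustrative assumptions, not part of the notebook. The notebook itself calls `scipy.spatial.distance.cosine`, which computes the same quantity (1 minus the cosine similarity).

```python
import math

def cosine_distance(u, v):
  """Return 1 - cosine similarity of u and v (illustrative helper)."""
  dot = sum(a * b for a, b in zip(u, v))
  norm_u = math.sqrt(sum(a * a for a in u))
  norm_v = math.sqrt(sum(b * b for b in v))
  return 1.0 - dot / (norm_u * norm_v)

# Toy vectors, made up for illustration (not real embeddings)
same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])  # parallel vectors
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])      # perpendicular vectors
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])       # opposite vectors

print(same_direction, orthogonal, opposite)
```

Vector length does not affect the result, only direction: parallel vectors give a distance of 0, perpendicular vectors 1, and opposite vectors 2. The smallest distance therefore marks the most similar label.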

## Setup

Import the required libraries, load environment variables, initialize the OpenAI client, and define helper functions that create embeddings and format an article as text.

In [None]:
from dotenv import load_dotenv
from openai import OpenAI
from scipy.spatial import distance


load_dotenv()

client = OpenAI()

def create_embeddings(texts):
  """Return one embedding vector (a list of floats) per input string."""
  response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
  )

  # Each item in response.data holds the embedding for one input string
  return [item.embedding for item in response.data]

def create_article_text(article):
  return f"""Headline: {article['headline']}
Keywords: {', '.join(article['keywords'])}
"""

## Create Class Descriptions

In [None]:
topics = [
  { 'label': 'Tech' },
  { 'label': 'Science' },
  { 'label': 'Sport' },
  { 'label': 'Business' },
]

class_descriptions = [topic['label'] for topic in topics]
class_embeddings = create_embeddings(class_descriptions)

## Create Item to Classify

In [None]:
article = {
  "headline": "NVIDIA GPUs may be the key to unlocking the potential of AI",
  "keywords": ["ai", "business", "computers"],
}

article_text = create_article_text(article)
article_embeddings = create_embeddings([article_text])[0]
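
As a quick check of what actually gets embedded, the sketch below repeats the notebook's `create_article_text` helper and prints the text it produces for the article above. It makes no API call. The variable name `preview` is an assumption added for illustration.

```python
def create_article_text(article):
  return f"""Headline: {article['headline']}
Keywords: {', '.join(article['keywords'])}
"""

article = {
  "headline": "NVIDIA GPUs may be the key to unlocking the potential of AI",
  "keywords": ["ai", "business", "computers"],
}

# `preview` is the exact string that would be sent to the embeddings endpoint
preview = create_article_text(article)
print(preview)
```

The keywords become part of the embedded text, so they influence which label ends up closest.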

## Compute Embedding Distances

In [None]:
def find_closest(query_vector, embeddings):
  distances = []

  for index, embedding in enumerate(embeddings):
    dist = distance.cosine(query_vector, embedding)
    distances.append({ "distance": dist, "index": index })

  return min(distances, key=lambda x: x['distance'])

closest = find_closest(article_embeddings, class_embeddings)
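
`find_closest` returns only the single best match. When debugging a classifier like this, it can help to see the full ranking and how close the runner-up is. The following is a minimal sketch under stated assumptions: `rank_labels` and `cosine_distance` are hypothetical helpers, and the 2-D vectors are made-up stand-ins for real embeddings so the example runs without the API.

```python
import math

def cosine_distance(u, v):
  """Return 1 - cosine similarity of u and v (illustrative helper)."""
  dot = sum(a * b for a, b in zip(u, v))
  norm_u = math.sqrt(sum(a * a for a in u))
  norm_v = math.sqrt(sum(b * b for b in v))
  return 1.0 - dot / (norm_u * norm_v)

def rank_labels(query_vector, labels, embeddings):
  """Return (label, distance) pairs sorted from closest to farthest."""
  scored = [
    (label, cosine_distance(query_vector, embedding))
    for label, embedding in zip(labels, embeddings)
  ]
  return sorted(scored, key=lambda pair: pair[1])

# Toy data: invented 2-D vectors, not real model output
labels = ["Tech", "Science", "Sport", "Business"]
toy_embeddings = [[1.0, 0.1], [0.7, 0.7], [0.0, 1.0], [0.9, 0.4]]
toy_query = [1.0, 0.2]

ranking = rank_labels(toy_query, labels, toy_embeddings)
for label, dist in ranking:
  print(f"{label}: {dist:.4f}")
```

With real embeddings, the same function could be called as `rank_labels(article_embeddings, class_descriptions, class_embeddings)`. A small gap between the first two distances suggests the assignment is not clear-cut.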

## Extract the Most Similar Label

In [None]:
label = topics[closest['index']]['label']
print(f"Headline: {article['headline']}")
print(f"Closest label: {label}") # Expected output: "Closest label: Tech"