# Custom Chatbot Project

Dataset: character_descriptions.csv

Rationale:
I selected the 'character_descriptions.csv' dataset because it contains rich free-text descriptions of fictional characters, together with each character's medium (e.g., theater, television, film) and setting (e.g., a fantasy world or a modern city).

This kind of data suits a custom chatbot well because:
- Answering questions about it requires understanding descriptive text, not just looking up facts.
- Matching a user's question to a character depends on nuanced traits such as bravery, detective skills, or the world the character lives in.
- Retrieving the most relevant rows gives the model context from this dataset, which a generic model without access to it cannot use.

Without custom context, a general LLM can only give a generic answer ("Here is a brave character: Superman.")  
With custom context, the LLM can give a targeted answer drawn from the dataset ("A brave character from a fantasy world: 'Liora the Dragonrider from Zephyra'.")


```python
# Import libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
```

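```python
# Load dataset
df = pd.read_csv('data/character_descriptions.csv')

# Combine description, medium and setting into a single 'text' field.
# fillna('') keeps one missing value from turning the whole row's text into NaN.
cols = df[['Description', 'Medium', 'Setting']].fillna('')
df['text'] = cols['Description'] + " Medium: " + cols['Medium'] + ". Setting: " + cols['Setting'] + "."

# Keep the combined texts as a list; row i of the TF-IDF matrix corresponds to texts[i]
texts = df['text'].tolist()

# Fit TF-IDF on the contexts: each text becomes a sparse, L2-normalised term-weight vector
vectorizer = TfidfVectorizer()
context_embeddings = vectorizer.fit_transform(texts)

print("Dataset loaded and embeddings created.")
```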

```text
Dataset loaded and embeddings created.
```
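To make the TF-IDF step concrete, here is a minimal sketch on a hypothetical three-row corpus (`toy_texts` and the other `toy_*` names are illustrative, not taken from the dataset, and are chosen so they do not overwrite the notebook's `vectorizer` or `texts`). `fit_transform` learns the vocabulary and IDF weights from the contexts; `transform` maps a query into that same vocabulary, silently dropping words it never saw (here "find" and "in"; single-character tokens such as "a" are ignored by the default tokenizer); cosine similarity then ranks the rows.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the combined 'text' column
toy_texts = [
    "A brave knight who fights dragons. Medium: Film. Setting: Fantasy world.",
    "A sharp detective who solves murders. Medium: Television. Setting: Modern city.",
    "A shy baker who loves music. Medium: Play. Setting: Small village.",
]

toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_texts)  # shape: (n_documents, vocabulary_size)

# The query is transformed with the already-fitted vocabulary (no refitting)
query_vec = toy_vectorizer.transform(["Find a detective in a modern city."])
scores = cosine_similarity(query_vec, toy_matrix).flatten()

best = int(scores.argmax())
print(toy_matrix.shape)
print(scores.round(3))
print(toy_texts[best])
```

Rows that share no vocabulary with the query score exactly 0, so with default settings the ranking is driven only by overlapping words, weighted by how rare they are across the corpus.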


```python
# Function for custom query: retrieve the most similar contexts for a question
def custom_query(user_question):
    # Map the question into the fitted TF-IDF vocabulary (words unseen during fit are ignored)
    question_embedding = vectorizer.transform([user_question])
    
    # Cosine similarity between the question and every context -> 1-D array of scores
    similarities = cosine_similarity(question_embedding, context_embeddings).flatten()
    
    # Indices of the 3 highest scores, best first
    top_indices = similarities.argsort()[-3:][::-1]
    selected_contexts = [texts[i] for i in top_indices]
    
    # Join selected contexts
    context = "\n\n".join(selected_contexts)
    
    # Simulate the custom answer: no LLM is called, the retrieved context is returned for inspection
    return f"(Custom Answer)\nUsing the following context:\n\n{context}\n\nAnswer for: {user_question}"
```
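`custom_query` always returns three contexts, even when some of them share no words with the question (similarity 0). A minimal sketch of a stricter selection step, assuming you would rather pass fewer contexts than irrelevant ones; `top_k_indices` and `min_score` are hypothetical names, not part of the notebook.

```python
import numpy as np


def top_k_indices(similarities, k=3, min_score=0.0):
    """Return indices of the k highest scores, best first, dropping scores <= min_score."""
    similarities = np.asarray(similarities, dtype=float)
    k = min(k, similarities.size)
    if k == 0:
        return []
    # argpartition finds the k largest scores without sorting the whole array
    candidates = np.argpartition(-similarities, k - 1)[:k]
    # Sort only those k candidates, highest score first
    ranked = candidates[np.argsort(-similarities[candidates], kind="stable")]
    return [int(i) for i in ranked if similarities[i] > min_score]
```

Inside `custom_query`, `top_indices = top_k_indices(similarities)` could replace the `argsort` line; the function may then return fewer than three contexts, or none, which the caller should handle.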

```python
# Function for basic query
def basic_query(user_question):
    # Simulate generic LLM behavior
    return f"(Basic Answer)\nGeneral response for: {user_question}"
```
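The notebook stops at simulated answers. If the two paths were wired to a real LLM, the difference between `custom_query` and `basic_query` would come down to the prompt text. A minimal sketch of both prompts, assuming a plain character budget (`max_context_chars`, a hypothetical parameter) as a rough stand-in for a model's token limit; `build_basic_prompt` and `build_custom_prompt` are illustrative names, and no particular LLM API is assumed.

```python
def build_basic_prompt(question):
    """Prompt with no retrieved context: the model must rely on general knowledge."""
    return f"Answer the question.\n\nQuestion: {question}\nAnswer:"


def build_custom_prompt(question, contexts, max_context_chars=2000):
    """Prompt that grounds the model in retrieved character descriptions.

    Contexts are added in ranked order until the next one would exceed the
    budget, so the most relevant descriptions survive when space runs out.
    The budget counts context characters only (separators are not counted),
    and the context block may end up empty.
    """
    kept, used = [], 0
    for ctx in contexts:
        if used + len(ctx) > max_context_chars:
            break
        kept.append(ctx)
        used += len(ctx)
    context_block = "\n\n".join(kept)
    return (
        "Answer the question using only the character descriptions below. "
        "If none of them fit, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Instructing the model to say when no description fits is a design choice: it discourages the model from inventing a character when retrieval returns weak matches.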

```python
# Define 2 test questions
user_question_1 = "Suggest a brave character from a fantasy world."
user_question_2 = "Find a detective character from a modern city setting."
```

```python
# Show outputs

print("\n--- Question 1: Brave Character ---")
print("\nCustom Prompt Output:")
print(custom_query(user_question_1))
print("\nBasic Prompt Output:")
print(basic_query(user_question_1))
```


```text
--- Question 1: Brave Character ---

Custom Prompt Output:
(Custom Answer)
Using the following context:

A fiery and passionate young woman who works as a blacksmith. She is strong-willed and independent, and her singing voice is bold and powerful. Francesca has caught the eye of Prince Lorenzo, but she is hesitant to give her heart to a man who comes from such a different world. Medium: Opera. Setting: Italy.

A jester and musician who works in Lady Olivia's household. Feste is the wisest character in the play, and uses his wit and intelligence to comment on the actions of the other characters. He is also a confidant to Viola, and helps her navigate her complicated situation. Medium: Play. Setting: Ancient Greece.

A chameleon-like performer, Karma is known for her ability to transform herself into any character. She's a master of illusion and is always pushing boundaries with her looks and performances, but can sometimes struggle with authenticity and staying true to herself. She's
```

```python
print("\n--- Question 2: Detective Character ---")
print("\nCustom Prompt Output:")
print(custom_query(user_question_2))
print("\nBasic Prompt Output:")
print(basic_query(user_question_2))
```


```text
--- Question 2: Detective Character ---

Custom Prompt Output:
(Custom Answer)
Using the following context:

A jester and musician who works in Lady Olivia's household. Feste is the wisest character in the play, and uses his wit and intelligence to comment on the actions of the other characters. He is also a confidant to Viola, and helps her navigate her complicated situation. Medium: Play. Setting: Ancient Greece.

A young Indigenous Australian woman in her early 20s, Tahlia is a talented artist who's just been accepted into a prestigious art school. She's the niece of Mia and Max, and they've been like siblings since they were young. She's struggling to find her place in the world as an Indigenous woman, but Mia and Max are always there to support her. Medium: Limited Series. Setting: Australia.

A chameleon-like performer, Karma is known for her ability to transform herself into any character. She's a master of illusion and is always pushing boundaries with her looks and performanc
```

```python
while True:
    user_input = input("\nAsk about a character (type 'exit' to quit): ")
    if user_input.lower() == 'exit':
        break
    print(custom_query(user_input))
```


```text
Ask about a character (type 'exit' to quit): exit
```
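The loop above raises if input ends unexpectedly (for example, an `EOFError` when stdin is closed) and passes empty lines straight to `custom_query`. A minimal sketch of a more forgiving loop; `chat_loop` and its `read`/`write` parameters are hypothetical, introduced so the loop can be exercised without typing.

```python
def chat_loop(answer, read=input, write=print):
    """Run a question/answer loop until the user types 'exit' or input ends.

    `answer` is any callable mapping a question string to a reply string
    (custom_query in this notebook). `read` and `write` default to input/print
    but can be swapped out, e.g. for testing.
    """
    while True:
        try:
            user_input = read("\nAsk about a character (type 'exit' to quit): ")
        except (EOFError, KeyboardInterrupt):
            break  # closed stdin or Ctrl-C ends the session instead of raising
        question = user_input.strip()
        if question.lower() == "exit":
            break
        if not question:
            continue  # skip empty lines rather than querying with ""
        write(answer(question))
```

In the notebook, `chat_loop(custom_query)` would behave like the original cell for ordinary questions.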
