In [None]:
import catllm as cat
import os

Here we set our API keys. I'm reading mine from a file on my machine, but you can also just copy and paste the key string.

In [None]:
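# replace each placeholder string with your own key
anthropic_api_key = "ANTHROPIC_API_KEY"
openai_api_key = "OPENAI_API_KEY"
perplexity_api_key = "PERPLEXITY_API_KEY"
mistral_api_key = "MISTRAL_API_KEY"
google_api_key = "GOOGLE_API_KEY"
huggingface_api_key = "HUGGINGFACE_API_KEY"  # used in the Hugging Face example below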
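
If you would rather not paste keys into the notebook, one common alternative is to read them from environment variables. The sketch below uses only the standard library; the helper name `load_api_key` is hypothetical and is not part of CatLLM.

```python
import os

def load_api_key(env_var, default=None):
    """Return the key stored in an environment variable.

    Raises KeyError with a readable message when the variable is unset
    and no default is given. (Hypothetical helper, not a CatLLM function.)
    """
    value = os.environ.get(env_var, default)
    if value is None:
        raise KeyError(f"Set the {env_var} environment variable before running this notebook.")
    return value

# Falls back to the placeholder string so this cell still runs without a real key.
example_key = load_api_key("ANTHROPIC_API_KEY", default="ANTHROPIC_API_KEY")
```

The returned string can then be passed wherever an `api_key` argument is expected.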

Next, we generate some fake data and the categories to assign to it.

In [None]:
# data we will be categorizing
list_names = ["becuase i dont like living here", "for a bigger house", "to be with my wife"]

#categories we want to extract from the data
user_categories = ["to start living with or to stay with partner/spouse",
                   "relationship change (divorce, breakup, etc)",
                   "the person had a job or school or career change, including transferred and retired",
                   "the person's partner's job or school or career change, including transferred and retired",
                   "financial reasons (rent is too expensive, pay raise, etc)",
                   "related specifically to features of the home, such as a bigger or smaller yard"]

Below we're running the simplest example of the package. The essential elements are defined above:
1. The survey input you want to categorize (a column of text data)
2. The categories you want to extract (defined by you)
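
In practice, the survey input usually starts out as a column in a file rather than a hand-typed list. As a minimal sketch, assuming a CSV export with a made-up column named `why_moved`, the responses can be pulled into a plain Python list with the standard library:

```python
import csv
import io

# Hypothetical CSV export of survey responses; the column names are invented.
raw_csv = """respondent_id,why_moved
1,becuase i dont like living here
2,for a bigger house
3,to be with my wife
"""

def column_to_list(csv_text, column):
    """Return the non-empty values of one CSV column as a list of strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]

responses = column_to_list(raw_csv, "why_moved")
print(responses)
```

A list built this way plays the same role as `list_names` above.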

In [None]:
test_anthropic = cat.multi_class(
    survey_input= list_names, 
    user_model="claude-sonnet-4-20250514",
    categories=user_categories,
    api_key=anthropic_api_key)

test_anthropic.head()

The package will automatically detect when you switch model source. Below, we're running an example with GPT-5. Just remember to pass the API key that matches the provider!

In [None]:
test_openai = cat.multi_class(
    survey_question="why did you move?", 
    survey_input= list_names, 
    user_model="gpt-5",
    categories=user_categories,
    api_key=openai_api_key)

test_openai.head()

One of the most powerful features of CatLLM is the ability to call on the thousands of models available on Hugging Face. For these models, you might need to be more explicit about where the model is coming from.

Below, we set model_source to "Huggingface" so that CatLLM knows where to pull from. Again, make sure you have the correct API key!

WARNING: Using Hugging Face without a Hugging Face Pro subscription is not recommended. The free tier will only allow you to cycle through a few rows.

In [None]:
test_huggingface = cat.multi_class(
    survey_input= list_names, 
    user_model="meta-llama/Llama-4-Maverick-17B-128E-Instruct:groq",
    model_source="Huggingface",
    categories=user_categories,
    api_key=huggingface_api_key)

test_huggingface.head()

Let's add a bit more complexity for better results. 

Rather than just giving the model the task of categorizing without any context, let's provide CatLLM with the survey question that was asked of the respondent. 

Let's also ask CatLLM to give the model a bit more context on what its role and goal are.

Keep in mind that these features make for a longer prompt, which translates to higher costs from the model provider. Only use them if you want to improve the quality of your output and you're willing to pay a bit more.

In [None]:
test_anthropic_with_context = cat.multi_class(
    survey_input= list_names, 
    survey_question = "Why did you move?", # add your survey question here
    context_prompt = True, # ask CatLLM to provide a bit more context on the categorization task
    user_model="claude-sonnet-4-20250514",
    categories=user_categories,
    api_key=anthropic_api_key)

test_anthropic_with_context.head()
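
To get a feel for how a longer prompt translates into cost, here is a rough back-of-the-envelope sketch. Every number in it (token counts and per-token prices) is a made-up placeholder, not a real provider price and not a measurement of CatLLM's prompts; `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(n_rows, input_tokens_per_row, output_tokens_per_row,
                  usd_per_million_input, usd_per_million_output, calls_per_row=1):
    """Estimate total spend in USD for categorizing n_rows responses."""
    total_input = n_rows * calls_per_row * input_tokens_per_row
    total_output = n_rows * calls_per_row * output_tokens_per_row
    return (total_input / 1_000_000 * usd_per_million_input
            + total_output / 1_000_000 * usd_per_million_output)

# Placeholder figures: 1,000 rows, with a basic prompt vs. a longer prompt with context.
basic = estimate_cost(1000, 300, 50, usd_per_million_input=3.0, usd_per_million_output=15.0)
with_context = estimate_cost(1000, 450, 50, usd_per_million_input=3.0, usd_per_million_output=15.0)
print(f"basic: ${basic:.2f}, with context: ${with_context:.2f}")
```

Swap in your provider's current prices and your own token counts to get a usable estimate.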

Now, let's add even more complexity. Before categorizing, let's ask the model to take a "step back" and observe the bigger picture here. 

In this case, we will ask the model to consider broader reasons for why people move so that we can get it to reason towards a better answer. 

Again, this will increase your prompt size, so use only if you don't mind the additional API costs.

In [None]:
test_anthropic_step_back = cat.multi_class(
    survey_input= list_names, 
    survey_question = "Why did you move?",
    context_prompt = True,
    step_back_prompt = True, # ask the model to consider the broader conceptual background
    user_model="claude-sonnet-4-20250514",
    categories=user_categories,
    api_key=anthropic_api_key)

test_anthropic_step_back.head()
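
To make the step-back idea concrete, here is a generic sketch of how such prompts could be assembled. It only illustrates the technique: the wording, the function names, and the `background` text are all assumptions, not CatLLM's actual prompts or internals.

```python
def step_back_question(survey_question):
    """Build the broader question asked before any single response is classified."""
    return (
        f'Survey question: "{survey_question}"\n'
        "Before looking at any answers, what are the main underlying reasons "
        "people typically give in response to this question?"
    )

def classification_prompt(survey_question, response, categories, background):
    """Build the classification prompt, placing the step-back background first."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(categories, start=1))
    return (
        f"Background: {background}\n"
        f'Survey question: "{survey_question}"\n'
        f'Response: "{response}"\n'
        f"Categories:\n{numbered}\n"
        "Which categories apply to this response?"
    )

# Placeholder standing in for the model's reply to the step-back question.
background = "People usually move for housing, family, work, or money reasons."
prompt = classification_prompt("Why did you move?", "for a bigger house",
                               ["financial reasons", "features of the home"], background)
print(step_back_question("Why did you move?"))
print(prompt)
```

The extra background text is what makes the prompt longer, and therefore more expensive, than the plain version.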

The step back method works to improve the quality of your categorizations, but chain of verification (CoVe) should improve them even further. 

What is CoVe? Here, we use multiple prompts to get the model to "reason" through its response by considering more than surface-level information. That is, we ask the model a series of "verification" questions so that it thinks a bit more about its initial answer and has the opportunity to revise it.

WARNING: Although we can combine all of these features, doing so will multiply your costs. Only combine all features if you're hoping for the absolute best output and don't mind paying 10 times as much for a small improvement.

In [None]:
test_anthropic_cove = cat.multi_class(
    survey_input= list_names, 
    survey_question = "Why did you move?",
    context_prompt = True,
    step_back_prompt = True,
    chain_of_verification = True, # allow the model to think through its initial response and change its answer
    user_model="claude-sonnet-4-20250514",
    categories=user_categories,
    api_key=anthropic_api_key)

test_anthropic_cove.head()
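
The draft, verify, revise loop behind CoVe can be sketched generically as follows. This is an illustration under stated assumptions, not CatLLM's implementation: the prompt wording and the number of calls are invented, and `fake_model` is a canned stand-in so the example runs without an API key.

```python
def chain_of_verification(ask, response, categories):
    """Run a draft -> verify -> revise loop.

    `ask` is any callable that takes a prompt string and returns text.
    """
    category_list = "; ".join(categories)
    draft = ask(f"Classify: assign the response '{response}' to one of: {category_list}")
    questions = ask(
        f"Verify: list questions, one per line, that would check whether "
        f"'{draft}' is the right category for '{response}'"
    ).splitlines()
    checks = [(q, ask(f"Answer: {q} (response: '{response}')")) for q in questions if q.strip()]
    evidence = " ".join(f"{q} {a}" for q, a in checks)
    return ask(f"Revise: given the draft '{draft}' and these checks: {evidence}, give the final category")

calls = []  # records every prompt sent, to show how the call count grows

def fake_model(prompt):
    """Canned replies standing in for a real model (hypothetical)."""
    calls.append(prompt)
    if prompt.startswith("Classify:"):
        return "financial reasons"
    if prompt.startswith("Verify:"):
        return "Does the response mention money?\nDoes the response mention the home itself?"
    if prompt.startswith("Answer:"):
        return "yes" if "home itself" in prompt else "no"
    return "features of the home"

final = chain_of_verification(fake_model, "for a bigger house",
                              ["financial reasons", "features of the home"])
print(final, len(calls))
```

In this sketch one row triggers several model calls instead of one, which is why combining CoVe with the other features multiplies costs.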

Sometimes, however, a simpler prompt might be able to get us all the way there. Turning on Chain-of-Thought (CoT) can improve results without needing many prompts, and it adds only a fraction of the cost of CoVe.

CoT essentially asks the model to reason through its answer a bit more before outputting a response. It does so within the same prompt, rather than needing many prompts.

Below is an example.

In [None]:
test_anthropic_cot = cat.multi_class(
    survey_input= list_names, 
    survey_question = "Why did you move?",
    context_prompt = True,
    step_back_prompt = True,
    chain_of_thought = True, # ask the model to "reason" through its response in the same prompt
    user_model="claude-sonnet-4-20250514",
    categories=user_categories,
    api_key=anthropic_api_key)

test_anthropic_cot.head()
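
To show what "within the same prompt" means, here is a generic sketch of a prompt builder with a CoT switch. The wording and the `build_prompt` helper are assumptions made for illustration; CatLLM's real prompt may look quite different.

```python
def build_prompt(response, categories, chain_of_thought=False):
    """Build a single classification prompt, optionally asking for step-by-step reasoning."""
    lines = [f'Survey response: "{response}"', "Categories:"]
    lines += [f"{i}. {c}" for i, c in enumerate(categories, start=1)]
    if chain_of_thought:
        # The reasoning request lives in the same prompt, so it is still one model call.
        lines.append("Think through which categories apply step by step, then finish "
                     "with a line starting 'Answer:' followed by the matching category numbers.")
    else:
        lines.append("Reply with only the matching category numbers.")
    return "\n".join(lines)

demo_categories = ["financial reasons", "features of the home"]
plain_prompt = build_prompt("for a bigger house", demo_categories)
cot_prompt = build_prompt("for a bigger house", demo_categories, chain_of_thought=True)
print(cot_prompt)
```

Because the CoT version is one slightly longer prompt, with a longer reply, its extra cost stays small compared with the multiple calls CoVe needs.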