# Python APIs for Large Language Models

* * * 

<div class="alert alert-success">  
    
### Learning Objectives
    
* Differentiate between large language models (LLMs) and the applications built on top of them (e.g., ChatGPT, Claude)
* Understand how to programmatically interface with LLMs using APIs
* Apply LLMs to social science research for tasks such as thematic coding, content classification, and structured data extraction at scale
</div>

### Icons Used in This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
⚠️ **Warning:** Heads-up about tricky stuff or common mistakes.<br>
📝 **Poll:** A Zoom poll to help you learn!<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br> 

### Sections
1. [This Workshop](#this)
2. [LLMs as a Research Tool](#)
3. [Motivating Scenario: Thematic Coding](#)
    1. [Approach 1: Manual Coding](#)
    2. [Approach 2: ChatGPT](#)
    3. [Approach 3: The API](#)
4. [Background: LLM Hosting](#)
5. [Using OpenRouter for LLM Access](#)

<a id='this'></a>

## This Workshop

Most of us have used tools like ChatGPT or Claude, but what you may not realize is that these interfaces are **not the language models themselves**. They’re polished applications built on top of large language models (LLMs), adding helpful features like memory, chat formatting, context injection, and more.

In this workshop, we’ll peel back the layers and explore:
- **How LLMs are hosted and accessed through APIs**
- **How tools like ChatGPT are built around them**
- **How LLMs can be useful in social science research**
- **How you can interface directly with LLMs via API calls to power your social science research workflows**

This shift from using LLMs in chat to using them as infrastructure unlocks powerful possibilities for research:
- Extracting structured data from unstructured text  
- Performing thematic coding across interview transcripts  
- Generating summaries, classifications, and annotations at scale  

By the end, you'll know how to go from “let me ask ChatGPT a question” to “let me integrate LLMs into my own social science research workflow to classify data, extract insights, and analyze content at scale.”

## LLMs as a Research Tool

Before we get into the programmatic use of LLMs, let's break down a few ways they can be useful in your social science workflow.

Across a wide variety of research workflows, LLMs can be used to: 
- **Analyze Qualitative Data**: Perform thematic coding, sentiment analysis, and information extraction from interview transcripts, field notes, and open-ended survey responses.
    - *For example*: A social scientist could automatically identify recurring themes like "systemic barriers" or "lack of institutional support" across 50 interview transcripts, saving dozens of hours of manual review.
- **Classify Content**: Categorize thousands of social media posts, news articles, or legal documents based on predefined criteria.
    - *For example*: A political scientist could classify thousands of tweets about a new policy as "in favor," "against," or "neutral," allowing for large-scale analysis of public opinion.
- **Summarize and Synthesize**: Generate concise summaries of long texts, helping you quickly identify key arguments and recurring patterns across a large corpus.
    - *For example*: A historian could feed a model hundreds of personal letters from a historical period and ask it to identify common sentiments or events, providing a high-level overview before a deeper dive.
- **Create Structured Data with `StructuredOutput`**: Transform messy, unstructured text into clean, structured data (e.g., JSON) that is ready for quantitative analysis.
    - *For example*: An urban planner could extract specific details like a building's address, construction date, and historical zoning designation from a collection of digitized city reports, building a database for further analysis.

This workshop will focus on the last two points, showing you how to go from unstructured text to usable data at scale. The following scenario is just one example of how this can be applied.

## Motivating Scenario: Thematic Coding of COVID-19 Narratives

<div class="alert alert-success">  
💡 <i>Tip</i>: In this workshop, we will anchor our learning to one specific case study, demonstrating how to transform unstructured text into usable data at scale. However, the techniques you will learn here are universally applicable to a wide range of research challenges.
</div>

Imagine you're a social science researcher helping compile a book like [When the City Stopped](https://www.cornellpress.cornell.edu/book/9781501780387/when-the-city-stopped/), a project that gathered, sorted, and categorized stories from essential workers across New York City during the COVID-19 pandemic. You’ve sent out an open call and collected hundreds of open-ended stories from nurses, EMTs, grocery clerks, MTA workers, and others. These personal narratives capture moments of fear, resilience, trauma, grief, and hope. You want to analyze them systematically.

You get a few responses back like:
- > "People think of front‑line workers…the grocery workers, transit workers, the first responders… as having helped the city get through it. But that’s not what happened. We helped the city survive it."
- > "We ran out of masks again. There were nights when I cried the whole subway ride home, not because I was scared - though I was - but because I felt like no one saw us."

Your goal is to identify:
- The emotions expressed in each story
- Mentions of material conditions (e.g. PPE shortages)
- Instances of collective solidarity or isolation
- Themes of grief, duty, or burnout

And then categorize, sort, and cluster these examples based on the content and themes present. 

---

### Approach 1: Manual Thematic Coding (Example)

![Manual Thematic Coding](../images/manual_approach.png)

In traditional qualitative research, this would involve:
- Reading a sample of stories manually
- Developing a codebook of recurring themes
- Having multiple researchers apply those codes to each story
- Negotiating disagreements, refining the codes, then repeating for hundreds of entries
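Negotiating disagreements starts with finding them. As a minimal sketch, using hypothetical theme codes from two researchers on the same five stories, the snippet below computes simple percent agreement and lists which stories need discussion. Chance-corrected measures such as Cohen's kappa are common in practice but are left out here to keep the idea visible:

```python
# Hypothetical theme codes assigned by two researchers to the same five stories
coder_a = ["grief", "duty", "burnout", "grief", "solidarity"]
coder_b = ["grief", "burnout", "burnout", "grief", "isolation"]

def percent_agreement(codes_a: list[str], codes_b: list[str]) -> float:
    """Share of stories where both coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same stories")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def disagreements(codes_a: list[str], codes_b: list[str]) -> list[int]:
    """Indices of stories the coders should discuss and reconcile."""
    return [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a != b]

print(percent_agreement(coder_a, coder_b))  # 0.6
print(disagreements(coder_a, coder_b))      # [1, 4]
```

Every round of reconciliation repeats this comparison by hand, which is part of why the manual approach scales poorly.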

Let’s say one narrative reads:
> “We ran out of masks again. There were nights when I cried the whole subway ride home, not because I was scared—though I was—but because I felt like no one saw us.”

A human-coded version might look like:
- emotion: grief, exhaustion
- material_conditions: PPE shortage
- solidarity: absent
- theme: invisibility of labor

Though this process is rigorous, it is also slow, expensive, and hard to scale.

---

### Approach 2: ChatGPT
![ChatGPT Approach](../images/chatGPT_approach.png)

At some point, you realize: "What if I just paste this into ChatGPT and ask it to extract that structured information for me?"
You try it, and voilà, it works! You get a clean response like:

```json
{
  "emotion": ["grief", "exhaustion"],
  "material_conditions": ["PPE shortage"],
  "solidarity": "absent",
  "theme": "invisibility of labor"
}
```

But you still have thousands more samples to get through. You still have to:
- Manually copy and paste each sample into the UI
- Copy and append the output to a document or spreadsheet
- Hope ChatGPT stays consistent
- And most importantly, repeat this process thousands of times

This would all be much easier if you could programmatically call ChatGPT, or rather the model it uses behind the scenes.

    
### Approach 3: Interfacing Directly with the LLMs Behind ChatGPT
![Automated LLM Approach](../images/llm_automated.png)

Instead of working through a web interface, what if you could:
- Programmatically send each sample to the model  
- Receive structured, reliable JSON responses  
- Save everything automatically into a CSV or database  
- Scale up from 10 samples to 10,000
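Here is a minimal sketch of that loop, assuming a `client` like the OpenAI client we configure later in this notebook. The prompt wording, the hypothetical `code_stories` helper, the `coded_stories.csv` filename, and the `max_calls` cap (useful on free tiers with daily request limits) are illustrative choices, not a tested pipeline. Without structured outputs, a model can still reply with malformed JSON, so the sketch records those cases in an `error` column instead of crashing:

```python
import csv
import json

# Illustrative instructions; the wording is our own, not a validated prompt
SYSTEM_PROMPT = (
    "You code narratives from essential workers. Reply with only a JSON object with the keys "
    '"emotion" (list), "material_conditions" (list), "solidarity" (string), and "theme" (string).'
)
CODE_FIELDS = ["emotion", "material_conditions", "solidarity", "theme"]

def code_stories(client, stories, model, out_path="coded_stories.csv", max_calls=10):
    """Hypothetical helper: send each story to the model and write one CSV row per story."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["story", *CODE_FIELDS, "error"])
        writer.writeheader()
        for story in stories[:max_calls]:  # Cap requests to stay within a daily limit
            completion = client.chat.completions.create(
                model=model,
                max_tokens=200,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": story},
                ],
            )
            text = completion.choices[0].message.content
            row = {"story": story}
            try:
                coded = json.loads(text)
                for key in CODE_FIELDS:
                    value = coded.get(key, "")
                    # Flatten lists like ["grief", "exhaustion"] into "grief; exhaustion"
                    row[key] = "; ".join(map(str, value)) if isinstance(value, list) else value
            except (TypeError, AttributeError, json.JSONDecodeError):
                # Empty, non-JSON, or non-object replies are logged instead of stopping the loop
                row["error"] = f"Could not parse reply: {str(text)[:100]}"
            writer.writerow(row)
```

Once the client is set up, a call such as `code_stories(client, stories, "deepseek/deepseek-chat-v3.1:free", max_calls=5)` would code the first five stories into `coded_stories.csv`.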


💡 **Tip**: We will make this reliable with a feature called `StructuredOutput`, which we'll cover later in this workshop. For now, hold tight.

## Background: LLM Hosting
Before we get into how you can interface with these LLMs directly through an API, let's build some context on how LLMs are hosted in the first place.

<img src="../images/chat_gpt_llm_server.png" width="400">

### LLM Server

At its core, a Large Language Model is just a giant neural network: a trained set of weights and biases. When you ask it something, you’re running what’s called **inference**. You pass in some input text, and a **forward pass** through the network predicts the most likely next token (a word or piece of a word). The model appends that token and repeats the process, again and again, until it's done. In principle, with a powerful enough computer, you could run this set of weights and biases locally.
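To make "again and again, until it's done" concrete, here is a toy sketch of autoregressive generation. The hypothetical `TOY_MODEL` table stands in for the neural network and, unlike a real LLM, looks only at the previous token rather than the whole input. Each step picks the single most likely next token (greedy decoding) until an end marker appears:

```python
# Hypothetical next-token probabilities standing in for a real neural network
TOY_MODEL = {
    "<start>": {"essential": 0.6, "the": 0.4},
    "essential": {"workers": 0.9, "<end>": 0.1},
    "workers": {"kept": 0.7, "<end>": 0.3},
    "kept": {"going": 0.8, "<end>": 0.2},
    "going": {"<end>": 1.0},
    "the": {"city": 1.0},
    "city": {"<end>": 1.0},
}

def forward_pass(previous_token: str) -> dict[str, float]:
    """Stand-in for one forward pass: return next-token probabilities."""
    return TOY_MODEL[previous_token]

def generate(max_tokens: int = 10) -> list[str]:
    """Greedy decoding: pick the most likely token, append it, and repeat."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = forward_pass(tokens[-1])
        next_token = max(probs, key=probs.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # Drop the start marker

print(" ".join(generate()))  # essential workers kept going
```

A real model replaces the lookup table with billions of learned weights, which is exactly what makes running it expensive.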

In practice, however, state-of-the-art (SOTA) models tend to be much, much larger than what fits on a personal computer. When you use ChatGPT, Claude, or Gemini, you’re not actually running the model on your laptop. Behind the scenes, you're sending a request to an LLM hosted on a powerful server capable of running these giant models, which can require hundreds of gigabytes of memory and specialized hardware like GPUs.
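A back-of-the-envelope calculation shows where "hundreds of gigabytes" comes from. Just holding the weights takes roughly (number of parameters) × (bytes per parameter). The model sizes below are illustrative, 16-bit storage (2 bytes per parameter) is an assumption, and real deployments also need memory for activations and caches, so treat these numbers as lower bounds:

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory to hold a model's weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Illustrative parameter counts, stored in 16-bit precision (2 bytes per parameter)
for name, params in [("8B model", 8e9), ("70B model", 70e9), ("671B model", 671e9)]:
    print(f"{name}: ~{weight_memory_gb(params, 2):,.0f} GB")
# Prints roughly 16, 140, and 1,342 GB respectively
```

Even the smallest of these fills most laptops' memory, while the largest needs many server-grade GPUs working together.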

### Hosted LLMs and API Servers
Because running an LLM requires massive compute resources, companies host these models in the cloud and provide access through an API server.

An API (Application Programming Interface) server gives your application a way to:
- Send text to the model (request),
- Run an inference (get predictions) remotely,
- And receive the result back (response)

So when you interact with ChatGPT, what’s really happening under the hood looks something like this:

ChatGPT UI → API Call → Hosted LLM → Output → ChatGPT UI

This pattern became the foundation for how LLMs are accessed at scale.
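To see the request → inference → response loop in miniature, the sketch below runs a hypothetical toy "LLM server" on your own machine using only Python's standard library. Instead of a neural network it simply uppercases the prompt, and its JSON loosely mimics the chat format used later in this notebook. A real hosted LLM would run the model at the step marked "Inference":

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ToyLLMHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an LLM API server."""

    def do_POST(self):
        # 1. Request: read the JSON body the client sent
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        prompt = request["messages"][-1]["content"]

        # 2. Inference: a real server would run the model here; we just uppercase the prompt
        output = prompt.upper()

        # 3. Response: send the result back as JSON
        body = json.dumps(
            {"choices": [{"message": {"role": "assistant", "content": output}}]}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence the default per-request logging

def ask(url: str, prompt: str) -> str:
    """Client side: POST a prompt to the server and return the generated text."""
    payload = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # Skip any proxy for localhost
    with opener.open(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Start the server on a free local port (port 0 lets the OS choose) in a background thread
server = HTTPServer(("127.0.0.1", 0), ToyLLMHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

answer = ask(f"http://127.0.0.1:{server.server_port}/chat/completions", "hello from the client")
print(answer)  # HELLO FROM THE CLIENT

server.shutdown()
server.server_close()
```

The client never sees how the answer was produced; it only knows the request and response format, which is what makes hosted models interchangeable.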

### Competing Standards for LLM APIs
As more models were released, different organizations built their own API “specifications” for hosting LLMs. A few major ones emerged:

- OpenAI API Spec – The most widely adopted standard, designed by OpenAI
- Hugging Face TGI (Text Generation Inference) – Hugging Face’s approach to model serving
- vLLM – A fast, GPU-efficient serving engine from UC Berkeley & partners

Each of these defined:
- What endpoints (like /generate or /chat) should exist
- How to format the request body (prompts, parameters, etc.)
- What the output should look like (tokens, probabilities, completions...)
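To see what competing specifications mean in practice, compare how the same request might be phrased for two of these servers. The shapes below follow the OpenAI Chat Completions format and Hugging Face TGI's native `/generate` route as we understand them; the model name is a placeholder, and each server's documentation is the authority on exact fields:

```python
import json

prompt = "Summarize this interview in one sentence."

# OpenAI-style chat request: the prompt is a list of role-tagged messages
openai_style = {
    "endpoint": "POST /v1/chat/completions",
    "body": {
        "model": "some-model-name",  # Placeholder model identifier
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
    },
}

# TGI-style native request: a raw input string plus generation parameters
tgi_style = {
    "endpoint": "POST /generate",
    "body": {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 100},
    },
}

# Same intent, but a different endpoint, different field names, and a different structure
for name, spec in [("OpenAI", openai_style), ("TGI", tgi_style)]:
    print(name, spec["endpoint"])
    print(json.dumps(spec["body"], indent=2))
```

Code written against one of these shapes cannot talk to the other without changes, which is the friction the next section describes.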

### The World Converges: OpenAI's API Format
Similar to how phones began to converge on a single charging standard (like USB-C), the AI developer ecosystem has increasingly converged on OpenAI’s API format as a common interface for LLMs. Today, even third-party models like Mistral, Meta’s LLaMA, and others are often hosted behind APIs that support OpenAI-style endpoints, such as:

- [Responses](https://platform.openai.com/docs/api-reference/responses)
- [Chat](https://platform.openai.com/docs/api-reference/chat)
- [Embeddings](https://platform.openai.com/docs/api-reference/embeddings)

Why the convergence?
- Developer tools (e.g., LangChain, LlamaIndex, OpenRouter) are already built around these formats
- Developers don’t need to learn a new interface for every model (easy model swapping)
- It became a de facto "dialect" for talking to LLMs
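In practice, easy model swapping means the request body stays the same and only the base URL (and API key) changes. A minimal sketch: the OpenAI and OpenRouter base URLs are the ones those services document, while the local URL assumes a vLLM OpenAI-compatible server running on its default port 8000:

```python
# Base URLs of hosts that speak the OpenAI-style API (the local one is an assumption)
HOSTS = {
    "openai": "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "local_vllm": "http://localhost:8000/v1",  # Assumes a vLLM server on the default port
}

def chat_completions_url(base_url: str) -> str:
    """OpenAI-compatible hosts conventionally serve chat at <base_url>/chat/completions."""
    return base_url.rstrip("/") + "/chat/completions"

def chat_body(model: str, prompt: str) -> dict:
    """The request body has the same shape on every host; only the model name differs."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

for host, base_url in HOSTS.items():
    print(f"{host:>11}: {chat_completions_url(base_url)}")
print(chat_body("deepseek/deepseek-chat-v3.1:free", "Hello!"))
```

This is why the OpenAI Python client, pointed at a different `base_url`, can talk to non-OpenAI hosts.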

## Using OpenRouter for LLM Access

For the rest of this workshop, we’ll be using a service called **OpenRouter** to interact with LLMs.

---

## What is OpenRouter?

OpenRouter is a unified API gateway for Large Language Models. It allows you to send a prompt to many different models (like GPT-4, Claude, Mistral, and others) using a single, consistent interface. OpenRouter doesn't host the models itself; instead, it forwards your requests to the providers that do, all through one unified API.

In addition to providing one unified experience, we are using OpenRouter since it doesn't require a credit card to interact with the free tier of models (though you are limited to 50 calls per day).

## Getting Set Up

To use OpenRouter, you’ll need to:
1. Create a free account at [https://openrouter.ai](https://openrouter.ai)
2. Click on your profile (top right) and generate an API key
    - Settings -> API Keys -> Create API Key
3. ⚠️ **Warning:** Save this API key somewhere safe
    - It is best practice to store this key somewhere secure, like an environment variable or a protected file on your system.
    - This is a very sensitive key: anyone who has it can make API requests on your account. Do not share it with anyone or upload it to any public repository like GitHub.
    - You can only view this key once. If you lose it, you will have to create a new one.

For the purposes of this workshop, let's create a new file in your working directory called `API_KEY.txt`, paste the key into it, and then read our key directly from that file.

```python
# Load the API key we saved to a file; strip() removes any trailing newline
with open('API_KEY.txt', 'r') as file:
    API_KEY = file.read().strip()
```
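As the warning above suggests, an environment variable is another common place to keep the key. A minimal sketch that prefers an environment variable and falls back to the file; the variable name `OPENROUTER_API_KEY` and the `load_api_key` helper are our own hypothetical choices, and nothing in the `openai` client reads this variable automatically:

```python
import os
from pathlib import Path

def load_api_key(env_var: str = "OPENROUTER_API_KEY", key_file: str = "API_KEY.txt") -> str:
    """Return the API key from an environment variable, falling back to a local file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.exists():
        return path.read_text().strip()
    raise RuntimeError(f"No API key found: set {env_var} or create {key_file}")
```

With this in place, `API_KEY = load_api_key()` works whether you exported the variable in your shell or saved the file.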

```python
# Install the OpenAI Python client (uncomment to run)
# !pip install openai
```

```python
# Import the OpenAI Python client
from openai import OpenAI

# Notice how we can use the OpenAI client with non-OpenAI hosts/models thanks to the shared API standard
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=API_KEY,  # Read from API_KEY.txt above; never paste the key itself into your code
)
```

Now we have our client set up. Next, we need to decide which model to use.
On OpenRouter, you can navigate to the [Models](https://openrouter.ai/models) tab and search for "free" to view all free models. Notice popular models like Llama 3.2, Google Gemma, and DeepSeek V3. We will choose DeepSeek V3.1, which supports a large context window (the number of tokens it can handle in one request), but feel free to try any other.
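You can also search the model list from code. The sketch below assumes OpenRouter's model list at `https://openrouter.ai/api/v1/models` returns JSON with a `data` list of objects that each have an `id`, and keeps IDs ending in `:free` (the suffix used by `deepseek/deepseek-chat-v3.1:free` below). The sample payload is hypothetical so the filter runs without a network call:

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"

def fetch_models(url: str = MODELS_URL) -> dict:
    """Download the model list as parsed JSON (requires internet access)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

def free_model_ids(payload: dict) -> list[str]:
    """Return sorted IDs of models whose ID ends with the ':free' suffix."""
    return sorted(m["id"] for m in payload.get("data", []) if m["id"].endswith(":free"))

# Hypothetical sample in the assumed response shape, so this runs offline
sample = {"data": [
    {"id": "deepseek/deepseek-chat-v3.1:free"},
    {"id": "deepseek/deepseek-chat-v3.1"},
    {"id": "example/other-model:free"},
]}
print(free_model_ids(sample))
# With internet access: print(free_model_ids(fetch_models()))
```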

⚠️ **Warning:** Be careful not to run these cells too many times. The free tier on OpenRouter allows only **50 free calls per day**. Beyond that, your requests will be blocked until the next day.

💡 **Tip**: You can check how many requests you've used per day by navigating to the [Activity Menu](https://openrouter.ai/activity) in OpenRouter (Top Left Menu -> Activity)

```python
# Request a completion (output) from the Chat Completions endpoint
completion = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1:free",
    max_tokens=150,
    messages=[
        {
            "role": "user",
            "content": "Tell me about the D-Lab at UC Berkeley in 50 words or less."
        }
    ]
)
```

```python
# Pretty-print the completion response
from pprint import pprint
pprint(completion.model_dump())
```

```
{'choices': [{'finish_reason': 'stop',
              'index': 0,
              'logprobs': None,
              'message': {'annotations': None,
                          'audio': None,
                          'content': "UC Berkeley's D-Lab is an "
                                     'interdisciplinary hub offering courses, '
                                     'workshops, and consulting. It equips '
                                     'students and researchers with practical '
                                     'data science and computational skills '
                                     'for real-world social science and '
                                     'humanities research, fostering '
                                     'cutting-edge, data-driven scholarship.',
                          'function_call': None,
                          'reasoning': None,
                          'refusal': None,
                          'role': 'assistant',
                          ...
```

Notice the abundance of data contained in the raw completion object. This object represents the response from the LLM, including the output message, its content, the `finish_reason`, and a variety of other useful metadata. To get the generated text, we focus on the `choices` list.

```python
print(completion.choices[0].message.content)
```

```
UC Berkeley's D-Lab is an interdisciplinary hub offering courses, workshops, and consulting. It equips students and researchers with practical data science and computational skills for real-world social science and humanities research, fostering cutting-edge, data-driven scholarship.
```


Notice that the DeepSeek model's broad knowledge base includes information about UC Berkeley's very own D-Lab.
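The call above set `max_tokens=150`, so a long answer can be cut off mid-sentence. In the OpenAI-style response, `finish_reason` is `"stop"` when the model ended on its own (as in the output above) and `"length"` when it hit the token limit. Here is a sketch of a small helper that returns the text and warns on truncation; `get_text` is a hypothetical name, and the `SimpleNamespace` object only mimics the shape of a real completion so the example runs offline:

```python
import warnings
from types import SimpleNamespace

def get_text(completion) -> str:
    """Return the first choice's text, warning if the model hit the max_tokens limit."""
    choice = completion.choices[0]
    if choice.finish_reason == "length":
        warnings.warn("Response was truncated by max_tokens; consider raising the limit.")
    return choice.message.content

# A stand-in with the same attribute shape as a real completion object
fake = SimpleNamespace(choices=[SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content="UC Berkeley's D-Lab is an"),
)])
print(get_text(fake))
```

With a real response, `print(get_text(completion))` behaves like the cell above but tells you when the answer was cut short.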

### Congrats! This marks the end of the first part of today's workshop! 

Now that we've established the 'why,' Part 2 will dive into the 'how.' We will directly interface with the API that powers platforms like ChatGPT, breaking down the fundamental parameters of the API call. After learning these fundamentals, you will be able to use the API to power your own scalable research workflows.