<div style="background-color: #002676; padding: 20px;">
<img src="https://live-masters-in-computational-social-science.pantheon.berkeley.edu/wp-content/uploads/2025/04/image-3-2.png" alt="MaCSS" width="200">
</div>

# **Notebook 5:** Accessing LLMs

[wdtmacss@berkeley.edu](mailto:wdtmacss@berkeley.edu)\
**Computational Social Science 1A**\
[Human Psychology and Social Technologies](https://classes.berkeley.edu/content/2025-fall-compss-214a-001-lec-001)\
Fall 2025\
UC Berkeley [Masters in Computational Social Science](https://macss.berkeley.edu/about/)

**Week 5:** Welcome! Accessing frontier models as a data analyst; evaluating open-weight models on HuggingFace.

🤖🤖🤖🫣

---

**Table of Contents**

- [Class Summary](#class-summary)
- [Introduction: Model Landscape](#model-landscape)
- [Today's Research Session](#research-session)
- [Today's Coding Session](#lab-session)

---

<div style="padding: 20px;">
<img src="https://upload.wikimedia.org/wikipedia/commons/d/d6/Hf-logo-with-title.svg" alt="" width="450">
</div>

<a id="class-summary"></a>

# Class Summary
* Today we will ...
---

<a id="model-landscape"></a>

# Introduction: An Evolving Landscape of LLM Accessibility

## Which Model?
ChatGPT? Claude? Gemini? DeepSeek? How should a data analyst choose which model to use in the current landscape? How do the capabilities of different models differ? What are the different ways to access each model, how much do they cost, and how are your queries stored and used? Questions such as these are central for data analysts who wish to make use of the latest advances.

## Local vs. Remote Hosting
An important distinction is whether you intend to host your own language model or use a model that is provided and hosted by another party. These days most use cases involve models hosted by a provider, and this is certainly the case for proprietary frontier models. But especially for smaller and open-weight models, running them on your own machine or on a cloud computing service is an important possibility and can bring major advantages, such as keeping your data on infrastructure you control and fixing the exact model version you use. The primary drawback is that the capabilities of smaller and open-weight models are more limited; we will explore some of these limitations systematically later today.

### Running Models Locally
Models with fewer parameters can be hosted locally on your own computer, or on a computing service such as Google Colaboratory or another cloud computing provider. Later in today's class, we will interact with a model hosted on HuggingFace.

There are two principal dimensions to consider: first, are the model's weights openly available; second, how large is the model? As an example, [OpenAI's GPT2 model](https://en.wikipedia.org/wiki/GPT-2) is openly available, meaning that its pre-trained weights can be downloaded and used by anyone. The largest version of GPT2 has 1.5 billion parameters. While this is a large number, a model of this size can be run locally using only CPU (not GPU) resources.
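A rough way to see why a model of this size fits on an ordinary laptop is to multiply the parameter count by the number of bytes used to store each parameter. The sketch below does that arithmetic, and also shows one common way to run GPT2 on CPU with the `transformers` library. Note that on the HuggingFace Hub the ID `gpt2` refers to the smallest (124-million-parameter) checkpoint; the 1.5-billion-parameter version is published as `gpt2-xl`. Both `weight_memory_gb` and `generate_locally` are hypothetical helpers, and the memory estimate ignores the extra memory needed while generating text.

```Python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough memory (in GB, where 1 GB = 1e9 bytes) needed just to store the weights."""
    return n_params * bytes_per_param / 1e9


def generate_locally(prompt: str, model_id: str = "gpt2", max_new_tokens: int = 30) -> str:
    """Generate a continuation on CPU; assumes `transformers` and `torch` are installed."""
    from transformers import pipeline  # imported here so the helper above works without it

    generator = pipeline("text-generation", model=model_id, device=-1)  # device=-1 means CPU
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]


# 1.5 billion parameters stored as 32-bit floats (4 bytes each)
print(weight_memory_gb(1.5e9, 4))  # 6.0
# The same weights stored in 16-bit precision (2 bytes each)
print(weight_memory_gb(1.5e9, 2))  # 3.0
```

Calling `generate_locally("The capital of France is")` downloads the checkpoint on first use and then runs entirely on your own machine.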

### Interacting with Models Hosted on a Server
The primary alternative to running a model locally is to rely on a third party to provide access to the model. For example, Anthropic provides access to its Claude family of models hosted on its own servers. Users can interact with these models in various ways without the difficulties involved in hosting them, benefiting from the GPU resources and other computing infrastructure that Anthropic provides to serve the models. The same is true of other major providers, such as OpenAI and Google.

#### Browser-based Chat Interfaces
We are likely most familiar with browser-based chat interfaces to large language models. For example, Google's Gemini family of models can be accessed through a conversational chat interface in the browser or through a dedicated mobile app.

#### Model APIs
As data analysts, we are likely to need more systematic, programmatic access to model responses than is practical through chat interfaces. Moreover, modern chat interfaces include features that may be undesirable for data analysis, such as memory across conversations and other advanced features that make sense in other contexts.

Most major model providers therefore also offer access to their models via an API. APIs exist primarily to serve developers who build apps or services that call the model programmatically. However, as researchers and data analysts we can benefit from model APIs in the same way.

In today's session, we will begin to explore the structure of model APIs and how to use them.
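What makes API access useful for analysis is that the same instructions and settings can be applied identically to many inputs. The sketch below builds one chat-style request per text; it does not send anything. The template wording, the `build_batch` helper, and the placeholder model name `example-model` are hypothetical, and exact field names differ somewhat between providers (for example, OpenAI's newer Responses API takes an `input` field instead of `messages`).

```Python
import json

# Hypothetical instruction applied identically to every text
TEMPLATE = "Is the following post positive or negative? Answer with one word.\n\nPost: {text}"


def build_batch(texts, model, temperature=0.0, max_tokens=5):
    """Build one chat-style request payload per text, all sharing the same settings."""
    return [
        {
            "model": model,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": TEMPLATE.format(text=text)}],
        }
        for text in texts
    ]


posts = ["I love this team!", "Worst commute ever."]
batch = build_batch(posts, model="example-model")  # placeholder model name
print(len(batch))  # 2
print(json.dumps(batch[0], indent=2))
```

Keeping the template and settings fixed across the whole batch is what makes the resulting responses comparable with one another.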

## Some Popular Developer Platforms
OpenAI, Google, Anthropic, and others provide access to their models via their developer platforms.

### OpenAI
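Here is an example from OpenAI's documentation that uses their Python developer tools (the `openai` package):

```Python 
from openai import OpenAI

# OpenAI() reads your API key from the OPENAI_API_KEY environment variable
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Write a short bedtime story about a unicorn."
)

# output_text collects the text of the model's reply
print(response.output_text)
```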
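The toy system later in this notebook mirrors OpenAI's Chat Completions format, in which the prompt is sent as a list of `messages`. As a sketch, assuming the `openai` package is installed and an `OPENAI_API_KEY` environment variable is set, the same kind of request could be written against the Chat Completions endpoint like this; `ask_chat_completions` is a hypothetical helper name.

```Python
def ask_chat_completions(prompt: str, model: str = "gpt-5") -> str:
    """Send a single user message via Chat Completions and return the reply text."""
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The reply text sits at choices[0].message.content
    return completion.choices[0].message.content
```

For example, `print(ask_chat_completions("Write a short bedtime story about a unicorn."))` would send one request and print the reply.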
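### Anthropic
Here is a comparable example using Anthropic's Python developer tools (the `anthropic` package):

```Python
import anthropic

# anthropic.Anthropic() reads your API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1000,  # required: upper limit on the length of the reply
    messages=[
        {
            "role": "user",
            "content": "What should I search for to find the latest developments in renewable energy?"
        }
    ]
)
# message.content is a list of content blocks; the first one holds the reply text
print(message.content[0].text)
```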
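Under the hood, SDK calls like this are ordinary HTTPS requests. The sketch below builds (but does not send) the equivalent request to Anthropic's Messages endpoint using only Python's standard library, to show where the API key, the API version header, and the JSON body go. The helper name `build_messages_request` and the placeholder key are illustrative; in practice the `anthropic` client fills in these headers for you.

```Python
import json
import os
import urllib.request


def build_messages_request(prompt, model="claude-sonnet-4-20250514", max_tokens=1000):
    """Build (without sending) a POST request to Anthropic's Messages API."""
    body = {
        "model": model,
        "max_tokens": max_tokens,  # required by the Messages API
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        # The key is read from the environment; "YOUR_KEY_HERE" is only a placeholder
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", "YOUR_KEY_HERE"),
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_messages_request("What is the capital of France?")
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)
```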

## Open Source Models
One major alternative is to use openly accessible models. Several competitive models are openly accessible but too large to run on your own machine. For such models, we rely on platforms that host them and provide access.

One of the primary platforms is [HuggingFace](https://huggingface.co/). We will discuss HuggingFace and the [models available there](https://huggingface.co/models) in later sections of this class. One way to access models on HuggingFace is through [Inference Providers](https://huggingface.co/docs/inference-providers/en/guides/first-api-call). Here is a basic example:

```Python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hyperbolic",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

# .message.content holds just the text of the reply
print(completion.choices[0].message.content)
```
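HuggingFace's Inference Providers also expose an OpenAI-compatible endpoint, so the same chat-style `messages` list can be sent with the `openai` package by pointing it at a different base URL. A sketch, assuming the `openai` package is installed and `HF_TOKEN` is set; `ask_via_hf_router` is a hypothetical helper, and whether a given model is available depends on which providers currently serve it.

```Python
import os


def ask_via_hf_router(prompt: str, model: str = "openai/gpt-oss-120b") -> str:
    """Send one user message to a Hub model through HuggingFace's OpenAI-compatible router."""
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(
        base_url="https://router.huggingface.co/v1",  # HuggingFace router instead of OpenAI
        api_key=os.environ["HF_TOKEN"],  # a HuggingFace token, not an OpenAI key
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```

This is one reason the chat `messages` format is worth learning: code written against it can often be moved between providers with small changes.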

---

<a id="research-session"></a>

# Today's Research Session
We will dive into the details of several popular models, and the ways that data analysts can access them. Use any resources you like to answer the following set of questions. If you use an AI system to find information, be sure to fact check the AI system! 

The questions below should be interpreted as focusing on language models (text generation models) primarily, rather than on image models or other Generative AI models.  

## OpenAI
Using the [documentation for OpenAI's developer platform](https://platform.openai.com/docs/overview), answer the questions below as best you can.

* What are some of the models available in OpenAI's developer platform? List at least three, and say how they differ.
* How much does it cost to use these three models via the API under the `standard` tier? What is a `token`?
* How does OpenAI handle user authentication when people make requests to their API? See the [API documentation](https://platform.openai.com/docs/api-reference/authentication)
 


## Anthropic
Using the [documentation for Anthropic's developer platform](https://docs.claude.com/en/docs/intro), answer the questions below as best you can.

* What are some of the models available? List at least two, and say how they differ.
* How much does it cost to use these models via the API? What is MTok?
* What is LongContext pricing?
* When code that calls an API incurs costs, there is a risk of accidentally spending too much. Does Anthropic allow you to set hard limits on spending for API calls?
* What kinds of accounts are available for using Anthropic's API?
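Whatever limits a provider offers, you can also add a client-side safeguard to your own code. The sketch below tracks estimated spend from token counts and checks whether a planned call could push you over a cap. `BudgetGuard` is a hypothetical helper; the prices in the example are made-up placeholders, so substitute the per-million-token prices you find on the provider's pricing page. The estimate is only as accurate as the token counts you feed it.

```Python
class BudgetGuard:
    """Client-side spending tracker with a hard cap (in US dollars)."""

    def __init__(self, cap_usd, input_price_per_mtok, output_price_per_mtok):
        self.cap_usd = cap_usd
        self.input_price = input_price_per_mtok  # dollars per million input tokens
        self.output_price = output_price_per_mtok  # dollars per million output tokens
        self.spent_usd = 0.0

    def cost(self, input_tokens, output_tokens):
        """Estimated cost in dollars of one call."""
        return (input_tokens * self.input_price + output_tokens * self.output_price) / 1_000_000

    def can_afford(self, input_tokens, max_output_tokens):
        """True if the call stays under the cap even if it uses all of max_output_tokens."""
        return self.spent_usd + self.cost(input_tokens, max_output_tokens) <= self.cap_usd

    def record(self, input_tokens, output_tokens):
        """Add the cost of a completed call to the running total."""
        self.spent_usd += self.cost(input_tokens, output_tokens)


# Placeholder prices for illustration only, not any provider's real rates
guard = BudgetGuard(cap_usd=5.0, input_price_per_mtok=1.0, output_price_per_mtok=4.0)
guard.record(input_tokens=2_000_000, output_tokens=500_000)
print(guard.spent_usd)  # 4.0
print(guard.can_afford(200_000, 300_000))  # False
```

Checking against `max_output_tokens` rather than the actual reply length is a deliberately conservative choice, since the reply length is unknown until the call returns.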


## Google
Using the [documentation for Google AI's developer platform](https://ai.google.dev/), answer the questions below as best you can.

* What are some of the models available? List at least two, and say how they differ.
* What are the pricing tiers for Gemini API usage?
* What are rate limits? 
* In general, how does Gemini's pricing model differ from Claude and GPT?

## UC Berkeley Options? 
Recently, UC Berkeley has begun to provide access to AI models for our campus community. What options do researchers have for interacting with these models via an API?

Check out the following university pages to explore this question:

* [UC Berkeley AI Hub](https://ai.berkeley.edu/)
* [Models Licensed to the UC Berkeley Community](https://ai.berkeley.edu/resources/licensed-generative-ai-tools)

## Stretch Goal: Open Source Models on HuggingFace
This exercise is more open-ended, so feel free to return to it later. How are open-source frontier models such as DeepSeek R1 accessed via HuggingFace? Is there an API you can use? Who provides the model?


<a id="lab-session"></a>

# Today's Coding Session
Today we're going to interact with a handful of smaller, openly available models hosted on HuggingFace. To facilitate this, I have built a toy system that we can interact with. Our system is based loosely on the structure of OpenAI's [Chat Completions](https://platform.openai.com/docs/guides/text) API. Read through the examples below, and try to answer the questions by using and editing the code provided.

## Toy System: A HuggingFace App
I have created a [HuggingFace Organization for this class](https://huggingface.co/macss-css). Within this organization, I have built a [HuggingFace Space for this lab session](https://huggingface.co/spaces/macss-css/llm-lab-two).

This HuggingFace Space provides an app that allows you to retrieve text completions from one of several smaller models. I will introduce the models below, and provide example code for how to send requests to this app.

## Example
Here is code to make a request to our system. First we need to import the requests library and specify two things: the URL of our toy system and the name of the model we wish to access.

```Python
import requests

# Here is the URL of the app
URL = "https://macss-css-llm-lab-two.hf.space"

model = "gpt2"
```

The function below can be used to send a request to our app. 

```Python
def send_request(request):
    response = requests.post(f"{URL}/predict", json=request)
    return response.json()
```
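`send_request` assumes the app answers promptly and always returns JSON. HuggingFace Spaces may go to sleep after a period of inactivity, so the first request can be slow, and an error page would make `response.json()` fail with a confusing message. A slightly more defensive variant might add a timeout and check the HTTP status first; the name `send_request_checked` and the 120-second timeout are assumptions, not part of the lab app.

```Python
def send_request_checked(request, url="https://macss-css-llm-lab-two.hf.space", timeout=120):
    """POST a request to the lab app, with a timeout and an explicit HTTP status check."""
    import requests  # same library used by send_request

    response = requests.post(f"{url}/predict", json=request, timeout=timeout)
    response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
    return response.json()
```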

The structure of our request object is as follows. First, we need to specify the messages. To mirror the structure of the OpenAI Chat Completions API, our message object is a list containing a dictionary (a JSON object). This dictionary has two fields, `role` and `content`. The role is always `"user"` in our setting, and the content is the message string that we would like the model to complete.

```Python
example_message = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]
```


Once we have this object, we combine it into a request dictionary that also specifies the model we wish to use (`model`), the maximum number of tokens the model may generate in its response (`max_tokens`), and the amount of randomness we want in the model's output (`temperature`). The temperature should be between zero and one, where one is the most random and zero is deterministic.

```Python
example_request = {
    "messages": example_message,
    "model": model,
    "temperature": 0.7,
    "max_tokens": 50,
}
```
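To build intuition for `temperature`: in the standard formulation, a model assigns a score (logit) to every candidate next token, the scores are divided by the temperature, and a softmax turns them into sampling probabilities. Lower temperatures concentrate probability on the top-scoring token; a temperature of exactly zero would divide by zero, so implementations typically treat it as always picking the top token. The scores below are made up, and how the lab app implements temperature internally is an assumption.

```Python
import math


def softmax_with_temperature(logits, temperature):
    """Convert next-token scores into probabilities at a given temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens
logits = [2.0, 1.0, 0.1]
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the top token's probability growing as the temperature falls, which is why low-temperature outputs are more repeatable.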

We can use the function above to send this request to our system and see how the model responds:

```Python
result = send_request(example_request)
```

Let's look at the result object to understand its structure:

```Python
print(result)
```

```
{'choices': [{'message': {'role': 'assistant', 'content': ' How about England and Scotland, where there are no Englishmen left in Europe.\nThe point to this article has been made by a lot of people who have had time to read it before reading my analysis: "Capitalism\'s Great New Idea"'}}], 'model': 'gpt2', 'usage': {'prompt_tokens': 6, 'completion_tokens': 41, 'total_tokens': 47}}
```
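The result follows the same nesting as a Chat Completions response: the reply text sits under `choices`, and `usage` reports how many tokens the prompt and the completion used (here 6 + 41 = 47). A small helper like the hypothetical `summarize_result` below can flatten this into one record per request, which is convenient when collecting many responses into a table.

```Python
def summarize_result(result):
    """Flatten a chat-completion-style response into a single dict."""
    choice = result["choices"][0]
    usage = result.get("usage", {})
    return {
        "model": result.get("model"),
        "text": choice["message"]["content"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# The result printed above, with the reply text shortened
example_result = {
    "choices": [{"message": {"role": "assistant", "content": " How about England and Scotland, ..."}}],
    "model": "gpt2",
    "usage": {"prompt_tokens": 6, "completion_tokens": 41, "total_tokens": 47},
}
summary = summarize_result(example_result)
print(summary["total_tokens"])  # 47
```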


And print out the text of the model response specifically:

```Python
model_response = result["choices"][0]["message"]["content"]
print(model_response)
```

```
 How about England and Scotland, where there are no Englishmen left in Europe.
The point to this article has been made by a lot of people who have had time to read it before reading my analysis: "Capitalism's Great New Idea"
```


### Configurable aspects
We can configure all aspects of the request object (but for now don't change the model). Here's another example:

```Python
second_example_message = [
    {
        "role": "user",
        "content": "The letter of the alphabet that comes after 'd' is:"
    }
]

second_example_request = {
    "messages": second_example_message,
    "model": model,
    "temperature": 0.85,
    "max_tokens": 20,
}
```

```Python
new_result = send_request(second_example_request)
print(new_result["choices"][0]["message"]["content"])
```

```
 3 + 2 = 5
A. In a way, we are asking for more than one part
```
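Because sampling is random when the temperature is above zero, the same request can return a different completion each time. One way to see this is to resend an identical request several times at a few temperatures. The helper `sample_repeatedly` below is a hypothetical sketch; it takes the sending function as an argument, so in the notebook you would pass in `send_request` from above.

```Python
def sample_repeatedly(send_fn, prompt, model, temperature, max_tokens=20, n=3):
    """Send the same request n times and return the list of reply texts."""
    request = {
        "messages": [{"role": "user", "content": prompt}],
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return [send_fn(request)["choices"][0]["message"]["content"] for _ in range(n)]


# Usage in the notebook (this sends real requests to the app):
# for t in (0.1, 0.7, 1.0):
#     print(t, sample_repeatedly(send_request, "The letter after 'd' is:", "gpt2", t))
```

Passing the sending function in, rather than calling `send_request` directly, also makes the helper easy to try out with a stand-in function when the app is unavailable.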


## Exercises

### Goal 1: Days of the Week
Does this model understand the days of the week? What day comes after Friday? What is the happiest day of the week? Try out different ways of prompting GPT2 to find out.

```Python
# Your code goes here
```

### Goal 2: List Six Names
Can GPT2 count? Ask it to list specific numbers of things.

```Python
# your code goes here
```

### Goal 3: Judgments About a Piece of Text
Can GPT2 answer questions about a piece of text, such as a social media post? Is this a happy post? Does the post mention a sports team? And so on.

```Python
# Your code goes here
```

### Goal 4: Your Own Evaluation Task
Come up with your own simple evaluation task and test it out on the model. You can be as creative as you like!

```Python
# your code goes here.
```

### Goal 5: Testing Different Models
Choose one or more of the tasks you examined above. Below are three models you can specify instead of GPT2; uncomment a line to select a model. Explore how these different models perform on the task you chose. Vary the parameters as well as the model, and try to find which setup is best at the task, or across multiple tasks.

```Python
# model = "gpt2"
# model = "distilbert/distilgpt2"
# model = "EleutherAI/gpt-neo-125M"
# model = "facebook/opt-125m"
```
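To compare models systematically, it helps to fix a set of test prompts and score every model on the same set. The sketch below is a hypothetical harness: each test case pairs a prompt with an answer string, and a reply counts as correct if it contains that string (ignoring case). Substring matching is a crude scoring rule chosen for simplicity; you may want something stricter for your own tasks.

```Python
def evaluate_models(send_fn, models, test_cases, temperature=0.7, max_tokens=20):
    """Return, for each model, the share of test cases whose reply contains the expected answer."""
    scores = {}
    for model_name in models:
        hits = 0
        for prompt, expected in test_cases:
            request = {
                "messages": [{"role": "user", "content": prompt}],
                "model": model_name,
                "temperature": temperature,
                "max_tokens": max_tokens,
            }
            reply = send_fn(request)["choices"][0]["message"]["content"]
            if expected.lower() in reply.lower():
                hits += 1
        scores[model_name] = hits / len(test_cases)
    return scores


# Hypothetical test cases; replace them with prompts from your own tasks
cases = [
    ("The day after Friday is", "Saturday"),
    ("One, two, three, four,", "five"),
]
# scores = evaluate_models(send_request, ["gpt2", "distilbert/distilgpt2"], cases)
```

Because sampled replies vary, running each model more than once (or at a low temperature) gives a more reliable comparison.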