# Using LLMs in Humanities Research via API

Welcome to the world of Large Language Models (LLMs) and their applications in humanities research! As artificial intelligence reshapes many fields of study, the humanities are seeing significant changes in how we approach text analysis, interpretation, and research methodology.

### Our Team: Instructors and Assistants

**Valdis Saulespurēns** works as a researcher and developer at the National Library of Latvia. Additionally, he is a lecturer at Riga Technical University, where he teaches Python, JavaScript, and other computer science subjects. Valdis has a specialization in Machine Learning and Data Analysis, and he enjoys transforming disordered data into structured knowledge. With more than 30 years of programming experience, Valdis began his professional career by writing programs for quantum scientists at the University of California, Santa Barbara. Before moving into teaching, he developed software for a radio broadcast equipment manufacturer. Valdis holds a Master's degree in Computer Science from the University of Latvia.

**Anda Baklāne** is a researcher and curator of digital research services at the National Library of Latvia. She teaches Introduction to Digital Humanities and Digital Social Sciences and Text Analysis and Visualization courses at the University of Latvia. Anda holds a master's degree in philosophy and a PhD in literary theory. Her research interests include Latvian contemporary literature, metaphor, models, distant reading, and academic data visualization.

**Viesturs Vēveris** is an analyst at the National Library of Latvia. With training in digital humanities and social sciences, he currently focuses on developing tools and methodologies for text analysis and data visualization.

**Haralds Matulis** is a researcher and the organizer of this iteration of the Baltic Summer School of Digital Humanities. He has a background in digital humanities and is interested in the intersection of technology and humanities research. Haralds is dedicated to promoting digital literacy and innovation in the humanities.

### Why This Workshop Matters

The digital transformation of humanities research has opened unprecedented opportunities for scholars to analyze vast corpora of text, uncover hidden patterns, and gain new insights into human culture and expression. Large Language Models represent the cutting edge of this transformation, offering powerful tools for:

- **Automated text analysis** at scale previously impossible for human researchers
- **Cross-lingual research** capabilities that break down language barriers
- **Pattern recognition** in literary and historical texts
- **Assistance with translation and transcription** of historical documents

### Workshop Goals

By the end of this three-session workshop, you will:

1. **Understand the fundamentals** of Large Language Models and their capabilities for humanities research
2. **Master API interactions** to programmatically access and utilize various LLM services
3. **Learn practical applications** including concept mining, named entity recognition, and text analysis
4. **Develop skills** in prompt engineering for humanities-specific tasks
5. **Address real challenges** such as working with OCR errors in historical texts
6. **Gain hands-on experience** with tools for error correction and translation
7. **Build confidence** in integrating AI technologies into your research workflow

### What Makes This Approach Special

Rather than relying on simple chat interfaces, you'll learn to harness the full power of LLMs through API access, enabling:
- **Batch processing** of large document collections
- **Customizable workflows** tailored to your specific research needs
- **Reproducible research** methods with documented processes
- **Integration** with existing digital humanities tools and methodologies


## Session 1 (11.30-13.00) - Introduction to LLMs and APIs

In our first session, we will explore the basics of Large Language Models (LLMs) and how to interact with them using APIs. We will cover the following topics:
- **What are LLMs?**: An introduction to Large Language Models, their capabilities, and how they can be applied in humanities research.
- **Setting Up Your Environment**: Instructions on how to set up your programming environment to interact with LLM APIs.
- **Understanding APIs**: A brief overview of what APIs are, how they work, and why they are essential for accessing LLMs.
- **Understanding JSON**: An introduction to JSON (JavaScript Object Notation), the data format commonly used for API responses, and how to work with it in Python.
- **OpenRouter API**: Introduction to the OpenRouter API, which provides access to various LLMs.

## What are LLMs?

**Large Language Models (LLMs)** are sophisticated artificial intelligence systems trained on vast collections of text data to understand, generate, and manipulate human language. Think of them as extremely well-read digital assistants that have absorbed millions of books, articles, websites, and documents, enabling them to engage with text in remarkably human-like ways.

### How LLMs Work: The Basics

LLMs use a technology called **transformer architecture** (you don't need to understand the technical details!) that allows them to:

1. **Predict the next word** in a sequence based on context
2. **Understand relationships** between words, sentences, and concepts
3. **Generate coherent text** that follows patterns learned from training data
4. **Transfer knowledge** from one domain to another

### Key Terms for Digital Humanities

#### **Training Data**
The massive collection of texts used to teach the LLM. This typically includes:
- Books and literature from various periods and cultures
- Academic papers and journals
- News articles and magazines
- Web content and reference materials
- **Important**: The quality and diversity of training data affects what the model "knows"

#### **Tokens**
The basic units of text that LLMs process. A token can be:
- A whole word ("humanities")
- Part of a word ("human" + "ities")
- Punctuation marks
- **Why it matters**: API costs are often calculated per token

#### **Context Window**
The amount of text an LLM can "remember" at once, measured in tokens. Sizes depend on the model and version; for example:
- **GPT-3.5**: ~4,000 tokens (≈3,000 words)
- **GPT-4**: ~8,000-32,000 tokens
- **Claude**: ~100,000+ tokens
- **Why it matters**: Determines how much text you can analyze at once
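
To make tokens and context windows concrete, here is a minimal sketch that estimates a token count from a word count and splits a long text into pieces that fit a token budget. It assumes the rough ratio quoted above (about 3,000 words per 4,000 tokens); the real ratio depends on the model's tokenizer and on the language of the text, so treat the numbers as estimates only. The names `estimate_tokens` and `split_into_chunks` are illustrative helpers, not part of any library.

```python
import math

# Assumption: roughly 0.75 words per token (about 3,000 words per 4,000 tokens).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def split_into_chunks(text: str, max_tokens: int) -> list[str]:
    """Split text into word-based chunks whose estimated size fits max_tokens."""
    words = text.split()
    words_per_chunk = max(1, int(max_tokens * WORDS_PER_TOKEN))
    return [
        " ".join(words[start:start + words_per_chunk])
        for start in range(0, len(words), words_per_chunk)
    ]


sample = "word " * 1000  # stand-in for a 1,000-word document
print("Estimated tokens:", estimate_tokens(sample))

chunks = split_into_chunks(sample, max_tokens=400)
print("Number of chunks:", len(chunks))
```

For billing or for hard limits, use the token counts that the API itself reports rather than an estimate like this one.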

#### **Prompt**
The input text you give to an LLM to get a response. Effective prompting is crucial for good results.

#### **Fine-tuning**
The process of further training a model on specific data to improve performance for particular tasks.

### Applications in Digital Humanities

#### **1. Text Analysis**
- **Sentiment analysis** of historical documents
- **Thematic analysis** across large corpora
- **Stylometric analysis** for authorship attribution
- **Content classification** and categorization

#### **2. Language Processing**
- **Translation** of historical texts
- **Transcription** assistance for handwritten documents
- **OCR error correction** in digitized materials
- **Modernization** of archaic language

#### **3. Research Assistance**
- **Literature reviews** and source discovery
- **Citation analysis** and bibliography generation
- **Concept mapping** and knowledge extraction
- **Hypothesis generation** from patterns in data

#### **4. Content Generation**
- **Metadata generation** for digital collections
- **Summary creation** for large document sets
- **Educational material** development
- **Interactive exhibits** and digital storytelling

### Limitations and Considerations

#### **Accuracy Concerns**
- LLMs can generate plausible but incorrect information (**hallucinations**)
- Always verify important claims against primary sources
- Use multiple models and cross-check results

#### **Bias and Representation**
- Training data reflects societal biases
- May underrepresent certain cultures, languages, or perspectives
- Critical evaluation is essential, especially for sensitive topics

#### **Temporal Knowledge**
- Models have knowledge cutoff dates
- May not know about recent events or publications
- Historical accuracy varies by period and region

#### **Language Coverage**
- Performance varies significantly across languages
- Better results for well-represented languages (English, major European languages)
- Limited effectiveness for minority or historical languages

#### **Legal Considerations**
- Legality of data acquisition for training models
- Protection of user input; consider how your data is handled when using online AI applications

### Popular LLM Models for Research

#### **OpenAI's GPT Series**
- **GPT-3.5**: Fast, cost-effective for many tasks
- **GPT-4**: More capable, better reasoning, higher cost
- **Strengths**: General knowledge, writing quality
- **Best for**: Text generation, analysis, general research tasks

#### **Anthropic's Claude**
- **Claude-3**: Various sizes (Haiku, Sonnet, Opus)
- **Strengths**: Large context windows, careful reasoning
- **Best for**: Long document analysis, ethical considerations

#### **Google's Gemini**
- **Gemini Pro**: Competitive with GPT-4
- **Strengths**: Multimodal capabilities, integration with Google services
- **Best for**: Research integration, document processing

#### **Open Source Models**
- **Llama 2/3**: Meta's open-source models
- **Mistral**: European open-source alternative
- **Benefits**: Transparency, customization, data privacy

### Getting Started: Questions to Ask

Before using LLMs in your research, consider:

1. **What specific task** do you want to accomplish?
2. **How much text** will you be processing?
3. **What level of accuracy** do you need?
4. **Are there privacy concerns** with your data?
5. **What's your budget** for API usage?
6. **Do you need real-time results** or can processing take time?


### Usage of LLMs at the National Library of Latvia: Three Examples

- **OCR and normalization of Latvian old script**

OCR quality for historical newspapers printed in Fraktur script is often poor in materials digitized in earlier years. LLMs have shown limited effectiveness at correcting this OCR when the original, non-contemporary glyphs must be preserved. However, they are very effective when the goal is normalization: transforming the text into contemporary script rather than reproducing the original form. Overcorrection and hallucinations still occur, but the number of errors is lower than in the historical OCR.

<img src="https://github.com/ValRCS/BSSDH_2025_workshop_LLM_API/blob/main/img/fraktur-normalization.JPG?raw=true" alt="Original OCR (left); processed with Gemini 2.5 Flash Preview" width="500">

- **Building complex, topic-based corpora: Corpus of Latvian Music Texts**

Keyword-based methods are commonly used to build topic-specific datasets from databases. When keywords have broad or ambiguous meanings, this approach often results in high rates of false positives that can exceed 50% of total data. LLMs can help address this issue by providing more context-aware filtering.

<img src="https://github.com/ValRCS/BSSDH_2025_workshop_LLM_API/blob/main/img/system-prompt-music.JPG?raw=true" alt="Example of the prompt" width="500">

- **Extracting semantically related terms: Transportation in Latvian novels**

LLM-based extraction from 460 novels identified approximately 160 valid land transportation term types, double the roughly 80 terms produced by other methods. Testing with GPT-4.0, Gemini 1.5, and Gemini 2.0 confirmed that while this approach generates more useful terms, precision varies significantly, ranging from 46% to 76%. In some cases, more than half of the terms were either hallucinations or only loosely related to commonly accepted definitions of "land vehicle". Full text: 10.26083/tuprints-00030146

![Average relative document frequency of terms referring to motorized and horsedrawn vehicles](https://github.com/ValRCS/BSSDH_2025_workshop_LLM_API/blob/main/img/Fig2-relative.png?raw=true)



## Interactive Version of the Notebook

### Open in Google Colab
<a href="https://colab.research.google.com/github/ValRCS/BSSDH_2025_workshop_LLM_API/blob/main/notebooks/workshop_session_1.ipynb?flush_cache=true" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Static vs Interactive Notebooks

**Static Notebooks** (like what you might see on GitHub) **are read-only versions** that display the content but don't allow you to execute code cells, modify content, install packages, or save your changes.

**Interactive Notebooks** allow you to:
- **Execute code cells** by pressing Shift+Enter or clicking the play button
- **Edit and experiment** with code in real-time
- **Install Python packages** as needed
- **Save your work** and download modified notebooks
- **See live outputs** including text, tables, and visualizations

### About Google Colab

**Google Colab** (Colaboratory) is a free, cloud-based Jupyter notebook environment that:

- **Requires no setup** - runs entirely in your web browser
- **Provides free computational resources** including CPU, GPU, and limited TPU access
- **Comes pre-installed** with most common data science and machine learning libraries
- **Integrates seamlessly** with Google Drive for saving and sharing notebooks
- **Supports real-time collaboration** allowing multiple people to work on the same notebook
- **Automatically saves** your progress to Google Drive

**Getting Started with Colab:**
1. Click the "Open in Colab" badge above
2. Sign in with your Google account (required)
3. The notebook will open in a new tab
4. You can immediately start executing cells by clicking the play button (▶️) or pressing Shift+Enter

**💡 Pro Tip:** Right-click the Colab badge and select "Open link in new tab" to keep this reference page open while working in the interactive notebook!

## Setting Up Your Environment

To interact with LLM APIs effectively, we need to set up our programming environment with the necessary libraries and configurations. This includes installing required packages and setting up API credentials.

```python
# Let's print some basic information about this interactive notebook
print("This is an interactive notebook for the BSSDH 2025 workshop on LLMs and APIs.")
# first let's see what Python version we are using
import sys
print(f"Python version: {sys.version}")
# now today's date and time
from datetime import datetime
print(f"Today's date and time: {datetime.now()}")
# we will need to work with JSON data, so let's import the json module
import json
print("JSON module imported successfully.")
# we will need to read and write files so let's import pathlib
from pathlib import Path
print("Path from pathlib imported successfully.")
# TODO for those with some experience it can be useful to print more information about the environment, free memory, drives, etc.
print("Will import external libraries if available.")
# Let's also check if we have the requests library installed, which is commonly used for making API calls
try:
    import requests
    print(f"Requests library version: {requests.__version__}")
except ImportError:
    print("Requests library is not installed. You can install it using 'pip install requests'.")

# let's check whether tqdm (progress bars) is installed; this cell only reports, it does not install anything
try:
    from tqdm import tqdm
    # import version
    from tqdm import __version__ as tqdm_version
    print(f"TQDM library version: {tqdm_version}")
except ImportError:
    print("TQDM library is not installed. You can install it using 'pip install tqdm'.")

# now let's try importing OpenAI's library if available
try:
    import openai
    print(f"OpenAI library version: {openai.__version__}")
except ImportError:
    print("OpenAI library is not installed. You can install it using 'pip install openai'.")
```



```
This is an interactive notebook for the BSSDH 2025 workshop on LLMs and APIs.
Python version: 3.12.6 (tags/v3.12.6:a4a2d2b, Sep  6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)]
Today's date and time: 2025-07-30 12:01:19.490619
JSON module imported successfully.
Path from pathlib imported successfully.
Will import external libraries if available.
Requests library version: 2.32.4
TQDM library version: 4.67.1
OpenAI library version: 1.97.1
```


### Why Check System Information and Library Versions?

**Environment Documentation** is crucial for reproducible research and troubleshooting. Here's why we print this information:

#### **1. Reproducibility**
- **Version consistency**: Different library versions can produce different results
- **Environment documentation**: Future researchers (including yourself) can recreate the exact same setup
- **Research integrity**: Ensures your findings can be validated by others

#### **2. Troubleshooting**
- **Debugging assistance**: When code doesn't work, version information helps identify compatibility issues
- **Support requests**: Technical support often requires knowing your exact environment setup
- **Error diagnosis**: Many errors are version-specific and can be quickly resolved with this information

#### **3. Best Practices in Digital Humanities**
- **Methodological transparency**: Document all tools and versions used in your research
- **Collaboration**: Team members can ensure they're using compatible environments
- **Publication standards**: Many journals now require detailed technical specifications

#### **4. API Compatibility**
- **Service requirements**: Different LLM APIs may require specific library versions
- **Feature availability**: Newer features might only be available in recent library versions
- **Security updates**: Ensures you're using libraries with the latest security patches

**💡 Pro Tip**: Always run this environment check at the beginning of your research sessions to catch any changes that might affect your results!
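
One possible way to turn this check into a reusable record is to collect the same information in a dictionary and keep it as JSON next to your results. The sketch below uses only the standard library; the function names `package_version` and `environment_snapshot` are illustrative, and the package list is just the three libraries checked above.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata
from typing import Optional


def package_version(name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_snapshot(packages: list[str]) -> dict:
    """Collect interpreter, platform, and package versions in one dictionary."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
    }


snapshot = environment_snapshot(["requests", "tqdm", "openai"])
print(json.dumps(snapshot, indent=2))
```

Saving this output alongside each batch of results gives you a dated record of the environment that produced them.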

## Understanding APIs

An **API (Application Programming Interface)** is a set of rules and protocols that allows different software applications to communicate with each other. Think of an API as a digital messenger that takes your request, tells a system what you want, and then brings the response back to you in a structured format.

### The Restaurant Analogy

Imagine you're at a restaurant:
- **You** (the client) want to order food
- **The kitchen** (the server) prepares the food
- **The waiter** (the API) takes your order to the kitchen and brings your food back

In the digital world:
- **Your Python script** (the client) wants data or a service
- **The LLM service** (the server) processes your request
- **The API** takes your request and returns the results

### Why APIs Matter for Digital Humanities

#### **1. Programmatic Access**
Instead of manually copying and pasting text into web interfaces:
- **Batch processing**: Analyze hundreds of documents automatically
- **Consistency**: Same processing applied to all texts
- **Reproducibility**: Document and repeat your exact methods
- **Efficiency**: Save hours or days of manual work

#### **2. Integration with Research Workflows**
- **Combine with existing tools**: Integrate with databases, spreadsheets, visualization software
- **Custom analysis pipelines**: Build workflows tailored to your specific research questions
- **Data preservation**: Keep detailed logs of all processing steps
- **Scalability**: Handle projects ranging from single documents to massive corpora

### Key API Concepts

#### **HTTP Methods**
APIs use standard web protocols:
- **GET**: Retrieve information (like downloading a file)
- **POST**: Send data for processing (like submitting a form)
- **PUT**: Update existing data
- **DELETE**: Remove data

For LLM APIs, we primarily use **POST** to send text for analysis.

#### **Request and Response**
Every API interaction involves:
1. **Request**: What you send to the API
   - URL (endpoint)
   - Headers (metadata like authorization)
   - Body (your actual data/text)
2. **Response**: What the API sends back
   - Status code (200 = success, 404 = not found, etc.)
   - Data (usually in JSON format)
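
The three parts of a request can be inspected without sending anything. The following sketch builds a request object with Python's standard `urllib.request` module and prints its method, URL, headers, and body. The API key is a placeholder and the message is a made-up example; nothing is transmitted because `urlopen()` is never called.

```python
import json
import urllib.request

url = "https://openrouter.ai/api/v1/chat/completions"  # URL (endpoint)
headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder, not a real key
    "Content-Type": "application/json",
}
body = {
    "model": "openai/gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],  # made-up example message
}

# Build the request object only; it is not sent until urlopen() is called.
request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),  # the body travels as UTF-8 encoded JSON
    headers=headers,
    method="POST",
)

print("Method: ", request.get_method())
print("URL:    ", request.full_url)
print("Headers:", request.header_items())
print("Body:   ", json.loads(request.data))
```

The `requests` library used elsewhere in this notebook wraps the same three parts in a single `requests.post(url, headers=..., json=...)` call.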

#### **Authentication**
Most APIs require proof of identity and restrict how heavily you can use them:
- **API Keys**: Secret strings that identify you
- **Tokens**: Temporary credentials with specific permissions
- **Rate Limits**: Restrictions on how many requests you can make

### API Anatomy for LLM Services

#### **Base URL**
The main address of the API service:
```
https://openrouter.ai/api/v1/
```

#### **Endpoints**
Specific functions within the API:
```
/chat/completions  # For sending messages to LLMs
/models           # List available models
/usage            # Check your usage statistics
```

#### **Complete URL**
```
https://openrouter.ai/api/v1/chat/completions
```
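
When combining a base URL and an endpoint in Python, a leading slash on the endpoint changes the result. The sketch below uses the standard library's `urljoin` to show the difference; `endpoint_url` is an illustrative helper, not part of any library.

```python
from urllib.parse import urljoin

BASE_URL = "https://openrouter.ai/api/v1/"

# A relative endpoint (no leading slash) is appended to the base path.
print(urljoin(BASE_URL, "chat/completions"))
# https://openrouter.ai/api/v1/chat/completions

# A leading slash is treated as an absolute path and drops "/api/v1/".
print(urljoin(BASE_URL, "/chat/completions"))
# https://openrouter.ai/chat/completions


def endpoint_url(endpoint: str) -> str:
    """Join an endpoint to BASE_URL, tolerating an optional leading slash."""
    return urljoin(BASE_URL, endpoint.lstrip("/"))


print(endpoint_url("/models"))
# https://openrouter.ai/api/v1/models
```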

### Headers: The API's Metadata

Headers provide essential information about your request:

```python
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
    "HTTP-Referer": "https://your-research-project.edu",
    "X-Title": "Digital Humanities Text Analysis"
}
```

#### **Common Headers Explained**
- **Authorization**: Proves you're allowed to use the service
- **Content-Type**: Tells the API what format your data is in
- **HTTP-Referer**: (Optional) Identifies your project for tracking
- **X-Title**: (Optional) Describes your application

### Request Body: Your Actual Data

The request body contains your instructions and text:

```python
request_body = {
    "model": "openai/gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "Analyze the sentiment of this historical document: [your text here]"
        }
    ],
    "max_tokens": 1000,
    "temperature": 0.1
}
```

#### **Key Parameters**
- **model**: Which LLM to use
- **messages**: Your conversation with the AI
- **max_tokens**: Maximum length of response
- **temperature**: Sampling randomness (0 = most deterministic, higher values = more varied output)
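
For batch work it helps to generate the request body from a function, so that every document is processed with identical parameters. This is a minimal sketch: `build_payload` is an illustrative helper, and the two short documents are invented placeholders rather than real archival texts.

```python
import json


def build_payload(document_text, instruction,
                  model="openai/gpt-3.5-turbo", max_tokens=1000, temperature=0.1):
    """Return a chat-completions request body for a single document."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"{instruction}\n\n{document_text}"}
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


# Hypothetical documents, for illustration only.
documents = {
    "letter_001": "We arrived in Riga after a long and difficult journey.",
    "letter_002": "The harvest was good this year and the family is well.",
}

instruction = "Analyze the sentiment of this historical document:"
payloads = {
    doc_id: build_payload(text, instruction)
    for doc_id, text in documents.items()
}

print(json.dumps(payloads["letter_001"], indent=2, ensure_ascii=False))
```

Keeping the model name and parameters in one place also makes it easier to report them accurately in a methods section.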

### Common API Response Formats

#### **Successful Response (Status 200)**
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai/gpt-3.5-turbo",
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 31,
    "total_tokens": 87
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "This historical document expresses predominantly negative sentiment regarding the economic policies..."
      },
      "finish_reason": "stop"
    }
  ]
}
```

#### **Error Response (Status 400+)**
```json
{
  "error": {
    "message": "You exceeded your rate limit",
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded"
  }
}
```
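
Once a response has been parsed into a Python dictionary, the two shapes above can be handled in one small function. The sketch below assumes the field names shown in the examples (`choices`, `message`, `content`, `usage`, `error`); `extract_reply` is an illustrative helper and the sample strings are shortened copies of the responses above.

```python
import json


def extract_reply(response_data: dict) -> tuple[str, int]:
    """Return (assistant text, total tokens); raise if the API reported an error."""
    if "error" in response_data:
        message = response_data["error"].get("message", "unknown error")
        raise RuntimeError(f"API error: {message}")
    content = response_data["choices"][0]["message"]["content"]
    total_tokens = response_data.get("usage", {}).get("total_tokens", 0)
    return content, total_tokens


success_json = """
{
  "model": "openai/gpt-3.5-turbo",
  "usage": {"prompt_tokens": 56, "completion_tokens": 31, "total_tokens": 87},
  "choices": [
    {"message": {"role": "assistant", "content": "Predominantly negative sentiment."},
     "finish_reason": "stop"}
  ]
}
"""
error_json = '{"error": {"message": "You exceeded your rate limit", "code": "rate_limit_exceeded"}}'

text, tokens = extract_reply(json.loads(success_json))
print(text, tokens)

try:
    extract_reply(json.loads(error_json))
except RuntimeError as error:
    print(error)
```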

### Digital Humanities Use Cases

#### **1. Manuscript Analysis**
```python
# Analyze medieval manuscript style
request = {
    "model": "openai/gpt-4",
    "messages": [
        {"role": "user", "content": "Identify the literary devices in this Middle English text: [manuscript text]"}
    ]
}
```

#### **2. Historical Document Processing**
```python
# Extract entities from historical records
request = {
    "model": "anthropic/claude-3-sonnet",
    "messages": [
        {"role": "user", "content": "Extract all person names, places, and dates from this 18th-century letter: [letter text]"}
    ]
}
```

#### **3. Comparative Literature**
```python
# Compare themes across texts
request = {
    "model": "meta-llama/llama-2-70b-chat",
    "messages": [
        {"role": "user", "content": "Compare the themes of exile in these two poems: [poem 1] vs [poem 2]"}
    ]
}
```

### Best Practices for Research

#### **1. Documentation**
- **Log all API calls**: Keep records of what models and parameters you used
- **Version control**: Track changes to your analysis methods
- **Reproducible scripts**: Write code that others can run and verify
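
One simple way to log API calls is a JSON Lines file, with one JSON record per line. The sketch below is an assumption about how such a log could look, not a prescribed format; `log_api_call` and `read_log` are illustrative helpers, and the demo writes to a temporary directory with invented request and response data.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_api_call(log_path: Path, request_body: dict, response_data: dict) -> None:
    """Append one request/response pair to a JSON Lines log file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request": request_body,
        "response": response_data,
    }
    with log_path.open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_log(log_path: Path) -> list[dict]:
    """Read all records back from a JSON Lines log file."""
    with log_path.open(encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]


# Demo with invented data, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "api_calls.jsonl"
    log_api_call(log_path,
                 {"model": "openai/gpt-3.5-turbo", "temperature": 0.1},
                 {"usage": {"total_tokens": 87}})
    log_api_call(log_path,
                 {"model": "openai/gpt-4", "temperature": 0.1},
                 {"usage": {"total_tokens": 120}})
    records = read_log(log_path)

print(len(records), [record["request"]["model"] for record in records])
```

Only the request body is logged here. The API key lives in the headers and should stay out of log files.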

#### **2. Error Handling**
```python
import requests

try:
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()  # Raises an exception for bad status codes
    result = response.json()
except requests.exceptions.RequestException as e:
    print(f"API request failed: {e}")
```

#### **3. Rate Limiting and Costs**
- **Respect rate limits**: Don't overwhelm the service
- **Monitor usage**: Track your API costs
- **Batch efficiently**: Group similar requests when possible
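
A common way to respect rate limits is to wait and retry with increasing delays (exponential backoff). The sketch below shows the idea with a stand-in function instead of a real API call; `call_with_retries` is an illustrative helper, and the `sleep` parameter exists only so that the demo can record the delays instead of actually waiting.

```python
import time


def call_with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); after a failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as error:  # in real code, catch a narrower exception type
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay:.1f}s")
            sleep(delay)


# Stand-in for an API call that fails twice and then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


waits = []  # record the delays instead of sleeping
result = call_with_retries(flaky_call, sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

With the `requests` library you would typically catch `requests.exceptions.RequestException` rather than the broad `Exception` used here.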

#### **4. Data Privacy**
- **Sensitive data**: Be cautious with personal or confidential historical materials
- **Institutional policies**: Check your institution's data use guidelines
- **Terms of service**: Understand how API providers handle your data

### Popular APIs for Digital Humanities

#### **LLM APIs**
- **OpenRouter**: Access to multiple models through one interface
- **OpenAI API**: Direct access to GPT models
- **Anthropic API**: Claude models with large context windows
- **Hugging Face API**: Open-source models

#### **Complementary APIs**
- **Google Books API**: Access to digitized books
- **DPLA API**: Digital Public Library of America
- **Europeana API**: European cultural heritage
- **Archive.org API**: Internet Archive materials

### Security Considerations

#### **API Key Management**
- **Never commit keys to version control**
- **Use environment variables** to store sensitive information
- **Rotate keys regularly**
- **Limit key permissions** where possible

#### **Example: Secure Key Storage**
```python
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Safely access your API key
api_key = os.getenv("OPENROUTER_API_KEY")
if not api_key:
    raise ValueError("API key not found in environment variables")
```

### Testing and Development

#### **Start Small**
1. **Test with short texts** before processing large corpora
2. **Use cheaper models** for initial experiments
3. **Validate outputs** with known examples
4. **Compare multiple models** for the same task
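
Validating outputs against known examples can be as simple as comparing model answers with a small hand-labelled set. In the sketch below all labels and model outputs are hypothetical and serve only to show the calculation; `accuracy` is an illustrative helper.

```python
def accuracy(predictions: dict, gold: dict) -> float:
    """Share of hand-labelled items where the model output matches the label."""
    matches = sum(
        1 for item_id, label in gold.items()
        if predictions.get(item_id) == label
    )
    return matches / len(gold)


# Hypothetical hand-labelled examples and hypothetical outputs from two models.
gold_labels = {"doc1": "positive", "doc2": "negative", "doc3": "neutral", "doc4": "negative"}
model_outputs = {
    "model_a": {"doc1": "positive", "doc2": "negative", "doc3": "positive", "doc4": "negative"},
    "model_b": {"doc1": "positive", "doc2": "positive", "doc3": "neutral", "doc4": "positive"},
}

scores = {name: accuracy(outputs, gold_labels) for name, outputs in model_outputs.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Even a few dozen labelled examples give a useful first impression of which model and prompt are worth scaling up.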

#### **API Testing Tools**
- **Postman**: Visual interface for testing API calls
- **curl**: Command-line tool for simple tests
- **Python requests library**: For programmatic testing

### Next Steps

Understanding APIs is crucial because they provide:
- **Systematic access** to powerful LLM capabilities
- **Integration possibilities** with your existing research tools
- **Scalability** for large-scale text analysis projects
- **Reproducibility** essential for academic research

In the next section, we'll explore JSON format, which is how APIs structure the data they send and receive.

## Understanding JSON

JSON (JavaScript Object Notation) is a lightweight data format commonly used for API responses. It's human-readable and easy to work with in Python, making it ideal for handling structured data from LLM APIs.

Official JSON website: [json.org](https://www.json.org/)

### A Brief History of JSON

#### **Origins (2001-2005)**
- **Created by Douglas Crockford** in 2001 while working at State Software; Yahoo!, which he later joined, was one of the early adopters
- **Originally designed** as a data exchange format for web applications
- **Name derivation**: "JavaScript Object Notation" because it uses JavaScript object syntax
- **Problem it solved**: Need for a lightweight alternative to XML for AJAX applications

#### **Early Adoption (2005-2010)**
- **2006**: RFC 4627 published as the first formal JSON specification
- **Web 2.0 era**: Became popular with rise of dynamic web applications
- **AJAX revolution**: JSON enabled faster, more efficient data exchange
- **Language support**: Libraries developed for Python, Java, C#, and other languages

#### **Standardization (2010-present)**
- **2013-2014**: ECMA-404 (2013) and RFC 7159 (2014) defined JSON in formal standards documents
- **2017**: RFC 8259 updated and clarified the specification and is the current Internet Standard
- **Current status**: De facto standard for web APIs and data exchange
- **Universal adoption**: Supported by virtually every programming language

### Why JSON Became Dominant

#### **Advantages over XML**
- **Lighter weight**: Less verbose, smaller file sizes
- **Faster parsing**: Simpler structure means quicker processing
- **Human readable**: Easy to read and debug
- **Native JavaScript support**: Browsers parse it with the built-in `JSON.parse`, so no extra library is needed

#### **Comparison Example**
**XML Version:**
```xml
<book>
  <title>Digital Humanities Methods</title>
  <author>Jane Smith</author>
  <year>2024</year>
  <topics>
    <topic>Text Analysis</topic>
    <topic>Data Visualization</topic>
  </topics>
</book>
```

**JSON Version:**
```json
{
  "title": "Digital Humanities Methods",
  "author": "Jane Smith",
  "year": 2024,
  "topics": ["Text Analysis", "Data Visualization"]
}
```

### JSON Syntax: Complete Guide

#### **Basic Structure Rules**
1. **Data is in name/value pairs**
2. **Data is separated by commas**
3. **Curly braces hold objects**
4. **Square brackets hold arrays**
5. **Strings must use double quotes**

#### **Data Types**

##### **1. Strings**
- Must be enclosed in **double quotes** (not single quotes)
- Can contain Unicode characters
- Escape sequences supported

```json
{
  "simple_string": "Hello World",
  "unicode_string": "Latvian: Sveika, pasaule! 🇱🇻",
  "escaped_string": "Quote: \"Hello\" and newline: \n",
  "empty_string": ""
}
```

##### **2. Numbers**
- Integer or floating point
- No leading zeros (write `7`, not `007`; a single zero before a decimal point, as in `0.5`, is fine)
- Scientific notation supported

```json
{
  "integer": 42,
  "negative": -17,
  "float": 3.14159,
  "scientific": 1.23e-10,
  "zero": 0
}
```

##### **3. Booleans**
- Only `true` or `false` (lowercase)
- No other boolean representations

```json
{
  "is_published": true,
  "is_draft": false
}
```

##### **4. Null**
- Represents empty value
- Written as `null` (lowercase)

```json
{
  "optional_field": null,
  "missing_data": null
}
```

##### **5. Objects**
- Collections of key/value pairs
- Keys must be strings in double quotes
- Values can be any JSON data type

```json
{
  "researcher": {
    "name": "Dr. Anda Baklāne",
    "institution": "National Library of Latvia",
    "specialization": "Digital Humanities",
    "contact": {
      "email": "anda.baklane@lnb.lv",
      "phone": null
    }
  }
}
```

##### **6. Arrays**
- Ordered lists of values
- Values can be any JSON data type (mixed types allowed)
- Zero-indexed

```json
{
  "research_topics": [
    "Text Analysis",
    "Data Visualization", 
    "Machine Learning"
  ],
  "mixed_array": [
    "string",
    42,
    true,
    null,
    {"nested": "object"},
    [1, 2, 3]
  ],
  "empty_array": []
}
```

#### **Nesting and Complex Structures**

JSON allows objects and arrays to be nested to any depth (individual parsers may impose practical limits):

```json
{
  "digital_humanities_project": {
    "title": "Latvian Literature Analysis",
    "metadata": {
      "created": "2025-01-15",
      "version": "1.2",
      "authors": [
        {
          "name": "Valdis Saulespurēns",
          "role": "Lead Developer",
          "skills": ["Python", "Machine Learning", "APIs"]
        },
        {
          "name": "Anda Baklāne", 
          "role": "Research Lead",
          "skills": ["Literary Theory", "Text Analysis", "Data Visualization"]
        }
      ]
    },
    "datasets": [
      {
        "name": "19th Century Novels",
        "size": 1200,
        "languages": ["Latvian", "German"],
        "analysis_results": {
          "sentiment_scores": [0.65, 0.72, 0.58],
          "themes": {
            "exile": 0.34,
            "identity": 0.78,
            "nationalism": 0.45
          }
        }
      }
    ]
  }
}
```
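
After parsing, a nested structure like this is navigated in Python with a chain of dictionary keys and list indexes. The sketch below uses a shortened copy of the example above; the `funding` key at the end is deliberately absent, to show how `.get()` avoids a `KeyError` for optional fields.

```python
import json

raw = """
{
  "digital_humanities_project": {
    "title": "Latvian Literature Analysis",
    "metadata": {
      "authors": [
        {"name": "Valdis Saulespurēns", "skills": ["Python", "Machine Learning", "APIs"]},
        {"name": "Anda Baklāne", "skills": ["Literary Theory", "Text Analysis", "Data Visualization"]}
      ]
    },
    "datasets": [
      {
        "name": "19th Century Novels",
        "analysis_results": {
          "themes": {"exile": 0.34, "identity": 0.78, "nationalism": 0.45}
        }
      }
    ]
  }
}
"""

project = json.loads(raw)["digital_humanities_project"]

# Objects become dicts, arrays become lists.
author_names = [author["name"] for author in project["metadata"]["authors"]]
themes = project["datasets"][0]["analysis_results"]["themes"]
top_theme = max(themes, key=themes.get)  # key with the highest score

# .get() returns a default instead of raising KeyError for a missing key.
grant_id = project.get("funding", {}).get("grant_id")

print(author_names)
print(top_theme, themes[top_theme])
print(grant_id)
```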

### JSON in Digital Humanities Context

#### **Research Metadata Example**
```json
{
  "manuscript_analysis": {
    "document_id": "LNB-MS-1847-034",
    "title": "Personal Letters of Krišjānis Barons",
    "date_created": "1847-03-15",
    "language": "Latvian",
    "script": "Gothic",
    "digitization": {
      "scan_date": "2023-08-15",
      "resolution": "600dpi",
      "format": "TIFF",
      "ocr_confidence": 0.87
    },
    "analysis_results": {
      "entities": {
        "persons": ["Krišjānis Barons", "Anna Barone"],
        "places": ["Rīga", "Jelgava", "Dundaga"],
        "dates": ["1847-03-15", "1847-04-02"]
      },
      "themes": [
        {"theme": "family_relations", "confidence": 0.95},
        {"theme": "folklore_collection", "confidence": 0.78},
        {"theme": "cultural_identity", "confidence": 0.82}
      ],
      "sentiment": {
        "overall": "positive",
        "score": 0.73,
        "emotions": {
          "joy": 0.45,
          "nostalgia": 0.67,
          "concern": 0.23
        }
      }
    }
  }
}
```

#### **LLM API Request Example**
```json
{
  "model": "openai/gpt-4",
  "messages": [
    {
      "role": "system",
      "content": "You are a digital humanities expert specializing in 19th-century Latvian literature."
    },
    {
      "role": "user", 
      "content": "Analyze the following text excerpt for themes of exile and identity: [text content here]"
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.1,
  "metadata": {
    "research_project": "Baltic Literary Themes",
    "researcher": "BSSDH Workshop Participant",
    "date": "2025-08-03"
  }
}
```

### Common JSON Errors and How to Avoid Them

#### **1. Syntax Errors**
```json
// ❌ WRONG - Single quotes
{ 'author': 'Jane Smith' }

// ✅ CORRECT - Double quotes
{ "author": "Jane Smith" }

// ❌ WRONG - Trailing comma
{
  "title": "Book",
  "year": 2024,
}

// ✅ CORRECT - No trailing comma
{
  "title": "Book", 
  "year": 2024
}

// ❌ WRONG - Comments (not allowed in strict JSON)
{
  "title": "Book", // This is a comment
  "year": 2024
}

// ✅ CORRECT - No comments
{
  "title": "Book",
  "year": 2024
}
```

#### **2. Data Type Errors**
```json
// ❌ WRONG - Undefined values
{
  "value": undefined
}

// ✅ CORRECT - Use null for missing values
{
  "value": null
}

// ❌ WRONG - Functions (not valid JSON)
{
  "calculate": function() { return 42; }
}

// ✅ CORRECT - Only data, no functions
{
  "result": 42
}
```
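
Python's `json` module applies the same rules when it writes JSON. As a short sketch: `None` is written as `null`, while a function cannot be serialized at all and raises a `TypeError`.

```python
import json

# Python None is written as JSON null, True/False as true/false
record = {"value": None, "published": True, "result": 42}
encoded = json.dumps(record)
print(encoded)

# Functions (and other non-data objects) cannot be serialized
try:
    json.dumps({"calculate": lambda: 42})
    serializable = True
except TypeError as error:
    serializable = False
    print(f"Cannot serialize: {error}")
```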

### Working with JSON in Python

#### **Basic Operations**
```python
import json

# Creating JSON from Python data
data = {
    "title": "Digital Humanities Research",
    "authors": ["Valdis", "Anda"],
    "published": True,
    "year": 2025
}

# Convert to JSON string
json_string = json.dumps(data)
print(json_string)

# Convert back to Python object
parsed_data = json.loads(json_string)
print(parsed_data["title"])
```

#### **Pretty Printing**
```python
# Format JSON nicely
pretty_json = json.dumps(data, indent=2, ensure_ascii=False)
print(pretty_json)
```

#### **Reading/Writing JSON Files**
```python
# Write to file
with open('research_data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# Read from file
with open('research_data.json', 'r', encoding='utf-8') as f:
    loaded_data = json.load(f)
```

#### **Handling API Responses**
```python
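import requests

# api_url, headers, and request_data must be defined first
# (see the OpenRouter example later in this notebook)
response = requests.post(api_url, headers=headers, json=request_data, timeout=30)
if response.status_code == 200:
    result = response.json()  # Parses the JSON body into Python objects
    content = result['choices'][0]['message']['content']
    print(content)
else:
    print(f"Request failed with status {response.status_code}")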
```
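
Indexing straight into `result['choices'][0]` raises an exception when the response has a different shape, for example when the API returns an error body. The sketch below shows one defensive way to read the content. `extract_content` is a hypothetical helper, and both sample dictionaries are made up for illustration; the exact shape of an error body is an assumption.

```python
def extract_content(result):
    """Return the assistant's text from a chat-completion style dict, or None."""
    choices = result.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")

# Hypothetical sample responses for illustration (not real API output)
ok_response = {"choices": [{"message": {"role": "assistant", "content": "Exile and identity."}}]}
error_response = {"error": {"code": 429, "message": "Rate limit exceeded"}}

print(extract_content(ok_response))
print(extract_content(error_response))
```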

### JSON Validation and Tools

#### **Online Validators**
- **JSONLint**: [jsonlint.com](https://jsonlint.com) - Validate and format JSON
- **JSON Formatter**: [jsonformatter.org](https://jsonformatter.org) - Format and validate
- **JSON Schema Validator**: Validate against specific schemas

#### **Python Validation**
```python
import json

def validate_json(json_string):
    try:
        json.loads(json_string)
        return True, "Valid JSON"
    except json.JSONDecodeError as e:
        return False, f"Invalid JSON: {e}"

# Test validation
test_json = '{"name": "Valdis", "role": "instructor"}'
is_valid, message = validate_json(test_json)
print(f"Valid: {is_valid}, Message: {message}")
```

### JSON Schema for Data Validation

For research projects, you can define schemas to ensure data consistency:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "document_id": {
      "type": "string",
      "pattern": "^[A-Z]{3}-[A-Z]{2}-[0-9]{4}-[0-9]{3}$"
    },
    "analysis_date": {
      "type": "string",
      "format": "date"
    },
    "confidence_score": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    }
  },
  "required": ["document_id", "analysis_date"]
}
```
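
To make the schema's rules concrete, the sketch below checks the same three constraints by hand using only the standard library. It is not a JSON Schema implementation (a dedicated validator library would normally do this job), and `check_record` is a hypothetical name. The date check relies on `date.fromisoformat`, which only roughly matches the schema's `date` format.

```python
import re
from datetime import date

DOCUMENT_ID_PATTERN = re.compile(r"^[A-Z]{3}-[A-Z]{2}-[0-9]{4}-[0-9]{3}$")

def check_record(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    # "required": both fields must be present
    for field in ("document_id", "analysis_date"):
        if field not in record:
            problems.append(f"missing required field: {field}")

    # "pattern": the identifier must match the regular expression
    doc_id = record.get("document_id")
    if doc_id is not None:
        if not (isinstance(doc_id, str) and DOCUMENT_ID_PATTERN.fullmatch(doc_id)):
            problems.append("document_id does not match the expected pattern")

    # "format": the date must parse as an ISO date
    analysis_date = record.get("analysis_date")
    if analysis_date is not None:
        try:
            date.fromisoformat(analysis_date)
        except (TypeError, ValueError):
            problems.append("analysis_date is not a valid ISO date")

    # "minimum"/"maximum": the score must be a number between 0 and 1
    score = record.get("confidence_score")
    if score is not None:
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0 <= score <= 1:
            problems.append("confidence_score must be a number between 0 and 1")
    return problems

good = {"document_id": "LNB-MS-1847-034", "analysis_date": "2025-08-03", "confidence_score": 0.87}
bad = {"document_id": "bad", "confidence_score": 1.5}
print(check_record(good))
print(check_record(bad))
```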

### Best Practices for Digital Humanities

#### **1. Consistent Structure**
- Use consistent naming conventions (snake_case or camelCase)
- Maintain consistent data types across similar fields
- Document your JSON structure for team members

#### **2. Meaningful Keys**
```json
// ❌ Unclear
{"d": "2025-08-03", "a": "Valdis", "s": 0.85}

// ✅ Clear and descriptive
{
  "analysis_date": "2025-08-03",
  "analyst": "Valdis", 
  "confidence_score": 0.85
}
```

#### **3. Version Your Data Formats**
```json
{
  "format_version": "1.2",
  "created_by": "BSSDH Workshop Tools",
  "data": {
    "_comment": "Your actual research data goes here"
  }
}
```

#### **4. Include Metadata**
```json
{
  "metadata": {
    "created": "2025-08-03T10:30:00Z",
    "tool": "GPT-4 via OpenRouter",
    "researcher": "Workshop Participant",
    "institution": "Baltic Summer School of Digital Humanities"
  },
  "analysis_results": {
    "_comment": "Your analysis data goes here"
  }
}
```
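
Practices 3 and 4 can be combined in a small Python helper that wraps any analysis output before saving it. This is a sketch only: `wrap_results` and its parameters are hypothetical names, and the field values are taken from the examples above.

```python
import json
from datetime import datetime, timezone

def wrap_results(analysis_results, tool, researcher, format_version="1.2"):
    """Attach a format version and provenance metadata to analysis results."""
    return {
        "format_version": format_version,
        "metadata": {
            # UTC timestamp in the same style as the example above
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "tool": tool,
            "researcher": researcher,
        },
        "analysis_results": analysis_results,
    }

record = wrap_results(
    {"themes": ["exile", "identity"]},
    tool="GPT-4 via OpenRouter",
    researcher="Workshop Participant",
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```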

### Current State and Future of JSON

#### **Current Usage (2025)**
- **Dominant format** for REST APIs and web services
- **Standard format** for configuration files in many tools
- **Common storage format** for document databases (CouchDB stores JSON; MongoDB stores JSON-like documents in a binary form called BSON)
- **Essential skill** for data science and digital humanities

#### **Alternatives and Competitors**
- **YAML**: More human-readable, used for configuration
- **TOML**: Growing popularity for configuration files
- **Protocol Buffers**: Google's binary format for performance
- **MessagePack**: Binary format that's more compact than JSON

#### **JSON's Continued Relevance**
- **Universal support**: Virtually every mainstream programming language has a JSON library
- **Simplicity**: Easy to learn and implement
- **Web standard**: Built into browsers and web technologies
- **Ecosystem**: Vast ecosystem of tools and libraries

JSON remains the go-to format for data exchange in digital humanities research because of its simplicity, readability, and universal support. Understanding JSON thoroughly will serve you well in any computational research project!



### JSON in Digital Humanities

- **Metadata**: Store information about texts, authors, publication dates.
- **Analysis Results**: Save outputs from LLMs (e.g., named entities, summaries).
- **Data Exchange**: Share research datasets between tools and collaborators.

### Working with JSON in Python

Import the `json` module:


In [None]:
import json

# Convert Python dict to JSON string
data = {"author": "Valdis", "topic": "Digital Humanities"}
json_str = json.dumps(data)

# Convert JSON string back to Python dict
parsed = json.loads(json_str)
print(parsed["author"])  # Output: Valdis



### Example: LLM API Response

A typical LLM API response in JSON:


In [None]:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The main theme of this novel is exile and identity."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 20
  }
}



### Tips for Digital Humanities Researchers

- **Validate your JSON**: Use online tools like [jsonlint.com](https://jsonlint.com).
- **Document your data**: Add clear keys and structure for future use.
- **Use JSON for reproducibility**: Save analysis results for sharing and publication.

JSON is a foundational skill for working with APIs and digital research tools!



## OpenRouter API

OpenRouter is a unified API that provides access to multiple LLM providers through a single interface. This makes it convenient to experiment with different models and compare their performance for humanities research tasks.

### What is OpenRouter?

**OpenRouter** acts as a gateway to dozens of different LLM providers, allowing you to:
- **Access multiple models** through a single API interface
- **Compare performance** across different LLMs for the same task
- **Switch between models** without changing your code structure
- **Manage costs** by choosing models based on budget and performance needs

### Key Advantages for Digital Humanities Research

#### **1. Model Diversity**
- **OpenAI models**: GPT-3.5, GPT-4, GPT-4 Turbo
- **Anthropic models**: Claude-3 Haiku, Sonnet, Opus
- **Google models**: Gemini Pro, Gemini Flash
- **Open source models**: Llama, Mistral, and many others
- **Specialized models**: Fine-tuned for specific tasks

#### **2. Cost Optimization**
- **Transparent pricing**: See exact costs per model
- **Choose by budget**: Use cheaper models for initial testing
- **Scale appropriately**: Use powerful models only when needed
- **Usage tracking**: Monitor your spending in real-time

#### **3. Unified Interface**
- **Consistent API**: Same request format for all models
- **Easy switching**: Change models by modifying one parameter
- **Standard responses**: Uniform JSON response structure
- **Simplified authentication**: One API key for all providers

### OpenRouter API Structure

#### **Chat Completions Endpoint**
```
https://openrouter.ai/api/v1/chat/completions
```

#### **Request Format**
All requests use the same JSON structure, regardless of the underlying model:

```json
{
  "model": "model-provider/model-name",
  "messages": [
    {
      "role": "system",
      "content": "System instructions here"
    },
    {
      "role": "user", 
      "content": "User query here"
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.1
}
```
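
Because the structure is the same for every model, the request body can be produced by one small function and only the `model` value changed. The sketch below builds (but does not send) two payloads; `build_payload` is a hypothetical helper, and the model names are examples that may no longer be available.

```python
def build_payload(model, system_prompt, user_prompt, max_tokens=1000, temperature=0.1):
    """Build a chat-completions request body in the format shown above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

system_prompt = "You are a digital humanities expert."
user_prompt = "Summarize the main themes of this excerpt: [text content here]"

# Same prompts, different models: only the "model" value changes
payloads = [
    build_payload(model, system_prompt, user_prompt)
    for model in ("openai/gpt-3.5-turbo", "anthropic/claude-3-sonnet")
]
for payload in payloads:
    print(payload["model"], len(payload["messages"]))
```

Keeping the prompts identical across models is what makes a later comparison of their answers meaningful.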

### Authentication and Security

#### **API Key Management**
OpenRouter requires an API key for authentication. For security:
- **Never hardcode** API keys in your scripts
- **Use environment variables** to store sensitive credentials
- **Rotate keys regularly** for enhanced security
- **Monitor usage** to detect unauthorized access

#### **Setting Up Environment Variables**
Store your API key in an environment variable for secure access:

```python
import os

# Retrieve API key from environment variable
api_key = os.getenv("OPENROUTER_API_KEY_LNB")
if not api_key:
    print("❌ Error: OPENROUTER_API_KEY_LNB environment variable not found")
    print("Please set your OpenRouter API key as an environment variable")
else:
    print("✅ API key loaded successfully")
```

### Practical Example: Analyzing Latvian Literature

Let's create a practical example that demonstrates how to use OpenRouter for digital humanities research focused on Latvian literature:


In [1]:
import os
import json
import requests
from datetime import datetime

def analyze_latvian_text_with_openrouter():
    """
    Demonstrate OpenRouter API usage for Latvian literature analysis
    """
    
    # Step 1: Get API key from environment
    api_key = os.getenv("OPENROUTER_API_KEY_LNB")
    
    if not api_key:
        print("❌ Error: OPENROUTER_API_KEY_LNB environment variable not found")
        print("\nTo set up your API key:")
        print("1. Get an API key from https://openrouter.ai/")
        print("2. Set environment variable: OPENROUTER_API_KEY_LNB=your_key_here")
        return None
    
    print("✅ API key loaded successfully")
    
    # Step 2: Set up the API endpoint and headers
    url = "https://openrouter.ai/api/v1/chat/completions"
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "HTTP-Referer": "https://www.digitalhumanities.lv/bssdh/2025/",  # Your project URL
        "X-Title": "BSSDH 2025 LLM Workshop - Latvian Literature Analysis"
    }
    
    # Step 3: Sample Latvian text for analysis
    # This is a fictional excerpt inspired by Latvian literary traditions
    latvian_text = """
    Rudens vējš pūta pār dzelteno lapu jūru. Mārtiņš skatījās uz tālo horizontu, 
    kur saule lēni iegrimst aiz mežu sienas. Viņa sirdī valdīja skumjas par atstāto 
    dzimteni, bet arī cerība uz jaunu dzīvi svešumā. Vai viņš kad atgriezīsies 
    mājās? Vai viņa bērni runās latviski?
    """
    
    # Step 4: Create the request payload
    request_data = {
        "model": "openai/gpt-3.5-turbo",  # Using GPT-3.5 for cost-effectiveness
        "messages": [
            {
                "role": "system",
                "content": """You are a digital humanities expert specializing in Latvian literature 
                and culture. Analyze texts for themes, emotional content, cultural references, 
                and literary devices. Provide detailed, scholarly analysis."""
            },
            {
                "role": "user",
                "content": f"""Please analyze this Latvian text excerpt for the following elements:

1. Main themes (especially themes of exile, identity, belonging)
2. Emotional tone and sentiment
3. Cultural and geographical references
4. Literary devices used
5. Historical or social context suggested

Text to analyze:
{latvian_text}

Please provide your analysis in English, with specific references to the Latvian text."""
            }
        ],
        "max_tokens": 1000,
        "temperature": 0.1,  # Low temperature for consistent, analytical responses
        "top_p": 0.9
    }
    
    # Step 5: Make the API request
    try:
        print("🔄 Sending request to OpenRouter API...")
        print(f"📝 Model: {request_data['model']}")
        print(f"📊 Max tokens: {request_data['max_tokens']}")
        print(f"🌡️ Temperature: {request_data['temperature']}")
        print("-" * 50)
        
        response = requests.post(url, headers=headers, json=request_data, timeout=30)
        
        # Check if request was successful
        response.raise_for_status()
        
        # Step 6: Parse the JSON response
        result = response.json()
        
        # Step 7: Extract and display the analysis
        if 'choices' in result and len(result['choices']) > 0:
            analysis = result['choices'][0]['message']['content']
            
            print("✅ Analysis completed successfully!")
            print("=" * 60)
            print("📖 LITERARY ANALYSIS RESULTS")
            print("=" * 60)
            print(analysis)
            print("=" * 60)
            
            # Display usage statistics
            if 'usage' in result:
                usage = result['usage']
                print(f"\n📊 API Usage Statistics:")
                print(f"   • Prompt tokens: {usage.get('prompt_tokens', 'N/A')}")
                print(f"   • Completion tokens: {usage.get('completion_tokens', 'N/A')}")
                print(f"   • Total tokens: {usage.get('total_tokens', 'N/A')}")
            
            # Return the full response for further processing
            return {
                'text_analyzed': latvian_text,
                'analysis': analysis,
                'model_used': request_data['model'],
                'timestamp': datetime.now().isoformat(),
                'usage_stats': result.get('usage', {}),
                'full_response': result
            }
        
        else:
            print("❌ Error: No analysis returned from the API")
            return None
            
    except requests.exceptions.Timeout:
        print("❌ Error: Request timed out. Please try again.")
        return None
    except requests.exceptions.HTTPError as e:
        print(f"❌ HTTP Error: {e}")
        if response.status_code == 401:
            print("   This usually means your API key is invalid or expired.")
        elif response.status_code == 429:
            print("   Rate limit exceeded. Please wait before making another request.")
        return None
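    except json.JSONDecodeError:
        # Checked before RequestException: in recent versions of requests, the
        # error raised by response.json() is a subclass of both exception types
        print("❌ Error: Invalid JSON response from API")
        return None
    except requests.exceptions.RequestException as e:
        print(f"❌ Request Error: {e}")
        return None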

# Run the analysis
print("🇱🇻 LATVIAN LITERATURE ANALYSIS WITH OPENROUTER API")
print("=" * 60)
result = analyze_latvian_text_with_openrouter()

if result:
    print(f"\n💾 Analysis completed at: {result['timestamp']}")
    print("You can now save this analysis to a file or database for your research.")

🇱🇻 LATVIAN LITERATURE ANALYSIS WITH OPENROUTER API
✅ API key loaded successfully
🔄 Sending request to OpenRouter API...
📝 Model: openai/gpt-3.5-turbo
📊 Max tokens: 1000
🌡️ Temperature: 0.1
--------------------------------------------------
✅ Analysis completed successfully!
📖 LITERARY ANALYSIS RESULTS
1. Main themes:
The main themes in this text excerpt revolve around exile, identity, and belonging. The mention of the protagonist, Mārtiņš, looking at the distant horizon where the sun is setting behind the forest wall evokes a sense of longing and nostalgia for his homeland ("dzimteni"). The juxtaposition of sadness for the homeland left behind and hope for a new life in a foreign land reflects the complex emotions of exile and the search for belonging. The questions posed about whether he will return home and if his children will speak Latvian also highlight the themes of identity and cultural preservation in the face of displacement.

2. Emotional tone and sentiment:
The emotional ton

### Understanding the Code Structure

#### **1. Environment Variable Setup**
```python
api_key = os.getenv("OPENROUTER_API_KEY_LNB")
```
- **Secure access**: API key stored in environment variable
- **Error handling**: Graceful failure if key not found
- **Best practice**: Never hardcode sensitive credentials

#### **2. Request Headers**
```python
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "HTTP-Referer": "https://www.digitalhumanities.lv/bssdh/2025/",
    "X-Title": "BSSDH 2025 LLM Workshop - Latvian Literature Analysis"
}
```
- **Authorization**: Bearer token authentication
- **Content-Type**: Tells API we're sending JSON data
- **HTTP-Referer**: Identifies your project (optional but recommended)
- **X-Title**: Descriptive title for usage tracking

#### **3. JSON Request Structure**
```python
request_data = {
    "model": "openai/gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "System instructions..."},
        {"role": "user", "content": "User query..."}
    ],
    "max_tokens": 1000,
    "temperature": 0.1
}
```
- **model**: Specifies which LLM to use
- **messages**: Conversation format with system and user roles
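- **max_tokens**: Caps the length of the response in tokens (and therefore its cost)
- **temperature**: Controls randomness (0 = most consistent, though repeated runs are not guaranteed to be identical; higher values = more varied, creative output)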

#### **4. Error Handling**
The code includes comprehensive error handling for:
- **Authentication errors** (401): Invalid API key
- **Rate limiting** (429): Too many requests
- **Network timeouts**: Connection issues
- **JSON parsing errors**: Malformed responses

### Popular Models for Digital Humanities

#### **For Analysis Tasks**
```python
# Cost-effective for bulk analysis
"openai/gpt-3.5-turbo"

# More sophisticated analysis
"openai/gpt-4-turbo"

# Large context for long documents
"anthropic/claude-3-sonnet"

# Fast and economical
"google/gemini-flash-1.5"
```

#### **For Multilingual Tasks**
```python
# Strong multilingual capabilities
"openai/gpt-4"

# Good for European languages
"anthropic/claude-3-opus"

# Open source alternative
"meta-llama/llama-3-70b-instruct"
```

### Customizing for Your Research

#### **System Prompts for Different Tasks**
```python
# For sentiment analysis
system_prompt = """You are an expert in sentiment analysis of historical texts. 
Analyze the emotional content and provide numerical scores for different emotions."""

# For named entity recognition
system_prompt = """You are a specialist in extracting names, places, and dates 
from historical documents. Focus on accurate identification and categorization."""

# For thematic analysis
system_prompt = """You are a literary scholar specializing in thematic analysis. 
Identify recurring themes, motifs, and symbolic elements in the text."""
```

#### **Adjusting Parameters for Different Goals**
```python
# For creative interpretation (higher temperature)
request_data["temperature"] = 0.7

# For factual analysis (lower temperature)
request_data["temperature"] = 0.1

# For longer analysis (more tokens)
request_data["max_tokens"] = 2000

# For concise summaries (fewer tokens)
request_data["max_tokens"] = 300
```

### Saving and Managing Results

#### **Save Analysis to File**
```python
def save_analysis_to_file(result, filename):
    """Save analysis results to JSON file"""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=2, ensure_ascii=False)
    print(f"✅ Analysis saved to {filename}")

# Usage
if result:
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"latvian_analysis_{timestamp}.json"
    save_analysis_to_file(result, filename)
```

#### **Create Research Database**
```python
def add_to_research_database(result, database_file="research_analyses.json"):
    """Add analysis to cumulative research database"""
    try:
        # Load existing database
        with open(database_file, 'r', encoding='utf-8') as f:
            database = json.load(f)
    except FileNotFoundError:
        # Create new database
        database = {"analyses": []}
    
    # Add new analysis
    database["analyses"].append(result)
    
    # Save updated database
    with open(database_file, 'w', encoding='utf-8') as f:
        json.dump(database, f, indent=2, ensure_ascii=False)
    
    print(f"✅ Analysis added to database. Total analyses: {len(database['analyses'])}")
```
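
Once several analyses have accumulated, the database can be summarized with ordinary dictionary operations. The sketch below assumes entries shaped like the dictionary returned by the analysis function above (with `model_used` and `usage_stats` keys); `summarize_database` is a hypothetical helper and the sample entries are invented.

```python
import json

def summarize_database(database):
    """Count analyses and total tokens per model in a loaded database dict."""
    summary = {}
    for entry in database.get("analyses", []):
        model = entry.get("model_used", "unknown")
        tokens = entry.get("usage_stats", {}).get("total_tokens", 0)
        stats = summary.setdefault(model, {"analyses": 0, "total_tokens": 0})
        stats["analyses"] += 1
        stats["total_tokens"] += tokens
    return summary

# Hypothetical entries for illustration; real ones come from the analysis function
sample_database = {
    "analyses": [
        {"model_used": "openai/gpt-3.5-turbo", "usage_stats": {"total_tokens": 420}},
        {"model_used": "openai/gpt-3.5-turbo", "usage_stats": {"total_tokens": 380}},
        {"model_used": "openai/gpt-4", "usage_stats": {}},
    ]
}
summary = summarize_database(sample_database)
print(json.dumps(summary, indent=2))
```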

### Cost Management Tips

#### **Monitor Token Usage**
```python
def estimate_cost(usage_stats, model_name):
    """Estimate cost based on token usage"""
    # These are example rates - check OpenRouter for current pricing
    pricing = {
        "openai/gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},  # per 1K tokens
        "openai/gpt-4": {"input": 0.03, "output": 0.06},
        "anthropic/claude-3-sonnet": {"input": 0.003, "output": 0.015}
    }
    
    if model_name in pricing:
        input_cost = (usage_stats.get('prompt_tokens', 0) / 1000) * pricing[model_name]['input']
        output_cost = (usage_stats.get('completion_tokens', 0) / 1000) * pricing[model_name]['output']
        total_cost = input_cost + output_cost
        print(f"💰 Estimated cost: ${total_cost:.6f}")
        return total_cost
    else:
        print("💰 Cost estimation not available for this model")
        return None
```
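
As a worked example of the arithmetic, the sketch below prices a single small request at the example GPT-3.5 rates from the function above and then scales it to a corpus. The rates are the same illustrative figures, not current prices, and the corpus size of 10,000 requests is a made-up number.

```python
# Example rates per 1K tokens, copied from the function above (not current prices)
INPUT_RATE = 0.0015
OUTPUT_RATE = 0.002

usage = {"prompt_tokens": 50, "completion_tokens": 20}

# Cost = (tokens / 1000) * rate per 1K tokens, separately for input and output
input_cost = usage["prompt_tokens"] / 1000 * INPUT_RATE
output_cost = usage["completion_tokens"] / 1000 * OUTPUT_RATE
cost_per_request = input_cost + output_cost

# Scale up to a hypothetical corpus of 10,000 similar requests
corpus_cost = cost_per_request * 10_000

print(f"Per request: ${cost_per_request:.6f}")
print(f"10,000 requests: ${corpus_cost:.2f}")
```

Running an estimate like this on a handful of sample documents before a full batch gives a rough budget, since real prompts and responses are usually much longer than this toy request.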

#### **Batch Processing Strategy**
```python
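import os
import time
import requests

def analyze_multiple_texts(texts, model="openai/gpt-3.5-turbo", delay=1):
    """Analyze multiple texts, pausing between requests to respect rate limits"""
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {"Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY_LNB')}"}
    results = []
    
    for i, text in enumerate(texts):
        print(f"Processing text {i+1}/{len(texts)}...")
        
        # Build a fresh request for each text
        request_data = {
            "model": model,
            "messages": [{"role": "user", "content": f"Analyze this text: {text}"}],
            "max_tokens": 1000,
            "temperature": 0.1
        }
        
        # Make the request; report and skip any text whose request fails
        try:
            response = requests.post(url, headers=headers, json=request_data, timeout=30)
            response.raise_for_status()
            results.append(response.json())
        except requests.exceptions.RequestException as e:
            print(f"❌ Request failed for text {i+1}: {e}")
        
        # Rate limiting - wait between requests
        if i < len(texts) - 1:  # Don't wait after the last request
            time.sleep(delay)
    
    return results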
```

### Next Steps

This OpenRouter API introduction provides the foundation for:
1. **Batch processing** multiple documents
2. **Comparative analysis** using different models
3. **Cost-effective research** with appropriate model selection
4. **Reproducible workflows** with documented API calls

In the next sessions, we'll build on this foundation to create more sophisticated analysis pipelines for specific digital humanities tasks.

### Troubleshooting Common Issues

#### **Authentication Problems**
- Verify API key is correct and active
- Check environment variable name matches exactly
- Ensure no extra spaces in the API key

#### **Rate Limiting**
- Add delays between requests (`time.sleep()`)
- Use exponential backoff for retries
- Monitor your usage on OpenRouter dashboard
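
Exponential backoff means doubling the waiting time after each failed attempt. The sketch below shows the idea with a simulated failing call so that it runs without network access. `call_with_backoff` and `flaky_request` are hypothetical names, and the retry counts and delays are arbitrary example values; in real code you would wrap your API request and retry only on errors such as HTTP 429.

```python
import random
import time

def call_with_backoff(func, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt (plus jitter)."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as error:  # in real code, catch only retryable errors
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Double the delay each time; up to 10% jitter spreads out retries
            delay = base_delay * (2 ** attempt) * (1 + random.uniform(0, 0.1))
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay:.2f}s")
            sleep(delay)

# Simulated flaky call: fails twice, then succeeds (no network needed)
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"

# Record the delays instead of actually sleeping, to keep the demo instant
waits = []
outcome = call_with_backoff(flaky_request, base_delay=0.01, sleep=waits.append)
print(outcome, attempts["count"], waits)
```

Passing `sleep` as a parameter is a design choice made here so the behaviour can be demonstrated and tested without real waiting; with the default `time.sleep` the function pauses for the computed delay.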

#### **Model Selection**
- Start with cheaper models for testing
- Use more powerful models for final analysis
- Check model availability and pricing regularly

The OpenRouter API provides a powerful, flexible gateway to state-of-the-art language models, making it an ideal choice for digital humanities research requiring sophisticated text analysis capabilities.

## Practical Assignment - Create your own LLM API request

You have all been e-mailed an API key for the OpenRouter API. Your task is to create a new query that analyzes a document of your choice using the OpenRouter API.

### Alternative to System Environment Variables: Enter Your API Key at a Prompt

Your API keys are valuable and should be kept secure. If you cannot set a system environment variable, have the script ask for the key at a hidden input prompt instead of hardcoding it. The key then never appears in your code or in the notebook output, which keeps it out of version control and avoids accidental exposure.


In [3]:
# Ask for the API key with a prompt that hides what you type
import os
import getpass

api_key = getpass.getpass("Please enter your OpenRouter API key: ")
# Optional: store the key in an environment variable for this session,
# so that code reading OPENROUTER_API_KEY_LNB works unchanged
os.environ["OPENROUTER_API_KEY_LNB"] = api_key
print("✅ API key set successfully. You can now use it in your scripts.")
# Never print the key itself: notebook output is saved with the file
# and can end up in a public repository

✅ API key set successfully. You can now use it in your scripts.
