![SGSSS Logo](../img/SGSSS_Stacked.png)

# Collecting Digital Data for Social Scientists

## Introduction

In this session we evaluate how well large language models (LLMs) can generate the kind of code we wrote today. You will submit the same prompts to at least three LLMs (e.g., ChatGPT, Claude, and an open model) and compare the quality of their outputs.

## Guide to Using This Resource

This notebook is designed to be used in Google Colab. To open it in Colab:

1. Click on the **File** menu and select **Open in Colab**, or upload this `.ipynb` file directly to [Google Colab](https://colab.research.google.com/).
2. You do not need to install anything locally -- Colab provides a free Python environment in your browser.
3. Work through the cells in order, pasting LLM-generated code into the designated code cells.
4. Use the markdown cells to record your evaluation notes.

## The Exercise

In this practical you will:

1. **Review the coding prompts**, which are based on the tasks we completed in today's practicals.
2. **Submit each prompt** to at least three different LLMs (e.g., ChatGPT, Claude, and an open model such as Llama or Mistral).
3. **Paste the generated code** into the cells below.
4. **Evaluate the outputs** using a consistent set of criteria.

The prompts below are already written for you. Your job is to submit them, collect the outputs, and assess the results.

## Evaluation Criteria

Use the following criteria to evaluate each LLM's output:

1. **Does the code run without errors?** -- Can you execute it in Colab without modifications?
2. **Does it produce the correct output?** -- Does the result match what you would expect?
3. **Does it use appropriate libraries?** -- Are the chosen packages standard and well-suited to the task?
4. **Is the code well-structured and readable?** -- Is it clearly organised, with sensible variable names and comments?
5. **Are there any security or ethical issues?** -- Does the code handle sensitive data appropriately? Does it respect rate limits and terms of service?
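
Criteria 1 and 3 can be partly checked in code. The sketch below is one possible approach, not part of the original practicals: the function names `list_imports` and `runs_without_errors` are hypothetical helpers introduced here for illustration. It assumes you have stored an LLM's output as a Python string. Running code in a subprocess is not a sandbox, so read the generated code before you execute it.

```python
import ast
import subprocess
import sys


def list_imports(code: str) -> list[str]:
    """Return the top-level module names a snippet imports (criterion 3)."""
    modules = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return sorted(modules)


def runs_without_errors(code: str, timeout: int = 60) -> bool:
    """Run a snippet in a fresh Python process; True if it exits cleanly (criterion 1)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat a snippet that never finishes as a failure.
        return False
    return result.returncode == 0
```

A clean exit only tells you the code ran. It says nothing about whether the output is correct, so criterion 2 still needs a manual check against the results from Practicals 1-3.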

## Prompt 1: Web Scraping

Submit the following prompt to each LLM:

> Write a Python script that scrapes all organisation names and URLs from the Edinburgh Council warm and welcoming spaces directory (https://www.edinburgh.gov.uk/directory/10258/other-warm-and-welcoming-locations). The script should loop through the A-Z pages, extract each organisation's name and link, and save the results as a JSON file.

In [None]:
# Paste ChatGPT output here


In [None]:
# Paste Claude output here


In [None]:
# Paste open model output here (e.g., Llama, Mistral)


### Evaluation

| Criterion | ChatGPT | Claude | Open Model |
|---|---|---|---|
| Runs without errors? | | | |
| Correct output? | | | |
| Appropriate libraries? | | | |
| Well-structured? | | | |
| Security/ethical issues? | | | |
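
To help with the "Correct output?" row, you could inspect the JSON file each script saves. The following is a minimal sketch under stated assumptions: the file is expected to hold a list of objects, and the key names `name` and `url` are guesses that you should change to match whatever keys the LLM chose. The helper `check_scrape_output` is hypothetical and does not come from the practicals.

```python
import json
from urllib.parse import urlparse


def check_scrape_output(path, name_key="name", url_key="url"):
    """Return a list of problems found in a scraped JSON file (empty list = none found)."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)

    if not isinstance(records, list):
        return ["top-level JSON value is not a list"]

    problems = []
    if not records:
        problems.append("no records found")

    seen = set()
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i} is not an object")
            continue
        name = record.get(name_key)
        url = record.get(url_key)
        if not isinstance(name, str) or not name.strip():
            problems.append(f"record {i} has no name")
        # Relative links such as "/directory-record/123" have no scheme.
        if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
            problems.append(f"record {i} has no absolute URL")
        key = (str(name), str(url))
        if key in seen:
            problems.append(f"record {i} duplicates an earlier record")
        seen.add(key)
    return problems
```

An empty list means none of these particular checks failed. It does not confirm that every organisation in the directory was captured, so compare the record count with the output from our own solution as well.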

## Prompt 2: API Data Collection

Submit the following prompt to each LLM:

> Write a Python script that downloads stop-and-search data for all police forces using the UK Police API (https://data.police.uk/api/). The script should get a list of forces, loop through each one to request stop-and-search data, handle errors, respect rate limits, and save all results as a single JSON file.

In [None]:
# Paste ChatGPT output here


In [None]:
# Paste Claude output here


In [None]:
# Paste open model output here


### Evaluation

| Criterion | ChatGPT | Claude | Open Model |
|---|---|---|---|
| Runs without errors? | | | |
| Correct output? | | | |
| Appropriate libraries? | | | |
| Well-structured? | | | |
| Security/ethical issues? | | | |
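
The prompt asks for error handling and respect for rate limits, and both leave visible traces in the code. As a rough first pass, the sketch below scans a snippet for a `sleep` call and a `try` block using Python's `ast` module. The function `politeness_checks` is a hypothetical helper, and the check is only a heuristic: it cannot tell whether the pause sits inside the request loop, and it will miss scripts that rely on a library's built-in retry or throttling features.

```python
import ast


def politeness_checks(code: str) -> dict:
    """Heuristic static checks: does the snippet pause between requests and catch errors?"""
    calls_sleep = False
    has_try = False
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Try):
            has_try = True
        elif isinstance(node, ast.Call):
            func = node.func
            # Matches time.sleep(...) as well as a bare sleep(...)
            # imported with "from time import sleep".
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name == "sleep":
                calls_sleep = True
    return {"calls_sleep": calls_sleep, "has_try_except": has_try}
```

If either value is `False`, read the code closely before running it against the live API, and record what you find in the "Security/ethical issues?" row.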

## Prompt 3: Data Analysis

Submit the following prompt to each LLM:

> Write a Python script that requests GDP data for all G7 countries from the World Bank API (https://api.worldbank.org/v2/), converts the results to a pandas DataFrame, and creates a bar chart comparing the most recent GDP figures.

In [None]:
# Paste ChatGPT output here


In [None]:
# Paste Claude output here


In [None]:
# Paste open model output here


### Evaluation

| Criterion | ChatGPT | Claude | Open Model |
|---|---|---|---|
| Runs without errors? | | | |
| Correct output? | | | |
| Appropriate libraries? | | | |
| Well-structured? | | | |
| Security/ethical issues? | | | |
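
"Most recent GDP figures" is easy to get subtly wrong: the newest year in the data may have no value yet for some countries, so a script that simply takes the latest year can plot gaps or drop a country without warning. The sketch below shows one way to check for this. It assumes a simplified, hypothetical record shape (dictionaries with `country`, `year`, and `value` keys, using ISO 3166-1 alpha-3 country codes), so you would first need to reshape whatever the LLM's script actually returns.

```python
# ISO 3166-1 alpha-3 codes for the G7 countries.
G7 = {"CAN", "FRA", "DEU", "ITA", "JPN", "GBR", "USA"}


def latest_values(records):
    """Pick, for each country, the most recent year that has a non-missing value."""
    latest = {}
    for rec in records:
        if rec["value"] is None:
            continue  # skip years with no reported figure
        country, year = rec["country"], int(rec["year"])
        if country not in latest or year > latest[country][0]:
            latest[country] = (year, rec["value"])
    return latest


def missing_g7(latest):
    """Return the G7 country codes that have no usable value."""
    return sorted(G7 - set(latest))
```

If the years in `latest_values(...)` differ between countries, the bar chart is comparing figures from different years, which is worth noting under "Correct output?".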

## Discussion Questions

Reflect on the following questions with your group:

1. Which LLM produced the best code overall? What made it better?
2. Did any LLM produce code that looked correct but was actually wrong? How would you know?
3. What information did you need to include in your prompt to get good results?
4. How would you use LLMs in your own research workflow? Where would you trust them and where would you not?
5. What are the implications for reproducibility if researchers use LLM-generated code?

## Our Solutions

Compare the LLM outputs against the code in Practicals 1-3. Those notebooks represent tested, working solutions that we built step by step.

---

**END OF FILE**