# Greek Epigrams to CSV

This notebook extracts Greek texts from the Anthologia Graeca API and writes them to a CSV file in preparation for named entity recognition (NER) work.
The output CSV includes:
- Epigram number
- Greek text (newlines and repeated whitespace collapsed into single spaces)
- URL of the text in the Anthologia Graeca API

This forms the basis of a collaborative annotation workflow for the NER-Anthologia project.

In [1]:
import requests
import csv
import re

In [2]:
API_URL = "https://anthologiagraeca.org/api"
passages = []

In [3]:
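# Fetch JSON data from an API endpoint; return an empty dict on failure
def get_data(url):
    try:
        response = requests.get(url, timeout=30)  # Avoid hanging forever on a stalled request
    except requests.RequestException as error:
        print(f"Failed to fetch {url}: {error}")
        return {}
    if response.status_code == 200:
        return response.json()
    print(f"Failed to fetch {url}: {response.status_code}")
    return {}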

In [4]:
# Fetch all passages from API (may take a while)
url = f"{API_URL}/passages/?limit=200"

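# Follow the "next" link until the last page; a failed request returns {}
# and ends the loop early, so check the count printed below
while url:
    response = get_data(url)
    passages.extend(response.get("results", []))
    url = response.get("next")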

print(f"Fetched {len(passages)} passages.")

Fetched 4134 passages.
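
The loop above follows the `next` link of each page until the API stops providing one. The same pattern could be sketched as a reusable generator, assuming the response shape used above (a `results` list plus a `next` URL). The names `iter_results` and `fetch` are hypothetical and appear in neither the notebook nor the API; the example runs against fake in-memory pages so that it needs no network access.

```python
def iter_results(first_url, fetch):
    """Yield every item from a paginated endpoint.

    `fetch` is any callable that maps a URL to a decoded JSON dict
    (hypothetical parameter; a function like get_data would fit).
    """
    url = first_url
    seen = set()
    while url and url not in seen:  # stop on a missing or repeated link
        seen.add(url)
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")


# Usage with fake pages standing in for API responses
fake_pages = {
    "page1": {"results": [1, 2], "next": "page2"},
    "page2": {"results": [3], "next": None},
}
items = list(iter_results("page1", fake_pages.get))
```

In the notebook this might be called as `list(iter_results(f"{API_URL}/passages/?limit=200", get_data))`. Tracking visited URLs is a defensive choice that guards against a page whose `next` link points back to itself.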


In [5]:
# Extract Greek texts into rows for the CSV (the file is written in the next cell)
output_rows = [("epigram_number", "greek_text", "url")]
  
for passage in passages:
    book = passage.get("book", {})
    book_number = book.get("number")
    fragment_number = passage.get("fragment")
    subfragment_number = passage.get("sub_fragment") or ""  # Empty when there is no sub-fragment
    epigram_number = f"{book_number}.{fragment_number}{subfragment_number}"  # book.fragment plus optional sub-fragment
    
    for text_entry in passage.get("texts", []):
        if text_entry.get("language") == "grc":  # Keep the Greek text only
            greek_text = text_entry.get("text", "")
            greek_text = re.sub(r"\s+", " ", greek_text).strip()  # Collapse newlines and extra spaces
            text_url = text_entry.get("url", "")
            output_rows.append((epigram_number, greek_text, text_url))
            break  # Keep only the first Greek text per passage
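
The identifier and whitespace handling in the cell above can also be factored into two small helpers, which makes them easy to check in isolation. This is a minimal sketch: `format_epigram_number` and `normalize_whitespace` are hypothetical names, and the assumption that a missing sub-fragment arrives as `None` or an empty string has not been verified against the API.

```python
import re


def format_epigram_number(book_number, fragment_number, sub_fragment=None):
    """Build an identifier from book, fragment and optional sub-fragment."""
    suffix = sub_fragment or ""  # assumption: None and "" both mean "no sub-fragment"
    return f"{book_number}.{fragment_number}{suffix}"


def normalize_whitespace(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()
```

With these helpers, a passage without a sub-fragment yields an identifier of the form `book.fragment`, with no stray `None` appended.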

In [6]:
# Write data to CSV file
with open("greek_passages.csv", mode="w", newline='', encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerows(output_rows)

print("CSV saved as greek_passages.csv")

CSV saved as greek_passages.csv
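
Before sharing the file with annotators, it may be worth reading it back to confirm that it round-trips. The sketch below assumes the header row written above; `read_rows` is a hypothetical helper name.

```python
import csv


def read_rows(path):
    """Return the CSV contents as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as file:
        return list(csv.DictReader(file))
```

Calling `read_rows("greek_passages.csv")` should return one dict per data row, so its length is expected to equal `len(output_rows) - 1`.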
