# Hackathon: Linked Data with QuickStatements in Wikidata

- Wikidata [Cradle](https://www.wikidata.org/wiki/Wikidata:Cradle#article)
- [QuickStatements UI](https://quickstatements.toolforge.org/)
- QuickStatements [API](https://www.wikidata.org/wiki/Help:QuickStatements#Using_the_API_to_start_batches)

In [1]:
import os
import time

import httpx
import pandas as pd

from openai import OpenAI

In [2]:
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

In [3]:
citations = pd.read_csv("../data/2024-12-10-citations.csv")

In [4]:
citations

Unnamed: 0,Authors,Title,Publication,Volume,Number,Pages,Year,Publisher
0,"Gaizauskas, Robert; Wilks, Yorick;",Information extraction: Beyond document retrieval,Journal of documentation,54.0,1.0,70-105,1998,MCB UP Ltd
1,"Morgan, Paul;",Hypertext and the literary document,Journal of documentation,47.0,4.0,373-388,1991,MCB UP Ltd
2,"Kircz, Joost G;",Modularity: the next form of scientific inform...,Journal of documentation,54.0,2.0,210-235,1998,MCB UP Ltd
3,"Farrow, John F;",A cognitive process model of document indexing,Journal of documentation,47.0,2.0,149-166,1991,MCB UP Ltd
4,"Heery, Rachel;",Review of metadata formats,Program,30.0,4.0,345-373,1996,MCB UP Ltd
5,"Burnett, Kathleen; Ng, Kwong Bor; Park, Soyeon;",A comparison of the two traditions of metadata...,Journal of the American Society for Informatio...,50.0,13.0,1209-1217,1999,Wiley Online Library
6,"Weibel, Stuart; Godby, Jean; Miller, Eric; Dan...",OCLC/NCSA metadata workshop report,,,,,1995,
7,"Schenkman, Bo N; Jönsson, Fredrik U;",Aesthetics and preferences of web pages,Behaviour & information technology,19.0,5.0,367-377,2000,Taylor & Francis
8,"Lakoff, George;",Explaining embodied cognition results,Topics in cognitive science,4.0,4.0,773-785,2012,Wiley Online Library
9,"McManus, I Chris; Wu, Wen;","“The square is… bulky, heavy, contented, plain...","Psychology of Aesthetics, Creativity, and the ...",7.0,2.0,130,2013,Educational Publishing Foundation


In [5]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": """From the following row in a csv, create Wikidata Quickstatements:\nAuthors 	Title 	Publication 	Volume 	Number 	Pages 	Year 	Publisher
0 	Gaizauskas, Robert; Wilks, Yorick; 	Information extraction: Beyond document retrieval 	Journal of documentation 	54.0 	1.0 	70-105 	1998 	MCB UP Ltd"""
        }
    ],
    model="gpt-4o",
)
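
The prompt above hard-codes the first row of the table. As a rough sketch, the same prompt text could be built from any row; `build_prompt` is a hypothetical helper (not part of the OpenAI client), and it assumes the row is available as a plain dict keyed by the CSV column names.

```python
# Hypothetical helper: renders one citation row into the prompt text used above.
COLUMNS = ["Authors", "Title", "Publication", "Volume", "Number", "Pages", "Year", "Publisher"]


def build_prompt(row: dict) -> str:
    """Return the instruction line followed by a tab-separated header and value line."""
    header = "\t".join(COLUMNS)
    # Missing columns become empty strings so the value line keeps its column count.
    values = "\t".join(str(row.get(col, "")) for col in COLUMNS)
    return (
        "From the following row in a csv, create Wikidata Quickstatements:\n"
        f"{header}\n{values}"
    )


if __name__ == "__main__":
    example = {"Authors": "Morgan, Paul;", "Title": "Hypertext and the literary document", "Year": 1991}
    print(build_prompt(example))
```

With a pandas row, something like `build_prompt(row.to_dict())` would pass the same data in.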

In [6]:
print(chat_completion.choices[0].message.content)

To create Wikidata Quickstatements for the given row of data, we need to map the information to the appropriate properties used in Wikidata. I'll write the Quickstatements using the column data provided:

1. Author: `Gaizauskas, Robert`
2. Author: `Wilks, Yorick`
3. Title: `Information extraction: Beyond document retrieval`
4. Publication: `Journal of documentation`
5. Volume: `54`
6. Number: `1`
7. Pages: `70-105`
8. Year: `1998`
9. Publisher: `MCB UP Ltd`

Quickstatements:

```
CREATE
LAST|P31|Q13442814
LAST|P1476|en:"Information extraction: Beyond document retrieval"
LAST|P2093|"Gaizauskas, Robert"
LAST|P2093|"Wilks, Yorick"
LAST|P1433|Q15716944
LAST|P478|"54"
LAST|P433|"1"
LAST|P304|"70-105"
LAST|P577|+1998-00-00T00:00:00Z/9
LAST|P123|Q17929746
```

Here's what individual parts of the above Quickstatements mean:

- `P31|Q13442814` specifies that this item is a "scholarly article."
- `P1476|en:"Information extraction: Beyond document retrieval"` specifies the title of the work.
- `P
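
The reply mixes explanation with a fenced block of statements. A minimal sketch for pulling out only the statements follows; `extract_statements` is a hypothetical helper, and it assumes the model wraps the statements in the first fenced block of its answer, which is not guaranteed. A reply cut off before the closing fence yields an empty list.

```python
import re

# The fence marker is built at runtime so this example contains no literal fence.
FENCE = "`" * 3
BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_statements(reply: str) -> list:
    """Return the non-empty lines of the first fenced block in the reply."""
    match = BLOCK.search(reply)
    if match is None:
        return []  # No complete fenced block (for example, a truncated reply)
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "Quickstatements:\n\n" + FENCE + "\nCREATE\nLAST|P31|Q13442814\n" + FENCE + "\n"
    print(extract_statements(sample))
```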

In [7]:
def search_wikidata(name):
    """
    Search for a name in Wikidata and return the Q code if found.
    """
    url = "https://www.wikidata.org/w/api.php"
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": "en",
        "format": "json",
        "limit": 1  # Limit to one result for simplicity
    }
    response = httpx.get(url, params=params)
    if response.status_code == 200:
        results = response.json().get('search', [])
        if results:
            return results[0]['id']  # Return the Q-code (e.g., Q12345)
    return None  # Return None if not found


def create_quickstatements_with_qcodes(row):
    """
    Build one QuickStatements block (CREATE plus LAST lines) for a citation row.
    """
    # Create a new item for the article
    item = 'CREATE\n'
    item += 'LAST|P31|Q13442814\n'  # Instance of: scholarly article
    item += f'LAST|P1476|en:"{row["Title"]}"\n'  # Title
    item += f'LAST|P577|+{row["Year"]}-00-00T00:00:00Z/9\n'  # Publication date (year precision)

    # Authors: split on ';' and strip, so a trailing ';' never leaks into a name
    for author in (a.strip() for a in row['Authors'].split(';')):
        if not author:
            continue  # Skip the empty piece left by a trailing ';'
        # Adjust author name format to "First Last" instead of "Last, First"
        if ',' in author:
            last_name, first_name = (part.strip() for part in author.split(',', 1))
            author_name = f"{first_name} {last_name}"
        else:
            author_name = author  # No comma, assume name is already in correct format
        print(author_name)
        qcode = search_wikidata(author_name)
        if qcode:
            item += f'LAST|P50|{qcode}\n'  # Author (linked to Wikidata Q code)
        else:
            item += f'LAST|P2093|"{author_name}"\n'  # Author name string if no Q code

    # Journal: P1433 takes an item, so it is only emitted when the search finds one
    if pd.notna(row["Publication"]):
        publication_qcode = search_wikidata(row["Publication"])
        if publication_qcode:
            item += f'LAST|P1433|{publication_qcode}\n'  # Published in

    # Volume and issue: pandas reads these columns as floats, hence "54.0" in the output
    if pd.notna(row["Volume"]):
        item += f'LAST|P478|"{row["Volume"]}"\n'  # Volume
    if pd.notna(row["Number"]):
        item += f'LAST|P433|"{row["Number"]}"\n'  # Issue

    # Pages
    if pd.notna(row['Pages']):
        item += f'LAST|P304|"{row["Pages"]}"\n'  # Pages

    # Publisher
    if pd.notna(row['Publisher']) and row['Publisher'].strip():
        qcode = search_wikidata(row['Publisher'])
        if qcode:
            item += f'LAST|P123|{qcode}\n'  # Publisher (linked to Wikidata Q code)
        else:
            # P123 takes an item on Wikidata, so this string fallback is only a
            # placeholder to resolve by hand before importing
            item += f'LAST|P123|"{row["Publisher"]}"\n'

    return item
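
Because pandas reads the Volume and Number columns as floats, the generated statements carry values such as "54.0", and a missing value would be written as "nan". A small sketch of a formatter that could be applied before writing those lines follows; `format_number` is a hypothetical helper, not something the notebook currently calls.

```python
import math
from typing import Optional


def format_number(value) -> Optional[str]:
    """Return '54' for 54.0, keep other values as text, and return None when missing."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None  # Missing cell: the caller should skip the statement
        if value.is_integer():
            return str(int(value))  # 54.0 -> "54"
    text = str(value).strip()
    return text or None


if __name__ == "__main__":
    for raw in (54.0, 1.5, "12A", float("nan")):
        print(repr(raw), "->", format_number(raw))
```

A caller would emit `LAST|P478|"..."` only when the helper returns a string.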

In [8]:
search_wikidata("Yorick Wilks")

'Q4470008'

In [9]:
qcode = search_wikidata("Robert Gaizauskas")

In [10]:
qcode

'Q58329735'
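
`search_wikidata` keeps only the top hit, which can be the wrong entity when a name is common. As a sketch, raising `limit` in the request and listing each candidate with its label and description would make a manual check easier. `summarize_candidates` is a hypothetical helper that takes the already-parsed JSON of a `wbsearchentities` response; the sample payload below uses made-up IDs and descriptions purely for illustration.

```python
def summarize_candidates(payload: dict) -> list:
    """Return (id, label, description) tuples for each hit in a wbsearchentities response."""
    rows = []
    for hit in payload.get("search", []):
        # Label and description can be absent for sparse items, so default to "".
        rows.append((hit["id"], hit.get("label", ""), hit.get("description", "")))
    return rows


if __name__ == "__main__":
    # Illustrative payload only: the IDs and descriptions are invented.
    fake_payload = {
        "search": [
            {"id": "Q0000001", "label": "Paul Morgan", "description": "example researcher"},
            {"id": "Q0000002", "label": "Paul Morgan"},
        ]
    }
    for qid, label, description in summarize_candidates(fake_payload):
        print(qid, label, description or "(no description)")
```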

In [11]:
quickstatements_with_qcodes = []
for _, row in citations.iterrows():
    quickstatements_with_qcodes.append(create_quickstatements_with_qcodes(row))
    time.sleep(5)  # Add a delay to avoid overwhelming the Wikidata API

# Combine all statements and save to a file
quickstatements_output_with_qcodes = "\n".join(quickstatements_with_qcodes)
output_file_with_qcodes = '../data/2024-12-10-quickstatements_with_qcodes.txt'
with open(output_file_with_qcodes, 'w') as f:
    f.write(quickstatements_output_with_qcodes)

output_file_with_qcodes

Robert Gaizauskas
Yorick Wilks
Paul Morgan
Joost G Kircz
John F Farrow
Rachel Heery
Kathleen Burnett
Kwong Bor Ng
Soyeon Park
Stuart Weibel
Jean Godby
Eric Miller
Ron Daniel
Bo N Schenkman
Fredrik U Jönsson
George Lakoff
I Chris McManus
Wen Wu
George Lakoff
Mark Johnson
Richard Furuta
P David Stotts


'../data/2024-12-10-quickstatements_with_qcodes.txt'
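
Some names are looked up more than once (George Lakoff appears twice in the printed list), and every journal and publisher is searched again for each row. One possible way to avoid repeated requests is to memoize the lookup; `make_cached_lookup` is a hypothetical wrapper built on the standard library's `functools.lru_cache`.

```python
from functools import lru_cache


def make_cached_lookup(lookup):
    """Wrap a name -> Q-code function so each distinct name is looked up only once."""

    @lru_cache(maxsize=None)
    def cached(name: str):
        # Results are cached per name, including None for names with no match.
        return lookup(name)

    return cached


if __name__ == "__main__":
    calls = []

    def fake_lookup(name):
        # Stand-in for search_wikidata so the example runs offline.
        calls.append(name)
        return None

    cached_search = make_cached_lookup(fake_lookup)
    cached_search("George Lakoff")
    cached_search("George Lakoff")
    print(len(calls))
```

In the notebook this would be used as `cached_search = make_cached_lookup(search_wikidata)`, with `cached_search` called in place of `search_wikidata`.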

In [12]:
for row in quickstatements_with_qcodes:
    print(row)

CREATE
LAST|P31|Q13442814
LAST|P1476|en:"Information extraction: Beyond document retrieval"
LAST|P577|+1998-00-00T00:00:00Z/9
LAST|P50|Q58329735
LAST|P50|Q4470008
LAST|P1433|Q6295097
LAST|P478|"54.0"
LAST|P433|"1.0"
LAST|P304|"70-105"
LAST|P123|"MCB UP Ltd"

CREATE
LAST|P31|Q13442814
LAST|P1476|en:"Hypertext and the literary document"
LAST|P577|+1991-00-00T00:00:00Z/9
LAST|P50|Q62560226
LAST|P1433|Q6295097
LAST|P478|"47.0"
LAST|P433|"4.0"
LAST|P304|"373-388"
LAST|P123|"MCB UP Ltd"

CREATE
LAST|P31|Q13442814
LAST|P1476|en:"Modularity: the next form of scientific information presentation?"
LAST|P577|+1998-00-00T00:00:00Z/9
LAST|P2093|"Joost G Kircz"
LAST|P1433|Q6295097
LAST|P478|"54.0"
LAST|P433|"2.0"
LAST|P304|"210-235"
LAST|P123|"MCB UP Ltd"

CREATE
LAST|P31|Q13442814
LAST|P1476|en:"A cognitive process model of document indexing"
LAST|P577|+1991-00-00T00:00:00Z/9
LAST|P2093|"John F Farrow"
LAST|P1433|Q6295097
LAST|P478|"47.0"
LAST|P433|"2.0"
LAST|P304|"149-166"
LAST|P123|"MCB UP Ltd"
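
The publisher lines above carry a quoted string, but P123 (publisher) takes an item as its value on Wikidata, as do P31, P50 and P1433, so those lines would presumably be rejected on import. A minimal sketch of a pre-import check follows; `split_valid` is a hypothetical helper, and the set of item-valued properties covers only the ones used in this notebook.

```python
import re

# Properties used in this notebook whose value must be a Q-code.
ITEM_VALUED = {"P31", "P50", "P1433", "P123"}
QID = re.compile(r"Q\d+")


def split_valid(statements: str):
    """Separate lines into (kept, rejected); rejected lines give an item-valued property a non-Q value."""
    kept, rejected = [], []
    for line in statements.splitlines():
        parts = line.split("|", 2)  # subject, property, value (value may itself contain '|')
        if len(parts) == 3 and parts[1] in ITEM_VALUED and not QID.fullmatch(parts[2]):
            rejected.append(line)
        else:
            kept.append(line)
    return kept, rejected


if __name__ == "__main__":
    block = 'CREATE\nLAST|P31|Q13442814\nLAST|P123|"MCB UP Ltd"\n'
    kept, rejected = split_valid(block)
    print("keep:", kept)
    print("review by hand:", rejected)
```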

