# ESWC 2026 Demo

This notebook supports the demo paper "bikiDATA: Developer-friendly Queries over RDF triples".

## Abstract

When developing applications that make use of knowledge graphs (KGs), a layer is required to translate data expressed in the RDF data model into a format that modern web applications can manipulate and display. One of the most popular programming languages for data analysis and web applications is Python. In this paper, we present bikiDATA, a storage format and Python library for developing large-scale data applications in a developer-friendly manner.

In addition to general SPARQL queries, the system provides a suite of convenient extra facilities: integrated fulltext search, semantic embeddings, KG embeddings and visual similarity search.

This code can be [run directly in a Google Colab environment](https://colab.research.google.com/github/ISE-FIZKarlsruhe/bikidata/blob/main/eswc_2026.ipynb). If all goes well, this starts a live notebook where you can download, import and query some data.

In [1]:
!pip install bikidata
import bikidata

!pip install rich
from rich import print



DEBUG     bikidata 2026-02-16 19:52:08 BIKIDATA_DB is configured as bikidata.duckdb
ERROR     bikidata 2026-02-16 19:52:13 COHERE_API_KEY environment variable is not set. 
DEBUG     bikidata 2026-02-16 19:52:13 Trying Redis at localhost




In [2]:
import urllib.request

DATA_URI = "https://epoz.org/olympics.nt.gz" # Use the Olympics dataset by Angus Addlesee
urllib.request.urlretrieve(DATA_URI, "olympics.nt.gz")

('olympics.nt.gz', <http.client.HTTPMessage at 0x10f974cd0>)
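As a quick sanity check before building the index, you can count the triples in the downloaded file. N-Triples stores one triple per line, so counting non-blank, non-comment lines is enough. This is a plain-Python sketch that does not use bikidata at all; the helpers `count_triple_lines` and `count_ntriples` are hypothetical names.

```python
import gzip
from typing import Iterable

def count_triple_lines(lines: Iterable[str]) -> int:
    """Count N-Triples statements: one per line, skipping blanks and '#' comments."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count

def count_ntriples(path: str) -> int:
    """Count the triples in a gzip-compressed N-Triples file such as olympics.nt.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_triple_lines(f)

# Example, once the download cell above has run:
# print(count_ntriples("olympics.nt.gz"))
```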

In [3]:
bikidata.build(["olympics.nt.gz"])

DEBUG     bikidata 2026-02-16 19:52:26 Building Bikidata index with ['olympics.nt.gz']
DEBUG     bikidata 2026-02-16 19:52:26 Good, there are no triples in bikidate table yet
INFO      bikidata 2026-02-16 19:52:33 No BIKIDATA_FTS_SETTINGS found, using default settings: ignore = '[^a-zA-Z0-9]+', strip_accents = 1, lower=1, stemmer='porter'
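The log line above shows the default fulltext settings. The option names (`ignore`, `strip_accents`, `lower`, `stemmer`) match those of DuckDB's full-text search extension, and the database is configured as `bikidata.duckdb`, but how bikiDATA applies them internally is an assumption here. The sketch below shows roughly what the first three options do to a piece of text; `tokenize` is a hypothetical helper and the Porter stemming step is deliberately left out.

```python
import re
import unicodedata
from typing import List

IGNORE = re.compile(r"[^a-zA-Z0-9]+")  # the `ignore` pattern from the log line

def strip_accents(text: str) -> str:
    # Decompose accented letters (é -> e + combining mark), then drop the marks
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def tokenize(text: str) -> List[str]:
    """Approximate strip_accents=1, lower=1 and the ignore pattern; no stemming."""
    text = strip_accents(text).lower()
    # Runs of ignored characters act as separators; drop empty pieces
    return [token for token in IGNORE.split(text) if token]

print(tokenize("Café Freestyle, 100m!"))  # ['cafe', 'freestyle', '100m']
```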


In [4]:
# Let's pick a few random entries from the data to see what they look like
r = bikidata.query({"filters": [{"p":"id", "o":"random 3"}]})
for entry in r["results"].values():
    print(entry)


Note that the entities returned by a bikidata query are Python dicts. The keys of each dict are property IRIs, and each value is a list of values for that property. A value can be an IRI, a literal or a blank node, and is written in N3 notation.

For example:

```python
{
    '<http://www.w3.org/2000/01/rdf-schema#label>': ['"Ahmad Mayez Khanji"@en'],
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>': ['<http://xmlns.com/foaf/0.1/Person>'],
}
```

(In the example above, some properties are omitted for clarity.)
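If you need to work with these values in plain Python, a small parser can tell the three kinds apart. This is a minimal sketch assuming the usual N3/N-Triples conventions: IRIs in angle brackets, blank nodes written as `_:id`, and literals in double quotes with an optional `@lang` tag or `^^<datatype>`. The function `parse_n3_value` is hypothetical, not part of bikidata, and it leaves escape sequences inside literals unprocessed.

```python
import re
from typing import Dict, Optional

# Quoted lexical form, then an optional @lang tag or ^^<datatype>
LITERAL = re.compile(
    r'^"(?P<lex>(?:[^"\\]|\\.)*)"'
    r'(?:@(?P<lang>[A-Za-z0-9-]+)|\^\^<(?P<dt>[^>]*)>)?$'
)

def parse_n3_value(value: str) -> Dict[str, Optional[str]]:
    """Classify an N3-notation value as an IRI, blank node or literal."""
    if value.startswith("<") and value.endswith(">"):
        return {"kind": "iri", "iri": value[1:-1]}
    if value.startswith("_:"):
        return {"kind": "bnode", "id": value[2:]}
    match = LITERAL.match(value)
    if match:
        # Escape sequences inside the lexical form are left as-is
        return {"kind": "literal", "value": match.group("lex"),
                "lang": match.group("lang"), "datatype": match.group("dt")}
    raise ValueError(f"Unrecognised N3 value: {value!r}")

print(parse_n3_value('"Ahmad Mayez Khanji"@en'))
print(parse_n3_value('<http://xmlns.com/foaf/0.1/Person>'))
```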

In [5]:
# Let's find some Gold medallists
r = bikidata.query({"filters": [{"p":"<http://wallscope.co.uk/ontology/olympics/medal>", "o":"<http://wallscope.co.uk/resource/olympics/medal/Gold>"}]})
print(f"Found {r['total']} gold medallists")

In [6]:
# But we only want a few, not all 13369
r = bikidata.query({"size":10, "filters": [{"p":"<http://wallscope.co.uk/ontology/olympics/medal>", "o":"<http://wallscope.co.uk/resource/olympics/medal/Gold>"}]})

# We can print only the IRIs of the entities, as the results are keyed by the IRIs
print("10 gold medallists IRIs", [entry for entry in r["results"]])

# We can also index into the results; here we print the first one
print("The first entity in the results", list(r['results'].values())[0])


Can we see where else an example athlete is mentioned?

In [7]:
r = bikidata.query({"filters":[{"o":"<http://wallscope.co.uk/resource/olympics/athlete/FrancisWilliamFrankTyler>"}]})
print(r)

By leaving out the "p" part of the filter and querying only for an "o" of <http://wallscope.co.uk/resource/olympics/athlete/FrancisWilliamFrankTyler>, we get every entity that references this athlete, and we now see that this athlete appears in two different Olympics!
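Conceptually, a filter is a pattern over triples: the given "p" and/or "o" must match, and the subjects of the matching triples are returned. The toy model below runs over an in-memory list of hypothetical triples and illustrates why omitting "p" finds every entity that references the athlete through any property. `match_filter` is a hypothetical helper; it only illustrates the semantics and says nothing about how bikiDATA evaluates filters internally.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) in N3 notation

def match_filter(triples: List[Triple], p: Optional[str] = None, o: Optional[str] = None) -> Set[str]:
    """Return the subjects of all triples whose predicate and/or object match.

    Omitting p means "any predicate", like a bikidata filter with only "o".
    """
    return {
        s for (s, pred, obj) in triples
        if (p is None or pred == p) and (o is None or obj == o)
    }

# Hypothetical data: two results reference athlete A through different properties
triples = [
    ("<ex:result1>", "<ex:athlete>", "<ex:athleteA>"),
    ("<ex:result2>", "<ex:winner>", "<ex:athleteA>"),
    ("<ex:result3>", "<ex:athlete>", "<ex:athleteB>"),
]
print(match_filter(triples, o="<ex:athleteA>"))                    # result1 and result2
print(match_filter(triples, p="<ex:athlete>", o="<ex:athleteA>"))  # only result1
```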

What about the details for that athlete? We can retrieve a single item by ID.

In [85]:
r = bikidata.query({"filters":[{"p":"id", "o":"<http://wallscope.co.uk/resource/olympics/athlete/FrancisWilliamFrankTyler>"}]})
print(r["results"])

In [None]:
# We can also do fulltext searches, using the custom property "fts"

r = bikidata.query({"filters":[{"p":"fts", "o":"freestyle"}]})
print(set([entry for entry in r["results"]][:10])) # Let's just print the first 10

A problem often encountered in real-world Linked Data systems is that the text matched by a search belongs to a sub-entity, which is referenced from the entity of interest through a one-hop relationship (this sub-entity is often a blank node). How can we solve this? In bikidata, you specify the number of hops as an integer following "fts" in the fulltext search predicate, for example **"fts 1"**.
This queries for items matching the given fulltext query, but returns the entities _that reference the matched results_ via the specified number of hops.
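One way to picture the hop expansion: start from the entities whose text matched, then repeatedly replace that set with the entities that point at it through any property. The sketch below does this over an in-memory list of hypothetical triples. Whether bikiDATA returns entities at exactly N hops or up to N hops is not stated here; this sketch assumes exactly N, and `expand_hops` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def expand_hops(triples: Iterable[Tuple[str, str, str]], matched: Set[str], hops: int) -> Set[str]:
    """Return the entities that reach a matched entity via exactly `hops` incoming links."""
    # Reverse index: object -> set of subjects that reference it
    referrers: Dict[str, Set[str]] = defaultdict(set)
    for s, _p, o in triples:
        referrers[o].add(s)
    frontier = set(matched)  # hops=0 returns the matches themselves
    for _ in range(hops):
        frontier = {s for o in frontier for s in referrers.get(o, ())}
    return frontier

# Hypothetical data: "freestyle" only occurs on a blank-node event description
triples = [
    ("<ex:athlete1>", "<ex:event>", "_:b1"),
    ("_:b1", "<ex:label>", '"100m Freestyle"'),
    ("<ex:team1>", "<ex:member>", "<ex:athlete1>"),
]
matched = {"_:b1"}  # pretend the fulltext search matched this blank node
print(expand_hops(triples, matched, 1))  # {'<ex:athlete1>'}
```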

In [8]:
r = bikidata.query({"filters":[{"p":"fts 1", "o":"freestyle"}]})
print(set([entry for entry in r["results"]][:10])) # Let's just print the first 10

## Combining multiple filters with operators

In [None]:
# Choose some random items, but only those that are of type Person
r = bikidata.query(
    {
        "filters" : [
            {"p":"id", "o":"random 1000"},
            {"op":"and","p":"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", "o":"<http://xmlns.com/foaf/0.1/Person>"}
        ]
    }
)
print([iri for iri in r['results']])

In [21]:
# Same as the previous random person query, but now exclude items with a team property of South Africa
r = bikidata.query(
    { "size": 5,
        "filters" : [
            {"p":"id", "o":"random 1000"},
            {"op":"and","p":"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", "o":"<http://xmlns.com/foaf/0.1/Person>"},
            {"op":"not", "p":"<http://dbpedia.org/ontology/team>", "o":"<http://wallscope.co.uk/resource/olympics/team/SouthAfrica>"}
        ]
    }
)
# Fall back to None for any entity that has no team property
print([(iri, obj.get('<http://dbpedia.org/ontology/team>', [None])[0]) for iri, obj in r['results'].items()])
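Reading the two queries above, the filters behave like set operations applied in order: the first filter produces a candidate set, `"and"` keeps only entities that also match the next filter, and `"not"` removes entities that match it. The sketch below models that reading with plain Python sets; it is inferred from these examples, not taken from bikiDATA's source, and `combine_filters` is a hypothetical helper.

```python
from typing import List, Optional, Set, Tuple

def combine_filters(steps: List[Tuple[Optional[str], Set[str]]]) -> Set[str]:
    """Fold per-filter matches into one result set.

    steps: (op, matching_ids) pairs in query order; the first op is ignored.
    """
    result = set(steps[0][1])
    for op, ids in steps[1:]:
        if op == "and":
            result &= ids  # keep entities matched by both
        elif op == "not":
            result -= ids  # drop entities matched by this filter
        else:
            raise ValueError(f"operator not modelled in this sketch: {op!r}")
    return result

# Hypothetical ids mirroring the query above: random sample, Person type, South Africa team
random_sample = {"<ex:p1>", "<ex:p2>", "<ex:p3>", "<ex:event1>"}
persons = {"<ex:p1>", "<ex:p2>", "<ex:p3>"}
south_africa = {"<ex:p2>"}
print(combine_filters([(None, random_sample), ("and", persons), ("not", south_africa)]))
```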


## Aggregates

Calculating aggregates over non-trivial RDF datasets with SPARQL can be computationally expensive. With bikiDATA we can compute aggregates quickly, which is very useful when building applications that support search and filtering.
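To make the shape of an aggregate concrete, the sketch below computes a properties-style count from a result dict of the kind shown earlier (entity IRI mapped to a property-to-values dict). It returns `(count, iri)` pairs, matching how the cells below unpack them with `for count, iri in agg`. Whether bikiDATA counts entities or individual triples per property is an assumption here (this sketch counts entities), and `property_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

Entity = Dict[str, List[str]]  # property IRI -> list of N3 values

def property_counts(results: Dict[str, Entity]) -> List[Tuple[int, str]]:
    """Count, for each property IRI, how many entities in `results` use it."""
    counts: Counter = Counter()
    for entity in results.values():
        counts.update(entity.keys())  # each property counted once per entity
    # Largest count first; ties broken alphabetically by IRI
    return sorted(((n, iri) for iri, n in counts.items()), key=lambda pair: (-pair[0], pair[1]))

# Hypothetical results in the same shape as r["results"]
results = {
    "<ex:athlete1>": {"<ex:label>": ['"Ann"@en'], "<ex:team>": ["<ex:teamA>"]},
    "<ex:athlete2>": {"<ex:label>": ['"Bob"@en', '"Robert"@en']},
}
for count, iri in property_counts(results):
    print(f"{iri} {count}")  # prints "<ex:label> 2" then "<ex:team> 1"
```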

In [24]:
r = bikidata.query(
    {
        "aggregates" : ['properties'],
    }    
)
print([f"{iri} {count}" for field, agg in r['aggregates'].items() for count,iri in agg] )

In [25]:
r = bikidata.query(
    {
        "size": 0,
        "filters" : [
            {"p":"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"}            
        ],
        "aggregates" : ['<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>']
    }    
)
print([f"{iri} {count}" for field, agg in r['aggregates'].items() for count,iri in agg] )

In [28]:
r = bikidata.query(
    {
        "size": 0,
        "filters" : [
            {"p":"fts", "o":"bla"}            
        ],
        "aggregates" : ['<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>']
    }    
)
print([f"{iri} {count}" for field, agg in r['aggregates'].items() for count,iri in agg] )

This notebook gratefully acknowledges the work done by Angus Addlesee in [this GitHub repo](https://github.com/wallscope/olympics-rdf), to support [his article on "Creating Linked Data"](https://medium.com/wallscope/creating-linked-data-31c7dd479a9e) (from 2018!)
