# Automatically paginating a SPARQL query

In [1]:
!pip install sparqlwrapper


In [2]:
import sys
sys.path.append("..")

from itertools import islice
from elastic_wikidata import sparql_helpers, sparql_to_es

from tqdm.auto import tqdm

Let's write a query to get all humans. Wikidata has over 8 million humans, so the query will time out if we try to run it all at once.

In [3]:
query = """
SELECT ?human WHERE {{
   ?human wdt:P31 wd:Q5.  
}}
"""

We can use `elastic_wikidata` to paginate the query instead.

In [4]:
pages = sparql_helpers.paginate_sparql_query(query, page_size=500)
next(pages)



'\nSELECT ?human WHERE {{\n   ?human wdt:P31 wd:Q5.  \n}}\n\n        LIMIT 500\n        OFFSET 0\n        '
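The output above suggests that each page is the original query with a `LIMIT` and an `OFFSET` appended. As a rough illustration of that idea, here is a minimal sketch of such a generator. `paginate_query_sketch` and `example_query` are hypothetical names invented for this example; this is not the `elastic_wikidata` implementation, and since the sketch does no template formatting, the example query uses single braces.

```python
from itertools import count, islice
from typing import Iterator


def paginate_query_sketch(query: str, page_size: int = 500) -> Iterator[str]:
    """Yield copies of `query` with LIMIT/OFFSET appended, one per page.

    Hypothetical helper for illustration only; not the library's code.
    """
    for page in count():
        offset = page * page_size
        yield f"{query}\nLIMIT {page_size}\nOFFSET {offset}\n"


example_query = """
SELECT ?human WHERE {
   ?human wdt:P31 wd:Q5.
}
"""

# The generator never ends on its own, so take a bounded number of pages.
first_pages = list(islice(paginate_query_sketch(example_query, page_size=500), 3))

# The last line of each page shows how the offset advances.
offsets = [page.strip().splitlines()[-1] for page in first_pages]
print(offsets)  # ['OFFSET 0', 'OFFSET 500', 'OFFSET 1000']
```

Because the generator is lazy, no page is built until it is requested, which is why `next(pages)` or `islice` can be used to inspect only the first few.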

## Running paginated queries

Putting this all together, we can use `sparql_to_es.get_entities_from_query` to:
1. paginate a query to fetch entities
2. run each page against the Wikidata Query Service
3. combine the results
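
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not how `get_entities_from_query` is actually written: `paginate`, `collect_entities`, `fake_run_page` and `FAKE_IDS` are hypothetical names, and `fake_run_page` stands in for a real call to the Wikidata Query Service so that the example runs offline.

```python
from itertools import count
from typing import Callable, Iterator, List


def paginate(query: str, page_size: int) -> Iterator[str]:
    """Step 1: yield the query once per page with LIMIT/OFFSET appended."""
    for page in count():
        yield f"{query}\nLIMIT {page_size}\nOFFSET {page * page_size}\n"


def collect_entities(
    query: str,
    run_page: Callable[[str], List[str]],
    page_size: int = 100,
    limit: int = 1000,
) -> List[str]:
    """Steps 2 and 3: run each page and combine the results.

    `run_page` is an assumed callable that executes one paginated query
    and returns a list of entity IDs.
    """
    entities: List[str] = []
    for page_query in paginate(query, page_size):
        results = run_page(page_query)
        entities.extend(results)
        # Stop on a short (final) page or once enough entities are collected.
        if len(results) < page_size or len(entities) >= limit:
            break
    return entities[:limit]


# Stand-in data source: 250 made-up IDs instead of a live endpoint.
FAKE_IDS = [f"Q{i}" for i in range(250)]


def fake_run_page(page_query: str) -> List[str]:
    """Read LIMIT and OFFSET back out of the query and slice the fake data."""
    lines = page_query.strip().splitlines()
    page_limit = int(lines[-2].split()[1])
    offset = int(lines[-1].split()[1])
    return FAKE_IDS[offset:offset + page_limit]


sketch_query = "SELECT ?human WHERE { ?human wdt:P31 wd:Q5. }"
all_entities = collect_entities(sketch_query, fake_run_page, page_size=100, limit=1000)
capped_entities = collect_entities(sketch_query, fake_run_page, page_size=100, limit=200)
print(len(all_entities), len(capped_entities))  # 250 200
```

Passing the page runner in as a callable keeps the paging logic separate from the network call, so it can be exercised without hitting the query service.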

In [6]:
entities = sparql_to_es.get_entities_from_query(query, page_size=100, limit=1000)

print(f"{len(entities)} entities returned")
print(f"{len(set(entities))} unique entities returned")

1000 entities returned
1000 unique entities returned
