<div align="center" style="border:solid 1px gray;">
    <a href="https://openalex.org/">
        <img src="../../resources/img/OpenAlex-banner.png" alt="OpenAlex banner" width="300">
    </a>
</div>

# Turn the page
❓ Let's say we query OpenAlex for a [list of entities](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities). By default, the API only returns the first 25 results of the list. Why is that ❓

>Just as a book splits a long text across **pages**, the OpenAlex API splits a (potentially massive) list of entities across pages.

This makes the data more manageable for both sides: we receive small chunks of data that arrive in a reasonable amount of time and fit into our computer's memory, while the OpenAlex API processes less data per request and can serve more requests from more users.


👉 So, back to our question of why we only get 25 results: they are just the first page, holding a partial list of results!  
The API even tells us so: every page includes a **meta section** with the following information:

<div align="center" style="margin-bottom:30px; box-shadow: rgba(50, 50, 93, 0.25) 0px 2px 5px -1px, rgba(0, 0, 0, 0.3) 0px 1px 3px -1px;">
    <img src="../../resources/img/notebooks/meta-object.png" alt="meta object" width="50%" height="50%">
</div>
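
The meta section already tells us whether more pages follow. Below is a minimal sketch, assuming the meta section of a basic-paging response carries integer `count`, `page` and `per_page` fields; `results_after_page` is a hypothetical helper name, not part of any OpenAlex library.

```python
def results_after_page(meta: dict) -> int:
    """Count the results that sit on pages after the current one.

    Assumes `meta` is the 'meta' section of an OpenAlex list response
    fetched with basic paging, with integer 'count', 'page' and 'per_page'.
    """
    # results covered by pages 1..page
    seen_so_far = meta["page"] * meta["per_page"]
    # never report a negative number on the last page
    return max(meta["count"] - seen_so_far, 0)
```

A positive value means there are more pages to fetch, e.g. `results_after_page(requests.get(url, timeout=30).json()["meta"])`. With cursor paging, `page` may be null in the meta section, so this check only applies to basic paging.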


To get the complete list, we need to "_leaf through_" all the pages. But how do we do that?  
The OpenAlex API offers two techniques: **_🔢 basic paging_** and **_↪️ cursor paging_**. Let's get to know them!

<div style="background:#e7edf7; border-left:solid 2px blue; padding-left:10px">
    <b> 💡 Use the Polite Pool</b><br>
While it is always a good idea to use the <a href="https://docs.openalex.org/how-to-use-the-api/rate-limits-and-authentication#the-polite-pool">polite pool</a>, this is especially true for paging. The polite pool has faster and more consistent response times, and because paging sends many requests in a row, these gains add up and speed up your application!
</div>
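
As the linked polite-pool documentation describes, a request joins the polite pool when it includes an email address, for example as a `mailto` parameter. The sketch below appends one to any OpenAlex URL; `with_mailto` is a hypothetical helper and `you@example.com` is a placeholder for your own address.

```python
from urllib.parse import quote


def with_mailto(url: str, email: str) -> str:
    """Append a `mailto` parameter so the request identifies its sender.

    `email` is a placeholder; replace it with your own address.
    """
    # start a query string if the URL has none yet, otherwise extend it
    separator = "&" if "?" in url else "?"
    # keep '@' readable; other special characters are percent-encoded
    return f"{url}{separator}mailto={quote(email, safe='@')}"
```

For example, `with_mailto('https://api.openalex.org/works?filter=author.id:A5048491430', 'you@example.com')` keeps the existing filter and adds `&mailto=you@example.com`, so every page request in a paging loop can go through the polite pool.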

<hr>

## 🔢 Basic paging
[Basic paging](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging#basic-paging) is the simplest form of paging and works like this: 
* All pages are numbered **from 1 to n**  
*We can determine n by dividing meta's `count` by `per_page` and rounding the result up to the next integer.* 
* To request a specific page, we add the **`page` parameter** to the URL with the page number as its value,  
e.g. to request page 2, we use <i>ht<span>tps://</span>api.openalex.org/works?filter=author.id:A5048491430<b>&page=2</b></i>
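
Both rules above can be computed directly from the meta values. In this sketch, `number_of_pages` and `page_urls` are hypothetical helper names, and `base_url` is assumed to already contain a query string (such as the `filter` above), so `page` is appended with `&`.

```python
import math


def number_of_pages(count: int, per_page: int) -> int:
    """n = count / per_page, rounded up to the next integer."""
    return math.ceil(count / per_page)


def page_urls(base_url: str, count: int, per_page: int) -> list[str]:
    """Build the URL of every page from 1 to n by appending the `page` parameter."""
    n = number_of_pages(count, per_page)
    return [f"{base_url}&page={page}" for page in range(1, n + 1)]
```

For a list of 60 results at 25 per page, `number_of_pages(60, 25)` gives 3, since the third page holds the remaining 10 results.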

### Within limits
<div style="background:#e7edf7; border-left:solid 2px yellow; padding-left:10px;">
    ⚠️  While basic paging is easy to use, it only works for the <b>first 10,000 results</b> of any list.<br>
    If we want to see more than 10,000 results, we'll need to use cursor paging.
</div>
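
The 10,000-result limit translates into a last reachable page that depends on `per_page`. The sketch below applies the same rule as the stopping check in the example that follows (`per_page * page <= 10000`); `last_reachable_page` is a hypothetical helper name.

```python
import math

BASIC_PAGING_LIMIT = 10_000  # results reachable with basic paging, per the note above


def last_reachable_page(count: int, per_page: int, limit: int = BASIC_PAGING_LIMIT) -> int:
    """Highest page number basic paging can serve for a list of `count` results."""
    total_pages = math.ceil(count / per_page)
    # a page is reachable only if per_page * page stays within the limit
    return min(total_pages, limit // per_page)
```

With the default 25 results per page, basic paging ends at page 400 (10,000 / 25); if `per_page` is raised (the OpenAlex docs allow up to 200), it ends at page 50. Anything beyond that needs cursor paging.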

### Example
Let's look at an example in which we retrieve the complete list of publications by an author and print their OpenAlex IDs.  
Given the author's OpenAlex ID `A5048491430`, the URL is: https://api.openalex.org/works?filter=author.id:A5048491430.

To loop through all pages, we start with `page=1` and repeat the following steps:
* request the specified page by adding the `page` parameter to the URL
* print the OpenAlex IDs of all publications on this page, five per line
* increase `page` by 1

until *either* there are no more results on the requested page *or* the next request would exceed 10,000 results.

```python
import requests

# url with a placeholder for the page number
example_url_with_page = 'https://api.openalex.org/works?filter=author.id:A5048491430&page={}'

page = 1
has_more_pages = True
fewer_than_10k_results = True

# loop through pages
while has_more_pages and fewer_than_10k_results:

    # set page value and request the page from OpenAlex
    url = example_url_with_page.format(page)
    print('\n' + url)
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop early on HTTP errors instead of failing on .json()
    page_with_results = response.json()

    # loop through the partial list of results, printing five IDs per line
    results = page_with_results['results']
    for i, work in enumerate(results):
        openalex_id = work['id'].replace("https://openalex.org/", "")
        print(openalex_id, end='\t' if (i + 1) % 5 != 0 else '\n')

    # next page
    page += 1

    # end loop when either there are no more results on the requested page
    # or the next request would exceed 10,000 results
    per_page = page_with_results['meta']['per_page']
    has_more_pages = len(results) == per_page
    fewer_than_10k_results = per_page * page <= 10000
```


```
https://api.openalex.org/works?filter=author.id:A5048491430&page=1
W2046766973	W2741809807	W2045657963	W1572136682	W2066415719
W2170531319	W1963524534	W1553564559	W2003014790	W2051771537
W1987881751	W2980172586	W2095083909	W1528782725	W2102613218
W4235038322	W2014140050	W2109312864	W3071882161	W1501540670
W2103827239	W4229010617	W2133737815	W4366077396	W2171848392

https://api.openalex.org/works?filter=author.id:A5048491430&page=2
W4245410681	W3021154342	W2017292130	W2168771768	W4213202391
W4236031980	W1945323029	W2103382090	W2105695765	W3084168212
W4211010643	W1934573562	W2050143895	W2065622609	W2110180658
W2941875476	W4237216357	W4242907897	W4244183537	W4247478427
W4287670050	W104609242	W1972136887	W2005148091	W2010883332

https://api.openalex.org/works?filter=author.id:A5048491430&page=3
W2108112433	W2154768595	W2255028491	W2284153834	W2307679124
W2398849157	W2402184614	W2414739039	W2613086963	W2727815292
W2740744046	W2949915600	W2951362513	W2979437137	W3084303366
W3168937413	W3206
```
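
The loop above can also be packaged as a reusable generator. This is a sketch, not part of the original notebook: `iter_basic_paging` and its `fetch_page` argument are hypothetical names. `fetch_page` is any function that takes a page number and returns the decoded JSON of that page, which keeps the paging logic separate from the HTTP call.

```python
from typing import Callable, Iterator


def iter_basic_paging(
    fetch_page: Callable[[int], dict], max_results: int = 10_000
) -> Iterator[dict]:
    """Yield every result from pages 1, 2, 3, ... using basic paging.

    `max_results` mirrors the 10,000-result limit of basic paging.
    """
    page = 1
    while True:
        page_with_results = fetch_page(page)
        results = page_with_results["results"]
        per_page = page_with_results["meta"]["per_page"]
        yield from results
        # a page with fewer than per_page results is the last one
        if len(results) < per_page:
            return
        # the next page would reach past the basic-paging limit
        if per_page * (page + 1) > max_results:
            return
        page += 1
```

With `requests`, `fetch_page` could be `lambda page: requests.get(example_url_with_page.format(page), timeout=30).json()`; iterating over `iter_basic_paging(fetch_page)` then walks the same pages as the loop above, and the caller decides what to do with each work.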

<hr>

## ↪️ Cursor paging
[Cursor paging](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging#cursor-paging) is a bit more complicated than basic paging, but it allows us to access as many records as we like. 


To use cursor paging:
* We add the **`cursor` parameter** with the start value `*` to our first query,  
e.g. <i>ht<span>tps://</span>api.openalex.org/works?filter=author.id:A5048491430<b>&cursor=*</b></i>

* The response to this query now includes a `next_cursor` value in its `meta` section.  
To retrieve the next page, we **copy `meta.next_cursor`** into the `cursor` parameter of our URL.

* To get all the results, we keep repeating the second step until `meta.next_cursor` is null.

<div align="center" style="margin-bottom:30px; box-shadow: rgba(50, 50, 93, 0.25) 0px 2px 5px -1px, rgba(0, 0, 0, 0.3) 0px 1px 3px -1px;">
    <img src="../../resources/img/notebooks/cursor-paging.png" alt="cursor paging">
</div>
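
The cursor values in the example output below look like base64 strings ending in `=`, and base64 may also contain `+` and `/`. The example pastes the cursor into the URL as-is, which works for the values shown, but an unencoded `+` in a query string is commonly decoded as a space. A safer sketch, assuming the cursor is an opaque string that only needs percent-encoding, builds the query with Python's standard `urllib.parse.urlencode`; `cursor_page_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/works"


def cursor_page_url(filter_value: str, cursor: str, base_url: str = BASE_URL) -> str:
    """Build the URL for one cursor page, percent-encoding every parameter value."""
    # urlencode escapes characters such as ':', '*', '+', '/' and '='
    return f"{base_url}?{urlencode({'filter': filter_value, 'cursor': cursor})}"
```

The first call uses `cursor_page_url('author.id:A5048491430', '*')`, and every further call passes the previous `meta.next_cursor`. Passing a `params` dict to `requests.get` applies the same kind of encoding automatically.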

### With great power comes great responsibility
Cursor paging is very powerful, since there is no limit on the number of pages you can request. Please use it responsibly!
<div style="background:#e7edf7; border-left:solid 2px red; padding-left:10px">
    🚫 <b>Don't use cursor paging to download a very large or even the whole dataset</b>
    <ul>
        <li>It's bad for you because it will take many days to page through a long list like '/works' or '/authors'.</li>
        <li>It's bad for the OpenAlex API (and other users!) because it puts a massive load on their servers.</li>
    </ul>

 Instead, download everything at once using the <a href="https://docs.openalex.org/download-all-data/openalex-snapshot">data snapshot</a>. It's free, easy and fast, and you get all the results in the same format you'd get from the API.
</div>

### Example
Let's look at the same example as before: retrieving the complete list of publications by an author and printing their OpenAlex IDs.

To loop through all pages, we start with `cursor=*` and repeat the following steps:
* request the specified page by adding the `cursor` parameter to the URL
* print the OpenAlex IDs of all publications on this page, five per line
* set `cursor` to `meta.next_cursor`

until `meta.next_cursor` is null and the list of results is empty.

```python
import requests

# url with a placeholder for the cursor
example_url_with_cursor = 'https://api.openalex.org/works?filter=author.id:A5048491430&cursor={}'

cursor = '*'

# loop through pages until meta.next_cursor is null (None in Python)
while cursor:

    # set cursor value and request the page from OpenAlex
    url = example_url_with_cursor.format(cursor)
    print("\n" + url)
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop early on HTTP errors instead of failing on .json()
    page_with_results = response.json()

    # loop through the partial list of results, printing five IDs per line
    results = page_with_results['results']
    for i, work in enumerate(results):
        openalex_id = work['id'].replace("https://openalex.org/", "")
        print(openalex_id, end='\t' if (i + 1) % 5 != 0 else '\n')

    # update cursor to meta.next_cursor
    cursor = page_with_results['meta']['next_cursor']
```


```
https://api.openalex.org/works?filter=author.id:A5048491430&cursor=*
W2046766973	W2741809807	W2045657963	W1572136682	W2066415719
W2170531319	W1963524534	W1553564559	W2003014790	W2051771537
W1987881751	W2980172586	W2095083909	W1528782725	W2102613218
W4235038322	W2014140050	W2109312864	W3071882161	W1501540670
W2103827239	W4229010617	W2133737815	W4366077396	W2171848392

https://api.openalex.org/works?filter=author.id:A5048491430&cursor=Ils3LCAnaHR0cHM6Ly9vcGVuYWxleC5vcmcvVzIxNzE4NDgzOTInXSI=
W4245410681	W3021154342	W2017292130	W2168771768	W4213202391
W4236031980	W1945323029	W2103382090	W2105695765	W3084168212
W4211010643	W1934573562	W2050143895	W2065622609	W2110180658
W2941875476	W4237216357	W4242907897	W4244183537	W4247478427
W4287670050	W104609242	W1972136887	W2005148091	W2010883332

https://api.openalex.org/works?filter=author.id:A5048491430&cursor=IlswLCAnaHR0cHM6Ly9vcGVuYWxleC5vcmcvVzIwMTA4ODMzMzInXSI=
W2108112433	W2154768595	W2255028491	W2284153834	W2307679124
W2398849157	W24021846
```
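
As with basic paging, the cursor loop can be wrapped in a generator. This is a sketch, not the notebook's own code: `iter_cursor_paging` and its `fetch_page` argument are hypothetical names, and `fetch_page` is assumed to take a cursor value and return the decoded JSON of that page.

```python
from typing import Callable, Iterator, Optional


def iter_cursor_paging(fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every result by following meta.next_cursor, starting from '*'."""
    cursor: Optional[str] = "*"
    while cursor:
        page_with_results = fetch_page(cursor)
        yield from page_with_results["results"]
        # JSON null arrives as None, which ends the loop
        cursor = page_with_results["meta"]["next_cursor"]
```

Unlike the basic-paging version, this generator never needs `count` or `per_page`: it simply stops when `next_cursor` comes back null, which is why cursor paging is not bound by the 10,000-result limit.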

<hr>

What we covered in this notebook is quite technical and may be a lot for beginners to take in, so
please don't worry if you need to reread it or look for additional explanations. 
The main concepts to take away are that
* the OpenAlex API distributes result lists into smaller chunks called pages,
* and thus, to retrieve a complete result list, we have to "leaf" through these pages, either manually or programmatically.

Happy paging! 😎