In [1]:
%pip install ipython python-dotenv requests

Note: you may need to restart the kernel to use updated packages.


In [4]:
from IPython.display import Image
from dotenv import load_dotenv
import urllib.parse
import requests
import json
import csv
import os

## API Tutorial
This document has two aims:

- (1) To walk you through registering and using an API application from Web of Science / Clarivate.

- (2) To show how to use a script, written by the author, that efficiently retrieves results from multiple pages.

### Create an Account or Log In
The first step is to go to https://developer.clarivate.com/ and register or log in.

![Step 1](./tutorial_screenshots/Screenshot%202025-09-03%20at%203.40.48 PM.png)

### Register the Application Step 1
Next, register for an API application. Click "Register".

![Step 2](./tutorial_screenshots/Screenshot%202025-09-03%20at%203.41.46 PM.png)

### Register a New Application Step 2
Then click on "Register a new Application" ...

![Step 3](./tutorial_screenshots/Screenshot%202025-09-03%20at%203.43.05 PM.png)

... and fill out the form.

![Step 4](./tutorial_screenshots/Screenshot%202025-09-03%20at%203.43.45 PM.png)

### Approval
Once the registration is approved you can click on the Name link of your application as seen below.

![Step 5](./tutorial_screenshots/Screenshot%202025-09-03%20at%203.44.04 PM.png)

Then click on the name of the API.

![Step 6](./tutorial_screenshots/Screenshot%202025-09-03%20at%203.44.37 PM.png)

This takes you to a page where you can click the "Try It" button to open the interactive UI for your registered application.

![Step 7](./tutorial_screenshots/Screenshot%202025-09-03%20at%203.45.19 PM.png)

The rest is to play around and learn! This interface lets you use the API without code. It also shows all the advanced search options and lets you run a test search, which displays the equivalent curl command and example output.

However, it is not the most efficient way to use the API. The notebook below will help you retrieve more results faster. It is written in Python and uses the requests library in place of curl, which keeps everything in one language. Just remember not to exceed the limit of your subscription!

### Script Step 1
This is where you enter your settings, such as the API key, database, and search phrase. The script reads them from a .env file or from environment variables. All the abbreviations are listed and explained in the UI for your individual API; please refer to it.
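Because several of these settings have no default value, it can help to check for missing ones before sending any request. The sketch below is a hypothetical helper (`missing_settings` is not part of the original script). Which names count as required is an assumption based on the `os.environ.get(...)` calls in this notebook.

```python
import os

# Setting names as read by the notebook. Treating these four as required
# is an assumption made for this sketch.
REQUIRED_SETTINGS = ("API_KEY", "database_code", "base_url", "search_phrase")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required setting names that are absent or blank in env."""
    return [name for name in required if not (env.get(name) or "").strip()]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```

Running it after `load_dotenv()` would report any name that is missing from both the .env file and the environment.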

In [5]:
# Load environment variables from .env file
load_dotenv()
# Use the environment variable, but fall back to a placeholder if it's not set.
# Replace 'YOUR_API_KEY' with your actual key if you are not using a .env file.
API_KEY = os.environ.get('API_KEY', 'YOUR_API_KEY')
database_code = os.environ.get('database_code')
page_limit = int(os.environ.get('page_limit', 50))
page_max = int(os.environ.get('page_max', 5))
sort_field = os.environ.get('sort_field')
ascending = os.environ.get('ascending')
base_url = os.environ.get('base_url')
search_phrase = os.environ.get('search_phrase')

print(search_phrase)

"climate change" OR "global warming" AND sustainability


In [6]:
# This is a definition used to outline a repeatable task.
# This function creates an encoded query string for a search phrase.
def create_encoded_query(phrase):
    """Encodes a string for use in a URL."""
    return urllib.parse.quote(phrase)
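To make the effect of this helper concrete, here is a small sketch of what percent-encoding does to a query. The query text is only an example. Note that requests encodes the values passed through `params` on its own, so a phrase that is encoded first and then passed through `params` would be encoded twice.

```python
import urllib.parse


def create_encoded_query(phrase):
    """Percent-encode a string for use in a URL (same body as the cell above)."""
    return urllib.parse.quote(phrase)


query = 'TS=("climate change")'
encoded = create_encoded_query(query)
print(encoded)  # TS%3D%28%22climate%20change%22%29

# Encoding a second time corrupts the query: every '%' becomes '%25'.
double_encoded = create_encoded_query(encoded)
print(double_encoded)

# Decoding once restores the original text.
print(urllib.parse.unquote(encoded) == query)  # True
```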


### Script Step 2
This step builds the query parameters that are added to the base URL. Together they form the full request that is sent to the API.

In [7]:
# Edit this line to change the search field tag (TS is used here).
search_phrase_with_tag = f'TS=({search_phrase})'

params = {
    'db': database_code,
    'q': search_phrase_with_tag,
    'limit': page_limit
}

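# Only add a sort order when sort_field is set; otherwise the value would be 'None A' or 'None D'.
if sort_field:
    if ascending == 'true':
        params['sortField'] = f'{sort_field} A' # Note: API docs use a space not a + like in the UI.
    elif ascending == 'false':
        params['sortField'] = f'{sort_field} D'
    else:
        params['sortField'] = sort_field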
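To preview roughly what the finished request URL looks like, the parameters can be encoded with the standard library. This is only a sketch: the base URL and database code below are placeholders, not real values, and the exact string requests sends may differ in small details.

```python
from urllib.parse import urlencode

# Placeholder values for illustration only; use the base URL and codes
# shown in the UI for your own API.
example_base_url = "https://example.invalid/documents"
example_params = {
    "db": "EXAMPLE_DB",
    "q": 'TS=("climate change" AND sustainability)',
    "limit": 50,
    "page": 1,
}

# urlencode joins the pairs with '&' and escapes characters that are
# not allowed in a query string (spaces become '+').
query_string = urlencode(example_params)
print(f"{example_base_url}?{query_string}")
```

Printing the URL before running the search makes it easier to compare your parameters with the curl command shown in the "Try It" UI.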


In [8]:
headers = {
    'accept': 'application/json',
    'X-ApiKey': API_KEY
}

all_results = []


### Script Step 3
This section runs the actual search, retrieving results from pages 1 to X, where X is the page_max value you defined earlier (page_limit sets the number of results per page).

In [11]:
if not base_url:
    print("Error: 'base_url' is not set. Please check your .env file or environment variables.")
else:
    for i in range(1, page_max + 1):
        # Set the page for the current iteration
        params['page'] = i

        print(f"Requesting page {i}...")

        # Make the API call
        response = requests.get(base_url, headers=headers, params=params)

        # Check if the request was successful
        if response.status_code == 200:
            data = response.json()
            # The list of documents is in the 'hits' key of the JSON response
            page_hits = data.get('hits', [])
            if page_hits:
                # Add the results from this page to our master list
                all_results.extend(page_hits)
                print(f"  ... Success! Added {len(page_hits)} results.")
            else:
                # Stop if a page returns no results
                print("  ... No more results found. Stopping.")
                break
        else:
            print(f"  ... Error on page {i}: Status code {response.status_code}")
            print(f"  ... Response: {response.text}")
            # Stop the loop if an error occurs
            break

Requesting page 1...
  ... Success! Added 50 results.
Requesting page 2...
  ... Success! Added 50 results.
Requesting page 3...
  ... Success! Added 50 results.
Requesting page 4...
  ... Success! Added 50 results.
Requesting page 5...
  ... Success! Added 50 results.
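The paging logic above can also be written as a reusable function that is easy to try out without spending any API quota. The sketch below is one possible arrangement, not the script's own code: `collect_pages`, `fetch_page`, and `pause_seconds` are hypothetical names, and the fake data stands in for real API responses.

```python
import time


def collect_pages(fetch_page, page_max, pause_seconds=0.0):
    """Call fetch_page(1), fetch_page(2), ... and gather the hits.

    Stops after page_max pages or at the first page with no hits.
    """
    results = []
    for page in range(1, page_max + 1):
        hits = fetch_page(page)
        if not hits:
            break
        results.extend(hits)
        if pause_seconds:
            # Optional pause between requests to stay within a rate limit.
            time.sleep(pause_seconds)
    return results


# Stand-in for the API: three pages of two records each, then nothing.
fake_pages = {
    1: [{"uid": "a"}, {"uid": "b"}],
    2: [{"uid": "c"}, {"uid": "d"}],
    3: [{"uid": "e"}, {"uid": "f"}],
}


def fake_fetch(page):
    """Return the fake hits for a page, or an empty list past the end."""
    return fake_pages.get(page, [])


records = collect_pages(fake_fetch, page_max=5)
print(len(records))  # 6
```

With the notebook's variables, `fetch_page` could be a small function that calls `requests.get(base_url, headers=headers, params={**params, 'page': page})` and returns `response.json().get('hits', [])`. A suitable `pause_seconds` depends on your subscription.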


### Script Step 4
Finally, the list of JSON records is exported to CSV, a more human-readable format, for downstream use.

In [12]:
# Define the name of the output CSV file
csv_file_name = 'wos_results.csv'

# Check if we have any results to write
if all_results:
    # Open the file in write mode
    with open(csv_file_name, 'w', newline='', encoding='utf-8') as csvfile:
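        # The 'all_results' list contains dictionaries. Collect every key that
        # appears in any record so rows with extra fields do not raise an error.
        # (Named 'fieldnames' to avoid overwriting the request 'headers' dict.)
        fieldnames = list(dict.fromkeys(key for row in all_results for key in row))
        
        # Create a DictWriter object which maps dictionaries to CSV rows
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)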
        
        # Write the header row
        writer.writeheader()
        
        # Write all the dictionary rows from our results list
        writer.writerows(all_results)
        
    print(f"\nSuccessfully saved {len(all_results)} results to {csv_file_name}")
else:
    print("\nNo results to save to CSV.")


Successfully saved 250 results to wos_results.csv
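If the records contain nested dictionaries or lists, `csv.DictWriter` writes them as their Python string form, which is awkward to read back later. One option is to flatten each record before writing. The sketch below assumes nested values can occur; the sample record and its field names are made up for illustration, and `flatten_record` is a hypothetical helper.

```python
import json


def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into one level; lists are stored as JSON text."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing keys with the parent name.
            flat.update(flatten_record(value, full_key, sep))
        elif isinstance(value, list):
            # Keep lists in a single cell as JSON so they can be parsed later.
            flat[full_key] = json.dumps(value, ensure_ascii=False)
        else:
            flat[full_key] = value
    return flat


# Made-up record for illustration; real hits may use different field names.
sample = {
    "uid": "X1",
    "source": {"title": "Example Journal", "year": 2024},
    "keywords": ["a", "b"],
}
flat_sample = flatten_record(sample)
print(flat_sample)
```

Applying `flatten_record` to every item in `all_results` before building the column list would give one column per nested field, such as `source.title`.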
