### Getting data with APIs

The client refers to our application that will access some user data. We should save the Client ID and Secret somewhere as they will be useful later.

In [2]:
import json
# Load credentials
with open('client-credentials.json') as file:
    client_credentials = json.load(file)

print('Credentials:', list(client_credentials.keys())) # ['client_id', 'client_secret']

Credentials: ['client_id', 'client_secret']


The json.load() function returns the JSON content as a Python dictionary, which we can use to access the client information.

In [2]:
print(client_credentials['client_id']) # Client ID

76830
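
If a key is missing from the credentials file, the error only shows up later, when the authorization request fails. The sketch below checks the file as soon as it is loaded. The load_credentials() name and the REQUIRED_KEYS set are hypothetical helpers introduced for illustration; they are not part of any library.

```python
import json

# Keys that the rest of the notebook expects to find in the file
REQUIRED_KEYS = {'client_id', 'client_secret'}

def load_credentials(path):
    """Load the client credentials and fail early if a key is missing."""
    with open(path) as file:
        credentials = json.load(file)
    missing = REQUIRED_KEYS - set(credentials)
    if missing:
        raise KeyError(f'Missing keys in {path}: {sorted(missing)}')
    return credentials
```

It could then replace the loading cell with client_credentials = load_credentials('client-credentials.json').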


The Strava API is made for applications that need to access user data from Strava. In this scenario, the app (client) has to be authorized by the users to access and manage their Strava data. 

In our case, we analyze our own data, but the process is the same - we need to explicitly authorize our app to access and manage our data. Let’s see how to do this.

Authorization is done in the browser via a URL that links to the website Authorization Service. For Strava, it is: https://www.strava.com/oauth/authorize

If you copy/paste this link in a browser, you should get the following message. The formatting depends on the browser, but the content should be the same.

{
  "message": "Bad Request",
  "errors": [
    {
      "resource": "Application",
      "field": "client_id",
      "code": "invalid"
    }
  ]
}

This message (in JSON format) simply says that our authorization request is invalid because it doesn’t contain a valid client_id. We need to provide a few additional parameters to the Strava Authorization Service. This is done via GET parameters, which are a way to pass (key, value) pairs of information via URLs

GET syntax - passing parameters via a URL
url?key1=value1&key2=value2&key3=value3


According to the Strava Authentication API reference, we need to provide the following parameters

* client_id which identifies our application
* scope which defines the requested scope/rights for our application e.g. read user profile, read user activities
* redirect_uri which is the URL that Strava redirects the user to after the authorization
* response_type which must be set to 'code' to explicitly ask Strava to return an Access Code for this user’s data

Let’s store all the required GET parameters in a Python dictionary. We get the client_id from the JSON file, set the scope to get full read access to the user data and use https://localhost as the redirect URI.

In [3]:
oauth_params = {
    'client_id': client_credentials['client_id'],
    'scope': 'read_all,profile:read_all,activity:read_all',
    'redirect_uri': 'https://localhost',
    'response_type': 'code'
}

Let’s now create the authorization link with the GET syntax that we saw above. In practice, URLs follow strict rules (for instance, characters such as : and / must be escaped inside parameter values), so we typically use a function to build them. In Python, we can use urlencode() from the urllib.parse module of the standard library

In [4]:
from urllib.parse import urlencode

# Generate link that users can copy/paste in their browser to authorize our app
print('https://www.strava.com/oauth/authorize' + '?' + urlencode(oauth_params))

https://www.strava.com/oauth/authorize?client_id=76830&scope=read_all%2Cprofile%3Aread_all%2Cactivity%3Aread_all&redirect_uri=https%3A%2F%2Flocalhost&response_type=code
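
To see what urlencode() does to the special characters, here is a small round trip on two of the parameters from above, using only the standard library. The reduced params dictionary is just an illustration, not a complete authorization request.

```python
from urllib.parse import urlencode, parse_qs

# Two parameters whose values contain characters that must be escaped
params = {
    'scope': 'read_all,profile:read_all',
    'redirect_uri': 'https://localhost',
}

# Encode: ',' becomes %2C, ':' becomes %3A and '/' becomes %2F
query = urlencode(params)
print(query)

# Decode: parse_qs() reverses the escaping and returns one list per key
decoded = parse_qs(query)
print(decoded)
```

The decoded values are identical to the original ones, which is why we can later read the parameters that Strava sends back with the same parse_qs() function.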


After authorizing the application, Strava redirects the user to the app redirect URI i.e. localhost in our case. Since our computer doesn’t host any web server, you should see an empty page saying that the site can’t be reached. This isn’t an issue in our case! We can manually copy the access code from the URL

https://localhost/?
  state=&
  code=...&   <--- Here is our access code
  scope=read,activity:read_all,profile:read_all,read_all

As we can see, Strava returned three GET parameters

* code which is the Access Code needed to retrieve the user data
* scope which lists the scopes accepted by the user
* state which is an optional flag from the Strava API used for security reasons - not important in our case since we’re not really building an app but just doing some data analysis
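
For a real application, the state parameter is typically a random value that the app sends with the authorization request and compares with the value returned in the callback URL. The sketch below illustrates the idea with the standard library only. The new_state() and state_matches() names are hypothetical, and we don’t need this check for our data analysis.

```python
import hmac
import secrets
from urllib.parse import urlparse, parse_qs

def new_state():
    """Generate a random, URL-safe value to send as the state parameter."""
    return secrets.token_urlsafe(16)

def state_matches(callback_url, expected_state):
    """Check that the state in the callback URL is the one we sent."""
    params = parse_qs(urlparse(callback_url).query)
    # parse_qs() drops empty values, so a blank state falls back to ''
    returned = params.get('state', [''])[0]
    # compare_digest() compares the two values in constant time
    return hmac.compare_digest(returned.encode(), expected_state.encode())
```

The value returned by new_state() would be added to the GET parameters as 'state' when building the authorization link.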

The important parameter here is the user Access Code. Let’s extract it from the URL.

Again, the access code is private and shouldn’t be shared via our notebook. One solution is to paste the URL into a hidden prompt and extract the code from it automatically. First, let’s load the URL into memory with the getpass() function from the getpass module of the Python standard library

In [None]:
from getpass import getpass

# After authorizing the app, user is redirected to
authorization_response = getpass(prompt='Full callback URL')

You should get a prompt when running this code - copy/paste the https://localhost/?state=&code=...&scope=... URL that you got from the Strava Authorization Service from above. Python will store it in the authorization_response variable.

Let’s now extract the code part from it

In [None]:
from urllib.parse import urlparse, parse_qs

# Extract Authorization Code from URL
authorization_code = parse_qs(urlparse(authorization_response).query)['code'][0]

In this code, we use urlparse from the Python urllib module to parse the URL into a ParseResult object from which we get the state=&code=...&scope=... part of the URL with its .query attribute.
You can always verify that it worked by printing the intermediate results in a new cell - note: as the code should be kept private, make sure to then remove the print statement and output from your notebook!

In [None]:
urlparse(authorization_response).query # state=&code=...&scope=...

We then extract the GET parameters using the parse_qs() function which returns a Python dictionary with one entry per result. The ['code'][0] simply says: get the first (and only) value of the code GET variable from the parsed parameters.
Again, don’t hesitate to print intermediate results to better understand the role of each function in the code from above

In [None]:
parse_qs(urlparse(authorization_response).query)
# Returns
# {'code': ['...'],
#  'scope': ['read,activity:read_all,profile:read_all,read_all']}

If you print the authorization_code, you should see that it correctly extracted the code

In [None]:
print(authorization_code)
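
The one-liner from above raises a KeyError when the pasted URL has no code parameter, which isn’t very informative. Below is a slightly more defensive sketch. The extract_authorization_code() name is hypothetical, and the error parameter is an assumption based on the OAuth 2.0 specification, which uses it to report a denied request; check the Strava documentation for the exact behavior.

```python
from urllib.parse import urlparse, parse_qs

def extract_authorization_code(callback_url):
    """Return the code parameter from the callback URL or raise ValueError."""
    params = parse_qs(urlparse(callback_url).query)
    # Assumption: a refused authorization is reported via an 'error' parameter
    if 'error' in params:
        raise ValueError(f"Authorization failed: {params['error'][0]}")
    if 'code' not in params:
        raise ValueError('No code parameter found in the callback URL')
    return params['code'][0]
```

We would call it with authorization_code = extract_authorization_code(authorization_response).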

### Get access token

Now that our app has been authorized to access data from the user via the Access Code, we need to retrieve an Access Token from Strava to actually perform API queries and access user resources

Why do we need an Access Token? Isn’t the Access Code sufficient? This actually depends on the API. Strava implements the OAuth 2.0 protocol where the app first needs to get some Access Code before getting the final Access Token that can be used to retrieve data. However, you might work later with other APIs that directly provide the Access Token. For instance, this was the case with the Strava API before October 2018 when they adopted the OAuth 2.0 standard.

Let’s get back to our Strava API example. According to its authentication documentation, we can get the access token by making a POST request to https://www.strava.com/oauth/token

You can think of GET requests as the addresses that are in the browser URL field at the top of the window. They are used to get content from the web but are not necessarily meant to send data to a web service besides short GET variables as we saw above. On the other hand, POST requests are used to send/post data to a web service, data that will typically be stored by it. In our API scenario which exposes entries of user data, we will typically use

* GET queries to list the entries
* POST queries to create new entries
* PUT queries to update existing entries
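
To make the difference between GET and POST more concrete, the sketch below builds one request of each type without sending it. It uses urllib.request from the standard library rather than the Requests library, and the https://example.com/entries URL is a placeholder, not a real API.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {'key1': 'value1', 'key2': 'value2'}

# GET: the parameters travel in the URL itself
get_request = Request('https://example.com/entries?' + urlencode(fields))

# POST: the same fields travel in the request body, the URL stays clean
post_request = Request(
    'https://example.com/entries',
    data=urlencode(fields).encode('ascii'),
    method='POST'
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Nothing is sent over the network here: the Request objects only describe what would be sent.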

We can use the Python Requests library to perform those three types of queries via its .get(), .post() and .put() functions. Let’s see how to do the POST one to get the access token

In [16]:
import requests

# Exchange Authorization Code for Access Token
r = requests.post('https://www.strava.com/oauth/token', data={
    'client_id': client_credentials['client_id'],
    'client_secret': client_credentials['client_secret'],
    'code': authorization_code,
    'grant_type': 'authorization_code'
})
r.status_code # 200

200

The function takes two arguments, the URL and the data that let us send the fields defined in the Strava API token exchange guide

* The client_id and client_secret which identify our application
* The access code, passed in the code field
* A grant_type field which states that we “post” the authorization code

The function returns an object of type Response with the response from the Strava service. The first thing to do is to verify that the request was successful, which we can do by printing the status_code attribute. In our case, we get 200, which means that the request was successful. In general, 2xx codes are for successful requests whereas 4xx ones are returned when something is wrong with the request e.g. the famous 404 “Not Found” one - see List of HTTP status codes for the complete list of codes. If you get 400 here, the access code is most likely invalid, for instance because it has expired or was already exchanged; in that case, authorize the app again to get a fresh code.
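
As a quick illustration of these status code families, the sketch below uses the HTTPStatus enumeration from the standard library to print the standard phrase and family of a few codes. The describe_status() helper is hypothetical, and HTTPStatus() raises a ValueError for codes that it doesn’t know.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the code with its standard phrase and its family."""
    status = HTTPStatus(code)
    if 200 <= code < 300:
        family = 'success'
    elif 400 <= code < 500:
        family = 'client error'
    elif 500 <= code < 600:
        family = 'server error'
    else:
        family = 'other'
    return f'{code} {status.phrase} ({family})'

for code in (200, 400, 404):
    print(describe_status(code))
```

With a Response object, we would call describe_status(r.status_code).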

The Response object stores the result of the request in its .text attribute. In our case, it should contain our Access Token - note: again, the token should be kept private, make sure to then remove the print statement and output from your notebook!

In [None]:
print(r.text) # ex. {"token_type":"Bearer","expires_at": ... }

As we can see, the token is actually a JSON object of (key, value) pairs which is common practice for HTTP API responses. We can use the .json() method from the Response object to parse the result into a Python dictionary

In [None]:
r.json()

# Outputs:
# {'token_type': 'Bearer',
#  'expires_at': 1558..,
#  'expires_in': 21600,
#  'refresh_token': '6001..',
#  'access_token': '8bea..',
#  'athlete': {'id': ..,
#   ...
#   'follower': ..}}

This token contains all the information we need to retrieve the data from the Strava API

* The access_token - we will use it in all of our API requests
* expires_at and expires_in which specify when the token expires
* A refresh_token to get a new Access Token when this one expires
* Strava also sends some basic information about the user in an athlete field

Let’s store the token as we will need it later when interacting with the Strava API. One solution is to store it in a .json file that we can later reload in our code and notebooks. We can do this via a token_saver() function that takes the JSON object, creates a token.json file and saves it with the json.dump() function

In [None]:
# Token saver
def token_saver(token_obj):
    with open('token.json', 'w') as file:
        json.dump(token_obj, file, indent=4)

token_saver(r.json())

You can verify that the token is saved by simply opening the token.json file ex. directly in JupyterLab or any other text editor.

### Refreshing tokens

Before we start making requests to the Strava API, let’s quickly discuss how access tokens work and when they should be refreshed.

First, let’s start by reloading the token - as this is a frequent operation, we can also write a small function for this

In [None]:
# Token loader
def get_token():
    with open('token.json', 'r') as file:
        return json.load(file)

token = get_token()
token.keys() # 'token_type', 'expires_at', 'expires_in', 'refresh_token', 'access_token', 'athlete'

We saw above that the token has two fields that specify when the token expires

In [None]:
print('Expires in:', token['expires_in']) # initially: 21600 (6 hours)
print('Expires at:', token['expires_at']) # in seconds

Those are both in seconds

* expires_in is initially 21600 because the access token lives for 6 hours
* expires_at is when the token expires

The second field is a Unix timestamp, which is defined as the number of seconds since Jan 1, 1970 (UTC). To get the corresponding date, we can use functions from the datetime module of the Python standard library

In [None]:
from datetime import datetime, timedelta

print('Expires at:', datetime.fromtimestamp(token['expires_at'])) # date, time
print('Expires in:', timedelta(seconds=token['expires_in'])) # time delta

When the token expires, we should reach the Strava token API and ask for a new one by simply issuing a new POST request with our refresh_token

In [None]:
# Refresh expired Access Tokens
r = requests.post('https://www.strava.com/oauth/token', data={
    'client_id': client_credentials['client_id'],
    'client_secret': client_credentials['client_secret'],
    'refresh_token': token['refresh_token'],
    'grant_type': 'refresh_token'
})
token_saver(r.json())
token = get_token()

Note that this time, the grant_type is refresh_token instead of authorization_code. We should now have a new token if the last one expired or a copy of the current one if it’s still valid.
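
Rather than refreshing the token on every run, we could first check whether it has expired by comparing expires_at with the current time. Here is a minimal sketch of such a check. The is_expired() name and the 60 second safety margin are assumptions chosen for illustration.

```python
import time

def is_expired(token, margin=60, now=None):
    """Return True if the token expires within `margin` seconds."""
    # `now` can be passed explicitly, which makes the function easy to test
    now = time.time() if now is None else now
    return now >= token['expires_at'] - margin
```

The refresh request from above would then only run when is_expired(token) returns True.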

### Reach API using Python Requests

Finally, let’s get some data from the API. Strava exposes several endpoints such as

* /activities to add, retrieve and update the athlete activities
* /clubs to list the athlete clubs
* /routes, /segments and so on

APIs usually provide an API Reference Documentation that lists the different endpoints and explains how they work. For Strava, you can take a look at developers.strava.com/docs/reference. For this example, we will simply list some of the athlete activities with the /athlete/activities endpoint.

As you can see in the related documentation entry List Athlete Activities, this endpoint accepts GET requests and has a few optional parameters ex. to specify the date interval or the desired number of results. For this example, we will use the defaults and only specify the Access Token


In [None]:
# List activities
r = requests.get('https://www.strava.com/api/v3/athlete/activities', params={
    'access_token': token['access_token']
})
r.status_code # 200

As you can see, the syntax is similar to the one from the code with the post request from above. We specify the URL of the API endpoint, and pass the parameters as a Python dictionary in a params argument. Again, we check that the request was successful by printing the Response status code.
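
The optional parameters can be added to the same dictionary. The sketch below builds it from Python dates, assuming that the endpoint accepts before and after as Unix timestamps and page and per_page for pagination; check the List Athlete Activities entry for the exact names and limits. The activity_params() helper is hypothetical.

```python
from datetime import datetime, timezone

def activity_params(access_token, after=None, before=None, page=None, per_page=None):
    """Build the GET parameters, leaving out the options that are not set."""
    params = {'access_token': access_token}
    # Dates are converted to Unix timestamps (whole seconds)
    if after is not None:
        params['after'] = int(after.timestamp())
    if before is not None:
        params['before'] = int(before.timestamp())
    if page is not None:
        params['page'] = page
    if per_page is not None:
        params['per_page'] = per_page
    return params

# Activities since Jan 1, 2021 (UTC), 50 results per page
params = activity_params(
    'ACCESS_TOKEN',  # placeholder, use token['access_token'] in practice
    after=datetime(2021, 1, 1, tzinfo=timezone.utc),
    per_page=50
)
print(params)
```

The resulting dictionary can be passed as the params argument of requests.get().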

Finally, let’s save the result in an activities.json file. Again, we use the .json() method as the API returns a JSON response

In [None]:
# Save activities
with open('activities.json', 'w') as file:
    json.dump(r.json(), file, indent=4)

You should be able to see the different activities saved in your Strava account by opening the file in a text editor.

Since the results are stored on disk, we can reuse them later in our analyses. For instance, let’s load them into a Pandas DataFrame

In [None]:
from io import StringIO
import pandas as pd

# Load data into DataFrame (recent pandas versions expect a file-like object rather than a raw JSON string)
activities_df = pd.read_json(StringIO(r.text))
activities_df[['name', 'type', 'distance', 'elapsed_time', 'max_speed']]

The result depends on the activities logged in the Strava account linked to our Access Token - which, in our case, only has two activities!

### With requests_oauthlib

In the example from above, we saw how to reach the Strava API by making all the required requests manually with the Python Requests .get() and .post() methods which should give you a good overview of interacting with a web service in general. However, we mentioned above that Strava implements OAuth 2.0 which is a very common protocol for APIs, and we can easily find Python libraries that simplify a bit the entire process. Let’s quickly see how using them would change our code!

For this example, we will use requests-oauthlib which is a popular Python library (> 1,000 stars on GitHub) that combines the OAuthlib library with Python Requests i.e. the library that we used above to manually perform GET/POST requests. Let’s first install it in the course environment. Open a new terminal window, activate the environment and install the library with

# Activate environment
conda activate adsml

# Install the "requests-oauthlib" Conda package
conda install requests-oauthlib

The first step is to create an OAuth2Session object with the client information, redirect URI and requested scope


In [7]:
from requests_oauthlib import OAuth2Session

# Create a session for initialization
init_session = OAuth2Session(
    client_credentials['client_id'],
    redirect_uri='https://localhost',
    scope='read_all,profile:read_all,activity:read_all'
)

We can now automatically generate the authorization link with

In [8]:
# Get authorization link
user_link, state = init_session.authorization_url('https://www.strava.com/oauth/authorize')
print('Visit link:', user_link)

Visit link: https://www.strava.com/oauth/authorize?response_type=code&client_id=76830&redirect_uri=https%3A%2F%2Flocalhost&scope=read_all%2Cprofile%3Aread_all%2Cactivity%3Aread_all&state=PKdO6YqNnnB14gBgDJWY3MWJYvm82W


As before, open the link, authorize the app and paste the full callback URL into a getpass() prompt so that the code doesn’t appear in the notebook

In [None]:
from getpass import getpass

# After authorizing the app, user is redirected to
authorization_response = getpass(prompt='Full callback URL')

We can now use our init_session object from above to retrieve the Access Token with its fetch_token() method

In [None]:
# Get Access Token
token = init_session.fetch_token(
    'https://www.strava.com/oauth/token',
    authorization_response=authorization_response,
    include_client_id=True,
    client_secret=client_credentials['client_secret']
)

Again, let’s save the token with our function from above. If you then open the token.json file, you should see that it’s similar to the one that we got above

In [None]:
token_saver(token)

To interact with the API, we need to create a new OAuth2Session object with our client_id and token

In [None]:
# Create a session for reaching the API
api_session = OAuth2Session(
    client_credentials['client_id'],
    token=token, # pass Access Token
    
    # Automatically refresh expired token
    auto_refresh_url='https://www.strava.com/oauth/token',
    auto_refresh_kwargs={
        'client_id': client_credentials['client_id'],
        'client_secret': client_credentials['client_secret']
    },
    token_updater=token_saver # automatically saves new tokens
)

The object also provides a way to automatically refresh expired tokens with a few additional parameters

* auto_refresh_url the URL of the token service
* auto_refresh_kwargs to pass additional values to the token service when needed e.g. Strava requires the client_id and client_secret in our case
* token_updater a function to automatically save the new token

We can now use this new api_session object to make requests

In [None]:
# List activities
r = api_session.get('https://www.strava.com/api/v3/athlete/activities')
r.status_code # 200

Creating the DataFrame from the Response object directly should give us the same result

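In [None]:
from io import StringIO
import pandas as pd

# Load data into DataFrame (recent pandas versions expect a file-like object rather than a raw JSON string)
activities_df = pd.read_json(StringIO(r.text))
activities_df[['name', 'type', 'distance', 'elapsed_time', 'max_speed']]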