# Obtaining Data with APIs

Author: Mike Wood

By the end of this notebook, you should be able to:
1. Access the Google API
2. Access metadata from a YouTube video
3. Obtain information about messages in your Gmail inbox

In this lesson, we will explore how to obtain data using APIs with Python. We will use the Google API as an example for getting started with APIs.

```{note}
The examples in this notebook require credentials specific to my Google account. As I don't want to put this information online, I have output the results of my API requests into a pickle file provided with this notebook. I will pre-load the results of the requests from this pickle file in the following code block.
```
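For context, a single pickle file can hold several objects written one after another, and each `pickle.load` call returns the next object in the order it was written. The sketch below illustrates this with a temporary file and made-up dictionaries; the file name and contents are placeholders, not the real `api_requests.pkl`.

```python
import os
import pickle
import tempfile

# made-up stand-ins for API responses (not the real contents of api_requests.pkl)
first = {"kind": "example#one", "items": [1, 2, 3]}
second = {"kind": "example#two", "items": []}

# hypothetical file name inside a throwaway directory
path = os.path.join(tempfile.mkdtemp(), "example_requests.pkl")

# write the objects one after another into the same file
with open(path, "wb") as f:
    pickle.dump(first, f)
    pickle.dump(second, f)

# read them back: each load returns the next object in write order
with open(path, "rb") as f:
    loaded_first = pickle.load(f)
    loaded_second = pickle.load(f)

print(loaded_first["kind"], loaded_second["kind"])
```

This is why the loading cell calls `pickle.load` once per stored response.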

In [1]:
# here, I will load the pre-staged API requests
import pickle
with open('api_requests.pkl','rb') as prestaged_requests:
    response1 = pickle.load(prestaged_requests)
    response2 = pickle.load(prestaged_requests)
    response3 = pickle.load(prestaged_requests)
    response4 = pickle.load(prestaged_requests)
    response5 = pickle.load(prestaged_requests)

**New conda packages**

To use this notebook, we will need to install the following conda packages:
```
conda install google-api-python-client
conda install google-auth-oauthlib
conda install google-auth-httplib2
```
After these packages are installed, import them below:

In [2]:
import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors

## Google APIs
Google has many different APIs that can be used to access data in applications hosted by Google. To begin, we will create a **New Project** in the [Google Cloud Console](https://console.cloud.google.com/).

After creating a project, we will navigate to the **Credentials** tab under **APIs and Services**. In this tab, choose **+ Credentials** and create an **API Key**. 

After you have created your key, enter it below:

In [3]:
# define your API key here
API_KEY = ''

Now that the API Key is generated and stored here, navigate to the API **Library** and scroll down to the **YouTube Data API v3**. Click on this API and choose to **Enable** the API. With this API enabled, we can now access data from YouTube using Google tools.
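If you would rather not paste the key into the notebook, one option is to read it from an environment variable. This is only a sketch: the variable name `GOOGLE_API_KEY` and the helper `load_api_key` are assumptions made for illustration, and the demonstration uses a stand-in dictionary instead of the real environment.

```python
import os

def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or empty."""
    # fall back to the real environment when no mapping is passed in
    env = os.environ if env is None else env
    key = env.get(name, "")
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key

# demonstration with a stand-in mapping (hypothetical value, not a real key)
fake_env = {"GOOGLE_API_KEY": "not-a-real-key"}
API_KEY_EXAMPLE = load_api_key(fake_env)
print("key loaded:", bool(API_KEY_EXAMPLE))
```

In a real session, calling `load_api_key()` with no arguments would read from `os.environ`.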

## The YouTube API
The YouTube API has a variety of functions for accessing information about YouTube, including videos, channels, and users, as well as functionality to post information to YouTube. For reference, all of the available API functions are documented on the following page: https://developers.google.com/youtube/v3/docs/?apix=true

We will explore a few of these functions in this notebook.

### Accessing data about a single video
Videos on YouTube are public, which means we can access their metadata via the API with only an API key, without logging in to a user account. For example, we might want to know what the statistics are for the Khan Academy video that serves as an introduction to Python: https://www.youtube.com/watch?v=husPzLE6sZc&list=PLJR1V_NHIKrCkswPMULzQFHpYa57ZFGbs
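The API identifies a video by its id, which is the value of the `v` query parameter in the URL. As a small sketch, the id can be pulled out of the link with the standard library rather than copied by hand:

```python
from urllib.parse import urlparse, parse_qs

url = ("https://www.youtube.com/watch"
       "?v=husPzLE6sZc&list=PLJR1V_NHIKrCkswPMULzQFHpYa57ZFGbs")

# split the query string into a dict of lists, e.g. {'v': [...], 'list': [...]}
query = parse_qs(urlparse(url).query)

# the video id is the first (and only) value of the 'v' parameter
video_id = query["v"][0]
print(video_id)  # husPzLE6sZc
```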

First, we define the API service and version for the YouTube API:

In [4]:
# define the api service and version
api_service_name = "youtube"
api_version = "v3"

Next, we make a request to the API for information about the video:

```
# access the youtube api client
youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey = API_KEY)

# make a request to the api for information about the video above
# note: the id of the video is the part after v= ....
request = youtube.videos().list(part="snippet,statistics",
                                id='husPzLE6sZc')
```

```
# execute the query in the request
response1 = request.execute()
```

The response from the query is a set of nested dictionaries with several entries for the metadata of the video:

In [5]:
# check the keys of the dictionary
print('keys:')
print(response1.keys())

# the items list provides the most pertinent metadata components
print("\nresponse['items']")
print(response1['items'])

# for example,
# the title of the video is in the 'snippet' dictionary
print("\ntitle:")
title = response1['items'][0]['snippet']['title']
print(title)

# the number of views is in the 'statistics' dictionary
print("\nviewCount:")
viewCount = response1['items'][0]['statistics']['viewCount']
print(viewCount)

keys:
dict_keys(['kind', 'etag', 'items', 'pageInfo'])

response['items']
[{'kind': 'youtube#video', 'etag': '-m6myrYh-VSpVqRf0PAx_UgDin8', 'id': 'husPzLE6sZc', 'snippet': {'publishedAt': '2011-06-30T15:50:46Z', 'channelId': 'UC4a-Gbdw7vOaccHmFo40b9g', 'title': 'Introduction to Programs Data Types and Variables', 'description': 'Writing a basic program.  Basics of data types, variables and conditional statements', 'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/husPzLE6sZc/default.jpg', 'width': 120, 'height': 90}, 'medium': {'url': 'https://i.ytimg.com/vi/husPzLE6sZc/mqdefault.jpg', 'width': 320, 'height': 180}, 'high': {'url': 'https://i.ytimg.com/vi/husPzLE6sZc/hqdefault.jpg', 'width': 480, 'height': 360}, 'standard': {'url': 'https://i.ytimg.com/vi/husPzLE6sZc/sddefault.jpg', 'width': 640, 'height': 480}, 'maxres': {'url': 'https://i.ytimg.com/vi/husPzLE6sZc/maxresdefault.jpg', 'width': 1280, 'height': 720}}, 'channelTitle': 'Khan Academy', 'categoryId': '27', 'liveBroadc

### &#x1F914; Mini-Exercise
Goal: Find the number of likes for the video linked above.

#### &#x1F4A1; Solution

In [6]:
# the number of likes is in the 'statistics' dictionary
likeCount = response1['items'][0]['statistics']['likeCount']
print('Likes:',likeCount)

Likes: 3697


### Accessing data about a Channel
As you likely know, a channel typically has a library of different videos. We can also access information about the channel using the API. To look up a channel, we need its `channelId`. We can obtain this from the response to a request on an individual video. Equivalently, we can search the HTML page source of the page itself (search for `channelId` in the page source).
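As a sketch of the first approach, the `channelId` sits in the `snippet` dictionary of a video response. The dictionary below is a hand-written, trimmed-down copy of the video response shown earlier, not a live API result:

```python
# trimmed-down stand-in for the videos().list response shown earlier
video_response = {
    "items": [
        {
            "id": "husPzLE6sZc",
            "snippet": {
                "channelId": "UC4a-Gbdw7vOaccHmFo40b9g",
                "channelTitle": "Khan Academy",
            },
        }
    ]
}

# the channel id is nested under items -> first item -> snippet
channel_id_from_video = video_response["items"][0]["snippet"]["channelId"]
print(channel_id_from_video)  # UC4a-Gbdw7vOaccHmFo40b9g
```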

In [7]:
# define the channelID
channelId = 'UC4a-Gbdw7vOaccHmFo40b9g'

```
# make a request to the api for information about the channel above
request = youtube.channels().list(part="snippet,statistics",
                                  id=channelId)

# execute the query in the request
response2 = request.execute()
```

Similar to above, the response yields a nested set of dictionaries with metadata about the channel. 

In [8]:
# check the keys of the dictionary
print('keys:')
print(response2.keys())

# for example,
# the description of the channel is in the 'snippet' dictionary
print('\ndescription:')
description = response2['items'][0]['snippet']['description']
print(description)

# the number of views is in the 'statistics' dictionary
print('\nviewCount:')
viewCount = response2['items'][0]['statistics']['viewCount']
print(viewCount)

keys:
dict_keys(['kind', 'etag', 'pageInfo', 'items'])

description:
Khan Academy is a nonprofit providing a free, world-class education for anyone, anywhere. Our interactive practice problems, articles, and videos help students succeed in math, biology, chemistry, physics, history, economics, finance, grammar, and many other topics.

Khan Academy provides teachers with data on how their students are doing so they can identify gaps in learning and provide tailored instruction. We  also offer free personalized SAT and LSAT practice in partnership with the College Board and the Law School Admission Council. 

Our resources have been translated into dozens of languages, and 15 million people around the globe learn on Khan Academy each month.

Want to extend your learning beyond YouTube? Practice what you've just discovered for free: www.khanacademy.org


viewCount:
2155567083


### &#x1F914; Mini-Exercise
Goal: Get the number of subscribers for the Khan Academy YouTube channel.

#### &#x1F4A1; Solution

In [9]:
# the number of subscribers is in the 'statistics' dictionary
subscriberCount = response2['items'][0]['statistics']['subscriberCount']
print('subscriberCount:',subscriberCount)

subscriberCount: 8830000
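One detail worth keeping in mind: the count fields in `statistics` typically arrive as strings rather than numbers, so they need converting before any arithmetic. A minimal sketch, using a hand-written dictionary that copies the two values printed above (not a live response):

```python
# stand-in for response2['items'][0]['statistics'], with values as strings
channel_statistics = {
    "viewCount": "2155567083",
    "subscriberCount": "8830000",
}

# convert every count to an integer
counts = {name: int(value) for name, value in channel_statistics.items()}

# with integers, ratios and comparisons behave as expected
views_per_subscriber = counts["viewCount"] / counts["subscriberCount"]
print(round(views_per_subscriber, 1))
```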


### The Cost of API inquiries
Each time you make a request to the Google API, it comes with a "cost". The `list` methods for videos and channels each cost 1 unit. By contrast, a generic search of all of YouTube costs 100 units. Each day, the YouTube API allows you 10,000 units for free. If you'd like to make more requests than that, you can pay Google for additional quota. This pay-for-access model for programmatic use is the underlying business model for almost all APIs on popular sites.
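To make the arithmetic concrete, here is a rough budgeting sketch using the unit costs quoted above. The planned mix of calls is made up for illustration:

```python
DAILY_QUOTA = 10_000                 # free units per day
COST = {"list": 1, "search": 100}    # units per call, as quoted above

def units_used(calls):
    """Total units for a dict mapping call type -> number of calls."""
    return sum(COST[kind] * n for kind, n in calls.items())

# hypothetical plan: 250 list calls and 40 searches in one day
plan = {"list": 250, "search": 40}
used = units_used(plan)              # 250*1 + 40*100
remaining = DAILY_QUOTA - used

print(used, remaining)               # 4250 5750
print(DAILY_QUOTA // COST["search"]) # 100 searches would use the whole quota
```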

## Obtaining info from Gmail
Our Gmail accounts, by contrast with public YouTube videos, require our credentials to access. To obtain information from Gmail, we first need to obtain an OAuth 2.0 Client ID. 

### Setting up OAuth 2.0 Client IDs

Navigate to the **APIs and Services** dropdown and select the **Library** tab (again). Now, select the **Gmail API** and choose to **Enable** it. The top bar will indicate that you may need to authenticate your credentials. Click on **Create Credentials** and use the following options:
- select the option for **User Data** in Credential Type and click Next
- name your App something like `cs122` and choose your email address in the OAuth Screen
- don't add any scopes in the Scopes screen
- choose Desktop App in the OAuth client ID screen

When you have completed the steps above, you will be able to download a secret key for your credentials. Download the JSON file for your credentials and store it in your current directory (where this notebook is located). You may choose to rename your key as just `client_secret.json`. Now, we are almost ready to access Gmail with our credentials - we just need to make sure our account is recognized as a test user for our App.

Under the **APIs and Services** tab, navigate to the **OAuth consent screen** and then to the **Audience** tab. Add your email under the **Test Users** section. Now, we're ready to go!

Begin by defining the api information, similar to what was done for YouTube above, along with information about the client file and the scopes:

In [10]:
# define the api service and version
api_service_name = "gmail"
api_version = "v1"

# define the client file and scopes
client_file = 'client_secret.json'
scopes = ['https://mail.google.com/']

The code block below will set up a service - the first time you run it, it will ask you to trust the application.

```
# authenticate the credentials
flow = google_auth_oauthlib.flow.InstalledAppFlow.from_client_secrets_file(client_file, scopes)
credentials = flow.run_local_server()
```

```
# define the API service
service = googleapiclient.discovery.build(api_service_name, api_version, credentials=credentials)
```

### Reading Gmail labels from your account

Now that the service is constructed with your credentials, you can access the information in your inbox. Similar to the YouTube API, the Gmail API has good documentation on the underlying methods here: https://developers.google.com/gmail/api/reference/rest

For example, what if we would like to get the "labels" on the messages in our inbox? We can get these using the Gmail API methods:

```
# access the labels using the users() and labels() methods
response3 = service.users().labels().list(userId='me').execute()
```

In [11]:
# store the labels in a list
labels = response3.get('labels', [])

Now that we have obtained our labels, we can access them here:

In [12]:
# print the labels of your inbox
for label in labels:
    print(label['name'])

CHAT
SENT
INBOX
IMPORTANT
TRASH
DRAFT
SPAM
CATEGORY_FORUMS
CATEGORY_UPDATES
CATEGORY_PERSONAL
CATEGORY_PROMOTIONS
CATEGORY_SOCIAL
STARRED
UNREAD
300534061046890
300534060831190
300534064500960
300534062573580
300534061884700
300534060836350
300534060832230
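The list mixes Gmail's built-in labels with labels created in the account. Each label dictionary also carries a `type` field (`system` or `user`) that can be used to separate the two. The sketch below uses a short hand-written list in the same shape; the id `Label_1` is a hypothetical placeholder:

```python
# hand-written sample in the shape of response3['labels']
sample_labels = [
    {"id": "INBOX", "name": "INBOX", "type": "system"},
    {"id": "UNREAD", "name": "UNREAD", "type": "system"},
    {"id": "Label_1", "name": "300534061046890", "type": "user"},
]

# split the label names by their 'type' field
system_names = [lab["name"] for lab in sample_labels if lab.get("type") == "system"]
user_names = [lab["name"] for lab in sample_labels if lab.get("type") == "user"]

print("system:", system_names)
print("user:", user_names)
```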


This is a simple example, but you now have full control over your email with the Gmail API. For example, what if you wanted to access messages from a particular user?

### Reading messages from your inbox
From our inbox, we could search for messages that come from a common source. For this example, I am using a sample Gmail account for a data stream that receives many messages from the address `sbdservice@sbd.iridium.com`. Let's grab the first "page" of these messages:

```
# search for a list of messages from sbdservice@sbd.iridium.com
response4 = service.users().messages().list(
    userId='me',
    q='sbdservice@sbd.iridium.com').execute()

```

In [13]:
# store the response into a messages response object
messages = response4.get('messages')
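The request above returns only one page of results. When more results exist, the response includes a `nextPageToken` that can be passed back as `pageToken` on the next request. The sketch below shows that loop in a generic form: `collect_messages` and `fake_pages` are hypothetical names, and the API is replaced by a stand-in dictionary so the example runs without credentials. With the real service, `fetch_page` could wrap a call such as `service.users().messages().list(userId='me', q=..., pageToken=token).execute()`.

```python
def collect_messages(fetch_page, max_pages=10):
    """Gather message stubs across pages.

    fetch_page(token) must return a dict shaped like a messages.list
    response: an optional 'messages' list and an optional 'nextPageToken'.
    """
    collected = []
    token = None  # None requests the first page
    for _ in range(max_pages):
        page = fetch_page(token)
        collected.extend(page.get("messages", []))
        token = page.get("nextPageToken")
        if not token:
            break  # no further pages
    return collected

# stand-in for the API: two fake pages keyed by page token
fake_pages = {
    None: {"messages": [{"id": "a1"}, {"id": "a2"}], "nextPageToken": "page2"},
    "page2": {"messages": [{"id": "b1"}]},
}

all_messages = collect_messages(lambda token: fake_pages[token])
print([m["id"] for m in all_messages])  # ['a1', 'a2', 'b1']
```

The `max_pages` cap is a design choice to avoid spending quota on an unexpectedly large result set.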

Now, we have a subset of our messages. Next, we can read in the information about the most recent message:

In [14]:
# get the first message in the list
first_message = messages[0]

```
# get the details of the message
response5 = service.users().messages().get(
    userId='me',
    id=first_message['id']
).execute()
```

In [15]:
# print the "snippet" (a short preview of the message text)
print(response5['snippet'])

MOMSN: 6602 MTMSN: 0 Time of Session (UTC): Thu Mar 13 03:16:11 2025 Session Status: 00 - Transfer OK Message Size (bytes): 49 Unit Location: Lat = 69.40212 Long = -55.43375 CEPradius = 134
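Once the text is in Python, it can be parsed like any other string. As a sketch, the snippet printed above can be searched with a regular expression to pull out the reported latitude and longitude. The pattern is an assumption based on this one message and may need adjusting for others:

```python
import re

# the snippet text printed above
snippet = ("MOMSN: 6602 MTMSN: 0 Time of Session (UTC): Thu Mar 13 03:16:11 2025 "
           "Session Status: 00 - Transfer OK Message Size (bytes): 49 "
           "Unit Location: Lat = 69.40212 Long = -55.43375 CEPradius = 134")

# capture the two signed decimal numbers after 'Lat =' and 'Long ='
match = re.search(r"Lat = (-?\d+\.\d+) Long = (-?\d+\.\d+)", snippet)

if match:
    latitude = float(match.group(1))
    longitude = float(match.group(2))
    print(latitude, longitude)  # 69.40212 -55.43375
```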


### &#x1F914; Mini-Exercise
Goal: Test this on your own. What is one email address that sends you emails regularly? Does the message you print out match the one you can find in your inbox?

### Modifying messages in your inbox
As mentioned above, you now have full control of your email (from Python!). You could read, draft, and send messages. For now, I will demonstrate how to change a message from "Unread" to "Read" and vice versa.

#### Changing a Message to Unread

```
# add an unread label to the message
service.users().messages().modify(userId='me', id=first_message['id'],
                                    body={'addLabelIds': ['UNREAD']}).execute()
```

#### Changing a Message to Read

```
# change the message back to read
service.users().messages().modify(userId='me', id=first_message['id'],
                                    body={'removeLabelIds': ['UNREAD']}).execute();
```

## A Gmail API Pipedream
Feeling inspired? Consider this pipedream project I have to save myself time answering emails:

Create a Python program to do the following:
1. Use the Google API to check for new emails.
2. Download the contents of the emails to my local computer.
3. Use ChatGPT via the ChatGPT API to automatically draft a response to the emails in my inbox.
4. Use the Google API to stage my ChatGPT-drafted emails in my Drafts folder.
5. When I have time for emails, I review the drafts before hitting send.
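As a starting point for step 4, here is a sketch of how a draft could be packaged. The Gmail API expects the message as a base64url-encoded string under `message.raw`. The helper name `build_draft_body`, the address, and the text are all placeholders. The resulting dictionary is what would be passed as `body` to `service.users().drafts().create(userId='me', body=...)`, a call that is not executed here.

```python
import base64
from email.message import EmailMessage

def build_draft_body(to_address, subject, text):
    """Package a plain-text email in the shape the drafts endpoint expects."""
    message = EmailMessage()
    message["To"] = to_address
    message["Subject"] = subject
    message.set_content(text)
    # encode the full message (headers + body) as URL-safe base64 text
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")
    return {"message": {"raw": raw}}

# placeholder recipient and reply text
draft_body = build_draft_body("someone@example.com", "Re: hello",
                              "Thanks for your note.")

# decode again just to confirm what was packaged
decoded = base64.urlsafe_b64decode(draft_body["message"]["raw"]).decode("utf-8")
print(decoded)
```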