<a href="https://colab.research.google.com/github/SusannYY/Unemployment-Rate-Analysis/blob/main/Week4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview

In this week's lab, we are going to:
1. Explore visualization methods using the Seaborn package, which often appears as a preferred skill in industry job postings.

   Seaborn is built on top of Matplotlib and adds a higher-level interface with more polished default styles and more convenient plot customization.

   You can check out the [Seaborn](https://seaborn.pydata.org/index.html) tutorials for more information and more advanced usage!
2. Build a basic API call and grab a dataset from the internet. This may be a bit tricky, but you don't need to understand all of the code in order to extract the unemployment data you will use for the final project. The API call in this lab is inspired by this [source](https://github.com/Strata-Scratch/api-youtube), where you can also find more advanced information about how to upload your data to a database.

Now let's start by importing the libraries.

In [None]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Brief Seaborn Intro
![image.png](https://raw.githubusercontent.com/mwaskom/seaborn/master/doc/_static/logo-wide-lightbg.svg)

Check out this Colab notebook: https://colab.research.google.com/drive/102qTiiOajayVIx0H9Ty5CnVU6z3diMmF?usp=sharing

# Basic API Call

## Why collect data from an API?

You might be wondering why we don't just use a CSV file or pull data from a database. There are two reasons why you should learn APIs:

1. Calling an API is a very common, professional way of collecting data in industry, so if you ever work as a data scientist, you'll be expected to know how to do it.

2. It is more involved than reading a file or querying a database, because you build the requests and parse the responses yourself. That makes it a useful skill to show your colleagues and hiring manager.

## What is an API?

![image.png](https://qph.cf2.quoracdn.net/main-qimg-864170b7cea6bd09ec3c876a77227117.webp) \
![image.png](https://cdn.sanity.io/images/oaglaatp/production/6fdc068d96ae659dac4248b1336fa9ce79b90cef-850x254.png?w=675&h=202&fit=crop)

## Some practice with the YouTube API

Sources: https://github.com/Strata-Scratch/api-youtube \
https://www.kdnuggets.com/2021/09/python-apis-data-science-project.html

First, we need to import the [requests](https://realpython.com/python-requests/) package. \
The requests library allows us to make API calls. \
You can use it to make a request to any API, so the techniques covered here stay the same whichever API you want to grab data from. \
We also import Pandas, because we're going to save our data into a Pandas DataFrame, and the time library, which we use to pause between requests.

In [None]:
#import libraries
import requests
import pandas as pd
import time

The next step is to get an API key. We're going to grab data from the YouTube API, specifically data about one particular channel. \
You can create your own API key by following this guide: https://www.slickremix.com/docs/get-api-key-for-youtube/. \
The channel ID below stays fixed; it points to the one channel we use throughout this lab.

In [None]:
#keys
API_KEY = "AIzaSyBURo-IKzHYogbQDFSuEVaUqNUVWi1m7s0"
CHANNEL_ID = "UCW8Ews7tdKKkBT6GdtQaXvQ"
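A key written directly into a shared notebook is visible to anyone who has the link. As a hedged alternative, here is a minimal sketch that reads the key from an environment variable instead. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are made up for this example and are not part of the lab code.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption for this sketch; use whatever
    name you set in your own environment.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# Usage (after setting the variable in your environment):
# API_KEY = load_api_key()
```

With this pattern the notebook itself never contains the key, so it is safe to share or commit.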

Let's quickly test out an API call. Using the requests library, you can make a call just by passing the URL of the API to the `get()` method. \
For this test, the data is located at api.github.com. We pass that URL to `get()` and then call the `json()` method, which parses the JSON in the response into a Python object (here, a dictionary).

In [None]:
#make API call
response = requests.get('https://api.github.com').json()

What does the JSON we get from the GitHub API look like? \
We can simply display `response` using the command below.

In [None]:
response

{'current_user_url': 'https://api.github.com/user',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': '

This is all the information we get from this GitHub API endpoint. \
**Now think about this** \
If we want one specific piece of information, say the current user's URL, what should we do? \
Hint: how do you look up a value in a Python dictionary?

In [None]:
response["current_user_url"]

'https://api.github.com/user'
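This lookup works because `.json()` gives us an ordinary Python dictionary. Below is a small sketch of the two common ways to read a key, using a two-entry subset copied by hand from the response above. The variable name `sample` is only for illustration.

```python
# Two entries copied from the GitHub response shown above
sample = {
    "current_user_url": "https://api.github.com/user",
    "emojis_url": "https://api.github.com/emojis",
}

# Square brackets raise a KeyError if the key is missing
current_user_url = sample["current_user_url"]

# .get() returns a default (None unless you pass one) instead of raising
missing = sample.get("not_a_real_key")
fallback = sample.get("not_a_real_key", "n/a")

print(current_user_url)
print(missing, fallback)
```

Square brackets are fine when you know the key exists; `.get()` is handy when a field may be absent from some responses.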

Congrats! You have now successfully tested an API call. \
Now let's go back to our YouTube data!

We start with a root URL, which is the location of our data:

In [None]:
url = "https://www.googleapis.com/youtube/v3/"

We still need to add an endpoint and parameters to our URL: \
We are going to make a "search" and include several parameters such as `part`, `channelId`, and the API key. \
**We're performing a "search" through the YouTube API. Everything to the right of the '?' is a parameter we add to request specific information.** \
So the URL becomes the following:



```
url = "https://www.googleapis.com/youtube/v3/" + "search?key=" + API_KEY + "&channelId=" + CHANNEL_ID + "&part=snippet,id&order=date&maxResults=10000" + pageToken
```
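String concatenation works, but it is easy to drop an `&` or an `=` along the way. As an alternative sketch, the query string can be built from a dictionary with the standard library's `urllib.parse.urlencode`. The key below is a placeholder, `build_search_url` is a helper name made up for this example, and `maxResults` is left out to keep the sketch short.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/youtube/v3/"

def build_search_url(api_key, channel_id, page_token=""):
    """Return the search URL for one page of a channel's videos."""
    params = {
        "key": api_key,
        "channelId": channel_id,
        "part": "snippet,id",
        "order": "date",
    }
    if page_token:  # only add pageToken when asking for a later page
        params["pageToken"] = page_token
    # safe="," keeps the comma in "snippet,id" readable instead of %2C
    return BASE_URL + "search?" + urlencode(params, safe=",")

print(build_search_url("YOUR_API_KEY", "UCW8Ews7tdKKkBT6GdtQaXvQ"))
```

The requests library can also do this step for you: `requests.get(url, params=params)` accepts the same kind of dictionary and appends it to the URL.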



Next, make the API call just like in the GitHub example:

In [None]:
pageToken = ""
url = "https://www.googleapis.com/youtube/v3/" + "search?key=" + API_KEY + "&channelId=" + CHANNEL_ID + "&part=snippet,id&order=date&maxResults=10000" + pageToken
response = requests.get(url).json()

In [None]:
response

{'kind': 'youtube#searchListResponse',
 'etag': 'Hi9BWR5gFwFd2318fgp4uTUUzt8',
 'nextPageToken': 'CDIQAA',
 'regionCode': 'US',
 'pageInfo': {'totalResults': 102, 'resultsPerPage': 50},
 'items': [{'kind': 'youtube#searchResult',
   'etag': 'EL-P3ta_7_QAPU8B1o6oMcFaKJA',
   'id': {'kind': 'youtube#video', 'videoId': '5Lpbw71xR3o'},
   'snippet': {'publishedAt': '2022-09-15T16:00:05Z',
    'channelId': 'UCW8Ews7tdKKkBT6GdtQaXvQ',
    'title': 'Framework to Solve Noom Advanced SQL Interview Question',
    'description': 'In this video, we look closely at a real-life data-science interview question from the Noom company. We will go through the ...',
    'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/default.jpg',
      'width': 120,
      'height': 90},
     'medium': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/mqdefault.jpg',
      'width': 320,
      'height': 180},
     'high': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/hqdefault.jpg',
      'width': 480,
      

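👆 As you can see, the response variable again holds a JSON object, just as in the GitHub example. It contains all the properties for `id` and `snippet`. Note that `resultsPerPage` is 50 while `totalResults` is 102: the API returns results one page at a time, and `nextPageToken` is what we use to ask for the next page.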

## Now let's query the data:

Try this: \
How do you make sense of this data?

In [None]:
response['items']


[{'kind': 'youtube#searchResult',
  'etag': 'EL-P3ta_7_QAPU8B1o6oMcFaKJA',
  'id': {'kind': 'youtube#video', 'videoId': '5Lpbw71xR3o'},
  'snippet': {'publishedAt': '2022-09-15T16:00:05Z',
   'channelId': 'UCW8Ews7tdKKkBT6GdtQaXvQ',
   'title': 'Framework to Solve Noom Advanced SQL Interview Question',
   'description': 'In this video, we look closely at a real-life data-science interview question from the Noom company. We will go through the ...',
   'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/default.jpg',
     'width': 120,
     'height': 90},
    'medium': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/mqdefault.jpg',
     'width': 320,
     'height': 180},
    'high': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/hqdefault.jpg',
     'width': 480,
     'height': 360}},
   'channelTitle': 'StrataScratch',
   'liveBroadcastContent': 'none',
   'publishTime': '2022-09-15T16:00:05Z'}},
 {'kind': 'youtube#searchResult',
  'etag': 'e54ODzWXUbGyqg8Y0HtS_7zz7W8',
  'i

You can see that the output starts with a square bracket: it is a list of the videos on our channel (the first page of results). To isolate one video, we can specify its position in the list.

In [None]:
response['items'][0]


{'kind': 'youtube#searchResult',
 'etag': 'EL-P3ta_7_QAPU8B1o6oMcFaKJA',
 'id': {'kind': 'youtube#video', 'videoId': '5Lpbw71xR3o'},
 'snippet': {'publishedAt': '2022-09-15T16:00:05Z',
  'channelId': 'UCW8Ews7tdKKkBT6GdtQaXvQ',
  'title': 'Framework to Solve Noom Advanced SQL Interview Question',
  'description': 'In this video, we look closely at a real-life data-science interview question from the Noom company. We will go through the ...',
  'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/default.jpg',
    'width': 120,
    'height': 90},
   'medium': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/mqdefault.jpg',
    'width': 320,
    'height': 180},
   'high': {'url': 'https://i.ytimg.com/vi/5Lpbw71xR3o/hqdefault.jpg',
    'width': 480,
    'height': 360}},
  'channelTitle': 'StrataScratch',
  'liveBroadcastContent': 'none',
  'publishTime': '2022-09-15T16:00:05Z'}}

So, through the steps above, we get our latest video (the results are ordered by date).
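Each video is a dictionary with more dictionaries nested inside it, so individual fields are reached by chaining keys. Here is a short sketch using a trimmed copy of the item printed above. The variable name `item` is only for illustration.

```python
# Trimmed copy of response['items'][0] from the output above
item = {
    "id": {"kind": "youtube#video", "videoId": "5Lpbw71xR3o"},
    "snippet": {
        "publishedAt": "2022-09-15T16:00:05Z",
        "title": "Framework to Solve Noom Advanced SQL Interview Question",
    },
}

# Chain the keys from the outer dictionary to the inner one
video_id = item["id"]["videoId"]
title = item["snippet"]["title"]

# Keep only the date part of the ISO timestamp (everything before the "T")
upload_date = item["snippet"]["publishedAt"].split("T")[0]

print(video_id, upload_date, title)
```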

The API call above only lists the videos in our channel, which is general information. \
However, each video also has more detailed information, including view, like, favorite, and comment counts.

So let's run another API call to access these statistics.

In [None]:
url_video_stats = "https://www.googleapis.com/youtube/v3/videos?id="+"5Lpbw71xR3o"+"&part=statistics&key="+API_KEY
response_video_stats = requests.get(url_video_stats).json()

In [None]:
response_video_stats

{'kind': 'youtube#videoListResponse',
 'etag': 'q8Cna_U1qoHSkNcc3tFIcl1XIjw',
 'items': [{'kind': 'youtube#video',
   'etag': 'lpmWVgFqUCs3uLkDeWJ5rR-l2zw',
   'id': '5Lpbw71xR3o',
   'statistics': {'viewCount': '1217',
    'likeCount': '48',
    'favoriteCount': '0',
    'commentCount': '0'}}],
 'pageInfo': {'totalResults': 1, 'resultsPerPage': 1}}
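Notice that every count in `statistics` comes back as a string (`'1217'`, not `1217`), so it needs converting before you can do arithmetic or plot it. Below is a minimal sketch, assuming a field may sometimes be absent from the response (for example, there is no `dislikeCount` in the output above). The helper name `to_int` is made up for this example.

```python
# The 'statistics' dictionary copied from the response above
stats = {
    "viewCount": "1217",
    "likeCount": "48",
    "favoriteCount": "0",
    "commentCount": "0",
}

def to_int(stats, key):
    """Return stats[key] as an int, or None if the key is absent."""
    value = stats.get(key)
    return int(value) if value is not None else None

view_count = to_int(stats, "viewCount")
like_count = to_int(stats, "likeCount")
dislike_count = to_int(stats, "dislikeCount")  # not in the response

print(view_count + like_count, dislike_count)
```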

In [None]:
def get_video_details(video_id):

    #collecting view, like, favorite, comment counts for one video
    url_video_stats = "https://www.googleapis.com/youtube/v3/videos?id="+video_id+"&part=statistics&key="+API_KEY
    response_video_stats = requests.get(url_video_stats).json()

    #the statistics dictionary of the first (and only) item in the response
    stats = response_video_stats['items'][0]['statistics']

    view_count = stats['viewCount']
    like_count = stats['likeCount']
    favorite_Count = stats['favoriteCount']
    comment_count = stats['commentCount']

    return view_count, like_count, favorite_Count, comment_count

In [None]:
def get_videos(df):
    rows = [] #collect one dictionary per video
    pageToken = ""
    while True:
        url = "https://www.googleapis.com/youtube/v3/search?key="+API_KEY+"&channelId="+CHANNEL_ID+"&part=snippet,id&order=date&maxResults=10000&"+pageToken

        response = requests.get(url).json()
        time.sleep(1) #pause for a second between page requests
        for video in response.get('items', []):
            if video['id']['kind'] == "youtube#video":
                video_id = video['id']['videoId']
                video_title = video['snippet']['title']
                video_title = str(video_title).replace("&amp;","") #drop the HTML entity for "&"
                upload_date = video['snippet']['publishedAt']
                upload_date = str(upload_date).split("T")[0] #keep only the date part
                view_count, like_count, favorite_Count, comment_count = get_video_details(video_id)

                rows.append({'video_id':video_id,'video_title':video_title,
                             "upload_date":upload_date,"view_count":view_count,
                             "like_count":like_count,"favorite_Count":favorite_Count,
                             "comment_count":comment_count})

        #a missing nextPageToken means we reached the last page
        next_token = response.get('nextPageToken')
        if next_token is None:
            break
        pageToken = "pageToken=" + next_token

    #DataFrame.append was removed in pandas 2.0, so build the new rows in one step
    new_rows = pd.DataFrame(rows, columns=df.columns)
    return new_rows if df.empty else pd.concat([df, new_rows], ignore_index=True)
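The loop in `get_videos` follows a general pagination pattern: request a page, keep its items, and repeat with `nextPageToken` until the token is missing. Here is a sketch of just that pattern, runnable without an API key. `fetch_all_items` and `FAKE_PAGES` are made up for this example, and the fake pages stand in for real API responses.

```python
def fetch_all_items(fetch_page):
    """Collect items from every page.

    fetch_page(token) must return one page as a dictionary with an
    'items' list and, except on the last page, a 'nextPageToken'.
    """
    items = []
    token = ""
    while True:
        page = fetch_page(token)
        items.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if not token:  # the last page has no nextPageToken
            break
    return items

# Stand-in for the real API: three made-up pages keyed by token
FAKE_PAGES = {
    "": {"items": ["a", "b"], "nextPageToken": "p2"},
    "p2": {"items": ["c"], "nextPageToken": "p3"},
    "p3": {"items": ["d"]},
}

all_items = fetch_all_items(lambda token: FAKE_PAGES[token])
print(all_items)
```

Separating the loop from the request makes the pagination logic easy to test before you spend any API quota.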

In [None]:
#main

#build our dataframe
df2 = pd.DataFrame(columns=["video_id","video_title","upload_date","view_count","like_count","favorite_Count","comment_count"]) 

df_final = get_videos(df2)

In [None]:
df_final

| | video_id | video_title | upload_date | view_count | like_count | favorite_Count | comment_count |
|---|---|---|---|---|---|---|---|
| 0 | 5Lpbw71xR3o | Framework to Solve Noom Advanced SQL Interview... | 2022-09-15 | 1217 | 48 | 0 | 0 |
| 1 | gtHI672Tlbw | DoorDash Medium Level SQL Interview Question | 2022-09-08 | 545 | 23 | 0 | 2 |
| 2 | DmUR2QSNUq8 | DoorDash Data Science SQL Interview Question W... | 2022-09-07 | 1311 | 53 | 0 | 6 |
| 3 | OLG6_EHMhFk | The One Thing That Keeps Data Scientists Up-to... | 2022-08-31 | 550 | 45 | 0 | 4 |
| 4 | 0EoaJE3ePcE | Use STRING_AGG Function to Solve SQL Questions | 2022-08-25 | 1842 | 87 | 0 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 91 | UX4_IgagL9I | How to Use Google Colaboratory \| Import a CSV ... | 2020-05-02 | 2299 | 12 | 0 | 2 |
| 92 | tDdo3FiWpgE | Interview Questions for SQL Joins and Subqueries | 2020-02-01 | 1746 | 26 | 0 | 0 |
| 93 | wW827gqxlRY | SQL Job Interview Mistakes #2 | 2019-09-15 | 1133 | 21 | 0 | 0 |
| 94 | xbc2GpGUXwc | SQL Job Interview Mistakes #1 | 2019-08-03 | 8035 | 74 | 0 | 1 |


## Huge success!!!!!!!

## Save the DataFrame as a CSV file for downloading

In [None]:
df_final.to_csv('youtube_pull_example.csv')

# Your task:
Try building an API call to the U.S. Bureau of Labor Statistics public [API](https://www.bls.gov/developers/api_signature_v2.htm) and extract data on the unemployment rate.

Please use this Colab notebook: https://colab.research.google.com/drive/183WOyF9H7o56nXf9iflCgD22iFRqh8CE?usp=sharing, where I have written the skeleton and sample code for this task.

The instructions and explanations are all in the Colab linked above. Please read through them carefully: this step is how you get the dataset for this project, and without it you may not be able to do the final project or use the models we will introduce in the last few weeks.

You don't need to write everything yourself. Most of the time you only need to understand this lab and my sample code, and then modify it based on the data you choose to retrieve.
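To give a feel for how the task differs from the YouTube example, here is a rough sketch of a BLS request. It is not the skeleton from the Colab, only an illustration under these assumptions: the v2 endpoint takes a JSON body sent with POST, `LNS14000000` is the series ID for the seasonally adjusted national unemployment rate, and the response nests observations under `Results` and `series`. The helper names and the sample response are made up, and the value `0.0` is a placeholder, not a real BLS figure.

```python
import json

BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"

def build_bls_payload(series_ids, start_year, end_year):
    """Return the JSON body for a BLS v2 time-series request."""
    return json.dumps({
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
    })

def extract_rows(response):
    """Flatten a BLS-style response into (series_id, year, period, value) tuples."""
    rows = []
    for series in response["Results"]["series"]:
        for obs in series["data"]:
            rows.append((series["seriesID"], obs["year"],
                         obs["period"], float(obs["value"])))
    return rows

payload = build_bls_payload(["LNS14000000"], 2020, 2021)

# To send it (not run here, since it needs network access):
# headers = {"Content-type": "application/json"}
# response = requests.post(BLS_URL, data=payload, headers=headers).json()

# Made-up response fragment that only illustrates the shape (not real data)
fake_response = {"Results": {"series": [{
    "seriesID": "LNS14000000",
    "data": [{"year": "2021", "period": "M01",
              "periodName": "January", "value": "0.0"}],
}]}}

print(extract_rows(fake_response))
```

The main difference from the YouTube calls is that the parameters travel in the request body rather than in the URL.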

**Feel free to look for other APIs that provide unemployment rate data!**

**It is totally understandable if this part is too difficult for you. If that happens, please contact me (Shuran) ASAP!**