# Assignment 1-2: Data Collection Using Web APIs

## Objective

Many Websites (such as Twitter, Yelp, Spotify) provide free APIs to allow users to access their data. *API wrappers* simplify the use of these APIs by wrapping API calls into easy-to-use Python functions. At SFU, we are developing a unified API wrapper, called [DataPrep.Connector](https://docs.dataprep.ai/user_guide/connector/introduction.html#userguide-connector), which offers a unified programming interface to collect data from a variety of Web APIs.

In this assignment, you will learn the following:

* How to ask insightful questions about data
* How to collect data from Web APIs using DataPrep.Connector

**Requirements:**

1. Please use [pandas.DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) rather than spark.DataFrame to manipulate data.

2. Please follow the python code style (https://www.python.org/dev/peps/pep-0008/). If TA finds your code hard to read, you will lose points. This requirement will stay for the whole semester.

## Preliminary

DataPrep.Connector is very easy to learn. By watching this 10-min [PyData Global 2020 talk](https://www.youtube.com/watch?v=56qu-0Ka-dA), you should be able to know how to use it. 

If you want to know more, below are some other useful resources.

* [Quick Introduction](https://github.com/sfu-db/dataprep#connector)
* [User Guide](https://sfu-db.github.io/dataprep/user_guide/connector/connector.html) 
* [Examples](https://github.com/sfu-db/dataprep/tree/develop/examples)
* [Fetch and analyze COVID-19 tweets using DataPrep](https://www.youtube.com/watch?v=vvypQB3Vp1o)

## Overview

This is a **group** assignment. Please check your group in this [PDF file](https://coursys.sfu.ca/2022sp-cmpt-733-g1/pages/Web-API-Assignment/view).

To do this assignment, your group needs to go through four steps:

1. Select a new Web API that is not listed on https://github.com/sfu-db/APIConnectors. 
2. Create a configuration file for the API (see tutorials at [link1](https://github.com/sfu-db/APIConnectors/blob/develop/CONTRIBUTING.md#add-configuration-files) and [link2](https://github.com/sfu-db/EZHacks-tutorial/blob/master/2.%20Tutorial.ipynb)). 
3. Come up with four questions about the API. 
4. Write code to answer these questions one by one.

For Step 3, please make sure your questions are **good**.

## What are "good questions"?

Please use the following to judge whether your questions are good or not.

1. Good questions need to be useful. That is, they are common questions asked about the API.
2. Good questions need to be diverse. That is, they cover different aspects of the API. 
3. Good questions have to cover different difficulty levels. That is, it consists of both easy and difficult questions,  where the difficulty can be measured by the number of lines of code or the number of input parameters.

The following shows four good questions about the Yelp API. The corresponding code can be found at this [link](https://github.com/sfu-db/DataConnectorConfigs#yelp----collect-local-business-data).

* Q1. What's the phone number of Capilano Suspension Bridge Park?
* Q2. Which yoga store has the highest review count in Vancouver?
* Q3. How many Starbucks stores are in Seattle and where are they?
* Q4. What are the ratings for a list of restaurants?

**Why are they useful?**
* Q1 is useful because "Capilano Suspension Bridge Park" is one of the most popular tourist attractions in Vancouver.
* Q2 is useful because a yoga fan wants to find out the most popular yoga store in Vancouver. 
* Q3 is useful because Starbucks was founded in Seattle.
* Q4 is useful because people often rely on yelp ratings to decide which restaurant to go to.

**Why are they diverse?**

This is because the [code](yelp-code.png) written to answer them has different inputs or outputs.
* Q1 takes `term` and `location` as input and returns 1 record with attributes `name` and `phone` 
* Q2 takes `categories`, `location`, and `sort_by` as input and returns 1 record with attributes `name` and `review_count`
* Q3 takes `term` and `location` as input and returns n records with attributes `name`, `address`, `city`, `state`, `country`, `zipcode`
* Q4 takes a list of retarurant `names` as input and return n records with attributes `name`, `rating`, `city`

**Why are they more and more difficult?**
* Q1 (4 lines of code, 2 query parameters)
* Q2 (4 lines of code, 3 query parameters)
* Q3 (5 lines of code, 2 query parameters)
* Q4 (11 lines of code, 2 query parameters)

Please note that you have to use DataPrep.Connector to get data from the Web API. If DataPrep.Connector cannot meet your needs, please post your questions on Slack (Channel: Assignment 1). We will help you. 

## Now, it's your turn. :) 

Please write down your questions and the corresponding code for your assigned API. 

In [1]:
## Provide your API key here for TAs to reproduce your results
weatherbit_access_token = "e6ab7fb0346d433fab3007190a763117"

In [2]:
!pip install dataprep



In [8]:
!pip install ipython ipykernel --upgrade

Collecting ipython
  Downloading ipython-7.31.1-py3-none-any.whl (792 kB)
[K     |████████████████████████████████| 792 kB 5.3 MB/s 
Collecting ipykernel
  Downloading ipykernel-6.7.0-py3-none-any.whl (127 kB)
[K     |████████████████████████████████| 127 kB 64.5 MB/s 
Collecting prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0
  Downloading prompt_toolkit-3.0.24-py3-none-any.whl (374 kB)
[K     |████████████████████████████████| 374 kB 64.8 MB/s 
Installing collected packages: prompt-toolkit, ipython, ipykernel
  Attempting uninstall: prompt-toolkit
    Found existing installation: prompt-toolkit 1.0.18
    Uninstalling prompt-toolkit-1.0.18:
      Successfully uninstalled prompt-toolkit-1.0.18
  Attempting uninstall: ipython
    Found existing installation: ipython 5.5.0
    Uninstalling ipython-5.5.0:
      Successfully uninstalled ipython-5.5.0
  Attempting uninstall: ipykernel
    Found existing installation: ipykernel 4.10.1
    Uninstalling ipykernel-4.10.1:
      Successfully un

### Q1 What is the Air Quality Index (AQI) of Burnaby on 4 AM UTC today?

In [3]:
## Write your code
import datetime
from dataprep.connector import connect

conn_weather = connect("./config",_auth={'access_token':weatherbit_access_token})
weather_df = await conn_weather.query("airquality", city="Burnaby")

today = datetime.date.today()
todays_4AM_time = str(today) + "T04:00:00"
weather_aqi = weather_df[weather_df['timestamp_utc']==todays_4AM_time]
display(weather_aqi[['timestamp_utc','aqi']])

Unnamed: 0,timestamp_utc,aqi
2,2022-01-20T04:00:00,7


### Q2 What is the Air Quality Index (AQI) around Equator on the Americas today?

In [4]:
## Write your code
from dataprep.connector import connect

conn_weather = connect("./config",_auth={'access_token':weatherbit_access_token})
weather_df = await conn_weather.query("airquality", lat=0.18, lon=77.46) #Coordinates for the equator over americas

weather_aqi = weather_df[weather_df['timestamp_utc']==todays_4AM_time]
display(weather_aqi[['timestamp_utc','aqi']])

Unnamed: 0,timestamp_utc,aqi
2,2022-01-20T04:00:00,32


### Q3 Has the Air Quality Index (AQI) of Vancouver improved since last 3 days?

In [5]:
import pandas as pd
from dataprep.connector import connect

conn_weather = connect("./config",_auth={'access_token':weatherbit_access_token})
weather_df = await conn_weather.query("airquality", city="Vancouver", lat=49, lon=122)

yesterday = today - datetime.timedelta(days=1)
day_before_yesterday = yesterday - datetime.timedelta(days=1)
yesterdays_4AM_time = str(yesterday) + "T04:00:00"
day_before_yesterdays_4AM_time = str(day_before_yesterday) + "T04:00:00"

todays_weather_aqi = weather_df[weather_df['timestamp_utc']==todays_4AM_time]
yesterdays_weather_aqi = weather_df[weather_df['timestamp_utc']==yesterdays_4AM_time]
day_before_yesterdays_weather_aqi = weather_df[weather_df['timestamp_utc']==day_before_yesterdays_4AM_time]

td = todays_weather_aqi[['timestamp_utc','aqi']]
yd = yesterdays_weather_aqi[['timestamp_utc','aqi']]
dbyd = day_before_yesterdays_weather_aqi[['timestamp_utc','aqi']]

dfs = [td, yd, dbyd]
display(pd.concat(dfs))

Unnamed: 0,timestamp_utc,aqi
2,2022-01-20T04:00:00,17
26,2022-01-19T04:00:00,12
50,2022-01-18T04:00:00,65


### Q4: Which Canadian provincial capital city has the lowest Air Quality Index in the last 3 days?

In [6]:
## Write your code
from dataprep.connector import connect
import pandas as pd
import asyncio

conn_weather = connect("./config",_auth={'access_token':weatherbit_access_token})

capital_cities = ["Edmonton", "Victoria", "Winnipeg", "Fredericton", "St. Johns", "Halifax", "Toronto", "Charlottetown", \
 "Quebec City", "Regina", "Yellowknife", "Iqaluit", "Whitehorse"]

global min_aqi 
min_aqi = 100000
global min_city
min_city = ''

for city in capital_cities:
    df = await conn_weather.query("airquality", city = city)
    
    print("city: ", city, "average aqi in last 3 days: ", df['aqi'].mean())
    if(df['aqi'].mean() < min_aqi):
        min_aqi = df['aqi'].mean()
        min_city = city

print("\n", min_city, "has the lowest Air Quality Index (", min_aqi, ") on avg in the last 3 days amongst all the provincial capital cities.")

city:  Edmonton average aqi in last 3 days:  17.73611111111111
city:  Victoria average aqi in last 3 days:  13.0
city:  Winnipeg average aqi in last 3 days:  13.069444444444445
city:  Fredericton average aqi in last 3 days:  18.22222222222222
city:  St. Johns average aqi in last 3 days:  22.291666666666668
city:  Halifax average aqi in last 3 days:  17.0
city:  Toronto average aqi in last 3 days:  18.25
city:  Charlottetown average aqi in last 3 days:  20.416666666666668
city:  Quebec City average aqi in last 3 days:  15.277777777777779
city:  Regina average aqi in last 3 days:  15.98611111111111
city:  Yellowknife average aqi in last 3 days:  10.63888888888889
city:  Iqaluit average aqi in last 3 days:  10.36111111111111
city:  Whitehorse average aqi in last 3 days:  10.26388888888889

 Whitehorse has the lowest Air Quality Index ( 10.26388888888889 ) on avg in the last 3 days amongst all the provincial capital cities.


## Submission

Complete this notebook, rename it to `A1-2-[WEB API Name].ipynb`, and submit it along with your config files to the CourSys activity `Assignment 1 - Part 2`. For example, if your group works on Yelp, then **every member of your group** needs to submit the same notebook named `A1-2-Yelp.ipynb` and the config files named `config.zip`.