# DS3000 Day 2

Sep 14/15, 2023

Admin
- Qwickly Attendance (PIN on board)
- New modules for today and next week:
    - `pip install requests plotly matplotlib`
- Homework 1 due next Tues, Sep 19 by midnight

Push-Up Tracker
- Section 04: 0
- Section 05: 2
- Section 06: 0

Content
- introduce APIs

# Basic tools in preparation for APIs

The `requests` module comes into play soon. You may have installed it in the terminal earlier, but `pip` is also available as a [magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-pip) command (`%pip`), which means you can install packages directly from Jupyter (should you ever need to reinstall one, or if you had trouble installing it earlier).

In [1]:
%pip install requests

Note: you may need to restart the kernel to use updated packages.


## Building a DataFrame row by row

We often get data in chunks (web scraping / API calls).  We'll need to store our data incrementally:

In [2]:
import pandas as pd

dict_list = [{'a': 1, 'b': 2, 'c': 3},
             {'a': 4, 'b': 3874, 'c': 398}]

df = pd.DataFrame()

for d in dict_list:
    df = pd.concat([df, pd.Series(d).to_frame().T])
    
df

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 0 | 4 | 3874 | 398 |


In [3]:
# to include index names
list_dict = [{'a': 1, 'b': 2, 'c': 3},
            {'a': 4, 'b': 3874, 'c': 398}]

name_list = ['first', 'second']

df = pd.DataFrame()
for idx in range(2):
    # extract dictionary & name
    d = list_dict[idx]
    name = name_list[idx]
    
    # build series and name it
    series = pd.Series(d, name=name)
    
    df = pd.concat([df, series.to_frame().T])
    
df

|        | a | b | c |
|--------|---|---|---|
| first  | 1 | 2 | 3 |
| second | 4 | 3874 | 398 |
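Calling `pd.concat` inside a loop copies the entire DataFrame on every iteration, which gets slow as the data grows. A common alternative, sketched here with the same `list_dict` and `name_list` values as above, is to collect the rows in a plain list and build the DataFrame once at the end:

```python
import pandas as pd

list_dict = [{'a': 1, 'b': 2, 'c': 3},
             {'a': 4, 'b': 3874, 'c': 398}]
name_list = ['first', 'second']

# collect each incoming chunk (e.g. one API response) in a plain Python list first ...
rows = []
for d in list_dict:
    rows.append(d)

# ... then build the DataFrame in a single call; index= supplies the row names
df = pd.DataFrame(rows, index=name_list)
print(df)
```

Appending to a list is cheap, so this pattern scales better when you are accumulating many chunks.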


# Timestamps

Many datasets include a timestamp, or include a date/time as a feature in the dataset. Understanding how to deal with these is important! We already used the pandas `pd.to_datetime()` function with the Korean Demographics data to cast strings to `datetime` objects.

## Unix Time

- [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) Coordinated Universal Time
    - time zone at 0 deg longitude
        - how is 0 deg longitude defined?  
            - A successfully warring empire (United Kingdom) chose it
                - (It would be convenient if a metric-system-loving empire had been more successful at war ...)
- [Unix Time](https://en.wikipedia.org/wiki/Unix_time) is the number of seconds which have passed since 00:00:00 UTC on 1 Jan 1970 (ignoring leap seconds)
- UTC (and therefore Unix time) is time zone agnostic: it does not depend on where you are and has no daylight saving time
    - (more on this next lesson...)
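A small sketch of the definition above, using only the standard library: Unix time 0 is the epoch itself, and converting a timezone-aware UTC `datetime` back with `.timestamp()` gives a count of seconds. The Valentine's Day value is the same `1613286000` used in the next cell, which corresponds to 07:00 UTC.

```python
from datetime import datetime, timezone

# Unix time 0 is midnight UTC on 1 Jan 1970 (the "epoch")
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch)  # 1970-01-01 00:00:00+00:00

# one day later is 24 * 60 * 60 = 86400 seconds
print(datetime.fromtimestamp(86400, tz=timezone.utc))  # 1970-01-02 00:00:00+00:00

# going the other way: a timezone-aware datetime -> Unix time (seconds, as a float)
valentines = datetime(2021, 2, 14, 7, 0, tzinfo=timezone.utc)
valentines_unix = valentines.timestamp()
print(valentines_unix)  # 1613286000.0
```

Passing `tz=timezone.utc` avoids the "machine's local time zone" assumption that `fromtimestamp` makes by default.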

## Python's `datetime` & `timedelta`
- helpful for all those pesky unit conversions

In [4]:
from datetime import datetime, timedelta

# would you believe that the below is exactly 2 am (US Eastern time) on Valentine's Day 2021?
utc_example = 1613286000

# assumes the time zone of the machine it's running on! (this notebook ran on US Eastern time)
dt0 = datetime.fromtimestamp(utc_example)
dt0

datetime.datetime(2021, 2, 14, 2, 0)

In [44]:
from datetime import datetime, timedelta

In [53]:
date1 = datetime.strptime('Today is the 25th', 'Today is the %dth')

In [57]:
date1 + timedelta(days = 90)

datetime.datetime(1900, 4, 25, 0, 0)

In [58]:
str1 = "September 25"
str2 = "10 am"

In [61]:
# %I (12-hour clock) is needed for %p (AM/PM) to have any effect
newdate = datetime.strptime(str1 + ' ' + str2, '%B %d %I %p')
newdate

datetime.datetime(1900, 9, 25, 10, 0)

In [65]:
newnewdate = datetime(year = 2023, month = newdate.month, day = newdate.day, hour = newdate.hour)
newnewdate

datetime.datetime(2023, 9, 25, 10, 0)

In [69]:
import pytz
tz_mali = pytz.timezone("Africa/Timbuktu")
inmali = tz_mali.localize(newnewdate)

In [70]:
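# note: "EST" is a fixed UTC-5 zone that ignores daylight saving time;
# for Boston's actual local time in September (EDT, UTC-4), use "US/Eastern"
tz_est = pytz.timezone("EST")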
inmali.astimezone(tz_est)

datetime.datetime(2023, 9, 25, 5, 0, tzinfo=<StaticTzInfo 'EST'>)
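To see the difference between the fixed `"EST"` zone and the real US Eastern calendar directly, here is a short sketch that converts the same 10 am Mali time into both (pytz as above):

```python
from datetime import datetime

import pytz

naive = datetime(year=2023, month=9, day=25, hour=10)
in_mali = pytz.timezone('Africa/Timbuktu').localize(naive)

# 'EST' is a fixed UTC-5 offset that never observes daylight saving time
fixed_est = in_mali.astimezone(pytz.timezone('EST'))

# 'US/Eastern' follows the real calendar, so in late September it is EDT (UTC-4)
boston = in_mali.astimezone(pytz.timezone('US/Eastern'))

print(fixed_est)  # 2023-09-25 05:00:00-05:00
print(boston)     # 2023-09-25 06:00:00-04:00
```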

In [5]:
# what about right.... now?
dt1 = datetime.now()
dt1

datetime.datetime(2023, 9, 13, 11, 21, 37, 127749)

In [6]:
# we can set future dates as well
dt2 = datetime(year=2031, month=4, day=15, hour=9, minute=26, second=53)
dt2

datetime.datetime(2031, 4, 15, 9, 26, 53)

In [7]:
# we can access meaningful date attributes of a datetime object
# year, month, day, hour, minute, second
dt2.month, dt2.day

(4, 15)

In [8]:
# we can add / subtract timedelta objects
offset = timedelta(days=5, seconds=8979)

print(dt2)
print(dt2 + offset)

2031-04-15 09:26:53
2031-04-20 11:56:32


In [9]:
# use strptime to read the time from strings that contain other words
# (%I is the 12-hour clock; with %H, the %p AM/PM marker would be ignored)
datetime.strptime('the time is now: September-30-2022 3:20 PM', 'the time is now: %B-%d-%Y %I:%M %p')

datetime.datetime(2022, 9, 30, 15, 20)

In [10]:
# use strftime to cast a time to a string that contains other words
# https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
s = datetime.now().strftime('the time is now: %B-%d-%Y %I:%M %p')
s

'the time is now: September-13-2023 11:21 AM'
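The `%H` vs `%I` distinction is easy to miss because `%p` parses without complaint either way. The Python docs note that `%p` only affects the parsed hour when `%I` is used; a quick check:

```python
from datetime import datetime

s = 'September-30-2022 3:20 PM'

# %I is the 12-hour clock, so %p (AM/PM) is applied: 3:20 PM -> hour 15
with_I = datetime.strptime(s, '%B-%d-%Y %I:%M %p')

# with %H (24-hour clock), %p is parsed but ignored: the hour stays 3
with_H = datetime.strptime(s, '%B-%d-%Y %H:%M %p')

print(with_I.hour, with_H.hour)  # 15 3
```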

In [11]:
# you can save useful time info in a dictionary (which could then become a series -> data frame)
dt = datetime.now()
{'hour': dt.hour,
'minute': dt.minute}

{'hour': 11, 'minute': 21}

In [12]:
# you can figure out how old you are in seconds
eric_age = (datetime.now() - datetime(year=1990, month=12, day=20, hour=22, minute=42)).total_seconds()
print(eric_age)
# put it in billions (it wasn't long ago that I turned 1 billion!)
eric_age/ 1e09

1032871177.204488


1.032871177204488

# Lecture Break/Practice 1

A logarithmic birthday party is the moment you are some integer power of 10 seconds old (1 second old, 10 seconds old, 100 seconds old, ...)

Compute your logarithmic birthday parties from 10^0 through 10^10 seconds and store them in a dataframe as shown below. (You're welcome to use a fake birthday if you'd like)

|  log_bday |   year |  month |  day |   hour | minute|sec |
|----------:|-------:|-------:|-----:|-------:|------:|---:|
|  10^0 sec | 1990   |  12    |  20  |   10   |   42  |  1 |
|  10^1 sec | 1990   |  12    |  20  |   10   |   42  | 10 |
|  10^2 sec | 1990   |  12    |  20  |   10   |   43  | 40 |
|  10^3 sec | 1990   |  12    |  20  |   10   |   58  | 40 |
|  10^4 sec | 1990   |  12    |  20  |   13   |   28  | 40 |
|  10^5 sec | 1990   |  12    |  21  |   14   |   28  | 40 |
|  10^6 sec | 1991   |   1    |   1  |    0   |   28  | 40 |
|  10^7 sec | 1991   |   4    |  15  |    4   |   28  | 40 |
|  10^8 sec | 1994   |   2    |  19  |   20   |   28  | 40 |
|  10^9 sec | 2022   |   8    |  28  |   12   |   28  | 40 |
| 10^10 sec | 2307   |  11    |  10  |    4   |   28  | 40 |

(++) add a column with a more easily readable time (e.g. `September-14-2023 3:11 PM`)

In [13]:
exp=0
f'10^{exp}sec'

'10^0sec'
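One possible approach, sketched with the example birthday implied by the table (20 Dec 1990 at 10:42:00; swap in your own): add `timedelta(seconds=10**exp)` to the birthday for each exponent, collect one dictionary per party, and build the DataFrame at the end. The `readable` column is one way to do the (++) part.

```python
from datetime import datetime, timedelta

import pandas as pd

# example birthday matching the table above (replace with yours)
bday = datetime(year=1990, month=12, day=20, hour=10, minute=42)

rows = []
for exp in range(11):
    # the moment you are exactly 10**exp seconds old
    party = bday + timedelta(seconds=10**exp)
    rows.append({'log_bday': f'10^{exp} sec',
                 'year': party.year,
                 'month': party.month,
                 'day': party.day,
                 'hour': party.hour,
                 'minute': party.minute,
                 'sec': party.second,
                 # (++) a human-readable version, e.g. 'August-28-2022 12:28 PM'
                 'readable': party.strftime('%B-%d-%Y %I:%M %p')})

df_bday = pd.DataFrame(rows)
print(df_bday)
```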

## Representing Trees as Lists & Dictionaries
- nested lists and dictionaries are a natural way to represent tree-structured data
- (our API calls will return nested dictionaries)

<img src="https://i.ibb.co/Pmxqpb3/tree-ex.png" alt="Drawing" style="width: 400px;"/>

In [14]:
red_branch_dict = {'a': 0, 'b': 1, 'c': 2}
blu_branch_dict = {'x': 24, 'y': 25, 'z': 26}
tree_dict = {'f': red_branch_dict,
             'g': blu_branch_dict}
tree_dict

{'f': {'a': 0, 'b': 1, 'c': 2}, 'g': {'x': 24, 'y': 25, 'z': 26}}

In [15]:
tree_dict['f']['b']

1

<img src="https://i.ibb.co/4SSH4mm/tree-ex2.png" alt="Drawing" style="width: 600px;"/>

In [16]:
dict0 = {'num': 14,
        'letter': 'C'}
dict1 = {'num': 17,
        'letter': 'R'}
dict2 = {'num': 21,
        'letter': 'S'}

list_of_dict = [dict0, dict1, dict2]

In [17]:
list_of_dict[0]['num']

14
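Two things you will do constantly with these structures: walk every branch to pull out one leaf, and hand a list of flat dictionaries straight to pandas (each dictionary becomes a row). A sketch using the values above; the `.get()` default is an illustrative choice for keys that may be missing:

```python
import pandas as pd

dict0 = {'num': 14, 'letter': 'C'}
dict1 = {'num': 17, 'letter': 'R'}
dict2 = {'num': 21, 'letter': 'S'}
list_of_dict = [dict0, dict1, dict2]

# walk the "branches" of the tree, pulling the same leaf out of each
letters = [d['letter'] for d in list_of_dict]
print(letters)  # ['C', 'R', 'S']

# a list of flat dictionaries maps directly onto DataFrame rows
df = pd.DataFrame(list_of_dict)
print(df)

# nested dictionaries: .get() returns a default instead of raising KeyError
tree_dict = {'f': {'a': 0, 'b': 1, 'c': 2}, 'g': {'x': 24, 'y': 25, 'z': 26}}
print(tree_dict['g'].get('w', 'missing'))  # missing
```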

## Lecture Break/Practice 2
1. Express the height and weight of each penguin in the group below as a list of dictionaries:
<img src="https://i.ibb.co/XXzX4Wk/penguin-tree.png" alt="Drawing" style="width: 700px;"/>

# API
###  Definitions
**API** Application Programming Interface
 - a server which gives out data (often over the internet)
 - note: 'API', in general, refers to the interface between two pieces of software:
     - in this case, the server which hosts data & our own software which requests it


**JSON** JavaScript Object Notation
 - a text format for storing objects
 - much like the nested dictionaries above, JSON and similar formats are often trees
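A minimal round trip with Python's built-in `json` module shows that JSON is just text describing a tree. The small `tree` here is illustrative, reusing two temperatures from the weather output later in these notes:

```python
import json

# a small tree: a dictionary containing a list of dictionaries
tree = {'city': 'Boston', 'hourly': [{'temp': 76.05}, {'temp': 75.54}]}

# dumps: Python object -> JSON text (a plain string)
text = json.dumps(tree)
print(text)  # {"city": "Boston", "hourly": [{"temp": 76.05}, {"temp": 75.54}]}

# loads: JSON text -> nested Python dictionaries/lists again
round_trip = json.loads(text)
print(round_trip['hourly'][1]['temp'])  # 75.54
```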

## OpenWeather API
What information does this offer?

[https://openweathermap.org/api](https://openweathermap.org/api)

How do I get ready to use it?
- sign up for an account
    - [https://home.openweathermap.org/users/sign_up](https://home.openweathermap.org/users/sign_up)
- get an API key (my key was emailed to me along with my account confirmation)
    - [https://home.openweathermap.org/api_keys](https://home.openweathermap.org/api_keys)
        
Think of APIs as a hybrid of a website and a function. It's a website whose address stores your query (the function's inputs):
    
    https://api.openweathermap.org/data/3.0/onecall?lat=42.3601&lon=-71.0589&appid=YOUR-API-KEY-HERE-THIS-WONT-WORK&units=imperial
    
The result is a JSON object, which we can quickly convert to our dictionary of dictionary tree format.
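To make the "function arguments live in the address" idea concrete, the standard library can build (and parse back) the query string from a dictionary. This sketch uses a placeholder key, so the URL is for illustration only:

```python
from urllib.parse import parse_qs, urlencode, urlparse

base_url = 'https://api.openweathermap.org/data/3.0/onecall'

# the "function arguments" of the API call (placeholder key)
params = {'lat': 42.3601,
          'lon': -71.0589,
          'appid': 'YOUR-API-KEY-HERE',
          'units': 'imperial'}

# urlencode turns the dict into key=value pairs joined by '&'
url = f'{base_url}?{urlencode(params)}'
print(url)

# ... and the address can be parsed back into the same arguments
print(parse_qs(urlparse(url).query)['lat'])  # ['42.3601']
```

With `requests`, passing the dictionary as `requests.get(base_url, params=params)` builds the query string for you, which avoids hand-assembling the f-string.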

In [18]:
# todo: swap this out
api_key = 'd36fa352ac73226b30772f64675f41bb'

# north = positive, south = negative
lat = 42.3601
# east = positive, west = negative
lon = -71.0589

units = 'imperial'
url = f'https://api.openweathermap.org/data/3.0/onecall?lat={lat}&lon={lon}&appid={api_key}&units={units}'
print(url)

https://api.openweathermap.org/data/3.0/onecall?lat=42.3601&lon=-71.0589&appid=d36fa352ac73226b30772f64675f41bb&units=imperial


In [19]:
import requests

# get url as a string
url_text = requests.get(url).text    
url_text

'{"lat":42.3601,"lon":-71.0589,"timezone":"America/New_York","timezone_offset":-14400,"current":{"dt":1694618502,"sunrise":1694600511,"sunset":1694645937,"temp":76.05,"feels_like":77.02,"pressure":1015,"humidity":78,"dew_point":68.68,"uvi":3.88,"clouds":75,"visibility":10000,"wind_speed":6.91,"wind_deg":120,"weather":[{"id":803,"main":"Clouds","description":"broken clouds","icon":"04d"}]},"minutely":[{"dt":1694618520,"precipitation":0},{"dt":1694618580,"precipitation":0},{"dt":1694618640,"precipitation":0},{"dt":1694618700,"precipitation":0},{"dt":1694618760,"precipitation":0},{"dt":1694618820,"precipitation":0},{"dt":1694618880,"precipitation":0},{"dt":1694618940,"precipitation":0},{"dt":1694619000,"precipitation":0},{"dt":1694619060,"precipitation":0},{"dt":1694619120,"precipitation":0},{"dt":1694619180,"precipitation":0},{"dt":1694619240,"precipitation":0},{"dt":1694619300,"precipitation":0},{"dt":1694619360,"precipitation":0},{"dt":1694619420,"precipitation":0},{"dt":1694619480,"pr

In [20]:
# json is part of Python's standard library, so there is nothing to install
import json

# convert json to a nested dict
weather_dict = json.loads(url_text)

weather_dict.keys()

dict_keys(['lat', 'lon', 'timezone', 'timezone_offset', 'current', 'minutely', 'hourly', 'daily'])

In [21]:
weather_dict['hourly'][2]

{'dt': 1694624400,
 'temp': 73.9,
 'feels_like': 74.95,
 'pressure': 1015,
 'humidity': 84,
 'dew_point': 68.76,
 'uvi': 2.48,
 'clouds': 85,
 'visibility': 10000,
 'wind_speed': 4.09,
 'wind_deg': 248,
 'wind_gust': 9.24,
 'weather': [{'id': 804,
   'main': 'Clouds',
   'description': 'overcast clouds',
   'icon': '04d'}],
 'pop': 0.74}

## Cleaning up data from one hour

In [22]:
from datetime import datetime
import pandas as pd

hour_dict = weather_dict['hourly'][0]

# let's convert from unix time to a datetime (easier to use)
hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

pd.Series(hour_dict)

dt                                                   1694617200
temp                                                      76.05
feels_like                                                77.02
pressure                                                   1015
humidity                                                     78
dew_point                                                 68.68
uvi                                                        3.88
clouds                                                       75
visibility                                                10000
wind_speed                                                 4.03
wind_deg                                                    205
wind_gust                                                  7.54
weather       [{'id': 500, 'main': 'Rain', 'description': 'l...
pop                                                        0.51
rain                                               {'1h': 0.51}
datetime                                        2023-09-13 11:00:00

In [23]:
df_hourly = pd.DataFrame()
for hour_dict in weather_dict['hourly']:

    # let's convert from unix time to a datetime (easier to use)
    hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

    s_hour = pd.Series(hour_dict)
    
    df_hourly = pd.concat([df_hourly, s_hour.to_frame().T], ignore_index=True)
    
df_hourly.head()

|   | dt | temp | feels_like | pressure | humidity | dew_point | uvi | clouds | visibility | wind_speed | wind_deg | wind_gust | weather | pop | rain | datetime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1694617200 | 76.05 | 77.02 | 1015 | 78 | 68.68 | 3.88 | 75 | 10000 | 4.03 | 205 | 7.54 | [{'id': 500, 'main': 'Rain', 'description': 'l... | 0.51 | {'1h': 0.51} | 2023-09-13 11:00:00 |
| 1 | 1694620800 | 75.54 | 76.55 | 1015 | 80 | 68.94 | 2.37 | 80 | 10000 | 5.7 | 229 | 11.12 | [{'id': 803, 'main': 'Clouds', 'description': ... | 0.77 |  | 2023-09-13 12:00:00 |
| 2 | 1694624400 | 73.9 | 74.95 | 1015 | 84 | 68.76 | 2.48 | 85 | 10000 | 4.09 | 248 | 9.24 | [{'id': 804, 'main': 'Clouds', 'description': ... | 0.74 |  | 2023-09-13 13:00:00 |
| 3 | 1694628000 | 71.33 | 72.34 | 1014 | 89 | 67.93 | 2.19 | 90 | 6493 | 4.38 | 250 | 8.9 | [{'id': 501, 'main': 'Rain', 'description': 'm... | 0.94 | {'1h': 1.68} | 2023-09-13 14:00:00 |
| 4 | 1694631600 | 69.22 | 70.27 | 1014 | 94 | 67.42 | 2.05 | 95 | 7020 | 2.15 | 195 | 6.8 | [{'id': 501, 'main': 'Rain', 'description': 'm... | 1.0 | {'1h': 3.58} | 2023-09-13 15:00:00 |
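Because `weather_dict['hourly']` is already a list of dictionaries, pandas can also build the whole table in one call and convert the `dt` column all at once. The sketch below uses a three-hour stand-in built from values in the output above (a real run would pass `weather_dict['hourly']` instead) and assumes you want Boston local time, `'America/New_York'`, which is the `timezone` the API reported:

```python
import pandas as pd

# stand-in for weather_dict['hourly'], using dt/temp/weather values from the output above
hourly = [{'dt': 1694617200, 'temp': 76.05, 'weather': [{'id': 500, 'main': 'Rain'}]},
          {'dt': 1694620800, 'temp': 75.54, 'weather': [{'id': 803, 'main': 'Clouds'}]},
          {'dt': 1694624400, 'temp': 73.9, 'weather': [{'id': 804, 'main': 'Clouds'}]}]

df_hourly = pd.DataFrame(hourly)

# unit='s' reads dt as Unix seconds; utc=True makes the result timezone-aware,
# and tz_convert shifts it to the local zone (independent of this machine's settings)
df_hourly['datetime'] = (pd.to_datetime(df_hourly['dt'], unit='s', utc=True)
                         .dt.tz_convert('America/New_York'))

# the 'weather' column holds a one-element list of dicts; pull out one leaf of that tree
df_hourly['main'] = df_hourly['weather'].apply(lambda w: w[0]['main'])

print(df_hourly[['datetime', 'temp', 'main']])
```

Unlike `datetime.fromtimestamp`, this does not depend on the time zone of the machine running the notebook.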


## Lecture Break/Practice 3

La Chaux-de-Fonds, Switzerland is located at:

    47.101333° N, 6.825° E
    
1. Create a dataframe of the next 48 hours of its weather, as was done above
2. (++) Make a function `get_forecast` which accepts:
    - `lat`
    - `lon`
    - `api_key`
    - `units` (default = 'imperial')
    
    and returns a dataframe of the next 48 hours of the location's weather.

In [24]:
# get_forecast(47.101333, 6.825, api_key)

# Storing your API key in a local file

There exists a file `open_weather_access.py` in the same directory as this Jupyter notebook which contains:
    
    my_api_key = 'hello!'

In [25]:
from open_weather_access import my_api_key

print(my_api_key)

# from open_weather_access import my_real_api_key
# print(my_real_api_key)

hello!


# `datetime`, `date`, `time` and UTC refresher
## Unix Time (UTC)
- [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) Coordinated Universal Time
    - time zone at 0 deg longitude
- [Unix Time](https://en.wikipedia.org/wiki/Unix_time) is the number of seconds which have passed since 00:00:00 UTC on 1 Jan 1970 (ignoring leap seconds)

In [26]:
from datetime import date, time, datetime

# building just a date (no time)
date(year=2022, month=11, day=11)

datetime.date(2022, 11, 11)

In [27]:
# building just a time (no date)
time(hour=15, minute=23)

datetime.time(15, 23)

In [28]:
# getting just a date from a datetime
datetime.now().date()

datetime.date(2023, 9, 13)

In [29]:
# getting just a time from a datetime
datetime.now().time()

datetime.time(11, 21, 42, 171846)
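Going the other direction, `datetime.combine` glues a `date` and a `time` back into a single `datetime`. A short sketch reusing the date and time built above:

```python
from datetime import date, datetime, time

d = date(year=2022, month=11, day=11)
t = time(hour=15, minute=23)

# datetime.combine joins a date and a time into one datetime
combined = datetime.combine(d, t)
print(combined)  # 2022-11-11 15:23:00

# and .date() / .time() split it apart again
print(combined.date() == d, combined.time() == t)  # True True
```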

## datetimes to and from strings
Using [the strptime/strftime format codes](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior), we can convert between string and `datetime` representations:
- building datetimes with tzinfo explicitly passed
- strptime (from str to `datetime`)
- strftime (from `datetime` to str)
- use a full date (not just a time) when converting between timezones, since daylight saving switches make the offset depend on the date

In [30]:
# get current time (as datetime)
now_datetime = datetime.now()

# convert datetime to str
format_str = 'its now %A %B %d at %I:%M %p, is that not great!'
now_str = now_datetime.strftime(format_str)
now_str

'its now Wednesday September 13 at 11:21 AM, is that not great!'

In [31]:
# convert str to datetime
# notice anything **strange**?
then_datetime = datetime.strptime(now_str, format_str)
then_datetime

datetime.datetime(1900, 9, 13, 11, 21)

# Timezones

[pytz](http://pytz.sourceforge.net/) will do all the heavy lifting for managing timezones for us

In [32]:
import pytz

# this is a lot of standards whose quirks are handled by pytz ...
pytz.all_timezones

['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', 'Africa/Algiers', 'Africa/Asmara', 'Africa/Asmera', 'Africa/Bamako', 'Africa/Bangui', 'Africa/Banjul', 'Africa/Bissau', 'Africa/Blantyre', 'Africa/Brazzaville', 'Africa/Bujumbura', 'Africa/Cairo', 'Africa/Casablanca', 'Africa/Ceuta', 'Africa/Conakry', 'Africa/Dakar', 'Africa/Dar_es_Salaam', 'Africa/Djibouti', 'Africa/Douala', 'Africa/El_Aaiun', 'Africa/Freetown', 'Africa/Gaborone', 'Africa/Harare', 'Africa/Johannesburg', 'Africa/Juba', 'Africa/Kampala', 'Africa/Khartoum', 'Africa/Kigali', 'Africa/Kinshasa', 'Africa/Lagos', 'Africa/Libreville', 'Africa/Lome', 'Africa/Luanda', 'Africa/Lubumbashi', 'Africa/Lusaka', 'Africa/Malabo', 'Africa/Maputo', 'Africa/Maseru', 'Africa/Mbabane', 'Africa/Mogadishu', 'Africa/Monrovia', 'Africa/Nairobi', 'Africa/Ndjamena', 'Africa/Niamey', 'Africa/Nouakchott', 'Africa/Ouagadougou', 'Africa/Porto-Novo', 'Africa/Sao_Tome', 'Africa/Timbuktu', 'Africa/Tripoli', 'Africa/Tunis', 'Africa/Windhoek', 'Ameri

## Specifying timezone info for a datetime
- use the `.localize()` method of a pytz timezone object
    - it takes a "naive" `datetime` (one without any timezone) as input
- don't pass a pytz timezone object to the `tzinfo` keyword when constructing a `datetime` ...
    - pytz then falls back to the zone's oldest historical offset (local mean time, e.g. UTC-4:56 for US/Eastern) and can't account for daylight saving time
    - these are "silent" errors: the code runs, but times will be off by some amount

In [33]:
# build a datetime
ball_drop2024 = datetime(year=2024, month=1, day=1)

# load the timezone
time_zone_gmt = pytz.timezone('GMT')

# add the timezone to the datetime
ball_drop2024_gmt = time_zone_gmt.localize(ball_drop2024)
ball_drop2024_gmt

datetime.datetime(2024, 1, 1, 0, 0, tzinfo=<StaticTzInfo 'GMT'>)

In [34]:
# load the timezone
time_zone_est = pytz.timezone('US/Eastern')

# add the timezone to the datetime
ball_drop2024_est = time_zone_est.localize(ball_drop2024)
ball_drop2024_est

datetime.datetime(2024, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>)

In [35]:
# EST is living in the past ...
ball_drop2024_est - ball_drop2024_gmt

datetime.timedelta(seconds=18000)

In [36]:
# WARNING: don't pass a pytz timezone to tzinfo= when constructing a datetime
time_zone_est = pytz.timezone('US/Eastern')
ball_drop2024_est_bug = datetime(year=2024, month=1, day=1, tzinfo=time_zone_est)

# not quite right: off by 4 minutes, because pytz used the LMT offset (UTC-4:56)
ball_drop2024_est_bug - ball_drop2024_gmt

datetime.timedelta(seconds=17760)

In [37]:
# notice, once a datetime has a timezone, you can no longer `.localize()` it
time_zone_gmt.localize(ball_drop2024_est)

ValueError: Not naive datetime (tzinfo is already set)
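A compact sketch of both pytz pitfalls: the offsets that `.localize()` and `tzinfo=` actually produce, and the fact that arithmetic on a localized `datetime` keeps its old offset even after crossing into daylight saving time. The pytz documentation recommends calling the timezone's `.normalize()` after such arithmetic:

```python
from datetime import datetime, timedelta

import pytz

eastern = pytz.timezone('US/Eastern')

# localize picks the offset actually in effect on that date
good = eastern.localize(datetime(2024, 1, 1))
print(good.utcoffset())  # -1 day, 19:00:00  (i.e. UTC-5)

# tzinfo= silently uses the zone's oldest entry, local mean time (UTC-4:56)
bad = datetime(2024, 1, 1, tzinfo=eastern)
print(bad.utcoffset())   # -1 day, 19:04:00

# adding 180 days lands in June, but the result still carries the January EST offset;
# normalize() re-derives the correct offset (EDT) and wall-clock time
summer = good + timedelta(days=180)
fixed = eastern.normalize(summer)
print(summer.tzname(), fixed.tzname())  # EST EDT
print(fixed)  # 2024-06-29 01:00:00-04:00
```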

## Looking Ahead; Spotipy (for use on Homework 2)

The Spotify API is quite powerful: it gives us access to any song or artist in its libraries, plus more information than you might have thought of. There is also a module, `spotipy`, that was created to access the API from Python. Open up a terminal (or use a Jupyter notebook, since `pip` is also a magic command) and run:

`pip install spotipy`

In [None]:
%pip install spotipy

In [None]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

Just like with OpenWeather, we need to make an account [here](https://developer.spotify.com/) (this is essentially the same as making a regular Spotify account) and then get API credentials (Spotify actually requires two things: a Client ID and a Client Secret). At the above website, go to:

- Dashboard
- Log into your Spotify account (make one if you don't have one)
- Accept the terms of using the API
- Create an app (you can call it anything, I called mine `DS3000_Spotify`)
- Get a client ID (mine is `592acf2d2dc84d94bbc652f2f1d72375`, though it is usually good practice to **not** share this) and a client secret (**never share this with anyone**: save it in a separate file like we did with our OpenWeather API key earlier)

There exists a file `spotify_secret.py` in the same directory as this Jupyter notebook which contains:
    
    secret = 'professorgerberssecretspotify'

In [None]:
from spotify_secret import secret

In [None]:
# Authentication
cid = '592acf2d2dc84d94bbc652f2f1d72375'

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)

You will learn more about how to use Spotipy, including the tricky bits that are unique to its usage, on Homework 2.