# DS3000 Day 2

Sep 11/12, 2024

Admin
- New modules for today and next week:
    - `pip install requests plotly matplotlib`
- Homework 1 due next Tues, Sep 17 by end of the day
    - submit by Sunday night to get 5% extra credit

Push-Up Tracker
- Section 03: 2
- Section 05: 1

Content
- pandas (last part of Day 1 lecture)
- introduce APIs

# Basic tools in preparation for APIs

The `requests` module comes into play soon. While you may have installed it from the terminal earlier, `pip` is also available as a Jupyter [magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-pip) command (`%pip`), which means you can install packages directly from a notebook cell (should you ever need to reinstall `requests`, or if you had difficulty installing it earlier).

In [1]:
# %pip install requests

## Building a DataFrame row by row

We often get data in chunks (web scraping / API calls), so we need to build our DataFrame incrementally:

In [2]:
import pandas as pd

dict_list = [{'a': 1, 'b': 2, 'c': 3},
             {'a': 4, 'b': 3874, 'c': 398}]

df = pd.DataFrame()

for d in dict_list:
    df = pd.concat([df, pd.Series(d).to_frame().T])
    
df

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 0 | 4 | 3874 | 398 |


In [3]:
# to include index names
list_dict = [{'a': 1, 'b': 2, 'c': 3},
            {'a': 4, 'b': 3874, 'c': 398}]

name_list = ['first', 'second']

df = pd.DataFrame()
for idx in range(2):
    # extract dictionary & name
    d = list_dict[idx]
    name = name_list[idx]
    
    # build series and name it
    series = pd.Series(d, name=name)
    
    df = pd.concat([df, series.to_frame().T])
    
df

|   | a | b | c |
|---|---|---|---|
| first | 1 | 2 | 3 |
| second | 4 | 3874 | 398 |
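Since each `pd.concat` call creates a new DataFrame from everything built so far, an alternative worth knowing is to collect the rows in a plain Python list and build the DataFrame once at the end. The sketch below does this with the same two dictionaries; passing `index=` plays the role of the Series names used above.

```python
import pandas as pd

# rows arrive one at a time (e.g. one dict per API call); append them to a list
rows = []
for d in [{'a': 1, 'b': 2, 'c': 3},
          {'a': 4, 'b': 3874, 'c': 398}]:
    rows.append(d)

# build the DataFrame once, with a default 0, 1, ... index
df = pd.DataFrame(rows)

# or label each row, like naming each Series above
df_named = pd.DataFrame(rows, index=['first', 'second'])
print(df_named)
```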


# Timestamps

Many datasets include a timestamp, or include a date/time as a feature. Knowing how to handle these can be important! We already used the pandas `pd.to_datetime()` function with the Korean Demographics data to cast strings to datetimes. Below we'll look at a few time-related highlights that will come in handy on Homework 2.

## Unix Time

- [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) Coordinated Universal Time
    - the time standard at 0° longitude (the prime meridian)
        - how is 0° longitude defined?
            - A successfully warring empire (United Kingdom) chose it
                - (It would be convenient if a metric-system-loving empire had been more successful at war ...)
- [Unix Time](https://en.wikipedia.org/wiki/Unix_time) is the number of seconds that have passed since 00:00:00 UTC on 1 Jan 1970 (ignoring leap seconds)
- UTC (and therefore Unix time) is time zone agnostic: it has no daylight saving time, and the same instant has the same Unix time everywhere on Earth
    - (more on this next lesson...)
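To make the definition concrete, the sketch below uses only the standard library to go back and forth between a Unix time and a time-zone-aware `datetime` in UTC. The timestamp is the same Valentine's Day value used in the next cell.

```python
from datetime import datetime, timezone

# 07:00 UTC on 14 Feb 2021 (2 am in US Eastern Standard Time)
moment = datetime(2021, 2, 14, 7, 0, tzinfo=timezone.utc)

# datetime -> Unix time: seconds since 1970-01-01 00:00:00 UTC
unix_seconds = moment.timestamp()
print(unix_seconds)   # 1613286000.0

# Unix time -> datetime; passing tz= avoids depending on the machine's time zone
back = datetime.fromtimestamp(1613286000, tz=timezone.utc)
print(back)           # 2021-02-14 07:00:00+00:00

# the epoch itself is Unix time 0
print(datetime(1970, 1, 1, tzinfo=timezone.utc).timestamp())  # 0.0
```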

## Python's `datetime`, `timedelta`, and `pytz`
- helpful for all those pesky unit conversions

In [4]:
from datetime import datetime, timedelta

# would you believe that the below is exactly 2 am on Valentine's Day 2021 (in US Eastern time)?
utc_example = 1613286000

# fromtimestamp() assumes the time zone of the machine it's running on!
dt0 = datetime.fromtimestamp(utc_example)
dt0

datetime.datetime(2021, 2, 14, 2, 0)

In [5]:
# fields missing from the string default to year 1900, January, 00:00
date1 = datetime.strptime('Today is the 25th', 'Today is the %dth')

In [6]:
date1 + timedelta(days = 90)

datetime.datetime(1900, 4, 25, 0, 0)

In [7]:
str1 = "September 25"
str2 = "10 am"

In [8]:
# %p (am/pm) only affects the 12-hour %I code, so use %I rather than %H here
newdate = datetime.strptime(str1 + ' ' + str2, '%B %d %I %p')
newdate

datetime.datetime(1900, 9, 25, 10, 0)

In [9]:
newnewdate = datetime(year = 2023, month = newdate.month, day = newdate.day, hour = newdate.hour)
newnewdate

datetime.datetime(2023, 9, 25, 10, 0)

In [10]:
import pytz
tz_mali = pytz.timezone("Africa/Timbuktu")
inmali = tz_mali.localize(newnewdate)

In [11]:
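# "EST" is a fixed UTC-5 offset with no daylight saving time;
# "America/New_York" would give Boston's actual late-September (EDT, UTC-4) clock time of 6 am
tz_est = pytz.timezone("EST")
inmali.astimezone(tz_est)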

datetime.datetime(2023, 9, 25, 5, 0, tzinfo=<StaticTzInfo 'EST'>)
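`pytz` is a third-party package; since Python 3.9 the standard library also has `zoneinfo` for named zones such as `"America/New_York"`. The sketch below avoids any time zone database and instead uses fixed UTC offsets from `datetime.timezone` to show what `astimezone` does: the same instant is shown on different wall clocks. The offsets are assumptions chosen to match the example above: UTC+0 for Mali, UTC-5 for EST, and UTC-4 for EDT (US Eastern daylight time, which Boston uses in late September).

```python
from datetime import datetime, timedelta, timezone

# fixed offsets (no daylight saving rules), assumed for illustration
utc_plus_0 = timezone.utc                       # Mali
est = timezone(timedelta(hours=-5), 'EST')      # US Eastern Standard Time
edt = timezone(timedelta(hours=-4), 'EDT')      # US Eastern Daylight Time

in_mali = datetime(2023, 9, 25, 10, 0, tzinfo=utc_plus_0)

as_est = in_mali.astimezone(est)
as_edt = in_mali.astimezone(edt)
print(as_est)   # 2023-09-25 05:00:00-05:00
print(as_edt)   # 2023-09-25 06:00:00-04:00

# all three are the same instant, so they compare equal and share one Unix time
print(in_mali == as_est == as_edt)   # True
```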

In [12]:
# what about right.... now?
dt1 = datetime.now()
dt1

datetime.datetime(2024, 9, 11, 14, 18, 50, 510499)

In [13]:
# we can set future dates as well
dt2 = datetime(year=2031, month=4, day=15, hour=9, minute=26, second=53)
dt2

datetime.datetime(2031, 4, 15, 9, 26, 53)

In [14]:
# we can access meaningful date attributes of a datetime object
# year, month, day, hour, minute, second
dt2.month, dt2.day

(4, 15)

In [15]:
# we can add / subtract timedelta objects
# 8979 seconds = 2 hours, 29 minutes, 39 seconds
offset = timedelta(days=5, seconds=8979)

print(dt2)
print(dt2 + offset)

2031-04-15 09:26:53
2031-04-20 11:56:32
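Subtracting two `datetime` objects goes the other way and produces a `timedelta`. The sketch below reuses the dates from the cell above and shows how a `timedelta` stores its length (days plus leftover seconds), which is also what `total_seconds()` in the age calculation below relies on.

```python
from datetime import datetime, timedelta

dt2 = datetime(year=2031, month=4, day=15, hour=9, minute=26, second=53)
later = datetime(year=2031, month=4, day=20, hour=11, minute=56, second=32)

# datetime - datetime -> timedelta
gap = later - dt2
print(gap)                     # 5 days, 2:29:39

# a timedelta is normalized to days + seconds (+ microseconds)
print(gap.days, gap.seconds)   # 5 8979

# total_seconds() folds everything into one float
print(gap.total_seconds())     # 440979.0

# dividing two timedeltas gives a plain number, e.g. the gap in hours
print(gap / timedelta(hours=1))
```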


In [16]:
# use strptime to pull the time out of a string that contains other words
# (%I is the 12-hour clock that %p applies to; with %H, "PM" would be ignored)
datetime.strptime('the time is now: September-30-2022 3:20 PM', 'the time is now: %B-%d-%Y %I:%M %p')

datetime.datetime(2022, 9, 30, 15, 20)

In [17]:
# use strftime to format a datetime as a string that contains other words
# https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
s = datetime.now().strftime('the time is now: %B-%d-%Y %I:%M %p')
s

'the time is now: September-11-2024 02:18 PM'
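Because `%p` only has an effect together with the 12-hour `%I` code, a time written with AM/PM should use `%I`, while `%H` is for 24-hour clocks. The sketch below checks that formatting with `strftime` and parsing with `strptime` round-trip when the same format string is used. The month name assumes an English (C) locale.

```python
from datetime import datetime

fmt = 'the time is now: %B-%d-%Y %I:%M %p'   # %I = 12-hour clock, %p = AM/PM

moment = datetime(2022, 9, 30, 15, 20)

# datetime -> string
s = moment.strftime(fmt)
print(s)   # the time is now: September-30-2022 03:20 PM

# string -> datetime, using the same format
parsed = datetime.strptime(s, fmt)
print(parsed == moment)   # True

# %H (24-hour) ignores %p, so the same text is read as 3:20 in the morning
wrong = datetime.strptime('September-30-2022 3:20 PM', '%B-%d-%Y %H:%M %p')
print(wrong.hour)   # 3
```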

In [18]:
# you can save useful time info in a dictionary (which could then become a series -> data frame)
dt = datetime.now()
{'hour': dt.hour,
'minute': dt.minute}

{'hour': 14, 'minute': 18}

In [19]:
# you can figure out how old you are in seconds
eric_age = (datetime.now() - datetime(year=1990, month=12, day=20, hour=22, minute=42)).total_seconds()
print(eric_age)
# put it in billions (it wasn't too long ago that I turned 1 billion!)
eric_age / 1e9

1064331410.565581


1.064331410565581

# API
### Definitions
**API** Application Programming Interface
 - here: a server which gives out data (often over the internet)
 - note: 'API', in general, refers to the boundary between two pieces of software:
     - in this case, the server which hosts data & our own software which requests it


**JSON** JavaScript Object Notation
 - a method of storing objects as text
 - much like nested Python dictionaries (and lists), JSON and similar formats are often trees
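Python's built-in `json` module converts between JSON text and nested dictionaries/lists. The small JSON string below is made up for illustration; it shows the round trip and how you walk down the tree one key or index at a time.

```python
import json

# a small, made-up JSON document: objects become dicts, arrays become lists
text = '{"city": "Boston", "current": {"temp": 76.4, "weather": [{"main": "Clear"}]}}'

# JSON text -> nested Python objects
data = json.loads(text)
print(type(data))   # <class 'dict'>

# walk down the tree: dict key -> dict key -> list index -> dict key
print(data['current']['weather'][0]['main'])   # Clear

# nested Python objects -> JSON text
print(json.dumps(data, indent=2))
```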

## OpenWeather API
What information does this offer?

[https://openweathermap.org/api](https://openweathermap.org/api)

How do I get ready to use it?
- sign up for an account
    - [https://home.openweathermap.org/users/sign_up](https://home.openweathermap.org/users/sign_up)
- get an API key (my key was emailed to me along with my account confirmation)
    - [https://home.openweathermap.org/api_keys](https://home.openweathermap.org/api_keys)
        
Think of an API as a hybrid of a website and a function: it's a website whose address (URL) holds your query, i.e. the function's arguments:
    
    https://api.openweathermap.org/data/3.0/onecall?lat=42.3601&lon=-71.0589&appid=YOUR-API-KEY-HERE-THIS-WONT-WORK&units=imperial
    
The result is a JSON object, which we can quickly convert into nested Python dictionaries (our tree format).
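Rather than pasting values into the URL by hand, the query part (everything after `?`) can be built from a dictionary. The sketch below uses the standard library's `urllib.parse.urlencode`, which also escapes any special characters; `YOUR_API_KEY` is a placeholder. (The `requests` library can do the same for you via `requests.get(base_url, params=params)`.)

```python
from urllib.parse import urlencode

base_url = 'https://api.openweathermap.org/data/3.0/onecall'

# each key=value pair becomes one query parameter
params = {'lat': 42.3601,
          'lon': -71.0589,
          'appid': 'YOUR_API_KEY',   # placeholder, not a real key
          'units': 'imperial'}

url = base_url + '?' + urlencode(params)
print(url)
# https://api.openweathermap.org/data/3.0/onecall?lat=42.3601&lon=-71.0589&appid=YOUR_API_KEY&units=imperial
```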

In [20]:
# replace with your own API key (better yet, load it from a local file; see below)
api_key = 'd36fa352ac73226b30772f64675f41bb'

# north = positive, south = negative
lat = 42.3601
# east = positive, west = negative
lon = -71.0589

units = 'imperial'
url = f'https://api.openweathermap.org/data/3.0/onecall?lat={lat}&lon={lon}&appid={api_key}&units={units}'
print(url)

https://api.openweathermap.org/data/3.0/onecall?lat=42.3601&lon=-71.0589&appid=d36fa352ac73226b30772f64675f41bb&units=imperial


In [21]:
import requests

# send a GET request and get the response body as a string
url_text = requests.get(url).text
url_text

'{"lat":42.3601,"lon":-71.0589,"timezone":"America/New_York","timezone_offset":-14400,"current":{"dt":1726078730,"sunrise":1726050032,"sunset":1726095670,"temp":76.44,"feels_like":75.47,"pressure":1021,"humidity":36,"dew_point":47.53,"uvi":4.74,"clouds":10,"visibility":10000,"wind_speed":5.99,"wind_deg":113,"wind_gust":11.99,"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01d"}]},"minutely":[{"dt":1726078740,"precipitation":0},{"dt":1726078800,"precipitation":0},{"dt":1726078860,"precipitation":0},{"dt":1726078920,"precipitation":0},{"dt":1726078980,"precipitation":0},{"dt":1726079040,"precipitation":0},{"dt":1726079100,"precipitation":0},{"dt":1726079160,"precipitation":0},{"dt":1726079220,"precipitation":0},{"dt":1726079280,"precipitation":0},{"dt":1726079340,"precipitation":0},{"dt":1726079400,"precipitation":0},{"dt":1726079460,"precipitation":0},{"dt":1726079520,"precipitation":0},{"dt":1726079580,"precipitation":0},{"dt":1726079640,"precipitation":0},{"dt":1

In [22]:
# json is part of Python's standard library, so no install is needed
import json

# convert json to a nested dict
weather_dict = json.loads(url_text)

weather_dict.keys()

dict_keys(['lat', 'lon', 'timezone', 'timezone_offset', 'current', 'minutely', 'hourly', 'daily'])

In [23]:
# what does one hour of weather look like?
weather_dict['hourly'][2]

{'dt': 1726084800,
 'temp': 76.64,
 'feels_like': 75.6,
 'pressure': 1021,
 'humidity': 34,
 'dew_point': 46.2,
 'uvi': 2.08,
 'clouds': 24,
 'visibility': 10000,
 'wind_speed': 6.33,
 'wind_deg': 125,
 'wind_gust': 7.16,
 'weather': [{'id': 801,
   'main': 'Clouds',
   'description': 'few clouds',
   'icon': '02d'}],
 'pop': 0}

## Cleaning up data from one hour

In [24]:
from datetime import datetime
import pandas as pd

hour_dict = weather_dict['hourly'][0]

# let's convert from Unix time to a datetime (easier to use);
# note that fromtimestamp() uses this machine's local time zone
hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

pd.Series(hour_dict)

dt                                                   1726077600
temp                                                      76.44
feels_like                                                75.47
pressure                                                   1021
humidity                                                     36
dew_point                                                 47.53
uvi                                                        4.74
clouds                                                       10
visibility                                                10000
wind_speed                                                 3.87
wind_deg                                                    117
wind_gust                                                  5.35
weather       [{'id': 800, 'main': 'Clear', 'description': '...
pop                                                           0
datetime                                    2024-09-11 14:00:00
dtype: object
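The `weather` entry is itself a list holding a dictionary, so it ends up as a single awkward cell. One option, sketched below, is to pull the fields you care about out into their own keys before building the Series. The helper `flatten_hour` and the `weather_main` / `weather_description` key names are hypothetical, made up for this example; the input is the `hourly[2]` record shown earlier, trimmed to a few keys.

```python
def flatten_hour(hour_dict):
    """Hypothetical helper: copy an hourly record, replacing the nested
    'weather' list with flat keys taken from its first entry."""
    flat = {k: v for k, v in hour_dict.items() if k != 'weather'}
    weather = hour_dict.get('weather') or [{}]   # guard against a missing or empty list
    flat['weather_main'] = weather[0].get('main')
    flat['weather_description'] = weather[0].get('description')
    return flat

# trimmed copy of one hourly record from the response above
example = {'dt': 1726084800, 'temp': 76.64, 'clouds': 24,
           'weather': [{'id': 801, 'main': 'Clouds',
                        'description': 'few clouds', 'icon': '02d'}],
           'pop': 0}

print(flatten_hour(example))
# {'dt': 1726084800, 'temp': 76.64, 'clouds': 24, 'pop': 0, 'weather_main': 'Clouds', 'weather_description': 'few clouds'}
```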

In [25]:
df_hourly = pd.DataFrame()
for hour_dict in weather_dict['hourly']:

    # let's convert from Unix time to a datetime (easier to use)
    hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

    s_hour = pd.Series(hour_dict)
    
    df_hourly = pd.concat([df_hourly, s_hour.to_frame().T], ignore_index=True)
    
df_hourly.head()

|   | dt | temp | feels_like | pressure | humidity | dew_point | uvi | clouds | visibility | wind_speed | wind_deg | wind_gust | weather | pop | datetime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1726077600 | 76.44 | 75.47 | 1021 | 36 | 47.53 | 4.74 | 10 | 10000 | 3.87 | 117 | 5.35 | [{'id': 800, 'main': 'Clear', 'description': '... | 0 | 2024-09-11 14:00:00 |
| 1 | 1726081200 | 76.87 | 75.87 | 1021 | 34 | 46.4 | 3.63 | 9 | 10000 | 5.26 | 123 | 5.68 | [{'id': 800, 'main': 'Clear', 'description': '... | 0 | 2024-09-11 15:00:00 |
| 2 | 1726084800 | 76.64 | 75.6 | 1021 | 34 | 46.2 | 2.08 | 24 | 10000 | 6.33 | 125 | 7.16 | [{'id': 801, 'main': 'Clouds', 'description': ... | 0 | 2024-09-11 16:00:00 |
| 3 | 1726088400 | 75.61 | 74.48 | 1021 | 34 | 45.3 | 0.93 | 35 | 10000 | 6.87 | 124 | 8.95 | [{'id': 802, 'main': 'Clouds', 'description': ... | 0 | 2024-09-11 17:00:00 |
| 4 | 1726092000 | 73.47 | 72.36 | 1021 | 39 | 47.03 | 0.26 | 36 | 10000 | 7.96 | 128 | 12.3 | [{'id': 802, 'main': 'Clouds', 'description': ... | 0 | 2024-09-11 18:00:00 |
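The loop above works, but pandas can also build the whole table in one call from the list of hourly dictionaries, and `pd.to_datetime(..., unit='s')` converts the whole `dt` column at once. The sketch below uses two records copied (trimmed) from the output above so it runs without an API call. Instead of `datetime.fromtimestamp`, which depends on the machine's time zone, it adds the response's own `timezone_offset` (in seconds; -14400 for Boston in September), so the result is the location's local clock time wherever the code runs. This assumes the offset does not change during the forecast window (e.g. no daylight saving switch).

```python
import pandas as pd

# two hourly records copied (trimmed) from the response above
hourly = [{'dt': 1726077600, 'temp': 76.44, 'humidity': 36},
          {'dt': 1726081200, 'temp': 76.87, 'humidity': 34}]
timezone_offset = -14400   # seconds from UTC, as reported in the response

# one call builds the whole table (one row per dict)
df_hourly = pd.DataFrame(hourly)

# Unix seconds -> timestamps in UTC
df_hourly['datetime_utc'] = pd.to_datetime(df_hourly['dt'], unit='s', utc=True)

# shift by the reported offset to get the location's local clock time
# (assumes the offset is constant over the forecast window)
df_hourly['datetime_local'] = pd.to_datetime(df_hourly['dt'] + timezone_offset, unit='s')

print(df_hourly)
```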


## Lecture Break/Practice 3

La Chaux-de-Fonds, Switzerland is located at:

    47.101333° N, 6.825° E
    
1. Create a dataframe of the next 48 hours of their weather as was done above
2. (++) Make a function `get_forecast` which accepts:
    - `lat`
    - `lon`
    - `api_key`
    - `units` (default = 'imperial')
    
    and returns a dataframe of the next 48 hours of the location's weather. Test it on a location of your choice.

In [26]:
# get_forecast(47.101333, 6.825, api_key)

# Storing your API key in a local file

There exists a file `open_weather_access.py` in the same directory as this jupyter notebook which contains:
    
    my_api_key = 'hello!'

In [27]:
from open_weather_access import my_api_key

print(my_api_key)

# from open_weather_access import my_real_api_key
# print(my_real_api_key)

hello!
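Another common way to keep keys out of a notebook is an environment variable, read with `os.environ`. The variable name `OPENWEATHER_API_KEY` and the helper `load_api_key` below are hypothetical; you would normally set the variable in your shell (e.g. `export OPENWEATHER_API_KEY=...` on macOS/Linux) before starting Jupyter. Either way, if you use git, list key files like `open_weather_access.py` in `.gitignore` so the key is never committed.

```python
import os

def load_api_key(var_name='OPENWEATHER_API_KEY'):
    """Hypothetical helper: read an API key from an environment variable."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f'environment variable {var_name} is not set')
    return key

# for demonstration only: set the variable from Python instead of the shell
os.environ['OPENWEATHER_API_KEY'] = 'hello!'
print(load_api_key())   # hello!
```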


## Looking Ahead: Spotipy (for use on Homework 2)

The Spotify API is quite powerful: it gives us access to any song or artist in Spotify's library, plus more information than you might expect. There is also a third-party Python module, `spotipy`, for accessing the API. Open up a terminal and run the command below (or run `%pip install spotipy` in a Jupyter cell):

`pip install spotipy`

In [28]:
# %pip install spotipy

In [29]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

Just like with OpenWeather, we need to make an account [here](https://developer.spotify.com/) (this is essentially the same as making a regular Spotify account) and then get an API key (Spotify actually requires two things: a Client ID and a Client Secret). At the above website, go to:

- Dashboard
- Log into your Spotify account (make one if you don't have one)
- Accept the terms of using the API
- Create an app (you can call it anything, I called mine `DS3000_Spotify`)
- Get a client ID (mine is `592acf2d2dc84d94bbc652f2f1d72375`, though it is usually good practice to **not** share this) and a client secret (**never share this with anyone**: save it in a separate file like we did with our OpenWeather API key earlier)

There exists a file `spotify_secret.py` in the same directory as this jupyter notebook which contains:
    
    secret = 'professorgerberssecretspotify'

In [30]:
from spotify_secret import secret

In [31]:
# Authentication
cid = '592acf2d2dc84d94bbc652f2f1d72375'

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

In [32]:
# Let's take a look at how to get the tracks of a playlist
# You'll do this on HW2
# if you see a "Couldn't read cache" or "Couldn't write token" message, it is only a warning and should not be a problem
playlist_link = "https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M" #Global Top Songs (I still need to make ours!)
playlist_URI = playlist_link.split("/")[-1].split("?")[0]
track_uris = [x["track"]["uri"] for x in sp.playlist_tracks(playlist_URI)["items"]]
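`playlist_tracks` returns one page of results (up to 100 items per page), along with a `next` field pointing at the following page. spotipy's `sp.next(page)` fetches that page, so a loop like the hypothetical `all_playlist_items` below collects every track of a longer playlist. To run without a Spotify account, the usage below passes `FakeSpotify`, a stand-in made up for this example that mimics just these two methods; with a real client you would pass `sp` instead.

```python
def all_playlist_items(sp, playlist_id):
    """Hypothetical helper: collect the items from every page of a playlist."""
    page = sp.playlist_tracks(playlist_id)
    items = list(page['items'])
    while page['next']:           # URL of the next page, or None on the last page
        page = sp.next(page)
        items.extend(page['items'])
    return items


class FakeSpotify:
    """Stand-in for spotipy.Spotify with two pages of fake items (illustration only)."""
    def __init__(self):
        self.pages = [{'items': [{'track': {'uri': 'spotify:track:A'}}], 'next': 'page-2'},
                      {'items': [{'track': {'uri': 'spotify:track:B'}}], 'next': None}]

    def playlist_tracks(self, playlist_id):
        return self.pages[0]

    def next(self, page):
        return self.pages[1] if page['next'] else None


items = all_playlist_items(FakeSpotify(), 'some-playlist-id')
track_uris = [x['track']['uri'] for x in items]
print(track_uris)   # ['spotify:track:A', 'spotify:track:B']
```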

In [33]:
# Notice, this API (while free) can be a little tricky to navigate!
#sp.playlist_tracks(playlist_URI)["items"][0]['track']['name'] #song name
#sp.playlist_tracks(playlist_URI)["items"][0]['track']['album']['name'] #album name
#sp.playlist_tracks(playlist_URI)["items"][0]['track']['album']['artists'][0]['name'] #artist name
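The commented lines above dig into the same few keys of each playlist item. Putting those lookups in one function makes it easy to turn a whole playlist into rows for a DataFrame. The helper `track_row` and its output keys are hypothetical, and the item below is a made-up, heavily trimmed stand-in for one entry of `sp.playlist_tracks(playlist_URI)['items']`, keeping only the keys used in the commented lines.

```python
def track_row(item):
    """Hypothetical helper: pull song, album, and first-artist names out of one playlist item."""
    track = item['track']
    return {'song': track['name'],
            'album': track['album']['name'],
            'artist': track['album']['artists'][0]['name']}

# made-up, trimmed playlist item with the same nesting as the lookups above
item = {'track': {'name': 'Song Title',
                  'album': {'name': 'Album Title',
                            'artists': [{'name': 'Artist Name'}]}}}

print(track_row(item))
# {'song': 'Song Title', 'album': 'Album Title', 'artist': 'Artist Name'}
```

With real data, `pd.DataFrame([track_row(x) for x in sp.playlist_tracks(playlist_URI)['items']])` would give one row per track.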

You will learn more about how to use Spotipy, including the tricky bits that are unique to its usage, on Homework 2. **START THIS EARLY ONCE IT IS RELEASED NEXT WEEK!!**