# Practice

In this practice session we will work on the following topics:

* Dates and times with pandas
* Regular expressions with pandas
* Getting data from APIs with the `requests` library

The APIs that we are going to work with are the following:

* Position of the International Space Station (ISS) API
    * http://open-notify.org/Open-Notify-API/ISS-Location-Now/
    * For this API call, you just need to pass the URL and it will return the current position of the ISS.
* Kanye West quotes API
    * https://kanye.rest/
    * For this API call, you just need to pass the URL and it will return a random Kanye West quote.

### Exercise 1

Use the ISS API to get the current position of the ISS.

In [17]:
import pandas as pd
import requests

url = 'http://api.open-notify.org/iss-now.json'

response = requests.get(url).json()

pd.DataFrame(response)

Unnamed: 0,timestamp,message,iss_position
latitude,1739274179,success,-10.7715
longitude,1739274179,success,80.7359


In [1]:
import pandas as pd
import requests

response = requests.get('http://api.open-notify.org/iss-now.json')
response_json = response.json()

response_json

{'timestamp': 1738060786,
 'message': 'success',
 'iss_position': {'longitude': '-126.0870', 'latitude': '-48.2639'}}
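
Note that the coordinates come back as strings nested under `iss_position`. Below is a minimal sketch of pulling them out as floats, using the response shown above as a hard-coded sample so that it runs offline. The helper name `extract_position` is an illustrative assumption, not part of the API.

```python
# Sample payload copied from the response above (no network call needed).
sample = {
    "timestamp": 1738060786,
    "message": "success",
    "iss_position": {"longitude": "-126.0870", "latitude": "-48.2639"},
}


def extract_position(payload):
    """Return (latitude, longitude) as floats from an iss-now payload.

    Hypothetical helper, for illustration only.
    """
    if payload.get("message") != "success":
        raise ValueError("unexpected API response: %r" % (payload,))
    pos = payload["iss_position"]
    # The API returns the coordinates as strings, so cast them explicitly.
    return float(pos["latitude"]), float(pos["longitude"])


print(extract_position(sample))  # (-48.2639, -126.087)
```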

### Exercise 2

If you check the `timestamp` value in the response, you will see that it is in Unix time. The Unix timestamp represents the number of seconds that have passed since the Unix epoch time (January 1, 1970). Convert this to a timestamp in ISO format (YYYY-MM-DD HH:MM:SS).

You can do that using [pd.to_datetime()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) with the `unit` parameter. Choose the unit that matches the Unix timestamp so that it converts to a timestamp in ISO format.
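
To see why the unit matters, here is a small cross-check with the standard library only. It is a sketch: it uses `datetime.fromtimestamp` rather than pandas, and the timestamp is the one from the sample response in Exercise 1. Reading the number as seconds gives a date in 2025, while reading it as milliseconds lands in January 1970. `pd.to_datetime(unix_ts, unit='s')` should show the same date and time as the first line.

```python
from datetime import datetime, timezone

unix_ts = 1738060786  # timestamp from the sample response in Exercise 1

# Interpret the number as seconds since 1970-01-01 00:00:00 UTC.
as_seconds = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
# Interpret the same number as milliseconds: only about 20 days after the epoch.
as_millis = datetime.fromtimestamp(unix_ts / 1000, tz=timezone.utc)

print(as_seconds.strftime("%Y-%m-%d %H:%M:%S"))  # 2025-01-28 10:39:46
print(as_millis.strftime("%Y-%m-%d %H:%M:%S"))   # 1970-01-21 02:47:40
```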


In [20]:
response['timestamp'] = pd.to_datetime(response['timestamp'], unit = 's')

pd.DataFrame(response)

Unnamed: 0,timestamp,message,iss_position
latitude,2025-02-11 11:42:59,success,-10.7715
longitude,2025-02-11 11:42:59,success,80.7359


In [14]:
unix_ts = response_json['timestamp']

pd.to_datetime(unix_ts, unit='s')

Timestamp('2025-01-28 10:39:46')

### Exercise 3

Using the [sleep function](https://docs.python.org/3/library/time.html#time.sleep) from the `time` library, write a function that prints the current datetime every 5 seconds. The function should stop after 10 iterations.

You can use the function `pd.Timestamp.now()` to get the current datetime at each iteration.
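
The exercise asks for a function, so here is one possible shape for it. This is a sketch under a few assumptions: the name `print_now` and its parameters are made up for illustration, it uses `datetime.now` from the standard library (you could pass `pd.Timestamp.now` as `now_fn` instead), and it skips the wait after the last print.

```python
from datetime import datetime
from time import sleep


def print_now(iterations=10, interval=5, sleep_fn=sleep, now_fn=datetime.now):
    """Print the current datetime `iterations` times, `interval` seconds apart.

    Returns the printed datetimes so the behaviour can be checked.
    """
    seen = []
    for i in range(iterations):
        now = now_fn()
        print(now)
        seen.append(now)
        # Skip the wait after the final print so the function returns promptly.
        if i < iterations - 1:
            sleep_fn(interval)
    return seen


# Quick demonstration with no real waiting.
stamps = print_now(iterations=3, interval=0)
```

Passing the sleep function as a parameter makes it easy to try the loop without actually waiting 50 seconds.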

In [23]:
from time import sleep

# note: this run uses 5 iterations; the exercise asks for 10 (see the next cell)
for i in range(5):
    print(pd.Timestamp.now())
    sleep(5)

2025-02-11 12:45:07.501457
2025-02-11 12:45:12.508077
2025-02-11 12:45:17.518671
2025-02-11 12:45:22.518717
2025-02-11 12:45:27.533160


In [3]:
from time import sleep

iters = range(10)

for i in iters:
    print(pd.Timestamp.now())
    sleep(5)

2025-01-28 11:39:47.045953
2025-01-28 11:39:52.051216
2025-01-28 11:39:57.056865
2025-01-28 11:40:02.062347
2025-01-28 11:40:07.067919
2025-01-28 11:40:12.070182
2025-01-28 11:40:17.075704
2025-01-28 11:40:22.080600
2025-01-28 11:40:27.084802
2025-01-28 11:40:32.087584


### Exercise 4

Create a function that requests the position of the ISS 10 times, using `sleep` to wait 5 s between requests, and returns a list with the dictionaries from the responses.
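
One way to wrap this in a function is sketched below. The names `collect` and `fake_fetch` are hypothetical. The fetcher is passed in as an argument so the sketch can run offline with a stand-in that only mimics the shape of the payload.

```python
from time import sleep


def collect(fetch, n=10, wait=5, sleep_fn=sleep):
    """Call `fetch()` n times, pausing `wait` seconds between calls.

    `fetch` is any zero-argument callable that returns a dict.
    """
    results = []
    for i in range(n):
        results.append(fetch())
        if i < n - 1:  # no need to wait after the last request
            sleep_fn(wait)
    return results


# Stand-in fetcher so the sketch runs offline; it mimics the payload shape only.
def fake_fetch():
    return {
        "timestamp": 0,
        "message": "success",
        "iss_position": {"longitude": "0.0", "latitude": "0.0"},
    }


positions_demo = collect(fake_fetch, n=3, wait=0)
print(len(positions_demo))  # 3
```

With the real API, the call could look like `collect(lambda: requests.get('http://api.open-notify.org/iss-now.json', timeout=10).json())`.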

In [4]:
iters = range(10)
positions = []

for i in iters:
    response = requests.get('http://api.open-notify.org/iss-now.json')
    response_json = response.json()
    positions.append(response_json)
    sleep(5)
    
positions

[{'timestamp': 1738060837,
  'message': 'success',
  'iss_position': {'longitude': '-121.6249', 'latitude': '-49.3484'}},
 {'timestamp': 1738060842,
  'message': 'success',
  'iss_position': {'longitude': '-121.1270', 'latitude': '-49.4549'}},
 {'timestamp': 1738060847,
  'message': 'success',
  'iss_position': {'longitude': '-120.6724', 'latitude': '-49.5497'}},
 {'timestamp': 1738060853,
  'message': 'success',
  'iss_position': {'longitude': '-120.1703', 'latitude': '-49.6517'}},
 {'timestamp': 1738060859,
  'message': 'success',
  'iss_position': {'longitude': '-119.5740', 'latitude': '-49.7691'}},
 {'timestamp': 1738060866,
  'message': 'success',
  'iss_position': {'longitude': '-118.9748', 'latitude': '-49.8832'}},
 {'timestamp': 1738060871,
  'message': 'success',
  'iss_position': {'longitude': '-118.5119', 'latitude': '-49.9686'}},
 {'timestamp': 1738060876,
  'message': 'success',
  'iss_position': {'longitude': '-118.0009', 'latitude': '-50.0602'}},
 {'timestamp': 173806088

### Exercise 5

Create a DataFrame with the responses from the previous exercise. The DataFrame should have the following columns:

* `timestamp`: the timestamp of the response
* `latitude`: the latitude of the ISS
* `longitude`: the longitude of the ISS
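
One possible approach, offered as a sketch rather than the expected answer, is `pd.json_normalize`, which flattens the nested `iss_position` dictionary into dotted column names. The two sample responses are copied from the Exercise 4 output, and the variable names are illustrative.

```python
import pandas as pd

# Two responses copied from the Exercise 4 output, used as offline sample data.
sample_positions = [
    {"timestamp": 1738060837, "message": "success",
     "iss_position": {"longitude": "-121.6249", "latitude": "-49.3484"}},
    {"timestamp": 1738060842, "message": "success",
     "iss_position": {"longitude": "-121.1270", "latitude": "-49.4549"}},
]

# json_normalize flattens nested dicts into dotted column names.
flat = pd.json_normalize(sample_positions)
flat = flat.rename(columns={
    "iss_position.latitude": "latitude",
    "iss_position.longitude": "longitude",
})

# Coordinates arrive as strings; timestamps arrive as Unix seconds.
flat = flat.astype({"latitude": float, "longitude": float})
flat["timestamp"] = pd.to_datetime(flat["timestamp"], unit="s")

# Keep only the three requested columns, in the requested order.
flat = flat[["timestamp", "latitude", "longitude"]]
print(flat)
```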

In [5]:
df_positions = pd.DataFrame(positions)

# break the sub-dictionary 'iss_position' into two columns: latitude and longitude
df_positions['latitude'] = df_positions['iss_position'].apply(lambda x: x['latitude']).astype(float)
df_positions['longitude'] = df_positions['iss_position'].apply(lambda x: x['longitude']).astype(float)

# drop the 'iss_position' and 'message' columns
df_positions.drop(columns=['iss_position', 'message'], inplace=True)

# convert the 'timestamp' column to a datetime object
df_positions['timestamp'] = pd.to_datetime(df_positions['timestamp'], unit='s')

df_positions

Unnamed: 0,timestamp,latitude,longitude
0,2025-01-28 10:40:37,-49.3484,-121.6249
1,2025-01-28 10:40:42,-49.4549,-121.127
2,2025-01-28 10:40:47,-49.5497,-120.6724
3,2025-01-28 10:40:53,-49.6517,-120.1703
4,2025-01-28 10:40:59,-49.7691,-119.574
5,2025-01-28 10:41:06,-49.8832,-118.9748
6,2025-01-28 10:41:11,-49.9686,-118.5119
7,2025-01-28 10:41:16,-50.0602,-118.0009
8,2025-01-28 10:41:24,-50.1812,-117.3006
9,2025-01-28 10:41:29,-50.267,-116.7846


In [6]:
df_positions.dtypes

timestamp    datetime64[ns]
latitude            float64
longitude           float64
dtype: object

### Exercise 6

Read about the `diff` method in pandas and use it to calculate the differences between the timestamps of consecutive requests. Why is the difference not always exactly 5 s?
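
As a small illustration of what `diff` does, the sketch below uses four timestamps taken from the Exercise 4 output. Each gap is the 5 s sleep plus however long the request itself took, and the API reports whole seconds, which is presumably why some gaps come out as 6 s or more. The second part shows why `.dt.total_seconds()` is a safer choice than `.dt.seconds`, which only returns the seconds component and drops whole days.

```python
import pandas as pd

# Unix timestamps (seconds) of four responses from the Exercise 4 output.
ts = pd.Series(pd.to_datetime([1738060837, 1738060842, 1738060847, 1738060853], unit="s"))

# diff() subtracts the previous row from each row; the first row has no predecessor.
gaps = ts.diff().dt.total_seconds()
print(gaps.tolist())  # [nan, 5.0, 5.0, 6.0]

# .dt.seconds ignores the days part of a timedelta, total_seconds() does not.
long_gap = pd.Series(pd.to_timedelta(["1 days 00:00:05"]))
print(long_gap.dt.seconds.tolist())          # [5]
print(long_gap.dt.total_seconds().tolist())  # [86405.0]
```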

In [7]:
# total_seconds() returns the full duration; .dt.seconds would drop whole days
df_positions['timestamp_diff'] = df_positions['timestamp'].diff().dt.total_seconds()

df_positions

Unnamed: 0,timestamp,latitude,longitude,timestamp_diff
0,2025-01-28 10:40:37,-49.3484,-121.6249,
1,2025-01-28 10:40:42,-49.4549,-121.127,5.0
2,2025-01-28 10:40:47,-49.5497,-120.6724,5.0
3,2025-01-28 10:40:53,-49.6517,-120.1703,6.0
4,2025-01-28 10:40:59,-49.7691,-119.574,6.0
5,2025-01-28 10:41:06,-49.8832,-118.9748,7.0
6,2025-01-28 10:41:11,-49.9686,-118.5119,5.0
7,2025-01-28 10:41:16,-50.0602,-118.0009,5.0
8,2025-01-28 10:41:24,-50.1812,-117.3006,8.0
9,2025-01-28 10:41:29,-50.267,-116.7846,5.0


### Exercise 7

I've changed my mind and now we need a new column that contains tuples with the latitude and longitude of the ISS. Create this column.

In [8]:
df_positions['position_tuple'] = list(zip(df_positions['latitude'], df_positions['longitude']))

df_positions

Unnamed: 0,timestamp,latitude,longitude,timestamp_diff,position_tuple
0,2025-01-28 10:40:37,-49.3484,-121.6249,,"(-49.3484, -121.6249)"
1,2025-01-28 10:40:42,-49.4549,-121.127,5.0,"(-49.4549, -121.127)"
2,2025-01-28 10:40:47,-49.5497,-120.6724,5.0,"(-49.5497, -120.6724)"
3,2025-01-28 10:40:53,-49.6517,-120.1703,6.0,"(-49.6517, -120.1703)"
4,2025-01-28 10:40:59,-49.7691,-119.574,6.0,"(-49.7691, -119.574)"
5,2025-01-28 10:41:06,-49.8832,-118.9748,7.0,"(-49.8832, -118.9748)"
6,2025-01-28 10:41:11,-49.9686,-118.5119,5.0,"(-49.9686, -118.5119)"
7,2025-01-28 10:41:16,-50.0602,-118.0009,5.0,"(-50.0602, -118.0009)"
8,2025-01-28 10:41:24,-50.1812,-117.3006,8.0,"(-50.1812, -117.3006)"
9,2025-01-28 10:41:29,-50.267,-116.7846,5.0,"(-50.267, -116.7846)"


### Exercise 8

Take the column with the tuples, and zip it to itself in this way:

```python
df['new_column'] = list(zip(df['position'].shift(), df['position']))
```
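
To make the pattern concrete, here is a tiny sketch on a made-up Series of three tuples. `shift()` moves every value one row down, so zipping the shifted Series with the original pairs each position with the one before it. The first row has no predecessor and gets a missing value.

```python
import pandas as pd

# Made-up positions, for illustration only.
s = pd.Series([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])

# Pair each value with the value from the previous row.
pairs = list(zip(s.shift(), s))
for previous, current in pairs:
    print(previous, "->", current)
```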

In [9]:
df_positions['pos_start_end'] = list(zip(df_positions['position_tuple'].shift(), df_positions['position_tuple']))

df_positions

Unnamed: 0,timestamp,latitude,longitude,timestamp_diff,position_tuple,pos_start_end
0,2025-01-28 10:40:37,-49.3484,-121.6249,,"(-49.3484, -121.6249)","(None, (-49.3484, -121.6249))"
1,2025-01-28 10:40:42,-49.4549,-121.127,5.0,"(-49.4549, -121.127)","((-49.3484, -121.6249), (-49.4549, -121.127))"
2,2025-01-28 10:40:47,-49.5497,-120.6724,5.0,"(-49.5497, -120.6724)","((-49.4549, -121.127), (-49.5497, -120.6724))"
3,2025-01-28 10:40:53,-49.6517,-120.1703,6.0,"(-49.6517, -120.1703)","((-49.5497, -120.6724), (-49.6517, -120.1703))"
4,2025-01-28 10:40:59,-49.7691,-119.574,6.0,"(-49.7691, -119.574)","((-49.6517, -120.1703), (-49.7691, -119.574))"
5,2025-01-28 10:41:06,-49.8832,-118.9748,7.0,"(-49.8832, -118.9748)","((-49.7691, -119.574), (-49.8832, -118.9748))"
6,2025-01-28 10:41:11,-49.9686,-118.5119,5.0,"(-49.9686, -118.5119)","((-49.8832, -118.9748), (-49.9686, -118.5119))"
7,2025-01-28 10:41:16,-50.0602,-118.0009,5.0,"(-50.0602, -118.0009)","((-49.9686, -118.5119), (-50.0602, -118.0009))"
8,2025-01-28 10:41:24,-50.1812,-117.3006,8.0,"(-50.1812, -117.3006)","((-50.0602, -118.0009), (-50.1812, -117.3006))"
9,2025-01-28 10:41:29,-50.267,-116.7846,5.0,"(-50.267, -116.7846)","((-50.1812, -117.3006), (-50.267, -116.7846))"


### Exercise 9

Use the `haversine` [library](https://pypi.org/project/haversine/) with a lambda function on the column with the two positions you just calculated to compute the distance between the two points. How can you deal with the missing value in the first row?

The usage of the haversine library is as follows:

```python
from haversine import haversine

coord1 = (52.2296756, 21.0122287) # (lat, lon)
coord2 = (52.406374, 16.9251681) # (lat, lon)

haversine(coord1, coord2) # distance in km
```

Now calculate the speed of the ISS between the two points. The speed should be stored in a new column in the DataFrame, in km/h.
$$\text{speed} = \frac{\text{distance}}{\text{time}}$$


Extra: If you want to calculate manually the distance between two points given their latitude and longitude, you can use the [haversine formula](https://en.wikipedia.org/wiki/Haversine_formula).
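
For the manual route, a minimal sketch of the haversine formula using only the `math` module could look like this. The function names and the mean Earth radius of 6371.0088 km are assumptions made for this example. The two points are the first two ISS positions from Exercise 5, recorded 5 s apart, so the result should land close to what the `haversine` library returns for that pair.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, p1)
    lat2, lon2 = map(radians, p2)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, a)))


def speed_kmh(distance_km, seconds):
    """Convert a distance covered in `seconds` into km/h."""
    return distance_km / seconds * 3600


# First two ISS positions from Exercise 5, recorded 5 s apart.
d = haversine_km((-49.3484, -121.6249), (-49.4549, -121.1270))
print(d, speed_kmh(d, 5))
```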

In [10]:
from haversine import haversine

# dealing with NaN values in the first row of the 'pos_start_end' column
df_positions['pos_start_end'] = df_positions['pos_start_end'].apply(lambda x: (x[0], x[1]) if pd.notnull(x[0]) else (x[1], x[1]))

# calculate distance and speed
df_positions['distance'] = df_positions['pos_start_end'].apply(lambda x: haversine(x[0], x[1]))
df_positions['speed_kmh'] = df_positions['distance'] / df_positions['timestamp_diff'] * 3600

df_positions


Unnamed: 0,timestamp,latitude,longitude,timestamp_diff,position_tuple,pos_start_end,distance,speed_kmh
0,2025-01-28 10:40:37,-49.3484,-121.6249,,"(-49.3484, -121.6249)","((-49.3484, -121.6249), (-49.3484, -121.6249))",0.0,
1,2025-01-28 10:40:42,-49.4549,-121.127,5.0,"(-49.4549, -121.127)","((-49.3484, -121.6249), (-49.4549, -121.127))",37.924522,27305.655643
2,2025-01-28 10:40:47,-49.5497,-120.6724,5.0,"(-49.5497, -120.6724)","((-49.4549, -121.127), (-49.5497, -120.6724))",34.478472,24824.499938
3,2025-01-28 10:40:53,-49.6517,-120.1703,6.0,"(-49.6517, -120.1703)","((-49.5497, -120.6724), (-49.6517, -120.1703))",37.920498,22752.29881
4,2025-01-28 10:40:59,-49.7691,-119.574,6.0,"(-49.7691, -119.574)","((-49.6517, -120.1703), (-49.7691, -119.574))",44.819711,26891.826756
5,2025-01-28 10:41:06,-49.8832,-118.9748,7.0,"(-49.8832, -118.9748)","((-49.7691, -119.574), (-49.8832, -118.9748))",44.815637,23048.041945
6,2025-01-28 10:41:11,-49.9686,-118.5119,5.0,"(-49.9686, -118.5119)","((-49.8832, -118.9748), (-49.9686, -118.5119))",34.470406,24818.692208
7,2025-01-28 10:41:16,-50.0602,-118.0009,5.0,"(-50.0602, -118.0009)","((-49.9686, -118.5119), (-50.0602, -118.0009))",37.906646,27292.784971
8,2025-01-28 10:41:24,-50.1812,-117.3006,8.0,"(-50.1812, -117.3006)","((-50.0602, -118.0009), (-50.1812, -117.3006))",51.708922,23269.014804
9,2025-01-28 10:41:29,-50.267,-116.7846,5.0,"(-50.267, -116.7846)","((-50.1812, -117.3006), (-50.267, -116.7846))",37.928249,27308.339109


### Exercise 10

Let's change APIs. Use the Kanye West API to get 10 quotes. Create a DataFrame with the quotes and the timestamp of the request.

In this API you don't get the timestamp. Build it yourself with the `pd.Timestamp.now()` function.

In [12]:
iters = range(10)
quotes = []

for i in iters:
    # get a quote from Kanye's API
    response = requests.get('https://api.kanye.rest')
    response_json = response.json()
    
    # add the timestamp to the response
    response_json['timestamp'] = pd.Timestamp.now()
    
    # append the response to the list of quotes
    quotes.append(response_json)

    # wait for 5 seconds
    sleep(5)
    
quotes

[{'quote': 'I am running for President of the United States',
  'timestamp': Timestamp('2025-01-28 11:49:36.254744')},
 {'quote': "Let's be like water",
  'timestamp': Timestamp('2025-01-28 11:49:41.341841')},
 {'quote': 'I am one of the most famous people on the planet',
  'timestamp': Timestamp('2025-01-28 11:49:46.416351')},
 {'quote': "People always say that you can't please everybody. I think that's a cop-out. Why not attempt it? Cause think of all the people that you will please if you try.",
  'timestamp': Timestamp('2025-01-28 11:49:51.491018')},
 {'quote': '2024', 'timestamp': Timestamp('2025-01-28 11:49:56.566302')},
 {'quote': 'Decentralize',
  'timestamp': Timestamp('2025-01-28 11:50:01.638714')},
 {'quote': 'I watch Bladerunner on repeat',
  'timestamp': Timestamp('2025-01-28 11:50:06.742456')},
 {'quote': "You can't look at a glass half full or empty if it's overflowing.",
  'timestamp': Timestamp('2025-01-28 11:50:11.800766')},
 {'quote': 'Sometimes you have to get rid o

### Exercise 11

Convert the list of quotes into a DataFrame and use a regex with `findall` to count the words in each quote. Save the count as a new column.
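
The choice of pattern affects the count. The sketch below uses `re.findall` from the standard library on a few of the quotes (`Series.str.findall` applies a pattern row by row in the same way). With `\w+`, a contraction such as "Let's" is split into two matches, while the alternative pattern `[\w']+` keeps it together. The helper name `count_words` is illustrative.

```python
import re


def count_words(text, pattern=r"\w+"):
    """Count the non-overlapping matches of `pattern` in `text`."""
    return len(re.findall(pattern, text))


quotes_demo = ["Let's be like water", "I watch Bladerunner on repeat", "2024"]

for q in quotes_demo:
    plain = count_words(q)                        # "Let's" counts as "Let" and "s"
    with_apostrophes = count_words(q, r"[\w']+")  # "Let's" counts as one word
    print(q, plain, with_apostrophes)
```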

In [17]:
df_kanye = pd.DataFrame(quotes)

df_kanye['count_words'] = df_kanye['quote'].str.findall(r'\w+').str.len()

df_kanye

Unnamed: 0,quote,timestamp,count_words
0,I am running for President of the United States,2025-01-28 11:49:36.254744,9
1,Let's be like water,2025-01-28 11:49:41.341841,5
2,I am one of the most famous people on the planet,2025-01-28 11:49:46.416351,11
3,People always say that you can't please everyb...,2025-01-28 11:49:51.491018,33
4,2024,2025-01-28 11:49:56.566302,1
5,Decentralize,2025-01-28 11:50:01.638714,1
6,I watch Bladerunner on repeat,2025-01-28 11:50:06.742456,5
7,You can't look at a glass half full or empty i...,2025-01-28 11:50:11.800766,15
8,Sometimes you have to get rid of everything,2025-01-28 11:50:16.874773,8
9,I've known my mom since I was zero years old. ...,2025-01-28 11:50:21.984768,15


### Exercise 12

Create a new column that contains a boolean value that is True if the quote contains the word "I" and False otherwise.

Read about the `\b` regex pattern and use it.
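
As a quick illustration of what `\b` does, the sketch below uses the standard `re` module on a few short strings. Some are shortened quotes and "India is large" is made up for the example. `\b` matches at a word boundary, so `\bI\b` finds "I" only when it stands alone as a word. That includes "I've", because the apostrophe is not a word character.

```python
import re

pattern = re.compile(r"\bI\b")

examples = [
    "I am one of the most famous people",
    "I've known my mom",
    "India is large",
    "Decentralize",
]

results = {text: bool(pattern.search(text)) for text in examples}
for text, found in results.items():
    print(text, found)

# Without the boundaries, the capital "I" inside "India" would also match.
print(bool(re.search(r"I", "India is large")))  # True
```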

In [18]:
df_kanye['contains_I'] = df_kanye['quote'].str.contains(r'\bI\b')

df_kanye

Unnamed: 0,quote,timestamp,count_words,contains_I
0,I am running for President of the United States,2025-01-28 11:49:36.254744,9,True
1,Let's be like water,2025-01-28 11:49:41.341841,5,False
2,I am one of the most famous people on the planet,2025-01-28 11:49:46.416351,11,True
3,People always say that you can't please everyb...,2025-01-28 11:49:51.491018,33,True
4,2024,2025-01-28 11:49:56.566302,1,False
5,Decentralize,2025-01-28 11:50:01.638714,1,False
6,I watch Bladerunner on repeat,2025-01-28 11:50:06.742456,5,True
7,You can't look at a glass half full or empty i...,2025-01-28 11:50:11.800766,15,False
8,Sometimes you have to get rid of everything,2025-01-28 11:50:16.874773,8,False
9,I've known my mom since I was zero years old. ...,2025-01-28 11:50:21.984768,15,True
