# Practice: 

In this practice session we will work on the following topics:

* Dates and times with pandas
* Regular expressions with pandas
* Getting data from APIs with the `requests` library

The APIs that we are going to work with are the following:

* Position of the International Space Station (ISS) API
    * http://open-notify.org/Open-Notify-API/ISS-Location-Now/
    * For this API call, you just need to pass the URL and it will return the current position of the ISS.
* Kanye West quotes API
    * https://kanye.rest/
    * For this API call, you just need to pass the URL and it will return a random Kanye West quote.

### Exercise 1

Use the ISS API to get the current position of the ISS.

In [1]:
import requests 
import json 
import pandas as pd

request = requests.get('http://api.open-notify.org/iss-now.json')
df = pd.DataFrame(request.json())

df

Unnamed: 0,iss_position,timestamp,message
latitude,-37.9305,1738328920,success
longitude,-102.4884,1738328920,success


### Exercise 2

If you check the `timestamp` value in the response, you will see that it is in Unix time. A Unix timestamp is the number of seconds that have passed since the Unix epoch (January 1, 1970, 00:00:00 UTC). Convert this to a timestamp in ISO format (YYYY-MM-DD HH:MM:SS).

You can do that using [pd.to_datetime()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) with the parameter `unit`. Choose the right unit to convert the Unix timestamp to a timestamp in ISO format.


In [2]:
df['time'] = pd.to_datetime(df['timestamp'], unit = 's')
df

Unnamed: 0,iss_position,timestamp,message,time
latitude,-37.9305,1738328920,success,2025-01-31 13:08:40
longitude,-102.4884,1738328920,success,2025-01-31 13:08:40
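
The choice of `unit` matters because the same integer is read very differently depending on it. The following is a small sketch using the timestamp value from the output above; the variable names are only illustrative.

```python
import pandas as pd

unix_seconds = 1738328920  # timestamp value from the ISS response above

# Interpreted as seconds since 1970-01-01 00:00:00 UTC
as_seconds = pd.to_datetime(unix_seconds, unit="s")

# Interpreted as milliseconds, the same number lands in January 1970
as_millis = pd.to_datetime(unix_seconds, unit="ms")

# String in the requested YYYY-MM-DD HH:MM:SS layout
iso_string = as_seconds.strftime("%Y-%m-%d %H:%M:%S")

print(iso_string)      # 2025-01-31 13:08:40
print(as_millis.year)  # 1970
```

The result is timezone-naive and expressed in UTC, so it can differ from local clock values such as those returned by `pd.Timestamp.now()`. Passing `utc=True` to `pd.to_datetime` makes the time zone explicit.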


### Exercise 3

Using the [sleep function](https://docs.python.org/3/library/time.html#time.sleep) from the `time` library, write a function that prints the current datetime every 5 seconds. The function should stop after 10 iterations.

You can use the function `pd.Timestamp.now()` to get the current datetime at each iteration.

In [3]:
import time 

def time_every_5_seconds():
    for i in range(10):
        print(pd.Timestamp.now())
        time.sleep(5)

time_every_5_seconds()

2025-01-31 14:08:58.150965
2025-01-31 14:09:03.166099
2025-01-31 14:09:08.168717
2025-01-31 14:09:13.180011
2025-01-31 14:09:18.180689
2025-01-31 14:09:23.193865
2025-01-31 14:09:28.196318
2025-01-31 14:09:33.200729
2025-01-31 14:09:38.214297
2025-01-31 14:09:43.219522


### Exercise 4

Create a function that requests the position of the ISS 10 times, using `sleep` to wait 5 s between requests, and returns a list with the dictionaries from the responses.

In [4]:
requests.get('http://api.open-notify.org/iss-now.json').json()['timestamp']

1738328988

In [5]:
def get_iss_position_10():
    list_of_responses = []
    for i in range(10):
        response = requests.get('http://api.open-notify.org/iss-now.json').json()
        list_of_responses.append(response)
        time.sleep(5)
    return list_of_responses

iss_positions = get_iss_position_10()

In [6]:
iss_positions

[{'iss_position': {'latitude': '-34.6771', 'longitude': '-97.9346'},
  'timestamp': 1738329000,
  'message': 'success'},
 {'iss_position': {'latitude': '-34.4216', 'longitude': '-97.6042'},
  'timestamp': 1738329006,
  'message': 'success'},
 {'iss_position': {'latitude': '-33.8872', 'longitude': '-96.9243'},
  'timestamp': 1738329018,
  'message': 'success'},
 {'iss_position': {'latitude': '-33.6506', 'longitude': '-96.6281'},
  'timestamp': 1738329024,
  'message': 'success'},
 {'iss_position': {'latitude': '-33.4133', 'longitude': '-96.3336'},
  'timestamp': 1738329029,
  'message': 'success'},
 {'iss_position': {'latitude': '-33.1969', 'longitude': '-96.0675'},
  'timestamp': 1738329034,
  'message': 'success'},
 {'iss_position': {'latitude': '-32.9580', 'longitude': '-95.7763'},
  'timestamp': 1738329040,
  'message': 'success'},
 {'iss_position': {'latitude': '-32.6747', 'longitude': '-95.4343'},
  'timestamp': 1738329046,
  'message': 'success'},
 {'iss_position': {'latitude': '

### Exercise 5

Create a DataFrame with the responses from the previous exercise. The DataFrame should have the following columns:

* `timestamp`: the timestamp of the response
* `latitude`: the latitude of the ISS
* `longitude`: the longitude of the ISS

In [8]:
new_df = pd.DataFrame(iss_positions)

new_df['latitude'] = new_df['iss_position'].apply(lambda x: x['latitude'])
new_df['longitute'] = new_df['iss_position'].apply(lambda x: x['longitude'])

df_cleaned = new_df.drop(columns = ['iss_position', 'message'])
df_cleaned['timestamp'] = pd.to_datetime(df_cleaned['timestamp'], unit = 's')
df_cleaned

Unnamed: 0,timestamp,latitude,longitute
0,2025-01-31 13:10:00,-34.6771,-97.9346
1,2025-01-31 13:10:06,-34.4216,-97.6042
2,2025-01-31 13:10:18,-33.8872,-96.9243
3,2025-01-31 13:10:24,-33.6506,-96.6281
4,2025-01-31 13:10:29,-33.4133,-96.3336
5,2025-01-31 13:10:34,-33.1969,-96.0675
6,2025-01-31 13:10:40,-32.958,-95.7763
7,2025-01-31 13:10:46,-32.6747,-95.4343
8,2025-01-31 13:10:52,-32.4123,-95.1208
9,2025-01-31 13:11:01,-32.039,-94.6799
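
As an alternative to the `apply` calls, the nested `iss_position` dictionary can probably be flattened in one step with `pd.json_normalize`. This is a sketch on two responses copied from the exercise 4 output; the names `sample_responses`, `flat` and `tidy` are illustrative. It also converts the coordinates to floats, since the API returns them as strings.

```python
import pandas as pd

# Two responses copied from the output of exercise 4
sample_responses = [
    {"iss_position": {"latitude": "-34.6771", "longitude": "-97.9346"},
     "timestamp": 1738329000, "message": "success"},
    {"iss_position": {"latitude": "-34.4216", "longitude": "-97.6042"},
     "timestamp": 1738329006, "message": "success"},
]

# json_normalize flattens nested keys into dotted column names
flat = pd.json_normalize(sample_responses)

tidy = (
    flat.rename(columns={
        "iss_position.latitude": "latitude",
        "iss_position.longitude": "longitude",
    })
    .loc[:, ["timestamp", "latitude", "longitude"]]
    # The API returns coordinates as strings, so convert them to floats
    .astype({"latitude": float, "longitude": float})
    .assign(timestamp=lambda d: pd.to_datetime(d["timestamp"], unit="s"))
)

print(tidy.dtypes)
```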


### Exercise 6

Read about the `diff` method in pandas and use it to calculate the difference between the timestamps of consecutive requests. Why is it not exactly 5 s?

In [9]:
df_cleaned['timestamp_diff_seconds'] = df_cleaned['timestamp'].diff().dt.seconds
df_cleaned  # Not exactly 5 s: each gap is the 5 s sleep plus the response time of the request, and the API reports whole seconds

Unnamed: 0,timestamp,latitude,longitute,timestamp_diff_seconds
0,2025-01-31 13:10:00,-34.6771,-97.9346,
1,2025-01-31 13:10:06,-34.4216,-97.6042,6.0
2,2025-01-31 13:10:18,-33.8872,-96.9243,12.0
3,2025-01-31 13:10:24,-33.6506,-96.6281,6.0
4,2025-01-31 13:10:29,-33.4133,-96.3336,5.0
5,2025-01-31 13:10:34,-33.1969,-96.0675,5.0
6,2025-01-31 13:10:40,-32.958,-95.7763,6.0
7,2025-01-31 13:10:46,-32.6747,-95.4343,6.0
8,2025-01-31 13:10:52,-32.4123,-95.1208,6.0
9,2025-01-31 13:11:01,-32.039,-94.6799,9.0


Unnamed: 0,timestamp,latitude,longitude,timestamp_diff
0,2025-01-28 10:40:37,-49.3484,-121.6249,
1,2025-01-28 10:40:42,-49.4549,-121.127,5.0
2,2025-01-28 10:40:47,-49.5497,-120.6724,5.0
3,2025-01-28 10:40:53,-49.6517,-120.1703,6.0
4,2025-01-28 10:40:59,-49.7691,-119.574,6.0
5,2025-01-28 10:41:06,-49.8832,-118.9748,7.0
6,2025-01-28 10:41:11,-49.9686,-118.5119,5.0
7,2025-01-28 10:41:16,-50.0602,-118.0009,5.0
8,2025-01-28 10:41:24,-50.1812,-117.3006,8.0
9,2025-01-28 10:41:29,-50.267,-116.7846,5.0
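
To illustrate what `diff` does on a datetime column, here is a minimal sketch on three timestamps taken from the first table above. It also shows a detail worth knowing: `.seconds` is only the seconds component of a timedelta, so it agrees with `.total_seconds()` for short gaps like these but not once a gap spans whole days.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    "2025-01-31 13:10:00",
    "2025-01-31 13:10:06",
    "2025-01-31 13:10:18",
]))

# diff() subtracts the previous row; the first row has no predecessor
gaps = stamps.diff()
gap_seconds = gaps.dt.total_seconds()
print(gap_seconds.tolist())  # [nan, 6.0, 12.0]

# .seconds ignores whole days, total_seconds() does not
long_gap = pd.Timedelta(days=1, seconds=5)
print(long_gap.seconds)          # 5
print(long_gap.total_seconds())  # 86405.0
```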


### Exercise 7

I've changed my mind and now we need a new column that contains tuples with the latitude and longitude of the ISS. Create this column.

In [14]:
df_cleaned['tuple_lon_lat'] = list(zip(df_cleaned['longitute'], df_cleaned['latitude']))

df_cleaned

Unnamed: 0,timestamp,latitude,longitute,timestamp_diff_seconds,tuple_lon_lat
0,2025-01-31 13:10:00,-34.6771,-97.9346,,"(-97.9346, -34.6771)"
1,2025-01-31 13:10:06,-34.4216,-97.6042,6.0,"(-97.6042, -34.4216)"
2,2025-01-31 13:10:18,-33.8872,-96.9243,12.0,"(-96.9243, -33.8872)"
3,2025-01-31 13:10:24,-33.6506,-96.6281,6.0,"(-96.6281, -33.6506)"
4,2025-01-31 13:10:29,-33.4133,-96.3336,5.0,"(-96.3336, -33.4133)"
5,2025-01-31 13:10:34,-33.1969,-96.0675,5.0,"(-96.0675, -33.1969)"
6,2025-01-31 13:10:40,-32.958,-95.7763,6.0,"(-95.7763, -32.9580)"
7,2025-01-31 13:10:46,-32.6747,-95.4343,6.0,"(-95.4343, -32.6747)"
8,2025-01-31 13:10:52,-32.4123,-95.1208,6.0,"(-95.1208, -32.4123)"
9,2025-01-31 13:11:01,-32.039,-94.6799,9.0,"(-94.6799, -32.0390)"


Unnamed: 0,timestamp,latitude,longitude,timestamp_diff,position_tuple
0,2025-01-28 10:40:37,-49.3484,-121.6249,,"(-49.3484, -121.6249)"
1,2025-01-28 10:40:42,-49.4549,-121.127,5.0,"(-49.4549, -121.127)"
2,2025-01-28 10:40:47,-49.5497,-120.6724,5.0,"(-49.5497, -120.6724)"
3,2025-01-28 10:40:53,-49.6517,-120.1703,6.0,"(-49.6517, -120.1703)"
4,2025-01-28 10:40:59,-49.7691,-119.574,6.0,"(-49.7691, -119.574)"
5,2025-01-28 10:41:06,-49.8832,-118.9748,7.0,"(-49.8832, -118.9748)"
6,2025-01-28 10:41:11,-49.9686,-118.5119,5.0,"(-49.9686, -118.5119)"
7,2025-01-28 10:41:16,-50.0602,-118.0009,5.0,"(-50.0602, -118.0009)"
8,2025-01-28 10:41:24,-50.1812,-117.3006,8.0,"(-50.1812, -117.3006)"
9,2025-01-28 10:41:29,-50.267,-116.7846,5.0,"(-50.267, -116.7846)"


### Exercise 8

Take the column with the tuples, and zip it to itself in this way:

```python
df['new_column'] = list(zip(df['position'].shift(), df['position']))
```

In [18]:
df_cleaned['pos_start_end'] = list(zip(df_cleaned['tuple_lon_lat'].shift(), df_cleaned['tuple_lon_lat']))

df_cleaned

Unnamed: 0,timestamp,latitude,longitute,timestamp_diff_seconds,tuple_lon_lat,new_column,pos_start_end
0,2025-01-31 13:10:00,-34.6771,-97.9346,,"(-97.9346, -34.6771)","(None, (-97.9346, -34.6771))","(None, (-97.9346, -34.6771))"
1,2025-01-31 13:10:06,-34.4216,-97.6042,6.0,"(-97.6042, -34.4216)","((-97.9346, -34.6771), (-97.6042, -34.4216))","((-97.9346, -34.6771), (-97.6042, -34.4216))"
2,2025-01-31 13:10:18,-33.8872,-96.9243,12.0,"(-96.9243, -33.8872)","((-97.6042, -34.4216), (-96.9243, -33.8872))","((-97.6042, -34.4216), (-96.9243, -33.8872))"
3,2025-01-31 13:10:24,-33.6506,-96.6281,6.0,"(-96.6281, -33.6506)","((-96.9243, -33.8872), (-96.6281, -33.6506))","((-96.9243, -33.8872), (-96.6281, -33.6506))"
4,2025-01-31 13:10:29,-33.4133,-96.3336,5.0,"(-96.3336, -33.4133)","((-96.6281, -33.6506), (-96.3336, -33.4133))","((-96.6281, -33.6506), (-96.3336, -33.4133))"
5,2025-01-31 13:10:34,-33.1969,-96.0675,5.0,"(-96.0675, -33.1969)","((-96.3336, -33.4133), (-96.0675, -33.1969))","((-96.3336, -33.4133), (-96.0675, -33.1969))"
6,2025-01-31 13:10:40,-32.958,-95.7763,6.0,"(-95.7763, -32.9580)","((-96.0675, -33.1969), (-95.7763, -32.9580))","((-96.0675, -33.1969), (-95.7763, -32.9580))"
7,2025-01-31 13:10:46,-32.6747,-95.4343,6.0,"(-95.4343, -32.6747)","((-95.7763, -32.9580), (-95.4343, -32.6747))","((-95.7763, -32.9580), (-95.4343, -32.6747))"
8,2025-01-31 13:10:52,-32.4123,-95.1208,6.0,"(-95.1208, -32.4123)","((-95.4343, -32.6747), (-95.1208, -32.4123))","((-95.4343, -32.6747), (-95.1208, -32.4123))"
9,2025-01-31 13:11:01,-32.039,-94.6799,9.0,"(-94.6799, -32.0390)","((-95.1208, -32.4123), (-94.6799, -32.0390))","((-95.1208, -32.4123), (-94.6799, -32.0390))"


Unnamed: 0,timestamp,latitude,longitude,timestamp_diff,position_tuple,pos_start_end
0,2025-01-28 10:40:37,-49.3484,-121.6249,,"(-49.3484, -121.6249)","(None, (-49.3484, -121.6249))"
1,2025-01-28 10:40:42,-49.4549,-121.127,5.0,"(-49.4549, -121.127)","((-49.3484, -121.6249), (-49.4549, -121.127))"
2,2025-01-28 10:40:47,-49.5497,-120.6724,5.0,"(-49.5497, -120.6724)","((-49.4549, -121.127), (-49.5497, -120.6724))"
3,2025-01-28 10:40:53,-49.6517,-120.1703,6.0,"(-49.6517, -120.1703)","((-49.5497, -120.6724), (-49.6517, -120.1703))"
4,2025-01-28 10:40:59,-49.7691,-119.574,6.0,"(-49.7691, -119.574)","((-49.6517, -120.1703), (-49.7691, -119.574))"
5,2025-01-28 10:41:06,-49.8832,-118.9748,7.0,"(-49.8832, -118.9748)","((-49.7691, -119.574), (-49.8832, -118.9748))"
6,2025-01-28 10:41:11,-49.9686,-118.5119,5.0,"(-49.9686, -118.5119)","((-49.8832, -118.9748), (-49.9686, -118.5119))"
7,2025-01-28 10:41:16,-50.0602,-118.0009,5.0,"(-50.0602, -118.0009)","((-49.9686, -118.5119), (-50.0602, -118.0009))"
8,2025-01-28 10:41:24,-50.1812,-117.3006,8.0,"(-50.1812, -117.3006)","((-50.0602, -118.0009), (-50.1812, -117.3006))"
9,2025-01-28 10:41:29,-50.267,-116.7846,5.0,"(-50.267, -116.7846)","((-50.1812, -117.3006), (-50.267, -116.7846))"
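
The reason this works is that `shift()` moves every value down one row, so zipping the shifted column with the original pairs each position with the one before it. A minimal sketch on three positions from the table above (the names `positions`, `previous` and `segments` are illustrative):

```python
import pandas as pd

positions = pd.Series([
    (-49.3484, -121.6249),
    (-49.4549, -121.127),
    (-49.5497, -120.6724),
])

# shift() moves every value down one row, leaving a missing value at the top
previous = positions.shift()

# Pair each position with the one before it: (start, end)
segments = list(zip(previous, positions))

print(segments[1])              # ((-49.3484, -121.6249), (-49.4549, -121.127))
print(pd.isna(segments[0][0]))  # True: the first row has no start point
```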


### Exercise 9

Use the `haversine` [library](https://pypi.org/project/haversine/) with a lambda function on the column with the two positions you just calculated, to calculate the distance between two points. How can you deal with the NaN values in the first row?

The usage of the haversine library is as follows:

```python
from haversine import haversine

coord1 = (52.2296756, 21.0122287) # (lat, lon)
coord2 = (52.406374, 16.9251681) # (lat, lon)

haversine(coord1, coord2) # distance in km
```

Now calculate the speed of the ISS between two points and store it in a new column of the DataFrame, in km/h. Since the time differences are in seconds, convert them to hours first.
$$\text{speed} = \frac{\text{distance}}{\text{time}}$$


Extra: If you want to calculate the distance between two points manually from their latitude and longitude, you can use the [haversine formula](https://en.wikipedia.org/wiki/Haversine_formula).

In [19]:
from haversine import haversine

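# The tuples hold (lon, lat) strings, but haversine expects (lat, lon) floats
start, end = df_cleaned['pos_start_end'][1]
cord1 = (float(start[1]), float(start[0]))
cord2 = (float(end[1]), float(end[0]))

haversine(cord1, cord2)  # distance in km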

Unnamed: 0,timestamp,latitude,longitude,timestamp_diff,position_tuple,pos_start_end,distance,speed_kmh
0,2025-01-28 10:40:37,-49.3484,-121.6249,,"(-49.3484, -121.6249)","((-49.3484, -121.6249), (-49.3484, -121.6249))",0.0,
1,2025-01-28 10:40:42,-49.4549,-121.127,5.0,"(-49.4549, -121.127)","((-49.3484, -121.6249), (-49.4549, -121.127))",37.924522,27305.655643
2,2025-01-28 10:40:47,-49.5497,-120.6724,5.0,"(-49.5497, -120.6724)","((-49.4549, -121.127), (-49.5497, -120.6724))",34.478472,24824.499938
3,2025-01-28 10:40:53,-49.6517,-120.1703,6.0,"(-49.6517, -120.1703)","((-49.5497, -120.6724), (-49.6517, -120.1703))",37.920498,22752.29881
4,2025-01-28 10:40:59,-49.7691,-119.574,6.0,"(-49.7691, -119.574)","((-49.6517, -120.1703), (-49.7691, -119.574))",44.819711,26891.826756
5,2025-01-28 10:41:06,-49.8832,-118.9748,7.0,"(-49.8832, -118.9748)","((-49.7691, -119.574), (-49.8832, -118.9748))",44.815637,23048.041945
6,2025-01-28 10:41:11,-49.9686,-118.5119,5.0,"(-49.9686, -118.5119)","((-49.8832, -118.9748), (-49.9686, -118.5119))",34.470406,24818.692208
7,2025-01-28 10:41:16,-50.0602,-118.0009,5.0,"(-50.0602, -118.0009)","((-49.9686, -118.5119), (-50.0602, -118.0009))",37.906646,27292.784971
8,2025-01-28 10:41:24,-50.1812,-117.3006,8.0,"(-50.1812, -117.3006)","((-50.0602, -118.0009), (-50.1812, -117.3006))",51.708922,23269.014804
9,2025-01-28 10:41:29,-50.267,-116.7846,5.0,"(-50.267, -116.7846)","((-50.1812, -117.3006), (-50.267, -116.7846))",37.928249,27308.339109
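
For the manual route, the haversine formula and the speed conversion can be sketched with the standard library only. The function names and the Earth radius constant below are assumptions made for this illustration, not part of the `haversine` package. Points are `(lat, lon)` in degrees, and an incomplete first row yields NaN instead of raising an error.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(start, end):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, start)
    lat2, lon2 = map(math.radians, end)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_kmh(segment, elapsed_seconds):
    """Speed over one (start, end) segment; NaN when the segment is incomplete."""
    start, end = segment
    if not isinstance(start, tuple) or math.isnan(elapsed_seconds):
        return math.nan
    # km per second times 3600 gives km/h
    return haversine_km(start, end) / elapsed_seconds * 3600


# Second row of the table above: two positions taken 5 s apart
segment = ((-49.3484, -121.6249), (-49.4549, -121.127))
print(round(haversine_km(*segment), 1))
print(round(speed_kmh(segment, 5.0)))

# First row: no previous position and no time difference
print(speed_kmh((None, segment[1]), math.nan))  # nan
```

On a DataFrame, the same function could be applied row by row, for example with `df.apply(lambda row: speed_kmh(row['pos_start_end'], row['timestamp_diff']), axis=1)`, assuming the column names of the second table.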


### Exercise 10

Let's change APIs. Use the Kanye West API to get 10 quotes. Create a DataFrame with the quotes and the timestamp of the request.

In this API you don't get the timestamp. Build it yourself with the `pd.Timestamp.now()` function.
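
One possible approach is sketched here. It assumes the JSON endpoint is `https://api.kanye.rest` and that the response body looks like `{"quote": "..."}`; both are assumptions to check against the API page. The helper names `fetch_quote` and `collect_quotes` are hypothetical, and the 5 s delay mirrors the ISS exercise.

```python
import time

import pandas as pd
import requests

# Assumed endpoint: the JSON API behind https://kanye.rest/
KANYE_URL = "https://api.kanye.rest"


def fetch_quote(url=KANYE_URL):
    """Request one random quote; assumes a JSON body like {"quote": "..."}."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()["quote"]


def collect_quotes(n=10, delay=5, fetch=fetch_quote):
    """Call fetch() n times, stamping each quote with the local request time."""
    records = []
    for i in range(n):
        records.append({"quote": fetch(), "timestamp": pd.Timestamp.now()})
        if i < n - 1:
            time.sleep(delay)  # no need to wait after the last request
    return records


# Offline demonstration with a stand-in fetch function (no network needed)
canned = iter(["first quote", "second quote"])
demo_df = pd.DataFrame(collect_quotes(n=2, delay=0, fetch=lambda: next(canned)))
print(demo_df.columns.tolist())  # ['quote', 'timestamp']
```

Against the real API, `collect_quotes()` with its defaults would return a list of dictionaries shaped like the output below.
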

[{'quote': 'I am running for President of the United States',
  'timestamp': Timestamp('2025-01-28 11:49:36.254744')},
 {'quote': "Let's be like water",
  'timestamp': Timestamp('2025-01-28 11:49:41.341841')},
 {'quote': 'I am one of the most famous people on the planet',
  'timestamp': Timestamp('2025-01-28 11:49:46.416351')},
 {'quote': "People always say that you can't please everybody. I think that's a cop-out. Why not attempt it? Cause think of all the people that you will please if you try.",
  'timestamp': Timestamp('2025-01-28 11:49:51.491018')},
 {'quote': '2024', 'timestamp': Timestamp('2025-01-28 11:49:56.566302')},
 {'quote': 'Decentralize',
  'timestamp': Timestamp('2025-01-28 11:50:01.638714')},
 {'quote': 'I watch Bladerunner on repeat',
  'timestamp': Timestamp('2025-01-28 11:50:06.742456')},
 {'quote': "You can't look at a glass half full or empty if it's overflowing.",
  'timestamp': Timestamp('2025-01-28 11:50:11.800766')},
 {'quote': 'Sometimes you have to get rid o

### Exercise 11

Convert it into a DataFrame and, using regex and `findall`, count the words in each quote. Save the count as a new column.
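
A minimal sketch of the counting step, using three quotes from the previous output. It assumes a "word" is any run of word characters (`\w+`), which is one reasonable definition among several.

```python
import re

import pandas as pd

quotes = pd.DataFrame({"quote": [
    "I am running for President of the United States",
    "Let's be like water",
    "2024",
]})

# \w+ matches runs of letters, digits and underscores,
# so the apostrophe splits "Let's" into two words
quotes["count_words"] = quotes["quote"].str.findall(r"\w+").str.len()
print(quotes["count_words"].tolist())  # [9, 5, 1]

# The same pattern with the standard library, for a single string
print(len(re.findall(r"\w+", "Let's be like water")))  # 5
```
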

Unnamed: 0,quote,timestamp,count_words
0,I am running for President of the United States,2025-01-28 11:49:36.254744,9
1,Let's be like water,2025-01-28 11:49:41.341841,5
2,I am one of the most famous people on the planet,2025-01-28 11:49:46.416351,11
3,People always say that you can't please everyb...,2025-01-28 11:49:51.491018,33
4,2024,2025-01-28 11:49:56.566302,1
5,Decentralize,2025-01-28 11:50:01.638714,1
6,I watch Bladerunner on repeat,2025-01-28 11:50:06.742456,5
7,You can't look at a glass half full or empty i...,2025-01-28 11:50:11.800766,15
8,Sometimes you have to get rid of everything,2025-01-28 11:50:16.874773,8
9,I've known my mom since I was zero years old. ...,2025-01-28 11:50:21.984768,15


### Exercise 12

Create a new column that contains a boolean value that is True if the quote contains the word "I" and False otherwise.

Read about the `\b` regex pattern and use it.
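
As a sketch of why `\b` helps: it matches a word boundary, so `\bI\b` finds "I" only as a whole word. The first three strings come from the quotes above; the last two are made up for this illustration.

```python
import pandas as pd

quotes = pd.Series([
    "I watch Bladerunner on repeat",
    "Decentralize",
    "Let's be like water",
    "It is raining",    # made-up example
    "I've changed",     # made-up example
])

# \b marks a word boundary, so "It" does not count but "I've" does
contains_I = quotes.str.contains(r"\bI\b")
print(contains_I.tolist())  # [True, False, False, False, True]

# A plain substring check also matches the "I" inside "It"
naive = quotes.str.contains("I", regex=False)
print(naive.tolist())  # [True, False, False, True, True]
```
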

Unnamed: 0,quote,timestamp,count_words,contains_I
0,I am running for President of the United States,2025-01-28 11:49:36.254744,9,True
1,Let's be like water,2025-01-28 11:49:41.341841,5,False
2,I am one of the most famous people on the planet,2025-01-28 11:49:46.416351,11,True
3,People always say that you can't please everyb...,2025-01-28 11:49:51.491018,33,True
4,2024,2025-01-28 11:49:56.566302,1,False
5,Decentralize,2025-01-28 11:50:01.638714,1,False
6,I watch Bladerunner on repeat,2025-01-28 11:50:06.742456,5,True
7,You can't look at a glass half full or empty i...,2025-01-28 11:50:11.800766,15,False
8,Sometimes you have to get rid of everything,2025-01-28 11:50:16.874773,8,False
9,I've known my mom since I was zero years old. ...,2025-01-28 11:50:21.984768,15,True
