# Data Collection

We use a free, open API to get the data. Credit to https://www.7timer.info/, which provides weather forecast information for a given geolocation.
For example, Shanghai (longitude 121.474, latitude 31.23) has the endpoint: https://www.7timer.info/bin/api.pl?lon=121.474&lat=31.23&product=civil&output=json

This endpoint returns the weather forecast for the next few days.
In this example, the `product=civil` parameter selects the CIVIL product and `output=json` asks for the result in JSON format.
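To see which parts of the endpoint depend on the location, the query string can be decoded into its parameters. This is a small standard-library sketch (`urllib.parse`), not part of the original notebook; only `lon` and `lat` are expected to change from city to city.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.7timer.info/bin/api.pl?lon=121.474&lat=31.23&product=civil&output=json"

# Split off the query string and decode it into {name: value};
# parse_qs returns a list of values per name, so keep the single value.
params = {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}
print(params)
# {'lon': '121.474', 'lat': '31.23', 'product': 'civil', 'output': 'json'}
```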

## Send Request

To fetch the URL from Python, we use the popular third-party package `requests`:

```
! pip install requests

```
The `requests` documentation is at https://docs.python-requests.org/en/latest/; its quickstart is enough to start coding:
https://docs.python-requests.org/en/latest/user/quickstart/

```python
import requests

url = 'https://www.7timer.info/bin/api.pl?lon=121.474&lat=31.23&product=civil&output=json'

r = requests.get(url)
print(r)
print(r.encoding)
print(type(r.text))
```

```
<Response [200]>
UTF-8
<class 'str'>
```


Next we parse the response body into a Python object such as a dict. Note that `requests` does not decode the body into a dict for us: `r.text` is a plain string, which here contains JSON because the URL asks for `output=json`. Python ships with the standard `json` package, so we just import it and call `json.loads` on the text.

```python
import json

text_j = json.loads(r.text)
print(type(text_j))
```

```
<class 'dict'>
```
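The request and parsing cells above can be folded into one reusable helper. This is a sketch, not the notebook's code: `API_URL`, `forecast_params` and `fetch_forecast` are hypothetical names, and it swaps the hand-written query string for the `params=` argument of `requests.get`, adds a `timeout`, calls `raise_for_status()` so an HTTP error page is not parsed as data, and uses `Response.json()`, which does the same job as `json.loads(r.text)` for a JSON body.

```python
import requests

# Endpoint used throughout this notebook; the names below are hypothetical.
API_URL = "https://www.7timer.info/bin/api.pl"


def forecast_params(lon, lat, product="civil"):
    """Query parameters for a 7Timer! forecast returned as JSON."""
    return {"lon": lon, "lat": lat, "product": product, "output": "json"}


def fetch_forecast(lon, lat, product="civil", timeout=10):
    """Request the forecast and return the parsed JSON body as a dict."""
    r = requests.get(API_URL, params=forecast_params(lon, lat, product), timeout=timeout)
    r.raise_for_status()  # raise requests.HTTPError on a 4xx/5xx status
    return r.json()       # equivalent to json.loads(r.text) here


# Usage (needs network access):
# text_j = fetch_forecast(121.474, 31.23)
```

Passing the coordinates as `params` lets `requests` build and encode the query string, so the same function works for any location.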


Let's take a quick look at the `text_j` dict.

```python
text_j.keys()
# dict_keys(['product', 'init', 'dataseries'])

text_j['product']
# 'civil'

text_j['init']
# '2022020200'

text_j['dataseries'][0]
```

```
{'timepoint': 3,
 'cloudcover': 9,
 'lifted_index': 15,
 'prec_type': 'rain',
 'prec_amount': 1,
 'temp2m': 4,
 'rh2m': '74%',
 'wind10m': {'direction': 'NE', 'speed': 3},
 'weather': 'lightrainday'}
```

The fields of each record are described in the 7Timer! wiki: https://github.com/Yeqzids/7timer-issues/wiki/Wiki
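Before tabulating, it helps to see that `dataseries` is simply a list of dicts with the same keys, where only `wind10m` is nested. A small sketch using the record shown above as hard-coded sample data; `summarize` is a hypothetical helper, and no units are assumed for the values.

```python
# Sample data: the first dataseries record shown above.
dataseries = [
    {"timepoint": 3, "cloudcover": 9, "lifted_index": 15, "prec_type": "rain",
     "prec_amount": 1, "temp2m": 4, "rh2m": "74%",
     "wind10m": {"direction": "NE", "speed": 3}, "weather": "lightrainday"},
]


def summarize(record):
    """One-line summary of a single forecast record (hypothetical helper)."""
    wind = record["wind10m"]  # the only nested field
    return (
        f"+{record['timepoint']}h {record['weather']}: "
        f"temp2m={record['temp2m']}, rh2m={record['rh2m']}, "
        f"wind={wind['direction']}/{wind['speed']}"
    )


# Every record shares the same keys, which is what makes a table a good fit.
same_keys = all(r.keys() == dataseries[0].keys() for r in dataseries)

for record in dataseries:
    print(summarize(record))
# +3h lightrainday: temp2m=4, rh2m=74%, wind=NE/3
```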

## Parse Result
Most of the useful information lives in the `dataseries` element. Since it is a list of dicts (one dict per forecast time), a pandas DataFrame is a natural way to tabulate it.


```python
import pandas as pd

weather_info = pd.DataFrame(text_j['dataseries'])
weather_info.head(5)
```

| | timepoint | cloudcover | lifted_index | prec_type | prec_amount | temp2m | rh2m | wind10m | weather |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 9 | 15 | rain | 1 | 4 | 74% | {'direction': 'NE', 'speed': 3} | lightrainday |
| 1 | 6 | 9 | 15 | rain | 2 | 4 | 75% | {'direction': 'NE', 'speed': 3} | lightrainday |
| 2 | 9 | 9 | 15 | rain | 2 | 4 | 72% | {'direction': 'NE', 'speed': 3} | lightrainday |
| 3 | 12 | 9 | 15 | rain | 2 | 3 | 85% | {'direction': 'NE', 'speed': 3} | lightrainnight |
| 4 | 15 | 9 | 15 | rain | 2 | 3 | 92% | {'direction': 'NE', 'speed': 3} | lightrainnight |
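Two columns in this table are awkward to analyze: `wind10m` holds a dict and `rh2m` holds a percentage string. One option, not used in the original code, is `pd.json_normalize`, which flattens nested dicts into dotted column names. The sketch below applies it to the first two records from the table (hard-coded as sample data) and converts `rh2m` to integers.

```python
import pandas as pd

# Sample data: the first two records from the table above, in raw list-of-dict form.
dataseries = [
    {"timepoint": 3, "cloudcover": 9, "lifted_index": 15, "prec_type": "rain",
     "prec_amount": 1, "temp2m": 4, "rh2m": "74%",
     "wind10m": {"direction": "NE", "speed": 3}, "weather": "lightrainday"},
    {"timepoint": 6, "cloudcover": 9, "lifted_index": 15, "prec_type": "rain",
     "prec_amount": 2, "temp2m": 4, "rh2m": "75%",
     "wind10m": {"direction": "NE", "speed": 3}, "weather": "lightrainday"},
]

# json_normalize replaces the nested wind10m dict with
# "wind10m.direction" and "wind10m.speed" columns.
flat = pd.json_normalize(dataseries)

# rh2m arrives as a string such as "74%"; strip the sign and store an integer.
flat["rh2m"] = flat["rh2m"].str.rstrip("%").astype(int)

print(flat[["timepoint", "rh2m", "wind10m.direction", "wind10m.speed"]])
```

With the live response, the same call would be `pd.json_normalize(text_j['dataseries'])`.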


## Transform Data

### string to datetime

The `init` time is important, but it is returned as a string such as `'2022020200'`, i.e. in `YYYYMMDDHH` form. It is good practice to convert it to a datetime. Since we already use pandas for the DataFrame, we stay consistent and use `pd.to_datetime` rather than the `datetime` module.

```python
start_time = pd.to_datetime(text_j['init'], format='%Y%m%d%H')
start_time
```

```
Timestamp('2022-02-02 00:00:00')
```
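`pd.to_datetime` and the standard library's `datetime.strptime` accept the same format codes, so either would work; pandas is used only for consistency with the DataFrame. A quick sketch checking that both parse the `init` string shown above to the same moment:

```python
from datetime import datetime

import pandas as pd

init = "2022020200"  # YYYYMMDDHH, the value of text_j['init'] above

# The same format string works for both parsers.
start_pd = pd.to_datetime(init, format="%Y%m%d%H")
start_dt = datetime.strptime(init, "%Y%m%d%H")

print(start_pd)              # 2022-02-02 00:00:00
print(start_pd == start_dt)  # True
```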

### int to timedelta
The `timepoint` column holds the number of hours after the `init` time, so we can turn each value back into an absolute timestamp:
1. convert the integer hours to a timedelta;
2. add each timedelta to the init time.

```python
# convert timepoint (hours after init) to timedelta
weather_info['timepoint'] = pd.to_timedelta(weather_info['timepoint'], unit='h')

# shift the init time by each timedelta to get absolute timestamps
weather_info['timestamp'] = start_time + weather_info['timepoint']

weather_info.head(5)
```

| | timepoint | cloudcover | lifted_index | prec_type | prec_amount | temp2m | rh2m | wind10m | weather | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 days 03:00:00 | 9 | 15 | rain | 1 | 4 | 74% | {'direction': 'NE', 'speed': 3} | lightrainday | 2022-02-02 03:00:00 |
| 1 | 0 days 06:00:00 | 9 | 15 | rain | 2 | 4 | 75% | {'direction': 'NE', 'speed': 3} | lightrainday | 2022-02-02 06:00:00 |
| 2 | 0 days 09:00:00 | 9 | 15 | rain | 2 | 4 | 72% | {'direction': 'NE', 'speed': 3} | lightrainday | 2022-02-02 09:00:00 |
| 3 | 0 days 12:00:00 | 9 | 15 | rain | 2 | 3 | 85% | {'direction': 'NE', 'speed': 3} | lightrainnight | 2022-02-02 12:00:00 |
| 4 | 0 days 15:00:00 | 9 | 15 | rain | 2 | 3 | 92% | {'direction': 'NE', 'speed': 3} | lightrainnight | 2022-02-02 15:00:00 |
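A variant of the two steps, sketched on a small frame built from the `timepoint` and `temp2m` values in the table above: keeping `timepoint` as integer hours (instead of overwriting it) preserves the original offsets, and using the new `timestamp` column as the index allows selecting rows by time. This is one possible layout, not the notebook's.

```python
import pandas as pd

# Sample data: timepoint (hours after init) and temp2m from the table above.
weather_info = pd.DataFrame({"timepoint": [3, 6, 9, 12, 15],
                             "temp2m": [4, 4, 4, 3, 3]})
start_time = pd.to_datetime("2022020200", format="%Y%m%d%H")

# Add absolute times in one step; the integer timepoint column is left untouched.
weather_info["timestamp"] = start_time + pd.to_timedelta(weather_info["timepoint"], unit="h")

# A DatetimeIndex supports slicing by time strings (both ends inclusive).
by_time = weather_info.set_index("timestamp")
print(by_time.loc["2022-02-02 06:00":"2022-02-02 12:00"])
```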



## From City Name to Geolocation
What if we want to get the weather by entering a city name? Then we first need the city's geolocation (longitude and latitude). Fortunately there is no need to reinvent the wheel: several libraries already do this, and geopy is one of them.

```
pip install geopy
```
***More info about geopy:***
https://geopy.readthedocs.io/en/stable/

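```python
from geopy.geocoders import Nominatim

# Nominatim's usage policy asks for a user_agent that identifies your own application
geolocator = Nominatim(user_agent='weather-tutorial')
location = geolocator.geocode("shanghai")
print(location.address)
print((location.latitude, location.longitude))
```

```
上海市, 黄浦区, 上海市, 200001, 中国
(31.2322758, 121.4692071)
```

The address comes back in Chinese: "Shanghai, Huangpu District, Shanghai, 200001, China".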
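Putting the two halves together gives a city-name-to-endpoint step. Note that geopy's `geocode` returns `None` when nothing matches, so `location.address` would then fail with `AttributeError`. In this sketch, `city_to_forecast_url` is a hypothetical helper: it takes the geocoding function as an argument (for example `geolocator.geocode` from the cell above), raises `ValueError` for unknown places, and builds the same endpoint format used at the start of this section without rounding the coordinates.

```python
from urllib.parse import urlencode

API_URL = "https://www.7timer.info/bin/api.pl"  # endpoint used earlier in this notebook


def city_to_forecast_url(city, geocode, product="civil"):
    """Geocode a city name and build the 7Timer! forecast URL for it.

    `geocode` is any callable that takes a query string and returns an
    object with .latitude and .longitude, or None when nothing matches
    (geopy's Nominatim(...).geocode behaves this way).
    """
    location = geocode(city)
    if location is None:
        raise ValueError(f"could not geocode {city!r}")
    params = {
        "lon": location.longitude,
        "lat": location.latitude,
        "product": product,
        "output": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


# Usage (needs geopy and network access):
# url = city_to_forecast_url("shanghai", geolocator.geocode)
```

When geocoding many cities, geopy's `RateLimiter` (from `geopy.extra.rate_limiter`) can wrap the function, e.g. `RateLimiter(geolocator.geocode, min_delay_seconds=1)`, to stay within Nominatim's limit of one request per second.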


We now have every step needed to request the weather forecast from a city name. Before moving on to the next feature, this code should be organized into reusable functions.