# Seminar - APIs, DBs and Live coding

In [None]:
import requests # for making HTTP requests
import pandas as pd # for dataframes
import time # for sleep
import re # regular expressions

## Task 1: Requesting API

We will work with data from sreality.cz, which is accessible through its API. We suspect that the API limits the number of requests, but we have not verified this.

### 1a. Create a function requesting data from sreality

```python
i = 1  # page number to request
base_url = 'https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_type_cb=1&locality_region_id=10&per_page=60&page={}'.format(i)

r = requests.get(base_url)
d = r.json()
```

0) the function should be parametrized by:
    * `category_main_cb` - `{'flat':1, 'house':2, 'land':3 }`
    * `category_type_cb` - `{'sell':1,'rent':2}`
    * `locality_region_id` - use 10 as the default value
    * `page` - the page number
1) use string inputs for `category_main_cb` and `category_type_cb`
2) include a `try/except` clause to handle errors
3) the function should return the JSON response parsed into Python types
4) do not forget to sleep for at least 0.5 s after each request

In [None]:
def request_sreality(page, category_main_str, category_type_str, locality_region_id=10):
    """
    Request data from sreality.cz API
    :param page: page number
    :param category_main_str: category of the property ('flat', 'house' or 'land')
    :param category_type_str: type of the offer ('sell' or 'rent')
    :param locality_region_id: region id
    :return json: parsed JSON response, or None if the request fails
    """
    category_mains = {'flat': 1, 'house': 2, 'land': 3}
    category_types = {'sell': 1, 'rent': 2}
    template_url = 'https://www.sreality.cz/api/cs/v2/estates?category_main_cb={category_main}&category_type_cb={category_type}&locality_region_id={locality_region_id}&per_page=60&page={page}'
    request_url = template_url.format(
        category_main=category_mains[category_main_str],
        category_type=category_types[category_type_str],
        locality_region_id=locality_region_id,
        page=page
    )
    try:
        r = requests.get(request_url)
        r.raise_for_status()  # raise on HTTP error codes (4xx/5xx)
        data = r.json()
    except (requests.RequestException, ValueError) as e:
        # network problems, HTTP errors, or a body that is not valid JSON
        print('Request failed: {}'.format(e))
        data = None
    time.sleep(0.5)  # pause at least 0.5 s between requests
    return data

d = request_sreality(0, 'flat', 'sell', 10)

Inspect the element `d`:
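
One way to get a feel for a nested response is to print its keys and value types level by level. The sketch below does this on a small hand-made dictionary, so it runs without a network call. The `sample` payload, its `_embedded` / `estates` layout, and the helper name `describe` are assumptions for illustration; check them against the real `d`.

```python
# Hypothetical, heavily trimmed stand-in for the response `d`
sample = {
    "_embedded": {
        "estates": [
            {"name": "Prodej bytu 2+kk 54 m²", "price": 5990000, "hash_id": 1},
            {"name": "Prodej bytu 3+1 78 m²", "price": 8490000, "hash_id": 2},
        ]
    },
    "page": 1,
    "per_page": 60,
}

def describe(obj, depth=0, max_depth=2):
    """Return indented lines listing keys and value types of a nested dict."""
    lines = []
    if isinstance(obj, dict) and depth <= max_depth:
        for key, value in obj.items():
            lines.append("  " * depth + f"{key}: {type(value).__name__}")
            lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj and depth <= max_depth:
        # for lists, only report the length and the type of the first item
        lines.append("  " * depth + f"[{len(obj)} items, first is {type(obj[0]).__name__}]")
    return lines

print("\n".join(describe(sample)))
```

On the real data, `describe(d)` gives a quick overview of where the list of offers sits before you write the conversion function.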

### 1b. Create a function converting sreality json data into pandas dataframe

In [None]:
def convert_sreality_data_to_df(sreality_data):
    return

raw = convert_sreality_data_to_df(d)

In [None]:
raw.head()
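
A minimal sketch of the conversion, assuming the offers sit in a list of dictionaries under `_embedded` and then `estates` (verify this on your own `d`). The name `convert_sketch` and the `sample` payload are hypothetical and only keep the example offline.

```python
import pandas as pd

def convert_sketch(sreality_data):
    """Build a DataFrame with one row per offer; empty if the keys are missing."""
    estates = sreality_data.get("_embedded", {}).get("estates", [])
    return pd.DataFrame(estates)

# Hypothetical trimmed payload standing in for the real response
sample = {
    "_embedded": {
        "estates": [
            {"name": "Prodej bytu 2+kk 54 m²", "price": 5990000, "hash_id": 1},
            {"name": "Prodej bytu 3+1 78 m²", "price": 8490000, "hash_id": 2},
        ]
    }
}

raw_sketch = convert_sketch(sample)
print(raw_sketch.shape)
```

A list of flat dictionaries maps directly onto rows and columns, so `pd.DataFrame` needs no extra arguments here. Nested values such as dictionaries or lists stay as Python objects in their cells.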

### 1c. Link function `1b` into function `1a`

In [None]:
df = request_sreality(0, 'flat', 'sell', 10)
df.head()

### 1d. Combine multiple requests into a single df

* The function should be parametrized by:
    * `start_page` and `end_page`
    * the request parameters
* construct a list of the individual request dfs
* then feed it into the `pd.concat` function

In [None]:
raw.shape

In [None]:
request_sreality

In [None]:
def request_multiply_sreality(start_page, end_page, category_main_str, category_type_str, locality_region_id=10):
    
    return pd.concat(list_of_dfs)

df = request_multiply_sreality(1, 5, 'flat', 'sell',10)
df.shape
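
The loop-and-concat pattern can be sketched as follows. To keep it runnable offline, the per-page request is passed in as a function; `request_pages_sketch` and `fake_fetch` are hypothetical names, and treating `end_page` as inclusive is an assumption.

```python
import time
import pandas as pd

def request_pages_sketch(fetch_page, start_page, end_page, pause=0.5):
    """Fetch pages start_page..end_page (inclusive) and stack them into one df."""
    list_of_dfs = []
    for page in range(start_page, end_page + 1):
        list_of_dfs.append(fetch_page(page))
        time.sleep(pause)  # pause between requests
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(list_of_dfs, ignore_index=True)

def fake_fetch(page):
    """Hypothetical stand-in for one request; returns two rows per page."""
    return pd.DataFrame({"hash_id": [page * 10, page * 10 + 1], "page": [page, page]})

df_sketch = request_pages_sketch(fake_fetch, 1, 5, pause=0)
print(df_sketch.shape)
```

In the seminar version, `fetch_page` would be replaced by a call that requests one page and converts it to a dataframe.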

## Task 2: Cleaning data

### 2a. Filter columns
* filter only columns: `['locality', 'price', 'name', 'gps','hash_id','exclusively_at_rk']`
* use `.copy()` to avoid `SettingWithCopyWarning` later
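
A small sketch of the column filter on made-up data. The frame `df_demo`, its values, and the extra column `seo` are hypothetical; only the list of kept columns comes from the task.

```python
import pandas as pd

# Hypothetical two-row frame with one column we do not want to keep
df_demo = pd.DataFrame({
    "locality": ["Praha 2", "Praha 6"],
    "price": [5990000, 8490000],
    "name": ["Prodej bytu 2+kk 54 m²", "Prodej bytu 3+1 78 m²"],
    "gps": [{"lat": 50.08, "lon": 14.43}, {"lat": 50.10, "lon": 14.39}],
    "hash_id": [1, 2],
    "exclusively_at_rk": [0, 1],
    "seo": ["a", "b"],
})

columns = ['locality', 'price', 'name', 'gps', 'hash_id', 'exclusively_at_rk']
# .copy() makes an independent frame, so later assignments do not warn
clean_demo = df_demo[columns].copy()
clean_demo['price_mil'] = clean_demo['price'] / 1e6
```

Without `.copy()`, pandas may not know whether `clean_demo` is a view of `df_demo`, which is what triggers `SettingWithCopyWarning` when a new column is assigned.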


### 2b. GPS
* Convert the dictionary in the `gps` column into two columns - `lat` and `lon`
* use the `apply` function on the `gps` column
* Note that `apply` can return multiple columns when the applied function returns a `pd.Series`
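
The sketch below shows `apply` producing two columns at once. It assumes each `gps` value is a dictionary with the keys `lat` and `lon`; the coordinates and the name `clean_demo` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a dictionary in each gps cell
clean_demo = pd.DataFrame({
    "hash_id": [1, 2],
    "gps": [{"lat": 50.08, "lon": 14.43}, {"lat": 49.19, "lon": 16.61}],
})

# Returning a Series from the applied function yields one column per Series key
coords = clean_demo["gps"].apply(lambda g: pd.Series({"lat": g["lat"], "lon": g["lon"]}))

# join aligns on the row index and adds the new columns
clean_demo = clean_demo.join(coords)
print(clean_demo[["lat", "lon"]])
```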

### 2c. Get flat type from name

* The name always has the form `Prodej bytu [type of flat] [area] m²`
* Try picking the third word of the string
* Check that the result is meaningful using `.value_counts()`
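
Picking the third word can be sketched with the vectorized string methods. The example names below are made up and follow the pattern described above.

```python
import pandas as pd

# Hypothetical offer names
names = pd.Series([
    "Prodej bytu 2+kk 54 m²",
    "Prodej bytu 3+1 78 m²",
    "Prodej bytu 2+kk 61 m²",
    "Prodej bytu atypické 120 m²",
])

# split on whitespace, then take the element at position 2 (the third word)
flat_type = names.str.split().str[2]
print(flat_type.value_counts())
```

If `value_counts()` shows values that do not look like flat types, the name pattern has exceptions and the extraction needs to be more careful.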

### 2d. Get the area of a flat from name

* Naive approach: select the second-to-last word
* Then try navigating using the index of `'m²'`
* If this also fails, you will need to use a regex - `import re`

In [None]:
def name_to_area(nm):

    return 

clean['area_2'] = clean.name.apply(name_to_area)
clean
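
A possible regex version, as a sketch only: it looks for a whole number directly before `m²` and returns `None` when there is no match. The name `name_to_area_sketch` and the example strings (including the suffix in parentheses) are hypothetical.

```python
import re

# one or more digits, optional whitespace, then the unit
AREA_PATTERN = re.compile(r"(\d+)\s*m²")

def name_to_area_sketch(nm):
    """Return the whole number directly before 'm²' as an int, or None."""
    match = AREA_PATTERN.search(nm)
    return int(match.group(1)) if match else None

examples = [
    "Prodej bytu 2+kk 54 m²",
    "Prodej bytu 3+1 78 m² (Podkrovní)",
    "Prodej bytu 4+kk",
]
print([name_to_area_sketch(nm) for nm in examples])
```

Anchoring on the unit rather than on word position keeps the extraction working when extra words follow the area. In Python 3, `\s` also matches non-breaking spaces, which can appear in scraped text.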

## Bonus Tasks: Convert `labelsAll` into categorical variables

### 4a. Get all possible label names

* Deal with the nested-list structure
* Hint: try to sum the whole column
* You need to iterate through all labels in all rows
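
The summing hint can be sketched as below, assuming each `labelsAll` cell holds a list of lists of label strings. The frame `offers` and all label names except `cellar` are made up.

```python
import pandas as pd

# Hypothetical offers; each cell is a list of lists of labels
offers = pd.DataFrame({
    "labelsAll": [
        [["cellar", "balcony"], ["metro", "shop"]],
        [["elevator"], ["metro"]],
        [[], ["shop", "school"]],
    ]
})

# Summing lists concatenates them: first pass gives one list of inner lists
inner_lists = sum(offers["labelsAll"], [])
# Second pass flattens the inner lists into individual label strings
flat_labels = sum(inner_lists, [])

all_labels = sorted(set(flat_labels))
print(all_labels)
```

Repeated list addition is fine for a few hundred rows. For large data, `itertools.chain.from_iterable` does the same flattening without building intermediate lists.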

### 4b. Test existence of label `cellar` for offers

* Again, deal with the nested list-of-lists structure
* Write a generic function `test_existence_of_label(offer_labels, label)`
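
A minimal sketch of such a check, assuming `offer_labels` is a list of lists of strings. The function is named `label_exists_sketch` here to mark it as an illustration of the role of `test_existence_of_label`; the example labels are made up.

```python
def label_exists_sketch(offer_labels, label):
    """Return True if `label` appears in any inner list of `offer_labels`."""
    return any(label in inner for inner in offer_labels)

# Hypothetical labels of one offer
one_offer = [["cellar", "balcony"], ["metro", "shop"]]

print(label_exists_sketch(one_offer, "cellar"))
print(label_exists_sketch(one_offer, "garage"))
```

Applied to a whole column, this could be used as `clean.labelsAll.apply(label_exists_sketch, label='cellar')`, since `apply` passes extra keyword arguments on to the function.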

### 4c. Test existence of all possible labels

* Use `apply` with a function that returns a `pd.Series` containing one value per label
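
Testing all labels at once can be sketched by returning one boolean per label from the applied function. The frame `offers`, the list `all_labels`, and the helper `labels_to_flags` are hypothetical; in the seminar, the list of labels would come from 4a.

```python
import pandas as pd

# Hypothetical offers; each cell is a list of lists of labels
offers = pd.DataFrame({
    "labelsAll": [
        [["cellar", "balcony"], ["metro", "shop"]],
        [["elevator"], ["metro"]],
        [[], ["shop", "school"]],
    ]
})
all_labels = ["balcony", "cellar", "elevator", "metro", "school", "shop"]

def labels_to_flags(offer_labels):
    """Return a Series with True/False for every label in all_labels."""
    present = {lab for inner in offer_labels for lab in inner}
    return pd.Series({lab: lab in present for lab in all_labels})

# One boolean column per label, one row per offer
flags = offers["labelsAll"].apply(labels_to_flags)
print(flags)
```

The resulting frame can be joined back to the offers with `offers.join(flags)`.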