## Task 1
### Choose any weather or exchange rate api and create an ETL job in python to download data that will be loaded to a fact table nightly.

It is not easy to find an API that offers free historical data.
One option is the Time Machine request of [darksky.net](https://darksky.net/dev/docs#time-machine-request).

It takes latitude and longitude coordinates. We will use the coordinates of [Brandenburg Gate, Berlin, Germany](https://www.latlong.net/place/brandenburg-gate-berlin-germany-193.html): latitude 52.516266, longitude 13.377775.

The data can be fetched in two ways:
- using [Requests](http://www.python-requests.org/en/master/)
- using the [darkskylib](https://github.com/lukaskubis/darkskylib) library

I will use the first approach.

In [1]:
import datetime
import time 
import pandas as pd
from pprint import pprint
import requests

Since the task does not specify the period, I will fetch yesterday's historical data each night.

Datetime parameters must be given in Unix time. Let us calculate the variable `yesterday_beginning_time`.

In [2]:
import datetime 

yesterday = datetime.datetime.now() - datetime.timedelta(days = 1)
yesterday_beginning = datetime.datetime(yesterday.year, yesterday.month, yesterday.day,0,0,0,0)
yesterday_beginning_time = int(time.mktime(yesterday_beginning.timetuple()))
#yesterday_end = datetime.datetime(yesterday.year, yesterday.month, yesterday.day,23,59,59,999)
#yesterday_end_time = int(time.mktime(yesterday_end.timetuple()))
 
print(yesterday_beginning_time)
#print(yesterday_end_time)


1549321200
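
Note that `time.mktime` interprets the datetime in the time zone of the machine that runs the notebook, so the value above is midnight in Berlin only when the job runs on a machine set to Berlin time. Below is a minimal sketch of a variant that pins the time zone explicitly. The helper name `yesterday_midnight_unix` and the choice of `Europe/Berlin` are assumptions for illustration, and `zoneinfo` requires Python 3.9 or newer.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")  # assumed reporting time zone


def yesterday_midnight_unix(now=None, tz=BERLIN):
    """Return the Unix time of 00:00 yesterday in the given time zone."""
    now = now or datetime.now(tz)
    yesterday = now.astimezone(tz).date() - timedelta(days=1)
    # Build a time-zone-aware midnight so the result does not depend on the host settings
    midnight = datetime(yesterday.year, yesterday.month, yesterday.day, tzinfo=tz)
    return int(midnight.timestamp())


print(yesterday_midnight_unix())
```

For a run at any time on 2019-02-06 (Berlin time) this yields 1549321200, the same value as above.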


In [3]:
key = '9a16c79da203323b0b7256a473c2f030'
lat = '52.516266'
lng = '13.377775'

In [4]:
# request only the hourly data, in SI units
request_str = 'https://api.darksky.net/forecast/'+key+'/'+lat+','+lng+','+str(yesterday_beginning_time)+'?units=si&exclude=currently,minutely,daily,alerts,flags' 
print(request_str)
r = requests.get(request_str)
print(type(r))
#pprint(r.json())

https://api.darksky.net/forecast/9a16c79da203323b0b7256a473c2f030/52.516266,13.377775,1549321200?units=si&exclude=currently,minutely,daily,alerts,flags
<class 'requests.models.Response'>
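
The string concatenation above can also be wrapped in a small helper, which makes the URL easier to reuse and test. This is only a sketch: the function name `build_time_machine_url`, its default arguments, and the placeholder key are assumptions, and it uses only the standard library.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.darksky.net/forecast"


def build_time_machine_url(key, lat, lng, unix_time, units="si",
                           exclude=("currently", "minutely", "daily", "alerts", "flags")):
    """Build a Time Machine request URL that returns only the hourly block."""
    # safe="," keeps the commas of the exclude list unescaped, as in the URL above
    query = urlencode({"units": units, "exclude": ",".join(exclude)}, safe=",")
    return "{}/{}/{},{},{}?{}".format(BASE_URL, key, lat, lng, unix_time, query)


url = build_time_machine_url("YOUR_API_KEY", "52.516266", "13.377775", 1549321200)
print(url)
```

When fetching, passing `timeout=` to `requests.get` and calling `r.raise_for_status()` before `r.json()` makes a failed nightly run fail loudly instead of loading an error payload.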


In [5]:
# extract the data from the response
import json
#json_data = json.loads(r.text)
json_data = r.json()
print(type(json_data))
print(type(json_data.get("hourly")))
print(type(json_data.get("hourly").get("data")))
print('Response contains rows: ' + str(len(json_data.get("hourly").get("data"))))
json_data.get("hourly").get("data")[0]

<class 'dict'>
<class 'dict'>
<class 'list'>
Response contains rows: 24


{'time': 1549321200,
 'summary': 'Overcast',
 'icon': 'cloudy',
 'precipIntensity': 0,
 'precipProbability': 0,
 'temperature': 1.03,
 'apparentTemperature': -3.67,
 'dewPoint': -2.98,
 'humidity': 0.75,
 'pressure': 1026.75,
 'windSpeed': 5.08,
 'windGust': 12.59,
 'windBearing': 180,
 'cloudCover': 1,
 'uvIndex': 0,
 'visibility': 10.01,
 'ozone': 369.81}

In [6]:
# convert the list of hourly records to a DataFrame
df = pd.DataFrame(json_data.get("hourly").get("data"))
df.head()

| | apparentTemperature | cloudCover | dewPoint | humidity | icon | ozone | precipAccumulation | precipIntensity | precipProbability | precipType | pressure | summary | temperature | time | uvIndex | visibility | windBearing | windGust | windSpeed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -3.67 | 1.0 | -2.98 | 0.75 | cloudy | 369.81 | | 0.0 | 0.0 | | 1026.75 | Overcast | 1.03 | 1549321200 | 0 | 10.01 | 180 | 12.59 | 5.08 |
| 1 | -3.75 | 0.99 | -2.98 | 0.77 | cloudy | 365.74 | | 0.0 | 0.0 | | 1025.87 | Overcast | 0.61 | 1549324800 | 0 | 10.01 | 180 | 10.87 | 4.32 |
| 2 | -5.06 | 0.95 | -3.98 | 0.74 | cloudy | 362.52 | | 0.0 | 0.0 | | 1025.45 | Overcast | 0.05 | 1549328400 | 0 | 10.01 | 191 | 10.99 | 5.36 |
| 3 | -4.24 | 0.88 | -3.98 | 0.71 | partly-cloudy-night | 358.83 | | 0.0 | 0.0 | | 1025.05 | Mostly Cloudy | 0.61 | 1549332000 | 0 | 10.01 | 201 | 11.3 | 5.15 |
| 4 | -4.04 | 0.85 | -3.98 | 0.71 | partly-cloudy-night | 356.84 | | 0.0 | 0.0 | | 1024.77 | Mostly Cloudy | 0.61 | 1549335600 | 0 | 10.01 | 205 | 11.38 | 4.79 |


In [7]:
df.dtypes

apparentTemperature    float64
cloudCover             float64
dewPoint               float64
humidity               float64
icon                    object
ozone                  float64
precipAccumulation     float64
precipIntensity        float64
precipProbability      float64
precipType              object
pressure               float64
summary                 object
temperature            float64
time                     int64
uvIndex                  int64
visibility             float64
windBearing              int64
windGust               float64
windSpeed              float64
dtype: object

In [8]:
# add a human-readable timestamp column (UTC)
df["timestamp"] = pd.to_datetime(df['time'],unit='s')
df.head()

| | apparentTemperature | cloudCover | dewPoint | humidity | icon | ozone | precipAccumulation | precipIntensity | precipProbability | precipType | pressure | summary | temperature | time | uvIndex | visibility | windBearing | windGust | windSpeed | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -3.67 | 1.0 | -2.98 | 0.75 | cloudy | 369.81 | | 0.0 | 0.0 | | 1026.75 | Overcast | 1.03 | 1549321200 | 0 | 10.01 | 180 | 12.59 | 5.08 | 2019-02-04 23:00:00 |
| 1 | -3.75 | 0.99 | -2.98 | 0.77 | cloudy | 365.74 | | 0.0 | 0.0 | | 1025.87 | Overcast | 0.61 | 1549324800 | 0 | 10.01 | 180 | 10.87 | 4.32 | 2019-02-05 00:00:00 |
| 2 | -5.06 | 0.95 | -3.98 | 0.74 | cloudy | 362.52 | | 0.0 | 0.0 | | 1025.45 | Overcast | 0.05 | 1549328400 | 0 | 10.01 | 191 | 10.99 | 5.36 | 2019-02-05 01:00:00 |
| 3 | -4.24 | 0.88 | -3.98 | 0.71 | partly-cloudy-night | 358.83 | | 0.0 | 0.0 | | 1025.05 | Mostly Cloudy | 0.61 | 1549332000 | 0 | 10.01 | 201 | 11.3 | 5.15 | 2019-02-05 02:00:00 |
| 4 | -4.04 | 0.85 | -3.98 | 0.71 | partly-cloudy-night | 356.84 | | 0.0 | 0.0 | | 1024.77 | Mostly Cloudy | 0.61 | 1549335600 | 0 | 10.01 | 205 | 11.38 | 4.79 | 2019-02-05 03:00:00 |



Finally, the extracted data has to be loaded into a fact table nightly.

We can either load it into a raw staging table first and merge it into the final destination table later, or insert it directly into the final table.

I have not set up a database for this purpose, so I will only show rough code for the insert, using the syntax of [SQLite](http://www.sqlitetutorial.net/sqlite-python/).

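```python
import sqlite3

db_file = 'weather.db'  # assumption: SQLite file that already contains weather_data_raw

# Column order must match the INSERT statement below
columns = ['apparentTemperature', 'cloudCover', 'dewPoint', 'humidity', 'icon', 'ozone',
           'precipAccumulation', 'precipIntensity', 'precipProbability', 'precipType',
           'pressure', 'summary', 'temperature', 'time', 'uvIndex', 'visibility',
           'windBearing', 'windGust', 'windSpeed', 'timestamp']
rows = df.reindex(columns=columns)                    # fix the order, add missing columns as NaN
rows['timestamp'] = rows['timestamp'].astype(str)     # store timestamps as text for SQLite
rows = rows.astype(object).where(rows.notna(), None)  # NaN -> NULL

sql = '''
    INSERT INTO weather_data_raw
        ([apparentTemperature], [cloudCover], [dewPoint], [humidity], [icon], [ozone], [precipAccumulation], [precipIntensity], [precipProbability], [precipType],
        [pressure], [summary], [temperature], [time], [uvIndex], [visibility], [windBearing], [windGust], [windSpeed], [timestamp]
        )
              VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
    '''
conn = sqlite3.connect(db_file)
cur = conn.cursor()
cur.executemany(sql, rows.values.tolist())
print(cur.rowcount)  # number of inserted rows
conn.commit()
conn.close()
```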
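
The two-step variant (raw table first, then a merge into the final table) could look roughly like the sketch below. It is not tied to any real schema: the table `fact_weather_hourly`, the reduced column set, and the helper `merge_raw_into_fact` are assumptions, and an in-memory SQLite database is used so that the example is self-contained. Using the hour (`time`) as the primary key together with `INSERT OR REPLACE` keeps the nightly load idempotent, so a re-run does not duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.executescript("""
    CREATE TABLE weather_data_raw (time INTEGER, temperature REAL, summary TEXT);
    CREATE TABLE fact_weather_hourly (
        time INTEGER PRIMARY KEY,   -- one row per hour
        temperature REAL,
        summary TEXT
    );
""")


def merge_raw_into_fact(conn):
    """Upsert all raw rows into the fact table, then empty the raw table."""
    with conn:  # commits on success, rolls back on error
        conn.execute("""
            INSERT OR REPLACE INTO fact_weather_hourly (time, temperature, summary)
            SELECT time, temperature, summary FROM weather_data_raw
        """)
        conn.execute("DELETE FROM weather_data_raw")


raw_rows = [(1549321200, 1.03, "Overcast"), (1549324800, 0.61, "Overcast")]
for _ in range(2):  # simulate the same night being loaded twice
    conn.executemany("INSERT INTO weather_data_raw VALUES (?, ?, ?)", raw_rows)
    merge_raw_into_fact(conn)

# The fact table still holds one row per hour after the repeated load
print(conn.execute("SELECT COUNT(*) FROM fact_weather_hourly").fetchone()[0])
```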

## Task 2
### Take your best guess with regards to the logical schema of our data, and design a rough dimensional model for it. (as a graphic)

I built a rough schema for a B2C business (that is, the one related to the row in the [business_types] table with id = 1 and name = 'B2C').

For a B2B2C business (where contracts are sold to a company for its employees), some additional logic would need to be added.

And, of course, financial balances are a broad topic. Many checks and scheduled jobs are needed: creating subscriptions for each payment period (month), activating and deactivating them, matching them to received payments, and applying possible additional discounts within the customer life cycle.

You can [view my schema online](https://dbdiagram.io/d/5c5aff8039ca7c00141ba8df).

Or here it is as a picture:

In [9]:
%%html
<img src="USC.png" width="60" height="60">

## Task 3
### A typical advertising network can take 48h to 2 weeks to settle the data (remove bot traffic). Such, we would always like to update the last 2 weeks of data in our ad performance table. How would you approach this? Describe it in text, graphic, pseudocode or code, at your convenience.

It depends on the amount of data in the table.

The basic approach: every day, delete all rows for the last 14 days from the final table, fetch them again from the advertising network, and insert them into the final table of events.

If the table holds a very large number of rows and already has indexes built on them, the data can be replaced through partitions instead of plain deletes and inserts:
- store the data for each date in a separate partition of the table (and on disk)
- each day, remove the partitions for the last 14 days from the final table
- add the cleaned data re-imported from the advertising network to the final table as new partitions.
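
The simple delete-and-reload variant can be sketched as follows. Everything here is hypothetical: the table `ad_performance`, its columns, the helper `refresh_window`, and the sample numbers are made up, and SQLite is used only to keep the example self-contained (it has no partitions, so the partition variant is not shown). The delete and the insert run in one transaction, so they succeed or fail together. `fresh_rows` is assumed to contain only dates inside the refresh window.

```python
import sqlite3
from datetime import date, timedelta


def refresh_window(conn, fresh_rows, today, days=14):
    """Replace the last `days` days of ad_performance with fresh_rows."""
    cutoff = (today - timedelta(days=days)).isoformat()
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("DELETE FROM ad_performance WHERE event_date >= ?", (cutoff,))
        conn.executemany(
            "INSERT INTO ad_performance (event_date, campaign_id, clicks) VALUES (?, ?, ?)",
            fresh_rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_performance (event_date TEXT, campaign_id INTEGER, clicks INTEGER)")
conn.executemany(
    "INSERT INTO ad_performance VALUES (?, ?, ?)",
    [("2019-01-01", 1, 100), ("2019-02-01", 1, 250)],  # made-up sample rows
)

# Settled numbers re-imported from the ad network (hypothetical values)
refresh_window(conn, [("2019-02-01", 1, 180)], today=date(2019, 2, 6))
result = conn.execute("SELECT * FROM ad_performance ORDER BY event_date").fetchall()
print(result)
```

The row from 2019-01-01 lies outside the 14-day window and stays untouched, while the row from 2019-02-01 is replaced by the settled value.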