# Analytical Dataset (ADS)

An `ADS` is a table created for a specific analytic purpose. The idea is to merge different data sources so that all available information about the objects of interest (most often the clients) is in one place. This data is then aggregated with sliding windows.

1. The core part of an `ADS` is a sliding window for each time period (e.g. 1 week).
2. An `ADS` contains one row for each observation in every time period (here, every week).
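
To make the two points above concrete, here is a minimal sketch of the skeleton of a weekly `ADS`: one row per observed object per weekly window. The client IDs, the date range and the `window_ends` helper are hypothetical and only illustrate the shape of the table. The windows here are simple consecutive 7-day slices, which is an assumption about how the windows are laid out.

```python
from datetime import date, timedelta
from itertools import product


def window_ends(start, end, step_days=7):
    """Yield the end date of each consecutive window between start and end."""
    current = start + timedelta(days=step_days)
    while current <= end:
        yield current
        current += timedelta(days=step_days)


# Hypothetical objects of interest and observation period.
clients = ['A', 'B']
weeks = list(window_ends(date(2024, 1, 1), date(2024, 1, 29)))

# One row per (client, weekly window): the skeleton that features are joined onto.
skeleton = list(product(clients, weeks))
for client_id, week_end in skeleton:
    print(client_id, week_end)
```

Features from each data source would then be aggregated per window and joined onto this skeleton by client and window end date.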

![graph](https://i.imgur.com/ojRMfB0.png)

The advantages of an `ADS` are as follows:
- it combines all data sources into one table
- in the future, ML models can be based on one table
- by using time slices (weekly, monthly), we smooth out fluctuations in the data
- it serves as an aggregation layer for reporting
- batch scoring (weekly, monthly) is easy to implement
- new data sources can simply be added in the future using joins

![graph](https://i.imgur.com/zXsvJ25.png)

## Connecting to the `northwind` database

In [1]:
import sqlite3
from sqlite3 import Error

In [2]:
def create_connection(path):
  con = None
  try:
    con = sqlite3.connect(database=path)
    print('Connection to SQLite DB successful.')
  except Error as e:
    print(f'The error \'{e}\' occurred.')
  
  return con

In [3]:
con = create_connection('./_data/northwind.db')

Connection to SQLite DB successful.


In [4]:
def execute_query(connection, query):
  # Use the connection passed in, not the global `con`.
  cur = connection.cursor()
  result = None
  try:
    cur.execute(query)
    result = cur.fetchall()
  except Error as e:
    print(f'The error \'{e}\' occurred.')
  # Returns the fetched rows, or None if the query failed.
  return result

In [13]:
query_count = """ 
SELECT COUNT(*) FROM orders
"""

query_min = """ 
SELECT MIN(orderdate) FROM orders
"""

query_max = """ 
SELECT MAX(orderdate) FROM orders
"""

In [26]:
order_count = execute_query(con, query_count)
min_orderdate = execute_query(con, query_min)
max_orderdate = execute_query(con, query_max)
print(f'order count: {order_count[0][0]}\nmin order date: {min_orderdate[0][0]}\nmax order date: {max_orderdate[0][0]}')

order count: 830
min order date: 1996-07-04
max order date: 1998-05-06


There are 830 orders ranging from `1996-07-04` to `1998-05-06`. From this, an `ADS` aggregated by month can be built. \
It is also possible to aggregate by day or week, but for this example monthly windows are sufficient.

The right window size depends on the industry. For traditional banking, 1 month may be enough. For telecommunications, 1 week can be appropriate. Other industries, such as e-commerce, need to aggregate per day.

In this tutorial, orders are aggregated by month, and each monthly slice is labelled with a column called `end_obs_date` (end observation date). The label is the first day of the month following the order date.

Example:
- order date: 1996-12-12 --> `end_obs_date`: 1997-01-01
- order date: 1997-01-31 --> `end_obs_date`: 1997-02-01
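
The mapping in the examples above can be sketched directly in SQLite with its built-in `DATE` function and the `'start of month'` and `'+1 month'` modifiers. This is only an illustration under assumptions: it uses a throwaway in-memory table with three made-up rows instead of the real `northwind` data, and the names `con_demo`, `query_monthly` and `order_count` are hypothetical.

```python
import sqlite3

# In-memory toy table standing in for `orders` (made-up rows, not northwind data).
con_demo = sqlite3.connect(':memory:')
con_demo.execute('CREATE TABLE orders (orderid INTEGER, orderdate TEXT)')
con_demo.executemany(
    'INSERT INTO orders VALUES (?, ?)',
    [(1, '1996-12-12'), (2, '1996-12-30'), (3, '1997-01-31')],
)

# Snap each order date to the first day of its month, then move one month forward.
query_monthly = """
SELECT DATE(orderdate, 'start of month', '+1 month') AS end_obs_date,
       COUNT(*) AS order_count
FROM orders
GROUP BY end_obs_date
ORDER BY end_obs_date
"""

monthly = con_demo.execute(query_monthly).fetchall()
print(monthly)  # [('1997-01-01', 2), ('1997-02-01', 1)]
con_demo.close()
```

Applying `'start of month'` before `'+1 month'` matters: adding a month to `1997-01-31` first would overflow past the end of February and land in March.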
