# Automatic cube creation with atoti - Advanced

[atoti](https://www.atoti.io/) is a free Python BI analytics platform for Quants, Data Analysts, Data Scientists & Business Users to collaborate better, analyze faster and translate their data into business KPIs.  

This notebook is an extension of [main.ipynb](main.ipynb), demonstrating how users can customize the data type of each column. This is particularly useful for columns that store arrays. We will also explore the atoti session and its attributes in this notebook after the BI application is created (with reference to the [VaR dataset](https://s3.eu-west-3.amazonaws.com/data.atoti.io/notebooks/auto-cube/var_dataset.csv)).  

<img src="https://data.atoti.io/notebooks/auto-cube/spin-up-cube.gif" width="70%" />

__NOTE:__
- This is a simplified use case with a single atoti table (created from the uploaded CSV)
- The CSV should be UTF-8 encoded
- For best experience, choose a dataset with a fair number of numeric and non-numeric columns, e.g. [Data Science Job Salaries dataset](https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries) from Kaggle:  
    - non-numerical columns are translated into hierarchies
    - a SUM and a MEAN measure are automatically created for each non-key numerical column
- When selecting keys for the atoti table, choose the columns that will ensure data uniqueness.
    - When unsure, skip key selection.
    - Non-unique keys will result in a smaller dataset getting loaded. Only the last occurrence of the duplicates will be kept.
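
To make the effect of non-unique keys concrete, here is a minimal pure-Python sketch of the "last occurrence wins" behaviour described above. The `load_with_keys` helper and the sample rows are hypothetical; they only mimic the semantics and are not part of atoti.

```python
def load_with_keys(rows, keys):
    """Mimic a keyed table: a later row replaces an earlier row with the same key."""
    table = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        table[key] = row  # overwrite any earlier row that has this key
    return list(table.values())


# Hypothetical sample data: two rows share the key ("A", "book1").
rows = [
    {"instrument": "A", "book": "book1", "quantity": 10},
    {"instrument": "B", "book": "book1", "quantity": 5},
    {"instrument": "A", "book": "book1", "quantity": 99},
]

loaded = load_with_keys(rows, keys=["instrument", "book"])
print(len(rows), "rows in the file,", len(loaded), "rows loaded")
```

Comparing the two counts before loading is a quick way to tell whether the chosen keys really identify each row.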
    

To understand more about multidimensional datacubes, check out the [atoti tutorial](https://docs.atoti.io/latest/getting_started/tutorial/tutorial.html).  

<div style="text-align: center;" ><a href="https://www.atoti.io/?utm_source=gallery&utm_content=auto-cube" target="_blank" rel="noopener noreferrer"><img src="https://data.atoti.io/notebooks/banners/discover.png" alt="Try atoti"></a></div>

In [1]:
import functools
import io
import typing
import webbrowser

import atoti as tt
import ipywidgets as widgets
import numpy as np
import pandas as pd
from IPython.display import SVG, Markdown

Welcome to atoti 0.7.0!

By using this community edition, you agree with the license available at https://docs.atoti.io/latest/eula.html.
Browse the official documentation at https://docs.atoti.io.
Join the community at https://www.atoti.io/register.

atoti collects telemetry data, which is used to help understand how to improve the product.
If you don't wish to send usage data, set the ATOTI_DISABLE_TELEMETRY environment variable to True.

You can hide this message by setting the ATOTI_HIDE_EULA_MESSAGE environment variable to True.


Since atoti is a Python library, we can use it along with other libraries such as ipywidgets and Pandas.  
We use `FloatProgress` from ipywidgets to track the loading progress of the web application.

In [2]:
out = widgets.Output()
fp = widgets.FloatProgress(min=0, max=6)

We create some global variables in order to access the atoti cube for exploration in the notebook.

In [3]:
session: tt.Session
cube: tt.Cube
table: tt.Table

# just managing some common data types in this use case
data_types = [
    ty
    for ty in (["Default"] + list(typing.get_args(tt.type.DataType)))
    if ty
    not in [
        "boolean",
        "Object",
        "Object[]",
        "ZonedDateTime",
        "LocalDateTime",
        "LocalTime",
    ]
]

## Steps to create a BI analytics platform with atoti

In the following function, the key steps to create an atoti web application are defined:
- Instantiate the atoti session (the web application is created upon instantiation)
- Create an atoti table by loading the Pandas DataFrame (atoti also accepts other data sources such as CSV, Parquet, SQL and Spark DataFrames)
- Create a cube from the atoti table
- Create [single-value measures](https://docs.atoti.io/latest/lib/atoti/atoti.agg.single_value.html#atoti.agg.single_value) for numerical columns

<img src="https://data.atoti.io/notebooks/auto-cube/img/steps_to_bi_platform.gif" width="70%" />

__It is possible to create and join multiple atoti tables.__ However, in our use case, we only create one atoti table using the __Pandas connector__.  
We could have used the CSV connector instead, but Pandas allows us to manipulate the data (e.g. select the key columns and set data types) through interaction with ipywidgets.

__We can also create multiple cubes within a session and access them from the web application.__ To keep things simpler, we stick with a single cube in this notebook.  

Finally, we use the [webbrowser](https://docs.python.org/3/library/webbrowser.html) API to launch the web application in a new browser tab.

In [4]:
def create_cube(df, keys=None, single_value_cols=None, port=19090):
    global session, cube, table

    print(f"-- Creating session on port {port}")
    fp.value = 2
    session = tt.Session(port=port, user_content_storage="./content")

    print("--- Loading data into table")
    fp.value = 3
    table = session.read_pandas(df, table_name="table", keys=keys)

    print("---- Creating cube")
    fp.value = 4
    cube = session.create_cube(table)

    fp.value = 5
    if single_value_cols:
        print(
            f"---- Create single value measures for non-keys numerical columns: {single_value_cols}"
        )
        for col in single_value_cols:
            cube.measures[f"{col}.VALUE"] = tt.agg.single_value(table[col])

    fp.value = 6
    print(f"----- Launching web application: {session._local_url}")
    webbrowser.open(session._local_url)

    print("======================================================")
    print(f"Number of records loaded: {len(table)}")
    print("Table schema: ")
    display(cube.schema)

    print()
    display(Markdown("### Access web application"))
    display(
        Markdown(
            "__Click on this URL if web application is not automatically launched:__"
        ),
        session.link(),
    )
    print()
    print("======================================================")
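
As a rough illustration of what a single-value measure does, the sketch below assumes the behaviour described in the linked atoti documentation: the aggregate is the common value when all contributing rows agree, and empty otherwise. The `single_value` function here is a hypothetical pure-Python stand-in, not the atoti implementation.

```python
def single_value(values):
    """Return the shared value if every element is equal, otherwise None."""
    values = list(values)
    if not values:
        return None
    first = values[0]
    return first if all(v == first for v in values) else None


# Hypothetical rows rolled up into one cell of the cube.
print(single_value([0.95, 0.95, 0.95]))  # all rows agree
print(single_value([0.95, 0.90]))        # rows disagree
```

This is why such measures suit array columns: an array is reported as-is at the level where it is unique instead of being summed or averaged.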

## Data processing prior to BI platform creation

Using ipywidgets, users are able to:
- interactively select a CSV for upload
- choose the key columns of the table and set a specific data type for columns where necessary
- monitor the progress of cube creation with `FloatProgress`
- re-create the cube from a new CSV

We trigger the creation of the cube upon selection of a CSV.  
__Note that we recreate the session whenever a new CSV is selected.__ So the previous dataset will no longer be accessible.

In [5]:
def disable_widget(w):
    w.disabled = True


@out.capture()
def on_key_change(b, _df, _keys, _datatypes):
    b.disabled = True
    # lock every checkbox and dropdown once the selection is submitted
    for w in _keys + _datatypes:
        disable_widget(w)

    keys = []
    datatypes = {}
    numerical_cols = []

    for i in range(0, len(_keys)):

        # unless datatype is specified, datatype is inferred by Pandas
        # atoti inherits datatype from pandas dataframe
        if _datatypes[i].value != "Default":
            try:
                if _datatypes[i].value in ["int[]", "long[]"]:
                    _df[_keys[i].description] = (
                        _df[_keys[i].description]
                        .apply(eval)
                        .apply(lambda x: np.array(x).astype(int))
                    )
                elif _datatypes[i].value in ["double[]", "float[]"]:
                    _df[_keys[i].description] = (
                        _df[_keys[i].description]
                        .apply(eval)
                        .apply(lambda x: np.array(x).astype(float))
                    )
                elif _datatypes[i].value in ["String"]:
                    _df[_keys[i].description] = _df[_keys[i].description].astype(str)
                elif _datatypes[i].value in ["LocalDate"]:
                    _df[_keys[i].description] = pd.to_datetime(
                        _df[_keys[i].description]
                    )
                elif _datatypes[i].value in ["double", "float"]:
                    _df[_keys[i].description] = _df[_keys[i].description].astype(
                        _datatypes[i].value
                    )
                elif _datatypes[i].value in ["int", "long"]:
                    _df[_keys[i].description] = _df[_keys[i].description].astype(int)

                if _datatypes[i].value not in ["LocalDate", "String"]:
                    numerical_cols = numerical_cols + [_keys[i].description]

            except Exception:
                print(
                    f"Error encountered casting {_keys[i].description} to {_datatypes[i].value}. Values remain in their default type."
                )

        if _keys[i].value:
            keys = keys + [_keys[i].description]

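    # we gather the numerical columns in order to create single_value measures,
    # dropping duplicates (a column cast above may also be picked up by select_dtypes)
    numerical_cols = list(
        dict.fromkeys(
            numerical_cols + _df.select_dtypes(include="number").columns.to_list()
        )
    )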
    # exclude the selected table keys as we will not create measures for them
    if len(keys) > 0:
        numerical_cols = [col for col in numerical_cols if col not in keys]
        print(f"numerical_cols: {numerical_cols}")

    create_cube(_df, keys, numerical_cols)
    displayFileLoader()
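
The array casts above call `eval` on each cell, which executes whatever expression the CSV contains. A more defensive sketch uses `ast.literal_eval` from the standard library, which only accepts Python literals. The `parse_vector` helper is a hypothetical name and is not part of the notebook.

```python
import ast


def parse_vector(text, cast=float):
    """Parse a string such as "[1, 2.5]" into a list of numbers."""
    parsed = ast.literal_eval(text)  # accepts Python literals only, never calls functions
    if not isinstance(parsed, (list, tuple)):
        raise ValueError(f"expected a list, got {type(parsed).__name__}")
    return [cast(item) for item in parsed]


print(parse_vector("[1, 2.5, -3]"))
print(parse_vector("[1, 2, 3]", cast=int))

try:
    parse_vector("__import__('os').getcwd()")
except ValueError as error:
    print("rejected:", error)
```

In the casting code, something like `.apply(parse_vector)` could stand in for `.apply(eval)` before the conversion to a NumPy array, assuming the cells hold plain bracketed lists.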

## Set the stage with ipywidget

Using ipywidgets, we can interact with the uploaded data to:
1. choose keys for the atoti table that we are creating
2. choose the data type of each column (to override the default type inferred by Pandas)

In [6]:
@out.capture()
def on_upload_change(change):
    out.clear_output()
    display(fp)
    print("Starting cube creation for ", change["new"][0].name)

    fp.value = 0
    print("- Reading file")
    input_file = list(change["new"])[0]
    content = input_file["content"]
    df = pd.read_csv(io.BytesIO(content))

    fp.value = 1
    columns = df.columns.tolist()

    # checkboxes for list of columns for users to select table keys
    checkboxes = [widgets.Checkbox(value=False, description=label) for label in columns]

    # dropdown list for data type options for each column
    dropdowns = [
        widgets.Dropdown(options=data_types, value=data_types[0]) for label in columns
    ]

    button = widgets.Button(
        description="Submit",
        disabled=False,
        button_style="",
        tooltip="Submit selected keys",
        icon="check",  # (FontAwesome names without the `fa-` prefix)
    )

    instructions = widgets.HTML(
        value="""<b><ol>
                    <li>Select checkbox to select column as keys.</li>
                    <li>Select data type from drop-down list for specific column. Common types are inferred when creating Pandas DataFrame.</li>
                </ol></b>"""
    )

    left_box = widgets.VBox(children=checkboxes)
    right_box = widgets.VBox(children=dropdowns)

    display(widgets.VBox([instructions, widgets.HBox([left_box, right_box]), button]))

    button.on_click(
        functools.partial(on_key_change, _df=df, _keys=checkboxes, _datatypes=dropdowns)
    )
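
Since the upload is expected to be UTF-8 encoded, it can help to check the raw bytes before parsing them. The sketch below uses only the standard library; `read_header` and the sample payloads are hypothetical and stand in for the `content` value received from the upload widget.

```python
import csv
import io


def read_header(content):
    """Decode uploaded bytes as UTF-8 and return the CSV column names."""
    text = bytes(content).decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    reader = csv.reader(io.StringIO(text))
    return next(reader, [])


# Hypothetical upload payloads.
print(read_header(b"instrument_code,book_id,quantity\nA,1,10\n"))

try:
    read_header("café,prix\n".encode("latin-1"))
except UnicodeDecodeError:
    print("file is not UTF-8 encoded")
```

Catching the decoding error early allows a clearer message than a failure in the middle of cube creation.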

In [7]:
def displayFileLoader():
    uploader = widgets.FileUpload(
        accept=".csv",
        multiple=False,
    )

    uploader.observe(on_upload_change, "value")
    with out:
        display(uploader)

Feel free to select another CSV file to try out different datasets.

In [8]:
displayFileLoader()
out

Output()

## Technology behind atoti   

<img src="https://data.atoti.io/notebooks/auto-cube/img/atoti-tech-stack.png" width="50%"/>  

### In-memory multidimensional data cube

Behind the scenes, we create an in-memory multidimensional data cube following the [snowflake schema](https://en.wikipedia.org/wiki/Snowflake_schema). 
Once the cube is formed, users can perform multidimensional data analytics from different perspectives:
- slice and dice
- drill-down and roll-up
- drill-through for investigation
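
The operations listed above can be pictured with a small pure-Python sketch. The fact rows and the `roll_up` helper are hypothetical; the cube performs the equivalent aggregations in memory and on demand.

```python
from collections import defaultdict

# Hypothetical fact rows: (book, instrument, quantity).
facts = [
    ("book1", "A", 10),
    ("book1", "B", 5),
    ("book2", "A", 7),
]


def roll_up(rows, dims):
    """Sum quantity over the requested dimension positions (0=book, 1=instrument)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[2]
    return dict(totals)


by_book = roll_up(facts, dims=[0])                    # roll-up: one total per book
by_book_and_instrument = roll_up(facts, dims=[0, 1])  # drill-down: add the instrument level
only_a = [row for row in facts if row[1] == "A"]      # slice: keep instrument "A"

print(by_book)
print(by_book_and_instrument)
print(roll_up(only_a, dims=[0]))
```

Drill-through corresponds to looking at the underlying `facts` rows behind one aggregated number.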

In [9]:
cube

### JupyterLab for prototyping and Web application for end-user

atoti makes it easy to explore your dataset and construct your data model in __JupyterLab__ during the prototyping stage:
- easily add new data sources to the cube
- create new measures
- visualize data within the notebook

In [10]:
session.visualize()

### Working with the cube

In [11]:
h, l, m = cube.hierarchies, cube.levels, cube.measures

#### Creating measures

In [12]:
m["scaled_pnl_vector"] = m["quantity.SUM"] * m["pnl_vector.VALUE"]

In [13]:
m["Position vector"] = tt.agg.sum(
    m["scaled_pnl_vector"], scope=tt.OriginScope(l["instrument_code"], l["book_id"])
)
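
The two measures above scale each P&L vector by its quantity and then add the scaled vectors element by element. A pure-Python sketch of that arithmetic follows, with hypothetical positions and three-element vectors standing in for the dataset's `pnl_vector` column.

```python
# Hypothetical positions: (instrument_code, book_id, quantity, pnl_vector).
positions = [
    ("A", 1, 10, [1.0, -2.0, 0.5]),
    ("B", 1, 2, [0.0, 4.0, -1.0]),
]


def scale(vector, factor):
    """Multiply every element of the vector by the same factor."""
    return [factor * x for x in vector]


def add_vectors(vectors):
    """Sum vectors element by element; all vectors must have the same length."""
    return [sum(elements) for elements in zip(*vectors)]


scaled = [scale(pnl, quantity) for _, _, quantity, pnl in positions]
position_vector = add_vectors(scaled)
print(position_vector)
```

The multiplication is done per position before summing, which mirrors the role of the scope on `instrument_code` and `book_id`: above that level there is no single P&L vector left to multiply.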

In [14]:
session.visualize()

### Running simulations

In [None]:
# m["VaR"] = tt.array.quantile(m["Position vector"], 0.95)

In [15]:
confidence_simulation = cube.create_parameter_simulation(
    "confidence_simulation",
    measures={"Confidence level": 0.95},
    base_scenario_name="95%",
)

In [16]:
cube.query(m["Confidence level"])

|   | Confidence level |
|---|---|
| 0 | 0.95 |


In [17]:
m["VaR"] = tt.array.quantile(m["Position vector"], m["Confidence level"])
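
For intuition, a quantile of a vector can be computed by sorting it and interpolating between the two closest ranks. The sketch below assumes linear interpolation over inclusive ranks; the exact convention applied by `tt.array.quantile` should be checked in the atoti documentation. The `quantile` function and the sample vector are hypothetical.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between the two closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)  # fractional index into the sorted values
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# Hypothetical position vector.
vector = [-12.0, 3.0, 10.0, -4.0, 7.5]
print(quantile(vector, 0.5))
```

Because the confidence level is itself a measure in the notebook, changing it re-evaluates the quantile without redefining `VaR`.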

In [18]:
session.visualize()

In [19]:
confidence_simulation += ("90%", 0.90)
confidence_simulation += ("98%", 0.98)
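
Conceptually, each scenario swaps in a different confidence level and the dependent measure is evaluated once per scenario. A standard-library sketch of that idea follows, using `statistics.quantiles` with the inclusive method as an assumed stand-in for the cube's quantile; `quantile_per_scenario` and the sample vector are hypothetical.

```python
import statistics

# Scenario name -> confidence level, mirroring the three scenarios above.
scenarios = {"95%": 0.95, "90%": 0.90, "98%": 0.98}


def quantile_per_scenario(vector, levels):
    """Evaluate the same quantile formula once per scenario."""
    # 99 cut points: cut_points[i - 1] is the i-th percentile.
    cut_points = statistics.quantiles(vector, n=100, method="inclusive")
    return {name: cut_points[round(level * 100) - 1] for name, level in levels.items()}


# Hypothetical position vector with 101 evenly spaced values.
vector = [float(x) for x in range(101)]
print(quantile_per_scenario(vector, scenarios))
```

In the cube, the scenarios show up as an extra hierarchy, so the values can be compared side by side in one visualization.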

In [20]:
session.visualize()

## Find out more about atoti

<table style="width: 100%">
<thead>
  <tr>
    <td rowspan="5" style="width: 420px"><img src="https://data.atoti.io/notebooks/auto-cube/img/qr-code.png" width="400px"/></td>
    <td><div style="display: inline-block; vertical-align: bottom;"><img src="https://data.atoti.io/notebooks/banners/logo.png" width = "50px"/></div><div style="font-size:30px;display: inline-block;padding-left: 15px; vertical-align: top;">https://www.atoti.io/</div></td>
  </tr>
  <tr>
    <td><div style="display: inline-block;"><img src="https://data.atoti.io/notebooks/covid-analytics/img/linkedin.png" width = "50px"/></div><div style="font-size:30px;display: inline-block;padding-left: 15px; vertical-align: top;">https://www.linkedin.com/company/atoti</div></td>
  </tr>
  <tr>
    <td><div style="display: inline-block;"><img src="https://data.atoti.io/notebooks/covid-analytics/img/twitter.png" width = "50px"/></div><div style="font-size:30px;display: inline-block;padding-left: 15px; vertical-align: top;">https://twitter.com/atoti_io</div></td>
  </tr>
  <tr>
    <td><div style="display: inline-block;"><img src="https://data.atoti.io/notebooks/covid-analytics/img/youtube.png" width = "50px"/></div><div style="font-size:30px;display: inline-block;padding-left: 15px; vertical-align: top;">https://www.youtube.com/c/atoti</div></td>
  </tr>
  <tr>
    <td><div style="display: inline-block;"><img src="https://data.atoti.io/notebooks/covid-analytics/img/medium.png" width = "50px"/></div><div style="font-size:30px;display: inline-block;padding-left: 15px; vertical-align: top;">https://medium.com/atoti</div></td>
  </tr>
</thead>
</table>

## More examples
<div style="display: inline-block;"><img src="https://data.atoti.io/notebooks/covid-analytics/img/github.png" width = "50px"/></div><div style="font-size:30px;display: inline-block;padding-left: 15px; vertical-align: top;">Notebook gallery https://github.com/atoti/notebooks </div>


## Reach out to us
<div style="display: inline-block;"><img src="https://data.atoti.io/notebooks/covid-analytics/img/github.png" width = "50px"/></div><div style="font-size:30px;display: inline-block;padding-left: 15px; vertical-align: top;">GitHub Discussion https://github.com/atoti/atoti/discussions  </div>



<div style="text-align: center;" ><a href="https://www.atoti.io/?utm_source=gallery&utm_content=auto-cube" target="_blank" rel="noopener noreferrer"><img src="https://data.atoti.io/notebooks/banners/discover-try.png" alt="Try atoti"></a></div>