### Relational Data Demo

This demo notebook shows how to use the HMA1 model to generate synthetic data for multiple related tables.

#### Loading Data

We load three CSV files that contain the table data, then build a dictionary of the tables, which is used to fit the model.

There are three tables: `application`, `bureau` and `previous_application`. The `application` table is the parent of `bureau` and `previous_application`, each of which references it through the foreign key `SK_ID_CURR`.

For more information about the data, see https://www.kaggle.com/competitions/home-credit-default-risk/data


In [1]:
import pandas as pd

application = pd.read_csv("application_train.csv")
bureau = pd.read_csv("bureau.csv")
previous_application = pd.read_csv("previous_application.csv")

tables = {
    'application': application,
    'bureau': bureau,
    'previous_application': previous_application
}
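
Before fitting, it can help to confirm that the keys behave as described above: the parent key is unique, and every child row points at an existing parent. The following is a minimal sketch on toy stand-in tables, not on the real CSV files; `find_orphans` is a hypothetical helper written for this illustration and is not part of bulian.

```python
import pandas as pd

# Toy stand-ins for the real tables; only the key columns matter here.
application = pd.DataFrame({"SK_ID_CURR": [1, 2, 3]})
bureau = pd.DataFrame({"SK_ID_BUREAU": [10, 11, 12], "SK_ID_CURR": [1, 1, 4]})


def find_orphans(child, parent, key):
    """Return the child rows whose `key` value has no match in the parent table.

    Hypothetical helper for illustration only.
    """
    return child[~child[key].isin(parent[key])]


# The primary key of the parent table should contain no duplicates.
pk_is_unique = application["SK_ID_CURR"].is_unique

# The toy row with SK_ID_CURR == 4 has no parent, so it is reported as an orphan.
orphans = find_orphans(bureau, application, "SK_ID_CURR")
print(pk_is_unique, len(orphans))
```

The same call can be repeated for `previous_application`. How orphan rows should be handled (dropped, or repaired) depends on the dataset and is left open here.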

#### Metadata

Here we create a `Metadata` instance and add our three tables to it. The metadata records the table relationships (primary and foreign keys), which are used when generating synthetic data.

In [None]:
from bulian.metadata.dataset import Metadata

metadata = Metadata()
metadata.add_table(name="application", data=application, primary_key='SK_ID_CURR')
metadata.add_table(name="previous_application", data=previous_application, primary_key='SK_ID_PREV', foreign_key='SK_ID_CURR', parent='application')
metadata.add_table(name="bureau", data=bureau, primary_key='SK_ID_BUREAU', foreign_key='SK_ID_CURR', parent='application')
metadata

Metadata
  root_path: .
  tables: ['application', 'previous_application', 'bureau']
  relationships:
    previous_application.SK_ID_CURR -> application.SK_ID_CURR
    bureau.SK_ID_CURR -> application.SK_ID_CURR

#### HMA1 Model

The `bulian.relational.HMA1` class implements a Hierarchical Modeling Algorithm, which recursively walks through a relational dataset and applies tabular models across all the tables so that the models learn how the fields of all the tables are related.

In [None]:
from bulian.relational import HMA1

model = HMA1(metadata)
model.fit(tables)
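
To give a rough intuition for how information from a child table can reach the parent table, the sketch below summarises each parent's child rows and attaches the summary to the parent as extra columns. This is a simplified illustration in plain pandas, not bulian's actual implementation; the `income` and `credit` columns and the choice of count and mean as summaries are assumptions made for the example.

```python
import pandas as pd

# Toy parent and child tables; `income` and `credit` are made-up columns.
parent = pd.DataFrame({"SK_ID_CURR": [1, 2, 3], "income": [100.0, 200.0, 150.0]})
child = pd.DataFrame({"SK_ID_CURR": [1, 1, 2], "credit": [10.0, 30.0, 50.0]})

# Summarise the child rows belonging to each parent key.
child_stats = (
    child.groupby("SK_ID_CURR")["credit"]
    .agg(child_count="count", child_mean="mean")
    .reset_index()
)

# Attach the summaries to the parent rows; parents without children get NaN.
extended = parent.merge(child_stats, on="SK_ID_CURR", how="left")
extended["child_count"] = extended["child_count"].fillna(0).astype(int)
print(extended)
```

A tabular model fitted on a parent table extended in this way sees the parent's own fields together with a description of its children, which is the general idea behind learning relationships across tables.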

Now we can sample rows from the fitted model using the `sample` method.

In [None]:
new_data = model.sample(num_rows=100)
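
A quick way to inspect the sampled output is to list the shape of each table. The sketch below assumes that the sampler returns a dictionary mapping table names to pandas DataFrames, mirroring the `tables` dictionary used for fitting; that return type is an assumption, not something confirmed here. The toy `new_data` stands in for the real output, and `table_shapes` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for the sampler output, assumed to be {table_name: DataFrame}.
new_data = {
    "application": pd.DataFrame({"SK_ID_CURR": [0, 1]}),
    "bureau": pd.DataFrame({"SK_ID_BUREAU": [0, 1, 2], "SK_ID_CURR": [0, 0, 1]}),
}


def table_shapes(tables):
    """Return {table_name: (n_rows, n_columns)} for a dict of DataFrames."""
    return {name: df.shape for name, df in tables.items()}


shapes = table_shapes(new_data)
for name, (n_rows, n_cols) in shapes.items():
    print(f"{name}: {n_rows} rows, {n_cols} columns")
```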

#### Generate Report

Once the synthetic data has been sampled, a data quality report can be generated using the **get_multi_table_report** API. The report can be viewed as a dashboard or as Jupyter output.

In [None]:
from bulian.metrics.reports import get_multi_table_report
get_multi_table_report(tables, new_data, metadata, show_dashboard=True)