# **Analytics**

- In this notebook we generate the table that serves as the base for the Power BI app.
- For this particular use case we only need to execute it once; the Power BI app then keeps the data embedded in it.
- In a real-world scenario:
    - an analytics pipeline would run periodically to take new data into account;
    - the Power BI app would then refresh with the new data.
- The goal here is to generate a "raw table" with enough information to make building the dashboard straightforward.
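The periodic pipeline mentioned above is not part of this notebook. As a minimal sketch of one way such a job could avoid needless work, the hypothetical helpers below (`needs_refresh`, `refresh_reporting_table`, neither of which exists in this project) rebuild the reporting table only when an input file is newer than the output, comparing file modification times. The `build` callable is assumed to wrap the merge-and-write step performed further down.

```python
from pathlib import Path
from typing import Callable, Iterable


def needs_refresh(inputs: Iterable[Path], output: Path) -> bool:
    """Return True if `output` is missing or older than any file in `inputs`."""
    if not output.exists():
        return True
    output_mtime = output.stat().st_mtime
    return any(path.stat().st_mtime > output_mtime for path in inputs)


def refresh_reporting_table(
    inputs: list[Path], output: Path, build: Callable[[], None]
) -> bool:
    """Call `build` only when the inputs changed since `output` was last written.

    Returns True if a rebuild was triggered, False if the output was up to date.
    """
    if not needs_refresh(inputs, output):
        return False
    # Make sure the reporting folder exists before the build writes into it
    output.parent.mkdir(parents=True, exist_ok=True)
    build()
    return True
```

A scheduler would call `refresh_reporting_table` with the two intermediate parquet files as `inputs` and the reporting parquet file as `output`. Modification times are the simplest change signal; a production pipeline might prefer content hashes or the orchestration tool's own dependency tracking.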

In [7]:
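import logging
import sys
from pathlib import Path

# Make the project's modules under ../src importable from this notebook
# (the path is resolved relative to the notebook's working directory)
sys.path.append(str(Path("../src").resolve()))

logging.basicConfig(format="%(asctime)s : %(levelname)s : %(message)s", level=logging.INFO)
LOGGER = logging.getLogger(__name__)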

In [8]:
import pandas as pd
import polars as pl

# All paths are resolved relative to the notebook's parent folder
DATA_DIR = Path.cwd().parent / "data"

# Load the intermediate tables with polars, then convert them to pandas for the merge
instances = pl.read_parquet(DATA_DIR / "02_intermediate" / "instances.parquet").to_pandas()
labels = pl.read_parquet(DATA_DIR / "02_intermediate" / "labels.parquet").to_pandas()

# Keep only the policies that appear in both tables
merged_instances_labels = pd.merge(
    left=instances,
    right=labels,
    on="policy_number",
    how="inner",
)

# Write the reporting table consumed by the Power BI app
target_directory = DATA_DIR / "05_reporting"
target_directory.mkdir(parents=True, exist_ok=True)
merged_instances_labels.to_parquet(target_directory / "data_for_pbi_app.parquet")

# Preview a few rows (only the last expression of a cell is displayed)
merged_instances_labels.sample(n=3)
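Because the merge uses `how="inner"`, any policy present in only one of the two tables is silently left out of the Power BI table. One way to check how many rows that affects is an outer merge with `indicator=True`, which tags each row as `"both"`, `"left_only"` or `"right_only"` in a `_merge` column. The sketch below uses a hypothetical helper, `audit_inner_merge`, on invented toy frames; only the `pd.merge` call and its parameters are real pandas API.

```python
import pandas as pd


def audit_inner_merge(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict[str, int]:
    """Count rows an inner merge on `key` would keep ("both") or drop.

    The outer merge with indicator=True adds a `_merge` column; the number of
    "both" rows equals the row count of the corresponding inner merge.
    """
    # Only the key column is needed to classify the rows
    tagged = pd.merge(left[[key]], right[[key]], on=key, how="outer", indicator=True)
    counts = tagged["_merge"].astype(str).value_counts()
    return {status: int(counts.get(status, 0)) for status in ("both", "left_only", "right_only")}


# Toy data, invented for illustration only
toy_instances = pd.DataFrame({"policy_number": [1, 2, 3], "premium": [10.0, 20.0, 30.0]})
toy_labels = pd.DataFrame({"policy_number": [2, 3, 4], "label": [0, 1, 0]})
print(audit_inner_merge(toy_instances, toy_labels, key="policy_number"))
# prints {'both': 2, 'left_only': 1, 'right_only': 1}
```

If each policy is expected to appear exactly once in both tables, `pd.merge` also accepts `validate="one_to_one"`, which raises `pandas.errors.MergeError` when `policy_number` is duplicated on either side.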