# BDI-Viz Demo On GDC

## Introduction
Welcome to the BDI-Viz demonstration. This demo generates candidate ground-truth column mappings from a source dataset to the GDC (Genomic Data Commons) schema. If you encounter bugs or see areas for improvement, please contact **Eden Wu** or **Vitoria Guardieiro**, or open an issue in the [BDI-Kit](https://github.com/VIDA-NYU/bdi-kit) repository. We appreciate your feedback and collaboration!

In [1]:
import os, sys

parent_dir = os.path.abspath("..")
# the parent_dir could already be there if the kernel was not restarted,
# and we run this cell again
if parent_dir not in sys.path:
    sys.path.append(parent_dir)


import pandas as pd
import numpy as np
import json

import panel as pn
import bdikit as bdi
from bdikit.visualization.schema_matching import BDISchemaMatchingHeatMap
from bdikit import top_matches

# Load both Panel extensions in a single call
pn.extension("mathjax", "vega")

## Load Dataset

In [4]:
# dou.csv is used here as an example; substitute any dataset you like.
source = pd.read_csv("./datasets/dou.csv")
target = "gdc"
# For a non-GDC target, pass a DataFrame instead:
# target = pd.read_csv("./datasets/cao.csv")

## Run BDI-Viz

We use a pretrained model to identify the top 20 candidate GDC columns for each source column. The first computation may take some time, but the results are cached so that later visualizations load quickly.

Once BDI-Viz has finished loading, feel free to explore and either accept or reject any candidates you deem appropriate. After completing your review, please proceed to run the next cell to update the manager with the revised scope.
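The idea of ranking candidates once and reusing the cached result can be sketched with a toy stand-in. This is not BDI-Kit's pretrained model: the similarity score below is plain string matching from the standard library, `top_k_candidates` and `TARGET_COLUMNS` are hypothetical names, and the target list is just a handful of column names taken from the output in this notebook.

```python
from difflib import SequenceMatcher
from functools import lru_cache

# Illustrative target vocabulary (assumption: a tiny subset, not the full GDC schema).
TARGET_COLUMNS = (
    "figo_stage",
    "tumor_grade_category",
    "dysplasia_type",
    "necrosis_present",
    "hpv_positive_type",
)


@lru_cache(maxsize=None)
def top_k_candidates(source_column: str, k: int = 3) -> tuple:
    """Return the k target columns most similar to source_column, best first.

    String similarity stands in for the pretrained model; results are cached
    so that repeated calls with the same arguments are not recomputed.
    """
    scored = [
        (SequenceMatcher(None, source_column.lower(), name.lower()).ratio(), name)
        for name in TARGET_COLUMNS
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return tuple((name, round(score, 3)) for score, name in scored[:k])


candidates = top_k_candidates("FIGO_stage", k=2)
print(candidates)
```

Calling `top_k_candidates("FIGO_stage", k=2)` a second time returns the cached tuple instead of rescoring every target column, which mirrors why only the first run of the heatmap is slow.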

In [5]:
heatmap_manager = BDISchemaMatchingHeatMap(source, target=target, top_k=20)
heatmap_manager.get_heatmap()
heatmap_manager.plot_heatmap()

## Update Column Mapping Scope

In [6]:
from bdikit.mapping_algorithms.column_mapping.algorithms import TwoPhaseSchemaMatcher
from bdikit import GDC_DATA_PATH

two_phase_viz = TwoPhaseSchemaMatcher(top_k_matcher=heatmap_manager)
bdi.match_schema(source, target=target, method=two_phase_viz)

|   | source | target |
|---|--------|--------|
| 0 | Country | necrosis_present |
| 1 | Histologic_Grade_FIGO | histologic_progression_type |
| 2 | Histologic_type | dysplasia_type |
| 3 | Path_Stage_Primary_Tumor-pT | uicc_clinical_m |
| 4 | Path_Stage_Reg_Lymph_Nodes-pN | figo_stage |
| 5 | Clin_Stage_Dist_Mets-cM | inrg_stage |
| 6 | Path_Stage_Dist_Mets-pM | last_known_disease_status |
| 7 | tumor_Stage-Pathological | tumor_grade_category |
| 8 | FIGO_stage | figo_stage |
| 9 | BMI | hpv_positive_type |
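Since the goal of the demo is to produce candidate ground truths, a mapping like the one above can be filtered and serialized once it has been reviewed. The following is a minimal sketch under stated assumptions: the two-row `mapping` frame is a hand-built stand-in with the same `source` and `target` columns as the output above, and the `rejected` set is a hypothetical reviewer decision, not something BDI-Viz produces.

```python
import json

import pandas as pd

# Stand-in for the mapping returned by the matcher (same column layout).
mapping = pd.DataFrame(
    {
        "source": ["FIGO_stage", "BMI"],
        "target": ["figo_stage", "hpv_positive_type"],
    }
)

# Hypothetical review outcome: source columns whose match was judged wrong.
rejected = {"BMI"}

# Keep only the rows that survived review.
accepted = mapping[~mapping["source"].isin(rejected)]

# Convert to a plain {source: target} dictionary and serialize it.
ground_truth = dict(zip(accepted["source"], accepted["target"]))
ground_truth_json = json.dumps(ground_truth, indent=2)
print(ground_truth_json)
```

The JSON string can then be written to a file of your choice so the reviewed mapping is available outside the notebook.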
