# Datasets Demo Notebook

This notebook demonstrates how to access data tables using the `msk_cdm` Python package.

## Data Access
Datasets are queried from the MSK Institutional Database. To connect to the database, you must first create a `MinIO` configuration profile.

The `connect_to_db` function looks for your configuration file and connects to the server. Before using this notebook, establish a MinIO connection as described on [this page](https://clinical-data-mining.github.io/msk_cdm/reference/user-guide/minio/).
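
Because the connection depends on a configuration file, it can help to confirm that the file exists before calling `connect_to_db`. The following is a minimal sketch using only the standard library; `check_auth_file` is a hypothetical helper and not part of `msk_cdm`, and it makes no assumption about the contents of the configuration file.

```python
from pathlib import Path


def check_auth_file(auth_file):
    """Return the user-expanded path if the file exists, else raise FileNotFoundError.

    Hypothetical helper for illustration; not part of the msk_cdm package.
    """
    path = Path(auth_file).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"MinIO configuration file not found: {path}")
    return path
```

Calling `check_auth_file('path/to/config.txt')` before connecting turns a missing file into an immediate, readable error rather than a failure later in the notebook.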

## Using MSK-IMPACT Clinical Datasets

Query the clinical summary and timeline data shown in cBioPortal using these predefined loading functions.

In [None]:
from msk_cdm.datasets import connect_to_db
from msk_cdm.datasets.impact import load_data_clinical_patient


In [None]:
# Connect to the database
auth_file = 'path/to/config.txt'
connect_to_db(auth_file=auth_file)


In [None]:
# Load the dataset
df_clinical_patient = load_data_clinical_patient()

# Access the data
df_clin_p = df_clinical_patient['data']


In [None]:
# Display the first few rows of the data
print(df_clin_p.head())
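
Beyond `head()`, a quick look at the table's shape and missing values is often useful. The sketch below assumes the loaded table is a pandas `DataFrame`, as the `.head()` call suggests. It uses a small stand-in table, `df_example`, whose column names and values are made up for illustration; substitute `df_clin_p` when working with the real data.

```python
import pandas as pd

# Stand-in for df_clin_p; the column names and values are invented for illustration.
df_example = pd.DataFrame(
    {
        "PATIENT_ID": ["P-0000001", "P-0000002", "P-0000003"],
        "AGE": [61, None, 47],
    }
)

# Number of rows and columns in the table
n_rows, n_cols = df_example.shape

# Count of missing values in each column
missing_per_column = df_example.isna().sum()

print(f"{n_rows} rows x {n_cols} columns")
print(missing_per_column)
```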


## Use the DatasetLoader to Query Data from Object Storage
Datasets are stored in our MinIO object storage. They can also be accessed directly by their object path.

In [None]:
from msk_cdm.datasets import DatasetLoader


In [None]:
# Instantiate the DatasetLoader object and connect using the authorization file
loader = DatasetLoader()
loader.connect_to_db(auth_file=auth_file)



In [None]:
# Define the object storage path of the dataset
path_to_object = 'path/to/object/clinical_data.tsv'


In [None]:
# Load the dataset stored at the given object path
df_demo1 = loader.load_from_object_path(path_object=path_to_object)

In [None]:
# Display the first few rows of the data
df_demo1.head()
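
The example path above ends in `.tsv`, so the object is a tab-separated file. As a rough illustration of how such a file maps to a table, the sketch below parses a small in-memory TSV with pandas. It does not show how `DatasetLoader` works internally, and the column names and values are invented.

```python
import io

import pandas as pd

# A tiny tab-separated table held in memory; the contents are invented for illustration.
tsv_text = "SAMPLE_ID\tCANCER_TYPE\nS-1\tBreast\nS-2\tLung\n"

# sep="\t" tells pandas to split fields on tabs rather than commas
df_local = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_local.head())
```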