# Query from IDB Demo Notebook


This notebook demonstrates how to query data from IDB into a pandas DataFrame using IBM Db2 SQL queries.

Dependencies:
`msk_cdm`

## Important Files to Load
* The path to a `.env` file containing your MSK username and password. In this example, the `.env` file defines the variables `USER` and `PW`.
* The path to the `config_ddp_query.txt` file containing the `DATABASE`, `HOST`, `PORT`, and `PROTOCOL` variables. Config info for all MSK databases can be found [here](https://github.mskcc.org/datadojo/dbconfig/blob/master/catalog.yml).
* The path to the SQL file containing the query to execute.
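
The cells below pass the username and password as literal strings. As one way to read them from the `.env` file instead, here is a minimal sketch using only the standard library. The helper name `load_env_file` is hypothetical and not part of `msk_cdm`; it assumes the file holds plain `KEY=VALUE` lines.

```python
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Ignore lines without an '=' separator.
        # Drop surrounding whitespace and any surrounding quote characters.
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

With this in place, `creds = load_env_file('/path/to/.env')` would give `creds['USER']` and `creds['PW']` to pass as `uid` and `pwd`. Reading the file directly, rather than exporting the values to the process environment, also avoids clashing with the `USER` variable that most Unix shells already set.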

## Note on Configurations
To simplify connecting, the connection details can be stored in a config file. The file must define the following variables (the values shown are an example):

```
DATABASE=BL
HOST=idb
PORT=50000
PROTOCOL=TCPIP
```
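
A missing variable in this file tends to surface later as a `KeyError` or a failed connection. As a rough sketch, the loaded config could be checked up front. The helper `missing_config_keys` is hypothetical and not part of `msk_cdm`, and it assumes the config is loaded as a dict-like object keyed by variable name.

```python
REQUIRED_KEYS = ("DATABASE", "HOST", "PORT", "PROTOCOL")


def missing_config_keys(config):
    """Return the required keys that are absent or empty in `config` (hypothetical helper)."""
    missing = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        # Treat a missing key, None, or a blank string as "not set".
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Example config mirroring the file shown above.
example = {"DATABASE": "BL", "HOST": "idb", "PORT": "50000", "PROTOCOL": "TCPIP"}
missing = missing_config_keys(example)
if missing:
    raise KeyError(f"Config is missing: {', '.join(missing)}")
```

Running the same check on the loaded config right after reading the file gives a clear error message before any connection attempt is made.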

In [1]:
import os
import sys
from msk_cdm.data_classes.legacy import CDMProcessingVariables as var
from msk_cdm.data_processing import read_db2_api_config
from msk_cdm.db2 import db2connection

In [None]:
# Paths to the Db2 connection config file and the SQL query file
db2_config = 'idb_queries/config_ddp_query.txt'
sql_file = 'demographics/demographics.sql'


## Query Data


### Load Your DB2 Configuration File

In [None]:
config = read_db2_api_config(fname_env=db2_config)
database = config['DATABASE']
host = config['HOST']
port = config['PORT']
protocol = config['PROTOCOL']

### Grab the Data!

In [None]:
obj_db2 = db2connection(
    database=database,
    host=host,
    port=port,
    protocol=protocol,
    uid='<YOUR_MSK_ID>',
    pwd='<PASSWORD>'
)

In [None]:
df = obj_db2.query_ddp(fname_sql=sql_file)

### What's in the frame?

In [None]:
# Inspect the shape and the first rows of the query result
print(df.shape)
df.head()