# MinIO Connection Demo
Example script that reads an object from MinIO via the `msk_cdm.minio` API.

## Setup

Create a conda environment that includes the `msk_cdm` package (see the [README](https://github.com/clinical-data-mining/msk_cdm/blob/main/README.md)).

This example requires a `.env` file in your home directory with the `SECRET_KEY`, `ACCESS_KEY`, and certificate
for MinIO.

[Detailed instructions to create the SECRET_KEY, ACCESS_KEY, and certificate](https://clinical-data-mining.github.io/msk_cdm/reference/user-guide/minio/)

[API Reference for using msk_cdm.minio](https://clinical-data-mining.github.io/msk_cdm/reference/minio/)


### Creating an environment file 

Store the connection details in a file and pass that file when instantiating, rather than supplying each connection detail individually. The file follows this template:

`<pathname>/env_minio.txt`:
```
ACCESS_KEY=<ACCESS_KEY>
SECRET_KEY=<SECRET_KEY>
CA_CERTS=<PATH_TO>/certificate.crt
URL_PORT=pllimsksparky3:9000
BUCKET=cdm-data
```
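
Before handing the file to `MinioAPI`, it can help to check that the file is well formed. The sketch below is a minimal, standard-library-only check and is not part of `msk_cdm`. The helper names `read_env_file` and `missing_keys` are hypothetical. It assumes that every key in the template above is required and that lines starting with `#` may be treated as comments; the package's own parser may behave differently.

```python
import tempfile
from pathlib import Path

# Assumption: every key shown in the template is required.
REQUIRED_KEYS = {"ACCESS_KEY", "SECRET_KEY", "CA_CERTS", "URL_PORT", "BUCKET"}


def read_env_file(path):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected KEY=VALUE, got: {line!r}")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings):
    """Return the required keys that are absent or empty, sorted by name."""
    return sorted(k for k in REQUIRED_KEYS if not settings.get(k))


# Demo with placeholder values written to a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = Path(tmp_dir) / "env_minio.txt"
    env_path.write_text(
        "ACCESS_KEY=placeholder-access\n"
        "SECRET_KEY=placeholder-secret\n"
        "URL_PORT=pllimsksparky3:9000\n"
        "BUCKET=cdm-data\n"
    )
    settings = read_env_file(env_path)

print(missing_keys(settings))  # prints ['CA_CERTS'], which was left out on purpose
```

An empty list from `missing_keys` means all five template keys are present with non-empty values.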


### Instantiate in Python
```
from msk_cdm.minio import MinioAPI

# Path to the environment file created above
fname_minio = '<pathname>/env_minio.txt'

obj_minio = MinioAPI(fname_minio_env=fname_minio)
```

---

In [2]:
import pandas as pd
from msk_cdm.minio import MinioAPI


### Define configuration and dataset to be loaded

In [41]:
fname_minio = '<PATH>/env_minio.txt'
fname_dataset = 'cbioportal/data_clinical_patient.txt'
bucket_name = 'cdm-data'

### Instantiate

In [None]:
obj_minio = MinioAPI(fname_minio_env=fname_minio)

### Load data from MinIO

In [43]:
obj = obj_minio.load_obj(
    path_object=fname_dataset, 
    bucket_name=bucket_name
)
df = pd.read_csv(obj, sep='\t', low_memory=False, header=4)
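
The `header=4` argument tells `pd.read_csv` to use the line at zero-based index 4 as the column names and to ignore the lines before it. This fits a file that starts with four `#`-prefixed metadata lines, which is assumed here to be the layout of `data_clinical_patient.txt`. The sketch below mimics that behavior with the standard library only, so it runs without a MinIO connection. The sample rows and patient IDs are made up for illustration, and `HEADER_ROW` is a hypothetical name.

```python
import csv
import io

# Made-up stand-in for the loaded object: four '#' metadata lines,
# then the column-name row, then the data rows.
sample = (
    "#Patient Identifier\tHistory of PD-L1\n"
    "#Identifier for the patient\tPD-L1 history flag\n"
    "#STRING\tSTRING\n"
    "#1\t1\n"
    "PATIENT_ID\tHISTORY_OF_PDL1\n"
    "P-0000001\tYes\n"
    "P-0000002\tNo\n"
)

HEADER_ROW = 4  # zero-based index of the column-name line, as in header=4

# Drop the metadata lines, then let DictReader take the next line as the header.
lines = io.StringIO(sample).read().splitlines()
reader = csv.DictReader(lines[HEADER_ROW:], delimiter="\t")
rows = list(reader)
columns = reader.fieldnames

print(columns)    # prints ['PATIENT_ID', 'HISTORY_OF_PDL1']
print(len(rows))  # prints 2
```

If the metadata block in a given file has a different number of lines, the `header` value has to change to match.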

In [47]:
df.head();

### Subset the data

In [46]:
df_pdl1 = df[df['HISTORY_OF_PDL1'] == 'Yes'].copy()
df_pdl1.head();

### Save the data

In [36]:
# path_object is the object name inside the MinIO bucket, not a local file path
fname_pdl1 = '/Users/test_user/data_clinical_patient_pdl1.tsv'
obj_minio.save_obj(
    df=df_pdl1,
    path_object=fname_pdl1, 
    sep='\t'
)


### List the objects under the prefix "/Users/test_user/"

In [38]:

obj_minio.print_list_objects(prefix='/Users/test_user/', recursive=False, bucket_name='cdm-data')

['Users/test_user/data_clinical_patient_pdl1.tsv']

### Remove the newly created file

In [39]:
obj_minio.remove_obj(path_object=fname_pdl1, bucket_name='cdm-data')

Object removed. Bucket: cdm-data, Object: /Users/test_user/data_clinical_patient_pdl1.tsv


### List the objects again to confirm the file was removed

In [40]:
obj_minio.print_list_objects(prefix='/Users/test_user/', recursive=False, bucket_name='cdm-data')


[]