# Ingest Module Use Cases

The purpose of this notebook is to show a few examples of how to use the ingest module I created to interact with an S3 bucket.

For the moment, the S3 bucket is located in my personal account. However, it is easy to set up in any other AWS account. In the near future I'll research the best approach to interact with SageMaker and decide whether it is better for us to move the S3 bucket to the CAPSTONE CLASS AWS account.

### S3 Bucket Object List

In [3]:
from src.d01_data.ingest import ProjectIngest

ingest = ProjectIngest('01_raw', '20210225-ems-raw-v04.xlsx')

ingest.remote_object_list()

['01_raw/20210203-ems-raw-v01.xlsx',
 '01_raw/20210204-ems-raw-v02.xlsx',
 '01_raw/20210213-admin-01-raw-test.txt',
 '01_raw/20210214-ems-raw-v03.xlsx',
 '01_raw/20210214_v3_patients.parquet',
 '01_raw/20210215-admin-01-raw-test-upload.txt',
 '01_raw/20210225-ems-raw-v04.xlsx']

In [4]:
from src.d01_data.ingest import ProjectIngest

ProjectIngest('01_raw', '20210225-ems-raw-v04.xlsx').remote_object_list()

['01_raw/20210203-ems-raw-v01.xlsx',
 '01_raw/20210204-ems-raw-v02.xlsx',
 '01_raw/20210213-admin-01-raw-test.txt',
 '01_raw/20210214-ems-raw-v03.xlsx',
 '01_raw/20210214_v3_patients.parquet',
 '01_raw/20210215-admin-01-raw-test-upload.txt',
 '01_raw/20210225-ems-raw-v04.xlsx']

The previous two code blocks show two ways to call the remote_object_list() method of the ProjectIngest class: on a named instance, or chained directly onto the constructor.

The key point to remember is that the class needs two inputs to work:
* The name of the folder (e.g., '01_raw')
* The name of the file (e.g., '20210213-ems-raw.xlsx')

Also note that remote_object_list() only uses the folder name: it lists every object inside that S3 logical folder, regardless of the file name passed to the constructor.
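To illustrate how listing by folder name can work, here is a minimal sketch that is not the actual ProjectIngest implementation. It assumes the folder name is used as a key prefix in the `list_objects_v2` call of a boto3 S3 client. The function name `list_keys_in_folder` and its parameters are hypothetical, and the client is passed in (for example `boto3.client("s3")`) so the sketch does not depend on any particular credentials setup.

```python
from typing import Any, List


def list_keys_in_folder(client: Any, bucket: str, folder: str) -> List[str]:
    """Return every object key stored under `folder/` in `bucket`.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    This helper is hypothetical and only illustrates the prefix idea.
    """
    # S3 has no real folders: a "logical folder" is just a shared key prefix.
    prefix = folder.rstrip("/") + "/"
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    keys: List[str] = []

    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when no key matches the prefix.
        keys.extend(item["Key"] for item in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # Results are paged; ask for the next page.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

Because only the prefix is sent to S3, the file name never enters the request, which matches the behavior described above.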

### Remote Upload

In [5]:
from src.d01_data.ingest import ProjectIngest

ingest = ProjectIngest('01_raw','20210225-ems-raw-v04.xlsx')

ingest.remote_upload()

The 20210225-ems-raw-v04.xlsx file already exist in the S3 bucket under the key 01_raw/20210225-ems-raw-v04.xlsx


The response message from the previous code shows that the S3 bucket already has the file I'm trying to upload. Because the file exists, the ingest module does not send it to the S3 bucket again.
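The skip-if-present behavior can be sketched as follows. This is an illustration under assumptions, not the module's real code: `upload_if_missing` and its parameters are hypothetical names, `client` is assumed to be a boto3 S3 client, and the existence check uses `list_objects_v2` with the full key as the prefix (the module may well use a different check).

```python
from typing import Any


def upload_if_missing(client: Any, bucket: str, folder: str,
                      file_name: str, local_path: str) -> bool:
    """Upload `local_path` unless the key already exists. Returns True if uploaded.

    Hypothetical helper; `client` is assumed to be boto3.client("s3").
    """
    key = f"{folder}/{file_name}"

    # Keys come back in lexicographic order, so if the exact key exists it is
    # the first (shortest) match for this prefix and MaxKeys=1 is enough.
    response = client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    already_there = any(item["Key"] == key for item in response.get("Contents", []))

    if already_there:
        print(f"The {file_name} file already exists in the S3 bucket under the key {key}")
        return False

    # upload_file(Filename, Bucket, Key) is the standard boto3 transfer call.
    client.upload_file(local_path, bucket, key)
    return True
```

Returning a boolean is a design choice of this sketch; it lets the caller tell a skipped upload from a completed one without parsing the printed message.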

### Does file exist on S3?

In [6]:
from src.d01_data.ingest import ProjectIngest

ProjectIngest('01_raw','20210225-ems-raw-v04.xlsx').s3_key_exist()

True

The previous block shows that the ingest module lets you validate the existence of a file within a logical folder. The response is True if the file exists and False if it doesn't.

### Local Download

In [5]:
from src.d01_data.ingest import ProjectIngest

ingest = ProjectIngest('01_raw', '20210225-ems-raw-v04.xlsx')

ingest.local_download_gen()

The response message from the previous code block tells us that the file we're trying to download already exists in our local folders. For this reason, the ingest module doesn't re-download it.
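A minimal sketch of this "download only when missing" check is shown below. The helper name `download_if_missing`, the `local_root` parameter, the printed message, and the local layout (`local_root/folder/file_name`) are all assumptions made for illustration; the real module may organize local files differently. `client` is assumed to be a boto3 S3 client.

```python
from pathlib import Path
from typing import Any


def download_if_missing(client: Any, bucket: str, folder: str,
                        file_name: str, local_root: str) -> Path:
    """Download folder/file_name into local_root unless it is already there.

    Hypothetical helper; `client` is assumed to be boto3.client("s3").
    """
    target = Path(local_root) / folder / file_name

    if target.exists():
        # Nothing to do: keep the local copy and skip the network call.
        print(f"The {file_name} file already exists locally at {target}")
        return target

    # Create the local folder on first use, then fetch the object.
    target.parent.mkdir(parents=True, exist_ok=True)
    # download_file(Bucket, Key, Filename) is the standard boto3 transfer call.
    client.download_file(bucket, f"{folder}/{file_name}", str(target))
    return target
```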

### Error Handling

In [7]:
from src.d01_data.ingest import ProjectIngest

ingest = ProjectIngest('01_raw', '20210225-ems-raw-v04.txt')

ingest.local_download()

TypeError: Instance Creation: Please validate that '20210225-ems-raw-v04.txt' have a valid file extension

I expect errors to occur at some point, so I try to capture them as much as possible. In this case, the previous code block is telling us that an error occurred.

In particular, if you look at the last line of the error information, you will notice that the problem is that you are trying to create an instance of the ProjectIngest class with an unsupported file type.

For the moment the class only handles .csv and .xlsx files.
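The extension check can be pictured with the sketch below. It assumes the validation happens when the instance is created and raises a TypeError, as the error output above suggests. The names `SUPPORTED_EXTENSIONS` and `validate_extension`, and the exact message wording, are hypothetical.

```python
from pathlib import Path

# Assumption: only these two extensions are accepted, as stated above.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx"}


def validate_extension(file_name: str) -> str:
    """Return file_name unchanged if its extension is supported, else raise TypeError."""
    suffix = Path(file_name).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise TypeError(
            f"Instance Creation: Please validate that '{file_name}' "
            "has a valid file extension"
        )
    return file_name
```

Calling `validate_extension('20210225-ems-raw-v04.txt')` raises the TypeError, while the same name ending in `.xlsx` is returned unchanged.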

### Other Considerations
Please keep in mind that the class will not send you a message when the download and upload methods succeed.

If you want to validate that the file was downloaded, please look inside the folder you downloaded the file to.

If you want to validate that the file was uploaded, please use the s3_key_exist() method. If it returns True, then the file was successfully uploaded to the S3 bucket.