##### Imports

In [1]:
import pandas as pd
import sagemaker

##### Load dataset

In [2]:
df = pd.read_csv('./data/diabetic_readmission.csv')
df.head()

,readmitted,race,gender,age,time_in_hospital,num_lab_procedures,num_procedures,num_medications,number_outpatient,number_emergency,number_inpatient,number_diagnoses,max_glu_serum,a1c_result,change,diabetes_med
0,no,caucasian,female,5,1,41,0,1,0,0,0,1,none,none,0,0
1,>30,caucasian,female,15,3,59,0,18,0,0,0,9,none,none,1,1
2,no,african_american,female,25,2,11,5,13,2,0,1,6,none,none,0,1
3,no,caucasian,male,35,2,44,1,16,0,0,0,7,none,none,1,1
4,no,caucasian,male,45,1,51,0,8,0,0,0,5,none,none,1,1


In [3]:
df.shape

(69570, 16)

| **Column name**       | **Description**     | 
| :------------- | :---------- | 
|`Readmitted`|Days to inpatient readmission. Values: "0" if the patient was readmitted in less than 30 days, ">30" if the patient was readmitted in more than 30 days, and "No" for no record of readmission|
|`Race`|Values: Caucasian, Asian, African American, or Hispanic|
|`Gender`|Values: Male, Female, and Unknown/Invalid|
|`Age`|Grouped in 10-year intervals: [0-10), [10-20), ..., [90-100)|
|`Time in hospital`|Integer number of days between admission and discharge|
|`Number of lab procedures`|Number of lab tests performed during the encounter|
|`Number of procedures`|Number of procedures (other than lab tests) performed during the encounter|
|`Number of medications`|Number of distinct generic names administered during the encounter|
|`Number of outpatient visits`|Number of outpatient visits of the patient in the year preceding the encounter|
|`Number of emergency visits`|Number of emergency visits of the patient in the year preceding the encounter|
|`Number of inpatient visits`|Number of inpatient visits of the patient in the year preceding the encounter|
|`Number of diagnoses`|Number of diagnoses entered to the system|
|`Glucose serum test result`|Indicates the range of the result or if the test was not taken. Values: ">200", ">300", "normal", and "none" if not measured|
|`A1c test result`|Indicates the range of the result or if the test was not taken. Values: ">8" if the result was greater than 8%, ">7" if the result was greater than 7% but less than 8%, "normal" if the result was less than 7%, and "none" if not measured.|
|`Change of medications`|Indicates if there was a change in diabetic medications (either dosage or generic name). Values: "change" and "no change"|
|`Diabetes medications`|Indicates if there was any diabetic medication prescribed. Values: "yes" and "no" for 24 different kinds of medical drugs.|
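
The table describes the raw encodings, while the preview above shows recoded values (for example, `readmitted` appears as `no` and `>30`, and `age` as 5, 15, 25 rather than interval labels). One quick way to see which encoding a column actually uses is to tally its distinct values. The following is a minimal sketch, assuming only the five preview rows re-typed inline so that it runs without the data file; `SAMPLE_CSV` and `count_values` are hypothetical names introduced for illustration.

```python
import csv
import io
from collections import Counter

# The first five columns of the rows shown by df.head() above, re-typed as
# CSV text (assumption: illustration only, not the full dataset).
SAMPLE_CSV = """readmitted,race,gender,age,time_in_hospital
no,caucasian,female,5,1
>30,caucasian,female,15,3
no,african_american,female,25,2
no,caucasian,male,35,2
no,caucasian,male,45,1
"""


def count_values(csv_text, column):
    """Count how often each value appears in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


readmitted_counts = count_values(SAMPLE_CSV, "readmitted")
print(readmitted_counts)
```

On the full DataFrame, `df['readmitted'].value_counts()` gives the same kind of tally in one call.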

##### Copy dataset from local to S3

In [4]:
session = sagemaker.Session()
default_bucket = session.default_bucket()
print(f'Default S3 bucket = {default_bucket}')

Default S3 bucket = sagemaker-us-east-1-119174016168


In [5]:
!aws s3 cp ./data/diabetic_readmission.csv s3://{default_bucket}/datasets/diabetic_readmission.csv

upload: data/diabetic_readmission.csv to s3://sagemaker-us-east-1-119174016168/datasets/diabetic_readmission.csv


In [6]:
!aws s3 cp ./data/diabetic_transformed.csv s3://{default_bucket}/datasets/diabetic_transformed.csv

upload: data/diabetic_transformed.csv to s3://sagemaker-us-east-1-119174016168/datasets/diabetic_transformed.csv
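
The same copy can also be done from Python instead of the AWS CLI. The sketch below is one possible alternative, not the notebook's method: `s3_uri` and `upload_with_sdk` are hypothetical helper names, and `example-bucket` is a made-up bucket. It relies on `sagemaker.Session.upload_data`, which places the file under `key_prefix` in the given bucket and returns the resulting S3 URI. The upload function needs valid AWS credentials, so only the URI helper is exercised here.

```python
import posixpath


def s3_uri(bucket, key_prefix, local_path):
    """Return the S3 URI a local file would get under bucket/key_prefix."""
    filename = posixpath.basename(local_path)
    return f"s3://{bucket}/{posixpath.join(key_prefix, filename)}"


def upload_with_sdk(local_path, key_prefix="datasets"):
    """Upload one file to the session's default bucket (needs AWS credentials)."""
    import sagemaker  # imported lazily so s3_uri works without the SDK installed

    session = sagemaker.Session()
    return session.upload_data(
        local_path,
        bucket=session.default_bucket(),
        key_prefix=key_prefix,
    )


# Example with a made-up bucket name; no AWS call is made here.
print(s3_uri("example-bucket", "datasets", "./data/diabetic_readmission.csv"))
```

In a notebook with credentials configured, `upload_with_sdk('./data/diabetic_readmission.csv')` should return a URI of the same shape as the one printed by the CLI above.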
