# 03. Amazon SageMaker Algorithms

* DeepReward / DeepScale - AWS SageMaker Basics [1]
* Presenter: 김무성

----------------------------

# Contents
* Exercise 3.1: Using the k-means Algorithm
    - 1. Create an Amazon SageMaker notebook instance
    - 2. Start the Amazon SageMaker example
* Exercise 3.2: Using the XGBoost Algorithm
    - 1. Download notebook code to the notebook instance
    - 2. Run the notebook code on the notebook instance
    - 3. End running Jupyter processes, and stop the notebook instance
    - 4. Delete the endpoint configuration and model

--------------------------------------

<img src="figures/sagemaker_overview.png" width=600 />

# Exercise 3.1: Using the k-means Algorithm

##### References
* [1] edX course : Amazon SageMaker: Simplifying Machine Learning Application Development 
    - https://www.edx.org/course/simplifying-machine-learning-app-development-with-amazon-sagemaker
    - Week 3. Amazon SageMaker Algorithms

### 0. Sign in as the edXSageMakerUser IAM user

###### See the chapter 01 presentation material
01_intro / 01_intro_aws_sagemaker.ipynb
- https://nbviewer.jupyter.org/github/deepreward/DeepScale/blob/master/aws_sagemaker_basic/01_intro/01_intro_aws_sagemaker.ipynb


-------------

* Sign in as the edXSageMakerUser IAM user (IAM user name: SageMakerOnAWS)
* https://your-sign-in-URL

### 1. Create an Amazon SageMaker notebook instance

* Sign in to the AWS Management Console as the edXSageMakerUser IAM user.
* In the console, click Services > Amazon SageMaker to open the Amazon SageMaker dashboard. 
* Make sure you are in the Oregon Region.
* In the left navigation pane, click Notebook instances > Create notebook instance.
* For Notebook instance name, enter edXSageMaker in the text box.
* Confirm that the IAM role is populated with the role you created in the previous exercise.
* Click Create notebook instance.

<img src="figures/cap01.png" width=600 />

### 2. Start the Amazon SageMaker example

In this section, you will launch the sample kmeans_mnist.ipynb notebook into your notebook instance.

The Amazon SageMaker examples are maintained in a Git repository at https://github.com/awslabs/amazon-sagemaker-examples.

* From the Jupyter notebook home, click SageMaker Examples.

<img src="figures/cap02.png" width=600 />

* Locate the kmeans_mnist example in SageMaker Python Sdk > kmeans_mnist.ipynb, and click Use. 
* Click Create copy to copy and launch the example.

<img src="figures/cap03.png" width=600 />
<img src="figures/cap04.png" width=600 />
<img src="figures/cap05.png" width=600 />
<img src="figures/cap06.png" width=600 />
<img src="figures/cap07.png" width=600 />
<img src="figures/cap08.png" width=600 />

#### Related code snippets

##### Create the model

```python
from sagemaker import get_execution_role
from sagemaker.session import Session

role = get_execution_role()
bucket = Session().default_bucket()
```

```python
from sagemaker import KMeans

data_location = 's3://{}/kmeans_highlevel_example/data'.format(bucket)
output_location = 's3://{}/kmeans_example/output'.format(bucket)

print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))

kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)
```

<img src="figures/cap09.png" width=600 />

##### Training

```python
# train_set is the (images, labels) tuple loaded from the MNIST data earlier in the notebook
kmeans.fit(kmeans.record_set(train_set[0]))
```

<img src="figures/cap10.png" width=600 />
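
As a rough illustration of what fitting a k-means model involves, the sketch below implements plain Lloyd's algorithm in pure Python: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is not the SageMaker built-in implementation, which is designed to scale to large datasets; the function names and the toy data are assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_centroid(point, centroids):
    """Return (index, squared distance) of the centroid nearest to point."""
    distances = [squared_distance(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def fit_kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest_centroid(p, centroids)[0]].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


# Toy data: two well-separated groups of 2-D points (made up for this sketch).
toy_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
toy_centroids = fit_kmeans(toy_points, k=2)
cluster_a, _ = closest_centroid((0.1, 0.1), toy_centroids)
cluster_b, _ = closest_centroid((5.1, 5.0), toy_centroids)
```

`closest_centroid` mirrors, in spirit, what the deployed endpoint reports for each input: the nearest cluster and the distance to it.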

##### Deploy (create the endpoint) & inference

```python
kmeans_predictor = kmeans.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')

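# Send a single training image (index 30) to the endpoint;
# each returned record holds the closest cluster and the distance to it
result = kmeans_predictor.predict(train_set[0][30:31])
print(result)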
```

<img src="figures/cap11.png" width=600 />

##### Delete the endpoint

```python
import sagemaker

# Name of the endpoint created by deploy()
print(kmeans_predictor.endpoint)

# Deletes only the endpoint itself
sagemaker.Session().delete_endpoint(kmeans_predictor.endpoint)
```

###### Note: the command above does not delete the endpoint configuration.

<img src="figures/cap12.png" width=600 />
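
Because the endpoint configuration (and the model behind it) stay in place after the endpoint is deleted, one option is to remove all three through the SageMaker API. The following is a minimal sketch, assuming `sm_client` behaves like `boto3.client('sagemaker')`; the helper name `delete_endpoint_resources` is hypothetical, and the code takes the client as an argument so it can be tried with a stub before touching a real account.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint configuration and models behind it.

    sm_client is expected to behave like boto3.client('sagemaker').
    """
    # Look up the dependent resources before deleting anything.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint['EndpointConfigName']
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant['ModelName'] for variant in config['ProductionVariants']]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names


# Usage inside the notebook (assumes boto3 and AWS credentials are available):
#   import boto3
#   delete_endpoint_resources(boto3.client('sagemaker'), kmeans_predictor.endpoint)
```

The lookups come first because the endpoint can no longer be described once it has been deleted.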

-------------------------------

# Exercise 3.2: Using the XGBoost Algorithm

In this exercise, you will complete an Amazon SageMaker example notebook for Customer Churn Prediction with XGBoost. XGBoost (Extreme Gradient Boosting) is a popular and efficient open source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining the estimates of a set of simpler, weaker models.
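
To make the idea of combining weaker models concrete, here is a minimal sketch of gradient boosting for squared loss on made-up 1-D data: each new decision stump is fitted to the residuals of the ensemble so far, and its output is scaled by a learning rate. It is an illustration only, not XGBoost itself, which adds regularization and other refinements; the helper names, the toy data, and the parameter values are assumptions for this example.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on x that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, num_round=50, eta=0.2):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []

    def predict(x):
        return base + eta * sum(stump(x) for stump in stumps)

    for _ in range(num_round):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up 1-D data with a step-like shape that a single stump cannot capture.
toy_xs = [1, 2, 3, 4, 5, 6, 7, 8]
toy_ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

one_round = fit_boosted_stumps(toy_xs, toy_ys, num_round=1)
many_rounds = fit_boosted_stumps(toy_xs, toy_ys, num_round=50)
```

Comparing `mean_squared_error(one_round, toy_xs, toy_ys)` with `mean_squared_error(many_rounds, toy_xs, toy_ys)` shows the training error falling as more stumps are added.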

In this example notebook, you will work with a dataset of customers from a fictional mobile operator. The dataset has a churn value, which indicates whether a customer left the service. At the end of the notebook, you will look at optimizing the threshold for predicted churn. A cost is assigned to each of the four outcomes (false positive, true positive, false negative, and true negative), and you will use the resulting cost function to determine the cutoff value where costs are minimized.
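
The threshold optimization can be sketched in a few lines of pure Python: count the four outcomes at a given cutoff, weight each count by a cost, and pick the cutoff with the lowest total. The cost values, labels, and scores below are illustrative assumptions, not figures taken from the course material.

```python
def confusion_counts(labels, scores, cutoff):
    """Count TP, FP, FN, TN when scores above the cutoff are predicted as churn."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score > cutoff else 0
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical cost per outcome (illustrative numbers only).
COSTS = {'tp': 100, 'fp': 100, 'fn': 500, 'tn': 0}


def total_cost(labels, scores, cutoff, costs=COSTS):
    """Total cost of the predictions made at the given cutoff."""
    tp, fp, fn, tn = confusion_counts(labels, scores, cutoff)
    return costs['tp'] * tp + costs['fp'] * fp + costs['fn'] * fn + costs['tn'] * tn


def best_cutoff(labels, scores, candidates):
    """Return the candidate cutoff with the lowest total cost."""
    return min(candidates, key=lambda c: total_cost(labels, scores, c))


# Made-up labels (1 = churned) and predicted churn probabilities.
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]
candidates = [i / 100 for i in range(1, 100)]
cutoff = best_cutoff(toy_labels, toy_scores, candidates)
```

Because a missed churner is assumed to cost far more than a false alarm here, the cheapest cutoff ends up below 0.5.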

### 1. Start the Amazon SageMaker example

In this section, you will launch the sample xgboost_customer_churn.ipynb notebook into your notebook instance.

* From the Jupyter notebook home, click SageMaker Examples.
* To open the xgboost_customer_churn example, click Introduction to Applying Machine Learning > xgboost_customer_churn.ipynb, and click Use.
* Click Create copy to copy and launch the example.
* In the first code cell of the notebook, locate the code that sets the bucket. Update the variable with your <b>REPLACE_WITH_YOUR_INITIALS-sagemaker</b> bucket created in the first exercise.


```python
bucket = 'REPLACE_WITH_YOUR_INITIALS-sagemaker'
```

<img src="figures/cap13.png" width=600 />
<img src="figures/cap16.png" width=600 />
<img src="figures/cap14.png" width=600 />
<img src="figures/cap15.png" width=600 />

#### Related code snippets

##### Create the model

```python
from sagemaker import get_execution_role
from sagemaker.session import Session

role = get_execution_role()
bucket = Session().default_bucket()
```

```python
# Shuffle the rows, then split them 70% / 20% / 10% into train, validation, and test sets
shuffled = model_data.sample(frac=1, random_state=1729)
train_data, validation_data, test_data = np.split(
    shuffled, [int(0.7 * len(model_data)), int(0.9 * len(model_data))])
train_data.to_csv('train.csv', header=False, index=False)
validation_data.to_csv('validation.csv', header=False, index=False)
```
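
The two indices passed to `np.split` cut the shuffled rows into three pieces: the first 70% for training, the next 20% for validation, and the remaining 10% for testing. A minimal pure-Python sketch of the same shuffle-and-cut logic follows; the function name and the 100-row stand-in data are assumptions, and the shuffle order will not match what pandas produces for the same seed.

```python
import random


def shuffle_and_split(rows, fractions=(0.7, 0.9), seed=1729):
    """Shuffle rows, then cut them at the given cumulative fractions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    first_cut = int(fractions[0] * len(shuffled))
    second_cut = int(fractions[1] * len(shuffled))
    return shuffled[:first_cut], shuffled[first_cut:second_cut], shuffled[second_cut:]


toy_rows = list(range(100))  # stand-in for the rows of model_data
train_rows, validation_rows, test_rows = shuffle_and_split(toy_rows)
```

Every row lands in exactly one of the three sets, so the held-out test rows are never seen during training or validation.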

```python
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('validation.csv')
```

```python
from sagemaker.amazon.amazon_estimator import get_image_uri
container = get_image_uri(boto3.Session().region_name, 'xgboost')
```

```python
sess = sagemaker.Session()

xgb = sagemaker.estimator.Estimator(container,
                                    role, 
                                    train_instance_count=1, 
                                    train_instance_type='ml.m4.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket, prefix),
                                    sagemaker_session=sess)
```

```python
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        silent=0,
                        objective='binary:logistic',
                        num_round=100)
```

```python
# s3_input_train and s3_input_validation are the S3 input channels defined earlier in the notebook
xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})
```

<img src="figures/cap17.png" width=600 />
<img src="figures/cap18.png" width=600 />

<img src="figures/cap19.png" width=600 />

##### Deploy (create the endpoint) & inference

```python
xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='ml.m4.xlarge')
```

<img src="figures/cap20.png" width=600 />

```python
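from sagemaker.predictor import csv_serializer

# Send CSV to the endpoint and return the raw response bytes
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer
xgb_predictor.deserializer = None

def predict(data, rows=500):
    # Split the data into batches of at most `rows` rows to keep each request small
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        predictions = ','.join([predictions, xgb_predictor.predict(array).decode('utf-8')])

    # Drop the leading comma, then parse the comma-separated scores
    return np.fromstring(predictions[1:], sep=',')

# Column 0 is the label; .values replaces the deprecated DataFrame.as_matrix()
predictions = predict(test_data.values[:, 1:])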
```
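
The batch count in `predict` is `int(n / rows + 1)`, and `np.array_split` then divides the rows into that many near-equal chunks, so no request carries more than `rows` rows. The sketch below reproduces only the chunk-size arithmetic in pure Python; the helper name `batch_sizes` and the example row counts are assumptions for illustration.

```python
def batch_sizes(num_rows, rows=500):
    """Sizes of the chunks produced by splitting num_rows into near-equal batches."""
    num_batches = int(num_rows / float(rows) + 1)  # same formula as predict()
    base, extra = divmod(num_rows, num_batches)
    # The first `extra` batches receive one additional row.
    return [base + 1 if i < extra else base for i in range(num_batches)]


sizes_small = batch_sizes(334)    # fits in a single batch
sizes_exact = batch_sizes(1000)   # an exact multiple of 500 still yields 3 batches
sizes_large = batch_sizes(1200)
```

Note that the `+ 1` means an exact multiple of `rows` is split into one more batch than strictly necessary, which is harmless here.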

```python
pd.crosstab(index=test_data.iloc[:, 0], columns=np.round(predictions), rownames=['actual'], colnames=['predictions'])
```

<img src="figures/cap21.png" width=200 />

```python
plt.hist(predictions)
plt.show()
```

<img src="figures/cap22.png" width=400 />

##### Delete the endpoint

```python
sagemaker.Session().delete_endpoint(xgb_predictor.endpoint)
```

---------------------------------

<font color="red"> To avoid a surprise bill, be sure to stop and delete everything once you finish the exercises. </font>

### 3. End running Jupyter processes, and stop the notebook instance.

* From the Jupyter notebook home, click Running. 
* Click Shutdown next to the terminal and notebook.
* To stop the notebook instance, return to the AWS console, and click Services > Amazon SageMaker to open the Amazon SageMaker dashboard.
* In the left navigation pane, click Notebook instances, and then click Stop next to the edXSageMaker instance.

### 4. Delete the endpoint configuration and model.

#### kmeans

* Return to the Amazon SageMaker dashboard.
* In the left navigation pane, click Endpoints, and ensure the endpoint created by the notebook has been removed.
* In the left navigation pane, for Endpoint configurations, click the endpoint configuration that starts with kmeans.
* For Actions, click Delete.
* Click Delete to confirm.
* In the left navigation pane, for Models, click the model that starts with kmeans.
* For Actions, click Delete.
* Click Delete to confirm.

#### xgboost

* Return to the Amazon SageMaker dashboard.
* In the left navigation pane, click Endpoints, and ensure the endpoint created by the notebook has been removed.
* In the left navigation pane, for Endpoint configurations, click the endpoint configuration that starts with xgboost.
* For Actions, click Delete.
* Click Delete to confirm.
* In the left navigation pane, for Models, click the model that starts with xgboost.
* For Actions, click Delete.
* Click Delete to confirm.
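
The same console cleanup could also be scripted. The following is a minimal sketch, assuming `sm_client` behaves like `boto3.client('sagemaker')` and that the leftover resources are identified by a name prefix such as `kmeans` or `xgboost`; the helper name `delete_by_prefix` is hypothetical, and pagination of the list calls is ignored for brevity.

```python
def delete_by_prefix(sm_client, prefix):
    """Delete endpoint configurations and models whose names start with prefix."""
    deleted = {'endpoint_configs': [], 'models': []}

    configs = sm_client.list_endpoint_configs(NameContains=prefix)['EndpointConfigs']
    for config in configs:
        name = config['EndpointConfigName']
        if name.startswith(prefix):  # NameContains matches anywhere in the name
            sm_client.delete_endpoint_config(EndpointConfigName=name)
            deleted['endpoint_configs'].append(name)

    models = sm_client.list_models(NameContains=prefix)['Models']
    for model in models:
        name = model['ModelName']
        if name.startswith(prefix):
            sm_client.delete_model(ModelName=name)
            deleted['models'].append(name)
    return deleted


# Usage (assumes boto3 and AWS credentials are available):
#   import boto3
#   delete_by_prefix(boto3.client('sagemaker'), 'kmeans')
```

Reviewing the returned names before running this against a shared account is a sensible precaution, since a prefix match may catch resources created by someone else.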

----------------------------------

# References
* [1] edX course : Amazon SageMaker: Simplifying Machine Learning Application Development 
    - https://www.edx.org/course/simplifying-machine-learning-app-development-with-amazon-sagemaker
    - Week 3. Amazon SageMaker Algorithms
* [2] Using Amazon SageMaker built-in algorithms - https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/algos.html
* [3] The Amazon SageMaker examples Git repository at https://github.com/awslabs/amazon-sagemaker-examples