# TensorFlow CPU training

Create a pod file for your cluster. The pod file tells the cluster what to run: this one downloads Keras and runs a Keras example on the TensorFlow framework. Open vi or vim, copy and paste the pod file content, and save the file as tf.yaml. The same pod file works with either TensorFlow or TensorFlow 2; to use it with TensorFlow 2, change the Docker image to a TensorFlow 2 image.



## Clone Deep Learning Containers Repo

In [None]:
#!git clone https://github.com/aws/deep-learning-containers.git

# Set Up Environment Variables

In [None]:
import boto3

aws_region_as_slist=!curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/\(.*\)[a-z]/\1/'
region = aws_region_as_slist.s
print('Region: {}'.format(region))

account_id=boto3.client('sts').get_caller_identity().get('Account')
print('Account ID: {}'.format(account_id))

bucket='sagemaker-{}-{}'.format(region, account_id)
print('S3 Bucket: {}'.format(bucket))

role='arn:aws:iam::{}:role/TeamRole'.format(account_id)
print('SageMaker Role ARN: {}'.format(role))

docker_repo='dlc-demo'
print('Docker Repo Name: {}'.format(docker_repo))
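
The cell above derives the region by trimming the zone letter from the availability zone with `sed`, then builds the bucket and role names from it. A minimal pure-Python sketch of the same derivation follows. The helper names `region_from_az` and `default_names` are hypothetical, and the pattern assumes a standard zone name such as `us-west-2a`.

```python
import re


def region_from_az(availability_zone):
    """Drop the trailing zone letter, e.g. 'us-west-2a' -> 'us-west-2'."""
    # Assumption: standard zone names that end in a digit plus one letter.
    match = re.fullmatch(r"(.+-\d+)[a-z]", availability_zone.strip())
    if match is None:
        raise ValueError(f"unrecognized availability zone: {availability_zone!r}")
    return match.group(1)


def default_names(region, account_id, role_name="TeamRole"):
    """Build the bucket name and role ARN the same way the notebook cell does."""
    return {
        "bucket": f"sagemaker-{region}-{account_id}",
        "role": f"arn:aws:iam::{account_id}:role/{role_name}",
    }


# Placeholder account ID, used only to show the shape of the result.
print(region_from_az("us-west-2a"))
print(default_names("us-west-2", "123456789012"))
```

Raising on an unexpected zone string makes a failed metadata lookup visible right away, instead of letting an empty region flow into the bucket and image names.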

# Log In to ECR

In [None]:
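# Note: `aws ecr get-login` is an AWS CLI v1 command; AWS CLI v2 replaces it with `aws ecr get-login-password` piped into `docker login`.
!$(aws ecr get-login --region $region --registry-ids $account_id --no-include-email)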

## Create Custom ECR Repo `dlc-demo`

In [None]:
!aws ecr describe-repositories --repository-names $docker_repo || aws ecr create-repository --repository-name $docker_repo
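
The shell one-liner above creates the repository only when `describe-repositories` fails. A sketch of the same describe-or-create logic with boto3 follows. The function name `ensure_repository` is hypothetical, and it assumes an ECR client built with `boto3.client('ecr')`.

```python
def ensure_repository(ecr_client, name):
    """Return the repository description, creating the repository if missing."""
    try:
        response = ecr_client.describe_repositories(repositoryNames=[name])
        return response["repositories"][0]
    except ecr_client.exceptions.RepositoryNotFoundException:
        # Only a "not found" error triggers creation; other errors propagate.
        response = ecr_client.create_repository(repositoryName=name)
        return response["repository"]


# Usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   repo = ensure_repository(boto3.client("ecr", region_name=region), docker_repo)
#   print(repo["repositoryUri"])
```

Unlike the `||` form, this version does not try to create the repository when the describe call fails for an unrelated reason such as missing permissions.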

# Pull the Deep Learning Container for TensorFlow 2.1 Training

Available Deep Learning Container Images:  
https://github.com/aws/deep-learning-containers/blob/master/available_images.md


In [None]:
dlc_repo_account_id='763104351884'
print(dlc_repo_account_id)

In [None]:
train_image='763104351884.dkr.ecr.{}.amazonaws.com/tensorflow-training:2.1.0-cpu-py36-ubuntu18.04'.format(region)
print(train_image)

In [None]:
dlc_repo='763104351884.dkr.ecr.{}.amazonaws.com'.format(region)
print(dlc_repo)
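
The three cells above repeat the DLC account ID as a literal in each string. One way to avoid that is a small helper that assembles the registry host and image URI from their parts. This is a sketch: `ecr_registry` and `ecr_image_uri` are hypothetical names, and the `amazonaws.com` suffix is assumed to apply to the region in use.

```python
def ecr_registry(account_id, region):
    """Registry host for an account and region (assumes the amazonaws.com suffix)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id, region, repository, tag):
    """Full image URI in the form <registry>/<repository>:<tag>."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


# Values taken from the cells above; the region here is only an example.
print(ecr_image_uri("763104351884", "us-west-2",
                    "tensorflow-training", "2.1.0-cpu-py36-ubuntu18.04"))
```

The same helper could build the custom image URI later in the notebook from `account_id`, `docker_repo`, and `docker_tag`.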

## Log In to Official DLC Repo

In [None]:
!$(aws ecr get-login --region $region --registry-ids $dlc_repo_account_id --no-include-email)
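
Both login cells shell out to the AWS CLI. As an alternative sketch, the same credentials can be requested through boto3: `get_authorization_token` returns a base64 token that decodes to `username:password`, plus the registry endpoint. The helper names below are hypothetical, and the network call needs valid AWS credentials.

```python
import base64


def decode_ecr_token(authorization_token):
    """Split a base64 ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def ecr_docker_credentials(region, registry_id):
    """Fetch (username, password, endpoint) for one registry via boto3."""
    import boto3  # imported here so decode_ecr_token works without boto3

    ecr = boto3.client("ecr", region_name=region)
    response = ecr.get_authorization_token(registryIds=[registry_id])
    auth = response["authorizationData"][0]
    username, password = decode_ecr_token(auth["authorizationToken"])
    return username, password, auth["proxyEndpoint"]
```

The returned values can be passed to `docker login --username <username> --password-stdin <endpoint>`, with the password supplied on standard input.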

## Pull DLC 

In [None]:
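# Free disk space before pulling: removes all stopped containers, unused networks, unused images, and build cache without prompting.
!docker system prune -a -f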

In [None]:
!docker images

In [None]:
!docker pull $train_image

In [None]:
!docker images

# Extend DLC to your needs

## Build Container

In [None]:
!pygmentize ./Dockerfile

In [None]:
docker_repo = 'dlc-demo'
docker_tag = 'bert'

bert_image_uri = f'{account_id}.dkr.ecr.{region}.amazonaws.com/{docker_repo}:{docker_tag}'
print(bert_image_uri)

In [None]:
!docker images

In [None]:
!docker build --pull --no-cache -t $docker_repo:$docker_tag -f ./Dockerfile .

In [None]:
!docker inspect $docker_repo:$docker_tag

In [None]:
!docker images

# Push Container To ECR

In [None]:
!docker tag $docker_repo:$docker_tag $bert_image_uri

In [None]:
!docker images

In [None]:
!docker push $bert_image_uri

In [None]:
!aws ecr list-images --repository-name $docker_repo
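
The `list-images` call above prints every image in the repository. To check programmatically that the pushed tag is present, a sketch with boto3 could collect the tags across pages. `pushed_tags` is a hypothetical helper, and it assumes an ECR client built with `boto3.client('ecr')`.

```python
def pushed_tags(ecr_client, repository):
    """Collect all image tags in a repository, following nextToken pagination."""
    tags = set()
    kwargs = {"repositoryName": repository}
    while True:
        page = ecr_client.list_images(**kwargs)
        for image_id in page.get("imageIds", []):
            # Untagged images carry only a digest, so skip them.
            if "imageTag" in image_id:
                tags.add(image_id["imageTag"])
        token = page.get("nextToken")
        if not token:
            return tags
        kwargs["nextToken"] = token


# Usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   assert docker_tag in pushed_tags(boto3.client("ecr", region_name=region), docker_repo)
```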

# Define Training Job
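
The contents of `bert.yaml` are not reproduced in this notebook, so the following is not that file. It is a minimal sketch of what a pod definition for a single training container generally looks like, built as a Python dictionary and printed as JSON, which `kubectl create -f` also accepts. The function name, the image placeholder, and the `python train.py` command are all hypothetical.

```python
import json


def training_pod_manifest(name, image, command):
    """Build a minimal single-container Pod manifest (illustrative only)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Never restart, so a finished training run is not started again.
            "restartPolicy": "Never",
            "containers": [
                {"name": name, "image": image, "command": list(command)}
            ],
        },
    }


manifest = training_pod_manifest(
    name="bert",
    image="<account_id>.dkr.ecr.<region>.amazonaws.com/dlc-demo:bert",  # placeholder
    command=["python", "train.py"],  # hypothetical entry point
)
print(json.dumps(manifest, indent=2))
```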

# Create Training Job

In [None]:
!pygmentize bert.yaml

In [None]:
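# Remove any pod left over from a previous run; --ignore-not-found avoids an error when none exists.
!kubectl delete -f bert.yaml --ignore-not-found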

In [None]:
!kubectl create -f bert.yaml

In [None]:
!kubectl get pod bert
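
Instead of rerunning `kubectl get pod` by hand, the pod phase can be read from the JSON output and polled until the pod finishes. The sketch below assumes `kubectl` is on the PATH and configured for the cluster. The helper names and the default intervals are hypothetical.

```python
import json
import subprocess
import time

# Phases after which a pod will not run any further.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def pod_phase(pod_json):
    """Extract status.phase from `kubectl get pod <name> -o json` output."""
    return json.loads(pod_json).get("status", {}).get("phase", "Unknown")


def get_pod_phase(name):
    """Ask kubectl for the pod and return its current phase."""
    result = subprocess.run(
        ["kubectl", "get", "pod", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return pod_phase(result.stdout)


def wait_for_pod(name, poll_seconds=10, timeout_seconds=3600):
    """Poll until the pod reaches a terminal phase or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        phase = get_pod_phase(name)
        if phase in TERMINAL_PHASES:
            return phase
        time.sleep(poll_seconds)
    raise TimeoutError(f"pod {name!r} did not finish within {timeout_seconds}s")
```

Calling `wait_for_pod('bert')` would then return `Succeeded` or `Failed` once training ends, which is easier to act on in a script than the human-readable table.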

In [None]:
!kubectl describe pod bert

In [None]:
!kubectl logs -f bert

In [None]:
!kubectl get pods 

In [None]:
!pygmentize bert.yaml

In [None]:
!kubectl delete -f bert.yaml

In [None]:
!kubectl get pods bert

In [None]:
!kubectl create -f bert.yaml

In [None]:
!kubectl get pods bert

In [None]:
!kubectl describe pods bert

In [None]:
!kubectl logs -f bert