Welcome to the re:Invent 2021 Inferentia Workshop!
In this workshop, you will walk through three exercises to help you gain experience with AWS Inferentia. First, you will launch an AWS EC2 Inf1 instance. Next, you will compile and prepare a model for deployment. Lastly, you will deploy the model and analyze its performance on Inferentia. Each exercise is contained in a subdirectory; however, the exercises must be completed in order, since later exercises depend on artifacts produced by earlier ones.
First we will provision an Inferentia instance, then we will build a deep learning container and use it to compile and run a model on the Inferentia chip, and finally we will use AWS Batch to automatically run a model on Inferentia and benchmark it with different batch sizes.
Before we get started, you will need to set up your AWS account and open a CloudShell terminal. You will use this terminal for the rest of the workshop. You will not be required to install anything on your computer, as all of the exercises will be completed on the cloud through your browser.
For this workshop, you may use your own AWS account or an account generated by AWS Event Engine. If you are using your own AWS account, please proceed to Section 0.2. If you would like to receive a temporary AWS account through AWS Event Engine, follow these steps:
- Go to the Event Engine link provided by your workshop host
- Follow the on-screen instructions to gain access to your temporary AWS account
Once you have completed these steps, proceed to Section 0.2.
Once logged into the account through the AWS console, navigate to IAM Users and add a new user by clicking the Add users button and filling out the form as shown below. Use inferentia_user as the User Name.
Click Next: Permissions and click the Create group button on the screen.
Provide group name admins and select the AdministratorAccess policy as shown below.
Click the Create group button and you will be brought back to the Set permissions screen. Select the admins group as shown in the figure below, then click Next: Tags.
Follow the wizard through to the end to create the user (remaining options can be left as default). When the user is added successfully, you will see a confirmation screen from which you can copy the user's Access Key and Secret Access Key.
Click the Download .csv button to download the user's credentials as a .csv
file. Alternatively, you can press the Show link and copy/paste the Access key ID and Secret access key locally. You will need to enter these credentials later while completing the exercises in this workshop. This is the only time these credentials will be available for download or display. You will be able to generate new credentials if necessary.
In this step you will sign in to the AWS Console as the user you just created. Pull down the user menu from your current AWS Console screen and copy the Account number displayed next to My Account as shown on the figure below.
Once you have copied the account number, click Sign Out, then click Sign In to the Console.
On the Sign in screen select IAM user, enter the Account ID that you just copied, and click Next.
When presented with the login screen shown below, fill in the IAM username and password that you created in the previous step.
Next, click the Sign in button and sign in as the new IAM user.
Please verify that the us-west-2 (Oregon) region is selected in your console and is showing in the upper right corner of your browser, as highlighted in the figure below. We will use CloudShell to execute the steps in this workshop. To provision a CloudShell, pull down the Services menu (from the top left of the screen), then select Developer Tools and choose CloudShell, or just open the following link to CloudShell.
Your CloudShell instance will launch automatically and you will be dropped into a web terminal.
Note: It may take a few minutes to prepare the environment if this is your first time using it. It is important to ensure that your CloudShell is started in the correct region, since the AMI used in this workshop is region-specific by default.
Clone the workshop repository into your CloudShell.
git clone https://github.com/aws-samples/aws-reinvent21-inf1-workshop
Setup for the workshop is now complete and you are ready to proceed with the exercises below.
In this exercise you will launch an Inferentia instance, connect to it, and become familiar with several useful Inferentia-specific utilities. This repository supports two ways of launching an Inferentia EC2 instance:
- Configuring and creating an EC2 launch template, then using the template to launch an instance via
./template-launch.sh
. In part 3 of the workshop, we will need to use this launch template in order to configure an AWS Batch job. That is why we will use this method for the workshop. All of the scripts and configuration required to complete the task are located in directory 1-ec2-instance
- Directly launching the instance via the AWS CLI using the
./launch.sh
script, according to configuration provided in ec2.conf
. This method is provided as an example, but will not be used in this workshop.
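For orientation, the launch-template data that a script like ./template-config.sh generates for aws ec2 create-launch-template generally takes the following shape. This is a hypothetical, minimal fragment with placeholder values; the field names match the EC2 RequestLaunchTemplateData structure, but the actual template produced by the workshop scripts is derived from ec2.conf and will contain more settings.

```json
{
  "ImageId": "ami-0123456789abcdef0",
  "InstanceType": "inf1.xlarge",
  "KeyName": "reinvent21-inf1-workshop",
  "SecurityGroupIds": ["sg-0123456789abcdef0"],
  "TagSpecifications": [
    {
      "ResourceType": "instance",
      "Tags": [{"Key": "Name", "Value": "reinvent21-inf1-workshop"}]
    }
  ]
}
```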
To configure the EC2 instance, open your CloudShell window and execute:
cd aws-reinvent21-inf1-workshop/1-ec2-instance
./config.sh
The ./config.sh
script will ask you to enter the following:
- Access key id from Section 0
- Secret access key from Section 0
- Enter us-west-2 for the Region
- Enter json for the Output Format
The configuration file ec2.conf
will automatically be opened in the vi editor. The configuration file is shown so that you can become familiar with its content. Note that the EC2_SUBNET_ID setting is left blank. This means that the instance can be launched in any availability zone and public subnet of your VPC where there is capacity. The configuration can be restricted to a single subnet by specifying a subnet ID. You can see a list of the VPCs and subnets in your account by executing the scripts ./vpc-list.sh
and ./subnet-list.sh
. These scripts are provided here for your information and exploration.
You are not required to change any of the values in the ec2.conf
file. After you are done reviewing it, you can exit vi
by pressing Esc : q Enter
.
Once you exit the vi
editor, the script ./template-config.sh
will be executed and will generate a launch template configuration based on the content of the ec2.conf
file. The template file will be printed on the screen.
Review the template file. If you would like to make any changes, you can repeat the config step or manually edit the generated launch template file before using it in the next step.
Next, execute the following two scripts to create the template and use it to launch an AWS EC2 Inf1 instance.
./template-create.sh
./template-launch.sh
An instance will be provisioned and you will see a message like the one below:
Instance i-xxxxxxxxxxxxxxxxx launched from template reinvent21-inf1-workshop
Note:
Depending on current workloads and your subnet mapping, the availability zone in which your subnet is located may not have the requested Inferentia instance type available. In that case, an instance will not be provisioned and you will receive an InsufficientInstanceCapacity error. If this occurs, execute ./subnet-list.sh
and select a different subnet than the one currently in use, set the EC2_SUBNET_ID
value in ec2.conf
and run ./template-delete.sh
, ./template-config.sh
, before running ./template-create.sh
and ./template-launch.sh
again.
To monitor the status of your instance, execute the following command:
watch ./list.sh
Once the instance enters status running
please wait for about 3 minutes to allow the instance to start fully and update its Neuron SDK deployment. You can then press Ctrl-C
to return to the command prompt. We are now ready to connect to the instance.
As mentioned at the beginning of Section 1, there are different ways to connect to an Inferentia instance, but in this exercise we will just run the ./connect.sh
script.
./connect.sh
Note:
CloudShell sessions have time limits and will automatically terminate after 20-30 minutes of inactivity. You can start a new session immediately after your session is terminated, however the new CloudShell will have a different IP address. In order to allow connections to the inf1 EC2 instance from the new IP address, before running the ./connect.sh
script, execute ./authorize.sh
.
Once connected to the Inferentia instance, let's become familiar with some utilities that can be used to monitor utilization of the instance resources.
This utility is installed as part of the Neuron SDK. It lists the available Inferentia processors (a.k.a. Neuron devices) on your AWS EC2 Inf1 instance.
neuron-ls
This utility shows the CPU and memory utilization on your AWS EC2 Inf1 instance. By default htop
is not installed, as it is a generic Linux utility and not part of the Neuron SDK. Please install it by running the command below:
sudo yum install -y htop
This utility shows the utilization of the Neuron cores on your AWS EC2 Inf1 instance.
Before we proceed further, let's split the CloudShell window into two rows. In the upper right corner select Actions->Split into rows
.
Execute the following commands in the bottom row of CloudShell to establish a second connection to the Inferentia instance:
cd aws-reinvent21-inf1-workshop/1-ec2-instance
./connect.sh
When both the top and bottom shell windows are connected, in the top row execute
htop
and in the bottom row execute
neuron-top
This allows us to monitor the utilization of both CPU and Neuron resources of the Inferentia EC2 instance in real-time.
We will learn more about these metrics in the next exercise.
You have completed Exercise 1!
In this exercise you created an AWS EC2 Inf1 instance and learned how to monitor utilization of its CPU and Neuron resources.
Note: Leave this window open so you can see the utilization of the Neuron cores while we run a model on this instance.
Building and running deep learning containers is essential to the deployment of self-managed ML workloads. In this exercise, you will build a Docker image which contains the Neuron SDK, as well as all necessary drivers and packages required to successfully compile and run models on Inferentia. You will also walk through the manual process of compiling, running, and benchmarking a model. Lastly, you will push the container image to Amazon Elastic Container Registry (ECR) for use in the next section and then terminate the instance.
Since we concluded the previous exercise by launching utilization monitors in CloudShell, we will keep that window open and use a different way to connect to the Inferentia instance for this part of the workshop.
Open the EC2 Console and select the Inferentia instance, then click the Connect button and select the Session Manager tab.
Using Session Manager to connect opens a terminal in your browser logged into the instance as ssm-user
.
To login as ec2-user
and set the current directory to the user's home, execute:
sudo su ec2-user
cd ~
To start, clone the workshop repository into the AWS EC2 Inf1 instance.
git clone https://github.com/aws-samples/aws-reinvent21-inf1-workshop
All of the scripts needed for this part of the workshop are located in directory 2-dl-container.
cd aws-reinvent21-inf1-workshop/2-dl-container
To prepare the exercise for execution, we will first run the ./config.sh
script.
./config.sh
The ./config.sh
script will ask you to enter the following:
- Access key id from Section 0
- Secret access key from Section 0
- Enter us-west-2 for the Region
- Enter json for the Output Format
After configuring the aws client, the config.sh
script will open the dlc.conf
file for your review and editing in vi
. If you wish to change anything in the file, press i
to enter insert mode and then edit the desired values. When done reviewing/editing, save the file and exit vi
by pressing Esc : wq Enter
. Finally the config script will ensure an ECR registry exists for the image name specified in dlc.conf
.
Optionally, you can verify that the image repository was created successfully by reviewing the list of repositories in Elastic Container Registry through the AWS Console.
Next, execute the following sequence of scripts to build and push the deep learning container image to ECR. Note: This step may take up to 12 minutes to complete.
./build.sh
./login.sh
./push.sh
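For orientation, a deep learning container image for Inferentia is typically defined by a Dockerfile along these lines. This is a hypothetical, minimal sketch, not the workshop's actual Dockerfile; the pip index URL and the torch-neuron and neuron-cc package names follow the Neuron SDK documentation for PyTorch on Inf1, while the base image, system packages, and layout here are assumptions.

```dockerfile
# Hypothetical minimal sketch of an Inferentia-ready image; the workshop's
# actual Dockerfile contains additional setup.
FROM amazonlinux:2

RUN yum install -y python3 python3-pip gcc-c++ && yum clean all

# torch-neuron runs traced models on Neuron cores; neuron-cc compiles them.
# The extra index URL is the Neuron SDK pip repository.
RUN pip3 install --extra-index-url=https://pip.repos.neuron.amazonaws.com \
    torch-neuron "neuron-cc[tensorflow]" transformers

WORKDIR /app
COPY bert/ /app/bert/
CMD ["bash"]
```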
Let us now start the container locally and manually run a model benchmark. For a detailed explanation of how the benchmarking code works, please refer to the benchmarking job documentation.
Execute
./run.sh bash
and you will be dropped into a shell within the container.
To compile the BERT
model, change the current directory to bert
and run the compile_model-inf.py
script.
cd bert
python3 compile_model-inf.py
Notice that upon successful completion, compile_model-inf.py produces a `.pt` file, which is the compiled model serialized as TorchScript.
To evaluate the model performance on Inferentia, we will run the direct_benchmark-inf.py
script.
While executing the benchmark script, monitor your CloudShell window where htop
and neuron-top
are running to observe the Neuron core utilization while the test is running on the Inferentia processor.
Note:
If your CloudShell session has timed out, you may have to start a new session and split it again into 2 shell rows, then from the first shell execute cd aws-reinvent21-inf1-workshop/1-ec2-instance; ./authorize.sh; ./connect.sh
and from the second shell execute cd aws-reinvent21-inf1-workshop/1-ec2-instance; ./connect.sh
.
ls -alh
python3 direct_benchmark-inf.py
The top part of the screen shows vCPU and system memory utilization, while the bottom part shows the utilization of the Neuron cores as well as aggregated system metrics. We can see that the benchmark test is running on the Neuron cores and they are being utilized at close to 95%.
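The core idea behind a direct benchmark like this can be sketched as follows. This is a minimal, self-contained illustration with a stand-in workload in place of the real traced model; the function and variable names are hypothetical, not taken from the workshop's direct_benchmark-inf.py script.

```python
import time
import statistics

def benchmark(infer_fn, batch, num_requests=100):
    """Measure per-request latency and overall throughput of infer_fn."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_requests):
        t0 = time.perf_counter()
        infer_fn(batch)  # one inference request
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "throughput": num_requests * len(batch) / elapsed,  # items/sec
        "p50_latency_ms": statistics.median(latencies) * 1e3,
    }

# Stand-in for a compiled model: just sums the batch.
result = benchmark(lambda b: sum(b), batch=[1.0] * 8)
print(sorted(result))  # → ['p50_latency_ms', 'throughput']
```

The same loop shape applies whether the workload is a toy function or a Neuron-traced model; only infer_fn changes.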
To wrap up this part of the workshop, we are going to stop the container and terminate the EC2 instance by executing the following:
exit
./stop.sh
The exit command leaves the container's shell, and ./stop.sh stops the container.
Close the Session Manager window. Go back to the CloudShell window and exit the monitoring tools by pressing q
or Ctrl-C
in each of the shell windows to return to the command shell. You may also close one of the shell panes, or just refresh the browser window.
To terminate the AWS EC2 Inf1 instance, execute the following in CloudShell:
cd ~/aws-reinvent21-inf1-workshop/1-ec2-instance
./terminate.sh
./list.sh
Execute
watch ./list.sh
and wait until the status of the instance shows as terminated
. Then press Ctrl-C
to return to the command shell.
Note:
The AWS EC2 Inf1 instance must be terminated prior to proceeding to the next exercise. This is necessary in order to conform to a default service quota for Inf1 instances in Event Engine.
You have completed the second exercise of the workshop!
In this exercise you built a deep learning container, pushed the container image to ECR, and ran it locally to trace an NLP model and run it on Inferentia while monitoring resource utilization of the instance. You have also terminated the instance to prepare your account for the next exercise.
In the next exercise you will run a few instances of the model, using the image you pushed to ECR.
In this exercise you will use scripts that utilize the AWS CLI to create an Inferentia compute environment in AWS Batch. You will submit benchmark jobs with different batch sizes to this compute environment. A model will be compiled for each batch size and uploaded to S3. A benchmark log will also be saved for each batch size and uploaded to S3. When all models and logs have been uploaded, you will run a report that parses the logs and extracts metrics that help you compare the performance of the model at different request batch sizes.
The scripts used in this part of the tutorial are located in directory 3-batch-performance
; however, artifacts and configuration from the previous two exercises are used in this section as well. The configuration settings for this section are available in the file batch.conf
. The instance types and batch sizes configured by default have been optimized for use with AWS Event Engine accounts.
To complete this exercise, we will use the following workflow:
Setup batch compute environment ---> Submit batch jobs ---> Monitor batch jobs until they complete ---> Report and analyze batch job results ---> Clean up batch compute environment
To set up the compute environment, execute:
cd ~/aws-reinvent21-inf1-workshop/3-batch-performance
./run.sh setup
Optionally, you can view the compute environment in the AWS console by visiting AWS Batch > Compute Environments
Next, we will compile and benchmark the BERT
model for batch sizes 2, 4, and 8.
To submit the benchmark jobs, execute:
./run.sh submit
Once the jobs have been successfully submitted, execute the following command to monitor their status:
watch ./status.sh
Execution of each of the jobs may take between 5 and 10 minutes. If for any reason you would like to cancel any non-finished jobs, execute the ./stop.sh
script.
While waiting for the jobs to finish running, you may explore Batch through the AWS Console. The compute environment you created will be listed under Compute environments. You can also review the Job queues, Job definitions, and Jobs.
To see the current jobs, toggle "Searching and filtering" to Enable, then select the job queue from the pull-down.
If you select a job that is in status RUNNING
, SUCCEEDED
, or FAILED
, you will see a link to the job's CloudWatch log stream.
Following the log stream name link will open the job's CloudWatch log.
If you wish to watch the log stream, scroll to the bottom of the log and click the Resume
link to periodically retrieve new events.
Once the jobs are in status SUCCEEDED
, press Ctrl-C
in CloudShell to return to the command prompt.
The traced models and job execution logs have been uploaded to S3.
To parse the logs and produce a summary report, execute:
./run.sh report
A CSV report will be produced, comparing performance of the model with the different batch sizes. The report will also be uploaded to S3 and saved alongside the compiled models and logs.
Inspecting the benchmarking report, we can see that increasing the batch size increases the throughput, but also increases the latency of the model responses.
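The report step described above boils down to extracting per-batch-size metrics from the logs and tabulating them as CSV. The sketch below illustrates the idea with a hypothetical log format and made-up numbers; the actual log lines produced by the workshop's benchmark scripts, and the columns of the real report, will differ.

```python
import csv
import io
import re

# Hypothetical log excerpts, one per batch-size job (values are made up;
# the real log format produced by the workshop scripts may differ).
logs = [
    "batch_size=2 throughput=412.0 p50_latency_ms=4.8",
    "batch_size=4 throughput=689.5 p50_latency_ms=5.8",
    "batch_size=8 throughput=901.2 p50_latency_ms=8.9",
]

pattern = re.compile(
    r"batch_size=(\d+) throughput=([\d.]+) p50_latency_ms=([\d.]+)")

rows = []
for line in logs:
    m = pattern.search(line)
    if m:
        rows.append({"batch_size": int(m.group(1)),
                     "throughput": float(m.group(2)),
                     "p50_latency_ms": float(m.group(3))})

# Write the parsed metrics as a CSV report, sorted by batch size.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["batch_size", "throughput", "p50_latency_ms"])
writer.writeheader()
writer.writerows(sorted(rows, key=lambda r: r["batch_size"]))
report = out.getvalue()
print(report)
```

Even with these made-up numbers, the pattern matches the observation above: throughput rises with batch size, and so does latency.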
If you wish to explore more details about any of the lines of the report, review the content of the corresponding .log
and .json
file in the current directory.
The .log
files are produced by using the direct benchmark scripts, while the .json
files are produced using the neuronperf
benchmark package, which is part of the Neuron SDK. The two benchmarks are independent; they are included together for completeness and to demonstrate the different ways available to measure the performance of models running on Inferentia.
Now that you have completed optimizing the performance of the natural language processing BERT
model on Inferentia, as a bonus task, configure the batch jobs to do the same for the computer vision model RESNET
. You can start by editing the DOE_MODEL_FAMILY
setting in batch.conf
. Then repeat steps 3.1 to 3.4 to generate a report which includes RESNET
.
Finally, let's clean up the compute environment by executing the following:
./run.sh cleanup
Note:
If any error messages occur while the compute environment is being cleaned up, they are likely due to variable timing when disabling and deleting components of the environment. Interrupting and re-running the cleanup step should resolve errors of this kind.
You have completed the third and last exercise!
In this exercise you used AWS Batch to explore how model performance is impacted by the choice of request batch size. Following a similar approach, by making configuration changes in the DOE section of batch.conf, you can expand the exploration to other model parameters, instance sizes, and EC2 instance types.
Congratulations!
You have completed the re:Invent 2021 Inferentia workshop!
Through the exercises in this workshop you gained experience in provisioning Inferentia EC2 instances, compiling and running models on Inferentia using containers, and optimizing model performance with the help of AWS Batch. Although this workshop's examples focused on a BERT-based NLP model, please note that the same approach can be used for deployment and evaluation of other self-managed AI & ML models on AWS Inferentia.
- AWS CloudShell
- EC2 inf1 instance types
- Neuron SDK
- Docker
- Elastic Container Registry
- Elastic Container Service
- AWS Batch
- AWS re:Invent 2021
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.