abzdel/Hugging_Face_AWS_Tutorial

Hosting Hugging Face Models on Amazon SageMaker

Welcome to this guide on hosting your own inference environment for Hugging Face models using Amazon SageMaker! Whether you're a seasoned developer or just getting started, this tutorial walks you through the steps to deploy and serve a model. By the end, you'll have a working setup on Amazon SageMaker that you can reuse to deploy models for real-world applications.

For our application, we'll be hosting our own instance of a GPT-2 model (DistilGPT2) for text generation!

Why Amazon SageMaker? Why Hugging Face?

SageMaker offers managed solutions for the entire machine learning lifecycle. It provides a hassle-free environment to deploy, manage, and scale your models. If you're looking for ways to get your models into production without needing to worry too much about the underlying complexities of model hosting, SageMaker is your tool.

As for Hugging Face: you and I don't have millions of dollars to throw at training advanced neural networks with billions of parameters (at least not yet 😉). Luckily for us, Hugging Face has changed the game by hosting pre-trained models that we can download, fine-tune, and then serve either in Hugging Face Spaces or on AWS. It makes open-source models built with cutting-edge methods widely available. I believe expertise in using pre-trained models will grow in demand for MLOps practitioners in the near future, as the approach is simply too efficient and cost-effective to ignore.

What You'll Learn

  • Setting up an Amazon SageMaker instance
  • Preparing your Hugging Face model for deployment
  • Deploying your model on SageMaker with ease
  • Accessing your deployed model

Prerequisites

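Before getting started, ensure you have:

  • An AWS account with access to the IAM and SageMaker consoles
  • The AWS CLI installed (Step 1.1 uses aws configure)
  • Python installed, to run host.py and query.py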

Step 1: Authenticate your Environment

  • For this, we need an IAM user and an IAM role
  • Skip to Step 2 if you already have these set up

1.1 Set Up IAM User

To authenticate a command line tool like this one, we need to access AWS through access key credentials. Head over to your AWS console and go to IAM Users. Click on the yellow "Create User" button on the top right.

Make sure you check the box that says "Provide Access to the Management Console". For our purposes, we'll tick the second box to create an IAM User.

To keep things simple, we won't require a new password for the new user's first sign-in.

As for permissions, let's attach policies directly and look for "AmazonSageMakerFullAccess".

Navigate through the remainder of user creation. Once AWS takes you back to your list of all IAM Users, select the IAM User you just created. Navigate to the security credentials tab.

Scroll down a bit to create access keys. When prompted for a use case, select command line interface (should be the first option).

Now, we have an access key and a secret key.

We can now authenticate our environment using these keys. In your terminal, run aws configure, paste in your access key and secret key when prompted, and select your desired region (we will use us-east-1).
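If you'd like to confirm what aws configure stored, here is a minimal sketch that reads the files the CLI writes. It assumes the CLI's default locations (~/.aws/credentials and ~/.aws/config); the function name read_profile is hypothetical and not part of this repo.

```python
import configparser
from pathlib import Path


def read_profile(credentials_path=None, config_path=None, profile="default"):
    """Return the masked access key ID and region stored by `aws configure`."""
    aws_dir = Path.home() / ".aws"  # assumed default location used by the AWS CLI
    credentials_path = credentials_path or aws_dir / "credentials"
    config_path = config_path or aws_dir / "config"

    creds = configparser.ConfigParser()
    creds.read(credentials_path)
    conf = configparser.ConfigParser()
    conf.read(config_path)

    if not creds.has_section(profile):
        raise KeyError(f"profile {profile!r} not found in {credentials_path}")

    key_id = creds.get(profile, "aws_access_key_id")
    # In the config file, non-default profiles are stored as "profile <name>".
    conf_section = profile if profile == "default" else f"profile {profile}"
    region = conf.get(conf_section, "region", fallback=None)

    # Never print the full key; keep only the last four characters visible.
    masked = "*" * (len(key_id) - 4) + key_id[-4:]
    return {"access_key_id": masked, "region": region}
```

Calling read_profile() with no arguments checks the default profile; if the region comes back as None, rerun aws configure and set it.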



1.2 Set Up IAM Role

Head over to your AWS console and go to IAM Roles. Click on the yellow "Create Role" button on the top right.

Select AWS service, and SageMaker when prompted. It should automatically attach the SageMaker Full Access policy. Name your role, and continue with the default settings.

Select the role you just made and copy your ARN. Export this to an environment variable.
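A script can then pick the ARN up from the environment. The sketch below is one way to do that with a basic sanity check; the variable name SAGEMAKER_ROLE_ARN is a placeholder I've assumed for illustration, so substitute whatever name you exported the ARN under.

```python
import os
import re

# Hypothetical variable name; use the name you actually exported.
ROLE_ENV_VAR = "SAGEMAKER_ROLE_ARN"

# IAM role ARNs look like arn:aws:iam::<12-digit account id>:role/<role name>
_ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def load_role_arn(env=os.environ, var=ROLE_ENV_VAR):
    """Read the role ARN from the environment and check that it is well formed."""
    arn = env.get(var)
    if not arn:
        raise RuntimeError(f"{var} is not set; export your role ARN first")
    if not _ROLE_ARN.match(arn):
        raise ValueError(f"{var} does not look like an IAM role ARN: {arn!r}")
    return arn
```

Failing early here is cheaper than discovering a mistyped ARN several minutes into a deployment.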

Step 2: Setting up Deployment Steps

Choose the Hugging Face model you'd like to host. For this project, we'll host our own version of DistilGPT2.

I've split this process into two files, starting from the default Python deployment file you can find on Hugging Face and making the following changes:

  • host.py
    • changed the top few lines to load in Role from environment variable
    • added some print statements
    • removed model querying - this script should only deploy the model to SageMaker
  • query.py
    • this takes command line input that will be passed to our model
    • we're using a text generation model, so this will tell the model what to generate

I recommend using these two Python files, but if you'd like to create your own, you can find the boilerplate on your model's Hugging Face page.

On the right side of the page, look for Deploy->Amazon SageMaker. From here, you can copy and paste the Python code into your own file and make changes as you see fit.
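For orientation, a deploy-only script in the spirit of host.py might look roughly like the sketch below. This is not the repo's exact file: it uses the SageMaker Python SDK's HuggingFaceModel class, and the container versions and instance type are assumptions, so prefer the values shown on your model's Deploy->Amazon SageMaker panel.

```python
def build_hub_env(model_id="distilgpt2", task="text-generation"):
    """Environment variables that tell the Hugging Face container what to load."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


def deploy(role_arn, instance_type="ml.m5.xlarge"):
    """Create the model and a real-time endpoint; return the endpoint name."""
    # Imported here so build_hub_env works without the SageMaker SDK installed.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=build_hub_env(),
        role=role_arn,
        # Assumed version combination; copy the one from your model's page.
        transformers_version="4.26.0",
        pytorch_version="1.13.1",
        py_version="py39",
    )
    print("Deploying model, this can take a few minutes...")
    # instance_type is an assumption; larger models need larger instances.
    predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)
    print(f"Endpoint ready: {predictor.endpoint_name}")
    return predictor.endpoint_name
```

You would call deploy() with the role ARN read from your environment variable. Nothing is queried here, which matches the split between deploying and querying described above.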

Step 3: Host your model

Now comes the fun part. Run python host.py. This will take a few minutes, since this is the step that deploys your model to SageMaker for inference.

Step 4: Query your model

You may need to open query.py and change endpoint_name to your model endpoint's name. To find this, go to your AWS Management Console and head over to SageMaker. On the left side, look for Inference->Endpoints, and copy the name of the endpoint you just created via the host.py script.
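If you'd rather not click through the console, the endpoint name can also be looked up with boto3. The sketch below is an optional alternative, not something the repo's scripts do; the helper names are hypothetical, and it assumes the most recently created InService endpoint is the one you just deployed.

```python
def latest_in_service(list_endpoints_response):
    """Pick the newest InService endpoint name from a ListEndpoints response."""
    endpoints = [
        e for e in list_endpoints_response.get("Endpoints", [])
        if e.get("EndpointStatus") == "InService"
    ]
    if not endpoints:
        return None
    return max(endpoints, key=lambda e: e["CreationTime"])["EndpointName"]


def find_endpoint_name(region="us-east-1"):
    """Ask SageMaker for its endpoints and return the newest one in service."""
    import boto3  # imported lazily so the helper above has no dependencies

    client = boto3.client("sagemaker", region_name=region)
    response = client.list_endpoints(SortBy="CreationTime", SortOrder="Descending")
    return latest_in_service(response)
```

If several endpoints exist in your account, check the returned name against the console before querying it.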

Then, run query.py and see your deployed text generation model in action.
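As a rough picture of what a query script does, here is a minimal sketch that sends a prompt to the endpoint through boto3's sagemaker-runtime client. It is not the repo's query.py: it takes the endpoint name as a command line argument rather than a variable in the file, and it assumes the response has the usual text-generation shape, a list containing a generated_text field. The optional max_new_tokens parameter is likewise an assumption about what the container forwards to the model.

```python
import argparse
import json


def build_payload(prompt, max_new_tokens=None):
    """Serialize the request body expected by the Hugging Face container."""
    payload = {"inputs": prompt}
    if max_new_tokens is not None:
        # Assumed to be forwarded to the text-generation pipeline.
        payload["parameters"] = {"max_new_tokens": max_new_tokens}
    return json.dumps(payload)


def parse_generated_text(raw_body):
    """Extract the text from a response shaped like [{"generated_text": ...}]."""
    return json.loads(raw_body)[0]["generated_text"]


def query(endpoint_name, prompt, region="us-east-1"):
    """Invoke the endpoint and return the generated text."""
    import boto3  # imported lazily so the helpers above have no dependencies

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    return parse_generated_text(response["Body"].read())


def main(argv=None):
    parser = argparse.ArgumentParser(description="Query a SageMaker text generation endpoint")
    parser.add_argument("endpoint_name")
    parser.add_argument("prompt")
    args = parser.parse_args(argv)
    print(query(args.endpoint_name, args.prompt))
```

Calling main() from a script lets you pass the endpoint name and a quoted prompt on the command line.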

As can be seen, the output is truncated due to model constraints, and it isn't what we'd consider ideal given our input. This is a drawback of using a smaller model. For larger models, we'd either have to increase the instance size or look to another inference solution altogether.

Step 5: IMPORTANT!! Delete your resources!

Once you're done using your model, you don't want to be billed for idle resources. Navigate to SageMaker on the AWS Management Console and scroll down the left-hand side until you find "Inference". Delete the Model, the Endpoint, and the Endpoint Configuration.
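The same cleanup can be scripted. The sketch below is an optional alternative to the console steps, assuming you know the endpoint name; the function name is hypothetical. It looks up the endpoint configuration and the models behind it first, then deletes all three kinds of resource using boto3's SageMaker client.

```python
def delete_endpoint_resources(client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models it serves.

    `client` is a boto3 SageMaker client, e.g. boto3.client("sagemaker").
    """
    # Look everything up before deleting, while the references still exist.
    endpoint = client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        client.delete_model(ModelName=name)

    return {
        "endpoint": endpoint_name,
        "endpoint_config": config_name,
        "models": model_names,
    }
```

Afterwards, it's still worth glancing at the Inference section of the console to confirm nothing is left running.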

Fortunately, the Python files in this project make the setup highly reproducible, and this solution can be spun up again in minutes.

About

Zero to One: How to find pre-trained models on Hugging Face, host them on AWS SageMaker, and query them for your outputs.
