# Get started with SageMaker Studio

## Table of contents<span id="toc"></span>

1. [Introduction](#introduction)
1. [Navigating Studio](#navigating-studio)
1. [Pricing](#pricing)
1. [Creating notebooks, source code files and accessing the Terminal](#creating-notebooks)
1. [Build and Train a model](#installing-python)
1. [Shutting down instances](#shutting-down)
1. [Manage your workspace](#manage-workspace)
1. [About SageMaker resources](#sagemaker-resources)
1. [Practical walkthroughs](#walkthroughs)
1. [Additional learning resources](#additional-learning)
1. [Common troubleshooting tips](#troubleshooting)
<br>
<br>
***

## 1. Introduction<span id="introduction"></span>

This guide is a starting point for first-time users of SageMaker Studio. It covers everything from the basics of JupyterLab to a practical walkthrough of training an ML model, and it describes SageMaker-specific functionality, resources, and tools. We’ll start at the very beginning and then ramp up to advanced topics, so feel free to skip any sections that aren’t relevant to you.
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
***

## 2. Navigating Studio<span id="navigating-studio"></span>

Navigation is primarily done through the left-hand menu. This gives you access to custom SageMaker resources, files, running terminals, Git, and more. You can expand or collapse this navigation by simply clicking the icons to toggle the state.

![02-Navigating-Studio.gif](attachment:4eb28595-b559-4202-95a1-afb42ee4e370.gif)

### ![icon-home.png](attachment:12c1a27b-0dbb-48c9-abe0-c9db93b3132d.png) Home

Users familiar with JupyterLab will feel right at home in SageMaker Studio. That’s because we have maintained all the functionality you love, while extending the capabilities of JupyterLab with powerful custom resources that can speed up your ML process by harnessing the power of AWS compute. Navigating these custom SageMaker resources has never been easier.
You can find all SageMaker resources under the Home Icon in the top-left corner of the application. Here you will find:
* Projects
* Data Wrangler flows
* Pipelines
* Experiments 
* Trials
* Models
* Endpoints
* Feature store
* And more...

### ![icon-folder.png](attachment:4df5d99c-8295-4908-81dd-627b4a2db684.png) File browser

The file and resource browser displays lists of your notebooks, experiments, trials, trial components, endpoints, and low-code solutions. On the menu at the top of the file browser, choose the plus (+) sign to open the Studio Launcher. The Launcher allows you to create a notebook, launch a Python interactive shell, or open a terminal.

The file browser is where you can find all of your notebooks and other resources for your ML projects. Who has access to your files depends on whether you are in a personal or a shared space. You can identify which type of space you are in by looking at the top-right corner. If you are in a personal app, you will see a user icon followed by “[user_name] / Personal Studio”. If you are in a collaborative space, you will see a globe icon followed by “[user_name] / [space_name]”.

**Personal Studio app:** A private EFS directory that only you can access.

**Collaborative space:** An EFS directory shared with other members of your team, giving the group access to the same notebooks and resources. Working in a shared space allows for real-time team collaboration on notebooks.

**Upload files:** Choose the Upload Files icon to add files to Studio or drag and drop them from your desktop.

**Open files:** Double-click a file to open the file in a new tab or right-click and select “Open.”

**Panel management:** To have adjacent files open, choose a tab that contains a notebook, Python, or text file, then choose New View for File.

**Studio launcher:** Choose the plus (+) sign on the menu at the top of the file browser to open the Studio Launcher.

### ![icon-running.png](attachment:30792d1a-0de2-4dae-9c1c-b6fd598d44a5.png) Running terminals and kernels

You can shut down individual resources, including notebooks, terminals, kernels, apps, and instances. You can also shut down all resources in one of these categories at the same time.

For more information, see [Shut Down Resources](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-run-and-manage-shut-down.html).

### ![icon-git.png](attachment:75b32b9c-6af1-4070-a728-502300259b82.png) Git

You can connect to a Git repository and then access a full range of Git tools and operations.

For more information, see [Clone a Git Repository in SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tasks-git.html).

### ![icon-toc.png](attachment:a13326d9-263c-4e61-aa7c-6dac4453d05d.png) Table of contents

The table of contents extension has been built into JupyterLab since version 3.0. It makes it easy to see and navigate the structure of a document. A table of contents is generated automatically in the left sidebar when you have a notebook, Markdown, LaTeX, or Python file open. The entries are clickable and scroll the document to the corresponding heading.

In the sidebar panel, you can number headings, collapse sections, and quickly navigate the notebook file.

### ![icon-extensions.png](attachment:308a1abb-e557-4e57-b093-6649366e54f3.png) Extension manager

You can enable and manage third-party JupyterLab extensions. Here you can check the already installed extensions and search for extensions by typing the name in the search bar. When you have found the extension you want to install, click the "Install" button. After installing your new extensions, be sure to restart JupyterLab by refreshing your browser.
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
***

## 3. Pricing <span id="pricing"></span>

With SageMaker, you pay only for what you use. You have two payment options: On-Demand pricing, which has no minimum fees and no upfront commitments, and SageMaker Savings Plans, which offer a flexible, usage-based pricing model in exchange for a commitment to a consistent amount of usage.

To ensure you aren’t charged for unused resources, make sure you shut down resources before exiting Studio by using the “Running Terminals and Kernels” node in the primary left-navigation.

For full pricing information, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
***

## 4. Creating notebooks, source code files and accessing the Terminal<span id="creating-notebooks"></span>

### Creating compute resources

SageMaker Studio lets you create notebooks and source code files and access the image terminal. To do this, click the "+" button at the top of the file browser in the left panel to open the Launcher:

![launcher-plus.png](attachment:88fba333-7ac6-4e90-9cc6-3eb843c1f1b5.png)

In the Launcher, there is a set of cards that allow you to create notebooks, create source code files, or access the image terminal with the selected environment.

All of the notebooks, files, and datasets that you create from the Launcher are saved in your persistent project directory and are available when you open your project.

### Customize compute environment

You can customize the compute environment for a notebook, code console, or image terminal by selecting the instance type, image, kernel, and start-up script that fit your needs.

**[Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-instance-types.html)**<br>
An Amazon Elastic Compute Cloud (Amazon EC2) instance is used to run a notebook. Instance types comprise varying combinations of CPU, GPU, memory, storage, and networking capacity, which gives you the flexibility to choose the appropriate mix of resources for your applications.

**[Image](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi-launch.html)**<br>
A SageMaker image is a file that identifies the kernels, language packages, and other dependencies required to run a Jupyter notebook in Amazon SageMaker Studio. SageMaker provides many built-in images for you to use. If your organization has different needs, your admin can also create custom images.

**[Start-up script](https://aws.amazon.com/blogs/machine-learning/customize-amazon-sagemaker-studio-using-lifecycle-configurations/)**<br>
A start-up script is a lifecycle configuration script triggered by Studio lifecycle events, such as starting a new Studio notebook. You can use it to install custom packages, configure notebook extensions, preload datasets, and set up source code repositories.
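
As a rough illustration of what such a script can look like, the sketch below prepares a request for the SageMaker `CreateStudioLifecycleConfig` API, which expects the script content to be base64-encoded. The script body, the configuration name `install-seaborn`, and the helper `build_lifecycle_config_request` are hypothetical and exist only for this example. The actual AWS call is left commented out because it requires credentials and permissions.

```python
import base64

# Hypothetical start-up script: installs one extra package when the app starts.
STARTUP_SCRIPT = """#!/bin/bash
set -eux
pip install --quiet --upgrade seaborn
"""


def build_lifecycle_config_request(name, script, app_type="KernelGateway"):
    """Build keyword arguments for a CreateStudioLifecycleConfig call.

    The API expects the script content as a base64-encoded string.
    """
    encoded = base64.b64encode(script.encode("utf-8")).decode("utf-8")
    return {
        "StudioLifecycleConfigName": name,
        "StudioLifecycleConfigContent": encoded,
        "StudioLifecycleConfigAppType": app_type,
    }


request = build_lifecycle_config_request("install-seaborn", STARTUP_SCRIPT)
print(request["StudioLifecycleConfigName"], request["StudioLifecycleConfigAppType"])

# With AWS credentials and permissions in place, the request could be sent like this:
# import boto3
# boto3.client("sagemaker").create_studio_lifecycle_config(**request)
```

Your administrator may need to attach the resulting configuration to your domain or user profile before it appears as a start-up script option.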

You can change the compute environment that your Studio notebook uses from the launcher by clicking **Change environment**:

![launcher-cards.png](attachment:3f6710b5-7850-4e45-88ef-760314f9fa1b.png)

![04-01-Choosing-the-right-instance-type.gif](attachment:767b1fa9-1c0a-417f-af45-328774cfa5e0.gif)

You can also modify the settings of a running notebook by clicking the instance details in the upper right of the notebook (shown in the screenshots below). This opens the environment setup modal, where you can select from the images, kernels, instance types, and start-up scripts that are available to you.

![notebook-header.png](attachment:e62077e4-8f62-4663-b544-eb9a4a239901.png)

![04-02-Choosing-the-right-instance-type.gif](attachment:062c3781-5b25-4315-ae90-f85939d25ede.gif)

<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
<hr>

## 5. Build and Train a model<span id="installing-python"></span>

In this section we will train a simple model directly in this notebook using scikit-learn.

The first step in building an ML model is to install the packages required to train it.

The simplest way of installing Python packages is to use either of the following magic commands in a code cell of a notebook:

`%conda install <package>`

`%pip install <package>`

These magic commands will always install packages into the environment used by that notebook and any packages you install are saved in your persistent project directory. 

Note: we don't recommend using `!pip` or `!conda`. Commands prefixed with `!` run in a shell, which may resolve to a different environment than the one your notebook kernel uses when multiple environments are present.

Here is an example that shows how to install NumPy into the environment used by this notebook:

In [None]:
%pip install numpy 
# If you are using a Conda environment (such as with the Data Science Kernel), you may prefer to use conda instead of pip:
# %conda install numpy

Now you can use NumPy:

In [None]:
import numpy as np
np.random.rand(10)

Similarly we can do:

In [None]:
%pip install scikit-learn pandas

It's that easy to install and start using Python packages.
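
If you want to confirm that a package landed in the environment your kernel actually uses, one option is to query the installed distributions from inside the notebook. The sketch below uses only the Python standard library. The helper name `installed_version` is made up for this example, and the package list is simply the set of packages used in this section.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The interpreter path shows which environment the current kernel is running in.
print("Kernel interpreter:", sys.executable)

for name in ["numpy", "pandas", "scikit-learn"]:
    print(name, "->", installed_version(name))
```

A result of `None` means the package is not installed in the kernel's environment, even if it is installed somewhere else on the instance.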

Now let's write some simple code that downloads data from the scikit-learn datasets and trains a model.

To turn on automated visualizations and data insights for your pandas data frame, import the sagemaker_datawrangler library.

In [None]:
%pip install pandas
# If you are using a Conda environment (such as with the Data Science Kernel), you may prefer to use conda instead of pip:
# %conda install pandas

# Import necessary packages 
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

import pandas as pd
import numpy as np
import os
import joblib

# Import sagemaker_datawrangler which will enable an enhanced visualization and data prep widget for Pandas.
try:
    import sagemaker_datawrangler
except ImportError:
    # sagemaker_datawrangler is available in Data Science Kernel Image 
    pass

In [None]:
# We use the California housing dataset below and preview in Pandas
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df

In [None]:
# To prepare data for training, we split data into train and test

X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42
)

trainX = pd.DataFrame(X_train, columns=data.feature_names)
trainX["target"] = y_train

testX = pd.DataFrame(X_test, columns=data.feature_names)
testX["target"] = y_test

In [None]:
trainX.head()

In [None]:
# Train
print("training model")
model = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=2, n_jobs=-1
)

model.fit(X_train, y_train)

# Compute the absolute error on the held-out test set
print("validating model")
abs_err = np.abs(model.predict(X_test) - y_test)

# Print the absolute error at a few percentiles
for q in [10, 50, 90]:
    print("AE-at-" + str(q) + "th-percentile: " + str(np.percentile(a=abs_err, q=q)))
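
The percentiles above describe how the absolute error is distributed. As a complementary sketch, the same kind of arrays can be summarized with the mean absolute error and the coefficient of determination (R²). The helper `regression_report` and the four-value toy arrays are assumptions made for illustration. In the notebook you would pass `y_test` and `model.predict(X_test)` instead.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Summarize regression quality with MAE, median absolute error, and R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_pred - y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "mae": float(abs_err.mean()),
        "median_ae": float(np.percentile(abs_err, 50)),
        "r2": float(1.0 - ss_res / ss_tot),
    }


# Toy values chosen only to keep the example self-contained.
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(report)
```

R² compares the model against always predicting the mean of the targets, so values closer to 1 indicate a better fit.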

In [None]:
# Persist the model
path = "./model.joblib"
joblib.dump(model, path)
print("model persisted at " + path)
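
A persisted model is only useful if it can be loaded back later. The sketch below shows a save-and-reload round trip with `joblib`. To stay self-contained it trains a small forest on synthetic data and writes to a temporary directory, and the names `demo_model` and `demo_path` are placeholders. In the notebook you would call `joblib.load("./model.joblib")` on the file saved above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic regression problem so the sketch runs without downloading data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

demo_model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(demo_model, demo_path)   # write the fitted model to disk
    restored = joblib.load(demo_path)    # read it back into memory

# The restored model should produce the same predictions as the original.
same_predictions = bool(
    np.allclose(demo_model.predict(X_demo[:5]), restored.predict(X_demo[:5]))
)
print("restored model matches original:", same_predictions)
```

Load the file with the same scikit-learn version that wrote it where possible, since serialized models are not guaranteed to be compatible across versions.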

There are [more demos below](#walkthroughs) that explore advanced topics, dive deeper into specific SageMaker features, and provide notebook examples on which you can base your projects.
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
***

## 6. Shutting down instances<span id="shutting-down"></span>

You can shut down individual resources, including notebooks, terminals, kernels, apps, and instances. You can also shut down all resources in one of these categories at the same time.

### Shut down an open notebook

To shut down an open notebook from the File menu:
1. Optionally, save the notebook contents by choosing the Disk icon on the left of the notebook menu.
2. Choose File then Close and Shutdown Notebook.
3. Choose OK.

![shutdown-menu.png](attachment:cce682d6-df50-4f1d-ab4c-b69b385d1704.png)

### Shut down resources panel

You can reach the Running Terminals and Kernels pane from its icon in the left navigation. The pane consists of four sections, and each section lists all the resources of one type. You can shut down each resource individually or shut down all the resources in a section at the same time.

![running-panel.png](attachment:8b6a53f2-1b4e-44a7-8ea3-20a9cfb489c5.png)
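
If you prefer to check for running resources from code, the SageMaker `ListApps` API reports the apps in a domain along with their status. The following is a minimal sketch, assuming you pass in a boto3 SageMaker client. The helper name `list_running_apps` and the domain and user profile values in the usage comment are placeholders, not values from this guide.

```python
def list_running_apps(sagemaker_client, domain_id, user_profile_name):
    """Return (app_type, app_name) pairs for apps whose status is InService."""
    running = []
    kwargs = {"DomainIdEquals": domain_id, "UserProfileNameEquals": user_profile_name}
    while True:
        response = sagemaker_client.list_apps(**kwargs)
        for app in response.get("Apps", []):
            if app.get("Status") == "InService":
                running.append((app["AppType"], app["AppName"]))
        # ListApps is paginated: keep requesting pages until no token is returned.
        token = response.get("NextToken")
        if not token:
            return running
        kwargs["NextToken"] = token


# Example usage (requires AWS credentials; the identifiers are placeholders):
# import boto3
# client = boto3.client("sagemaker")
# print(list_running_apps(client, "d-xxxxxxxxxxxx", "my-user-profile"))
```

Apps that still appear as `InService` after you close Studio are candidates for shutting down from the pane shown above.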
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
<hr>

## 7. Manage your workspace<span id="manage-workspace"></span>

### Collapse navigation

If you want to focus on a notebook or workflow you can optimize your workspace real estate by fully collapsing the navigation. Just click on the currently active icon and the navigation will become hidden. When hidden, click any icon to re-expand the navigation.

![09-01-Manage-your-workspace.gif](attachment:fb85dbf2-3261-4a08-8236-7ccbb9d7dcbd.gif)

### Panel management

The main work area in JupyterLab enables you to arrange documents (notebooks, text files, etc.) and other activities (terminals, code consoles, etc.) into panels of tabs that can be resized or subdivided. Drag a tab to the center of a tab panel to move the tab to the panel. Subdivide a tab panel by dragging a tab to the left, right, top, or bottom of the panel.

The work area has a single current activity. The tab for the current activity is marked with a colored top border (blue by default).

![09-02-Manage-your-workspace.gif](attachment:5abcd780-f734-4512-a872-b27884919bd3.gif)
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
<hr>

## 8. About SageMaker resources<span id="sagemaker-resources"></span>

Extending the capabilities of JupyterLab, SageMaker resources can drastically speed up your ML process by providing tools designed to harness the power of AWS compute.

### Data

**[Data Wrangler](#)**<br>
A visual data preparation tool that makes it easy and fast to clean and prepare data for analytics and machine learning.

**[Feature store](#)**<br>
A fully managed service to store, share, and manage model features for training and inference, enabling feature reuse across ML applications. Feature store allows you to ingest features from streaming and batch data sources, build feature pipelines, and speed up model deployment.

**[Clusters](#)**<br>
Compute clusters enable you to do petabyte-scale interactive data preparation and machine learning within a notebook.

### AutoML

**[AutoML](#)**<br>
Automates the key tasks of the machine learning process. It explores your data, selects the algorithms relevant to your problem type, and prepares the data to facilitate model training and tuning. 

### Experiments

**[Experiments](#)**<br>
Generates and automates ML experiments using the Experiments SDK. Visualize experiments, log key metrics, analyze runs, and identify the best performing run in SageMaker Studio. Promote models to the Model Registry by registering and deploying them to an endpoint.

### Pipelines

**[Pipelines](#)**<br>
Pipelines generates and automates end-to-end ML workflows using the Pipelines SDK. Experiment rapidly and create repeatable processes for data processing, model training, model tuning, and batch transformation. Visualize pipelines with a comprehensive graph interface, and manage pipeline executions with ease.

### Models

**[Model registry](#)**<br>
Model Registry catalogs and promotes models to production using the Model Registry SDK. You can create new model versions and log important model metadata in SageMaker Studio. As models evolve, manage their status by updating the approval state of a deployment pipeline, effectively automating the CI/CD process.

**[Inference compiler](#)**<br>
Inference compiler supports compilation and deployment for two main platforms: cloud instances (including Inferentia) and edge devices. It automatically optimizes Gluon, Keras, MXNet, PyTorch, TensorFlow, TensorFlow-Lite, and ONNX models for inference on Android, Linux, and Windows machines.

**[Edge packager](#)**<br>
Edge Manager packaging jobs take Neo-compiled models and make any changes necessary to deploy the model with the inference engine, the Edge Manager agent.

**[Shared models](#)**<br>
Share your models and notebooks with other users in your organization for collaboration and knowledge transfer. View all ML contents shared by other users and share yours with other users.

### Deployments

**[Projects](#)**<br>
SageMaker Projects allows you to organize your most important ML resources into a single, ordered system. This includes code repositories, experiments, pipelines, registered models, and endpoints. With SageMaker Project templates, you can automate model building, training, and deployment, effectively industrializing your model lifecycle and CI/CD process.

**[Inference recommender](#)**<br>
Inference Recommender reduces the time to get models in production by automating load testing and model tuning. It can deploy models to real-time inference endpoints with the best performance at the lowest cost. It helps select the best instance type and configuration for models and workloads. Inference Recommender only charges for the instances used.

**[Endpoints](#)**<br>
Endpoints provides infrastructure and model deployment options for your inference needs. Create highly-available endpoints to get inference from deployed models with low latency and high throughput. SageMaker makes it easy to deploy models to make predictions for any use case.

**[Fleet and devices](#)**<br>
Edge Manager operates machine learning models on devices like a fleet of smart cameras, smart speakers, and robots. Fleets are collections of logically grouped devices you can use to collect and analyze data.

### JumpStart

**[Models, notebooks, solutions](#)**<br>
View pretrained models, example notebooks, and prebuilt solutions that solve common use cases.
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
<hr>

## 9. Practical walkthroughs<span id="walkthroughs"></span>

### Getting started demos

A collection of walkthroughs for various aspects of the Studio workflow for Data Scientists and ML Engineers.

[Getting started ML tutorials](https://aws.amazon.com/sagemaker/getting-started/)

### SageMaker examples GitHub repo

To help you take the next steps, we have a GitHub repository with a set of example notebooks that cover a wide range of data science and machine learning topics, from importing and cleaning data to data visualization and training machine learning models.

[SageMaker examples repo](https://github.com/aws/amazon-sagemaker-examples)

**OR**

1. Go to the Git icon in the left navigation and click **Clone a Repository**. 
2. Paste into the input `https://github.com/aws/amazon-sagemaker-examples.git`
3. Click **Clone**.

<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
<hr>

## 10. Additional learning resources<span id="additional-learning"></span>

A collection of tools to help you ramp up your skills and workflow.

### AWS Machine Learning University

[Machine Learning University (MLU)](https://aws.amazon.com/machine-learning/mlu/) provides anybody, anywhere, at any time access to the same machine learning courses used to train Amazon’s own developers on machine learning. Learn how to use ML with the learn-at-your-own-pace MLU Accelerator learning series.


### Dive into Deep Learning (D2L)

[Dive into Deep Learning (D2L)](https://www.d2l.ai/) is an open-source, interactive book that teaches the ideas, the mathematical theory, and the code that powers deep learning. With over 150 Jupyter notebooks, D2L provides a comprehensive overview of deep learning principles and a state-of-the-art introduction to deep learning in computer vision and natural language processing. With tens of millions of online page views, D2L has been adopted for teaching by over 300 universities from 55 countries, including Stanford, MIT, Harvard, and Cambridge.


### Hugging Face

[Hugging Face](https://huggingface.co/) is the home of the [Transformers library](https://huggingface.co/docs/transformers/index) and state-of-the-art natural language processing, speech, and computer vision models.
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
<hr>

## 11. Common troubleshooting tips<span id="troubleshooting"></span>

Some resources to help if you find yourself stuck while working.

**[SageMaker FAQs](https://aws.amazon.com/sagemaker/faqs/)**<br>
A collection of frequently asked questions from SageMaker customers.

**[AWS Glossary](https://docs.aws.amazon.com/general/latest/gr/glos-chap.html)**<br>
Lists the latest AWS terminology and usage.

**[Getting started with Amazon SageMaker](https://aws.amazon.com/sagemaker/getting-started/?nc=sn&loc=7)**<br>
Follow along with the hands-on tutorials to learn how to use Amazon SageMaker to accomplish various machine learning lifecycle tasks, including data preparation, training, deployment, and MLOps.
<br>
<br>
<a href="#toc" style="display: block; text-align: right;">⬆️ Table of contents</a>
<hr>