From bc049c99243f594aaff439891c9be7dff354b9c4 Mon Sep 17 00:00:00 2001 From: Chen Qian Date: Tue, 26 Dec 2023 21:07:55 -0800 Subject: [PATCH] Add quickstart for pytorch flavor (#10737) Signed-off-by: chenmoneygithub --- docs/source/deep-learning/pytorch/index.rst | 24 + .../quickstart/pytorch_quickstart.ipynb | 1380 +++++++++++++++++ 2 files changed, 1404 insertions(+) create mode 100644 docs/source/deep-learning/pytorch/quickstart/pytorch_quickstart.ipynb diff --git a/docs/source/deep-learning/pytorch/index.rst b/docs/source/deep-learning/pytorch/index.rst index 5da1b6d4e8c39..2434d0ef9aba6 100644 --- a/docs/source/deep-learning/pytorch/index.rst +++ b/docs/source/deep-learning/pytorch/index.rst @@ -18,6 +18,30 @@ in MLflow we provide a set of APIs for: - **Experiments Management**: Store your PyTorch experiments in MLflow server, and you can view and share them from MLflow UI. - **Effortless Deployment**: Deploy PyTorch models with simple API calls, catering to a variety of production environments. +5 Minute Quick Start with the MLflow PyTorch Flavor +---------------------------------------------------- + +To get a quick overview of how to use the MLflow PyTorch flavor, please read the quickstart guide. It +will walk you through the basics of tracking PyTorch experiments. + +.. raw:: html + + View the Quickstart + +To download the PyTorch quickstart notebook to run in your environment, click the respective link below: + +.. raw:: html + + Download the Quickstart of MLflow's PyTorch Integration
+ +.. toctree:: + :maxdepth: 1 + :hidden: + + quickstart/pytorch_quickstart.ipynb + + `Developer Guide of PyTorch with MLflow `_ ------------------------------------------------------------- diff --git a/docs/source/deep-learning/pytorch/quickstart/pytorch_quickstart.ipynb b/docs/source/deep-learning/pytorch/quickstart/pytorch_quickstart.ipynb new file mode 100644 index 0000000000000..3000a598cd510 --- /dev/null +++ b/docs/source/deep-learning/pytorch/quickstart/pytorch_quickstart.ipynb @@ -0,0 +1,1380 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "S9KwXx6Hz3yl" + }, + "source": [ + "# Quickstart with MLflow PyTorch Flavor\n", + "\n", + "In this quickstart guide, we will walk you through how to log your PyTorch experiments to MLflow. After reading this quickstart, you will know the basics of logging PyTorch experiments to MLflow and how to view the experiment results in the MLflow UI.\n", + "\n", + "This quickstart guide is compatible with cloud-based notebooks such as Google Colab and Databricks notebooks; you can also run it locally."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CZwvVcIg0Z1P" + }, + "source": [ + "## Install Required Packages" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "Sds9DYr9zcs0" + }, + "outputs": [], + "source": [ + "%pip install -q mlflow torchmetrics torchinfo" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "VH4jrEpRzkg-" + }, + "outputs": [], + "source": [ + "import mlflow\n", + "import torch\n", + "\n", + "from torch import nn\n", + "from torch.utils.data import DataLoader\n", + "from torchinfo import summary\n", + "from torchmetrics import Accuracy\n", + "from torchvision import datasets\n", + "from torchvision.transforms import ToTensor" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3ucY0NjP0tPD" + }, + "source": [ + "## Task Overview\n", + "\n", + "In this guide, we will demonstrate the functionality of MLflow with PyTorch through a simple FashionMNIST image classification task. We will build a convolutional neural network as the image classifier, and log the following information to MLflow:\n", + "\n", + "- **Training Metrics**: training loss and accuracy.\n", + "- **Evaluation Metrics**: evaluation loss and accuracy.\n", + "- **Training Configs**: learning rate, batch size, etc.\n", + "- **Model Information**: model structure.\n", + "- **Saved Model**: model instance after training.\n", + "\n", + "Now let's dive into the details!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kIqWWgO72Bwp" + }, + "source": [ + "## Prepare the Data\n", + "\n", + "Let's load our training data `FashionMNIST` from `torchvision`, with pixel values scaled into the range [0, 1] by the `ToTensor` transform. We then wrap the dataset in an instance of `torch.utils.data.DataLoader`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "zHLPgjFyzw_P" + }, + "outputs": [], + "source": [ + "training_data = datasets.FashionMNIST(\n", + "    root=\"data\",\n", + "    train=True,\n", + "    download=True,\n", + "    transform=ToTensor(),\n", + ")\n", + "\n", + "test_data = datasets.FashionMNIST(\n", + "    root=\"data\",\n", + "    train=False,\n", + "    download=True,\n", + "    transform=ToTensor(),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l4Z8bDxI3TAh" + }, + "source": [ + "Let's look into our data." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "GlZkxQHu2t7V", + "outputId": "2ae9190a-03ac-4371-d6a9-0e9309a4e9de" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Image size: torch.Size([1, 28, 28])\n", + "Size of training dataset: 60000\n", + "Size of test dataset: 10000\n" + ] + } + ], + "source": [ + "print(f\"Image size: {training_data[0][0].shape}\")\n", + "print(f\"Size of training dataset: {len(training_data)}\")\n", + "print(f\"Size of test dataset: {len(test_data)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NhV1htwx3qJR" + }, + "source": [ + "We wrap the dataset in a `DataLoader` instance for batching purposes. `DataLoader` is a useful tool for data preprocessing. For more details, you can refer to the [developer guide from PyTorch](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#preparing-your-data-for-training-with-dataloaders)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "RoZj28jT3N_n" + }, + "outputs": [], + "source": [ + "train_dataloader = DataLoader(training_data, batch_size=64)\n", + "test_dataloader = DataLoader(test_data, batch_size=64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "J4XB9_TS4Gkc" + }, + "source": [ + "## Define our Model\n", + "\n", + "Now, let's define our model. To define a PyTorch model, you need to subclass `torch.nn.Module` and override `__init__` to define the model components, as well as the `forward()` method to implement the forward-pass logic.\n", + "\n", + "We will build a simple convolutional neural network (CNN) consisting of 2 convolutional layers as the image classifier. CNNs are a common architecture for image classification tasks; for more details about CNNs, please read [this doc](https://en.wikipedia.org/wiki/Convolutional_neural_network). The model outputs the logits for each class (10 classes in total). Applying softmax to the logits yields the probability distribution across classes."
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "5FMb4gMK4ECL" + }, + "outputs": [], + "source": [ + "class ImageClassifier(nn.Module):\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        self.model = nn.Sequential(\n", + "            nn.Conv2d(1, 8, kernel_size=3),\n", + "            nn.ReLU(),\n", + "            nn.Conv2d(8, 16, kernel_size=3),\n", + "            nn.ReLU(),\n", + "            nn.Flatten(),\n", + "            nn.LazyLinear(10),  # 10 classes in total.\n", + "        )\n", + "\n", + "    def forward(self, x):\n", + "        return self.model(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6rufiZF78hmd" + }, + "source": [ + "## Connect to MLflow Tracking Server\n", + "\n", + "Before implementing the training loop, we need to configure the MLflow tracking server because we will log data into MLflow during training.\n", + "\n", + "In this guide, we will use [Databricks Community Edition](https://www.databricks.com/try-databricks#account) as the MLflow tracking server. For other options such as using your local MLflow server, please read the [Tracking Server Overview](https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html).\n", + "\n", + "If you have not done so already, please register an account with [Databricks Community Edition](https://www.databricks.com/try-databricks#account). It should take no longer than 1 minute to register. Databricks CE (Community Edition) is a free platform for users to try out Databricks features. For this guide, we need the ML experiment dashboard to track our training progress.\n", + "\n", + "After successfully registering an account on Databricks CE, let's connect MLflow to Databricks CE.
You will need to enter the following information:\n", + "\n", + "- Databricks Host: https://community.cloud.databricks.com/\n", + "- Username: the email you signed up with\n", + "- Password: your password" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "6fa35hgk9XE3" + }, + "outputs": [], + "source": [ + "mlflow.login()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TKV0SG-Q9ahb" + }, + "source": [ + "Now that you have successfully connected to the MLflow tracking server on Databricks CE, let's give our experiment a name." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vRPWOj1o9h-V", + "outputId": "2cc120fe-5d06-45f0-a83b-985af5d89f8e" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "mlflow.set_experiment(\"/mlflow-pytorch-quickstart\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TkyEv1qa6Uxe" + }, + "source": [ + "## Implement the Training Loop\n", + "\n", + "Now let's define the training loop, which basically iterates through the dataset and applies a forward and backward pass on each data batch." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YXAdFfTa6tkH" + }, + "source": [ + "Get the device info, as PyTorch requires manual device management." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "bVLaZw396puE" + }, + "outputs": [], + "source": [ + "# Get cpu or gpu for training.\n", + "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YULTT1EI6_1L" + }, + "source": [ + "Define the training function."
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "LDiFPTGR5ybY" + }, + "outputs": [], + "source": [ + "def train(dataloader, model, loss_fn, metrics_fn, optimizer, epoch):\n", + "    \"\"\"Train the model on a single pass of the dataloader.\n", + "\n", + "    Args:\n", + "        dataloader: an instance of `torch.utils.data.DataLoader`, containing the training data.\n", + "        model: an instance of `torch.nn.Module`, the model to be trained.\n", + "        loss_fn: a callable, the loss function.\n", + "        metrics_fn: a callable, the metrics function.\n", + "        optimizer: an instance of `torch.optim.Optimizer`, the optimizer used for training.\n", + "        epoch: an integer, the current epoch number.\n", + "    \"\"\"\n", + "    model.train()\n", + "    for batch, (X, y) in enumerate(dataloader):\n", + "        X, y = X.to(device), y.to(device)\n", + "\n", + "        pred = model(X)\n", + "        loss = loss_fn(pred, y)\n", + "        accuracy = metrics_fn(pred, y)\n", + "\n", + "        # Backpropagation.\n", + "        loss.backward()\n", + "        optimizer.step()\n", + "        optimizer.zero_grad()\n", + "\n", + "        if batch % 100 == 0:\n", + "            loss, current = loss.item(), batch\n", + "            step = epoch * len(dataloader) + batch  # Global batch index, unique across epochs.\n", + "            mlflow.log_metric(\"loss\", loss, step=step)\n", + "            mlflow.log_metric(\"accuracy\", accuracy.item(), step=step)\n", + "            print(f\"loss: {loss:2f} accuracy: {accuracy:2f} [{current} / {len(dataloader)}]\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z6LeQOrS7C98" + }, + "source": [ + "Define the evaluation function, which will be run at the end of each epoch."
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "CnfEnFHp7El7" + }, + "outputs": [], + "source": [ + "def evaluate(dataloader, model, loss_fn, metrics_fn, epoch):\n", + "    \"\"\"Evaluate the model on a single pass of the dataloader.\n", + "\n", + "    Args:\n", + "        dataloader: an instance of `torch.utils.data.DataLoader`, containing the eval data.\n", + "        model: an instance of `torch.nn.Module`, the model to be evaluated.\n", + "        loss_fn: a callable, the loss function.\n", + "        metrics_fn: a callable, the metrics function.\n", + "        epoch: an integer, the current epoch number.\n", + "    \"\"\"\n", + "    num_batches = len(dataloader)\n", + "    model.eval()\n", + "    eval_loss, eval_accuracy = 0, 0\n", + "    with torch.no_grad():\n", + "        for X, y in dataloader:\n", + "            X, y = X.to(device), y.to(device)\n", + "            pred = model(X)\n", + "            eval_loss += loss_fn(pred, y).item()\n", + "            eval_accuracy += metrics_fn(pred, y).item()\n", + "\n", + "    eval_loss /= num_batches\n", + "    eval_accuracy /= num_batches\n", + "    mlflow.log_metric(\"eval_loss\", eval_loss, step=epoch)\n", + "    mlflow.log_metric(\"eval_accuracy\", eval_accuracy, step=epoch)\n", + "\n", + "    print(f\"Eval metrics: \\nAccuracy: {eval_accuracy:.2f}, Avg loss: {eval_loss:2f} \\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gO5T-7hw99pm" + }, + "source": [ + "## Start Training\n", + "\n", + "It's time to start the training! First, let's define the training hyperparameters, create our model, declare our loss function, and instantiate our optimizer."
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "HyCkGGLT923N", + "outputId": "54e8afc6-40b2-4464-9eb2-a52405771f05" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/lazy.py:180: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.\n", + " warnings.warn('Lazy modules are a new feature under heavy development '\n" + ] + } + ], + "source": [ + "epochs = 3\n", + "loss_fn = nn.CrossEntropyLoss()\n", + "metric_fn = Accuracy(task=\"multiclass\", num_classes=10).to(device)\n", + "model = ImageClassifier().to(device)\n", + "optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2LUTdwjs-VVc" + }, + "source": [ + "Putting everything together, let's kick off the training and log information to MLflow. At the beginning of training, we log training and model information to MLflow, and during training, we log training and evaluation metrics. After everything is done, we log the trained model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 937, + "referenced_widgets": [ + "eb8f4cd57790474c8bc18318785ad5c8", + "06051992554c47a09dce9e8661599fa8", + "bbdc29ba105144aa87086558ed8e0d99", + "516ef242a82344bca3b7468b439edf20", + "47d3292b709f4f62afc7198ed958ff4d", + "a5851c3027444173891c25ef5260a6bb", + "27bced7bd0614699bdd590bc3b609635", + "5028d80a932b4f9b843a7bdac7c54e2c", + "18069f9f23a146e6874a5b509f892086", + "123c14935c9c4b509604e0318aaec27a", + "02a4e35506fc4fa8829da873424fa0bc" + ] + }, + "id": "ufz5w7tb-TsJ", + "outputId": "5fa3f4a3-0c5b-448a-bf3b-28642f0f6631" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1\n", + "-------------------------------\n", + "loss: 2.294313 accuracy: 0.046875 [0 / 938]\n", + "loss: 2.151955 accuracy: 0.515625 [100 / 938]\n", + "loss: 1.825312 accuracy: 0.640625 [200 / 938]\n", + "loss: 1.513407 accuracy: 0.593750 [300 / 938]\n", + "loss: 1.059044 accuracy: 0.718750 [400 / 938]\n", + "loss: 0.931140 accuracy: 0.687500 [500 / 938]\n", + "loss: 0.889886 accuracy: 0.703125 [600 / 938]\n", + "loss: 0.742625 accuracy: 0.765625 [700 / 938]\n", + "loss: 0.786106 accuracy: 0.734375 [800 / 938]\n", + "loss: 0.788444 accuracy: 0.781250 [900 / 938]\n", + "Eval metrics: \n", + "Accuracy: 0.75, Avg loss: 0.719401 \n", + "\n", + "Epoch 2\n", + "-------------------------------\n", + "loss: 0.649325 accuracy: 0.796875 [0 / 938]\n", + "loss: 0.756684 accuracy: 0.718750 [100 / 938]\n", + "loss: 0.488664 accuracy: 0.828125 [200 / 938]\n", + "loss: 0.780433 accuracy: 0.718750 [300 / 938]\n", + "loss: 0.691777 accuracy: 0.656250 [400 / 938]\n", + "loss: 0.670005 accuracy: 0.750000 [500 / 938]\n", + "loss: 0.712286 accuracy: 0.687500 [600 / 938]\n", + "loss: 0.644150 accuracy: 0.765625 [700 / 938]\n", + "loss: 0.683426 accuracy: 0.750000 [800 / 938]\n", + "loss: 0.659378 accuracy: 0.781250 [900 / 938]\n", + 
"Eval metrics: \n", + "Accuracy: 0.77, Avg loss: 0.636072 \n", + "\n", + "Epoch 3\n", + "-------------------------------\n", + "loss: 0.528523 accuracy: 0.781250 [0 / 938]\n", + "loss: 0.634942 accuracy: 0.750000 [100 / 938]\n", + "loss: 0.420757 accuracy: 0.843750 [200 / 938]\n", + "loss: 0.701463 accuracy: 0.703125 [300 / 938]\n", + "loss: 0.649267 accuracy: 0.656250 [400 / 938]\n", + "loss: 0.624556 accuracy: 0.812500 [500 / 938]\n", + "loss: 0.648762 accuracy: 0.718750 [600 / 938]\n", + "loss: 0.630074 accuracy: 0.781250 [700 / 938]\n", + "loss: 0.682306 accuracy: 0.718750 [800 / 938]\n", + "loss: 0.587403 accuracy: 0.750000 [900 / 938]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2023/12/21 21:39:55 WARNING mlflow.models.model: Model logged without a signature. Signatures will be required for upcoming model registry features as they validate model inputs and denote the expected schema of model outputs. Please visit https://www.mlflow.org/docs/2.9.2/models.html#set-signature-on-logged-model for instructions on setting a model signature on your logged model.\n", + "2023/12/21 21:39:56 WARNING mlflow.utils.requirements_utils: Found torch version (2.1.0+cu121) contains a local version label (+cu121). MLflow logged a pip requirement for this package as 'torch==2.1.0' without the local version label to make it installable from PyPI. To specify pip requirements containing local version labels, please use `conda_env` or `pip_requirements`.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Eval metrics: \n", + "Accuracy: 0.77, Avg loss: 0.616615 \n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2023/12/21 21:40:02 WARNING mlflow.utils.requirements_utils: Found torch version (2.1.0+cu121) contains a local version label (+cu121). MLflow logged a pip requirement for this package as 'torch==2.1.0' without the local version label to make it installable from PyPI. 
To specify pip requirements containing local version labels, please use `conda_env` or `pip_requirements`.\n", + "/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.\n", + " warnings.warn(\"Setuptools is replacing distutils.\")\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "eb8f4cd57790474c8bc18318785ad5c8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Uploading artifacts: 0%| | 0/6 [00:00