From 04d6a4a00f7739188b38b6b851d7795d6ebfda10 Mon Sep 17 00:00:00 2001
From: Favour Kelvin <favourkelvin17@gmail.com>
Date: Wed, 15 Feb 2023 16:15:25 +0100
Subject: [PATCH] update building Custom Python Container (#53)

---
 .../Python-Custom-Container/index.ipynb       | 252 ++++++++----------
 1 file changed, 106 insertions(+), 146 deletions(-)

diff --git a/workload-onboarding/Python-Custom-Container/index.ipynb b/workload-onboarding/Python-Custom-Container/index.ipynb
index f37357a2..8c97285f 100644
--- a/workload-onboarding/Python-Custom-Container/index.ipynb
+++ b/workload-onboarding/Python-Custom-Container/index.ipynb
@@ -37,23 +37,21 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "JMnJRIzlNA15"
       },
       "source": [
+        "In this tutorial, we will walk you through building your own Docker container and running it on the Bacalhau network.\n",
         "\n",
-        "This example will walk you through building your own docker container and running the container on the bacalhau network and viewing the results\n",
+        "## Prerequisites\n",
         "\n",
+        "To get started, you need to install the Bacalhau client. See the [installation guide](https://docs.bacalhau.org/getting-started/installation) for more information.\n",
         "\n",
-        "For that we will build a Simple Recommender Script that when Given a movie ID\n",
-        "\n",
-        "\n",
-        "will recommend other movies based on user ratings.\n",
-        "\n",
-        "\n",
-        "Suppose if you want recommendations for the movie Toy Story (1995) it will recommend movies from similar categories\n",
+        "## Sample Recommendation Dataset\n",
         "\n",
+        "We will be using a simple recommendation script that, when given a movie ID, recommends other movies based on user ratings. For example, if you want recommendations for the movie Toy Story (1995), it will recommend movies from similar categories:\n",
         "\n",
         "```\n",
         "Recommendations for Toy Story (1995):\n",
@@ -71,31 +69,16 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "jU7eSd3vNAhO"
       },
       "source": [
         "\n",
+        "### Downloading the dataset\n",
         "\n",
-        "### \n",
-        "**Downloading the dataset**\n",
-        "\n",
-        "\n",
-        "Download Movielens1M dataset from this link [https://files.grouplens.org/datasets/movielens/ml-1m.zip](https://files.grouplens.org/datasets/movielens/ml-1m.zip)\n",
-        "\n",
-        "\n",
-        "In this example we’ll be using 2 files from the MovieLens 1M dataset: ratings.dat and movies.dat. After the dataset is downloaded extract the zip and place ratings.dat and movies.dat into a folder called input\n",
-        "\n",
-        "The structure of input directory should be\n",
-        "\n",
-        "\n",
-        "```\n",
-        "input\n",
-        "├── movies.dat\n",
-        "└── ratings.dat\n",
-        "```\n",
-        "\n"
+        "Download the MovieLens 1M dataset from this link: [https://files.grouplens.org/datasets/movielens/ml-1m.zip](https://files.grouplens.org/datasets/movielens/ml-1m.zip)"
       ]
     },
     {
@@ -111,29 +94,27 @@
           "skip-execution"
         ]
       },
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "--2022-09-18 11:01:58--  https://files.grouplens.org/datasets/movielens/ml-1m.zip\n",
-            "Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152\n",
-            "Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:443... connected.\n",
-            "HTTP request sent, awaiting response... 200 OK\n",
-            "Length: 5917549 (5.6M) [application/zip]\n",
-            "Saving to: ‘ml-1m.zip’\n",
-            "\n",
-            "ml-1m.zip           100%[===================>]   5.64M  28.7MB/s    in 0.2s    \n",
-            "\n",
-            "2022-09-18 11:01:59 (28.7 MB/s) - ‘ml-1m.zip’ saved [5917549/5917549]\n",
-            "\n"
-          ]
-        }
-      ],
+      "outputs": [],
       "source": [
         "!wget https://files.grouplens.org/datasets/movielens/ml-1m.zip"
       ]
     },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "In this example, we’ll be using two files from the MovieLens 1M dataset: `ratings.dat` and `movies.dat`. After the dataset is downloaded, extract the zip and place `ratings.dat` and `movies.dat` into a folder called `input`.\n",
+        "\n",
+        "The structure of the `input` directory should be:\n",
+        "\n",
+        "```\n",
+        "input\n",
+        "├── movies.dat\n",
+        "└── ratings.dat\n",
+        "```"
+      ]
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -182,15 +163,16 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "2ig2GvdRNZS_"
       },
       "source": [
         "\n",
-        "### **Installing Dependencies**\n",
+        "### Installing Dependencies\n",
         "\n",
-        "Create a requirements.txt for the Python libraries we’ll be using:"
+        "Create a `requirements.txt` for the Python libraries we’ll be using:"
       ]
     },
     {
@@ -234,14 +216,15 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "fq_tiopcN6Yy"
       },
       "source": [
-        "### **Writing the Script**\n",
+        "### Writing the Script\n",
         "\n",
-        "Create a new file called <code><em>similar-movies.py</em></code> and in it paste the following script"
+        "Create a new file called `similar-movies.py` and paste the following script into it:"
       ]
     },
     {
@@ -330,6 +313,7 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "GSfSljOlOWxZ"
@@ -345,19 +329,20 @@
         "* Calculate cosine similarity, sort by most similar and return the top N.\n",
         "* Select k principal components to represent the movies, a movie_id to find recommendations and print the top_n results.\n",
         "\n",
-        "For further reading on how the script works, go to [Simple Movie Recommender Using SVD | Alyssa](https://alyssaq.github.io/2015/20150426-simple-movie-recommender-using-svd/)\n"
+        "For further reading on how the script works, go to [Simple Movie Recommender Using SVD | Alyssa](https://alyssaq.github.io/2015/20150426-simple-movie-recommender-using-svd/)"
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "YY4k-R-xObIe"
       },
       "source": [
         "\n",
-        "### **Running the script**\n",
+        "## Running the script\n",
         "\n",
-        "Running the script  similar-movies.py using the default values you can also use other flags to set your own values\n"
+        "Run the script `similar-movies.py` using the default values. You can also use other flags to set your own values.\n"
       ]
     },
     {
@@ -375,20 +360,15 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "ifea4e2TO68d"
       },
       "source": [
+        "## Setting Up Docker\n",
         "\n",
-        "\n",
-        "**Setting Up Docker**\n",
-        "\n",
-        "In this step you will create a  `Dockerfile` to create your Docker deployment. The `Dockerfile` is a text document that contains the commands used to assemble the image.\n",
-        "\n",
-        "First, create the `Dockerfile`.\n",
-        "\n",
-        "Next, add your desired configuration to the `Dockerfile`. These commands specify how the image will be built, and what extra requirements will be included.\n"
+        "In this step, we will create a `Dockerfile` and add the desired configuration to it. The commands in the `Dockerfile` specify how the image will be built and what extra requirements will be included."
       ]
     },
     {
@@ -411,6 +391,7 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "ynbIvLBWRJxe"
@@ -429,34 +410,33 @@
         "│   └── ratings.dat\n",
         "├── requirements.txt\n",
         "└── similar-movies.py\n",
-        "```\n",
-        "\n"
+        "```\n"
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "Zs2d88iyRNIV"
       },
       "source": [
-        "Build the container\n",
+        "### Build the container\n",
+        "\n",
+        "We will run the `docker build` command to build the container:\n",
         "\n",
         "```\n",
         "docker build -t <hub-user>/<repo-name>:<tag> .\n",
         "```\n",
         "\n",
+        "Before running the command, replace:\n",
         "\n",
-        "Please replace\n",
-        "\n",
-        "&lt;hub-user> with your docker hub username, If you don’t have a docker hub account [Follow these instructions to create docker account](https://docs.docker.com/docker-id/), and use the username of the account you created\n",
+        "- **hub-user** with your Docker Hub username. If you don’t have a Docker Hub account, [follow these instructions to create one](https://docs.docker.com/docker-id/) and use the username of the account you created\n",
         "\n",
-        "&lt;repo-name> This is the name of the container, you can name it anything you want\n",
+        "- **repo-name** with the name of the container; you can name it anything you want\n",
         "\n",
-        "&lt;tag> This is not required but you can use the latest tag\n",
+        "- **tag** with a tag for the image; this is not required, but you can use the `latest` tag\n",
         "\n",
-        "After you have build the container, the next step is to test it locally and then push it docker hub\n",
-        "\n",
-        "Before pushing you first need to create a repo which you can create by following the instructions here [https://docs.docker.com/docker-hub/repos/](https://docs.docker.com/docker-hub/repos/)\n",
+        "After you have built the container, the next step is to test it locally and then push it to Docker Hub. Before pushing, you first need to create a repo, which you can do by following the instructions here: [https://docs.docker.com/docker-hub/repos/](https://docs.docker.com/docker-hub/repos/)\n",
         "\n",
         "Now you can push this repository to the registry designated by its name or tag.\n",
         "\n",
@@ -465,7 +445,6 @@
         " docker push <hub-user>/<repo-name>:<tag>\n",
         "```\n",
         "\n",
-        "\n",
         "After the repo image has been pushed to docker hub, we can now use the container for running on bacalhau\n",
         "\n",
         "\n",
@@ -475,62 +454,23 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "j7mBHBsaS0gO"
       },
       "source": [
-        "## **Running the container on bacalhau**\n",
+        "## Running the container on bacalhau\n",
         "\n",
-        "You can either run the container on bacalhau with default or custom parameters\n",
-        "\n",
-        "Running the container with default parameters\n"
+        "You can run the container on Bacalhau with either default or custom parameters"
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
-      "metadata": {
-        "id": "24HuygvzTwnT"
-      },
-      "source": [
-        "Insalling bacalhau"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "W1joNKGJT5eN",
-        "outputId": "a703088d-4c44-426a-a24d-6e928159898b",
-        "tags": [
-          "skip-execution"
-        ]
-      },
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "Your system is linux_amd64\n",
-            "No BACALHAU detected. Installing fresh BACALHAU CLI...\n",
-            "Getting the latest BACALHAU CLI...\n",
-            "Installing v0.2.3 BACALHAU CLI...\n",
-            "Downloading https://github.com/filecoin-project/bacalhau/releases/download/v0.2.3/bacalhau_v0.2.3_linux_amd64.tar.gz ...\n",
-            "Downloading sig file https://github.com/filecoin-project/bacalhau/releases/download/v0.2.3/bacalhau_v0.2.3_linux_amd64.tar.gz.signature.sha256 ...\n",
-            "Verified OK\n",
-            "Extracting tarball ...\n",
-            "NOT verifying Bin\n",
-            "bacalhau installed into /usr/local/bin successfully.\n",
-            "Client Version: v0.2.3\n",
-            "Server Version: v0.2.3\n"
-          ]
-        }
-      ],
+      "metadata": {},
       "source": [
-        "!curl -sL https://get.bacalhau.org/install.sh | bash"
+        "### Running the container with default parameters"
       ]
     },
     {
@@ -573,6 +513,14 @@
         " -- python similar-movies.py"
       ]
     },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on."
+      ]
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -611,13 +559,20 @@
       ]
     },
     {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Running the container with custom parameters"
+      ]
+    },
+    {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "N6oIjUz9TEiq"
       },
       "source": [
-        "Running the container with custom \n",
-        "parameters (Optional)\n",
         "\n",
         "```\n",
         "bacalhau docker run \\\n",
@@ -626,6 +581,22 @@
         "```"
       ]
     },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Checking the State of your Jobs"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "- **Job status**: You can check the status of the job using `bacalhau list`."
+      ]
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -655,15 +626,21 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "kFYpNA32c7t5"
       },
       "source": [
-        "\n",
-        "Where it says \"`Completed `\", that means the job is done, and we can get the results.\n",
-        "\n",
-        "To find out more information about your job, run the following command:"
+        "When it says `Published` or `Completed`, that means the job is done, and we can get the results."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "- **Job information**: You can find out more information about your job by using `bacalhau describe`."
       ]
     },
     {
@@ -682,12 +659,13 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "2I4DHnt0Vzua"
       },
       "source": [
-        "If you see that the job has completed and there are no errors, then you can download the results with the following command:"
+        "- **Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we create a directory and download our job output into it."
       ]
     },
     {
@@ -728,33 +706,15 @@
       ]
     },
     {
+      "attachments": {},
       "cell_type": "markdown",
       "metadata": {
         "id": "HEtmR7a6WVuD"
       },
       "source": [
-        "The structure of the files and directories will look like this:\n",
-        "```\n",
-        "├── shards\n",
-        "│   └── job-940c7fd7-c15a-4d00-8170-0d138cdca7eb-shard-0-host-QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL\n",
-        "│       ├── exitCode\n",
-        "│       ├── stderr\n",
-        "│       └── stdout\n",
-        "├── stderr\n",
-        "├── stdout\n",
-        "└── volumes\n",
-        "    └── outputs\n",
-        "```\n",
-        "\n",
-        "* stdout contains things printed to the console like outputs, etc.\n",
-        "\n",
-        "* stderr contains any errors. In this case, since there are no errors, it's will be empty\n",
-        "\n",
-        "* Volumes folder contain the volumes you named when you started the job with the `-o` flag. In addition, you will always have a `outputs` volume, which is provided by default.\n",
-        "\n",
-        "Because your script is printed to stdout, the output will appear in the stdout file. You can read this by typing the following command:\n",
+        "## Viewing your Job Output\n",
         "\n",
-        "\n"
+        "Each job creates 3 subfolders: the **combined_results**, **per_shard files**, and the **raw** directory. To view the output file, run the following command:"
       ]
     },
     {
@@ -791,7 +751,7 @@
         }
       ],
       "source": [
-        "!cat  results/combined_results/outputs"
+        "!cat  results/combined_results/stdout"
       ]
     }
   ],