From 04d6a4a00f7739188b38b6b851d7795d6ebfda10 Mon Sep 17 00:00:00 2001 From: Favour Kelvin Date: Wed, 15 Feb 2023 16:15:25 +0100 Subject: [PATCH] update building Custom Python Container (#53) --- .../Python-Custom-Container/index.ipynb | 252 ++++++++---------- 1 file changed, 106 insertions(+), 146 deletions(-) diff --git a/workload-onboarding/Python-Custom-Container/index.ipynb b/workload-onboarding/Python-Custom-Container/index.ipynb index f37357a2..8c97285f 100644 --- a/workload-onboarding/Python-Custom-Container/index.ipynb +++ b/workload-onboarding/Python-Custom-Container/index.ipynb @@ -37,23 +37,21 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "JMnJRIzlNA15" }, "source": [ + "In this tutorial, we will walk you through building your own Docker container and running it on the Bacalhau network.\n", "\n", - "This example will walk you through building your own docker container and running the container on the bacalhau network and viewing the results\n", + "## Prerequisites\n", "\n", + "To get started, you need to install the Bacalhau client. See more information [here](https://docs.bacalhau.org/getting-started/installation).\n", "\n", - "For that we will build a Simple Recommender Script that when Given a movie ID\n", - "\n", - "\n", - "will recommend other movies based on user ratings.\n", - "\n", - "\n", - "Suppose if you want recommendations for the movie Toy Story (1995) it will recommend movies from similar categories\n", + "## Sample Recommendation Dataset\n", "\n", + "We will use a simple recommendation script that, when given a movie ID, recommends other movies based on user ratings. 
For example, if you want recommendations for the movie Toy Story (1995), it will recommend movies from similar categories:\n", "\n", "```\n", "Recommendations for Toy Story (1995):\n", @@ -71,31 +69,16 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "jU7eSd3vNAhO" }, "source": [ "\n", + "### Downloading the dataset\n", "\n", - "### \n", - "**Downloading the dataset**\n", - "\n", - "\n", - "Download Movielens1M dataset from this link [https://files.grouplens.org/datasets/movielens/ml-1m.zip](https://files.grouplens.org/datasets/movielens/ml-1m.zip)\n", - "\n", - "\n", - "In this example we’ll be using 2 files from the MovieLens 1M dataset: ratings.dat and movies.dat. After the dataset is downloaded extract the zip and place ratings.dat and movies.dat into a folder called input\n", - "\n", - "The structure of input directory should be\n", - "\n", - "\n", - "```\n", - "input\n", - "├── movies.dat\n", - "└── ratings.dat\n", - "```\n", - "\n" + "Download the MovieLens 1M dataset from this link [https://files.grouplens.org/datasets/movielens/ml-1m.zip](https://files.grouplens.org/datasets/movielens/ml-1m.zip)" ] }, { @@ -111,29 +94,27 @@ "skip-execution" ] }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2022-09-18 11:01:58-- https://files.grouplens.org/datasets/movielens/ml-1m.zip\n", - "Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152\n", - "Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 5917549 (5.6M) [application/zip]\n", - "Saving to: ‘ml-1m.zip’\n", - "\n", - "ml-1m.zip 100%[===================>] 5.64M 28.7MB/s in 0.2s \n", - "\n", - "2022-09-18 11:01:59 (28.7 MB/s) - ‘ml-1m.zip’ saved [5917549/5917549]\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "!wget https://files.grouplens.org/datasets/movielens/ml-1m.zip" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example we’ll be using 2 files from the MovieLens 1M dataset: ratings.dat and movies.dat. After the dataset is downloaded extract the zip and place ratings.dat and movies.dat into a folder called input\n", + "\n", + "The structure of input directory should be\n", + "\n", + "```\n", + "input\n", + "├── movies.dat\n", + "└── ratings.dat\n", + "```" + ] + }, { "cell_type": "code", "execution_count": null, @@ -182,15 +163,16 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "2ig2GvdRNZS_" }, "source": [ "\n", - "### **Installing Dependencies**\n", + "### Installing Dependencies\n", "\n", - "Create a requirements.txt for the Python libraries we’ll be using:" + "Create a `requirements.txt` for the Python libraries we’ll be using:" ] }, { @@ -234,14 +216,15 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "fq_tiopcN6Yy" }, "source": [ - "### **Writing the Script**\n", + "### Writing the Script\n", "\n", - "Create a new file called similar-movies.py and in it paste the following script" + "Create a new file called `similar-movies.py` and in it paste the following script" ] }, { @@ -330,6 +313,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "GSfSljOlOWxZ" @@ -345,19 +329,20 @@ "* Calculate cosine similarity, sort by most similar and return the top N.\n", "* Select k principal components to represent the movies, a movie_id to find recommendations and print the top_n results.\n", "\n", - "For further reading on how the script works, go to 
[Simple Movie Recommender Using SVD | Alyssa](https://alyssaq.github.io/2015/20150426-simple-movie-recommender-using-svd/)\n" + "For further reading on how the script works, go to [Simple Movie Recommender Using SVD | Alyssa](https://alyssaq.github.io/2015/20150426-simple-movie-recommender-using-svd/)" ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "YY4k-R-xObIe" }, "source": [ "\n", - "### **Running the script**\n", + "## Running the script\n", "\n", - "Running the script similar-movies.py using the default values you can also use other flags to set your own values\n" + "Run the script `similar-movies.py` with the default values. You can also use other flags to set your own values.\n" ] }, { @@ -375,20 +360,15 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "ifea4e2TO68d" }, "source": [ + "## Setting Up Docker\n", "\n", - "\n", - "**Setting Up Docker**\n", - "\n", - "In this step you will create a `Dockerfile` to create your Docker deployment. The `Dockerfile` is a text document that contains the commands used to assemble the image.\n", - "\n", - "First, create the `Dockerfile`.\n", - "\n", - "Next, add your desired configuration to the `Dockerfile`. These commands specify how the image will be built, and what extra requirements will be included.\n" + "In this step, we will create a `Dockerfile` and add the desired configuration to the file. These commands specify how the image will be built, and what extra requirements will be included." 
] }, { @@ -411,6 +391,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "ynbIvLBWRJxe" @@ -429,34 +410,33 @@ "│ └── ratings.dat\n", "├── requirements.txt\n", "└── similar-movies.py\n", - "```\n", - "\n" + "```\n" ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "Zs2d88iyRNIV" }, "source": [ - "Build the container\n", + "### Build the container\n", + "\n", + "We will run the `docker build` command to build the container:\n", "\n", "```\n", "docker build -t <hub-user>/<repo-name>:<tag> .\n", "```\n", "\n", + "Before running the command, replace:\n", "\n", - "Please replace\n", - "\n", - "<hub-user> with your docker hub username, If you don’t have a docker hub account [Follow these instructions to create docker account](https://docs.docker.com/docker-id/), and use the username of the account you created\n", + "- **hub-user** with your Docker Hub username. If you don’t have a Docker Hub account, [follow these instructions to create a Docker account](https://docs.docker.com/docker-id/) and use the username of the account you created\n", "\n", - "<repo-name> This is the name of the container, you can name it anything you want\n", + "- **repo-name** with the name of the container; you can name it anything you want\n", "\n", - "<tag> This is not required but you can use the latest tag\n", + "- **tag** with the image tag; this is not required, but you can use the `latest` tag\n", "\n", - "After you have build the container, the next step is to test it locally and then push it docker hub\n", - "\n", - "Before pushing you first need to create a repo which you can create by following the instructions here [https://docs.docker.com/docker-hub/repos/](https://docs.docker.com/docker-hub/repos/)\n", + "After you have built the container, the next step is to test it locally and then push it to Docker Hub. 
Before pushing, you first need to create a repo, which you can do by following the instructions here [https://docs.docker.com/docker-hub/repos/](https://docs.docker.com/docker-hub/repos/)\n", "\n", "Now you can push this repository to the registry designated by its name or tag.\n", "\n", @@ -465,7 +445,6 @@ " docker push <hub-user>/<repo-name>:<tag>\n", "```\n", "\n", - "\n", "After the repo image has been pushed to docker hub, we can now use the container for running on bacalhau\n", "\n", "\n", @@ -475,62 +454,23 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "j7mBHBsaS0gO" }, "source": [ - "## **Running the container on bacalhau**\n", + "## Running the container on bacalhau\n", "\n", - "You can either run the container on bacalhau with default or custom parameters\n", - "\n", - "Running the container with default parameters\n" + "You can run the container on Bacalhau with either default or custom parameters" ] }, { + "attachments": {}, "cell_type": "markdown", - "metadata": { - "id": "24HuygvzTwnT" - }, - "source": [ - "Insalling bacalhau" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "W1joNKGJT5eN", - "outputId": "a703088d-4c44-426a-a24d-6e928159898b", - "tags": [ - "skip-execution" - ] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Your system is linux_amd64\n", - "No BACALHAU detected. 
Installing fresh BACALHAU CLI...\n", - "Getting the latest BACALHAU CLI...\n", - "Installing v0.2.3 BACALHAU CLI...\n", - "Downloading https://github.com/filecoin-project/bacalhau/releases/download/v0.2.3/bacalhau_v0.2.3_linux_amd64.tar.gz ...\n", - "Downloading sig file https://github.com/filecoin-project/bacalhau/releases/download/v0.2.3/bacalhau_v0.2.3_linux_amd64.tar.gz.signature.sha256 ...\n", - "Verified OK\n", - "Extracting tarball ...\n", - "NOT verifying Bin\n", - "bacalhau installed into /usr/local/bin successfully.\n", - "Client Version: v0.2.3\n", - "Server Version: v0.2.3\n" - ] - } - ], + "metadata": {}, "source": [ - "!curl -sL https://get.bacalhau.org/install.sh | bash" + "### Running the container with default parameters" ] }, { @@ -573,6 +513,14 @@ " -- python similar-movies.py" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When a job is submitted, Bacalhau prints out the related `job_id`. We store that in an environment variable so that we can reuse it later on." + ] + }, { "cell_type": "code", "execution_count": null, @@ -611,13 +559,20 @@ ] }, { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Running the container with custom parameters" + ] + }, + { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "N6oIjUz9TEiq" }, "source": [ - "Running the container with custom \n", - "parameters (Optional)\n", "\n", "```\n", "bacalhau docker run \\\n", @@ -626,6 +581,22 @@ "```" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Checking the State of your Jobs" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- **Job status**: You can check the status of the job using `bacalhau list`. 
" + ] + }, { "cell_type": "code", "execution_count": null, @@ -655,15 +626,21 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "kFYpNA32c7t5" }, "source": [ - "\n", - "Where it says \"`Completed `\", that means the job is done, and we can get the results.\n", - "\n", - "To find out more information about your job, run the following command:" + "When it says `Published` or `Completed`, that means the job is done, and we can get the results." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- **Job information**: You can find out more information about your job by using `bacalhau describe`." ] }, { @@ -682,12 +659,13 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "2I4DHnt0Vzua" }, "source": [ - "If you see that the job has completed and there are no errors, then you can download the results with the following command:" + "- **Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory." ] }, { @@ -728,33 +706,15 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "HEtmR7a6WVuD" }, "source": [ - "The structure of the files and directories will look like this:\n", - "```\n", - "├── shards\n", - "│ └── job-940c7fd7-c15a-4d00-8170-0d138cdca7eb-shard-0-host-QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL\n", - "│ ├── exitCode\n", - "│ ├── stderr\n", - "│ └── stdout\n", - "├── stderr\n", - "├── stdout\n", - "└── volumes\n", - " └── outputs\n", - "```\n", - "\n", - "* stdout contains things printed to the console like outputs, etc.\n", - "\n", - "* stderr contains any errors. In this case, since there are no errors, it's will be empty\n", - "\n", - "* Volumes folder contain the volumes you named when you started the job with the `-o` flag. 
In addition, you will always have a `outputs` volume, which is provided by default.\n", - "\n", - "Because your script is printed to stdout, the output will appear in the stdout file. You can read this by typing the following command:\n", + "## Viewing your Job Output\n", "\n", - "\n" + "Each job creates 3 subfolders: the **combined_results**, **per_shard files**, and the **raw** directory. To view the output file, run the following command:" ] }, { @@ -791,7 +751,7 @@ } ], "source": [ - "!cat results/combined_results/outputs" + "!cat results/combined_results/stdout" ] } ],