diff --git a/workload-onboarding/Python-Custom-Container/index.ipynb b/workload-onboarding/Python-Custom-Container/index.ipynb
index f37357a2..8c97285f 100644
--- a/workload-onboarding/Python-Custom-Container/index.ipynb
+++ b/workload-onboarding/Python-Custom-Container/index.ipynb
@@ -37,23 +37,21 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "JMnJRIzlNA15"
},
"source": [
+ "In this tutorial, we will walk you through building your own Docker container and running it on the Bacalhau network.\n",
"\n",
- "This example will walk you through building your own docker container and running the container on the bacalhau network and viewing the results\n",
+ "## Prerequisites\n",
"\n",
+ "To get started, you need to install the Bacalhau client. See more information [here](https://docs.bacalhau.org/getting-started/installation).\n",
"\n",
- "For that we will build a Simple Recommender Script that when Given a movie ID\n",
- "\n",
- "\n",
- "will recommend other movies based on user ratings.\n",
- "\n",
- "\n",
- "Suppose if you want recommendations for the movie Toy Story (1995) it will recommend movies from similar categories\n",
+ "## Sample Recommendation Dataset\n",
"\n",
+ "We will be using a simple recommendation script that, when given a movie ID, recommends other movies based on user ratings. For example, if you want recommendations for the movie Toy Story (1995), it will recommend movies from similar categories:\n",
"\n",
"```\n",
"Recommendations for Toy Story (1995):\n",
@@ -71,31 +69,16 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "jU7eSd3vNAhO"
},
"source": [
"\n",
+ "### Downloading the dataset\n",
"\n",
- "### \n",
- "**Downloading the dataset**\n",
- "\n",
- "\n",
- "Download Movielens1M dataset from this link [https://files.grouplens.org/datasets/movielens/ml-1m.zip](https://files.grouplens.org/datasets/movielens/ml-1m.zip)\n",
- "\n",
- "\n",
- "In this example we’ll be using 2 files from the MovieLens 1M dataset: ratings.dat and movies.dat. After the dataset is downloaded extract the zip and place ratings.dat and movies.dat into a folder called input\n",
- "\n",
- "The structure of input directory should be\n",
- "\n",
- "\n",
- "```\n",
- "input\n",
- "├── movies.dat\n",
- "└── ratings.dat\n",
- "```\n",
- "\n"
+ "Download the MovieLens 1M dataset from this link: [https://files.grouplens.org/datasets/movielens/ml-1m.zip](https://files.grouplens.org/datasets/movielens/ml-1m.zip)"
]
},
{
@@ -111,29 +94,27 @@
"skip-execution"
]
},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "--2022-09-18 11:01:58-- https://files.grouplens.org/datasets/movielens/ml-1m.zip\n",
- "Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152\n",
- "Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:443... connected.\n",
- "HTTP request sent, awaiting response... 200 OK\n",
- "Length: 5917549 (5.6M) [application/zip]\n",
- "Saving to: ‘ml-1m.zip’\n",
- "\n",
- "ml-1m.zip 100%[===================>] 5.64M 28.7MB/s in 0.2s \n",
- "\n",
- "2022-09-18 11:01:59 (28.7 MB/s) - ‘ml-1m.zip’ saved [5917549/5917549]\n",
- "\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"!wget https://files.grouplens.org/datasets/movielens/ml-1m.zip"
]
},
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this example, we’ll be using two files from the MovieLens 1M dataset: `ratings.dat` and `movies.dat`. After downloading the dataset, extract the zip and place `ratings.dat` and `movies.dat` into a folder called `input`.\n",
+ "\n",
+ "The structure of the `input` directory should be:\n",
+ "\n",
+ "```\n",
+ "input\n",
+ "├── movies.dat\n",
+ "└── ratings.dat\n",
+ "```"
+ ]
+ },
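If you prefer to prepare the `input` folder from Python rather than by hand, the extraction step can be sketched as follows. This is a minimal sketch, assuming `ml-1m.zip` has already been downloaded into the working directory (for example with the `wget` cell above); `extract_inputs` is a hypothetical helper, not part of the tutorial's code. It keeps only the two files the tutorial needs and drops any folder prefix stored inside the archive.

```python
import os
import zipfile

# The two MovieLens 1M files this tutorial uses.
NEEDED_FILES = ("movies.dat", "ratings.dat")


def extract_inputs(zip_path, dest="input"):
    """Extract movies.dat and ratings.dat from the archive into a flat `dest` folder.

    Any folder prefix inside the archive (such as ml-1m/) is dropped, and all
    other members are ignored. Returns the sorted list of written paths.
    """
    os.makedirs(dest, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            name = os.path.basename(member)
            if name in NEEDED_FILES:
                target = os.path.join(dest, name)
                # Copy the member's bytes without recreating its folder prefix.
                with open(target, "wb") as out:
                    out.write(archive.read(member))
                written.append(target)
    return sorted(written)
```

Calling `extract_inputs("ml-1m.zip")` from the directory where the archive was saved should produce the `input` layout shown above.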
{
"cell_type": "code",
"execution_count": null,
@@ -182,15 +163,16 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "2ig2GvdRNZS_"
},
"source": [
"\n",
- "### **Installing Dependencies**\n",
+ "### Installing Dependencies\n",
"\n",
- "Create a requirements.txt for the Python libraries we’ll be using:"
+ "Create a `requirements.txt` for the Python libraries we’ll be using:"
]
},
{
@@ -234,14 +216,15 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "fq_tiopcN6Yy"
},
"source": [
- "### **Writing the Script**\n",
+ "### Writing the Script\n",
"\n",
- "Create a new file called similar-movies.py and in it paste the following script"
+ "Create a new file called `similar-movies.py` and paste the following script into it:"
]
},
{
@@ -330,6 +313,7 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "GSfSljOlOWxZ"
@@ -345,19 +329,20 @@
"* Calculate cosine similarity, sort by most similar and return the top N.\n",
"* Select k principal components to represent the movies, a movie_id to find recommendations and print the top_n results.\n",
"\n",
- "For further reading on how the script works, go to [Simple Movie Recommender Using SVD | Alyssa](https://alyssaq.github.io/2015/20150426-simple-movie-recommender-using-svd/)\n"
+ "For further reading on how the script works, go to [Simple Movie Recommender Using SVD | Alyssa](https://alyssaq.github.io/2015/20150426-simple-movie-recommender-using-svd/)"
]
},
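The steps listed above can be illustrated with a small NumPy sketch. It is an approximation of the approach under stated assumptions, not the code of `similar-movies.py`: it takes a dense movies-by-users rating matrix with 0 for unrated entries, mean-centres each movie's row, keeps the first `k` SVD components as movie vectors, and ranks movies by cosine similarity. The helper names `top_cosine_similarity` and `recommend` are hypothetical.

```python
import numpy as np


def top_cosine_similarity(vectors, index, top_n=10):
    """Return the indices of the top_n rows of `vectors` most similar to row `index`."""
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # leave all-zero rows as zero vectors instead of dividing by 0
    unit = vectors / norms[:, None]
    similarity = unit @ unit[index]  # cosine similarity of every row with the query row
    return np.argsort(-similarity)[:top_n]  # most similar first


def recommend(ratings, movie_index, k=50, top_n=10):
    """ratings: movies-by-users matrix (0 = unrated). Returns similar movie indices."""
    ratings = np.asarray(ratings, dtype=float)
    # Centre each movie's ratings on that movie's mean rating.
    centred = ratings - ratings.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    # Represent every movie by its first k principal components.
    movie_vectors = u[:, :k] * s[:k]
    return top_cosine_similarity(movie_vectors, movie_index, top_n)
```

Because the query movie always has similarity 1 with itself, it appears among the results; a recommender would usually drop it before printing the list.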
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "YY4k-R-xObIe"
},
"source": [
"\n",
- "### **Running the script**\n",
+ "## Running the script\n",
"\n",
- "Running the script similar-movies.py using the default values you can also use other flags to set your own values\n"
+ "Run the script `similar-movies.py` with the default values. You can also use other flags to set your own values.\n"
]
},
{
@@ -375,20 +360,15 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ifea4e2TO68d"
},
"source": [
+ "## Setting Up Docker\n",
"\n",
- "\n",
- "**Setting Up Docker**\n",
- "\n",
- "In this step you will create a `Dockerfile` to create your Docker deployment. The `Dockerfile` is a text document that contains the commands used to assemble the image.\n",
- "\n",
- "First, create the `Dockerfile`.\n",
- "\n",
- "Next, add your desired configuration to the `Dockerfile`. These commands specify how the image will be built, and what extra requirements will be included.\n"
+ "In this step, we will create a `Dockerfile` and add the desired configuration to it. The commands in the `Dockerfile` specify how the image will be built and what extra requirements will be included."
]
},
{
@@ -411,6 +391,7 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ynbIvLBWRJxe"
@@ -429,34 +410,33 @@
"│ └── ratings.dat\n",
"├── requirements.txt\n",
"└── similar-movies.py\n",
- "```\n",
- "\n"
+ "```\n"
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "Zs2d88iyRNIV"
},
"source": [
- "Build the container\n",
+ "### Build the container\n",
+ "\n",
+ "We will run the `docker build` command to build the container:\n",
"\n",
"```\n",
"docker build -t <hub-user>/<repo-name>:<tag> .\n",
"```\n",
"\n",
+ "Before running the command, replace:\n",
"\n",
- "Please replace\n",
- "\n",
- "<hub-user> with your docker hub username, If you don’t have a docker hub account [Follow these instructions to create docker account](https://docs.docker.com/docker-id/), and use the username of the account you created\n",
+ "- **hub-user** with your Docker Hub username. If you don’t have a Docker Hub account, [follow these instructions to create a Docker account](https://docs.docker.com/docker-id/) and use the username of the account you created.\n",
"\n",
- "<repo-name> This is the name of the container, you can name it anything you want\n",
+ "- **repo-name** with the name of the container. You can name it anything you want.\n",
"\n",
- "<tag> This is not required but you can use the latest tag\n",
+ "- **tag** with a tag for the image. This is not required, but you can use the `latest` tag.\n",
"\n",
- "After you have build the container, the next step is to test it locally and then push it docker hub\n",
- "\n",
- "Before pushing you first need to create a repo which you can create by following the instructions here [https://docs.docker.com/docker-hub/repos/](https://docs.docker.com/docker-hub/repos/)\n",
+ "After you have built the container, the next step is to test it locally and then push it to Docker Hub. Before pushing, you first need to create a repository by following the instructions here: [https://docs.docker.com/docker-hub/repos/](https://docs.docker.com/docker-hub/repos/)\n",
"\n",
"Now you can push this repository to the registry designated by its name or tag.\n",
"\n",
@@ -465,7 +445,6 @@
" docker push <hub-user>/<repo-name>:<tag>\n",
"```\n",
"\n",
- "\n",
"After the image has been pushed to Docker Hub, we can use the container to run jobs on Bacalhau.\n",
"\n",
"\n",
@@ -475,62 +454,23 @@
]
},
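The `docker build` and `docker push` commands must refer to the same `<hub-user>/<repo-name>:<tag>` reference. A small Python sketch that assembles both commands from one reference, assuming you drive Docker from a notebook or script through `subprocess`; `image_ref` and `docker_commands` are hypothetical helpers, and the user and repository names in the usage note are made up.

```python
def image_ref(hub_user, repo_name, tag="latest"):
    """Build the <hub-user>/<repo-name>:<tag> reference used by docker build/push."""
    if not hub_user or not repo_name:
        raise ValueError("hub_user and repo_name are required")
    if repo_name != repo_name.lower():
        # Docker rejects repository names that contain uppercase letters.
        raise ValueError("repository names must be lowercase")
    return f"{hub_user}/{repo_name}:{tag}"


def docker_commands(hub_user, repo_name, tag="latest", context="."):
    """Return the build and push commands as argument lists for subprocess.run."""
    ref = image_ref(hub_user, repo_name, tag)
    return [
        ["docker", "build", "-t", ref, context],
        ["docker", "push", ref],
    ]
```

For example, `for cmd in docker_commands("jsmith", "movie-recommender"): subprocess.run(cmd, check=True)` would build and then push `jsmith/movie-recommender:latest`, stopping at the first command that fails. Passing argument lists rather than a shell string avoids quoting problems.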
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "j7mBHBsaS0gO"
},
"source": [
- "## **Running the container on bacalhau**\n",
+ "## Running the container on Bacalhau\n",
"\n",
- "You can either run the container on bacalhau with default or custom parameters\n",
- "\n",
- "Running the container with default parameters\n"
+ "You can run the container on Bacalhau with either default or custom parameters."
]
},
{
+ "attachments": {},
"cell_type": "markdown",
- "metadata": {
- "id": "24HuygvzTwnT"
- },
- "source": [
- "Insalling bacalhau"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "W1joNKGJT5eN",
- "outputId": "a703088d-4c44-426a-a24d-6e928159898b",
- "tags": [
- "skip-execution"
- ]
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Your system is linux_amd64\n",
- "No BACALHAU detected. Installing fresh BACALHAU CLI...\n",
- "Getting the latest BACALHAU CLI...\n",
- "Installing v0.2.3 BACALHAU CLI...\n",
- "Downloading https://github.com/filecoin-project/bacalhau/releases/download/v0.2.3/bacalhau_v0.2.3_linux_amd64.tar.gz ...\n",
- "Downloading sig file https://github.com/filecoin-project/bacalhau/releases/download/v0.2.3/bacalhau_v0.2.3_linux_amd64.tar.gz.signature.sha256 ...\n",
- "Verified OK\n",
- "Extracting tarball ...\n",
- "NOT verifying Bin\n",
- "bacalhau installed into /usr/local/bin successfully.\n",
- "Client Version: v0.2.3\n",
- "Server Version: v0.2.3\n"
- ]
- }
- ],
+ "metadata": {},
"source": [
- "!curl -sL https://get.bacalhau.org/install.sh | bash"
+ "### Running the container with default parameters"
]
},
{
@@ -573,6 +513,14 @@
" -- python similar-movies.py"
]
},
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When a job is submitted, Bacalhau prints out the related `job_id`. We store it in an environment variable so that we can reuse it later on."
+ ]
+ },
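The same idea can be sketched in plain Python. This assumes you have captured the submit command's output as a string that contains only the job ID; `remember_job_id` and the variable name `JOB_ID` are illustrative choices, not names taken from the tutorial.

```python
import os


def remember_job_id(cli_output, var="JOB_ID"):
    """Store a job ID captured from CLI output in os.environ and return it.

    Assumes the captured output is just the ID, possibly with surrounding
    whitespace or a trailing newline.
    """
    job_id = cli_output.strip()
    if not job_id or len(job_id.split()) != 1:
        raise ValueError(f"expected a single job ID, got {cli_output!r}")
    # Child processes started after this point inherit the variable.
    os.environ[var] = job_id
    return job_id
```

Rejecting output with more than one token is a deliberate choice: it fails loudly if the command printed extra text instead of silently storing a wrong value.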
{
"cell_type": "code",
"execution_count": null,
@@ -611,13 +559,20 @@
]
},
{
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Running the container with custom parameters"
+ ]
+ },
+ {
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "N6oIjUz9TEiq"
},
"source": [
- "Running the container with custom \n",
- "parameters (Optional)\n",
"\n",
"```\n",
"bacalhau docker run \\\n",
@@ -626,6 +581,22 @@
"```"
]
},
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Checking the State of your Jobs"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- **Job status**: You can check the status of the job using `bacalhau list`. "
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -655,15 +626,21 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "kFYpNA32c7t5"
},
"source": [
- "\n",
- "Where it says \"`Completed `\", that means the job is done, and we can get the results.\n",
- "\n",
- "To find out more information about your job, run the following command:"
+ "When it says `Published` or `Completed`, that means the job is done, and we can get the results."
+ ]
+ },
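If you would rather wait for a job programmatically than re-run the listing by hand, the check above can be wrapped in a polling loop. This is a minimal sketch: `get_state` is a hypothetical callable you would supply (for example, one that runs `bacalhau list` and extracts the job's state), and only the `Published` and `Completed` states named above are treated as finished.

```python
import time

# States the tutorial treats as "the job is done".
DONE_STATES = {"Published", "Completed"}


def wait_until_done(get_state, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Call get_state() until it returns a done state; raise TimeoutError otherwise.

    Any other state (including a failure state) keeps the loop polling, so a
    job that never finishes ends in TimeoutError after roughly `timeout` seconds.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in DONE_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"job still in state {state!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.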
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- **Job information**: You can find out more information about your job by using `bacalhau describe`."
]
},
{
@@ -682,12 +659,13 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "2I4DHnt0Vzua"
},
"source": [
- "If you see that the job has completed and there are no errors, then you can download the results with the following command:"
+ "- **Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can create a directory to store your results. In the command below, we create a directory and download the job output into it."
]
},
{
@@ -728,33 +706,15 @@
]
},
{
+ "attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "HEtmR7a6WVuD"
},
"source": [
- "The structure of the files and directories will look like this:\n",
- "```\n",
- "├── shards\n",
- "│ └── job-940c7fd7-c15a-4d00-8170-0d138cdca7eb-shard-0-host-QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL\n",
- "│ ├── exitCode\n",
- "│ ├── stderr\n",
- "│ └── stdout\n",
- "├── stderr\n",
- "├── stdout\n",
- "└── volumes\n",
- " └── outputs\n",
- "```\n",
- "\n",
- "* stdout contains things printed to the console like outputs, etc.\n",
- "\n",
- "* stderr contains any errors. In this case, since there are no errors, it's will be empty\n",
- "\n",
- "* Volumes folder contain the volumes you named when you started the job with the `-o` flag. In addition, you will always have a `outputs` volume, which is provided by default.\n",
- "\n",
- "Because your script is printed to stdout, the output will appear in the stdout file. You can read this by typing the following command:\n",
+ "## Viewing your Job Output\n",
"\n",
- "\n"
+ "Each job creates 3 subfolders: the **combined_results**, **per_shard files**, and the **raw** directory. To view the combined standard output of the job, run the following command:"
]
},
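The `cat` command that follows reads a single file from the downloaded results. A Python equivalent is sketched below, assuming the results were downloaded into a directory named `results`, matching the path used by that command; `read_combined_stdout` is a hypothetical helper.

```python
from pathlib import Path


def read_combined_stdout(results_dir="results"):
    """Return the contents of <results_dir>/combined_results/stdout as text."""
    stdout_path = Path(results_dir) / "combined_results" / "stdout"
    return stdout_path.read_text()
```

Running `print(read_combined_stdout())` should show the same recommendations the script printed inside the container; a missing file raises `FileNotFoundError`, which usually means the download step has not completed.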
{
@@ -791,7 +751,7 @@
}
],
"source": [
- "!cat results/combined_results/outputs"
+ "!cat results/combined_results/stdout"
]
}
],