Round 2: packaging guidelines
This document explains how to prepare your code repository for the second round of the challenge. The main goal is to make sure that repo2docker can turn your repository into a docker image and that the resulting container executes successfully. Once done, proceed with the submission guidelines.
It involves the following steps:
- Entry point
- Declaring dependencies
- Building a docker image with repo2docker
- Executing the docker image
The first two steps ensure that your code can be run by invoking a script in a defined conda environment. The last two steps then build and run a container based on the declared environment.
Once the required software (conda and Docker) is installed, set up the tools used in this guide as follows:
Create and activate a fresh conda environment:
conda create python=3.6 --name www_music_py36
source activate www_music_py36
Install jupyter-repo2docker with:
pip install jupyter-repo2docker
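Before going further, it can help to confirm that the three commands this guide relies on (conda, docker and repo2docker) are on your PATH. The following is a minimal Python sketch that is not part of the starter kit. The helper name missing_commands is hypothetical, and the check only looks for executables; it does not test whether the Docker daemon is running.

```python
import shutil

# Commands used throughout this guide; adjust the list if your setup differs.
REQUIRED_COMMANDS = ["conda", "docker", "repo2docker"]


def missing_commands(commands, path=None):
    """Return the subset of `commands` that shutil.which cannot find.

    `path` defaults to None, in which case shutil.which searches PATH.
    """
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    print("Missing commands: " + (", ".join(missing) if missing else "none"))
```

Run it inside the activated www_music_py36 environment. repo2docker is installed into that environment, so it is only found when the environment is active.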
You can use the commands below to follow this guide with the starter-kit repository; otherwise, work in your own code repository.
git clone https://github.com/crowdAI/crowdai-musical-genre-recognition-starter-kit
cd crowdai-musical-genre-recognition-starter-kit
pip install -r requirements.txt
Entry point
The goal is to make sure that your code can be invoked by the run.sh script, which looks for the mp3 files in a given directory and writes its predictions to a given file. The run.sh entry point must be placed in the root of the repository.
Assuming that your test files are present at data/crowdai_fma_test/*.mp3, you should be able to run the code with:
export TEST_DIRECTORY=data/crowdai_fma_test
export OUTPUT_PATH=/tmp/output.csv
./run.sh
The script receives the location of the directory containing the mp3 files through the TEST_DIRECTORY environment variable, and it has to write the output CSV to the location specified by the OUTPUT_PATH environment variable.
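The interface is therefore just two environment variables. Below is a minimal Python sketch of a prediction script that run.sh could call. It is not the starter kit's random_submission.py. The function names, the file_id/prediction header and the "unknown" placeholder prediction are all hypothetical, so use the exact output format the challenge requires.

```python
import csv
import glob
import os


def list_test_files(test_directory):
    """Return the sorted paths of all .mp3 files directly inside test_directory."""
    return sorted(glob.glob(os.path.join(test_directory, "*.mp3")))


def predict(mp3_path):
    """Hypothetical placeholder: replace with your model's inference code."""
    return "unknown"


def write_predictions(test_directory, output_path):
    """Predict every mp3 in test_directory and write one CSV row per file.

    Returns the number of files processed.
    """
    rows = []
    for path in list_test_files(test_directory):
        # Hypothetical id: the file name without its .mp3 extension.
        file_id = os.path.splitext(os.path.basename(path))[0]
        rows.append([file_id, predict(path)])
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Hypothetical header; use the exact columns the challenge requires.
        writer.writerow(["file_id", "prediction"])
        writer.writerows(rows)
    return len(rows)


def main():
    # Both variables are set by the caller (your shell or `docker run -e ...`).
    output_path = os.environ["OUTPUT_PATH"]
    write_predictions(os.environ["TEST_DIRECTORY"], output_path)
    print("Output file written at " + output_path)


if __name__ == "__main__":
    if "TEST_DIRECTORY" in os.environ and "OUTPUT_PATH" in os.environ:
        main()
    else:
        print("Please export TEST_DIRECTORY and OUTPUT_PATH first.")
```

Reading the paths from the environment, rather than hard-coding them, lets the same script run unchanged on your machine and inside the grading container.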
You should modify the run.sh script to do whatever is needed to run your model. In our example it calls the random_submission.py script with appropriate parameters. Take a look at those files to get a sense of the arguments they expect.
If it worked, predictions should be available at /tmp/output.csv (the value of OUTPUT_PATH). You can check its presence and content with, for example, cat /tmp/output.csv.
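The same check can also be scripted. This minimal Python sketch verifies that the output file exists, is non-empty and parses as CSV, and then returns its first rows. It does not validate the column layout expected by the grader, which this guide does not specify. The helper name check_output is hypothetical.

```python
import csv
import itertools
import os


def check_output(path, preview_rows=5):
    """Raise if `path` is missing or empty; otherwise return its first rows."""
    if not os.path.isfile(path):
        raise FileNotFoundError("No output file at " + path)
    if os.path.getsize(path) == 0:
        raise ValueError("Output file is empty: " + path)
    with open(path, newline="") as f:
        return list(itertools.islice(csv.reader(f), preview_rows))


if __name__ == "__main__":
    path = os.environ.get("OUTPUT_PATH", "/tmp/output.csv")
    try:
        for row in check_output(path):
            print(row)
    except (FileNotFoundError, ValueError) as err:
        print(err)
```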
Declaring dependencies
The goal is to make sure that all the dependencies (Python or otherwise) needed to run your code are declared in an environment.yml file. That file captures all the details (packages, versions, channels) required to deterministically replicate your environment. Do not skip this step: every required dependency must be registered, because if any dependency is missing, the container will fail to run.
You can install dependencies with conda:
conda install <package name>
conda install -c conda-forge <package name>
You can also install dependencies with pip (they will be captured by conda env export):
pip install <package name>
pip install -r requirements.txt
Once all dependencies are installed in the conda environment, generate the environment.yml file by running:
conda env export > environment.yml
Note: Please ensure that you are not using pip 9.0.2, as it introduced some breaking changes and hence is not available on conda. Relevant discussion around this can be found here.
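conda env export typically writes conda packages as name=version=build lines and pip-installed packages as name==version lines under a pip: entry. This minimal Python sketch scans the exported file for either form of a pip 9.0.2 pin. The regular expression and the helper name pins_pip_902 are assumptions for illustration. The check is a plain text scan, not a full YAML parse.

```python
import re

# Matches a conda pin such as "- pip=9.0.2=py36_1" or a pip pin such as "- pip==9.0.2".
PIP_902 = re.compile(r"^\s*-\s*pip={1,2}9\.0\.2(?:=|\s|$)", re.MULTILINE)


def pins_pip_902(environment_yml_text):
    """Return True if the exported environment pins pip at version 9.0.2."""
    return PIP_902.search(environment_yml_text) is not None


if __name__ == "__main__":
    try:
        with open("environment.yml") as f:
            text = f.read()
    except FileNotFoundError:
        print("environment.yml not found; run `conda env export > environment.yml` first.")
    else:
        print("pip 9.0.2 is pinned" if pins_pip_902(text) else "pip 9.0.2 is not pinned")
```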
Building a docker image
Before building an image, please ensure that you can successfully run your code with ./run.sh (after the environment variables have been exported) in the conda environment defined by the environment.yml file.
In this step, we use repo2docker to convert your source code to a docker image.
repo2docker uses the environment.yml file in your repository to build a fresh conda environment and makes it available as a docker image, named my_submission_image in the commands below.
Note: In the rest of the section, the strings my_submission_image and my_submission_container can be replaced by arbitrary strings, as long as you are consistent.
You can locally build an image out of the repository by running:
repo2docker --no-run \
  --user-id 1001 \
  --user-name crowdai \
  --image-name my_submission_image \
  --debug .
Note: If all your data is inside the data/ folder, this step can produce an unreasonably large docker image. This is caused by a bug, for which we currently have a pull request open with a fix. You can either make sure that your training/testing data is not inside the data/ folder (temporarily move it), or use a custom fork of jupyter-repo2docker that includes the bug fix, by running:
pip uninstall jupyter-repo2docker
pip install https://github.com/crowdai/repo2docker/archive/issue268.zip
If you use the official version of repo2docker with a lot of data inside the data/ folder, everything will still work, but the build will be very slow and the generated docker image will be huge.
Note: If you get the error "Docker client initialization error. Check if docker is running on the host.", you need to either start the docker daemon (e.g. with sudo systemctl start docker) or run the command as sudo repo2docker ... (see Docker's post-installation instructions for Linux if you want to manage Docker as a non-root user).
Note: If the image name already exists, you can either change it to some other unique string or delete the old image with docker rmi my_submission_image. You can get a list of all images with docker images.
Note: This step can take some time to execute, especially if it is the first time you are trying to build the image. Please be patient.
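If you rebuild often, you may prefer to script the build. This minimal Python sketch assembles the same repo2docker invocation used above, with exactly the same flags, and runs it with subprocess. The helper names are hypothetical.

```python
import subprocess


def repo2docker_build_command(image_name="my_submission_image", repo_path=".",
                              user_id=1001, user_name="crowdai"):
    """Return the argv of the repo2docker build shown above, as a list."""
    return [
        "repo2docker",
        "--no-run",                  # build the image only; do not start a container
        "--user-id", str(user_id),   # UID of the user created inside the image
        "--user-name", user_name,    # name of that user
        "--image-name", image_name,  # tag given to the resulting image
        "--debug",                   # turn on debug logging
        repo_path,                   # repository to build (here: current directory)
    ]


def build_image(**kwargs):
    """Run the build; raises subprocess.CalledProcessError if repo2docker fails."""
    subprocess.check_call(repo2docker_build_command(**kwargs))
```

Passing an argument list instead of a single shell string avoids quoting problems if your image name or repository path contains spaces.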
Executing the docker image
If the build step succeeds, the logs should end with Successfully tagged my_submission_image:latest.
We will now use the docker image to create a new container named my_submission_container and execute it.
You can locally test your code by running:
docker run \
  -v `pwd`/data/crowdai_fma_test:/crowdai-payload \
  -e TEST_DIRECTORY='/crowdai-payload' \
  -e OUTPUT_PATH='/tmp/output.csv' \
  --name my_submission_container \
  -it my_submission_image \
  /home/crowdai/run.sh
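You can again verify that it worked by checking the content of /tmp/output.csv inside the container. Once the run has finished, you can, for example, copy the file to the host with docker cp my_submission_container:/tmp/output.csv .
If it executes successfully and you see Output file written at /tmp/output.csv (or whatever your own code logs), then your repository is Binder-compatible, which means it will be accepted by our grading infrastructure.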
Note: If you get a container exists error, you can either change the name of the container from my_submission_container to something else, or delete the old container with docker rm my_submission_container. You can get a list of all containers with docker ps -a.
Below is a description of the parameters we used:
- -v `pwd`/data/crowdai_fma_test:/crowdai-payload: tells docker to map the folder data/crowdai_fma_test on your host to the location /crowdai-payload inside the container. All the files inside the directory data/crowdai_fma_test will be visible at /crowdai-payload in the container. Note that we prepend `pwd`/ to the host path: this argument of docker run only accepts an absolute host path when the path contains a /. The way to obtain the current directory depends on your shell, so on Windows you might have to adapt the `pwd` part; what matters is that the host path you provide is absolute.
- -e TEST_DIRECTORY='/crowdai-payload': sets the environment variable TEST_DIRECTORY to /crowdai-payload inside the container. The run.sh script expects the location of the test directory to be passed via this environment variable.
- -e OUTPUT_PATH='/tmp/output.csv': sets the environment variable OUTPUT_PATH to /tmp/output.csv inside the container. The run.sh script expects the location of the output file to be passed via this environment variable.
- --name my_submission_container: names the container my_submission_container. You are free to choose any arbitrary string.
- -it my_submission_image: -i and -t run the container in interactive mode with a pseudo-TTY attached, and my_submission_image specifies the image to run. The argument that follows is the command executed inside the container.
- /home/crowdai/run.sh: sets the location of the script to run inside the docker container. Our grading orchestration system expects the entry point for your code to be at this location, so you have to ensure that the script is available there. All the content of your repository will be available in the /home/crowdai/ directory inside the container, so in principle you just have to ensure that run.sh exists (and is executable) in the root of your source code repository.
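The parameters listed above can also be assembled programmatically. This minimal Python sketch builds the same docker run command and resolves the host path with os.path.abspath instead of prefixing it with `pwd`/, so the -v argument always receives an absolute host path. The helper names and default arguments are hypothetical and simply mirror the values used in this guide.

```python
import os
import subprocess


def docker_run_command(host_test_dir="data/crowdai_fma_test",
                       image="my_submission_image",
                       container="my_submission_container",
                       output_path="/tmp/output.csv"):
    """Return the argv of the `docker run` invocation described above."""
    # -v needs an absolute host path, so resolve it explicitly.
    host_dir = os.path.abspath(host_test_dir)
    return [
        "docker", "run",
        "-v", host_dir + ":/crowdai-payload",     # mount the test mp3s
        "-e", "TEST_DIRECTORY=/crowdai-payload",  # where run.sh finds the mp3s
        "-e", "OUTPUT_PATH=" + output_path,       # where run.sh writes the CSV
        "--name", container,
        "-it", image,
        "/home/crowdai/run.sh",                   # entry point inside the image
    ]


def run_container(**kwargs):
    """Start the container; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.check_call(docker_run_command(**kwargs))
```

The values contain no shell quotes because the list is passed to docker directly rather than through a shell. If you call this from a context without a terminal, such as a CI job, docker may refuse -it with "the input device is not a TTY"; in that case, drop those flags.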
If you find any of these sections confusing, or notice typos, or have a nice trick, or simply a question or an answer to a FAQ, please do send us a pull request with your suggestion.
- My code requires a GPU. How do I deal with that?
During the orchestration of the containers, we will use nvidia-docker, which is a drop-in replacement for docker and exposes GPUs from the host machine to the containers. If you want to test it out yourself, please follow the nvidia-docker installation instructions. You should then be able to use nvidia-docker in place of docker in all the steps above. For your code, you can assume that you will have access to at least 1 GPU. You can confirm that by checking the $CUDA_VISIBLE_DEVICES environment variable. If this environment variable is not set, then you are running on a server without a GPU.
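Following the guidance above, your code can decide at startup whether to use a GPU by inspecting CUDA_VISIBLE_DEVICES. This minimal Python sketch treats an unset variable as "no GPU" and otherwise returns the comma-separated device ids. The helper name visible_gpus is hypothetical, and the sketch does not check that the listed devices actually work.

```python
import os


def visible_gpus(environ=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES (empty list if unset)."""
    if environ is None:
        environ = os.environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return []  # not set: per the guide, no GPU on this server
    return [gpu.strip() for gpu in value.split(",") if gpu.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("GPUs visible: " + (", ".join(gpus) if gpus else "none (CPU only)"))
```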
Please send a pull request if you think you have a Frequently Asked Question, or the answer to one.