I modified the Helm chart that installs Dask on a Kubernetes cluster, adding support for nodeSelectors. This lets you place Jupyter and the Dask scheduler in a separate node pool from the Dask workers, so you can scale the workers up or down without inadvertently killing your Jupyter instance.
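
To confirm that the nodeSelectors took effect, it can help to list each pod next to the node it was scheduled on. The helper below is a minimal sketch, not part of the original scripts: the function name `pod_placement` is hypothetical, and it assumes the pods carry a `component` label (as the `component=jupyter` selector used further down suggests).

```shell
# Hypothetical helper (not in the original repo): show which node each pod
# landed on, together with its "component" label (assumed to be set by the chart).
pod_placement() {
  kubectl get pods \
    -o custom-columns='POD:.metadata.name,NODE:.spec.nodeName,COMPONENT:.metadata.labels.component'
}
```

Running `pod_placement` after scaling should show the Jupyter and scheduler pods on nodes from one pool and the workers on nodes from the other.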

Use kubectl cp to copy files to/from the Jupyter instance.

I used Dask specifically to parallelize hyperparameter tuning. The dask-searchcv package provides Dask-backed implementations of sklearn’s GridSearchCV and RandomizedSearchCV classes.

Initialize cluster, scale up/down, and destroy:

time source scripts/initialize.sh
time source scripts/scale.sh <num_nodes> <num_workers>
time source scripts/destroy.sh

Copy models/params to/from Jupyter:

export JUPYTER_POD=$(kubectl get pods --selector=component=jupyter -o jsonpath='{.items[0].metadata.name}')
kubectl cp model.ipynb $JUPYTER_POD:model.ipynb
kubectl cp $JUPYTER_POD:model.ipynb model.ipynb
kubectl cp $JUPYTER_POD:model_param.yaml model_param.yaml
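
The copy commands above can be wrapped in small shell functions so the pod lookup happens on every call instead of relying on a possibly stale `JUPYTER_POD` variable. This is a sketch under assumptions: the function names `jupyter_pod`, `push_to_jupyter`, and `pull_from_jupyter` are hypothetical and not part of the original scripts, and files are copied relative to the container's working directory, as in the commands above.

```shell
# Hypothetical helpers (not in the original repo).

# Print the name of the first pod matching the component=jupyter label.
jupyter_pod() {
  kubectl get pods --selector=component=jupyter \
    -o jsonpath='{.items[0].metadata.name}'
}

# Copy a local file into the Jupyter pod, keeping the same file name.
push_to_jupyter() {
  local file="$1" pod
  pod="$(jupyter_pod)" || return 1
  if [ -z "$pod" ]; then
    echo "no Jupyter pod found" >&2
    return 1
  fi
  kubectl cp "$file" "$pod:$(basename "$file")"
}

# Copy a file from the Jupyter pod into the current directory.
pull_from_jupyter() {
  local file="$1" pod
  pod="$(jupyter_pod)" || return 1
  if [ -z "$pod" ]; then
    echo "no Jupyter pod found" >&2
    return 1
  fi
  kubectl cp "$pod:$file" "./$(basename "$file")"
}
```

For example, `push_to_jupyter model.ipynb` uploads a notebook and `pull_from_jupyter model_param.yaml` fetches the tuned parameters.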

About

Code for Kaggle Jigsaw challenge
