Creating a Course
Let's say we're going to create a new course on Kaggle Learn called Data Science.
- On the command line, navigate to the `learntools/notebooks/` directory.
- Create a new branch from `master` with a name like `ds-course`. Be sure to check that there isn't already a branch with that name.
- Decide on a "track name" like `data_science`. This will be the name of the directory where your course files will live. Check that there isn't already a directory with that name.
- There should be a Bash script called `new_track.sh`. Run `./new_track.sh data_science`.
- Stage the new files: `git add data_science`.
- Commit the changes: `git commit -m "Create track ds-course."`
- Create a pull request on GitHub named `[Data Science] New course`.
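The two "check that there isn't already one" steps above can be scripted. The following is a minimal sketch and not part of learntools: the helper names `track_name_available` and `branch_exists` are hypothetical, the naming pattern is an assumption based on the track name later being reused as a Python package directory, and `branch_exists` only looks at local branches.

```shell
# Hypothetical helpers; nothing here ships with learntools.

# Succeeds if NAME looks like a usable track name and no file or
# directory called NAME exists inside NOTEBOOKS_DIR (default: current dir).
track_name_available() {
    name="$1"
    notebooks_dir="${2:-.}"
    # Assumption: restrict names to lowercase letters, digits, and
    # underscores, starting with a letter, because the track name is
    # reused as a Python package directory later.
    case "$name" in
        '' | [!a-z]* | *[!a-z0-9_]*) return 1 ;;
    esac
    [ ! -e "$notebooks_dir/$name" ]
}

# Succeeds if a local branch called NAME already exists.
# Remote-only branches are not detected.
branch_exists() {
    git show-ref --verify --quiet "refs/heads/$1"
}
```

From `learntools/notebooks/`, a guarded version of the first steps might read `track_name_available data_science && ! branch_exists ds-course && git checkout -b ds-course master`.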
- Navigate to the learntools root directory `learntools/` (the directory containing `setup.py`). Do this from inside a Jupyter notebook, either with `%cd` or `os.chdir`. (A plain `!cd` runs in a temporary subshell, so it does not change the notebook's working directory.)
- Uninstall the current version of learntools. Inside a Jupyter notebook, run `!pip uninstall learntools` (add `-y` to skip the confirmation prompt).
- Install an editable version of learntools. Inside a Jupyter notebook, run `!pip install --editable .` (note the period). Installing the local copy of learntools from inside Jupyter helps ensure the Python kernel can find the installation. Installing from the command line can target a different Python environment than the kernel's and leave the package unimportable.
- Navigate to `learntools/learntools`.
- Create a directory for your course: `mkdir data_science`.
- Create an initialization file: `touch data_science/__init__.py`.
- Commit the changes.
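The directory and initialization-file steps can be collected into one small function. This is a sketch under assumptions: `create_course_package` is a hypothetical name, not a learntools script, and the package root is passed explicitly so the function can be tried in a scratch directory first.

```shell
# Hypothetical helper; not part of learntools.
# Creates PACKAGE_ROOT/TRACK/__init__.py, mirroring the mkdir and touch steps.
create_course_package() {
    package_root="$1"   # e.g. the inner learntools package directory
    track="$2"          # e.g. data_science
    if [ -e "$package_root/$track" ]; then
        echo "refusing to overwrite existing $package_root/$track" >&2
        return 1
    fi
    mkdir -p "$package_root/$track" &&
        touch "$package_root/$track/__init__.py"
}
```

Called from the repository root, this might look like `create_course_package learntools data_science`. The empty `__init__.py` marks the new directory as a regular Python package.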
Create a folder to contain local copies of the course data: `mkdir learntools/notebooks/input`. This folder is just for your own use while developing and won't be committed to the repository (it's in `notebooks/.gitignore`).
Create a folder for a course dataset: `mkdir input/ds-course-data`. Put all of the data you plan to use in here. If you develop your notebooks in the `raw` folder (`notebooks/data_science/raw/`), then you can access your datasets just as you would on Kaggle, for example `'../input/ds-course-data/data.csv'`. (NB: This trick relies on using a relative path to the `input` folder. Unlike on Kaggle, absolute paths like `/kaggle/input` won't work.)
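Because the trick depends on relative paths, it can help to confirm that a path resolves from the notebook's folder before running anything. A minimal sketch, assuming a hypothetical helper named `resolves_from` (not part of learntools):

```shell
# Hypothetical helper; not part of learntools.
# Succeeds if RELATIVE_PATH exists when resolved from NOTEBOOK_DIR,
# the folder a notebook would normally be run from.
resolves_from() {
    notebook_dir="$1"
    relative_path="$2"
    # Assumption: only relative paths are of interest here, so
    # absolute paths are rejected outright.
    case "$relative_path" in
        /*) echo "expected a relative path, got $relative_path" >&2; return 1 ;;
    esac
    [ -e "$notebook_dir/$relative_path" ]
}
```

For example: `resolves_from notebooks/data_science/raw ../input/ds-course-data/data.csv || echo "dataset not found"`.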
Now navigate to the dataset folder and zip up the datasets:
cd input/ds-course-data
zip -r ds-course-data.zip *
Create a dataset on Kaggle with a name that matches the folder you created, like `DS Course Data`, and upload the zip file. Whenever you add files to your dataset, just repeat the process.
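When repeating the process, note that `*` will also match a `ds-course-data.zip` left over from the previous run. The sketch below removes the old archive before zipping. `rebuild_dataset_zip` is a hypothetical name, and it assumes the archive should be named after the folder, as in the commands above.

```shell
# Hypothetical helper; not part of learntools.
# Recreates DATASET_DIR/<folder-name>.zip from the current contents
# of DATASET_DIR.
rebuild_dataset_zip() {
    dataset_dir="$1"
    archive="$(basename "$dataset_dir").zip"
    (
        cd "$dataset_dir" || exit 1
        # Remove the previous archive so it is neither updated in place
        # nor matched by the * below.
        rm -f "$archive"
        # As in the original command, * skips hidden files.
        zip -q -r "$archive" *
    )
}
```

For example, from `learntools/notebooks/`: `rebuild_dataset_zip input/ds-course-data`.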
Add the track name `'data_science'` to `TRACKS` and `TESTABLE_NOTEBOOK_TRACKS` in `learntools/notebooks/test.sh`.
Create a new file `setup_data.sh` in `learntools/notebooks/data_science/`:
#!/bin/bash
# Download the datasets used in the ML notebooks to correct relative_paths (../input/...)
mkdir -p input
DATASETS="ryanholbrook/ds-course-data ryanholbrook/some-other-data"
for slug in $DATASETS
do
    name=$(echo "$slug" | cut -d '/' -f 2)
    dest="input/$name"
    mkdir -p "$dest"
    kaggle d download -p "$dest" --unzip "$slug"
done
COMPDATASETS="competition-name"
for comp in $COMPDATASETS
do
    dest="input/$comp"
    mkdir -p "$dest"
    kaggle competitions download "$comp" -p "$dest"
    cd "$dest"
    unzip "${comp}.zip"
    chmod 700 *.csv
    cp *.csv ..
    cd ../..
done
You'll need to keep the list of datasets in `DATASETS` up to date with those you use in your course (that is, those defined in `track_meta.py`). For competition datasets, user @dansbecker needs to accept the competition rules for Jenkins to be able to access the dataset. (Check this in case of a `403 - Forbidden` error.)
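To see where each dataset will land before running any downloads, the slug handling from the script can be tried on its own. This is a minimal sketch: `dataset_dest` is a hypothetical helper, and it uses shell parameter expansion in place of the script's `cut` call, which gives the same result for `owner/name` slugs.

```shell
# Hypothetical helper; mirrors how setup_data.sh derives the target folder.
# Prints the destination directory for a Kaggle dataset slug (owner/name).
dataset_dest() {
    slug="$1"
    # ${slug##*/} strips everything up to and including the last slash,
    # leaving only the dataset name.
    echo "input/${slug##*/}"
}

# Dry run over the same list the script uses; nothing is downloaded.
DATASETS="ryanholbrook/ds-course-data ryanholbrook/some-other-data"
for slug in $DATASETS
do
    echo "$slug -> $(dataset_dest "$slug")"
done
# prints:
#   ryanholbrook/ds-course-data -> input/ds-course-data
#   ryanholbrook/some-other-data -> input/some-other-data
```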
After you've saved the file, at the command prompt run:
chmod a+x setup_data.sh
This will make the file executable by Jenkins.
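To confirm that the permission change took effect, the executable bit can be checked directly. A minimal sketch; `ensure_executable` is a hypothetical helper name, not part of learntools:

```shell
# Hypothetical helper; not part of learntools.
# Makes FILE executable for all users and verifies the result.
ensure_executable() {
    file="$1"
    [ -f "$file" ] || { echo "$file not found" >&2; return 1; }
    chmod a+x "$file" && [ -x "$file" ]
}
```

For example: `ensure_executable setup_data.sh && ls -l setup_data.sh`. Git records the executable bit, so the change is carried along when the file is committed.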