Revert "Bring changes from main into JOSE branch following JOSE Review" #50

Closed
wants to merge 7 commits into from
58 changes: 14 additions & 44 deletions README.md
@@ -6,7 +6,6 @@

![GitHub](https://img.shields.io/github/license/Cambridge-ICCS/ml-training-material)
[![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Cambridge-ICCS/ml-training-material/main)

This repository contains documentation, resources, and code for the Introduction to
Machine Learning with PyTorch session designed and delivered by [Jack Atkinson](https://jackatkinson.net/) ([**@jatkinson1000**](https://github.com/jatkinson1000))
@@ -24,7 +23,6 @@ A website for this workshop can be found at [https://cambridge-iccs.github.io/ml
- [Preparation and prerequisites](#preparation-and-prerequisites)
- [Installation and setup](#installation-and-setup)
- [License information](#license)
- [Contribution Guidelines and Support](#contribution-guidelines-and-support)


## Learning Objectives
@@ -72,6 +70,13 @@ These are for recapping after the course in case you missed anything, and contai
[linted](https://docs.pylint.org/intro.html), and conforming to the
[black](https://black.readthedocs.io/en/stable/) code style.

If you are working in Colab, you can open the worked solutions using the following links:

* [Exercise 01](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/01_penguin_classification_solutions.ipynb)
* [Exercise 02](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/02_penguin_regression_solutions.ipynb)
* [Exercise 03](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/03_mnist_classification_solutions.ipynb)
* [Exercise 04](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/04_ellipse_regression_solutions.ipynb)


## Preparation and prerequisites

@@ -131,18 +136,17 @@ us before a training session.

## Installation and setup

There are three options for participating in this workshop for which instructions are provided below:
There are two options for participating in this workshop for which instructions are provided below:

* via a [local install](#local-install)
* on [Google Colab](#google-colab)
* on [binder](#binder)

We recommend the [local install](#local-install) approach, especially if you forked
the repository, as it is the easiest way to keep a copy of your work and push back to GitHub.

However, if you experience issues with the installation process or are unfamiliar with
the terminal/installation process there is the option to run the notebooks in
[Google Colab](#google-colab) or on [binder](#binder).
[Google Colab](#google-colab).

### Local Install

@@ -215,31 +219,18 @@ python -m ipykernel install --user --name=MLvenv

### Google Colab

Running on Colab is useful as it allows you to access GPU resources.
To launch the notebooks in Google Colab click the following links for each of the exercises:
To run the notebooks in Google Colab click the following links for each of the exercises:

* [Exercise 01](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/01_penguin_classification.ipynb) - [Worked Solution 01](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/01_penguin_classification_solutions.ipynb)
* [Exercise 02](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/02_penguin_regression.ipynb) - [Worked Solution 02](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/02_penguin_regression_solutions.ipynb)
* [Exercise 03](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/03_mnist_classification.ipynb) - [Worked Solution 03](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/03_mnist_classification_solutions.ipynb)
* [Exercise 04](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/04_ellipse_regression.ipynb) - [Worked Solution 04](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/worked-solutions/04_ellipse_regression_solutions.ipynb)
* [Exercise 01](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/01_penguin_classification.ipynb)
* [Exercise 02](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/02_penguin_regression.ipynb)
* [Exercise 03](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/03_mnist_classification.ipynb)
* [Exercise 04](https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/04_ellipse_regression.ipynb)

_Notes:_
* _Running in Google Colab requires you to have a Google account._
* _If you leave a Colab session your work will be lost, so be careful to save any work
you want to keep._

### binder

If you cannot operate using a local install, and do not wish to sign up for a Google account,
the repository can be launched
[on binder](https://mybinder.org/v2/gh/Cambridge-ICCS/ml-training-material/main).

_Notes:_
* _If you leave a binder session your work will be lost, so be careful to save any work
you want to keep._
* _Due to the limited resources provided by binder you will struggle to run training in
exercises 3 and 4._


## License

@@ -253,24 +244,3 @@ The teaching materials are licensed under a
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

## Contribution Guidelines and Support

If you spot an issue with the materials please let us know by
[opening an issue](https://github.com/Cambridge-ICCS/ml-training-material/issues/new/choose)
here on GitHub clearly describing the problem.

If you are able to fix an issue that you spot, or an
[existing open issue](https://github.com/Cambridge-ICCS/ml-training-material/issues)
please get in touch by commenting on the issue thread.

Contributions from the community are welcome.
To contribute back to the repository please first
[fork it](https://github.com/Cambridge-ICCS/ml-training-material/fork),
make the necessary changes to fix the problem, and then open a pull request back to
this repository clearly describing the changes you have made.
We will then perform a review and merge once ready.

If you would like support using these materials, adapting them to your needs, or
delivering them please get in touch either via GitHub or via
[ICCS](https://github.com/Cambridge-ICCS).
12 changes: 3 additions & 9 deletions exercises/01_penguin_classification.ipynb
@@ -105,9 +105,7 @@
" train=True,\n",
")\n",
"\n",
"\n",
"for features, target in data_set:\n",
" # print the features and targets here\n",
" pass"
]
},
@@ -126,7 +124,7 @@
"source": [
"### Task 4: Applying transforms to the data\n",
"\n",
"A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The [``Compose``](https://pytorch.org/vision/stable/generated/torchvision.transforms.Compose.html) object takes a list of callable objects (i.e., functions) and applies them to the incoming data.\n",
"A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The ``Compose`` object takes a list of callable objects and applies them to the incoming data.\n",
"\n",
"These transforms can be very useful for mapping between file paths and tensors of images, etc.\n",
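Conceptually, ``Compose`` just chains callables, feeding each transform's output to the next. A minimal pure-Python stand-in (an illustrative sketch, not the actual torchvision class) behaves like:

```python
class Compose:
    """Minimal stand-in for torchvision.transforms.Compose: chain callables."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        # Apply each transform in order, passing each output to the next.
        for transform in self.transforms:
            x = transform(x)
        return x


# Parse a string to a float, then double it.
pipeline = Compose([float, lambda v: v * 2.0])
print(pipeline("3.5"))  # → 7.0
```

The real ``Compose`` works the same way, which is why any callable — a function, a lambda, or a transform object — can appear in the list.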
"\n",
@@ -143,12 +141,8 @@
"outputs": [],
"source": [
"from torchvision.transforms import Compose\n",
"# import some useful functions here, see https://pytorch.org/docs/stable/torch.html\n",
"# where `tensor` and `eye` are used for constructing tensors,\n",
"# and using a lower-precision float32 is advised for performance\n",
"from torch import tensor, eye, float32 \n",
"\n",
"# Apply the transforms we need to the PenguinDataset to get out input\n",
"# Apply the transforms we need to the PenguinDataset to get our inputs and\n",
"# targets as Tensors."
]
},
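One reason ``tensor``, ``eye``, and ``float32`` are imported in the cell above: row ``i`` of an identity matrix is the one-hot encoding of class ``i``, a handy way to turn integer class labels into target vectors. A torch-free sketch of the idea (``one_hot`` is an illustrative helper, not part of the exercise code):

```python
def one_hot(index: int, num_classes: int) -> list[float]:
    """Return row `index` of a `num_classes` x `num_classes` identity matrix."""
    return [1.0 if j == index else 0.0 for j in range(num_classes)]


# Three classes -> three-element one-hot target vectors.
print(one_hot(1, 3))  # → [0.0, 1.0, 0.0]
```

With torch, the equivalent is indexing ``eye(num_classes, dtype=float32)`` with the class index.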
@@ -160,7 +154,7 @@
"\n",
"- Once we have created a ``Dataset`` object, we wrap it in a ``DataLoader``.\n",
" - The ``DataLoader`` object allows us to put our inputs and targets in mini-batches, which makes for more efficient training.\n",
" - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once (typically a small power of 2, like 16 or 32).\n",
" - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once.\n",
" - The number of items we supply at once is called the batch size.\n",
" - The ``DataLoader`` can also randomly shuffle the data each epoch (when training).\n",
" - It allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.\n",
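The mini-batching idea above can be sketched without ``DataLoader`` — ``minibatches`` below is a hypothetical helper that slices a dataset into consecutive batches:

```python
def minibatches(data, batch_size):
    """Yield consecutive mini-batches; the final batch may be smaller."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


# Ten (input, target) pairs with a batch size of 4 -> batches of 4, 4, and 2.
pairs = list(zip(range(10), range(10, 20)))
sizes = [len(batch) for batch in minibatches(pairs, 4)]
print(sizes)  # → [4, 4, 2]
```

The real ``DataLoader`` adds shuffling, parallel loading, and collation of samples into stacked tensors on top of this basic slicing.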
6 changes: 0 additions & 6 deletions setup.py

This file was deleted.

25 changes: 2 additions & 23 deletions slides/index.html
@@ -86,12 +86,10 @@ <h2 id="contents">Contents</h2>
<ul>
<li><a href="#github">GitHub</a></li>
<li><a href="#colab">Colab</a></li>
<li><a href="#binder">binder</a></li>
<li><a href="#solutions">Solutions</a></li>
</ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#license">License</a></li>
<li><a href="#contribution-guidelines-and-support">Contribution Guidelines and Support</a></li>
</ul>
</div>

@@ -137,7 +135,6 @@ <h2 id="setup">Setup Instructions</h2>
<ul>
<li><a href="#github">a local install via Github</a></li>
<li><a href="#colab">online via Google Colab</a></li>
<li><a href="#binder">online via binder</a></li>
</ul>

<p>We recommend the local install approach, especially if you forked the repository, as it is the easiest way to keep a copy of your work and push back to GitHub.</p>
@@ -186,8 +183,7 @@ <h6>Optional) Keep virtual environment persistent in Jupyter Notebooks</h6>
<code>python -m ipykernel install --user --name=MLvenv</code></p>

<h4 id="colab">Google Colab</h4>
<p>Running on Colab is useful as it allows you to access GPU resources.<br>
To launch the notebooks in Google Colab click the following links for each of the exercises:</p>
<p>To run the notebooks in Google Colab click the following links for each of the exercises:</p>
<ul>
<li><a href="https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/01_penguin_classification.ipynb">Exercise 01</a></li>
<li><a href="https://colab.research.google.com/github/Cambridge-ICCS/ml-training-material/blob/colab/exercises/02_penguin_regression.ipynb">Exercise 02</a></li>
@@ -200,17 +196,6 @@ <h4 id="colab">Google Colab</h4>
<indent>If you leave a Colab session your work will be lost, so be careful to save any work you want to keep.</indent>
</i></p>

<h4 id="binder">binder</h4>
<p>To run the notebooks in binder click the following link:</p>
<ul>
<li><a href="https://mybinder.org/v2/gh/Cambridge-ICCS/ml-training-material/main">Launch repository in binder</a></li>
</ul>

<p><i>Notes:<br>
<indent>If you leave a binder session your work will be lost, so be careful to save any work you want to keep.</indent>
<indent>Due to the limited resources provided by binder you will struggle to run training in exercises 3 and 4.</indent>
</i></p>

<h4 id="solutions">Solutions</h4>

<p>Worked solutions for all of the exercises can be found in the <inlinecode>worked-solutions/</inlinecode> directory.
@@ -261,7 +246,7 @@ <h4>Python</h4>
</ul>

<h4>git and GitHub</h4>
<p>Unless participating via <a href="#colab">Colab</a> or <a href="#binder">binder</a> you will be expected to know how to:</p>
<p>Unless participating via <a href="#colab">Colab</a> you will be expected to know how to:</p>
<ul>
<li>clone and/or fork a repository,</li>
<li>commit, and</li>
@@ -291,12 +276,6 @@ <h2 id="license">License</h2>
<p>The teaching materials are licensed under <a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC BY-NC-SA 4.0</a>.<br>
<img src="https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png"></img></p>

<h2 id="contribution-guidelines-and-support">Contribution Guidelines and Support</h2>
<p>If you spot an issue with the materials please let us know by <a href="https://github.com/Cambridge-ICCS/ml-training-material/issues/new/choose" target=blank>opening an issue</a> on GitHub clearly describing the problem.</p>
<p>If you are able to fix an issue that you spot, or an <a href="https://github.com/Cambridge-ICCS/ml-training-material/issues/new/choose" target=blank>existing open issue</a> please get in touch by commenting on the issue thread.</p>
<p>Contributions from the community are welcome. To contribute back to the repository please first <a href="https://github.com/Cambridge-ICCS/ml-training-material/fork" target=blank>fork it</a>, make the neccessary changes to fix the problem, and then open a pull request back to this repository clerly describing the changes you have made. We will then preform a review and merge once ready.</p>
<p>If you would like support using these materials, adapting them to your needs, or delivering them please get in touch either via GitHub or via <a href="https://github.com/Cambridge-ICCS" target=blank>ICCS</a>.</p>

</div>
</div>

8 changes: 3 additions & 5 deletions src/ml_workshop/_penguins.py
@@ -1,5 +1,4 @@
"""Penguins dataset."""

from typing import Optional, List, Dict, Tuple, Any

from torch.utils.data import Dataset
@@ -18,9 +17,9 @@ class PenguinDataset(Dataset):

Parameters
----------
input_keys : List[str]
input_keys : Sequence[str]
The column titles to use in the input feature vectors.
target_keys : List[str]
target_keys : Sequence[str]
The column titles to use in the target feature vectors.
train : bool
If ``True``, this object will serve as the training set, and if
@@ -40,7 +39,7 @@ class PenguinDataset(Dataset):
def __init__(
self,
input_keys: List[str],
target_keys: List[str],
target_keys: str,
train: bool,
x_tfms: Optional[Compose] = None,
y_tfms: Optional[Compose] = None,
@@ -110,7 +109,6 @@ def _load_penguin_data() -> DataFrame:
.sort_values(by=sorted(data.keys()))
.reset_index(drop=True)
)
# Transform the sex field into a float, with male represented by 1.0, female by 0.0
data.sex = (data.sex == "male").astype(float)
return data
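The ``data.sex`` line above encodes the categorical column numerically: the comparison yields booleans, which cast to ``1.0`` for ``"male"`` and ``0.0`` otherwise. In plain Python (without pandas) the same mapping is:

```python
sexes = ["male", "female", "female", "male"]

# (s == "male") is a bool; float() maps True -> 1.0 and False -> 0.0.
encoded = [float(s == "male") for s in sexes]
print(encoded)  # → [1.0, 0.0, 0.0, 1.0]
```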

11 changes: 4 additions & 7 deletions worked-solutions/01_penguin_classification_solutions.ipynb
@@ -214,7 +214,7 @@
"source": [
"### Task 4: Applying transforms to the data\n",
"\n",
"A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The [``Compose``](https://pytorch.org/vision/stable/generated/torchvision.transforms.Compose.html) object takes a list of callable objects and applies them to the incoming data.\n",
"A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The ``Compose`` object takes a list of callable objects and applies them to the incoming data.\n",
"\n",
"These transforms can be very useful for mapping between file paths and tensors of images, etc.\n",
"\n",
@@ -242,14 +242,11 @@
}
],
"source": [
"from torchvision.transforms import Compose\n",
"# import some useful functions here, see https://pytorch.org/docs/stable/torch.html\n",
"# where `tensor` and `eye` are used for constructing tensors,\n",
"# and using a lower-precision float32 is advised for performance\n",
"from torch import tensor, float32, eye\n",
"from torchvision.transforms import Compose\n",
"\n",
"\n",
"# Apply the transforms we need to the PenguinDataset to get out input\n",
"# Apply the transforms we need to the PenguinDataset to get our inputs and\n",
"# targets as Tensors.\n",
"\n",
"\n",
@@ -324,7 +321,7 @@
"\n",
"- Once we have created a ``Dataset`` object, we wrap it in a ``DataLoader``.\n",
" - The ``DataLoader`` object allows us to put our inputs and targets in mini-batches, which makes for more efficient training.\n",
" - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once (typically a small power of 2, like 16 or 32).\n",
" - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once.\n",
" - The number of items we supply at once is called the batch size.\n",
" - The ``DataLoader`` can also randomly shuffle the data each epoch (when training).\n",
" - It allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.\n",