# Python Image Manipulation with Thousands of Files
## Or, How to Get Computers to Do Tedious Things for You


### Before you start
Make sure you've run through the *setup* before working through this notebook. In summary, you'll need:

* A local Python install (the system version is fine for macOS and most Linux distros)
* Git
* A GitHub account
* Git configured with your username and email address

## Intro

Sometimes, you find yourself performing repetitive tasks on a computer. Since computers excel at handling repetitive tasks, we'll explore an example that involves manipulating tens of thousands of images. During this session, we'll cover:

* Downloading and working with large datasets
* Using Python virtual environments
* Iteratively writing code
* Version controlling our code as we progress
* Optimising code to run in parallel for increased speed

We'll conduct most of this using Python in a Jupyter notebook. If you've completed the setup, you should be able to follow along easily. However, feel free to attend and ask questions even if you're not planning on actively participating.

## Set up the initial Python Virtual Environment

If you work on multiple Python projects, you'll likely have encountered Python virtual environments (venvs) and/or Conda. You can think of these as ways to have a separate Python "install" or, more accurately, a separate environment for each project. There are numerous reasons to use a venv or Conda, such as keeping different versions of a library separate. However, my favorite reason is reproducibility.

If you move to another computer or a High-Performance Computing (HPC) system, or have a collaborator working with you, you'll need to know which versions of which libraries to install to make your project work. Venvs help solve this problem with requirements files. Conda can also save its environment. For this example, we'll use venvs and pip as they are lightweight and require less to install. That being said, Conda is a great option too.

Let's start by creating a venv. Assuming you have a fairly recent version of Python 3 installed, everything you need should already be included.

```bash
python3 -m venv env  # create a venv called env

ls  # see the env folder
```
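The same step can also be scripted from Python using the standard-library `venv` module, which is the module that `python3 -m venv` runs. This is a minimal sketch rather than part of the original session, and the helper name `create_env` is hypothetical.

```python
import venv
from pathlib import Path


def create_env(env_dir="env", with_pip=True):
    """Create a virtual environment, roughly like `python3 -m venv env`.

    `create_env` is a hypothetical helper name used only for illustration.
    """
    env_path = Path(env_dir)
    # with_pip=True asks venv to bootstrap pip into the new environment,
    # which matches the default behaviour of the command-line tool.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling `create_env("env")` from a Python prompt should leave an `env` folder containing a `pyvenv.cfg` file, much like the command above.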

To actually use the venv, you'll need to know some particular commands. These aren't too complicated and eventually, you'll remember them. There are helper scripts available to simplify this process, and Conda is also a bit simpler. We'll do it in a generic, lightweight way:

```bash
source env/bin/activate
```

Notice the leading `(env)` on your prompt, which shows that the venv is active.

![Activated venv](docs/images/activated_env.png "Activated Virtual Environment")
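If the prompt is ambiguous, you can also ask Python itself whether it is running inside a venv. The sketch below relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtualenv` is a hypothetical helper, not something defined elsewhere in this session.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python install it was created from.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```

Run this with `python3` after activating the venv, and the reported prefix should end in your `env` folder.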

### Installing and starting Jupyter

Now we can install Jupyter which we'll be using for most of the session.

In your terminal:
```bash
pip install jupyter
```

If that doesn't work, you might need to specify `pip3` instead.

Now we can start Jupyter. Depending on your operating system, you might need to deactivate and reactivate your virtual environment first:

```bash
deactivate

# then
source env/bin/activate  # just as before
```

To start Jupyter:

```bash
jupyter-lab
```

This should open a tab in your browser with Jupyter. If it doesn't, you might need to copy and paste the URL from the terminal. It will look something like this:

![jupyter start text](docs/images/Jupyter-start.png "jupyter start text")

Next, install Pillow:

```bash
pip install pillow
```
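To confirm the install worked, you can check from a notebook cell whether Pillow is importable. Note that the package is installed as `pillow` but imported as `PIL`. This is a small sketch using the standard-library `importlib.util.find_spec`; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    # find_spec returns None when no module with that name is available.
    return importlib.util.find_spec(module_name) is not None


# Pillow is installed with `pip install pillow` but imported as `PIL`.
if is_installed("PIL"):
    print("Pillow is available in this environment")
else:
    print("Pillow was not found; check that the venv is active and rerun pip")
```

If this reports that Pillow was not found, the notebook is probably running with a different Python than the one in your venv.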

Now, let's generate a Python requirements file:

```bash
pip freeze > requirements.txt
```
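The requirements file is a plain list of `name==version` lines, one per installed package. To get a feel for what `pip freeze` collects, here is a rough approximation using the standard-library `importlib.metadata`. It is a sketch only: the function name `installed_requirements` is hypothetical, and the real `pip freeze` does more (for example, it handles editable installs and skips some packaging tools by default), so the two outputs may differ.

```python
from importlib.metadata import distributions


def installed_requirements():
    """Return sorted 'name==version' strings for packages visible to this interpreter."""
    pins = set()
    for dist in distributions():
        meta = dist.metadata
        # Skip any distribution whose metadata is missing or has no name.
        name = meta.get("Name") if meta is not None else None
        if name:
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the listing is easy to scan.
    return sorted(pins, key=str.lower)


for line in installed_requirements():
    print(line)
```

Running this inside the activated venv should list Pillow alongside Jupyter and its dependencies.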

Finally, add the requirements file to Git:

```bash
git add requirements.txt 
git commit -m "Added requirements file with pillow"
```