In [1]:
# Install the environment
%pip install git+https://github.com/AwePhD/NotebooksLabsessionImage.git

Collecting git+https://github.com/AwePhD/NotebooksLabsessionImage.git
  Cloning https://github.com/AwePhD/NotebooksLabsessionImage.git to /tmp/pip-req-build-3vdf1de3
  Running command git clone -q https://github.com/AwePhD/NotebooksLabsessionImage.git /tmp/pip-req-build-3vdf1de3
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: NLI
  Building wheel for NLI (PEP 517) ... done
  Created wheel for NLI: filename=NLI-1.0.0-py3-none-any.whl size=2406 sha256=1560bca79741ab76575f29d7909879ca0bdb2ee81d8ce93d940f944b7820964b
  Stored in directory: /tmp/pip-ephem-wheel-cache-c4y459bw/wheels/17/4a/a4/4f920391e876c3c2632ecc7851748e1c11539349fe2eefd2c4
Successfully built NLI
Installing collected packages: NLI
Successfully installed NLI-1.0.0


In [2]:
# Download the dataset
# Also available at https://www.kaggle.com/vishalsubbiah/pokemon-images-and-types
# Caution: the next line deletes everything in the current working directory
!rm -rf ./*
!curl -LO https://github.com/AwePhD/NotebooksLabsessionImage/raw/main/pokemon_dataset.zip
!unzip -qq pokemon_dataset.zip
!rm pokemon_dataset.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   156  100   156    0     0    445      0 --:--:-- --:--:-- --:--:--   444
100 2484k  100 2484k    0     0  4155k      0 --:--:-- --:--:-- --:--:-- 4155k


In [1]:
# Standard imports
from pathlib import Path
from pprint import pprint
from typing import List, Dict

# Third party imports
import numpy as np
import matplotlib.pyplot as plt
from skimage import io
from skimage import data

## Using NumPy `ndarray` for image processing

The NumPy array, `ndarray`, is the central data structure of the NumPy package and one of the most widely used in the scientific Python ecosystem. It offers *many features* that are useful in data processing.

A NumPy array stores values that all share a single, fixed `dtype`. More on data types in NumPy [here](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html).

In [2]:
np.array([3, 2], dtype=np.float64)

array([3., 2.])

In [3]:
np.array([3, 2], dtype=np.int16)

array([3, 2], dtype=int16)

In [4]:
np.array([3, 2], dtype=np.int16)  == np.array([3, 2], dtype=np.float64)

array([ True,  True])

### Getting information about `ndarray` objects

There are several pieces of information you can get from an `ndarray`; a more detailed walkthrough is available [here](https://jakevdp.github.io/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html).

We will illustrate some features of the `ndarray` with the object `data`. It has 3 dimensions, like an RGB image: following the usual NumPy image convention, the first two dimensions can be interpreted as height (rows) and width (columns), and the third one holds the RGB values of each pixel.

In [5]:
data = np.random.random((500, 250, 3))

In [6]:
print(
    f"data dimension: {data.ndim}\n"
    f"data shape: {data.shape}\n"
    f"data size: {data.size}\n"
    f"data dtype: {data.dtype}\n"
)

data dimension: 3
data shape: (500, 250, 3)
data size: 375000
data dtype: float64
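
As a complement to the attributes printed above, here is a minimal sketch of how the shape relates to indexing. The array `image` and the variable names are hypothetical stand-ins, not part of the lab material, and the rows-then-columns reading assumes the usual NumPy image layout.

```python
import numpy as np

# Hypothetical stand-in for an RGB image: 500 rows, 250 columns, 3 channels.
image = np.zeros((500, 250, 3), dtype=np.float64)

height, width, channels = image.shape
red_channel = image[:, :, 0]   # all rows, all columns, first channel only
top_left_pixel = image[0, 0]   # the 3 channel values of a single pixel

print(height, width, channels)                   # 500 250 3
print(red_channel.shape)                         # (500, 250)
print(top_left_pixel.shape)                      # (3,)
print(image.size == height * width * channels)   # True
```

Indexing with fewer indices than dimensions returns a sub-array, which is why `image[0, 0]` is a vector of length 3 rather than a single number.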



As we will see later in this notebook and during the lab session, it is extremely important to keep in mind both the shape of an `ndarray` and the data type it contains.
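
To give an idea of why the data type matters, here is a small sketch under the assumption that pixel values are stored as 8-bit unsigned integers, as is common for image files. The arrays `pixels` and `offset` are made up for the illustration.

```python
import numpy as np

# Hypothetical 8-bit pixel values (valid range: 0 to 255).
pixels = np.array([250, 100], dtype=np.uint8)
offset = np.array([10, 10], dtype=np.uint8)

# The result stays uint8, so 250 + 10 = 260 silently wraps around to 4.
wrapped = pixels + offset

# Converting to float64 first avoids the wraparound.
widened = pixels.astype(np.float64) + offset

print(wrapped, wrapped.dtype)   # [  4 110] uint8
print(widened, widened.dtype)   # [260. 110.] float64
```

The same operation therefore gives different results depending on the `dtype`, which is why it is worth checking before doing arithmetic on image data.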

### "Smart arithmetic" operations on `ndarray`

The _smart_ arithmetic is known as **broadcasting**. The Python Data Science Handbook has [a section](https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html) about this feature. In short, it lets us combine arrays of different shapes in arithmetic expressions without writing explicit loops.

Let's add one to our `data` variable. Thanks to broadcasting, NumPy handles the addition of an `int` and an `ndarray`: the scalar is added to every element.

In [7]:
data_plus_one: np.ndarray = data + 1
print(
    f"data[25,30,0]: {data[25,30,0]:.2f}\n"
    f"data_plus_one[25,30,0]: {data_plus_one[25,30,0]:.2f}\n"
)


data[25,30,0]: 0.66
data_plus_one[25,30,0]: 1.66



We can go further and add a 1D vector: broadcasting aligns it with the last dimension of `data`, so element `i` of the vector is added to channel `i` of every pixel. This can be useful in linear algebra computations, for example. The example is not useful in itself; it only demonstrates the power of broadcasting.

Broadcasting also lets NumPy vectorize the operation, so it is much faster than an explicit Python `for` loop.

In [8]:
data_plus_vector: np.ndarray = data + np.arange(3)
print(
    f"np.arange(3): {np.arange(3)}\n"
    f"{'':-^35}\n"
    f"data[25,30,0]: {data[25,30,0]:.2f}\n"
    f"data_plus_vector[25,30,0]: {data_plus_vector[25,30,0]:.2f}\n"
    f"{'':-^35}\n"
    f"data[24,0,0]: {data[24,0,0]:.2f}\n"
    f"data_plus_vector[24,0,0]: {data_plus_vector[24,0,0]:.2f}\n"
    f"{'':-^35}\n"
    f"data[25,30,1]: {data[25,30,1]:.2f}\n"
    f"data_plus_vector[25,30,1]: {data_plus_vector[25,30,1]:.2f}\n"
    f"{'':-^35}\n"
    f"data[25,30,2]: {data[25,30,2]:.2f}\n"
    f"data_plus_vector[25,30,2]: {data_plus_vector[25,30,2]:.2f}\n"
)

np.arange(3): [0 1 2]
-----------------------------------
data[25,30,0]: 0.66
data_plus_vector[25,30,0]: 0.66
-----------------------------------
data[24,0,0]: 0.93
data_plus_vector[24,0,0]: 0.93
-----------------------------------
data[25,30,1]: 0.28
data_plus_vector[25,30,1]: 1.28
-----------------------------------
data[25,30,2]: 0.70
data_plus_vector[25,30,2]: 2.70
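
To make the broadcasting rule more concrete, the sketch below checks that adding a vector of shape `(3,)` to an array of shape `(4, 5, 3)` gives the same result as an explicit loop over every pixel. The small array `small`, the seed, and the variable names are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator; the seed is arbitrary
small = rng.random((4, 5, 3))    # hypothetical tiny "image"
vector = np.arange(3)            # shape (3,), matches the last axis of `small`

# Broadcasting: the vector is virtually repeated along the first two axes.
broadcast_sum = small + vector

# Equivalent explicit loop over every pixel, written out for comparison.
loop_sum = np.empty_like(small)
for row in range(small.shape[0]):
    for col in range(small.shape[1]):
        loop_sum[row, col] = small[row, col] + vector

print(broadcast_sum.shape)                   # (4, 5, 3)
print(np.allclose(broadcast_sum, loop_sum))  # True
```

Both versions compute the same values; the broadcast version simply lets NumPy run the loop in compiled code instead of in Python.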



### Computing the mean

We can compute the mean easily. This brief section illustrates how to do it. For details on computing other statistics, see the corresponding [section](https://jakevdp.github.io/PythonDataScienceHandbook/02.04-computation-on-arrays-aggregates.html) of the Python Data Science Handbook.

Let's imagine we record the temperature every hour for one day in 50 cities in France, so each city has 24 values describing its temperature over the day. As dummy values for illustration, I draw random data from a uniform distribution.

In [9]:
temperature_in_cities: np.ndarray = np.random.uniform(low=5, high=12, size=(50,24))

First, we would like the mean temperature over all hours and all cities in our data.

In [10]:
temperature_in_cities.mean()

8.460650036106923

Next, we can compute the mean temperature for each city by averaging over the 24 hours of the day, that is, along axis 1.

In [11]:
mean_temp_in_cities_over_1_day: np.ndarray = temperature_in_cities.mean(1)
print(
    f"shape: {mean_temp_in_cities_over_1_day.shape}\n"
    f"mean temperature for city 0: {mean_temp_in_cities_over_1_day[0]:.2f}°C"
)

shape: (50,)
mean temperature for city 0: 8.83°C


Conversely, we would like the mean temperature across all cities for each hour of the day, that is, along axis 0.

In [12]:
mean_temp_in_hours_over_all_cities: np.ndarray = temperature_in_cities.mean(0)
print(
    f"shape: {mean_temp_in_hours_over_all_cities.shape}\n"
    f"Mean temperature at 8AM: {mean_temp_in_hours_over_all_cities[7]:.2f}°C"
)

shape: (24,)
Mean temperature at 8AM: 8.03°C
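
Means along an axis combine naturally with broadcasting. As a hedged sketch, the example below subtracts each city's daily mean from its hourly readings, using the `keepdims=True` option of `mean` so that the shapes line up. The array `temps` and the names `city_means` and `anomalies` are hypothetical and only mirror the dummy data above.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator; the seed is arbitrary
temps = rng.uniform(low=5, high=12, size=(50, 24))  # 50 cities, 24 hourly values

# keepdims=True keeps the averaged axis with length 1: shape (50, 1).
city_means = temps.mean(axis=1, keepdims=True)

# (50, 24) - (50, 1) broadcasts: each row is shifted by its own mean.
anomalies = temps - city_means

print(city_means.shape)                          # (50, 1)
print(anomalies.shape)                           # (50, 24)
print(np.allclose(anomalies.mean(axis=1), 0.0))  # True
```

Without `keepdims=True` the means would have shape `(50,)`, which does not broadcast against `(50, 24)`, so the subtraction would raise an error.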
