In [None]:
# Install the environment
%pip install git+https://github.com/AwePhD/NotebooksLabsessionImage.git

In [None]:
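# Download the dataset
# Can be found at https://www.kaggle.com/vishalsubbiah/pokemon-images-and-types
# Warning: the next line deletes all (non-hidden) files and folders in the current working directory
!rm -rf ./*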
!curl -LO https://github.com/AwePhD/NotebooksLabsessionImage/raw/main/pokemon_dataset.zip
!unzip -qq pokemon_dataset.zip
!rm pokemon_dataset.zip

In [None]:
# Standard imports
from pathlib import Path
from pprint import pprint
from typing import List, Dict

# Third party imports
import numpy as np
import matplotlib.pyplot as plt
from skimage import io
from skimage import data
from skimage import img_as_float, img_as_ubyte

# Local imports
from NLI.utils import (
    iprint,
    print_img_info
)

## Introduction to Pythonic comprehensions

Writing idiomatic Python means writing *pythonic* code: a set of practices shared within the community that make your code standard, readable, professional and efficient. **Comprehensions** are one of the most widespread *pythonic* concepts.

### List comprehension

A list comprehension is meant **only** to build a list from another iterable object. Moreover, a list comprehension is *usually* a bit faster than the equivalent loop ([SO reply](https://stackoverflow.com/a/22108640)) and reads more naturally than a loop, once we are used to it!
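A minimal sketch to check on your own machine that both styles build the same list, and to get an indicative timing. `squares_loop` and `squares_comprehension` are hypothetical helpers written only for this illustration; the measured times depend on the machine and the Python version, so no result is assumed here.

```python
import timeit
from typing import List


def squares_loop(n: int) -> List[int]:
    # Classic loop: start from an empty list and append one element at a time
    result: List[int] = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n: int) -> List[int]:
    # Same list, built by a single list comprehension
    return [i * i for i in range(n)]


# Both functions build exactly the same list
assert squares_loop(1_000) == squares_comprehension(1_000)

# Indicative timings only: run each function 1,000 times
loop_time = timeit.timeit(lambda: squares_loop(1_000), number=1_000)
comp_time = timeit.timeit(lambda: squares_comprehension(1_000), number=1_000)
print(f"loop: {loop_time:.3f}s, comprehension: {comp_time:.3f}s")
```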

To illustrate it, we take the list of names `["Alice", "adam", "Bob", "Bertrand", "céline"]`. The goal is to retrieve every name beginning with an $\text{A}$. Note that the check below is case-sensitive, so `"adam"` does not match.

In [None]:
names_list: List[str] = ["Alice", "adam", "Bob", "Bertrand", "céline"]
print(f"names_list: {names_list}\n")

First, the classic way: loop over `names_list` and build a new list with the names beginning with an $\text{A}$.

In [None]:
A_names_list: List[str] = []
for name in names_list:
  if name[0] == 'A':
    A_names_list.append(name)
print(f"A_names_list: {A_names_list}\n")

Now, we write the same logic as a list comprehension, in a one-liner*. The **main** idea of a list comprehension is to generate the final list _right away_: here, the list of the names beginning with an $\text{A}$.

We take each name of the original list with `for name in names_list`, then we add a guard, `if name[0] == 'A'`, to filter which names we keep.

*: we spread the expression over several lines for readability. It could be written on a single line by removing the line breaks.

In [None]:
A_name_list_c: List[str] = [
  name
  for name in names_list
  if name[0] == 'A'
]

print(f"A_name_list: {A_name_list_c}\n")
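Since the guard above compares with an uppercase `'A'`, `"adam"` is left out. A small sketch of a case-insensitive variant, assuming we want to match both `a` and `A`; using `str.startswith` instead of `name[0]` also avoids an `IndexError` on empty strings.

```python
from typing import List

names_list: List[str] = ["Alice", "adam", "Bob", "Bertrand", "céline"]

# Case-sensitive guard: "adam" starts with a lowercase "a", so it is skipped
a_names_strict: List[str] = [
    name
    for name in names_list
    if name.startswith('A')
]

# Case-insensitive guard: lowercase the name before checking its first letter
a_names_any_case: List[str] = [
    name
    for name in names_list
    if name.lower().startswith('a')
]

print(a_names_strict)    # ['Alice']
print(a_names_any_case)  # ['Alice', 'adam']
```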

We do not have to use an `if` guard in a list comprehension. Suppose we now want the same list of names in title case, namely with the first letter of each name capitalized.

In [None]:
titled_names_list: List[str] = [
  name.title()
  for name in names_list
]

print(f"titled_names_list: {titled_names_list}\n")
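A transformation and a guard can be combined in the same comprehension. A sketch of two variants: a trailing `if` *filters* elements, while an `if ... else` placed in the expression part keeps every element and only chooses what to produce for each one. The variable names are illustrative.

```python
from typing import List

names_list: List[str] = ["Alice", "adam", "Bob", "Bertrand", "céline"]

# Transformation + filter: title-case only the names starting with "a" or "A"
a_names_titled: List[str] = [
    name.title()
    for name in names_list
    if name.lower().startswith('a')
]

# Conditional expression in the expression part: no element is dropped,
# only the "a" names are title-cased
partially_titled: List[str] = [
    name.title() if name.lower().startswith('a') else name
    for name in names_list
]

print(a_names_titled)    # ['Alice', 'Adam']
print(partially_titled)  # ['Alice', 'Adam', 'Bob', 'Bertrand', 'céline']
```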

Note: the mean vectors computed in the image section below are crude and not very representative of the images. Many of the pixels taken into account belong to a white and/or transparent background. A better approach would be to compute the mean over a masked version of each image, keeping only the foreground pixels.

### Dictionary comprehension

Another common data structure in Python is the dictionary. It is a mapping from keys, which must be _simple_ (hashable) elements, to values of any type. For example, a dictionary could have the name of each picture as key and the mean vector of that picture as value.

__Tip__: a list can be seen as a special dictionary whose keys are the integer indexes `0, 1, 2, ...`. With a dictionary, keys do not have to be integer indexes: the keys themselves play the role of indexes.
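To make this analogy concrete, a short sketch: `enumerate` yields `(index, element)` pairs, so a dictionary comprehension over it rebuilds a list as a dictionary keyed by index. Swapping keys and values then shows that keys can be any hashable value, here strings.

```python
from typing import Dict, List

names_list: List[str] = ["Alice", "adam", "Bob"]

# Keys are the list indexes: names_by_index[i] behaves like names_list[i]
names_by_index: Dict[int, str] = {index: name for index, name in enumerate(names_list)}
print(names_by_index)  # {0: 'Alice', 1: 'adam', 2: 'Bob'}

# With a dictionary, the keys are not limited to 0, 1, 2, ...
index_by_name: Dict[str, int] = {name: index for index, name in enumerate(names_list)}
print(index_by_name["Bob"])  # 2
```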

#### Hashable?

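To elaborate a bit on hashing: an object is hashable when Python can compute an integer from it, its hash, which **never changes** during the object's lifetime (and objects that compare equal have the same hash). Immutable built-in types such as `int`, `str` or `tuple` are hashable, while mutable containers such as `list` or `dict` are not, as the next two cells show.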

In [None]:
hash(tuple())

In [None]:
# Fails: lists are mutable, so Python raises TypeError: unhashable type: 'list'
hash(list())

💀 **Warning**: a tuple is not always hashable. If it contains a mutable (unhashable) object, such as a list, hashing the tuple fails.

In [None]:
tuple_unhashable = (5, list())
# Fails with TypeError: unhashable type: 'list', because of the list inside the tuple
hash(tuple_unhashable)
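The three cells above can be wrapped into a small helper. `is_hashable` is a hypothetical function written for this notebook: it simply tries `hash()` and catches the `TypeError`. The last line shows why trying `hash()` is more reliable than checking against `collections.abc.Hashable`, which only looks at whether the type defines `__hash__`.

```python
from collections.abc import Hashable


def is_hashable(obj) -> bool:
    """Hypothetical helper: return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(tuple()))      # True
print(is_hashable(list()))       # False
print(is_hashable((5, list())))  # False

# Pitfall: tuple defines __hash__, so the ABC check says True
# even though hash() fails on this particular tuple
print(isinstance((5, list()), Hashable))  # True
```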


To sum up, any hashable object can be used as a key, and any object at all can be used as a value in a dictionary.

In [None]:
sample_dict = {
    (5, 2): "value 1",
    "key 2": list(),
}

# We can access the elements with this syntax
sample_dict["key 2"]

#### Simple dictionary example

Let's take some student records: each student has a name and an assigned grade.

In [None]:
student_records = (
    ("Alice", 19),
    ("Bertrand", 20),
    ("Céline", 15),
    ("Sarah", 19),
    ("Henry David", 10),
)

print(student_records)

Suppose we want a dictionary mapping each student's name to their grade. We can build it with a classic loop, which takes a few lines of code:

In [None]:
grade_by_student_name = dict()
for student_record in student_records:
    grade_by_student_name[student_record[0]] = student_record[1]
    
print(grade_by_student_name)

With a dictionary comprehension, we can write it as a one-liner, as we previously did with lists:

In [None]:
{
    student_record[0]: student_record[1]
    for student_record in student_records
}
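Indexing with `student_record[0]` and `student_record[1]` works, but unpacking each `(name, grade)` pair directly in the `for` clause usually reads better. A sketch on the same records; when the pairs are used unchanged, the `dict()` constructor gives the same result.

```python
from typing import Dict, Tuple

student_records: Tuple[Tuple[str, int], ...] = (
    ("Alice", 19),
    ("Bertrand", 20),
    ("Céline", 15),
    ("Sarah", 19),
    ("Henry David", 10),
)

# Unpack each (name, grade) pair in the for clause
grade_by_student_name: Dict[str, int] = {
    name: grade
    for name, grade in student_records
}

# Same mapping, built from the pairs as-is
assert grade_by_student_name == dict(student_records)
print(grade_by_student_name["Céline"])  # 15
```

The comprehension becomes the better choice as soon as the key or the value needs some processing, for example `name.upper(): grade`.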

We can also add an `if` guard to exclude some entries. Here, we keep only the students with a grade above 18:

In [None]:
{
    student_record[0]: student_record[1]
    for student_record in student_records
    if student_record[1] > 18
}
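A dictionary comprehension can also start from an existing dictionary: `.items()` yields `(key, value)` pairs. A sketch filtering `grade_by_student_name` and then swapping keys and values; note that when two students share a grade, the swapped dictionary keeps only the last name seen.

```python
from typing import Dict

grade_by_student_name: Dict[str, int] = {
    "Alice": 19, "Bertrand": 20, "Céline": 15, "Sarah": 19, "Henry David": 10,
}

# Filter an existing dict through its (key, value) pairs
top_students: Dict[str, int] = {
    name: grade
    for name, grade in grade_by_student_name.items()
    if grade > 18
}
print(top_students)  # {'Alice': 19, 'Bertrand': 20, 'Sarah': 19}

# Swap keys and values: "Sarah" overwrites "Alice" for grade 19
name_by_grade: Dict[int, str] = {grade: name for name, grade in grade_by_student_name.items()}
print(name_by_grade[19])  # Sarah
```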

## Applying comprehensions to image data

How to get data from files and paths is explained in the Path Colab notebook.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AwePhD/NotebooksLabsessionImage/blob/main/notebooks/manage_path.ipynb)

In [None]:
path_images_dir: Path = Path.cwd() / "images"

### A list to store the mean of each image

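Here, we use our shiny new list comprehension to do some image processing. `path_images_dir` is the directory containing all the images. In a single expression, we can iterate over this directory with `iterdir()` and compute the mean of each image. For a color image of shape (height, width, channels), `.mean((0,1))` averages over axis 0 (rows) and axis 1 (columns), so each image gives one mean value per channel.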
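To see what `.mean((0,1))` returns without reading any file, a sketch on a tiny synthetic 2×2 RGB array; the pixel values are made up for the illustration.

```python
import numpy as np

# Synthetic 2x2 RGB "image": shape (height, width, channels)
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],       # red, green
        [[0, 0, 255], [255, 255, 255]],   # blue, white
    ],
    dtype=np.uint8,
)

# Averaging over axes 0 (rows) and 1 (columns) leaves one value per channel
mean_rgb = image.mean((0, 1))
print(image.shape, mean_rgb.shape)  # (2, 2, 3) (3,)
print(mean_rgb)                     # [127.5 127.5 127.5]
```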

In [None]:
mean_list: List[np.ndarray] = [
  io.imread(path_file).mean((0,1))
  for path_file in path_images_dir.iterdir()
]

pprint(mean_list[:3])
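`iterdir()` yields every entry of the directory, in no guaranteed order. A sketch that keeps only image files and sorts them for reproducible results, using a generator expression passed to `sorted`. It runs on a temporary folder filled with empty placeholder files; the set of suffixes is an assumption to adapt to your data.

```python
import tempfile
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumption: the formats we want to keep

with tempfile.TemporaryDirectory() as tmp:
    images_dir = Path(tmp)
    # Empty placeholder files standing in for a real image folder
    for filename in ["b.png", "a.jpg", "notes.txt", "c.PNG"]:
        (images_dir / filename).touch()

    # Keep regular files with an image suffix (case-insensitive), sorted by path
    image_paths: List[Path] = sorted(
        path
        for path in images_dir.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
    image_names: List[str] = [path.name for path in image_paths]

print(image_names)  # ['a.jpg', 'b.png', 'c.PNG']
```

The same guard could be added to the comprehension above, for example `for path_file in sorted(path_images_dir.iterdir()) if path_file.suffix.lower() in IMAGE_SUFFIXES`.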

### A dictionary to map each file to its mean

In the same spirit, we can map the name of each file to its mean color vector. The main advantage is a clear link between the computed value and something that identifies the file, here its name.

In [None]:
mean_by_filename: Dict[str, np.ndarray] = {
  path_file.name: io.imread(path_file).mean((0,1)) 
  for path_file in path_images_dir.iterdir() 
}

for key in tuple(mean_by_filename.keys())[:3]:
  print(f"{key}: {mean_by_filename[key]}")

We can even use the `Path` object of each file as the key, since `Path` objects are hashable. This is even more convenient: we keep direct access to the file's full path if we need to load it again later.
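A quick sketch of why `Path` keys behave well: two `Path` objects built differently but describing the same path compare equal and share the same hash, so they address the same dictionary entry. The file name `pikachu.png` is hypothetical and nothing is read from disk.

```python
from pathlib import Path
from typing import Dict

path_a = Path("images") / "pikachu.png"  # hypothetical file name
path_b = Path("images/pikachu.png")

# Equal paths have equal hashes: both work as the same key
assert path_a == path_b
assert hash(path_a) == hash(path_b)

value_by_path: Dict[Path, str] = {path_a: "some value"}
print(value_by_path[path_b])  # some value

# A Path key still exposes the pieces of the path
print(path_a.name, path_a.stem, path_a.suffix)  # pikachu.png pikachu .png
```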

In [None]:
mean_by_path: Dict[Path, np.ndarray] = {
  path_file: io.imread(path_file).mean((0,1))
  for path_file in path_images_dir.iterdir()
}

for key in tuple(mean_by_path.keys())[:3]:
  print(f"{key}: {mean_by_path[key]}")
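A plain mean over all pixels is skewed by white or transparent background pixels. One possible masked mean, assuming RGBA `uint8` images whose background is either fully transparent (alpha 0) or pure white; this is an assumption about the files to check, and images without an alpha channel (e.g. JPEG) are rejected. `masked_mean_rgb` is a hypothetical helper, shown on a synthetic 2×2 image.

```python
import numpy as np


def masked_mean_rgb(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mean RGB over foreground pixels only.

    Assumes an RGBA image of shape (H, W, 4); background = transparent or pure white.
    """
    if image.ndim != 3 or image.shape[-1] != 4:
        raise ValueError("expected an RGBA image of shape (H, W, 4)")
    rgb = image[..., :3]
    alpha = image[..., 3]
    is_white = np.all(rgb == 255, axis=-1)  # boolean mask, shape (H, W)
    foreground = (alpha > 0) & ~is_white    # boolean mask, shape (H, W)
    if not foreground.any():
        # Nothing left after masking: fall back to the plain mean
        return rgb.mean((0, 1))
    # Boolean indexing keeps an (N, 3) array of foreground pixels
    return rgb[foreground].mean(axis=0)


# Synthetic RGBA image: one red pixel, one transparent pixel, two white pixels
image = np.array(
    [
        [[255, 0, 0, 255], [0, 0, 0, 0]],
        [[255, 255, 255, 255], [255, 255, 255, 255]],
    ],
    dtype=np.uint8,
)

print(image[..., :3].mean((0, 1)))  # plain mean, pulled toward the background
print(masked_mean_rgb(image))       # only the red pixel is averaged
```

In the comprehensions above, `io.imread(path_file).mean((0,1))` could be replaced by `masked_mean_rgb(io.imread(path_file))`, provided every image really has an alpha channel.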