# Examples with natural images
In this notebook I show several examples with natural images. The rows (and columns) of a natural image form a clear one-dimensional sequence, so we can shuffle the rows by changing their relative order. Treating each row as a separate 1D object, we define the shuffled image as the input dataset and apply the Sequencer to it. We can then reorder the rows according to the detected sequence and check whether the Sequencer was able to recover the original image. <br>
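As a minimal sketch of the shuffle-and-restore idea, the toy example below shuffles the rows of a small array and then undoes the shuffle. The array `image`, the seed, and the variable names are illustrative assumptions and are not part of the Sequencer code; the point is only that the argsort of a permutation is its inverse.

```python
import numpy

# toy "image": 6 rows and 4 columns, each row filled with its own row index
image = numpy.repeat(numpy.arange(6, dtype=float)[:, None], 4, axis=1)

# shuffle the rows with a random permutation (seed chosen only for repeatability)
rng = numpy.random.default_rng(0)
permutation = rng.permutation(len(image))
shuffled = image[permutation, :]

# the argsort of a permutation is its inverse, so indexing with it undoes the shuffle
inverse = numpy.argsort(permutation)
restored = shuffled[inverse, :]

print(numpy.array_equal(restored, image))
```

In the real examples the inverse permutation is of course not given to the algorithm; it is only used afterwards to judge how well the detected sequence matches the original row order.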

The images I use in this notebook are taken from the COCO dataset (link: XXX). In particular, I use images from the 2017 validation set. <br>

The notebook consists of the following parts: <br>
1. **Loading and shuffling a natural image:** this part contains the functions needed to load an image and shuffle its rows. <br>
2. **Applying the Sequencer to a set of shuffled images:** in this part I apply the Sequencer to a set of shuffled images and check whether it is able to recover the original images. <br>
3. **Application of tSNE and UMAP to a shuffled image:** in this part I try to reorder the shuffled images using tSNE and UMAP. I apply each algorithm to the shuffled image and use its one-dimensional embedding to reorder the rows and obtain a reconstructed image. Both tSNE and UMAP depend on several hyper-parameters. In the Jupyter notebook `comparison_with_tsne_and_umap.ipynb` in the examples directory, I showed that one can define an axis ratio for the embedding produced by tSNE or UMAP and use it as a figure of merit to optimize the hyper-parameters of various dimensionality reduction algorithms. Here I demonstrate this by varying the tSNE/UMAP hyper-parameters and comparing the axis ratios of the resulting embeddings.
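The reordering step described in part 3 can be sketched as follows: embed the rows into one dimension and sort them by their embedding coordinate. This is only an illustration on synthetic data; the helper name `reorder_rows_with_tsne`, the toy rows, and the chosen `perplexity` are assumptions made for this sketch, and no claim is made about how well the result matches the true order.

```python
import numpy
from sklearn.manifold import TSNE

def reorder_rows_with_tsne(rows, perplexity=5.0, random_state=0):
    """Return the row order implied by a one-dimensional tSNE embedding.

    Hypothetical helper for illustration: rows are sorted by their
    coordinate along the single embedding axis.
    """
    embedding = TSNE(n_components=1, perplexity=perplexity,
                     random_state=random_state).fit_transform(rows)
    return numpy.argsort(embedding[:, 0])

# toy dataset: 30 rows, each a Gaussian bump whose center drifts smoothly
positions = numpy.linspace(0.0, 1.0, 30)
columns = numpy.linspace(0.0, 1.0, 20)
rows = numpy.exp(-(columns[None, :] - positions[:, None]) ** 2 / 0.02)

# shuffle the rows, then reorder them using the 1D embedding
rng = numpy.random.default_rng(0)
shuffled = rows[rng.permutation(len(rows)), :]
order = reorder_rows_with_tsne(shuffled)
reconstructed = shuffled[order, :]
```

The same pattern applies to UMAP by swapping in its one-dimensional embedding; the direction of the recovered order is arbitrary, so the reconstructed image may come out flipped.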

In [1]:
# imports
%matplotlib inline

import sys
sys.path.append("../code/")
import importlib

import sequencer
importlib.reload(sequencer)

import numpy
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from scipy.stats import wasserstein_distance
import umap
from skimage.io import imread, imshow
from skimage.transform import resize

## 1. Loading and shuffling a natural image
This part of the notebook contains the functions that I need to load and shuffle images. The following cell defines a function that takes an image path as input, loads the image, down-samples it, and shuffles its rows.

In [2]:
def return_shuffled_dataset(image_path):
    """Load the image from the input path, down-sample it, and shuffle its rows.

    Returns the permutation applied to the rows, the x-axis grid, the
    down-sampled image, and the row-shuffled image.
    """
    # load the image as a grayscale array
    data = imread(image_path, as_gray=True)

    # down-sample the image to 30% of its original size along each axis
    # this is done to reduce the computation time, users can remove it
    shape_x = int(data.shape[0]*0.3)
    shape_y = int(data.shape[1]*0.3)
    new_shape = (shape_x, shape_y)
    # the offset of 1 keeps every pixel value strictly positive
    objects_list = resize(data, new_shape) + 1.

    # shuffle the rows (objects) with a random permutation
    random_indices = numpy.arange(len(objects_list))
    numpy.random.shuffle(random_indices)
    objects_list_shuffled = objects_list[random_indices, :]

    # x-axis: the column index shared by all rows
    grid = numpy.arange(len(objects_list_shuffled[0]))

    return random_indices, grid, objects_list, objects_list_shuffled

## 2. Applying the Sequencer to a set of shuffled images
In this part of the notebook I load the different natural images from the directory and shuffle their rows. I then apply the Sequencer to the shuffled images and check whether it was able to recover the original image.
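One way to quantify "recovering the original image" is sketched below. It assumes that the detected sequence is an array of indices into the shuffled dataset and that `random_indices` maps each shuffled row back to its original row, as returned by `return_shuffled_dataset`. The helper `fraction_of_recovered_neighbors` is a hypothetical name introduced here, not part of the Sequencer. Because a one-dimensional ordering has no preferred direction, the score treats a reversed image as a perfect recovery.

```python
import numpy

def fraction_of_recovered_neighbors(random_indices, sequence):
    """Fraction of adjacent rows in the detected sequence that were also
    adjacent in the original image (direction-agnostic)."""
    # original row index of each row, in the order given by the sequence
    original_positions = numpy.asarray(random_indices)[numpy.asarray(sequence)]
    # neighbors in the original image differ by exactly one row
    steps = numpy.abs(numpy.diff(original_positions))
    return float(numpy.mean(steps == 1))

# toy check with a known permutation of four rows
random_indices = numpy.array([2, 0, 3, 1])
perfect_sequence = numpy.argsort(random_indices)   # restores the original order
reversed_sequence = perfect_sequence[::-1]         # same image, flipped
unsorted_sequence = numpy.arange(4)                # leaves the rows shuffled

print(fraction_of_recovered_neighbors(random_indices, perfect_sequence))
print(fraction_of_recovered_neighbors(random_indices, reversed_sequence))
print(fraction_of_recovered_neighbors(random_indices, unsorted_sequence))
```

A visual comparison of the original and reordered images remains the most direct check; a score like this is only a compact summary that is convenient when looping over many images.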

In [3]:
%ls data_for_examples/images_from_COCO_dataset/

000000000285.jpg  000000085682.jpg  000000249025.jpg  000000458992.jpg
000000012576.jpg  000000098520.jpg  000000287649.jpg  000000544519.jpg
000000032081.jpg  000000170893.jpg  000000374545.jpg  000000570736.jpg
000000065485.jpg  000000210708.jpg  000000414261.jpg


In [None]:
image_path = "data_for_examples/images_from_COCO_dataset/000000000285.jpg"

# get the shuffled image
random_indices, grid, objects_list, objects_list_shuffled = return_shuffled_dataset(image_path)
print("shape of the input dataset: ", objects_list_shuffled.shape)

# apply the Sequencer to the shuffled dataset
estimator_list = ['EMD', 'energy', 'KL', 'L2']
seq = sequencer.Sequencer(grid, objects_list_shuffled, estimator_list)
output_path = "sequencer_output_directory"
final_axis_ratio, final_sequence = seq.execute(output_path, 
                                               to_average_N_best_estimators=True, 
                                               number_of_best_estimators=3)

calculating the distance matrices for estimator: EMD, scale: 1
finished calculating this distance matrix list, it took: 2.070777177810669 seconds
calculating the distance matrices for estimator: EMD, scale: 2
finished calculating this distance matrix list, it took: 3.3603482246398926 seconds
calculating the distance matrices for estimator: EMD, scale: 4
finished calculating this distance matrix list, it took: 6.048532009124756 seconds
calculating the distance matrices for estimator: EMD, scale: 8
finished calculating this distance matrix list, it took: 12.590080976486206 seconds
calculating the distance matrices for estimator: energy, scale: 1
finished calculating this distance matrix list, it took: 2.171283006668091 seconds
calculating the distance matrices for estimator: energy, scale: 2
finished calculating this distance matrix list, it took: 3.8356170654296875 seconds
calculating the distance matrices for estimator: energy, scale: 4
finished calculating this distance matrix list, i