#### Install Pre-requisite Python Modules

In [None]:
%pip install -r requirements.txt

#### List Available Modules
Run the cell below to list the names of the available components:
- Perceptual Hashing Algorithms
- Transformers
- Similarity Metrics

In [None]:
from notebooksupport import list_modular_components

nl = '\n'
for module_name, functions in list_modular_components().items():
    print( f"{module_name}:{nl}{nl.join(functions)}")
    print(nl)

#### Set Experimental Conditions
- Specify path to original (unmodified) images. Transforms will be applied to these, and these serve as the ground truth for intra-distance analysis.
- Specify output directory
- Specify dictionary of Perceptual Hashes to use (with parameters)
    - Format: ```<name_string>: <class_name>(<arguments>)```
    - Must be imported in this cell, base algorithms defined in ```phaser.hashing._algorithms.py```.
    - Class should extend ```phaser.hashing._algorithms.PerceptualHash```
- Specify Transforms to use.
    - Format: ```<class_name>(<arguments>)```
    - Must be imported in this cell, base algorithms defined in ```phaser.transformers._transforms.py```.
    - Use ```TransformFromDisk``` and specify a path as the argument if the transform files already exist on disk (must have same name as originals to match)
- Specify Distance metrics to use.
    - Format: Dictionary with ```<Human Readable Name>:<metric>``` (metric should be str or Callable, see below)
    - If the distance metric is part of ```scipy.spatial.distance```, specify the name as a ```str```
    - If a custom distance metric is provided in ```phaser.similarities._distances.py```, import and pass the ```function reference```
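
The two accepted metric forms (a name given as a `str`, or a function reference) can be illustrated with a minimal sketch that uses only the standard library. The helper `resolve_metric`, the `NAMED_METRICS` table and the `always_half` function are hypothetical names introduced for illustration; they are not part of phaser or SciPy, and phaser's own handling may differ.

```python
def hamming(u, v):
    """Normalised Hamming distance: fraction of positions that differ."""
    if len(u) != len(v):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(u, v)) / len(u)


# Hypothetical lookup table standing in for named scipy.spatial.distance metrics.
NAMED_METRICS = {"hamming": hamming}


def resolve_metric(metric):
    """Return a callable for a metric given by name (str) or by reference."""
    if isinstance(metric, str):
        return NAMED_METRICS[metric]
    if callable(metric):
        return metric
    raise TypeError("metric must be a str or a callable")


def always_half(u, v):
    """Hypothetical custom metric that ignores its inputs."""
    return 0.5


# Same shape as the distance_metrics dictionary: readable name -> str or callable.
metrics = {"Hamming": "hamming", "Custom": always_half}

a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 0, 1, 0]

results = {name: resolve_metric(m)(a, b) for name, m in metrics.items()}
print(results)
```

Keeping the dictionary keys human readable means they can be reused directly as labels in tables and plots, while the values decide how the distance is actually computed.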

In [None]:
# Specify path of the original (non-transformed) dataset
# e.g. "F:\Datasets\images"
original_path = r"images"

# Specify output directory
output_directory = r"./demo_outputs"

# Specify Perceptual Hashing Algorithms

from phaser.hashing import PHASH, ColourHash
algorithms = {
        'phash': PHASH(hash_size=8, highfreq_factor=4),
        'colour': ColourHash()
        }

# Specify Transforms functions 
from phaser.transformers import Border, Flip
transformers = [
    Border(border_color=(255,0,0), border_width=30, saveToPath=''),
    Flip(direction='Horizontal', saveToPath='')
    ]

# Specify Distance Algorithms
from phaser.similarities import test_synthetic
distance_metrics = {
    "Hamming": "hamming",
    "Cosine": "cosine",
    "Test_Synthetic": test_synthetic
}

# Test that metrics have been entered correctly
from phaser.similarities import validate_metrics
if validate_metrics(distance_metrics):
    print("Metrics look valid!")

#### Process Files

- Hash original files with each algorithm
- Generate and hash transform files
- Output hashes to CSV files compressed to .bz (bzip) files
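
As a rough illustration of the output format, the sketch below writes a small CSV through a bzip2-compressed stream and reads it back using only the standard library. The file name `hashes.csv.bz2`, the column names and the hash values are assumptions made for this example; the files produced by `do_hashing` may be named and laid out differently.

```python
import bz2
import csv
import os
import tempfile

# Hypothetical rows: file name, algorithm name and a hex-encoded hash.
rows = [
    {"filename": "img_001.png", "algorithm": "phash", "hash": "ff00aa55ff00aa55"},
    {"filename": "img_002.png", "algorithm": "phash", "hash": "0f0f0f0f0f0f0f0f"},
]

# Hypothetical output location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "hashes.csv.bz2")

# Write the CSV through a bzip2-compressed text stream.
with bz2.open(path, "wt", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=["filename", "algorithm", "hash"])
    writer.writeheader()
    writer.writerows(rows)

# Read it back the same way; every value is returned as a string.
with bz2.open(path, "rt", newline="", encoding="utf-8") as fh:
    loaded = list(csv.DictReader(fh))

print(loaded[0]["hash"])
```

If pandas is available, `pandas.read_csv` can usually infer bzip2 compression from a `.bz2` file extension, which avoids the manual `bz2.open` call.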

In [None]:
from notebooksupport import do_hashing

# Pass the settings through to a helper function which does the hashing and applies transforms
do_hashing(
    originals_path=original_path,
    algorithms=algorithms,
    transformers=transformers,
    output_directory=output_directory,
    progress_report=True,
)

#### Calculate Similarity Scores
There are two types of score calculated here:
- Intra-score: Where the original images are compared to their modifications. This is a 1-to-1 mapping (N * #hashes * #transforms * #comparison_metrics)
    - This is used to determine how robust the hash and comparison metric are to each transform class.
    - Ideally original images should have a distance of 0 (similarity of 1) to their transforms.
- Inter-score: Where images within a given transform (or originals) are compared with one another for each given comparison metric. Inter-scores are sampled to match the size of the intra-score class, because calculating all pairwise combinations generates many more samples (N*(N-1)/2).
    - This gives us a baseline behaviour, with the assumption that the images should *not* match.
    - In aggregate, unrelated images should have a normalised distance of about 0.5, as this makes the best use of the metric space.
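
The difference between the two score types can be sketched with synthetic bit-string hashes and a normalised Hamming distance. This is a standard-library illustration under stated assumptions, not the implementation behind `calculate_distances`: the random hashes, the `flip_bits` helper and the choice of three flipped bits per "transform" are all hypothetical.

```python
import random
from itertools import combinations


def hamming(u, v):
    """Normalised Hamming distance: fraction of positions that differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def flip_bits(bits, k, rng):
    """Return a copy of bits with k distinct positions inverted."""
    out = list(bits)
    for i in rng.sample(range(len(out)), k):
        out[i] ^= 1
    return out


rng = random.Random(0)
n_images, n_bits = 20, 64

# Hypothetical hashes: random bits for originals, 3 flipped bits per transform.
originals = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_images)]
transformed = [flip_bits(h, 3, rng) for h in originals]

# Intra-distances: each original against its own transform (1-to-1).
intra = [hamming(o, t) for o, t in zip(originals, transformed)]

# Inter-distances: all unordered pairs give N*(N-1)/2 candidates,
# which are sampled down to the size of the intra set.
all_pairs = list(combinations(range(n_images), 2))
sampled_pairs = rng.sample(all_pairs, len(intra))
inter = [hamming(originals[i], originals[j]) for i, j in sampled_pairs]

print("pairwise candidates:", len(all_pairs))
print("mean intra-distance:", sum(intra) / len(intra))
print("mean inter-distance:", sum(inter) / len(inter))
```

With random, unrelated hashes the mean inter-distance lands near 0.5, while the intra-distance stays small because only a few bits change. A hash that is robust to a transform should show the same separation on real images.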



In [None]:
from notebooksupport import calculate_distances

# Load hashes and labels from the output generated by the previous step and calculate inter- and intra-distances.
calculate_distances(hash_directory=output_directory, distance_metrics=distance_metrics, progress_report=True)

In [None]:
## Delete me or move me to a test file later - just a demo of the synthetic metric.

import numpy as np
import matplotlib.pyplot as plt
from phaser.similarities import test_synthetic

def pdist_test():
    nums = []
    for n in range(10000):
        nums.append(test_synthetic())

    plt.xlim(0,1)
    plt.hist(nums, bins=50)
    plt.show()

def cdist_test():
    nums = []
    for n in range(10000):
        nums.append(test_synthetic())

    plt.xlim(0,1)
    plt.hist(nums, bins=50)
    plt.show()


pdist_test()
cdist_test()