#### Install Prerequisite Python Modules

In [None]:
%pip install -r requirements.txt

#### List Available Modules
Run the cell below to list the names of the available:
- Perceptual Hashing Algorithms
- Transformers
- Similarity Metrics

In [None]:
from notebooksupport import list_modular_components

nl = '\n'
for module_name, functions in list_modular_components().items():
    print(f"{module_name}:{nl}{nl.join(functions)}")
    print(nl)

#### Set Experimental Conditions

In [1]:
# Specify path of the original (non-transformed) dataset
# e.g. "F:\Datasets\images"
original_path = "images"

# Specify output directory
output_directory = "demo_outputs/"

# Specify Perceptual Hashing Algorithms (each must have a class defined in phaser.hashing._algorithms)
# Format: <name_string>: <class_name>(<arguments>)
from phaser.hashing._algorithms import PHASH, ColourHash
algorithms = {
        'phash': PHASH(hash_size=8, highfreq_factor=4),
        'colour': ColourHash()
        }

# Specify transform functions (each must have a transformer defined in phaser.transformers._transforms)
# Format: <class_name>(<arguments>)
# If the transformed files already exist on disk, use TransformFromDisk with their path as the argument (file names must match the originals)
from phaser.transformers._transforms import Border, Flip
transformers = [
    Border(border_color=(255,0,0), border_width=30, saveToPath=''),
    Flip(direction='Horizontal', saveToPath='')
    ]

# Specify distance metrics (each must be defined in phaser.similarities._distances)
distances = [
    "hamming",
    "cosine"
]
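
The two metric names selected above correspond to standard distance definitions. The sketch below illustrates those textbook definitions on two made-up bit vectors. It is a plain-Python illustration only: the helper names `hamming_distance` and `cosine_distance` are hypothetical, and the library's own implementations in `phaser.similarities._distances` may differ in detail.

```python
import math


def hamming_distance(a, b):
    """Normalised Hamming distance: fraction of positions that differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Two hypothetical 64-bit hashes (for example an 8x8 hash flattened to bits).
hash_a = [1, 0, 1, 1, 0, 0, 1, 0] * 8
# Flip the first 16 bits of hash_a to simulate a modified image.
hash_b = [1 - bit for bit in hash_a[:16]] + hash_a[16:]

h = hamming_distance(hash_a, hash_b)
c = cosine_distance(hash_a, hash_b)
print(f"hamming={h:.3f} cosine={c:.3f}")
```

Both distances are 0 for identical hashes and grow as the hashes diverge, which is why they can be compared on a common scale later in the notebook.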

#### Process Files

- Hash original files with each algorithm
- Generate and hash transform files
- Output hashes to CSV files compressed with bzip2 (.bz2)

In [2]:
from notebooksupport import do_hashing

# Pass the settings through to a helper function which does the hashing and applies transforms
do_hashing(originals_path=original_path, algorithms=algorithms, transformers=transformers, output_directory=output_directory)

Found 20 images in c:\Users\smck\Documents\GitHub\aaby_test_lib\test_lib\images.
Creating output directory at c:\Users\smck\Documents\GitHub\aaby_test_lib\test_lib\demo_outputs...
Doing hashing...


Files: 100%|██████████| 20/20 [00:02<00:00,  8.59it/s]


Saving hashes.csv and labels for filenames (f), algorithms (a) and transforms (t) to bzip files..
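
To inspect a bzip2-compressed CSV file outside the helper functions, the Python standard library is sufficient. The sketch below writes a tiny stand-in file and reads it back. The file name, column names, and hash values are hypothetical placeholders, not the actual schema written by `do_hashing`.

```python
import bz2
import csv
import os
import tempfile

# Hypothetical rows; the real column layout of the hashes file may differ.
rows = [
    {"filename": "img_001.jpg", "transform": "orig", "phash": "e1c39a5b0f3c7d21"},
    {"filename": "img_001.jpg", "transform": "Flip_Horizontal", "phash": "b4963c0e5a69d287"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_hashes.csv.bz2")

    # Write the CSV through a bzip2 text stream.
    with bz2.open(path, "wt", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["filename", "transform", "phash"])
        writer.writeheader()
        writer.writerows(rows)

    # Read it back; decompression happens transparently.
    with bz2.open(path, "rt", newline="") as fh:
        loaded = list(csv.DictReader(fh))

print(loaded[0]["transform"], len(loaded))
```

If pandas is installed, `pandas.read_csv(path)` infers bzip2 compression from the `.bz2` extension, so no explicit decompression step is needed there either.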


#### Calculate Similarity Scores
Two types of score are calculated here:
- Intra-score: each original image is compared with its own transformed versions. This is a 1-to-1 mapping (N * #hashes * #transforms * #comparison_metrics).
    - This is used to determine how robust the hash and comparison metric are to each transform class.
    - Ideally, original images should have a distance of 0 (similarity of 1) to their transforms.
- Inter-score: images within a given transform (or the originals) are compared with each other for each comparison metric. Inter-scores are sampled to match the size of the intra-score class, because calculating all pairwise combinations generates many more samples: N(N-1)/2.
    - This gives us a baseline behaviour, under the assumption that the images should *not* match.
    - In aggregate, random unrelated images should have a distance of about 0.5, as this makes the best use of the metric space.
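
The difference between the two score types can be sketched with synthetic data. The example below is an illustration under stated assumptions, not the library's procedure: hashes are random 64-bit vectors, a "transform" is simulated by flipping 4 bits, and all names (`normalised_hamming`, `simulate_transform`) are hypothetical.

```python
import itertools
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_IMAGES, N_BITS = 50, 64


def normalised_hamming(a, b):
    """Fraction of bit positions at which two hashes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def simulate_transform(bits, n_flips=4):
    """Stand-in for a mild image transform: flip a few distinct bits."""
    flipped = list(bits)
    for i in rng.sample(range(len(bits)), n_flips):
        flipped[i] ^= 1
    return flipped


originals = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(N_IMAGES)]
transformed = [simulate_transform(h) for h in originals]

# Intra: each original against its own transform (1-to-1, N comparisons).
intra = [normalised_hamming(o, t) for o, t in zip(originals, transformed)]

# Inter: every unordered pair of distinct originals, N(N-1)/2 comparisons.
inter = [normalised_hamming(a, b) for a, b in itertools.combinations(originals, 2)]

mean_intra = sum(intra) / len(intra)
mean_inter = sum(inter) / len(inter)
print(f"intra: n={len(intra)}, mean={mean_intra:.3f}")
print(f"inter: n={len(inter)} (= {math.comb(N_IMAGES, 2)}), mean={mean_inter:.3f}")
```

With unrelated random hashes the inter mean sits close to 0.5, while the intra mean stays near 0. The full pairwise set is also much larger than the intra set, which is the motivation for sampling the inter-scores.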



In [4]:
from notebooksupport import calcualte_distances

# Load hashes and labels from the output generated by the previous step and calculate inter- and intra-distances.
calcualte_distances(hash_directory=output_directory, progress_report=True)

Dataframe loaded from c:\Users\smck\Documents\GitHub\aaby_test_lib\test_lib\demo_outputs\hashes.csv.bz2
ALGORITHMS=array(['colour', 'phash'], dtype='<U6')
TRANSFORMS=array(['Border_bw30_bc255.0.0', 'Flip_Horizontal', 'orig'], dtype=object)
Saving metric encoder to le_m.
Computing Intra-distances...


Hash: 100%|██████████| 2/2 [00:00<00:00, 83.84it/s]


Number of total intra-image comparisons = 80
Computing Inter-distance with 7 per image.


Hash: 100%|██████████| 2/2 [00:00<00:00, 110.54it/s]

Number of pairwise comparisons = 21
Number of inter distances = 84
Saving distance scores to c:\Users\smck\Documents\GitHub\aaby_test_lib\test_lib\demo_outputs\distances.csv.bz2.



