# Pre-processing

Pre-processing consists mostly of configuring and running existing scripts, as shown below. Two kinds of analysis performed in Processing.ipynb need to be prepared here: latent space exploration and latent space manipulation.
Latent space exploration involves 
- projection with principal component analysis (PCA),
- t-distributed stochastic neighbor embeddings (t-SNE),
- classification of latent representations as materials and actions using k-nearest neighbors (KNN), and
- disentanglement with a flow model.

Latent space manipulation involves
- invertible projection with PCA. This requires a complete PCA model, i.e. one whose output dimensionality equals its input dimensionality, which is computationally expensive to set up.
- disentanglement with a flow model. This flow model needs to be invertible. The one listed for exploration can be reused as it is.
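
Why a complete PCA model is needed for manipulation can be illustrated with a minimal sketch. It uses plain NumPy on synthetic data rather than the repository's scripts, and the helper names `project` and `reconstruct` are hypothetical. With all components kept, projecting and mapping back recovers the input up to floating-point error; with only a few components, information is lost.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for latent vectors: 200 instances with 8 correlated dimensions
X = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))

mean = X.mean(axis=0)
# Rows of Vt are the principal axes, ordered by decreasing variance
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)

def project(X, k):
    """Project onto the first k principal components."""
    return (X - mean) @ Vt[:k].T

def reconstruct(Z, k):
    """Map a k-dimensional projection back to the original space."""
    return Z @ Vt[:k] + mean

# Complete model: output dimensionality equals input dimensionality
full_error = np.abs(reconstruct(project(X, 8), 8) - X).max()
# Reduced model: only the first 3 components are kept
reduced_error = np.abs(reconstruct(project(X, 3), 3) - X).max()

print(f"max reconstruction error, complete PCA: {full_error:.2e}")
print(f"max reconstruction error, reduced PCA:  {reduced_error:.2e}")
```

The reduced model is still adequate for exploration, where only the forward projection is used.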

The preparation for these two analyses is largely the same. Both require converting the data from the waveform domain to Yamnet's latent representation at each layer and projecting it to a manageable dimensionality, e.g. 64 dimensions, using PCA. Computing a full PCA model is resource intensive, and a small model with e.g. 64 dimensions suffices for most layers. For the majority of layers, the original latent representations, which are of higher dimensionality than the projection, are no longer needed after projection. Only the layers whose latent space is to be manipulated need the full PCA model for invertibility, together with the original latent space representation.

The pre-processing can thus be prepared as follows: For each layer l of Yamnet:
  - Convert all sounds to their latent representation at layer l.
  - Fit standard scalers and PCA to a sample of these representations. The dimensionality of the first 3 Yamnet layers is too large to create a complete PCA model for them. For these layers, however, a PCA model that produces only the first few dimensions is sufficient to explore the latent space in subsequent experiments. A PCA model with as many output dimensions as input dimensions is only feasible for layers whose original dimensionality is sufficiently small. Depending on system resources, this might work for layers 4 and 5. It should work for layers 6 and onwards on any reasonably equipped machine.
  - Using the previously created PCA model, project the latent Yamnet representations of sounds to a more manageable dimensionality.
  - Optionally delete the higher-dimensional representations to minimize the final disk usage. The experiment of the accompanying paper keeps the high-dimensional representations only for layer 9.
  
Important: These steps take several hours to run, and memory as well as disk storage demands can temporarily peak.
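
The fit-and-project steps listed above can be sketched on synthetic data as follows. This is an illustration under assumptions, not the implementation of the repository's scripts: the three stages (pre-PCA scaler, PCA, post-PCA scaler) are taken from the stage names in the script log, while the helper `fit_standard_scaler`, the toy data and the small target dimensionality are hypothetical.

```python
import numpy as np

def fit_standard_scaler(X):
    """Return per-dimension mean and standard deviation (zero deviations become 1)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return mean, std

rng = np.random.default_rng(0)
# Stand-in for a sample of latent vectors: 500 instances, 32 dimensions of varying scale
sample = rng.normal(size=(500, 32)) * rng.uniform(0.1, 10.0, size=32)
target_dimensionality = 4  # assumption for the toy data; the notebook uses 64

# 1) Pre-PCA standard scaler: zero mean and unit variance per input dimension
pre_mean, pre_std = fit_standard_scaler(sample)
scaled = (sample - pre_mean) / pre_std

# 2) PCA restricted to the leading components (scaled data is already centered)
_, _, Vt = np.linalg.svd(scaled, full_matrices=False)
components = Vt[:target_dimensionality]
projected = scaled @ components.T

# 3) Post-PCA standard scaler: zero mean and unit variance per projected dimension
post_mean, post_std = fit_standard_scaler(projected)
calibrated = (projected - post_mean) / post_std

print(calibrated.shape)
```

Only the fitted parameters (`pre_mean`, `pre_std`, `components`, `post_mean`, `post_std`) are needed to project further data, which is why fitting on a sample is sufficient.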

In [2]:
from latent_audio.scripts import audio_to_latent_yamnet as aud2lat, create_scalers_and_PCA_model_for_latent_yamnet as lat2pca, latent_yamnet_to_calibration_data_set as lat2cal
import shutil, os

full_dim_layer_indices = [9]
reduced_target_dimensionality = 64

for layer_index in range(14):
    print(f'Layer {layer_index}')
    # Extract data
    # Convert audio to the latent Yamnet representation of original dimensionality
    aud2lat.run(layer_index=layer_index)
    # Create standard scalers and PCA for projection to a lower-dimensional space
    # (None is passed for the layers listed in full_dim_layer_indices)
    target_dimensionality = None if layer_index in full_dim_layer_indices else reduced_target_dimensionality
    lat2pca.run(layer_index=layer_index, target_dimensionality=target_dimensionality)
    # Perform the projection (this is needed for all layers)
    lat2cal.run(layer_index=layer_index, dimensionality=reduced_target_dimensionality)

    # Delete latent representations of original dimensionality to save disk storage
    if layer_index not in full_dim_layer_indices:
        shutil.rmtree(os.path.join("data","latent yamnet","original",f"Layer {layer_index}"))

Layer 13
Running script to convert audio to latent yamnet
	100.0% Completed
	Run Completed
Running script to create scalers and PCA model for latent yamnet
	Loading sample of latent data Completed. Shape == [instance count, dimensionality] == (10000, 6144)
	Fitting Pre-PCA Standard Scaler to sample Completed
	Fitting 64-dimensional PCA to sample Completed
	Fitting Post-PCA Standard Scaler to sample Completed
	Run Completed
Running script to convert latent yamnet to calibration data set
	The top 64 dimensions explain 83.16 % of variance.
	100.0 % Completed
	Run completed
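
The explained-variance figure reported in the log can be read as the share of total variance captured by the leading principal components. A minimal NumPy sketch on synthetic data shows how such a share is obtained. How the script itself computes the number is not shown here, so the sketch is an assumption about the general technique only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 1000 instances, 20 dimensions with decreasing spread
X = rng.normal(size=(1000, 20)) * np.linspace(5.0, 0.5, 20)

centered = X - X.mean(axis=0)
# Singular values of the centered data relate to the variance along each principal axis
singular_values = np.linalg.svd(centered, compute_uv=False)
variance_per_component = singular_values**2 / (len(X) - 1)

explained_ratio = variance_per_component / variance_per_component.sum()
cumulative = np.cumsum(explained_ratio)

top_k = 5  # assumption for the toy data; the notebook projects to 64 dimensions
print(f"The top {top_k} dimensions explain {100 * cumulative[top_k - 1]:.2f} % of variance.")
```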
