# Your Data

Run this notebook on Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AG-Peter/encodermap/blob/main/tutorials/notebooks_starter/03_Your_Data.ipynb)

Find the documentation of EncoderMap:

https://ag-peter.github.io/encodermap

**Goals**

In this tutorial, you can train EncoderMap on your own data.

**For Google Colab only:**

If you're on Google Colab, please uncomment these lines and install EncoderMap.

In [None]:
# !wget https://gist.githubusercontent.com/kevinsawade/deda578a3c6f26640ae905a3557e4ed1/raw/b7403a37710cb881839186da96d4d117e50abf36/install_encodermap_google_colab.sh
# !sudo bash install_encodermap_google_colab.sh

**Primer**

Now it's time to apply what you have learned about dimensionality reduction with EncoderMap. Load your own data and get started! The data set should be a table in which each row contains one sample and the number of columns equals the dimensionality of the data set.
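
To make the expected layout concrete, here is a minimal sketch that writes a small synthetic table to a CSV file and reads it back with `np.loadtxt`. The array contents, the file name `example_data.csv`, and the sizes are made up for illustration. Only the shape convention (rows are samples, columns are dimensions) comes from the text above.

```python
import os
import tempfile

import numpy as np

# Hypothetical example data: 100 samples, each with 6 features.
rng = np.random.default_rng(seed=0)
example_data = rng.uniform(-np.pi, np.pi, size=(100, 6))

# Write the table to a temporary CSV file, one sample per row.
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "example_data.csv")
np.savetxt(example_path, example_data, delimiter=",")

# Read it back with the same call this tutorial uses for loading data.
loaded = np.loadtxt(example_path, delimiter=",")
n_samples, n_dimensions = loaded.shape
print(f"{n_samples} samples with {n_dimensions} dimensions each")
```

If the printed shape has samples and dimensions swapped, transposing the array with `loaded.T` restores the expected layout.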

### Load Libraries

In [None]:
import encodermap as em
import numpy as np

### Load Your Data

In [None]:
csv_path = "path/to/your/data.csv"
high_d_data = np.loadtxt(csv_path, delimiter=",")
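
Real CSV files often carry a header row or a few broken values, so a quick check after loading can save a failed training run. The following is a sketch only: `check_table` is a hypothetical helper that is not part of EncoderMap, and the CSV text is an invented stand-in for your file. The `skiprows` argument of `np.loadtxt` skips the header line.

```python
import io

import numpy as np


def check_table(data):
    """Hypothetical helper: verify that data is a finite 2D table."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2:
        raise ValueError(f"Expected a 2D table, got {data.ndim} dimension(s).")
    if not np.isfinite(data).all():
        raise ValueError("The table contains NaN or infinite values.")
    return data


# Stand-in for a CSV file with one header row (contents are made up).
csv_text = io.StringIO("phi,psi,omega\n0.1,0.2,0.3\n1.1,1.2,1.3\n")

# skiprows=1 tells np.loadtxt to ignore the header line.
table = check_table(np.loadtxt(csv_text, delimiter=",", skiprows=1))
print(table.shape)
```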

In [None]:
# This is a hidden cell. It won't be displayed in the documentation.
# The cell above can't be executed, because there is no file at path/to/your/data.csv.
# Instead, we load the linear_dimers dataset from EncoderMap's example projects and run this in hidden cells.

import xarray as xr

trajs, emap = em.load_project("linear_dimers", load_autoencoder=True)
h5_file = trajs[0]._traj_file

da = xr.open_dataset(h5_file, group="CVs", engine="h5netcdf").central_cartesians
cartesians = da.stack({"frame": ("traj_num", "frame_num")}).transpose("frame", ...).dropna("frame", how="all")
high_d_data = em.misc.pairwise_dist(
    cartesians[::1000, 1::3],
).numpy()

### Set Parameters

In [None]:
parameters = em.Parameters()
parameters.n_steps = 1000
parameters.dist_sig_parameters = [40, 10, 5, 1, 2, 5]
parameters.periodicity = 2*np.pi

# If your data set is large, you should not try to calculate
# the pairwise distance histogram with the complete data.
em.plot.distance_histogram_interactive(
    data=high_d_data,  # e.g. use high_d_data[::10] to use every 10th point
    periodicity=parameters.periodicity, 
    initial_guess=parameters.dist_sig_parameters,
)
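
The six values in `dist_sig_parameters` are easier to choose with a picture of what they do. The sketch below assumes a sketch-map style sigmoid and the ordering `(sig_h, a_h, b_h, sig_l, a_l, b_l)`. Neither is stated in this notebook, so please check both against the EncoderMap documentation. The helper names `periodic_pairwise_distances` and `sigmoid` are hypothetical, and the random array only stands in for your own table. The data is subsampled before the pairwise distances are computed, as the comment in the cell above recommends.

```python
import numpy as np


def periodic_pairwise_distances(data, periodicity=np.inf):
    """Euclidean distances between all rows of data.

    Each coordinate difference is wrapped into one periodic window
    when a finite periodicity is given.
    """
    diff = np.abs(data[:, np.newaxis, :] - data[np.newaxis, :, :])
    if np.isfinite(periodicity):
        diff = np.minimum(diff, periodicity - diff)
    return np.sqrt((diff ** 2).sum(axis=-1))


def sigmoid(r, sigma, a, b):
    """Assumed form: 1 - (1 + (2**(a/b) - 1) * (r/sigma)**a)**(-b/a)."""
    return 1 - (1 + (2 ** (a / b) - 1) * (r / sigma) ** a) ** (-b / a)


# Made-up, non-periodic stand-in data: 2000 samples with 4 dimensions.
rng = np.random.default_rng(seed=1)
stand_in_data = rng.normal(scale=15.0, size=(2000, 4))

# Use every 10th point to keep the distance matrix small.
subsample = stand_in_data[::10]
distances = periodic_pairwise_distances(subsample)

# Keep each pair once by taking the upper triangle without the diagonal.
upper = np.triu_indices(len(subsample), k=1)
pair_distances = distances[upper]

# Assumed ordering of the parameters used in this tutorial.
sig_h, a_h, b_h, sig_l, a_l, b_l = [40, 10, 5, 1, 2, 5]
high_d_sigmoid = sigmoid(pair_distances, sig_h, a_h, b_h)

# Distances mapped close to 0 or 1 are barely distinguished afterwards.
resolved = np.mean((high_d_sigmoid > 0.1) & (high_d_sigmoid < 0.9))
print(f"Fraction of distances in the sensitive range: {resolved:.2f}")
```

With this form, the sigmoid equals 0.5 at `r == sigma`, so `sig_h` marks the distance around which the high-dimensional sigmoid is most sensitive. If only a small fraction of your distances falls into the sensitive range, the parameters probably do not match the distance distribution of your data.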

### Run the Dimensionality Reduction

In [None]:
e_map = em.EncoderMap(parameters, high_d_data)
history = e_map.train()

# Project the same data that was used for training into the low-dimensional space.
low_d_projection = e_map.encode(high_d_data)

In [None]:
# This is a hidden cell. It won't be displayed in the documentation.
# Instead, we use the autoencoder loaded with the linear_dimers example project in the hidden cell above.
low_d_projection = emap.encode()

### Plot the Results

In [None]:
em.plot.plot_free_energy(
    *low_d_projection.T
)
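
To read such a plot, it can help to see how a free-energy-like surface may be estimated from a 2D projection. The sketch below uses the common approach of taking the negative logarithm of a normalized 2D histogram, in units of kT. Whether `em.plot.plot_free_energy` computes exactly this is an assumption, and the two Gaussian blobs are invented stand-ins for `low_d_projection`.

```python
import numpy as np

# Made-up stand-in for low_d_projection: two Gaussian blobs in 2D.
rng = np.random.default_rng(seed=2)
blob_a = rng.normal(loc=(-2.0, 0.0), scale=0.5, size=(5000, 2))
blob_b = rng.normal(loc=(2.0, 0.0), scale=0.5, size=(5000, 2))
projection = np.concatenate([blob_a, blob_b])

# Estimate the probability of each bin from a 2D histogram.
counts, x_edges, y_edges = np.histogram2d(*projection.T, bins=50)
probability = counts / counts.sum()

# Free energy in units of kT. Empty bins become +inf.
with np.errstate(divide="ignore"):
    free_energy = -np.log(probability)

# Shift the surface so that the most populated bin sits at zero.
free_energy -= free_energy[np.isfinite(free_energy)].min()
print(f"Lowest free energy: {free_energy.min():.1f} kT")
```

Densely populated regions of the map show up as minima, and empty regions are treated as infinitely high.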