# PCA_Exp tutorial

This tutorial explains how to use the pca_exp package and covers its basic functionality. The main task of pca_exp is to take a set of experimental measurements, preprocess them, perform principal component analysis (PCA), and present the result in a user-friendly format. Let's start with a quick explanation of PCA and why it is useful for analysing experimental data.

## PCA 

Let's suppose that our experimental measurements look as follows:


There is a visible change of behaviour at some value of the parameter $T$, but we are not sure exactly where. Principal component analysis lets us find the most common deviations of these curves from their average (shown as a dashed line); these deviations are called principal components (PCs). By looking at the projections of the experimental measurements onto different PCs, and at how those projections vary with the parameter $T$, we can better understand the transition seen in the figure above.
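
The idea can be sketched with plain NumPy, independently of pca_exp. The curves, the parameter values `T`, and all variable names below are synthetic stand-ins invented for illustration; they are not the tutorial's data, and this is not necessarily how "PCAMachine" computes its result. The sketch subtracts the average curve, obtains the principal components from a singular value decomposition, and projects each curve onto them.

```python
import numpy as np

# Synthetic stand-in for a set of measurements: one curve per row.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
T = np.linspace(0.0, 1.0, 10)  # hypothetical parameter values

# The slope of each curve switches sign around T = 0.5, plus a little noise.
curves = np.array(
    [np.tanh(10 * (t - 0.5)) * x + 0.01 * rng.standard_normal(x.size) for t in T]
)

mean_curve = curves.mean(axis=0)   # the average curve
deviations = curves - mean_curve   # deviations from the average

# SVD of the centred data: the rows of vt are the principal components.
u, s, vt = np.linalg.svd(deviations, full_matrices=False)
explained = s**2 / np.sum(s**2)    # fraction of variance carried by each PC

# Projection of every curve onto every PC, shape (n_curves, n_components).
projections = deviations @ vt.T
```

Plotting `projections[:, 0]` against `T` would show the projection onto the first PC changing sign near the transition, which is the kind of signature we look for in real data.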

First, we should import the required modules:

In [6]:
import numpy as np
import matplotlib.pyplot as plt
from pca_exp.data_handler import DataHandler
from pca_exp.pca_machine import PCAMachine

We imported two important classes: "DataHandler" loads, stores, and preprocesses the experimental measurement data, and "PCAMachine" is responsible for performing PCA and showing the results.

The data shown in figure 1 should be in your working directory, in the folder "exp_data_example". Each text file in that folder corresponds to a different measurement and has the following format (the text files have no headers):

|X|Y|ErrorY|
|---|---|---|
|$x_1$|$y_1$|$ey_1$|
|$x_2$|$y_2$|$ey_2$|
|...|...|...|
|$x_N$|$y_N$|$ey_N$|
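
To make the layout concrete, here is a small sketch that writes and reads back a file of this shape with NumPy alone. The file name, the numbers, and the whitespace delimiter are assumptions made for illustration; the actual files in "exp_data_example" may use a different delimiter.

```python
import os
import tempfile

import numpy as np

# Build a small synthetic measurement: x, y, and the uncertainty of y.
x = np.linspace(0.0, 1.0, 5)
y = np.exp(-x)
ey = 0.05 * np.ones_like(x)

# Write it as a headerless, whitespace-delimited text file
# (hypothetical file name, stored in a temporary directory).
path = os.path.join(tempfile.mkdtemp(), "example_measurement.txt")
np.savetxt(path, np.column_stack([x, y, ey]))

# Read it back: one row per point, columns are X, Y, ErrorY.
data = np.loadtxt(path)
x_read, y_read, ey_read = data[:, 0], data[:, 1], data[:, 2]
```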



The "DataHandler" class can load text files that follow a format similar to the one above. First, let's create an instance of the class:

In [None]:
dh = DataHandler()

The class stores the data in its member "batches". To load all the data from the "exp_data_example" folder, we need to define a few parameters:

In [7]:
loc = './exp_data_example/'            # loc specifies the location of textfiles

prenum = 'ede'                         # the class assumes that names of the textfiles in the folder 
stsp = (0, 9)                          # have form "prenum(NUM)ext" where NUM runs through integer numbers
ext = '.txt'                           # from stsp[0] to stsp[1].
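
As a rough illustration of this naming convention, the snippet below builds the list of paths these parameters would describe. It assumes that both ends of `stsp` are included and that NUM is not zero-padded; neither is confirmed here, so check data_handler.py for the actual behaviour.

```python
loc = './exp_data_example/'
prenum = 'ede'
stsp = (0, 9)
ext = '.txt'

# Assumption: NUM runs from stsp[0] to stsp[1] inclusive, without zero-padding.
filenames = [f"{loc}{prenum}{num}{ext}" for num in range(stsp[0], stsp[1] + 1)]
```

Under these assumptions the list runs from `./exp_data_example/ede0.txt` to `./exp_data_example/ede9.txt`.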

There are additional options for loading the files (changing the delimiter, skipping the first rows, etc.), but the parameters above are enough for this tutorial (for more options, see data_handler.py). Loading the files requires only one call:

In [None]:
dh.load_batch(stsp=stsp, loc=loc, prenum=prenum, ext=ext)

We can check that the files were loaded properly by plotting them directly:

In [None]:
plt.figure(1)
plt.plot(dh.batches[0][:,:,0], dh.batches[0][:,:,1])
plt.show()

Looking at the curves above, we can see that the error seems to grow at later times. A useful tool for preprocessing such data is to re-bin the x-windows so that each new bin captures approximately the same amount of standard deviation. This is done automatically by:

In [9]:
dh.filter_data()
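
For intuition, here is one possible reading of the re-binning idea, written with NumPy only. It is a sketch under stated assumptions and not necessarily what "filter_data" does internally: the function name `rebin_to_target_error`, the `target` parameter, and the synthetic data are all hypothetical. Consecutive points are merged until the uncertainty of the bin average falls below a common target, so noisier regions end up with wider bins.

```python
import numpy as np

def rebin_to_target_error(x, y, ey, target):
    """Greedily merge consecutive points until each bin's mean error <= target."""
    xb, yb, eb = [], [], []
    start = 0
    for stop in range(1, len(x) + 1):
        n = stop - start
        # Uncertainty of the plain average of n independent points.
        err = np.sqrt(np.sum(ey[start:stop] ** 2)) / n
        # Close the bin once it is precise enough, or when the data runs out.
        if err <= target or stop == len(x):
            xb.append(x[start:stop].mean())
            yb.append(y[start:stop].mean())
            eb.append(err)
            start = stop
    return np.array(xb), np.array(yb), np.array(eb)

# Synthetic example: the uncertainty grows with x.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
ey = 0.01 + 0.2 * x
y = np.sin(2 * np.pi * x) + ey * rng.standard_normal(x.size)

xb, yb, eb = rebin_to_target_error(x, y, ey, target=0.05)
```

Only the last bin may exceed the target, because it has to absorb whatever points remain at the end of the curve.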


You can see how "filter_data" changed the data by plotting the measurements again (re-run the plotting cell above). The last step for "DataHandler" is to prepare the data in the right format using the "prepare_XYE_PCA" function:

In [None]:
dh.prepare_XYE_PCA()