# Working with Pandas and XArray

This notebook demonstrates how Pandas and XArray can be used to work with the QCoDeS Dataset. It is not meant as a general introduction to Pandas and XArray; for that, we refer to the official documentation for [Pandas](https://pandas.pydata.org/) and [XArray](http://xarray.pydata.org/en/stable/). This notebook requires that both Pandas and XArray are installed.

## Setup

First we borrow an example from the Dataset notebook to have some data to work with. We split the measurement in two so we can try merging it with Pandas.

In [1]:
%matplotlib notebook
import pandas as pd
from functools import partial
import numpy as np
import matplotlib.pyplot as plt

import qcodes as qc
from qcodes.dataset.experiment_container import new_experiment
from qcodes.dataset.database import initialise_database
from qcodes.tests.instrument_mocks import DummyInstrument
from qcodes.dataset.measurements import Measurement

qc.logger.start_all_logging()

Logging hadn't been started.
Activating auto-logging. Current session state plus future input saved.
Filename       : C:\Users\jenielse\.qcodes\logs\command_history.log
Mode           : append
Output logging : True
Raw input log  : False
Timestamping   : True
State          : active


In [2]:
# preparatory mocking of physical setup
dac = DummyInstrument('dac', gates=['ch1', 'ch2'])
dmm = DummyInstrument('dmm', gates=['v1', 'v2'])
station = qc.Station(dmm, dac)

In [3]:
initialise_database()
new_experiment(name='tutorial_exp', sample_name="no sample")

tutorial_exp#no sample#136@C:\Users\jenielse/experiments.db
-----------------------------------------------------------

In [4]:
# For the 2D measurement we need a dependent parameter that has two
# other parameters as setpoints. We therefore define a new Measurement
# and register all three parameters on it

meas = Measurement()
meas.register_parameter(dac.ch1)  # register the first independent parameter
meas.register_parameter(dac.ch2)  # register the second independent parameter
meas.register_parameter(dmm.v1, setpoints=(dac.ch1, dac.ch2))  # now register the dependent one

<qcodes.dataset.measurements.Measurement at 0x1e4b47ae278>

In [5]:
# and we'll make a 2D gaussian to sample from/measure
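def gauss_model(x0: float, y0: float, sigma: float, noise: float = 0.0005):
    """
    Returns a generator sampling a 2D gaussian centred at (x0, y0) with
    width sigma. As written, the peak value of the noise-free model is
    exp(2*sigma**2) rather than exactly 1.
    """
    while True:
        (x, y) = yield
        model = np.exp(-((x0-x)**2+(y0-y)**2)/2/sigma**2)*np.exp(2*sigma**2)
        # note: this rebinds `noise`, so the noise amplitude is rescaled
        # by a new random factor on every sample
        noise = np.random.randn()*noise
        yield model + noise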

In [6]:
# and finally wire up the dmm v1 to "measure" the gaussian

gauss = gauss_model(0.1, 0.2, 0.25)
next(gauss)

def measure_gauss(dac):
    val = gauss.send((dac.ch1.get(), dac.ch2.get()))
    next(gauss)
    return val

dmm.v1.get = partial(measure_gauss, dac)
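
The wiring above relies on Python's generator `send` protocol: the generator pauses at a bare `yield` to receive a value and then yields the result back. The following is a minimal sketch of that pattern in isolation. The names `running_scale` and `scaled` are hypothetical and chosen only for illustration; they are not part of QCoDeS.

```python
def running_scale(factor: float):
    """Hypothetical generator: receives a number via send() and yields it scaled."""
    while True:
        x = yield            # pause here until a value arrives through send()
        yield factor * x     # hand the result back to the caller of send()


gen = running_scale(2.0)
next(gen)                    # prime the generator: run up to the first bare `yield`


def scaled(x: float) -> float:
    result = gen.send(x)     # resume at `x = yield`, run to `yield factor * x`
    next(gen)                # advance back to the bare `yield` for the next call
    return result


values = [scaled(v) for v in (1.0, 2.5, -3.0)]
print(values)  # [2.0, 5.0, -6.0]
```

The extra `next(gen)` after each `send` mirrors `measure_gauss` above: it moves the generator past the result-yielding statement so that it is ready to receive the next input.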

We then perform a very basic experiment. To be able to demonstrate merging of datasets in Pandas we will perform the measurement in two parts.

In [7]:
# run the first half of the 2D sweep (dac.ch1 from -1 up to, but not including, 0)

with meas.run() as datasaver:

    for v1 in np.linspace(-1, 0, 200, endpoint=False):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val = dmm.v1.get()
            datasaver.add_result((dac.ch1, v1),
                                 (dac.ch2, v2),
                                 (dmm.v1, val))
            
    dataid = datasaver.run_id
df1 = datasaver.dataset.get_data_as_pandas_dataframe()['dmm_v1']

Starting experimental run with id: 478


In [8]:
# run the second half of the 2D sweep (dac.ch1 from 0 to 1, both included)

with meas.run() as datasaver:

    for v1 in np.linspace(0, 1, 201):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val = dmm.v1.get()
            datasaver.add_result((dac.ch1, v1),
                                 (dac.ch2, v2),
                                 (dmm.v1, val))
            
    dataid = datasaver.run_id
df2 = datasaver.dataset.get_data_as_pandas_dataframe()['dmm_v1']

Starting experimental run with id: 479


`get_data_as_pandas_dataframe` returns the data as a dict that maps the name of each measured (dependent) parameter to a DataFrame. Here we are only interested in the DataFrame of a single parameter, so we select that one from the dict.
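
To make the shape of that return value concrete, here is a small sketch that builds a stand-in dict by hand with plain Pandas. It does not call QCoDeS; the second entry `dmm_v2` and all numbers are made up for illustration (only `dmm_v1` is registered in the measurement above).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the returned dict: one DataFrame per
# dependent parameter, each indexed by the setpoints of that parameter.
index = pd.MultiIndex.from_product(
    [[-1.0, -0.5], [0.0, 0.5, 1.0]], names=["dac_ch1", "dac_ch2"]
)
frames = {
    "dmm_v1": pd.DataFrame({"dmm_v1": np.arange(6, dtype=float)}, index=index),
    "dmm_v2": pd.DataFrame({"dmm_v2": np.arange(6, dtype=float) * 10}, index=index),
}

# Select the DataFrame of a single parameter by its name.
df_v1 = frames["dmm_v1"]
print(list(df_v1.index.names))  # ['dac_ch1', 'dac_ch2']
print(df_v1.shape)              # (6, 1)
```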

## Working with Pandas

Let's first inspect the Pandas DataFrame. Note how both independent variables (the setpoints `dac_ch1` and `dac_ch2`) are used for the index. Pandas refers to this as a [MultiIndex](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html).

In [9]:
df1[:10]

dac_ch1,dac_ch2,dmm_v1
-1.0,-1.0,-0.0005073949
-1.0,-0.99,0.0007058781
-1.0,-0.98,9.828259e-05
-1.0,-0.97,-0.000179598
-1.0,-0.96,-0.0003483247
-1.0,-0.95,-7.151994e-05
-1.0,-0.94,3.202385e-05
-1.0,-0.93,-8.596531e-07
-1.0,-0.92,1.450387e-06
-1.0,-0.91,4.319205e-07


We can also reset the index to return a simpler view where all data points are simply indexed by a running counter. As we shall see below, this is needed in some situations.

In [10]:
df1.reset_index()[0:10]

,dac_ch1,dac_ch2,dmm_v1
0,-1.0,-1.0,-0.0005073949
1,-1.0,-0.99,0.0007058781
2,-1.0,-0.98,9.828259e-05
3,-1.0,-0.97,-0.000179598
4,-1.0,-0.96,-0.0003483247
5,-1.0,-0.95,-7.151994e-05
6,-1.0,-0.94,3.202385e-05
7,-1.0,-0.93,-8.596531e-07
8,-1.0,-0.92,1.450387e-06
9,-1.0,-0.91,4.319205e-07
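
As a small sketch of what `reset_index` does, the example below flattens a toy MultiIndex DataFrame and then rebuilds the original with `set_index`. The data is made up; only the column names follow the notebook.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[-1.0, -0.5], [0.0, 1.0]], names=["dac_ch1", "dac_ch2"]
)
df = pd.DataFrame({"dmm_v1": [0.1, 0.2, 0.3, 0.4]}, index=index)

# The index levels become ordinary columns and rows get a running counter.
flat = df.reset_index()
print(list(flat.columns))  # ['dac_ch1', 'dac_ch2', 'dmm_v1']

# The operation is reversible: set_index restores the MultiIndex.
restored = flat.set_index(["dac_ch1", "dac_ch2"])
print(restored.equals(df))  # True
```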


Pandas has built-in support for various forms of plotting. However, this does not support MultiIndex at the moment, so we use `reset_index` to make the data available for plotting.

In [11]:
df1.reset_index().plot.scatter('dac_ch1', 'dac_ch2', c='dmm_v1')

<matplotlib.axes._subplots.AxesSubplot at 0x1e4b4cfe1d0>

Merging two dataframes with the same labels is fairly simple.

In [12]:
df = pd.concat([df1, df2], sort=True)
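
The sketch below repeats the merge on two toy DataFrames so the result is easy to check by eye. `make_frame` is a hypothetical helper and the values are invented. Note that the `sort` argument of `pd.concat` concerns the non-concatenation axis (here the columns), so the example calls `sort_index` explicitly to order the rows by setpoint.

```python
import pandas as pd


def make_frame(ch1_values, offset):
    """Hypothetical helper building a small frame shaped like df1/df2."""
    index = pd.MultiIndex.from_product(
        [ch1_values, [0.0, 1.0]], names=["dac_ch1", "dac_ch2"]
    )
    data = [offset + i for i in range(len(index))]
    return pd.DataFrame({"dmm_v1": data}, index=index)


first = make_frame([-1.0, -0.5], offset=0.0)    # covers dac_ch1 < 0
second = make_frame([0.0, 0.5], offset=100.0)   # covers dac_ch1 >= 0

merged = pd.concat([first, second]).sort_index()

# The two halves should not share setpoints; duplicated index entries
# would make a later conversion to a regular grid ambiguous.
has_duplicates = merged.index.has_duplicates
print(has_duplicates)  # False
print(merged.shape)    # (8, 1)
```

This is why the first sweep above uses `endpoint=False`: the point `dac_ch1 = 0` is measured only in the second run.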

In [13]:
df.reset_index().plot.scatter('dac_ch1', 'dac_ch2', c='dmm_v1')

<matplotlib.axes._subplots.AxesSubplot at 0x1e4b5914588>

It is also possible to select a subset of data from the DataFrame based on the values of the index levels (here `dac_ch1` and `dac_ch2`).

In [14]:
df.loc[(slice(-1, -0.95), slice(-1, -0.97)), :]

dac_ch1,dac_ch2,dmm_v1
-1.0,-1.0,-0.0005073949
-1.0,-0.99,0.0007058781
-1.0,-0.98,9.828259e-05
-1.0,-0.97,-0.000179598
-0.995,-1.0,7.680241e-10
-0.995,-0.99,9.29848e-10
-0.995,-0.98,1.123969e-09
-0.995,-0.97,1.356443e-09
-0.99,-1.0,8.381701e-10
-0.99,-0.99,1.014774e-09
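
The same kind of selection can be written with `pd.IndexSlice`, which some find easier to read than nested `slice` objects. The sketch below uses a small made-up grid. Label-based slices in Pandas include both end points, and slicing a MultiIndex this way generally requires the index to be sorted, hence the call to `sort_index`.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [[-1.0, -0.5, 0.0, 0.5, 1.0], [-1.0, 0.0, 1.0]],
    names=["dac_ch1", "dac_ch2"],
)
df = pd.DataFrame({"dmm_v1": np.arange(len(index), dtype=float)}, index=index)
df = df.sort_index()

idx = pd.IndexSlice
# dac_ch1 in [-1, -0.5] and dac_ch2 in [-1, 0], end points included
subset = df.loc[idx[-1.0:-0.5, -1.0:0.0], :]
print(subset["dmm_v1"].tolist())  # [0.0, 1.0, 3.0, 4.0]
```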


## Working with XArray

In many cases, when working with data on a rectangular grid, it may be more convenient to export the data to an [XArray](http://xarray.pydata.org) Dataset or DataArray.

The Pandas DataFrame can be directly converted to an XArray [Dataset](http://xarray.pydata.org/en/stable/data-structures.html?#dataset):

In [15]:
xaDataSet = df.to_xarray()

In [16]:
xaDataSet

<xarray.Dataset>
Dimensions:  (dac_ch1: 401, dac_ch2: 201)
Coordinates:
  * dac_ch1  (dac_ch1) float64 -1.0 -0.995 -0.99 -0.985 ... 0.985 0.99 0.995 1.0
  * dac_ch2  (dac_ch2) float64 -1.0 -0.99 -0.98 -0.97 ... 0.97 0.98 0.99 1.0
Data variables:
    dmm_v1   (dac_ch1, dac_ch2) float64 -0.0005074 0.0007059 ... 1.039e-05
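
The conversion turns each level of the MultiIndex into a dimension and places the values on the full grid spanned by those levels. The sketch below, on made-up data, illustrates one consequence worth knowing: a setpoint combination that was never measured shows up as `NaN` in the resulting array. It assumes XArray is installed, as `to_xarray` requires it.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(-1.0, 0.0), (-1.0, 1.0), (0.0, 0.0)],   # (0.0, 1.0) was never measured
    names=["dac_ch1", "dac_ch2"],
)
df = pd.DataFrame({"dmm_v1": [1.0, 2.0, 3.0]}, index=index)

ds = df.to_xarray()            # requires xarray to be installed
grid = ds["dmm_v1"].values     # plain 2D numpy array, dac_ch1 along axis 0

print(grid.shape)              # (2, 2)
print(np.isnan(grid[1, 1]))    # True: the missing point is filled with NaN
```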

However, in many cases it is more convenient to work with an XArray [DataArray](http://xarray.pydata.org/en/stable/data-structures.html?#dataarray). A DataArray contains only a single dependent variable and can be obtained from the Dataset by indexing with the parameter name.

In [17]:
xaDataArray = xaDataSet['dmm_v1']

In [18]:
xaDataArray

<xarray.DataArray 'dmm_v1' (dac_ch1: 401, dac_ch2: 201)>
array([[-5.073949e-04,  7.058781e-04,  9.828259e-05, ...,  5.451526e-07,
         4.808069e-07,  4.233782e-07],
       [ 7.680241e-10,  9.298480e-10,  1.123969e-09, ...,  5.951812e-07,
         5.249305e-07,  4.622315e-07],
       [ 8.381701e-10,  1.014774e-09,  1.226624e-09, ...,  6.495409e-07,
         5.728740e-07,  5.044485e-07],
       ...,
       [ 1.991485e-08,  2.411094e-08,  2.914449e-08, ...,  1.543304e-05,
         1.361144e-05,  1.198566e-05],
       [ 1.854251e-08,  2.244944e-08,  2.713612e-08, ...,  1.436954e-05,
         1.267347e-05,  1.115972e-05],
       [ 1.725783e-08,  2.089408e-08,  2.525605e-08, ...,  1.337397e-05,
         1.179541e-05,  1.038654e-05]])
Coordinates:
  * dac_ch1  (dac_ch1) float64 -1.0 -0.995 -0.99 -0.985 ... 0.985 0.99 0.995 1.0
  * dac_ch2  (dac_ch2) float64 -1.0 -0.99 -0.98 -0.97 ... 0.97 0.98 0.99 1.0

In [29]:
fig, ax = plt.subplots(2,2)
xaDataArray.plot(ax=ax[0,0])
xaDataArray.mean(dim='dac_ch1').plot(ax=ax[1,0])
xaDataArray.mean(dim='dac_ch2').plot(ax=ax[0,1])
xaDataArray[200,:].plot(ax=ax[1,1])
fig.tight_layout()

Above we demonstrate a few ways to work with the data in a DataArray: it can be plotted directly, averaged along a dimension, or indexed to select a specific row or column.
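
Positional indexing such as `xaDataArray[200,:]` depends on knowing which grid point sits at position 200 (here `dac_ch1 = 0`). As a sketch of an alternative, the example below selects by coordinate value with `sel` on a small made-up DataArray; the array contents and sizes are illustrative only.

```python
import numpy as np
import xarray as xr

ch1 = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
ch2 = np.array([-1.0, 0.0, 1.0])
da = xr.DataArray(
    ch1[:, None] ** 2 + ch2[None, :],       # toy data on a 5 x 3 grid
    coords={"dac_ch1": ch1, "dac_ch2": ch2},
    dims=("dac_ch1", "dac_ch2"),
    name="dmm_v1",
)

by_position = da[2, :]                               # third point along dac_ch1
by_label = da.sel(dac_ch1=0.0)                       # exact coordinate value
nearest = da.sel(dac_ch1=0.1, method="nearest")      # closest grid point to 0.1
mean_over_ch1 = da.mean(dim="dac_ch1")               # average along dac_ch1

print(by_label.values)       # [-1.  0.  1.]
print(mean_over_ch1.values)  # [-0.5  0.5  1.5]
```

Selecting by label keeps the code valid if the grid resolution changes, whereas a hard-coded position would then point at a different setpoint.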