# Fitting Thermoelectric data
Models and data are from Danny/Kedar.

## Import Modules, Functions, and Data

`functions.py` has the Python implementations of all the helper functions (I used a previously written package, `fdint`, for the Fermi-Dirac integrals).
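The Fermi-Dirac integrals that `fdint` provides can also be computed by brute-force quadrature, which is useful for spot-checking a few values. This is a minimal sketch assuming the Gamma-normalized convention F_j(η) = (1/Γ(j+1)) ∫₀^∞ t^j / (1 + e^(t−η)) dt for j > −1. I haven't checked which convention `fdint` or `functions.py` uses (some definitions drop the 1/Γ(j+1) factor), so compare against a known value before mixing the two. `fermi_dirac_integral` is a hypothetical name and is not part of `fdint`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit, gamma


def fermi_dirac_integral(j, eta):
    """Gamma-normalized complete Fermi-Dirac integral F_j(eta), for j > -1.

    F_j(eta) = 1/Gamma(j+1) * integral_0^inf t**j / (1 + exp(t - eta)) dt
    (assumed convention; check it against whatever functions.py expects).
    """
    # expit(eta - t) == 1 / (1 + exp(t - eta)), evaluated without overflow for large t
    integrand = lambda t: t**j * expit(eta - t)
    value, _abserr = quad(integrand, 0.0, np.inf)
    return value / gamma(j + 1.0)


# Known closed forms: F_0(eta) = ln(1 + e^eta), and F_1(0) = pi^2 / 12
print(fermi_dirac_integral(0.0, 0.0), np.log(2.0))
print(fermi_dirac_integral(1.0, 0.0), np.pi**2 / 12)
```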

In [1]:
import numpy as np
from functions import *

In [2]:
celldata = {}
celldata['xdata'] = np.loadtxt('xdata.csv',delimiter=',')
celldata['ydata'] = np.loadtxt('ydata.csv',delimiter=',')
celldata['n'] = 40

## Validate Python implementation
I did a test evaluation in Matlab and Python with the same input parameters. Let's import the results and compare to make sure we're getting the same thing.

In [3]:
test_in = [-2.499946233286e1,1.833885014595e-3,-2.2588468610036e-3,8.6217332036812e-4]
test_y,test_S,test_Rou=tefunnew(celldata,test_in)
matlab_y = np.loadtxt('matlab_y.csv',delimiter=',')
matlab_S = np.loadtxt('matlab_S.csv',delimiter=',')
matlab_Rou = np.loadtxt('matlab_Rou.csv',delimiter=',')

In [4]:
y_pct_diff=(test_y-matlab_y)/matlab_y
print('There is an average of a %.2f%% difference (with a standard deviation of %.2f%%) between the Matlab and Python implementations in the y output.'%(round(100.0*np.mean(y_pct_diff),2),round(100.0*np.std(y_pct_diff),2)))

There is an average of a -3.27% difference (with a standard deviation of 0.05%) between the Matlab and Python implementations in the y output.


In [5]:
S_pct_diff=(test_S-matlab_S)/matlab_S
print('There is an average of a %.2f%% difference (with a standard deviation of %.2f%%) between the Matlab and Python implementations in the S output.'%(round(100.0*np.mean(S_pct_diff),2),round(100.0*np.std(S_pct_diff),2)))

There is an average of a -1.65% difference (with a standard deviation of 0.03%) between the Matlab and Python implementations in the S output.


In [6]:
Rou_pct_diff=(test_Rou-matlab_Rou)/matlab_Rou
print('There is an average of a %.3f%% difference (with a standard deviation of %.3f%%) between the Matlab and Python implementations in the Rou output.'%(round(100.0*np.mean(Rou_pct_diff),3),round(100.0*np.std(Rou_pct_diff),3)))

There is an average of a -0.051% difference (with a standard deviation of 0.002%) between the Matlab and Python implementations in the Rou output.


Okay, so the differences aren't zero (about 3% in y, 1.7% in S, and 0.05% in Rou), but they're small enough that I think we can work with them.
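The three comparison cells above repeat the same arithmetic, so a small helper can keep them consistent. This is a sketch; `relative_difference_stats` is a hypothetical name. It assumes no Matlab value is exactly zero, since the relative difference is undefined there.

```python
import numpy as np


def relative_difference_stats(test, reference):
    """Return (mean, std) of the element-wise relative difference, in percent.

    Same arithmetic as the comparison cells: (test - reference) / reference.
    Assumes no element of `reference` is zero.
    """
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rel = (test - reference) / reference
    return 100.0 * rel.mean(), 100.0 * rel.std()


# Possible usage with the arrays loaded earlier in the notebook:
# for name, py, ml in [('y', test_y, matlab_y), ('S', test_S, matlab_S), ('Rou', test_Rou, matlab_Rou)]:
#     mean, std = relative_difference_stats(py, ml)
#     print('%s: %.3f%% mean difference (std %.3f%%)' % (name, mean, std))
```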

## Fitting with Bayesim
Now let's do a fit to the data using the grid approach implemented in the `bayesim` code.
### Import Things

In [7]:
import sys
sys.path.append('../../')
import bayesim.model as bym
import bayesim.param_list as byp
import functions as tefcns # model functions implemented in a separate file to keep this notebook tidy
import deepdish as dd # for interacting with HDF5 files
from joblib import Parallel, delayed # to parallelize model computations

### Initialize
First, we set up the list of parameters to be fit and their ranges.

In [8]:
fp = byp.param_list()
"""
fp.add_fit_param(name='P0', val_range=[1e-34,1e-20], spacing='log', length=28, units='sec.')
fp.add_fit_param(name='fs', val_range=[-1,2], length=21, units='eV')
fp.add_fit_param(name='r', val_range=[-1,2], length=21)
fp.add_fit_param(name='Z', val_range=[-10,10], length=20)
"""
fp.add_fit_param(name='P0', val_range=[1e-34,1e-20], spacing='log', length=7, units='sec.')
fp.add_fit_param(name='fs', val_range=[-1,2], length=5, units='eV')
fp.add_fit_param(name='r', val_range=[-1,2], length=5)
fp.add_fit_param(name='Z', val_range=[-10,10], length=5)
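The bin edges and values that `m.fit_params` reports further down are consistent with splitting each range into `length` equal bins (equal in log10 for `P0`) and taking bin midpoints (geometric midpoints for log spacing). The sketch below reproduces those numbers and counts grid points. It's my reconstruction for illustration, not `bayesim`'s code, and `grid_edges_and_centers` is a hypothetical helper.

```python
import numpy as np


def grid_edges_and_centers(val_range, length, spacing='linear'):
    """Hypothetical re-derivation of a 1-D grid: `length` bins spanning `val_range`.

    Linear grids use arithmetic bin midpoints; log grids use geometric midpoints.
    """
    lo, hi = val_range
    if spacing == 'log':
        edges = np.logspace(np.log10(lo), np.log10(hi), length + 1)
        centers = np.sqrt(edges[:-1] * edges[1:])
    else:
        edges = np.linspace(lo, hi, length + 1)
        centers = 0.5 * (edges[:-1] + edges[1:])
    return edges, centers


# (val_range, length, spacing) for the coarse grid defined in the cell above
coarse = {'P0': ([1e-34, 1e-20], 7, 'log'), 'fs': ([-1, 2], 5, 'linear'),
          'r': ([-1, 2], 5, 'linear'), 'Z': ([-10, 10], 5, 'linear')}
# lengths of the commented-out finer grid
full_lengths = {'P0': 28, 'fs': 21, 'r': 21, 'Z': 20}

for name, (rng, n, spacing) in coarse.items():
    edges, centers = grid_edges_and_centers(rng, n, spacing)
    print(name, centers)

print('coarse grid points:', np.prod([n for _, n, _ in coarse.values()]))  # 7*5*5*5 = 875
print('finer grid points: ', np.prod(list(full_lengths.values())))         # 28*21*21*20 = 246960
```

Every parameter combination has to be evaluated at every experimental condition, so the coarse grid's 875 combinations versus 246,960 for the finer grid is what keeps the model computation below manageable on a laptop.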


Next, define the experimental conditions.

In [9]:
ec = ['T','R','n']

Now, set up the `bayesim.model` object. All we need to feed in are the parameters, experimental conditions, and name of the output variable.

In [10]:
m = bym.model(params=fp,ec=ec,output_var='P')

### Attach Experimental Observations
The next thing to do is to attach the observed data. I reformatted it to work with `bayesim` and saved an HDF5 file. You can see the format in the Excel sheet `TE_expt_data.xlsx`. Here I use only every third point (integer values of resistances) to speed up model computation and also because that's probably enough data.
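If the full data set is loaded as a pandas DataFrame with a resistance column `R` (an assumption; check the column names in `TE_expt_data.xlsx`), keeping only integer resistances could be sketched as below. The toy resistances step by 1/3, so the filter keeps every third row, matching the thinning described above. `keep_integer_resistances` is a hypothetical helper, and the toy data are made up.

```python
import numpy as np
import pandas as pd


def keep_integer_resistances(df, col='R', atol=1e-9):
    """Return only the rows whose `col` value is an integer, within `atol`."""
    r = df[col].to_numpy(dtype=float)
    return df[np.isclose(r, np.round(r), rtol=0.0, atol=atol)]


# Toy data (made up): resistances sampled in steps of 1/3
toy = pd.DataFrame({'T': 300.0, 'R': np.arange(0, 10) / 3.0})
sparse = keep_integer_resistances(toy)
print(sparse['R'].tolist())  # [0.0, 1.0, 2.0, 3.0]
```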

In [11]:
#m.attach_observations(fpath='TE_expt_data.h5')
m.attach_observations(fpath='TE_expt_data_sparse.h5')

Identified experimental conditions as ['n', 'T', 'R']. If this is wrong, rerun and explicitly specify them with attach_ec (make sure they match data file columns) or remove extra columns from data file.


### Attaching the Model
Next, we attach the model. In this example I'll precompute the modeled data and attach a file with the outputs. You could also attach the function used to do the modeling, but `bayesim` can't currently parallelize those computations, so I do them outside `bayesim` to take advantage of both cores on my laptop.
First, we write out a file with the list of all simulation points. (It's good practice to write this out rather than keep it only as a Python object, so we can pick up where we left off later.)
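Conceptually, the simulation list presumably pairs every combination of fit-parameter values with every experimental condition in the data. The sketch below builds such a table with `itertools.product` on a toy grid. It illustrates the idea and is not `bayesim`'s `list_model_pts_to_run` implementation; `list_model_points`, the parameter values, and the conditions are all made up.

```python
import itertools

import pandas as pd


def list_model_points(param_vals, ec_points):
    """Hypothetical sketch: one row per (parameter combination, experimental condition) pair."""
    param_names = list(param_vals)
    rows = []
    for combo in itertools.product(*(param_vals[name] for name in param_names)):
        for ec in ec_points:
            # merge the parameter values and the condition values into one row
            rows.append({**dict(zip(param_names, combo)), **ec})
    return pd.DataFrame(rows)


param_vals = {'fs': [-0.7, 0.5], 'r': [-0.1, 1.1]}  # toy subset of the grid
ec_points = [{'T': 300.0, 'R': 1.0, 'n': 40}, {'T': 300.0, 'R': 2.0, 'n': 40}]  # made-up conditions
sim_list = list_model_points(param_vals, ec_points)
print(len(sim_list))  # 2 x 2 parameter combinations x 2 conditions = 8
```

Each row carries both parameter and condition columns, which appears to be the shape the commented-out loop in `In [13]` expects when it pulls `sim[1][m.ec_names]` and `sim[1][m.param_names]` out of each row.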

This next cell should take about 30 seconds to evaluate, but if you don't want to do the model computations yourself you can skip it.

In [12]:
#m.list_model_pts_to_run('./sim_list.h5')

The code in the next cell does the actual model computations. On my two-core laptop, it takes about 24 minutes to evaluate. If your processor supports simultaneous multithreading (hyperthreading), as most modern processors do, set `n_jobs` to twice the number of physical cores on your machine to run this cell efficiently.

You can also skip this cell and evaluate the following one instead, which loads the results of the computation that I already ran. :)

In [13]:
#sim_list = dd.io.load('./sim_list.h5')
#outputs=Parallel(n_jobs=4,verbose=7)(delayed(tefunnew_singlept)(sim[1][m.ec_names],sim[1][m.param_names]) for sim in sim_list.iterrows())
#sim_list['P'] = outputs
#dd.io.save('sim_outputs.h5',sim_list)
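Rather than hard-coding `n_jobs=4`, you could read the CPU count at run time. Python's `os.cpu_count()` already reports logical CPUs (hardware threads), so on a hyperthreaded machine it already equals twice the physical core count and shouldn't be doubled again. joblib also accepts `n_jobs=-1` to mean "use all CPUs". The sketch below uses a made-up `toy_model` in place of `tefunnew_singlept` so that it runs on its own.

```python
import os

from joblib import Parallel, delayed


def toy_model(ec, params):
    """Stand-in for tefunnew_singlept: any picklable function of (conditions, parameters)."""
    return params['a'] * ec['R'] + params['b']


points = [({'R': float(r)}, {'a': 2.0, 'b': 1.0}) for r in range(8)]

# os.cpu_count() counts logical CPUs, so a 2-core hyperthreaded laptop typically reports 4,
# the same value used for n_jobs in the cell above. It can return None, hence the fallback.
n_jobs = os.cpu_count() or 1
outputs = Parallel(n_jobs=n_jobs)(delayed(toy_model)(ec, p) for ec, p in points)
print(outputs)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```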

In [15]:
sim_outputs = dd.io.load('sim_outputs.h5')

In [16]:
sim_outputs.query('abs(-8.000000-Z)/Z<1e-6 & abs(0.000000-P0)/P0<1e-6 & abs(-0.700000-fs)/fs<1e-6 & abs(-0.700000-r)/r<1e-6')

Empty DataFrame
Columns: [P0, fs, r, Z, n, T, R, P]
Index: []


In [14]:
m.attach_model(mode='file',fpath='sim_outputs.h5')

abs(-8.000000-Z)/Z<1e-6 & abs(0.000000-P0)/P0<1e-6 & abs(-0.700000-fs)/fs<1e-6 & abs(-0.700000-r)/r<1e-6


IndexError: index 0 is out of bounds for axis 0 with size 0
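The lookup appears to fail because of how the query string is built. The printed query formats `P0` with fixed-point `%f`, so a value like 1e-33 becomes `0.000000`, and `abs(0.000000-P0)/P0` is then exactly 1 for every positive `P0`. No row passes the `<1e-6` test, which matches the empty table returned by `In [16]` and would explain the `IndexError`. Dividing by `Z`, `fs`, or `r` instead of their absolute values also makes those terms negative for any row with a negative value, so they accept such rows unconditionally. A tolerant lookup that avoids both problems might look like the sketch below. `find_rows` is a hypothetical helper, not part of `bayesim`; formatting the numbers with `%.6e` instead of `%f` would also keep `P0` intact.

```python
import numpy as np
import pandas as pd


def find_rows(df, targets, rtol=1e-6):
    """Return rows of `df` matching every value in `targets` to relative tolerance `rtol`.

    np.isclose scales the tolerance by |target|, so it works for negative values and
    for tiny values such as P0 ~ 1e-33. With atol=0, a target of exactly 0 needs an exact match.
    """
    mask = np.ones(len(df), dtype=bool)
    for col, val in targets.items():
        mask &= np.isclose(df[col].to_numpy(dtype=float), val, rtol=rtol, atol=0.0)
    return df[mask]


# Toy table (made up) with two grid points that differ only in P0
toy = pd.DataFrame({'P0': [1e-33, 1e-31], 'fs': [-0.7, -0.7], 'r': [-0.7, -0.7], 'Z': [-8.0, -8.0]})
match = find_rows(toy, {'P0': 1e-33, 'fs': -0.7, 'r': -0.7, 'Z': -8.0})
print(len(match))  # 1
```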

In [17]:
m.fit_params

[{'edges': array([  1.00000000e-34,   1.00000000e-32,   1.00000000e-30,
           1.00000000e-28,   1.00000000e-26,   1.00000000e-24,
           1.00000000e-22,   1.00000000e-20]),
  'length': 7,
  'min_width': 1.5848931924611134,
  'name': 'P0',
  'spacing': 'log',
  'units': 'sec.',
  'val_range': [1e-34, 1e-20],
  'vals': array([  1.00000000e-33,   1.00000000e-31,   1.00000000e-29,
           1.00000000e-27,   1.00000000e-25,   1.00000000e-23,
           1.00000000e-21])},
 {'edges': array([-1. , -0.4,  0.2,  0.8,  1.4,  2. ]),
  'length': 5,
  'min_width': 0.06,
  'name': 'fs',
  'spacing': 'linear',
  'units': 'eV',
  'val_range': [-1, 2],
  'vals': array([-0.7, -0.1,  0.5,  1.1,  1.7])},
 {'edges': array([-1. , -0.4,  0.2,  0.8,  1.4,  2. ]),
  'length': 5,
  'min_width': 0.06,
  'name': 'r',
  'spacing': 'linear',
  'units': 'unitless',
  'val_range': [-1, 2],
  'vals': array([-0.7, -0.1,  0.5,  1.1,  1.7])},
 {'edges': array([-10.,  -6.,  -2.,   2.,   6.,  10.]),
  'length': 5