
Arni/extract henry coeff #100

Merged
merged 19 commits into from Apr 11, 2019

Conversation

Surluson
Collaborator

Added two functions (along with docstrings and tests):
extract_henry_coefficient:
Takes a DataFrame, the names of the pressure and adsorption columns, and the number of points to use when extracting the Henry coefficient.
I used MultivariateStats for a linear least-squares regression (llsq). One issue I found is that the llsq function doesn't accept the DataFrame columns directly, because their element type is usually a Union of Floats and Missings.
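The slope extraction described above amounts to a one-parameter least-squares fit through the origin. Here is a minimal sketch of the idea, written in Python/NumPy purely for illustration (the package itself is Julia, and the function name here is hypothetical):

```python
import numpy as np

def extract_henry_coefficient(pressure, loading, n_pts=3):
    """Fit loading ~ KH * pressure through the origin, using the
    first n_pts low-pressure points (the Henry regime)."""
    # coercing to float plays the role of stripping the
    # Union{Float64, Missing} element type mentioned above
    p = np.asarray(pressure, dtype=float)[:n_pts]
    n = np.asarray(loading, dtype=float)[:n_pts]
    # least-squares slope through the origin: KH = (p . n) / (p . p)
    KH = np.dot(p, n) / np.dot(p, p)
    rmse = np.sqrt(np.mean((n - KH * p) ** 2))
    return KH, rmse
```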

fit_langmuir:
Here I linearize N = (M K P) / (1 + K P) and solve it with llsq as well. I check whether the first pressure point is 0, because after linearizing I divide by the pressure (which blows up if P = 0).
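For concreteness, dividing through N = M K P / (1 + K P) gives 1/N = 1/(M K P) + 1/M, so regressing 1/N on 1/P yields slope 1/(M K) and intercept 1/M. A Python/NumPy sketch of this linearized fit (hypothetical names; the actual code is Julia):

```python
import numpy as np

def fit_langmuir_linearized(p, n):
    """Fit N = M K P / (1 + K P) by regressing 1/N on 1/P."""
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    if p[0] == 0.0:               # 1/P blows up at P = 0
        p, n = p[1:], n[1:]
    slope, intercept = np.polyfit(1.0 / p, 1.0 / n, 1)
    M = 1.0 / intercept           # intercept = 1 / M
    K = intercept / slope         # slope = 1 / (M K)  =>  K = 1 / (M * slope)
    return M, K
```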

@SimonEnsemble
Owner

very nice and clean code!

  • Could you also return information about the goodness of fit, e.g. the error? This way we can assess the fit systematically. Does this package provide error bounds on the slope? So KH = 12 +/- 3.0? If so, that would be really helpful for assessing goodness of fit and automatically selecting how many points to include.
  • For Langmuir fitting, I typically minimize sum over i pressure points:
    [n_i - M * K * p_i / (1 + K * p_i)]^2
    which is a non-linear optimization problem. You first linearized the Langmuir isotherm, which will not minimize the above. I haven't thought much about which is better (it is probably context dependent). Do you know the difference between the two methods? I imagine one will over-emphasize high or low pressure data points? This well-cited paper suggests that the linearized version (while easier) is not a good way to go: https://pubag.nal.usda.gov/download/5945/PDF The Optim.jl package is good for non-linear optimization.
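The nonlinear objective above can be minimized directly; as an illustration, here is a Python sketch using SciPy's curve_fit (a stand-in for Optim.jl, since the package itself is Julia):

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(p, M, K):
    return M * K * p / (1.0 + K * p)

def fit_langmuir_nonlinear(p, n, M0=1.0, K0=1.0):
    """Minimize sum_i [n_i - M K p_i / (1 + K p_i)]^2 over (M, K),
    starting from the guesses (M0, K0)."""
    popt, _ = curve_fit(langmuir, np.asarray(p, dtype=float),
                        np.asarray(n, dtype=float), p0=[M0, K0])
    M, K = popt
    return M, K
```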

@Surluson
Collaborator Author

For now, I added an RMSE between the fitted slope and the points used in the fit. Error bounds on the slope would require us to assume some error in the data points, no?

@SimonEnsemble
Owner

The RMSE is smallest if you include only two points, so based on RMSE you would always choose two points to estimate the Henry coefficient. There is a tradeoff here, though: including more points might mean more confidence in the slope estimate, but also less confidence if you stray outside the Henry regime. So how do we systematically choose the number of points to include in a Henry coefficient calculation?

The error estimate in the slope is based on Gaussian distributed noise, but the variance of that Gaussian is inferred from the data. See here: https://www.chem.utoronto.ca/coursenotes/analsci/stats/ErrRegr.html I'm not 100% sure the estimate of error in slope is the way to go, but certainly RMSE is not suitable on its own since that always leads to the conclusion to use only two points.
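The slope-error estimate from that reference can be sketched as follows (a Python illustration, assuming Gaussian residuals whose variance is inferred from the data; names are hypothetical):

```python
import numpy as np

def slope_with_std_error(x, y):
    """Ordinary least-squares slope b and its standard error,
    s_b = s / sqrt(sum (x_i - xbar)^2), where the residual standard
    deviation s is estimated from the data with k - 2 degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = len(x)
    b, a = np.polyfit(x, y, 1)                            # y ~ b x + a
    s = np.sqrt(np.sum((y - (b * x + a)) ** 2) / (k - 2))  # residual std dev
    s_b = s / np.sqrt(np.sum((x - x.mean()) ** 2))
    return b, s_b
```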

Code looks good, except in practice you need a more automatic guess for K and M; I suspect it will rarely converge to a global minimum with your default guesses for K and M; it will often get stuck in a local minimum. In pyIAST, I use 1.1 times the last data point as the saturation loading, and use the first data points to get a Henry coefficient, giving K = KH / M. But you already thought of a better way to get a good starting point! You can keep your linearized function for Langmuir fitting and use its results as starting params for the nonlinear fitting routine! 👍 Maybe it would be beautiful to keep one function with a method=:nonlinear default option passed, which calls method=:linear for starting params.

@SimonEnsemble
Owner

@Surluson this might be stale, could you re-make a pull request now that we have Travis working?

Also, it looks like the Langmuir guess is a poor starting point; we can do better than a default guess of M = 1.0, which will probably only rarely converge. If you guess M as, say, 1.1 times the maximum loading in the data, it will be more robust. Then you can estimate the Henry coefficient from the first point and get the Langmuir K from that as a better estimate. (This is simpler, two lines of code, than the linear fitting I suggested above.)
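That two-line guessing strategy might look like this (a Python sketch with hypothetical names; the real code operates on a DataFrame and column names, and uses KH = M K from the Langmuir model to convert the Henry slope into a K guess):

```python
import numpy as np

def guess_langmuir_params(p, n):
    """Starting guesses: M0 from 1.1x the largest observed loading,
    KH from the first nonzero-pressure point, and K0 = KH / M0
    (since KH = M K in the Langmuir model)."""
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    i = int(np.argmax(p > 0.0))   # index of first nonzero-pressure point
    KH = n[i] / p[i]              # Henry coefficient from the first point
    M0 = 1.1 * n.max()            # saturation loading guess
    K0 = KH / M0                  # KH = M K  =>  K = KH / M
    return M0, K0
```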

@SimonEnsemble
Owner

@Surluson checks failed.

  • looks like there are commented-out lines there?
  • I just realized we can generalize this, since there is a lot of repeated code. How about model can be :henry or :langmuir? Then we have one function fit_isotherm(df, pressure_col_name, n_col_name, model), and a _guess(df, model) that guesses the params specific to each model. The guess for the Henry coefficient can be the slope based on the first data point; for Langmuir, the slope of the first data point, then 1.1 times the maximum value of adsorbed gas in the data. So instead of using the llsq function, we use the non-linear fitting routine for generality; only the objective function differs between isotherms. This will allow us to incorporate all sorts of isotherm models, and it is as easy as adding if model == :mymodel, then adding a guess and writing the objective function and functional form for that model.
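The generalization sketched in that bullet, with model-specific guesses and one shared nonlinear fitting routine, might look roughly like this (a Python sketch with hypothetical names; the actual package would use Julia symbols like :langmuir):

```python
import numpy as np
from scipy.optimize import curve_fit

# model name -> functional form n(p; params); purely illustrative
MODELS = {
    "henry":    lambda p, KH: KH * p,
    "langmuir": lambda p, M, K: M * K * p / (1.0 + K * p),
}

def _guess(p, n, model):
    """Model-specific starting parameters, as described above."""
    KH = n[0] / p[0] if p[0] > 0.0 else n[1] / p[1]  # slope from first point
    if model == "henry":
        return [KH]
    M0 = 1.1 * n.max()            # 1.1x the max adsorbed amount
    return [M0, KH / M0]          # KH = M K  =>  K0 = KH / M0

def fit_isotherm(p, n, model="langmuir"):
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    params, _ = curve_fit(MODELS[model], p, n, p0=_guess(p, n, model))
    return params
```

Adding a new model then only requires a functional form and a guess.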

@Surluson
Collaborator Author

Surluson commented Mar 9, 2019

It seems all the tests failed because MultivariateStats wasn't in the REQUIRE file.
And I really like that idea; I'll make the changes ASAP.

@SimonEnsemble
Owner

@Surluson to get this PR merged how about we simplify it and

  • return the MSE without trying some way of normalizing it
  • assume the user will trim the dataframe themselves if they don't want any points to be used in the fitting (i.e. do not pass n_pts as the number of points to use or, more ambitiously, choose n_pts automatically for Henry fits)

Owner

@SimonEnsemble SimonEnsemble left a comment


  • @test isapprox(henry, 0.9688044280548711): what is the rationale for why it should be 0.96? Can you add a comment in the test to explain?
  • what is the point of the prepend!() of zeros?
  • should we use "H" as the name for the Henry coefficient to distinguish it from "K", since the Langmuir constant is not the Henry coefficient in the Langmuir model?
  • if θ0["K0"] is the guess returned from the guessing function, why are we multiplying it by 1.1 in the optimize function for the Henry coeff? Should we build the 1.1 into the guess function instead?
  • n[2]/p[2] etc. should be n[2] / p[2] for style, so that we can't look at a piece of code and tell who wrote it
  • it seems you can save some lines of code in _guess, since the Henry coefficient needs to be guessed for both models.
  • the Henry constant in the Langmuir model is not equal to K, so the guess for K0 for the Langmuir model is not correct; shouldn't it be divided by M? As stated above, you could get the Henry coefficient guess and then divide it by M for the Langmuir K, since H = K * M in the Langmuir model if you write it out

@Surluson
Collaborator Author

I've made some edits to the code.
As for the first point: this is the value I got from calculating the slope from the first 3 points. At the time, the code was also trying to find the optimal number of data points to use for the fitting, so this test also served to make sure exactly 3 data points were used in the fitting procedure.

Owner

@SimonEnsemble SimonEnsemble left a comment


  • what does min_mse do?
  • what does henry_tol do?

these seem to be artifacts of automatically choosing the number of points?

@Surluson
Collaborator Author

Yeah, sorry, those shouldn't be there. They've been removed.

@SimonEnsemble
Owner

  • fit_isotherm --> fit_adsorption_isotherm
  • simplified the Henry test (I thought we didn't need to load in a .csv and have an additional file lying around just for this simple test, when we could put the data in the code)
  • added comments to explain guesses, plus comment to explain what _guess is doing.

please review my changes and approve, then we can merge the PR; it looks good to me!

@Surluson
Collaborator Author

Everything looks good to me 👍
You'll have to approve the changes though, because I made the pull request.

@SimonEnsemble SimonEnsemble merged commit 6f759f3 into master Apr 11, 2019
@SimonEnsemble SimonEnsemble deleted the arni/extract_henry_coeff branch April 11, 2019 22:59