
Bayesian Model for competition assay #127

Merged
merged 15 commits into from
Apr 12, 2019

Conversation

sonyahanson
Contributor

First added simple competition model to bindingmodels.py

@sonyahanson
Contributor Author

Added an example notebook for the new competition Bayesian model; it also compares against the very slow general binding model.

@jchodera
Member

jchodera commented Dec 9, 2017

I looked through 4-Analytic-v-General-Bayesian-Model.ipynb but I have no idea what it is showing.

@sonyahanson
Contributor Author

It basically shows that the AnalyticalBindingModel (link below) works, and that it is better and faster than the general binding model.
https://github.com/choderalab/assaytools/blob/Competition_Bayes/AssayTools/bindingmodels.py#L123-L208

Let me know if you want to go through it in person.

@sonyahanson
Contributor Author

Going to add a function in our analytical binding model to prevent this error from happening (in ipynb):

/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/bindingmodels.py:195: RuntimeWarning: invalid value encountered in arccos
  theta = np.arccos((-2*(a**3)+9*a*b-27*c)/(2*np.sqrt((a**2-3*b)**3)))
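This `invalid value encountered in arccos` warning is typical of the arccos argument drifting just outside `[-1, 1]` through floating-point round-off. A common guard (a sketch of the kind of fix described above, not the code actually committed here) is to clip the argument before calling `np.arccos`:

```python
import numpy as np

def safe_arccos_theta(a, b, c):
    """Theta for the trigonometric solution of the binding cubic, with the
    arccos argument clipped to [-1, 1] to absorb floating-point round-off.
    (Illustrative guard, not the committed fix.)"""
    arg = (-2 * a**3 + 9 * a * b - 27 * c) / (2 * np.sqrt((a**2 - 3 * b)**3))
    arg = np.clip(arg, -1.0, 1.0)  # e.g. 1.0000000000000002 -> 1.0
    return np.arccos(arg)
```

The clip is safe because, for physically valid coefficients, the exact argument lies in `[-1, 1]`; only round-off pushes it outside.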

@sonyahanson
Contributor Author

This seems to have worked just fine: no more error, and the DeltaG trace is a bit less crazy:
(image: before-vs-after comparison of the traces; orange is the trace of the DeltaG of our competitor.)

Note that there are still other errors popping up in the notebook:

/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/pymcmodels.py:240: RuntimeWarning: overflow encountered in exp
  scaling[large_indices] = (1 - np.exp(-alpha[large_indices])) / alpha[large_indices]
/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/ipykernel_launcher.py:178: RuntimeWarning: overflow encountered in exp
/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/pymcmodels.py:95: RuntimeWarning: invalid value encountered in true_divide
  mu = np.log(mean**2 / np.sqrt(stddev**2 + mean**2))
/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/pymcmodels.py:72: RuntimeWarning: invalid value encountered in multiply
  tau = 0*mean
/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/pymcmodels.py:86: RuntimeWarning: divide by zero encountered in reciprocal
  tau[small_indices] = (x[small_indices] - x[small_indices]**2/2 + x[small_indices]**3/3 - x[small_indices]**4/4 + x[small_indices]**5/5)**(-1)

does this function need to be vectorised?

I used map to do what I think you are alluding to here, perhaps vectorizing would be better...
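On the "does this function need to be vectorised?" question: one alternative to a Python-level `map` is boolean-mask indexing, which processes the whole array inside numpy and lets each branch use a numerically appropriate formula. A sketch (the cutoff value is illustrative, and this is not the code in this PR):

```python
import numpy as np

def scaling_factor(alpha, cutoff=1e-4):
    """Vectorized (1 - exp(-alpha)) / alpha using boolean masks: a Taylor
    series near zero avoids the 0/0 case, and expm1 keeps precision for
    values just above the cutoff."""
    alpha = np.asarray(alpha, dtype=float)
    out = np.empty_like(alpha)
    small = np.abs(alpha) < cutoff
    # Taylor expansion of (1 - exp(-a))/a about a = 0: 1 - a/2 + a^2/6
    out[small] = 1.0 - alpha[small] / 2.0 + alpha[small]**2 / 6.0
    # 1 - exp(-a) == -expm1(-a), computed without cancellation
    out[~small] = -np.expm1(-alpha[~small]) / alpha[~small]
    return out
```

Mask-based branches like this are usually both faster than `map` and easier to audit for the RuntimeWarnings listed above.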

@sonyahanson
Contributor Author

So right now this PR includes the addition of the Analytic Competition Binding model to bindingmodels.py; to be complete, it should probably also include the appropriate changes to the make_model function in pymcmodels.py.

For now I've just been making my own simplified version of this make_model function in ipynb's for testing.

One reason I have yet to incorporate this into the real pymcmodels.py is that I'm still seeing the errors mentioned in the comment above, and am not sure if they are related to numerical instabilities and can be easily corrected.

Furthermore, I have been playing a bit with simulated data, and the fits seem to get stuck in a way that the real experimental data isn't exhibiting (see stuckness below). The latest commit in this PR is an ipynb example of this attempt at analyzing simulated data. Perhaps this is also related to some numerical instability that can be easily fixed?

(image: DeltaG trace getting stuck during sampling)

@jchodera
Member

Furthermore, I have been playing a bit with simulated data, and the fits seem to get stuck in a way that the real experimental data isn't exhibiting (see stuckness below). The latest commit in this PR is an ipynb example of this attempt at analyzing simulated data. Perhaps this is also related to some numerical instability that can be easily fixed?

I think this is the issue we talked about on Wednesday. Does your simulated data have experimental measurement noise in it? Or is it still getting stuck even with that noise?
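For reference, a minimal sketch of what measurement noise "in the right form" might look like for simulated plate-reader data: additive Gaussian detector noise plus a signal-proportional component. The function name and noise parameters are illustrative, not taken from this PR:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_fluorescence(true_signal, sigma_abs=1.0, sigma_rel=0.02):
    """Add Gaussian measurement noise to a simulated fluorescence signal:
    a constant detector floor (sigma_abs) plus a component proportional
    to the signal (sigma_rel), combined in quadrature."""
    true_signal = np.asarray(true_signal, dtype=float)
    sigma = np.sqrt(sigma_abs**2 + (sigma_rel * true_signal)**2)
    return true_signal + rng.normal(0.0, sigma)
```

If the simulated data were noiseless, the posterior can collapse onto the exact solution and the sampler can appear to "stick", which may be what the trace above is showing.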

@sonyahanson
Contributor Author

It has noise, but it is possible it is not in the right form...

@sonyahanson
Contributor Author

@jchodera did you ever get a chance to look at the numerical instabilities in this model and see if there are any easy fixes?

@jchodera
Member

Not yet, but I hope to dive in on Tuesday!

@sonyahanson
Contributor Author

Note, setting F_PL does not seem to prevent bistability. This is the delG trace when using a prior on F_PL:
(image: delG trace with a prior on F_PL)

@sonyahanson
Contributor Author

So I chatted with @chaya about some potential numerical instabilities in the analytical binding model, and these were the candidates we came up with:

@jchodera do you have any opinions about either of these?

@steven-albanese
Member

np.exp and np.sqrt would be unstable for very big or very small numbers. I'm not that suspicious of anything involving np.exp, but to check np.sqrt we should look at what values we are usually getting for a and b.

For mathematical functions in pymc3, they recommend using theano's built-in functions. Is there something equivalent for pymc, or is that numpy?

@sonyahanson
Contributor Author

@jchodera did you get a chance to take a look at these numerical instabilities? I'll probably take a stab this week if you have any quick suggestions.

@sonyahanson
Contributor Author

Since it has gotten buried in other comments, here's a rehash of the errors that I think are related to the numerical stability of this competition binding model. These are the ones I came across in a trial run today while trying to get some more insight into this, e.g. by looking at other traces besides just delG.

pymcmodels.py:240: RuntimeWarning: overflow encountered in exp
  scaling[large_indices] = (1 - np.exp(-alpha[large_indices])) / alpha[large_indices]
  
ipykernel_launcher.py:126: RuntimeWarning: overflow encountered in exp

pymcmodels.py:72: RuntimeWarning: invalid value encountered in multiply
  tau = 0*mean
  
pymcmodels.py:86: RuntimeWarning: divide by zero encountered in reciprocal
  tau[small_indices] = (x[small_indices] - x[small_indices]**2/2 + x[small_indices]**3/3 - x[small_indices]**4/4 + x[small_indices]**5/5)**(-1)
  
pymcmodels.py:95: RuntimeWarning: invalid value encountered in true_divide
  mu = np.log(mean**2 / np.sqrt(stddev**2 + mean**2))
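The `true_divide` warning on the `mu = np.log(...)` line fires when `mean` reaches zero. A moment-matched lognormal parameterization with a small positive floor on the mean keeps both `mu` and `tau` finite. This is a guard sketch under the assumption that zero means are the trigger; the floor value is illustrative:

```python
import numpy as np

def lognormal_params(mean, stddev, floor=1e-12):
    """Moment-matched lognormal parameters (mu, tau), flooring the mean so
    that log() and the division never see zero. For X ~ LogNormal(mu, 1/tau):
    E[X] = mean and Var[X] = stddev**2 when mean, stddev > 0."""
    mean = np.maximum(np.asarray(mean, dtype=float), floor)
    stddev = np.asarray(stddev, dtype=float)
    mu = np.log(mean**2 / np.sqrt(stddev**2 + mean**2))
    tau = 1.0 / np.log(1.0 + (stddev / mean)**2)
    return mu, tau
```

The `tau = 0*mean` and reciprocal warnings quoted above look like the same zero-mean inputs propagating NaN/Inf downstream, so a single floor at the source may silence several of these at once.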

@jchodera
Member

It appears that, somehow, these quantities are going negative:

    ELC_ex = epsilon_ex*path_length*concentration
    ELC_em = epsilon_em*path_length*concentration

I'm not sure how this can happen, since all of these quantities must be positive.

@jchodera
Member

Specifically, it looks like the concentrations are going negative in the notebook!

@jchodera
Member

They're not just a little negative either:

AssertionError: concentration should not be negative, but are: [ 3.29791690e-13 -1.18127773e+02  1.08564837e-07  7.92493225e-14
  9.20966796e-03 -1.82372674e+01  5.82184201e-10  3.67619973e+01
  1.51951215e-09  2.74082464e-03  2.72987858e-11  0.00000000e+00]

That's -118 Molar!

@jchodera
Member

@sonyahanson : OK, the problem is your binding model:

    221         # Check all concentrations are nonnegative
--> 222         assert np.all(P >= 0)
    223         assert np.all(L >= 0)
    224         assert np.all(PL >= 0)

I've added some assertions to the model that check to make sure the concentrations are all nonnegative, and this is failing. Since you derived this model, can you fix it?

This is the problematic function.

I'd suggest switching to the GeneralBindingModel I wrote.

@jchodera
Member

jchodera commented Mar 25, 2019

Based on our chat today:

@jchodera will

@sonyahanson will

  • Make sure the CompetitionBindingModel code implements the equations---we suspect there is a small error leading to large negative concentrations

@jchodera
Member

@sonyahanson: I think I've sorted out the issues we discussed today!

I've committed some changes that make the standard TwoComponentBindingModel more robust to negative concentrations.

I've also discovered the issue with the warnings from your use of GeneralBindingModel! The issue was that it only accepts log concentrations, but you were feeding it log(0) since some wells had zero ligand concentration in them. For now, I've set these to be the log of 1/1000 the smallest nonzero concentration, but I think I can modify the API for the general binding model to accept and return concentrations instead of log concentrations if it's inconvenient to be careful to feed it valid logarithms.
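The workaround described here (flooring zero concentrations at 1/1000 of the smallest nonzero concentration before taking logs) can be sketched like this. The function name is illustrative, not the committed code:

```python
import numpy as np

def safe_log_concentrations(conc, fraction=1e-3):
    """Replace zeros with `fraction` times the smallest nonzero concentration
    before taking logs, so that log(0) = -inf never reaches the model."""
    conc = np.asarray(conc, dtype=float)
    floor = fraction * conc[conc > 0].min()
    return np.log(np.where(conc > 0, conc, floor))
```

Example: for wells with concentrations `[0.0, 1e-6, 1e-3]`, the zero well is treated as `1e-9` (1/1000 of `1e-6`) before the log.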

I've checked in an updated Jupyter notebook with my changes, but note that I haven't fixed your CompetitiveBindingModel, so that part of the notebook still fails.

I've created an issue for me to implement a simpler API for you:
#141

The GeneralBindingModel is still super slow. I'll have to see how we can speed that up.

@sonyahanson
Contributor Author

Hmmm. Travis is still failing, but for a different reason now...

@jchodera
Member

jchodera commented Mar 26, 2019

Looks like a few things are going on here:

  • linux builds are failing with:
/home/travis/miniconda3/envs/test/lib/python2.7/site-packages/assaytools/bindingmodels.py:94: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  Ltot = np.zeros(Ltot, dtype)
unexpected array size: size=1, arr_size=0, rank=1, effrank=0, arr.nd=24, dims=[ 1 ], arr.dims=[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
  • osx builds are failing with
  File "/Users/travis/miniconda3/envs/test/lib/python2.7/site-packages/assaytools/pymcmodels.py", line 497, in top_complex_fluorescence_model
    [P_i, L_i, PL_i] = TwoComponentBindingModel.equilibrium_concentrations(DeltaG, Ptrue[:], Ltrue[:])
  File "/Users/travis/miniconda3/envs/test/lib/python2.7/site-packages/assaytools/bindingmodels.py", line 93, in equilibrium_concentrations
    Ptot = np.zeros(Ptot, dtype)
TypeError: 'numpy.float64' object cannot be interpreted as an index

I'll try to debug.
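Both tracebacks point at the same pattern: `np.zeros(Ptot, dtype)` interprets its first argument as an array *shape*, so passing a float scalar raises the Python 2.7 TypeError and passing an array triggers the deprecation and array-size errors. If the intent was to coerce scalars to 1-d float arrays, a likely-shaped fix (a guess from the traceback, not the actual commit) would be:

```python
import numpy as np

def as_1d_float_array(x):
    """Coerce a scalar or array-like to a 1-d float64 array, instead of the
    broken pattern `np.zeros(x, dtype)`, which treats x as an array shape."""
    return np.atleast_1d(np.asarray(x, dtype=np.float64))
```

With a helper like this, `Ptot` and `Ltot` can be normalized once at the top of `equilibrium_concentrations` and the rest of the function can assume arrays.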

@jchodera
Member

@sonyahanson : I think I've fixed the bug that prevented travis tests from running through.
Now I'm seeing some conda HTTP errors, so I've retriggered the builds that failed because of that---the others pass.

I understand we may see lots of HTTP errors until we migrate to conda-forge, so we may want to think about migrating assaytools to conda-forge and using only conda-forge dependencies as soon as things are a bit more stable.

@jchodera
Member

Tests pass now @sonyahanson

@jchodera
Member

@sonyahanson : I think we should be able to merge your PR!

@sonyahanson
Contributor Author

Sure, will do. I just climbed out of a grant-resubmission hole, so I still want to check whether there is an obvious answer for why you are seeing negative concentrations with the analytical binding model, but I can do that in a new PR.

@sonyahanson sonyahanson merged commit 8181150 into master Apr 12, 2019