Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diffusion Map Implementation in MDAnalysis #857

Closed
jdetle opened this issue May 20, 2016 · 11 comments
Closed

Diffusion Map Implementation in MDAnalysis #857

jdetle opened this issue May 20, 2016 · 11 comments

Comments

@jdetle
Copy link
Contributor

jdetle commented May 20, 2016

Hi All,
As far as I'm concerned, I'd like to go ahead and refactor @euhruska's work instead of starting off from scratch. So far it looks like he did a good job in implementing all the principle components (see what I did there?) of a diffusion map algorithm, but there are some changes and improvements that can definitely be made:

Refactor to satisfy Bauhaus Style
The current implementation is left as a function and needs to be remade to inherit from BaseAlign,

Parallelize Trajectory Analysis
There are a few routes to go about doing this. I think it would be interesting to try using deco, as it could lay the groundwork for doing this simply in other analysis modules. (Can _single_frame() be made easily concurrent? Or is this limited by how our Reader is working?) If that doesn't work quickly, we could move on and work on something using cython and the like, as demonstrated by this.

Optimize memory allocation
Find all possible areas to prevent copying of arrays where possible, unfortunately the eigenvalue
problem is one area where we need to load an entire matrix into memory.

Provide API for introduction of new metric
A metric function should be able to be provided as an optional argument in initialization, with rmsd used if none is given. RMSD is a somewhat non-useful metric from the literature I read.

Perform thorough testing on real-world data
The most important part of this work is to make sure it actually works, using @euhruska's PR rather than starting from scratch again gives us more time to validate it against data and make sure everything is performing as it should. Of course, I would also work on covering the code.

@jbarnoud
Copy link
Contributor

Can _single_frame() be made easily concurrent? Or is this limited by how our Reader is working?

The question got discussed in several threads. The most relevant being #618 and #719. I also invite you to have a look at MDReader from @mnmelo, as it can already do some parallelization.

There is room for a lot more discussion on the matter. If I well remember, some of the issues we had are fixed in the new topology scheme or by #831.

@jdetle
Copy link
Contributor Author

jdetle commented May 20, 2016

Thanks Jonathan! This is a bit of a rabbit's hole to fall into but I'm doing my best to take notes and take in the vast amount of discussion going on here. Obviously there will be some work to be done.

@jdetle
Copy link
Contributor Author

jdetle commented May 23, 2016

So far the threads of interest seem to be:
#719 --> #314, #618
#618 --> #363, #643, stack overflow on multiprocessing

#293, #797, topology improvements multicore by Tyler Reddy seem all useful as well.

I am getting started on refactoring as we speak.

@kain88-de
Copy link
Member

Parallelize Trajectory Analysis

lets get it working first. This can be done in a later PR. The same goes for the memory optimization

Provide API for introduction of new metric

This should have a high priority. RMSD is OK but in often other metrics are better suited.

@jdetle
Copy link
Contributor Author

jdetle commented May 23, 2016

First priority is getting it working first definitely. On that and
finishing align today.

On Mon, May 23, 2016 at 8:26 AM kain88-de notifications@github.com wrote:

Parallelize Trajectory Analysis

lets get it working first. This can be done in a later PR. The same goes
for the memory optimization

Provide API for introduction of new metric

This should have a high priority. RMSD is OK but in often other metrics
are better suited.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#857 (comment)

Have a wonderful day,
John J. Detlefs

@orbeckst orbeckst changed the title Proposal: Diffusion Map Implementation in MDAnalysis Diffusion Map Implementation in MDAnalysis Jun 6, 2016
@orbeckst orbeckst added this to the 0.16.0 milestone Jun 6, 2016
@orbeckst
Copy link
Member

@jdetle, did you use @euhruska 's code as a basis for this PR (commit a142422 says "refactor @euhruska diffusion map")? If so, he should also be an author in the CHANGELOG, AUTHORS, and the docs.

@orbeckst
Copy link
Member

Ok – I looked into the commits and @euhruska is definitely an author. I will fix this and directly commit to develop.

@jdetle
Copy link
Contributor Author

jdetle commented Jul 22, 2016

Holy crap!!! If I didn't attribute him anywhere it certainly wasn't
intentional.

On Fri, Jul 22, 2016 at 4:09 PM Oliver Beckstein notifications@github.com
wrote:

Ok – I looked into the commits and @euhruska https://github.com/euhruska
is definitely an author. I will fix this and directly commit to develop.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#857 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AKcARrMQrGX5KQGihdG7TymV_LzMK76Uks5qYU2lgaJpZM4Ii4ov
.

Have a wonderful day,
John J. Detlefs

@jdetle
Copy link
Contributor Author

jdetle commented Jul 22, 2016

I could have sworn he was in CHANGELOG and docs, perhaps I overlooked AUTHORS

@orbeckst
Copy link
Member

I did not check the docs. With d00244c he's now in CHANGELOG and AUTHORS.

Apologies to @euhruska !

@jdetle
Copy link
Contributor Author

jdetle commented Jul 22, 2016

Ditto! Sorry @euhruska, completely unintentional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants