Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: issue 659: DCD Python 3 #682

Closed
wants to merge 14 commits into from
Closed

Conversation

tylerjereddy
Copy link
Member

This is a work in progress branch to get DCD code working with Python 3 #659. The git history will be messy as I am incorporating changes from simpletraj, mixing in my own modifications, and dealing with 0.10 to 0.11 deprecations on both the Python and C API side (simpletraj was modified to use DCD reading only with Python 3 + deprecated MDA syntax--i.e., numframes was causing issues on the C API side).

As of my latest commit after executing the following in Python 3.5:

import MDAnalysis
import MDAnalysis.coordinates.DCD
from MDAnalysis.tests.datafiles import PSF, DCD
u = MDAnalysis.Universe(PSF, DCD)
print('DCD universe:', u)
print('DCD universe atoms:', u.atoms)
print('DCD trajectory:', u.trajectory)

The output obtained (ignoring the built-in MDA python 3 caution message):

DCD universe: <Universe with 3341 atoms and 3365 bonds>
DCD universe atoms: <AtomGroup with 3341 atoms>
DCD trajectory: <DCDReader /Users/treddy/.local/lib/python3.5/site-packages/MDAnalysisTests-0.14.0.dev0-py3.5.egg/MDAnalysisTests/data/adk_dims.dcd with 98 frames of 3341 atoms>

It is not currently possible to iterate over a trajectory (and writing is almost certainly not possible yet either), but this is a first step.

…is and reading in a DCD trajectory in Python 3.5. Can probe the atoms in the trajectory, but can't iterate over the trajectory frames yet.
@orbeckst
Copy link
Member

@tylerjereddy , thanks for taking it on!

If there are any modifications that originated with simpletraj, can you please rewrite your history so that the original author appears as the author in the commit (and then add them to AUTHORS). If the commits are not cleanly separable then please still make sure that everyone is properly credited.

Once it all works we can rewrite history and compact it.


"""
from __future__ import absolute_import, division, print_function, unicode_literals
from __future__ import absolute_import
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if you could keep the imports.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, many / most of the docstrings got obliterated in the frenzy of trying to 'get it working' as well. I think Alexander had removed some content (including docstrings, Writer object, etc.) he didn't need, so the workflow was something like taking his versions of the dcd python / c modules, then restoring some critical / current MDA code to register the reader and deal with the API changes from the last few months, and then start pasting the DCD Writer and Timeseries objects back in. I suspect there is a smart way to progressively merge in changes from different repos / branches at certain line numbers only, but the extra 'git fu,' at this stage, would probably just detract from my attempts to engineer a solution.

I suspect I'll have to try to get it working first and then there will be some pretty annoying damage control on assigning credit and making sure I don't get credited for just pasting stuff that already existed, etc. Conversely, the disregard for making gradual changes in the workflow could end up making the overall task take longer in the end!

@tylerjereddy
Copy link
Member Author

Python 3.5 DCD trajectory iteration appears to have been restored now as well. Still plenty left to do though.

@tylerjereddy
Copy link
Member Author

When loading in a CRD format trajectory, there is clearly an issue placing the coordinates of the first frame over the initial PSF coordinates:

u = MDAnalysis.Universe(PSF, CRD)

Python 3.5 debug (print statement) output:

**new.shape: (3341,)
**new[0]: <map object at 0x105858940>
**new positions: [array(<map object at 0x105858940>, dtype=object)
 array(<map object at 0x1058589b0>, dtype=object)
 array(<map object at 0x105858b38>, dtype=object) ...,
 array(<map object at 0x105b33fd0>, dtype=object)
 array(<map object at 0x105b360b8>, dtype=object)
 array(<map object at 0x105b36160>, dtype=object)]

Python 2.7 debug (print statement) output:

('**new.shape:', (3341, 3))
('**new[0]:', array([-11.921,  26.307,  10.41 ]))
('**new positions:', array([[-11.921,  26.307,  10.41 ],
       [-11.447,  26.741,   9.595],
       [-12.44 ,  27.042,  10.926],
       ..., 
       [-12.616,  28.099,  21.258],
       [-13.786,  28.568,  21.198],
       [-12.417,  26.877,  21.494]]))

The same row count is a good sign. Taking a wild guess--the map objects may reflect the fact that Python 3 returns iterators instead of lists in a lot of cases now.

@tylerjereddy
Copy link
Member Author

Ok, unpacking the map iterator in CRD.py restores the normal functionality it seems.

@jbarnoud
Copy link
Contributor

It looks indeed like an array of map generator. The easy solution is to wrap the map in a list function. The clean way would be to replace the map by a list comprehension. Depending on where the map is, it may not even be needed. map is often used in constructs like np.array(map(float, line.split())), this convoluted construct can be replaced by np.array(line.split(), dtype=float).

In the time I took to write this comment, the last commit poped up. And indeed, the map in part of the convoluted construct.

@tylerjereddy
Copy link
Member Author

@jbarnoud @hainm Thanks, I've removed the spurious usage of the map function in that module to clean things up.

@tylerjereddy
Copy link
Member Author

Note that I had to modify dcdtimeseries.c to get things going. I just remembered this is actually compiled from Cython so having to manually edit that rather than making the upstream Cython change is probably not ideal from a maintainability standpoint. It is proving much more difficult to achieve the appropriate result with the Cython file itself (i.e., to compile down and avoid the deprecated PyCObject stuff). I'll post the issue here if I'm stuck for a long time on it, but I think you'll agree the changes really do need to happen in .pyx file in the long-term.

…coordinate / trajectory writing. Supersedes previous changes to the compiled dcdtimeseries.c
coordinate / trajectory reading. Supersedes previous changes to the compiled dcdtimeseries.c
@tylerjereddy
Copy link
Member Author

Ok, it looks like I've re-enabled DCD coordinate / trajectory reading with Cython-level changes in the pyx file instead of the manual hack on the compiled c.

@kain88-de
Copy link
Member

which cython version are you running. It should produce py2/py3 compatible c files. They even promote that feature saying cython is the fastest way to get py2/py3 compatibility.

@tylerjereddy
Copy link
Member Author

I'm using the latest Cython (0.23.4), as also indicated at the top of the generated .c file. No, it definitely does not deal with the use of deprecated / now-removed PyCObject type declarations. You can find some examples of people complaining about this stuff on various mailing lists, but the devs just say that the changes could not be avoided and were deprecated for a sufficient amount of time. Anyway, it is just a few extra if statements checking Python version here and there. If you don't put them, those deprecated objects will be generated and Python 3 will have no idea what to do with them.

@tylerjereddy
Copy link
Member Author

Note that this is related to the C API itself more than Cython proper, they just happen to be objects used in Cython for our project. See: https://docs.python.org/2/c-api/cobject.html

@tylerjereddy
Copy link
Member Author

The current challenge is to deal with this:
DCDReader._read_dcd_header = lambda self: _dcdmodule._read_dcd_header( self ) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 120: invalid continuation byte

which probably points to the PyUnicode_AsUTF8 code block in _read_dcd_header in dcd.c

I'm also generally observing some issues at a very basic level with TypeError: expected string or Unicode object, file found-- with Unicode-related problems not surprising me in a Python 2/3 situation.

@tylerjereddy
Copy link
Member Author

The UTF-8 issue seems to pop up when dealing with reading / writing triclinic unit cell data (Issue #187).

@backpropper
Copy link
Contributor

I am working on this issue. Can someone guide me through what is already done and where to start?

@tylerjereddy
Copy link
Member Author

@abhinavgupta94 The discussion above suggests that I managed to get DCD trajectory reading and iteration going (tentatively), but that trajectory writing remained problematic. You could go through the discussion, look at the commits in this PR to see what code I was modifying, and maybe look at what Alexander Rose did in his package (see the original issue)--which I basically modified for MDAnalysis proper.

Beyond that, you could simply install MDAnalysis from this branch with Python 3 and try reading / writing DCD files to see what kinds of problems crop up, and then try to squash them systematically.

@dotsdl
Copy link
Member

dotsdl commented Jul 18, 2016

What's the status of this? From the recent talk given by @orbeckst we'd like to be fully Python 3 by the end of the year, and to my knowledge this is the last barrier.

@tylerjereddy
Copy link
Member Author

@dotsdl Unless someone else has been working on it I don't think there has been any additional progress. I still have some extra information in my lab notes from debugging work on this. Initially someone else was going to take over, but that was probably fuelled by Google Code excitement.

I can try to refocus my efforts on this again, but it will be volunteered time and I can't guarantee an expeditious resolution. If another core dev has more free time they are certainly free to take over (nobody is currently assigned).

@dotsdl
Copy link
Member

dotsdl commented Jul 20, 2016

@tylerjereddy in the interest of getting MDAnalysis feature-complete for Py3, I can probably take this on this winter (Nov-Dec). Depends on what @orbeckst needs from me, though.

@dotsdl dotsdl self-assigned this Jul 20, 2016
@kain88-de
Copy link
Member

superseded by #929

@kain88-de kain88-de closed this Nov 27, 2016
@kain88-de kain88-de deleted the issue-659-DCDpy3-3 branch January 24, 2018 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants