Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python code for manipulating CJSON #712

Closed
ghutchis opened this issue Jul 30, 2021 · 13 comments · Fixed by #1427
Closed

Python code for manipulating CJSON #712

ghutchis opened this issue Jul 30, 2021 · 13 comments · Fixed by #1427

Comments

@ghutchis
Copy link
Member

CJSON often has large arrays for performance reasons. Sometimes it's nice to have representations with lists of atoms with x,y,z.

IIRC, there's some Python code for interconverting CJSON representations and atom[i]["coordinates"] = [x, y, z]

We should add code like this to the python module to facilitate manipulating / creating CJSON representations.

@ghutchis
Copy link
Member Author

@MSoegtropIMC if you have some particular suggestions / requests, please leave some comments here.

@MSoegtropIMC
Copy link

I certainly can make a suggestion for a Python class structure to represent CJSON. I wonder if there is some official documentation for CJSON - I couldn't find any.

Are you sure that this one big array method is really noticeably faster or is this an assumption? At some point one has to do something with the read in data, so I would think at some point in time one has to take it apart anyway. To me this looks a bit like saving time once during read but spending it multiple times later. Also I would think that for really large files which come in from the disk, constructing python objects is still faster than reading JSON from disk.

@adityaomar3
Copy link
Contributor

adityaomar3 commented Sep 9, 2023

Can you please help me out with code or file in which changes have to be made also what does 'atom[i]["coordinates"] = [x, y, z]' repreasants. @ghutchis

@ghutchis
Copy link
Member Author

Here's an example CJSON file (gzipped for GitHub):
h2o.cjson.gz

{
  "atoms": {
    "coords": {
      "3d": [
        -1.1664071083068848,
        -0.06516236066818237,
        -7.939339639051468e-07,
        -1.1341277640126144,
        0.4850611475774994,
        -0.7737656886248392,
        -2.039371142295559,
        -0.035370620243069625,
        0.3735580701221557
      ]
    },
    "elements": {
      "number": [
        8,
        1,
        1
      ]
    },
…

Note that to edit / change coordinates in the JSON, you'd do something like:

cjson["atoms"]["coords"]["3d"][3] = 4.5 # changing the x coordinate of atom 1

Note that the coordinates are set as a numeric array.

The request is to build up a Python class for manipulating the CJSON, e.g.

  • parse the CJSON into the internal class representation
  • export the class representation back into CJSON
  • access a list of Atoms (i.e., atoms[i]) with properties (coordinates, element, formal charge)
  • set the coordinates of an atom [i] from a 3x1 numpy array
  • access a list of Bonds (i.e., bonds[i]) with properties (bond order, start and end atom index)

This doesn't need to be one patch. It's probably more of a long-term goal. For example, take a look at:
https://github.com/paulboone/avogadro-plugin/blob/master/avogadro_plugin/utils.py

@MSoegtropIMC
Copy link

Sorry that I didn't follow up on this - I went a bit further down from chemistry to first principles with my problem, but should come back again some time, so I would appreciate progress with this.

Is there already a python class to represent an atoms and bonds? If not, I can suggest one.

As @ghutchis described, the idea is to have two classes "Atom" and "Bond" and to convert CJSON to lists of these classes.

I would find it convenient if the Atoms would have a list of their bonds, but this might be a bit slow - one could maybe do this as a separate step.

@ghutchis
Copy link
Member Author

Is there already a python class to represent an atoms and bonds? If not, I can suggest one.

That depends. There's obviously support to parse CJSON into Avogadro Atoms / Bonds. I think the question is whether there's a benefit to working with the CJSON without the Avogadro pieces.

@MSoegtropIMC
Copy link

There's obviously support to parse CJSON into Avogadro Atoms / Bonds

Are these C++ classes or do you already have python wrappers for these C++ classes?

@ghutchis
Copy link
Member Author

We have Python wrappers built using pybind11:
https://pypi.org/project/avogadro/

There are a few examples here:
https://github.com/OpenChemistry/jupyter-examples/blob/master/avogadro.ipynb

The main question is more "what do people want" from the bindings. It's really easy to add pieces to avogadro/python/* but it helps to know use-cases because each bit needs to be added to pybind11.

@adityaomar3
Copy link
Contributor

@ghutchis I understood the feature request/problem, I am starting with the issue but can you please help me with directory, in which directory I need to make/push the .py file, bit new here so its bit confusing, would avogadrolibs/scripts be fine?

@ghutchis
Copy link
Member Author

ghutchis commented Oct 4, 2023

This code would be part of the avogadro Python package (e.g., on PyPI) which is found here: https://github.com/OpenChemistry/avogadrolibs/tree/master/python/avogadro

Now you might wonder why there's only __init__.py - the core and io pieces are compiled.

So something like cjson would be a new module file in that directory.

Similarly, I'd like to move the class from avogadro-remote script as a connect module file into that directory too. That way someone can do something like:

from avogadro import connect

for filename in glob.iglob("*.xyz"):
    connect.openFile(filename)
    connect.saveGraphic(filename[:-4] + ".png")

@adityaomar3
Copy link
Contributor

adityaomar3 commented Oct 4, 2023

If you say so should I push the avogadro-remote.py script here https://github.com/OpenChemistry/avogadrolibs/tree/master/python/avogadr as connect.py or you would do it?
Also please review or test it whenever you get time. OpenChemistry/avogadroapp#406

@adityaomar3
Copy link
Contributor

#1360 I have added this script to support reading, getting data and retrieving it back into cjson.
Please review it @ghutchis @MSoegtropIMC

@ghutchis ghutchis linked a pull request Jan 4, 2024 that will close this issue
@ghutchis
Copy link
Member Author

ghutchis commented Jan 4, 2024

Okay, initial support has been merged. More suggestions are welcome.

@ghutchis ghutchis closed this as completed Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants