marrink-lab/bentopy

bentopy—packs stuff in boxes

State

This project is under development and solely for internal use. Many parts are in flux, and no guarantees about correctness or stability can be made.

Installation

Prerequisites

bentopy uses Rust to speed up some I/O operations on large files. Hence, a Rust compiler is required during installation. To check whether one is available, you can run

cargo --version

If it is not present, you can install it by any means you prefer. Installation through rustup is very convenient.

Install bentopy through pip directly

If you don't care about peeking into the sources and just want access to the program, this is the quickest option.

python3 -m venv venv && source venv/bin/activate # Not required, but often convenient.
pip3 install git+https://github.com/marrink-lab/bentopy

From source

git clone https://github.com/marrink-lab/bentopy
cd bentopy
python3 -m venv venv && source venv/bin/activate
pip3 install .

Usage

bentopy currently features four subcommands: pack, render, mask, and grocat.

You can learn about the available options through the help information.

bentopy --help
bentopy pack --help

A typical bentopy workflow may look like this.

bentopy grocat -> bentopy mask -> bentopy pack -> bentopy render -> bentopy grocat

What follows is a brief explanation and example invocation of these subcommands. A more detailed walkthrough can be found in the example section.

pack

pack provides the core functionality of bentopy. Given an input configuration file, a packing of the input structures within the specified space is created.

bentopy pack input.json --rearrange --seed 5172

Pack a system defined in input.json. Prior to packing, rearrange the specified structures according to a size heuristic to improve the possible density and set the random seed to 5172.

render

This packing is stored as a placement list, which is a json file that describes which structures at what rotations are placed where. In order to create a structure file (and topology file) from this placement list that can be read by molecular visualization and simulation programs, the render subcommand can be used.

bentopy render placements.json structure.gro -t topol.top

Render placements.json created by pack to a gro file at structure.gro and write a topology file to topol.top.

mask

To set up a configuration for pack, you must define a space into which the structures will be packed. This space can be defined according to an analytical function, such as a sphere. However, bentopy can also pack arbitrary spaces provided as voxel masks. Any boolean numpy array of the correct dimensions, stored as a compressed file (.npz), can function as a valid mask.

The mask subcommand provides a convenient and powerful means of setting up such masks based on your existing structures from the command line. mask can be used to automatically or manually select different compartments as determined by mdvcontainment.

bentopy mask chrom_mem.gro mask.npz --autofill

Determine the compartments in chrom_mem.gro and automatically select the innermost compartment (--autofill). From that selected compartment, write a mask to mask.npz.

grocat

As the name suggests, grocat is a tool for concatenating gro files. Though this is a relatively simple operation, grocat provides a convenient way of telling apart different sections of large models by optionally specifying a new residue name for a whole file in the argument list by appending :<residue name> to a file path.

bentopy grocat chromosome.gro:CHROM membrane.gro:MEM -o chrom_mem.gro

Concatenate chromosome.gro and membrane.gro into chrom_mem.gro, setting the residue names of the chromosome atoms to CHROM and those of the membrane to MEM in the concatenated structure.
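The path:<residue name> convention can be illustrated with a short Python sketch. This is not bentopy's own argument parser; the helper name split_grocat_argument is hypothetical, and the sketch simply assumes that everything after the first colon is the residue name.

```python
from typing import Optional, Tuple


def split_grocat_argument(argument: str) -> Tuple[str, Optional[str]]:
    """Split 'file.gro:RESNAME' into (path, residue name).

    Hypothetical helper for illustration only; not part of bentopy.
    """
    path, separator, resname = argument.partition(":")
    if not separator:
        # No colon given: the residue names in the file are left alone.
        return argument, None
    return path, resname


for argument in ["chromosome.gro:CHROM", "membrane.gro:MEM", "water.gro"]:
    print(split_grocat_argument(argument))
```

An argument without a colon yields no residue name, mirroring the fact that the renaming is optional.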

Example

Let's try to pack a spherical system that is full of lysozyme structures. First, we want a structure to pack, so we can download the structure for 3LYZ. We place it in a structures directory to stay organized.

wget https://files.rcsb.org/download/3lyz.pdb
mkdir structures
mv 3lyz.pdb structures

Input configuration

Now we can set up our input configuration, which we will call 3lyz_input.json:

{
	"space": {
		"size": [100, 100, 100],
		"resolution": 0.5,
		"compartments": [
			{
				"id": "main",
				"shape": "spherical"
			}
		]
	},
	"output": {
		"title": "3lyz",
		"dir": "output",
		"topol_includes": [
			"forcefields/forcefield.itp",
			"structures/3lyz.itp"
		]
	},
	"segments": [
		{
			"name": "3lyz",
			"number": 6500,
			"path": "structures/3lyz.pdb",
			"compartments": ["main"]
		}
	]
}
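If you prefer to generate such a configuration from a script, the following Python sketch builds the same structure and serializes it with the standard json module. The check that every segment refers to a defined compartment is an assumption about what a sensible configuration looks like, not a documented bentopy rule.

```python
import json

config = {
    "space": {
        "size": [100, 100, 100],
        "resolution": 0.5,
        "compartments": [{"id": "main", "shape": "spherical"}],
    },
    "output": {
        "title": "3lyz",
        "dir": "output",
        "topol_includes": [
            "forcefields/forcefield.itp",
            "structures/3lyz.itp",
        ],
    },
    "segments": [
        {
            "name": "3lyz",
            "number": 6500,
            "path": "structures/3lyz.pdb",
            "compartments": ["main"],
        }
    ],
}

# Assumed sanity check: each segment should only name compartments that exist.
defined = {compartment["id"] for compartment in config["space"]["compartments"]}
for segment in config["segments"]:
    missing = set(segment["compartments"]) - defined
    if missing:
        raise ValueError(
            f"segment {segment['name']!r} uses undefined compartments: {sorted(missing)}"
        )

# Serialize with tab indentation, as in the hand-written example.
text = json.dumps(config, indent="\t")
print(text)
```

Redirecting the printed output to 3lyz_input.json gives a file equivalent to the hand-written one.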

Space

We set the space up to a size of 100×100×100 nm, with a resolution of 0.5 nm. The mask—the volume that defines where structures can be placed—is set to be derived from a spherical analytical function.

In case you want to use a custom mask, such as one created with bentopy mask, you could specify the space in the following manner.

      	"compartments": [
      		{
      			"id": "main",
-     			"shape": "spherical"
+      			"voxels": { "path": "mask.npz" }
      		}
      	]

Here, voxels and the associated path point to a precomputed voxel mask. This mask can be any data that can be loaded by np.load() to be interpreted as a three-dimensional boolean mask. The shape of the provided mask must equal the size specified in the space section divided by the resolution. For the example above, a size of 100×100×100 nm at a resolution of 0.5 nm calls for a mask of 200×200×200 voxels.
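The size requirement can be checked with a few lines of Python before packing. This is a minimal sketch: the function name expected_mask_shape is hypothetical, and rounding to the nearest integer is an assumption, since how bentopy treats sizes that are not an exact multiple of the resolution is not described here.

```python
from typing import Sequence, Tuple


def expected_mask_shape(size: Sequence[float], resolution: float) -> Tuple[int, ...]:
    """Number of voxels per axis for a space of `size` (nm) at `resolution` (nm).

    Rounding is an assumption made for this illustration.
    """
    return tuple(round(length / resolution) for length in size)


shape = expected_mask_shape([100, 100, 100], 0.5)
print(shape)  # (200, 200, 200)
```

Comparing this tuple against the shape of the array stored in your mask file is a quick way to catch a mismatch early.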

Constraining compartments.

A compartment definition can also take a constraint parameter. Currently, only the axis predicate is available, which constrains all placements in that compartment such that only placements with the specified value for that axis are considered valid. The following example of a compartment definition accepts placements as valid if and only if the z-component of a placement is at 50 nm.

		{
			"id": "flat",
			"constraint": "axis:z=50.0",
			"shape": "cuboid"
		}
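To make the format of the constraint string concrete, here is a small Python sketch that takes it apart and tests a position against it. It is an illustration of the predicate:axis=value layout seen in the example, not bentopy's parser; both function names are hypothetical, and the exact-equality test is an assumption based on the "if and only if" wording above.

```python
from typing import Tuple


def parse_axis_constraint(constraint: str) -> Tuple[str, float]:
    """Parse a string such as 'axis:z=50.0' into ('z', 50.0)."""
    predicate, _, argument = constraint.partition(":")
    if predicate != "axis":
        raise ValueError(f"unknown predicate {predicate!r}")
    axis, separator, value = argument.partition("=")
    if axis not in ("x", "y", "z") or not separator:
        raise ValueError(f"malformed axis constraint {constraint!r}")
    return axis, float(value)


def satisfies(position_nm: Tuple[float, float, float], constraint: str) -> bool:
    """True if the position has exactly the required value on the constrained axis."""
    axis, value = parse_axis_constraint(constraint)
    return position_nm["xyz".index(axis)] == value


print(parse_axis_constraint("axis:z=50.0"))
print(satisfies((10.0, 20.0, 50.0), "axis:z=50.0"))
```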

Output

In output, we set a title and directory to write the placement list to. With the optional field topol_includes, we can specify which itp files are to be included if the placement list produced from this config is written to a topology file (.top).

Note

For this example, we filled this field with dummy paths.

Segments

Finally, in the segments section, we define a list of structures to place. In our case there is only one, which we give the name "3lyz", and we set the number of segments to place to 6500. The path tells pack where the structure file for this segment can be found.

Important

The name record must be selected carefully. If you want to write out a valid topology file using bentopy render, the value of name must correspond to the names in the itp files.

Constraining segment rotations and setting a center adjustment.

For some structures, it can be helpful or necessary to constrain the rotation of certain segments. The rotation_axes parameter takes a string with the axes over which a structure may be randomly rotated. Any axes that are not mentioned will not be rotated. For instance, the axes definition "xyz" indicates full rotational freedom and is the tacit default (rotation is allowed over x, y, and z axes), while "z" constrains the rotation such that it may only occur over the z-axis, leaving x and y rotation as provided in the structure file.

The center parameter can be used to provide an offset in nm. When omitted, its default value is "auto, auto, auto", which defines the center as the geometric center of the structure. Any of the three values can be replaced by a floating point value, which sets an adjustment from the auto center.

See #24, which tracks the development of an additional keep parameter, which would respect the center for some axis as its zero-value in the structure file.

		{
			"name": "1a0s",
			"number": 100,
			"path": "structures/1a0s.pdb",
			"rotation_axes": "z",
			"center": "auto, auto, -1.2",
			"compartments": ["flat"]
		}

With the above segment definition, up to 100 instances of the structure will be placed in the compartment with the id "flat", with a -1.2 nm offset from its geometric center along the z-axis, while only allowing rotation over its z-axis.
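The two string parameters can be read as follows. This Python sketch only mirrors the description given here; the function names are hypothetical and the code is not taken from bentopy. In it, None stands for "auto", meaning no adjustment from the geometric center on that axis.

```python
from typing import Dict, List, Optional


def parse_center(center: str = "auto, auto, auto") -> List[Optional[float]]:
    """Return one entry per axis: None for 'auto', otherwise the offset in nm."""
    parts = [part.strip() for part in center.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated values, got {center!r}")
    return [None if part == "auto" else float(part) for part in parts]


def parse_rotation_axes(axes: str = "xyz") -> Dict[str, bool]:
    """Map each axis to whether random rotation over it is allowed."""
    return {axis: axis in axes for axis in "xyz"}


print(parse_center("auto, auto, -1.2"))
print(parse_rotation_axes("z"))
```

For the segment above, only the z entry of the center carries an offset, and only rotation over z is enabled.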

pack

Now, we are ready to pack the system. We could simply do this as follows.

bentopy pack 3lyz_input.json

In order to make the procedure deterministic, the --seed parameter can be set. This means that the same command will produce the same output between runs.

bentopy pack --seed 1312 3lyz_input.json

In case we want to pack multiple structures, we may want to pass the --rearrange flag, as well. This will re-order the structures such that large structures are placed first, and small structures are placed last. This placement heuristic can lead to denser packings. When it is not set, the order of the structures in the input configuration is respected.
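The idea behind the reordering can be sketched in Python as a sort from large to small. The size measure bentopy actually uses is not described here, so the size values below are made up for illustration, as are the segment names.

```python
# Hypothetical segments with a made-up size measure (for example, an atom count).
segments = [
    {"name": "small", "size": 120},
    {"name": "large", "size": 9500},
    {"name": "medium", "size": 1300},
]

# Largest first. sorted() is stable, so segments of equal size keep their
# order from the input configuration.
rearranged = sorted(segments, key=lambda segment: segment["size"], reverse=True)
print([segment["name"] for segment in rearranged])
```

Placing the large structures while the space is still empty leaves the gaps for the small ones, which is why this ordering can lead to denser packings.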

After the command finishes, we will find that output/3lyz_placements.json has been created. This is a single-line json file, which can be hard to inspect. If you are curious, you can use a tool such as jq to look at what was written in a more readable form.

jq . output/3lyz_placements.json

The output may look like this (some lines have been cut and adjusted for legibility).

{
	"title": "3lyz",
	"size": [ 100, 100, 100 ],
	"topol_includes": [ ... ],
	"placements": [
		{
			"name": "3lyz",
			"path": "structures/3lyz.pdb",
			"batches": [
				[
					[
						[ 1.0, 0.0, 0.0 ],
						[ 0.0, 1.0, 0.0 ],
						[ 0.0, 0.0, 1.0 ]
					],
					[
						[  8, 46, 68 ],
						[ 26, 62, 88 ],
						... many many more of such lines ...
					]
				],
				[
					[
						[   0.3658391780537972, -0.3882572475566672, -0.8458238619952991  ],
						[  -0.8851693094147572, -0.4258733932991502, -0.18736901171236636 ],
						[ -0.28746650147647396,  0.8172442490465064, -0.49947457185455224 ]
					],
					[
						[ 31, 41, 56 ],
						[ 61, 53,  4 ],
						... many many more of such lines ...
					]
				]
				... and on and on and on ...
			]
		}
	]
}
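A placement list can also be inspected from Python. The sketch below assumes the layout shown above, where each batch pairs one 3×3 rotation matrix with the list of positions that use it, and counts the placements per segment. The embedded sample is a shortened stand-in with made-up batches, not real pack output.

```python
import json

# A tiny stand-in for a placement list, following the layout shown above.
sample = """
{
  "title": "3lyz",
  "size": [100, 100, 100],
  "placements": [
    {
      "name": "3lyz",
      "path": "structures/3lyz.pdb",
      "batches": [
        [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
         [[8, 46, 68], [26, 62, 88]]],
        [[[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
         [[31, 41, 56]]]
      ]
    }
  ]
}
"""

placement_list = json.loads(sample)
counts = {}
for placement in placement_list["placements"]:
    # Each batch is a pair: [rotation matrix, positions placed with that rotation].
    total = sum(len(positions) for _rotation, positions in placement["batches"])
    counts[placement["name"]] = total

print(counts)  # {'3lyz': 3}
```

To run this on real output, replace the json.loads(sample) call with json.load() on the opened placement list file.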

render

render reads in the placement list and writes out a gro file (and optionally, a top topology file). This is a separate operation, since the packed systems can become very large. Storing the placement list as an intermediate product decouples the hard task of packing from the simple work of writing it into a structure file.

We want to render out the placement list we just created into a structure file called 3lyz_sphere.gro. Additionally, we would like to produce a topology file (topol.top) that Gromacs uses to understand how the structure file is built up.

bentopy render output/3lyz_placements.json 3lyz_sphere.gro -t topol.top

You can now inspect the 3lyz_sphere.gro structure in a molecular visualization program of your preference.

But beware! We just created a big structure, and some programs may have a hard time keeping up.

Luckily, bentopy render has some additional tricks up its sleeve to ease this load.

In case you want to inspect only a small part of a very large placement list, the --limits option allows you to select a cuboid within the volume defined by the placement list from which the placed structures will be rendered. The volume that is cut out is defined by a sequence of six comma-separated values in the order minx,maxx,miny,maxy,minz,maxz. If a value is a number, it is interpreted as a dimension in nm. If it is not a number (the phrase 'none' is conventional) no limits are set on that dimension.

For example, to only render a 10×10×10 nm cube extending from the point (40, 40, 40) to (50, 50, 50), we can pass the following limits.

bentopy render output/3lyz_placements.json 3lyz_small_cube.gro --limits 40,50,40,50,40,50

Perhaps we would like to see a pancake instead! To do this, we can define the limits only for the z-direction.

bentopy render output/3lyz_placements.json 3lyz_pancake.gro --limits none,none,none,none,45,55
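The way a limits string is interpreted can be sketched in Python. This is an illustration of the rule described above (numbers are limits in nm, anything else means no limit), not the code bentopy uses. The function names are hypothetical, and treating the bounds as inclusive is an assumption.

```python
from typing import List, Optional, Sequence


def parse_limits(limits: str) -> List[Optional[float]]:
    """Parse 'minx,maxx,miny,maxy,minz,maxz'; non-numeric entries become None."""
    values: List[Optional[float]] = []
    for part in limits.split(","):
        try:
            values.append(float(part))
        except ValueError:
            values.append(None)  # For example 'none': no limit on this side.
    if len(values) != 6:
        raise ValueError("expected six comma-separated values")
    return values


def within(position_nm: Sequence[float], limits: Sequence[Optional[float]]) -> bool:
    """True if the position lies inside the cuboid (bounds assumed inclusive)."""
    for axis, coordinate in enumerate(position_nm):
        low, high = limits[2 * axis], limits[2 * axis + 1]
        if low is not None and coordinate < low:
            return False
        if high is not None and coordinate > high:
            return False
    return True


pancake = parse_limits("none,none,none,none,45,55")
print(pancake)
print(within((10.0, 90.0, 50.0), pancake))
```

With the pancake limits, x and y are unconstrained, so only the z-coordinate decides whether a position is kept.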

Using --limits, we can cut out a part of the packed structure, but perhaps you want to inspect the total structure without loading as many atoms.

For this, you can try the --mode option, which gives you the ability to only render out certain atoms (backbone, alpha carbon) or beads (representing each residue, or even only one per structure instance). By default, the mode is full, and we have just seen its output. Let's try alpha, now.

Warning

Some of these options (backbone) may not be functional right now.

bentopy render output/3lyz_placements.json 3lyz_alpha.gro --mode alpha

Now, we can compare the sizes of the files.

wc -l 3lyz_sphere.gro 3lyz_alpha.gro

Reducing the number of atoms that are rendered out can improve the time it takes to inspect a packing, if necessary.

Note

Using modes other than full (the default) is not relevant beyond inspection and analysis of the packed structure. To reflect this, the option to write a topology file and setting a mode are mutually exclusive.

In case you want to render out a structure based on a placement list that you or a colleague have created in a different environment, it can be useful to direct render to read the input structures from a different directory. To do this, you can set a root path for the structures with the --root option. This path will be prepended to any relative structure path that is defined in the placement list.
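The effect of a root path on the structure paths can be sketched with pathlib. This is a minimal illustration of "prepend the root to relative paths" as described above; the function name and the /data/shared directory are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_structure_path(path: str, root: str) -> str:
    """Prepend `root` to relative structure paths; leave absolute paths untouched."""
    structure = PurePosixPath(path)
    if structure.is_absolute():
        return str(structure)
    return str(PurePosixPath(root) / structure)


print(resolve_structure_path("structures/3lyz.pdb", "/data/shared"))
print(resolve_structure_path("/abs/structures/3lyz.pdb", "/data/shared"))
```

A relative path such as structures/3lyz.pdb is looked up below the root, while an absolute path in the placement list is used as it is.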
