cdienem/StarTool

StarTool

Concept

StarTool executes commands for selecting and editing data in a Relion STAR file. Commands are executed in the order in which they are given, so editing commands operate on selections made earlier on the command line (except for commands that change global properties such as table names or column names). The edited STAR file can be written out as a new STAR file, and subsets (selections) can be exported as well.

Behind the scenes, the STAR file is loaded into an in-memory SQLite3 database. Selections and edits are executed in that database, and the data is retrieved from it only when it is written back to a file. StarTool could therefore easily be extended into an interface for setups in which STAR files are stored in an SQLite database.
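To illustrate the idea, here is a minimal sketch of holding a STAR-like table in an in-memory SQLite database using Python's standard sqlite3 module. It is not StarTool's actual code: the table name data_, the example rows and the variable names are assumptions made for this illustration.

```python
import sqlite3

# Hypothetical, already-parsed STAR table: column labels plus rows.
labels = ["_rlnMicrographName", "_rlnDefocusU", "_rlnVoltage"]
rows = [
    ("mic_001.mrc", 12000.0, 300.0),
    ("mic_002.mrc", 18500.0, 300.0),
    ("mic_003.mrc", 9500.0, 200.0),
]

conn = sqlite3.connect(":memory:")  # the database lives only in RAM
columns = ", ".join('"{}"'.format(label) for label in labels)
conn.execute("CREATE TABLE data_ ({})".format(columns))
placeholders = ", ".join("?" for _ in labels)
conn.executemany("INSERT INTO data_ VALUES ({})".format(placeholders), rows)

# A selection is just a WHERE clause; nothing is written to disk here.
selected = conn.execute(
    'SELECT "_rlnMicrographName" FROM data_ '
    'WHERE "_rlnVoltage" = 300 ORDER BY rowid'
).fetchall()
print(selected)
```

Only when the result is written out would the rows be read back from the database and formatted as a STAR file.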

Quick reference

These commands (ordered alphabetically) are available.

Command Description
--add_col Adds a new column
--delete Deletes selected data
--delete_col Deletes a column
--delete_table Deletes a data table
--deselect Unsets all selections
--info Prints information about STAR file
--math Basic math operations with values and columns
--merge Merge two or more files
--query Submit a user defined SQLite query
--release Unsets the current table in use (for multi table files)
--rename_col Renames a column
--rename_table Renames a data table
--replace Replaces values by a user defined value
--replace_regex Regular-expression-based replacement
--replace_star Replace values with values from other STAR file
--select Select data by operator based conditions
--select_regex Select data based on regular expressions
--select_star Select data based on matches with reference STAR file
--silent Mutes program output
--show Prints currently selected data
--sort Sorts selected data (ascending)
--split_by Splits data into batches
--subset Selects defined data subsets
--tros Sorts selected data (descending)
--use Defines a table to use (only for multi table STAR files)
--write Writes STAR file
--write_selection Writes current selection to STAR file
--writef Writes STAR file (forced overwrite)
--writef_selection Writes current selection to STAR file (forced overwrite)

Setup

StarTool requires Python 2.7.

Unix-based systems (including macOS)

Extract the two files startool.py and STLib.py to a location of your choice. Run the program as python /path/to/StarTool/startool.py.

If you want to have the tool available system-wide, use a shell alias like alias stool="python /path/to/StarTool/startool.py". Arguments given after stool are passed on to startool.py.

Windows

Coming soon...

Syntax

General usage and input

python startool.py [inputfiles] [selectors/editors] [output]

Example: python startool.py a_file.star --select _rlnVoltage=300 --write_selection selection.star

Multiple files can be loaded by separating them with commas. Internally, the program creates tables named according to the scheme ‘starfilename_tablename’.

In order to make use of faster data handling, one can also use

python startool.py example.star:example.db

to load the STAR file into a local database file. Changes made by editors directly affect that database file. Remember to remove the database file if you want to start over with the original data from your STAR file.

Information

--info

Prints the STAR files and tables currently loaded, together with their labels and record counts. Use this first to get an overview of what data you actually have loaded.

Data display

--show

Prints the current content of the table in use. Selectors are applied.

Selecting subsets of data

These methods can be used to select certain subsets of your data. This is useful if you want to edit or extract only a certain part of the data.

--select _rlnLabel operator value

Selects a subset of the current table based on an operator comparison. The allowed operators are:

= equal
!= not equal
\< less than
\> greater than
\<= less than or equal
\>= greater than or equal

The operators containing < or > must be escaped with a backslash (as shown) or quoted, because the shell would otherwise interpret them as redirections.

Example: --select _rlnDefocusU\>=10000 will select all entries with a defocus greater than or equal to 1 µm (10000 Å).
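Since the data lives in SQLite, a condition of this form maps naturally onto a WHERE clause. The following is a rough sketch of such a translation, not StarTool's own parser: the function name condition_to_sql, the table name data_ and the restriction to numeric values are assumptions made for this illustration.

```python
import re
import sqlite3

# Longer operators come first so ">=" is not mistaken for ">".
CONDITION = re.compile(r"^(_rln\w+?)(!=|<=|>=|=|<|>)(.+)$")


def condition_to_sql(condition):
    """Turn '_rlnLabel<op>value' into a parameterised WHERE clause.

    Assumption for this sketch: the value is numeric.
    """
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError("cannot parse condition: " + condition)
    label, operator, value = match.groups()
    return 'WHERE "{}" {} ?'.format(label, operator), (float(value),)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data_ ("_rlnDefocusU" REAL)')
conn.executemany(
    "INSERT INTO data_ VALUES (?)", [(9500.0,), (10000.0,), (18500.0,)]
)

# The shell has already removed the backslash by the time the program runs.
where, params = condition_to_sql("_rlnDefocusU>=10000")
query = 'SELECT "_rlnDefocusU" FROM data_ ' + where + " ORDER BY rowid"
selected = [row[0] for row in conn.execute(query, params)]
print(selected)
```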

--select_regex _rlnLabel="regex"

Selects a subset of the current table based on a regular expression match.

Example: --select_regex _rlnMicrographName=".*\.mrcs$" will select all entries where _rlnMicrographName ends with '.mrcs'.
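SQLite has no built-in regular expression engine, but Python's sqlite3 module allows one to be registered as a user function. The sketch below shows one way a regex selection could be run against an in-memory table; whether StarTool does it this way is not stated here, and the table name data_ and the example file names are assumptions.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs SQLite's 'value REGEXP pattern' operator (pattern comes first)."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute('CREATE TABLE data_ ("_rlnMicrographName" TEXT)')
conn.executemany(
    "INSERT INTO data_ VALUES (?)",
    [("stack_001.mrcs",), ("mic_001.mrc",), ("stack_002.mrcs.bak",)],
)

# Same pattern as in the example above: names ending in '.mrcs'.
selected = [
    row[0]
    for row in conn.execute(
        'SELECT "_rlnMicrographName" FROM data_ '
        'WHERE "_rlnMicrographName" REGEXP ? ORDER BY rowid',
        (r".*\.mrcs$",),
    )
]
print(selected)
```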

--select_star reference.star:_rlnA[variationA],_rlnB[variationB]

Selects a subset of the current table based on entries in reference.star. reference.star should only have one data table. The variation value '[x]' is optional and will only be interpreted for numerical columns (like coordinates, defocus etc.). Reference STAR files will be loaded temporarily and do not have to be loaded at the program start up.

Example: --select_star reference.star:_rlnMicrographName,_rlnCoordinateX[10],_rlnCoordinateY[10] will select all entries whose _rlnMicrographName matches an entry in reference.star and whose _rlnCoordinateX and _rlnCoordinateY also match within a variation of 10 px.
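The matching rule can be pictured as follows: a record is selected if at least one reference record agrees with it on every listed label, exactly for labels without a variation and within the variation otherwise. This is a plain-Python sketch of that rule under assumptions, not the tool's implementation: the function names are hypothetical, and treating the variation as an inclusive absolute difference is a guess.

```python
def matches(record, reference, criteria):
    """criteria maps a label to its variation (None means exact match)."""
    for label, variation in criteria.items():
        if variation is None:
            if record[label] != reference[label]:
                return False
        elif abs(record[label] - reference[label]) > variation:
            return False
    return True


def select_by_reference(records, references, criteria):
    """Keep the records that match at least one reference record."""
    return [
        record
        for record in records
        if any(matches(record, ref, criteria) for ref in references)
    ]


records = [
    {"_rlnMicrographName": "mic_001.mrc", "_rlnCoordinateX": 100.0, "_rlnCoordinateY": 200.0},
    {"_rlnMicrographName": "mic_001.mrc", "_rlnCoordinateX": 400.0, "_rlnCoordinateY": 200.0},
    {"_rlnMicrographName": "mic_002.mrc", "_rlnCoordinateX": 100.0, "_rlnCoordinateY": 200.0},
]
references = [
    {"_rlnMicrographName": "mic_001.mrc", "_rlnCoordinateX": 104.0, "_rlnCoordinateY": 197.0},
]
criteria = {"_rlnMicrographName": None, "_rlnCoordinateX": 10, "_rlnCoordinateY": 10}

selected = select_by_reference(records, references, criteria)
print(len(selected))
```

Only the first record survives: the second is too far away in X, and the third belongs to a different micrograph.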

--subset start:end

Selects a range of records by record number; both the start and the end record are included. For splitting data into regular batches, see also --split_by.

Example: --subset 3:244 will select data entries 3-244 (including 3 and 244).
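In terms of a Python list, an inclusive range like this corresponds to a slice with an adjusted start index. The sketch below assumes that record numbering starts at 1, which the text above does not state explicitly; the function name subset is hypothetical.

```python
def subset(records, spec):
    """Return records start..end (both inclusive) for a 'start:end' spec.

    Assumption for this sketch: the first record has number 1.
    """
    start, end = (int(part) for part in spec.split(":"))
    return records[start - 1:end]


records = list(range(1, 301))  # stand-ins for 300 data entries
chosen = subset(records, "3:244")
print(chosen[0], chosen[-1], len(chosen))
```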

--deselect

Unsets all selections.

--use tablename

By default, the program uses the table that was read in if there is only one. --use must be called if multiple data tables are read from one or more STAR files (you can check with --info). Calling --use clears all previously made selections.

--release

Releases the current table by unsetting the --use and --select* statements (changes made by editors will remain).

Global Editors

Global editors ignore any previous selections, since they only change global properties of data tables such as the table name or column labels.

--add_col _rlnNewLabel

Adds a new column to the current data table in use.

--rename_col _rlnLabelOld=_rlnLabelNew

Renames _rlnLabelOld to _rlnLabelNew in the current table in use.

--delete_col _rlnUnwantedLabel

Removes _rlnUnwantedLabel (including data!) from the current data table in use.

--delete_table tablename

Removes the whole table from the STAR file (including data!). If the current table is removed, all selectors are reset.

--rename_table tablenameNew

Renames the current table in use to tablenameNew. Be aware that renaming tables may break compatibility with Relion.

Local Editors

Local editors directly affect the data that is currently selected (see Selecting subsets of data).

--sort _rlnLabel

Sorts the selected data by the given label (ascending).

--tros _rlnLabel

Sorts the selected data by the given label (descending).

--delete

Deletes the current selection from the data table in use.

--replace _rlnWhatEver=3.1415

Replaces all values of the specified column with the given value. The column needs to exist in the current data table. This can also be used to fill empty columns with zeros (because Relion cannot handle empty columns).

--replace_regex _rlnWhatEver='search'%'replace'

Replaces all values of the specified column matching the 'search' pattern with 'replace'. Regular expressions are allowed for search and replace (as in sed for example).

Example: --replace_regex _rlnLabel='\.star$'%'\.sun' will change all _rlnLabel values that end in '.star' so that they end in '.sun'.

Please note that regular expression replacement only works in text based columns.
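The effect on a text column is comparable to applying Python's re.sub to each value. The following sketch uses a hypothetical helper replace_regex and made-up file names; it writes the replacement as a plain '.sun', and how StarTool itself parses the 'search'%'replace' argument is not covered here.

```python
import re


def replace_regex(values, search, replace):
    """Apply a regular expression substitution to every text value."""
    return [re.sub(search, replace, value) for value in values]


names = ["run1/particles.star", "run1/star_notes.txt"]
renamed = replace_regex(names, r"\.star$", ".sun")
print(renamed)
```

The second value is left untouched because '.star' does not occur at its end.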

--replace_star _rlnLabel=reference.star:_rlnReferenceA[variationA],_rlnReferenceB

Replaces a subset of data with values from reference.star based on matching conditions with reference.star. reference.star should only have one data table. The variation value '[x]' is optional and will only be interpreted for numerical columns (like coordinates, defocus etc.). Reference STAR files will be loaded temporarily and do not have to be loaded at program start-up.

Please note that this operation can take a while when used on large datasets and multiple matching criteria.

Example: --replace_star _rlnImageName=reference.star:_rlnMicrographName,_rlnCoordinateX[10],_rlnCoordinateY[10] will replace _rlnImageName of the current data with the values from reference.star where _rlnMicrographName matches and where _rlnCoordinateX and _rlnCoordinateY also match within a variation of 10 px.

--math _rlnLabel=k operator n

Very basic math implementation. Operations can be like

k+n addition
k-n subtraction
k*n multiplication
k/n division
k**n n-th power of k
n//k n-th root of k

n and k as used above can be either column names or numbers.

Example: --math _rlnCoordinateX=_rlnCoordinateX-_rlnOriginX
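As a worked illustration of these operations, the sketch below applies them to a single record held as a Python dict. The names OPERATIONS, resolve and apply_math are hypothetical and the values are invented; note that the '//' entry follows the table above (n-th root of k), not Python's floor division.

```python
import operator

# Operator symbols as listed above, applied as (left, right).
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
    "//": lambda n, k: k ** (1.0 / n),  # n//k: n-th root of k
}


def resolve(term, row):
    """A term is either a column name present in the row or a number."""
    return row[term] if term in row else float(term)


def apply_math(row, target, left, symbol, right):
    """Store 'left symbol right' in the target column of the row."""
    row[target] = OPERATIONS[symbol](resolve(left, row), resolve(right, row))


row = {"_rlnCoordinateX": 512.0, "_rlnOriginX": 3.5}
apply_math(row, "_rlnCoordinateX", "_rlnCoordinateX", "-", "_rlnOriginX")
print(row["_rlnCoordinateX"])

root = OPERATIONS["//"](2, 16.0)  # square root of 16
print(root)
```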

Split and merge operations

--merge outputfile.star

Merges all currently loaded starfiles into outputfile.star. Only merges STAR files that contain one data table. Columns that do not overlap between all merged STAR files will be dropped (including data!). For more details see Usage examples.
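The column handling during a merge amounts to taking the intersection of the label sets. Here is a small sketch of that behaviour with a hypothetical function merge_tables and invented data; it only models the rule described above and is not taken from StarTool.

```python
def merge_tables(tables):
    """Merge (labels, rows) tables, keeping only labels common to all.

    Rows are dicts keyed by label; the label order of the first table is kept.
    """
    first_labels = tables[0][0]
    common = [
        label
        for label in first_labels
        if all(label in labels for labels, _ in tables[1:])
    ]
    merged = [
        {label: row[label] for label in common}
        for _, rows in tables
        for row in rows
    ]
    return common, merged


tables = [
    (
        ["_rlnImageName", "_rlnDefocusU", "_rlnClassNumber"],
        [{"_rlnImageName": "1@a.mrcs", "_rlnDefocusU": 12000.0, "_rlnClassNumber": 1}],
    ),
    (
        ["_rlnImageName", "_rlnDefocusU"],
        [{"_rlnImageName": "1@b.mrcs", "_rlnDefocusU": 15000.0}],
    ),
]

common, merged = merge_tables(tables)
print(common)
```

In this example _rlnClassNumber is dropped, together with its data, because the second table does not contain it.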

--split_by _rlnLabel:noOfBatches

Splits the dataset into a specific number of batches. If no number of batches is given, the data will be split into one batch for each unique value of the given label.

Example 1: --split_by _rlnDefocusU:2 will split the STAR file into two subfiles that contain one half of the defocus range each.

Example 2: --split_by _rlnMicrographName will split the STAR file into separate files for each micrograph. Note that this can create a large number of files if used with columns such as defocus or coordinates!
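The two modes can be sketched as follows. The function names are hypothetical, and the numeric mode assumes that the value range is cut into equally wide intervals (as Example 1 suggests); the tool's actual binning rule may differ.

```python
from collections import OrderedDict


def split_by_value(rows, label):
    """One batch per unique value of the label, in order of first appearance."""
    batches = OrderedDict()
    for row in rows:
        batches.setdefault(row[label], []).append(row)
    return batches


def split_by_range(rows, label, n_batches):
    """Cut the label's value range into n_batches equally wide intervals."""
    values = [row[label] for row in rows]
    low, high = min(values), max(values)
    width = (high - low) / float(n_batches) or 1.0  # avoid a zero width
    batches = [[] for _ in range(n_batches)]
    for row in rows:
        index = min(int((row[label] - low) / width), n_batches - 1)
        batches[index].append(row)
    return batches


rows = [
    {"_rlnMicrographName": "mic_001.mrc", "_rlnDefocusU": 10000.0},
    {"_rlnMicrographName": "mic_001.mrc", "_rlnDefocusU": 12000.0},
    {"_rlnMicrographName": "mic_002.mrc", "_rlnDefocusU": 18000.0},
    {"_rlnMicrographName": "mic_003.mrc", "_rlnDefocusU": 20000.0},
]

per_micrograph = split_by_value(rows, "_rlnMicrographName")
halves = split_by_range(rows, "_rlnDefocusU", 2)
print(len(per_micrograph), [len(batch) for batch in halves])
```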

Output

--write_selection outputfilename.star

Writes the current selection to a STAR file. This is useful if one wants to extract a subset of data. In silent mode (--silent), this behaves like --writef_selection.

--writef_selection outputstarfile.star

Same as --write_selection, but overwrites existing files without asking.

--write outputstarfile.star

Writes all tables belonging to the STAR file which the current table is part of. Changes made to the individual tables by editor methods will be written (selection will be released before writing). If only specific tables or subsets should be written, you may use --use and write it as a selection with --write_selection.

--writef outputstarfile.star

Same as --write, but overwrites existing files without asking.

Special

--silent
This mutes the program (useful for automated procedures). Be aware that muting the program will force files to be overwritten (--writef and --writef_selection will be called instead of --write and --write_selection).

--query SQLite-query

This option is for experienced users who want to send their own SQLite queries. It ignores any previously called selector methods. A SELECT statement will print the retrieved data.

Usage Examples

Most tasks can be achieved with very simple commands. These are nevertheless covered in the usage examples, alongside more involved ones, to demonstrate the flexibility of StarTool.

Split particles by class after classification

Scenario: After running a 3D classification with 4 classes, a 3D refinement of all classes shall be performed automatically. In order to do so, one needs to split the data STAR file of the last iteration (let's assume iteration 25 here) into STAR files for the individual classes.

Solution: Write a shell script that looks like this:

#!/bin/bash

# Run the 3D classification here:
relion_refine ...

python startool.py path/to/classification/3dclass_it025_data.star \
  --select _rlnClassNumber=1 --write_selection path/to/classification/3dclass_it025_data_class001.star \
  --deselect --select _rlnClassNumber=2 --write_selection path/to/classification/3dclass_it025_data_class002.star \
  --deselect --select _rlnClassNumber=3 --write_selection path/to/classification/3dclass_it025_data_class003.star \
  --deselect --select _rlnClassNumber=4 --write_selection path/to/classification/3dclass_it025_data_class004.star

# Run the refinements here using the new inputs:
relion_refine ...

The only disadvantage here is that particles might need to be regrouped prior to 3D refinement if the number of particles is rather low. Unfortunately I did not come across a way of regrouping particles with Relion on the command line.
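For a larger number of classes, the long command line can be generated instead of typed. The sketch below builds the argument list in Python using only the options described above; the helper name class_split_args is hypothetical, and it assumes the input file name ends in '.star'.

```python
def class_split_args(star_file, n_classes):
    """Build the StarTool call that writes one STAR file per class."""
    stem = star_file[:-len(".star")]
    args = ["python", "startool.py", star_file]
    for number in range(1, n_classes + 1):
        if number > 1:
            args.append("--deselect")  # drop the previous class selection
        args += [
            "--select", "_rlnClassNumber={}".format(number),
            "--write_selection", "{}_class{:03d}.star".format(stem, number),
        ]
    return args


args = class_split_args("path/to/classification/3dclass_it025_data.star", 4)
print(" ".join(args))
```

The resulting list could then be passed to subprocess.call(args) from the standard library to run the tool.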

Split data by defocus

Scenario: For automated particle picking it might be required to use different defocus ranges in order to optimize the picking thresholds. For this example I want to split my data at a defocus of 1.7 µm (17000 Å).

Solution:

python startool.py micrographs_ctf.star --select _rlnDefocusU\<=17000 --write_selection micrographs_ctf_df1.star --deselect --select _rlnDefocusU\>17000 --write_selection micrographs_ctf_df2.star

Replace defocus values by values from reference

Scenario: You have redone CTF estimation using a program that performs better than your initial method and you want to replace the defocus values in your particle STAR file with the new ones.

Solution:

python startool.py data.star --replace_star _rlnDefocusU=better_ctf.star:_rlnMicrographName,_rlnCoordinateX,_rlnCoordinateY --replace_star _rlnDefocusV=better_ctf.star:_rlnMicrographName,_rlnCoordinateX,_rlnCoordinateY --write data_better_ctf.star

This will copy defocus values from better_ctf.star based on exactly matching micrograph name and particle coordinates.

Recenter particles for re-extraction

Scenario: You still work with a version of Relion that cannot re-center particles automatically for re-extraction and you want to re-center refined particles according to their new origin values.

Solution:

python startool.py data.star --math _rlnCoordinateX=_rlnCoordinateX-_rlnOriginX --math _rlnCoordinateY=_rlnCoordinateY-_rlnOriginY --write data_recenter.star

Split data files into batches per micrograph

Scenario: You want to split your particle STAR file into micrograph batches.

Solution:

python startool.py data.star --split_by _rlnMicrographName

This creates one file per micrograph, each containing the particles of that micrograph, so the number of files can be large.

About

The Swiss Army Knife for Editing Relion .STAR Files
