The StarTool executes commands for selecting and editing data in a Relion STAR file. Commands are executed in the order in which they are given, so editing commands act on selections made earlier on the command line (except for commands that change global properties such as table names or column names). Edited STAR files can be written out as a new STAR file, and subsets (selections) can be exported as well.
Behind the scenes, the STAR file is loaded into an in-memory SQLite3 database. Selections and edits are executed in that database, and the data is retrieved from it only when writing back to a file. The StarTool could therefore easily be extended as an interface for setups where STAR files are stored in an SQLite database.
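The idea of loading a STAR file into an in-memory SQLite database can be sketched as follows. This is a minimal illustration and not the code used by StarTool: the parser only handles a single loop block, and the helper names (`load_star`, `to_value`) and the sample data are made up for this example.

```python
import sqlite3

# Hypothetical single-table STAR content used only for this illustration.
STAR_TEXT = """
data_particles

loop_
_rlnMicrographName #1
_rlnDefocusU #2
_rlnVoltage #3
mic001.mrc 12000.0 300
mic002.mrc 18500.0 300
mic003.mrc 9500.0 200
"""


def to_value(token):
    """Store numeric tokens as floats so that SQL comparisons are numeric."""
    try:
        return float(token)
    except ValueError:
        return token


def load_star(text, conn):
    """Load one STAR loop block into an SQLite table and return its name."""
    table, labels, rows = "data", [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("data_"):
            table = line[len("data_"):] or "data"
        elif line.startswith("_"):
            labels.append(line.split()[0])  # drop the "#n" column index
        else:
            rows.append([to_value(token) for token in line.split()])
    columns = ", ".join('"{0}"'.format(label) for label in labels)
    conn.execute('CREATE TABLE "{0}" ({1})'.format(table, columns))
    marks = ", ".join("?" for _ in labels)
    conn.executemany('INSERT INTO "{0}" VALUES ({1})'.format(table, marks), rows)
    return table


conn = sqlite3.connect(":memory:")
table = load_star(STAR_TEXT, conn)
query = 'SELECT "_rlnMicrographName" FROM "{0}" WHERE "_rlnVoltage" = 300'
selected = [row[0] for row in conn.execute(query.format(table))]
print(selected)
```

Once the data sits in a table, a selection is simply a WHERE clause, and writing a file back means reading the rows out again.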
These commands (ordered alphabetically) are available.
Command | Description |
---|---|
--add_col | Adds a new column |
--delete | Deletes selected data |
--delete_col | Deletes a column |
--delete_table | Deletes a data table |
--deselect | Unsets all selections |
--info | Prints information about the STAR file |
--math | Basic math operations with values and columns |
--merge | Merges two or more files |
--query | Submits a user-defined SQLite query |
--release | Unsets the current table in use (for multi-table files) |
--rename_col | Renames a column |
--rename_table | Renames a data table |
--replace | Replaces values with a user-defined value |
--replace_regex | Regular-expression-based replacement |
--replace_star | Replaces values with values from another STAR file |
--select | Selects data by operator-based conditions |
--select_regex | Selects data based on regular expressions |
--select_star | Selects data based on matches with a reference STAR file |
--show | Prints currently selected data |
--silent | Mutes program output |
--sort | Sorts selected data (ascending) |
--split_by | Splits data into batches |
--subset | Selects defined data subsets |
--tros | Sorts selected data (descending) |
--use | Defines a table to use (only for multi-table STAR files) |
--write | Writes STAR file |
--write_selection | Writes current selection to STAR file |
--writef | Writes STAR file (forced overwrite) |
--writef_selection | Writes current selection to STAR file (forced overwrite) |
StarTool requires Python 2.7.
Extract the two files startool.py and STLib.py to your location of choice.
Run the program as: python /path/to/StarTool/startool.py
If you want to have the tool available system-wide, use a shell alias like alias stool="python /path/to/StarTool/startool.py" (arguments given after the alias are passed on to the program automatically).
Coming soon...
python startool.py [inputfiles] [selectors/editors] [output]
Example: python startool.py a_file.star --select _rlnVoltage=300 --write_selection selection.star
Multiple files can be loaded by separating them with commas. Internally, the program creates tables named according to the scheme 'starfilename_tablename'.
For faster data handling, one can also use
python startool.py example.star:example.db
to load the STAR file into a local database file. Changes made by editors directly affect that database file. Remember to remove the database file if you want to start over with the original data from your STAR file.
--info
Prints the STAR files/tables currently loaded together with their labels and record numbers. This should always be used at the beginning to get an overview of what data you actually have loaded.
--show
Prints the current content of the table in use. Selectors are applied.
These methods can be used to select certain subsets of your data. This is useful if you want to edit or extract only a certain part of the data.
--select _rlnLabel operator value
Selects a subset of the current table based on an operator comparison. The allowed operators are:
Operator | Meaning |
---|---|
= | equal |
!= | not equal |
\< | smaller than (needs to be escaped) |
\> | greater than (needs to be escaped) |
\<= | smaller than or equal to (needs to be escaped) |
\>= | greater than or equal to (needs to be escaped) |
Example: --select _rlnDefocusU\>=10000
will select all entries with a defocus greater than or equal to 1 um (10000 Å).
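How such a condition string can be turned into a database filter is sketched below. This is only an illustration under assumptions: the function names `parse_condition` and `where_clause` are hypothetical, and StarTool's own parsing may differ.

```python
import re

# Two-character operators come first in the alternation so that ">=" is not
# read as ">" followed by a value starting with "=".
CONDITION = re.compile(r"^(_\w+)(!=|<=|>=|=|<|>)(.+)$")


def parse_condition(text):
    """Split e.g. '_rlnDefocusU>=10000' into (label, operator, value)."""
    match = CONDITION.match(text)
    if match is None:
        raise ValueError("not a valid condition: {0!r}".format(text))
    label, op, raw = match.groups()
    try:
        value = float(raw)  # numeric values are compared as numbers
    except ValueError:
        value = raw  # everything else is compared as text
    return label, op, value


def where_clause(text):
    """Build a parameterised SQL WHERE fragment for one condition."""
    label, op, value = parse_condition(text)
    return '"{0}" {1} ?'.format(label, op), [value]


print(where_clause("_rlnDefocusU>=10000"))
```

On the command line, the characters < and > have to be protected from the shell. Besides the backslash shown above, putting the whole condition in single quotes, as in '_rlnDefocusU>=10000', passes the same argument to the program.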
--select_regex _rlnLabel="regex"
Selects a subset of the current table based on a regular expression match.
Example: --select_regex _rlnMicrographName=".*\.mrcs$"
will select all entries where _rlnMicrographName ends with '.mrcs'.
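SQLite has no built-in regular expression matcher, but Python's sqlite3 module allows one to be registered as a user function. The sketch below shows that mechanism on a toy table. Whether StarTool uses exactly this approach is an assumption, and the table and its contents are invented for the example.

```python
import re
import sqlite3


def regexp(pattern, value):
    """SQLite evaluates 'value REGEXP pattern' as regexp(pattern, value)."""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute('CREATE TABLE particles ("_rlnMicrographName" TEXT)')
conn.executemany(
    "INSERT INTO particles VALUES (?)",
    [("a/mic001.mrcs",), ("a/mic002.mrc",), ("b/mic003.mrcs",)],
)
rows = conn.execute(
    'SELECT "_rlnMicrographName" FROM particles '
    'WHERE "_rlnMicrographName" REGEXP ?',
    (r".*\.mrcs$",),
).fetchall()
names = sorted(row[0] for row in rows)
print(names)
```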
--select_star reference.star:_rlnA[variationA],_rlnB[variationB]
Selects a subset of the current table based on entries in reference.star. reference.star should contain only one data table. The variation value '[x]' is optional and is only interpreted for numerical columns (such as coordinates or defocus). Reference STAR files are loaded temporarily and do not have to be loaded at program start-up.
Example: --select_star reference.star:_rlnImageName,_rlnCoordinateX[10],_rlnCoordinateY[10]
will select all entries whose _rlnImageName matches an entry in reference.star and whose _rlnCoordinateX/Y also match within a variation of 10 px.
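Matching against a reference with a numerical variation can be pictured as a join in which text columns must be equal and numerical columns may differ by at most the given amount. The following sketch uses two toy tables and a tolerance of 10. The table names, the data, and the choice of an inclusive tolerance are assumptions made for the illustration.

```python
import sqlite3

COLUMNS = '("_rlnMicrographName" TEXT, "_rlnCoordinateX" REAL, "_rlnCoordinateY" REAL)'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data " + COLUMNS)
conn.execute("CREATE TABLE ref " + COLUMNS)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("mic1.mrc", 100, 200), ("mic1.mrc", 500, 500), ("mic2.mrc", 105, 195)],
)
conn.executemany(
    "INSERT INTO ref VALUES (?, ?, ?)",
    [("mic1.mrc", 104, 193), ("mic2.mrc", 300, 300)],
)

# Keep a data row if any reference row has the same micrograph name and
# coordinates that differ by no more than the tolerance.
QUERY = """
SELECT d.* FROM data AS d
WHERE EXISTS (
    SELECT 1 FROM ref AS r
    WHERE r."_rlnMicrographName" = d."_rlnMicrographName"
      AND ABS(r."_rlnCoordinateX" - d."_rlnCoordinateX") <= :tol
      AND ABS(r."_rlnCoordinateY" - d."_rlnCoordinateY") <= :tol
)
"""
matches = conn.execute(QUERY, {"tol": 10}).fetchall()
print(matches)
```

Only the first data row is kept: the second has no reference particle nearby, and the third only has a reference entry on the same micrograph that is too far away.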
--subset start:end
Selects a subset of records by record number, including both end points. For splitting data into regular batches, see also --split_by.
Example: --subset 3:244
will select data entries 3-244 (including 3 and 244).
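The inclusive range differs from Python's usual half-open slicing, so a short sketch may help. It assumes that records are counted from 1, which the text above does not state explicitly, and the function name `subset` is hypothetical.

```python
def subset(records, spec):
    """Return records start..end (both inclusive, counted from 1)."""
    start, end = (int(part) for part in spec.split(":"))
    return records[start - 1:end]


records = list(range(1, 301))  # stand-in for 300 data entries
chosen = subset(records, "3:244")
print(len(chosen))
```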
--deselect
Unsets all selections.
--use tablename
By default, the program uses the table that was read in if there is only one. --use must be called if multiple data tables are read from one or more STAR files (you can check this with --info). Calling --use will clear all previously made selections.
--release
Releases the current table by unsetting the --use and --select* statements (changes made by editors will remain).
Global editors ignore previously made selections since they only change global properties of data tables, such as table names or column labels.
--add_col _rlnNewLabel
Adds a new column to the current data table in use.
--rename_col _rlnLabelOld=_rlnLabelNew
Renames _rlnLabelOld to _rlnLabelNew in the current table in use.
--delete_col _rlnUnwantedLabel
Removes _rlnUnwantedLabel (including data!) from the current data table in use.
--delete_table tablename
Removes the whole table from the STAR file (including data!). If the current table is removed, all selectors are reset.
--rename_table tablenameNew
Renames the current table in use to tablenameNew. Be aware that renaming tables may break compatibility with Relion.
Local editors directly affect the data that is currently selected (see Selecting subsets of data).
--sort _rlnlabel
Sorts the selected data by the given label (ascending).
--tros _rlnlabel
Sorts the selected data by the given label (descending).
--delete
Deletes the current selection from the data table in use.
--replace _rlnWhatEver=3.1415
Replaces all values of the specified column with the given value. The column needs to exist in the current data table. This can also be used to fill empty columns with zeros (because Relion cannot handle empty columns).
--replace_regex _rlnWhatEver='search'%'replace'
Replaces all values of the specified column matching the 'search' pattern with 'replace'. Regular expressions are allowed for search and replace (as in sed, for example).
Example: --replace_regex _rlnLabel='\.star$'%'\.sun'
will change all _rlnLabel values that end in '.star' so that they end in '.sun'.
Please note that regular expression replacement only works in text based columns.
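The effect of the example above can be reproduced with Python's re module, as sketched here on a few invented file names. Note that in Python's re.sub the replacement is written as '.sun' without a backslash. How StarTool treats escapes in the replacement part is not shown in this document, so this is only an illustration of the intended result.

```python
import re

# Invented values standing in for the contents of a text column.
names = ["run1/particles.star", "run1/star_notes.txt", "run2/data.star"]

# "\.star$" only matches a literal ".star" at the very end of the value.
renamed = [re.sub(r"\.star$", ".sun", name) for name in names]
print(renamed)
```

The second value is left untouched because 'star' occurs in it, but not as the file ending.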
--replace_star _rlnLabel=reference.star:_rlnReferenceA[variationA],_rlnReferenceB
Replaces a subset of data with values from reference.star based on matching conditions with reference.star. reference.star should contain only one data table. The variation value '[x]' is optional and is only interpreted for numerical columns (such as coordinates or defocus). Reference STAR files are loaded temporarily and do not have to be loaded at program start-up.
Please note that this operation can take a while when used on large datasets and multiple matching criteria.
Example: --replace_star _rlnImageName=reference.star:_rlnMicrographName,_rlnCoordinateX[10],_rlnCoordinateY[10]
will replace _rlnImageName of the current data with the values from reference.star where _rlnMicrographName matches and where _rlnCoordinateX/Y also match within a variation of 10 px.
--math _rlnLabel=k operator n
Very basic math implementation. The following operations are supported:
Operation | Meaning |
---|---|
k+n | addition |
k-n | subtraction |
k*n | multiplication |
k/n | division |
k**n | n-th power of k |
n//k | n-th root of k |
n and k as used above can be either column names or numbers.
Example: --math _rlnCoordinateX=_rlnCoordinateX-_rlnOriginX
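The operations listed above, applied to one record, can be sketched as follows. Note that n//k is read here as the n-th root of k and not as Python's floor division. The helper names `resolve` and `evaluate` and the sample record are hypothetical, and StarTool's own evaluation may differ.

```python
import operator

# Map each operator symbol to a function of (left operand, right operand).
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
    "//": lambda n, k: k ** (1.0 / n),  # n//k: n-th root of k
}


def resolve(operand, row):
    """An operand is either a column name in the record or a plain number."""
    return float(row[operand]) if operand in row else float(operand)


def evaluate(left, op, right, row):
    """Apply one operation to a single record (a dict of label -> value)."""
    return OPERATIONS[op](resolve(left, row), resolve(right, row))


row = {"_rlnCoordinateX": 512.0, "_rlnOriginX": 3.5}
recentred_x = evaluate("_rlnCoordinateX", "-", "_rlnOriginX", row)
cube_root = evaluate("3", "//", "27", row)  # approximately 3
print(recentred_x)
```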
--merge outputfile.star
Merges all currently loaded starfiles into outputfile.star. Only merges STAR files that contain one data table. Columns that do not overlap between all merged STAR files will be dropped (including data!). For more details see Usage examples.
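The rule that only columns present in all files survive a merge can be illustrated with a few lines of Python. The sketch represents each table as a pair of label list and row list. Taking the column order from the first table is an assumption, as is the function name `merge_tables`.

```python
def merge_tables(tables):
    """Merge (labels, rows) pairs, keeping only labels shared by all tables."""
    first_labels = tables[0][0]
    common = [
        label for label in first_labels
        if all(label in labels for labels, _ in tables[1:])
    ]
    merged = []
    for labels, rows in tables:
        positions = [labels.index(label) for label in common]
        merged.extend([[row[i] for i in positions] for row in rows])
    return common, merged


# Invented example: the second table lacks _rlnClassNumber, so it is dropped.
table_a = (["_rlnImageName", "_rlnDefocusU", "_rlnClassNumber"],
           [["1@a.mrcs", 12000.0, 1]])
table_b = (["_rlnDefocusU", "_rlnImageName"],
           [[15000.0, "1@b.mrcs"]])
labels, rows = merge_tables([table_a, table_b])
print(labels)
```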
--split_by _rlnLabel:noOfBatches
Splits the dataset into a specific number of batches. If no number of batches is given, the data will be split into one batch for each unique value of the given label.
Example 1: --split_by _rlnDefocusU:2
will split the STAR file into two subfiles that contain one half of the defocus range each.
Example 2: --split_by _rlnMicrographName
will split the STAR file into separate files for each micrograph. Note that this can create a large number of files if used with columns such as defocus or coordinates!
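The two splitting modes can be sketched as below. For the numbered mode, the sketch divides the value range into equally wide intervals, which is one reading of "one half of the defocus range each" and therefore an assumption. The function names and sample records are invented.

```python
from collections import OrderedDict


def split_by_value(rows, label):
    """One batch per unique value of the label, in order of first appearance."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[label], []).append(row)
    return groups


def split_by_range(rows, label, n_batches):
    """Batches covering equally wide parts of the label's value range."""
    values = [float(row[label]) for row in rows]
    low, high = min(values), max(values)
    width = (high - low) / n_batches or 1.0  # avoid a zero width
    batches = [[] for _ in range(n_batches)]
    for row, value in zip(rows, values):
        index = min(int((value - low) / width), n_batches - 1)
        batches[index].append(row)
    return batches


rows = [
    {"_rlnMicrographName": "mic1.mrc", "_rlnDefocusU": 10000.0},
    {"_rlnMicrographName": "mic1.mrc", "_rlnDefocusU": 12000.0},
    {"_rlnMicrographName": "mic2.mrc", "_rlnDefocusU": 18000.0},
    {"_rlnMicrographName": "mic3.mrc", "_rlnDefocusU": 20000.0},
]
per_micrograph = split_by_value(rows, "_rlnMicrographName")
halves = split_by_range(rows, "_rlnDefocusU", 2)
print(len(per_micrograph), [len(batch) for batch in halves])
```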
--write_selection outputfilename.star
Writes the current selection to a STAR file. This is useful if one wants to extract a subset of data. In silent mode (--silent), this will be forced into --writef_selection.
--writef_selection outputstarfile.star
Same as --write_selection, but overwrites existing files without asking.
--write outputstarfile.star
Writes all tables belonging to the STAR file which the current table is part of. Changes made to the individual tables by editor methods will be written (the selection will be released before writing). If only specific tables or subsets should be written, you may use --use and write the result as a selection with --write_selection.
--writef outputstarfile.star
Same as --write, but overwrites existing files without asking.
--silent
This mutes the program (useful for automated procedures). Be aware that muting the program will force files to be overwritten (--writef and --writef_selection will be called instead of --write and --write_selection).
--query SQLite-query
This option is for experienced users that want to send their own SQLite queries. It will ignore any previously called selector methods. A SELECT statement will trigger a print of the called data.
Most tasks can be achieved with very simple commands. However, to demonstrate the flexibility of the StarTool, such simple cases are covered in the usage examples as well.
Scenario: After running a 3D classification with 4 classes, a 3D refinement of all classes shall be performed automatically. In order to do so, one needs to split the data STAR file of the last iteration (let's assume iteration 25 here) into STAR files for the individual classes.
Solution: Write a shell script that looks like this
#!/bin/bash
# Run your Relion here:
relion_refine ... (run the 3d classification here)
python startool.py path/to/classification/3dclass_it025_data.star --select _rlnClassNumber=1 --write_selection path/to/classification/3dclass_it025_data_class001.star --deselect --select _rlnClassNumber=2 --write_selection path/to/classification/3dclass_it025_data_class002.star --deselect --select _rlnClassNumber=3 --write_selection path/to/classification/3dclass_it025_data_class003.star --deselect --select _rlnClassNumber=4 --write_selection path/to/classification/3dclass_it025_data_class004.star
# Run the refinements here using the new inputs:
relion_refine ...
The only disadvantage here is that particles might need to be regrouped prior to 3D refinement if the number of particles is rather low. Unfortunately, I did not come across a way of regrouping particles with Relion on the command line.
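For a larger number of classes, the long command line above becomes tedious to write by hand. The sketch below builds the same argument list programmatically. It only uses the options shown in this document, while the function name `class_split_args` and the output naming scheme copied from the example are assumptions.

```python
def class_split_args(star_file, n_classes, script="startool.py"):
    """Build the argument list that writes one STAR file per class."""
    if not star_file.endswith(".star"):
        raise ValueError("expected a file name ending in .star")
    stem = star_file[:-len(".star")]
    args = ["python", script, star_file]
    for number in range(1, n_classes + 1):
        if number > 1:
            args.append("--deselect")  # drop the previous class selection
        args += [
            "--select", "_rlnClassNumber={0}".format(number),
            "--write_selection", "{0}_class{1:03d}.star".format(stem, number),
        ]
    return args


args = class_split_args("path/to/classification/3dclass_it025_data.star", 4)
print(" ".join(args))
```

The resulting list can be passed to subprocess.call(args) from a Python script, or the printed line can be pasted into a shell script.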
Scenario: For automated particle picking it might be required to use different defocus ranges in order to optimize the picking thresholds. For this example I want to split my data at a defocus of 1.7 um.
Solution:
python startool.py micrographs_ctf.star --select _rlnDefocusU\<=17000 --write_selection micrographs_ctf_df1.star --deselect --select _rlnDefocusU\>17000 --write_selection micrographs_ctf_df2.star
Scenario: You have redone CTF estimation using a program that performs better than your initial method and you want to replace the defocus values in your particle STAR file with the new ones.
Solution:
python startool.py data.star --replace_star _rlnDefocusU=better_ctf.star:_rlnMicrographName,_rlnCoordinateX,_rlnCoordinateY --replace_star _rlnDefocusV=better_ctf.star:_rlnMicrographName,_rlnCoordinateX,_rlnCoordinateY --write data_better_ctf.star
This will copy defocus values from better_ctf.star based on exactly matching micrograph name and particle coordinates.
Scenario: You still work with a version of Relion that cannot re-center particles automatically for re-extraction and you want to re-center refined particles according to their new origin values.
Solution:
python startool.py data.star --math _rlnCoordinateX=_rlnCoordinateX-_rlnOriginX --math _rlnCoordinateY=_rlnCoordinateY-_rlnOriginY --write data_recenter.star
Scenario: You want to split your particle STAR file into micrograph batches.
Solution:
python startool.py data.star --split_by _rlnMicrographName
This will create many files, each containing the particles of one micrograph.