MetaData
In Xmipp, information is transferred between programs using the metadata class. Metadata are stored in STAR files, which are described in FileFormats#Metadata_Files. A complete list of valid labels is available in the file metadata_sql.h. Metadata are loaded into memory as tables in an SQLite database.
Each label is stored in a simple class that relates the label id (MDL_XXXX) with a string (used when writing or reading metadata files) and a data type.
The basic use of the metadata class is best illustrated by examples.
Example 1: Read a metadata file, access individual values, modify them and save them using a single metadata object
An aggregate function groups the values of multiple rows according to certain criteria and combines each group into a single value. Common aggregate functions include:
- Count()
- Maximum()
- Minimum()
- Sum()
- ...
The examples below use the following table, saved as metadata in a file called md.xmd:
X | Y | image |
---|---|---|
500 | 1000 | Hansen |
600 | 1600 | Nilsen |
700 | 700 | Hansen |
500 | 300 | Hansen |
600 | 2000 | Jensen |
500 | 100 | Nilsen |
In the call that computes the count aggregate, the parameters are:
- md: the input metadata
- mdOut: the output metadata
- AGGR_COUNT: identifies the aggregate function, in this case count
- MDL_X: the attribute used to group the rows
- MDL_Y: the attribute on which the aggregate function operates
- MDL_COUNT: the label of the new column that holds the result
The output will be:
X | Count |
---|---|
500 | 3 |
600 | 2 |
700 | 1 |
If the sum of MDL_Y is computed instead of the count, the output will be:
X | Sum |
---|---|
500 | 1400 |
600 | 3600 |
700 | 700 |
Get all rows of a metadata such that x=3 AND y=4:
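```cpp
MDValueEQ eq1(MDL_X, 3.);   // condition x == 3
MDValueEQ eq2(MDL_Y, 4.);   // condition y == 4
MDMultiQuery multi;         // combination of several conditions
multi.addAndQuery(eq1);     // x == 3 AND ...
multi.addAndQuery(eq2);     // ... y == 4
// copy into auxMetadata the rows of auxMetadata3 that satisfy both conditions
auxMetadata.importObjects(auxMetadata3, multi);
```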
Metadata are stored in STAR files, and a STAR file may contain several metadata tables. It is possible to read several metadata tables with a single command using regular expressions; all of them are merged into a single metadata object.
For example, a single read can load the metadata blocks "block_000001@kk" and "block_000002@kk". The result is the union of both metadata objects and is stored in auxMetadata.
A join operation creates one output metadata by merging information from corresponding rows of two input metadatas.
The label MDL_XXX sets the join condition. The available join types are:
- INNER_JOIN: For each row R1 of inputMD1, the joined metadata has a row for each row in inputMD2 that satisfies inputMD2.XXX=inputMD1.XXX.
- LEFT_OUTER JOIN: First, an inner join is performed. Then, for each row in inputMD1 that does not satisfy the join condition with any row in inputMD2, a joined row is added with null values in columns of inputMD2. Thus, the joined table unconditionally has at least one row for each row in inputMD1.
- OUTER_JOIN: First, an inner join is performed. Then, for each row in inputMD1 that does not satisfy the join condition with any row in inputMD2, a joined row is added with null values in the columns of inputMD2. Also, for each row of inputMD2 that does not satisfy the join condition with any row in inputMD1, a joined row is added with null values in the columns of inputMD1.
- NATURAL: compares the columns that appear in both input metadatas; these columns appear only once in the output metadata.
Joins are most useful when MDL_XXX is unique, that is, when a given value of XXX never repeats within a metadata.
For example, joining the input metadatas mDsource and auxMetadata3 on MDL_X creates in auxMetadata, for each different value of MDL_X, a new row that merges the rows of mDsource and auxMetadata3 such that mDsource.x=auxMetadata3.x.
Using the operators UNION, INTERSECT and SUBTRACTION, the rows of two input metadatas can be combined into a single metadata. The UNION operator returns all rows that are in one or both of the input metadatas. The INTERSECT operator returns all rows that are in both input metadatas. The SUBTRACTION operator returns the rows that are in the first input metadata but not in the second. In all three cases, duplicate rows are eliminated unless ALL is specified.
Size returns the number of lines (rows) of a metadata, and sort creates a new metadata sorted by a given label.
Size of metadata mDsource
Sort metadata auxMetadata by the label MDL_X, lowest values first, creating an output metadata with at most 2 rows and starting at row number 1.
Instead of accessing the data as label-value pairs, it is possible (and more efficient) to read whole rows:
```cpp
#include <data/metadata_extension.h>

MetaData md, md1;   // metadata objects
MDRow row;          // structure for reading lines in a metadata file
FileName fn;        // input metadata file
fn.compose("block1", "myfile.emx");
double samplingRate;
md.read(fn);        // read metadata
String errorMessage;
FOR_ALL_OBJECTS_IN_METADATA(md) // loop through all lines
{
    md.getRow(row, __iter.objId); // read line
    if (row.getValue(MDL_CTF_SAMPLING_RATE, samplingRate)) // get value for attribute ctf_sampling_rate
        std::cerr << "The sampling rate is: " << samplingRate << std::endl;
    else
    {
        errorMessage=formatString("Cannot find label
```