Explain every entry in the metadata file. #17

Closed
moorepants opened this issue Nov 20, 2014 · 7 comments
@moorepants
Member

This needs to be either in the GATK docs or in the paper or with the data.

@moorepants
Member Author

This is in the paper, but needs to be reviewed for completeness.

@moorepants
Member Author

  • Make sure all of the files listed in the meta data are correct, especially the mapping to the compensation files.
  • Add Cortex versions to the meta data files.

@moorepants
Member Author

@spinningplates @tvdbogert

I'm about to push the data to Zenodo and just went through all of the meta data in detail. I fixed a bunch of errors, but would you all mind looking through the data too to see if you notice any oddities?

You can view the tables of data indexed by trial number here:

http://nbviewer.ipython.org/github/moorepants/walking-sys-id/blob/meta-data/notebooks/meta_data_check.ipynb

@tvdbogert
Member

I did not see any obvious errors, but have a couple of comments:

  • The table with the test conditions for each trial would benefit from
    having a subject ID number. Without that, it's quite a puzzle to
    find the three tests for each subject. You have to go back and
    forth between the tables and use age, mass, height, etc. as clues to
    the subject identity. It's all in the database, so you can write
    code to do this, but a human-readable table might be useful. Or you
    could write code to generate that table?
  • If you introduce subject ID numbers, you no longer have to duplicate
    subject characteristics across the multiple trials. There can be a
    separate (and much shorter) table with subject characteristics. And
    of course, you can keep the database as it is and write code to
    generate that table.
  • We should probably not include trials that were not part of the
    actual study with the three speeds and the perturbation protocol.
    Unless (again...) you write code to extract a list of the relevant
    trials. It is kind of neat to give everything you have, but not if
    that makes it hard to find the data that is likely to be useful.

Ton


@moorepants
Member Author

The meta data is stored in a single file per trial (e.g., https://gist.github.com/moorepants/6bbc495128b181393023) and is located in that trial's directory. I did it this way, instead of using a proper database, to simplify things because no one in the lab seemed interested in using a real database to manage this. Thus, there is redundant "study" and "subject" data in each meta data file so that all the meta data for one trial is with the data files for that trial. The function generate_meta_data_tables() simply scrapes the directory of trials for meta data files and recursively parses them to construct all of the singleton tables (ones without nested structure), which would be akin to single tables in a relational database. These tables are stored in DataFrame objects, which are designed to allow easy reduction, grouping, joining, etc. With those tables, a few lines of code are needed to form any table you like. Line 10 in the link shows an example of merging some data from two tables. If you specify what you'd like to see in a table, I can generate it for you. What you see is simply a raw parsed version so that you can visually look at all the data at one time on the screen. I will generate some simplified tables to go in the paper, and the source code will be shipped along with the paper source.
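For concreteness, here is a rough sketch of that scraping approach (not the project's actual implementation). It assumes the per-trial metadata files are YAML with a directory layout like data/T001/meta-001.yml and that the top-level sections are flat mappings; the real generate_meta_data_tables() and file naming may differ.

```python
# Rough sketch of scraping per-trial metadata files into pandas DataFrames.
# Assumptions (not taken from the repository): YAML metadata, one file per
# trial directory, top-level sections mapping names to flat key/value pairs.
import glob
import os

import pandas as pd
import yaml


def generate_meta_data_tables(data_dir):
    """Collect the flat ("singleton") sections of every trial's metadata
    file into one DataFrame per section."""
    records = {}  # section name -> list of flat dicts, one per trial
    for path in sorted(glob.glob(os.path.join(data_dir, 'T*', 'meta-*.yml'))):
        with open(path) as f:
            meta = yaml.safe_load(f)
        for section, contents in meta.items():
            # Keep only sections without nested structure.
            if isinstance(contents, dict) and not any(
                    isinstance(value, dict) for value in contents.values()):
                records.setdefault(section, []).append(contents)
    return {section: pd.DataFrame(rows) for section, rows in records.items()}


# Merging two of the resulting tables, akin to the join shown in the linked
# notebook; the key name 'subject-id' is an illustrative guess.
tables = generate_meta_data_tables('data')
merged = tables['trial'].merge(tables['subject'], on='subject-id')
```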

I'd like to include all the trials we measured because they include potentially useful data. Code already exists that allows you to query trial numbers from the data I have. I could write some code to store the data in an HDF5 or SQLite database file, and then the database could be queried with libraries that already exist instead of me writing custom bits for scraping a directory tree.
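As a hedged illustration of that option, the scraped tables could be dumped into a single SQLite file with pandas and queried with plain SQL; the table name 'trial' and the column 'nominal-speed' below are assumptions, not the data set's actual keys.

```python
# Minimal sketch: write the DataFrames from the scraping sketch above into
# an SQLite file, then query it with SQL instead of custom directory code.
import sqlite3

import pandas as pd

tables = generate_meta_data_tables('data')  # from the earlier sketch

with sqlite3.connect('meta-data.db') as con:
    for name, df in tables.items():
        df.to_sql(name, con, if_exists='replace', index=False)
    # Example query: all trials at the 0.8 m/s belt speed (column name is
    # an illustrative assumption).
    slow_trials = pd.read_sql(
        'SELECT * FROM trial WHERE "nominal-speed" = 0.8', con)
```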

@tvdbogert
Member

It's OK to have the extra trials as long as it is not a puzzle for the
reader to put the complete perturbation study together, ideally by just
extracting the right files rather than writing code to find them.

Perhaps just generate this table for the paper:

column 1: subject id number
columns 2-5: gender, age, mass, height
column 6: 0.8 m/s trial number
column 7: 1.2 m/s trial number
column 8: 1.6 m/s trial number

That presents a nice bird's-eye view of the dataset and helps people find
the right files without much trouble.

Ton


@moorepants
Member Author

Ok, I'll generate that table.
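For reference, a sketch of how that table might be assembled from the scraped metadata tables; the column names ('subject-id', 'gender', 'age', 'mass', 'height', 'nominal-speed', 'id') are illustrative guesses, not necessarily the actual metadata keys.

```python
# Build the per-subject overview table described above, reusing the tables
# dict from the scraping sketch. All column names are assumptions.
tables = generate_meta_data_tables('data')
merged = tables['trial'].merge(tables['subject'], on='subject-id')

# One row per subject, one column per nominal belt speed, each cell holding
# the trial number for that subject and speed.
speeds = merged.pivot_table(index='subject-id', columns='nominal-speed',
                            values='id', aggfunc='first')
speeds.columns = ['{} m/s trial number'.format(s) for s in speeds.columns]

# Subject characteristics appear once per subject instead of once per trial.
subjects = (merged[['subject-id', 'gender', 'age', 'mass', 'height']]
            .drop_duplicates()
            .set_index('subject-id'))

paper_table = subjects.join(speeds)
print(paper_table.to_latex())
```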
