input_output.load_time_series() -- allow "value" column to support strings? #35

ejhumphrey · 2014-03-31T15:05:47Z

for convenience: https://github.com/craffel/mir_eval/blob/master/mir_eval/input_output.py#L229

Is there any strong opposition to extending this function such that the second column could be strings / labels? The docstring for np.loadtxt makes it look like this would be a simple fix...

justinsalamon · 2014-03-31T15:37:32Z

At the moment the code specifically loads both columns as floats:
https://github.com/craffel/mir_eval/blob/master/mir_eval/input_output.py#L229

Can you use np.loadtxt to load columns of different datatypes? It seems as though you have to specify a type and all columns are expected to be of that type.

I wrote this loader with melody in mind - it acts as a first check to ensure the data is of the correct type before proceeding. If you make this function return values that are not necessarily floats, you'd have to add a new check for the melody eval code or it could break...

ejhumphrey · 2014-03-31T15:53:15Z

I believe that yes, you can load different datatypes using that function. The docstring has this example:

>>> d = StringIO("M 21 72\nF 35 58")
>>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
...                      'formats': ('S1', 'i4', 'f4')})
array([('M', 21, 72.0), ('F', 35, 58.0)],
      dtype=[('gender', '|S1'), ('age', '<i4'), ('weight', '<f4')])

One, it's overkill for what I'm asking, but I think you could massage it to behave as intended. Two, while convenient, there's nothing that strictly says it's necessary to use this function (np.loadtxt) versus iterating over a file handle; it's what the rest of the loaders do anyways.

Sure, format assertions are always good, and we'd need a different one if the implementation changes.

craffel · 2014-03-31T19:06:49Z

I think lists of strings are saner than np.arrays of strings, so I'd prefer that it just dealt with list-like objects of any type. Feel free to change at will.

craffel · 2014-04-21T03:31:12Z

So, currently there are @bmcfee 's functions for loading annotations (ranges)/events: mir_eval.io.load_events and mir_eval.io.load_intervals. load_events loads in one- or two-column files; the first column always gets read in as floats (times), the second (if it exists) gets read in as a list of strings (labels). Similarly, load_annotation loads in two- or three-; first two are floats specifying the intervals (ranges), last column if it exists is a list of string labels. @justinsalamon's load_time_series does exactly the same thing as load_events except that the second column reads in floats. It seems like these functions should be merged, and the last column should be either returned as a np.ndarray or a list of strings.

Similarly, mir_eval.util.adjust_intervals is a useful function across many tasks but currently is only really suitable for segments. It'd be useful to use for chords, but the "padding" label should be just 'N' (no chord), I would argue; and the function only allows for setting a prefix. Similarly, it'd be useful for time series, but the padding label is always a string. It seems like for both of these cases it'd be nice if it could handle labels as a string or float list-like, and that the padding label could be set arbitrarily.

Is everyone comfortable with me trying to merge all of this functionality?

bmcfee · 2014-04-21T12:13:28Z

load_time_series does exactly the same thing as load_events except that the second column reads in floats. It seems like these functions should be merged, and the last column should be either returned as a np.ndarray or a list of strings.

Sure. load_events allows one argument (converter) to be passed in to specify the event index type (defaults to float). Seems like the quick fix here is to replicate this functionality for label parsing, so we have event_converter and label_converter (defaults to str) in https://github.com/craffel/mir_eval/blob/master/mir_eval/input_output.py#L60-L61 .

It'd be useful to use for chords, but the "padding" label should be just 'N' (no chord), I would argue; and the function only allows for setting a prefix.

Agreed, but I'm not sure of an elegant way to do this right now. The __%s notation is a crutch following the dummy label generation in the loader, where you want synthetic labels to be unique (numbered). I figured we should have consistent keying for synthetic labels. But, in adjust_*, we only ever use __T_MIN and __T_MAX. I really wish python supported the %*s formatter right about now...

craffel · 2014-04-23T17:03:19Z

We still need to merge the loaders. load_time_series loads two float columns. load_events loads one float column and optionally one string column. load_intervals loads two float columns and optionally one string column.

…d_time series for #35 and #67. Any code which uses load_intervals will need to be updated

craffel · 2014-07-25T16:45:49Z

OK, the changes proposed above
#35 (comment)
have been implemented and all code/example usage has been updated to reflect this change.

@bmcfee @ejhumphrey @urinieto Important - io.load_intervals now does not return labels; if you want to load labeled intervals you need to use io.load_labeled_intervals. Same is true for load_events vs load_labeled_events but AFAIK no one was using load_events for loading labels. Also, none of these functions will generate synthetic labels themselves. If you want synthetic labels like these functions used to return when no labels were present, use util.generate_labels. Take a look at the evaluators/example usage for updated usage examples.

ejhumphrey added enhancement labels Mar 31, 2014

craffel added this to the 0.0.1 milestone Mar 31, 2014

craffel removed the question label Mar 31, 2014

justinsalamon modified the milestone: 0.0.1 Apr 11, 2014

ejhumphrey closed this as completed Apr 15, 2014

craffel reopened this Apr 21, 2014

craffel added a commit that referenced this issue Apr 22, 2014

Switching to explicit start/end padded boundary labels for #35

f2354bf

craffel modified the milestones: 0.1, 0.0.1 Apr 23, 2014

craffel mentioned this issue Jul 25, 2014

util.generate_labels #67

Closed

craffel added a commit that referenced this issue Jul 25, 2014

Separate load_events and load_labeled_events for #35 and #67

0a9ffa2

craffel added a commit that referenced this issue Jul 25, 2014

Separate load_intervals and load_labeled_intervals and refactored loa…

7d805ab

…d_time series for #35 and #67. Any code which uses load_intervals will need to be updated

craffel closed this as completed Jul 25, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

input_output.load_time_series() -- allow "value" column to support strings? #35

input_output.load_time_series() -- allow "value" column to support strings? #35

ejhumphrey commented Mar 31, 2014

justinsalamon commented Mar 31, 2014

ejhumphrey commented Mar 31, 2014

craffel commented Mar 31, 2014

craffel commented Apr 21, 2014

bmcfee commented Apr 21, 2014

craffel commented Apr 23, 2014

craffel commented Jul 25, 2014

input_output.load_time_series() -- allow "value" column to support strings? #35

input_output.load_time_series() -- allow "value" column to support strings? #35

Comments

ejhumphrey commented Mar 31, 2014

justinsalamon commented Mar 31, 2014

ejhumphrey commented Mar 31, 2014

craffel commented Mar 31, 2014

craffel commented Apr 21, 2014

bmcfee commented Apr 21, 2014

craffel commented Apr 23, 2014

craffel commented Jul 25, 2014