Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

input_output.load_time_series() -- allow "value" column to support strings? #35

Closed
ejhumphrey opened this issue Mar 31, 2014 · 7 comments
Milestone

Comments

@ejhumphrey
Copy link
Collaborator

for convenience: https://github.com/craffel/mir_eval/blob/master/mir_eval/input_output.py#L229

Is there any strong opposition to extending this function such that the second column could be strings / labels? The docstring for np.loadtxt makes it look like this would be a simple fix...

@justinsalamon
Copy link
Collaborator

At the moment the code specifically loads both columns as floats:
https://github.com/craffel/mir_eval/blob/master/mir_eval/input_output.py#L229

Can you use np.loadtxt to load columns of different datatypes? It seems as though you have to specify a type and all columns are expected to be of that type.

I wrote this loader with melody in mind - it acts as a first check to ensure the data is of the correct type before proceeding. If you make this function return values that are not necessarily floats, you'd have to add a new check for the melody eval code or it could break...

@ejhumphrey
Copy link
Collaborator Author

I believe that yes, you can load different datatypes using that function. The docstring has this example:

>>> d = StringIO("M 21 72\nF 35 58")
>>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
...                      'formats': ('S1', 'i4', 'f4')})
array([('M', 21, 72.0), ('F', 35, 58.0)],
      dtype=[('gender', '|S1'), ('age', '<i4'), ('weight', '<f4')])

One, it's overkill for what I'm asking, but I think you could massage it to behave as intended. Two, while convenient, there's nothing that strictly says it's necessary to use this function (np.loadtxt) versus iterating over a file handle; it's what the rest of the loaders do anyways.

Sure, format assertions are always good, and we'd need a different one if the implementation changes.

@craffel craffel added this to the 0.0.1 milestone Mar 31, 2014
@craffel
Copy link
Owner

craffel commented Mar 31, 2014

I think lists of strings are saner than np.arrays of strings, so I'd prefer that it just dealt with list-like objects of any type. Feel free to change at will.

@craffel craffel removed the question label Mar 31, 2014
@justinsalamon justinsalamon modified the milestone: 0.0.1 Apr 11, 2014
@craffel
Copy link
Owner

craffel commented Apr 21, 2014

So, currently there are @bmcfee 's functions for loading annotations (ranges)/events: mir_eval.io.load_events and mir_eval.io.load_intervals. load_events loads in one- or two-column files; the first column always gets read in as floats (times), the second (if it exists) gets read in as a list of strings (labels). Similarly, load_annotation loads in two- or three-; first two are floats specifying the intervals (ranges), last column if it exists is a list of string labels. @justinsalamon's load_time_series does exactly the same thing as load_events except that the second column reads in floats. It seems like these functions should be merged, and the last column should be either returned as a np.ndarray or a list of strings.

Similarly, mir_eval.util.adjust_intervals is a useful function across many tasks but currently is only really suitable for segments. It'd be useful to use for chords, but the "padding" label should be just 'N' (no chord), I would argue; and the function only allows for setting a prefix. Similarly, it'd be useful for time series, but the padding label is always a string. It seems like for both of these cases it'd be nice if it could handle labels as a string or float list-like, and that the padding label could be set arbitrarily.

Is everyone comfortable with me trying to merge all of this functionality?

@craffel craffel reopened this Apr 21, 2014
@bmcfee
Copy link
Collaborator

bmcfee commented Apr 21, 2014

load_time_series does exactly the same thing as load_events except that the second column reads in floats. It seems like these functions should be merged, and the last column should be either returned as a np.ndarray or a list of strings.

Sure. load_events allows one argument (converter) to be passed in to specify the event index type (defaults to float). Seems like the quick fix here is to replicate this functionality for label parsing, so we have event_converter and label_converter (defaults to str) in https://github.com/craffel/mir_eval/blob/master/mir_eval/input_output.py#L60-L61 .

It'd be useful to use for chords, but the "padding" label should be just 'N' (no chord), I would argue; and the function only allows for setting a prefix.

Agreed, but I'm not sure of an elegant way to do this right now. The __%s notation is a crutch following the dummy label generation in the loader, where you want synthetic labels to be unique (numbered). I figured we should have consistent keying for synthetic labels. But, in adjust_*, we only ever use __T_MIN and __T_MAX. I really wish python supported the %*s formatter right about now...

@craffel
Copy link
Owner

craffel commented Apr 23, 2014

We still need to merge the loaders. load_time_series loads two float columns. load_events loads one float column and optionally one string column. load_intervals loads two float columns and optionally one string column.

craffel added a commit that referenced this issue Jul 25, 2014
…d_time series for #35 and #67.  Any code which uses load_intervals will need to be updated
@craffel
Copy link
Owner

craffel commented Jul 25, 2014

OK, the changes proposed above
#35 (comment)
have been implemented and all code/example usage has been updated to reflect this change.

@bmcfee @ejhumphrey @urinieto Important - io.load_intervals now does not return labels; if you want to load labeled intervals you need to use io.load_labeled_intervals. Same is true for load_events vs load_labeled_events but AFAIK no one was using load_events for loading labels. Also, none of these functions will generate synthetic labels themselves. If you want synthetic labels like these functions used to return when no labels were present, use util.generate_labels. Take a look at the evaluators/example usage for updated usage examples.

@craffel craffel closed this as completed Jul 25, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants