Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for constructing a table from a list of dicts #901

Merged
merged 2 commits into from
Apr 14, 2013

Conversation

ryanfox
Copy link
Contributor

@ryanfox ryanfox commented Mar 24, 2013

Implemented ticket #647. I did not implement anything with regard to recognizing None as missing data, as that got split off into another ticket.

As every dict may not have the same keys, it compiles a set of all the keys across all dicts first, then compiles columns from those key names. Missing values are taken to be None.

After the list of row-dicts is converted to a dict of col-lists, the _init_from_dict() method is called and the function returns.

@taldcroft
Copy link
Member

@ryanfox - thanks for getting started on this.

[EDIT: removed comment about _init_from_dict]
As you can see from the Travis build results link from this page, the changes make a number of tests now fail. It's a good idea to run the full suite of astropy tests before putting in the pull request. Once all the existing tests are working you'll need to add new tests and documentation for this new functionality.

try:
cols[name].append(row[name])
except KeyError:
cols[name].append(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this kind of logic more cleanly with

cols[name].append(row.get(name, None))

However in this case there is not yet support for doing anything with the None value that gets put in the list. This will break something downstream in a way that will be harder for users to figure out. So I would suggest that after the except KeyError: you should raise an exception like:

raise ValueError('Row {0} is missing a value for column {1!r}'.format(i_row, name))

You would also need to do for i_row, row in enumerate(data): in the loop above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. One question: in the second format spot, why do you have !r, not just {1}. I'm not sure why a dict key would need any special formatting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The {1!r} is just a quick way to make the column name enclosed in quotes so it is more distinctive in the error message. But I just realized this may look worse for unicode, so just skip it and use {1}.

@ryanfox
Copy link
Contributor Author

ryanfox commented Mar 27, 2013

Thanks for the input! I went ahead and implemented your suggestions. I've got a commit ready, but wondering if I should rebase to squash the two commits.

Also, I'm not sure if I understand that Travis build page. It looks like it's running different combinations of Python and numpy? Per the testing guidelines I have all the tests passing (excepting 98 skipped and 3 xfailed) after a py.test run. Does that run tests for all versions? I'm currently developing on python 2.7.3 and numpy 1.7.0.

@ryanfox
Copy link
Contributor Author

ryanfox commented Mar 28, 2013

I went ahead and updated my changes with your suggestions. I squashed the commits and re-pushed as well.

@@ -1051,6 +1054,23 @@ def _init_from_list(self, data, names, dtypes, n_cols, copy):

cols = []
def_names = _auto_names(n_cols)

if data and all(isinstance(row, dict) for row in data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to specify the column order explicitly via _init_from_dict. Something like the following should work:

Instead of names = set() use names_from_data = set(). Then:

if names is None:
    names = sorted(names_from_data)

That does two things. First it defers to the user-supplied names, which then must provide an ordered list of all the desired columns. Second, if the user didn't supply names then it uses the sorted names from the data. I think sorting is good because then it provides the output in a predictable order.

@ryanfox
Copy link
Contributor Author

ryanfox commented Apr 1, 2013

Alright, I gave it one more shot. I implemented all the changes discussed above, except waiting on a reply from iguananaut on the loop counter. I'm also holding off on rebasing until the code is in an acceptable state, so I'm not rebasing every commit. Thanks for the inputs again, guys.

@taldcroft
Copy link
Member

@ryanfox - looks like you can leave the i there and go ahead and push up your most recent changes for review. Thanks for the work on this!

@ryanfox
Copy link
Contributor Author

ryanfox commented Apr 10, 2013

I pushed my latest changes, leaving i as it is for now.

@taldcroft
Copy link
Member

Thanks @ryanfox, looks good! I'm merging now. The travis failure is in sphinx-build and appears unrelated to anything in this PR.

taldcroft added a commit that referenced this pull request Apr 14, 2013
Added support for constructing a table from a list of dicts
@taldcroft taldcroft merged commit 9a6e968 into astropy:master Apr 14, 2013
@ryanfox ryanfox deleted the table_from_list_of_dicts branch April 14, 2013 16:38
@embray
Copy link
Member

embray commented Apr 15, 2013

At some point this should be updated to work on any dict-like object (isinstance(row, collections.Mapping), etc.) but this is fine for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants