aggregate datasets into useful structure before returning #31

Closed
4 tasks done
katrinleinweber opened this issue Jan 25, 2018 · 3 comments

@katrinleinweber
Collaborator

katrinleinweber commented Jan 25, 2018

noticed while working on #16

retrieve_data() currently appends multiple downloads into one continuous list, in which the individual datasets can no longer be addressed. We need a data structure that lets the user $-address the datasets and their fields. Ideally, each dataset would be referred to by index = bacdive_id. Something like a sparse list-of-lists?

ideas:

  • aggregate the JSON strings in a character vector, then rjson::fromJSON() them "in place", or in some other way that creates the nested lists as lower hierarchy levels of that vector
  • write out each dataset to a file (a kind of local cache), then maybe concatenate the files and re-import them as a useful data structure
  • use jsonlite to create one data frame per bacdive_id, then add those to a list
  • keep on c()ombining downloads, but aggregate them into a higher-level list and use an apply variant to extract a field/element from the resulting "megastructure"
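
A minimal sketch of the first and last ideas combined, assuming the downloads are available as JSON strings named by their ID. The JSON content and the species values are made up for illustration; json_strings and datasets are hypothetical names, and this is not necessarily how retrieve_data() ends up doing it.

```r
library(jsonlite)

# Hypothetical stand-ins for two downloaded JSON strings. The field values
# are invented; only the names mimic the kind of fields discussed here.
json_strings <- c(
  "1095" = '{"taxonomy_name": {"species": "species A"}}',
  "1847" = '{"taxonomy_name": {"species": "species B"}}'
)

# lapply() keeps the names of its input, so each parsed dataset
# stays addressable by its ID in the resulting list.
datasets <- lapply(json_strings, jsonlite::fromJSON)

names(datasets)
datasets$`1095`$taxonomy_name$species
```

Because the IDs are used as names rather than as positions, the list only holds the datasets that were downloaded, so no sparse structure is needed.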
@katrinleinweber
Collaborator Author

jsonlite::fromJSON(…, flatten = TRUE) and simplifyDataFrame = TRUE both still return a list of nested lists with DFs as "leaves". I still need to work out how to extract a field/element (say culture_growth_condition$culture_temp$temp) from a combination of these lists-of-lists :-/
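
One possible way to do that extraction, sketched on a hand-built miniature of the structure rather than on real downloads. The temperatures are invented, and datasets and temps are hypothetical names.

```r
# Hypothetical miniature of the structure described above: nested lists
# with data frames as "leaves". All values are invented for illustration.
datasets <- list(
  "1095" = list(culture_growth_condition = list(
    culture_temp = data.frame(temp = c("30", "37"), stringsAsFactors = FALSE)
  )),
  "1847" = list(culture_growth_condition = list(
    culture_temp = data.frame(temp = "28", stringsAsFactors = FALSE)
  ))
)

# Pull the same field out of every dataset. lapply() is used instead of
# sapply() because the number of rows can differ between datasets.
temps <- lapply(
  datasets,
  function(d) d$culture_growth_condition$culture_temp$temp
)

temps
```

A dataset that lacks one of the intermediate fields yields NULL here instead of an error, since $ on a list returns NULL for a missing name.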

[screen shot 2018-03-12 at 16:09:58]

@katrinleinweber
Collaborator Author

@sckott: Hello, and thanks for your advice! I got over this data structure problem :-)

@katrinleinweber
Collaborator Author

For comparison with the screen shot above: between

a) the top-level object (data above, Bac_hal_data in this example), and
c) the lists (taxonomy_name, morphology_physiology, …, environment_sampli…, etc.) within the datasets, there is now
b) a list-of-lists for each dataset, named by its numeric BacDive ID (1095 & 1847)
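
A small sketch of how such a three-level structure can be addressed, assuming the IDs are stored as list names. The object below is a hand-built placeholder with empty inner lists, not real BacDive data, and id and dataset are hypothetical names.

```r
# Hypothetical object mirroring levels a), b) and c); contents are placeholders.
Bac_hal_data <- list(
  "1095" = list(taxonomy_name = list(), morphology_physiology = list()),
  "1847" = list(taxonomy_name = list(), morphology_physiology = list())
)

id <- 1095

# List names are character strings, so a numeric ID has to be converted.
# Bac_hal_data[[1095]] would be read as position 1095 and fail with
# "subscript out of bounds".
dataset <- Bac_hal_data[[as.character(id)]]
names(dataset)

# Overview of levels a) to c) without printing the leaves.
str(Bac_hal_data, max.level = 2)
```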

[screen shot 2018-04-18 at 16:44:09]
