Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using dplyr::group_by() with tbl_json objects changes the type of ..JSON column #135

Open
ramiromagno opened this issue Feb 6, 2021 · 6 comments

Comments

@ramiromagno
Copy link
Contributor

Here ..JSON is of type character:

> tidyjson::worldbank %>% tidyjson::spread_all()
# A tibble: 500 x 9
   ..JSON       document.id boardapprovaldate  closingdate    countryshortname project_name                   regionname      totalamt `_id.$oid`    
   <chr>              <int> <chr>              <chr>          <chr>            <chr>                          <chr>              <dbl> <chr>         
 1 "{\"_id\":{…           1 2013-11-12T00:00:… 2018-07-07T00… Ethiopia         Ethiopia General Education Qu… Africa            1.30e8 52b213b38594d…
 2 "{\"_id\":{…           2 2013-11-04T00:00:… NA             Tunisia          TN: DTF Social Protection Ref… Middle East an…   0.     52b213b38594d…
 3 "{\"_id\":{…           3 2013-11-01T00:00:… NA             Tuvalu           Tuvalu Aviation Investment Pr… East Asia and …   6.06e6 52b213b38594d…
 4 "{\"_id\":{…           4 2013-10-31T00:00:… NA             Yemen, Republic… Gov't and Civil Society Organ… Middle East an…   0.     52b213b38594d…
 5 "{\"_id\":{…           5 2013-10-31T00:00:… 2019-04-30T00… Lesotho          Second Private Sector Competi… Africa            1.31e7 52b213b38594d…
 6 "{\"_id\":{…           6 2013-10-31T00:00:… NA             Kenya            Additional Financing for Cash… Africa            1.00e7 52b213b38594d…
 7 "{\"_id\":{…           7 2013-10-29T00:00:… 2019-06-30T00… India            National Highways Interconnec… South Asia        5.00e8 52b213b38594d…
 8 "{\"_id\":{…           8 2013-10-29T00:00:… NA             China            China Renewable Energy Scale-… East Asia and …   0.     52b213b38594d…
 9 "{\"_id\":{…           9 2013-10-29T00:00:… 2018-12-31T00… India            Rajasthan Road Sector Moderni… South Asia        1.60e8 52b213b38594d…
10 "{\"_id\":{…          10 2013-10-29T00:00:… 2014-12-31T00… Morocco          MA Accountability and Transpa… Middle East an…   2.00e8 52b213b38594d…
# … with 490 more rows

Now is of type list (scroll to the right):

> tidyjson::worldbank %>% tidyjson::spread_all() %>% dplyr::group_by(countryshortname)
# A tibble: 500 x 9
# Groups:   countryshortname [118]
   document.id boardapprovaldate  closingdate    countryshortname  project_name                     regionname      totalamt `_id.$oid`     ..JSON   
         <int> <chr>              <chr>          <chr>             <chr>                            <chr>              <dbl> <chr>          <list>   
 1           1 2013-11-12T00:00:… 2018-07-07T00… Ethiopia          Ethiopia General Education Qual… Africa            1.30e8 52b213b38594d… <named l…
 2           2 2013-11-04T00:00:… NA             Tunisia           TN: DTF Social Protection Refor… Middle East an…   0.     52b213b38594d… <named l…
 3           3 2013-11-01T00:00:… NA             Tuvalu            Tuvalu Aviation Investment Proj… East Asia and …   6.06e6 52b213b38594d… <named l…
 4           4 2013-10-31T00:00:… NA             Yemen, Republic … Gov't and Civil Society Organiz… Middle East an…   0.     52b213b38594d… <named l…
 5           5 2013-10-31T00:00:… 2019-04-30T00… Lesotho           Second Private Sector Competiti… Africa            1.31e7 52b213b38594d… <named l…
 6           6 2013-10-31T00:00:… NA             Kenya             Additional Financing for Cash T… Africa            1.00e7 52b213b38594d… <named l…
 7           7 2013-10-29T00:00:… 2019-06-30T00… India             National Highways Interconnecti… South Asia        5.00e8 52b213b38594d… <named l…
 8           8 2013-10-29T00:00:… NA             China             China Renewable Energy Scale-Up… East Asia and …   0.     52b213b38594d… <named l…
 9           9 2013-10-29T00:00:… 2018-12-31T00… India             Rajasthan Road Sector Moderniza… South Asia        1.60e8 52b213b38594d… <named l…
10          10 2013-10-29T00:00:… 2014-12-31T00… Morocco           MA Accountability and Transpare… Middle East an…   2.00e8 52b213b38594d… <named l…
# … with 490 more rows
@colearendt
Copy link
Owner

colearendt commented Feb 6, 2021

Thanks for reporting!! That is surprising behavior indeed. I'll take a look and see if this can be improved. To be fair, ..JSON is actually always a list, but it is a "hidden" column, so we just print a character overview of what it looks like in JSON form.

I suspect the problem that is happening here is that the "grouped data frame" is no longer a tbl_json. Do you plan to do anything else with the JSON data after doing the group_by operations? It'd be good to get a sense for the workflow you are shooting for here!

@ramiromagno
Copy link
Contributor Author

Here's some more context:

#' @importFrom rlang .data
collect_samples <- function(tbl_json) {

  samples_variants <- tbl_json %>%
    tidyjson::enter_object('samples_variants') %>%
    tidyjson::gather_array(column.name = 'sample_id') %>%
    dplyr::select(-'sample_id')

  samples_training <- tbl_json %>%
    tidyjson::enter_object('samples_training') %>%
    tidyjson::gather_array(column.name = 'sample_id') %>%
    dplyr::select(-'sample_id')

  all_samples <-
    tidyjson::bind_rows(samples_variants, samples_training) %>%
    dplyr::group_by(.data$..page, .data$array.index) %>%
    dplyr::mutate(., sample_id = seq_len(dplyr::n()), .after = 'array.index') %>%
    dplyr::arrange('sample_id', .by_group = TRUE) %>%
    dplyr::ungroup() %>%
    tidyjson::as.tbl_json(json.column = '..JSON') # Needed because of https://github.com/colearendt/tidyjson/issues/135.

  return(all_samples)
}

I found that using tidyjson::as.tbl_json(json.column = '..JSON') works as a workaround.

@colearendt
Copy link
Owner

colearendt commented Feb 6, 2021 via email

@ramiromagno
Copy link
Contributor Author

btw, I should probably create a separate issue, but something I see doing myself too often is drop a recently created index column with gather_array, like in the example above:

    tidyjson::gather_array(column.name = 'sample_id') %>%
    dplyr::select(-'sample_id')

you think it would be possible to allow that argument column.name to accept a special value that would automatically drop the index column?

@ramiromagno
Copy link
Contributor Author

Perhaps?

    tidyjson::gather_array(column.name = NULL)

@colearendt
Copy link
Owner

colearendt commented Feb 6, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants