Ubiome R notebook #10

Open. HadrienG wants to merge 4 commits into base: master.

3 participants

HadrienG commented May 10, 2018

Ready for testing

  • install and load necessary packages
  • build data frame containing info about public data
  • load public data files
  • exploratory analysis
  • load private taxonomy results

Additional remarks

Some of the uBiome JSON files are not valid JSON, or are in other formats (CSV and tab-delimited). Ideally the uploaded files should be validated when they are accepted (I can file an issue in the repo if you want). Anyway, the R code in the notebook should handle the different formats 🎉
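
A rough sketch of how a fallback between the formats could work, assuming the format can be guessed from the first non-blank line of the file rather than from its ".json" extension. The names guess_format and read_ubiome_any are hypothetical and are not the notebook's own code; jsonlite is assumed to be installed for the JSON branch.

```r
# Hypothetical helpers (not the notebook's own code): pick a parser from the
# file's content instead of trusting the ".json" extension.
guess_format <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]          # drop blank lines
  if (length(lines) == 0) return("empty")
  first <- trimws(lines[1])
  if (substr(first, 1, 1) %in% c("{", "[")) return("json")
  if (grepl("\t", first, fixed = TRUE)) return("tsv")
  if (grepl(",", first, fixed = TRUE)) return("csv")
  "unknown"
}

read_ubiome_any <- function(path) {
  switch(
    guess_format(path),
    # jsonlite is assumed to be installed; fromJSON() errors on invalid JSON.
    json = jsonlite::fromJSON(path),
    tsv  = read.delim(path, stringsAsFactors = FALSE),
    csv  = read.csv(path, stringsAsFactors = FALSE),
    stop("unrecognised or empty file: ", path)
  )
}
```

The same check could also run at upload time to reject files that are neither JSON nor delimited text, which is the validation suggested above.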

initial ubiome commit
install necessary packages
build data frame from the api and public data info
try to import json

madprime commented May 10, 2018

On this repo, I think! https://github.com/OpenHumans/oh-ubiome-source

HadrienG added some commits May 10, 2018

fixed public data upload
the notebook can now import the public data in (valid) json, tsv and csv

gedankenstuecke commented May 11, 2018

I had some trouble getting it to run on my end:

  1. the data/ folder needs to exist already, otherwise it'll crash. But that's easy enough to fix.
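
A minimal sketch of that fix, assuming the notebook downloads into a relative "data" directory as the paths in this thread suggest. The variable name data_dir is only illustrative.

```r
# Create the download directory if it is missing; nothing happens when it
# already exists, so the cell can be re-run safely.
data_dir <- "data"
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

# Destination paths can then be built with file.path() instead of paste0().
example_dest <- file.path(data_dir, "37852.json")
```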

Then I had a problem with getting the reference data loaded in.

There are a lot of warnings like this one:

Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“90 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     5 <NA>  6 columns 7 columns 'data/37852.json'
2    11 <NA>  6 columns 7 columns 'data/37852.json'
3    14 <NA>  6 columns 7 columns 'data/37852.json'
4    15 <NA>  6 columns 7 columns 'data/37852.json'
5    16 <NA>  6 columns 7 columns 'data/37852.json'

And in the end the data_container isn't created. See the end of the error message here:

[screenshot, 2018-05-11 09:27:19]

Just for fun I also tried loading my own data in the next steps but that also yielded an error:

[screenshot, 2018-05-11 09:25:15]

HadrienG commented May 13, 2018

Could you give me the whole traceback for the "object 'taxon' not found" error? I'm not sure whether it happens during the JSON or the CSV parsing.
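
One way to narrow this down, sketched under the assumption that the reader is called once per file: wrap each call in tryCatch() so that a single failing file is reported by name instead of aborting the whole run. The helper name read_all_safely is hypothetical. In an interactive session, calling traceback() right after the error should also print the call stack.

```r
# Hypothetical wrapper: run a reader on every path and keep the error message
# for files that fail, instead of stopping at the first failure.
read_all_safely <- function(paths, reader) {
  results <- lapply(paths, function(p) {
    tryCatch(
      list(path = p, value = reader(p), error = NULL),
      error = function(e) list(path = p, value = NULL, error = conditionMessage(e))
    )
  })
  # Report which files failed and why.
  failed <- Filter(function(r) !is.null(r$error), results)
  for (r in failed) message(r$path, ": ", r$error)
  results
}
```

With the notebook's objects this would be called as read_all_safely(data_path, read_json_or_tbl_or_csv), and the messages would show which file and which parsing branch raises the error.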

gedankenstuecke commented May 14, 2018

Sure, running

invisible(map2(jsons$download_url, paste0("data/", jsons$id, ".json"), download.file))
file.remove("data/37844.json")  # invalid json
data_path <- dir("data", pattern = '.json', full.names = TRUE)
data_container <- map(data_path, read_json_or_tbl_or_csv)

I get the following output and data_container isn't actually created.
I was wondering: did you run your code directly on notebooks.openhumans.org?

Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“90 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     5 <NA>  6 columns 7 columns 'data/37852.json'
2    11 <NA>  6 columns 7 columns 'data/37852.json'
3    14 <NA>  6 columns 7 columns 'data/37852.json'
4    15 <NA>  6 columns 7 columns 'data/37852.json'
5    16 <NA>  6 columns 7 columns 'data/37852.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_integer(),
  parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“7 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     8 <NA>  6 columns 7 columns 'data/37870.json'
2     9 <NA>  6 columns 7 columns 'data/37870.json'
3    12 <NA>  6 columns 7 columns 'data/37870.json'
4    13 <NA>  6 columns 7 columns 'data/37870.json'
5    15 <NA>  6 columns 8 columns 'data/37870.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_integer(),
  parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“9 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     3 <NA>  6 columns 7 columns 'data/37872.json'
2     6 <NA>  6 columns 7 columns 'data/37872.json'
3     9 <NA>  6 columns 7 columns 'data/37872.json'
4    10 <NA>  6 columns 8 columns 'data/37872.json'
5    13 <NA>  6 columns 7 columns 'data/37872.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“16 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     3 <NA>  6 columns 8 columns 'data/37874.json'
2     4 <NA>  6 columns 7 columns 'data/37874.json'
3     7 <NA>  6 columns 8 columns 'data/37874.json'
4    19 <NA>  6 columns 8 columns 'data/37874.json'
5    20 <NA>  6 columns 8 columns 'data/37874.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“103 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     8 <NA>  6 columns 7 columns 'data/37876.json'
2     9 <NA>  6 columns 7 columns 'data/37876.json'
3    10 <NA>  6 columns 7 columns 'data/37876.json'
4    12 <NA>  6 columns 7 columns 'data/37876.json'
5    16 <NA>  6 columns 7 columns 'data/37876.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“78 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     5 <NA>  6 columns 7 columns 'data/37878.json'
2     6 <NA>  6 columns 7 columns 'data/37878.json'
3    18 <NA>  6 columns 7 columns 'data/37878.json'
4    20 <NA>  6 columns 7 columns 'data/37878.json'
5    23 <NA>  6 columns 7 columns 'data/37878.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“73 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     5 <NA>  6 columns 7 columns 'data/37880.json'
2     6 <NA>  6 columns 7 columns 'data/37880.json'
3    19 <NA>  6 columns 7 columns 'data/37880.json'
4    22 <NA>  6 columns 7 columns 'data/37880.json'
5    24 <NA>  6 columns 7 columns 'data/37880.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“87 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     7 <NA>  6 columns 7 columns 'data/37882.json'
2     8 <NA>  6 columns 7 columns 'data/37882.json'
3     9 <NA>  6 columns 7 columns 'data/37882.json'
4    11 <NA>  6 columns 7 columns 'data/37882.json'
5    20 <NA>  6 columns 7 columns 'data/37882.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“24 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     5 <NA>  6 columns 7 columns 'data/37884.json'
2     6 <NA>  6 columns 7 columns 'data/37884.json'
3     9 <NA>  6 columns 7 columns 'data/37884.json'
4    19 <NA>  6 columns 7 columns 'data/37884.json'
5    20 <NA>  6 columns 7 columns 'data/37884.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“70 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1    11 <NA>  1 columns 2 columns 'data/37902.json'
2    14 <NA>  1 columns 2 columns 'data/37902.json'
3    15 <NA>  1 columns 2 columns 'data/37902.json'
4    16 <NA>  1 columns 2 columns 'data/37902.json'
5    17 <NA>  1 columns 2 columns 'data/37902.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“124 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1    13 <NA>  1 columns 2 columns 'data/37904.json'
2    16 <NA>  1 columns 2 columns 'data/37904.json'
3    18 <NA>  1 columns 2 columns 'data/37904.json'
4    21 <NA>  1 columns 2 columns 'data/37904.json'
5    22 <NA>  1 columns 2 columns 'data/37904.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“79 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1    10 <NA>  1 columns 3 columns 'data/37906.json'
2    11 <NA>  1 columns 2 columns 'data/37906.json'
3    16 <NA>  1 columns 2 columns 'data/37906.json'
4    27 <NA>  1 columns 2 columns 'data/37906.json'
5    30 <NA>  1 columns 5 columns 'data/37906.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“117 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     6 <NA>  1 columns 2 columns 'data/37908.json'
2    10 <NA>  1 columns 2 columns 'data/37908.json'
3    13 <NA>  1 columns 2 columns 'data/37908.json'
4    14 <NA>  1 columns 2 columns 'data/37908.json'
5    15 <NA>  1 columns 2 columns 'data/37908.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“92 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1    11 <NA>  6 columns 7 columns 'data/37914.json'
2    12 <NA>  6 columns 7 columns 'data/37914.json'
3    13 <NA>  6 columns 7 columns 'data/37914.json'
4    17 <NA>  6 columns 7 columns 'data/37914.json'
5    24 <NA>  6 columns 7 columns 'data/37914.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“96 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1    10 <NA>  1 columns 2 columns 'data/37922.json'
2    11 <NA>  1 columns 2 columns 'data/37922.json'
3    12 <NA>  1 columns 2 columns 'data/37922.json'
4    13 <NA>  1 columns 2 columns 'data/37922.json'
5    17 <NA>  1 columns 2 columns 'data/37922.json'
...
See problems(...) for more details.
”
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”
Warning message:
“76 parsing failures.
# A tibble: 5 x 5
    row col   expected  actual    file
  <int> <chr> <chr>     <chr>     <chr>
1     9 <NA>  6 columns 7 columns 'data/37944.json'
2    12 <NA>  6 columns 7 columns 'data/37944.json'
3    13 <NA>  6 columns 7 columns 'data/37944.json'
4    14 <NA>  6 columns 7 columns 'data/37944.json'
5    19 <NA>  6 columns 7 columns 'data/37944.json'
...
See problems(...) for more details.
鈥漌arning message:
鈥淒uplicated column names deduplicated: '{\\n' => '{\\n_1' [28], '\\"taxon\\":' => '\\"taxon\\":_1' [29], '\\"parent\\":' => '\\"parent\\":_1' [31], '\\"count\\":' => '\\"count\\":_1' [33], '4706,\\n' => '4706,\\n_1' [34], '\\"count_norm\\":' => '\\"count_norm\\":_1' [35], '1000000,\\n' => '1000000,\\n_1' [36], '\\"tax_name\\":' => '\\"tax_name\\":_1' [37], '\\"tax_rank\\":' => '\\"tax_rank\\":_1' [39], '},\\n' => '},\\n_1' [41], '{\\n' => '{\\n_2' [42], '\\"taxon\\":' => '\\"taxon\\":_2' [43], '\\"parent\\":' => '\\"parent\\":_2' [45], '\\"count\\":' => '\\"count\\":_2' [47], '\\"count_norm\\":' => '\\"count_norm\\":_2' [49], '\\"tax_name\\":' => '\\"tax_name\\":_2' [51], '\\"tax_rank\\":' => '\\"tax_rank\\":_2' [53], '},\\n' => '},\\n_2' [55], '{\\n' => '{\\n_3' [56], '\\"taxon\\":' => '\\"taxon\\":_3' [57], '\\"parent\\":' => '\\"parent\\":_3' [59], '\\"count\\":' => '\\"count\\":_3' [61], '\\"count_norm\\":' => '\\"count_norm\\":_3' [63], '\\"tax_name\\":' => '\\"tax_name\\":_3' [65], '\\"tax_rank\\":' => '\\"tax_rank\\":_3' [67], '},\\n' => '},\\n_3' [69], '{\\n' => '{\\n_4' [70], '\\"taxon\\":' => '\\"taxon\\":_4' [71], '\\"parent\\":' => '\\"parent\\":_4' [73], '481,\\n' => '481,\\n_1' [74], '\\"count\\":' => '\\"count\\":_4' [75], '\\"count_norm\\":' => '\\"count_norm\\":_4' [77], '\\"tax_name\\":' => '\\"tax_name\\":_4' [79], '\\"tax_rank\\":' => '\\"tax_rank\\":_4' [81], '\\"genus\\"\\n' => '\\"genus\\"\\n_1' [82], '},\\n' => '},\\n_4' [83], '{\\n' => '{\\n_5' [84], '\\"taxon\\":' => '\\"taxon\\":_5' [85], '\\"parent\\":' => '\\"parent\\":_5' [87], '482,\\n' => '482,\\n_1' [88], '\\"count\\":' => '\\"count\\":_5' [89], '\\"count_norm\\":' => '\\"count_norm\\":_5' [91], '\\"tax_name\\":' => '\\"tax_name\\":_5' [93], '\\"tax_rank\\":' => '\\"tax_rank\\":_5' [96], '},\\n' => '},\\n_5' [98], '{\\n' => '{\\n_6' [99], '\\"taxon\\":' => '\\"taxon\\":_6' [100], '\\"parent\\":' => '\\"parent\\":_6' [102], '482,\\n' => '482,\\n_2' [103], '\\"count\\":' => 
'\\"count\\":_6' [104], '\\"count_norm\\":' => '\\"count_norm\\":_6' [106], '\\"tax_name\\":' => '\\"tax_name\\":_6' [108], '\\"Neisseria' => '\\"Neisseria_1' [109], '\\"tax_rank\\":' => '\\"tax_rank\\":_6' [111], '\\"species\\"\\n' => '\\"species\\"\\n_1' [112], '},\\n' => '},\\n_6' [113], '{\\n' => '{\\n_7' [114], '\\"taxon\\":' => '\\"taxon\\":_7' [115], '\\"parent\\":' => '\\"parent\\":_7' [117], '\\"count\\":' => '\\"count\\":_7' [119], '\\"count_norm\\":' => '\\"count_norm\\":_7' [121], '\\"tax_name\\":' => '\\"tax_name\\":_7' [123], '\\"tax_rank\\":' => '\\"tax_rank\\":_7' [126], '\\"species\\"\\n' => '\\"species\\"\\n_2' [127], '},\\n' => '},\\n_7' [128], '{\\n' => '{\\n_8' [129], '\\"taxon\\":' => '\\"taxon\\":_8' [130], '\\"parent\\":' => '\\"parent\\":_8' [132], '\\"count\\":' => '\\"count\\":_8' [134], '\\"count_norm\\":' => '\\"count_norm\\":_8' [136], '\\"tax_name\\":' => '\\"tax_name\\":_8' [138], '\\"tax_rank\\":' => '\\"tax_rank\\":_8' [140], '\\"family\\"\\n' => '\\"family\\"\\n_1' [141], '},\\n' => '},\\n_8' [142], '{\\n' => '{\\n_9' [143], '\\"taxon\\":' => '\\"taxon\\":_9' [144], '\\"parent\\":' => '\\"parent\\":_9' [146], '543,\\n' => '543,\\n_1' [147], '\\"count\\":' => '\\"count\\":_9' [148], '5,\\n' => '5,\\n_1' [149], '\\"count_norm\\":' => '\\"count_norm\\":_9' [150], '1062,\\n' => '1062,\\n_1' [151], '\\"tax_name\\":' => '\\"tax_name\\":_9' [152], '\\"tax_rank\\":' => '\\"tax_rank\\":_9' [154], '\\"genus\\"\\n' => '\\"genus\\"\\n_2' [155], '},\\n' => '},\\n_9' [156], '{\\n' => '{\\n_10' [157], '\\"taxon\\":' => '\\"taxon\\":_10' [158], '\\"parent\\":' => '\\"parent\\":_10' [160], '\\"count\\":' => '\\"count\\":_10' [162], '\\"count_norm\\":' => '\\"count_norm\\":_10' [164], '\\"tax_name\\":' => '\\"tax_name\\":_10' [166], '\\"tax_rank\\":' => '\\"tax_rank\\":_10' [168], '\\"family\\"\\n' => '\\"family\\"\\n_2' [169], '},\\n' => '},\\n_10' [170], '{\\n' => '{\\n_11' [171], '\\"taxon\\":' => '\\"taxon\\":_11' [172], '\\"parent\\":' => 
'\\"parent\\":_11' [174], '712,\\n' => '712,\\n_1' [175], '\\"count\\":' => '\\"count\\":_11' [176], '\\"count_norm\\":' => '\\"count_norm\\":_11' [178], '\\"tax_name\\":' => '\\"tax_name\\":_11' [180], '\\"tax_rank\\":' => '\\"tax_rank\\":_11' [182], '\\"genus\\"\\n' => '\\"genus\\"\\n_3' [183], '},\\n' => '},\\n_11' [184], '{\\n' => '{\\n_12' [185], '\\"taxon\\":' => '\\"taxon\\":_12' [186], '\\"parent\\":' => '\\"parent\\":_12' [188], '724,\\n' => '724,\\n_1' [189], '\\"count\\":' => '\\"count\\":_12' [190], '\\"count_norm\\":' => '\\"count_norm\\":_12' [192], '\\"tax_name\\":' => '\\"tax_name\\":_12' [194], '\\"tax_rank\\":' => '\\"tax_rank\\":_12' [197], '\\"species\\"\\n' => '\\"species\\"\\n_3' [198], '},\\n' => '},\\n_12' [199], '{\\n' => '{\\n_13' [200], '\\"taxon\\":' => '\\"taxon\\":_13' [201], '\\"parent\\":' => '\\"parent\\":_13' [203], '724,\\n' => '724,\\n_2' [204], '\\"count\\":' => '\\"count\\":_13' [205], '\\"count_norm\\":' => '\\"count_norm\\":_13' [207], '\\"tax_name\\":' => '\\"tax_name\\":_13' [209], '\\"Haemophilus' => '\\"Haemophilus_1' [210], '\\"tax_rank\\":' => '\\"tax_rank\\":_13' [212], '\\"species\\"\\n' => '\\"species\\"\\n_4' [213], '},\\n' => '},\\n_13' [214], '{\\n' => '{\\n_14' [215], '\\"taxon\\":' => '\\"taxon\\":_14' [216], '\\"parent\\":' => '\\"parent\\":_14' [218], '\\"count\\":' => '\\"count\\":_14' [220], '4,\\n' => '4,\\n_1' [221], '\\"count_norm\\":' => '\\"count_norm\\":_14' [222], '849,\\n' => '849,\\n_1' [223], '\\"tax_name\\":' => '\\"tax_name\\":_14' [224], '\\"tax_rank\\":' => '\\"tax_rank\\":_14' [227], '\\"species\\"\\n' => '\\"species\\"\\n_5' [228], '},\\n' => '},\\n_14' [229], '{\\n' => '{\\n_15' [230], '\\"taxon\\":' => '\\"taxon\\":_15' [231], '\\"parent\\":' => '\\"parent\\":_15' [233], '416916,\\n' => '416916,\\n_1' [234], '\\"count\\":' => '\\"count\\":_15' [235], '\\"count_norm\\":' => '\\"count_norm\\":_15' [237], '\\"tax_name\\":' => '\\"tax_name\\":_15' [239], '\\"Aggregatibacter' => 
'\\"Aggregatibacter_1' [240], '\\"tax_rank\\":' => '\\"tax_rank\\":_15' [242], '\\"species\\"\\n' => '\\"species\\"\\n_6' [243], '},\\n' => '},\\n_15' [244], '{\\n' => '{\\n_16' [245], '\\"taxon\\":' => '\\"taxon\\":_16' [246], '\\"parent\\":' => '\\"parent\\":_16' [248], '\\"count\\":' => '\\"count\\":_16' [250], '\\"count_norm\\":' => '\\"count_norm\\":_16' [252], '\\"tax_name\\":' => '\\"tax_name\\":_16' [254], '\\"tax_rank\\":' => '\\"tax_rank\\":_16' [256], '\\"family\\"\\n' => '\\"family\\"\\n_3' [257], '},\\n' => '},\\n_16' [258], '{\\n' => '{\\n_17' [259], '\\"taxon\\":' => '\\"taxon\\":_17' [260], '\\"parent\\":' => '\\"parent\\":_17' [262], '815,\\n' => '815,\\n_1' [263], '\\"count\\":' => '\\"count\\":_17' [264], '33,\\n' => '33,\\n_1' [265], '\\"count_norm\\":' => '\\"count_norm\\":_17' [266], '7012,\\n' => '7012,\\n_1' [267], '\\"tax_name\\":' => '\\"tax_name\\":_17' [268], '\\"tax_rank\\":' => '\\"tax_rank\\":_17' [270], '\\"genus\\"\\n' => '\\"genus\\"\\n_4' [271], '},\\n' => '},\\n_17' [272], '{\\n' => '{\\n_18' [273], '\\"taxon\\":' => '\\"taxon\\":_18' [274], '\\"parent\\":' => '\\"parent\\":_18' [276], '816,\\n' => '816,\\n_1' [277], '\\"count\\":' => '\\"count\\":_18' [278], '\\"count_norm\\":' => '\\"count_norm\\":_18' [280], '\\"tax_name\\":' => '\\"tax_name\\":_18' [282], '\\"tax_rank\\":' => '\\"tax_rank\\":_18' [285], '\\"species\\"\\n' => '\\"species\\"\\n_7' [286], '},\\n' => '},\\n_18' [287], '{\\n' => '{\\n_19' [288], '\\"taxon\\":' => '\\"taxon\\":_19' [289], '\\"parent\\":' => '\\"parent\\":_19' [291], '\\"count\\":' => '\\"count\\":_19' [293], '21,\\n' => '21,\\n_1' [294], '\\"count_norm\\":' => '\\"count_norm\\":_19' [295], '4462,\\n' => '4462,\\n_1' [296], '\\"tax_name\\":' => '\\"tax_name\\":_19' [297], '\\"tax_rank\\":' => '\\"tax_rank\\":_19' [299], '\\"genus\\"\\n' => '\\"genus\\"\\n_5' [300], '},\\n' => '},\\n_19' [301], '{\\n' => '{\\n_20' [302], '\\"taxon\\":' => '\\"taxon\\":_20' [303], '\\"parent\\":' => '\\"parent\\":_20' 
[305], '\\"count\\":' => '\\"count\\":_20' [307], '\\"count_norm\\":' => '\\"count_norm\\":_20' [309], '\\"tax_name\\":' => '\\"tax_name\\":_20' [311], '\\"tax_rank\\":' => '\\"tax_rank\\":_20' [313], '\\"genus\\"\\n' => '\\”Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Warning message:
“Duplicated column names deduplicated: '\\n      \\"count\\": 4706' => '\\n      \\"count\\": 4706_1' [14], '\\n      \\"count_norm\\": 1000000' => '\\n      \\"count_norm\\": 1000000_1' [15], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_1' [35], '\\n      \\"parent\\": 482' => '\\n      \\"parent\\": 482_1' [43], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_1' [47], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_2' [53], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_1' [59], '\\n      \\"count\\": 5' => '\\n      \\"count\\": 5_1' [62], '\\n      \\"count_norm\\": 1062' => '\\n      \\"count_norm\\": 1062_1' [63], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_2' [65], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_2' [71], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_3' [77], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_3' [83], '\\n      \\"parent\\": 724' => '\\n      \\"parent\\": 724_1' [85], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_4' [89], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_1' [92], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_1' [93], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_5' [95], '\\n      \\"parent\\": 416916' => '\\n      \\"parent\\": 416916_1' [97], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_6' [101], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_3' [107], '\\n      
\\"count\\": 33' => '\\n      \\"count\\": 33_1' [110], '\\n      \\"count_norm\\": 7012' => '\\n      \\"count_norm\\": 7012_1' [111], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_4' [113], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_7' [119], '\\n      \\"count\\": 21' => '\\n      \\"count\\": 21_1' [122], '\\n      \\"count_norm\\": 4462' => '\\n      \\"count_norm\\": 4462_1' [123], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_5' [125], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_6' [131], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_7' [137], '\\n      \\"count\\": 11' => '\\n      \\"count\\": 11_1' [140], '\\n      \\"count_norm\\": 2337' => '\\n      \\"count_norm\\": 2337_1' [141], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_8' [143], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_8' [149], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_1' [152], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_1' [153], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_9' [155], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_9' [167], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_2' [170], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_2' [171], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_10' [173], '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }' => '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }_1' [179], '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }' => '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }_2' 
[191], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_4' [197], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_5' [203], '\\n      \\"count\\": 1518' => '\\n      \\"count\\": 1518_1' [206], '\\n      \\"count_norm\\": 322566' => '\\n      \\"count_norm\\": 322566_1' [207], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_10' [209], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_11' [215], '\\n      \\"parent\\": 1301' => '\\n      \\"parent\\": 1301_1' [217], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_12' [221], '\\n      \\"parent\\": 1301' => '\\n      \\"parent\\": 1301_2' [223], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_1' [224], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_1' [225], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_13' [227], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_11' [233], '\\n      \\"count\\": 271' => '\\n      \\"count\\": 271_1' [236], '\\n      \\"count_norm\\": 57586' => '\\n      \\"count_norm\\": 57586_1' [237], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_2' [242], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_2' [243], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_6' [245], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_12' [251], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_3' [254], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_3' [255], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_13' [257], '\\n      \\"tax_rank\\": 
\\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_1' [263], '\\n      \\"count\\": 210' => '\\n      \\"count\\": 210_1' [266], '\\n      \\"count_norm\\": 44623' => '\\n      \\"count_norm\\": 44623_1' [267], '\\n      \\"tax_rank\\": \\"order\\"\\n    }' => '\\n      \\"tax_rank\\": \\"order\\"\\n    }_1' [269], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_14' [275], '\\n      \\"count\\": 38' => '\\n      \\"count\\": 38_1' [278], '\\n      \\"count_norm\\": 8074' => '\\n      \\"count_norm\\": 8074_1' [279], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_7' [281], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_3' [284], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_3' [285], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_15' [287], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_2' [290], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_2' [291], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_16' [293], '\\n      \\"parent\\": 838' => '\\n      \\"parent\\": 838_1' [295], '\\n      \\"count\\": 14' => '\\n      \\"count\\": 14_1' [296], '\\n      \\"count_norm\\": 2974' => '\\n      \\"count_norm\\": 2974_1' [297], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_17' [299], '\\n      \\"parent\\": 1224' => '\\n      \\"parent\\": 1224_1' [301], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_2' [305], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_14' [311], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_3' [314], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_3' [315], '\\n      \\"tax_rank\\": 
\\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_3' [317], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_8' [323], '\\n      \\"parent\\": 2' => ”Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Error in FUN(X[[i]], ...): object 'taxon' not found
Traceback:

1. map(data_path, read_json_or_tbl_or_csv)
2. .f(.x[[i]], ...)
3. tryCatch({
 .     jsonlite::fromJSON(data_file)$ubiome_bacteriacounts %>% select(taxon, 
 .         parent, tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 . }, error = function(cond) {
 .     paste(cond)
 .     tryCatch({
 .         tab <- read_table2(data_file) %>% select(taxon, parent, 
 .             tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .             ".", fixed = TRUE)[[1]][1])
 .         assertthat::assert_that(ncol(tab) > 1)
 .         return(tab)
 .     }, error = function(bla) {
 .         paste(bla)
 .         csv <- read_csv(data_file) %>% select(taxon, parent, 
 .             tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .             ".", fixed = TRUE)[[1]][1])
 .         return(csv)
 .     })
 . })   # at line 3-25 of file <text>
4. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
6. value[[3L]](cond)
7. tryCatch({
 .     tab <- read_table2(data_file) %>% select(taxon, parent, tax_name, 
 .         tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 .     assertthat::assert_that(ncol(tab) > 1)
 .     return(tab)
 . }, error = function(bla) {
 .     paste(bla)
 .     csv <- read_csv(data_file) %>% select(taxon, parent, tax_name, 
 .         tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 .     return(csv)
 . })   # at line 10-23 of file <text>
8. tryCatchList(expr, classes, parentenv, handlers)
9. tryCatchOne(expr, names, parentenv, handlers[[1L]])
10. value[[3L]](cond)
11. read_csv(data_file) %>% select(taxon, parent, tax_name, tax_rank, 
  .     count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
  .     ".", fixed = TRUE)[[1]][1])   # at line 19-21 of file <text>
12. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
13. eval(quote(`_fseq`(`_lhs`)), env, env)
14. eval(quote(`_fseq`(`_lhs`)), env, env)
15. `_fseq`(`_lhs`)
16. freduce(value, `_function_list`)
17. function_list[[i]](value)
18. select(., taxon, parent, tax_name, tax_rank, count, count_norm)
19. select.data.frame(., taxon, parent, tax_name, tax_rank, count, 
  .     count_norm)
20. select_vars(names(.data), !(!(!quos(...))))
21. map_if(ind_list, !is_helper, eval_tidy, data = names_list)
22. map(.x[matches], .f, ...)
23. lapply(.x, .f, ...)
24. FUN(X[[i]], ...)
@HadrienG

HadrienG commented May 14, 2018

Thanks for the traceback. Yes, I did run it on Open Humans.

I'll take a look, but I think it's because one of the JSON files has unquoted strings (which is not valid JSON).
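
A minimal sketch of how the reader could check the input explicitly instead of relying on nested `tryCatch()` calls, so that a file with unquoted strings is skipped with a warning rather than ending in `object 'taxon' not found`. The helper name `read_ubiome_file` is hypothetical and not the notebook's function; the column names are the ones selected in the traceback, and base R `read.table()` stands in for the readr readers to keep the example self-contained.

```r
library(jsonlite)

# Columns the notebook selects from every file (taken from the traceback).
expected_cols <- c("taxon", "parent", "tax_name", "tax_rank", "count", "count_norm")

# Hypothetical helper: returns a data frame, or NULL when the file cannot be read.
read_ubiome_file <- function(data_file) {
  txt <- paste(readLines(data_file, warn = FALSE), collapse = "\n")
  sample_id <- strsplit(basename(data_file), ".", fixed = TRUE)[[1]][1]

  tab <- NULL
  if (jsonlite::validate(txt)) {
    # Valid JSON: take the bacteria counts table, as the notebook does.
    parsed <- jsonlite::fromJSON(txt)
    if (is.list(parsed)) tab <- parsed$ubiome_bacteriacounts
  } else {
    # Not valid JSON: try tab-delimited text first, then comma-delimited text.
    for (sep in c("\t", ",")) {
      candidate <- tryCatch(
        suppressWarnings(read.table(text = txt, sep = sep, header = TRUE,
                                    quote = "\"", fill = TRUE, comment.char = "",
                                    stringsAsFactors = FALSE)),
        error = function(e) NULL
      )
      if (!is.null(candidate) && all(expected_cols %in% names(candidate))) {
        tab <- candidate
        break
      }
    }
  }

  # Check the columns explicitly instead of letting select() fail later.
  if (!is.data.frame(tab) || !all(expected_cols %in% names(tab))) {
    warning("Skipping unreadable file: ", data_file, call. = FALSE)
    return(NULL)
  }
  tab <- tab[, expected_cols]
  tab$id <- sample_id
  tab
}
```

Files that return `NULL` can then be dropped before binding the rest together, for example with `Filter(Negate(is.null), lapply(data_path, read_ubiome_file))`, where `data_path` is the vector of file paths from the traceback.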

The warnings can safely be ignored; maybe I should wrap the call in an invisible(). Or do you know a Jupyter way to suppress the warnings for one cell? Something like the R Markdown chunk option {r cell_name, warning=FALSE}.
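
One option that does not depend on Jupyter at all is to wrap only the noisy call in base R's `suppressWarnings()` and `suppressMessages()`. Note that `invisible()` only hides the printed return value and leaves warnings untouched. A small sketch, where `noisy_read` is a hypothetical stand-in for the import call:

```r
# Hypothetical stand-in for the import step: emits a message and a warning,
# similar in kind to the readr output pasted above.
noisy_read <- function() {
  message("Parsed with column specification:")
  warning("Duplicated column names deduplicated")
  data.frame(taxon = 1L)
}

# Only this call is silenced; warnings elsewhere in the notebook still show.
result <- suppressMessages(suppressWarnings(noisy_read()))
```

The "Parsed with column specification" lines are messages rather than warnings, which is why both wrappers are used here.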

@gedankenstuecke

Member

gedankenstuecke commented May 14, 2018

Great, thanks so much! I was just asking as otherwise different package versions might be to blame for some weird behavior.

I'm not sure how best to turn off the warnings (my own R skills are basically limited to making things look nice thanks to the ggplot2 universe). According to Google, options(warn=-1) might do the trick to turn the warnings off globally.
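
Since `options()` applies to the whole R session, the setting would stay in effect for every later cell in the notebook. A hedged sketch of limiting it to one step by saving and restoring the previous value (the `as.numeric()` call is just a placeholder for something that warns):

```r
# options() returns the previous values, so they can be restored afterwards.
old_opts <- options(warn = -1)

# Placeholder for the noisy step; this would normally warn about NAs
# introduced by coercion.
result <- as.numeric("not a number")

# Restore the earlier warning level so later cells report warnings again.
options(old_opts)
```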
