Error in a dataset with 78550 rows 10 columns #6

Closed
joniarroba opened this issue Jun 16, 2019 · 4 comments
@joniarroba

Running inspect_cat() on the full dataset gives the following error.

Data frame characteristics:
df = 78550 rows 10 columns

str(df)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 78550 obs. of 10 variables:
$ date : Date, format: "2015-02-15" "2015-02-15" "2015-02-15" ...
$ author : Factor w/ 5 levels "‎ vazio","joni",..: 2 2 3 3 2 3 3 2 3 2 ...
$ message : chr "Oi Nubi" "Bom dia" "Bom dia!" "\U0001f60a" ...
$ msn_lengh : int 7 7 9 1 52 30 11 34 24 31 ...
$ day : int 15 15 15 15 15 15 15 15 15 15 ...
$ week : num 7 7 7 7 7 7 7 7 7 7 ...
$ month : num 2 2 2 2 2 2 2 2 2 2 ...
$ year : num 2015 2015 2015 2015 2015 ...
$ question_flag: chr "N" "N" "N" "N" ...
$ laughs : chr "N" "N" "N" "N" ...

Error message

inspect_cat(df)

Column (2/5): author
Error: Tibble columns must have consistent lengths, only values of length one are recycled:

  • Length 6: Column value
  • Length 11: Column prop

Call rlang::last_error() to see a backtrace

When I sample it down to 10k rows it works. I'm still looking into the problem.

@alastairrushworth
Owner

alastairrushworth commented Jun 17, 2019

Hi @joniarroba thanks for the feedback.

It looks like the column author is causing the issue. So that I can pin down the cause, could you paste the output of the following commands:

  • table(df$author, useNA = 'ifany')
  • levels(df$author)
  • rlang::last_error()

Thanks.

@alastairrushworth alastairrushworth self-assigned this Jun 17, 2019
@joniarroba
Author

joniarroba commented Jun 27, 2019

Hi @alastairrushworth , thanks for the support, here we go with the messages!

First

table(df$author, useNA = 'ifany')

‎ vazio joni nubia ‎vazio vazio
0 42333 36195 0 0

Second

levels(df$author)
[1] "‎ vazio" "joni" "nubia" "‎vazio" "vazio"

Third

rlang::last_error()

message: Tibble columns must have consistent lengths, only values of length one are recycled:

  • Length 8: Column value
  • Length 12: Column prop

class: rlang_error
backtrace:
  1. inspectdf::inspect_cat(whats)
  2. inspectdf:::fast_table(df_cat[[i]], show_na = TRUE, show_cnt = TRUE)
  3. tibble::tibble(value = vals, prop = freq/length(v))
  4. tibble:::lst_to_tibble(xlq$output, .rows, .name_repair, lengths = xlq$lengths)
  5. tibble:::recycle_columns(x, .rows, lengths)

Call rlang::last_trace() to see the full backtrace

In the meantime I will find a way to solve it!
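The three vazio entries in the levels(df$author) output look nearly identical when printed, which suggests they may differ only by invisible characters. A minimal sketch for exposing such characters, using a toy vector in place of the real levels; the U+200E (left-to-right mark) prefix is an assumption, not something confirmed in this thread:

```r
# Toy stand-in for levels(df$author); the U+200E prefix is an assumption.
lv <- c("\u200e vazio", "joni", "nubia", "\u200evazio", "vazio")

# Convert each level to its Unicode code points so that invisible
# characters become visible in the printed output.
code_points <- vapply(
  lv,
  function(x) paste(sprintf("U+%04X", utf8ToInt(x)), collapse = " "),
  character(1),
  USE.NAMES = FALSE
)
print(code_points)

# Flag levels that contain anything outside printable ASCII.
has_hidden <- grepl("[^\\x20-\\x7E]", lv, perl = TRUE)
print(has_hidden)
```

On the real data, replacing lv with levels(df$author) would show whether the extra levels carry hidden characters.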

@alastairrushworth
Owner

Hi @joniarroba thanks for the update! Glad you've found a way to solve this.

I think the problem is caused by duplicated factor levels in df$author: vazio appears more than once in levels(df$author). I think you can fix this by resetting the levels using something like

levels(df$author) <- unique(levels(df$author))
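One caveat: in the table() output earlier in the thread, the vazio variants all have zero counts, and if they differ by invisible characters then unique() would leave them untouched. A minimal sketch of two alternatives on a toy factor; the sample values and the U+200E prefix are assumptions for illustration, and this is not the fix that was added to inspectdf:

```r
# Toy stand-in for df$author; the U+200E prefix and values are assumptions.
lv <- c("\u200e vazio", "joni", "nubia", "\u200evazio", "vazio")
author <- factor(c("joni", "nubia", "joni"), levels = lv)

# Option 1: drop levels that never occur in the data.
author_dropped <- droplevels(author)
levels(author_dropped)  # "joni" "nubia"

# Option 2: strip the hidden character and surrounding whitespace from the
# labels. Assigning the same label to several levels merges them.
clean <- trimws(gsub("\u200e", "", levels(author), fixed = TRUE))
author_merged <- author
levels(author_merged) <- clean
levels(author_merged)  # "vazio" "joni" "nubia"
```

Option 1 is the simpler choice when the extra levels are genuinely unused; Option 2 keeps a single vazio level in case it is needed later.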

In the meantime, I'll open a new issue to handle this internally with a warning in inspectdf.

Thanks!

@alastairrushworth
Owner

I've added a factor check to the GH version of the package. So hopefully this fixes the problem you were having. Thanks!
