Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong results if the data is not sorted alphabetically #29

Open
AlexWeinreb opened this issue May 2, 2022 · 0 comments
Open

Wrong results if the data is not sorted alphabetically #29

AlexWeinreb opened this issue May 2, 2022 · 0 comments

Comments

@AlexWeinreb
Copy link

During the standard analysis, filter_missval() silently sorts the rowData alphabetically. If we skip this step, this results in a rowData where the rownames and the name column contain the same information, but add_rejections() later sorts on the name column without reordering rownames, getting values that do not match. This creates problems later on when functions such as single_plot() use subset(), which work on rownames, then tries to use the name column.

For example, with the steps from ?add_rejections:

# Load example
data <- UbiLength
data <- data[data$Reverse != "+" & data$Potential.contaminant != "+",]
data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")

# Make SummarizedExperiment
columns <- grep("LFQ.", colnames(data_unique))
exp_design <- UbiLength_ExpDesign
se <- make_se(data_unique, columns, exp_design)

# Filter, normalize and impute missing values
# filt <- filter_missval(se, thr = 0)  # <- no filtering step: no sorting!
norm <- normalize_vsn(se)
imputed <- impute(norm, fun = "MinProb", q = 0.01)

# Test for differentially expressed proteins
diff <- test_diff(imputed, "control", "Ctrl")
dep <- add_rejections(diff, alpha = 0.05, lfc = 1)

Then the resulting object has wrong names:

SummarizedExperiment::rowData(dep, use.names = TRUE) |>
  as.data.frame() |> head() |> dplyr::select(name, ID)
#>             name       ID
#> RBM47   A7E2Y1-3 A7E2Y1-3
#> UBA6        AAAS   Q9NRG9
#> ILVBL      AAMDC Q9H7C9-3
#> FAM92A1     AAR2   Q9Y312
#> ACO2       ABCB6   H7BXK9
#> RPRD1B     ABCB7 O75027-3


plot_single(dep, proteins = c("A7E2Y1-3","AAAS"))
#> Warning: 6 parsing failures.
#> row col           expected actual
#>   1  -- value in level set  ACTR3
#>   2  -- value in level set  ACTR3
#>   3  -- value in level set  ACTR3
#>   4  -- value in level set  TUFM 
#>   5  -- value in level set  TUFM 
#> ... ... .................. ......
#> See problems(...) for more details.

For users: always use filter_missval(), even with a high threshold.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant