# SCP 2657 Add checks for R formatted files for dense matrix file format validations #127
Codecov Report
@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
+ Coverage   52.35%   52.57%   +0.21%
==========================================
  Files          22       22
  Lines        2611     2623      +12
==========================================
+ Hits         1367     1379      +12
  Misses       1244     1244
data_arrays = []
for all_cell_model in self.set_data_array_cells(self.header[1:], ObjectId()):
# Cell names are formatted differently in R files
cell_names = self.header if self.is_r_file else self.header[1:]
Like this enhancement - it makes filter_expression_scores much more regular.
jlchang
left a comment
Requesting minor change to test_filter_expression_scores. Otherwise looks good.
float(expression_score)
):
valid_expression_scores.append(expression_score)
# add one to account for gene name in scores list
This code comment is out of date and should be removed, since cell_names is now passed in for cells instead of self.header.
tests/test_dense.py
Outdated
  scores = ["BRCA1", 4, 0, 3, "0", None, "", " ", "nan", "NaN", "Nan", "NAN"]
- cells = ["foo", "foo2", "foo3", "foo4", "foo5", "foo6", "foo7"]
+ cells = ["foo", "foo2", "foo3", "foo4", "foo5"]
  actual_filtered_values, actual_filtered_cells = DenseIngestor.filter_expression_scores(
      scores[1:], cells
  )
  self.assertEqual([4, 3], actual_filtered_values)
- self.assertEqual(["foo2", "foo4"], actual_filtered_cells)
+ self.assertEqual(["foo", "foo3"], actual_filtered_cells)
The test should include another valid value after all the values that are expected to be filtered out. That would demonstrate that the filtered values and cell names do not get "out of sync" during filtering because of the non-numeric and NaN values.
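Something along these lines might work. This is only a sketch: `filter_expression_scores` below is a hypothetical stand-in written for illustration, not the actual `DenseIngestor` method, and the trailing `5` and the `foo1`..`foo12` cell names are assumptions. The point is that a valid score placed after the filtered run should still map to the last cell name.

```python
import math


def filter_expression_scores(scores, cells):
    """Hypothetical stand-in: keep non-zero, numeric, non-NaN scores and their cells."""
    valid_scores, valid_cells = [], []
    for score, cell in zip(scores, cells):
        try:
            value = float(score)
        except (TypeError, ValueError):
            # None, "" and " " cannot be converted to float
            continue
        if value == 0 or math.isnan(value):
            # Zeros and every spelling of "nan" are dropped
            continue
        valid_scores.append(score)
        valid_cells.append(cell)
    return valid_scores, valid_cells


# Same scores as the current test, plus one valid value at the end
scores = ["BRCA1", 4, 0, 3, "0", None, "", " ", "nan", "NaN", "Nan", "NAN", 5]
# One cell name per score (the gene name is excluded): foo1 .. foo12
cells = [f"foo{i}" for i in range(1, len(scores))]

filtered_values, filtered_cells = filter_expression_scores(scores[1:], cells)
```

If values and cell names drifted apart while the NaN entries were skipped, the final cell name would not be `foo12`.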
devonbush
left a comment
Approve pending response to Jean's comments
The transform function did not take R-formatted files into account. An index out-of-range error would have occurred because the header in those files is one entry shorter than the other rows.
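To illustrate the failure mode, here is a minimal sketch. The helper `cell_names_from_header` and the sample header and row are made up for this example; only the conditional expression mirrors the `cell_names` line in the diff.

```python
def cell_names_from_header(header, is_r_file):
    """Hypothetical helper mirroring the cell_names expression in the diff."""
    # R-formatted headers have no leading gene column, so every entry is a cell name
    return header if is_r_file else header[1:]


# Assumed sample data: an R-style header is one entry shorter than each data row
r_header = ["cell_a", "cell_b"]
row = ["BRCA1", "1.5", "0"]
scores = row[1:]

# Old behavior: always dropping the first header entry loses a cell name
naive_names = r_header[1:]
try:
    naive_pairs = [(naive_names[i], score) for i, score in enumerate(scores)]
except IndexError:
    # Two scores but only one remaining name
    naive_pairs = None

# New behavior: keep the whole header for R files
fixed_names = cell_names_from_header(r_header, is_r_file=True)
fixed_pairs = list(zip(fixed_names, scores))
```

With the R-aware branch, each score lines up with its own cell name and no lookup runs past the end of the list.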