# SCP 2657 Add checks for R formatted files for dense matrix file format validations #127

knapii-developments · 2020-08-11T17:51:11Z

The transform function did not take R-formatted files into account. An index-out-of-range error would have occurred because the header of an R-formatted file has one fewer element than the other rows.
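To make the mismatch concrete, here is a minimal sketch of how cell names might be picked from either header style. The helper `get_cell_names` and its length-based detection rule are assumptions for illustration only; the PR itself relies on `self.is_r_file`, whose exact logic is not shown in this conversation.

```python
def get_cell_names(header, first_row):
    """Return the cell names from a dense matrix header.

    Hypothetical helper: it assumes a file is R-formatted when the header
    has exactly one fewer column than the first data row.
    """
    if len(header) == len(first_row) - 1:
        # R-style header: no leading label column, so every entry is a cell name
        return list(header)
    # Standard header: the first entry labels the gene column (e.g. "GENE")
    return list(header[1:])


# Standard file: header and data rows have the same number of columns
standard = get_cell_names(["GENE", "cell1", "cell2"], ["BRCA1", "1.0", "0.0"])

# R-formatted file: header is one column shorter than the data rows
r_style = get_cell_names(["cell1", "cell2"], ["BRCA1", "1.0", "0.0"])
```

Both calls yield the same two cell names, whereas slicing `header[1:]` unconditionally would drop `cell1` from the R-formatted header and leave fewer cell names than scores.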

codecov · 2020-08-13T18:53:27Z

Codecov Report

Merging #127 into master will increase coverage by 0.21%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
+ Coverage   52.35%   52.57%   +0.21%     
==========================================
  Files          22       22              
  Lines        2611     2623      +12     
==========================================
+ Hits         1367     1379      +12     
  Misses       1244     1244

| Impacted Files | Coverage Δ | |
|---|---|---|
| ingest/expression_files/dense_ingestor.py | `96.45% <100.00%> (+0.32%)` | ⬆️ |
| ingest/expression_files/expression_files.py | `81.57% <100.00%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d713a04...db2ddc5.

bistline · 2020-08-13T19:08:27Z

ingest/expression_files/dense_ingestor.py

        data_arrays = []
-        for all_cell_model in self.set_data_array_cells(self.header[1:], ObjectId()):
+        # Cell names are formatted differently in R files
+        cell_names = self.header if self.is_r_file else self.header[1:]


I like this enhancement; it makes `filter_expression_scores` much more regular.

jlchang

Requesting minor change to test_filter_expression_scores. Otherwise looks good.

jlchang · 2020-08-13T22:37:38Z

ingest/expression_files/dense_ingestor.py

                        float(expression_score)
                    ):
                        valid_expression_scores.append(expression_score)
                        # add one to account for gene name in scores list


This comment is out of date and should be removed, since `cell_names` is now passed in for cells instead of `self.header`.

jlchang · 2020-08-14T13:17:44Z

tests/test_dense.py

        scores = ["BRCA1", 4, 0, 3, "0", None, "", "   ", "nan", "NaN", "Nan", "NAN"]
-        cells = ["foo", "foo2", "foo3", "foo4", "foo5", "foo6", "foo7"]
+        cells = ["foo", "foo2", "foo3", "foo4", "foo5"]
        actual_filtered_values, actual_filtered_cells = DenseIngestor.filter_expression_scores(
            scores[1:], cells
        )
        self.assertEqual([4, 3], actual_filtered_values)
-        self.assertEqual(["foo2", "foo4"], actual_filtered_cells)
+        self.assertEqual(["foo", "foo3"], actual_filtered_cells)


The test should include another valid value after all the values that are expected to be filtered out. That would demonstrate that the filtered values and cell names do not get "out of sync" during filtering because of the non-numeric NaN values.
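A sketch of what such a test could look like follows. The `filter_expression_scores` function below is a stand-in written for this example, assuming the method keeps nonzero numeric scores together with the cell at the same position; it is not the actual `DenseIngestor.filter_expression_scores` implementation, and the cell names are made up.

```python
import math


def filter_expression_scores(scores, cells):
    """Stand-in filter (assumption): keep nonzero numeric scores and the
    cell names found at the same positions."""
    valid_scores, valid_cells = [], []
    for idx, score in enumerate(scores):
        try:
            value = float(score)
        except (TypeError, ValueError):
            continue  # None, "" and whitespace-only strings are not numeric
        if value == 0 or math.isnan(value):
            continue  # zeros and every spelling of NaN are filtered out
        valid_scores.append(score)
        valid_cells.append(cells[idx])
    return valid_scores, valid_cells


# A trailing valid score (2) follows all the values expected to be filtered
scores = ["BRCA1", 4, 0, 3, "0", None, "", "   ", "nan", "NaN", "Nan", "NAN", 2]
# One cell name per score: "foo", "foo2", ..., "foo12"
cells = ["foo"] + [f"foo{i}" for i in range(2, 13)]

values, kept_cells = filter_expression_scores(scores[1:], cells)
```

If a filtered value shifted the pairing, the last kept cell would no longer be `foo12`, so asserting on it catches an off-by-one between scores and cell names.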

devonbush

Approve pending response to Jean's comments

knapii-developments added 3 commits August 11, 2020 11:31

Filter for r files

87d510f

Modify tests

15075ca

Fix duplicate genes test

9e092b3

Merge

e419cc3

knapii-developments marked this pull request as ready for review August 13, 2020 18:59

knapii-developments requested review from bistline, devonbush and jlchang August 13, 2020 18:59

bistline approved these changes Aug 13, 2020

jlchang suggested changes Aug 14, 2020

devonbush approved these changes Aug 17, 2020

Add review suggestions

5f41495

knapii-developments requested a review from jlchang August 17, 2020 15:41

knapii-developments and others added 4 commits August 17, 2020 11:41

Merge branch 'master' into ea-r-validations

8856003

Adjust r format logic

4a0dbf2

Merge

bfa76d7

Add confidence

db2ddc5

jlchang approved these changes Aug 17, 2020

knapii-developments merged commit 9b90193 into master Aug 17, 2020

knapii-developments deleted the ea-r-validations branch October 16, 2020 12:35


5 participants
