Merge pull request #6 from biowdl/extrapropsoverride
Add extra feature that makes it easier to add properties for samples in csv format.
rhpvorderman committed Oct 8, 2019
2 parents 7509531 + 40b5733 commit 9ba23a9
Showing 6 changed files with 60 additions and 4 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.rst
@@ -9,6 +9,8 @@ Changelog
0.2.0-dev
---------------
+ Make sure only one line of additional properties per sample is needed in a
csv file.
+ Fix a bug where an empty field for an additional property in a csv
samplesheet would be defined as ``""`` instead of ``None``.

19 changes: 19 additions & 0 deletions docs/index.rst
@@ -80,6 +80,25 @@ Additional properties at the sample level can be set using additional columns:
"s1","lib1","rg1","r1_1.fq",,"r1_2.fq",,"yes","pizza"
"s2","lib1","rg1","r2_1.fq",,"r2_2.fq",,"no","broccoli"
Additional properties for the same sample only have to be defined in one line.
This saves a lot of duplication for samples with a high readgroup or library
count and makes it easier to read the file.

.. code-block:: text
"sample","library","readgroup","R1","R1_md5","R2","R2_md5","HiSeq4000","other_property"
"s1","lib1","rg1","r1_1.fq",,"r1_2.fq",,"yes","pizza"
"s1","lib1","rg2","r1_1.fq",,"r1_2.fq",,,
"s1","lib2","rg1","r1_1.fq",,"r1_2.fq",,,
"s2","lib1","rg1","r2_1.fq",,"r2_2.fq",,"no","broccoli"
"s2","lib1","rg2","r2_1.fq",,"r2_2.fq",,,
"s2","lib1","rg3","r2_1.fq",,"r2_2.fq",,,
If an additional column is filled with two conflicting values for the same
sample, a ``ValueError`` is raised.
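
The merging rule described above can be sketched in isolation. This is a minimal, standalone illustration and not the code from this commit: the helper name ``merge_additional_properties`` and the ``FIXED_COLUMNS`` set are assumptions made for the example, and the CSV text is parsed with the standard library ``csv.DictReader``.

```python
import csv
import io
from typing import Dict, Optional

# Assumption: these are the columns with a fixed meaning, taken from the
# header of the example above. Every other column is an additional property.
FIXED_COLUMNS = {"sample", "library", "readgroup",
                 "R1", "R1_md5", "R2", "R2_md5"}


def merge_additional_properties(
        csv_text: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Hypothetical helper: collect additional properties per sample."""
    merged: Dict[str, Dict[str, Optional[str]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        sample = row["sample"]
        props = merged.setdefault(sample, {})
        for key, value in row.items():
            if key in FIXED_COLUMNS:
                continue
            # An empty field means "not specified on this line".
            new_value = value if value != "" else None
            old_value = props.get(key)
            if old_value is None:
                props[key] = new_value
            elif new_value is not None and new_value != old_value:
                # Two different non-empty values for one sample.
                raise ValueError(
                    f"Conflicting fields in column '{key}' for sample "
                    f"'{sample}'!")
    return merged


SHEET = '''"sample","library","readgroup","R1","R1_md5","R2","R2_md5","HiSeq4000","other_property"
"s1","lib1","rg1","r1_1.fq",,"r1_2.fq",,"yes","pizza"
"s1","lib1","rg2","r1_1.fq",,"r1_2.fq",,,
"s2","lib1","rg1","r2_1.fq",,"r2_2.fq",,"no","broccoli"
'''
result = merge_additional_properties(SHEET)
```

In this sketch the second ``s1`` line leaves both extra columns empty, so the values from the first line are kept. A line that set ``HiSeq4000`` to ``"no"`` for ``s1`` would raise the ``ValueError`` instead.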

Creating comma-delimited files
------------------------------
These files can be easily generated using a spreadsheet program (such as
Microsoft Excel or LibreOffice Calc).
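
As an alternative to a spreadsheet program, a samplesheet could also be written from a script. The sketch below is only an illustration under stated assumptions: ``write_samplesheet``, ``HEADER`` and ``ROWS`` are hypothetical names, and it assumes that fully quoted fields (including empty ones written as ``""``) are acceptable input, which this commit does not confirm. It uses only the standard library ``csv`` module.

```python
import csv
import io
from typing import List

# Column names copied from the example header in this document.
HEADER = ["sample", "library", "readgroup", "R1", "R1_md5", "R2", "R2_md5",
          "HiSeq4000", "other_property"]

# Additional properties are filled in on the first line of a sample only.
ROWS = [
    ["s1", "lib1", "rg1", "r1_1.fq", "", "r1_2.fq", "", "yes", "pizza"],
    ["s1", "lib1", "rg2", "r1_1.fq", "", "r1_2.fq", "", "", ""],
]


def write_samplesheet(rows: List[List[str]]) -> str:
    """Hypothetical helper: render a header plus rows as quoted CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


sheet_text = write_samplesheet(ROWS)
```

Reading ``sheet_text`` back with ``csv.DictReader`` yields empty strings for the blank fields, the same as for unquoted empty fields.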

22 changes: 18 additions & 4 deletions src/biowdl_input_converter/input_conversions.py
@@ -102,8 +102,22 @@ def samplesheet_csv_to_samplegroup(samplesheet_file: Path) -> SampleGroup:
         }
         # Add all remaining properties to additional properties at the
         # sample level
-        samples[sample]["additional_properties"] = {
-            key: value if value != "" else None
-            for key, value in row_dict.items()
-        }
+        if "additional_properties" not in samples[sample].keys():
+            samples[sample]["additional_properties"] = {}
+        for key, value in row_dict.items():
+            existing_value = samples[sample][
+                "additional_properties"].get(key, None)
+
+            updated_value = value if value != "" else None
+
+            if existing_value is None:
+                samples[sample]["additional_properties"][key] = updated_value
+            else:
+                if (updated_value is not None
+                        and existing_value != updated_value):
+                    raise ValueError(
+                        f"Conflicting fields in column '{key}' for sample "
+                        f"'{sample}'!"
+                    )
+
     return SampleGroup.from_dict_of_dicts(samples)
3 changes: 3 additions & 0 deletions tests/files/conflicting_properties.csv
@@ -0,0 +1,3 @@
"sample","library","readgroup","R1","R1_md5","R2","R2_md5","extra_field1"
"s1","lib1","rg1","r1.fq","hello","r2.fq","hey","xf1"
"s1","lib2","rg1","r1.fq","aa","r2.fq","bb","xfI"
3 changes: 3 additions & 0 deletions tests/files/mixed_empty_filled_addprops.csv
@@ -0,0 +1,3 @@
"sample","library","readgroup","R1","R1_md5","R2","R2_md5","extra_field1","extra_field2"
"s1","lib1","rg1","r1.fq","hello","r2.fq","hey","xf1","xf2"
"s1","lib2","rg1","r1.fq","aa","r2.fq","bb","xf1",
15 changes: 15 additions & 0 deletions tests/test_import_samplesheet_csv.py
@@ -70,6 +70,21 @@ def test_extra_field():
     assert samplesheet[1].additional_properties["extra_field2"] is None
 
 
+def test_mixed_empty_and_filled_additional_properties():
+    samplesheet = samplesheet_csv_to_samplegroup(
+        FILESDIR / Path("mixed_empty_filled_addprops.csv"))
+    assert len(samplesheet.samples) == 1
+    assert samplesheet[0].additional_properties["extra_field2"] == "xf2"
+
+
+def test_conflicting_properties():
+    with pytest.raises(ValueError) as error:
+        samplesheet_csv_to_samplegroup(
+            FILESDIR / Path("conflicting_properties.csv")
+        )
+    error.match("Conflicting fields in column 'extra_field1' for sample 's1'")
+
+
 def test_duplicate_readgroup():
     with pytest.raises(ValueError) as error:
         samplesheet_csv_to_samplegroup(