In [None]:
from metapool import KLSampleSheet, validate_sample_sheet

# Knight Lab Sample Sheet Validation

This notebook is designed to validate and troubleshoot sample sheets of externally generated plates.

The steps are as follows:

1. Parse sample sheet.
1. Check that all the required columns in the `Data` section are present.
1. Check that the `Bioinformatics` and `Contact` sections are present.
1. Validate and scrub sample identifiers so they are compliant with Illumina's `bcl2fastq` software.
    - Automatically replace disallowed characters with underscores.
    - Flag non-unique sample identifiers.
1. Check that lane values are not empty.
1. Check that projects in the `Data`, `Bioinformatics` and `Contact` sections are all valid.
1. Validate the Qiita study identifier suffix at the end of every project name.
1. Save the parsed file in a compliant format.
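
The identifier and project-name checks in the list above can be sketched in plain Python. This is only an illustration, not `metapool`'s own implementation: the helper names (`scrub_sample_id`, `find_duplicates`, `has_qiita_suffix`) are hypothetical, and both patterns are assumptions. The first assumes only letters, digits, hyphens and underscores are allowed in sample identifiers. The second assumes a project name ends in an underscore followed by a numeric Qiita study ID.

```python
import re
from collections import Counter

# Assumption: only letters, digits, hyphens and underscores are allowed
# in a sample identifier; everything else gets replaced.
DISALLOWED = re.compile(r'[^0-9A-Za-z_-]')

# Assumption: a project name ends in "_<digits>", where the digits are
# the Qiita study identifier.
QIITA_SUFFIX = re.compile(r'_\d+$')


def scrub_sample_id(sample_id):
    """Replace every disallowed character with an underscore."""
    return DISALLOWED.sub('_', sample_id)


def find_duplicates(sample_ids):
    """Return the identifiers that appear more than once, sorted."""
    counts = Counter(sample_ids)
    return sorted(s for s, n in counts.items() if n > 1)


def has_qiita_suffix(project):
    """Check whether the project name ends in a numeric suffix."""
    return QIITA_SUFFIX.search(project) is not None


# Hypothetical identifiers, for illustration only.
raw_ids = ['sample.1', 'sample 1', 'sample-2']
scrubbed = [scrub_sample_id(s) for s in raw_ids]

print(scrubbed)
print(find_duplicates(scrubbed))
print(has_qiita_suffix('Project_1234'), has_qiita_suffix('Project'))
```

In this sketch, scrubbing can itself create duplicates: `sample.1` and `sample 1` both become `sample_1`. That is why duplicates are flagged after the replacement, not before.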

**Note**: warning and error messages (text highlighted in red) will inform you of any problems that may come up.

**Enter the correct path to the sample sheet you want to validate**: replace the path to `good-sample-sheet.csv` with the location of the sheet you want to validate.

In [None]:
sheet = KLSampleSheet('metapool/tests/data/good-sample-sheet.csv')
valid_sheet = validate_sample_sheet(sheet)

If there are any error messages, correct the sample sheet and re-run the cell above. Once you are happy with the results, run the cell below to save the validated sheet. If the sheet still has errors, that cell will raise an exception.

In [None]:
with open('validated-sample-sheet.csv', 'w') as f:
    valid_sheet.write(f)