
updated doco

alimanfoo committed Jul 21, 2011
1 parent 227ea3e commit ff4e7798313f349f3521d985f5212133d1a93dbb
Showing with 81 additions and 9 deletions.
  1. +78 −6 README.txt
  2. +2 −2 csvvalidator.py
  3. +1 −1 tests.py
README.txt
@@ -1,8 +1,80 @@
-=============
-CSV Validator
-=============
+============
+csvvalidator
+============
-CSV Validator is a small library to support validation of tabular data contained
-in delimited file formats.
+This module provides some simple utilities for validating data contained in CSV
+files, or other similar data sources.
+
+Note that the `csvvalidator` module is intended to be used in combination with
+the standard Python `csv` module. The `csvvalidator` module **will not**
+validate the *syntax* of a CSV file. Rather, the `csvvalidator` module can be
+used to validate any source of row-oriented data, such as is provided by a
+`csv.reader` object.
+
+That is, to validate data from a CSV file, first construct a CSV reader using
+the standard Python `csv` module, specifying the appropriate dialect, and then
+pass the CSV reader as the source of data to either the
+`CSVValidator.validate` or the `CSVValidator.ivalidate` method.
+
+The `CSVValidator` class is the foundation for all validator objects that are
+capable of validating CSV data.
+
+You can use the `CSVValidator` class to dynamically construct a validator, e.g.::
+
+    import sys
+    import csv
+    from csvvalidator import *
+
+    field_names = (
+        'study_id',
+        'patient_id',
+        'gender',
+        'age_years',
+        'age_months',
+        'date_inclusion'
+    )
+
+    validator = CSVValidator(field_names)
+
+    # basic header and record length checks
+    validator.add_header_check('EX1', 'bad header')
+    validator.add_record_length_check('EX2', 'unexpected record length')
+
+    # some simple value checks
+    validator.add_value_check('study_id', int,
+                              'EX3', 'study id must be an integer')
+    validator.add_value_check('patient_id', int,
+                              'EX4', 'patient id must be an integer')
+    validator.add_value_check('gender', enumeration('M', 'F'),
+                              'EX5', 'invalid gender')
+    validator.add_value_check('age_years', number_range_inclusive(0, 120, int),
+                              'EX6', 'invalid age in years')
+    validator.add_value_check('date_inclusion', datetime_string('%Y-%m-%d'),
+                              'EX7', 'invalid date')
+
+    # a more complicated record check: age_months must fall within age_years
+    def check_age_variables(r):
+        age_years = int(r['age_years'])
+        age_months = int(r['age_months'])
+        valid = (age_months >= age_years * 12 and
+                 age_months < (age_years + 1) * 12)
+        if not valid:
+            raise ValueError(age_years, age_months)
+    validator.add_record_check(check_age_variables,
+                               'EX8', 'invalid age variables')
+
+    # validate the data and write problems to stdout
+    data = csv.reader(open('/path/to/data.csv'), delimiter='\t')
+    problems = validator.validate(data)
+    write_problems(problems, sys.stdout)
+
+For more complex use cases you can also sub-class `CSVValidator` to define
+re-usable validator classes for specific data sources.
+
+The source code for this module lives at:
+
+ https://github.com/alimanfoo/csvvalidator
+
+For a complete account of all of the functionality available from this module,
+see the example.py and tests.py modules in the source code repository.
-TODO finish this
csvvalidator.py
@@ -66,8 +66,8 @@ def check_age_variables(r):
problems = validator.validate(data)
write_problems(problems, sys.stdout)
-You can also sub-class `CSVValidator` to define re-usable validator classes for
-your specific data sources.
+For more complex use cases you can also sub-class `CSVValidator` to define
+re-usable validator classes for specific data sources.
The source code for this module lives at:
tests.py
@@ -1,5 +1,5 @@
"""
-TODO
+Tests for the `csvvalidator` module.
"""

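The README text stresses that the data source must be a `csv.reader` (or another source of rows), and the standard library's `csv.reader` in turn expects an iterable of lines such as an open file, not a path string. The following is a minimal standalone sketch of that flow using only the standard library; it does not call `csvvalidator` itself. The sample data, the `SAMPLE` constant, and the `find_age_problems` helper are hypothetical and exist only for illustration, and the age check is written as an explicit bounds comparison, which is one assumed reading of the intended rule.

```python
import csv
import io

# Hypothetical tab-delimited sample data using the README's field names.
SAMPLE = (
    "study_id\tpatient_id\tgender\tage_years\tage_months\tdate_inclusion\n"
    "1\t1\tM\t5\t62\t2011-07-21\n"
    "1\t2\tF\t5\t100\t2011-07-21\n"
    "1\t3\tF\t0\t7\t2011-07-21\n"
)


def check_age_variables(r):
    """Raise ValueError unless age_months lies within the year given by age_years."""
    age_years = int(r['age_years'])
    age_months = int(r['age_months'])
    # Assumed rule: age_years * 12 <= age_months <= age_years * 12 + 11
    if not (age_years * 12 <= age_months < (age_years + 1) * 12):
        raise ValueError(age_years, age_months)


def find_age_problems(source):
    """Return 1-based data row numbers whose age variables are inconsistent.

    `source` is any iterable of lines (an open file, a StringIO, a list).
    """
    reader = csv.reader(source, delimiter='\t')
    header = next(reader)  # first row holds the field names
    bad_rows = []
    for row_number, row in enumerate(reader, start=1):
        record = dict(zip(header, row))
        try:
            check_age_variables(record)
        except ValueError:
            bad_rows.append(row_number)
    return bad_rows


bad = find_age_problems(io.StringIO(SAMPLE))
print(bad)
```

With a real file, the same helper would be called as `find_age_problems(open(path))`, where `path` is whatever location holds the data. The bounds comparison also handles an age of zero years without dividing or taking a remainder by `age_years`.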