eddiethedean/csvalchemy

csvalchemy

A Python package for reading and writing CSV files using Pydantic models.

Overview

csvalchemy validates each row of a CSV file against a Pydantic model, reports per-row validation errors without aborting the read, and can write the validated rows back to CSV. Validation itself is delegated to dydactic.

Features

  • CSV Reading: Read CSV files and validate each row against Pydantic models
  • Error Handling: Continue processing even when individual rows fail validation
  • Type Safety: Full type hints and validation using Pydantic
  • CSV Writing: Write validated results back to CSV files
  • Integration: Built on dydactic for reliable validation

Dependencies

  • Python: 3.10 or higher
  • pydantic: >=2.9.2 (Data validation using Python type annotations)
  • dydactic: >=0.2.0 (Validation engine - requires Python 3.10+)
  • python-dateutil: >=2.8.0 (DateTime parsing)

Installation

pip install csvalchemy

Quick Start

from pydantic import BaseModel
from csvalchemy import read
from io import StringIO

# Define your model
class Person(BaseModel):
    name: str
    age: int
    email: str | None = None

# Sample CSV content
csv_content = """name,age,email
Alice,30,alice@example.com
Bob,25,bob@example.com
Charlie,35,charlie@example.com
"""

# Read and validate CSV
with StringIO(csv_content) as f:
    for result in read(f, Person):
        if result.error:
            print(f"Validation error: {result.error}")
        else:
            print(f"Valid person: {result.result.name}, age {result.result.age}")

Output:

Valid person: Alice, age 30
Valid person: Bob, age 25
Valid person: Charlie, age 35

Examples

Error Handling

csvalchemy continues processing even when individual rows fail validation:

from pydantic import BaseModel
from csvalchemy import read
from io import StringIO

class Person(BaseModel):
    name: str
    age: int
    email: str | None = None

# CSV with some invalid rows
csv_content = """name,age,email
Alice,30,alice@example.com
Bob,not_a_number,bob@example.com
Charlie,35,charlie@example.com
Diana,not_a_number,diana@example.com
"""

with StringIO(csv_content) as f:
    valid_count = 0
    error_count = 0
    
    # Number the data rows from 1 so each message points at the failing row
    for row_number, result in enumerate(read(f, Person), start=1):
        if result.error:
            error_count += 1
            print(f"Error on row {row_number}: {result.error}")
        else:
            valid_count += 1
            print(f"Valid: {result.result.name}")
    
    print(f"\nSummary: {valid_count} valid, {error_count} errors")

Output:

Valid: Alice
Error on row 2: 1 validation error for Person
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not_a_number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing
Valid: Charlie
Error on row 4: 1 validation error for Person
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not_a_number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing

Summary: 2 valid, 2 errors

Writing Validated CSV

Write only validated results back to CSV:

from pydantic import BaseModel
from csvalchemy import read
from io import StringIO

class Product(BaseModel):
    id: int
    name: str
    price: float
    in_stock: bool

# Input CSV
input_csv = """id,name,price,in_stock
1,Widget,19.99,True
2,Gadget,29.99,False
3,Invalid,not_a_number,True
4,Thing,39.99,True
"""

# Read and validate; valid rows are written to the new CSV as results are consumed
input_file = StringIO(input_csv)
output_file = StringIO()
writer = read(input_file, Product).csv_writer(output_file)

# Consume writer to trigger CSV writing
for result in writer:
    if result.error:
        print(f"Skipped invalid row: {result.error}")
    else:
        print(f"Wrote: {result.result.name}")

# Show output CSV
output_file.seek(0)
print("\n=== Output CSV ===")
print(output_file.read())

Output:

Wrote: Widget
Wrote: Gadget
Skipped invalid row: 1 validation error for Product
price
  Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='not_a_number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/float_parsing
Wrote: Thing

=== Output CSV ===
id,name,price,in_stock
1,Widget,19.99,True
2,Gadget,29.99,False
4,Thing,39.99,True
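
The comment "Consume writer to trigger CSV writing" suggests that the writer is a lazy iterator. The sketch below is not csvalchemy's implementation; it is a stdlib-only illustration of that pattern, using a hypothetical `write_valid` generator and a stand-in price check in place of model validation. It shows that nothing reaches the output until the generator is iterated.

```python
import csv
from io import StringIO
from typing import Dict, Iterable, Iterator, List, Tuple


def write_valid(
    rows: Iterable[Dict[str, str]], out, fieldnames: List[str]
) -> Iterator[Tuple[Dict[str, str], bool]]:
    """Hypothetical helper: yield (row, ok) pairs and write the valid rows to `out`."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        try:
            float(row["price"])  # stand-in check, not real model validation
            ok = True
        except ValueError:
            ok = False
        if ok:
            writer.writerow(row)
        yield row, ok


rows = [
    {"id": "1", "name": "Widget", "price": "19.99"},
    {"id": "3", "name": "Invalid", "price": "not_a_number"},
    {"id": "4", "name": "Thing", "price": "39.99"},
]

out = StringIO()
pipeline = write_valid(rows, out, ["id", "name", "price"])
before = out.getvalue()  # still empty: the generator body has not started
seen = list(pipeline)    # iterating performs the writes
after = out.getvalue()
print(repr(before))
print(after)
```

If the loop over the writer is skipped in a design like this, the output file stays empty, which is why the example above iterates over `writer` before reading `output_file`.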

Using Validator Directly

Validate records that do not come from a CSV file:

from pydantic import BaseModel
from csvalchemy import Validator
import dydactic.options

class Person(BaseModel):
    name: str
    age: int
    email: str | None = None

# Data not from CSV
records = [
    {"name": "Alice", "age": "30", "email": "alice@example.com"},
    {"name": "Bob", "age": "not_a_number", "email": "bob@example.com"},
    {"name": "Charlie", "age": "35"},
]

# Standard validation
print("=== Using Validator directly ===")
validator = Validator(iter(records), Person)

for result in validator:
    if result.error:
        print(f"Error: {result.error}")
    else:
        print(f"Valid: {result.result.name}, age {result.result.age}")

# Skip invalid records
print("\n=== Using SKIP error option ===")
validator_skip = Validator(
    iter(records),
    Person,
    error_option=dydactic.options.ErrorOption.SKIP
)

valid_results = list(validator_skip)
print(f"Got {len(valid_results)} valid results (invalid ones skipped)")

Output:

=== Using Validator directly ===
Valid: Alice, age 30
Error: 1 validation error for Person
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not_a_number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing
Valid: Charlie, age 35

=== Using SKIP error option ===
Got 2 valid results (invalid ones skipped)

Integration with dydactic

csvalchemy uses dydactic as its core validation engine. The Validator and ValidatorIterator classes wrap dydactic.validate() to provide a consistent API for CSV data validation.

How it works

  1. CSV Reading: read() creates a CSVReaderValidator that reads CSV rows using Python's csv.DictReader
  2. Validation: Each row is validated using dydactic.validate(), which handles Pydantic model validation
  3. Error Handling: Validation errors are captured without stopping the iteration
  4. Result Mapping: dydactic's result objects are mapped to csvalchemy's Result type for a consistent API
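
The four steps above can be pictured with a stdlib-only sketch. It does not call csvalchemy or dydactic: `RowResult`, `parse_person`, and `read_rows` are hypothetical names, and `parse_person` stands in for the model validation that dydactic.validate() performs in the real package.

```python
import csv
from dataclasses import dataclass
from io import StringIO
from typing import Any, Callable, Dict, Iterator, Optional, TextIO


@dataclass
class RowResult:
    """Hypothetical stand-in for a result object: either result or error is set."""
    result: Any = None
    error: Optional[Exception] = None


def parse_person(row: Dict[str, str]) -> Dict[str, Any]:
    """Stand-in validator: raises ValueError when age is not an integer."""
    return {"name": row["name"], "age": int(row["age"])}


def read_rows(
    f: TextIO, validate: Callable[[Dict[str, str]], Any]
) -> Iterator[RowResult]:
    # Step 1: csv.DictReader turns each line into a dict keyed by the header.
    for row in csv.DictReader(f):
        try:
            # Step 2: validate one row in isolation.
            yield RowResult(result=validate(row))
        except (ValueError, KeyError) as exc:
            # Steps 3 and 4: capture the error in the result instead of raising.
            yield RowResult(error=exc)


csv_content = "name,age\nAlice,30\nBob,not_a_number\nCharlie,35\n"
results = list(read_rows(StringIO(csv_content), parse_person))
for r in results:
    print("error:" if r.error else "valid:", r.error or r.result)
```

Because each row is validated inside its own try block, the failure on Bob's row does not prevent Charlie's row from being processed.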

Benefits

  • Leverages dydactic's robust validation handling
  • Independent validation of each record (errors don't stop processing)
  • Type-safe error handling with clear error messages
  • Compatible with dydactic's validation strategies
  • Configurable error handling (RETURN, RAISE, or SKIP)
  • Support for strict validation and attribute-based validation

Configuration Options

The Validator class supports dydactic's configuration options:

  • error_option: Control how validation errors are handled:
    • RETURN (default): Errors are returned in Result.error
    • RAISE: Exceptions are raised immediately on validation errors
    • SKIP: Records with errors are skipped entirely
  • strict: Enable strict Pydantic validation, which turns off lax type coercion
  • from_attributes: Validate from object attributes instead of dictionary keys

Example:

from pydantic import BaseModel
from csvalchemy import Validator
import dydactic.options

class Person(BaseModel):
    name: str
    age: int

records = [
    {"name": "Alice", "age": "30"},
    {"name": "Bob", "age": "invalid"},
    {"name": "Charlie", "age": "35"},
]

# Default: RETURN errors
validator_return = Validator(iter(records), Person)
results_return = list(validator_return)
print(f"RETURN mode: {len(results_return)} results (including errors)")

# SKIP invalid records
validator_skip = Validator(
    iter(records),
    Person,
    error_option=dydactic.options.ErrorOption.SKIP
)
results_skip = list(validator_skip)
print(f"SKIP mode: {len(results_skip)} results (errors skipped)")

Output:

RETURN mode: 3 results (including errors)
SKIP mode: 2 results (errors skipped)
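
The example above does not exercise RAISE. As a rough mental model of all three modes, here is a stdlib-only sketch. `ErrorMode` and `validate_all` are hypothetical names, not part of csvalchemy or dydactic, and the sketch assumes that RAISE stops iteration at the first invalid record.

```python
from enum import Enum
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


class ErrorMode(Enum):
    """Hypothetical mirror of the RETURN / RAISE / SKIP options."""
    RETURN = "return"
    RAISE = "raise"
    SKIP = "skip"


def validate_all(
    records: Iterable[Dict[str, str]], mode: ErrorMode = ErrorMode.RETURN
) -> Iterator[Tuple[Optional[Dict[str, Any]], Optional[Exception]]]:
    """Yield (value, error) pairs; `mode` decides what happens to failures."""
    for record in records:
        try:
            value = {"name": record["name"], "age": int(record["age"])}
        except ValueError as exc:
            if mode is ErrorMode.RAISE:
                raise            # stop at the first invalid record
            if mode is ErrorMode.SKIP:
                continue         # drop the record and move on
            yield None, exc      # RETURN: hand the error to the caller
        else:
            yield value, None


records = [
    {"name": "Alice", "age": "30"},
    {"name": "Bob", "age": "invalid"},
    {"name": "Charlie", "age": "35"},
]

returned = list(validate_all(records, ErrorMode.RETURN))
skipped = list(validate_all(records, ErrorMode.SKIP))

collected = []
try:
    for pair in validate_all(records, ErrorMode.RAISE):
        collected.append(pair)
except ValueError:
    raised = True
else:
    raised = False

print(len(returned), len(skipped), len(collected), raised)
```

In this sketch, RAISE yields Alice's record and then propagates the exception for Bob's, so Charlie's record is never reached. Check dydactic's documentation before relying on the same behavior there.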

Architecture Notes

Casting and Validation

csvalchemy provides two approaches to validation:

  1. Full Validation (Recommended): Use Validator or read() which leverage dydactic's complete validation pipeline including dydactic's casting functionality. This is the primary and recommended approach for CSV validation.

  2. Standalone Casting: The cast.py module provides casting utilities similar to dydactic.cast. This module is kept for:

    • Standalone use cases that don't require full dydactic validation
    • Direct class instantiation without Pydantic models
    • Testing scenarios

Note: The main validation flow uses dydactic's casting internally, so cast.py is not used in the primary validation pipeline.
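
To illustrate what standalone casting for direct class instantiation can look like, here is a minimal sketch built only on the standard library. It does not use cast.py, whose API is not shown in this README; `Point` and `cast_record` are hypothetical.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Point:
    x: int
    y: float
    label: str


def cast_record(cls, record: dict):
    """Hypothetical helper: cast string values using the class's type annotations."""
    hints = get_type_hints(cls)
    # Call each annotated type on the raw string, e.g. int("3") or float("4.5").
    return cls(**{name: hints[name](value) for name, value in record.items()})


point = cast_record(Point, {"x": "3", "y": "4.5", "label": "origin"})
print(point)
```

Calling the annotated type directly only works for simple types: `bool("False")` is `True` in Python, and optional or datetime fields need dedicated parsing, which is the kind of work a casting module has to handle.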

Requirements

  • Python 3.10+ (required by dydactic)
  • See pyproject.toml for complete dependency list
