Customer Analytics is a scalable, chunked data processing pipeline written in Python. It efficiently ingests, validates, transforms, and analyzes large customer purchase datasets β with support for schema enforcement, aggregation, and logging.
- β Chunked processing of large CSVs
- β
Data validation with
pandera - β Automatic logging of data size and memory usage
- β Cleaning and normalization of customer data
- β Aggregations: revenue, basket size, and customer counts
- β
ISO 3166-1 alpha-2 country code validation with
pycountry - β
Testable with
pytest