Description
Problem
The validation system (geojson-validator.ts, swe-validator.ts, constraint-validator.ts, and sensorml-validator.ts: ~1,384 lines with 152 tests across 3 validation systems) has functional test coverage but no performance benchmarking. This means:
- No validation overhead data: Unknown cost of enabling validation (options.validate = true)
- No validator comparison: Unknown which validator is fastest or slowest (GeoJSON vs SWE vs SensorML)
- No constraint cost: Unknown overhead of deep constraint validation (intervals, patterns, significant figures)
- No throughput data: Unknown how many features can be validated per second
- No scaling data: Unknown how validation performance scales with collection size or nesting depth
- No optimization data: Cannot make informed decisions about validation strategies
Real-World Impact:
- Request validation: Should validation always be enabled? Performance cost unknown
- Batch processing: Validating 1,000+ features - acceptable latency unknown
- Strict vs permissive mode: Performance difference unknown
- Embedded devices: CPU overhead must stay within limits
- Server-side: Validation throughput affects scalability and cost
- Constraint validation: Deep validation cost unknown (intervals, patterns, significant figures)
Context
This issue was identified during the comprehensive validation conducted January 27-28, 2026.
Related Validation Issues: #12 (GeoJSON Validation), #14 (SWE Common Validation), #15 (SensorML Validation)
Work Item ID: 35 from Remaining Work Items
Repository: https://github.com/OS4CSAPI/ogc-client-CSAPI
Validated Commit: a71706b9592cad7a5ad06e6cf8ddc41fa5387732
Detailed Findings
1. No Performance Benchmarks Exist
Evidence from validation issues:
Issue #12 (GeoJSON): 61 tests, 40.95% coverage (claimed 97.4%)
Issue #14 (SWE Common): 78 tests (50 swe-validator + 28 constraint-validator), 73.68% coverage (claimed 100%)
Issue #15 (SensorML): 13 tests, coverage data not available
Total: 152 tests across 3 validation systems
Current Situation:
- ✅ Excellent functional tests (152 tests total)
- ✅ Comprehensive validation logic
- ❌ ZERO performance measurements (no ops/sec, latency, overhead data)
- ❌ No throughput benchmarks
- ❌ No validation overhead analysis
- ❌ No constraint validation cost data
2. Three Validation Systems (Performance Patterns Unknown)
From Issue #12, #14, #15 validation reports:
GeoJSON Validator (376 lines, 61 tests, 40.95% coverage):
- Pattern: Manual property checking, no schema validation
- Features: 7 feature type validators, collection validators (0% coverage)
- Complexity: Simple type checking + required property validation
- Unknown: Overhead per feature, collection validation performance
```ts
export function validateSystemFeature(data: unknown): ValidationResult {
const errors: string[] = [];
if (!isFeature(data)) {
errors.push('Object is not a valid GeoJSON Feature');
return { valid: false, errors };
}
if (!hasCSAPIProperties(data.properties)) {
errors.push('Missing required CSAPI properties (featureType, uid)');
return { valid: false, errors };
}
const props = data.properties as any;
if (props.featureType !== 'System') {
errors.push(`Expected featureType 'System', got '${props.featureType}'`);
}
return { valid: errors.length === 0, errors: errors.length > 0 ? errors : undefined };
}
```
Performance Questions:
- How many features/sec can be validated?
- Is collection validation proportionally slower (untested)?
- What's the cost of each property check?
SWE Common Validator (357 lines swe-validator + 312 lines constraint-validator, 78 tests, 73.68% coverage):
- Pattern: Component-specific validation + optional deep constraint validation
- Features: 9 component validators, 6 constraint validators
- Complexity: Type checking + UoM validation + interval checking + pattern matching + significant figures
- Unknown: Constraint validation overhead, recursive validation cost
```ts
export function validateQuantity(data: unknown, validateConstraints = true): ValidationResult {
const errors: ValidationError[] = [];
if (!hasDataComponentProperties(data)) {
errors.push({ message: 'Missing required DataComponent properties' });
return { valid: false, errors };
}
const component = data as any;
if (component.type !== 'Quantity') {
errors.push({ message: `Expected type 'Quantity', got '${component.type}'` });
}
if (!component.uom) {
errors.push({ message: 'Missing required property: uom' });
}
// Perform deep constraint validation if value is present
if (validateConstraints && component.value !== undefined && component.value !== null && errors.length === 0) {
const constraintResult = validateQuantityConstraint(component as QuantityComponent, component.value);
if (!constraintResult.valid && constraintResult.errors) {
errors.push(...constraintResult.errors);
}
}
return { valid: errors.length === 0, errors: errors.length > 0 ? errors : undefined };
}
```
Constraint Validation Example:
```ts
export function validateQuantityConstraint(
component: QuantityComponent | QuantityRangeComponent,
value: number
): ValidationResult {
if (!component.constraint) {
return { valid: true };
}
const errors: ValidationError[] = [];
const { intervals, values: allowedValues, significantFigures } = component.constraint;
// Check interval constraints
if (intervals && intervals.length > 0) {
const inAnyInterval = intervals.some(([min, max]) => value >= min && value <= max);
if (!inAnyInterval) {
errors.push({
path: 'value',
message: `Value ${value} is outside allowed intervals: ${JSON.stringify(intervals)}`,
});
}
}
// Check significant figures constraint
if (significantFigures !== undefined && significantFigures > 0) {
const actualSigFigs = getSignificantFigures(value);
if (actualSigFigs > significantFigures) {
errors.push({
path: 'value',
message: `Value ${value} has ${actualSigFigs} significant figures, maximum allowed is ${significantFigures}`,
});
}
}
return errors.length > 0 ? { valid: false, errors } : { valid: true };
}
```
Performance Questions:
- What's the cost of constraint validation? 10%? 50%? 100% overhead?
- Is interval checking expensive (array iteration)?
- Is significant figures calculation expensive (string manipulation)?
- Is pattern/regex validation expensive?
- Should validateConstraints default to true or false?
SensorML Validator (339 lines, 13 tests, coverage N/A):
- Pattern: Hierarchical validation (type-specific → AbstractProcess → DescribedObject)
- Features: 4 process type validators, deployment validator, derived property validator
- Complexity: Deep nesting (PhysicalSystem → AbstractPhysicalProcess → AbstractProcess → DescribedObject)
- Unknown: Hierarchical validation overhead, async overhead
```ts
export async function validateSensorMLProcess(
process: SensorMLProcess
): Promise<ValidationResult> {
const errors: string[] = [];
const warnings: string[] = [];
try {
if (!process.type) {
errors.push('Missing required property: type');
}
switch (process.type) {
case 'PhysicalSystem':
validatePhysicalSystem(process as any, errors, warnings);
break;
case 'PhysicalComponent':
validatePhysicalComponent(process as any, errors, warnings);
break;
case 'SimpleProcess':
validateSimpleProcess(process as any, errors, warnings);
break;
case 'AggregateProcess':
validateAggregateProcess(process as any, errors, warnings);
break;
default:
errors.push(`Unknown process type: ${(process as any).type}`);
}
validateDescribedObject(process, errors, warnings);
} catch (error) {
errors.push(`Validation error: ${error}`);
}
return {
valid: errors.length === 0,
errors: errors.length > 0 ? errors : undefined,
warnings: warnings.length > 0 ? warnings : undefined,
};
}
```
Hierarchical Validation Example:
```ts
function validatePhysicalSystem(system: any, errors: string[], warnings: string[]): void {
validateAbstractPhysicalProcess(system, errors, warnings); // Parent validator
if (system.components && !Array.isArray(system.components)) {
errors.push('components must be an array');
}
if (system.connections && !Array.isArray(system.connections)) {
errors.push('connections must be an array');
}
if (system.components && system.components.length === 0) {
warnings.push('PhysicalSystem has no components');
}
}
```
Performance Questions:
- What's the cost of hierarchical validation (4 levels deep)?
- Is async overhead significant (all functions return Promises)?
- How many property checks per validation?
- Should synchronous validation be offered for performance?
3. Unknown Validation Strategy Performance
From parser integration (Issue #10):
Optional validation during parsing:
```ts
parse(data: unknown, options: ParserOptions = {}): ParseResult<T> {
  // ... parsing logic ...
  // Validate if requested
  if (options.validate) {
    const validationResult = this.validate(parsed, format.format);
    if (!validationResult.valid) {
      errors.push(...(validationResult.errors || []));
      if (options.strict) {
        throw new CSAPIParseError(`Validation failed: ${errors.join(', ')}`, format.format);
      }
    }
    warnings.push(...(validationResult.warnings || []));
  }
  return { data: parsed, format, errors, warnings };
}
```
Three Validation Strategies:
- No validation: options.validate = false (default for performance?)
- Permissive validation: options.validate = true, options.strict = false (collect errors, don't throw)
- Strict validation: options.validate = true, options.strict = true (throw on first error)
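For reference, a minimal sketch of how the three strategies are invoked; `parser` and `feature` are placeholders, and the option names follow the ParserOptions shape shown above:

```ts
// Sketch only: `parser` stands in for a CSAPI parser exposing parse(data, options)
// as in the snippet above; option names follow ParserOptions from Issue #10.
declare const parser: {
  parse(data: unknown, options?: { validate?: boolean; strict?: boolean }): unknown;
};
declare const feature: unknown;

parser.parse(feature);                                     // 1. no validation (baseline)
parser.parse(feature, { validate: true });                 // 2. permissive: errors collected, not thrown
try {
  parser.parse(feature, { validate: true, strict: true }); // 3. strict: throws on first validation error
} catch (err) {
  // handle CSAPIParseError
}
```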
Performance Questions:
- What's the overhead of each strategy?
- Is strict mode faster (early return)?
- Should validation default to enabled or disabled?
- When should users enable validation (dev vs prod)?
4. Unknown Collection Validation Performance
From Issue #12:
Collection validators exist but have 0% coverage:
- validateSystemFeatureCollection() - 0 calls
- validateDeploymentFeatureCollection() - 0 calls
- All 7 collection validators - 0 invocations
Collection Validation Pattern:
```ts
export function validateSystemFeatureCollection(
data: unknown
): ValidationResult {
const errors: string[] = [];
if (!isFeatureCollection(data)) {
errors.push('Object is not a valid GeoJSON FeatureCollection');
return { valid: false, errors };
}
const collection = data as FeatureCollection;
const features = collection.features || [];
features.forEach((feature: unknown, index: number) => {
const result = validateSystemFeature(feature);
if (!result.valid) {
errors.push(`Feature at index ${index}: ${result.errors?.join(', ')}`);
}
});
return { valid: errors.length === 0, errors: errors.length > 0 ? errors : undefined };
}
```
Performance Questions:
- Does validation scale linearly with collection size?
- At what collection size does it become slow?
- Should large collections be validated in chunks?
- What's the memory overhead (error accumulation)?
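If benchmarks show large collections blocking for too long, chunked validation is one candidate mitigation. A hedged sketch: the helper below is hypothetical and not part of the current API, and the import path assumes the file lives under benchmarks/:

```ts
// Hypothetical helper: validate a large collection in chunks, yielding to the event
// loop between chunks so validating 10,000+ features does not block for its full duration.
import { validateSystemFeature } from '../src/ogc-api/csapi/validation/geojson-validator';

async function validateFeaturesInChunks(features: unknown[], chunkSize = 500): Promise<string[]> {
  const errors: string[] = [];
  for (let start = 0; start < features.length; start += chunkSize) {
    features.slice(start, start + chunkSize).forEach((feature, i) => {
      const result = validateSystemFeature(feature);
      if (!result.valid) {
        errors.push(`Feature at index ${start + i}: ${result.errors?.join(', ')}`);
      }
    });
    await new Promise((resolve) => setTimeout(resolve, 0)); // yield between chunks
  }
  return errors;
}
```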
5. Unknown Constraint Validation Cost
From Issue #14:
6 constraint validators implemented:
- validateQuantityConstraint: Interval checking, discrete values, significant figures
- validateCountConstraint: Integer intervals, discrete values
- validateTextConstraint: Pattern/regex matching, token lists
- validateCategoryConstraint: Token list matching
- validateTimeConstraint: Temporal intervals, ISO 8601 parsing
- validateRangeConstraint: Range endpoint validation, min ≤ max checking
Significant Figures Algorithm:
```ts
function getSignificantFigures(value: number): number {
if (value === 0) return 1;
if (!isFinite(value)) return Infinity;
// Convert to string and remove leading zeros and decimal point
const str = Math.abs(value).toString();
const normalized = str.replace(/^0+\.?0*/, '').replace('.', '');
return normalized.length;
}
```
Pattern/Regex Validation:
```ts
if (pattern && typeof pattern === 'string') {
try {
const regex = new RegExp(pattern);
if (!regex.test(value)) {
errors.push({
path: 'value',
message: `Text value '${value}' does not match required pattern: ${pattern}`,
});
}
} catch (e) {
errors.push({
path: 'constraint.pattern',
message: `Invalid regex pattern: ${pattern}`,
});
}
}
```
Performance Questions:
- How expensive is significant figures calculation (string manipulation)?
- How expensive is regex compilation and matching?
- Should regex patterns be compiled once and cached?
- What's the cost of interval checking (array iteration)?
- What's the cost of ISO 8601 datetime parsing?
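To illustrate the "compile once and cache" idea raised above, a possible regex cache; this is a hypothetical optimization, not what constraint-validator.ts does today (the snippet above compiles the pattern on every call):

```ts
// Hypothetical regex cache: compile each constraint pattern once and reuse it,
// so repeated validations only pay for regex.test(). Invalid patterns are cached as null.
const regexCache = new Map<string, RegExp | null>();

function getCachedRegex(pattern: string): RegExp | null {
  if (!regexCache.has(pattern)) {
    try {
      regexCache.set(pattern, new RegExp(pattern));
    } catch {
      regexCache.set(pattern, null);
    }
  }
  return regexCache.get(pattern) ?? null;
}

// Usage in a text-constraint check:
const regex = getCachedRegex('^[A-Z]{3}-\\d{4}$');
const matches = regex ? regex.test('ABC-1234') : false;
```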
6. Unknown Recursive Validation Performance
From Issue #14:
Minimal aggregate validation:
- DataRecord: Checks fields array exists, doesn't validate nested components
- DataArray: Checks elementCount/elementType exist, doesn't validate elementType structure
- No automatic recursive validation
However, tests show nested validation works when called manually:
```ts
it('should recursively parse deeply nested structures', () => {
const nested = {
type: 'DataRecord',
fields: [
{
name: 'innerRecord',
component: {
type: 'DataRecord',
fields: [
{
name: 'quantity',
component: {
type: 'Quantity',
uom: { code: 'Cel' },
},
},
],
},
},
],
};
const result = parseDataRecordComponent(nested); // Recursive parsing + validation
// ...
});
```
Performance Questions:
- How deep can nesting go before performance degrades?
- What's the overhead of recursive validation calls?
- Should there be a maximum depth limit?
- How does depth affect memory usage (call stack)?
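These depth questions can be answered empirically with a small fixture generator; the sketch below (names illustrative) builds DataRecord structures of configurable depth for the nesting benchmarks:

```ts
// Illustrative fixture generator: build a DataRecord nested to `depth` levels with a
// single Quantity leaf, then measure validation cost at each depth.
function makeNestedDataRecord(depth: number): Record<string, unknown> {
  if (depth === 0) {
    return { type: 'Quantity', uom: { code: 'Cel' }, value: 21.5 };
  }
  return {
    type: 'DataRecord',
    fields: [{ name: `level${depth}`, component: makeNestedDataRecord(depth - 1) }],
  };
}

const nestingFixtures = [1, 2, 3, 5, 10].map((d) => makeNestedDataRecord(d));
```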
7. No Optimization History
No Baseline Data:
- Cannot track validation performance regressions when adding features
- Cannot validate optimization attempts
- Cannot compare validation strategies
- Cannot document validation overhead for users
- Cannot decide when to enable/disable validation
8. Validation System Context
GeoJSON Validator (376 lines, 61 tests, 40.95% coverage):
- ✅ 7 feature type validators (System, Deployment, Procedure, SamplingFeature, Property, Datastream, ControlStream)
- ✅ Collection validators (0% coverage but exist)
- ❌ No geometry validation (claimed but not implemented)
- ❌ No link validation (claimed but not implemented)
- ❌ No temporal validation (claimed but not implemented)
- Validation approach: Manual property checking
SWE Common Validator (669 lines, 78 tests, 73.68% coverage):
- ✅ 9 component validators (Quantity, Count, Text, Category, Time, RangeComponent, DataRecord, DataArray, ObservationResult)
- ✅ 6 constraint validators (intervals, patterns, significant figures, tokens)
- ❌ 8 claimed validators don't exist (including Boolean, Vector, Matrix, DataStream, DataChoice, Geometry)
- ❌ No automatic nested validation (requires manual recursive calls)
- Validation approach: Type checking + optional deep constraint validation
SensorML Validator (339 lines, 13 tests, coverage N/A):
- ✅ 4 process type validators (PhysicalSystem, PhysicalComponent, SimpleProcess, AggregateProcess)
- ✅ Deployment validator, DerivedProperty validator
- ✅ Hierarchical validation (4 levels deep)
- ⚠️ Ajv configured but not used (structural validation instead)
- Validation approach: Hierarchical type checking
Total: ~1,384 lines of validation code, 152 tests
Proposed Solution
1. Establish Benchmark Infrastructure (DEPENDS ON #55)
PREREQUISITE: This work item REQUIRES the benchmark infrastructure from work item #32 (Issue #55) to be completed first.
Once benchmark infrastructure exists:
- Import Tinybench framework (from Issue #55, "Add comprehensive performance benchmarking")
- Use benchmark utilities (stats, reporter, regression detection)
- Integrate with CI/CD pipeline
- Use shared benchmark fixtures
2. Create Comprehensive Validation Benchmarks
Create benchmarks/validation.bench.ts (~800-1,200 lines) with:
GeoJSON Validation Benchmarks:
- All 7 feature types (System, Deployment, Procedure, SamplingFeature, Property, Datastream, ControlStream)
- Single feature validation (baseline)
- Collection validation (10, 100, 1,000 features)
- Invalid feature validation (error path)
- Property checking overhead
SWE Common Validation Benchmarks:
- Simple components (Quantity, Count, Text, Category, Time)
- With constraints vs without constraints
- Interval checking (1 interval, 5 intervals, 10 intervals)
- Pattern/regex validation (simple, complex patterns)
- Significant figures calculation (various precisions)
- Nested DataRecord validation (1 level, 2 levels, 3 levels deep)
- DataArray validation
SensorML Validation Benchmarks:
- All 4 process types (PhysicalSystem, PhysicalComponent, SimpleProcess, AggregateProcess)
- Hierarchical validation overhead (4 levels deep)
- Async vs sync overhead (measure Promise overhead)
- Deployment validation
- DerivedProperty validation (URI validation)
Validation Strategy Benchmarks:
- No validation (baseline)
- Permissive validation (collect errors)
- Strict validation (throw on error)
- Compare overhead for each strategy
Constraint Validation Benchmarks:
- Quantity with no constraints (baseline)
- Quantity with intervals
- Quantity with significant figures
- Text with pattern matching
- Text with token list
- Time with temporal intervals
Collection Scaling Benchmarks:
- Single feature (baseline)
- 10 features
- 100 features
- 1,000 features
- 10,000 features
- Test all three validators: GeoJSON, SWE, SensorML
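A minimal Tinybench sketch of how benchmarks/validation.bench.ts could be structured, assuming the framework from #55 and reusing the existing validator exports; the fixture contents and relative import paths are placeholders:

```ts
// benchmarks/validation.bench.ts (sketch)
import { Bench } from 'tinybench';
import {
  validateSystemFeature,
  validateSystemFeatureCollection,
} from '../src/ogc-api/csapi/validation/geojson-validator';
import { validateQuantity } from '../src/ogc-api/csapi/validation/swe-validator';

// Fixtures are placeholders; in practice reuse the samples from the *.spec.ts files.
const systemFeature = { /* valid System feature fixture */ };
const collection100 = { type: 'FeatureCollection', features: Array(100).fill(systemFeature) };
const quantity = { type: 'Quantity', uom: { code: 'Cel' }, value: 21.5, constraint: { intervals: [[-80, 60]] } };

const bench = new Bench({ time: 200 }); // ~200 ms sampling per task

bench
  .add('GeoJSON: single System feature', () => void validateSystemFeature(systemFeature))
  .add('GeoJSON: collection of 100 features', () => void validateSystemFeatureCollection(collection100))
  .add('SWE: Quantity with constraints', () => void validateQuantity(quantity, true))
  .add('SWE: Quantity without constraints', () => void validateQuantity(quantity, false));

await bench.run();
console.table(bench.table()); // ops/sec, mean time, margin of error per task
```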
3. Create Memory Usage Benchmarks
Create benchmarks/validation-memory.bench.ts (~200-300 lines) with:
Memory per Validation:
- Single GeoJSON feature
- Single SWE Quantity (simple)
- Single SWE DataRecord (nested, 3 levels)
- Single SensorML PhysicalSystem
Memory Scaling:
- 100 features: total memory, average per feature
- 1,000 features: total memory, GC pressure
- 10,000 features: total memory, heap usage
Error Accumulation Memory:
- Validation with 0 errors (baseline)
- Validation with 10 errors
- Validation with 100 errors
- Validation with 1,000 errors (collection)
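A hedged sketch of the measurement approach for benchmarks/validation-memory.bench.ts, using process.memoryUsage(); the explicit GC call assumes Node is run with --expose-gc, and the fixtures are placeholders:

```ts
// benchmarks/validation-memory.bench.ts (sketch)
import { validateSystemFeature } from '../src/ogc-api/csapi/validation/geojson-validator';

function measureHeapDelta(label: string, iterations: number, fixture: unknown): void {
  (globalThis as { gc?: () => void }).gc?.(); // settle the heap first (requires --expose-gc)
  const before = process.memoryUsage().heapUsed;
  const results = [];
  for (let i = 0; i < iterations; i++) {
    results.push(validateSystemFeature(fixture)); // keep results alive to observe accumulation
  }
  const after = process.memoryUsage().heapUsed;
  console.log(
    `${label}: ${((after - before) / 1024).toFixed(1)} KB total, ` +
      `${((after - before) / iterations).toFixed(1)} B per validation`
  );
}

measureHeapDelta('GeoJSON System x 1,000', 1_000, { /* feature fixture */ });
measureHeapDelta('GeoJSON System x 10,000', 10_000, { /* feature fixture */ });
```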
4. Analyze Benchmark Results
Create benchmarks/validation-analysis.ts (~150-250 lines) with:
Performance Comparison:
- Validator comparison: GeoJSON vs SWE vs SensorML (fastest vs slowest)
- Strategy comparison: No validation vs permissive vs strict
- Constraint comparison: Simple validation vs constraint validation
- Collection scaling: throughput vs count
Identify Bottlenecks:
- Operations taking >20% of validation time
- Operations with >1ms latency per feature
- Operations with sublinear scaling
- Memory-intensive operations
Generate Recommendations:
- When to enable/disable validation (dev vs prod)
- Which validation strategy to use (permissive vs strict)
- When to enable constraint validation (always vs selective)
- Maximum practical collection sizes
- Optimal nesting depth limits
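A sketch of how the analysis script could apply the thresholds used in this issue (>1ms per feature, <1,000 validations/sec); the BenchResult shape is assumed, not an existing type:

```ts
// benchmarks/validation-analysis.ts (sketch): apply the bottleneck thresholds from this issue.
interface BenchResult {
  name: string;
  opsPerSec: number;
  meanMs: number; // average latency per operation, in milliseconds
}

function findBottlenecks(results: BenchResult[]): BenchResult[] {
  // Flag anything slower than 1 ms per feature or under 1,000 validations/sec.
  return results.filter((r) => r.meanMs > 1 || r.opsPerSec < 1_000);
}

function strategyOverhead(baseline: BenchResult, candidate: BenchResult): string {
  const overheadPct = ((baseline.opsPerSec - candidate.opsPerSec) / baseline.opsPerSec) * 100;
  return `${candidate.name}: ${overheadPct.toFixed(1)}% overhead vs ${baseline.name}`;
}
```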
5. Implement Targeted Optimizations (If Needed)
ONLY if benchmarks identify issues:
Optimization Candidates (benchmark-driven):
- If property checking slow: Cache type guards
- If constraint validation expensive: Lazy constraint evaluation
- If regex slow: Compile and cache regex patterns
- If collection validation slow: Parallel validation (Web Workers)
- If hierarchical validation expensive: Flatten validation hierarchy
- If async overhead significant: Offer synchronous validators
Optimization Guidelines:
- Only optimize proven bottlenecks (>10% overhead or <1,000 validations/sec)
- Measure before and after (verify improvement)
- Document tradeoffs (code complexity vs speed gain)
- Add regression tests (ensure optimization doesn't break functionality)
6. Document Performance Characteristics
Update README.md with new "Validation Performance" section (~150-250 lines):
Performance Overview:
- Typical validation overhead: X% (by validator)
- Typical throughput: X validations/sec (by validator)
- Memory usage: X KB per validation (by validator)
Validator Performance Comparison:
GeoJSON: ~XX,XXX validations/sec (simplest, fastest)
SWE: ~XX,XXX validations/sec (YY% slower due to constraint validation)
SensorML: ~XX,XXX validations/sec (ZZ% slower due to hierarchical validation)
Validation Strategy Overhead:
No validation: 0% overhead (baseline)
Permissive: XX% overhead (collect errors)
Strict: XX% overhead (throw on error)
Constraint Validation Overhead:
No constraints: XX,XXX ops/sec (baseline)
With intervals: XX,XXX ops/sec (YY% overhead)
With patterns: XX,XXX ops/sec (ZZ% overhead)
With sig figures: XX,XXX ops/sec (AA% overhead)
Best Practices:
- Development: Enable validation with options.validate = true to catch errors early
- Production: Disable validation for trusted data to maximize performance
- Constraint validation: Enable only when data quality enforcement is required
- Collections: Consider chunked validation for >X,XXX features
- Nesting: Limit SWE DataRecord nesting to X levels for optimal performance
Performance Targets:
- Good: <5% validation overhead (<0.05ms per feature)
- Acceptable: <10% validation overhead (<0.1ms per feature)
- Poor: >20% validation overhead (>0.2ms per feature) - needs optimization
7. Integrate with CI/CD
Add to .github/workflows/benchmarks.yml (coordinate with #55):
Benchmark Execution:
```yaml
- name: Run validation benchmarks
  run: npm run bench:validation
- name: Run validation memory benchmarks
  run: npm run bench:validation:memory
```
Performance Regression Detection:
- Compare against baseline (main branch)
- Alert if any benchmark >10% slower
- Alert if memory usage >20% higher
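A standalone sketch of the regression gate applying the thresholds above (>10% slower, >20% more memory); in practice this would reuse the #55 utilities, and the result shape here is assumed:

```ts
// Sketch of the regression check: compare current results to the main-branch baseline,
// keyed by benchmark name. The Sample shape is assumed.
interface Sample {
  opsPerSec: number;
  heapKb?: number;
}

function detectRegressions(baseline: Record<string, Sample>, current: Record<string, Sample>): string[] {
  const alerts: string[] = [];
  for (const [name, base] of Object.entries(baseline)) {
    const cur = current[name];
    if (!cur) continue;
    if (cur.opsPerSec < base.opsPerSec * 0.9) {
      alerts.push(`${name}: >10% slower (${base.opsPerSec} -> ${cur.opsPerSec} ops/sec)`);
    }
    if (base.heapKb !== undefined && cur.heapKb !== undefined && cur.heapKb > base.heapKb * 1.2) {
      alerts.push(`${name}: memory >20% higher (${base.heapKb} -> ${cur.heapKb} KB)`);
    }
  }
  return alerts;
}
```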
PR Comments:
- Post benchmark results to PRs
- Show comparison with base branch
- Highlight regressions and improvements
Acceptance Criteria
Benchmark Infrastructure (4 items)
- ✅ Benchmark infrastructure from Issue #55 (Add comprehensive performance benchmarking) is complete and available
- Created benchmarks/validation.bench.ts with comprehensive validation benchmarks (~800-1,200 lines)
- Created benchmarks/validation-memory.bench.ts with memory usage benchmarks (~200-300 lines)
- Created benchmarks/validation-analysis.ts with results analysis (~150-250 lines)
GeoJSON Validation Benchmarks (5 items)
- Benchmarked all 7 feature types (System, Deployment, Procedure, SamplingFeature, Property, Datastream, ControlStream)
- Benchmarked single feature vs collection validation (10, 100, 1,000 features)
- Benchmarked valid vs invalid feature validation (error paths)
- Documented throughput for each feature type (validations/sec)
- Identified fastest and slowest feature types
SWE Common Validation Benchmarks (7 items)
- Benchmarked simple components (Quantity, Count, Text, Category, Time)
- Benchmarked validation with vs without constraints
- Benchmarked interval checking (1, 5, 10 intervals)
- Benchmarked pattern/regex validation (simple, complex)
- Benchmarked significant figures calculation
- Benchmarked nested DataRecord validation (1, 2, 3 levels)
- Documented constraint validation overhead (% and absolute time)
SensorML Validation Benchmarks (5 items)
- Benchmarked all 4 process types (PhysicalSystem, PhysicalComponent, SimpleProcess, AggregateProcess)
- Benchmarked hierarchical validation overhead (4 levels deep)
- Measured async overhead (Promise creation/resolution)
- Benchmarked Deployment and DerivedProperty validation
- Documented hierarchical validation cost
Validation Strategy Benchmarks (4 items)
- Benchmarked no validation (baseline)
- Benchmarked permissive validation (collect errors)
- Benchmarked strict validation (throw on error)
- Documented strategy overhead (% and absolute time)
Collection Scaling Benchmarks (5 items)
- Benchmarked collection validation: 10, 100, 1,000, 10,000 features
- Tested all three validators at each scale
- Documented scaling characteristics: linear/sublinear/superlinear
- Identified performance inflection points (when it becomes slow)
- Documented maximum practical collection size
Constraint Validation Benchmarks (6 items)
- Benchmarked Quantity: no constraints, intervals, significant figures
- Benchmarked Text: no constraints, pattern, token list
- Benchmarked Time: no constraints, temporal intervals
- Benchmarked Count: no constraints, integer intervals
- Benchmarked Category: no constraints, token list
- Documented constraint validation overhead per type
Memory Benchmarks (5 items)
- Measured memory per validation (GeoJSON, SWE simple, SWE nested, SensorML)
- Measured memory scaling (100, 1,000, 10,000 validations)
- Measured error accumulation memory (0, 10, 100, 1,000 errors)
- Measured GC pressure for large collections
- Documented memory recommendations
Performance Analysis (5 items)
- Analyzed all benchmark results
- Identified bottlenecks (operations >20% of validation time or >1ms per feature)
- Generated performance comparison report (validator, strategy, constraint)
- Created recommendations document (when to enable/disable, strategy choice)
- Documented current performance characteristics
Optimization (if needed) (4 items)
- Identified optimization opportunities from benchmark data
- Implemented targeted optimizations ONLY for proven bottlenecks
- Re-benchmarked after optimization (verified improvement)
- Added regression tests to prevent optimization from breaking functionality
Documentation (7 items)
- Added "Validation Performance" section to README.md (~150-250 lines)
- Documented typical validation overhead (% by validator)
- Documented typical throughput (validations/sec by validator)
- Documented memory usage (KB per validation by validator)
- Documented validator performance comparison (GeoJSON vs SWE vs SensorML)
- Documented validation strategy overhead (no validation vs permissive vs strict)
- Documented best practices (when to enable, strategy choice, collection sizes)
CI/CD Integration (4 items)
- Added validation benchmarks to .github/workflows/benchmarks.yml
- Configured performance regression detection (>10% slower = fail)
- Added PR comment with benchmark results and comparison
- Verified benchmarks run on every PR and main branch commit
Implementation Notes
Files to Create
Benchmark Files (~1,150-1,750 lines total):
- benchmarks/validation.bench.ts (~800-1,200 lines)
  - GeoJSON validation benchmarks (7 feature types × 3 scenarios)
  - SWE Common validation benchmarks (5 component types × constraint variations)
  - SensorML validation benchmarks (4 process types × hierarchical levels)
  - Validation strategy benchmarks (3 strategies)
  - Constraint validation benchmarks (6 constraint types)
  - Collection scaling benchmarks (5 sizes × 3 validators)
- benchmarks/validation-memory.bench.ts (~200-300 lines)
  - Memory per validation (4 validator types)
  - Memory scaling (3 sizes)
  - Error accumulation memory (4 error counts)
  - GC pressure analysis
- benchmarks/validation-analysis.ts (~150-250 lines)
  - Performance comparison logic
  - Bottleneck identification
  - Recommendation generation
  - Results formatting
Files to Modify
README.md (~150-250 lines added):
- New "Validation Performance" section with:
- Performance overview
- Validator comparison table
- Strategy overhead table
- Constraint overhead table
- Best practices
- Performance targets
package.json (~10 lines):
```json
{
  "scripts": {
    "bench:validation": "tsx benchmarks/validation.bench.ts",
    "bench:validation:memory": "tsx benchmarks/validation-memory.bench.ts",
    "bench:validation:analyze": "tsx benchmarks/validation-analysis.ts"
  }
}
```
.github/workflows/benchmarks.yml (coordinate with #55):
- Add validation benchmark execution
- Add memory benchmark execution
- Add regression detection
- Add PR comment generation
Files to Reference
Validator Source Files (for accurate benchmarking):
- src/ogc-api/csapi/validation/geojson-validator.ts (376 lines, 61 tests, 40.95% coverage)
- src/ogc-api/csapi/validation/swe-validator.ts (357 lines, 50 tests, 73.68% coverage)
- src/ogc-api/csapi/validation/constraint-validator.ts (312 lines, 28 tests)
- src/ogc-api/csapi/validation/sensorml-validator.ts (339 lines, 13 tests)
Test Fixtures (reuse existing test data):
- src/ogc-api/csapi/validation/geojson-validator.spec.ts (has sample GeoJSON features)
- src/ogc-api/csapi/validation/swe-validator.spec.ts (has sample SWE components)
- src/ogc-api/csapi/validation/constraint-validator.spec.ts (has sample constraints)
- src/ogc-api/csapi/validation/sensorml-validator.spec.ts (has sample SensorML processes)
Technology Stack
Benchmarking Framework (from #55):
- Tinybench (statistical benchmarking)
- Node.js process.memoryUsage() for memory tracking
- Node.js performance.now() for timing
Benchmark Priorities:
- High: Validation strategy overhead, GeoJSON validation, constraint validation cost
- Medium: SWE component validation, collection scaling, SensorML validation
- Low: Async overhead, extreme nesting (>3 levels), extreme scaling (>10,000)
Performance Targets (Hypothetical - Measure to Confirm)
Validation Overhead:
- Good: <5% overhead (<0.05ms per feature)
- Acceptable: <10% overhead (<0.1ms per feature)
- Poor: >20% overhead (>0.2ms per feature)
Throughput:
- Good: >20,000 validations/sec (<0.05ms per validation)
- Acceptable: >10,000 validations/sec (<0.1ms per validation)
- Poor: <5,000 validations/sec (>0.2ms per validation)
Constraint Validation Overhead:
- Good: <10% overhead vs no constraints
- Acceptable: <25% overhead vs no constraints
- Poor: >50% overhead vs no constraints
Memory:
- Good: <1 KB per validation
- Acceptable: <5 KB per validation
- Poor: >10 KB per validation
Optimization Guidelines
ONLY optimize if benchmarks prove need:
- Validation overhead >20%
- Throughput <5,000 validations/sec
- Memory >10 KB per validation
- Constraint validation overhead >50%
Optimization Approach:
- Identify bottleneck from benchmark data
- Profile with Chrome DevTools or Node.js profiler
- Implement targeted optimization
- Re-benchmark to verify improvement (>20% faster)
- Add regression tests
- Document tradeoffs
Common Optimizations:
- Cache type guards and compiled regex patterns
- Use Set instead of Array for token list checking
- Early return on invalid data (strict mode)
- Parallel validation for large collections
- Synchronous validators (avoid Promise overhead)
- Lazy constraint evaluation (only when needed)
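As one concrete instance of the list above, the Set-based token lookup; this is a candidate optimization, not the current implementation:

```ts
// Candidate optimization (not current behaviour): token-list checks via a cached Set.
const allowedTokens = ['active', 'inactive', 'maintenance'];

// Linear scan on every validation:
const okArray = allowedTokens.includes('active');

// O(1) lookup once the Set is built (and cached per constraint):
const tokenSet = new Set(allowedTokens);
const okSet = tokenSet.has('active');
```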
Dependencies
CRITICAL DEPENDENCY:
- REQUIRES work item #32 (Issue #55, "Add comprehensive performance benchmarking"): comprehensive performance benchmarking infrastructure
- Cannot start until benchmark framework, utilities, and CI/CD integration are complete
Why This Dependency Matters:
- Reuses Tinybench setup from Issue #55
- Uses shared benchmark utilities (stats, reporter, regression detection)
- Integrates with established CI/CD pipeline
- Follows consistent benchmarking patterns
Testing Requirements
Benchmark Validation:
- All benchmarks must run without errors
- All benchmarks must complete in <60 seconds total
- All benchmarks must produce consistent results (variance <10%)
- Memory benchmarks must not cause out-of-memory errors
Regression Tests:
- Add tests to verify optimizations don't break functionality
- Rerun all 152 existing validation tests after any optimization
- Verify coverage remains >70% (current average)
Caveats
Performance is Environment-Dependent:
- Benchmarks run on specific hardware (document specs)
- Results vary by Node.js version, CPU, memory
- Production performance may differ from benchmark environment
- Document benchmark environment in README
Optimization Tradeoffs:
- Faster code may be more complex
- Cached regex patterns increase memory usage
- Parallel validation adds API complexity
- Synchronous validators lose flexibility
- Document all tradeoffs in optimization PRs
Validation Performance Context:
- GeoJSON likely fastest (simple property checking)
- SWE likely slowest (constraint validation + nesting)
- SensorML medium (hierarchical validation)
- Validation overhead typically <10% of total parse time
- Network latency typically dominates validation overhead
Priority Justification
Priority: Low
Why Low Priority:
- No Known Performance Issues: No user complaints about slow validation
- Functional Excellence: Validators work correctly with comprehensive tests (152 tests total)
- Not Critical Path: Validation is optional (options.validate) and defaults to disabled
- Depends on Infrastructure: Cannot start until Issue #55 (benchmark infrastructure) is complete
- Educational Value: Primarily for documentation and validation strategy guidance
Why Still Important:
- Strategy Guidance: Users need to know when to enable/disable validation (dev vs prod)
- Regression Prevention: Establish baseline to detect future validation performance degradation
- Optimization Guidance: Data-driven decisions about what (if anything) to optimize
- Constraint Validation: Understand cost of deep constraint validation
- Scaling Guidance: Help users estimate validation overhead for large collections
Impact if Not Addressed:
- ⚠️ Unknown validation overhead (users can't estimate cost)
- ⚠️ No baseline for regression detection (can't track performance over time)
- ⚠️ No optimization guidance (can't prioritize improvements)
- ⚠️ Unknown constraint validation cost (users don't know if it's worth enabling)
- ✅ Validators still work correctly (functional quality not affected)
- ✅ No known performance bottlenecks (no urgency)
Effort Estimate: 10-15 hours (after #55 complete)
- Benchmark creation: 6-9 hours
- Memory analysis: 1-2 hours
- Results analysis: 1-2 hours
- Documentation: 1-2 hours
- CI/CD integration: 0.5-1 hour (reuse from Issue #55)
- Optimization (optional, if needed): 2-4 hours
When to Prioritize Higher:
- If users report slow validation
- If adding real-time validation features (need performance baseline)
- If optimizing for embedded/mobile (need overhead data)
- If validation becomes mandatory (need to minimize overhead)