Added new aggregate rules (first, nth, last) (#82)
This commit adds new aggregate rules (first, nth, last) to the JSON, PHP,
and YML schema examples and creates the corresponding rule classes. The
new rules validate a column against its first, nth, or last value. The
numeric variants (first_num, last_num) accept exact, "not", min, and max
expectations; the string variants (first, nth, last) accept an expected
value or a "not expected" value and are compared as strings.
SmetDenis committed Mar 24, 2024
1 parent 3f0225c commit 409156d
Showing 26 changed files with 885 additions and 15 deletions.
30 changes: 26 additions & 4 deletions README.md
@@ -4,7 +4,7 @@
[![Stable Version](https://poser.pugx.org/jbzoo/csv-blueprint/version)](https://packagist.org/packages/jbzoo/csv-blueprint/) [![Total Downloads](https://poser.pugx.org/jbzoo/csv-blueprint/downloads)](https://packagist.org/packages/jbzoo/csv-blueprint/stats) [![Docker Pulls](https://img.shields.io/docker/pulls/jbzoo/csv-blueprint.svg)](https://hub.docker.com/r/jbzoo/csv-blueprint/tags) [![GitHub License](https://img.shields.io/github/license/jbzoo/csv-blueprint)](https://github.com/JBZoo/Csv-Blueprint/blob/master/LICENSE)

<!-- rules-counter -->
-[![Static Badge](https://img.shields.io/badge/Rules-103-green?label=Total%20Number%20of%20Rules&labelColor=darkgreen&color=gray)](schema-examples/full.yml) [![Static Badge](https://img.shields.io/badge/Rules-55-green?label=Cell%20Value&labelColor=blue&color=gray)](src/Rules/Cell) [![Static Badge](https://img.shields.io/badge/Rules-45-green?label=Aggregate%20Column&labelColor=blue&color=gray)](src/Rules/Aggregate) [![Static Badge](https://img.shields.io/badge/Rules-3-green?label=Extra%20Checks&labelColor=blue&color=gray)](#extra-checks) [![Static Badge](https://img.shields.io/badge/Rules-329-green?label=Plan%20to%20add&labelColor=gray&color=gray)](tests/schemas/todo.yml)
+[![Static Badge](https://img.shields.io/badge/Rules-119-green?label=Total%20Number%20of%20Rules&labelColor=darkgreen&color=gray)](schema-examples/full.yml) [![Static Badge](https://img.shields.io/badge/Rules-55-green?label=Cell%20Value&labelColor=blue&color=gray)](src/Rules/Cell) [![Static Badge](https://img.shields.io/badge/Rules-59-green?label=Aggregate%20Column&labelColor=blue&color=gray)](src/Rules/Aggregate) [![Static Badge](https://img.shields.io/badge/Rules-5-green?label=Extra%20Checks&labelColor=blue&color=gray)](#extra-checks) [![Static Badge](https://img.shields.io/badge/Rules-317-green?label=Plan%20to%20add&labelColor=gray&color=gray)](tests/schemas/todo.yml)
<!-- /rules-counter -->

## Introduction
@@ -265,6 +265,26 @@ columns:
aggregate_rules:
is_unique: true # All values in the column are unique.

# First number in the column. Expected value is float or integer.
first_num: 5
first_num_not: 4.123
first_num_min: -1
first_num_max: 2e4
first: Expected # First value in the column. Will be compared as strings.
first_not: 'Not Expected' # Not allowed as the first value in the column. Will be compared as strings.

# N-th in the column.
nth: [ 2, Expected ] # Nth value in the column. Will be compared as strings.
nth_not: [ 2, 'Not expected' ] # Not allowed as the N-th value in the column. Will be compared as strings.

# Last number in the column. Expected value is float or integer.
last_num: 5
last_num_not: 4.123
last_num_min: -1
last_num_max: 2e4
last: Expected # Last value in the column. Will be compared as strings.
last_not: 'Not Expected' # Not allowed as the last value in the column. Will be compared as strings.

# Sum of the numbers in the column. Example: [1, 2, 3] => 6.
sum: 5.123
sum_not: 4.123
@@ -362,7 +382,9 @@ Behind the scenes to what is outlined in the yml above, there are additional che

* With `filename_pattern` rule, you can check if the file name matches the pattern.
* Property `name` is not defined in a column. If `csv.header: true`.
-* Schema contains an unknown column `name` that is not found in the CSV file. If `csv.header: true`
+* Check that each row matches the number of columns.
+* If `csv.header: true`. Schema contains an unknown column `name` that is not found in the CSV file.
+* If `csv.header: false`. Compare the number of columns in the schema and the CSV file.

<!-- /extra-rules -->

@@ -381,11 +403,11 @@ You can find launch examples in the [workflow demo](https://github.com/JBZoo/Csv
with:
# Path(s) to validate. You can specify path in which CSV files will be searched. Feel free to use glob patterns. Usage examples: /full/path/file.csv, p/file.csv, p/*.csv, p/**/*.csv, p/**/name-*.csv, **/*.csv, etc.
# Required: true
-csv: ./tests/**/*.csv
+csv: './tests/**/*.csv'

# Schema filepath. It can be a YAML, JSON or PHP. See examples on GitHub.
# Required: true
-schema: ./tests/schema.yml
+schema: './tests/**/*.yml'

# Report format. Available options: text, table, github, gitlab, teamcity, junit.
# Default value: table
17 changes: 17 additions & 0 deletions schema-examples/full.json
@@ -87,6 +87,23 @@
"aggregate_rules" : {
"is_unique" : true,

"first_num" : 5,
"first_num_not" : 4.123,
"first_num_min" : -1,
"first_num_max" : 20000,
"first" : "Expected",
"first_not" : "Not Expected",

"nth" : [2, "Expected"],
"nth_not" : [2, "Not expected"],

"last_num" : 5,
"last_num_not" : 4.123,
"last_num_min" : -1,
"last_num_max" : 20000,
"last" : "Expected",
"last_not" : "Not Expected",

"sum" : 5.123,
"sum_not" : 4.123,
"sum_min" : 1.123,
17 changes: 17 additions & 0 deletions schema-examples/full.php
@@ -108,6 +108,23 @@
'aggregate_rules' => [
'is_unique' => true,

'first_num' => 5,
'first_num_not' => 4.123,
'first_num_min' => -1,
'first_num_max' => 20000.0,
'first' => 'Expected',
'first_not' => 'Not Expected',

'nth' => [2, 'Expected'],
'nth_not' => [2, 'Not expected'],

'last_num' => 5,
'last_num_not' => 4.123,
'last_num_min' => -1,
'last_num_max' => 20000.0,
'last' => 'Expected',
'last_not' => 'Not Expected',

'sum' => 5.123,
'sum_not' => 4.123,
'sum_min' => 1.123,
20 changes: 20 additions & 0 deletions schema-examples/full.yml
@@ -179,6 +179,26 @@ columns:
aggregate_rules:
is_unique: true # All values in the column are unique.

# First number in the column. Expected value is float or integer.
first_num: 5
first_num_not: 4.123
first_num_min: -1
first_num_max: 2e4
first: Expected # First value in the column. Will be compared as strings.
first_not: 'Not Expected' # Not allowed as the first value in the column. Will be compared as strings.

# N-th in the column.
nth: [ 2, Expected ] # Nth value in the column. Will be compared as strings.
nth_not: [ 2, 'Not expected' ] # Not allowed as the N-th value in the column. Will be compared as strings.

# Last number in the column. Expected value is float or integer.
last_num: 5
last_num_not: 4.123
last_num_min: -1
last_num_max: 2e4
last: Expected # Last value in the column. Will be compared as strings.
last_not: 'Not Expected' # Not allowed as the last value in the column. Will be compared as strings.

# Sum of the numbers in the column. Example: [1, 2, 3] => 6.
sum: 5.123
sum_not: 4.123
28 changes: 28 additions & 0 deletions schema-examples/full_clean.yml
@@ -94,46 +94,74 @@ columns:

aggregate_rules:
is_unique: true

first_num: 5
first_num_not: 4.123
first_num_min: -1
first_num_max: 20000.0
first: Expected
first_not: 'Not Expected'

nth: [ 2, Expected ]
nth_not: [ 2, 'Not expected' ]

last_num: 5
last_num_not: 4.123
last_num_min: -1
last_num_max: 20000.0
last: Expected
last_not: 'Not Expected'

sum: 5.123
sum_not: 4.123
sum_min: 1.123
sum_max: 10.123

average: 5.123
average_not: 4.123
average_min: 1.123
average_max: 10.123

count: 5
count_not: 4
count_min: 1
count_max: 10

count_empty: 5
count_empty_not: 4
count_empty_min: 1
count_empty_max: 10

count_not_empty: 5
count_not_empty_not: 4
count_not_empty_min: 1
count_not_empty_max: 10

median: 5.123
median_not: 4.123
median_min: 1.123
median_max: 10.123

population_variance: 5.123
population_variance_not: 4.123
population_variance_min: 1.123
population_variance_max: 10.123

sample_variance: 5.123
sample_variance_not: 4.123
sample_variance_min: 1.123
sample_variance_max: 10.123

stddev: 5.123
stddev_not: 4.123
stddev_min: 1.123
stddev_max: 10.123

stddev_pop: 5.123
stddev_pop_not: 4.123
stddev_pop_min: 1.123
stddev_pop_max: 10.123

coef_of_var: 5.123
coef_of_var_not: 4.123
coef_of_var_min: 1.123
5 changes: 1 addition & 4 deletions src/Rules/AbstarctRule.php
@@ -26,11 +26,11 @@ abstract class AbstarctRule
{
public const INPUT_TYPE = self::INPUT_TYPE_UNDEF;

-public const INPUT_TYPE_UNDEF = self::INPUT_TYPE_STRINGS;
public const INPUT_TYPE_BOOL = 0;
public const INPUT_TYPE_INTS = 1;
public const INPUT_TYPE_FLOATS = 2;
public const INPUT_TYPE_STRINGS = 3;
+public const INPUT_TYPE_UNDEF = 4;

// Modes
public const DEFAULT = 'default';
@@ -213,9 +213,6 @@ protected function getOptionAsFloat(): float
return (float)$this->options;
}

-/**
- * @return string[]
- */
protected function getOptionAsArray(): array
{
// TODO: Replace to warning message
44 changes: 44 additions & 0 deletions src/Rules/Aggregate/ComboFirstNum.php
@@ -0,0 +1,44 @@
<?php

/**
* JBZoo Toolbox - Csv-Blueprint.
*
* This file is part of the JBZoo Toolbox project.
* For the full copyright and license information, please view the LICENSE
* file that was distributed with this source code.
*
* @license MIT
* @copyright Copyright (C) JBZoo.com, All rights reserved.
* @see https://github.com/JBZoo/Csv-Blueprint
*/

declare(strict_types=1);

namespace JBZoo\CsvBlueprint\Rules\Aggregate;

use JBZoo\CsvBlueprint\Rules\AbstarctRule;

final class ComboFirstNum extends AbstarctAggregateRuleCombo
{
    public const INPUT_TYPE = AbstarctRule::INPUT_TYPE_FLOATS;

    protected const NAME = 'first value';

    protected const HELP_TOP = ['First number in the column. Expected value is float or integer.'];

    protected const HELP_OPTIONS = [
        self::EQ => ['5', ''],
        self::NOT => ['4.123', ''],
        self::MIN => ['-1', ''],
        self::MAX => ['2e4', ''],
    ];

    protected function getActualAggregate(array $colValues): ?float
    {
        if (!isset($colValues[0])) {
            return null;
        }

        return (float)$colValues[0];
    }
}
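
Both numeric rules rely on an explicit `(float)` cast of the raw cell. As a rough illustration of what that cast does to typical cell strings, here is a small sketch in plain PHP. It is independent of Csv-Blueprint, and the sample values and the `$cells`/`$casted` names are made up for this example.

```php
<?php

declare(strict_types=1);

// Illustrative cell strings (assumed values, not taken from the project's tests).
$cells = ['5', '4.123', '-1', '2e4', '', 'abc'];

// An explicit (float) cast never throws: strings without a leading number become 0.0.
$casted = \array_map(static fn (string $cell): float => (float)$cell, $cells);

foreach ($cells as $index => $cell) {
    echo \var_export($cell, true), ' => ', \var_export($casted[$index], true), \PHP_EOL;
}
```

In this sketch `'2e4'` casts to the same float as `20000`, which matches the way the schema examples write the same bound as `2e4` in YAML and `20000` in JSON. The empty string and `'abc'` both cast to `0.0`, so the cast alone does not reject non-numeric cells; whether the library filters such cells earlier is not visible in this diff.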
44 changes: 44 additions & 0 deletions src/Rules/Aggregate/ComboLastNum.php
@@ -0,0 +1,44 @@
<?php

/**
* JBZoo Toolbox - Csv-Blueprint.
*
* This file is part of the JBZoo Toolbox project.
* For the full copyright and license information, please view the LICENSE
* file that was distributed with this source code.
*
* @license MIT
* @copyright Copyright (C) JBZoo.com, All rights reserved.
* @see https://github.com/JBZoo/Csv-Blueprint
*/

declare(strict_types=1);

namespace JBZoo\CsvBlueprint\Rules\Aggregate;

use JBZoo\CsvBlueprint\Rules\AbstarctRule;

final class ComboLastNum extends AbstarctAggregateRuleCombo
{
    public const INPUT_TYPE = AbstarctRule::INPUT_TYPE_FLOATS;

    protected const NAME = 'last value';

    protected const HELP_TOP = ['Last number in the column. Expected value is float or integer.'];

    protected const HELP_OPTIONS = [
        self::EQ => ['5', ''],
        self::NOT => ['4.123', ''],
        self::MIN => ['-1', ''],
        self::MAX => ['2e4', ''],
    ];

    protected function getActualAggregate(array $colValues): ?float
    {
        if (\count($colValues) === 0) {
            return null;
        }

        return (float)\end($colValues);
    }
}
43 changes: 43 additions & 0 deletions src/Rules/Aggregate/First.php
@@ -0,0 +1,43 @@
<?php

/**
* JBZoo Toolbox - Csv-Blueprint.
*
* This file is part of the JBZoo Toolbox project.
* For the full copyright and license information, please view the LICENSE
* file that was distributed with this source code.
*
* @license MIT
* @copyright Copyright (C) JBZoo.com, All rights reserved.
* @see https://github.com/JBZoo/Csv-Blueprint
*/

declare(strict_types=1);

namespace JBZoo\CsvBlueprint\Rules\Aggregate;

use JBZoo\CsvBlueprint\Rules\AbstarctRule;

final class First extends AbstarctAggregateRule
{
    public const INPUT_TYPE = AbstarctRule::INPUT_TYPE_STRINGS;

    protected const HELP_OPTIONS = [
        self::DEFAULT => ['Expected', 'First value in the column. Will be compared as strings.'],
    ];

    public function validateRule(array &$columnValues): ?string
    {
        // An empty column has no first value, so there is nothing to validate.
        if (\count($columnValues) === 0) {
            return null;
        }

        $first = \reset($columnValues);
        if ($first !== $this->getOptionAsString()) {
            return "The first value in the column is \"<c>{$first}</c>\", " .
                "which is not equal to the expected \"<green>{$this->getOptionAsString()}</green>\"";
        }

        return null;
    }
}
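
To try the comparison outside the rule class hierarchy, here is a minimal standalone sketch. The function name `checkFirstValue` and its `$expected` parameter are hypothetical stand-ins for `validateRule()` and `getOptionAsString()`; the `<c>` and `<green>` output tags are dropped, and the sample column values are invented.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Csv-Blueprint) mirroring First::validateRule():
 * an empty column is skipped, otherwise the first value must be strictly
 * equal to the expected string.
 *
 * @param string[] $columnValues
 */
function checkFirstValue(array $columnValues, string $expected): ?string
{
    if (\count($columnValues) === 0) {
        return null; // nothing to validate in an empty column
    }

    $first = \reset($columnValues);
    if ($first !== $expected) {
        return "The first value in the column is \"{$first}\", "
            . "which is not equal to the expected \"{$expected}\"";
    }

    return null;
}

// Usage: null means the rule passed, a string is the error message.
\var_dump(checkFirstValue([], 'Expected'));
\var_dump(checkFirstValue(['Expected', 'other'], 'Expected'));
\var_dump(checkFirstValue(['other', 'Expected'], 'Expected'));
```

Because the comparison uses `!==`, the values must match exactly as strings: `'5'` and `'5.0'` would count as different first values in this sketch.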