Implemented abstract logic for combined and aggregated rules (#35)
- Had to do a lot of refactoring for code consistency.
 - Tested on the basis of `ComboSum` (`sum_*`).
SmetDenis committed Mar 17, 2024
1 parent 8640b7e commit 41ae59a
Showing 93 changed files with 729 additions and 428 deletions.
45 changes: 31 additions & 14 deletions README.md
@@ -105,7 +105,11 @@ columns:
- name: "Column Name (header)" # Any custom name of the column in the CSV file (first row). Required if "csv_structure.header" is true.
description: "Lorem ipsum" # Optional. Description of the column. Not used in the validation process.

# Optional. You can use this section to validate each value in the column.
####################################################################################################################
# Data validation for each(!) value in the column.
# Of course, this can greatly affect the speed of checking.
# It depends on the number of checks and CSV file size.
# TODO: There are several ways to optimize this process, but the author needs time to test it carefully.
rules:
# Important notes:
# 1. All rules except "not_empty" ignored for empty strings (length 0).
@@ -139,7 +143,7 @@ columns:
length_max: 10

# Basic string checks
is_trimed: true # Only trimed strings. Example: "Hello World" (not " Hello World ").
is_trimmed: true # Only trimmed strings. Example: "Hello World" (not " Hello World ").
is_lowercase: true # String is only lower-case. Example: "hello world".
is_uppercase: true # String is only upper-case. Example: "HELLO WORLD".
is_capitalize: true # String is only capitalized. Example: "Hello World".
@@ -204,11 +208,22 @@ columns:
is_cardinal_direction: true # Valid cardinal direction. Examples: "N", "S", "NE", "SE", "none", "".
is_usa_market_name: true # Check if the value is a valid USA market name. Example: "New York, NY".

# Optional. You can use this section to validate the whole column
# Be careful, this can reduce performance noticeably depending on the combination of rules.

####################################################################################################################
# Data validation for the entire(!) column using different data aggregation methods.
# Depending on the file size and the chosen aggregation method - this can use a lot of RAM time.
# Be careful with files that are 2-3 or more times larger than the available memory.
# TODO: There are several ways to optimize this process, but the author needs time to test it carefully.
aggregate_rules:
is_unique: true # All values in the column are unique.

# Assumes that all values in the column are int/float only.
# An empty string is converted to null.
sum: 5
sum_not: 4
sum_min: 1
sum_max: 10

- name: "another_column"

- name: "third_column"
@@ -376,13 +391,12 @@ Schema: ./tests/schemas/demo_invalid.yml
Found CSV files: 3
(1/3) Invalid file: ./tests/fixtures/batch/demo-1.csv
+------+------------------+--------------+-------------- demo-1.csv -------------------------------------------------------+
| Line | id:Column        | Rule         | Message                                                                         |
+------+------------------+--------------+---------------------------------------------------------------------------------+
| 1    | 1:City           | ag:is_unique | Column has non-unique values. Unique: 1, total: 2                               |
| 3    | 2:Float          | num_max      | The number of the value "74605.944", which is greater than the expected "74605" |
| 3    | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"]           |
+------+------------------+--------------+-------------- demo-1.csv -------------------------------------------------------+
+------+------------------+--------------+--------- demo-1.csv --------------------------------------------------+
| Line | id:Column        | Rule         | Message                                                               |
+------+------------------+--------------+-----------------------------------------------------------------------+
| 1    | 1:City           | ag:is_unique | Column has non-unique values. Unique: 1, total: 2                     |
| 3    | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
+------+------------------+--------------+--------- demo-1.csv --------------------------------------------------+
(2/3) Invalid file: ./tests/fixtures/batch/demo-2.csv
+------+------------+------------+-------------------------- demo-2.csv ------------------------------------------------------------+
@@ -406,7 +420,7 @@ Found CSV files: 3
+------+-----------+------------------+---------------------- demo-3.csv ------------------------------------------------------------+
Found 9 issues in 3 out of 3 CSV files.
Found 8 issues in 3 out of 3 CSV files.
```

@@ -471,13 +485,16 @@ It's random ideas and plans. No orderings and deadlines. <u>But batch processing
* Gitlab and JUnit reports must be as one structure. It's not so easy to implement. But it's a good idea.
* Merge reports from multiple CSV files into one report. It's useful when you have a lot of files and you want to see all errors in one place. Especially for GitLab and JUnit reports.

**Agregate rule**
**Misc**
* Use it as PHP SDK. Examples in Readme.
* Warnings about deprecated options and features.
* Warnings about invalid schema files.
* Move const:HELP to PHP annotations. Canonic way to describe the command.
* S3 Storage support. Validate files in the S3 bucket?
* More examples and documentation.


PS. [There is a file](tests/schemas/example_full.yml) with my ideas and imagination. It's not valid schema file, just a draft.
PS. [There is a file](tests/schemas/todo.yml) with my ideas and imagination. It's not valid schema file, just a draft.
I'm not sure if I will implement all of them. But I will try to do my best.


8 changes: 6 additions & 2 deletions schema-examples/full.json
@@ -21,7 +21,7 @@
"length_not" : 4,
"length_min" : 1,
"length_max" : 10,
"is_trimed" : true,
"is_trimmed" : true,
"is_lowercase" : true,
"is_uppercase" : true,
"is_capitalize" : true,
@@ -64,7 +64,11 @@
"is_usa_market_name" : true
},
"aggregate_rules" : {
"is_unique" : true
"is_unique" : true,
"sum" : 5,
"sum_not" : 4,
"sum_min" : 1,
"sum_max" : 10
}
},
{"name" : "another_column"},
6 changes: 5 additions & 1 deletion schema-examples/full.php
@@ -37,7 +37,7 @@
'length_not' => 4,
'length_min' => 1,
'length_max' => 10,
'is_trimed' => true,
'is_trimmed' => true,
'is_lowercase' => true,
'is_uppercase' => true,
'is_capitalize' => true,
@@ -81,6 +81,10 @@
],
'aggregate_rules' => [
'is_unique' => true,
'sum' => 5,
'sum_not' => 4,
'sum_min' => 1,
'sum_max' => 10,
],
],
['name' => 'another_column'],
23 changes: 19 additions & 4 deletions schema-examples/full.yml
@@ -30,7 +30,11 @@ columns:
- name: "Column Name (header)" # Any custom name of the column in the CSV file (first row). Required if "csv_structure.header" is true.
description: "Lorem ipsum" # Optional. Description of the column. Not used in the validation process.

# Optional. You can use this section to validate each value in the column.
####################################################################################################################
# Data validation for each(!) value in the column.
# Of course, this can greatly affect the speed of checking.
# It depends on the number of checks and CSV file size.
# TODO: There are several ways to optimize this process, but the author needs time to test it carefully.
rules:
# Important notes:
# 1. All rules except "not_empty" ignored for empty strings (length 0).
@@ -64,7 +68,7 @@ columns:
length_max: 10

# Basic string checks
is_trimed: true # Only trimed strings. Example: "Hello World" (not " Hello World ").
is_trimmed: true # Only trimmed strings. Example: "Hello World" (not " Hello World ").
is_lowercase: true # String is only lower-case. Example: "hello world".
is_uppercase: true # String is only upper-case. Example: "HELLO WORLD".
is_capitalize: true # String is only capitalized. Example: "Hello World".
@@ -129,11 +133,22 @@ columns:
is_cardinal_direction: true # Valid cardinal direction. Examples: "N", "S", "NE", "SE", "none", "".
is_usa_market_name: true # Check if the value is a valid USA market name. Example: "New York, NY".

# Optional. You can use this section to validate the whole column
# Be careful, this can reduce performance noticeably depending on the combination of rules.

####################################################################################################################
# Data validation for the entire(!) column using different data aggregation methods.
# Depending on the file size and the chosen aggregation method - this can use a lot of RAM time.
# Be careful with files that are 2-3 or more times larger than the available memory.
# TODO: There are several ways to optimize this process, but the author needs time to test it carefully.
aggregate_rules:
is_unique: true # All values in the column are unique.

# Assumes that all values in the column are int/float only.
# An empty string is converted to null.
sum: 5
sum_not: 4
sum_min: 1
sum_max: 10

- name: "another_column"

- name: "third_column"
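
To make the new `sum`, `sum_min` and `sum_max` keys concrete, here is a rough sketch of what such an aggregate check could look like. It is not the project's implementation: the function name `checkColumnSum`, the choice to skip empty strings when summing, and the message wording are all assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical aggregate check: sums a column and tests it against bounds.
 *
 * @param string[] $column Raw cell values of one CSV column.
 * @return string[]        Human-readable violations (empty when the column passes).
 */
function checkColumnSum(array $column, ?float $min = null, ?float $max = null): array
{
    // Assumption: empty strings carry no value and are skipped.
    $numbers = \array_filter($column, static fn (string $cell): bool => $cell !== '');
    $sum     = \array_sum(\array_map('floatval', $numbers));

    $violations = [];
    if ($min !== null && $sum < $min) {
        $violations[] = "sum {$sum} is less than sum_min {$min}";
    }
    if ($max !== null && $sum > $max) {
        $violations[] = "sum {$sum} is greater than sum_max {$max}";
    }

    return $violations;
}

// The column sums to 12, which exceeds the upper bound of 10.
\print_r(checkColumnSum(['1.5', '', '2.5', '8'], 1.0, 10.0));
```

Because the whole column has to be available before the sum can be checked, a check like this runs once per column rather than once per cell, which is why the schema comments warn about memory use.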
28 changes: 23 additions & 5 deletions src/Rules/AbstarctRule.php
@@ -64,6 +64,7 @@ public function __construct(

public function validate(array|string $cellValue, int $line = ColumnValidator::FALLBACK_LINE): ?Error
{
// TODO: Extract to abstract boolean cell/aggregate rule
if ($this->isEnabled($cellValue) === false) {
return null;
}
@@ -86,7 +87,7 @@ public function getHelp(): string
$leftPad = \str_repeat(' ', self::HELP_LEFT_PAD);

$renderLine = function (array|string $row, string $mode) use ($leftPad): string {
$ymlRuleCode = $this instanceof AbstractCombo ? $this->getComboRuleCode($mode) : $this->getRuleCode();
$ymlRuleCode = \str_replace('ag:', '', $this->getRuleCode($mode));
$baseKeyVal = "{$leftPad}{$ymlRuleCode}: {$row[0]}";

if (isset($row[1]) && $row[1] !== '') {
@@ -106,7 +107,7 @@ public function getHelp(): string
);
}

if ($this instanceof AbstractCombo) {
if ($this instanceof AbstarctRuleCombo) {
return \implode("\n", [
$topComment,
$renderLine(static::HELP_OPTIONS[self::EQ], self::EQ),
@@ -155,7 +156,7 @@ protected function getOptionAsInt(): int
{
// TODO: Replace to warning message
if ($this->options === '' || !\is_numeric($this->options)) {
$options = \is_array($this->options) ? \implode(', ', $this->options) : $this->options;
$options = \is_array($this->options) ? '[' . \implode(', ', $this->options) . ']' : $this->options;
throw new Exception(
"Invalid option \"{$options}\" for the \"{$this->getRuleCode()}\" rule. " .
'It should be integer.',
@@ -165,6 +166,20 @@ protected function getOptionAsInt(): int
return (int)$this->options;
}

protected function getOptionAsFloat(): float
{
// TODO: Replace to warning message
if ($this->options === '' || !\is_numeric($this->options)) {
$options = \is_array($this->options) ? '[' . \implode(', ', $this->options) . ']' : $this->options;
throw new Exception(
"Invalid option \"{$options}\" for the \"{$this->getRuleCode()}\" rule. " .
'It should be integer/float.',
);
}

return (float)$this->options;
}

/**
* @return string[]
*/
@@ -208,8 +223,11 @@ protected function isEnabled(array|string $cellValue): bool
return $cellValue !== '';
}

protected function getRuleCode(): string
protected function getRuleCode(?string $mode = null): string
{
return Utils::camelToKebabCase((new \ReflectionClass($this))->getShortName());
$mode ??= $this->mode;
$postfix = $mode !== self::EQ && $mode !== self::DEFAULT ? "_{$mode}" : '';

return Utils::camelToKebabCase((new \ReflectionClass($this))->getShortName()) . $postfix;
}
}
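
The reworked `getRuleCode()` builds the schema key by appending the mode as a postfix, which is the inverse of what `parseMode()` in `AbstarctRuleCombo.php` does. Below is a minimal sketch of that round trip. It assumes the mode values are the strings `not`, `min` and `max` (inferred from the `sum_not`, `sum_min`, `sum_max` keys) and that plain equality is represented by an empty string; the actual constant values are not shown in this diff, and the function names are hypothetical.

```php
<?php

declare(strict_types=1);

// Hypothetical helpers; the mode strings are assumptions, not taken from the source.
const MODES = ['max', 'min', 'not'];

// "sum_min" -> "min", "sum" -> "" (no postfix means plain equality).
function parseModeSketch(string $ruleName): string
{
    $pattern = '/_(' . \implode('|', MODES) . ')$/';

    return \preg_match($pattern, $ruleName, $matches) === 1 ? $matches[1] : '';
}

// ("sum", "min") -> "sum_min", ("sum", "") -> "sum".
function buildRuleCodeSketch(string $baseName, string $mode): string
{
    return $mode === '' ? $baseName : "{$baseName}_{$mode}";
}

foreach (['sum', 'sum_not', 'sum_min', 'sum_max'] as $key) {
    $mode = parseModeSketch($key);
    // Strip "_<mode>" from the end to recover the base rule name.
    $base = $mode === '' ? $key : \substr($key, 0, -\strlen($mode) - 1);
    echo $key, ' => mode "', $mode, '", rebuilt: ', buildRuleCodeSketch($base, $mode), "\n";
}
```

Anchoring the pattern with `$` matters: a rule such as `is_unique` contains underscores but does not end in a known mode, so it parses to the empty mode.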
115 changes: 115 additions & 0 deletions src/Rules/AbstarctRuleCombo.php
@@ -0,0 +1,115 @@
<?php

/**
* JBZoo Toolbox - Csv-Blueprint.
*
* This file is part of the JBZoo Toolbox project.
* For the full copyright and license information, please view the LICENSE
* file that was distributed with this source code.
*
* @license MIT
* @copyright Copyright (C) JBZoo.com, All rights reserved.
* @see https://github.com/JBZoo/Csv-Blueprint
*/

declare(strict_types=1);

namespace JBZoo\CsvBlueprint\Rules;

use JBZoo\CsvBlueprint\Rules\Aggregate\AbstarctAggregateRuleCombo;
use JBZoo\CsvBlueprint\Rules\Cell\AbstractCellRuleCombo;
use JBZoo\CsvBlueprint\Validators\ColumnValidator;
use JBZoo\CsvBlueprint\Validators\Error;

abstract class AbstarctRuleCombo extends AbstarctRule
{
protected const NAME = 'UNDEFINED';

protected const VERBS = [
self::EQ => 'not equal',
self::NOT => 'equal',
self::MIN => 'less',
self::MAX => 'greater',
];

abstract protected function getExpected(): float;

abstract protected function getActual(array|string $value): float;

public function validate(array|string $cellValue, int $line = ColumnValidator::FALLBACK_LINE): ?Error
{
$error = $this->validateCombo($cellValue);

if ($error !== null) {
return new Error($this->ruleCode, $error, $this->columnNameId, $line);
}

return null;
}

public function test(array|string $cellValue, bool $isHtml = false): string
{
$errorMessage = (string)$this->validateCombo($cellValue);

return $isHtml ? $errorMessage : \strip_tags($errorMessage);
}

public static function parseMode(string $origRuleName): string
{
$postfixes = [self::MAX, self::MIN, self::NOT];

if (\preg_match('/_(' . \implode('|', $postfixes) . ')$/', $origRuleName, $matches) === 1) {
return $matches[1];
}

return '';
}

protected function getRuleCode(?string $mode = null): string
{
return \str_replace('combo_', '', parent::getRuleCode($mode));
}

protected static function compare(float $expected, float $actual, string $mode): bool
{
// Rounding numbers to 10 decimal places before strict comparison is necessary due to the inherent
// imprecision of floating-point arithmetic. Computers represent floating-point numbers in binary,
// which can lead to small rounding errors for what we expect to be precise decimal values.
// As a result, direct comparisons of floating-point numbers that should be equal might fail.
// Rounding both numbers to a fixed number of decimal places before comparison can mitigate this issue,
// making it a practical approach to handle the imprecision and ensure more reliable equality checks.
// Since PHP's default precision is 12 digits, we chose 10 digits to be more confident.
$precision = 10;
$expected = \round($expected, $precision);
$actual = \round($actual, $precision);

return match ($mode) {
self::EQ => $expected === $actual,
self::NOT => $expected !== $actual,
self::MIN => $expected <= $actual,
self::MAX => $expected >= $actual,
default => throw new \InvalidArgumentException("Unknown mode: {$mode}"),
};
}

private function validateCombo(array|string $cellValue): ?string
{
if ($this instanceof AbstractCellRuleCombo) {
if (!\is_string($cellValue)) {
throw new \InvalidArgumentException('The value should be a string');
}

return $this->validateComboCell($cellValue, $this->mode);
}

if ($this instanceof AbstarctAggregateRuleCombo) {
if (!\is_array($cellValue)) {
throw new \InvalidArgumentException('The value should be an array of numbers/strings');
}

return $this->validateComboAggregate($cellValue, $this->mode);
}

throw new \LogicException('Unknown rule type: ' . static::class);
}
}
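
The round-then-compare approach described in the `compare()` comment can be tried outside the class. The following is a standalone sketch rather than the project's code: the function name `compareRounded` and the mode strings `'eq'`, `'not'`, `'min'`, `'max'` are assumptions, since the values of `self::EQ` and the other constants are not shown in this diff.

```php
<?php

declare(strict_types=1);

// Hypothetical standalone helper mirroring the idea of compare(): round both
// sides to a fixed precision, then compare strictly. Mode strings are assumed.
function compareRounded(float $expected, float $actual, string $mode, int $precision = 10): bool
{
    $expected = \round($expected, $precision);
    $actual   = \round($actual, $precision);

    return match ($mode) {
        'eq'  => $expected === $actual,
        'not' => $expected !== $actual,
        'min' => $expected <= $actual, // actual is at least the expected minimum
        'max' => $expected >= $actual, // actual is at most the expected maximum
        default => throw new \InvalidArgumentException("Unknown mode: {$mode}"),
    };
}

// 0.1 + 0.2 is not exactly 0.3 in binary floating point.
\var_dump(0.1 + 0.2 === 0.3);                    // bool(false)
\var_dump(compareRounded(0.3, 0.1 + 0.2, 'eq')); // bool(true)
```

Rounding both operands to the same precision keeps the strict `===` and `!==` checks meaningful for sums of decimal values, at the cost of treating differences smaller than the chosen precision as equal.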
