Add detailed description and metadata to schema examples (#58)
Updated the JSON, PHP, and YAML schema examples to include `name` and
`description` metadata. Also added `schema-examples/full_clean.yml`, a copy of
`full.yml` without the explanatory comments. Updated `ErrorSuite.php`, `Schema.php`,
`Utils.php`, and `ValidateCsvTest.php` to reflect these changes.
SmetDenis committed Mar 19, 2024
1 parent fc04014 commit 9958cd7
Showing 18 changed files with 393 additions and 193 deletions.
81 changes: 44 additions & 37 deletions README.md
@@ -89,22 +89,41 @@ It's also covered by tests, so it's always up-to-date.

<!-- full-yml -->
```yml
# It's a full example of the CSV schema file in YAML format.
# It's a complete example of the CSV schema file in YAML format.
# See copy of the file without comments here ./schema-examples/full_clean.yml

# Just meta
name: CSV Blueprint Schema Example # Name of a CSV file. Not used in the validation process.
description: | # Any description of the CSV file. Not used in the validation process.
  This YAML file provides a detailed description and validation rules for CSV files
  to be processed by JBZoo/Csv-Blueprint tool. It includes specifications for file name patterns,
  CSV formatting options, and extensive validation criteria for individual columns and their values,
  supporting a wide range of data validation rules from basic type checks to complex regex validations.
  This example serves as a comprehensive guide for creating robust CSV file validations.
# Regular expression to match the file name. If not set, then no pattern check
# This way you can validate the file name before the validation process.
# Feel free to check parent directories as well.
# See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
filename_pattern: /demo(-\d+)?\.csv$/i

csv: # Here are default values. You can skip this section if you don't need to override the default values

# Here are default values to parse CSV file.
# You can skip this section if you don't need to override the default values.
csv:
  header: true # If the first row is a header. If true, name of each column is required.
  delimiter: , # Delimiter character in CSV file.
  quote_char: \ # Quote character in CSV file.
  enclosure: "\"" # Enclosure for each field in CSV file.
  encoding: utf-8 # (Experimental) Only utf-8, utf-16, utf-32.
  bom: false # (Experimental) If the file has a BOM (Byte Order Mark) at the beginning.


# Description of each column in CSV.
# It is recommended to present each column in the same order as presented in the CSV file.
# This will not affect the validator, but will make it easier for you to navigate.
# For convenience, use the first line as a header (if possible).
columns:
  - name: "Column Name (header)" # Any custom name of the column in the CSV file (first row). Required if "csv_structure.header" is true.
    description: "Lorem ipsum" # Description of the column. Not used in the validation process.
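
The `filename_pattern` comment above says the regular expression is checked against the file name and may include parent directories. A quick way to see what `/demo(-\d+)?\.csv$/i` accepts is to run it through PHP's `preg_match()`. This is a minimal sketch, assuming the pattern is applied to the path as passed on the command line (the report messages further down print the full path); `matchesFilenamePattern()` is a hypothetical helper, not part of Csv-Blueprint.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of Csv-Blueprint): tests a path against a
// filename_pattern regex using PHP's PCRE functions.
function matchesFilenamePattern(string $pattern, string $path): bool
{
    $result = preg_match($pattern, $path);
    if ($result === false) {
        throw new InvalidArgumentException("Invalid regex: {$pattern}");
    }

    return $result === 1;
}

$pattern = '/demo(-\d+)?\.csv$/i'; // value from the schema example above

foreach (['./tests/fixtures/demo.csv', 'DEMO-12.CSV', 'mydemo.csv', 'demo-x.csv', 'demo.csv.bak'] as $path) {
    printf("%-28s %s\n", $path, matchesFilenamePattern($pattern, $path) ? 'match' : 'no match');
}
```

Because the pattern is anchored only at the end (`$`), `mydemo.csv` also passes; anchoring the start as well, for example `/(^|\/)demo(-\d+)?\.csv$/i`, would reject it.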
@@ -491,44 +510,32 @@ Default report format is `table`:

<!-- output-table -->
```
./csv-blueprint validate:csv --csv='./tests/fixtures/batch/*.csv' --schema='./tests/schemas/demo_invalid.yml'
./csv-blueprint validate:csv --csv='./tests/fixtures/demo.csv' --schema='./tests/schemas/demo_invalid.yml'
Schema: ./tests/schemas/demo_invalid.yml
Found CSV files: 3
(1/3) Invalid file: ./tests/fixtures/batch/demo-1.csv
+------+------------------+--------------+--------- demo-1.csv --------------------------------------------------+
| Line | id:Column | Rule | Message |
+------+------------------+--------------+-----------------------------------------------------------------------+
| 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 1, total: 2 |
| 3 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
+------+------------------+--------------+--------- demo-1.csv --------------------------------------------------+
(2/3) Invalid file: ./tests/fixtures/batch/demo-2.csv
+------+------------+------------+----------------- demo-2.csv --------------------------------------------------+
| Line | id:Column | Rule | Message |
+------+------------+------------+-------------------------------------------------------------------------------+
| 2 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
| 7 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
| 2 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", |
| | | | which is less than the expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
| 4 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", |
| | | | which is less than the expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
| 5 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", |
| | | | which is greater than the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
+------+------------+------------+----------------- demo-2.csv --------------------------------------------------+
(3/3) Invalid file: ./tests/fixtures/batch/sub/demo-3.csv
+------+-----------+------------------+------------ demo-3.csv --------------------------------------------------+
| Line | id:Column | Rule | Message |
+------+-----------+------------------+--------------------------------------------------------------------------+
| 1 | | filename_pattern | Filename "./tests/fixtures/batch/sub/demo-3.csv" does not match pattern: |
| | | | "/demo-[12].csv$/i" |
+------+-----------+------------------+------------ demo-3.csv --------------------------------------------------+
Found 8 issues in 3 out of 3 CSV files.
Found CSV files: 1
(1/1) Invalid file: ./tests/fixtures/demo.csv
+------+------------------+------------------+----------------------- demo.csv ---------------------------------------------------------------------+
| Line | id:Column | Rule | Message |
+------+------------------+------------------+------------------------------------------------------------------------------------------------------+
| 1 | | filename_pattern | Filename "./tests/fixtures/demo.csv" does not match pattern: "/demo-[12].csv$/i" |
| 6 | 0:Name | length_min | The length of the value "Carl" is 4, which is less than the expected "5" |
| 11 | 0:Name | length_min | The length of the value "Lois" is 4, which is less than the expected "5" |
| 1 | 1:City | ag:is_unique | Column has non-unique values. Unique: 9, total: 10 |
| 2 | 2:Float | num_max | The number of the value "4825.185", which is greater than the expected "4825.184" |
| 6 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
| | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
| 8 | 3:Birthday | date_min | The date of the value "1955-05-14" is parsed as "1955-05-14 00:00:00 +00:00", which is less than the |
| | | | expected "1955-05-15 00:00:00 +00:00 (1955-05-15)" |
| 9 | 3:Birthday | date_max | The date of the value "2010-07-20" is parsed as "2010-07-20 00:00:00 +00:00", which is greater than |
| | | | the expected "2009-01-01 00:00:00 +00:00 (2009-01-01)" |
| 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", "green", "Blue"] |
+------+------------------+------------------+----------------------- demo.csv ---------------------------------------------------------------------+
Found 9 issues in CSV file.
```
<!-- /output-table -->
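
The first row of the new report comes from `tests/schemas/demo_invalid.yml`, whose pattern `/demo-[12].csv$/i` requires a `-1` or `-2` suffix, so `./tests/fixtures/demo.csv` cannot match (the old batch run rejected `demo-3.csv` for the same reason). The sketch below reproduces those results with `preg_match()`, assuming the path is tested exactly as printed in the message. It also shows a side effect of the unescaped `.` before `csv`: it matches any single character, not only a literal dot.

```php
<?php

declare(strict_types=1);

// Pattern copied from the report above (tests/schemas/demo_invalid.yml).
$pattern = '/demo-[12].csv$/i';

$paths = [
    './tests/fixtures/demo.csv',             // reported above: does not match
    './tests/fixtures/batch/sub/demo-3.csv', // "3" is outside [12]
    './tests/fixtures/batch/demo-1.csv',
    'demo-2xcsv',                            // matches only because "." is a wildcard
];

$results = [];
foreach ($paths as $path) {
    $results[$path] = preg_match($pattern, $path) === 1;
    echo $path, ' => ', $results[$path] ? 'match' : 'no match', PHP_EOL;
}
```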
2 changes: 2 additions & 0 deletions schema-examples/full.json
@@ -1,4 +1,6 @@
{
"name" : "CSV Blueprint Schema Example",
"description" : "This YAML file provides a detailed description and validation rules for CSV files\nto be processed by JBZoo\/Csv-Blueprint tool. It includes specifications for file name patterns,\nCSV formatting options, and extensive validation criteria for individual columns and their values,\nsupporting a wide range of data validation rules from basic type checks to complex regex validations.\nThis example serves as a comprehensive guide for creating robust CSV file validations.\n",
"filename_pattern" : "\/demo(-\\d+)?\\.csv$\/i",
"csv" : {
"header" : true,
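
The JSON copy stores the same values with JSON escaping: `\/` is an escaped slash and `\\d` an escaped backslash. The description ends in `\n`, presumably because it was converted from the YAML `|` block scalar, which keeps one trailing newline by default. A minimal check with `json_decode()` (requires PHP 8.0+ for `str_ends_with()`; the description is shortened to its last line here) confirms that the decoded pattern is identical to the YAML one:

```php
<?php

declare(strict_types=1);

// First keys of schema-examples/full.json, copied from the hunk above
// (description shortened to its last line, trailing "\n" kept).
$json = <<<'JSON'
{
    "name" : "CSV Blueprint Schema Example",
    "description" : "This example serves as a comprehensive guide for creating robust CSV file validations.\n",
    "filename_pattern" : "\/demo(-\\d+)?\\.csv$\/i"
}
JSON;

$schema = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// "\/" decodes to "/" and "\\d" decodes to "\d".
var_dump($schema['filename_pattern']);

// The description keeps the trailing newline from the YAML block scalar.
var_dump(str_ends_with($schema['description'], "\n"));
```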
7 changes: 7 additions & 0 deletions schema-examples/full.php
@@ -15,6 +15,13 @@
declare(strict_types=1);

return [
'name' => 'CSV Blueprint Schema Example',
'description' => 'This YAML file provides a detailed description and validation rules for CSV files
to be processed by JBZoo/Csv-Blueprint tool. It includes specifications for file name patterns,
CSV formatting options, and extensive validation criteria for individual columns and their values,
supporting a wide range of data validation rules from basic type checks to complex regex validations.
This example serves as a comprehensive guide for creating robust CSV file validations.
',
'filename_pattern' => '/demo(-\\d+)?\\.csv$/i',
'csv' => [
'header' => true,
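
In `full.php` the pattern is written as `'/demo(-\\d+)?\\.csv$/i'`. Inside single quotes PHP treats only `\\` and `\'` as escape sequences, so this is the same 20-character string as the YAML value. The multi-line description likewise ends with a newline because the closing quote sits on its own line. A small self-contained check (PHP 8.0+ for `str_ends_with()`):

```php
<?php

declare(strict_types=1);

// In single quotes, '\\d' and '\d' both yield the two characters backslash + "d".
$fromPhpExample = '/demo(-\\d+)?\\.csv$/i'; // as written in full.php
$fromYamlExample = '/demo(-\d+)?\.csv$/i';  // as written in full.yml

var_dump($fromPhpExample === $fromYamlExample); // bool(true)

// A single-quoted string may span lines; the closing quote on its own line
// leaves a trailing newline, like the "\n" in the JSON copy.
$description = 'First line
Last line.
';
var_dump(str_ends_with($description, "\n")); // bool(true)
```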
23 changes: 21 additions & 2 deletions schema-examples/full.yml
@@ -10,22 +10,41 @@
# @see https://github.com/JBZoo/Csv-Blueprint
#

# It's a full example of the CSV schema file in YAML format.
# It's a complete example of the CSV schema file in YAML format.
# See copy of the file without comments here ./schema-examples/full_clean.yml

# Just meta
name: CSV Blueprint Schema Example # Name of a CSV file. Not used in the validation process.
description: | # Any description of the CSV file. Not used in the validation process.
  This YAML file provides a detailed description and validation rules for CSV files
  to be processed by JBZoo/Csv-Blueprint tool. It includes specifications for file name patterns,
  CSV formatting options, and extensive validation criteria for individual columns and their values,
  supporting a wide range of data validation rules from basic type checks to complex regex validations.
  This example serves as a comprehensive guide for creating robust CSV file validations.
# Regular expression to match the file name. If not set, then no pattern check
# This way you can validate the file name before the validation process.
# Feel free to check parent directories as well.
# See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
filename_pattern: /demo(-\d+)?\.csv$/i

csv: # Here are default values. You can skip this section if you don't need to override the default values

# Here are default values to parse CSV file.
# You can skip this section if you don't need to override the default values.
csv:
  header: true # If the first row is a header. If true, name of each column is required.
  delimiter: , # Delimiter character in CSV file.
  quote_char: \ # Quote character in CSV file.
  enclosure: "\"" # Enclosure for each field in CSV file.
  encoding: utf-8 # (Experimental) Only utf-8, utf-16, utf-32.
  bom: false # (Experimental) If the file has a BOM (Byte Order Mark) at the beginning.


# Description of each column in CSV.
# It is recommended to present each column in the same order as presented in the CSV file.
# This will not affect the validator, but will make it easier for you to navigate.
# For convenience, use the first line as a header (if possible).
columns:
  - name: "Column Name (header)" # Any custom name of the column in the CSV file (first row). Required if "csv_structure.header" is true.
    description: "Lorem ipsum" # Description of the column. Not used in the validation process.
148 changes: 148 additions & 0 deletions schema-examples/full_clean.yml
@@ -0,0 +1,148 @@
#
# JBZoo Toolbox - Csv-Blueprint.
#
# This file is part of the JBZoo Toolbox project.
# For the full copyright and license information, please view the LICENSE
# file that was distributed with this source code.
#
# @license MIT
# @copyright Copyright (C) JBZoo.com, All rights reserved.
# @see https://github.com/JBZoo/Csv-Blueprint
#

# It's a complete example of the CSV schema file in YAML format.
# It's just a copy of ./schema-examples/full.yml without comments.

name: CSV Blueprint Schema Example
description: |
  This YAML file provides a detailed description and validation rules for CSV files
  to be processed by JBZoo/Csv-Blueprint tool. It includes specifications for file name patterns,
  CSV formatting options, and extensive validation criteria for individual columns and their values,
  supporting a wide range of data validation rules from basic type checks to complex regex validations.
  This example serves as a comprehensive guide for creating robust CSV file validations.
filename_pattern: /demo(-\d+)?\.csv$/i

csv:
  header: true
  delimiter: ","
  quote_char: "\\"
  enclosure: "\""
  encoding: utf-8
  bom: false

columns:
  - name: "Column Name (header)"
    description: "Lorem ipsum"
    example: "Some example"
    rules:
      not_empty: true
      exact_value: Some string
      allow_values: [ y, n, "" ]
      not_allow_values: [ invalid ]
      regex: /^[\d]{2}$/
      length: 5
      length_not: 4
      length_min: 1
      length_max: 10
      is_trimmed: true
      is_lowercase: true
      is_uppercase: true
      is_capitalize: true
      word_count: 5
      word_count_not: 4
      word_count_min: 1
      word_count_max: 10
      contains: Hello
      contains_one: [ a, b ]
      contains_all: [ a, b, c ]
      starts_with: "prefix "
      ends_with: " suffix"
      num: 5
      num_not: 4.123
      num_min: 1.2e3
      num_max: -10.123
      is_int: true
      is_float: true
      precision: 5
      precision_not: 4
      precision_min: 1
      precision_max: 10
      date: 01 Jan 2000
      date_not: 2006-01-02 15:04:05 -0700 Europe/Rome
      date_min: +1 day
      date_max: now
      date_format: Y-m-d
      is_date: true
      is_bool: true
      is_ip4: true
      is_url: true
      is_email: true
      is_domain: true
      is_uuid: true
      is_alias: true
      is_currency_code: true
      is_base64: true
      is_json: true
      is_latitude: true
      is_longitude: true
      is_geohash: true
      is_cardinal_direction: true
      is_usa_market_name: true
      country_code: alpha-2
      language_code: alpha-2

    aggregate_rules:
      is_unique: true
      sum: 5.123
      sum_not: 4.123
      sum_min: 1.123
      sum_max: 10.123
      average: 5.123
      average_not: 4.123
      average_min: 1.123
      average_max: 10.123
      count: 5
      count_not: 4
      count_min: 1
      count_max: 10
      count_empty: 5
      count_empty_not: 4
      count_empty_min: 1
      count_empty_max: 10
      count_not_empty: 5
      count_not_empty_not: 4
      count_not_empty_min: 1
      count_not_empty_max: 10
      median: 5.123
      median_not: 4.123
      median_min: 1.123
      median_max: 10.123
      population_variance: 5.123
      population_variance_not: 4.123
      population_variance_min: 1.123
      population_variance_max: 10.123
      sample_variance: 5.123
      sample_variance_not: 4.123
      sample_variance_min: 1.123
      sample_variance_max: 10.123
      sd_sample: 5.123
      sd_sample_not: 4.123
      sd_sample_min: 1.123
      sd_sample_max: 10.123
      sd_population: 5.123
      sd_population_not: 4.123
      sd_population_min: 1.123
      sd_population_max: 10.123
      coef_of_var: 5.123
      coef_of_var_not: 4.123
      coef_of_var_min: 1.123
      coef_of_var_max: 10.123

  - name: "another_column"
    rules:
      not_empty: true

  - name: "third_column"
    rules:
      not_empty: true
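
`full_clean.yml` enables `is_unique` under `aggregate_rules`, and the README report above prints that rule as `ag:is_unique` with a message such as `Column has non-unique values. Unique: 9, total: 10`. The following is a hypothetical sketch of a check that produces a message in that format; the function name and signature are assumptions, not the library's implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical is_unique aggregate check (not the Csv-Blueprint source).
 * Returns null when every value is distinct, otherwise a report message.
 *
 * @param list<string> $columnValues all values of one CSV column
 */
function checkIsUnique(array $columnValues): ?string
{
    $total = count($columnValues);
    $unique = count(array_unique($columnValues)); // distinct values, compared as strings

    if ($unique === $total) {
        return null;
    }

    return "Column has non-unique values. Unique: {$unique}, total: {$total}";
}

echo checkIsUnique(['Berlin', 'Paris', 'Berlin']) ?? 'OK', PHP_EOL;
```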
2 changes: 1 addition & 1 deletion src/Schema.php
@@ -150,7 +150,7 @@ public function validate(): ErrorSuite

$errors = new ErrorSuite($this->filename);

- $metaErrors = Utils::compareArray($expectedMeta, $actualMeta->getArrayCopy(), 'meta');
+ $metaErrors = Utils::compareArray($expectedMeta, $actualMeta->getArrayCopy(), 'meta', '.');

// Validate meta info
foreach ($metaErrors as $metaError) {
10 changes: 8 additions & 2 deletions src/Utils.php
@@ -148,7 +148,13 @@ public static function compareArray(
$curPath = $path === '' ? (string)$key : "{$path}.{$key}";

if (!\array_key_exists($key, $expectedSchema)) {
- $differences[$columnId . '/' . $curPath] = [$columnId, "Unknown key: {$keyPrefix}.{$curPath}"];
+ if (\strlen($keyPrefix) <= 1) {
+     $message = "Unknown key: .{$curPath}";
+ } else {
+     $message = "Unknown key: .{$keyPrefix}.{$curPath}";
+ }
+
+ $differences[$columnId . '/' . $curPath] = [$columnId, $message];
continue;
}

@@ -159,7 +165,7 @@ public static function compareArray(
$differences[$columnId . '/' . $curPath] = [
$columnId,
"Expected type \"<c>{$expectedType}</c>\", actual \"<green>{$actualType}</green>\" in " .
- "{$keyPrefix}.{$curPath}",
+ ".{$keyPrefix}.{$curPath}",
];
} elseif (\is_array($value)) {
$differences += \array_merge(
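
Taken together, the `Schema.php` and `Utils.php` hunks change how unknown keys are reported: meta validation now passes `'.'` as an extra argument (presumably the `$keyPrefix`), and a prefix of length one or less is dropped from the message, so an unexpected top-level key `foo` is reported as `Unknown key: .foo`. The sketch below mirrors only that unknown-key branch and the dotted-path building shown in the diff. `findUnknownKeys()` and the `'columns'` prefix used in the test are hypothetical, and the real `compareArray()` also checks value types.

```php
<?php

declare(strict_types=1);

/**
 * Simplified, hypothetical version of the unknown-key branch of compareArray():
 * walks $actual recursively, builds dotted paths, and reports keys that are
 * missing from $expected.
 *
 * @return list<string>
 */
function findUnknownKeys(array $expected, array $actual, string $keyPrefix = '.', string $path = ''): array
{
    $messages = [];

    foreach ($actual as $key => $value) {
        $curPath = $path === '' ? (string)$key : "{$path}.{$key}";

        if (!array_key_exists($key, $expected)) {
            // Same rule as the new code: a prefix of length <= 1 is not repeated.
            $messages[] = strlen($keyPrefix) <= 1
                ? "Unknown key: .{$curPath}"
                : "Unknown key: .{$keyPrefix}.{$curPath}";
            continue;
        }

        if (is_array($value) && is_array($expected[$key])) {
            // Recurse into nested sections such as "csv".
            $messages = array_merge(
                $messages,
                findUnknownKeys($expected[$key], $value, $keyPrefix, $curPath),
            );
        }
    }

    return $messages;
}

$expectedMeta = ['name' => '', 'description' => '', 'csv' => ['header' => true, 'delimiter' => ',']];
$actualMeta = ['name' => 'Demo', 'title' => 'x', 'csv' => ['header' => true, 'quote' => '"']];

print_r(findUnknownKeys($expectedMeta, $actualMeta));
```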
2 changes: 1 addition & 1 deletion src/Validators/ErrorSuite.php
@@ -203,7 +203,7 @@ private static function getTableSize(): array
'column' => 30,
'rule' => 30,
'min' => 120,
- 'max' => 150,
+ 'max' => 170,
'reserve' => 3, // So that the table does not rest on the very edge of the terminal. Just in case.
];
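
The diff only raises the `max` entry of `getTableSize()` from 150 to 170; how `ErrorSuite` combines `min`, `max`, and `reserve` is not shown here. One plausible reading, stated as an assumption, is that the table width follows the terminal width minus `reserve`, clamped to the `[min, max]` range. Under that assumption, the change lets wide terminals show the longer messages of the new report with less wrapping. `resolveTableWidth()` is a hypothetical name.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: one way to turn the getTableSize() limits into a table
 * width. The clamping rule is an assumption, not taken from ErrorSuite.php.
 *
 * @param array{min: int, max: int, reserve: int} $size
 */
function resolveTableWidth(int $terminalWidth, array $size): int
{
    // Leave a small margin so the table does not touch the terminal edge.
    $available = $terminalWidth - $size['reserve'];

    // Never narrower than 'min', never wider than 'max'.
    return max($size['min'], min($size['max'], $available));
}

$size = ['min' => 120, 'max' => 170, 'reserve' => 3]; // values after this commit

foreach ([100, 160, 200] as $terminalWidth) {
    printf("terminal %d -> table %d\n", $terminalWidth, resolveTableWidth($terminalWidth, $size));
}
```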

