Improve CSV schema validation with filename patterns #22

Merged 4 commits on Mar 13, 2024
9 changes: 3 additions & 6 deletions .github/workflows/demo.yml
@@ -105,13 +105,12 @@ jobs:

       - name: 👎 Invalid CSV file
         run: |
-          docker run \
+          ! docker run \
             -v `pwd`:/parent-host \
             --rm jbzoo/csv-blueprint \
             validate:csv \
             --csv=/parent-host/tests/fixtures/batch/*.csv \
             --schema=/parent-host/tests/schemas/demo_invalid.yml
-        continue-on-error: true


   phar:
@@ -138,11 +137,10 @@ jobs:

       - name: 👎 Invalid CSV file
         run: |
-          ./build/csv-blueprint.phar \
+          ! ./build/csv-blueprint.phar \
             validate:csv \
             --csv=./tests/fixtures/batch/*.csv \
             --schema=./tests/schemas/demo_invalid.yml
-        continue-on-error: true


   php:
@@ -169,8 +167,7 @@ jobs:

       - name: 👎 Invalid CSV file
         run: |
-          ./csv-blueprint \
+          ! ./csv-blueprint \
             validate:csv \
             --csv=./tests/fixtures/batch/*.csv \
             --schema=./tests/schemas/demo_invalid.yml
-        continue-on-error: true
34 changes: 28 additions & 6 deletions README.md
@@ -254,9 +254,16 @@ Found CSV files: 3
 | 7 | 0:Name | min_length | Value "Lois" (length: 4) is too short. Min length is 5 |
 +------+------------+------------+----- demo-2.csv ---------------------------------------+

-(3/3) OK: ./tests/fixtures/batch/sub/demo-3.csv
+(3/3) Invalid file: ./tests/fixtures/batch/sub/demo-3.csv
++------+-----------+------------------+---- demo-3.csv -------------------------------------------+
+| Line | id:Column | Rule | Message |
++------+-----------+------------------+-----------------------------------------------------------+
+| 0 | | filename_pattern | Filename "./tests/fixtures/batch/sub/demo-3.csv" does not |
+| | | | match pattern: "/demo-[12].csv$/i" |
++------+-----------+------------------+---- demo-3.csv -------------------------------------------+

-Found 7 issues in 2 out of 3 CSV files.
+
+Found 8 issues in 3 out of 3 CSV files.

 ```

@@ -307,6 +314,11 @@ This gives you great flexibility when validating CSV files.
 ```yml
 # It's a full example of the CSV schema file in YAML format.

+# Regular expression to match the file name. If not set, then no pattern check
+# This way you can validate the file name before the validation process.
+# Feel free to check parent directories as well.
+filename_pattern: /demo(-\d+)?\.csv$/i
+
 csv: # Here are default values. You can skip this section if you don't need to override the default values
   header: true # If the first row is a header. If true, name of each column is required
   delimiter: , # Delimiter character in CSV file
@@ -362,6 +374,8 @@ columns:
       cardinal_direction: true # Valid cardinal direction. Examples: "N", "S", "NE", "SE", "none", ""
       usa_market_name: true # Check if the value is a valid USA market name. Example: "New York, NY"
+
+  - name: "another_column"

```
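
As a rough illustration of how the `filename_pattern` from the schema above behaves, the sketch below runs the same regular expression against a few file paths with PHP's `preg_match`. The helper name `matchesFilenamePattern` and the sample paths are made up for this example; only the first pattern comes from the schema. Because the pattern is tested against the whole path, it can also constrain parent directories.

```php
<?php

declare(strict_types=1);

// Hypothetical helper for illustration; not part of the library's API.
function matchesFilenamePattern(string $pattern, string $path): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return \preg_match($pattern, $path) === 1;
}

// Pattern copied from the schema example above.
$pattern = '/demo(-\d+)?\.csv$/i';

// Sample paths are assumptions made up for this sketch.
$paths = [
    './tests/fixtures/batch/demo-1.csv',
    './tests/fixtures/batch/sub/DEMO-3.CSV',
    './tests/fixtures/demo.csv',
    './tests/fixtures/demo-final.csv',
    './tests/fixtures/demo-1.csv.bak',
];

foreach ($paths as $path) {
    echo $path, ' => ', matchesFilenamePattern($pattern, $path) ? 'match' : 'no match', \PHP_EOL;
}

// A pattern may also cover a parent directory, since the whole path is tested.
\var_dump(matchesFilenamePattern('/batch\/sub\/demo-\d+\.csv$/', './tests/fixtures/batch/sub/demo-3.csv'));
```

The first three paths match: the `i` modifier makes the check case-insensitive and the numeric suffix is optional. The last two do not, because `-final` is not a numeric suffix and `.bak` follows the `.csv` that the `$` anchor expects at the end.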


@@ -370,15 +384,16 @@ columns:

 ```json
 {
-    "csv" : {
+    "filename_pattern" : "/demo(-\\d+)?\\.csv$/i",
+    "csv" : {
         "header" : true,
         "delimiter" : ",",
         "quote_char" : "\\",
         "enclosure" : "\"",
         "encoding" : "utf-8",
         "bom" : false
     },
-    "columns" : [
+    "columns" : [
         {
             "name" : "csv_header_name",
             "description" : "Lorem ipsum",
@@ -412,7 +427,8 @@ columns:
                 "cardinal_direction" : true,
                 "usa_market_name" : true
             }
-        }
+        },
+        {"name" : "another_column"}
     ]
 }

@@ -422,6 +438,7 @@ columns:



+
 <details>
 <summary>Click to see: PHP Format</summary>

@@ -430,6 +447,8 @@ columns:
 declare(strict_types=1);

 return [
+    'filename_pattern' => '/demo(-\\d+)?\\.csv$/i',
+
     'csv' => [
         'header' => true,
         'delimiter' => ',',
@@ -438,6 +457,7 @@ return [
         'encoding' => 'utf-8',
         'bom' => false,
     ],
+
     'columns' => [
         [
             'name' => 'csv_header_name',
@@ -473,6 +493,7 @@ return [
                 'usa_market_name' => true,
             ],
         ],
+        ['name' => 'another_column'],
     ],
 ];

@@ -481,6 +502,7 @@ return [
 </details>


+
 ## Coming soon

 It's random ideas and plans. No orderings and deadlines. <u>But batch processing is the priority #1</u>.
@@ -494,7 +516,7 @@ Batch processing
 * [ ] Discovering CSV files by `filename_pattern` in the schema file. In case you have a lot of schemas and a lot of CSV files and want to automate the process as one command.

 Validation
-* [ ] `filename_pattern` validation with regex (like "all files in the folder should be in the format `/^[\d]{4}-[\d]{2}-[\d]{2}\.csv$/`").
+* [x] ~~`filename_pattern` validation with regex (like "all files in the folder should be in the format `/^[\d]{4}-[\d]{2}-[\d]{2}\.csv$/`").~~
 * [ ] Aggregate rules (like "at least one of the fields should be not empty" or "all values must be unique").
 * [ ] Handle empty files and files with only a header row, or only with one line of data. One column without header is also possible.
 * [ ] Using multiple schemas for one csv file.
8 changes: 5 additions & 3 deletions schema-examples/full.json
@@ -1,13 +1,14 @@
 {
-    "csv" : {
+    "filename_pattern" : "/demo(-\\d+)?\\.csv$/i",
+    "csv" : {
         "header" : true,
         "delimiter" : ",",
         "quote_char" : "\\",
         "enclosure" : "\"",
         "encoding" : "utf-8",
         "bom" : false
     },
-    "columns" : [
+    "columns" : [
         {
             "name" : "csv_header_name",
             "description" : "Lorem ipsum",
@@ -41,6 +42,7 @@
                 "cardinal_direction" : true,
                 "usa_market_name" : true
             }
-        }
+        },
+        {"name" : "another_column"}
     ]
 }
4 changes: 4 additions & 0 deletions schema-examples/full.php
@@ -15,6 +15,8 @@
 declare(strict_types=1);

 return [
+    'filename_pattern' => '/demo(-\\d+)?\\.csv$/i',
+
     'csv' => [
         'header' => true,
         'delimiter' => ',',
@@ -23,6 +25,7 @@
         'encoding' => 'utf-8',
         'bom' => false,
     ],
+
     'columns' => [
         [
             'name' => 'csv_header_name',
@@ -58,5 +61,6 @@
                 'usa_market_name' => true,
             ],
         ],
+        ['name' => 'another_column'],
     ],
 ];
7 changes: 7 additions & 0 deletions schema-examples/full.yml
@@ -12,6 +12,11 @@

 # It's a full example of the CSV schema file in YAML format.

+# Regular expression to match the file name. If not set, then no pattern check
+# This way you can validate the file name before the validation process.
+# Feel free to check parent directories as well.
+filename_pattern: /demo(-\d+)?\.csv$/i
+
 csv: # Here are default values. You can skip this section if you don't need to override the default values
   header: true # If the first row is a header. If true, name of each column is required
   delimiter: , # Delimiter character in CSV file
@@ -66,3 +71,5 @@ columns:
       is_longitude: true # Can be integer or float. Example: -89.123456
       cardinal_direction: true # Valid cardinal direction. Examples: "N", "S", "NE", "SE", "none", ""
       usa_market_name: true # Check if the value is a valid USA market name. Example: "New York, NY"
+
+  - name: "another_column"
39 changes: 37 additions & 2 deletions src/Csv/CsvFile.php
@@ -17,6 +17,7 @@
 namespace JBZoo\CsvBlueprint\Csv;

 use JBZoo\CsvBlueprint\Schema;
+use JBZoo\CsvBlueprint\Utils;
 use JBZoo\CsvBlueprint\Validators\Error;
 use JBZoo\CsvBlueprint\Validators\ErrorSuite;
 use League\Csv\Reader as LeagueReader;
@@ -82,7 +83,9 @@ public function validate(bool $quickStop = false): ErrorSuite
     {
         $errors = new ErrorSuite($this->getCsvFilename());

-        $errors->addErrorSuit($this->validateHeader())
+        $errors
+            ->addErrorSuit($this->validateFile($quickStop))
+            ->addErrorSuit($this->validateHeader($quickStop))
             ->addErrorSuit($this->validateEachCell($quickStop))
             ->addErrorSuit(self::validateAggregateRules($quickStop));

@@ -106,7 +109,7 @@ private function prepareReader(): LeagueReader
         return $reader;
     }

-    private function validateHeader(): ErrorSuite
+    private function validateHeader(bool $quickStop = false): ErrorSuite
     {
         $errors = new ErrorSuite();

@@ -125,6 +128,10 @@ private function validateHeader(): ErrorSuite

                 $errors->addError($error);
             }
+
+            if ($quickStop && $errors->count() > 0) {
+                return $errors;
+            }
         }

         return $errors;
@@ -152,6 +159,34 @@ private function validateEachCell(bool $quickStop = false): ErrorSuite
         return $errors;
     }

+    private function validateFile(bool $quickStop = false): ErrorSuite
+    {
+        $errors = new ErrorSuite();
+
+        $filenamePattern = $this->schema->getFilenamePattern();
+        if (
+            $filenamePattern !== null
+            && $filenamePattern !== ''
+            && \preg_match($filenamePattern, $this->csvFilename) === 0
+        ) {
+            $error = new Error(
+                'filename_pattern',
+                'Filename "<c>' . Utils::cutPath($this->csvFilename) .
+                "</c>\" does not match pattern: \"<c>{$filenamePattern}</c>\"",
+                '',
+                0,
+            );
+
+            $errors->addError($error);
+
+            if ($quickStop && $errors->count() > 0) {
+                return $errors;
+            }
+        }
+
+        return $errors;
+    }
+
     private static function validateAggregateRules(bool $quickStop = false): ErrorSuite
     {
         $errors = new ErrorSuite();
4 changes: 2 additions & 2 deletions src/Schema.php
@@ -114,9 +114,9 @@ public function getColumn(int|string $columNameOrId): ?Column
         return $column;
     }

-    public function getFinenamePattern(): ?string
+    public function getFilenamePattern(): ?string
     {
-        return $this->data->getStringNull('finename_pattern');
+        return Utils::prepareRegex($this->data->getStringNull('filename_pattern'));
     }

     public function getIncludes(): array
2 changes: 1 addition & 1 deletion src/Utils.php
@@ -47,7 +47,7 @@ public static function prepareRegex(?string $pattern, string $addDelimiter = '/'
             }
         }

-        return $addDelimiter . $pattern . $addDelimiter . 'u';
+        return $addDelimiter . $pattern . $addDelimiter;
     }

     /**
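
The hunk above stops appending the `u` modifier to patterns that arrive without delimiters. As a reminder of what that modifier changes in PCRE, the sketch below runs the same pattern with and without it. The sample string is an assumption chosen for illustration and does not come from the project's tests.

```php
<?php

declare(strict_types=1);

// "é" encoded as UTF-8 occupies two bytes (0xC3 0xA9).
$value = "\u{00E9}";

// Without "u", PCRE works byte by byte, so a single "." cannot cover both bytes.
$withoutUnicode = \preg_match('/^.$/', $value);

// With "u", pattern and subject are treated as UTF-8, so "." matches one code point.
$withUnicode = \preg_match('/^.$/u', $value);

\var_dump($withoutUnicode, $withUnicode);
```

A schema that relies on Unicode-aware matching would presumably need to spell the modifier out itself, for example `/^.$/u`.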
2 changes: 1 addition & 1 deletion tests/Blueprint/MiscTest.php
@@ -47,7 +47,7 @@ public function testPrepareRegex(): void
     {
         isSame(null, Utils::prepareRegex(null));
         isSame(null, Utils::prepareRegex(''));
-        isSame('/.*/u', Utils::prepareRegex('.*'));
+        isSame('/.*/', Utils::prepareRegex('.*'));
         isSame('#.*#u', Utils::prepareRegex('#.*#u'));
         isSame('/.*/', Utils::prepareRegex('/.*/'));
         isSame('/.*/ius', Utils::prepareRegex('/.*/ius'));
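
The assertions above pin down the expected behaviour of `Utils::prepareRegex`: empty input yields `null`, a bare pattern is wrapped in `/` delimiters, and an already-delimited pattern (with or without modifiers) is returned unchanged. A minimal sketch that satisfies those cases might look like the following. It is not the library's implementation, which this diff shows only in part; the function name `wrapRegex` and the list of recognised delimiters are assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for illustration only; not the library's code.
 * Wraps a bare pattern in delimiters and leaves a delimited one untouched.
 */
function wrapRegex(?string $pattern, string $delimiter = '/'): ?string
{
    if ($pattern === null || $pattern === '') {
        return null;
    }

    // Heuristic: the same delimiter at both ends, optionally followed by
    // modifier letters, means the pattern is already complete.
    foreach (['/', '#', '~', '@', '%'] as $candidate) {
        $quoted = \preg_quote($candidate, '/');
        if (\preg_match('/^' . $quoted . '.*' . $quoted . '[a-zA-Z]*$/s', $pattern) === 1) {
            return $pattern;
        }
    }

    // No modifier is appended, mirroring the behaviour the tests expect.
    return $delimiter . $pattern . $delimiter;
}

\var_dump(wrapRegex('.*'), wrapRegex('#.*#u'), wrapRegex('/.*/ius'));
```

The delimiter check is deliberately simple: a bare pattern that happens to start with `/` and end with `/` plus letters would be mistaken for a delimited one, so a real implementation may need stricter rules.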
2 changes: 1 addition & 1 deletion tests/Blueprint/RulesTest.php
@@ -591,7 +591,7 @@ public function testRegex(): void
         isSame(null, $rule->validate('aaa'));
         isSame(null, $rule->validate('a'));
         isSame(
-            '"regex" at line 0, column "prop". Value "1bc" does not match the pattern "/^a/u".',
+            '"regex" at line 0, column "prop". Value "1bc" does not match the pattern "/^a/".',
             \strip_tags((string)$rule->validate('1bc')),
         );
     }
4 changes: 2 additions & 2 deletions tests/Blueprint/SchemaTest.php
@@ -44,10 +44,10 @@ public function testFilename(): void
     public function testGetFinenamePattern(): void
     {
         $schemaEmpty = new Schema(self::SCHEMA_EXAMPLE_EMPTY);
-        isSame(null, $schemaEmpty->getFinenamePattern());
+        isSame(null, $schemaEmpty->getFilenamePattern());

         $schemaFull = new Schema(self::SCHEMA_EXAMPLE_FULL);
-        isSame('^example\.csv$', $schemaFull->getFinenamePattern());
+        isSame('/^example\.csv$/', $schemaFull->getFilenamePattern());
     }

     public function testScvStruture(): void