Add quick check option and optimize validation process (#17)
This commit introduces a new quick check option that terminates the CSV
validation process as soon as the first error is detected, leading to
faster checks but fewer error messages. Some modifications were made to
improve handling of file paths and file counts. The console output was
also tweaked to be more human-readable. Two new tests were added to
check the documentation against the actual functionality.
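
As a rough illustration of the quick check behaviour described above (not the code from this commit), the sketch below stops collecting errors after the first failing row when quick mode is on. The standalone functions `isQuickCheck` and `validateRows`, the sample rows, and the use of PHP's built-in `filter_var` in place of the project's own boolean helper are all assumptions made for this example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the option parsing: an empty value (bare --quick)
 * or a truthy string such as "yes" enables quick mode; "no" disables it.
 */
function isQuickCheck(string $value): bool
{
    // Assumption: filter_var() replaces the project's own bool() helper here.
    return $value === '' || filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

/**
 * Applies one rule to every row and collects error messages.
 * In quick mode the loop ends right after the first error.
 *
 * @param iterable<int, array<string, string>> $rows
 * @param callable(array<string, string>): ?string $rule returns an error message or null
 * @return list<string>
 */
function validateRows(iterable $rows, callable $rule, bool $quick): array
{
    $errors = [];

    foreach ($rows as $line => $row) {
        $message = $rule($row);
        if ($message === null) {
            continue;
        }

        $errors[] = "Line {$line}: {$message}";

        if ($quick) {
            break; // Quick mode: one message is enough to fail the check.
        }
    }

    return $errors;
}

// Illustrative data only: two of the three names are shorter than 5 characters.
$rows = [
    1 => ['name' => 'Lois'],
    2 => ['name' => 'Clark'],
    3 => ['name' => 'Bo'],
];

$minLength = static fn (array $row): ?string => strlen($row['name']) < 5
    ? "Value \"{$row['name']}\" is too short"
    : null;

echo count(validateRows($rows, $minLength, false)), "\n";               // full check: 2
echo count(validateRows($rows, $minLength, isQuickCheck('yes'))), "\n"; // quick check: 1
```

The trade-off matches the commit message: the quick run reports fewer messages, but it still reports at least one, so a non-zero exit code can be returned either way.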
SmetDenis committed Mar 13, 2024
1 parent afb7653 commit 70c6715
Showing 10 changed files with 444 additions and 129 deletions.
4 changes: 2 additions & 2 deletions Dockerfile
@@ -23,9 +23,9 @@ COPY . /app
RUN cd /app \
&& composer install --no-dev --optimize-autoloader --no-progress \
&& composer clear-cache
RUN chmod +x app/csv-blueprint
RUN chmod +x /app/csv-blueprint

# Color output by default
ENV TERM_PROGRAM=Hyper

ENTRYPOINT ["/app/csv-blueprint"]
ENTRYPOINT ["sh", "-c", "/app/csv-blueprint"]
49 changes: 36 additions & 13 deletions README.md
@@ -77,14 +77,28 @@ Also see demo in the [GitHub Actions](https://github.com/JBZoo/Csv-Blueprint/act
### As GitHub Action

```yml
- name: Validate CSV file
uses: jbzoo/csv-blueprint@master
with:
csv: tests/**/*.csv
schema: tests/schema.yml
# Optional. Default is "github". Available options: text, table, github, etc
report: table
- uses: jbzoo/csv-blueprint # See the specific version on releases page
with:
# Path(s) to validate. You can specify a path in which CSV files will be searched. Feel free to use glob patterns. Usage examples: /full/path/file.csv, p/file.csv, p/*.csv, p/**/*.csv, p/**/name-*.csv, **/*.csv, etc.
# Required: true
csv: ./tests/**/*.csv

# Schema filepath. It can be a YAML, JSON or PHP. See examples on GitHub.
# Required: true
schema: ./tests/schema.yml

# Report format. Available options: text, table, github, gitlab, teamcity, junit
# Default value: github
# You can skip it
report: github

# Quick mode. It will not validate all rows. It will stop after the first error.
# Default value: no
# You can skip it
quick: no

```

**Note**. Report format for GitHub Actions is `github` by default. See [GitHub Actions friendly](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-a-warning-message) and [PR as a live demo](https://github.com/JBZoo/Csv-Blueprint-Demo/pull/1/files).

This allows you to see bugs in the GitHub interface at the PR level.
@@ -171,6 +185,10 @@ Options:
It can be a YAML, JSON or PHP. See examples on GitHub.
-r, --report=REPORT Report output format. Available options:
text, table, github, gitlab, teamcity, junit [default: "table"]
-Q, --quick[=QUICK] Immediately terminate the check at the first error found.
Of course it will speed up the check, but you will get only 1 message out of many.
If any error is detected, the utility will return a non-zero exit code.
Empty value or "yes" will be treated as "true". [default: "no"]
--no-progress Disable progress bar animation for logs. It will be used only for text output format.
--mute-errors Mute any sort of errors. So exit code will be always "0" (if it's possible).
It has higher priority than --non-zero-on-error. Use it at your own risk!
@@ -211,8 +229,9 @@ Default report format is `table`:
Schema: ./tests/schemas/demo_invalid.yml
Found CSV files: 3
Invalid file: ./tests/fixtures/batch/demo-1.csv
(1/3) Invalid file: ./tests/fixtures/batch/demo-1.csv
+------+------------------+--------------+ demo-1.csv ------------------------------------------+
| Line | id:Column | Rule | Message |
+------+------------------+--------------+------------------------------------------------------+
@@ -221,7 +240,7 @@ Invalid file: ./tests/fixtures/batch/demo-1.csv
| | | | "green", "Blue"] |
+------+------------------+--------------+ demo-1.csv ------------------------------------------+
Invalid file: ./tests/fixtures/batch/demo-2.csv
(2/3) Invalid file: ./tests/fixtures/batch/demo-2.csv
+------+------------+------------+----- demo-2.csv ---------------------------------------+
| Line | id:Column | Rule | Message |
+------+------------+------------+--------------------------------------------------------+
@@ -235,7 +254,8 @@ Invalid file: ./tests/fixtures/batch/demo-2.csv
| 7 | 0:Name | min_length | Value "Lois" (length: 4) is too short. Min length is 5 |
+------+------------+------------+----- demo-2.csv ---------------------------------------+
OK: ./tests/fixtures/batch/sub/demo-3.csv
(3/3) OK: ./tests/fixtures/batch/sub/demo-3.csv
Found 7 issues in 2 out of 3 CSV files.
```
@@ -469,14 +489,17 @@ Batch processing
* [x] ~~CSV/Schema file discovery in the folder with regex filename pattern (like `glob(./**/dir/*.csv)`).~~
* [x] ~~If option `--csv` is a folder, then validate all files in the folder.~~
* [x] ~~Checking multiple CSV files in one schema.~~
* [ ] Quick stop flag. If the first error is found, then stop the validation process to save time.
* [ ] Using multiple schemas for one csv file.
* [x] ~~Quick stop flag. If the first error is found, then stop the validation process to save time.~~
* [ ] If option `--csv` is not specified, then the STDIN is used. To build a pipeline in Unix-like systems.
* [ ] Discovering CSV files by `filename_pattern` in the schema file. In case you have a lot of schemas and a lot of CSV files and want to automate the process as one command.

Validation
* [ ] Filename pattern validation with regex (like "all files in the folder should be in the format `/^[\d]{4}-[\d]{2}-[\d]{2}\.csv$/`").
* [ ] `filename_pattern` validation with regex (like "all files in the folder should be in the format `/^[\d]{4}-[\d]{2}-[\d]{2}\.csv$/`").
* [ ] Aggregate rules (like "at least one of the fields should be not empty" or "all values must be unique").
* [ ] Handle empty files and files with only a header row, or only with one line of data. One column without header is also possible.
* [ ] Using multiple schemas for one csv file.
* [ ] Inheritance of schemas, rules and columns. Define parent schema and override some rules in the child schemas. Make it DRY and easy to maintain.
* [ ] Validate syntax and options in the schema file. It's important to know if the schema file is valid and can be used for validation.
* [ ] If option `--schema` is not specified, then validate only super base level things (like "is it a CSV file?").
* [ ] Complex rules (like "if field `A` is not empty, then field `B` should be not empty too").
* [ ] Extending with custom rules and custom report formats. Plugins?
13 changes: 9 additions & 4 deletions action.yml
@@ -10,7 +10,7 @@
# @see https://github.com/JBZoo/Csv-Blueprint
#

name: 'CSV Validator by schemas'
name: 'CSV Blueprint - Validator by schemas'
description: 'Strict and flexible schema-based CSV file validation with the ability to report as GitHub Annotations in your PRs.'
author: 'Denis Smetannikov <admin@jbzoo.com>'

@@ -21,18 +21,21 @@ branding:
inputs:
csv:
description: >
Path(s) to validate. You can specify path in which CSV files will be searched
(max depth is 10).
Path(s) to validate. You can specify path in which CSV files will be searched.
Feel free to use glob patterns. Usage examples:
/full/path/file.csv, p/file.csv, p/*.csv, p/**/*.csv, p/**/name-*.csv, **/*.csv, etc.
required: true
schema:
description: 'Schema filepath. It can be a YAML, JSON or PHP. See examples on GitHub.'
required: true
report:
description: 'Report output format. Available options: text, table, github, gitlab, teamcity, junit'
description: 'Report format. Available options: text, table, github, gitlab, teamcity, junit'
default: github
required: true
quick:
description: 'Quick mode. It will not validate all rows. It will stop after the first error.'
default: no
required: true

runs:
using: 'docker'
@@ -45,5 +48,7 @@ runs:
- ${{ inputs.schema }}
- '--report'
- ${{ inputs.report }}
- '--quick'
- ${{ inputs.quick }}
- '--ansi'
- '-vvv'
67 changes: 50 additions & 17 deletions src/Commands/ValidateCsv.php
@@ -25,6 +25,8 @@
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Finder\SplFileInfo;

use function JBZoo\Utils\bool;

/**
* @psalm-suppress PropertyNotSetInConstructor
*/
@@ -67,6 +69,16 @@ protected function configure(): void
"Report output format. Available options:\n" .
'<info>' . \implode(', ', ErrorSuite::getAvaiableRenderFormats()) . '</info>',
ErrorSuite::RENDER_TABLE,
)
->addOption(
'quick',
'Q',
InputOption::VALUE_OPTIONAL,
"Immediately terminate the check at the first error found.\n" .
"Of course it will speed up the check, but you will get only 1 message out of many.\n" .
"If any error is detected, the utility will return a non-zero exit code.\n" .
'Empty value or "yes" will be treated as "true".',
'no',
);

parent::configure();
@@ -76,45 +88,61 @@ protected function executeAction(): int
{
$csvFilenames = $this->getCsvFilepaths();
$schemaFilename = $this->getSchemaFilepath();
$quickCheck = $this->isQuickCheck();
$totalFiles = \count($csvFilenames);

$this->_("Found CSV files: {$totalFiles}");
$this->_('');

$errorCounter = 0;
$invalidFiles = 0;
$totalFiles = \count($csvFilenames);
$errorSuite = null;

foreach ($csvFilenames as $index => $csvFilename) {
$prefix = '(' . ((int)$index + 1) . "/{$totalFiles})";

if ($quickCheck && $errorSuite !== null && $errorSuite->count() > 0) {
$this->_("{$prefix} <yellow>Skipped:</yellow> " . Utils::cutPath($csvFilename->getPathname()));
continue;
}

foreach ($csvFilenames as $csvFilename) {
$csvFile = new CsvFile($csvFilename->getPathname(), $schemaFilename);
$errorSuite = $csvFile->validate();
$errorSuite = $csvFile->validate($quickCheck);

if ($errorSuite->count() > 0) {
$invalidFiles++;
$errorCounter += $errorSuite->count();

if ($this->isTextMode()) {
$this->_('<red>Invalid file:</red> ' . Utils::cutPath($csvFilename->getPathname()), OutLvl::E);
if ($this->isHumanReadableMode()) {
$this->_(
"{$prefix} <red>Invalid file:</red> " . Utils::cutPath($csvFilename->getPathname()),
OutLvl::E,
);
}

$output = $errorSuite->render($this->getOptString('report'));
if ($output !== null) {
$this->_($output, $this->isTextMode() ? OutLvl::E : OutLvl::DEFAULT);
$this->_($output, $this->isHumanReadableMode() ? OutLvl::E : OutLvl::DEFAULT);
}
} elseif ($this->isTextMode()) {
$this->_('<green>OK:</green> ' . Utils::cutPath($csvFilename->getPathname()));
} elseif ($this->isHumanReadableMode()) {
$this->_("{$prefix} <green>OK:</green> " . Utils::cutPath($csvFilename->getPathname()));
}
}

if ($errorCounter > 0 && $this->isTextMode()) {
if ($errorCounter > 0 && $this->isHumanReadableMode()) {
if ($totalFiles === 1) {
$errMessage = "<c>Found {$errorCounter} issues in CSV file.</c>";
} else {
$errMessage = "<c>Found {$errorCounter} issues in {$invalidFiles} out of {$totalFiles} CSV files.</c>";
}

$this->_('');
$this->_($errMessage, OutLvl::E);

return self::FAILURE;
}

if ($this->isTextMode()) {
if ($this->isHumanReadableMode()) {
$this->_('<green>Looks good!</green>');
}

@@ -133,7 +161,7 @@ private function getCsvFilepaths(): array
throw new Exception('CSV file(s) not found in path(s): ' . \implode("\n, ", $rawInput));
}

return $scvFilenames;
return \array_values($scvFilenames);
}

private function getSchemaFilepath(): string
@@ -144,23 +172,28 @@ private function getSchemaFilepath(): string
throw new Exception("Schema file not found: {$schemaFilename}");
}

if ($this->isTextMode()) {
if ($this->isHumanReadableMode()) {
$this->_('<blue>Schema:</blue> ' . Utils::cutPath($schemaFilename));
}

return $schemaFilename;
}

private function isTextMode(): bool
private function isHumanReadableMode(): bool
{
return $this->getReportType() === ErrorSuite::REPORT_TEXT
|| $this->getReportType() === ErrorSuite::REPORT_GITHUB
|| $this->getReportType() === ErrorSuite::REPORT_TEAMCITY
|| $this->getReportType() === ErrorSuite::RENDER_TABLE;
return $this->getReportType() !== ErrorSuite::REPORT_GITLAB
&& $this->getReportType() !== ErrorSuite::REPORT_JUNIT;
}

private function getReportType(): string
{
return $this->getOptString('report', ErrorSuite::RENDER_TABLE, ErrorSuite::getAvaiableRenderFormats());
}

private function isQuickCheck(): bool
{
$value = $this->getOptString('quick');

return $value === '' || bool($value);
}
}
9 changes: 7 additions & 2 deletions src/Csv/CsvFile.php
@@ -21,13 +21,15 @@
use JBZoo\CsvBlueprint\Validators\ErrorSuite;
use League\Csv\Reader as LeagueReader;
use League\Csv\Statement;
use League\Csv\TabularDataReader;

final class CsvFile
{
private string $csvFilename;
private ParseConfig $structure;
private LeagueReader $reader;
private Schema $schema;
private bool $isEmpty;

public function __construct(string $csvFilename, null|array|string $csvSchemaFilenameOrArray = null)
{
@@ -36,6 +38,7 @@ public function __construct(string $csvFilename, null|array|string $csvSchemaFil
}

$this->csvFilename = $csvFilename;
$this->isEmpty = \filesize($this->csvFilename) <= 1;
$this->schema = new Schema($csvSchemaFilenameOrArray);
$this->structure = $this->schema->getCsvStructure();
$this->reader = $this->prepareReader();
@@ -56,7 +59,9 @@ public function getCsvStructure(): ParseConfig
*/
public function getHeader(): array
{
if ($this->structure->isHeader()) {
if ($this->structure->isHeader() && !$this->isEmpty) {
// TODO: add handler for empty file
// League\Csv\SyntaxError : The header record does not exist or is empty at offset: `0
return $this->reader->getHeader();
}

@@ -68,7 +73,7 @@ public function getRecords(): \Iterator
return $this->reader->getRecords($this->getHeader());
}

public function getRecordsChunk(int $offset = 0, int $limit = -1): \League\Csv\TabularDataReader
public function getRecordsChunk(int $offset = 0, int $limit = -1): TabularDataReader
{
return Statement::create(null, $offset, $limit)->process($this->reader, $this->getHeader());
}
2 changes: 1 addition & 1 deletion src/Validators/ErrorSuite.php
@@ -160,7 +160,7 @@ private function renderTable(): string

$table->render();

return $buffer->fetch();
return \trim($buffer->fetch()) . "\n";
}

private function prepareSourceSuite(): SourceSuite