Merge ffaccb5 into 78ad613
SmetDenis committed Mar 11, 2024
2 parents 78ad613 + ffaccb5 commit fd3c2ed
Showing 9 changed files with 270 additions and 42 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/demo.yml
@@ -98,6 +98,12 @@ jobs:
output: table
continue-on-error: true

- name: 👎 Invalid CSV file - Report as Table
uses: jbzoo/csv-blueprint@master
with:
csv: tests/fixtures/demo-test.csv
schema: tests/schemas/demo_invalid.yml


docker:
name: Docker Hub
227 changes: 211 additions & 16 deletions README.md
@@ -4,34 +4,201 @@
[![Stable Version](https://poser.pugx.org/jbzoo/csv-blueprint/version)](https://packagist.org/packages/jbzoo/csv-blueprint/) [![Total Downloads](https://poser.pugx.org/jbzoo/csv-blueprint/downloads)](https://packagist.org/packages/jbzoo/csv-blueprint/stats) [![Docker Pulls](https://img.shields.io/docker/pulls/jbzoo/csv-blueprint.svg)](https://hub.docker.com/r/jbzoo/csv-blueprint) [![Dependents](https://poser.pugx.org/jbzoo/csv-blueprint/dependents)](https://packagist.org/packages/jbzoo/csv-blueprint/dependents?order_by=downloads) [![GitHub License](https://img.shields.io/github/license/jbzoo/csv-blueprint)](https://github.com/JBZoo/Csv-Blueprint/blob/master/LICENSE)


* [Introduction](#introduction)
* [Features](#features)
* [Usage](#usage)
* [As GitHub Action](#as-github-action)
* [As Docker container](#as-docker-container)
* [As PHP binary](#as-php-binary)
* [As PHP project](#as-php-project)
* [Schema Definition](#schema-definition)
* [Schema file examples](#schema-file-examples)
* [Coming soon](#coming-soon)
* [Unit tests and check code style](#unit-tests-and-check-code-style)
* [License](#license)
* [See Also](#see-also)


### Installing
## Introduction
The JBZoo/Csv-Blueprint tool is a powerful and flexible utility designed for validating CSV files against
a predefined schema specified in YAML format. With the capability to run both locally and in Docker environments,
JBZoo/Csv-Blueprint is an ideal choice for integrating into CI/CD pipelines, such as GitHub Actions,
to ensure the integrity of CSV data in your projects.


## Features
* **Schema-based Validation**: Define the structure and rules for your CSV files in an intuitive [YAML format](schema-examples/full.yml), enabling precise validation against your data's expected format.
* **Flexible Configuration**: Support for custom delimiters, quote characters, enclosures, and encoding settings to handle a wide range of CSV formats.
* **Comprehensive Rule Set**: Includes a broad set of validation rules, such as non-empty fields, exact values, regular expressions, numeric constraints, date formats, and more, catering to various data validation needs.
* **Docker Support**: Easily integrate into any workflow with Docker, providing a seamless experience for development, testing, and production environments.
* **GitHub Actions Integration**: Automate CSV validation in your CI/CD pipeline, enhancing the quality control of your data in pull requests and deployments.
* **Various ways to report** issues that can be easily integrated with GitHub, GitLab, TeamCity, etc. The default output is a human-readable table. [See Live Demo](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml).



## Usage

See also the live demo in [GitHub Actions](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml).

### As GitHub Action

```yml
- name: Validate CSV file
  uses: jbzoo/csv-blueprint@master
  with:
    csv: tests/fixtures/demo.csv
    schema: tests/schemas/demo_invalid.yml
    output: table
```
**Note**. The default output format for GitHub Actions is `github`, which is [GitHub Actions friendly](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-a-warning-message).
This lets you see errors in the GitHub interface at the PR level: each error is shown at the specific place in the CSV file.
See screenshot.


### As Docker container
Ensure you have Docker installed on your machine.

```sh
composer require jbzoo/csv-blueprint
# Pull the Docker image
docker pull jbzoo/csv-blueprint

# Run the tool inside Docker
docker run --rm \
--workdir=/parent-host \
-v `pwd`:/parent-host \
jbzoo/csv-blueprint \
validate:csv \
--csv=./tests/fixtures/demo.csv \
--schema=./tests/schemas/demo_invalid.yml
```
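If you run the container often, the invocation above can be wrapped in a small shell function. This is only a sketch: the function name `csv_blueprint_docker` and the `DRY_RUN` variable are hypothetical helpers invented for this example, not part of the tool. The Docker flags and the `validate:csv` arguments are the ones shown above.

```sh
# Hypothetical wrapper around the docker invocation shown above.
# Usage: csv_blueprint_docker <csv-path> <schema-path>
# Paths are resolved relative to the current directory, which is mounted
# into the container as /parent-host.
# Set DRY_RUN=1 (an assumption of this sketch) to print the command
# instead of executing it.
csv_blueprint_docker() {
    csv_path="$1"
    schema_path="$2"

    # Rebuild the positional parameters as the full docker command line.
    set -- docker run --rm \
        --workdir=/parent-host \
        -v "$(pwd):/parent-host" \
        jbzoo/csv-blueprint \
        validate:csv \
        --csv="$csv_path" \
        --schema="$schema_path"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `csv_blueprint_docker ./tests/fixtures/demo.csv ./tests/schemas/demo_invalid.yml` runs the same validation as the command above.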


### Usage
### As PHP binary
Ensure you have PHP installed on your machine.

As Docker container:
```sh
wget https://github.com/JBZoo/CI-Report-Converter/releases/latest/download/csv-blueprint.phar
chmod +x ./csv-blueprint.phar
./csv-blueprint.phar --csv=./tests/fixtures/demo.csv --schema=./tests/schemas/demo_invalid.yml
```

### As PHP project
Ensure you have PHP installed on your machine.
Then, you can use the following commands to build from source and run the tool.

```sh
@docker run --rm \
-v `pwd`:/parent-host \
jbzoo/csv-blueprint \
validate:csv \
--csv=/parent-host/tests/fixtures/demo.csv \
--schema=/parent-host/tests/schemas/demo_invalid.yml \
--ansi
git clone git@github.com:jbzoo/csv-blueprint.git csv-blueprint
cd csv-blueprint
make build
./csv-blueprint validate:csv --csv=./tests/fixtures/demo.csv --schema=./tests/schemas/demo_invalid.yml
```

### Help
```
Description:
Validate CSV file by rule
Usage:
validate:csv [options]
Options:
-c, --csv=CSV CSV filepath to validate. If not set or empty, then the STDIN is used.
-s, --schema=SCHEMA Schema rule filepath
-o, --output=OUTPUT Report output format. Available options: text, table, github, gitlab, teamcity, junit [default: "table"]
--no-progress Disable progress bar animation for logs. It will be used only for text output format.
--mute-errors Mute any sort of errors. So exit code will be always "0" (if it's possible).
It has major priority then --non-zero-on-error. It's on your own risk!
--stdout-only For any errors messages application will use StdOut instead of StdErr. It's on your own risk!
--non-zero-on-error None-zero exit code on any StdErr message.
--timestamp Show timestamp at the beginning of each message.It will be used only for text output format.
--profile Display timing and memory usage information.
--output-mode=OUTPUT-MODE Output format. Available options:
text - Default text output format, userfriendly and easy to read.
cron - Shortcut for crontab. It's basically focused on human-readable logs output.
It's combination of --timestamp --profile --stdout-only --no-progress -vv.
logstash - Logstash output format, for integration with ELK stack.
[default: "text"]
--cron Alias for --output-mode=cron. Deprecated!
-h, --help Display help for the given command. When no command is given display help for the list command
-q, --quiet Do not output any message
-V, --version Display this application version
--ansi|--no-ansi Force (or disable --no-ansi) ANSI output
-n, --no-interaction Do not ask any interactive question
-v|vv|vvv, --verbose Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
```

### Output Example

As a result of the validation process, you will receive a human-readable table with a list of errors found in the CSV file.
By default, the output format is a table, but you can choose from a variety of formats, such as text, GitHub, GitLab, TeamCity, JUnit, and more.
For example, the following output is generated using the "table" format.

**Note**. The default output format for GitHub Actions is `github`, which is [GitHub Actions friendly](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-a-warning-message).
This lets you see errors in the GitHub interface at the PR level: each error is shown at the specific place in the CSV file. See screenshot.

```
CSV : ./tests/fixtures/demo.csv
Schema : ./tests/schemas/demo_invalid.yml
+------+------------------+--------------+-- demo.csv -------------------------------------------+
| Line | id:Column | Rule | Message |
+------+------------------+--------------+-------------------------------------------------------+
| 1 | 1: | csv.header | Property "name" is not defined in schema: |
| | | | "./tests/schemas/demo_invalid.yml" |
| 5 | 2:Float | max | Value "74605.944" is greater than "74605" |
| 5 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", |
| | | | "green", "Blue"] |
| 6 | 0:Name | min_length | Value "Carl" (legth: 4) is too short. Min length is 5 |
| 6 | 3:Birthday | min_date | Value "1955-05-14" is less than the minimum date |
| | | | "1955-05-15T00:00:00.000+00:00" |
| 8 | 3:Birthday | min_date | Value "1955-05-14" is less than the minimum date |
| | | | "1955-05-15T00:00:00.000+00:00" |
| 9 | 3:Birthday | max_date | Value "2010-07-20" is more than the maximum date |
| | | | "2009-01-01T00:00:00.000+00:00" |
| 11 | 0:Name | min_length | Value "Lois" (legth: 4) is too short. Min length is 5 |
| 11 | 4:Favorite color | allow_values | Value "blue" is not allowed. Allowed values: ["red", |
| | | | "green", "Blue"] |
+------+------------------+--------------+-- demo.csv -------------------------------------------+
CSV file is not valid! Found 9 errors.
```
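When the same script runs both locally and in CI, the report format can be chosen from the environment. The sketch below is one possible approach, not something the tool does for you: `detect_output_format` is a hypothetical helper, and it assumes the usual CI markers (`GITHUB_ACTIONS=true`, `GITLAB_CI=true`, and a non-empty `TEAMCITY_VERSION`). The format names are the ones listed for `--output` in the help text.

```sh
# Hypothetical helper: pick a report format for --output based on the CI
# environment. Falls back to "table", the documented default.
detect_output_format() {
    if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
        echo "github"
    elif [ "${GITLAB_CI:-}" = "true" ]; then
        echo "gitlab"
    elif [ -n "${TEAMCITY_VERSION:-}" ]; then
        echo "teamcity"
    else
        echo "table"
    fi
}

# Example (assumes ./csv-blueprint was built as described in "As PHP project"):
# ./csv-blueprint validate:csv \
#     --csv=./tests/fixtures/demo.csv \
#     --schema=./tests/schemas/demo_invalid.yml \
#     --output="$(detect_output_format)"
```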


### Schema Definition
Define your CSV validation schema in a YAML file.

This example defines a simple schema for a CSV file with a header row, specifying that the `id` column must not be empty and must contain integer values.
Also, it checks that the `name` column has a minimum length of 3 characters.

```yml
csv:
  delimiter: ,
  quote_char: \
  enclosure: "\""

columns:
  - name: id
    rules:
      not_empty: true
      is_int: true

  - name: name
    rules:
      min_length: 3

```
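To try this schema quickly, the following sketch writes it to a scratch directory next to a small CSV file. The file names and the CSV rows are made up for illustration: the row with `Bo` is intended to violate `min_length: 3`, and the row with an empty `id` is intended to violate `not_empty`. The exact report the tool prints for them is not shown here.

```sh
# Write the example schema and a small CSV into a scratch directory.
# File names and CSV rows are hypothetical, chosen only for this example.
work_dir="$(mktemp -d)"

# The quoted 'EOF' delimiter keeps the backslashes in the schema literal.
cat > "$work_dir/schema.yml" <<'EOF'
csv:
  delimiter: ,
  quote_char: \
  enclosure: "\""

columns:
  - name: id
    rules:
      not_empty: true
      is_int: true

  - name: name
    rules:
      min_length: 3
EOF

cat > "$work_dir/users.csv" <<'EOF'
id,name
1,Alice
2,Bo
,Carl
EOF

echo "Schema: $work_dir/schema.yml"
echo "CSV:    $work_dir/users.csv"

# To validate (assumes ./csv-blueprint was built as described in "As PHP project"):
# ./csv-blueprint validate:csv --csv="$work_dir/users.csv" --schema="$work_dir/schema.yml"
```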

### Schema file examples

<details>
<summary>Click to see: YAML format (with comment)</summary>
In the [example Yml file](schema-examples/full.yml) you can find a detailed description of all features.

**Important notes**
* I have deliberately avoided typed columns and replaced them with rules,
which can be combined in any order, entirely at your discretion.
This gives you great flexibility when validating CSV files.

* All fields are optional unless explicitly stated otherwise, so you can leave them undeclared.

```yml
# It's a full example of the CSV schema file in YAML format.
@@ -93,8 +260,6 @@ columns:

```

</details>


<details>
<summary>Click to see: JSON Format</summary>
@@ -212,13 +377,43 @@ return [
</details>


## Coming soon

* [ ] Filename pattern validation with regex (like "all files in the folder should be in the format 'YYYY-MM-DD.csv'").
* [ ] CSV/Schema file discovery in the folder with regex pattern (glob).
* [ ] Aggregate rules (like "at least one of the fields should be not empty" or "all fields should be unique").
* [ ] Create CSV files based on the schema (like "create 1000 rows with random data").
* [ ] Multiple CSV files in one schema.
* [ ] Multiple schemas in one validation process.
* [ ] Parallel validation of really-really large files (1GB+ ?). I know you have them and less memory is better.
* [ ] Parallel validation of multiple files.
* [ ] Inheritance of schemas, rules and columns. Define parent schema and override some rules in the child schemas.
* [ ] More output formats (like JSON, XML, etc).
* [ ] Complex rules (like "if field A is not empty, then field B should be not empty too").
* [ ] Input encoding detection + BOM (right now it's experimental).
* [ ] Extending with custom rules and custom output formats.
* [ ] More examples and documentation.


## Unit tests and check code style
```sh
make update
make build
make test-all
```


### License

MIT


## See Also

- [CI-Report-Converter](https://github.com/JBZoo/CI-Report-Converter) - It converts different error reporting standards for deep compatibility with popular CI systems.
- [Composer-Diff](https://github.com/JBZoo/Composer-Diff) - See what packages have changed after `composer update`.
- [Composer-Graph](https://github.com/JBZoo/Composer-Graph) - Dependency graph visualization of composer.json based on mermaid-js.
- [Mermaid-PHP](https://github.com/JBZoo/Mermaid-PHP) - Generate diagrams and flowcharts with the help of the mermaid script language.
- [Utils](https://github.com/JBZoo/Utils) - Collection of useful PHP functions, mini-classes, and snippets for every day.
- [Image](https://github.com/JBZoo/Image) - Package provides object-oriented way to manipulate with images as simple as possible.
- [Data](https://github.com/JBZoo/Data) - Extended implementation of ArrayObject. Use files as config/array.
- [Retry](https://github.com/JBZoo/Retry) - Tiny PHP library providing retry/backoff functionality with multiple backoff strategies and jitter support.
1 change: 1 addition & 0 deletions action.yml
@@ -41,4 +41,5 @@ runs:
- ${{ inputs.schema }}
- '--output'
- ${{ inputs.output }}
- '--ansi'
- '-vvv'
2 changes: 1 addition & 1 deletion src/Csv/CsvFile.php
@@ -43,7 +43,7 @@ public function __construct(string $csvFilename, null|array|string $csvSchemaFil

public function getCsvFilename(): string
{
return \pathinfo((string)\realpath($this->csvFilename), \PATHINFO_BASENAME);
return $this->csvFilename;
}

public function getCsvStructure(): ParseConfig
11 changes: 8 additions & 3 deletions src/Validators/ErrorSuite.php
@@ -135,8 +135,8 @@ private function renderTable(): string
{
$buffer = new BufferedOutput();
$table = (new Table($buffer))
->setHeaderTitle($this->csvFilename)
->setFooterTitle($this->csvFilename)
->setHeaderTitle($this->getTestcaseName())
->setFooterTitle($this->getTestcaseName())
->setHeaders(['Line', 'id:Column', 'Rule', 'Message'])
->setColumnMaxWidth(0, 10)
->setColumnMaxWidth(1, 20)
@@ -154,7 +154,7 @@ private function renderTable(): string

private function prepareSourceSuite(): SourceSuite
{
$suite = new SourceSuite($this->csvFilename);
$suite = new SourceSuite($this->getTestcaseName());

foreach ($this->errors as $error) {
$caseName = $error->getRuleCode() . ' at column ' . $error->getColumnName();
@@ -166,4 +166,9 @@ private function prepareSourceSuite(): SourceSuite

return $suite;
}

private function getTestcaseName(): string
{
return \pathinfo((string)\realpath((string)$this->csvFilename), \PATHINFO_BASENAME);
}
}
4 changes: 2 additions & 2 deletions tests/Blueprint/CsvReaderTest.php
@@ -32,7 +32,7 @@ final class CsvReaderTest extends PHPUnit
public function testReadCsvFileWithoutHeader(): void
{
$csv = new CsvFile(self::CSV_SIMPLE_NO_HEADER, self::SCHEMA_SIMPLE_NO_HEADER);
isSame('simple_no_header.csv', $csv->getCsvFilename());
isSame(self::CSV_SIMPLE_NO_HEADER, $csv->getCsvFilename());

isSame([], $csv->getHeader());

@@ -50,7 +50,7 @@ public function testReadCsvFileWithoutHeader(): void
public function testReadCsvFileWithHeader(): void
{
$csv = new CsvFile(self::CSV_SIMPLE_HEADER, self::SCHEMA_SIMPLE_HEADER);
isSame('simple_header.csv', $csv->getCsvFilename());
isSame(self::CSV_SIMPLE_HEADER, $csv->getCsvFilename());

isSame(['seq', 'bool', 'exact'], $csv->getHeader());

4 changes: 3 additions & 1 deletion tests/Blueprint/MiscTest.php
@@ -145,7 +145,9 @@ private function testCheckExampleInReadme(
$tmpl = \implode("\n", ["```{$type}", $filepath, '```']);
}

$tmpl = $this->getSpoiler("Click to see: {$title}", $tmpl);
if ($type !== 'yml') {
$tmpl = $this->getSpoiler("Click to see: {$title}", $tmpl);
}

isFileContains($tmpl, PROJECT_ROOT . '/README.md');
}