Revamp CSV validation process in GitHub workflow (#15)
SmetDenis committed Mar 13, 2024
1 parent 9011753 commit 9da0d94
Showing 4 changed files with 112 additions and 85 deletions.
109 changes: 43 additions & 66 deletions .github/workflows/demo.yml
@@ -29,84 +29,67 @@ jobs:
name: All Report Types
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
- uses: actions/checkout@v3

- name: Setup PHP
uses: shivammathur/setup-php@v2
- name: Valid CSV file
uses: jbzoo/csv-blueprint@master
with:
php-version: 8.3
tools: composer

- name: Build the Project
run: make update

- name: Report - Table (Default)
run: make demo-github --no-print-directory
continue-on-error: true

- name: Report - Text
run: REPORT=text make demo-github --no-print-directory
continue-on-error: true

- name: Report - Github Actions
run: REPORT=github make demo-github --no-print-directory
continue-on-error: true

- name: Report - GitLab
run: REPORT=gitlab make demo-github --no-print-directory
continue-on-error: true
csv: tests/fixtures/demo.csv
schema: tests/schemas/demo_valid.yml

- name: Report - TeamCity CI
run: REPORT=teamcity make demo-github --no-print-directory
- name: Invalid CSV file - Report as Text
uses: jbzoo/csv-blueprint@master
with:
csv: tests/fixtures/batch/*.csv
schema: tests/schemas/demo_invalid.yml
report: text
continue-on-error: true

- name: Report - JUnit
run: REPORT=junit make demo-github --no-print-directory
- name: Invalid CSV file - Report as Table
uses: jbzoo/csv-blueprint@master
with:
csv: tests/fixtures/batch/*.csv
schema: tests/schemas/demo_invalid.yml
report: table
continue-on-error: true


github-actions:
name: GitHub Actions
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Invalid CSV file - Report as GitHub Annotations
uses: jbzoo/csv-blueprint@master
with:
fetch-depth: 0
csv: tests/fixtures/batch/*.csv
schema: tests/schemas/demo_invalid.yml
continue-on-error: true

- name: 👍 Valid CSV file
- name: Invalid CSV file - TeamCity
uses: jbzoo/csv-blueprint@master
with:
csv: tests/fixtures/demo.csv
schema: tests/schemas/demo_valid.yml
csv: tests/fixtures/batch/*.csv
schema: tests/schemas/demo_invalid.yml
report: teamcity
continue-on-error: true

- name: 👎 Invalid CSV file - Report as GitHub Annotations
- name: Invalid CSV file - Gitlab
uses: jbzoo/csv-blueprint@master
with:
csv: tests/fixtures/demo.csv
csv: tests/fixtures/batch/*.csv
schema: tests/schemas/demo_invalid.yml
report: gitlab
continue-on-error: true

- name: 👎 Invalid CSV file - Report as Table
- name: Invalid CSV file - JUnit
uses: jbzoo/csv-blueprint@master
with:
csv: tests/fixtures/demo.csv
csv: tests/fixtures/batch/*.csv
schema: tests/schemas/demo_invalid.yml
report: table
report: junit
continue-on-error: true


docker:
name: Docker Hub
name: Docker
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
- uses: actions/checkout@v3

- name: Pull Docker Image
run: docker pull jbzoo/csv-blueprint
@@ -117,7 +100,7 @@ jobs:
-v `pwd`:/parent-host \
--rm jbzoo/csv-blueprint \
validate:csv \
--csv=/parent-host/tests/fixtures/demo.csv \
--csv=/parent-host/tests/fixtures/batch/*.csv \
--schema=/parent-host/tests/schemas/demo_valid.yml
- name: 👎 Invalid CSV file
@@ -126,7 +109,7 @@ jobs:
-v `pwd`:/parent-host \
--rm jbzoo/csv-blueprint \
validate:csv \
--csv=/parent-host/tests/fixtures/demo.csv \
--csv=/parent-host/tests/fixtures/batch/*.csv \
--schema=/parent-host/tests/schemas/demo_invalid.yml
continue-on-error: true

@@ -135,10 +118,7 @@ jobs:
name: Phar
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
- uses: actions/checkout@v3

- name: Setup PHP
uses: shivammathur/setup-php@v2
@@ -153,14 +133,14 @@ jobs:
run: |
./build/csv-blueprint.phar \
validate:csv \
--csv=./tests/fixtures/demo.csv \
--csv=./tests/fixtures/batch/*.csv \
--schema=./tests/schemas/demo_valid.yml
- name: 👎 Invalid CSV file
run: |
./build/csv-blueprint.phar \
validate:csv \
--csv=./tests/fixtures/demo.csv \
--csv=./tests/fixtures/batch/*.csv \
--schema=./tests/schemas/demo_invalid.yml
continue-on-error: true

@@ -169,10 +149,7 @@ jobs:
name: Pure PHP
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
- uses: actions/checkout@v3

- name: Setup PHP
uses: shivammathur/setup-php@v2
@@ -187,13 +164,13 @@ jobs:
run: |
./csv-blueprint \
validate:csv \
--csv=./tests/fixtures/demo.csv \
--csv=./tests/fixtures/batch/*.csv \
--schema=./tests/schemas/demo_valid.yml
- name: 👎 Invalid CSV file
run: |
./csv-blueprint \
validate:csv \
--csv=./tests/fixtures/demo.csv \
--csv=./tests/fixtures/batch/*.csv \
--schema=./tests/schemas/demo_invalid.yml
continue-on-error: true
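Several `run:` steps above now pass an unquoted glob, e.g. `--csv=./tests/fixtures/batch/*.csv` (and `--csv=/parent-host/tests/fixtures/batch/*.csv` in the Docker job). The bash sketch below, which assumes the runner's default bash with neither `nullglob` nor `failglob` enabled and uses a throwaway temp directory with made-up files `a.csv`/`b.csv`, illustrates that such an option word reaches the program literally, because no path on disk starts with `--csv=`. Expanding the pattern is therefore left to the validator itself (the README roadmap marks glob-based file discovery as done), whereas a bare pattern would be expanded by the shell first.

```shell
#!/usr/bin/env bash
# Sketch: what a program receives when the glob is part of an --option=value word.
set -eu

workdir="$(mktemp -d)"
mkdir -p "$workdir/tests/fixtures/batch"
touch "$workdir/tests/fixtures/batch/a.csv" "$workdir/tests/fixtures/batch/b.csv"
cd "$workdir"

# No file matches the whole word "--csv=./tests/...", so bash leaves it unchanged.
# prints: --csv=./tests/fixtures/batch/*.csv
printf '%s\n' --csv=./tests/fixtures/batch/*.csv

# A bare pattern, by contrast, is expanded by the shell before the program runs.
# prints: ./tests/fixtures/batch/a.csv and ./tests/fixtures/batch/b.csv
printf '%s\n' ./tests/fixtures/batch/*.csv
```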
38 changes: 35 additions & 3 deletions .github/workflows/release-docker.yml
@@ -17,14 +17,44 @@ on:
types: [ published ]
push:
branches:
- 'master'
- "*"
tags:
- "*.*.*"
- "*.*"
- "*"
pull_request:
branches:
- "master"

jobs:
docker:
name: Docker
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4

- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: jbzoo/csv-blueprint
tags: |
type=ref,event=branch
type=ref,event=pr
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=semver,pattern={{major}}
type=sha
- name: Set up QEMU
uses: docker/setup-qemu-action@v3

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Login to Docker Hub
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
@@ -33,5 +63,7 @@ jobs:
- name: Build and push
uses: docker/build-push-action@v5
with:
push: true
tags: jbzoo/csv-blueprint:latest
context: .
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
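The rewritten `release-docker.yml` derives image tags with `docker/metadata-action` instead of hard-coding `latest`. For a plain stable release tag such as `v1.2.3`, the action's documentation describes the three `type=semver` entries (`{{version}}`, `{{major}}.{{minor}}`, `{{major}}`) as yielding `1.2.3`, `1.2` and `1`. The bash function below is only a rough approximation of that mapping, not the action's code; the name `semver_tags` is hypothetical, and pre-releases, `type=ref`, `type=sha` and the action's `latest` handling are deliberately not modeled.

```shell
#!/usr/bin/env bash
# Rough approximation (hypothetical helper, not metadata-action's implementation)
# of the three type=semver tag patterns for a plain vMAJOR.MINOR.PATCH tag.
set -eu

semver_tags() {
  local ref="$1"              # e.g. "v1.2.3"
  local version="${ref#v}"    # drop an optional leading "v" -> "1.2.3"
  if [[ ! "$version" =~ ^([0-9]+)\.([0-9]+)\.([0-9]+)$ ]]; then
    return 0                  # not a plain X.Y.Z tag: emit nothing in this sketch
  fi
  local major="${BASH_REMATCH[1]}" minor="${BASH_REMATCH[2]}"
  # {{version}}, {{major}}.{{minor}}, {{major}}
  printf '%s\n' "$version" "$major.$minor" "$major"
}

# prints: 1.2.3, 1.2 and 1 (one per line)
semver_tags v1.2.3
```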
48 changes: 33 additions & 15 deletions README.md
@@ -81,7 +81,7 @@ Also see demo in the [GitHub Actions](https://github.com/JBZoo/Csv-Blueprint/act
with:
csv: tests/fixtures/demo.csv
schema: tests/schemas/demo_invalid.yml
report: table # Optional. Default is "github"
report: table # Optional. Default is "github". Available options: text, table, github
```
**Note**. Report format for GitHub Actions is `github` by default. [GitHub Actions friendly](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-a-warning-message).

@@ -463,30 +463,48 @@ return [

It's random ideas and plans. No orderings and deadlines. <u>But batch processing is the priority #1</u>.

Batch processing
* [x] CSV/Schema file discovery in the folder with regex filename pattern (like `glob(./**/dir/*.csv)`).
* [x] If option `--csv` is a folder, then validate all files in the folder.
* [x] Checking multiple CSV files in one schema. Batch processing.
* [ ] Filename pattern validation with regex (like "all files in the folder should be in the format `/^[\d]{4}-[\d]{2}-[\d]{2}\.csv$/`").
* [ ] Quick stop mode. If the first error is found, then stop the validation process to save time.
* [ ] S3 Storage support. Validate files in the S3 bucket?
* [ ] Build phar file and release via GitHub Actions.
* [x] Checking multiple CSV files in one schema.
* [ ] Quick stop flag. If the first error is found, then stop the validation process to save time.
* [ ] Using multiple schemas for one csv file.
* [ ] If option `--csv` is not specified, then the STDIN is used. To build a pipeline in Unix-like systems.
* [ ] If option `--schema` is not specified, then validate only super base level things (like "is it a CSV file?").

Validation
* [ ] Filename pattern validation with regex (like "all files in the folder should be in the format `/^[\d]{4}-[\d]{2}-[\d]{2}\.csv$/`").
* [ ] Agregate rules (like "at least one of the fields should be not empty" or "all values must be unique").
* [ ] Create CSV files based on the schema (like "create 1000 rows with random data based on schema and rules").
* [ ] Using multiple schemas for one csv file. Batch processing.
* [ ] Parallel validation of really-really large files (1GB+ ?). I know you have them and not so much memory.
* [ ] Parallel validation of multiple files at once.
* [ ] Benchmarks as part of the CI process and Readme. It's important to know how much time the validation process takes.
* [ ] Inheritance of schemas, rules and columns. Define parent schema and override some rules in the child schemas. Make it DRY and easy to maintain.
* [ ] More report formats (like JSON, XML, etc). Any ideas?
* [ ] If option `--schema` is not specified, then validate only super base level things (like "is it a CSV file?").
* [ ] Complex rules (like "if field `A` is not empty, then field `B` should be not empty too").
* [ ] Input encoding detection + `BOM` (right now it's experimental). It works but not so accurate... UTF-8/16/32 is the best choice for now.
* [ ] Gitlab and JUnit reports mus be as one structure. It's not so easy to implement. But it's a good idea.
* [ ] Extending with custom rules and custom report formats. Plugins?
* [ ] Input encoding detection + `BOM` (right now it's experimental). It works but not so accurate... UTF-8/16/32 is the best choice for now.

Release workflow
* [ ] Build and release Docker image [via GitHub Actions, tags and labels](https://docs.docker.com/build/ci/github-actions/manage-tags-labels/).
* [ ] Upgrading to PHP 8.3.x
* [ ] Build phar file and release via GitHub Actions.

Performance and optimization
* [ ] Parallel validation of really-really large files (1GB+ ?). I know you have them and not so much memory.
* [ ] Parallel validation of multiple files at once.
* [ ] Benchmarks as part of the CI(?) and Readme. It's important to know how much time the validation process takes.
* [ ] Optimazation on `php.ini` level to start it faster. JIT.

Mock data generation
* [ ] Create CSV files based on the schema (like "create 1000 rows with random data based on schema and rules").
* [ ] Use [Faker](https://github.com/FakerPHP/Faker) for random data generation.

Reporting
* [ ] More report formats (like JSON, XML, etc). Any ideas?
* [ ] Gitlab and JUnit reports must be as one structure. It's not so easy to implement. But it's a good idea.

Misc
* [ ] Use it as PHP SDK. Examples in Readme.
* [ ] S3 Storage support. Validate files in the S3 bucket?
* [ ] More examples and documentation.


PS. [There is a file](tests/schemas/example_full.yml) with my ideas and imagination.
I'm not sure if I will implement all of them. But I will try to do my best.

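Two still-unchecked roadmap items, filename pattern validation and a quick stop flag, can be illustrated with a small bash sketch. The function `check_names` and its `quick_stop` argument are hypothetical and not part of csv-blueprint; the regex is the README's example with `[\d]` rewritten as `[0-9]`, since the POSIX extended regexes behind bash's `=~` have no `\d`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of two roadmap ideas (not csv-blueprint code):
# filename pattern validation plus an optional "quick stop" on the first error.
set -eu

# README example /^[\d]{4}-[\d]{2}-[\d]{2}\.csv$/ with \d spelled as [0-9].
name_re='^[0-9]{4}-[0-9]{2}-[0-9]{2}\.csv$'

check_names() {
  local quick_stop="$1"; shift
  local failed=0 path name
  for path in "$@"; do
    name="${path##*/}"              # validate the basename only
    if [[ ! "$name" =~ $name_re ]]; then
      echo "bad filename: $path"
      failed=1
      if [[ "$quick_stop" == 1 ]]; then
        break                       # quick stop: skip the remaining files
      fi
    fi
  done
  return "$failed"
}

# prints: "bad filename: data/demo.csv" then "bad filename: data/notes.csv"
check_names 0 data/2024-03-13.csv data/demo.csv data/notes.csv || true
```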
2 changes: 1 addition & 1 deletion tests/Blueprint/CommandsTest.php
@@ -162,7 +162,7 @@ public function testCreateValidateNegativeText(): void
$rootPath = PROJECT_ROOT;

[$actual, $exitCode] = $this->virtualExecution('validate:csv', [
'csv' => './tests/fixtures/demo.csv',
'csv' => './tests/**/demo.csv',
'schema' => './tests/schemas/demo_invalid.yml',
'report' => 'text',
]);
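In the updated test the pattern `./tests/**/demo.csv` is handed to the command directly, with no shell involved, so it is the validator that resolves `**`. For reference, bash's own `globstar` semantics (bash 4+), which the test path appears to follow, can be sketched with a throwaway directory tree; the file layout is made up, and whether csv-blueprint's matcher behaves identically in every edge case (for example `**/` matching zero directories) is an assumption this commit does not confirm.

```shell
#!/usr/bin/env bash
# Sketch: how bash (4+, globstar enabled) expands a recursive "**" pattern.
# csv-blueprint's own matcher may differ in edge cases.
set -eu
shopt -s globstar

workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p tests/fixtures/batch tests/schemas
touch tests/fixtures/demo.csv tests/fixtures/batch/demo.csv tests/schemas/demo.yml

# "**/" spans any number of nested directories below ./tests,
# so both demo.csv files match and demo.yml does not.
matches=(./tests/**/demo.csv)
printf '%s\n' "${matches[@]}"
```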