Fixes
Denis Smet committed Mar 16, 2024
1 parent c6505f6 commit c7d4875
Showing 2 changed files with 172 additions and 173 deletions.
341 changes: 170 additions & 171 deletions README.md
@@ -52,6 +52,176 @@ Integrating CSV validation into CI processes promotes higher data integrity, rel

Also see demo in the [GitHub Actions](https://github.com/JBZoo/Csv-Blueprint/actions/workflows/demo.yml) file.

### Schema Definition
Define your CSV validation schema in a YAML file.

This example defines a simple schema for a CSV file with a header row, specifying that the `id` column must not be empty and must contain integer values.
Also, it checks that the `name` column has a minimum length of 3 characters.

```yml
csv:
  delimiter: ;

columns:
  - name: id
    rules:
      not_empty: true
      is_int: true

  - name: name
    rules:
      length_min: 3

```

The [example YAML file](schema-examples/full.yml) describes all features in detail.
It is also covered by tests, so it stays up to date.

**Important notes**
* I have deliberately avoided column typing (like `type: integer`) in favor of rules,
  which can be combined in any order, entirely at your discretion.
  This gives you great flexibility when validating CSV files.
* All fields are optional unless explicitly stated otherwise, so you can leave them undeclared.
* You are free to add your own options anywhere (except the `rules` list); they will be ignored. I find this convenient for additional integrations and customization.
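
To make the "rules instead of types" idea concrete, here is a minimal sketch in Python. It is not the tool's PHP implementation: the function names, the `RULES` table, and the error messages are assumptions made for illustration, and the integer pattern is a guess at what `is_int` accepts. Only the rule names `not_empty` and `is_int`, and the convention that every rule except `not_empty` skips empty strings, come from the schema examples.

```python
import re

# Hypothetical rule checkers. Each takes (cell value, rule parameter) and
# returns an error message, or None when the value passes.
def not_empty(value, enabled):
    return "Value is empty" if enabled and value == "" else None

def is_int(value, enabled):
    # Assumed format: optional minus sign followed by ASCII digits.
    if enabled and not re.fullmatch(r"-?[0-9]+", value):
        return f'Value "{value}" is not an integer'
    return None

RULES = {"not_empty": not_empty, "is_int": is_int}

def validate_cell(value, rules):
    """Run every configured rule independently and collect all errors."""
    errors = []
    for rule_name, param in rules.items():
        # Every rule except not_empty is skipped for empty strings.
        if value == "" and rule_name != "not_empty":
            continue
        error = RULES[rule_name](value, param)
        if error is not None:
            errors.append(error)
    return errors

id_rules = {"not_empty": True, "is_int": True}
print(validate_cell("42", id_rules))  # passes both rules
print(validate_cell("4x", id_rules))  # fails is_int only
print(validate_cell("", id_rules))    # fails not_empty only
```

Because each checker knows nothing about the others, dropping `not_empty` from `id_rules` would turn the column into an "optional integer" without touching `is_int`.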


### Schema file examples

Available formats: [YAML](schema-examples/full.yml), [JSON](schema-examples/full.json), [PHP](schema-examples/full.php).

```yml
# It's a full example of the CSV schema file in YAML format.

# Regular expression to match the file name. If not set, the file name is not checked.
# This way you can validate the file name before the validation process.
# Feel free to check parent directories as well.
# See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
filename_pattern: /demo(-\d+)?\.csv$/i

csv:  # Here are default values. You can skip this section if you don't need to override the default values.
  header: true  # If the first row is a header. If true, the name of each column is required.
  delimiter: ,  # Delimiter character in CSV file.
  quote_char: \  # Quote character in CSV file.
  enclosure: "\""  # Enclosure for each field in CSV file.
  encoding: utf-8  # (Experimental) Only utf-8, utf-16, utf-32.
  bom: false  # (Experimental) If the file has a BOM (Byte Order Mark) at the beginning.

columns:
  - name: "Column Name (header)"  # Any custom name of the column in the CSV file (first row). Required if "csv.header" is true.
    description: "Lorem ipsum"  # Optional. Description of the column. Not used in the validation process.

    # Optional. You can use this section to validate each value in the column.
    rules:
      # Important notes:
      # 1. All rules except "not_empty" are ignored for empty strings (length 0).
      #    If the value must be non-empty, use "not_empty" as an extra rule!
      # 2. Rules don't depend on each other. They are independent.
      #    They know nothing about each other and cannot influence each other.
      # 3. You can use the rules in any combination. Or not use any of them.
      #    They are grouped below simply for ease of navigation and reading.
      # 4. If the value for a rule is "true", that's just an enable flag.
      #    In other cases, these are rule parameters.
      # 5. The order of rule execution is the same as in the schema. But it doesn't matter.
      #    The result will be the same in any order.
      # 6. Most of the rules are case-sensitive, unless otherwise specified.
      # 7. As a backup plan, you can always use the "regex" rule.

      # General rules
      not_empty: true  # Value is not an empty string. Actually checks if the string length is not 0.
      exact_value: Some string  # Case-sensitive. Exact value for string in the column.
      allow_values: [ y, n, "" ]  # Strict set of values that are allowed. Case-sensitive.

      # Any valid regex pattern. See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php.
      # It is the ultimate way to verify any kind of string data.
      # Please, be careful. Regex is a powerful tool, but it can be very dangerous if used incorrectly.
      # Remember that if you want to solve a problem with regex, you now have two problems.
      regex: /^[\d]{2}$/

      # Checks length of a string including spaces (multibyte safe).
      length: 5
      length_not: 4
      length_min: 1
      length_max: 10

      # Basic string checks
      is_trimed: true  # Only trimmed strings. Example: "Hello World" (not " Hello World ").
      is_lowercase: true  # String is only lower-case. Example: "hello world".
      is_uppercase: true  # String is only upper-case. Example: "HELLO WORLD".
      is_capitalize: true  # String is only capitalized. Example: "Hello World".

      # Count number of words used in a string.
      # Note that multibyte locales are not supported.
      # Example: "Hello World, 123" - 2 words only (123 is not a word).
      word_count: 5
      word_count_not: 4
      word_count_min: 1
      word_count_max: 10

      # Contains rules
      contains: Hello  # Case-sensitive. Example: "Hello World".
      contains_one: [ a, b ]  # At least one of the strings must be in the CSV value. Case-sensitive.
      contains_all: [ a, b, c ]  # All the strings must be part of a CSV value. Case-sensitive.
      starts_with: "prefix "  # Case-sensitive. Example: "prefix Hello World".
      ends_with: " suffix"  # Case-sensitive. Example: "Hello World suffix".

      # Under the hood it converts and compares as float values.
      # Comparison accuracy is 12 digits after a dot.
      # Scientific number format is also supported. Example: "1.2e3".
      num: 5
      num_not: 4
      num_min: 1
      num_max: 10

      # Number of digits after the decimal point (with zeros).
      precision: 5
      precision_not: 4
      precision_min: 1
      precision_max: 10

      # Dates. Under the hood, the strings are converted to timestamp and compared.
      # This gives you the ability to use relative dates and any formatting you want.
      # By default, it works in UTC. But you can specify your own timezone as part of the date string.
      # Format: https://www.php.net/manual/en/datetime.format.php.
      # Parsing: https://www.php.net/manual/en/function.strtotime.php.
      # Timezones: https://www.php.net/manual/en/timezones.php.
      date: 01 Jan 2000  # You can use any string that can be parsed by the strtotime function.
      date_not: 2006-01-02 15:04:05 -0700 Europe/Rome
      date_min: +1 day  # Examples of relative formats.
      date_max: now  # Examples of current date and time.
      date_format: Y-m-d  # Check strict format of the date.
      is_date: true  # Accepts arbitrary date format. It shows an error if the value cannot be converted to a timestamp.

      # Specific formats
      is_bool: true  # Allow only boolean values "true" and "false", case-insensitive.
      is_int: true  # Check format only. Can be negative and positive. Without any separators.
      is_float: true  # Check format only. Can be negative and positive. Dot as decimal separator.
      is_ip4: true  # Only IPv4. Example: "127.0.0.1".
      is_url: true  # Only URL format. Example: "https://example.com/page?query=string#anchor".
      is_email: true  # Only email format. Example: "user@example.com".
      is_domain: true  # Only domain name. Example: "example.com".
      is_uuid: true  # Only UUID4 format. Example: "550e8400-e29b-41d4-a716-446655440000".
      is_alias: true  # Only alias format. Example: "my-alias-123". It can contain letters, numbers, and dashes.

      # Geography
      is_latitude: true  # Can be integer or float. Example: 50.123456.
      is_longitude: true  # Can be integer or float. Example: -89.123456.
      is_geohash: true  # Check if the value is a valid geohash. Example: "u4pruydqqvj".
      is_cardinal_direction: true  # Valid cardinal direction. Examples: "N", "S", "NE", "SE", "none", "".
      is_usa_market_name: true  # Check if the value is a valid USA market name. Example: "New York, NY".

    # Optional. You can use this section to validate the whole column.
    # Be careful, this can reduce performance noticeably depending on the combination of rules.
    aggregate_rules:
      is_unique: true  # All values in the column are unique.

  - name: "another_column"

  - name: "third_column"

  - description: "Column with description only. Undefined header name."

```
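
As a rough illustration of how `filename_pattern` behaves, the sketch below translates the PCRE literal `/demo(-\d+)?\.csv$/i` into Python's `re` module by hand. Python's regex engine is not PCRE, and the helper name `filename_matches` is hypothetical; the tool itself evaluates the pattern in PHP.

```python
import re

# Hand-translated from the PCRE literal /demo(-\d+)?\.csv$/i:
# the slashes are delimiters and the trailing "i" means case-insensitive.
FILENAME_PATTERN = re.compile(r"demo(-\d+)?\.csv$", re.IGNORECASE)

def filename_matches(path):
    """Hypothetical helper: True if the path satisfies the pattern."""
    # search() is not anchored at the start, so parent directories
    # may precede the file name.
    return FILENAME_PATTERN.search(path) is not None

for path in ["data/demo-12.csv", "DEMO.CSV", "demo.csv.bak", "demo-.csv"]:
    print(path, filename_matches(path))
```

The first two paths match; the last two do not, because the pattern requires the name to end in `.csv` and the optional suffix to be a dash followed by at least one digit.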

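The performance warning on `aggregate_rules` is easier to see with a sketch: a column-level check such as `is_unique` has to look at every value in the column before it can report anything. The Python below is an illustration under that assumption, with a hypothetical `find_duplicates` helper; it does not reflect how the tool implements the rule.

```python
from collections import Counter

def find_duplicates(column_values):
    """Return the values that occur more than once, in first-seen order."""
    # The whole column must be counted before any verdict is possible.
    counts = Counter(column_values)
    return [value for value, count in counts.items() if count > 1]

print(find_duplicates(["a", "b", "a", "c", "b"]))  # ['a', 'b']
print(find_duplicates(["x", "y", "z"]))            # []
```
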

### As GitHub Action

@@ -255,177 +425,6 @@ Optional format `text` with highlighted keywords:
* The tool uses [JBZoo/CI-Report-Converter](https://github.com/JBZoo/CI-Report-Converter) as an SDK to convert reports to different formats, so you can easily integrate it with any CI system.


### Schema Definition
Define your CSV validation schema in a YAML file.

This example defines a simple schema for a CSV file with a header row, specifying that the `id` column must not be empty and must contain integer values.
Also, it checks that the `name` column has a minimum length of 3 characters.

```yml
csv:
  delimiter: ;

columns:
  - name: id
    rules:
      not_empty: true
      is_int: true

  - name: name
    rules:
      length_min: 3

```

The [example YAML file](schema-examples/full.yml) describes all features in detail.
It is also covered by tests, so it stays up to date.

**Important notes**
* I have deliberately avoided column typing (like `type: integer`) in favor of rules,
  which can be combined in any order, entirely at your discretion.
  This gives you great flexibility when validating CSV files.
* All fields are optional unless explicitly stated otherwise, so you can leave them undeclared.
* You are free to add your own options anywhere (except the `rules` list); they will be ignored. I find this convenient for additional integrations and customization.


### Schema file examples

Available formats: [YAML](schema-examples/full.yml), [JSON](schema-examples/full.json), [PHP](schema-examples/full.php).

```yml
# It's a full example of the CSV schema file in YAML format.

# Regular expression to match the file name. If not set, the file name is not checked.
# This way you can validate the file name before the validation process.
# Feel free to check parent directories as well.
# See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
filename_pattern: /demo(-\d+)?\.csv$/i

csv:  # Here are default values. You can skip this section if you don't need to override the default values.
  header: true  # If the first row is a header. If true, the name of each column is required.
  delimiter: ,  # Delimiter character in CSV file.
  quote_char: \  # Quote character in CSV file.
  enclosure: "\""  # Enclosure for each field in CSV file.
  encoding: utf-8  # (Experimental) Only utf-8, utf-16, utf-32.
  bom: false  # (Experimental) If the file has a BOM (Byte Order Mark) at the beginning.

columns:
  - name: "Column Name (header)"  # Any custom name of the column in the CSV file (first row). Required if "csv.header" is true.
    description: "Lorem ipsum"  # Optional. Description of the column. Not used in the validation process.

    # Optional. You can use this section to validate each value in the column.
    rules:
      # Important notes:
      # 1. All rules except "not_empty" are ignored for empty strings (length 0).
      #    If the value must be non-empty, use "not_empty" as an extra rule!
      # 2. Rules don't depend on each other. They are independent.
      #    They know nothing about each other and cannot influence each other.
      # 3. You can use the rules in any combination. Or not use any of them.
      #    They are grouped below simply for ease of navigation and reading.
      # 4. If the value for a rule is "true", that's just an enable flag.
      #    In other cases, these are rule parameters.
      # 5. The order of rule execution is the same as in the schema. But it doesn't matter.
      #    The result will be the same in any order.
      # 6. Most of the rules are case-sensitive, unless otherwise specified.
      # 7. As a backup plan, you can always use the "regex" rule.

      # General rules
      not_empty: true  # Value is not an empty string. Actually checks if the string length is not 0.
      exact_value: Some string  # Case-sensitive. Exact value for string in the column.
      allow_values: [ y, n, "" ]  # Strict set of values that are allowed. Case-sensitive.

      # Any valid regex pattern. See https://www.php.net/manual/en/reference.pcre.pattern.syntax.php.
      # It is the ultimate way to verify any kind of string data.
      # Please, be careful. Regex is a powerful tool, but it can be very dangerous if used incorrectly.
      # Remember that if you want to solve a problem with regex, you now have two problems.
      regex: /^[\d]{2}$/

      # Checks length of a string including spaces (multibyte safe).
      length: 5
      length_not: 4
      length_min: 1
      length_max: 10

      # Basic string checks
      is_trimed: true  # Only trimmed strings. Example: "Hello World" (not " Hello World ").
      is_lowercase: true  # String is only lower-case. Example: "hello world".
      is_uppercase: true  # String is only upper-case. Example: "HELLO WORLD".
      is_capitalize: true  # String is only capitalized. Example: "Hello World".

      # Count number of words used in a string.
      # Note that multibyte locales are not supported.
      # Example: "Hello World, 123" - 2 words only (123 is not a word).
      word_count: 5
      word_count_not: 4
      word_count_min: 1
      word_count_max: 10

      # Contains rules
      contains: Hello  # Case-sensitive. Example: "Hello World".
      contains_one: [ a, b ]  # At least one of the strings must be in the CSV value. Case-sensitive.
      contains_all: [ a, b, c ]  # All the strings must be part of a CSV value. Case-sensitive.
      starts_with: "prefix "  # Case-sensitive. Example: "prefix Hello World".
      ends_with: " suffix"  # Case-sensitive. Example: "Hello World suffix".

      # Under the hood it converts and compares as float values.
      # Comparison accuracy is 12 digits after a dot.
      # Scientific number format is also supported. Example: "1.2e3".
      num: 5
      num_not: 4
      num_min: 1
      num_max: 10

      # Number of digits after the decimal point (with zeros).
      precision: 5
      precision_not: 4
      precision_min: 1
      precision_max: 10

      # Dates. Under the hood, the strings are converted to timestamp and compared.
      # This gives you the ability to use relative dates and any formatting you want.
      # By default, it works in UTC. But you can specify your own timezone as part of the date string.
      # Format: https://www.php.net/manual/en/datetime.format.php.
      # Parsing: https://www.php.net/manual/en/function.strtotime.php.
      # Timezones: https://www.php.net/manual/en/timezones.php.
      date: 01 Jan 2000  # You can use any string that can be parsed by the strtotime function.
      date_not: 2006-01-02 15:04:05 -0700 Europe/Rome
      date_min: +1 day  # Examples of relative formats.
      date_max: now  # Examples of current date and time.
      date_format: Y-m-d  # Check strict format of the date.
      is_date: true  # Accepts arbitrary date format. It shows an error if the value cannot be converted to a timestamp.

      # Specific formats
      is_bool: true  # Allow only boolean values "true" and "false", case-insensitive.
      is_int: true  # Check format only. Can be negative and positive. Without any separators.
      is_float: true  # Check format only. Can be negative and positive. Dot as decimal separator.
      is_ip4: true  # Only IPv4. Example: "127.0.0.1".
      is_url: true  # Only URL format. Example: "https://example.com/page?query=string#anchor".
      is_email: true  # Only email format. Example: "user@example.com".
      is_domain: true  # Only domain name. Example: "example.com".
      is_uuid: true  # Only UUID4 format. Example: "550e8400-e29b-41d4-a716-446655440000".
      is_alias: true  # Only alias format. Example: "my-alias-123". It can contain letters, numbers, and dashes.

      # Geography
      is_latitude: true  # Can be integer or float. Example: 50.123456.
      is_longitude: true  # Can be integer or float. Example: -89.123456.
      is_geohash: true  # Check if the value is a valid geohash. Example: "u4pruydqqvj".
      is_cardinal_direction: true  # Valid cardinal direction. Examples: "N", "S", "NE", "SE", "none", "".
      is_usa_market_name: true  # Check if the value is a valid USA market name. Example: "New York, NY".

    # Optional. You can use this section to validate the whole column.
    # Be careful, this can reduce performance noticeably depending on the combination of rules.
    aggregate_rules:
      is_unique: true  # All values in the column are unique.

  - name: "another_column"

  - name: "third_column"

  - description: "Column with description only. Undefined header name."

```


## Coming soon

These are random ideas and plans, in no particular order and with no deadlines. <u>But batch processing is priority #1</u>.
4 changes: 2 additions & 2 deletions src/Rules/Cell/IsDate.php
@@ -24,14 +24,14 @@ class IsDate extends AbstarctCellRule

     public function validateRule(string $cellValue): ?string
     {
-        if (!$this->tryToParseDate($cellValue)) {
+        if (!self::tryToParseDate($cellValue)) {
             return "Value \"<c>{$cellValue}</c>\" is not a valid date.";
         }

         return null;
     }

-    protected function tryToParseDate(string $cellValue): bool
+    protected static function tryToParseDate(string $cellValue): bool
     {
         if ($cellValue === '') {
             return false;