updated specs, removed the foreign keys hack, updated README
OriHoch committed Jul 6, 2017
1 parent cd378b9 commit 3291756
Showing 8 changed files with 449 additions and 226 deletions.
241 changes: 171 additions & 70 deletions README.md
@@ -10,95 +10,159 @@
A utility library for working with [Table Schema](https://specs.frictionlessdata.io/table-schema/) in php.


## Features
## Features summary and Usage guide

### Installation

```bash
$ composer require frictionlessdata/tableschema
```

### Schema

A model of a schema with helpful methods for working with the schema and supported data.
Schema class provides helpful methods for working with a table schema and related data.

`use frictionlessdata\tableschema\Schema;`

Schema objects can be constructed using any of the following:

* php object
```php
$schema = new Schema((object)[
'fields' => [
(object)[
'name' => 'id', 'title' => 'Identifier', 'type' => 'integer',
'constraints' => (object)[
"required" => true,
"minimum" => 1,
"maximum" => 500
]
],
(object)['name' => 'name', 'title' => 'Name', 'type' => 'string'],
],
'primaryKey' => 'id'
]);
```

* string containing json
* string containing a value supported by [file_get_contents](http://php.net/manual/en/function.file-get-contents.php)
```php
$schema = new Schema("{
\"fields\": [
{\"name\": \"id\"},
{\"name\": \"height\", \"type\": \"integer\"}
]
}");
```

You can use the Schema::validate static function to load and validate a schema.
It returns a list of loading or validation errors encountered.
* string containing a value supported by [file_get_contents](http://php.net/manual/en/function.file-get-contents.php)
```
$schema = new Schema("https://raw.githubusercontent.com/frictionlessdata/testsuite-extended/ecf1b2504332852cca1351657279901eca6fdbb5/datasets/synthetic/schema.json");
```

### Table
The schema is loaded, parsed and validated and will raise exceptions in case of any problems.

Provides methods for loading any fopen compatible data source and iterating over the data.
access the schema data, which is guaranteed to conform to the specs.

* Data is validated according to a given table schema
* Data is converted to native types according to the schema
```
$schema->missingValues(); // [""]
$schema->primaryKey(); // ["id"]
$schema->foreignKeys(); // []
$schema->fields(); // ["id" => IntegerField, "name" => StringField]
$field = $schema->field("id");
$field->format(); // "default"
$field->name(); // "id"
$field->type(); // "integer"
$field->constraints(); // (object)["required"=>true, "minimum"=>1, "maximum"=>500]
$field->enum(); // []
$field->required(); // true
$field->unique(); // false
```

the validate function accepts the same arguments as the Schema constructor, but returns a list of errors instead of raising exceptions
```
// the validate function accepts the same arguments as the Schema constructor
$validationErrors = Schema::validate("http://invalid.schema.json");
foreach ($validationErrors as $validationError) {
print($validationError->getMessage());
};
```

## Important Notes
validate and cast a row of data according to the schema
```
$row = $schema->castRow(["id" => "1", "name" => "First Name"]);
```

- Table schema is in transition to v1 - but many data packages in the wild are still pre-v1
- At the moment I am developing this library with support only for v1
- See [this Gitter discussion](https://gitter.im/frictionlessdata/chat?at=58df75bfad849bcf423e5d80) about this transition
will raise exception if row fails validation

it returns the row with all native values

## Getting Started
```
$row // ["id" => 1, "name" => "First Name"];
```

### Installation
validate the row to get a list of errors

```bash
$ composer require frictionlessdata/tableschema
```
$schema->validateRow(["id" => "foobar"]); // ["id is not numeric", "name is required" .. ]
```
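
The difference between casting (which raises on failure) and validating (which returns a list of errors) can be sketched in plain PHP. This is a simplified stand-in, not the library's implementation: the function names `validateRowSketch` and `castRowSketch`, the array-based field definitions, and the use of `InvalidArgumentException` are all assumptions made for illustration.

```php
<?php
// Simplified stand-in for row validation and casting; NOT the library's code.
// $fields maps a field name to ["type" => string, "required" => bool].
function validateRowSketch(array $fields, array $row)
{
    $errors = [];
    foreach ($fields as $name => $field) {
        $hasValue = isset($row[$name]) && $row[$name] !== "";
        if (!$hasValue) {
            if (!empty($field["required"])) {
                $errors[] = "$name is required";
            }
        } elseif ($field["type"] === "integer"
            && filter_var($row[$name], FILTER_VALIDATE_INT) === false) {
            $errors[] = "$name is not numeric";
        }
    }
    return $errors;
}

// Casting validates first and throws if any error was collected.
function castRowSketch(array $fields, array $row)
{
    $errors = validateRowSketch($fields, $row);
    if ($errors) {
        throw new InvalidArgumentException(implode(", ", $errors));
    }
    $cast = [];
    foreach ($fields as $name => $field) {
        if (!isset($row[$name]) || $row[$name] === "") {
            $cast[$name] = null; // optional field with no value
            continue;
        }
        $cast[$name] = $field["type"] === "integer" ? (int) $row[$name] : $row[$name];
    }
    return $cast;
}

$fields = [
    "id" => ["type" => "integer", "required" => true],
    "name" => ["type" => "string", "required" => true],
];
$castRow = castRowSketch($fields, ["id" => "1", "name" => "First Name"]);
$rowErrors = validateRowSketch($fields, ["id" => "foobar"]);
var_dump($castRow);   // "id" is now the integer 1
print_r($rowErrors);  // "id is not numeric", "name is required"
```

Validating before casting keeps the two entry points consistent: any row that casts without an exception would also produce an empty error list.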

### Usage
### Table

```php
The Table class lets you iterate over data that conforms to a table schema


instantiate a Table object based on a data source and a table schema.

```
use frictionlessdata\tableschema\DataSources\CsvDataSource;
use frictionlessdata\tableschema\Schema;
use frictionlessdata\tableschema\Table;
$dataSource = new CsvDataSource("tests/fixtures/data.csv");
$schema = new Schema((object)[
'fields' => [
(object)['name' => 'first_name'],
(object)['name' => 'last_name'],
(object)['name' => 'order'],
]
]);
$table = new Table($dataSource, $schema);
```

// construct schema from json string
$schema = new Schema('{
"fields": [
{"name": "id"},
{"name": "height", "type": "integer"}
]
}');
iterate over the data, all the values are cast and validated according to the schema
```
foreach ($table as $row) {
print($row["order"]." ".$row["first_name"]." ".$row["last_name"]."\n");
};
```

the validate function validates the schema and also reads a sample of the data to validate it against the schema

```
Table::validate(new CsvDataSource("http://invalid.data.source/"), $schema);
```

// schema will be parsed and validated against the json schema (under src/schemas/table-schema.json)
// will raise exception in case of validation error
### InferSchema

// access in php after validation
$schema->descriptor->fields[0]->name == "id"
The InferSchema class infers a schema from a sample of the data

// validate a schema from a remote resource and getting list of validation errors back
$validationErrors = tableschema\Schema::validate("https://raw.githubusercontent.com/frictionlessdata/testsuite-extended/ecf1b2504332852cca1351657279901eca6fdbb5/datasets/synthetic/schema.json");
foreach ($validationErrors as $validationError) {
print($validationError->getMessage());
```
use frictionlessdata\tableschema\InferSchema;
use frictionlessdata\tableschema\DataSources\CsvDataSource;
use frictionlessdata\tableschema\Table;
$dataSource = new CsvDataSource("tests/fixtures/data.csv");
$schema = new InferSchema();
$table = new Table($dataSource, $schema);
if (Table::validate($dataSource, $schema) == []) {
var_dump($schema->fields()); // ["first_name" => StringField, "last_name" => StringField, "order" => IntegerField]
};
```
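
To give a rough idea of what inferring a schema from sample rows involves, here is a naive sketch in plain PHP. It is not the InferSchema implementation: the function name `inferFieldTypes`, the two supported types, and the sample rows are assumptions chosen to match the field names used in the README example.

```php
<?php
// Naive type inference over sample rows; the real InferSchema logic may differ.
// A field is "integer" only if every sampled value parses as an integer.
function inferFieldTypes(array $rows)
{
    $types = [];
    foreach ($rows as $row) {
        foreach ($row as $name => $value) {
            $isInt = filter_var($value, FILTER_VALIDATE_INT) !== false;
            if (!isset($types[$name])) {
                $types[$name] = $isInt ? "integer" : "string";
            } elseif (!$isInt) {
                // One non-integer value downgrades the whole field to string.
                $types[$name] = "string";
            }
        }
    }
    return $types;
}

// Hypothetical sample data, shaped like the CSV fixture used in the README.
$sample = [
    ["first_name" => "Foo", "last_name" => "Bar", "order" => "1"],
    ["first_name" => "Baz", "last_name" => "Bax", "order" => "2"],
];
$inferred = inferFieldTypes($sample);
print_r($inferred); // first_name and last_name are "string", order is "integer"
```

Because a later row can contradict an earlier guess, inference over a stream only becomes stable once enough rows have been seen, which is presumably why the library exposes a way to lock the inferred schema.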

more control over the infer process

// validate and cast a row according to schema
$schema = new Schema('{"fields": [{"name": "id", "type": "integer"}]}');
$row = $schema->castRow(["id" => "1"]);
// raise exception if row fails validation
// returns row with all native values

// validate a row
$validationErrors = $schema->validateRow(["id" => "foobar"]);
// error that id is not numeric

// iterate over a remote data source conforming to a table schema
$table = new tableschema\Table(
new tableschema\DataSources\CsvDataSource("http://www.example.com/data.csv"),
new tableschema\Schema("http://www.example.com/data-schema.json")
);
foreach ($table as $person) {
print($person["first_name"]." ".$person["last_name"]);
}

// validate a remote data source
$validationErrors = tableschema\Table::validate($dataSource, $schema);
print(tableschema\SchemaValidationError::getErrorMessages($validationErrors));

// infer schema of a remote data source
$dataSource = new tableschema\DataSources\CsvDataSource("http://www.example.com/data.csv");
$schema = new tableschema\InferSchema();
$table = new tableschema\Table($dataSource, $schema);
```
foreach ($table as $row) {
var_dump($row); // row will be in inferred native values
var_dump($schema->descriptor()); // will contain the inferred schema descriptor
@@ -108,25 +172,62 @@ foreach ($table as $row) {
// it returns all the rows received until the lock, cast to the final inferred schema
// you may now continue to iterate over the rest of the rows
};
```

### EditableSchema

// schema creation, editing and saving
EditableSchema extends the Schema object with editing capabilities

```
use frictionlessdata\tableschema\EditableSchema;
use frictionlessdata\tableschema\Fields\FieldsFactory;
// EditableSchema extends the Schema object with editing capabilities
$schema = new EditableSchema();
// set fields
```

edit fields
```
$schema->fields([
"id" => FieldsFactory::field((object)["name" => "id", "type" => "integer"])
"id" => (object)["type" => "integer"],
"name" => (object)["type" => "string"],
]);
// remove field
$schema->removeField("age");
// edit primaryKey
```

appropriate field object is created according to the given descriptor
```
$schema->field("id"); // IntegerField object
```

add / update or remove fields

```
$schema->field("email", (object)["type" => "string", "format" => "email"]);
$schema->field("name", (object)["type" => "string"]);
$schema->removeField("name");
```

set or update other table schema attributes
```
$schema->primaryKey(["id"]);
```


// after every change - schema is validated and will raise Exception in case of validation errors
// finally, you can save the schema to a json file
after every change - schema is validated and will raise Exception in case of validation errors

finally, save the schema to a json file

```
$schema->save("my-schema.json");
```


## Important Notes

- Table schema is in transition to v1 - but many data packages in the wild are still pre-v1
- At the moment I am developing this library with support only for v1
- See [this Gitter discussion](https://gitter.im/frictionlessdata/chat?at=58df75bfad849bcf423e5d80) about this transition


## Contributing

Please read the contribution guidelines: [How to Contribute](CONTRIBUTING.md)
12 changes: 10 additions & 2 deletions src/EditableSchema.php
@@ -2,6 +2,9 @@

namespace frictionlessdata\tableschema;

use frictionlessdata\tableschema\Fields\BaseField;
use frictionlessdata\tableschema\Fields\FieldsFactory;

class EditableSchema extends Schema
{
public function __construct($descriptor = null)
@@ -12,9 +15,14 @@ public function __construct($descriptor = null)
public function fields($newFields = null)
{
if (!is_null($newFields)) {
$this->fieldsCache = $newFields;
$this->descriptor()->fields = [];
foreach ($newFields as $field) {
$this->fieldsCache = [];
foreach ($newFields as $name => $field) {
if (!is_a($field, "frictionlessdata\\tableschema\\Fields\\BaseField")) {
if (!isset($field->name)) $field->name = $name;
$field = FieldsFactory::field($field);
}
$this->fieldsCache[$name] = $field;
$this->descriptor()->fields[] = $field->descriptor();
}
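
The effect of the new loop (plain descriptors are named after their array key and wrapped in a field object, while existing field objects pass through) can be tried in isolation with the sketch below. `SketchField` and `normalizeFields` are hypothetical stand-ins for `BaseField`, `FieldsFactory::field()` and the method body; they are not part of the library.

```php
<?php
// Hypothetical stand-in for the library's field classes; only what the sketch needs.
class SketchField
{
    private $descriptor;

    public function __construct($descriptor)
    {
        $this->descriptor = $descriptor;
    }

    public function descriptor()
    {
        return $this->descriptor;
    }
}

// Mirrors the loop in the diff: a plain descriptor gets its name from the
// array key and is wrapped in a field object; field objects are kept as-is.
function normalizeFields(array $newFields)
{
    $fieldsCache = [];
    $descriptorFields = [];
    foreach ($newFields as $name => $field) {
        if (!($field instanceof SketchField)) {
            if (!isset($field->name)) {
                $field->name = $name;
            }
            $field = new SketchField($field);
        }
        $fieldsCache[$name] = $field;
        $descriptorFields[] = $field->descriptor();
    }
    return [$fieldsCache, $descriptorFields];
}

list($cache, $descriptors) = normalizeFields([
    "id" => (object)["type" => "integer"],
    "name" => new SketchField((object)["name" => "name", "type" => "string"]),
]);
$encoded = json_encode($descriptors);
echo $encoded, "\n";
// [{"type":"integer","name":"id"},{"name":"name","type":"string"}]
```

This is what allows the shorter README form, where each field is given as a keyed descriptor without an explicit `name`.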

18 changes: 0 additions & 18 deletions src/SchemaValidator.php
@@ -57,7 +57,6 @@ protected function validateSchema()
// Validate
$validator = new \JsonSchema\Validator();
$descriptor = json_decode(json_encode($this->descriptor));
$this->applyForeignKeysResourceHack($descriptor);
$validator->validate(
$descriptor,
(object) ['$ref' => 'file://'.realpath(dirname(__FILE__)).'/schemas/table-schema.json']
@@ -113,21 +112,4 @@ protected function validateKeys()
}
}
}

protected function applyForeignKeysResourceHack($descriptor)
{
if (isset($descriptor->foreignKeys) && is_array($descriptor->foreignKeys)) {
foreach ($descriptor->foreignKeys as $foreignKey) {
// the resource field of foreign keys has problems validating as standard uri string
// we just override the validation entirely by placing a valid uri
if (
is_object($foreignKey)
&& isset($foreignKey->reference) && is_object($foreignKey->reference)
&& isset($foreignKey->reference->resource) && !empty($foreignKey->reference->resource)
) {
$foreignKey->reference->resource = 'void://'.$foreignKey->reference->resource;
}
}
}
}
}
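
For reference, the behaviour of the removed workaround can be reproduced outside the class. The sketch below copies its logic into a standalone function and applies it to a sample descriptor; the function name `prefixForeignKeyResources` and the sample descriptor are assumptions made for illustration.

```php
<?php
// Standalone copy of the removed workaround's logic, for illustration only.
// It rewrites every non-empty foreign key resource name into a placeholder URI.
function prefixForeignKeyResources($descriptor)
{
    if (isset($descriptor->foreignKeys) && is_array($descriptor->foreignKeys)) {
        foreach ($descriptor->foreignKeys as $foreignKey) {
            if (
                is_object($foreignKey)
                && isset($foreignKey->reference) && is_object($foreignKey->reference)
                && !empty($foreignKey->reference->resource)
            ) {
                // Objects are handled by reference, so this mutates the descriptor.
                $foreignKey->reference->resource = 'void://'.$foreignKey->reference->resource;
            }
        }
    }
    return $descriptor;
}

// Hypothetical descriptor with a foreign key pointing at a resource named "people".
$descriptor = json_decode('{
    "fields": [{"name": "owner_id", "type": "integer"}],
    "foreignKeys": [
        {"fields": "owner_id", "reference": {"resource": "people", "fields": "id"}}
    ]
}');
$patched = prefixForeignKeyResources($descriptor);
echo $patched->foreignKeys[0]->reference->resource, "\n"; // void://people
```

With the uri format constraint dropped from the bundled JSON schema (see the CHANGELOG entry in this commit), a plain name such as `people` no longer needs this rewrite before validation.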
3 changes: 3 additions & 0 deletions src/schemas/CHANGELOG
@@ -0,0 +1,3 @@
2017-07-06T17:16:11+03:00
* enum and minimum / maximum values were modified to be limited to specific types
* foreignKeys.items[].reference.resource - removed uri format constraint
1 change: 0 additions & 1 deletion src/schemas/LAST_UPDATE

This file was deleted.
