A schema store service that tracks and manages all the schemas used in the Data Pipeline
Type Name Latest commit message Commit time
acceptance/configs [DATAPIPE-529] Removing unwated file from commit Dec 11, 2015
acceptance_tests Set up acceptance tests. Apr 22, 2015
api_docs Merge branch 'DATAPIPE-1857_bowu_add_pagination_to_the_get_sources_by… Nov 1, 2016
bin adopt service to venv-update 1.0.1 May 9, 2016
docs/source apply apache license v2 to schematizer Nov 1, 2016
logs added logs folder and .keep file Jul 27, 2016
requirements.d cleaning up requirements Oct 5, 2016
schema apply apache license v2 to schematizer Nov 1, 2016
schematizer Merge branch 'change-log-latest-schema-to-log-latest-schema-id' Nov 18, 2016
schematizer_testing Merge branch 'master' into DATAPIPE-1319-Schematizer-changes-for-is-log Nov 3, 2016
serviceinitd apply apache license v2 to schematizer Nov 1, 2016
tests updated code to resolve merge conflicts. Nov 9, 2016
.dockerignore added retry to ensure the container is up and ready to use; remove ye… Aug 12, 2016
.gitignore adopt service to venv-update 1.0.1 May 9, 2016
.pre-commit-config.yaml Merge branch 'keegan_DATAPIPE-221_swagger_2_support' Sep 26, 2016
.rat-excludes apply apache license v2 to schematizer Nov 1, 2016
.travis.yml removed docker push steps from travis Nov 22, 2016
Dockerfile prepare schematizer project files for open sourcing Sep 30, 2016
Dockerfile-opensource fix up dockerfile-opensource Nov 17, 2016
LICENSE apply apache license v2 to schematizer Nov 1, 2016
Makefile add make target to push swagger spec to swagger registery in schematizer Nov 17, 2016
Makefile-opensource added support for pushing OS schematizer to docker registry Oct 3, 2016
NOTICE apply apache license v2 to schematizer Nov 1, 2016
README.md updated README Nov 15, 2016
config-env-dev.yaml commit the service template. Jan 23, 2015
config-open-source.yaml added support for pushing OS schematizer to docker registry Oct 3, 2016
config.yaml schematizer make test is failing in OS mode trying to find topology file Nov 17, 2016
connection_sets.yaml Fall cleaning of schematizer tox/make Oct 1, 2015
deploy-blacklist.txt commit the service template. Jan 23, 2015
docker-compose-opensource.yml removing comment from docker-compose-opensource Sep 30, 2016
docker-compose.yml prepare schematizer project files for open sourcing Sep 30, 2016
fig-tools-opensource.yml remove git tag from latestopensource schematizer image Oct 5, 2016
fig-tools.yml fixed the fig-tools.yml Aug 7, 2015
requirements-internal.txt added support for pushing OS schematizer to docker registry Oct 3, 2016
requirements.txt remove ref of yelpcorp in open source mode from tox Nov 17, 2016
schematizer.wsgi remove dependendancy from yelp_lib in schematizer Sep 10, 2016
setup.py rename readme to readme.md Nov 2, 2016
test_everything.sh apply apache license v2 to schematizer Nov 1, 2016
topology.yaml schematizer make test is failing in OS mode trying to find topology file Nov 17, 2016
tox-opensource.ini removed docker push steps from travis Nov 22, 2016
tox.ini add option to specify docker-compose file Nov 2, 2016


Schematizer

What is it?

The Schematizer is a schema store service that tracks and manages all the schemas used in the Data Pipeline and provides features like automatic documentation support. We use Apache Avro to represent our schemas.

Read More

How to download

git clone git@github.com:Yelp/schematizer.git

Tests

Running unit tests

make -f Makefile-opensource test

Running integration tests

make -f Makefile-opensource itest

Setup and Configuration

  1. Create a MySQL database for the Schematizer service:
CREATE DATABASE <db_name> DEFAULT CHARACTER SET utf8;
  2. Create the MySQL tables in the <db_name> database:
cat schema/tables/*.sql | mysql <db_name>
  3. Create a topology.yaml file:
topology:
-   cluster: <schematizer_cluster_name>
    replica: master
    entries:
        - charset: utf8
          use_unicode: true
          host: <db_ip>
          db: <db_name>
          user: <db_user>
          passwd: <db_password>
          port: <db_port>
  4. In config.yaml, assign values to the following settings:
schematizer_cluster: <schematizer_cluster_name>

topology_path: /path/to/topology.yaml
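The topology.yaml file described above can also be generated by a script, which may help when the database settings differ per environment. The following is a minimal sketch using only the Python standard library. The function name `render_topology` and the sample values are hypothetical and not part of Schematizer; the layout mirrors the example above, and the default port of 3306 is MySQL's standard port.

```python
from string import Template

# Mirrors the topology.yaml layout shown in the setup steps.
TOPOLOGY_TEMPLATE = Template("""\
topology:
-   cluster: $cluster
    replica: master
    entries:
        - charset: utf8
          use_unicode: true
          host: $host
          db: $db
          user: $user
          passwd: $passwd
          port: $port
""")


def render_topology(cluster, host, db, user, passwd, port=3306):
    """Return the text of a topology.yaml file with a single master entry."""
    return TOPOLOGY_TEMPLATE.substitute(
        cluster=cluster, host=host, db=db, user=user, passwd=passwd, port=port
    )


if __name__ == "__main__":
    # Hypothetical values; replace them with your own database settings.
    print(render_topology("schematizer", "127.0.0.1", "schematizer_db", "dbuser", "secret"))
```

Redirect the printed output to the path you set as topology_path in config.yaml. The sketch does not quote values, so a password containing YAML special characters would need quoting by hand.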

Usage

Use serviceinitd/schematizer.py to start the Schematizer service.

Interacting directly with the Schematizer service

Registering a schema:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{
  "namespace": "test_namespace",
  "source_owner_email": "test@test.com",
  "source": "test_source",
  "contains_pii": false,
  "schema": "{\"type\":\"record\",\"namespace\":\"test_namespace\",\"source\":\"test_source\",\"name\":\"test_name\",\"doc\":\"test_doc\",\"fields\":[{\"type\":\"string\",\"doc\":\"test_doc1\",\"name\":\"key1\"},{\"type\":\"string\",\"doc\":\"test_doc2\",\"name\":\"key2\"}]}"
}' 'http://127.0.0.1:8888/v1/schemas/avro'

Getting a schema by ID:

curl -X GET --header 'Accept: text/plain' 'http://127.0.0.1:8888/v1/schemas/<schema_id>'
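The same two calls can be made from Python without the client library. The sketch below uses only the standard library and assumes the service is listening on 127.0.0.1:8888, as in the curl examples. The helper names (`build_register_request`, `build_get_request`, `send`) are hypothetical. Note that the Avro schema is sent as a JSON-encoded string inside the JSON body, which is why it is serialized twice.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8888"  # same host and port as the curl examples


def build_register_request(namespace, source, owner_email, avro_schema, contains_pii=False):
    """Build the POST request for /v1/schemas/avro without sending it."""
    payload = {
        "namespace": namespace,
        "source": source,
        "source_owner_email": owner_email,
        "contains_pii": contains_pii,
        # The Avro schema travels as a JSON-encoded string inside the JSON body.
        "schema": json.dumps(avro_schema),
    }
    return Request(
        BASE_URL + "/v1/schemas/avro",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/plain"},
        method="POST",
    )


def build_get_request(schema_id):
    """Build the GET request for /v1/schemas/<schema_id>."""
    return Request(
        "{}/v1/schemas/{}".format(BASE_URL, schema_id),
        headers={"Accept": "text/plain"},
        method="GET",
    )


def send(request):
    """Send a request and return the response body, assuming it is UTF-8 text."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a running service, `send(build_register_request(...))` submits the schema and `send(build_get_request(schema_id))` fetches it. Building the request separately from sending it makes the payload easy to inspect before anything reaches the service.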

Interacting with the Schematizer service using the Schematizer client library

Registering a schema:

from data_pipeline.schematizer_clientlib.schematizer import get_schematizer
test_avro_schema_json = {
    "type": "record",
    "namespace": "test_namespace",
    "source": "test_source",
    "name": "test_name",
    "doc": "test_doc",
    "fields": [
        {"type": "string", "doc": "test_doc1", "name": "key1"},
        {"type": "string", "doc": "test_doc2", "name": "key2"}
    ]
}
schema_info = get_schematizer().register_schema_from_schema_json(
    namespace="test_namespace",
    source="test_source",
    schema_json=test_avro_schema_json,
    source_owner_email="test@test.com",
    contains_pii=False
)

Getting a schema by ID:

from data_pipeline.schematizer_clientlib.schematizer import get_schematizer

schema_info = get_schematizer().get_schema_by_id(
    schema_id=schema_info.schema_id
)

Disclaimer

We're still in the process of setting up this service as a stand-alone service. Additional work may be required to run a Schematizer instance and integrate it with other applications.

License

Schematizer is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Contributing

Everyone is encouraged to contribute to Schematizer by forking the GitHub repository and making a pull request or opening an issue.