From c716c82d7a642089a0d6c16912a3c05bf0395c73 Mon Sep 17 00:00:00 2001 From: John Krauss Date: Thu, 9 Feb 2017 15:21:58 +0000 Subject: [PATCH 01/19] fixing code sippets and more validation info --- docs/source/testing.rst | 10 ++ docs/source/validation.rst | 191 ++++++++++++++++++++++++++++++++++++- 2 files changed, 200 insertions(+), 1 deletion(-) diff --git a/docs/source/testing.rst b/docs/source/testing.rst index 7de40d6b5d..8e7732c8f3 100644 --- a/docs/source/testing.rst +++ b/docs/source/testing.rst @@ -16,6 +16,8 @@ classes that don't need parameters set. Tests are run with: +.. code:: shell + make etl-unittest API unit tests @@ -24,6 +26,10 @@ API unit tests API unit tests make sure the observatory-extension, which reads data and metadata from the ETL, are working right. +.. code:: shell + + make extension-unittest + ``TODO`` Integration tests @@ -32,6 +38,10 @@ Integration tests Integration tests ensure that the data from the ETL that is set for deployment is is able to return a measure for every piece of metadata. +.. code:: shell + + make extension-autotest + ``TODO`` Diagnosing common issues in integration tests diff --git a/docs/source/validation.rst b/docs/source/validation.rst index 54d91a226d..99ce85dfa3 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -37,6 +37,8 @@ task once for each geography level, year, and month in that year. A generic example of using a :ref:`luigi.WrapperTask`: +.. code:: python + from luigi import WrapperTask, Task, Parameter class MyTask(Task): @@ -75,6 +77,8 @@ parameters, it would be possible to run a task redundantly. An example of this: +.. code:: python + from tasks.util import DownloadUnzipTask class MyBadTask(DownloadUnzipTask): @@ -96,6 +100,8 @@ Use default parameter values sparingly The above bad practice is easily paired with setting default values for parameters. For example: +.. 
code:: python + from tasks.util import DownloadUnzipTask class MyBadTask(DownloadUnzipTask): @@ -141,6 +147,8 @@ you store a meaningful and unique ``geom_ref`` from the same table. For example: +.. code:: python + from tasks.util import ColumnsTask from tasks.meta import OBSColumn, GEOM_REF from luigi import Parameter @@ -170,6 +178,23 @@ for both the ``geom`` and the ``geomref``. If the ``+ '+id'`` concatenation were missing, it would mean that the metadata model would not properly link geomrefs to the geometries they refer to. +Specify section, subsection, source tags and license tags for all columns +**************************************************** + +When defining your :ref:`tasks.meta.OBSColumn` objects in +a :ref:`tasks.util.ColumnsTask` class, make sure each column is assigned +a :ref:`tasks.util.OBSTag` of ``type`` ``section``, ``subsection``, ``source``, +and ``license``. Use shared tags from :ref:`tasks.tags` when possible, in +particular for ``section`` and ``subsection``. + +Specify unit tags for all measure columns +***************************************** + +When defining a :ref:`tasks.meta.OBSColumn` that will hold a measurement, make +sure to define a ``unit`` using a tag. This could be something like +``people``, ``money``, etc. There are standard units accessible in +:ref:`tasks.tags`. + Making sure ETL code works right -------------------------------- @@ -192,8 +217,12 @@ Provided :ref:`tasks.util.TableTask` and `tasks.util.ColumnTask` classes were executed, it's wise to jump into the database and check to make sure entries were made in those tables. +.. code:: shell + make psql +.. 
code:: sql + SELECT COUNT(*) FROM observatory.obs_column WHERE id LIKE 'path.to.module.%'; SELECT COUNT(*) FROM observatory.obs_table WHERE id LIKE 'path.to.module.%'; @@ -207,7 +236,7 @@ Delete old data to start from scratch to make sure everything works When using the proper utility classes, your data on disk, for example from downloads that are part of the ETL, will be saved to a file or folder -`tmp/module.name/ClassName_Args`. +``tmp/module.name/ClassName_Args``. In order to make sure the ETL is reproduceable, it's wise to delete this folder or move it to another location after development, and re-run to make @@ -219,4 +248,164 @@ Making sure metadata works right Checking the metadata works right is one of the more challenging components of QA'ing new ETL code. +Regenerate the ``obs_meta`` table +********************************* + +The ``obs_meta`` table is a denormalized view of the underlying :ref:`metadata` +objects that you've created when running tasks. + +You can force the regeneration of this table using +:ref:`tasks.carto.OBSMetaToLocal` + +.. code:: shell + + make -- run carto OBSMetaToLocal + +Once the table is generated, you can take a look at it in SQL: + +.. code:: shell + + make psql + +If the metadata is working correctly, you should have more entries in +``obs_meta`` than before. If you were starting from nothing, there should be +more than 0 rows in the table. + +.. code:: sql + + SELECT COUNT(*) FROM observatory.obs_meta; + +If you already had data, you can filter ``obs_meta`` to look for new rows with +a schema corresponding to what you added. For example, if you added metadata +columns and tables in ``tasks/mx/inegi``, you should look for columns with that +schema: + +.. code:: sql + + SELECT COUNT(*) FROM observatory.obs_meta WHERE numer_id LIKE 'mx.inegi.%'; + +If nothing is appearing in ``obs_meta``, chances are you are missing some +metadata: + +Have you defined and executed a :ref:`tasks.util.TableTask` that links to your columns? 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can check to see if these links exist by checking ``obs_column_table``: + +.. code:: shell + + make psql + +.. code:: sql + + SELECT COUNT(*) FROM observatory.obs_column_table + WHERE column_id LIKE 'my.schema.%' + AND table_id LIKE 'my.schema.%'; + +If they don't exist, make sure that your Python code roughly corresponds to: + +.. code:: python + + from tasks.util import ColumnsTask, TableTask + + class MyColumnsTask(ColumnsTask): + + def columns(self): + # Return OrderedDict of columns here + + class MyTableTask(TableTask): + + def timespan(self): + # Return timespan here + + def requires(self): + return { + 'columns': MyColumnsTask() + } + + def columns(self): + return self.input()['columns'] + + def populate(self): + # Populate the output table here + +Unless the :ref:`TableTask` returns some of the columns from :ref:`ColumnsTask` +in its own ``columns`` method, the links will not be initialized properly. + +Finally, double check that you actually ran the :ref:`TableTask` using ``make +-- run my.schema MyTableTask``. + +Are you defining ``geom_ref`` relationships properly? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In cases where a :ref:`TableTask` does not have its own geometries, at least +one of the columns returned from its ``columns`` method needs to be in +a ``geom_ref`` relationship. Here's an example: + +..
code:: python + + from collections import OrderedDict + + from tasks.util import ColumnsTask, TableTask + from tasks.meta import OBSColumn, GEOM_REF + + class MyGeomColumnsTask(ColumnsTask): + + def columns(self): + geom = OBSColumn(type='Geometry', name='My geom') + geomref = OBSColumn(type='Text', targets={ + geom: GEOM_REF + }) + return OrderedDict([ + ('geom', geom), + ('geomref', geomref) + ]) + + class MyColumnsTask(ColumnsTask): + + def columns(self): + # Return OrderedDict of columns here + + class MyTableTask(TableTask): + + def timespan(self): + # Return timespan here + + def requires(self): + return { + 'geom_columns': MyGeomColumnsTask(), + 'data_columns': MyColumnsTask() + } + + def columns(self): + cols = OrderedDict() + cols['geomref'] = self.input()['geom_columns']['geomref'] + cols.update(self.input()['data_columns']) + return cols + + def populate(self): + # Populate the output table here + +The above code would ensure that all columns existing inside ``MyTableTask`` +would be appropriately linked to any geometries that connect to ``geomref``. + +Regenerate and look at the Catalog +********************************** + +Once :ref:`tasks.carto.OBSMetaToLocal` has been run, you can generate the +catalog. + +.. code:: shell + + make catalog + +1. Are there any nasty typos or missing data? +2. Does the nesting look right? Are there columns not nested? +3. Are sources and licenses populated for all measures? +4. Is a table with a boundary/timespan matrix appearing beneath each measure?
+ +Upload to a test CARTO server +***************************** + +``TODO`` From 9a3ef4d56d332f5b91b7b129a4c01e5b19a1b507 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Fri, 10 Feb 2017 11:26:42 -0500 Subject: [PATCH 02/19] code block --- docs/source/development.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/development.rst b/docs/source/development.rst index beae0b1a4d..b4b2c1ef4d 100644 --- a/docs/source/development.rst +++ b/docs/source/development.rst @@ -80,6 +80,7 @@ overwriting all output it has already created. For example, if you have a :ref:`tasks.util.TempTableTask` that you've modified in the course of development and need to re-run: +``` from tasks.util import TempTableTask from tasks.meta import current_session @@ -90,6 +91,7 @@ development and need to re-run: session.execute(''' CREATE TABLE {} AS SELECT 'foo' AS mycol; ''') +``` Running ``make -- run path.to.module MyTempTable`` will only work once, even after making changes to the ``run`` method. From 9f3b494ed064d4d57db79d0ab3bbf43c6669b556 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Fri, 10 Feb 2017 11:33:32 -0500 Subject: [PATCH 03/19] several methods for re-running parts of ETL --- docs/source/development.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/source/development.rst b/docs/source/development.rst index b4b2c1ef4d..ee3d2cb1ef 100644 --- a/docs/source/development.rst +++ b/docs/source/development.rst @@ -64,12 +64,13 @@ over again. These tasks are meant to take care of the most repetitive aspects. :members: :show-inheritance: -Running pieces of the ETL +Running and Re-Running Pieces of the ETL ------------------------- When doing local development, it's advisable to run small pieces of the ETL locally to make sure everything works correctly. You can use the ``make -- -run`` helper, documented in :ref:`run-any-task`. +run`` helper, documented in :ref:`run-any-task`. 
There are several methods for +re-running pieces of the ETL, depending on the task; they are described below: Using ``--force`` during development ************************************ From a64c0275762bc180f558604d6fee96015c8dcfff Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Mon, 13 Feb 2017 10:48:58 -0500 Subject: [PATCH 04/19] change to rst codeblock --- docs/source/development.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/source/development.rst b/docs/source/development.rst index ee3d2cb1ef..3e3186bca7 100644 --- a/docs/source/development.rst +++ b/docs/source/development.rst @@ -81,7 +81,7 @@ overwriting all output it has already created. For example, if you have a :ref:`tasks.util.TempTableTask` that you've modified in the course of development and need to re-run: -``` +.. code:: python from tasks.util import TempTableTask from tasks.meta import current_session @@ -92,7 +92,6 @@ development and need to re-run: session.execute(''' CREATE TABLE {} AS SELECT 'foo' AS mycol; ''') -``` Running ``make -- run path.to.module MyTempTable`` will only work once, even after making changes to the ``run`` method. From 4b3e653ae447a24cb61166548c2df4a3d1d74391 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Mon, 13 Feb 2017 14:15:51 -0500 Subject: [PATCH 05/19] add `:ref:` to tasks.util.ColumnTask --- docs/source/validation.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index 99ce85dfa3..38c60fd6a6 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -213,7 +213,7 @@ When you use :ref:`run-any-task` to run individual components: * Are tables and columns being added to the ``observatory.obs_table`` and ``observatory.obs_column`` metadata tables?
-Provided :ref:`tasks.util.TableTask` and `tasks.util.ColumnTask` classes were +Provided :ref:`tasks.util.TableTask` and :ref:`tasks.util.ColumnTask` classes were executed, it's wise to jump into the database and check to make sure entries were made in those tables. From 40bb0a773027bc57c7a7593736dbfbcd103d7c38 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Mon, 13 Feb 2017 17:29:39 -0500 Subject: [PATCH 06/19] reformat "MyGeoColumnsTask" --- docs/source/validation.rst | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index 38c60fd6a6..f77c81104d 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -287,7 +287,7 @@ schema: If nothing is appearing in ``obs_meta``, chances are you are missing some metadata: -Have you defined and executed a :ref:`tasks.util.TableTask` that links to your columns? +Have you defined and executed a proper :ref:`tasks.util.TableTask`? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can check to see if these links exist by checking ``obs_column_table``: @@ -349,17 +349,20 @@ a ``geom_ref`` relationship. Here's an example: from tasks.util import ColumnsTask, TableTask from tasks.meta import OBSColumn, GEOM_REF - class MyGeomColumnsTask(ColumnsTask): - + class MyGeoColumnsTask(ColumnsTask): def columns(self): - geom = OBSColumn(type='Geometry', name='My geom') - geomref = OBSColumn(type='Text', targets={ - geom: GEOM_REF - }) - return OrderedDict([ - ('geom', geom), - ('geomref', geomref) - ]) + + geom = OBSColumn( + type='Geometry') + + geomref = OBSColumn( + type='Text', + targets={geom: GEOM_REF}) + + return OrderedDict([ + ('geom', geom), + ('geomref', geomref) + ]) class MyColumnsTask(ColumnsTask): @@ -373,7 +376,7 @@ a ``geom_ref`` relationship. 
Here's an example: def requires(self): return { - 'geom_columns': MyGeomColumnsTask(), + 'geom_columns': MyGeoColumnsTask(), 'data_columns': MyColumnsTask() } @@ -389,6 +392,8 @@ a ``geom_ref`` relationship. Here's an example: The above code would ensure that all columns existing inside ``MyTableTask`` would be appropriately linked to any geometries that connect to ``geomref``. + + Regenerate and look at the Catalog ********************************** From 5d9c5252baa050355776d4e05b4042216d66b4f4 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Mon, 13 Feb 2017 17:37:32 -0500 Subject: [PATCH 07/19] typo fix --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index d7fdd4f41f..c748f23613 100644 --- a/README.md +++ b/README.md @@ -28,7 +28,7 @@ Docker is available. [tasks/meta.py](https://github.com/CartoDB/bigmetadata/blob/master/tasks/meta.py#L76). There are six related tables, `obs_table`, `obs_column_table`, `obs_column`, `obs_column_tag`, `obs_tag`, and `obs_column_to_column`. An overarching - denomralized view can be found in `obs_meta`. + denormalized view can be found in `obs_meta`. * __Catalog__: a [static HTML guide](https://cartodb.github.io/bigmetadata) to data in the observatory generated from the metadata. Docs are generated From 1d8def3e5e005318281fb25e7b58a0a5e1268c98 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Mon, 13 Feb 2017 18:06:00 -0500 Subject: [PATCH 08/19] add "check obs_table for the_geom" --- docs/source/validation.rst | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index f77c81104d..c305f70f85 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -392,7 +392,36 @@ a ``geom_ref`` relationship. Here's an example: The above code would ensure that all columns existing inside ``MyTableTask`` would be appropriately linked to any geometries that connect to ``geomref``. 
+Do you have both the data and geometries in your table? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can check by running: +.. code:: sql + + SELECT * FROM observatory.obs_table + WHERE id LIKE 'my.schema.%'; + +If there is only one table and it has a null "the_geom" boundary, +then you are missing a geometry table. You will need to write a second +:ref:`TableTask` with follows the following structure: + +.. code:: python + class Geometry(TableTask): + def timespan(self): + # Return timespan here + + def requires(self): + return { + 'meta': MyGeoColumnsTask(), + 'data': RawGeometry() + } + + def columns(self): + return self.input()['meta'] + + def populate(self): + # Populate the output table here Regenerate and look at the Catalog ********************************** From cb4753782a540a7681264263a7211e9b566d7855 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Mon, 13 Feb 2017 18:28:12 -0500 Subject: [PATCH 09/19] fix code block format --- docs/source/validation.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index c305f70f85..c88115a702 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -396,6 +396,7 @@ Do you have both the data and geometries in your table? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can check by running: + .. code:: sql SELECT * FROM observatory.obs_table @@ -406,6 +407,7 @@ then you are missing a geometry table. You will need to write a second :ref:`TableTask` with follows the following structure: .. 
code:: python + class Geometry(TableTask): def timespan(self): From 19430c9e686021721c9ac90d12ef832f715b8cf6 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Mon, 13 Feb 2017 18:29:09 -0500 Subject: [PATCH 10/19] fix code block --- docs/source/development.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/development.rst b/docs/source/development.rst index 3e3186bca7..2d5ea06011 100644 --- a/docs/source/development.rst +++ b/docs/source/development.rst @@ -82,6 +82,7 @@ a :ref:`tasks.util.TempTableTask` that you've modified in the course of development and need to re-run: .. code:: python + from tasks.util import TempTableTask from tasks.meta import current_session From d560efff29c87170b006df1d5d0052ef16d179e8 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Tue, 14 Feb 2017 10:39:47 -0500 Subject: [PATCH 11/19] example for checking obs_table the_geom --- docs/source/validation.rst | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index c88115a702..8fa61038d6 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -403,8 +403,22 @@ You can check by running: WHERE id LIKE 'my.schema.%'; If there is only one table and it has a null "the_geom" boundary, -then you are missing a geometry table. You will need to write a second -:ref:`TableTask` with follows the following structure: +then you are missing a geometry table. For example: + +.. code:: sql + +SELECT * from observatory.obs_table +WHERE id LIKE 'es.ine.five_year_population%'; + +.. code:: shell + + id | tablename | timespan | the_geom | description | version +----------------------------------------+----------------------------------------------+----------+----------+-------------+--------- + es.ine.five_year_population_99914b932b | obs_24b656e9e23d1dac2c8ab5786a388f9bf0f4e5ae | 2015 | | | 5 +(1 row) + +Notice that the_geom is empty. 
You will need to write a second :ref:`TableTask` with the +following structure: .. code:: python From 41250c6646e6b2061fc32848f87149fb8dbbb307 Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Tue, 14 Feb 2017 10:41:17 -0500 Subject: [PATCH 12/19] codeblock format fix --- docs/source/validation.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index 8fa61038d6..e18af0c52b 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -407,15 +407,15 @@ then you are missing a geometry table. For example: .. code:: sql -SELECT * from observatory.obs_table -WHERE id LIKE 'es.ine.five_year_population%'; + SELECT * from observatory.obs_table + WHERE id LIKE 'es.ine.five_year_population%'; .. code:: shell - id | tablename | timespan | the_geom | description | version -----------------------------------------+----------------------------------------------+----------+----------+-------------+--------- - es.ine.five_year_population_99914b932b | obs_24b656e9e23d1dac2c8ab5786a388f9bf0f4e5ae | 2015 | | | 5 -(1 row) + id | tablename | timespan | the_geom | description | version + ----------------------------------------+----------------------------------------------+----------+----------+-------------+--------- + es.ine.five_year_population_99914b932b | obs_24b656e9e23d1dac2c8ab5786a388f9bf0f4e5ae | 2015 | | | 5 + (1 row) Notice that the_geom is empty. 
You will need to write a second :ref:`TableTask` with the following structure: From 76b6cfdcda8b94cd2dc30aa6b2da735e0cc2fb1a Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Tue, 14 Feb 2017 12:00:11 -0500 Subject: [PATCH 13/19] missing comma --- docs/source/validation.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index e18af0c52b..aa52e388d7 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -183,7 +183,7 @@ Specify section, subsection, source tags and license tags for all columns When defining your :ref:`tasks.meta.OBSColumn` objects in a :ref:`tasks.util.ColumnsTask` class, make sure each column is assigned -a :ref:`tasks.util.OBSTag` of ``type`` ``section``, ``subsection``, ``source``, +a :ref:`tasks.util.OBSTag` of ``type``, ``section``, ``subsection``, ``source``, and ``license``. Use shared tags from :ref:`tasks.tags` when possible, in particular for ``section`` and ``subsection``. From 833aa56b8f0f7133ceba9aaa3d789d14b42c119e Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Tue, 14 Feb 2017 12:03:24 -0500 Subject: [PATCH 14/19] Change heading type of "Catalog" --- docs/source/validation.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index aa52e388d7..83215c5a03 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -440,7 +440,7 @@ following structure: # Populate the output table here Regenerate and look at the Catalog -********************************** +-------------- Once :ref:`tasks.carto.OBSMetaToLocal` has been run, you can generate the catalog. 
From 2d9bcca37f122b038cfffa15d3991d68fb66152f Mon Sep 17 00:00:00 2001 From: Michelle Ho Date: Tue, 14 Feb 2017 12:04:07 -0500 Subject: [PATCH 15/19] Change heading type of "Upload to CARTO server" --- docs/source/validation.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index 83215c5a03..b0078f3797 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -455,7 +455,7 @@ catalog. 4. Is a table with a boundary/timespan matrix appearing beneath each measure? Upload to a test CARTO server -***************************** +-------------- ``TODO`` From 981173b8c82815968e25e1c8b8a4c2b69edc5463 Mon Sep 17 00:00:00 2001 From: John Krauss Date: Wed, 1 Mar 2017 19:24:44 +0000 Subject: [PATCH 16/19] adding comments --- docs/source/validation.rst | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index b0078f3797..5e491ac354 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -432,7 +432,7 @@ following structure: 'meta': MyGeoColumnsTask(), 'data': RawGeometry() } - + def columns(self): return self.input()['meta'] @@ -449,11 +449,34 @@ catalog. make catalog +You can view the generated Catalog in a browser window by going to the IP and +port address for the nginx process. The current processes are shown with +``docker-compose ps`` or ``make ps``. + 1. Are there any nasty typos or missing data? + + * Variable names should be unique, human-readable, and concise. If the + variable needs more in-depth definition, this should go in the + "description" of the variable. + 2. Does the nesting look right? Are there columns not nested? + + * Variables that are denominators should also have subcolumns of directly + nested variables. + + * There may be repetitive nesting if a variable is nested under two + denominators, which is fine. + 3.
Are sources and licenses populated for all measures? + + * A source and license :ref:`tasks.util.OBSTag` must be written for new + sources and licenses + 4. Is a table with a boundary/timespan matrix appearing beneath each measure? + * If not, hardcode the sample latitude and longitude in :ref:`tasks.meta.catalog_lonlat`. + + Upload to a test CARTO server -------------- From dc5ac82325f2d7d5e432bd6648431aa198f9fa96 Mon Sep 17 00:00:00 2001 From: John Krauss Date: Wed, 1 Mar 2017 19:56:39 +0000 Subject: [PATCH 17/19] finished validation docs --- docs/source/validation.rst | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/docs/source/validation.rst b/docs/source/validation.rst index 5e491ac354..30b1fd2aa4 100644 --- a/docs/source/validation.rst +++ b/docs/source/validation.rst @@ -480,5 +480,16 @@ port address for the nginx process. The current processes are shown with Upload to a test CARTO server -------------- -``TODO`` +If you set a ``CARTODB_API_KEY`` and ``CARTODB_URL`` in your ``.env`` file, in +the format: +.. code:: shell + + CARTODB_API_KEY=your_api_key + CARTODB_URL=https://username.carto.com + +You will now be able to upload your data and metadata to CARTO for previewing. + +.. 
code:: shell + + make sync From 9c54e4834881957c001802728c65f3cd92bfce8d Mon Sep 17 00:00:00 2001 From: John Krauss Date: Wed, 1 Mar 2017 19:59:50 +0000 Subject: [PATCH 18/19] updating docs --- docs/source/tasks.au.data.rst | 7 +++++++ docs/source/tasks.au.geo.rst | 7 +++++++ docs/source/tasks.au.rst | 18 ++++++++++++++++++ docs/source/tasks.poi.rst | 7 +++++++ docs/source/tasks.rst | 3 +++ docs/source/tasks.socrata.rst | 7 +++++++ docs/source/tasks.us.ny.nyc.acris.rst | 7 +++++++ docs/source/tasks.us.ny.nyc.columns.rst | 7 +++++++ docs/source/tasks.us.ny.nyc.dcp.rst | 7 +++++++ docs/source/tasks.us.ny.nyc.dob.rst | 7 +++++++ docs/source/tasks.us.ny.nyc.package.rst | 7 +++++++ docs/source/tasks.us.ny.nyc.rst | 21 +++++++++++++++++++++ docs/source/tasks.us.ny.rst | 17 +++++++++++++++++ docs/source/tasks.us.rst | 1 + 14 files changed, 123 insertions(+) create mode 100644 docs/source/tasks.au.data.rst create mode 100644 docs/source/tasks.au.geo.rst create mode 100644 docs/source/tasks.au.rst create mode 100644 docs/source/tasks.poi.rst create mode 100644 docs/source/tasks.socrata.rst create mode 100644 docs/source/tasks.us.ny.nyc.acris.rst create mode 100644 docs/source/tasks.us.ny.nyc.columns.rst create mode 100644 docs/source/tasks.us.ny.nyc.dcp.rst create mode 100644 docs/source/tasks.us.ny.nyc.dob.rst create mode 100644 docs/source/tasks.us.ny.nyc.package.rst create mode 100644 docs/source/tasks.us.ny.nyc.rst create mode 100644 docs/source/tasks.us.ny.rst diff --git a/docs/source/tasks.au.data.rst b/docs/source/tasks.au.data.rst new file mode 100644 index 0000000000..876203876f --- /dev/null +++ b/docs/source/tasks.au.data.rst @@ -0,0 +1,7 @@ +tasks.au.data module +==================== + +.. 
automodule:: tasks.au.data + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.au.geo.rst b/docs/source/tasks.au.geo.rst new file mode 100644 index 0000000000..dfb64b40a2 --- /dev/null +++ b/docs/source/tasks.au.geo.rst @@ -0,0 +1,7 @@ +tasks.au.geo module +=================== + +.. automodule:: tasks.au.geo + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.au.rst b/docs/source/tasks.au.rst new file mode 100644 index 0000000000..30e43849af --- /dev/null +++ b/docs/source/tasks.au.rst @@ -0,0 +1,18 @@ +tasks.au package +================ + +Submodules +---------- + +.. toctree:: + + tasks.au.data + tasks.au.geo + +Module contents +--------------- + +.. automodule:: tasks.au + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.poi.rst b/docs/source/tasks.poi.rst new file mode 100644 index 0000000000..bd671e8993 --- /dev/null +++ b/docs/source/tasks.poi.rst @@ -0,0 +1,7 @@ +tasks.poi module +================ + +.. automodule:: tasks.poi + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.rst b/docs/source/tasks.rst index 9281f7476a..9b20a42952 100644 --- a/docs/source/tasks.rst +++ b/docs/source/tasks.rst @@ -6,6 +6,7 @@ Subpackages .. toctree:: + tasks.au tasks.br tasks.ca tasks.es @@ -23,7 +24,9 @@ Submodules tasks.carto tasks.meta + tasks.poi tasks.sched + tasks.socrata tasks.sphinx tasks.tags tasks.util diff --git a/docs/source/tasks.socrata.rst b/docs/source/tasks.socrata.rst new file mode 100644 index 0000000000..460f19ddd7 --- /dev/null +++ b/docs/source/tasks.socrata.rst @@ -0,0 +1,7 @@ +tasks.socrata module +==================== + +.. 
automodule:: tasks.socrata + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.us.ny.nyc.acris.rst b/docs/source/tasks.us.ny.nyc.acris.rst new file mode 100644 index 0000000000..106c5343c9 --- /dev/null +++ b/docs/source/tasks.us.ny.nyc.acris.rst @@ -0,0 +1,7 @@ +tasks.us.ny.nyc.acris module +============================ + +.. automodule:: tasks.us.ny.nyc.acris + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.us.ny.nyc.columns.rst b/docs/source/tasks.us.ny.nyc.columns.rst new file mode 100644 index 0000000000..84919e6f9f --- /dev/null +++ b/docs/source/tasks.us.ny.nyc.columns.rst @@ -0,0 +1,7 @@ +tasks.us.ny.nyc.columns module +============================== + +.. automodule:: tasks.us.ny.nyc.columns + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.us.ny.nyc.dcp.rst b/docs/source/tasks.us.ny.nyc.dcp.rst new file mode 100644 index 0000000000..4132cf507f --- /dev/null +++ b/docs/source/tasks.us.ny.nyc.dcp.rst @@ -0,0 +1,7 @@ +tasks.us.ny.nyc.dcp module +========================== + +.. automodule:: tasks.us.ny.nyc.dcp + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.us.ny.nyc.dob.rst b/docs/source/tasks.us.ny.nyc.dob.rst new file mode 100644 index 0000000000..d6816bcd2c --- /dev/null +++ b/docs/source/tasks.us.ny.nyc.dob.rst @@ -0,0 +1,7 @@ +tasks.us.ny.nyc.dob module +========================== + +.. automodule:: tasks.us.ny.nyc.dob + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.us.ny.nyc.package.rst b/docs/source/tasks.us.ny.nyc.package.rst new file mode 100644 index 0000000000..470b1cc11d --- /dev/null +++ b/docs/source/tasks.us.ny.nyc.package.rst @@ -0,0 +1,7 @@ +tasks.us.ny.nyc.package module +============================== + +.. 
automodule:: tasks.us.ny.nyc.package + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.us.ny.nyc.rst b/docs/source/tasks.us.ny.nyc.rst new file mode 100644 index 0000000000..9be59a91ca --- /dev/null +++ b/docs/source/tasks.us.ny.nyc.rst @@ -0,0 +1,21 @@ +tasks.us.ny.nyc package +======================= + +Submodules +---------- + +.. toctree:: + + tasks.us.ny.nyc.acris + tasks.us.ny.nyc.columns + tasks.us.ny.nyc.dcp + tasks.us.ny.nyc.dob + tasks.us.ny.nyc.package + +Module contents +--------------- + +.. automodule:: tasks.us.ny.nyc + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.us.ny.rst b/docs/source/tasks.us.ny.rst new file mode 100644 index 0000000000..b793352bf8 --- /dev/null +++ b/docs/source/tasks.us.ny.rst @@ -0,0 +1,17 @@ +tasks.us.ny package +=================== + +Subpackages +----------- + +.. toctree:: + + tasks.us.ny.nyc + +Module contents +--------------- + +.. automodule:: tasks.us.ny + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/tasks.us.rst b/docs/source/tasks.us.rst index 639aecbf3d..fda0b02949 100644 --- a/docs/source/tasks.us.rst +++ b/docs/source/tasks.us.rst @@ -8,6 +8,7 @@ Subpackages tasks.us.census tasks.us.epa + tasks.us.ny Submodules ---------- From 89ccc5a7f3fa956bb042944cbed0e85f8bb91590 Mon Sep 17 00:00:00 2001 From: John Krauss Date: Thu, 2 Mar 2017 15:15:45 +0000 Subject: [PATCH 19/19] notes on adding observatory-extension --- docs/source/testing.rst | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/docs/source/testing.rst b/docs/source/testing.rst index 8e7732c8f3..1ef0b12fca 100644 --- a/docs/source/testing.rst +++ b/docs/source/testing.rst @@ -11,39 +11,49 @@ ETL unit tests Unit tests ensure that there are no errors in the underlying utility classes that could cause errors in code you build on top of them. 
-The tests also provide limited coverage for simple :ref:`tasks.util.ColumnTask` -classes that don't need parameters set. - Tests are run with: .. code:: shell make etl-unittest +.. +Metadata integration tests +-------------------------- + +Integration tests make sure that the metadata being generated as part of your +ETL will actually be queryable by the API. For example, if you have an ETL +that ingests data but does not +.. + API unit tests -------------- API unit tests make sure the observatory-extension, which reads data and metadata from the ETL, are working right. +In order for this to function, you'll need to clone a copy of +``observatory-extension`` into the root of the ``bigmetadata`` repo. + .. code:: shell + git clone git@github.com:cartodb/observatory-extension make extension-unittest -``TODO`` - Integration tests ----------------- Integration tests ensure that the data from the ETL that is set for deployment is is able to return a measure for every piece of metadata. +As above, you'll need a copy of ``observatory-extension`` locally for this test +to work. + .. code:: shell + git clone git@github.com:cartodb/observatory-extension make extension-autotest -``TODO`` - Diagnosing common issues in integration tests ---------------------------------------------