diff --git a/docs/source/components/dkan_metastore.rst b/docs/source/components/dkan_metastore.rst
index 8154191a1d..be939cefdd 100644
--- a/docs/source/components/dkan_metastore.rst
+++ b/docs/source/components/dkan_metastore.rst
@@ -14,7 +14,7 @@ as precise as the geographic boundaries or relevant time period the data applies
 Some more details of DKAN's metastore:
 
 * The data assets themselves (usually in the form of local files or URLs to data files) are referred to internally in DKAN as *resources*.
-* The structure and format of dataset metadata in DKAN are determined by a `JSON schema `_. By default, DKAN provides and utilizes the `DCAT-US metadata schema `_ to store datasets, but :ref:`custom schemas <custom_schema>` can be added to the codebase to override this.
+* The structure and format of dataset metadata in DKAN are determined by a `JSON schema `_. By default, DKAN provides and utilizes the `DCAT-US metadata schema `_ to store datasets, but :doc:`custom schemas <../user-guide/guide_custom_schemas>` can be added to the codebase to override this.
 * In DCAT-US, resources are placed in a sub-schema of the parent dataset called a *distribution*.
 
 .. image:: https://project-open-data.cio.gov/v1.1/schema-diagram.svg
@@ -26,28 +26,10 @@ Some more details of DKAN's metastore:
 Read the documentation on :doc:`How to add a Dataset <../user-guide/guide_dataset>` to get started adding information to the metastore.
 
-Configuration
--------------
-.. _custom_schema:
-
-Changing your dataset schema
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Replacing the dataset schema in DKAN allows you to add fields and conform to additional specifications.
-As long as you provide a valid JSON schema, any information going into the metastore will be validated against it.
-
-To change the schema being used, copy the contents of the `schema` directory from the DKAN repo and place
-it in the root of your Drupal installation (docroot/schema). Then make any modifications necessary to the
-`dataset.json` file inside the `collections` directory. Note that even if you are only changing the
-dataset.json schema, it is important to copy ALL of the schema files as DKAN will be expecting all of the
-schema files to be in the same location.
+.. note::
 
-.. warning::
+   Read the documentation on :doc:`Changing your dataset schema <../user-guide/guide_custom_schemas>` to learn how to add custom fields.
-
-   Warning: The schema is actively used by the catalog to verify the validity of the data.
-   Making changes to the schema after data is present in the catalog should be done with care
-   as non-backward-compatible changes to the schema could cause issues.
-   Look at ``Drupal::metastore::SchemaRetriever::findSchemaDirectory()`` for context.
 
 Data Dictionaries
 -----------------
diff --git a/docs/source/user-guide/guide_custom_schemas.rst b/docs/source/user-guide/guide_custom_schemas.rst
new file mode 100644
index 0000000000..2ec2aba1c2
--- /dev/null
+++ b/docs/source/user-guide/guide_custom_schemas.rst
@@ -0,0 +1,32 @@
+How to customize the dataset schema
+====================================
+
+If you have additional specification requirements, replacing the dataset schema in DKAN will allow you to store
+additional metadata for a dataset beyond the :doc:`default metadata <../components/dkan_metastore>`.
+As long as you provide a valid JSON schema, any information going into the metastore will be validated against it.
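+
+If you want to review the schema your site is currently using before replacing it, one option is to request
+it from the metastore API, roughly as sketched below. The host name is a placeholder, and the endpoint path
+assumes DKAN's standard metastore API routes; verify both against your own installation.
+
+.. prompt:: bash $
+
+   # Path assumes the standard DKAN metastore API; adjust for your site.
+   curl https://mysite.example/api/1/metastore/schemas/dataset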
+
+To change the schema being used, copy the entire `schema/collections` directory out of the DKAN module to your docroot directory (docroot/schema/collections).
+Then make any modifications necessary to the `dataset.json` file inside the `collections` directory. Add your custom field(s) under "properties".
+
+.. code:: json
+
+   "properties": {
+     "myNewField": {
+       "title": "Custom Field",
+       "description": "Some descriptive text.",
+       "type": "string"
+     }
+   }
+
+You can remove metadata fields, with the exception of the *distribution* property; it must remain present in the dataset.json file.
+
+Note that even if you are only changing the dataset.json schema, it is important to copy ALL of the schema files, as DKAN expects
+all of the schema files to be in the same location.
+
+.. warning::
+
+   The schema is actively used by the catalog to verify the validity of the data.
+   Making changes to the schema after data is present in the catalog should be done with care,
+   as non-backward-compatible changes to the schema could cause issues.
+   Look at ``Drupal::metastore::SchemaRetriever::findSchemaDirectory()`` for context.
diff --git a/docs/source/user-guide/guide_file_fetchers.rst b/docs/source/user-guide/guide_file_fetchers.rst
index 98fe75f3a6..37ab2fe5a8 100644
--- a/docs/source/user-guide/guide_file_fetchers.rst
+++ b/docs/source/user-guide/guide_file_fetchers.rst
@@ -1,7 +1,7 @@
 Implementing a custom file fetcher
-----------------------------------
+==================================
 
-DKAN uses a library called [getdkan/file-fetcher](https://github.com/GetDKAN/file-fetcher). This library allows developers to extend the file transfer functionality for their specialized needs.
+DKAN uses a library called `getdkan/file-fetcher <https://github.com/GetDKAN/file-fetcher>`_. This library allows developers to extend the file transfer functionality for their specialized needs.
 
 This library is used to download a resource, such as a CSV file, so that it can be loaded into the database and presented through the UI and API. This process is called "localization," because the source resource is copied to the local file system. Usually, this downloaded copy is temporary and is eventually removed.
 
@@ -10,17 +10,17 @@ The standard file fetcher processors will probably be adequate for most uses, bu
 In cases such as these, we might want to add our own processor class to extend the file fetcher functionality.
 
 How-to:
-=======
+-------
 
 Note that a code example can be found in the ``custom_processor_test`` module, which is used to test this functionality.
 
 Create a file processor class
-_____________________________
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 To implement a new file processor, create a custom file fetcher processor class. This class could extend ``FileFetcher\Processor\Remote`` or ``FileFetcher\Processor\Local``, or be a totally new implementation of ``FileFetcher\Processor\ProcessorInterface``.
 
 Create a FileFetcherFactory
-___________________________
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Next, create a new file fetcher factory class. This class should emulate ``Drupal\common\FileFetcher\FileFetcherFactory``. There is example code in the ``custom_processor_test`` module which demonstrates how to do this.
 
@@ -39,7 +39,7 @@ The new factory should create and configure a ``FileFetcher\FileFetcher`` object
     }
 
 Declare your factory as a service
-_________________________________
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 It is also very important to declare your new factory class as a service. You accomplish this by decorating ``dkan.common.file_fetcher`` in your module's ``*.services.yml`` file, something like this:
@@ -53,7 +53,7 @@ It is also very important to declare your new factory class as a service. You ac
 Now whenever DKAN uses the ``dkan.common.file_fetcher`` service, your file fetcher factory will be used instead, and your new processor will find its way into use.
 
 Processor negotiation
-=====================
+---------------------
 
 It's important to know how ``FileFetcher`` goes about choosing a processor.
diff --git a/docs/source/user-guide/guide_harvest.rst b/docs/source/user-guide/guide_harvest.rst
index 1c9febf4c1..e049872259 100644
--- a/docs/source/user-guide/guide_harvest.rst
+++ b/docs/source/user-guide/guide_harvest.rst
@@ -4,7 +4,7 @@ How to create a harvest
 Use drush commands to :term:`Harvest` data into your catalog.
 
 Create a harvest JSON file
-__________________________
+--------------------------
 
 Normally you would use the data.json provided by another data catalog and "harvest" the datasets into your catalog.
 But harvests can also be used for bulk management of datasets from a manually generated data.json file.
@@ -81,11 +81,37 @@ Run the harvest
 View the status of the harvest
 ------------------------------
-Navigate to *admin/dkan/harvest* to view the status of the extraction,
+
+   .. prompt:: bash $
+
+      drush dkan:harvest:status myHarvestId
+
+You can also navigate to *admin/dkan/harvest* to view the status of the extraction,
 the date the harvest was run, and the number of datasets that were added by the harvest.
 By clicking on the harvest ID, you will also see specific information about each dataset,
 and the status of the datastore import.
 
+Revert a harvest
+----------------
+
+   .. prompt:: bash $
+
+      drush dkan:harvest:revert myHarvestId
+
+This will delete all of the datasets listed in the specified harvest. Any referenced
+terms will be set to the orphaned state. Any distributions from the harvest will be unpublished.
+Alternatively, you could run ``dkan:harvest:archive`` to unpublish the datasets without
+deleting them from your catalog.
+
+Deregister a harvest
+--------------------
+
+   .. prompt:: bash $
+
+      drush dkan:harvest:deregister myHarvestId
+
+This will remove the harvest plan.
+
 Transforms
 ----------
 
 If you would also like to make changes to the data you are harvesting,
diff --git a/docs/source/user-guide/guide_metastore_properties.rst b/docs/source/user-guide/guide_metastore_properties.rst
index a9ffdf3e7c..524b0cb056 100644
--- a/docs/source/user-guide/guide_metastore_properties.rst
+++ b/docs/source/user-guide/guide_metastore_properties.rst
@@ -8,7 +8,7 @@ can be accessed via an API endpoint. Learn about the `API