Documentation updates #4086

Merged
merged 3 commits into from
Dec 28, 2023
24 changes: 3 additions & 21 deletions docs/source/components/dkan_metastore.rst
@@ -14,7 +14,7 @@ as precise as the geographic boundaries or relevant time period the data applies
Some more details of DKAN's metastore:

* The data assets themselves (usually in the form of local files or URLs to data files) are referred to internally in DKAN as *resources*.
* The structure and format of dataset metadata in DKAN are determined by a `JSON schema <https://json-schema.org/>`_. By default, DKAN provides and utilizes the `DCAT-US metadata schema <https://resources.data.gov/resources/dcat-us/>`_ to store datasets, but :ref:`custom schemas <custom_schema>` can be added to the codebase to override this.
* The structure and format of dataset metadata in DKAN are determined by a `JSON schema <https://json-schema.org/>`_. By default, DKAN provides and utilizes the `DCAT-US metadata schema <https://resources.data.gov/resources/dcat-us/>`_ to store datasets, but :doc:`custom schemas <../user-guide/guide_custom_schemas>` can be added to the codebase to override this.
* In DCAT-US, resources are placed in a sub-schema of the parent dataset called a *distribution*.

.. image:: https://project-open-data.cio.gov/v1.1/schema-diagram.svg
@@ -26,28 +26,10 @@ Some more details of DKAN's metastore:
Read the documentation on :doc:`How to add a Dataset <../user-guide/guide_dataset>` to get started adding information to the metastore.


Configuration
-------------
.. _custom_schema:

Changing your dataset schema
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Replacing the dataset schema in DKAN allows you to add fields and conform to additional specifications.
As long as you provide a valid JSON schema, any information going into the metastore will be validated against it.

To change the schema being used, copy the contents of the `schema` directory from the DKAN repo and place
it in the root of your Drupal installation (docroot/schema). Then make any modifications necessary to the
`dataset.json` file inside the `collections` directory. Note that even if you are only changing the
dataset.json schema, it is important to copy ALL of the schema files as DKAN will be expecting all of the
schema files to be in the same location.
.. note::

.. warning::
Read the documentation on :doc:`Changing your dataset schema <../user-guide/guide_custom_schemas>` to learn how to add custom fields.

Warning: The schema is actively used by the catalog to verify the validity of the data.
Making changes to the schema after data is present in the catalog should be done with care
as non-backward-compatible changes to the schema could cause issues.
Look at ``Drupal::metastore::SchemaRetriever::findSchemaDirectory()`` for context.

Data Dictionaries
-----------------
32 changes: 32 additions & 0 deletions docs/source/user-guide/guide_custom_schemas.rst
@@ -0,0 +1,32 @@
How to customize the dataset schema
====================================

If you have additional specification requirements, replacing the dataset schema in DKAN will allow you to store
additional metadata for a dataset beyond the :doc:`default metadata <../components/dkan_metastore>`.
As long as you provide a valid JSON schema, any information going into the metastore will be validated against it.

To change the schema being used, copy the entire ``schema/collections`` directory from the DKAN repository into your docroot (``docroot/schema/collections``).
Then make any necessary modifications to the ``dataset.json`` file inside the ``collections`` directory. Add your custom field(s) under ``"properties"``.

.. code:: json

    {
      "properties": {
        "myNewField": {
          "title": "Custom Field",
          "description": "Some descriptive text.",
          "type": "string"
        }
      }
    }

You can remove metadata fields, with the exception of the *distribution* property, which must remain present in the ``dataset.json`` file.
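
The rules above (custom fields live under ``"properties"``, and *distribution* must stay) can be checked before the modified schema is deployed. The following is a minimal Python sketch, not part of DKAN: the function name ``check_dataset_schema`` and the inline ``CUSTOM_SCHEMA`` excerpt are hypothetical, and in practice the schema would be loaded from your own ``dataset.json``.

```python
import json

# Hypothetical excerpt of a customized dataset.json (not the full DCAT-US schema).
CUSTOM_SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "distribution": {"type": "array"},
    "myNewField": {
      "title": "Custom Field",
      "description": "Some descriptive text.",
      "type": "string"
    }
  }
}
""")


def check_dataset_schema(schema, custom_fields=()):
    """Return a list of problems found in a customized dataset schema."""
    problems = []
    properties = schema.get("properties", {})
    # The guide states that "distribution" must remain in dataset.json.
    if "distribution" not in properties:
        problems.append('missing required property "distribution"')
    # Each custom field should sit under "properties" and declare a type.
    for name in custom_fields:
        if name not in properties:
            problems.append(f'custom field "{name}" is not under "properties"')
        elif "type" not in properties[name]:
            problems.append(f'custom field "{name}" has no "type"')
    return problems


print(check_dataset_schema(CUSTOM_SCHEMA, ["myNewField"]))
```

An empty list means the checks passed. This only covers the two rules stated in this guide; it does not replace full JSON Schema validation.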

Even if you are only changing the ``dataset.json`` schema, copy all of the schema files, because DKAN expects
them all to be in the same location.

.. warning::

   The schema is actively used by the catalog to verify the validity of the data.
   Making changes to the schema after data is present in the catalog should be done with care,
   as non-backward-compatible changes to the schema could cause issues.
   See ``Drupal\metastore\SchemaRetriever::findSchemaDirectory()`` for context.
14 changes: 7 additions & 7 deletions docs/source/user-guide/guide_file_fetchers.rst
@@ -1,7 +1,7 @@
Implementing a custom file fetcher
----------------------------------
==================================

DKAN uses a library called [getdkan/file-fetcher](https://github.com/GetDKAN/file-fetcher). This library allows developers to extend the file transfer functionality for their specialized needs.
DKAN uses a library called `getdkan/file-fetcher <https://github.com/GetDKAN/file-fetcher>`_. This library allows developers to extend the file transfer functionality for their specialized needs.

This library is used to download a resource, such as a CSV file, so that it can be loaded into the database and presented through the UI and API. This process is called "localization," because the source resource is copied to the local file system. Usually, this downloaded copy is temporary and is eventually removed.

@@ -10,17 +10,17 @@ The standard file fetcher processors will probably be adequate for most uses, bu
In cases such as these, we might want to add our own processor class to extend the file fetcher functionality.

How-to:
=======
-------

Note that a code example can be found in the ``custom_processor_test`` module, which is used to test this functionality.

Create a file processor class
_____________________________
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To implement a new file processor, create a custom file fetcher processor class. This class could extend ``FileFetcher\Processor\Remote`` or ``FileFetcher\Processor\Local``, or be a totally new implementation of ``FileFetcher\Processor\ProcessorInterface``.

Create a FileFetcherFactory
___________________________
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, create a new file fetcher factory class. This class should emulate ``Drupal\common\FileFetcher\FileFetcherFactory``. There is example code in the ``custom_processor_test`` module which demonstrates how to do this.

@@ -39,7 +39,7 @@ The new factory should create and configure a ``FileFetcher\FileFetcher`` object
}

Declare your factory as a service
_________________________________
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is also very important to declare your new factory class as a service. You accomplish this by decorating ``dkan.common.file_fetcher`` in your module's ``*.services.yml`` file, something like this:

@@ -53,7 +53,7 @@ It is also very important to declare your new factory class as a service. You ac
Now whenever DKAN uses the ``dkan.common.file_fetcher`` service, your file fetcher factory will be used instead, and your new processor will find its way into use.

Processor negotiation
=====================
---------------------

It's important to know how ``FileFetcher`` goes about choosing a processor.

30 changes: 28 additions & 2 deletions docs/source/user-guide/guide_harvest.rst
@@ -4,7 +4,7 @@ How to create a harvest
Use drush commands to :term:`Harvest` data into your catalog.

Create a harvest JSON file
__________________________
--------------------------

Normally you would use the data.json provided by another data catalog and "harvest" the datasets into your catalog.
But harvests can also be used for bulk management of datasets from a manually generated data.json file.
@@ -81,11 +81,37 @@ Run the harvest

View the status of the harvest
------------------------------
Navigate to *admin/dkan/harvest* to view the status of the extraction,

.. prompt:: bash $

drush dkan:harvest:status myHarvestId

You can also navigate to *admin/dkan/harvest* to view the status of the extraction,
the date the harvest was run, and the number of datasets that were added
by the harvest. By clicking on the harvest ID, you will also see specific
information about each dataset, and the status of the datastore import.

Revert a harvest
----------------

.. prompt:: bash $

drush dkan:harvest:revert myHarvestId

This will delete all of the datasets listed in the specified harvest ID. Any referenced
terms will be set to the orphaned state. Any distributions from the harvest will be unpublished.
Alternatively, you can run ``dkan:harvest:archive`` to unpublish the datasets without
deleting them from your catalog.

Deregister a harvest
--------------------

.. prompt:: bash $

drush dkan:harvest:deregister myHarvestId

This will remove the harvest plan.
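
The status, revert, and deregister commands above share one shape, so a maintenance script can build them from a harvest ID. The following is a minimal Python sketch under stated assumptions: the helper names ``harvest_command`` and ``run_harvest_command`` are hypothetical, ``drush`` is assumed to be on the ``PATH`` of a DKAN site, and ``dkan:harvest:archive`` is assumed to take the harvest ID in the same way as the other commands.

```python
import subprocess

# Subcommands named in this guide; anything else is rejected.
HARVEST_ACTIONS = {"status", "revert", "archive", "deregister"}


def harvest_command(action, harvest_id):
    """Build the drush argument list for one harvest action."""
    if action not in HARVEST_ACTIONS:
        raise ValueError(f"unsupported harvest action: {action}")
    return ["drush", f"dkan:harvest:{action}", harvest_id]


def run_harvest_command(action, harvest_id):
    """Run the command and raise if drush exits with an error.

    Assumes drush is installed and the working directory is a DKAN site.
    """
    return subprocess.run(harvest_command(action, harvest_id), check=True)


# Building a command has no side effects, so it is safe to inspect first.
print(harvest_command("status", "myHarvestId"))
```

Keeping command construction separate from execution lets a script log or review a destructive action such as ``revert`` before it runs.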

Transforms
----------
If you would also like to make changes to the data you are harvesting,
2 changes: 1 addition & 1 deletion docs/source/user-guide/guide_metastore_properties.rst
@@ -8,7 +8,7 @@ can be accessed via an API endpoint. Learn about the `API <https://demo.getdkan

You can customize which sub-elements generate additional data nodes at ``admin/dkan/properties``.

.. image:: https://dkan-documentation-files.s3.us-east-2.amazonaws.com/dkan2/metastore-referencer-config.png
.. image:: images/metastore-referencer-config.png

When the values of these elements change or become outdated, the corresponding data node will be unpublished by the **orphan_reference_processor** queue task.

1 change: 1 addition & 0 deletions docs/source/user-guide/index.rst
@@ -10,6 +10,7 @@ User Guide
guide_api
guide_datastore_settings
guide_metastore_properties
guide_custom_schemas
guide_harvest
guide_queues
guide_local_files