How To Add A Dataset

gchristiana edited this page Jul 23, 2020 · 2 revisions

The short-term plan is to describe every dataset with JSON metadata that is simple enough to create by hand, yet close enough to existing standards that any future open data portal can consume it.

How To Add A New Dataset

  • Ensure the data is public or that we have the rights to republish it. Anything covered by open meeting laws, or that the town provides in response to a records request, is certainly OK.
  • Format the data into a simple CSV with rows and columns. Understand what each column means; going forward we should work on a standard for providing keys for column names/types.
  • Decide which approximate functional category the data belongs in. Current dir structure (propose improvements!):
    • arlingtonma.info/tree/master/docs/data
      • /demographics - population & classifications thereof, natural resources (road miles per town, square area, etc.)
      • /finance - budgets, money, salaries, expenses.
      • /governance - annual reports, org charts, structural information about town government.
      • /voting - voting data for elections, town meeting, etc.
      • /other? - what other high-level categories should we use?
  • If the original data source (one you downloaded from the town website, for example) is already a file of row/column data, reuse that file's name. Otherwise, choose a descriptive filename with the extension .csv.
  • Check your my_data_file.csv file into the appropriate dir.
  • Check in a my_data_file.json metadata file in the same dir using the format below.
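
The last two steps can be sketched in Python. This is a minimal sketch rather than the site's own tooling: the function names are hypothetical, the field names are taken from DCAT-US v1.1, and the relative downloadURL is an assumption, so compare the output against the sample Arlington .json file before checking it in.

```python
import csv
import json
from datetime import date
from pathlib import Path


def column_names(csv_path):
    """Return the header row of a CSV so each column can be reviewed by hand."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def build_metadata(csv_path, title, description, keywords, landing_page=None):
    """Return a DCAT-US-style metadata dict for one CSV dataset (hypothetical helper)."""
    csv_path = Path(csv_path)
    metadata = {
        "title": title,
        "description": description,
        "keyword": list(keywords),
        # Date the metadata was generated, in ISO 8601 form.
        "modified": date.today().isoformat(),
        "distribution": [
            {
                "title": csv_path.name,
                "mediaType": "text/csv",
                # Assumption: the CSV sits in the same dir as the .json file.
                "downloadURL": csv_path.name,
            }
        ],
    }
    if landing_page:
        metadata["landingPage"] = landing_page
    return metadata


def write_metadata(csv_path, **fields):
    """Write my_data_file.json next to my_data_file.csv and return its path."""
    csv_path = Path(csv_path)
    json_path = csv_path.with_suffix(".json")
    metadata = build_metadata(csv_path, **fields)
    json_path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return json_path
```

Calling write_metadata("my_data_file.csv", title="...", description="...", keywords=["..."]) would leave a my_data_file.json beside the CSV, ready to edit by hand and check in.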

Notes On JSON Metadata

The JSON schema is based on DCAT-US Schema v1.1 (Project Open Data Metadata Schema) (see sample files from US).

Note that we don't currently enforce strict JSON schema compatibility; for the time being, the .json format should be easy enough to write by hand and to display using simple Jekyll/Liquid templates - see _includes/dataset.html et al.

See a sample Arlington dataset .json file for population data, and a sample Arlington data catalog (collection of datasets).

  • We aren't using the additional fields the US Federal government defines (like bureauCode, programCode, etc.).
  • Notes on other metadata fields:
    • describedBy and references are two ways to point to human-readable pages that describe the data in greater context.
    • landingPage is another good way to provide organizational context: point it at the homepage of the board/committee responsible for a document.
    • Question: how should we try to use contactPoint? For now, we're republishing data that is sourced to various town departments; it's not clear if it's effective for us to point end-users directly at the relevant town employee (yet), or if that would just cause confusion if site readers started emailing the town directly.
    • TODO: come up with a way to assign identifiers.
    • TODO: come up with taxonomy for keyword and theme metadata fields.
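
Because the files are written by hand and not validated against the full schema, a lightweight check can catch common slips. The sketch below is an illustration only: EXPECTED_FIELDS is an assumed list rather than a project rule, and the function names are hypothetical. It relies on DCAT-US v1.1 defining describedBy and landingPage as single URLs and references as a list of URLs.

```python
import json
from urllib.parse import urlparse

# Assumption: fields a hand-written file is expected to carry; adjust as needed.
EXPECTED_FIELDS = ("title", "description", "keyword", "modified", "distribution")
# DCAT-US v1.1 fields that hold a single URL.
URL_FIELDS = ("describedBy", "landingPage")


def is_url(value):
    """Return True for strings that parse as http(s) URLs with a host."""
    parts = urlparse(value) if isinstance(value, str) else None
    return bool(parts and parts.scheme in ("http", "https") and parts.netloc)


def check_metadata(metadata):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing field: {name}" for name in EXPECTED_FIELDS if name not in metadata
    ]
    for name in URL_FIELDS:
        if name in metadata and not is_url(metadata[name]):
            problems.append(f"{name} is not an http(s) URL")
    # references is a list of URLs in DCAT-US v1.1.
    for ref in metadata.get("references", []):
        if not is_url(ref):
            problems.append(f"references entry is not an http(s) URL: {ref!r}")
    return problems


def check_file(path):
    """Load one .json metadata file and return its list of problems."""
    with open(path, encoding="utf-8") as handle:
        return check_metadata(json.load(handle))
```

Running check_file on each my_data_file.json before checking it in would flag missing fields and malformed links without requiring a full schema validator.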