This repository has been archived by the owner on Jan 5, 2022. It is now read-only.

Core Data Objects

Eric Jackson edited this page Apr 2, 2015 · 10 revisions

Overview

This page describes the key data objects that the GBE works with (and that get exposed via the API). Our goal is a set of structures that are general enough to describe budget and spending data from a wide range of government organizations while remaining simple enough to work with easily.

Common Fields

All of the objects below include an ID, and most of them also include timestamps that track creation and last-update times. These common fields are not repeated in the field lists below.

Many of the objects also include a field called 'Properties', which is simply a name:value store persisted as a JSON string. Where an object has this field, it is listed below.
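A minimal sketch of how a 'Properties' value might be round-tripped, assuming Python's standard `json` module. The helper names `dump_properties` and `load_properties` and the example keys are hypothetical, not part of the GBE API.

```python
import json
from typing import Any, Dict, Optional


def dump_properties(props: Dict[str, Any]) -> str:
    """Serialize a name:value store to the JSON string that gets persisted."""
    # sort_keys produces a stable string, which makes stored values easy to compare
    return json.dumps(props, sort_keys=True)


def load_properties(text: Optional[str]) -> Dict[str, Any]:
    """Parse a persisted Properties string back into a dict (empty if unset)."""
    if not text:
        return {}
    return json.loads(text)


# Hypothetical example properties for an organization.
stored = dump_properties({"source": "city finance office", "fiscal_year_start": 7})
restored = load_properties(stored)
```

Treating a missing or empty string as an empty store is an assumption; it keeps objects without any properties from needing special handling.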

Organization

At the root of a site on the GBE platform is an organization for which we are presenting financial information, generally a municipal or county government. The information here is simple:

  • Name
  • Description
  • Properties

Accounts and Categories

As in any accounting system, we want to separate information about the structure of the data from the data itself. This is accomplished by tying individual data items into a chart of accounts.

Chart of Accounts

  • Organization ID
  • Properties

An organization can have more than one chart of accounts; this is necessary to handle changes from year to year. However, one chart must always be designated as the default. Perhaps this can be done via a property on the organization? TBD.
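How the default chart is designated is still open. One possible sketch follows the property idea, assuming a hypothetical `default_chart_id` key in the organization's Properties; the class and field names are illustrative only, and treating a lone chart as the default is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Organization:
    id: int
    name: str
    description: str = ""
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ChartOfAccounts:
    id: int
    organization_id: int
    properties: Dict[str, Any] = field(default_factory=dict)


def default_chart(org: Organization, charts: List[ChartOfAccounts]) -> ChartOfAccounts:
    """Return the organization's default chart of accounts.

    Uses the hypothetical 'default_chart_id' property when the organization
    has several charts; a single chart is assumed to be the default.
    """
    own = [c for c in charts if c.organization_id == org.id]
    if not own:
        raise LookupError(f"organization {org.id} has no chart of accounts")
    if len(own) == 1:
        return own[0]
    wanted = org.properties.get("default_chart_id")
    for chart in own:
        if chart.id == wanted:
            return chart
    raise LookupError(f"organization {org.id} has no valid default chart")
```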

Account

  • Chart ID
  • Account Code
  • Account Name
  • Account Type
  • Properties

The account type is an integer code for the standard account type categories (revenue, expense, asset, liability, equity, contra-account).

A data item may have only one account code.
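The account fields and the integer account-type code could be modeled as below. This is a sketch assuming Python's `IntEnum`; the page does not fix the integer values, so the numbering and the example account code "5210" are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Dict


class AccountType(IntEnum):
    # Illustrative numbering only; the actual codes are not specified.
    REVENUE = 1
    EXPENSE = 2
    ASSET = 3
    LIABILITY = 4
    EQUITY = 5
    CONTRA = 6


@dataclass
class Account:
    id: int
    chart_id: int
    code: str
    name: str
    account_type: AccountType
    properties: Dict[str, Any] = field(default_factory=dict)


# The stored integer converts back to a named type.
supplies = Account(id=1, chart_id=10, code="5210", name="Office Supplies",
                   account_type=AccountType(2))
```

Because `IntEnum` members compare equal to plain integers, the value can be persisted as an integer column and still be read back as a named type.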

Account Category

  • Name
  • Chart ID
  • Description
  • Properties

Account Category Value

  • Name
  • Code
  • Account Category ID
  • Properties

A data item can have values for multiple categories. As an example, a line item might have an account of office supplies along with categories for fund, department, division, project, etc.
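A sketch of categories and category values, showing how one line item carries a single account plus values from several categories. The class names mirror the field lists above. The `describe` helper, the IDs, and the example fund and department names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AccountCategory:
    id: int
    chart_id: int
    name: str  # e.g. "Fund" or "Department"
    description: str = ""
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AccountCategoryValue:
    id: int
    category_id: int
    name: str  # e.g. "General Fund"
    code: str
    properties: Dict[str, Any] = field(default_factory=dict)


def describe(value_ids: List[int],
             categories: Dict[int, AccountCategory],
             values: Dict[int, AccountCategoryValue]) -> Dict[str, str]:
    """Map each category value on a data item to 'category name -> value name'."""
    result: Dict[str, str] = {}
    for vid in value_ids:
        value = values[vid]
        result[categories[value.category_id].name] = value.name
    return result


categories = {1: AccountCategory(1, 10, "Fund"), 2: AccountCategory(2, 10, "Department")}
values = {
    100: AccountCategoryValue(100, 1, "General Fund", "100"),
    200: AccountCategoryValue(200, 2, "Parks and Recreation", "PR"),
}
# One line item: e.g. an office supplies account tagged with a fund and a department.
tags = describe([100, 200], categories, values)
```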

Data Sets

In order to present budget data for the public to view and interact with, we must be able to import a set of data provided by a government. This may range from a large collection of individual transactions to a very high-level summary of government spending or revenues by category. We will use two structures to describe everything in this range.

DataSet

This is one of the most critical data design decisions: the definition of a dataset has significant consequences for what is easy or hard to do in the system. I think the definition here is a good approach, but it needs further scrutiny.

A DataSet has three key parameters: granularity, extent and type.

Granularity is the length of the time period covered by a single data item. It can be a day, a month or a year, and it is uniform for all items within the dataset (i.e., you can't have an item that represents January office supplies expense in a dataset with year or day granularity). Multiple entries for the same period on the same account+categories combination are allowed (that is how transaction data would be handled, for example), but this is probably not common.

Extent is the time period within which all items in the dataset fall. It can be a specific day, month or year. The only constraint is that the extent can't cover a period shorter than the granularity of the dataset.

As a specific example, the Asheville budget data has granularity = extent = 1 year.
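The granularity/extent constraint reduces to an ordering of period lengths. A minimal sketch, assuming the extent is represented here only by its period length (day, month or year); the `Granularity` enum and `extent_is_valid` helper are hypothetical.

```python
from enum import IntEnum


class Granularity(IntEnum):
    # Ordered from shortest to longest period so that periods can be compared.
    DAY = 1
    MONTH = 2
    YEAR = 3


def extent_is_valid(granularity: Granularity, extent: Granularity) -> bool:
    """An extent may not cover a period shorter than the dataset's granularity."""
    return extent >= granularity


# Asheville budget data: granularity = extent = 1 year.
assert extent_is_valid(Granularity.YEAR, Granularity.YEAR)
# Monthly items within a one-year extent are fine ...
assert extent_is_valid(Granularity.MONTH, Granularity.YEAR)
# ... but yearly items cannot fit inside a one-month extent.
assert not extent_is_valid(Granularity.YEAR, Granularity.MONTH)
```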

Type is a further parameter that can distinguish two datasets with the same granularity and extent. A common example would be "actual" versus "budget".

Note that the type, granularity and extent together constitute a unique identifier for a dataset within a given organization. We may later need to add a parameter (project, department, etc.), but for now this will do.
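Uniqueness of type + granularity + extent within an organization can be enforced with a composite key. The sketch below assumes a hypothetical in-memory `DataSetRegistry`; a database would more likely use a unique index over the same columns.

```python
from typing import Dict, NamedTuple, Optional


class DataSetKey(NamedTuple):
    """Type + granularity + extent, scoped to one organization."""
    organization_id: int
    dataset_type: str   # e.g. "actual" or "budget"
    granularity: str    # "day", "month" or "year"
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


class DataSetRegistry:
    """Hypothetical in-memory registry allowing one dataset per key."""

    def __init__(self) -> None:
        self._by_key: Dict[DataSetKey, int] = {}

    def register(self, key: DataSetKey, dataset_id: int) -> None:
        if key in self._by_key:
            raise ValueError(f"dataset {self._by_key[key]} already uses {key}")
        self._by_key[key] = dataset_id

    def find(self, key: DataSetKey) -> Optional[int]:
        return self._by_key.get(key)


registry = DataSetRegistry()
registry.register(DataSetKey(1, "budget", "year", 2015), dataset_id=7)
# Same granularity and extent but a different type, so this is a distinct dataset.
registry.register(DataSetKey(1, "actual", "year", 2015), dataset_id=8)
```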

So with that introduction, here are the fields associated with a DataSet:

  • Organization ID
  • Chart of Accounts ID
  • Type
  • Granularity (Day, Month or Year)
  • Extent (this may only be defined down to the granularity, e.g., if granularity is Year, then the month and day of the extent must be blank).
    • Year
    • Month
    • Day
  • Description
  • Properties
  • Category Order (an array of category type IDs in the order categories appear in the dataset).
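The field list above could be captured as a Python dataclass that rejects an extent specified more finely than the granularity allows. This is a sketch under assumptions: granularity is stored as a lowercase string, a blank extent part is `None`, and a day without a month is treated as invalid.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DataSet:
    organization_id: int
    chart_id: int
    dataset_type: str   # e.g. "actual" or "budget"
    granularity: str    # "day", "month" or "year"
    year: int           # extent: year is always set
    month: Optional[int] = None
    day: Optional[int] = None
    description: str = ""
    properties: Dict[str, Any] = field(default_factory=dict)
    category_order: List[int] = field(default_factory=list)  # category type IDs

    def __post_init__(self) -> None:
        if self.granularity not in ("day", "month", "year"):
            raise ValueError(f"unknown granularity {self.granularity!r}")
        # The extent may only be specified down to the granularity.
        if self.granularity == "year" and (self.month is not None or self.day is not None):
            raise ValueError("yearly dataset: extent month and day must be blank")
        if self.granularity == "month" and self.day is not None:
            raise ValueError("monthly dataset: extent day must be blank")
        if self.day is not None and self.month is None:
            raise ValueError("extent day given without a month")
```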

Defining the category order field here is a compromise. We want each data item to store just a list of category IDs. We could specify the order at the chart of accounts level, but that would make the system overly rigid. In the vast majority of cases a dataset will be replaced rather than updated, so there is no need to worry about how to change the order of categories.
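With the order defined on the dataset, each data item only needs a positional list of category value IDs. A minimal sketch of pairing them up, assuming (hypothetically) that every item carries exactly one value per category in the dataset's order; the function name and IDs are illustrative.

```python
from typing import Dict, List


def categories_for_item(category_order: List[int], value_ids: List[int]) -> Dict[int, int]:
    """Pair each positional category value ID on a data item with the
    category type ID that the dataset's category order assigns to that position."""
    if len(value_ids) != len(category_order):
        raise ValueError(
            f"expected {len(category_order)} category values, got {len(value_ids)}")
    return dict(zip(category_order, value_ids))


# Hypothetical IDs: category 3 = Fund, 1 = Department, 2 = Division.
order = [3, 1, 2]
print(categories_for_item(order, [100, 205, 310]))  # {3: 100, 1: 205, 2: 310}
```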

DataItem

This is an individual record, e.g., an actual or budget amount for a specific account in a specific period. It could even be an individual transaction, although that is not our focus here.

  • Dataset ID
  • Group
  • Amount
  • Year
  • Month
  • Day
  • Account ID
  • Category IDs (see below)
  • Properties

The group parameter marks a set of data items as belonging to a single import. If another import is done with the same group ID, it replaces the existing set rather than being added to it. This is important because a full dataset (e.g., the 2015 budget) may actually be loaded in pieces (e.g., fund by fund in Asheville). We need to be able to do this piecemeal loading, but also to replace one of the pieces if needed.
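The replace-not-append behavior of groups can be sketched with a store keyed by (dataset, group). This assumes a hypothetical in-memory `ItemStore` and a trimmed-down `DataItem` with only the fields needed here; the group names and amounts are illustrative. A database version would delete the rows for the pair and insert the new ones in one transaction.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Tuple


@dataclass
class DataItem:
    dataset_id: int
    group: str
    account_id: int
    amount: Decimal


class ItemStore:
    """Hypothetical store: importing a group replaces any items previously
    imported under the same (dataset, group) pair."""

    def __init__(self) -> None:
        self._groups: Dict[Tuple[int, str], List[DataItem]] = {}

    def import_group(self, dataset_id: int, group: str, items: List[DataItem]) -> None:
        # Replace, never append: re-importing a piece overwrites only that piece.
        self._groups[(dataset_id, group)] = list(items)

    def items(self, dataset_id: int) -> List[DataItem]:
        return [item for (ds, _), group_items in self._groups.items()
                if ds == dataset_id for item in group_items]


store = ItemStore()
# A budget loaded fund by fund.
store.import_group(7, "general-fund", [DataItem(7, "general-fund", 1, Decimal("1000"))])
store.import_group(7, "water-fund", [DataItem(7, "water-fund", 2, Decimal("500"))])
# Reloading the general fund replaces its items rather than adding to them.
store.import_group(7, "general-fund", [DataItem(7, "general-fund", 1, Decimal("1200"))])
print(sum(i.amount for i in store.items(7)))  # 1700
```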

As noted above, the Year/Month/Day specification is limited by the granularity.
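A sketch of checking an item's Year/Month/Day against its dataset. It assumes periods are `(year, month, day)` tuples with `None` for blank parts, that an item must be specified exactly down to the granularity, and that it must match every part the extent fixes. Calendar validity (e.g. month 13) is not checked. The function name is hypothetical.

```python
from typing import Optional, Tuple

Period = Tuple[int, Optional[int], Optional[int]]  # (year, month, day)


def item_period_is_valid(granularity: str, extent: Period, item: Period) -> bool:
    """Check a data item's Year/Month/Day against its dataset's granularity and extent."""
    _, month, day = item
    if granularity == "year":
        shape_ok = month is None and day is None
    elif granularity == "month":
        shape_ok = month is not None and day is None
    elif granularity == "day":
        shape_ok = month is not None and day is not None
    else:
        raise ValueError(f"unknown granularity {granularity!r}")
    if not shape_ok:
        return False
    # The item must fall inside the extent: compare only the parts the extent defines.
    return all(e is None or e == v for e, v in zip(extent, item))
```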

Most datasets tag data items with multiple categories, so the Category IDs field above is actually an array. There are various ways to handle this; in the current implementation we plan to store the first 3 as individual columns (to facilitate queries) and the remainder in a spill column holding a JSON array string. The category types being referred to can be looked up via the category ID, but, as described above, the DataSet establishes the order to be used for the array.
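The planned storage layout (three inline columns plus a JSON spill column) might look like the sketch below. The column names `category1` to `category3` and `categoryN` are hypothetical, and the code assumes the inline columns are filled in order with no gaps.

```python
import json
from typing import Any, Dict, List

INLINE_COLUMNS = 3  # first 3 category IDs get their own columns


def to_columns(category_ids: List[int]) -> Dict[str, Any]:
    """Split an ordered list of category IDs into inline columns plus a JSON
    spill column holding the remainder (None when nothing spills)."""
    row: Dict[str, Any] = {}
    for i in range(INLINE_COLUMNS):
        row[f"category{i + 1}"] = category_ids[i] if i < len(category_ids) else None
    rest = category_ids[INLINE_COLUMNS:]
    row["categoryN"] = json.dumps(rest) if rest else None
    return row


def from_columns(row: Dict[str, Any]) -> List[int]:
    """Rebuild the ordered category ID list from the stored columns."""
    ids = [row[f"category{i + 1}"] for i in range(INLINE_COLUMNS)]
    ids = [cid for cid in ids if cid is not None]
    if row["categoryN"]:
        ids.extend(json.loads(row["categoryN"]))
    return ids


row = to_columns([100, 205, 310, 412, 518])
print(row)
# {'category1': 100, 'category2': 205, 'category3': 310, 'categoryN': '[412, 518]'}
```

Queries that filter on the first three categories can use ordinary indexed columns; only the rarer filters on later categories need to parse the spill column.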