City of Denton Datasets
Issue tracker for City of Denton Datasets
This repo provides a collaborative way to assess and discuss the new datasets published by the City of Denton. Using the built-in CKAN API on the city's platform (data.cityofdenton.com), we autogenerated an issue for each available dataset. This lets us sort, discuss, and report findings on these datasets back to the city's technical staff, not just to improve them for Open Data Day, but for all future open data purposes as well.
Inside each issue you will see some basic data about the dataset, a link to the dataset on the website, and a summary of the resources for each dataset.
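For readers who want to see roughly what talking to CKAN looks like, here is a minimal sketch in Python. It is not the script in this repo. It assumes the portal exposes the standard CKAN Action API under `/api/3/action/` and is reachable over plain `http`; the helper names (`action_url`, `parse_result`, `list_datasets`, `issue_title`) are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: standard CKAN Action API path and an http scheme for the portal.
BASE_URL = "http://data.cityofdenton.com"


def action_url(action, **params):
    """Build the URL for a CKAN action such as package_list or package_show."""
    url = f"{BASE_URL}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def parse_result(body):
    """Return the 'result' field of a CKAN JSON response, or raise on failure."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return payload["result"]


def list_datasets():
    """Fetch the names of all datasets (needs network access)."""
    with urlopen(action_url("package_list")) as response:
        return parse_result(response.read().decode("utf-8"))


def issue_title(package):
    """Hypothetical issue title; the real script may format titles differently."""
    return package.get("title") or package["name"]
```

Calling `list_datasets()` and then fetching `action_url("package_show", id=name)` for each name would give the per-dataset metadata and resource lists summarized in the issues.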
Expected of Each Data Set
Each data set should be usable in a database and should include variables on which it can be merged with other data sets. Barring that, the data should be geocoded (tagged with geographic positioning data, such as latitude and longitude, or state plane coordinates).
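As an illustration of what "can be merged" means in practice, here is a small sketch that joins two CSV extracts on a shared column. The data, the column names (`parcel_id`, `permit_type`, `latitude`, `longitude`), and the helper `merge_on` are all invented for the example and do not describe any actual Denton dataset.

```python
import csv
import io


def merge_on(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared column name."""
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[key], []).append(row)
    merged = []
    for row in left_rows:
        for match in right_index.get(row[key], []):
            # Values from the left row win if both sides share a column name.
            merged.append({**match, **row})
    return merged


# Made-up sample data standing in for two downloaded resources.
permits = list(csv.DictReader(io.StringIO(
    "parcel_id,permit_type\n101,roof\n102,fence\n")))
parcels = list(csv.DictReader(io.StringIO(
    "parcel_id,latitude,longitude\n101,33.21,-97.13\n")))

merged = merge_on(permits, parcels, "parcel_id")
```

Only the permit for parcel 101 survives the join here, because parcel 102 has no matching row. A dataset without any such shared variable cannot be combined this way, which is why geocoding is the fallback.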
To ensure these data are useful without additional munging, the following should be true:
- the columns are delimited consistently or (less desirable) fixed-width columns are aligned correctly
- variable types aren't mixed within columns (no text in integer columns, etc.)
- Text strings are quoted (but this isn't strictly required, provided the text isn't comma-separated)
- Ideally, there is exactly one header row, but it's reasonable for some datasets to have two-line records OR lack any header row, provided there is a correct codebook
Has a unique identifier or a combination of columns that can be assembled into a unique identifier (e.g., day/date/month)
Has a codebook describing each data variable (field), variable type, and, where needed, column width
The user/committer has actually imported the data and confirmed that it worked without errors
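Several of the checks above can be automated before anyone tries a real database import. The following is a rough sketch, assuming a comma-delimited file with exactly one header row; the function names `classify` and `check_csv` are hypothetical, and the type detection is deliberately simple (anything Python's `int()` or `float()` accepts counts as numeric).

```python
import csv
import io


def classify(value):
    """Classify a cell as 'empty', 'int', 'float', or 'text'."""
    value = value.strip()
    if value == "":
        return "empty"
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def check_csv(text, key_columns):
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, records = rows[0], rows[1:]
    problems = []

    # 1. Consistent delimiting: every record has as many fields as the header.
    for row_no, record in enumerate(records, start=2):
        if len(record) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} fields, found {len(record)}")

    # 2. No mixed types: a column should not combine text with numbers.
    for index, name in enumerate(header):
        kinds = {classify(r[index]) for r in records if index < len(r)} - {"empty"}
        if "text" in kinds and len(kinds) > 1:
            problems.append(f"column {name!r}: mixes text with numeric values")

    # 3. Unique identifier: the key columns together must not repeat.
    missing = [c for c in key_columns if c not in header]
    if missing:
        problems.append(f"key columns not found: {missing}")
    else:
        positions = [header.index(c) for c in key_columns]
        seen = set()
        for row_no, record in enumerate(records, start=2):
            if any(p >= len(record) for p in positions):
                continue  # short row, already reported above
            key = tuple(record[p] for p in positions)
            if key in seen:
                problems.append(f"row {row_no}: duplicate identifier {key}")
            seen.add(key)
    return problems
```

For example, `check_csv(open("some_file.csv").read(), ["day", "month"])` would treat the pair of columns as a composite identifier. An empty result does not replace the last checklist item; it only catches the most common problems early.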
What's Missing? Is It Useful?
It's possible that the data are usable but less useful than they could be, if, for instance, the data were in a different format, or if they contained an additional variable or level of analysis. A "wish list" for reasonable changes or upgrades is also potentially useful for the data catalog, moving forward.
Additionally, a measure of usability is valuable. Please tag each data set with "usable", "unusable", "needs repair", etc.
Issue / Data fetching script
If you want to play around, or even use this data script for something else, you can clone the repo and use Composer to install the dependencies.
$ git clone git@github.com:OpenDenton/City-of-Denton-Datasets.git
If you don't have Composer installed, install it first, using either Homebrew or the installer script.
$ brew install composer
$ curl -s http://getcomposer.org/installer | php
Then install the dependencies from the root of the repository.
$ composer install
From here, you'll need to generate a new GitHub token and replace the demo token in the script.
$token = 'NEW TOKEN HERE';
unusable: any one of the following conditions prevents the data from being used:
- the data set cannot be read by a computer program
- there are no definitions for variables
- text data are encoded in neither ASCII nor Unicode
complete: includes code book, machine-usable data, no missing data, and the columns match the data and expected data types
incomplete: missing some element, which makes the data imperfect or difficult to use; potentially usable but not finished
gis: data include a geographic component (whether or not they are geocoded)
nocodebook: it is not possible to know what the values in a given column mean, for certain, because there is no supporting documentation for the data set
xls: data are in an Excel format (any version)
text: data are provided in an ASCII text format
needsupdate: time series data are missing years (especially the most recent years)
needscleaning: data may be usable, but values within columns are inconsistent or would benefit from being broken out into additional variables (a text field with "Males, 21-34, Caucasian - Non-Hispanic" should really be at least four fields)
personalinfo: data contain Personally Identifiable Information (PII) such as names and addresses
unclearvariables: column headers are imprecise or ambiguous, even if they are in English
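A few of these tags can be suggested mechanically, leaving the judgment calls (such as unclearvariables or personalinfo) to a human reviewer. The sketch below is only a starting point: the function names are hypothetical, and the mapping from file extensions to the xls and text tags is an assumption, not a rule stated by this project.

```python
import os


def missing_years(years, latest_expected):
    """Return years absent between the first observed year and latest_expected."""
    if not years:
        return []
    present = set(years)
    return [y for y in range(min(present), latest_expected + 1) if y not in present]


def suggest_tags(filename, has_codebook, years=(), latest_expected=None):
    """Suggest tags from simple, checkable facts about one resource."""
    tags = []
    extension = os.path.splitext(filename)[1].lower()
    # Assumption: the extension reflects the actual file format.
    if extension in (".xls", ".xlsx"):
        tags.append("xls")
    elif extension in (".csv", ".txt", ".tsv"):
        tags.append("text")
    if not has_codebook:
        tags.append("nocodebook")
    # Only time series resources pass years; gaps or stale data need an update.
    if latest_expected is not None and missing_years(years, latest_expected):
        tags.append("needsupdate")
    return tags
```

For a yearly series, pass the years found in the data plus the most recent year you would expect to see; any gap, including a missing latest year, yields needsupdate.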