Releases: datacommonsorg/import

0.1-alpha.1k

10 Mar 19:45
5157e13

Includes the following fixes:

  1. Support CSVs exported from MS Excel (with BOM characters)
  2. Allow empty column names in CSV
  3. Propagate exceptions more reliably so that failures are no longer silent
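
To illustrate the BOM fix, here is a minimal Python sketch; it is not the tool's own code. Excel's UTF-8 CSV export can prefix the file with a byte-order mark, which then ends up attached to the first column name unless the decoder strips it. Python's standard `utf-8-sig` codec handles this. The sample bytes, column names, and the `read_header` helper are made up for the example.

```python
import csv
import io

# Hypothetical CSV bytes as Excel might save them: UTF-8 BOM, then the header.
raw = b"\xef\xbb\xbfplace,year,value\r\ngeoId/06,2020,39538223\r\n"


def read_header(data: bytes, encoding: str) -> list[str]:
    """Return the header row of CSV bytes decoded with the given encoding."""
    reader = csv.reader(io.StringIO(data.decode(encoding), newline=""))
    return next(reader)


naive = read_header(raw, "utf-8")        # BOM stays attached to the first name
cleaned = read_header(raw, "utf-8-sig")  # BOM is stripped during decoding
```

With plain `utf-8`, the first column name starts with the invisible `\ufeff` character, so a lookup by the expected name would miss it.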

0.1-alpha.1j

03 Mar 23:28
599c141

Bug Fixes:

  • Fixes exceptions thrown when a series has a mix of numeric and non-numeric values
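
As a rough picture of the situation this fix addresses, the Python sketch below separates the values of one series into numeric and non-numeric ones instead of assuming that every value parses as a number. It only illustrates the idea; the function name and sample values are hypothetical and do not come from the tool.

```python
def split_series_values(values):
    """Partition raw string values into parsed numbers and non-numeric leftovers."""
    numeric, non_numeric = [], []
    for value in values:
        try:
            numeric.append(float(value))
        except ValueError:
            # Not a number: keep it for separate handling instead of raising.
            non_numeric.append(value)
    return numeric, non_numeric


numbers, others = split_series_values(["1", "2.5", "Low"])
```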

0.1-alpha.1h

16 Aug 19:35
654aefd

Highlights

  • Added support for categorical variables (SVs with statType: measurementResult)
  • Performance optimizations with ~25% expected speed gains

Changelog

New checks and verifications

  • Added support for categorical variables
    • Any non-numeric StatVarObservation values must now be explicitly allowed with the --allow-non-numeric-obs-values=true flag.
    • Categorical variables (SVs with statType: measurementResult) can be checked for existence by specifying --check-measurement-result=true.
    • Some common checks (inconsistent values, date gaps) apply to all time series, including categorical variables.
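
The notes do not describe how the date-gap check is implemented. As one possible reading, the Python sketch below flags a gap in a yearly series wherever two consecutive observations are further apart than the smallest step seen in that series. The function name and the rule itself are assumptions made for illustration. Because the check only looks at dates, it applies equally to numeric and categorical series.

```python
def find_year_gaps(years):
    """Return (previous, next) pairs spaced wider than the series' smallest step."""
    ordered = sorted(set(years))
    if len(ordered) < 3:
        return []  # too few points to infer a regular step
    pairs = list(zip(ordered, ordered[1:]))
    expected = min(b - a for a, b in pairs)
    return [(a, b) for a, b in pairs if b - a > expected]


gaps = find_year_gaps([2010, 2011, 2012, 2015, 2016])
```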

Speed Optimizations

  • Added a heuristic to date checking that improves speed for the “correct-path”. Expected improvement is ~25%.
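
The notes do not say what the heuristic is. One common pattern that fits the description is a fast path: guess the most likely date format from the length of the string and try it first, so that a well-formed date is validated in a single parse attempt, and only fall back to the remaining formats when that attempt fails. The Python sketch below shows this pattern under that assumption. The format table and function names are hypothetical.

```python
from datetime import datetime

# Candidate formats keyed by the length of a well-formed date string.
_FORMAT_BY_LENGTH = {4: "%Y", 7: "%Y-%m", 10: "%Y-%m-%d"}
_ALL_FORMATS = tuple(_FORMAT_BY_LENGTH.values())


def _parses(text, fmt):
    """Return True if text matches the given strptime format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def is_valid_date(text):
    """Validate a date, trying the format suggested by its length first."""
    likely = _FORMAT_BY_LENGTH.get(len(text))
    if likely is not None and _parses(text, likely):
        return True  # fast path: one parse attempt for well-formed input
    # Slow path: try every other known format.
    return any(_parses(text, fmt) for fmt in _ALL_FORMATS if fmt != likely)
```

The gain comes from the common case: valid input never reaches the slow path, while malformed input still gets the full set of attempts.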

Summary Report changes

  • Added “expand/collapse all” buttons that work for all collapsible tags on the report
  • Changed chart style to highlight data points and support datasets with many data points

Bug fixes

  • Fixed issue where the line number of the last CSV row was incorrect
  • Fixed issue where logs were duplicated when more than two values had the same date
  • Fixed flaky ordering of output in some test goldens

0.1-alpha.1g

01 Jul 00:06
59736fa

What’s new:

  • Speed improvements:
    • Allow external IDs to be resolved using local side MCF. This avoids first having to get new external IDs updated in the reconciliation API, which could take days before the tool could verify those IDs.
    • Optimized performance for an estimated ~10% raw speed boost.
  • Expanded checks to catch more issues and support additional data types:
    • Existence checks for “observationAbout” references (behind a new flag -ep)
    • Expanded validation to recently introduced statTypes (confidence interval {upper, lower} limit, kurtosis, skewness, growth rate).
    • Support schemaless SVs with init-cap mprop
  • Added documentation for:
    • Tool usage (docs/usage.md)
    • Error counters (docs/counters.md)
    • Complex Values (docs/complex_values.md)
  • Summary Report improvements:
    • Added missing observationPeriods field
    • Added table of contents
    • Made tables sortable on-click
    • Separated the display of time series facets
    • Displayed human-readable names for places, taking priority over dcid
    • Improved sample place heuristics
  • Bug fixes:
    • Fix issue where a time series with a single datapoint smaller than -1 would cause a fatal crash
    • Fix order of census area code for resolution

0.1-alpha.1f

14 Feb 21:04
114ee14

What's new:

  • Fix HTTP exception in DC calls when running on Java 11.x
  • Fix runtime errors in chart generation
  • Remove the requirement for StatVars to have a populationType
  • Fix bug in percentile* statType validation

0.1-alpha.1e

15 Dec 21:23
9fdb93f

This release includes:

  • Support for generating an HTML Summary Report
    • Enabled by default. To disable, pass -sr=false
  • Upgrade log4j version to 2.16.0
  • Minor bug fixes and updates

0.1-alpha.1d

24 Sep 21:34
7eea7f2

This release includes:

  • Support for Stat Checker
    • Enabled by default. To disable, pass -s=false
  • Support for Resolution (aka resolving local-refs and generating dcids for nodes)
    • Defaults to local mode (-r=LOCAL), for use when you already reference place DCIDs.
    • To resolve external IDs to DCIDs for places, pass -r=FULL. This will make Recon API calls.
    • To disable resolution, pass -r=NONE
  • Support for parallel processing of CSV files
    • Parallel processing happens when there are multiple CSV files
    • Defaults to no parallelism. Set -n=<number-of-threads> to increase parallelism
  • More batching for existence checks
    • This is enabled by default. To disable, pass -e=false
  • Changes in default output directory
    • Default is now dc_generated/ in the current directory. To change, set -o=<your-directory>
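
The per-file parallelism described above can be pictured with the Python sketch below: each CSV file is one unit of work, and a thread count plays the role of the -n flag. This is an illustration only, not the tool's implementation. `count_rows` is a stand-in for the real per-file processing, and both function names are hypothetical.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def count_rows(path):
    """Stand-in for per-file work: count data rows, excluding the header."""
    with open(path, newline="", encoding="utf-8") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def process_files(paths, num_threads=1):
    """Process each CSV file as one task; results keep the input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(count_rows, paths))
```

With a single input file there is only one task, so extra threads make no difference, which matches the note that parallel processing happens only when there are multiple CSV files.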

0.1-alpha.1c

24 Sep 21:32
f384cbc
Pre-release

Intermediate release with partial stat check and resolution support.

0.1-alpha.1b

27 Aug 23:52
8feaea7

This release includes:

  • Existence checks for DCID references using DC Staging API
  • End-to-end integration tests (refer to test-cases here)
  • Several bug fixes

0.1-alpha.1

12 Aug 22:16
a3defab
Pre-release

An early version of the import tool to get user feedback.