Releases: bxparks/bigquery-schema-generator

1.6.1 - fix amnesia during multiple type mismatches

12 Jan 23:31
6405d35
  • 1.6.1 (2024-01-12)
    • Bug Fix: Prevent amnesia that causes multiple type mismatch warnings
      • If a data set contained multiple records in which the types of a
        column did not match each other, the old code would remove the
        corresponding internal schema_entry for that column and print a
        warning message.
      • This meant that subsequent records would recreate the schema_entry,
        and a subsequent mismatch would print another warning message.
      • This also meant that if another record followed the most recent
        mismatch, the script would output a schema entry for the
        mismatching column, corresponding to the type of the last record which
        was not marked as a mismatch.
      • The fix is to use a tombstone entry for the offending column, instead
        of deleting the schema_entry completely. Only a single warning
        message is printed, and the column is ignored for all subsequent
        records in the input data set.
      • See
        [Issue#98](https://github.com/bxparks/bigquery-schema-generator/issues/98),
        which identified this problem. It seems to have existed from the
        very beginning.
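
The tombstone idea described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the names TOMBSTONE, infer_type, and deduce are hypothetical and are not the library's internals, and any difference in type is treated as a mismatch (no type upgrades are modelled).

```python
# Hypothetical sketch of the tombstone technique; not the library's code.
TOMBSTONE = object()  # marks a column already reported as mismatched


def infer_type(value):
    """Tiny type inference used only for this illustration."""
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


def deduce(records):
    schema = {}    # column name -> type name, or TOMBSTONE
    warnings = []
    for line_no, record in enumerate(records, start=1):
        for key, value in record.items():
            new_type = infer_type(value)
            old_type = schema.get(key)
            if old_type is TOMBSTONE:
                continue  # column was rejected earlier: ignore it silently
            if old_type is None:
                schema[key] = new_type
            elif old_type != new_type:
                warnings.append(
                    f"line {line_no}: {key}: {old_type} vs {new_type}")
                # Keep a marker instead of deleting the entry, so later
                # records cannot recreate it.
                schema[key] = TOMBSTONE
    final = {k: t for k, t in schema.items() if t is not TOMBSTONE}
    return final, warnings


records = [{"a": 1, "b": "x"}, {"a": "one"}, {"a": 2}, {"a": "two"}, {"a": 3}]
schema, warnings = deduce(records)
print(schema)         # only column "b" survives
print(len(warnings))  # a single warning for column "a"
```

If the entry were deleted instead of tombstoned, the third record would recreate column "a", the fourth would trigger a second warning, and the fifth would recreate it again so that it appears in the output.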

1.6.0 - allow NULLABLE to convert to REPEATED; add input_format=csvdictreader

01 Apr 16:46
d8fb050
  • 1.6.0 (2023-04-01)
    • Allow null fields to convert to REPEATED because bq load seems
      to interpret null fields to be equivalent to an empty array [].
      See #90.
    • Add input_format='csvdictreader' option. Similar to 'dict' but
      intended to be used with the csv.DictReader class to read CSV and TSV
      files with various options. More documentation and discussions at:
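
For context on the csv.DictReader class mentioned above, here is a minimal standard-library sketch of what it yields when reading a TSV file. The sample data is made up, and the sketch deliberately stops before handing the rows to the schema generator.

```python
import csv
import io

# Hypothetical TSV content, held in memory so the example is self-contained.
tsv_text = "name\tage\nalice\t30\nbob\t\n"

# delimiter="\t" is one of the "various options" csv.DictReader accepts.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)

for row in rows:
    print(row)
```

Each row is a dict keyed by the header names, and every value is a string (an empty field becomes an empty string), so any numeric types have to be inferred from the text.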

1.5.1 - add examples; update documentation

04 Dec 16:10
2d983fa
  • 1.5.1 (2022-12-04)
    • Add examples/*.py to demonstrate how to use SchemaGenerator as a
      library.
    • Update README.md to state that bq load --autodetect uses the first
      500 records. Previously, it scanned only the first 100 records.
    • This is a maintenance release with no new features or bug fixes.

v1.5 - add --preserve_input_sort_order flag

14 Nov 16:31
2830dd0
  • 1.5 (2021-11-14)
    • Make the column order in the BQ schema file match the order of appearance
      in the JSON data file using the --preserve_input_sort_order flag.
      Thanks to kdeggelman@ in
      PR#75.
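
The difference between the two orderings can be illustrated with a short sketch. The function column_order and its parameter are hypothetical stand-ins for the flag, and the sketch assumes that the ordering without the flag is alphabetical.

```python
import json

# Two hypothetical newline-delimited JSON records.
lines = ['{"zeta": 1, "alpha": "x"}', '{"mid": true, "alpha": "y"}']


def column_order(json_lines, preserve_input_sort_order=False):
    seen = {}  # plain dicts keep insertion order in Python 3.7+
    for line in json_lines:
        for key in json.loads(line):
            seen.setdefault(key, None)  # remember first appearance only
    keys = list(seen)
    return keys if preserve_input_sort_order else sorted(keys)


sorted_order = column_order(lines)
input_order = column_order(lines, preserve_input_sort_order=True)
print(sorted_order)  # ['alpha', 'mid', 'zeta']
print(input_order)   # ['zeta', 'alpha', 'mid']
```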

v1.4.1 - add documentation for input_format='dict'

23 Aug 16:52
da3609f
  • 1.4.1 (2021-08-23)
    • Add documentation for the input_format='dict' option.
    • Add additional input format 'json' and 'dict' test cases.
    • Maintenance release, no functional change in core code.

v1.4 - input_format can be an internal Python dict; support scientific floating point numbers

10 Dec 05:27
acaa74b
  • 1.4 (2020-12-09)
    • Add 'dict' as a third input_format when SchemaGenerator is used as a
      library. This can be useful when the data has already been transformed
      into a list of native Python dict objects (see #58, thanks to
      ZiggerZZ@).
    • Expand the pattern matchers for quoted integers and quoted floating point
      numbers to be more compatible with the patterns recognized by bq load --autodetect.
    • Add Table of Contents to README.md. Add usage info for the
      schema_map=existing_schema_map and the input_format='dict' parameters
      in the SchemaGenerator() constructor.
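
To give a feel for what a pattern matcher for quoted integers and quoted floating point numbers might look like, including scientific notation, here is a rough sketch. The regular expressions and the classify helper are assumptions made for illustration. They are not the patterns used by the library or by bq load --autodetect.

```python
import re

# Hypothetical patterns, for illustration only.
INTEGER = re.compile(r"[-+]?\d+")
FLOAT = re.compile(r"[-+]?(\d+\.\d*|\.\d+|\d+)([eE][-+]?\d+)?")


def classify(text):
    """Guess a type name for a quoted value such as "42" or "1.5e-3"."""
    if INTEGER.fullmatch(text):
        return "INTEGER"
    if FLOAT.fullmatch(text):
        return "FLOAT"
    return "STRING"


for sample in ["42", "-3.14", "1e5", "6.02E23", ".5", "abc"]:
    print(sample, classify(sample))
```

The integer pattern is checked first, so a plain digit string is reported as INTEGER even though the float pattern would also accept it.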

1.3 - support an existing schema file

05 Dec 18:53
d5c3cd3
  • 1.3 (2020-12-05)
    • Allow an existing schema file to be specified using
      --existing_schema_path flag, so that new data can be merged into it.
      See #40, #57, and #61.
      (Thanks to abroglesc@ and bozzzzo@).

1.2 - print JSON full path in error messages

28 Oct 03:46
0f63dd0
  • 1.2 (2020-10-27)
    • Print full path of nested JSON elements in error messages (See #52;
      thanks abroglesc@).
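
One way to carry the full path of a nested element into an error message is to pass a prefix down the recursion, as in the sketch below. The function find_mismatches and the dotted path format are assumptions for illustration. The library's actual message format may differ.

```python
def find_mismatches(old, new, prefix=""):
    """Compare two nested records and report mismatches by full path."""
    problems = []
    for key, new_value in new.items():
        if key not in old:
            continue
        # Build the full path, e.g. "user.address.zip" (assumed format).
        full_name = f"{prefix}.{key}" if prefix else key
        old_value = old[key]
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            problems.extend(find_mismatches(old_value, new_value, full_name))
        elif type(old_value) is not type(new_value):
            problems.append(
                f"{full_name}: {type(old_value).__name__} "
                f"vs {type(new_value).__name__}")
    return problems


first = {"user": {"address": {"zip": 94043}}}
second = {"user": {"address": {"zip": "94043"}}}
problems = find_mismatches(first, second)
print(problems)  # ['user.address.zip: int vs str']
```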

1.1 - Add `--ignore_invalid_lines` flag

10 Jul 14:48
  • 1.1 (2020-07-10)
    • Add --ignore_invalid_lines to ignore parsing errors on invalid lines
      and continue processing. Fixes
      #49.
    • Add GitHub actions for automated tests and flake8 validation.
    • Add package __version__ string.
    • Update setup.py, no longer need to convert README.md markdown to RST.
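
The behavior behind --ignore_invalid_lines can be approximated with a small sketch: collect the parsing error and continue, instead of stopping at the first bad line. The function parse_lines and its return shape are hypothetical and only mirror the idea of the flag.

```python
import json


def parse_lines(lines, ignore_invalid_lines=False):
    records, errors = [], []
    for line_no, line in enumerate(lines, start=1):
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as e:
            if not ignore_invalid_lines:
                raise  # without the flag, the first bad line is fatal
            errors.append((line_no, str(e)))  # log it and keep going
    return records, errors


lines = ['{"a": 1}', '{broken', '{"b": 2}']
records, errors = parse_lines(lines, ignore_invalid_lines=True)
print(records)       # [{'a': 1}, {'b': 2}]
print(errors[0][0])  # 2
```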

1.0 - fix sanitize_name, add continuous integration

04 Apr 19:33
15b68d1
  • 1.0 (2020-04-04)
    • Fix --sanitize_names for recursive RECORD fields (Thanks riccardomc@,
      see #43).
    • Clean up how unit tests are run, trying my best to figure out
      Python's convoluted package importing mechanism.
    • Add GitHub Actions continuous integration pipelines with flake8 checks and
      automated unit testing.
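
As an illustration of why --sanitize_names needs to recurse into RECORD fields, the sketch below applies a name-cleaning function to a nested schema. The helper names, the allowed character set, and the 128-character limit are assumptions for this example, not a statement of the library's exact rules.

```python
import re


def sanitize_name(name):
    # Assumed rule: replace anything outside [a-zA-Z0-9_] with "_",
    # then truncate to an assumed maximum of 128 characters.
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)[:128]


def sanitize_fields(fields):
    result = []
    for field in fields:
        clean = dict(field, name=sanitize_name(field["name"]))
        if field.get("type") == "RECORD":
            # Recurse so that nested column names are sanitized too.
            clean["fields"] = sanitize_fields(field.get("fields", []))
        result.append(clean)
    return result


fields = [{
    "name": "user-info", "type": "RECORD", "mode": "NULLABLE",
    "fields": [{"name": "zip code", "type": "STRING", "mode": "NULLABLE"}],
}]
clean = sanitize_fields(fields)
print(clean[0]["name"])               # user_info
print(clean[0]["fields"][0]["name"])  # zip_code
```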