defaultTimestampTimeZone can be source type specific #1887

Closed
benedeki opened this issue Aug 19, 2021 · 1 comment · Fixed by #1899 or #1916
Labels: feature (New feature), priority: medium (Important but not urgent), Standardization (Standardization Job affected)

Comments

@benedeki
Collaborator

benedeki commented Aug 19, 2021

Background

When a timestamp is read without a time zone specification, either in the data or in the metadata, the system setting defaultTimestampTimeZone is used. For various source types this setting might cause unwanted timestamp shifts.

Feature

One global setting of defaultTimestampTimeZone can cause confusion and unwanted timestamp shifts for certain source types. For example, Parquet timestamps are generally normalized to UTC. This can be addressed through metadata configuration of the schema, but that is tedious.
Implement a defaultTimestampTimeZone specific to each source type, which would take precedence over the global defaultTimestampTimeZone. The global setting would be used as a fallback, just as the system time zone is the fallback when the global setting is missing.
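
To illustrate the kind of shift described above, here is a minimal sketch using only `java.time`. It is not Enceladus code: the object name, the sample timestamp and the two zones are assumptions chosen for the example. It reads the same zoneless value once as UTC and once under a different default zone.

```scala
import java.time.{LocalDateTime, ZoneId}

object TimestampShiftDemo {
  // A timestamp as it might appear in the source, carrying no zone information.
  val raw: LocalDateTime = LocalDateTime.parse("2021-08-19T12:00:00")

  // Interpret a zoneless timestamp under the given default time zone.
  def toEpochSeconds(ts: LocalDateTime, defaultZone: String): Long =
    ts.atZone(ZoneId.of(defaultZone)).toEpochSecond

  def main(args: Array[String]): Unit = {
    val asUtc = toEpochSeconds(raw, "UTC")
    val asPrague = toEpochSeconds(raw, "Europe/Prague")
    // Prague is at UTC+2 in August, so the two readings are two hours apart.
    println(s"Difference in seconds: ${asUtc - asPrague}")
  }
}
```

If the data was written as UTC but the global default is some other zone, every timestamp from that source ends up off by the zone offset, which is why a per-source-type default is useful.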

Additional

Update the documentation with this configuration option.

Proposed Solution

Solution Ideas

  1. Upon startup, have the implicit Defaults be based on the source type.
  2. First check for defaultTimestampTimeZone.[source_type]; if it is not found, use the existing logic.
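
The two ideas above could be sketched roughly as follows. This is a simplified illustration, not the project's implementation: the `Defaults` trait here is a hypothetical stand-in that models only the time zone, the class name `DefaultsBySourceType` is invented for the example, and the configuration is a plain `Map` rather than a real configuration file.

```scala
import java.util.TimeZone

object DefaultsSketch {
  // Hypothetical stand-in for the project's Defaults; only the time zone part is modeled.
  trait Defaults {
    def defaultTimestampTimeZone: Option[String]
  }

  // Idea 2: look for the source-type-specific key first, then use the existing global key.
  final class DefaultsBySourceType(config: Map[String, String], sourceType: String) extends Defaults {
    private val globalKey = "defaultTimestampTimeZone"

    override def defaultTimestampTimeZone: Option[String] =
      config.get(s"$globalKey.$sourceType").orElse(config.get(globalKey))
  }

  // Code that needs a time zone takes Defaults implicitly; the system zone is the last resort.
  def effectiveTimeZone(implicit defaults: Defaults): String =
    defaults.defaultTimestampTimeZone.getOrElse(TimeZone.getDefault.getID)

  def main(args: Array[String]): Unit = {
    val config = Map(
      "defaultTimestampTimeZone" -> "Africa/Johannesburg",
      "defaultTimestampTimeZone.parquet" -> "UTC"
    )
    // Idea 1: build the implicit once at startup, based on the job's source type.
    implicit val defaults: Defaults = new DefaultsBySourceType(config, "parquet")
    println(effectiveTimeZone) // UTC
  }
}
```

Building the implicit once per job keeps the lookup out of the individual parsing routines, which only see a `Defaults` instance.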
@benedeki benedeki added the feature, Standardization, and priority: medium labels Aug 19, 2021
@benedeki benedeki self-assigned this Aug 23, 2021
benedeki added a commit that referenced this issue Sep 2, 2021
* `DefaultsByFormat` extends the `Defaults` trait, being able to read defaults from configuration files
* `DefaultsByFormat` offers further granularity by first checking the format-specific setting and only then taking the global one
* Basic `GlobalDefaults` are not configuration dependent anymore
* Standardization now uses `DefaultsByFormat` for its defaults, where rawFormat is used for the format parameter
benedeki added a commit that referenced this issue Sep 2, 2021
* Removed forgotten pointless configuration
@benedeki benedeki linked a pull request Sep 3, 2021 that will close this issue
benedeki added a commit that referenced this issue Sep 3, 2021
* Placed the `implicit defaults` creation to a more appropriate place
benedeki added a commit that referenced this issue Sep 3, 2021
* Build was failing on UTs missing defaults implicit
benedeki added a commit that referenced this issue Sep 11, 2021
* changed DefaultsByFormat class for easier testing
benedeki added a commit that referenced this issue Sep 13, 2021
* Switched the configuration paths to `enceladus.defaultTimestampTimeZone.default` and `enceladus.defaultTimestampTimeZone.[rawFormat]` respectively
* `defaultTimestampTimeZone` is still supported/read as an obsolete fallback
benedeki added a commit that referenced this issue Sep 14, 2021
benedeki added a commit that referenced this issue Sep 15, 2021
#1887 defaultTimestampTimeZone can be source type specific
* `DefaultsByFormat` extends the `Defaults` trait, being able to read defaults from configuration files
* `DefaultsByFormat` offers further granularity by first checking the format-specific setting and only then taking the global one
* Basic `GlobalDefaults` are not configuration dependent anymore
* Standardization now uses `DefaultsByFormat` for its defaults, where rawFormat is used for the format parameter
* Switched the configuration paths to `enceladus.defaultTimestampTimeZone.default` and `enceladus.defaultTimestampTimeZone.[rawFormat]` respectively
* `defaultTimestampTimeZone` is still supported/read as an obsolete fallback
Co-authored-by: Daniel K <dk1844@gmail.com>
@benedeki
Collaborator Author

benedeki commented Sep 15, 2021

Release notes:
The default time zone used during Standardization can now be configured for each raw format type. The configuration key is standardization.defaultTimestampTimeZone.[format], and it takes precedence over the global setting standardization.defaultTimestampTimeZone.default. (defaultTimestampTimeZone is now obsolete but is still supported as a fallback.)
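
As a rough sketch of the lookup order these release notes describe, the keys can be tried one after another. The key names come from the notes above; everything else is an assumption made for illustration: the object and method names are hypothetical, the configuration is a plain `Map`, and placing the obsolete key last is a guess based on it being described as a fallback.

```scala
object TimeZoneKeyLookup {
  private val Prefix = "standardization.defaultTimestampTimeZone"
  private val ObsoleteKey = "defaultTimestampTimeZone"

  // Keys in the order they are consulted; the position of the obsolete key is an assumption.
  def candidateKeys(rawFormat: String): Seq[String] =
    Seq(s"$Prefix.$rawFormat", s"$Prefix.default", ObsoleteKey)

  // The first configured key wins; None means the caller has to fall back further.
  def resolve(config: Map[String, String], rawFormat: String): Option[String] =
    candidateKeys(rawFormat).collectFirst {
      case key if config.contains(key) => config(key)
    }

  def main(args: Array[String]): Unit = {
    val config = Map(
      "standardization.defaultTimestampTimeZone.default" -> "Africa/Johannesburg",
      "standardization.defaultTimestampTimeZone.parquet" -> "UTC"
    )
    println(resolve(config, "parquet")) // Some(UTC)
    println(resolve(config, "csv"))     // Some(Africa/Johannesburg)
  }
}
```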

benedeki added a commit that referenced this issue Sep 15, 2021
* rename of the configuration prefix from `enceladus.` to `standardization.`
@benedeki benedeki linked a pull request Sep 15, 2021 that will close this issue
benedeki added a commit that referenced this issue Sep 16, 2021
#1887 defaultTimestampTimeZone can be source type specific
* rename of the configuration prefix from `enceladus.` to `standardization.`
dk1844 added a commit that referenced this issue Nov 4, 2021
* Update for next development version 2.24.0-SNAPSHOT

* Suppress download noise in license check

* Suppress compiler warning of obsolete Java (#1892)

* 1868 statistics with missing counts and datasets missing properties (#1873)

* 1868 statistics with missing counts and datasets missing properties

* 1843 Summary page for properties (#1880)

* 1843 Home page with properties,  side panel with missing counts and summary page for properties with tab containing datasets missing that particular property

* Feature/1603 mapping table filtering general (#1879)

* #1603 serde tests for CR and MT DataFrameFilters
(mongo-bson-based serde tests for CR and MT DataFrameFilters, mongo-bson-based serde tests extended for CR with a blank mappingTableFilter)

* #1909 Increase the limit of columns shown in menas column selection

* 1903 Add validation for complex default values in mapping tables on import

* Project config and management updates (#1908)

Project config and management updates
* poc issue template
* CODEOWNERS update
* developers update
* Badges to README.md

* 1881 HyperConformance enceladus_info_version from payload  (#1896)

1881 HyperConformance enceladus_info_version from payload

* #1887 defaultTimestampTimeZone can be source type specific (#1899)

#1887 defaultTimestampTimeZone can be source type specific
* `DefaultsByFormat` extends the `Defaults` trait, being able to read defaults from configuration files
* `DefaultsByFormat` offers further granularity by first checking the format-specific setting and only then taking the global one
* Basic `GlobalDefaults` are not configuration dependent anymore
* Standardization now uses `DefaultsByFormat` for its defaults, where rawFormat is used for the format parameter
* Switched the configuration paths to `enceladus.defaultTimestampTimeZone.default` and `enceladus.defaultTimestampTimeZone.[rawFormat]` respectively
* `defaultTimestampTimeZone` is still supported/read as an obsolete fallback
Co-authored-by: Daniel K <dk1844@gmail.com>

* #1887 defaultTimestampTimeZone can be source type specific (#1916)

#1887 defaultTimestampTimeZone can be source type specific
* rename of the configuration prefix from `enceladus.` to `standardization.`

* #172 Save original timezone information in metadata file (#1900)

* Upgrade of Atum to 3.6.0
* Writing the default time zones for timestamps and dates into _INFO file

* #1894 `HadoopFsPersistenceFactory` - adding Spline S3 write support (#1912)

* #1894 Spline S3 support via custom persistence factory `HadoopFsPersistenceFactory`.
Co-authored-by: David Benedeki <14905969+benedeki@users.noreply.github.com>

* Update versions for release v2.24.0

* Update for next development version 2.25.0-SNAPSHOT

* #1926 Add executor extra java opts to helper scripts

* #1931 Add switch for running kinit in helper scripts

* #1882 Update Cobrix dependency to v.2.3.0

* #1882 Remove explicit "collapse_root" since it is the default since Cobrix 2.3.0

* #1882 Update Cobrix to 2.4.1 and update Cobol test suite for ASCII files.

* #1882 Bump up Cobrix version to 2.4.2.

* #1927 Spline _LINEAGE and Atum _INFO files permission alignment (#1934)

* #1927 - testing setup: set both spline _LINEAGE and atum _INFO to hdfs file permissions 733 -> the result on EMR HDFS was 711 (due to 022 umask there) -> evidence of working

* #1927 - cleanup of test settings of 733 fs permissions

* #1927 Atum final version 3.7.0 used instead of the snapshot (same code)

* #1927 comment change

* #1927 - default 644 FS permissions for both _INFO and _LINEAGE files.

* 1937 limit output file size (#1941)

* 1937 limit output file size

* 1937 limit output file size

* 1937 renamings + constants

* 1937 more conditions

* 1937 rename params

* 1937 feedback + script params

* 1937 more feedback

* 1937 final feedback

* #1951: Windows Helper scripts - add missing features
* `ADDITIONAL_JVM_EXECUTOR_CONF`
* Kerberos configuration
* Trust store configuration
* kinit execution option
* `--min-processing-block-size` & `--max-processing-block-size`
* logo improvement

* * --min-processing-block-size -> --min-processing-partition-size
* --max-processing-block-size -> --max-processing-partition-size

* #1869: SparkJobs working with LoadBalanced Menas (#1935)

* `menas.rest.retryCount` - configuration of how many times a URL should be retried if it fails with a retry-able error
* `menas.rest.availability.setup` - configuration, how the url list should be handled
* _Standardization_, _Conformance_ and _HyperConformance_ changed to provide retry count and availability setup to Dao, read from configuration
* `ConfigReader` enhanced and unified to read configurations more easily and universally
* Mockito upgraded to 1.16.42

Co-authored-by: Daniel K <dk1844@gmail.com>

* Feature/1863 mapping table filtering (#1929)

* #1863 mapping cr & mt filter successfully reuses the same fragment (both using the same named model)
 - todo reuse validation, reuse manipulation methods

* #1863 FilterEdit.js allows reusing filterEdit TreeTable logic between mCR and MT editings

* #1863 mCT editing validation enabled (commons from FilterEdit.js)

* #1863 mCT datatype hinting enabled (commons from DataTypeUtils.js)

* #1863 mCR/MT edit dialog default width=950px, some cleanup
* #1863 bugfixes: directly creating MT with filter (fix on accepting the field), UI fix for MT filter model initialization

* #1863 npm audit fix

* #1863 bugfix: adding new mCR (when no edit MCR dialog has been opened yet) did not work - fixed

* #1863 selecting mapping column from MT schema works (for all schema levels) for edit. TODO = Schema type support

 #1863 mCR - schema-based columns suggested for filter, value types filled in silently during submit, too.

* #1863 bugfix: empty MT - schema may be empty

* #1863 bugfix: removing a filter left a null node - cleanup was needed (otherwise view would fail)
logging cleanup

* #1863 select list item now shows valueType as additionalText, cleanup

* #1863 nonEmptyAndNonNullFilled - map->filter bug fixed.

* #1863 typo for null filter

Co-authored-by: David Benedeki <14905969+benedeki@users.noreply.github.com>

* Update versions for release v2.25.0

* [merge] build fix

* [merge] npm audit fix

* [merge] npm audit fix

* [merge] buildfix (menas->rest_api packaging fix)

* [merge] review updates

Co-authored-by: David Benedeki <benedeki@volny.cz>
Co-authored-by: Saša Zejnilović <zejnils@gmail.com>
Co-authored-by: David Benedeki <14905969+benedeki@users.noreply.github.com>
Co-authored-by: Adrian Olosutean <adi.olosutean@gmail.com>
Co-authored-by: Ruslan Iushchenko <yruslan@gmail.com>