Spline 0.3 doesn't write lineage file to S3 #1894

benedeki · 2021-08-25T08:11:28Z

Describe the bug

When the output path for either Standardization or Conformance is an S3 storage path, the _SPLINE file is not created.

To Reproduce

Steps to reproduce the behavior:

1. Have a dataset with publish path set to S3
2. Run Standardization and Conformance with Spline active on the dataset
3. See that the publish folder contains data, but no _SPLINE file

(Same if the intermediate destination after Standardization is located on S3)

Expected behavior

The _SPLINE file is present with normal content.


dk1844 · 2021-09-17T19:35:38Z

Release notes:
Enceladus can write Spline's _LINEAGE file in an S3 location.

* Update for next development version 2.24.0-SNAPSHOT
* Suppress download noise in license check
* Suppress compiler warning of obsolete Java (#1892)
* 1868 statistics with missing counts and datasets missing proprties (#1873)
* 1868 statistics with missing counts and datasets missing proprties
* 1843 Summary page for properties (#1880)
* 1843 Home page with properties, side panel with missing counts and summary page for properties with tab containing datasets missing that particular property
* Feature/1603 mapping table filtering general (#1879)
* #1603 serde tests for CR and MT DataFrameFilters (mongo-bson-based serde tests for CR and MT DataFrameFilters, mongo-bson-based serde tests extended for CR with a blank mappingTableFilter)
* #1909 Increase the limit of columns shown in menas column selection
* 1903 Add validation for complex default values in mapping tables on import
* Project config and management updates (#1908) Project config and management updates
* poc issue template
* CODEOWNERS update
* developers update
* Badges to README.md
* 1881 HyperConformance enceladus_info_version from payload (#1896) 1881 HyperConformance enceladus_info_version from payload
* #1887 defaultTimestampTimeZone can be source type specific (#1899) #1887 defaultTimestampTimeZone can be source type specific
* `DefaultsByFormat` extends the `Defaults` trait, being able to read defaults from configuration files
* `DefaultsByFormat` offers further granularity by first checking the format specific setting only then taking the global one
* Basic `GlobalDefaults` are not configuration dependent anymore
* Standardization now user `DefaultsByFormat` for its defaults, where rawFormat is used for format parameter
* Switched to configuration path to be `enceladus.defaultTimestampTimeZone.default` and `enceladus.defaultTimestampTimeZone.[rawFormat]` respectively
* `defaultTimestampTimeZone` is still supported/read as an obsolete fallback
Co-authored-by: Daniel K <dk1844@gmail.com>
* #1887 defaultTimestampTimeZone can be source type specific (#1916) #1887 defaultTimestampTimeZone can be source type specific
* rename of the configuration prefix from `enceladus.` to `standardization.`
* #172 Save original timezone information in metadata file (#1900)
* Upgrade of Atum to 3.6.0
* Writing the default time zones for timestamps and dates into _INFO file
* #1894 `HadoopFsPersistenceFactory` - adding Spline S3 write support (#1912)
* #1894 Spline S3 support via custom persistence factory `HadoopFsPersistenceFactory`.
Co-authored-by: David Benedeki <14905969+benedeki@users.noreply.github.com>
* Update versions for release v2.24.0
* Update for next development version 2.25.0-SNAPSHOT
* #1926 Add executor extra java opts to helper scripts
* #1931 Add switch for running kinit in helper scripts
* #1882 Update Cobrix dependency to v.2.3.0
* #1882 Remove explicit "collapse_root" since it is the default since Cobrix 2.3.0
* #1882 Update Cobrix to 2.4.1 and update Cobol test suite for ASCII files.
* #1882 Bump up Cobrix version to 2.4.2.
* #1927 Spline _LINEAGE and Atum _INFO files permission alignment (#1934)
* #1927 - testing setup: set both spline _LINEAGE and atum _INFO to hdfs file permissions 733 -> the result on EMR HDFS was 711 (due to 022 umask there) -> evidence of working
* #1927 - cleanup of test settings of 733 fs permissions
* #1927 Atum final version 3.7.0 used instead of the snapshot (same code)
* #1927 comment change
* #1927 - default 644 FS permissions for both _INFO and _LINEAGE files.
* 1937 limit output file size (#1941)
* 1937 limit output file size
* 1937 limit output file size
* 1937 renamings + constants
* 1937 more conditions
* 1937 rename params
* 1937 feedback + script params
* 1937 more feedback
* 1937 final feedback
* #1951: Windows Helper scripts - add missing features
* `ADDITIONAL_JVM_EXECUTOR_CONF`
* Kerberos configuration
* Trust store configuration
* kinit execution option
* `--min-processing-block-size` & `--max-processing-block-size`
* logo improvement
* --min-processing-block-size -> --min-processing-partition-size
* --max-processing-block-size -> --max-processing-partition-size
* #1869: SparkJobs working with LoadBalanced Menas (#1935)
* `menas.rest.retryCount` - configuration, how many times an url should be retried if failing with retry-able error implemented
* `menas.rest.availability.setup` - configuration, how the url list should be handled
* _Standardization_, _Conformance_ and _HyperConformance_ changed to provide retry count and availability setup to Dao, read from configuration
* `ConfigReader` enhanced and unified to read configurations more easily and universally
* Mockito upgraded to 1.16.42
Co-authored-by: Daniel K <dk1844@gmail.com>
* Feature/1863 mapping table filtering (#1929)
* #1863 mapping cr & mt fitler successfully reuses the same fragment (both using the same named model) - todo reuse validation, reuse manipulation methods
* #1863 FilterEdit.js allows reusing filterEdit TreeTable logic between mCR and MT editings
* #1863 mCT editing validation enabled (commons from FilterEdit.js)
* #1863 mCT datatype hinting hinting enabled (commons from DataTypeUtils.js)
* #1863 mCR/MT edit dialog default width=950px, some cleanup
* #1863 bugfixes: directly creating MT with filter (fix on accepting the field), UI fix for MT filter model initialization
* #1863 npm audit fix
* #1863 bugfix: adding new mCR (when no edit MCR dialog has been opened yet) did not work - fixed
* #1863 selecting mapping column from MT schema works (for all schema levels) for edit. TODO = Schema type support #1863 mCR - schema-based columns suggested for filter, value types filled in silently during submit, too.
* #1863 bugfix: empty MT - schema may be empty
* #1863 bugfix: removing a filter left a null node - cleanup was needed (otherwise view would fail) logging cleanup
* #1863 select list item now shows valueType as additionalText, cleanup
* #1863 nonEmptyAndNonNullFilled - map->filter bug fixed.
* #1863 typo for null filter
Co-authored-by: David Benedeki <14905969+benedeki@users.noreply.github.com>
* Update versions for release v2.25.0
* [merge] build fix
* [merge] npm audit fix
* [merge] npm audit fix
* [merge] buildfix (menas->rest_api packaging fix)
* [merge] review updates
Co-authored-by: David Benedeki <benedeki@volny.cz>
Co-authored-by: Saša Zejnilović <zejnils@gmail.com>
Co-authored-by: David Benedeki <14905969+benedeki@users.noreply.github.com>
Co-authored-by: Adrian Olosutean <adi.olosutean@gmail.com>
Co-authored-by: Ruslan Iushchenko <yruslan@gmail.com>

benedeki added the labels bug (Something isn't working), 3rd party issue (Issue not our own in origin but we are slowed down by them), priority: medium (Important but not urgent), cloud (Anything concerning usage and deployment in cloud) Aug 25, 2021

benedeki added this to the Enceladus in cloud - wave 2 milestone Aug 25, 2021

benedeki assigned dk1844 Sep 6, 2021

dk1844 mentioned this issue Sep 6, 2021

Enceladus #1894: S3 support for HdfsDataLineageWriter AbsaOSS/spline#953

Closed

dk1844 added a commit that referenced this issue Sep 9, 2021

#1894 Spline S3 support via custom persistence factory `HadoopFsPersistenceFactory`.

8068960

dk1844 added a commit that referenced this issue Sep 10, 2021

#1894 Spline S3 support via custom persistence factory `HadoopFsPersistenceFactory`.

50957f8

dk1844 linked a pull request Sep 10, 2021 that will close this issue

#1894 HadoopFsPersistenceFactory - adding Spline S3 write support #1912

Merged

dk1844 added a commit that referenced this issue Sep 13, 2021

#1894 licence fix

87fa4f6

dk1844 added a commit that referenced this issue Sep 14, 2021

#1894 PR review updates (code style, visibility, excessive JSONSerialization is linked from Spline, ... )

717b928

dk1844 added a commit that referenced this issue Sep 14, 2021

#1894 PR review updates: `za.co.absa.enceladus.spline.persistence.HadoopFsPersistenceFactory` added to the spline.properties.template

83ace7e
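
For readers wondering what that template entry might look like, here is a minimal sketch. The class name is the one given in the commit title; the property key `spline.persistence.factory` is an assumption based on Spline 0.3 configuration conventions and is not quoted anywhere in this issue, so check it against the actual spline.properties.template before relying on it.

```properties
# Hypothetical excerpt of spline.properties (the key name is assumed, see note above).
# Selects the Enceladus-provided persistence factory so that the lineage file
# can be written to non-HDFS Hadoop-compatible storage such as S3.
spline.persistence.factory=za.co.absa.enceladus.spline.persistence.HadoopFsPersistenceFactory
```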

dk1844 added a commit that referenced this issue Sep 16, 2021

#1894 buildfix after merge with develop having (atum 3.5+) - based on release notes howto https://github.com/AbsaOSS/atum/releases/tag/v3.5.0

67d547a

dk1844 closed this as completed in #1912 Sep 17, 2021
