[Improve][Sort] Import all changelog mode data ingest into Hive #4030

Closed
2 tasks done
thexiay opened this issue Apr 29, 2022 · 0 comments · Fixed by #4065

@thexiay
Contributor

thexiay commented Apr 29, 2022

Description

Hive does not support CDC (changelog) sinks, so we could provide an option that treats all changelog data as insert-only data. This would let us write canal-json data into Hive, with the row kind stored as an ordinary field of the Hive schema.

For example:

CREATE TABLE mysql_inlong_0505 (
  `id` BIGINT,
  `name` STRING,
  `op_type` STRING METADATA FROM 'meta.op_type',
  `log_ts` TIMESTAMP_LTZ(3) METADATA FROM 'meta.op_ts' VIRTUAL,
  PRIMARY KEY(`id`) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc-inlong',
  'hostname' = 'localhost',
  'username' = 'root',
  'password' = '123456',
  'database-name' = 'test',
  'table-name' = 'mysql_inlong_0505'
);

CREATE TABLE hive_table (
  `id` BIGINT,
  `name` STRING,
  `op_type` STRING
) PARTITIONED BY (dt STRING, hr STRING, me STRING) STORED AS parquet;

INSERT INTO hive_table
SELECT id, name, op_type, DATE_FORMAT(log_ts, 'yyyy-MM-dd'), DATE_FORMAT(log_ts, 'HH'), DATE_FORMAT(log_ts, 'mm')
FROM mysql_inlong_0505;
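Below is a plain Python sketch of the proposed option, independent of Flink and InLong. It assumes the behaviour described above: every changelog record, whatever its row kind, is emitted as an ordinary insert. The record's row kind is kept as a value in an extra column, here `op_type` to match the Hive table above. The `RowKind` enum, its values, and the `to_append_only` helper are hypothetical illustrations, not InLong or Flink APIs. Flink's own `RowKind` uses the short strings `+I`, `-U`, `+U` and `-D` for the same four kinds.

```python
from enum import Enum
from typing import Any, Dict, Iterable, Iterator, Tuple


class RowKind(Enum):
    """Hypothetical stand-in for the four change kinds of a changelog stream."""
    INSERT = "+I"
    UPDATE_BEFORE = "-U"
    UPDATE_AFTER = "+U"
    DELETE = "-D"


Row = Dict[str, Any]


def to_append_only(
    changelog: Iterable[Tuple[RowKind, Row]], op_field: str = "op_type"
) -> Iterator[Row]:
    """Turn a changelog into insert-only rows (hypothetical helper).

    Each input record is (row kind, column values). Every record is kept and
    emitted as a plain insert, with its row kind name stored in `op_field`.
    """
    for kind, row in changelog:
        out = dict(row)  # copy so the caller's row is not mutated
        out[op_field] = kind.name
        yield out


if __name__ == "__main__":
    changes = [
        (RowKind.INSERT, {"id": 1, "name": "a"}),
        (RowKind.UPDATE_BEFORE, {"id": 1, "name": "a"}),
        (RowKind.UPDATE_AFTER, {"id": 1, "name": "b"}),
        (RowKind.DELETE, {"id": 1, "name": "b"}),
    ]
    for r in to_append_only(changes):
        print(r)
```

In this sketch, an update appears in the sink as two rows (`UPDATE_BEFORE` and `UPDATE_AFTER`). Whether the real option should keep or drop the before-image is left open here.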

InLong Component

InLong Sort

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

@healchow healchow added this to the 1.2.0 milestone Apr 29, 2022
@healchow healchow changed the title [Improve][Sort] option to produce insert-only stream in canal json format [Improve][Sort] Support to produce insert-only stream in canal JSON format Apr 29, 2022
@healchow healchow changed the title [Improve][Sort] Support to produce insert-only stream in canal JSON format [Improve][Sort] Import data in Canal JSON format from Kafka into Hive in insert-only mode Apr 29, 2022
@thexiay thexiay changed the title [Improve][Sort] Import data in Canal JSON format from Kafka into Hive in insert-only mode [Improve][Sort] Import all changelog mode data ingest into Hive May 6, 2022