Skip to content

HIVE-21671: Replicate Streaming ingestion with transactional batch size as 1.#615

Closed
sankarh wants to merge 2 commits intoapache:masterfrom
sankarh:HIVE-21671
Closed

HIVE-21671: Replicate Streaming ingestion with transactional batch size as 1.#615
sankarh wants to merge 2 commits intoapache:masterfrom
sankarh:HIVE-21671

Conversation

@sankarh
Copy link
Contributor

@sankarh sankarh commented May 6, 2019

No description provided.

public static void classLevelSetup() throws Exception {
HashMap<String, String> overrides = new HashMap<>();
overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
GzipJSONMessageEncoder.class.getCanonicalName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test suite for json format also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not needed as no new event added for streaming support. Other suit covers the write events.

new MiniDFSCluster.Builder(conf).numDataNodes(1).format(true).build();
HashMap<String, String> acidEnableConf = new HashMap<String, String>() {{
put("fs.defaultFS", miniDFSCluster.getFileSystem().getUri().toString());
put("hive.support.concurrency", "true");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set strict managed table to true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, no need as it have significance only for table type which we set explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyways, added it.

.run("select msg from " + tblName + " order by msg")
.verifyResults((new String[] {"val1", "val2"}));

// Begin another transaction, write more records and commit 2nd transaction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit second after load

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. updating the comment.

"clustered by (id) into 5 buckets " +
"stored as orc tblproperties(\"transactional\"=\"true\")");

// Static partition values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two test case are identical ..common code can be consolidated to one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it separate is more readable. Also, it's not exactly same. We have where clause in static partition test which is not there in unpartitioned one.

String key = (partitionValues == null) ? tableObject.getFullyQualifiedName()
: partitionValues.toString();
if (!writePaths.containsKey(key)) {
writePaths.put(key, new WriteDirInfo(partitionValues, writeDir));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if for same partition value path is already there ..is it ok to ignore it ? i think assert can be added for writeDir to be same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not possible to have more than one delta directory for partition from same streaming txn. So, even if multiple bucket files created within same delta dir, this is guaranteed that writeDir is same. Will add assert.

.finalDestination(partitionPath);

// Add write directory information in the connection object.
conn.addWriteDirectoryInfo(partitionValues, AcidUtils.baseOrDeltaSubdirPath(partitionPath, options));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be always delta ..using base or delta may give wrong impression that it can be base or delete delta in some cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a common method used by multiple places. Also, the input options decide what is the dir type. I think, it is not confusing as reader would understand from the code. I will keep it as it is.

recordWriter.close();

// Add write notification events if it is enabled.
conn.addWriteNotificationEvents();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write notification logs should be written before commit txn event

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is already called before commit. Otherwise test wouldn't pass.

@sankarh sankarh force-pushed the HIVE-21671 branch 3 times, most recently from 1da72bb to 179720b Compare May 8, 2019 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants