Avoid checking Accumulo table exists before creation by jmark99 · Pull Request #74 · apache/accumulo-examples

jmark99 · 2021-04-05T18:21:08Z

Rather than checking if a table exists prior to creation, just create, catch and ignore TableExistsException instead.

This ticket removes the explicit check where appropriate. During the process several other small changes were made to the affected classes as well.

For the Bloom related examples, the table names were updated to use contants. Several methods were created to remove some redundant code.

For the CharacterHistogram.java class, an unneeded import was also removed.

Closes #13

Rather than checking if a table exists prior to creation, create, catch and ignore TableExistsException instead. This ticket removes the explicit check where appropriate. During the process several other small changes were made to the affected classes as well. For the Bloom related examples, the table names were updated to use contants. Several methods were created to remove some redundant code. For the CharacterHistogram.java class, an unneeded import was also removed. Closes apache#13

EdColeman · 2021-04-05T18:50:11Z

spark/src/main/java/org/apache/accumulo/spark/CopyPlus5K.java

+      try {
+        client.tableOperations().create(inputTable);
+      } catch (TableExistsException e) {
+        // ignore


Does this not represent an actual error condition - this implies that the proceeding table delete failed? It shouldn't happen, but if it does then it seems something needs investigation.

EdColeman · 2021-04-05T18:56:02Z

src/main/java/org/apache/accumulo/examples/bloom/BloomFiltersNotFound.java

-        client.tableOperations().setProperty("bloom_test4", "table.bloom.enabled", "true");
+        client.tableOperations().create(BLOOM_TEST3);
+      } catch (TableExistsException e) {
+        // ignore


If you run the test twice, do you get double the data? Does that matter?

EdColeman · 2021-04-05T19:04:29Z

A general question - could ignoring a table exists cause confusion if the results are examined by the user? Say for an ingest, running the example twice might lead to doubling the ingest - if I expected 10 rows but instead had 20 after a second run would that be worse than having the second fail because the table was there? Or are the scanners that are pulling the expect number of rows ingested and would only return partial results on multiple runs? Would it help if the exception was logged instead of silently ignored?

Another thought is if the examples are to serve as "recommended coding practices" is there benefit of showing the handling of table exists exceptions something we would recommend?

Overall I have no issues with the changes, just mainly curious as to why TableExists exceptions would be ignored.

jmark99 · 2021-04-05T19:48:23Z

@EdColeman those are good questions to ask. I assume the initial intent of the ticket was to prevent checking for the existence of the table prior to every creation. The current state of the change keeps the examples working as before. That is, previously the code would check for the existence of the table and if it already existed would just keep chugging along without performing the actual creation. If errors arose or if the data was incorrect after a second consecuteve run, then the example would fail somewhere down the line or just have incorrect output (at least for most of the ones that I see).

Ideally, prior to running the examples you would want the table to not even exist. That way we would hopefully see consistent results between each run. I briefly considered updating the examples to delete tables if they already existed prior to execution, but I was worried if someone ran an example and had a 'real' table with a name from the example that could cause a problem. I guess that could still occur if even if the examples run and names were identical.

Another possiblity is to update them to provide a message to the user to remove any existing tables before running and exit. That would allow the user to clear up existing tables and then run the example. I would probably prefer that path at the moment. What are your thoughts on that approach?

I did run the integration tests back to back and they all still passed. I did not examine each example individually to see what would happen if run back to back.

So in summary, the current change keeps all of the examples running in the same manner as they were already running, but without the explicit table existence check. But I am open to updating to one of the aforementioned methods or to another route if you have other suggestions.

jmark99 · 2021-04-05T21:05:46Z

@EdColeman and I had some additional Slack conversations and came up with a few ideas to hopefully improve the examples a bit more. In regards to table creation a couple of additional changes will be made. Firstly, the table names are going to receive a prefix so that there will be no conflict with any existing tables that a user might already be using. Additionally, If a table already exists, this will be logged so that a user can make an informed decision as to whether they would like to clean up the tables prior to running or re-running the examples. The log messages will indicate whether any additional configuration has taken place in cases where the table already existed. Finally, if situations exist where the example should definitely not proceed if a table already exists, the example will be updated to reflect this.

Created Constants class to hold tables names and other constants used throughout the examples. Updated RandomBatchScanner to print log warning if table already exists. Updated SequentialBatchWriter to exit if the required table for scanning does not exist.

Updated to exit with error if table creation fails.

Created common Enum class for table names and constants. Updated table creation code to log warning message if table already exists.

Updated table name in batch.md documentation. Updated to use examples namespace and to warn user if table already exists.

Updated bloom.md docs to represent updated use of namespace with table name.

Updated table creation to warn users if table already exists. Created common enum class to contain table names and other values.

Updated classpath.md documentation to include creation of examples namespace. Also, update config command to use table.class.loader.context rather than deprecated table.classpat.context.

Updated .gitignore to ignore log4j properties in conf directory.

Updated ReadWriteExample and RowOperation classes to use TableExistsException and warning messages to user if table already exists.

Minor style change to array creation in StatsCombiner.java. Updated combiner.md documentation to make use of examples namespace.

Updated compacationStrategy.md documentation to make use of namespaces.

Update log message constants in Constants Enum class.

* Create common enum to contain namespace and table names. * Small modifications to several small methods to read clearer. * Extract common creation of constraints table to common location. * Replace use of System.out.println with logging.

Updated deleteKeyValuePair.md documentation to use example namespaces.

Update dirlist example to use namespaces and to warn users of the prior existence of tables. Create common class to hold table names.

Update export.md documentation to utilize namespaces.

Updated documentation for filter, helloworld, and isolation examples. Updated associated java classes for the aforementioned exmples.

Update the constraint related examples to work with the recent change to the Constraints package. The Constraints were moved to publie API and the location of the imports had to be updated to point to the right location.

Updated docs for isolation, regex, reservaction, rgbalancer, and rowhash. Updated the Constraint related classes.

Update docs for sample and shard example. Update the SampleExample class to handle examples namespace.

Update documentation for wordCount example. Update WordCount class to create namespace and table. Added MRCommon class to contains common mapreduce package variables.

Update Visibility example documentation to reflect the use of the examples namespace. Update contraints path to use new public API path for Constraints.

Update documentation for tabletofile and terasoft examples.

Update batch.md documentation. Modify access to table name Strings in BloomCommon. Remove default names for tables in Shard example.

Updates several documentation files to add namespace info. Updates various classes to use the common createTableWithNamespace method. Updates the declaration of table names in various classes.

Update filedata documentation to note that the ChunkCombiner class must be avaiable in the accumulo lib directory or on the classpath somewhere in order to scan the create examples.dataTable.

EdColeman · 2021-05-06T19:12:02Z

spark/src/main/java/org/apache/accumulo/spark/CopyPlus5K.java

+      try {
+        client.tableOperations().create(inputTable);
+      } catch (TableExistsException e) {
+        low.error("Something went wrong. Table '{}' should have been deleted prior to creation "


Should this be log?

EdColeman · 2021-05-06T19:12:35Z

spark/src/main/java/org/apache/accumulo/spark/CopyPlus5K.java

+      try {
+        client.tableOperations().create(outputTable);
+      } catch (TableExistsException e) {
+        low.error("Something went wrong. Table '{}' should have been deleted prior to creation "


Should this be log (instead of low)

Yes :) Thanks for the catch. It was not caught by compile since this class is not in the src path of the examples.

Replace misspelled word - low -> log

EdColeman

The changes look okay, except for the two minor questions.

I tried to build, but I think I need additional set-up for that to work.

jmark99 · 2021-05-07T11:47:47Z

@EdColeman, yes, you need to have a running instance and update the examples conf files accordingly. For a couple of the examples the examples jar file must be in the Accumulo lib directory or somewhere on the classpath. I use fluo-uno for running the examples. I've gone through them all and except for the couple of issues that are still listed in GitHub, they are working as expected. Are you good with me pushing the changes at this time?

EdColeman reviewed Apr 5, 2021

View reviewed changes

jmark99 marked this pull request as draft April 5, 2021 20:55

jmark99 added 24 commits April 12, 2021 09:27

Avoid checking Accumulo table exists before creation

0c9efd0

Updated to exit with error if table creation fails.

Avoid checking Accumulo table exists before creation

a500887

Created common Enum class for table names and constants. Updated table creation code to log warning message if table already exists.

Avoid checking Accumulo table exists before creation

633aeba

Updated table name in batch.md documentation. Updated to use examples namespace and to warn user if table already exists.

Avoid checking Accumulo table exists before creation

4c241b8

Updated bloom.md docs to represent updated use of namespace with table name.

Avoid checking Accumulo table exists before creation

7a33ac0

Updated table creation to warn users if table already exists. Created common enum class to contain table names and other values.

Avoid checking Accumulo table exists before creation

1845cf7

Updated classpath.md documentation to include creation of examples namespace. Also, update config command to use table.class.loader.context rather than deprecated table.classpat.context.

Avoid checking Accumulo table exists before creation

7930e54

Updated .gitignore to ignore log4j properties in conf directory.

Avoid checking Accumulo table exists before creation

bcdb0af

Updated ReadWriteExample and RowOperation classes to use TableExistsException and warning messages to user if table already exists.

Avoid checking Accumulo table exists before creation

f7809cc

Minor style change to array creation in StatsCombiner.java. Updated combiner.md documentation to make use of examples namespace.

Avoid checking Accumulo table exists before creation

2e16da9

Updated compacationStrategy.md documentation to make use of namespaces.

Avoid checking Accumulo table exists before creation

0869acb

Update log message constants in Constants Enum class.

Avoid checking Accumulo table exists before creation

210c8d1

Updated deleteKeyValuePair.md documentation to use example namespaces.

Avoid checking Accumulo table exists before creation

695b9f0

Update dirlist example to use namespaces and to warn users of the prior existence of tables. Create common class to hold table names.

Avoid checking Accumulo table exists before creation

bc4361e

Update export.md documentation to utilize namespaces.

Avoid checking Accumulo table exists before creation

255ca19

Updated documentation for filter, helloworld, and isolation examples. Updated associated java classes for the aforementioned exmples.

Avoid checking Accumulo table exists before creation

cc77603

Update the constraint related examples to work with the recent change to the Constraints package. The Constraints were moved to publie API and the location of the imports had to be updated to point to the right location.

Avoid checking Accumulo table exists before creation

04536ba

Updated docs for isolation, regex, reservaction, rgbalancer, and rowhash. Updated the Constraint related classes.

Avoid checking Accumulo table exists before creation

7bc81a8

Update docs for sample and shard example. Update the SampleExample class to handle examples namespace.

Avoid checking Accumulo table exists before creation

855d8b2

Update documentation for wordCount example. Update WordCount class to create namespace and table. Added MRCommon class to contains common mapreduce package variables.

Avoid checking Accumulo table exists before creation

f4f9a30

Update Visibility example documentation to reflect the use of the examples namespace. Update contraints path to use new public API path for Constraints.

Avoid checking Accumulo table exists before creation

7c9ca4a

Update documentation for tabletofile and terasoft examples.

Avoid checking Accumulo table exists before creation

4feda7e

Update batch.md documentation. Modify access to table name Strings in BloomCommon. Remove default names for tables in Shard example.

jmark99 added 2 commits May 6, 2021 12:35

Avoid checking Accumulo table exists before creation

4e0a472

Updates several documentation files to add namespace info. Updates various classes to use the common createTableWithNamespace method. Updates the declaration of table names in various classes.

Avoid checking Accumulo table exists before creation

77ceefb

Update filedata documentation to note that the ChunkCombiner class must be avaiable in the accumulo lib directory or on the classpath somewhere in order to scan the create examples.dataTable.

jmark99 marked this pull request as ready for review May 6, 2021 17:45

EdColeman reviewed May 6, 2021

View reviewed changes

Avoid checking Accumulo table exists before creation

6da9745

Replace misspelled word - low -> log

EdColeman reviewed May 6, 2021

View reviewed changes

jmark99 merged commit ef2e04e into apache:main May 7, 2021

jmark99 deleted the ae13 branch May 12, 2021 18:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid checking Accumulo table exists before creation#74

Avoid checking Accumulo table exists before creation#74
jmark99 merged 28 commits intoapache:mainfrom
jmark99:ae13

jmark99 commented Apr 5, 2021

Uh oh!

EdColeman Apr 5, 2021

Uh oh!

EdColeman Apr 5, 2021

Uh oh!

EdColeman commented Apr 5, 2021

Uh oh!

jmark99 commented Apr 5, 2021

Uh oh!

jmark99 commented Apr 5, 2021

Uh oh!

EdColeman May 6, 2021

Uh oh!

EdColeman May 6, 2021

Uh oh!

jmark99 May 6, 2021

Uh oh!

EdColeman left a comment

Uh oh!

jmark99 commented May 7, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jmark99 commented Apr 5, 2021

Uh oh!

EdColeman Apr 5, 2021

Choose a reason for hiding this comment

Uh oh!

EdColeman Apr 5, 2021

Choose a reason for hiding this comment

Uh oh!

EdColeman commented Apr 5, 2021

Uh oh!

jmark99 commented Apr 5, 2021

Uh oh!

jmark99 commented Apr 5, 2021

Uh oh!

EdColeman May 6, 2021

Choose a reason for hiding this comment

Uh oh!

EdColeman May 6, 2021

Choose a reason for hiding this comment

Uh oh!

jmark99 May 6, 2021

Choose a reason for hiding this comment

Uh oh!

EdColeman left a comment

Choose a reason for hiding this comment

Uh oh!

jmark99 commented May 7, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants