Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utility to create an empty wal file. #4116

Merged
merged 22 commits into from
Apr 9, 2024

Conversation

EdColeman
Copy link
Contributor

The utility creates an empty wal file with a version 4 header. This parallels the CreateEmpty utility to generate an empty RFile.

A hex dump of a created file.

00000000  2d 2d 2d 20 4c 6f 67 20  46 69 6c 65 20 48 65 61  |--- Log File Hea|
00000010  64 65 72 20 28 76 34 29  20 2d 2d 2d 00 00 00 07  |der (v4) ---....|
00000020  55 2b 31 46 34 37 42 00  00 00 00 02 00 00 00 00  |U+1F47B.........|
00000030  00 00                                             |..|
00000032

The created file can be read by the wal-info utility

2023-12-22T19:46:22,240 [conf.SiteConfiguration] INFO : Found Accumulo configuration on classpath at [deleted]/fluo-uno/install/accumulo-3.1.0-SNAPSHOT/conf/accumulo.properties
OPEN 

2023-12-22T19:46:23,310 [logger.LogReader] INFO : Done reading /accumulo/empty/e1.wal

I did test this using an uno instance, by kill the tserver, replaceing the wal that didn't have mutations and the then restarted the tserver.

Relevant log fragments: (the wal replaced with the generated empty was 48857908-d9c2-48ec-8bb0-5d3e9ca2cce4)

tserver log on recovery

2023-12-22T16:53:07,533 [log.LogSorter] INFO : Finished log sort 48857908-d9c2-48ec-8bb0-5d3e9ca2cce4 50 bytes 1 parts in 211ms
2023-12-22T16:53:07,536 [log.LogSorter] INFO : Finished log sort 7b435c97-4952-476f-8862-a6cc79f0cd3d 6684 bytes 1 parts in 212ms
...
2023-12-22T16:53:27,885 [log.SortedLogRecovery] INFO : Found 1 of 2 logs with max id 1 for tablet +r<<
2023-12-22T16:53:27,896 [log.SortedLogRecovery] INFO : Recovering mutations, tablet:+r<< tabletId:1 seq:30 logs:[7b435c97-4952-476f-8862-a6cc79f0cd3d]
2023-12-22T16:53:27,924 [tablet.recovery] INFO : For +r<< recovered 0 mutations creating 0 entries from 0 walogs
2023-12-22T16:53:27,948 [tserver.TabletClientHandler] INFO : Root tablet loaded: +r<<
2023-12-22T16:53:28,258 [tserver.TabletServer] INFO : Writing log marker for hdfs://localhost:8020/accumulo/wal/ip-10-113-15-130+9997/80e442de-23d7-413f-9612-dc542c54a319
2023-12-22T16:53:28,262 [log.TabletServerLogger] INFO : Using next log hdfs://localhost:8020/accumulo/wal/ip-10-113-15-130+9997/80e442de-23d7-413f-9612-dc542c54a319
2023-12-22T16:53:28,293 [tserver.TabletServer] INFO : Writing log marker for hdfs://localhost:8020/accumulo/wal/ip-10-113-15-130+9997/443901c3-53cf-4275-a4d0-a9f18b60f735
2023-12-22T16:53:28,478 [log.SortedLogRecovery] INFO : Tablet !0<;~ is not defined in recovery logs [48857908-d9c2-48ec-8bb0-5d3e9ca2cce4, 7b435c97-4952-476f-8862-a6cc79f0cd3d]
2023-12-22T16:53:28,509 [tablet.recovery] INFO : For !0<;~ recovered 0 mutations creating 0 entries from 0 walogs
2023-12-22T16:53:28,566 [log.SortedLogRecovery] INFO : Tablet !0;< is not defined in recovery logs [48857908-d9c2-48ec-8bb0-5d3e9ca2cce4, 7b435c97-4952-476f-8862-a6cc79f0cd3d]
2023-12-22T16:53:28,575 [tablet.recovery] INFO : For !0;
< recovered 0 mutations creating 0 entries from 0 walogs

@EdColeman EdColeman added this to In progress in 3.1.0 via automation Dec 22, 2023
@EdColeman EdColeman self-assigned this Dec 22, 2023
Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this could reuse the serialization code in the WAL classes more, in some way, rather than writing an effectively copied implementation of it for this utility. The WAL code is a little bit of a mess, though, and needs to be simplified a bit. Probably won't be done before this PR is done, but would be good to revisit this implementation after some WAL internals code cleanup.

@Parameter(
description = " <path> the path / filename of the created empty wal file. The file cannot exist",
required = true)
String walFilename;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is to work like CreateEmpty for wal files, this needs to be:

Suggested change
String walFilename;
List<String> files = new ArrayList<>();

And then the execute method needs to loop over each file name to create a WAL for each one.

Also, like CreateEmpty, each path should be a URL and be written to a Path in HDFS (org.apache.hadoop.fs.Path), not a java.nio.file.Path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall, when needing to use an empty rfile it was convenient to have a "local" file. That file could the be reused when needed by using hadoop copyFromLocal to the desired directory and filename. The empty rfile utility was only needed when the empty.rf "seed" file was not available.

It would seem this could be similar. If you have a recovery problem and you want to move on (and accept the data loss) you would identify the file(s) that are missing / corrupt and then copy an empty wal to replace them. If the files are corrupt or cannot be processed, would need to delete the current files and then replace them with the empty file.

Rather than running this utility to create each empty file on demand, you could create an single file and then either use the command line or better, a simple script to do the delete / copy.

The file could be created in hdfs, but then you would likely need to create a temporary directory in the proper namespace - in which case copyFromLocal is just as easy to use.

If the intended usage is the utility would to create the files directly - you would still need to delete the original files. Overwriting is probably a bad idea - if there was mistake, you could clobber valid files.

I assumed that the intention of this utility was to simply create the empty seed file. These changes can be done. but it seems to be unnecessarily complicating things to create that empty wal seed file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall, when needing to use an empty rfile it was convenient to have a "local" file.

The empty RFile create tool doesn't create a local file, though. It creates one in the DFS. However, you can specify a local file by using a URL filename pattern, like file:///path/to/empty.rf.

you could create an single file and then either use the command line or better, a simple script to do the delete / copy.

I agree, and think that's probably how most will use it. But the current CreateEmpty is more flexible than that by allowing creation of multiple at a time. I don't know how people will use that, but that's how it's implemented.

If the intended usage is the utility would to create the files directly - you would still need to delete the original files. Overwriting is probably a bad idea - if there was mistake, you could clobber valid files.

Understood, but creating in DFS isn't about overwriting. It's about where you're placing the seed file before you move it into place. You can place it in the DFS, so it's a simple hdfs mv instead of a copyFromLocal.

I assumed that the intention of this utility was to simply create the empty seed file. These changes can be done. but it seems to be unnecessarily complicating things to create that empty wal seed file.

It can definitely serve that purpose, but I do not want to restrict the user's ability to do things in batches, if that's how they've written their scripts. and it's not that much complexity to just loop over the file names to create multiple. This is already a feature for the CreateEmpty utility. Using the DFS API to create a file from a URL rather than a local file does add a tiny bit of complexity, but I still think it's worth it to keep parity with the behavior of CreateEmpty. Users don't need to learn two different sets of quirks/restrictions for two, essentially equivalent, utilities that vary only in the type of empty file being created. In fact, I think this utility could just be a --type wal|rf option to the existing CreateEmpty, with a default type of rf, which would make adding this utility very trivial... but that would require more complex changes to the filename validation in the JCommander Opts. So, to keep things the same for both utilities, you can either add a tiny bit of complexity to what you've written in this separate utility, or you can add a bit of a complexity in the filename validation and argument parsing for the existing utility, and move your impl into that for the WAL case. Either way, I think there's value in keeping the behavior of the two utilities as close as possible. Divergence invites confusion.

Ed Coleman added 4 commits January 25, 2024 16:08
  - removes stand-alone create-wal from earlier commits
  - relocates create-empty from core to tserver util becuase of LogEntry
    dependency
  - updates tests
@EdColeman
Copy link
Contributor Author

4741d79 removes create-empty-wal as a stand-alone and adds a type option [rfile, wal] to create-empty.

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you added the crypto service stuff into the RFile creation, too. That's great!

Comment on lines 98 to 99
Process processInfo =
cluster.exec(CreateEmpty.class, "--type", "rfile", rfile.toString()).getProcess();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backwards compatibility, the rfile could be the default type if not specified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the default type and this worked without the change, but I thought that being specific here was better that relying on the default behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking this could be a nice coverage for the default case. As long as there's some coverage for the default case, I think either is fine here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll change it back to test the default. At one point I was "why did this work?" and then realized that oh, yea, it is the default.... ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted RecoveryWithEmptyRFileIT to use default in 2124f38. Also added a specific test in CreateEmptyTest to catch if the default changes without needing to run an IT.

EdColeman and others added 3 commits January 26, 2024 00:53
…CreateEmpty.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…CreateEmpty.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
createEmpty.createEmptyRFile(opts, context);
VolumeManager vm = context.getVolumeManager();
assertTrue(vm.exists(new Path(file1)));
assertTrue(vm.exists(new Path(file2)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could attempt to read these files using the RFile APIs in the public API. Could open the file and assert the number read is zero like the following.

    try(var scanner = RFile.newScanner().from(file1).build()){
      assertEquals(0, scanner.stream().count());
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added read checks in ede1e8a


createEmpty.createEmptyWal(opts, context);

checkWalContext(file1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this check could call org.apache.accumulo.tserver.logger.LogReader to ensure these files can be read. LogReader is another user facing util.

However, not sure if its easily possible to do that. If it seems like a huge amount of work, could open a follow on issue or ignore this comment. This would be easy from the command line with accumulo installed as both create empty and log reader can be executed using the accumulo command, but may not be easy to do in a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4d34d80 I added an additional scan of the wal file (based on what was in LogReader) but not sure it adds much to what was already there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4d34d80 I added an additional scan of the wal file (based on what was in LogReader) but not sure it adds much to what was already there.

I like what was added there. If uses the same code that LogSorter uses to open the wal and sort it, so that is a good test for regression purposes.

@EdColeman EdColeman merged commit ac80267 into apache:main Apr 9, 2024
8 checks passed
3.1.0 automation moved this from In progress to Done Apr 9, 2024
@EdColeman EdColeman deleted the add_empty_wal_util branch April 9, 2024 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
3.1.0
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants