Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: provide facility for retaining obsolete WALs, MANIFESTs, and sstables #321

Merged
merged 1 commit into from Oct 4, 2019

Conversation

@hueypark
Copy link
Contributor

hueypark commented Oct 1, 2019

For debugging purposes, db cleaner option can set as archive cleaner. If this option is set, the files are renamed to the archive directory instead of deleting.

Fixes #284

@petermattis

This comment has been minimized.

Copy link
Collaborator

petermattis commented Oct 1, 2019

This change is Reviewable

Copy link
Collaborator

petermattis left a comment

Thanks for the unexpected contribution!

Rather than a Cleaner enum, I'm wondering if there should be a Cleaner interface with a DefaultCleaner implementation that does normal file deletion. This would provide a place to hook for performing rate-limiting of file deletions (see #129).

Reviewable status: 0 of 6 files reviewed, 6 unresolved discussions (waiting on @hueypark)


compaction.go, line 1368 at r1 (raw file):

// archivebsoleteFile archives file that is no longer needed.
func (d *DB) archivebsoleteFile(fileType fileType, path string) error {

s/archivebsoleteFile/archiveObsolete/File/g


compaction.go, line 1376 at r1 (raw file):

	}

	destDir := "archive"

This should be a subdirectory of d.dir. As it is written right now, you'll be trying to open archive even on a filesystem which will be relative to where the process started. So my comment below that the directory to use should be either d.dirname or d.walDirname depending on the file type.


compaction.go, line 1388 at r1 (raw file):

		return err
	}
	dir, err := fs.OpenDir(destDir)

You don't seem to be doing anything with the directory. Unless you think we need to sync the directory, which I don't think is necessary for this usage, I think you can remove this OpenDir call.


compaction.go, line 1406 at r1 (raw file):

	destPath := fs.PathJoin(destDir, fs.PathBase(path))
	err = vfs.LinkOrCopy(fs, path, destPath)

On Unix, hard link followed by remove is similar to rename, but on Windows it is different. I think we'd get more portability by using vfs.Rename here. And I suspect we don't want the Copy fallback as doing so make file deletion really slow. Since archiving is intended only for debugging purposes, I think it is fine to require that the archive directory exists on the same filesystem as the data directory. The one place that may need attention is if a separate WALDir has been specified. Perhaps that should be handled by having <dir>/archive and <wal-dir>/archive.


internal/base/options.go, line 22 at r1 (raw file):

const (
	DefaultCleaner Cleaner = iota
	CleanCleaner

CleanCleaner is repetitive and a bit of a mouthful. How about DeleteCleaner?


testdata/cleaner, line 84 at r1 (raw file):

mkdir-all: archive 0755
open-dir: archive
link: db/000005.sst -> archive/000005.sst

loggingFS currently doesn't log anything for Remove which is an oversight and also makes this test confusing.

@hueypark hueypark force-pushed the hueypark:archive branch from 430b9be to 20c3941 Oct 2, 2019
Copy link
Contributor Author

hueypark left a comment

TFTR!

I created a Cleaner interface.

Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @petermattis)


compaction.go, line 1368 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/archivebsoleteFile/archiveObsolete/File/g

Done.
Technically, the archivebsoleteFile function has been erased.


compaction.go, line 1376 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This should be a subdirectory of d.dir. As it is written right now, you'll be trying to open archive even on a filesystem which will be relative to where the process started. So my comment below that the directory to use should be either d.dirname or d.walDirname depending on the file type.

Done.


compaction.go, line 1388 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

You don't seem to be doing anything with the directory. Unless you think we need to sync the directory, which I don't think is necessary for this usage, I think you can remove this OpenDir call.

Done.


compaction.go, line 1406 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

On Unix, hard link followed by remove is similar to rename, but on Windows it is different. I think we'd get more portability by using vfs.Rename here. And I suspect we don't want the Copy fallback as doing so make file deletion really slow. Since archiving is intended only for debugging purposes, I think it is fine to require that the archive directory exists on the same filesystem as the data directory. The one place that may need attention is if a separate WALDir has been specified. Perhaps that should be handled by having <dir>/archive and <wal-dir>/archive.

Done.


internal/base/options.go, line 22 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

CleanCleaner is repetitive and a bit of a mouthful. How about DeleteCleaner?

Done.


testdata/cleaner, line 84 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

loggingFS currently doesn't log anything for Remove which is an oversight and also makes this test confusing.

Done.

Copy link
Collaborator

petermattis left a comment

The Cleaner interface approach looks much better. Another round of comments, but I think this is all directionally good.

Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hueypark and @petermattis)


syncing_fs.go, line 10 at r2 (raw file):

// syncingFS exports the base.SyncingFS type.
type syncingFS = base.SyncingFS

You should be able to revert this because ArchiveCleaner doesn't need to use SyncingFS. See my comment in cleaner.go.


internal/base/cleaner.go, line 19 at r2 (raw file):

// DeleteCleaner deletes file.
type DeleteCleaner struct{}

I think these implementations could be moved to the pebble package. Generally, we've tried to keep internal/base for interfaces, and for implementations that need to be shared by multiple packages. I believe DeleteCleaner and ArchiveCleaner only need to exist in the top-level pebble package.


internal/base/cleaner.go, line 37 at r2 (raw file):

	switch fileType {
	case FileTypeLog, FileTypeManifest, FileTypeTable:
		destDir := filepath.Dir(path) + "/archive"

filepath.Dir won't work correctly with memFS on Windows. I think we need to add a FS.PathDir. I needed that somewhere else recently and managed to avoid it. I think this is good evidence that it is useful to add.

Also, use fs.PathJoin instead of hardcoding the path separator.


internal/base/cleaner.go, line 44 at r2 (raw file):

				BytesPerSync: bytesPerSync,
			},
		}

I don't think SyncingFS is necessary here because Create will never be called.


testdata/cleaner, line 84 at r1 (raw file):

Previously, hueypark (Jaewan Park) wrote…

Done.

Oh, I think my comment was misinterpreted. I liked the use of loggingFS as it displayed the filesystem operations being performed. The confusing part is that Remove operations were not showing up. Please re-add the use of loggingFS. It should be clearer what is going on now that you've switched to Rename.

@hueypark hueypark force-pushed the hueypark:archive branch from 20c3941 to 6b310fa Oct 2, 2019
Copy link
Contributor Author

hueypark left a comment

Reviewable status: 0 of 10 files reviewed, 6 unresolved discussions (waiting on @petermattis)


syncing_fs.go, line 10 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

You should be able to revert this because ArchiveCleaner doesn't need to use SyncingFS. See my comment in cleaner.go.

Done.


internal/base/cleaner.go, line 19 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think these implementations could be moved to the pebble package. Generally, we've tried to keep internal/base for interfaces, and for implementations that need to be shared by multiple packages. I believe DeleteCleaner and ArchiveCleaner only need to exist in the top-level pebble package.

Since it is used as the default for Options, it looks good to leave it for now.


internal/base/cleaner.go, line 37 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

filepath.Dir won't work correctly with memFS on Windows. I think we need to add a FS.PathDir. I needed that somewhere else recently and managed to avoid it. I think this is good evidence that it is useful to add.

Also, use fs.PathJoin instead of hardcoding the path separator.

Done.


internal/base/cleaner.go, line 44 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I don't think SyncingFS is necessary here because Create will never be called.

Done.


testdata/cleaner, line 84 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Oh, I think my comment was misinterpreted. I liked the use of loggingFS as it displayed the filesystem operations being performed. The confusing part is that Remove operations were not showing up. Please re-add the use of loggingFS. It should be clearer what is going on now that you've switched to Rename.

Done.

Copy link
Collaborator

petermattis left a comment

This looks great. A few super tiny comments. Thanks for the contribution!

Reviewable status: 0 of 10 files reviewed, 4 unresolved discussions (waiting on @hueypark)


compaction.go, line 1328 at r3 (raw file):

// deleteObsoleteFile deletes file that is no longer needed.
func (d *DB) deleteObsoleteFile(fileType fileType, jobID int, path string, fileNum uint64) {
	// TODO(peter): need to handle this errror, probably by re-adding the

Happened to notice this misspelling in this existing comment: s/errror/error/g.


internal/base/cleaner.go, line 19 at r2 (raw file):

Previously, hueypark (Jaewan Park) wrote…

Since it is used as the default for Options, it looks good to leave it for now.

Oh, good point.


internal/base/cleaner.go, line 13 at r3 (raw file):

// Cleaner cleans obsolete files.
type Cleaner interface {
	Clean(fs vfs.FS, fileType FileType, bytesPerSync int, path string) error

I think you can remove the bytesPerSync parameter now.


internal/base/cleaner.go, line 25 at r3 (raw file):

func (DeleteCleaner) String() string {
	return "Delete"

We don't have consistency around names that show up in the OPTIONS file, but I lean towards making these lower case for new items. So s/"Delete"/"delete"/g and s/"Archive"/"archive"/g below

…bles

For debugging purposes, db cleaner option can set as archive cleaner. If this option is set, the files are renamed to the archive directory instead of deleting.

Fixes #284
@hueypark hueypark force-pushed the hueypark:archive branch from 6b310fa to 6e0e629 Oct 4, 2019
Copy link
Contributor Author

hueypark left a comment

Reviewable status: 0 of 10 files reviewed, 4 unresolved discussions (waiting on @hueypark and @petermattis)


compaction.go, line 1328 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Happened to notice this misspelling in this existing comment: s/errror/error/g.

Done.


internal/base/cleaner.go, line 13 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think you can remove the bytesPerSync parameter now.

Done.


internal/base/cleaner.go, line 25 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

We don't have consistency around names that show up in the OPTIONS file, but I lean towards making these lower case for new items. So s/"Delete"/"delete"/g and s/"Archive"/"archive"/g below

Done.

@petermattis

This comment has been minimized.

Copy link
Collaborator

petermattis commented Oct 4, 2019

Thanks again for your work on this, @hueypark.

@petermattis petermattis merged commit c9ca3a9 into cockroachdb:master Oct 4, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@hueypark

This comment has been minimized.

Copy link
Contributor Author

hueypark commented Oct 6, 2019

TFTR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.