Non-copying transaction log recovery #339

Merged (151 commits) on May 12, 2018

etschannen commented May 8, 2018

With this set of changes, transaction logs no longer copy all of the mutations from the previous generation.

  • Storage servers pull data only from their local DC. Previously they pulled only from the most recent set of logs, relying on the fact that the most recent set of logs would eventually copy data from previous sets of logs.

  • Transaction logs copy the data between the known committed version and the recovery version of the last set of logs. These versions may be at an artificially reduced fault tolerance level: logs that were temporarily unavailable at the time of the recovery have the known committed version but might not have the recovery version. We therefore copy this data at the start of every new generation of logs. In addition, the transaction logs copy any versions that have not been made durable in the local DC, which ensures the local storage servers have a location to retrieve those versions from. Because a generation of logs may accept writes without ever recruiting remote log servers, we may need to pull data from multiple previous generations.

  • We now rely on all storage servers popping their data to determine when old log generations are no longer needed. The TXS tag now needs to re-write all its data every generation.
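
To make the two rules above concrete, here is a toy model, not FoundationDB code. Every name in it (`LogGeneration`, `versions_to_copy`, `discardable`) is hypothetical. It assumes that the copied span is every version after the known committed version up to and including the recovery version, and that a tag's popped version means the tag no longer needs anything at or below it. The real boundary conditions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGeneration:
    """Hypothetical stand-in for one generation (epoch) of transaction logs."""
    epoch: int
    end_version: int  # last version this generation accepted (assumed inclusive)


def versions_to_copy(known_committed: int, recovery_version: int) -> range:
    """Versions a new generation copies from the previous generation.

    Assumption: the span is (known_committed, recovery_version], i.e. the
    versions that might not be present on every log of the old generation.
    """
    if recovery_version < known_committed:
        raise ValueError("recovery version cannot precede the known committed version")
    return range(known_committed + 1, recovery_version + 1)


def discardable(old_generations, popped_by_tag):
    """Old generations that every storage server tag has popped past.

    popped_by_tag maps a (hypothetical) tag name to the highest version that
    tag has popped. A generation is kept while any tag may still read from it.
    """
    if not popped_by_tag:
        return []  # no pop information yet, so nothing is known to be safe
    min_popped = min(popped_by_tag.values())
    return [g for g in old_generations if g.end_version <= min_popped]
```

For example, with `old = [LogGeneration(1, 100), LogGeneration(2, 250)]`, calling `discardable(old, {"ss-0": 300, "ss-1": 120})` returns only the epoch 1 generation, because the slowest storage server has not yet popped past version 250. `list(versions_to_copy(240, 243))` yields `[241, 242, 243]`, a short span compared with copying the whole previous generation.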

etschannen added some commits Mar 29, 2018

first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet
fix: do not reuse tags that are still in historyTags, pop historyTags past epochEnd to allow tlogs to finish recovery

fix: peekLocal did not properly respect end
fix: the storage server added to the end of the history vector instead of the beginning
fix: using only one region still means we need 3 machines per datacenter, the other machines in the other datacenters just won’t be used
fix: pop all tags that did not have data at the recovery version because fully popped tags may come back when pullAsyncData re-indexes the mutations
fix: peekLocal does not stop when a locality does not exist
fix: lock logs only stops on special or upgraded locality
fix: recruiting old log routers respects the passed in startVersion
fix: tlogs are now initialized immediately, instead of when starting the core, this must be done to pop the log routers during recovery

fix: log router start version must be the same as remote log start version
fix: the known committed version of a newly initialized log is 1, since by definition the first commit must have succeeded

etschannen and others added some commits May 9, 2018

Merge pull request #348 from etschannen/release-5.2
DR upgrade tests now test the durability of the data.
Addressed review comments.
Remove redundant FDBLibTLS/ITLSPlugin.h.
Merge pull request #350 from etschannen/release-5.2
updated release notes for 5.2
Merge pull request #353 from ajbeamon/release-5.2
version stamp -> versionstamp
Merge branch 'release-5.2'
# Conflicts:
#	fdbrpc/TLSConnection.h
Merge pull request #352 from etschannen/release-5.2
Properly handle endpoint failures on the client for all request types
Add secure_connection param to BlobStore to configure security.
Default is https. Setting secure_connection=0 makes it http.
Merge pull request #359 from bnamasivayam/release-5.2
Add secure_connection param to BlobStore to configure security.
Merge pull request #360 from etschannen/release-5.2
fix: white space issue in getKnobDescription

alexmiller-apple left a comment

Comments to be discussed in person.

alexmiller-apple merged commit e8afc37 into apple:master on May 12, 2018

grandinj commented on fdbserver/DataDistribution.actor.cpp in b1935f1 Jun 1, 2018

seems like a line of commentary was accidentally omitted here?

alecgrieser added this to the 6.0 milestone on Jul 10, 2018
