
Fix pausing and stopping after upload #3712

Open · wants to merge 9 commits into master from seantempleton:pauseandstop
Conversation

@seantempleton (Contributor) commented Apr 3, 2019

The BasicResults class was not calling Pause, Resume, or Stop on
m_taskController, the instance of ITaskReader that is passed to all of
the worker classes, which use it to check whether they should pause or
stop.

This fixes pausing/resuming and stopping after upload. Stopping immediately
also works, but because aborting can corrupt the database, the Stop now
button has been removed until the corruption issues are fixed.

Changed the BackendUploader to check TransferProgressAsync instead of
ProgressAsync, as the transfer version is meant for the worker that
uploads files.

Made the MetadataPreProcess and BackupHandler classes catch CoCoL
retired exceptions and ignore them. If the user stops the operation,
they should not see warnings or errors about a retired channel; they
asked for it to be retired.

Also added cancellation support to the SFTP backend, as it was missed when
cancellation support was added to the other backends.
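
The SFTP change itself isn't shown in this thread; as a rough illustration (hypothetical names, not Duplicati's actual backend code), cancellation support in an upload usually means passing the CancellationToken down into the chunked copy loop:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: stream a file to a remote endpoint in chunks,
// honoring the CancellationToken between chunks so a stop request can
// interrupt a transfer that is already in flight.
static class UploadSketch
{
    public static async Task PutAsync(Stream local, Stream remote, CancellationToken cancelToken)
    {
        var buffer = new byte[81920];
        int read;
        while ((read = await local.ReadAsync(buffer, 0, buffer.Length, cancelToken)) > 0)
        {
            cancelToken.ThrowIfCancellationRequested();
            await remote.WriteAsync(buffer, 0, read, cancelToken);
        }
    }
}
```

A backend that never consults the token (as the SFTP backend apparently didn't) will simply run each Put to completion regardless of a stop request.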

Fixes #3565

@warwickmm (Contributor) commented Apr 4, 2019

Nice! I believe this might address issue #1088 as well?

@warwickmm (Contributor) commented Apr 4, 2019

When testing this, I did encounter a strange case when pausing did appear to pause the backup, but the progress bar kept updating with "X files (Y bytes) to go at Z MB/s", where X would slowly decrease. This could lead the user to think that the backup was not actually paused. Are you able to reproduce this?

  1. Start a backup.
  2. Wait until the progress bar says "X files (Y bytes) to go at Z MB/s".
  3. Pause the backup.
  4. Watch the progress bar and observe the "Z MB/s" slowly decreasing.

@seantempleton (Contributor, Author) commented Apr 5, 2019

My understanding was the intention of #1088 is to pause or cancel the actual uploads, rather than pausing the backup job and letting the uploads that are currently in progress finish.

I can indeed reproduce the slowly decreasing transfer rate. The thread that computes the transfer rate continues to run when everything is paused. It could be paused when the user presses Pause, but since the in-progress uploads continue until they finish, the displayed rate would then not be telling the truth.

I can see a way to make it reflect the transfer rate, and then show nothing after all uploads are done, by checking the state of m_transfer in TaskControl. (It's not a public value right now and I'm not sure it is intended to be.) If paused, wait for active uploads to finish and then show nothing.
The TransferProgressAsync function causes the calling thread to wait, so it can't be used to stop updating the rate, since the rate would need to keep being calculated while in the paused state.

Although I kind of want to tackle the TODO comment in BackendProgressUpdater about using a dynamic window for calculating the transfer rate. Maybe have a 10-second rolling window. Once paused it would still continue to show the rate while the uploads are going; once they are finished it would slowly wind down to 0 over 10 seconds. That is probably outside the scope of this change, though. Thoughts?
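
That rolling-window idea can be sketched roughly like this (a hypothetical class, not the actual BackendProgressUpdater code): keep timestamped byte counts and average over the window, so the reported rate naturally decays to zero after the last upload finishes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a rolling-window transfer rate: samples older
// than the window are dropped, so after uploads stop the reported rate
// winds down to zero over one window length.
class RollingRate
{
    private readonly Queue<(DateTime Time, long Bytes)> m_samples = new Queue<(DateTime, long)>();
    private readonly TimeSpan m_window;

    public RollingRate(TimeSpan window) { m_window = window; }

    public void AddSample(long bytes, DateTime now)
    {
        m_samples.Enqueue((now, bytes));
        Trim(now);
    }

    public double BytesPerSecond(DateTime now)
    {
        Trim(now);
        long total = 0;
        foreach (var s in m_samples)
            total += s.Bytes;
        return total / m_window.TotalSeconds;
    }

    // Drop samples that have aged out of the window.
    private void Trim(DateTime now)
    {
        while (m_samples.Count > 0 && now - m_samples.Peek().Time > m_window)
            m_samples.Dequeue();
    }
}
```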

@warwickmm (Contributor) commented Apr 5, 2019

I'm not very familiar with how the progress is updated, but what if we add some elements to OperationPhase, say Pause_WaitForUpload and Paused, and invoke m_operationProgressUpdater.UpdatePhase in the ResultClasses? We can then inform the user that we are about to pause (but an in progress upload needs to be finished), and then later indicate that we are paused (after the upload is finished).

I think this would require modifying Duplicati/Server/webroot/ngax/scripts/services/ServerStatus.js to handle these new states as well.

Would that even work? Or a dumb idea?

@seantempleton (Contributor, Author) commented Apr 9, 2019

I like your proposed idea in general. The problem is that calling ProgressAsync or TransferProgressAsync causes the calling thread to wait if we are in a paused state. So we have no way of knowing ahead of time if we would need to update the OperationPhase. In addition the BackendUploader or someone would have to know when all current uploads are finished and then update the UI again, but that can't happen if a thread is waiting for the ManualResetEvent in BasicResults to be signaled.
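
For context, the blocking behavior described here comes from a pause gate along these lines (a simplified sketch, not the actual TaskControl implementation):

```csharp
using System.Threading;

// Simplified sketch of a pause gate: while paused the event is reset, so
// any worker calling WaitForProgress() blocks until Resume() sets it
// again. This is why a thread that is waiting on the gate cannot also be
// the one that updates the UI phase afterwards.
class TaskControlSketch
{
    private readonly ManualResetEventSlim m_pauseEvent = new ManualResetEventSlim(true);

    public void Pause() => m_pauseEvent.Reset();
    public void Resume() => m_pauseEvent.Set();
    public bool IsPaused => !m_pauseEvent.IsSet;

    // Blocks the calling worker while the task is paused.
    public void WaitForProgress() => m_pauseEvent.Wait();
}
```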

I went ahead and tried out adding a rolling window to BackendProgressUpdater for calculating the transfer rate per second. It was actually a lot easier than I thought it was going to be. When the uploads finish while paused the displayed transfer rate slowly drops over 20 seconds and then disappears.

If the thinking is that we ought to solve the issue in a more direct way, similar to your proposal, I'll save that for a future PR.

@warwickmm (Contributor) commented Apr 9, 2019

Thanks for looking into this. I think as long as the UI makes it clear that the backup is paused, that's an improvement.

Perhaps we should focus on the issue with pausing and stopping here, and handle the notification separately.

@warwickmm (Contributor) commented Apr 9, 2019

It looks like ignoring the CoCoL retired exception causes some of the failed Put exceptions in issue #3673 to be unobserved. Are we sure that this is a safe thing to do? Are we possibly ignoring other important exceptions by doing this?

@kenkendk (Member) commented Apr 9, 2019

> I like your proposed idea in general. The problem is that calling ProgressAsync or TransferProgressAsync causes the calling thread to wait if we are in a paused state. So we have no way of knowing ahead of time if we would need to update the OperationPhase

Would it work for the call itself to update the state? Otherwise there could be an AreWePaused property, although that should be used carefully, as there could be a race between reading the property and calling the method.

> When testing this, I did encounter a strange case when pausing did appear to pause the backup, but the progress bar kept updating with "X files (Y bytes) to go at Z MB/s", where X would slowly decrease. This could lead the user to think that the backup was not actually paused.

The intention was to avoid pausing the transfers, as that would cause timeouts and retries. So a pause would stop processing files (effectively pause the CPU usage) but let the active uploads continue. Looking at the code, it seems that invoking Pause will now also pause transfers?

> It looks like ignoring the CoCoL retired exception causes some of the failed Put exceptions in issue #3673 to be unobserved. Are we sure that this is a safe thing to do? Are we possibly ignoring other important exceptions by doing this?

@warwickmm: The RetiredException is thrown whenever the channel is closed (or disposed). Once this happens, the channel is considered "retired" and always throws the exception.

As a design in CoCoL, this looks nice, but when using it in a real scenario, like Duplicati, it has some drawbacks. Sometimes the closing of a channel is intentional and part of a normal flow, which is when we want to ignore it. Other times, the channel is closed because something else breaks, and in this case we do still want to ignore it, but there is no mechanism to report the original error.

I think I want to change CoCoL to allow it to report an error close and a normal close, like NormalRetiredException and ErrorRetiredException, where the latter has a ReasonException property (names are not final, but the general idea is the same).

@seantempleton (Contributor, Author) commented Apr 11, 2019

@warwickmm I think I misread your idea with a new operation phase at first. Rereading it, I like it for getting us to the Pause_WaitForUpload state. Then the problem is getting to the Paused state. We could have the BackendUploader take IOperationProgressUpdaterAndReporter in its constructor. That has an event for when the phase changes as well as the ability to update the phase. The BackendUploader could keep note of phase changes, and when we hit the Pause_WaitForUpload state, then after each upload finishes that thread can check if it was the last one, and if so update to Paused.

It will probably be a little clunky due to needing to handle multi-threading issues in the BackendUploader, and it also might be a little awkward because Pause_WaitForUpload would be triggered in BasicResults and Paused would be triggered in the BackendUploader. But I think that should work.

> Looking at the code, it seems that invoking Pause will now also pause transfers?

@kenkendk This will not pause the uploads that are currently in progress, but it will prevent any new ones. The BackendUploader calls TransferProgressAsync at the beginning of its while loop, but all of the uploads on other threads will continue on as normal. I could rename that boolean from alsoTransfers to alsoNewTransfers to make that clearer.

Regarding the RetiredException, I'll put that back to how it was. I agree it wouldn't be a good idea to swallow those exceptions in cases where they legitimately need to be reported. I like your idea, @kenkendk, of having unique exceptions for each scenario.

@seantempleton seantempleton force-pushed the seantempleton:pauseandstop branch 2 times, most recently from daac9a1 to 7877e3a Apr 13, 2019

@seantempleton (Contributor, Author) commented Apr 13, 2019

I changed this up a little bit from the initial plan. I added the CurrentPhase property to IOperationProgress so the BackendUploader can check what phase we are in. It also now uses a timeout when waiting for input, catches the (expected) exception from the timeout, and no longer calls TransferProgressAsync regularly, so the thread is not blocked when a pause happens.

If the phase changes to Pause_WaitForUpload it will wait for current uploads, change the phase to paused, and then call TransferProgressAsync to wait for when it should resume.

The StateController on the frontend will show the current transfer rate while waiting for uploads to finish and then disappear when the phase changes to Paused.
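
The flow described above can be sketched minimally like this (the names CurrentPhase, Pause_WaitForUpload, and Paused follow the discussion; the rest of the scaffolding is assumed for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

enum OperationPhase { Backup_Uploading, Pause_WaitForUpload, Paused }

// Hypothetical sketch of the uploader's pause handling: on
// Pause_WaitForUpload, drain the in-flight uploads, report Paused
// (so the UI can hide the transfer rate), then block until resumed.
class UploaderPauseSketch
{
    public OperationPhase CurrentPhase { get; set; } = OperationPhase.Backup_Uploading;
    public List<Task> ActiveUploads { get; } = new List<Task>();
    public TaskCompletionSource<bool> ResumeSignal { get; } = new TaskCompletionSource<bool>();

    public async Task HandlePauseAsync()
    {
        if (CurrentPhase != OperationPhase.Pause_WaitForUpload)
            return;
        await Task.WhenAll(ActiveUploads);    // let in-flight uploads finish
        CurrentPhase = OperationPhase.Paused; // UI can now hide the transfer rate
        await ResumeSignal.Task;              // blocks here until resumed
    }
}
```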

@warwickmm (Contributor) commented Apr 13, 2019

This is great. I think the updated progress is very useful.

I did come across a case where stopping (via Stop Now) the backup did not actually stop the uploads. The UI indicated the usual error (the thread was aborted), but the uploads continued. Are you able to replicate this?

@seantempleton (Contributor, Author) commented Apr 14, 2019

I was able to reproduce not stopping when pressing Stop Now. I need to start doing a better job of testing after my initial changes. That worked for me on my initial commit but I broke it in a later commit. Thanks for testing and catching problems warwickmm.

I was also having problems with the transfer rate displaying when waiting for uploads to finish during a pause. I am confident that was working earlier so I think that is another case of not testing after making changes. I moved the if statement where that work was being done. Would you mind testing that part so I know I'm not crazy?

@warwickmm (Contributor) commented Apr 14, 2019

I can see the transfer rate when waiting for uploads to finish before pausing.

@seantempleton (Contributor, Author) commented Apr 24, 2019

Not sure if it is clear or not but this is ready for review. Since the last push all the testing I have thrown at it has worked for me.

@warwickmm (Contributor) commented Apr 24, 2019

Thanks @seantempleton. I tested this again and unfortunately I still see uploads continuing after requesting "Stop Now". The UI appears as if the operation has stopped (after a "The thread was aborted" error), but the uploads (to an SFTP backend) continue. The backup source is a little over 200 MB, the stop was requested when about 15 MB of each of the 4 dblock files had been uploaded, but the uploads continue until each dblock file is 50 MB. Are you able to replicate this?

@seantempleton (Contributor, Author) commented Apr 24, 2019

Arg, seems I'm not done yet after all. I'll see if I can reproduce that.

@seantempleton (Contributor, Author) commented Apr 25, 2019

I was testing the FTP backend and was getting discouraged because it seemed to be working properly and then I reread your message and realized you said SFTP! Turns out I somehow didn't actually implement cancellation in the SFTP backend so I'll fix that.

@warwickmm (Contributor) commented Apr 25, 2019

Ah, that makes sense. I should have clarified that I was using SFTP earlier.

@warwickmm (Contributor) commented Apr 27, 2019

I switched to a Google Drive backend and pausing/stopping appears to work without any issues. Is the plan to fix the SFTP backend here, or in a separate pull request?

@seantempleton (Contributor, Author) commented Apr 27, 2019

I'll fix that in this pull request.

@seantempleton seantempleton force-pushed the seantempleton:pauseandstop branch from bd2f5f7 to 4891364 Apr 29, 2019

@warwickmm (Contributor) commented May 3, 2019

Thanks @seantempleton. I was able to stop the backup successfully with the SFTP backend.

However, after stopping/restarting a backup several times, I did encounter a few cases of

Unexpected difference in fileset version 1: 4/29/2019 7:18:16 PM (database id: 1), found 5 entries, but expected 11
2019-04-29 19:43:04 -07 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-MissingRemoteHash]: remote file duplicati-b0a447be5bf4d43898a98ee7531d10eb6.dblock.zip.aes is listed as Uploaded with size 13030122 but should be 52428925, please verify the sha256 hash
"07ZE5Q9/gRI0i6N/xlcYXTSdxvspmApTlW/nHAzb5LQ="
2019-04-29 19:45:12 -07 - [Error-Duplicati.Library.Main.Operation.TestHandler-FailedToProcessFile]: Failed to process file duplicati-bc57dfc389e9d477a9ea9e485f5339950.dblock.zip.aes
2019-04-29 19:46:55 -07 - [Error-Duplicati.Library.Main.Operation.RepairHandler-RemoteFileVerificationError]: Failed to perform verification for file: duplicati-b0a447be5bf4d43898a98ee7531d10eb6.dblock.zip.aes, please run verify; message: File length is invalid

as well as others when backing up, repairing, etc.

This might not be due to changes in this pull request, as it appears that others encounter similar issues when their backups get interrupted (see issue #3644 for example). However, I'm wondering if fixing pausing/stopping will actually increase the occurrence of this annoying database issue, which is difficult to recover from.

I'd like to get other people's thoughts on this. It is really nice to have working pausing/stopping. However, if this comes at the expense of more users encountering unrecoverable database issues, then perhaps we should fix the database issues first?

@seantempleton (Contributor, Author) commented May 6, 2019

I'm torn on this. There are complaints that pausing and stopping don't work, but if the functionality is there and users then start encountering issues, that won't make anyone happy either. I think pausing should be okay in terms of database corruption and other errors, as it allows processes to finish and start again gracefully.

Did you encounter any of those errors when stopping after upload or just stopping immediately?

On the other hand, I assume it is going to take a very long time to fix up the code to allow it to stop and abort without causing errors.

@warwickmm (Contributor) commented May 7, 2019

I don't think I encountered any errors when using pause/stop after upload.

I'm not sure what the best course of action is. Would it be unreasonable to implement these fixes, but disable the "Stop Now" option until the database issues are resolved (this could take a long time, like you mention)? One case I can think of where a user would prefer "Stop Now" over "Stop after upload" is if the volume size is large and bandwidth issues are a concern.

@kenkendk (Member) commented May 7, 2019

I agree that the database corruption issues are not related to the stopping functionality, but an error that will also appear with many kinds of hard-kill scenarios.

But I think this one is caused by the shutdown logic not working correctly:

> 2019-04-29 19:43:04 -07 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-MissingRemoteHash]: remote file duplicati-b0a447be5bf4d43898a98ee7531d10eb6.dblock.zip.aes is listed as Uploaded with size 13030122 but should be 52428925, please verify the sha256 hash
> "07ZE5Q9/gRI0i6N/xlcYXTSdxvspmApTlW/nHAzb5LQ="

For this to happen, the volume is added to the database and put in the uploading state and transfer starts. The backup is then stopped during upload, but for some reason the database record is updated, marking the file as uploaded before stopping.

Once the file has reached the uploaded state, Duplicati assumes the upload has completed without error. But since the file was not written fully, we get the error message.

The uploading state is set here:

await m_database.UpdateRemoteVolumeAsync(item.RemoteFilename, RemoteVolumeState.Uploading, item.Size, item.Hash);

And the uploaded state is set here:

await m_database.UpdateRemoteVolumeAsync(item.RemoteFilename, RemoteVolumeState.Uploaded, item.Size, item.Hash);

If we hard-kill the process we should not see this particular error, because line 392 is not executed, leaving the database with only the uploading state, and Duplicati will delete the partial file on next run. My guess is that the Stop command does not produce an exception from line L382 and/or L385 causing it to continue and incorrectly mark the file as being fully uploaded.
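
A sketch of the guard this analysis suggests (the state names match the quoted code; the helper itself is hypothetical): only record Uploaded when the transfer finished with no stop requested, otherwise leave the volume in Uploading so the next run cleans up the partial file.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: decide the remote volume state after an upload
// attempt. An upload that was cancelled without throwing may have left
// a partial remote file, so it must stay in the Uploading state.
static class UploadStateSketch
{
    public static async Task<string> FinishUploadAsync(Func<Task> upload, CancellationToken cancelToken)
    {
        await upload();
        // The backend returned without throwing, but if a stop was requested
        // the remote file may still be truncated: keep it as Uploading so
        // the next run deletes and re-uploads it.
        return cancelToken.IsCancellationRequested ? "Uploading" : "Uploaded";
    }
}
```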

@seantempleton (Contributor, Author) commented May 8, 2019

I would not be opposed to specifically disabling the Stop Now functionality and leaving the others intact. That would at least give users some control.

Thanks for the tip @kenkendk. I will look and see if that is indeed what is happening for that specific error.

Check for pause or stop when all upload slots are full

When all upload slots are full we wait for one to complete before starting
another. If the user pauses or stops during that time we need to check and
act as needed.

Also added a check if we should cancel uploads right after the check for
whether we should pause.

Set file as uploaded if cancellation was not requested

When an upload is cancelled and an exception is not thrown, check whether
cancellation was requested before marking the file as uploaded.

Remove the stop now button for cancelling a backup

Due to database corruption issues it was decided to prevent users from
aborting an in-progress backup by using the Stop now button. Once
the corruption issues are addressed the button can be re-added.

@seantempleton seantempleton force-pushed the seantempleton:pauseandstop branch from 06ccb1a to 9ea3cc8 Jul 16, 2019

@seantempleton (Contributor, Author) commented Jul 16, 2019

Wow @warwickmm, going above and beyond trying that 50 times!
Sorry, it is ingrained in me to do a rebase to resolve conflicts and that's what I did without thinking. If you would like I can go back and merge master in so you can easily see how the conflicts were resolved.

I'll see if I can reproduce what you were talking about with stopping the backup before it actually starts uploading.

@warwickmm (Contributor) commented Jul 16, 2019

Rebase is fine. 👍

@seantempleton (Contributor, Author) commented Jul 17, 2019

I set up a local Minio server and was able to reproduce the unexpected difference in fileset error. I'll see if I can figure out what is going on.

My random thought is maybe the first backup is stopping before all of the source files are inserted into the database, and then on the 2nd backup it says hey, there are $Total files (since it actually finished locating all of the files this time) but only X were in the database.

@ts678 (Contributor) commented Jul 17, 2019

> I'll see if I can figure out what is going on.

"Unexpected difference" is checked at the start of a backup, so one needs to go backwards to see how it arose. You might read "Unexpected difference in fileset" test case and code clue #3800 for more on that, including detailed technical notes on that test case and screenshots of the DB as the error arose. I used this with --run-script-before, and either ignored or fixed the "cannot find the file" errors while the DB history trail filled:

SET DB="C:\ProgramData\Duplicati\duplicati-2.0.4.22_canary_2019-06-30\RUN\BSUYJQFBGS"
MOVE /Y %DB%.4.sqlite %DB%.5.sqlite
MOVE /Y %DB%.3.sqlite %DB%.4.sqlite
MOVE /Y %DB%.2.sqlite %DB%.3.sqlite
MOVE /Y %DB%.1.sqlite %DB%.2.sqlite
COPY /Y %DB%.sqlite %DB%.1.sqlite
EXIT 0

Here's the SQL that gets run. Sometimes cut-down versions help pinpoint where in the DB the issue lies:

-- From 2.0.4.5 beta Duplicati/Library/Main/Database/LocalDatabase.cs
-- Expanding expandedCmd to study "Unexpected difference in fileset"

SELECT COUNT(*) FROM (SELECT DISTINCT "Path" FROM (
-- [ LIST_FILESETS
SELECT
    "L"."Path", 
    "L"."Lastmodified", 
    "L"."Filelength", 
    "L"."Filehash", 
    "L"."Metahash", 
    "L"."Metalength",
    "L"."BlocklistHash", 
    "L"."FirstBlockHash",
    "L"."FirstBlockSize",
    "L"."FirstMetaBlockHash",
    "L"."FirstMetaBlockSize",
    "M"."Hash" AS "MetaBlocklistHash"
FROM
    (
    SELECT 
        "J"."Path", 
        "J"."Lastmodified", 
        "J"."Filelength", 
        "J"."Filehash", 
        "J"."Metahash", 
        "J"."Metalength",
        "K"."Hash" AS "BlocklistHash", 
        "J"."FirstBlockHash",
        "J"."FirstBlockSize",
        "J"."FirstMetaBlockHash",
        "J"."FirstMetaBlockSize",
        "J"."MetablocksetID"
    FROM 
        (
        SELECT 
	        "A"."Path" AS "Path", 
	        "D"."Lastmodified" AS "Lastmodified", 
	        "B"."Length" AS "Filelength", 
	        "B"."FullHash" AS "Filehash", 
	        "E"."FullHash" AS "Metahash", 
	        "E"."Length" AS "Metalength",
	        "A"."BlocksetID" AS "BlocksetID",
	        "F"."Hash" AS "FirstBlockHash",
	        "F"."Size" AS "FirstBlockSize",
	        "H"."Hash" AS "FirstMetaBlockHash",
	        "H"."Size" AS "FirstMetaBlockSize",
	        "C"."BlocksetID" AS "MetablocksetID"
        FROM 
	        "File" A	
        LEFT JOIN "Blockset" B
          ON "A"."BlocksetID" = "B"."ID" 
        LEFT JOIN "Metadataset" C  
          ON "A"."MetadataID" = "C"."ID"
        LEFT JOIN "FilesetEntry" D
          ON "A"."ID" = "D"."FileID"
        LEFT JOIN "Blockset" E
          ON "E"."ID" = "C"."BlocksetID"
        LEFT JOIN "BlocksetEntry" G
          ON "B"."ID" = "G"."BlocksetID"
        LEFT JOIN "Block" F 
          ON "G"."BlockID" = "F"."ID"  
        LEFT JOIN "BlocksetEntry" I
          ON "E"."ID" = "I"."BlocksetID"
        LEFT JOIN "Block" H 
          ON "I"."BlockID" = "H"."ID"
        WHERE 
          "A"."BlocksetId" >= 0 AND
-- Change FilesetID to desired ID from Fileset table
          "D"."FilesetID" = 99 AND
          ("I"."Index" = 0 OR "I"."Index" IS NULL) AND  
          ("G"."Index" = 0 OR "G"."Index" IS NULL)
        ) J
    LEFT OUTER JOIN 
        "BlocklistHash" K 
    ON 
        "K"."BlocksetID" = "J"."BlocksetID" 
    ORDER BY "J"."Path", "K"."Index"
    ) L

LEFT OUTER JOIN
    "BlocklistHash" M
ON
    "M"."BlocksetID" = "L"."MetablocksetID"
-- ] LIST_FILESETS
) UNION SELECT DISTINCT "Path" FROM (
-- [ LIST_FOLDERS_AND_SYMLINKS
SELECT
    "G"."BlocksetID",
    "G"."ID",
    "G"."Path",
    "G"."Length",
    "G"."FullHash",
    "G"."Lastmodified",
    "G"."FirstMetaBlockHash",
    "H"."Hash" AS "MetablocklistHash"
FROM
    (
    SELECT
        "B"."BlocksetID",
        "B"."ID",
        "B"."Path",
        "D"."Length",
        "D"."FullHash",
        "A"."Lastmodified",
        "F"."Hash" AS "FirstMetaBlockHash",
        "C"."BlocksetID" AS "MetaBlocksetID"
    FROM
        "FilesetEntry" A, 
        "File" B, 
        "Metadataset" C, 
        "Blockset" D,
        "BlocksetEntry" E,
        "Block" F
    WHERE 
        "A"."FileID" = "B"."ID" 
        AND "B"."MetadataID" = "C"."ID" 
        AND "C"."BlocksetID" = "D"."ID" 
        AND "E"."BlocksetID" = "C"."BlocksetID"
        AND "E"."BlockID" = "F"."ID"
        AND "E"."Index" = 0
-- FOLDER_BLOCKSET_ID = -100
-- SYMLINK_BLOCKSET_ID = -200
        AND ("B"."BlocksetID" = -100 OR "B"."BlocksetID" = -200) 
-- Change FilesetID to desired ID from Fileset table
        AND "A"."FilesetID" = 99
    ) G
LEFT OUTER JOIN
   "BlocklistHash" H
ON
   "H"."BlocksetID" = "G"."MetaBlocksetID"
ORDER BY
   "G"."Path", "H"."Index"
-- ] LIST_FOLDERS_AND_SYMLINKS
));

Thanks for digging into this!

@duplicatibot commented Jul 19, 2019

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/fatal-error-detected-non-empty-blocksets-with-no-associated-blocks/5938/43

@ts678 (Contributor) commented Jul 19, 2019

> due to issues of corrupting the database when aborting,
> the stop now button has been removed until the corruption issues are
> fixed.

I mentioned that in Fatal error: Detected non-empty blocksets with no associated blocks but I'm not sure if that's what you saw. Regardless, there's a chase underway now to see if anybody can reproduce it reliably.

@warwickmm (Contributor) commented Jul 19, 2019

That's interesting. I swear that during my tests I was using the "Stop Now" button. Now I don't see it anymore. Perhaps browser caching issues? However, when attempting to stop during a "Verifying backend data" step, a "Stop Now" button was visible at one point.

When I have time, I'll try and run some more tests.

@BlueBlock (Contributor) commented Jul 21, 2019

@seantempleton do I need to be a collaborator to contribute to your branch?

@seantempleton (Contributor, Author) commented Jul 22, 2019

@BlueBlock I think that is correct. If someone wants to push directly to another repository they have to be a collaborator, otherwise they have to fork the repository and create a pull request. What are you looking to do?

@BlueBlock (Contributor) commented Jul 22, 2019

> @BlueBlock I think that is correct. If someone wants to push directly to another repository they have to be a collaborator, otherwise they have to fork the repository and create a pull request. What are you looking to do?

I worked off your branch and incorporated my solution. I'd like to commit it to your branch since it seems to make sense to work within the same branch. It would remove the need to merge later on.

@BlueBlock (Contributor) commented Jul 22, 2019

I think I've got a good solution which merges our two efforts: my small one into your large one.

Your work involves "stop now" and "pause". Mine only involves "stop after the current file".

Fix errors when removing remote volumes caused by stopping

When removing remote volumes the delete queries would not always delete
the correct FilesetEntry or BlocklistHash rows. This could cause errors
when verifying the backup, though the errors would be sporadic.

After using the "Stop after Upload" button over and over and over,
sometimes before any files were actually uploaded and sometimes after,
the two errors below would appear.

"Unexpected difference in fileset" could occur due to the delete for
FilesetEntry deleting based on volume ids and the bsIdsSubQuery returning
other BlocksetIds that should also be deleted. FilesetEntry would then
have a higher count of rows than what would be returned by the LEFT
OUTER JOINS on all of the other tables in VerifyConsistency() since more
rows had been deleted from FileLookup.

"Detected non-empty blocksets with no associated blocks" could occur due
to BlocklistHash rows not being deleted. The row would still be present,
then when starting another backup AddBlockset() in LocalBackupDatabase
would insert a Blockset, the insert into BlocklistHash would throw an
exception due to the unique constraint, and the insert for BlocksetEntry
would never run. That caused orphaned rows to be present in Blockset
since the corresponding row was not inserted in BlocksetEntry.

@seantempleton (Contributor, Author) commented Jul 22, 2019

I encountered the "unexpected difference in fileset" issue and then after preventing that from happening, the "Detected non-empty blocksets with no associated blocks" cropped up. I fixed both of them in my testing but I would definitely like someone more knowledgeable of the database to look at the changes. They seem to make sense to me but at the same time I'm not sure if it's an indication of a deeper problem.

The delete queries that are there have been running that way for 2 to 3 years, so if deleting from several tables by bsIdsSubQuery were a problem, I would have hoped it would have been noticed by now.

@BlueBlock My PR actually removes the Stop Now button so that's not a problem for now and is focused on the "Stop after Upload" and "Pause" buttons. I looked at your commit and I see you added a CancelToken to a couple places. Unless I'm missing something I don't believe that should be needed as the taskReader.ProgressAsync call will throw an exception, which should be caught, and stop the thread.

@BlueBlock (Contributor) commented Jul 23, 2019

> @BlueBlock My PR actually removes the Stop Now button so that's not a problem for now and is focused on the "Stop after Upload" and "Pause" buttons. I looked at your commit and I see you added a CancelToken to a couple places. Unless I'm missing something I don't believe that should be needed as the taskReader.ProgressAsync call will throw an exception, which should be caught, and stop the thread.

The cancel token cleanly allows the current file to complete and leaves no orphan files.

I say your solution revolves around "stop now" because you're stopping the task before the current file gets to complete, but you do still allow the queued backend storage files to complete. This has the effect of not letting the current file finish (orphaned files), but it leaves things in a better state than the previous "stop now", which forced the task to terminate.

I'd suggest pulling my commit into yours and giving it a trial run. You'll then see that "stop after the current file" stops cleanly once the current file is done. The difference is very evident when using a large 1 GB file. Test them both and you'll see the different behavior.

So in my code:
my solution -> "stop after the current file"
your solution -> "stop now"

They both have a good reason since "stop now" might be desired for someone wanting to stop during a gigantic file.

@seantempleton (Contributor, Author) commented Jul 24, 2019

@BlueBlock I think part of my problem here is that I'm getting confused by stop now and stop after upload: what I thought they did previously (stop now being a thread abort, and stop after upload not starting any new uploads and letting current uploads finish), versus your statement of letting the current file complete. "Current file" meaning the current file being split into blocks, compressed, etc., or the current file being uploaded? I am under the impression that any files not uploaded will get deleted on the next backup and those files would be split into blocks all over again, meaning it wouldn't matter if it finished or not. Maybe I'm wrong on that.

I'll get your changes and try them out, maybe that will help me understand.

@seantempleton (Contributor, Author) commented Jul 24, 2019

Ah, I see now what you're saying BlueBlock.

I think the question now boils down to what functionality we want: one or the other, or maybe both? Stopping after the current uploads finish transferring will stop more quickly overall. Stopping after finishing with the current file and also uploading it could take a while, but on the other hand if you're 75% done processing a 2 GB file, for instance, you might want it to finish and then upload before all operations stop.

@BlueBlock (Contributor) commented Jul 24, 2019

I think both would be nice to have, and both capabilities are now implemented. Stop-now is quite useful, especially when testing with large files and you want to stop as soon as possible and cleanly.

I have my change in a new PR. There should be very little conflict in the code when you rebase yours, though I can give a hand with it at that time if you'd like.

@seantempleton (Contributor, Author) commented Jul 26, 2019

I integrated your code with mine for testing purposes and I think you're right, I shouldn't have a problem with rebasing it.

I'll wait to update this PR until after your PR goes through.

@BlueBlock (Contributor) commented Jul 26, 2019

Great.. glad to hear it!

@BlueBlock (Contributor) commented Aug 4, 2019

I might be missing the strategy.. is the idea to be able to pause a backup, close Duplicati, and then resume a paused backup after restarting Duplicati? So a paused state would be remembered when restarting Duplicati?

@duplicatibot commented Aug 30, 2019

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/issues-to-address-to-get-out-of-beta/7871/35

@duplicatibot commented Sep 1, 2019

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/fatal-error-detected-non-empty-blocksets-with-no-associated-blocks/5938/46
