Restore modifications and deletions of sentences #1484

Closed
trang opened this Issue Jul 26, 2017 · 13 comments

3 participants
@trang
Member

trang commented Jul 26, 2017

When we used the CSV files to restore data that was lost between February and June, only additions of content were restored (i.e. new sentences, new comments, new tags, etc.).

Modifications and deletions were not restored. We cannot restore them for all content (such as comments), but we can do it for sentences, since we have logs for sentences.

For example: https://tatoeba.org/eng/sentences/show/76450
The sentence was modified on Feb 20, but the sentence text is set to the original value.

Related Wall thread: https://tatoeba.org/eng/wall/show_message/28429#message_28429

@ckjpn

ckjpn commented Sep 18, 2017

See #1504 for a list of all such sentences and a Python script that can help find the most recent revision based on a comparison of the pre-crash export and the current export.

@trang

Member

trang commented Oct 8, 2017

I had a look at the file, and the content seems fine. I didn't look at the script in detail though.

I guess we could go ahead and replace the text/owner of the sentences in the database with the text/owner of the sentences in the file. I should be able to take care of this next weekend. It would be nice if by then other people could verify the script and/or the content of the generated file.

File: sentences_detailed_ONES_NEEDING_REVERTED 2017-09-11.zip
Script: fix-sentences_detailed - version 2.zip

@ckjpn

ckjpn commented Oct 9, 2017

Newer files are here.

http://iteslj.org/tatoeba/

Updated using the October 14, 2017 export.

@trang

Member

trang commented Oct 15, 2017

Modifications of sentences should be restored now.
I ran the following script: https://gist.github.com/trang/66835392f279653b1aa0c4647938bfdb

I noticed that some sentences had a modification that was not logged. I'm not sure if that modification was simply never logged, or if we lost the logs for it...

For instance https://tatoeba.org/eng/sentences/show/330

The text is now:

„Ich habe Lust, Karten zu spielen.“ – „Ich auch.“

But in the logs we only see that it was created as:

"Ich habe Lust, {Karten}{1} zu spielen." "Ich auch."

Anyway, @ckjpn, if possible, please check whether the replaced sentences are now okay. I'll close this issue and open new ones for potentially missing logs and for restoring deletions of sentences.

@trang trang closed this Oct 15, 2017

@trang trang reopened this Oct 15, 2017

@trang

Member

trang commented Oct 15, 2017

Re-opening. I've reverted the modifications made by the script.

There is a problem with sentences that have been modified recently.

For instance, sentence 3333 was recently modified by Aiji.

3318	fra	"Pierre qui roule n'amasse pas mousse" est un proverbe.	trotter	\N	\N	LAST GOOD Export 2017-06-10
3318	fra	« Pierre qui roule n'amasse pas mousse » est un proverbe.	trotter	\N	2017-10-06 03:15:28	CURRENT Export

If I replace the current text by the text from the generated file, I will erase the good version of the sentence.
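
One possible guard against this, as a minimal sketch only: skip any sentence whose current row carries a modification date later than some cutoff. It assumes the six tab-separated fields seen in the rows above (id, language, text, owner, date added, date last modified). The names `safe_to_revert` and `cutoff` are hypothetical and are not part of the script used in this thread.

```python
from datetime import datetime

# Values that mean "no date" in an export row: the \N null marker
# and the all-zero placeholder date.
NO_DATE = {r"\N", "0000-00-00 00:00:00"}


def parse_date(value):
    # Return a datetime, or None when the field holds no real date.
    if value in NO_DATE:
        return None
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def safe_to_revert(current_row, cutoff):
    # current_row is a list of the six fields of the CURRENT export line.
    # A row is considered safe to overwrite only if it has no modification
    # date, or if that date is not later than the (assumed) cutoff.
    modified = parse_date(current_row[5])
    return modified is None or modified <= cutoff
```

With a cutoff set shortly after the crash, the row modified on 2017-10-06 above would be skipped instead of overwritten.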

@ckjpn

ckjpn commented Oct 16, 2017

I'll have to check the script. Maybe checking a real date against \N is the problem.

I could possibly do a quick fix and ignore those comparisons, so you could at least revert many of the sentences. Would you like me to do that?

Of course, that might not be the problem, but I suspect that's what happened.

Maybe I can just convert all the \N data to 0000-00-00 00:00:00 before running the script and it will work.

@ckjpn

ckjpn commented Oct 16, 2017

The problem with the previous file of corrections was caused by the \N in the date fields. Python couldn't correctly compare \N with a date.
I converted all the \N in the 5th and 6th fields to 0000-00-00 00:00:00 and then did the comparisons.

Note that in the current exported data, the modification date is \N when no modification has taken place, which was part of the problem.

For these corrections, I used the first export after the crash (2017-07-01), so that corrections made by Aiji and adopted by him correctly give him credit. I think some were only adopted, with no corrections needed. He said on the Wall that some of the sentences he had previously adopted were being adopted by other members. It seems only fair to give these sentences to the person who adopted them first.
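
As a rough illustration of that conversion (a sketch, not the actual script): if the date fields are compared as plain strings, which is an assumption here, \N sorts after every real timestamp because a backslash orders after the digits. Replacing it with 0000-00-00 00:00:00 makes "never modified" sort before any real date. The function names below are hypothetical.

```python
NULL_MARKER = r"\N"
ZERO_DATE = "0000-00-00 00:00:00"


def normalize_dates(fields):
    # Replace \N in the 5th and 6th fields (indexes 4 and 5) with the
    # zero date, leaving the other fields untouched.
    fixed = list(fields)
    for index in (4, 5):
        if fixed[index] == NULL_MARKER:
            fixed[index] = ZERO_DATE
    return fixed


def modified_later(row_a, row_b):
    # True if row_a has a later modification date than row_b.
    # Timestamps in YYYY-MM-DD HH:MM:SS form sort correctly as strings.
    return normalize_dates(row_a)[5] > normalize_dates(row_b)[5]
```

For the two rows for sentence 3318 quoted earlier in the thread, `modified_later(current, last_good)` is true after normalization, whereas a raw string comparison would have ranked the \N row as the later one.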

Attached are 3 files.

_about.txt

sentences_detailed_CORRECTIONS_NEEDED 2017-10-17.txt
-- the file you need to revert the modifications

sentences_detailed_ONES_NEEDING_REVERTED_pairs 2017-10-17.csv
-- a file with both the line from the "last good export" and the "first export after the crash".

sentences that need modifications reverted 2017-10-17.zip

@trang

Member

trang commented Oct 22, 2017

Thanks for the fix @ckjpn.

I've applied the changes in the database. Please have a look :) Hopefully everything's fine this time.

@ckjpn

ckjpn commented Oct 23, 2017

There are 1,319 sentences in the latest export between 1 and 6139244 that were not in the 2017-06-10 export. 6139244 was the last sentence in the 2017-06-10 export.
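
A list like this can be produced with a set difference on the sentence ids. The following is only a sketch of the idea, assuming each export line starts with the id followed by a tab. The function names are hypothetical.

```python
def read_ids(lines):
    # Collect the integer id from the first tab-separated field of each line.
    return {int(line.split("\t", 1)[0]) for line in lines if line.strip()}


def possibly_deleted_before_crash(pre_crash_lines, latest_lines, last_pre_crash_id):
    # Ids present in the latest export but missing from the pre-crash
    # export, restricted to the id range that existed before the crash.
    before = read_ids(pre_crash_lines)
    now = read_ids(latest_lines)
    return sorted(i for i in now - before if 1 <= i <= last_pre_crash_id)
```

Here `last_pre_crash_id` would be 6139244, and the two line sequences could be the opened export files.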

It would probably be safest not to delete these until the duplicate-merging script has run successfully and merged translations, since some of these deletions might have been done by Horus.

The attached file contains these sentence numbers, if you want to glance through them.

possibly deleted sentences that need to be re-deleted.zip

@ckjpn

ckjpn commented Nov 6, 2017

At least one sentence didn't get reverted to the pre-crash correction.

https://tatoeba.org/eng/sentences/show/5850144

It's one that the duplicate-merging script had apparently deleted, so this is really related to sentences that still need to be deleted.

@trang

Member

trang commented Dec 31, 2017

I've deleted sentences that were deleted pre-crash (1396 sentences).

For the record, these are the deleted sentences:
https://gist.github.com/trang/3a658ec28a8e76227c563ef5fff64b38

@jiru

Member

jiru commented Jun 4, 2018

Can we close this issue?

@trang

Member

trang commented Jun 4, 2018

Yes.

I didn't close it because there were still links to restore. But that would be another issue.
