Never mark spam on restore #342

Open
aaronadamsCA opened this issue Jan 21, 2022 · 25 comments

Comments

@aaronadamsCA

The issue tracker is for reporting product deficiencies. "How do I" questions should be posted to the discussion forum at https://groups.google.com/group/got-your-back. When in doubt, start at the discussion forum and return here only when instructed to do so.

Please confirm the following:

Full steps to reproduce the issue:

  1. Back up from one account
  2. Restore to another account

Expected outcome (what are you trying to do?): All messages restored with similar structure.

Actual outcome (what errors or bad behavior do you see instead?): Thousands of legitimate messages from the first account classified as spam in the second account.

Unfortunately Gmail won't let me bulk mark them all "not spam", either, so this is a whole lot of repetitive clicking to rectify.

I see the Gmail API has a neverMarkSpam option on some endpoints, but I can't tell if it's available on the endpoint you're using because I can't read Python. 🙃

@jay0lee
Member

jay0lee commented Jan 21, 2022

Yes, GYB sets this parameter:

https://github.com/jay0lee/got-your-back/blob/main/gyb.py#L1971

so this shouldn't be happening. Can you provide sample messages or a sample backup that is showing this problem?
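For readers who, like the reporter, don't read Python, here is a minimal sketch of how a restore request might carry this flag. It is not GYB's code. It assumes the Gmail API `users.messages.import` method, which accepts `neverMarkSpam` and `internalDateSource` parameters; the helper name `build_import_kwargs` and the sample message are hypothetical.

```python
import base64


def build_import_kwargs(raw_message, label_ids):
    """Build keyword arguments for a Gmail API users.messages.import call.

    Hypothetical helper for illustration; GYB's own code may differ.
    """
    return {
        "userId": "me",
        # Ask Gmail to ignore its spam classifier's decision for this message.
        "neverMarkSpam": True,
        # Take the message date from its Date: header, not the import time.
        "internalDateSource": "dateHeader",
        "body": {
            # The API expects the full message as URL-safe base64 text.
            "raw": base64.urlsafe_b64encode(raw_message).decode("ascii"),
            "labelIds": label_ids,
        },
    }


sample = b"From: a@example.com\r\nSubject: hi\r\n\r\nhello\r\n"
kwargs = build_import_kwargs(sample, ["INBOX"])
print(kwargs["neverMarkSpam"])
```

With google-api-python-client and an already authorized `service` object, these arguments would typically be passed as `service.users().messages().import_(**kwargs).execute()`.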

@aaronadamsCA
Author

aaronadamsCA commented Jan 21, 2022

I'm up to several thousand messages in spam, but thankfully I found a workaround that lets you mark more than 50 messages as "not spam" in the Gmail UI:

  1. Search label:spam -label:inbox
  2. Click "Select all"
  3. Click "Select all conversations that match this search"
  4. Click "Move to inbox"
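The same cleanup could in principle be scripted against the Gmail API instead of the UI. The sketch below only builds the request bodies; it assumes the `users.messages.batchModify` method, which accepts up to 1000 message IDs per request. The helper name and the message IDs are hypothetical.

```python
def build_batch_modify_bodies(message_ids, chunk_size=1000):
    """Split message IDs into request bodies for users.messages.batchModify.

    Each body moves its messages out of Spam and into the Inbox.
    """
    bodies = []
    for start in range(0, len(message_ids), chunk_size):
        bodies.append({
            "ids": message_ids[start:start + chunk_size],
            "addLabelIds": ["INBOX"],
            "removeLabelIds": ["SPAM"],
        })
    return bodies


# Hypothetical IDs; real ones would come from a users.messages.list call
# restricted to the SPAM label.
ids = [f"msg{i}" for i in range(2500)]
bodies = build_batch_modify_bodies(ids)
print([len(b["ids"]) for b in bodies])  # → [1000, 1000, 500]
```

As with the UI workaround, this would move every message currently in Spam, so it only makes sense if the target account's Spam folder holds nothing but restored mail.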

@aaronadamsCA
Author

Each message in spam shows the same reason for being there:

Why is this message in spam? It is similar to messages that were identified as spam in the past.

So it doesn't seem like it would be a phishing filter thing (my old and new addresses are similar, which had me wondering).

Here is a cleaned-up version of the commands I used:

cd
bash <(curl -s -S -L https://git.io/gyb-install)

mkdir first@firstlast.ca
cd first@firstlast.ca/
~/bin/gyb/gyb --email first@firstlast.ca --action quota
~/bin/gyb/gyb --email first@firstlast.ca --action backup

cd
mkdir firstlast.ca@gmail.com
cd firstlast.ca@gmail.com/
~/bin/gyb/gyb --email firstlast.ca@gmail.com --action create-project
~/bin/gyb/gyb --email firstlast.ca@gmail.com --action restore --local-folder ../first@firstlast.ca/GYB-GMail-Backup-first@firstlast.ca/ --label-restored "firstlast.ca"

The messages going to spam are decidedly the "spammier" ones: almost exclusively newsletters and notification emails. So it does seem like the spam filter is somehow processing each inbound message despite being asked not to.

@aaronadamsCA
Author

> Can you provide sample messages or a sample backup that is showing this problem?

Let me know if any of the information above helps. If not, after my restore finishes running, I can try reproducing the problem with a small backup that I'd be comfortable sharing.

@aaronadamsCA
Author

Ha... unaddressed report from 2018 complete with repro:

https://issuetracker.google.com/issues/109956036

I added a comment (didn't mention gyb just in case they filter out issues that mention your GREAT project). I'm willing to bet this is unfixable on your end, since I can clearly see you're doing what you can.

@bvinnerd

I'm seeing this issue as well, backing up a Workspace account and restoring to a free Gmail account.

I have a total of 53,001 messages in the backup, and on restore there were ~7,200 messages in the Spam folder.

My workaround was to move all of those messages in Spam back to Inbox (by selecting 100 messages at a time and clicking the Not Spam button in the Gmail UI).

If you're going to do this, please ensure that you have 0 spam messages in the target Gmail account, otherwise you could end up moving genuine spam into your Inbox.

@flipflophhj

flipflophhj commented Jan 26, 2022

I have this issue too: 15,000 messages in spam, mainly very old ones.

Also, many seem to have gotten their date set to the restore time instead of the original date they were sent.

Most of the affected messages are from before 2000, but I also found one from 2003.

jay0lee added a commit that referenced this issue Jan 26, 2022
--cleanup - ensure restored messages have a valid Message-ID, From and Date header. Should help with #342.
--cleanup-date - on --cleanup, use provided date instead of current date when we can't get a valid date on the message at all.
--cleanup-from - on --cleanup, use provided from header value when we can't get a valid value on the message at all.
@jay0lee
Member

jay0lee commented Jan 26, 2022

I just released GYB 1.55 which adds a --cleanup option on restore. This tells GYB to confirm the message has a valid From:, Message-ID: and Date: header on it before restoring. This should prevent the message from landing in Spam.

Can a few people do some testing and confirm it works for them? See the 1.55 release details for more info:

https://git.io/gyb-releases
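To make the idea concrete, here is a rough sketch of this kind of header check using only Python's standard `email` package. It is not GYB's implementation: the function name `ensure_basic_headers`, the `fallback_from` parameter, and the `example.invalid` placeholders are assumptions for illustration.

```python
from email import message_from_bytes
from email.utils import formatdate, make_msgid, parsedate_to_datetime


def ensure_basic_headers(raw, fallback_from="unknown@example.invalid"):
    """Return (message, fixed): the parsed message and the headers repaired.

    Hypothetical illustration of header hygiene; not GYB's implementation.
    """
    msg = message_from_bytes(raw)
    fixed = []

    # A missing or empty From: header gets the caller-supplied fallback.
    if not str(msg.get("From", "")).strip():
        del msg["From"]
        msg["From"] = fallback_from
        fixed.append("From")

    # A missing or empty Message-ID: header gets a freshly generated one.
    if not str(msg.get("Message-ID", "")).strip():
        del msg["Message-ID"]
        msg["Message-ID"] = make_msgid(domain="example.invalid")
        fixed.append("Message-ID")

    # A missing or unparseable Date: header falls back to the current time.
    try:
        parsedate_to_datetime(str(msg.get("Date", "")))
    except (TypeError, ValueError):
        del msg["Date"]
        msg["Date"] = formatdate(localtime=False)
        fixed.append("Date")

    return msg, fixed


raw = b"Subject: old newsletter\r\nDate: not a date\r\n\r\nbody\r\n"
msg, fixed = ensure_basic_headers(raw)
print(fixed)  # → ['From', 'Message-ID', 'Date']
```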

@flipflophhj

flipflophhj commented Jan 26, 2022

Hm... I thought that if I emptied the spam folder and then did a restore, it would restore all those messages again, but that doesn't seem to be the case.
What should I do?
I'm running an estimate to see if that helps.

@jay0lee
Member

jay0lee commented Jan 26, 2022 via email

@flipflophhj

Hm, it would be nice to be able to label only the messages that were cleaned up, though. I tried --label-restored, but it labels everything.

@flipflophhj

Traceback (most recent call last):
  File "gyb.py", line 2532, in <module>
  File "gyb.py", line 2007, in main
  File "gyb.py", line 1769, in message_hygiene
  File "gyb.py", line 1713, in cleanup_from
  File "email\utils.py", line 215, in parseaddr
  File "email\_parseaddr.py", line 513, in __init__
  File "email\_parseaddr.py", line 256, in getaddrlist
TypeError: object of type 'Header' has no len()
[29748] Failed to execute script 'gyb' due to unhandled exception!
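The traceback suggests that `parseaddr` received an `email.header.Header` object rather than a string, which can happen when a parsed message returns a `Header` for a field containing non-ASCII bytes. One possible guard, sketched here under that assumption, is to coerce the value to text first. The helper name `safe_parseaddr` is hypothetical, and this is not necessarily the fix GYB adopted.

```python
from email.header import Header
from email.utils import parseaddr


def safe_parseaddr(value):
    """Parse an address from a header value that may be a str or a Header."""
    # Header has no __len__, so coerce it to text before calling parseaddr.
    return parseaddr(str(value))


name, address = safe_parseaddr(Header("Jane Doe <jane@example.com>"))
print(name, address)  # → Jane Doe jane@example.com
```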

@flipflophhj

Still got about 300 in spam out of the 6,000 restored before the exception.

@jay0lee
Member

jay0lee commented Jan 27, 2022

I can no longer reproduce the issue with the sample from the issue tracker and --cleanup. Can you share examples of messages that went to Spam?

@jay0lee
Member

jay0lee commented Jan 27, 2022

I'd need to see the full headers as described at:

https://support.google.com/mail/answer/29436?hl=en

@flipflophhj

Would sending the .eml file work?

@jay0lee
Member

jay0lee commented Jan 27, 2022

Yes, that's fine. You can post it here or email it to me.

@flipflophhj

Ok I sent an email.

@flipflophhj

Oh, by the way, I saw that all the mails that ended up with the now() date after restore seem to have a correct date in msg-db.sqlite, so maybe that could be used for --cleanup.
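A rough sketch of that idea follows: look up a stored timestamp in SQLite and format it as a Date: header value. The table and column names below (`messages`, `filename`, `internal_date`) and the sample row are invented for illustration and are not taken from GYB's actual msg-db.sqlite schema.

```python
import sqlite3
from datetime import datetime, timezone
from email.utils import format_datetime


def date_header_from_db(conn, filename):
    """Look up a stored timestamp and format it as an RFC 2822 Date: value.

    The table and column names here are assumptions for illustration.
    """
    row = conn.execute(
        "SELECT internal_date FROM messages WHERE filename = ?", (filename,)
    ).fetchone()
    if row is None:
        return None
    # Interpret the stored value as seconds since the Unix epoch, in UTC.
    stamp = datetime.fromtimestamp(row[0], tz=timezone.utc)
    return format_datetime(stamp)


# In-memory stand-in for a backup database, with one hypothetical row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (filename TEXT, internal_date INTEGER)")
conn.execute("INSERT INTO messages VALUES (?, ?)", ("1999/1/1.eml", 915148800))
print(date_header_from_db(conn, "1999/1/1.eml"))
# → Fri, 01 Jan 1999 00:00:00 +0000
```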

@brechmos

brechmos commented Jan 31, 2022

I am in the same boat (185,000 emails to transfer, though). I was watching my Spam folder as the transfer was happening and saw some messages go in and then automatically leave Spam. I was nervous that the "older than 30 days will be deleted" cleanup was happening faster than I was moving them out of Spam.

I am redoing my restore but put this filter in place:
[image: screenshot of the Gmail filter]

I have not seen anything go to Spam. When the restore is done I'll turn off that filter.

I don't know enough about how quickly "older than 30 days" gets removed from Spam, and don't know if this is "the right thing to do" but it makes this data hoarder less nervous.

@jay0lee
Member

jay0lee commented Jan 31, 2022 via email

@flipflophhj

flipflophhj commented Feb 1, 2022 via email

@jhult

jhult commented Mar 7, 2022

FWIW, I am also experiencing emails going into Spam (I have not yet tried --cleanup).

@Suncatcher

Suncatcher commented May 20, 2022

> Has anyone else tested with --cleanup to see if that helps?

Yes I did, and I can say: it doesn't work.

My numbers on restoration:

  • without --cleanup
    200+ messages in SPAM out of 1500

  • with --cleanup
    123 messages in SPAM out of 604

I was restoring different accounts, so the absolute numbers differ, but the percentage is easy to calculate: it's nearly the same, and with --cleanup it's even worse.
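For reference, the percentages work out as follows. This is just arithmetic on the figures above, treating "200+" as a lower bound of 200; the function name is hypothetical.

```python
def spam_rate(spam_count, restored_count):
    """Return the share of restored messages that landed in Spam, in percent."""
    return 100.0 * spam_count / restored_count


without_cleanup = spam_rate(200, 1500)  # "200+" treated as a lower bound
with_cleanup = spam_rate(123, 604)
print(f"{without_cleanup:.1f}% vs {with_cleanup:.1f}%")  # → 13.3% vs 20.4%
```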

@chrishoage

chrishoage commented Jun 8, 2022

I have been importing 69k emails from a Workspace account to a personal Gmail account. I used --cleanup for the restore, and messages still went to spam.

I have been running into this same issue.

> The messages going to spam are decidedly the "spammier" ones: almost exclusively newsletters and notification emails. So it does seem like the spam filter is somehow processing each inbound message despite being asked not to.

This has been my experience as well. Lots of receipts, newsletters, etc.

I put in the same filter @brechmos did, and it has helped keep messages out of spam. The downside is that new emails now go to "All Mail", but I can live with that while the restore runs.
