How to prevent duplicates when using Imap archive function? #72

Closed
gusreyes01 opened this issue Nov 8, 2015 · 10 comments

@gusreyes01

Here is the URI I'm using to fetch emails:

imap+ssl://test@karen-huber.com:xxxxxx@secure.emailsrvr.com?archive=Inbox

when running:

python manage.py getmail

my email gets downloaded again. Is there any way to prevent the same email from being fetched more than once? Perhaps by filtering on message_id?

@gusreyes01
Author

I ended up making my own proxy model:

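from django_mailbox.models import Mailbox, Message


class AdastraMailbox(Mailbox):
    class Meta:
        proxy = True

    def get_new_mail(self, condition=None):
        """Connect to this transport and fetch messages not yet stored."""
        new_mail = []
        connection = self.get_connection()
        if not connection:
            return new_mail
        for message in connection.get_message(condition):
            # Truncate so the value can be compared with the stored message_id.
            # A missing Message-ID header yields an empty string instead of an error.
            message_id = (message['message-id'] or '')[0:255]
            # Skip messages already saved (one query per fetched message).
            # Messages without a Message-ID are always processed.
            if message_id and Message.objects.filter(message_id=message_id).exists():
                continue
            msg = self.process_incoming_message(message)
            new_mail.append(msg)
        return new_mail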

It works. As a suggestion, it would be nice to have this feature out of the box 👍

@ad-m
Collaborator

ad-m commented Nov 8, 2015

What mailbox do you use? What IMAP provider/server? Django Mailbox deletes messages immediately after processing them. The question is why the mail wasn't deleted.

@ad-m
Collaborator

ad-m commented Nov 8, 2015

I don't think your code is appropriate to merge. It will generate a ton of queries on each update if the mailbox isn't cleaned up. The mailbox SHOULD be cleaned up on every update.
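
To illustrate the query-count concern, here is a minimal sketch of batching the duplicate check into a single lookup instead of one query per message. The names `select_new_messages` and `fetch_known_ids` are hypothetical and not part of django-mailbox; in a Django project the callable could be something like `lambda ids: Message.objects.filter(message_id__in=ids).values_list('message_id', flat=True)`.

```python
def select_new_messages(messages, fetch_known_ids):
    """Return the messages whose Message-ID is not already stored.

    `fetch_known_ids` is a hypothetical callable: it receives a list of
    Message-ID strings and returns the subset that is already stored,
    ideally with a single database query.
    """
    messages = list(messages)
    # Same truncation as the proxy model above; missing headers become ''.
    ids = [(m['message-id'] or '')[:255] for m in messages]
    # One lookup for the whole batch instead of one per message.
    known = set(fetch_known_ids([i for i in ids if i]))
    # Messages without a Message-ID cannot be deduplicated, so keep them.
    return [m for m, i in zip(messages, ids) if not i or i not in known]
```

This only reduces the number of database queries. Every message in the folder is still downloaded from the server on each run, so the cost still grows with the size of the mailbox.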

@gusreyes01
Author

I'm using Rackspace Webmail.

Mailbox does delete the messages, but I'm using the "archive=Inbox" parameter as specified in the docs, and this allows me to keep a copy of each email on the server. I don't see why the mailbox should be completely cleaned up on every update; I thought that was exactly what IMAP was for (only sync, not delete). I'm aware my solution isn't the most efficient, but it works for now. I'd appreciate any feedback on this.

@ad-m
Collaborator

ad-m commented Nov 8, 2015

Do you receive new e-mails in the "Inbox" folder and then archive them to the "Inbox" folder again?

@gusreyes01
Author

Exactly

@coddingtonbear
Owner

Ahh; that's a very different use case than the one that caused the "archive" feature to be added; the idea behind the archive feature is that the module will copy the email message to a different folder for safekeeping after processing.

Regarding IMAP and sync vs. delete: although the protocol does allow for such a thing, django-mailbox always consumes every message it can find in its specified mailbox. Checking each message to determine whether it is a duplicate will get more and more time-consuming as the mailbox grows, so for mailboxes having thousands of messages, you may want to consider the approach described above.

@gusreyes01
Author

The problem with consuming and deleting is that it looks more like a POP3 approach. IMAP was designed for accessing messages directly on the server, so IMAP's benefits can't be taken advantage of when using django-mailbox :(

I found a similar django-client that does this using IMAP sync:
https://github.com/iggy/simone
However, it lacks many features and doesn't seem as well documented as django-mailbox.

Do you have any suggestions for dealing with thousands of messages without running out of hard drive space?

Thanks.

@leifurhauks

It could be possible to write a feature that, if a certain option is enabled in the settings module, retrieves only messages marked as unread and marks every message it downloads as read. I actually thought about doing that myself a while back, but abandoned the idea when I realized it didn't fit my use case.
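
As a rough sketch of that idea using the standard library's `imaplib` directly (this is not django-mailbox code, and `fetch_unseen` is a hypothetical helper): search for `UNSEEN` messages, fetch each one, and set the `\Seen` flag afterwards. It assumes `connection` is an already authenticated `imaplib.IMAP4` or `imaplib.IMAP4_SSL` instance.

```python
def fetch_unseen(connection, folder="INBOX"):
    """Return the raw bytes of each unread message and mark each as read.

    `connection` is assumed to be a logged-in imaplib.IMAP4 / IMAP4_SSL
    object; this helper is illustrative and not part of django-mailbox.
    """
    connection.select(folder)
    status, data = connection.search(None, "UNSEEN")
    if status != "OK":
        return []
    raw_messages = []
    for num in data[0].split():
        # BODY.PEEK[] downloads the message without implicitly setting \Seen,
        # so the flag is only set once the fetch has succeeded.
        status, parts = connection.fetch(num, "(BODY.PEEK[])")
        if status != "OK" or not parts or not parts[0]:
            continue
        raw_messages.append(parts[0][1])
        connection.store(num, "+FLAGS", "\\Seen")
    return raw_messages
```

One caveat with this design: reading a message in any other mail client also marks it as read, so such a message would be skipped on the next run.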

Another option would be to store a copy of the mailbox locally and, when connecting to the IMAP server, retrieve only messages that don't exist in your local copy (by UID). But that would likely involve even more extensive changes to the package.
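
A minimal sketch of the UID comparison, again with plain `imaplib` rather than anything django-mailbox provides. `fetch_missing_by_uid` and `known_uids` are hypothetical names; `known_uids` stands for whatever set of UIDs the local copy has recorded, and `connection` is assumed to be a logged-in `imaplib.IMAP4` or `imaplib.IMAP4_SSL` instance.

```python
def fetch_missing_by_uid(connection, known_uids, folder="INBOX"):
    """Return {uid: raw_message_bytes} for messages not in `known_uids`.

    `known_uids` is a set of UIDs (as bytes) already stored locally.
    Illustrative only; not part of django-mailbox.
    """
    # Read-only select: nothing on the server is flagged or deleted.
    connection.select(folder, readonly=True)
    status, data = connection.uid("SEARCH", None, "ALL")
    if status != "OK":
        return {}
    new_messages = {}
    for uid in data[0].split():
        if uid in known_uids:
            continue  # already in the local copy, skip the download
        status, parts = connection.uid("FETCH", uid, "(BODY.PEEK[])")
        if status == "OK" and parts and parts[0]:
            new_messages[uid] = parts[0][1]
    return new_messages
```

UIDs are only meaningful within one folder and while the folder's UIDVALIDITY value stays the same, so a real implementation would probably need to store that value alongside the UIDs.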

Much simpler would be to use the archive feature to move already retrieved messages from the inbox to another folder, with a name like 'archived' or 'processed'.
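
For example, the mailbox URI from the first post could point `archive` at a separate folder instead of `Inbox`. The helper below is only a sketch for assembling such a URI; `build_imap_uri`, the host name, and the `processed` folder are made-up placeholders. It percent-encodes the username and password so that characters such as `@` do not break URI parsing.

```python
from urllib.parse import quote, urlencode


def build_imap_uri(username, password, host, archive_folder):
    """Build an imap+ssl URI with an `archive` query parameter.

    Hypothetical helper: only the URI shape (scheme, credentials, host,
    ?archive=...) is taken from the URI shown earlier in this thread.
    """
    credentials = "{}:{}".format(quote(username, safe=""), quote(password, safe=""))
    query = urlencode({"archive": archive_folder})
    return "imap+ssl://{}@{}?{}".format(credentials, host, query)


uri = build_imap_uri("test@example.com", "s3cret", "imap.example.com", "processed")
```

With a setup like this, messages leave the inbox after processing and end up in `processed`, so the next `getmail` run does not see them again.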

If you need the inbox to stay intact, one possibility might be to arrange on the server for all incoming messages to be copied to some folder, let's say it's called 'django-mailbox'. Then you can have django-mailbox consume from that folder, deleting or archiving the retrieved messages from that folder without touching the inbox.

@coddingtonbear
Owner

Hey there guys; I'm going to close this given that there isn't really anything actionable for anybody to do here. Let me know if you need any other help.

Cheers,
Adam
