<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# Gmail - List most interesting emails
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Gmail/Gmail_Get_emails_stats_by_sender.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/Open_in_Naas_Lab.svg"/></a><br><br><a href="https://github.com/jupyter-naas/awesome-notebooks/issues/new?assignees=&labels=&template=template-request.md&title=Tool+-+Action+of+the+notebook+">Template request</a> | <a href="https://github.com/jupyter-naas/awesome-notebooks/issues/new?assignees=&labels=bug&template=bug_report.md&title=Gmail+-+Get+emails+stats+by+sender:+Error+short+description">Bug report</a> | <a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Naas/Naas_Start_data_product.ipynb" target="_parent">Generate Data Product</a>

**Tags:** #gmail #productivity #naas_drivers #operations #automation #analytics #plotly

**Author:** [Antonio Georgiev](www.linkedin.com/in/antonio-georgiev-b672a325b)

**Description:** This notebook analyses users' inbox and extracts a list of senders that the user is most interested in, depending on the user's opening rate of the sender's emails in the last two weeks.

## Input

### Import libraries

In [2]:
import naas
from naas_drivers import email
import pandas as pd
import numpy as np
import plotly.express as px
from datetime import datetime, timedelta
import pytz

### Setup Variables
Create an application password following [this procedure](https://support.google.com/mail/answer/185833?hl=en)
- `username`: This variable stores the username or email address associated with the email account
- `password`: This variable stores the password or authentication token required to access the email account
- `smtp_server`: This variable represents the SMTP server address used for sending emails.
- `box`: This variable stores the name or identifier of the mailbox or folder within the email account that will be accessed.

In [3]:
username = "theuniverse.bg@gmail.com"
password = naas.secret.get("GMAIL_APP_PASSWORD")
smtp_server = "imap.gmail.com"
box = "INBOX"

## Model

### Connect to email box

In [4]:
emails = email.connect(username, password, username, smtp_server)

### Get all seen emails list

In [5]:
seen_emails = emails.get(box="INBOX", criteria="seen", mark="unseen")
print(len(seen_emails))

In [12]:
today = datetime.now().date()
two_weeks_ago = today - timedelta(days=14)
seen_emails = emails.get_emails_by_date(date=two_weeks_ago, condition="before")
print(len(seen_emails))

KeyError: '"date__lt" is an invalid parameter.'

### Filter emails for the last two weeks

In [40]:
two_weeks_ago = pd.Timestamp.now() - pd.DateOffset(weeks=20)
two_weeks_ago = two_weeks_ago.tz_localize('UTC')
seen_emails['date'] = pd.to_datetime(seen_emails['date'], utc=True)
filtered_emails = seen_emails[seen_emails['date'] >= two_weeks_ago]

In [41]:
filtered_emails

Unnamed: 0,uid,subject,from,to,cc,bcc,reply_to,date,text,html,flags,headers,size_rfc822,size,obj,attachments
34,1021,Verify your e-mail address,"{'email': 'wifi@trevisoairport.it', 'name': ''...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],[],2023-02-23 16:37:53+00:00,Visit this link to verify your account and sta...,<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 T...,"(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",25114,24807,"[Delivered-To, Received, X-Google-Smtp-Source,...",0
35,1111,Welcome to 000webhost!,"{'email': 'clients@000webhost.com', 'name': '0...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],"[{'email': 'clients@000webhost.com', 'name': '...",2023-03-20 21:51:01+00:00,\r\n,<!-- title -->\r\n\r\n\r\n\r\n\r\n\r\n<!-- con...,"(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",21677,21097,"[Delivered-To, Received, X-Google-Smtp-Source,...",0
36,1113,"Everything has a name, what about your website?","{'email': 'clients@000webhost.com', 'name': '0...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],"[{'email': 'clients@000webhost.com', 'name': '...",2023-03-20 22:50:57+00:00,Enter your TEXT content here\r\n,<!-- title -->\r\n\r\n\r\n\r\n<!-- content -->...,"(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",20108,19588,"[Delivered-To, Received, X-Google-Smtp-Source,...",0
37,1153,Password Reset at 000webhost,"{'email': '000webhost@hostingermail.com', 'nam...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],"[{'email': '000webhost@hostingermail.com', 'na...",2023-03-31 12:27:32+00:00,"Hello theuniverse.bg ,\r\n\r\nWe have received...",<!-- content -->\r\n\r\n\r\n\r\n\r\n\r\n\r\n<!...,"(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",44422,43101,"[Delivered-To, Received, X-Received, X-Google-...",0
38,1154,Password Reset at 000webhost,"{'email': '000webhost@hostingermail.com', 'nam...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],"[{'email': '000webhost@hostingermail.com', 'na...",2023-03-31 12:29:58+00:00,"Hello theuniverse.bg ,\r\n\r\nWe have received...",<!-- content -->\r\n\r\n\r\n\r\n\r\n\r\n\r\n<!...,"(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",44417,43096,"[Delivered-To, Received, X-Received, X-Google-...",0
39,1155,Password Reset at 000webhost,"{'email': '000webhost@hostingermail.com', 'nam...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],"[{'email': '000webhost@hostingermail.com', 'na...",2023-03-31 12:34:19+00:00,"Hello theuniverse.bg ,\r\n\r\nWe have received...",<!-- content -->\r\n\r\n\r\n\r\n\r\n\r\n\r\n<!...,"(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",44244,42924,"[Delivered-To, Received, X-Google-Smtp-Source,...",0
40,1183,Verify your e-mail address,"{'email': 'wifi@trevisoairport.it', 'name': ''...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],[],2023-04-07 11:25:38+00:00,Visit this link to verify your account and sta...,<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 T...,"(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",25054,24747,"[Delivered-To, Received, X-Google-Smtp-Source,...",0
41,1302,Your subscription to Pauline Tantot has expired,"{'email': 'no-reply@onlyfans.com', 'name': 'On...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],"[{'email': 'support@onlyfans.com', 'name': '',...",2023-05-09 20:18:16+00:00,,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...","(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",17461,17033,"[Delivered-To, Received, X-Google-Smtp-Source,...",0
42,1493,"“Mitev, T” cited by “Neofitov, Alexander”","{'email': 'premium@academia-mail.com', 'name':...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],[],2023-06-23 19:39:56+00:00,"Dear Tony,\r\n\r\n“Mitev, T” cited by “Neofito...","<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 S...","(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",38811,37860,"[Delivered-To, Received, X-Google-Smtp-Source,...",0
43,1507,"📄 ""Stylistics: 'Negro' by Langston Hughes "" by...","{'email': 'updates@academia-mail.com', 'name':...","[{'email': 'theuniverse.bg@gmail.com', 'name':...",[],[],[],2023-06-27 07:43:07+00:00,From your Reading History:\r\n\r\nStylistics: ...,"<!DOCTYPE html>\r\n<html xmlns=""http://www.w3....","(SEEN,)","{'delivered-to': ('theuniverse.bg@gmail.com',)...",50543,49420,"[Delivered-To, Received, X-Google-Smtp-Source,...",0


### Get most viewed senders by counting the occurencies

In [42]:
most_viewed_senders = filtered_emails['from'].value_counts().head(10)

In [43]:
most_viewed_senders

TypeError: unhashable type: 'dict'

Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas/_libs/hashtable_class_helper.pxi", line 4588, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'dict'


{'email': 'no-reply@accounts.google.com', 'name': 'Google', 'full': 'Google <no-reply@accounts.google.com>'}                                                  14
{'email': '000webhost@hostingermail.com', 'name': '000webhost.com', 'full': '000webhost.com <000webhost@hostingermail.com>'}                                   3
{'email': 'info@n.myprotein.com', 'name': 'Myprotein', 'full': 'Myprotein <info@n.myprotein.com>'}                                                             3
{'email': 'wifi@trevisoairport.it', 'name': '', 'full': 'wifi@trevisoairport.it'}                                                                              2
{'email': 'toktok@info.glovoapp.com', 'name': 'Glovo', 'full': 'Glovo <toktok@info.glovoapp.com>'}                                                             2
{'email': 'clients@000webhost.com', 'name': '000webhost', 'full': '000webhost <clients@000webhost.com>'}                                                       2
{'email': 'news@mailing.tommy.com'

### Identify the top 3 senders

In [44]:
top_senders = most_viewed_senders.head(3).index.tolist()

In [45]:
top_senders

[{'email': 'no-reply@accounts.google.com',
  'name': 'Google',
  'full': 'Google <no-reply@accounts.google.com>'},
 {'email': '000webhost@hostingermail.com',
  'name': '000webhost.com',
  'full': '000webhost.com <000webhost@hostingermail.com>'},
 {'email': 'info@n.myprotein.com',
  'name': 'Myprotein',
  'full': 'Myprotein <info@n.myprotein.com>'}]

### Find the unseen emails for the past two weeks from the top senders

In [46]:
for sender in top_senders:
    unseen_emails = emails.get(box="INBOX", criteria="unseen", mark="unseen").get_emails_by_sender(sender, exact=True)
    unseen_emails['date'] = pd.to_datetime(unseen_emails['date'], utc=True)
    unseen_filtered_emails = unseen_emails[unseen_emails['date'] >= two_weeks_ago]

AttributeError: 'DataFrame' object has no attribute 'get_emails_by_sender'

## Output

### Print the list with the unseen emails for the past two weeks from the top senders

In [22]:
print("The unseen emails by the top 3 senders (based on opened emails) for the last 2 weeks:")
print(unseen_filtered_emails)

The unseen emails by the top 3 senders (based on opened emails) for the last 2 weeks:
Empty DataFrame
Columns: [uid, subject, from, to, cc, bcc, reply_to, date, text, html, flags, headers, size_rfc822, size, obj, attachments]
Index: []
