<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# Gmail - Get most common senders
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Gmail/Gmail_Get_most_common_senders.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/Open_in_Naas_Lab.svg"/></a><br><br><a href="https://bit.ly/3JyWIk6">Give Feedback</a> | <a href="https://github.com/jupyter-naas/awesome-notebooks/issues/new?assignees=&labels=bug&template=bug_report.md&title=Gmail+-+Get+most+common+senders:+Error+short+description">Bug report</a>

**Tags:** #gmail #productivity #naas_drivers #operations #automation #analytics #plotly

**Author:** [Antonio Georgiev](https://www.linkedin.com/in/antonio-georgiev-b672a325b)

**Description:** This notebook analyses your Gmail inbox over a set period of time, counts the emails received from each sender, and outputs the list of the most common senders.
It aims to help you identify unwanted subscriptions or emails that Gmail did not filter as "Spam".

## Input

### Import libraries

In [1]:
import datetime
import os
from imapclient import IMAPClient
import naas
from collections import Counter
import quopri
import email.header

### Setup Variables
Create an application password by following [this procedure](https://support.google.com/mail/answer/185833?hl=en).
- `username`: Email address associated with the Gmail account
- `password`: Application password required to access the account, read here from the Naas secret `GMAIL_APP_PASSWORD`
- `date_start`: Number of days to look back from today; it must be a negative value (e.g. `-30` for the last 30 days)
- `most_common_senders`: Number of most common senders to list in the output

In [2]:
username = "xxxxx@xxxx"
password = naas.secret.get("GMAIL_APP_PASSWORD")
date_start = -30
most_common_senders = 10
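
To make the effect of a negative `date_start` concrete, here is a minimal sketch of how the look-back window is derived. The helper name `window_start` is hypothetical and not part of the notebook; it only mirrors the date arithmetic used later when the inbox is searched.

```python
import datetime

def window_start(today: datetime.date, date_start: int) -> datetime.date:
    """Return the first day of the look-back window (hypothetical helper)."""
    if date_start >= 0:
        raise ValueError("date_start must be a negative number of days")
    return today + datetime.timedelta(days=date_start)

# Fixed date used so the result is reproducible
print(window_start(datetime.date(2023, 3, 31), -30))  # 2023-03-01
```

With `date_start = -30`, the search therefore covers roughly the last 30 days up to today.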

## Model

### Connect to email box

In [3]:
server = IMAPClient('imap.gmail.com')
server.login(username, password)
server.select_folder('INBOX')
print("✅ Successfully connected to INBOX")

### Get all emails for the set period of time with their flags (seen or unseen), date, and sender

In [4]:
today = datetime.date.today()
start = today + datetime.timedelta(days=date_start)
all_messages = server.search(['SINCE', start.strftime('%d-%b-%Y')])
all_metadata = server.fetch(all_messages, ['RFC822.SIZE', 'FLAGS', 'INTERNALDATE', 'ENVELOPE'])
print("✅ All emails fetched:", len(all_metadata))
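
The fetched `FLAGS` indicate whether a message has been read. As a rough sketch, assuming the flags arrive as a tuple of bytes in which `b'\\Seen'` marks a read message, unread messages could be counted as shown below. The `sample_metadata` dictionary is made-up data standing in for `all_metadata`, and `count_unseen` is a hypothetical helper, not part of the notebook.

```python
SEEN = b"\\Seen"  # standard IMAP flag for a message that has been read

def count_unseen(metadata: dict) -> int:
    """Count messages whose FLAGS do not include the Seen flag."""
    return sum(
        1 for data in metadata.values()
        if SEEN not in data.get(b"FLAGS", ())
    )

# Made-up sample, shaped like the dictionary returned by server.fetch(...)
sample_metadata = {
    101: {b"FLAGS": (b"\\Seen",)},
    102: {b"FLAGS": ()},
    103: {b"FLAGS": (b"\\Flagged",)},
}
print(count_unseen(sample_metadata))  # 2
```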

### Get the most common senders using the method most_common

##### The method `most_common` returns the senders with the highest number of occurrences, sorted in descending order

In [5]:
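senders = []
for msg_id, data in all_metadata.items():
    envelope = data[b'ENVELOPE']
    # Skip messages without a usable From address (mailbox or host can be missing)
    if envelope.from_ and envelope.from_[0].mailbox and envelope.from_[0].host:
        sender_email = envelope.from_[0].mailbox.decode() + "@" + envelope.from_[0].host.decode()
        senders.append(sender_email)

# Count emails per sender and keep the N most frequent ones
sender_counts = Counter(senders)
top_senders = sender_counts.most_common(most_common_senders)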
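
To illustrate what `most_common` returns, here is a small self-contained sketch. The addresses in `sample_senders` are made up and stand in for the `senders` list built from the inbox.

```python
from collections import Counter

# Made-up sender addresses standing in for the `senders` list
sample_senders = [
    "news@example.com",
    "alice@example.org",
    "news@example.com",
    "bob@example.net",
    "news@example.com",
    "alice@example.org",
]

sample_counts = Counter(sample_senders)
top_two = sample_counts.most_common(2)  # list of (sender, count) tuples
print(top_two)  # [('news@example.com', 3), ('alice@example.org', 2)]
```

Each entry is a `(sender, count)` tuple, which is why the output cell can unpack the result directly in a `for sender, count in ...` loop.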

## Output

### Print the list of the most common senders for the set period of time

In [6]:
print(f"The {most_common_senders} most common senders:")
for sender, count in top_senders:
    print(f"{sender}: {count} emails")