# Counting Email Messages per Organization using SQLite (Line-by-Line Explanation)

This tutorial explains a Python program that reads an email mailbox file,  
takes the sender's organization (the domain part of the address on each `From ` line),  
and counts how many messages come from each organization, storing the results  
in a SQLite database.

---

## 1. Importing the SQLite Library


```python
import sqlite3
```

This module allows Python to create and interact with SQLite databases.
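The sketch below shows the module in action and is separate from the tutorial's program. It uses an in-memory database (`':memory:'`), so nothing is written to disk. The `Demo` table and its rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# ':memory:' creates a temporary database that lives only in RAM
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical demo table, used only for this illustration
cur.execute('CREATE TABLE Demo (name TEXT, value INTEGER)')
cur.execute('INSERT INTO Demo (name, value) VALUES (?, ?)', ('a', 1))
cur.execute('INSERT INTO Demo (name, value) VALUES (?, ?)', ('b', 2))
conn.commit()

# fetchall() returns every result row as a list of tuples
rows = cur.execute('SELECT name, value FROM Demo ORDER BY name').fetchall()
print(rows)  # [('a', 1), ('b', 2)]

conn.close()
```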

---

## 2. Connecting to the SQLite Database


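```python
conn = sqlite3.connect('emaildb.sqlite')
```

`sqlite3.connect()` opens a SQLite database file named `emaildb.sqlite` and returns a connection object. If the file does not already exist, it is created first. The complete program later in this tutorial uses a different file name, `orgdb.sqlite`.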

The database file will be created in the current working directory.
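The sketch below demonstrates the create-or-open behaviour by connecting twice to `emaildb.sqlite`. It works inside a temporary directory so that no file is left behind; that setup is an assumption of the sketch. The first connection creates the file along with a hypothetical table `T`. The second connection opens the same file and finds that table.

```python
import os
import sqlite3
import tempfile

# Work inside a throwaway directory so no stray database file is left behind
with tempfile.TemporaryDirectory() as tmp:
    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        # First connect: relative path, so the file is created in the cwd
        conn = sqlite3.connect('emaildb.sqlite')
        conn.execute('CREATE TABLE IF NOT EXISTS T (x INTEGER)')  # hypothetical table
        conn.commit()
        conn.close()
        created = os.path.exists(os.path.join(tmp, 'emaildb.sqlite'))

        # Second connect: the existing file is opened, and the table is still there
        conn = sqlite3.connect('emaildb.sqlite')
        tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
        conn.close()
        print('created in cwd:', created, 'tables:', tables)
    finally:
        os.chdir(old_cwd)  # restore the cwd before the temp dir is removed
```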


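```python
cur = conn.cursor()
```

A cursor is the object that sends SQL statements to the database (`cur.execute(...)`) and reads back their results (`cur.fetchone()`, `cur.fetchall()`, or by iterating over the cursor).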
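The program relies on two cursor behaviours:

- `fetchone()` returns a tuple when a row matches and `None` when no row matches.
- Values are passed to `?` placeholders as a tuple, which is why the program writes `(org,)` with a trailing comma (a one-element tuple).

The sketch below shows both against an in-memory database. The organization names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Counts (org TEXT, count INTEGER)')
cur.execute('INSERT INTO Counts (org, count) VALUES (?, ?)', ('example.edu', 3))

# '?' placeholders let sqlite3 bind values safely instead of formatting strings
cur.execute('SELECT count FROM Counts WHERE org = ?', ('example.edu',))
found = cur.fetchone()      # a one-element tuple: (3,)

cur.execute('SELECT count FROM Counts WHERE org = ?', ('missing.org',))
missing = cur.fetchone()    # None: no matching row

print(found, missing)
conn.close()
```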


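---

## 3. The Complete Program

The full program below puts these steps together:

1. It creates a fresh `Counts` table.
2. It reads the mailbox line by line.
3. It extracts the organization (domain) from each `From ` line.
4. It updates the count for that organization.

```python
import sqlite3

def main():
    # 1) Connect to the SQLite database file (created if it does not exist)
    conn = sqlite3.connect('orgdb.sqlite')
    cur = conn.cursor()

    # 2) Recreate the Counts table so every run starts fresh
    cur.execute('DROP TABLE IF EXISTS Counts')
    cur.execute('CREATE TABLE Counts (org TEXT, count INTEGER)')

    # 3) Ask for the input file name; default to mbox.txt if none is given
    fname = input('Enter file name: ')
    if len(fname) < 1:
        fname = 'mbox.txt'

    # 4) Read the file and count messages per organization
    with open(fname, encoding='utf-8', errors='ignore') as fh:
        for line in fh:
            # Only "From " envelope lines mark the start of a message
            if not line.startswith('From '):
                continue

            pieces = line.split()
            if len(pieces) < 2:
                continue

            email = pieces[1]           # sender address is the second field
            if '@' not in email:
                continue

            org = email.split('@')[1]   # domain part

            # Look up the current count for this organization
            cur.execute('SELECT count FROM Counts WHERE org = ?', (org,))
            row = cur.fetchone()

            if row is None:
                # First message from this organization
                cur.execute('INSERT INTO Counts (org, count) VALUES (?, 1)', (org,))
            else:
                cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?', (org,))

    # 5) Commit once at the end (much faster than committing after every update)
    conn.commit()

    # 6) Optional: show the top 10 organizations
    print("\nTop organizations:")
    for row in cur.execute('SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'):
        print(row[0], row[1])

    cur.close()
    conn.close()
    print("\nDone. Database saved as orgdb.sqlite")

if __name__ == '__main__':
    main()
```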
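The parsing logic from step 4 can be pulled out into a standalone function and checked without a mailbox file. The name `extract_org` and the sample lines are hypothetical. The checks mirror the ones in the program's reading loop.

```python
def extract_org(line):
    """Return the sender's domain from a 'From ' line, or None.

    Hypothetical helper mirroring the checks in step 4 of the program.
    """
    if not line.startswith('From '):
        return None              # not a message-start line
    pieces = line.split()
    if len(pieces) < 2:
        return None              # 'From ' with no address after it
    email = pieces[1]
    if '@' not in email:
        return None              # second field is not an address
    return email.split('@')[1]   # domain part


samples = [
    'From alice@example.edu Sat Jan  5 09:14:16 2008\n',
    'From: alice@example.edu\n',   # header line: no space after "From", so skipped
    'From \n',
    'Subject: hello\n',
]
print([extract_org(s) for s in samples])
# ['example.edu', None, None, None]
```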

Running the complete program printed the following output:

```
Top organizations:
iupui.edu 536
umich.edu 491
indiana.edu 178
caret.cam.ac.uk 157
vt.edu 110
uct.ac.za 96
media.berkeley.edu 56
ufp.pt 28
gmail.com 25
et.gatech.edu 17

Done. Database saved as orgdb.sqlite
```
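For every message, the program runs a `SELECT` followed by either an `INSERT` or an `UPDATE`. An alternative technique, which the original program does not use, is SQLite's UPSERT syntax, `INSERT ... ON CONFLICT ... DO UPDATE`. It has two requirements: a `UNIQUE` constraint on `org`, and SQLite 3.24.0 or newer. The minimal sketch below uses an in-memory database and a hypothetical list of domains.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# UNIQUE on org is required so ON CONFLICT(org) has a constraint to detect
cur.execute('CREATE TABLE Counts (org TEXT UNIQUE, count INTEGER)')

orgs = ['umich.edu', 'uct.ac.za', 'umich.edu']   # hypothetical sample input
for org in orgs:
    # Insert a new row with count 1, or add 1 to the existing row's count
    cur.execute(
        'INSERT INTO Counts (org, count) VALUES (?, 1) '
        'ON CONFLICT(org) DO UPDATE SET count = count + 1',
        (org,),
    )
conn.commit()

result = cur.execute('SELECT org, count FROM Counts ORDER BY count DESC, org').fetchall()
print(result)  # [('umich.edu', 2), ('uct.ac.za', 1)]
conn.close()
```

With an UPSERT, each message costs one statement instead of a lookup plus a write.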
