# Counting Email Messages per Address using SQLite (Line-by-Line Explanation)

This tutorial explains a Python program that reads an email mailbox file  
and counts how many times each email address appears, storing the results  
in a SQLite database.

---

## 1. Importing the SQLite Library


In [2]:
import sqlite3

This module allows Python to create and interact with SQLite databases.

---

## 2. Connecting to the SQLite Database


In [3]:
conn = sqlite3.connect('emaildb.sqlite')


Creates (or opens if it already exists) a SQLite database file named `emaildb.sqlite`.

The database file will be created in the current working directory.


In [4]:
cur = conn.cursor()


Creates a cursor object.

The cursor is used to execute SQL commands and retrieve results.
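To try the connection and cursor without creating a file on disk, a minimal sketch can use SQLite's special `:memory:` database name. The query `SELECT 1 + 1` is only an illustration and is not part of the original program.

```python
import sqlite3

# ':memory:' asks SQLite for a temporary database that lives only in RAM.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# The cursor runs a statement and then hands back its result rows.
cur.execute('SELECT 1 + 1')
result = cur.fetchone()  # a single row, returned as a tuple
print(result[0])

cur.close()
conn.close()
```

An in-memory database disappears when the connection closes, which makes it convenient for experiments like this one.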

---

## 3. Dropping the Table if It Exists


In [5]:
cur.execute('DROP TABLE IF EXISTS Counts')


<sqlite3.Cursor at 0x1e228ff44c0>

Deletes the table `Counts` if it already exists.

This ensures the database starts fresh every time the program runs.

---

## 4. Creating the Counts Table


In [6]:
cur.execute('''
CREATE TABLE Counts (email TEXT, count INTEGER)
''')


<sqlite3.Cursor at 0x1e228ff44c0>

Creates a table named `Counts`.

### Columns:

- `email` → stores the email address (TEXT)
- `count` → stores how many times the email appears (INTEGER)
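One way to confirm the table was created with the expected columns is SQLite's `PRAGMA table_info` statement. The sketch below assumes an in-memory database so it can run on its own; the variable name `columns` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Same statements as in the tutorial: start fresh, then create the table.
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in cur.execute('PRAGMA table_info(Counts)')]
print(columns)

conn.close()
```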

---

## 5. Asking for the Input File Name

In [13]:
fname = input('Enter file name: ')


Prompts the user to enter the name of the mailbox file.

In [14]:
if (len(fname) < 1): fname = 'mbox.txt'


If the user presses Enter without typing anything,
the program defaults to `mbox.txt`.
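The default-file-name logic can also be wrapped in a small function, which makes it easy to check without typing at a prompt. `resolve_filename` is a hypothetical helper, not part of the original program, and unlike the original it also treats whitespace-only input as empty.

```python
def resolve_filename(user_input, default='mbox.txt'):
    """Return the user's file name, or the default if nothing was typed.

    Hypothetical helper: strips surrounding whitespace first, so an
    input of only spaces also falls back to the default.
    """
    cleaned = user_input.strip()
    return cleaned or default  # an empty string is falsy, so the default wins


print(resolve_filename(''))                # falls back to the default
print(resolve_filename('mbox-short.txt'))  # keeps what the user typed
```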

---

## 6. Opening the Mailbox File

In [15]:
fh = open(fname)


Opens the mailbox file for reading.
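A bare `open()` leaves the file open until it is closed explicitly, and it raises `FileNotFoundError` if the name is wrong. The sketch below shows one possible way to handle both with a `with` block and a `try`/`except`. The helper `count_from_lines` and the sample file contents are assumptions made for illustration.

```python
import os
import tempfile


def count_from_lines(path):
    """Hypothetical helper: count lines starting with 'From ',
    or return None if the file does not exist."""
    try:
        # The with-block closes the file even if reading fails partway.
        with open(path, encoding='utf-8', errors='ignore') as fh:
            return sum(1 for line in fh if line.startswith('From '))
    except FileNotFoundError:
        return None


# Build a throwaway sample file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = os.path.join(tmpdir, 'sample-mbox.txt')
    with open(sample_path, 'w', encoding='utf-8') as out:
        out.write('From alice@example.com Sat Jan  5 09:14:16 2008\n')
        out.write('Subject: hello\n')

    found = count_from_lines(sample_path)
    missing = count_from_lines(os.path.join(tmpdir, 'no-such-file.txt'))

print(found, missing)
```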

---


## 7. Processing the Mailbox File (The `for` Loop)

The following loop reads the mailbox file line by line, extracts email
addresses, and updates their counts in the database.


In [22]:
for line in fh:
    if not line.startswith('From '):
        continue

    pieces = line.split()
    email = pieces[1]

    cur.execute('SELECT count FROM Counts WHERE email = ?', (email,))
    row = cur.fetchone()

    if row is None:
        cur.execute(
            'INSERT INTO Counts (email, count) VALUES (?, 1)',
            (email,)
        )
    else:
        cur.execute(
            'UPDATE Counts SET count = count + 1 WHERE email = ?',
            (email,)
        )


### Explanation of the `for` Loop Logic

The loop reads the mailbox file **line by line**; each iteration processes exactly one line.

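Only lines that start with `"From "` (note the trailing space) are useful for this task.
Each message in the mailbox begins with one such line, and it names the sender.
All other lines, including `From:` headers, are skipped so that each message is counted once.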

When a valid line is found:
- The line is split into individual words.
- The email address is extracted as the second word in the line.

The program then checks the database to see if this email address already exists.

If the email is **not found** in the database:
- A new row is inserted into the table.
- The count for that email is initialized to `1`.

If the email **already exists**:
- The existing row is updated.
- The count value is increased by `1`.

This process continues until the entire file has been read.

As a result, the database contains one row per email address,
with the count showing how many times that email appeared in the mailbox file.

This loop is the core logic that combines:
- File reading
- Data extraction
- Database insertion and updating
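The loop logic can be replayed on a few lines held in memory, which makes each branch easy to follow. This is a sketch under stated assumptions: the addresses and dates in `sample_lines` are made up, and an in-memory database stands in for `emaildb.sqlite`.

```python
import sqlite3

# Made-up sample data: three "From " lines and two lines that must be skipped.
sample_lines = [
    'From alice@example.com Sat Jan  5 09:14:16 2008\n',
    'From: alice@example.com\n',   # header line: "From:" is not "From "
    'From bob@example.org Sat Jan  5 09:20:00 2008\n',
    'Subject: hello\n',
    'From alice@example.com Sat Jan  5 10:00:00 2008\n',
]

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')

for line in sample_lines:
    if not line.startswith('From '):
        continue
    email = line.split()[1]  # second word on the line

    cur.execute('SELECT count FROM Counts WHERE email = ?', (email,))
    if cur.fetchone() is None:
        # First time this address is seen: start its count at 1.
        cur.execute('INSERT INTO Counts (email, count) VALUES (?, 1)', (email,))
    else:
        # Address already present: add one to its count.
        cur.execute('UPDATE Counts SET count = count + 1 WHERE email = ?', (email,))

conn.commit()

# Each row is an (email, count) pair, so dict() turns the result into a mapping.
counts = dict(cur.execute('SELECT email, count FROM Counts'))
print(counts)
conn.close()
```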

---

## 8. Saving Changes to the Database


In [23]:
conn.commit()


- Saves all changes made to the database.

- Committing once after the loop, instead of after every insert or update, avoids a separate disk write per statement and greatly improves performance.
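The effect of `commit()` can be seen by pairing it with `rollback()`, which discards changes that have not been committed. The sketch below assumes Python's default transaction handling in `sqlite3`, where an `INSERT` implicitly opens a transaction. The sample address is made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')
conn.commit()

# An uncommitted insert is discarded by rollback().
cur.execute("INSERT INTO Counts (email, count) VALUES ('alice@example.com', 1)")
conn.rollback()
rows_after_rollback = cur.execute('SELECT COUNT(*) FROM Counts').fetchone()[0]

# A committed insert survives a later rollback().
cur.execute("INSERT INTO Counts (email, count) VALUES ('alice@example.com', 1)")
conn.commit()
conn.rollback()  # nothing is pending, so this changes nothing
rows_after_commit = cur.execute('SELECT COUNT(*) FROM Counts').fetchone()[0]

print(rows_after_rollback, rows_after_commit)
conn.close()
```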

---

## 9. Retrieving the Top Results

In [24]:
sqlstr = 'SELECT email, count FROM Counts ORDER BY count DESC LIMIT 10'


- Selects email addresses and their counts.

- Sorts results in descending order by count.

- Limits the output to the top 10 email addresses.
  
---

## 10. Displaying the Results

In [25]:
for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])


zqian@umich.edu 195
mmmay@indiana.edu 161
cwen@iupui.edu 158
chmaurer@iupui.edu 111
aaronz@vt.edu 110
ian@caret.cam.ac.uk 96
jimeng@umich.edu 93
rjlowe@iupui.edu 90
dlhaines@umich.edu 84
david.horwitz@uct.ac.za 67


- Executes the SQL query.

- Prints each email address along with its count.
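The query and display steps can be combined into a small function that takes the row limit as a parameter. This is a sketch, not the tutorial's own code: `top_senders` is a hypothetical helper, the sample rows are made up, and the extra `email ASC` sort key is added only to make ties come out in a predictable order.

```python
import sqlite3


def top_senders(conn, limit=10):
    """Hypothetical helper: return up to `limit` (email, count) rows,
    highest count first."""
    sql = ('SELECT email, count FROM Counts '
           'ORDER BY count DESC, email ASC LIMIT ?')
    # fetchall() collects every remaining row into a list of tuples.
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')
conn.executemany(
    'INSERT INTO Counts (email, count) VALUES (?, ?)',
    [('a@example.com', 3), ('b@example.com', 7), ('c@example.com', 5)],
)

top_two = top_senders(conn, limit=2)
for email, count in top_two:
    print(email, count)

conn.close()
```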

---

## 11. Closing the Database Resources

In [26]:
cur.close()
conn.close()


Closes the cursor first, then the connection, which releases the database file.
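If an error occurs before the explicit `close()` calls are reached, the cursor and connection stay open. One possible alternative is `contextlib.closing` from the standard library, sketched here with an in-memory database and an illustrative query.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on the object when the with-block ends,
# whether the block finishes normally or raises an exception.
with closing(sqlite3.connect(':memory:')) as conn:
    with closing(conn.cursor()) as cur:
        cur.execute('SELECT 42')
        value = cur.fetchone()[0]

# The connection is closed now, so further use raises ProgrammingError.
try:
    conn.execute('SELECT 1')
    is_closed = False
except sqlite3.ProgrammingError:
    is_closed = True

print(value, is_closed)
```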


---

## Final Program Flow Summary

This program performs the following steps:

- Connects to a SQLite database
- Creates a fresh table for counting emails
- Reads a mailbox file line by line
- Extracts email addresses from valid lines
- Inserts or updates counts in the database
- Commits all changes once for efficiency
- Retrieves and displays the top results
- Closes all database resources

This project is a complete example of combining  
Python file handling with SQL database operations.

The variant below uses the same approach to count messages per organization (the domain after the `@` in each address) and stores the results in `orgdb.sqlite`.


In [18]:
import sqlite3

def main():
    # 1) Connect to SQLite DB file (this will create it if it does not exist)
    conn = sqlite3.connect('orgdb.sqlite')
    cur = conn.cursor()

    # 2) Create table (fresh each run)
    cur.execute('DROP TABLE IF EXISTS Counts')
    cur.execute('CREATE TABLE Counts (org TEXT, count INTEGER)')

    # 3) Input file name
    fname = input('Enter file name: ')
    if len(fname) < 1:
        fname = 'mbox.txt'

    # 4) Read file and count orgs
    with open(fname, encoding='utf-8', errors='ignore') as fh:
        for line in fh:
            if not line.startswith('From '):
                continue

            pieces = line.split()
            if len(pieces) < 2:
                continue

            email = pieces[1]
            if '@' not in email:
                continue

            org = email.split('@')[1]   # domain part

            cur.execute('SELECT count FROM Counts WHERE org = ?', (org,))
            row = cur.fetchone()

            if row is None:
                cur.execute('INSERT INTO Counts (org, count) VALUES (?, 1)', (org,))
            else:
                cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?', (org,))

    # 5) Commit once (fast)
    conn.commit()

    # 6) Optional: show top 10
    print("\nTop organizations:")
    for row in cur.execute('SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'):
        print(row[0], row[1])

    cur.close()
    conn.close()
    print("\nDone. Database saved as orgdb.sqlite")

if __name__ == '__main__':
    main()



Top organizations:
iupui.edu 536
umich.edu 491
indiana.edu 178
caret.cam.ac.uk 157
vt.edu 110
uct.ac.za 96
media.berkeley.edu 56
ufp.pt 28
gmail.com 25
et.gatech.edu 17

Done. Database saved as orgdb.sqlite
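
Per-organization totals could also be derived in SQL from a table of per-address counts, without rereading the mailbox file. The sketch below is an alternative technique, not the program above: it assumes a `Counts (email, count)` table like the one in the first program, uses made-up sample rows, and relies on SQLite's built-in `substr` and `instr` functions.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')
conn.executemany(
    'INSERT INTO Counts (email, count) VALUES (?, ?)',
    [('alice@example.com', 2), ('bob@example.com', 3), ('carol@example.org', 4)],
)

# instr() finds the position of '@'; substr() keeps everything after it.
sql = '''
SELECT substr(email, instr(email, '@') + 1) AS org, SUM(count) AS total
FROM Counts
GROUP BY org
ORDER BY total DESC
'''
org_totals = conn.execute(sql).fetchall()
for org, total in org_totals:
    print(org, total)

conn.close()
```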
