Read a file from disk and count how many times each email address appears.



1. What is the format of the input file — one email per line, or embedded in text?
2. Are the emails well-formed (e.g., user@example.com)? Should we validate them?
3. Should the output be the total count of all emails, the frequency per email address, or the frequencies sorted by count?
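
Question 3 matters because each answer is a different view of the same tally. A minimal sketch, assuming the counts are already held in a `collections.Counter`; the sample addresses and variable names are illustrative only:

```python
from collections import Counter

# Hypothetical tally, as if produced by scanning a file.
counts = Counter({"john.doe@example.com": 2, "jane@abc.com": 1})

total_matches = sum(counts.values())   # total count of all emails found
per_address = dict(counts)             # frequency per email address
by_frequency = counts.most_common()    # (address, count) pairs, highest count first

print(total_matches)
print(per_address)
print(by_frequency)
```

All three views come from one pass over the data, so the choice affects only how results are reported, not how the file is read.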


Approach 
Assuming emails are embedded in arbitrary text, I'll use a regular expression to extract all email addresses. I'll read the file line by line to keep memory usage low and use a dictionary (hash map) to count the frequency of each email.
A dict gives average-case O(1) insertion and lookup, so each match is counted in constant time.
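
To illustrate why the regex approach covers both input formats from question 1, here is a small sketch that runs the same basic pattern on embedded text and on a one-email-per-line string. The sample strings and the names `EMAIL_RE`, `embedded`, and `one_per_line` are assumptions made for this example:

```python
import re

# Same basic pattern as in the solution; it is not a full RFC-compliant validator.
EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

embedded = "Hi john.doe@example.com, please contact jane@abc.com."
one_per_line = "john.doe@example.com\n"

# findall returns every non-overlapping match, left to right.
found_embedded = EMAIL_RE.findall(embedded)
found_single = EMAIL_RE.findall(one_per_line)

print(found_embedded)  # ['john.doe@example.com', 'jane@abc.com']
print(found_single)    # ['john.doe@example.com']
```

The trailing comma, period, and newline are excluded because they cannot form a valid ending for the domain part of the pattern.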



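Time and Space Complexity
Time: O(N + u log u)
N = total number of characters in the file (the regex scans each line once; this is the typical case, and long runs of address-like characters without an @ can cause extra backtracking)
u log u = cost of sorting the unique addresses by frequency at the end

Space: O(u)
u = number of unique email addresses (plus the single line currently being processed)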

import re
from collections import defaultdict

# Step 1: Email pattern (basic one)
email_pattern = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

# Step 2: Dictionary to count frequency
email_counts = defaultdict(int)

# Step 3: Read file line by line
with open('input.txt') as f:
    for line in f:
        # Step 4: Extract all emails in the current line
        emails = email_pattern.findall(line)
        for email in emails:
            email_counts[email] += 1

# Step 5: Print email frequencies
for email, count in sorted(email_counts.items(), key=lambda x: -x[1]):
    print(f"{email}: {count}")
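
As a possible variation on the script above, the counting can be wrapped in a function that accepts any iterable of lines, which makes it easy to try without a file on disk. This is a sketch, not the original solution: `count_emails` and its `normalize_case` flag are hypothetical names, and lowercasing addresses is an assumption about the desired behavior (many mail systems treat addresses case-insensitively, but that is not guaranteed).

```python
import re
from collections import Counter

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def count_emails(lines, normalize_case=False):
    """Count email addresses across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        matches = EMAIL_RE.findall(line)
        if normalize_case:
            # Assumption: addresses differing only by case are the same address.
            matches = [m.lower() for m in matches]
        counts.update(matches)  # adds 1 for each occurrence in the list
    return counts

# Illustrative input; an open file object can be passed the same way.
sample = [
    "Hi john.doe@example.com, please contact jane@abc.com.\n",
    "John.Doe@example.com is repeated.\n",
]

exact = count_emails(sample)
folded = count_emails(sample, normalize_case=True)

for email, count in folded.most_common():
    print(f"{email}: {count}")
```

With a real file, the call would look like `count_emails(open_file)` inside a `with open(...)` block, since a file object yields its lines one at a time.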


✅ Output Example
If input.txt contains:

Hi john.doe@example.com, please contact jane@abc.com. john.doe@example.com is repeated.

Output:

john.doe@example.com: 2
jane@abc.com: 1