# 🧑‍🏫 Class 9 – File Handling & Regular Expressions

**Objective:** Understand how to handle files in Python (both text and binary), and how to use regular expressions for pattern matching and text extraction.

Let's break this lesson into five parts: Introduction, File Handling Basics, Regular Expressions, File Processing using RegEx, and Recap.

## 🔥 Warm-Up: What Are Files and Regex?

### What are Files?
Think of a file like a notebook. A text file is like a page written in pen: you can open it and read what's inside.

### Why Handle Files in Programming?
Imagine your program is like a robot. File handling allows this robot to read instructions or save information to share with others or remember things later.

### Types of Files:
- **Text Files:** Like `.txt`, readable by humans
- **Binary Files:** Like images or videos, not human-readable

### What Are Regular Expressions?
Regular expressions (regex) are patterns used to search through text. If you want to find all phone numbers or emails in a big document, regex works like a magnifying glass for patterns.

In [1]:
# Quick Demo: Check if os module works
import os
print("Operating System Name:", os.name)  # 'posix' for Linux/Mac, 'nt' for Windows

Operating System Name: nt


## 📂 File Handling Basics

Python can open files to read or write just like opening a notebook to read or write something. The `open()` function is your key tool here.

**Common Modes:**
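- `'r'` – Read (the default; the file must already exist)
- `'w'` – Write (creates the file, or overwrites it if it already exists)
- `'a'` – Append (adds to the end; creates the file if needed)
- `'rb'` – Read Binary
- `'wb'` – Write Binary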
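
To see how the write and append modes differ, here is a minimal sketch. It works in a temporary folder so no real files are touched; the file name `notes.txt` is made up for illustration.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    # 'w' creates the file, or empties it if it already exists.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")

    # 'a' keeps the existing content and adds to the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("second line\n")

    with open(path, "r", encoding="utf-8") as f:
        after_append = f.read()

    # Opening with 'w' again discards both earlier lines.
    with open(path, "w", encoding="utf-8") as f:
        f.write("replaced\n")

    with open(path, "r", encoding="utf-8") as f:
        after_overwrite = f.read()

print(repr(after_append))     # 'first line\nsecond line\n'
print(repr(after_overwrite))  # 'replaced\n'
```

The takeaway: reach for `'a'` when you want to keep what is already in the file, and `'w'` only when starting fresh is fine.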

In [1]:
# Reading a text file
# Example: Reading your diary to see what you wrote yesterday
with open("sample.txt", "r") as file:
    content = file.read()
    print(content)

Hello! My name is Neha Shrestha. 


In [2]:
# Writing to a text file
# Example: Writing a new note in your notebook
with open("output.txt", "w") as file:
    file.write("Hello, this is a test.")

In [3]:
# Reading a binary file (like reading raw image data)
# read() loads the whole file; we print only the first 20 bytes, like peeking into a photo's digital data
with open("image.jpg", "rb") as file:
    data = file.read()
    print(data[:20])

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00'


In [4]:
# Writing to a binary file
# Copying the bytes in `data` (read in the previous cell) to a new image file
with open("output_image.jpg", "wb") as file:
    file.write(data)
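
Binary data comes back as a `bytes` object, not a string. As a small sketch, the 20 bytes printed by the read example above can be checked against the three bytes that JPEG files start with. The helper name `looks_like_jpeg` is made up for this illustration.

```python
# First 20 bytes printed by the read example above.
header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"

JPEG_SIGNATURE = b"\xff\xd8\xff"  # JPEG files begin with these three bytes


def looks_like_jpeg(data: bytes) -> bool:
    """Hypothetical helper: True if the data starts with the JPEG signature."""
    return data.startswith(JPEG_SIGNATURE)


print(type(header), len(header))                   # <class 'bytes'> 20
print(looks_like_jpeg(header))                     # True
print(looks_like_jpeg(b"Hello, this is a test."))  # False
```

A check like this is only a quick hint about the file type, not a full validation of the image.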

## 🧵 Introduction to Regular Expressions

The `re` module in Python helps you find patterns in text.

Think of it like CTRL + F in Word, but smarter — it can look for patterns, not just fixed words.

For example, if you want to find all the words in a sentence or find email addresses, regex can do that easily!

In [5]:
# Match all words in a string
# \b\w+\b means word boundaries with one or more word characters
import re
text = "The quick brown fox"
pattern = r"\b\w+\b"
matches = re.findall(pattern, text)
print("Words Found:", matches)

Words Found: ['The', 'quick', 'brown', 'fox']


In [10]:
# pattern = r"\b\w+\b"

# r = Raw String

# This tells Python not to treat the backslash as a special escape character.
# For example: "\n" means a new line, but r"\n" just means a backslash and an "n".

# Regex patterns use a lot of backslashes, so r"" helps avoid confusion.

# \b = Word Boundary

# This is like a fence: "this is where a word starts or ends".
# It doesn't match a character. It matches the position between a word character and something else (like a space, punctuation, or the start/end of the text).

# \w = Word Character

# This means any letter, digit, or underscore (_).

# + = One or More

# This says: "Match one or more of the thing before me."

# So \w+ means: match one or more letters/digits/underscores in a row, which is basically a word.

# \b\w+\b = Altogether

# This pattern finds entire words in a sentence.
# It starts at the beginning of a word (\b), collects all the letters/digits/underscores (\w+), and stops at the end of the word (\b).
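
The notes above can be tried out directly. This is a small sketch with made-up sample sentences; it shows what the `r` prefix changes and how `\w` treats underscores and punctuation.

```python
import re

# "\n" is one character (a newline); r"\n" is two (a backslash and an n).
print(len("\n"), len(r"\n"))  # 1 2

# Without the r prefix, "\b" is a backspace character, not a word boundary.
plain_pattern = "\bfox\b"
raw_pattern = r"\bfox\b"
text = "The quick brown fox"
plain_hits = re.findall(plain_pattern, text)
raw_hits = re.findall(raw_pattern, text)
print(plain_hits)  # []
print(raw_hits)    # ['fox']

# \w covers letters, digits, and underscores; punctuation ends a word.
words = re.findall(r"\b\w+\b", "Hello, world! user_1 is here.")
print(words)  # ['Hello', 'world', 'user_1', 'is', 'here']
```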



In [None]:
# Search for a social security number format (###-##-####)
# Useful for finding ID patterns
pattern = r"\d{3}-\d{2}-\d{4}"
text = "My SSN is 123-45-6789"
matches = re.findall(pattern, text)
print(matches)


match = re.search(pattern, text)
if match:
    print("Found:", match.group())

['123-45-6789']
Found: 123-45-6789


In [11]:
# pattern = r"\d{3}-\d{2}-\d{4}"

# r"" = Raw String

# \d = Single Digit (0 to 9)

# \d means match one digit.

# {3}, {2}, {4} = Repetition

# Curly braces say how many times to repeat the thing before them.

# \d{3} means match exactly 3 digits

# \d{2} means match exactly 2 digits

# \d{4} means match exactly 4 digits

# - = Hyphen

# A hyphen outside square brackets is an ordinary character; it matches "-" in the text.
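
Here is a short sketch of how `re.search`, `re.findall`, and `re.fullmatch` behave with this pattern. The ID values are made up. It also shows one thing to watch for: `{3}` consumes exactly three digits, but a match may still begin partway through a longer run of digits unless you add `\b`.

```python
import re

pattern = r"\d{3}-\d{2}-\d{4}"
text = "IDs on file: 123-45-6789 and 987-65-4321."  # made-up values

first = re.search(pattern, text)      # match object for the first hit, or None
all_hits = re.findall(pattern, text)  # list of every matching string

print(first.group())  # 123-45-6789
print(all_hits)       # ['123-45-6789', '987-65-4321']

# The match can start in the middle of a longer run of digits.
loose = re.findall(pattern, "1234-56-78901")
# Adding \b on both sides requires the digits to stand alone.
strict = re.findall(r"\b\d{3}-\d{2}-\d{4}\b", "1234-56-78901")
print(loose)   # ['234-56-7890']
print(strict)  # []

# fullmatch succeeds only if the whole string fits the pattern.
exact_ok = bool(re.fullmatch(pattern, "123-45-6789"))
exact_bad = bool(re.fullmatch(pattern, "123-45-67890"))
print(exact_ok, exact_bad)  # True False
```

Use `findall` to collect values from longer text, and `fullmatch` when checking whether a single input has the right shape.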

In [None]:
# Extract emails from a string
# This finds text shaped like an email address (a simple pattern, not a full validator)
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"
text = "Send an email to example@example.com for more info."
emails = re.findall(pattern, text)
print("Emails:", emails)

In [None]:
# \b = Word Boundary (marks the start and end of email address)

# [A-Za-z0-9._%+-]+ = This is the username part (before the @ symbol)
# [ ] means: "match any one character from this list."
# A-Z = uppercase letters
# a-z = lowercase letters
# 0-9 = numbers
# ._%+- = special characters often used in emails
# The + means: "match one or more of these characters."

# So this matches things like:
# hello
# john.doe
# user_name123+tag

# @
# This just matches the @ symbol in the email address.

# [A-Za-z0-9.-]+
# This is the domain name (after the @ symbol), like:
# gmail
# my-school
# example123
# It includes:
# Letters (A-Z, a-z)
# Numbers (0-9)
# Dots . and dashes -

# \.
# This matches the dot before the domain ending like .com, .org, .net.

# [A-Z|a-z]{2,}
# This is the domain ending, like:
# com, org, net, edu, etc.

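# But here's a small catch:
# Inside square brackets, | is a literal character, not "or".
# So [A-Z|a-z] also accepts a "|" in the domain ending (a common regex mistake).
# [A-Za-z] is the class that was intended.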

# {2,} = at least 2 letters (like com, io, edu, etc.)

# Final \b
# Marks the end of the word (email address).
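
To see the effect of the `|` inside `[A-Z|a-z]`, this sketch compares the pattern used above with a version that uses `[A-Za-z]` instead. The second test string is made up to contain a pipe in the domain ending.

```python
import re

original = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"
corrected = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"

good_text = "Send an email to example@example.com for more info."
odd_text = "bad@example.c|m"  # made-up string with a pipe in the ending

# Both patterns find the ordinary address.
good_original = re.findall(original, good_text)
good_corrected = re.findall(corrected, good_text)
print(good_original)   # ['example@example.com']
print(good_corrected)  # ['example@example.com']

# Only the original pattern accepts the pipe.
odd_original = re.findall(original, odd_text)
odd_corrected = re.findall(corrected, odd_text)
print(odd_original)   # ['bad@example.c|m']
print(odd_corrected)  # []
```

Even the corrected version is a simple approximation; real email address rules are broader than this pattern covers.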

## 🛠 File Processing with Regular Expressions

Now let’s combine both: read a file, and then use regular expressions to extract specific patterns like phone numbers.

### Real-Life Example:
Imagine you have a contact list and want to extract all phone numbers from it — regex makes this job super simple.

In [8]:
# Extracting phone numbers from a text file
with open("contacts.txt", "r") as file:
    content = file.read()

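# \b9\d{9}\b = a 9 followed by nine more digits (10 digits total), standing alone as a whole number
pattern = r"\b9\d{9}\b"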
phone_numbers = re.findall(pattern, content)
print("Phone Numbers Found:", phone_numbers)

Phone Numbers Found: ['9813337680']
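
If you don't have a `contacts.txt` yet, the same idea can be tried end to end with a temporary file. This is a sketch with made-up names and numbers; it reads the file one line at a time so each match can be reported with its line number.

```python
import os
import re
import tempfile

# Made-up contact data; none of these are real numbers.
sample = (
    "Asha: 9800000001\n"
    "Office: 014000000\n"
    "Bikash: 9811111111 (old: 98222)\n"
)

pattern = re.compile(r"\b9\d{9}\b")  # same pattern as in the cell above

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "contacts.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    found = []
    # Iterating over the file object reads one line at a time.
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            for number in pattern.findall(line):
                found.append((line_number, number))

print(found)  # [(1, '9800000001'), (3, '9811111111')]
```

Reading line by line is handy for large files, and the line numbers make it easy to go back and check each match.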


In [None]:
# A more flexible pattern for numbers written like (555) 123-4567, 555-123-4567, or 555.123.4567
phone_pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"
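
A quick sketch of what `phone_pattern` picks up, using made-up numbers. Each `?` makes the piece before it optional, so the brackets and the separators may or may not be present.

```python
import re

# \(?  and \)?  = optional opening and closing bracket
# [-.\s]?       = optional separator: a dash, a dot, or whitespace
# \d{3}, \d{4}  = groups of exactly 3 and 4 digits
phone_pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"

text = "Call (555) 123-4567, 555.123.4567 or 5551234567."  # made-up numbers
hits = re.findall(phone_pattern, text)
print(hits)  # ['(555) 123-4567', '555.123.4567', '5551234567']
```

Because the pattern has no `\b` at the edges, it can also match part of a longer run of digits, so treat it as a starting point.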

### Count Word Occurrences in a File
Let’s count how many times the word "Python" appears in a file.

Useful when analyzing text for keywords!

In [9]:
# Count the number of times a word appears
word_to_count = "Python"
with open("sample.txt", "r") as file:
    content = file.read()

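# re.escape keeps any special regex characters in the word from being read as pattern syntax
matches = re.findall(rf"\b{re.escape(word_to_count)}\b", content, re.IGNORECASE)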
print(f"'{word_to_count}' found {len(matches)} times.")

'Python' found 2 times.
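
When you want counts for every word rather than one keyword, one possible approach is to combine `re.findall` with `collections.Counter`. This sketch uses a made-up sentence in place of a file's content.

```python
import re
from collections import Counter

# Made-up sample text standing in for a file's content.
text = "Python is fun. I like python, and Python likes me."

# Lowercase first so "Python" and "python" are counted together.
words = re.findall(r"\b\w+\b", text.lower())
counts = Counter(words)

print(counts["python"])       # 3
print(counts.most_common(1))  # [('python', 3)]
```

Lowercasing the text plays the same role here as `re.IGNORECASE` does in the cell above.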


## 🧠 Wrap-Up

- Files keep data after a program ends, and we can use Python to read and write them.
- Regular expressions help find patterns in text like phone numbers and emails.
- These skills are essential in data processing and automation.

### Homework ✍️
1. Create a Python script to count how many times a particular word appears in a file.
2. Extract all email addresses from a file using regex.

Happy Coding! 🐍