### **<span style="color: #28B463;">What is open()?</span>**

* **<span style="color: #E74C3C;">Definition:</span>** The `open()` function in Python is used to open files for reading, writing, or both.
* **<span style="color: #2E86C1;">Key Point:</span>** It returns a file object that provides methods to interact with the file.

**<span style="color: #2E86C1;">Tip:</span>** Always use `with open()` to handle files; it closes the file automatically, even if an error occurs inside the block.

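```python
# 'r+' opens an existing file for reading and writing;
# plain 'r' would be enough here because the file is only read.
with open('../../datasets/samplefile.txt', 'r+') as file:
    text = file.read()

print(text)
```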

Hello Ankush,  

Please contact John Doe at john.doe@example.com or jane_doe99@mail.co.uk.  
Our website: https://www.example.com/path?query=123 and backup: http://sub.example.org.  

Order IDs: ORD-2025-0001, ORD-2024-9876.  
Invoice total: $1,299.99 USD, Discount: 15%, Weight: 2.5kg, Height: 180cm.  

Meeting scheduled on 13-08-2025 at 10:45 AM.  
Backup date: 2025/08/20 or 20 Aug 2025.  

Phone numbers: +91-9876543210, (022) 2345-6789, 555-1234.  

Random strings: abc123xyz, XY_42_test, A1B2C3.  

Note: Some emails may be fake@example, some links without protocol: www.fake-site.net.  


## **Open Modes Available**

| Mode  | Meaning                                 | File must exist? | Overwrites file? | Allows Reading? | Allows Writing? | Cursor Position on Open |
|-------|-----------------------------------------|------------------|------------------|-----------------|-----------------|-------------------------|
| `r`   | Read (text)                             | ✅ Yes           | ❌ No            | ✅ Yes          | ❌ No           | Beginning               |
| `w`   | Write (text)                            | ❌ No            | ✅ Yes (truncates) | ❌ No           | ✅ Yes          | Beginning (truncates)   |
| `a`   | Append (text)                           | ❌ No            | ❌ No            | ❌ No           | ✅ Yes          | End of file             |
| `r+`  | Read and write (text)                   | ✅ Yes           | ❌ No            | ✅ Yes          | ✅ Yes          | Beginning               |
| `w+`  | Write and read (text)                   | ❌ No            | ✅ Yes (truncates) | ✅ Yes          | ✅ Yes          | Beginning (truncates)   |
| `a+`  | Append and read (text)                  | ❌ No            | ❌ No            | ✅ Yes          | ✅ Yes          | End (writes), can read anywhere |
| `rb`  | Read (binary)                           | ✅ Yes           | ❌ No            | ✅ Yes          | ❌ No           | Beginning               |
| `wb`  | Write (binary)                          | ❌ No            | ✅ Yes (truncates) | ❌ No           | ✅ Yes          | Beginning (truncates)   |
| `ab`  | Append (binary)                         | ❌ No            | ❌ No            | ❌ No           | ✅ Yes          | End of file             |
| `rb+` | Read and write (binary)                 | ✅ Yes           | ❌ No            | ✅ Yes          | ✅ Yes          | Beginning               |
| `wb+` | Write and read (binary)                 | ❌ No            | ✅ Yes (truncates) | ✅ Yes          | ✅ Yes          | Beginning (truncates)   |
| `ab+` | Append and read (binary)                | ❌ No            | ❌ No            | ✅ Yes          | ✅ Yes          | End (writes), can read anywhere |
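
To make the table concrete, here is a minimal sketch that exercises `w`, `a`, `r+`, and `r` on a scratch file. The file names (`modes_demo.txt`, `missing.txt`) are hypothetical and live in a temporary directory, so nothing real is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical scratch file

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, "w") as f:
        f.write("first line\n")

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a") as f:
        f.write("second line\n")

    # 'r+' requires an existing file. The cursor starts at the beginning,
    # so writing overwrites characters in place instead of inserting.
    with open(path, "r+") as f:
        f.write("FIRST")
        f.seek(0)
        after_r_plus = f.read()

    # Opening with 'w' again discards everything written so far.
    with open(path, "w") as f:
        f.write("replaced\n")
    with open(path, "r") as f:
        after_w = f.read()

    # 'r' on a file that does not exist raises FileNotFoundError.
    try:
        with open(os.path.join(tmp_dir, "missing.txt"), "r") as f:
            pass
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(repr(after_r_plus))
print(repr(after_w))
print(missing_raises)
```

Note how `r+` changed only the first five characters, while the second `w` wiped the file before writing.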


## **Commonly Used File Object Methods**

| Method                          | Description                                                                                 | Returns                         | Common Usage Example                         |
|---------------------------------|---------------------------------------------------------------------------------------------|----------------------------------|-----------------------------------------------|
| `file.read(size=-1)`            | Reads the entire file or up to `size` characters/bytes.                                     | `str` (text) or `bytes` (binary) | `data = f.read()`                             |
| `file.readline(size=-1)`        | Reads one line from the file (up to `size` chars if specified).                             | `str` or `bytes`                 | `line = f.readline()`                         |
| `file.readlines(hint=-1)`       | Reads all lines into a list (optional `hint` limits total chars read).                      | `list` of `str`/`bytes`          | `lines = f.readlines()`                       |
| `file.write(string)`            | Writes a string (text mode) or bytes (binary mode) to the file.                             | Number of characters/bytes written | `f.write("Hello")`                         |
| `file.writelines(list_of_str)`  | Writes a list of strings to the file (no newline added automatically).                      | None                             | `f.writelines(["A\n", "B\n"])`                |
| `file.seek(offset, whence=0)`   | Moves the file cursor to a position. `whence`: 0 = start, 1 = current, 2 = end. In text mode, only `whence=0` accepts a nonzero offset. | New cursor position (int)        | `f.seek(0)`                                   |
| `file.tell()`                   | Returns the current cursor position in the file.                                            | Integer                          | `pos = f.tell()`                              |
| `file.flush()`                  | Flushes Python's internal write buffer to the operating system (use `os.fsync()` as well if the data must reach the disk). | None                             | `f.flush()`                                   |
| `file.truncate(size=None)`      | Resizes the file to `size` bytes (default: current position).                               | New file size (int)              | `f.truncate(0)`                               |
| `file.close()`                  | Closes the file, freeing system resources.                                                  | None                             | `f.close()`                                   |
| `file.__enter__()` / `__exit__()` | Context manager methods for use with `with` statements.                                    | File object / None               | `with open(...) as f:`                        |
| `file.readable()`               | Checks if the file stream supports reading.                                                 | `bool`                           | `f.readable()`                                |
| `file.writable()`               | Checks if the file stream supports writing.                                                 | `bool`                           | `f.writable()`                                |
| `file.seekable()`               | Checks if the file stream supports `seek()`.                                                | `bool`                           | `f.seekable()`                                |
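
The sketch below strings several of these methods together on a scratch file. The file name `methods_demo.txt` is an assumption made for illustration, and the file is created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "methods_demo.txt")  # hypothetical scratch file

    # 'w+' allows both writing and reading on the same file object.
    with open(path, "w+") as f:
        can_read, can_write = f.readable(), f.writable()

        written = f.write("alpha\n")           # returns the number of characters written
        f.writelines(["beta\n", "gamma\n"])    # newlines must be supplied by the caller
        f.flush()                              # push buffered data to the operating system

        start = f.seek(0)                      # move the cursor back to the beginning
        first_line = f.readline()              # reads up to and including the newline
        remaining = f.readlines()              # the rest of the file as a list of lines

        f.seek(0)
        new_size = f.truncate(0)               # returns the new file size
        after_truncate = f.read()

print(written, start, new_size)
print(repr(first_line), remaining, repr(after_truncate))
```

Because the `with` block closes the file on exit, there is no explicit `f.close()` call.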


### **<span style="color: #28B463;">What is Regex?</span>**

* **<span style="color: #E74C3C;">Definition:</span>** Regex (Regular Expression) is a sequence of characters that defines a search pattern.
* **<span style="color: #2E86C1;">Key Point:</span>** Useful for pattern matching, text searching, and text manipulation.

### **<span style="color: #28B463;">Common Use Cases</span>**

1. **Validation** – Emails, phone numbers, dates
2. **Searching** – Find specific words or patterns in text
3. **Data Cleaning** – Remove unwanted characters or whitespace
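
The three use cases above can be sketched with Python's `re` module as follows. The e-mail pattern is a deliberately simplified assumption (it is not a complete e-mail validator), and the helper name `looks_like_email` and the sample strings are made up for illustration.

```python
import re

# 1. Validation: a simplified e-mail shape check (assumption: "name@domain.tld" form).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def looks_like_email(candidate):
    """Return True if the whole string has a basic e-mail shape."""
    return EMAIL_PATTERN.fullmatch(candidate) is not None

valid = looks_like_email("john.doe@example.com")
invalid = looks_like_email("fake@example")  # no dot after the domain name

# 2. Searching: find every order ID of the form ORD-YYYY-NNNN.
sample = "Order IDs: ORD-2025-0001, ORD-2024-9876."
order_ids = re.findall(r"ORD-\d{4}-\d{4}", sample)

# 3. Data cleaning: collapse runs of whitespace into a single space.
messy = "  too    many \t spaces  "
cleaned = re.sub(r"\s+", " ", messy).strip()

print(valid, invalid)
print(order_ids)
print(repr(cleaned))
```

`fullmatch` is used for validation because the entire string must fit the pattern, whereas `findall` and `sub` scan for matches anywhere in the text.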

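```python
import re

# \b marks a word boundary and \w+ matches a run of word characters,
# so this returns every "word" (letters, digits, underscores) in the text.
words = re.findall(r'\b\w+\b', text)
print("Printing all the words in the text file:")
print(words)
```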

Printing all the words in the text file:
['Hello', 'Ankush', 'Please', 'contact', 'John', 'Doe', 'at', 'john', 'doe', 'example', 'com', 'or', 'jane_doe99', 'mail', 'co', 'uk', 'Our', 'website', 'https', 'www', 'example', 'com', 'path', 'query', '123', 'and', 'backup', 'http', 'sub', 'example', 'org', 'Order', 'IDs', 'ORD', '2025', '0001', 'ORD', '2024', '9876', 'Invoice', 'total', '1', '299', '99', 'USD', 'Discount', '15', 'Weight', '2', '5kg', 'Height', '180cm', 'Meeting', 'scheduled', 'on', '13', '08', '2025', 'at', '10', '45', 'AM', 'Backup', 'date', '2025', '08', '20', 'or', '20', 'Aug', '2025', 'Phone', 'numbers', '91', '9876543210', '022', '2345', '6789', '555', '1234', 'Random', 'strings', 'abc123xyz', 'XY_42_test', 'A1B2C3', 'Note', 'Some', 'emails', 'may', 'be', 'fake', 'example', 'some', 'links', 'without', 'protocol', 'www', 'fake', 'site', 'net']


## **<span style="color: #D35400; font-weight: bold;">Basic Regex Patterns</span>**

### **<span style="color: #28B463;">🔹 Fundamental Character Matching</span>**

| **<span style="color: #E74C3C;">Regex Pattern</span>** | **<span style="color: #E74C3C;">What It Does</span>** | **<span style="color: #E74C3C;">Use Case/Example</span>** |
|---|---|---|
| `.` | Matches any single character except newline | Find any character: `a.c` matches "abc", "axc", "a1c" |
| `^` | Matches the start of the string (or of each line with `re.MULTILINE`) | Ensure string starts with specific pattern: `^Hello` |
| `$` | Matches the end of the string (or of each line with `re.MULTILINE`) | Ensure string ends with pattern: `world$` |
| `*` | Matches 0 or more of the preceding element (character, class, or group) | Find repeated characters: `a*` matches "", "a", "aaa" |
| `+` | Matches 1 or more of the preceding element | Ensure at least one: `a+` matches "a", "aaa" but not "" |
| `?` | Matches 0 or 1 of the preceding element | Optional character: `colou?r` matches "color" or "colour" |
| `\d` | Matches any decimal digit (0-9; in Python `str` patterns also other Unicode digits unless `re.ASCII` is set) | Find numbers in text: `\d+` finds "123" in "abc123def" |
| `\w` | Matches word characters (letters, digits, underscore; Unicode-aware in Python `str` patterns) | Find words: `\w+` matches "hello", "test_123" |
| `\s` | Matches whitespace characters (space, tab, newline) | Find spaces: `\s+` finds multiple spaces or tabs |
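
A quick way to check the rows above is to run each pattern against a small string. The sample strings here are invented for illustration.

```python
import re

# '.' matches any single character except a newline.
dot_matches = re.findall(r"a.c", "abc axc a1c ac")

# '^' and '$' anchor to the start and end of the string.
starts_with_hello = re.search(r"^Hello", "Hello world") is not None
ends_with_world = re.search(r"world$", "Hello world") is not None

# With re.MULTILINE, '^' also matches at the start of each line.
line_starts = re.findall(r"^\w+", "one two\nthree four", flags=re.MULTILINE)

# '*' allows zero repetitions, '+' requires at least one.
star_matches_empty = re.fullmatch(r"a*", "") is not None
plus_matches_empty = re.fullmatch(r"a+", "") is not None

# '?' makes the preceding element optional.
optional_u = re.findall(r"colou?r", "color colour colouur")

# '\d', '\w' and '\s' are shorthand character classes.
digits = re.findall(r"\d+", "abc123def")
words = re.findall(r"\w+", "hello test_123")
pieces = re.split(r"\s+", "a  b\tc\nd")

print(dot_matches, line_starts, optional_u)
print(digits, words, pieces)
```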

### **<span style="color: #28B463;">🔹 Character Classes</span>**

| **<span style="color: #E74C3C;">Regex Pattern</span>** | **<span style="color: #E74C3C;">What It Does</span>** | **<span style="color: #E74C3C;">Use Case/Example</span>** |
|---|---|---|
| `[abc]` | Matches any character in brackets | Find specific letters: `[aeiou]` matches vowels |
| `[^abc]` | Matches any character NOT in brackets | Exclude characters: `[^aeiou]` matches anything that is not a lowercase vowel (consonants, but also digits, spaces, and punctuation) |
| `[a-z]` | Matches any lowercase letter | Find lowercase: `[a-z]+` matches "hello" |
| `[A-Z]` | Matches any uppercase letter | Find uppercase: `[A-Z]+` matches "HELLO" |
| `[0-9]` | Matches any ASCII digit | Find numbers: `[0-9]+` behaves like `\d+` for ASCII digits |
| `[a-zA-Z]` | Matches any letter (upper or lower) | Find letters only: `[a-zA-Z]+` matches "Hello" |
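
The character classes above can be compared side by side on one sample string. The string `"Hello World 2025"` and the consonant-only class at the end are assumptions added for illustration.

```python
import re

sample = "Hello World 2025"

vowels = re.findall(r"[aeiou]", sample)
# A negated class matches everything outside the set, not only consonants.
non_vowels = re.findall(r"[^aeiou]", sample)

lowercase_runs = re.findall(r"[a-z]+", sample)
uppercase_runs = re.findall(r"[A-Z]+", sample)
digit_runs = re.findall(r"[0-9]+", sample)
letter_runs = re.findall(r"[a-zA-Z]+", sample)

# To match consonants only, list the consonant ranges explicitly.
consonants = re.findall(r"[b-df-hj-np-tv-z]", sample, flags=re.IGNORECASE)

print(vowels)
print(lowercase_runs, uppercase_runs, digit_runs, letter_runs)
print(consonants)
```

Printing `non_vowels` shows that spaces and digits are included, which is why an explicit consonant class is the safer choice when only consonants are wanted.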


**<span style="color:red">Note:</span>** AI tools such as ChatGPT or Claude can now generate a regex pattern for you, but it is still important to know what regex is and where you can actually use it.