---

**Q5: Search for a Word in File**  
*Medium | Level 3*  
Write a function `find_word(file_path, word)` that returns `True` if the word exists in the file, else `False`.  
- Input: `'data.txt', 'apple'`  
- Output: `True` or `False`

---

In [1]:
def find_word(file_path, word):
    if not isinstance(word, str):
        raise TypeError('Expected word to be a string')
    if not isinstance(file_path, str):
        raise TypeError('Expected file_path to be a string')

    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        # Plain substring test: 'apple' also matches inside 'pineapple'
        return word in content
    except FileNotFoundError as e:
        raise FileNotFoundError(f'File not found: {file_path}') from e
    except PermissionError as e:
        raise PermissionError(f'No read permission: {file_path}') from e
    except OSError as e:
        raise OSError(f'Failed to read file: {e}') from e

In [2]:
word = 'Batman'
file_path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/write_test.txt'

response = find_word(file_path,word)
print(response)

True


### Improved

In [14]:
import re

def find_word(file_path, word, case_sensitive=False):
    if not isinstance(word, str):
        raise TypeError('Expected word to be a string')
    if not isinstance(file_path, str):
        raise TypeError('Expected file_path to be a string')

    # \b = word boundary; re.escape() makes regex metacharacters in word literal.
    # Compile once instead of rebuilding the pattern for every line.
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', flags)

    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            # Stream line by line so large files are never fully loaded
            for line in file:
                if pattern.search(line):
                    return True
        return False
    except FileNotFoundError as e:
        raise FileNotFoundError(f"File not found: {file_path}") from e
    except PermissionError as e:
        raise PermissionError(f"No read permission: {file_path}") from e
    except OSError as e:
        raise OSError(f"Failed to read file: {e}") from e
        

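- `\b` marks a word boundary, so the pattern matches `apple` as a whole word but not inside `pineapple`.

- `re.escape(word)` escapes regex metacharacters (e.g., `*`, `+`, `.`) so they are matched literally.

- Caveat: `\b` needs a word character on one side, so terms that begin or end with punctuation (e.g., `C++`) will not match.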
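A quick check of why the improved version uses `\b` instead of `in`, plus the punctuation caveat. The lookaround pattern at the end is one possible workaround, offered as a sketch; the `find_word` function above does not use it.

```python
import re

line = "I like pineapple and C++."

# Plain substring test: "apple" is found inside "pineapple"
print("apple" in line)                                             # True

# Word-boundary test: "apple" is not a standalone word in this line
print(bool(re.search(r"\b" + re.escape("apple") + r"\b", line)))  # False

# \b needs a word character on one side; "++" followed by "." has none,
# so the trailing \b can never match
print(bool(re.search(r"\b" + re.escape("C++") + r"\b", line)))    # False

# Workaround: only forbid word characters directly before/after the term
pattern = r"(?<!\w)" + re.escape("C++") + r"(?!\w)"
print(bool(re.search(pattern, line)))                              # True
```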

In [15]:
word = 'Batman'
file_path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/write_test.txt'

response = find_word(file_path,word)
print(response)

True


---

**Q6: Remove Blank Lines from a File**  
*Medium | Level 3*  
Write a function `remove_blank_lines(file_path)` that overwrites the same file, removing all blank lines.  
- Input: `'data.txt'`  
- Output: File without empty lines.

---

In [16]:
def remove_blank_lines(file_path):
    if not isinstance(file_path, str):
        raise TypeError('Expected file_path to be a string')

    try:
        # Read everything first: opening the same file with 'w' truncates it
        with open(file_path, 'r', encoding='utf-8') as file:
            # line.strip() is '' (falsy) for lines containing only whitespace
            lines = [line for line in file if line.strip()]

        with open(file_path, 'w', encoding='utf-8') as file:
            file.writelines(lines)

    except FileNotFoundError as e:
        raise FileNotFoundError(f"File not found: {file_path}") from e
    except PermissionError as e:
        raise PermissionError(f"Permission denied: {file_path}") from e
    except OSError as e:
        raise OSError(f"Failed to rewrite file: {e}") from e

In [18]:
file_path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/blank.txt'

remove_blank_lines(file_path)

# What `strip()` Does in Python

The `strip()` method returns a copy of a string with **leading and trailing** whitespace characters removed.

## Basic Functionality
```python
text = "   Hello World   \n"
stripped = text.strip()
# Result: "Hello World" (no spaces or newline at start/end)
```

## What It Removes
By default, `strip()` removes:
- Spaces (`' '`)
- Tabs (`\t`)
- Newlines (`\n`)
- Carriage returns (`\r`)
- Any combination of these

## Variations
1. **`strip()`** - Removes both leading AND trailing whitespace
2. **`lstrip()`** - Removes only leading (left-side) whitespace
3. **`rstrip()`** - Removes only trailing (right-side) whitespace
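For file lines the one-sided variants matter: `strip()` also removes indentation, while `rstrip('\n')` removes only the line ending. A small comparison on a made-up line:

```python
line = "    total = 1  \n"

print(repr(line.strip()))       # 'total = 1'        (indentation lost too)
print(repr(line.lstrip()))      # 'total = 1  \n'
print(repr(line.rstrip()))      # '    total = 1'    (trailing spaces and \n gone)
print(repr(line.rstrip("\n")))  # '    total = 1  '  (only the newline removed)
```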

## Common Use Cases

### 1. Cleaning User Input
```python
user_input = "  user@example.com  "
clean_input = user_input.strip()
```

### 2. Processing File Lines
```python
with open('file.txt') as f:
    for line in f:
        processed = line.strip()  # Remove newlines and spaces
```

### 3. Custom Character Removal
You can pass the characters to remove. The argument is treated as a **set of characters**, not as a substring:
```python
text = "###Hello###"
print(text.strip('#'))  # "Hello"

text = "abcHelloabc"
print(text.strip('abc'))  # "Hello"
```
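Because the argument is a set, `strip()` keeps removing characters from each end until it meets one outside the set, which can eat more than intended. For removing an exact prefix or suffix once, `str.removeprefix()` and `str.removesuffix()` (Python 3.9+) are the safer tools:

```python
# strip("abc") removes any mix of 'a', 'b', 'c' from both ends
print("cabbage".strip("abc"))                 # ge
print("cab_file_bac".strip("abc"))            # _file_

# Trying to strip a file extension this way also eats the 't' of "report"
print("report.txt.txt".strip(".txt"))         # repor

# Exact, one-time prefix/suffix removal (Python 3.9+)
print("report.txt.txt".removesuffix(".txt"))  # report.txt
print("abcHelloabc".removeprefix("abc"))      # Helloabc
```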

## Important Notes
- **Doesn't affect internal spaces**: `"Hello   World".strip()` → `"Hello   World"`
- **Returns a new string**: Original string remains unchanged
- **Empty strings**: `"".strip()` → `""` (returns empty string)
- **None values**: `None.strip()` → AttributeError

## Example in File Processing
When checking for blank lines:
```python
line = "\n\t  \r\n"
if line.strip():  # line.strip() is "" (falsy), so the line counts as blank
    print("Not empty")
else:
    print("Empty")  # This will execute
```

## Performance
- Linear time: it scans inward from each end, then copies the remaining characters into a new string.
- Memory: strings are immutable, so the result is a separate object (roughly one extra copy of the text).


### Improved Code

In [22]:
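import os

def remove_blank_lines(file_path):
    if not isinstance(file_path, str):
        raise TypeError('Expected file_path to be a string')

    # Temp file next to the original, so os.replace() stays on one filesystem
    temp_path = file_path + '.tmp'
    try:
        with open(file_path, 'r', encoding='utf-8') as infile, \
             open(temp_path, 'w', encoding='utf-8') as outfile:
            # Stream line by line; copy only lines that are not blank
            for line in infile:
                if line.strip():
                    outfile.write(line)

        # Swap the cleaned copy into place in one step
        os.replace(temp_path, file_path)

    except FileNotFoundError as e:
        raise FileNotFoundError(f"File not found: {file_path}") from e
    except PermissionError as e:
        raise PermissionError(f"Permission denied: {file_path}") from e
    except OSError as e:
        raise OSError(f"Failed to rewrite file: {e}") from e
    finally:
        # After a successful replace the temp file is gone; if it still
        # exists, something failed, so remove the partial copy
        if os.path.exists(temp_path):
            os.remove(temp_path)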
        

In [24]:
file_path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/blank.txt'

remove_blank_lines(file_path)

### **What is Atomic File Replacement?**

**Atomic replacement** means the update either completes entirely or does not happen at all: anything opening `file_path` sees the complete old file or the complete new one, never a half-written mix. In the code above, `os.replace(temp_path, file_path)` performs this critical step.

---

### **Why It Matters**
1. **Prevents Data Corruption**  
   - If the program crashes while writing, only the temporary file is incomplete; `file_path` still holds the original.  
   - The original file is either fully replaced or left untouched.

2. **Filesystem Guarantees**  
   - On POSIX systems (Linux/macOS), `os.replace()` uses the `rename()` system call, which is atomic when both paths are on the same filesystem.  
   - On Windows, it calls `MoveFileEx` with the replace-existing flag; Windows does not document this as an atomic operation.

---

### **How It Works in the Code Above**
```python
import os

file_path = 'data.txt'  # example target file
temp_path = file_path + '.tmp'

# 1. Write cleaned content to the temp file
with open(temp_path, 'w', encoding='utf-8') as outfile:
    outfile.write("Cleaned content\n")

# 2. Atomically replace the original with the temp file
os.replace(temp_path, file_path)  # ← atomic rename on POSIX (same filesystem)
```

1. **Step 1**: All changes are written to a temporary file (`.tmp`).  
2. **Step 2**: The filesystem renames the temp file over the original in a single operation, so `file_path` switches from old to new content in one step.  

---

### **Key Properties of Atomicity**
| Property | Explanation |
|----------|-------------|
| **All-or-Nothing** | Either the replacement succeeds fully, or the original file remains intact. |
| **No Intermediate State** | A reader opening `file_path` sees either the old or the new content, never a partially written file. |
| **Not a Lock** | Concurrent readers are safe, but two processes doing read-modify-write at the same time can still lose an update: the last `os.replace()` wins. |

---

### **What Happens Without Atomic Replacement?**
❌ **Non-atomic (risky) approach**:
```python
# BAD: Non-atomic write (dangerous!)
with open(file_path, 'w') as f:  # Immediately truncates original!
    f.write("New content")       # If crash here, data is lost.
```
- If the program crashes during `write()`, the original file is already truncated and unrecoverable.

---

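### **Edge Cases**
1. **Crash or Power Failure Before the Replace**  
   - Only the temp file is affected; `file_path` still refers to the original → no corruption.  
   - Durability is a separate concern: unless the temp file is flushed and `fsync`ed before the replace, some filesystems can expose an empty or truncated file after a power cut.  

2. **Permission Errors**  
   - If `os.replace()` raises (e.g., `PermissionError`), the original file is preserved. On POSIX the check is write permission on the *directory*, not the target file's mode; on Windows a read-only target or one held open by another process can block the replace.  

3. **Disk Full**  
   - The temp file write fails before the replace → original untouched (the partial temp file still has to be cleaned up).  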
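The power-failure case depends on the data reaching the disk before the rename, which `os.replace()` alone does not ensure. A common hardening is to `fsync` the temp file before replacing and, on POSIX, the directory afterwards. The sketch below assumes a POSIX-like filesystem for the directory sync; `atomic_write_text` is a hypothetical helper name, not a standard-library function.

```python
import os
import tempfile


def atomic_write_text(path, text, encoding="utf-8"):
    """Hypothetical helper: write text to path via temp file + os.replace().

    Sketch only; assumes the temp file can be created in the target's directory.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory -> same filesystem, which os.replace() needs
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", text=True)
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to disk before the rename
        os.replace(temp_path, path)
    except BaseException:
        # Failure (disk full, permission error, Ctrl+C): original is untouched,
        # so just remove the partial temp file and re-raise
        try:
            os.remove(temp_path)
        except FileNotFoundError:
            pass
        raise

    # POSIX only: fsync the directory so the rename itself is persisted.
    # Directories cannot be opened this way on Windows, so skip it there.
    if os.name == "posix":
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

One trade-off of `tempfile.mkstemp()`: the file is created readable and writable only by its owner, so the replaced file does not inherit the original's permissions. If that matters, copying them with `shutil.copymode()` before the replace is one option.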

---

### **Cross-Platform Notes**
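| OS | Behavior |
|----|----------|
| **Linux/macOS** | Atomic `rename()` system call, provided both paths are on the same filesystem. |
| **Windows** | `MoveFileEx` with replace-existing; not documented as atomic, and it may fail with `PermissionError` if another process has the target open. |
| **Any** | May fail with `OSError` if source and destination are on different filesystems or drives, which is why the temp file is created next to the original. |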

---

### **When to Use Atomic Replacement**
- **File Updates**: Whenever rewriting a file that must never be seen half-written (configs, state or cache files, generated reports).  
- **Concurrent Access**: When other processes might read the file during writes.  
- **Data Integrity**: For mission-critical data where corruption is unacceptable.

---

**Q7: Reverse File Content Line by Line**  
*Medium | Level 4*  
Write a function `reverse_lines(file_path)` that reverses the order of lines in the file and writes them back.  
- Input: `'data.txt'`  
- Output: File lines reversed.

---


In [26]:
import os

def reverse_lines(file_path):
    if not isinstance(file_path, str):
        raise TypeError('Expected file_path to be a string')

    temp_path = file_path + '.tmp'
    try:
        with open(file_path, 'r', encoding='utf-8') as infile, \
             open(temp_path, 'w', encoding='utf-8') as outfile:
            # readlines() keeps each '\n'; reversing needs the whole file in memory
            content_list = infile.readlines()
            # Give the last line a newline so it doesn't merge with another line after reversing
            if content_list and not content_list[-1].endswith('\n'):
                content_list[-1] += '\n'
            print(content_list)  # debug: lines as read, before reversing
            outfile.writelines(reversed(content_list))

        os.replace(temp_path, file_path)

    except FileNotFoundError as e:
        raise FileNotFoundError(f"File not found: {file_path}") from e
    except PermissionError as e:
        raise PermissionError(f"Permission denied: {file_path}") from e
    except OSError as e:
        raise OSError(f"Failed to rewrite file: {e}") from e
    finally:
        # Remove the partial temp file if os.replace() never ran
        if os.path.exists(temp_path):
            os.remove(temp_path)
        

In [27]:
file_path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/reverse_test.txt'

reverse_lines(file_path)

['I am Mahbub.\n', 'I am a DE.\n', 'I work at IQVIA.\n']


Comparison of `read()`, `readlines()`, `write()`, and `writelines()` in Python file handling:

| **Method** | **Returns** | **Behavior (Reading)** | **Behavior (Writing)** | **Use Case** |
|------------|-------------|------------------------|------------------------|--------------|
| **`file.read()`** | Entire content as a **single string** | Reads the whole file into one string. | — | Processing the entire file at once (e.g., parsing JSON/XML). |
| **`file.readlines()`** | A **list of strings** (one item per line) | Reads all lines, **keeping `\n`** at the end of each line. | — | Processing lines individually (e.g., reversing lines). |
| **`file.write(str)`** | Number of characters written (text mode) | — | Writes a **single string** to the file. **Does not add `\n`** unless included in the string. | Writing raw text (e.g., `file.write("Hello\n")`). |
| **`file.writelines(iterable)`** | `None` | — | Writes each string from an **iterable of strings** in order. **Does not add `\n`** automatically. | Writing pre-formatted lines (e.g., after reversing with `readlines()`). |

---

### **Key Differences:**
#### **Reading (`read()` vs `readlines()`)**:
| Feature          | `read()` | `readlines()` |
|-----------------|----------|---------------|
| **Return Type**  | Single string | List of strings |
| **Memory Use**   | Reads everything at once (heavy for large files) | Reads all lines into a list (still heavy for huge files) |
| **Line Breaks**  | Preserves `\n` in the string | Each list item ends with `\n` (if present in file) |
| **Best For**     | Processing the whole file (e.g., regex search) | Line-by-line operations (e.g., reversing order) |

#### **Writing (`write()` vs `writelines()`)**:
| Feature          | `write()` | `writelines()` |
|-----------------|-----------|-----------------|
| **Input**       | Single string | Iterable of strings (list, generator, ...) |
| **Line Breaks**  | You must add `\n` manually | No auto `\n` (each string must already end with it) |
| **Performance**  | One call per string; many small calls add Python-level loop overhead | One call for many strings, avoiding an explicit Python loop |
| **Best For**     | Writing raw text | Writing pre-processed lines (e.g., from `readlines()`) |

---

### **Example Code:**
#### **Reading:**
```python
with open("file.txt", "r", encoding="utf-8") as f:
    content = f.read()     # "Line 1\nLine 2\n"
    f.seek(0)              # read() left the position at end of file; rewind first
    lines = f.readlines()  # ["Line 1\n", "Line 2\n"]
```

#### **Writing:**
```python
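with open("file.txt", "w", encoding="utf-8") as f:
    f.write("Line 1\n")                     # write(): include \n yourself
    f.writelines(["Line 2\n", "Line 3\n"])  # writelines(): no auto \n either
# file.txt now contains "Line 1\nLine 2\nLine 3\n"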
```
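Two details worth verifying in practice: `writelines()` accepts any iterable of strings (a generator works, and no separators are added), and `write()` returns the number of characters written. The sketch uses an in-memory `io.StringIO` buffer only so that no file is needed on disk; it behaves like a text file opened with `"w"` for these calls.

```python
import io

buffer = io.StringIO()  # in-memory stand-in for a text file opened with "w"

# Any iterable of strings works; each string must carry its own "\n"
buffer.writelines(f"row {n}\n" for n in range(3))

# write() returns how many characters it wrote
count = buffer.write("done")

# Forgetting the newlines simply concatenates the strings
joined = io.StringIO()
joined.writelines(["a", "b"])

print(repr(buffer.getvalue()))  # 'row 0\nrow 1\nrow 2\ndone'
print(count)                    # 4
print(repr(joined.getvalue()))  # 'ab'
```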

---

### **When to Use Which?**
- **`read()` + `write()`**: Small files or non-line-based data (e.g., JSON).  
- **`readlines()` + `writelines()`**: Line-based operations (e.g., reversing, filtering).  
- For large files, consider **line-by-line iteration** (`for line in file:`).  

---

**Q8: Extract Specific Columns from CSV**  
*Medium | Level 4*  
Write a function `extract_columns(file_path, columns)` that reads a CSV file and returns only the specified columns as a list of dictionaries.  
- Input: `'data.csv', ['name', 'age']`  
- Output: `[{'name': 'Alice', 'age': '30'}, ...]` (CSV values are read as strings)

---

In [36]:
import csv

def extract_columns(file_path, columns):
    if not isinstance(file_path, str):
        raise TypeError('Expected file_path to be a string')
    if not isinstance(columns, list) or not all(isinstance(c, str) for c in columns):
        raise TypeError('Expected columns to be a list of strings')

    result = []
    try:
        # newline='' lets the csv module handle line endings inside quoted fields
        with open(file_path, 'r', encoding='utf-8', newline='') as csvfile:
            reader = csv.DictReader(csvfile)

            # fieldnames is None when the file is empty (no header row)
            if reader.fieldnames is None:
                raise ValueError(f'CSV file has no header row: {file_path}')

            missing_columns = [col for col in columns if col not in reader.fieldnames]
            if missing_columns:
                raise ValueError(f'Columns not found in CSV: {missing_columns}')

            # Keep only the requested columns, in the requested order
            for row in reader:
                result.append({col: row[col] for col in columns})
        return result

    except FileNotFoundError as e:
        raise FileNotFoundError(f'File not found: {file_path}') from e
    except csv.Error as e:
        raise ValueError(f'Error reading CSV file: {e}') from e

In [37]:
file_path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/extract.csv'
columns = ['name','city']
result = extract_columns(file_path,columns)
print(result)

[{'name': 'Alice', 'city': 'New York'}, {'name': 'Bob', 'city': 'San Francisco'}, {'name': 'Charlie', 'city': 'Chicago'}, {'name': 'Diana', 'city': 'Boston'}, {'name': 'Eve', 'city': 'Seattle'}]
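`csv.DictReader` returns every value as a string, so `extract_columns` yields `'age': '30'` rather than `30`. If numeric types are needed, one option is to convert after extraction. `convert_types` below is a hypothetical helper, and the sample rows are made up rather than read from `extract.csv`.

```python
def convert_types(rows, converters):
    """Hypothetical helper: apply per-column converters (e.g. int, float).

    Columns without a converter are left as strings. Note that int('') or
    int(None) would raise, so empty/missing cells need handling in real data.
    """
    return [
        {col: (converters[col](value) if col in converters else value)
         for col, value in row.items()}
        for row in rows
    ]


rows = [{"name": "Alice", "age": "30"}, {"name": "Bob", "age": "25"}]
print(convert_types(rows, {"age": int}))
# [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```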



---

### **📌 What is `csv.DictReader`?**
A built-in Python class (from the `csv` module) that reads CSV files and **converts each row into a dictionary** (a plain `dict` since Python 3.8; an `OrderedDict` in 3.6 and 3.7), where:
- **Keys** are column headers (from the first row, unless `fieldnames` is passed).
- **Values** are the corresponding row values.

---

### **🎯 Key Features**
| Feature | Description |
|---------|-------------|
| **Automatic Header Handling** | Uses the first CSV row as fieldnames (dictionary keys). |
| **Order Preservation** | Maintains column order (dicts preserve insertion order, guaranteed since Python 3.7). |
| **Memory Efficiency** | Streams data line-by-line (good for large files). |
| **Flexible Dialects** | Handles commas, tabs, custom delimiters, and quoted fields. |

---

### **📝 Basic Usage**
```python
import csv

with open('data.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)  # Each 'row' is a dictionary: {'Column1': 'Value1', ...}
```

---

### **🔧 Customization Options**
1. **Custom Fieldnames**  
   Supply headers when the CSV has none (the first row is then treated as data):
   ```python
   csv.DictReader(file, fieldnames=['name', 'age'])
   ```

2. **Different Delimiters**  
   For TSV (tab-separated) files:
   ```python
   csv.DictReader(file, delimiter='\t')
   ```

3. **Trim Spaces After Delimiters**  
   Ignore whitespace that immediately follows each delimiter (this option does **not** skip rows):
   ```python
   csv.DictReader(file, skipinitialspace=True)
   ```
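Since `skipinitialspace` does not skip rows, a file with comment lines above its header needs those lines consumed before `DictReader` reads the header. `csv.DictReader` accepts any iterable of strings, so either approach below works; the sample text is made up and `io.StringIO` stands in for an open file.

```python
import csv
import io

# Made-up CSV text with two comment lines above the real header
sample = "# comment line 1\n# comment line 2\nname,age\nAlice,30\nBob,25\n"

# Option 1: skip a known number of lines before creating the reader
f = io.StringIO(sample)
for _ in range(2):
    next(f)
rows_skipped = list(csv.DictReader(f))

# Option 2: filter out lines starting with "#" (fine for simple files, but it
# would also drop a quoted multi-line field whose continuation starts with "#")
f = io.StringIO(sample)
rows_filtered = list(csv.DictReader(line for line in f if not line.startswith("#")))

print(rows_skipped)                   # [{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
print(rows_filtered == rows_skipped)  # True
```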

---

### **⚠️ Common Pitfalls**
1. **Missing Headers**  
   - If the CSV has no header row, pass `fieldnames` explicitly.
   - Otherwise, the first data row becomes headers!

2. **Whitespace**  
   `skipinitialspace=True` drops spaces right after each delimiter; trailing spaces are kept, so strip values yourself if needed.

3. **Case Sensitivity**  
   Keys are case-sensitive (`{'Name': ...}` ≠ `{'name': ...}`).
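The whitespace and case pitfalls are easy to see on a tiny made-up sample. The final dict comprehension is one possible cleanup, not something `DictReader` offers itself; it assumes every value is a string (rows padded with `None` for missing fields would need a check).

```python
import csv
import io

# Made-up CSV: a space after the comma, trailing spaces, mixed-case headers
sample = "Name, Age \nAlice, 30 \n"

plain = next(csv.DictReader(io.StringIO(sample)))
print(plain)    # {'Name': 'Alice', ' Age ': ' 30 '}

trimmed = next(csv.DictReader(io.StringIO(sample), skipinitialspace=True))
print(trimmed)  # {'Name': 'Alice', 'Age ': '30 '}  (trailing spaces remain)

# Full cleanup: strip keys and values, lower-case keys for case-insensitive access
clean = {k.strip().lower(): v.strip() for k, v in trimmed.items()}
print(clean)    # {'name': 'Alice', 'age': '30'}
```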

---

### **🆚 vs. `csv.reader`**
| `csv.DictReader` | `csv.reader` |
|------------------|-------------|
| Returns dictionaries | Returns lists |
| Self-documenting (keys=headers) | Requires manual column indexing |
| Slower due to dict overhead | Faster for raw data access |
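If you pick `csv.reader` for speed but still want readable column access, you can read the header once and map names to positions yourself. A minimal sketch using the same three-column layout as the `employees.csv` example in this section, held in an `io.StringIO` instead of a file:

```python
import csv
import io

sample = "id,name,department\n1,Alice,Engineering\n2,Bob,Marketing\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader)                             # ['id', 'name', 'department']
idx = {name: i for i, name in enumerate(header)}  # column name -> list position

# Each row is a plain list; look up positions through idx
messages = [f"{row[idx['name']]} works in {row[idx['department']]}" for row in reader]
print(messages)  # ['Alice works in Engineering', 'Bob works in Marketing']
```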

---

### **💡 When to Use?**
- **Best for**: CSV files with headers needing named access (e.g., `row['email']`).
- **Avoid for**: Headerless CSV (unless you supply `fieldnames`) or when positional indices are all you need.

---

### **🌰 Real-World Example**
Given `employees.csv`:
```csv
id,name,department
1,Alice,Engineering
2,Bob,Marketing
```

**Code**:
```python
import csv

with open('employees.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(f"{row['name']} works in {row['department']}")
```

**Output**:
```
Alice works in Engineering
Bob works in Marketing
```

---

### **🚀 Performance Note**
For **large CSV files**, `DictReader` is memory-friendly but slower than `csv.reader` due to dictionary creation. Use `csv.reader` if you need raw speed and can work with column indices.