
---

# 🐍 **Python File Handling — Data Engineering Problem Set**

---

### 💡 **Level 1 — Basics**
---

**Q1: Read File Contents**  
*Easy | Level 1*  
Write a function `read_file(file_path)` that opens a text file and returns its contents as a single string.  
- Input: `file_path = 'data.txt'`  
- Output: `'All file contents as a single string'`

---

**Q2: Count Lines in a File**  
*Easy | Level 1*  
Write a function `count_lines(file_path)` to count and return the number of lines in the given file.  
- Input: `file_path = 'data.txt'`  
- Output: Number of lines as an integer.

---

**Q3: Write a List of Strings to File**  
*Easy | Level 2*  
Write a function `write_lines(file_path, lines)` that writes a list of strings to a file, each string on a new line.  
- Input: `'data.txt', ["apple", "banana", "cherry"]`  
- Output: Creates `data.txt` with these three lines.

---

**Q4: Append Data to an Existing File**  
*Easy | Level 2*  
Write a function `append_line(file_path, line)` that appends a single line to an existing text file.  
- Input: `'data.txt', 'New Data\n'`  
- Output: Appends this line at the end of `data.txt`.

---

### 💡 **Level 2 — Intermediate Operations**
---

**Q5: Search for a Word in File**  
*Medium | Level 3*  
Write a function `find_word(file_path, word)` that returns `True` if the word exists in the file, else `False`.  
- Input: `'data.txt', 'apple'`  
- Output: `True` or `False`

---

**Q6: Remove Blank Lines from a File**  
*Medium | Level 3*  
Write a function `remove_blank_lines(file_path)` that overwrites the same file, removing all blank lines.  
- Input: `'data.txt'`  
- Output: File without empty lines.

---

**Q7: Reverse File Content Line by Line**  
*Medium | Level 4*  
Write a function `reverse_lines(file_path)` that reverses the order of lines in the file and writes them back.  
- Input: `'data.txt'`  
- Output: File lines reversed.

---

**Q8: Extract Specific Columns from CSV**  
*Medium | Level 4*  
Write a function `extract_columns(file_path, columns)` that reads a CSV file and returns only the specified columns as a list of dictionaries.  
- Input: `'data.csv', ['name', 'age']`  
- Output: `[{'name': 'Alice', 'age': 30}, ...]`
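
One possible approach to Q8, sketched with the standard-library `csv` module. It assumes the file has a header row and that every requested column exists. Note that `csv` returns every value as a string, so `age` comes back as `'30'` unless you convert it yourself.

```python
import csv


def extract_columns(file_path, columns):
    """Return only the requested columns from a CSV file as a list of dicts."""
    with open(file_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # uses the first row as the header
        # Values stay as strings; the csv module does not infer types.
        return [{col: row[col] for col in columns} for row in reader]
```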

---

### 💡 **Level 3 — Data Engineering Practical Patterns**
---

**Q9: Log File Tail Reader**  
*Medium-Hard | Level 5*  
Write a function `tail(file_path, N)` that returns the last `N` lines from a log file.  
- Input: `'app.log', 10`  
- Output: Last 10 lines as a list.
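
A minimal sketch for Q9, assuming a UTF-8 text log and that trailing newlines should be stripped from the returned lines. A bounded `collections.deque` keeps only the last `N` lines in memory while the file is streamed once from the start.

```python
from collections import deque


def tail(file_path, N):
    """Return the last N lines of a text file as a list of strings."""
    with open(file_path, encoding="utf-8") as f:
        # deque(maxlen=N) discards older lines as newer ones arrive.
        last_lines = deque(f, maxlen=N)
    return [line.rstrip("\n") for line in last_lines]
```

This still reads the whole file once. For very large logs, seeking backwards from the end of the file is a common alternative.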

---

**Q10: Split Large File into Chunks**  
*Medium-Hard | Level 5*  
Write a function `split_file(file_path, lines_per_file)` that splits a large file into multiple smaller files with `lines_per_file` lines each.  
- Input: `'data.txt', 1000`  
- Output: Files like `data_part1.txt`, `data_part2.txt`, etc.
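
A sketch of one way to approach Q10. It streams the source with `itertools.islice`, so only one chunk of lines is held in memory at a time. Returning the list of created paths is an assumption added here for convenience; the question only requires that the part files be written.

```python
import os
from itertools import islice


def split_file(file_path, lines_per_file):
    """Split a text file into numbered parts of at most lines_per_file lines."""
    base, ext = os.path.splitext(file_path)
    part_paths = []
    with open(file_path, encoding="utf-8") as src:
        part = 1
        while True:
            # Pull the next chunk of lines without reading the whole file.
            chunk = list(islice(src, lines_per_file))
            if not chunk:
                break
            part_path = f"{base}_part{part}{ext}"
            with open(part_path, "w", encoding="utf-8") as dst:
                dst.writelines(chunk)
            part_paths.append(part_path)
            part += 1
    return part_paths
```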

---

**Q11: Merge Multiple Files**  
*Medium-Hard | Level 6*  
Write a function `merge_files(file_list, output_file)` that merges multiple text files into one single file.  
- Input: `['file1.txt', 'file2.txt'], 'merged.txt'`  
- Output: Creates `merged.txt` containing the contents of each input file, in order.

---

**Q12: Count Word Frequency in File**  
*Hard | Level 6*  
Write a function `word_frequency(file_path)` that returns a dictionary of each word and its frequency.  
- Input: `'data.txt'`  
- Output: `{'word1': 5, 'word2': 3}`
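
A possible sketch for Q12. The question does not define what counts as a word, so this version assumes words are runs of letters, digits, or underscores (`\w+`) and that counting is case-insensitive. Adjust the pattern if your definition differs.

```python
import re
from collections import Counter


def word_frequency(file_path):
    """Count case-insensitive word occurrences, reading one line at a time."""
    counts = Counter()
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            # Assumption: a "word" is any run of \w characters.
            counts.update(re.findall(r"\w+", line.lower()))
    return dict(counts)
```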

---

**Q13: Replace Specific Word in Large File**  
*Hard | Level 7*  
Write a function `replace_word(file_path, old_word, new_word)` that replaces all instances of `old_word` with `new_word` in the same file. It should handle large files efficiently, without loading the entire file into memory.
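
One way to sketch Q13 is to stream the file line by line into a temporary file in the same directory and then swap it into place with `os.replace`. This version makes two simplifying assumptions: it does plain substring replacement (so `cat` also matches inside `category`), and it cannot match an `old_word` that spans a line break.

```python
import os
import tempfile


def replace_word(file_path, old_word, new_word):
    """Replace old_word with new_word in place, one line at a time."""
    dir_name = os.path.dirname(os.path.abspath(file_path))
    # Write to a temp file on the same filesystem so the final swap is a rename.
    with open(file_path, encoding="utf-8") as src, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=dir_name, delete=False
    ) as tmp:
        for line in src:
            tmp.write(line.replace(old_word, new_word))
    # Both files are closed here; replace the original with the rewritten copy.
    os.replace(tmp.name, file_path)
```

The rewritten file takes on the temporary file's permissions, so production code may want to copy the original mode across and clean up the temporary file if an error occurs.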

---

**Q14: File Metadata Extractor**  
*Hard | Level 7*  
Write a function `file_metadata(file_path)` that returns:  
- file size (bytes)  
- last modified time  
- creation time (where the platform exposes it)
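
A hedged sketch for Q14 based on `os.stat`. The dictionary keys are names chosen here for illustration. Creation time is the tricky part: `st_birthtime` exists only on some platforms, and the fallback `st_ctime` means "last metadata change" on Linux rather than creation.

```python
import os
from datetime import datetime


def file_metadata(file_path):
    """Return size, last-modified time, and a best-effort creation time."""
    st = os.stat(file_path)
    # st_birthtime is not available everywhere; fall back to st_ctime,
    # which on Linux is the last metadata change, not true creation time.
    created = getattr(st, "st_birthtime", st.st_ctime)
    return {
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime),
        "created": datetime.fromtimestamp(created),
    }
```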

---

### 💡 **Level 4 — Performance & Real-World Use Cases**
---

**Q15: Process Large CSV Line by Line**  
*Hard | Level 8*  
Write a generator `csv_reader(file_path)` that yields one row at a time from a CSV file without loading it fully into memory.
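
A minimal sketch for Q15. `csv.reader` is already lazy, so wrapping it in a generator function keeps the file open only while rows are being consumed. This version yields each row as a list of strings, header row included, which is an assumption since the question does not specify the row format.

```python
import csv


def csv_reader(file_path):
    """Yield one CSV row at a time as a list of strings."""
    with open(file_path, newline="", encoding="utf-8") as f:
        # Rows are parsed on demand; the file is never fully loaded.
        yield from csv.reader(f)
```

A caller can then iterate with `for row in csv_reader('data.csv'): ...` and stop early without reading the rest of the file.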

---

**Q16: Detect Duplicate Lines in a File**  
*Hard | Level 8*  
Write a function `detect_duplicates(file_path)` that returns a list of lines that appear more than once.
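
A possible sketch for Q16. It assumes lines are compared exactly after stripping the trailing newline, and that each duplicated line should appear once in the result, in the order its first repeat was seen.

```python
def detect_duplicates(file_path):
    """Return lines that occur more than once, each listed a single time."""
    seen = set()
    duplicates = {}  # dict used as an ordered set
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line in seen:
                duplicates[line] = None
            else:
                seen.add(line)
    return list(duplicates)
```

Memory use grows with the number of distinct lines. For files too large for that, hashing each line or sorting externally are common alternatives.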

---

**Q17: Monitor File Size Growth (Real-Time)**  
*Hard | Level 9*  
Write a function `monitor_file_growth(file_path, interval)` that prints the file size every `interval` seconds until manually stopped.

---

### 💡 **Level 5 — Extreme Data Engineering Scenarios**
---

**Q18: Efficient Binary File Reader**  
*Very Hard | Level 9*  
Write a function `read_binary_chunks(file_path, chunk_size)` that reads a binary file in chunks of `chunk_size` bytes.
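
A short sketch for Q18, written as a generator so the caller decides what to do with each chunk. Whether the function should yield chunks or return them all is not stated in the question; yielding is the assumption here. It needs Python 3.8 or later for the `:=` operator.

```python
def read_binary_chunks(file_path, chunk_size):
    """Yield successive chunks of at most chunk_size bytes from a binary file."""
    with open(file_path, "rb") as f:
        # f.read returns b"" at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            yield chunk
```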

---

**Q19: JSON Lines Processor**  
*Very Hard | Level 10*  
Write a function `process_jsonl(file_path)` that reads a `.jsonl` file (one JSON object per line) and returns a list of dictionaries.
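
A minimal sketch for Q19. It assumes blank lines should be skipped and that a malformed line should raise `json.JSONDecodeError` rather than be silently ignored.

```python
import json


def process_jsonl(file_path):
    """Parse a JSON Lines file into a list of Python objects."""
    records = []
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```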

---

**Q20: Parallel File Processor**  
*Expert | Level 10*  
Write a function `parallel_process(file_list, process_function)` that uses multithreading or multiprocessing to apply `process_function` to each file in `file_list` and return the combined result.
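
A sketch for Q20 using a thread pool, which suits I/O-bound work such as reading files. "Combined result" is interpreted here as a list of per-file results in the same order as `file_list`. The `max_workers` parameter is a hypothetical addition that is not part of the question's signature.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_process(file_list, process_function, max_workers=None):
    """Apply process_function to each file concurrently; return results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises worker exceptions.
        return list(pool.map(process_function, file_list))
```

For CPU-bound processing, `ProcessPoolExecutor` from the same module offers the same `map` interface, provided `process_function` can be pickled.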

---

✅ **These 20 questions cover:**

- Text, CSV, JSON, and binary file handling.
- Reading, writing, appending, and transforming files.
- Efficient I/O for Big Data.
- Real-world log processing.
- Parallel execution.

---
