## **Python Interview Questions for Data Engineers – Part 1**

In [0]:
print("Python Interview Questions for Data Engineers – Part 1")

1. Large File Ingestion – Generators vs Lists

**Interview Question (Context + Question Together)**

You are ingesting a 50 GB application log file in Python and only need to keep the **ERROR** records before loading them into the Bronze layer.
If you read the entire file into a list, the job crashes because it runs out of memory.
How would you solve this in Python, and why?

In [0]:
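def read_logs(path):
    # Iterating over the file object reads one line at a time,
    # so the whole file never sits in memory.
    with open(path, 'r') as f:
        for line in f:
            yield line

# Generator expression: filters lazily as lines are pulled through.
error_logs = (log for log in read_logs("app.log") if "ERROR" in log)

for log in error_logs:
    print(log, end="")  # each line already ends with a newline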
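
The generator approach works because both the file object and the generator are lazy: only the current line is held in memory, whereas a list holds every line at once. The following is a minimal sketch of that contrast, assuming a small temporary file with made-up log lines stands in for the 50 GB log; `iter_error_lines` is a hypothetical helper name, not part of the original code.

```python
import os
import sys
import tempfile


def iter_error_lines(path):
    """Lazily yield lines containing 'ERROR' (hypothetical helper name)."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file object hands back one line at a time
            if "ERROR" in line:
                yield line


# Build a small sample file; it stands in for the large log (made-up content).
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    for i in range(10_000):
        level = "ERROR" if i % 100 == 0 else "INFO"
        tmp.write(f"{level} event {i}\n")
    sample_path = tmp.name

# Eager: every line of the file is in memory at once.
with open(sample_path, "r", encoding="utf-8") as f:
    all_lines = f.readlines()

# Lazy: nothing is read until the generator is consumed.
lazy_errors = iter_error_lines(sample_path)

# The list's footprint grows with the file; the generator object stays small.
list_size = sys.getsizeof(all_lines)
gen_size = sys.getsizeof(lazy_errors)

# Consuming the generator streams the file line by line.
error_count = sum(1 for _ in lazy_errors)

os.remove(sample_path)
```

Note that `sys.getsizeof` only measures the container itself, not the strings it references, so the real gap between the two approaches is larger than these numbers suggest.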

2. Schema Builder Bug – Mutable Default Arguments

**Interview Question**

You are writing a reusable Python function to dynamically build a list of columns for transformation.
This function is called multiple times in the same pipeline run.
However, columns from previous calls keep appearing unexpectedly.
What is wrong in this code and how do you fix it?

In [0]:
# BUG CODE
def add_column(col, cols=[]):
    cols.append(col)
    return cols


In [0]:
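add_column("order_id")     # returns ['order_id'] right after the function is defined
add_column("customer_id")  # returns ['order_id', 'customer_id']: the earlier column leaks in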
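
The default list is created once, when the `def` statement runs, and every call that omits `cols` shares that same list. A common fix is to use `None` as a sentinel and create the list inside the function. The sketch below assumes callers expect a fresh list whenever they omit `cols`; `add_column_fixed` is a hypothetical name chosen so it does not clash with the buggy version.

```python
def add_column_fixed(col, cols=None):
    """Append col to cols, creating a new list when none is passed."""
    if cols is None:
        cols = []  # a new list on every call, so no state is shared
    cols.append(col)
    return cols


first = add_column_fixed("order_id")
second = add_column_fixed("customer_id")  # independent of the first call

# Passing a list explicitly still works; that list is extended in place.
combined = add_column_fixed("customer_id", first)
```

If mutating the caller's list is not wanted either, one option is to copy it first (for example `cols = list(cols)`) before appending.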