# Batch Processing using Python!
This example demonstrates a basic ETL (Extract, Transform, Load) process using batch ingestion in Python. It generates random sample data, processes it in chunks (batches), applies a simple transformation to each record, and loads the results into a Pandas DataFrame for display.

### generate_sample_data(num_records)
Creates a list of mock records, each with a random id and value. This simulates incoming raw data.

In [1]:
import random
import uuid
import pandas as pd

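def generate_sample_data(num_records=100):
    """Return a list of mock records, each with a random UUID and a float in [0, 1)."""
    data = []
    for _ in range(num_records):
        record = {
            "id": str(uuid.uuid4()),
            "value": random.random(),
        }
        data.append(record)
    return data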
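
Because the records are random, every run produces different data. For repeatable runs, a seeded variant might look like the sketch below. The name generate_seeded_data and its seed parameter are hypothetical and not part of the original notebook. It draws both the UUID bits and the value from one random.Random instance, so the ids are reproducible rather than truly unique.

```python
import random
import uuid

def generate_seeded_data(num_records, seed=0):
    """Hypothetical reproducible variant: the same seed yields the same records."""
    rng = random.Random(seed)
    return [
        {
            # Build a version-4 UUID from seeded random bits instead of uuid.uuid4().
            "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "value": rng.random(),
        }
        for _ in range(num_records)
    ]

first = generate_seeded_data(3, seed=42)
second = generate_seeded_data(3, seed=42)
print(first == second)  # True
```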


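### process_data(data, batch_size)
Splits the full dataset into smaller chunks (batches) of at most batch_size records using a generator. After handing out each batch, the generator pauses for two seconds before producing the next one, which stands in for the latency of a real data source.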

In [3]:

import time

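def process_data(data, batch_size):
    # Step through the list in strides of batch_size and yield one slice at a time.
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]
        # Pause before producing the next batch (simulated delay).
        time.sleep(2)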
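
To see how the slicing behaves when the number of records is not a multiple of the batch size, here is a minimal sketch. The helper name chunked is hypothetical, and it omits the sleep so that it runs instantly.

```python
def chunked(data, batch_size):
    """Yield consecutive slices of data (hypothetical helper, no delay)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# 25 records in batches of 10: the last batch holds the 5 leftover records.
batch_sizes = [len(batch) for batch in chunked(list(range(25)), 10)]
print(batch_sizes)  # [10, 10, 5]
```

The final slice is simply shorter, because slicing past the end of a list does not raise an error.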

### transform_data(batch)
Transforms each record in a batch by adding a new field transformed_value (value × 1.1).

In [4]:
def transform_data(batch):
    transformed_batch = []
    for record in batch:
        transformed_record = {
            'id': record['id'],
            'value': record['value'],
            'transformed_value': record['value'] * 1.1
        }
        transformed_batch.append(transformed_record)
    return transformed_batch

### load_data(transformed_records)
Loads the transformed records into a Pandas DataFrame, writes it to transformed_data.csv, and returns the DataFrame. The CSV file stands in for a real database.

In [5]:
def load_data(transformed_records):
    df = pd.DataFrame(transformed_records)
    # Write to CSV as a stand-in for a database table, then return the DataFrame.
    df.to_csv("transformed_data.csv", index=False)
    return df
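
load_data receives all records at once. If each batch should be written as soon as it is transformed, one option is to append to the file batch by batch. The sketch below swaps in the standard-library csv module in place of pandas. The names append_batch and FIELDS are hypothetical, and it writes to a temporary directory so that it does not overwrite transformed_data.csv.

```python
import csv
import os
import tempfile

FIELDS = ["id", "value", "transformed_value"]

def append_batch(path, records):
    """Append records to a CSV file, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(records)

out_path = os.path.join(tempfile.mkdtemp(), "transformed_data.csv")
append_batch(out_path, [{"id": "a", "value": 1.0, "transformed_value": 1.1}])
append_batch(out_path, [{"id": "b", "value": 2.0, "transformed_value": 2.2}])

# Read the file back: one header row was written, followed by both records.
with open(out_path, newline="") as f:
    rows = list(csv.DictReader(f))
print(len(rows))  # 2
```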

### main()
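Orchestrates the ETL flow: iterates over the batches, transforms each one, collects the results, and "loads" them in a single call. It does not generate the data itself. Instead it reads data and batch_size from module-level variables, which the final cell defines before calling main().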

In [6]:
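def main():
    # Relies on the module-level data and batch_size set in the next cell.
    whole_data = []
    for batch in process_data(data, batch_size):
        result = transform_data(batch)
        whole_data.extend(result)
    # Load once, after every batch has been transformed.
    df = load_data(whole_data)
    return df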
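
Reading the inputs from module-level variables works in a notebook but makes the function harder to reuse. A variant that takes its inputs as arguments could look like this sketch. The names batches, run_pipeline, and add_flag are hypothetical, and the transform used here is a trivial placeholder rather than the notebook's transform_data.

```python
def batches(data, batch_size):
    """Yield consecutive slices of data (no simulated delay)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def run_pipeline(data, batch_size, transform):
    """Hypothetical variant of main() with explicit inputs instead of globals."""
    results = []
    for batch in batches(data, batch_size):
        results.extend(transform(batch))
    return results

def add_flag(batch):
    # Placeholder transform: mark every record as seen.
    return [{**record, "seen": True} for record in batch]

records = [{"id": str(i), "value": float(i)} for i in range(7)]
out = run_pipeline(records, batch_size=3, transform=add_flag)
print(len(out))  # 7
```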

In [7]:
data = generate_sample_data()
batch_size = 10

df = main()
display(df)


| | id | value | transformed_value |
|---|---|---|---|
| 0 | 7f1cfd21-a6e3-40a8-93a5-0301b501ee55 | 0.345324 | 0.379857 |
| 1 | ee03a6e7-9a16-4856-81fa-8fe7ae8ca66a | 0.145308 | 0.159839 |
| 2 | 887d688a-9963-4f77-a72a-29c266bcf3cd | 0.953234 | 1.048557 |
| 3 | ece3ec86-1aa0-4849-88f3-6670288087b7 | 0.828684 | 0.911553 |
| 4 | 570188fb-6606-4f17-866b-0818c8d00952 | 0.599167 | 0.659084 |
| ... | ... | ... | ... |
| 95 | bca120b8-71c0-460f-ac6b-6a4815950fe5 | 0.732817 | 0.806098 |
| 96 | 2d2f43e6-83c3-4965-b5e8-84a441dd72c2 | 0.522611 | 0.574872 |
| 97 | b34d80e9-d7c7-43c3-a253-1360e8b6a03c | 0.739436 | 0.813380 |
| 98 | dc81f036-929f-4ac1-b7dd-497345ad7104 | 0.878785 | 0.966664 |
