# Question:
You have a large dataset of customer transactions. Implement a system that groups the transactions by customer and calculates the average amount spent by each customer. The dataset is too large to fit into memory, so it must be processed in chunks.

#### Requirements:

1. The dataset is stored as a CSV file where each row represents a transaction with columns: customer_id, transaction_amount.

2. You should implement a solution using pandas or dask (for handling large datasets).

3. Your solution should be memory-efficient and scale to datasets larger than available memory.

In [6]:
import pandas as pd
import numpy as np

In [7]:
# Set the number of records you want to generate
num_records = 1000000  # Adjust based on your needs

In [8]:
# Create sample data
np.random.seed(42)  # For reproducibility
customer_ids = np.random.randint(1, 1001, num_records)  # 1000 unique customers
transaction_amounts = np.random.uniform(10, 500, num_records)  # Transaction amounts between 10 and 500

In [9]:
# Create a DataFrame
data = {
    'customer_id': customer_ids,
    'transaction_amount': transaction_amounts
}

In [10]:
df = pd.DataFrame(data)

In [2]:
# Initialize dictionaries to hold the running total and transaction count per customer
customer_totals = {}
customer_counts = {}

In [3]:
# Define the chunk size for reading the data in manageable parts
chunk_size = 100000  # Adjust based on the memory capacity
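
One way to pick a chunk size is to measure how many bytes a row occupies once loaded and divide a memory budget by that figure. The sketch below assumes the two columns load as `int64` and `float64`; the 64 MiB budget and the names `memory_budget_bytes` and `estimated_chunk_size` are illustrative, not part of the original exercise.

```python
import numpy as np
import pandas as pd

# Small sample with the same column types as the transaction data (assumed dtypes).
sample = pd.DataFrame({
    "customer_id": np.arange(1_000, dtype="int64"),
    "transaction_amount": np.ones(1_000, dtype="float64"),
})

# Bytes used per row, ignoring the index.
bytes_per_row = sample.memory_usage(index=False, deep=True).sum() / len(sample)

# Hypothetical budget for a single chunk.
memory_budget_bytes = 64 * 1024 * 1024
estimated_chunk_size = int(memory_budget_bytes // bytes_per_row)

print(bytes_per_row, estimated_chunk_size)
```

Intermediate results such as group-by output also need memory, so in practice it is safer to use only a fraction of the estimate.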

In [11]:
# Process the data in chunks (simulated here by slicing the in-memory DataFrame)
for start in range(0, num_records, chunk_size):
    chunk = df.iloc[start:start + chunk_size]
    # Note: iterrows() returns each row as a float64 Series, so customer_id is
    # upcast to float (hence keys such as 103.0). It is also slow on large frames.
    for _, row in chunk.iterrows():
        customer_id = row['customer_id']
        transaction_amount = row['transaction_amount']

        # Update the running total and transaction count for this customer
        customer_totals[customer_id] = customer_totals.get(customer_id, 0.0) + transaction_amount
        customer_counts[customer_id] = customer_counts.get(customer_id, 0) + 1
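
The loop above slices a DataFrame that is already in memory. When the data really lives in a CSV file, the same accumulation could be sketched with `pd.read_csv(..., chunksize=...)` and a per-chunk `groupby`, which avoids the row-by-row Python loop. The file `transactions.csv`, the small generated sample, and the function `average_by_customer` are assumptions made for illustration; they are not part of the original notebook.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Write a small hypothetical CSV so the sketch is self-contained.
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "customer_id": rng.integers(1, 11, 1_000),
    "transaction_amount": rng.uniform(10, 500, 1_000),
})
csv_path = os.path.join(tempfile.mkdtemp(), "transactions.csv")
sample.to_csv(csv_path, index=False)


def average_by_customer(path, chunk_size=100_000):
    """Stream a CSV in chunks and return the mean transaction_amount per customer_id."""
    running = None  # per-customer 'sum' and 'count', accumulated across chunks
    reader = pd.read_csv(
        path,
        chunksize=chunk_size,
        dtype={"customer_id": "int64", "transaction_amount": "float64"},
    )
    for chunk in reader:
        # Vectorized partial aggregation for this chunk only.
        partial = chunk.groupby("customer_id")["transaction_amount"].agg(["sum", "count"])
        # Align on customer_id; customers missing on one side contribute 0.
        running = partial if running is None else running.add(partial, fill_value=0)
    if running is None:  # empty file
        return pd.Series(dtype="float64")
    return running["sum"] / running["count"]


averages = average_by_customer(csv_path, chunk_size=250)
print(averages.head())
```

Only one chunk plus the small per-customer summary is held in memory at a time, so memory use depends on `chunk_size` and the number of distinct customers rather than on the total row count.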

In [12]:
# Calculate the average for each customer
customer_averages = {customer_id: customer_totals[customer_id] / customer_counts[customer_id] 
                     for customer_id in customer_totals}
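
Totals and counts are carried across chunks, and the division happens only at the end, because averaging the per-chunk averages gives the wrong answer whenever a customer's transactions are split unevenly between chunks. A small worked example with made-up numbers:

```python
import pandas as pd

# Customer 1 has three transactions split unevenly across two chunks (made-up data).
chunk_a = pd.DataFrame({"customer_id": [1, 1], "transaction_amount": [10.0, 20.0]})
chunk_b = pd.DataFrame({"customer_id": [1], "transaction_amount": [60.0]})

# Averaging the chunk means weights each chunk equally, regardless of its size.
mean_of_means = (chunk_a["transaction_amount"].mean() + chunk_b["transaction_amount"].mean()) / 2

# Accumulating sum and count, then dividing once, gives the true mean.
total = chunk_a["transaction_amount"].sum() + chunk_b["transaction_amount"].sum()
count = len(chunk_a) + len(chunk_b)
true_mean = total / count

print(mean_of_means)  # 37.5
print(true_mean)      # 30.0
```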

In [13]:
# Display the results (optional, can print or save as needed)
for customer_id, avg in customer_averages.items():
    print(f'Customer {customer_id}: Average Transaction Amount = {avg:.2f}')

Customer 103.0: Average Transaction Amount = 249.94
Customer 436.0: Average Transaction Amount = 256.25
Customer 861.0: Average Transaction Amount = 255.33
Customer 271.0: Average Transaction Amount = 259.24
Customer 107.0: Average Transaction Amount = 260.82
Customer 72.0: Average Transaction Amount = 251.41
Customer 701.0: Average Transaction Amount = 251.25
Customer 21.0: Average Transaction Amount = 251.18
Customer 615.0: Average Transaction Amount = 250.53
Customer 122.0: Average Transaction Amount = 264.24
Customer 467.0: Average Transaction Amount = 245.00
Customer 215.0: Average Transaction Amount = 253.28
Customer 331.0: Average Transaction Amount = 258.33
Customer 459.0: Average Transaction Amount = 266.18
Customer 88.0: Average Transaction Amount = 257.55
Customer 373.0: Average Transaction Amount = 261.52
Customer 100.0: Average Transaction Amount = 258.13
Customer 872.0: Average Transaction Amount = 263.88
Customer 664.0: Average Transaction Amount = 263.76
Customer 131.0: