---

### 🎓 **Professor**: Apostolos Filippas

### 📘 **Class**: E-Commerce

### 📋 **Topic**: Randomized Assignment and Balance Tests

🚫 **Note**: You are not allowed to share the contents of this notebook with anyone outside this class without written permission from the professor.

---


## Overview

Let's use our Python knowledge to perform a randomized assignment and verify that we did it correctly.

**What we'll learn:**
- How to perform randomized assignment
- How to check if randomization worked
- Balance tests and their importance
- Setting random seeds for reproducibility


In [1]:
# Let's import the libraries we will use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Load user data for randomization
df_users = pd.read_csv("../data/users.csv")

print("Dataset loaded successfully!")
print(f"Dataset shape: {df_users.shape}")
print(f"Columns: {df_users.columns.tolist()}")

print("Sample of user data:")
print(df_users.head())


Dataset loaded successfully!
Dataset shape: (19272, 9)
Columns: ['user', 'city', 'gender', 'age', 'asset_year', 'user_status', 'earnings', 'first_trip_m', 'price_hourly']
Sample of user data:
   user           city  gender   age  asset_year  user_status  earnings  \
0     1      San Diego    MALE   NaN      2003.0     DELISTED    24.960   
1     2        Oakland    MALE   NaN      2015.0      LIMITED     0.000   
2     3        Chicago    MALE  30.0      2012.0  OFFBOARDING   995.132   
3     4  San Francisco  FEMALE  27.0      2013.0       ACTIVE   235.474   
4     5    Los Angeles    MALE  26.0      2018.0       ACTIVE  1495.042   

   first_trip_m  price_hourly  
0           NaN          8.00  
1          80.0          5.72  
2          41.0          9.50  
3          79.0         11.00  
4          80.0         12.55  


In [2]:
# Set random seed for reproducibility
np.random.seed(42)

# Randomized assignment
# Add a column with random numbers between 0 and 1
df_users["random_number"] = np.random.uniform(0, 1, len(df_users))

# Assign to treatment those users that "drew" more than 0.5
df_users["treatment"] = np.where(df_users["random_number"] > 0.5, "Treatment", "Control")

# Check how many users we assigned to each group
df_assignment = (
    df_users.groupby("treatment")
    .agg({"user": "count"})
    .rename(columns={"user": "n"})
    .reset_index()
)

print("Assignment results:")
print(df_assignment)


Assignment results:
   treatment     n
0    Control  9656
1  Treatment  9616


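Group sizes alone do not tell us whether the two groups look alike. A balance test compares pre-treatment covariates across Treatment and Control; under successful randomization, differences in means should be small relative to their standard errors. The following is a minimal sketch of one way to do this, using a Welch-style t-statistic computed by hand. The helper `balance_table` and the synthetic `df_demo` are hypothetical and exist only so the example runs on its own; they are not part of the class materials.

```python
import numpy as np
import pandas as pd


def balance_table(df, covariates, group_col="treatment"):
    """Compare covariate means between the Treatment and Control groups."""
    rows = []
    for cov in covariates:
        # Drop missing values separately for each covariate
        treated = df.loc[df[group_col] == "Treatment", cov].dropna()
        control = df.loc[df[group_col] == "Control", cov].dropna()
        diff = treated.mean() - control.mean()
        # Standard error of the difference in means (unequal variances allowed)
        se = np.sqrt(
            treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control)
        )
        rows.append(
            {
                "covariate": cov,
                "mean_treatment": treated.mean(),
                "mean_control": control.mean(),
                "difference": diff,
                "t_stat": diff / se,
            }
        )
    return pd.DataFrame(rows)


# Synthetic stand-in for df_users (made-up values, for illustration only)
rng = np.random.default_rng(0)
n = 10_000
df_demo = pd.DataFrame(
    {
        "age": rng.normal(35, 10, n),
        "price_hourly": rng.normal(10, 3, n),
    }
)
df_demo["treatment"] = np.where(rng.uniform(0, 1, n) > 0.5, "Treatment", "Control")

df_balance = balance_table(df_demo, ["age", "price_hourly"])
print(df_balance.round(3))
```

On the notebook's data, the same helper could be called as `balance_table(df_users, ["age", "asset_year", "earnings", "price_hourly"])`. As a rule of thumb, an absolute t-statistic below about 1.96 means the difference is not significant at the 5% level. When many covariates are tested, an occasional larger value can still arise by chance.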
---

## 🎉 Summary

We learned how to perform randomized assignment:
- **Random number generation** for assignment
- **Treatment vs Control** group creation
- **Balance checking** by comparing group sizes to verify randomization
- **Reproducible results** using random seeds

Randomized assignment is fundamental to experimental design and causal inference.

### Next:
We'll design and analyze controlled experiments.

---
