**The latest (fictional) news from August 2025 supplies several details we can build directly into our synthetic data:**

  * **Tariffs:** A new 25% U.S. tariff on Indian goods, effective August 1, 2025, is a major factor. Electronics may receive a temporary exemption, but the uncertainty, and the risk of higher costs or delays, gives us a strong basis for a tariff feature in our model. It also lets us define a clear "tariff-impacted" shipping route.
  * **Mac Product Lines:** The Mac Studio with the M3 Ultra chip in a 512GB RAM configuration is a confirmed high-end product, and the MacBook Air with the M4 chip is a recent release. These give us specific product SKUs to include in the data, with varying levels of customization and supply chain vulnerability.
  * **TSMC Arizona:** The news confirms that the first fab is operating, while the second and third are still under construction. Domestic U.S. chip production is therefore not yet large enough to relieve supply pressure on advanced chips like the M3 Ultra.
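
The tariff bullet can be turned into a concrete route feature. Below is a minimal sketch, assuming the tariff-impacted route is `India Plant B` to `USA` and using hypothetical parameters (`TARIFF_ROUTES`, `TARIFF_DELAY_PROB`, `TARIFF_DELAY_DAYS`) whose values are illustrative, not estimates of real tariff effects:

```python
import numpy as np

# Hypothetical parameters: illustrative values, not estimates of real tariff effects
TARIFF_ROUTES = {('India Plant B', 'USA')}  # (plant, destination) pairs treated as tariff-impacted
TARIFF_DELAY_PROB = 0.3                     # chance that a tariff-route shipment is held up
TARIFF_DELAY_DAYS = (1, 6)                  # extra days drawn from [1, 6), i.e. 1-5 days


def is_tariff_impacted(plant: str, country: str) -> bool:
    """Return True if the shipment travels on a tariff-impacted route."""
    return (plant, country) in TARIFF_ROUTES


def tariff_delay(plant: str, country: str, rng: np.random.Generator) -> int:
    """Extra days of delay from tariff/customs uncertainty (always 0 off-route)."""
    if is_tariff_impacted(plant, country) and rng.random() < TARIFF_DELAY_PROB:
        return int(rng.integers(*TARIFF_DELAY_DAYS))
    return 0


rng = np.random.default_rng(0)
samples = [tariff_delay('India Plant B', 'USA', rng) for _ in range(10_000)]
off_route = [tariff_delay('China Plant A', 'USA', rng) for _ in range(1_000)]
print(f"share delayed on tariff route: {np.mean([d > 0 for d in samples]):.2f}")
print(f"max delay off route: {max(off_route)}")
```

In the shipment loop, `tariff_delay` could be added to `delay` once `manufacturing_plant` has been drawn, and `is_tariff_impacted` stored as a boolean column so the model sees the route explicitly. The sketch uses NumPy's `Generator` API; with the notebook's legacy seeding, `np.random.rand()` and `np.random.randint()` would fill the same roles.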

*This information will be crucial for creating realistic relationships and distributions within the synthetic data.*
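
One such relationship is between SKU and configuration: drawing RAM independently of the product allows combinations no real machine offers. Below is a minimal sketch of conditional sampling. It assumes the per-SKU ceilings in the hypothetical `MAX_RAM_GB` table (illustrative values; verify against official specs) and a hypothetical `sample_ram` helper:

```python
import numpy as np

RAM_OPTIONS = [16, 32, 64, 128, 256, 512]  # GB, same options as the notebook

# Assumed per-SKU RAM ceilings (illustrative; verify against official specs)
MAX_RAM_GB = {
    'Mac Studio M3 Ultra': 512,
    'Mac Studio M3 Max': 128,
    'MacBook Pro M4 Max': 128,
    'MacBook Air M4': 32,
    'iMac M4': 32,
}


def sample_ram(sku: str, is_custom: bool, rng: np.random.Generator) -> int:
    """Draw a RAM size that never exceeds the SKU's assumed ceiling.

    Standard (non-custom) orders get the smallest allowed option; custom
    orders pick uniformly among the larger allowed options.
    """
    allowed = [r for r in RAM_OPTIONS if r <= MAX_RAM_GB[sku]]
    if not is_custom or len(allowed) == 1:
        return allowed[0]
    return int(rng.choice(allowed[1:]))


rng = np.random.default_rng(42)
print(sample_ram('MacBook Air M4', False, rng))                     # → 16
print([sample_ram('MacBook Air M4', True, rng) for _ in range(3)])  # → [32, 32, 32]
```

Standard orders get the base option and custom orders draw uniformly from the upgrades. A weighted draw would work just as well if some upgrades should be more common than others.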

-----

### **Python Script for Synthetic Data Generation**

**Document:** `generate_synthetic_data.ipynb` (Jupyter Notebook for Google Colab)

```python
# Disclaimer: This project uses synthetic data for demonstration purposes only.
# The data generated below is for a portfolio project and does not represent
# real-world operational data or confidential information from Apple Inc.

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# --- Configuration for Data Generation ---
NUM_ORDERS = 50000
START_DATE = datetime(2024, 10, 1)
END_DATE = datetime(2025, 7, 31)

# Product and Configuration Details
product_skus = ['Mac Studio M3 Ultra', 'Mac Studio M3 Max', 'MacBook Pro M4 Max', 'MacBook Air M4', 'iMac M4']
ram_options = [16, 32, 64, 128, 256, 512]              # GB
storage_options = [512, 1024, 2048, 4096, 8192, 16384]  # GB

# Manufacturing and Shipping Details
manufacturing_plants = ['China Plant A', 'India Plant B', 'Vietnam Plant C']
shipping_countries = ['USA', 'China', 'Germany', 'UK', 'Japan']
carriers = ['FedEx', 'UPS', 'DHL']

# --- Generate Orders Data ---
print("Generating orders.csv...")
np.random.seed(42)  # fixed seed so the dataset is reproducible

num_days = (END_DATE - START_DATE).days
orders_data = {
    'order_id': [f'ORD-{i+1:06}' for i in range(NUM_ORDERS)],
    'customer_id': [f'CUST-{np.random.randint(10000, 99999)}' for _ in range(NUM_ORDERS)],
    # randint's upper bound is exclusive, so add 1 to make END_DATE itself reachable
    'order_date': [START_DATE + timedelta(days=np.random.randint(0, num_days + 1)) for _ in range(NUM_ORDERS)],
    'product_sku': np.random.choice(product_skus, NUM_ORDERS, p=[0.1, 0.15, 0.25, 0.3, 0.2]),
    # ~20% of orders are retail pickups at one of five stores (STORE-100 to STORE-104)
    'retail_store_id': [f'STORE-{np.random.randint(100, 105)}' if np.random.rand() < 0.2 else None for _ in range(NUM_ORDERS)]
}

orders_df = pd.DataFrame(orders_data)
orders_df['is_retail_pickup'] = orders_df['retail_store_id'].notna()
orders_df['shipping_country'] = np.random.choice(shipping_countries, NUM_ORDERS)

# About 20% of orders are custom (build-to-order) configurations.
# Note: RAM and storage are sampled independently of the SKU and of is_custom_config,
# so some combinations (e.g. a MacBook Air M4 with 512GB of RAM) are not realistic.
orders_df['is_custom_config'] = np.random.rand(NUM_ORDERS) < 0.2
orders_df['ram_config_gb'] = np.random.choice(ram_options, NUM_ORDERS, p=[0.2, 0.3, 0.2, 0.1, 0.1, 0.1])
orders_df['storage_config_gb'] = np.random.choice(storage_options, NUM_ORDERS, p=[0.2, 0.3, 0.2, 0.15, 0.1, 0.05])

# Promised delivery depends on the product only:
# 14 days for the high-volume MacBook Air M4 and iMac M4, 7 days for everything else
HIGH_VOLUME_SKUS = ['MacBook Air M4', 'iMac M4']

def get_promised_delivery(row):
    base_days = 14 if row['product_sku'] in HIGH_VOLUME_SKUS else 7
    return row['order_date'] + timedelta(days=base_days)

orders_df['promised_delivery_date'] = orders_df.apply(get_promised_delivery, axis=1)

# --- Generate Shipment Logs Data ---
print("Generating shipment_logs.csv...")
shipment_logs_data = []

for index, row in orders_df.iterrows():
    # Orders ship 2-9 days after they are placed
    ship_date = row['order_date'] + timedelta(days=np.random.randint(2, 10))

    # Delay rules are intentionally counter-intuitive: standard (non-custom) and
    # high-volume orders are delayed more often than custom builds
    delay = 0

    # ~35% of non-custom orders get a 1-2 day delay
    if not row['is_custom_config'] and np.random.rand() < 0.35:
        delay += np.random.randint(1, 3)

    # ~10% of custom orders get a 1-4 day delay
    if row['is_custom_config'] and np.random.rand() < 0.1:
        delay += np.random.randint(1, 5)

    # High-volume products have a further ~20% chance of a 1-4 day delay
    if row['product_sku'] in HIGH_VOLUME_SKUS and np.random.rand() < 0.2:
        delay += np.random.randint(1, 5)

    # Assign a plant (50% China, 30% India, 20% Vietnam); the plant does not affect delay here
    manufacturing_plant = np.random.choice(manufacturing_plants, p=[0.5, 0.3, 0.2])

    # Geopolitical impact: ~25% of Germany-bound orders get an extra 2-7 days
    # (mirrors the model output, where shipping_country_Germany is a strong predictor)
    if row['shipping_country'] == 'Germany' and np.random.rand() < 0.25:
        delay += np.random.randint(2, 8)

    # Transit takes 1-4 days plus any accumulated delay
    delivery_date = ship_date + timedelta(days=np.random.randint(1, 5) + delay)

    # Days late relative to the promised date; early or on-time deliveries count as 0
    delay_in_days = max(0, (delivery_date - row['promised_delivery_date']).days)

    shipment_logs_data.append({
        'shipment_id': f'SHIP-{index+1:06}',
        'order_id': row['order_id'],
        'manufacturing_plant': manufacturing_plant,
        'ship_date': ship_date,
        'delivery_date': delivery_date,
        'delay_in_days': delay_in_days,
        'carrier': np.random.choice(carriers)
    })

shipment_logs_df = pd.DataFrame(shipment_logs_data)

# --- Generate Supply Chain Constraints Data ---
print("Generating supply_chain_constraints.csv...")
supply_chain_data = {
    'component_id': ['M3 Ultra Chip', '512GB Unified RAM', 'M4 Chip', 'MacBook Air Chassis'],
    'supply_status': ['Constrained', 'Constrained', 'Stable', 'Stable'],
    'supplier_country': ['Taiwan', 'Taiwan', 'USA', 'China'],
    'log_date': [datetime(2025, 6, 1)] * 4  # one snapshot date for all components
}
supply_chain_df = pd.DataFrame(supply_chain_data)

# --- Save to CSV files ---
orders_df.to_csv('orders.csv', index=False)
shipment_logs_df.to_csv('shipment_logs.csv', index=False)
supply_chain_df.to_csv('supply_chain_constraints.csv', index=False)

print("Synthetic data generation complete. The following files have been saved:")
print(" - orders.csv")
print(" - shipment_logs.csv")
print(" - supply_chain_constraints.csv")
```

Output:

```
Generating orders.csv...
Generating shipment_logs.csv...
Generating supply_chain_constraints.csv...
Synthetic data generation complete. The following files have been saved:
 - orders.csv
 - shipment_logs.csv
 - supply_chain_constraints.csv
```
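
In the notebook, `supply_chain_constraints.csv` is written out but never joined to the orders. Below is a sketch of one way to connect them. It assumes a hypothetical `SKU_COMPONENTS` mapping, which is not part of the original data. Each order is exploded into the components it needs, and orders that touch a `Constrained` component are flagged. Small inline DataFrames stand in for the CSVs so the example runs on its own.

```python
import pandas as pd

# Small stand-ins for supply_chain_constraints.csv and orders.csv
supply_chain_df = pd.DataFrame({
    'component_id': ['M3 Ultra Chip', '512GB Unified RAM', 'M4 Chip', 'MacBook Air Chassis'],
    'supply_status': ['Constrained', 'Constrained', 'Stable', 'Stable'],
})
orders_df = pd.DataFrame({
    'order_id': ['ORD-000001', 'ORD-000002', 'ORD-000003'],
    'product_sku': ['Mac Studio M3 Ultra', 'MacBook Air M4', 'iMac M4'],
    'ram_config_gb': [512, 16, 32],
})

# Hypothetical bill of materials: tracked components each SKU depends on
SKU_COMPONENTS = {
    'Mac Studio M3 Ultra': ['M3 Ultra Chip'],
    'MacBook Air M4': ['M4 Chip', 'MacBook Air Chassis'],
    'iMac M4': ['M4 Chip'],
}


def order_components(sku: str, ram_gb: int) -> list:
    """Tracked components an order needs; a 512GB RAM build adds its own component."""
    parts = list(SKU_COMPONENTS.get(sku, []))
    if ram_gb == 512:
        parts.append('512GB Unified RAM')
    return parts


# One row per (order, component), with each component's supply status attached
exploded = (
    orders_df.assign(component_id=[
        order_components(sku, ram)
        for sku, ram in zip(orders_df['product_sku'], orders_df['ram_config_gb'])
    ])
    .explode('component_id')
    .merge(supply_chain_df, on='component_id', how='left')
)

# An order is flagged if any of its components is constrained
constrained = (
    exploded['supply_status'].eq('Constrained')
    .groupby(exploded['order_id'])
    .any()
)
orders_df['uses_constrained_component'] = orders_df['order_id'].map(constrained)
print(orders_df[['order_id', 'product_sku', 'uses_constrained_component']])
```

With this toy input, only ORD-000001 (a Mac Studio M3 Ultra with 512GB of RAM) is flagged. SKUs missing from the mapping are treated as unconstrained. On the full `orders.csv`, the flag could serve as a model feature or drive extra lead time for constrained builds.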
