## Cash Flow Data Generation

In [1]:
import numpy as np
import pandas as pd

np.random.seed(42)

## Business Parameters

I'm simulating 10 years (120 months) of cash flow for a mid-sized company.

**Starting position:** $100k cash  
**Average revenue:** $50k per month with seasonal changes  
**Cost structure:** COGS at 50% of revenue, $15k fixed costs per month, variable costs at 10% of revenue, plus random other expenses under $3k per month

I also built in two events:
- A recession in years 5-6 (revenue drops 20%)
- A policy change in year 7 (faster payment collection)
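
As a rough sanity check on these parameters, the sketch below works out the monthly operating margin at average revenue and during the recession. The function name `operating_margin` is mine, purely for illustration, and the figures ignore seasonality, noise, and the small random other expenses added later.

```python
avg_revenue = 50_000
cogs_ratio = 0.5
opex_variable_ratio = 0.1
opex_fixed = 15_000

def operating_margin(revenue):
    """Revenue minus COGS, variable OPEX, and fixed OPEX (hypothetical helper)."""
    return revenue * (1 - cogs_ratio - opex_variable_ratio) - opex_fixed

baseline_margin = operating_margin(avg_revenue)          # about $5k per month
recession_margin = operating_margin(avg_revenue * 0.8)   # about $1k per month

# Revenue at which the margin is exactly zero
break_even_revenue = opex_fixed / (1 - cogs_ratio - opex_variable_ratio)

print(baseline_margin, recession_margin, break_even_revenue)
```

With a break-even near $37.5k of monthly revenue, the business has little room to spare: a 20% revenue drop leaves only about $1k of margin before other expenses and payment delays.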

In [2]:
months = 120
cash_initial = 100000
avg_revenue = 50000
seasonal_factor = [1.1, 1.0, 0.9, 0.95, 1.2, 1.15, 1.0, 0.9, 0.85, 1.05, 1.2, 1.1] * 10
COGS_ratio = 0.5
OPEX_fixed = 15000
OPEX_variable_ratio = 0.1
AR_Delay_options = [0, 1, 2, 3]

# Recession: revenue drops 20% in years 5 and 6
recession_years = [5, 6]
# Policy change: from year 7 onward, payment delays improve
policy_change_year = 7

## Generate Revenue and Costs

Revenue varies with seasonal patterns plus random noise (±10%).  
During recession years, revenue drops by 20%.
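
To make the revenue rule concrete, here is a small sketch of the range a given month's revenue can fall in. The helper `revenue_bounds` is hypothetical and not part of the simulation; it only restates the seasonal factor, the ±10% noise band, and the 20% recession cut.

```python
avg_revenue = 50_000
seasonal_factor = [1.1, 1.0, 0.9, 0.95, 1.2, 1.15, 1.0, 0.9, 0.85, 1.05, 1.2, 1.1]
recession_years = [5, 6]

def revenue_bounds(month_index):
    """Return (low, high) revenue for a 0-based month index (hypothetical helper)."""
    year = month_index // 12 + 1
    base = avg_revenue * seasonal_factor[month_index % 12]
    if year in recession_years:
        base *= 0.8  # recession cut
    return base * 0.9, base * 1.1  # +/-10% noise band

jan_year1 = revenue_bounds(0)    # first month of year 1
jan_year5 = revenue_bounds(48)   # first month of year 5 (recession)
print(jan_year1, jan_year5)
```

January of year 1 should land roughly between $49.5k and $60.5k, while the same month in year 5 lands roughly between $39.6k and $48.4k.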

Payment delays (AR_Delay) are drawn at random from 0 to 3 months. From year 7 onward, the policy change caps them at 2 months.

In [3]:
data = pd.DataFrame({
    'Month': range(1, months+1),
    'Year': [(i//12)+1 for i in range(months)]
})

# Revenue with seasonality and noise
data['Revenue'] = [avg_revenue * sf * np.random.uniform(0.9, 1.1) for sf in seasonal_factor]

# Apply recession
data['Recession'] = data['Year'].isin(recession_years).astype(int)
data.loc[data['Recession'] == 1, 'Revenue'] *= 0.8

# AR_Delay
data['AR_Delay'] = np.random.choice(AR_Delay_options, size=months)

# First month has no delay (cold start)
data.loc[0, 'AR_Delay'] = 0

# Policy change: from year 7 onward, cap delays at 2 months
post_policy = data['Year'] >= policy_change_year
data.loc[post_policy, 'AR_Delay'] = data.loc[post_policy, 'AR_Delay'].clip(upper=2)

# Other Expenses
data['Other_Expenses'] = np.random.randint(0, 3000, size=months)

# COGS and OPEX
data['COGS'] = data['Revenue'] * COGS_ratio
data['OPEX_Variable'] = data['Revenue'] * OPEX_variable_ratio
data['OPEX_Fixed'] = OPEX_fixed
data['Total_Expenses'] = data['COGS'] + data['OPEX_Variable'] + data['OPEX_Fixed'] + data['Other_Expenses']
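
One side effect of capping the delays is worth checking: clipping does not redraw the 3-month delays, it turns them into 2-month delays. The sketch below uses a separate generator (`np.random.default_rng(0)`, my choice so it does not disturb the seeded simulation) and a large sample to estimate the resulting shares.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: independent generator for this check only
delays = rng.choice([0, 1, 2, 3], size=100_000)

# Same capping rule as the policy change: anything above 2 becomes 2
clipped = np.clip(delays, 0, 2)

# Share of each delay value after capping
shares = np.bincount(clipped, minlength=3) / clipped.size
print(shares)
```

After the cap, a 2-month delay should be about twice as likely as a 0- or 1-month delay, so the policy change shortens the worst case rather than making delays uniformly shorter.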

## Cash Collection with Payment Delays

This is where revenue timing matters. If AR_Delay = 2 in a given month, the cash collected that month is the revenue earned 2 months earlier. Lookbacks that reach before month 1 collect nothing.

This creates a gap between when you earn revenue (accounting) and when you actually get the money (cash flow).

In [4]:
data['Revenue_Lag1'] = data['Revenue'].shift(1).fillna(0)
data['Revenue_Lag2'] = data['Revenue'].shift(2).fillna(0)
data['Revenue_Lag3'] = data['Revenue'].shift(3).fillna(0)

conditions = [
    data['AR_Delay'] == 1,
    data['AR_Delay'] == 2,
    data['AR_Delay'] == 3
]

choices = [
    data['Revenue_Lag1'],
    data['Revenue_Lag2'],
    data['Revenue_Lag3']
]

data['Cash_Collected'] = np.select(conditions, choices, default=data['Revenue'])

# Net Cash Flow
data['Net_Cash_Flow'] = (data['Cash_Collected'] - data['Total_Expenses']).round(0)

# Remove helper columns
data.drop(columns=['Revenue_Lag1', 'Revenue_Lag2', 'Revenue_Lag3'], inplace=True)
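
To see what the lookup in the cell above does, here is a toy version with four months of made-up revenue and delays. The numbers are illustrative only; the logic mirrors the `shift` and `np.select` steps.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration
toy = pd.DataFrame({
    "Revenue": [100.0, 200.0, 300.0, 400.0],
    "AR_Delay": [0, 1, 2, 0],
})

# Revenue from 1, 2, and 3 months earlier (0 where no earlier month exists)
lags = {k: toy["Revenue"].shift(k).fillna(0) for k in (1, 2, 3)}

# Each month collects the revenue from AR_Delay months back
toy["Cash_Collected"] = np.select(
    [toy["AR_Delay"] == k for k in (1, 2, 3)],
    [lags[k] for k in (1, 2, 3)],
    default=toy["Revenue"],
)
print(toy)
```

Months 2 and 3 both look back to month 1, so its revenue of 100 is collected three times while the 200 and 300 are never collected. Under this rule total cash collected need not equal total revenue, which is worth keeping in mind when reading the simulated balances.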

## Calculate Running Cash Balance

Track cash month by month: Cash_End = Cash_Start + Net_Cash_Flow

I also flag months where cash drops below 10% of starting capital as a warning signal.

In [5]:
cash_start = [cash_initial]
cash_end = []
shortage_alert = []
shortage_threshold_ratio = 0.1  # Warning if cash < 10% of initial

for i in range(months):
    current_net_flow = data.loc[i, "Net_Cash_Flow"]
    current_cash_end = cash_start[i] + current_net_flow
    cash_end.append(current_cash_end)
    
    # Flag warning
    shortage_alert.append(1 if current_cash_end < cash_initial * shortage_threshold_ratio else 0)
    
    # Prepare next month starting cash
    if i < months - 1:
        cash_start.append(current_cash_end)

data['Cash_Start'] = cash_start
data['Cash_End'] = cash_end
data['Shortage_Alert'] = shortage_alert
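
The same running balance can also be computed without a loop. This is a sketch of an equivalent vectorized form on made-up net cash flows, not a replacement for the cell above.

```python
import pandas as pd

cash_initial = 100_000
shortage_threshold_ratio = 0.1

# Made-up net cash flows for four months
net_flow = pd.Series([5_000, -20_000, -80_000, 10_000])

# Ending cash is the starting capital plus the cumulative net flow
cash_end = cash_initial + net_flow.cumsum()

# Each month starts with the previous month's ending cash
cash_start = cash_end.shift(1, fill_value=cash_initial)

# Flag months that end below 10% of the starting capital
shortage_alert = (cash_end < cash_initial * shortage_threshold_ratio).astype(int)

print(pd.DataFrame({
    "Cash_Start": cash_start,
    "Cash_End": cash_end,
    "Shortage_Alert": shortage_alert,
}))
```

Here the third month ends at $5k, below the $10k threshold, so it is the only month flagged.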

In [6]:
print(data.head(12))

    Month  Year       Revenue  Recession  AR_Delay  Other_Expenses  \
0       1     1  53619.941307          0         0            2520   
1       2     1  54507.143064          0         2            1969   
2       3     1  47087.945476          0         3            2198   
3       4     1  48437.255600          0         0            1438   
4       5     1  55872.223685          0         2            1634   
5       6     1  53543.936984          0         1             262   
6       7     1  45580.836122          0         0            1787   
7       8     1  48295.585312          0         0            2191   
8       9     1  43359.477600          0         0            2393   
9      10     1  54684.762067          0         2             623   
10     11     1  54247.013932          0         1            1016   
11     12     1  60169.008374          0         0             880   

            COGS  OPEX_Variable  OPEX_Fixed  Total_Expenses  Cash_Collected  \
0   26809.

In [7]:
data.to_csv('simulated_cashflow_data.csv', index=False)
print("Data saved as 'simulated_cashflow_data.csv'")

Data saved as 'simulated_cashflow_data.csv'
