# AARRR Framework: Product Analytics Foundations

**Session 1A - Duration: 30 minutes**  
**Course**: Product Data Analytics & Data Science  
**Student**: Diogo Barros  

---

## What is AARRR? 🏴‍☠️

**AARRR** (pronounced "Arrr" like a pirate) is a framework for measuring and optimizing user behavior throughout their entire journey with your product.

Created by **Dave McClure** in 2007, it's also known as **"Pirate Metrics"** because the acronym sounds like a pirate's "Arrr!".

### The 5 Stages:

| Stage | Letter | Question | Key Metric Example |
|-------|--------|----------|-------------------|
| **🎯 Acquisition** | **A** | How do users find you? | Website visitors, App downloads |
| **⚡ Activation** | **A** | Do they have a great first experience? | Account creation, First purchase |
| **🔄 Retention** | **R** | Do users come back? | Weekly active users, Churn rate |
| **📢 Referral** | **R** | Do users tell others? | Referral rate, Viral coefficient |
| **💰 Revenue** | **R** | How do you make money? | Conversion rate, LTV |

### Why AARRR Works
- **📊 Complete Picture**: Covers the entire user lifecycle
- **🎯 Focus**: Helps teams prioritize what matters most
- **📈 Growth**: Systematic approach to growing your product
- **🔧 Actionable**: Each stage has clear metrics and optimization strategies

## 🎯 Stage 1: ACQUISITION
**"How do users discover your product?"**

### What is Acquisition?
Acquisition is about getting potential users to **know about your product** and visit your website/app for the first time.

### Key Metrics:
- **Traffic/Downloads**: How many people visit your website or download your app
- **Cost Per Acquisition (CPA)**: How much you spend to get one new user
- **Channel Performance**: Which marketing channels work best

### Formula:
```
Cost Per Acquisition (CPA) = Total Marketing Spend ÷ Number of New Users
```
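
As a quick sanity check of the formula, here is a minimal sketch for a single channel. The function name `cost_per_acquisition` and the sample figures are illustrative assumptions, not data from a real campaign.

```python
def cost_per_acquisition(total_spend: float, new_users: int) -> float:
    """Return marketing spend per newly acquired user."""
    if new_users <= 0:
        # CPA is undefined when nobody was acquired
        raise ValueError("new_users must be positive to compute CPA")
    return total_spend / new_users


# Hypothetical month: $3,600 of ad spend brought in 1,200 new users
cpa = cost_per_acquisition(3600, 1200)
print(f"CPA: ${cpa:.2f}")  # CPA: $3.00
```

Note that CPA counts every new user, paying or not. Dividing spend by paying customers instead gives a stricter cost-per-customer figure.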

### Real Example: Airbnb's Early Growth (2008-2010)

**The Challenge**: How do you get people to trust strangers' homes?

**Acquisition Strategy**:
1. **🔍 SEO**: Created city-specific landing pages
2. **📸 Photography**: Professional photos made listings look amazing
3. **💡 Craigslist Hack**: Cross-posted Airbnb listings to Craigslist (genius but risky!)

**Results**:
- **Traffic Growth**: 500% increase in 18 months
- **Cost**: Only $1.20 per new user (incredibly cheap!)
- **Lesson**: Creative solutions can dramatically reduce acquisition costs

### Common Acquisition Channels:
- **🔍 Organic Search (SEO)**: Free, high-quality traffic
- **💰 Paid Ads**: Google Ads, Facebook Ads, etc.
- **📱 Social Media**: Instagram, TikTok, LinkedIn
- **🤝 Referrals**: Word-of-mouth from existing users
- **📰 Content Marketing**: Blog posts, videos, podcasts

### 💡 Practical Exercise: Acquisition Analysis
Let's analyze some sample acquisition data:

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Sample acquisition data for a SaaS product
acquisition_data = {
    'Channel': ['Organic Search', 'Google Ads', 'Facebook Ads', 'Referrals', 'Content Marketing'],
    'Monthly Users': [2500, 1200, 800, 600, 400],
    'Monthly Cost': [0, 3600, 2400, 300, 800],
    'Conversion Rate': [0.08, 0.05, 0.03, 0.12, 0.09]  # share of users who become customers (0.08 = 8%)
}

df = pd.DataFrame(acquisition_data)

# Calculate Cost Per Acquisition (CPA)
df['CPA'] = df['Monthly Cost'] / df['Monthly Users']
df['CPA'] = df['CPA'].fillna(0)  # guards against 0/0 (no spend, no users); Organic Search is already 0 because 0 / 2500 = 0

# Calculate customers acquired
df['Customers'] = df['Monthly Users'] * df['Conversion Rate']

# Calculate Cost Per Customer
df['Cost Per Customer'] = df['Monthly Cost'] / df['Customers']
df['Cost Per Customer'] = df['Cost Per Customer'].fillna(0)

print("📊 ACQUISITION CHANNEL ANALYSIS")
print("=" * 50)
for i, row in df.iterrows():
    print(f"📱 {row['Channel']:15} | Users: {row['Monthly Users']:4} | CPA: ${row['CPA']:5.1f} | Customers: {row['Customers']:3.0f} | Cost/Customer: ${row['Cost Per Customer']:6.1f}")

print("\n🎯 KEY INSIGHTS:")
best_volume = df.loc[df['Monthly Users'].idxmax(), 'Channel']
best_conversion = df.loc[df['Conversion Rate'].idxmax(), 'Channel']
lowest_cost = df.loc[df[df['Cost Per Customer'] > 0]['Cost Per Customer'].idxmin(), 'Channel']

print(f"• Highest Volume: {best_volume} ({df['Monthly Users'].max():,} users/month)")
print(f"• Best Conversion: {best_conversion} ({df['Conversion Rate'].max():.1%} conversion rate)")
print(f"• Most Cost-Effective: {lowest_cost} (${df[df['Cost Per Customer'] > 0]['Cost Per Customer'].min():.1f} per customer)")

In [None]:
# Visualize the data
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('📊 Acquisition Channel Performance', fontsize=16, fontweight='bold')

# Users by channel
ax1.bar(df['Channel'], df['Monthly Users'], color='skyblue')
ax1.set_title('👥 Users by Channel')
ax1.set_ylabel('Monthly Users')
ax1.tick_params(axis='x', rotation=45)

# Conversion rates
ax2.bar(df['Channel'], df['Conversion Rate'] * 100, color='lightgreen')
ax2.set_title('📈 Conversion Rates')
ax2.set_ylabel('Conversion Rate (%)')
ax2.tick_params(axis='x', rotation=45)

# Cost per customer (excluding free channels)
paid_channels = df[df['Cost Per Customer'] > 0]
ax3.bar(paid_channels['Channel'], paid_channels['Cost Per Customer'], color='salmon')
ax3.set_title('💰 Cost Per Customer')
ax3.set_ylabel('Cost ($)')
ax3.tick_params(axis='x', rotation=45)

# Total customers by channel
ax4.bar(df['Channel'], df['Customers'], color='gold')
ax4.set_title('🎯 Customers Acquired')
ax4.set_ylabel('Customers per Month')
ax4.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\n💡 STRATEGIC INSIGHTS:")
print("• Organic Search: High volume + free = invest in SEO content")
print("• Referrals: Best conversion rate = build referral program")
print("• Paid Ads: Expensive but scalable = optimize targeting")
print("• Content Marketing: Good conversion + reasonable cost = create more content")

## ⚡ Stage 2: ACTIVATION
**"Do users have a great first experience?"**

### What is Activation?
Activation is when a user **experiences the core value** of your product for the first time. It's that "Aha!" moment when they realize your product solves their problem.

### Why Activation Matters:
- **🚪 First Impression**: You only get one chance to make a good first impression
- **🔮 Predicts Success**: Users who activate are far more likely to become long-term customers (5-10x is an often-quoted range, but it varies by product)
- **⏱️ Time Sensitive**: Usually happens in the first few minutes or hours

### Key Metrics:
- **Activation Rate**: % of new users who complete the key action
- **Time to Value**: How long it takes for users to see value
- **First-Day Experience**: What users accomplish in their first session
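
The first two metrics can be computed directly from an event log. The sketch below is a minimal example that assumes a hypothetical `events` table with `user_id`, `event`, and `timestamp` columns, and assumes `first_project` is the key action. Your own schema and activation event will differ.

```python
import pandas as pd

# Hypothetical event log: one row per user action
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "event": ["signup", "first_project", "signup",
              "signup", "first_project", "signup"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:20",
        "2024-01-01 10:00",
        "2024-01-02 08:00", "2024-01-03 08:00",
        "2024-01-02 12:00",
    ]),
})

ACTIVATION_EVENT = "first_project"  # assumed key action

# First signup and first activation timestamp per user
signup_time = events[events["event"] == "signup"].groupby("user_id")["timestamp"].min()
activation_time = events[events["event"] == ACTIVATION_EVENT].groupby("user_id")["timestamp"].min()

# Activation rate: share of signed-up users who completed the key action
activation_rate = len(activation_time) / len(signup_time)

# Time to value: delay between signup and the key action (activated users only)
time_to_value = (activation_time - signup_time).dropna()
median_ttv = time_to_value.median()

print(f"Activation rate: {activation_rate:.0%}")
print(f"Median time to value: {median_ttv}")
```

The median is used rather than the mean because a few slow activators can otherwise dominate the figure.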

### Finding Your "Aha Moment"
To find your activation event, ask:
1. **What action predicts long-term retention?**
2. **When do users realize your product's core value?**
3. **What's the difference between users who stay vs. leave?**
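
One way to approach these questions is to compare retention between users who did and did not take each early action. The sketch below uses a small made-up per-user table. The column names (`created_project`, `invited_teammate`, `retained_30d`) are hypothetical, and a large gap shows correlation only, so it is a hypothesis to test rather than proof that the action causes retention.

```python
import pandas as pd

# Hypothetical per-user table: early actions taken (1 = yes) and whether
# the user was still active 30 days after signup
users = pd.DataFrame({
    "created_project":  [1, 1, 1, 0, 0, 1, 0, 0],
    "invited_teammate": [1, 1, 0, 0, 0, 0, 0, 1],
    "retained_30d":     [1, 1, 1, 0, 0, 0, 0, 1],
})

rows = []
for action in ["created_project", "invited_teammate"]:
    did = users.loc[users[action] == 1, "retained_30d"].mean()
    did_not = users.loc[users[action] == 0, "retained_30d"].mean()
    rows.append({
        "action": action,
        "retention_did": did,
        "retention_did_not": did_not,
        "gap": did - did_not,  # retention difference in percentage points / 100
    })

# Largest gap first: the strongest candidate for the activation event
comparison = (
    pd.DataFrame(rows)
    .sort_values("gap", ascending=False)
    .reset_index(drop=True)
)
print(comparison)
```

In practice you would also check how many users are in each group, since a big gap based on a handful of users is unreliable.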

### Real Example: Facebook's "7 Friends in 10 Days"

**The Discovery (2008)**: Facebook's growth team analyzed millions of users and found:
- Users who connected with **7+ friends within 10 days** had **90% retention**
- Users with fewer than 7 friends had only **20% retention**

**The Strategy**:
1. **🧑‍🤝‍🧑 Friend Suggestions**: Made finding friends the top priority
2. **📧 Email Import**: Simplified adding contacts from email
3. **🎯 Onboarding**: Guided new users to connect with friends first

**Results**:
- **Activation Rate**: Improved from 60% to 85%
- **User Growth**: From 58M to 250M users (2008-2010)
- **Business Impact**: $2.1B+ in additional lifetime value

### 💡 Practical Exercise: Finding the Aha Moment
Let's analyze user behavior to find the activation point:

In [None]:
# Sample user data for a project management tool
user_activation_data = {
    'Action': ['Account Created', 'Profile Setup', 'First Project Created', 
               'Invited Team Member', 'First Task Completed', 'Used Mobile App'],
    'Users Who Did Action': [1000, 750, 400, 250, 350, 180],
    '30-Day Retention Rate': [0.25, 0.35, 0.78, 0.85, 0.82, 0.90]
}

activation_df = pd.DataFrame(user_activation_data)

print("🔍 ACTIVATION ANALYSIS: Finding the 'Aha Moment'")
print("=" * 60)

for i, row in activation_df.iterrows():
    users = row['Users Who Did Action']
    retention = row['30-Day Retention Rate']
    print(f"📱 {row['Action']:20} | Users: {users:4} | Retention: {retention:5.1%}")

# Find the best activation event
best_activation = activation_df.loc[activation_df['30-Day Retention Rate'].idxmax()]

print(f"\n🎯 AHA MOMENT IDENTIFIED:")
print(f"📋 Action: {best_activation['Action']}")
print(f"👥 Users: {best_activation['Users Who Did Action']} people")
print(f"📈 Retention: {best_activation['30-Day Retention Rate']:.1%}")
print(f"\n💡 STRATEGY: Focus onboarding on getting users to '{best_activation['Action']}'!")

In [None]:
# Visualize activation funnel
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Activation funnel
ax1.barh(activation_df['Action'], activation_df['Users Who Did Action'], color='lightblue')
ax1.set_title('📊 Activation Funnel: User Drop-off')
ax1.set_xlabel('Number of Users')
ax1.invert_yaxis()

# Retention by action
colors = ['red' if x < 0.5 else 'orange' if x < 0.7 else 'green' for x in activation_df['30-Day Retention Rate']]
ax2.barh(activation_df['Action'], activation_df['30-Day Retention Rate'] * 100, color=colors)
ax2.set_title('📈 30-Day Retention by Action')
ax2.set_xlabel('Retention Rate (%)')
ax2.invert_yaxis()

plt.tight_layout()
plt.show()

print("\n🎯 OPTIMIZATION STRATEGY:")
print("1. 🚀 Priority: Get more users to the highest-retention actions (mobile app use, team invites)")
print("2. 📧 Tactic: Send email prompts after project creation")
print("3. 🎁 Incentive: Offer premium features for team invites")
print("4. 📱 UX: Make team invitation and the mobile app more prominent in onboarding")

## 🔄 Stage 3: RETENTION
**"Do users come back and keep using your product?"**

### What is Retention?
Retention measures whether users **continue to use your product over time**. It's one of the most important metrics because:
- **💰 More Valuable**: Retained users tend to spend more over time than new users
- **📊 Cheaper**: Acquiring a new user is commonly estimated to cost several times more than retaining an existing one (5-7x is an often-quoted rule of thumb)
- **🔮 Predictive**: High retention = sustainable business

### Key Metrics:
- **Day 1, 7, 30 Retention**: % of users still active after X days
- **Churn Rate**: % of users who stop using your product
- **Cohort Analysis**: Track groups of users over time

### Formulas:
```
Retention Rate = (Users from the starting cohort still active at end of period) ÷ (Users in the cohort at start of period)
Churn Rate = 1 - Retention Rate
```
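
To show how these formulas apply to raw data, here is a minimal sketch that derives Day 1 and Day 7 retention from an activity log. The `activity` table and the helper `retention_on` are assumptions for illustration. The sketch treats all users as one cohort and counts a user as retained on day N only if they were active exactly N days after their first visit.

```python
import pandas as pd

# Hypothetical activity log: one row per (user, active day)
activity = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4],
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-08",
        "2024-01-01", "2024-01-02",
        "2024-01-01",
        "2024-01-01", "2024-01-02",
    ]),
})

# Each user's first active day is day 0 of their lifecycle
first_seen = activity.groupby("user_id")["date"].transform("min")
activity["day_number"] = (activity["date"] - first_seen).dt.days

cohort_size = activity["user_id"].nunique()


def retention_on(day: int) -> float:
    """Share of the cohort that was active exactly `day` days after first use."""
    active = activity.loc[activity["day_number"] == day, "user_id"].nunique()
    return active / cohort_size


day1 = retention_on(1)
day7 = retention_on(7)
churn_day7 = 1 - day7  # Churn Rate = 1 - Retention Rate

print(f"Day 1 retention: {day1:.0%}")
print(f"Day 7 retention: {day7:.0%} (churn: {churn_day7:.0%})")
```

Some teams instead count a user as retained if they were active on day N or any later day. Either definition is workable as long as it is applied consistently.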

### The Three Types of Retention Curves:
1. **😊 Flattening Curve**: Initial drop, then levels off (GOOD - shows product-market fit); if it later turns back up as lapsed users return, it is often called a "smile" curve
2. **📉 Declining Curve**: Continuous decline toward zero (BAD - poor product-market fit)
3. **📏 Flat Curve**: High retention maintained from the start (EXCELLENT - strong habit formation)

### Real Example: Netflix's Content Strategy

**The Challenge**: Keep users engaged in a competitive streaming market

**Retention Strategy**:
1. **🎬 Personalized Content**: Algorithm recommends shows you'll love
2. **📺 Binge-Worthy Series**: Create shows designed for binge-watching
3. **🎨 Personalized Thumbnails**: Different artwork for different users
4. **🔄 Autoplay**: Reduces friction between episodes

**Results**:
- **Churn Rate**: Improved from 9% to 2.4% monthly
- **Viewing Time**: 2+ hours per day average
- **Business Impact**: Industry-leading retention enables premium pricing

### 💡 Practical Exercise: Cohort Retention Analysis
Let's analyze retention patterns for different user cohorts:

In [None]:
# Sample cohort retention data
cohort_data = {
    'Cohort': ['Jan 2024', 'Feb 2024', 'Mar 2024', 'Apr 2024', 'May 2024', 'Jun 2024'],
    'Users': [1000, 1200, 1500, 1800, 2000, 2200],
    'Day_1': [0.85, 0.87, 0.89, 0.91, 0.93, 0.94],
    'Day_7': [0.65, 0.68, 0.72, 0.75, 0.78, 0.81],
    'Day_30': [0.35, 0.38, 0.42, 0.45, 0.48, 0.52],
    'Day_90': [0.25, 0.28, 0.31, 0.34, 0.37, 0.40]
}

cohort_df = pd.DataFrame(cohort_data)

print("📊 COHORT RETENTION ANALYSIS")
print("=" * 50)
print(f"{'Cohort':10} | {'Users':6} | {'Day 1':6} | {'Day 7':6} | {'Day 30':7} | {'Day 90':7}")
print("-" * 50)

for i, row in cohort_df.iterrows():
    print(f"{row['Cohort']:10} | {row['Users']:6} | {row['Day_1']:5.1%} | {row['Day_7']:5.1%} | {row['Day_30']:6.1%} | {row['Day_90']:6.1%}")

# Calculate average retention rates
avg_retention = {
    'Day 1': cohort_df['Day_1'].mean(),
    'Day 7': cohort_df['Day_7'].mean(),
    'Day 30': cohort_df['Day_30'].mean(),
    'Day 90': cohort_df['Day_90'].mean()
}

print(f"\n📈 AVERAGE RETENTION RATES:")
for period, rate in avg_retention.items():
    print(f"• {period:6}: {rate:.1%}")

# Identify trends
improvement_30_day = (cohort_df['Day_30'].iloc[-1] - cohort_df['Day_30'].iloc[0]) / cohort_df['Day_30'].iloc[0]
print(f"\n🎯 RETENTION TREND:")
print(f"• 30-day retention improved by {improvement_30_day:.1%} (relative) from the Jan to the Jun cohort")
print(f"• Latest cohort: {cohort_df['Day_30'].iloc[-1]:.1%} (vs. {cohort_df['Day_30'].iloc[0]:.1%} in Jan)")

In [None]:
# Visualize retention curves
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Retention curve over time
periods = ['Day 1', 'Day 7', 'Day 30', 'Day 90']
latest_cohort = [cohort_df['Day_1'].iloc[-1], cohort_df['Day_7'].iloc[-1], 
                cohort_df['Day_30'].iloc[-1], cohort_df['Day_90'].iloc[-1]]
first_cohort = [cohort_df['Day_1'].iloc[0], cohort_df['Day_7'].iloc[0], 
               cohort_df['Day_30'].iloc[0], cohort_df['Day_90'].iloc[0]]

ax1.plot(periods, [x*100 for x in latest_cohort], marker='o', linewidth=3, label='Jun 2024 (Latest)', color='green')
ax1.plot(periods, [x*100 for x in first_cohort], marker='o', linewidth=3, label='Jan 2024 (First)', color='red')
ax1.set_title('📈 Retention Curve Comparison')
ax1.set_ylabel('Retention Rate (%)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Cohort performance over time
ax2.plot(cohort_df['Cohort'], cohort_df['Day_30'], marker='o', linewidth=3, color='blue')
ax2.set_title('📊 30-Day Retention by Cohort')
ax2.set_ylabel('30-Day Retention Rate')
ax2.tick_params(axis='x', rotation=45)
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 KEY INSIGHTS:")
print("✅ Retention is improving with each cohort - good sign!")
print("✅ Curves flatten after Day 30 (slower decline to Day 90) - an early sign of product-market fit")
print("🎯 Focus Area: Improve Day 7 → Day 30 retention (biggest drop-off)")
print("📋 Action Items: Analyze what users who are still active at Day 30 do differently")

## 📢 Stage 4: REFERRAL
**"Do users tell others about your product?"**

### What is Referral?
Referral is when existing users **recommend your product to others**. It's incredibly valuable because:
- **🆓 Free Growth**: Referred users cost almost nothing to acquire
- **🏆 Higher Quality**: Referred users have higher retention and lifetime value
- **🔗 Network Effects**: Each new user can bring more users

### Key Metrics:
- **Referral Rate**: % of users who refer others
- **Viral Coefficient (K-factor)**: How many new users each user brings
- **Net Promoter Score (NPS)**: How likely users are to recommend you
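
NPS is conventionally computed from a 0-10 "How likely are you to recommend us?" survey as the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). Here is a minimal sketch, where the function name and the survey responses are made up for illustration:

```python
def net_promoter_score(scores: list[int]) -> float:
    """Return NPS on the usual -100..100 scale from 0-10 survey scores."""
    if not scores:
        raise ValueError("at least one survey response is required")
    promoters = sum(1 for s in scores if s >= 9)   # scores 9-10
    detractors = sum(1 for s in scores if s <= 6)  # scores 0-6
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical survey: 4 promoters, 3 passives (7-8), 3 detractors
survey = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
nps = net_promoter_score(survey)
print(f"NPS: {nps:.0f}")  # NPS: 10
```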

### The Viral Coefficient Formula:
```
K = (Invites per User) × (Conversion Rate of Invites)

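If K > 1: Exponential viral growth 🚀 (each user brings in more than one new user)
If K = 1: Steady, linear growth 📈 (each user brings in exactly one new user)
If K < 1: Referrals amplify growth but cannot sustain it alone; you need other channels 🔧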
```
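
To see why the K = 1 threshold matters, the sketch below simulates a few referral cycles. It is a deliberately simplified model: the function `simulate_viral_growth` is hypothetical, only the newest users send invites in each cycle, and churn is ignored.

```python
def simulate_viral_growth(initial_users: int, k: float, cycles: int) -> list[float]:
    """Return cumulative users after each referral cycle.

    Assumes only the newest users invite others and that each of them
    brings in `k` new users on average.
    """
    total = float(initial_users)
    newest = float(initial_users)
    history = [total]
    for _ in range(cycles):
        newest = newest * k   # users gained in this cycle
        total += newest
        history.append(total)
    return history


for k in (0.5, 1.0, 1.5):
    final = simulate_viral_growth(1000, k, cycles=5)[-1]
    print(f"K = {k}: {final:,.0f} users after 5 cycles")
```

Under these assumptions, K = 0.5 levels off near 1000 ÷ (1 − 0.5) = 2,000 users, K = 1 adds a constant 1,000 users per cycle, and K = 1.5 keeps accelerating.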

### Real Example: Dropbox's Referral Program

**The Problem**: Customer acquisition cost was $233-388 per user through ads

**The Solution**: "Get Space, Give Space" referral program
- **🎁 Referrer Gets**: 500MB extra storage
- **🎁 New User Gets**: 500MB extra storage
- **💡 Smart Design**: Reward = core product value (storage)

**Results**:
- **Referral Rate**: 35% of users made referrals
- **Conversion**: 18% of invites became users
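- **Viral Coefficient**: K ≈ 0.35 × 0.18 = 0.063, assuming each referrer sends a single invite (K scales up with the number of invites per referrer)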
- **Growth**: 60% of new signups came from referrals
- **Cost**: Reduced CAC from $233+ to $4.50

### 💡 Practical Exercise: Viral Coefficient Calculation
Let's calculate the viral potential of different referral strategies:

In [None]:
# Different referral program scenarios
referral_scenarios = {
    'Program Type': ['No Program', 'Basic Sharing', 'Cash Reward', 'Product Reward', 'Two-Sided Reward'],
    'Users Who Refer': [0.02, 0.08, 0.15, 0.25, 0.35],  # % of users who make referrals
    'Invites per Referrer': [1.2, 2.1, 3.2, 2.8, 4.1],   # Average invites sent
    'Invite Conversion Rate': [0.05, 0.08, 0.12, 0.18, 0.22],  # % of invites that convert
    'Program Cost per User': [0, 0, 25, 8, 12]  # Cost to run the program
}

referral_df = pd.DataFrame(referral_scenarios)

# Calculate key metrics
referral_df['Total Invites per User'] = referral_df['Users Who Refer'] * referral_df['Invites per Referrer']
referral_df['Viral Coefficient (K)'] = referral_df['Total Invites per User'] * referral_df['Invite Conversion Rate']
referral_df['Monthly Viral Users'] = referral_df['Viral Coefficient (K)'] * 1000  # Assuming 1000 users
referral_df['Cost per Viral User'] = referral_df['Program Cost per User'] / referral_df['Viral Coefficient (K)']
referral_df['Cost per Viral User'] = referral_df['Cost per Viral User'].fillna(0)  # 0/0 (no cost, no referrals) yields NaN; treat it as free

print("🚀 VIRAL COEFFICIENT ANALYSIS")
print("=" * 80)
print(f"{'Program':17} | {'Refer %':7} | {'Invites':7} | {'Convert':8} | {'K-factor':8} | {'Cost/User':9}")
print("-" * 80)

for i, row in referral_df.iterrows():
    refer_pct = f"{row['Users Who Refer']:.1%}"
    invites = f"{row['Invites per Referrer']:.1f}"
    convert = f"{row['Invite Conversion Rate']:.1%}"
    k_factor = f"{row['Viral Coefficient (K)']:.3f}"
    cost = f"${row['Cost per Viral User']:.0f}" if row['Cost per Viral User'] > 0 else "Free"
    
    print(f"{row['Program Type']:17} | {refer_pct:7} | {invites:7} | {convert:8} | {k_factor:8} | {cost:9}")

# Find the best program
best_k = referral_df.loc[referral_df['Viral Coefficient (K)'].idxmax()]
best_roi = referral_df[referral_df['Cost per Viral User'] > 0]['Cost per Viral User'].idxmin()
best_roi_program = referral_df.loc[best_roi]

print(f"\n🏆 BEST VIRAL GROWTH: {best_k['Program Type']}")
print(f"📈 Viral Coefficient: {best_k['Viral Coefficient (K)']:.3f}")
print(f"👥 Monthly Viral Users: {best_k['Monthly Viral Users']:.0f} (from 1000 existing users)")

print(f"\n💰 BEST ROI: {best_roi_program['Program Type']}")
print(f"💵 Cost per Viral User: ${best_roi_program['Cost per Viral User']:.0f}")

In [None]:
# Visualize viral growth potential
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Viral coefficient by program type
colors = ['red' if k < 0.1 else 'orange' if k < 0.5 else 'green' for k in referral_df['Viral Coefficient (K)']]
bars = ax1.bar(referral_df['Program Type'], referral_df['Viral Coefficient (K)'], color=colors)
ax1.set_title('🚀 Viral Coefficient (K-factor) by Program')
ax1.set_ylabel('Viral Coefficient')
ax1.tick_params(axis='x', rotation=45)
ax1.axhline(y=1.0, color='red', linestyle='--', alpha=0.7, label='Breakeven (K=1)')
ax1.legend()

# Add value labels on bars
for bar, value in zip(bars, referral_df['Viral Coefficient (K)']):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 0.01,
             f'{value:.3f}', ha='center', va='bottom')

# Cost efficiency (excluding free programs)
paid_programs = referral_df[referral_df['Cost per Viral User'] > 0]
ax2.bar(paid_programs['Program Type'], paid_programs['Cost per Viral User'], color='skyblue')
ax2.set_title('💰 Cost per Viral User')
ax2.set_ylabel('Cost ($)')
ax2.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\n💡 STRATEGIC INSIGHTS:")
print("🎯 Two-Sided Rewards have the highest viral coefficient and the lowest cost per viral user")
print("💰 Product Rewards offer better ROI than cash rewards")
print("🚀 None achieve K>1, but significant growth boost possible")
print("📋 Recommendation: Start with Product Reward, optimize toward Two-Sided")

## 💰 Stage 5: REVENUE
**"How do you make money from users?"**

### What is Revenue in AARRR?
Revenue is how you **monetize your users** and turn your product into a sustainable business. It's the ultimate validation of value creation.

### Key Metrics:
- **Conversion Rate**: % of users who become paying customers
- **Average Revenue Per User (ARPU)**: How much each user pays on average
- **Customer Lifetime Value (CLV, also called LTV)**: Total revenue from a customer over time
- **Monthly Recurring Revenue (MRR)**: Predictable monthly revenue

### Essential Formulas:
```
ARPU = Total Revenue ÷ Total Users
CLV = ARPU ÷ Churn Rate (simplified)
LTV:CAC Ratio = Customer Lifetime Value ÷ Customer Acquisition Cost

Healthy LTV:CAC ratio = 3:1 or higher
```
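
The sketch below applies these formulas to one hypothetical product and shows how sensitive CLV is to churn. The function `unit_economics` and all input values are assumptions for illustration. The simplified CLV treats churn as constant and ignores margins and discounting.

```python
def unit_economics(arpu_monthly: float, monthly_churn: float, cac: float) -> dict:
    """Simplified unit economics from monthly ARPU, monthly churn, and CAC."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in (0, 1]")
    clv = arpu_monthly / monthly_churn        # CLV = ARPU ÷ Churn Rate
    return {
        "expected_lifetime_months": 1 / monthly_churn,
        "clv": clv,
        "ltv_cac": clv / cac,                 # healthy at 3:1 or higher
        "payback_months": cac / arpu_monthly, # months of revenue to recover CAC
    }


# Hypothetical product: $30 ARPU and $150 CAC, at three churn levels
for churn in (0.02, 0.05, 0.10):
    m = unit_economics(arpu_monthly=30, monthly_churn=churn, cac=150)
    print(f"churn {churn:.0%}: CLV ${m['clv']:,.0f}, LTV:CAC {m['ltv_cac']:.1f}:1")
```

Because churn sits in the denominator, halving it doubles CLV. This is one reason retention work often pays off more than it first appears.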

### Common Revenue Models:
- **🎫 Subscription**: Monthly/yearly recurring payments (Netflix, Spotify)
- **🆓➡️💰 Freemium**: Free version + paid premium (Spotify, Slack)
- **💳 Transaction**: Take % of each transaction (Stripe, Uber)
- **📺 Advertising**: Revenue from showing ads (Facebook, Google)
- **🛒 Marketplace**: Commission on sales (Amazon, Airbnb)

### Real Example: Spotify's Freemium Evolution

**The Challenge**: Compete with free piracy while paying music royalties

**Revenue Strategy Evolution**:

**Phase 1 (2008-2012)**: Basic Freemium
- Free: Unlimited streaming with ads
- Premium: $9.99/month, ad-free + offline
- Result: 25% conversion rate

**Phase 2 (2013-2018)**: Optimized Freemium
- Free: Limited skips, shuffle-only mobile
- Premium: Unlimited features
- Family Plan: $14.99 for 6 accounts
- Student: $4.99/month
- Result: 46% conversion rate

**Phase 3 (2019-2024)**: Platform Monetization
- Podcasts: Higher margin content
- Creator tools: New revenue streams
- Ad technology: Better targeting
- Result: Revenue grew from $5.3B to $13.2B

### 💡 Practical Exercise: Revenue Model Analysis
Let's analyze the unit economics of different revenue models:

In [None]:
# Sample revenue data for different business models
revenue_models = {
    'Model': ['Basic SaaS', 'Freemium SaaS', 'Marketplace', 'Ad-Supported', 'Premium Only'],
    'Monthly Users': [10000, 50000, 25000, 100000, 5000],
    'Conversion Rate': [0.15, 0.08, 0.35, 0.02, 0.45],  # % who pay
    'ARPU (Monthly)': [29.99, 12.50, 45.00, 2.30, 79.99],  # Average monthly revenue per paying user
    'Monthly Churn': [0.05, 0.03, 0.08, 0.12, 0.02],  # Monthly churn rate
    'CAC': [125, 45, 180, 15, 250]  # Customer acquisition cost
}

revenue_df = pd.DataFrame(revenue_models)

# Calculate key metrics
revenue_df['Paying Customers'] = revenue_df['Monthly Users'] * revenue_df['Conversion Rate']
revenue_df['Monthly Revenue'] = revenue_df['Paying Customers'] * revenue_df['ARPU (Monthly)']
revenue_df['CLV'] = revenue_df['ARPU (Monthly)'] / revenue_df['Monthly Churn']  # Simplified LTV
revenue_df['LTV:CAC Ratio'] = revenue_df['CLV'] / revenue_df['CAC']
revenue_df['Payback Period (Months)'] = revenue_df['CAC'] / revenue_df['ARPU (Monthly)']

print("💰 REVENUE MODEL COMPARISON")
print("=" * 85)
print(f"{'Model':15} | {'Users':7} | {'Convert':8} | {'ARPU':6} | {'Revenue':9} | {'LTV:CAC':8} | {'Payback':8}")
print("-" * 85)

for i, row in revenue_df.iterrows():
    users = f"{row['Monthly Users']/1000:.0f}K"
    convert = f"{row['Conversion Rate']:.1%}"
    arpu = f"${row['ARPU (Monthly)']:.0f}"
    revenue = f"${row['Monthly Revenue']/1000:.0f}K"
    ltv_cac = f"{row['LTV:CAC Ratio']:.1f}:1"
    payback = f"{row['Payback Period (Months)']:.1f}m"
    
    print(f"{row['Model']:15} | {users:7} | {convert:8} | {arpu:6} | {revenue:9} | {ltv_cac:8} | {payback:8}")

# Find best performers
highest_revenue = revenue_df.loc[revenue_df['Monthly Revenue'].idxmax()]
best_ltv_cac = revenue_df.loc[revenue_df['LTV:CAC Ratio'].idxmax()]
fastest_payback = revenue_df.loc[revenue_df['Payback Period (Months)'].idxmin()]

print(f"\n🏆 HIGHEST REVENUE: {highest_revenue['Model']} (${highest_revenue['Monthly Revenue']/1000:.0f}K/month)")
print(f"💎 BEST LTV:CAC: {best_ltv_cac['Model']} ({best_ltv_cac['LTV:CAC Ratio']:.1f}:1 ratio)")
print(f"⚡ FASTEST PAYBACK: {fastest_payback['Model']} ({fastest_payback['Payback Period (Months)']:.1f} months)")

In [None]:
# Visualize revenue model performance
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('💰 Revenue Model Performance Analysis', fontsize=16, fontweight='bold')

# Monthly revenue by model
bars1 = ax1.bar(revenue_df['Model'], revenue_df['Monthly Revenue']/1000, color='lightgreen')
ax1.set_title('📊 Monthly Revenue by Model')
ax1.set_ylabel('Revenue ($K)')
ax1.tick_params(axis='x', rotation=45)

# Add value labels
for bar, value in zip(bars1, revenue_df['Monthly Revenue']):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 5,
             f'${value/1000:.0f}K', ha='center', va='bottom')

# LTV:CAC Ratio
colors = ['red' if x < 3 else 'orange' if x < 5 else 'green' for x in revenue_df['LTV:CAC Ratio']]
bars2 = ax2.bar(revenue_df['Model'], revenue_df['LTV:CAC Ratio'], color=colors)
ax2.set_title('📈 LTV:CAC Ratio (Higher is Better)')
ax2.set_ylabel('LTV:CAC Ratio')
ax2.tick_params(axis='x', rotation=45)
ax2.axhline(y=3, color='red', linestyle='--', alpha=0.7, label='Minimum (3:1)')
ax2.legend()

# ARPU comparison
ax3.bar(revenue_df['Model'], revenue_df['ARPU (Monthly)'], color='skyblue')
ax3.set_title('💵 Average Revenue Per User')
ax3.set_ylabel('ARPU ($)')
ax3.tick_params(axis='x', rotation=45)

# Payback period
ax4.bar(revenue_df['Model'], revenue_df['Payback Period (Months)'], color='salmon')
ax4.set_title('⏱️ CAC Payback Period (Lower is Better)')
ax4.set_ylabel('Months to Payback')
ax4.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\n💡 KEY INSIGHTS:")
print("📊 Premium Only: High ARPU but limited scale")
print("🎯 Freemium SaaS: Best balance of scale and unit economics")
print("⚠️  Ad-Supported: High scale but challenging unit economics")
print("💰 Marketplace: High revenue per transaction but volatile")
print("\n🎯 RECOMMENDATION: Start with freemium, optimize conversion and retention")

## 🎯 Putting It All Together: Complete AARRR Analysis

Let's analyze a complete user journey through all 5 stages:

In [None]:
# Complete AARRR funnel analysis
aarrr_funnel = {
    'Stage': ['👥 Acquisition', '⚡ Activation', '🔄 Retention (30d)', '📢 Referral', '💰 Revenue'],
    'Users': [10000, 3500, 1400, 490, 280],  # Users at each stage
    'Conversion Rate': [1.0, 0.35, 0.40, 0.35, 0.57],  # Conversion from the previous stage
    'Industry Benchmark': [1.0, 0.25, 0.30, 0.15, 0.40],  # Typical industry rates
    'Monthly Value ($)': [0, 0, 0, 0, 8400]  # Revenue generated
}

aarrr_df = pd.DataFrame(aarrr_funnel)

# Calculate performance vs benchmark
aarrr_df['vs Benchmark'] = aarrr_df['Conversion Rate'] / aarrr_df['Industry Benchmark']
aarrr_df['Performance'] = ['Baseline'] + [
    '🟢 Above' if x > 1.1 else '🟡 Average' if x > 0.9 else '🔴 Below' 
    for x in aarrr_df['vs Benchmark'][1:]
]

print("🏴‍☠️ COMPLETE AARRR FUNNEL ANALYSIS")
print("=" * 70)
print(f"{'Stage':20} | {'Users':6} | {'Rate':6} | {'Benchmark':9} | {'Performance':11}")
print("-" * 70)

for i, row in aarrr_df.iterrows():
    stage = row['Stage']
    users = f"{row['Users']:,}"
    rate = f"{row['Conversion Rate']:.1%}" if i > 0 else "100%"
    benchmark = f"{row['Industry Benchmark']:.1%}" if i > 0 else "100%"
    performance = row['Performance']
    
    print(f"{stage:20} | {users:6} | {rate:6} | {benchmark:9} | {performance:11}")

# Calculate key business metrics
total_acquisition_cost = 10000 * 25  # $25 per user acquired
total_revenue = 280 * 30  # $30 per paying customer
roi = (total_revenue - total_acquisition_cost) / total_acquisition_cost

print(f"\n💰 BUSINESS METRICS:")
print(f"• Total Acquisition Cost: ${total_acquisition_cost:,}")
print(f"• Monthly Revenue: ${total_revenue:,}")
print(f"• First-Month ROI: {roi:.1%} (one month of revenue vs. total acquisition cost)")
print(f"• Conversion Funnel: 10,000 → 280 ({280/10000:.2%} overall)")

# Identify the biggest opportunity: the stage with the smallest lead over its benchmark
# (skip the Acquisition row, which is a 1.0 baseline by construction)
worst_performer = aarrr_df['vs Benchmark'].iloc[1:].idxmin()
opportunity_stage = aarrr_df.loc[worst_performer, 'Stage']
print(f"\n🎯 BIGGEST OPPORTUNITY: {opportunity_stage}")
print(f"📊 Current vs Benchmark: {aarrr_df.loc[worst_performer, 'vs Benchmark']:.2f}x")

In [None]:
# Visualize complete AARRR funnel
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# AARRR Funnel
ax1.fill_between(range(len(aarrr_df)), aarrr_df['Users'], alpha=0.7, color='lightblue')
ax1.plot(range(len(aarrr_df)), aarrr_df['Users'], marker='o', linewidth=3, markersize=8, color='blue')

# Add value labels
for i, (stage, users) in enumerate(zip(aarrr_df['Stage'], aarrr_df['Users'])):
    ax1.text(i, users + 200, f'{users:,}', ha='center', va='bottom', fontweight='bold')
    ax1.text(i, -500, stage.split(' ', 1)[1], ha='center', va='top', fontsize=10, rotation=0)

ax1.set_title('🏴‍☠️ AARRR Funnel: User Journey', fontsize=14, fontweight='bold')
ax1.set_ylabel('Number of Users')
ax1.set_xticks(range(len(aarrr_df)))
ax1.set_xticklabels([stage.split(' ')[0] for stage in aarrr_df['Stage']])
ax1.grid(True, alpha=0.3)

# Performance vs Benchmark
performance_data = aarrr_df[1:]  # Exclude acquisition baseline
x_pos = range(len(performance_data))

bars1 = ax2.bar([i - 0.2 for i in x_pos], performance_data['Conversion Rate'], 
                width=0.4, label='Your Performance', color='lightblue')
bars2 = ax2.bar([i + 0.2 for i in x_pos], performance_data['Industry Benchmark'], 
                width=0.4, label='Industry Benchmark', color='lightcoral')

ax2.set_title('📊 Performance vs Industry Benchmarks', fontsize=14, fontweight='bold')
ax2.set_ylabel('Conversion Rate')
ax2.set_xticks(x_pos)
ax2.set_xticklabels([stage.split(' ', 1)[1] for stage in performance_data['Stage']], rotation=45)
ax2.legend()
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n🎯 STRATEGIC RECOMMENDATIONS:")
print("1. 🟡 Priority: Retention has the smallest lead over its benchmark (1.33x)")
print("2. 🟢 Strength: Referral rate is the furthest above benchmark (2.33x)")
print("3. 🟢 Strength: Revenue conversion is above benchmark")
print("4. 📈 Focus: Raising 30-day retention from 40% to 50% = 25% more revenue, if later-stage rates hold")

## 🎓 Key Takeaways: When to Use AARRR

### ✅ AARRR Works Best For:
- **🚀 Early-stage startups** looking for product-market fit
- **📱 Consumer apps** with clear user journeys
- **💰 SaaS products** with subscription models
- **📊 Growth teams** wanting comprehensive metrics

### ❌ Consider Alternatives When:
- **🏢 Complex B2B sales** (6+ month sales cycles)
- **🎨 UX optimization focus** (use HEART framework)
- **🎯 Team alignment needs** (use North Star metric)
- **🏪 Two-sided marketplaces** (need separate funnels)

### 💡 Success Tips:
1. **📊 Start with data infrastructure** - you can't optimize what you can't measure
2. **🎯 Focus on one stage at a time** - don't try to optimize everything simultaneously
3. **👥 Get team alignment** - everyone should understand their AARRR responsibility
4. **🔄 Iterate constantly** - AARRR is about continuous improvement
5. **📈 Quality over quantity** - better users are more valuable than more users

### 🎯 Next Steps:
1. **Define your activation event** - what's your "Aha moment"?
2. **Set up tracking** - measure each AARRR stage properly
3. **Identify your biggest bottleneck** - where are you losing the most users?
4. **Run experiments** - A/B test improvements to your weakest stage
5. **Monitor and iterate** - AARRR is a continuous optimization process

---

**Remember**: AARRR is a framework for thinking about your entire user journey. It helps you see the big picture while identifying specific areas for improvement. The key is to use it as a guide, not a rigid rulebook! 🏴‍☠️