
# Importing libraries


In [None]:
import pandas as pd
import numpy as np

import plotly.graph_objects as go
import plotly.express as px

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

import pickle

pd.set_option('display.max_columns', None)

# Variables

In [None]:
non = "unknown"

# Functions

In [None]:
def calc_unit_economics(df: pd.DataFrame, spend_sum: float) -> dict:
    """
    Calculates key unit economics metrics based on deal data.

    :param df: A pandas DataFrame containing deal information
    :param spend_sum: Total advertising or marketing spend

    :return: A dictionary containing the following unit economics metrics:
             - UA   : Total number of leads or acquired users
             - B    : Number of unique paying customers
             - C1   : Conversion rate to payment, in percent
             - AC   : Total marketing spend
             - CPA  : Cost per acquisition
             - REV  : Total revenue from all paid deals
             - T    : Total number of paid deals
             - AOV  : Average order value
             - APC  : Average purchases per customer
             - CLTV : Customer lifetime value
             - LTV  : Adjusted LTV considering conversion rate
             - CM   : Contribution margin
    """
    ue = {}
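    # NOTE: UA is read from the global `deals` table, not from `df`, so every
    # product subset is measured against the same total lead count.
    ue["UA"] = len(deals)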
    paid = df[df['Stage'] == "Payment Done"]
    ue["B"] = paid['Contact Name'].nunique()
    ue["C1"] = (ue["B"] / ue["UA"] * 100) if ue["UA"] > 0 else 0
    ue["AC"] = spend_sum
    ue["CPA"] = ue["AC"] / ue["UA"] if ue["UA"] > 0 else 0
    ue["REV"] = paid['Offer Total Amount'].sum()
    ue["T"] = len(paid)
    ue["AOV"] = ue["REV"] / ue["T"] if ue["T"] > 0 else 0
    ue["APC"] = ue["T"] / ue["B"] if ue["B"] > 0 else 0
    ue["CLTV"] = ue["AOV"] * ue["APC"]
    ue["LTV"] = ue["CLTV"] * ue["C1"] / 100
    ue["CM"] = ue["UA"] * (ue["LTV"] - ue["CPA"])
    return ue

# Loading data

In [None]:
url = ("https://drive.google.com/file/d/1dXIHcg6AvTe4xSPg9CnndlQo3xzuFzRY/"
                                                            "view?usp=sharing")
url = url.split("/d/")[1].split("/")[0]
url = f"https://drive.google.com/uc?export=download&id={url}"
datasets = pd.read_pickle(url)

calls = datasets['calls']
contacts = datasets['contacts']
deals = datasets['deals']
spend = datasets['spend']

In [None]:
display(deals['Created Time'].min())
display(deals['Created Time'].max())

Timestamp('2022-10-11 00:00:00')

Timestamp('2024-06-21 15:30:00')

Find a growth point for your business and formulate a hypothesis for improving your business process to achieve growth metrics. Describe the testing mechanics, keeping in mind that the test should not take more than two weeks.

# Unit economics by product

In [None]:
paid_deals = deals[deals['Stage'] == "Payment Done"].copy()
spend_sum = spend['Spend'].sum()

total_ue = calc_unit_economics(deals, spend_sum)
total_ue['Product'] = "Total"
product_stats = []
for product, df in deals.groupby('Product', dropna=True, observed=True):
    ue = calc_unit_economics(df, spend_sum)
    ue['Product'] = product if pd.notna(product) else non
    product_stats.append(ue)
product_stats.append(total_ue)
product_df = pd.DataFrame(product_stats)
cols_order = ['Product', 'UA', 'B', 'C1', 'AC', 'CPA', 'REV', 'T', 'AOV',
                                                    'APC', 'CLTV', 'LTV', 'CM']
product_df = product_df[cols_order].sort_values('CM', ascending=False)

print("Unit economics for each product:")
display(product_df.style.format({
    'AC': '€{:,.2f}'.format,
    'CPA': '€{:,.2f}'.format,
    'REV': '€{:,.2f}'.format,
    'AOV': '€{:,.2f}'.format,
    'CLTV': '€{:,.2f}'.format,
    'LTV': '€{:,.2f}'.format,
    'CM': '€{:,.2f}'.format,
    'C1': '{:.2f}%'.format
}))

Unit economics for each product:


Unnamed: 0,Product,UA,B,C1,AC,CPA,REV,T,AOV,APC,CLTV,LTV,CM
5,Total,19667,839,4.27%,"€149,523.45",€7.60,"€6,287,801.00",853,"€7,371.40",1.016687,"€7,494.40",€319.71,"€6,138,277.55"
1,Digital Marketing,19667,467,2.37%,"€149,523.45",€7.60,"€3,887,800.00",472,"€8,236.86",1.010707,"€8,325.05",€197.68,"€3,738,276.55"
3,UX/UI Design,19667,224,1.14%,"€149,523.45",€7.60,"€1,818,900.00",227,"€8,012.78",1.013393,"€8,120.09",€92.48,"€1,669,376.55"
4,Web Developer,19667,134,0.68%,"€149,523.45",€7.60,"€581,100.00",136,"€4,272.79",1.014925,"€4,336.57",€29.55,"€431,576.55"
2,Find yourself in IT,19667,1,0.01%,"€149,523.45",€7.60,€1.00,1,€1.00,1.0,€1.00,€0.00,"€-149,522.45"
0,Data Analytics,19667,0,0.00%,"€149,523.45",€7.60,€0.00,0,€0.00,0.0,€0.00,€0.00,"€-149,523.45"
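
As a sanity check, the Total row can be re-derived by hand from the formulas in `calc_unit_economics`. The sketch below plugs in the raw inputs from that row (UA, B, T, REV, AC) and recomputes the derived metrics. It also illustrates that, under these formulas, CLTV reduces to REV / B, LTV to REV / UA, and CM to REV - AC.

```python
# Raw inputs copied from the "Total" row of the table above.
UA = 19667          # leads
B = 839             # unique paying customers
T = 853             # paid deals
REV = 6_287_801.00  # revenue, EUR
AC = 149_523.45     # marketing spend, EUR

C1 = B / UA * 100        # conversion to payment, %
CPA = AC / UA            # cost per lead
AOV = REV / T            # average order value
APC = T / B              # purchases per customer
CLTV = AOV * APC         # algebraically equal to REV / B
LTV = CLTV * C1 / 100    # algebraically equal to REV / UA
CM = UA * (LTV - CPA)    # algebraically equal to REV - AC

print(f"C1={C1:.2f}%  CPA={CPA:.2f}  AOV={AOV:,.2f}  APC={APC:.6f}")
print(f"CLTV={CLTV:,.2f}  LTV={LTV:.2f}  CM={CM:,.2f}")
```

Because CM collapses to REV - AC here, the per-product rows differ only through their revenue: each product is charged the full marketing spend.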


# Identifying growth points for business

In [None]:
url = ("https://drive.google.com/file/d/1UqRkwQl1in1M5UHIhvzfsQ4_rBwnaL7N/"
                                                            "view?usp=sharing")
url = url.split("/d/")[1].split("/")[0]
url = f"https://drive.google.com/uc?export=download&id={url}"
df = pd.read_csv(url)
df = df.fillna('')
df.head(37)

Unnamed: 0.1,Unnamed: 0,UA,B,C1,AC,CPA,REV,T,AOV,APC,CLTV,LTV,CM
0,Company-wide,19 667,826.0,"4,20%","€149 523,45","€7,60","€6 287 801,00",836.0,"€7 521,29",101.0,"€7 612,35","€319,71","€6 138 277,55"
1,,,,,,,,,,,,,
2,Digital Marketing,19 667,467.0,"2,37%","€149 523,45","€7,60","€3 887 800,00",472.0,"€8 236,86",101.0,"€8 325,05","€197,68","€3 738 276,55"
3,UX/UI Design,19 667,224.0,"1,14%","€149 523,45","€7,60","€1 818 900,00",227.0,"€8 012,78",101.0,"€8 120,09","€92,48","€1 669 376,55"
4,Web Developer,19 667,134.0,"0,68%","€149 523,45","€7,60","€581 100,00",136.0,"€4 272,79",101.0,"€4 336,57","€29,55","€431 576,55"
5,Find yourself in IT,19 667,1.0,"0,01%","€149 523,45","€7,60","€1,00",1.0,"€1,00",100.0,"€1,00","€0,00","-€149 522,45"
6,Data Analytics,19 667,0.0,"0,00%","€149 523,45","€7,60","€0,00",0.0,"€0,00",0.0,"€0,00","€0,00","-€149 523,45"
7,,,,,,,,,,,,,
8,Growth point search,10%,,,,,,,,,,,
9,,,,,,,,,,,,,


As the table shows, growth is possible across all three products (Digital Marketing, UX/UI Design, and Web Developer) through three metrics:
- C1 - Conversion
- AOV - Average Order Value
- APC - Average Number of Transactions Per Customer

Increasing the C1 metric (lead-to-customer conversion) is the most effective growth point, as it directly increases revenue without additional advertising costs.
C1 can be increased quickly by improving sales team performance, response time, and lead quality.
Unlike AOV and APC, increasing C1 produces a quick and measurable effect within 1-2 weeks, making it the optimal metric for short-term testing and profit growth.

# Business metrics tree

In [None]:
nodes = [
    "CM", "UA", "LTV", "CPA", "CLTV", "C1", "AOV", "APC", "REV", "T", "B", "AC"
]
edges = [
    ("UA", "CM"), ("LTV", "CM"), ("CPA", "CM"),
    ("CLTV", "LTV"), ("C1", "LTV"),
    ("AOV", "CLTV"), ("APC", "CLTV"),
    ("REV", "AOV"), ("T", "AOV"),
    ("T", "APC"), ("B", "APC"),
    ("AC", "CPA"), ("UA", "CPA")
]
sources = [nodes.index(a) for a, b in edges]
targets = [nodes.index(b) for a, b in edges]
fig = go.Figure(go.Sankey(
    node=dict(
        pad=25,
        thickness=25,
        line=dict(color="black", width=0.5),
        label=nodes,
        color=[
            "#4C72B0", "#55A868", "#C44E52", "#8172B3",
            "#CCB974", "#64B5CD", "#FFB347", "#FF6961",
            "#77DD77", "#AEC6CF", "#FFD1DC", "#CFCFC4"
        ],
        hovertemplate="%{label}<extra></extra>"
    ),
    link=dict(
        source=sources,
        target=targets,
        value=[1]*len(edges),
        color="rgba(150,150,150,0.4)"
    )
))
fig.update_layout(
    title_text="Unit Economics Metrics Tree (Interactive)",
    font=dict(size=14, color='black'),
    height=600,
    width=1000,
    template="plotly_white"
)
fig.show()

The resulting metric tree shows how the company's operational and financial metrics relate to one another.
The top-level business goal is to increase contribution margin (CM), which depends on the number of leads (UA), the lifetime value per lead (LTV), and the cost per acquisition (CPA).
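
The same edge list can also be walked programmatically to see which metrics feed a given node. The following is a minimal sketch under the assumption that the edges are exactly those of the Sankey cell; the helper name `drivers` is hypothetical and not part of the notebook.

```python
# Same edge list as the Sankey cell above: (input metric, metric it feeds).
edges = [
    ("UA", "CM"), ("LTV", "CM"), ("CPA", "CM"),
    ("CLTV", "LTV"), ("C1", "LTV"),
    ("AOV", "CLTV"), ("APC", "CLTV"),
    ("REV", "AOV"), ("T", "AOV"),
    ("T", "APC"), ("B", "APC"),
    ("AC", "CPA"), ("UA", "CPA"),
]

def drivers(metric: str) -> set:
    """Return every metric that feeds `metric`, directly or indirectly."""
    found = set()
    stack = [metric]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if dst == node and src not in found:
                found.add(src)
                stack.append(src)
    return found

# Leaves are metrics that nothing else feeds into in this edge list.
targets = {dst for _, dst in edges}
leaves = sorted(drivers("CM") - targets)

print("Drivers of LTV:", sorted(drivers("LTV")))
print("Leaf inputs of CM:", leaves)
```

C1 appears among the leaves only because the edge list does not encode that C1 is itself computed as B / UA.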

The main drivers of LTV are lead-to-customer conversion (C1), average order value (AOV), and average purchases per customer (APC).
The analysis shows that a 10% increase in C1 (conversion to payment) yields comparable or greater profit growth than a similar increase in AOV or APC.
C1 was therefore chosen as the growth point for the next business cycle.
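
The +10% comparison can be reproduced with a few lines. This is a sketch that assumes the formulas of `calc_unit_economics` and treats C1, AOV, and APC as independent multipliers; the helper `contribution_margin` is a hypothetical name, and the baseline comes from the Total row of the unit economics table.

```python
def contribution_margin(ua, c1, aov, apc, ac):
    """CM = UA * (AOV * APC * C1 - AC / UA), with C1 given as a fraction."""
    ltv = aov * apc * c1
    cpa = ac / ua
    return ua * (ltv - cpa)

# Baseline taken from the "Total" row of the unit-economics table.
base = dict(ua=19667, c1=839 / 19667, aov=6_287_801 / 853, apc=853 / 839,
            ac=149_523.45)
cm_base = contribution_margin(**base)

uplift = 0.10  # the +10% scenario discussed above
for lever in ("c1", "aov", "apc"):
    scenario = {**base, lever: base[lever] * (1 + uplift)}
    cm_new = contribution_margin(**scenario)
    print(f"+10% {lever.upper():<3} -> CM = {cm_new:,.2f} "
          f"({cm_new / cm_base - 1:+.2%})")
```

Since LTV is the product of the three levers, each +10% scenario yields the same CM in this model. The case for C1 therefore rests on how quickly it can be moved and measured, not on a larger arithmetic effect.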

# Formation of hypotheses

**1. Lead Processing Speed (SLA ≤ 15 minutes)**

Hypothesis: If a manager contacts a lead within the first 15 minutes of receipt, the likelihood of payment will increase.

Why: The SLA analysis shows a median of 19,603 seconds (about 5.4 hours), yet the 22% of leads processed in under 1 hour are more likely to reach "Payment Done."

Test Mechanics:
* Group A: Standard queue (average SLA)
* Group B: Priority calls ≤15 minutes after creation

Metric: C1, call completion rate, time to first contact.

Goal: +10% to C1.

Pros: Easy to implement, even virtually: add a "time_to_call" field and model the dependency.
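
A minimal sketch of the "time_to_call" field and the 15-minute SLA flag is shown below. The lead records, the constant `SLA_LIMIT`, and the helper `within_sla` are hypothetical; in the notebook the timestamps would come from the `deals` and `calls` tables.

```python
from datetime import datetime, timedelta
from typing import Optional

SLA_LIMIT = timedelta(minutes=15)  # threshold from the hypothesis

def time_to_call(created: datetime,
                 first_call: Optional[datetime]) -> Optional[timedelta]:
    """Delay between lead creation and the first call, or None if never called."""
    return None if first_call is None else first_call - created

def within_sla(created: datetime, first_call: Optional[datetime],
               limit: timedelta = SLA_LIMIT) -> bool:
    """True when the first call happened no later than `limit` after creation."""
    delay = time_to_call(created, first_call)
    return delay is not None and timedelta(0) <= delay <= limit

# Hypothetical leads: (lead id, created, first call).
leads = [
    ("L1", datetime(2024, 6, 1, 10, 0), datetime(2024, 6, 1, 10, 9)),
    ("L2", datetime(2024, 6, 1, 10, 5), datetime(2024, 6, 1, 15, 32)),
    ("L3", datetime(2024, 6, 1, 11, 0), None),
]
for lead_id, created, first_call in leads:
    print(lead_id, time_to_call(created, first_call),
          within_sla(created, first_call))
```

Comparing C1 between leads with and without the flag gives the dependency that the hypothesis relies on.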

**2. Offering an installment plan (Recurring Payments) during the call**

Hypothesis: If the sales rep immediately offers an installment plan (rather than "One Payment") during the first call, more leads will agree to purchase.

Why: According to payment data, C1 for Recurring Payments is ≈ 73%, while for One Payment it is ≈ 83%. This suggests that the product is expensive and that an installment plan lowers the barrier to entry.

Test Mechanics:
* Group A: Standard script without an emphasis on installment
* Group B: "Installment by default" script

Metrics: C1, share of "Recurring Payments," average order value (AOV).

Goal: +7–15% to C1.

Pros: You can virtually simulate a "price-sensitive segment" and test C1 growth by adding an installment plan.
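
The A/B split in the test mechanics needs a stable rule so that a lead stays in the same group for the whole test. One possible approach, sketched here as an assumption and not as the notebook's method, is to hash the lead identifier; `assign_group`, the salt string, and the lead ids are all hypothetical.

```python
import hashlib

def assign_group(lead_id: str, salt: str = "installment-test") -> str:
    """Map a lead id to 'A' or 'B' deterministically (roughly 50/50)."""
    digest = hashlib.sha256(f"{salt}:{lead_id}".encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"

# Hypothetical lead ids, used only to show that the split is stable and balanced.
ids = [f"lead-{i}" for i in range(10_000)]
groups = [assign_group(i) for i in ids]
share_b = groups.count("B") / len(groups)
print(f"Share of leads in group B: {share_b:.1%}")
```

Changing the salt produces a fresh, independent split, which helps when several tests run one after another.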

**3. Improving the first contact script (focus on the "book a seat" CTA)**

Hypothesis: A short, three-part script (value → relevance → action — "make a prepayment") increases conversion.

Why: Current data shows that only about 4% of leads complete payment, so C1 has substantial room to grow.

Test Mechanics:
* Group A: Current script
* Group B: Updated script with an emphasis on the next step and a clear CTA

Metrics: C1, average call duration, % of invoices assigned.

Goal: +8–12% to C1.

Pros: No costs required — only a change in communication logic (can be modeled as a payment probability coefficient).
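
Modelling the new script as a payment probability coefficient could look like the sketch below. The coefficient of 1.10 (inside the +8–12% goal), the number of simulated leads, and the helper `simulate_c1` are assumptions chosen for illustration; the base C1 is the overall value from the unit economics table.

```python
import random

def simulate_c1(n_leads: int, base_c1: float, coefficient: float,
                rng: random.Random) -> float:
    """Share of simulated leads that pay when P(pay) = base_c1 * coefficient."""
    p = min(base_c1 * coefficient, 1.0)
    paid = sum(rng.random() < p for _ in range(n_leads))
    return paid / n_leads

rng = random.Random(42)   # fixed seed so the run is repeatable
base_c1 = 0.0427          # overall C1 from the unit-economics table
n = 200_000               # hypothetical number of simulated leads per group

c1_a = simulate_c1(n, base_c1, 1.00, rng)   # group A: current script
c1_b = simulate_c1(n, base_c1, 1.10, rng)   # group B: assumed +10% coefficient
print(f"A: {c1_a:.2%}  B: {c1_b:.2%}  relative uplift: {c1_b / c1_a - 1:+.1%}")
```

The simulation only shows what the data would look like if the coefficient were real; it does not estimate the coefficient itself.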

# Selecting and testing a hypothesis

## Testing the 1st hypothesis: "Lead processing speed (SLA ≤ 15 minutes)"

In [None]:
df = (deals[deals['Product'].isin(["Digital Marketing", "UX/UI Design",
                                                    "Web Developer"])].copy())
total_leads = len(deals)
product_stats = (
    df.groupby('Product', observed=True)
    .agg(successful=('Stage', lambda x: (x == 'Payment Done').sum()))
    .reset_index()
)
product_stats['total_leads'] = total_leads
product_stats['daily_leads'] = total_leads / int((deals['Created Time'].max() -
                                        deals['Created Time'].min()).days  + 1)
product_stats['C1_base'] = (product_stats['successful'] /
                                                product_stats['total_leads'])

alpha = 0.05
power = 0.8
uplift = 1
days_test = 14
analysis = NormalIndPower()
results = []
for _, row in product_stats.iterrows():
    p1 = row['C1_base']
    if pd.isna(p1) or p1 <= 0:
        continue
    p2 = p1 * (1 + uplift)
    effect = proportion_effectsize(p1, p2)
    n_per_group = (analysis.solve_power(effect_size=effect, alpha=alpha,
                                                        power=power, ratio=1))
    n_total = n_per_group * 2
    daily = row['daily_leads']
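    # NOTE: dividing by daily * 2 assumes each group receives `daily` leads per
    # day; if the daily flow is split 50/50 between A and B, durations double.
    days_needed = (n_total / (daily * 2)) if daily > 0 else float('inf')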
    results.append({
        "Product": row['Product'],
        "Base C1": p1,
        "Target C1": p2,
        "Required per group": int(round(n_per_group)),
        "Total required": int(round(n_total)),
        "Daily leads": daily,
        "Days needed": days_needed,
        "Enough in 2 weeks?": "Yes" if days_needed <= days_test else "No"
    })
ab = pd.DataFrame(results).sort_values('Days needed')

ab_view = ab.copy()
ab_view['Base C1'] = (ab_view['Base C1']*100).map(lambda v: f"{v:.2f}%")
ab_view['Target C1'] = (ab_view['Target C1']*100).map(lambda v: f"{v:.2f}%")
ab_view['Daily leads'] = ab_view['Daily leads'].round(1)
ab_view['Days needed'] = ab_view['Days needed'].round(1)
print(f"A/B-testing plan: +{uplift * 100:.2f}% C1 uplift (per product)")
display(ab_view.reset_index(drop=True))

A/B-testing plan: +100.00% C1 uplift (per product)


Unnamed: 0,Product,Base C1,Target C1,Required per group,Total required,Daily leads,Days needed,Enough in 2 weeks?
0,Digital Marketing,2.40%,4.80%,919,1839,31.7,29.0,No
1,UX/UI Design,1.15%,2.31%,1948,3896,31.7,61.4,No
2,Web Developer,0.69%,1.38%,3274,6548,31.7,103.2,No


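* Even with the +100% uplift used in the table above, the required test duration ranges from 29 days (Digital Marketing) to about 103 days (Web Developer), well beyond the two-week window.
* With the current conversion rate (C1_base ≤ 2.5%) and the small expected effect set as the goal (+10%), the required sample grows to hundreds of thousands of leads for a power of 0.8.
* At an average inflow of ≈32 leads per day, such a test would take 2,000 to 7,500 days, that is, several years.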

Therefore, conducting a real A/B test with the current traffic and parameters is impractical.
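
The order of magnitude can be checked without statsmodels. The sketch below uses the closed-form normal approximation with Cohen's effect size h, which is an approximation of what `NormalIndPower.solve_power` solves numerically, so results may differ slightly. The helper `n_per_group` is a hypothetical name; the 620-day span is derived from the first and last `Created Time` values shown earlier.

```python
from math import asin, sqrt
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.8) -> float:
    """Approximate leads per group for a two-sided, two-sample z-test."""
    h = 2 * (asin(sqrt(p2)) - asin(sqrt(p1)))      # Cohen's effect size h
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / h) ** 2

base_c1 = 472 / 19667      # Digital Marketing: paid deals / all leads
daily_leads = 19667 / 620  # 620 calendar days between first and last deal

for uplift in (1.0, 0.1):
    n = n_per_group(base_c1, base_c1 * (1 + uplift))
    days = 2 * n / (daily_leads * 2)   # same duration formula as the cell above
    print(f"uplift {uplift:+.0%}: ~{round(n):,} leads per group, "
          f"~{days:,.0f} days")
```

For the +100% case the result lands close to the 919 leads per group reported in the table. Because the required sample scales with 1 / h², shrinking the uplift to +10% multiplies it by a factor of several dozen.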

**Recommendations:**

1. Increase the target uplift (e.g., simulate +50% or +100%) to test the hypothesis virtually or in a pilot. This reduces the required sample size and makes the test duration more realistic.
2. Focus on one product (e.g., Digital Marketing), where C1 is higher and the effect can be detected more quickly.
3. Conduct a pilot experiment on a limited group of managers to measure the actual uplift empirically, then use it to recalculate the power analysis.
4. With low traffic, a Bayesian approach or a simulation on aggregated historical data can be used.

## Testing the 2nd hypothesis: "Offer of installment plans (regular payments) at the call stage"

In [None]:
recurring = 0.73
one_payment = 0.83
uplift = 0.13
alpha = 0.05
power = 0.8
days_test = 14
daily_leads = len(deals) / int((deals['Created Time'].max() -
                                        deals['Created Time'].min()).days  + 1)
analysis = NormalIndPower()
results = []
for product, base_c1 in [("Recurring Payments", recurring),
                                                ("One Payment", one_payment)]:
    p1 = base_c1
    p2 = min(p1 * (1 + uplift), 0.95)
    effect = proportion_effectsize(p1, p2)
    n_per_group = analysis.solve_power(effect_size=effect, alpha=alpha,
                                                        power=power, ratio=1)
    n_total = n_per_group * 2
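    # NOTE: dividing by daily_leads * 2 assumes each group receives `daily_leads`
    # leads per day; with a 50/50 split of the daily flow, durations double.
    days_needed = n_total / (daily_leads * 2)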
    results.append({
        "Product": product,
        "Base C1": p1,
        "Target C1": p2,
        "Required per group": int(round(n_per_group)),
        "Total required": int(round(n_total)),
        "Daily leads": daily_leads,
        "Days needed": days_needed,
        "Enough in 2 weeks?": "Yes" if days_needed <= days_test else "No"
    })

ab = pd.DataFrame(results).sort_values("Days needed")
ab_view = ab.copy()
ab_view["Base C1"] = (ab_view["Base C1"]*100).map(lambda v: f"{v:.2f}%")
ab_view["Target C1"] = (ab_view["Target C1"]*100).map(lambda v: f"{v:.2f}%")
ab_view["Daily leads"] = ab_view["Daily leads"].round(1)
ab_view["Days needed"] = ab_view["Days needed"].round(1)

print(f"A/B-testing plan: +{uplift * 100:.2f}% C1 uplift (per payment type)")
display(ab_view.reset_index(drop=True))

A/B-testing plan: +13.00% C1 uplift (per payment type)


Unnamed: 0,Product,Base C1,Target C1,Required per group,Total required,Daily leads,Days needed,Enough in 2 weeks?
0,One Payment,83.00%,93.79%,131,262,31.7,4.1,Yes
1,Recurring Payments,73.00%,82.49%,298,597,31.7,9.4,Yes


* For the One Payment scenario, with a 13% increase in C1 (83.00% → 93.79%), about 131 leads per group are needed, which takes approximately 4 days at the current lead flow.
* For Recurring Payments (73.00% → 82.49%), about 298 leads per group are needed, which takes approximately 9.4 days.

Both scenarios fit within the two-week testing window.

The test is feasible:
* the hypothesis can be tested with current traffic, especially for the One Payment scenario.

Practical strategy:
* begin the pilot with the One Payment group, which needs the smallest sample;
* if C1 growth is confirmed, extend the test to Recurring Payments within the same two-week window.

Analytical focus:
* monitor not only C1 but also the impact on AOV and LTV to ensure that increased conversion does not reduce revenue.

Result: the "offer an installment plan on the first call" hypothesis is realistic to test and a priority for the pilot.
The test can be launched immediately, starting with the One Payment segment for a two-week period.
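
Once the pilot has run, the two groups can be compared with a pooled two-proportion z-test. This is a minimal sketch using only the standard library; the helper `two_proportion_ztest` and the payment counts are hypothetical and serve only to show the calculation at the planned sample size of 131 leads per group.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(paid_a: int, n_a: int, paid_b: int, n_b: int):
    """Pooled two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = paid_a / n_a, paid_b / n_b
    pooled = (paid_a + paid_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical pilot outcome with 131 leads per group (sample size from the table).
z, p_value = two_proportion_ztest(paid_a=109, n_a=131, paid_b=123, n_b=131)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A p-value below the chosen alpha of 0.05 would support rolling the script out; AOV and LTV should still be compared between groups before a final decision.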