<!-- metadata: title -->
# Summary of Registered Entities/Companies in Kenya

<!-- metadata: subtitle -->
> ### Analyzing Business Registration Trends Across Political Transitions

**Published Date:**
<!-- metadata: date -->
2024-10-10
<!-- metadata: -->

<!-- metadata: keywords, is_array=true -->
**Keywords:**
  - Kenya
  - business-registration
  - economic-trends
  - political-transitions
  - uhuru-kenyatta
  - william-ruto
  - jubilee-party
  - UDA-party

<!-- metadata: categories, is_array=true -->
**Categories:**
  - data science
  - economics
  - business
  - politics
  - east-africa

## Abstract

<!-- metadata: abstract -->
This analysis examines the patterns of business entity registrations in Kenya from financial year 2015/2016 to 2024/2025, spanning two distinct political eras. By visualizing data from the Business Registration Service (BRS)[^companies-registry-statistics], we uncover intriguing trends in the formation of various business entities, including private companies and business names, and their potential correlation with political shifts in the country.

[^companies-registry-statistics]: <https://brs.go.ke/companies-registry-statistics/> - [archive](https://web.archive.org/web/20241009120859/https://brs.go.ke/companies-registry-statistics/)

## Description

<!-- metadata: description -->
Dive into a decade of Kenya's economic landscape through the lens of business registrations. This post explores how political transitions between the Uhuru/Jubilee and Ruto/UDA eras may have influenced business formation trends, offering insights into the interplay between politics and entrepreneurship in East Africa's powerhouse.

## Introduction

The business registration landscape of a country often serves as a barometer for its economic health and entrepreneurial spirit. In Kenya, a nation known for its dynamic economy and political atmosphere, tracking these registrations can provide valuable insights into the country's economic trajectory and the impact of political transitions on business confidence. 

Since the post-election violence of 2007, which saw both the current president (William Ruto) and his predecessor (Uhuru Kenyatta) defend themselves at The Hague, the economy has slowed down around every general election. On election day itself, the economy literally stops: businesses close, along with related activities such as deliveries, money transfers, and investments. The Nairobi Securities Exchange (NSE) also loses significant investment during this period. Open markets are often closed, public transport is scarce, and most people travel back to rural areas, either to vote or for security reasons. During this time, you want to know your neighbour and to have familiar faces around you. People often group along tribal lines, predominantly Kikuyu, Kalenjin, and Luo. In this frenzied atmosphere, people are often less inclined to share personal information or opinions out loud.

This analysis delves into data from the Business Registration Service (BRS) of Kenya, covering a decade from financial year 2015/2016 to 2024/2025. This period is particularly interesting as it encompasses two distinct political eras: the Uhuru Kenyatta/Jubilee Party era and the William Ruto/UDA Party era. By examining the trends in business registrations across these periods, we aim to uncover patterns that may reflect the broader economic and political climate of Kenya.

## Methodology

Our analysis utilizes data scraped from the BRS website, focusing on monthly registration figures for various types of business entities. The data was processed and visualized using Python, with libraries such as pandas for data manipulation and matplotlib for creating insightful graphs.

The visualization process involved:
1. Aggregating monthly data across multiple financial years
2. Calculating rolling averages to smooth out short-term fluctuations
3. Plotting trends for different types of business entities
4. Marking significant political events, such as the 2017 and 2022 elections
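
As a minimal sketch of the smoothing step, the snippet below applies a 12-month rolling mean to a synthetic monthly series. The numbers are made up for illustration (they are not BRS figures), and the names `counts` and `rolling_12m` are assumptions rather than variables from the analysis.

```python
import numpy as np
import pandas as pd

# Synthetic monthly counts (NOT BRS data): a linear trend plus a 12-month cycle.
months = pd.date_range("2015-07-01", periods=36, freq="MS")
t = np.arange(len(months))
counts = pd.Series(
    5000 + 20 * t + 400 * np.sin(2 * np.pi * t / 12),
    index=months,
    name="business names",
)

# A 12-month rolling mean averages over one full cycle, so the seasonal
# swing cancels out and only the underlying trend remains.
rolling_12m = counts.rolling(window=12).mean()

print(rolling_12m.dropna().head(3).round(1))
```

The first 11 values are `NaN` because a full 12-month window is not yet available; shorter windows smooth less but lose fewer points at the start.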

## Analysis

In [17]:
#| code-summary: "Show python imports"

import sys
import os
from pathlib import Path

# Add root directory as python path
root_dir = os.path.abspath(Path(sys.executable).parents[2])
sys.path.append(root_dir)

%reload_ext autoreload
%autoreload 2

# Other imports
import pandas as pd
from pyppeteer.page import Page
from python_utils.web_screenshot import web_screenshot_async
import io
from urllib.request import urlopen
from bs4 import BeautifulSoup
import numpy as np
from datetime import date, timedelta
from calendar import monthrange, month_abbr
import matplotlib
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from typing import Callable
from statsmodels.tsa.seasonal import seasonal_decompose
import textwrap

In [None]:
async def page_action_fn(page: Page):
    return await page.waitForSelector(
        '.elementor-widget-container > [role="tablist"]')

# Take a screenshot
await web_screenshot_async(
    "https://brs.go.ke/companies-registry-statistics/",
    action = page_action_fn,
    width = 1200,
    height = 1200,
    # executable_path = '/snap/bin/chromium',
    screenshot_options = {'fullPage': False })

Crawl and map the financial years and the registered companies

In [2]:
# `read()` returns the raw response body as bytes; BeautifulSoup accepts bytes directly
html: bytes = urlopen(
    "https://brs.go.ke/companies-registry-statistics/").read()
html_parser = BeautifulSoup(html, "html.parser").select_one(
    '.elementor-widget-container > [role="tablist"]')
years = { 
    i.attrs['data-tab']: i.get_text(separator='', strip=True) 
    for i 
    in html_parser.select(".ha-tabs__nav .ha-tab__title")
}
records = { 
    i.attrs['data-tab']: i.find('table') 
    for i 
    in html_parser.select('.ha-tabs__content [role="tabpanel"]') 
}
years_records = [(year, records[id]) for id, year in years.items()]

Helper methods to create a DataFrame for each financial year, indexed by month-end dates

In [3]:
def get_date(month, year):
    # Convert month name to number
    month_num = [i.lower() for i in month_abbr].index(month.lower())
    # Get the last day of the month
    _, last_day = monthrange(int(year), month_num)
    return date(int(year), month_num, last_day)

def get_table(index: int):
    (financial_year, table_str) = years_records[index]
    (financial_year_1, financial_year_2) = financial_year.split('/')
    df = pd.read_html(io.StringIO(str(table_str)))[0]
    first_column = df.columns[0]
    # Remove last row (`Total Entities Registered`) and last column (`Grand Total`)
    df = df[df[first_column] != "Total Entities Registered"].drop("Grand Total", axis=1)
    # replace - with NaN
    df = df.replace("-", np.nan)
    df[first_column] = df[first_column].astype(str)
    for column in df.columns[1:]:
        df[column] = df[column].astype(float)
    # The first six month columns fall in the first calendar year, the rest in the second
    indexes = [get_date(month, financial_year_1) for month in df.columns[1:7]] + \
        [get_date(month, financial_year_2) for month in df.columns[7:]]
    df = df.set_index(first_column).T
    df.index = indexes
    df.columns = [i.lower().strip() for i in df.columns]
    return df
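
To make the date mapping concrete, here is a small standalone sketch of the same idea using only the standard library. It assumes the month columns arrive in financial-year order (July to June), which is what the slicing in `get_table` relies on; the helper name `financial_year_month_ends` is hypothetical and not part of the notebook.

```python
from calendar import month_abbr, monthrange
from datetime import date


def financial_year_month_ends(financial_year: str, months: list[str]) -> list[date]:
    """Map month abbreviations of a 'YYYY/YYYY' financial year to month-end dates.

    Assumes the months are listed in financial-year order, so the first six
    belong to the first calendar year and the rest to the second.
    """
    first_year, second_year = (int(y) for y in financial_year.split("/"))
    abbrs = [m.lower() for m in month_abbr]  # index 0 is '', so 'jan' is at 1
    result = []
    for position, month in enumerate(months):
        year = first_year if position < 6 else second_year
        month_num = abbrs.index(month.lower())
        last_day = monthrange(year, month_num)[1]
        result.append(date(year, month_num, last_day))
    return result


months = ["Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
          "Jan", "Feb", "Mar", "Apr", "May", "Jun"]
dates = financial_year_month_ends("2015/2016", months)
print(dates[0], dates[5], dates[6], dates[7], dates[-1])
# 2015-07-31 2015-12-31 2016-01-31 2016-02-29 2016-06-30
```

Using month-end dates keeps each monthly total aligned with the period it summarizes, and `monthrange` takes care of leap years such as February 2016.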

Merge the records of all financial years into a single DataFrame

In [None]:
all_registrations = pd.concat([get_table(i) for i in range(len(years_records))]).sort_index(ascending=True)
all_registrations.index = pd.to_datetime(all_registrations.index)
all_registrations

### Registered Entities over Time

In [54]:
def draw_election_lines(start_date, end_date, ax: matplotlib.axes.Axes, election_dates_y):
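    # Cover the whole plotted range, starting no later than 2017 so both eras are shaded
    fill_start = min(start_date, date(2017, 1, 1))
    date_list = [fill_start + timedelta(days=x) for x in range((end_date - fill_start).days + 1)]
    # Add a vertical line at each general election date
    election_date_2017 = date(2017, 8, 8)
    election_date_2022 = date(2022, 8, 9)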
    ax.axvline(
        x=election_date_2017, color='green', linestyle='--', linewidth=2, zorder=4)
    ax.axvline(
        x=election_date_2022, color='green', linestyle='--', linewidth=2, zorder=4)
    # Fill the regions
    # Convert dates to matplotlib date format
    dates_mpl = mdates.date2num(date_list)
    split_date_mpl = mdates.date2num(election_date_2022)
    ax.fill_between(
        dates_mpl, 0, 100, where=dates_mpl < split_date_mpl, 
        facecolor='#f62f3c', alpha=0.08, transform=ax.get_xaxis_transform())
    ax.fill_between(
        dates_mpl, 0, 100, where=dates_mpl >= split_date_mpl,
        facecolor='#f8c811', alpha=0.08, transform=ax.get_xaxis_transform())
    ax.text(
        election_date_2017, election_dates_y[0], '2017 election', fontsize = 18, 
        rotation=90, zorder=6, ha='right')
    ax.text(
        election_date_2022, election_dates_y[1], '2022 election', fontsize = 18, 
        rotation=90, zorder=6, ha='right')

def plot_data(
        data_to_plot: pd.DataFrame | list[pd.DataFrame], title_1st_part: str,
        election_dates_y = (7200, 8500), legend_loc='upper left',
        axis_callback: Callable[[matplotlib.figure.Figure, matplotlib.axes.Axes], None] = None,
        ylabel = 'Registered Entities'):
    fig, ax = plt.subplots(figsize=(20, 10))
    data_to_plot: list[pd.DataFrame] = data_to_plot if isinstance(data_to_plot, list) else [data_to_plot]
    columns_used = [j for i in data_to_plot for j in i.columns]
    start_date = min(min(i.index.date) for i in data_to_plot)
    end_date = max(max(i.index.date) for i in data_to_plot)
    ax.set_title(
        f'{title_1st_part} ({start_date.strftime("%b %Y")} to {end_date.strftime("%b %Y")})', 
        fontsize = 24, pad = 45)
    if columns_used:
        ax.text(0.5,  1.03, 
            f'{", ".join(columns_used[:-1]) + " and " + columns_used[-1]}' 
                if len(columns_used) > 1  else columns_used[0], 
            transform=ax.transAxes, fontsize=14, ha='center',  style='italic')
    for subdata in data_to_plot:
        ax.plot(subdata.index, subdata.values, linewidth=4)
    ax.legend(columns_used, loc=legend_loc, fontsize=18)
    draw_election_lines(start_date, end_date, ax, election_dates_y)
    fig.text(0.72, 0.035, 'Ruto/UDA era', fontsize = 20, ha='left')
    fig.text(0.35, 0.035, 'Uhuru/Jubilee era', fontsize = 20, ha='left')
    # Add a watermark near the bottom-right corner of the plot
    ax.text(1, 0.2, 'ToKnow.ai', ha='right', va='bottom', 
        fontsize=18, color='gray', alpha=0.5, transform=ax.transAxes, rotation=50)
    ax.set_xlabel('Years')
    ax.set_ylabel(ylabel)
    if axis_callback:
        axis_callback(fig, ax)

In [None]:
#| label: preview-image

plot_columns = ['business names', 'private companies']
other_plot_columns = list(set(all_registrations.columns) - set(plot_columns))
plot_data(
    data_to_plot = all_registrations[plot_columns].dropna(),
    title_1st_part = 'Registered Entities in Kenya over Time')

In [None]:
plot_data(
    data_to_plot = [
        all_registrations[[column]].dropna()
        for column 
        in other_plot_columns
    ],
    title_1st_part = 'Registered Entities in Kenya over Time',
    election_dates_y = (100, 250))

### Total Registrations of Entities

In [None]:
plot_data(
    data_to_plot = 
        all_registrations[plot_columns].dropna().resample('YE').sum().dropna(),
    title_1st_part = 'Total Entity Registrations per Year in Kenya',
    election_dates_y = (60000, 60000))

In [None]:
plot_data(
    data_to_plot = [
        all_registrations[[column]].dropna().resample('YE').sum().dropna() 
        for column 
        in other_plot_columns
    ],
    title_1st_part = 'Total Entity Registrations per Year in Kenya',
    election_dates_y = (900, 900))

### Year-over-Year Growth Rate of Business Entity Registrations

In [None]:
plot_data(
    data_to_plot = all_registrations[plot_columns].dropna().resample('YE').sum().dropna().pct_change() * 100,
    title_1st_part = 'Year-over-Year Growth Rate of Business Entity Registrations',
    election_dates_y = (-20, 40),
    legend_loc = 'upper right',
    axis_callback= lambda fig, ax: ax.axhline(y=0, color='purple', linestyle='--', linewidth=.5))

In [None]:
plot_data(
    data_to_plot = [
        all_registrations[[column]].dropna().resample('YE').sum().dropna().pct_change() * 100
        for column 
        in other_plot_columns
    ],
    title_1st_part = 'Year-over-Year Growth Rate of Business Entity Registrations',
    election_dates_y = (450, 300),
    legend_loc = 'upper right',
    axis_callback= lambda fig, ax: ax.axhline(y=0, color='purple', linestyle='--', linewidth=.5))

### Trend, Seasonality and Residuals

In [None]:
def plot_trend_and_seasonality(
        columns_used: list[str], trend_period = 12, title_wrap: int = None,
        title_args = { 'fontsize': 18, 'fontweight': 'bold', 'fontstyle': 'italic' },
        election_dates_y_trend = (6000, 4600), election_dates_y_resid = (1100, 1100)):
    fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(15, 20))
    start_date: date = None
    end_date: date = None
    actual_columns_used = []
    for column_used in columns_used:
        plot_series = all_registrations[column_used].dropna()
        if len(plot_series) < 2 * trend_period:
            continue
        actual_columns_used.append(column_used)
        __start_date = min(plot_series.index.date)
        __end_date = max(plot_series.index.date)
        # Track the earliest start and the latest end across all plotted columns
        start_date = min(start_date or __start_date, __start_date)
        end_date = max(end_date or __end_date, __end_date)
        decomposition = seasonal_decompose(
            plot_series, model='additive', period=trend_period)
        # Plot trend component
        ax1.plot(decomposition.trend.index, decomposition.trend.values, label = column_used)
        ax1.set_ylabel('Trend')
        ax1.set_xlabel('Years')
        # Plot seasonal component
        ax2.plot(decomposition.seasonal.index, decomposition.seasonal.values, label = column_used)
        ax2.axhline(y=0, color='r', linestyle='--', linewidth=.5)
        ax2.set_ylabel('Seasonality')
        ax2.set_xlabel('Years')
        # Plot residual component
        ax3.plot(decomposition.resid.index, decomposition.resid.values, label = column_used)
        ax3.axhline(y=0, color='r', linestyle='--', linewidth=.5)
        ax3.set_ylabel('Residuals')
        ax3.set_xlabel('Years')
    if start_date or end_date:
        draw_election_lines(start_date, end_date, ax1, election_dates_y_trend)
        draw_election_lines(start_date, end_date, ax3, election_dates_y_resid)
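        # Build the column label first so the '{0} - ' prefix is kept
        # even when only one column is plotted
        columns_label = (
            f'{", ".join(actual_columns_used[:-1])} & {actual_columns_used[-1]}'
            if len(actual_columns_used) > 1  else actual_columns_used[0]
        )
        title_template = '{0} - ' + columns_label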
        get_title = lambda s1: "\n".join(textwrap.wrap(title_template.format(s1), width=title_wrap)) \
            if title_wrap \
            else title_template.format(s1)
        ax1.set_title(get_title('Trend'), **title_args)
        ax2.set_title(get_title('Seasonality'), **title_args)
        ax3.set_title(get_title('Residuals'), **title_args)
        ax1.legend(fontsize=16)
        ax2.legend(fontsize=16)
        ax3.legend(fontsize=16)
        fig.tight_layout(h_pad=5, pad=4)
        fig.text(
            x = .5, y = 1, fontsize = 26, ha='center',
            s = (
                f'{trend_period}-Month Summary of Registered Entities in Kenya '
                f'({start_date.strftime("%b %Y")} to {end_date.strftime("%b %Y")})'
            ))
    else:
        # Nothing to plot: close the empty figure so it is not rendered
        plt.close(fig)

plot_trend_and_seasonality(plot_columns)

In [None]:
plot_trend_and_seasonality(
    other_plot_columns, title_wrap = 100,
    title_args = { 'fontsize': 14, 'fontweight': 'bold', 'fontstyle': 'italic' })

## Key Findings

### 1. Dominance of Business Names and Private Companies

The analysis reveals that the most common forms of business registration in Kenya are business names and private companies. These two categories consistently outpace other forms of registration, indicating a preference for simpler business structures among Kenyan entrepreneurs.

### 2. Impact of Election Periods

Both the 2017 and 2022 elections appear to have had noticeable impacts on business registration trends:

- There's a visible dip in registrations around the 2017 election period, possibly indicating uncertainty in the business community.
- The 2022 election seems to have had a less pronounced effect, with registration numbers maintaining a relatively steady trend.
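
One way to put a number on such a dip is to compare mean registrations in a window around the election with the preceding 12 months. The sketch below does this on synthetic data with an artificial dip built in (not BRS figures); the function `window_vs_baseline` and its window lengths are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly registrations (NOT BRS figures), with an artificial dip
# around August 2017 so the comparison has something to detect.
index = pd.date_range("2016-07-01", "2018-06-01", freq="MS")
series = pd.Series(np.full(len(index), 6000.0), index=index, name="business names")
series.loc["2017-07":"2017-10"] = 4500.0  # assumed dip, for illustration only

election = pd.Timestamp("2017-08-08")


def window_vs_baseline(series, event, months_before=1, months_after=2, baseline_months=12):
    """Compare mean registrations around an event with the preceding baseline."""
    window_start = (event - pd.DateOffset(months=months_before)).replace(day=1)
    window_end = event + pd.DateOffset(months=months_after)
    baseline_start = window_start - pd.DateOffset(months=baseline_months)
    window = series.loc[window_start:window_end]
    baseline = series.loc[baseline_start:window_start - pd.Timedelta(days=1)]
    change_pct = (window.mean() / baseline.mean() - 1) * 100
    return window.mean(), baseline.mean(), change_pct


window_mean, baseline_mean, change_pct = window_vs_baseline(series, election)
print(f"window mean={window_mean:.0f}, baseline mean={baseline_mean:.0f}, change={change_pct:.1f}%")
```

Running the same comparison for both elections would allow the 2017 and 2022 effects to be stated as percentages instead of being read off the chart, though a simple before-and-after comparison cannot separate the election from seasonality or other events.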

### 3. Transition Between Political Eras

The transition from the Uhuru/Jubilee era to the Ruto/UDA era is marked by interesting shifts in registration patterns:

- The Uhuru/Jubilee era (pre-2022) shows a general upward trend in registrations, particularly for business names and private companies.
- The early Ruto/UDA era (post-2022) exhibits some volatility in registration numbers, with a slight downward trend observable in some categories.
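
A rough way to check the direction of each era's trend is to fit a straight line to the monthly series on either side of the transition. The sketch below uses the 2022 election date as the split, as the plots do, and runs on synthetic data (not BRS figures); `monthly_slope` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Synthetic monthly registrations (NOT BRS figures): rising before the
# 2022 election and gently falling afterwards, purely for illustration.
index = pd.date_range("2020-08-01", "2024-07-01", freq="MS")
t = np.arange(len(index))
values = np.where(t < 25, 5000 + 50 * t, 6200 - 30 * (t - 24))
series = pd.Series(values.astype(float), index=index, name="business names")


def monthly_slope(segment: pd.Series) -> float:
    """Least-squares slope of the segment, in registrations per month."""
    x = np.arange(len(segment))
    slope, _intercept = np.polyfit(x, segment.to_numpy(), deg=1)
    return float(slope)


transition = pd.Timestamp("2022-08-09")
eras = {
    "Uhuru/Jubilee": series[series.index < transition],
    "Ruto/UDA": series[series.index >= transition],
}
for name, segment in eras.items():
    print(f"{name}: mean={segment.mean():.0f}, slope={monthly_slope(segment):+.1f} per month")
```

A positive slope corresponds to an upward trend and a negative one to a decline. With only a couple of years in the later era, such a slope is sensitive to individual months and is best read as indicative.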

### 4. Resilience of the Entrepreneurial Spirit

Despite political transitions and global events (such as the COVID-19 pandemic, which falls within this period), the overall trend of business registrations remains positive. This suggests a robust entrepreneurial spirit in Kenya that persists through various challenges.

## Implications and Discussion

1. **Political Stability and Business Confidence**: The relatively minor impact of the 2022 election compared to 2017 might indicate growing confidence in Kenya's political stability among entrepreneurs.

2. **Economic Policy Impact**: The shifts in registration trends between political eras could reflect changes in economic policies or business environment perceptions under different administrations.

3. **Formalization of the Economy**: The consistent growth in formal business registrations may indicate a gradual formalization of Kenya's economy, a key goal for many developing nations.

4. **Entrepreneurial Resilience**: The ability of Kenya's business sector to maintain growth through political transitions and global crises speaks to the resilience and adaptability of its entrepreneurs.

## Conclusion

This analysis of Kenya's business registration trends offers a unique window into the country's economic dynamics and the interplay between politics and entrepreneurship. While political transitions clearly have some impact on business formation, the overall trend suggests a robust and growing formal business sector in Kenya.

As Kenya continues to position itself as a key economic player in East Africa, understanding these trends can be crucial for policymakers, investors, and entrepreneurs alike. Future research could delve deeper into sector-specific trends or compare Kenya's patterns with those of neighboring countries to provide a more comprehensive regional perspective.

The story of Kenya's business registrations is one of resilience, growth, and adaptation – a testament to the entrepreneurial spirit that continues to drive the nation's economic development.

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Reuses the `all_registrations` DataFrame built in the cells above

# Select the 'business names' column and ensure the index is a DatetimeIndex
plot_series = all_registrations['business names'].copy()
plot_series.index = pd.to_datetime(plot_series.index)

# Keep only the span that has data, then interpolate any gaps inside it
# (seasonal_decompose does not accept missing values)
plot_series = plot_series.loc[plot_series.first_valid_index():plot_series.last_valid_index()]
business_names_filled = plot_series.interpolate()

# Perform time series decomposition
decomposition = seasonal_decompose(business_names_filled, model='additive', period=12)

# Create a figure with one subplot each for the data, trend, seasonality and residuals
fig, (ax0, ax1, ax2, ax3) = plt.subplots(4, 1, figsize=(15, 20))

# Plot original data
plot_series.plot(ax=ax0)
business_names_filled.plot(ax=ax0, alpha=0.7)
ax0.set_title('Original Time Series (Blue: Original, Orange: Interpolated)')
ax0.set_ylabel('Number of Registrations')
ax0.legend(['Original', 'Interpolated'])

# Plot trend component
decomposition.trend.plot(ax=ax1)
ax1.set_title('Trend')
ax1.set_ylabel('Trend')

# Plot seasonal component
decomposition.seasonal.plot(ax=ax2)
ax2.set_title('Seasonality')
ax2.set_ylabel('Seasonality')

# Plot residual component
decomposition.resid.plot(ax=ax3)
ax3.set_title('Residuals')
ax3.set_ylabel('Residuals')

plt.tight_layout()
plt.show()

# Calculate and print the average monthly seasonal effect
# (the additive seasonal component repeats every 12 months, so the first
# 12 values cover one full cycle; label them with their actual month)
seasonal_effect = decomposition.seasonal.iloc[:12]
print("Average monthly seasonal effect:")
for month_end, effect in seasonal_effect.items():
    print(f"{month_end.strftime('%b')}: {effect:.2f}")

# Calculate and print the overall trend
trend_values = decomposition.trend.dropna()
start_trend = trend_values.iloc[0]
end_trend = trend_values.iloc[-1]
total_growth = end_trend - start_trend
# Annualize over the span actually covered by the trend estimate
average_annual_growth = total_growth / ((len(trend_values) - 1) / 12)

print(f"\nOverall trend growth: {total_growth:.2f}")
print(f"Average annual growth: {average_annual_growth:.2f}")

# Check if growth rate is decreasing
trend_values = decomposition.trend.dropna()
growth_rates = trend_values.pct_change(periods=12)  # Year-over-year growth rate

print("\nIs growth rate decreasing?")
print(f"Start of period growth rate: {growth_rates.iloc[12]*100:.2f}%")
print(f"End of period growth rate: {growth_rates.iloc[-1]*100:.2f}%")

if growth_rates.iloc[-1] < growth_rates.iloc[12]:
    print("Yes, the growth rate is decreasing.")
else:
    print("No, the growth rate is not decreasing.")

# Additional analysis: Calculate average growth rate for first and last year
first_year_growth = growth_rates.iloc[12:24].mean()
last_year_growth = growth_rates.iloc[-12:].mean()

print(f"\nAverage growth rate in first year: {first_year_growth*100:.2f}%")
print(f"Average growth rate in last year: {last_year_growth*100:.2f}%")

if last_year_growth < first_year_growth:
    print("The average growth rate has decreased over the period.")
else:
    print("The average growth rate has not decreased over the period.")

In [None]:
import pandas as pd
from matplotlib import pyplot as plt

# Reuses the `all_registrations` DataFrame built in the cells above

# Select the 'business names' column and ensure the index is a DatetimeIndex
plot_series = all_registrations['business names'].copy()
plot_series.index = pd.to_datetime(plot_series.index)

# Resample the data to yearly frequency, summing up the monthly values
# ('YE' is the year-end alias used in the cells above; it replaces the deprecated 'Y')
yearly_data = plot_series.resample('YE').sum()

# Calculate Year-over-Year (YoY) Growth Rate
yoy_growth_rate = yearly_data.pct_change() * 100

# Create a DataFrame with both the yearly totals and growth rates
result_df = pd.DataFrame({
    'Total Registrations': yearly_data,
    'YoY Growth Rate (%)': yoy_growth_rate
})

# Print the results
print(result_df)

# Plotting
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))

# Plot total registrations
ax1.plot(yearly_data.index, yearly_data.values, marker='o')
ax1.set_title('Total Business Name Registrations per Year')
ax1.set_xlabel('Year')
ax1.set_ylabel('Number of Registrations')

# Plot YoY growth rate
ax2.plot(yoy_growth_rate.index, yoy_growth_rate.values, marker='o', color='green')
ax2.axhline(y=0, color='r', linestyle='--')  # Add a horizontal line at y=0
ax2.set_title('Year-over-Year Growth Rate of Business Name Registrations')
ax2.set_xlabel('Year')
ax2.set_ylabel('Growth Rate (%)')

plt.tight_layout()
plt.show()

# Calculate average growth rate
average_growth_rate = yoy_growth_rate.mean()
print(f"\nAverage YoY Growth Rate: {average_growth_rate:.2f}%")

# Check if growth rate is decreasing
is_decreasing = yoy_growth_rate.iloc[-1] < yoy_growth_rate.iloc[1]
print(f"Is the growth rate decreasing? {'Yes' if is_decreasing else 'No'}")

# Calculate the change in growth rate
growth_rate_change = yoy_growth_rate.iloc[-1] - yoy_growth_rate.iloc[1]
print(f"Change in growth rate from first to last year: {growth_rate_change:.2f} percentage points")

# Identify years with highest and lowest growth rates
max_growth_year = yoy_growth_rate.idxmax().year
min_growth_year = yoy_growth_rate.idxmin().year
print(f"\nYear with highest growth rate: {max_growth_year} ({yoy_growth_rate.max():.2f}%)")
print(f"Year with lowest growth rate: {min_growth_year} ({yoy_growth_rate.min():.2f}%)")

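# Calculate the proportion of years with positive growth
# (the first year has no prior year to compare against, so it is excluded)
valid_growth = yoy_growth_rate.dropna()
positive_growth_years = (valid_growth > 0).sum()
total_years = len(valid_growth)
proportion_positive = positive_growth_years / total_years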
print(f"\nProportion of years with positive growth: {proportion_positive:.2%}")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Reuses the `all_registrations` DataFrame built in the cells above

# Select the 'business names' column and ensure the index is a DatetimeIndex
plot_series = all_registrations['business names'].copy()
plot_series.index = pd.to_datetime(plot_series.index)

# Resample the data to yearly frequency, summing up the monthly values
# ('YE' is the year-end alias used in the cells above; it replaces the deprecated 'Y')
yearly_data = plot_series.resample('YE').sum()

def calculate_cagr(start_value, end_value, num_years):
    """Calculate the Compound Annual Growth Rate"""
    return (end_value / start_value) ** (1 / num_years) - 1

# Calculate overall CAGR
overall_cagr = calculate_cagr(yearly_data.iloc[0], yearly_data.iloc[-1], len(yearly_data) - 1)

# Calculate rolling CAGR for 3-year periods
rolling_cagr = yearly_data.rolling(window=3).apply(lambda x: calculate_cagr(x.iloc[0], x.iloc[-1], 2))

# Print results
print(f"Overall CAGR: {overall_cagr:.2%}")
print("\nRolling 3-Year CAGR:")
print(rolling_cagr)

# Plotting
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))

# Plot total registrations
ax1.plot(yearly_data.index, yearly_data.values, marker='o')
ax1.set_title('Total Business Name Registrations per Year')
ax1.set_xlabel('Year')
ax1.set_ylabel('Number of Registrations')

# Plot rolling CAGR
ax2.plot(rolling_cagr.index, rolling_cagr.values, marker='o', color='green')
ax2.axhline(y=0, color='r', linestyle='--')  # Add a horizontal line at y=0
ax2.set_title('3-Year Rolling CAGR of Business Name Registrations')
ax2.set_xlabel('Year')
ax2.set_ylabel('CAGR')

plt.tight_layout()
plt.show()

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Reuses the `all_registrations` DataFrame built in the cells above

# Select the 'business names' column and ensure the index is a DatetimeIndex
plot_series = all_registrations['business names'].copy()
plot_series.index = pd.to_datetime(plot_series.index)

# Resample the data to yearly frequency, summing up the monthly values
# ('YE' is the year-end alias used in the cells above; it replaces the deprecated 'Y')
yearly_data = plot_series.resample('YE').sum()

# Convert years to numerical values for regression
X = np.arange(len(yearly_data)).reshape(-1, 1)
y = yearly_data.values.reshape(-1, 1)

def fit_segmented_regression(X, y, breakpoint):
    # Use a 1-D boolean mask so the selected rows keep their 2-D (n, 1) shape,
    # which LinearRegression requires
    first_segment = X.ravel() <= breakpoint
    X1, y1 = X[first_segment], y[first_segment]
    X2, y2 = X[~first_segment], y[~first_segment]

    # Fit two separate linear regressions
    model1 = LinearRegression().fit(X1, y1)
    model2 = LinearRegression().fit(X2 - breakpoint, y2)

    # Calculate R-squared for the segmented model
    y_pred = np.concatenate([model1.predict(X1), model2.predict(X2 - breakpoint)])
    r_squared = 1 - np.sum((y - y_pred)**2) / np.sum((y - np.mean(y))**2)

    return model1, model2, r_squared

# Try different breakpoints (keeping at least two points in each segment)
# and find the one with the highest R-squared
best_r_squared = 0
best_breakpoint = 0
for breakpoint in range(1, len(X) - 2):
    _, _, r_squared = fit_segmented_regression(X, y, breakpoint)
    if r_squared > best_r_squared:
        best_r_squared = r_squared
        best_breakpoint = breakpoint

# Fit the best segmented regression model
model1, model2, _ = fit_segmented_regression(X, y, best_breakpoint)

# Fit a single linear regression for comparison
single_model = LinearRegression().fit(X, y)
single_r_squared = single_model.score(X, y)

# Fit a quadratic model for comparison
quad_model = LinearRegression().fit(PolynomialFeatures(degree=2).fit_transform(X), y)
quad_r_squared = quad_model.score(PolynomialFeatures(degree=2).fit_transform(X), y)

# Plotting
plt.figure(figsize=(12, 8))
plt.scatter(X, y, color='blue', label='Actual data')
plt.plot(X, single_model.predict(X), color='red', label='Single linear regression')
plt.plot(X, quad_model.predict(PolynomialFeatures(degree=2).fit_transform(X)), color='green', label='Quadratic regression')

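# Plot segmented regression
# (1-D mask keeps X1 and X2 two-dimensional, as `predict` expects)
first_segment = X.ravel() <= best_breakpoint
X1 = X[first_segment]
X2 = X[~first_segment]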
plt.plot(X1, model1.predict(X1), color='purple', linestyle='--', label='Segmented regression (first segment)')
plt.plot(X2, model2.predict(X2 - best_breakpoint), color='purple', linestyle='--', label='Segmented regression (second segment)')

plt.axvline(x=best_breakpoint, color='gray', linestyle=':', label='Breakpoint')

plt.title('Segmented Regression Analysis of Business Name Registrations')
plt.xlabel('Year')
plt.ylabel('Number of Registrations')
plt.legend()
plt.show()

# Print results
print(f"Best breakpoint: Year {yearly_data.index[best_breakpoint].year}")
print(f"R-squared for segmented regression: {best_r_squared:.4f}")
print(f"R-squared for single linear regression: {single_r_squared:.4f}")
print(f"R-squared for quadratic regression: {quad_r_squared:.4f}")

print("\nSegmented Regression Results:")
print(f"First segment slope: {model1.coef_[0][0]:.2f}")
print(f"Second segment slope: {model2.coef_[0][0]:.2f}")

if model2.coef_[0][0] < model1.coef_[0][0]:
    print("The growth rate is decreasing after the breakpoint.")
else:
    print("The growth rate is increasing after the breakpoint.")

# Calculate the change in growth rate
growth_rate_change = model2.coef_[0][0] - model1.coef_[0][0]
print(f"Change in growth rate: {growth_rate_change:.2f}")