<!-- metadata: title -->
# Collective Investment Schemes in Kenya: Money Market Funds (KES) - Analysis Dataset

<!-- metadata: subtitle -->
> ### Sourcing, Cleaning, and Exploratory Analysis of Kenyan Money Market Fund Data

**Published Date:**
<!-- metadata: date -->
2024-09-25
<!-- metadata: -->

<!-- metadata: keywords, is_array=true -->
**Keywords:**
  - money
  - kenya
  - unit-trusts
  - money-market-funds
  - MMF
  - dataset

<!-- metadata: categories, is_array=true -->
**Categories:**
  - kenya unit trusts
  - data science
  - money

## Description

<!-- metadata: description -->
Money Market Funds (MMFs) are a type of collective investment scheme in Kenya that have gained significant popularity in recent years. These funds operate by pooling capital from numerous investors, which professional fund managers then invest collectively in short-term, highly liquid financial instruments. 

## Introduction

In the Kenyan financial landscape, MMFs offer several advantages over traditional bank deposits:

1. Higher Returns: MMFs typically provide superior interest rates compared to standard savings accounts.
2. Lower Entry Barriers: Investors can start with smaller amounts, making them more accessible to a broader range of investors.
3. Compound Interest: Unlike most bank deposits that offer simple interest, MMFs generally provide compound interest, potentially leading to faster wealth accumulation.
4. Liquidity: MMFs maintain high liquidity, allowing investors to access their funds quickly when needed, typically within a day or two of requesting a withdrawal.
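
To make the compounding point concrete, here is a small sketch comparing simple and compound growth. The principal, the 12% annual rate, the five-year horizon, and monthly compounding are all illustrative assumptions, not figures from any particular fund or bank.

```python
def simple_interest_value(principal: float, annual_rate: float, years: float) -> float:
    """Future value when interest is earned on the original principal only."""
    return principal * (1 + annual_rate * years)


def compound_interest_value(principal: float, annual_rate: float, years: float,
                            periods_per_year: int = 12) -> float:
    """Future value when interest is reinvested at the end of every period."""
    periods = periods_per_year * years
    return principal * (1 + annual_rate / periods_per_year) ** periods


# Hypothetical inputs: KES 100,000 invested for 5 years at 12% per year.
principal, rate, years = 100_000, 0.12, 5
simple = simple_interest_value(principal, rate, years)
compound = compound_interest_value(principal, rate, years)
print(f"Simple:   KES {simple:,.2f}")
print(f"Compound: KES {compound:,.2f}")
print(f"Extra from compounding: KES {compound - simple:,.2f}")
```

The gap between the two figures widens with the length of the holding period, which is why compounding matters most for long-term savers.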

Our analysis will delve into the performance and characteristics of various Money Market Funds in Kenya, utilizing publicly available data. Through this exploration, we aim to:

1. Source and gather data
2. Clean the data and perform simple EDA
3. Archive and publish the data for further and future analysis

This analysis will not only offer basic insights into the current state of these investments but also contribute to a broader understanding of unit trust investments in the Kenyan financial market.

Through our process of data sourcing, cleaning, and exploratory analysis, we primarily aim to enable researchers to perform further analysis with the data, while also uncovering valuable insights that could help both potential investors and industry stakeholders make informed decisions about Money Market Funds in Kenya.

***

Before we begin, let's prepare our environment with some important Python packages and reusable functions.

In [1]:
#| code-fold: true
#| code-summary: "Show python imports"

import sys
import os
from pathlib import Path

# Add root directory as python path
root_dir = os.path.abspath(Path(sys.executable).parents[2])
sys.path.append(root_dir)

%reload_ext autoreload
%autoreload 2

# Other imports
import pandas as pd
from pyppeteer.page import Request, Page
import asyncio
import io
from bs4 import BeautifulSoup, Tag
from urllib.request import urlopen
from matplotlib import pyplot as plt
import json5 as json5
import json
from tqdm import tqdm
from glob import glob
import re
import webbrowser
from typing import Callable, Literal
from copy import copy
from datetime import datetime, timedelta
import plotly.io as pio
import plotly.express as px
from json2txttree import json2txttree
from python_utils.web_screenshot import web_screenshot_async
from python_utils.get_browser import get_browser_page_async
from typing import Any
from toolz import groupby
import numpy as np

In [16]:
collective_scheme_type = dict[Literal['Scheme'], str] | dict[Literal['Funds'], list[str]]

def strip_start_end(s1: str, last_acceptable_characters = ')'):
    if type(s1) != str or s1 is None:
        return ''
    # Define a regex pattern to match 'and' followed by any non-alphabet characters at the end of the string
    and_pattern = r'\band[^a-zA-Z]*$'
    # Define a regex pattern to match any non-alphabet characters at the start of the string
    non_alphabet_start = r'^[^a-zA-Z]+'
    # Define a regex pattern to match any non-alphabet characters at the end of the string
    non_alphabet_end = f'[^a-zA-Z{last_acceptable_characters}]+$'
    # Define a regex pattern to match the phrase "comprising of|which comprises of"
    comprising_of_pattern = r'comprising of|which comprises of'
    # Replace multiple spaces with a single space
    multiple_white_space = r'\s+'
    s2 = re.sub(comprising_of_pattern, '', s1)
    s3 = re.sub(and_pattern, '', s2)
    s4 = re.sub(non_alphabet_start, '', s3)
    s5 = re.sub(non_alphabet_end, '', s4)
    s6 = re.sub(multiple_white_space, ' ', s5)
    s7 = s6.strip()
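    # Recursively apply the function if any of the patterns still match anywhere in the string.
    # re.search is needed here: re.match only tests the start, so the end-anchored patterns never fired.
    if any(re.search(p, s7) for p in [and_pattern, non_alphabet_start, non_alphabet_end, comprising_of_pattern]):
        return strip_start_end(s7, last_acceptable_characters)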
    # remove non ASCII characters
    s8 = s7.encode('ascii', errors='ignore').decode()
    # Return the cleaned string
    return s8

def hacky_normalizer(val: str):
    val = val.strip().upper()
    # Replace special characters with underscore
    modified_string = re.sub(r'[^a-zA-Z0-9\%()_]', '_', val)
    # Replace multiple consecutive underscores with a single underscore
    modified_string = re.sub(r'_+', '_', modified_string)
    return modified_string

## Sourcing and Gathering Data

### Approved Collective Schemes

To get a comprehensive and up-to-date list of approved collective investment schemes, we crawled the Capital Markets Authority (CMA) website. The CMA publishes lists of approved schemes at <https://www.cma.or.ke/licensees-market-players/> and <https://licensees.cma.or.ke/licenses/15/>.

#### Screenshots of the pages

Let's start with some screenshots of the pages.

In [None]:
async def collective_investment_schemes_click_fn(page: Page):
    await page.waitForSelector('ul.module-accordion')
    elements = await page.querySelectorAll('li .accordion-title')
    # Iterate through the elements to find the one containing 'APPROVED COLLECTIVE INVESTMENT SCHEMES'
    for element in elements:
        text_content = await page.evaluate('(element) => element.textContent', element)
        if 'APPROVED COLLECTIVE INVESTMENT SCHEMES' in text_content:
            # Click on the target element
            await element.click()
            accordion_element = await page.waitForSelector('li.current.builder-accordion-active')
            await page.evaluate("""() => {
                document.querySelector('#headerwrap').style.display = 'none';
                document.querySelector('.pojo-a11y-toolbar-toggle').style.display = 'none';
            }""")
            await asyncio.sleep(1)
            return accordion_element
    print('Element not found')

# Take a screenshot
await web_screenshot_async(
    # Fund manager URL
    "https://www.cma.or.ke/licensees-market-players/", 
    action = collective_investment_schemes_click_fn,
    width = 1000, 
    screenshot_options = None,
    crop_options = { 'bottom': 500 })

In [None]:
async def collective_investment_schemes_2(page: Page):
    return await page.querySelector('table')

# Take a screenshot
await web_screenshot_async(
    # Fund manager URL
    "https://licensees.cma.or.ke/licenses/15/", 
    action = collective_investment_schemes_2,
    width = 1500, 
    screenshot_options = None,
    crop_options = { 'bottom': 500 })

#### Crawling

Next, let's try to grab the fund managers table into a dataframe that we can work with.
Below is the list of all the fund managers in Kenya certified by the CMA.^[Approved Fund Managers by CMA. <https://www.cma.or.ke/licensees-market-players/>]

In [None]:
def extract_collective_scheme_name(para: Tag):
    full_name = ' '.join([i.get_text(strip=True) for i in para.find_all('strong')])
    return strip_start_end(full_name)


def make_collective_unit_obj(tbody_tr_td: Tag) -> collective_scheme_type:
    return {
        'Scheme': extract_collective_scheme_name(tbody_tr_td.find('p') or tbody_tr_td),
        'Funds': [
            strip_start_end(i.get_text(separator=' ', strip=True)) 
            for i 
            in tbody_tr_td.select('ul li')
        ]
    }

def fetch_collective_schemes_1():
    CMA_market_players_html: bytes = urlopen("https://www.cma.or.ke/licensees-market-players/").read()
    investment_schemes_table_html = BeautifulSoup(CMA_market_players_html, "html.parser")\
        .find('span', string="APPROVED COLLECTIVE INVESTMENT SCHEMES")\
        .find_parent('li')\
        .find('table')
    return [
        make_collective_unit_obj(tbody_tr_td)
        for tbody_tr_td 
        in investment_schemes_table_html.select('tbody tr td')
    ]

def fetch_collective_schemes_2():
    CMA_market_players_html: bytes = urlopen("https://licensees.cma.or.ke/licenses/15/").read()
    investment_schemes_table_html = BeautifulSoup(CMA_market_players_html, "html.parser")\
        .find('table')
    return [
        make_collective_unit_obj(tbody_tr_td)
        for tbody_tr_td 
        in investment_schemes_table_html.select('tbody tr > :first-child')
    ]

# For example: 
#       Orient Umbrella Collective Investment Scheme (formerly Alphafrica Umbrella Fund) => 
#       Orient Umbrella Collective Investment Scheme
def remove_quoted_str(str1: str): return re.sub(r'\(.*?(?!\)).*?$', '', str1 or '').strip()

def make_merge_key(str1: str): return hacky_normalizer(remove_quoted_str(str1))

def merge_collective_schemes(schemes_list: list[collective_scheme_type]) -> collective_scheme_type:
    all_names: dict[str, list[str]] = groupby(
        make_merge_key, [unit_obj['Scheme'] for unit_obj in schemes_list])
    all_schemes: dict[str, list[str]] = groupby(
        make_merge_key, [scheme for unit_obj in schemes_list for scheme in unit_obj['Funds']])
    return {
        'Scheme': sorted(
            [name for values in all_names.values() for name in values], 
            key = lambda x: len(x), 
            reverse=True
        )[0],
        'Funds': [
            sorted(schemes, key = lambda x: len(x), reverse=True)[0]
            for schemes 
            in all_schemes.values()
        ]
    }

collective_schemes_1 = fetch_collective_schemes_1()
collective_schemes_2 = fetch_collective_schemes_2()
collective_schemes_1_2 = collective_schemes_1 + collective_schemes_2
collective_schemes_grouped_by_name = groupby(
    lambda x: make_merge_key(x['Scheme']), collective_schemes_1_2)
collective_schemes = [
    merge_collective_schemes(collective_schemes) 
    for collective_schemes 
    in collective_schemes_grouped_by_name.values()]
collective_schemes_df = pd.DataFrame(collective_schemes)
collective_schemes_df
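
The merge step above relies on a normalized key so that different spellings of the same scheme from the two CMA pages collapse into one record. The sketch below illustrates that idea in isolation, using only the standard library. `merge_key` is a simplified, hypothetical stand-in for `make_merge_key`, and the second and third input names are made-up variants of the example quoted in the code comment.

```python
import re
from collections import defaultdict


def merge_key(name: str) -> str:
    """Drop a trailing parenthetical, then normalize case and punctuation."""
    without_parens = re.sub(r'\(.*$', '', name or '').strip()
    return re.sub(r'[^A-Z0-9]+', '_', without_parens.upper()).strip('_')


# Hypothetical input: the same scheme as it might appear across the two pages.
names = [
    'Orient Umbrella Collective Investment Scheme (formerly Alphafrica Umbrella Fund)',
    'Orient Umbrella Collective Investment Scheme',
    'orient umbrella  collective investment scheme',
]

# Group every spelling under its normalized key.
groups: dict[str, list[str]] = defaultdict(list)
for name in names:
    groups[merge_key(name)].append(name)

# Keep the longest spelling per group, mirroring the choice made in the merge step.
merged = {key: max(variants, key=len) for key, variants in groups.items()}
print(merged)
```

Keeping the longest spelling is a heuristic: it tends to preserve extra context such as a former name, at the cost of occasionally keeping a noisier label.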

We have `51` approved unit trust schemes as of {{< meta date-modified >}}. Schemes are normally managed by approved investment banks or approved fund managers. By investment bank, we do not mean a traditional bank, but rather a CMA-approved investment bank, such as "BLA BLA BLA". Traditional banks can also have fund managers. For example, KCB has KCB Asset Managers, which is approved by the CMA to manage "BLA BLA BLA".

If an investment vehicle does not have an approved fund manager or investment bank, but would like to have an approved scheme, it can nominate an existing fund manager to manage the scheme on its behalf. For example, *Mali Money Market Fund* ^[[Frequently Asked Questions / Mali](https://www.safaricom.co.ke/media-center-landing/frequently-asked-questions/mali)] ^[[Safaricom to launch unit trust, new savings service](https://www.businessdailyafrica.com/bd/markets/capital-markets/safaricom-to-launch-unit-trust-new-savings-service-2288556)], which is wholly or partly owned by the Kenyan telecommunications company Safaricom PLC ^[[M-PESA / M-PESA Services / Wealth / Mali](https://www.safaricom.co.ke/main-mpesa/m-pesa-services/wealth/mali)], is not listed among licensed fund managers. According to Business Daily ^[[Safaricom's Mali unit trust asset base hits Sh1.4bn](https://www.businessdailyafrica.com/bd/markets/capital-markets/safaricom-s-mali-unit-trust-asset-base-hits-sh1-4bn--4582142)], Mali MMF is administered by Genghis Capital Limited, which is listed by the CMA as an investment bank. Genghis Capital Limited also has its own unit trust fund, Gencap Hela Imara Money Market Fund ^[[Genghis Capital Unit Trust Fund](https://genghis-capital.com/asset-management/money-market-fund/)]. This may or may not raise a potential conflict of interest.

Similarly, there are fund managers that no longer offer unit trust investments under Money Market Funds, such as Zimele ^[[Zimele Savings Plan Transition: From Money Market to Fixed Income Fund](https://www.zimele.co.ke/zimele-savings-plan-transition-from-money-market-to-fixed-income-fund/)]. Before you choose a fund manager, do your due diligence and understand the risks you are willing to take.

:::{.callout-note}
Always invest with caution, especially when important information is missing, unclear, or overly complicated.
:::

### Market Data

Despite the requirement that daily yields be published in two national newspapers, it is difficult to find a good data source. Because the chosen newspapers do not need to have a digital presence, capturing all yields would mean visiting a library and going through physical copies, which makes the task very expensive. Historical data is not free either: most old newspaper records are sold, which adds to the cost. Fortunately, since 2014, Cytonn Fund Managers has been doing free market research and publishing it at <https://cytonnreport.com/>. A few fund managers publish their daily yields on their websites, but without historical data; only the current day's yield is shown, which makes this data of little use for analysis.

We settled on crawling and analysing the large body of Cytonn research that has been publicly available since 2014. There are over 600 reports. We crawl each report in a way that doesn't overload their systems or deny others the service, extract the tables, aggregate the results, and analyse them. We checked Cytonn's terms of use: users are allowed to use their copyrighted data in accordance with fair use/dealing.^[Reproduction is prohibited other than in accordance with the copyright notice, which forms part of these terms and conditions. <https://cytonn.com/terms-of-use> ] To allow others to reproduce this analysis, we will save a copy of the crawled raw data for future researchers and data enthusiasts.
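
As a rough illustration of crawling without overloading a server, the sketch below fetches pages strictly one at a time and pauses between requests. It is a minimal pattern under stated assumptions, not the crawler used in this notebook: `crawl_politely`, `fake_fetch`, the delay values, and the `example.com` URLs are all hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def crawl_politely(urls: list[str],
                         fetch: Callable[[str], Awaitable[str]],
                         delay_seconds: float = 0.5) -> dict[str, str]:
    """Fetch URLs sequentially, sleeping between consecutive requests."""
    results: dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause so the target server is never hit in rapid succession.
            await asyncio.sleep(delay_seconds)
        results[url] = await fetch(url)
    return results


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request."""
    return f"<html>{url}</html>"


async def main() -> dict[str, str]:
    # Placeholder URLs, not real report addresses.
    urls = [f"https://example.com/report/{i}" for i in range(3)]
    started = time.monotonic()
    pages = await crawl_politely(urls, fake_fetch, delay_seconds=0.1)
    print(f"Fetched {len(pages)} pages in {time.monotonic() - started:.2f}s")
    return pages


pages = asyncio.run(main())
```

Inside a notebook, where an event loop is already running, `await main()` would replace the `asyncio.run(main())` call.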

Cytonn publishes its reports on two websites, <https://cytonn.com/researches> and <https://cytonnreport.com/research>.

#### Screenshots of Cytonn Reports

##### cytonn.com

In [None]:
# Take a screenshot
await web_screenshot_async(
    "https://cytonn.com/researches",
    width = 1500,
    height = 1200,
    screenshot_options = {'fullPage': False })

##### cytonnreport.com

In [None]:
async def cytonnreport_fn(page: Page):
    await page.waitForSelector('.grid-x > .pagination')
    await asyncio.sleep(1)

# Take a screenshot
await web_screenshot_async(
    "https://cytonnreport.com/research",
    action = cytonnreport_fn,
    width = 1500,
    height = 1200,
    screenshot_options = {'fullPage': False })

##### Money Market Fund Yield

In [None]:
#| label: preview-image

# Define a function that selects a table by its header text
def select_table_by_title(target_header_text: str):
    # Define a nested asynchronous function that takes a Page object as an argument
    async def fn(page: Page):
        # Wait for any table element to be present on the page
        await page.waitForSelector('table')
        # Query and collect all table elements on the page
        table_elements = await page.querySelectorAll('table')
        # Iterate through each table element
        for table_element in table_elements:
            # Query and collect all header cells in the current table
            table_headers = await table_element.querySelectorAll('thead tr td')
            # Iterate through each header cell
            for table_header in table_headers:
                # Extract the text content of the current header cell
                header_text:str = await page.evaluate('(element) => element.textContent', table_header)
                # Check if the header text starts with the target text
                if header_text.startswith(target_header_text):
                    # If a match is found, return the current table element
                    return table_element
    return fn

await web_screenshot_async(
    # URL to take a screenshot of
    "https://cytonnreport.com/research/cytonn-monthly-",
    # Action deciding WHAT (element) or WHEN (eg: click) to take the screenshot
    action = select_table_by_title('Cytonn Report: Money Market Fund Yield'),
    width = 1000, 
    screenshot_options = None,
    crop_options = { 'bottom': 500 })

#### Crawling

At <https://cytonn.com/researches>, we could crawl and parse the HTML, but that would be very slow. We noticed that <https://cytonnreport.com/research> displays the exact same data, but loads it through a background request to <https://cytonnreport.com/get/allreports>. We can use this endpoint to crawl multiple reports faster.

In [None]:
async def get_all_cytonn_reports(per_page_count: int = 10):
    """
    Retrieves all Cytonn reports from the Cytonn Report website.

    Args:
        per_page_count (int, optional): The number of reports to retrieve per page. Defaults to 10.

    Returns:
        list: A list of all the retrieved reports.
    """
    ...
    page, browser = await get_browser_page_async()
    reports_url = "https://cytonnreport.com/get/allreports"
    reports_headers: dict = None
    reports_method: str = None
    async def catch_request(request: Request):
        nonlocal reports_headers
        nonlocal reports_method
        if request.url == reports_url:
            reports_headers = request.headers.copy()
            reports_method = request.method
            await request.continue_()
        else:
            await request.continue_()
    async def get_cytonn_reports(current_page: int):
        javascript_fetch_fn = f'''
            async () => {{
                try {{
                    const response = await fetch(
                        "{reports_url}", 
                        {{
                            "headers": {json.dumps(reports_headers)},
                            "method": "{reports_method}",
                            "body": {json.dumps(json.dumps(
                                {
                                    "pagination": {
                                        "per_page": per_page_count, 
                                        "current_page": current_page
                                    }
                                }))},
                            "referrer": "https://cytonnreport.com/research",
                            "referrerPolicy": "no-referrer-when-downgrade",
                            "mode": "cors",
                            "credentials": "include"
                        }});
                    if (!response.ok) {{
                        throw new Error(`HTTP error! status: ${{response.status}}`);
                    }}
                    const json = await response.json();
                    return json;
                }} catch (error) {{
                    console.error('Fetch error:', error);
                    throw error; // Re-throw to allow calling code to handle it
                }}
            }}
        '''
        response_json = await page.evaluate(javascript_fetch_fn)
        return response_json
    # Enable request interception
    await page.setRequestInterception(True)
    # Attach the request handler
    page.on('request', lambda request: asyncio.ensure_future(catch_request(request)))
    # Navigate to the desired URL
    await page.goto("https://cytonnreport.com/research")
    while not reports_headers:
        await asyncio.sleep(1)
    current_page = 1
    all_reports = []
    pbar: tqdm = None
    while True:
        reports_response = await get_cytonn_reports(current_page)
        reports = reports_response['data'] if reports_response else []
        if len(reports) > 0:
            total = reports_response['total']
            pbar = pbar or tqdm(total=total)
            pbar.update(len(reports))
            all_reports.extend(reports)
            last_page = reports_response['last_page']
            if last_page == current_page:
                break
            current_page += 1
        else:
            break
        await asyncio.sleep(0.4)
    await browser.close()
    if pbar:
        pbar.close()
    return all_reports

all_cytonn_reports = await get_all_cytonn_reports()
print(f'There are {len(all_cytonn_reports)} reports')
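
Since one stated goal is to keep a copy of the raw crawl for future researchers, the sketch below shows one way the records could be archived as a dated JSON file. It is an assumption about how archiving might be done, not the notebook's actual storage step: `archive_raw_reports`, the file name pattern, and the sample records (with hypothetical `id` and `title` fields) are illustrative only.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def archive_raw_reports(reports: list[dict], out_dir: Path) -> Path:
    """Write the raw records to a dated JSON file and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"cytonn_reports_raw_{date.today():%Y%m%d}.json"
    with out_path.open('w', encoding='utf-8') as fh:
        # ensure_ascii=False keeps any non-ASCII characters readable in the file.
        json.dump(reports, fh, ensure_ascii=False, indent=2)
    return out_path


# Hypothetical records standing in for the crawled reports.
sample_reports = [
    {'id': 1, 'title': 'Sample report A', 'topics': []},
    {'id': 2, 'title': 'Sample report B', 'topics': []},
]

# A temporary directory is used here; a real archive would use a stable location.
archive_path = archive_raw_reports(sample_reports, Path(tempfile.mkdtemp()))
restored = json.loads(archive_path.read_text(encoding='utf-8'))
print(f"Archived {len(restored)} reports to {archive_path.name}")
```

Storing the untouched JSON, rather than only the cleaned tables, lets later analyses re-run the parsing with different rules.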

In [None]:
# converting the JSON into dataframe
all_cytonn_reports_df = pd.DataFrame(all_cytonn_reports)
with pd.option_context(
  'display.max_columns', None, 
  'display.max_colwidth', 100):
  display(all_cytonn_reports_df)

As can be observed, the dataset above is complex and difficult to understand or analyse, because a lot of information is contained in the reports: real estate data, money market fund data, and fund managers over the years, among other things. As such, we will extract only the money market funds (KES) data from the reports.

## Simple Exploratory Data Analysis

The goal here is to extract the details of money market funds (KES).

### Summary of numerical columns

In [None]:
all_cytonn_reports_df.describe()

### Preview the Columns

In [None]:
all_cytonn_reports_df.columns

In [None]:
all_cytonn_reports_df.iloc[0]

### Data types

In [None]:
all_cytonn_reports_df.dtypes

Below is a tree structure of one record, to visualize the objects and their inner properties:

In [None]:
print(json2txttree(all_cytonn_reports_df.loc[0,:].to_dict()))

A full report is made up of topics. Each entry in `topics` is a subsection, with `title` as the header and `body` as the content. We will merge the bodies of all topics to form the entire report HTML, which we will parse to extract the Money Market Fund yield tables. In addition, we will also include the main `body`, the main `summary`, and each topic's `summary` to ensure we capture any table we might otherwise miss.

In [24]:
CYTONN_RECORD_LITERALS = Literal['summary', 'body', 'topics', 'researchdate']
def get_report_HTML(report: dict[CYTONN_RECORD_LITERALS, Any]) -> str:
    summary_html = report['summary']
    body_html = report['body']
    topics_html = ''.join([f"{i['summary']} \n\n {i['body']}" for i in report['topics']])
    return f"{summary_html} \n {body_html} \n {topics_html}"

# from IPython.display import HTML
# HTML(get_report_HTML(all_cytonn_reports[0]))
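
Once a report's HTML has been assembled, the tables have to be pulled out of it. The sketch below shows one way to do that with only the standard library's `html.parser`. It is a minimal illustration, not the parser used in this notebook (which imports BeautifulSoup and pandas): `TableExtractor` is a hypothetical helper, it ignores nested tables, and the sample fragment with its fund names and yields is made up.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in an HTML string as a list of rows of cell text."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Hypothetical report fragment; the fund names and yields are made up.
sample_html = """
<p>Intro text</p>
<table>
  <thead><tr><td>Rank</td><td>Fund Manager</td><td>Effective Annual Rate</td></tr></thead>
  <tbody>
    <tr><td>1</td><td>Fund A Money Market Fund</td><td>10.5%</td></tr>
    <tr><td>2</td><td>Fund B Money Market Fund</td><td>9.8%</td></tr>
  </tbody>
</table>
"""
parser = TableExtractor()
parser.feed(sample_html)
for row in parser.tables[0]:
    print(row)
```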

### Parsing Dates

Some summary tables label their columns with financial periods rather than calendar dates, such as `Q1'2023`, `Q1'2023 (%)`, `FY'2023`, `FY'2023 (%)`, `Q1'2024`, `Q1'2024 (%)`.

In [None]:
await web_screenshot_async(
    # URL to take a screenshot of
    "https://cytonnreport.com/research/q1-2024-unit-trust-funds-performance-note",
    # Action deciding WHAT (element) or WHEN (eg: click) to take the screenshot
    action = select_table_by_title('Cytonn Report: Assets Under Management (AUM) for the Approved Collective Investment Schemes'),
    width = 1000, 
    screenshot_options = None,
    crop_options = { 'bottom': 500 })

The function below parses such period labels into start and end dates:

In [None]:
def get_fiscal_period_dates(date_string: str) -> (tuple[str, str] | None):
    """
    This function parses a date string representing a fiscal period 
    (Fiscal/Financial Year, Quarter, or Half-year) and returns the corresponding 
    start and end dates.
    """
    # Define a regex pattern to match fiscal periods (FY, Q1-Q4, H1-H2) followed by a year
    pattern = r"^(FY|Q[1-4]|H[1-2])'(\d{4})$"
    # Try to match the input string against the pattern
    match = re.match(pattern, date_string, re.IGNORECASE)
    # If no match is found, return None
    if not match:
        return None
    # Extract the period type and year from the match
    period, year = match.groups()
    year = int(year)
    # Handle Fiscal Year (FY) case
    if period.upper() == 'FY':
        start_date = datetime(year, 1, 1)
        end_date = datetime(year, 12, 31)
    # Handle Quarter (Q1-Q4) cases
    elif period.upper().startswith('Q'):
        quarter = int(period[1])
        start_month = (quarter - 1) * 3 + 1
        start_date = datetime(year, start_month, 1)
        # Calculate end date of the quarter
        end_date = start_date.replace(month=start_month + 2) + timedelta(days=32)
        end_date = end_date.replace(day=1) - timedelta(days=1)
    # Handle Half-year (H1-H2) cases
    elif period.upper().startswith('H'):
        half = int(period[1])
        start_month = (half - 1) * 6 + 1
        start_date = datetime(year, start_month, 1)
        # Calculate end date of the half-year
        end_date = start_date.replace(month=start_month + 5) + timedelta(days=32)
        end_date = end_date.replace(day=1) - timedelta(days=1)
    # Return start and end dates formatted as strings
    return (start_date.strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d'))

def TEST_get_fiscal_period_dates():
    # Test the function
    test_dates = ["FY'2019", "Q1'2020", "H1'2019", "fy'2018", "q3'2021", "h2'2022", "H3'2020"]

    for expanding_value in test_dates:
        result = get_fiscal_period_dates(expanding_value)
        if result:
            print(f"{expanding_value}: {result}")
        else:
            print(f"{expanding_value}: Invalid format")
TEST_get_fiscal_period_dates()
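
The pattern in `get_fiscal_period_dates` matches bare labels such as `Q1'2023`, so column headers that carry a ` (%)` suffix would need that suffix removed first. The sketch below shows one possible pre-processing step, assuming the suffix only marks a percentage column. `split_period_label` and `PERIOD_LABEL` are hypothetical names that do not appear elsewhere in this notebook.

```python
import re
from typing import Optional

# A period label (FY, Q1-Q4, H1-H2 plus a four-digit year) with an optional " (%)" suffix.
PERIOD_LABEL = re.compile(r"^\s*((?:FY|Q[1-4]|H[1-2])'\d{4})\s*(\(%\))?\s*$", re.IGNORECASE)


def split_period_label(label: str) -> Optional[tuple[str, bool]]:
    """Return (period, is_percentage_column), or None if the label is not a period."""
    match = PERIOD_LABEL.match(label or '')
    if not match:
        return None
    period, percent_suffix = match.groups()
    return period, percent_suffix is not None


for header in ["Q1'2023", "Q1'2023 (%)", "FY'2023 (%)", "Fund Manager"]:
    print(f"{header!r}: {split_period_label(header)}")
```

The returned period string can then be passed to `get_fiscal_period_dates`, while the boolean tells the caller whether the column holds percentages or absolute amounts.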

### Parsing a Money Market Fund (KES) Record Entry

The `MoneyMarketFund_KES_RecordEntry` class below represents and validates a financial record entry. It validates the record type (_Assets Under Management_ or _Effective Annual Rate_), the date, the value, and the fund. The class also maintains lists of unmatched funds, invalid dates, and invalid values. Assets under management (AUM) is the market value of the investments managed by the fund manager on behalf of clients. The effective annual rate is the actual return of the Money Market Fund account once the effects of compounding are considered.
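
For readers unfamiliar with the term, the sketch below shows the standard conversion from a nominal annual rate to an effective annual rate, $(1 + r/n)^n - 1$. The 10% nominal rate and daily compounding ($n = 365$) are illustrative assumptions; the reports may compute their published figures differently.

```python
def effective_annual_rate(nominal_rate: float, periods_per_year: int = 365) -> float:
    """Convert a nominal annual rate into an effective annual rate."""
    return (1 + nominal_rate / periods_per_year) ** periods_per_year - 1


# Hypothetical example: a 10% nominal rate compounded daily.
ear = effective_annual_rate(0.10)
print(f"Effective annual rate: {ear:.4%}")
```

With a single compounding period per year, the effective rate equals the nominal rate; the more frequent the compounding, the larger the gap between the two.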

In [None]:
class MoneyMarketFund_KES_RecordEntry:
    """
    A class to represent and validate financial entry information.
    
    Class Attributes:
    INVALID_FUNDS (list[str]): Stores funds not found in the mapping.
    INVALID_DATES (list[str]): Stores entry dates not valid.
    INVALID_VALUES (list[str]): Stores entry values not valid.
    TYPE_ASSETS_UNDER_MANAGEMENT (str): Constant for Assets Under Management type.
    TYPE_EFFECTIVE_ANNUAL_RATE (str): Constant for Effective Annual Rate type.
    """
    INVALID_FUNDS: list[str] = []
    INVALID_DATES: list[str] = []
    INVALID_VALUES: list[str] = []
    TYPE_ASSETS_UNDER_MANAGEMENT: str = 'ASSETS_UNDER_MANAGEMENT' # Assets Under Management
    TYPE_EFFECTIVE_ANNUAL_RATE: str = 'EFFECTIVE_ANNUAL_RATE' # Effective Annual Rate

    def __init__(self, 
                 entry_type: Literal['ASSETS_UNDER_MANAGEMENT', 'EFFECTIVE_ANNUAL_RATE'], 
                 entry_date: str, 
                 entry_value: str, 
                 entry_fund: str,
                 entry_fund_filter_predicate: Callable[[str], list[str]]):
        """
        Initialize a RecordEntry instance with validated attributes.
        
        Args:
        entry_type (str): Type of the record (TYPE_ASSETS_UNDER_MANAGEMENT or TYPE_EFFECTIVE_ANNUAL_RATE).
        entry_date (str): Date of the record (2024-03-01) or Financial period (H1'2024).
        entry_value (str): Value of the record.
        entry_fund (str): Name of the MMF(KES) fund
        entry_fund_filter_predicate (Callable): A predicate to filter and return the matched MMF(KES) funds for validation.
        """
        self.entry_type = MoneyMarketFund_KES_RecordEntry.validate_type(entry_type)
        self.entry_date = MoneyMarketFund_KES_RecordEntry.validate_date(entry_date)
        self.entry_value = MoneyMarketFund_KES_RecordEntry.validate_value(entry_value)
        self.entry_fund_manager = MoneyMarketFund_KES_RecordEntry.validate_fund(entry_fund, entry_fund_filter_predicate)
        if self.entry_date is None:
            MoneyMarketFund_KES_RecordEntry.INVALID_DATES.append(entry_date)
        if self.entry_value is None:
            MoneyMarketFund_KES_RecordEntry.INVALID_VALUES.append(entry_value)
        if self.entry_fund_manager is None:
            MoneyMarketFund_KES_RecordEntry.INVALID_FUNDS.append(entry_fund)

    def is_valid(self) -> bool:
        """
        Check if the record is valid (all attributes are present).
        """
        is_valid = \
            bool(self.entry_type) \
            and bool(self.entry_date) \
            and self.entry_value is not None \
            and bool(self.entry_fund_manager)
        return is_valid

    @staticmethod
    def validate_fund(value: str, filter_predicate: Callable[[str], list[str]]) -> str|None:
        """
        Validate the fund name and map it to its single matching canonical name.
        """
        try:
            value = str(value or '').lower()
            # These represent USD MMF's
            EXCLUDES = ['Dollar', 'USD']
            is_USD_MMF = any((exclude.lower() in value) for exclude in EXCLUDES)
            if not is_USD_MMF:
                names = filter_predicate(value)
                if len(names) == 1:
                    return names[0]
                if len(names) > 1:
                    print(f'"{value}" has more than one match! {names}')
            return None
        except Exception:
            return None
    
    @staticmethod
    def validate_date(value: str) -> str|tuple[str, str]|None:
        """
        Validate and standardize the date
        """
        try:
            return get_fiscal_period_dates(value)\
                    or datetime.strptime(value, "%Y-%m-%d").strftime('%Y-%m-%d')\
                    or None
        except:
            return None

    @staticmethod
    def validate_value(value: str|float) -> float|None:
        """
        Validate and clean the entry value.
        """
        try:
            if isinstance(value, float):
                return value
            # remove percentage sign
            value = value.rstrip('%')
            # remove spaces, commas, and dashes (note: this also drops a leading minus sign)
            value = ''.join([i for i in value if i not in [' ', ',', '-']])
            return float(value) if len(value) > 0 else None
        except Exception:
            return None

    @staticmethod
    def validate_type(value: str) -> Literal['ASSETS_UNDER_MANAGEMENT', 'EFFECTIVE_ANNUAL_RATE']:
        """
        Validate the record type.
        
        Args:
        value (str): The record type to validate.
        
        Raises:
        TypeError exception.
        """
        value = (value or '').upper()
        if value in [MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, MoneyMarketFund_KES_RecordEntry.TYPE_EFFECTIVE_ANNUAL_RATE]:
            return value
        raise TypeError(f"{value} is not proper entry Type!")
    
def TEST_MoneyMarketFund_KES_RecordEntry():
    # Test the class
    test_cases = [
        {
            "entry_type": MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT,
            "entry_date": "2024-03-01",
            "entry_value": "1,000,000",
            "entry_fund": "britam",
        },
        {
            "entry_type": MoneyMarketFund_KES_RecordEntry.TYPE_EFFECTIVE_ANNUAL_RATE,
            "entry_date": "H1'2024",
            "entry_value": "5.5%",
            "entry_fund": "old mutual",
        },
        {
            "entry_type": MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT,
            "entry_date": "invalid-date",
            "entry_value": "1,000,000",
            "entry_fund": "sanlam",
            "invalid": "invalid date"
        },
        {
            "entry_type": MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT,
            "entry_date": "2024-03-01",
            "entry_value": "invalid-value",
            "entry_fund": "britam",
            "invalid": "invalid value"
        },
        {
            "entry_type": MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT,
            "entry_date": "2024-03-01",
            "entry_value": "1,000,000",
            "entry_fund": "unknown fund",
            "invalid": "unmapped fund"
        },
        {
            "entry_type": MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT,
            "entry_date": "2024-03-01",
            "entry_value": "1,000,000",
            "entry_fund": "britam sanlam",
            "invalid": "2 funds matched"
        },
        {
            "entry_type": MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT,
            "entry_date": "2024-03-01",
            "entry_value": "1,000,000",
            "entry_fund": "Britam USD Dollar Fund",
            "invalid": "USD MMF"
        },
    ]
    # Define a small fund alias map and the fund filter function used by the tests
    test_fund_map = [
        (
            'Britam MMF(KES)',
            ['britam', 'british-american', 'british', 'american']
        ),
        (
            'UAP Old Mutual MMF(KES)',
            ['old mutual', 'uap old mutual', 'uap']
        ),
        (
            'Sanlam MMF(KES)',
            ['sanlam', 'sanlam investments']
        )
    ]

    def test_fund_filter(value: str):
        value = value.lower()
        names = [name for name, aliases in test_fund_map if any(alias in value for alias in aliases)]
        return names

    # Run tests
    for test_case in test_cases:
        entry = MoneyMarketFund_KES_RecordEntry(
            test_case["entry_type"],
            test_case["entry_date"],
            test_case["entry_value"],
            test_case["entry_fund"],
            test_fund_filter
        )
        cases = [entry.entry_date, entry.entry_value, entry.entry_fund_manager]
        invalid = f" ({test_case.get('invalid')})" if test_case.get('invalid') else ''
        print(f"Valid: {entry.is_valid()}{invalid}, {cases}")

    # Print invalid entries
    print("\nInvalid Funds:", MoneyMarketFund_KES_RecordEntry.INVALID_FUNDS)
    print("\nInvalid Dates:", MoneyMarketFund_KES_RecordEntry.INVALID_DATES)
    print("\nInvalid Values:", MoneyMarketFund_KES_RecordEntry.INVALID_VALUES)
    MoneyMarketFund_KES_RecordEntry.INVALID_FUNDS = []
    MoneyMarketFund_KES_RecordEntry.INVALID_DATES = []
    MoneyMarketFund_KES_RecordEntry.INVALID_VALUES = []
TEST_MoneyMarketFund_KES_RecordEntry()

The fund manager map pairs each `name` with a list of `aliases` because the Cytonn reports do not use simple or standard fund names. We therefore need short, distinctive aliases that can match the arbitrary fund manager names found in the crawled data.
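Because matching is done by substring, an alias that happens to sit inside another fund's alias can produce ambiguous matches. The following is a minimal sketch of a collision check, not part of the original notebook: `EXAMPLE_ALIAS_MAP` and `find_alias_collisions` are hypothetical names, and the fund names are made up for illustration.

```python
from itertools import permutations

# Hypothetical, shortened alias map used only for this illustration.
EXAMPLE_ALIAS_MAP = [
    ("Fund A", ["alpha", "alp"]),
    ("Fund B", ["beta"]),
    ("Fund C", ["alphabet"]),
]

def find_alias_collisions(alias_map):
    """Return (alias, owner, other_alias, other_owner) tuples where an alias of
    one fund is a substring of an alias that belongs to a different fund."""
    collisions = []
    # permutations() visits every ordered pair of distinct map entries
    for (name_a, aliases_a), (name_b, aliases_b) in permutations(alias_map, 2):
        for alias_a in aliases_a:
            for alias_b in aliases_b:
                if alias_a.lower() in alias_b.lower():
                    collisions.append((alias_a, name_a, alias_b, name_b))
    return collisions

collisions = find_alias_collisions(EXAMPLE_ALIAS_MAP)
for alias, owner, other_alias, other_owner in collisions:
    print(f"{alias!r} ({owner}) is contained in {other_alias!r} ({other_owner})")
```

A check like this only flags collisions between aliases; it cannot predict collisions with arbitrary words in the crawled names, so the multiple-match case still has to be handled when an entry is validated.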

In [None]:
EXCLUDE_WORDS = [
    'Unit', 'Trust', 'Scheme', 'Fund', 'Funds', 'Equity', 'Kenya', 'Shilling', 'Fixed', 'Formerly', 'Market',
    'Enhanced', 'Managed', 'Yield', 'Income', 'Money', 'Balanced', 'Special', 'KES', 'Dollar', 'Wealth',
    'Bond', 'Asset', 'Shariah', 'Growth', 'X', 'Investment', 'USD', 'Regional', 'Africa', 'East', 
    'Multi', 'Diversified', 'Collective', r'Fund\(USD', 'Global', r'Fund\(KES']
def merge_names(scheme: collective_scheme_type):
    
    scheme_words = scheme['Scheme'].split()
    comprising_words = [word for unit_name in scheme['Comprising of'] for word in unit_name.split()]
    all_words = list(set(scheme_words + comprising_words))
    all_words = [strip_start_end(word, '') for word in all_words]
    return [
        word 
        for word 
        in all_words 
        if not any([re.search(f'^{EXCLUDE_WORD}$', word, re.IGNORECASE) for EXCLUDE_WORD in EXCLUDE_WORDS])
    ]

[(i['Scheme'], merge_names(i)) for i in collective_schemes]

In [None]:
# https://licensees.cma.or.ke/
# https://licensees.cma.or.ke/licenses/15/
# https://licensees.cma.or.ke/licenses/8/
# https://www.rba.go.ke/registered-fund-managers/

MMF_NAME_ALIAS_MAP = [
    # The African Alliance (AA) Kenya Shillings Fund is a money market fund by 
    # African Alliance Kenya Investment Bank Limited (the fund manager) 
    # launched on 27th April 2015.
    # https://centwarrior.com/aa-kenya-shillings-fund/
    # https://www.linkedin.com/posts/centwarrior_aa-kenya-shillings-fund-explained-in-2024-activity-7169322082814705664-8nwu?utm_source=share&utm_medium=member_desktop
    # https://cytonn.com/topicals/investment-risk-analysis
    (
        'African Alliance Kenya Unit Trust Scheme', 
        ['african', 'alliance', 'aa kenya']
    ),
    (
        'British-American Unit Trust Scheme', 
        ['britam', 'british-american', 'british', 'american']
    ),
    (
        'NCBA Unit Trust Funds', 
        ['ncba', 'cba', 'commercial bank of africa']
    ),
    (
        'Zimele Unit Trust Scheme', 
        ['zimele']
    ),
    (
        'ICEA Unit Trust Scheme', 
        ['icea']
    ),
    (
        'Standard Investment Trust Funds', 
        ['standard', 'mansa']
    ),
    (
        'CIC Unit Trust Scheme', 
        ['cic']
    ),
    (
        'Madison Unit Trust Fund', 
        ['Madison', 'madisson']
    ),
    (
        'Dyer and Blair Unit Trust Scheme', 
        ['dyer', 'blair']
    ),
    (
        'Amana Unit Trust Funds Scheme', 
        ['amana']
    ),
    (
        'Diaspora Unit Trust Scheme', 
        ['diaspora']
    ),
    (
        'First Ethical Opportunities Fund', 
        ['ethical', 
         # 'first', 'opportunities'
        ]
    ),
    # https://www.cma.or.ke/licensees-market-players/
    # https://genghis-capital.com/asset-management/money-market-fund/
    (
        'Genghis Unit Trust Funds', 
        ['hela','genghis', 'hazina', 'hisa', 'iman', 'gencap', 'compliant', 'eneza', 'genCap', 'imara']
    ),
    # https://www.businessdailyafrica.com/bd/markets/capital-markets/safaricom-s-mali-unit-trust-asset-base-hits-sh1-4bn--4582142
    (
        'Mali Money Market Fund', 
        ['mali']
    ),
    (
        'Sanlam Unit Trust Scheme', 
        ['sanlam']
    ),
    (
        'Nabo Africa Funds', 
        ['nabo']
    ),
    (
        'Old Mutual Unit Trust Scheme', 
        ['mutual', 'old', 'Faulu']
    ),
    # https://equitygroupholdings.com/ke/investor-relations/eib
    # https://www.cma.or.ke/licensees-market-players/
    (
        'Equity Investment Bank Collective Investment Scheme', 
        ['equity']
    ),
    # https://www.cma.or.ke/licensees-market-players/
    (
        'Dry Associates Unit Trust Scheme', 
        ['dry associates', 'dry', 'associates']
    ),
    (
        'Co-op Trust Fund', 
        ['co-op', 'gratuity']
    ),
    (
        'Apollo Unit Trust Scheme', 
        ['aggressive', 'apollo']
    ),
    (
        'Cytonn Unit Trust Scheme', 
        ['cytonn']
    ),
    (
        'Orient Umbrella Collective Investment Scheme (formerly Alphafrica Umbrella Fund)', 
        ['orient', 'kasha', 'alpha', 'alphafrica']
    ),
    (
        'Wanafunzi Investment Unit Trust Fund', 
        ['wanafunzi']
    ),
    (
        'Absa Unit Trust Funds', 
        ['absa']
    ),
    (
        'Jaza Unit Trust Fund', 
        ['jaza']
    ),
    (
        'Masaru Unit Trust Scheme', 
        ['masaru']
    ),
    (
        'ADAM Unit Trust Scheme', 
        ['adam']
    ),
    (
        'KCB Unit Trust Scheme (formerly Natbank Unit Trust Scheme)', 
        ['kcb', 'natbank']
    ),
    (
        'GenAfrica Unit Trust Scheme', 
        ['genafrica']
    ),
    (
        'Amaka Unit Trust (Umbrella) Scheme', 
        ['amaka']
    ),
    (
        'Jubilee Unit Trust Collective Investment Scheme', 
        ['jubilee']
    ),
    # Previously "Liberty Pension Services Limited"
    # https://enwealth.co.ke/about/#governance
    # https://www.linkedin.com/company/enwealth-kenya/?originalSubdomain=ke
    # https://enwealth.co.ke/capital/enwealth-money-market-fund/
    (
        'Enwealth Capital Unit Trust Scheme', 
        ['enwealth']
    ),
    (
        'Kuza Asset Management Unit Trust Scheme', 
        ['kuza', 'momentum']
    ),
    # https://www.linkedin.com/company/arvocap-asset-managers/
    # https://www.businessdailyafrica.com/bd/markets/avocarp-latest-to-enter-kenya-s-asset-management-market-4644586
    (
        'Arvocap Unit Trust Scheme', 
        ['arvocap']
    ),
    (
        'Etica Capital Limited', 
        ['etica']
    ),
    # https://licensees.cma.or.ke/licenses/15/
    (
        'Mayfair umbrella Collective investment scheme', 
        ['mayfair']
    ),
    (
        'Lofty Corban Unit Trust Scheme', 
        ['lofty-corban', 'lofty', 'corban']
    ),
    (
        'CPF Unit Trust Funds', 
        ['cpf', 'cpof']
    ),
    (
        'Stanbic Unit Trust Funds', 
        ['stanbic']
    ),
    #############################################
    ##### UNVERIFIED COLLECTIVE INVESTMENTS #####
    #############################################
    (
        'Metropolitan Canon Asset Managers Limited',
        ['metropolitan']
    ),
    (
        'FCB Capital Limited',
        ['fcb']
    ),
    (
        'Fusion Investment Management Limited',
        ['fusion']
    ),
    (
        'Altree Capital Kenya Limited',
        ['altree']
    ),
    (
        'CFS Asset Management Limited',
        ['cfs']
    ),
    (
        'I&M Capital Limited',
        ['i&m']
    ),
    (
        'Globetec Asset Managers Limited',
        ['globetec']
    ),
    (
        'Waanzilishi Capital Limited',
        ['waanzilishi']
    ),
    (
        'Star Capital Management Limited',
        ['star']
    ),
    # Unverified and NO online presence!
    (
        'Stanlib Kenya',
        ['stanlib']
    ),
    
]
MMF_NAME_ALIAS_MAP

In [None]:
def fund_manager_filter(value: str) -> list[str]:
    value = value.lower()
    names = [
        name 
        for name, aliases
        in MMF_NAME_ALIAS_MAP if any((alias.lower() in value) for alias in aliases)
    ]
    return names

# Test
def TEST_fund_manager_filter(fund_manager: str):
    name = fund_manager_filter(fund_manager)
    print(f"{fund_manager} => {name}")
TEST_fund_manager_filter('KCB Fund Managers')
TEST_fund_manager_filter('Cytonn Fund Mangers')
TEST_fund_manager_filter('Nabo')
TEST_fund_manager_filter('madison')

One report can contain more than one table; see <https://cytonnreport.com/research/unit-trust-fund-performance-q3-1>. Each entry in the list below is a tuple of:
1. A list of the column names expected in a table's header. For matching, the names are normalized with the function below:

In [None]:
def normalize_and_compare_two_strs(str1: str, str2:str) -> bool:
    if not str1 or not str2:
        return False
    str1 = str1.strip().upper()
    str2 = str2.strip().upper()
    return str1 == str2 or hacky_normalizer(str1) == hacky_normalizer(str2)

# Test
def TEST_normalize_and_match_two_strs():
    str1_str2 = [
        ('q2’2020-aum(kshs-mns)', 'Q2 2020_AUM(KSHs_MNs)'),
        ('q2’2020-aum(kshs-mns)', "Q2_2020_AUM(KSHs_MNs)"),
        ('', ""),
        (None, None),
        ('no.', 'NO.')
    ]
    for str1, str2 in str1_str2:
        is_match = normalize_and_compare_two_strs(str1, str2)
        print(f'IS_MATCH={is_match}; {[str1, str2]}')

TEST_normalize_and_match_two_strs()

2. A list of functions that process the matched table. Each function processes a single value column and receives two parameters: the table row (a dictionary keyed by the column names used to match the table) and the entire record from which the table was extracted. The first parameter provides the entry value and the fund name; the second provides the record date when the table itself does not carry one. <br/> Each callback returns a `MoneyMarketFund_KES_RecordEntry`.

Reports that contain more than one table include <https://cytonnreport.com/research/unit-trust-fund-performance-q3-1> and <https://cytonnreport.com/research/fy2019-utf-performance>.
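The shape of such a map can be sketched in isolation. This is a simplified illustration under stated assumptions: `SimpleEntry` is a hypothetical stand-in for `MoneyMarketFund_KES_RecordEntry` (it performs no validation), and `SKETCH_MAP`, `extract` and the shortened column names are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SimpleEntry:
    """Hypothetical stand-in for MoneyMarketFund_KES_RecordEntry."""
    entry_type: str
    entry_date: str
    entry_value: str
    entry_fund: str

Row = dict[str, Any]
Record = dict[str, Any]

SKETCH_MAP: list[tuple[list[str], list[Callable[[Row, Record], SimpleEntry]]]] = [
    (
        ["RANK", "FUND_MANAGER", "EFFECTIVE_ANNUAL_RATE"],
        [
            # The table has no date column, so the date comes from the record.
            lambda row, record: SimpleEntry(
                "EFFECTIVE_ANNUAL_RATE",
                record["researchdate"],
                row["EFFECTIVE_ANNUAL_RATE"],
                row["FUND_MANAGER"],
            ),
        ],
    ),
    (
        ["NO.", "FUND_MANAGERS", "Q1_2020_AUM", "Q2_2020_AUM"],
        [
            # One callback per value column; the period is fixed by the column.
            lambda row, _: SimpleEntry(
                "ASSETS_UNDER_MANAGEMENT", "Q1'2020", row["Q1_2020_AUM"], row["FUND_MANAGERS"]),
            lambda row, _: SimpleEntry(
                "ASSETS_UNDER_MANAGEMENT", "Q2'2020", row["Q2_2020_AUM"], row["FUND_MANAGERS"]),
        ],
    ),
]

def extract(columns: list[str], row: Row, record: Record) -> list[SimpleEntry]:
    """Run every callback registered for the table whose columns match exactly."""
    for expected_columns, callbacks in SKETCH_MAP:
        if expected_columns == columns:
            return [callback(row, record) for callback in callbacks]
    return []

entries = extract(
    ["NO.", "FUND_MANAGERS", "Q1_2020_AUM", "Q2_2020_AUM"],
    {"NO.": 1, "FUND_MANAGERS": "Example Fund", "Q1_2020_AUM": "1,000", "Q2_2020_AUM": "1,200"},
    {"researchdate": "2020-08-01"},
)
```

Here a single table row yields two entries, one per AUM column, which is why the second element of each tuple is a list of callbacks rather than a single function.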

In [180]:
EXTRACTION_MAP: list[
    tuple[
        list[str], 
        list[Callable[[dict[str, Any], dict[CYTONN_RECORD_LITERALS, Any]], MoneyMarketFund_KES_RecordEntry]]
    ]] = [
    (
        [
            'RANK', 
            'FUND_MANAGER', 
            'EFFECTIVE_ANNUAL_RATE'
        ], 
        [
            # https://cytonnreport.com/research/kenyas-fy2024-2025-budget
            # https://cytonnreport.com/research/nairobi-metropolitan-area-serviced-apartments-report-2021
            lambda row, record: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_EFFECTIVE_ANNUAL_RATE, 
                record['researchdate'], 
                row['EFFECTIVE_ANNUAL_RATE'], 
                row['FUND_MANAGER'],
                fund_manager_filter)
        ]
    ),
    (
        [
            'RANK', 
            'FUND_MANAGER', 
            'EFFECTIVE_ANNUAL'
        ], 
        [
            # https://cytonnreport.com/research/cytonn-monthly-may-2024
            # https://cytonnreport.com/research/q12023-unit-trust-funds-performance-cytonn-monthly-july-2023
            lambda row, record: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_EFFECTIVE_ANNUAL_RATE, 
                record['researchdate'], 
                row['EFFECTIVE_ANNUAL'], 
                row['FUND_MANAGER'],
                fund_manager_filter)
        ]
    ),
    (
        [
            'RANK', 
            'FUND_MANAGER', 
            'DAILY_YIELD', 
            'EFFECTIVE_ANNUAL_RATE'
        ], 
        [
            # Effective Annual Rate is better than Daily Yield: https://cytonnreport.com/research/cytonn-monthly-october-2021
            # https://cytonnreport.com/research/potential-effects-covid-19
            lambda row, record: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_EFFECTIVE_ANNUAL_RATE, 
                record['researchdate'], 
                row['EFFECTIVE_ANNUAL_RATE'], 
                row['FUND_MANAGER'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'NO.',
            'FUND_MANAGERS',
            'Q1_2020_AUM(KSHS_MNS)',
            'Q1_2020MARKET_SHARE',
            'Q2_2020_AUM(KSHS_MNS)',
            'Q2_2020MARKET_SHARE',
            'AUM_GROWTHQ1_2020_Q2_2020'
        ], 
        [
            # https://cytonnreport.com/research/unit-trust-funds-performance-q2-2020
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "Q1'2020", 
                row['Q1_2020_AUM(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "Q2'2020", 
                row['Q2_2020_AUM(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'NO.',
            'FUND_MANAGERS',
            'FY_2019_AUM(KSHS_MNS)',
            'Q1_2020_AUM(KSHS_MNS)',
            'AUM_GROWTH_FY_2019_Q1_2020'
        ], 
        [
            # https://cytonnreport.com/research/unit-trust-funds-perfomance-q1-2020-cytonn-weekly
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2019", 
                row['FY_2019_AUM(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "Q1'2020", 
                row['Q1_2020_AUM(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'NO.',
            'FUND_MANAGERS',
            'FY_2018_AUM_(KSHS_MNS)',
            'H1_2019_AUM_(KSHS_MNS)',
            'AUM_H1_2019_ANNUALIZED_GROWTH'
        ], 
        [
            # https://cytonnreport.com/research/unit-trust-funds-performance
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2018", 
                row['FY_2018_AUM_(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "H1'2019", 
                row['H1_2019_AUM_(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'NO.', 
            'MONEY_MARKET_FUND', 
            '2018_AVERAGE_EFFECTIVE_ANNUAL_YIELD_P.A.'
        ], 
        [
            # https://cytonnreport.com/research/investing-in-unit
            lambda row, record: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_EFFECTIVE_ANNUAL_RATE, 
                record['researchdate'], 
                row['2018_AVERAGE_EFFECTIVE_ANNUAL_YIELD_P.A.'], 
                row['MONEY_MARKET_FUND'],
                fund_manager_filter)
        ]
    ),
    (
        [
            'NO.', 
            'PRODUCT',
            'FY_2018',
            'FY-2018_MARKET_SHARE'
        ], 
        [
            # https://cytonnreport.com/research/investing-in-unit
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2018", 
                row['FY_2018'], 
                row['PRODUCT'],
                fund_manager_filter)
        ]
    ),
    (
        [
            'NO.',
            'FUND_MANAGERS',
            'Q2_2020_AUM',
            'Q2_2020',
            'Q3_2020_AUM',
            'Q3_2020',
            'AUM_GROWTH'
        ], 
        [
            # https://cytonnreport.com/research/unit-trust-fund-performance-q3-1
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "Q2'2020", 
                row['Q2_2020_AUM'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "Q3'2020", 
                row['Q3_2020_AUM'], 
                row['FUND_MANAGERS'],
                fund_manager_filter)
        ]
    ),
    (
        [
            'RANK', 
            'MONEY_MARKET_FUNDS', 
            'EFFECTIVE_ANNUAL_RATE_(AVERAGE_Q3_2020)'
        ], 
        [
            # https://cytonnreport.com/research/unit-trust-fund-performance-q3-1
            lambda row, record: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_EFFECTIVE_ANNUAL_RATE, 
                record['researchdate'], 
                row['EFFECTIVE_ANNUAL_RATE_(AVERAGE_Q3_2020)'], 
                row['MONEY_MARKET_FUNDS'],
                fund_manager_filter)
        ]
    ),
    (
        [
            'NO.',
            'FUND_MANAGERS',
            'FY_2018_AUM(KSHS_MNS)',
            'FY_2019_AUM(KSHS_MNS)',
            'AUM_GROWTHFY_2018_FY_2019'
        ], 
        [
            # https://cytonnreport.com/research/fy2019-utf-performance
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2018", 
                row['FY_2018_AUM(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2019", 
                row['FY_2019_AUM(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'NO.',
            'FUND_MANAGERS',
            'FY_2018_MONEY_MARKET_FUND(KSHS_MNS)',
            'FY_2019_MONEY_MARKET_FUND(KSHS_MNS)',
            'FY_2018_MARKET_SHARE',
            'FY_2019_MARKET_SHARE',
            'VARIANCE'
        ], 
        [
            # https://cytonnreport.com/research/fy2019-utf-performance
            # These columns hold money market fund AUM in KShs millions, not rates
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2018", 
                row['FY_2018_MONEY_MARKET_FUND(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2019", 
                row['FY_2019_MONEY_MARKET_FUND(KSHS_MNS)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'RANK', 
            'MONEY_MARKET_FUNDS', 
            'EFFECTIVE_ANNUAL_RATE_(AVERAGE_FY_2019)'
        ], 
        [
            # https://cytonnreport.com/research/fy2019-utf-performance
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_EFFECTIVE_ANNUAL_RATE, 
                "FY'2019", 
                row['EFFECTIVE_ANNUAL_RATE_(AVERAGE_FY_2019)'], 
                row['MONEY_MARKET_FUNDS'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'NO.', 
            'UNIT_TRUST_FUND_MANAGER', 
            'AUM', 
            '%_OF_MARKET_SHARE'
        ], 
        [
            # https://cytonnreport.com/research/investment-options-in-kenyan-market
            lambda row, record: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                record['researchdate'], 
                row['AUM'], 
                row['UNIT_TRUST_FUND_MANAGER'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'NO.',
            'FUND_MANAGERS',
            'H1_2018_MONEY_MARKET_FUND(KSHS_MN)',
            'FY_2018_MONEY_MARKET_FUND_(KSHS_MN)',
            'H1_2019_MONEY_MARKET_FUND(KSHS_MN)',
            'ANNUALIZED_H1_2019_GROWTH'
        ], 
        [
            # https://cytonnreport.com/research/options-for-your-pension
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "H1'2018", 
                row['H1_2018_MONEY_MARKET_FUND(KSHS_MN)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2018", 
                row['FY_2018_MONEY_MARKET_FUND_(KSHS_MN)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "H1'2019", 
                row['H1_2019_MONEY_MARKET_FUND(KSHS_MN)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
        ]
    ),
    (
        [
            '#',
            'FUND_MANAGERS',
            'H1_2018_MONEY_MARKET_FUND_AUM_(KSHS_MN)',
            'FY_2018_MONEY_MARKET_FUND_AUM(KSHS_MN)',
            'H1_2019_MONEY_MARKET_FUND_AUM(KSHS_MN)',
            'ANNUALIZED_H1_2019_AUM_GROWTH'
        ], 
        [
            # https://cytonnreport.com/research/cytonn-monthly-august-2019
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "H1'2018", 
                row['H1_2018_MONEY_MARKET_FUND_AUM_(KSHS_MN)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2018", 
                row['FY_2018_MONEY_MARKET_FUND_AUM(KSHS_MN)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "H1'2019", 
                row['H1_2019_MONEY_MARKET_FUND_AUM(KSHS_MN)'], 
                row['FUND_MANAGERS'],
                fund_manager_filter),
        ]
    ),
    (
        [
            'NO.',
            'COLLECTIVE_INVESTMENT_SCHEMES',
            'FY_2023_AUM',
            'FY_2023',
            'Q1_2024_AUM',
            'Q1_2024',
            'AUM_GROWTH'
        ], 
        [
            # https://cytonnreport.com/research/q1-2024-unit-trust-funds-performance-note
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "FY'2023", 
                row['FY_2023_AUM'], 
                row['COLLECTIVE_INVESTMENT_SCHEMES'],
                fund_manager_filter),
            lambda row, _: MoneyMarketFund_KES_RecordEntry(
                MoneyMarketFund_KES_RecordEntry.TYPE_ASSETS_UNDER_MANAGEMENT, 
                "Q1'2024", 
                row['Q1_2024_AUM'], 
                row['COLLECTIVE_INVESTMENT_SCHEMES'],
                fund_manager_filter)
        ]
    ),
]

In [181]:
def get_table(table: Tag):
    for tag in table.find_all(True):
        tag.attrs = {} # remove attributes such as colspan and rowspan
    # Iterate through predefined extraction mappings
    for (table_columns, extractor_callbacks) in EXTRACTION_MAP:
        clean_up_tasks: list[Callable[[], None]] = []
        header_tr_s: list[Tag] = table.select('thead tr')
        is_match = False
        # Check if table headers match the expected columns
        for header_tr in header_tr_s:
            header_td_s: list[Tag] = header_tr.find_all('td')
            is_match_new = \
                len(header_td_s) == len(table_columns)\
                and all(
                    [
                        normalize_and_compare_two_strs(header_td.get_text(strip=True), table_column) 
                        for header_td, table_column 
                        in zip(header_td_s, table_columns)
                    ])
            # If not a match, add to cleanup tasks.
            # We queue the cleanup tasks instead of running them immediately, so that header
            # rows are only deleted once we know this table is matched. When the expected
            # columns are matched, the other header rows are removed so that the dataframe
            # ends up with a single header row.
            if not is_match_new:
                clean_up_tasks.append(header_tr.extract)
            is_match = is_match or is_match_new
        # If a match is found, process the table
        if is_match:
            try:
                # Execute cleanup tasks (remove the header rows that did not match)
                for clean_up_task in clean_up_tasks:
                    clean_up_task()
                # Convert table to DataFrame
                table_df = pd.read_html(io.StringIO(str(table)))[0]
                table_df.columns = table_columns
                return (table_df, extractor_callbacks)
            except Exception as e:
                print('error', e, table)
                continue
    return (None, None)

def is_valid_dataframe(df: pd.DataFrame | None) -> bool:
    return df is not None and not df.empty

DEBUG_OPTIONS = dict[
    Literal[
        'log_unmatched_table', 
        'log_invalid_columns', 
        'log_extracted_valid',
        'log_extracted_invalid',
        'log_extractor_count',
    ], 
    Callable[[str], None]
]

def get_tables(html: str, *, debug_options: DEBUG_OPTIONS = {}):
    log_unmatched_table = debug_options.get('log_unmatched_table')
    # Parse the HTML content using BeautifulSoup
    parsed_html = BeautifulSoup(html, "html.parser")
    # Find all <table> elements in the parsed HTML and store them in a list
    # remove duplicates
    tables: list[Tag] = list({ hacky_normalizer(str(table)): table for table in parsed_html.find_all('table')}.values())
    # Iterate over each table found in the HTML
    for table in tables:
        # Generate a DataFrame and a list of extractor callbacks for each table
        table_df, extractor_callbacks = get_table(copy(table))
        # Check if the DataFrame is valid and not None
        if is_valid_dataframe(table_df) and table_df is not None:
            # Yield the DataFrame and the associated callbacks
            yield (table_df, extractor_callbacks)
        else:
            if log_unmatched_table:
                log_unmatched_table(str(table))

def extract_table_by_column_names(report: CYTONN_RECORD_LITERALS, *, debug_options: DEBUG_OPTIONS = {}):
    log_invalid_columns = debug_options.get('log_invalid_columns')
    log_extractor_count = debug_options.get('log_extractor_count')
    log_extracted_valid = debug_options.get('log_extracted_valid')
    log_extracted_invalid = debug_options.get('log_extracted_invalid')
    # Get the HTML content of the report
    report_html = get_report_HTML(report)
    # Generate tables and callbacks using the get_tables function
    table__callback__generator = get_tables(report_html, debug_options=debug_options)
    # Iterate over each table DataFrame and its extractor callbacks
    for table_df, extractor_callbacks in table__callback__generator:
        if log_extractor_count:
            log_extractor_count(len(extractor_callbacks))
        if len(extractor_callbacks) > 0:
            # Apply each callback function to the rows of the table
            for extractor_callback in extractor_callbacks:
                table_rows = [
                    extractor_callback(raw_table_row.to_dict(), report)
                    for _, raw_table_row 
                    in table_df.iterrows()
                ]
                # Convert the processed rows into a new DataFrame
                extracted_df = pd.DataFrame([vars(i) for i in table_rows if i.is_valid()])
                # Check if the extracted DataFrame is valid and yield it
                if is_valid_dataframe(extracted_df):
                    if log_invalid_columns:
                        __invalid_columns = [vars(i) for i in table_rows if not i.is_valid()]
                        if len(__invalid_columns) > 0:
                            log_invalid_columns(pd.DataFrame(__invalid_columns))
                    if log_extracted_valid:
                        log_extracted_valid(extracted_df)
                    yield extracted_df
                elif log_extracted_invalid:
                    log_extracted_invalid(table_df)

### Validating the Parsing

The code below extracts and parses entries from all the records and stores various metrics for validation.
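The logging pattern used for these metrics can be sketched on its own. This is a minimal illustration rather than the notebook's code: `make_debug_options`, `process` and the sample records are hypothetical, and only two of the log keys are shown.

```python
from typing import Any, Callable, Iterator

# Shared store of (record index, log key, logged value) tuples.
log_store: list[tuple[int, str, Any]] = []

def make_debug_options(index: int) -> dict[str, Callable[[Any], None]]:
    """Build logging hooks bound to a single record index (hypothetical helper)."""
    def make_logger(key: str) -> Callable[[Any], None]:
        return lambda value: log_store.append((index, key, value))
    return {key: make_logger(key) for key in ("log_extractor_count", "log_extracted_valid")}

def process(record: str, debug_options: dict[str, Callable[[Any], None]]) -> Iterator[str]:
    """Stand-in for the extraction step: hooks are optional and called only if present."""
    log_count = debug_options.get("log_extractor_count")
    log_valid = debug_options.get("log_extracted_valid")
    if log_count:
        log_count(1)
    result = record.upper()
    if log_valid:
        log_valid(result)
    yield result

results = [
    output
    for index, record in enumerate(["report-a", "report-b"])
    for output in process(record, make_debug_options(index))
]
```

Building the hooks inside a helper binds each logger to its own record index, so every logged value can later be traced back to the report that produced it.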

In [None]:
len(all_cytonn_reports_df), len(all_cytonn_reports), len(all_cytonn_reports_df['id'].unique())

In [None]:
# Stores a tuple of the index of the record, the key for the log and the value logged
debug_log_store: list[tuple[str, str, Any]] = []
def extract_all_records():
    for index, report in tqdm(all_cytonn_reports_df.iterrows(), total=len(all_cytonn_reports_df)):
        debug_options = {
            'log_unmatched_table': 
                lambda value_str: debug_log_store.append((index, 'log_unmatched_table', value_str)),
            'log_invalid_columns': 
                lambda value_df: debug_log_store.append((index, 'log_invalid_columns', value_df)),
            'log_extracted_valid': 
                lambda value_tuple_df: debug_log_store.append((index, 'log_extracted_valid', value_tuple_df)),
            'log_extracted_invalid': 
                lambda value_tuple_df: debug_log_store.append((index, 'log_extracted_invalid', value_tuple_df)),
            'log_extractor_count': 
                lambda value_int: debug_log_store.append((index, 'log_extractor_count', value_int)),
        }
        yield from extract_table_by_column_names(report, debug_options = debug_options)

extracted_records_df = pd.concat(objs = extract_all_records(), ignore_index = True)

# # Save to files
# extracted_records_df.to_json('extracted-records.json', orient='records')

# # Load from files
# extracted_records_df = pd.read_json('extracted-records.json')

extracted_records_df


In [None]:
len(MoneyMarketFund_KES_RecordEntry.INVALID_FUNDS), len(MoneyMarketFund_KES_RecordEntry.INVALID_DATES), len(MoneyMarketFund_KES_RecordEntry.INVALID_VALUES)

In [None]:
np.unique(MoneyMarketFund_KES_RecordEntry.INVALID_FUNDS)

In [None]:
all(
    re.match(r".*(total|usd|average|dollar).*|nan", str(i), flags=re.IGNORECASE) 
    for i 
    in MoneyMarketFund_KES_RecordEntry.INVALID_FUNDS)

In [None]:
np.unique(MoneyMarketFund_KES_RecordEntry.INVALID_VALUES).tolist()

Let's check whether every extractor callback was able to extract a table. We'll do this by comparing, for each report, the number of extracted tables against the number of extractor callbacks.

In [189]:
grouped_by_key_id = groupby(lambda x: (x[0], x[1]), debug_log_store)

In [None]:
def compare_extractor_count_and_extracted_tables(index: int):
    extractor_count_log = grouped_by_key_id.get((index, 'log_extractor_count'))
    if extractor_count_log is None:
        return True
    extractor_count = extractor_count_log[0][2]
    # Use .get with a default so reports with no valid extraction do not raise a KeyError
    extracted_count = len(grouped_by_key_id.get((index, 'log_extracted_valid'), []))
    return len(extractor_count_log) == 1 and extracted_count >= extractor_count

under_extracted_reports = [
    i 
    for i 
    in range(len(all_cytonn_reports)) 
    if not compare_extractor_count_and_extracted_tables(i)
]
under_extracted_reports
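
The comparison above can be tried on toy data. This is a minimal sketch, assuming `groupby` returns a plain dict of lists keyed by `(index, key)`; the helper names `group_by_index_and_key` and `is_fully_extracted`, and the toy log entries, are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Any

# Toy entries shaped like debug_log_store: (report index, log key, logged value)
toy_log: list[tuple[int, str, Any]] = [
    (0, 'log_extractor_count', 2),
    (0, 'log_extracted_valid', 'table A'),
    (0, 'log_extracted_valid', 'table B'),
    (1, 'log_extractor_count', 2),
    (1, 'log_extracted_valid', 'table C'),
    (2, 'log_unmatched_table', '<table></table>'),
]

def group_by_index_and_key(log):
    """Group log entries into {(index, key): [entries]}."""
    grouped = defaultdict(list)
    for entry in log:
        grouped[(entry[0], entry[1])].append(entry)
    return dict(grouped)

def is_fully_extracted(grouped, index: int) -> bool:
    """True when a report yielded at least as many tables as it has extractors."""
    count_log = grouped.get((index, 'log_extractor_count'))
    if count_log is None:
        return True  # no extractor matched, so nothing was expected
    expected = count_log[0][2]
    extracted = len(grouped.get((index, 'log_extracted_valid'), []))
    return len(count_log) == 1 and extracted >= expected

toy_grouped = group_by_index_and_key(toy_log)
under_extracted = [i for i in range(3) if not is_fully_extracted(toy_grouped, i)]
print(under_extracted)
```

Only the toy report at index 1 is flagged, because it expects two tables but logs one valid extraction.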

In [None]:
all_cytonn_reports_df.iloc[under_extracted_reports]

In [None]:
def check_why_less_extracted(index):
    extracted_valid = grouped_by_key_id.get((index, 'log_extracted_valid'), [])
    extracted_invalid = grouped_by_key_id.get((index, 'log_extracted_invalid'), [])
    print(f'extracted_valid_count={len(extracted_valid)}, extracted_invalid_count={len(extracted_invalid)}')
    return { 
        'extracted_valid': extracted_valid, 
        'extracted_invalid': extracted_invalid, 
        'extracted_valid_df1': extracted_valid[0][2] if extracted_valid else "No dataframe", 
        'extracted_invalid_df1': extracted_invalid[0][2] if extracted_invalid else "No dataframe", 
    }

check_why_less_extracted(37)['extracted_invalid_df1']

Report 37 has an unextracted table because that table covers USD money market funds rather than KES funds.

In [None]:
pd.read_html(io.StringIO(grouped_by_key_id.get((292, 'log_unmatched_table'), [])[1][2]))[0]
# all_cytonn_reports_df.iloc[292]['url']

Let's look at the unmatched tables.

In [None]:
def table_has_CIC_fund_manager(tables_str_value: str):
    tables_str_value = tables_str_value.lower()
    # Raw strings avoid invalid-escape warnings for \s
    unwanted_regexes = [r'cic\s*group', r'cic\s*insurance', r'cic\s*academia']
    for unwanted_regex in unwanted_regexes:
        tables_str_value = re.sub(unwanted_regex, "", tables_str_value, flags=re.IGNORECASE)
    return 'cic' in tables_str_value

unmatched_tables = [
    (index, table)
    for (index, key), tables 
        in grouped_by_key_id.items() 
        if key == 'log_unmatched_table' 
    for (_,_,table) 
        in tables 
        if table_has_CIC_fund_manager(table)
]
len(unmatched_tables)
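
To sanity-check the filter above, the same idea can be run on a few toy strings. This is a sketch only: `mentions_cic_fund` is a hypothetical stand-in for the notebook's predicate, and the sample strings are made up rather than taken from any report.

```python
import re

# Phrases that contain "cic" but do not refer to the CIC fund manager's funds
UNWANTED_PATTERNS = [r'cic\s*group', r'cic\s*insurance', r'cic\s*academia']

def mentions_cic_fund(text: str) -> bool:
    """Return True if 'cic' remains after removing the unwanted phrases."""
    text = text.lower()
    for pattern in UNWANTED_PATTERNS:
        text = re.sub(pattern, "", text)
    return 'cic' in text

samples = {
    "<td>CIC Money Market Fund</td>": mentions_cic_fund("<td>CIC Money Market Fund</td>"),
    "<td>CIC Insurance Group</td>": mentions_cic_fund("<td>CIC Insurance Group</td>"),
    "<td>CIC Group</td>": mentions_cic_fund("<td>CIC Group</td>"),
    "<td>Cytonn Money Market Fund</td>": mentions_cic_fund("<td>Cytonn Money Market Fund</td>"),
}
for sample, matched in samples.items():
    print(matched, sample)
```

Because this is a plain substring test, any other word containing "cic" would also match, so the result is best treated as a shortlist for manual review.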

In [271]:
unmatched_table_iterator_index = 0

In [None]:
unmatched_table_index, unmatched_table_html = unmatched_tables[unmatched_table_iterator_index]
unmatched_record = all_cytonn_reports_df.loc[unmatched_table_index, :]
print(
    f"{unmatched_table_iterator_index + 1}/{len(unmatched_tables)};\
    DATE: {unmatched_record['researchdate']};\
    URL: {unmatched_record['url']}")
unmatched_table_iterator_index += 1 if unmatched_table_iterator_index < len(unmatched_tables) - 1 else 0
pd.read_html(io.StringIO(unmatched_table_html))[0]

## Exploratory Data Analysis (EDA)

In [None]:
def expand_date_column(df: pd.DataFrame, expand_column: str):
    for _, row in df.iterrows():
        expanding_values = row[expand_column]
        if isinstance(expanding_values, list):
            start_date = datetime.strptime(expanding_values[0], "%Y-%m-%d")
            end_date = datetime.strptime(expanding_values[1], "%Y-%m-%d")
            start_end_diff_days = (end_date - start_date).days
            day_list = [
                (start_date + timedelta(days=i)).strftime('%Y-%m-%d') 
                for i 
                in range(start_end_diff_days + 1)
            ]
            for day in day_list:
                yield { **row.to_dict(), expand_column: day }
        else:
            yield row.to_dict()

extracted_records_df = pd.DataFrame(expand_date_column(extracted_records_df, 'record_date'))
extracted_records_df
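
The date expansion above turns a `[start, end]` pair into one row per day. A minimal standard-library sketch of that step follows; `expand_date_range` and the toy row are hypothetical names and values used only to illustrate the idea.

```python
from datetime import datetime, timedelta

def expand_date_range(start: str, end: str) -> list[str]:
    """List every day from start to end (inclusive) as YYYY-MM-DD strings."""
    start_date = datetime.strptime(start, "%Y-%m-%d")
    end_date = datetime.strptime(end, "%Y-%m-%d")
    day_count = (end_date - start_date).days
    return [
        (start_date + timedelta(days=i)).strftime("%Y-%m-%d")
        for i in range(day_count + 1)
    ]

# A toy record whose date is a [start, end] range instead of a single day
toy_row = {
    "fund_manager": "Example Fund",
    "record_value": 10.0,
    "record_date": ["2024-02-27", "2024-03-01"],
}

# One copy of the row per day in the range
expanded_rows = [
    {**toy_row, "record_date": day}
    for day in expand_date_range(*toy_row["record_date"])
]
for row in expanded_rows:
    print(row)
```

Each expanded row carries the same value, so a later `groupby(...).mean()` over fund and date treats the reported figure as constant across the whole range.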

In [None]:
grouped_df = extracted_records_df.groupby(
    ['record_type', 'record_date', 'fund_manager'])['record_value'].mean().reset_index()
grouped_df

In [None]:
EAR_df = grouped_df[grouped_df['record_type'] == 'EAR'].drop(columns=['record_type']).copy()
EAR_df['record_date'] = pd.to_datetime(EAR_df['record_date'])
EAR_pivot = EAR_df.pivot(index='record_date', columns='fund_manager', values='record_value')
EAR_pivot

In [None]:
EAR_fig = px.line(EAR_pivot, x=EAR_pivot.index, y=EAR_pivot.columns)
EAR_fig.update_layout(
    height=800,
    margin=dict(t=100),
    title=dict(
        text="Effective Annual Rate",  # Your title here
        y=0.98,                   # Adjust the title's vertical position
        x=0.5,                    # Center the title
        xanchor='center',
        yanchor='top'
    ),
    xaxis=dict(
        side="top",    # This moves the x-axis to the top
        title="Date"   # This sets the title for the x-axis
    ),
    yaxis=dict(
        title="Effective Annual Rate"   # This sets the title for the y-axis
    ),

    legend=dict(
        orientation="h",  # horizontal orientation
        yanchor="bottom",
        y=-4.5,  # move the legend below the plot
        xanchor="center",
        x=0.5
    ))
EAR_fig.update_traces(
    hovertemplate="<br>".join([
        "fund_manager=%{fullData.name}",
        "date=%{x|%Y-%m-%d}",
        "annual_rate=%{y}%",
        # removes any additional trace information that Plotly might add by default.
        "<extra></extra>"
    ])
)
EAR_fig.show()

In [None]:
start_date = '2022-09-01'
end_date = '2022-09-30'

# Filter the DataFrame
filtered_df = all_cytonn_reports_df[
    (pd.to_datetime(all_cytonn_reports_df['researchdate']) >= start_date) & \
    (pd.to_datetime(all_cytonn_reports_df['researchdate']) <= end_date)
]
list(filtered_df.loc[:,'url'])

In [None]:
all_cytonn_reports_df

<hr/>

In [None]:
AUM_df = grouped_df[grouped_df['record_type'] == 'AUM'].drop(columns=['record_type']).copy()
AUM_df['record_date'] = pd.to_datetime(AUM_df['record_date'])
AUM_pivot = AUM_df.pivot(index='record_date', columns='fund_manager', values='record_value')
AUM_pivot

In [None]:
AUM_fig = px.line(AUM_pivot, x=AUM_pivot.index, y=AUM_pivot.columns)
AUM_fig.update_layout(
    height=800,
    margin=dict(t=100),
    title=dict(
        text="Assets Under Management",  # Your title here
        y=0.98,                   # Adjust the title's vertical position
        x=0.5,                    # Center the title
        xanchor='center',
        yanchor='top'
    ),
    xaxis=dict(
        side="top",    # This moves the x-axis to the top
        title="Date"   # This sets the title for the x-axis
    ),
    yaxis=dict(
        title="Assets Under Management"   # This sets the title for the y-axis
    ),

    legend=dict(
        orientation="h",  # horizontal orientation
        yanchor="bottom",
        y=-4.5,  # move the legend below the plot
        xanchor="center",
        x=0.5
    ))
AUM_fig.update_traces(
    hovertemplate="<br>".join([
        "fund_manager=%{fullData.name}",
        "date=%{x|%Y-%m-%d}",
        # AUM is an amount, not a percentage rate
        "aum=%{y}",
        # removes any additional trace information that Plotly might add by default.
        "<extra></extra>"
    ])
)
AUM_fig.show()

<hr/>

## Exploratory Data Analysis (EDA)

In [None]:
def topics_tables_predicate(row: pd.Series):
    html = ' '.join([topic.get('body') for topic in row['topics']])
    parsed_html = BeautifulSoup(html, "html.parser")
    tables: list[Tag] = [table for table in parsed_html.find_all('table')]
    tables_str_value = ' '.join(str(table) for table in tables).lower()
    # Raw strings avoid invalid-escape warnings for \s
    unwanted_regexes = [r'cic\s*group', r'cic\s*insurance', r'cic\s*academia']
    for unwanted_regex in unwanted_regexes:
        tables_str_value = re.sub(unwanted_regex, "", tables_str_value, flags=re.IGNORECASE)
    return 'cic' in tables_str_value

matched_records = []
for report, record in tqdm(all_cytonn_reports_df.iterrows(), total=len(all_cytonn_reports_df)):
    extracted_dfs = extract_table_by_column_names(record)
    extracts = list(extracted_dfs)
    is_topics_match = topics_tables_predicate(record)
    matched_records.append((report, len(extracts), is_topics_match))

In [None]:
indexes_with_cic = [index for index, tables, is_topics_match in matched_records if tables == 0 and is_topics_match]
indexes_with_cic

<hr/>

In [None]:
len(indexes_with_cic)

In [None]:
# Keep the controller returned by webbrowser.get so the chosen browser is actually used
browser = webbrowser.get("/usr/bin/google-chrome %s")
for report in indexes_with_cic[20:]:
    url = str(all_cytonn_reports_df.loc[report, 'url'])
    browser.open(url)

In [None]:
table = all_cytonn_reports_df.loc[
    all_cytonn_reports_df['url'] == 'https://cytonnreport.com/research/unit-trust-fund-performance-q3-1'
].iloc[0]
# extract_table_by_column_names is a generator; materialize it so it can be indexed
df_objs = list(extract_table_by_column_names(table))

In [None]:
df_objs[0]

<hr/>

In [None]:
table = all_cytonn_reports_df.loc[
    all_cytonn_reports_df['url'] == 'https://cytonnreport.com/research/q1-2024-unit-trust-funds-performance-note'
].iloc[0]
EXTRACTION_MAP = [
    (
        [ 'No.', 'Collective Investment Schemes', "FY’2023 AUM", "FY’2023", "Q1'2024 AUM", "Q1’2024", 'AUM Growth'], 
        [
            # https://cytonnreport.com/research/q1-2024-unit-trust-funds-performance-note
            lambda row, _: MoneyMarketFund_KES_RecordEntry("AUM", 'FY’2023', row["FY’2023 AUM"], row['Collective Investment Schemes']),
            lambda row, _: MoneyMarketFund_KES_RecordEntry("AUM", 'Q1’2024', row["Q1'2024 AUM"], row['Collective Investment Schemes'])
        ]
    ),
]
df_objs = extract_table_by_column_names(table)

In [None]:
one = next(df_objs)

In [None]:
one[1] 

## Archives

To keep this analysis reproducible over the long term, the crawled [Cytonn Reports](https://huggingface.co/datasets/ToKnow-ai/money-market-funds-in-kenya__july-2024-archive/viewer/cytonn_reports) and [Capital Markets Authority Approved Fund Managers](https://huggingface.co/datasets/ToKnow-ai/money-market-funds-in-kenya__july-2024-archive/viewer/cma_approved_fund_managers) have been archived at <https://huggingface.co/datasets/ToKnow-ai/money-market-funds-in-kenya__july-2024-archive>.

In [None]:
#|output: false
#|echo: false

from python_utils.upload_dataset import upload_dataframe_to_huggingface

repo_id = "ToKnow-ai/collective-investment-schemes-in-kenya"

# upload_dataframe_to_huggingface(
#     fund_managers_df, 
#     repo_id=repo_id, 
#     dataset_name="Approved Collective Investment Schemes", 
#     split="data")

# upload_dataframe_to_huggingface(
#     fuller_fund_managers_df, 
#     repo_id=repo_id, 
#     dataset_name="Approved Money Market Funds Details", 
#     split="data")

# upload_dataframe_to_huggingface(
#     all_cytonn_reports_df, 
#     repo_id=repo_id, 
#     dataset_name="Cytonn Reports", 
#     split="data")

<!-- #| .content-visible when-format: "html" -->

### CMA Approved Fund Managers

{{< iframe 
  'Loading Approved Fund Managers...' 
  src="https://huggingface.co/datasets/ToKnow-ai/unit-trust-investments-in-kenya-money-market-funds/embed/viewer/C.M.A%20Approved%20Fund%20Managers/data"
  frameborder="0"
  width="100%"
  height="560px" >}}

### Detailed CMA Approved Fund Managers

{{< iframe 
  'Loading Approved Fund Managers...' 
  src="https://huggingface.co/datasets/ToKnow-ai/unit-trust-investments-in-kenya-money-market-funds/embed/viewer/Detailed%20C.M.A%20Approved%20Fund%20Managers/data"
  frameborder="0"
  width="100%"
  height="560px" >}}

### Cytonn Reports

{{< iframe 
  'Loading Cytonn Reports...' 
  src="https://huggingface.co/datasets/ToKnow-ai/unit-trust-investments-in-kenya-money-market-funds/embed/viewer/Cytonn%20Reports/data"
  frameborder="0" 
  width="100%" 
  height="560px" >}}

## Conclusion

<!-- metadata: abstract, preserve_cell=true -->
It seems like a good investment.
<!-- metadata: -->

In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from typing import List, Tuple

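def compound_interest(principal: float, rate: float, time: float, n: int, additions: float, addition_frequency: int) -> Tuple[float, float, float]:
    # Number of whole additions made within `time` years, each at the start of its period
    addition_count = int(time * addition_frequency)
    total_deposits = principal + additions * addition_count
    # Growth of the initial principal, compounded n times per year
    amount = principal * (1 + rate/n)**(n*time)
    for i in range(addition_count):
        # The i-th addition is deposited at year i/addition_frequency
        # and compounds only for the time remaining after that
        amount += additions * (1 + rate/n)**(n * (time - i/addition_frequency))
    interest_earned = amount - total_deposits
    return amount, total_deposits, interest_earned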

def inflation_adjusted_value(future_value: float, inflation_rate: float, time: float) -> float:
    return future_value / (1 + inflation_rate)**time

def calculate_compound_interest(initial_balance: float, annual_rate: float, years: float, 
                                addition: float, addition_frequency: int, inflation_rate: float) -> pd.DataFrame:
    compounding_intervals = {
        'Daily (360/Yr)': 360,
        'Daily (365/Yr)': 365,
        'Monthly (12/Year)': 12,
        'Quarterly': 4,
        'Semi-annually': 2,
        'Annually': 1
    }
    
    results = []
    for interval, n in compounding_intervals.items():
        future_value, total_deposits, interest_earned = compound_interest(initial_balance, annual_rate, years, n, addition, addition_frequency)
        inflation_adjusted = inflation_adjusted_value(future_value, inflation_rate, years)
        results.append({
            'Compounding Interval': interval,
            'Future Value': future_value,
            'Future Value (Inflation Adjusted)': inflation_adjusted,
            'Total Deposits': total_deposits,
            'Interest Earned': interest_earned
        })
    
    return pd.DataFrame(results)

def plot_compound_interest(initial_balance: float, annual_rate: float, years: float, 
                           addition: float, addition_frequency: int) -> go.Figure:
    compounding_intervals = {
        'Daily (365/Yr)': 365,
        'Monthly (12/Year)': 12,
        'Quarterly': 4,
        'Semi-annually': 2,
        'Annually': 1
    }
    
    fig = go.Figure()
    
    for interval, n in compounding_intervals.items():
        x = np.linspace(0, years, int(years * 12) + 1)
        y = [compound_interest(initial_balance, annual_rate, t, n, addition, addition_frequency)[0] for t in x]
        fig.add_trace(go.Scatter(x=x, y=y, mode='lines', name=interval))
    
    fig.update_layout(
        title='Compound Interest Comparison',
        xaxis_title='Years',
        yaxis_title='Account Balance',
        legend_title='Compounding Interval'
    )
    
    return fig

# Example usage
initial_balance = 10000
annual_rate = 0.05
years = 10
addition = 100
addition_frequency = 12  # monthly
inflation_rate = 0.02

# Calculate and display the comparison table
df = calculate_compound_interest(initial_balance, annual_rate, years, addition, addition_frequency, inflation_rate)
print(df.to_string(index=False))

# Plot the compound interest comparison
fig = plot_compound_interest(initial_balance, annual_rate, years, addition, addition_frequency)
fig.show()
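
As a reference for the calculation above, the intended future value can be written out as a formula. This is a sketch under one assumption: each addition is deposited at the start of its period and then compounds for the remaining time. The symbols are introduced here for illustration: $P$ is the initial balance, $r$ the annual rate, $n$ the number of compounding periods per year, $t$ the time in years, $a$ the amount of each addition, $f$ the number of additions per year, and $\pi$ the annual inflation rate.

$$
A(t) = P\left(1 + \frac{r}{n}\right)^{nt} + \sum_{i=0}^{\lfloor t f \rfloor - 1} a\left(1 + \frac{r}{n}\right)^{n\left(t - \frac{i}{f}\right)}
$$

$$
A_{\text{real}}(t) = \frac{A(t)}{(1 + \pi)^{t}}
$$

The first term is the growth of the initial balance, the sum adds each later deposit with its shorter compounding window, and the second equation deflates the result to today's purchasing power.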