<!-- metadata: title -->
# Kenya Unit Trusts: Money Market Fund (KES) Analysis

<!-- metadata: subtitle -->
> ### Can Kenyan Money Market Funds guarantee capital preservation?

<!-- metadata: date, type=date -->
**Published Date:**
2024-03-03

<!-- metadata: date-modified, type=date-->
**Date Modified:**
2024-05-05

<!-- metadata: keywords, type=array -->
**Keywords:**
  - money
  - kenya
  - unit-trusts
  - money-market-funds
  - MMF

<!-- metadata: categories, type=array -->
**Categories:**
  - kenya unit trusts
  - data science
  - money

## Description

<!-- metadata: description -->
A money market fund (MMF) is a type of unit trust in which a fund manager pools money from a group of investors and invests it on their behalf. This removes much of the overhead of managing your own portfolio and, through diversification, can reduce your risk. Let's statistically and critically analyze money market funds in Kenya using publicly available information, and hopefully paint a clearer picture of the state of unit trusts in Kenya.

- What risk factors exist?
- How have money market funds performed?

## Abstract

<!-- metadata: abstract -->

## Introduction

In Kenya, a large share of the population has a bank account, even if not a traditional one. By one forecast, 99.9% of Kenyans will be banked by 2029^[Population share with banking account in Kenya 2014-2029. Published by J. Degenhard, Jan 30, 2024. <https://www.statista.com/forecasts/1149636/bank-account-penetration-forecast-in-kenya>]. As financial access and financial literacy grow, ordinary individuals will want to venture into the territory of financial assets in search of higher interest rates. One of the most attractive entry-level, high-yield financial assets is the unit trust, specifically the money market fund. It is easy to start, deposit and withdraw, its interest compounds daily, and it typically offers a higher rate than bank savings accounts.

## Analysis

### Imports

In [None]:
import sys
import os

# Add the project root (two directories up) to sys.path so local modules such as python_utils can be imported
root_dir = os.path.abspath(os.path.join(os.getcwd(), '../..'))
sys.path.append(root_dir)

%load_ext autoreload
%autoreload 2

In [None]:
import pandas as pd
from pyppeteer.page import Page
import asyncio
import json
import io
from bs4 import BeautifulSoup, Tag
from urllib.request import urlopen
from pyppeteer.network_manager import Request
from tqdm import tqdm
from python_utils.web_screenshot import web_screenshot_async
from python_utils.get_browser import get_browser_page_async

### Fund Managers

Let's start by listing all the fund managers approved by the Capital Markets Authority (CMA) of Kenya.^[Approved Fund Managers by CMA. <https://www.cma.or.ke/licensees-market-players/>]

First, a screenshot of the webpage:

In [None]:
async def action(page: Page):
    await page.waitForSelector('ul.module-accordion')
    elements = await page.querySelectorAll('li .accordion-title')
    # Iterate through the elements to find the one containing 'FUND MANAGERS'
    for element in elements:
        text_content = await page.evaluate('(element) => element.textContent', element)
        if 'FUND MANAGERS' in text_content:
            # Click on the target element
            await element.click()
            break
    else:
        print('Element not found')
    await page.waitForSelector('li.current.builder-accordion-active')
    await asyncio.sleep(1)

await web_screenshot_async(
    "https://www.cma.or.ke/licensees-market-players/", 
    action = action,
    width=1500)

Let's query the "Fund Managers" table.

In [None]:
url_response = urlopen("https://www.cma.or.ke/licensees-market-players/").read()
fund_managers_html_table = BeautifulSoup(url_response, "html.parser")\
    .find('span', string="FUND MANAGERS")\
        .parent\
            .parent\
                .parent\
                    .find('table')

fund_managers_df = pd.read_html(io.StringIO(str(fund_managers_html_table)))[0].dropna()
fund_managers_df
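Counting `.parent` hops ties the query above to the page's exact nesting depth. A slightly more robust alternative is BeautifulSoup's `find_parent`, which climbs to the nearest ancestor with a given tag name. The sketch below runs against a small hypothetical snippet that only imitates an accordion layout; the real CMA markup may differ, so treating the enclosing `li` as the section container is an assumption. It builds the DataFrame from cell text so it does not depend on an `lxml`/`html5lib` install.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup imitating an accordion section; NOT copied from the CMA site.
html = """
<ul class="module-accordion">
  <li>
    <div class="accordion-title"><a><span>FUND MANAGERS</span></a></div>
    <div class="accordion-content">
      <table>
        <tr><th>NAME</th><th>LICENCE NO.</th></tr>
        <tr><td>Example Asset Managers Ltd</td><td>101</td></tr>
      </table>
    </div>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("span", string="FUND MANAGERS")
# Climb to the enclosing <li> (assumed section container) instead of counting .parent hops
section = title.find_parent("li")
table = section.find("table")

# Build the DataFrame from cell text: first row is the header, the rest are records
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])] for tr in table.find_all("tr")]
sketch_df = pd.DataFrame(rows[1:], columns=rows[0])
print(sketch_df)
```

If the section container on the live page turns out to be a different tag, only the argument to `find_parent` needs to change.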

The address of `African Alliance Kenya Asset Management Limited` does not appear to be valid, so let's replace it with `P.O. Box 27639 Nairobi 00506`.

In [None]:
fund_managers_df.loc[fund_managers_df['LICENCE NO.'] == 165, 'ADDRESS'] = 'P.O. Box 27639 Nairobi 00506'
fund_managers_df

### Add Columns

- Location Coordinates (and Google Maps link)
- Headquarters Location/Address/Country
- Launch Date
- Risk Profile
- Trustee
- Custodian
- Auditors
- Minimum Investment
- Minimum Additional Investment
- Initial Fee
- Annual Management Fee
- Distribution
- Assets Under Management (AUM)/Market Share
- Advertised Rate [Gross, Net]
- Withdrawal Duration
- Security (joint account verification/validation)
- Online Portal Availability
- Withdrawal Charges
- Contacts

https://cytonnreport.com/research/cmmf-fact-sheet-june-2021

https://cytonnreport.com/research/cmmf-fact-sheet-may-2021

https://cytonnreport.com/research/cmmf-fact-sheet-april-2021

https://ke.cicinsurancegroup.com/mmf/

### Getting the Performance

According to the Capital Markets Authority (CMA), fund managers are required to publish their yields daily in a reputable newspaper. This means that accessing historical records requires a significant investment of time to collect, aggregate and validate the published yields. That notwithstanding, several important questions arise:
- What counts as a reputable newspaper?
- Which yield do they publish, gross or net?
- What picture does an annualized daily rate paint?
- Is interest deferred or carried forward (reporting a conservative figure and retaining the rest for a rainy day to preserve a picture of good performance)?
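To see why the annualization convention matters, here is a minimal sketch comparing a simple annualization (daily rate × 365) with daily compounding. It assumes a constant daily rate and a 365-day year; which convention each fund manager actually uses in its published figures is exactly what the questions above ask, so the function names and the example rate are illustrative only.

```python
def simple_annualized(daily_rate: float, days: int = 365) -> float:
    """Annualize by multiplying the daily rate by the number of days (no compounding)."""
    return daily_rate * days


def effective_annual_rate(daily_rate: float, days: int = 365) -> float:
    """Compound a constant daily rate over `days` days; rates are fractions, not percents."""
    return (1 + daily_rate) ** days - 1


def daily_rate_from_ear(ear: float, days: int = 365) -> float:
    """Invert effective_annual_rate: the constant daily rate that compounds to `ear`."""
    return (1 + ear) ** (1 / days) - 1


daily = 0.0003  # hypothetical rate of 0.03% per day
print(f"simple: {simple_annualized(daily):.4%}, compounded: {effective_annual_rate(daily):.4%}")
```

With this hypothetical rate the same daily yield reads as about 10.95% simple or about 11.57% compounded, so two funds can only be compared fairly if they annualize the same way.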

Luckily, one of the fund managers (one that has recently found itself in the courts far more often than it would have wished) does excellent investment and market research, and publishes a good-enough weekly and monthly aggregate of fund managers and their performance. We are going to crawl their data and check it for signs of manipulation. If the data comes out clean, we will use it to analyze the trends of fund managers.
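One simple, hypothetical heuristic for the manipulation check is to look at how smooth each fund's reported yields are: a fund that retains income to smooth its returns may show unusually small, or frequently zero, period-to-period changes. The sketch below assumes a wide table shaped like the `pivot` built later in this notebook (report dates as the index, fund managers as columns, yields as values). The function name and the chosen statistics are illustrative, and a smooth series is a prompt for closer inspection, not proof of manipulation.

```python
import pandas as pd


def smoothness_report(rates: pd.DataFrame) -> pd.DataFrame:
    """Summarize how much each column (fund manager) moves between consecutive reports.

    `rates` is assumed to have report dates as the index and one column per fund manager.
    """
    changes = rates.sort_index().diff()
    observed_changes = changes.notna().sum()
    return pd.DataFrame({
        # Standard deviation of period-to-period changes (NaNs are skipped)
        'std_of_changes': changes.std(),
        # Share of observed changes that are exactly zero
        'share_unchanged': (changes == 0).sum() / observed_changes,
        # Number of reports in which the fund appears
        'observations': rates.notna().sum(),
    })


# Hypothetical yields (%) for two made-up funds
example_rates = pd.DataFrame(
    {'Fund A': [10.0, 10.0, 10.0, 10.0], 'Fund B': [10.0, 11.0, 9.0, 12.0]},
    index=pd.to_datetime(['2024-01-05', '2024-01-12', '2024-01-19', '2024-01-26']),
)
print(smoothness_report(example_rates))
```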

### Cytonn Research

We are going to crawl the data from Cytonn Research: https://cytonn.com/researches/categories/1

#### Screenshots

Let's start with a view of the weekly reports.

In [None]:
await web_screenshot_async(
    "https://cytonn.com/researches/categories/1",
    width=1000)

Here is the latest report

In [None]:
await web_screenshot_async(
    "https://cytonnreport.com/research/cytonn-h12024-markets",
    width=1000)

Instead of crawling the HTML of the https://cytonn.com/researches/ page directly, we can fetch JSON from https://cytonnreport.com/get/allreports, the endpoint that the https://cytonnreport.com/research page calls to load its reports.
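The `get_all_cytonn_reports` crawler pages through that endpoint using the `data`, `total` and `last_page` fields of its JSON response. Stripped of the browser plumbing, the paging logic is the loop sketched here. `fetch_page` is a hypothetical stand-in for whatever performs one request (in the notebook, a `fetch` call evaluated inside the browser page), and the fake three-page API exists only to exercise the loop.

```python
from typing import Any, Callable, Optional


def fetch_all_pages(fetch_page: Callable[[int], Optional[dict]]) -> list[Any]:
    """Collect the 'data' items of a paginated JSON API, stopping at 'last_page'."""
    items: list[Any] = []
    current_page = 1
    while True:
        response = fetch_page(current_page)
        data = response.get('data', []) if response else []
        if not data:
            break  # empty or failed page: nothing more to collect
        items.extend(data)
        if response.get('last_page') == current_page:
            break
        current_page += 1
    return items


def fake_fetch_page(page: int, per_page: int = 10, total: int = 25) -> dict:
    """Hypothetical API returning `total` integer records, `per_page` at a time."""
    last_page = -(-total // per_page)  # ceiling division
    start = (page - 1) * per_page
    return {'data': list(range(start, min(start + per_page, total))), 'total': total, 'last_page': last_page}


print(len(fetch_all_pages(fake_fetch_page)))
```

Stopping on either an empty page or `last_page` guards against an API that reports one but not the other.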

In [None]:
async def get_all_cytonn_reports(per_page_count: int = 10):
    page, browser = await get_browser_page_async()
    reports_url = "https://cytonnreport.com/get/allreports"
    reports_headers: dict = None
    reports_method: str = None
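    async def catch_request(request: Request):
        # Record the headers and method the page itself uses to call the reports endpoint,
        # so our own fetch() calls look identical; every request is let through unchanged.
        nonlocal reports_headers
        nonlocal reports_method
        if request.url == reports_url:
            reports_headers = request.headers.copy()
            reports_method = request.method
        await request.continue_()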
    async def get_cytonn_reports(current_page: int):
        js_fetch_fn = f'''
            async () => {{
                const response = await fetch(
                    "{reports_url}", 
                    {{
                        "headers": {json.dumps(reports_headers)},
                        "referrer": "https://cytonnreport.com/research",
                        "referrerPolicy": "no-referrer-when-downgrade",
                        "body": {json.dumps(json.dumps(
                            {
                                "pagination": {
                                    "per_page": per_page_count, 
                                    "current_page": current_page
                                }
                            }))},
                        "method": "{reports_method}",
                        "mode": "cors",
                        "credentials": "include"
                    }});
                const json = await response.json();
                return json;
            }}
        '''
        response_json = await page.evaluate(js_fetch_fn)
        return response_json
    # Enable request interception
    await page.setRequestInterception(True)
    # Attach the request handler
    page.on('request', lambda request: asyncio.ensure_future(catch_request(request)))
    # Navigate to the desired URL
    await page.goto("https://cytonnreport.com/research")
    while not reports_headers:
        await asyncio.sleep(1)
    current_page = 1
    all_reports = []
    pbar: tqdm = None
    while True:
        reports_response = await get_cytonn_reports(current_page)
        reports = reports_response['data'] if reports_response else []
        if len(reports) > 0:
            total = reports_response['total']
            pbar = pbar or tqdm(total=total)
            pbar.update(len(reports))
            all_reports.extend(reports)
            last_page = reports_response['last_page']
            if last_page == current_page:
                break
            current_page += 1
        else:
            break
    await browser.close()
    if pbar:
        pbar.close()
    return all_reports

all_cytonn_reports = await get_all_cytonn_reports()
print(f'There are {len(all_cytonn_reports)} reports')

In [None]:
# https://charanhu.medium.com/converting-pandas-dataframe-into-a-dataset-and-pushing-to-hugging-face-146e2ccac38d
all_cytonn_reports_df = pd.DataFrame(all_cytonn_reports)
with pd.option_context(
  'display.max_columns', None, 
  'display.max_colwidth', 100):
  display(all_cytonn_reports_df)

# all_cytonn_reports_df[['researchdate', 'created_at', 'updated_at', 'deleted_at', 'date']]

In [None]:
import re
from typing import Callable
from copy import copy

fund_manager_names = fund_managers_df.get('NAME').values.tolist()
class RecordInfo:
    def __init__(self, record_type: str, record_date: str, record_value: str, fund_manager: str):
        self.record_type = RecordInfo.__validate_record_type(record_type)
        self.record_date = record_date
        self.record_value = RecordInfo.__validate_record_value(record_value)
        self.fund_manager = RecordInfo.__validate_fund_manager(fund_manager)
    @staticmethod
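    def __validate_fund_manager(value: str) -> str|None:
        # Non-text or blank cells (e.g. NaN from empty table cells) cannot name a fund manager
        if not isinstance(value, str) or not value.strip():
            return None
        first_name = lambda i: i.lower().strip().split(' ')[0]
        names = [i for i in fund_manager_names if first_name(i) in value.lower()]
        if len(names) == 1:
            return names[0]
        # raise Exception('not able to get accurate fund manager!')
        # Unknown names (other than 'Total' rows) are registered as new fund managers
        if 'total' not in value.lower():
            fund_manager_names.append(value)
            return value
        return None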
    @staticmethod
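    def __validate_record_value(value: str|float) -> float:
        # Numbers (including numpy int64/float64 values parsed by read_html) pass straight through
        if pd.api.types.is_number(value):
            return float(value)
        # remove surrounding white space and the percentage sign
        value = str(value).strip().rstrip('%')
        # remove comma and white space
        value = ''.join([i for i in value if i not in [' ', ',']])
        return float(value)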
    @staticmethod
    def __validate_record_type(value: str) -> str|None:
        value = value.upper()
        # 'AUM' - Assets Under Management
        # 'EAR' - Effective Annual Rate
        return value if value in ['AUM', 'EAR'] else None

def column_name_match_fn(x: str, y:str) -> bool:
    return \
        x.strip().lower() == y.strip().lower() or\
        re.sub(r'\s+', ' ', x.strip().lower()).replace(" ", "-") == re.sub(r'\s+', ' ', y.strip().lower()).replace(" ", "-")

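# Each entry pairs a known table header layout with the callbacks that turn each row of a matching table
# into a RecordInfo (record_type 'AUM' = Assets Under Management, 'EAR' = Effective Annual Rate).
# Layouts with an empty callback list are recognized but not yet extracted.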

table_columns_list: list[tuple[list[str], list[Callable[[pd.Series, dict], RecordInfo]]]] = [
    (
        ['Rank', 'Fund Manager', 'Effective Annual Rate'], 
        [
            lambda row, record: RecordInfo("EAR", record['researchdate'], row['Effective Annual Rate'], row['Fund Manager'])
        ]
    ),
    (
        ['Rank', 'Fund Manager', 'Effective Annual'], 
        []
    ),
    (
        ['Rank', 'Fund Manager', 'Daily Yield', 'Effective Annual Rate'], 
        []
    ),
    (
        ['(Kshs mn)', 'Market Share', '(Kshs mn)', 'Market Share', "FY’2023 –Q1'2024"], 
        []
    ),
    (
        ['no.', 'fund-managers', 'q1’2020-aum(kshs-mns)', 'q1’2020market-share', 'q2’2020-aum(kshs-mns)', 'q2’2020market-share', 'aum-growthq1’2020-–-q2’2020'], 
        []
    ),
    (
        ['no.', 'fund-managers', "fy'2019-aum(kshs-mns)", 'q1’2020-aum(kshs-mns)', "aum-growth*fy'2019-–-q1’2020"], 
        []
    ),
    (
        ['no.', 'fund-managers', "fy'2018-aum-(kshs-mns)", "h1'2019-aum-(kshs-mns)", "aum-h1'2019-annualized-growth"], 
        []
    ),
    (
        ['no.', 'money-market-fund', '2018-average-effective-annual-yield-p.a.'], 
        []
    ),
    (
        ['no.', 'fund-managers', 'q2’2020-aum', 'q2’2020', 'q3’2020-aum', 'q3’2020', 'aum-growth'], 
        [
            lambda row, _: RecordInfo("AUM", "Q2 2020", row['q2’2020-aum'], row['fund-managers']),
            lambda row, _: RecordInfo("AUM", "Q3 2020", row['q3’2020-aum'], row['fund-managers'])
        ]
    ),
    (
        ['rank', 'money-market-funds', 'effective-annual-rate-(average-q3’2020)'], 
        []
    ),
    (
        ['no.', 'fund-managers', "fy'2018-aum(kshs-mns)", "fy'2019-aum(kshs-mns)", "aum-growthfy'2018---fy'2019"], 
        []
    ),
    (
        ['no.', 'fund-managers', "fy'2018-money-market-fund(kshs-mns)", "fy'2019-money-market-fund(kshs-mns)", "fy'2018-market-share", "fy'2019-market-share", 'variance'], 
        []
    ),
    (
        ['rank', 'money-market-funds', 'effective-annual-rate-(average-fy’2019)'], 
        []
    ),
    (
        ['no.', 'unit-trust-fund-manager', 'aum', '%-of-market-share'], 
        []
    ),
    (
        ['no.', 'fund-managers', "h1'2018-money-market-fund(kshs-mn)", 'fy’2018-money-market-fund-(kshs-mn)', "h1'2019-money-market-fund(kshs-mn)", "annualized-h1'2019-growth"], 
        []
    ),
    (
        ['#', 'fund-managers', "h1'2018-money-market-fund-aum-(kshs-mn)", "fy'2018-money-market-fund-aum(kshs-mn)", "h1'2019-money-market-fund-aum(kshs-mn)", "annualized-h1'2019-aum-growth"], 
        []
    )
]

def get_table(table: Tag):
    for tag in table.find_all(True):
        tag.attrs = {} # remove tags such as colspan and rowspan
    for (table_columns, extractor_callbacks) in table_columns_list:
        clean_up_tasks: list[Callable[[], None]] = []
        header_tr_s: list[Tag] = table.select('thead tr')
        is_match = False
        for header_tr in header_tr_s:
            header_td_s: list[Tag] = header_tr.find_all('td')
            is_match_new = \
                len(header_td_s) == len(table_columns)\
                and all(
                    [column_name_match_fn(header_td.get_text(strip=True), table_column) 
                     for header_td, table_column 
                     in zip(header_td_s, table_columns)])
            if not is_match_new:
                clean_up_tasks.append(header_tr.extract)
            is_match = is_match or is_match_new
        if is_match:
            try:
                for clean_up_task in clean_up_tasks:
                    clean_up_task()
                table_df = pd.read_html(io.StringIO(str(table)))[0]
                table_df.columns = table_columns
                return (table_df, extractor_callbacks)
            except Exception as e:
                print('error', e, table)
                continue
    return (None, None)

def is_valid_dataframe(df: pd.DataFrame | None) -> bool:
    return df is not None and not df.empty

def get_tables(html: str):
    parsed_html = BeautifulSoup(html, "html.parser")
    tables: list[Tag] = [table for table in parsed_html.find_all('table')]
    for table in tables:
        table_df, extractor_callbacks = get_table(copy(table))
        if is_valid_dataframe(table_df):
            yield (table_df, extractor_callbacks)

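def extract_table_by_column_names(record: pd.Series):
    # 'topics' is expected to be a list of dicts with an HTML 'body'; tolerate missing values
    topics: list[dict] = record['topics'] if isinstance(record['topics'], list) else []
    all_topic_bodies = ' '.join([topic.get('body') or '' for topic in topics])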
    raw_tables__extractor_callbacks = get_tables(all_topic_bodies)
    for raw_table, extractor_callbacks in raw_tables__extractor_callbacks:
        if len(extractor_callbacks) > 0:
            for callback in extractor_callbacks:
                extracted: list[RecordInfo] = [callback(raw_table_row, record) for _,raw_table_row in raw_table.iterrows()]
                yield raw_table, pd.DataFrame([vars(i) for i in extracted if i.fund_manager])
        else:
            yield raw_table, None

In [None]:
# all_cytonn_reports_df.iloc[1]
# some pages have more than one table, e.g. https://cytonnreport.com/research/unit-trust-fund-performance-q3-1
example_record = all_cytonn_reports_df.loc[
    all_cytonn_reports_df['url'] == 'https://cytonnreport.com/research/unit-trust-fund-performance-q3-1'
].iloc[0]
raw_and_extracted_dataframes = extract_table_by_column_names(example_record)

In [None]:
raw_df, extracted_df = next(raw_and_extracted_dataframes)
extracted_df

In [None]:
raw_df

In [None]:
table_paths = 'extracted_tables'
os.makedirs(table_paths, exist_ok=True)
for _,record in tqdm(all_cytonn_reports_df.iterrows(), total=len(all_cytonn_reports_df)):
    raw_and_extracted_dataframes = extract_table_by_column_names(record)
    # A report can yield several extracted tables; number them so later ones don't overwrite earlier ones
    for table_index, (_, extracted_df) in enumerate(raw_and_extracted_dataframes):
        if extracted_df is not None:
            extracted_df.to_json(os.path.join(table_paths, f'{record.id}_{table_index}.json'), orient='records')

In [None]:
from glob import glob

dataframes = []
for filename in glob(f'{table_paths}/*.json'):
    json_df = pd.read_json(filename)
    dataframes.append(json_df)
combined_df = pd.concat(dataframes, ignore_index=True)
combined_df

In [None]:
effective_annual_rate_df = combined_df[combined_df['record_type'] == 'EAR'].drop(columns=['record_type']).copy()
effective_annual_rate_df['record_date'] = pd.to_datetime(effective_annual_rate_df['record_date'])
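# pivot_table averages duplicate (record_date, fund_manager) pairs, where a plain pivot would raise
pivot = effective_annual_rate_df.pivot_table(index='record_date', columns='fund_manager', values='record_value', aggfunc='mean')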
pivot

In [None]:
pivot.plot(figsize=(20, 12))

In [None]:
list(effective_annual_rate_df.groupby(['fund_manager', 'record_date']))[0][1]

In [None]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 25]
})

# Create a DataFrameGroupBy object
grouped = df.groupby(['Date', 'Category'])

# Perform an aggregation (e.g., sum)
aggregated = grouped['Value'].sum().reset_index()

# Now pivot the aggregated data
pivoted = aggregated.pivot(index='Date', columns='Category', values='Value')

pivoted

<hr/>

In [None]:
cant_get_by_topic_df = all_cytonn_reports_df[all_cytonn_reports_df['by_topics'] == False].reset_index(drop=True)
cant_get_by_topic_df

In [None]:
def predicate(row: pd.Series):
    string_value = ' '.join(str(column) for column in row).lower()
    unwanted_regexes = [r'cic\s*group', r'cic\s*insurance', r'cic\s*academia']
    for unwanted_regex in unwanted_regexes:
        string_value = re.sub(unwanted_regex, "", string_value, flags=re.IGNORECASE)
    return 'cic' in string_value
indexes_with_cic = [index for index,x in cant_get_by_topic_df.iterrows() if predicate(x)]
indexes_with_cic

In [None]:
len(indexes_with_cic)

In [None]:
import webbrowser

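# Use the controller returned by webbrowser.get; calling webbrowser.open would use the default browser instead
chrome = webbrowser.get("/usr/bin/google-chrome %s")
for index in indexes_with_cic[40:]:
    url = str(cant_get_by_topic_df.loc[index, 'url'])
    chrome.open(url)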

In [None]:
row = all_cytonn_reports_df.loc[
    all_cytonn_reports_df['url'] == 'https://cytonnreport.com/research/unit-trust-fund-performance-q3-1'
].iloc[0]
dfs = list(extract_table_by_column_names(row))  # materialize the generator so it can be indexed

In [None]:
dfs[0]