# Financial Services in Tanzania
___

## 1. Introduction

The report at hand uses data from the FSDT Finscope 2017 survey, which contains demographic information and financial service details of around 10,000 individuals in Tanzania. A related dataset, a geospatial map of all cash outlets in Tanzania in 2012, covers commercial banks, community banks, ATMs, micro-finance institutions, mobile money agents, bus stations and post offices; this report analyzes only the survey data.

In this report, we start by loading the dataset and checking for missing observations or invalid values, followed by cleaning and processing the data. We then compute summary statistics for the dataset and analyze it using visualizations to answer questions and draw conclusions.

### 1.1 Importing necessary libraries

The following Python libraries are imported to aid the analysis of the dataset:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### 1.2 Loading the dataset into the notebook

In [None]:
training_data = "../data/training.csv"

financial_df = pd.read_csv(training_data, sep=",")
financial_df.info()

The `.info()` method reveals that the dataset consists of 7094 rows and 37 columns. All columns except `Latitude` and `Longitude` are of type int64, whereas those two columns are float64. There are no null values in any column.

This tells us that the training data contains responses from 7094 individuals surveyed in the FSDT Finscope 2017 survey, and that every one of them answered all the questions. Below is a snapshot of the first ten rows of the dataframe.

In [None]:
financial_df.head(10)

___

## 2. Data Processing

### 2.1 Data validation

The dataframe has no missing values, so nothing needs to be done on that front. The next step is to check whether the values in each column fall within the correct range for that column. Each column aside from `Latitude` and `Longitude` contains numbers that encode answers according to the [table](http://syllabus.africacode.net/projects/data-science-specific/data-visualisation/mobile-money-viz/#:~:text=The%20table%20below%20gives%20the%20variable%20names%20in%20the%20mobile%20money%20data%20file%2C%20with%20a%20description%20of%20the%20questions%20and%20a%20key%20to%20the%20answer%20values.) on the instructions webpage.

The next step is to verify if the numbers within the columns correspond to the key pairings.

In [None]:
def validate_column_keys(dataframe, column_names, range_of_keys):
    """Drop (in place) every row whose value in any of `column_names` is not in `range_of_keys`."""
    invalid_rows_indices = []
    removed_ids = []
    
    # Accept a single column name as well as a list of names
    if isinstance(column_names, str):
        column_names = [column_names]
    
    for column in column_names:
        for index, value in dataframe[column].items():
            # Record each invalid row only once, even if it fails in several columns
            if value not in range_of_keys and index not in invalid_rows_indices:
                invalid_rows_indices.append(index)
                removed_ids.append(dataframe.loc[index, 'ID'])
    
    if invalid_rows_indices:
        dataframe.drop(invalid_rows_indices, inplace=True)
        print(f"{len(invalid_rows_indices)} invalid row(s) removed. IDs removed: {removed_ids}")
    else:
        print("All values are valid.")


The function above checks the specified columns and determines whether their values correspond to the answer keys given in the table. If any value does not, the whole row is removed from the dataframe and the ID of the removed row is printed.
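The same check can also be written without an explicit loop, using pandas' `DataFrame.isin`. The sketch below is an alternative, not the report's implementation; the helper name `find_invalid_rows` and the toy data are hypothetical. A row that fails in several columns is reported only once.

```python
import pandas as pd

def find_invalid_rows(df, columns, valid_keys):
    """Return the IDs of rows where any of `columns` holds a value outside `valid_keys`."""
    if isinstance(columns, str):
        columns = [columns]
    # True for rows where every checked column holds an allowed key
    all_valid = df[columns].isin(list(valid_keys)).all(axis=1)
    return df.loc[~all_valid, "ID"].tolist()

# Hypothetical toy data: the row with ID 3 is out of range in both Q3 and Q6
toy = pd.DataFrame({"ID": [1, 2, 3], "Q3": [1, 4, 9], "Q6": [1, 2, 7]})
print(find_invalid_rows(toy, "Q3", range(1, 5)))          # [3]
print(find_invalid_rows(toy, ["Q3", "Q6"], range(1, 5)))  # [3]
```

The returned IDs could then be removed with, for example, `toy[~toy["ID"].isin(ids)]`.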

The first check is for `Q3`, which takes the values 1, 2, 3 or 4.

In [None]:
validate_column_keys(financial_df, "Q3", range(1,5))

All values in the column `Q3` are valid. The next columns to check are those with yes/no answers, which are encoded either as 1/0 or as 1/2.

In [None]:
columns_with_1_and_0_keys = ["Q8_1","Q8_2", "Q8_3","Q8_4","Q8_5","Q8_6","Q8_7","Q8_8","Q8_9","Q8_10","Q8_11","mobile_money","savings","borrowing","insurance"]

columns_with_1_and_2_keys = ["Q6","Q7","Q12","Q14",]

validate_column_keys(financial_df, columns_with_1_and_0_keys,  range(0,2))
validate_column_keys(financial_df, columns_with_1_and_2_keys,  range(1,3))

This step validates the majority of the columns. The next columns to validate are `Q4`, the highest level of education completed, which takes values from 1 to 7, and `Q5`, "Which of the following applies to you?", which takes values from 1 to 6.

In [None]:
validate_column_keys(financial_df, "Q4", range(1, 8))

The column `Q4` contained values outside the range of acceptable answers. The corresponding rows have been removed and their ID numbers printed as a record.

In [None]:
validate_column_keys(financial_df, "Q5", range(1, 7))


Next are the `Q13` column, which asks respondents when they last sent money, and the `Q15` column, which asks when they last received money.

In [None]:
money_range = list(range(-1, 0)) + list(range(0, 7))
validate_column_keys(financial_df, "Q13", money_range)

In [None]:
validate_column_keys(financial_df, "Q15", money_range)

Both columns are valid. The next columns are `Q16`, which asks how often respondents used mobile money to purchase goods and/or services in the past 12 months, and `Q17`, which asks how often they used mobile money to pay bills in the past 12 months.

In [None]:
mobile_money_range = list(range(-1, 0)) + list(range(0, 6))
validate_column_keys(financial_df, "Q16", mobile_money_range)

In [None]:
validate_column_keys(financial_df, "Q17", mobile_money_range)

Both columns appear to be valid. The next columns deal with the respondents' literacy: `Q18` asks about their literacy in Kiswahili and `Q19` about their literacy in English.

In [None]:
validate_column_keys(financial_df, "Q18", range(0, 6))
validate_column_keys(financial_df, "Q19", range(0, 6))

The next three columns to validate, `Q9`, `Q10` and `Q11`, are different: in addition to being within the accepted range, their answers depend on whether the individual answered yes (`1`) to `Q8_1`, `Q8_2` and `Q8_3` respectively. Validating them therefore requires an extra condition that checks the yes/no answer.

`Q9`, which asks who the individual works for, accepts -1 and the range 1-7, skipping 0. To handle this gap, the list of valid keys is built by concatenating the list from -1 to 0 (0 exclusive) with the list 1-7.
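The key list and the yes/no consistency rule described above can be checked on a few hand-picked values. The helper `is_consistent` is hypothetical and only restates the rule in code: a yes (1) must come with a real answer key, and a no (0) must come with -1. It assumes the key list starts with -1 and that the yes/no column has already been validated as 0/1.

```python
# -1 ("not applicable") followed by the answer keys 1-7; 0 is skipped
employment_type_range = list(range(-1, 0)) + list(range(1, 8))
print(employment_type_range)  # [-1, 1, 2, 3, 4, 5, 6, 7]

def is_consistent(yes_no_answer, dependent_value, keys):
    """Hypothetical helper: check one dependent answer against its yes/no answer.

    Assumes keys[0] is -1 and yes_no_answer is already validated as 0 or 1.
    """
    if yes_no_answer == 1:
        return dependent_value in keys[1:]   # "yes" needs a real answer key
    return dependent_value == -1             # "no" needs "not applicable"

print(is_consistent(1, 3, employment_type_range))   # True
print(is_consistent(1, -1, employment_type_range))  # False
print(is_consistent(0, 5, employment_type_range))   # False
```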


In [None]:
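def validate_if_answer_yes(dataframe, column_to_validate, yes_no_column, range_of_answer):
    """Drop (in place) rows where `column_to_validate` disagrees with the yes/no answer in `yes_no_column`.

    `range_of_answer` must start with -1 (not applicable), followed by the valid answer keys.
    """
    invalid_rows_indices = []
    removed_ids = []
    
    for index, value in dataframe[yes_no_column].items():
        dependent_value = dataframe.loc[index, column_to_validate]
        # "Yes" (1) requires a real answer key; "No" (0) requires -1 (not applicable)
        if (value == 1 and dependent_value not in range_of_answer[1:]) or (value == 0 and dependent_value != -1):
            invalid_rows_indices.append(index)
            removed_ids.append(dataframe.loc[index, 'ID'])
    
    if invalid_rows_indices:
        dataframe.drop(invalid_rows_indices, inplace=True)
        print(f"{len(invalid_rows_indices)} invalid row(s) removed. IDs removed: {removed_ids}")
    
    # Finally, check that every remaining value lies within the full range of keys
    validate_column_keys(dataframe, column_to_validate, range_of_answer)


# -1 (not applicable) followed by the answer keys 1-7; 0 is not a valid key
employment_type_range = list(range(-1, 0)) + list(range(1, 8))

validate_if_answer_yes(financial_df, "Q9", "Q8_1", employment_type_range)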

The code checks that the answers in `Q9` are within the accepted range and consistent with the answers in `Q8_1`. Next, `Q10` and `Q11` are checked in the same way: their answers must agree with the responses in `Q8_2` and `Q8_3` respectively and fall within the acceptable range.

In [None]:
selling_things_range = list(range(-1, 0)) + list(range(1, 11))
validate_if_answer_yes(financial_df, "Q10", "Q8_2", selling_things_range)

88 rows were removed from the original dataframe because their answers in column `Q10` did not correspond to the answers given in column `Q8_2`. After that, all remaining values in column `Q10` were confirmed to fall within the accepted range.

In [None]:
providing_service_range = list(range(-1, 0)) + list(range(1, 13))
validate_if_answer_yes(financial_df, "Q11", "Q8_3", providing_service_range)

27 rows were removed from the `financial_df` dataframe because their answers in `Q11` did not correspond with the yes/no answers in `Q8_3`. Afterwards, `validate_column_keys` was applied to the remaining values of `Q11`, and all of them were found to fall within the range of that column.


The final column to check is the last one, `mobile_money_classification`.

In [None]:
validate_column_keys(financial_df, "mobile_money_classification", range(0, 4))

All the values in all the columns are valid and their answers fall within the expected range of each answer key. 

The dataframe can now be inspected again:

In [None]:
financial_df.info()

All the invalid rows have now been removed from the dataframe, leaving 6977 of the original 7094 rows.

### 2.2 Observing the dataframe's statistics 

Now that the invalid answers have been removed from all the columns, the descriptive statistics of the data can be examined to see what insights they provide.

In [None]:
financial_df_subset_1 = financial_df[["ID","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8_1","Q8_2","Q8_3","Q8_4","Q8_5","Q8_6","Q8_7","Q8_8","Q8_10","Q8_11"]]
financial_df_subset_1_statistics = financial_df_subset_1.describe().round(2)
financial_df_subset_1_statistics

The table of summary statistics has been split into two so that the entire table is visible.

From `Q1`, the average age of the respondents is 38 years, with a standard deviation of 16.36 years. The standard deviation is roughly 43% of the mean, so ages vary considerably around it. The youngest individual is 16 years old and the oldest is 100. The table also shows that 75% of respondents are 48 years or younger, which suggests that the age of 100 is a possible outlier; such high values would also account for the difference between the median and the mean.
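One common way to make the "possible outlier" judgement concrete, not the method used in this report, is Tukey's rule: a value is flagged if it lies above Q3 + 1.5 × IQR, where the IQR is the 75th minus the 25th percentile. Only the 75th percentile (48) is quoted above, so this sketch uses hypothetical ages rather than the survey data; `iqr_upper_fence` is an illustrative name.

```python
import pandas as pd

def iqr_upper_fence(values):
    """Upper fence of Tukey's rule: Q3 + 1.5 * (Q3 - Q1)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    return q3 + 1.5 * (q3 - q1)

# Hypothetical ages, not the survey data
ages = pd.Series([18, 22, 25, 30, 35, 40, 45, 50, 100])
fence = iqr_upper_fence(ages)
print(fence)                        # 75.0
print(ages[ages > fence].tolist())  # [100]
```

On the real data the same call would be `iqr_upper_fence(financial_df["Q1"])`.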

The second column, `Q2`, records gender. Its mean of 1.56 is greater than 1.5, which implies slightly more female than male respondents: with 1 = male and 2 = female, a mean of 1.56 corresponds to about 56% female. Because more than half the answers are 2, the median is 2, so the mean is below the median, making the distribution (formally) left-skewed.

The `Q3` column records marital status and has a mean of 1.79. Since the answers are category codes, the mean is only a rough guide, but it suggests that more people are married than have any other marital status. This is supported by the fact that at least 50% of respondents chose the married status, leaving all the other options to fewer than 50% of respondents.

The `Q4` column records the highest level of education, from 1 (no formal education) to 6 (university or higher), with 7 meaning that the respondent does not know. At least 75% of respondents have primary school or lower as their highest level of education, and at least 25% have no formal education. The mean of 3.06 is consistent with primary school being the typical highest level of schooling.
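Because `Q3` and `Q4` are category codes rather than quantities, their means (1.79 and 3.06) are only rough guides; the share of each answer is more direct. A minimal sketch using `Series.value_counts(normalize=True)` on hypothetical marital-status codes, assuming the code order used for the marital-status chart later in the report (1 = married, 2 = divorced, 3 = widowed, 4 = single/never married):

```python
import pandas as pd

# Hypothetical Q3 answers (1 = married, 2 = divorced, 3 = widowed, 4 = single/never married)
q3 = pd.Series([1, 1, 1, 1, 2, 3, 4, 4])

# Share of respondents per answer code, ordered by code
shares = q3.value_counts(normalize=True).sort_index()
print(shares.to_dict())  # {1: 0.5, 2: 0.125, 3: 0.125, 4: 0.25}
```

On the real data, `financial_df["Q3"].value_counts(normalize=True)` would give the actual shares.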

* The `Q5` column has a 25th percentile of 1, so at least 25% of respondents chose option 1 (they do not own the plot of land they live on). Fewer chose option 3 (a family member owns the plot they live on), and the 75th percentile is option 4 (the plot is rented), which means that at most 25% of individuals chose option 5 (they neither own nor rent any land) or higher.
* The `Q6` column is a yes/no question asking respondents whether they own land (other than the land they live on) for which they have a certificate of ownership. A mean of 1.84 means that about 84% of answers were 2 (No), so the vast majority of individuals do not own such land; accordingly, more than 75% of respondents chose option 2.
* `Q7` asks respondents whether they own a mobile phone. More answered yes, since the mean is below 1.5 and more than 50% of participants chose option 1 (Yes).
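For a yes/no column coded 1 (Yes) and 2 (No), the mean converts directly into a proportion: if a share p of respondents answered 2, the mean is 1·(1 − p) + 2·p = 1 + p, so p = mean − 1. For columns coded 1/0, such as `Q8_1` to `Q8_11` and `mobile_money`, the mean is itself the share of Yes answers. A small check on hypothetical values (the helper name `share_of_code_2` is illustrative):

```python
import pandas as pd

def share_of_code_2(series):
    """Proportion of answers equal to 2 in a column coded 1 (Yes) / 2 (No)."""
    return series.mean() - 1

# Hypothetical column: 3 "Yes" (1) and 7 "No" (2) answers
toy = pd.Series([1, 1, 1, 2, 2, 2, 2, 2, 2, 2])
print(round(share_of_code_2(toy), 2))  # 0.7
print(round((toy == 2).mean(), 2))     # 0.7 (direct count, for comparison)
```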
  
Columns `Q8_1` to `Q8_11` record how respondents earn money, each coded 1 (Yes) or 0 (No), so each mean is the share of respondents who answered yes. Only `Q8_2` has a mean greater than 0.5: more than 50% of respondents said they get money from trading or selling anything they produce, grow, raise, make or collect to sell. Its percentiles agree, since the 25th percentile is 0 (No) while the median is 1 (Yes). All the remaining columns have a mean below 0.5, meaning that more than half of respondents answered no. Every column except `Q8_2` and `Q8_4` has a 75th percentile of 0; for `Q8_4`, the value 1 (Yes) is reached only at the 75th percentile.

In summary, of the 6977 valid respondents, the average age is about 38 with a standard deviation of about 16 years, slightly more are female than male, most are married, and most have primary school or lower as their highest level of education. The data also shows that most respondents either do not own the plot of land they live on or it belongs to a family member, and that the most common source of income is trading or selling things they produce, grow, raise, make or collect with the intention of selling.


In [None]:
financial_df_subset_2 = financial_df[["Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16","Q17","Q18","Q19","Latitude","Longitude","mobile_money","savings","borrowing","insurance","mobile_money_classification"]]
financial_df_subset_2_statistics = financial_df_subset_2.describe().round(2)
financial_df_subset_2_statistics

The second statistic table deals with the remaining columns of the dataframe. 
* Column `Q9` relates to `Q8_1` and applies to people who said they get money from a salary or wage. The mean of -0.79 implies that many people chose -1 (this question does not apply to them); at least 75% of the responses are -1, indicating that many individuals do not receive a salary or wage, as also seen in the statistics for column `Q8_1`.
* Column `Q10` contains the answers of individuals who said yes to `Q8_2`. Its positive mean implies that many values are actual answer codes rather than -1. Looking at the percentiles, at least 25% of the responses are -1 (No to `Q8_2`), whereas the 50th and 75th percentiles fall on actual answers. The high standard deviation reflects variability in the answers, which could be due to some responses acting as outliers.
* Column `Q11` contains the answers of individuals who said yes to `Q8_3`: it asks respondents who earn money by providing a service what kind of service they provide. The mean is negative, implying that this question did not apply to most respondents, which further supports the finding that most respondents earn their money through trading and selling; at least 75% of the responses are -1.
* Column `Q12` asks respondents whether they have sent money to someone. With 1 = Yes and 2 = No, the mean of 1.70 means that about 70% answered No, i.e. most respondents have not sent money to anyone, either within the country or abroad. Correspondingly, fewer than 50% of the answers are 1 (Yes).
* Column `Q13` asks respondents when they last sent money. At least 50% said this question did not apply to them, implying that they do not send money, while the 75th percentile corresponds to having sent money within the last 7 days.
* Column `Q14` asks respondents whether they have received money from someone within the country or abroad. The mean of 1.62 is closer to 2, indicating that a majority (about 62%) have not received money: the 25th percentile is 1 (Yes) while the 50th and 75th percentiles are 2 (No). This is also reflected in `Q15`, which asks when respondents last received money: the median is the not-applicable response, so at least 50% do not receive money, while some received money within the last 30 days or less.
* Columns `Q16` and `Q17` ask how often respondents use mobile money to purchase goods and services or to pay bills. At least 50% never use mobile money to purchase goods and services, while the 75th percentile corresponds to weekly use; paying bills with mobile money does not apply to at least 75% of respondents.
* For literacy, the statistics for `Q18` (Kiswahili) and `Q19` (English) show that at least 50% of respondents can read and write Kiswahili, while the 75th percentile corresponds to being able to do neither; the low mean (1.87) supports the idea that a large majority can read and write in Kiswahili. For English, both the 50th and 75th percentiles correspond to being unable to read or write, and the 25th percentile to being able to read only, which is also supported by the relatively high mean (between 3 and 4).
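Several of these columns, such as `Q9`, `Q10`, `Q11`, `Q13` and `Q15`, mix the code -1 (not applicable) with real answer codes, so their means blend two different things. One option, not used in this report, is to treat -1 as missing before summarising, so that the statistics describe only the respondents to whom the question applied. A sketch with a hypothetical `Q9`-style column:

```python
import pandas as pd

# Hypothetical Q9-style answers: -1 = not applicable, 1-7 = who the respondent works for
q9 = pd.Series([-1, -1, -1, -1, 2, 4, 4, 6])

print(q9.mean())               # 1.5 (blends -1 with real answer codes)

applicable = q9.mask(q9 == -1)  # replace -1 with NaN
print(applicable.count())       # 4 respondents to whom the question applied
print(applicable.mean())        # 4.0 (NaN values are skipped)
```

The same idea applied to the real data would be `financial_df["Q9"].mask(financial_df["Q9"] == -1).describe()`.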

The next column, `mobile_money`, records whether respondents use mobile money. Since it is coded 1 (Yes) or 0 (No), its mean is the share of users, and it implies that more than half the respondents use mobile money, with the 50th percentile and above being Yes. The `savings` column records whether respondents save; its mean implies that fewer than half do, with Yes appearing only at the 75th percentile. The same breakdown applies to the `borrowing` column, which records whether respondents borrow money.
The `insurance` column records whether respondents have insurance. The mean of 0.15 means that only about 15% of respondents have insurance; the 75th percentile is No, so at least 75% indicated that they do not have insurance. The last column, `mobile_money_classification`, identifies how respondents use financial services. The 25th percentile corresponds to respondents who use neither mobile money nor any other financial service, the 50th percentile to those who use mobile money only, and the 75th percentile to those who use mobile money together with at least one other financial service.
___

## 3. Analysis

The best way to visually represent the relationship between different columns is by using graphs that are easy to interpret and understand.

### 3.1 The relationship between age and the type of financial service used

The first relationship examined is between the respondents' ages and the type of financial service they use, to help determine whether age is a factor in which financial service is used.

In [None]:
def plot_mobile_money_classification(df, column, title, xlabel, bins=None, xtick_labels=None):
    """Grouped bar chart of `column` (binned) broken down by mobile_money_classification."""
    # Labels for the classification codes 0-3, in code order
    classifications = ['Does not use any financial service', 'Does not use mobile money', 'Uses mobile money only', 'Uses both']
    df_filtered = df[df['mobile_money_classification'].isin([0, 1, 2, 3])].copy()

    # Default: bins 5 units wide spanning the column's range
    if bins is None:
        bins = range(df_filtered[column].min(), df_filtered[column].max() + 6, 5)

    # include_lowest=True keeps the minimum value in the first right-closed bin
    df_filtered['Bins'] = pd.cut(df_filtered[column], bins=bins, right=True, include_lowest=True)
    # Count respondents per (bin, classification); one column per classification code
    grouped_data = df_filtered.groupby(['Bins', 'mobile_money_classification'], observed=False).size().unstack(fill_value=0)
    grouped_data.plot(kind='bar', stacked=False, figsize=(12, 6))

    plt.title(title, fontweight='bold')
    plt.xlabel(xlabel, fontweight='bold')
    plt.ylabel("Number of financial services users", fontweight='bold')

    if xtick_labels:
        plt.xticks(ticks=range(len(bins) - 1), labels=xtick_labels, rotation=0)

    # Label each bar series by the classification code it actually represents
    legend = plt.legend(title='Mobile Money Classification',
                        labels=[classifications[int(code)] for code in grouped_data.columns])
    legend.get_title().set_fontweight('bold')
    plt.grid(axis='y', linestyle='--', alpha=0.8)
    plt.tight_layout()
    plt.show()

plot_mobile_money_classification(financial_df, 'Q1', "Mobile Money Distribution by Age", "Age")

The function above plots any column against the `mobile_money_classification` column, so that all the charts share the same format without repeating the plotting code each time; any column can be analyzed simply by calling `plot_mobile_money_classification`.
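The age bins in `plot_mobile_money_classification` are built with `pd.cut(..., right=True)`, which produces right-closed intervals such as (16, 21] and (21, 26]. A detail worth knowing is that, by default, a value equal to the lowest edge falls outside every interval and becomes NaN; passing `include_lowest=True` keeps it. A minimal sketch with hypothetical ages:

```python
import pandas as pd

ages = pd.Series([16, 21, 22, 26, 27])
bins = range(16, 32, 5)  # edges 16, 21, 26, 31

default_cut = pd.cut(ages, bins=bins, right=True)
inclusive_cut = pd.cut(ages, bins=bins, right=True, include_lowest=True)

print(default_cut.isna().tolist())    # [True, False, False, False, False]
print(inclusive_cut.isna().tolist())  # [False, False, False, False, False]
```

Without `include_lowest=True`, the youngest respondents (age equal to the minimum) would silently drop out of the chart.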

The bar chart shows the number of individuals who use different financial services, grouped by their `mobile_money_classification` response. The discussion below focuses on three groups: people who use a financial service other than mobile money, users who use mobile money exclusively, and those who use both mobile money and another financial service.

All three groups show a right-skewed age distribution. For those who use another financial service but not mobile money, the peak falls within the age range of 26 (exclusive) to 31 (inclusive), so this is the most common age range for such users; this group also has the second-highest number of users overall. The lowest number of users overall are those who use mobile money exclusively. Their distribution is also skewed to the right, with the peak falling within the age range of 21 (exclusive) to 26 (inclusive). This implies that users who only use mobile money are most commonly between 21 and 26 years old, with a steady decline in usage as age increases. The largest group is those who use both mobile money and another financial service, whose peak also falls within the age range of 21 to 26, with a sharp drop in usage as age increases.

Overall, it appears that it is mostly the younger age ranges that use financial services: although the mean age of all individuals is below 39 years, the peak of every group is below 36 years.

### 3.2 The relationship between gender and the type of financial service used

In [None]:

plot_mobile_money_classification(df=financial_df,
                                 column='Q2',
                                 title='Distribution of Mobile Money Classification by Gender',
                                 xlabel='Gender',
                                 bins=[0.5, 1.5, 2.5],  
                                 xtick_labels=['Male', 'Female'])


When comparing the users of different financial services in Tanzania, both genders follow the same trend. For both male and female respondents, the fewest people indicated that they use mobile money only, with slightly more female than male users, by fewer than 100 participants. Among users of a financial service other than mobile money, there are more female than male respondents, and this is the second most common option. The most common option for both genders is using mobile money together with another financial service, and this is the only option chosen by more male than female participants. These results indicate that both genders mostly use mobile money concurrently with another financial service, and that more women use a financial service other than mobile money.

### 3.3 The relationship between marital status and the type of financial service used

In [None]:
plot_mobile_money_classification(df=financial_df,
                                 column='Q3',
                                 title='Distribution of Mobile Money Classification by Marital Status',
                                 xlabel='Marital Status',
                                 bins=[0.5, 1.5, 2.5, 3.5, 4.5], 
                                 xtick_labels=['Married', 'Divorced', 'Widowed', 'Single/never married'])

Applying the `plot_mobile_money_classification` function to compare the marital status of respondents with the type of financial service they use gives the following results.

At first glance, a large majority of the respondents are married. Most married respondents indicated that they use both mobile money and some other financial service (more than 2000 responses), followed by slightly fewer than 1250 who use some other financial service only, and lastly fewer than 500 who use mobile money only. For the other marital statuses, almost every bar has fewer than 500 responses; the exception is single respondents who use both mobile money and some other financial service, at slightly more than 500. In every marital-status group, the fewest people indicated that they use mobile money only.

### 3.4 The relationship between land ownership and the type of financial service used

In [None]:
plot_mobile_money_classification(df=financial_df,
                                 column='Q6',
                                 title='Distribution of Mobile Money Classification by Land Ownership',
                                 xlabel='Land Ownership',

                                 bins=[0.5, 1.5, 2.5], 
                                 xtick_labels=['Yes', 'No'])


The chart shows that far fewer respondents own land (with a certificate of ownership) than do not. However, despite the differing number of responses for the two options, both follow a similar trend. For both answers, the fewest people indicated that they use only mobile money, followed by those who use some other financial service but not mobile money. By a significant margin, the largest group is those who use both mobile money and some other financial service concurrently.

### 3.5 The relationship between the type of income earned and the type of financial service used

In [None]:
income_columns = ['Q8_1', 'Q8_2', 'Q8_3', 'Q8_4', 'Q8_5', 'Q8_6', 'Q8_7', 'Q8_8', 'Q8_10', 'Q8_11', 'mobile_money_classification']
classifications = ['Does not use any financial service', 'Does not use mobile money', 'Uses mobile money only', 'Uses both']

filtered_df = financial_df[income_columns].copy()
filtered_df = filtered_df[filtered_df['mobile_money_classification'].isin([1, 2, 3])]

column_mapping = {
    'Q8_1': 'Salaries/wages',
    'Q8_2': 'Trading/selling produce',
    'Q8_3': 'Service providing income',
    'Q8_4': 'Piece work/Casual labor',
    'Q8_5': 'Rental income',
    'Q8_6': 'Interest from savings/investments',
    'Q8_7': 'Pension',
    'Q8_8': 'Social welfare grant',
    'Q8_10': 'Expenses covered by others',
    'Q8_11': 'Other',
    'mobile_money_classification': 'Mobile Money Classification'
}

filtered_df.rename(columns=column_mapping, inplace=True)

grouped_data = filtered_df.groupby('Mobile Money Classification').sum()
grouped_data.transpose().plot(kind='bar', stacked=False, figsize=(12, 8))

plt.title('Distribution of Mobile Money Classification by Income Category', fontweight='bold')
plt.xlabel('Income Category', fontweight='bold')
plt.ylabel('Number of financial services users', fontweight='bold')
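# Label each bar series by its classification code (code 0 was filtered out above)
plt.legend(title='Mobile Money Classification', labels=[classifications[int(code)] for code in grouped_data.index])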
plt.grid(axis='y', linestyle='--', alpha=0.8)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

To see how financial-service use is distributed across the type of income respondents earn, the `financial_df` dataframe was reduced to the columns `'Q8_1', 'Q8_2', 'Q8_3', 'Q8_4', 'Q8_5', 'Q8_6', 'Q8_7', 'Q8_8', 'Q8_10', 'Q8_11'` and `'mobile_money_classification'`, and restricted to respondents with classification 1, 2 or 3. Because the income columns are coded 1 (Yes) and 0 (No), summing them within each classification counts the yes responses, and these counts are plotted as a grouped bar chart for the three classifications.
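The aggregation step can be illustrated on hypothetical data: summing 0/1 income columns within each classification counts the Yes answers, and the legend labels have to follow the classification codes that are actually present after filtering. The toy frame below is an assumption for illustration, reusing the report's column names and classification labels.

```python
import pandas as pd

# Labels for classification codes 0-3, as used in the report
classifications = ['Does not use any financial service', 'Does not use mobile money',
                   'Uses mobile money only', 'Uses both']

# Hypothetical toy data: income columns are 1 (Yes) / 0 (No)
toy = pd.DataFrame({
    "Salaries/wages":              [1, 0, 1, 0, 0],
    "Trading/selling produce":     [0, 1, 1, 1, 1],
    "Mobile Money Classification": [3, 3, 3, 1, 2],
})

# Summing 0/1 columns within each classification counts the "Yes" answers
counts = toy.groupby("Mobile Money Classification").sum()
print(counts.loc[3].tolist())  # [2, 2]

# Legend labels must follow the codes actually present (code 0 is absent here)
print([classifications[int(code)] for code in counts.index])
# ['Does not use mobile money', 'Uses mobile money only', 'Uses both']
```

Because a respondent can report several income sources (the third toy row counts towards both columns), the bars across income categories are not mutually exclusive.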

The bar graphs show that most survey respondents earn their income from trading or selling produce, followed by piece work or casual labour, salaries or wages, and service provision. The remaining income methods each have very few responses (fewer than 200).

Most users who earn a living from trading or selling produce prefer to use both mobile money and some other financial service. Following them are users who use some other financial service but not mobile money. A small number of users use mobile money exclusively. A similar trend is observed for users who earn their income from piece work. 

For users who earn their income from salaries or wages and for those who provide a service, the majority use both mobile money and some other financial service. However, unlike trading and piece work, these two income methods show minimal difference between users who use mobile money only and those who use some other financial service.

For the remaining income methods, there are very few responses, with most users indicating that they use both mobile money and some other financial service. The exception is users who receive social welfare, who prefer to use some other financial service and not mobile money. 

For users whose expenses are covered by others, although the number of responses is very small, there appears to be a moderate difference in the type of financial service they use.

___

## 4. Conclusion

The objective of this report was to analyze the responses to the FSDT Finscope 2017 survey, containing demographic information and financial service details of 7,094 individuals in Tanzania, and to see what insights could be drawn about the type of financial services they use. Before the analysis, a check was carried out to ensure that all the responses in the dataframe were valid. After invalid responses were removed, 6977 valid responses remained.

From these remaining respondents, the summary statistics for each column of the `financial_df` dataframe were calculated. The average age of participants was 38.24 &asymp; 38, with the youngest being 16 and the oldest 100 years old. The statistics also indicated that slightly more women than men participated in the survey, which the bar graph by gender supports as well. At least 50% of participants indicated that they were married, which could also affect the responses, since married individuals are often seen as more financially responsible than single or divorced individuals. Another factor observed was the highest level of education, where at least 75% of respondents indicated that their highest level of education was primary school or lower.

While summary statistics and insights were given for each column, emphasis was placed on `mobile_money_classification` and how it is affected by variables such as age, gender, marital status, land ownership and type of income.

According to the age chart, many respondents prefer using both mobile money and some other financial service, with the highest number of users of such services being younger than the mean age; almost all age groups indicate that they would rather use both, or only some other financial service.

Comparing the use of mobile money by gender, the same distribution can be seen for both genders: a large majority prefer to use both mobile money and some other financial service, followed by some other financial service only, with the fewest responses for mobile money only.

Marital status was divided into four groups, 'Married', 'Divorced', 'Widowed' and 'Single/never married', and all four groups show the same pattern: most respondents use both mobile money and some other financial service, followed by some other financial service without mobile money, and finally mobile money only. The only difference is that more respondents were from the married group than from any other. The same distribution appears for the other factors as well, as seen in the `Distribution of Mobile Money Classification by Land Ownership` and `Distribution of Mobile Money Classification by Income Category` graphs.


Based on these responses, it appears that the majority of respondents do not use mobile money alone; most prefer to use it concurrently with other financial services, and others prefer some other financial service over mobile money. There could be many reasons for this, such as technological illiteracy, lack of knowledge of the service, or even lack of access to the devices needed for such services.