<div id="toggle_code">...</div>

<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>

<script src="https://cdn.jsdelivr.net/gh/philipwlewis/jupyterlab-toc-toggle@1.0/jlab-toc-toggle.js"></script>

<style>

.jlab-table td {

border: 1px solid black !important;

text-align: center !important;

background: white !important;

}

.jlab-table table {

background: white !important;

margin: 1em auto 1em auto !important;

text-align: center !important;

border-collapse: collapse !important;

border: 1px solid black !important;

}

.jlab-table th {

border: 1px solid black !important;

text-align: center !important;

background: aliceblue !important;

}

</style>

In [1]:
# Function Definition
%matplotlib agg
import io
import base64
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import HTML
from scipy.stats import linregress, spearmanr, f_oneway, ttest_ind
from scipy.stats import stats

def to_upper_str(df, columns):
    """
    Convert specified columns in a DataFrame to uppercase strings and remove leading and trailing whitespaces.

    Parameters:
        df (DataFrame): The DataFrame containing the data.
        columns (list): A list of column names in the DataFrame to be converted.

    Returns:
        df (DataFrame): A copy of the original DataFrame with the specified columns converted to uppercase strings.
    """
    
    for column in columns:
        df[column] = [str(item).upper().strip() for item in df[column].dropna().tolist()]
    return df

def drop_empty(df, column):
    """
    Remove rows with empty values in the specified column of a DataFrame.

    Parameters:
        df (DataFrame): The DataFrame containing the data.
        column (str): The name of the column in the DataFrame to be checked for empty values.

    Returns:
        df (DataFrame): A copy of the original DataFrame with rows containing empty values in the specified column removed.
    """
    
    df[column].replace('', np.nan, inplace=True)
    df = df.dropna(subset=[column])
    
    return df

def get_outliers(df, columns):
    """
    Get a list of outlier IDs from the specified columns of a DataFrame.

    Parameters:
        df (DataFrame): The DataFrame containing the data.
        columns (list): A list of column names in the DataFrame to check for outliers.

    Returns:
        all_outlier_list (list): A list of outlier IDs found in any of the specified columns.
    """
    
    # Initialize an empty list to store all outlier IDs.
    all_outlier_list = []
    for column in columns:
        
        # Calculate quartiles and IQR for the column.
        Q1 = df[column].quantile(0.25)
        Q3 = df[column].quantile(0.75)
        IQR = Q3 - Q1
        
        # Calculate the lower and upper bounds for outliers.
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        
        # Find outliers and add their IDs to the outlier list.
        all_outlier_set = set(all_outlier_list)
        col_outlier_set = set(df[(df[column] < lower_bound) | (df[column] > upper_bound)]["id"].tolist())
        all_outlier_list = list(all_outlier_set.union(col_outlier_set))
        
    return all_outlier_list

def get_error_bar(values):
    """
    Calculate the mean and standard error of a set of values.

    Parameters:
        values (array-like): An array-like object containing the values.

    Returns:
        tuple: A tuple containing the mean and standard error of the values.
    """
    
    mean = np.mean(values)
    std_err = np.std(values) / np.sqrt(len(values))
    
    return mean, std_err

def linear_regression(x_values, y_values, ax):
    """
    Perform linear regression and plot the best fit line along with the regression equation.

    Parameters:
        x_values (array-like): The x values.
        y_values (array-like): The y values.
        ax (matplotlib.axes.Axes): The axes object where the plot will be drawn.

    Returns:
        tuple: A tuple containing the scatter plot object and the regression coefficients.
    """
    
    # Plot scatter plot.
    scatter = ax.scatter(x=x_values, y=y_values)
    
    # Get regression coefficients.
    regression_coefficients = slope, intercept, rvalue, pvalue, stderr = linregress(x_values,y_values)
    
    # Calculate coordinates for the best fit line and plot it.
    x0 = min(x_values)
    x1 = max(x_values)
    y0 = x0*slope + intercept
    y1 = x1*slope + intercept
    best_fit_line=ax.plot([x0, x1], [y0, y1], "r-")
    
    # Annotate the plot with the regression equation
    if intercept >= 0:
        sign = '+'
    else:
        sign = '-'
    regression_formula = f'y = {slope:.2f}x {sign} {abs(intercept):.2f}'
    ax.text(0.05, 0.95, regression_formula, transform=ax.transAxes, fontsize=12, verticalalignment='top')
    
    return scatter, regression_coefficients

def even_odd_correlation(score_list, ax):
    """
    Calculate the correlation between even and odd indexed elements in a list of scores and plot the regression line.

    Parameters:
        score_list (list of lists): A list containing lists of scores.
        ax (matplotlib.axes.Axes): The axes object where the plot will be drawn.

    Returns:
        tuple: A tuple containing the scatter plot object and the regression coefficients.
    """
    
    # Add scores for odd and even questions for each individual.
    odd_scores = []
    even_scores = []
    for scores in score_list:
        odd_score = 0
        even_score = 0
        for i in range(len(scores)):
            if (i+1)%2 != 0:
                odd_score += scores[i]
            else:
                even_score += scores[i]
        odd_scores.append(odd_score/round(len(scores)/2))
        even_scores.append(even_score/int(len(scores)/2))
    
    # Perform linear regression to the two score lists.
    scatter, regression_coefficients = linear_regression(odd_scores, even_scores, ax)
    
    return scatter, regression_coefficients

def add_subpot_label(axes):
    """
    Add subplot labels (a, b, c, ...) to each subplot.

    Parameters:
        axes (list of matplotlib.axes.Axes): A list of axes objects representing subplots.

    Returns:
        None
    """
    
    subplot_labels = ['a', 'b', 'c', 'd']
    for ax, label in zip(axes, subplot_labels[:len(axes)]):
        ax.text(0.5, -0.2, label, transform=ax.transAxes, fontsize=12, va='top', ha='center')
    
    return

def display_figure(fig, fig_id, caption, w=0.6, fs=12):
    """
    Display a matplotlib figure with a caption.

    Parameters:
        fig (matplotlib.figure.Figure): The matplotlib figure to display.
        fig_id (str): The ID of the figure.
        caption (str): The caption to display below the figure.
        w (float, optional): The width of the figure as a fraction of the available space (default is 0.6).
        fs (int, optional): The font size for title, labels, and legend (default is 12).

    Returns:
        None
    """
    
    # Set dpi for saving the figure.
    plt.rcParams['figure.dpi'] = 300
    plt.rcParams['savefig.dpi'] = 300
    
    # Create a byte stream to store the figure as PNG image.
    pic_IObytes = io.BytesIO()
    
    # set fontsize for title and labels.      
    for ax in fig.axes:
        text_items = [ax.title, ax.xaxis.label, ax.yaxis.label]
        if ax.get_legend() is not None:
            text_items = text_items + ax.get_legend().get_texts()
        for item in (text_items + ax.get_xticklabels() + ax.get_yticklabels() ):
            item.set_fontsize(fs)
    
    # Save the figure as PNG image.
    plt.savefig(pic_IObytes,  format='png', bbox_inches='tight')
    pic_IObytes.seek(0)
    pic_hash = base64.b64encode(pic_IObytes.read())
    
    # Convert the PNG image to HTML img tag and display with the caption.
    img = f'<img margin="auto" width="{w*100}%" src="data:image/png;base64, {str(pic_hash)[2:-1]}" />'
    caption = f'<figcaption style="text-align: center; font-style: italic;">{caption}</figcaption>'
    display(HTML(f'<br><fig width="100%" id={fig_id}><center>{img}</center> {caption}</fig><br>'))
    
    # Close the figure
    plt.close(fig)
    
    return

def display_table(df, table_id, caption, w=1):
    """
    Display a pandas DataFrame as an HTML table with a caption.

    Parameters:
        df (DataFrame): The DataFrame to display.
        table_id (str): The ID of the table.
        caption (str): The caption to display above the table.
        w (float, optional): The width of the table as a fraction of the available space (default is 1).

    Returns:
        None
    """
    
    # Convert the DataFrame to HTML table and display with the caption.
    table = df.round(3).to_html(border=0, classes='table table-striped', justify='center').replace('<table ', f'<table style="margin: auto; width: 60%;" id="{table_id}" ')
    caption = f'<caption style="text-align: center; font-style: italic; font-size:14px;">{caption}</caption>'
    display(HTML(f"""<br>{table[:table.find('>')+1] + caption + table[table.find('>')+1:]}<br>"""))
    
    return

In [2]:
# General data processing.
ANS_result = to_upper_str(drop_empty(pd.read_csv("ANS_Response.csv"), "id"), ["id", "gender"])
Math_result = to_upper_str(drop_empty(pd.read_csv("Math_Ability_Response.csv"), "id"), ["id", "gender"])
Memory_result = to_upper_str(drop_empty(pd.read_csv("Memory_Response.csv"), "username"), ["username", "gender"])
SR_result = to_upper_str(drop_empty(pd.read_csv("Spatial_Reasoning_Response.csv"), "user_id"), ["user_id", "gender"])

ANS_result.rename(columns={'score': 'ANS_score'}, inplace=True)
Math_result.rename(columns={'score': 'Math_score'}, inplace=True)

Memory_result.rename(columns={'username': 'id'}, inplace=True)
Memory_result.rename(columns={'points': 'Memory_score'}, inplace=True)

SR_result.rename(columns={'user_id': 'id'}, inplace=True)
SR_result.rename(columns={'total_score': 'SR_score'}, inplace=True)

main_result = pd.DataFrame()

# Cognitive Test Report

## 1 Introduction

The approximate number system (ANS) is a part of our innate cognition to rapidly and intuitively sense numbers and their relations (1). This sense is active throughout our life and allows us to estimate huge quantities without counting. ANS application is most evident in the visual context such as estimating the number of dots in a single frame, ANS can operate on any approximation independent of modality (1). For example, estimating the number of voices heard from a recording.

Past studies have shown the mathematical aptitude is influenced by ANS (2,3) . This relationship has been documented from the earliest developmental stage where studies have shown that ANS accuracy measured as early as 6 months of age (or early nursery school age) provides an indicator of symbolic math performance (4) . This is so as ANS helps to aid in children’s formation of imprecise numerical estimation which are utilised in magnitude comparison and mathematical learning (5).  Previous studies have also found that visual-spatial working memory strongly influences ANS acuity in a study of children who have difficulty in performing arithmetic calculation otherwise known as developmental dyscalculia (6). Building upon this research, the present study aims to investigate the correlation of ANS vs mathematical ability, ANS vs memory and ANS vs spatial reasoning. Based on previous study, it is expected that mathematical ability has the highest correlation with ANS.

This study also investigates if there is significant difference between males and females in their ANS aptitude. While some studies report that males performed better in approximate arithmetic due to their greater spatial ability, another study reports that number sense ability between males and females do not differ significantly (7,8). Therefore, it is anticipated that while ANS aptitude between males and females might differ it will not be significant. As fatigue may impede cognitive tasks, this study also aims to investigate the correlation of fatigue with ANS aptitude (9). It is expected that more fatigue people tend to score worse in ANS tests.

The significance in this study lies in the understanding how ANS intertwines with mathematical ability, memory and spatial reasoning. Investigation of gender differences and the effect of fatigue will give insight on how gender-related cognitive differences and external factors affect ANS respectively. Findings from this study may be useful in developing strategies for mathematical education that accounts for the impact of ANS acuity in different individuals.

## 2 Method

This research aims to assess participants’ cognitive ability via four different tests, including Approximate Number Sense (ANS) test, Math ability test, Memory test, and Spatial reasoning test. Each test takes about approximately 3 minutes to complete.

### 2.1 Data collection

Our target sample population are mainly students at UCL. To make the sample more representative and meaningful, we not only collected data from BIOS0030 students but also asked other UCL students from different programmes and even different departments to participate in this research.

### 2.2 Consent collection

At the beginning of each of the four tests, a consent form will appear, and participants could choose whether to share their test information and results and allow them to be used in the data analysis. Several additional personal information including age, gender, and the frequency of taking sports are collected because they could contribute to finding factors that might affect people's performance on the cognitive tests.

### 2.3 Test details

#### 2.3.1 ANS test

ANS is the first test, mainly measuring participants' ability to estimate within a short time. A total of 64 figures with settled ratio numbers and random order will be presented to participants in 0.75s. For each trial, participants will have 3 seconds to consider which sides have more dots and make their final decision by clicking the corresponding buttons. After that, there will be a 1.5s pause between each trial. A seed is created in the code to ensure the reproducibility of this test. More specifically, while the 64 figures come from a random arrangement of 16 pictures with a designed ratio, each participant completes the test using the same order of 64 figures.

#### 2.3.2 Math ability test

The maths test measures participant’s mathematical aptitude by answering arithmetic expressions. Each part of the arithmetic expression was shown for 3 seconds one-by-one and then hidden. The fully formed arithmetic expressions were hidden from the participant as they attempted to input the answer. The arithmetic expression had three levels of difficulty. The first level involved simple addition and subtraction operations with lower two digit numbers. The second level involved addition and subtraction operations with higher two digit numbers. The third level involved  addition or subtraction with multiplication operations. Besides the score, the average time taken to answer each question was taken. 

#### 2.3.3 Memory test

The memory test measures the participant's ability to memorise several images within a time period. A total of 4 images were shown to the participant and in each image, there will be a grid that contains various symbols and numbers. The participants would then be required to memorise every single detail of the symbols such as the colour and the position of the symbols in 20 seconds.  The participant will then be presented with 5 questions that correlate to the image that they had been shown.Every question offered four options, and participants had ten seconds to select the right answer. The difficulty of the test would increase from image to image.

#### 2.3.4 Spatial reasoning test

The spatial Reasoning test evaluates a participant's capability to visualize and comprehend three-dimensional space. Participants are presented with a series of 9 questions that involve randomly generated three-dimensional arrangements of cubes. For each question, they are shown 4 two-dimensional images and given 25 seconds to identify the image that cannot be obtained by rotating the given three-dimensional figure. As the test progresses, the complexity of the cube arrangements increases due to an expanding size of the three-dimensional space.

## 3 Result

### 3.1 Split-half reliability test

In [3]:
# Remove outliers based on performance from each test dataset.
ANS_filtered = ANS_result[~ANS_result['id'].isin(get_outliers(ANS_result, ["ANS_score"]))]
Math_filtered = Math_result[~Math_result['id'].isin(get_outliers(Math_result, ["Math_score"]))]
Memory_filtered = Memory_result[~Memory_result['id'].isin(get_outliers(Memory_result, ["Memory_score"]))]
SR_filtered = SR_result[~SR_result['id'].isin(get_outliers(SR_result, ["SR_score"]))]

# Get individual score lists for each test.
ANS_score = []
for score_list in ANS_filtered["correctness"]:
    ANS_score.append([int(score.strip()) for score in score_list.split(',')])
Math_score = []
for score_list in Math_filtered["score_list"]:
    Math_score.append([int(score.strip()) for score in score_list.split(',')])
SR_score = []
for score_list in SR_filtered["score_list"]:
    SR_score.append([int(score.strip()) for score in score_list.split(',')])
memory_score = list(zip(*[Memory_result[col].astype(int) for col in Memory_filtered.columns if col.startswith("Question")]))

In [4]:
# Plot scatter plots depicting the correlation between performances on odd and even-numbered questions.
fig1, axs1 = plt.subplots(1, 4, figsize=(20,5))

ans_scatter, ans_OddvsEven = even_odd_correlation(ANS_score, axs1[0])
math_scatter, math_OddvsEven = even_odd_correlation(Math_score, axs1[1])
memory_scatter, memory_OddvsEven = even_odd_correlation(memory_score, axs1[2])
sr_scatter, sr_OddvsEven = even_odd_correlation(SR_score, axs1[3])

axs1[0].set_title("ANS Test")
axs1[0].set_xlabel("Average Odd Score")
axs1[0].set_ylabel("Average Even Score")
axs1[1].set_title("Math Ability Test")
axs1[1].set_xlabel("Average Odd Score")
axs1[1].set_ylabel("Average Even Score")
axs1[2].set_title("Memory Test")
axs1[2].set_xlabel("Average Odd Score")
axs1[2].set_ylabel("Average Even Score")
axs1[3].set_title("Spatial Reasoning Test")
axs1[3].set_xlabel("Average Odd Score")
axs1[3].set_ylabel("Average Even Score")

add_subpot_label(axs1)

caption = "Figure 1: Scatter plots depicting the correlation between performances on odd and even-numbered questions within four cognitive assessments (ANS Test, Math Ability Test, Memory Test, Spatial Reasoning Test)."
display_figure(fig1, "fig", caption, w=0.8)

"""
Add comments here
"""

'\nAdd comments here\n'

Figure Description for Figure 1

In [5]:
# Display Pearson correlation result for Figure 1.
table1_data = {
    "Test Type":["ANS","Math Ability", "Memory", "Spatial Reasoning"],
    "R-value":[ans_OddvsEven[2], math_OddvsEven[2], memory_OddvsEven[2], sr_OddvsEven[2]], 
    "P-value":[ans_OddvsEven[3], math_OddvsEven[3], memory_OddvsEven[3], sr_OddvsEven[3]]
}

caption = "Table 1: Pearson R-values and P-values for the correlation between odd and even question scores across the four tests."

table1 = pd.DataFrame(table1_data).set_index("Test Type")
display_table(table1, "table1", caption)

Unnamed: 0_level_0,R-value,P-value
Test Type,Unnamed: 1_level_1,Unnamed: 2_level_1
ANS,0.469,0.003
Math Ability,0.195,0.19
Memory,0.403,0.004
Spatial Reasoning,0.226,0.185


Table Description for Table 1

### 3.2 Hypothesis A

In [6]:
# Get individuals taking all four tests, and remove outliers based on performance of each test.
ANS_id_set = set(ANS_result["id"].tolist())
Math_id_set = set(Math_result["id"].tolist())
Memory_id_set = set(Memory_result["id"].tolist())
SR_id_set = set(SR_result["id"].tolist())
intersect_id = list(ANS_id_set.intersection(Math_id_set,Memory_id_set,SR_id_set))

main_result["id"] = intersect_id
main_result = main_result.merge(ANS_result[["id", "ANS_score"]], on="id", how="left")
main_result = main_result.merge(Math_result[["id", "Math_score"]], on="id", how="left")
main_result = main_result.merge(Memory_result[["id", "Memory_score"]], on="id", how="left")
main_result = main_result.merge(SR_result[["id", "SR_score"]], on="id", how="left")

outliers = get_outliers(main_result, ["ANS_score", "Math_score","Memory_score", "SR_score"])
main_filtered = main_result[~main_result['id'].isin(outliers)]

In [7]:
# Plot histogram illustrating the distribution of participant scores across four cognitive tests.
fig2, axs2 = plt.subplots(1, 4, figsize=(30,7.5))

ans_hist=axs2[0].hist(main_filtered["ANS_score"], edgecolor='black', linewidth=1.5, bins=np.arange(42,64,2))
axs2[0].set_xticks(range(42,65))
math_hist=axs2[1].hist(main_filtered["Math_score"], edgecolor='black', linewidth=1.5, bins=np.arange(8,16,1))
axs2[1].set_xticks(range(8,16))
memory_hist=axs2[2].hist(main_filtered["Memory_score"], edgecolor='black', linewidth=1.5, bins=np.arange(0,21,2))
axs2[2].set_xticks(range(0,21))
sr_hist=axs2[3].hist(main_filtered["SR_score"], edgecolor='black', linewidth=1.5, bins=np.arange(0,10,1))
axs2[3].set_xticks(range(0,10))

axs2[0].set_title("ANS Test Distribution")
axs2[0].set_xlabel("Score")
axs2[0].set_ylabel("Number of Participants")
axs2[1].set_title("Math Ability Test Distribution")
axs2[1].set_xlabel("Score")
axs2[1].set_ylabel("Number of Participants")
axs2[2].set_title("Memory Test Distribution")
axs2[2].set_xlabel("Score")
axs2[2].set_ylabel("Number of Participants")
axs2[3].set_title("Spatial Reasoning Test Distribution")
axs2[3].set_xlabel("Score")
axs2[3].set_ylabel("Number of Participants")

add_subpot_label(axs2)

caption = "Figure 2: Histogram illustrating the distribution of participant scores across four cognitive tests."
display_figure(fig2, "fig", caption, w=0.8)

"""
Add comments here
"""

'\nAdd comments here\n'

Figure Description for Figure 2

In [8]:
# Plot scatter plots illustrating the relationship between ANS Test results and performances on the three other cognitive assessments.
fig3, axs3 = plt.subplots(1, 3, figsize=(20,5))
ans_math_scatter, ANSvsMath = linear_regression(main_filtered["ANS_score"], main_filtered["Math_score"], axs3[0])
axs3[0].set_xticks(range(42,65))
ans_memory_scatter, ANSvsMemory = linear_regression(main_filtered["ANS_score"], main_filtered["Memory_score"], axs3[1])
axs3[1].set_xticks(range(42,65))
ans_sr_scatter, ANSvsSR = linear_regression(main_filtered["ANS_score"], main_filtered["SR_score"], axs3[2])
axs3[2].set_xticks(range(42,65))

axs3[0].set_title("ANS Test vs Math Ability Test")
axs3[0].set_xlabel("ANS Score")
axs3[0].set_ylabel("Math Score")
axs3[1].set_title("ANS Test vs Memory Test")
axs3[1].set_xlabel("ANS Score")
axs3[1].set_ylabel("Memory Score")
axs3[2].set_title("ANS Test vs Spatial Reasoning Test")
axs3[2].set_xlabel("ANS Score")
axs3[2].set_ylabel("Spatial Reasoning Score")

add_subpot_label(axs3)

caption = "Figure 3: Scatter plots illustrating the relationship between ANS Test results and performances on the three other cognitive assessments."
display_figure(fig2, "fig", caption, w=0.8)

"""
Add comments here
"""

'\nAdd comments here\n'

Figure Description for Figrue 3

In [9]:
# Display Pearson correlation result for Figure 3.
table2_data = {
    "Test Type":["ANS vs Math Ability","ANS vs Memory", "ANS vs Spatial Reasoning"],
    "R-value":[ANSvsMath[2], ANSvsMemory[2], ANSvsSR[2]], 
    "P-value":[ANSvsMath[3], ANSvsMemory[3], ANSvsSR[3]]
}

caption = "Table 2: Pearson R-values and P-values comparing the correlation of ANS performance with Math Ability, Memory, and Spatial Reasoning scores."
table2 = pd.DataFrame(table2_data).set_index("Test Type")
display_table(table2, "table2", caption)

Unnamed: 0_level_0,R-value,P-value
Test Type,Unnamed: 1_level_1,Unnamed: 2_level_1
ANS vs Math Ability,0.186,0.353
ANS vs Memory,0.217,0.277
ANS vs Spatial Reasoning,0.261,0.189


Table Description for Table 2

In [10]:
# Display Spearman correlation result for Figure 3.
ans_math_corr, ans_math_p_value = spearmanr(main_filtered["ANS_score"], main_filtered["Math_score"])
ans_memory_corr, ans_memory_p_value = spearmanr(main_filtered["ANS_score"], main_filtered["Memory_score"])
ans_sr_corr, ans_sr_p_value = spearmanr(main_filtered["ANS_score"], main_filtered["SR_score"])

table3_data = {
    "Test Type":["ANS vs Math Ability","ANS vs Memory", "ANS vs Spatial Reasoning"],
    "ρ-value": [ans_math_corr, ans_memory_corr, ans_sr_corr],
    "P-value": [ans_math_p_value, ans_memory_p_value, ans_sr_p_value]
}

caption = "Table 3: Spearman R-values and P-values comparing the correlation of ANS performance with Math Ability, Memory, and Spatial Reasoning scores."
table3 = pd.DataFrame(table3_data).set_index("Test Type")
display_table(table3, "table3", caption)

Unnamed: 0_level_0,ρ-value,P-value
Test Type,Unnamed: 1_level_1,Unnamed: 2_level_1
ANS vs Math Ability,0.136,0.499
ANS vs Memory,0.184,0.36
ANS vs Spatial Reasoning,0.215,0.281


Table Description for Table 3

### 3.3 Hypothesis B

In [11]:
# Get individuals are binary gender, and remove outliers based on performance across genders.
genders = ["MALE","FEMALE"]

ans_outliers = []
for gender in genders:
    ans_outliers += get_outliers(ANS_result[ANS_result["gender"] == gender], ["ANS_score"])
ANS_filtered = ANS_result[~ANS_result['id'].isin(ans_outliers) & ANS_result['gender'].isin(genders)]
ANS_by_gender = [ANS_filtered[ANS_filtered['gender'] == gender]['ANS_score'] for gender in genders]

math_outliers = []
for gender in genders:
    math_outliers += get_outliers(Math_result[Math_result["gender"] == gender], ["Math_score"])
Math_filtered = Math_result[~Math_result['id'].isin(math_outliers) & Math_result['gender'].isin(genders)]
Math_by_gender = [Math_filtered[Math_filtered['gender'] == gender]['Math_score'] for gender in genders]

memory_outliers = []
for gender in genders:
    memory_outliers += get_outliers(Memory_result[Memory_result["gender"] == gender], ["Memory_score"])
Memory_filtered = Memory_result[~Memory_result['id'].isin(memory_outliers) & Memory_result['gender'].isin(genders)]
Memory_by_gender = [Memory_filtered[Memory_filtered['gender'] == gender]['Memory_score'] for gender in genders]

sr_outliers = []
for gender in genders:
    sr_outliers += get_outliers(SR_result[SR_result["gender"] == gender], ["SR_score"])
SR_filtered = SR_result[~SR_result['id'].isin(sr_outliers) & SR_result['gender'].isin(genders)]
SR_by_gender = [SR_filtered[SR_filtered['gender'] == gender]['SR_score'] for gender in genders]

results_by_gender = [ANS_by_gender, Math_by_gender, Memory_by_gender, SR_by_gender]

In [12]:
# Plot boxplots illustrating scores for each test across genders.
fig4, axs4 = plt.subplots(1,4, figsize=(20,5))

axs4[0].boxplot(ANS_by_gender)
axs4[0].set_xticklabels(genders)
axs4[1].boxplot(Math_by_gender)
axs4[1].set_xticklabels(genders)
axs4[2].boxplot(Memory_by_gender)
axs4[2].set_xticklabels(genders)
axs4[3].boxplot(SR_by_gender)
axs4[3].set_xticklabels(genders)

axs4[0].set_title("ANS Test")
axs4[0].set_xlabel("Gender")
axs4[0].set_ylabel("Score")
axs4[1].set_title("Math Ability Test")
axs4[1].set_xlabel("Gender")
axs4[1].set_ylabel("Score")
axs4[2].set_title("Memory Test")
axs4[2].set_xlabel("Gender")
axs4[2].set_ylabel("Score")
axs4[3].set_title("Spatial Reasoning Test")
axs4[3].set_xlabel("Gender")
axs4[3].set_ylabel("Score")

add_subpot_label(axs4)

caption = "Figure 4: Boxplots illustrating scores for each test across genders."
display_figure(fig4, "fig4", caption, w=0.8)

"""
Add comments here
"""

'\nAdd comments here\n'

Figure Description for Figure 4

In [13]:
# Plot dot plots comparing test scores across genders.
fig5, axs5 = plt.subplots(1,4, figsize=(20,5))

axs5[0].plot(ANS_filtered['gender'], ANS_filtered['ANS_score'],'.')
axs5[0].set_xlim(-1,2)
axs5[1].plot(Math_filtered['gender'], Math_filtered['Math_score'],'.')
axs5[1].set_xlim(-1,2)
axs5[2].plot(Memory_filtered['gender'], Memory_filtered['Memory_score'],'.')
axs5[2].set_xlim(-1,2)
axs5[3].plot(SR_filtered['gender'], SR_filtered['SR_score'],'.')
axs5[3].set_xlim(-1,2)

for i, result_by_gender in enumerate(results_by_gender):
    for j, gender in enumerate(genders):
        mean, std_err = get_error_bar(result_by_gender[j])
        axs5[i].errorbar(gender, mean, yerr=std_err, fmt='o', color='red', markersize=5) 

axs5[0].set_title(f"ANS Test(p-value:{ttest_ind(*ANS_by_gender)[1]:.3g})")
axs5[0].set_xlabel("Gender")
axs5[0].set_ylabel("Score")
axs5[1].set_title(f"Math Ability Test(p-value:{ttest_ind(*Math_by_gender)[1]:.3g})")
axs5[1].set_xlabel("Gender")
axs5[1].set_ylabel("Score")
axs5[2].set_title(f"Memory Test(p-value:{ttest_ind(*Memory_by_gender)[1]:.3g})")
axs5[2].set_xlabel("Gender")
axs5[2].set_ylabel("Score")
axs5[3].set_title(f"Spatial Reasoning Test(p-value:{ttest_ind(*SR_by_gender)[1]:.3g})")
axs5[3].set_xlabel("Gender")
axs5[3].set_ylabel("Score")

add_subpot_label(axs5)

caption = "Figure 5: Dot plots comparing test scores across genders."
display_figure(fig5, "fig5", caption, w=0.8)

"""
Add comments here
"""

'\nAdd comments here\n'

Figure Description for Figure 5

In [14]:
# Display summary of data for Figure 4 & 5.
table4_data = {
    "Test Type": ["ANS Test", "Math Ability Test", "Memory Test", "Spatial Reasoning Test"],
    "Mean Male": [], 
    "SD Male": [],
    "Mean Female": [],
    "SD Female": [],
    "Mean Difference": [],
    "P-value": [],
    "T-statistic": []
}

for result_by_gender in results_by_gender:
    t_stat, p_value = ttest_ind(result_by_gender[0], result_by_gender[1])
    table4_data["Mean Male"].append(result_by_gender[0].mean())
    table4_data["SD Male"].append(result_by_gender[0].mean())
    table4_data["Mean Female"].append(result_by_gender[1].mean())
    table4_data["SD Female"].append(result_by_gender[1].mean())
    table4_data["Mean Difference"].append(result_by_gender[0].mean()-result_by_gender[1].mean())
    table4_data["P-value"].append(p_value)
    table4_data["T-statistic"].append(t_stat)
    
table4 = pd.DataFrame(table4_data).set_index("Test Type")
caption = "Table 4: Summary of mean scores, standard deviations, score differences, and p-values for each test, segmented by gender."
display_table(table4, "table4", caption)

Unnamed: 0_level_0,Mean Male,SD Male,Mean Female,SD Female,Mean Difference,P-value,T-statistic
Test Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
ANS Test,51.941,51.941,51.778,51.778,0.163,0.907,0.118
Math Ability Test,13.08,13.08,12.0,12.0,1.08,0.023,2.347
Memory Test,10.643,10.643,10.273,10.273,0.37,0.697,0.392
Spatial Reasoning Test,2.833,2.833,2.667,2.667,0.167,0.799,0.257


Table Description for Table 4

### 3.4 Hypothesis C

In [15]:
# Get tiredness levels with entries, and remove outliers based on performance of ANS test across tiredness levels.
tiredness_levels = sorted(ANS_result['tiredness'].unique())
outliers = []
for level in tiredness_levels:
    outliers += get_outliers(ANS_result[ANS_result["tiredness"] == level], ["ANS_score"])
ANS_filtered = ANS_result[~ANS_result['id'].isin(outliers)]

In [16]:
# Plot boxplot and dot plot comparing score distribution and central tendencies in ANS tests across tiredness levels。
fig6, axs6 = plt.subplots(1, 2,figsize=(15, 5))

ANS_by_tiredness = [ANS_filtered[ANS_filtered['tiredness'] == level]['ANS_score'] for level in tiredness_levels]

axs6[0].boxplot(ANS_by_tiredness)
axs6[0].set_xticklabels(tiredness_levels)

axs6[1].plot(ANS_filtered['tiredness'], ANS_filtered['ANS_score'],'.')
for i, level in enumerate(tiredness_levels):
    mean, std_err = get_error_bar(ANS_by_tiredness[i])
    axs6[1].errorbar(i+1, mean, yerr=std_err, fmt='o', color='red', markersize=3)    

axs6[0].set_title("ANS performance according to tiredness level")
axs6[0].set_xlabel("Tiredness Level")
axs6[0].set_ylabel("Score")
axs6[1].set_title("ANS performance according to tiredness level")
axs6[1].set_xlabel("Tiredness Level")
axs6[1].set_ylabel("Score")

add_subpot_label(axs6)

caption = "Figure 6: Analysis of score distribution and central tendencies in ANS tests by tiredness levels, featuring a boxplot for dispersion on the left and a dot plot showing means and error bars on the right."
display_figure(fig6, "fig6", caption, w=0.8)

"""
Add comments here
"""

'\nAdd comments here\n'

Figure Description for Figure 6

In [17]:
# Display ANOVA test results for Figure 6
ans_f_statistic, ans_p_value = f_oneway(*ANS_by_tiredness)

table5_data = {
    "Test Type":["ANOVA Test"],
    "F-statistic": [ans_f_statistic],
    "P-value": [ans_p_value]
}

table5 = pd.DataFrame(table5_data).set_index("Test Type")
caption = "Table 5: ANOVA test results showing F-statistic and P-value for differences in ANS performance across varying levels of tiredness"
display_table(table5, "table5", caption)

Unnamed: 0_level_0,F-statistic,P-value
Test Type,Unnamed: 1_level_1,Unnamed: 2_level_1
ANOVA Test,7.654,0.0


Table Description for Table 5

## 4 Discussion

### 4.1 Half-split Reliability Test

To check the reliability of our tests, we executed a split-half reliability analysis to evaluate the consistency across test outcomes. The questions from the four tests were divided into two sets based on their numbering: odd and even. We utilized linear regression to explore the relationship between the results for odd and even-numbered questions. Out of the four linear regression models, only the ANS and memory tests demonstrated a noticeable linear correlation, with p-values below the 0.05 threshold, which allowed us to reject the hypothesis of no correlation (Figure 1 & Table 1). Conversely, the remaining two tests displayed low r-values around 0.2 and elevated p-values of 0.19, indicating a lack of result consistency. This issue might stem from the imbalanced difficulty level of questions. Specifically, the math test questions were too easy, leading to a high concentration of scores in the upper scoring range and undermining the reliability of these results. On the other hand, the spatial reasoning test proved too challenging, causing scores to cluster at the lower end of the scale, similarly affecting its reliability.

### 4.2 Hypothesis A: Is ANS related to maths, spatial reasoning or memory skills?

This hypothesis examines the correlation between individuals' ANS test performance and their results in three other tests. To accurately quantify this, we utilized a simple linear regression model with the ANS score as the covariate to interpret the outcomes of the other tests.

Prior to constructing the linear regression models, we performed two preparatory steps on our data. Due to the separate collection of data, participants who had not completed all four tests were excluded from this analysis section to ensure a direct comparison and control for background variables, which is essential for a reliable and valid statistical analysis. After removing participants with incomplete test data, outliers that fell outside the interquartile range for any test are dropped, and there were left with a sample size of 29.

Given that our sample size is below 30, we cannot assume a normal distribution for our data. Consequently, we needed to evaluate the normality of the results for each test. Histograms (Figure 2) displaying the score distributions indicated that, aside from the math test which showed a pronounced right-skewed distribution, the results of the other tests somewhat followed a normal distribution, albeit not perfectly. This raises some concerns about the validity of predictions made for the math test using linear regression.

Scatter plots for the three linear regression models (ANS vs Math, ANS vs Memory, and ANS vs Spatial Reasoning) were created, alongside a table presenting Pearson correlation values and p-values for testing the null hypothesis of no linear correlation between the tests (Figure 3). Both the scatter plots and the table suggest a lack of linear correlation - the points' distribution did not show linearity, and the p-values were above 0.05, indicating that the null hypothesis could not be rejected and the correlation was not statistically significant (Table 2). These findings might stem from the abnormal distribution of the test scores. To explore potential monotonic relationships between the test scores, we also assessed Spearman's correlation, which yielded similarly low correlation coefficients and high p-values, indicating no statistical significance (Table 3).

### 4.3 Hypothesis B: Can we detect any significant difference between the scores of males and females across our cognitive tests?

This report also investigates the relationship between the test scores (ANS, Math, Memory, and SRT) and the gender of the participants. The participants were divided into three gender categories: male, female, and others. Since only two participants identified themselves as ‘other’, their data were removed and the report will be focussing on comparing the score obtained between Male and Female participants.

Boxplots (Figure 4) were created to visually represent the distribution of scores obtained by the two genders across the four tests. Figure shows that there are several outliers in the scores obtained from the ANS and Math tests. These outliers need to be removed because they can cause an error in interpreting the data. Additionally, it can be observed that there are overlaps between the boxplots. This could suggest that the differences in the score between male and female participants are not significantly significant. However, a solid conclusion cannot be drawn without running the data through an appropriate statistical test such as t-test.

To conduct the t-test, several measurements need to be calculated. Some of the measurements that are being focused on this part of the report are:

- `mean_score_male` the mean score obtained by the Male participants for each tests.
- `sd_score_male` the standard deviation of the scores obtained by the Male Participants.
- `mean_score_female` the mean score obtained by the Male participants for each tests.
- `sd_score_female` the standard deviation of the scores obtained by the Female Participants.
- `score_diff` the differences of the score mean between the Male and Female Participants.

The values in the score_diff column indicates that there are some differences between the mean score obtained by the male and female participants. The p-value from t-tests can verify the significance of these observed differences.

Based on Figure 5, the results of the comparison can be observed. At a 5% significance level, there are no significant differences between the mean score obtained by the male and female participants in ANS, memory and Spatial Reasoning test, since the p-value of the t-test for these tests are higher than the critical value (0.907, 0.697, 0.799 > 0.5). However, at the same significance level, the differences between the mean score of male and female participants in the Math test is statistically significant. These data indicate that the participants' gender may have an impact solely on their performance in Math tests, while having no effect on their performance in other cognitive tests.

### 4.4 Hypothesis C:

This hypothesis aims to find out whether different tiredness levels, from 1 to 8, will influence participants’ performance on the ANS test. To avoid error from outliers, the upper limit and lower limit are first calculated via IQR and mean score of the total score, and the outliers are then excluded from the later analysis. 

The box plot, Figure 6, shows that the ANS test scores are generally concentrated in a range of 40 to 60 except when the tiredness level is 8. More specifically, participants with higher tiredness levels, 7 and 8, have a generally low mean score under 50, especially for the mean score of tiredness level 8 which is under 30, while participants with other tiredness levels have mean scores above 50. 

To successfully conduct the t-test/anova several assumptions are supposed to be achieved. More specifically, each participant should be independent so that they are not influenced by others. Also, the required normal distribution is fulfilled because the sample size for the ANS test is bigger than 30. Along with the fact that the mean score of each tiredness level has a similar variance versus others, all assumptions required for a t-test/anova are met. 

The null hypothesis for C states that there is no relationship between any different tiredness levels and the ANS performance valued by scores. With a significant level of 0.05, the most frequently used one, the calculated p-values, 0.00013, shows that there is a statistically significant effect on performance between different tiredness levels. We could therefore reject the null hypothesis and conclude that participants have different performance when they are under different tiredness conditions. 

## 5 Summary and Outlook

The study shows no significant correlation between ANS acuity and mathematical ability, spatial reasoning or memory test. Males were shown to perform significantly better than females in mathematics tests. Furthermore, fatigue had a substantial impact on ANS ability.

The findings from this study offers insights into ANS association with cognitive abilities, and how far it is influenced by difference in gender and fatigue. This helps us to understand how ANS may operate with different people under different circumstances Moreover, by focusing on a demographic of individuals aged 19-25, this study offers a perspective on the characteristics of fully developed ANS ability and cognitive maturity. This study also provides a foundation for more comprehensive research to be conducted. Among them is longitudinal studies on ANS development. Investigating the development of ANS acuity with age and education level may provide essential information about how numerical approximation skills develop and the factors that may lead to better ANS acuity. Besides that, studies should be done on adults with developmental dyscalculia, a math learning disorder, to investigate the disorder’s links with ANS impairment. This could lead to better intervention strategies to help adults overcome dyscalculia. 

The importance of ANS in everyday decision making and educational development makes this area of research vital. Adding to our understanding of ANS results in better educational strategies to approach STEM education in all levels with keeping the variability of ANS acuity of different individuals in mind.
