<div id="toggle_code">...</div>

<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>

<script src="https://cdn.jsdelivr.net/gh/philipwlewis/jupyterlab-toc-toggle@1.0/jlab-toc-toggle.js"></script>

<style>

.jlab-table td {

border: 1px solid black !important;

text-align: center !important;

background: white !important;

}

.jlab-table table {

background: white !important;

margin: 1em auto 1em auto !important;

text-align: center !important;

border-collapse: collapse !important;

border: 1px solid black !important;

}

.jlab-table th {

border: 1px solid black !important;

text-align: center !important;

background: aliceblue !important;

}

</style>

# Function Definition

In [16]:
%matplotlib agg
import io
import base64
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import HTML, display
from scipy.stats import linregress, spearmanr, f_oneway
from scipy import stats  # the scipy.stats.stats namespace is deprecated

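def to_upper_str(df, column):
    # Work on a copy and upper-case the whole column as strings.
    df = df.copy()
    df[column] = df[column].astype(str).str.upper()
    return df

def drop_empty(df, column):
    # Treat empty strings as missing, then drop rows with a missing value.
    df = df.copy()
    df[column] = df[column].replace('', np.nan)
    df = df.dropna(subset=[column])
    return df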

def get_outliers(df, columns):
    all_outlier_list = []
    for column in columns:
        Q1 = df[column].quantile(0.25)
        Q3 = df[column].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        all_outlier_set = set(all_outlier_list)
        col_outlier_set = set(df[(df[column] < lower_bound) | (df[column] > upper_bound)]["id"].tolist())
        all_outlier_list = list(all_outlier_set.union(col_outlier_set))
        
    return all_outlier_list

def get_best_fit_line(x_value, y_value, ax):
    slope, intercept, rvalue, pvalue, stderr = linregress(x_value,y_value)
    x0 = min(x_value)
    x1 = max(x_value)
    y0 = x0*slope + intercept
    y1 = x1*slope + intercept
    best_fit_line=ax.plot([x0, x1], [y0, y1], "r-")
    
    return slope, intercept, rvalue, pvalue, stderr

def linear_regression(x_value, y_value, ax):
    scatter = ax.scatter(x=x_value, y=y_value)
    regression_coefficients = get_best_fit_line(x_value, y_value, ax)
    
    return scatter, regression_coefficients

def even_odd_correlation(score_list, ax):
    odd_scores = []
    even_scores = []
    for scores in score_list:
        odd_score = 0
        even_score = 0
        for i in range(len(scores)):
            if (i+1)%2 != 0:
                odd_score += scores[i]
            else:
                even_score += scores[i]
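        # Divide by the actual item counts; round() rounds halves to the
        # nearest even integer, so round(9/2) gives 4 rather than 5 odd items.
        odd_scores.append(odd_score/((len(scores)+1)//2))
        even_scores.append(even_score/(len(scores)//2))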
    
    scatter, regression_coefficients = linear_regression(odd_scores, even_scores, ax)
    
    return scatter, regression_coefficients

def display_figure(fig, fig_id, caption, w=0.6, fs=12):
    plt.rcParams['figure.dpi'] = 300
    plt.rcParams['savefig.dpi'] = 300
    pic_IObytes = io.BytesIO()
    
    # set fontsize for title and labels        
    for ax in fig.axes:
        text_items = [ax.title, ax.xaxis.label, ax.yaxis.label]
        if ax.get_legend() is not None:
            text_items = text_items + ax.get_legend().get_texts()
        for item in (text_items + ax.get_xticklabels() + ax.get_yticklabels() ):
            item.set_fontsize(fs)
            
    plt.savefig(pic_IObytes,  format='png', bbox_inches='tight')
    pic_IObytes.seek(0)
    pic_hash = base64.b64encode(pic_IObytes.read()).decode('ascii')
    img = f'<img margin="auto" width="{w*100}%" src="data:image/png;base64, {pic_hash}" />'
    caption = f'<figcaption style="text-align: center; font-style: italic;">{caption}</figcaption>'
    display(HTML(f'<br><fig width="100%" id={fig_id}><center>{img}</center> {caption}</fig><br>'))
    plt.close(fig)

# Data Processing

In [None]:
ANS_result = to_upper_str(drop_empty(pd.read_csv("ANS_Response.csv"), "id"), "id")
Math_result = to_upper_str(drop_empty(pd.read_csv("Math_Ability_Response.csv"), "id"), "id")
Memory_result = to_upper_str(drop_empty(pd.read_csv("Memory_Response.csv"), "username"), "username")
SR_result = to_upper_str(drop_empty(pd.read_csv("Spatial_Reasoning_Response.csv"), "user_id"), "user_id")

ANS_result.rename(columns={'score': 'ANS_score'}, inplace=True)
Math_result.rename(columns={'score': 'Math_score'}, inplace=True)

Memory_result.rename(columns={'username': 'id'}, inplace=True)
Memory_result.rename(columns={'points': 'Memory_score'}, inplace=True)

SR_result.rename(columns={'user_id': 'id'}, inplace=True)
SR_result.rename(columns={'total_score': 'SR_score'}, inplace=True)

main_result = pd.DataFrame()

# Cognitive Test Report

## 1 Introduction

The approximate number system (ANS) is part of our innate cognition that allows us to sense numbers and their relations rapidly and intuitively (1). This sense is active throughout life and allows us to estimate large quantities without counting. Although the ANS is most evident in visual contexts, such as estimating the number of dots in a single frame, it can operate on any approximation independent of modality (1), for example when estimating the number of voices heard in a recording.

Past studies have shown that mathematical aptitude is influenced by the ANS (2,3). This relationship has been documented from the earliest developmental stages: studies have shown that ANS accuracy measured as early as 6 months of age (or at early nursery school age) provides an indicator of symbolic math performance (4). This is because the ANS aids children's formation of imprecise numerical estimates, which are used in magnitude comparison and mathematical learning (5). A study of children with developmental dyscalculia, a difficulty in performing arithmetic calculations, also found that visual-spatial working memory strongly influences ANS acuity (6). Building upon this research, the present study investigates the correlation between ANS and mathematical ability, ANS and memory, and ANS and spatial reasoning. Based on previous studies, mathematical ability is expected to have the highest correlation with ANS.

This study also investigates whether there is a significant difference between males and females in ANS aptitude. While some studies report that males perform better in approximate arithmetic because of their greater spatial ability, another study reports that number sense ability does not differ significantly between males and females (7,8). It is therefore anticipated that any difference in ANS aptitude between males and females will not be significant. As fatigue may impede cognitive tasks, this study also investigates the correlation between fatigue and ANS aptitude (9). More fatigued people are expected to score worse in the ANS test.

The significance of this study lies in understanding how the ANS intertwines with mathematical ability, memory and spatial reasoning. Investigating gender differences and the effect of fatigue will give insight into how gender-related cognitive differences and external factors, respectively, affect the ANS. Findings from this study may be useful in developing strategies for mathematical education that account for differences in ANS acuity between individuals.

## 2 Method

This research assesses participants’ cognitive ability via four tests: an Approximate Number Sense (ANS) test, a math ability test, a memory test, and a spatial reasoning test. Each test takes approximately 3 minutes to complete.

### 2.1 Data collection

Our target population consists mainly of students at UCL. To make the sample more representative, we collected data not only from BIOS0030 students but also from UCL students on other programmes and in other departments.

### 2.2 Consent collection

At the beginning of each of the four tests, a consent form appeared, and participants could choose whether to share their test information and results and allow them to be used in the data analysis. Additional personal information, including age, gender, and how often they take part in sports, was collected because it could help identify factors that affect performance on the cognitive tests.

### 2.3 Test details

#### 2.3.1 ANS test

The ANS test is the first test and mainly measures participants' ability to estimate quantities within a short time. A total of 64 figures with fixed dot ratios are presented in random order, each for 0.75 s. On each trial, participants have 3 seconds to decide which side has more dots and submit their decision by clicking the corresponding button. There is a 1.5 s pause between trials. A seed is set in the code to ensure the reproducibility of the test: although the 64 figures come from a random arrangement of 16 pictures with designed ratios, every participant completes the test with the same order of 64 figures.
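
The seeding idea described above can be sketched as follows. This is a minimal illustration rather than the test's actual code: the seed value, the function name `build_trial_order`, and the assumption that each of the 16 pictures appears four times are all hypothetical.

```python
import random

def build_trial_order(n_pictures=16, n_trials=64, seed=2024):
    """Return a reproducible random arrangement of picture indices.

    Assumption: every picture is repeated equally often (64 / 16 = 4 times).
    """
    repeats = n_trials // n_pictures
    order = [pic for pic in range(n_pictures) for _ in range(repeats)]
    # A dedicated, seeded generator yields the same shuffle on every run.
    random.Random(seed).shuffle(order)
    return order

first_run = build_trial_order()
second_run = build_trial_order()
print(first_run == second_run)  # the two runs share one trial order
```

Because the generator is seeded, every call returns the same sequence, which is what lets all participants see the figures in an identical order.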

#### 2.3.2 Math ability test

The math test measures participants' mathematical aptitude by asking them to evaluate arithmetic expressions. Each part of an expression was shown for 3 seconds, one at a time, and then hidden, so the fully formed expression was never visible while the participant entered the answer. The expressions had three levels of difficulty. The first level involved simple addition and subtraction with lower two-digit numbers. The second level involved addition and subtraction with higher two-digit numbers. The third level combined addition or subtraction with multiplication. Besides the score, the average time taken to answer each question was recorded.

#### 2.3.3 Memory test

The memory test measures participants' ability to memorise several images within a limited time. A total of 4 images were shown, each containing a grid of various symbols and numbers. Participants had 20 seconds to memorise the details of the symbols, such as their colour and position. They were then presented with 5 questions about the image they had been shown. Every question offered four options, and participants had ten seconds to select the right answer. The difficulty of the test increased from image to image.

#### 2.3.4 Spatial reasoning test

The spatial reasoning test evaluates a participant's ability to visualize and comprehend three-dimensional space. Participants are presented with a series of 9 questions that involve randomly generated three-dimensional arrangements of cubes. For each question, they are shown 4 two-dimensional images and given 25 seconds to identify the image that cannot be obtained by rotating the given three-dimensional figure. As the test progresses, the complexity of the cube arrangements increases because the three-dimensional space grows in size.

## 3 Result

### 3.1 Split-half reliability test

In [3]:
ANS_filtered = ANS_result[~ANS_result['id'].isin(get_outliers(ANS_result, ["ANS_score"]))]
Math_filtered = Math_result[~Math_result['id'].isin(get_outliers(Math_result, ["Math_score"]))]
Memory_filtered = Memory_result[~Memory_result['id'].isin(get_outliers(Memory_result, ["Memory_score"]))]
SR_filtered = SR_result[~SR_result['id'].isin(get_outliers(SR_result, ["SR_score"]))]

In [4]:
fig1, axs1 = plt.subplots(1, 4, figsize=(30,5))
ANS_score = []
for score_list in ANS_filtered["correctness"]:
    ANS_score.append([int(score.strip()) for score in score_list.split(',')])
Math_score = []
for score_list in Math_filtered["score_list"]:
    Math_score.append([int(score.strip()) for score in score_list.split(',')])
SR_score = []
for score_list in SR_filtered["score_list"]:
    SR_score.append([int(score.strip()) for score in score_list.split(',')])

# Use the outlier-filtered frame, consistent with the other three tests.
memory_score = list(zip(*[Memory_filtered[col].astype(int) for col in Memory_filtered.columns if col.startswith("Question")]))

ans_scatter, ans_OddvsEven = even_odd_correlation(ANS_score, axs1[0])
math_scatter, math_OddvsEven = even_odd_correlation(Math_score, axs1[1])
memory_scatter, memory_OddvsEven = even_odd_correlation(memory_score, axs1[2])
sr_scatter, sr_OddvsEven = even_odd_correlation(SR_score, axs1[3])

axs1[0].set_title("ANS Test")
axs1[0].set_xlabel("Average Odd Score")
axs1[0].set_ylabel("Average Even Score")
axs1[1].set_title("Math Ability Test")
axs1[1].set_xlabel("Average Odd Score")
axs1[1].set_ylabel("Average Even Score")
axs1[2].set_title("Memory Test")
axs1[2].set_xlabel("Average Odd Score")
axs1[2].set_ylabel("Average Even Score")
axs1[3].set_title("Spatial Reasoning Test")
axs1[3].set_xlabel("Average Odd Score")
axs1[3].set_ylabel("Average Even Score")

caption = "Figure 1: Scatter plots depicting the correlation between performances on odd and even-numbered items within four cognitive assessments (ANS Test, Math Ability Test, Memory Test, Spatial Reasoning Test)."
display_figure(fig1, "fig", caption, w=1)

Pearson_correlation = {
    "Test Type":["ANS","Math Ability", "Memory", "Spatial Reasoning"],
    "r-value":[ans_OddvsEven[2], math_OddvsEven[2], memory_OddvsEven[2], sr_OddvsEven[2]], 
    "P-value":[ans_OddvsEven[3], math_OddvsEven[3], memory_OddvsEven[3], sr_OddvsEven[3]]
}

Pearson_correlation_table = pd.DataFrame(Pearson_correlation).set_index("Test Type")
Pearson_correlation_table

| Test Type | r-value | P-value |
|---|---|---|
| ANS | 0.469388 | 0.002949 |
| Math Ability | 0.194567 | 0.190016 |
| Memory | 0.403446 | 0.00367 |
| Spatial Reasoning | 0.225972 | 0.185106 |


### 3.2 Hypothesis A

In [5]:
ANS_id_set = set(ANS_result["id"].tolist())
Math_id_set = set(Math_result["id"].tolist())
Memory_id_set = set(Memory_result["id"].tolist())
SR_id_set = set(SR_result["id"].tolist())
intersect_id = list(ANS_id_set.intersection(Math_id_set,Memory_id_set,SR_id_set))

In [6]:
main_result["id"] = intersect_id
main_result = main_result.merge(ANS_result[["id", "ANS_score"]], on="id", how="left")
main_result = main_result.merge(Math_result[["id", "Math_score"]], on="id", how="left")
main_result = main_result.merge(Memory_result[["id", "Memory_score"]], on="id", how="left")
main_result = main_result.merge(SR_result[["id", "SR_score"]], on="id", how="left")
main_result.head()

|  | id | ANS_score | Math_score | Memory_score | SR_score |
|---|---|---|---|---|---|
| 0 | UCBM | 53 | 14 | 14 | 3.0 |
| 1 | CTCT | 47 | 10 | 8 | 3.0 |
| 2 | HBDJ | 59 | 14 | 18 | 7.0 |
| 3 | SOAS | 53 | 15 | 8 | 1.0 |
| 4 | XOXO | 54 | 13 | 13 | 2.0 |


In [7]:
outliers = get_outliers(main_result, ["ANS_score", "Math_score","Memory_score", "SR_score"])
main_filtered = main_result[~main_result['id'].isin(outliers)]

In [8]:
fig2, axs2 = plt.subplots(1, 4, figsize=(30,5))

ans_hist=axs2[0].hist(main_filtered["ANS_score"], edgecolor='black', linewidth=1.5, bins=np.arange(42,64,2))
axs2[0].set_xticks(range(42,65))
math_hist=axs2[1].hist(main_filtered["Math_score"], edgecolor='black', linewidth=1.5, bins=np.arange(8,16,1))
axs2[1].set_xticks(range(8,16))
memory_hist=axs2[2].hist(main_filtered["Memory_score"], edgecolor='black', linewidth=1.5, bins=np.arange(0,21,2))
axs2[2].set_xticks(range(0,21))
sr_hist=axs2[3].hist(main_filtered["SR_score"], edgecolor='black', linewidth=1.5, bins=np.arange(0,10,1))
axs2[3].set_xticks(range(0,10))

axs2[0].set_title("ANS Test Distribution")
axs2[0].set_xlabel("Score")
axs2[0].set_ylabel("Number of Participants")
axs2[1].set_title("Math Ability Test Distribution")
axs2[1].set_xlabel("Score")
axs2[1].set_ylabel("Number of Participants")
axs2[2].set_title("Memory Test Distribution")
axs2[2].set_xlabel("Score")
axs2[2].set_ylabel("Number of Participants")
axs2[3].set_title("Spatial Reasoning Test Distribution")
axs2[3].set_xlabel("Score")
axs2[3].set_ylabel("Number of Participants")

caption = "Figure 2: Histogram illustrating the distribution of participant scores across four cognitive tests."
display_figure(fig2, "fig", caption, w=1)

In [9]:
fig3, axs3 = plt.subplots(1, 3, figsize=(20,5))
ans_math_scatter, ANSvsMath = linear_regression(main_filtered["ANS_score"], main_filtered["Math_score"], axs3[0])
axs3[0].set_xticks(range(42,65))
ans_memory_scatter, ANSvsMemory = linear_regression(main_filtered["ANS_score"], main_filtered["Memory_score"], axs3[1])
axs3[1].set_xticks(range(42,65))
ans_sr_scatter, ANSvsSR = linear_regression(main_filtered["ANS_score"], main_filtered["SR_score"], axs3[2])
axs3[2].set_xticks(range(42,65))

axs3[0].set_title("ANS Test vs Math Test")
axs3[0].set_xlabel("ANS Score")
axs3[0].set_ylabel("Math Score")
axs3[1].set_title("ANS Test vs Memory Test")
axs3[1].set_xlabel("ANS Score")
axs3[1].set_ylabel("Memory Score")
axs3[2].set_title("ANS Test vs Spatial Reasoning Test")
axs3[2].set_xlabel("ANS Score")
axs3[2].set_ylabel("Spatial Reasoning Score")

caption = "Figure 3: Scatter plots illustrating the relationship between ANS Test results and performances on the three other cognitive assessments."
display_figure(fig3, "fig", caption, w=1)

Pearson_correlation = {
    "Test Type":["ANS vs Math Ability","ANS vs Memory", "ANS vs Spatial Reasoning"],
    "r-value":[ANSvsMath[2], ANSvsMemory[2], ANSvsSR[2]], 
    "P-value":[ANSvsMath[3], ANSvsMemory[3], ANSvsSR[3]]
}

Pearson_correlation_table = pd.DataFrame(Pearson_correlation).set_index("Test Type")
Pearson_correlation_table

| Test Type | r-value | P-value |
|---|---|---|
| ANS vs Math Ability | 0.186113 | 0.352647 |
| ANS vs Memory | 0.216813 | 0.277359 |
| ANS vs Spatial Reasoning | 0.260983 | 0.188556 |


In [10]:
ans_math_corr, ans_math_p_value = spearmanr(main_filtered["ANS_score"], main_filtered["Math_score"])
ans_memory_corr, ans_memory_p_value = spearmanr(main_filtered["ANS_score"], main_filtered["Memory_score"])
ans_sr_corr, ans_sr_p_value = spearmanr(main_filtered["ANS_score"], main_filtered["SR_score"])

Spearman_correlation = {
    "Comparison":["ANS vs Math Ability","ANS vs Memory", "ANS vs Spatial Reasoning"],
    "ρ-value": [ans_math_corr, ans_memory_corr, ans_sr_corr],
    "P-value": [ans_math_p_value, ans_memory_p_value, ans_sr_p_value]
}

Spearman_correlation_table = pd.DataFrame(Spearman_correlation).set_index("Comparison")
Spearman_correlation_table

| Comparison | ρ-value | P-value |
|---|---|---|
| ANS vs Math Ability | 0.135908 | 0.499087 |
| ANS vs Memory | 0.18352 | 0.359514 |
| ANS vs Spatial Reasoning | 0.215203 | 0.28103 |


### 3.3 Hypothesis B

In [20]:
ANS_df = pd.read_csv("ANS_Response.csv")
Math_df = pd.read_csv("Math_Ability_Response.csv")
Memory_df = pd.read_csv("Memory_Response.csv")
SRT_df = pd.read_csv("Spatial_Reasoning_Response.csv")
SRT_df = SRT_df.rename(columns={"total_score":"score"})
SRT_df = SRT_df.rename(columns={"user_id":"id"})
SRT_df = SRT_df.set_index("id")
SRT_df = SRT_df.loc[:,["gender","score"]]
Memory_df = Memory_df.rename(columns={"points":"score"})
Memory_df = Memory_df.rename(columns={"username":"id"})
Memory_df = Memory_df.set_index("id")
Memory_df = Memory_df.loc[:,["gender","score"]]
Math_df = Math_df.set_index("id")
Math_df = Math_df.loc[:,["gender","score"]]
ANS_df = ANS_df.set_index("id")
ANS_df = ANS_df.loc[:,["gender","score"]]
ANS_df = ANS_df.drop(["jy", "ydn"], axis = "rows")
ANS_df.to_csv("ANS_responses_processed.csv")
Math_df.to_csv("Math_responses_processed.csv")
Memory_df.to_csv("Memory_responses_processed.csv")
SRT_df.to_csv("SRT_responses_processed.csv")

fig, axs = plt.subplots(1,4, figsize=(12,4))
ax1 = ANS_df.boxplot(column="score", by="gender", ax=axs[0])
ax1.set_title("ANS")
ax1.set_ylabel("score", fontsize=8)
ax2 = Math_df.boxplot(column="score", by="gender", ax=axs[1])
ax2.set_title("Math")
ax2.set_ylabel("score", fontsize=8)
ax3 = Memory_df.boxplot(column="score", by="gender", ax=axs[2])
ax3.set_title("Memory")
ax3.set_ylabel("score", fontsize=8)
ax4 = SRT_df.boxplot(column="score", by="gender", ax=axs[3])
ax4.set_title("SRT")
ax4.set_ylabel("score", fontsize=8)
plt.tight_layout()


caption = "Figure 4: Boxplots of scores for each test (ANS, Math, Memory, SRT) separated by gender"
display_figure(fig, "fig", caption, 0.7, 10)

def remove_outliers(dataframe):
    Q1 = dataframe['score'].quantile(0.25)
    Q3 = dataframe['score'].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    df_cleaned = dataframe[(dataframe['score'] >= lower_bound) & (dataframe['score'] <= upper_bound)]
    return df_cleaned

ANS_df = ANS_df.groupby('gender').apply(remove_outliers).reset_index(drop=True)
Math_df = Math_df.groupby('gender').apply(remove_outliers).reset_index(drop=True)
ANS_df.to_csv("ANS_responses_processed.csv")
Math_df.to_csv("Math_responses_processed.csv")

def calculations_data(test, processed_filename):
    test_df = pd.read_csv(processed_filename)
    # Select each group once; gender labels appear in both lower and title case.
    male_scores = test_df.query('gender== "male" | gender=="Male"').score
    female_scores = test_df.query('gender== "female" | gender=="Female"').score
    results = {}
    results['mean_score_male'] = male_scores.mean()
    results['sd_score_male'] = male_scores.std()
    results['mean_score_female'] = female_scores.mean()
    results['sd_score_female'] = female_scores.std()
    results['score_diff'] = results['mean_score_male'] - results['mean_score_female']
    sig_test = stats.ttest_ind(male_scores, female_scores)
    results['p-value'] = sig_test.pvalue
    return results

subject_ids = ["ANS", "Math", "Memory", "SRT"]

data_set = []
for subject_id in subject_ids:
    processed_filename = subject_id+"_responses_processed.csv"
    test = subject_id
    result = calculations_data(test, processed_filename)
    result['id'] = subject_id
    data_set.append(result)

calc_df = pd.DataFrame(data_set)


caption = "Table 4: Table for the mean, standard deviation, difference in score, and p-value for each test by gender "
caption = f'<figcaption style="text-align: center; font-style: italic;">{caption}</figcaption>'
display(calc_df.set_index("id"))
display(HTML(caption))

fig = plt.figure(figsize=(8,8))

ax1 = fig.add_subplot(2, 2, 1)
ax1.plot(ANS_df['gender'], ANS_df['score'], '.')
ax1.set_xlim(-1,2)
ANS_m_mean = ANS_df.query('gender== "male" | gender=="Male"').score.mean()
ANS_m_std = stats.sem(ANS_df.query('gender== "male" | gender=="Male"').score)
ANS_f_mean = ANS_df.query('gender== "female" | gender=="Female"').score.mean()
ANS_f_std = stats.sem(ANS_df.query('gender== "female" | gender=="Female"').score)
ax1.errorbar('Male', ANS_m_mean, yerr =ANS_m_std, marker='_', color ='black')
ax1.errorbar('Female', ANS_f_mean, yerr =ANS_f_std, marker='_', color ='black')
result = stats.ttest_ind(ANS_df.query('gender== "male" | gender=="Male"').score, ANS_df.query('gender== "female" | gender=="Female"').score)
ax1.set_title(f"ANS(p-value:{result.pvalue:.3g})")

ax2 = fig.add_subplot(2, 2, 2)
ax2.plot(Math_df['gender'], Math_df['score'], '.')
ax2.set_xlim(-1,2)
Math_m_mean = Math_df.query('gender== "male" | gender=="Male"').score.mean()
Math_m_std = stats.sem(Math_df.query('gender== "male" | gender=="Male"').score)
Math_f_mean = Math_df.query('gender== "female" | gender=="Female"').score.mean()
Math_f_std = stats.sem(Math_df.query('gender== "female" | gender=="Female"').score)
ax2.errorbar('male', Math_m_mean, yerr =Math_m_std, marker='_', color ='red')
ax2.errorbar('female', Math_f_mean, yerr =Math_f_std, marker='_', color ='red')
result = stats.ttest_ind(Math_df.query('gender== "male" | gender=="Male"').score, Math_df.query('gender== "female" | gender=="Female"').score)
ax2.set_title(f"Math(p-value:{result.pvalue:.3g})")

ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(Memory_df['gender'], Memory_df['score'], '.')
ax3.set_xlim(-1,2)
Memory_m_mean = Memory_df.query('gender== "male" | gender=="Male"').score.mean()
Memory_m_std = stats.sem(Memory_df.query('gender== "male" | gender=="Male"').score)
Memory_f_mean = Memory_df.query('gender== "female" | gender=="Female"').score.mean()
Memory_f_std = stats.sem(Memory_df.query('gender== "female" | gender=="Female"').score)
ax3.errorbar('male', Memory_m_mean, yerr =Memory_m_std, marker='_', color ='red')
ax3.errorbar('female', Memory_f_mean, yerr =Memory_f_std, marker='_', color ='red')
result = stats.ttest_ind(Memory_df.query('gender== "male" | gender=="Male"').score, Memory_df.query('gender== "female" | gender=="Female"').score)
ax3.set_title(f"Memory(p-value:{result.pvalue:.3g})")

ax4 = fig.add_subplot(2, 2, 4)
ax4.plot(SRT_df['gender'].dropna(), SRT_df['score'].dropna().astype(int), '.')
ax4.set_xlim(-1,2)
SRT_m_mean = SRT_df.query('gender== "male" | gender=="Male"').score.mean()
SRT_m_std = stats.sem(SRT_df.query('gender== "male" | gender=="Male"').score)
SRT_f_mean = SRT_df.query('gender== "female" | gender=="Female"').score.mean()
SRT_f_std = stats.sem(SRT_df.query('gender== "female" | gender=="Female"').score)
ax4.errorbar('Male', SRT_m_mean, yerr =SRT_m_std, marker='_', color ='red')
ax4.errorbar('Female', SRT_f_mean, yerr =SRT_f_std, marker='_', color ='red')
result = stats.ttest_ind(SRT_df.query('gender== "male" | gender=="Male"').score, SRT_df.query('gender== "female" | gender=="Female"').score)
ax4.set_title(f"SRT(p-value:{result.pvalue:.3g})")

caption = "Figure 5: Comparison of test scores (ANS, Math, Memory and Spatial Reasoning tests) by gender"
display_figure(fig, "fig", caption, 0.5, 10)


| id | mean_score_male | sd_score_male | mean_score_female | sd_score_female | score_diff | p-value |
|---|---|---|---|---|---|---|
| ANS | 51.789474 | 3.392458 | 51.777778 | 5.219371 | 0.011696 | 0.993563 |
| Math | 12.956522 | 1.296087 | 12.25 | 1.802776 | 0.706522 | 0.144055 |
| Memory | 10.642857 | 3.346798 | 10.272727 | 3.268636 | 0.37013 | 0.696678 |
| SRT | 2.833333 | 1.947849 | 2.666667 | 1.940285 | 0.166667 | 0.798581 |



### 3.4 Hypothesis C

In [11]:
tiredness_levels = sorted(ANS_result['tiredness'].unique())
outliers = []
for level in tiredness_levels:
    outliers += get_outliers(ANS_result[ANS_result["tiredness"] == level], ["ANS_score"])
ANS_filtered = ANS_result[~ANS_result['id'].isin(outliers)]

In [12]:
fig, axs_c = plt.subplots(1, 2,figsize=(20, 5))

ANS_by_tiredness = [ANS_filtered[ANS_filtered['tiredness'] == level]['ANS_score'] for level in tiredness_levels]

axs_c[0].boxplot(ANS_by_tiredness)
axs_c[0].set_xticklabels(tiredness_levels)

axs_c[1].plot(ANS_filtered['tiredness'],ANS_filtered['ANS_score'],'.')
for i, level in enumerate(tiredness_levels):
    scores = ANS_by_tiredness[i]
    mean_score = np.mean(scores)
    std_err = np.std(scores, ddof=1) / np.sqrt(len(scores))  # standard error from the sample SD
    axs_c[1].errorbar(level, mean_score, yerr=std_err, fmt='o', color='red', markersize=3)  # same x-values as the dot plot
    
    
axs_c[0].set_title("ANS performance according to tiredness level")
axs_c[0].set_xlabel("Tiredness Level")
axs_c[0].set_ylabel("Score")

axs_c[1].set_title("ANS performance according to tiredness level")
axs_c[1].set_xlabel("Tiredness Level")
axs_c[1].set_ylabel("Score")

caption = "Figure 6: Comparative Study of Score Distribution and Central Tendencies in ANS Tests Across Tiredness Levels. The left side displays a boxplot for score dispersion, and the right side presents a dot plot with indicated means and error bars."
display_figure(fig, "fig", caption, w=1)

ans_f_statistic, ans_p_value = f_oneway(*ANS_by_tiredness)

ANOVA = {
    "Comparison":["ANOVA Test"],
    "F-statistic": [ans_f_statistic],
    "P-value": [ans_p_value]
}

ANOVA_table = pd.DataFrame(ANOVA).set_index("Comparison")
ANOVA_table

| Comparison | F-statistic | P-value |
|---|---|---|
| ANOVA Test | 7.654079 | 1.3e-05 |


## 4 Discussion

### 4.1 Split-half Reliability Test

To check the reliability of our tests, we ran a split-half reliability analysis to evaluate the consistency of test outcomes. The questions in each of the four tests were divided into two sets based on their numbering: odd and even. We used linear regression to explore the relationship between the average scores on odd- and even-numbered questions. Of the four linear regression models, only the ANS test (r = 0.47, p = 0.003) and the memory test (r = 0.40, p = 0.004) showed a noticeable linear correlation, with p-values below the 0.05 threshold, which allowed us to reject the hypothesis of no correlation (Figure 1 & Table 1). Conversely, the math ability and spatial reasoning tests had low r-values of around 0.2 and p-values of about 0.19, indicating a lack of consistency. This might stem from the imbalanced difficulty of the questions. Specifically, the math test questions were too easy, leading to a high concentration of scores in the upper scoring range and undermining the reliability of these results. The spatial reasoning test, on the other hand, proved too challenging, causing scores to cluster at the lower end of the scale and similarly affecting its reliability.
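
The odd/even split can be illustrated with a short, self-contained sketch. The function name `odd_even_means` and the item scores are invented for illustration; the slicing handles tests with an odd number of items, such as a nine-question test with five odd-numbered and four even-numbered items.

```python
def odd_even_means(scores):
    """Average score on odd-numbered and even-numbered items (1-based numbering)."""
    odd_items = scores[0::2]   # items 1, 3, 5, ...
    even_items = scores[1::2]  # items 2, 4, 6, ...
    return sum(odd_items) / len(odd_items), sum(even_items) / len(even_items)

# Hypothetical correctness record (1 = correct) for one participant on nine items.
nine_items = [1, 0, 1, 1, 0, 0, 1, 0, 1]
odd_mean, even_mean = odd_even_means(nine_items)
print(odd_mean, even_mean)
```

Repeating this for every participant yields one pair of averages per person, and these pairs are then regressed against each other.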

### 4.2 Hypothesis A: Is ANS related to maths, spatial reasoning or memory skills?

This hypothesis examines the correlation between individuals' ANS test performance and their results in three other tests. To accurately quantify this, we utilized a simple linear regression model with the ANS score as the covariate to interpret the outcomes of the other tests.

Before constructing the linear regression models, we performed two preparatory steps on our data. Because the data for each test were collected separately, participants who had not completed all four tests were excluded from this analysis to ensure a direct comparison and to control for background variables, which is essential for a reliable and valid statistical analysis. After removing participants with incomplete test data, outliers lying more than 1.5 times the interquartile range below the first quartile or above the third quartile on any test were dropped, leaving a sample size of 29.
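
The outlier rule can be illustrated on toy data. The sketch below uses only the standard library; the participant IDs and scores are invented, and `statistics.quantiles(..., method="inclusive")` is used on the assumption that it reproduces the linear interpolation of the pandas `quantile` call in `get_outliers`.

```python
import statistics

def iqr_outlier_ids(scores_by_id):
    """Return IDs whose score lies beyond 1.5 * IQR from the quartiles."""
    values = list(scores_by_id.values())
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return sorted(
        pid for pid, score in scores_by_id.items()
        if score < lower_bound or score > upper_bound
    )

# Hypothetical scores: P7 is far below everyone else.
toy_scores = {"P1": 47, "P2": 53, "P3": 54, "P4": 53, "P5": 59, "P6": 52, "P7": 20}
outlier_ids = iqr_outlier_ids(toy_scores)
print(outlier_ids)  # ['P7']
```

Here the quartiles are 49.5 and 53.5, so the fences sit at 43.5 and 59.5 and only the score of 20 is flagged.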

Given that our sample size is below 30, we cannot assume a normal distribution for our data. Consequently, we needed to evaluate the normality of the results for each test. Histograms of the score distributions (Figure 2) indicated that, aside from the math test, whose scores were strongly skewed and concentrated at the upper end of the range, the results of the other tests were approximately, although not perfectly, normally distributed. This raises some concerns about the validity of inferences made for the math test using linear regression.
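
A skewness coefficient is one simple numeric complement to reading histograms by eye. It was not part of the original analysis; the sketch below, with invented scores and a hypothetical helper `sample_skewness`, only shows how such a check could look.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment over the variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical scores piled up near a maximum of 15, with one low score.
ceiling_scores = [15, 15, 14, 15, 13, 14, 15, 10]
# Hypothetical scores spread evenly around their mean.
symmetric_scores = [11, 12, 13, 14, 15]

print(sample_skewness(ceiling_scores))
print(sample_skewness(symmetric_scores))
```

A value near zero is consistent with a symmetric distribution, while a clearly negative value signals scores bunched at the top of the range with a tail toward low scores.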

Scatter plots for the three linear regression models (ANS vs Math, ANS vs Memory, and ANS vs Spatial Reasoning) were created, alongside a table presenting Pearson correlation values and p-values for testing the null hypothesis of no linear correlation between the tests (Figure 3). Both the scatter plots and the table suggest a lack of linear correlation: the points did not show a linear pattern, and the p-values were above 0.05, so the null hypothesis could not be rejected and the correlations were not statistically significant (Table 2). These findings might stem from the non-normal distribution of the test scores. To explore potential monotonic relationships between the test scores, we also computed Spearman's correlation, which yielded similarly low correlation coefficients and high p-values, indicating no statistical significance (Table 3).
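
The difference between the two coefficients can be seen on a toy example. The sketch below is a standard-library illustration with invented data; the helpers `pearson_r`, `ranks` and `spearman_rho` are hypothetical stand-ins for the SciPy functions used in the notebook, and the ranking helper assumes there are no tied values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [value ** 2 for value in x]  # monotonic but not linear

print(pearson_r(x, y))
print(spearman_rho(x, y))
```

For a monotonic but curved relationship, Spearman's coefficient reaches 1 while Pearson's stays below it, which is why the rank-based measure is a useful second check when linearity is in doubt.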

### 4.3 Hypothesis B: Can we detect any significant difference between the scores of males and females across our cognitive tests?

This report also investigates the relationship between the test scores (ANS, Math, Memory, and SRT) and the gender of the participants. The participants were divided into three gender categories: male, female, and other. Since only two participants identified themselves as ‘other’, their data were removed, and the report focuses on comparing the scores obtained by male and female participants.

Boxplots (Figure 4) were created to visually represent the distribution of scores obtained by the two genders across the four tests. Figure 4 shows that there are several outliers in the scores obtained from the ANS and Math tests. These outliers need to be removed because they can distort the interpretation of the data. Additionally, the boxplots for the two genders overlap. This could suggest that the differences in score between male and female participants are not statistically significant. However, a solid conclusion cannot be drawn without running the data through an appropriate statistical test such as a t-test.

To conduct the t-test, several measurements need to be calculated. The measurements used in this part of the report are:

- `mean_score_male`: the mean score obtained by the male participants on each test.
- `sd_score_male`: the standard deviation of the scores obtained by the male participants.
- `mean_score_female`: the mean score obtained by the female participants on each test.
- `sd_score_female`: the standard deviation of the scores obtained by the female participants.
- `score_diff`: the difference between the mean scores of the male and female participants.
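These per-test measurements can be computed with a single `groupby`. The sketch below rests on assumptions: the `gender` column, the upper-case labels `MALE` and `FEMALE`, the helper name `gender_summary`, and the toy scores are all illustrative, and `score_diff` is taken here as male minus female.

```python
import pandas as pd

def gender_summary(df, tests, gender_column="gender"):
    """Per-test mean and sample SD for each gender, plus the mean difference."""
    grouped = df.groupby(gender_column)[tests]
    means = grouped.mean()
    sds = grouped.std()  # sample standard deviation (ddof=1)
    return pd.DataFrame({
        "mean_score_male": means.loc["MALE"],
        "sd_score_male": sds.loc["MALE"],
        "mean_score_female": means.loc["FEMALE"],
        "sd_score_female": sds.loc["FEMALE"],
        # Assumed direction: male mean minus female mean.
        "score_diff": means.loc["MALE"] - means.loc["FEMALE"],
    })

# Invented toy scores, two participants per gender.
demo = pd.DataFrame({
    "gender": ["MALE", "MALE", "FEMALE", "FEMALE"],
    "ans": [50, 54, 51, 49],
    "math": [12, 14, 10, 12],
})

summary = gender_summary(demo, ["ans", "math"])
print(summary)
```

The result has one row per test, which matches the layout of a table with a `score_diff` column.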

The values in the `score_diff` column indicate that there are some differences between the mean scores obtained by the male and female participants. The p-values from the t-tests can verify whether these observed differences are significant.

Figure 5 shows the results of the comparison. At a 5% significance level, there are no significant differences between the mean scores obtained by the male and female participants in the ANS, Memory and Spatial Reasoning tests, since the p-values of the t-tests for these tests are higher than the significance level (0.907, 0.697, 0.799 > 0.05). However, at the same significance level, the difference between the mean scores of male and female participants in the Math test is statistically significant. These data indicate that the participants' gender may have an impact solely on their performance in the Math test, while having no detectable effect on their performance in the other cognitive tests.
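The comparison can be sketched with `scipy.stats.ttest_ind`. This is an illustration under assumptions: the column names, labels and scores are invented, `compare_genders` is a hypothetical helper, and Welch's variant (`equal_var=False`) is used as a cautious default because the variant of t-test is not specified here.

```python
import pandas as pd
from scipy.stats import ttest_ind

def compare_genders(df, tests, gender_column="gender", alpha=0.05):
    """Two-sample t-test (Welch) of male vs female scores for each test."""
    male = df[df[gender_column] == "MALE"]
    female = df[df[gender_column] == "FEMALE"]
    rows = []
    for test in tests:
        result = ttest_ind(male[test], female[test], equal_var=False)
        rows.append({
            "test": test,
            "t_statistic": result.statistic,
            "p_value": result.pvalue,
            "significant": bool(result.pvalue < alpha),
        })
    return pd.DataFrame(rows).set_index("test")

# Invented scores: equal means on "ans", clearly different means on "math".
demo = pd.DataFrame({
    "gender": ["MALE"] * 5 + ["FEMALE"] * 5,
    "ans": [50, 52, 48, 51, 49, 51, 49, 50, 52, 48],
    "math": [14, 15, 16, 15, 14, 10, 11, 10, 12, 11],
})

results = compare_genders(demo, ["ans", "math"])
print(results)
```

A difference is flagged as significant only when its p-value falls below `alpha`, which is 0.05 for the 5% level used in this section.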

### 4.4 Hypothesis C: Does tiredness level influence performance on the ANS test?

This hypothesis examines whether tiredness level, rated from 1 to 8, influences participants’ performance on the ANS test. To limit the influence of outliers, upper and lower limits were first calculated from the IQR and the mean of the total score, and scores outside these limits were excluded from the subsequent analysis.

The box plot (Figure 6) shows that the ANS test scores are generally concentrated in the range of 40 to 60, except at tiredness level 8. More specifically, participants at the higher tiredness levels, 7 and 8, have mean scores below 50, with the mean at level 8 falling below 30, while participants at the other tiredness levels have mean scores above 50.

To conduct the t-test/ANOVA validly, several assumptions must be satisfied. First, the participants should be independent, so that no participant's score is influenced by another's. Second, the normality requirement is taken to be satisfied because the sample size for the ANS test is larger than 30. Third, the scores at each tiredness level should have similar variances, which is the case here. All assumptions required for a t-test/ANOVA are therefore treated as met.

The null hypothesis for C states that there is no relationship between tiredness level and ANS performance as measured by test score. At the commonly used significance level of 0.05, the calculated p-value of 0.00013 shows that there is a statistically significant difference in performance between tiredness levels. We can therefore reject the null hypothesis and conclude that participants perform differently under different tiredness conditions.
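A one-way ANOVA of this form can be run with `scipy.stats.f_oneway`, which the notebook already imports. The sketch below is illustrative only: the column names `tiredness` and `ans_score`, the helper `anova_by_level`, and the scores are assumptions, and only three tiredness levels are shown for brevity.

```python
import pandas as pd
from scipy.stats import f_oneway

def anova_by_level(df, level_column, score_column):
    """One-way ANOVA of the score across the groups defined by level_column."""
    groups = [
        group[score_column].to_numpy()
        for _, group in df.groupby(level_column)
    ]
    return f_oneway(*groups)

# Invented scores for three tiredness levels, four participants each.
demo = pd.DataFrame({
    "tiredness": [1] * 4 + [4] * 4 + [8] * 4,
    "ans_score": [55, 57, 53, 56, 52, 54, 51, 55, 28, 30, 27, 31],
})

# Informal check of the equal-variance assumption: per-level sample variances.
variances = demo.groupby("tiredness")["ans_score"].var()
print(variances)

result = anova_by_level(demo, "tiredness", "ans_score")
print(result.statistic, result.pvalue)
```

A p-value below 0.05 rejects the null hypothesis that all levels share the same mean score; it does not by itself identify which levels differ from each other.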

## 5 Summary and Outlook

The study shows no significant correlation between ANS acuity and mathematical ability, spatial reasoning or memory. Males were shown to perform significantly better than females in the mathematics test. Furthermore, fatigue had a substantial impact on ANS ability.

The findings from this study offer insights into the association between ANS and cognitive abilities, and into how far ANS is influenced by gender and fatigue. This helps us to understand how ANS may operate in different people under different circumstances. Moreover, by focusing on individuals aged 19-25, this study offers a perspective on the characteristics of fully developed ANS ability and cognitive maturity. This study also provides a foundation for more comprehensive research. One direction is longitudinal studies of ANS development. Investigating how ANS acuity develops with age and education level may provide essential information about how numerical approximation skills develop and which factors lead to better ANS acuity. In addition, studies should be carried out on adults with developmental dyscalculia, a math learning disorder, to investigate the disorder’s links with ANS impairment. This could lead to better intervention strategies to help adults overcome dyscalculia.

The importance of ANS in everyday decision making and educational development makes this area of research vital. A better understanding of ANS can inform educational strategies for STEM education at all levels, keeping in mind the variability of ANS acuity between individuals.
