# Formal Comparative Analysis

## Introduction

This analysis examines the performance of two robotic systems, each created by a different student: Amani (myself) and Lydia. The aim of this study is to simulate and control a mobile robot in an environment containing several golden boxes, with the task of gathering all the boxes together in one place. This report compares the performance of my code with that of Lydia's code, which is designed to accomplish the same task. The two implementations were assessed by running experiments that varied the number of tokens in the environment. For each run, the execution time was measured and the success or failure outcome was recorded.

The evaluation of performance is based on the hypotheses outlined in the subsequent section. The primary metric for determining the superior algorithm is the time required for the robot to gather all the golden boxes in a centralized location. Furthermore, the analysis considers the number of successes and failures of each algorithm in detecting and collecting the golden boxes, particularly as the number of boxes in the environment varies.

## Null and Alternative Hypotheses

In statistical hypothesis testing, we formulate two competing claims about a population parameter (like the mean or proportion).

**1. Null Hypothesis (H₀):**

* Represents the default assumption, stating there's **no effect** or **no significant difference** between groups.
* It serves as the baseline for comparison.
* We aim to **disprove** the null hypothesis if evidence suggests otherwise.

**2. Alternative Hypothesis (H₁):**

* The opposite of the null hypothesis, specifying the **expected effect** or **directional difference** between groups.
* It's the claim we want to **support** with our statistical analysis.
* There are two main types of alternative hypotheses:
    * **Two-tailed:** The effect can be in either direction (greater than or less than the null value).
    * **One-tailed:** The effect is expected to be in a specific direction (greater than or less than the null value).

**Choosing the Hypothesis:**

* The null hypothesis is typically set to "no effect" for clarity.
* The alternative hypothesis reflects the specific research question or prediction.


### Alternative Hypothesis Analysis:

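The alternative hypothesis proposes that Amani's robot is more efficient than Lydia's robot, that is, that it gathers all the boxes in less time. This hypothesis is founded on the capacity of Amani's code to recognize previously visited tokens more effectively, potentially resulting in better overall performance. The corresponding null hypothesis is that there is no difference in execution time between the two implementations.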
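
Stated formally, and assuming the per-run difference is defined as Amani's execution time minus Lydia's (the sign convention of the results table, where negative values mean Amani's code is faster), the one-tailed hypotheses can be written as below. The symbol μ_d for the population mean difference is introduced here only for illustration.

$$
d_i = t_{\text{Amani},i} - t_{\text{Lydia},i}, \qquad H_0 : \mu_d = 0, \qquad H_1 : \mu_d < 0
$$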

## Experimental Procedure

To test these hypotheses, experiments were designed with the following characteristics:

### Task Description:
The task involves tokens placed at random positions in the environment, which the robot must gather. Both implementations, Amani's and Lydia's code, were evaluated on their efficiency and accuracy in completing this task.
### Variation of Conditions:
The experiments were conducted with different numbers of tokens (3, 4, 5, 6, and 7 golden boxes) to assess performance under varying task complexities.
### Environment Creation:
Identical environments were created for both algorithms to ensure a fair comparison. The positions of the boxes were fixed within each environment configuration to eliminate variability due to random generation.
### Repetitions:
Each algorithm was executed six times for each of the five environment configurations, resulting in a total of 30 simulations per algorithm. The repetitions were intended to make the results more reliable and consistent.
### Data Collection:
Data were collected on the elapsed time for each implementation to complete the task and on the corresponding success or failure outcomes. This allowed for a detailed evaluation of performance metrics.
### Statistical Analysis:
Statistical methods, including the calculation of means, standard deviations, and t-tests, were employed to analyze the data and test the hypotheses. These analyses provided quantitative insights into the performance differences between the two implementations.

## Calculating Standard Error

To calculate the standard error of the mean, we'll follow these steps:

1. **Mean (Average):** Calculate the mean, denoted d̅, by adding all the values and dividing by the total number of values (n).

    **Formula:** d̅ = Σ(xᵢ) / n

2. **Squared Deviations from the Mean:** For each value, subtract the mean and then square the result.

3. **Variance:** Add up the squared deviations from the mean for all values and divide by n-1 (since we're estimating the population variance from a sample).

    **Formula:** Variance = Σ(xᵢ - d̅)² / (n - 1)

4. **Standard Error:** Take the square root of the variance to obtain the sample standard deviation, then divide it by the square root of n.

    **Formula:** Standard Error (SE) = √(Variance) / √n
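
The steps above can be sketched in Python as follows. This is a minimal illustration, not the code used for the report: the function name `standard_error` and the sample values are hypothetical. Note that the sketch divides the sample standard deviation by √n to obtain the standard error of the mean.

```python
import math


def standard_error(values):
    """Return the standard error of the mean for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    # Step 1: mean
    mean = sum(values) / n
    # Step 2: squared deviations from the mean
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Step 3: sample variance (divide by n - 1)
    variance = sum(squared_deviations) / (n - 1)
    # Step 4: sample standard deviation, then divide by sqrt(n)
    return math.sqrt(variance) / math.sqrt(n)


# Illustrative placeholder values, not measurements from the experiments.
sample = [2.0, 4.0, 4.0, 6.0]
print(f"SE = {standard_error(sample):.4f}")
```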


## Calculating the t-Statistic

The t-statistic, denoted by **T**, is the test statistic of the t-test, which compares the means of two groups, either independent or paired. It expresses the difference between the observed mean difference (`d̅`) and the hypothesized mean difference (typically zero under the null hypothesis) in units of standard error, which is what allows its **magnitude** and **significance** to be assessed.

**Formula:**   T = d̅ / SE(d̅)

- **d̅**: Observed mean difference between the groups.
- **SE(d̅)**: Standard error of the mean difference (formula may vary depending on the t-test type).


# Execution Time Comparison of Amani's and Lydia's Codes

The table shows the number of tokens (the golden boxes placed in the environment) and the execution times (in seconds) for the two codes, Amani's code and Lydia's code. The table also shows the difference in execution time between the two codes and the success status (Success) for each test.

## Observations from the Table

- **Overall Success:** Both codes seem to be successful in all test cases (indicated by "Success" in the last column).
- **Efficiency:** Amani's code appears to be generally faster than Lydia's code for all the test cases. The difference in execution time ranges from -20.611 seconds (meaning Amani's code is 20.611 seconds faster) to -53.568 seconds (meaning Amani's code is 53.568 seconds faster).
- **Trends:** It's difficult to identify any clear trends in the data without additional context. The execution times don't seem to strictly increase or decrease as the number of tokens increases.

## Execution Times Table

![image.png](attachment:772be6c7-1f0c-47bc-bb54-b6ff5dba85f9.png)

# T-Test Analysis of Execution Times

Based on the observations and calculations in Table 1, the *t* value of the paired t-test is -2.769. The test is applied at a confidence level of 99.5% (one-tailed, α = 0.005) with 29 degrees of freedom (30 paired runs minus one). According to the one-tailed t-table shown in the figure below, the critical value is 2.756, so the magnitude of our computed *t* value should be compared with 2.756.

If the magnitude of the *t* value we computed is larger than 2.756, then the null hypothesis (H₀) is rejected and the alternative hypothesis (H₁) is accepted. Here the magnitude of the computed *t* value, 2.769, is slightly larger than 2.756, so the null hypothesis is rejected at this level and the data support the claim that Amani's code completes the task faster than Lydia's. The margin over the critical value is narrow, however, so this result should be interpreted with caution.

![t_table.png](attachment:5d20ee6c-5656-482d-8828-8f5036480619.png)
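
The comparison against the critical value can be written as a short check. This sketch assumes a lower-tailed test, on the grounds that the differences in the table are Amani's time minus Lydia's, so the alternative hypothesis predicts a negative *t*. The function name is hypothetical; the two numbers are the ones quoted above.

```python
def rejects_null_lower_tailed(t_value, critical_value):
    """Return True if t falls below -critical_value (lower-tailed test)."""
    return t_value < -critical_value


T_OBSERVED = -2.769  # paired t value reported in the text
T_CRITICAL = 2.756   # one-tailed critical value for 29 degrees of freedom

print(rejects_null_lower_tailed(T_OBSERVED, T_CRITICAL))
```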

As the bar chart below shows, the simulation time for Amani's code is slightly better than Lydia's simulation time in all cases.

![image.png](attachment:e902598b-1574-43a4-bff7-f15d5a8e4192.png)

## Testing Procedure

It's important to note that the tests were conducted using a test runner script. This script executes the code multiple times, in this case, six times for each number of tokens.

**Code Modifications:**

Changes were made to both Amani's and Lydia's code. Additionally, a separate executable Python file (`run_tests.py`) was created to run the code using the script.
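
The contents of `run_tests.py` are not reproduced here. As a rough idea of what a repeated-run timing loop of this kind might look like, the sketch below times a callable several times and records success or failure. Everything in it is an assumption made for illustration: `time_runs` and `dummy_task` are hypothetical names, and the real script presumably launches the robot simulation rather than a placeholder function.

```python
import time


def time_runs(task, tries):
    """Call `task` `tries` times; return a list of (elapsed_seconds, succeeded)."""
    results = []
    for _ in range(tries):
        start = time.perf_counter()
        try:
            succeeded = bool(task())
        except Exception:
            succeeded = False  # a crash is counted as a failed run
        results.append((time.perf_counter() - start, succeeded))
    return results


def dummy_task():
    """Hypothetical stand-in for one simulation run; always reports success."""
    return True


for run, (elapsed, ok) in enumerate(time_runs(dummy_task, tries=6), start=1):
    print(f"run {run}: {elapsed:.6f} s, {'Success' if ok else 'Failure'}")
```

Recording the outcome next to the elapsed time keeps both metrics of the study, execution time and success rate, in one place per run.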

**Running the Tests:**

To execute the tests using the script, use the following command:

```bash
python3 run_tests.py <number_of_tries>