# Hypothesis Testing - Hypothesis #1

The Python script below tests the following hypothesis and determines whether we reject or fail to reject the null hypothesis.

**Null Hypothesis (H₀):** There is no distinct, unique pattern in the time I spend on each move in the games I won when the moves of the whole game are divided into equal phases. The time distribution pattern in winning games does not differ significantly from the time distribution pattern observed in games I lost. Any observed difference between winning and losing games is due to random variation rather than consistent patterns.

**Alternative Hypothesis (H₁):** The time I spend on each move in the games I won follows a unique, distinct pattern when the moves of the whole game are divided into equal phases, and this pattern differs significantly from the time distribution pattern observed in games I lost.
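
To make "divided into equal phases" concrete, here is a minimal sketch of how a move number maps to a phase. It mirrors the index arithmetic used inside `perform_hypothesis_tests`; the standalone helper name `phase_index` is my own and exists only for illustration.

```python
def phase_index(move_num, total_moves, num_phases=5):
    """Return the 0-based phase for a 1-based move number (illustrative helper)."""
    # Integer division gives the phase width; very short games fall back to width 1.
    phase_size = max(1, total_moves // num_phases)
    # Leftover moves past the last full phase are folded into the final phase.
    return min((move_num - 1) // phase_size, num_phases - 1)


# A 40-move game is split into 5 phases of 8 moves each.
for move in (1, 8, 9, 40):
    print(f"move {move} -> phase {phase_index(move, total_moves=40) + 1}")
```

When the game length is not a multiple of five, the remainder moves all land in the last phase, so the phases are only approximately equal in size.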


### Usage of the Script
The following function must be defined in the main EDA script. It can then be called from the main function to run the hypothesis tests.
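
The function takes a single `results` argument. Judging from how the code reads it, `results` appears to be a dictionary keyed by game category, where each category holds a `'win'` and a `'loss'` list of `(move_number, seconds)` pairs, with games concatenated one after another. The sketch below builds a tiny input of that shape; the timings are made up and the helper `count_games` is hypothetical, included only to show how game boundaries are implied by the move number restarting.

```python
# Assumed input shape (inferred from the function body, not from real data).
categories = ['all', 'blitz', 'bullet', 'rapid', 'daily']
results = {category: {'win': [], 'loss': []} for category in categories}

# Two short illustrative blitz wins; the move number restarts at the second game.
results['blitz']['win'] += [(1, 1.2), (2, 3.4), (3, 2.0)]
results['blitz']['win'] += [(1, 0.8), (2, 5.1)]


def count_games(pairs):
    """Count games by detecting where the move number goes back down."""
    games, last_move = 0, 0
    for move, _ in pairs:
        if games == 0 or move < last_move:
            games += 1
        last_move = move
    return games


print(count_games(results['blitz']['win']))  # prints 2

# In the main EDA script, once the function is defined:
# perform_hypothesis_tests(results)
```

Note that the real function skips a category with fewer than 10 recorded moves per outcome and drops games with 5 or fewer moves, so an input this small would only trigger the "Not enough move data" message.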

### Function: `perform_hypothesis_tests`
This function performs the hypothesis tests. Below is the implementation:

def perform_hypothesis_tests(results):
    """Test if phase-based time distributions differ between wins and losses."""
    import numpy as np
    from scipy.stats import ttest_ind

    categories = ['all', 'blitz', 'bullet', 'rapid', 'daily']
    num_phases = 5

    print("\nTIME DISTRIBUTION PATTERN HYPOTHESIS TEST")
    print("==========================================")

    for category in categories:
        win_data = results[category]['win']
        loss_data = results[category]['loss']

        if len(win_data) < 10 or len(loss_data) < 10:
            print(f"\n{category.capitalize()} games: Not enough move data.")
            continue

        def split_into_games(data):
            games = []
            current_game = []
            last_move = 0
            for move, time in data:
                if move < last_move:  # move numbers reset → new game
                    if len(current_game) > 5:
                        games.append(current_game)
                    current_game = []
                current_game.append((move, time))
                last_move = move
            if len(current_game) > 5:
                games.append(current_game)
            return games

        def compute_phase_vectors(games):
            phase_vectors = []
            for game in games:
                moves = sorted(game, key=lambda x: x[0])
                total_moves = moves[-1][0]
                phase_size = max(1, total_moves // num_phases)
                phases = [[] for _ in range(num_phases)]
                for move_num, t in moves:
                    idx = min((move_num - 1) // phase_size, num_phases - 1)
                    phases[idx].append(t)
                # Phases with no recorded moves are stored as 0 s, which pulls that phase's mean toward zero.
                phase_avg = [np.mean(p) if p else 0 for p in phases]
                phase_vectors.append(phase_avg)
            return np.array(phase_vectors)

        win_games = split_into_games(win_data)
        loss_games = split_into_games(loss_data)

        if len(win_games) < 3 or len(loss_games) < 3:
            print(f"\n{category.capitalize()} games: Not enough full games with move data.")
            continue

        win_matrix = compute_phase_vectors(win_games)
        loss_matrix = compute_phase_vectors(loss_games)

        print(f"\n{category.capitalize()} games:")
        print(f"  {len(win_matrix)} win games, {len(loss_matrix)} loss games")
        print(f"  Divided into {num_phases} phases per game")

        significant_phases = 0
        for i in range(num_phases):
            win_times = win_matrix[:, i]
            loss_times = loss_matrix[:, i]

            # Welch's t-test: a two-sample t-test that does not assume equal variances.
            t_stat, p_val = ttest_ind(win_times, loss_times, equal_var=False)
            sig = p_val < 0.05
            if sig:
                significant_phases += 1
            win_mean = np.mean(win_times)
            loss_mean = np.mean(loss_times)
            direction = "more" if win_mean > loss_mean else "less"
            print(f"  Phase {i+1}: Win avg = {win_mean:.2f}s, Loss avg = {loss_mean:.2f}s → p = {p_val:.4f} → {'✓ Significant' if sig else '✗ Not significant'} ({direction} time when winning)")

        print("\n  HYPOTHESIS TEST RESULT:")
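        # H₀ is rejected if any single phase reaches p < 0.05; no multiple-comparison correction is applied.
        if significant_phases > 0: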
            print(f"  ✓ REJECT H₀: Time pattern differs in {significant_phases}/{num_phases} phases")
        else:
            print("  ✗ FAIL TO REJECT H₀: No significant pattern difference")

    print("\nHypothesis testing complete.")
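
The function runs five t-tests per category and rejects H₀ as soon as one of them falls below 0.05. Running several tests this way raises the chance of a false positive, so it can be useful to compare against a stricter rule. The sketch below is not part of the script; it applies a Bonferroni correction (a technique I am swapping in for comparison) to the per-phase p-values reported for "All games" in the output. The helper name `bonferroni_reject` is hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one decision per test using the Bonferroni-adjusted threshold."""
    # Divide the overall significance level by the number of tests performed.
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Per-phase p-values copied from the "All games" section of the output.
all_games_p = [0.0962, 0.3336, 0.1198, 0.0308, 0.1524]

decisions = bonferroni_reject(all_games_p)
print(f"adjusted threshold = {0.05 / len(all_games_p):.3f}")
print(f"phases significant after correction: {sum(decisions)}/{len(decisions)}")
```

Bonferroni is conservative, so this is best read as a sanity check on borderline results rather than a replacement for the decision rule above. The lines that follow are the function's own printed output, which uses the uncorrected 0.05 threshold for each phase.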


TIME DISTRIBUTION PATTERN HYPOTHESIS TEST

All games:
  1316 win games, 1148 loss games
  Divided into 5 phases per game
  Phase 1: Win avg = 19.16s, Loss avg = 1.93s → p = 0.0962 → ✗ Not significant (more time when winning)
  Phase 2: Win avg = 6.90s, Loss avg = 4.33s → p = 0.3336 → ✗ Not significant (more time when winning)
  Phase 3: Win avg = 12.28s, Loss avg = 5.56s → p = 0.1198 → ✗ Not significant (more time when winning)
  Phase 4: Win avg = 15.99s, Loss avg = 3.62s → p = 0.0308 → ✓ Significant (more time when winning)
  Phase 5: Win avg = 13.60s, Loss avg = 5.39s → p = 0.1524 → ✗ Not significant (more time when winning)

  HYPOTHESIS TEST RESULT:
  ✓ REJECT H₀: Time pattern differs in 1/5 phases

Blitz games:
  624 win games, 555 loss games
  Divided into 5 phases per game
  Phase 1: Win avg = 2.27s, Loss avg = 2.59s → p = 0.0017 → ✓ Significant (less time when winning)
  Phase 2: Win avg = 4.65s, Loss avg = 5.21s → p = 0.0012 → ✓ Significant (less time when winning)
  Phase 3