# Season-to-season changes

Exploring how many regular-season games have a different outcome from one season to the next.

## Data organisation

As usual, there is one folder per competition, each containing one folder per season. Two variables, `comp` and `season`, specify which files we want to use.

Here, `season` refers to the 'current' season, and we always compare that season with the preceding one.

We will analyse several instances, so we have two lists `comps` and `seasons`, and `comp` and `season` will iterate over these two lists.
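
As a minimal sketch of this layout, the helpers below turn a two-digit end year into the season folder name and path used by the cells further down (`20{s-1}-{s}` under `../opta_data/Mens/{comp}/`). The function names `season_label` and `season_dir` are hypothetical and introduced only for illustration.

```python
from pathlib import Path

def season_label(end_year: int) -> str:
    # Hypothetical helper: 23 -> "2022-23", matching the folder names used in this notebook
    return f"20{end_year - 1}-{end_year}"

def season_dir(comp: str, end_year: int, root: str = "../opta_data/Mens") -> Path:
    # Hypothetical helper: one folder per competition, then one folder per season
    return Path(root) / comp / season_label(end_year)

# The 'current' season and the one it is compared with
current, previous = season_label(24), season_label(23)
print(current, previous)
print(season_dir("Top14", 24).as_posix())
```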

In [27]:
!ls ../opta_data/Mens/

AutumnNationsCup        NPC                     SuperRugbyAU
ChallengeCup            PacificNationsCup       SuperRugbyAotearoa
ChampionsCup            Premiership             SuperRugbyPacific
CurrieCup               ProD2                   SuperRugbyTranstasman
International           RWC                     TRC
JapanRugbyLeagueOneD1   RainbowCup              Top14
JapanTopLeague          RainbowCupSA            URC
Lions                   RugbyEuropeChampionship
MLR                     SixNations


In [40]:
comps = ["SuperRugbyPacific", "Top14", "URC"]
seasons = [23, 24, 25]

## Support functions

In [43]:
def process_one_game(filename):
    with open(filename,'r') as inFile:
        lines = inFile.readlines()

    # Read the header line
    header = lines[0].strip().split(',')
    
    # Determine the index for all the columns we want to use
    homeTeamName = header.index('homeTeamName')
    awayTeamName = header.index('awayTeamName')
    isHome = header.index('isHome')
    result = header.index('result')

    # Process one line at a time until we find a line about the home team
    for i in range(1,len(lines)):
        temp_array = lines[i].strip().split(',')
        is_home = temp_array[isHome]
        if is_home!="Y":
            continue

        home_team = temp_array[homeTeamName]
        away_team = temp_array[awayTeamName]
        game_result = temp_array[result]
        return home_team, away_team, game_result

    # No home-team line was found: fail with an explicit message
    raise ValueError(f"No home-team line found in {filename}")
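
Splitting each line on commas works as long as no field contains a comma itself. As a hedged alternative, here is a sketch of the same lookup using the standard `csv` module, which handles quoted fields. The name `process_one_game_csv` is hypothetical; the column names are the ones used above.

```python
import csv

def process_one_game_csv(filename):
    # Hypothetical variant of process_one_game: csv.DictReader maps each row to its header names
    with open(filename, newline="") as in_file:
        for row in csv.DictReader(in_file):
            # Return as soon as we reach a row about the home team
            if row["isHome"] == "Y":
                return row["homeTeamName"], row["awayTeamName"], row["result"]
    raise ValueError(f"No home-team row found in {filename}")
```

It returns the same `(home_team, away_team, game_result)` tuple, so it could be swapped in without changing the rest of the notebook.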
    

## Process all seasons

In [46]:
import glob

def check_duplicates(my_list):
    seen = set()
    for x in my_list:
        if x in seen:
            print(f"Game {x} has been seen before")
        seen.add(x)

round_limits = {"SuperRugbyPacific": 15,
                "Top14": 26,
                "URC": 18
               }

results = dict()

for comp in comps:
    for s in seasons:
        for offset in [0,1]:
            season = f"20{s-1-offset}-{s-offset}"
            print(f"Extracting the {season} season for {comp}")
    
            if comp not in results:
                results[comp] = dict()
            
            list_games = []
            for rd in range(0,round_limits[comp]):
                round_games = glob.glob(f"../opta_data/Mens/{comp}/{season}/Round_{rd+1}/*.csv",recursive=True)
                list_games += round_games
            
            print(f"{len(list_games)} games to process.")
            check_duplicates(list_games)
    
            for f in list_games:
                home, away, outcome = process_one_game(f)
                game = f"{home}_{away}"
                if game not in results[comp]:
                    results[comp][game] = dict()
                results[comp][game][s-offset] = outcome

Extracting the 2022-23 season for SuperRugbyPacific
84 games to process.
Extracting the 2021-22 season for SuperRugbyPacific
84 games to process.
Extracting the 2023-24 season for SuperRugbyPacific
84 games to process.
Extracting the 2022-23 season for SuperRugbyPacific
84 games to process.
Extracting the 2024-25 season for SuperRugbyPacific
25 games to process.
Extracting the 2023-24 season for SuperRugbyPacific
84 games to process.
Extracting the 2022-23 season for Top14
182 games to process.
Extracting the 2021-22 season for Top14
182 games to process.
Extracting the 2023-24 season for Top14
182 games to process.
Extracting the 2022-23 season for Top14
182 games to process.
Extracting the 2024-25 season for Top14
105 games to process.
Extracting the 2023-24 season for Top14
182 games to process.
Extracting the 2022-23 season for URC
144 games to process.
Extracting the 2021-22 season for URC
144 games to process.
Extracting the 2023-24 season for URC
144 games to process.
Extracting

## Analyse changes
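
Before running the comparison on real data, here is a small sketch of the counting logic on a toy dictionary. It assumes, purely for illustration, that results are recorded from the home team's perspective as `"W"` or `"L"`; the team names, the values and the function name `compare_with_previous` are hypothetical. For an exact repeat (same home team), the outcome changed when the two results differ. For a flipped fixture, the outcome changed when the two results are *equal*, because the home team is not the same in both seasons.

```python
def compare_with_previous(results, s):
    # results: {"Home_Away": {season_end_year: home-team result}} for one competition
    same = same_diff = flipped = flipped_diff = 0
    for game, by_season in results.items():
        if s not in by_season:
            continue
        home, away = game.split("_")
        flip = f"{away}_{home}"
        if s - 1 in by_season:
            # Exact repeat: different result means different outcome
            same += 1
            same_diff += by_season[s] != by_season[s - 1]
        elif s - 1 in results.get(flip, {}):
            # Flipped fixture: identical home result means the other team won
            flipped += 1
            flipped_diff += by_season[s] == results[flip][s - 1]
    return same, same_diff, flipped, flipped_diff

toy = {
    "A_B": {23: "W", 24: "L"},  # exact repeat, outcome changed
    "C_D": {23: "W", 24: "W"},  # exact repeat, outcome unchanged
    "E_F": {24: "W"},           # E won at home in 24 ...
    "F_E": {23: "W"},           # ... but F won at home in 23: winner changed
    "G_H": {24: "W"},           # G won at home in 24 ...
    "H_G": {23: "L"},           # ... and H lost at home in 23: same winner
}
print(compare_with_previous(toy, 24))  # -> (2, 1, 2, 1)
```

One consequence of the equality test: if the `result` column can also hold a draw value, a flipped fixture drawn in both seasons would be counted as a changed outcome.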

In [54]:
for comp in comps:
    for s in seasons:
        season = f"20{s-1}-{s}"
        print(f"\nComparing the {season} season of {comp} with the previous season")

        same_game = 0
        flipped_game = 0
        same_game_different_outcome = 0
        flipped_game_different_outcome = 0

        for game in results[comp]:
            if s not in results[comp][game]:
                continue

            home = game.split("_")[0]
            away = game.split("_")[1]
            flip = f"{away}_{home}"
            
            if s-1 in results[comp][game]:
                same_game+=1
                if results[comp][game][s]!=results[comp][game][s-1]:
                    same_game_different_outcome+=1

            elif flip in results[comp]:
                if s-1 in results[comp][flip]:
                    flipped_game+=1
                    # The fixture is flipped, so an identical home-team result means the *other* team won compared with the current season
                    if results[comp][game][s]==results[comp][flip][s-1]:
                        flipped_game_different_outcome+=1

            # else:
            #     print(f"Cannot find {game} or {flip} in {s-1}")

        # Guard each percentage against a zero denominator (e.g. a season with no exact repeats)
        if same_game>0:
            print(f"{same_game} games are exact repeats; {same_game_different_outcome} of those have a different outcome: {round(same_game_different_outcome/same_game*100,2)}%")
        if flipped_game>0:
            print(f"{flipped_game} games are flipped; {flipped_game_different_outcome} of those have a different outcome: {round(flipped_game_different_outcome/flipped_game*100,2)}%")

        total = same_game+flipped_game
        total_diff = same_game_different_outcome+flipped_game_different_outcome
        if total>0:
            print(f"Overall: {round(total_diff/total*100,2)}% are different ({total_diff} out of {total})")


Comparing the 2022-23 season of SuperRugbyPacific with the previous season
36 games are exact repeats; 12 of those have a different outcome: 33.33%
48 games are flipped; 16 of those have a different outcome: 33.33%
Overall: 33.33% are different (28 out of 84)

Comparing the 2023-24 season of SuperRugbyPacific with the previous season
52 games are exact repeats; 18 of those have a different outcome: 34.62%
32 games are flipped; 15 of those have a different outcome: 46.88%
Overall: 39.29% are different (33 out of 84)

Comparing the 2024-25 season of SuperRugbyPacific with the previous season
16 games are exact repeats; 10 of those have a different outcome: 62.5%
9 games are flipped; 7 of those have a different outcome: 77.78%
Overall: 68.0% are different (17 out of 25)

Comparing the 2022-23 season of Top14 with the previous season
156 games are exact repeats; 55 of those have a different outcome: 35.26%
Overall: 35.26% are different (55 out of 156)

Comparing the 2023-24 season of Top1