# Analysis
The data below cover the 2023-2024 seasons. There is a weak-to-moderate negative correlation ($r$ = -0.23 to -0.45) between (stat - xstat) in 2023 and how a player's OPS changed the following season relative to the previous one. The p-values are all low (< 0.002), indicating that the correlations are statistically significant. Note, however, that $r^{2}$ ranges only from 0.05 to 0.21, meaning that only 5-21% of the variance in the change in OPS is explained by the difference between actual and expected statistics.
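
As a worked check of the $r^{2}$ interpretation, using the wOBA - xwOBA row from the table below:

$$
r^{2} = (-0.454729)^{2} \approx 0.2068
$$

so about 21% of the variance in the change in OPS is accounted for by that difference, and roughly 79% is not.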


In [41]:
import pandas as pd
from scipy import stats

df = pd.read_csv('../out.csv')

In [42]:
rwoba, pwoba = stats.pearsonr(df['wobadiff (2023)'], df['ops_change 2023-2024'])
rslg, pslg = stats.pearsonr(df['xslgdiff (2023)'], df['ops_change 2023-2024'])
adjusted_df = df.dropna()
rhr, phr = stats.pearsonr(adjusted_df['xhrdiff (2023)'], adjusted_df['ops_change 2023-2024'])
rba, pba = stats.pearsonr(df['xbadiff (2023)'], df['ops_change 2023-2024'])

# Ignore 2023 rookies: their 2023 BABIP equals their career BABIP, so the difference is 0
filtered_df = df[df['babipdiff (2023)'] != 0]
rbabip, pbabip = stats.pearsonr(filtered_df['babipdiff (2023)'], filtered_df['ops_change 2023-2024'])

data = {
    '%ΔOPS vs wOBA-xwOBA': [rwoba, rwoba**2, pwoba],
    '%ΔOPS vs SLG - xSLG': [rslg, rslg**2, pslg], 
    '%ΔOPS vs BA - xBA': [rba, rba**2, pba],
    '%ΔOPS vs HR - xHR': [rhr, rhr**2, phr],
    '%ΔOPS vs 2023 BABIP - Career BABIP': [rbabip, rbabip**2, pbabip]
}

values_df = pd.DataFrame.from_dict(data, orient='index', columns=['r', 'r^2', 'p'])
values_df.rename_axis("graph", inplace=True)
display(values_df)

| graph | r | r^2 | p |
|---|---|---|---|
| %ΔOPS vs wOBA-xwOBA | -0.454729 | 0.206779 | 5.861345e-12 |
| %ΔOPS vs SLG - xSLG | -0.422607 | 0.178596 | 2.251576e-10 |
| %ΔOPS vs BA - xBA | -0.422587 | 0.17858 | 2.256196e-10 |
| %ΔOPS vs HR - xHR | -0.228158 | 0.052056 | 0.001123589 |
| %ΔOPS vs 2023 BABIP - Career BABIP | -0.302709 | 0.091632 | 8.156711e-05 |
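
The five `pearsonr` calls above follow the same pattern, so they could be folded into one helper. The sketch below is only an illustration: `summarize_correlation` is a hypothetical name, not part of the notebook, and the demo frame is synthetic stand-in data, not the contents of `out.csv`. It drops missing values per column pair instead of calling `df.dropna()` on the whole frame, which may keep more rows and so give slightly different values from those in the table.

```python
import numpy as np
import pandas as pd
from scipy import stats


def summarize_correlation(frame, x_col, y_col):
    """Return Pearson r, r^2, the two-sided p-value, and the sample size for two columns."""
    # Drop rows missing either value; pearsonr does not skip NaN on its own.
    pair = frame[[x_col, y_col]].dropna()
    r, p = stats.pearsonr(pair[x_col], pair[y_col])
    return {'r': r, 'r^2': r ** 2, 'p': p, 'n': len(pair)}


# Synthetic stand-in data (NOT the values in out.csv), with one missing entry.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({'diff': x, 'ops_change': -0.5 * x + rng.normal(size=200)})
demo.loc[0, 'diff'] = np.nan

summary = summarize_correlation(demo, 'diff', 'ops_change')
print(summary)
```

Reporting `n` alongside `r` and `p` makes it visible when a filter or missing values have shrunk the sample behind a given row.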
