# hypothesis 

In [2]:
import pandas as pd

# Load the data
file_path = 'first_second_draft.csv'
draft_data = pd.read_csv(file_path)

# Display the first few rows of the dataframe to understand its structure
draft_data.head()


Unnamed: 0.1,Unnamed: 0,draft pick,year,name,total_games,pts,rb,ast
0,0,1st,2022,Paolo Banchero,77,19.7,6.8,3.8
1,1,1st,2021,Cade Cunningham,82,18.2,5.4,5.8
2,2,1st,2020,Anthony Edwards,227,21.9,5.1,3.8
3,3,1st,2019,Zion Williamson,118,25.7,6.9,3.6
4,4,1st,2018,Deandre Ayton,308,16.5,10.5,1.6


In [3]:
# Grouping the data by 'draft pick' and calculating the mean for pts, rb, and ast
grouped_data = draft_data.groupby('draft pick')[['pts', 'rb', 'ast']].mean()

# Display the grouped data
grouped_data


draft pick,pts,rb,ast
1st,16.842857,7.060317,3.066667
2nd,13.722581,5.9,2.958065


In [4]:
from scipy.stats import shapiro, levene, ttest_ind

# Extracting the data for first and second picks
first_pick_data = draft_data[draft_data['draft pick'] == '1st']
second_pick_data = draft_data[draft_data['draft pick'] == '2nd']

# Function to perform normality test (Shapiro-Wilk test)
def test_normality(data, variable):
    stat, p = shapiro(data[variable])
    return p

# Test for normality
normality_results = {
    "Points (1st)": test_normality(first_pick_data, 'pts'),
    "Rebounds (1st)": test_normality(first_pick_data, 'rb'),
    "Assists (1st)": test_normality(first_pick_data, 'ast'),
    "Points (2nd)": test_normality(second_pick_data, 'pts'),
    "Rebounds (2nd)": test_normality(second_pick_data, 'rb'),
    "Assists (2nd)": test_normality(second_pick_data, 'ast')
}

normality_results


{'Points (1st)': 0.4344016909599304,
 'Rebounds (1st)': 0.07819505780935287,
 'Assists (1st)': 1.7567066606716253e-05,
 'Points (2nd)': 0.6059237122535706,
 'Rebounds (2nd)': 0.09484624862670898,
 'Assists (2nd)': 8.01899022917496e-06}

First Picks
Points: p=0.434 (normal distribution assumed)
Rebounds: p=0.078 (normal distribution assumed)
Assists: p=1.76×10^−5 (normal distribution not assumed)
    
Second Picks
Points: p=0.606 (normal distribution assumed)
Rebounds: p=0.095 (normal distribution assumed)
Assists: p=8.02×10^−6 (normal distribution not assumed)

For both first and second picks, the distributions of points and rebounds can be considered normal, but the distribution of assists does not appear to be normal. Given this, we can use the t-test for points and rebounds, but we might need to use a non-parametric test for assists.
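The non-parametric alternative mentioned above could be the Mann-Whitney U test, which does not assume normality. The following is a minimal sketch only: the arrays are synthetic, right-skewed stand-ins for assists per game (not the values in first_second_draft.csv), and the one-sided direction (first picks greater) is an assumption made to match the one-tailed tests used later in this notebook.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic, right-skewed stand-ins for assists per game (NOT the real data)
rng = np.random.default_rng(0)
first_ast = rng.lognormal(mean=1.0, sigma=0.6, size=30)
second_ast = rng.lognormal(mean=0.9, sigma=0.6, size=30)

# One-sided test: do first-pick values tend to be larger than second-pick values?
u_stat, p_val = mannwhitneyu(first_ast, second_ast, alternative='greater')
print(f"U = {u_stat:.1f}, one-sided p = {p_val:.4f}")
```

With the notebook's data, the same call would take first_pick_data['ast'] and second_pick_data['ast'] as its two samples.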

In [5]:
# Function to perform Levene's test for equality of variances
def test_variances(data1, data2, variable):
    stat, p = levene(data1[variable], data2[variable])
    return p

# Test for equality of variances
variance_results = {
    "Points": test_variances(first_pick_data, second_pick_data, 'pts'),
    "Rebounds": test_variances(first_pick_data, second_pick_data, 'rb'),
    "Assists": test_variances(first_pick_data, second_pick_data, 'ast')
}

variance_results


{'Points': 0.7040295486463755,
 'Rebounds': 0.054268599126434516,
 'Assists': 0.2514494047688133}
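To make the reading of these p-values explicit, the sketch below compares each one against a 0.05 significance level. The dictionary simply repeats the output above; the variable names alpha and decisions are introduced here for illustration.

```python
# Levene's test p-values copied from the output above
variance_results = {
    'Points': 0.7040295486463755,
    'Rebounds': 0.054268599126434516,
    'Assists': 0.2514494047688133,
}

alpha = 0.05  # significance level (assumed, matching the rest of the notebook)

# H0 of Levene's test: the two groups have equal variances
decisions = {
    name: ("reject equal variances" if p < alpha else "fail to reject equal variances")
    for name, p in variance_results.items()
}

for name, verdict in decisions.items():
    print(f"{name}: p = {variance_results[name]:.3f} -> {verdict}")
```

None of the three p-values falls below 0.05, although rebounds (p = 0.054) is close to the threshold.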

The results of the tests for a difference in means between the two groups are as follows (with a significance level of 0.05):

Points: p=0.0017. Since p<0.05, we reject the null hypothesis. This suggests a significant difference in the average points per game between first and second overall picks.

Rebounds: p=0.0135. Since p<0.05, we reject the null hypothesis. This suggests a significant difference in the average rebounds per game between first and second overall picks.

Assists: p=0.2477. Since p>0.05, we fail to reject the null hypothesis. There is no significant difference in the average assists per game between first and second overall picks.

In summary:

First overall picks in the NBA draft tend to score more points and get more rebounds per game than second overall picks, which indicates a higher impact in these areas.
However, there's no significant difference in assists per game between first and second picks.

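Points and rebounds are approximately normally distributed. Levene's test does not reject equality of variances for any of the three variables at the 0.05 level (rebounds is borderline at p=0.054), so we use a one-tailed t-test without assuming equal variances (Welch's t-test) as the conservative choice.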

In [6]:
from scipy.stats import ttest_ind, mannwhitneyu

# Separate the data based on the draft pick
first_picks = draft_data[draft_data['draft pick'] == '1st']
second_picks = draft_data[draft_data['draft pick'] == '2nd']

# Perform a one-tailed t-test for points per game (pts)
t_stat_pts, p_val_pts = ttest_ind(first_picks['pts'], second_picks['pts'], equal_var=False)

# For a one-tailed test (HA: first picks > second picks), halve the two-tailed p-value when t > 0; otherwise use 1 - p/2
p_val_pts_one_tailed = p_val_pts / 2 if t_stat_pts > 0 else 1 - (p_val_pts / 2)

t_stat_pts, p_val_pts_one_tailed



(3.218442197112199, 0.0008241334311938749)

The results from the one-tailed t-test comparing points per game (pts) between first and second overall picks are as follows:

T-statistic: 3.218
P-value: 0.00082

Since the p-value is much less than 0.05 (a common threshold for statistical significance), we reject the null hypothesis (H0: the mean points per game of first overall picks is less than or equal to that of second overall picks) in favor of the alternative hypothesis (HA: it is greater). This suggests that first overall picks score more points per game on average than second overall picks.
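The manual halving of the two-tailed p-value can be cross-checked against the alternative='greater' argument of ttest_ind, which is available in SciPy 1.6 and later. This is a sketch under stated assumptions: the two samples are synthetic normal draws, not the draft data, and the names first, second, p_manual and p_direct are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for points per game (NOT the real data)
rng = np.random.default_rng(42)
first = rng.normal(17.0, 5.0, size=40)
second = rng.normal(14.0, 5.0, size=40)

# Two-tailed Welch's t-test, then manual conversion to one-tailed (HA: first > second)
t_stat, p_two = ttest_ind(first, second, equal_var=False)
p_manual = p_two / 2 if t_stat > 0 else 1 - p_two / 2

# Same one-tailed test requested directly from SciPy
t_alt, p_direct = ttest_ind(first, second, equal_var=False, alternative='greater')

print(f"manual: {p_manual:.6f}, direct: {p_direct:.6f}")
```

Both routes give the same p-value; passing alternative='greater' avoids the sign check and states the direction of the hypothesis in the call itself.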

In [10]:
# Perform a one-tailed t-test for rebounds per game (rb)
t_stat_rb, p_val_rb = ttest_ind(first_picks['rb'], second_picks['rb'], equal_var=False)
p_val_rb_one_tailed = p_val_rb / 2 if t_stat_rb > 0 else 1 - (p_val_rb / 2)

# Perform a one-tailed t-test for assists per game (ast)
t_stat_ast, p_val_ast = ttest_ind(first_picks['ast'], second_picks['ast'], equal_var=False)
p_val_ast_one_tailed = p_val_ast / 2 if t_stat_ast > 0 else 1 - (p_val_ast / 2)

t_stat_rb, p_val_rb_one_tailed, t_stat_ast, p_val_ast_one_tailed


(2.511598296240092,
 0.006679005661227879,
 0.28224933586465106,
 0.3891206547904928)

Rebounds per Game (rb):
T-statistic: 2.512
P-value: 0.00668
The p-value is less than 0.05, so we reject the null hypothesis (H0) for rebounds per game as well. This suggests that the mean rebounds per game for first overall picks is greater than that for second overall picks.

Assists per Game (ast):
T-statistic: 0.282
P-value: 0.389
The p-value is greater than 0.05, so we do not have sufficient evidence to reject the null hypothesis (H0) for assists per game. Based on the data, we cannot conclude that the mean assists per game for first overall picks is greater than that for second overall picks. Because the Shapiro-Wilk test indicated that assists are not normally distributed, this t-test result should be read with caution.

In summary:

For points and rebounds, the null hypothesis is rejected, suggesting that first overall picks have a statistically significantly greater mean in these metrics than second overall picks.
For assists, the null hypothesis is not rejected, so we cannot conclude that first overall picks have a greater mean assists per game than second overall picks.