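## NBA Dataset
### Task description:
1. Write a program that uses linear regression to make predictions. It reads NBApoints.csv, a dataset of NBA player statistics.
2. Each row of NBApoints.csv holds 30 comma-separated fields. The table below briefly describes the first few columns; the rest are omitted here.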

Column |Description
--------|----
Rk |Rank
Player |Player name
Pos |Position
Age |Age
Tm |Team
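3. Convert the Pos and Tm columns to numeric values for later processing.
4. Then build a machine-learning model and make predictions. Train on four columns (Pos, Age, 2P, and Tm) with 3P as the target.
5. Use the score method of sklearn.linear_model.LinearRegression to compute R-squared, and sklearn.feature_selection.f_regression to compute the P-values.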
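### Answer the following questions in order:

1. Given the test input [5,28,10,11], what is the predicted number of three-pointers made by the player (rounded to four decimal places)?
2. What is the R-squared value, that is, the model's explanatory power (rounded to four decimal places)?
3. Test the significance of Pos by checking whether its P-value is below 0.05 (95% confidence level). Enter Y if Pos is significant, N if it is not.
4. Test the significance of Age by checking whether its P-value is below 0.05 (95% confidence level). Enter Y if Age is significant, N if it is not.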

In [2]:
# Import the required packages
import pandas as pd
import numpy as np
from sklearn import preprocessing, linear_model
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import f_regression

In [3]:
# Load the raw data and inspect it
NBApoints_data= pd.read_csv("NBApoints.csv")
NBApoints_data.info()
NBApoints_data.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 476 entries, 0 to 475
Data columns (total 30 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Rk      476 non-null    int64  
 1   Player  476 non-null    object 
 2   Pos     476 non-null    object 
 3   Age     476 non-null    int64  
 4   Tm      476 non-null    object 
 5   G       476 non-null    int64  
 6   GS      476 non-null    int64  
 7   MP      476 non-null    float64
 8   FG      476 non-null    float64
 9   FGA     476 non-null    float64
 10  FG%     475 non-null    float64
 11  3P      476 non-null    float64
 12  3PA     476 non-null    float64
 13  3P%     433 non-null    float64
 14  2P      476 non-null    float64
 15  2PA     476 non-null    float64
 16  2P%     473 non-null    float64
 17  eFG%    475 non-null    float64
 18  FT      476 non-null    float64
 19  FTA     476 non-null    float64
 20  FT%     462 non-null    float64
 21  ORB     476 non-null    float64
 22  DR

Statistic|Rk|Age|G|GS|MP|FG|FGA|FG%|3P|3PA|...|FT%|ORB|DRB|TRB|AST|STL|BLK|TOV|PF|PS/G
----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----
count|476.0|476.0|476.0|476.0|476.0|476.0|476.0|475.0|476.0|476.0|...|462.0|476.0|476.0|476.0|476.0|476.0|476.0|476.0|476.0|476.0
mean|238.5|26.594538|54.785714|25.840336|20.160084|3.113235|6.951891|0.444091|0.688025|1.963866|...|0.741426|0.870378|2.754832|3.620378|1.82605|0.653361|0.415126|1.146849|1.744538|8.34937
std|137.553626|4.38244|24.274576|29.413419|9.218341|2.053724|4.420795|0.085532|0.71357|1.862538|...|0.129695|0.775428|1.76921|2.407572|1.756933|0.438861|0.44818|0.782957|0.742706|5.652556
min|1.0|19.0|1.0|0.0|2.0|0.0|0.0|0.0|0.0|0.0|...|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0
25%|119.75|23.0|37.0|1.0|13.0|1.5|3.5|0.4045|0.1|0.3|...|0.6825|0.3|1.5|1.9|0.675|0.3|0.1|0.6|1.2|4.0
50%|238.5|26.0|62.0|10.0|19.95|2.7|5.95|0.439|0.5|1.5|...|0.756|0.6|2.35|3.1|1.3|0.6|0.3|0.9|1.8|7.0
75%|357.25|29.0|76.0|53.0|28.025|4.4|9.6|0.483|1.1|3.125|...|0.827|1.2|3.6|4.8|2.325|0.9|0.5|1.5|2.2|11.625
max|476.0|39.0|82.0|82.0|42.0|10.2|20.5|1.0|5.1|11.2|...|1.0|4.9|10.3|14.8|11.7|2.1|3.7|4.6|6.0|30.1
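The info() output lists fewer than 476 non-null values for a few percentage columns (FG%, 3P%, 2P%, eFG%, FT%), while the columns used later (Pos, Age, Tm, 2P, 3P) are complete. A minimal sketch of how that could be checked explicitly before training; the small `toy` frame and its values are hypothetical stand-ins for `NBApoints_data`:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: only 3P% has a missing value
toy = pd.DataFrame({
    "Age": [25, 31, 22],
    "2P": [3.1, 0.8, 4.4],
    "3P": [1.2, 0.0, 2.5],
    "3P%": [0.36, np.nan, 0.41],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# True only if every column that feeds the model has no missing values
feature_columns = ["Age", "2P", "3P"]
features_complete = bool(toy[feature_columns].notna().all().all())

print(missing_per_column)
print("Model columns complete:", features_complete)
```

Because the model columns have no gaps, no imputation or row dropping is needed for this exercise.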


## Meaning of the columns in the NBA dataset

Column |Full name|Meaning |Column |Full name|Meaning
----|------|----|----|------|----
Rk |Rank |Rank|G |Games |Games played (1 to 82 in this dataset)
GS |Games Started |Games started|MP |Minutes Played |Average minutes played per game
FG|Field Goals |Field goals made|FGA|Field Goal Attempts |Field goal attempts
FG%|Field Goal Percentage |Field goal percentage|3P|3-Point Field Goals |Three-pointers made
3PA|3-Point Field Goal Attempts |Three-point attempts|3P%|3-Point Field Goal Percentage |Three-point percentage
2P|2-Point Field Goals |Two-pointers made|2PA|2-Point Field Goal Attempts |Two-point attempts
2P%|2-Point Field Goal Percentage |Two-point percentage|eFG%|Effective Field Goal Percentage |Effective field goal percentage
FT|Free Throws |Free throws made|FTA|Free Throw Attempts |Free throw attempts
FT%|Free Throw Percentage |Free throw percentage|ORB|Offensive Rebounds |Offensive rebounds
DRB|Defensive Rebounds |Defensive rebounds|TRB|Total Rebounds |Total rebounds
AST|Assists |Assists|STL|Steals |Steals
BLK |Blocks |Blocks|TOV |Turnovers |Turnovers
PF |Personal Fouls |Personal fouls|PTS |Points |Points

In [4]:
# Convert the Pos and Tm columns to numeric values
label_encoder_conver = preprocessing.LabelEncoder()
Pos_encoder_value = label_encoder_conver.fit_transform(NBApoints_data["Pos"])

label_encoder_conver = preprocessing.LabelEncoder()
Tm_encoder_value = label_encoder_conver.fit_transform(NBApoints_data["Tm"])
# Print the encoded values
print(Pos_encoder_value)
print("\n")
print(Tm_encoder_value)

[3 5 4 0 4 3 1 5 3 4 3 5 4 1 4 3 5 3 5 4 0 3 3 4 3 5 4 3 1 5 3 1 0 5 0 1 4
 0 5 4 1 1 3 3 0 0 1 3 5 0 1 5 3 5 4 3 0 1 5 5 5 5 0 1 4 1 5 5 5 1 5 4 0 3
 4 1 3 3 1 5 5 5 0 4 0 5 3 4 5 4 4 3 3 0 4 0 0 4 1 3 1 5 5 1 5 4 1 3 0 1 3
 3 4 4 1 4 4 5 1 4 4 3 4 1 4 5 0 5 4 3 4 3 3 3 3 4 5 5 1 3 5 0 1 0 4 5 0 3
 4 0 0 3 4 3 1 5 5 5 4 5 5 5 3 4 0 5 4 3 1 1 5 0 0 3 0 3 5 5 3 0 5 3 5 5 1
 1 0 0 1 3 0 5 1 3 5 5 1 3 4 0 1 3 4 0 4 1 0 1 5 5 1 5 3 3 3 3 1 3 3 0 5 1
 1 3 5 4 5 1 4 0 4 4 5 0 3 0 1 4 0 0 0 0 4 1 3 3 4 1 1 1 3 0 0 5 5 1 1 0 5
 1 5 4 4 4 0 3 0 4 1 1 4 2 4 1 3 4 0 0 4 1 5 1 1 5 5 0 0 3 5 4 1 5 0 3 1 5
 3 5 5 3 1 5 5 0 0 0 4 0 1 1 0 3 4 1 1 1 3 0 0 4 1 5 1 3 4 5 3 1 5 1 1 3 4
 3 4 5 3 3 0 3 0 1 3 0 1 3 0 4 4 3 1 5 0 4 5 0 4 1 5 3 0 0 4 4 3 3 4 5 4 3
 0 3 3 4 5 1 1 4 1 1 5 5 0 0 0 0 4 4 3 0 3 1 1 0 5 4 3 1 5 4 1 4 1 3 1 1 0
 1 4 5 1 5 1 4 0 0 3 1 0 3 6 4 4 1 4 4 1 5 1 4 0 0 0 1 1 5 4 0 0 0 1 0 4 3
 4 1 5 3 1 0 1 3 1 5 3 3 1 1 1 4 3 5 5 3 1 0 5 0 5 0 1 5 4 4 4 3]


[ 9 10 20 25  5 24 18 27 20 11  
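The integer codes are only meaningful alongside the label they stand for; the test input [5,28,10,11], for instance, refers to Pos code 5 and Tm code 11. A minimal sketch of how the mapping can be inspected, using a hypothetical list of position labels rather than the real column:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of position labels (not taken from NBApoints.csv)
positions = ["PG", "C", "SG", "C", "PF"]

pos_encoder = LabelEncoder()
codes = pos_encoder.fit_transform(positions)

# classes_ holds the labels in sorted order; a label's index is its code
mapping = {str(label): code for code, label in enumerate(pos_encoder.classes_)}

# inverse_transform turns codes back into the original labels
decoded = pos_encoder.inverse_transform([2])

print(list(codes))
print(mapping)
print(decoded[0])
```

Keeping one encoder object per column (instead of reusing a single variable name) leaves `classes_` and `inverse_transform` available for both Pos and Tm after fitting.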

In [5]:
# Build the training set, then create and train the model
train_X = pd.DataFrame([Pos_encoder_value,NBApoints_data["Age"],
                        NBApoints_data["2P"],Tm_encoder_value]).T
                        
#train_X = pd.DataFrame([Pos_encoder_value,NBApoints_data["Age"]]).T

NBApoints_linear_model = LinearRegression()
NBApoints_linear_model.fit(train_X, NBApoints_data["3P"])

LinearRegression()

In [7]:
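# Predict the number of three-pointers made
NBApoints_linear_model_predict_result= NBApoints_linear_model.predict([[5,28,10,11]])
# predict returns a one-element array, so take element 0 before formatting
print('Predicted 3P =',"% .4f" % NBApoints_linear_model_predict_result[0])

Predicted 3P =  2.1119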
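A fitted linear regression predicts by taking the intercept plus the dot product of the inputs with the learned coefficients. The sketch below illustrates this on synthetic, noise-free data (the coefficients and data are made up for illustration and are unrelated to the NBA model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship (assumed values)
rng = np.random.default_rng(0)
X = rng.uniform(0, 30, size=(50, 4))
true_coef = np.array([0.1, -0.02, 0.05, 0.0])
y = X @ true_coef + 1.0

model = LinearRegression().fit(X, y)

x_new = np.array([[5, 28, 10, 11]])
by_predict = model.predict(x_new)[0]

# Same prediction written out as intercept + sum(coefficient * feature)
by_hand = float(x_new[0] @ model.coef_ + model.intercept_)

print(by_predict, by_hand)
```

Printing `coef_` and `intercept_` for the NBA model in the same way would show how much each of Pos, Age, 2P, and Tm contributes to the predicted value.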


In [8]:
# Compute R-squared and the P-values
r_squared = NBApoints_linear_model.score(train_X, NBApoints_data["3P"])
print('R_squared =',"% .4f" % r_squared)

# f_regression runs a separate univariate F-test for each feature against 3P
print("f_regression\n")
print("P-values =",f_regression(train_X, NBApoints_data["3P"])[1])

R_squared =  0.3383
f_regression

P-values = [2.28418914e-30 1.69037260e-01 8.25422767e-07 2.84584085e-01]
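For a regressor, `score` returns the coefficient of determination, defined as one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch on synthetic data (assumed values, not the NBA data) that reproduces it by hand:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with noise (assumed values)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=100)

model = LinearRegression().fit(X, y)

# R^2 = 1 - SS_res / SS_tot
residuals = y - model.predict(X)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_by_hand = 1.0 - ss_res / ss_tot

r2_by_score = model.score(X, y)
print(r2_by_hand, r2_by_score)
```

Note that the notebook scores the model on the same rows it was trained on, so the reported value describes fit to the training data rather than performance on unseen players.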




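P-values of the four features:

Feature |P-value |Comparison |Result
----|----|----|----
Pos_encoder_value |2.28418914e-30 |P < 0.05 |Pos is significant
NBApoints_data["Age"] |1.69037260e-01 |P > 0.05 |Age is not significant
NBApoints_data["2P"] |8.25422767e-07 |P < 0.05 |2P is significant
Tm_encoder_value |2.84584085e-01 |P > 0.05 |Tm is not significant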
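It may help to keep in mind what these P-values measure. `f_regression` tests each feature on its own against the target, so each P-value matches the one from a simple Pearson correlation test; it is not the coefficient P-value from the joint four-feature model. A sketch on synthetic data (assumed values) that illustrates the equivalence:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import f_regression

# Synthetic data: only the first feature is related to y (assumed values)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = 0.8 * X[:, 0] + rng.normal(size=200)

# P-values from the univariate F-tests
_, p_f = f_regression(X, y)

# P-values from a Pearson correlation test, one feature at a time
p_corr = []
for j in range(X.shape[1]):
    _, p_value = pearsonr(X[:, j], y)
    p_corr.append(p_value)
p_corr = np.array(p_corr)

print(p_f)
print(p_corr)
```

If coefficient-level P-values from the combined model are wanted, a statistics-oriented package that reports them for an ordinary least squares fit would be the usual route; that is outside what this notebook does.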

In [11]:
# Build the training set from the two significant features (Pos and 2P), then create and train the model
train_X = pd.DataFrame([Pos_encoder_value,
                        NBApoints_data["2P"]]).T
                        
NBApoints_linear_model = LinearRegression()
NBApoints_linear_model.fit(train_X, NBApoints_data["3P"])
# Compute R-squared and the P-values
r_squared = NBApoints_linear_model.score(train_X, NBApoints_data["3P"])
print('R_squared =',"% .4f" % r_squared)

print("f_regression\n")
print("P-values =",f_regression(train_X, NBApoints_data["3P"])[1])

R_squared =  0.3248
f_regression

P-values = [2.28418914e-30 8.25422767e-07]
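Plain R-squared cannot rise when features are removed from a least squares fit on the same data, so the drop from 0.3383 to 0.3248 is expected by construction. One common way to compare models with different feature counts is adjusted R-squared, which penalises each extra feature. A sketch using the two reported values and the 476 rows shown by info(); the helper name `adjusted_r2` is hypothetical:

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Values reported above: four features (Pos, Age, 2P, Tm) vs two (Pos, 2P)
adj_four = adjusted_r2(0.3383, 476, 4)
adj_two = adjusted_r2(0.3248, 476, 2)

print(round(adj_four, 4), round(adj_two, 4))
```

With these figures the four-feature model keeps a slightly higher adjusted value, which suggests that dropping Age and Tm simplifies the model without improving its fit.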
