Install and import Pandas for tabular data manipulation.

In [None]:
!pip install pandas



Reading and Displaying the Data
The CSV file is loaded into the dataframe data using pd.read_csv().
The first 5 rows are displayed with data.head() to understand the structure of the data.


In [None]:
import pandas as pd

# Load the uploaded CSV file to inspect its content
file_path = '/content/drive/MyDrive/mp1/epldata_final.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset
data.head()

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,Alexis Sanchez,Arsenal,28,LW,1,65.0,4329,12.0,17.10%,264,3.0,Chile,0,4,1,1,0
1,Mesut Ozil,Arsenal,28,AM,1,50.0,4395,9.5,5.60%,167,2.0,Germany,0,4,1,1,0
2,Petr Cech,Arsenal,35,GK,4,7.0,1529,5.5,5.90%,134,2.0,Czech Republic,0,6,1,1,0
3,Theo Walcott,Arsenal,28,RW,1,20.0,2393,7.5,1.50%,122,1.0,England,0,4,1,1,0
4,Laurent Koscielny,Arsenal,31,CB,3,22.0,912,6.0,0.70%,121,2.0,France,0,4,1,1,0


Checking for Missing Values

Count the missing values per column using isnull() and isna(); in pandas these are aliases, so both counts are always identical.
The results are summarized in a new table (null_and_na_data) for further analysis.

In [None]:
# Check for null (NaN) values in the dataset
null_data = data.isnull().sum()

# Check for NA values in the dataset (equivalent to NaN in pandas)
na_data = data.isna().sum()

# Combine the results to provide a clear overview
null_and_na_data = pd.DataFrame({
    "Column": data.columns,
    "Null Values": null_data,
    "NA Values": na_data
}).reset_index(drop=True)

null_and_na_data


Unnamed: 0,Column,Null Values,NA Values
0,name,0,0
1,club,0,0
2,age,0,0
3,position,0,0
4,position_cat,0,0
5,market_value,0,0
6,page_views,0,0
7,fpl_value,0,0
8,fpl_sel,0,0
9,fpl_points,0,0
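
Because isnull() is an alias of isna(), the two count columns above carry the same information. The following is a minimal sketch on a small made-up frame (the toy values and the names toy and missing_summary are illustrative assumptions, not part of the dataset) showing that the counts agree and that a single column is enough for the summary:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing entries (not the EPL dataset)
toy = pd.DataFrame({
    "age": [28, np.nan, 35],
    "club": ["Arsenal", None, "Chelsea"],
})

# isnull() and isna() are the same method under two names
null_counts = toy.isnull().sum()
na_counts = toy.isna().sum()
same = null_counts.equals(na_counts)

# One column of counts is therefore enough for a summary table
missing_summary = na_counts.rename("Missing Values").reset_index()
missing_summary.columns = ["Column", "Missing Values"]

print(same)
print(missing_summary)
```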


Column Selection

Select the fpl_points (target) and age (feature) columns for further analysis. The result is a simple subset of the data.

In [None]:
# Select only the 'fpl_points' (TARGET) and 'age' (FEATURE) columns
# from the original 'data' DataFrame
selected_data = data[['fpl_points', 'age']]

# Display the first few rows of the selected data
selected_data.head()

Unnamed: 0,fpl_points,age
0,264,28
1,167,28
2,134,35
3,122,28
4,121,31


Transforming String Data to Numeric

Columns of string/object type are identified and converted to integers using LabelEncoder.
The transformed data is stored in a new dataframe (numeric_data) so that it is ready for use in a machine learning model.

In [None]:
from sklearn.preprocessing import LabelEncoder

# Create a copy of the data to modify
numeric_data = data.copy()

# Identify columns with string or object data types
string_columns = numeric_data.select_dtypes(include=['object']).columns

# Apply LabelEncoder to each string column
label_encoders = {}
for column in string_columns:
    le = LabelEncoder()
    numeric_data[column] = le.fit_transform(numeric_data[column])
    label_encoders[column] = le  # Store the encoder for future reference

# Display the first few rows of the modified data
numeric_data.head(), string_columns


(   name  club  age  position  position_cat  market_value  page_views  \
 0    19     0   28         8             1          65.0        4329   
 1   302     0   28         0             1          50.0        4395   
 2   348     0   35         5             4           7.0        1529   
 3   418     0   28        11             1          20.0        2393   
 4   250     0   31         1             3          22.0         912   
 
    fpl_value  fpl_sel  fpl_points  region  nationality  new_foreign  age_cat  \
 0       12.0       43         264     3.0           12            0        4   
 1        9.5       92         167     2.0           26            0        4   
 2        5.5       93         134     2.0           18            0        6   
 3        7.5       15         122     1.0           22            0        4   
 4        6.0        7         121     2.0           25            0        4   
 
    club_id  big_club  new_signing  
 0        1         1            0 
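
One caveat on the encoded output: fpl_sel holds percentage strings, and LabelEncoder numbers distinct strings in sorted string order, so the resulting codes do not follow numeric size (in the output above, 17.10% receives code 43 while 5.60% receives 92). The sketch below reproduces this on the five fpl_sel values from data.head(). Parsing the percentage into a float is shown as one possible alternative; it is an assumption, not what this notebook does.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The five fpl_sel values shown in data.head() above
fpl_sel = pd.Series(["17.10%", "5.60%", "5.90%", "1.50%", "0.70%"])

# LabelEncoder numbers the distinct strings in sorted (lexicographic) order
codes = LabelEncoder().fit_transform(fpl_sel)

# Alternative (an assumption, not the notebook's approach):
# strip the percent sign and keep the numeric value
as_float = fpl_sel.str.rstrip("%").astype(float)

print(list(codes))        # codes follow string order, not magnitude
print(as_float.tolist())
```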

Dataset Split

train_test_split divides the dataset into training data (80%) and test data (20%), with feature X (age) and target y (fpl_points).
The split is verified by checking the resulting shapes (number of rows and columns).

In [None]:
from sklearn.model_selection import train_test_split

# Assuming 'numeric_data' from the previous cell is the cleaned data
# If not, replace 'numeric_data' with the actual variable containing cleaned data
data_cleaned = numeric_data

# Define features (X) and target (y)
X = data_cleaned[['age']]  # Features
y = data_cleaned['fpl_points']  # Target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the resulting datasets for verification
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((368, 1), (93, 1), (368,), (93,))
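
The shapes above imply 461 rows in total. As a quick sanity check, the arithmetic can be reproduced by hand. This sketch assumes that a fractional test_size is rounded up and the remainder goes to training, which is consistent with the 368/93 result shown:

```python
import math

n_samples = 368 + 93          # total rows implied by the shapes above
test_size = 0.2

# Assumption: the fractional test size is rounded up, the rest goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_samples, n_train, n_test)
```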

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

# Count the unique categories in the TARGET column
num_categories = y_train.nunique()

# Choose K
K = 3 if num_categories % 2 == 0 else 4

# Initialize and train the KNN model
knn = KNeighborsClassifier(n_neighbors=K)
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Accuracy: 0.1827956989247312

Classification Report:
               precision    recall  f1-score   support

           0       0.27      0.70      0.39        23
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         1
           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         2
           8       0.00      0.00      0.00         1
           9       0.00      0.00      0.00         0
          12       0.00      0.00      0.00         3
          14       0.00      0.00      0.00         1
          16       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         0
          24       0.00      0.00      0.00         1
          28       0.00      0.00      0.00         1
          32       0.00      0.00      0.00         2
          37       0.00      0.00      0.00         1
          38       0.00    

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


In [None]:
from sklearn.metrics import accuracy_score

# Make predictions with the KNN model
y_pred = knn.predict(X_test)

# Show the predictions
print("Predicted values: ", y_pred)

# Evaluate model performance by accuracy
print("Accuracy: ", accuracy_score(y_test, y_pred))


Predicted values:  [ 73   0   0   0   0  23  38   0  38  73   0   0   0  73  23   0   0  41
   0  41  73  41   0  38   0  23   0   0   0   0  38   0  41   0   0  38
   0   0  23  41   0   0   0   0  73   0   0  57   0 113   0  73   0   0
   0   0   0  73   0   0   0   0   0  73   0   0  38   0   0   0  57   0
   0  23   0   9   0  38   0   0   0  73   0  41   0   0   0  73   0  41
   0   0   0]
Accuracy:  0.1827956989247312
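
The low accuracy is partly a consequence of the setup: fpl_points is a numeric score with many distinct values, and KNeighborsClassifier treats every distinct score as its own class. One possible alternative, which this notebook does not use, is to treat the task as regression with KNeighborsRegressor and report an error measure instead of accuracy. The sketch below runs on synthetic stand-in data (the ages and points are made up), so its numbers say nothing about the real dataset:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in data: ages and points are made up for illustration
rng = np.random.default_rng(42)
age = rng.integers(17, 39, size=200).reshape(-1, 1)
points = rng.integers(0, 265, size=200)

# Hold out the last 40 rows as a simple test split
X_train, X_test = age[:160], age[160:]
y_train, y_test = points[:160], points[160:]

# Each prediction is the mean target of the 3 nearest neighbours
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

print("MAE:", mean_absolute_error(y_test, y_pred))
```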


In [None]:
import numpy as np

# Take two sample points to compute the distance between them
point1 = X_test.iloc[0].values  # First row of X_test
point2 = X_test.iloc[1].values  # Second row of X_test

# Compute the Euclidean distance
euclidean_distance = np.linalg.norm(point1 - point2)

print("Euclidean Distance:", euclidean_distance)


Euclidean Distance: 3.0
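
Since X contains only the age column, the Euclidean distance between two rows reduces to the absolute difference of their ages. A minimal sketch, using two hypothetical ages chosen to give the same distance of 3.0 (the actual first two rows of X_test are not shown in the output):

```python
import numpy as np

# Hypothetical one-feature points (ages); the actual X_test rows are not shown
point_a = np.array([28])
point_b = np.array([31])

# For one dimension, sqrt((a - b)^2) equals |a - b|
euclidean = np.linalg.norm(point_a - point_b)
absolute = np.abs(point_a - point_b)[0]

print(euclidean, absolute)
```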


In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix

# Make predictions with the KNN model
y_pred = knn.predict(X_test)

# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Compute precision, recall, and F1-score
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")

# Show the classification report and confusion matrix
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))


Accuracy: 0.1827956989247312
Precision: 0.06702508960573476
Recall: 0.1827956989247312
F1 Score: 0.09730417270254037

Classification Report:
               precision    recall  f1-score   support

           0       0.27      0.70      0.39        23
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         1
           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         2
           8       0.00      0.00      0.00         1
           9       0.00      0.00      0.00         0
          12       0.00      0.00      0.00         3
          14       0.00      0.00      0.00         1
          16       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         0
          24       0.00      0.00      0.00         1
          28       0.00      0.00      0.00         1
          32       0.00      0.00      0.00    

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
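
The Recall value printed above is identical to Accuracy, which is expected with average='weighted': weighting each class's recall by its share of the samples sums the correct predictions over all classes and divides by the total, which is the definition of accuracy. The sketch below checks this with plain Python on made-up labels (illustrative only, not taken from the dataset):

```python
from collections import Counter

# Made-up labels for illustration only
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 0, 0]

n = len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

support = Counter(y_true)                       # samples per true class
correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

# Weighted recall: per-class recall weighted by that class's share of samples
weighted_recall = sum(
    (support[c] / n) * (correct[c] / support[c]) for c in support
)

print(accuracy, weighted_recall)
```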


In [None]:
data

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,Alexis Sanchez,Arsenal,28,LW,1,65.0,4329,12.0,17.10%,264,3.0,Chile,0,4,1,1,0
1,Mesut Ozil,Arsenal,28,AM,1,50.0,4395,9.5,5.60%,167,2.0,Germany,0,4,1,1,0
2,Petr Cech,Arsenal,35,GK,4,7.0,1529,5.5,5.90%,134,2.0,Czech Republic,0,6,1,1,0
3,Theo Walcott,Arsenal,28,RW,1,20.0,2393,7.5,1.50%,122,1.0,England,0,4,1,1,0
4,Laurent Koscielny,Arsenal,31,CB,3,22.0,912,6.0,0.70%,121,2.0,France,0,4,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
456,Edimilson Fernandes,West+Ham,21,CM,2,5.0,288,4.5,0.40%,38,2.0,Switzerland,0,1,20,0,1
457,Arthur Masuaku,West+Ham,23,LB,3,7.0,199,4.5,0.20%,34,4.0,Congo DR,0,2,20,0,1
458,Sam Byram,West+Ham,23,RB,3,4.5,198,4.5,0.30%,29,1.0,England,0,2,20,0,0
459,Ashley Fletcher,West+Ham,21,CF,1,1.0,412,4.5,5.90%,16,1.0,England,0,1,20,0,1
