# Assignment 02 Spatial and Spatio-Temporal Data Processing

In this assignment, you will learn how to conduct a simple multi-dimensional data analysis in Python. We use the NBA player statistics collected from https://www.nbastuffer.com/2024-2025-nba-player-stats/ as our dataset.


In [2]:
import pandas as pd
df = pd.read_csv('nbastat_20242025.csv')

## Task 1: Sort the players by the EFF (NBA's efficiency rating) (5%)

NBA's efficiency rating: $EFF = PTS + REB + AST + STL + BLK - ((FGA - FGM) + (FTA - FTM) + TO)$

In our data file, the derived quantities are computed as $FTM = FTA \times FT\%$, $FGA = 2PA + 3PA$, and $FGM = 2PA \times 2P\% + 3PA \times 3P\%$.
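
As a quick sanity check of the formula, the sketch below computes EFF for a single made-up stat line. All numbers are hypothetical (they are not taken from the dataset) and every input is expressed per game; the helper name `efficiency` is only an illustration and is not part of the assignment.

```python
def efficiency(pts, reb, ast, stl, blk, fga, fgm, fta, ftm, to):
    """EFF = PTS + REB + AST + STL + BLK - ((FGA - FGM) + (FTA - FTM) + TO)."""
    missed_fg = fga - fgm   # missed field goals
    missed_ft = fta - ftm   # missed free throws
    return pts + reb + ast + stl + blk - (missed_fg + missed_ft + to)

# Hypothetical per-game stat line (not taken from the dataset)
two_pa, two_pct = 10.0, 0.5      # 2-point attempts and accuracy
three_pa, three_pct = 5.0, 0.4   # 3-point attempts and accuracy
fta, ft_pct = 4.0, 0.75          # free-throw attempts and accuracy

fga = two_pa + three_pa                        # 15.0
fgm = two_pa * two_pct + three_pa * three_pct  # 5.0 + 2.0 = 7.0
ftm = fta * ft_pct                             # 3.0

eff = efficiency(pts=19.0, reb=6.0, ast=4.0, stl=1.0, blk=1.0,
                 fga=fga, fgm=fgm, fta=fta, ftm=ftm, to=2.0)
print(eff)  # 20.0
```

Every input here shares the same unit. Combining season totals with per-game averages in one formula would distort the rating, so it is worth checking the units of each column before applying it.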

In [3]:
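# Step 1: Calculate FTM, FGA, and FGM from attempts and shooting percentages
df['FTM'] = df['FTA'] * df['FT%']
df['FGA'] = df['2PA'] + df['3PA']
df['FGM'] = (df['2PA'] * df['2P%']) + (df['3PA'] * df['3P%'])

# Step 2: Calculate EFF
# Caution: the formula assumes every term uses the same unit. The large negative
# values in the output suggest that the attempt columns (FTA, 2PA, 3PA) may be
# season totals while PpG, RpG, etc. are per-game averages; if so, the attempt
# columns should be converted to per-game values before this step.
df['EFF'] = (df['PpG'] + df['RpG'] + df['ApG'] + df['SpG'] + df['BpG'] -
             ((df['FGA'] - df['FGM']) + (df['FTA'] - df['FTM']) + df['TOpG']))

# Step 3: Sort players by EFF, highest first
df_sorted = df.sort_values(by='EFF', ascending=False)

# Display the sorted dataframe
df_sorted[['NAME', 'EFF']]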

| | NAME | EFF |
|---|---|---|
| 11 | Anthony Davis | 41.002 |
| 146 | Daeqwon Plowden | 12.000 |
| 149 | Moses Brown | 11.594 |
| 337 | Mitchell Robinson | 11.000 |
| 314 | Ben Simmons | 7.591 |
| ... | ... | ... |
| 28 | Trae Young | -622.136 |
| 40 | Jalen Green | -624.342 |
| 14 | Cade Cunningham | -626.171 |
| 5 | Jayson Tatum | -658.110 |


## Task 2: Find the skyline players using PTS, REB, and AST (10%)
- To find the skyline with a naive method, you may simply iterate over the data with a nested for loop:

````
Skyline:={}
for point p in Dataset:
    if p is not dominated by any point in Skyline:
        Add p to Skyline;
        Eliminate points in the Skyline that are dominated by p;
````

In [4]:
def is_dominated(p1, p2):
    """Return True if p1 is dominated by p2 (p2 is >= everywhere and > somewhere)."""
    dims = ['PpG', 'RpG', 'ApG']
    return all(p1[dim] <= p2[dim] for dim in dims) and any(p1[dim] < p2[dim] for dim in dims)

skyline = []

for _, player in df.iterrows():
    dominated = False
    survivors = []  # current skyline members that the new player does not dominate
    for s in skyline:
        if is_dominated(player, s):
            dominated = True
            break
        if not is_dominated(s, player):
            survivors.append(s)
    if not dominated:
        # Rebuild the list rather than calling list.remove(), which compares
        # pandas Series with == and can raise a ValueError
        skyline = survivors + [player]

skyline_df = pd.DataFrame(skyline)
skyline_df[['NAME', 'PpG', 'RpG', 'ApG']]

| | NAME | PpG | RpG | ApG |
|---|---|---|---|---|
| 0 | Shai Gilgeous-Alexander | 32.3 | 5.2 | 6.1 |
| 1 | Giannis Antetokounmpo | 30.9 | 12.1 | 5.9 |
| 2 | Nikola Jokic | 29.1 | 12.7 | 10.5 |
| 11 | Anthony Davis | 26.0 | 16.0 | 7.0 |
| 28 | Trae Young | 23.8 | 3.1 | 11.4 |


## Task 3: Find the skyline players using PTS, REB, AST, STL, and BLK (5%)

In [5]:
def is_dominated_5d(p1, p2):
    """Return True if p1 is dominated by p2 on all five dimensions."""
    dims = ['PpG', 'RpG', 'ApG', 'SpG', 'BpG']
    return all(p1[dim] <= p2[dim] for dim in dims) and any(p1[dim] < p2[dim] for dim in dims)

skyline_5d = []

for _, player in df.iterrows():
    dominated = False
    survivors = []  # current skyline members that the new player does not dominate
    for s in skyline_5d:
        if is_dominated_5d(player, s):
            dominated = True
            break
        if not is_dominated_5d(s, player):
            survivors.append(s)
    if not dominated:
        # Rebuild the list rather than calling list.remove(), which compares
        # pandas Series with == and can raise a ValueError
        skyline_5d = survivors + [player]

skyline_5d_df = pd.DataFrame(skyline_5d)
skyline_5d_df[['NAME', 'PpG', 'RpG', 'ApG', 'SpG', 'BpG']]

| | NAME | PpG | RpG | ApG | SpG | BpG |
|---|---|---|---|---|---|---|
| 0 | Shai Gilgeous-Alexander | 32.3 | 5.2 | 6.1 | 1.9 | 1.0 |
| 1 | Giannis Antetokounmpo | 30.9 | 12.1 | 5.9 | 0.7 | 1.3 |
| 2 | Nikola Jokic | 29.1 | 12.7 | 10.5 | 1.8 | 0.7 |
| 3 | Luka Doncic | 28.1 | 8.3 | 7.8 | 2.0 | 0.4 |
| 7 | Kevin Durant | 26.7 | 5.9 | 4.3 | 0.9 | 1.3 |
| 11 | Anthony Davis | 26.0 | 16.0 | 7.0 | 0.0 | 3.0 |
| 12 | Anthony Davis | 25.7 | 11.9 | 3.4 | 1.3 | 2.1 |
| 14 | Cade Cunningham | 25.2 | 6.2 | 9.4 | 1.0 | 0.8 |
| 19 | Karl-Anthony Towns | 24.6 | 13.3 | 3.2 | 1.0 | 0.8 |
| 20 | Victor Wembanyama | 24.3 | 11.0 | 3.7 | 1.1 | 3.8 |


## Task 4: Identify the top 5 players by domination count in five dimensions: PTS, REB, AST, STL, and BLK (10%)
- The domination count of a point is the number of other points that it dominates.
- For instance, consider 3 players described by 3-dimensional values (higher is better): [10,20,30], [5,7,8], and [4,10,11]. The domination count of [10,20,30] is 2, and the domination count of [4,10,11] is 0.
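
The example above can be checked with a few lines of NumPy. This is only a sketch on the three toy points from the text; the helper name `pairwise_domination_counts` is hypothetical and is not the function used in the cell below.

```python
import numpy as np

# The three example players from the text (higher is better on every dimension)
points = np.array([[10, 20, 30],
                   [5, 7, 8],
                   [4, 10, 11]])

def pairwise_domination_counts(values):
    """For each row, count the rows it dominates (>= everywhere, > somewhere)."""
    # Broadcasting compares every row i (axis 0) against every row j (axis 1)
    ge = (values[:, None, :] >= values[None, :, :]).all(axis=2)
    gt = (values[:, None, :] > values[None, :, :]).any(axis=2)
    return (ge & gt).sum(axis=1)

counts = pairwise_domination_counts(points)
print(counts.tolist())  # [2, 0, 0]
```

Note that [5,7,8] and [4,10,11] do not dominate each other: the first is better on the first dimension, the second on the other two.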

In [6]:
DIMS_5D = ['PpG', 'RpG', 'ApG', 'SpG', 'BpG']

def domination_count(player, df):
    """Count the players in df that `player` dominates on DIMS_5D."""
    count = 0
    for _, other in df.iterrows():
        # player dominates other: at least as good everywhere, strictly better somewhere
        if (all(player[dim] >= other[dim] for dim in DIMS_5D)
                and any(player[dim] > other[dim] for dim in DIMS_5D)):
            count += 1
    return count

df['domination_count'] = df.apply(lambda player: domination_count(player, df), axis=1)
top_5_players = df.sort_values(by='domination_count', ascending=False).head(5)
top_5_players[['NAME', 'PpG', 'RpG', 'ApG', 'SpG', 'BpG', 'domination_count']]

| | NAME | PpG | RpG | ApG | SpG | BpG | domination_count |
|---|---|---|---|---|---|---|---|
| 2 | Nikola Jokic | 29.1 | 12.7 | 10.5 | 1.8 | 0.7 | 509 |
| 49 | Scottie Barnes | 20.0 | 7.8 | 6.2 | 1.4 | 1.1 | 486 |
| 58 | Jalen Johnson | 18.9 | 10.0 | 5.0 | 1.6 | 1.0 | 483 |
| 22 | Zion Williamson | 24.3 | 7.5 | 5.0 | 1.3 | 1.0 | 475 |
| 0 | Shai Gilgeous-Alexander | 32.3 | 5.2 | 6.1 | 1.9 | 1.0 | 471 |


## Task 5: Revisiting the implementation (10%)
- Without constructing a multi-dimensional index, do you have any ideas for improving the implementations in Task 2 and Task 3? Please explain your approach and the reasoning behind it.

In [7]:
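import numpy as np

def pareto_front(df, dimensions):
    """Return the rows of df that are not dominated on `dimensions` (higher is better)."""
    # Work on a plain NumPy array so that all indexing is positional,
    # independent of the DataFrame's index labels
    values = df[dimensions].to_numpy()
    is_dominated = np.zeros(len(df), dtype=bool)
    for i in range(len(df)):
        # Skip players already known to be dominated: by transitivity, whatever
        # they dominate is also dominated by their dominator
        if not is_dominated[i]:
            # Vectorized test that marks every row dominated by player i
            is_dominated |= (values <= values[i]).all(axis=1) & (values < values[i]).any(axis=1)
    return df[~is_dominated]

# Task 2: Find the skyline players using PTS, REB, and AST
skyline_3d = pareto_front(df, ['PpG', 'RpG', 'ApG'])
skyline_3d[['NAME', 'PpG', 'RpG', 'ApG']]

# Task 3: Find the skyline players using PTS, REB, AST, STL, and BLK
skyline_5d = pareto_front(df, ['PpG', 'RpG', 'ApG', 'SpG', 'BpG'])
skyline_5d[['NAME', 'PpG', 'RpG', 'ApG', 'SpG', 'BpG']]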

| | NAME | PpG | RpG | ApG | SpG | BpG |
|---|---|---|---|---|---|---|
| 0 | Shai Gilgeous-Alexander | 32.3 | 5.2 | 6.1 | 1.9 | 1.0 |
| 1 | Giannis Antetokounmpo | 30.9 | 12.1 | 5.9 | 0.7 | 1.3 |
| 2 | Nikola Jokic | 29.1 | 12.7 | 10.5 | 1.8 | 0.7 |
| 3 | Luka Doncic | 28.1 | 8.3 | 7.8 | 2.0 | 0.4 |
| 7 | Kevin Durant | 26.7 | 5.9 | 4.3 | 0.9 | 1.3 |
| 11 | Anthony Davis | 26.0 | 16.0 | 7.0 | 0.0 | 3.0 |
| 12 | Anthony Davis | 25.7 | 11.9 | 3.4 | 1.3 | 2.1 |
| 14 | Cade Cunningham | 25.2 | 6.2 | 9.4 | 1.0 | 0.8 |
| 19 | Karl-Anthony Towns | 24.6 | 13.3 | 3.2 | 1.0 | 0.8 |
| 20 | Victor Wembanyama | 24.3 | 11.0 | 3.7 | 1.1 | 3.8 |
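
Another index-free idea for Task 5 is to presort the data before filtering. If the rows are sorted in descending lexicographic order, a row can only be dominated by a row that comes before it, so each row is compared against the current skyline only and no skyline member ever has to be removed. The sketch below illustrates this on toy data; the function name `skyline_presorted` is hypothetical, and the code assumes that higher is better on every dimension and that there are no missing values.

```python
import numpy as np

def skyline_presorted(values):
    """Return the row positions of the skyline of a 2-D array (higher is better).

    Assumes `values` contains no NaN entries.
    """
    values = np.asarray(values, dtype=float)
    # Descending lexicographic order: np.lexsort uses the LAST key as the
    # primary key, so the columns are passed in reverse order.
    # If q dominates p, q is larger at the first column where they differ,
    # hence q is placed before p.
    order = np.lexsort((-values).T[::-1])
    skyline = []  # positions (in the original array) of accepted rows
    for pos in order:
        p = values[pos]
        members = values[np.array(skyline, dtype=int)]  # shape (len(skyline), n_dims)
        dominated = ((members >= p).all(axis=1) & (members > p).any(axis=1)).any()
        if not dominated:
            skyline.append(int(pos))
    return sorted(skyline)

# Toy data (made up for illustration)
toy = np.array([[10, 20, 30],
                [5, 7, 8],
                [4, 10, 11],
                [12, 1, 1],
                [1, 25, 2]])
result = skyline_presorted(toy)
print(result)  # [0, 3, 4]
```

With a DataFrame, the returned positions could be passed to `df.iloc[...]` after extracting the relevant columns with `df[dimensions].to_numpy()`. The gain comes from comparing each row with the (usually small) skyline instead of with the whole dataset.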
