<h1>Vidhi Agrawal- Scraping NBA.com</h1>
Scrape data from https://www.nba.com/players. The goal is to get 50 player performance data from the NBA official and find out who's the top five players based on the most Points per Game (PPG).

The end result is to write a function: <i>get_players()</i> that will return a list of tuples. Each tuple should correspond to a player and should contain the following data:
<li>1. player name <b>[str]</b>
<li>2. team name (full name, e.g: Miami Heat, Toronto Raptors, etc.) <b>[str]</b>
<li>3. player position (e.g. Guard/Center/Forward) <b>[str]</b> -- When a player has multiple preferred positions, prioritize the one mentioned first. For example, if a player lists 'Center-Forward,' consider their primary position as 'Center,' and if they list 'Forward-Guard,' you should return 'Forward'."
<li>4. Points Per Game (PPG) <b>[float]</b>
<li>5. Rebounds Per Game (RPG) <b>[float]</b>
<li>6.  Assists Per Game (APG) <b>[float]</b>
<li>7. a link to the player's page <b>[str]</b>

<h3>Process</h3>
<li>Retrieve the information for players' names and links.
<li>Iterate through the list and invoke the function get_player_info(player_url) for each player.
<li>Accumulate the name, team, position, points, rebounds, assists, and link for each player in the output_list.
<li>Get top five players by sorting the (Points per Game) PPG

<b>Notes:</b>
<li>Note that you need to prioritize the position mentioned first (e.g., change "Center-Forward" to "Center").
<li>If there is no information about players' team name or position, assign a <b>None</b> value.
<li>If there is no information about the player PPG/RPG/APG, assign a <b>0</b> (zero) value.
<li>You only need to retrieve information for the first 50 players from the initial page of the NBA website.

### 1. Collecting 50 player performance data

In [7]:
import requests
import re
from bs4 import BeautifulSoup
import operator
import itertools

In [8]:
def get_players():
    # Define the URL to the NBA players page
    url = "https://www.nba.com/players"
    response = requests.get(url)
    
    #Check the Status Code
    if not response.status_code == 200:
        print('Failed')
        return None
    try:
        results_page = BeautifulSoup(response.content,'lxml')
        
        #Get the list of all players
        players = results_page.find('div',{"class":re.compile('PlayerList_playerTable__Jno0k')}).find_all('a', class_='Anchor_anchor__cSc3P RosterRow_playerLink__qw1vG')
        player_list=[]
        
        #Get required information for every playey
        for player in players:
            
                #Get player link
                player_link = player.get('href')
                
                #Get player name
                player_name = player.find('div',{"class":'RosterRow_playerName__G28lg'}).get_text(' ')
                
                #Get other player information like team, position, ppg, rpg, apg by calling the get_player_info func
                player_att=get_player_info('https://www.nba.com/'+player_link)
                playr_team, playr_styl, playr_ppg, playr_rpg, playr_apg=player_att
                
                #Append all information for each playet
                player_list.append((player_name,playr_team, playr_styl, float(playr_ppg), float(playr_rpg), float(playr_apg) ,'https://www.nba.com/'+player_link))

        return player_list
    except:
        print('Error Occured')
        return None

In [9]:
def get_player_info(player_url):
    player_dict=dict()
    
    #Check Status Code, assign default to 'Content Unavailable'
    response = requests.get(player_url)
    if not response.status_code == 200:
        return([None, None,0,0,0])
    try:
        results_page = BeautifulSoup(response.content,'lxml')
        
        #Get player team and position
        player_info=results_page.find('div',{"class":'PlayerSummary_mainInnerBio__JQkoj'}).get_text('|')
        
        #Get ppg, lpg, apg
        player_stats= results_page.find_all('div',{"class":'PlayerSummary_playerStat__rmEOP'})
        pstats=[]
        
        #Check if stats exist or not and append the 3 values accordingly
        for ps in player_stats:
            ps_v=ps.find('p',{"class":'PlayerSummary_playerStatValue___EDg_'}).get_text()
            if ps_v != '--':
                pstats.append(ps_v)
            else:
                pstats.append(0)
        #Get only the team name and position and strip extra spaces
        player_attr= player_info.split('|')
        get_player_attr = operator.itemgetter(0, 2)
        player_req_attr=get_player_attr(player_attr)
        player_req_attr= [s.strip() for s in player_req_attr]
        player_req_attr.extend(pstats[0:3])
        
        #Replace 2 positions with the 1st one for ex: Center-Forward with Center
        if '-' in player_req_attr[1]:
            player_req_attr[1]=player_req_attr[1].split('-')[0]
        return(player_req_attr)
    except:
        print('Error Occured')
        return None

In [10]:
# Run this cell to get the data
data = get_players()
data

[('Precious Achiuwa',
  'Toronto Raptors',
  'Forward',
  9.2,
  6.0,
  0.9,
  'https://www.nba.com//player/1630173/precious-achiuwa/'),
 ('Steven Adams',
  'Memphis Grizzlies',
  'Center',
  8.6,
  11.5,
  2.3,
  'https://www.nba.com//player/203500/steven-adams/'),
 ('Bam Adebayo',
  'Miami Heat',
  'Center',
  20.4,
  9.2,
  3.2,
  'https://www.nba.com//player/1628389/bam-adebayo/'),
 ('Ochai Agbaji',
  'Utah Jazz',
  'Guard',
  7.9,
  2.1,
  1.1,
  'https://www.nba.com//player/1630534/ochai-agbaji/'),
 ('Santi Aldama',
  'Memphis Grizzlies',
  'Forward',
  9.0,
  4.8,
  1.3,
  'https://www.nba.com//player/1630583/santi-aldama/'),
 ('Nickeil Alexander-Walker',
  'Minnesota Timberwolves',
  'Guard',
  6.2,
  1.7,
  1.8,
  'https://www.nba.com//player/1629638/nickeil-alexander-walker/'),
 ('Grayson Allen',
  'Phoenix Suns',
  'Guard',
  10.4,
  3.3,
  2.3,
  'https://www.nba.com//player/1628960/grayson-allen/'),
 ('Jarrett Allen',
  'Cleveland Cavaliers',
  'Center',
  14.3,
  9.8,
  1

In [None]:
# Running the above cell should return (note: the results may vary over time since the website is always updated)
"""
[('Precious Achiuwa',
  'Toronto Raptors',
  'Forward',
  9.2,
  6.0,
  0.9,
  'https://www.nba.com/player/1630173/precious-achiuwa/'),
 ('Steven Adams',
  'Memphis Grizzlies',
  'Center',
  8.6,
  11.5,
  2.3,
  'https://www.nba.com/player/203500/steven-adams/'),
 ('Bam Adebayo',
  'Miami Heat',
  'Center',
  20.4,
  9.2,
  3.2,
  'https://www.nba.com/player/1628389/bam-adebayo/'),
 ('Ochai Agbaji',
  'Utah Jazz',
  'Guard',
  7.9,
  2.1,
  1.1,
  'https://www.nba.com/player/1630534/ochai-agbaji/'),
 ('Santi Aldama',
  'Memphis Grizzlies',
  'Forward',
  9.0,
  4.8,
  1.3,
  'https://www.nba.com/player/1630583/santi-aldama/'),
 ('Nickeil Alexander-Walker',
  'Minnesota Timberwolves',
  'Guard',
  6.2,
  1.7,
  1.8,
  'https://www.nba.com/player/1629638/nickeil-alexander-walker/'),
 ('Angelo Allegri',
  None,
  None,
  0,
  0,
  0,
  'https://www.nba.com/player/1641874/angelo-allegri/'),
 ...]
"""

In [12]:
# Check the output length
len(data) #Should return 50

50

### 2. Who's getting the most points per game (PPG)? 

In [13]:
# Top 5 players with highest PPG 
# Sample output: [('Giannis Antetokounmpo', 31.1),('LaMelo Ball', 23.3), ('Bradley Beal', 23.2), ('Desmond Bane', 21.5), ('Bam Adebayo', 20.4)])
# note: the results may vary over time 

#Create a copy to perform operations
data_cpy= data.copy()

#Sort on the 3rd element which is ppg-points per game
data_cpy.sort(key=lambda a: a[3], reverse=True)
top_5=[]

#Get the name and ppg for the top 5 players by ppg
for ele in data_cpy[0:5]:
    top_5.append((ele[0], ele[3]))
top_5

[('Giannis Antetokounmpo', 31.1),
 ('LaMelo Ball', 23.3),
 ('Bradley Beal', 23.2),
 ('Desmond Bane', 21.5),
 ('Bam Adebayo', 20.4)]

<h3>Hint: How to sort tuples by an arbitrary element? How to get selected element in tuples?</h3>

In [14]:
x = [('a',23.2,'b'),('c',17.4,'f'),('d',29.2,'z'),('e',1.74,'bb')]
#Sort by the first element of the tuple

x.sort(key=lambda a: a[0])
x

[('a', 23.2, 'b'), ('c', 17.4, 'f'), ('d', 29.2, 'z'), ('e', 1.74, 'bb')]

In [15]:
x = [('a',23.2,'b'),('c',17.4,'f'),('d',29.2,'z'),('e',1.74,'bb')]
#Sort by the element at position 1

x.sort(key=lambda a: a[1])
x

[('e', 1.74, 'bb'), ('c', 17.4, 'f'), ('a', 23.2, 'b'), ('d', 29.2, 'z')]

In [16]:
x = [('a',23.2,'b'),('c',17.4,'f'),('d',29.2,'z'),('e',1.74,'bb')]

[(sub[0],sub[1]) for sub in x]

[('a', 23.2), ('c', 17.4), ('d', 29.2), ('e', 1.74)]