<h1>Scraping NBA.com</h1>
In this assignment, you will scrape data from https://www.nba.com/players. The goal is to collect performance data for 50 players from the official NBA website and find the top five players by Points per Game (PPG).

The end result is a function, <i>get_players()</i>, that returns a list of tuples. Each tuple should correspond to a player and should contain the following data:
<li>1. player name <b>[str]</b>
<li>2. team name (full name, e.g: Miami Heat, Toronto Raptors, etc.) <b>[str]</b>
<li>3. player position (e.g. Guard/Center/Forward) <b>[str]</b> -- When a player lists multiple positions, use the one mentioned first. For example, 'Center-Forward' becomes 'Center' and 'Forward-Guard' becomes 'Forward'.
<li>4. Points Per Game (PPG) <b>[float]</b>
<li>5. Rebounds Per Game (RPG) <b>[float]</b>
<li>6. Assists Per Game (APG) <b>[float]</b>
<li>7. a link to the player's page <b>[str]</b>

<h3>Process</h3>
<li>Retrieve the information for players' names and links.
<li>Iterate through the list and invoke the function get_player_info(player_url) for each player.
<li>Accumulate the name, team, position, points, rebounds, assists, and link for each player in the output_list.
<li>Get the top five players by sorting on Points per Game (PPG).
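
The accumulation step can be sketched without touching the network. This is only an illustration of the control flow: `build_player_list` and `fake_fetch_info` are hypothetical names, and the stub stands in for get_player_info(player_url) so the fallback for missing data can be seen in isolation.

```python
def build_player_list(index_entries, fetch_info, limit=50):
    """Combine (name, link) pairs with per-player details into 7-field tuples."""
    output_list = []
    for name, link in index_entries[:limit]:
        try:
            team, position, ppg, rpg, apg = fetch_info(link)
        except Exception:
            # No usable profile data: use the defaults described in the notes.
            team, position, ppg, rpg, apg = None, None, 0, 0, 0
        output_list.append((name, team, position, ppg, rpg, apg, link))
    return output_list


def fake_fetch_info(link):
    """Stub for get_player_info: fails for one player to exercise the fallback."""
    if "angelo-allegri" in link:
        raise LookupError("no profile data")
    return ["Memphis Grizzlies", "Center", 8.6, 11.5, 2.3]


entries = [
    ("Steven Adams", "https://www.nba.com/player/203500/steven-adams/"),
    ("Angelo Allegri", "https://www.nba.com/player/1641874/angelo-allegri/"),
]
for player in build_player_list(entries, fake_fetch_info):
    print(player)
```

Passing the fetch function as an argument is a design choice made here for testability; the assignment itself only requires get_players().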

<b>Notes:</b>
<li>Note that you need to prioritize the position mentioned first (e.g., change "Center-Forward" to "Center").
<li>If there is no information about a player's team name or position, assign a <b>None</b> value.
<li>If there is no information about a player's PPG/RPG/APG, assign a <b>0</b> (zero) value.
<li>You only need to retrieve information for the first 50 players from the initial page of the NBA website.
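
The two clean-up rules in the notes can be isolated in small helpers. The sketch below is one possible way to do it: `primary_position` and `parse_stat` are hypothetical names that are not part of the assignment, and it assumes missing values show up as `None`, an empty string, or `'--'`.

```python
def primary_position(raw_position):
    """Return the first-listed position, or None when no position is given."""
    if not raw_position:
        return None
    return raw_position.split("-")[0].strip()


def parse_stat(raw_value):
    """Convert a per-game stat string to float, using 0 when it is missing."""
    try:
        return float(raw_value)
    except (TypeError, ValueError):
        # Placeholders such as '--', '' or None are treated as "no data".
        return 0


print(primary_position("Center-Forward"))  # Center
print(primary_position("Forward-Guard"))   # Forward
print(parse_stat("20.4"))                  # 20.4
print(parse_stat("--"))                    # 0
```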

### 1. Collecting 50 player performance data

In [3]:
import requests
from bs4 import BeautifulSoup
import re

In [4]:






def get_players():
    """Return (name, team, position, ppg, rpg, apg, link) tuples for the first 50 players."""
    base_url = "https://www.nba.com"
    players = []
    response = requests.get(base_url + "/players")
    soup = BeautifulSoup(response.content, "html.parser")
    player_rows = soup.find("table", class_="players-list").find("tbody").find_all("tr")
    for player_row in player_rows[:50]:
        # First and last name are separate text nodes inside this div. Joining them
        # with a space avoids splitting on capital letters, which breaks hyphenated names.
        name_div = player_row.find("div", class_="RosterRow_playerName__G28lg")
        player_name = name_div.get_text(" ", strip=True)
        player_link = base_url + player_row.find("a", class_="Anchor_anchor__cSc3P RosterRow_playerLink__qw1vG")["href"]

        try:
            team_name, position, ppg, rpg, apg = get_player_info(player_link)
        except Exception:
            # No usable profile data: fall back to the defaults required by the assignment.
            team_name, position, ppg, rpg, apg = None, None, 0, 0, 0

        # Keep only the first-listed position, e.g. "Center-Forward" -> "Center".
        position = position.split("-")[0].strip() if position else None

        players.append((player_name, team_name, position, ppg, rpg, apg, player_link))

    return players

# get_player_info() is defined in a later cell; run that cell before calling get_players().
player_list = get_players()
player_list

[('Precious Achiuwa',
  'Toronto Raptors',
  'Forward',
  9.2,
  6.0,
  0.9,
  'https://www.nba.com/player/1630173/precious-achiuwa/'),
 ('Steven Adams',
  'Memphis Grizzlies',
  'Center',
  8.6,
  11.5,
  2.3,
  'https://www.nba.com/player/203500/steven-adams/'),
 ('Bam Adebayo',
  'Miami Heat',
  'Center',
  20.4,
  9.2,
  3.2,
  'https://www.nba.com/player/1628389/bam-adebayo/'),
 ('Ochai Agbaji',
  'Utah Jazz',
  'Guard',
  7.9,
  2.1,
  1.1,
  'https://www.nba.com/player/1630534/ochai-agbaji/'),
 ('Santi Aldama',
  'Memphis Grizzlies',
  'Forward',
  9.0,
  4.8,
  1.3,
  'https://www.nba.com/player/1630583/santi-aldama/'),
 ('Nickeil Alexander-Walker',
  'Minnesota Timberwolves',
  'Guard',
  6.2,
  1.7,
  1.8,
  'https://www.nba.com/player/1629638/nickeil-alexander-walker/'),
 ('Grayson Allen',
  'Phoenix Suns',
  'Guard',
  10.4,
  3.3,
  2.3,
  'https://www.nba.com/player/1628960/grayson-allen/'),
 ('Jarrett Allen',
  'Cleveland Cavaliers',
  'Center',
  14.3,
  9.8,
 

In [5]:
def get_player_info(player_url):
    """Return [team_name, position, ppg, rpg, apg] scraped from a player's page."""
    team_name = None
    position = None

    response = requests.get(player_url)
    soup = BeautifulSoup(response.content, 'lxml')

    # Team name and position share one pipe-separated line: team first, position last.
    info = soup.find('p', {"class": re.compile('PlayerSummary_mainInnerInfo')})
    if info is not None:
        info_parts = info.get_text().split('|')
        team_name = info_parts[0].strip()
        position = info_parts[-1].strip()

    # The first three stat values are PPG, RPG and APG; '--' means no data.
    scores = soup.find_all('p', {"class": re.compile('PlayerSummary_playerStatValue')})
    stats = [0, 0, 0]
    for i, score in enumerate(scores[:3]):
        text = score.get_text().strip()
        if text != '--':
            stats[i] = float(text)
    ppg, rpg, apg = stats

    return [team_name, position, ppg, rpg, apg]

In [6]:
# Run this cell to get the data
data = get_players()
data


In [None]:
# Running the above cell should return (note: the results may vary over time since the website is always updated)
"""
[('Precious Achiuwa',
  'Toronto Raptors',
  'Forward',
  9.2,
  6.0,
  0.9,
  'https://www.nba.com/player/1630173/precious-achiuwa/'),
 ('Steven Adams',
  'Memphis Grizzlies',
  'Center',
  8.6,
  11.5,
  2.3,
  'https://www.nba.com/player/203500/steven-adams/'),
 ('Bam Adebayo',
  'Miami Heat',
  'Center',
  20.4,
  9.2,
  3.2,
  'https://www.nba.com/player/1628389/bam-adebayo/'),
 ('Ochai Agbaji',
  'Utah Jazz',
  'Guard',
  7.9,
  2.1,
  1.1,
  'https://www.nba.com/player/1630534/ochai-agbaji/'),
 ('Santi Aldama',
  'Memphis Grizzlies',
  'Forward',
  9.0,
  4.8,
  1.3,
  'https://www.nba.com/player/1630583/santi-aldama/'),
 ('Nickeil Alexander-Walker',
  'Minnesota Timberwolves',
  'Guard',
  6.2,
  1.7,
  1.8,
  'https://www.nba.com/player/1629638/nickeil-alexander-walker/'),
 ('Angelo Allegri',
  None,
  None,
  0,
  0,
  0,
  'https://www.nba.com/player/1641874/angelo-allegri/'),
 ...]
"""

In [None]:
# Check the output length
len(data) #Should return 50
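
Beyond the length, it can help to check that every tuple has the expected shape. The following is a rough sketch, not an official grader: `check_player_tuple` is a hypothetical helper, and the two sample tuples are copied from the expected output above.

```python
def check_player_tuple(player):
    """Return True if a tuple matches the (name, team, position, ppg, rpg, apg, link) layout."""
    if len(player) != 7:
        return False
    name, team, position, ppg, rpg, apg, link = player
    text_ok = isinstance(name, str) and isinstance(link, str)
    # Team and position may be None when the site has no information.
    optional_ok = all(value is None or isinstance(value, str) for value in (team, position))
    # Stats are floats, or the integer 0 when no data is available.
    stats_ok = all(isinstance(value, (int, float)) for value in (ppg, rpg, apg))
    return text_ok and optional_ok and stats_ok


sample = [
    ('Steven Adams', 'Memphis Grizzlies', 'Center', 8.6, 11.5, 2.3,
     'https://www.nba.com/player/203500/steven-adams/'),
    ('Angelo Allegri', None, None, 0, 0, 0,
     'https://www.nba.com/player/1641874/angelo-allegri/'),
]
print(all(check_player_tuple(player) for player in sample))  # True
```

With real results, the same check would be applied as `all(check_player_tuple(p) for p in data)`.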

### 2. Who's getting the most points per game (PPG)? 

In [None]:
# Sort a copy by PPG (index 3), highest first, and keep (name, PPG) for the top five.
top_five = sorted(data, key=lambda player: player[3], reverse=True)[:5]
result = [(player[0], player[3]) for player in top_five]
result
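
As an alternative to sorting the whole list, `heapq.nlargest` from the standard library selects the top entries directly. This is only a sketch on a small sample taken from the output shown earlier (name and PPG only), not the full 50-player result.

```python
import heapq

# (name, ppg) pairs taken from the sample output; a real run would use all players.
sample = [
    ('Precious Achiuwa', 9.2),
    ('Steven Adams', 8.6),
    ('Bam Adebayo', 20.4),
    ('Ochai Agbaji', 7.9),
    ('Santi Aldama', 9.0),
    ('Nickeil Alexander-Walker', 6.2),
    ('Grayson Allen', 10.4),
]

# nlargest returns the five highest-PPG entries, already in descending order.
top_five = heapq.nlargest(5, sample, key=lambda player: player[1])
print(top_five)
```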

<h3>Hint: How to sort tuples by an arbitrary element? How to get selected elements from tuples?</h3>

In [1]:
x = [('a',23.2,'b'),('c',17.4,'f'),('d',29.2,'z'),('e',1.74,'bb')]
#Sort by the first element of the tuple

x.sort(key=lambda a: a[0])
x

[('a', 23.2, 'b'), ('c', 17.4, 'f'), ('d', 29.2, 'z'), ('e', 1.74, 'bb')]

In [2]:
x = [('a',23.2,'b'),('c',17.4,'f'),('d',29.2,'z'),('e',1.74,'bb')]
#Sort by the element at position 1

x.sort(key=lambda a: a[1])
x

[('e', 1.74, 'bb'), ('c', 17.4, 'f'), ('a', 23.2, 'b'), ('d', 29.2, 'z')]

In [3]:
x = [('a',23.2,'b'),('c',17.4,'f'),('d',29.2,'z'),('e',1.74,'bb')]

[(sub[0],sub[1]) for sub in x]

[('a', 23.2), ('c', 17.4), ('d', 29.2), ('e', 1.74)]
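
The same sort and selection can also be written with `operator.itemgetter` instead of a lambda. This is an optional variation on the hint, using the same list `x`.

```python
from operator import itemgetter

x = [('a', 23.2, 'b'), ('c', 17.4, 'f'), ('d', 29.2, 'z'), ('e', 1.74, 'bb')]

# sorted() returns a new list and leaves x unchanged.
by_value_desc = sorted(x, key=itemgetter(1), reverse=True)
print(by_value_desc)

# itemgetter with several indexes returns a tuple of the selected elements.
pairs = [itemgetter(0, 1)(item) for item in by_value_desc[:2]]
print(pairs)
```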