# Web Scraping Practice

## Step-by-Step with Google Chrome
1. Import the following packages
    - **import scrapy**
    - **from scrapy import Selector**
    - **import requests**
2. Visit website [HERE](https://en.wikipedia.org/wiki/2024_United_States_men%27s_Olympic_basketball_team#:~:text=The%20men's%20national%20basketball%20team,gold%20medal%20for%20the%20Americans.)
    - Right-click the page and select "Inspect"; the page's HTML will appear
    - To see the HTML of a specific object on the page, hover over the object, right-click, and select "Inspect"; the object's HTML element will be highlighted
    - To copy a path to that element, right-click the highlighted HTML and hover over **Copy**
        - Choose **Copy XPath**
3. Make a GET request to the page URL
    - **url = 'https://.....'**
    - **html = requests.get(url).content** --> returns the page's HTML as bytes
4. Create a Selector object, sel, to interact with the HTML pulled from the GET request
    - **sel = Selector(text = html)**
5. Paste the XPath copied from the site into the selector object's xpath method and edit as necessary
    - **sel.xpath('....')**

In [1]:
# importing packages
import scrapy
from scrapy import Selector
import requests as r
import pandas as pd
import string  # not aliased as "str", which would shadow the built-in str type
from datetime import datetime
from datetime import timedelta

In [2]:
# fetching the page HTML for the web scraper via a GET request
url = "https://en.wikipedia.org/wiki/2024_United_States_men%27s_Olympic_basketball_team#:~:text=The%20men's%20national%20basketball%20team,gold%20medal%20for%20the%20Americans."
html = r.get(url).content #--> returns HTML code
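
The URL above ends with a `#:~:text=...` fragment, which browsers use to highlight text on the page. Fragments are handled on the client side and are not part of the request sent to the server, so the same HTML comes back without it. A small sketch using the standard library's `urllib.parse.urldefrag` to trim it (the names `base_url` and `fragment` are chosen for this example):

```python
from urllib.parse import urldefrag

url = "https://en.wikipedia.org/wiki/2024_United_States_men%27s_Olympic_basketball_team#:~:text=The%20men's%20national%20basketball%20team,gold%20medal%20for%20the%20Americans."

# Split the URL into the part the server sees and the client-side fragment
base_url, fragment = urldefrag(url)

print(base_url)
```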

In [3]:
# selector object
sel = Selector(text = html)

In [4]:
# scraping the description of the United States' record before the 2024 Olympics
print(sel.xpath('//*[@id="mw-content-text"]/div[1]/p[10]/text()[1]').extract()[0]) # --> selecting data from web page

The United States was 5–0 in exhibition games but did not look unbeatable.


In [5]:
# using xpath to extract the positions of the USA Olympic Men's Basketball Roster
positions = sel.xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[3]/td[1]/table/tbody/tr/td/span/a/text()').extract()

In [6]:
# using xpath to extract the jersey numbers of the USA Olympic Men's Basketball Roster
jersey_num = sel.xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[3]/td[1]/table/tbody/tr/td[2]/text()').extract()

In [7]:
# using xpath to extract the player names of the USA Olympic Men's Basketball Roster
players = sel.xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[3]/td[1]/table/tbody/tr/td[3]/a/text()').extract()

In [8]:
# using xpath to extract the birth dates of the USA Olympic Men's Basketball Roster
birth_dates = sel.xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[3]/td[1]/table/tbody/tr/td[4]/span/span/text()').extract()

In [9]:
# using xpath to extract the height (m) of the USA Olympic Men's Basketball Roster
height_m = sel.xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[3]/td[1]/table/tbody/tr/td[5]/text()').extract()

In [10]:
# using xpath to extract the NBA Clubs/Teams of the USA Olympic Men's Basketball Roster
nba_team = sel.xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[3]/td[1]/table/tbody/tr/td[6]/a/text()').extract()

In [11]:
# creating a dataframe from the scraped data
df = pd.DataFrame({'Player Positions':positions,
             'Jersey Numbers':jersey_num,
             'Players':players,
             'Birth Date':birth_dates,
             'Height(M)':height_m,
             'Club':nba_team})
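
`pd.DataFrame` raises a `ValueError` when the lists in the dictionary have different lengths, which can happen if one XPath matches fewer cells than the others. A small pre-check, sketched with a hypothetical helper `column_lengths` and made-up sample lists, makes the short column easy to spot.

```python
def column_lengths(columns):
    """Return a {column name: number of values} mapping for a dict of lists."""
    return {name: len(values) for name, values in columns.items()}

# Made-up sample data: 'Club' is one value short
scraped = {
    'Players': ['Stephen Curry', 'Anthony Edwards'],
    'Jersey Numbers': ['4', '5'],
    'Club': ['Golden State Warriors'],
}

lengths = column_lengths(scraped)
mismatched = len(set(lengths.values())) > 1

print(lengths, mismatched)
```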

In [12]:
# data cleaning
df['Jersey Numbers'] = df['Jersey Numbers'].astype('int64')
df['Height(M)'] = df['Height(M)'].astype('string')
df['Birth Date'] = pd.to_datetime(df['Birth Date'])

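now = datetime.now() #--> current timestamp, used as the reference point for the age calculation
year = timedelta(days = 365.25) #--> average length of one year, including leap years
df['Age'] = round((now - df['Birth Date'])/year) #--> age rounded to the nearest whole year (rounds up within six months of the next birthday)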
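
Rounding the elapsed time to the nearest year reports a player as a year older once they are more than six months past their last birthday. A common alternative is to count completed birthdays. The sketch below uses only the standard library; `age_in_years` is a hypothetical helper, and the reference date is an assumption fixed for the example so the result is reproducible.

```python
from datetime import date

def age_in_years(birth, today):
    """Return the number of completed years between birth and today."""
    # Subtract one if this year's birthday has not happened yet
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - before_birthday

reference = date(2024, 8, 10)  # assumed reference date for the example

print(age_in_years(date(1984, 12, 30), reference))  # 39
print(age_in_years(date(1988, 3, 14), reference))   # 36
```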

In [13]:
#displaying df
df

| | Player Positions | Jersey Numbers | Players | Birth Date | Height(M) | Club | Age |
|---|---|---|---|---|---|---|---|
| 0 | PG | 4 | Stephen Curry | 1988-03-14 | 1.91 m (6 ft 3 in) | Golden State Warriors | 36.0 |
| 1 | SG | 5 | Anthony Edwards | 2001-08-05 | 1.93 m (6 ft 4 in) | Minnesota Timberwolves | 23.0 |
| 2 | F | 6 | LeBron James | 1984-12-30 | 2.03 m (6 ft 8 in) | Los Angeles Lakers | 40.0 |
| 3 | F | 7 | Kevin Durant | 1988-09-29 | 2.11 m (6 ft 11 in) | Phoenix Suns | 36.0 |
| 4 | G | 8 | Derrick White | 1994-07-02 | 1.96 m (6 ft 5 in) | Boston Celtics | 30.0 |
| 5 | PG | 9 | Tyrese Haliburton | 2000-02-29 | 1.96 m (6 ft 5 in) | Indiana Pacers | 24.0 |
| 6 | F | 10 | Jayson Tatum | 1998-03-03 | 2.03 m (6 ft 8 in) | Boston Celtics | 26.0 |
| 7 | C | 11 | Joel Embiid | 1994-03-16 | 2.13 m (7 ft 0 in) | Philadelphia 76ers | 30.0 |
| 8 | G | 12 | Jrue Holiday | 1990-06-12 | 1.93 m (6 ft 4 in) | Boston Celtics | 34.0 |
| 9 | F/C | 13 | Bam Adebayo | 1997-07-18 | 2.06 m (6 ft 9 in) | Miami Heat | 27.0 |
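
The `Height(M)` column holds strings such as `1.91 m (6 ft 3 in)`. If a numeric height is needed, the metre value can be pulled out with a regular expression. This is a sketch with a hypothetical helper `height_in_metres`, assuming every value starts with a decimal number followed by `m`.

```python
import re

def height_in_metres(text):
    """Extract the leading metre value from a string like '1.91 m (6 ft 3 in)'."""
    # \s also matches non-breaking spaces, which scraped pages often contain
    match = re.match(r'\s*(\d+(?:\.\d+)?)\s*m\b', text)
    return float(match.group(1)) if match else None

print(height_in_metres('1.91 m (6 ft 3 in)'))  # 1.91
print(height_in_metres('unknown'))             # None
```

With pandas, the helper could be applied to the column with `df['Height(M)'].map(height_in_metres)`.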
