# Homework Assignment Instructions:

**Title: Web Scraping for SF 49ers Player Biography**

**Objective:** Use the web scraping techniques discussed in class to collect biographical information for each San Francisco 49ers player who will take part in this year's Super Bowl game against the Kansas City Chiefs.

**Instructions:**

1. Start scraping from the roster page on the official San Francisco 49ers team website: https://www.49ers.com/team/players-roster/. This page lists the players on the roster for the current season.

2. Apply the approach from our class example (the Keeping Tabs on Congress example) to navigate from the roster page to each player's personal page, then extract the bio information from each of those pages.

**Information to Collect:** Make sure your scraper collects the details we highlight for each player in this file:
https://drive.google.com/file/d/1UOI_P0ofT-UosaurobXY-fiax5OCNWvI/view?usp=sharing
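
Step 2 depends on turning each link found on the roster page into a full player-page URL. The sketch below shows one way to do that with the standard library's `urljoin`. The helper name `build_player_urls` and the sample `href` values are hypothetical and were not copied from the live page.

```python
from urllib.parse import urljoin

ROSTER_URL = "https://www.49ers.com/team/players-roster/"


def build_player_urls(hrefs, base_url=ROSTER_URL):
    """Resolve each href against the roster URL and drop duplicates, keeping order."""
    seen = set()
    urls = []
    for href in hrefs:
        full_url = urljoin(base_url, href)
        if full_url not in seen:
            seen.add(full_url)
            urls.append(full_url)
    return urls


# Hypothetical hrefs, in the two forms an <a> tag might use.
sample_hrefs = [
    "/team/players-roster/jason-verrett/",  # site-absolute path
    "jason-verrett/",                       # relative path, resolves to the same page
]
print(build_player_urls(sample_hrefs))
```

Resolving links with `urljoin` handles both absolute and relative `href` values, so the scraper does not have to assume one particular link format.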

**Example Output:**

**Can be a text printout:**

Name:
Jason Verrett
Bio:



Verrett (5-10, 188) originally signed with the 49ers as a free agent on March 14, 2019. Over the previous five seasons (2019-23), he has appeared in 15 games (14 starts) and registered 65 tackles, seven passes defensed and two interceptions. In 2023, Verrett signed to the Houston Texans practice squad on October 10 and was released by the team on November 15.


Verrett was originally drafted by the San Diego Chargers in the first round (25th overall) of the 2014 NFL Draft. Throughout his six-year career with the Chargers (2014-18), he appeared in 25 games (22 starts) and registered 80 tackles, 19 passes defensed and five interceptions (one returned for a touchdown). In 2015, Verrett was selected to the Pro Bowl after appearing in 14 games (13 starts) and finishing with 47 tackles, to go along with career-highs in passes defensed (11), interceptions (three) and the first interception returned for a touchdown in his career.


A 30-year-old native of Fairfield, CA, Verrett attended Texas Christian University for three years (2011-13) after spending the 2010 season at Santa Rosa (CA) Junior College. With the Horned Frogs, he appeared in 37 games (34 starts) and registered 160 tackles, 35 passes defensed and nine interceptions. As a senior in 2013, he was named Big 12 Co-Defensive Player of the Year and earned First-Team All-Big 12 and Second-Team All-America honors.

....

**Can be a list:**

[{'Name': 'Jason Verrett', 'Bio': ['\n\n\nVerrett (5-10, 188) originally signed with the 49ers as a free agent on March 14, 2019. Over the previous five seasons (2019-23), he has appeared in 15 games (14 starts) and registered 65 tackles, seven passes defensed and two interceptions. In 2023, Verrett signed to the Houston Texans practice squad on October 10 and was released by the team on November 15.\n\n\nVerrett was originally drafted by the San Diego Chargers in the first round (25th overall) of the 2014 NFL Draft. Throughout his six-year career with the Chargers (2014-18), he appeared in 25 games (22 starts) and registered 80 tackles, 19 passes defensed and five interceptions (one returned for a touchdown). In 2015, Verrett was selected to the Pro Bowl after appearing in 14 games (13 starts) and finishing with 47 tackles, to go along with career-highs in passes defensed (11), interceptions (three) and the first interception returned for a touchdown in his career.\n\n\nA 30-year-old native of Fairfield, CA, Verrett attended Texas Christian University for three years (2011-13) after spending the 2010 season at Santa Rosa (CA) Junior College. With the Horned Frogs, he appeared in 37 games (34 starts) and registered 160 tackles, 35 passes defensed and nine interceptions. As a senior in 2013, he was named Big 12 Co-Defensive Player of the Year and earned First-Team All-Big 12 and Second-Team All-America honors.\n\n\n']}, ...]
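
The raw `Bio` value in the list above keeps the page's blank lines (`\n\n\n`). If cleaner output is wanted, a minimal sketch like the following splits it into paragraphs. The helper name `bio_paragraphs` is hypothetical, and the record uses shortened placeholder text rather than the real bio.

```python
def bio_paragraphs(bio):
    """Split a scraped bio (a string or a list of strings) into clean paragraphs."""
    text = "\n".join(bio) if isinstance(bio, list) else bio
    # Drop the empty lines left over from the page layout.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Placeholder record in the same shape as the list output above.
record = {
    "Name": "Jason Verrett",
    "Bio": ["\n\n\nFirst paragraph.\n\n\nSecond paragraph.\n\n\n"],
}

print("Name:")
print(record["Name"])
print("Bio:")
for paragraph in bio_paragraphs(record["Bio"]):
    print(paragraph)
```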

**Can be a file output**

Like an Excel File, Txt file, or CSV file.
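
For the CSV option, a minimal sketch using the standard library's `csv.DictWriter` might look like this. The function name `write_players_csv`, the file name `players_example.csv`, and the placeholder record are assumptions for illustration only.

```python
import csv


def write_players_csv(players, path):
    """Write one row per player, with 'Name' and 'Bio' columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Bio"])
        writer.writeheader()
        for player in players:
            bio = player["Bio"]
            # A bio scraped as a list of strings is flattened into one cell.
            if isinstance(bio, list):
                bio = "\n".join(part.strip() for part in bio)
            writer.writerow({"Name": player["Name"], "Bio": bio})


# Placeholder data in the same shape as the list output above.
players = [{"Name": "Jason Verrett", "Bio": ["\n\n\nPlaceholder bio text.\n\n\n"]}]
write_players_csv(players, "players_example.csv")
```

Opening the file with `newline=""` lets the `csv` module control line endings itself, which avoids blank rows on Windows.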






In [1]:
# !pip install requests beautifulsoup4

In [2]:
import requests
from bs4 import BeautifulSoup
import csv

def scrape_player_info(url):
    """Return {'Name': ..., 'Biography': ...} for one player page, or None on failure."""
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException as exc:
        print(f"Failed to fetch page {url}: {exc}")
        return None
    if response.status_code != 200:
        print(f"Failed to fetch page {url}")
        return None

    soup = BeautifulSoup(response.content, 'html.parser')

    # Collect bio fragments here and join them once at the end.
    bio_parts = []

    # The expanded bio, when present, sits in the second centered column.
    # Pages with fewer than two such columns have no expanded bio.
    columns = soup.find_all('div', class_='d3-l-col__col-10-centered')
    if len(columns) > 1:
        for bio_item in columns[1].find_all('li'):
            bio_parts.append(bio_item.text.strip())

    # Edge case: the bio is not in the expanded column, so fall back to the
    # first paragraph of each text body part.
    if not any(bio_parts):
        for div in soup.find_all('div', class_='nfl-c-body-part nfl-c-body-part--text'):
            paragraph = div.find('p')
            if paragraph:
                bio_parts.append(paragraph.text.strip())

    # Without a name there is nothing useful to write for this player.
    name_tag = soup.find('h1', class_='d3-o-media-object__title')
    if name_tag is None:
        print(f"No player name found on {url}")
        return None

    # Join fragments with a space so adjacent sentences do not run together.
    return {
        'Name': name_tag.text.strip(),
        'Biography': ' '.join(part for part in bio_parts if part),
    }

def main():
    roster_url = 'https://www.49ers.com/team/players-roster/'
    response = requests.get(roster_url, timeout=10)
    if response.status_code != 200:
        print("Failed to fetch roster page")
        return

    soup = BeautifulSoup(response.content, 'html.parser')
    if soup.tbody is None:
        print("No roster table found on roster page")
        return

    # The first table body on the page holds the active players.
    active_players = soup.tbody.find_all(class_='nfl-o-roster__player-name')

    # Keep the last path segment of each player link (the player's URL name).
    player_links = []
    for player_cell in active_players:
        # The matched element may be the link itself or may contain it.
        player_name_tag = player_cell if player_cell.name == 'a' else player_cell.find('a')
        if player_name_tag and player_name_tag.get('href'):
            url_name = player_name_tag['href'].rstrip('/').rsplit('/', 1)[-1]
            player_links.append(url_name)

    with open('49ers_players.csv', 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['Name', 'Biography']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        # Visit each player's page and write one row per successful scrape.
        for player_link in player_links:
            player_url = roster_url + player_link
            player_info = scrape_player_info(player_url)
            if player_info:
                writer.writerow(player_info)

In [3]:
main()

indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
indexerr
