**Penn State Football Recruit Map (2014-2024)**

The notebook below is a simplified example of how to scrape several years of data and plot it on a map solely using python. It collects data from a college football recruiting website for the years 2014 through 2023 plotting each hometown. The output can be stylized just as you would any other webmap however since that wasn't the focal point here it's kept generic only changeing the baselayer and circle color.

Make sure the required libraries are installed

In [67]:
!pip install --upgrade geopandas
!pip install --upgrade folium
!pip install contextily
!pip install matplotlib
!pip install mapclassify

Collecting mapclassify
  Downloading mapclassify-2.5.0-py3-none-any.whl (39 kB)
Collecting networkx
  Downloading networkx-3.0-py3-none-any.whl (2.0 MB)
     ---------------------------------------- 2.0/2.0 MB 7.2 MB/s eta 0:00:00
Collecting scipy>=1.0
  Downloading scipy-1.10.0-cp310-cp310-win_amd64.whl (42.5 MB)
     --------------------------------------- 42.5/42.5 MB 12.8 MB/s eta 0:00:00
Collecting scikit-learn
  Downloading scikit_learn-1.2.1-cp310-cp310-win_amd64.whl (8.3 MB)
     ---------------------------------------- 8.3/8.3 MB 19.7 MB/s eta 0:00:00
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Installing collected packages: threadpoolctl, scipy, networkx, scikit-learn, mapclassify
Successfully installed mapclassify-2.5.0 networkx-3.0 scikit-learn-1.2.1 scipy-1.10.0 threadpoolctl-3.1.0


Import the modules that will be utilized

In [3]:
import requests
import geopandas as gpd
from bs4 import BeautifulSoup

Loop through the years you'd like to collect the data for while scraping it along the way. Requests will grab all available data and Beautiful Soup will target specific sections that'll be parsed and added to a dictionary

In [53]:
INITIAL_CLASS = 2014
LAST_CLASS = 2024

uuid = 0
recruits = dict()

for yr in range(INITIAL_CLASS,LAST_CLASS):
    url = f'https://www.on3.com/college/penn-state-nittany-lions/football/{str(yr)}/commits/'
    response = requests.get(url)

    soup = BeautifulSoup(response.content, 'html.parser')

    ind_recruits = soup.find_all("div",{"class":"CommitListItem_playerWrapper__56t1h"})

    for r in range(len(ind_recruits)-1):
        # initialize dictionary for individal athletes
        keys = ['name','hometown','ranking','class_year']
        rinfo = {key: [] for key in keys}

        # generate unique id - could also use a generator function
        uuid += 1

        # don't include players who transferred out
        if "Transferred" in str(ind_recruits[r]):
            break

        # parse the soup of individual players
        name = str(ind_recruits[r])[str(ind_recruits[r]).find('alt="')+5:str(ind_recruits[r]).find(' Avatar')]
        hometown = str(ind_recruits[r])[str(ind_recruits[r]).find('(')+1:str(ind_recruits[r]).find(')')]

        # transfers don't have a ranking causing a value error as you can't cast none to float
        try:
            star = ind_recruits[r].find('span', {'class':'StarRating_overallRating__MTh52 StarRating_bolded__kr_6V StarRating_border__DffWl'})
            rank = str(star)[str(star).find('>')+1:str(star).rfind('<')]
            ranking = float(rank)
        except ValueError:
            ranking = None

        # assign values to dictionary keys - can be rolled up into the above section
        rinfo['name'] = name
        rinfo['hometown'] = hometown.replace('<!-- -->','')
        rinfo['ranking'] = ranking
        rinfo['class_year'] = yr

        # compile attributes into a dictionary with a unique id
        recruits[f'{uuid}'] = rinfo


Convert the dictionary of dictionaries to a dataframe. It'll come in with the indexes flipped so transpose it (.T) in order to get the uuid as the first column index

In [57]:
gdf = gpd.GeoDataFrame(recruits).T

In [58]:
gdf

Unnamed: 0,name,hometown,ranking,class_year
1,DeAndre Thompkins,"Swansboro, NC",92.72,2014
2,Chris Godwin,"Middletown, DE",92.52,2014
3,Saeed Blacknall,"Englishtown, NJ",92.43,2014
4,Mike Gesicki,"Manahawkin, NJ",91.44,2014
5,Michael O'Connor,"Bradenton, FL",91.4,2014
...,...,...,...,...
205,Alex Bacchetta,"Atlanta, GA",79.52,2022
206,Ken Talley,"Philadelphia, PA",89.64,2022
207,Chop Robinson,"Gaithersburg, MD",,2022
208,Mitchell Tinsley,"Lees Summit, MO",,2022


Geocode those results

In [59]:
geocodes = gpd.tools.geocode(gdf['hometown'],provider='nominatim',user_agent='cmaps_ex',timeout=1000)

Attach the spatial data to your GeoDataFrame

In [62]:
gdf = gdf.join(geocodes)

Check out the DataFrame again and notice that you now have spatial data associated with each record

In [65]:
gdf = gpd.GeoDataFrame(gdf)
gdf

Unnamed: 0,name,hometown,ranking,class_year,geometry,address
1,DeAndre Thompkins,"Swansboro, NC",92.72,2014,POINT (-77.12322 34.68967),"Swansboro, Onslow County, North Carolina, Unit..."
2,Chris Godwin,"Middletown, DE",92.52,2014,POINT (-71.29144 41.54566),"Middletown, Newport County, Rhode Island, 0284..."
3,Saeed Blacknall,"Englishtown, NJ",92.43,2014,POINT (-74.35820 40.29733),"Englishtown, Monmouth County, New Jersey, Unit..."
4,Mike Gesicki,"Manahawkin, NJ",91.44,2014,POINT (-74.24881 39.69070),"Manahawkin, Stafford Township, Ocean County, N..."
5,Michael O'Connor,"Bradenton, FL",91.4,2014,POINT (-82.57482 27.49893),"Bradenton, Manatee County, Florida, United States"
...,...,...,...,...,...,...
205,Alex Bacchetta,"Atlanta, GA",79.52,2022,POINT (-84.39026 33.74899),"Atlanta, Fulton County, Georgia, United States"
206,Ken Talley,"Philadelphia, PA",89.64,2022,POINT (-75.16353 39.95272),"Philadelphia, Philadelphia County, Pennsylvani..."
207,Chop Robinson,"Gaithersburg, MD",,2022,POINT (-77.19292 39.13992),"Gaithersburg, Montgomery County, Maryland, Uni..."
208,Mitchell Tinsley,"Lees Summit, MO",,2022,POINT (-94.21180 38.88562),"East Lone Jack Lees Summit Road, Lone Jack, Ja..."


Utilize Folium to see the results on an interactive map directly in the notebook

In [68]:
gdf.explore(
    min_lat=25,min_lon=-125,max_lat=50,max_lon=-66.5,
    max_bounds=True,
    zoom_start=4,
    tiles = "CartoDB DarkMatterNoLabels",
    style_kwds = dict(color="cyan", #stroke color
                      weight='1', #stoke width
                         ),
    )