
Notes on future development w/ cfbd API #7

Open
cooperjaXC opened this issue Jun 21, 2023 · 8 comments
cooperjaXC commented Jun 21, 2023

This is the only place I know where I can leave a note-to-self quickly.

From ChatGPT on rewriting this package with the cfbd Python API:

Here's an example code snippet that retrieves a given week's AP Top 25 college football poll using the cfbd Python package and stores the results in a pandas dataframe:

import pandas as pd
import cfbd

# Configure the client (a free API key from collegefootballdata.com/key is assumed)
configuration = cfbd.Configuration()
configuration.api_key['Authorization'] = 'YOUR_CFBD_API_KEY'
configuration.api_key_prefix['Authorization'] = 'Bearer'

# Retrieve the rankings for a given week via the RankingsApi
api = cfbd.RankingsApi(cfbd.ApiClient(configuration))
poll_weeks = api.get_rankings(year=2023, week=2, season_type='regular')

# Create empty lists to store the data
teams = []
points = []
conferences = []

# Iterate through the AP Top 25 poll and extract the required information
# (attribute names follow the cfbd-python response models)
for poll_week in poll_weeks:
    for poll in poll_week.polls:
        if poll.poll == 'AP Top 25':
            for rank in poll.ranks:
                teams.append(rank.school)
                points.append(rank.points)
                conferences.append(rank.conference)

# Create a pandas dataframe using the collected data
df = pd.DataFrame({'Team': teams, 'Points': points, 'Conference': conferences})

# Print the dataframe
print(df)

Make sure you have the cfbd package installed (pip install cfbd) and an API key configured before running the code. The code uses the RankingsApi from cfbd to retrieve the rankings for the requested week, filters to the AP Top 25 poll, extracts each team's name, poll points, and conference, and stores them in separate lists. Finally, it creates a pandas dataframe using the collected data and prints the resulting dataframe.

Please note that the cfbd package relies on an external API and the availability of its data. Ensure you have an active internet connection to retrieve the latest rankings.

cooperjaXC added the help wanted label Jun 21, 2023
cooperjaXC self-assigned this Jun 21, 2023
cooperjaXC added the enhancement label Jul 19, 2023
@cooperjaXC

This may be a good API: https://api.collegefootballdata.com/api/docs/?url=/api-docs.json#/. One would have to request a key here though: https://collegefootballdata.com/key.

@cooperjaXC

I believe this API has a GitHub link: https://github.com/CFBD/cfb-api. They also have more scripts in their repo: https://github.com/CFBD. Python specifics of this API here: https://github.com/CFBD/cfbd-python. This may be a good starting point for coding up the v2 of AP_XC.

cooperjaXC mentioned this issue Aug 10, 2023
@cooperjaXC

cfbd python API Update

As noted in eda4b69, the cfbd python API does have historical AP poll data that is accessible back to 1939 (s/o '39 Texas A&M Aggies). It is also updated quickly: the data came through on this API 2 days after the Week 2 AP poll was released, and it may return even faster in the future.

Limitation

However, a big issue with this is that it seems to only carry the top 25 for every poll, not the "others receiving votes." The CFB XC methodology is not unworkable without these other teams, but 25 is an arbitrary cutoff; there are teams ranked from 26 down into the deep 40s, especially in early-season polls, that help flesh out finishers for 5-team races.

ESPN API Alternative

There may be another alternative available. ESPN has "hidden API endpoints," according to this gist document by @akeaswaran. The rankings API has an "others" item that may be the ticket to success. It is also updated quickly: the data came through on this API 2 days after the Week 2 AP poll was released, and it may return even faster in the future. The rankings API specifically is at http://site.api.espn.com/apis/site/v2/sports/football/college-football/rankings.
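
A quick sketch of what checking that "others" item might look like (the endpoint is from the gist; the rankings, name, ranks, and others key names are assumptions about the response shape, not documented fields):

import requests

# Fetch the current rankings payload from the hidden ESPN endpoint
resp = requests.get(
    "http://site.api.espn.com/apis/site/v2/sports/football/college-football/rankings"
)
resp.raise_for_status()

# Look for the AP poll and count ranked teams vs. "others receiving votes"
for poll in resp.json().get("rankings", []):  # assumed top-level key
    if poll.get("name") == "AP Top 25":  # assumed poll name field
        print(len(poll.get("ranks", [])), "ranked teams")
        print(len(poll.get("others", [])), "others receiving votes")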

Limitation

This seems to only cover the current season. So while it is up to date, it may be limited to non-historical seasons, which would be a real bummer.

Solution

See if the API URL can be tweaked to request a historical season. The v1 of this code was built on scraping public URLs like these with BeautifulSoup; the URL formatting is handled in the following code. ESPN formats its rankings URL like so: https://www.espn.com/college-football/rankings/_/week/1/year/2023/seasontype/2

ap-cfb-xc/PollGrabber.py

Lines 67 to 128 in bcbf8ff

def apweeklyurlgenerator(
    date_list
):  # date_list = result of dateprocessing() that's in list format (week, year)
    """ Generate a URL link for a specific week of AP Rankings. Preseason = week 1 """
    week = str(date_list[0])
    year = str(date_list[1])
    staticespn = (
        r"http://www.espn.com/college-football/rankings/_/week/6/year/2018/seasontype/2"
    )
    currentespnap = r"http://www.espn.com/college-football/rankings"
    # defaultlink = currentespnap
    oldurl1 = r"http://www.espn.com/college-football/rankings/_/week/"
    # Should be the default URL:
    aponlylinkespn2 = r"http://www.espn.com/college-football/rankings/_/poll/1/week/"
    defaultlink = aponlylinkespn2
    # finallist = ["final", "f", "complete", "total", "last"]
    #
    # currentlist = ["current", "present", "default"]
    # # Format the year correctly
    # year = str(year)
    # if len(year) != 4:
    #     if len(year) == 2 and (year[0] == "1" or year[0] == "0"):
    #         # Assume the entry was an abreviation of a year. Add the 20__ before it.
    #         year = "20" + str(year)
    #
    # # Week formatting
    # # Preseason?
    # week = str(week)
    # if week.lower() in prelist:
    #     week = "1"
    # # If the week entered is higher than 16, assume user wants final rankings.
    # try:
    #     if int(week) > 16:
    #         week = "final"
    # except:
    #     pass
    # Generate the URL
    # Is the week entered indicating the final week?
    if week.lower() == "final":  # in finallist:
        oldfinalurlexample = "http://www.espn.com/college-football/rankings/_/week/1/year/2017/seasontype/3"
        week1 = "1/year/"
        seasontype = "/seasontype/3"
        url = defaultlink + week1 + year + seasontype
    # Check for entries wanting the most up-to-date rankings
    elif week.lower() == "current":  # in currentlist:
        # just use the default link
        url = defaultlink  # default link
    # # Commented out b/c we want the user to get the results they want and not be confused by getting the current week
    # # when they wanted another week. This will error out to let them know that.
    # elif week is None:
    #     # just use the default link by passing
    #     pass
    else:
        url2 = r"/year/"
        url3 = r"/seasontype/2"
        url = defaultlink + str(week) + url2 + year + url3
    print("Week", week, ",", year, "season")
    return url
    # Should be the default URL: r"http://www.espn.com/college-football/rankings/_/poll/1/"


akeaswaran commented Sep 7, 2023

Looks like you tagged me in here along with my gist (possibly accidentally!), but I dug around very quickly on the ESPN API endpoint you pulled out from there for rankings and found this: http://sports.core.api.espn.com/v2/sports/football/leagues/college-football/seasons/2023/types/2/weeks/1/rankings/1?lang=en&region=us. You can change the seasons and weeks values appropriately to get historical data, and based on other projects I've done with the API, you can get data on bowl/playoff games with a types value of 3 and weeks value of 1.

A few notes:

  • Looks like a rankings value 1 in the URL is the AP poll -- you can manipulate this value for different polls if you want them.
  • You won't need to use BeautifulSoup for this either -- just a requests.get() to grab the JSON and the json package to parse it.
  • You'll have to make GET requests to the URLs provided in the $ref key of each record in the ranks array to get detailed team information.
  • It does seem like there is receiving-votes data in there for your needs under the others array.

Hope this helps!
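
A minimal sketch of the flow described above (the endpoint and the ranks/others/$ref layout come from this comment; the current, points, team, and displayName field names are assumptions from poking at the payload, not documented fields):

import requests

# Season, type (2 = regular, 3 = postseason), week, and poll id (1 = AP)
# are all path segments in the core API URL above
url = (
    "http://sports.core.api.espn.com/v2/sports/football/leagues/college-football"
    "/seasons/2023/types/2/weeks/1/rankings/1?lang=en&region=us"
)
ranking = requests.get(url).json()

# Ranked teams: follow each record's $ref to get detailed team information
for record in ranking.get("ranks", []):
    team = requests.get(record["team"]["$ref"]).json()  # assumed $ref location
    print(record.get("current"), team.get("displayName"), record.get("points"))

# "Others receiving votes" live under the others array, per the note above
for record in ranking.get("others", []):
    team = requests.get(record["team"]["$ref"]).json()
    print("RV", team.get("displayName"), record.get("points"))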

cooperjaXC added a commit that referenced this issue Sep 8, 2023
@cooperjaXC

@akeaswaran, thank you very much for the head start on the ESPN API! It's most helpful.

I am slowly working on a new script using it in an ESPN feature branch that incorporates your notes.

@cooperjaXC

The espn_cfb_api repo may have some good examples of how to efficiently use the ESPN API from python and retrieve conference information. The main script, espn_cfb_api.py, shows how to access conference information.

The big drawback is that it is only for the current year, so there is no direct application. This will need to serve as inspiration rather than a plug-and-play source.

@cooperjaXC

Explore using pydantic for API calls. It can help you with request and response validation, data serialization, and making your API calls more predictable and type-safe.

When you're receiving data from a website using requests.get() and you want to parse and validate the response data, Pydantic can still be a valuable tool. Although .json() is convenient for parsing JSON data, Pydantic can provide additional benefits such as data validation and type checking.

Here's how you can use Pydantic to read and validate the response from a website:

  1. Define a Pydantic Model: Create a Pydantic model that represents the structure of the expected response data.
from pydantic import BaseModel

class WebsiteResponse(BaseModel):
    status: int
    content: str

In this example, we're defining a WebsiteResponse model with two fields: status (for the HTTP status code) and content (for the response content).

  2. Make the Request and Deserialize: Use requests.get() to fetch data from the website and then deserialize the response using your Pydantic model.
import requests
from your_module import WebsiteResponse  # Import your Pydantic model

response = requests.get('https://example.com/some_endpoint')

if response.status_code == 200:
    website_data = WebsiteResponse(status=response.status_code, content=response.text)
else:
    website_data = WebsiteResponse(status=response.status_code, content='')

# Now website_data is a validated Pydantic model

Here, we're creating a website_data instance of the WebsiteResponse model and populating it with data from the response. If the response status code is not 200, we provide an empty string for the content field. You can handle error cases according to your needs.

  3. Access Validated Data: Once you have website_data, you can access its fields like any other Python object:
print(website_data.status)   # Access the status code
print(website_data.content)  # Access the content
  4. Validation and Error Handling: Pydantic will automatically validate that the data in website_data matches the structure defined in the model. If the response data doesn't match, a ValidationError will be raised, which you can catch and handle.
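
For instance, a payload that doesn't match the model surfaces as a ValidationError (a minimal sketch using the WebsiteResponse model defined above):

from pydantic import ValidationError

try:
    # status can't be parsed as an int, so validation fails here
    bad = WebsiteResponse(status='not a number', content='hello')
except ValidationError as exc:
    print(exc)  # reports which field failed and why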

cooperjaXC added a commit that referenced this issue Feb 19, 2024
@cooperjaXC

Commit 6e09e0c represents a working version that replaces BeautifulSoup HTML parsing with sourcing from the ESPN API.

cooperjaXC mentioned this issue Feb 19, 2024