
Notes on future development w/ cfbd API #7

Open
cooperjaXC opened this issue Jun 21, 2023 · 8 comments
cooperjaXC commented Jun 21, 2023

This is the only place I know where I can leave a note-to-self quickly.

From ChatGPT on rewriting this package with the cfbd Python API:

Here's an example code snippet that retrieves a given week's AP Top 25 college football poll using the cfbd Python package and stores the results in a pandas dataframe:

import pandas as pd
import cfbd

# Configure the client (a free API key from collegefootballdata.com/key is assumed)
configuration = cfbd.Configuration()
configuration.api_key['Authorization'] = 'YOUR_CFBD_API_KEY'
configuration.api_key_prefix['Authorization'] = 'Bearer'

# Retrieve the rankings for a given week via the RankingsApi
api = cfbd.RankingsApi(cfbd.ApiClient(configuration))
poll_weeks = api.get_rankings(year=2023, week=2, season_type='regular')

# Create empty lists to store the data
teams = []
points = []
conferences = []

# Iterate through the AP Top 25 poll and extract the required information
# (attribute names follow the cfbd-python response models)
for poll_week in poll_weeks:
    for poll in poll_week.polls:
        if poll.poll == 'AP Top 25':
            for rank in poll.ranks:
                teams.append(rank.school)
                points.append(rank.points)
                conferences.append(rank.conference)

# Create a pandas dataframe using the collected data
df = pd.DataFrame({'Team': teams, 'Points': points, 'Conference': conferences})

# Print the dataframe
print(df)

Make sure you have the cfbd package installed (pip install cfbd) and an API key configured before running the code. The code uses the RankingsApi from cfbd to retrieve the rankings for the requested week, filters to the AP Top 25 poll, extracts each team's name, poll points, and conference, and stores them in separate lists. Finally, it creates a pandas dataframe using the collected data and prints the resulting dataframe.

Please note that the cfbd package relies on an external API and the availability of its data. Ensure you have an active internet connection to retrieve the latest rankings.

cooperjaXC added the help wanted label Jun 21, 2023
cooperjaXC self-assigned this Jun 21, 2023
cooperjaXC added the enhancement label Jul 19, 2023
@cooperjaXC

This may be a good API: https://api.collegefootballdata.com/api/docs/?url=/api-docs.json#/. One would have to request a key here though: https://collegefootballdata.com/key.

@cooperjaXC

I believe this API has a GitHub link: https://github.com/CFBD/cfb-api. They also have more scripts in their repo: https://github.com/CFBD. Python specifics of this API here: https://github.com/CFBD/cfbd-python. This may be a good starting point for coding up the v2 of AP_XC.

cooperjaXC mentioned this issue Aug 10, 2023
@cooperjaXC

cfbd python API Update

As noted in eda4b69, the cfbd python API does have historical AP poll data that is accessible back to 1939 (s/o '39 Texas A&M Aggies). It is also updated quickly: the data came through on this API 2 days after the Week 2 AP poll was released, and it may return even faster in the future.

Limitation

However, a big issue with this is that it seems to only carry the top 25 for every poll, not the "others receiving votes." The CFB XC methodology is not unworkable without these other teams, but 25 is an arbitrary cutoff; there are teams ranked from 26 down into the deep 40s, especially in early-season polls, that help flesh out finishers for 5-team races.

ESPN API Alternative

There may be another alternative available. ESPN has "hidden API endpoints," according to this gist document by @akeaswaran. The rankings API has an "others" item that may be the ticket to success. It is also updated quickly: the data came through on this API 2 days after the Week 2 AP poll was released, and it may return even faster in the future. The rankings API specifically is at http://site.api.espn.com/apis/site/v2/sports/football/college-football/rankings.
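
A quick sketch of what checking that "others" item might look like (the endpoint is from the gist; the rankings, name, ranks, and others key names are assumptions about the response shape, not documented fields):

import requests

# Fetch the current rankings payload from the hidden ESPN endpoint
resp = requests.get(
    "http://site.api.espn.com/apis/site/v2/sports/football/college-football/rankings"
)
resp.raise_for_status()

# Look for the AP poll and count ranked teams vs. "others receiving votes"
for poll in resp.json().get("rankings", []):  # assumed top-level key
    if poll.get("name") == "AP Top 25":  # assumed poll name field
        print(len(poll.get("ranks", [])), "ranked teams")
        print(len(poll.get("others", [])), "others receiving votes")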

Limitation

This seems to only cover the current season. So while it is up to date, it may be limited to non-historical seasons, which would be a real bummer.

Solution

See if the API URL can be tweaked to request a historical season. The v1 of this code was built on scraping public URLs like these with BeautifulSoup; the URL formatting is handled in the following code. ESPN formats its rankings URL like so: https://www.espn.com/college-football/rankings/_/week/1/year/2023/seasontype/2

ap-cfb-xc/PollGrabber.py

Lines 67 to 128 in bcbf8ff

def apweeklyurlgenerator(
    date_list
):  # date_list = result of dateprocessing() that's in list format (week, year)
    """ Generate a URL link for a specific week of AP Rankings. Preseason = week 1 """
    week = str(date_list[0])
    year = str(date_list[1])
    staticespn = (
        r"http://www.espn.com/college-football/rankings/_/week/6/year/2018/seasontype/2"
    )
    currentespnap = r"http://www.espn.com/college-football/rankings"
    # defaultlink = currentespnap
    oldurl1 = r"http://www.espn.com/college-football/rankings/_/week/"
    # Should be the default URL:
    aponlylinkespn2 = r"http://www.espn.com/college-football/rankings/_/poll/1/week/"
    defaultlink = aponlylinkespn2
    # finallist = ["final", "f", "complete", "total", "last"]
    #
    # currentlist = ["current", "present", "default"]
    # # Format the year correctly
    # year = str(year)
    # if len(year) != 4:
    #     if len(year) == 2 and (year[0] == "1" or year[0] == "0"):
    #         # Assume the entry was an abreviation of a year. Add the 20__ before it.
    #         year = "20" + str(year)
    #
    # # Week formatting
    # # Preseason?
    # week = str(week)
    # if week.lower() in prelist:
    #     week = "1"
    # # If the week entered is higher than 16, assume user wants final rankings.
    # try:
    #     if int(week) > 16:
    #         week = "final"
    # except:
    #     pass
    # Generate the URL
    # Is the week entered indicating the final week?
    if week.lower() == "final":  # in finallist:
        oldfinalurlexample = "http://www.espn.com/college-football/rankings/_/week/1/year/2017/seasontype/3"
        week1 = "1/year/"
        seasontype = "/seasontype/3"
        url = defaultlink + week1 + year + seasontype
    # Check for entries wanting the most up-to-date rankings
    elif week.lower() == "current":  # in currentlist:
        # just use the default link
        url = defaultlink  # default link
    # # Commented out b/c we want the user to get the results they want and not be confused by getting the current week
    # # when they wanted another week. This will error out to let them know that.
    # elif week is None:
    #     # just use the default link by passing
    #     pass
    else:
        url2 = r"/year/"
        url3 = r"/seasontype/2"
        url = defaultlink + str(week) + url2 + year + url3
    print("Week", week, ",", year, "season")
    return url
    # Should be the default URL: r"http://www.espn.com/college-football/rankings/_/poll/1/"


akeaswaran commented Sep 7, 2023

Looks like you tagged me in here along with my gist (possibly accidentally!), but I dug around very quickly on the ESPN API endpoint you pulled out from there for rankings and found this: http://sports.core.api.espn.com/v2/sports/football/leagues/college-football/seasons/2023/types/2/weeks/1/rankings/1?lang=en&region=us. You can change the seasons and weeks values appropriately to get historical data, and based on other projects I've done with the API, you can get data on bowl/playoff games with a types value of 3 and weeks value of 1.

A few notes:

  • Looks like a rankings value 1 in the URL is the AP poll -- you can manipulate this value for different polls if you want them.
  • You won't need to use BeautifulSoup for this either -- just a requests.get() to grab the JSON and the json package to parse it.
  • You'll have to make GET requests to the URLs provided in the $ref key of each record in the ranks array to get detailed team information.
  • It does seem like there is receiving-votes data in there for your needs under the others array.

Hope this helps!
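
A minimal sketch of the flow described above (the endpoint and the ranks/others/$ref layout come from this comment; the current, points, team, and displayName field names are assumptions from poking at the payload, not documented fields):

import requests

# Season, type (2 = regular, 3 = postseason), week, and poll id (1 = AP)
# are all path segments in the core API URL above
url = (
    "http://sports.core.api.espn.com/v2/sports/football/leagues/college-football"
    "/seasons/2023/types/2/weeks/1/rankings/1?lang=en&region=us"
)
ranking = requests.get(url).json()

# Ranked teams: follow each record's $ref to get detailed team information
for record in ranking.get("ranks", []):
    team = requests.get(record["team"]["$ref"]).json()  # assumed $ref location
    print(record.get("current"), team.get("displayName"), record.get("points"))

# "Others receiving votes" live under the others array, per the note above
for record in ranking.get("others", []):
    team = requests.get(record["team"]["$ref"]).json()
    print("RV", team.get("displayName"), record.get("points"))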

cooperjaXC added a commit that referenced this issue Sep 8, 2023
@cooperjaXC

@akeaswaran, thank you very much for the head start on the ESPN API! It's most helpful.

I am slowly working on a new script using it in an ESPN feature branch that incorporates your notes.

@cooperjaXC

The espn_cfb_api repo may have some good examples of how to efficiently use the ESPN API from python and retrieve conference information. The main script, espn_cfb_api.py, shows how to access conference information.

The big drawback is that it is only for the current year, so there is no direct application. This will need to serve as inspiration rather than a plug-and-play source.

@cooperjaXC

Explore using pydantic for API calls. It can help you with request and response validation, data serialization, and making your API calls more predictable and type-safe.

When you're receiving data from a website using requests.get() and you want to parse and validate the response data, Pydantic can still be a valuable tool. Although .json() is convenient for parsing JSON data, Pydantic can provide additional benefits such as data validation and type checking.

Here's how you can use Pydantic to read and validate the response from a website:

  1. Define a Pydantic Model: Create a Pydantic model that represents the structure of the expected response data.
from pydantic import BaseModel

class WebsiteResponse(BaseModel):
    status: int
    content: str

In this example, we're defining a WebsiteResponse model with two fields: status (for the HTTP status code) and content (for the response content).

  2. Make the Request and Deserialize: Use requests.get() to fetch data from the website and then deserialize the response using your Pydantic model.
import requests
from your_module import WebsiteResponse  # Import your Pydantic model

response = requests.get('https://example.com/some_endpoint')

if response.status_code == 200:
    website_data = WebsiteResponse(status=response.status_code, content=response.text)
else:
    website_data = WebsiteResponse(status=response.status_code, content='')

# Now website_data is a validated Pydantic model

Here, we're creating a website_data instance of the WebsiteResponse model and populating it with data from the response. If the response status code is not 200, we provide an empty string for the content field. You can handle error cases according to your needs.

  3. Access Validated Data: Once you have website_data, you can access its fields like any other Python object:
print(website_data.status)   # Access the status code
print(website_data.content)  # Access the content
  4. Validation and Error Handling: Pydantic will automatically validate that the data in website_data matches the structure defined in the model. If the response data doesn't match, a ValidationError will be raised, which you can catch and handle.
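
For instance, a payload that doesn't match the model surfaces as a ValidationError (a minimal sketch using the WebsiteResponse model defined above):

from pydantic import ValidationError

try:
    # status can't be parsed as an int, so validation fails here
    bad = WebsiteResponse(status='not a number', content='hello')
except ValidationError as exc:
    print(exc)  # reports which field failed and why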

cooperjaXC added a commit that referenced this issue Feb 19, 2024
@cooperjaXC

Commit 6e09e0c represents a working version that replaces BeautifulSoup HTML parsing with sourcing from the ESPN API.

cooperjaXC mentioned this issue Feb 19, 2024