<a href="https://colab.research.google.com/github/apb9717/week1-data-feature/blob/main/IsMyGoatCracked.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Features: Is My Goat Cracked?

In this project, we created a data feature prototype to help users get information about the colleges their favorite NBA players attended.

To achieve this, we utilized two APIs:
- BallDontLie.io's basketball data API
- Department of Education's College Scorecard API

# Steps

1. User sets up necessary prerequisites and API keys.
2. User enters the name of a current or former NBA player.
3. We call the BallDontLie API to retrieve basic information on the player.
4. We call the Dept. of Education's College Scorecard API to retrieve relevant information on the player's college, if applicable.
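
Steps 3 and 4 each come down to one HTTP GET request. As a rough sketch of what those two requests look like, the snippet below builds both URLs with the standard library so that names containing spaces are encoded safely. The helper names `player_search_url` and `college_search_url` are hypothetical and only for illustration; the endpoints and query parameters are the ones used later in this notebook.

```python
from urllib.parse import urlencode

PLAYERS_URL = "https://api.balldontlie.io/v1/players"
SCHOOLS_URL = "https://api.data.gov/ed/collegescorecard/v1/schools"


def player_search_url(first_name):
    """Build the BallDontLie player search URL (hypothetical helper)."""
    return f"{PLAYERS_URL}?{urlencode({'search': first_name})}"


def college_search_url(api_key, college_name, year):
    """Build the College Scorecard query URL (hypothetical helper)."""
    fields = ",".join([
        "school.name",
        f"{year}.admissions.admission_rate.overall",
        f"{year}.admissions.sat_scores.average.overall",
        f"{year}.student.size",
        "school.main_campus",
    ])
    query = {"api_key": api_key, "school.name": college_name, "fields": fields}
    # safe="," keeps the commas in the field list readable instead of %2C
    return f"{SCHOOLS_URL}?{urlencode(query, safe=',')}"


print(player_search_url("john"))
# "YOUR_KEY" is a placeholder, not a working key
print(college_search_url("YOUR_KEY", "Notre Dame", 2010))
```

Nothing is sent over the network here; the URLs are only printed so you can inspect them before making real calls.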

# Setting Up the API Keys

We first need to set up each API key.

- For BallDontLie, you can retrieve a free API key at https://app.balldontlie.io/. Add it to the **Secrets** section on Google Colab as *BALLDONTLIE_API*.  
- For Dept. of Education, you can retrieve a free API key at https://collegescorecard.ed.gov/data/api-documentation/. Add it to the **Secrets** section on Google Colab as *COLLEGE_API*.

The first code block imports the necessary libraries and checks if the user has correctly added their API keys.


In [90]:
# setup API key handling in Colab
from google.colab import userdata
# import package necessary for API calls
import requests

# set values of variables
BALL_API_KEY = userdata.get('BALLDONTLIE_API')
COLLEGE_API_KEY = userdata.get('COLLEGE_API')

# checks for keys being correctly added
if BALL_API_KEY and COLLEGE_API_KEY:
  print("You're good to go!")
if not BALL_API_KEY:
  print("You're missing the BallDontLie API key!")
if not COLLEGE_API_KEY:
  print("You're missing the College Scorecard API key!")

You're good to go!
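
The cell above only works inside Google Colab, because `google.colab` cannot be imported anywhere else. If you want to run the notebook locally, one possible approach is sketched below, assuming you export the keys as environment variables under the same names. The `load_api_key` helper and its `env` parameter are hypothetical and not part of the original notebook.

```python
import os


def load_api_key(name, env=None):
    """Return the secret called `name`, or None if it cannot be found.

    Tries Colab's Secrets panel first, then falls back to environment
    variables. `env` is a hypothetical hook that lets callers pass a plain
    dict in place of os.environ.
    """
    env = os.environ if env is None else env
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        # with this notebook.
        pass
    return env.get(name) or None


# Example with an in-memory mapping standing in for real environment variables
print(load_api_key("BALLDONTLIE_API", env={"BALLDONTLIE_API": "example-key"}))
```

With this in place, `BALL_API_KEY = load_api_key('BALLDONTLIE_API')` would behave the same in Colab and fall back to the environment elsewhere.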


# Set up functions for BallDontLie / Player Data

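The next code block defines two helper functions: get_player_data, which looks up a player by name through the BallDontLie API, and get_player_team, which extracts the team's full name from the returned player record.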

In [91]:
def get_player_data(api_key, full_name):

  #Split name entered by user into first and last name
  name_parts = full_name.split()

  #Ensure that the user did not only enter first name
  if len(name_parts) < 2:
    print("Please enter both first and last names.")
    return None

  #Separate first and last name to use API features
  first_name_input = name_parts[0].lower()

  #Account for players who have two last names
  last_name_input = ' '.join(name_parts[1:]).lower()

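  #The API's search parameter matches players by first or last name; we search on the first name
  URL = 'https://api.balldontlie.io/v1/players'
  params = {
      'search': first_name_input,
      #Common first names match many players, so request the largest page (100) instead of the default 25
      'per_page': 100
  }
  headers = {
      'Authorization': api_key
  }

  #Get response from API; the timeout keeps the notebook from hanging on a stalled request
  response = requests.get(URL, headers=headers, params=params, timeout=10)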

  if response.status_code == 200:
    data = response.json()
    players = data.get('data', [])
    player_data = None
    for player in players:
      #Makes the first and last name lowercase to avoid capitalization errors
      first_name = player['first_name'].lower()
      last_name = player['last_name'].lower()
      if first_name == first_name_input and last_name == last_name_input:
        player_data = player
        break
    return player_data

  else:
    print(f'Error: {response.status_code}')
    return None

def get_player_team(player_data):
  team = player_data['team']
  team_name = team['full_name']
  return team_name
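
To see the name-matching logic without spending an API call, the sketch below pulls it into a pure function and runs it on hand-written records. The `split_name` and `match_player` helpers are hypothetical, and the sample records are made up for illustration; they only mimic the shape of entries in the API's `data` list.

```python
def split_name(full_name):
    """Split 'First Last ...' into (first, last), lowercased; None if too short."""
    parts = full_name.split()
    if len(parts) < 2:
        return None
    # Everything after the first word is treated as the last name
    return parts[0].lower(), " ".join(parts[1:]).lower()


def match_player(players, full_name):
    """Return the first record whose first and last names both match exactly."""
    names = split_name(full_name)
    if names is None:
        return None
    first, last = names
    for player in players:
        if player["first_name"].lower() == first and player["last_name"].lower() == last:
            return player
    return None


# Made-up records shaped like entries of the API's "data" list
sample_players = [
    {"first_name": "John", "last_name": "Doe", "team": {"full_name": "Example Team"}},
    {"first_name": "John", "last_name": "Wall", "team": {"full_name": "LA Clippers"}},
]

found = match_player(sample_players, "john wall")
print(found["team"]["full_name"])  # prints "LA Clippers"
```

Because the search endpoint is queried by first name only, several players usually come back, which is why the exact comparison on both names is needed.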


# Set up function for College Scorecard

This code block defines the get_college_data function, which retrieves a school's acceptance rate, average SAT score, and enrollment for a given year from the Dept. of Education API.

In [92]:
# function takes college name (provided by BallDontLie "college") and year (most often, draft year provided by BallDontLie)
def get_college_data(college_name, year):

  # College Scorecard data is only available from 1997 through 2022
  if year < 1997:
    print("College data only from 1997 onwards.")
    return None
  if year > 2022:
    print("College data only up to 2022.")
    return None

  # requests the relevant college data: school name, acceptance rate, SAT score, enrollment, and
  # whether the campus is classified as the "main campus"
  URL = 'https://api.data.gov/ed/collegescorecard/v1/schools'
  fields = ','.join([
      'school.name',
      f'{year}.admissions.admission_rate.overall',
      f'{year}.admissions.sat_scores.average.overall',
      f'{year}.student.size',
      'school.main_campus'
  ])
  # passing params lets requests URL-encode college names that contain spaces
  params = {'api_key': COLLEGE_API_KEY, 'school.name': college_name, 'fields': fields}
  response = requests.get(URL, params=params, timeout=10)
  if response.status_code != 200:
    print(f'Error: {response.status_code}')
    return None
  data = response.json()

  # many BallDontLie college names are ambiguous (e.g. "Kentucky"), so when several schools are
  # returned we try progressively looser matches to find the most likely institution.
  results = data.get('results', [])
  if len(results) == 1:
    # returns the only entry in cases with 1 possibility
    return results[0]

  name_forms = (f"{college_name} University", f"University of {college_name}")
  main_campuses = [school for school in results if school['school.main_campus'] == 1]

  # 1. exact matches with "University" appended at end or "University of" added to beginning
  for school in main_campuses:
    if school['school.name'] in name_forms:
      return school
  # 2. substring matches, in cases of ex. "University of Maryland College Park"
  for school in main_campuses:
    if any(form in school['school.name'] for form in name_forms):
      return school
  # 3. last resort: any school whose name contains the college name. this can return the wrong
  # institution and would need a stricter rule in a public or full release.
  for school in results:
    if college_name in school['school.name']:
      return school

  # nothing matched, or the API returned no entries
  print("College not found.")
  return None
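
The matching heuristic is easier to reason about when it is separated from the network call. The sketch below is one possible arrangement of it as a pure function, tried on a hand-written result list. The `pick_school` helper is hypothetical, and the `school.main_campus` values in the sample are made up for illustration rather than taken from the API.

```python
def pick_school(results, college_name):
    """Pick the most likely school for a short college name such as 'Kentucky'."""
    name_forms = (f"{college_name} University", f"University of {college_name}")
    main_campuses = [s for s in results if s.get("school.main_campus") == 1]

    # Pass 1: a main campus whose name is exactly one of the expanded forms
    for school in main_campuses:
        if school["school.name"] in name_forms:
            return school
    # Pass 2: a main campus whose name contains one of the expanded forms
    for school in main_campuses:
        if any(form in school["school.name"] for form in name_forms):
            return school
    # Pass 3: any record whose name contains the short name (least reliable)
    for school in results:
        if college_name in school["school.name"]:
            return school
    return None


# Illustrative records only; the main_campus flags are assumptions
sample_results = [
    {"school.name": "Kentucky State University", "school.main_campus": 1},
    {"school.name": "University of Kentucky", "school.main_campus": 1},
    {"school.name": "Kentucky Christian University", "school.main_campus": 1},
]

print(pick_school(sample_results, "Kentucky")["school.name"])  # prints "University of Kentucky"
```

Running the strictest pass over every record before falling back to a looser one means an early, loosely matching entry cannot shadow a better match further down the list.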

# Making things happen!

This code block ties the functions together: it prompts for a player's name, looks up the player, and prints a summary of the player and the college they attended.

In [94]:
#Prompt the user to enter first and last name of player of interest
full_name = input("Enter the player's full name (first and last): ")

#call function to retrieve player data from API
player_data = get_player_data(BALL_API_KEY, full_name)

#Should not parse data if player is not found
if player_data is None:
  print("Player not found.")
else:
  #Retrieve player specific stats
  player_team = get_player_team(player_data)
  player_position = player_data['position']
  college = player_data['college']
  draft_year = player_data['draft_year']
  draft_round = player_data['draft_round']
  #Use the names exactly as the API returns them; capitalize() would turn "LeBron" into "Lebron"
  first_name = player_data['first_name']
  last_name = player_data['last_name']

  # sets a new variable's value in order to make draft round print grammatically correctly.
  # note that the NBA draft has had only two rounds since 1989.
  draft_round_print = ""
  if draft_round == 1:
    draft_round_print = "1st"
  elif draft_round == 2:
    draft_round_print = "2nd"

  #Print data to user
  print(f"{first_name} {last_name} is a {player_position} for the {player_team}.")
  # checks for undrafted players
  if draft_round is None:
    print(f"He went undrafted and attended {college}.")
  else:
    print(f"He was drafted in the {draft_round_print} round of the {draft_year} NBA draft.")
    print(f"At the time he was drafted, he attended {college}.")

  # sets default value to 2022 in cases of no draft year being known
  college_year = draft_year if draft_year is not None else 2022

  # calls get_college_data function; skipped when BallDontLie lists no college
  college_data = get_college_data(college, college_year) if college else None

  if college_data is not None:
    school_name = college_data['school.name']
    # missing values come back from the Scorecard API as None; show them as "N/A"
    enrollment = college_data.get(f'{college_year}.student.size')
    admission_rate = college_data.get(f'{college_year}.admissions.admission_rate.overall')
    sat_score = college_data.get(f'{college_year}.admissions.sat_scores.average.overall')
    enrollment = "N/A" if enrollment is None else enrollment
    sat_score = "N/A" if sat_score is None else sat_score
    # the rate is a fraction between 0 and 1, so convert it to a percentage
    acceptance_rate = "N/A" if admission_rate is None else f"{round(100 * admission_rate, 2)}%"
    # prints college data
    print(f"In {college_year}, {school_name} had a {acceptance_rate} acceptance rate, an average SAT score of {sat_score}, and an enrollment of {enrollment} students.")


Enter the player's full name (first and last): john wall
John Wall is a G for the LA Clippers.
He was drafted in the 1st round of the 2010 NBA draft.
At the time he was drafted, he attended Kentucky.
In 2010, University of Kentucky had a 73.52% acceptance rate, an average SAT score of 1139, and an enrollment of 19526 students.
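
The output formatting (draft round ordinals and "N/A" for missing values) could also be factored into two small helpers. This is only a sketch; `ordinal` and `fmt` are hypothetical names that do not appear in the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens all take "th"
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def fmt(value, template="{}", missing="N/A"):
    """Format value with template, or return the placeholder when it is None."""
    return missing if value is None else template.format(value)


print(ordinal(1), ordinal(2))  # prints "1st 2nd"
print(fmt(73.52, "{}%"))       # prints "73.52%"
print(fmt(None, "{}%"))        # prints "N/A"
```

Passing the percent sign through the template keeps it out of the output when the value is missing, so the sentence never reads "N/A%".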
