## Data Analytics - Assignment 1: API Data Acquisition & Webscraping
Welcome to your first data adventure! We're kicking off the course by diving straight into how to extract real-world data from powerful APIs.

**Your Mission: Use the Google Places API and the Google Routes API to gather data on hiking trails around Columbia University!**

You will use this data to perform a simple route optimization challenge.

### Prerequisite: Obtaining and Enabling Your Google Maps Platform API Key
Before running this assignment, you must obtain a valid API key and enable the required services.

1. Get a Key:

    * Navigate to the Google Cloud Console (Google Maps Platform).

    * Create a new project or select an existing one.

    * Go to Credentials and click + Create Credentials to generate a new API key.

2. Enable Services:

    * In the APIs & Services dashboard, ensure that the following two APIs are Enabled for your project:

        * Places API ([Places API Documentation](https://developers.google.com/maps/documentation/places/web-service/op-overview))

        * Routes API ([Routes API Documentation](https://developers.google.com/maps/documentation/routes/compute-route-over))
    * Please also familiarize yourself with the API documentation linked above, as you will need it for future steps!

3. Secure Input:
    * Run the cell below and paste your key when prompted. This uses `getpass` to prevent your key from being saved into the notebook's file history.

### Step 0: Imports and Setup
This section sets up your environment, securely loads your API key, and defines the constants for the analysis.

In [None]:
import requests
import json
from typing import List, Dict, Any
from datetime import datetime, timezone
from time import sleep
from random import random
import getpass

# SECURE API KEY INPUT
# Run this cell, and a prompt will appear for your key. The input will be masked.
API_KEY = getpass.getpass("Enter your Google Maps Platform API Key: ")

# Origin Point: Columbia University - Mudd Building, NYC
ORIGIN_COORDINATES = {"latitude": 40.8084, "longitude": -73.9632}
QUERY_STRING = "hiking trails near Columbia University with a rating of 4.5 or higher"

Enter your Google Maps Platform API Key: ··········


### Step 1: Discover Trails using Places API
Complete the function below so that it uses the [Places API Text Search](https://developers.google.com/maps/documentation/places/web-service/text-search) (`places:searchText` endpoint) to discover a list of highly rated hiking trails within a 50 km radius of the defined `ORIGIN_COORDINATES`.

You must use an **HTTP POST** request and include an `X-Goog-FieldMask` header that requests only the fields needed: `places.displayName`, `places.id`, and `places.rating`.

In [None]:
def get_hiking_options(api_key: str, query: str, location: Dict[str, float]) -> List[Dict[str, Any]]:
    """
    Uses the Places API (New) Text Search to find relevant hiking options.

    The function should query based on the QUERY_STRING and return a list of
    dictionaries, where each dictionary contains: 'name', 'place_id', and 'rating'.
    """

    url = "https://places.googleapis.com/v1/places:searchText"

    # PLEASE FILL IN YOUR CODE HERE:

    # SOLUTION:

    if not api_key:
        print("API key is missing. Please enter it in the secure input cell above.")
        return []

    # Request body for the POST request
    payload = {
        "textQuery": query,
        "locationBias": {
            "circle": {
                "center": location,
                "radius": 50000.0 # 50km radius
            }
        },
        "maxResultCount": 10,
        "minRating": 4.5
    }

    # Headers including the mandatory Field Mask
    headers = {
        "Content-Type": "application/json",
        "X-Goog-Api-Key": api_key,
        "X-Goog-FieldMask": "places.displayName,places.id,places.rating"
    }

    try:
        # A timeout keeps the cell from hanging if the service does not respond
        response = requests.post(url, headers=headers, json=payload, timeout=10)
        response.raise_for_status()
        data = response.json()

        if not data.get('places'):
            print("No places found for the query.")
            return []

        hiking_options = []
        for place in data['places']:
            hiking_options.append({
                'name': place.get('displayName', {}).get('text'),
                'place_id': place.get('id'),
                'rating': place.get('rating')
            })
        return hiking_options

    except requests.exceptions.RequestException as e:
        print(f"API Request Error: {e}")
        return []

In [None]:
# SOLUTION - SIMULATED OUTPUT
DESTINATION_PLACES = [
    {'name': 'Empire State Trail Battery Park Trailhead', 'place_id': 'ChIJb0_V_lyJbwokRGe8sVkR2', 'rating': 5.0},
    {'name': 'Hiking Path under Old Mill Creek Bridge', 'place_id': 'ChIJ_aybiLZdwokRqWt6-3t1Xw', 'rating': 5.0},
    {'name': 'Pelham Bay Park', 'place_id': 'ChIJXQ7S4r3jwokRY-4u1N0nQ', 'rating': 4.6},
    {'name': 'Staten Island Greenbelt', 'place_id': 'ChIJVU4N1fTpwokR5sY77r-W0', 'rating': 4.7},
]
# Print your final list of hiking options (name, place_id, rating) and its length here.
print(f"Length of output: {len(DESTINATION_PLACES)}")
for place in DESTINATION_PLACES:
    print(place)
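
For the route step that builds on this list, a request to the Routes API `computeRoutes` endpoint could be assembled along the following lines. This is only a sketch under stated assumptions, not the assignment's reference solution: the helper names `build_route_request` and `parse_duration_seconds`, the `DRIVE` travel mode, and the chosen field mask are all illustrative choices.

```python
from typing import Any, Dict, Tuple

ROUTES_URL = "https://routes.googleapis.com/directions/v2:computeRoutes"


def build_route_request(
    api_key: str, origin: Dict[str, float], place_id: str
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Hypothetical helper: build headers and JSON body for one origin -> place route."""
    headers = {
        "Content-Type": "application/json",
        "X-Goog-Api-Key": api_key,
        # Ask only for the fields needed to compare routes
        "X-Goog-FieldMask": "routes.duration,routes.distanceMeters",
    }
    payload = {
        "origin": {"location": {"latLng": origin}},
        "destination": {"placeId": place_id},
        "travelMode": "DRIVE",  # assumption: driving directions
    }
    return headers, payload


def parse_duration_seconds(duration: str) -> float:
    """Convert a duration string such as '165s' into seconds."""
    return float(duration.rstrip("s"))


headers, payload = build_route_request(
    "YOUR_API_KEY", {"latitude": 40.8084, "longitude": -73.9632}, "SOME_PLACE_ID"
)
print(payload["destination"])
print(parse_duration_seconds("165s"))
```

The resulting pair could then be sent with `requests.post(ROUTES_URL, headers=headers, json=payload)`, mirroring the Places call in Step 1.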

## Part 2: Pure Webscraping

### Scraping NBA.com

In this part of the assignment, you will scrape data from the [New York Knicks Team Page](https://www.nba.com/team/1610612752/knicks). The goal is to collect performance data for every player on the New York Knicks and identify the top three players by Points per Game (PPG).

The end result is a function, *`get_players(max_retrieve=None)`*, that returns a list of **dicts**. Each **dict** should correspond to a player and should contain the following key-value pairs:

- **name**: Player name **[str]**
- **position**: Player position (e.g., Guard/Center/Forward) **[str]**
  - When a player has multiple preferred positions, prioritize the one mentioned first. For example:
    - If a player lists 'Center-Forward,' consider their primary position as 'Center.'
    - If a player lists 'Forward-Guard,' return 'Forward.'
- **experience**: Years of experience of the player **[int]**
  - If the player is a rookie or has 'R' listed for experience, use 0.
  - If there is no information on player experience, assign `None`.
- **PPG**: Points Per Game (PPG) **[float]**
  - If no information, assign a <b>0</b> (zero) value.
- **RPG**: Rebounds Per Game (RPG) **[float]**
  - If no information, assign a <b>0</b> (zero) value.
- **APG**: Assists Per Game (APG) **[float]**
  - If no information, assign a <b>0</b> (zero) value.
- **link**: A link to the player's page **[str]**
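
The cleaning rules above can be sketched as small helpers that operate on raw text already scraped from a player page. The function names and sample inputs are hypothetical, and how the raw strings are located in the HTML is left to your own solution.

```python
from typing import Optional


def parse_position(raw: str) -> str:
    """Keep only the first listed position, e.g. 'Center-Forward' -> 'Center'."""
    return raw.split("-")[0].strip()


def parse_experience(raw: Optional[str]) -> Optional[int]:
    """Map 'R' (rookie) to 0, missing values to None, and e.g. '6 Years' to 6."""
    if raw is None or not raw.strip():
        return None
    text = raw.strip()
    if text.upper().startswith("R"):
        return 0
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def parse_stat(raw: Optional[str]) -> float:
    """Convert a stat such as '15.5' to float; fall back to 0.0 when missing."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return 0.0


print(parse_position("Center-Forward"), parse_experience("R"), parse_stat("--"))
```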

### Additional Requirement
- The `get_players` function should **only** be used to iterate through the list of players and collect their *link* information. All other information (name, position, experience, PPG, RPG, APG) should be retrieved from each player's page using the `get_player_info` function.

- The function should accept an optional parameter `max_retrieve` **[int]**, which determines the maximum number of players to return, based on the order they appear on the website.
  - If `max_retrieve` = n, return only the top `n` players in the order they are listed on the website.
  - If `max_retrieve` is `None` (default), return all players.
  
- Throughout the assignment, you should use **each** of the following functions at least once:
  - find()
  - find_all()
  - find_all() or find() with a class that matches a regular expression pattern. For example: `find_all('something', {'class': re.compile('somePattern')})`

The goal is to scrape all player data and allow an option to limit the number of players returned based on their appearance order on the website.
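
The three lookup styles listed above can be tried on a small inline HTML snippet before pointing them at the live site. The markup and class names here (`roster`, `PlayerSummary_stat__x1`, and so on) are invented for illustration and do not reflect the actual structure of nba.com.

```python
import re
from bs4 import BeautifulSoup

# Toy markup standing in for a real page (hypothetical structure)
html = """
<table class="roster">
  <tr><td><a href="/player/1/alice-a/">Alice A</a></td></tr>
  <tr><td><a href="/player/2/bob-b/">Bob B</a></td></tr>
</table>
<p class="PlayerSummary_stat__x1">15.5</p>
<p class="PlayerSummary_stat__y2">3.7</p>
"""
soup = BeautifulSoup(html, "html.parser")

# find(): first element matching the tag and exact class
table = soup.find("table", {"class": "roster"})

# find_all(): every matching element inside the table
links = [a["href"] for a in table.find_all("a")]

# find_all() with a regex: matches classes that share a prefix but end differently
stat_tags = soup.find_all("p", {"class": re.compile("PlayerSummary_stat")})
stats = [tag.get_text() for tag in stat_tags]

print(links)
print(stats)
```

The regular expression form is useful when a site appends generated suffixes to its class names, so an exact class match would be brittle.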


### Process
- Retrieve the information for players' links.
- Iterate through the list and invoke the function `get_player_info(player_url)` for each player. **(Hint: You need to add `https://www.nba.com` in front of the player hyperlink.)**
- Accumulate the name, position, experience, points, rebounds, assists, and link for each player in `output_list`.
- Get the top three players by sorting by Points per Game (PPG).
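
Two details of the process above are easy to get wrong: building absolute URLs and honoring `max_retrieve`. A minimal sketch, assuming the relative links have already been scraped into a list (the `hrefs` values and the names `BASE_URL` and `limit_links` are placeholders):

```python
from urllib.parse import urljoin

BASE_URL = "https://www.nba.com"

# Placeholder relative links, in the order they would appear on the page
hrefs = ["/player/1/alice-a/", "/player/2/bob-b/", "/player/3/carol-c/"]


def limit_links(hrefs, max_retrieve=None):
    """Return absolute player URLs, keeping at most max_retrieve in page order."""
    # Slicing with None as the stop index keeps the whole list
    return [urljoin(BASE_URL, href) for href in hrefs[:max_retrieve]]


print(limit_links(hrefs, max_retrieve=2))
print(len(limit_links(hrefs)))
```

Because `hrefs[:None]` is the same as `hrefs[:]`, the default case needs no separate branch.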

<div class="alert alert-block alert-info">
<b>Attention:</b>
    
Please read and follow the instructions carefully to avoid point deduction.
    
You are encouraged to use class materials and online resources to help you with this assignment. However, copying code directly from Generative AI (ChatGPT, Llama, etc.) or coding websites (Stack Overflow, GitHub, etc.) is strictly forbidden. We TAs have used these tools to generate answers for this assignment, so we WILL know if you directly copy or plagiarize your code. If we suspect any dishonest conduct, we reserve the right to call you in during office hours for a code review. If you fail to explain your code, we reserve the right to give you a 0 for the assignment.

Feel free to email us or come to our office hours if you have any questions regarding this assignment.
</div>

### 1. Collecting player performance data

In [None]:
import requests
import re
from bs4 import BeautifulSoup

<div class="alert alert-block alert-info">
    <b> Attention: </b> You are not allowed to change the input parameters or the output format of this function. However, you may use helper functions if desired.
</div>

In [None]:
def get_players(max_retrieve = None):
    # Define the URL to the NBA players page
    url = "https://www.nba.com/team/1610612752/knicks"

    # Initialize an empty list to store player information
    output_list = list()

    ## YOUR CODE HERE


    return output_list


In [None]:
def get_player_info(player_url):
    ## YOUR CODE HERE
    pass  # placeholder so the cell runs before you add your code


In [None]:
# Run this cell to get the data
data = get_players()
data

In [None]:
# Running the above cell should return (note: the results may vary over time since the website is always updated)
"""
[{'name': 'Pacome Dadiet',
  'position': 'Forward',
  'experience': 0,
  'PPG': 0.0,
  'RPG': 0.0,
  'APG': 0.0,
  'link': 'https://www.nba.com/player/1642359/pacome-dadiet/'},
 {'name': 'Tyler Kolek',
  'position': 'Guard',
  'experience': 0,
  'PPG': 0.0,
  'RPG': 0.0,
  'APG': 0.0,
  'link': 'https://www.nba.com/player/1642278/tyler-kolek/'},
 {'name': 'Ariel Hukporti',
  'position': 'Center',
  'experience': 0,
  'PPG': 0.0,
  'RPG': 0.0,
  'APG': 0.0,
  'link': 'https://www.nba.com/player/1630574/ariel-hukporti/'},
 {'name': 'Kevin McCullar Jr.',
  'position': 'Guard',
  'experience': 0,
  'PPG': 0.0,
  'RPG': 0.0,
  'APG': 0.0,
  'link': 'https://www.nba.com/player/1641755/kevin-mccullar-jr/'},
 {'name': 'Donte DiVincenzo',
  'position': 'Guard',
  'experience': 6,
  'PPG': 15.5,
  'RPG': 3.7,
  'APG': 2.7,
  'link': 'https://www.nba.com/player/1628978/donte-divincenzo/'},
 {'name': 'Jacob Toppin',
  'position': 'Forward',
  'experience': 1,
  'PPG': 1.4,
  'RPG': 0.8,
  'APG': 0.3,
  'link': 'https://www.nba.com/player/1631210/jacob-toppin/'},
 ...]
"""

In [None]:
# Set max_retrieve to 6 and check the len of your output
len(get_players(max_retrieve = 6))

### 2. Who are the top 3 players by points per game (PPG)?
- Get the top 3 players with the highest PPG on the New York Knicks
    - Sample output: [('Jalen Brunson', 28.7),('Julius Randle', 24.0),('Mikal Bridges', 19.6)]
    - Note: this sample is intended to provide an idea of the structure of the output, but it should not be used as a reference for the correct answer, as information may change over time.

In [None]:
## YOUR CODE HERE

<h3>Hint: How to sort dicts by value for a specific key? How to get selected element in dicts?</h3>

In [None]:
x = [{'letter1':'a','number':23.2,'letter2':'b'},{'letter1':'c','number':17.4,'letter2':'f'},{'letter1':'d','number':29.2,'letter2':'z'},{'letter1':'e','number':1.74,'letter2':'bb'}]
# Sort by the letter1 key in the dict

x.sort(key=lambda a: a['letter1'])
x

In [None]:
x = [{'letter1':'a','number':23.2,'letter2':'b'},{'letter1':'c','number':17.4,'letter2':'f'},{'letter1':'d','number':29.2,'letter2':'z'},{'letter1':'e','number':1.74,'letter2':'bb'}]
# Sort by the number key in the dict

x.sort(key=lambda a: a['number'])
x

In [None]:
x = [{'letter1':'a','number':23.2,'letter2':'b'},{'letter1':'c','number':17.4,'letter2':'f'},{'letter1':'d','number':29.2,'letter2':'z'},{'letter1':'e','number':1.74,'letter2':'bb'}]
# Get selected elements (letter1, number) from each dict as a list of tuples

[(sub['letter1'],sub['number']) for sub in x]
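
Putting the two hints together, sorting in descending order and then slicing gives the top entries as tuples. This sketch reuses the toy list `x` from the hint cells rather than real player data, and `top_n` is a hypothetical helper name.

```python
x = [{'letter1':'a','number':23.2,'letter2':'b'},{'letter1':'c','number':17.4,'letter2':'f'},{'letter1':'d','number':29.2,'letter2':'z'},{'letter1':'e','number':1.74,'letter2':'bb'}]


def top_n(records, key, label, n=3):
    """Return (label, value) tuples for the n records with the largest value under key."""
    # sorted() leaves the input list untouched; reverse=True puts the largest first
    ranked = sorted(records, key=lambda rec: rec[key], reverse=True)
    return [(rec[label], rec[key]) for rec in ranked[:n]]


result = top_n(x, key='number', label='letter1')
print(result)
# → [('d', 29.2), ('a', 23.2), ('c', 17.4)]
```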