# 1. Data collection

### Output/Deliverables:
- For each exercise, a notebook in the folder notebooks/ must be commited
- Create one reusable (shared across exercises) python function called paginate_api_calls that will handle the pagination of the API. - It will accept two arguments (url, parameters)
- Create one reusable python function called write_json_file that will write the responses to disk
- Exercise 1: One JSON file containing all teams in the folder /data/teams. Each row in this file will represent one API response. You can find an example in sample/data/teams.json
- Exercise 2: One JSON file containing all games in the folder /data/games. Each row in this file will represent one API response. You can find an example in sample/data/games.json
- Exercise 3: One JSON file containing all players in the folder /data/players. Each row in this file will represent one API response. You can find an example in sample/data/players.json
- Pagination must be dynamic. i.e. control pagination until the end.
- Git repository must not contain any JSON/CSV files

### Key Points:
- Pagination. Each API call contains metadata that you can use to control it. I recommend to use the attribute next_page 'meta': {'total_pages': 2, 'current_page': 1, 'next_page': 2, 'per_page': 30, 'total_count': 45}
- Reducing the number of API calls to the maximum. There is a parameter to increase the number of items retrieved in each API call. By modifying it, you can reduce the total number of API Calls
- API Search/Query: Read the docs carefully

In [120]:
# INSTALL LIBRARIES
# ==============================================================================
# !pip install request
# !pip install pandas
# !pip install numpy
# !pip install json
# !pip install jupyter

In [1]:
# IMPORT LIBRARIES

# request
# ==============================================================================
import requests

# Data processing
# ==============================================================================
import pandas as pd
import numpy as np
import json

#Allows us to display more than one output per cell
# ==============================================================================
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all" 

# Display all columns
# ==============================================================================
pd.options.display.max_columns = None

`` Create one reusable (shared across exercises) python function called paginate_api_calls that will handle the pagination of the API. It will accept two arguments (url, parameters) ``

In [2]:
# The above code is creating a function that is going to be used to paginate the API calls.

def paginate_api_calls(url, params): # You must give the url and the params as a dictionary

    params.setdefault("page",1) #

 # To call the API
    response = requests.get(url, params)
    status_code = response.status_code
    reason_status = response.reason
    
    if "meta" in response.json().keys(): # The API call can contain metadata as meta or not.

        meta = response.json()["meta"]
        next_page= meta["next_page"]
        results=[]

        # This is a while loop that is going to iterate through the pages of the API.
        while next_page != None :
            
            try:
                response = requests.get(url, params, timeout=1, verify=True)
                results.append(response.json())
                next_page = response.json()["meta"]["next_page"]
            
                params["page"] += 1      
        
            except requests.exceptions.HTTPError as errh:
                print("HTTP Error")
                print(errh.args[0])
            except requests.exceptions.ReadTimeout as errrt:
                print("Time out")
            except requests.exceptions.ConnectionError as conerr:
                print("Connection error")
            except requests.exceptions.RequestException as errex:
                print("Exception request")

        return results

    # This is a condition that is going to be used if the API call does not have metadata for pagination.
    else:
       
        print("There is not page parameter")
        results =response.json()

        return results

In [3]:
# To obtein information abour the status of the API call

def check_api_call(url,params):

 # To call the API
    response = requests.get(url, params)
    status_code = response.status_code
    reason_status = response.reason
    
    meta = response.json()["meta"]
    next_page= meta["next_page"]
    results=[]

        # This is a while loop that is going to iterate through the pages of the API.
    while next_page != None :

        response = requests.get(url, params)
        status_code = response.status_code
        reason_status = response.reason
        results.append(response.json())
        next_page = response.json()["meta"]["next_page"]
        
        params["page"] += 1

    return f'There are {meta["current_page"]} pages. The status of the request is {status_code} and the reason of the state {reason_status}.'

``Create one reusable python function called write_json_file that will write the responses to disk``

In [23]:
# The above code is taking the results from the API call and writing them to a json file.
def write_json_file(results, file_name): # You must give the result of the API call and the ubi and name of the file that will be created as a string

    for i in results:

        if len(results) <= 1:
            print("There is only one page")

        else:
            call = json.dumps(i)
            with open(file_name+".json", "a") as outfile:
                outfile.write(call + "\n" )

    return "The following file has been created :" + file_name+".json"

## Exercise 1.1: Get all teams. 
- https://www.balldontlie.io/home.html#get-all-teams
- Bear in mind pagination. It is possible to pass additional arguments to the API call.

In [5]:
# url and params to the API call
TEAMS_URL = 'https://www.balldontlie.io/api/v1/teams'
teams_params = {} # It must be a dictionary

In [6]:
# Function to the API call
teams = paginate_api_calls(TEAMS_URL, teams_params)

In [7]:
check_api_call(TEAMS_URL,teams_params)

'There are 3 pages. The status of the request is 200 and the reason of the state OK.'

In [8]:
teams

[{'data': [{'id': 1,
    'abbreviation': 'ATL',
    'city': 'Atlanta',
    'conference': 'East',
    'division': 'Southeast',
    'full_name': 'Atlanta Hawks',
    'name': 'Hawks'},
   {'id': 2,
    'abbreviation': 'BOS',
    'city': 'Boston',
    'conference': 'East',
    'division': 'Atlantic',
    'full_name': 'Boston Celtics',
    'name': 'Celtics'},
   {'id': 3,
    'abbreviation': 'BKN',
    'city': 'Brooklyn',
    'conference': 'East',
    'division': 'Atlantic',
    'full_name': 'Brooklyn Nets',
    'name': 'Nets'},
   {'id': 4,
    'abbreviation': 'CHA',
    'city': 'Charlotte',
    'conference': 'East',
    'division': 'Southeast',
    'full_name': 'Charlotte Hornets',
    'name': 'Hornets'},
   {'id': 5,
    'abbreviation': 'CHI',
    'city': 'Chicago',
    'conference': 'East',
    'division': 'Central',
    'full_name': 'Chicago Bulls',
    'name': 'Bulls'},
   {'id': 6,
    'abbreviation': 'CLE',
    'city': 'Cleveland',
    'conference': 'East',
    'division': 'Central'

In [9]:
all_teams = write_json_file(teams,"../data/teams/teams")

In [10]:
all_teams

'The following file has been created :../data/teams/teams.json'

## Exercise 1.2
- It is required to obtain all games of the 1991-1992 season. No need to rename/remove fields from the response.

In [11]:
# url and params to the API call
GAMES_URL = 'https://www.balldontlie.io/api/v1/games'
games_params= {'seasons[]': 1991, "per_page":100} # It must be a dictionary

In [12]:
# Function to the API call
games = paginate_api_calls(GAMES_URL, games_params)

In [13]:
check_api_call(GAMES_URL,games_params)

'There are 13 pages. The status of the request is 200 and the reason of the state OK.'

In [14]:
games

[{'data': [{'id': 1164,
    'date': '1991-11-08T00:00:00.000Z',
    'home_team': {'id': 27,
     'abbreviation': 'SAS',
     'city': 'San Antonio',
     'conference': 'West',
     'division': 'Southwest',
     'full_name': 'San Antonio Spurs',
     'name': 'Spurs'},
    'home_team_score': 107,
    'period': 4,
    'postseason': False,
    'season': 1991,
    'status': 'Final',
    'time': ' ',
    'visitor_team': {'id': 6,
     'abbreviation': 'CLE',
     'city': 'Cleveland',
     'conference': 'East',
     'division': 'Central',
     'full_name': 'Cleveland Cavaliers',
     'name': 'Cavaliers'},
    'visitor_team_score': 101},
   {'id': 1165,
    'date': '1991-11-09T00:00:00.000Z',
    'home_team': {'id': 21,
     'abbreviation': 'OKC',
     'city': 'Oklahoma City',
     'conference': 'West',
     'division': 'Northwest',
     'full_name': 'Oklahoma City Thunder',
     'name': 'Thunder'},
    'home_team_score': 118,
    'period': 4,
    'postseason': False,
    'season': 1991,
    'st

In [15]:
all_games_1991 = write_json_file(games,"../data/games/games")

In [16]:
all_games_1991

'The following file has been created :../data/games/games.json'

## Exercise 1.3
- It is required to obtain all players. No need to rename/remove fields from the response.

In [17]:
# url and params to the API call
PLAYERS_URL = 'https://www.balldontlie.io/api/v1/players'
players_params= {"per_page":100} # It must be a dictionary

In [18]:
# Function to the API call
players = paginate_api_calls(PLAYERS_URL, players_params)

In [19]:
check_api_call(PLAYERS_URL,players_params)

'There are 53 pages. The status of the request is 200 and the reason of the state OK.'

In [20]:
players

[{'data': [{'id': 14,
    'first_name': 'Ike',
    'height_feet': None,
    'height_inches': None,
    'last_name': 'Anigbogu',
    'position': 'C',
    'team': {'id': 12,
     'abbreviation': 'IND',
     'city': 'Indiana',
     'conference': 'East',
     'division': 'Central',
     'full_name': 'Indiana Pacers',
     'name': 'Pacers'},
    'weight_pounds': None},
   {'id': 25,
    'first_name': 'Ron',
    'height_feet': None,
    'height_inches': None,
    'last_name': 'Baker',
    'position': 'G',
    'team': {'id': 20,
     'abbreviation': 'NYK',
     'city': 'New York',
     'conference': 'East',
     'division': 'Atlantic',
     'full_name': 'New York Knicks',
     'name': 'Knicks'},
    'weight_pounds': None},
   {'id': 47,
    'first_name': 'Jabari',
    'height_feet': None,
    'height_inches': None,
    'last_name': 'Bird',
    'position': 'G',
    'team': {'id': 2,
     'abbreviation': 'BOS',
     'city': 'Boston',
     'conference': 'East',
     'division': 'Atlantic',
     

In [21]:
all_players = write_json_file(players,"../data/players/players")

In [22]:
print(all_players)

The following file has been created :../data/players/players.json
