# Basketball Shot Event Identification v1
## Make/Miss or OTHER

##### In this notebook I will attempt to train a model to identify whether or not a 'Game Event' is a shot given video clips.

#### 1. Gather Clips
#### 2. Preprocess Clips into Trainable Data
#### 3. Build Dataset and Split into Test/Train
#### 4. Train the Model
#### 5. Verify Results

In [15]:
import cv2
import requests
import numpy as np
import os
from nba_api.stats.endpoints import leaguegamefinder

##### Lets take a subset of games. 
To do this we will reuse the code from small assignment 2.

In [18]:
def get_season_games(season_year='2019-20'):
    gamefinder = leaguegamefinder.LeagueGameFinder(season_nullable=season_year)
    games_df = gamefinder.get_data_frames()[0]
    return games_df['GAME_ID'].unique()

game_ids = get_season_games('2019-20')
game_ids

array(['0041900406', '0041900405', '0041900404', ..., '0011900002',
       '0011900004', '0011900001'], dtype=object)

In [21]:
subsetGMs=game_ids[525:530]

#### Retrieve Videos
During small assignment 5, I tested a solution for accessing nba clips.

In [29]:
headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'x-nba-stats-origin': 'stats',
    'x-nba-stats-token': 'true',
    'Connection': 'keep-alive',
    'Referer': 'https://stats.nba.com/',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache'
}

In [71]:
successful_video_events=[]
# video_count=0
for gid in subsetGMs:
    # if video_count:
    #     print (video_count, 'events during', gid)
    for i in range(200):
        if (i+1)%50 == 0:
            print(str((i+1)/2)+'% of', gid ,'complete')
        video_count=0
        event_id = i
        game_id = gid
        url = 'https://stats.nba.com/stats/videoeventsasset?GameEventID={}&GameID={}'.format(
            event_id, game_id)
        
        try:
            r = requests.get(url, headers=headers)
            json = r.json()
            
            # Accessing the keys safely
            video_urls = json['resultSets']['Meta'].get('videoUrls', [])
            playlist = json['resultSets'].get('playlist', [])
            
            if video_urls and playlist:  # Ensure there are items before accessing
                video_event = {'video': video_urls[0]['lurl'], 'desc': playlist[0]['dsc']}
                successful_video_events.append(video_event)
                # video_count+=1
            #     print(video_event)
            # else:
            #     print(f"No video URLs or playlist found for event ID {event_id}")
    
        except KeyError as e:
            print(f"KeyError encountered for event ID {event_id}: {e}")
        except Exception as e:
            print(f"An error occurred for event ID {event_id}: {e}")

25.0% of 0021900792 complete
50.0% of 0021900792 complete
75.0% of 0021900792 complete
100.0% of 0021900792 complete
25.0% of 2021900433 complete
50.0% of 2021900433 complete
75.0% of 2021900433 complete
100.0% of 2021900433 complete
25.0% of 0021900786 complete
50.0% of 0021900786 complete
75.0% of 0021900786 complete
100.0% of 0021900786 complete
25.0% of 0021900788 complete
50.0% of 0021900788 complete
75.0% of 0021900788 complete
100.0% of 0021900788 complete
25.0% of 0021900789 complete
50.0% of 0021900789 complete
75.0% of 0021900789 complete
100.0% of 0021900789 complete


In [73]:
print('before pruning',len(successful_video_events))

716

In [75]:
for i in successful_video_events:
    if i['video'] == None:
        successful_video_events.remove(i)

In [77]:
len(successful_video_events)

570

#### We are going to split the video events into shots and non-shots.

In [88]:
sve=successful_video_events #make a copy
shot_urls=[]
for i in successful_video_events:
    if 'Shot' in i['desc']:
        shot_urls.append(i['video'])
        sve.remove(i)
nonShot_urls=sve
print(len(shot_urls), len(nonShot_urls), len(shot_urls)+len(nonShot_urls))

142 428 570


I have just realized that clips in the non-shot pile might also feature shot attempts, therefore I am changing my approach to identifying makes and misses on shot attempts.

For the sake of time and space (storage), we will train get the data from the first 100 events 

In [102]:
True == None

False

In [104]:
True== 1

True