# $\mathbf{TAKE}$ $\mathbf{THE}$ $\mathbf{SHOT}$ $\mathbf{OR}$ $\mathbf{NOT?}$
## *Shot Caller for Ballers: 3-point shooting analysis and success modeling*  
### Prepared for The NBA Advanced Statistics Forum

$\rightarrow$ Report by Jen Eyring, Ryan Miller and Lincoln Muriithi | Analytics & Data Science Team | 8/19/2022

----

<a id="execsummary"></a>

# 🏀 Executive Summary

### Question: Can we predict whether an NBA player's 3-point attempt will be successful, using data from the NBA stats API? To do: add the 'why' here, ideally with a small graph showing the difference in 3-point shooting between winning and losing teams.
##### Note: For this study, a successful shot is a made 3-point field goal attempt. Details of how this was determined can be found in this notebook.
### Actions: Using NBA stats data pulled through `nba_api` endpoints, we engineered features around three-point scoring and cumulative stats, tested each feature against shot success (correlation heatmap, t-tests, ANOVA and chi-squared tests), and built classification models to predict whether a 3-point attempt would be made, first for the league as a whole and then for individual players.
### Conclusions: The leaguewide model did not do much better than baseline. Individual players' idiosyncrasies made it hard to model the league as a group, and some players lend themselves better to modeling than others.
### Recommendations: Dig deeper into the NBA API for additional filter-based features, run the ensemble modeling on every player to get an individual view, and add additional years of data for veterans.

#### To do: run a hypothesis test on the last 10 years of data
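One possible shape for this hypothesis test, which could also back the winners-vs.-losers 3-point graph, is a Welch's t-test on team 3-point percentage in wins vs. losses. This is a minimal sketch under assumptions: it expects one row per team per game with `WL` and `FG3_PCT` columns (the layout we believe `leaguegamefinder` returns; verify against the actual pull), the function name is hypothetical, and the toy data is randomly generated rather than real results. Filtering to the last 10 seasons would happen before calling it.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats


def compare_3pt_pct_by_outcome(games: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Welch's t-test on team 3-point percentage in wins vs. losses.

    Assumes one row per team per game, with a 'WL' column ('W' or 'L')
    and an 'FG3_PCT' column holding 3-point percentage as a fraction.
    """
    wins = games.loc[games["WL"] == "W", "FG3_PCT"].dropna()
    losses = games.loc[games["WL"] == "L", "FG3_PCT"].dropna()
    # equal_var=False gives Welch's t-test, which does not assume equal variances
    t_stat, p_value = stats.ttest_ind(wins, losses, equal_var=False)
    return {
        "mean_win": float(wins.mean()),
        "mean_loss": float(losses.mean()),
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "reject_null": bool(p_value < alpha),
    }


# Toy data for illustration only (randomly generated, not real NBA games)
rng = np.random.default_rng(0)
toy_games = pd.DataFrame({
    "WL": ["W"] * 200 + ["L"] * 200,
    "FG3_PCT": np.concatenate([rng.normal(0.38, 0.06, 200), rng.normal(0.33, 0.06, 200)]),
})
result = compare_3pt_pct_by_outcome(toy_games)
print(result)
```

Welch's version is used because wins and losses need not have the same spread in 3-point percentage.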

## Report Contents:
- ### [Executive Summary](#execsummary)
- ### [Libraries](#libraries)
- ### [Data Acquisition and Summary Analysis](#dasa)
- ### [EDA and Feature Engineering](#EDA)
- ### [Modeling](#modeling)
- ### [Conclusions](#conclusions)

***

 <a id="libraries"></a>

# 🏀 Libraries Required for Analysis

Common DS Libraries

```python
# For data analysis:
import numpy as np
import pandas as pd

# Graphs/Visualizations:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# For Statistics and Hypothesis Testing:
import scipy.stats as stats

# For Modeling:
from itertools import combinations
from itertools import product
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Necessary Modules
import os
import time
```

NBA API Endpoint Grabbers

```python
from nba_api.stats.static import teams
from nba_api.stats.static import players
from nba_api.stats.endpoints import gamerotation
from nba_api.stats.endpoints import shotchartdetail
from nba_api.stats.endpoints import teamplayerdashboard
from nba_api.stats.endpoints import winprobabilitypbp
from nba_api.stats.endpoints import leaguegamefinder
```

Helper Functions

```python
# Project helper modules (imports still to be filled in):
#from acquire import
#from wrangle import
#from explore import
#from model import
```

Others

```python
# Ignore warnings:
import warnings
warnings.filterwarnings('ignore')

# Reload imported modules each time a cell is run (makes iterating on helper files easier)
%load_ext autoreload
%autoreload 2

# Show all columns of a DataFrame
pd.set_option('display.max_columns', None)
```

<a id="dasa"></a>

# 🏀 Data Acquisition and Summary Analysis

#### Source description (NBA stats via API)

#### Raw Data Description - and which endpoint it was pulled from and why

#### Describe Acquisition Steps
- Mention the workbook with the step-by-step process

#### Describe dropped and calculated columns

- 3-point clustering: fine to run on all the data, since it was a simple k-means
    - Maybe show a side-by-side of the clusters and an example shot
- Dropping `win_prob`
- Outlier analysis on shot distance: how many rows were lost
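The clustering and outlier steps listed above could look roughly like the sketch below. It assumes shot-chart-style columns (`SHOT_TYPE` with the value `"3PT Field Goal"`, `LOC_X`, `LOC_Y`, `SHOT_DISTANCE`), picks 5 zones arbitrarily, and uses Tukey's IQR fences as the outlier rule; the function names, the zone count, and the fence multiplier are our assumptions, not the project's settings. The toy data is random.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_three_point_zones(shots: pd.DataFrame, n_zones: int = 5, seed: int = 42) -> pd.DataFrame:
    """Label each 3-point attempt with a k-means 'zone' based on its court location.

    Non-3-point rows get NaN. Assumes columns SHOT_TYPE, LOC_X and LOC_Y.
    """
    out = shots.copy()
    is_three = out["SHOT_TYPE"] == "3PT Field Goal"
    km = KMeans(n_clusters=n_zones, n_init=10, random_state=seed)
    out.loc[is_three, "three_zone"] = km.fit_predict(out.loc[is_three, ["LOC_X", "LOC_Y"]])
    return out


def drop_distance_outliers(shots: pd.DataFrame, col: str = "SHOT_DISTANCE", k: float = 1.5):
    """Drop rows outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR); return (kept_rows, rows_lost)."""
    q1, q3 = shots[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = shots[col].between(q1 - k * iqr, q3 + k * iqr)
    return shots[keep].copy(), int((~keep).sum())


# Toy 3-point attempts scattered along an arc (random, for illustration only)
rng = np.random.default_rng(1)
n = 300
angle = rng.uniform(0, np.pi, n)
radius = rng.normal(250, 20, n)  # assumed units: tenths of a foot from the hoop
toy_shots = pd.DataFrame({
    "SHOT_TYPE": "3PT Field Goal",
    "LOC_X": radius * np.cos(angle),
    "LOC_Y": radius * np.sin(angle),
    "SHOT_DISTANCE": radius / 10,
})
toy_shots.loc[:4, "SHOT_DISTANCE"] = 60.0  # five fake half-court heaves

zoned = cluster_three_point_zones(toy_shots)
trimmed, rows_lost = drop_distance_outliers(zoned)
print(zoned["three_zone"].value_counts().sort_index())
print(f"Rows lost to distance outliers: {rows_lost}")
```

Reporting `rows_lost` directly answers the "how many rows lost" note above.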

Acquire functions, plus `.info()` and `.head()` of the acquired data
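Because NBA API pulls are slow, an acquire function commonly caches results to CSV and only calls the API when no cache exists. The sketch below shows that pattern with a stand-in fetch function so it runs offline; `acquire_cached` is a hypothetical name. The commented `ShotChartDetail` call is our assumption of how a real fetch might look (check the parameter names against the `nba_api` docs). Our understanding is that `context_measure_simple` defaults to `"PTS"`, which returns made shots only, so `"FGA"` is needed to get misses as well.

```python
import os
import tempfile
from typing import Callable

import pandas as pd


def acquire_cached(csv_path: str, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return cached data from csv_path if present; otherwise call fetch(), cache to CSV, and return it."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    df = fetch()
    df.to_csv(csv_path, index=False)
    return df


# Hypothetical nba_api fetch (network call, not run here; parameter names are assumptions):
# from nba_api.stats.endpoints import shotchartdetail
# fetch_shots = lambda: shotchartdetail.ShotChartDetail(
#     team_id=0, player_id=PLAYER_ID, season_nullable="2021-22",
#     context_measure_simple="FGA",
# ).get_data_frames()[0]

# Offline demo with a stand-in fetch function that records each call
calls = []


def fake_fetch() -> pd.DataFrame:
    calls.append(1)
    return pd.DataFrame({"SHOT_MADE_FLAG": [1, 0, 1]})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shots.csv")
    first = acquire_cached(path, fake_fetch)   # calls fake_fetch and writes the CSV
    second = acquire_cached(path, fake_fetch)  # reads the CSV; fake_fetch is not called again

print(len(calls))
first.info()
print(first.head())
```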

#### Note on univariate distributions - Key Takeaways

<a id="EDA"></a>

# 🏀 Exploratory Data Analysis and Feature Engineering

#### Pre-Processing - Wrangle (acquire)

#### Building features around three-point scoring, cumulative stats, etc.
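One way to build cumulative 3-point features without leaking the target is to count only the shots *before* the current one. The sketch below does this within each player's game; it assumes the frame is already filtered to 3-point attempts and has `GAME_ID`, `PLAYER_ID`, `GAME_EVENT_ID` (used for ordering) and `SHOT_MADE_FLAG` columns. The function and feature names are hypothetical, and grouping by season instead of game would give season-to-date stats.

```python
import pandas as pd


def add_in_game_three_features(threes: pd.DataFrame) -> pd.DataFrame:
    """Add a player's in-game 3-point history before each attempt.

    The current shot is excluded from every feature to avoid target leakage.
    """
    out = threes.sort_values(["GAME_ID", "PLAYER_ID", "GAME_EVENT_ID"]).copy()
    made = out.groupby(["GAME_ID", "PLAYER_ID"])["SHOT_MADE_FLAG"]
    out["prior_3pa"] = made.cumcount()                          # attempts before this one
    out["prior_3pm"] = made.cumsum() - out["SHOT_MADE_FLAG"]    # makes before this one
    # Percentage is NaN for a player's first attempt of the game (no prior attempts)
    out["prior_3p_pct"] = out["prior_3pm"] / out["prior_3pa"].where(out["prior_3pa"] > 0)
    out["prev_3p_made"] = made.shift(1)                         # result of the previous attempt
    return out


# Tiny toy example: two games, two players
toy = pd.DataFrame({
    "GAME_ID": ["g1"] * 4 + ["g2"] * 2,
    "PLAYER_ID": [1, 1, 1, 2, 1, 1],
    "GAME_EVENT_ID": [10, 20, 30, 15, 5, 8],
    "SHOT_MADE_FLAG": [1, 0, 1, 1, 0, 1],
})
feat = add_in_game_three_features(toy)
print(feat)
```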

#### Ran the following
- Correlation heatmap of all numerical features (unscaled)
- Plot each numerical feature against the target (barplots, t-tests and ANOVA)
- Plot each categorical feature against the target (chi-squared and ...)
- Multivariate analysis of interesting feature combinations
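The statistical checks in the list above can be sketched as two small helpers: a Welch's t-test for a numerical feature split by made/missed, and a chi-squared test of independence on a crosstab for a categorical feature, plus the correlation matrix a heatmap would draw. The helper names, the `SHOT_MADE_FLAG` target and the `three_zone` column are assumptions for illustration, and the toy data is random, so no real relationship should be read into its p-values.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats


def numeric_vs_target(df: pd.DataFrame, col: str, target: str = "SHOT_MADE_FLAG"):
    """Welch's t-test: does the mean of `col` differ between made and missed shots?"""
    made = df.loc[df[target] == 1, col].dropna()
    missed = df.loc[df[target] == 0, col].dropna()
    return stats.ttest_ind(made, missed, equal_var=False)


def categorical_vs_target(df: pd.DataFrame, col: str, target: str = "SHOT_MADE_FLAG"):
    """Chi-squared test of independence between a categorical feature and the target."""
    table = pd.crosstab(df[col], df[target])
    chi2, p, dof, _expected = stats.chi2_contingency(table)
    return chi2, p, dof


# Toy data (random, for illustration only)
rng = np.random.default_rng(2)
n = 500
toy = pd.DataFrame({
    "SHOT_DISTANCE": rng.normal(25, 2, n),
    "three_zone": rng.choice(["corner", "wing", "top"], n),
    "SHOT_MADE_FLAG": rng.integers(0, 2, n),
})

# Correlation matrix of the numerical columns (what sns.heatmap(corr) would draw)
corr = toy.select_dtypes("number").corr()
t_res = numeric_vs_target(toy, "SHOT_DISTANCE")
chi2, p, dof = categorical_vs_target(toy, "three_zone")
print(corr, t_res.pvalue, p, dof, sep="\n")
```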

<a id="modeling"></a>

# 🏀 Modeling

- Process
- Dropping features
- Ensemble
- Results: the general leaguewide model was x% better than baseline on unseen data
    - It doesn't do much better; discuss accuracy vs. precision vs. recall. We therefore tried modeling individual players, starting with our elites.
- We discovered that, even with the metric we built in the hope of predicting the next shot, individual players' idiosyncrasies made it hard to model the league as a group. Thus we decided to analyze individual players.
- Show a dataframe with the individual player results
- Interestingly, some players lend themselves better to modeling than others. This is a good next step for investigation.
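The baseline comparison above can be sketched as follows. Because most 3-point attempts miss, the baseline is "always predict the most common outcome", and a model can nearly match its accuracy while rarely predicting a make, which is why precision and recall on makes are reported next to accuracy. This is a sketch under assumptions: a random forest (one of the ensemble models imported in the Libraries section) stands in for the project's ensemble, the hyperparameters and function name are illustrative, and the toy data is simulated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split


def compare_to_baseline(X: pd.DataFrame, y: pd.Series, seed: int = 42) -> dict:
    """Hold out 30% of shots and compare a random forest with a majority-class baseline."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    # Baseline: always predict the most common outcome in the training set
    majority = y_train.mode()[0]
    baseline_acc = accuracy_score(y_test, np.full(len(y_test), majority))
    model = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=seed)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "baseline_accuracy": baseline_acc,
        "model_accuracy": accuracy_score(y_test, pred),
        # Precision/recall for the "made" class (label 1); 0 if the model never predicts a make
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }


# Simulated shots: longer attempts miss more, recent hot shooting helps a little
rng = np.random.default_rng(3)
n = 1000
X = pd.DataFrame({"SHOT_DISTANCE": rng.normal(25, 2, n), "prior_3p_pct": rng.uniform(0, 1, n)})
logit = -0.6 - 0.3 * (X["SHOT_DISTANCE"] - 25) + 0.5 * (X["prior_3p_pct"] - 0.5)
y = pd.Series((rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int))
scores = compare_to_baseline(X, y)
print(scores)
```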

<a id="conclusions"></a>

# 🏀 Conclusions

Summary of biggest key takeaways

Recap of model performance

Recommendations and Next Steps
- Dig deeper into the NBA API to find additional filter-based features. This requires more computing power due to the complexity of the API calls needed.
- Also run the ensemble modeling on every player to get an individual view
- Add in additional years of data for veterans
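Running the modeling on every player could follow the loop below: for each player with enough shots, compare 5-fold cross-validated accuracy of a model against a most-frequent-class baseline, then rank players by the lift. This is a hedged sketch: a random forest stands in for the ensemble, and the function name, the `PLAYER_NAME` column, the feature names and the 100-shot minimum are assumptions. The toy shots are random, so the lift values themselves mean nothing.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def per_player_lift(shots, features, target="SHOT_MADE_FLAG", player_col="PLAYER_NAME",
                    min_shots=100, seed=42):
    """Cross-validated accuracy of a per-player model vs. a most-frequent-class baseline."""
    rows = []
    for player, grp in shots.groupby(player_col):
        counts = grp[target].value_counts()
        # Skip players with too few shots, or too few makes/misses for 5-fold stratified CV
        if len(grp) < min_shots or len(counts) < 2 or counts.min() < 5:
            continue
        X, y = grp[features], grp[target]
        base = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5).mean()
        model = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=seed)
        acc = cross_val_score(model, X, y, cv=5).mean()
        rows.append({player_col: player, "n_shots": len(grp),
                     "baseline_acc": base, "model_acc": acc, "lift": acc - base})
    if not rows:
        return pd.DataFrame(columns=[player_col, "n_shots", "baseline_acc", "model_acc", "lift"])
    return pd.DataFrame(rows).sort_values("lift", ascending=False).reset_index(drop=True)


# Toy shots for three players; Player C has too few attempts and is skipped
rng = np.random.default_rng(4)
frames = []
for name, n in [("Player A", 200), ("Player B", 200), ("Player C", 20)]:
    frames.append(pd.DataFrame({
        "PLAYER_NAME": name,
        "SHOT_DISTANCE": rng.normal(25, 2, n),
        "prior_3p_pct": rng.uniform(0, 1, n),
        "SHOT_MADE_FLAG": rng.integers(0, 2, n),
    }))
shots = pd.concat(frames, ignore_index=True)
table = per_player_lift(shots, ["SHOT_DISTANCE", "prior_3p_pct"])
print(table)
```

Adding more seasons for veterans would mainly raise `n_shots`, letting more players clear the minimum and making each player's cross-validation estimate less noisy.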