# Moreyball - Exploring the Impact of Analytics on the NBA 

**Project Overview:**<br>
This project aims to explore how data analytics has influenced strategy in the modern NBA, focusing on three key areas:

1. Shot selection<br>
2. Offensive efficiency<br>
3. Player evaluation and usage<br>

Inspired by Billy Beane's "Moneyball" story in baseball, this project focuses on its NBA counterpart, particularly the analytics movement led by former Houston Rockets General Manager Daryl Morey. Rather than analyzing the Rockets specifically, this project looks at league-wide trends that have emerged following Morey's influence since the mid-2000s.

**Objectives:**<br>
* Investigate how NBA strategy has shifted due to the rise of analytics
* Create data visualizations to support exploratory (descriptive) analysis
* Consider the ongoing debate around whether analytics has improved or diminished the entertainment and fundamentals of the sport

**Tools/Resources:**<br>
* Python
* Pandas
* Kaggle NBA Datasets
* Basketball-Reference (POTENTIALLY)
* NBA API (GitHub - swar/nba_api) (POTENTIALLY)

**Disclaimer:**<br>
As someone newer to coding and data analysis, I will also treat this project as a learning experience. I hope to accomplish all of my goals over the coming weeks, but I may have to pivot the project's direction as I go along. As a result, I welcome any feedback, suggestions, or resources that might help shape this project.

------------------------------------------------------------------------------------------



**Initialization:**<br>
The code below initializes the environment by importing the necessary libraries (pandas and os). It also confirms that my selected datasets have been loaded properly by listing the available files in each Kaggle dataset directory.

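```python
# Load pandas for working with the CSVs and os for browsing the file system
import pandas as pd
import os

# List the top-level contents of each attached Kaggle dataset
print(os.listdir('/kaggle/input/basketball'))
print(os.listdir('/kaggle/input/nba-aba-baa-stats'))
```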
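Because `os.listdir` only shows the top level of a directory, files inside subfolders (such as the `csv/` folder inside the `basketball` dataset, where `player.csv` lives) do not appear in its output. Below is a minimal sketch of a recursive listing using `pathlib.Path.rglob`; the helper name `list_csv_files` is hypothetical, and it returns an empty list when a directory is missing instead of raising an error.

```python
from pathlib import Path


def list_csv_files(root):
    """Return every CSV under `root`, including subfolders, as paths relative to `root`."""
    root = Path(root)
    if not root.is_dir():
        # Missing directory (e.g. dataset not attached): nothing to list
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob('*.csv'))


# Same dataset paths as the cell above
for dataset_dir in ['/kaggle/input/basketball', '/kaggle/input/nba-aba-baa-stats']:
    print(dataset_dir, list_csv_files(dataset_dir))
```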

**Dataset Testing:**<br>
The next few blocks are exploratory: I experiment with commands to determine which of the two Kaggle datasets would be most useful for my objectives.

**Dataset #1 Testing**: 'basketball'

```python
# Inspect the contents of a promising CSV from the 'basketball' dataset
df = pd.read_csv('/kaggle/input/basketball/csv/player.csv')
df.head()
```
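When comparing candidate CSVs, `df.head()` shows only the first five rows. A compact per-column summary can reveal column types and missing values across the whole file. This sketch assumes nothing about the datasets' columns; `profile_columns` is a hypothetical helper, and the demo frame is made up so it can run without the Kaggle files.

```python
import pandas as pd


def profile_columns(frame):
    """Summarize each column: dtype, non-null count, share of nulls, distinct values."""
    return pd.DataFrame({
        'dtype': frame.dtypes.astype(str),
        'non_null': frame.notna().sum(),
        'null_share': frame.isna().mean().round(3),
        'n_unique': frame.nunique(),
    })


# Made-up frame; in the notebook you would call profile_columns(df) instead
demo = pd.DataFrame({
    'full_name': ['Player A', 'Player B', None],
    'is_active': [True, False, True],
})
print(demo.shape)
print(profile_columns(demo))
```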

```python
# Explore potential uses of this CSV by looking up individual players
the_goat = df[df['full_name'].str.contains('Michael Jordan', case=False, na=False)]
number_two = df[df['full_name'].str.contains('LeBron James', case=False, na=False)]

# Stack both results into a single DataFrame for side-by-side display
both_players = pd.concat([the_goat, number_two])

both_players
```
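`str.contains` matches substrings and, by default, treats its pattern as a regular expression, so a name search can also return rows whose names merely contain the search string. Below is a sketch of an exact, case-insensitive lookup for several players at once. The helper `find_players` and the sample rows (including the invented name "Michael Jordanson") are hypothetical; `full_name` is the column used in the cell above.

```python
import pandas as pd


def find_players(frame, names, column='full_name'):
    """Return rows whose `column` exactly matches any of `names`, ignoring case."""
    wanted = {name.casefold() for name in names}
    # fillna('') keeps missing names from breaking the string methods
    mask = frame[column].fillna('').str.casefold().isin(wanted)
    return frame[mask]


# Hypothetical rows standing in for player.csv; the third name is invented
players = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'full_name': ['Michael Jordan', 'LeBron James', 'Michael Jordanson', None],
})
print(find_players(players, ['Michael Jordan', 'Lebron James']))
```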

```python
# Inspect another promising CSV from the same dataset
df = pd.read_csv('/kaggle/input/basketball/csv/common_player_info.csv')
df.head()
```

**Dataset #2 Testing**: 'nba-aba-baa-stats'

```python
# This CSV looks very promising for visualizing the evolution of 3-point shooting
df = pd.read_csv('/kaggle/input/nba-aba-baa-stats/Player Shooting.csv')
df.head()
```
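As one possible first step toward that visualization, the share of field-goal attempts taken from 3-point range could be averaged per season, weighted by minutes so that low-minute players do not count as much as starters. The column names `season`, `percent_fga_from_x3p_range`, and `mp` are assumptions about `Player Shooting.csv`, so check them against `df.columns` first; the helper name and the demo data are invented.

```python
import pandas as pd


def weighted_share_by_season(frame, share_col, weight_col, season_col='season'):
    """Weighted mean of a per-player share column for each season.

    Rows missing either the share or the weight are dropped first.
    """
    valid = frame.dropna(subset=[share_col, weight_col]).copy()
    valid['_weighted'] = valid[share_col] * valid[weight_col]
    sums = valid.groupby(season_col)[['_weighted', weight_col]].sum()
    return sums['_weighted'] / sums[weight_col]


# Invented rows; the column names are ASSUMED, so verify them with df.columns
shooting = pd.DataFrame({
    'season': [2000, 2000, 2020, 2020],
    'percent_fga_from_x3p_range': [0.10, 0.30, 0.40, 0.50],
    'mp': [3000, 1000, 2000, 2000],
})
three_share = weighted_share_by_season(shooting, 'percent_fga_from_x3p_range', 'mp')
print(three_share)
```

If the file follows Basketball-Reference's convention of adding a combined "TOT" row for players traded mid-season, those players may be counted twice; keeping either the "TOT" rows or the per-team rows, but not both, would avoid that.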

```python
# Very intriguing CSV, with totals at the team level
df = pd.read_csv('/kaggle/input/nba-aba-baa-stats/Team Totals.csv')
df.head()
```
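Team totals are a natural input for the project's offensive-efficiency question. A common estimate of possessions from box-score totals is POSS ≈ FGA - ORB + TOV + 0.44 × FTA, and offensive rating is then points per 100 possessions. This sketch assumes lowercase columns `fga`, `orb`, `tov`, `fta`, and `pts` in `Team Totals.csv`, which may not match the file; the helper name and demo numbers are hypothetical.

```python
import pandas as pd


def add_offensive_rating(frame):
    """Return a copy with estimated possessions and points per 100 possessions.

    Possessions are estimated as FGA - ORB + TOV + 0.44 * FTA. A missing
    input (NaN) propagates, so that row gets NaN rather than a fake value.
    """
    out = frame.copy()
    out['poss_est'] = out['fga'] - out['orb'] + out['tov'] + 0.44 * out['fta']
    out['off_rtg_est'] = 100 * out['pts'] / out['poss_est']
    return out


# Invented team totals; the column names are ASSUMED, not confirmed
teams = pd.DataFrame({
    'fga': [7000, 7200],
    'orb': [800, 900],
    'tov': [1100, None],  # None models a season without turnover data
    'fta': [1800, 2000],
    'pts': [9000, 9100],
})
print(add_offensive_rating(teams)[['poss_est', 'off_rtg_est']])
```

For league-wide trends, the same formula could be applied after summing each column by season (assuming a `season` column, e.g. `df.groupby('season')[cols].sum()`), which weights every possession equally instead of averaging team ratings.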

**Initial Takeaways:**<br>
After exploring several CSV files from both datasets, I believe **Dataset 1** will be more useful for analyzing changes in overall team metrics over the years. In contrast, **Dataset 2** offers more versatility, as it includes both team-level data and detailed individual player stats, making it valuable for a broader range of analyses.

**In the coming weeks, I will continue analyzing the datasets, starting by filtering and organizing the data to streamline the visualization process...**
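As one example of that kind of filtering, here is a sketch that keeps only NBA rows (dropping ABA and BAA seasons) from 1979-80 onward, the first NBA season with a 3-point line, and sorts them by season. The column names `lg` and `season`, and the convention that a season is labeled by its ending year (so 1980 means 1979-80), are assumptions about these files; the helper name and demo rows are hypothetical.

```python
import pandas as pd


def filter_three_point_era(frame, league_col='lg', season_col='season', first_season=1980):
    """Keep NBA rows from `first_season` onward, sorted by season.

    Assumes seasons are labeled by their ending year (1980 = 1979-80).
    """
    mask = (frame[league_col] == 'NBA') & (frame[season_col] >= first_season)
    return frame.loc[mask].sort_values(season_col).reset_index(drop=True)


# Hypothetical rows mixing leagues and eras
seasons = pd.DataFrame({
    'lg': ['NBA', 'ABA', 'NBA', 'NBA', 'BAA'],
    'season': [2015, 1976, 1979, 1990, 1948],
})
modern = filter_three_point_era(seasons)
print(modern)
```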