![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

## Step by Step: Example Machine-Learning Ideas

Machine learning is a field of study that focuses on teaching computers how to learn and make decisions without being explicitly programmed to do so. It's a way for computers to learn from data and improve their performance over time.

Let's say we want to build a machine learning model that *predicts* the performance of basketball players based on factors such as *height*, *weight*, *shooting accuracy*, and *years of experience*.

To train the model, we would gather data on a large number of basketball players, including their attributes (height, weight, shooting accuracy, experience) and their corresponding performance metrics (points scored, rebounds, assists). The model would then analyze this data and look for patterns and relationships between the attributes and the performance metrics.
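
The training process described above can be sketched with scikit-learn. This is a minimal illustration that uses *synthetic* players generated with NumPy, not real NBA statistics. The column names and the made-up relationship between the features and points are assumptions chosen only for demonstration, and `LinearRegression` stands in for whatever model a real project would use.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generate synthetic (made-up) player data purely for illustration
rng = np.random.default_rng(0)
n = 500
players_df = pd.DataFrame({
    "height_in": rng.normal(78, 3.5, n),         # height in inches
    "weight_lb": rng.normal(215, 25, n),         # weight in pounds
    "shooting_pct": rng.uniform(0.35, 0.55, n),  # field-goal percentage
    "years_exp": rng.integers(0, 15, n),         # seasons in the league
})
# Assumed relationship: points depend on shooting and experience, plus random noise
players_df["points"] = (
    40 * players_df["shooting_pct"]
    + 0.5 * players_df["years_exp"]
    + rng.normal(0, 2, n)
)

features = ["height_in", "weight_lb", "shooting_pct", "years_exp"]
X_train, X_test, y_train, y_test = train_test_split(
    players_df[features], players_df["points"], test_size=0.2, random_state=0
)

# "Training" means finding coefficients that best map the features to points
reg = LinearRegression().fit(X_train, y_train)
print(dict(zip(features, reg.coef_.round(2))))
print("R^2 on unseen players:", round(reg.score(X_test, y_test), 3))
```

The learned coefficients should land close to the values used to generate the data (about 40 for shooting and 0.5 for experience, near 0 for height and weight), which is what "finding patterns" means in practice.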

For instance, the model might discover that taller players tend to score more points or that players with higher shooting accuracy have more assists. It learns these relationships by fitting a mathematical function to the data so that its predictions match the known performance metrics as closely as possible.
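
One simple way to measure a relationship like "taller players get more rebounds" is a correlation coefficient. The sketch below uses synthetic data in which rebounds are *assumed* to rise with height, plus a column unrelated to height for contrast. `DataFrame.corr()` then reports how strongly each pair of columns moves together: values near 1 mean a strong positive linear relationship, values near 0 mean little linear relationship.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
height = rng.normal(78, 3.5, n)  # synthetic heights in inches
# Assumption for illustration only: rebounds grow with height, plus random noise
rebounds = 0.6 * (height - 70) + rng.normal(0, 1.5, n)
unrelated = rng.normal(10, 3, n)  # a column generated independently of height

toy = pd.DataFrame({"height_in": height, "rebounds": rebounds, "unrelated": unrelated})
corr = toy.corr()  # Pearson correlation between every pair of columns
print(corr.round(2))
```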

To begin, let's build a simpler ML model: one that predicts an NBA player's *position* from their *height*. First, we need a large, appropriate dataset. The players on current NBA rosters are a good choice: there are several hundred of them, and their data is up to date.

In [None]:
import time

import pandas as pd
from nba_api.stats.endpoints import commonallplayers, commonplayerinfo

# Fetch all players for the current NBA season
# (named all_players so it doesn't shadow nba_api's static `players` module used later)
all_players = commonallplayers.CommonAllPlayers(is_only_current_season=1)
player_data = all_players.get_data_frames()[0]

# Create an empty list to store player information
dataset = []

# Iterate through each player and fetch their height and position
# (this sends one request per player, so it can take several minutes)
for player_id in player_data['PERSON_ID']:
    player_info = commonplayerinfo.CommonPlayerInfo(player_id=player_id)
    player_info_data = player_info.get_data_frames()[0]
    height = player_info_data['HEIGHT'][0]
    position = player_info_data['POSITION'][0]

    # Append the player's height and position to the dataset
    dataset.append({'Height': height, 'Position': position})

    # Pause briefly between requests to reduce the chance of being rate-limited
    time.sleep(0.6)

df = pd.DataFrame(dataset)
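
The loop above sends one request per player, which is slow and may be throttled by the NBA stats site. A common workaround (an assumption, not part of the original notebook) is to save the finished dataframe to a CSV file and reload it on later runs. The helper `load_or_build` and the file name in the usage note are hypothetical.

```python
import os

import pandas as pd

def load_or_build(csv_path, build_fn):
    """Return a cached DataFrame from csv_path, or call build_fn() and cache its result."""
    if os.path.exists(csv_path):
        # Reuse the saved data instead of calling the API again
        return pd.read_csv(csv_path)
    frame = build_fn()
    frame.to_csv(csv_path, index=False)
    return frame
```

For example, if the fetching loop were wrapped in a function (say, a hypothetical `fetch_height_position()` that returns the dataframe), `df = load_or_build('nba_height_position.csv', fetch_height_position)` would only contact the API when the file does not exist yet.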

Let's take a look at our dataframe and see if there is anything in particular that needs to be addressed. 

In [None]:
# Take a look at the dataframe
df

The data looks good overall! However, a couple of changes will make the prediction task easier for our machine-learning model. In the `Position` column, some players are listed with two positions (e.g. Forward-Center). We can keep just the first position listed. This leaves the model fewer, non-overlapping categories to choose between and removes ambiguity for players who are hard to categorize.

Another issue is that the `Height` column stores heights as *feet-inches* text (e.g. `6-7`). The model needs height as a single number so it can relate particular heights to particular positions, so let's convert each value to total inches (for `6-7`, that is 6 × 12 + 7 = 79).

In [None]:
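# Drop players whose height or position is missing, since they can't be converted or labelled
df = df[df['Height'].str.contains('-', na=False) & (df['Position'].fillna('') != '')].copy()

# Convert height values from "feet-inches" (e.g. "6-7") to total inches (e.g. 79)
df['Height'] = df['Height'].apply(lambda x: int(x.split('-')[0]) * 12 + int(x.split('-')[1]))

# Keep only the first position listed, e.g. "Forward-Center" becomes "Forward"
df['Position'] = df['Position'].str.split('-').str[0]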
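
To double-check what these two transformations do, here they are applied to a few example strings written in the same format (illustrative values, not pulled from the API). The helper `height_to_inches` is a hypothetical, slightly more defensive version of the lambda above: it returns `None` instead of raising an error when a height is missing or malformed.

```python
import pandas as pd

def height_to_inches(height_str):
    """Convert a 'feet-inches' string such as '6-7' to total inches; return None if malformed."""
    try:
        feet, inches = (int(part) for part in str(height_str).split("-"))
    except ValueError:
        # Raised for empty strings, non-numeric parts, or the wrong number of parts
        return None
    return feet * 12 + inches

examples = pd.DataFrame({
    "Height": ["6-7", "7-0", "6-1", ""],
    "Position": ["Forward-Center", "Center", "Guard-Forward", "Guard"],
})
examples["Inches"] = examples["Height"].apply(height_to_inches)
# Same rule as the notebook: keep the text before the first hyphen
examples["Simplified"] = examples["Position"].str.split("-").str[0]
print(examples)
```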

Now that we have a *cleaned* dataframe, let's move on to the machine-learning portion and create some helper *functions* for making predictions.

When developing a machine learning model, it is important to split the data into two sets: the *training set* and the *testing set*. The training set is used to teach the model how to make predictions. Once the model is trained, the testing set is used to evaluate how well the model predicts on new, unseen data. This tells us how accurately the model is likely to perform in real-world situations.

Splitting the data also helps us detect *overfitting*, which means the model is too focused on the training data and doesn't generalize well to new data. An overfit model scores much higher on the training set than on the testing set. When adjusting the model's settings to improve it, it's best to compare settings using only the training data (for example, with cross-validation) so that the testing set remains a fair, unseen check. Overall, splitting the data into training and testing sets helps us understand how good our model is and where it needs improvement.

Let's use scikit-learn's *RandomForestClassifier*, with the `Height` column of our dataframe as the only feature, to predict the `Position` variable.
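
Before trusting an accuracy score, it helps to know what a trivial model would get. The sketch below (an addition, not in the original notebook) uses scikit-learn's `DummyClassifier` with `strategy="most_frequent"`, which always predicts the most common position in the training data. The tiny lists are placeholders; with the real data you would pass the `X_train, y_train, X_test, y_test` created in the next cell.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

def baseline_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a model that always predicts the most frequent training label."""
    dummy = DummyClassifier(strategy="most_frequent")
    dummy.fit(X_train, y_train)
    return accuracy_score(y_test, dummy.predict(X_test))

# Placeholder example: "Guard" is the most common training label, so it is always predicted
X_tr = [[75], [76], [80], [74]]
y_tr = ["Guard", "Guard", "Forward", "Guard"]
X_te = [[73], [82]]
y_te = ["Guard", "Center"]
print(baseline_accuracy(X_tr, y_tr, X_te, y_te))  # 0.5
```

If the random forest's accuracy is not clearly above this baseline, height alone is not telling the model much.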

In [None]:
import pandas as pd
from nba_api.stats.endpoints import commonplayerinfo
from nba_api.stats.static import players
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Create a RandomForestClassifier model (random_state makes results repeatable)
model = RandomForestClassifier(random_state=42)

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    df[['Height']], df['Position'], test_size=0.2, random_state=42
)

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Calculate the accuracy of the model (fraction of test players classified correctly)
accuracy = accuracy_score(y_test, predictions)
print("Test accuracy:", accuracy)

# Function to predict position based on height (in inches)
def predict_position(height):
    # Use a DataFrame with the same column name the model was trained on
    predicted_position = model.predict(pd.DataFrame({'Height': [height]}))
    return predicted_position[0]

# Function to fetch a player's position and height using nba_api
def fetch_player_position_height(player_name):
    # find_players_by_full_name comes from nba_api's static players module
    player = players.find_players_by_full_name(player_name)
    if player:
        player_id = player[0]['id']
        player_info = commonplayerinfo.CommonPlayerInfo(player_id=player_id)
        player_info_df = player_info.get_data_frames()[0]
        position = player_info_df['POSITION'][0]
        height = player_info_df['HEIGHT'][0]
        return position, height
    else:
        return None

# Function to convert a "feet-inches" height string to total inches
def convert_height_to_inches(height_str):
    feet, inches = map(int, height_str.split('-'))
    total_inches = feet * 12 + inches
    return total_inches
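
Since the model uses a single feature, it is effectively learning height cut-offs between positions. One way to see such cut-offs (a swapped-in illustration, not something the random forest itself reports) is to fit one shallow `DecisionTreeClassifier` and print its rules with `sklearn.tree.export_text`. Synthetic heights with assumed position averages are used so the block runs on its own; with the notebook's data you could fit on `X_train, y_train` instead.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
# Synthetic heights (inches) with assumed position averages, for illustration only
means = {"Guard": 75, "Forward": 79, "Center": 83}
heights = np.concatenate([rng.normal(m, 2, 100) for m in means.values()])
X = pd.DataFrame({"Height": heights})
y = np.repeat(list(means.keys()), 100)

# A depth-2 tree can express up to two height thresholds along each path
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["Height"])
print(rules)
```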

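We now have a trained model! Its test accuracy (printed above) tells us how often it picked the correct position for players it had not seen, so treat its predictions as rough guesses. Next, let's create a simple way for users to enter the name of an NBA player and compare the predicted position with the actual one.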

In [None]:
playing = True

while playing:
    # Get player name input from the user
    player_name = input("Enter NBA player's name (if a player can't be found, try only their first or last name): ")

    # Fetch the player's position and height using nba_api
    result = fetch_player_position_height(player_name)

    # Only predict when the player was found and has a "feet-inches" height
    if result is not None and '-' in str(result[1]):
        actual_position, actual_height = result

        # Predict position based on the player's height in inches
        predicted_position = predict_position(convert_height_to_inches(actual_height))

        print("Height:", actual_height)
        print("Predicted Position:", predicted_position)
        print("Actual Position:", actual_position)

        choice = input("Do you want to continue? (Y/n): ")
        if choice.lower() == 'n':
            playing = False
    else:
        print("Player not found (or height unavailable). Please enter a valid player name.")
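
The interactive loop needs someone typing names, so here is a non-interactive sketch that predicts several heights at once and also reports class probabilities via `predict_proba`. It trains on synthetic heights with assumed position averages so it runs on its own; with the notebook's trained `model`, you could skip the training lines and call `model.predict_proba(query)` directly. Probabilities are more informative than a single label when a height sits between two positions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
means = {"Guard": 75, "Forward": 79, "Center": 83}  # assumed averages in inches
X = pd.DataFrame({"Height": np.concatenate([rng.normal(m, 2, 100) for m in means.values()])})
y = np.repeat(list(means.keys()), 100)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Predict several heights at once, using the same column name as in training
query = pd.DataFrame({"Height": [72, 77, 79, 86]})
proba = pd.DataFrame(clf.predict_proba(query), columns=clf.classes_, index=query["Height"])
proba["Predicted"] = clf.predict(query)
print(proba.round(2))
```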

## Helpful ML Sites

To explore more ML ideas related to basketball, check out:

https://betterprogramming.pub/using-pythons-nba-api-to-create-a-simple-regression-model-ac9a3b36bc8

https://www.tandfonline.com/doi/full/10.1080/24751839.2021.1977066

https://towardsdatascience.com/guide-to-building-a-college-basketball-machine-learning-model-in-python-1c70b83acb51

https://towardsdatascience.com/nba-data-science-93e0314bb45e

https://watchstadium.com/which-nba-statistics-actually-translate-to-wins-07-13-2019/

https://towardsdatascience.com/building-my-first-machine-learning-model-nba-prediction-algorithm-dee5c5bc4cc1

Now you can complete your [data science project](10-data-science-project.ipynb).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)