# Demo Notebook for Using AI to Analyze Reboot Motion Metrics

__[CoLab Notebook Link](https://githubtocolab.com/RebootMotion/reboot-toolkit/blob/main/examples/RebootMotionAI.ipynb)__

Run the cells in order, making sure to enter your AWS credentials when prompted.

In [None]:
!pip install git+https://github.com/RebootMotion/reboot-toolkit.git@v2.4.0#egg=reboot_toolkit > /dev/null
!pip install git+https://github.com/RebootMotion/mlb-statsapi.git@v1.1.0#egg=mlb_statsapi > /dev/null
!echo "Done Installing"

In [None]:
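# This notebook uses the legacy PandasAI class (pandasai 0.x API);
# if the import below fails, pin an earlier pandasai release
!pip install pandasai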

In [None]:
import os
import reboot_toolkit as rtk

from getpass import getpass
from reboot_toolkit import (
    S3Metadata,
    MocapType,
    MovementType,
    Handedness,
    FileType,
    PlayerMetadata,
    setup_aws,
    decorate_primary_segment_df_with_stats_api,
)
from IPython.display import display

In [None]:
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

In [None]:
#@title User Input - No code changes required below this section, just enter information in forms

# Update the below info to match your desired analysis information
# Common changes you might want to make:

# To analyze Hawk-Eye HFR data from the Stats API
# together with Hawk-Eye Action files (e.g. from the DSP),
# set mocap_types=[MocapType.HAWKEYE_HFR, MocapType.HAWKEYE]

# To analyze baseball-hitting,
# set movement_type=MovementType.BASEBALL_HITTING

# To analyze right-handed players,
# set handedness=Handedness.RIGHT

# To analyze data from the momentum and energy files,
# set file_type=FileType.MOMENTUM_ENERGY

# See https://docs.rebootmotion.com/ for all available file types and the data in each
mocap_types = [MocapType.HAWKEYE_HFR]
movement_type = MovementType.BASEBALL_PITCHING
handedness = Handedness.RIGHT
file_type = FileType.METRICS_BASEBALL_PITCHING_V2

# Update the label to whatever you'd like to be displayed in the visuals
primary_segment_label = 'Primary Segment'
comparison_segment_label = 'Comparison Segment'

# Set these here, or use the drop-down menu below (you don't need to do both)
prim_seg_org_player_ids = None  # ['594798']  # This is an example of the input format
comp_seg_org_player_ids = None

In [None]:
#@title AWS Credentials

# Upload your Organization's .env file to the local file system, per https://pypi.org/project/python-dotenv/
#
# Also, update the org_id in the field below to your own org_id
# (note this isn't strictly necessary, the .env file will override what's written here)

boto3_session = setup_aws(org_id="org-mlbbiomech", aws_default_region="us-west-1")

In [None]:
#@title Instantiate the AI instance

# To use this notebook, you must create your own OpenAI API key.
# See the OpenAI API docs here: https://platform.openai.com/overview

# The OpenAI privacy policy can be found here: https://openai.com/api-data-privacy

# IN USING THIS NOTEBOOK, THE USER AGREES TO ASSUME ALL RISKS ASSOCIATED WITH THE USE OF GOOGLE COLAB AND OPENAI.
# THERE ARE RISKS ASSOCIATED WITH USING AI, AS THE OUTPUT CAN BE UNPREDICTABLE.
# PLEASE USE IT RESPONSIBLY.

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass('Input OPENAI_API_KEY here:')

else:
    print('OPENAI_API_KEY already present in environment')

llm = OpenAI(api_token=os.environ['OPENAI_API_KEY'])

# HOW THE ENFORCE_PRIVACY PARAMETER WORKS:
# FALSE - Randomize and shuffle the dataframe head (the first 5 rows) and send just the head to OpenAI
# TRUE - Do not send the head to OpenAI, only the column names
pandas_ai = PandasAI(llm, enforce_privacy=True)
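
The key handling in the cell above follows a common pattern: read a secret from the environment, and prompt for it without echo only when it is missing. A minimal sketch of that pattern as a reusable function is shown here; the helper name `ensure_secret` and the placeholder key value are assumptions for illustration, not part of reboot_toolkit or pandasai.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def ensure_secret(
    name: str,
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return env[name], prompting for it (without echo) only when it is missing."""
    if name not in env:
        env[name] = prompt(f"Input {name} here:")
    return env[name]


# Demonstration with a plain dict and a stand-in prompt,
# so nothing is read from the terminal and os.environ is left untouched
demo_env = {}
first = ensure_secret("OPENAI_API_KEY", env=demo_env, prompt=lambda _: "sk-example")
# The second call finds the stored value and never invokes the prompt
second = ensure_secret("OPENAI_API_KEY", env=demo_env, prompt=lambda _: "ignored")
```

In the notebook itself, calling `ensure_secret("OPENAI_API_KEY")` with the defaults would behave like the `if`/`else` block above, minus the "already present" message.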

In [None]:
#@title Set S3 File Info

# Common changes you might want to make:

# To analyze Hawk-Eye HFR data from the Stats API
# together with Hawk-Eye Action files (e.g. from the DSP),
# set mocap_types=[MocapType.HAWKEYE_HFR, MocapType.HAWKEYE]

# To analyze baseball-hitting,
# set movement_type=MovementType.BASEBALL_HITTING

# To analyze right-handed players,
# set handedness=Handedness.RIGHT

# See https://docs.rebootmotion.com/ for all available file types and the data in each

# Update the below info to match your desired analysis information

s3_metadata = S3Metadata(
    org_id=os.environ['ORG_ID'],
    mocap_types=mocap_types,
    movement_type=movement_type,
    handedness=handedness,
    file_type=file_type,
)

s3_df = rtk.download_s3_summary_df(s3_metadata)

In [None]:
#@title Display the Interface for Selecting the Primary Data Segment to Analyze

primary_segment_widget = rtk.create_interactive_widget(s3_df)
display(primary_segment_widget)

In [None]:
#@title Set Primary Analysis Segment Info

primary_segment_data = primary_segment_widget.children[1].result

primary_analysis_segment = PlayerMetadata(
    org_player_ids=primary_segment_data["org_player_ids"] if not prim_seg_org_player_ids else prim_seg_org_player_ids,
    session_dates=primary_segment_data["session_dates"],
    session_nums=primary_segment_data["session_nums"],
    session_date_start=primary_segment_data["session_date_start"],
    session_date_end=primary_segment_data["session_date_end"],
    year=primary_segment_data["year"],
    org_movement_id=None, # set the play GUID for the skeleton animation; None defaults to the first play
    s3_metadata=s3_metadata,
)

primary_segment_summary_df = rtk.filter_s3_summary_df(primary_analysis_segment, s3_df)

# Common Issue:
# If no data files are returned here,
# check that the segment selection widget and the S3 File Info above are set correctly.
# If any of those cells were changed after running once, re-run them so the new selections take effect.

# List all available S3 data for the Primary Analysis Segment
available_s3_keys = rtk.list_available_s3_keys(os.environ['ORG_ID'], primary_segment_summary_df)
# Load the Primary Analysis Segment data into a single DataFrame
primary_segment_data_df = rtk.load_games_to_df_from_s3_paths(primary_segment_summary_df['s3_path_delivery'].tolist())

reboot_motion_metrics = list(primary_segment_data_df)
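
The `org_player_ids` argument in the cell above prefers the manual override from the User Input cell and falls back to the widget selection when the override is `None` or empty. A minimal sketch of that precedence rule follows; `resolve_player_ids` is a hypothetical helper (not part of reboot_toolkit), and `"111111"` is a made-up placeholder ID.

```python
from typing import Optional, Sequence


def resolve_player_ids(
    widget_ids: Optional[Sequence[str]],
    override_ids: Optional[Sequence[str]],
) -> Optional[list]:
    """Return the manual override when it is non-empty, else the widget selection."""
    chosen = override_ids if override_ids else widget_ids
    return list(chosen) if chosen is not None else None


# No override set: the widget selection is used
from_widget = resolve_player_ids(["111111"], None)
# Override set: it takes precedence over the widget selection
from_override = resolve_player_ids(["111111"], ["594798"])
# An empty override list is treated the same as None
from_empty_override = resolve_player_ids(["111111"], [])
```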

In [None]:
#@title Optional: Uncomment to add information from the MLB Stats API and filter based on stats from Stats API
# Filter the segment data down to a subset of pitches using metrics of your choice
# In the commented-out filter line below, choose any of the metrics and filter the dataframe by that value
# To apply multiple filters, combine conditions with & (and) or | (or), wrapping each condition in parentheses

# Common Issue:
# If a "Missing element" message appears, it reflects data integrity issues in parsing; the majority of the data is still likely fine.
# Common metrics to filter by are: start_speed, end_speed, spin_rate, spin_direction, zone, pitch_type

primary_segment_data_df = decorate_primary_segment_df_with_stats_api(primary_segment_data_df)
# primary_segment_data_df = primary_segment_data_df[(primary_segment_data_df["start_speed"] > 90) & (primary_segment_data_df["start_speed"] < 100)]

stats_api_metrics = [m for m in list(primary_segment_data_df) if m not in reboot_motion_metrics]
print('Available Stats API Metrics:')
print(stats_api_metrics)
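
The `&` and `|` filters described above are standard pandas boolean masks. The sketch below shows both on a small made-up dataframe; the values are invented for illustration, and only the column names `start_speed` and `pitch_type` come from the list of common metrics above.

```python
import pandas as pd

# Made-up values for illustration only
pitches = pd.DataFrame(
    {
        "start_speed": [88.5, 93.1, 96.4, 101.2],
        "pitch_type": ["Slider", "Four-Seam Fastball", "Four-Seam Fastball", "Sinker"],
    }
)

# AND: both conditions must hold; each condition needs its own parentheses
in_range = pitches[(pitches["start_speed"] > 90) & (pitches["start_speed"] < 100)]

# OR: either condition may hold
slow_or_sinker = pitches[(pitches["start_speed"] < 90) | (pitches["pitch_type"] == "Sinker")]
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python, so an unparenthesized condition would be evaluated in the wrong order.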

In [None]:
#@title Optional: Uncomment to have the AI automatically clean the data

# pandas_ai.clean_data(primary_segment_data_df)

In [None]:
#@title Use AI to explore the Primary Segment

# Use this prompt to tell the AI what you would like it to do
llm_prompt = """
Plot start speed vs pitch hand proj max for only pitch type = Four-Seam Fastball
"""

pandas_ai.run(primary_segment_data_df, llm_prompt)

In [None]:
#@title Display the Interface for Selecting the Comparison Data Segment to Analyze

comparison_segment_widget = rtk.create_interactive_widget(s3_df)
display(comparison_segment_widget)

In [None]:
#@title Set Comparison Analysis Segment Inputs

comparison_s3_metadata = s3_metadata
comparison_segment_data = comparison_segment_widget.children[1].result

comparison_analysis_segment = PlayerMetadata(
    org_player_ids=comparison_segment_data["org_player_ids"] if not comp_seg_org_player_ids else comp_seg_org_player_ids,
    session_dates=comparison_segment_data["session_dates"],
    session_nums=comparison_segment_data["session_nums"],
    session_date_start=comparison_segment_data["session_date_start"],
    session_date_end=comparison_segment_data["session_date_end"],
    year=comparison_segment_data["year"],
    org_movement_id=None, # set the play GUID for the skeleton animation; None defaults to the first play
    s3_metadata=comparison_s3_metadata,
)

comparison_segment_summary_df = rtk.filter_s3_summary_df(comparison_analysis_segment, s3_df)

# Add Movement Num and S3 Key to Comparison DataFrame to Enable Sorting
comp_available_s3_keys = rtk.list_available_s3_keys(os.environ['ORG_ID'], comparison_segment_summary_df)
comparison_segment_data_df = rtk.load_games_to_df_from_s3_paths(comparison_segment_summary_df['s3_path_delivery'].tolist())
comparison_segment_data_df = rtk.merge_data_df_with_s3_keys(
    comparison_segment_data_df, comp_available_s3_keys
).sort_values(by=['session_date', 'movement_num'])

In [None]:
#@title Optional: Uncomment to decorate biomechanics metrics with information from the MLB Stats API and filter based on stats from Stats API
# Filter the segment data down to a subset of pitches using metrics of your choice
# In the commented-out filter line below, choose any of the metrics and filter the dataframe by that value
# To apply multiple filters, combine conditions with & (and) or | (or), wrapping each condition in parentheses

# Common Issue:
# If a "Missing element" message appears, it reflects data integrity issues in parsing; the majority of the data is still likely fine.
# Common metrics to filter by are: start_speed, end_speed, spin_rate, spin_direction, zone, pitch_type

comparison_segment_data_df = decorate_primary_segment_df_with_stats_api(comparison_segment_data_df)
# comparison_segment_data_df = comparison_segment_data_df[(comparison_segment_data_df["start_speed"] > 90) & (comparison_segment_data_df["start_speed"] < 100)]

In [None]:
#@title Optional: Uncomment to have the AI automatically clean the data

# pandas_ai.clean_data(comparison_segment_data_df)

In [None]:
#@title Use AI to operate on the segment dataframes

llm_prompt_comp_operate = """
Concatenate dataframes along the index
"""

concatenated_df = pandas_ai.run([primary_segment_data_df, comparison_segment_data_df], llm_prompt_comp_operate)
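
Because AI output can be unpredictable, it may help to know what concatenating along the index looks like when done directly in pandas. The sketch below assumes two small made-up dataframes and adds a hypothetical `segment` column so rows stay distinguishable after stacking; it is an illustration, not what the AI is guaranteed to generate.

```python
import pandas as pd

# Made-up values for illustration only
primary_df = pd.DataFrame({"session_num": [1, 1], "start_speed": [94.0, 95.5]})
comparison_df = pd.DataFrame({"session_num": [2, 2], "start_speed": [91.0, 92.5]})

# Tag each row with its segment label so the source of each row is kept
labeled = [
    primary_df.assign(segment="Primary Segment"),
    comparison_df.assign(segment="Comparison Segment"),
]

# axis=0 stacks rows (concatenation along the index);
# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 0, 1
combined = pd.concat(labeled, axis=0, ignore_index=True)

# Quick check of the result: average start speed per segment
mean_speed = combined.groupby("segment")["start_speed"].mean()
```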

In [None]:
#@title Use Google Colab to explore the concatenated dataframes

# Once you run this cell, the dataframe will be displayed below
# Underneath the dataframe, there will be several buttons:
# - One button will open the dataframe in an interactive mode
# - The other button will suggest interesting visuals for the dataframe
#     - if you click on a suggested visual, Colab will show the code to create the visual
concatenated_df

In [None]:
#@title Use AI to explore the segment dataframes together

llm_prompt_comp_explore = """
Concatenate the dataframes by the index, then group by session num and pitch type,
then plot start speed vs pitch hand proj max, with a grid and a legend outside the chart area
"""

pandas_ai.run([primary_segment_data_df, comparison_segment_data_df], llm_prompt_comp_explore)