# Exploring `statsapi`

We found a Python wrapper for the MLB Stats API that might help us get the pitch data we need. Links are below.  

Repo:  
https://github.com/toddrob99/MLB-StatsAPI

Doc:  
https://toddrob99.github.io/MLB-StatsAPI/#statsapi.schedule

In [1]:
!pwd

/Users/werlindo/Dropbox/flatiron/mod_3_proj/dsc-3-final-project


### Libraries

In [1]:
import statsapi
from statsapi import ENDPOINTS

---

Let's look at `ENDPOINTS` to see if we can figure out what we can get.

In [2]:
ENDPOINTS

{'attendance': {'url': 'https://statsapi.mlb.com/api/{ver}/attendance',
  'path_params': {'ver': {'type': 'str',
    'default': 'v1',
    'leading_slash': False,
    'trailing_slash': False,
    'required': True}},
  'query_params': ['teamId',
   'leagueId',
   'season',
   'date',
   'leagueListId',
   'gameType',
   'fields'],
  'required_params': [['teamId'], ['leagueId'], ['leagueListid']]},
 'awards': {'url': 'https://statsapi.mlb.com/api/{ver}/awards{awardId}{recipients}',
  'path_params': {'ver': {'type': 'str',
    'default': 'v1',
    'leading_slash': False,
    'trailing_slash': False,
    'required': True},
   'awardId': {'type': 'str',
    'default': None,
    'leading_slash': True,
    'trailing_slash': False,
    'required': False},
   'recipients': {'type': 'bool',
    'default': True,
    'True': '/recipients',
    'False': '',
    'leading_slash': False,
    'trailing_slash': False,
    'required': False}},
  'query_params': ['sportId', 'leagueId', 'season', 'hydrate',

(I truncated the output above because it was very large.)  
On visual inspection, there are some promising features in there, e.g. `spin_rate`.  

But we still need to figure out how to navigate down to what we need. Maybe playing with some examples from the documentation will help spark some ideas.
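
Rather than scrolling through the full dump, a small helper could list each endpoint with its query parameters. This is only a sketch: `summarize_endpoints` is a hypothetical helper (not part of `statsapi`), and `sample_endpoints` is a trimmed copy of the `attendance` entry shown above so the snippet runs without the library. In the notebook it would be called as `summarize_endpoints(ENDPOINTS)`.

```python
# Hypothetical helper, not part of statsapi: map endpoint name -> query params.
def summarize_endpoints(endpoints):
    return {name: spec.get("query_params", []) for name, spec in endpoints.items()}


# Trimmed copy of the 'attendance' entry from the output above (stand-in for ENDPOINTS).
sample_endpoints = {
    "attendance": {
        "url": "https://statsapi.mlb.com/api/{ver}/attendance",
        "query_params": ["teamId", "leagueId", "season", "date",
                         "leagueListId", "gameType", "fields"],
        "required_params": [["teamId"], ["leagueId"], ["leagueListid"]],
    },
}

summary = summarize_endpoints(sample_endpoints)
for name, params in sorted(summary.items()):
    print(name, "->", ", ".join(params))
```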

Here's an example from the documentation:

In [185]:
print( statsapi.player_stats(next(x['id'] for x in statsapi.get('sports_players',{'season':2008,'gameType':'W'})['people'] if x['fullName']=='Chase Utley'), 'hitting', 'career') )

Chase "Silver Fox" Utley, 2B (2003-2018)

Career Hitting
gamesPlayed: 1937
groundOuts: 1792
runs: 1103
doubles: 411
triples: 58
homeRuns: 259
strikeOuts: 1193
baseOnBalls: 724
intentionalWalks: 62
hits: 1885
hitByPitch: 204
avg: .275
atBats: 6857
obp: .358
slg: .465
ops: .823
caughtStealing: 22
stolenBases: 154
groundIntoDoublePlay: 93
numberOfPitches: 31043
plateAppearances: 7863
totalBases: 3189
rbi: 1025
leftOnBase: 2780
sacBunts: 6
sacFlies: 72
babip: .297
groundOutsToAirouts: 0.84




...And another example:

In [None]:
statsapi.get('game_timestamps',{'gamePk':565997})

(Cleared the output above because it is too large to render.)  
Hmm, this is promising. We saw `game_playByPlay` in `ENDPOINTS`, and it looks like all we need is a `gamePk`.

In [3]:
test_game = statsapi.get('game_playByPlay',{'gamePk':565997})

In [208]:
type(test_game)

dict

In [11]:
test_game.keys()

dict_keys(['copyright', 'allPlays', 'currentPlay', 'scoringPlays', 'playsByInning'])

In [13]:
all_plays = test_game['allPlays']

In [82]:
all_plays[0]

{'result': {'type': 'atBat',
  'event': 'Strikeout',
  'eventType': 'strikeout',
  'description': 'Andrew McCutchen strikes out swinging.',
  'rbi': 0,
  'awayScore': 0,
  'homeScore': 0},
 'about': {'atBatIndex': 0,
  'halfInning': 'top',
  'inning': 1,
  'startTime': '2019-04-24T22:58:20.000Z',
  'endTime': '2019-04-24T23:13:04.000Z',
  'isComplete': True,
  'isScoringPlay': False,
  'hasReview': False,
  'hasOut': True,
  'captivatingIndex': 14},
 'count': {'balls': 3, 'strikes': 3, 'outs': 1},
 'matchup': {'batter': {'id': 457705,
   'fullName': 'Andrew McCutchen',
   'link': '/api/v1/people/457705'},
  'batSide': {'code': 'R', 'description': 'Right'},
  'pitcher': {'id': 450306,
   'fullName': 'Jason Vargas',
   'link': '/api/v1/people/450306'},
  'pitchHand': {'code': 'L', 'description': 'Left'},
  'batterHotColdZones': [],
  'pitcherHotColdZones': [],
  'splits': {'batter': 'vs_LHP', 'pitcher': 'vs_RHB', 'menOnBase': 'Empty'}},
 'pitchIndex': [0, 1, 2, 3, 4, 5],
 'actionIndex': 

In [88]:
type(all_plays[0])

dict

In [87]:
all_plays[0].keys()

dict_keys(['result', 'about', 'count', 'matchup', 'pitchIndex', 'actionIndex', 'runnerIndex', 'runners', 'playEvents', 'atBatIndex', 'playEndTime'])

In [84]:
type(all_plays[0].get('playEvents'))

list

In [27]:
pitch_num = all_plays[0].get('playEvents')[1].get('pitchNumber')
pitch_num

2
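
The same chain of `.get` calls can be wrapped to pull one field from every event in a play. A minimal sketch, assuming each entry in `playEvents` is a dict and that non-pitch events may lack `pitchNumber`; `pitch_numbers` and `sample_play` are illustrative names, and the sample is trimmed to the fields used here.

```python
def pitch_numbers(play):
    """Return pitchNumber for every event in a play that is flagged as a pitch."""
    return [
        event.get("pitchNumber")
        for event in play.get("playEvents", [])
        if event.get("isPitch")
    ]


# Trimmed stand-in for all_plays[0]; only the keys used above are kept.
sample_play = {
    "playEvents": [
        {"isPitch": True, "pitchNumber": 1},
        {"isPitch": True, "pitchNumber": 2},
    ]
}

print(pitch_numbers(sample_play))
```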

In [43]:
type(all_plays)

list

In [45]:
df = pd.DataFrame(all_plays)

In [None]:
df.columns

In [46]:
df.head()

Unnamed: 0,about,actionIndex,atBatIndex,count,matchup,pitchIndex,playEndTime,playEvents,result,runnerIndex,runners
0,"{'atBatIndex': 0, 'halfInning': 'top', 'inning...",[],0,"{'balls': 3, 'strikes': 3, 'outs': 1}","{'batter': {'id': 457705, 'fullName': 'Andrew ...","[0, 1, 2, 3, 4, 5]",2019-04-24T23:13:04.000Z,"[{'details': {'call': {'code': 'B', 'descripti...","{'type': 'atBat', 'event': 'Strikeout', 'event...",[0],"[{'movement': {'start': None, 'end': None, 'ou..."
1,"{'atBatIndex': 1, 'halfInning': 'top', 'inning...",[],1,"{'balls': 2, 'strikes': 2, 'outs': 1}","{'batter': {'id': 592663, 'fullName': 'J.T. Re...","[0, 1, 2, 3, 4, 5, 6, 7]",2019-04-24T23:15:53.000Z,"[{'details': {'call': {'code': 'S', 'descripti...","{'type': 'atBat', 'event': 'Double', 'eventTyp...",[0],"[{'movement': {'start': None, 'end': '2B', 'ou..."
2,"{'atBatIndex': 2, 'halfInning': 'top', 'inning...",[],2,"{'balls': 2, 'strikes': 0, 'outs': 1}","{'batter': {'id': 547180, 'fullName': 'Bryce H...","[0, 1, 2]",2019-04-24T23:17:43.000Z,"[{'details': {'call': {'code': 'B', 'descripti...","{'type': 'atBat', 'event': 'Double', 'eventTyp...","[0, 1, 2]","[{'movement': {'start': '2B', 'end': '3B', 'ou..."
3,"{'atBatIndex': 3, 'halfInning': 'top', 'inning...",[],3,"{'balls': 4, 'strikes': 2, 'outs': 1}","{'batter': {'id': 656555, 'fullName': 'Rhys Ho...","[0, 1, 2, 3, 4, 5, 6]",2019-04-24T23:21:06.000Z,"[{'details': {'call': {'code': 'B', 'descripti...","{'type': 'atBat', 'event': 'Walk', 'eventType'...",[0],"[{'movement': {'start': None, 'end': '1B', 'ou..."
4,"{'atBatIndex': 4, 'halfInning': 'top', 'inning...",[],4,"{'balls': 0, 'strikes': 0, 'outs': 2}","{'batter': {'id': 596748, 'fullName': 'Maikel ...",[0],2019-04-24T23:21:54.000Z,"[{'details': {'call': {'code': 'X', 'descripti...","{'type': 'atBat', 'event': 'Flyout', 'eventTyp...",[0],"[{'movement': {'start': None, 'end': None, 'ou..."


In [40]:
df = pd.DataFrame.from_dict(test_game)


ValueError: arrays must all be same length

In [184]:
len(test_game)

5
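
The `ValueError` is most likely because the five top-level values do not share one length (`copyright` is a single string, while `allPlays` and `playsByInning` are lists), so they cannot be lined up as columns. A quick way to check is to print the type and length of each value. The sketch below uses a small stand-in dict with the same keys as `test_game`; the contents are placeholders, not real API output, and `describe_top_level` is an illustrative name.

```python
def describe_top_level(response):
    """Map each top-level key to (type name, length or None for non-containers)."""
    summary = {}
    for key, value in response.items():
        size = len(value) if isinstance(value, (list, dict)) else None
        summary[key] = (type(value).__name__, size)
    return summary


# Placeholder stand-in with the same keys as test_game (values are not real data).
sample_game = {
    "copyright": "...",
    "allPlays": [{}, {}, {}],
    "currentPlay": {"result": {}},
    "scoringPlays": [2],
    "playsByInning": [{}],
}

for key, info in describe_top_level(sample_game).items():
    print(key, info)
```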

- Each dictionary in `all_plays` is a single 'play'.

In [125]:
print(json.dumps(all_plays[-1], indent=4))

{
    "result": {
        "type": "atBat",
        "event": "Strikeout",
        "eventType": "strikeout",
        "description": "J.D. Davis strikes out swinging.",
        "rbi": 0,
        "awayScore": 6,
        "homeScore": 0
    },
    "about": {
        "atBatIndex": 79,
        "halfInning": "bottom",
        "inning": 9,
        "startTime": "2019-04-25T02:30:32.000Z",
        "endTime": "2019-04-25T02:32:40.000Z",
        "isComplete": true,
        "isScoringPlay": false,
        "hasReview": false,
        "hasOut": true,
        "captivatingIndex": 14
    },
    "count": {
        "balls": 3,
        "strikes": 3,
        "outs": 3
    },
    "matchup": {
        "batter": {
            "id": 605204,
            "fullName": "J.D. Davis",
            "link": "/api/v1/people/605204"
        },
        "batSide": {
            "code": "R",
            "description": "Right"
        },
        "pitcher": {
            "id": 504379,
            "fullName": "Juan Nicasio",
     

In [112]:
test_game.keys()

dict_keys(['copyright', 'allPlays', 'currentPlay', 'scoringPlays', 'playsByInning'])

### So just on inspection, I think the meat of what we need is in `allPlays`

I'm curious, though, as to what is in `playsByInning`

In [113]:
test_game.get('playsByInning')

[{'startIndex': 0,
  'endIndex': 11,
  'top': [0, 1, 2, 3, 4, 5],
  'bottom': [6, 7, 8, 9, 10, 11],
  'hits': {'away': [{'team': {'id': 143,
      'name': 'Philadelphia Phillies',
      'link': '/api/v1/teams/143',
      'springLeague': {'id': 115,
       'name': 'Grapefruit League',
       'link': '/api/v1/league/115',
       'abbreviation': 'GL'},
      'allStarStatus': 'N'},
     'inning': 1,
     'pitcher': {'id': 450306,
      'fullName': 'Jason Vargas',
      'link': '/api/v1/people/450306'},
     'batter': {'id': 592663,
      'fullName': 'J.T. Realmuto',
      'link': '/api/v1/people/592663'},
     'coordinates': {'x': 167.5117925095603, 'y': 55.70531840499331},
     'type': 'H',
     'description': 'Double'},
    {'team': {'id': 143,
      'name': 'Philadelphia Phillies',
      'link': '/api/v1/teams/143',
      'springLeague': {'id': 115,
       'name': 'Grapefruit League',
       'link': '/api/v1/league/115',
       'abbreviation': 'GL'},
      'allStarStatus': 'N'},
     'i

This appears a little too aggregated, so we should probably just focus on `allPlays`.

---

!pip install flatten_json

Let's try to flatten the game by using `flatten_json`

In [31]:
import flatten_json
from flatten_json import flatten

In [104]:
is_it_flat = flatten(test_game)

In [105]:
type(is_it_flat)

dict

In [106]:
is_it_flat

{'copyright': 'Copyright 2019 MLB Advanced Media, L.P.  Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt',
 'allPlays_0_result_type': 'atBat',
 'allPlays_0_result_event': 'Strikeout',
 'allPlays_0_result_eventType': 'strikeout',
 'allPlays_0_result_description': 'Andrew McCutchen strikes out swinging.',
 'allPlays_0_result_rbi': 0,
 'allPlays_0_result_awayScore': 0,
 'allPlays_0_result_homeScore': 0,
 'allPlays_0_about_atBatIndex': 0,
 'allPlays_0_about_halfInning': 'top',
 'allPlays_0_about_inning': 1,
 'allPlays_0_about_startTime': '2019-04-24T22:58:20.000Z',
 'allPlays_0_about_endTime': '2019-04-24T23:13:04.000Z',
 'allPlays_0_about_isComplete': True,
 'allPlays_0_about_isScoringPlay': False,
 'allPlays_0_about_hasReview': False,
 'allPlays_0_about_hasOut': True,
 'allPlays_0_about_captivatingIndex': 14,
 'allPlays_0_count_balls': 3,
 'allPlays_0_count_strikes': 3,
 'allPlays_0_count_outs': 1,
 'allPlays_0_ma

In [39]:
len(is_it_flat)

24307

So, it is a bit too flat, especially in how it flattens 'across' rather than 'down'. For example, we want a row per pitch, not a new set of columns for each pitch.  
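
To illustrate the 'across' behaviour, here is a simplified stand-in for this kind of flattening (a sketch for illustration only, not `flatten_json`'s actual implementation). Every list element gets its own indexed key, so each extra pitch adds keys rather than a row.

```python
def flatten_across(obj, prefix=""):
    """Simplified stand-in: join nested keys and list positions with underscores."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        new_prefix = f"{prefix}_{key}" if prefix else str(key)
        flat.update(flatten_across(value, new_prefix))
    return flat


# Two pitches from the first at-bat (speeds taken from the outputs in this notebook).
sample = {
    "playEvents": [
        {"pitchNumber": 1, "pitchData": {"startSpeed": 84.3}},
        {"pitchNumber": 2, "pitchData": {"startSpeed": 84.2}},
    ]
}

flat = flatten_across(sample)
for key, value in flat.items():
    print(key, value)
```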

How about using plain `json`?

Also, we found `pd.io.json.json_normalize`.

In [7]:
import json
import pandas as pd
from pandas.io.json import json_normalize  # in pandas 1.0+, this is available as pd.json_normalize

Let's go back to `allPlays` and try to flatten it.

In [8]:
all_plays = test_game.get('allPlays')

In [9]:
type(all_plays)

list

In [10]:
test_df = json_normalize(all_plays)

In [11]:
type(test_df)

pandas.core.frame.DataFrame

In [12]:
test_df.columns

Index(['about.atBatIndex', 'about.captivatingIndex', 'about.endTime',
       'about.halfInning', 'about.hasOut', 'about.hasReview', 'about.inning',
       'about.isComplete', 'about.isScoringPlay', 'about.startTime',
       'actionIndex', 'atBatIndex', 'count.balls', 'count.outs',
       'count.strikes', 'matchup.batSide.code', 'matchup.batSide.description',
       'matchup.batter.fullName', 'matchup.batter.id', 'matchup.batter.link',
       'matchup.batterHotColdZoneStats.stats', 'matchup.batterHotColdZones',
       'matchup.pitchHand.code', 'matchup.pitchHand.description',
       'matchup.pitcher.fullName', 'matchup.pitcher.id',
       'matchup.pitcher.link', 'matchup.pitcherHotColdZoneStats.stats',
       'matchup.pitcherHotColdZones', 'matchup.splits.batter',
       'matchup.splits.menOnBase', 'matchup.splits.pitcher', 'pitchIndex',
       'playEndTime', 'playEvents', 'result.awayScore', 'result.description',
       'result.event', 'result.eventType', 'result.homeScore', 'result.rb

In [13]:
# each entry is a list of pitch indices, presumably
test_df['pitchIndex'].tail()

75                   [0]
76          [1, 2, 3, 4]
77    [1, 2, 3, 4, 5, 6]
78             [0, 1, 2]
79    [1, 2, 3, 4, 5, 6]
Name: pitchIndex, dtype: object
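
Some of these lists start at 1 rather than 0, which suggests the first event in those plays was not a pitch. If `pitchIndex` holds positions into `playEvents` (an assumption here, not something checked against the API documentation), the pitches for a play could be picked out as below. The sample play is made up for illustration, and `pitch_events` is a hypothetical helper.

```python
def pitch_events(play):
    """Select the playEvents entries whose positions are listed in pitchIndex."""
    events = play.get("playEvents", [])
    return [events[i] for i in play.get("pitchIndex", []) if i < len(events)]


# Made-up play: one non-pitch event followed by two pitches.
sample_play = {
    "pitchIndex": [1, 2],
    "playEvents": [
        {"isPitch": False, "type": "action"},
        {"isPitch": True, "pitchNumber": 1},
        {"isPitch": True, "pitchNumber": 2},
    ],
}

selected = pitch_events(sample_play)
print([event["pitchNumber"] for event in selected])
```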

In [43]:
test_df.head()

Unnamed: 0,about.atBatIndex,about.captivatingIndex,about.endTime,about.halfInning,about.hasOut,about.hasReview,about.inning,about.isComplete,about.isScoringPlay,about.startTime,...,playEvents,result.awayScore,result.description,result.event,result.eventType,result.homeScore,result.rbi,result.type,runnerIndex,runners
0,0,14,2019-04-24T23:13:04.000Z,top,True,False,1,True,False,2019-04-24T22:58:20.000Z,...,"[{'details': {'call': {'code': 'B', 'descripti...",0,Andrew McCutchen strikes out swinging.,Strikeout,strikeout,0,0,atBat,[0],"[{'movement': {'start': None, 'end': None, 'ou..."
1,1,34,2019-04-24T23:15:53.000Z,top,False,False,1,True,False,2019-04-24T23:13:05.000Z,...,"[{'details': {'call': {'code': 'S', 'descripti...",0,J.T. Realmuto doubles (4) on a sharp line driv...,Double,double,0,0,atBat,[0],"[{'movement': {'start': None, 'end': '2B', 'ou..."
2,2,34,2019-04-24T23:17:43.000Z,top,False,False,1,True,True,2019-04-24T23:15:55.000Z,...,"[{'details': {'call': {'code': 'B', 'descripti...",1,Bryce Harper doubles (7) on a sharp line drive...,Double,double,0,1,atBat,"[0, 1, 2]","[{'movement': {'start': '2B', 'end': '3B', 'ou..."
3,3,0,2019-04-24T23:21:06.000Z,top,False,False,1,True,False,2019-04-24T23:17:45.000Z,...,"[{'details': {'call': {'code': 'B', 'descripti...",1,Rhys Hoskins walks.,Walk,walk,0,0,atBat,[0],"[{'movement': {'start': None, 'end': '1B', 'ou..."
4,4,0,2019-04-24T23:21:54.000Z,top,True,False,1,True,False,2019-04-24T23:21:08.000Z,...,"[{'details': {'call': {'code': 'X', 'descripti...",1,Maikel Franco flies out to right fielder Micha...,Flyout,field_out,0,0,atBat,[0],"[{'movement': {'start': None, 'end': None, 'ou..."


In [45]:
test_df.iloc[:,:11].head()

Unnamed: 0,about.atBatIndex,about.captivatingIndex,about.endTime,about.halfInning,about.hasOut,about.hasReview,about.inning,about.isComplete,about.isScoringPlay,about.startTime,actionIndex
0,0,14,2019-04-24T23:13:04.000Z,top,True,False,1,True,False,2019-04-24T22:58:20.000Z,[]
1,1,34,2019-04-24T23:15:53.000Z,top,False,False,1,True,False,2019-04-24T23:13:05.000Z,[]
2,2,34,2019-04-24T23:17:43.000Z,top,False,False,1,True,True,2019-04-24T23:15:55.000Z,[]
3,3,0,2019-04-24T23:21:06.000Z,top,False,False,1,True,False,2019-04-24T23:17:45.000Z,[]
4,4,0,2019-04-24T23:21:54.000Z,top,True,False,1,True,False,2019-04-24T23:21:08.000Z,[]


In [48]:
test_df.iloc[:,11:22].head()

Unnamed: 0,atBatIndex,count.balls,count.outs,count.strikes,matchup.batSide.code,matchup.batSide.description,matchup.batter.fullName,matchup.batter.id,matchup.batter.link,matchup.batterHotColdZoneStats.stats,matchup.batterHotColdZones
0,0,3,1,3,R,Right,Andrew McCutchen,457705,/api/v1/people/457705,,[]
1,1,2,1,2,R,Right,J.T. Realmuto,592663,/api/v1/people/592663,,[]
2,2,2,1,0,L,Left,Bryce Harper,547180,/api/v1/people/547180,,[]
3,3,4,1,2,R,Right,Rhys Hoskins,656555,/api/v1/people/656555,,[]
4,4,0,2,0,R,Right,Maikel Franco,596748,/api/v1/people/596748,,[]


In [49]:
test_df.iloc[:,22:33].head()

Unnamed: 0,matchup.pitchHand.code,matchup.pitchHand.description,matchup.pitcher.fullName,matchup.pitcher.id,matchup.pitcher.link,matchup.pitcherHotColdZoneStats.stats,matchup.pitcherHotColdZones,matchup.splits.batter,matchup.splits.menOnBase,matchup.splits.pitcher,pitchIndex
0,L,Left,Jason Vargas,450306,/api/v1/people/450306,,[],vs_LHP,Empty,vs_RHB,"[0, 1, 2, 3, 4, 5]"
1,L,Left,Jason Vargas,450306,/api/v1/people/450306,,[],vs_LHP,RISP,vs_RHB,"[0, 1, 2, 3, 4, 5, 6, 7]"
2,L,Left,Jason Vargas,450306,/api/v1/people/450306,,[],vs_LHP,RISP,vs_LHB,"[0, 1, 2]"
3,L,Left,Jason Vargas,450306,/api/v1/people/450306,,[],vs_LHP,RISP,vs_RHB,"[0, 1, 2, 3, 4, 5, 6]"
4,L,Left,Jason Vargas,450306,/api/v1/people/450306,,[],vs_LHP,RISP,vs_RHB,[0]


In [62]:
test_df.iloc[:,33:44].head()

Unnamed: 0,playEndTime,playEvents,result.awayScore,result.description,result.event,result.eventType,result.homeScore,result.rbi,result.type,runnerIndex,runners
0,2019-04-24T23:13:04.000Z,"[{'details': {'call': {'code': 'B', 'descripti...",0,Andrew McCutchen strikes out swinging.,Strikeout,strikeout,0,0,atBat,[0],"[{'movement': {'start': None, 'end': None, 'ou..."
1,2019-04-24T23:15:53.000Z,"[{'details': {'call': {'code': 'S', 'descripti...",0,J.T. Realmuto doubles (4) on a sharp line driv...,Double,double,0,0,atBat,[0],"[{'movement': {'start': None, 'end': '2B', 'ou..."
2,2019-04-24T23:17:43.000Z,"[{'details': {'call': {'code': 'B', 'descripti...",1,Bryce Harper doubles (7) on a sharp line drive...,Double,double,0,1,atBat,"[0, 1, 2]","[{'movement': {'start': '2B', 'end': '3B', 'ou..."
3,2019-04-24T23:21:06.000Z,"[{'details': {'call': {'code': 'B', 'descripti...",1,Rhys Hoskins walks.,Walk,walk,0,0,atBat,[0],"[{'movement': {'start': None, 'end': '1B', 'ou..."
4,2019-04-24T23:21:54.000Z,"[{'details': {'call': {'code': 'X', 'descripti...",1,Maikel Franco flies out to right fielder Micha...,Flyout,field_out,0,0,atBat,[0],"[{'movement': {'start': None, 'end': None, 'ou..."


Having inspected these columns and nested dicts, I think we have captured everything obviously useful for our business question(s).

In [67]:
test_df.shape

(80, 44)

## Prototype for data pull

In [68]:
list_for_new_df = []

# Data from allPlays
ap_sel_cols = ['about.atBatIndex', 'matchup.batSide.code', 'matchup.pitchHand.code', 'count.balls'
              ,'count.strikes', 'count.outs']

# Data from playEvents
plev_sel_cols = ['details.type.code', 'details.type.description', 
            'details.call.code', 'details.call.description', 
            'details.isBall', 'isPitch', 'details.isStrike'
            ,'pitchData.breaks.breakAngle'
            ,'pitchData.breaks.breakLength', 'pitchData.breaks.breakY'
            ,'pitchData.breaks.spinDirection', 'pitchData.breaks.spinRate'
            ,'pitchData.coordinates.aX'
            , 'pitchData.coordinates.aY','pitchData.coordinates.aZ', 'pitchData.coordinates.pX'
            , 'pitchData.coordinates.pZ', 'pitchData.coordinates.pfxX', 'pitchData.coordinates.pfxZ'
            , 'pitchData.coordinates.vX0', 'pitchData.coordinates.vY0', 'pitchData.coordinates.vZ0'
            , 'pitchData.coordinates.x', 'pitchData.coordinates.x0', 'pitchData.coordinates.y'
            , 'pitchData.coordinates.y0','pitchData.coordinates.z0', 'pitchData.endSpeed'
            , 'pitchData.startSpeed', 'pitchNumber'
           ]

# Walk each play (one row of test_df) and expand its nested lists with json_normalize.
# For quick testing, iterate over test_df.head(2) instead.
for index, row in test_df.iterrows():

    # playEvents is a nested list of dicts, so json_normalize it
    play_events_df = json_normalize(row['playEvents'])

    # runners is also a nested list; normalized here only so it can be inspected below
    runners_df = json_normalize(row['runners'])

    # Each row of this nested dataframe becomes one row in the target df
    for plev_ind, plev_row in play_events_df.iterrows():

        # New dict, which will be a single row in the target df
        curr_dict = {}

        # Play-level fields, repeated on every event of the play
        for col_ap in ap_sel_cols:
            curr_dict[col_ap] = row[col_ap]

        # Event-level (pitch) fields
        for col_plev in plev_sel_cols:
            curr_dict[col_plev] = plev_row[col_plev]

        # Collect the row dictionary into the list
        list_for_new_df.append(curr_dict)



Let's look at the last iteration of `play_events_df`

In [69]:
play_events_df.columns

Index(['battingOrder', 'count.balls', 'count.outs', 'count.strikes',
       'details.awayScore', 'details.ballColor', 'details.call.code',
       'details.call.description', 'details.code', 'details.description',
       'details.event', 'details.eventType', 'details.hasReview',
       'details.homeScore', 'details.isBall', 'details.isInPlay',
       'details.isScoringPlay', 'details.isStrike', 'details.trailColor',
       'details.type.code', 'details.type.description', 'endTime', 'index',
       'isPitch', 'pfxId', 'pitchData.breaks.breakAngle',
       'pitchData.breaks.breakLength', 'pitchData.breaks.breakY',
       'pitchData.breaks.spinDirection', 'pitchData.breaks.spinRate',
       'pitchData.coordinates.aX', 'pitchData.coordinates.aY',
       'pitchData.coordinates.aZ', 'pitchData.coordinates.pX',
       'pitchData.coordinates.pZ', 'pitchData.coordinates.pfxX',
       'pitchData.coordinates.pfxZ', 'pitchData.coordinates.vX0',
       'pitchData.coordinates.vY0', 'pitchData.coordin

Let's look at the last iteration of `runners_df`

In [71]:
runners_df.head().T

Unnamed: 0,0
credits,"[{'player': {'id': 592663, 'link': '/api/v1/pe..."
details.earned,False
details.event,Strikeout
details.eventType,strikeout
details.isScoringEvent,False
details.movementReason,
details.playIndex,6
details.rbi,False
details.responsiblePitcher,
details.runner.fullName,J.D. Davis


Nothing here looks immediately useful.

In [72]:
len(list_for_new_df)

365

Check that all the selected fields make sense.

In [73]:
list_for_new_df[:1]

[{'about.atBatIndex': 0,
  'matchup.batSide.code': 'R',
  'matchup.pitchHand.code': 'L',
  'count.balls': 3,
  'count.strikes': 3,
  'count.outs': 1,
  'details.type.code': 'FF',
  'details.type.description': 'Four-Seam Fastball',
  'details.call.code': 'B',
  'details.call.description': 'Ball - Called',
  'details.isBall': True,
  'isPitch': True,
  'details.isStrike': False,
  'pitchData.breaks.breakAngle': 36.0,
  'pitchData.breaks.breakLength': 4.8,
  'pitchData.breaks.breakY': 24.0,
  'pitchData.breaks.spinDirection': 146,
  'pitchData.breaks.spinRate': 2303,
  'pitchData.coordinates.aX': 12.55,
  'pitchData.coordinates.aY': 26.39,
  'pitchData.coordinates.aZ': -13.99,
  'pitchData.coordinates.pX': 1.48,
  'pitchData.coordinates.pZ': 1.58,
  'pitchData.coordinates.pfxX': 8.38,
  'pitchData.coordinates.pfxZ': 12.13,
  'pitchData.coordinates.vX0': -5.79,
  'pitchData.coordinates.vY0': -122.28,
  'pitchData.coordinates.vZ0': -6.67,
  'pitchData.coordinates.x': 60.67,
  'pitchData.coo

Seems legit.

In [75]:
# Proof of concept on target dataframe
new_df = pd.DataFrame(list_for_new_df)

new_df.head(10)

Unnamed: 0,about.atBatIndex,count.balls,count.outs,count.strikes,details.call.code,details.call.description,details.isBall,details.isStrike,details.type.code,details.type.description,...,pitchData.coordinates.vY0,pitchData.coordinates.vZ0,pitchData.coordinates.x,pitchData.coordinates.x0,pitchData.coordinates.y,pitchData.coordinates.y0,pitchData.coordinates.z0,pitchData.endSpeed,pitchData.startSpeed,pitchNumber
0,0,3,1,3,B,Ball - Called,True,False,FF,Four-Seam Fastball,...,-122.28,-6.67,60.67,2.8,196.16,50.0,5.57,76.4,84.3,1.0
1,0,3,1,3,B,Ball - Called,True,False,FT,Two-Seam Fastball,...,-122.19,-3.37,55.45,2.74,170.3,50.0,5.69,76.3,84.2,2.0
2,0,3,1,3,S,Strike - Swinging,False,True,FT,Two-Seam Fastball,...,-123.08,-5.83,125.65,2.6,193.72,50.0,5.5,77.0,84.9,3.0
3,0,3,1,3,B,Ball - Called,True,False,CH,Changeup,...,-116.93,-5.93,96.61,2.48,206.66,50.0,5.71,73.4,80.8,4.0
4,0,3,1,3,S,Strike - Swinging,False,True,FT,Two-Seam Fastball,...,-123.69,-3.51,122.61,2.55,167.13,50.0,5.65,78.0,85.2,5.0
5,0,3,1,3,S,Strike - Swinging,False,True,FF,Four-Seam Fastball,...,-126.23,-4.25,62.49,2.8,173.25,50.0,5.54,79.1,86.9,6.0
6,1,2,1,2,S,Strike - Swinging,False,True,FF,Four-Seam Fastball,...,-123.85,-6.53,106.5,2.58,195.08,50.0,5.56,77.9,85.4,1.0
7,1,2,1,2,B,Ball - Called,True,False,FF,Four-Seam Fastball,...,-124.61,-2.5,147.61,2.32,147.0,50.0,5.79,78.0,86.0,2.0
8,1,2,1,2,S,Strike - Swinging,False,True,FT,Two-Seam Fastball,...,-124.14,-2.75,68.1,2.62,159.87,50.0,5.66,77.3,85.5,3.0
9,1,2,1,2,S,Strike - Swinging,False,True,FF,Four-Seam Fastball,...,-126.17,-1.25,140.82,2.52,137.07,50.0,5.67,78.8,87.0,4.0


Looks good!
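
One thing to watch in the table above: `count.balls`, `count.strikes` and `count.outs` are identical on every row of an at-bat, because they are copied from the play level and so reflect the final count. `play_events_df.columns` also lists event-level `count.*` fields, which appear to track the count pitch by pitch. A minimal sketch of the difference, using the first at-bat's counts as they appear in this notebook's outputs (`rows` and the output key names are illustrative):

```python
# First at-bat: play-level count plus the per-event counts for the first five pitches.
play = {
    "count": {"balls": 3, "strikes": 3, "outs": 1},
    "playEvents": [
        {"pitchNumber": 1, "count": {"balls": 1, "strikes": 0}},
        {"pitchNumber": 2, "count": {"balls": 2, "strikes": 0}},
        {"pitchNumber": 3, "count": {"balls": 2, "strikes": 1}},
        {"pitchNumber": 4, "count": {"balls": 3, "strikes": 1}},
        {"pitchNumber": 5, "count": {"balls": 3, "strikes": 2}},
    ],
}

rows = [
    {
        "pitchNumber": event["pitchNumber"],
        "play_balls": play["count"]["balls"],      # same on every row
        "event_balls": event["count"]["balls"],    # changes pitch by pitch
        "event_strikes": event["count"]["strikes"],
    }
    for event in play["playEvents"]
]

for row in rows:
    print(row)
```

If the per-pitch count turns out to matter for modelling, the event-level fields would need distinct names in the row dict, since the play-level and event-level columns share the same dotted names.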

In [77]:
new_df.shape

(365, 36)
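
The nested loop in the prototype could probably be collapsed into a single `json_normalize` call using `record_path` and `meta`. This is a sketch of that alternative rather than what the notebook ran: it uses `pd.json_normalize` (the top-level name in pandas 1.0 and later) and a trimmed sample of two plays with values copied from the tables above.

```python
import pandas as pd

# Trimmed sample of allPlays; values copied from the outputs above.
plays = [
    {
        "about": {"atBatIndex": 0},
        "matchup": {"batSide": {"code": "R"}, "pitchHand": {"code": "L"}},
        "playEvents": [
            {"isPitch": True, "pitchNumber": 1,
             "details": {"type": {"code": "FF"}},
             "pitchData": {"startSpeed": 84.3, "endSpeed": 76.4}},
            {"isPitch": True, "pitchNumber": 2,
             "details": {"type": {"code": "FT"}},
             "pitchData": {"startSpeed": 84.2, "endSpeed": 76.3}},
        ],
    },
    {
        "about": {"atBatIndex": 1},
        "matchup": {"batSide": {"code": "R"}, "pitchHand": {"code": "L"}},
        "playEvents": [
            {"isPitch": True, "pitchNumber": 1,
             "details": {"type": {"code": "FF"}},
             "pitchData": {"startSpeed": 85.4, "endSpeed": 77.9}},
        ],
    },
]

# One row per playEvents entry, with selected play-level fields repeated on each row.
pitches = pd.json_normalize(
    plays,
    record_path="playEvents",
    meta=[["about", "atBatIndex"],
          ["matchup", "batSide", "code"],
          ["matchup", "pitchHand", "code"]],
)

print(pitches[["about.atBatIndex", "pitchNumber", "details.type.code",
               "pitchData.startSpeed"]])
```

On the real data, a play-level field whose dotted name also exists at the event level (such as `count.balls`) would clash with the event columns; passing `meta_prefix` is one way around that.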

In [78]:
new_df.tail()

Unnamed: 0,about.atBatIndex,count.balls,count.outs,count.strikes,details.call.code,details.call.description,details.isBall,details.isStrike,details.type.code,details.type.description,...,pitchData.coordinates.vY0,pitchData.coordinates.vZ0,pitchData.coordinates.x,pitchData.coordinates.x0,pitchData.coordinates.y,pitchData.coordinates.y0,pitchData.coordinates.z0,pitchData.endSpeed,pitchData.startSpeed,pitchNumber
360,79,3,3,3,B,Ball - Called,True,False,SL,Slider,...,-125.24,-6.89,44.66,-1.8,209.73,50.0,6.01,79.7,86.3,2.0
361,79,3,3,3,B,Ball - Called,True,False,SL,Slider,...,-126.37,-8.23,125.22,-2.16,219.16,50.0,5.96,80.2,87.0,3.0
362,79,3,3,3,S,Strike - Swinging,False,True,FF,Four-Seam Fastball,...,-133.58,-6.46,91.21,-1.75,174.95,50.0,6.0,84.1,92.0,4.0
363,79,3,3,3,B,Ball - Called,True,False,SL,Slider,...,-125.9,-7.55,91.17,-1.81,218.02,50.0,5.89,80.3,86.7,5.0
364,79,3,3,3,S,Strike - Swinging,False,True,FF,Four-Seam Fastball,...,-134.17,-4.97,119.54,-1.91,155.35,50.0,5.88,84.1,92.3,6.0


In [79]:
new_df.columns

Index(['about.atBatIndex', 'count.balls', 'count.outs', 'count.strikes',
       'details.call.code', 'details.call.description', 'details.isBall',
       'details.isStrike', 'details.type.code', 'details.type.description',
       'isPitch', 'matchup.batSide.code', 'matchup.pitchHand.code',
       'pitchData.breaks.breakAngle', 'pitchData.breaks.breakLength',
       'pitchData.breaks.breakY', 'pitchData.breaks.spinDirection',
       'pitchData.breaks.spinRate', 'pitchData.coordinates.aX',
       'pitchData.coordinates.aY', 'pitchData.coordinates.aZ',
       'pitchData.coordinates.pX', 'pitchData.coordinates.pZ',
       'pitchData.coordinates.pfxX', 'pitchData.coordinates.pfxZ',
       'pitchData.coordinates.vX0', 'pitchData.coordinates.vY0',
       'pitchData.coordinates.vZ0', 'pitchData.coordinates.x',
       'pitchData.coordinates.x0', 'pitchData.coordinates.y',
       'pitchData.coordinates.y0', 'pitchData.coordinates.z0',
       'pitchData.endSpeed', 'pitchData.startSpeed', 'pitchNu

### **OK, I think we have enough here. Going to pull a cleaned-up version into a new workbook to act as a pseudo-template for work moving forward.**

-----

-----

# Old or Abandoned Ideas

In [49]:
yyy.head()

Unnamed: 0,battingOrder,count,details,endTime,hitData,index,isPitch,pfxId,pitchData,pitchNumber,playId,player,position,startTime,type
0,,"{'balls': 1, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:04.000Z,,0,True,190424_231125,"{'startSpeed': 84.3, 'endSpeed': 76.4, 'strike...",1.0,e7c6b8db-d32a-4592-a7ba-7747fc091a6a,,,2019-04-24T23:11:20.000Z,pitch
1,,"{'balls': 2, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:17.000Z,,1,True,190424_231209,"{'startSpeed': 84.2, 'endSpeed': 76.3, 'strike...",2.0,4903682e-ec48-4360-9c75-3ec89bf7f15d,,,2019-04-24T23:12:04.000Z,pitch
2,,"{'balls': 2, 'strikes': 1}","{'call': {'code': 'S', 'description': 'Strike ...",2019-04-24T23:12:30.000Z,,2,True,190424_231222,"{'startSpeed': 84.9, 'endSpeed': 77.0, 'strike...",3.0,8c8d56a6-0bc7-4ee1-b3a1-bcea95d4ce64,,,2019-04-24T23:12:17.000Z,pitch
3,,"{'balls': 3, 'strikes': 1}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:43.000Z,,3,True,190424_231236,"{'startSpeed': 80.8, 'endSpeed': 73.4, 'strike...",4.0,37f3f56e-f1d4-4131-b82c-6c88c4d009dc,,,2019-04-24T23:12:30.000Z,pitch
4,,"{'balls': 3, 'strikes': 2}","{'call': {'code': 'S', 'description': 'Strike ...",2019-04-24T23:12:59.000Z,,4,True,190424_231248,"{'startSpeed': 85.2, 'endSpeed': 78.0, 'strike...",5.0,3dc32a07-e2bf-4b6f-8712-5b16ee33b1d5,,,2019-04-24T23:12:43.000Z,pitch


In [50]:
yyy.columns

Index(['battingOrder', 'count', 'details', 'endTime', 'hitData', 'index',
       'isPitch', 'pfxId', 'pitchData', 'pitchNumber', 'playId', 'player',
       'position', 'startTime', 'type'],
      dtype='object')

In [58]:
def only_dict(d):
    '''
    Convert json string representation of dictionary to a python dict
    '''
    return json.loads(d)

In [59]:
def list_of_dicts(ld):
    '''
    Create a mapping of the tuples formed after 
    converting json strings of list to a python list   
    '''
    return dict([(list(d.values())[1], list(d.values())[0]) for d in json.loads(ld)])


In [None]:
A = json_normalize(df['columnA'].apply(only_dict).tolist()).add_prefix('columnA.')
B = json_normalize(df['columnB'].apply(list_of_dicts).tolist()).add_prefix('columnB.pos.') 

In [62]:
aaa = json_normalize(yyy['pitchData'].tolist()).add_prefix('col_pitch_data.')


AttributeError: 'float' object has no attribute 'values'
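
The error above is consistent with some rows of `pitchData` being `NaN` (a float) rather than a dict, which could happen for events that are not pitches. One possible workaround, sketched here with `pd.json_normalize` and a small made-up frame (`events`, `has_pitch_data` and `pitch_data` are illustrative names), is to normalize only the dict rows and keep their original index:

```python
import pandas as pd

# Made-up frame: the middle row stands in for a non-pitch event with no pitchData.
events = pd.DataFrame({
    "isPitch": [True, False, True],
    "pitchData": [
        {"startSpeed": 84.3, "endSpeed": 76.4},
        float("nan"),
        {"startSpeed": 84.2, "endSpeed": 76.3},
    ],
})

# Keep only rows where pitchData is actually a dict.
has_pitch_data = events["pitchData"].apply(lambda value: isinstance(value, dict))

pitch_data = pd.json_normalize(events.loc[has_pitch_data, "pitchData"].tolist())
pitch_data = pitch_data.add_prefix("col_pitch_data.")
pitch_data.index = events.index[has_pitch_data]  # realign with the source rows

print(pitch_data)
```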

In [60]:
yyy.head(2)

Unnamed: 0,battingOrder,count,details,endTime,hitData,index,isPitch,pfxId,pitchData,pitchNumber,playId,player,position,startTime,type
0,,"{'balls': 1, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:04.000Z,,0,True,190424_231125,"{'startSpeed': 84.3, 'endSpeed': 76.4, 'strike...",1.0,e7c6b8db-d32a-4592-a7ba-7747fc091a6a,,,2019-04-24T23:11:20.000Z,pitch
1,,"{'balls': 2, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:17.000Z,,1,True,190424_231209,"{'startSpeed': 84.2, 'endSpeed': 76.3, 'strike...",2.0,4903682e-ec48-4360-9c75-3ec89bf7f15d,,,2019-04-24T23:12:04.000Z,pitch


In [75]:
pitch_data_example = yyy.loc[yyy.index[0],'pitchData']

In [76]:
type(pitch_data_example)

dict

In [79]:
json_normalize(pitch_data_example)

Unnamed: 0,breaks.breakAngle,breaks.breakLength,breaks.breakY,breaks.spinDirection,breaks.spinRate,coordinates.aX,coordinates.aY,coordinates.aZ,coordinates.pX,coordinates.pZ,...,coordinates.x,coordinates.x0,coordinates.y,coordinates.y0,coordinates.z0,endSpeed,startSpeed,strikeZoneBottom,strikeZoneTop,zone
0,36.0,4.8,24.0,146,2303,12.55,26.39,-13.99,1.48,1.58,...,60.67,2.8,196.16,50.0,5.57,76.4,84.3,1.66,3.42,14


---

---

---

## Look at `mlbgame` module

In [7]:
import mlbgame

In [15]:
from __future__ import print_function
import mlbgame

#month = mlbgame.games(2015, 6, home='Mets')
month = mlbgame.games(2015, 6, 1)

In [16]:
month

[[<mlbgame.game.GameScoreboard at 0x11be37748>,
  <mlbgame.game.GameScoreboard at 0x11be37cf8>,
  <mlbgame.game.GameScoreboard at 0x11be37240>,
  <mlbgame.game.GameScoreboard at 0x11be372b0>,
  <mlbgame.game.GameScoreboard at 0x11be377b8>,
  <mlbgame.game.GameScoreboard at 0x11be375c0>,
  <mlbgame.game.GameScoreboard at 0x11be37da0>,
  <mlbgame.game.GameScoreboard at 0x11be37ba8>,
  <mlbgame.game.GameScoreboard at 0x11be37e10>,
  <mlbgame.game.GameScoreboard at 0x11be0f128>,
  <mlbgame.game.GameScoreboard at 0x11be0fe80>]]

In [21]:
type(month)

list

In [8]:
games = mlbgame.combine_games(month)
for game in games:
    print(game)

KeyboardInterrupt: 

In [14]:
from __future__ import print_function
import mlbgame

day = mlbgame.day(2015, 4, 12, home='Royals', away='Royals')
game = day[0]
output = 'Winning pitcher: %s (%s) - Losing Pitcher: %s (%s)'
print(output % (game.w_pitcher, game.w_team, game.l_pitcher, game.l_team))

Winning pitcher: Yordano Ventura (Royals) - Losing Pitcher: C.J. Wilson (Angels)


In [9]:
import pandas as pd

This ended up not going anywhere, so it was abandoned. Still using `statsapi` for now.  
(wm 2019.05.26)