# Exploring `statsapi`

We found a Python wrapper for the MLB Gameday API that might help us get the pitch data we need. Here are some links.  

Repo:  
https://github.com/toddrob99/MLB-StatsAPI

Doc:  
https://toddrob99.github.io/MLB-StatsAPI/#statsapi.schedule

In [5]:
!pwd

/Users/werlindo/Dropbox/flatiron/mod_3_proj/dsc-3-final-project


### Libraries

In [6]:
import statsapi
from statsapi import ENDPOINTS

---

Let's look at `ENDPOINTS` to see if we can figure out what we can get.

In [None]:
ENDPOINTS

(I cleared the output above because it was very large.)  
On visual inspection, there are some promising fields in there, e.g. `spin_rate`.  

But we still need to figure out how to navigate down to what we need. Maybe playing with some examples from the documentation will help spark some ideas.
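One low-effort way to navigate a large dictionary like this is to search its keys rather than print everything. The sketch below assumes only that `ENDPOINTS` is a dict keyed by endpoint name, which matches how `statsapi.get` is called later in this notebook. `find_endpoints` and `sample_endpoints` are hypothetical names, and the stand-in values are placeholders rather than the library's real endpoint definitions.

```python
def find_endpoints(endpoints, keyword):
    """Return the endpoint names that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    return sorted(name for name in endpoints if needle in name.lower())


# Hypothetical stand-in for statsapi.ENDPOINTS: only the key names matter here,
# the empty values are placeholders and not the library's real definitions.
sample_endpoints = {
    "game_playByPlay": {},
    "game_timestamps": {},
    "sports_players": {},
}

game_endpoints = find_endpoints(sample_endpoints, "game")
print(game_endpoints)
```

With the real object this would be called as `find_endpoints(ENDPOINTS, 'game')`.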

Here's an example from the documentation:

In [185]:
print( statsapi.player_stats(next(x['id'] for x in statsapi.get('sports_players',{'season':2008,'gameType':'W'})['people'] if x['fullName']=='Chase Utley'), 'hitting', 'career') )

Chase "Silver Fox" Utley, 2B (2003-2018)

Career Hitting
gamesPlayed: 1937
groundOuts: 1792
runs: 1103
doubles: 411
triples: 58
homeRuns: 259
strikeOuts: 1193
baseOnBalls: 724
intentionalWalks: 62
hits: 1885
hitByPitch: 204
avg: .275
atBats: 6857
obp: .358
slg: .465
ops: .823
caughtStealing: 22
stolenBases: 154
groundIntoDoublePlay: 93
numberOfPitches: 31043
plateAppearances: 7863
totalBases: 3189
rbi: 1025
leftOnBase: 2780
sacBunts: 6
sacFlies: 72
babip: .297
groundOutsToAirouts: 0.84




...And another example:

In [None]:
statsapi.get('game_timestamps',{'gamePk':565997})

(I cleared the output above because it is too large to render.)  
Hmmm. This is promising. We saw `game_playByPlay` in `ENDPOINTS`, and it looks like all it needs is a `gamePk`.

In [9]:
test_game = statsapi.get('game_playByPlay',{'gamePk':565997})

In [208]:
type(test_game)

dict

In [11]:
test_game.keys()

dict_keys(['copyright', 'allPlays', 'currentPlay', 'scoringPlays', 'playsByInning'])

In [13]:
all_plays = test_game['allPlays']

In [None]:
all_plays[0].get

In [17]:
type(all_plays[0])

dict

In [18]:
all_plays[0].keys()

dict_keys(['result', 'about', 'count', 'matchup', 'pitchIndex', 'actionIndex', 'runnerIndex', 'runners', 'playEvents', 'playEndTime', 'atBatIndex'])

In [19]:
type(all_plays[0].get('playEvents'))

list

In [27]:
pitch_num = all_plays[0].get('playEvents')[1].get('pitchNumber')
pitch_num

2
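Chained `.get(...)[...]` calls like the one above raise as soon as one level is missing. A small helper, sketched here under the assumption that the response is plain nested dicts and lists, makes the path explicit and returns a default instead. `dig` and `sample_play` are hypothetical; the sample only mimics the `playEvents` / `pitchNumber` nesting.

```python
def dig(obj, *path, default=None):
    """Follow `path` (dict keys or list indices) into nested data."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, index out of range, or a non-container on the way
            return default
    return obj


# Hypothetical stand-in for one element of all_plays.
sample_play = {
    "playEvents": [
        {"pitchNumber": 1, "pitchData": {"startSpeed": 84.3}},
        {"pitchNumber": 2, "pitchData": {"startSpeed": 84.2}},
    ]
}

print(dig(sample_play, "playEvents", 1, "pitchNumber"))   # 2
print(dig(sample_play, "playEvents", 5, "pitchNumber"))   # None
print(dig(sample_play, "playEvents", 0, "hitData", "x"))  # None
```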

In [43]:
type(all_plays)

list

In [45]:
import pandas as pd  # needed here; the notebook only imports pandas further down

df = pd.DataFrame(all_plays)

In [None]:
df.columns

In [46]:
df.head()

Unnamed: 0,about,actionIndex,atBatIndex,count,matchup,pitchIndex,playEndTime,playEvents,result,runnerIndex,runners
0,"{'atBatIndex': 0, 'halfInning': 'top', 'inning...",[],0,"{'balls': 3, 'strikes': 3, 'outs': 1}","{'batter': {'id': 457705, 'fullName': 'Andrew ...","[0, 1, 2, 3, 4, 5]",2019-04-24T23:13:04.000Z,"[{'details': {'call': {'code': 'B', 'descripti...","{'type': 'atBat', 'event': 'Strikeout', 'event...",[0],"[{'movement': {'start': None, 'end': None, 'ou..."
1,"{'atBatIndex': 1, 'halfInning': 'top', 'inning...",[],1,"{'balls': 2, 'strikes': 2, 'outs': 1}","{'batter': {'id': 592663, 'fullName': 'J.T. Re...","[0, 1, 2, 3, 4, 5, 6, 7]",2019-04-24T23:15:53.000Z,"[{'details': {'call': {'code': 'S', 'descripti...","{'type': 'atBat', 'event': 'Double', 'eventTyp...",[0],"[{'movement': {'start': None, 'end': '2B', 'ou..."
2,"{'atBatIndex': 2, 'halfInning': 'top', 'inning...",[],2,"{'balls': 2, 'strikes': 0, 'outs': 1}","{'batter': {'id': 547180, 'fullName': 'Bryce H...","[0, 1, 2]",2019-04-24T23:17:43.000Z,"[{'details': {'call': {'code': 'B', 'descripti...","{'type': 'atBat', 'event': 'Double', 'eventTyp...","[0, 1, 2]","[{'movement': {'start': '2B', 'end': '3B', 'ou..."
3,"{'atBatIndex': 3, 'halfInning': 'top', 'inning...",[],3,"{'balls': 4, 'strikes': 2, 'outs': 1}","{'batter': {'id': 656555, 'fullName': 'Rhys Ho...","[0, 1, 2, 3, 4, 5, 6]",2019-04-24T23:21:06.000Z,"[{'details': {'call': {'code': 'B', 'descripti...","{'type': 'atBat', 'event': 'Walk', 'eventType'...",[0],"[{'movement': {'start': None, 'end': '1B', 'ou..."
4,"{'atBatIndex': 4, 'halfInning': 'top', 'inning...",[],4,"{'balls': 0, 'strikes': 0, 'outs': 2}","{'batter': {'id': 596748, 'fullName': 'Maikel ...",[0],2019-04-24T23:21:54.000Z,"[{'details': {'call': {'code': 'X', 'descripti...","{'type': 'atBat', 'event': 'Flyout', 'eventTyp...",[0],"[{'movement': {'start': None, 'end': None, 'ou..."


In [40]:
df = pd.DataFrame.from_dict(test_game)


ValueError: arrays must all be same length

In [184]:
len(test_game)

5
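The `ValueError` above is consistent with the top-level values not lining up as equal-length columns. A quick way to check, sketched here with a hypothetical stand-in that has the same five top-level keys, is to list the type and length of each value. Only `scoringPlays` below is copied from the real response; the other values are placeholders, and `describe_top_level` is a hypothetical helper.

```python
def describe_top_level(payload):
    """Map each top-level key to (type name, length or None)."""
    summary = {}
    for key, value in payload.items():
        size = len(value) if hasattr(value, "__len__") else None
        summary[key] = (type(value).__name__, size)
    return summary


# Hypothetical stand-in shaped like the play-by-play response.
sample_game = {
    "copyright": "placeholder text",
    "allPlays": [{}, {}, {}],            # placeholder: one dict per play
    "currentPlay": {"atBatIndex": 79},   # placeholder: a single play dict
    "scoringPlays": [2, 60, 63, 64, 72],
    "playsByInning": [{}, {}],           # placeholder: one dict per inning
}

summary = describe_top_level(sample_game)
for key, (kind, size) in summary.items():
    print(f"{key:15} {kind:5} {size}")
```

A mix of a string, a dict, and lists of different lengths is not something a single DataFrame can hold column by column, which is why working on `allPlays` alone is the more promising route.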

- Each dictionary in `all_plays` is a single 'play'.

In [125]:
print(json.dumps(all_plays[-1], indent=4))

{
    "result": {
        "type": "atBat",
        "event": "Strikeout",
        "eventType": "strikeout",
        "description": "J.D. Davis strikes out swinging.",
        "rbi": 0,
        "awayScore": 6,
        "homeScore": 0
    },
    "about": {
        "atBatIndex": 79,
        "halfInning": "bottom",
        "inning": 9,
        "startTime": "2019-04-25T02:30:32.000Z",
        "endTime": "2019-04-25T02:32:40.000Z",
        "isComplete": true,
        "isScoringPlay": false,
        "hasReview": false,
        "hasOut": true,
        "captivatingIndex": 14
    },
    "count": {
        "balls": 3,
        "strikes": 3,
        "outs": 3
    },
    "matchup": {
        "batter": {
            "id": 605204,
            "fullName": "J.D. Davis",
            "link": "/api/v1/people/605204"
        },
        "batSide": {
            "code": "R",
            "description": "Right"
        },
        "pitcher": {
            "id": 504379,
            "fullName": "Juan Nicasio",
     

In [112]:
test_game.keys()

dict_keys(['copyright', 'allPlays', 'currentPlay', 'scoringPlays', 'playsByInning'])

### So just on inspection, I think the meat of what we need is in `allPlays`

I'm curious, though, as to what is in `playsByInning`

In [113]:
test_game.get('playsByInning')

[{'startIndex': 0,
  'endIndex': 11,
  'top': [0, 1, 2, 3, 4, 5],
  'bottom': [6, 7, 8, 9, 10, 11],
  'hits': {'away': [{'team': {'id': 143,
      'name': 'Philadelphia Phillies',
      'link': '/api/v1/teams/143',
      'springLeague': {'id': 115,
       'name': 'Grapefruit League',
       'link': '/api/v1/league/115',
       'abbreviation': 'GL'},
      'allStarStatus': 'N'},
     'inning': 1,
     'pitcher': {'id': 450306,
      'fullName': 'Jason Vargas',
      'link': '/api/v1/people/450306'},
     'batter': {'id': 592663,
      'fullName': 'J.T. Realmuto',
      'link': '/api/v1/people/592663'},
     'coordinates': {'x': 167.5117925095603, 'y': 55.70531840499331},
     'type': 'H',
     'description': 'Double'},
    {'team': {'id': 143,
      'name': 'Philadelphia Phillies',
      'link': '/api/v1/teams/143',
      'springLeague': {'id': 115,
       'name': 'Grapefruit League',
       'link': '/api/v1/league/115',
       'abbreviation': 'GL'},
      'allStarStatus': 'N'},
     'i

This appears a little too aggregated, so we should probably just focus on `allPlays`.

---

!pip install flatten_json

Let's try to flatten the game by using `flatten_json`

In [31]:
import flatten_json
from flatten_json import flatten

In [104]:
is_it_flat = flatten(test_game)

In [105]:
type(is_it_flat)

dict

In [106]:
is_it_flat

{'copyright': 'Copyright 2019 MLB Advanced Media, L.P.  Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt',
 'allPlays_0_result_type': 'atBat',
 'allPlays_0_result_event': 'Strikeout',
 'allPlays_0_result_eventType': 'strikeout',
 'allPlays_0_result_description': 'Andrew McCutchen strikes out swinging.',
 'allPlays_0_result_rbi': 0,
 'allPlays_0_result_awayScore': 0,
 'allPlays_0_result_homeScore': 0,
 'allPlays_0_about_atBatIndex': 0,
 'allPlays_0_about_halfInning': 'top',
 'allPlays_0_about_inning': 1,
 'allPlays_0_about_startTime': '2019-04-24T22:58:20.000Z',
 'allPlays_0_about_endTime': '2019-04-24T23:13:04.000Z',
 'allPlays_0_about_isComplete': True,
 'allPlays_0_about_isScoringPlay': False,
 'allPlays_0_about_hasReview': False,
 'allPlays_0_about_hasOut': True,
 'allPlays_0_about_captivatingIndex': 14,
 'allPlays_0_count_balls': 3,
 'allPlays_0_count_strikes': 3,
 'allPlays_0_count_outs': 1,
 'allPlays_0_ma

In [39]:
len(is_it_flat)

24307

So, it is a bit too flat, especially in how it flattens 'across' rather than 'down'. For example, we want one row per pitch, not a new set of columns for every pitch.  
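To see why the result is so wide, here is a minimal recursive flattener. It is not `flatten_json`'s actual code, only a stand-in that reproduces the underscore-joined key pattern seen in the output above. `flatten_wide` and `sample` are hypothetical.

```python
def flatten_wide(value, prefix=""):
    """Flatten nested dicts/lists into one dict with underscore-joined keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)  # list positions become part of the key
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}_{key}" if prefix else str(key)
        flat.update(flatten_wide(child, child_prefix))
    return flat


# Hypothetical two-play sample with the same nesting as the real response.
sample = {
    "allPlays": [
        {"result": {"event": "Strikeout"},
         "playEvents": [{"pitchNumber": 1}, {"pitchNumber": 2}]},
        {"result": {"event": "Double"},
         "playEvents": [{"pitchNumber": 1}]},
    ]
}

flat = flatten_wide(sample)
for key in flat:
    print(key)
```

Every play and every pitch gets its own key, so one game becomes a single very wide record instead of a table with one row per pitch.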

How about using plain `json`?

I also found `pd.io.json.json_normalize`.

In [107]:
import json
import pandas as pd
from pandas.io.json import json_normalize  # pandas >= 1.0 exposes this as pd.json_normalize

Let's go back to `allPlays` and try to flatten it.

In [130]:
all_plays = test_game.get('allPlays')

In [207]:
type(all_plays)

list

In [168]:
test = json_normalize(all_plays)

In [169]:
type(test)

pandas.core.frame.DataFrame

In [149]:
test.columns

Index(['about.atBatIndex', 'about.captivatingIndex', 'about.endTime',
       'about.halfInning', 'about.hasOut', 'about.hasReview', 'about.inning',
       'about.isComplete', 'about.isScoringPlay', 'about.startTime',
       'actionIndex', 'atBatIndex', 'count.balls', 'count.outs',
       'count.strikes', 'matchup.batSide.code', 'matchup.batSide.description',
       'matchup.batter.fullName', 'matchup.batter.id', 'matchup.batter.link',
       'matchup.batterHotColdZoneStats.stats', 'matchup.batterHotColdZones',
       'matchup.pitchHand.code', 'matchup.pitchHand.description',
       'matchup.pitcher.fullName', 'matchup.pitcher.id',
       'matchup.pitcher.link', 'matchup.pitcherHotColdZoneStats.stats',
       'matchup.pitcherHotColdZones', 'matchup.splits.batter',
       'matchup.splits.menOnBase', 'matchup.splits.pitcher', 'pitchIndex',
       'playEndTime', 'playEvents', 'result.awayScore', 'result.description',
       'result.event', 'result.eventType', 'result.homeScore', 'result.rb

In [162]:
# each entry is a list of pitch indices, presumably
test['pitchIndex'].tail()

75                   [0]
76          [1, 2, 3, 4]
77    [1, 2, 3, 4, 5, 6]
78             [0, 1, 2]
79    [1, 2, 3, 4, 5, 6]
Name: pitchIndex, dtype: object

In [134]:
test.head()

Unnamed: 0,about.atBatIndex,about.captivatingIndex,about.endTime,about.halfInning,about.hasOut,about.hasReview,about.inning,about.isComplete,about.isScoringPlay,about.startTime,...,playEvents,result.awayScore,result.description,result.event,result.eventType,result.homeScore,result.rbi,result.type,runnerIndex,runners
0,0,14,2019-04-24T23:13:04.000Z,top,True,False,1,True,False,2019-04-24T22:58:20.000Z,...,"[{'details': {'call': {'code': 'B', 'descripti...",0,Andrew McCutchen strikes out swinging.,Strikeout,strikeout,0,0,atBat,[0],"[{'movement': {'start': None, 'end': None, 'ou..."
1,1,34,2019-04-24T23:15:53.000Z,top,False,False,1,True,False,2019-04-24T23:13:05.000Z,...,"[{'details': {'call': {'code': 'S', 'descripti...",0,J.T. Realmuto doubles (4) on a sharp line driv...,Double,double,0,0,atBat,[0],"[{'movement': {'start': None, 'end': '2B', 'ou..."
2,2,34,2019-04-24T23:17:43.000Z,top,False,False,1,True,True,2019-04-24T23:15:55.000Z,...,"[{'details': {'call': {'code': 'B', 'descripti...",1,Bryce Harper doubles (7) on a sharp line drive...,Double,double,0,1,atBat,"[0, 1, 2]","[{'movement': {'start': '2B', 'end': '3B', 'ou..."
3,3,0,2019-04-24T23:21:06.000Z,top,False,False,1,True,False,2019-04-24T23:17:45.000Z,...,"[{'details': {'call': {'code': 'B', 'descripti...",1,Rhys Hoskins walks.,Walk,walk,0,0,atBat,[0],"[{'movement': {'start': None, 'end': '1B', 'ou..."
4,4,0,2019-04-24T23:21:54.000Z,top,True,False,1,True,False,2019-04-24T23:21:08.000Z,...,"[{'details': {'call': {'code': 'X', 'descripti...",1,Maikel Franco flies out to right fielder Micha...,Flyout,field_out,0,0,atBat,[0],"[{'movement': {'start': None, 'end': None, 'ou..."


In [136]:
test.shape

(80, 44)

In [202]:
list_new_df = []

# Only the first two at-bats for now, to keep the test small
for index, row in test.head(2).iterrows():
    # Expand this at-bat's list of play events into its own DataFrame
    play_events_df = json_normalize(row['playEvents'])

    for indy, rrow in play_events_df.iterrows():
        dicty = {}
        print(indy)  # progress check: event index within the at-bat

        # At-bat level fields, repeated on every pitch row
        dicty['at_bat_index'] = row['about.atBatIndex']
        dicty['batside'] = row['matchup.batSide.code']
        dicty['pitchhand'] = row['matchup.pitchHand.code']

        # Pitch level fields
        dicty['pitch_num'] = rrow['pitchNumber']
        dicty['speed_start'] = rrow['pitchData.startSpeed']
        dicty['speed_end'] = rrow['pitchData.endSpeed']

        # Collect the row dictionary into the list
        list_new_df.append(dicty)

0
1
2
3
4
5
0
1
2
3
4
5
6
7


Let's look at the last iteration of `play_events_df`

In [212]:
play_events_df.T

Unnamed: 0,0,1,2,3,4,5,6,7
count.balls,0,1,1,1,1,1,2,2
count.strikes,1,1,2,2,2,2,2,2
details.ballColor,"rgba(170, 21, 11, 1.0)","rgba(39, 161, 39, 1.0)","rgba(170, 21, 11, 1.0)","rgba(170, 21, 11, 1.0)","rgba(170, 21, 11, 1.0)","rgba(170, 21, 11, 1.0)","rgba(39, 161, 39, 1.0)","rgba(26, 86, 190, 1.0)"
details.call.code,S,B,S,S,S,S,B,X
details.call.description,Strike - Swinging,Ball - Called,Strike - Swinging,Strike - Swinging,Strike - Swinging,Strike - Swinging,Ball - Called,Hit Into Play - Out(s)
details.code,C,B,F,F,F,F,B,D
details.description,Called Strike,Ball,Foul,Foul,Foul,Foul,Ball,"In play, no out"
details.hasReview,False,False,False,False,False,False,False,False
details.isBall,False,True,False,False,False,False,True,False
details.isInPlay,False,False,False,False,False,False,False,True


In [204]:
len(list_new_df)

14

In [205]:
list_new_df[:1]

[{'at_bat_index': 0,
  'batside': 'R',
  'pitchhand': 'L',
  'pitch_num': 1,
  'speed_start': 84.3,
  'speed_end': 76.4}]

In [206]:
new_df = pd.DataFrame(list_new_df)

new_df.T.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13
at_bat_index,0,0,0,0,0,0,1,1,1,1,1,1,1,1
batside,R,R,R,R,R,R,R,R,R,R,R,R,R,R
pitch_num,1,2,3,4,5,6,1,2,3,4,5,6,7,8
pitchhand,L,L,L,L,L,L,L,L,L,L,L,L,L,L
speed_end,76.4,76.3,77,73.4,78,79.1,77.9,78,77.3,78.8,73.4,79,73.9,79.1
speed_start,84.3,84.2,84.9,80.8,85.2,86.9,85.4,86,85.5,87,80.2,86.5,80.6,86.5
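The same rows can be built without `iterrows` by looping over the raw list directly, which skips the intermediate DataFrames. This is a sketch under the assumption that every play carries `about`, `matchup`, and `playEvents` as shown earlier. `pitch_rows` and `sample_plays` are hypothetical, and events without a `pitchData` dict are skipped rather than recorded as missing.

```python
def pitch_rows(plays):
    """Return one dict per pitch, carrying its at-bat context along."""
    rows = []
    for play in plays:
        # At-bat level fields, repeated on every pitch row
        context = {
            "at_bat_index": play["about"]["atBatIndex"],
            "batside": play["matchup"]["batSide"]["code"],
            "pitchhand": play["matchup"]["pitchHand"]["code"],
        }
        for event in play.get("playEvents", []):
            pitch = event.get("pitchData")
            if not isinstance(pitch, dict):
                continue  # assumed: events that are not pitches lack pitchData
            rows.append({
                **context,
                "pitch_num": event.get("pitchNumber"),
                "speed_start": pitch.get("startSpeed"),
                "speed_end": pitch.get("endSpeed"),
            })
    return rows


# Hypothetical sample: one at-bat with two pitches and one non-pitch event.
sample_plays = [{
    "about": {"atBatIndex": 0},
    "matchup": {"batSide": {"code": "R"}, "pitchHand": {"code": "L"}},
    "playEvents": [
        {"pitchNumber": 1, "pitchData": {"startSpeed": 84.3, "endSpeed": 76.4}},
        {"type": "action"},
        {"pitchNumber": 2, "pitchData": {"startSpeed": 84.2, "endSpeed": 76.3}},
    ],
}]

rows = pitch_rows(sample_plays)
print(len(rows), rows[0])
```

Passing the result to `pd.DataFrame(...)` would then give one row per pitch, in the same shape as `new_df`.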


-----

In [108]:
zzz = json_normalize(test_game)

In [126]:
zzz.columns

Index(['allPlays', 'copyright', 'currentPlay.about.atBatIndex',
       'currentPlay.about.captivatingIndex', 'currentPlay.about.endTime',
       'currentPlay.about.halfInning', 'currentPlay.about.hasOut',
       'currentPlay.about.hasReview', 'currentPlay.about.inning',
       'currentPlay.about.isComplete', 'currentPlay.about.isScoringPlay',
       'currentPlay.about.startTime', 'currentPlay.actionIndex',
       'currentPlay.atBatIndex', 'currentPlay.count.balls',
       'currentPlay.count.outs', 'currentPlay.count.strikes',
       'currentPlay.matchup.batSide.code',
       'currentPlay.matchup.batSide.description',
       'currentPlay.matchup.batter.fullName', 'currentPlay.matchup.batter.id',
       'currentPlay.matchup.batter.link',
       'currentPlay.matchup.batterHotColdZoneStats.stats',
       'currentPlay.matchup.batterHotColdZones',
       'currentPlay.matchup.pitchHand.code',
       'currentPlay.matchup.pitchHand.description',
       'currentPlay.matchup.pitcher.fullName',
  

In [135]:
zzz.head()

Unnamed: 0,allPlays,copyright,currentPlay.about.atBatIndex,currentPlay.about.captivatingIndex,currentPlay.about.endTime,currentPlay.about.halfInning,currentPlay.about.hasOut,currentPlay.about.hasReview,currentPlay.about.inning,currentPlay.about.isComplete,...,currentPlay.result.description,currentPlay.result.event,currentPlay.result.eventType,currentPlay.result.homeScore,currentPlay.result.rbi,currentPlay.result.type,currentPlay.runnerIndex,currentPlay.runners,playsByInning,scoringPlays
0,"[{'result': {'type': 'atBat', 'event': 'Strike...","Copyright 2019 MLB Advanced Media, L.P. Use o...",79,14,2019-04-25T02:32:40.000Z,bottom,True,False,9,True,...,J.D. Davis strikes out swinging.,Strikeout,strikeout,0,0,atBat,[0],"[{'movement': {'start': None, 'end': None, 'ou...","[{'startIndex': 0, 'endIndex': 11, 'top': [0, ...","[2, 60, 63, 64, 72]"


In [87]:
all_plays[0]

{'result': {'type': 'atBat',
  'event': 'Strikeout',
  'eventType': 'strikeout',
  'description': 'Andrew McCutchen strikes out swinging.',
  'rbi': 0,
  'awayScore': 0,
  'homeScore': 0},
 'about': {'atBatIndex': 0,
  'halfInning': 'top',
  'inning': 1,
  'startTime': '2019-04-24T22:58:20.000Z',
  'endTime': '2019-04-24T23:13:04.000Z',
  'isComplete': True,
  'isScoringPlay': False,
  'hasReview': False,
  'hasOut': True,
  'captivatingIndex': 14},
 'count': {'balls': 3, 'strikes': 3, 'outs': 1},
 'matchup': {'batter': {'id': 457705,
   'fullName': 'Andrew McCutchen',
   'link': '/api/v1/people/457705'},
  'batSide': {'code': 'R', 'description': 'Right'},
  'pitcher': {'id': 450306,
   'fullName': 'Jason Vargas',
   'link': '/api/v1/people/450306'},
  'pitchHand': {'code': 'L', 'description': 'Left'},
  'batterHotColdZones': [],
  'pitcherHotColdZones': [],
  'splits': {'batter': 'vs_LHP', 'pitcher': 'vs_RHB', 'menOnBase': 'Empty'}},
 'pitchIndex': [0, 1, 2, 3, 4, 5],
 'actionIndex': 

In [94]:
all_plays[0].keys()

dict_keys(['result', 'about', 'count', 'matchup', 'pitchIndex', 'actionIndex', 'runnerIndex', 'runners', 'playEvents', 'playEndTime', 'atBatIndex'])

In [101]:
yyy = pd.io.json.json_normalize(all_plays
                               ,record_path=['playEvents']
                               ,meta=['result'])
yyy.head()

Unnamed: 0,battingOrder,count,details,endTime,hitData,index,isPitch,pfxId,pitchData,pitchNumber,playId,player,position,startTime,type,result
0,,"{'balls': 1, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:04.000Z,,0,True,190424_231125,"{'startSpeed': 84.3, 'endSpeed': 76.4, 'strike...",1.0,e7c6b8db-d32a-4592-a7ba-7747fc091a6a,,,2019-04-24T23:11:20.000Z,pitch,"{'type': 'atBat', 'event': 'Strikeout', 'event..."
1,,"{'balls': 2, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:17.000Z,,1,True,190424_231209,"{'startSpeed': 84.2, 'endSpeed': 76.3, 'strike...",2.0,4903682e-ec48-4360-9c75-3ec89bf7f15d,,,2019-04-24T23:12:04.000Z,pitch,"{'type': 'atBat', 'event': 'Strikeout', 'event..."
2,,"{'balls': 2, 'strikes': 1}","{'call': {'code': 'S', 'description': 'Strike ...",2019-04-24T23:12:30.000Z,,2,True,190424_231222,"{'startSpeed': 84.9, 'endSpeed': 77.0, 'strike...",3.0,8c8d56a6-0bc7-4ee1-b3a1-bcea95d4ce64,,,2019-04-24T23:12:17.000Z,pitch,"{'type': 'atBat', 'event': 'Strikeout', 'event..."
3,,"{'balls': 3, 'strikes': 1}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:43.000Z,,3,True,190424_231236,"{'startSpeed': 80.8, 'endSpeed': 73.4, 'strike...",4.0,37f3f56e-f1d4-4131-b82c-6c88c4d009dc,,,2019-04-24T23:12:30.000Z,pitch,"{'type': 'atBat', 'event': 'Strikeout', 'event..."
4,,"{'balls': 3, 'strikes': 2}","{'call': {'code': 'S', 'description': 'Strike ...",2019-04-24T23:12:59.000Z,,4,True,190424_231248,"{'startSpeed': 85.2, 'endSpeed': 78.0, 'strike...",5.0,3dc32a07-e2bf-4b6f-8712-5b16ee33b1d5,,,2019-04-24T23:12:43.000Z,pitch,"{'type': 'atBat', 'event': 'Strikeout', 'event..."


In [103]:
yyy['pitchData'].apply(pd.Series)

  index = _union_indexes(indexes, sort=sort)
  result = result.union(other)


Unnamed: 0,breaks,coordinates,endSpeed,startSpeed,strikeZoneBottom,strikeZoneTop,typeConfidence,zone,0
0,"{'breakAngle': 36.0, 'breakLength': 4.8, 'brea...","{'aY': 26.39, 'aZ': -13.99, 'pfxX': 8.38, 'pfx...",76.4,84.3,1.66,3.42,,14.0,
1,"{'breakAngle': 37.2, 'breakLength': 7.2, 'brea...","{'aY': 26.1, 'aZ': -20.27, 'pfxX': 11.27, 'pfx...",76.3,84.2,1.66,3.42,,12.0,
2,"{'breakAngle': 34.8, 'breakLength': 6.0, 'brea...","{'aY': 26.31, 'aZ': -16.71, 'pfxX': 9.54, 'pfx...",77.0,84.9,1.65,3.45,,8.0,
3,"{'breakAngle': 34.8, 'breakLength': 8.4, 'brea...","{'aY': 23.68, 'aZ': -20.53, 'pfxX': 12.07, 'pf...",73.4,80.8,1.67,3.42,,14.0,
4,"{'breakAngle': 28.8, 'breakLength': 6.0, 'brea...","{'aY': 24.18, 'aZ': -18.55, 'pfxX': 7.92, 'pfx...",78.0,85.2,1.70,3.42,,5.0,
5,"{'breakAngle': 31.2, 'breakLength': 4.8, 'brea...","{'aY': 26.89, 'aZ': -17.35, 'pfxX': 7.54, 'pfx...",79.1,86.9,1.65,3.52,,14.0,
6,"{'breakAngle': 34.8, 'breakLength': 4.8, 'brea...","{'aY': 25.34, 'aZ': -15.07, 'pfxX': 8.47, 'pfx...",77.9,85.4,1.70,3.64,,14.0,
7,"{'breakAngle': 25.2, 'breakLength': 4.8, 'brea...","{'aY': 26.16, 'aZ': -16.65, 'pfxX': 6.33, 'pfx...",78.0,86.0,1.70,3.62,,11.0,
8,"{'breakAngle': 37.2, 'breakLength': 6.0, 'brea...","{'aY': 27.38, 'aZ': -19.17, 'pfxX': 10.49, 'pf...",77.3,85.5,1.65,3.52,,12.0,
9,"{'breakAngle': 18.0, 'breakLength': 4.8, 'brea...","{'aY': 27.41, 'aZ': -17.33, 'pfxX': 4.71, 'pfx...",78.8,87.0,1.65,3.52,,11.0,


In [None]:
yyyz = yyy.

In [54]:
ff = pd.io.json.json_normalize(yyy['pitchData'].apply())

TypeError: 'float' object is not iterable

In [49]:
yyy.head()

Unnamed: 0,battingOrder,count,details,endTime,hitData,index,isPitch,pfxId,pitchData,pitchNumber,playId,player,position,startTime,type
0,,"{'balls': 1, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:04.000Z,,0,True,190424_231125,"{'startSpeed': 84.3, 'endSpeed': 76.4, 'strike...",1.0,e7c6b8db-d32a-4592-a7ba-7747fc091a6a,,,2019-04-24T23:11:20.000Z,pitch
1,,"{'balls': 2, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:17.000Z,,1,True,190424_231209,"{'startSpeed': 84.2, 'endSpeed': 76.3, 'strike...",2.0,4903682e-ec48-4360-9c75-3ec89bf7f15d,,,2019-04-24T23:12:04.000Z,pitch
2,,"{'balls': 2, 'strikes': 1}","{'call': {'code': 'S', 'description': 'Strike ...",2019-04-24T23:12:30.000Z,,2,True,190424_231222,"{'startSpeed': 84.9, 'endSpeed': 77.0, 'strike...",3.0,8c8d56a6-0bc7-4ee1-b3a1-bcea95d4ce64,,,2019-04-24T23:12:17.000Z,pitch
3,,"{'balls': 3, 'strikes': 1}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:43.000Z,,3,True,190424_231236,"{'startSpeed': 80.8, 'endSpeed': 73.4, 'strike...",4.0,37f3f56e-f1d4-4131-b82c-6c88c4d009dc,,,2019-04-24T23:12:30.000Z,pitch
4,,"{'balls': 3, 'strikes': 2}","{'call': {'code': 'S', 'description': 'Strike ...",2019-04-24T23:12:59.000Z,,4,True,190424_231248,"{'startSpeed': 85.2, 'endSpeed': 78.0, 'strike...",5.0,3dc32a07-e2bf-4b6f-8712-5b16ee33b1d5,,,2019-04-24T23:12:43.000Z,pitch


In [50]:
yyy.columns

Index(['battingOrder', 'count', 'details', 'endTime', 'hitData', 'index',
       'isPitch', 'pfxId', 'pitchData', 'pitchNumber', 'playId', 'player',
       'position', 'startTime', 'type'],
      dtype='object')

In [58]:
def only_dict(d):
    '''
    Convert json string representation of dictionary to a python dict
    '''
    return json.loads(d)

In [59]:
def list_of_dicts(ld):
    '''
    Create a mapping of the tuples formed after 
    converting json strings of list to a python list   
    '''
    return dict([(list(d.values())[1], list(d.values())[0]) for d in json.loads(ld)])


In [None]:
A = json_normalize(df['columnA'].apply(only_dict).tolist()).add_prefix('columnA.')
B = json_normalize(df['columnB'].apply(list_of_dicts).tolist()).add_prefix('columnB.pos.') 

In [62]:
aaa = json_normalize(yyy['pitchData'].tolist()).add_prefix('col_pitch_data.')


AttributeError: 'float' object has no attribute 'values'

In [60]:
yyy.head(2)

Unnamed: 0,battingOrder,count,details,endTime,hitData,index,isPitch,pfxId,pitchData,pitchNumber,playId,player,position,startTime,type
0,,"{'balls': 1, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:04.000Z,,0,True,190424_231125,"{'startSpeed': 84.3, 'endSpeed': 76.4, 'strike...",1.0,e7c6b8db-d32a-4592-a7ba-7747fc091a6a,,,2019-04-24T23:11:20.000Z,pitch
1,,"{'balls': 2, 'strikes': 0}","{'call': {'code': 'B', 'description': 'Ball - ...",2019-04-24T23:12:17.000Z,,1,True,190424_231209,"{'startSpeed': 84.2, 'endSpeed': 76.3, 'strike...",2.0,4903682e-ec48-4360-9c75-3ec89bf7f15d,,,2019-04-24T23:12:04.000Z,pitch


In [75]:
pitch_data_example = yyy.loc[yyy.index[0],'pitchData']

In [76]:
type(pitch_data_example)

dict

In [79]:
json_normalize(pitch_data_example)

Unnamed: 0,breaks.breakAngle,breaks.breakLength,breaks.breakY,breaks.spinDirection,breaks.spinRate,coordinates.aX,coordinates.aY,coordinates.aZ,coordinates.pX,coordinates.pZ,...,coordinates.x,coordinates.x0,coordinates.y,coordinates.y0,coordinates.z0,endSpeed,startSpeed,strikeZoneBottom,strikeZoneTop,zone
0,36.0,4.8,24.0,146,2303,12.55,26.39,-13.99,1.48,1.58,...,60.67,2.8,196.16,50.0,5.57,76.4,84.3,1.66,3.42,14
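The two `'float' object` errors above are consistent with some `pitchData` entries being `NaN` (a float) instead of a dict, which would happen for events that are not pitches. That is an inference from the error text, not something verified here. One way around it, sketched with the hypothetical names `flatten_dotted` and `sample_pitch_data`, is to keep only the dict entries and flatten each into dotted keys like the columns `json_normalize` produced for the single example.

```python
import math


def flatten_dotted(record, prefix=""):
    """Flatten nested dicts into a single dict with dot-joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_dotted(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical column values: two pitches and one NaN for a non-pitch event.
sample_pitch_data = [
    {"startSpeed": 84.3, "endSpeed": 76.4, "breaks": {"spinRate": 2303}},
    math.nan,
    {"startSpeed": 84.2, "endSpeed": 76.3},
]

# Keep only real dicts so the NaN placeholder never reaches the flattener.
pitch_records = [flatten_dotted(p) for p in sample_pitch_data if isinstance(p, dict)]
print(pitch_records[0])
```

On the real column, `yyy['pitchData'].dropna().tolist()` would be the pandas counterpart of the `isinstance` filter, assuming the missing entries really are `NaN`.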


---

---

---

## A look at the `mlbgame` module

In [7]:
import mlbgame

In [15]:
from __future__ import print_function
import mlbgame

#month = mlbgame.games(2015, 6, home='Mets')
# Despite the variable name, this requests a single day: June 1, 2015
month = mlbgame.games(2015, 6, 1)

In [16]:
month

[[<mlbgame.game.GameScoreboard at 0x11be37748>,
  <mlbgame.game.GameScoreboard at 0x11be37cf8>,
  <mlbgame.game.GameScoreboard at 0x11be37240>,
  <mlbgame.game.GameScoreboard at 0x11be372b0>,
  <mlbgame.game.GameScoreboard at 0x11be377b8>,
  <mlbgame.game.GameScoreboard at 0x11be375c0>,
  <mlbgame.game.GameScoreboard at 0x11be37da0>,
  <mlbgame.game.GameScoreboard at 0x11be37ba8>,
  <mlbgame.game.GameScoreboard at 0x11be37e10>,
  <mlbgame.game.GameScoreboard at 0x11be0f128>,
  <mlbgame.game.GameScoreboard at 0x11be0fe80>]]

In [21]:
type(month)

list

In [8]:
games = mlbgame.combine_games(month)
for game in games:
    print(game)

KeyboardInterrupt: 

In [14]:
from __future__ import print_function
import mlbgame

day = mlbgame.day(2015, 4, 12, home='Royals', away='Royals')
game = day[0]
output = 'Winning pitcher: %s (%s) - Losing Pitcher: %s (%s)'
print(output % (game.w_pitcher, game.w_team, game.l_pitcher, game.l_team))

Winning pitcher: Yordano Ventura (Royals) - Losing Pitcher: C.J. Wilson (Angels)


In [9]:
import pandas as pd

In [None]:
pd.rea