# GM info
I had originally planned to collect the GM/coaching data along with the other player accomplishment data, but the code became quite long, so I decided it would be better to break it into more pieces and posts. In this post I'll just get the GM info and merge it onto my player data; the coaching data is saved for the next post.

## Thank god for basketball-reference.com
In grabbing the GM/coach info to attribute draft choices, basketball-reference again appears to be the place to go. [NBA.com][1] seemingly only has this info for the Lakers, though the way it's organized would have made it the easiest to scrape. As it is, grabbing the data from basketball-reference shouldn't be too difficult*, but it will require a lot more checking to make sure the info reflects the GM/coach at the time of the draft and preseason rather than any mid-season changes.
*This ended up being the most difficult piece yet, mostly because of my own poor approach to resolving the date issue. All in all, I'd guess this took me about 20 to 25 hours to figure out, way too long to be effective.
[1]: http://www.nba.com/lakers/history/owners_gm_coach

In [1]:
# Standard imports 
import pandas as pd
from pandas import Series,DataFrame,read_html
import numpy as np
import datetime as dt

from bs4 import BeautifulSoup
import html5lib

In [2]:
# Team exec info
# First I just need a list of all team abbreviations that I can use to rip through the urls. I can create the list from the previously created list of draft choices
draft_dframe = pd.read_csv('NBA_Data/1976_to_2014_Draft.csv')

# From this I can take the 'Team' column that has the necessary abbreviations
team_list = list(set(draft_dframe['Team']))

# The set() command gets rid of duplicates and list() puts the result into a list
team_list 

['WSB',
 'MIL',
 'GSW',
 'MIN',
 'MIA',
 'ATL',
 'BOS',
 'DET',
 'NYK',
 'DEN',
 'DAL',
 'OKC',
 'POR',
 'NJN',
 'TOR',
 'SEA',
 'CLE',
 'SAS',
 'BUF',
 'CHA',
 'UTA',
 'VAN',
 'CHH',
 'CHI',
 'HOU',
 'NOH',
 'WAS',
 'LAL',
 'PHI',
 'PHO',
 'NOJ',
 'NOK',
 'MEM',
 'KCK',
 'LAC',
 'SAC',
 'ORL',
 'BRK',
 'SDC',
 'IND']

In [3]:
len(team_list)

40

## My first mistake
I started having errors here, but didn't realize it until a little later. As can be seen above, I have a list of 40 teams, which makes sense given that teams have moved and their abbreviations have changed. But I didn't realize that basketball-reference had consolidated each team's list of GMs, such that the LA Clippers webpage includes the GMs of their predecessors, the San Diego Clippers and the Buffalo Braves. Thus I really just needed the most recent abbreviation.
I proceeded as shown in the next three cells as though everything was fine. It wasn't until I tried to implement the loop that everything went wrong. This was a valuable lesson in spending more time looking at the data I'm going to gather first, rather than spending time thinking through a loop that may never work.

In [4]:
# With the list of team abbreviations, I can now rip through a URL template
url_template = "http://www.basketball-reference.com/teams/{team}/executives.html"

In [5]:
# Making an empty dataframe to append the info to
gm_df = DataFrame()

In [None]:
# Grabbing the info and putting into the empty dataframe
for team in team_list:
    url = url_template.format(team = team)
    
    dframe_team = pd.io.html.read_html(url)
    dframe_team = dframe_team[0]
    
    # Add in a column for the team abbreviation
    dframe_team.insert(0, 'Team', team)
    
    # Append to gm DataFrame
    gm_df = gm_df.append(dframe_team, ignore_index=True)

## Unfortunately not as easy as I had hoped
Since basketball-reference uses the most recent team abbreviation for its url, the full exec history of a team like the LA Clippers, which started as the Buffalo Braves (BUF), moved to San Diego to become the San Diego Clippers (SDC), and eventually moved to LA as the LA Clippers (LAC), is stored at the url "http://www.basketball-reference.com/teams/LAC/executives.html".
The biggest concern then is whether a team became defunct between 1976 and 2015. Such a team would have drafted players in the dataframe, and I would need to determine what the correct abbreviation for the team would have been. Luckily, [Wikipedia's list of defunct NBA teams][2] shows that no team became defunct after 1954, so the abbreviations from the most recent NBA season should suffice to pull all the correct data.
I should then be able to just use a list of the abbreviations from 2015.
[2]: https://en.wikipedia.org/wiki/List_of_defunct_National_Basketball_Association_teams
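
A different way to get from the 40 historical abbreviations to one url per franchise would be an explicit lookup table. The sketch below is only an illustration: the `CURRENT_ABBREVIATION` dictionary and the `executives_url` helper are hypothetical names, and the table lists only the cases mentioned in this post, so a complete version would have to be checked by hand against the site.

```python
# Hypothetical mapping from a historical abbreviation to the one
# basketball-reference uses in its url. Only cases named in this post
# are included; this is not a complete table.
CURRENT_ABBREVIATION = {
    'BUF': 'LAC',  # Buffalo Braves -> LA Clippers
    'SDC': 'LAC',  # San Diego Clippers -> LA Clippers
    'BRK': 'NJN',  # b-r files the Brooklyn Nets under 'NJN'
}

URL_TEMPLATE = "http://www.basketball-reference.com/teams/{team}/executives.html"


def executives_url(team):
    """Return the executives url for a team, translating old abbreviations."""
    current = CURRENT_ABBREVIATION.get(team, team)
    return URL_TEMPLATE.format(team=current)


# Three Clippers-era abbreviations collapse to a single url
urls = sorted({executives_url(t) for t in ['BUF', 'SDC', 'LAC', 'ATL']})
print(urls)
```

Abbreviations that are not in the table pass through unchanged, so a team that never moved needs no entry.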

In [6]:
# From the draft dataframe, I'll take the most recent team abbreviations, thankfully no teams have moved since 2008
dframe_list = draft_dframe[draft_dframe.Draft_Yr >= 2012]
team_list = list(set(dframe_list['Team']))

# The set() command gets rid of duplicates and the list() puts the result into a list

# Removing a Charlotte Hornets abbreviation as they changed their name back to the Hornets in 2014 and the Brooklyn Nets 'BRK' abbreviation as b-r still uses 'NJN' for the team
if 'CHH' in team_list: team_list.remove('CHH')  
if 'BRK' in team_list: team_list.remove('BRK')   
# Now just check to make sure there are all 30 teams represented
len(team_list)

30

## Aside
I actually had to run this piece of code three times, starting with the draft year as 2014. That only returned a list of length 25, so I knew I was missing teams. Changing it to 2012 brought it to the requisite 30 (after dropping two duplicates).
## NB
I had to adjust the data I was pulling to include the 2015 draft class, but I didn't want to go through the hassle of adjusting this code to fit a new list of team abbreviations, so I'm relying on my prior csv, which only went through the 2014 draft, for the team abbreviations.

## Changing tactics for a second
As I moved forward with the correct url template and tried to set up the loop that creates the dataframe, the loop became quite cumbersome. I decided it would be better to work through a single url first, as I did with one draft year url in Part 1, to show my work before putting everything into a complex loop.

In [7]:
url =  "http://www.basketball-reference.com/teams/ATL/executives.html"

In [8]:
# Reading the URL into a DataFrame
dframe_team = pd.io.html.read_html(url)
dframe_team = dframe_team[0]
dframe_team.head()

Unnamed: 0,Rk,Executive,Start,End,Notes
0,1,Ben Kerner,1949,1960-11-26,
1,2,Marty Blake,1960-11-26,1970-04-29,
2,3,Bob Cousins,1970-04-29,1972-04-24,
3,4,Richie Guerin,1972-04-24,1973-08-04,
4,5,Pat Williams,1973-08-04,1974-08-06,


In [9]:
# Dropping the unneeded columns
dframe_team.drop(dframe_team.columns[[0,4]],inplace=True,axis=1)

# Renaming the columns
column_names = ['Executive', 'Start', 'End']
dframe_team.columns = column_names

# Adding a column for the team abbreviation
dframe_team.insert(0, 'Team', 'ATL')

dframe_team.head()

Unnamed: 0,Team,Executive,Start,End
0,ATL,Ben Kerner,1949,1960-11-26
1,ATL,Marty Blake,1960-11-26,1970-04-29
2,ATL,Bob Cousins,1970-04-29,1972-04-24
3,ATL,Richie Guerin,1972-04-24,1973-08-04
4,ATL,Pat Williams,1973-08-04,1974-08-06


### Filling out the years for which the GM was in charge during the draft
This is the most complex part. Basketball-reference includes an exact start and end date for each GM, so I need to make sure I have the GM in charge during the draft. Since the draft has always fallen between June 8 and June 30 in the years I'm looking at, I need to use the dates to grab the GM in charge at that time.

In [10]:
# Replacing 'present' in the 'End' column with today's date
date = dt.datetime.today().strftime("%m-%d-%Y")
dframe_team.replace(['present'], [date], inplace=True)

# First converting the 'Start' and 'End' columns to a date time object
dframe_team['Start'] = pd.to_datetime(dframe_team['Start'], errors='coerce')
dframe_team['End'] = pd.to_datetime(dframe_team['End'], errors='coerce')

# Dropping GM's who left prior to 1976
dframe_team = dframe_team[dframe_team['End'].dt.year >= 1976]

dframe_team

Unnamed: 0,Team,Executive,Start,End
6,ATL,Bud Seretean,1975-06-23,1977-01-03
7,ATL,Mike Storen,1977-01-03,1977-09-27
8,ATL,Michael Gearon,1977-09-27,1979-07-12
9,ATL,Lewis Schaffel,1979-07-12,1979-11-19
10,ATL,Stan Kasten,1979-11-19,1990-02-14
11,ATL,Pete Babcock,1990-02-14,2003-04-02
12,ATL,Billy Knight,2003-04-02,2008-05-28
13,ATL,Rick Sund,2008-05-28,2012-06-25
14,ATL,Danny Ferry,2012-06-25,2014-09-12
15,ATL,Mike Budenholzer,2014-09-12,2016-09-30


In [11]:
# Then I create a column for the year that the GM started which will be used later when determining what drafts the GM was in charge of
dframe_team['start_year'] = dframe_team['Start'].dt.year
dframe_team.head()

Unnamed: 0,Team,Executive,Start,End,start_year
6,ATL,Bud Seretean,1975-06-23,1977-01-03,1975
7,ATL,Mike Storen,1977-01-03,1977-09-27,1977
8,ATL,Michael Gearon,1977-09-27,1979-07-12,1977
9,ATL,Lewis Schaffel,1979-07-12,1979-11-19,1979
10,ATL,Stan Kasten,1979-11-19,1990-02-14,1979


## Getting more complicated
Writing up the proper code to get all the GMs assigned to the right year took a long time and involved quite a bit of trial and error. I wrote a bunch of code that worked for one url, but then it hit snags on another because of date issues. So I adjusted the code to deal with those issues, but then it hit snags at the next url. Overall, I was finding solutions in a piecemeal fashion, rather than taking the time to look at the data to begin with and craft a one-size-fits-all solution. A classic example of how spending more time at the front end to understand and anticipate issues would have saved me hours.

## Now what is going on below
As can be seen in the above DataFrame, I have a data set with the exact dates of each GM's tenure, as well as a start year column that I will use later to determine the year the GM started making draft choices. The problem I need to address is that some GMs might never have actually drafted anyone. To deal with this, I need two cut-off dates: a starting cut-off, such that a GM who started after that date in a given year is not credited with that year's draft pick, and an ending cut-off, such that a GM who didn't last until that date in a given year is not credited with that year's pick either. For example, in Cleveland's data, Ron Hrovat started on July 23, 1979. This was after the 1979 draft, so he should not be credited with Cleveland's pick in 1979. Hrovat was then fired/released on June 13, 1980, which was after the 1980 draft on June 10, so he should be credited with Cleveland's pick from 1980.
In attempting to address this issue, I initially took the piecemeal approach and simply assumed that a GM who started after July 1 of a given year didn't make that year's draft choice, and that a GM who was let go prior to June 8 of a given year didn't make that year's draft choice either. (I'm being a little generous to myself here. In my piecemeal efforts I tried a number of other approaches and date combinations that worked for some teams but not all, including one where I grouped drafts held before June 12 separately from those held after June 12, which nearly works if only looking at drafts after 1976.) Finally, after butting up against error after error in implementing the loop over all the teams, I got disgusted enough to do the following. The method given below is what I originally planned to do, but I avoided it because creating a dictionary of draft dates seemed boring and because I thought I could be cool and find a way around the manual input. In the end, creating the dictionary would have been much faster, and it makes the logic easier to follow and allows me to stretch this analysis back to the start of the NBA if I wish.
Creating these cut-off columns was also a new experience for me. I am used to building loops that iterate over cells in a column or row to create new columns or rows (this is usually what I do in Stata, and I love the logic of loops). This can be done in Python, but as I found out when trying to build a loop in Part 3, it can be much more complicated: without a really solid foundation in how things like methods, attributes and objects work, I make a lot of time-consuming mistakes. Without a stronger background in CS, I'm not quite sure when a loop will work and when it won't, but the implementation below, which applies a simple rule to make the new column and uses the very helpful .map, runs faster and is cleaner to read.
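
The Hrovat example can be checked by hand with plain dates. This is only a sketch of the rule described above, not the code used in the rest of the post: `was_gm_at_draft` is a hypothetical helper, the two draft dates are the 1979 and 1980 entries of the draftDates dictionary defined in the next cell, and treating a GM who starts on draft day as too late is my assumption about the boundary.

```python
import datetime as dt

# The 1979 and 1980 entries of the draftDates dictionary, as date objects
draft_dates = {1979: dt.date(1979, 6, 25), 1980: dt.date(1980, 6, 10)}


def was_gm_at_draft(start, end, year):
    """True if the draft in `year` fell inside the tenure.

    Assumption: a GM who starts on draft day is treated as too late,
    while a GM who leaves on draft day still gets credit.
    """
    return start < draft_dates[year] <= end


# Ron Hrovat: started July 23, 1979 and released June 13, 1980
hrovat = (dt.date(1979, 7, 23), dt.date(1980, 6, 13))

print(was_gm_at_draft(*hrovat, 1979))  # False, hired after the 1979 draft
print(was_gm_at_draft(*hrovat, 1980))  # True, still in charge on June 10, 1980
```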

In [12]:
draftDates = {1947: '1947-07-01', 1948: '1948-05-10', 1949: '1949-03-21', 1950: '1950-04-25', 1951: '1951-04-25',
              1952: '1952-04-26', 1953: '1953-04-24', 1954: '1954-04-24', 1955: '1955-04-13', 1956: '1956-04-30',
              1957: '1957-04-17', 1958: '1958-04-22', 1959: '1959-03-31', 1960: '1960-04-11', 1961: '1961-03-27',
              1962: '1962-03-26', 1963: '1963-04-30', 1964: '1964-05-04', 1965: '1965-05-06', 1966: '1966-05-11',
              1967: '1967-05-03', 1968: '1968-04-03', 1969: '1969-04-07', 1970: '1970-03-23', 1971: '1971-03-29',
              1972: '1972-04-10', 1973: '1973-04-24', 1974: '1974-05-28', 1975: '1975-05-29', 1976: '1976-06-08',
              1977: '1977-06-10', 1978: '1978-06-09', 1979: '1979-06-25', 1980: '1980-06-10', 1981: '1981-06-09',
              1982: '1982-06-29', 1983: '1983-06-28', 1984: '1984-06-19', 1985: '1985-06-18', 1986: '1986-06-17',
              1987: '1987-06-22', 1988: '1988-06-28', 1989: '1989-06-27', 1990: '1990-06-27', 1991: '1991-06-25',
              1992: '1992-06-24', 1993: '1993-06-30', 1994: '1994-06-29', 1995: '1995-06-28', 1996: '1996-06-26',
              1997: '1997-06-25', 1998: '1998-06-24', 1999: '1999-06-30', 2000: '2000-06-28', 2001: '2001-06-27', 
              2002: '2002-06-26', 2003: '2003-06-26', 2004: '2004-06-24', 2005: '2005-06-28', 2006: '2006-06-28', 
              2007: '2007-06-28', 2008: '2008-06-26', 2009: '2009-06-25', 2010: '2010-06-24', 2011: '2011-06-23', 
              2012: '2012-06-28', 2013: '2013-06-27', 2014: '2014-06-26', 2015: '2015-06-25', 2016: '2016-06-23'}

## My greatest contribution yet to data
Hand creating a dictionary of all draft dates may be my greatest contribution to the data world yet (and possibly ever). Why there is not a convenient list somewhere I don't know.

I'll now use .map. In the code below, .map looks at the year in the 'start_year' column, takes the draft date for that year from the draftDates dictionary, and populates the 'start_cut' column. .map is a pretty awesome pandas Series method that replaces an explicit 'for' loop over the rows and runs much faster on large columns. A year that isn't a key in the dictionary comes back as NaN.
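
To see the lookup in isolation, here is a minimal sketch on a toy Series. The two dictionary entries are copied from draftDates, and the year 2030 is a made-up value included only to show what happens when a key is missing.

```python
import pandas as pd

# Two entries copied from the draftDates dictionary
draft_dates = {1977: '1977-06-10', 1979: '1979-06-25'}

# 2030 is deliberately not a key in the dictionary
start_year = pd.Series([1977, 1979, 2030])
start_cut = start_year.map(draft_dates)

print(start_cut)
print(start_cut.isna().sum())  # number of years with no draft date found
```

Counting the missing values after a .map like this is a cheap way to catch a year that was left out of the dictionary.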

In [13]:
dframe_team['start_cut'] = dframe_team.start_year.map(draftDates)
dframe_team

Unnamed: 0,Team,Executive,Start,End,start_year,start_cut
6,ATL,Bud Seretean,1975-06-23,1977-01-03,1975,1975-05-29
7,ATL,Mike Storen,1977-01-03,1977-09-27,1977,1977-06-10
8,ATL,Michael Gearon,1977-09-27,1979-07-12,1977,1977-06-10
9,ATL,Lewis Schaffel,1979-07-12,1979-11-19,1979,1979-06-25
10,ATL,Stan Kasten,1979-11-19,1990-02-14,1979,1979-06-25
11,ATL,Pete Babcock,1990-02-14,2003-04-02,1990,1990-06-27
12,ATL,Billy Knight,2003-04-02,2008-05-28,2003,2003-06-26
13,ATL,Rick Sund,2008-05-28,2012-06-25,2008,2008-06-26
14,ATL,Danny Ferry,2012-06-25,2014-09-12,2012,2012-06-28
15,ATL,Mike Budenholzer,2014-09-12,2016-09-30,2014,2014-06-26


Similar to the above code, I need to make a cut-off date based on the draft date of the year after a GM started. This lets me check whether a GM started after the draft in one year and left before the draft the following year. To do this I create another column, 'cut_year', that is just one year after 'start_year'; then, just as with .map above, I populate a new column, 'end_cut', with the draft date of the year following a GM's start.

In [14]:
dframe_team['cut_year'] = dframe_team['Start'].dt.year + 1
dframe_team['end_cut'] = dframe_team.cut_year.map(draftDates)
dframe_team

Unnamed: 0,Team,Executive,Start,End,start_year,start_cut,cut_year,end_cut
6,ATL,Bud Seretean,1975-06-23,1977-01-03,1975,1975-05-29,1976,1976-06-08
7,ATL,Mike Storen,1977-01-03,1977-09-27,1977,1977-06-10,1978,1978-06-09
8,ATL,Michael Gearon,1977-09-27,1979-07-12,1977,1977-06-10,1978,1978-06-09
9,ATL,Lewis Schaffel,1979-07-12,1979-11-19,1979,1979-06-25,1980,1980-06-10
10,ATL,Stan Kasten,1979-11-19,1990-02-14,1979,1979-06-25,1980,1980-06-10
11,ATL,Pete Babcock,1990-02-14,2003-04-02,1990,1990-06-27,1991,1991-06-25
12,ATL,Billy Knight,2003-04-02,2008-05-28,2003,2003-06-26,2004,2004-06-24
13,ATL,Rick Sund,2008-05-28,2012-06-25,2008,2008-06-26,2009,2009-06-25
14,ATL,Danny Ferry,2012-06-25,2014-09-12,2012,2012-06-28,2013,2013-06-27
15,ATL,Mike Budenholzer,2014-09-12,2016-09-30,2014,2014-06-26,2015,2015-06-25


In [15]:
# Converting the 'end_cut' column to a datetime object that can be compared to 'End' and 'Start'
dframe_team['end_cut'] = pd.to_datetime(dframe_team['end_cut'], infer_datetime_format=True, errors='coerce')

# Converting the 'start_cut' column to a datetime object that can be compared to 'Start'
dframe_team['start_cut'] = pd.to_datetime(dframe_team['start_cut'], errors='coerce')
   
# Creating a length of tenure column just to see how long a GM was in office
dframe_team['Tenure'] = dframe_team['End'] - dframe_team['Start']

dframe_team

Unnamed: 0,Team,Executive,Start,End,start_year,start_cut,cut_year,end_cut,Tenure
6,ATL,Bud Seretean,1975-06-23,1977-01-03,1975,1975-05-29,1976,1976-06-08,560 days
7,ATL,Mike Storen,1977-01-03,1977-09-27,1977,1977-06-10,1978,1978-06-09,267 days
8,ATL,Michael Gearon,1977-09-27,1979-07-12,1977,1977-06-10,1978,1978-06-09,653 days
9,ATL,Lewis Schaffel,1979-07-12,1979-11-19,1979,1979-06-25,1980,1980-06-10,130 days
10,ATL,Stan Kasten,1979-11-19,1990-02-14,1979,1979-06-25,1980,1980-06-10,3740 days
11,ATL,Pete Babcock,1990-02-14,2003-04-02,1990,1990-06-27,1991,1991-06-25,4795 days
12,ATL,Billy Knight,2003-04-02,2008-05-28,2003,2003-06-26,2004,2004-06-24,1883 days
13,ATL,Rick Sund,2008-05-28,2012-06-25,2008,2008-06-26,2009,2009-06-25,1489 days
14,ATL,Danny Ferry,2012-06-25,2014-09-12,2012,2012-06-28,2013,2013-06-27,809 days
15,ATL,Mike Budenholzer,2014-09-12,2016-09-30,2014,2014-06-26,2015,2015-06-25,749 days


## The 'Schaffel' Problem
I chose to do the example url with Atlanta because they have GMs that illustrate the hardest problem to deal with. Looking at the dataset above, Lewis Schaffel was hired after July 1, 1979, and replaced on November 19, 1979. I needed accurate start and end cut-off dates that captured that Schaffel was replaced before the 1980 draft. As such, I created accurate cut-off dates and then wrote two simple conditionals that drop GMs who started on or after the date in the 'start_cut' column and were replaced before the date in the 'end_cut' column. Additionally, I dropped GMs whose 'Start' and 'End' dates were both on or before the 'start_cut' date, because these GMs were hired and fired in the same calendar year prior to the draft.
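
As a self-contained illustration of those two conditionals, the sketch below rebuilds just three Atlanta rows by hand, with the dates copied from the table above, and applies the same comparisons. The mask names `between_drafts` and `gone_before_draft` are mine.

```python
import pandas as pd

# Three Atlanta rows, with dates copied from the table above
gms = pd.DataFrame({
    'Executive': ['Michael Gearon', 'Lewis Schaffel', 'Stan Kasten'],
    'Start': pd.to_datetime(['1977-09-27', '1979-07-12', '1979-11-19']),
    'End': pd.to_datetime(['1979-07-12', '1979-11-19', '1990-02-14']),
    'start_cut': pd.to_datetime(['1977-06-10', '1979-06-25', '1979-06-25']),
    'end_cut': pd.to_datetime(['1978-06-09', '1980-06-10', '1980-06-10']),
})

# Hired on or after one draft and gone before the next: never drafted
between_drafts = (gms['Start'] >= gms['start_cut']) & (gms['End'] < gms['end_cut'])
# Hired and gone on or before the draft in the start year: never drafted
gone_before_draft = (gms['Start'] <= gms['start_cut']) & (gms['End'] <= gms['start_cut'])

kept = gms[~(between_drafts | gone_before_draft)]
print(kept['Executive'].tolist())  # ['Michael Gearon', 'Stan Kasten']
```

Only Schaffel satisfies the first condition, so he is the only row removed.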
## Using Datetime objects
Python has datetime objects and a datetime library that make the conditional statements I use below very easy. It's pretty awesome that you can compare dates, add and subtract them to get the number of days, and perform other easy operations on dates in Python. Stata does not have such an easy object as far as I know.
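
A quick sketch of the kind of date arithmetic I mean, using Schaffel's dates from the table above and pandas Timestamps (the variable names are just for illustration):

```python
import pandas as pd

start = pd.Timestamp('1979-07-12')       # Schaffel hired
end = pd.Timestamp('1979-11-19')         # Schaffel replaced
draft_1980 = pd.Timestamp('1980-06-10')  # 1980 draft date

tenure = end - start          # subtracting two dates gives a Timedelta
print(tenure.days)            # 130
print(end < draft_1980)       # True, he was gone before the 1980 draft
print(start + pd.Timedelta(days=30))  # dates can be shifted by a number of days
```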

In [16]:
dframe_team.dtypes

Team                   object
Executive              object
Start          datetime64[ns]
End            datetime64[ns]
start_year              int64
start_cut      datetime64[ns]
cut_year                int64
end_cut        datetime64[ns]
Tenure        timedelta64[ns]
dtype: object

In [17]:
# Drops GM's that were hired after the draft in a given year and released prior to the draft the following year
dframe_team = dframe_team[~((dframe_team['Start'] >= dframe_team['start_cut']) & (dframe_team['End'] < dframe_team['end_cut']))]
# Dropping those GM's hired and fired in the same year before the draft
dframe_team = dframe_team[~((dframe_team['Start'] <= dframe_team['start_cut']) & (dframe_team['End'] <= dframe_team['start_cut']))]
# The tilde, '~', used above is a negation, so the conditional grabs the GM's I don't want and then I make the DataFrame from the negation of those GM's
dframe_team

Unnamed: 0,Team,Executive,Start,End,start_year,start_cut,cut_year,end_cut,Tenure
6,ATL,Bud Seretean,1975-06-23,1977-01-03,1975,1975-05-29,1976,1976-06-08,560 days
7,ATL,Mike Storen,1977-01-03,1977-09-27,1977,1977-06-10,1978,1978-06-09,267 days
8,ATL,Michael Gearon,1977-09-27,1979-07-12,1977,1977-06-10,1978,1978-06-09,653 days
10,ATL,Stan Kasten,1979-11-19,1990-02-14,1979,1979-06-25,1980,1980-06-10,3740 days
11,ATL,Pete Babcock,1990-02-14,2003-04-02,1990,1990-06-27,1991,1991-06-25,4795 days
12,ATL,Billy Knight,2003-04-02,2008-05-28,2003,2003-06-26,2004,2004-06-24,1883 days
13,ATL,Rick Sund,2008-05-28,2012-06-25,2008,2008-06-26,2009,2009-06-25,1489 days
14,ATL,Danny Ferry,2012-06-25,2014-09-12,2012,2012-06-28,2013,2013-06-27,809 days
15,ATL,Mike Budenholzer,2014-09-12,2016-09-30,2014,2014-06-26,2015,2015-06-25,749 days


In [18]:
pd.options.mode.chained_assignment = None # This just stops pandas from popping up an annoying SettingWithCopyWarning
dframe_team.drop(dframe_team.columns[[6,7]], inplace=True, axis=1) # dropping 'cut_year' and 'end_cut' columns; the 'Tenure' column could be interesting
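
An alternative to silencing the warning, assuming it comes from modifying a DataFrame that was produced by filtering another one, is to take an explicit copy after the filter. The sketch below uses a made-up three-row frame with placeholder names 'A', 'B' and 'C'; it is not the approach used in the rest of this post.

```python
import pandas as pd

# Made-up data, only to illustrate the pattern
df = pd.DataFrame({'Executive': ['A', 'B', 'C'],
                   'start_year': [1970, 1979, 1990]})

# .copy() makes the filtered result an independent DataFrame, so adding
# columns to it afterwards is unambiguous
recent = df[df['start_year'] >= 1976].copy()
recent['cut_year'] = recent['start_year'] + 1

# Dropping by column name rather than by position
trimmed = recent.drop(columns=['cut_year'])
print(trimmed)
```

Dropping by name also keeps working if the column order changes, which positional indexes like `[6,7]` do not.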

In [19]:
# Now I'm going to create the draft year column for the first year that each GM drafted.
# A GM who started on or after that year's draft date ('start_cut') first drafts the following year
dframe_team['Draft_year'] = (dframe_team['start_year']+1)[(dframe_team['Start'] >= dframe_team['start_cut'])]
# Everyone else first drafts in the year they started, so fill the rest of 'Draft_year' with 'start_year'
dframe_team['Draft_year'] = dframe_team['Draft_year'].fillna(dframe_team['start_year'])
dframe_team.head()

Unnamed: 0,Team,Executive,Start,End,start_year,start_cut,Tenure,Draft_year
6,ATL,Bud Seretean,1975-06-23,1977-01-03,1975,1975-05-29,560 days,1976.0
7,ATL,Mike Storen,1977-01-03,1977-09-27,1977,1977-06-10,267 days,1977.0
8,ATL,Michael Gearon,1977-09-27,1979-07-12,1977,1977-06-10,653 days,1978.0
10,ATL,Stan Kasten,1979-11-19,1990-02-14,1979,1979-06-25,3740 days,1980.0
11,ATL,Pete Babcock,1990-02-14,2003-04-02,1990,1990-06-27,4795 days,1990.0


In [20]:
# Dropping unneeded columns
dframe_team.drop(dframe_team.columns[[2,3,4,5]],inplace=True,axis=1)

In [21]:
# Now I need to fill in the draft years: first create rows for the missing years, then forward fill
index = dframe_team['Draft_year']
dframe_team.set_index(index, inplace = True) # sets the column 'Draft_year' as the index

dframe_team.head()

Draft_year,Team,Executive,Tenure,Draft_year
1976.0,ATL,Bud Seretean,560 days,1976.0
1977.0,ATL,Mike Storen,267 days,1977.0
1978.0,ATL,Michael Gearon,653 days,1978.0
1980.0,ATL,Stan Kasten,3740 days,1980.0
1990.0,ATL,Pete Babcock,4795 days,1990.0


In [22]:
A = int(dframe_team.iloc[0]['Draft_year']) # grabs the first year of 'Draft_year'
# Set 'Draft_year' as the index and reindex from the team's first year, 'A', through the last year of the data, 2015,
# then bump 'Draft_year' back out as a column so that there are now rows for all the years
dframe_team = dframe_team.set_index('Draft_year').reindex(range(A,2016)).reset_index()

dframe_team.head(10)

Unnamed: 0,Draft_year,Team,Executive,Tenure
0,1976,ATL,Bud Seretean,560 days
1,1977,ATL,Mike Storen,267 days
2,1978,ATL,Michael Gearon,653 days
3,1979,,,NaT
4,1980,ATL,Stan Kasten,3740 days
5,1981,,,NaT
6,1982,,,NaT
7,1983,,,NaT
8,1984,,,NaT
9,1985,,,NaT


In [23]:
# Now that there are rows for all the years, the next step is to forward fill the team, GM and tenure
col = ['Team', 'Executive', 'Tenure']
dframe_team[col] = dframe_team[col].ffill()
    
# Dropping Draft_year rows prior to 1976 and after 2015
dframe_team = dframe_team[((dframe_team['Draft_year'] >= 1976) & (dframe_team['Draft_year'] <= 2015))]

dframe_team.head(10)

Unnamed: 0,Draft_year,Team,Executive,Tenure
0,1976,ATL,Bud Seretean,560 days
1,1977,ATL,Mike Storen,267 days
2,1978,ATL,Michael Gearon,653 days
3,1979,ATL,Michael Gearon,653 days
4,1980,ATL,Stan Kasten,3740 days
5,1981,ATL,Stan Kasten,3740 days
6,1982,ATL,Stan Kasten,3740 days
7,1983,ATL,Stan Kasten,3740 days
8,1984,ATL,Stan Kasten,3740 days
9,1985,ATL,Stan Kasten,3740 days
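
The reindex-then-forward-fill step is easier to see on a tiny frame. The sketch below uses the first four Atlanta draft years from the tables above; stopping at 1982 and the explicit `rename_axis` call are my own choices to keep the example short and the index name unambiguous.

```python
import pandas as pd

# One row per GM, at the first draft year each GM was in charge
sparse = pd.DataFrame({
    'Draft_year': [1976, 1977, 1978, 1980],
    'Executive': ['Bud Seretean', 'Mike Storen', 'Michael Gearon', 'Stan Kasten'],
})

filled = (
    sparse.set_index('Draft_year')
          .reindex(range(1976, 1983))  # adds empty rows for 1979, 1981, 1982
          .rename_axis('Draft_year')   # keep the index name explicit
          .ffill()                     # each gap inherits the previous GM
          .reset_index()
)
print(filled)
```

The 1979 row picks up Michael Gearon and the 1981 and 1982 rows pick up Stan Kasten, matching the full table above.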


## Now putting this all into a loop over the url's
I'm taking the Clippers and the Magic out of my team_list because they are unique cases where multiple people are listed as GM for the same year.

In [24]:
if 'LAC' in team_list: team_list.remove('LAC')
if 'ORL' in team_list: team_list.remove('ORL')  
len(team_list)

28

In [25]:
# Now I should be able to run the loop to append the data to the gm DataFrame
for team in team_list:
    url = url_template.format(team = team)
    
    dframe_team = pd.io.html.read_html(url)
    dframe_team = dframe_team[0]
    
    # Dropping the unneeded columns
    dframe_team.drop(dframe_team.columns[[0,4]],inplace=True,axis=1)
    
    # Renaming the columns
    column_names = ['Executive', 'Start', 'End']
    dframe_team.columns = column_names
    
    # Add in a column for the team abbreviation
    dframe_team.insert(0, 'Team', team)
    
    # Replacing 'present' with today's date
    date = dt.datetime.today().strftime("%m-%d-%Y")
    dframe_team.replace(['present'], [date], inplace=True)
    
    # First converting the 'Start' and 'End' column to a date time object
    dframe_team['Start'] = pd.to_datetime(dframe_team['Start'], errors='coerce')
    dframe_team['End'] = pd.to_datetime(dframe_team['End'], errors='coerce')
    
    # Dropping GM's prior to 1976
    dframe_team = dframe_team[dframe_team['End'].dt.year >= 1976]
    
    # Creating a column for the year that the GM started
    dframe_team['start_year'] = dframe_team['Start'].dt.year
    
    # Using the dictionary to create cut-off date columns
    dframe_team['start_cut'] = dframe_team.start_year.map(draftDates)
    dframe_team['cut_year'] = dframe_team['Start'].dt.year + 1
    dframe_team['end_cut'] = dframe_team.cut_year.map(draftDates)
    
    # Converting the 'start_cut' column to a datetime object that can be compared to 'Start'
    dframe_team['start_cut'] = pd.to_datetime(dframe_team['start_cut'], errors='coerce')
    
    # Converting the 'end_cut' column to a datetime object that can be compared with 'End' and 'Start'
    dframe_team['end_cut'] = pd.to_datetime(dframe_team['end_cut'], infer_datetime_format=True, errors='coerce')
        
    # Creating a length of tenure column to also use in eliminating GM's that lasted less than a year and never drafted
    dframe_team['Tenure'] = dframe_team['End'] - dframe_team['Start']
    
    # Drops GM's that were hired after the draft in a given year and released prior to the draft the following year
    dframe_team = dframe_team[~((dframe_team['Start'] >= dframe_team['start_cut']) & (dframe_team['End'] < dframe_team['end_cut']))]
    # Dropping those GM's hired and fired in the same year before the draft
    dframe_team = dframe_team[~((dframe_team['Start'] <= dframe_team['start_cut']) & (dframe_team['End'] <= dframe_team['start_cut']))]
 
    # Dropping those most recent columns of 'cut_year' and 'end_cut'
    dframe_team.drop(dframe_team.columns[[6,7]],inplace=True,axis=1)
    
    # Now I'm going to create the draft year column for the first year that each GM drafted.
    # A GM who started on or after that year's draft date ('start_cut') first drafts the following year
    dframe_team['Draft_year'] = (dframe_team['start_year']+1)[(dframe_team['Start'] >= dframe_team['start_cut'])]
    # Everyone else first drafts in the year they started, so fill the rest of 'Draft_year' with 'start_year'
    dframe_team['Draft_year'] = dframe_team['Draft_year'].fillna(dframe_team['start_year'])
    # Dropping those useless columns
    dframe_team.drop(dframe_team.columns[[2,3,4,5]],inplace=True,axis=1)
      
    # Now I need to forward fill in the draft years
    index = dframe_team['Draft_year']
    dframe_team.set_index(index, inplace = True)
    
    A = int(dframe_team.iloc[0]['Draft_year'])
    
    # Reindex from the team's first draft year through 2015 so that every year has a row
    dframe_team = dframe_team.set_index('Draft_year').reindex(range(A,2016)).reset_index()
    
    # Now there are rows for all the years and the next step is to forward fill the team and GM
    col = ['Team', 'Executive', 'Tenure']
    dframe_team[col] = dframe_team[col].ffill()
  
    # Dropping Draft_year rows prior to 1976 and after 2015
    dframe_team = dframe_team[((dframe_team['Draft_year'] >= 1976) & (dframe_team['Draft_year'] <= 2015))]
    
    # Append to gm DataFrame (pd.concat does the same job as DataFrame.append, which newer pandas versions no longer provide)
    gm_df = pd.concat([gm_df, dframe_team], ignore_index=True)

In [26]:
gm_df.head()

Unnamed: 0,Draft_year,Team,Executive,Tenure
0,1976,MIL,Wayne Embry,1866 days
1,1977,MIL,Don Nelson,3694 days
2,1978,MIL,Don Nelson,3694 days
3,1979,MIL,Don Nelson,3694 days
4,1980,MIL,Don Nelson,3694 days


In [27]:
gm_df.tail()

Unnamed: 0,Draft_year,Team,Executive,Tenure
1013,2011,IND,Larry Bird,3274 days
1014,2012,IND,Donnie Walsh,365 days
1015,2013,IND,Donnie Walsh,365 days
1016,2014,IND,Larry Bird,1191 days
1017,2015,IND,Larry Bird,1191 days


## Now to deal with Orlando and the LA Clippers
The issue I'm confronting is the same for both teams: essentially, I need to reshape the rows a little. I'm going to just manually update these dataframes since the amount of needed alteration is pretty minimal (5 changes total).

In [28]:
url = 'http://www.basketball-reference.com/teams/ORL/executives.html'

In [29]:
dframe_team = pd.io.html.read_html(url)
dframe_team = dframe_team[0]
    
# Dropping the unneeded columns
dframe_team.drop(dframe_team.columns[[0,4]],inplace=True,axis=1)
    
# Renaming the columns
column_names = ['Executive', 'Start', 'End']
dframe_team.columns = column_names
    
# Add in a column for the team abbreviation
dframe_team.insert(0, 'Team', 'ORL')
    
dframe_team

Unnamed: 0,Team,Executive,Start,End
0,ORL,Pat Williams,1987,1996-04-29
1,ORL,John Gabriel,1996-04-29,2004-03-12
2,ORL,John Weisbrod,2004-03-12,2005-05-23
3,ORL,Dave Twardzik,2005-03-23,2006-05-03
4,ORL,Otis Smith,2005-03-23,2012-05-21
5,ORL,Rob Hennigan,2012-06-20,present


As can be seen, the problem with the above is that Dave Twardzik and Otis Smith actually shared GM duties from March 23, 2005 to May 3, 2006. This causes an error in the loop above, so I need to reshape the rows so that Otis Smith is listed alongside Dave Twardzik during Twardzik's tenure and Smith's tenure as sole GM starts May 3, 2006.
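
Overlaps like this one could also be flagged automatically before running the loop. The sketch below is a hypothetical check, not something used elsewhere in this post: it assumes the rows are in chronological order, and it rebuilds four of the Orlando rows by hand with the dates exactly as listed above.

```python
import pandas as pd

# Four Orlando rows, with dates exactly as listed in the table above
orl = pd.DataFrame({
    'Executive': ['John Gabriel', 'John Weisbrod', 'Dave Twardzik', 'Otis Smith'],
    'Start': pd.to_datetime(['1996-04-29', '2004-03-12', '2005-03-23', '2005-03-23']),
    'End': pd.to_datetime(['2004-03-12', '2005-05-23', '2006-05-03', '2012-05-21']),
})

# A row overlaps if it starts before the previous row's tenure has ended
overlaps = orl['Start'] < orl['End'].shift(1)
print(orl.loc[overlaps, 'Executive'].tolist())  # ['Dave Twardzik', 'Otis Smith']
```

Twardzik is flagged as well as Smith because Weisbrod's listed end date (2005-05-23) falls after Twardzik's listed start (2005-03-23).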

In [30]:
# Fastest way I know to do this: set the two cells by row label and column name
dframe_team.loc[3, 'Executive'] = 'Dave Twardzik, Otis Smith'
dframe_team.loc[4, 'Start'] = '2006-05-03'
dframe_team

Unnamed: 0,Team,Executive,Start,End
0,ORL,Pat Williams,1987,1996-04-29
1,ORL,John Gabriel,1996-04-29,2004-03-12
2,ORL,John Weisbrod,2004-03-12,2005-05-23
3,ORL,"Dave Twardzik, Otis Smith",2005-03-23,2006-05-03
4,ORL,Otis Smith,2006-05-03,2012-05-21
5,ORL,Rob Hennigan,2012-06-20,present


In [31]:
# Replacing present with today's date
date = dt.datetime.today().strftime("%m-%d-%Y")
dframe_team.replace(['present'], [date], inplace=True)

# First converting the 'Start' and 'End' column to a date time object
dframe_team['Start'] = pd.to_datetime(dframe_team['Start'], errors='coerce')
dframe_team['End'] = pd.to_datetime(dframe_team['End'], errors='coerce')
    
# Dropping GMs whose tenure ended before 1976 (.copy() avoids a SettingWithCopyWarning on the column assignments below)
dframe_team = dframe_team[dframe_team['End'].dt.year >= 1976].copy()

# Creating a column for the year that the GM started
dframe_team['start_year'] = dframe_team['Start'].dt.year
    
# Using the dictionary to create cut-off date columns
dframe_team['start_cut'] = dframe_team.start_year.map(draftDates)
dframe_team['cut_year'] = dframe_team['Start'].dt.year + 1
dframe_team['end_cut'] = dframe_team.cut_year.map(draftDates)
    
# Converting the 'start_cut' column to a datetime object that can be compared to 'Start'
dframe_team['start_cut'] = pd.to_datetime(dframe_team['start_cut'], errors='coerce')
    
# Converting the 'end_cut' column to a datetime object that can be compared with 'End' and 'Start'
dframe_team['end_cut'] = pd.to_datetime(dframe_team['end_cut'], errors='coerce')
        
# Creating a length of tenure column to also use in eliminating GM's that lasted less than a year and never drafted
dframe_team['Tenure'] = dframe_team['End'] - dframe_team['Start']
    
# Drops GM's that were hired after the draft in a given year and released prior to the draft the following year
dframe_team = dframe_team[~((dframe_team['Start'] >= dframe_team['start_cut']) & (dframe_team['End'] < dframe_team['end_cut']))]
# Dropping those GM's hired and fired in the same year before the draft
dframe_team = dframe_team[~((dframe_team['Start'] <= dframe_team['start_cut']) & (dframe_team['End'] <= dframe_team['start_cut']))]
 
# Dropping those most recent columns of 'cut_year' and 'end_cut'
dframe_team.drop(dframe_team.columns[[6,7]],inplace=True,axis=1)
    
# Now I'm going to create the draft year column for the first year that each GM drafted.
# If the GM started on or after that year's draft date (start_cut), his first draft is the following year
dframe_team['Draft_year'] = (dframe_team['start_year']+1)[(dframe_team['Start'] >= dframe_team['start_cut'])]
# Otherwise his first draft is in the year he started
dframe_team['Draft_year'] = dframe_team['Draft_year'].fillna(dframe_team['start_year'])
# Dropping those useless columns
dframe_team.drop(dframe_team.columns[[2,3,4,5]],inplace=True,axis=1)
      
# Now I need a row for every draft year, from the first GM's first draft through 2015
A = int(dframe_team.iloc[0]['Draft_year'])
    
dframe_team = dframe_team.set_index('Draft_year').reindex(range(A,2016)).reset_index()
    
# Now there are rows for all the years and the next step is to forward fill the team and GM
col = ['Team', 'Executive', 'Tenure']
dframe_team[col] = dframe_team[col].ffill()
  
# Dropping Draft_year rows prior to 1976 and after 2015
dframe_team = dframe_team[((dframe_team['Draft_year'] >= 1976) & (dframe_team['Draft_year'] <= 2015))]
    
# Append to gm DataFrame (pd.concat, since DataFrame.append was removed in pandas 2.0)
gm_df = pd.concat([gm_df, dframe_team], ignore_index=True)
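The reindex-then-forward-fill step is easier to see on a toy frame. This is just a sketch: the GM names and years are made up, and `expand_to_every_year` is a hypothetical helper rather than anything used in the cells above.

```python
import pandas as pd

# Made-up data: one row per GM, keyed by the first year he drafted
gms = pd.DataFrame({
    'Team': ['ORL', 'ORL'],
    'Executive': ['GM A', 'GM B'],
    'Draft_year': [2001, 2004],
})


def expand_to_every_year(df, last_year):
    """Hypothetical helper: one row per draft year, carrying each GM forward."""
    first_year = int(df['Draft_year'].iloc[0])
    full = (df.set_index('Draft_year')
              .reindex(range(first_year, last_year + 1))  # adds empty rows for missing years
              .rename_axis('Draft_year')
              .reset_index())
    # Each GM covers every year until the next GM's first draft
    full[['Team', 'Executive']] = full[['Team', 'Executive']].ffill()
    return full


expanded = expand_to_every_year(gms, 2005)
print(expanded)
```

Note that `reindex` needs unique index labels, so two GMs with the same `Draft_year` have to be combined or adjusted first.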

In [32]:
url = 'http://www.basketball-reference.com/teams/LAC/executives.html'
dframe_team = pd.read_html(url)
dframe_team = dframe_team[0]
    
# Dropping the unneeded columns
dframe_team.drop(dframe_team.columns[[0,4]],inplace=True,axis=1)
    
# Renaming the columns
column_names = ['Executive', 'Start', 'End']
dframe_team.columns = column_names
    
# Add in a column for the team abbreviation
dframe_team.insert(0, 'Team', 'LAC')

In [33]:
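# Same kind of fix as for Orlando: combine the executives into row 11,
# drop the now-redundant row 12, and move row 13's start date
dframe_team.drop([12], inplace=True)
dframe_team.at[11, 'Executive'] = 'Vinny Del Negro, Andy Roeser, Gary Sacks'
dframe_team.at[13, 'Start'] = '2012-09-04'
dframe_team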

Unnamed: 0,Team,Executive,Start,End
0,LAC,Eddie Donovan,1970,1975-03-20
1,LAC,Bob MacKinnon,1975-05-21,1977-03-25
2,LAC,Norm Sonju,1977-03-25,1978-07-19
3,LAC,Irv Kaze,1978-07-19,1980-06-06
4,LAC,Lou Lenart,1980-06-06,1981-01-28
5,LAC,Ted Podleski,1981-01-28,1982-10-18
6,LAC,Paul Phipps,1982-10-18,1984-03-23
7,LAC,Carl Scheer,1984-07-18,1986-04-15
8,LAC,Elgin Baylor,1986-04-15,2008-10-07
9,LAC,Mike Dunleavy,2008-10-07,2010-03-09


In [34]:
# Replacing 'present' with today's date (ISO format, matching the other dates in the column)
date = dt.datetime.today().strftime("%Y-%m-%d")
dframe_team.replace(['present'], [date], inplace=True)

In [35]:
# First converting the 'Start' and 'End' column to a date time object
dframe_team['Start'] = pd.to_datetime(dframe_team['Start'], errors='coerce')
dframe_team['End'] = pd.to_datetime(dframe_team['End'], errors='coerce')
    
# Dropping GMs whose tenure ended before 1976 (.copy() avoids a SettingWithCopyWarning on the column assignments below)
dframe_team = dframe_team[dframe_team['End'].dt.year >= 1976].copy()
    
# Creating a column for the year that the GM started
dframe_team['start_year'] = dframe_team['Start'].dt.year
    
# Using the dictionary to create cut-off date columns
dframe_team['start_cut'] = dframe_team.start_year.map(draftDates)
dframe_team['cut_year'] = dframe_team['Start'].dt.year + 1
dframe_team['end_cut'] = dframe_team.cut_year.map(draftDates)
    
# Converting the 'start_cut' column to a datetime object that can be compared to 'Start'
dframe_team['start_cut'] = pd.to_datetime(dframe_team['start_cut'], errors='coerce')
    
# Converting the 'end_cut' column to a datetime object that can be compared with 'End' and 'Start'
dframe_team['end_cut'] = pd.to_datetime(dframe_team['end_cut'], errors='coerce')
        
# Creating a length of tenure column to also use in eliminating GM's that lasted less than a year and never drafted
dframe_team['Tenure'] = dframe_team['End'] - dframe_team['Start']

In [36]:
# Drops GM's that were hired after the draft in a given year and released prior to the draft the following year
dframe_team = dframe_team[~((dframe_team['Start'] >= dframe_team['start_cut']) & (dframe_team['End'] < dframe_team['end_cut']))]
# Dropping those GM's hired and fired in the same year before the draft
dframe_team = dframe_team[~((dframe_team['Start'] <= dframe_team['start_cut']) & (dframe_team['End'] <= dframe_team['start_cut']))]
 
dframe_team

Unnamed: 0,Team,Executive,Start,End,start_year,start_cut,cut_year,end_cut,Tenure
1,LAC,Bob MacKinnon,1975-05-21,1977-03-25,1975,1975-05-29,1976,1976-06-08,674 days
2,LAC,Norm Sonju,1977-03-25,1978-07-19,1977,1977-06-10,1978,1978-06-09,481 days
3,LAC,Irv Kaze,1978-07-19,1980-06-06,1978,1978-06-09,1979,1979-06-25,688 days
4,LAC,Lou Lenart,1980-06-06,1981-01-28,1980,1980-06-10,1981,1981-06-09,236 days
5,LAC,Ted Podleski,1981-01-28,1982-10-18,1981,1981-06-09,1982,1982-06-29,628 days
6,LAC,Paul Phipps,1982-10-18,1984-03-23,1982,1982-06-29,1983,1983-06-28,522 days
7,LAC,Carl Scheer,1984-07-18,1986-04-15,1984,1984-06-19,1985,1985-06-18,636 days
8,LAC,Elgin Baylor,1986-04-15,2008-10-07,1986,1986-06-17,1987,1987-06-22,8211 days
9,LAC,Mike Dunleavy,2008-10-07,2010-03-09,2008,2008-06-26,2009,2009-06-25,518 days
10,LAC,Neil Olshey,2010-03-09,2012-06-04,2010,2010-06-24,2011,2011-06-23,818 days
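Since every row in the table above survived the two filters, here's a small sketch of what each one removes. All names and dates below are made up for illustration (including the draft cut-off dates), so treat it as a toy case rather than real data.

```python
import pandas as pd

# Made-up GMs and made-up draft cut-off dates, purely for illustration
rows = pd.DataFrame({
    'Executive': ['Kept GM', 'Between-drafts GM', 'Pre-draft GM'],
    'Start': ['2000-05-01', '2000-08-01', '2000-01-10'],
    'End': ['2003-04-01', '2001-03-01', '2000-04-15'],
    'start_cut': ['2000-06-28'] * 3,  # draft date in the year the GM started
    'end_cut': ['2001-06-27'] * 3,    # draft date the following year
})
for col in ['Start', 'End', 'start_cut', 'end_cut']:
    rows[col] = pd.to_datetime(rows[col])

# Hired on/after the draft and gone before the next one: never drafted
between_drafts = (rows['Start'] >= rows['start_cut']) & (rows['End'] < rows['end_cut'])
# Hired and gone before the draft in the same year: never drafted
gone_before_draft = (rows['Start'] <= rows['start_cut']) & (rows['End'] <= rows['start_cut'])

kept = rows[~(between_drafts | gone_before_draft)]
print(kept['Executive'].tolist())
```

Combining the two masks with `|` is equivalent to applying the filters one after the other, as the cell above does.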


In [37]:
# Dropping those most recent columns of 'cut_year' and 'end_cut'
dframe_team.drop(dframe_team.columns[[6,7]],inplace=True,axis=1)
    
# Now I'm going to create the draft year column for the first year that each GM drafted.
# If the GM started on or after that year's draft date (start_cut), his first draft is the following year
dframe_team['Draft_year'] = (dframe_team['start_year']+1)[(dframe_team['Start'] >= dframe_team['start_cut'])]
# Otherwise his first draft is in the year he started
dframe_team['Draft_year'] = dframe_team['Draft_year'].fillna(dframe_team['start_year'])
# Dropping those useless columns
dframe_team.drop(dframe_team.columns[[2,3,4,5]],inplace=True,axis=1)
      
# Now I need a row for every draft year, from the first GM's first draft through 2015
A = int(dframe_team.iloc[0]['Draft_year'])
    
dframe_team = dframe_team.set_index('Draft_year').reindex(range(A,2016)).reset_index()
    
# Now there are rows for all the years and the next step is to forward fill the team and GM
col = ['Team', 'Executive', 'Tenure']
dframe_team[col] = dframe_team[col].ffill()
  
# Dropping Draft_year rows prior to 1976 and after 2015
dframe_team = dframe_team[((dframe_team['Draft_year'] >= 1976) & (dframe_team['Draft_year'] <= 2015))]
    
# Append to gm DataFrame (pd.concat, since DataFrame.append was removed in pandas 2.0)
gm_df = pd.concat([gm_df, dframe_team], ignore_index=True)

In [38]:
gm_df.tail(15)

Unnamed: 0,Draft_year,Team,Executive,Tenure
1072,2001,LAC,Elgin Baylor,8211 days
1073,2002,LAC,Elgin Baylor,8211 days
1074,2003,LAC,Elgin Baylor,8211 days
1075,2004,LAC,Elgin Baylor,8211 days
1076,2005,LAC,Elgin Baylor,8211 days
1077,2006,LAC,Elgin Baylor,8211 days
1078,2007,LAC,Elgin Baylor,8211 days
1079,2008,LAC,Elgin Baylor,8211 days
1080,2009,LAC,Mike Dunleavy,518 days
1081,2010,LAC,Neil Olshey,818 days


In [39]:
gm_df = gm_df.rename(columns = {'Draft_year':'Draft_Yr'})
gm_df.head()

Unnamed: 0,Draft_Yr,Team,Executive,Tenure
0,1976,MIL,Wayne Embry,1866 days
1,1977,MIL,Don Nelson,3694 days
2,1978,MIL,Don Nelson,3694 days
3,1979,MIL,Don Nelson,3694 days
4,1980,MIL,Don Nelson,3694 days


In [40]:
# Loading up the draft dataframe
draft_dframe = pd.read_csv('NBA_Data/1976_to_2015_Draftees.csv')

# Need to replace the old team abbreviations before merging with the GM dataframe
team_abbrv = {'MIA':'MIA', 'WSB':'WAS', 'MIL':'MIL', 'GSW':'GSW', 'MIN':'MIN', 'ATL':'ATL', 'BOS':'BOS', 'DET':'DET',
             'NYK':'NYK', 'DEN':'DEN', 'DAL':'DAL', 'OKC':'OKC', 'POR':'POR', 'NJN':'NJN', 'TOR':'TOR', 'SEA':'OKC',
             'CLE':'CLE', 'SAS':'SAS', 'BUF':'LAC', 'CHA':'CHA', 'CHH':'NOH', 'UTA':'UTA', 'VAN':'MEM', 'CHI':'CHI',
             'HOU':'HOU', 'NOH':'NOH', 'WAS':'WAS', 'LAL':'LAL', 'PHI':'PHI', 'PHO':'PHO', 'NOJ':'UTA', 'NOK':'NOH',
             'LAC':'LAC', 'KCK':'SAC', 'SAC':'SAC', 'ORL':'ORL', 'BRK':'NJN', 'SDC':'LAC', 'IND':'IND', 'MEM':'MEM',
             'NOP':'NOH', 'CHO':'CHA'}

draft_dframe['Team'] = draft_dframe.Team.map(team_abbrv)
draft_dframe.head()

Unnamed: 0.1,Unnamed: 0,Player,All_NBA,All-Star,Draft_Yr,Pk,Team,College,Yrs,Games,...,TP_Percentage,FT_Percentage,Minutes per Game,Points per Game,TRB per game,Assits per Game,Win Share,WS_per_game,BPM,VORP
0,0,Tim Duncan,15.0,15.0,1997.0,1.0,SAS,Wake Forest University,19.0,1392.0,...,0.179,0.696,34.0,19.0,10.8,3.0,206.4,0.209,5.5,89.3
1,1,Kobe Bryant,15.0,18.0,1996.0,13.0,NOH,0,20.0,1346.0,...,0.329,0.837,36.1,25.0,5.2,4.7,172.7,0.17,3.9,72.1
2,2,Shaquille O'Neal,14.0,15.0,1992.0,1.0,ORL,Louisiana State University,19.0,1207.0,...,0.045,0.527,34.7,23.7,10.9,2.5,181.7,0.208,5.0,74.0
3,4,Karl Malone,14.0,14.0,1985.0,13.0,UTA,Louisiana Tech University,19.0,1476.0,...,0.274,0.742,37.2,25.0,10.1,3.6,234.6,0.205,5.4,102.5
4,6,Hakeem Olajuwon,12.0,12.0,1984.0,1.0,HOU,University of Houston,18.0,1238.0,...,0.202,0.712,35.7,21.8,11.1,2.5,162.8,0.177,4.9,77.1
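One thing to watch with `.map` and a dictionary is that any abbreviation missing from the dictionary silently becomes NaN. A quick sketch of a check for that, using a three-entry subset of the mapping above and a made-up abbreviation `'XYZ'`:

```python
import pandas as pd

# Subset of the mapping above, just for the demo
team_abbrv_demo = {'WSB': 'WAS', 'SEA': 'OKC', 'SAS': 'SAS'}

# 'XYZ' is a made-up abbreviation that is not in the mapping
teams = pd.Series(['WSB', 'SEA', 'SAS', 'XYZ'])

mapped = teams.map(team_abbrv_demo)

# Anything that turned into NaN was missing from the dictionary
unmapped = sorted(teams[mapped.isna()].unique())
print(unmapped)
```

An empty list here means every team in the draft data has a counterpart in the GM data.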


In [41]:
df_draft = pd.merge(draft_dframe, gm_df, on = ['Team', 'Draft_Yr'], how = 'outer')

# Now dropping NaN filled rows that are the result of a GM not having a draft pick that year
df_draft = df_draft[df_draft.Player.notnull()]

# Dropping the first column, a leftover index column ('Unnamed: 0') from the saved draft CSV
df_draft.drop(df_draft.columns[[0]],inplace=True,axis=1)

df_draft.head()

Unnamed: 0,Player,All_NBA,All-Star,Draft_Yr,Pk,Team,College,Yrs,Games,Minutes Played,...,Minutes per Game,Points per Game,TRB per game,Assits per Game,Win Share,WS_per_game,BPM,VORP,Executive,Tenure
0,Tim Duncan,15.0,15.0,1997.0,1.0,SAS,Wake Forest University,19.0,1392.0,47368.0,...,34.0,19.0,10.8,3.0,206.4,0.209,5.5,89.3,Gregg Popovich,2953 days
1,Kobe Bryant,15.0,18.0,1996.0,13.0,NOH,0,20.0,1346.0,48637.0,...,36.1,25.0,5.2,4.7,172.7,0.17,3.9,72.1,Bob Bass,3292 days
2,Tony Delk,0.0,0.0,1996.0,16.0,NOH,University of Kentucky,10.0,545.0,11702.0,...,21.5,9.1,2.5,1.9,19.5,0.08,-1.3,2.1,Bob Bass,3292 days
3,Malik Rose,0.0,0.0,1996.0,44.0,NOH,Drexel University,13.0,813.0,13404.0,...,16.5,6.2,4.1,0.8,26.5,0.095,-1.4,1.9,Bob Bass,3292 days
4,Shaquille O'Neal,14.0,15.0,1992.0,1.0,ORL,Louisiana State University,19.0,1207.0,41918.0,...,34.7,23.7,10.9,2.5,181.7,0.208,5.0,74.0,Pat Williams,3406 days
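An alternative to the outer merge plus the `Player` filter would be a left merge, which keeps only the draftee rows to begin with. Below is a sketch on made-up data; the `validate` and `indicator` arguments are optional extras that check each (Team, Draft_Yr) pair matches at most one GM row and flag draftees with no GM match.

```python
import pandas as pd

# Made-up players and GMs, purely for illustration
players = pd.DataFrame({
    'Player': ['P1', 'P2', 'P3'],
    'Team': ['SAS', 'ORL', 'ORL'],
    'Draft_Yr': [1997, 1992, 1993],
})
gms = pd.DataFrame({
    'Team': ['SAS', 'ORL', 'ORL'],
    'Draft_Yr': [1997, 1992, 1994],
    'Executive': ['GM S', 'GM O', 'GM O'],
})

merged = players.merge(
    gms,
    on=['Team', 'Draft_Yr'],
    how='left',               # keep every draftee, drop GM-only rows
    validate='many_to_one',   # raise if a (Team, Draft_Yr) pair has two GM rows
    indicator=True,           # adds a '_merge' column showing where each row came from
)

# Draftees that found no GM for their team and draft year
missing_gm = merged[merged['_merge'] == 'left_only']
print(missing_gm['Player'].tolist())
```

If `missing_gm` is empty, the null check should also come back clean for the `Executive` column.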


In [42]:
# Checking for missing data
df_draft.isnull().sum().sum()

0

In [43]:
cd NBA_Data

/Users/rorypulvino/Dropbox (Personal)/Python/blog/content/NBA_Data


In [44]:
df_draft.to_csv('1976_to_2015_Draftees.csv')

## Where to Next
Now that I have all the datasets merged, I need to:
1. Scrape, clean, and merge coaching data
2. Deal with pre-season draftee trades. Though there is a [site][1] that has draft-day deals, it doesn't contain info on trades of draftees made after the draft but before the start of the season. I've seen at least some of this information on Wikipedia, so I might have to grab it piece by piece from there. Luckily, there are not many of these trades, but some are quite significant (see Charlotte 1996).
I plan to have the team that traded for the draftee reflected as the team that drafted the player, since it was that team/GM/coach that saw the value (or lack of value) in the player. Furthermore, basketball-reference lays out the draft table by the team that drafted the player, even if that player was drafted at the request of another team with an agreed-upon trade in place (see OKC and Domantas Sabonis in 2016, where Orlando drafted Sabonis at the request of OKC).
[1]: http://www.thedraftreview.com/
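One way the trade adjustment could work is a small hand-built override table keyed on player and draft year. This is only a sketch of the idea: `trade_overrides` is a hypothetical name, the Sabonis entry comes from the example above, and the second row is a made-up placeholder.

```python
import pandas as pd

draftees = pd.DataFrame({
    'Player': ['Domantas Sabonis', 'Placeholder Pick'],  # second row is made up
    'Draft_Yr': [2016, 2016],
    'Team': ['ORL', 'BOS'],
})

# Hypothetical hand-built table: (player, draft year) -> team that traded for him
trade_overrides = {('Domantas Sabonis', 2016): 'OKC'}

# Swap in the acquiring team where an override exists, otherwise keep the drafting team
draftees['Team'] = [
    trade_overrides.get((player, int(year)), team)
    for player, year, team in zip(draftees['Player'], draftees['Draft_Yr'], draftees['Team'])
]
print(draftees)
```

Applying the overrides before merging on `Team` and `Draft_Yr` would attach the acquiring team's GM to the player.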