# Visualizing NBA Draftee Location Data

The NBA is known as a truly global organization, with players hailing from all over the world. The [Wikipedia page](https://en.wikipedia.org/wiki/NBA_draft) for the NBA draft covers the draft in detail, including how it became much more international in the late 1990s.

This notebook leverages the `Draft` class from the `py_ball` package to explore the `drafthistory` endpoint of the NBA Stats API with the goal of visualizing the locations of NBA draftees' previous organizations.

In [1]:
from geopandas import GeoDataFrame
import numpy as np
import pandas as pd

from geopy.geocoders import Bing
from geopy.extra.rate_limiter import RateLimiter

import plotly.express as px
import plotly as py
import plotly.graph_objects as go

from py_ball import draft

HEADERS = {'Connection': 'keep-alive',
           'Host': 'stats.nba.com',
           'Origin': 'http://stats.nba.com',
           'Upgrade-Insecure-Requests': '1',
           'Referer': 'stats.nba.com',
           'x-nba-stats-origin': 'stats',
           'x-nba-stats-token': 'true',
           'Accept-Language': 'en-US,en;q=0.9',
           "X-NewRelic-ID": "VQECWF5UChAHUlNTBwgBVw==",
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)' +\
                         ' AppleWebKit/537.36 (KHTML, like Gecko)' + \
                         ' Chrome/81.0.4044.129 Safari/537.36'}

pd.options.mode.chained_assignment = None  # Disabling pandas SetWithCopyWarnings

In [2]:
league_id = '00'

draft_data = draft.Draft(headers=HEADERS,
                         endpoint='drafthistory',
                         league_id=league_id)
draft_data.data.keys()

dict_keys(['DraftHistory'])

In [3]:
draft_df = pd.DataFrame(draft_data.data['DraftHistory'])
draft_df.head()

Unnamed: 0,PERSON_ID,PLAYER_NAME,SEASON,ROUND_NUMBER,ROUND_PICK,OVERALL_PICK,DRAFT_TYPE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,ORGANIZATION,ORGANIZATION_TYPE
0,1629627,Zion Williamson,2019,1,1,1,Draft,1610612740,New Orleans,Pelicans,NOP,Duke,College/University
1,1629630,Ja Morant,2019,1,2,2,Draft,1610612763,Memphis,Grizzlies,MEM,Murray State,College/University
2,1629628,RJ Barrett,2019,1,3,3,Draft,1610612752,New York,Knicks,NYK,Duke,College/University
3,1629631,De'Andre Hunter,2019,1,4,4,Draft,1610612747,Los Angeles,Lakers,LAL,Virginia,College/University
4,1629636,Darius Garland,2019,1,5,5,Draft,1610612739,Cleveland,Cavaliers,CLE,Vanderbilt,College/University


In [4]:
draft_df.tail()

Unnamed: 0,PERSON_ID,PLAYER_NAME,SEASON,ROUND_NUMBER,ROUND_PICK,OVERALL_PICK,DRAFT_TYPE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,ORGANIZATION,ORGANIZATION_TYPE
8018,79278,Bob Alemeida,1947,0,0,0,Draft,1610612738,Boston,Celtics,BOS,,
8019,79279,George Felt,1947,0,0,0,Draft,1610612738,Boston,Celtics,BOS,Northwestern,College/University
8020,79280,John Kelly,1947,0,0,0,Draft,1610612738,Boston,Celtics,BOS,Notre Dame,College/University
8021,79281,George Petrovick,1947,0,0,0,Draft,1610612738,Boston,Celtics,BOS,,
8022,79282,Ralph Bishop,1947,0,0,0,Draft,1610610025,Chicago,Stags,CHS,Washington,College/University


The above shows a snapshot of the data available from the `drafthistory` endpoint. Features include:

- Player name and ID, along with name and type of organization from which a player was drafted
- Season, round, and pick information
- Drafting team name, city, and ID

The data dates back to 1947, the year of the first draft, although round and pick information appears to be unavailable for that draft (the values are recorded as 0). The missing information can be found [here](https://en.wikipedia.org/wiki/1947_BAA_draft).

Let's explore some of the features available.

In [5]:
draft_df.groupby('DRAFT_TYPE')['PERSON_ID'].count()

DRAFT_TYPE
Draft          8002
Territorial      21
Name: PERSON_ID, dtype: int64

Looks like the vast majority of selections come via the draft, but there are 21 "Territorial" selections. I'm not sure what this is, so let's investigate further.

In [6]:
draft_df[draft_df['DRAFT_TYPE']=='Territorial']

Unnamed: 0,PERSON_ID,PLAYER_NAME,SEASON,ROUND_NUMBER,ROUND_PICK,OVERALL_PICK,DRAFT_TYPE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,ORGANIZATION,ORGANIZATION_TYPE
6277,76233,Bill Bradley,1965,0,0,0,Territorial,1610612752,New York,Knicks,NYK,Princeton,College/University
6278,76832,Gail Goodrich,1965,0,0,0,Territorial,1610612747,Los Angeles,Lakers,LAL,California-Los Angeles,College/University
6279,76302,Bill Buntin,1965,0,0,0,Territorial,1610612765,Detroit,Pistons,DET,Michigan,College/University
6379,78579,George Wilson,1964,0,0,0,Territorial,1610612758,Cincinnati,Royals,CIN,Cincinnati,College/University
6380,76983,Walt Hazzard,1964,0,0,0,Territorial,1610612747,Los Angeles,Lakers,LAL,California-Los Angeles,College/University
6464,78309,Tom Thacker,1963,0,0,0,Territorial,1610612758,Cincinnati,Royals,CIN,Cincinnati,College/University
6565,76545,Dave DeBusschere,1962,0,0,0,Territorial,1610612765,Detroit,Pistons,DET,Detroit Mercy,College/University
6566,77418,Jerry Lucas,1962,0,0,0,Territorial,1610612758,Cincinnati,Royals,CIN,Ohio State,College/University
6857,76375,Wilt Chamberlain,1959,0,0,0,Territorial,1610612744,Philadelphia,Warriors,PHW,Kansas,College/University
6858,76712,Bob Ferry,1959,0,0,0,Territorial,1610612737,St. Louis,Hawks,STL,Saint Louis,College/University


At first glance, these selections seem to be from schools in close proximity to the selecting teams. After a bit of research, this seems to be the case! You can read more about the territorial pick [here](https://en.wikipedia.org/wiki/NBA_territorial_pick), but these picks were instituted to gain fan support by allowing hometown favorites in college to be selected by the corresponding professional team.

You may notice that Wilt Chamberlain was selected by the Philadelphia Warriors despite attending the University of Kansas. However, the Warriors successfully argued that, because Chamberlain grew up in Philadelphia, they had his territorial rights, as described [here](https://www.nba.com/history/legends/profiles/wilt-chamberlain).

In [7]:
draft_df.groupby('ORGANIZATION_TYPE')['PERSON_ID'].count()

ORGANIZATION_TYPE
                        20
College/University    7660
High School             48
Other Team/Club        295
Name: PERSON_ID, dtype: int64

Again, the vast majority of draftees were selected from college. High school players have been ineligible since the 2006 draft, a rule still in effect. There are two other organization types that warrant an additional look.

In [8]:
draft_df[draft_df['ORGANIZATION_TYPE']=='']

Unnamed: 0,PERSON_ID,PLAYER_NAME,SEASON,ROUND_NUMBER,ROUND_PICK,OVERALL_PICK,DRAFT_TYPE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,ORGANIZATION,ORGANIZATION_TYPE
95,1629011,Mitchell Robinson,2018,2,6,36,Draft,1610612752,New York,Knicks,NYK,,
189,1627748,Thon Maker,2016,1,10,10,Draft,1610612749,Milwaukee,Bucks,MIL,,
5872,80797,Dick Harris,1968,16,4,193,Draft,1610612758,Cincinnati,Royals,CIN,,
5890,80802,Jay Reffords,1968,19,4,211,Draft,1610612758,Cincinnati,Royals,CIN,,
6268,80540,Dave Hicks,1965,15,1,101,Draft,1610612764,Baltimore,Bullets,BLT,,
7028,79997,Ed Romanoff,1957,13,1,82,Draft,1610612737,St. Louis,Hawks,STL,,
7030,79949,Jerry Gibson,1957,0,0,0,Draft,1610612744,Philadelphia,Warriors,PHW,,
7200,79867,Joe Fitt,1955,0,0,0,Draft,1610612737,St. Louis,Hawks,STL,,
7316,79746,John Glinski,1954,12,2,98,Draft,1610612744,Philadelphia,Warriors,PHW,,
7388,79658,Bob Marske,1953,7,0,0,Draft,1610612744,Philadelphia,Warriors,PHW,,


Players without an organization type seem to simply have no affiliation prior to being selected. [Mitchell Robinson](https://en.wikipedia.org/wiki/Mitchell_Robinson) was removed from the Western Kentucky team for violating team rules and decided to prepare for the draft on his own, while [Thon Maker](https://en.wikipedia.org/wiki/Thon_Maker) had a unique situation as well.

In [9]:
draft_df[draft_df['ORGANIZATION_TYPE']=='Other Team/Club']

Unnamed: 0,PERSON_ID,PLAYER_NAME,SEASON,ROUND_NUMBER,ROUND_PICK,OVERALL_PICK,DRAFT_TYPE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,ORGANIZATION,ORGANIZATION_TYPE
14,1629635,Sekou Doumbouya,2019,1,15,15,Draft,1610612765,Detroit,Pistons,DET,Limoges CSP (France),Other Team/Club
17,1629048,Goga Bitadze,2019,1,18,18,Draft,1610612754,Indiana,Pacers,IND,KK Mega Leks (Serbia),Other Team/Club
18,1629677,Luka Samanic,2019,1,19,19,Draft,1610612759,San Antonio,Spurs,SAS,KK Olimpija (Slovenia),Other Team/Club
34,1629712,Marcos Louzada Silva,2019,2,5,35,Draft,1610612737,Atlanta,Hawks,ATL,Sesi Franca (Brazil),Other Team/Club
36,1629686,Deividas Sirvydis,2019,2,7,37,Draft,1610612742,Dallas,Mavericks,DAL,BC Lietuvos rytas (Lithuania),Other Team/Club
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3211,82624,Yasutaka Okayama,1981,8,10,171,Draft,1610612744,Golden State,Warriors,GOS,Japan Basketball Association (Japan),Other Team/Club
4385,81949,Aleksander Belov,1975,10,1,161,Draft,1610612762,New Orleans,Jazz,NOJ,BC Spartak Saint Petersburg (Russia),Other Team/Club
5219,81351,Carlos Quintanar,1971,18,2,234,Draft,1610612745,Houston,Rockets,HOU,Asociacion Deportiva Mexicana de Basquetbol (M...,Other Team/Club
5389,81142,Manuel Raga,1970,10,14,167,Draft,1610612737,Atlanta,Hawks,ATL,Pallacanestro Varese (Italy),Other Team/Club


Those drafted by other teams or clubs seem to be international players! For the purposes of this notebook, let's trim the dataset to include these players, along with those drafted out of college.

In [10]:
draft_df = draft_df[(draft_df['ORGANIZATION_TYPE'].isin(['College/University', 'Other Team/Club']))]
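The filter above returns a subset of the original frame, which is why later column assignments rely on the `chained_assignment` option set in the first cell. As a minimal sketch of an alternative, using a small made-up frame rather than the real API response, an explicit `.copy()` after filtering gives an independent frame (the names `toy_df`, `keep_types`, and `filtered_df` are illustrative only):

```python
import pandas as pd

# Toy stand-in for draft_df; the rows are illustrative, not API output.
toy_df = pd.DataFrame({
    'PLAYER_NAME': ['A', 'B', 'C', 'D'],
    'ORGANIZATION_TYPE': ['College/University', 'High School',
                          'Other Team/Club', ''],
})

keep_types = ['College/University', 'Other Team/Club']

# .copy() makes the result an independent frame, so columns added later
# do not trigger SettingWithCopyWarning and never touch toy_df.
filtered_df = toy_df[toy_df['ORGANIZATION_TYPE'].isin(keep_types)].copy()
filtered_df['kept'] = True

print(filtered_df['PLAYER_NAME'].tolist())
```

With this pattern the global warning suppression becomes optional rather than required.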

## Draft Selections by School

As a first look, examining the count of players selected by school should give an indication of the top schools producing NBA-quality players.

In [11]:
college_df = pd.DataFrame(draft_df[draft_df['ORGANIZATION_TYPE']=='College/University'].groupby('ORGANIZATION')['PERSON_ID'].count()).reset_index()
college_df.columns = ['School', 'Players']
college_df.sort_values('Players', ascending=False).head(20)

Unnamed: 0,School,Players
287,Kentucky,132
94,California-Los Angeles,122
417,North Carolina,114
172,Duke,97
283,Kansas,85
267,Indiana,77
325,Louisville,73
358,Michigan,72
589,Syracuse,71
23,Arizona,71


While the volume of picks from a given school is interesting, we can add additional information to our visualization. The next cell adds both the best (lowest-numbered) overall pick from each school and the name of a player selected at that pick.

In [12]:
# Adding top overall pick by school
top_pick_df = pd.DataFrame(draft_df[(draft_df['ORGANIZATION_TYPE']=='College/University') &
                                    (draft_df['OVERALL_PICK']>0)].groupby('ORGANIZATION')['OVERALL_PICK'].min()).reset_index()

top_pick_df.columns = ['School', 'Top Pick']
college_df = college_df.merge(top_pick_df,
                              left_on='School',
                              right_on='School')

# Adding player name to the dataset
top_player = []
for school, overall in zip(college_df['School'], college_df['Top Pick']):
    sub_df = draft_df[(draft_df['ORGANIZATION']==school) &
                      (draft_df['OVERALL_PICK']==overall)]
    top_player.append(sub_df['PLAYER_NAME'].iloc[0])

college_df['Top Player'] = top_player
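The loop above looks up each school's top player one at a time. A possible vectorized equivalent, sketched here on made-up rows rather than the real `draft_df`, uses `groupby(...).idxmin()` to find the row of the lowest pick number per school (the frame and variable names are illustrative):

```python
import pandas as pd

# Toy stand-in for draft_df; names and picks are made up.
toy_df = pd.DataFrame({
    'PLAYER_NAME': ['P1', 'P2', 'P3', 'P4', 'P5'],
    'ORGANIZATION': ['Duke', 'Duke', 'Kansas', 'Kansas', 'Kansas'],
    'OVERALL_PICK': [3, 1, 0, 7, 12],
})

# Picks recorded as 0 are missing, so drop them before taking the minimum.
picked = toy_df[toy_df['OVERALL_PICK'] > 0]

# idxmin returns the row label of the lowest pick number for each school.
best_rows = picked.loc[picked.groupby('ORGANIZATION')['OVERALL_PICK'].idxmin()]

top_df = (best_rows[['ORGANIZATION', 'OVERALL_PICK', 'PLAYER_NAME']]
          .rename(columns={'ORGANIZATION': 'School',
                           'OVERALL_PICK': 'Top Pick',
                           'PLAYER_NAME': 'Top Player'})
          .reset_index(drop=True))
print(top_df)
```

When several players from one school share the same best pick number, `idxmin` keeps the first such row, which matches the `.iloc[0]` choice in the loop.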

In [13]:
college_df['Top Player'] = [x + ' (' + str(y) + ' Overall)' for x, y in zip(college_df['Top Player'],
                                                                           college_df['Top Pick'])]

In [15]:
fig = px.bar(college_df.sort_values('Players', ascending=True),
             x='Players', y='School', text='Top Player', orientation='h')
fig.update_layout(yaxis=dict(showticklabels=False))
fig.show()

Take the above visualization for a spin! To get any information, zooming is really a must, given the width of the original bars. Can you find your favorite school? If you zoom in far enough, each bar is labeled with the top player and pick number. What else is interesting about this visualization?

## Draft Selections by Organization Location

While the name and type of organizations are provided in the dataset, we don't have the specific location data for these organizations to visualize them on a map. Fortunately, with motivation from [this post](https://towardsdatascience.com/mapping-avocado-prices-in-python-with-geopandas-geopy-and-matplotlib-c7e0ef08bc26), [geopy](https://geopy.readthedocs.io/en/stable/) can be used to locate the coordinates of locations. Using the [Bing geocoding service](https://geopy.readthedocs.io/en/stable/#module-geopy.geocoders), a string describing a location can be input and the geocoding service returns the best estimate of the coordinates for this given location.

The following initializes the Bing geocoding service and constructs string inputs for organizations to query for their coordinates. I've removed my API key, but you can register for your own [here](https://docs.microsoft.com/en-us/bingmaps/getting-started/bing-maps-dev-center-help/getting-a-bing-maps-key).

In [17]:
geolocator = Bing(api_key='', timeout=30)
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=2)

For colleges and universities, the `ORGANIZATION_TYPE` string `College/University` is reversed to `University/College` in the query, as this returned more accurate results from the geocoding service.

In [16]:
query = [x + ' ' + 'University/College' if y == 'College/University'
         else x + ' ' + y for x, y in zip(draft_df['ORGANIZATION'],
                                     draft_df['ORGANIZATION_TYPE'])]
query = list(set(query))
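The same query-building rule is used again later when locations are joined back onto the data, so it may help to see it as a single function. This is only a sketch: `build_query` is a hypothetical helper name, and the sample pairs are illustrative.

```python
def build_query(organization, organization_type):
    """Build the geocoder search string for one organization."""
    if organization_type == 'College/University':
        # Reversed label, matching the convention used in this notebook.
        return organization + ' University/College'
    return organization + ' ' + organization_type


pairs = [('Duke', 'College/University'),
         ('Duke', 'College/University'),
         ('Limoges CSP (France)', 'Other Team/Club')]

# A set removes duplicates like list(set(...)); sorting gives a stable order.
queries = sorted({build_query(org, org_type) for org, org_type in pairs})
print(queries)
```

Keeping the rule in one place reduces the chance that the lookup keys drift apart from the keys stored in `regions_dict`.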

The following cell executes the geocoding service queries, looping through them one at a time. As such, this cell takes some time to run, around 5 minutes.

In [18]:
regions_dict = {i : geolocator.geocode(i) for i in query}
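Geocoders in geopy return `None` when a query has no match, which would cause errors later when coordinates are extracted. The sketch below separates hits from misses; it takes the geocoding function as a parameter and uses a stand-in (`fake_geocode`, purely hypothetical) so that it runs without an API key or network access.

```python
def geocode_all(queries, geocode_fn):
    """Geocode each query, separating hits from queries with no result."""
    found, missing = {}, []
    for q in queries:
        result = geocode_fn(q)
        if result is None:
            missing.append(q)
        else:
            found[q] = result
    return found, missing


# Stand-in for a real geocoder so the sketch runs offline.
def fake_geocode(query):
    known = {'Duke University/College':
             ('Durham, NC, United States', (36.0001, -78.9306))}
    return known.get(query)


found, missing = geocode_all(
    ['Duke University/College', 'Nowhere University/College'], fake_geocode)
print(sorted(found), missing)
```

In the notebook, `geolocator.geocode` could be passed as `geocode_fn`. Note that the cell above calls `geolocator.geocode` directly, so the rate-limited `geocode` wrapper defined earlier is not applied; passing that wrapper instead would enforce its 2-second delay, at the cost of a considerably longer run time.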

In [19]:
regions_dict['Duke University/College']

Location(Durham, NC, United States, (36.000091552734375, -78.93064880371094, 0.0))

In [20]:
regions_dict['Duke University/College'][0]

'Durham, NC, United States'

In [21]:
regions_dict['Duke University/College'][1]

(36.000091552734375, -78.93064880371094)

The above shows what the output for an organization looks like: the first element is a text description of the location (City, State, Country in this example) and the second element is the latitude and longitude of the location. The coordinates are what will enable visualizing the locations on a map.

While everything seems great so far, there is an issue:

In [22]:
regions_dict['Vanderbilt University/College']

Location(Oxford, England, United Kingdom, (51.75259017944336, -1.2532099485397339, 0.0))

The issue here is that the geocoding service is not perfect, often returning incorrect locations and coordinates for an input, such as Vanderbilt University above. After visual inspection of the dataset and some quick visualization, I was able to identify some of the inaccurate results. Because the number of schools is high and my degree of familiarity with each organization varies, there could be additional inaccurate locations. However, below are manual overrides with the correct location and coordinates for affected organizations.

Feel free to open a PR or give me a shout if you find additional mistakes!

In [23]:
regions_dict['Vanderbilt University/College'] = ("Nashville, TN, United States", (36.1447, -86.8027)) 
regions_dict['Manhattan University/College'] = ("Riverdale, NY, United States", (40.8898, -73.9027))
regions_dict['Potsdam University/College'] = ("Potsdam, NY, United States", (44.6643, -74.9761))
regions_dict['Guilford University/College'] = ("Greensboro, NC, United States", (36.0918, -79.8887))
regions_dict['Barber-Scotia University/College'] = ("Concord, NC, United States", (35.4067, -80.5858))
regions_dict['Richmond University/College'] = ("Richmond, VA, United States", (37.5752, -77.5407))
regions_dict['Rocky Mountain University/College'] = ("Billings, MT, United States", (45.7966, -108.5582))
regions_dict['Holy Cross University/College'] = ("Worcester, MA, United States", (42.2392, -71.8080))
regions_dict["St. John's (NY) University/College"] = ("Queens, NY, United States", (40.7221, -73.7942))
regions_dict["Pacific University/College"] = ("Forest Grove, OR, United States", (45.5209, -123.1095))
regions_dict["Wofford University/College"] = ("Spartanburg, SC, United States", (34.9581, -81.9345))
regions_dict["Carthage University/College"] = ("Kenosha, WI, United States", (42.6238, -87.8198))
regions_dict["Milligan University/College"] = ("Milligan, TN, United States", (36.3011, -82.2957))
regions_dict['Navy University/College'] = ("Annapolis, MD, United States", (38.9821, -76.4839))
regions_dict["Morningside University/College"] = ("Sioux City, IA, United States", (42.4732, -96.3584))
regions_dict['Hamilton University/College'] = ("Clinton, NY, United States", (43.0528, -75.4060))
regions_dict['Siena University/College'] = ("Loudonville, NY, United States", (42.7166, -73.7523))
regions_dict['Hiram Scott University/College'] = ("Scottsbluff, NE, United States", (41.8666, -103.6672))
regions_dict['Army University/College'] = ("West Point, NY, United States", (41.391837, -73.962503))
regions_dict['Bishop University/College'] = ("Dallas, TX, United States", (32.7767, -96.7970))
regions_dict["Saint Michael's University/College"] = ("Colchester, VT, United States", (44.4933491, -73.1638707))
regions_dict['Hope University/College'] = ("Holland, MI, United States", (42.7872, -86.1016))
regions_dict['Ripon University/College'] = ("Ripon, WI, United States", (43.8432, -88.8410))
regions_dict['Southampton University/College'] = ("Shinnecock Hills, NY, United States", (40.8882, -72.4450))
regions_dict['Assumption University/College'] = ("Worcester, MA, United States", (42.2905, -71.8295))
regions_dict['La Verne University/College'] = ("La Verne, CA, United States", (34.1008, -117.7678))
regions_dict["Saint Paul's University/College"] = ("Lawrenceville, VA, United States", (36.7596, -77.8475))
regions_dict["Wagner University/College"] = ("Staten Island, NY, United States", (40.6150, -74.0944))
regions_dict["Cal State-Fullerton University/College"] = ("Fullerton, CA, United States", (33.8704, -117.9242))
regions_dict["Le Moyne University/College"] = ("Syracuse, NY, United States", (43.0497, -76.0855))
regions_dict["Pace University/College"] = ("New York, NY, United States", (40.7111, -74.0048))
regions_dict['Air Force University/College'] = ("Colorado Springs, CO, United States", (38.9983, -104.8613))
regions_dict['Paine University/College'] = ("Augusta, GA, United States", (33.4696, -81.9933))
regions_dict['Thomas More University/College'] = ("Crestview Hills, KY, United States", (39.0217, -84.5680))
regions_dict['Nasson University/College'] = ("Sanford, ME, United States", (43.4656, -70.7978))
regions_dict['Alliance University/College'] = ("Cambridge Springs, PA, United States", (41.8037, -80.0564))
regions_dict["St. Peter's University/College"] = ("Jersey City, NJ, United States", (40.7272, -74.0715))
regions_dict["Carson-Newman University/College"] = ("Jefferson City, TN, United States", (36.1222, -83.4910))
regions_dict["Saint Joseph's (IN) University/College"] = ("Rensselaer, IN, United States", (40.9219, -87.1586))
regions_dict['Upsala University/College'] = ("East Orange, NJ, United States", (40.7762117, -74.2093122))
regions_dict['Taylor University/College'] = ("Upland, IN, United States", (40.4559, -85.4989))
regions_dict['Hellenic University/College'] = ("Brookline, MA, United States", (42.3172, -71.1289))
regions_dict['Saint Mary (KS) University/College'] = ("Leavenworth, KS, United States", (39.2783, -94.9063))
regions_dict["St. Mary's (CA) University/College"] = ("Moraga, CA, United States", (37.8413, -122.1101))
regions_dict["Movistar Estudiantes Other Team/Club"] = ("Madrid, Spain", (40.4168, -3.7038))
regions_dict["Dartmouth University/College"] = ("Hanover, NH, United States", (43.7044, -72.2887))
regions_dict["Calgary (CAN) University/College"] = ("Calgary, Alberta, Canada", (51.0776, -114.1407))
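One way to surface candidates for manual overrides like those above, rather than relying on visual inspection alone, is to flag college queries whose geocoded address falls outside the United States. This is a rough sketch on toy data: `flag_non_us_colleges` is a hypothetical helper, and the coordinates are approximate and illustrative.

```python
def flag_non_us_colleges(regions):
    """Return college queries whose geocoded address is not in the US."""
    flagged = []
    for query, location in regions.items():
        address = location[0]  # Text description, as in the notebook.
        is_college = query.endswith('University/College')
        if is_college and 'United States' not in address:
            flagged.append(query)
    return sorted(flagged)


toy_regions = {
    'Duke University/College':
        ('Durham, NC, United States', (36.0001, -78.9306)),
    'Vanderbilt University/College':
        ('Oxford, England, United Kingdom', (51.7526, -1.2532)),
    'Limoges CSP (France) Other Team/Club':
        ('Limoges, France', (45.8336, 1.2611)),
}
print(flag_non_us_colleges(toy_regions))
```

The result is a review list, not a list of errors: a legitimately non-US school such as Calgary would also be flagged, and a US school placed in the wrong US city would not be caught.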

Now, let's add location information back into the original `draft_df` DataFrame.

In [24]:
act = [x + ' ' + 'University/College' if y == 'College/University'
         else x + ' ' + y for x, y in zip(draft_df['ORGANIZATION'],
                                     draft_df['ORGANIZATION_TYPE'])]
draft_df['location'] = [regions_dict[x] for x in act]

In [26]:
draft_df['coordinates'] = [x[1] for x in draft_df['location']]
draft_df['location_name'] = [x[0] for x in draft_df['location']]

### US Organizations

Because the vast majority of organizations are based in the United States, visualizing these separately from international organizations will prevent the US organizations from dominating a combined visualization. Because the distribution of players per school in the bar chart above is heavily skewed, the markers in the US visualization will be colored by the natural log of the number of players.

In [27]:
map_df = pd.DataFrame(draft_df[draft_df['DRAFT_TYPE']=='Draft'].groupby(['ORGANIZATION',
                                                                         'ORGANIZATION_TYPE',
                                                                         'coordinates',
                                                                         'location_name'])['PERSON_ID'].count()).reset_index()

In [28]:
map_df['marker_color'] = [np.log(x) for x in map_df['PERSON_ID']]
map_df['lon'] = [x[1] for x in map_df['coordinates']]
map_df['lat'] = [x[0] for x in map_df['coordinates']]
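To illustrate why the log transform helps with a skewed distribution, the sketch below compares the spread of some illustrative player counts before and after taking the natural log. It uses the standard library's `math.log` in place of `np.log`, and the counts are made up for the example.

```python
import math

# Illustrative counts: a few schools with many picks, many with few.
player_counts = [132, 71, 10, 2, 1]

log_counts = [math.log(n) for n in player_counts]

# Ratio between the largest and smallest raw counts.
raw_ratio = max(player_counts) / min(player_counts)
# Distance between the largest and smallest values on the log scale.
log_span = max(log_counts) - min(log_counts)

print(raw_ratio, round(log_span, 2))
```

A school with a single draftee maps to `log(1) = 0`, which lines up with the `cmin = 0` setting used for the color scale in the next cell.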

In [29]:
axis_style=dict(showline=False, 
                mirror=False, 
                showgrid=False, 
                zeroline=False,
                ticks='',
                showticklabels=False)
layout=dict(title='NBA Draft by Organization Location',
            width=1000, height=1000, 
            autosize=False,
            xaxis=axis_style,
            yaxis=axis_style,
            paper_bgcolor='rgba(0,0,0,0)',
            plot_bgcolor='rgba(0,0,0,0)',
            hovermode='closest')
my_text=['<b>'+ str(w) + '</b><br><br>Location: '+ str(x) +
  '<br>Number of Players: '+ str(y)
  for w, x, y in zip(list(map_df['ORGANIZATION']), list(map_df['location_name']),
                        list(map_df['PERSON_ID'])) ] 

fig = go.Figure(data=go.Scattergeo(
        lon = map_df['lon'],
        lat = map_df['lat'],
        text = my_text,
        mode = 'markers',
        marker = dict(
            size = 8,
            opacity = 0.5,
            reversescale = False,
            autocolorscale = False,
            symbol = 'circle',
            colorscale = ['blue', 'white', 'orange'],
            cmin = 0,
            color = map_df['marker_color'],
            cmax = map_df['marker_color'].max(),
            colorbar_title="log(Number of Players)"
        )
        ))

fig.update_layout(
        title = 'NBA Draftees by Organization Location',
        geo_scope='usa',
    )

fig.show()

There are a lot of takeaways that can be gathered from this figure:

- The East Coast and Midwest seem to dominate the number of organizations, but several schools in the West have many draft selections (BYU, Arizona, Arizona State, Stanford, Washington, etc.)
- Most schools have a low number of draft selections, given the number of blue dots
- Given a location and an orange dot, it's fairly easy to guess the institution for those familiar with college basketball
- Some dots overlap, obscuring the color. This could distort the takeaway a user gathers without zooming in, and even with zooming the overlap is not fully resolved. Perturbing the coordinates slightly for overlapping dots could help, but it could introduce another issue for schools on the East Coast, which are quite concentrated
- There are a lot of schools of which I've never heard!
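The coordinate perturbation mentioned in the list above could be sketched as follows. This is a minimal illustration and not part of the original analysis: `jitter_duplicates` is a hypothetical helper, the `scale` (in degrees) is an arbitrary assumption, and only points with exactly identical coordinates are moved.

```python
import random
from collections import Counter


def jitter_duplicates(coords, scale=0.05, seed=0):
    """Offset only the points that share exact coordinates with another."""
    rng = random.Random(seed)  # Seeded so the offsets are reproducible.
    counts = Counter(coords)
    jittered = []
    for lat, lon in coords:
        if counts[(lat, lon)] > 1:
            lat += rng.uniform(-scale, scale)
            lon += rng.uniform(-scale, scale)
        jittered.append((lat, lon))
    return jittered


coords = [(40.7111, -74.0048), (40.7111, -74.0048), (36.0001, -78.9306)]
result = jitter_duplicates(coords)
print(result)
```

Points that are close but not identical are left untouched, so this would not separate the dense clusters on the East Coast; a larger `scale` would separate more markers but misplace them further from their true locations.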

### International Organizations

The number of players drafted internationally is not as large as those drafted domestically, so the number of players is encoded as marker size for the international visualization.

In [30]:
map_df['alt_location'] = [x if 'United States' not in str(x) else 'United States' for x in map_df['location_name']]
map_df['alt_coordinates'] = [x if 'United States' not in y else (39.8283, -98.5795) for x, y in zip(map_df['coordinates'],
                                                                                                map_df['alt_location'])]
map_df['alt_organization'] = [x if y != 'United States' else 'United States Organizations' for x, y in zip(map_df['ORGANIZATION'],
                                                                                                map_df['alt_location'])]

In [31]:
world_df = pd.DataFrame(map_df.groupby(['alt_organization',
                                        'ORGANIZATION_TYPE',
                                        'alt_coordinates',
                                        'alt_location'])['PERSON_ID'].sum()).reset_index()

In [32]:
# Marker size is the player count plus a constant so that small organizations remain visible.
# (An earlier assignment to this column was immediately overwritten and has been removed.)
world_df['marker_size'] = [6 + x for x in world_df['PERSON_ID']]
world_df['lon'] = [x[1] for x in world_df['alt_coordinates']]
world_df['lat'] = [x[0] for x in world_df['alt_coordinates']]
world_df = world_df[world_df['alt_location'] != 'United States']

In [33]:
axis_style=dict(showline=False, 
                mirror=False, 
                showgrid=False, 
                zeroline=False,
                ticks='',
                showticklabels=False)
layout=dict(title='NBA Draft by Organization Location',
            width=1000, height=1000, 
            autosize=False,
            xaxis=axis_style,
            yaxis=axis_style,
            paper_bgcolor='rgba(0,0,0,0)',
            plot_bgcolor='rgba(0,0,0,0)',
            hovermode='closest')
my_text=['<b>'+ str(w) + '</b><br><br>Location: '+ str(x) +
  '<br>Number of Players: '+ str(y)
  for w, x, y in zip(list(world_df['alt_organization']), list(world_df['alt_location']),
                        list(world_df['PERSON_ID'])) ] 

fig = go.Figure(data=go.Scattergeo(
        lon = world_df['lon'],
        lat = world_df['lat'],
        text = my_text,
        mode = 'markers',
        marker_size = world_df['marker_size'],
        marker_color='blue'
        ))

fig.update_layout(
        title = 'NBA Draftees by Organization Location',
    )

fig.show()

Most of the international organizations are concentrated in Europe, but the organizations are indeed spread across the world.

## Exploration

- How can these visualizations be enhanced to provide the user with more information? Try to add something to each visualization here.
- How else can these data be visualized? Generate a new visualization that takes a look at the dataset from a new perspective.