## Merging the NBA datasets

First, read in the three data frames.

In [58]:
import pandas as pd
champs=pd.read_pickle("https://github.com/Policy-by-the-Numbers/spacejam/raw/main/nbachamps.pkl")
streak=pd.read_pickle("https://github.com/Policy-by-the-Numbers/spacejam/raw/main/nbawinstreaks.pkl")
mvp=pd.read_pickle("https://github.com/Policy-by-the-Numbers/spacejam/raw/main/nba_mvps.pkl")

In [51]:
champs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Team    33 non-null     object 
 1   Win     27 non-null     float64
 2   Loss    27 non-null     float64
 3   Apps    27 non-null     float64
 4   Pct     27 non-null     float64
dtypes: float64(4), object(1)
memory usage: 1.4+ KB


In [59]:
streak.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Games   33 non-null     int64 
 1   Team    33 non-null     object
dtypes: int64(1), object(1)
memory usage: 656.0+ bytes


The unit of analysis for the win-streak data turns out not to be the team, so I'll set it aside and move on with just the MVP data (alongside the championship data) to see what happens.

In [53]:
champs.shape,streak.shape,mvp.shape

((33, 5), (33, 2), (30, 2))

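First, compare the team names in the championship data with those in the MVP data, starting with names that appear only in the championship data and then the reverse.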

In [60]:
onlyChamps=set(champs.Team)-set(mvp.Teams)
onlyChamps

{'Atlanta Hawks[v]',
 'Baltimore Bullets (original) (folded in 1954)[viii]',
 'Brooklyn Nets[x]',
 'Chicago Stags (folded in 1950)',
 'Detroit Pistons[iv]',
 'Golden State Warriors[ii]',
 'Los Angeles Clippers',
 'Los Angeles Lakers[i]',
 'Oklahoma City Thunder[vii]',
 'Philadelphia 76ers[iii]',
 'Sacramento Kings[ix]',
 'Washington Capitols (folded in 1951)',
 'Washington Wizards[vi]'}

In [55]:
onlyMVP=set(mvp.Teams)-set(champs.Team)
onlyMVP

{'Baltimore Bullets (now Washington Wizards)',
 'Brooklyn Nets',
 'Buffalo Braves (now Los Angeles Clippers)',
 'Cincinnati Royals (now Sacramento Kings)',
 'Detroit Pistons',
 'Los Angeles Lakers',
 'Oklahoma City Thunder',
 'Philadelphia 76ers',
 'Philadelphia/Golden State Warriors',
 'St. Louis Hawks (now Atlanta Hawks)'}
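Several of the mismatches above differ only by a trailing footnote marker such as `[i]` or `[x]`. As a rough sketch, and assuming those markers carry no information needed for the merge, they could be stripped before comparing the two sets. The sets below are small hand-picked subsets of the names shown above, and `strip_footnote` is a hypothetical helper, not part of the original notebook.

```python
import re

# Toy subsets of the team names shown above (not the full data).
champs_teams = {'Los Angeles Lakers[i]', 'Brooklyn Nets[x]',
                'Los Angeles Clippers', 'Boston Celtics'}
mvp_teams = {'Los Angeles Lakers', 'Brooklyn Nets',
             'Buffalo Braves (now Los Angeles Clippers)', 'Boston Celtics'}

def strip_footnote(name):
    """Drop a trailing roman-numeral marker such as [i] or [viii] (hypothetical helper)."""
    return re.sub(r'\[[ivxlc]+\]$', '', name).strip()

clean_champs = {strip_footnote(t) for t in champs_teams}

# Recompute the differences on the cleaned names.
only_champs = clean_champs - mvp_teams
only_mvp = mvp_teams - clean_champs
print(sorted(only_champs))
print(sorted(only_mvp))
```

After this cleanup only the genuinely renamed or relocated franchises remain to be matched by other means.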

Here we'll try to find which teams in onlyChamps match those in onlyMVP, using fuzzy string matching.

In [29]:
!pip install thefuzz
from thefuzz import process as fz

# for each team in onlyChamps, find the most similar name in onlyMVP;
# extractOne returns a (best match, score) tuple
[(fz.extractOne(team, onlyMVP),team) for team in sorted(onlyChamps)]



[(('St. Louis Hawks (now Atlanta Hawks)', 86), 'Atlanta Hawks[v]'),
 (('Baltimore Bullets (now Washington Wizards)', 57),
  'Baltimore Bullets (original) (folded in 1954)[viii]'),
 (('Brooklyn Nets', 95), 'Brooklyn Nets[x]'),
 (('Detroit Pistons', 44), 'Chicago Stags (folded in 1950)'),
 (('Detroit Pistons', 95), 'Detroit Pistons[iv]'),
 (('Philadelphia/Golden State Warriors', 88), 'Golden State Warriors[ii]'),
 (('Buffalo Braves (now Los Angeles Clippers)', 90), 'Los Angeles Clippers'),
 (('Los Angeles Lakers', 95), 'Los Angeles Lakers[i]'),
 (('Oklahoma City Thunder', 95), 'Oklahoma City Thunder[vii]'),
 (('Philadelphia 76ers', 95), 'Philadelphia 76ers[iii]'),
 (('Cincinnati Royals (now Sacramento Kings)', 86), 'Sacramento Kings[ix]'),
 (('Baltimore Bullets (now Washington Wizards)', 44),
  'Washington Capitols (folded in 1951)'),
 (('Baltimore Bullets (now Washington Wizards)', 86),
  'Washington Wizards[vi]')]

In [34]:
# keep only the matches scoring above 80
[(fz.extractOne(team, onlyMVP),team)
 for team in sorted(onlyChamps)
 if fz.extractOne(team, onlyMVP)[1]>80]

[(('St. Louis Hawks (now Atlanta Hawks)', 86), 'Atlanta Hawks[v]'),
 (('Brooklyn Nets', 95), 'Brooklyn Nets[x]'),
 (('Detroit Pistons', 95), 'Detroit Pistons[iv]'),
 (('Philadelphia/Golden State Warriors', 88), 'Golden State Warriors[ii]'),
 (('Buffalo Braves (now Los Angeles Clippers)', 90), 'Los Angeles Clippers'),
 (('Los Angeles Lakers', 95), 'Los Angeles Lakers[i]'),
 (('Oklahoma City Thunder', 95), 'Oklahoma City Thunder[vii]'),
 (('Philadelphia 76ers', 95), 'Philadelphia 76ers[iii]'),
 (('Cincinnati Royals (now Sacramento Kings)', 86), 'Sacramento Kings[ix]'),
 (('Baltimore Bullets (now Washington Wizards)', 86),
  'Washington Wizards[vi]')]
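The filter above calls `extractOne` twice per team, once for the result and once for the score. A minimal sketch of computing each match only once is shown below. To keep it runnable without extra installs it swaps in the standard library's `difflib.SequenceMatcher` for `thefuzz`, so the scores will not be the same as those printed above; `best_match` and the two short lists are illustrative assumptions.

```python
from difflib import SequenceMatcher

def best_match(name, candidates):
    """Return (candidate, score on a 0-100 scale) for the most similar candidate."""
    scored = [(c, round(100 * SequenceMatcher(None, name, c).ratio()))
              for c in candidates]
    return max(scored, key=lambda pair: pair[1])

# Toy subsets of the names shown above.
only_champs = ['Brooklyn Nets[x]', 'Chicago Stags (folded in 1950)']
only_mvp = ['Brooklyn Nets', 'Detroit Pistons', 'Los Angeles Lakers']

matches = {}
for team in only_champs:
    candidate, score = best_match(team, only_mvp)  # computed once per team
    if score > 80:
        matches[team] = candidate
print(matches)
```

The same loop shape should work with `fz.extractOne(team, onlyMVP)` in place of `best_match`, since both return a `(match, score)` pair.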

These matches look right, so I'll collect them to use as replacements in the MVP data:

In [38]:
changesMVP1={(fz.extractOne(team, onlyMVP),team)
            for team in sorted(onlyChamps)
            if fz.extractOne(team, onlyMVP)[1]>80}

# note: the braces build a set of ((match, score), team) tuples, not a dict
changesMVP1

{(('Baltimore Bullets (now Washington Wizards)', 86),
  'Washington Wizards[vi]'),
 (('Brooklyn Nets', 95), 'Brooklyn Nets[x]'),
 (('Buffalo Braves (now Los Angeles Clippers)', 90), 'Los Angeles Clippers'),
 (('Cincinnati Royals (now Sacramento Kings)', 86), 'Sacramento Kings[ix]'),
 (('Detroit Pistons', 95), 'Detroit Pistons[iv]'),
 (('Los Angeles Lakers', 95), 'Los Angeles Lakers[i]'),
 (('Oklahoma City Thunder', 95), 'Oklahoma City Thunder[vii]'),
 (('Philadelphia 76ers', 95), 'Philadelphia 76ers[iii]'),
 (('Philadelphia/Golden State Warriors', 88), 'Golden State Warriors[ii]'),
 (('St. Louis Hawks (now Atlanta Hawks)', 86), 'Atlanta Hawks[v]')}
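`Series.replace` works with a mapping from old value to new value, whereas the collection above holds `((match, score), team)` tuples. A minimal sketch of reshaping such pairs into a dictionary keyed by the MVP spelling follows. The two pairs are copied from the output above; the `teams` Series is a toy stand-in for `mvp.Teams`.

```python
import pandas as pd

# ((match, score), champs name) pairs copied from the filtered output above (subset).
pairs = [
    (('Brooklyn Nets', 95), 'Brooklyn Nets[x]'),
    (('St. Louis Hawks (now Atlanta Hawks)', 86), 'Atlanta Hawks[v]'),
]

# Map the MVP spelling (old value) to the championship spelling (new value).
mapping = {match: team for (match, _score), team in pairs}

# Toy stand-in for mvp.Teams.
teams = pd.Series(['Boston Celtics', 'Brooklyn Nets',
                   'St. Louis Hawks (now Atlanta Hawks)'])
renamed = teams.replace(mapping)
print(renamed.tolist())
```

Assigning the result back (for example `mvp['Teams'] = mvp.Teams.replace(mapping)`) avoids relying on `inplace=True` on a column accessed as an attribute.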

In [44]:
mvp.Teams.replace(to_replace=changesMVP1,inplace=True)
mvp.Teams

0                                 Boston Celtics
1                             Los Angeles Lakers
2                             Philadelphia 76ers
3                                  Chicago Bulls
4                                Milwaukee Bucks
5                                Houston Rockets
6                              San Antonio Spurs
7                                   Phoenix Suns
8             Philadelphia/Golden State Warriors
9            St. Louis Hawks (now Atlanta Hawks)
10                                     Utah Jazz
11                           Cleveland Cavaliers
12                                    Miami Heat
13                         Oklahoma City Thunder
14                                Denver Nuggets
15      Cincinnati Royals (now Sacramento Kings)
16    Baltimore Bullets (now Washington Wizards)
17                               New York Knicks
18     Buffalo Braves (now Los Angeles Clippers)
19                        Portland Trail Blazers
20                  

In [42]:
# second try: recompute the set differences and the fuzzy matches
onlyChamps=set(champs.Team)-set(mvp.Teams)
onlyMVP=set(mvp.Teams)-set(champs.Team)
[(fz.extractOne(team, onlyMVP),team) for team in sorted(onlyChamps)]

[(('St. Louis Hawks (now Atlanta Hawks)', 86), 'Atlanta Hawks[v]'),
 (('Baltimore Bullets (now Washington Wizards)', 57),
  'Baltimore Bullets (original) (folded in 1954)[viii]'),
 (('Brooklyn Nets', 95), 'Brooklyn Nets[x]'),
 (('Detroit Pistons', 44), 'Chicago Stags (folded in 1950)'),
 (('Detroit Pistons', 95), 'Detroit Pistons[iv]'),
 (('Philadelphia/Golden State Warriors', 88), 'Golden State Warriors[ii]'),
 (('Buffalo Braves (now Los Angeles Clippers)', 90), 'Los Angeles Clippers'),
 (('Los Angeles Lakers', 95), 'Los Angeles Lakers[i]'),
 (('Oklahoma City Thunder', 95), 'Oklahoma City Thunder[vii]'),
 (('Philadelphia 76ers', 95), 'Philadelphia 76ers[iii]'),
 (('Cincinnati Royals (now Sacramento Kings)', 86), 'Sacramento Kings[ix]'),
 (('Baltimore Bullets (now Washington Wizards)', 44),
  'Washington Capitols (folded in 1951)'),
 (('Baltimore Bullets (now Washington Wizards)', 86),
  'Washington Wizards[vi]')]

In [57]:
champs.merge(mvp).shape

MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
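The error arises because `merge` looks for identically named columns by default, and the key column is called `Team` in `champs` but `Teams` in `mvp`. A minimal sketch of naming the keys explicitly is shown below, using toy frames: the `MVPs` column name and all numbers are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Toy stand-ins for champs and mvp; values are illustrative only.
champs_demo = pd.DataFrame({'Team': ['Boston Celtics', 'Chicago Bulls'],
                            'Win': [1, 2]})
mvp_demo = pd.DataFrame({'Teams': ['Boston Celtics', 'Utah Jazz'],
                         'MVPs': [3, 4]})  # 'MVPs' is a hypothetical column name

# Name the key column on each side explicitly (inner join by default).
merged = champs_demo.merge(mvp_demo, left_on='Team', right_on='Teams')
print(merged.shape)

# An outer merge with indicator=True shows which rows failed to match.
audit = champs_demo.merge(mvp_demo, left_on='Team', right_on='Teams',
                          how='outer', indicator=True)
print(audit[['Team', 'Teams', '_merge']])
```

Rows marked `left_only` or `right_only` in the `_merge` column are the names that still need reconciling before the merge is complete.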