# Project
## NBA Players' Statistics

### Abstract 
This project collects NBA players' personal information (name, age, height, weight, etc.) and regular-season statistics (PTS, RPG, etc.) scraped from the Internet (the official NBA website, ESPN.com, and Kaggle), together with current social media data (Instagram, Twitter, YouTube, Reddit). The basic conceptual model and the relationships between the entities are represented in ER diagrams. In addition, an SQL database for the NBA data will be built for users to search, with several use cases showing how the database can be queried. The database can be used not only to analyze each player's performance and predict future performance, but also to study players' lives through their social media.

### Import the packages

In [2]:
import pandas as pd
import requests
import numpy as np
from requests import get
from bs4 import BeautifulSoup as bs
from instaloader import Instaloader, Profile
from datetime import datetime, timedelta
from itertools import dropwhile, takewhile
from itertools import islice
import sqlite3

### Scraping Data

In [2]:
for m in range(2015,2018):
    ## Web Scraper
    url = "http://www.espn.com/nba/statistics/player/_/stat/rebounds/sort/avgRebounds/year/"+str(m)+"/seasontype/2/count/"+"1"
    response = get(url)
    html_soup = bs(response.text, 'html.parser')  # Python's built-in HTML parser
    id_check = html_soup.find(id ="my-players-table")

    # find and store the max. page number
    players_container2 = id_check.find_all(class_ ="page-numbers")
    container2=players_container2[0].text
    x = container2.split(" ", 2) # split the page label "1 of 7" into "1", "of", "7"
    pages=int(x[2]) # get the max page number, "7"

    #nba1=pd.DataFrame() # store the data value in dataframe
    globals()['nba1%s' % m] = pd.DataFrame()

    for y in range(pages): # from page 1 to 7
        url = "http://www.espn.com/nba/statistics/player/_/stat/rebounds/sort/avgRebounds/year/"+str(m)+"/seasontype/2/count/" + str(y*40+1)
        response = get(url)
        html_soup = bs(response.text, 'html.parser')  # Python's built-in HTML parser
        id_check = html_soup.find(id="my-players-table")
        players_container = id_check.find_all("tr")

        headers1_cols = [] # store the headers for column
        h_count=0 # for counting how many header repeat in each page
        d_count=1 # for counting the rank of players

        # extract data from individual players container
        for container in players_container:
            content = container.find_all("td")
            if content[0].text=="RK":
                headers1_cols=[content[0].text, content[1].text, content[2].text,
                              content[3].text, content[4].text, content[5].text,
                              content[6].text, content[7].text, content[8].text,
                              content[9].text, content[10].text, content[11].text]
                h_count=h_count+1
            else:
                content_s=content[1].text.split(",",1)  #split player's name and position
                tt = pd.DataFrame(np.column_stack([(y*40-h_count+d_count), content_s[0],content_s[1], content[2].text,
                                                    content[3].text, content[4].text, content[5].text,
                                                    content[6].text, content[7].text, content[8].text,
                                                    content[9].text, content[10].text, content[11].text]))
                # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
                globals()['nba1%s' % m]=pd.concat([globals()['nba1%s' % m], tt])
            d_count=d_count+1

    headers1_cols.insert(2,"POSITION") # add column "POSITION" in existing column list
    globals()['nba1%s' % m].columns = headers1_cols # change the columns name to headers
    globals()['nba1%s' % m].index = range(0,len(globals()['nba1%s' % m])) # reorder the index
    

    ## WebAPI
    # fake a browser visit
    user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)' # header value only, without the 'User-Agent:' prefix
    headers = {'User-Agent':user_agent}
    url='https://stats.nba.com/stats/leagueLeaders?LeagueID=00&PerMode=PerGame&Scope=S&Season='+ str(m-1)+'-'+str(m-2000)+'&SeasonType=Regular+Season&StatCategory=PTS'
    r=requests.get(url,headers=headers).json() #grab the statistics data

    # build the dataframe in one step: one row per player, columns named by the API's headers
    headers2_cols = list(r['resultSet']['headers'])
    globals()['nba2%s' % m] = pd.DataFrame(r['resultSet']['rowSet'], columns=headers2_cols)


## Import csv file
nba3 = pd.read_csv("players_stats.csv")
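The paging and season arithmetic used in the scraping cell can be checked offline. The sketch below is not part of the original notebook: the helper names `parse_page_count`, `espn_page_url` and `nba_season_label` are hypothetical and only repackage the string handling shown above. It assumes, as the loop does, that ESPN lists 40 players per page and that the NBA stats API expects a label such as `2014-15` for the season ending in 2015.

```python
ESPN_BASE = ("http://www.espn.com/nba/statistics/player/_/stat/rebounds/"
             "sort/avgRebounds/year/{year}/seasontype/2/count/{start}")

def parse_page_count(label):
    """Return the last page number from a label such as '1 of 7'."""
    return int(label.split(" ", 2)[2])

def espn_page_url(year, page):
    """URL of a zero-based results page; each page starts at rank page*40+1."""
    return ESPN_BASE.format(year=year, start=page * 40 + 1)

def nba_season_label(year):
    """Label for the season ending in `year`, zero-padded.

    Matches str(m-1)+'-'+str(m-2000) for the years used in this notebook.
    """
    return "%d-%02d" % (year - 1, year % 100)

pages = parse_page_count("1 of 7")
urls = [espn_page_url(2015, y) for y in range(pages)]
print(pages, urls[-1])
print([nba_season_label(m) for m in range(2015, 2018)])
```

Keeping these pieces in small functions makes it possible to verify the offsets and labels without sending any request.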



### Audit Data

In [3]:
for m in range(2015,2018):
    
    #list to store scraped value data in:
    combine1=pd.DataFrame()
    combine2=pd.DataFrame()
    player_stat_all=pd.DataFrame()
    #player_stat=pd.DataFrame()
    globals()['player_stat%s' % m] = pd.DataFrame()

    # combine the 3 dataframes on the player's name
    combine1=globals()['nba1%s' % m].merge(globals()['nba2%s' % m],left_on = 'PLAYER',right_on = 'PLAYER',how = 'inner')
    combine2=nba3.merge(combine1,left_on = 'Name',right_on = 'PLAYER',how = 'inner')
    
    # select the columns to keep and store them in a new dataframe
    player_stat_all = combine2[['Name','Age','Birth_Place','Birthdate','Height','Weight','TEAM_y','POSITION','PTS_y','RPG','AST_y','STL_y','BLK_y','TOV_y']]

    # drop the rows that have missing values
    globals()['player_stat%s' % m] = player_stat_all.dropna()
    # rename the columns
    globals()['player_stat%s' % m].columns = ['Name', 'Age','Birth_Place','Birthdate','Height','Weight','Team','Position','PTS','RPG','AST','STL','BLK','TOV']

    # change column "Age" type to int
    globals()['player_stat%s' % m] = globals()['player_stat%s' % m].astype({'Age':'int'})

    # reorder the index
    globals()['player_stat%s' % m].index = range(0,len(globals()['player_stat%s' % m]))
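The `_y` suffixes in the column selection above (`TEAM_y`, `PTS_y`, ...) come from pandas, which renames columns that exist in both frames when they are merged. The following is a minimal sketch with made-up toy frames standing in for `nba1` and `nba2`; the player names and numbers are placeholders, not scraped data.

```python
import pandas as pd

# Toy stand-ins for nba1 (ESPN) and nba2 (NBA stats); all values are made up.
espn = pd.DataFrame({"PLAYER": ["Player A", "Player B"],
                     "TEAM": ["MIA", "DAL"],
                     "PTS": [10.0, 12.5],
                     "RPG": [7.1, 3.2]})
nba = pd.DataFrame({"PLAYER": ["Player A", "Player C"],
                    "TEAM": ["MIA", "ORL"],
                    "PTS": [10.0, 8.0]})

# Non-key columns present in both frames get _x (left) and _y (right) suffixes;
# columns present in only one frame, such as RPG, keep their names.
merged = espn.merge(nba, on="PLAYER", how="inner")
print(list(merged.columns))
print(len(merged))  # an inner join keeps only players found in both sources
```

Because the join is an inner join on the name, any player whose name is spelled differently in the two sources is silently dropped, which is worth keeping in mind when auditing row counts.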

### Instagram 

In [9]:
def post(name):
    # return the latest post of the past 24 hours (or a placeholder row)
    # together with the number of posts made in that window
    L = Instaloader()
    posts =Profile.from_username(L.context, name).get_posts()
    temp=pd.DataFrame()
    SINCE = datetime.now()
    UNTIL = datetime.now()- timedelta(days = 1)
    x=0
    for post in takewhile(lambda p: p.date > UNTIL, dropwhile(lambda p: p.date > SINCE, posts)):
        tt = [post.date,post.caption,post.likes,post.comments]
        temp=pd.concat([temp, pd.DataFrame(tt)])
        x=x+1
    # always append a placeholder; the slice below only keeps it
    # when there was no post in the window
    tt=['No post in 24 hours','No post in 24 hours','No post in 24 hours','No post in 24 hours']
    temp=pd.concat([temp, pd.DataFrame(tt)])
    return temp[0:4].transpose(),x

# read players' ig username
ig_n = pd.read_csv("player_ig2.csv")
L = Instaloader()

ig_df=pd.DataFrame()
iig_df=pd.DataFrame()

for x in range(len(ig_n)):
    try:
        profile = Profile.from_username(L.context, ig_n.iat[x,2])

        tt = pd.DataFrame(np.column_stack([ig_n.iat[x,0],ig_n.iat[x,1],
                                           profile.full_name,profile.username, profile.userid,
                                           profile.biography,profile.external_url,profile.mediacount,
                                           profile.followers,profile.followees]))
        latest, n_posts = post(ig_n.iat[x,2]) # fetch the posts once instead of twice
        zz=pd.DataFrame(np.column_stack([latest, n_posts]))
        # append both rows only after both lookups succeeded, so the two tables stay aligned
        ig_df=pd.concat([ig_df, tt])
        iig_df=pd.concat([iig_df, zz])
    except Exception:
        print(x) # index of the account that could not be loaded
ig_result = pd.concat([ig_df, iig_df], axis=1, ignore_index=True)
ig_result.columns = ['Player_id','Name','ig_fullname','ig_username','ig_id','ig_bio','ig_url','ig_posts','ig_followers','ig_following',
                 'ig_latestpost_time','ig_latestpost_caption','ig_latestpost_likes',
                 'ig_latestpost_comments', 'ig_postwithin24hours']
ig_result.index = range(0,len(ig_result))
ig_result.head(3)



15


| | Player_id | Name | ig_fullname | ig_username | ig_id | ig_bio | ig_url | ig_posts | ig_followers | ig_following | ig_latestpost_time | ig_latestpost_caption | ig_latestpost_likes | ig_latestpost_comments | ig_postwithin24hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Al Horford | Al Horford | alhorford | 10733526 | | https://www.youtube.com/watch?v=YiZUWssMzyg&fe... | 325 | 510621 | 265 | 2019-04-21 23:13:57 | Well done. On to the next challenge. Go Celtic... | 21707 | 214 | 1 |
| 1 | 4 | Alan Anderson | Alan Anderson | dubblea74 | 1923491235 | Proud Father✊🏾\n👨🏾‍🎓Michigan State Alumni\n Hu... | https://thecombinelasvegas.com/ | 80 | 5393 | 43 | No post in 24 hours | No post in 24 hours | No post in 24 hours | No post in 24 hours | 0 |
| 2 | 5 | Alex Len | Alex Len | alexlen_21 | 300139600 | Ukraine ✈️Maryland ✈️Phoenix ✈️Atlanta "A mo... | | 158 | 40823 | 1038 | 2019-04-22 02:56:56 | Somewhere in Barcelona | 1688 | 16 | 1 |
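The 24-hour window in `post()` is built from `dropwhile` and `takewhile`. The sketch below shows how that combination behaves on plain timestamps; the dates are made up, and it assumes, as the notebook does, that the posts arrive newest first.

```python
from datetime import datetime, timedelta
from itertools import dropwhile, takewhile

now = datetime(2019, 4, 22, 12, 0)   # fixed "now" so the result is repeatable
since = now
until = now - timedelta(days=1)

# Made-up post times, newest first.
post_dates = [
    now + timedelta(hours=1),    # newer than `since`: skipped by dropwhile
    now - timedelta(hours=3),    # inside the window
    now - timedelta(hours=20),   # inside the window
    now - timedelta(days=2),     # older than `until`: takewhile stops here
    now - timedelta(hours=5),    # never reached, iteration has already stopped
]

window = list(takewhile(lambda d: d > until,
                        dropwhile(lambda d: d > since, post_dates)))
print(len(window))
```

Since `takewhile` stops at the first item outside the window, the approach avoids downloading a player's whole history, but it also means that an out-of-order item ends the scan early.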


### Teams_Profile

In [10]:
team_profile = pd.read_csv("team_profile.csv")
team_profile.head(3)

| | Team_abbreviations | Team_fullname | Arena | Location | Capacity | Opened |
|---|---|---|---|---|---|---|
| 0 | MIA | Miami Heat | American Airlines Arena | Miami, Florida | 19600 | 1999 |
| 1 | DAL | Dallas Mavericks | American Airlines Center | Dallas, Texas | 19200 | 2001 |
| 2 | ORL | Orlando Magic | Amway Center | Orlando, Florida | 18846 | 2010 |


### Normalization-- Table "PROFILE"

In [11]:
# Table 'player_profile'
# add Player_id as the primary key that stands in for the player's name

# .copy() gives an independent dataframe, so insert() does not touch nba3
player_profile=nba3[['Name', 'Age','Birth_Place','Birthdate','Height','Weight','Pos']].copy()
player_profile.insert(0, 'Player_id', range(1,len(player_profile)+1))
player_profile.columns =['Player_id','Name', 'Age','Birth_place','Birthdate','Height','Weight','Position'] 
player_profile.head(3)

| | Player_id | Name | Age | Birth_place | Birthdate | Height | Weight | Position |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | AJ Price | 29.0 | us | October 7, 1986 | 185.0 | 81.45 | PG |
| 1 | 2 | Aaron Brooks | 30.0 | us | January 14, 1985 | 180.0 | 72.45 | PG |
| 2 | 3 | Aaron Gordon | 20.0 | us | September 16, 1995 | 202.5 | 99.0 | PF |


In [24]:
player_profile.to_csv('player_profile.csv',index=False)

### Normalization-- Table "TEAMS"

In [12]:
# Table 'Teams'
# add Team_id as the primary key that stands in for the team name

# .copy() gives an independent dataframe, so insert() does not touch team_profile
teams=team_profile.copy()
teams.insert(0, 'Team_id', range(1,len(teams)+1))
teams.columns =['Team_id','Abbreviation', 'Fullname','Arena','Location','Capacity','Opened']
teams.head(3)

| | Team_id | Abbreviation | Fullname | Arena | Location | Capacity | Opened |
|---|---|---|---|---|---|---|---|
| 0 | 1 | MIA | Miami Heat | American Airlines Arena | Miami, Florida | 19600 | 1999 |
| 1 | 2 | DAL | Dallas Mavericks | American Airlines Center | Dallas, Texas | 19200 | 2001 |
| 2 | 3 | ORL | Orlando Magic | Amway Center | Orlando, Florida | 18846 | 2010 |


In [23]:
teams.to_csv('teams.csv',index=False)

### Normalization-- Table "STAT_20xx_20xx"

In [77]:
# combine the player_profile and player_stat tables on the player's name
resulta = pd.merge(player_stat2015, player_profile, how='inner', on='Name')
resultb = pd.merge(player_stat2016, player_profile, how='inner', on='Name')
resultc = pd.merge(player_stat2017, player_profile, how='inner', on='Name')

In [78]:
# extract the data that is ready to connect to table "Profile" and organize it into normal form
stat_2014_2015=resulta[['Player_id', 'Team','PTS','RPG','AST','STL','BLK','TOV']]
stat_2014_2015.columns =['Player_id','Team_id','PTS','RPG','AST','STL','BLK','TOV']
stat_2015_2016=resultb[['Player_id', 'Team','PTS','RPG','AST','STL','BLK','TOV']]
stat_2015_2016.columns =['Player_id','Team_id','PTS','RPG','AST','STL','BLK','TOV']
stat_2016_2017=resultc[['Player_id', 'Team','PTS','RPG','AST','STL','BLK','TOV']]
stat_2016_2017.columns =['Player_id','Team_id','PTS','RPG','AST','STL','BLK','TOV']


In [79]:
# Table 'stat_20xx_20xx'
# replace the team abbreviation with the team id for normalization

# team abbreviations in Team_id order (Team_id starts at 1)
team_abbreviations = ['MIA','DAL','ORL','SAS','IND','BKN','WAS','OKC','MEM','MIL',
                      'SAC','DET','NYK','POR','GSW','DEN','CLE','TOR','NOP','CHA',
                      'LAC','LAL','ATL','PHX','MIN','BOS','HOU','CHI','UTA','PHI']
team_ids = {abbr: i for i, abbr in enumerate(team_abbreviations, start=1)}

for m in range(2015,2018):
    table = 'stat_%s_%s' % (m-1,m)
    stat = globals()[table][['Player_id','Team_id','PTS','RPG','AST','STL','BLK','TOV']].copy()
    # abbreviations that are not in the lookup are left unchanged
    stat['Team_id'] = stat['Team_id'].map(lambda t: team_ids.get(t, t))
    globals()[table] = stat



In [84]:
# output the stat files for the 3 seasons
stat_2014_2015.to_csv('stat_2014_2015.csv',index=False)
stat_2015_2016.to_csv('stat_2015_2016.csv',index=False)
stat_2016_2017.to_csv('stat_2016_2017.csv',index=False)
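An alternative to a hard-coded abbreviation list is to derive the lookup from the `teams` table itself, so that the ids always agree with `teams.csv`. The sketch below only illustrates the idea: the `teams` frame holds the three rows shown in the output above, and the `stat` rows are made up.

```python
import pandas as pd

# First three rows of the teams table shown above.
teams = pd.DataFrame({"Team_id": [1, 2, 3],
                      "Abbreviation": ["MIA", "DAL", "ORL"]})

# Made-up stat rows that still carry the team abbreviation.
stat = pd.DataFrame({"Player_id": [10, 11, 12],
                     "Team": ["ORL", "MIA", "XXX"],   # "XXX" has no match on purpose
                     "PTS": [9.9, 8.8, 7.7]})

# Build the abbreviation -> id lookup from the table and apply it to the column.
team_ids = dict(zip(teams["Abbreviation"], teams["Team_id"]))
stat["Team_id"] = stat["Team"].map(team_ids)

# Abbreviations missing from the lookup become NaN, which makes them easy to spot.
unmatched = stat.loc[stat["Team_id"].isna(), "Team"].tolist()
print(stat[["Player_id", "Team_id", "PTS"]])
print(unmatched)
```

Listing the unmatched abbreviations before exporting is a cheap check that every foreign key in the stat tables points at an existing team.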

### Normalization-- Table "IG_PROFILE" and "IG_POST"

In [8]:
## collect the data from Instagram and organize it for normalization

# function to find the posts of a player who posted within the last day
def post(name):
    L = Instaloader()
    posts =Profile.from_username(L.context, name).get_posts()
    temp=pd.DataFrame()
    SINCE = datetime.now()
    UNTIL = datetime.now()- timedelta(days = 1)

    for post in takewhile(lambda p: p.date > UNTIL, dropwhile(lambda p: p.date > SINCE, posts)):
        k=list(post.get_comments())
        # text of the second comment, or None when the post has fewer than two comments
        comment = k[1].text if len(k) > 1 else None
        # the owner id belongs to the post, not to the posts iterator
        tt = [post.owner_id,post.date,post.caption,post.likes,post.comments,comment]
        temp=pd.concat([temp, pd.DataFrame(tt)])
    return temp

# import the instagram account
ig_n = pd.read_csv("player_ig2.csv")

L = Instaloader()
ig_profile=pd.DataFrame()
ig_post=pd.DataFrame()

# make a table for posts
for x in range(len(ig_n)):
    try:
        profile = Profile.from_username(L.context, ig_n.iat[x,2])
        tt = pd.DataFrame(np.column_stack([ig_n.iat[x,0],
                                           profile.full_name,profile.username, profile.userid,
                                           profile.biography,profile.external_url,profile.mediacount,
                                           profile.followers,profile.followees]))
        ig_profile=pd.concat([ig_profile, tt])

        a=post(ig_n.iat[x,2])
        for y in range(0,len(a),6): # each post occupies 6 consecutive rows
            zz=pd.DataFrame(np.column_stack([a.iat[y,0],a.iat[y+1,0],a.iat[y+2,0],a.iat[y+3,0],a.iat[y+4,0],a.iat[y+5,0]]))
            ig_post=pd.concat([ig_post, zz])
    except Exception:
        print(x) # index of the account that failed

# insert the primary key (Postid) and rename the columns of this table
ig_post.insert(0, 'ig_post', range(1,len(ig_post)+1))
ig_post.columns =['Postid','Userid','Time','Caption','Likes','Comments','Comment']
ig_post.index = range(0,len(ig_post))
ig_post['Time']= ig_post['Time'].astype('str')

ig_profile.columns = ['Player_id','Fullname','Username','Userid','Bio','Url','Posts','Followers','Following']
ig_profile.index = range(0,len(ig_profile))

2
4
15


In [17]:
# export the file
ig_post.head(3)
ig_post.to_csv('ig_post.csv',index=False)

In [18]:
# export the file
ig_profile
ig_profile.to_csv('ig_profile.csv',index=False)
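`post()` returns all values stacked in a single column, six values per post, which the loop above regroups by index. The same regrouping can be sketched with a NumPy reshape. The values below are made up, and `ig_post_demo` is a hypothetical name used only for this illustration.

```python
import numpy as np
import pandas as pd

FIELDS = ["Userid", "Time", "Caption", "Likes", "Comments", "Comment"]

# Made-up values for two posts, stacked in one column the way post() returns them.
flat = pd.DataFrame([111, "2019-04-21 23:13:57", "first caption", 120, 4, "nice",
                     111, "2019-04-22 02:56:56", "second caption", 80, 2, "great"])

# Every len(FIELDS) consecutive values form one row of the post table.
values = flat[0].to_numpy(dtype=object).reshape(-1, len(FIELDS))
ig_post_demo = pd.DataFrame(values, columns=FIELDS)

# Add the surrogate primary key, numbered from 1.
ig_post_demo.insert(0, "Postid", range(1, len(ig_post_demo) + 1))
print(ig_post_demo)
```

The reshape fails loudly when the number of values is not a multiple of six, which is a useful guard against a post with a missing field.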

### Normalization-- Table "IG_MOST"

In [None]:
# function for finding the most popular post and hashtags of the last 30 days
import collections

def most(name):
    L = Instaloader()
    SINCE = datetime.now()
    UNTIL = datetime.now()- timedelta(days = 30)
    posts =Profile.from_username(L.context, name).get_posts()

    # collect the posts of the last 30 days once and reuse them below
    recent = list(takewhile(lambda p: p.date > UNTIL, dropwhile(lambda p: p.date > SINCE, posts)))

    # five most common hashtags, padded with None when fewer are found
    temp=[]
    for post in recent:
        temp.extend(post.caption_hashtags)
    p = [tag for tag, _ in collections.Counter(temp).most_common(5)]
    p += [None] * (5 - len(p))

    # caption of the post with the most likes and comments combined
    if not recent:
        q=['no post within 30 days']
    else:
        q=[max(recent, key = lambda p: p.likes + p.comments).caption]
    return p,q

# import the instagram account
ig_n = pd.read_csv("player_ig2.csv")

count =0
L = Instaloader()
ig_most=pd.DataFrame()

# make a table of the most popular hashtags and post for the last 30 days
for x in range(len(ig_n)):
    try:
        profile = Profile.from_username(L.context, ig_n.iat[x,2])
        a=most(ig_n.iat[x,2])
        tt = pd.DataFrame(np.column_stack([profile.userid,a[0][0],a[0][1],a[0][2],a[0][3],a[0][4],a[1]]))
        ig_most=pd.concat([ig_most, tt])
        count=count+1
    except Exception:
        # if an error occurred, show which account failed
        print(ig_n.iat[x,2])

# rename the columns
ig_most.columns = ['Userid','tag1','tag2','tag3','tag4','tag5','Most_popular_post']
ig_most.index = range(0,len(ig_most))

In [7]:
ig_most

| | Userid | Most_popular_hashtag | Most_popular_post |
|---|---|---|---|
| 0 | 10733526 | risetogether | My favorite time of year: playoff season! Runn... |
| 1 | 1923491235 | marchmadness | It’s almost that time....I’m not looking at du... |
| 2 | 300139600 | beststeakhousehandsdown | End of Chapter 6 |
| 3 | 375417881 | goat | Too many ppl don’t understand but Notre Dame i... |
| 4 | 8174195 | str8up | Praying for my bro and his family Ishallah you... |
| 5 | 37867524 | no hashtag within 30 days | no post within 30 days |
| 6 | 24949656 | detroitbasketball | PLAYOFFS! 🦍🦍🦍\n#DetroitBasketball 📸: @iamtailz |
| 7 | 6246343 | no hashtag within 30 days | Chapter 15... |
| 8 | 305609563 | no hashtag within 30 days | no post within 30 days |
| 9 | 3518326383 | no hashtag within 30 days | no post within 30 days |
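The tag-ranking step inside `most()` can be tried without Instagram access. The sketch below uses `collections.Counter` on a plain list; `top_tags` is a hypothetical helper name, the hashtags are taken from the sample output above, and their repetition counts are made up.

```python
from collections import Counter

def top_tags(hashtags, n=5):
    """Return the n most common hashtags, padded with None to length n."""
    tags = [tag for tag, _ in Counter(hashtags).most_common(n)]
    return tags + [None] * (n - len(tags))

# Hashtags from the sample output; how often each one repeats is made up.
sample = ["risetogether", "goat", "risetogether", "str8up", "risetogether", "goat"]
print(top_tags(sample))
print(top_tags([]))
```

Padding to a fixed length keeps every player's row the same width, so the `tag1` to `tag5` columns can be assigned directly.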


In [4]:
ig_most
ig_most.to_csv('ig_most.csv',index=False)
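The abstract mentions an SQL database that can be queried, and `sqlite3` is imported at the top of the notebook. As a minimal sketch of that step, the normalized tables could be loaded into SQLite and joined on their keys. The player and team names below come from the sample outputs above, but the team assignments and the PTS/RPG numbers are placeholders, not real statistics, and the in-memory database is an assumption made to keep the example self-contained.

```python
import sqlite3
import pandas as pd

# Names copied from the sample outputs; stats and team assignments are placeholders.
player_profile = pd.DataFrame({"Player_id": [1, 2, 3],
                               "Name": ["AJ Price", "Aaron Brooks", "Aaron Gordon"],
                               "Position": ["PG", "PG", "PF"]})
teams = pd.DataFrame({"Team_id": [1, 2, 3],
                      "Abbreviation": ["MIA", "DAL", "ORL"],
                      "Fullname": ["Miami Heat", "Dallas Mavericks", "Orlando Magic"]})
stat_2014_2015 = pd.DataFrame({"Player_id": [2, 3],
                               "Team_id": [2, 3],
                               "PTS": [11.0, 5.0],
                               "RPG": [2.0, 3.5]})

conn = sqlite3.connect(":memory:")   # pass a file path instead to keep the database
player_profile.to_sql("player_profile", conn, index=False)
teams.to_sql("teams", conn, index=False)
stat_2014_2015.to_sql("stat_2014_2015", conn, index=False)

# Use case: list each player's season line together with the full team name.
query = """
SELECT p.Name, t.Fullname AS Team, s.PTS, s.RPG
FROM stat_2014_2015 AS s
JOIN player_profile AS p ON p.Player_id = s.Player_id
JOIN teams AS t ON t.Team_id = s.Team_id
ORDER BY s.PTS DESC
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```

The same pattern would apply to the exported CSV files by reading them with `pd.read_csv` before calling `to_sql`.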

### Citations
1. https://github.com/nikbearbrown/INFO_6210/blob/master/Week_2/NBB_IMDB_Web_Scraper.ipynb
2. https://github.com/danielfrg/espn-nba-scrapy/blob/master/src/scrap/get_players.py
3. http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
4. http://savvastjortjoglou.com/nba-shot-sharts.html
5. https://instaloader.github.io/index.html
6. https://sebastianraschka.com/Articles/2014_sqlite_in_python_tutorial.html
7. http://www.dcs.bbk.ac.uk/~ptw/teaching/DBM/er.pdf
8. https://www.dataquest.io/blog/python-pandas-databases/
9. https://github.com/nikbearbrown/INFO_6210/tree/master/Lahmans_Baseball_Database

Data source links:  
1. ESPN: http://www.espn.com/nba/statistics/player/_/stat/rebounds/sort/avgRebounds/year/2015/count/
2. NBA: https://stats.nba.com/leaders/?Season=2014-15&SeasonType=Regular%20Season
3. Kaggle: https://www.kaggle.com/drgilermo/nba-players-stats-20142015/version/1
4. Instagram: https://www.instagram.com/

### Contribution
About 95% of this assignment is my own work; the remaining 5% of the information and code comes from the Internet sources listed in the citations.

### License
Copyright 2019

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.