In [28]:
library(RCurl)
library(XML)
library(ggmap)
library(dplyr)
library(lubridate)

# Overview

The NBA has 30 teams, each playing 82 games during a regular season.

Can we optimize the NBA calendar and provide a better schedule with a lower travel distance?

# Data

In [3]:
calendar<-data.frame("date"=as.character(),
                     "time"=as.character(),
                     "visitor"=as.character(),
                     "visitor_pts"=as.numeric(),
                     "home"=as.character(),
                     "home_pts"=as.numeric())

### Scraping data from the Internet

Collecting the calendar from https://www.basketball-reference.com

In [5]:
months <- tolower(month.name)
years <- 2016
col_names <- c("date", "time", "visitor", "visitor_pts", "home", "home_pts")

for (j in seq_along(years)) {
  # Games collected for the current season, one month at a time
  table <- NULL

  for (i in seq_along(months)) {
    url <- paste0("https://www.basketball-reference.com/leagues/NBA_",
                  years[j], "_games-", months[i], ".html")

    # Months without games have no page, so skip them
    if (!url.exists(url)) next

    html <- xml2::read_html(url)
    node <- rvest::html_node(html, "table")
    aux <- rvest::html_table(node, header = TRUE)

    # Keep only date, time, teams and points
    aux <- aux[, 1:6]
    names(aux) <- col_names
    table <- rbind(table, aux)
  }

  table$season <- years[j]
  calendar <- rbind(calendar, table)
}

In [6]:
head(calendar)

date,time,visitor,visitor_pts,home,home_pts,season
"Fri, Jan 1, 2016",8:00 pm,New York Knicks,81,Chicago Bulls,108,2016
"Fri, Jan 1, 2016",10:30 pm,Philadelphia 76ers,84,Los Angeles Lakers,93,2016
"Fri, Jan 1, 2016",7:30 pm,Dallas Mavericks,82,Miami Heat,106,2016
"Fri, Jan 1, 2016",7:30 pm,Charlotte Hornets,94,Toronto Raptors,104,2016
"Fri, Jan 1, 2016",7:00 pm,Orlando Magic,91,Washington Wizards,103,2016
"Sat, Jan 2, 2016",3:00 pm,Brooklyn Nets,100,Boston Celtics,97,2016


### Fixing date format

In [12]:
calendar$date2<-unlist(lapply(strsplit(gsub(",","",calendar$date)," "),function(x) paste(x[2:4],collapse = "-")))
calendar$date2<-as.Date(calendar$date2,"%b-%d-%Y")
calendar<-calendar%>%
  arrange(date2)
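
As a possible simplification, the same conversion can be sketched in a single `as.Date` call by describing the whole string in the format. This assumes an English locale, so that `%a` and `%b` match values such as "Fri" and "Jan"; the `dates` vector is a small sample copied from the scraped output above.

```r
# Sample values in the same shape as calendar$date
dates <- c("Fri, Jan 1, 2016", "Sat, Jan 2, 2016")

# %a = abbreviated weekday, %b = abbreviated month, %d = day, %Y = 4-digit year.
# Assumption: the session locale is English; otherwise %a and %b will not match.
parsed <- as.Date(dates, format = "%a, %b %d, %Y")
print(parsed)
```

If the locale is not English, the parse returns `NA`, so checking `anyNA(parsed)` right after the conversion is a cheap safeguard.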

In [25]:
calendar %>%
    filter(complete.cases(.)) %>%
    summarise(min(date2), max(date2))

min(date2),max(date2)
2015-10-27,2016-06-19


This calendar also includes playoff games, which we are not interested in, since we are evaluating the distance traveled during the regular season.

The 2015-16 regular season ran from October 27, 2015 to April 13, 2016 (https://en.wikipedia.org/wiki/2015%E2%80%9316_NBA_season).

### Filter Regular Season Games

In [34]:
calendar<-calendar%>%
            filter(date2<='2016-04-13')
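
One way to check the cutoff is to count games per team: with only regular-season games left, each of the 30 teams should appear 82 times as home or visitor. A minimal sketch in base R, where `games_per_team` is a hypothetical helper and `toy` is made-up data used only to show the call:

```r
# Hypothetical helper: number of games each team appears in, as home or visitor
games_per_team <- function(cal) {
  table(c(as.character(cal$home), as.character(cal$visitor)))
}

# Made-up three-game schedule, only to illustrate the call
toy <- data.frame(
  visitor = c("New York Knicks", "Chicago Bulls", "Miami Heat"),
  home    = c("Chicago Bulls", "Miami Heat", "New York Knicks"),
  stringsAsFactors = FALSE
)
counts <- games_per_team(toy)
print(counts)

# On the filtered calendar one would expect:
# all(games_per_team(calendar) == 82)
```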

### Define Location of Games

In [35]:
calendar$home_location<-unlist(lapply(strsplit(calendar$home," "),function(x) paste(x[1:(length(x)-1)],collapse=" ")))
calendar$visitor_location<-unlist(lapply(strsplit(calendar$visitor," "),function(x) paste(x[1:(length(x)-1)],collapse=" ")))

In [38]:
unique(calendar$home_location)

Based on the name of the home team we can identify the game location. For example, when the home team is 'Chicago Bulls' we know the game was hosted in Chicago.
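
Dropping the last word of the team name does not always yield a city: 'Portland Trail Blazers' becomes 'Portland Trail', and prefixes such as 'Golden State', 'Utah', 'Minnesota' and 'Indiana' are not cities. A minimal sketch of one way to handle these cases, assuming a hand-written override table (the `overrides` vector and the `team_location` helper are hypothetical, and the cities listed are the teams' 2015-16 home cities):

```r
# Hypothetical override table for team names whose prefix is not the host city
overrides <- c(
  "Portland Trail Blazers" = "Portland",
  "Golden State Warriors"  = "Oakland",
  "Utah Jazz"              = "Salt Lake City",
  "Minnesota Timberwolves" = "Minneapolis",
  "Indiana Pacers"         = "Indianapolis"
)

# Hypothetical helper: use the override when present, else drop the last word
team_location <- function(team) {
  default <- sub(" [^ ]+$", "", team)
  unname(ifelse(team %in% names(overrides), overrides[team], default))
}

locations <- team_location(c("Chicago Bulls", "Los Angeles Lakers",
                             "Portland Trail Blazers", "Utah Jazz"))
print(locations)
```

Keeping the exceptions in a named vector leaves the default rule untouched for the teams it already handles.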

As a simple example, for a game between the 'Chicago Bulls' and the 'Memphis Grizzlies' where the home team is the 'Chicago Bulls', we assume that the visiting team traveled from Memphis to Chicago.
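
To make this assumption concrete, the sketch below turns one game into a travel leg from the visitor's city to the host city and measures it with the haversine great-circle formula. The `haversine_km` helper, the approximate city coordinates and the mean Earth radius of 6371 km are illustrative assumptions, not part of the original analysis.

```r
# Approximate city-centre coordinates in decimal degrees (assumed for illustration)
cities <- data.frame(
  city = c("Chicago", "Memphis"),
  lat  = c(41.88, 35.15),
  lon  = c(-87.63, -90.05),
  stringsAsFactors = FALSE
)

# Hypothetical helper: great-circle distance in km via the haversine formula
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# One game: Memphis visits Chicago, so the leg runs from Memphis to Chicago
game <- data.frame(visitor_location = "Memphis", home_location = "Chicago",
                   stringsAsFactors = FALSE)
from <- cities[match(game$visitor_location, cities$city), ]
to   <- cities[match(game$home_location, cities$city), ]
leg_km <- haversine_km(from$lat, from$lon, to$lat, to$lon)
print(leg_km)
```

A straight-line distance understates real flight paths, but it gives a consistent basis for comparing schedules.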