# Lecture 5 - 4/16

## My JupyterHub account got deleted somehow, so my notes on the first half of this lecture were lost. I will try my best to recreate everything.
 I also set up a GitHub repository with these notes in Jupyter notebook format: [https://github.com/Fay-Wu/134Note](https://github.com/Fay-Wu/134Note). You can clone it and experiment with the code.


# Pandas Data Frame

The pandas package implements data frames similar to those in R. There are many similarities but also differences. We will go over some of the differences while working with the basketball data.

Data from the NBA can be obtained with the function developed previously.

In [None]:
import pandas as pd

def get_nba_data(endpt, params, return_url=False):

    ## endpt: https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation
    ## params: dictionary of parameters: i.e., {'LeagueID':'00'}
    
    from pandas import DataFrame
    from urllib.parse import urlencode
    import json
    
    useragent = "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9\""
    dataurl = "\"" + "http://stats.nba.com/stats/" + endpt + "?" + urlencode(params) + "\""
    
    # for debugging: just return the url
    if return_url:
        return(dataurl)
    
    jsonstr = !wget -q -O - --user-agent={useragent} {dataurl}
    
    data = json.loads(jsonstr[0])
    
    h = data['resultSets'][0]['headers']
    d = data['resultSets'][0]['rowSet']
    
    return(DataFrame(d, columns=h))
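
The last steps of `get_nba_data` can be tried without touching the network. The sketch below builds the query string with `urlencode` and turns a small hand-written payload into a `DataFrame`. The payload is made up for illustration: it only assumes the `resultSets` / `headers` / `rowSet` layout that the function above reads, and the team IDs, abbreviations, and years are placeholders, not real NBA data.

```python
import json
from urllib.parse import urlencode

import pandas as pd

# Build the query string the same way get_nba_data does.
params = {'LeagueID': '00', 'Season': '2016-17'}
url = "http://stats.nba.com/stats/" + "commonallplayers" + "?" + urlencode(params)
print(url)

# Hypothetical payload: same layout as the response the function reads, made-up values.
jsonstr = json.dumps({
    'resultSets': [{
        'headers': ['TEAM_ID', 'ABBREVIATION', 'MIN_YEAR'],
        'rowSet': [[1, 'AAA', '1950'], [2, 'BBB', '1980']],
    }]
})

data = json.loads(jsonstr)
headers = data['resultSets'][0]['headers']   # column names
rows = data['resultSets'][0]['rowSet']       # list of rows

df = pd.DataFrame(rows, columns=headers)
print(df)
```

Calling `get_nba_data(..., return_url=True)` gives the same kind of URL check for a real endpoint.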

Using the function, we can get data about teams and players:

In [None]:
## get all teams
params = {'LeagueID':'00'}
teams = get_nba_data('commonTeamYears', params)

## get all players
params = {'LeagueID':'00', 'Season': '2016-17', 'IsOnlyCurrentSeason': '0'}
players = get_nba_data('commonallplayers', params)

## Programming style

A programming language really is like a language: you get better with practice. It is worth thinking about good programming style and about better ways to do the same thing. By better, I mean more readable, concise, (computationally) efficient, etc.

For example, there are guides and articles such as these:
- http://docs.python-guide.org/en/latest/writing/style/#short-ways-to-manipulate-lists
- https://google.github.io/styleguide/pyguide.html?showone=List_Comprehensions#List_Comprehensions
- https://google.github.io/styleguide/pyguide.html?showone=Naming#Naming
- https://www.python.org/dev/peps/pep-0008/
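
As a small illustration of what "more readable and concise" can mean, here is the same conversion written as a loop and as a list comprehension. The list of years is made up for the example.

```python
years = ['1949', '1968', '1988']

# Loop version: correct, but takes three lines to say one thing.
as_int_loop = []
for y in years:
    as_int_loop.append(int(y))

# List comprehension: same result, with the intent visible at a glance.
as_int = [int(y) for y in years]

print(as_int_loop == as_int)
```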

In [None]:
import this

The easter egg above prints the Zen of Python: https://www.python.org/dev/peps/pep-0020/.

- https://www.quora.com/What-do-different-aphorisms-in-The-Zen-of-Python-mean
- 20th aphorism?: https://www.reddit.com/r/Python/comments/3cjhlo/this_disobeys_the_zen_of_python/

# Pandas

Pandas has an extensive set of functions. Refer to [Chapter 3 in PDSH](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html) and the [official website](https://pandas.pydata.org). Latest stable release documentation is here: [http://pandas.pydata.org/pandas-docs/stable/api.html](http://pandas.pydata.org/pandas-docs/stable/api.html).

## Pandas Series 

The section on `Series` is here: http://pandas.pydata.org/pandas-docs/stable/api.html#series. The methods and attributes listed there are accessed by placing a dot after the object (for example, `s.head()`).

### Data frames are made of Series
A data frame, one of its rows, and one of its columns are different kinds of objects:

In [None]:
print("data frame object :", type(teams))
print("data row object   :", type(teams.iloc[0]))
print("data column object:", type(teams.ABBREVIATION))

Note that rows as well as columns of pandas data frame are `Series` objects. (In R, rows would be a smaller data frame.)
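
The same check can be run on a toy data frame without fetching anything. The values below are made up and only stand in for the teams table.

```python
import pandas as pd

# Toy stand-in for the teams table (values are made up).
toy = pd.DataFrame({'ABBREVIATION': ['AAA', 'BBB'], 'MIN_YEAR': [1950, 1980]})

row = toy.iloc[0]          # first row, selected by position
col = toy.ABBREVIATION     # one column

print(type(toy).__name__, type(row).__name__, type(col).__name__)
print(list(row.index))     # a row Series is indexed by the column names
```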

There are categories of functions that are applicable to certain object types:

- Pandas general functions: http://pandas.pydata.org/pandas-docs/stable/api.html#general-functions   
    e.g., [`pandas.melt()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html#pandas-melt) takes a `DataFrame` as input.
- Series methods: http://pandas.pydata.org/pandas-docs/stable/api.html#series
- DataFrame methods: http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe

### Pandas (often) shows you views

Recall that assigning a Python object to a new name does not copy it: both names refer to the same instance in memory. The following confirms that `temp` and `teams` are the same object:

In [None]:
temp = teams
print(id(temp) == id(teams))

So, if you change one, you see the change in the other:

In [None]:
s1 = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
s2 = s1
print("id of s1:", id(s1))
print("id of s2:", id(s2))
print("s1 is s2:", s1 is s2)

In [None]:
s1.iloc[0] = 10000  # .iloc selects by position; s1 has string labels

print("s1 changed:", s1.iloc[0])
print("s2 also   :", s2.iloc[0])
#print("s1 is s2:", s1.iloc[0] is s2.iloc[0])

To get an independent object, make an explicit copy:

In [None]:
abbr = teams.ABBREVIATION.copy()
abbr is teams.ABBREVIATION
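
A minimal side-by-side sketch of the two cases, using a made-up `Series`: a second name for the same object sees every change, while a copy does not.

```python
import pandas as pd

original = pd.Series([0.25, 0.5, 0.75], index=['a', 'b', 'c'])
alias = original               # second name for the same object
independent = original.copy()  # new object with its own data

original['a'] = 100.0

print(alias['a'])        # follows the change, because alias is original
print(independent['a'])  # keeps the old value
```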

### Indexing

There are many different ways to index `Series` and `DataFrames` in pandas: https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing.

- `.loc` is primarily for using labels and booleans: e.g., column and row indices, comparison operators, etc
- `.iloc` is primarily for using integer positions: i.e., like you would matrices
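
The difference between the two indexers is easiest to see when the index labels are not `0, 1, 2, ...`. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series(['AAA', 'BBB', 'CCC'], index=[10, 20, 30])

print(s.loc[20])                      # label 20
print(s.iloc[1])                      # position 1, the same element
print(s.loc[s.index > 10].tolist())   # .loc also accepts a boolean mask

try:
    s.loc[1]                          # there is no row labelled 1
except KeyError:
    print("no row is labelled 1")
```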

In [None]:
abbr

In [None]:
dict(abbr.head().items())

### Series as lists

In [None]:
list(abbr.head().items())

In [None]:
abbr.keys()

There are many more useful functions and properties. Refer to [Chapter 3 in PDSH](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html). Latest stable release documentation is here: [http://pandas.pydata.org/pandas-docs/stable/api.html](http://pandas.pydata.org/pandas-docs/stable/api.html).

In [None]:
abbr.unique()

A convenient accessor is [`str`](http://pandas.pydata.org/pandas-docs/stable/api.html#string-handling), which applies string methods to each value separately. For example, we can use a regular expression to search for team abbreviations that end with the letter `S`:

In [None]:
abbr.str.contains('S$')

__Exercise__: how would you use this to pick out team names that end with S? Can you use the resulting boolean `Series`?

In [None]:
## abbr.loc[abbr.str.contains('S$')] ## what is the problem?

__Exercise__: what is `dir()` function?

In [None]:
## dir(abbr)

## Data Frames


### Getting columns

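The following two ways of selecting a column are equivalent. The *dot notation* is easier to read, but it only works when the column name is a valid Python identifier and does not clash with a `DataFrame` attribute or method.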

In [None]:
temp = teams.copy()

print(temp['MIN_YEAR'].head())
print(temp.MIN_YEAR.head())

### Setting columns

Note that you cannot create a new column with dot notation: the assignment only sets an attribute on the Python object and does not add a column. Consider the following:

In [None]:
temp['new_column_1'] = temp.MAX_YEAR
temp.new_column_2 = temp.MAX_YEAR
temp.head()

However, you can set an existing column with dot notation.

In [None]:
temp.LEAGUE_ID = 'ZZ'
temp.head()
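
Both behaviours can be checked on a toy data frame by looking at the column list afterwards. This is a sketch with made-up values; pandas emits a warning on the dot-notation assignment to a new name, which is silenced here only to keep the output short.

```python
import warnings

import pandas as pd

toy = pd.DataFrame({'MAX_YEAR': [2017, 2017]})

toy['new_column_1'] = toy.MAX_YEAR        # bracket notation adds a column

with warnings.catch_warnings():
    warnings.simplefilter('ignore')       # pandas warns about the next line
    toy.new_column_2 = toy.MAX_YEAR       # only sets a Python attribute

toy.MAX_YEAR = 2018                       # existing column: dot assignment works

print(list(toy.columns))                  # new_column_2 is not among the columns
print(toy.MAX_YEAR.tolist())
```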

### Data Frame, Series, dtype

Unlike R, where data frame columns have their own types (e.g., `factor`, `integer`, `numeric`), pandas data frame columns are *all* `Series`, distinguished by their `dtype`. Because the column types were not specified when the data were loaded, columns holding strings, such as `ABBREVIATION`, have dtype `object`:

In [None]:
print(teams.ABBREVIATION.dtype)

In [None]:
teams.ABBREVIATION = teams.ABBREVIATION.astype('category')
teams.TEAM_ID      = teams.TEAM_ID.astype('category')
teams.MIN_YEAR     = teams.MIN_YEAR.astype('int')
teams.MAX_YEAR     = teams.MAX_YEAR.astype('int')

Note that `object` is a catch-all dtype: a row, which mixes values of different types, is also a `Series` of dtype `object`:

In [None]:
print(type(teams.iloc[0]))
print(teams.iloc[0])
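
The conversions above can be tried on a toy data frame. The values are made up; the exact dtype printed for the string columns before conversion depends on the pandas version, so only the dtypes after conversion are commented on.

```python
import pandas as pd

toy = pd.DataFrame({'ABBREVIATION': ['AAA', 'BBB'], 'MIN_YEAR': ['1950', '1980']})
print(toy.dtypes)

# Convert each column to a more specific dtype, as done for the teams table.
toy['ABBREVIATION'] = toy['ABBREVIATION'].astype('category')
toy['MIN_YEAR'] = toy['MIN_YEAR'].astype('int')
print(toy.dtypes)

# A row mixes a category value and an integer, so its Series falls back to object.
print(toy.iloc[0].dtype)
```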

### Condition based slicing

Subset just the current teams:

In [None]:
teams = teams[teams.MAX_YEAR == 2017].copy()  # copy so that adding a column does not trigger SettingWithCopyWarning
teams['TEAM_AGE'] = teams.MAX_YEAR - teams.MIN_YEAR

teams_clean = teams.copy() ## make a copy for later
teams

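Note the difference between the two indexers: `.iloc[1]` selects the second row by position, while `.loc[14]` selects the row whose index label is 14. After filtering, the index labels are no longer consecutive, so the two can differ: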

In [None]:
print('*** indexing with .iloc:\n', teams.iloc[1])
print('\n*** indexing with .loc :\n', teams.loc[14])

Subset just the players in current teams:

In [None]:
players = players[players.TEAM_ID.isin(teams.TEAM_ID)]
players.tail()

List players grouped by team:

In [None]:
players.groupby('TEAM_CODE')

The result is a `GroupBy` object, which is iterable. Iterating over it yields one pair per group: the group key and the sub-data-frame for that group.

In [None]:
for t, p in players.groupby('TEAM_NAME'):
    print("***", t)
    print('; '.join(p.DISPLAY_LAST_COMMA_FIRST.values), '\n')
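
The same loop on a toy table makes the (key, group) pairs easy to inspect. Team and player names here are made up, and `PLAYER` is a hypothetical column name standing in for `DISPLAY_LAST_COMMA_FIRST`.

```python
import pandas as pd

toy_players = pd.DataFrame({
    'TEAM_NAME': ['Aces', 'Bolts', 'Aces'],
    'PLAYER': ['Doe, Jane', 'Roe, Rick', 'Poe, Pat'],
})

roster = {}
for team, group in toy_players.groupby('TEAM_NAME'):
    # team is the group key; group is a DataFrame holding that team's rows
    roster[team] = '; '.join(group.PLAYER)

print(roster)
```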

### Merging data frames

First we can create a table of unique rows with full team names:

In [None]:
team_names = players[['TEAM_ABBREVIATION', 'TEAM_CODE']].drop_duplicates()#.set_index('TEAM_ABBREVIATION')
team_names.head()

We have team codes (names) as a new column.

In [None]:
teams_clean.head()

In [None]:
teams = pd.merge(teams_clean, team_names, left_on='ABBREVIATION', right_on='TEAM_ABBREVIATION')
teams.tail()
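
A toy version of this merge shows what `left_on` and `right_on` do when the key columns have different names. The values are made up. With the default inner join, a team that appears in only one of the tables (here `CCC`) is dropped.

```python
import pandas as pd

left = pd.DataFrame({'ABBREVIATION': ['AAA', 'BBB', 'CCC'],
                     'MIN_YEAR': [1950, 1980, 1990]})
right = pd.DataFrame({'TEAM_ABBREVIATION': ['BBB', 'AAA'],
                      'TEAM_CODE': ['bolts', 'aces']})

# Match rows where left.ABBREVIATION equals right.TEAM_ABBREVIATION.
merged = pd.merge(left, right, left_on='ABBREVIATION', right_on='TEAM_ABBREVIATION')
print(merged)
```

Both key columns are kept in the result, which is why the merged `teams` table has `ABBREVIATION` as well as `TEAM_ABBREVIATION`.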

We can apply a `str` method to tidy up the team codes:

In [None]:
teams.TEAM_CODE = teams.TEAM_CODE.str.capitalize() # returns values so needs to be reassigned
teams.sort_values('ABBREVIATION', inplace=True)    # modifies object
teams.tail()

In [None]:
players.head()

## Interaction with Widgets

One advantage of Jupyter notebooks is that they are browser-based. Browsers are highly interactive, so we can interact with the data using the widgets provided by `ipywidgets`.

We will digress a little and talk about widgets. Widgets take user input by waiting for some action. We can create a simple slider to select a number:

In [None]:
from ipywidgets import interact, FloatSlider, Dropdown, Button

def selected_val(x):
    print('Selected value is', x)

xslider = FloatSlider(min=0.0, max=10.0, step=0.05) #object slider
interact(selected_val, x=xslider);

In [None]:
def f(x, y):
    print(x, y)
    
drop1 = {'Galileo': 10, 'Brahe': 11, 'Hubble': 12}
drop2 = {'Apple': 345, 'Orange': 234, 'Banana': 123}

interact(f, x=drop1, y=drop2);  # passing a dictionary creates a dropdown menu
# The dropdown lists the dictionary keys; selecting a key passes the corresponding value to the function.

In [None]:
menu = {
    'juice': ['apple', 'peach', 'grape'],
    'tea': ['ginger', 'green', 'earl grey'],
}  # drink type -> available flavors

selected = 'tea'  # drink type shown when the widgets first appear

flavor = Dropdown(options=menu[selected], value=menu[selected][0])
drink = Dropdown(options=list(menu.keys()), value=selected)
order = Button(description='Order!', icon='check')

def update_drink(change):
    # change['new'] is the newly selected drink type
    flavor.options = menu[change['new']]
    flavor.value = menu[change['new']][0]  # reset the flavor to the first option of the new drink

def make_order(button):
    # called when the Order! button is clicked
    print(flavor.value, drink.value)

# Without this observer both menus still display, but the flavor list
# does not update when the drink type changes.
drink.observe(update_drink, names='value')
order.on_click(make_order)

display(flavor, drink, order)

__Exercise__: Can you add a widget for selecting the size? Size is independent of flavors; however, it should be included when the order is made. Allow for sizes small, regular, and large.

# Now try it on the basketball data
## This part of the lecture really needs you to try out different combinations of code and play around to understand it, so please do.

Note that `drop1` and `drop2` fill the dropdown menus with the dictionary keys, but selecting a key returns the value associated with it. We will use the same approach for the basketball data.
The format we need is as follows:

Teams dictionary:
```
teams_menu = {
    'team1': [teamid1],
    'team2': [teamid2],
    ...,
}
```

Players dictionary of dictionaries:
```   
plyrs_menu = {
    [teamid1]: {
        'player1': [playerid1],
        'player2': [playerid2],
        ...,
    },
    [teamid2]: {
        'player12': [playerid12],
        'player13': [playerid13],
        ...,
    },
}
``` 
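
To make the target format concrete, here is a sketch that builds both dictionaries from a few hand-written records in plain Python. The labels and IDs are made up, and `records` is a hypothetical stand-in for the teams and players tables.

```python
# Hypothetical records: (team label, team id, player name, player id)
records = [
    ('AAA, Aces', 101, 'Doe, Jane', 1),
    ('AAA, Aces', 101, 'Poe, Pat', 2),
    ('BBB, Bolts', 102, 'Roe, Rick', 3),
]

teams_menu = {}   # team label -> team id
plyrs_menu = {}   # team id -> {player name -> player id}
for team_label, team_id, player_name, player_id in records:
    teams_menu[team_label] = team_id
    plyrs_menu.setdefault(team_id, {})[player_name] = player_id

# Two lookups lead from a team label to that team's players.
print(plyrs_menu[teams_menu['AAA, Aces']])
```

Keying the players dictionary by team ID rather than by label means the value returned by the team dropdown can be used directly as the lookup key.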


In [None]:
team_dd_text = teams.TEAM_ABBREVIATION + ', ' + teams.TEAM_CODE  # menu labels such as 'LAC, Clippers'
team_dd = dict(zip(team_dd_text, teams.TEAM_ID))  # menu label -> team id
team_dd

In [None]:
plyr_by_team_dd = dict()

for t, p in players.groupby('TEAM_ID'):
    
    plyr_by_team_dd[t] = dict(zip(p.DISPLAY_LAST_COMMA_FIRST, p.PERSON_ID))

#plyr_by_team_dd

In [None]:
plyr_dd_text = players.DISPLAY_LAST_COMMA_FIRST
plyr_dd_id = players.PERSON_ID
plyr_dd = dict(zip(plyr_dd_text, plyr_dd_id))
#plyr_dd

In [None]:
# selected = 'ATL, Hawks'
selected = 'LAC, Clippers'

team_menu = Dropdown(options=team_dd, label=selected)
plyr_menu = Dropdown(options=plyr_by_team_dd[team_dd[selected]])
# pick one of the per-team player dictionaries,
# depending on what is selected in team_menu

display(team_menu, plyr_menu)

Now, if you change the team menu, the player menu does not change accordingly.
Adding an "observer" watches for change events on the dropdown menus.

In [None]:
def test_team(change):
    
    print(change['new'])
    print("***********")
    print(change)
# Try each of the two lines below to see the difference between observing 'label' and 'value'.
#team_menu.observe(test_team, names='label')
team_menu.observe(test_team, names=['label', 'value']) ## what does this do?
# Now go back to the menus displayed by the previous cell and change the team: the new value is printed.

**Exercise:** What does the label refer to? What does adding value do?

In [None]:
# selected = 'ATL, Hawks'
selected = 'LAC, Clippers'

team_menu = Dropdown(options=team_dd, label=selected)
plyr_menu = Dropdown(options=plyr_by_team_dd[team_dd[selected]])

display(team_menu, plyr_menu)

def update_team(change): #update player according to teams now.
    plyr_menu.options = plyr_by_team_dd[change['new']]

team_menu.observe(update_team, names='value')

Now the player menu updates whenever the selected team changes.
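
The callback logic can also be exercised without a browser by calling it with a hand-built change notification. In the sketch below, `FakeMenu` is a hypothetical stand-in for a `Dropdown`, and the IDs and names are made up. A real change object carries a few more keys (such as `owner` and `type`), but only `new` matters here.

```python
# Hypothetical menus (ids and names are made up).
plyr_by_team = {
    101: {'Doe, Jane': 1, 'Poe, Pat': 2},
    102: {'Roe, Rick': 3},
}

class FakeMenu:
    """Stand-in for a Dropdown: it just holds an options attribute."""
    def __init__(self, options):
        self.options = options

plyr_menu = FakeMenu(plyr_by_team[101])

def update_team(change):
    # change['new'] is the newly selected value, here a team id
    plyr_menu.options = plyr_by_team[change['new']]

# Hand-built change notification with the keys the callback relies on.
update_team({'name': 'value', 'old': 101, 'new': 102})
print(plyr_menu.options)
```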

In [None]:
#from ipywidgets import dlink

# selected = 'ATL, Hawks'
selected = 'LAC, Clippers'

team_menu = Dropdown(options=team_dd, label=selected)
plyr_menu = Dropdown(options=plyr_by_team_dd[team_dd[selected]])
fetch_button = Button(description='Get Data!', icon='check')

display(team_menu, plyr_menu, fetch_button)

## update players list
def update_team(change):
    plyr_menu.options = plyr_by_team_dd[change['new']]
    plyr_menu.value = list(plyr_by_team_dd[change['new']].values())[0]


team_menu.observe(update_team, names='value')

## get data action
def get_data(button):  # on_click passes the clicked button to the callback
    print(team_menu.value, plyr_menu.value)
    
fetch_button.on_click(get_data)


**Exercise:** How would you modify the `get_data()` function to construct `params` and fetch data using `get_nba_data()`?