## Working with operators on a DataFrame

- When an arithmetic or comparison operator is applied directly to a DataFrame, the operation is applied to each value in each column
- If the DataFrame does not contain homogeneous data, the operation is likely to fail
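
As a minimal sketch of this element-wise behavior, the toy DataFrame below holds only floats, so both an arithmetic and a comparison operator apply to every cell. The names `toy`, `doubled`, and `is_large` and all values are hypothetical and are not taken from `college.csv`:

```python
import pandas as pd

# Hypothetical, homogeneous (all-float) data for illustration only
toy = pd.DataFrame(
    {"x": [0.25, 0.5], "y": [0.75, 1.0]},
    index=["A", "B"],
)

doubled = toy * 2      # arithmetic operator: every value is multiplied by 2
is_large = toy > 0.5   # comparison operator: every value becomes True or False

print(doubled)
print(is_large)
```

Both results keep the index and column labels of `toy`; only the values change.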

In [2]:
import pandas as pd
import numpy as np
pd.options.display.max_columns = 40

- Attempting to add 5 to each value of the DataFrame raises a `TypeError` as integers cannot be added to strings

In [8]:
college = pd.read_csv('data/college.csv')
college + 5

TypeError: Could not operate 5 with block values must be str, not int
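
The same failure can be reproduced without the data file. This sketch assumes a hypothetical two-column frame named `mixed`, and it uses `select_dtypes` to keep the numeric column, which is one possible way to get homogeneous data and not the approach this recipe takes:

```python
import pandas as pd

# Hypothetical mixed-type frame: one text column, one numeric column.
# dtype=object is set explicitly so the text column holds plain Python strings.
mixed = pd.DataFrame(
    {
        "name": pd.Series(["A", "B"], dtype=object),
        "share": [0.25, 0.75],
    }
)

try:
    mixed + 5
    failed = False
except TypeError:
    # Adding an integer to a string is undefined, so the whole operation fails
    failed = True

# Keeping only the numeric columns makes the data homogeneous again
numeric_plus_5 = mixed.select_dtypes(include="number") + 5

print(failed)
print(numeric_plus_5)
```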

- To successfully use an operator with a DataFrame, first select homogeneous data
- To get started, we import the data, use the institution name as the index label, and then select the columns we want with the `filter` method

In [6]:
college = pd.read_csv('data/college.csv', index_col='INSTNM')
college_ugds_ = college.filter(like='UGDS_')
college_ugds_.head()

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Alabama A & M University,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138
University of Alabama at Birmingham,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01
Amridge University,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715
University of Alabama in Huntsville,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035
Alabama State University,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137


- This recipe uses multiple operators with a DataFrame to round the undergraduate columns to the nearest hundredth
- First add `.00501` to each value of `college_ugds_`:

In [7]:
college_ugds_ + .00501

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Alabama A & M University,0.03831,0.94031,0.01051,0.00691,0.00741,0.00691,0.00501,0.01091,0.01881
University of Alabama at Birmingham,0.59721,0.26501,0.03331,0.05681,0.00721,0.00571,0.04181,0.02291,0.01501
Amridge University,0.30401,0.42421,0.01191,0.00841,0.00501,0.00501,0.00501,0.00501,0.27651
University of Alabama in Huntsville,0.70381,0.13051,0.04321,0.04261,0.01931,0.00521,0.02221,0.03821,0.04001
Alabama State University,0.02081,0.92581,0.01711,0.00691,0.00601,0.00561,0.01481,0.02931,0.01871
The University of Alabama,0.78751,0.11691,0.03981,0.01561,0.00881,0.00591,0.03111,0.03181,0.00761
Central Alabama Community College,0.73051,0.26631,0.00941,0.00751,0.00941,0.00501,0.00501,0.00501,0.00691
Athens State University,0.78731,0.12501,0.02411,0.01031,0.02071,0.00601,0.02241,0.01071,0.03841
Auburn University at Montgomery,0.53781,0.34261,0.01241,0.02711,0.00941,0.00661,0.03471,0.04471,0.02961
Auburn University,0.85571,0.07541,0.02981,0.02771,0.01241,0.00501,0.00501,0.01501,0.01901


- Floor-divide by `.01` with the `//` operator to get the nearest whole-number percentage:

In [9]:
(college_ugds_ + .00501) // .01

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Alabama A & M University,3.0,94.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0
University of Alabama at Birmingham,59.0,26.0,3.0,5.0,0.0,0.0,4.0,2.0,1.0
Amridge University,30.0,42.0,1.0,0.0,0.0,0.0,0.0,0.0,27.0
University of Alabama in Huntsville,70.0,13.0,4.0,4.0,1.0,0.0,2.0,3.0,4.0
Alabama State University,2.0,92.0,1.0,0.0,0.0,0.0,1.0,2.0,1.0
The University of Alabama,78.0,11.0,3.0,1.0,0.0,0.0,3.0,3.0,0.0
Central Alabama Community College,73.0,26.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Athens State University,78.0,12.0,2.0,1.0,2.0,0.0,2.0,1.0,3.0
Auburn University at Montgomery,53.0,34.0,1.0,2.0,0.0,0.0,3.0,4.0,2.0
Auburn University,85.0,7.0,2.0,2.0,1.0,0.0,0.0,1.0,1.0


- To complete the rounding exercise, divide by 100:

In [10]:
college_ugds_op_round = (college_ugds_ + .00501) // .01 / 100
college_ugds_op_round.head()

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Alabama A & M University,0.03,0.94,0.01,0.0,0.0,0.0,0.0,0.01,0.01
University of Alabama at Birmingham,0.59,0.26,0.03,0.05,0.0,0.0,0.04,0.02,0.01
Amridge University,0.3,0.42,0.01,0.0,0.0,0.0,0.0,0.0,0.27
University of Alabama in Huntsville,0.7,0.13,0.04,0.04,0.01,0.0,0.02,0.03,0.04
Alabama State University,0.02,0.92,0.01,0.0,0.0,0.0,0.01,0.02,0.01
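
To see why the three operators together amount to rounding, the sketch below traces them on three values copied from the first row of `college_ugds_.head()`. A Series is used for brevity, and the intermediate names `shifted`, `whole_pct`, and `rounded` are hypothetical:

```python
import pandas as pd

# Three values from the Alabama A & M University row shown earlier
sample = pd.Series([0.0333, 0.9353, 0.0055])

# Adding slightly more than half a hundredth pushes any value at or
# past the halfway point over the next hundredth
shifted = sample + 0.00501

# Floor division by .01 keeps only the whole number of hundredths
whole_pct = shifted // 0.01

# Dividing by 100 scales the result back to a proportion
rounded = whole_pct / 100

print(whole_pct.tolist())
print(rounded.tolist())
```

Without the added `.00501`, floor division alone would always round down, so `0.0055` would become `0.0` where rounding gives `0.01`.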


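- The `round` DataFrame method does this rounding for us automatically
- NumPy rounds numbers that are exactly halfway between two values to the even one, so a small fraction is added before rounding to make those halves round up, as they do in the operator version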

In [11]:
college_ugds_round = (college_ugds_ + .000001).round(2)

In [12]:
college_ugds_op_round.equals(college_ugds_round)

True
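
The round-half-to-even behavior is easy to check in isolation. This sketch uses whole-number halves, chosen here because they are represented exactly in binary floating point. The Series `halves` is hypothetical:

```python
import numpy as np
import pandas as pd

halves = pd.Series([0.5, 1.5, 2.5])

# Exact halves go to the nearest even number, not always up
print(np.round(halves.to_numpy()))
print(halves.round().tolist())

# Adding a tiny offset first makes every half round up
print((halves + 0.00001).round().tolist())
```

The offset must stay well below the rounding precision so that it only affects values sitting exactly on a halfway point.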

- The first step uses the plus operator, which adds the scalar to each value in each column of the DataFrame
- Because the columns are all numeric, the operation works as expected

## There's more...

- Just as with Series, DataFrames have method equivalents of the operators
- You may replace the operators with their method equivalents:

In [14]:
college_ugds_op_round_methods = college_ugds_.add(.00501).floordiv(.01).div(100)
college_ugds_op_round_methods.equals(college_ugds_round)

True
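
The operator-to-method correspondence can also be checked on a small frame. This sketch assumes hypothetical integer data so the comparison is free of floating-point noise:

```python
import pandas as pd

# Hypothetical integer data for illustration only
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# The same three steps written with operators and with methods
with_operators = (toy + 1) // 2 / 10
with_methods = toy.add(1).floordiv(2).div(10)

print(with_operators.equals(with_methods))
```

The method form chains left to right without parentheses, which can be easier to read when several steps are combined.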