# More on DataFrames
Only a fraction of the functionality of the **DataFrame** was covered in the previous notebook. This notebook will continue by focusing on a dataset with many more numeric columns. 

The US government provides a tremendous number of free datasets at [data.gov](http://www.data.gov). One interesting dataset is the [College Scorecard](https://catalog.data.gov/dataset/college-scorecard), which contains information on every college and university in the United States.

In [88]:
# begin with importing the needed libraries
import pandas as pd
import numpy as np

### Read in data
Pandas can read many different data formats. The functions that do so all begin with **`read_`**. The College Scorecard dataset is another **csv**.

In [89]:
# use a list comprehension to list all the 'read' functions
[f for f in dir(pd) if 'read' in f]

['read_clipboard',
 'read_csv',
 'read_excel',
 'read_fwf',
 'read_gbq',
 'read_hdf',
 'read_html',
 'read_json',
 'read_msgpack',
 'read_pickle',
 'read_sas',
 'read_sql',
 'read_sql_query',
 'read_sql_table',
 'read_stata',
 'read_table']

In [90]:
# read in the college dataset
college = pd.read_csv('data/college.csv')

In [91]:
# inspect data
college.shape

(7535, 27)

In [92]:
# The number of columns displayed defaults to 20
# raise the limit to see all columns

pd.options.display.max_columns = 40
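
Setting `pd.options.display.max_columns` changes the display for the rest of the session. If the wider display is only needed for one output, `pd.option_context` applies the setting temporarily. The sketch below uses a hypothetical 30-column frame named `wide`, which is not part of the college data.

```python
import pandas as pd

# hypothetical wide frame with 30 columns, used only for illustration
wide = pd.DataFrame([list(range(30))], columns=[f"col_{i}" for i in range(30)])

# remember the current setting so it can be compared afterwards
default_max = pd.get_option("display.max_columns")

# the new limit applies only inside the with-block
with pd.option_context("display.max_columns", 40):
    inside = pd.get_option("display.max_columns")
    print(wide)

# outside the block the previous setting is back in effect
after = pd.get_option("display.max_columns")
print(inside, after == default_max)
```

This keeps a one-off wide printout from changing how every later DataFrame is displayed.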

In [93]:
college.head()

Unnamed: 0,INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
0,Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
1,University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
2,Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
3,University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
4,Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5


In [94]:
# almost all columns are numeric
# some missing values too

college.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7535 entries, 0 to 7534
Data columns (total 27 columns):
INSTNM                7535 non-null object
CITY                  7535 non-null object
STABBR                7535 non-null object
HBCU                  7164 non-null float64
MENONLY               7164 non-null float64
WOMENONLY             7164 non-null float64
RELAFFIL              7535 non-null int64
SATVRMID              1185 non-null float64
SATMTMID              1196 non-null float64
DISTANCEONLY          7164 non-null float64
UGDS                  6874 non-null float64
UGDS_WHITE            6874 non-null float64
UGDS_BLACK            6874 non-null float64
UGDS_HISP             6874 non-null float64
UGDS_ASIAN            6874 non-null float64
UGDS_AIAN             6874 non-null float64
UGDS_NHPI             6874 non-null float64
UGDS_2MOR             6874 non-null float64
UGDS_NRA              6874 non-null float64
UGDS_UNKN             6874 non-null float64
PPTUG_EF          

In [95]:
# describe for only the numeric columns 
college.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
HBCU,7164.0,0.014238,0.118478,0.0,0.0,0.0,0.0,1.0
MENONLY,7164.0,0.009213,0.095546,0.0,0.0,0.0,0.0,1.0
WOMENONLY,7164.0,0.005304,0.072642,0.0,0.0,0.0,0.0,1.0
RELAFFIL,7535.0,0.190975,0.393096,0.0,0.0,0.0,0.0,1.0
SATVRMID,1185.0,522.819409,68.578862,290.0,475.0,510.0,555.0,765.0
SATMTMID,1196.0,530.76505,73.469767,310.0,482.0,520.0,565.0,785.0
DISTANCEONLY,7164.0,0.005583,0.074519,0.0,0.0,0.0,0.0,1.0
UGDS,6874.0,2356.83794,5474.275871,0.0,117.0,412.5,1929.5,151558.0
UGDS_WHITE,6874.0,0.510207,0.286958,0.0,0.2675,0.5557,0.747875,1.0
UGDS_BLACK,6874.0,0.189997,0.224587,0.0,0.036125,0.10005,0.2577,1.0


### Data Dictionary
Many datasets from data.gov come with a data dictionary. The dictionary holds data about the data, or **metadata**. The **`college_data_dictionary.csv`** file is an abbreviated data dictionary with a description of each column. Data dictionaries can be extremely important when doing analysis.

In [96]:
college_dd = pd.read_csv('data/college_data_dictionary.csv')

In [97]:
college_dd

Unnamed: 0,column_name,description
0,INSTNM,Institution Name
1,CITY,City Location
2,STABBR,State Abbreviation
3,HBCU,Historically Black College or University
4,MENONLY,0/1 Men Only
5,WOMENONLY,0/1 Women only
6,RELAFFIL,0/1 Religious Affiliation
7,SATVRMID,SAT Verbal Median
8,SATMTMID,SAT Math Median
9,DISTANCEONLY,Distance Education Only
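
One convenient way to use a data dictionary is to turn it into a lookup table keyed by column name. The sketch below rebuilds a few rows of the dictionary by hand (the name `dd` is only illustrative) so that it runs without the csv file.

```python
import pandas as pd

# a few rows copied from the data dictionary shown above
dd = pd.DataFrame({
    "column_name": ["INSTNM", "CITY", "STABBR", "SATVRMID"],
    "description": ["Institution Name", "City Location",
                    "State Abbreviation", "SAT Verbal Median"],
})

# a Series indexed by column name behaves like a lookup table
descriptions = dd.set_index("column_name")["description"]

print(descriptions["SATVRMID"])
```

With the real file, the same two steps applied to `college_dd` should give a description for any column in `college`.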


### Sort DataFrame by a specific column
The **sort_values** DataFrame method can sort the entire frame by one or more columns.

In [98]:
# sort by city descending
college.sort_values('CITY', ascending=False).head()

Unnamed: 0,INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
2949,Ohio University-Zanesville Campus,Zanesville,OH,0.0,0.0,0.0,0,,,0.0,1944.0,0.8812,0.0417,0.0123,0.0026,0.0005,0.0005,0.037,0.0036,0.0206,0.5561,1,0.3947,0.5245,0.356,39600,21250
4674,Mid-EastCTC-Adult Education,Zanesville,OH,0.0,0.0,0.0,0,,,0.0,305.0,0.9443,0.0164,0.0,0.0033,0.0033,0.0,0.0262,0.0,0.0066,0.3902,1,0.3712,0.4991,0.4961,29800,6943
2916,Zane State College,Zanesville,OH,0.0,0.0,0.0,0,,,0.0,2063.0,0.6995,0.0296,0.0029,0.0029,0.0029,0.0005,0.0218,0.0,0.2399,0.573,1,0.3645,0.3434,0.3185,23800,13960.5
83,Arizona Western College,Yuma,AZ,0.0,0.0,0.0,0,,,0.0,7218.0,0.1793,0.0292,0.6952,0.01,0.0114,0.003,0.0109,0.0313,0.0296,0.655,1,0.5581,0.0469,0.3166,27000,5500
4833,Yukon Beauty College Inc,Yukon,OK,0.0,0.0,0.0,0,,,0.0,25.0,0.8,0.04,0.0,0.0,0.12,0.04,0.0,0.0,0.0,0.0,1,0.9259,0.8148,0.4706,PrivacySuppressed,PrivacySuppressed
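
Since **`sort_values`** accepts more than one column, a list of column names together with a list of sort directions gives a multi-level sort. The sketch below uses a hypothetical four-row frame named `small` with made-up school names.

```python
import pandas as pd

# hypothetical miniature of the college data
small = pd.DataFrame({
    "INSTNM": ["A", "B", "C", "D"],
    "STABBR": ["OH", "AZ", "OH", "AZ"],
    "UGDS": [1944.0, 7218.0, 305.0, 25.0],
})

# sort by state ascending, then by undergraduate population descending
result = small.sort_values(["STABBR", "UGDS"], ascending=[True, False])
print(result)
```

Within each state, the largest school comes first.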


### Give index meaning
As was done previously, a **primary key** will be put into the index. First check whether **`INSTNM`** is unique.

In [99]:
# use a boolean expression to check if all INSTNM are unique
college['INSTNM'].nunique() == len(college)

True
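
As an alternative to comparing counts, a Series has an **`is_unique`** attribute, and **`duplicated`** shows which values repeat. A minimal sketch on a made-up Series of names:

```python
import pandas as pd

# hypothetical names with one repeated value
names = pd.Series(["Auburn University", "Amridge University", "Auburn University"])

# False, because one name appears twice
print(names.is_unique)

# keep=False marks every occurrence of a repeated value
repeated = names[names.duplicated(keep=False)]
print(repeated)
```

When the uniqueness check fails, the second step is a quick way to find the offending rows.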

In [100]:
# move INSTNM into the index
college = college.set_index('INSTNM')

In [101]:
college.head()

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5


In [102]:
# can also set_index on read
college = pd.read_csv('data/college.csv', index_col='INSTNM')

college.head()

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5
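
To confirm that **`index_col`** on read gives the same result as calling **`set_index`** afterwards, both routes can be compared on a tiny in-memory csv. The text in `csv_text` below is a made-up two-row stand-in for the real file.

```python
import io
import pandas as pd

# hypothetical two-row csv standing in for data/college.csv
csv_text = (
    "INSTNM,CITY,STABBR\n"
    "Amridge University,Montgomery,AL\n"
    "Auburn University,Auburn,AL\n"
)

# route 1: read first, then move the column into the index
a = pd.read_csv(io.StringIO(csv_text)).set_index("INSTNM")

# route 2: set the index while reading
b = pd.read_csv(io.StringIO(csv_text), index_col="INSTNM")

print(a.equals(b))
```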


### Renaming columns or the index
The **`rename`** method can rename both column labels and index labels. Pass it a dictionary with each old name as the **key** and the new name as the **value**.

In [103]:
# Change column UGDS to something more descriptive - UNDERGRAD_POP
college = college.rename(columns={'UGDS':'UNDERGRAD_POP'})
college.head()

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UNDERGRAD_POP,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5


In [104]:
# change 'University of Alabama at Birmingham' to 'UAB'
# notice the 'index' argument is used instead of 'columns'

college = college.rename(index={'University of Alabama at Birmingham':'UAB'})

college.head()

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UNDERGRAD_POP,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
UAB,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5
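
Besides a dictionary, **`rename`** also accepts a function, which is applied to every label. The sketch below combines both forms on a hypothetical two-row frame. The key `'Not A Real School'` is made up to show that, by default, dictionary keys with no matching label are skipped without an error.

```python
import pandas as pd

# hypothetical two-row frame, used only for illustration
small = pd.DataFrame(
    {"UGDS": [4206.0, 11383.0], "STABBR": ["AL", "AL"]},
    index=["Alabama A & M University", "University of Alabama at Birmingham"],
)

renamed = small.rename(
    # a function is applied to every column label
    columns=str.lower,
    # a dictionary renames only matching labels; the second key matches nothing
    index={"University of Alabama at Birmingham": "UAB",
           "Not A Real School": "NARS"},
)

print(renamed)
```

The original `small` keeps its labels because the result was assigned to a new variable.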


### In-Place operations
Many Series and DataFrame methods have an **`inplace`** argument that defaults to **`False`**. With the default value, the operation returns a new object (usually a Series or DataFrame, the same type the method was called on) and leaves the original untouched.

When **`inplace`** is switched to **`True`**, the method returns **`None`** and the changes happen in-place: the object calling the method is mutated. There is no need for an assignment (the equals sign) when **`inplace`** is True.

Let's first recall how the list method **`append`** modifies a list in-place.

In [105]:
# modify a list in place
a = [1, 2, 3]

# append an element. No equal sign needed
a.append(4)

In [106]:
# output the list
a

[1, 2, 3, 4]

All the previous operations executed on a Series or a DataFrame have left **`inplace`** at its default value **`False`**. Let's see how renaming a column in a DataFrame works when **`inplace`** is changed to True.

In [107]:
# let's reread the data into another variable
df = pd.read_csv('data/college.csv', index_col='INSTNM')

In [108]:
# old command is: college = college.rename(columns={'UGDS':'UNDERGRAD_POP'})

# new command has inplace=True and no assignment to new variable
df.rename(columns={'UGDS':'UNDERGRAD_POP'}, inplace=True)

In [109]:
# check output
df.head()

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UNDERGRAD_POP,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5


### Warning about using inplace
Switching operations to **`inplace`** is atypical and generally not the preferred approach. Reassigning the output to a new (or the same) variable is preferable since this is the default behavior and is likely less confusing.
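
One common source of confusion is mixing the two styles. The sketch below, on a hypothetical one-column frame `df_demo`, captures what an in-place call returns.

```python
import pandas as pd

# hypothetical one-column frame, used only for illustration
df_demo = pd.DataFrame({"UGDS": [4206.0, 11383.0]})

# with inplace=True the method mutates df_demo and returns None
returned = df_demo.rename(columns={"UGDS": "UNDERGRAD_POP"}, inplace=True)

print(returned)               # None
print(list(df_demo.columns))  # ['UNDERGRAD_POP']
```

Writing `df_demo = df_demo.rename(..., inplace=True)` would therefore replace the DataFrame with `None`, which is one more reason to prefer plain reassignment.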

### Simple operations on the entire DataFrame
Many simple math operations can be applied to all the columns of the DataFrame at once. They usually return a Series with the column names now in the index.

In [110]:
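# numeric_only=True restricts the result to numeric columns;
# recent pandas versions no longer skip text columns on their own
college.max(numeric_only=True)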

HBCU                  1.0000
MENONLY               1.0000
WOMENONLY             1.0000
RELAFFIL              1.0000
SATVRMID            765.0000
SATMTMID            785.0000
DISTANCEONLY          1.0000
UNDERGRAD_POP    151558.0000
UGDS_WHITE            1.0000
UGDS_BLACK            1.0000
UGDS_HISP             1.0000
UGDS_ASIAN            0.9727
UGDS_AIAN             1.0000
UGDS_NHPI             0.9983
UGDS_2MOR             0.5333
UGDS_NRA              0.9286
UGDS_UNKN             0.9027
PPTUG_EF              1.0000
CURROPER              1.0000
PCTPELL               1.0000
PCTFLOAN              1.0000
UG25ABV               1.0000
dtype: float64

To simplify the output, the next several sections use only the columns with race information. These are all the columns that begin with **UGDS_**.

### Selecting columns by substring
It would be quite annoying to write out all 9 UGDS columns by hand. There are a few ways to select these columns programmatically, the easiest being the **`filter`** method. Pass it the string you are searching for as the **`like`** argument. If you are familiar with regular expressions, you can do more complex searches with the **`regex`** argument.

In [111]:
# Easily grab just the UGDS_ columns
college_ugds = college.filter(like='UGDS_')

college_ugds.head()

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Alabama A & M University,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138
UAB,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01
Amridge University,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715
University of Alabama in Huntsville,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035
Alabama State University,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137
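
To illustrate the **`regex`** argument, here is a sketch on a hypothetical one-row frame whose column names are borrowed from the college data. The patterns are only examples, not something the dataset requires.

```python
import pandas as pd

# hypothetical one-row frame with a handful of the college column names
cols = ["CITY", "UGDS", "UGDS_WHITE", "UGDS_BLACK", "PCTPELL", "PCTFLOAN"]
small = pd.DataFrame([["Normal", 4206.0, 0.0333, 0.9353, 0.7356, 0.8284]],
                     columns=cols)

# substring match: 'UGDS' itself has no underscore, so it is left out
like_cols = small.filter(like="UGDS_").columns.tolist()

# regular expression: column names that start with PCT
regex_cols = small.filter(regex="^PCT").columns.tolist()

# regular expression with alternation: either prefix
either = small.filter(regex="^(UGDS_|PCT)").columns.tolist()

print(like_cols, regex_cols, either)
```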


The next few cells continue with more simple math operations on the slimmed-down frame.

In [112]:
# get the mean of each column
college_ugds.mean()

UGDS_WHITE    0.510207
UGDS_BLACK    0.189997
UGDS_HISP     0.161635
UGDS_ASIAN    0.033544
UGDS_AIAN     0.013813
UGDS_NHPI     0.004569
UGDS_2MOR     0.023950
UGDS_NRA      0.016086
UGDS_UNKN     0.045181
dtype: float64

In [113]:
# get standard deviation
college_ugds.std()

UGDS_WHITE    0.286958
UGDS_BLACK    0.224587
UGDS_HISP     0.221854
UGDS_ASIAN    0.073777
UGDS_AIAN     0.070196
UGDS_NHPI     0.033125
UGDS_2MOR     0.031288
UGDS_NRA      0.050172
UGDS_UNKN     0.093440
dtype: float64

In [114]:
# can also select a single Series first and then call the method on it
college_ugds['UGDS_ASIAN'].max()

0.97270000000000001

In [115]:
# Get the top 5 of a series
college_ugds['UGDS_ASIAN'].nlargest(5)

INSTNM
Cosmopolitan Beauty and Tech School            0.9727
United Beauty College                          0.9670
Diamond Beauty College                         0.9658
Asian American International Beauty College    0.9595
Rosemead Beauty School                         0.9524
Name: UGDS_ASIAN, dtype: float64

In [116]:
# can even use a method like cummax
college_ugds.cummax().head()

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Alabama A & M University,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138
UAB,0.5922,0.9353,0.0283,0.0518,0.0024,0.0019,0.0368,0.0179,0.0138
Amridge University,0.5922,0.9353,0.0283,0.0518,0.0024,0.0019,0.0368,0.0179,0.2715
University of Alabama in Huntsville,0.6988,0.9353,0.0382,0.0518,0.0143,0.0019,0.0368,0.0332,0.2715
Alabama State University,0.6988,0.9353,0.0382,0.0518,0.0143,0.0019,0.0368,0.0332,0.2715


In [117]:
# the count method only counts non-missing values
college_ugds.count()

UGDS_WHITE    6874
UGDS_BLACK    6874
UGDS_HISP     6874
UGDS_ASIAN    6874
UGDS_AIAN     6874
UGDS_NHPI     6874
UGDS_2MOR     6874
UGDS_NRA      6874
UGDS_UNKN     6874
dtype: int64

### Missing Values
Missing values are represented by the NumPy object **`np.nan`** and are ignored by many aggregation methods such as **`mean`**, **`median`** and **`sum`**. The **`count`** method does not include missing values either.
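
The skipping of missing values is controlled by the **`skipna`** argument of these aggregation methods, which defaults to True. A minimal sketch on a made-up three-element Series:

```python
import math

import numpy as np
import pandas as pd

# hypothetical Series with one missing value
s = pd.Series([1.0, np.nan, 3.0])

# default: the missing value is skipped, so the mean is (1 + 3) / 2
mean_default = s.mean()

# skipna=False lets the missing value propagate into the result
mean_strict = s.mean(skipna=False)

print(mean_default, mean_strict)
print(s.count(), len(s))
```

Comparing `count` with `len` is a quick way to see how many values an aggregation actually used.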

### How to count missing values
There are multiple ways of doing this. One straightforward way is to convert the entire DataFrame into boolean values with the **`isnull`** method and then sum up all the True values.

In [118]:
# change all values to True/False
college_ugds.isnull().head()

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Alabama A & M University,False,False,False,False,False,False,False,False,False
UAB,False,False,False,False,False,False,False,False,False
Amridge University,False,False,False,False,False,False,False,False,False
University of Alabama in Huntsville,False,False,False,False,False,False,False,False,False
Alabama State University,False,False,False,False,False,False,False,False,False


In [119]:
# There are no missing values in the top 5 rows but there are at the tail
college_ugds.isnull().tail()

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
SAE Institute of Technology San Francisco,True,True,True,True,True,True,True,True,True
Rasmussen College - Overland Park,True,True,True,True,True,True,True,True,True
National Personal Training Institute of Cleveland,True,True,True,True,True,True,True,True,True
Bay Area Medical Academy - San Jose Satellite Location,True,True,True,True,True,True,True,True,True
Excel Learning Center-San Antonio South,True,True,True,True,True,True,True,True,True


In [120]:
# now just sum up every column
college_ugds.isnull().sum()

UGDS_WHITE    661
UGDS_BLACK    661
UGDS_HISP     661
UGDS_ASIAN    661
UGDS_AIAN     661
UGDS_NHPI     661
UGDS_2MOR     661
UGDS_NRA      661
UGDS_UNKN     661
dtype: int64

In [121]:
# add missing values to non-missing to get total number of rows in dataset
college_ugds.isnull().sum() + college_ugds.count()

UGDS_WHITE    7535
UGDS_BLACK    7535
UGDS_HISP     7535
UGDS_ASIAN    7535
UGDS_AIAN     7535
UGDS_NHPI     7535
UGDS_2MOR     7535
UGDS_NRA      7535
UGDS_UNKN     7535
dtype: int64
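
Because True counts as 1 and False as 0, the same boolean frame supports a few other summaries. The sketch below uses a hypothetical four-row frame rather than the college data.

```python
import numpy as np
import pandas as pd

# hypothetical frame with a few missing values
small = pd.DataFrame({
    "UGDS_WHITE": [0.5, np.nan, 0.25, np.nan],
    "UGDS_BLACK": [0.5, np.nan, 0.75, 0.1],
})

# number of missing values per column
missing_per_column = small.isnull().sum()

# the mean of booleans is the fraction of missing values per column
fraction_missing = small.isnull().mean()

# summing twice gives the total for the whole frame
total_missing = small.isnull().sum().sum()

# number of rows with at least one missing value
rows_with_missing = small.isnull().any(axis="columns").sum()

print(missing_per_column, fraction_missing, total_missing, rows_with_missing)
```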

### Operations across the rows
All the operations thus far have been calculated down a column. It is possible to perform most of these operations across the rows as well. A very important argument, **`axis`**, controls the direction in which the operation happens.

The **`axis`** argument can be set to 0 or 1. When **`axis`** is 0, the operation happens down the columns, and when **`axis`** is 1, the operation goes across the rows.

If you think 0 and 1 are cryptic, you can use the strings **index** or **columns**, respectively, in their place. Pandas defaults these operations to work down the columns.

In [122]:
# same behavior as default
college_ugds.sum(axis='index')

UGDS_WHITE    3507.1643
UGDS_BLACK    1306.0369
UGDS_HISP     1111.0782
UGDS_ASIAN     230.5831
UGDS_AIAN       94.9476
UGDS_NHPI       31.4066
UGDS_2MOR      164.6344
UGDS_NRA       110.5739
UGDS_UNKN      310.5772
dtype: float64

In [123]:
# same behavior as default. Can use 0 instead of 'index'
college_ugds.sum(axis=0)

UGDS_WHITE    3507.1643
UGDS_BLACK    1306.0369
UGDS_HISP     1111.0782
UGDS_ASIAN     230.5831
UGDS_AIAN       94.9476
UGDS_NHPI       31.4066
UGDS_2MOR      164.6344
UGDS_NRA       110.5739
UGDS_UNKN      310.5772
dtype: float64

In [124]:
# new behavior! Add across rows
# add the race percentages up. Should add up close to one

college_ugds.sum(axis='columns').head(15)

INSTNM
Alabama A & M University                  1.0000
UAB                                       0.9999
Amridge University                        1.0000
University of Alabama in Huntsville       1.0000
Alabama State University                  1.0000
The University of Alabama                 1.0000
Central Alabama Community College         1.0000
Athens State University                   0.9999
Auburn University at Montgomery           0.9999
Auburn University                         1.0000
Birmingham Southern College               1.0001
Chattahoochee Valley Community College    1.0000
Concordia College Alabama                 1.0001
South University-Montgomery               1.0000
Enterprise State Community College        0.9999
dtype: float64
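
A quick way to keep the two directions straight is to look at the index of the result: summing down the columns yields one value per column, while summing across the rows yields one value per row. A minimal sketch with a hypothetical two-school frame:

```python
import pandas as pd

# hypothetical two-row frame; the values were chosen to add up exactly
small = pd.DataFrame(
    {"UGDS_WHITE": [0.5, 0.25], "UGDS_BLACK": [0.5, 0.75]},
    index=["School A", "School B"],
)

# axis='index' (or 0): collapse the rows, one result per column
down = small.sum(axis="index")

# axis='columns' (or 1): collapse the columns, one result per row
across = small.sum(axis="columns")

print(down)
print(across)
```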

### Impress your friends with DataFrame styling
[Styling the appearance of the DataFrame](http://pandas.pydata.org/pandas-docs/stable/style.html) output is a new feature and can really bring attention to particular parts of the DataFrame.

One clear use-case is to highlight the max/min values of each column. To do this, we first need to grab only the rows that contain the max value of each column. The method **`idxmax`** will return the index label for the maximum value of each column.

In [125]:
# get index label for max value of every column
idx_max = college_ugds.idxmax()

idx_max

UGDS_WHITE               Mr Leon's School of Hair Design-Moscow
UGDS_BLACK                   Velvatex College of Beauty Culture
UGDS_HISP               Thunderbird School of Global Management
UGDS_ASIAN                  Cosmopolitan Beauty and Tech School
UGDS_AIAN                     Haskell Indian Nations University
UGDS_NHPI                               Palau Community College
UGDS_2MOR                                         LIU Brentwood
UGDS_NRA       California University of Management and Sciences
UGDS_UNKN     Le Cordon Bleu College of Culinary Arts-San Fr...
dtype: object
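
Like the aggregation methods, **`idxmax`** accepts the **`axis`** argument. The sketch below, on a hypothetical two-school frame, shows both directions: the row label of each column's maximum, and the column label of each row's maximum.

```python
import pandas as pd

# hypothetical two-row frame, used only for illustration
small = pd.DataFrame(
    {"UGDS_WHITE": [0.5, 0.1], "UGDS_BLACK": [0.25, 0.75]},
    index=["School A", "School B"],
)

# default: for each column, the index label of its largest value
by_column = small.idxmax()

# axis='columns': for each row, the column name of its largest value
by_row = small.idxmax(axis="columns")

print(by_column)
print(by_row)
```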

In [126]:
# use .loc to select only these rows
college_ugds.loc[idx_max]

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Mr Leon's School of Hair Design-Moscow,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Velvatex College of Beauty Culture,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Thunderbird School of Global Management,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Cosmopolitan Beauty and Tech School,0.0091,0.0,0.0182,0.9727,0.0,0.0,0.0,0.0,0.0
Haskell Indian Nations University,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
Palau Community College,0.0,0.0017,0.0,0.0,0.0,0.9983,0.0,0.0,0.0
LIU Brentwood,0.0,0.1333,0.2667,0.0,0.0,0.0,0.5333,0.0,0.0667
California University of Management and Sciences,0.0102,0.0204,0.0,0.0408,0.0,0.0,0.0,0.9286,0.0
Le Cordon Bleu College of Culinary Arts-San Francisco,0.0317,0.009,0.0113,0.0271,0.0045,0.0023,0.0113,0.0,0.9027


The above DataFrame has 9 rows, one for each column. Each of these rows contains the maximum value for one of the columns. It still takes some effort to spot where the maximum value is for each column, especially for a person not familiar with the analysis.

The **`style.highlight_max()`** method will make it very easy to see where the max value for each column occurs.

In [127]:
# impress friends time
college_ugds.loc[idx_max].style.highlight_max()

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Mr Leon's School of Hair Design-Moscow,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Velvatex College of Beauty Culture,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Thunderbird School of Global Management,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Cosmopolitan Beauty and Tech School,0.0091,0.0,0.0182,0.9727,0.0,0.0,0.0,0.0,0.0
Haskell Indian Nations University,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
Palau Community College,0.0,0.0017,0.0,0.0,0.0,0.9983,0.0,0.0,0.0
LIU Brentwood,0.0,0.1333,0.2667,0.0,0.0,0.0,0.5333,0.0,0.0667
California University of Management and Sciences,0.0102,0.0204,0.0,0.0408,0.0,0.0,0.0,0.9286,0.0
Le Cordon Bleu College of Culinary Arts-San Francisco,0.0317,0.009,0.0113,0.0271,0.0045,0.0023,0.0113,0.0,0.9027


### Changing Data Types
The **`college`** DataFrame has the column **`MD_EARN_WNE_P10`**, which, according to the data dictionary, is the median earnings 10 years after enrollment. When **`.info`** was executed near the top of this notebook, the data type of this column came back as object.

But from its description and inspection it seems like it should be numeric.

In [128]:
# output data type of MD_EARN_WNE_P10
# use dtype for Series and dtypes for DataFrames

# 'O' means object
college['MD_EARN_WNE_P10'].dtype

dtype('O')

In [129]:
# let's inspect some of the values
college['MD_EARN_WNE_P10'].head(10)

INSTNM
Alabama A & M University               30300
UAB                                    39700
Amridge University                     40100
University of Alabama in Huntsville    45500
Alabama State University               26600
The University of Alabama              41900
Central Alabama Community College      27500
Athens State University                39000
Auburn University at Montgomery        35000
Auburn University                      45700
Name: MD_EARN_WNE_P10, dtype: object

### Sorting to see strings
Column **`MD_EARN_WNE_P10`** certainly appears to be numeric. Since digits come before letters lexicographically, sorting the column in descending order puts any non-numeric strings at the top.

In [130]:
# so this is why this column was an object and not numeric
college['MD_EARN_WNE_P10'].sort_values(ascending=False).head()

INSTNM
Sharon Regional Health System School of Nursing    PrivacySuppressed
Northcoast Medical Training Academy                PrivacySuppressed
Success Schools                                    PrivacySuppressed
Louisiana Culinary Institute                       PrivacySuppressed
Bais Medrash Toras Chesed                          PrivacySuppressed
Name: MD_EARN_WNE_P10, dtype: object
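Sorting reveals the stray strings, but it does not say how many there are or whether more than one kind exists. A minimal sketch of a more direct check, using a small made-up Series (the name `earnings` and its values are illustrative, not taken from the dataset): coerce to numeric, then tally the original values that failed to convert.

```python
import pandas as pd

# hypothetical stand-in for an object column that mixes numbers and text
earnings = pd.Series(['30300', '39700', 'PrivacySuppressed',
                      '45500', 'PrivacySuppressed'])

# values that cannot be parsed become NaN
converted = pd.to_numeric(earnings, errors='coerce')

# keep only the original entries that failed to convert
bad_values = earnings[converted.isna() & earnings.notna()]

# tally each offending string
bad_counts = bad_values.value_counts()
print(bad_counts)
```

The `earnings.notna()` term keeps values that were already missing out of the tally.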

### Converting to numeric
In the previous notebook, the method **`astype`** was used to convert a string to a datetime. **`astype`** is strict: it raises an error as soon as a single value cannot be converted. Pandas provides helper functions that give more flexibility when converting types.

The **`to_numeric`**, **`to_datetime`** and **`to_timedelta`** functions are more robust and give more control over situations where **`astype`** would error out.

The **`to_numeric`** argument **`errors`** can be set to 'coerce' to yield a missing value every time it encounters a value it cannot convert such as the string 'PrivacySuppressed'.

In [131]:
# convert column to numeric. make strings missing values
college['MD_EARN_WNE_P10'] = pd.to_numeric(college['MD_EARN_WNE_P10'], errors='coerce')

# use describe to check type and see summary stats
college['MD_EARN_WNE_P10'].describe()

count      5591.000000
mean      32918.315149
std       14621.845375
min        9500.000000
25%       23900.000000
50%       30700.000000
75%       38800.000000
max      233100.000000
Name: MD_EARN_WNE_P10, dtype: float64
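To see the contrast between **`astype`** and **`to_numeric`** in isolation, here is a small sketch on a made-up Series (the name `s` and its values are assumptions for illustration only).

```python
import pandas as pd

s = pd.Series(['100', '250', 'PrivacySuppressed'])

# astype raises because one value cannot be parsed as a number
try:
    s.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns unparseable values into NaN
coerced = pd.to_numeric(s, errors='coerce')

print(astype_failed)
print(coerced)
```

The coerced result is a float column because a missing value is stored as `NaN`, which is a float.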

### Filling Missing Values
Pandas gives great flexibility for replacing missing values with some other value. The **`fillna`** method works for both DataFrames and Series. To clearly show the versatility of **`fillna`** a small DataFrame will be created and operated upon.

In [132]:
df_test = pd.DataFrame({'City':['Houston', np.nan, np.nan, 'New Orleans', 'Los Angeles'],
                       'State':['Texas', 'Maine', np.nan, 'Louisiana', 'California'],
                       'Pop':[100, 50, 30, np.nan, 80]},
                      columns=['City', 'State', 'Pop'])

df_test

Unnamed: 0,City,State,Pop
0,Houston,Texas,100.0
1,,Maine,50.0
2,,,30.0
3,New Orleans,Louisiana,
4,Los Angeles,California,80.0


In [133]:
# count the missing values
df_test.isnull().sum()

City     2
State    1
Pop      1
dtype: int64

In [134]:
# fillna with a constant
# nonsensical with both string and numeric columns but still works
df_test.fillna(0)

Unnamed: 0,City,State,Pop
0,Houston,Texas,100.0
1,0,Maine,50.0
2,0,0,30.0
3,New Orleans,Louisiana,0.0
4,Los Angeles,California,80.0


In [135]:
# fill with constant string
# notice how data type of Pop is now object

df_test.fillna('Nevada')

Unnamed: 0,City,State,Pop
0,Houston,Texas,100
1,Nevada,Maine,50
2,Nevada,Nevada,30
3,New Orleans,Louisiana,Nevada
4,Los Angeles,California,80


In [136]:
# use the `method` argument to fill using previous/next values

# backfill takes the next non-missing value below in each column
# (newer pandas versions deprecate `method`; df_test.bfill() is the equivalent)
df_test.fillna(method='backfill')

Unnamed: 0,City,State,Pop
0,Houston,Texas,100.0
1,New Orleans,Maine,50.0
2,New Orleans,Louisiana,30.0
3,New Orleans,Louisiana,80.0
4,Los Angeles,California,80.0


In [137]:
# go the other way
df_test.fillna(method='ffill')

Unnamed: 0,City,State,Pop
0,Houston,Texas,100.0
1,Houston,Maine,50.0
2,Houston,Maine,30.0
3,New Orleans,Louisiana,30.0
4,Los Angeles,California,80.0
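Recent pandas releases deprecate the `method` argument of **`fillna`** in favour of the dedicated **`ffill`** and **`bfill`** methods. A minimal sketch on a made-up Series (the name `s` is illustrative), including the `limit` argument that caps how many consecutive gaps get filled:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()          # carry the last valid value forward
backward = s.bfill()         # pull the next valid value backward
limited = s.ffill(limit=1)   # fill at most one consecutive gap

print(pd.DataFrame({'original': s, 'ffill': forward,
                    'bfill': backward, 'ffill_limit_1': limited}))
```

Note that the last value stays missing under `bfill` because nothing valid follows it.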


In [138]:
# A common approach to missing values is to replace them with the column mean
# This only works for numeric columns, so restrict the mean to those

df_test.fillna(df_test.mean(numeric_only=True))

Unnamed: 0,City,State,Pop
0,Houston,Texas,100.0
1,,Maine,50.0
2,,,30.0
3,New Orleans,Louisiana,65.0
4,Los Angeles,California,80.0


In [139]:
# can also use for Series
# and limit how many missing values get filled

df_test['City'].fillna('Bangor', limit=1)

0        Houston
1         Bangor
2            NaN
3    New Orleans
4    Los Angeles
Name: City, dtype: object
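**`fillna`** also accepts a dictionary, which allows a different fill value for each column in one call. A small sketch, assuming a made-up DataFrame `df` similar in shape to `df_test`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'City': ['Houston', np.nan, 'Los Angeles'],
                   'Pop': [100.0, 50.0, np.nan]})

# a dict maps each column name to its own fill value
filled = df.fillna({'City': 'Unknown', 'Pop': df['Pop'].mean()})
print(filled)
```

This avoids the mixed-type result seen earlier when a single constant was applied to both string and numeric columns.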

# End of Section Summary
* Be aware of the many **`pd.read_`** functions and their arguments
* Change display options in the notebook with **`pd.options.display.<option_name>`**
* Sort values of a DataFrame
* Rename column and index names
* Know what methods do when **`inplace=True`**
* Use the **`filter`** method to find specific columns/indices
* Use a variety of methods on the entire DataFrame like **`count, mean, std, ...`**
* Know how to perform an operation down the rows or columns with the **`axis`** argument
* Know how to style your DataFrame with **`df.style.<style_name>`**
* Force a change of data type with **`pd.to_numeric`**
* Count missing values with **`df.isnull().sum()`**
* Fill in missing values in a variety of ways with **`fillna`**

# Problems
Run the code cell below if you just opened this notebook without executing any statements.

In [140]:
college = pd.read_csv('data/college.csv', index_col='INSTNM')
college_ugds = college.filter(like='UGDS_')

### Problem 1
<span  style="color:green; font-size:16px">Re-read the college.csv file into the variable **`college2`**. Use the documentation of the **`read_csv`** function to assign the index column INSTNM on read, skip the first 20 rows but keep the header.</span>

In [141]:
college2 = pd.read_csv('data/college.csv', index_col='INSTNM', header=0, skiprows=range(1, 21))
college2.head()

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
George C Wallace State Community College-Hanceville,Hanceville,AL,0.0,0.0,0.0,0,,,0.0,4920.0,0.863,0.0612,0.0362,0.0065,0.0089,0.0,0.0,0.0059,0.0183,0.4203,1,0.5026,0.4192,0.3229,28800,11186
George C Wallace State Community College-Selma,Selma,AL,0.0,0.0,0.0,0,,,0.0,1513.0,0.1956,0.7449,0.0026,0.004,0.0013,0.0,0.0033,0.004,0.0443,0.384,1,0.7645,0.0,0.3318,24200,PrivacySuppressed
Herzing University-Birmingham,Birmingham,AL,0.0,0.0,0.0,0,,,0.0,302.0,0.3543,0.5265,0.0166,0.0066,0.0,0.0,0.0563,0.0,0.0397,0.5497,1,0.6541,0.7736,0.7813,42300,23216.5
Huntingdon College,Montgomery,AL,0.0,0.0,0.0,1,510.0,490.0,0.0,1149.0,0.6388,0.1993,0.0252,0.0078,0.0122,0.0017,0.0261,0.0061,0.0827,0.2097,1,0.3982,0.7153,0.1937,36500,26230
Heritage Christian University,Florence,AL,0.0,0.0,0.0,1,,,0.0,62.0,0.7419,0.1129,0.0484,0.0,0.0323,0.0161,0.0,0.0161,0.0323,0.4355,1,0.6087,0.4493,0.5942,PrivacySuppressed,PrivacySuppressed
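The `skiprows` argument behaves differently for an integer and for a list-like of line numbers, which is why the solution uses a `range` that starts at 1. A minimal sketch with a made-up in-memory file (the names `csv_text`, `no_header` and `kept_header` are illustrative):

```python
import io
import pandas as pd

# hypothetical five-row file held in memory
csv_text = "name,value\na,1\nb,2\nc,3\nd,4\ne,5\n"

# an integer skips lines from the very top, so the header line is lost too
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# a range starting at 1 skips file lines 1 and 2 and leaves line 0 (the header)
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3))

print(list(no_header.columns))
print(list(kept_header.columns))
```

In the first call the row `b,2` is promoted to the header, which is rarely what is wanted.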


### Problem 2
<span  style="color:green; font-size:16px">Run command **`college.describe().T`** and take a close look at the **`min`** and **`max`** columns. Many columns range from 0 to 1. What kind of data do you think they represent?</span>

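Two kinds of data: binary (boolean) indicators such as **`HBCU`** and **`MENONLY`**, whose quartiles are all 0, and proportions such as **`UGDS_WHITE`**, whose values are spread between 0 and 1.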

In [142]:
college.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
HBCU,7164.0,0.014238,0.118478,0.0,0.0,0.0,0.0,1.0
MENONLY,7164.0,0.009213,0.095546,0.0,0.0,0.0,0.0,1.0
WOMENONLY,7164.0,0.005304,0.072642,0.0,0.0,0.0,0.0,1.0
RELAFFIL,7535.0,0.190975,0.393096,0.0,0.0,0.0,0.0,1.0
SATVRMID,1185.0,522.819409,68.578862,290.0,475.0,510.0,555.0,765.0
SATMTMID,1196.0,530.76505,73.469767,310.0,482.0,520.0,565.0,785.0
DISTANCEONLY,7164.0,0.005583,0.074519,0.0,0.0,0.0,0.0,1.0
UGDS,6874.0,2356.83794,5474.275871,0.0,117.0,412.5,1929.5,151558.0
UGDS_WHITE,6874.0,0.510207,0.286958,0.0,0.2675,0.5557,0.747875,1.0
UGDS_BLACK,6874.0,0.189997,0.224587,0.0,0.036125,0.10005,0.2577,1.0


### Problem 3
<span  style="color:green; font-size:16px">Sort first by **`STABBR`** ascending and then by **`CITY`** descending. Read the docs on **`sort_values`** to learn how to sort by two columns at the same time.</span>

In [143]:
college.sort_values(['STABBR', 'CITY'], ascending=[True, False])

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alaska Christian College,Soldotna,AK,0.0,0.0,0.0,1,,,0.0,68.0,0.0588,0.0000,0.0147,0.0000,0.7794,0.0000,0.0147,0.0000,0.1324,0.0735,1,0.8868,0.6792,0.2264,,PrivacySuppressed
AVTEC-Alaska's Institute of Technology,Seward,AK,0.0,0.0,0.0,0,,,0.0,889.0,0.5388,0.0112,0.0427,0.0157,0.1879,0.0112,0.0529,0.0000,0.1395,0.6817,1,0.0737,0.0664,0.7127,33500,PrivacySuppressed
Alaska Bible College,Palmer,AK,0.0,0.0,0.0,1,,,0.0,27.0,0.8519,0.0000,0.0370,0.0000,0.0741,0.0000,0.0370,0.0000,0.0000,0.1481,1,0.3571,0.2857,0.4286,,PrivacySuppressed
University of Alaska Southeast,Juneau,AK,0.0,0.0,0.0,0,,,0.0,1428.0,0.4748,0.0119,0.0623,0.0357,0.1029,0.0147,0.0686,0.0049,0.2241,0.5112,1,0.1769,0.1996,0.5550,37400,16875
University of Alaska Fairbanks,Fairbanks,AK,0.0,0.0,0.0,0,,,0.0,5536.0,0.4259,0.0210,0.0522,0.0126,0.1284,0.0027,0.0401,0.0110,0.3060,0.3887,1,0.2263,0.2550,0.4519,36200,19355
Ilisagvik College,Barrow,AK,0.0,0.0,0.0,0,,,0.0,109.0,0.1376,0.0183,0.0092,0.0826,0.6881,0.0459,0.0000,0.0183,0.0000,0.6239,1,0.1323,0.0000,0.6498,24900,PrivacySuppressed
University of Alaska Anchorage,Anchorage,AK,0.0,0.0,0.0,0,,,0.0,12865.0,0.5747,0.0358,0.0761,0.0778,0.0653,0.0086,0.0980,0.0181,0.0457,0.4539,1,0.2385,0.2647,0.4386,42500,19449.5
Alaska Pacific University,Anchorage,AK,0.0,0.0,0.0,1,555.0,503.0,0.0,275.0,0.5309,0.0291,0.0364,0.0255,0.1855,0.0109,0.0945,0.0000,0.0873,0.3745,1,0.3152,0.5297,0.4910,47000,23250
Charter College-Anchorage,Anchorage,AK,0.0,0.0,0.0,0,,,0.0,3256.0,0.4373,0.0599,0.3093,0.0123,0.0405,0.0577,0.0436,0.0000,0.0393,0.0000,1,0.8307,0.7503,0.5472,39200,13875
Alaska Career College,Anchorage,AK,0.0,0.0,0.0,0,,,0.0,479.0,0.3800,0.0960,0.1002,0.1983,0.1733,0.0084,0.0334,0.0000,0.0104,0.0000,1,0.7078,0.7860,0.5612,28700,8994
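The list passed to `ascending` is matched position by position with the list of sort columns. A small sketch on a made-up DataFrame (column names `state` and `city` are illustrative stand-ins for `STABBR` and `CITY`):

```python
import pandas as pd

df = pd.DataFrame({'state': ['TX', 'AK', 'TX', 'AK'],
                   'city': ['Austin', 'Juneau', 'Waco', 'Anchorage']})

# state A-Z, then city Z-A within each state
result = df.sort_values(['state', 'city'], ascending=[True, False])
print(result)
```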


### Problem 4
<span  style="color:green; font-size:16px">Rename column **`HBCU`** to **`HISTORICALLY_BLACK`**, **`STABBR`** to **`STATE_ABBR`** and index **`Alabama State University`** to **`ASU`** all in one line of code. </span>

In [144]:
college.rename(columns={'HBCU':'HISTORICALLY_BLACK', 'STABBR':'STATE_ABBR'}, index={'Alabama State University':'ASU'})

INSTNM,CITY,STATE_ABBR,HISTORICALLY_BLACK,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0000,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.2600,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.0100,0.2607,1,0.3460,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.2990,0.4192,0.0069,0.0034,0.0000,0.0000,0.0000,0.0000,0.2715,0.4536,1,0.6801,0.7795,0.8540,40100,23370
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.0350,0.2146,1,0.3072,0.4596,0.2640,45500,24097
ASU,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.0010,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.1270,26600,33118.5
The University of Alabama,Tuscaloosa,AL,0.0,0.0,0.0,0,555.0,565.0,0.0,29851.0,0.7825,0.1119,0.0348,0.0106,0.0038,0.0009,0.0261,0.0268,0.0026,0.0844,1,0.2040,0.4010,0.0853,41900,23750
Central Alabama Community College,Alexander City,AL,0.0,0.0,0.0,0,,,0.0,1592.0,0.7255,0.2613,0.0044,0.0025,0.0044,0.0000,0.0000,0.0000,0.0019,0.3882,1,0.5892,0.3977,0.3153,27500,16127
Athens State University,Athens,AL,0.0,0.0,0.0,0,,,0.0,2991.0,0.7823,0.1200,0.0191,0.0053,0.0157,0.0010,0.0174,0.0057,0.0334,0.5517,1,0.4088,0.6296,0.6410,39000,18595
Auburn University at Montgomery,Montgomery,AL,0.0,0.0,0.0,0,486.0,509.0,0.0,4304.0,0.5328,0.3376,0.0074,0.0221,0.0044,0.0016,0.0297,0.0397,0.0246,0.2853,1,0.4192,0.5803,0.2930,35000,21335
Auburn University,Auburn,AL,0.0,0.0,0.0,0,575.0,588.0,0.0,20514.0,0.8507,0.0704,0.0248,0.0227,0.0074,0.0000,0.0000,0.0100,0.0140,0.0862,1,0.1610,0.3494,0.0415,45700,21831
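Keep in mind that **`rename`** returns a new DataFrame by default and leaves the original untouched. A minimal sketch on a made-up two-row DataFrame (the name `df` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'HBCU': [1.0, 0.0]},
                  index=['Alabama State University', 'Auburn University'])

# columns and index labels are renamed in a single call
renamed = df.rename(columns={'HBCU': 'HISTORICALLY_BLACK'},
                    index={'Alabama State University': 'ASU'})

print(list(df.columns), list(df.index))            # original labels
print(list(renamed.columns), list(renamed.index))  # new labels
```

Labels that do not appear in the mapping, such as `Auburn University` here, are left unchanged.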


### Problem 5
<span  style="color:green; font-size:16px">Sort the index in-place. Output the head of the DataFrame.</span>

In [145]:
college.sort_index(inplace=True)
college.head(10)

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
A & W Healthcare Educators,New Orleans,LA,0.0,0.0,0.0,0,,,0.0,40.0,0.0000,0.9750,0.0250,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.1250,1,0.7018,0.8596,0.6667,,19022.5
A T Still University of Health Sciences,Kirksville,MO,0.0,0.0,0.0,0,,,0.0,,,,,,,,,,,,1,,,,219800,PrivacySuppressed
ABC Beauty Academy,Garland,TX,0.0,0.0,0.0,0,,,0.0,30.0,0.0000,0.0333,0.0333,0.9333,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0,0.7857,0.0000,0.8286,,PrivacySuppressed
ABC Beauty College Inc,Arkadelphia,AR,0.0,0.0,0.0,0,,,0.0,38.0,0.2895,0.6579,0.0526,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.2105,1,0.9815,1.0000,0.4688,PrivacySuppressed,16500
AI Miami International University of Art and Design,Miami,FL,0.0,0.0,0.0,0,,,0.0,2778.0,0.0324,0.0198,0.4773,0.0018,0.0000,0.0000,0.0018,0.0025,0.4644,0.2185,1,0.5507,0.6966,0.3262,29900,31000
AIB College of Business,Des Moines,IA,0.0,0.0,0.0,0,,,0.0,1012.0,0.4516,0.0375,0.0484,0.0079,0.0030,0.0049,0.0128,0.0198,0.4140,0.2490,1,0.4132,0.7125,0.3209,37000,19732.5
AOMA Graduate School of Integrative Medicine,Austin,TX,0.0,0.0,0.0,0,,,0.0,,,,,,,,,,,,1,,,,PrivacySuppressed,PrivacySuppressed
ASA College,Brooklyn,NY,0.0,0.0,0.0,0,,,0.0,4551.0,0.0420,0.3228,0.3894,0.0751,0.0009,0.0013,0.0132,0.1525,0.0029,0.0650,1,0.7992,0.6527,0.4816,24100,13747
ASI Career Institute,Turnersville,NJ,0.0,0.0,0.0,0,,,0.0,109.0,0.6147,0.2661,0.1193,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,1,0.2571,0.2571,0.1818,,PrivacySuppressed
ASM Beauty World Academy,Davie,FL,0.0,0.0,0.0,0,,,0.0,328.0,0.0549,0.0793,0.7561,0.0579,0.0183,0.0000,0.0335,0.0000,0.0000,0.9573,1,0.6049,0.6170,0.7344,PrivacySuppressed,3640


### Problem 6
<span  style="color:green; font-size:16px">Use the **`max`** method across the rows for DataFrame **`college_ugds`**. Take the results and apply the pandas **`cut`** function to create a Series with 3 category labels on how 'diverse' the school is.</span>

In [151]:
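college_ugds = college.filter(like='UGDS_')
# largest single-group share for each school
ugds_max = college_ugds.max(axis=1)
# alternative: three equal-width bins
# pd.cut(ugds_max, 3, labels=['Good', 'Medium', 'Bad']).head(10)
# explicit bin edges: a smaller largest share is treated as more diverse
pd.cut(ugds_max, bins=[0, .4, .7, 1], labels=['Good', 'Medium', 'Bad']).head(15)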

INSTNM
A & W Healthcare Educators                                Bad
A T Still University of Health Sciences                   NaN
ABC Beauty Academy                                        Bad
ABC Beauty College Inc                                 Medium
AI Miami International University of Art and Design    Medium
AIB College of Business                                Medium
AOMA Graduate School of Integrative Medicine              NaN
ASA College                                              Good
ASI Career Institute                                   Medium
ASM Beauty World Academy                                  Bad
ATA Career Education                                   Medium
ATA College                                              Good
ATEP at IVC                                               NaN
ATI College-Norwalk                                    Medium
ATS Institute of Technology                               Bad
dtype: category
Categories (3, object): [Good < Medium < Bad]
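The two calls in the solution bin the data differently: an integer asks for equal-width bins over the observed range, while a list gives the bin edges explicitly. A small sketch with made-up shares (the name `shares` and its values are assumptions for illustration):

```python
import pandas as pd

# hypothetical largest-group shares for four schools
shares = pd.Series([0.35, 0.55, 0.95, 1.0])
labels = ['Good', 'Medium', 'Bad']

# explicit edges: (0, 0.4], (0.4, 0.7], (0.7, 1]
by_edges = pd.cut(shares, bins=[0, .4, .7, 1], labels=labels)

# integer: three equal-width bins spanning the observed min and max
by_width = pd.cut(shares, bins=3, labels=labels)

print(pd.DataFrame({'share': shares, 'by_edges': by_edges, 'by_width': by_width}))
```

With explicit edges the labels keep the same meaning across datasets, whereas equal-width bins shift with the minimum and maximum of the data.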

### Problem 7
<span  style="color:green; font-size:16px">Use the **`select_dtypes`** method on the **`college`** DataFrame to select only the numeric columns. Google is your friend here.</span>

In [147]:
college.select_dtypes(include=[np.number])

INSTNM,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV
A & W Healthcare Educators,0.0,0.0,0.0,0,,,0.0,40.0,0.0000,0.9750,0.0250,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.1250,1,0.7018,0.8596,0.6667
A T Still University of Health Sciences,0.0,0.0,0.0,0,,,0.0,,,,,,,,,,,,1,,,
ABC Beauty Academy,0.0,0.0,0.0,0,,,0.0,30.0,0.0000,0.0333,0.0333,0.9333,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0,0.7857,0.0000,0.8286
ABC Beauty College Inc,0.0,0.0,0.0,0,,,0.0,38.0,0.2895,0.6579,0.0526,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.2105,1,0.9815,1.0000,0.4688
AI Miami International University of Art and Design,0.0,0.0,0.0,0,,,0.0,2778.0,0.0324,0.0198,0.4773,0.0018,0.0000,0.0000,0.0018,0.0025,0.4644,0.2185,1,0.5507,0.6966,0.3262
AIB College of Business,0.0,0.0,0.0,0,,,0.0,1012.0,0.4516,0.0375,0.0484,0.0079,0.0030,0.0049,0.0128,0.0198,0.4140,0.2490,1,0.4132,0.7125,0.3209
AOMA Graduate School of Integrative Medicine,0.0,0.0,0.0,0,,,0.0,,,,,,,,,,,,1,,,
ASA College,0.0,0.0,0.0,0,,,0.0,4551.0,0.0420,0.3228,0.3894,0.0751,0.0009,0.0013,0.0132,0.1525,0.0029,0.0650,1,0.7992,0.6527,0.4816
ASI Career Institute,0.0,0.0,0.0,0,,,0.0,109.0,0.6147,0.2661,0.1193,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,1,0.2571,0.2571,0.1818
ASM Beauty World Academy,0.0,0.0,0.0,0,,,0.0,328.0,0.0549,0.0793,0.7561,0.0579,0.0183,0.0000,0.0335,0.0000,0.0000,0.9573,1,0.6049,0.6170,0.7344
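**`select_dtypes`** also accepts the string `'number'` in place of `np.number`, and an `exclude` argument for the complementary selection. A minimal sketch on a made-up DataFrame (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'city': ['Normal', 'Waco'],
                   'size': [4206, 13801],
                   'share': [0.03, 0.64]})

numeric = df.select_dtypes(include='number')      # integer and float columns
non_numeric = df.select_dtypes(exclude='number')  # everything else

print(list(numeric.columns))
print(list(non_numeric.columns))
```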


### Problem 8
<span  style="color:green; font-size:16px">Use **`filter`** to slim your DataFrame down to the **SAT** columns. Then look up how to use the **`dropna`** method and return a DataFrame that has no missing values. Use the style **`bar`** on the top 10 rows of this DataFrame.</span>

In [153]:
college_sat = college.filter(like='SAT', axis=1).dropna()
college_sat.head(10).style.bar()
# college_sat.style.bar()

INSTNM,SATVRMID,SATMTMID
Abilene Christian University,530,545
Abraham Baldwin Agricultural College,465,460
Adams State University,475,509
Adelphi University,550,565
Adrian College,500,490
Adventist University of Health Sciences,473,453
Alabama A & M University,424,420
Alabama State University,425,430
Alaska Pacific University,555,503
Albany College of Pharmacy and Health Sciences,555,610
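By default **`dropna`** removes a row if any of its values is missing. The `how` and `subset` arguments relax that rule. A small sketch on a made-up DataFrame (the name `sat` and its values are illustrative):

```python
import numpy as np
import pandas as pd

sat = pd.DataFrame({'SATVRMID': [424.0, np.nan, np.nan],
                    'SATMTMID': [420.0, 565.0, np.nan]},
                   index=['A', 'B', 'C'])

any_missing = sat.dropna()                   # drop rows with at least one NaN
all_missing = sat.dropna(how='all')          # drop rows only if every value is NaN
math_only = sat.dropna(subset=['SATMTMID'])  # only look at one column

print(list(any_missing.index), list(all_missing.index), list(math_only.index))
```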


### Problem 9
<span  style="color:green; font-size:16px">How many colleges have more than 10,000 students and are religiously affiliated?</span>

In [149]:
((college.RELAFFIL==1) & (college.UGDS > 10000)).sum()

10

In [150]:
college[(college.RELAFFIL==1) & (college.UGDS > 10000)].sort_values(['STABBR', 'CITY'])

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Saint Leo University,Saint Leo,FL,0.0,0.0,0.0,1,,,0.0,11976.0,0.3823,0.3696,0.1104,0.0124,0.005,0.0022,0.0145,0.0256,0.0781,0.3059,1,0.4828,0.6032,0.7228,42100,25000
Kennesaw State University,Kennesaw,GA,0.0,0.0,0.0,1,545.0,535.0,0.0,23058.0,0.6082,0.1904,0.0755,0.0339,0.0023,0.0016,0.042,0.0186,0.0273,0.2397,1,0.4067,0.5462,0.2518,40000,22750
Brigham Young University-Idaho,Rexburg,ID,0.0,0.0,0.0,1,515.0,505.0,0.0,23865.0,0.8011,0.0048,0.0303,0.0094,0.0035,0.0044,0.0569,0.0659,0.0238,0.3462,1,0.4733,0.2138,0.371,38800,11000
DePaul University,Chicago,IL,0.0,0.0,0.0,1,,,0.0,15858.0,0.5518,0.0832,0.1756,0.0778,0.0008,0.0017,0.0388,0.0292,0.0411,0.1438,1,0.3504,0.578,0.2019,50300,23500
Loyola University Chicago,Chicago,IL,0.0,0.0,0.0,1,575.0,580.0,0.0,10042.0,0.6028,0.0376,0.132,0.1105,0.001,0.0021,0.0604,0.0352,0.0183,0.0833,1,0.2817,0.6092,0.0804,50700,25000
Indiana Wesleyan University-Marion,Marion,IN,0.0,0.0,0.0,1,530.0,525.0,0.0,10218.0,0.7531,0.1825,0.0307,0.0065,0.0024,0.0008,0.0206,0.0023,0.0012,0.0762,1,0.3816,0.7019,0.6919,46300,24160
St John's University-New York,Queens,NY,0.0,0.0,0.0,1,540.0,560.0,0.0,10878.0,0.338,0.189,0.1452,0.182,0.0024,0.0033,0.0448,0.053,0.0423,0.0297,1,0.3009,0.5599,0.038,52700,25910
Baylor University,Waco,TX,0.0,0.0,0.0,1,610.0,620.0,0.0,13801.0,0.6402,0.0733,0.1413,0.0625,0.0036,0.0005,0.0451,0.0309,0.0024,0.0162,1,0.2135,0.452,0.0245,48200,25131
Brigham Young University-Provo,Provo,UT,0.0,0.0,0.0,1,630.0,630.0,0.0,27163.0,0.832,0.005,0.0563,0.0195,0.0037,0.0058,0.0344,0.0314,0.0118,0.0981,1,0.3702,0.1921,0.122,57200,11000
Liberty University,Lynchburg,VA,0.0,0.0,0.0,1,525.0,510.0,0.0,49340.0,0.5121,0.155,0.0166,0.0093,0.0059,0.0022,0.0227,0.0135,0.2626,0.4458,1,0.4984,0.6648,0.6265,35600,23250
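The solution works because each comparison produces a boolean Series, `&` combines them element by element, and `sum` counts the `True` values. The parentheses are required since `&` binds more tightly than `==` and `>`. A minimal sketch on a made-up DataFrame, with **`query`** shown as one possible alternative spelling:

```python
import pandas as pd

df = pd.DataFrame({'RELAFFIL': [1, 0, 1, 1],
                   'UGDS': [12000.0, 30000.0, 500.0, 27000.0]})

# each parenthesised comparison is a boolean Series
mask = (df['RELAFFIL'] == 1) & (df['UGDS'] > 10000)
count = mask.sum()  # True counts as 1

# the same filter written as a query string
same_rows = df.query('RELAFFIL == 1 and UGDS > 10000')

print(count)
print(same_rows)
```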
