In [14]:
import os
import numpy as np
import pandas as pd

In [15]:
# For els2002 institution selectivity classification; per F3PSSELECT variable description, its
# values are based on the 2010 Carnegie Classification: Undergraduate Profile, CCUGPROF variable
df2012 = pd.read_csv('../data/raw_data/HD2012/hd2012.csv', delimiter=',', encoding='cp1252') 

# For hsls2009 institution selectivity classification; based on
# hsls2009's X4PS1SELECT variable description, they used the 2016 Carnegie Classification
df2016 = pd.read_csv('../data/raw_data/HD2016/hd2016.csv',  delimiter=',', encoding='cp1252')

In [16]:
df2012[:5]

Unnamed: 0,UNITID,INSTNM,ADDR,CITY,STABBR,ZIP,FIPS,OBEREG,CHFNM,CHFTITLE,...,CSA,NECTA,F1SYSTYP,F1SYSNAM,FAXTELE,COUNTYCD,COUNTYNM,CNGDSTCD,LONGITUD,LATITUDE
0,100636,Community College of the Air Force,100 S Turner Blvd,Montgomery,AL,36114-3011,1,0,Jonathan T. Hamill,Commandant,...,-2,-2,1,Air University,3346495100.0,1101,Montgomery County,102,-86.352056,32.378278
1,100654,Alabama A & M University,4900 Meridian Street,Normal,AL,35762,1,5,"Dr. Andrew Hugine, Jr.",President,...,290,-2,2,,2563725030.0,1089,Madison County,105,-86.568502,34.783368
2,100663,University of Alabama at Birmingham,Administration Bldg Suite 1070,Birmingham,AL,35294-0110,1,5,Ray L. Watts,President,...,142,-2,1,The University of Alabama System,2059757114.0,1073,Jefferson County,107,-86.80917,33.50223
3,100690,Amridge University,1200 Taylor Rd,Montgomery,AL,36117-3553,1,5,Michael Turner,President,...,-2,-2,2,,3343873878.0,1101,Montgomery County,102,-86.17401,32.362609
4,100706,University of Alabama in Huntsville,301 Sparkman Dr,Huntsville,AL,35899,1,5,Robert A. Altenkirch,President,...,290,-2,1,The University of Alabama System,,1089,Madison County,105,-86.63842,34.722818


In [17]:
df2016[:5]

Unnamed: 0,UNITID,INSTNM,IALIAS,ADDR,CITY,STABBR,ZIP,FIPS,OBEREG,CHFNM,...,CBSATYPE,CSA,NECTA,COUNTYCD,COUNTYNM,CNGDSTCD,LONGITUD,LATITUDE,DFRCGID,DFRCUSCG
0,100654,Alabama A & M University,AAMU,4900 Meridian Street,Normal,AL,35762,1,5,"Dr. Andrew Hugine, Jr.",...,1,290,-2,1089,Madison County,105,-86.568502,34.783368,128,1
1,100663,University of Alabama at Birmingham,,Administration Bldg Suite 1070,Birmingham,AL,35294-0110,1,5,Ray L. Watts,...,1,142,-2,1073,Jefferson County,107,-86.799345,33.505697,115,1
2,100690,Amridge University,Southern Christian University |Regions University,1200 Taylor Rd,Montgomery,AL,36117-3553,1,5,Michael Turner,...,1,-2,-2,1101,Montgomery County,102,-86.17401,32.362609,236,2
3,100706,University of Alabama in Huntsville,UAH |University of Alabama Huntsville,301 Sparkman Dr,Huntsville,AL,35899,1,5,Robert A. Altenkirch,...,1,290,-2,1089,Madison County,105,-86.640449,34.724557,118,2
4,100724,Alabama State University,,915 S Jackson Street,Montgomery,AL,36104-0271,1,5,Leon Wilson,...,1,-2,-2,1101,Montgomery County,107,-86.295677,32.364317,136,1


In [18]:
df2012[pd.isnull(df2012["CCUGPROF"])]

Unnamed: 0,UNITID,INSTNM,ADDR,CITY,STABBR,ZIP,FIPS,OBEREG,CHFNM,CHFTITLE,...,CSA,NECTA,F1SYSTYP,F1SYSNAM,FAXTELE,COUNTYCD,COUNTYNM,CNGDSTCD,LONGITUD,LATITUDE


In [19]:
df2016[pd.isnull(df2016["C15UGPRF"])]

Unnamed: 0,UNITID,INSTNM,IALIAS,ADDR,CITY,STABBR,ZIP,FIPS,OBEREG,CHFNM,...,CBSATYPE,CSA,NECTA,COUNTYCD,COUNTYNM,CNGDSTCD,LONGITUD,LATITUDE,DFRCGID,DFRCUSCG


In [20]:
df2012.columns[df2012.nunique() == 1]

Index([], dtype='object')

In [21]:
df2016.columns[df2016.nunique() == 1]

Index([], dtype='object')

In [22]:
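# Excluding graduate-program-only institutions (INSTCAT == 1)
print(df2012.shape)
print(df2016.shape)

# .copy() gives independent frames, so the columns added later do not
# trigger pandas' SettingWithCopyWarning on a filtered view
df2012 = df2012[df2012['INSTCAT'] != 1].copy()
df2016 = df2016[df2016['INSTCAT'] != 1].copy()

print(df2012.shape)
print(df2016.shape)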

(7735, 65)
(7521, 72)
(7385, 65)
(7196, 72)


After exploring the variables and their respective descriptions in `hd2012`, our variables of interest and the relevant aspects of their descriptions are described below:

***`CCUGPROF` (renamed `selectivity` in our dataset) variable description:***

Carnegie Classification 2010 Undergraduate Profile retains the same structure of six parallel classifications, initially adopted in 2005. They are as follows: Basic Classification (the traditional Carnegie Classification Framework), Undergraduate and Graduate Instructional Program classifications, Enrollment Profile and Undergraduate Profile classifications, and Size & Setting classification. These classifications provide different lenses through which to view U.S. colleges and universities, offering researchers greater analytic flexibility. 

These classifications are time-specific snapshots of institutional attributes and behavior based on data from 2008 to 2010, and collectively they depict the most current landscape of U.S. colleges and universities. Institutions might be classified differently using a different timeframe. Individual classifications are not updated with more recent data.

This new classification describes the undergraduate population with respect to three characteristics: the proportion who attend part- or full-time; achievement characteristics of first-year students; and the proportion of entering students who transfer in from another institution. Each of these captures important differences in the nature of the undergraduate population. They do not imply differences in the quality of undergraduate education, but they have implications for how an institution serves its students.

**1: Higher part-time two-year.** Fall enrollment data show at least 60 percent of undergraduates enrolled part-time at these associate’s degree granting institutions.

**2: Mixed part/full-time two-year.** Fall enrollment data show 40–59 percent of undergraduates enrolled part-time at these associate’s degree granting institutions.

**3: Medium full-time two-year.** Fall enrollment data show 10–39 percent of undergraduates enrolled part-time at these associate’s degree granting institutions.

**4: Higher full-time two-year.** Fall enrollment data show less than 10 percent of undergraduates enrolled part-time at these associate’s degree granting institutions.

**5: Higher part-time four-year.** Fall enrollment data show at least 40 percent of undergraduates enrolled part-time at these bachelor’s degree granting institutions.

**6: Medium full-time four-year, inclusive.** Fall enrollment data show 60–79 percent of undergraduates enrolled full-time at these bachelor’s degree granting institutions. These institutions either did not report test score data or the scores indicate that they extend educational opportunity to a wide range of students with respect to academic preparation and achievement.

**7: Medium full-time four-year, selective, lower transfer-in.** Fall enrollment data show 60–79 percent of undergraduates enrolled full-time at these bachelor’s degree granting institutions. Score data for first-year students indicate that these institutions are selective in admissions (our analysis of first-year students’ test scores places most of these institutions in roughly the middle two-fifths of baccalaureate institutions). Fewer than 20 percent of entering undergraduates are transfer students.

**8: Medium full-time four-year, selective, higher transfer-in.** Fall enrollment data show 60–79 percent of undergraduates enrolled full-time at these bachelor’s degree granting institutions. Score data for first-year students indicate that these institutions are selective in admissions (our analysis of first-year students’ test scores places most of these institutions in roughly the middle two-fifths of baccalaureate institutions). At least 20 percent of entering undergraduates are transfer students.

**9: Full-time four-year, inclusive.** Fall enrollment data show at least 80 percent of undergraduates enrolled full-time at these bachelor’s degree granting institutions. These institutions either did not report test score data or the scores indicate that they extend educational opportunity to a wide range of students with respect to academic preparation and achievement.

**10: Full-time four-year, selective, lower transfer-in.** Fall enrollment data show at least 80 percent of undergraduates enrolled full-time at these bachelor’s degree granting institutions. Score data for first-year students indicate that these institutions are selective in admissions (our analysis of first-year students’ test scores places these institutions in roughly the middle two-fifths of baccalaureate institutions). Fewer than 20 percent of entering undergraduates are transfer students.

**11: Full-time four-year, selective, higher transfer-in.** Fall enrollment data show at least 80 percent of undergraduates enrolled full-time at these bachelor’s degree granting institutions. Score data for first-year students indicate that these institutions are selective in admissions (our analysis of first-year students’ test scores places these institutions in roughly the middle two-fifths of baccalaureate institutions). At least 20 percent of entering undergraduates are transfer students.

**12: Full-time four-year, more selective, lower transfer-in.** Fall enrollment data show at least 80 percent of undergraduates enrolled full-time at these bachelor’s degree granting institutions. Score data for first-year students indicate that these institutions are more selective in admissions (our analysis of first-year students’ test scores places these institutions in roughly the top fifth of baccalaureate institutions). Fewer than 20 percent of entering undergraduates are transfer students.

**13: Full-time four-year, more selective, higher transfer-in.** Fall enrollment data show at least 80 percent of undergraduates enrolled full-time at these bachelor’s degree granting institutions. Score data for first-year students indicate that these institutions are more selective in admissions (our analysis of first-year students’ test scores places these institutions in roughly the top fifth of baccalaureate institutions). At least 20 percent of entering undergraduates are transfer students.

**0: Not classified**

**-1: Not applicable**

**-2: Not applicable, special focus institution**

**-3: Not applicable, not in Carnegie universe (not accredited or nondegree-granting)**

***For our purposes, the more selective groups (12 and 13) were considered "highly selective" and coded 1; the selective groups (7, 8, 10, 11) were considered "moderately selective" and coded 2; and all remaining classified groups (any other value > 0) were considered "inclusive" and coded 3. The unclassified/not applicable values were coded -1.***
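The grouping above can be sketched as a small function. This is a minimal illustration in plain Python, not the notebook's own cell; the names `MORE_SELECTIVE_2010`, `SELECTIVE_2010`, and `recode_selectivity_2010` are made up for the example.

```python
# Hypothetical helper names; the code sets mirror the grouping described above.
MORE_SELECTIVE_2010 = {12, 13}
SELECTIVE_2010 = {7, 8, 10, 11}


def recode_selectivity_2010(code):
    """Collapse a 2010 CCUGPROF code into 1, 2, 3, or -1 (unclassified)."""
    if code in MORE_SELECTIVE_2010:
        return 1  # highly selective
    if code in SELECTIVE_2010:
        return 2  # moderately selective
    if code > 0:
        return 3  # inclusive (every other classified group)
    return -1  # 0, -1, -2, -3: not classified / not applicable


# Recode every value listed in the variable description (-3 through 13).
recoded = {code: recode_selectivity_2010(code) for code in range(-3, 14)}
```

A function like this could be passed to `df2012['CCUGPROF'].apply(...)`. Listing the code sets explicitly makes it easy to see that the two-year groups (1 to 4) and the inclusive four-year groups (5, 6, 9) all fall into category 3.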

***`CONTROL` (renamed `type` in our dataset) variable description:***

A classification of whether an institution is operated by publicly elected or appointed officials or by privately elected or appointed officials and derives its major source of funds from private sources.

**1: Public institution** - An educational institution whose programs and activities are operated by publicly elected or appointed school officials and which is supported primarily by public funds. 

**2: Private not-for-profit institution** - A private institution in which the individual(s) or agency in control receives no compensation, other than wages, rent, or other expenses for the assumption of risk. These include both independent not-for-profit schools and those affiliated with a religious organization. 

**3: Private for-profit institution** - A private institution in which the individual(s) or agency in control receives compensation other than wages, rent, or other expenses for the assumption of risk.

**-3: {Not available}**

***For our purposes, we will keep these classifications the same.***

***`INSTSIZE` variable description:***

1: Under 1,000

2: 1,000 - 4,999

3: 5,000 - 9,999

4: 10,000 - 19,999

5: 20,000 and above

-1: Not reported

-2: Not applicable

***For our purposes, we will keep these classifications the same.***

***`OPEFLAG` (renamed `federalFinAid?` in our dataset) variable description:***

Code indicating the institution's degree of eligibility for Title IV aid.

**1: Participates in Title IV federal financial aid programs**

**2: Branch campus of a main campus that participates in Title IV**

**3: Deferment only - limited participation**

**8: New participants (became eligible during spring collection)**

**5: Not currently participating in Title IV, has an OPE ID number**

**6: Not currently participating in Title IV, does not have OPE ID number**

**7: Stopped participating during the survey year**

***For our purposes, we will consider an institution as either participating in Title IV (1), coded 1; participating in a limited capacity or possibly participating in Title IV (2, 3, 8), coded 2; or not participating in Title IV (5, 6, 7), coded -1.***
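As a rough sketch of this grouping, a dictionary lookup with a default keeps the mapping in one place. The names `OPEFLAG_2012_TO_AID` and `recode_title_iv` are hypothetical, and the -1 default for non-participating institutions follows the coding used in this notebook's recoding cell.

```python
# Hypothetical names; codes 2, 3, and 8 share the "limited / may participate" group.
OPEFLAG_2012_TO_AID = {1: 1, 2: 2, 3: 2, 8: 2}


def recode_title_iv(flag, mapping=OPEFLAG_2012_TO_AID):
    """Return 1 (participating), 2 (limited/may participate), or -1 (otherwise)."""
    # Anything not in the mapping falls through to -1. That covers codes
    # 5, 6, and 7, but also any unexpected value in the raw data.
    return mapping.get(flag, -1)


examples = {flag: recode_title_iv(flag) for flag in (1, 2, 3, 5, 6, 7, 8)}
```

One consequence of the default worth keeping in mind: an unexpected or missing `OPEFLAG` value would be counted as "not participating" rather than flagged separately.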

***`GROFFER` variable description:***
A code indicating whether the institution offers graduate degrees or certificates.

Graduate degrees or certificates include master's and doctor's degrees and postbaccalaureate and post-master's certificates

1: Graduate degree or certificate offering

2: No graduate offering

-3: {Not available}

***For our purposes, we will keep these classifications the same.***

***`LOCALE` (renamed `urbanicity` in our dataset) variable description:***

Locale codes identify the geographic status of a school on an urban continuum ranging from “large city” to “rural.”  They are based on a school’s physical address. The urban-centric locale codes introduced in this file are assigned through a methodology developed by the U.S. Census Bureau’s Population Division in 2005. The urban-centric locale codes apply current geographic concepts to the original NCES locale codes used on IPEDS files through 2004. 

**11: City, Large** - Territory inside an urbanized area and inside a principal city with population of 250,000 or more. 

**12: City, Midsize** - Territory inside an urbanized area and inside a principal city with population less than 250,000 and greater than or equal to 100,000.

**13: City, Small** -  Territory inside an urbanized area and inside a principal city with population less than 100,000.

**21: Suburb, Large** - Territory outside a principal city and inside an urbanized area with population of 250,000 or more.

**22: Suburb, Midsize** - Territory outside a principal city and inside an urbanized area with population less than 250,000 and greater than or equal to 100,000.

**23: Suburb, Small** - Territory outside a principal city and inside an urbanized area with population less than 100,000.

**31: Town, Fringe** - Territory inside an urban cluster that is less than or equal to 10 miles from an urbanized area.

**32: Town, Distant** - Territory inside an urban cluster that is more than 10 miles and less than or equal to 35 miles from an urbanized area.

**33: Town, Remote** - Territory inside an urban cluster that is more than 35 miles from an urbanized area.

**41: Rural, Fringe** - Census-defined rural territory that is less than or equal to 5 miles from an urbanized area, as well as rural territory that is less than or equal to 2.5 miles from an urban cluster. 

**42: Rural, Distant** -  Census-defined rural territory that is more than 5 miles but less than or equal to 25 miles from an urbanized area, as well as rural territory that is more than 2.5 miles but less than or equal to 10 miles from an urban cluster. 

**43: Rural, Remote** - Census-defined rural territory that is more than 25 miles from an urbanized area and is also more than 10 miles from an urban cluster.

**-3: Not available**

***For our purposes, we grouped these classifications into larger groups: "Urban" (11, 12, 13), represented by 1; "Suburb" (21, 22, 23), represented by 2; "Town" (31, 32, 33), represented by 3; "Rural" (41, 42, 43), represented by 4; and "Unknown", represented by -1, for any unavailable values.***
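Because the first digit of each locale code already identifies the larger group, the grouping can be sketched with integer division. This is an illustrative alternative, not the notebook's own cell; `VALID_LOCALES` and `recode_urbanicity` are made-up names.

```python
# Hypothetical names; the valid codes are the twelve listed in the description.
VALID_LOCALES = {11, 12, 13, 21, 22, 23, 31, 32, 33, 41, 42, 43}


def recode_urbanicity(locale):
    """Return 1 (urban), 2 (suburb), 3 (town), 4 (rural), or -1 (unknown)."""
    if locale in VALID_LOCALES:
        return locale // 10  # the tens digit is the group: 1x, 2x, 3x, 4x
    return -1  # -3 "Not available" or any unexpected code


groups = {code: recode_urbanicity(code) for code in (11, 23, 32, 43, -3)}
```

The explicit membership check keeps unexpected codes (say, a stray 50) from silently becoming a fifth group.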

***`STABBR` variable description:***
US Postal Service state abbreviation.

AL	Alabama,
AK	Alaska,
AZ	Arizona,
AR	Arkansas,
CA	California,
CO	Colorado,
CT	Connecticut,
DE	Delaware,
DC	District of Columbia,
FL	Florida,
GA	Georgia,
HI	Hawaii,
ID	Idaho,
IL	Illinois,
IN	Indiana,
IA	Iowa,
KS	Kansas,
KY	Kentucky,
LA	Louisiana,
ME	Maine,
MD	Maryland,
MA	Massachusetts,
MI	Michigan,
MN	Minnesota,
MS	Mississippi,
MO	Missouri,
MT	Montana,
NE	Nebraska,
NV	Nevada,
NH	New Hampshire,
NJ	New Jersey,
NM	New Mexico,
NY	New York,
NC	North Carolina,
ND	North Dakota,
OH	Ohio,
OK	Oklahoma,
OR	Oregon,
PA	Pennsylvania,
RI	Rhode Island,
SC	South Carolina,
SD	South Dakota,
TN	Tennessee,
TX	Texas,
UT	Utah,
VT	Vermont,
VA	Virginia,
WA	Washington,
WV	West Virginia,
WI	Wisconsin,
WY	Wyoming,
AS	American Samoa,
FM	Federated States of Micronesia,
GU	Guam,
MH	Marshall Islands,
MP	Northern Marianas,
PW	Palau,
PR	Puerto Rico,
VI	Virgin Islands

***For our purposes, we will keep these classifications the same.***

***`HBCU` variable description:***
A code to indicate whether the institution is a Historically Black College or University (HBCU).

Historically Black Colleges and Universities (HBCU) - The Higher Education Act of 1965, as amended, defines an HBCU as: "...any historically black college or university that was established prior to 1964, whose principal mission was, and is, the education of black Americans, and that is accredited by a nationally recognized accrediting agency or association determined by the Secretary [of Education] to be a reliable authority as to the quality of training offered or is, according to such an agency or association, making reasonable progress toward accreditation." Federal regulations (20 USC 1061 (2)) allow for certain exceptions to the founding date.

1: Yes

2: No

***For our purposes, we substituted the 2 with 0 to represent "No"***

***`TRIBAL` variable description:***

A code to indicate whether the institution is one of the Tribal Colleges and Universities. These institutions, with few exceptions, are tribally controlled and located on reservations. They are all members of the American Indian Higher Education Consortium.

1: Yes

2: No

***For our purposes, we substituted the 2 with 0 to represent "No"***
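The same 1/2 to 1/0 substitution applies to both `HBCU` and `TRIBAL`, so it can be sketched once. The helper name `yes_no_to_binary` is hypothetical.

```python
def yes_no_to_binary(flag):
    """Map the IPEDS yes/no coding (1 = Yes, 2 = No) to 1/0."""
    # Only an explicit 1 counts as "Yes"; 2 and any other value become 0.
    return 1 if flag == 1 else 0


converted = [yes_no_to_binary(flag) for flag in (1, 2, 2, 1)]
```

Note that this treats every value other than 1 as "No", so a missing-data code, if one were present in the raw column, would also become 0.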

In [None]:
df2012['selectivity'] = df2012['CCUGPROF'].apply(
    lambda c: (
        1 if c >= 12 else # Highly selective, 4-year institution
        2 if c in [7, 8, 10, 11] else # Moderately selective, 4-year institution
        3 if c > 0 else # Inclusive
        -1 # Unclassified/Not Applicable
    )
)

df2012['federalFinAid?'] = df2012['OPEFLAG'].apply(
    lambda c: (
        1 if c == 1 else # Participating in Title IV
        2 if c in [2, 3, 8] else # Participates in a limited capacity/may participate in Title IV
        -1 # Doesn't participate in Title IV
    )
)

df2012['urbanicity'] = df2012['LOCALE'].apply(
    lambda c: (
        1 if c in [11, 12, 13] else # Urban
        2 if c in [21, 22, 23] else # Suburban
        3 if c in [31, 32, 33] else # Town
        4 if c in [41, 42, 43] else # Rural
        -1 # Unknown
    )
)

df2012['HBCU'] = df2012['HBCU'].apply(
    lambda c: (
        1 if c == 1 else # Yes
        0 # No
    )
)

df2012['TRIBAL'] = df2012['TRIBAL'].apply(
    lambda c: (
        1 if c == 1 else # Yes
        0 # No
    )
)

df2012.rename(columns={'CONTROL': 'type'}, inplace=True)

colleges_2012 = df2012[['UNITID', 'INSTNM', 'selectivity', 'type', 'INSTSIZE' , 'federalFinAid?', 
        'GROFFER', 'urbanicity', 'STABBR', 'HBCU', 'TRIBAL']]

# Displaying the results
colleges_2012

Unnamed: 0,UNITID,INSTNM,selectivity,type,INSTSIZE,federalFinAid?,GROFFER,urbanicity,STABBR,HBCU,TRIBAL
0,100636,Community College of the Air Force,-1,1,-1,2,2,1,AL,0,0
1,100654,Alabama A & M University,3,1,2,1,1,1,AL,1,0
2,100663,University of Alabama at Birmingham,2,1,4,1,1,1,AL,0,0
3,100690,Amridge University,3,2,1,1,1,1,AL,0,0
4,100706,University of Alabama in Huntsville,2,1,3,1,1,1,AL,0,0
...,...,...,...,...,...,...,...,...,...,...,...
7730,480514,Vatterott College-Fairview Heights,-1,3,1,1,2,2,IL,0,0
7731,480523,Ross Medical Education Center,-1,3,1,1,2,1,OH,0,0
7732,480532,Ross Medical Education Center,-1,3,1,-1,2,2,OH,0,0
7733,480550,Ross Medical Education Center,-1,3,1,-1,2,1,KY,0,0


The `hd2016` dataset has variables similar to those in `hd2012`; the main differences are in the selectivity and financial aid variables. In `hd2012` selectivity comes from `CCUGPROF`, whereas in `hd2016` it comes from `C15UGPRF`. Both datasets use `OPEFLAG` for financial aid, but its description in `hd2016` differs slightly from the one in `hd2012`. The `hd2016` descriptions for `C15UGPRF` and `OPEFLAG` are below:

***`C15UGPRF` (renamed `selectivity` in our dataset) variable description:***

1	Two-year, higher part-time

2	Two-year, mixed part/full-time

3	Two-year, medium full-time

4	Two-year, higher full-time

5	Four-year, higher part-time

6	Four-year, medium full-time, inclusive, lower transfer-in

7	Four-year, medium full-time, inclusive, higher transfer-in

8	Four-year, medium full-time, selective, lower transfer-in

9	Four-year, medium full-time, selective, higher transfer-in

10	Four-year, full-time, inclusive, lower transfer-in

11	Four-year, full-time, inclusive, higher transfer-in

12	Four-year, full-time, selective, lower transfer-in

13	Four-year, full-time, selective, higher transfer-in

14	Four-year, full-time, more selective, lower transfer-in

15	Four-year, full-time, more selective, higher transfer-in

0	Not classified (Exclusively Graduate)

-2	Not applicable, not in Carnegie universe (not accredited or nondegree-granting)

***For our purposes, we follow the same logic as for `CCUGPROF` in `hd2012`. The more selective groups (14 and 15) were considered "highly selective" and coded 1; the selective groups (8, 9, 12, 13) were considered "moderately selective" and coded 2; and all remaining classified groups (any other value > 0) were considered "inclusive" and coded 3. The unclassified/not applicable values were coded -1.***
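Because the numeric codes shifted between the two classification years, the same raw value can land in different groups. The sketch below, with a hypothetical `SCHEMES` table and `recode_selectivity` helper, puts the two groupings side by side to make that visible.

```python
# Hypothetical helper; the code sets come from the two groupings described above.
SCHEMES = {
    "CCUGPROF": {"more_selective": {12, 13}, "selective": {7, 8, 10, 11}},
    "C15UGPRF": {"more_selective": {14, 15}, "selective": {8, 9, 12, 13}},
}


def recode_selectivity(code, variable):
    """Collapse a Carnegie undergraduate-profile code into 1, 2, 3, or -1."""
    scheme = SCHEMES[variable]
    if code in scheme["more_selective"]:
        return 1  # highly selective
    if code in scheme["selective"]:
        return 2  # moderately selective
    if code > 0:
        return 3  # inclusive
    return -1  # not classified / not applicable


# Raw code 12 is "more selective" under CCUGPROF but only "selective" under C15UGPRF.
code_12_in_2012_file = recode_selectivity(12, "CCUGPROF")
code_12_in_2016_file = recode_selectivity(12, "C15UGPRF")
```

This is why the recoding has to be keyed to the source variable rather than reused unchanged across the two files.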

***`OPEFLAG` (renamed `federalFinAid?` in our dataset) variable description:***

Code indicating the institution's degree of eligibility for Title IV aid.

**1: Participates in Title IV federal financial aid programs**

**2: Branch campus of a main campus that participates in Title IV**

**3: Deferment only - limited participation**

**5: Not currently participating in Title IV, has an OPE ID number**

**6: Not currently participating in Title IV, does not have OPE ID number**

**7: Stopped participating during the survey year**

***For our purposes, we will consider an institution as either participating in Title IV (1), coded 1; participating in a limited capacity or possibly participating in Title IV (2, 3), coded 2; or not participating in Title IV (5, 6, 7), coded -1.***

The other variables were the same as in `hd2012`, and the same feature engineering, where necessary, was applied, as shown below.

In [11]:
df2016['selectivity'] = df2016['C15UGPRF'].apply(
    lambda c: (
        1 if c >= 14 else # Highly selective, 4-year institution
        2 if c in [8, 9, 12, 13] else # Moderately selective, 4-year institution
        3 if c > 0 else # Inclusive
        -1 # Unclassified/Not Applicable
    )
)

df2016['federalFinAid?'] = df2016['OPEFLAG'].apply(
    lambda c: (
        1 if c == 1 else # Participating in Title IV
        2 if c in [2, 3] else # Participates in a limited capacity/may participate in Title IV
        -1 # Doesn't participate in Title IV
    )
)

df2016['urbanicity'] = df2016['LOCALE'].apply(
    lambda c: (
        1 if c in [11, 12, 13] else # Urban
        2 if c in [21, 22, 23] else # Suburban
        3 if c in [31, 32, 33] else # Town
        4 if c in [41, 42, 43] else # Rural
        -1 # Unknown
    )
)

df2016['HBCU'] = df2016['HBCU'].apply(
    lambda c: (
        1 if c == 1 else # Yes
        0 # No
    )
)

df2016['TRIBAL'] = df2016['TRIBAL'].apply(
    lambda c: (
        1 if c == 1 else # Yes
        0 # No
    )
)

df2016.rename(columns={'CONTROL': 'type'}, inplace=True)

colleges_2016 = df2016[['UNITID', 'INSTNM', 'selectivity', 'type', 'INSTSIZE', 'federalFinAid?',
        'GROFFER', 'urbanicity', 'STABBR', 'HBCU', 'TRIBAL']]

# Displaying results
colleges_2016

Unnamed: 0,UNITID,INSTNM,selectivity,type,INSTSIZE,federalFinAid?,GROFFER,urbanicity,STABBR,HBCU,TRIBAL
0,100654,Alabama A & M University,3,1,3,1,1,1,AL,1,0
1,100663,University of Alabama at Birmingham,2,1,4,1,1,1,AL,0,0
2,100690,Amridge University,3,2,1,1,1,1,AL,0,0
3,100706,University of Alabama in Huntsville,2,1,3,1,1,1,AL,0,0
4,100724,Alabama State University,3,1,3,1,1,1,AL,1,0
...,...,...,...,...,...,...,...,...,...,...,...
7516,489973,Relay Graduate School of Education - Camden,-1,-3,-2,1,-3,1,NJ,0,0
7517,489982,Relay Graduate School of Education - Denver,-1,-3,-2,1,-3,1,CO,0,0
7518,489991,Relay Graduate School of Education - Nashville,-1,-3,-2,1,-3,1,TN,0,0
7519,490009,Spartan College of Aeronautics and Technology,-1,3,1,1,2,2,CO,0,0


In [12]:
# Saves each resulting dataframe as a separate dataset.
# DataFrame.to_csv writes a plain header row (np.savetxt prefixes the header with '# ')
# and quotes institution names that contain commas.
colleges_2012.to_csv('collegerec_els2002.csv', index=False)

colleges_2016.to_csv('collegerec_hsls2009.csv', index=False)
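Institution names can contain commas, so it may be worth checking that a saved row splits back into the expected number of fields. The following is a minimal sketch using only the standard library's `csv` module on a made-up row; the institution name and `UNITID` are invented for illustration.

```python
import csv
import io

header = ["UNITID", "INSTNM", "selectivity"]
rows = [[999999, "Example College, Main Campus", 3]]  # made-up institution

buffer = io.StringIO()
writer = csv.writer(buffer)  # default quoting wraps fields that contain commas
writer.writerow(header)
writer.writerows(rows)

# A CSV-aware reader recovers three fields from the data row.
buffer.seek(0)
read_back = list(csv.reader(buffer))

# Splitting the same line on raw commas breaks the name into two pieces.
naive_split = buffer.getvalue().splitlines()[1].split(",")
```

Here `read_back[1]` has three fields while `naive_split` has four, which is the kind of mismatch that appears when a file is written or parsed without quoting.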