<a href="https://colab.research.google.com/github/AbiramiRathina/roi_based_program_selection/blob/abirami/project_big_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [103]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
import numpy as np

# Dataset Description
*Dataset documentation: https://collegescorecard.ed.gov/data/data-documentation/*

*Dataset: https://collegescorecard.ed.gov/data/*

### Overview
In our project, we use two datasets: an institution-level dataset and a cohort (field-of-study)–level dataset. Combined, these data sources help us estimate the return on investment (ROI) of academic programs. Given a list of university choices, a student will be able to compare multiple programs based on ROI.

For simplicity, we restrict our analysis to data from the year 2025. Although we recognize that a fully informed decision requires examining trends across multiple years, the large dataset size and limitations in computational resources make multi-year analysis difficult. Therefore, we proceed with only the 2025 data. The first step in our project is to understand the structure and content of the data.

Since both datasets are large, instead of mounting Google Drive in Colab, we host them via public Google Drive links and load them directly. This approach saves time and ensures that the data is easily accessible to anyone running the project.
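The direct-download link used for the institution-level file can be built from the Drive file ID. A minimal sketch, assuming the `uc?export=download` URL form that this notebook uses for that file; the helper name `drive_download_url` is hypothetical:

```python
def drive_download_url(file_id: str) -> str:
    """Build a direct-download URL from a Google Drive file ID.

    Assumes the file is shared publicly ("anyone with the link").
    """
    return f"https://drive.google.com/uc?export=download&id={file_id}"


# File ID of the institution-level CSV used later in this notebook.
INSTITUTES_FILE_ID = "1SIZufYNWCC91scwafSx3LYOAMEHyfvrr"
url = drive_download_url(INSTITUTES_FILE_ID)
print(url)
# The resulting string can be passed straight to pd.read_csv(url).
```

The cohort-level file is loaded further down through a `drive.usercontent.google.com` link instead, so this helper only covers the first URL form.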

### Dataset 1: Institution-Level Data

This dataset contains information on 6,429 institutions (rows) and 3,306 features (columns). The data can be grouped into the following categories:

(i) Institutional Demographics

Institution name, location, control (public/private), sector

Campus type, degree levels offered

Admissions information, acceptance rates

Program offerings (CIP codes)

(ii) Cost & Affordability Indicators

Tuition and fees (in-state, out-of-state)

Net price after grants

Average annual cost by family income bracket

Cost of attendance and living expenses

(iii) Financial Aid & Debt

Percentage of students receiving Pell Grants

Average student loan amounts

Median debt at graduation

Repayment and default rates

(iv) Student Demographics

Enrollment numbers

Gender and race/ethnicity distributions

First-generation status

Part-time vs. full-time enrollment

(v) Academic Performance

Retention rates

Completion and graduation rates

Transfer-out and withdrawal rates

(vi) Earnings & Outcomes

Median earnings 1, 2, 6, and 10 years after entry

Employment rates

Loan repayment progress

Earnings by program or award level

Important notes:

Several fields contain NULL or privacy-suppressed values (e.g., “PrivacySuppressed”).

Some variables are only available for specific years.

Certain earnings metrics lag by several years due to tax data availability.

Proper interpretation requires understanding cohort definitions (e.g., first-time students, completers, non-completers).

### Dataset 2: Field-of-Study (Cohort-Level) Data

This dataset contains approximately 229,188 rows and 174 columns. Although the number of columns is smaller than in the institution dataset, the dimensionality remains significant. After reviewing the documentation, the following column groups help structure and understand the data:

(i) Identification & Keys

Institution-level identifiers

Program identifiers (CIP code, credential level)

(ii) Academic Program Information

Characteristics of the program or field of study

(iii) Student Count & Cohort Size

(iv) Cost & Tuition Information

(v) Debt, Loan & Repayment Metrics

(vi) Earnings & Employment Outcomes

(vii) Loan Repayment & Default Indicators

Useful for assessing program-level financial risk

(viii) Demographics

Program-level demographic details (gender, race, etc.)

(ix) Institution Characteristics

Helpful when merging with the institution-level dataset

As with the institution data, this dataset contains many privacy-suppressed values (PS) and null or missing entries that require cleaning.
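One possible way to handle the suppression markers is to convert them to missing values at load time. This is a minimal sketch on a tiny in-memory CSV, not the approach the notebook takes below. The column names are borrowed from the datasets but the values are made up for illustration:

```python
import io

import pandas as pd

# Toy CSV with made-up values; "PS" and "PrivacySuppressed" mark suppressed cells.
sample_csv = io.StringIO(
    "UNITID,EARN_PELL_WNE_MDN_5YR,ADM_RATE_SUPP\n"
    "1,PS,0.61\n"
    "2,41000,PrivacySuppressed\n"
    "3,52000,0.35\n"
)

# na_values tells pandas to read these strings as NaN, so the affected
# columns are parsed as numeric instead of object.
df_sample = pd.read_csv(sample_csv, na_values=["PS", "PrivacySuppressed"])

print(df_sample.dtypes)
print(df_sample.isna().sum())
```

With this approach, heavily suppressed columns would show up as mostly null rather than as having `PS` as their most frequent value, so any filter based on the top value would need to become a missing-share threshold instead.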

### EDA

#### Dataset 1: Institution-Level Data

In [6]:
url_institutes = "https://drive.google.com/uc?export=download&id=1SIZufYNWCC91scwafSx3LYOAMEHyfvrr"  # source: https://drive.google.com/file/d/1SIZufYNWCC91scwafSx3LYOAMEHyfvrr/view?usp=sharing
df_institutes = pd.read_csv(url_institutes)



In [15]:
df_institutes

Unnamed: 0,UNITID,OPEID,OPEID6,INSTNM,CITY,STABBR,ZIP,ACCREDAGENCY,INSTURL,NPCURL,...,COUNT_WNE_MALE1_P11,GT_THRESHOLD_P11,MD_EARN_WNE_INC1_P11,MD_EARN_WNE_INC2_P11,MD_EARN_WNE_INC3_P11,MD_EARN_WNE_INDEP0_P11,MD_EARN_WNE_INDEP1_P11,MD_EARN_WNE_MALE0_P11,MD_EARN_WNE_MALE1_P11,SCORECARD_SECTOR
0,100654,100200.0,1002.0,Alabama A & M University,Normal,AL,35762,Southern Association of Colleges and Schools C...,www.aamu.edu/,www.aamu.edu/admissions-aid/tuition-fees/net-p...,...,777.0,0.6250,36650.0,41070.0,47016.0,38892.0,41738.0,38167.0,40250.0,4
1,100663,105200.0,1052.0,University of Alabama at Birmingham,Birmingham,AL,35294-0110,Southern Association of Colleges and Schools C...,https://www.uab.edu/,https://tcc.ruffalonl.com/University of Alabam...,...,1157.0,0.7588,47182.0,51896.0,54368.0,50488.0,51505.0,46559.0,59181.0,4
2,100690,2503400.0,25034.0,Amridge University,Montgomery,AL,36117-3553,Southern Association of Colleges and Schools C...,https://www.amridgeuniversity.edu/,https://www2.amridgeuniversity.edu:9091/,...,67.0,0.5986,35752.0,41007.0,,,38467.0,32654.0,49435.0,5
3,100706,105500.0,1055.0,University of Alabama in Huntsville,Huntsville,AL,35899,Southern Association of Colleges and Schools C...,www.uah.edu/,finaid.uah.edu/,...,802.0,0.7810,51208.0,62219.0,62577.0,55920.0,60221.0,47787.0,67454.0,4
4,100724,100500.0,1005.0,Alabama State University,Montgomery,AL,36104-0271,Southern Association of Colleges and Schools C...,www.alasu.edu/,www.alasu.edu/cost-aid/tuition-costs/net-price...,...,1049.0,0.5378,32844.0,36932.0,37966.0,34294.0,31797.0,32303.0,36964.0,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6424,49382201,4283601.0,42836.0,College Unbound - Newport/Aquidneck Island,Newport,RI,028400000,New England Commission on Higher Education,https://www.collegeunbound.edu/,,...,,,,,,,,,,14
6425,49425001,2609404.0,26094.0,Valley College - Fairlawn - School of Nursing,Fairlawn,OH,443333631,Accrediting Commission of Career Schools and C...,https://www.valley.edu/,,...,,0.4651,26087.0,37545.0,,,28205.0,27499.0,,15
6426,49501301,4247201.0,42472.0,Western Maricopa Education Center - Southwest ...,Buckeye,AZ,85326-5705,Council on Occupational Education,https://west-mec.edu/findyourhappy,,...,,,,,,,,,,13
6427,49501302,4247202.0,42472.0,Western Maricopa Education Center - Northeast ...,Phoenix,AZ,85027-0000,Council on Occupational Education,https://west-mec.edu/findyourhappy,,...,,,,,,,,,,13


In [16]:
df_institutes.shape

(6429, 3306)

As we can see, we have 6429 rows and 3306 columns. This is a lot of features. For our problem statement, what truly matters is information about the strength of a program (the value it holds in terms of ROI).

In [28]:
df_institutes.dtypes.unique()


array([dtype('int64'), dtype('float64'), dtype('O')], dtype=object)

We can see that we have both numeric and object dtypes, which means we will have to extract statistics for each separately.

First we'll describe the numeric columns.

In [19]:
df_institutes.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
UNITID,6429.0,2.355496e+06,8.043861e+06,100654.0,174570.00,229540.0,458955.00,49664501.0
OPEID,6405.0,1.762653e+06,2.173129e+06,100200.0,304800.00,1019817.0,3101900.00,82098816.0
OPEID6,6405.0,1.714251e+04,1.533956e+04,1002.0,3037.00,10198.0,30987.00,43098.0
SCH_DEG,5926.0,1.961019e+00,9.019431e-01,1.0,1.00,2.0,3.00,3.0
HCM2,6429.0,5.599627e-03,7.462665e-02,0.0,0.00,0.0,0.00,1.0
...,...,...,...,...,...,...,...,...
MD_EARN_WNE_INDEP0_P11,4615.0,4.215305e+04,1.554240e+04,9834.0,31734.50,39402.0,49877.00,128900.0
MD_EARN_WNE_INDEP1_P11,4619.0,4.043234e+04,1.590708e+04,9978.0,29054.00,37751.0,48690.00,155413.0
MD_EARN_WNE_MALE0_P11,4828.0,3.839875e+04,1.508965e+04,10586.0,27895.25,35233.0,45567.25,126750.0
MD_EARN_WNE_MALE1_P11,4226.0,4.918016e+04,1.730486e+04,8364.0,38631.75,46595.5,57097.75,248999.0


In [38]:
df_institutes.describe(include="object").T

Unnamed: 0,count,unique,top,freq
INSTNM,6429,6321,Cortiva Institute,6
CITY,6429,2362,New York,75
STABBR,6429,59,CA,672
ZIP,6429,5819,00961,6
ACCREDAGENCY,6225,40,Higher Learning Commission,1153
...,...,...,...,...
CONTROL_PEPS,6405,3,Proprietary,2411
ADM_RATE_SUPP,1946,1504,PS,174
ADDR,5921,5868,One College Drive,6
PCTPELL_DCS_POOLED_SUPP,5628,4303,PS,69


In [42]:
pd.set_option('display.max_rows', 5000)
df_institutes.nunique().sort_values(ascending=False)

Unnamed: 0,0
UNITID,6429
OPEID,6377
INSTNM,6321
LATITUDE,5874
LONGITUDE,5873
ADDR,5868
ZIP,5819
INSTURL,5541
FEDSCHCD,5419
TUITFTE,5204


We definitely have a lot of data, far too much to double-check manually.

To reduce the number of features, we can first drop columns that are entirely null, and then drop columns that have only one unique value; we seem to have a few of those.

For categorical data, we can observe that the top value for a lot of columns is PS (privacy-suppressed, meaning the values are not disclosed for privacy reasons), so we may have to drop these columns as they don't hold much value.

We also see that, with regard to unique values, we have multiple identifiers for institutions, but UNITID and INSTNM are sufficient. There are also many columns that have 0 or 1 unique values; these don't add any value to our model.

In [63]:
empty_cols = df_institutes.columns[df_institutes.isna().all()]
len(empty_cols)

75

In [64]:
constant_cols = df_institutes.columns[df_institutes.nunique(dropna=True) <= 1]
len(constant_cols)

193

In [65]:
obj_desc = df_institutes.describe(include='object').T
ps_cols = obj_desc[obj_desc['top'] == 'PS'].index.tolist()
len(ps_cols)

2319

In [66]:
bad_cols = set(empty_cols) | set(constant_cols) | set(ps_cols)
len(bad_cols)

2400

Taking the union of the three sets, we have 2400 columns that can be removed.

In [67]:
df_institutes_cleaned = df_institutes.drop(columns=list(bad_cols))
df_institutes_cleaned.shape


(6429, 906)

In [68]:
df_institutes.shape

(6429, 3306)

We have successfully reduced the number of features from 3306 to 906.
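The same three checks will be needed again for the cohort-level data, so they could be wrapped in a helper. A minimal sketch on a toy frame; the function name `find_uninformative_columns` and the toy columns are illustrative assumptions, not part of the notebook:

```python
import pandas as pd


def find_uninformative_columns(df: pd.DataFrame, suppressed_token: str = "PS") -> set:
    """Return columns that are all-null, constant, or dominated by the suppression token."""
    # Columns where every value is missing.
    empty = set(df.columns[df.isna().all()])
    # Columns with at most one distinct non-null value.
    constant = set(df.columns[df.nunique(dropna=True) <= 1])
    # Columns whose most frequent value is the suppression token.
    suppressed = set()
    for col in df.columns:
        counts = df[col].value_counts()
        if not counts.empty and str(counts.index[0]) == suppressed_token:
            suppressed.add(col)
    return empty | constant | suppressed


# Toy data with made-up values.
toy = pd.DataFrame({
    "UNITID": [1, 2, 3, 4],
    "ALL_NULL": [None, None, None, None],
    "CONSTANT": [7, 7, 7, 7],
    "MOSTLY_PS": ["PS", "PS", "0.5", "PS"],
    "EARNINGS": [30000.0, 41000.0, None, 52000.0],
})

bad = find_uninformative_columns(toy)
print(sorted(bad))
toy_cleaned = toy.drop(columns=list(bad))
```

Only `UNITID` and `EARNINGS` survive in `toy_cleaned`, mirroring how the identifier and informative numeric columns survive the filter above.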

In [69]:
df_institutes_cleaned.nunique().sort_values(ascending=False)

Unnamed: 0,0
UNITID,6429
OPEID,6377
INSTNM,6321
LATITUDE,5874
LONGITUDE,5873
ADDR,5868
ZIP,5819
INSTURL,5541
FEDSCHCD,5419
TUITFTE,5204


From the above table we can see that some columns provide information we do not need, such as: OPEID (Office of Postsecondary Education Identifier), LATITUDE, LONGITUDE, ADDR, ZIP, FEDSCHCD (a type of federal aid code), INSTURL, TUITFTE, NPCURL, OPEID6, INEXPFTE

Some columns relate to demographic information and don't hold much value for our problem. We can remove these.

Columns whose names begin with CIP don't matter much here either, as these are course-level completion rates, which are not required for our project.

Columns that have HH in their names hold household information, which again is of very little value in our project context, e.g. LN_MEDIAN_HH_INC.

POOLYRS* columns only tell you how many years of data were pooled to calculate certain repayment variables. They are metadata, not features. They do not help with prediction and do not describe the institution or program. Examples of such columns: POOLYRS100, POOLYRS1, POOLYRS10, POOLYRS5

We can remove MTHCMP1 and similar columns (MTHCMP2 … MTHCMP6). These columns only describe the average months to complete the institution’s top programs, not the specific program you’re evaluating. They do not contribute to ROI modeling and mostly add noise, so dropping them is appropriate.

In [70]:
cols_to_remove_manual = [
    'OPEID', 'LATITUDE', 'LONGITUDE', 'ADDR', 'ZIP', 'FEDSCHCD',
    'INSTURL', 'TUITFTE', 'NPCURL', 'OPEID6', 'INEXPFTE'
]

# errors="ignore" skips any listed column that is no longer present
df_institutes_cleaned = df_institutes_cleaned.drop(columns=cols_to_remove_manual, errors="ignore")


In [71]:
df_institutes_cleaned.shape

(6429, 895)

In [92]:
demographic_keywords = [
    "male", "female", "men", "women",
    "black", "white", "hisp", "asian",
    "race", "ethnic", "minority", "cip", "hh", "poolyrs", "mthcmp"
]

demographic_cols = [
    c for c in df_institutes_cleaned.columns
    if any(k in c.lower() for k in demographic_keywords)
]

print("Columns to remove:", demographic_cols)


Columns to remove: ['MTHCMP1', 'MTHCMP2', 'MTHCMP3', 'MTHCMP4', 'MTHCMP5', 'MTHCMP6']


In [93]:
len(set(demographic_cols))

6

In [94]:
# errors="ignore" skips any listed column that is no longer present
df_institutes_cleaned = df_institutes_cleaned.drop(columns=demographic_cols, errors="ignore")


In [95]:
df_institutes_cleaned.shape

(6429, 431)
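Because the keyword filter matches substrings anywhere in a column name, it can help to audit how many columns each keyword hits before dropping them. A minimal sketch, using a handful of column names that appear in this notebook and a hypothetical helper `count_matches_per_keyword`:

```python
def count_matches_per_keyword(columns, keywords):
    """Count, for each keyword, how many column names contain it (case-insensitive)."""
    lowered = [c.lower() for c in columns]
    return {k: sum(k in c for c in lowered) for k in keywords}


# A small sample of column names seen in the institution-level data.
sample_columns = [
    "UNITID", "INSTNM", "MTHCMP1", "POOLYRS100",
    "LN_MEDIAN_HH_INC", "MD_EARN_WNE_MALE1_P11", "COUNT_WNE_MALE1_P11",
]
sample_keywords = ["male", "hh", "poolyrs", "mthcmp", "cip"]

counts = count_matches_per_keyword(sample_columns, sample_keywords)
print(counts)
```

Running the same function on `df_institutes_cleaned.columns` with `demographic_keywords` before the drop would show which keywords account for most of the removed columns.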

In [96]:
df_institutes_cleaned.nunique().sort_values(ascending=False)

Unnamed: 0,0
UNITID,6429
INSTNM,6321
MD_EARN_WNE_1YR,4116
MD_EARN_WNE_5YR,4072
PCTFLOAN_DCS_POOLED_SUPP,4045
MD_EARN_WNE_4YR,4028
PCT75_EARN_WNE_P6,4005
MD_EARN_WNE_P6,3986
PCT75_EARN_WNE_P8,3962
MD_EARN_WNE_P8,3959


We have done a lot of filtering to reduce the number of features, but 431 columns is still too many to choose from by hand. We therefore fit a tree-based model, RandomForestRegressor, and use its feature importances to rank the remaining numeric columns, keeping the top 30 as candidates. As the target for this ranking, we try to predict MD_EARN_WNE_1YR.

In [106]:
df_institutes_cleaned_features_select = df_institutes_cleaned.dropna(subset=['MD_EARN_WNE_1YR'])


In [107]:
y = df_institutes_cleaned_features_select['MD_EARN_WNE_1YR']
X = df_institutes_cleaned_features_select.drop(columns=['MD_EARN_WNE_1YR'])

In [108]:
X = X.select_dtypes(include=['float64','int64'])

In [109]:
imputer = SimpleImputer(strategy='median')
X_imputed = imputer.fit_transform(X)

In [110]:
model = RandomForestRegressor(
    n_estimators=300,
    random_state=42,
    n_jobs=-1
)

model.fit(X_imputed, y)


In [111]:
importances = pd.Series(model.feature_importances_, index=X.columns)
important_cols = importances.sort_values(ascending=False)
important_cols.head(30)

Unnamed: 0,0
MD_EARN_WNE_4YR,0.709182
MD_EARN_WNE_5YR,0.054804
SCORECARD_SECTOR,0.016385
PREDDEG,0.012938
MD_EARN_WNE_P6,0.009218
UG25ABV,0.007279
NPT4_PRIV,0.006489
MD_EARN_WNE_INC2_P6,0.00535
COUNT_NWNE_1YR,0.004636
COUNT_WNE_1YR,0.004507


We now have the 30 top-ranked features, from which we can handpick.
Let's choose the following columns:

UNITID: for unique identification of universities

INSTNM: Even though it was not chosen by the model, this is the name of the university in text and can be useful when interpreting results

1. MD_EARN_WNE_4YR

Median earnings of completers who are working and not enrolled 4 years after graduation.

2. MD_EARN_WNE_5YR

Median earnings of completers 5 years after graduation (long-term earnings signal).

3. MD_EARN_WNE_P6

Median earnings of students who are working and not enrolled 6 years after entry.

4. GT_THRESHOLD_1YR

Number of graduates earning above the earnings threshold one year after completion (the values in this file are counts, not shares).

5. GT_THRESHOLD_P6

Share of students earning above the earnings threshold 6 years after entry.

6. TUITIONFEE_PROG

Tuition and fees at program-year institutions (direct cost impacting ROI).

7. NPT4_PRIV

Average net price at private institutions after grants (actual cost to student).

8. MD_EARN_WNE_INC2_P6

Median earnings 6 years after entry for students from middle-income families (income group 2).

9. MD_EARN_WNE_INC3_P6

Median earnings 6 years after entry for students from higher-income families (income group 3).

10. PCT75_EARN_WNE_P11

75th-percentile earnings of students who are working and not enrolled 11 years after entry.

11. ROOMBOARD_ON

Average on-campus room and board cost (major part of total cost).

12. AVGFACSAL

Average faculty salary (proxy for institutional quality and resources).



In [116]:
selected_features = [
    "UNITID",
    "INSTNM",
    "MD_EARN_WNE_4YR",
    "MD_EARN_WNE_5YR",
    "MD_EARN_WNE_P6",
    "GT_THRESHOLD_1YR",
    "GT_THRESHOLD_P6",
    "TUITIONFEE_PROG",
    "NPT4_PRIV",
    "MD_EARN_WNE_INC2_P6",
    "MD_EARN_WNE_INC3_P6",
    "PCT75_EARN_WNE_P11",
    "ROOMBOARD_ON",
    "AVGFACSAL"
]

df_institutes_cleaned_final = df_institutes_cleaned[selected_features]


In [117]:
df_institutes_cleaned_final

Unnamed: 0,UNITID,INSTNM,MD_EARN_WNE_4YR,MD_EARN_WNE_5YR,MD_EARN_WNE_P6,GT_THRESHOLD_1YR,GT_THRESHOLD_P6,TUITIONFEE_PROG,NPT4_PRIV,MD_EARN_WNE_INC2_P6,MD_EARN_WNE_INC3_P6,PCT75_EARN_WNE_P11,ROOMBOARD_ON,AVGFACSAL
0,100654,Alabama A & M University,46562.0,52246.0,27851.0,355.0,0.4613,,,31228.0,33539.0,56598.0,11402.0,8610.0
1,100663,University of Alabama at Birmingham,52404.0,60738.0,46572.0,2290.0,0.7443,,,49623.0,50532.0,75896.0,13590.0,12211.0
2,100690,Amridge University,45765.0,49649.0,30377.0,20.0,0.5026,,,,,59803.0,,5109.0
3,100706,University of Alabama in Huntsville,67695.0,78740.0,55610.0,880.0,0.7854,,,56719.0,60565.0,87130.0,11122.0,10411.0
4,100724,Alabama State University,37551.0,43913.0,27453.0,316.0,0.4467,,,28989.0,31482.0,49594.0,7690.0,8015.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6424,49382201,College Unbound - Newport/Aquidneck Island,,,,,,,,,,,,
6425,49425001,Valley College - Fairlawn - School of Nursing,22873.0,25262.0,25548.0,136.0,0.4118,18625.0,,,,40067.0,,
6426,49501301,Western Maricopa Education Center - Southwest ...,,,,,,,,,,,,
6427,49501302,Western Maricopa Education Center - Northeast ...,,,,,,,,,,,,


In [118]:
df_institutes_cleaned_final.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
UNITID,6429.0,2355496.0,8043861.0,100654.0,174570.0,229540.0,458955.0,49664501.0
MD_EARN_WNE_4YR,5500.0,40667.7,16760.42,9253.0,28120.75,39686.5,49565.0,139418.0
MD_EARN_WNE_5YR,5484.0,46206.59,19233.09,8049.0,31471.0,45490.0,56504.5,160672.0
MD_EARN_WNE_P6,5450.0,37299.8,14459.2,8535.0,27583.0,35565.5,44256.0,143353.0
GT_THRESHOLD_1YR,5041.0,925.6985,1988.892,16.0,77.0,225.0,757.0,27755.0
GT_THRESHOLD_P6,5180.0,0.5949676,0.1671113,0.1467,0.474625,0.6,0.7241,1.0
TUITIONFEE_PROG,2154.0,17311.96,10459.51,585.0,12438.25,16297.0,19260.0,157200.0
NPT4_PRIV,3360.0,21372.54,9090.793,1124.0,15549.25,20773.5,26094.5,112070.0
MD_EARN_WNE_INC2_P6,3773.0,44099.08,12633.84,11779.0,36412.0,41967.0,50137.0,141916.0
MD_EARN_WNE_INC3_P6,3773.0,47082.19,14034.2,12082.0,38650.0,45153.0,53417.0,147468.0


In [119]:
df_institutes_cleaned_final.describe(include="object").T

Unnamed: 0,count,unique,top,freq
INSTNM,6429,6321,Cortiva Institute,6


In [121]:
df_institutes_cleaned_final.nunique().sort_values(ascending=False)

Unnamed: 0,0
UNITID,6429
INSTNM,6321
MD_EARN_WNE_5YR,4072
MD_EARN_WNE_4YR,4028
MD_EARN_WNE_P6,3986
PCT75_EARN_WNE_P11,3695
NPT4_PRIV,3149
AVGFACSAL,3123
GT_THRESHOLD_P6,2805
MD_EARN_WNE_INC3_P6,2592


Our institution-level data now looks much cleaner. We stop here for now and do not reduce the dimensions further, because we still have to extract the important features from the cohort-level data, merge both tables, and run a final feature selection on the merged dataset to confirm the most useful columns.
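The planned merge could look roughly like the following. This is a minimal sketch on toy frames, assuming `UNITID` is the join key; the cohort-level file stores `UNITID` as a float with some missing values, so the sketch drops those rows and casts the key to an integer first. Values other than the institution names and earnings shown earlier are made up:

```python
import pandas as pd

# Toy institution-level frame (one row per institution).
institutes = pd.DataFrame({
    "UNITID": [100654, 100663],
    "INSTNM": ["Alabama A & M University", "University of Alabama at Birmingham"],
    "MD_EARN_WNE_4YR": [46562.0, 52404.0],
})

# Toy cohort-level frame (one row per program); UNITID is float and may be missing.
cohorts = pd.DataFrame({
    "UNITID": [100654.0, 100654.0, 100663.0, None],
    "CIPCODE": [100, 101, 5201, 5203],
    "CREDLEV": [3, 3, 3, 1],
})

# Drop rows without a key, then align the key dtype with the institution table.
cohorts_keyed = cohorts.dropna(subset=["UNITID"]).astype({"UNITID": "int64"})

# validate="many_to_one" raises if UNITID is not unique on the institution side.
merged = cohorts_keyed.merge(institutes, on="UNITID", how="left", validate="many_to_one")
print(merged)
```

Rows with a missing `UNITID` could alternatively be matched through `OPEID6`, which both files contain, but that identifier is not unique per institution in the data shown above.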

In [13]:
url_cohorts = "https://drive.usercontent.google.com/download?id=1dkZwR3JDSTpH9j3oETqIO2L90WSHDgbT&export=download&confirm=t" #https://drive.google.com/file/d/1dkZwR3JDSTpH9j3oETqIO2L90WSHDgbT/view?usp=sharing
df_cohorts = pd.read_csv(url_cohorts)

In [14]:
df_cohorts

Unnamed: 0,UNITID,OPEID6,INSTNM,CONTROL,MAIN,CIPCODE,CIPDESC,CREDLEV,CREDDESC,IPEDSCOUNT1,...,EARN_COUNT_PELL_WNE_5YR,EARN_PELL_WNE_MDN_5YR,EARN_COUNT_NOPELL_WNE_5YR,EARN_NOPELL_WNE_MDN_5YR,EARN_COUNT_MALE_WNE_5YR,EARN_MALE_WNE_MDN_5YR,EARN_COUNT_NOMALE_WNE_5YR,EARN_NOMALE_WNE_MDN_5YR,EARN_COUNT_HIGH_CRED_5YR,EARN_IN_STATE_5YR
0,100654.0,1002,Alabama A & M University,Public,1,100,"Agriculture, General.",3,Bachelor's Degree,,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
1,100654.0,1002,Alabama A & M University,Public,1,101,Agricultural Business and Management.,3,Bachelor's Degree,,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
2,100654.0,1002,Alabama A & M University,Public,1,109,Animal Sciences.,3,Bachelor's Degree,3.0,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
3,100654.0,1002,Alabama A & M University,Public,1,110,Food Science and Technology.,3,Bachelor's Degree,7.0,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
4,100654.0,1002,Alabama A & M University,Public,1,110,Food Science and Technology.,5,Master's Degree,4.0,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
229183,,43006,Southeast New Mexico College,Public,1,5201,"Business/Commerce, General.",2,Associate's Degree,,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
229184,,43006,Southeast New Mexico College,Public,1,5203,Accounting and Related Services.,1,Undergraduate Certificate or Diploma,,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
229185,,43006,Southeast New Mexico College,Public,1,5204,Business Operations Support and Assistant Serv...,1,Undergraduate Certificate or Diploma,,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
229186,,43006,Southeast New Mexico College,Public,1,5204,Business Operations Support and Assistant Serv...,2,Associate's Degree,,...,PS,PS,PS,PS,PS,PS,PS,PS,PS,PS
