# National University Ranking

In [1]:
import pandas as pd

In [2]:
nur = pd.read_csv("National Universities Rankings.csv", index_col=0)
nur.shape

(231, 7)

ROUGH OUTLINE
1. extract the state short code from location
2. extract the year founded from description
3. clean up the fees, in-state and enrollment columns
4. clean up the column names

* 51 state codes (50 states plus DC)

5. separate the schools by state
6. each state will be a table of its own in the database, for database optimization
7. create a state table
8. rank schools inside each state based on enrollment

-- EDA

9. Top school based on rank, tuition fees, in-state fees, and enrollment
10. Top 2 schools within each of the 51 states
11. oldest schools based on year founded
12. oldest school within each of the 51 states

-- App interface

13. A brief overview with visuals of the top school based on rank (overall)
14. Selection box to select a state. Once selected, show a brief overview with a visual of the top school in that state.
* Overview will include details based on the available data, like average tuition and top-ranked school.
15. Another section where we take input from the user to recommend a school within a chosen state.
16. If no state is chosen, we recommend based on the user's location and the closest state.
* For the closest state, we can do some research to find out which states are close to each other (feature engineering).
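
Steps 8 and 10 of the outline (ranking schools inside each state) could be sketched roughly as below. The rows are copied from the samples shown further down in this notebook; the `State` and `Enrollment` columns are assumed to have been produced by the cleaning steps and do not exist in the raw data yet.

```python
import pandas as pd

# Rows copied from the notebook samples; "State" and "Enrollment" are
# assumed, already-cleaned column names (hypothetical at this stage).
demo = pd.DataFrame({
    "Name": ["Harvard University", "University of Massachusetts--Lowell",
             "Stanford University", "Biola University",
             "Texas A&M University--College Station", "University of Houston"],
    "State": ["MA", "MA", "CA", "CA", "TX", "TX"],
    "Rank": [2, 152, 5, 164, 74, 194],
    "Enrollment": [6699, 13266, 6999, 4225, 48960, 34716],
})

# Step 8: rank schools inside each state by enrollment (1 = largest)
demo["Enrollment Rank in State"] = (
    demo.groupby("State")["Enrollment"]
    .rank(method="min", ascending=False)
    .astype(int)
)

# Step 10: top 2 schools per state by national rank (lower is better)
top2 = demo.sort_values("Rank").groupby("State").head(2)

print(demo[["Name", "State", "Enrollment Rank in State"]])
print(top2[["Name", "State", "Rank"]])
```

Sorting first and then calling `groupby(...).head(2)` keeps the original rows intact, which is convenient when the overview needs other columns such as tuition.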

## Objectives
Outline breakdown

### Data Cleaning
* Objective: Clean the data and feature engineer new columns
* Tools: Python

### EDA
* Objective: Create visuals from the cleaned data
* Tools: Power BI, Python

### Database Engineers
* Objective: Create the database and the schema. Views if possible
* Tools: SQL
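
A minimal sketch of what a schema and a view might look like, using Python's built-in `sqlite3` so it runs without a database server. All table, column, and view names here are hypothetical. It uses a single `schools` table with an indexed `state_code` column, which is one alternative to the one-table-per-state idea in the outline, not the agreed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.executescript("""
CREATE TABLE states (code TEXT PRIMARY KEY);
CREATE TABLE schools (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    state_code    TEXT NOT NULL REFERENCES states(code),
    national_rank INTEGER,
    tuition       INTEGER,
    in_state      INTEGER,  -- NULL when no in-state fee is listed
    enrollment    INTEGER
);
CREATE INDEX idx_schools_state ON schools(state_code);
CREATE VIEW top_school_per_state AS
    SELECT s.state_code, s.name, s.national_rank
    FROM schools AS s
    WHERE s.national_rank = (
        SELECT MIN(national_rank) FROM schools WHERE state_code = s.state_code
    );
""")

# A few rows taken from the notebook samples
conn.executemany("INSERT INTO states VALUES (?)", [("MA",), ("CA",)])
conn.executemany(
    "INSERT INTO schools (name, state_code, national_rank, tuition, in_state, enrollment) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Harvard University", "MA", 2, 47074, None, 6699),
        ("University of Massachusetts--Lowell", "MA", 152, 29125, 13427, 13266),
        ("Stanford University", "CA", 5, 47940, None, 6999),
    ],
)

rows = conn.execute(
    "SELECT * FROM top_school_per_state ORDER BY state_code"
).fetchall()
print(rows)
```

If two schools in a state share the same best rank, this view returns both of them.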

### Project App
* Objective: Create the user interface
* Tools: Python, SQL, USer

### Technical Writers
* Objective: Document every process
* Tools: Microsoft/Google Suites (Slides, Docs, Excel, etc.)

## Have a video meeting to discuss milestone achievement

### Developing a searchable database to help high school students identify colleges that match their criteria in terms of tuition, graduation rate, location, and rank.

In [3]:
nur.sample(20)

index,Name,Location,Rank,Description,Tuition and fees,In-state,Undergrad Enrollment
117,Michigan Technological University,"Houghton, MI",118,Michigan Technological University is located i...,"$30,968","$14,634",5721
194,University of Houston,"Houston, TX",194,The University of Houston is situated in one o...,"$25,410","$10,710",34716
1,Harvard University,"Cambridge, MA",2,"Harvard is located in Cambridge, Massachusetts...","$47,074",,6699
113,University of Nebraska--Lincoln,"Lincoln, NE",111,There are about 150 majors to choose from at t...,"$23,148","$8,628",20182
163,Biola University,"La Mirada, CA",164,"Founded in 1908, Biola University is a private...","$36,696",,4225
159,University of Alabama--Birmingham,"Birmingham, AL",159,"Founded in 1969, University of Alabama--Birmin...","$17,654","$7,766",11511
157,University of Massachusetts--Lowell,"Lowell, MA",152,"Founded in 1894, University of Massachusetts--...","$29,125","$13,427",13266
78,Miami University--Oxford,"Oxford, OH",79,Miami University students make up nearly half ...,"$31,592","$14,288",16387
5,Stanford University,"Stanford, CA",5,Stanford University's pristine campus is locat...,"$47,940",,6999
13,Brown University,"Providence, RI",14,"Located atop College Hill in Providence, R.I.,...","$51,367",,6652
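
Several descriptions in the sample begin with "Founded in <year>", so step 2 of the outline (year founded) might be approached with a regular expression. This is only a sketch: the two descriptions are the truncated ones from the sample, the `Year Founded` column name is an assumption, and descriptions that phrase the founding date differently will come back as missing.

```python
import pandas as pd

# Truncated descriptions copied from the sample output
demo = pd.DataFrame({
    "Name": ["Biola University", "Harvard University"],
    "Description": [
        "Founded in 1908, Biola University is a private...",
        "Harvard is located in Cambridge, Massachusetts...",
    ],
})

# Capture a four-digit year that follows "founded in" (case-insensitive)
year = demo["Description"].str.extract(r"(?i)founded in (\d{4})", expand=False)

# Nullable Int64 keeps unknown years as <NA> instead of forcing them to 0
demo["Year Founded"] = pd.to_numeric(year).astype("Int64")

print(demo[["Name", "Year Founded"]])
```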


In [4]:
nur['Location'].apply(lambda x: x.split(',')[1]).unique()

array([' NJ', ' MA', ' IL', ' CT', ' NY', ' CA', ' NC', ' PA', ' MD',
       ' NH', ' RI', ' TX', ' IN', ' TN', ' MO', ' GA', ' DC', ' VA',
       ' MI', ' OH', ' LA', ' FL', ' WI', ' WA', ' SC', ' UT', ' MN',
       ' DE', ' CO', ' IA', ' OK', ' VT', ' AL', ' OR', ' NE', ' KS',
       ' AZ', ' KY', ' AR', ' MS', ' HI', ' ID', ' WY', ' NM', ' ME',
       ' WV', ' ND', ' NV', ' SD', ' AK', ' MT'], dtype=object)

In [5]:
len(nur['Location'].apply(lambda x: x.split(',')[1]).unique())

51
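
The codes above still carry a leading space (`' NJ'`). A small sketch of how step 1 of the outline could store a clean state code, shown on a toy frame with locations from the sample; the `State` column name is an assumption.

```python
import pandas as pd

# Toy frame using locations shown in the sample
demo = pd.DataFrame({"Location": ["Houghton, MI", "Cambridge, MA", "La Mirada, CA"]})

# Take the text after the last comma and strip the surrounding whitespace
demo["State"] = demo["Location"].str.rsplit(",", n=1).str[-1].str.strip()

print(demo["State"].tolist())  # ['MI', 'MA', 'CA']
```

Splitting from the right means a city name that itself contains a comma would not break the extraction.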

In [6]:
df_nur = nur.copy()

In [7]:
df_nur.dtypes

Name                    object
Location                object
Rank                     int64
Description             object
Tuition and fees        object
In-state                object
Undergrad Enrollment    object
dtype: object
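
For step 4 of the outline (column names), one possible convention is snake_case, which is easier to use in SQL later. The sketch below runs on an empty frame that only has the column names listed above; the target naming style is an assumption, not something the team has fixed.

```python
import pandas as pd

columns = ["Name", "Location", "Rank", "Description",
           "Tuition and fees", "In-state", "Undergrad Enrollment"]
demo = pd.DataFrame(columns=columns)

# Trim, lowercase, and replace runs of spaces or hyphens with one underscore
demo.columns = (
    demo.columns.str.strip()
    .str.lower()
    .str.replace(r"[\s\-]+", "_", regex=True)
)

print(list(demo.columns))
# ['name', 'location', 'rank', 'description', 'tuition_and_fees', 'in_state', 'undergrad_enrollment']
```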

In [8]:
df_nur.isna().sum()

Name                     0
Location                 0
Rank                     0
Description              0
Tuition and fees         0
In-state                98
Undergrad Enrollment     0
dtype: int64

In [9]:
# Remove the dollar sign ($) and the thousands separator (,).
# regex=False makes '$' a literal character, not a regex end-of-string anchor.
df_nur['Tuition and fees'] = df_nur['Tuition and fees'].str.replace('$', '', regex=False).str.replace(',', '', regex=False)

# Convert the column datatype to integer
df_nur['Tuition and fees'] = df_nur['Tuition and fees'].fillna('0').astype(int)

In [10]:
df_nur['Tuition and fees'].sample(10)

index
68     29758
77     29371
87     38470
14     50953
21     40191
129    46132
16     49685
119    25994
143    28846
79     40241
Name: Tuition and fees, dtype: int64

In [11]:
df_nur.dtypes

Name                    object
Location                object
Rank                     int64
Description             object
Tuition and fees         int64
In-state                object
Undergrad Enrollment    object
dtype: object

In [12]:
df_nur.sample(10)

index,Name,Location,Rank,Description,Tuition and fees,In-state,Undergrad Enrollment
98,Auburn University,"Auburn, AL",99,"Auburn, Alabama, has been ranked one of the be...",28840,"$10,696",21786
9,Johns Hopkins University,"Baltimore, MD",10,Johns Hopkins University has four main campuse...,50410,,6524
19,Emory University,"Atlanta, GA",20,Emory University is located in the suburb of D...,47954,,6867
172,University of Idaho,"Moscow, ID",171,University of Idaho is located in the northwes...,22040,"$7,232",9116
10,Dartmouth College,"Hanover, NH",11,"Dartmouth College, located in Hanover, New Ham...",51438,,4307
164,Maryville University of St. Louis,"St Louis, MO",164,"Founded in 1872, Maryville University of St. L...",27958,,2795
75,Texas A&M University--College Station,"College Station, TX",74,Ready to be an Aggie? All students assume the ...,28768,"$10,176",48960
181,University of New Mexico,"Albuquerque, NM",176,"Founded in 1889, University of New Mexico is a...",21302,"$7,071",20522
187,Kent State University,"Kent, OH",188,Kent State University is located in northeaste...,18376,"$10,012",23607
62,University of Connecticut,"Storrs, CT",60,"The University of Connecticut, located in Stor...",35858,"$14,066",18826
