# America's Top College Rankings 2019 (Forbes) Analysis using SQL

## About the dataset

Forbes magazine has published an annual list of America's best colleges since 2008. Schools are ranked on alumni salary (20%), student satisfaction (20%), debt (20%), American leaders (15%), on-time graduation rate (12.5%), and academic success (12.5%).
___

## About this notebook

This notebook contains a brief data analysis and data visualization of the dataset. Most of the analysis compares <code>Public vs Private</code> institutions.

### Tasks performed in this notebook

1. Importing the CSV file into a SQLite database
2. Running SQL queries to analyze the data

### Packages/libraries used

| S.No | Package/library | Use |
| --- | --- | --- |
| 1. | Pandas | Data manipulation and analysis |
| 2. | SQLite | Lightweight SQL database |
| 3. | Plotly | Graphs and charts |

In [1]:
import pandas as pd
import sqlite3
import plotly.express as px
import os

In [None]:
# Reading the CSV file

data = pd.read_csv("data/ForbesAmericasTopColleges2019.csv")

# Establishing the SQL connection

conn = sqlite3.connect("college.db")

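# Inserting the CSV data into the SQLite DB as a table
# (if_exists="replace" lets this cell be re-run without a "table already exists" error)

data.to_sql("college", conn, if_exists="replace")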
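`to_sql` accepts an `if_exists` argument that controls what happens when the target table already exists (the default, `"fail"`, raises a `ValueError`), and an `index` argument that controls whether the pandas row index is written as an extra column. The following is a minimal sketch of a re-runnable import, assuming an in-memory database and a small made-up frame; `toy`, `toy_conn`, and the row values are hypothetical and not taken from the Forbes data.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the CSV; the values are made up for illustration.
toy = pd.DataFrame({
    "Name": ["College A", "College B", "College C"],
    "Public/Private": ["Private", "Public", "Private"],
    "Total Annual Cost": [60000.0, 30000.0, 50000.0],
})

toy_conn = sqlite3.connect(":memory:")  # in-memory DB, nothing is written to disk

# if_exists="replace" drops and recreates the table, so the import can be re-run;
# index=False keeps the pandas row index out of the SQL table.
toy.to_sql("college", toy_conn, if_exists="replace", index=False)
toy.to_sql("college", toy_conn, if_exists="replace", index=False)  # second run is safe

row_count = pd.read_sql("select count(*) as n from college;", toy_conn)["n"].iloc[0]
columns = pd.read_sql("select * from college;", toy_conn).columns.tolist()
toy_conn.close()
```

With `index=False` the table holds only the CSV columns; leaving the default `index=True` adds an `index` column like the one visible in the query output of this notebook.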

In [8]:
pd.read_sql('select * from college;', conn)

Unnamed: 0,index,Rank,Name,City,State,Public/Private,Undergraduate Population,Student Population,Net Price,Average Grant Aid,Total Annual Cost,Alumni Salary,Acceptance Rate,SAT Lower,SAT Upper,ACT Lower,ACT Upper,Website
0,0,1.0,Harvard University,Cambridge,MA,Private,13844.0,31120.0,14327.0,49870.0,69600.0,146800.0,5.0,1460.0,1590.0,32.0,35.0,www.harvard.edu
1,1,2.0,Stanford University,Stanford,CA,Private,8402.0,17534.0,13261.0,50134.0,69109.0,145200.0,5.0,1390.0,1540.0,32.0,35.0,www.stanford.edu
2,2,3.0,Yale University,New Haven,CT,Private,6483.0,12974.0,18627.0,50897.0,71290.0,138300.0,7.0,1460.0,1580.0,32.0,35.0,www.yale.edu
3,3,4.0,Massachusetts Institute of Technology,Cambridge,MA,Private,4680.0,11466.0,20771.0,43248.0,67430.0,155200.0,7.0,1490.0,1570.0,33.0,35.0,web.mit.edu
4,4,5.0,Princeton University,Princeton,NJ,Private,5659.0,8273.0,9327.0,48088.0,66150.0,139400.0,6.0,1430.0,1570.0,31.0,35.0,www.princeton.edu
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
645,645,646.0,New Mexico State University,Las Cruces,NM,Public,13379.0,14432.0,8625.0,9582.0,34720.0,96700.0,64.0,910.0,1160.0,18.0,23.0,www.nmsu.edu
646,646,647.0,Indiana State University,Terre Haute,IN,Public,13626.0,13763.0,13012.0,9297.0,32938.0,85600.0,85.0,900.0,1110.0,17.0,23.0,www.indstate.edu
647,647,648.0,Emory & Henry College,Emory,VA,Private,1094.0,1226.0,19340.0,27155.0,48100.0,70700.0,72.0,988.0,1170.0,19.0,25.0,www.ehc.edu
648,648,649.0,Wells College,Aurora,NY,Private,488.0,516.0,22828.0,30207.0,55180.0,,80.0,,,,,www.wells.edu
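The last row of the output has empty cells (for example, no alumni salary and no SAT range), so some columns contain missing values. One way to count them in SQL relies on `count(column)` skipping NULLs while `count(*)` counts every row. The sketch below assumes a made-up three-row table; `toy`, `toy_conn`, and the values are hypothetical.

```python
import sqlite3
import pandas as pd

# Hypothetical rows with gaps; pandas writes NaN/None to SQLite as NULL.
toy = pd.DataFrame({
    "Name": ["College A", "College B", "College C"],
    "Alumni Salary": [100000.0, None, 80000.0],
    "SAT Lower": [1200.0, None, None],
})

toy_conn = sqlite3.connect(":memory:")
toy.to_sql("college", toy_conn, index=False)

# count(*) counts all rows, count("col") counts only non-NULL values,
# so the difference is the number of missing entries in that column.
missing = pd.read_sql(
    'select count(*) - count("Alumni Salary") as "Missing Alumni Salary", '
    'count(*) - count("SAT Lower") as "Missing SAT Lower" '
    'from college;',
    toy_conn,
)
toy_conn.close()
```

Running the same query against the real `college` table would show how many rows each later average is actually based on.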


## Analysis and visualization

Queries covered in this notebook:

1. Number of colleges public vs private
2. Average cost of study public vs private
3. States with maximum number of colleges
4. Average acceptance rate public vs private
5. Average alumni salary public vs private
6. Student population public vs private
7. Average minimum required SAT score for admission public vs private
8. Average grant aid public vs private
9. Student population vs college rank (concentration of students)

In [9]:
# Number of colleges Private vs Public 

pr_pu = pd.read_sql('select "Public/Private" as "College Type", count(*) as "Colleges in list" from college group by "College Type";', conn)
fig_1 = px.pie(pr_pu, values="Colleges in list", names="College Type", title="Private vs Public Colleges")
fig_1.update_traces(textposition="inside", textinfo="label+percent")
fig_1.show()
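The pie chart derives its percentages from the raw counts. As a sketch, the same shares can also be computed directly in SQL with a scalar subquery for the total row count. The `toy` frame, `toy_conn`, and the `Share (%)` column name are assumptions made for illustration.

```python
import sqlite3
import pandas as pd

# Hypothetical data: three private colleges and one public college.
toy = pd.DataFrame({"Public/Private": ["Private", "Private", "Private", "Public"]})

toy_conn = sqlite3.connect(":memory:")
toy.to_sql("college", toy_conn, index=False)

# Multiplying by 100.0 forces floating-point division;
# the subquery returns the total number of rows in the table.
share = pd.read_sql(
    'select "Public/Private" as "College Type", '
    'count(*) as "Colleges in list", '
    'round(count(*) * 100.0 / (select count(*) from college), 1) as "Share (%)" '
    'from college group by "Public/Private" order by "College Type";',
    toy_conn,
)
toy_conn.close()
```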

In [11]:
# Average cost of studying in Public vs Private

cost = pd.read_sql('select "Public/Private" as "College Type", avg("Total Annual Cost") as "Average Annual Cost" from college group by "College Type";', conn)
px.bar(cost, x="College Type", y="Average Annual Cost", title="Average Cost of study Public vs Private", color="Average Annual Cost")

In [7]:
# States with maximum number of colleges

state = pd.read_sql('select State, count(*) as "Number of colleges" from college group by "State" order by "Number of colleges" desc;', conn)
px.bar(state, x="State", y="Number of colleges", title="Number of colleges in each state", color="Number of colleges")
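The query above returns every state, sorted by count. If only the leading states are of interest, a `limit` clause keeps the top rows. The sketch below assumes made-up state values and a hypothetical cutoff of two; the secondary sort on `State` is added only to make ties deterministic.

```python
import sqlite3
import pandas as pd

# Hypothetical data: CA appears three times, NY twice, TX once.
toy = pd.DataFrame({"State": ["NY", "CA", "CA", "NY", "CA", "TX"]})

toy_conn = sqlite3.connect(":memory:")
toy.to_sql("college", toy_conn, index=False)

# Sort by count (descending), break ties alphabetically, keep the first two rows.
top_states = pd.read_sql(
    'select State, count(*) as "Number of colleges" '
    'from college group by State '
    'order by "Number of colleges" desc, State limit 2;',
    toy_conn,
)
toy_conn.close()
```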

In [10]:
# Average acceptance rate Public Vs Private

acc = pd.read_sql('select "Public/Private" as "College type", avg("Acceptance Rate") as "Average acceptance rate" from college group by "College type";', conn)
fig_2 = px.pie(acc, values="Average acceptance rate", names="College type", title="Average acceptance rate Public Vs Private")
fig_2.update_traces(textposition="inside", textinfo="label+percent")
fig_2.show()

In [14]:
# Average alumni salary Public Vs Private

alu_sal = pd.read_sql('select "Public/Private" as "College type", avg("Alumni Salary") as "Average Alumni Salary" from college group by "College type";', conn)
px.bar(alu_sal, y="Average Alumni Salary", x="College type", title="Average alumni salary Public Vs Private", color="Average Alumni Salary")


In [16]:
# Student population Public Vs Private

stu_pop = pd.read_sql('select "Public/Private" as "College type", avg("Student Population") as "Average Student Population" from college group by "College type";', conn)
px.bar(stu_pop, x="College type", y="Average Student Population", title="Average Student Population Public Vs Private", color="Average Student Population")

In [18]:
# Average minimum required SAT score for admission Public vs Private

sat = pd.read_sql('select "Public/Private" as "College type", avg("SAT Lower") as "Minimum average SAT score required for admission" from college group by "College type";', conn)
px.bar(sat, x="College type", y="Minimum average SAT score required for admission", color="Minimum average SAT score required for admission", title="Minimum average SAT score required for admission Public Vs Private")

In [19]:
# Average grant aid Public vs Private

aid = pd.read_sql('select "Public/Private" as "College type", avg("Average Grant Aid") as "Average Grant Aid" from college group by "College type";', conn)
px.bar(aid, x="College type", y="Average Grant Aid", color="Average Grant Aid", title="Average Grant Aid Public Vs Private")
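The per-metric queries above all share the same `group by` pattern, so several averages can be collected in a single query. The sketch below is one possible way to do that, using a made-up four-row table; `toy`, `toy_conn`, and the `Colleges with salary data` column are hypothetical. It also shows that `avg()` skips NULLs, so an average may rest on fewer rows than the group contains.

```python
import sqlite3
import pandas as pd

# Hypothetical data; one private college has no alumni salary.
toy = pd.DataFrame({
    "Public/Private": ["Private", "Private", "Public", "Public"],
    "Total Annual Cost": [60000.0, 50000.0, 30000.0, 20000.0],
    "Alumni Salary": [100000.0, None, 80000.0, 90000.0],
})

toy_conn = sqlite3.connect(":memory:")
toy.to_sql("college", toy_conn, index=False)

# One pass over the table: several aggregates per college type.
# count("Alumni Salary") reports how many rows the salary average is based on.
summary = pd.read_sql(
    'select "Public/Private" as "College type", '
    'count(*) as "Colleges", '
    'avg("Total Annual Cost") as "Average Annual Cost", '
    'avg("Alumni Salary") as "Average Alumni Salary", '
    'count("Alumni Salary") as "Colleges with salary data" '
    'from college group by "Public/Private" order by "College type";',
    toy_conn,
)
toy_conn.close()
```

Keeping the individual queries, as the notebook does, is equally valid when each result feeds its own chart.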

In [24]:
# Student population vs college rank (concentration of students)

college = pd.read_sql('select * from college;', conn)

px.scatter(college, x="Rank", y="Student Population", color="Public/Private", size="Student Population", title="Student Population vs College ranking")
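The scatter plot shows concentration visually. A complementary sketch, under the assumption that grouping ranks into fixed-width buckets is acceptable, sums the student population per bucket in SQL. The bucket width of 100, the `Rank bucket start` column, and the toy values are all hypothetical choices.

```python
import sqlite3
import pandas as pd

# Hypothetical ranks (stored as floats, as in the notebook output) and populations.
toy = pd.DataFrame({
    "Rank": [1.0, 50.0, 101.0, 150.0, 250.0],
    "Student Population": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],
})

toy_conn = sqlite3.connect(":memory:")
toy.to_sql("college", toy_conn, index=False)

# Ranks 1-100 map to bucket 1, 101-200 to bucket 101, and so on:
# cast(... as integer) truncates the fractional part of (Rank - 1) / 100.
buckets = pd.read_sql(
    'select cast(("Rank" - 1) / 100 as integer) * 100 + 1 as "Rank bucket start", '
    'sum("Student Population") as "Total students" '
    'from college group by "Rank bucket start" order by "Rank bucket start";',
    toy_conn,
)
toy_conn.close()
```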

In [27]:
# Closing the connection

conn.close()
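An explicit `conn.close()` is skipped if an earlier cell raises. As a sketch of an alternative, `contextlib.closing` closes the connection when the block exits, even on error. Note that using a `sqlite3` connection directly in a `with` statement only manages transactions and does not close it. The table contents and the names `toy_conn`, `n_rows`, and `still_open` below are hypothetical.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls toy_conn.close() when the with-block ends, even on an exception.
with closing(sqlite3.connect(":memory:")) as toy_conn:
    pd.DataFrame({"Rank": [1.0, 2.0]}).to_sql("college", toy_conn, index=False)
    n_rows = pd.read_sql("select count(*) as n from college;", toy_conn)["n"].iloc[0]

# The connection is closed at this point; using it raises sqlite3.ProgrammingError.
try:
    toy_conn.execute("select 1;")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
```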