# Project: school and major selection system

## Project purpose
In today's complex educational landscape, choosing the right school and major can be a daunting task for students and their families. To simplify this process and give them data-driven insights, we propose an Education Decision Support System. The system draws on a comprehensive dataset of educational information to provide guidance and visualization tools for informed decision-making. Its purpose is to help students and their families make well-informed choices about their education through data analytics and visualization. The platform has the following key objectives:
<br> &emsp;&emsp;a. search for a list or visualization chart of universities based on the user's preferences
<br> &emsp;&emsp;b. search for a list or visualization chart of majors based on the user's preferences
## Data sources
<br> 1. U.S. Bureau of Labor Statistics (gov)
<br> 2. National Center for Education Statistics (gov)
<br> 3. Post-Secondary Employment Outcomes (PSEO) - Census (gov)
## Examples
<br> Example 1: 
     <br> &emsp;&emsp;The user comes from a low-income family in South Dakota (SD). They want a major that can be completed in 4 years and that leads to a job within 1 year of graduation.
<br> Steps:
     <br> &emsp;&emsp;1. change or select the mode to "find a major"
     <br> &emsp;&emsp;2. change the filter options: set location to SD
     <br> &emsp;&emsp;3. change sort by to: % of graduates employed in-state in year 1 (y1_grads_emp_instate / (y1_grads_emp + y1_grads_nme))
<br> Output:
     <br> &emsp;&emsp;col1: major list (based on cipcode) in SD
     <br> &emsp;&emsp;col2: percentage of graduates employed
     <br> &emsp;&emsp;col3: the top 3 industries (by percentage) in which graduates are employed
<br>
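
A minimal pandas sketch of the Example 1 sort step. It assumes the PSEO rows have already been reduced to a single aggregation level (summing across different `agg_level_pseo` levels would double-count graduates) and that a hypothetical `inst_state` column, for example merged in from label_institution.csv, holds each institution's state. The function name `rank_majors_by_instate_rate` is illustrative, not part of the project.

```python
import pandas as pd

# Year-1 count columns used by the Example 1 formula
EMP_COLS = ["y1_grads_emp_instate", "y1_grads_emp", "y1_grads_nme"]


def rank_majors_by_instate_rate(df: pd.DataFrame, state: str,
                                state_col: str = "inst_state") -> pd.DataFrame:
    """Rank majors (cipcode) in one state by
    y1_grads_emp_instate / (y1_grads_emp + y1_grads_nme), as a percentage.

    state_col is a hypothetical column holding the institution's state.
    """
    subset = df[df[state_col] == state]
    # Sum the counts per major first, then divide.
    totals = subset.groupby("cipcode")[EMP_COLS].sum()
    denom = totals["y1_grads_emp"] + totals["y1_grads_nme"]
    # where(denom > 0) turns a zero denominator into NaN instead of dividing by zero.
    totals["pct_instate_emp"] = 100 * totals["y1_grads_emp_instate"] / denom.where(denom > 0)
    # Highest in-state employment share first; NaN rates sort last.
    return totals.sort_values("pct_instate_emp", ascending=False).reset_index()
```

Summing counts per cipcode before dividing weights each major by its number of graduates, rather than averaging institution-level percentages.
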
<br> Example 2: 
<br> &emsp;&emsp;The student wants to work in the school system and have good career development potential. He is not considering a PhD.
<br> Steps:
    <br>&emsp;&emsp; 1. search for CIP codes whose description includes "school" or "university"
    <br>&emsp;&emsp; 2. change or select the mode to "find a major"
    <br>&emsp;&emsp; 3. add a CIP code filter: select all results from step 1
    <br>&emsp;&emsp; 4. add a degree level filter: select associate, certificate, bachelor's, and master's
    <br>&emsp;&emsp; 5. sort by mean_salary
<br> Output:
    <br>&emsp;&emsp; major list (based on cipcode)
    <br>&emsp;&emsp; top 5 related SOC codes (job titles)
    <br>&emsp;&emsp; top 3 related industries (including, but not limited to, schools)
    <br>&emsp;&emsp; 10-year employment rate
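
Example 2 could be sketched in pandas as below. Several assumptions need checking against the real files: label_cipcode.csv is assumed to have `cipcode` and `label` columns; the crosswalk is assumed to use `CIP2020Code` / `SOC2018Code` and the OEWS file `OCC_CODE` / `A_MEAN`; and CIP codes are assumed to be normalized to one format across all tables. The degree-level codes for associate, certificate, bachelor's (05), and master's must come from label_degree_level.csv. The 10-year employment rate is assumed to follow the same pattern as the Example 1 formula, y10_grads_emp / (y10_grads_emp + y10_grads_nme). All function names are hypothetical.

```python
import re

import pandas as pd


def find_cipcodes(cip_labels: pd.DataFrame, keywords, desc_col="label", code_col="cipcode"):
    """Step 1: CIP codes whose description mentions any keyword (case-insensitive)."""
    pattern = "|".join(re.escape(k) for k in keywords)
    mask = cip_labels[desc_col].str.contains(pattern, case=False, na=False)
    return cip_labels.loc[mask, code_col].tolist()


def rank_by_mean_salary(df, cipcodes, degree_levels, crosswalk, salaries,
                        cw_cip="CIP2020Code", cw_soc="SOC2018Code",
                        sal_soc="OCC_CODE", sal_mean="A_MEAN"):
    """Steps 3-5: keep the selected majors and degree levels, then sort them by
    the average annual mean wage of their related SOC occupations."""
    keep = df["cipcode"].isin(cipcodes) & df["degree_level"].isin(degree_levels)
    majors = pd.DataFrame({"cipcode": df.loc[keep, "cipcode"].unique()})
    # One CIP code can map to several SOC codes through the crosswalk.
    wages = crosswalk[[cw_cip, cw_soc]].merge(
        salaries[[sal_soc, sal_mean]], left_on=cw_soc, right_on=sal_soc)
    # Non-numeric wage markers (e.g. suppressed values) become NaN.
    wages[sal_mean] = pd.to_numeric(wages[sal_mean], errors="coerce")
    mean_salary = wages.groupby(cw_cip)[sal_mean].mean()
    majors["mean_salary"] = majors["cipcode"].map(mean_salary)
    return majors.sort_values("mean_salary", ascending=False).reset_index(drop=True)


def top_industries(ind_rows: pd.DataFrame, n=3, count_col="y1_grads_emp"):
    """Top-n industries per major by share of employed graduates.
    Assumes ind_rows holds only industry-level records."""
    totals = ind_rows.groupby(["cipcode", "industry"], as_index=False)[count_col].sum()
    totals["share"] = totals[count_col] / totals.groupby("cipcode")[count_col].transform("sum")
    ranked = totals.sort_values(["cipcode", "share"], ascending=[True, False])
    return ranked.groupby("cipcode").head(n).reset_index(drop=True)


def employment_rate(df: pd.DataFrame, year=10) -> pd.Series:
    """Assumed definition: employed / (employed + jobless or marginally employed)."""
    emp, nme = df[f"y{year}_grads_emp"], df[f"y{year}_grads_nme"]
    denom = emp + nme
    return emp / denom.where(denom > 0)
```
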

## import packages

```python
import pandas as pd
import matplotlib.pyplot as plt  # pyplot provides the plt.plot / plt.show interface
import numpy as np
import seaborn as sb
import tkinter as tk
```

## load pseof_all.csv (data)
### variable list - pseof_all.csv
<br> agg_level_pseo: index representing level of aggregation reported on a given record
<br> inst_level: tabulation level of the institution 
<br> institution: institution ID (label_institution.csv)
<br> degree_level: degree level code (label_degree_level.csv)
<br> cip_level: degree field level of aggregation (label_cip_level.csv)
<br> cipcode: degree field (label_cipcode.csv)
<br> grad_cohort: first year of graduation cohort (YYYY) - All Cohorts 0000 
    grad_cohort is a 4-digit number representing the first year of the graduation cohort. The number of years in the cohort is reported in the separate grad_cohort_years variable. When tabulating across all cohorts, the value 0000 is used for grad_cohort. If grad_cohort=2010 and grad_cohort_years=3, then the cell includes graduates from 2010, 2011, and 2012.
<br> grad_cohort_years: the number of years in the cohort of reference (see grad_cohort). It varies by degree level: bachelor's degrees (05) are reported in 3-year cohorts, and all other degrees in 5-year cohorts, so grad_cohort_years takes the value 3 or 5. Because tabulations are not done across degree types, the appropriate value is reported in grad_cohort_years even when grad_cohort=0000.
<br> geo_level: group: geographic level of employment. Geography labels for data files are provided in separate files, by scope. Each file 'label_geography_SCOPE.csv' may contain one or more types of records as flagged by geo_level. (label_geo_level.csv) (label_geography_division.csv)
<br> geography: group: geography code of employment (label_fipsnum.csv) (label_stups.csv)(state)
<br> ind_level: group: industry level of employment (label_ind_level.csv)
<br> industry: group: industry code of employment (label_industry.csv)
<br> y1_grads_emp: Count of Employed Graduates in Year 1
<br> y1_grads_emp_instate: Count of Graduates Employed in Same State as Educational Institution in Year 1
<br> y5_grads_emp: Count of Employed Graduates in Year 5
<br> y5_grads_emp_instate: Count of Graduates Employed in Same State as Educational Institution in Year 5
<br> y10_grads_emp: Count of Employed Graduates in Year 10
<br> y10_grads_emp_instate: Count of Graduates Employed in Same State as Educational Institution in Year 10
<br> y1_grads_nme: Count of Graduates Jobless or Marginally Employed in Year 1
<br> y5_grads_nme: Count of Graduates Jobless or Marginally Employed in Year 5
<br> y10_grads_nme: Count of Graduates Jobless or Marginally Employed in Year 10 
<br> status_y1_grads_emp, status_y1_grads_emp_instate, status_y5_grads_emp, status_y5_grads_emp_instate, status_y10_grads_emp, status_y10_grads_emp_instate, status_y1_grads_nme, status_y5_grads_nme, status_y10_grads_nme: Standard Status Flags (7.1) https://lehd.ces.census.gov/data/schema/V4.7.0/lehd_public_use_schema.html
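
The grad_cohort rule above can be written as a small helper (the name `cohort_years` is hypothetical). It returns None for the all-cohorts value 0000, since those records do not correspond to a specific range of years.

```python
from typing import List, Optional


def cohort_years(grad_cohort: int, grad_cohort_years: int) -> Optional[List[int]]:
    """Graduation years covered by one PSEO record.

    grad_cohort is the first graduation year; 0 (written 0000 in the docs)
    means the record pools all cohorts, so no specific year range applies.
    """
    if grad_cohort == 0:
        return None
    return list(range(grad_cohort, grad_cohort + grad_cohort_years))
```

For example, `cohort_years(2010, 3)` returns `[2010, 2011, 2012]`, matching the grad_cohort description.
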


### variable list - label_cip_level
### variable list - label_cipcode
### variable list - label_degree_level
### variable list - label_fipsnum
### variable list - label_geo_level
### variable list - label_geography_division
### variable list - label_ind_level
### variable list - label_industry
### variable list - label_institution
### variable list - CIP2020_SOC2018_Crosswalk.csv (CIP-to-SOC crosswalk)
### variable list - national_M2022_dl.xlsx (2022 salaries by SOC code)
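
Each code column in the variable list points to a label file. A hedged sketch for attaching a description, assuming a label file shares the code column name with the data and stores the description in a column called `label` (verify the real header first). The code column must also have the same dtype on both sides, for example text in both; the helper name `attach_labels` is hypothetical.

```python
import pandas as pd


def attach_labels(data: pd.DataFrame, labels: pd.DataFrame, code_col: str,
                  label_col: str = "label") -> pd.DataFrame:
    """Add a '<code_col>_label' description column to data via a left join."""
    # Rename the description so several label files can be attached without clashes.
    lookup = labels[[code_col, label_col]].rename(columns={label_col: f"{code_col}_label"})
    # validate="many_to_one" raises MergeError if a code is repeated in the label file.
    return data.merge(lookup, on=code_col, how="left", validate="many_to_one")
```

Codes missing from the label file get NaN in the new column, which makes unmatched codes easy to spot.
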

```python
data = pd.read_csv("pseof_all.csv")
data.head()
```



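| Unnamed: 0 | agg_level_pseo | inst_level | institution | degree_level | cip_level | cipcode | grad_cohort | grad_cohort_years | geo_level | geography | ... | y10_grads_nme | status_y1_grads_emp | status_y1_grads_emp_instate | status_y5_grads_emp | status_y5_grads_emp_instate | status_y10_grads_emp | status_y10_grads_emp_instate | status_y1_grads_nme | status_y5_grads_nme | status_y10_grads_nme |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 38 | I | 105100 | 5 | A | 0 | 0 | 3 | N | 0 | ... | 7281.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 38 | I | 105100 | 7 | A | 0 | 0 | 5 | N | 0 | ... | 2646.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 38 | I | 105100 | 17 | A | 0 | 0 | 5 | N | 0 | ... | 390.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 38 | I | 105100 | 18 | A | 0 | 0 | 5 | N | 0 | ... | 527.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | 38 | I | 105200 | 2 | A | 0 | 0 | 5 | N | 0 | ... | 11.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |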
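
In the output above, degree_level appears as 5 and 7 although the variable list writes bachelor's as 05. This is likely because pandas parses code columns as integers, which drops any leading zeros. A sketch that keeps code columns as text, assuming the column list below matches the file and that the leading `Unnamed: 0` column is a saved row index that can serve as the DataFrame index (the helper name `load_pseo` is hypothetical):

```python
import pandas as pd

# Code columns kept as text so leading zeros (e.g. degree_level "05") survive.
CODE_COLS = ["agg_level_pseo", "inst_level", "institution", "degree_level", "cip_level",
             "cipcode", "geo_level", "geography", "ind_level", "industry"]


def load_pseo(path_or_buffer) -> pd.DataFrame:
    """Read pseof_all.csv with code columns as strings; count columns stay numeric."""
    # index_col=0 assumes the unnamed first column is a previously saved index.
    return pd.read_csv(path_or_buffer, dtype={c: str for c in CODE_COLS}, index_col=0)
```

Reading codes as text also keeps them compatible with label files loaded the same way.
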


## data cleaning

## recode