<a href="https://colab.research.google.com/github/cfcastillo/DS-6-Notebooks/blob/main/Education_Capstone_MS2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TASK LIST

1. Drill in on finding data that identifies occupations that need to be filled.
   * Glassdoor: top x jobs for the next n years
   * Review data sources against the desired predictors and identify the sources that provide that data
     * Copy the dataset into the Google Drive Data folder

**MS-2 requirements**

Your Jupyter Notebook needs to contain the following: 
* All code needed to clean and format your data
* A written description of the data cleaning process


# Post Secondary Enrollment Trends

Over the last 20 years, college enrollment has trended downward. In the 1980s and 1990s, a college degree was a requirement for many jobs, and without Internet resources, college was almost the only means of obtaining higher-level education. With the Internet, YouTube&trade;, and many other online learning resources, information and education outside college are now plentiful. Today, more employers hire people based on experience or on education gained through alternate channels. Education is still important; what is changing is the mechanism for obtaining it.

**Possible Audiences**
* Any higher education board in NM
* Future - go nationally

**Problem 1 Definition**

* What factors (predictors) determine post secondary education choice (target)? **Supervised Classification**
* Target - any post secondary school in NM? 
  * What can we do to keep people in NM or entice students to come to NM? **Goal:** Keep the funds in NM.
* Scope - County or Tract level - percentage of students that take each route.

**Problem 2 Definition**
* What factors (predictors) determine career choice (target)? **Supervised Classification**
* Target - job classification (TODO: How are jobs classified)

**Questions to Explore**
* How much have trades gone down? **How many jobs are out there that need to be filled?** For example, plumbers and electricians are aging out.
* Would it be beneficial for colleges and community colleges to present more topics in the trades or more focused certificate programs?
* What can CNM do to increase enrollment?
  * More focused programs?
  * More programs related to individual trades - understand where the trade gaps / actual needs are and have programs focused on filling those gaps.

**List of Predictors**

* Demographic data (gender, race, age, income, location, household size, citizenship, parents' education level)
* Learning costs (tuition, board, fees, travel)
  * Scholarships and loans available
* Learning entry requirements (test scores, interview requirements)
* Learning choices (college/university, trade school, Udemy, Coursera)
* Learning characteristics (in person, remote, self-paced, proctored, certificate, hours)
* Job/trade availability by type
  * Education requirements for jobs/trades
* Education medium (in person, online, traditional university/college, degree vs. certificate, trade school, accelerated bootcamp)
* Affordable housing / cost of living
* Expected pay after completing education
* How long it will take to get a related job after completing education

# Process

First, we stated the questions we wanted answered (the targets). After defining the problem, we listed the sets of variables (predictors) that we believed could answer those questions. We put the variables and targets into a [spreadsheet](https://docs.google.com/spreadsheets/d/1bOhOBHKOae9TDN9n9-xF7ag4QW_Z0c7HXTYLXeMMLHs/edit#gid=0) to define the dataset needed for the analysis. We then researched data sources, such as the Bureau of Labor Statistics and the US Census, to locate data that supported our research. Finally, we mapped the columns in the data sources to the columns in our desired dataset and linked the datasets by target code value.

TODO: finish this up.

# Imports

In [None]:
# grab the imports needed for the project
import pandas as pd
import glob
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import statsmodels.api as sm

# all
from sklearn import datasets
from sklearn import metrics
from sklearn import preprocessing
from sklearn.metrics import classification_report
import sklearn.model_selection as model_selection

# Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB

# Regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error



# Data Collection - Researching

[NM Geo Census data - 2020](https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=Census+Tracts)

*Not helpful. It contains only tract and county geometry with latitudes and longitudes, and no features.*

[US College Enrollment data](https://educationdata.org/college-enrollment-statistics)

*Per educationdata.org, for nearly every state, around 90% of students are from outside the state.* 

*In NM, there has been a 24% decline in enrollment since 2010 while other states have declined less or have increased. Why?*

[Percent of recent HS grads enrolled in college](https://nces.ed.gov/programs/digest/d18/tables/dt18_302.20.asp)

[OECD.stat Organization for Economic Cooperation and Development](https://stats.oecd.org/Index.aspx)

*Note: Most recent data is 2012. Need to investigate further*

[BEA.gov US Bureau of Economic Analysis](https://www.bea.gov/data/by-place-us)

[UNESCO Institute for Statistics](http://data.uis.unesco.org/)

[National Science Foundation - Science and Engineering Indicators](https://ncses.nsf.gov/indicators/states/indicator/state-student-aid-expenditures-per-full-time-undergraduate-student)

[American Community Survey (ACS)](https://www.census.gov/programs-surveys/acs)

* It is the premier source for detailed population and housing information about our nation. 

[Federal School Code List](https://fsapartners.ed.gov/knowledge-center/library/federal-school-code-lists/2021-07-28/2021-2022-federal-school-code-list-excel-format)

* Format - Excel
* Contains list of federal school codes to be used as target when determining school choice.

# Data Collection - Accepted Sources

[Annual Social and Economic Supplement (ASEC) 2020 Public Use Dictionary](https://www2.census.gov/programs-surveys/cps/datasets/2020/march/ASEC2020ddl_pub_full.pdf)

* Format - PDF
* Contains data dictionary for public use annual survey.

[Current Population Survey (CPS) ASEC Supplement](https://www2.census.gov/programs-surveys/cps/techdocs/cpsmar20.pdf)

* Format - PDF
* Contains appendices with code descriptions for the annual survey.

[Annual Social and Economic Supplement (ASEC) 2020 Data](https://www.census.gov/data/datasets/2020/demo/cps/cps-asec-2020.html)

* Format - CSV/ASCII or SAS
* Contains data for public use annual survey - no replicate weights.

[Occupational Employment and Wage Statistics (OES) Data](https://data.bls.gov/oes/#/geoOcc/Multiple%20occupations%20for%20one%20geographical%20area)

* Format - Excel - converted to CSV
* Contains data for SOC occupations such as jobs available and hourly wage
* Will combine by SOC Occupation code and state code with ASEC data

Footnotes for OES data:
* (1) Estimates for detailed occupations do not sum to the totals because the totals include occupations not shown separately. Estimates do not include self-employed workers.
* (2) Annual wages have been calculated by multiplying the corresponding hourly wage by 2,080 hours.
* (3) The relative standard error (RSE) is a measure of the reliability of a survey statistic. The smaller the relative standard error, the more precise the estimate.
* (4) Wages for some occupations that do not generally work year-round, full time, are reported either as hourly wages or annual salaries depending on how they are typically paid.
* (5) This wage is equal to or greater than \$100.00 per hour or \$208,000 per year.
* (8) Estimate not released.
* SOC code: Standard Occupational Classification code -- see http://www.bls.gov/soc/home.htm
* Date extracted: Sep 22, 2021
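
Footnote (2) can be checked with a line of arithmetic: 2,080 hours is 40 hours a week for 52 weeks, and applying it to the \$100.00 hourly cap in footnote (5) gives the \$208,000 annual cap. A minimal sketch; the constant and function names are illustrative and are not part of the BLS data:

```python
HOURS_PER_YEAR = 40 * 52  # 2,080 full-time hours, per footnote (2)


def annual_from_hourly(hourly_wage):
    """Convert an hourly wage to an annual wage as described in footnote (2)."""
    return round(hourly_wage * HOURS_PER_YEAR, 2)


print(annual_from_hourly(100.00))
```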




# Data Cleaning - MS-2 - Oct 1

In [None]:
# Mount Drive
from google.colab import drive
drive.mount('/drive')

Drive already mounted at /drive; to attempt to forcibly remount, call drive.mount("/drive", force_remount=True).


In [None]:
# Import Data
asec_path = '/drive/MyDrive/Student Folder - Cecilia/Projects/Capstone/Data/ASEC/asecpub20csv/'
asec_data_person = pd.read_csv(asec_path + 'pppub20.csv')
asec_data_household = pd.read_csv(asec_path + 'hhpub20.csv')
asec_data_family = pd.read_csv(asec_path + 'ffpub20.csv')

# Join and import all 50 states' occupation data
oes_path = '/drive/MyDrive/Student Folder - Cecilia/Projects/Capstone/Data/Occupations/'
oes_file_names = glob.glob(oes_path + "*.csv")
li = []
for filename in oes_file_names:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
oes_data = pd.concat(li, axis=0, ignore_index=True)

In [None]:
# View Person Data
asec_data_person.head()

Unnamed: 0,PERIDNUM,PH_SEQ,P_SEQ,A_LINENO,PF_SEQ,PHF_SEQ,OED_TYP1,OED_TYP2,OED_TYP3,PERRP,PXRRP,PXMARITL,PXRACE1,PEHSPNON,PXHSPNON,PEAFEVER,PXAFEVER,PEAFWHN1,PXAFWHN1,PEAFWHN2,PEAFWHN3,PEAFWHN4,PXSPOUSE,PENATVTY,PXNATVTY,PEMNTVTY,PXMNTVTY,PEFNTVTY,PXFNTVTY,PEINUSYR,PXINUSYR,PEPAR1,PXPAR1,PEPAR2,PXPAR2,PEPAR1TYP,PXPAR1TYP,PEPAR2TYP,PXPAR2TYP,PRDASIAN,...,TRNT_VAL,TCAP_VAL,TDIV_VAL,TCSP_VAL,TED_VAL,TCHSP_VAL,TPHIP_VAL,TPHIP_VAL2,TPMED_VAL,TPOTC_VAL,TPEMCPREM,TCERNVAL,TCWSVAL,TCSEVAL,TCFFMVAL,TSURVAL1,TSURVAL2,TDISVAL1,TDISVAL2,TAX_ID,PEIOIND,PEIOOCC,A_WERNTF,A_HERNTF,I_DISVL1,I_DISVL2,I_SURVL1,I_SURVL2,MIG_CBST,MIG_DSCP,M5G_CBST,M5G_DSCP,CLWK,DEP_STAT,FILEDATE,FILESTAT,LJCW,NOEMP,WECLW,YYYYMM
0,8329611509015080901101,1,1,1,1,1,0,0,0,40,0,0,0,2,0,2,0,-1,1,-1,-1,-1,0,57,0,57,0,57,0,0,0,-1,1,-1,1,-1,1,-1,1,-1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,101,9480,440,0,0,0,0,0,0,0,0,0,0,2,0,31821,2,3,6,6,202003
1,8329611509015080901102,1,2,2,1,1,0,0,0,42,0,0,0,2,0,2,0,-1,1,-1,-1,-1,0,57,0,57,0,57,0,0,0,-1,1,-1,1,-1,1,-1,1,-1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,101,0,-1,0,0,0,0,0,0,0,0,0,0,2,0,31821,2,4,2,6,202003
2,4238996011902050901101,2,1,1,1,1,0,0,0,40,0,0,0,2,0,1,0,2,0,3,-1,-1,0,57,0,57,0,57,0,0,0,-1,1,-1,1,-1,1,-1,1,-1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,201,7860,9121,0,0,0,0,0,0,0,0,0,0,2,0,31821,2,4,3,6,202003
3,4238996011902050901102,2,2,2,1,1,0,0,0,42,0,0,0,2,0,2,0,-1,1,-1,-1,-1,0,57,0,57,0,57,0,0,0,-1,1,-1,1,-1,1,-1,1,-1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,201,0,-1,0,0,0,0,0,0,0,0,0,0,5,0,31821,2,0,0,9,202003
4,2059506120093750901101,3,1,1,1,1,0,0,0,41,0,0,0,2,0,2,0,-1,1,-1,-1,-1,1,57,0,57,0,57,0,0,0,-1,1,-1,1,-1,1,-1,1,-1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,301,6290,5500,0,0,0,0,0,0,0,0,0,0,1,0,31821,5,1,5,5,202003


In [None]:
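# Derive the household key: H_IDNUM is the first 20 characters of PERIDNUM.
# This relies on PERIDNUM being held as text rather than as a number.
asec_data_person['H_IDNUM'] = asec_data_person['PERIDNUM'].str[:20]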
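
The slice above only works while `PERIDNUM` is stored as text. One hedged way to guarantee that is to request a string dtype for the ID column when reading the file. The inline CSV below is stand-in data (the IDs are copied from the preview above), not the real `pppub20.csv`:

```python
import io

import pandas as pd

# Stand-in for the person file: two person records from one household.
person_csv = io.StringIO(
    "PERIDNUM,A_SEX\n"
    "8329611509015080901101,2\n"
    "8329611509015080901102,1\n"
)

# dtype=str keeps the 22-digit ID as text, so no digits are lost to
# numeric conversion and string slicing is always available.
person = pd.read_csv(person_csv, dtype={"PERIDNUM": str})
person["H_IDNUM"] = person["PERIDNUM"].str[:20]

print(person["H_IDNUM"].tolist())
```

The same `dtype` argument could be applied to `H_IDNUM` when reading the household file so that both join keys have the same type.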

In [None]:
# View Household Data
# Links to the person dataset through H_IDNUM (the first 20 characters of PERIDNUM).
asec_data_household.head()

Unnamed: 0,H_IDNUM,GEREG,GESTFIPS,GEDIV,HRHTYPE,HEFAMINC,H_MONTH,H_YEAR,H_TENURE,H_HHNUM,H_LIVQRT,H_RESPNM,H_TELHHD,H_TELAVL,H_TELINT,H1TENURE,H1LIVQRT,H1TELHHD,H1TELAVL,H1TELINT,H_NUMPER,H_HHTYPE,H_TYPEBC,H_MIS,HANNVAL,HANN_YN,HCHCARE_VAL,HCHCARE_YN,HCOV,HCSPVAL,HCSP_YN,HDISVAL,HDIS_YN,HDIVVAL,HDIV_YN,HDSTVAL,HDST_YN,HEARNVAL,HEDVAL,HED_YN,...,HTOTVAL,HUCVAL,HUNDER15,HUNDER18,HUNITS,HVETVAL,HVET_YN,HWCVAL,HWSVAL,H_SEQ,I_CHCAREVAL,I_HENGAS,I_HENGVA,I_HFDVAL,I_HFLUNC,I_HFLUNN,I_HFOODM,I_HFOODN,I_HFOODS,I_HHOTLU,I_HHOTNO,I_HLOREN,I_HPUBLI,I_HUNITS,I_PROPVAL,NOW_HCOV,NOW_HMCAID,NOW_HPRIV,NOW_HPUB,THCHCARE_VAL,THPROP_VAL,GTCBSA,GTCO,GTCBSAST,GTCBSASZ,GTCSA,GTMETSTA,GTINDVPC,FILEDATE,YYYYMM
0,83296115090150809011,1,23,1,1,15,3,2020,1,1,1,1,1,0,1,0,0,0,0,0,2,1,0,1,0,2,-1,0,1,0,2,0,2,0,2,124,1,108500,0,2,...,127449,0,0,0,1,0,2,0,108500,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,1,2,0,0,0,0,3,0,0,2,0,31821,202003
1,42389960119020509011,1,23,1,1,7,3,2020,1,1,1,1,1,0,1,0,0,0,0,0,2,1,0,4,0,2,-1,0,1,0,2,0,2,0,2,0,2,34000,0,2,...,64680,0,0,0,1,10000,1,0,34000,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,1,2,0,0,0,0,3,0,0,2,0,31821,202003
2,20595061200937509011,1,23,1,7,14,3,2020,1,1,1,1,1,0,1,0,0,0,0,0,1,1,0,2,0,2,-1,0,1,0,2,0,2,0,2,0,2,40000,0,2,...,40002,0,0,0,1,0,2,0,40000,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,1,3,0,0,0,0,3,0,0,2,0,31821,202003
3,14103203009699909011,1,23,1,4,7,3,2020,2,1,1,1,1,0,1,0,0,0,0,0,2,1,0,1,0,2,-1,0,1,0,2,0,2,0,2,0,2,8424,0,2,...,8424,0,0,0,5,0,2,0,8424,4,0,0,0,2,0,0,2,0,0,0,0,1,0,0,0,1,2,3,1,0,0,0,0,3,0,0,2,0,31821,202003
4,21010593220024309011,1,23,1,1,15,3,2020,1,1,1,1,1,0,1,0,0,0,0,0,4,1,0,4,0,2,-1,0,1,0,2,0,2,0,2,0,2,58000,600,1,...,59114,0,0,0,1,0,2,0,58000,5,0,1,0,0,0,0,0,0,1,0,0,0,0,0,2,1,3,1,3,0,0,0,0,3,0,0,2,0,31821,202003


In [None]:
# View Family Data
# NOTE: The family file will probably not be needed.
# asec_data_family.head()

In [None]:
# Pick desired columns - Personal Data
asec_data_person[['H_IDNUM','PEIOOCC','OCCUP','A_DTOCC','A_MJOCC','AGE1','A_SEX','PRDTRACE','PRCITSHP','A_HGA','A_HRSPAY']]


Unnamed: 0,H_IDNUM,PEIOOCC,OCCUP,A_DTOCC,A_MJOCC,AGE1,A_SEX,PRDTRACE,PRCITSHP,A_HGA,A_HRSPAY
0,83296115090150809011,440,440,1,1,14,2,1,1,39,-1
1,83296115090150809011,-1,8620,0,0,15,1,1,1,39,-1
2,42389960119020509011,9121,9121,22,10,14,1,1,1,39,-1
3,42389960119020509011,-1,0,0,0,16,2,1,1,36,-1
4,20595061200937509011,5500,5500,17,5,11,2,1,1,39,-1
...,...,...,...,...,...,...,...,...,...,...,...
157954,75152117000272511111,-1,0,0,0,16,2,5,1,40,0
157955,75152117000272511111,-1,4760,0,0,11,2,4,3,43,0
157956,56040036441120111111,-1,0,0,0,17,2,4,4,33,0
157957,12210061709659011111,4920,4920,16,4,12,2,4,1,43,0


In [None]:
# Pick desired columns - Household Data
asec_data_household[['H_IDNUM','GTCO','GTCSA','GTINDVPC','GTMETSTA','GEDIV','GEREG','GESTFIPS','HEFAMINC','HHINC']]


Unnamed: 0,H_IDNUM,GTCO,GTCSA,GTINDVPC,GTMETSTA,GEDIV,GEREG,GESTFIPS,HEFAMINC,HHINC
0,83296115090150809011,0,0,0,2,1,1,23,15,41
1,42389960119020509011,0,0,0,2,1,1,23,7,26
2,20595061200937509011,0,0,0,2,1,1,23,14,17
3,14103203009699909011,0,0,0,2,1,1,23,7,4
4,21010593220024309011,0,0,0,2,1,1,23,15,24
...,...,...,...,...,...,...,...,...,...,...
91495,15196319090801411111,3,0,0,1,9,4,15,11,17
91496,12505071701752211111,3,0,0,1,9,4,15,7,9
91497,75152117000272511111,3,0,0,1,9,4,15,6,29
91498,56040036441120111111,3,0,0,1,9,4,15,6,6


In [None]:
# Join Household and Personal records into single dataframe
asec_combined = pd.merge(asec_data_household, asec_data_person, on='H_IDNUM')
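
A join like this can be sanity-checked with the `validate` and `indicator` options of `pd.merge`. The sketch below uses small made-up tables rather than the ASEC files, and it switches to an outer join purely for diagnosis (the cell above uses the default inner join):

```python
import pandas as pd

# Toy stand-ins for the household and person tables (values are illustrative).
households = pd.DataFrame({
    "H_IDNUM": ["83296115090150809011", "42389960119020509011"],
    "GESTFIPS": [23, 23],
})
persons = pd.DataFrame({
    "H_IDNUM": ["83296115090150809011", "83296115090150809011",
                "42389960119020509011"],
    "A_SEX": [2, 1, 1],
})

# validate raises MergeError if a household ID is duplicated on the left side;
# indicator adds a _merge column showing which table each row came from.
combined = pd.merge(households, persons, on="H_IDNUM", how="outer",
                    validate="one_to_many", indicator=True)

# Rows not marked "both" are households without persons, or the reverse.
unmatched = combined[combined["_merge"] != "both"]
print(len(combined), len(unmatched))
```

If `unmatched` is empty, an inner join loses no rows, and the expected row count of the combined table equals the row count of the person table.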

In [None]:
# View combined result
asec_combined[['H_IDNUM','GTCO','GTCSA','GTINDVPC','GTMETSTA','GEDIV','GEREG','GESTFIPS','HEFAMINC','HHINC',
                     'PEIOOCC','OCCUP','A_DTOCC','A_MJOCC','AGE1','A_SEX','PRDTRACE','PRCITSHP','A_HGA','A_HRSPAY']]

Unnamed: 0,H_IDNUM,GTCO,GTCSA,GTINDVPC,GTMETSTA,GEDIV,GEREG,GESTFIPS,HEFAMINC,HHINC,PEIOOCC,OCCUP,A_DTOCC,A_MJOCC,AGE1,A_SEX,PRDTRACE,PRCITSHP,A_HGA,A_HRSPAY
0,83296115090150809011,0,0,0,2,1,1,23,15,41,440,440,1,1,14,2,1,1,39,-1
1,83296115090150809011,0,0,0,2,1,1,23,15,41,-1,8620,0,0,15,1,1,1,39,-1
2,42389960119020509011,0,0,0,2,1,1,23,7,26,9121,9121,22,10,14,1,1,1,39,-1
3,42389960119020509011,0,0,0,2,1,1,23,7,26,-1,0,0,0,16,2,1,1,36,-1
4,20595061200937509011,0,0,0,2,1,1,23,14,17,5500,5500,17,5,11,2,1,1,39,-1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157954,75152117000272511111,3,0,0,1,9,4,15,6,29,-1,0,0,0,16,2,5,1,40,0
157955,75152117000272511111,3,0,0,1,9,4,15,6,29,-1,4760,0,0,11,2,4,3,43,0
157956,56040036441120111111,3,0,0,1,9,4,15,6,6,-1,0,0,0,17,2,4,4,33,0
157957,12210061709659011111,3,0,0,1,9,4,15,13,41,4920,4920,16,4,12,2,4,1,43,0


In [None]:
# Look at OES data
oes_data.info()
# TODO: convert state to state code.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19680 entries, 0 to 19679
Data columns (total 19 columns):
 #   Column                                         Non-Null Count  Dtype 
---  ------                                         --------------  ----- 
 0   State                                          19670 non-null  object
 1   Occupation (SOC code)                          19671 non-null  object
 2   Employment(1)                                  19670 non-null  object
 3   Employment percent relative standard error(3)  19670 non-null  object
 4   Hourly mean wage                               19670 non-null  object
 5   Annual mean wage(2)                            19670 non-null  object
 6   Wage percent relative standard error(3)        19670 non-null  object
 7   Hourly 10th percentile wage                    19670 non-null  object
 8   Hourly 25th percentile wage                    19670 non-null  object
 9   Hourly median wage                             19670 non-null
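
The TODO about converting the state to a state code could be approached with a lookup from postal abbreviation to FIPS state code, which is the coding used by `GESTFIPS` in the ASEC household data. A minimal sketch, assuming the `State` column holds two-letter abbreviations; the `STATE_FIPS` dictionary is deliberately partial and `oes_sample` is made-up data:

```python
import pandas as pd

# Partial postal-abbreviation -> FIPS state code lookup (extend to all states).
STATE_FIPS = {"CA": 6, "HI": 15, "ME": 23, "NM": 35}

oes_sample = pd.DataFrame({"State": ["ME", "HI", "ALL", "NM"]})

# map() leaves NaN for anything missing from the lookup, such as an "ALL" row;
# the nullable Int64 dtype keeps the matched codes as integers.
oes_sample["GESTFIPS"] = oes_sample["State"].map(STATE_FIPS).astype("Int64")

print(oes_sample)
```

Leaving unmatched labels as missing values, rather than guessing a code, makes them easy to find and filter before the join.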

In [None]:
# Shorten column names
oes_data.rename(columns={'State':'ST_ABBREV',
                         'Occupation (SOC code)':'SOC_DESC',
                         'Employment(1)':'EMP',
                         'Employment percent relative standard error(3)':'EMP_RSDE',
                         'Hourly mean wage':'HOURLY_MEAN',
                         'Annual mean wage(2)':'ANN_MEAN',
                         'Wage percent relative standard error(3)':'WAGE_RSDE',
                         'Hourly 10th percentile wage':'HOURLY_10TH',
                         'Hourly 25th percentile wage':'HOURLY_25TH',
                         'Hourly median wage':'HOURLY_MEDIAN',
                         'Hourly 75th percentile wage':'HOURLY_75TH',
                         'Hourly 90th percentile wage':'HOURLY_90TH',
                         'Annual 10th percentile wage(2)':'ANN_10TH',
                         'Annual 25th percentile wage(2)':'ANN_25TH',
                         'Annual median wage(2)':'ANN_MEDIAN',
                         'Annual 75th percentile wage(2)':'ANN_75TH',
                         'Annual 90th percentile wage(2)':'ANN_90TH',
                         'Employment per 1,000 jobs':'EMP_PER_1000',
                         'Location Quotient':'LOC_QUOTIENT'}, inplace=True)

In [None]:
# Verify we have all states
oes_data['ST_ABBREV'].value_counts()

ALL    1040
CA      797
FL      790
IL      775
MI      753
GA      749
IN      747
CO      742
MN      742
MD      739
MA      735
MO      733
AL      731
LA      730
AZ      729
IA      711
KY      709
MS      688
KS      685
CT      682
AR      666
ID      628
MT      627
ME      626
HI      567
DE      523
AK      519
DC      507
Name: ST_ABBREV, dtype: int64
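
The output above lists 27 state labels plus `ALL`, so some states appear to be missing from the loaded files. The check can also be done programmatically. A minimal sketch, assuming a hypothetical `EXPECTED` set (shortened here; a real check would list all 50 states plus DC) and made-up sample rows:

```python
import pandas as pd

# Hypothetical, shortened expectation list for illustration only.
EXPECTED = {"AK", "AL", "CA", "NM", "NY"}

sample = pd.DataFrame({"ST_ABBREV": ["ALL", "CA", "CA", "AK", "AL"]})

present = set(sample["ST_ABBREV"].dropna().unique())
missing = sorted(EXPECTED - present)      # expected but absent from the data
unexpected = sorted(present - EXPECTED)   # labels that are not in the list

print("missing:", missing)
print("unexpected:", unexpected)
```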

In [None]:
# Parse out SOC code from the description. The code is inside parentheses.
def getSocCode(value):
    """Return the text between the first '(' and the first ')' in value."""
    # If the code cannot be parsed (no parentheses, or a non-string such as NaN),
    # return the value from the file unchanged.
    try:
        return value[value.index('(') + 1:value.index(')')]
    except (ValueError, AttributeError):
        return value

oes_data['SOC_CODE'] = oes_data['SOC_DESC'].apply(getSocCode)

In [None]:
# Verify code was properly parsed
oes_data[['SOC_DESC','SOC_CODE']]

Unnamed: 0,SOC_DESC,SOC_CODE
0,All Occupations(000000),000000
1,Management Occupations(110000),110000
2,Top Executives(111000),111000
3,Chief Executives(111011),111011
4,General and Operations Managers(111021),111021
...,...,...
19675,Stockers and Order Fillers(537065),537065
19676,"Pump Operators, Except Wellhead Pumpers(537072)",537072
19677,Wellhead Pumpers(537073),537073
19678,Refuse and Recyclable Material Collectors(537081),537081
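
As an alternative to the row-by-row parser, the code could be pulled out with a vectorized regular expression. This is a sketch on made-up rows, and it behaves differently from `getSocCode` in two ways: it takes the digits in the final pair of parentheses, and it yields a missing value instead of echoing the original text when nothing matches:

```python
import pandas as pd

sample = pd.DataFrame({"SOC_DESC": [
    "All Occupations(000000)",
    "Pump Operators, Except Wellhead Pumpers(537072)",
    None,              # missing description
    "No code here",    # description without a code
]})

# Capture the digits inside the parentheses at the end of the description.
sample["SOC_CODE"] = sample["SOC_DESC"].str.extract(r"\((\d+)\)\s*$", expand=False)

print(sample)
```

Getting missing values for unparsed rows makes it straightforward to count them with `isna().sum()` before deciding how to handle them.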


In [None]:
# TODO: Handle null data 
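
One possible approach to this TODO, given that `oes_data.info()` reports every listed column as `object` with a few nulls: drop rows that are entirely empty, then convert the wage and employment columns to numbers. The sketch below runs on made-up rows. It assumes that unreleased estimates show up as non-numeric footnote markers such as `(8)` and that large numbers carry thousands separators; both assumptions should be confirmed against the real files:

```python
import pandas as pd

# Made-up rows imitating the OES layout; "(8)" stands in for a footnote marker.
oes_sample = pd.DataFrame({
    "ST_ABBREV": ["ME", "ME", None],
    "EMP": ["1,250", "(8)", None],
    "HOURLY_MEAN": ["23.41", "18.07", None],
})

# Drop rows where every column is empty (for example, trailing footer lines).
oes_sample = oes_sample.dropna(how="all").copy()

# Strip thousands separators, then coerce anything non-numeric to NaN.
for col in ["EMP", "HOURLY_MEAN"]:
    cleaned = oes_sample[col].str.replace(",", "", regex=False)
    oes_sample[col] = pd.to_numeric(cleaned, errors="coerce")

print(oes_sample.isna().sum())
```

Whether the remaining missing values should be dropped or imputed depends on how the column is used in the later models.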

# Exploratory Data Analysis (EDA) - MS-3 - Oct 15

# Data Processing / Models - MS-4 - Oct 29

# Data Visualization and Results - MS-5

# Presentation and Conclusions - Final

**Data Availability**

* Many education statistics are available, making this a practical project topic.
* Most of the data is in Excel format, so it can be easily imported and converted to a DataFrame.
* The data comes from multiple sources, so it will need to be merged.

**Tasks**

* Refine problem definition so I know what indicators will be needed for project analysis.
* Specifically identify data sources that provide needed indicators.
* Import, merge, transform, clean, review data sources in more detail to prepare for project analysis.




In [None]:
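# Census Tract Geo Data
# Assumes DBF comes from the third-party dbfread package (pip install dbfread)
# and that pandas is already imported as pd in the Imports cell.
from dbfread import DBF

census_tract_path = "/drive/MyDrive/Student Folder - Cecilia/Projects/Capstone/Data/tl_2020_35_tract/tl_2020_35_tract.dbf"
census_tract_dbf = DBF(census_tract_path)
census_tract_df = pd.DataFrame(iter(census_tract_dbf))
census_tract_df.head()
# census_tract_df.info()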

# Project Conclusions - Before Project Completion