# Analysis

Monica Canavan

In this notebook we run statistical tests on data covering educational attainment, demographics, and technology. (Work in progress.)

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats

### Read the Data Files

In [2]:
# Read in the dataset for the EDD employment projections.
# The Excel file is used instead of the CSV export; the earlier CSV call was:
# edd_proj = pd.read_csv('edd_sacr_occproj_2018_2028.csv', sep='\t', encoding='utf-16')
edd_proj = pd.read_excel('EDD_SAC_OCCPROJ_2018_2028.xlsx', sheet_name='Occupational')

# Read in the dataset for the Census PUMA Educational Attainment with Race, Ethnicity and Gender RC
education = pd.read_excel('ACSST5Y2019_Educational_Attainment.xlsx', sheet_name='Education')

# Read in the dataset for the Census Technology Information RC
technology = pd.read_excel('ACSST1Y2019_Technology_RC.xlsx', sheet_name='Technology')

### Analysis of EDD Occupation Projections

In [3]:
# Check the datatypes
edd_proj.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 635 entries, 0 to 634
Data columns (total 15 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   SOC Level                                635 non-null    int64  
 1   SOC Code                                 635 non-null    object 
 2   Occupational Title                       635 non-null    object 
 3   Base Year Employment Estimate 2018       635 non-null    int64  
 4   Projected Year Employment Estimate 2028  635 non-null    int64  
 5   Numeric Change 2018-2028                 635 non-null    int64  
 6   Percent-age Change 2018-2028             635 non-null    float64
 7   Exits                                    635 non-null    int64  
 8   Transfers                                635 non-null    int64  
 9   Total Job Openings                       635 non-null    int64  
 10  Median Hourly Wages                      635 non-n

### Analysis of Educational Attainment with Gender, Race and Ethnicity


In [4]:
# Check the first several rows
education.head()

,Race,Ethnicity,Educational attainment,Total,Male,Female
0,White,Not Hispanic,Less Than HS,2288,1212,1076
1,White,Not Hispanic,HS or GED,9339,4588,4751
2,White,Not Hispanic,Some college and AA,16757,7939,8818
3,White,Not Hispanic,BA,9102,4691,4411
4,White,Not Hispanic,Graduate School,4493,2467,2026


In [5]:
# Check the datatypes
education.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90 entries, 0 to 89
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Race                    90 non-null     object
 1   Ethnicity               90 non-null     object
 2   Educational attainment  90 non-null     object
 3   Total                   90 non-null     int64 
 4   Male                    90 non-null     int64 
 5   Female                  90 non-null     int64 
dtypes: int64(3), object(3)
memory usage: 4.3+ KB


#### Chi Square Test Gender and Educational Attainment
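
One possible way to set up this test is sketched below, using `scipy.stats.chi2_contingency` on an attainment-by-gender table of counts. The five rows are copied from the `education.head()` output above (White, Not Hispanic only), so the result is illustrative and not a finding for the full dataset. The helper name `gender_attainment_test` is hypothetical, and the sketch assumes the rows of `education` are mutually exclusive groups so that summing `Male` and `Female` across them does not double count.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Rows copied from the education.head() output (White, Not Hispanic only).
# The full analysis would pass the complete `education` DataFrame instead.
sample = pd.DataFrame({
    "Educational attainment": [
        "Less Than HS", "HS or GED", "Some college and AA", "BA", "Graduate School",
    ],
    "Male": [1212, 4588, 7939, 4691, 2467],
    "Female": [1076, 4751, 8818, 4411, 2026],
})


def gender_attainment_test(df):
    """Chi-square test of independence between gender and attainment level.

    Hypothetical helper: assumes the rows of `df` are non-overlapping groups.
    """
    # Collapse any other grouping columns into an attainment x gender table
    table = df.groupby("Educational attainment", sort=False)[["Male", "Female"]].sum()
    chi2, p_value, dof, expected = chi2_contingency(table.to_numpy())
    return table, chi2, p_value, dof


table, chi2, p_value, dof = gender_attainment_test(sample)
print(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4g}")
```

The degrees of freedom are (5 - 1) x (2 - 1) = 4 for five attainment levels and two genders.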

#### Chi Square Test Race and Educational Attainment
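
A minimal sketch of how a race-by-attainment table could be built and tested, assuming the `Total` column holds the count for each row. The helper `chi_square_by` is a hypothetical name. The toy frame mixes the `Total` values from the `education.head()` output with invented placeholder counts for a second group, so the numbers it prints are not Census results.

```python
import pandas as pd
from scipy.stats import chi2_contingency

levels = ["Less Than HS", "HS or GED", "Some college and AA", "BA", "Graduate School"]

toy = pd.DataFrame({
    "Race": ["White"] * 5 + ["Placeholder group"] * 5,
    "Educational attainment": levels * 2,
    "Total": [
        2288, 9339, 16757, 9102, 4493,  # from the education.head() output
        500, 1500, 2500, 1200, 600,     # invented placeholder counts, NOT real data
    ],
})


def chi_square_by(df, group_col, level_col="Educational attainment", count_col="Total"):
    """Hypothetical helper: cross-tabulate counts and run a chi-square test."""
    # Sum counts into a group x attainment contingency table
    table = df.pivot_table(
        index=group_col, columns=level_col, values=count_col,
        aggfunc="sum", fill_value=0,
    )
    chi2, p_value, dof, expected = chi2_contingency(table.to_numpy())
    # Common rule of thumb: expected counts below 5 make the test unreliable
    low_expected_cells = int((expected < 5).sum())
    return {
        "table": table,
        "chi2": chi2,
        "p_value": p_value,
        "dof": dof,
        "low_expected_cells": low_expected_cells,
    }


result = chi_square_by(toy, "Race")
print(result["table"])
print(f"chi2 = {result['chi2']:.2f}, dof = {result['dof']}, p = {result['p_value']:.4g}")
```

On the real data the call would be `chi_square_by(education, "Race")`, which sums over the `Ethnicity` column within each race.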

#### Chi Square Test Ethnicity and Educational Attainment
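
With counts in the tens of thousands, a chi-square test tends to reject independence even for small differences, so it may help to report an effect size next to the p-value. The sketch below computes Cramér's V by hand from the chi-square statistic. The function name `cramers_v` is hypothetical, the "Not Hispanic" row reuses the `Total` values from the `education.head()` output, and the "Hispanic" row holds invented placeholder counts.

```python
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v(table):
    """Return (chi2, p_value, V) for a contingency table of counts.

    V = sqrt(chi2 / (n * (min(rows, cols) - 1))), ranging from 0 to 1.
    """
    observed = np.asarray(table, dtype=float)
    # correction=False so 2x2 tables use the uncorrected statistic
    chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
    n = observed.sum()
    k = min(observed.shape) - 1
    return chi2, p_value, float(np.sqrt(chi2 / (n * k)))


# Rows: ethnicity; columns: the five attainment levels
toy_table = [
    [2288, 9339, 16757, 9102, 4493],  # Not Hispanic, from the education.head() output
    [900, 2100, 2600, 800, 300],      # Hispanic, invented placeholder counts (NOT real data)
]

chi2, p_value, v = cramers_v(toy_table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4g}, Cramer's V = {v:.3f}")
```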

### Analysis of Technology Resources

In [6]:
print(technology.shape)

(9, 6)


In [7]:
# Check the first several rows
technology.head()

,Label,Description,Type,Total Estimate,Percent Estimate,Unnamed: 5
0,Total households,,,40662,(X),179199.0
1,TYPES OF COMPUTER,Has one or more types of computing devices:,ANY,39135,96.2%,40662.0
2,TYPES OF COMPUTER,Has one or more types of computing devices:,Desktop or laptop,33599,82.6%,97875.0
3,TYPES OF COMPUTER,Has one or more types of computing devices:,Smartphone,37031,91.1%,
4,TYPES OF COMPUTER,Has one or more types of computing devices:,Tablet or other portable wireless computer,26437,65.0%,


In [8]:
# Check the column headings
technology.columns

Index(['Label', 'Description', 'Type', 'Total Estimate', 'Percent Estimate',
       'Unnamed: 5'],
      dtype='object')

In [9]:
# Check the datatypes
technology.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Label             9 non-null      object 
 1   Description       8 non-null      object 
 2   Type              5 non-null      object 
 3   Total Estimate    9 non-null      int64  
 4   Percent Estimate  9 non-null      object 
 5   Unnamed: 5        3 non-null      float64
dtypes: float64(1), int64(1), object(4)
memory usage: 560.0+ bytes
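
The `info()` output shows `Percent Estimate` stored as `object` (strings such as `96.2%` and `(X)`) and a mostly empty `Unnamed: 5` column. A possible cleaning step is sketched below on the five rows from the `technology.head()` output. The helper `clean_technology` is a hypothetical name, and dropping `Unnamed: 5` assumes it holds stray spreadsheet cells that are not part of the table.

```python
import pandas as pd

# Rows copied from the technology.head() output
tech_sample = pd.DataFrame({
    "Label": ["Total households"] + ["TYPES OF COMPUTER"] * 4,
    "Type": [
        None, "ANY", "Desktop or laptop", "Smartphone",
        "Tablet or other portable wireless computer",
    ],
    "Total Estimate": [40662, 39135, 33599, 37031, 26437],
    "Percent Estimate": ["(X)", "96.2%", "82.6%", "91.1%", "65.0%"],
    "Unnamed: 5": [179199.0, 40662.0, 97875.0, None, None],
})


def clean_technology(df):
    """Hypothetical helper: make the percent column numeric and drop the stray column."""
    # Assumption: 'Unnamed: 5' is leftover spreadsheet content, not table data
    cleaned = df.drop(columns=["Unnamed: 5"], errors="ignore").copy()
    # Strip the trailing '%'; non-numeric markers such as '(X)' become NaN
    pct = cleaned["Percent Estimate"].astype(str).str.rstrip("%")
    cleaned["Percent Estimate"] = pd.to_numeric(pct, errors="coerce")
    return cleaned


tech_clean = clean_technology(tech_sample)

# Sanity check: recompute each percentage from the total-households row
total_households = tech_clean.loc[
    tech_clean["Label"] == "Total households", "Total Estimate"
].iloc[0]
tech_clean["Recomputed Percent"] = (
    tech_clean["Total Estimate"] / total_households * 100
).round(1)

print(tech_clean.dtypes)
print(tech_clean[["Type", "Percent Estimate", "Recomputed Percent"]])
```

For the rows shown, the recomputed percentages agree with the published `Percent Estimate` values to one decimal place.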
