# Efficacy of Vocational Rehabilitation Services in America 

> Dataset: [Current Population Survey, July 2021: Disability Supplement](https://api.census.gov/data/2021/cps/disability/jul.html)
* The universe consists of all persons in the civilian non-institutional population of the United States living in households. 
* The probability sample selected to represent the universe consists of approximately 50,000 households.

### Imports

In [6]:
import requests
import pandas as pd
import numpy as np

### Data Dictionary
> For each variable, include the exact variable name in the API, the name used in the analysis, the name used in visualizations, a description, measurement units, expected values, and expected min/max.
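One possible way to start the data dictionary is sketched below. The API names and analysis names are copied from the query and column list in this notebook; the `DictionaryEntry` class and its field names are hypothetical, and every field not stated in the notebook is left empty rather than guessed.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DictionaryEntry:
    """One row of the data dictionary (hypothetical structure)."""
    api_name: str
    analysis_name: str
    visualization_name: Optional[str] = None
    description: Optional[str] = None
    units: Optional[str] = None
    expected_values: Optional[str] = None
    expected_min: Optional[float] = None
    expected_max: Optional[float] = None


# Only a handful of the requested variables are listed here for illustration.
data_dictionary = [
    DictionaryEntry("PEMLR", "labor_force_employment_status"),
    DictionaryEntry("PESD6A", "used_vocational_rehabilitation_agencies"),
    DictionaryEntry("PESD7A", "how_helpful_vocational_rehab_agency"),
    DictionaryEntry("PRDISFLG", "does_this_person_have_any_of_these_disability_conditions"),
    DictionaryEntry("PRTAGE", "demographics_age"),
]

# Lookup from API name to analysis name, useful when renaming columns.
api_to_analysis = {entry.api_name: entry.analysis_name for entry in data_dictionary}

# Plain dictionaries, one per variable, ready to be tabulated.
rows = [asdict(entry) for entry in data_dictionary]
```

Passing `rows` to `pd.DataFrame` would render the dictionary as a table once the empty fields are filled in from the dataset's variable documentation.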


### Query Texas Records
> Querying individual records for **every** county in Texas

In [11]:
HOST = "https://api.census.gov/data"
year = "2021"
dataset = "cps/disability/jul"
base_url = "/".join([HOST, year, dataset])

predicates = {}
get_vars = ["PEMLR", 
            "PESD6A", 
            "PESD6B", 
            "PESD6C", 
            "PESD6D", 
            "PESD6E", 
            "PESD6F", 
            "PESD6G", 
            "PESD7A", 
            "PESD7B", 
            "PESD7C",
            "PESD7E", 
            "PESD7G", 
            "PRDISFLG", 
            "PESD41", 
            "PESD42", 
            "PESD43", 
            "PESD44", 
            "PESD45", 
            "PESD46", 
            "PESD47",
            "PESD48", 
            "PESD49",
            "PTDTRACE",
            "PESEX",
            "PRTAGE"
            ]
predicates["get"] = ",".join(get_vars)
predicates["for"] = "county:*"
predicates["in"] = "state:48"

r = requests.get(base_url, params=predicates)
r.raise_for_status()  # stop early if the API returns an HTTP error status

[['PEMLR', 'PESD6A', 'PESD6B', 'PESD6C', 'PESD6D', 'PESD6E', 'PESD6F', 'PESD6G', 'PESD7A', 'PESD7B', 'PESD7C', 'PESD7E', 'PESD7G', 'PRDISFLG', 'PESD41', 'PESD42', 'PESD43', 'PESD44', 'PESD45', 'PESD46', 'PESD47', 'PESD48', 'PESD49', 'PTDTRACE', 'PESEX', 'PRTAGE', 'state', 'county'], ['5', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '2', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '2', '2', '66', '48', '139'], ['-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '2', '1', '10', '48', '139'], ['-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '2', '2', '12', '48', '139'], ['1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '2', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '1', '1', '33', '48', '0'], ['1', '-1', '-1', '-1', '-1', '-1', '-1', '-1'
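To make the query above easier to inspect, the sketch below shows how the `get`, `for`, and `in` predicates combine into a single request URL. It uses only the standard library, so the encoding shown is `urllib`'s and is an approximation of what `requests` sends. The variable list is shortened for illustration, and no network call is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

HOST = "https://api.census.gov/data"
base_url = "/".join([HOST, "2021", "cps/disability/jul"])

# Shortened variable list for illustration; the notebook requests 26 variables.
predicates = {
    "get": ",".join(["PEMLR", "PESD6A", "PRTAGE"]),
    "for": "county:*",   # every county ...
    "in": "state:48",    # ... within state code 48 (Texas in this notebook)
}

# Percent-encode the predicates and append them as the query string.
query_url = base_url + "?" + urlencode(predicates)
print(query_url)

# Decoding the query string recovers the original predicates.
decoded = parse_qs(urlsplit(query_url).query)
```

Printing the URL this way is a quick check that the geography predicates are what was intended before sending the request.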

### Format DataFrame and Export to Excel

In [5]:
col_names = [
    "labor_force_employment_status",
    "used_vocational_rehabilitation_agencies",
    "used_one_stop_career_centers",
    "used_the_ticket_to_work_program",
    "used_assistive_technology_act_prog",
    "used_ctr_for_indpt_living_for_ind_w_dis",
    "used_the_client_assistance_program",
    "used_any_other_employment_assistance_program",
    "how_helpful_vocational_rehab_agency",
    "how_helpful_one_stop_career_centers",
    "the_ticket_to_work_program_helpfulness",
    "ctr_for_indpdt_living_for_ind_w_dis_helpful",
    "other_employment_assist_program_helpful",
    "does_this_person_have_any_of_these_disability_conditions",
    "barrier_lack_of_education_or_training",
    "barrier_lack_of_job_counseling",
    "barrier_lack_of_transportation",
    "barrier_loss_of_government_assistance",
    "barrier_need_for_special_features",
    "barrier_employer_or_coworker_attitudes",
    "barrier_your_difficulty_with_disability",
    "barrier_other",
    "barrier_none",
    "demographics_race_of_respondent",
    "demographics_sex",
    "demographics_age",
    "state",
    "county"
]

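# The first row of the response holds the API variable names; the remaining rows are records.
df = pd.DataFrame(columns=col_names, data=r.json()[1:])

# Writing .xlsx files requires an Excel engine such as openpyxl to be installed.
df.to_excel("raw-data.xlsx")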
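The API returns every value as a string, so a cleaning step is likely needed before modeling. The sketch below converts a few columns to numbers and treats `-1` as missing. That treatment is an assumption: the response shown earlier uses `-1` heavily, which suggests a "not in universe" or no-response code, but this should be verified against the variable definitions. The three sample rows are abridged from the query output above.

```python
import numpy as np
import pandas as pd

# Abridged rows from the query output: PEMLR, PESD6A, PRDISFLG, PRTAGE.
sample = pd.DataFrame(
    [
        ["5", "-1", "2", "66"],
        ["-1", "-1", "-1", "10"],
        ["1", "-1", "2", "33"],
    ],
    columns=[
        "labor_force_employment_status",
        "used_vocational_rehabilitation_agencies",
        "does_this_person_have_any_of_these_disability_conditions",
        "demographics_age",
    ],
)

# Convert the string values to numbers; anything unparseable becomes NaN.
numeric = sample.apply(pd.to_numeric, errors="coerce")

# Assumption: -1 marks "not in universe" / no response, so treat it as missing.
cleaned = numeric.replace(-1, np.nan)

# Count the missing values per column after cleaning.
missing_per_column = cleaned.isna().sum()
```

Columns that are entirely missing for a subgroup, as `used_vocational_rehabilitation_agencies` is in this sample, indicate that the question was not asked of those respondents.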

### Find significant variables
> Note: Solving a classification problem via inference

#### Options:
- Logistic Regression - commonly used for classification problems
- Stepwise regression
- Decision trees
- Random forest
- Neural network

### Logistic Regression
* [Documentation](https://pytorch.org/tutorials/beginner/nn_tutorial.html#neural-net-from-scratch-no-torch-nn)
* Train a minimal neural network (logistic regression, since there are no hidden layers)
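A minimal sketch of that idea follows. It uses NumPy rather than PyTorch (a substitution, not the linked tutorial's code) and trains on synthetic data, not the survey records. The function names, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Map raw scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)               # forward pass: one linear layer + sigmoid
        grad_w = X.T @ (p - y) / n_samples   # gradient of the mean log loss w.r.t. w
        grad_b = np.mean(p - y)              # gradient of the mean log loss w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic, linearly separable data (NOT the survey data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)

w, b = fit_logistic(X_demo, y_demo)
predictions = (sigmoid(X_demo @ w + b) >= 0.5).astype(float)
accuracy = np.mean(predictions == y_demo)
```

With no hidden layer, the fitted weights play the role of ordinary logistic regression coefficients, which is one reason this model is a reasonable starting point when the goal is to find significant variables rather than only to predict.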