# **WORK IN PROGRESS**  - Current as of February 15th 2025

# Data-Git-Hub_eda.ipynb
---

## Exploring Veterans Affairs data on U.S. Veterans who receive disability compensation, by state and county of residence, for report year 2023:

https://catalog.data.gov/dataset/fy-2023-disability-compensation-recipients-by-county

---

## I. Introduction
The equitable distribution and accessibility of Veterans’ benefits are essential indicators of both social policy effectiveness and the overall well-being of the Veteran population across the United States. This report presents a county-level analysis of Veterans receiving VA Disability Compensation benefits at the end of fiscal year (FY) 2023, utilizing data from the Department of Veterans Affairs (VA), Office of Enterprise Integration, and the Veterans Benefits Administration (VBA). The dataset captures the total number of Veterans receiving compensation, their service-connected disability (SCD) ratings, as well as demographic factors such as age and gender. However, data suppression techniques were applied to protect the identities of individuals, leading to some missing values for specific demographic attributes. This analysis seeks to contextualize the distribution of disability compensation recipients relative to broader county-level demographic trends, urban population concentrations, and economic conditions, as measured by state-level Consumer Price Index (CPI) data for FY 2023.

To further investigate regional disparities and potential correlations between Veteran compensation rates and broader socioeconomic factors, this study will integrate external datasets, including total county population figures and major metropolitan population distributions within those counties. This approach will provide a comparative framework to assess the proportion of Veterans receiving benefits relative to the total resident population. Additionally, incorporating state-level CPI data will allow for an economic assessment of how cost-of-living variations may impact the relative adequacy of VA Disability Compensation payments in different geographic areas. Understanding these dynamics is critical for policymakers, as variations in economic conditions may influence Veterans’ financial stability and access to essential services.

This analysis will be conducted using Python within a Jupyter Notebook environment, using established data science tools for statistical modeling and geospatial visualization. Pandas will be used for structured data manipulation, while Matplotlib and Seaborn will provide the graphical representation of key trends. Geospatial patterns among VA Disability Compensation recipients may also be explored using GeoPandas and Folium to develop interactive county-level heatmaps that highlight areas of high and low benefit distribution. Furthermore, state-level CPI adjustments will be applied to evaluate compensation adequacy relative to regional economic conditions. Together, these tools will support a comprehensive assessment of VA Disability Compensation distribution and provide actionable insights for future policy recommendations.

By synthesizing VA administrative data with broader demographic and economic indicators, this study aims to offer a multidimensional perspective on the accessibility and effectiveness of disability compensation benefits across U.S. counties. The findings will provide a valuable foundation for future research on Veterans' financial well-being and inform targeted policy interventions to enhance benefit delivery in regions with higher economic disparities.

### Background

- VA Disability Compensation Benefits: 
  The VA Disability Compensation program is designed to provide monetary support to eligible Veterans suffering from service-connected disabilities. This financial assistance helps Veterans manage medical expenses, adapt to changes in their circumstances, and maintain a stable quality of life. <br>

- Importance of Equitable Distribution and Accessibility: 
  An equitable distribution of these benefits is essential not only for individual financial stability but also for reinforcing public trust in the support systems provided by government institutions. Ensuring that benefits are accessible to all eligible Veterans—regardless of their county of residence—is critical in addressing broader issues of social and economic inequality. <br>

### Objectives

- County-Level Analysis:
  To analyze the distribution of VA Disability Compensation benefits at the county level for fiscal year (FY) 2023, providing insights into which regions receive higher or lower levels of support. <br>

- Contextualization with Demographics: 
  To integrate county demographic data—such as total population, age distribution, and the presence of major metropolitan areas—to understand how these factors relate to the distribution of benefits. <br>

- Economic Context via CPI:
  To assess how state-level economic conditions, as measured by the Consumer Price Index (CPI) for FY 2023, may influence the adequacy and impact of the compensation payments. <br>

- Geospatial Insights: 
  To utilize geospatial visualization tools (GeoPandas and Folium) for mapping and identifying spatial patterns, clusters, and outliers in the distribution of benefits, thereby highlighting potential areas for policy intervention. <br>

By addressing these objectives, this study aims to provide a comprehensive evaluation of the accessibility and effectiveness of VA Disability Compensation benefits across U.S. counties, ultimately offering data-driven insights to inform policy recommendations. <br>

---

## II. Research Questions

This study seeks to address the following research questions: <br>

- Demographic Correlations: <br> 
  - What are the demographic characteristics (e.g., age, gender, and service-connected disability (SCD) ratings) of Veterans receiving VA Disability Compensation? <br>
  - How do these characteristics compare to the overall county demographics? <br>
  - Are there notable differences in benefit distribution based on specific demographic subgroups? <br>

- Urban vs. Rural Impact:  <br>
  - How does the concentration of urban populations relate to the distribution of VA Disability Compensation benefits? <br>
  - Do rural counties display significantly different compensation patterns compared to urban areas? <br>
  - Can variations in urbanization and population density explain disparities in benefit allocation? <br>

- Economic Influences: <br> 
  - What is the relationship between state-level Consumer Price Index (CPI) values and the adequacy of compensation payments across counties? <br>
  - How do variations in economic conditions, as captured by CPI, influence the distribution and relative impact of benefits? <br>
  - Is there evidence that higher cost-of-living areas receive benefits that are adjusted for economic differences? <br>

- Population Proportions: <br>
  - How do total county population figures and data on major metropolitan populations influence the proportion of Veterans receiving benefits? <br>
  - Are benefits distributed in proportion to the overall county and metropolitan population sizes? <br>
  - What patterns emerge when comparing compensation data against population metrics? <br>

---

## III. Data Sources
**Title: FY 2023 Disability Compensation Recipients by County** <br>
Format: CSV <br>
Website: https://catalog.data.gov/dataset/fy-2023-disability-compensation-recipients-by-county/resource/24855f5c-09c3-45c3-a3e8-f39418a090fd <br>
License: https://creativecommons.org/publicdomain/zero/1.0/ <br>
Source Hash: 50145116a8b3d67ccab4f36d60b93e5cc26f585352b59a3c7e61776fbbc10746 <br>

**Title: Annual Estimates of the Resident Population for Counties in the United States: April 1, 2020 to July 1, 2023** <br>
Format: XLSX <br>
Website: https://www2.census.gov/programs-surveys/popest/datasets/2020-2023/counties/totals/co-est2023-alldata.csv <br>
License: https://creativecommons.org/publicdomain/zero/1.0/ <br>
Source Hash: e76605bc0a0164a9be9169edbebf6356059ddca2 <br>

**Title: United States Cities Database: January 23, 2025** <br>
Format: CSV <br>
Website: https://simplemaps.com/data/us-cities <br>
License: https://creativecommons.org/licenses/by/4.0/ <br>

**Title: Consumer Price Index for All Urban Consumers (CPI-U): U.S. city average, by expenditure category, December 2023** <br>
Format: XLSX <br>
Website: https://www.bls.gov/cpi/tables/supplemental-files/cpi-u-202312.xlsx <br>
License: https://creativecommons.org/publicdomain/zero/1.0/ <br>
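
The source hashes listed above can be used to confirm that a downloaded file matches the version analyzed here. The sketch below is one way to do that with Python's standard `hashlib` module. It assumes the 64-character hash is SHA-256 and the 40-character hash is SHA-1 (inferred from their lengths, not stated in the source); `file_digest` and `matches_source_hash` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_source_hash(path, expected_hex):
    """Compare a file with a recorded hash.

    Assumption: the algorithm is guessed from the hash length
    (64 hex characters -> SHA-256, 40 hex characters -> SHA-1).
    """
    algorithm = {64: "sha256", 40: "sha1"}.get(len(expected_hex))
    if algorithm is None:
        raise ValueError("Unrecognized hash length")
    return file_digest(path, algorithm) == expected_hex.lower()
```

For example, `matches_source_hash('data/FY_2023_Disability_Compensation_Recipients_by_County.csv', '50145116a8b3d67ccab4f36d60b93e5cc26f585352b59a3c7e61776fbbc10746')` would return `True` only if the local copy is byte-for-byte identical to the hashed file, under the SHA-256 assumption.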

---

## IV. Data Collection & Preparation

In this section, we load, clean, merge, and transform our datasets to prepare them for analysis. The datasets currently used in this section are:

1. **VA Disability Compensation Data:**  
   - File: `FY_2023_Disability_Compensation_Recipients_by_County.csv`  
   - Contains county-level data on Veterans receiving compensation, along with some demographic details (age, gender, SCD ratings).

2. **County Population Estimates:**  
   - File: `co-est2023-pop.xlsx`  
   - Provides population estimates for counties (April 1, 2020 census base, with annual estimates through July 1, 2023).

3. **County Geospatial Data:** (WORK IN PROGRESS)  
   - Needed for GeoPandas/Folium mapping.  
   - Requires a shapefile or GeoJSON file that includes county boundaries.  
   - U.S. county boundaries can be downloaded from the U.S. Census Bureau’s TIGER/Line Shapefiles.  
   - File: `county_shapefile.shp` (stored in the `data` folder).


### IV. A. Loading Data

We use Pandas to load the CSV and Excel files. GeoPandas will be used for the geospatial data once that file is added.



In [30]:
import pandas as pd
from IPython.display import display

# Load VA Disability Compensation Data
va_data = pd.read_csv('data/FY_2023_Disability_Compensation_Recipients_by_County.csv')

# Load County Population Estimates Data
pop_data = pd.read_excel('data/co-est2023-pop.xlsx')
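
County FIPS codes are identifiers rather than quantities, and codes such as `01001` begin with a zero. As a hedged sketch (the inline CSV is a stand-in built from a few rows of the VA dataset, not the real file), reading the column with `dtype=str` and padding to five characters is one way to keep the codes intact for later joins:

```python
import io

import pandas as pd

# Stand-in for the VA file: three rows with a subset of the real columns.
sample_csv = io.StringIO(
    "FIPS code,State,County Name,Total: Disability Compensation Recipients\n"
    "01001,Alabama,Autauga,2636\n"
    "01003,Alabama,Baldwin,6329\n"
    "01005,Alabama,Barbour,604\n"
)

# Without dtype=str, pandas would parse "01001" as the integer 1001.
va_sample = pd.read_csv(sample_csv, dtype={"FIPS code": str})

# Defensive padding in case a source file has already lost its leading zeros.
va_sample["FIPS code"] = va_sample["FIPS code"].str.zfill(5)

print(va_sample["FIPS code"].tolist())  # ['01001', '01003', '01005']
```

Keeping FIPS codes as five-character strings also matches how county identifiers are usually stored in boundary files, which matters once the geospatial data is joined.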


### IV. B. Initial Data Inspection

Displaying the first few rows of each dataset.

In [33]:
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 2000)
pd.set_option('display.max_colwidth', None)

print("VA Data Preview:")
print(va_data.head().to_string(), "\n")

# Check for missing values in the VA data and print the full output
print("Missing values in VA Data:")
print(va_data.isna().sum().to_string(), "\n")



VA Data Preview:
  FIPS code    State County Name  Total: Disability Compensation Recipients  SCD rating: 0% to 20%  SCD rating: 30% to 40%  SCD rating: 50% to 60%  SCD rating: 70% to 90%  SCD rating: 100%  Age: 17-44  Age: 45-64  Age: 65 or older    Male  Female
0     01001  Alabama     Autauga                                     2636.0                  438.0                   283.0                   332.0                   838.0             745.0       658.0      1182.0             795.0  2159.0   477.0
1     01003  Alabama     Baldwin                                     6329.0                 1549.0                   819.0                   845.0                  1760.0            1356.0      1372.0      2045.0            2911.0  5754.0   573.0
2     01005  Alabama     Barbour                                      604.0                  104.0                    51.0                    90.0                   198.0             161.0        99.0       234.0             271.0   528.0    

In [35]:
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 2000)
pd.set_option('display.max_colwidth', None)

print("Population Data Preview:")
print(pop_data.head().to_string(), "\n")

# Check for missing values in the population data and print the full output
print("Missing values in Population Data:")
print(pop_data.isna().sum().to_string(), "\n")


Population Data Preview:
            geographic area  april 1, 2020 estimates base          NaN          NaN          NaN          NaN
0             United States                   331464948.0  331526933.0  332048977.0  333271411.0  334914895.0
1  .Autauga County, Alabama                       58809.0      58915.0      59203.0      59726.0      60342.0
2  .Baldwin County, Alabama                      231768.0     233227.0     239439.0     246531.0     253507.0
3  .Barbour County, Alabama                       25229.0      24969.0      24533.0      24700.0      24585.0
4     .Bibb County, Alabama                       22301.0      22188.0      22359.0      21986.0      21868.0 

Missing values in Population Data:
geographic area                 0
april 1, 2020 estimates base    6
NaN                             6
NaN                             6
NaN                             6
NaN                             6 
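
The preview above shows three things worth tidying before any merge: four columns have lost their names, the first row is a national total, and county labels carry a leading period. The six missing values per population column, with none in the area column, suggest a few text-only rows such as footnotes. A minimal sketch of one possible cleanup follows, on a toy frame copied from the preview. It assumes the four unnamed columns are the July 1 estimates for 2020 through 2023 (inferred from the dataset title); the column names `pop_base_2020`, `pop_2020`, and so on are hypothetical, as is the footnote row.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the preview, plus a hypothetical text-only note row.
raw = pd.DataFrame(
    [
        ["United States", 331464948, 331526933, 332048977, 333271411, 334914895],
        [".Autauga County, Alabama", 58809, 58915, 59203, 59726, 60342],
        [".Baldwin County, Alabama", 231768, 233227, 239439, 246531, 253507],
        ["Note: hypothetical footnote row", np.nan, np.nan, np.nan, np.nan, np.nan],
    ]
)

# Assumed meaning of the six columns (not confirmed by the source file).
raw.columns = [
    "geographic_area", "pop_base_2020", "pop_2020", "pop_2021", "pop_2022", "pop_2023",
]

# Drop rows without population figures, then drop the national total.
pop_clean = raw.dropna(subset=["pop_2023"])
pop_clean = pop_clean[pop_clean["geographic_area"] != "United States"].copy()

# Remove the leading period, then split "County, State" into two columns.
area = pop_clean["geographic_area"].str.lstrip(".")
pop_clean[["county", "state"]] = area.str.split(", ", n=1, expand=True)

# Strip the " County" suffix so names line up with the VA "County Name" column.
pop_clean["county"] = pop_clean["county"].str.replace(r" County$", "", regex=True)

print(pop_clean[["county", "state", "pop_2023"]])
```

This sketch only strips the `County` suffix; Louisiana parishes, Alaska boroughs and census areas, and independent cities would need additional handling. As for the `NaN` headers, one possible explanation is that the original labels were not strings (for example, integer years), since pandas `.str` methods return `NaN` for non-string values; this has not been verified against the file.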



### IV. C. Data Cleaning

Next, we examine and clean our datasets—addressing missing values, data suppression, and standardizing column names for easier merging.
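
One possible way to handle suppressed values, sketched on toy data: keep them as `NaN` so the count columns stay numeric, and add a boolean flag per column so suppressed counties can be filtered or reported separately. The `_suppressed` suffix and the helper name `flag_suppressed` are hypothetical, and treating every `NaN` as suppression (rather than some other kind of missing value) is an assumption.

```python
import numpy as np
import pandas as pd


def flag_suppressed(df, count_columns):
    """Return a copy with one boolean '<column>_suppressed' flag per count column."""
    flagged = df.copy()
    for column in count_columns:
        flagged[f"{column}_suppressed"] = flagged[column].isna()
    return flagged


# Toy rows; the third is a made-up county whose gender counts are withheld.
toy = pd.DataFrame(
    {
        "County Name": ["Autauga", "Baldwin", "Example"],
        "Male": [2159.0, 5754.0, np.nan],
        "Female": [477.0, 573.0, np.nan],
    }
)

flagged = flag_suppressed(toy, ["Male", "Female"])

# Share of counties with a suppressed value, per column.
print(flagged[["Male_suppressed", "Female_suppressed"]].mean())
```

Filling the gaps with a string such as `'Unknown'` would convert the count columns to `object` dtype and break later arithmetic, which is why this sketch leaves the values as `NaN`.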

In [32]:
# Check for missing values in the VA dataset
print(va_data.isna().sum())

# Suppressed counts appear as NaN in the numeric columns ('Age: ...', 'Male',
# 'Female'). The dataset has no 'age' or 'gender' columns to fill, so the
# values are left as NaN and the count columns stay numeric.
va_data_cleaned = va_data.copy()

# Standardize column names for consistency (e.g., lower case and no extra spaces)
va_data_cleaned.columns = va_data_cleaned.columns.str.lower().str.strip()
pop_data.columns = pop_data.columns.str.lower().str.strip()


FIPS code                                      0
State                                          0
County Name                                    0
Total: Disability Compensation Recipients     19
SCD rating: 0% to 20%                        136
SCD rating: 30% to 40%                       316
SCD rating: 50% to 60%                       310
SCD rating: 70% to 90%                       143
SCD rating: 100%                             235
Age: 17-44                                   206
Age: 45-64                                   202
Age: 65 or older                              58
Male                                         554
Female                                       554
dtype: int64
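
With both tables cleaned, the merge announced at the start of this section could be sketched as follows. This is an illustration on toy rows taken from the previews, not the notebook's final method. It assumes the population table has already been split into `state` and `county` columns and that its last column is the July 1, 2023 estimate (`pop_2023` and `recipients_per_1000` are hypothetical names). `validate` and `indicator` are standard `DataFrame.merge` parameters, used here to surface duplicate keys and unmatched counties.

```python
import pandas as pd

# Toy VA rows, using the lower-cased column names produced by the cleaning step.
va = pd.DataFrame(
    {
        "state": ["Alabama", "Alabama", "Alabama"],
        "county name": ["Autauga", "Baldwin", "Barbour"],
        "total: disability compensation recipients": [2636.0, 6329.0, 604.0],
    }
)

# Toy population rows; 'county', 'state', and 'pop_2023' are assumed names.
pop = pd.DataFrame(
    {
        "state": ["Alabama", "Alabama", "Alabama"],
        "county": ["Autauga", "Baldwin", "Barbour"],
        "pop_2023": [60342, 253507, 24585],
    }
)

merged = va.merge(
    pop,
    left_on=["state", "county name"],
    right_on=["state", "county"],
    how="left",               # keep every VA county, matched or not
    validate="one_to_one",    # raise if a county key is duplicated
    indicator=True,           # adds a '_merge' column for auditing matches
)

# Recipients per 1,000 residents.
merged["recipients_per_1000"] = (
    merged["total: disability compensation recipients"] / merged["pop_2023"] * 1000
).round(2)

print(merged[["county name", "recipients_per_1000", "_merge"]])
```

Rows where `_merge` is `left_only` would point to counties whose names did not line up between the two tables. Joining on FIPS codes would be more robust than joining on names, but the population preview shown here does not include a FIPS column.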


## X. Supporting Documentation
**Title: CO-EST2023-ALLDATA: Annual Resident Population Estimates, Estimated Components of Resident Population Change, and Rates of the Components of Resident Population Change for States and Counties. April 1, 2020 to July 1, 2023** <br>
Format: PDF <br>
Website: https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2020-2023/CO-EST2023-ALLDATA.pdf <br>
License: https://creativecommons.org/publicdomain/zero/1.0/ <br>

**Title: license_doc_uscities_csv_simplemaps_com.txt** <br>
Format: txt <br>
Website: https://simplemaps.com/data/us-cities <br>
License: https://creativecommons.org/licenses/by/4.0/ <br>

---