Enable join of CMS to Census #31

Open
dportnoy opened this issue Apr 29, 2015 · 5 comments

@dportnoy
Member

As identified by Lloyd Brodsky 4/22/2015


Potential improvements:

  • Verbose column names: I’ve had to rename all the variables in order to import the data into a statistical package. The CMS spreadsheets were designed to be read by people, so the column names are very descriptive and very long, with embedded spaces (illegal in databases and statistics tools). It would have been better if the short names and the descriptive names were on separate rows so I could just delete the long ones.
  • Separate sheets: Each year is on a separate sheet in Excel, so the sheets need to be combined into a single data set and pivoted to make them useful. Otherwise, PowerPivot has to be used to work with a separate worksheet for each year.
  • Joining CMS to Census data: The CMS data is listed by Hospital Referral Region (HRR), while Census data is listed by zip code or census tract number. The Dartmouth Atlas project has another spreadsheet that crosswalks zip codes to hospital referral regions. So I need to add up the Census data for the many zip codes in each HRR. This would be a lot simpler if CMS reported the data by zip code in the first place. It would also make the analysis more accurate because I could drill down to a lower level. A single hospital referral region has the population of a small city; the average zip code has 8,000 people.
  • Identifying dual eligibles: There’s just a variable saying what percentage of the people on the row are dual-eligible. It sure would be nice if there were two rows, one for Medicare only and one for both. Dual eligibles are a hot topic for which I have been unable to find an on-point public use file.
  • Adding shapefiles: In order to map the hospital referral regions I need the geometries of those regions. Dartmouth makes those freely available.
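
The first two items above (shortening column names and combining the per-year sheets) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a description of how anyone in this thread actually did it. The sheet contents, the column labels, and the helper names `short_name` and `stack_years` are all hypothetical.

```python
import re

def short_name(label, max_len=32):
    """Turn a long, descriptive header into a short identifier without spaces."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").lower()
    if name and name[0].isdigit():
        name = "c_" + name  # many tools reject names that start with a digit
    return name[:max_len].rstrip("_")

def stack_years(sheets):
    """Combine {year: [row dicts]} into one long list with a 'year' column."""
    combined = []
    for year, rows in sorted(sheets.items()):
        for row in rows:
            tidy = {short_name(key): value for key, value in row.items()}
            tidy["year"] = year
            combined.append(tidy)
    return combined

# Hypothetical sheets and column labels, for illustration only.
sheets = {
    2012: [{"HRR": "Example HRR A", "Percent Eligible for Medicaid": 21.0}],
    2013: [{"HRR": "Example HRR A", "Percent Eligible for Medicaid": 22.5}],
}
rows = stack_years(sheets)
```

Truncating to `max_len` can make two long labels collide, so it is worth checking the shortened names for duplicates before loading them into a database.
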
@lloydbrodsky

First, let me clarify that this is in reference to the Geographic Variation Public Use file found at:

http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Geographic-Variation/GV_PUF.html

This was most recently updated 2/15/2015

It's widely believed that social attributes that don't show up in administrative OR clinical data have a major indirect impact on healthcare. For example, people who can't read in any language, or who don't understand English well, are less likely to follow medical advice than those who can. Similarly, people with access to transportation are more likely to get to scheduled appointments than those without it.

Census data has that kind of information reported out by a range of geographies, including state, county, census tracts, and zip codes.

The Census Bureau's American Community Survey, for example, reports education level, primary language, and means of getting to work, among other things. The challenge is to link (join) that data to the corresponding CMS data in order to explain prevalence and cost as a function of social variables.

Ideally, the key field would be a relatively small reporting unit such as zip code or census tract number. Unfortunately, CMS's geographic variation data is reported out by Hospital Referral Region (each HRR contains many zip codes) or by county. Zip codes average about 8,000 people each, which is large enough to enable reporting of very common diseases without violating privacy rules. Urban counties and HRRs are much bigger and commingle rich and poor neighborhoods, blurring precisely the social differences we are trying to analyze. Also, linking by HRR adds the extra step of mapping each HRR to its set of zip codes, retrieving the relevant data for all of those Census zip code tabulation areas, and then adding it up.
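
The roll-up step described above (crosswalk zip codes to HRRs, then add up) might look something like the sketch below. It is only an illustration under stated assumptions: the field names `zip`, `population`, and `pct_no_vehicle`, the HRR labels, and all of the numbers are made up, and the real Dartmouth Atlas crosswalk and Census extracts will have their own layouts. Counts are summed, while the rate is averaged with population weights so that large zip codes count for more.

```python
from collections import defaultdict

def aggregate_to_hrr(zip_rows, zip_to_hrr):
    """Roll zip-level rows up to HRRs.

    Populations are summed; the rate is a population-weighted average.
    Zip codes missing from the crosswalk are returned separately.
    """
    pop = defaultdict(int)
    weighted = defaultdict(float)
    unmatched = []
    for row in zip_rows:
        hrr = zip_to_hrr.get(row["zip"])
        if hrr is None:
            unmatched.append(row["zip"])
            continue
        pop[hrr] += row["population"]
        weighted[hrr] += row["population"] * row["pct_no_vehicle"]
    result = {
        hrr: {"population": pop[hrr], "pct_no_vehicle": weighted[hrr] / pop[hrr]}
        for hrr in pop
        if pop[hrr] > 0
    }
    return result, unmatched

# Hypothetical crosswalk and Census rows. Zip codes are kept as strings
# so that leading zeros survive.
zip_to_hrr = {"00001": "HRR-1", "00002": "HRR-1", "00003": "HRR-2"}
zip_rows = [
    {"zip": "00001", "population": 6000, "pct_no_vehicle": 10.0},
    {"zip": "00002", "population": 2000, "pct_no_vehicle": 30.0},
    {"zip": "00003", "population": 5000, "pct_no_vehicle": 20.0},
    {"zip": "00004", "population": 1000, "pct_no_vehicle": 50.0},
]
by_hrr, unmatched = aggregate_to_hrr(zip_rows, zip_to_hrr)
```

Returning the unmatched zip codes makes gaps in the crosswalk visible instead of silently dropping population from an HRR.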

In the same vein, CMS commingles dual and single eligible beneficiaries. Every row indicates the percentage of beneficiaries who are dual eligible for both Medicare and Medicaid. It would be very helpful if this were split into two rows, one for dual eligibles and the other for Medicare-only.
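
To make the limitation concrete: with only a percentage on each row, the head counts for the two groups can be estimated, but cost and utilization cannot be separated, because those are reported only for the combined population. A small sketch, with a hypothetical function name and made-up numbers:

```python
def split_dual_counts(beneficiaries, pct_dual):
    """Estimate dual-eligible and Medicare-only head counts from a row's percentage."""
    dual = round(beneficiaries * pct_dual / 100)
    return {"dual": dual, "medicare_only": beneficiaries - dual}

# Made-up row: 10,000 beneficiaries, 20% of them dual eligible.
counts = split_dual_counts(10000, 20.0)
```

Nothing comparable is possible for spending per beneficiary in each group, which is why separate rows in the source file would help.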

@dportnoy
Member Author

Created page for full use case specifications and solution: http://hhs.ddod.us/wiki/Use_Case_31

@betshsu

betshsu commented Jan 15, 2016

Our colleagues at NIST and CNRI have been working on "mashups" of various datasets and had done prior work integrating Census data with FEMA. They approached HHS about experimenting with mashups of HHS datasets, so we suggested this use case as a possible candidate.

CNRI has piloted combining the CMS data with the Census/Dartmouth Atlas data. The pilot includes maps that display hospital service area boundaries with overlays of the Census and CMS data, plus the ability to create custom queries against the Census data and the Medicare Part A/B data. At the moment, the best path toward making such a tool public is unclear, but we have a further discussion scheduled with them at the end of January to go over technical details.

@marks

marks commented Jan 15, 2016

@betshsu what type of CMS/Census data and use cases are NIST and CNRI working on?

My colleagues and I have experience and ideas for this, but the technical details and user experience nearly always depend on the datasets needing to be "joined."

Quick and simple example: http://www.opendatanetwork.com/region/0400000US48-0500000US48453-0400000US51-0500000US51013/Texas-Travis_County_TX-Virginia-Arlington_County_VA/health -- toggle between the "Population" and "Health" tabs for Census/ACS and RWJF County Health Rankings data, respectively

@betshsu

betshsu commented Jan 15, 2016

@marks NIST/CNRI started with the datasets mentioned in the wiki here: http://hhs.ddod.us/wiki/Use_Case_31:_Explaining_utilization_with_demographic_data

From their prior work, they already had a live connection to the Census dataset; they downloaded the Dartmouth Atlas data and did some processing to make it compatible with their Census data and to make it available in an interactive way. The CMS data they pulled was the Geographic Variation PUF; they built their own database of the CMS data from the PUF to have a "live" connection to that data. There are some gaps where they could fit the datasets together, or where they had to hardcode some of the information rather than being able to query the databases on the fly. Unfortunately, I'm probably not the best person to communicate the technical details of the pilot project.

The work you and your colleagues have done is interesting and is another good resource for folks exploring how to link social variables with health status. Do you have any example studies you could share where your tool has been used? I think that could be very helpful for folks in thinking about the power and possibilities of these types of combined datasets.
