Includes cumulative county-level positive COVID-19 cases delineated by race and ethnicity

BroadStreet-Health/Race-and-Ethnicity-Data

Broadstreet COVID-19 Data Project

Health Equity

In March 2020, COVID-19 spread through the United States. To address the lack of data during the pandemic, the Broadstreet team developed the COVID-19 Data Project. Initially, the project included daily numbers data. It has since expanded to include data on race and ethnicity (hereafter referred to as Health Equity), policy information, and various other intern-led research projects related to the COVID-19 pandemic. That information can be found here:

The Centers for Disease Control and Prevention (CDC) reports that social inequities and health system issues put racial and ethnic minority groups at increased risk of health and socioeconomic impacts from COVID-19 [1]. Reporting of race data began in early April 2020, with Louisiana being the first state to report [2]. Disparities in mortality were noticed immediately, and a June 2020 CDC report confirmed that this disparity was widespread [3]. The need for aggregate collection of race and ethnicity data became apparent, and the Health Equity team for the COVID-19 Data Project was created in mid-June. The process is ongoing with more counties being added to the dataset each month.

The purpose of the Health Equity team is to collect and report all race and ethnicity data for confirmed cases of COVID-19 across the United States. We publish our data monthly. We report this dataset to document the disparate impacts of COVID-19 on different racial and ethnic groups.

Since February 2021, we have noticed that many states have been reporting COVID-19 case rates by race and ethnicity less frequently. For example, in March, Oklahoma went from reporting new data daily to reporting it weekly. Some states have stopped reporting race and ethnicity data entirely. As a result, we have decided to stop reporting this data. The dates on which we stopped reporting for each state are listed in Table 3.

Structure of the Dataset

Definitions

  • Race = a social grouping of people who have similar physical or social characteristics that are generally considered by society as forming a distinct group [4]
  • Ethnicity = a social group that shares a common and distinctive culture, religion, language, or the like [5]

Dataset

Data Entry

Each day, volunteer interns on the Health Equity team enter counts of confirmed cases for the United States counties that are reporting race and ethnicity data. Each county has its own column with race and ethnicity data for each day. To limit errors and increase ease of use, weeks are broken up into separate sheets. The team relies on quality assurance of other team members to ensure there are no lapses in data. As an extra layer of assurance, an additional team member checks all counties and keeps a back-up of any county that does not keep historic data. For some counties, historic data is available and can be retrieved. Once a month, team leads do a sweep of data and engage team members to fill all gaps with historic data sets.

Data Source

Interns collect data from health department sites at both state and county levels, depending on the level of data collection and reporting that the county and state utilize.

Data Collection & Input

Data is collected every day for each county in the United States that is reporting at least some race and ethnicity data. To ensure all reporting counties are included, new recruits sweep for new counties at the start of each month.

Data is collected as a cumulative count rather than a daily count. When a county reports residential data separately, only the residential data is recorded. Multiracial and biracial categories are recorded in the “2+ races” category, with a note flagging the difference in categorization. When counties report “Refused to Answer,” this is recorded in “Other” and a note is added to that county’s data going forward.

When counties report data as a percentage, the percentage for each category is multiplied by the total number of confirmed cases to determine the raw data.
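The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name, the category labels, and the numbers are hypothetical, and rounding to the nearest whole case is an assumption, since the rounding rule the team used is not stated here.

```python
def counts_from_percentages(total_cases, percentages):
    """Convert reported percentages into approximate raw case counts.

    total_cases: total confirmed cases reported by the county.
    percentages: {category: percent of total cases}, e.g. 52.5 for 52.5%.
    Rounding to the nearest integer is an assumption made for this sketch.
    """
    return {
        category: round(total_cases * pct / 100)
        for category, pct in percentages.items()
    }


# Hypothetical county reporting 1,200 confirmed cases as percentages.
reported = {"White": 52.5, "Black/African American": 30.0, "Other": 17.5}
estimated = counts_from_percentages(1200, reported)
print(estimated)
```

Because each category is rounded independently, the estimated counts may not add up exactly to the reported total.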

When data is not available, often from lack of collection for that category, a - symbol is used to denote this. When 0 cases are reported in a category, a 0 is used as the placeholder.
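Anyone analyzing the data needs to keep these two placeholders apart, since a - means "not reported" while a 0 is a real count. A minimal Python sketch of that distinction follows; the helper name and the sample row are hypothetical and are not taken from the dataset.

```python
def parse_count(cell):
    """Return None for the '-' placeholder, otherwise the integer count."""
    text = cell.strip()
    if text == "-":
        return None  # category not collected or not reported
    return int(text)  # a reported 0 stays a genuine zero


# Hypothetical row: one reported count, one true zero, one missing category.
row = {"White": "120", "Asian": "0", "2+ Races": "-"}
parsed = {category: parse_count(value) for category, value in row.items()}

# Sum only the categories that were actually reported.
known_total = sum(v for v in parsed.values() if v is not None)
print(parsed, known_total)
```

Treating - as 0 would understate how much data is missing, so the sketch keeps missing values as None and excludes them from totals.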

Coding:

Table 1. Race Data

| Variable Name | Variable Description |
| --- | --- |
| White | A person having origins in any of the original peoples of Europe, the Middle East, and North Africa [6] |
| Black/African American | A person having origins in any of the black racial groups of Africa [6] |
| Asian | A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent [6] |
| American Indian/Alaska Native | A person having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment [6] |
| Native Hawaiian/Pacific Islander | A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands [6] |
| 2+ Races | A person with parents from two or more races [7] |
| Other | A person identifying as any other race |
| Unknown | A person whose race is not known, identified, and/or recorded |

Table 2. Ethnicity Data

| Variable Name | Variable Description |
| --- | --- |
| Hispanic | A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race [8] |
| Non-Hispanic | A person not of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race |
| Not Specified | A person whose ethnicity is not known, identified, and/or recorded |

*Ethnicity data is inclusive of all race categories.

Table 3. State-Specific Information

| State | Data Lag | Historic Data | Reporting Anomalies | Date Stopped Reporting |
| --- | --- | --- | --- | --- |
| Alabama | Not updated on weekends | No | Race: only white, black, and other | May 12, 2021 |
| Arizona | - | Yes | - | May 12, 2021 |
| California | Yes | No | Each county updates differently (use county websites) | May 12, 2021 |
| Colorado | County Variation: some update irregularly | - | - | May 12, 2021 |
| Delaware | - | - | - | June 30, 2021 |
| Florida | - | No | Automated | June 1, 2021 |
| Georgia | - | No | Automated; does not report American Indian, Native Hawaiian, 2+ races | May 12, 2021 |
| Idaho | - | - | - | May 12, 2021 |
| Illinois | - | Yes (about 10 days) | Automated; Race: no multiracial; Ethnicity: only Hispanic | May 12, 2021 |
| Indiana | - | No | Data entered through scripts | May 12, 2021 |
| Iowa | - | - | - | |
| Kansas | - | - | - | May 12, 2021 |
| Kentucky | Yes | Yes | Each county updates differently (use county websites) | May 12, 2021 |
| Louisiana | Updates only on Wednesdays | No | Race: only white, black, other, and unknown | May 12, 2021 |
| Maryland | County Variation: some update weekly, irregularly, or not at all | No | - | May 12, 2021 |
| Michigan | Not updated on Sundays | No | Race: Asian and Pacific Islander are combined; Ethnicity: no data | May 12, 2021 |
| Mississippi | - | Yes | Race: no multiracial or Hawaiian; data is separated by ethnicity | May 12, 2021 |
| Missouri | Daily updates with a 3-day lag | No | - | May 12, 2021 |
| Nebraska | - | - | - | May 12, 2021 |
| Nevada | County Variation: some update weekly, irregularly, or not at all | - | - | May 12, 2021 |
| New Mexico | County Variation: some update weekly, irregularly, or not at all | No | - | May 12, 2021 |
| North Carolina | - | No | Does not report Native Hawaiian or 2+ races | May 12, 2021 |
| Ohio | - | No | Race & Ethnicity: combines Refused to Answer and Unknown into “Unknown” | May 12, 2021 |
| Oklahoma | - | No | Race: Multiracial and Other combined into “Other,” no Hawaiian; Ethnicity: no data | May 12, 2021 |
| Oregon | - | - | - | May 12, 2021 |
| Pennsylvania | County Variation: some update irregularly | No | - | May 12, 2021 |
| South Carolina | - | Yes | Race: does not include multiracial; combines Asian, American Indian, and Native Hawaiian into “Asian” | June 30, 2021 |
| Tennessee | County Variation: some update on a weekly basis | No | Updates per 100,000; American Indian and Alaskan combined; Other and Multiracial combined | June 30, 2021 |
| Texas | County Variation: some update on a weekly basis | No | Each county updates differently | May 12, 2021 |
| Virginia | - | Yes | Automated; reports by health district but is converted into county level (see methods below) | May 12, 2021 |
| Washington | - | - | - | May 12, 2021 |
| West Virginia | - | No | Race: only white, black, and other | June 30, 2021 |
| Wisconsin | - | - | Automated; does not report Native Hawaiian/Pacific Islander or Other | May 12, 2021 |

Methodology

Challenges

Reporting Method Variation

The first challenge the team encountered was the discrepancies in reporting between counties. This includes the reporting of data as both raw data counts and percentages of the total confirmed cases. In addition, there are variations in categorization and reporting. For instance, some counties report 2+ races as biracial or multiracial or do not report one of our designated categories at all.

Combining Counties into Health Districts

Another challenge is that Virginia reports race and ethnicity case numbers at the health district level rather than the county level. A health district is a combination of multiple counties. When we asked the Virginia Department of Health (VDH) about this, they stated that Sections 32.1-36, 32.1-38, and 32.1-41 of the Code of Virginia require the VDH to protect the anonymity of individuals.

This was an issue because health districts do not have FIPS codes, which are unique codes that identify U.S. states and counties. Because FIPS codes are important for data analysis, we converted the health district data to the county level.

In order to resolve this issue, we determined the proportion of each race and ethnicity that is present in each county of the health district, using the American Community Survey data from 2018. Using these proportions, we calculated the approximate number of cases by race and ethnicity in each county of Virginia. Our uploaded data currently reports the data for Virginia at a county level, but it is an approximation so that must be taken into consideration when analyzing the data.
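The apportionment above can be sketched as follows. This is one reading of the method, not the team's actual code: it assumes that, for each race or ethnicity group, a county receives the share of the district's cases equal to its share of the district's population in that group. The function name, county labels, and population figures are hypothetical placeholders rather than American Community Survey values.

```python
def apportion_district_cases(district_cases, county_populations):
    """Split district-level case counts across counties by population share.

    district_cases: {group: cases} reported for one health district.
    county_populations: {county: {group: population}} for its counties.
    Returns {county: {group: estimated cases}} as floats (approximations).
    """
    estimates = {county: {} for county in county_populations}
    for group, cases in district_cases.items():
        # Total population of this group across the whole district.
        district_pop = sum(
            pops.get(group, 0) for pops in county_populations.values()
        )
        for county, pops in county_populations.items():
            share = pops.get(group, 0) / district_pop if district_pop else 0.0
            estimates[county][group] = cases * share
    return estimates


# Hypothetical district made up of two counties.
district_cases = {"White": 100, "Black": 50}
county_populations = {
    "county_A": {"White": 3000, "Black": 1000},
    "county_B": {"White": 1000, "Black": 1000},
}
county_estimates = apportion_district_cases(district_cases, county_populations)
print(county_estimates)
```

The results are fractional estimates, which is a reminder that county-level figures derived this way are approximations rather than observed counts.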

Data Releases

The next challenge interns face is the complexity of a technology-based reporting system. Throughout the project, there have been technical difficulties, including sites crashing and unclear reporting times due to test updates. There is also wide variation in how often counties report, ranging from weekly to hourly updates. For that reason, we record data daily and note which counties report less frequently.

In addition, our data is currently in the long format. We have SAS code available on our GitHub to convert this data into wide format. It can be found under the repository titled “open-source-contributions” under the file named “Race-and-Ethnicity ConvertorCode”.
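For readers who do not use SAS, the same long-to-wide reshaping can be sketched in Python. This is not a port of the project's converter: the column names ("date", "fips", "category", "cases") and the sample rows are hypothetical and should be adapted to the actual file layout.

```python
from collections import defaultdict


def long_to_wide(rows, key_fields=("date", "fips"),
                 name_field="category", value_field="cases"):
    """Pivot long rows (one per category) into wide rows (one per key)."""
    wide = defaultdict(dict)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # Each category becomes its own column in the wide row.
        wide[key][row[name_field]] = row[value_field]
    return [
        {**dict(zip(key_fields, key)), **values}
        for key, values in wide.items()
    ]


# Hypothetical long-format rows for a single county and date.
long_rows = [
    {"date": "2021-05-12", "fips": "00000", "category": "White", "cases": 120},
    {"date": "2021-05-12", "fips": "00000", "category": "Asian", "cases": 0},
    {"date": "2021-05-12", "fips": "00000", "category": "2+ Races", "cases": None},
]
wide_rows = long_to_wide(long_rows)
print(wide_rows)
```

Each output row holds one date and county, with one column per race or ethnicity category.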

References

  1. Centers for Disease Control and Prevention. Health Equity Considerations and Racial and Ethnic Minority Groups. Centers for Disease Control and Prevention website. Accessed September 15, 2020. https://www.cdc.gov/coronavirus/2019-ncov/community/health-equity/race-ethnicity.html
  2. Villarosa L. ‘A Terrible Price’: The Deadly Racial Disparities of COVID-19 in America. The New York Times. April 29, 2020. Accessed September 15, 2020. https://www.nytimes.com/2020/04/29/magazine/racial-disparities-covid-19.html
  3. Stokes EK, Zambrano LD, Anderson KN, et al. Coronavirus Disease 2019 Case Surveillance -- United States, January 22-May 30, 2020. MMWR Morb Mortal Wkly Rep 2020;69:759-765. http://dx.doi.org/10.15585/mmwr.mm6924e2
  4. Barnshaw, J. Race. In Schaefer, Richard T., ed. Encyclopedia of Race, Ethnicity, and Society. 1. Thousand Oaks, CA: SAGE Publications; 2008:1091.
  5. Dictionary.com. Ethnicity. Dictionary.com website. Accessed August 20, 2020. https://www.dictionary.com/browse/ethnicity
  6. United States Census Bureau. Race. United States Census Bureau website. Accessed September 15, 2020. https://www.census.gov/topics/population/race/about.html
  7. Merriam-Webster. Biracial. Merriam-Webster website. Accessed August 20, 2020. https://www.merriam-webster.com/dictionary/biracial
  8. United States Census Bureau. Hispanic or Latino Origin. United States Census Bureau website. Accessed September 18, 2020. https://www.census.gov/quickfacts/fact/note/US/RHI725219

Suggested Citation

When using data images, downloaded data, or shared document formats, please attribute BroadStreet as well as the original source, when applicable. For examples and more information, review this article which answers the question "How do I cite BroadStreet?"

Contributors

Tom Schmitt, PhD, Tracy Flood, MD, PhD, Sabrine Benzakour, Aisha Saleem, Sydney Myers. A full list of the Broadstreet Covid-19 Data Project volunteers can be found here: https://covid19dataproject.org/team-2/
