Merge APS investigation outcomes into DETECT data #39

mbcann01 · 2024-03-25T16:42:15Z (edited by corvidfox)

corvidfox · 2024-05-10T22:17:11Z

Initial review of the APS subject identifier data suggested that the client ID variable (client_id) would be usable as-is, with only a few "failed matches" where a single case (case_id) was associated with more than one client_id (each case should belong to exactly one subject). The data largely appeared clean and ready to use.
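The "failed match" check described above (more than one client_id per case_id) can be sketched as follows. This is a minimal Python illustration, assuming the identifiers are available as (case_id, client_id) pairs; the function name `find_failed_matches` is hypothetical.

```python
from collections import defaultdict


def find_failed_matches(records):
    """Return {case_id: sorted client_ids} for cases linked to more than one client_id.

    `records` is assumed to be an iterable of (case_id, client_id) pairs.
    """
    clients_by_case = defaultdict(set)
    for case_id, client_id in records:
        clients_by_case[case_id].add(client_id)
    # Each case should belong to exactly one subject; keep only the violations
    return {
        case_id: sorted(client_ids)
        for case_id, client_ids in clients_by_case.items()
        if len(client_ids) > 1
    }


if __name__ == "__main__":
    # Hypothetical rows: case "C2" is linked to two different client IDs
    rows = [("C1", "A"), ("C1", "A"), ("C2", "B"), ("C2", "D"), ("C3", "E")]
    print(find_failed_matches(rows))  # {'C2': ['B', 'D']}
```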

However, further examination found substantial typographical and other errors in the identifier fields, and a much larger number of "failed matches" for subject ID values than first observed. As a result, the data requires extensive cleaning before a fuzzy-matching algorithm can be applied, and a within-set APS subject ID will need to be created.
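One way to create a within-set subject ID is to link records that fuzzy-match and then assign one ID per connected group. The sketch below is an illustration under assumptions, not the project's method: it uses the standard-library `difflib.SequenceMatcher` ratio as the similarity score, only compares records that share a date of birth (blocking), treats pairs scoring at or above a hypothetical threshold of 0.85 as matches, and merges matched records with a union-find structure. The field names `name` and `dob` and the function `assign_subject_ids` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used to merge matched records into groups."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def assign_subject_ids(records, threshold=0.85):
    """Assign a within-set subject ID to each record (hypothetical 'name'/'dob' keys)."""
    uf = UnionFind(len(records))
    # Blocking: only compare records that share a date of birth
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[rec["dob"]].append(i)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            score = SequenceMatcher(None, records[i]["name"], records[j]["name"]).ratio()
            if score >= threshold:
                uf.union(i, j)
    # Number groups 1, 2, 3, ... in order of first appearance
    group_ids, ids = {}, []
    for i in range(len(records)):
        root = uf.find(i)
        group_ids.setdefault(root, len(group_ids) + 1)
        ids.append(group_ids[root])
    return ids


if __name__ == "__main__":
    records = [
        {"name": "jane doe", "dob": "1940-01-02"},
        {"name": "jane d0e", "dob": "1940-01-02"},  # typo in name
        {"name": "john smith", "dob": "1940-01-02"},
        {"name": "jane doe", "dob": "1951-07-30"},  # same name, different DOB
    ]
    print(assign_subject_ids(records))  # [1, 1, 2, 3]
```

Exact-DOB blocking keeps the number of comparisons small but will miss pairs where the date of birth itself contains a typo; pairs near the threshold would still need the manual review described in this thread.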

Cleaning of APS data is underway.

Name fields:

"Unknown" or equivalent values ("none", "not applicable", "don't know", etc.)
Numbers in name fields
Nicknames or comments in name fields
Titles in name fields
Suffixes in name fields
Consistent use of white space (single spaces between words, no leading/trailing white space)
Use of hyphens, single quotes, and double quotes
Unexpected/non-standard characters
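Several of the name-field checks listed above can be combined into a single standardization function. This Python sketch is illustrative only: the placeholder and title lists are hypothetical, it folds accented letters to ASCII (a design choice that may not suit every data set), it drops quoted or parenthesized nicknames rather than preserving them, and it leaves suffixes in place.

```python
import re
import unicodedata

# Hypothetical lookup lists for illustration only
UNKNOWN_VALUES = {"unknown", "none", "not applicable", "n/a", "don't know", "dont know"}
TITLES = {"mr", "mrs", "ms", "miss", "dr"}
NON_STANDARD = re.compile(r"[^a-z '\-]")  # anything except a-z, space, apostrophe, hyphen


def clean_name(raw):
    """Standardize one name value; return None for 'unknown'-type placeholders."""
    if raw is None:
        return None
    # Fold accented letters to ASCII (e.g. "José" -> "Jose")
    text = "".join(
        ch for ch in unicodedata.normalize("NFKD", raw) if not unicodedata.combining(ch)
    )
    # Map curly single quotes to a straight apostrophe, then lowercase
    text = text.replace("\u2018", "'").replace("\u2019", "'").lower()
    # Single spaces between words, no leading/trailing whitespace
    text = re.sub(r"\s+", " ", text).strip()
    if text in UNKNOWN_VALUES:
        return None
    # Drop quoted or parenthesized nicknames/comments
    text = re.sub(r'"[^"]*"|\([^)]*\)', " ", text)
    # Drop digits, which should not appear in names
    text = re.sub(r"\d", "", text)
    # Drop a leading title such as "Mrs." (trailing period optional)
    tokens = text.split()
    if tokens and tokens[0].rstrip(".") in TITLES:
        tokens = tokens[1:]
    text = " ".join(tokens)
    # Remove remaining non-standard characters, then tidy spacing around hyphens
    text = NON_STANDARD.sub("", text)
    text = re.sub(r"\s*-\s*", "-", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text or None
```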

Potentially valuable information (such as "female" when a name was given as "unknown female", or a suffix trimmed from a name value) is being moved to a comment field so that it remains available during manual review of fuzzy-match pairs.
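A sketch of moving such information into a comment field, assuming a single free-text name value as input. The suffix list, the sex words, and the function name `split_name_and_comment` are hypothetical; quoted or parenthesized text is treated as a nickname or comment.

```python
import re

# Hypothetical lists for illustration only
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
SEX_WORDS = {"female", "male"}
NICKNAME = re.compile(r'"([^"]*)"|\(([^)]*)\)')


def split_name_and_comment(raw):
    """Return (name, comment), moving suffixes, nicknames, and sex hints to the comment."""
    notes = []
    # Pull out quoted or parenthesized nicknames/comments before tokenizing
    for quoted, paren in NICKNAME.findall(raw):
        notes.append((quoted or paren).strip())
    text = NICKNAME.sub(" ", raw)
    kept = []
    for token in text.lower().replace(",", " ").split():
        bare = token.rstrip(".")
        if bare in SUFFIXES or bare in SEX_WORDS:
            notes.append(bare)  # keep for manual review, not for matching
        elif bare == "unknown":
            continue  # placeholder, carries no name information
        else:
            kept.append(token)
    name = " ".join(kept) or None
    comment = "; ".join(n for n in notes if n) or None
    return name, comment
```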

Additionally, some exploration of address values has been completed.

ZIP Code/state validation (using publicly available USPS data indicating which states each ZIP Code covers)
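The ZIP Code/state check can be expressed as a lookup against a table that maps each 5-digit ZIP Code to the set of states it covers. In this Python sketch the table contents are placeholder values (not real USPS data) and the function names are hypothetical. A ZIP+4 value is reduced to its first five digits; anything that is not 5 or 9 digits is reported as unknown rather than guessed.

```python
import re

# Placeholder lookup for illustration only; a real table would be built from
# the USPS-derived ZIP Code/state data referenced above.
EXAMPLE_ZIP_STATES = {
    "11111": {"TX"},
    "22222": {"OK", "TX"},  # a ZIP Code that spans two states
}


def zip5(raw):
    """Return the 5-digit ZIP Code from 'NNNNN', 'NNNNN-NNNN', or 'NNNNNNNNN'; else None."""
    m = re.fullmatch(r"\s*(\d{5})(?:-?\d{4})?\s*", str(raw))
    return m.group(1) if m else None


def check_zip_state(zip_code, state, zip_states):
    """Classify a ZIP Code/state pair as 'valid', 'mismatch', or 'unknown_zip'."""
    z = zip5(zip_code)
    if z is None or z not in zip_states:
        return "unknown_zip"
    st = (state or "").strip().upper()
    return "valid" if st in zip_states[z] else "mismatch"


if __name__ == "__main__":
    for zc, st in [("11111", "tx"), ("22222-1234", "OK"), ("11111", "OK"), ("1111", "TX")]:
        print(zc, st, check_zip_state(zc, st, EXAMPLE_ZIP_STATES))
```

Note that ZIP Codes stored as numbers can lose a leading zero (02134 becomes 2134); this sketch flags those as unknown instead of silently padding them.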

corvidfox · 2024-05-17T21:08:39Z

corvidfox · 2024-05-24T21:58:30Z

corvidfox · 2024-06-07T22:03:26Z

As of today, further progress has been made in cleaning/standardizing the APS client data.

Address field cleaning is complete except for street address and street unit cleaning/validation. Street addresses are a bear.
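Separating the street unit from the street address is one piece of that work. The Python sketch below is a rough illustration under stated assumptions: it recognizes only a small, hypothetical subset of unit designators and street suffixes (modeled loosely on USPS Publication 28 abbreviations), expects the unit at the end of the line, and requires the unit number to contain a digit or be a single letter so that an address such as "12 Suite Lane" is not split.

```python
import re

# Small illustrative subsets, not complete USPS lists
UNIT_WORDS = {"APARTMENT": "APT", "APT": "APT", "UNIT": "UNIT", "SUITE": "STE",
              "STE": "STE", "LOT": "LOT", "ROOM": "RM", "RM": "RM", "#": "#"}
STREET_SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD",
                   "DRIVE": "DR", "LANE": "LN", "BOULEVARD": "BLVD"}

UNIT_RE = re.compile(
    r"^(?P<street>.*?)[\s,]+"
    r"(?P<kind>APARTMENT|APT|UNIT|SUITE|STE|LOT|ROOM|RM|#)"
    r"\s*\.?\s*#?\s*"
    r"(?P<num>[A-Z0-9-]*\d[A-Z0-9-]*|[A-Z])$"
)


def split_street_unit(raw):
    """Return (street, unit); unit is None when no trailing unit is found."""
    # Uppercase, single spaces, no trailing comma/period
    text = re.sub(r"\s+", " ", raw.upper()).strip().rstrip(",.")
    street, unit = text, None
    m = UNIT_RE.match(text)
    if m:
        street = m.group("street")
        unit = f"{UNIT_WORDS[m.group('kind')]} {m.group('num')}"
    # Abbreviate a trailing street suffix ("STREET" -> "ST")
    tokens = street.split()
    if tokens and tokens[-1] in STREET_SUFFIXES:
        tokens[-1] = STREET_SUFFIXES[tokens[-1]]
    return " ".join(tokens), unit
```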

corvidfox · 2024-06-14T21:52:15Z

corvidfox · 2024-06-21T22:11:34Z

mbcann01 added the "data wrangling" label (A data wrangling task) on Mar 25, 2024

mbcann01 mentioned this issue in Preliminary DETECT R01 analysis #40 (open, 6 tasks) on Mar 27, 2024

mbcann01 assigned corvidfox May 2, 2024
