
TIP: Keep variables, data table headers, AND code values in one_word format #11

Open
adanieljohnson opened this issue Jan 30, 2019 · 2 comments
Labels
Tips Tips for the community

Comments

@adanieljohnson

Best practice for naming columns in data tables is to give each column a one-word or snake_case title. This makes it easier to refer to the columns as variables in R code. I learned that the same applies to the coded values entered in the columns.
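A minimal R sketch of the difference, using a hypothetical toy data frame (the `scores` data and its values are invented for illustration): a column name containing a space has to be wrapped in backticks, while a snake_case name can be typed as-is.

```r
# Hypothetical toy data: one column name with a space, one in snake_case
scores <- data.frame(
  "Basic Criteria" = c(3, 4, 5),
  writing_quality  = c(2, 4, 4),
  check.names = FALSE   # keep the space in the first name exactly as typed
)

# A name containing a space must be wrapped in backticks
basic_mean <- mean(scores$`Basic Criteria`)

# A snake_case name can be typed directly
writing_mean <- mean(scores$writing_quality)

print(c(basic_mean, writing_mean))
```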

At our last meeting I said I was having trouble using cor.test to get Pearson correlations on word frequencies. It worked for one part of my dataset, but other subsets failed to run properly. Jerid pointed out that I had used text with spaces and punctuation as coded values in my CSV source file, and suggested recoding them as simpler one-word terms. I used Search/Replace in Excel to change my coding terms as follows:

  • "1. Basic Criteria" to "1_basic"
  • "2. Writing Quality" to "2_writing"
  • Etc.
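If the data is already loaded in R, the same recoding can be done in code instead of with Excel's Search/Replace. This is a hypothetical sketch: the `codes` vector, including the trailing space in its third entry, is invented to show how an invisible character makes two labels that look identical fail to match, and `recode_map` is an illustrative lookup table built from the two pairs listed above.

```r
# Hypothetical coded values as they might arrive from a CSV; note the
# trailing space in the third entry, an easy-to-miss "hidden typo"
codes <- c("1. Basic Criteria", "2. Writing Quality", "1. Basic Criteria ")

# Exact matching treats the trailing-space version as a different value
n_exact <- sum(codes == "1. Basic Criteria")   # 1, not 2

# Lookup table from the original labels to one-word codes
recode_map <- c(
  "1. Basic Criteria"  = "1_basic",
  "2. Writing Quality" = "2_writing"
)

# trimws() strips leading/trailing whitespace before the lookup;
# unname() drops the label names carried over by name-based indexing
clean_codes <- unname(recode_map[trimws(codes)])
print(clean_codes)   # "1_basic" "2_writing" "1_basic"
```

Any label that is missing from `recode_map` comes back as `NA`, which is a convenient way to spot values that still need attention.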

Either the extra spaces and punctuation were the problem or I had a hidden typo, but simplifying the coded terms solved it.
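Once the codes are one-word terms, running cor.test separately for each coded subset is straightforward. This is a sketch under assumptions: the post does not say which two variables were correlated, so `freqs` and its `coder_a` and `coder_b` columns are hypothetical stand-ins for two word-frequency measures, and the values are invented.

```r
# Hypothetical toy data: two word-frequency measures per row, tagged
# with one-word category codes like "1_basic" and "2_writing"
freqs <- data.frame(
  category = rep(c("1_basic", "2_writing"), each = 5),
  coder_a  = c(2, 4, 6, 8, 10, 1, 3, 2, 5, 4),
  coder_b  = c(1, 3, 7, 8, 9, 2, 3, 1, 6, 5)
)

# split() groups the rows by category code; each group gets its own
# Pearson correlation test
results <- lapply(split(freqs, freqs$category), function(d) {
  test <- cor.test(d$coder_a, d$coder_b, method = "pearson")
  c(r = unname(test$estimate), p_value = test$p.value)
})

# One row per category code, with columns r and p_value
summary_table <- do.call(rbind, results)
print(summary_table)
```

Note that a Pearson cor.test needs at least three complete pairs, so a very small subset will still fail no matter how it is coded.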

@medewitt
Contributor

@adanieljohnson great points and a good topic to discuss!

There is a base R function, make.names(), that converts "improper" R names (e.g. names containing spaces or other invalid characters) into syntactically valid R names. It is used like this:

names(your_data) <- make.names(names(your_data))

It replaces spaces and other invalid characters with periods, and prefixes an "X" to names that start with a digit. It is a quick way to get proper column names.
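For example, applied to labels like the ones in the original post (plus a hypothetical duplicated "Word Count" entry, added only to show the `unique` argument):

```r
# Labels from the original post, plus a hypothetical duplicate
raw_names <- c("1. Basic Criteria", "2. Writing Quality", "Word Count", "Word Count")

# Default: invalid characters (here, spaces) become "." and a leading
# digit gets an "X" prefix
make.names(raw_names)
# → "X1..Basic.Criteria" "X2..Writing.Quality" "Word.Count" "Word.Count"

# unique = TRUE also de-duplicates repeated names
make.names(raw_names, unique = TRUE)
# → "X1..Basic.Criteria" "X2..Writing.Quality" "Word.Count" "Word.Count.1"
```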

Additionally, Karl Broman and Kara Woo wrote a neat journal article on organizing data in spreadsheets, "Data Organization in Spreadsheets" (The American Statistician, 2018), which is a great reference. Both are avid R users AND biostats folks.

@francojc

francojc commented Jan 30, 2019

The janitor package also has a slew of functions for examining and cleaning data, including clean_names() to deal with non-conventional column names in data.frames.
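clean_names() works on column names; the coded values inside the columns need separate treatment. A minimal base-R sketch for that, where `to_snake()` is a hypothetical helper (not part of janitor or base R) that only roughly approximates snake_case conversion:

```r
# Hypothetical helper: rough snake_case conversion for coded *values*
to_snake <- function(x) {
  x <- tolower(trimws(x))            # lower-case and drop edge whitespace
  x <- gsub("[^a-z0-9]+", "_", x)    # each run of non-alphanumerics -> "_"
  gsub("^_+|_+$", "", x)             # trim any leading/trailing "_"
}

to_snake(c("1. Basic Criteria", "2. Writing Quality ", "Word Count"))
# → "1_basic_criteria" "2_writing_quality" "word_count"
```

Unlike make.names(unique = TRUE), this helper does not de-duplicate, which is what you want for values, where repeats are expected.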

@medewitt medewitt added the Tips Tips for the community label Feb 1, 2019