Best practice for naming columns in data tables is to give each column a one-word or snake_case title. This makes it easier to refer to the columns as variables. I learned this applies to the code values entered in the columns too.
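Here is a minimal sketch of the difference, using a made-up data frame (`scores` and its columns are hypothetical). A column name with a space has to be wrapped in backticks every time it is used. A snake_case name can be typed like any other variable.

```r
# Hypothetical data frame with one non-syntactic and one snake_case column name.
# check.names = FALSE stops data.frame() from "fixing" the name with a space.
scores <- data.frame(`Writing Quality` = c(3, 4, 5),
                     writing_quality   = c(3, 4, 5),
                     check.names = FALSE)

# A name containing a space must be quoted with backticks...
mean(scores$`Writing Quality`)  # 4

# ...while the snake_case name needs no quoting.
mean(scores$writing_quality)    # 4
```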
At our last meeting I said I was having trouble using cor.test to get Pearson correlations on word frequencies. I could calculate it for one part of my dataset, but other subsets failed to run properly. Jerid pointed out that I had used text with spaces and punctuation as the code values in my CSV source file, and he suggested re-coding them to simpler one-word terms. I used Search/Replace in Excel to change my coding terms as follows:
"1. Basic Criteria" to "1_basic"
"2. Writing Quality" to "2_writing"
Etc.
The cause was either the extra spaces and punctuation or a hidden typo, but simplifying the code terms solved the problem.
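You can also do the same recoding in R instead of Excel, which leaves the raw CSV untouched. The sketch below is an illustration under stated assumptions, not the workflow discussed at the meeting. The data frame `df`, its columns `category`, `freq_a`, and `freq_b`, and the lookup vector `recode_map` are all hypothetical. `trimws()` strips leading and trailing spaces, which are one common kind of hidden typo. Any label missing from the lookup becomes `NA`, so unexpected variants show up as errors instead of silently forming their own subset.

```r
# Hypothetical word-frequency data; labels mimic the original coding scheme,
# including one row with a stray trailing space (a "hidden typo").
df <- data.frame(
  category = c("1. Basic Criteria", "1. Basic Criteria", "1. Basic Criteria",
               "1. Basic Criteria ", "2. Writing Quality", "2. Writing Quality",
               "2. Writing Quality", "2. Writing Quality"),
  freq_a = c(1, 2, 3, 4, 2, 4, 6, 8),
  freq_b = c(2, 4, 5, 9, 1, 3, 2, 5)
)

# Hypothetical lookup from the original labels to simple one-word codes
recode_map <- c("1. Basic Criteria"  = "1_basic",
                "2. Writing Quality" = "2_writing")

# Trim stray whitespace, then recode; unmatched labels become NA
df$category <- unname(recode_map[trimws(df$category)])
stopifnot(!anyNA(df$category))  # fail loudly if any label was not recoded

# Run a Pearson correlation test separately for each coded subset
results <- lapply(split(df, df$category), function(d) {
  cor.test(d$freq_a, d$freq_b, method = "pearson")
})

# Named vector of correlation estimates, one per subset
estimates <- sapply(results, function(r) unname(r$estimate))
estimates
```

Printing `results[["2_writing"]]` shows the full test output for a single subset.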
Base R has a function, make.names(), that converts non-syntactic names (e.g. names containing spaces or other invalid characters) into syntactically valid R names. It looks like the following:
names(your_data) <- make.names(names(your_data))
This function replaces spaces and other invalid characters with periods and adds a leading "X" to names that start with a number. It is a quick way to get valid column names.
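A quick illustration with made-up names (`raw_names` and `survey` are hypothetical) shows what make.names() actually produces. Note that read.csv() already applies this conversion to column names by default, because its check.names argument defaults to TRUE. The explicit call matters mostly for data read in other ways or built with check.names = FALSE.

```r
# Hypothetical labels similar to the coding terms discussed above
raw_names <- c("1. Basic Criteria", "2. Writing Quality", "word count")

# Names starting with a digit get an "X" prefix; spaces become periods
make.names(raw_names)
# "X1..Basic.Criteria" "X2..Writing.Quality" "word.count"

# unique = TRUE de-duplicates names that collide after conversion
make.names(c("a b", "a.b"), unique = TRUE)
# "a.b" "a.b.1"

# Applying it to a data frame's column names
survey <- data.frame(`1. Basic Criteria` = 1:2, `word count` = 3:4,
                     check.names = FALSE)
names(survey) <- make.names(names(survey))
names(survey)
# "X1..Basic.Criteria" "word.count"
```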
Additionally, Karl Broman and Kara Woo wrote a neat journal article, "Data Organization in Spreadsheets" (The American Statistician, 2018), which is a great reference on organizing data in spreadsheets. Both are avid R users AND biostats folks.
The janitor package also has a slew of functions for examining and cleaning data, including clean_names() to deal with non-conventional column names in data.frames.