Course Project for Getting and Cleaning Data
This repository contains my submission for the "Getting and Cleaning Data" Course Project.
How to run:
- Retrieve the "run_analysis.R" script from the repository. If you want to run it yourself, it is easiest to also download the input data files from the repository into the same directory.
- Set your R working directory to the directory containing both the R script and the input data. Example: if you pulled everything to the /home/derp/RProject directory, enter setwd("/home/derp/RProject") in the RStudio console.
- Source the R script so that its functions are loaded: enter source("run_analysis.R") in the RStudio console.
- Execute the script by calling the "main" function in the R/RStudio console: main(). This writes a new, tidy data set called "tidyDataOutputSet.txt" to the directory containing the R script. If you want a different output name, pass that file name as the first function argument.
The general layout of the script is as follows:
- Read the following files with read.table():
  - "subject_test.txt"
  - "subject_train.txt"
  - "y_test.txt"
  - "y_train.txt"
  - "X_test.txt"
  - "X_train.txt"
- Column-stack the three test data sets (subject, activity, and measurement data, in that order) to make a master test set.
- Column-stack the three train data sets in the same order to make a master train set.
- Row-stack the two master sets to create a single merged set.
- Read in the "activity_labels.txt" file with read.table(). This provides the names of the activities corresponding to the numeric keys in the merged set.
- Replace the numeric activity keys in the second column of the merged set with the corresponding activity names.
- Read the "features.txt" file to determine the name of each of the data fields.
- Prepend "SubjectNumber" and "ActivityLabel" to that list of field names, because those two fields appear first in the merged set.
- Apply the name list to the merged set data frame using the colnames() function.
- Using regular expressions, discard from the merged set any measurement fields whose names do not contain "mean()" or "std()". The "SubjectNumber" and "ActivityLabel" fields are kept, since the later steps group by them.
- Sort the data by "SubjectNumber" and "ActivityLabel".
- For each subject/activity combination, compute the mean of each remaining data field and combine those results into a single data frame with one row per combination. This data frame is the tidy data set.
- Write the output data set to the output file.
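
The steps above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the contents of "run_analysis.R": the input values are made up, the inputs are supplied inline through the text argument of read.table() instead of being read from the real files, and all variable names are hypothetical. The output is written to a temporary directory so nothing in the project directory is overwritten.

```r
# Toy stand-ins for the input files (made-up values). With the real data,
# each of these would be read.table("<file name>") instead.
subject_test  <- read.table(text = "2\n2")
y_test        <- read.table(text = "1\n2")
x_test        <- read.table(text = "0.1 0.2 0.3\n0.3 0.4 0.5")
subject_train <- read.table(text = "1\n1")
y_train       <- read.table(text = "2\n2")
x_train       <- read.table(text = "0.5 0.6 0.7\n0.7 0.8 0.9")
activities    <- read.table(text = "1 WALKING\n2 SITTING",
                            stringsAsFactors = FALSE)
features      <- read.table(
  text = "1 tBodyAcc-mean()-X\n2 tBodyAcc-std()-X\n3 tBodyAcc-meanFreq()-X",
  stringsAsFactors = FALSE)

# Column-stack each group (subject, activity, measurements), then row-stack.
test_set  <- cbind(subject_test, y_test, x_test)
train_set <- cbind(subject_train, y_train, x_train)
merged    <- rbind(test_set, train_set)

# Replace the numeric activity keys in column 2 with the activity names.
merged[[2]] <- activities$V2[match(merged[[2]], activities$V1)]

# Name the columns: the two identifier fields first, then the feature names.
colnames(merged) <- c("SubjectNumber", "ActivityLabel", features$V2)

# Keep the identifier columns plus fields containing a literal "mean()" or
# "std()". The parentheses are escaped, so "meanFreq()" does not match.
keep <- grepl("mean\\(\\)|std\\(\\)", colnames(merged))
keep[1:2] <- TRUE
merged <- merged[, keep]

# Sort by subject and activity.
merged <- merged[order(merged$SubjectNumber, merged$ActivityLabel), ]

# Mean of every remaining field for each subject/activity combination.
tidy <- aggregate(merged[, -(1:2)],
                  by = list(SubjectNumber = merged$SubjectNumber,
                            ActivityLabel = merged$ActivityLabel),
                  FUN = mean)
tidy <- tidy[order(tidy$SubjectNumber, tidy$ActivityLabel), ]

# Write the result; the write.table() options here are an assumption.
out_file <- file.path(tempdir(), "tidyDataOutputSet.txt")
write.table(tidy, out_file, row.names = FALSE)
```

In this sketch the lookup with match() does the key-to-name mapping, and aggregate() stands in for the per-combination mean; the actual script may use different functions to reach the same result.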