cbrodows/GettingAndCleaningData

GettingAndCleaningData

Course Project for Getting and Cleaning Data

Enclosed is my submission for the "Getting And Cleaning Data" Course Project.

How to run:

  1. Retrieve the "run_analysis.R" script from the repository. If you want to run it yourself, it is easiest to also retrieve the input data files from the repo and keep them in the same directory as the script.
  2. Set your R working directory to the directory containing both the R script and the input data. Example: if you pulled everything to the /home/derp/RProject directory, enter "setwd("/home/derp/RProject")" in the RStudio console.
  3. Source the R script so that its functions are available: "source("run_analysis.R")" inside RStudio.
  4. Execute the script by calling the "main" function in an R/RStudio console: "main()". This writes a new, tidy data set called "tidyDataOutputSet.txt" to the directory containing the R script. If you want a different output name, pass that file name as the first function argument.

The general layout of the script is as follows:

  1. Read the following files with read.table():
     - "subject_test.txt"
     - "subject_train.txt"
     - "y_test.txt"
     - "y_train.txt"
     - "X_test.txt"
     - "X_train.txt"
  2. Perform a column-stack of the 3 test data sets to make a master test set.
  3. Perform a column-stack of the 3 train data sets to make a master train set.
  4. Perform a row-stack of the 2 master sets to create a single merged set.
  5. Read in the "activity_levels.txt" file with read.table(). This provides the names of the activities corresponding to the numeric keys in the merged set.
  6. Map the numeric keys in the 2nd column of the merged set to the activity names, and replace the keys with those names.
  7. Read the "features.txt" file to determine the name of each of the data fields.
  8. Prepend "SubjectNumber" and "ActivityLabel" to that data field name list, because those fields appear first in the merged set.
  9. Apply the name list to the merged set data frame using the colnames function.
  10. Using regular expressions, discard from the merged set any data fields whose names do not contain "mean()" or "std()".
  11. Sort the data by "SubjectNumber" and "ActivityLabel".
  12. For each subject/activity combination, compute the mean of each data field and concatenate all of those observations into a single data frame. This is the tidy data set.
  13. Write the output data set to the output file.
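
The layout above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the actual contents of "run_analysis.R": the tiny inline data sets, the activity names, the feature names, and every variable name are invented for the example, and the use of cbind, rbind, match, grepl, order, and aggregate is one plausible way to carry out each step. The sketch writes to a temporary file rather than "tidyDataOutputSet.txt".

```r
# Toy stand-ins for the real input files; all values are invented.
subject_test  <- read.table(text = "2\n2",    col.names = "subject")
y_test        <- read.table(text = "1\n2",    col.names = "activity")
x_test        <- read.table(text = "0.1 0.3 9\n0.2 0.4 9")
subject_train <- read.table(text = "1\n1\n1", col.names = "subject")
y_train       <- read.table(text = "2\n1\n1", col.names = "activity")
x_train       <- read.table(text = "0.5 0.7 9\n0.6 0.8 9\n0.8 1.0 9")

# Hypothetical activity and feature tables (key in V1, name in V2).
activities <- read.table(text = "1 WALKING\n2 SITTING",
                         stringsAsFactors = FALSE)
features   <- read.table(
  text = "1 tBodyAcc-mean()-X\n2 tBodyAcc-std()-X\n3 tBodyAcc-energy()-X",
  stringsAsFactors = FALSE)

# Column-stack each group, then row-stack test and train.
test_set  <- cbind(subject_test, y_test, x_test)
train_set <- cbind(subject_train, y_train, x_train)
merged    <- rbind(test_set, train_set)

# Replace the numeric activity keys in column 2 with activity names.
merged[[2]] <- activities$V2[match(merged[[2]], activities$V1)]

# Name the columns: two identifier fields first, then the feature names.
colnames(merged) <- c("SubjectNumber", "ActivityLabel", features$V2)

# Keep the identifier columns plus fields containing "mean()" or "std()".
keep      <- grepl("mean\\(\\)|std\\(\\)", colnames(merged))
keep[1:2] <- TRUE
merged    <- merged[, keep]

# Sort by subject, then activity.
merged <- merged[order(merged$SubjectNumber, merged$ActivityLabel), ]

# Mean of every remaining field for each subject/activity combination.
tidy <- aggregate(merged[, -(1:2)],
                  by = list(SubjectNumber = merged$SubjectNumber,
                            ActivityLabel = merged$ActivityLabel),
                  FUN = mean)
tidy <- tidy[order(tidy$SubjectNumber, tidy$ActivityLabel), ]

# Write the result; a temporary file keeps the sketch side-effect free.
out_file <- tempfile(fileext = ".txt")
write.table(tidy, file = out_file, row.names = FALSE)
```

With the toy input, "tidy" has one row per subject/activity pair and only the two identifier columns plus the "mean()" and "std()" fields; the "energy()" field is dropped by the regular expression filter.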
