Getting and Cleaning Data Course Project
This file describes the goals of the project and the steps taken by the run_analysis.R script.
The project requires data from http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones which can be downloaded as a zip file here: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
Please create a directory named "Getting and Cleaning Data Peer Assessment", or if you would prefer another directory, adjust the code accordingly. Inside it, create a sub-directory called Original_Data and place the unzipped files there. This will allow you to separate the files you generate from the original working files. Then set "Getting and Cleaning Data Peer Assessment" as your working directory.
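The setup above can be sketched in R roughly as follows. This is only an illustration: the helper name setup_project_dir and its arguments are hypothetical and are not part of run_analysis.R, and the download step is switched off by default.

```r
# Hypothetical helper (not part of run_analysis.R): creates the project
# directory and its Original_Data sub-directory, and optionally downloads
# and unzips the data set into Original_Data.
setup_project_dir <- function(base = "Getting and Cleaning Data Peer Assessment",
                              download = FALSE) {
  original <- file.path(base, "Original_Data")
  dir.create(original, recursive = TRUE, showWarnings = FALSE)

  if (download) {
    url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
    zip_path <- file.path(base, "UCI_HAR_Dataset.zip")  # assumed local file name
    download.file(url, destfile = zip_path, mode = "wb")
    # The archive may unpack into its own top-level folder; move the files
    # if the script expects them directly under Original_Data.
    unzip(zip_path, exdir = original)
  }
  invisible(original)
}

# Usage (not run here):
# setup_project_dir(download = TRUE)
# setwd("Getting and Cleaning Data Peer Assessment")
```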
Source the run_analysis.R script. This will do the operations in the project description, in 5 steps:
1). Use rbind to combine the appropriate test and train data sets. For example, the X_test.txt file and X_train.txt file will be combined to create one data set containing the X data (observations). There should be three combined data sets when this step is done.
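As a rough illustration of step 1, the sketch below stacks small made-up data frames with rbind. The toy values stand in for what read.table() would return for the test and train files; the names dataX, yLabel, and subjects follow the ones used later in this file, while the input names are assumptions.

```r
# Toy stand-ins for the test and train files; in the real project these
# would come from read.table() on the files under Original_Data.
x_test        <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))
x_train       <- data.frame(V1 = c(0.3, 0.4, 0.5), V2 = c(1.3, 1.4, 1.5))
y_test        <- data.frame(V1 = c(1, 2))
y_train       <- data.frame(V1 = c(3, 1, 2))
subject_test  <- data.frame(V1 = c(2, 2))
subject_train <- data.frame(V1 = c(1, 1, 3))

# Stack test rows on top of train rows. The order (test, then train) must be
# the same for all three so that rows still line up across the data sets.
dataX    <- rbind(x_test, x_train)
yLabel   <- rbind(y_test, y_train)
subjects <- rbind(subject_test, subject_train)
```

All three results have the same number of rows, which is what allows them to be column-bound later.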
2). In this part of the script, run_analysis.R will take the features.txt file and extract, as a numeric vector, the indices of the features whose names include mean() and std(). Note that the number of columns in this subset will vary depending on the regular expression used. I chose to use mean() and std(), resulting in 79 columns. This avoids using the angle measurements such as gravityMean. Next we will use this numeric vector of feature indices to subset the data (X files) to the matching columns. This should result in a new X dataset where the only columns are those including mean() and std(). The names of the columns will then be properly formatted.
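A minimal sketch of step 2 on a made-up feature list is shown below. The exact regular expression and the name clean-up used by run_analysis.R are not given in this file, so the pattern "mean|std" and the gsub calls here are assumptions chosen for illustration.

```r
# Toy version of features.txt: column 1 is the index, column 2 the name.
features <- data.frame(
  V1 = 1:6,
  V2 = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
         "fBodyAcc-meanFreq()-X", "angle(X,gravityMean)", "tBodyAcc-max()-X"),
  stringsAsFactors = FALSE
)

# Assumed pattern: case-sensitive "mean" or "std". Being case-sensitive,
# it does not match "gravityMean" in the angle features.
keep <- grep("mean|std", features$V2)

# Toy X data with one column per feature (2 rows, 6 columns).
dataX <- as.data.frame(matrix(seq_len(12), nrow = 2))

# Keep only the matching columns, then tidy the names (assumed clean-up:
# drop "()" and replace "-" with "_").
dataX <- dataX[, keep]
clean_names <- gsub("-", "_", gsub("\\(\\)", "", features$V2[keep]))
names(dataX) <- clean_names
```

In this toy example a stricter pattern such as "mean\\(\\)|std\\(\\)" would also drop the meanFreq() feature, which is why the number of columns depends on the regular expression.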
3). Next we will take the activity_labels.txt file and format the text in column 2 so it is ready for use. Each of these labels corresponds to a numeric value in our y dataset from earlier, for example 1 = "walking". We add a column to the yLabel data frame and then, using a cascading series of "if else" statements, we replace the NA values in that column with the appropriate activity label. So if yLabel[1,1] = 1, then we set yLabel[1,2] = "walking". Finally we change the column names in the y data to "activity_ID" and "activity".
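Step 3 might look something like the sketch below, which uses nested ifelse() calls as one way to write the cascading "if else" logic. The lower-casing of the labels is an assumption based on the "walking" example above; the actual formatting in run_analysis.R may differ.

```r
# Activity labels as found in activity_labels.txt (index, label).
activity_labels <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Assumed formatting: lower case only.
labels <- tolower(activity_labels$V2)

# Toy y data: one activity code per observation.
yLabel <- data.frame(V1 = c(1, 5, 3, 6))

# Add an empty column, then fill it with a cascade of ifelse() calls.
yLabel$V2 <- NA
yLabel$V2 <- ifelse(yLabel$V1 == 1, labels[1],
             ifelse(yLabel$V1 == 2, labels[2],
             ifelse(yLabel$V1 == 3, labels[3],
             ifelse(yLabel$V1 == 4, labels[4],
             ifelse(yLabel$V1 == 5, labels[5],
             ifelse(yLabel$V1 == 6, labels[6], NA))))))

names(yLabel) <- c("activity_ID", "activity")
```

For codes 1 to 6, indexing the label vector directly with labels[yLabel$activity_ID] would give the same result in a single step.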
4). Then we will change the name of the subjects column to "subject". Now all the data is properly formatted, and we create a dataset called cleanResult by column binding the subjects, yLabel, and dataX datasets.
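The column binding in step 4 can be sketched on toy data as follows. The values and the two measurement column names are made up for illustration; only the names subjects, yLabel, dataX, and cleanResult come from the description above.

```r
# Toy inputs with matching row counts, as they would be after steps 1 to 3.
subjects <- data.frame(V1 = c(1, 1, 2))
yLabel <- data.frame(
  activity_ID = c(1, 2, 1),
  activity = c("walking", "walking_upstairs", "walking"),
  stringsAsFactors = FALSE
)
dataX <- data.frame(
  tBodyAcc_mean_X = c(0.1, 0.2, 0.3),
  tBodyAcc_std_X = c(1.1, 1.2, 1.3)
)

# Rename the single subjects column, then bind everything side by side.
names(subjects) <- "subject"
cleanResult <- cbind(subjects, yLabel, dataX)
```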
5). Finally, we want to generate the average of each variable by subject and activity. Taking our cleanResult dataset from earlier, we use the melt() function from the reshape2 package to produce a long-format dataset with subject, activity_ID, and activity as id variables. Then we call the dcast() function, which returns a wide-format dataset with one row per combination of subject, activity_ID, and activity, using the mean() function to aggregate the data. This final tidy dataset is written to the working directory as a text file.
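A small sketch of step 5 on toy data is given below. It assumes the reshape2 package is installed. The write.table arguments and the use of a temporary directory are assumptions made to keep the example free of side effects; the script itself writes to the working directory.

```r
library(reshape2)

# Toy cleanResult: subject 1 has two "walking" observations to average.
cleanResult <- data.frame(
  subject = c(1, 1, 1, 2),
  activity_ID = c(1, 1, 2, 1),
  activity = c("walking", "walking", "walking_upstairs", "walking"),
  tBodyAcc_mean_X = c(0.2, 0.4, 0.6, 0.8),
  tBodyAcc_std_X = c(1, 3, 5, 7),
  stringsAsFactors = FALSE
)

# Long format: one row per (subject, activity, variable) measurement.
ids <- c("subject", "activity_ID", "activity")
long <- melt(cleanResult, id.vars = ids)

# Wide format again, averaging each variable within subject and activity.
tidyMeans <- dcast(long, subject + activity_ID + activity ~ variable,
                   fun.aggregate = mean)

# Written to a temporary directory here; the output options are assumed.
out_file <- file.path(tempdir(), "final_Tidy_Means.txt")
write.table(tidyMeans, file = out_file, row.names = FALSE)
```

Here the two "walking" rows for subject 1 collapse into a single row holding their means, so the four input rows become three.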
At the conclusion of the script you should see a text file called "final_Tidy_Means.txt" in your working directory, containing the average of each variable for each activity and each subject in a tidy format.