Getting and Cleaning Data Course Project

This file describes the goals of the project and the steps taken by the run_analysis.R script.

The project requires data from http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones, which can be downloaded as a zip file here: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

Please create a directory named "Getting and Cleaning Data Peer Assessment" (or, if you would prefer another name, adjust the code accordingly) and set it as your working directory. Within it, create a sub-directory called Original_Data and place the unzipped files there. This will allow you to separate the files you generate from the original data.
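A minimal setup sketch; the directory names follow this README, so adjust them if you unzip elsewhere:

```r
# Assumes the zip has already been downloaded and unzipped; the relative
# paths below match the layout described above and are otherwise assumptions.
setwd("Getting and Cleaning Data Peer Assessment")
if (!dir.exists("Original_Data")) dir.create("Original_Data")
# Place the unzipped "UCI HAR Dataset" folder inside Original_Data/
```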

Source the run_analysis.R script. This performs the operations in the project description in 5 steps.

1). Use rbind to combine the corresponding test and train data sets. For example, the X_test.txt and X_train.txt files are combined to create one data frame containing the X data (observations). The same is done for the y and subject files, so there are three combined data sets when this step is done.
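A sketch of this step, assuming the unzipped "UCI HAR Dataset" folder sits under Original_Data/ (the object names here are illustrative, not necessarily those used in run_analysis.R):

```r
base <- "Original_Data/UCI HAR Dataset"

# Combine test and train observations, activity codes, and subject IDs
dataX    <- rbind(read.table(file.path(base, "test",  "X_test.txt")),
                  read.table(file.path(base, "train", "X_train.txt")))
yLabel   <- rbind(read.table(file.path(base, "test",  "y_test.txt")),
                  read.table(file.path(base, "train", "y_train.txt")))
subjects <- rbind(read.table(file.path(base, "test",  "subject_test.txt")),
                  read.table(file.path(base, "train", "subject_train.txt")))
```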

2). In this part of the script, run_analysis.R takes the features.txt file and extracts, as a numeric vector of column indices, the features whose names include mean() and std(). Note that the number of columns in this subset will vary depending on the regular expression used; I chose to match mean() and std(), resulting in 79 columns, which avoids the angle measurements such as gravityMean. Next, the numeric vector of feature indices is used to subset the combined X data so that the only remaining columns are those including mean() and std(). The column names are then properly formatted.
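Continuing from the sketch above, here is one plausible regular expression consistent with the 79-column count reported here (it keeps the mean(), meanFreq(), and std() features while excluding the angle(...) variables); the exact expression in run_analysis.R may differ:

```r
features <- read.table(file.path(base, "features.txt"),
                       stringsAsFactors = FALSE)

# Numeric vector of the column indices to keep
keep <- grep("-(mean|std)", features$V2)

dataX <- dataX[, keep]              # subset to the mean()/std() columns
names(dataX) <- features$V2[keep]   # then format these names as desired
```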

3). Next we take the activity_labels.txt file and properly format the text in column 2 for use. Each of these labels corresponds to a numeric value in our y data set from earlier, for example 1 = "walking". We add a column to the yLabel data frame and, using a cascading series of ifelse() statements, replace each NA in that column with the appropriate activity label; so if yLabel[1,1] = 1, then yLabel[1,2] is set to "walking". Finally we change the column names in the y data to "activity_ID" and "activity".
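The same replacement can be sketched with a single lookup. This uses match() in place of the cascading ifelse() chain the script describes, but produces the same result:

```r
activityLabels <- read.table(file.path(base, "activity_labels.txt"),
                             stringsAsFactors = FALSE)
activityLabels$V2 <- tolower(activityLabels$V2)   # e.g. "WALKING" -> "walking"

# Map each numeric code in column 1 of yLabel to its text label
yLabel$activity <- activityLabels$V2[match(yLabel$V1, activityLabels$V1)]
names(yLabel) <- c("activity_ID", "activity")
```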

4). We then change the name of the subjects column to "subject". All the data is now properly formatted, and we create a data set called cleanResult by column-binding the subjects, yLabel, and dataX data sets.
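A sketch of the renaming and binding, continuing with the objects above:

```r
names(subjects) <- "subject"
cleanResult <- cbind(subjects, yLabel, dataX)
```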

5). Finally, we generate the average of each variable by subject and activity. Taking our cleanResult data set from earlier, we use the melt() function from the reshape2 package to produce a long-format data set with subject, activity_ID, and activity as id variables. We then call the dcast() function, which returns a wide-format data set using the same id variables and the mean() function to aggregate the data. This final tidy data set is written to the working directory as a text file.
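A sketch of the melt/dcast step; the output file name matches the one this README reports below:

```r
library(reshape2)

molten    <- melt(cleanResult,
                  id.vars = c("subject", "activity_ID", "activity"))
finalTidy <- dcast(molten,
                   subject + activity_ID + activity ~ variable,
                   fun.aggregate = mean)

write.table(finalTidy, "final_Tidy_Means.txt", row.names = FALSE)
```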

At the conclusion of the script you should see a text file called "final_Tidy_Means.txt" in your working directory, containing the average of each variable for each activity and each subject in a tidy format.
