data cleaning course on Coursera from Johns Hopkins University
this script processes the dataset downloaded from
"https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
according to the instructions from Getting and Cleaning Data Course Project.
files should not be moved after unzipping the zip file; this script works only in the unchanged, unzipped folder of the zip file
features.txt: indicates what each column is in each data set
activity_labels.txt: indicates which number stands for which activity
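a quick way to confirm the folder is still unchanged is to check that the expected files exist before running the script. the sketch below is only an illustration: `expected_files` and `missing_files` are hypothetical names, and the train/ and test/ subfolder layout is an assumption about how the archive unzips.

```r
# Relative paths the script is assumed to need inside the unzipped folder.
# The train/ and test/ subfolders are an assumption about the archive layout.
expected_files <- c(
  "features.txt",
  "activity_labels.txt",
  file.path("train", "X_train.txt"),
  file.path("train", "y_train.txt"),
  file.path("train", "subject_train.txt"),
  file.path("test", "X_test.txt"),
  file.path("test", "y_test.txt"),
  file.path("test", "subject_test.txt")
)

# Hypothetical helper: return the expected files that are missing under `root`.
missing_files <- function(root = ".") {
  expected_files[!file.exists(file.path(root, expected_files))]
}

# Report anything that was moved or deleted relative to the working directory.
missing <- missing_files(".")
if (length(missing) > 0) {
  message("Missing files: ", paste(missing, collapse = ", "))
}
```

if the message lists any files, the working directory is probably not the unzipped folder, or files were moved after unzipping.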
y_train.txt & y_test.txt: indicate the activity for each row,
converted from numbers to descriptive names,
as indicated in activity_labels.txt
X_train.txt & X_test.txt: the data sets
subject_train.txt & subject_test.txt: indicate the subject for each row
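the conversion from activity numbers to descriptive names can be sketched with a lookup against the label table. this is a minimal illustration on toy data, not necessarily how the script does it; the column names `id`, `name` and `activity` are hypothetical, and in practice the tables would be read from the files above (for example with read.table).

```r
# Toy stand-in for activity_labels.txt: one row per activity number.
activity_labels <- data.frame(
  id = 1:3,
  name = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"),
  stringsAsFactors = FALSE
)

# Toy stand-in for y_train.txt / y_test.txt: one activity number per row.
y <- data.frame(activity = c(3, 1, 1, 2))

# Look up each number in the label table; match() preserves the row order of y,
# so the labels stay aligned with the rows of the data set.
y$activity <- activity_labels$name[match(y$activity, activity_labels$id)]
```

using match() rather than merge() avoids reordering the rows, which matters because the rows of the y, X and subject files correspond only by position.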
use grep to select the features that are means and standard deviations
split the data frame by subject and activity
use lapply to take the average of each column within each subset
convert the list back to a data frame
append Subjects and Activities back to the data frame
label the columns properly
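the steps above can be sketched on a toy data set as follows. this is an illustration under stated assumptions, not the script's actual code: the grep pattern, the toy values, and the object names (`keep`, `groups`, `keys`, `tidy`) are all hypothetical.

```r
# Toy feature names in the style of features.txt (the real file has many more).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")

# Toy measurements: one row per observation, one column per feature.
x <- data.frame(c(1, 3, 5, 7), c(2, 4, 6, 8), c(9, 9, 9, 9))
subject  <- c(1, 1, 2, 2)
activity <- c("WALKING", "WALKING", "WALKING", "WALKING")

# Keep only the mean and standard deviation features.
# The pattern is an assumption; it would exclude names such as meanFreq().
keep <- grep("-(mean|std)\\(\\)", features)
x <- x[, keep, drop = FALSE]
names(x) <- features[keep]

# Split by subject and activity; drop = TRUE discards empty combinations.
groups <- split(x, list(subject, activity), drop = TRUE)
keys <- split(
  data.frame(Subjects = subject, Activities = activity),
  list(subject, activity),
  drop = TRUE
)

# Average every column within each group, then bind the list into one matrix.
averages <- do.call(rbind, lapply(groups, colMeans))

# Append Subjects and Activities; check.names = FALSE keeps the feature names.
tidy <- data.frame(
  do.call(rbind, lapply(keys, function(k) k[1, ])),
  averages,
  check.names = FALSE
)
rownames(tidy) <- NULL
```

because `groups` and `keys` are split by the same factor, their elements come out in the same order, so each row of averages lines up with its subject and activity.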
describe the names of the variables in ZF_Submission.txt
the processed clean data set
this file