UCI Human Activity Recognition dataset analysis
UCI's Machine Learning Repository maintains a collection of datasets available to the machine learning community for analysis and research.
As an introduction to data wrangling functions in R, the Johns Hopkins Getting and Cleaning Data course offered on Coursera sets an assignment to format, merge and summarize data tables from the [UCI Human Activity Recognition Using Smartphones Data Set](http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones).
- Create an R script named `run_analysis.R` that performs the steps below
  - Merges the `x_`, `y_` and `subject_` data files that contain, respectively, the observations, the activities being recorded and the individual user/subject identifier
  - Merges the `train` and `test` datasets, each of which contains a set of `x_`, `y_` and `subject_` data files
  - Assigns the appropriate column headers to all imported files
  - Matches up the activity descriptions with the activity numbers used in the `y_` data file
  - Of the 561 available columns, extracts only the ones that contain `mean()` or `std()` in the column name
  - Tidies up the dataset, i.e. normalizes it as much as possible
  - Creates another, smaller dataset with the average measurement when grouped by the subject identifier, the activity and the measured variable
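
The steps above can be sketched in base R roughly as follows. This is a minimal illustration rather than the contents of `run_analysis.R`: it builds small made-up data frames in place of the real `x_`, `y_` and `subject_` files (which would normally be loaded with `read.table`), and the helper `make_set` and the names `har`, `tidy` and `summary_df` are hypothetical.

```r
# Toy stand-ins for the real files; every value here is made up.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-meanFreq()-X")
activity_labels <- data.frame(activity_id = 1:2,
                              activity = c("WALKING", "SITTING"))

# Hypothetical helper: builds one of train/test by column-binding the
# subject_, y_ and x_ pieces and assigning the feature names as headers.
make_set <- function(subjects, activities, seed) {
  set.seed(seed)
  x <- as.data.frame(matrix(rnorm(length(subjects) * length(features)),
                            ncol = length(features)))
  names(x) <- features
  cbind(data.frame(subject = subjects, activity_id = activities), x)
}

train <- make_set(rep(1:2, each = 4), rep(c(1, 1, 2, 2), times = 2), seed = 1)
test  <- make_set(rep(3, 4), c(1, 1, 2, 2), seed = 2)

# Merge the train and test datasets by stacking their rows.
har <- rbind(train, test)

# Match activity numbers to their descriptions.
har$activity <- activity_labels$activity[
  match(har$activity_id, activity_labels$activity_id)]

# Keep only the columns whose name contains mean() or std().
# The escaped parentheses exclude names such as meanFreq().
measure_cols <- grep("(mean|std)\\(\\)", names(har), value = TRUE)
har <- har[, c("subject", "activity", measure_cols)]

# Long (normalized) format: one row per subject, activity and measurement.
tidy <- do.call(rbind, lapply(measure_cols, function(col) {
  data.frame(subject = har$subject, activity = har$activity,
             measurement = col, value = har[[col]])
}))

# Average value per subject, activity and measurement.
summary_df <- aggregate(value ~ subject + activity + measurement,
                        data = tidy, FUN = mean)
```

Packages such as `dplyr` or `data.table` could express the same steps more compactly. Base R is used here only to keep the sketch self-contained.
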
Finally, the `run_analysis.R` file, the datasets and a README file all go up on a GitHub repository, as follows:
- `run_analysis.R`
- `README.md`
- `CodeBook.md` describing the variables, data and transformations used to arrive at the final solutions
- Solution dataset #1 in long format (normalized): `HAR_tidy_dataset.txt`
- Solution dataset #2 with summary by subject, activity and measurement: `HAR_summary.txt`
- `\data\UCI HAR dataset\` with all the source files
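
As a rough sketch of how a solution dataset could be written to a text file and read back, assuming a plain `write.table` call: the data frame below is a made-up stand-in, and the file is written to a temporary directory rather than the repository.

```r
# Hypothetical stand-in for the summary dataset; values are made up.
summary_df <- data.frame(subject = c(1, 1),
                         activity = c("WALKING", "SITTING"),
                         measurement = "tBodyAcc-mean()-X",
                         value = c(0.27, 0.26))

# Write without row names so the file holds only the data columns.
out_file <- file.path(tempdir(), "HAR_summary.txt")
write.table(summary_df, out_file, row.names = FALSE)

# Read the file back; header = TRUE restores the column names.
check <- read.table(out_file, header = TRUE)
```
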