GCD-Project

The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis.

An R script called run_analysis.R transforms a split dataset into a tidy dataset by:

Merging datasets.
Extracting specific columns.
Using descriptive variable names.

Notes:

Before running the script, download the dataset and unzip its files inside the project folder.
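As a quick sanity check before running the script, the sketch below lists any expected input file that is not yet in place. The paths are the ones used in the calls further down in this README; the helper name missingFiles is hypothetical and not part of run_analysis.R.

```r
# Input paths as they appear in the calls shown in this README.
required_files <- c(
  "UCI HAR Dataset/features.txt",
  "UCI HAR Dataset/activity_labels.txt",
  "UCI HAR Dataset/test/X_test.txt",
  "UCI HAR Dataset/test/y_test.txt",
  "UCI HAR Dataset/test/subject_test.txt",
  "UCI HAR Dataset/train/X_train.txt",
  "UCI HAR Dataset/train/y_train.txt",
  "UCI HAR Dataset/train/subject_train.txt"
)

# Hypothetical helper: return the entries of `paths` that do not exist under `root`.
missingFiles <- function(paths, root = ".") {
  paths[!file.exists(file.path(root, paths))]
}

missing <- missingFiles(required_files)
if (length(missing) > 0) {
  message("Missing files:\n", paste(missing, collapse = "\n"))
}
```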

DataSet

This dataset contains data collected from the accelerometers of the Samsung Galaxy S smartphone. A full description is available at this site where the data was obtained. For more information about this dataset contact: activityrecognition@smartlab.ws

Output

A tidy dataset with the average of each variable for each activity and each subject.

Execution

The mergeFeats function merges the training and test sets (X) into one data frame called "measures".

measures <- mergeFeats("UCI HAR Dataset/test/X_test.txt", "UCI HAR Dataset/train/X_train.txt")
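The definition of mergeFeats is in run_analysis.R and is not reproduced here. A minimal sketch of what such a function might look like, assuming it reads both files with read.table and stacks the training rows before the test rows, is shown below. Whatever order is used, it has to match the order in which subjects and activity labels are stacked later.

```r
# Hypothetical sketch of mergeFeats; the real definition lives in run_analysis.R.
mergeFeats <- function(testFile, trainFile) {
  test  <- read.table(testFile)
  train <- read.table(trainFile)
  # Assumed order: training rows first, then test rows.
  rbind(train, test)
}

# Tiny stand-in files so the sketch runs without the real dataset.
testFile  <- tempfile(fileext = ".txt")
trainFile <- tempfile(fileext = ".txt")
writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"), testFile)
writeLines("1.1 1.2 1.3", trainFile)

measures <- mergeFeats(testFile, trainFile)
dim(measures)  # 3 rows, 3 columns
```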

Then, the selectFeats function selects the feature names listed in features.txt that match a pattern (here, the mean and standard deviation features). The selected features are combined and sorted by id in a data frame called "features".

meanFeats <- selectFeats("UCI HAR Dataset/features.txt", "-mean\\(\\)")
stdFeats <- selectFeats("UCI HAR Dataset/features.txt", "-std\\(\\)")
features <- rbind(meanFeats, stdFeats)
features <- arrange(features, FeatId)
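For illustration, selectFeats could be written roughly as follows. This is a sketch under assumptions: the column name FeatName is made up (this README only shows FeatId), the sample lines imitate the "id name" layout of features.txt, and base R's order() stands in for arrange().

```r
# Hypothetical sketch of selectFeats; FeatName is an assumed column name.
selectFeats <- function(featuresFile, pattern) {
  feats <- read.table(featuresFile,
                      col.names = c("FeatId", "FeatName"),
                      stringsAsFactors = FALSE)
  # Keep only the rows whose feature name matches the regular expression.
  feats[grepl(pattern, feats$FeatName), ]
}

# Stand-in for features.txt with four made-up entries.
featuresFile <- tempfile(fileext = ".txt")
writeLines(c("1 tBodyAcc-mean()-X",
             "2 tBodyAcc-std()-X",
             "3 tBodyAcc-meanFreq()-X",
             "4 tBodyAcc-max()-X"), featuresFile)

meanFeats <- selectFeats(featuresFile, "-mean\\(\\)")
stdFeats  <- selectFeats(featuresFile, "-std\\(\\)")
features  <- rbind(meanFeats, stdFeats)
features  <- features[order(features$FeatId), ]
```

Escaping the parentheses in the pattern is what keeps meanFreq() features out of the selection: "-mean\\(\\)" only matches names where "-mean" is immediately followed by "()".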

Afterwards, the data frame "measures" is reduced to the measurements of the mean and standard deviation features.

measures <- select(measures, features[,1])
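Selecting by features[,1] works because FeatId is a column position: row i of features.txt names column i of the X files. The same idea in base R, on made-up data (FeatName is an assumed column name):

```r
# Made-up measurements with four columns, as read.table would name them.
measures <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4),
                       V3 = c(0.5, 0.6), V4 = c(0.7, 0.8))

# Made-up feature table: ids 1 and 2 were selected.
features <- data.frame(FeatId = c(1L, 2L),
                       FeatName = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X"),
                       stringsAsFactors = FALSE)

# Keep only the columns whose position appears in FeatId.
selected <- measures[, features$FeatId, drop = FALSE]
names(selected)  # "V1" "V2"
```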

The loadActivities function loads the activity names and their ids into a data frame called "acts".

acts <- loadActivities("UCI HAR Dataset/activity_labels.txt")
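A possible shape for loadActivities, assuming activity_labels.txt holds one "id name" pair per line. The function body is a guess; the column names ActivityId and ActivityName are the ones used later in this README.

```r
# Hypothetical sketch of loadActivities; the real definition lives in run_analysis.R.
loadActivities <- function(activitiesFile) {
  read.table(activitiesFile,
             col.names = c("ActivityId", "ActivityName"),
             stringsAsFactors = FALSE)
}

# Stand-in file with two made-up rows.
activitiesFile <- tempfile(fileext = ".txt")
writeLines(c("1 WALKING", "2 SITTING"), activitiesFile)

acts <- loadActivities(activitiesFile)
```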

The loadFiles function loads the subjects and the activity labels ("y") from the training and test sets. The two resulting data frames are then concatenated into one data frame called "table".

tableT <- loadFiles("UCI HAR Dataset/test/subject_test.txt", "UCI HAR Dataset/test/y_test.txt")
tableTr <- loadFiles("UCI HAR Dataset/train/subject_train.txt", "UCI HAR Dataset/train/y_train.txt")
table <- rbind(tableTr, tableT)
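loadFiles might look like the sketch below, assuming each input file has a single column and the same number of rows. The body is an assumption; the column names Volunteer and ActivityId are the ones used later in this README.

```r
# Hypothetical sketch of loadFiles; the real definition lives in run_analysis.R.
loadFiles <- function(subjectFile, activityFile) {
  subjects   <- read.table(subjectFile, col.names = "Volunteer")
  activities <- read.table(activityFile, col.names = "ActivityId")
  # Row i of both files describes the same observation, so bind column-wise.
  cbind(subjects, activities)
}

# Stand-in files with two made-up observations.
subjectFile  <- tempfile(fileext = ".txt")
activityFile <- tempfile(fileext = ".txt")
writeLines(c("2", "4"), subjectFile)
writeLines(c("5", "1"), activityFile)

tableT <- loadFiles(subjectFile, activityFile)
```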

To facilitate reading, we use descriptive activity names to name the activities in the data set. Thus, the data frame "table" is merged with the activity names in "acts".

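# left_join keeps the row order of "table"; merge() would sort by ActivityId and misalign the rows with "measures"
table <- left_join(table, acts, by = "ActivityId")
table <- select(table, Volunteer, ActivityName)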
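Row order matters at this point, because the labels are later bound column-wise to "measures". Base R's merge() sorts its result by the key column by default, so the labels can end up in a different order than the measurements. The sketch below, on made-up data, shows the effect and an order-preserving lookup with match().

```r
# Made-up subjects and activity ids, deliberately not sorted by ActivityId.
table <- data.frame(Volunteer = c(2L, 2L, 4L), ActivityId = c(3L, 1L, 2L))
acts  <- data.frame(ActivityId = 1:3,
                    ActivityName = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"),
                    stringsAsFactors = FALSE)

# merge() reorders the rows by the key column.
merged <- merge(table, acts, by = "ActivityId")
merged$ActivityId  # 1 2 3, no longer the original 3 1 2

# match() looks up each name in place, so the original row order is kept.
table$ActivityName <- acts$ActivityName[match(table$ActivityId, acts$ActivityId)]
table$ActivityName  # "WALKING_DOWNSTAIRS" "WALKING" "WALKING_UPSTAIRS"
```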

The following code labels the data set with descriptive variable names and binds the data frame "table" to the measurements, so that each row holds a volunteer, an activity, and its measures.

names(measures) <- features[,2] 
table <- bind_cols(table,measures)

Finally, we create a tidy data set called "tidyTable" with the average of each variable for each activity and each subject.

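# The pipe must end each line; a line that starts with %>% is a syntax error.
# Note: summarise_each() and funs() are deprecated in recent dplyr releases;
# summarise(across(everything(), mean)) is the current equivalent.
tidyTable <- table %>%
  arrange(Volunteer) %>%
  group_by(Volunteer, ActivityName) %>%
  summarise_each(funs(mean))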
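To make the grouping step concrete, here is a base-R sketch of the same per-subject, per-activity averaging on made-up data. It uses aggregate() instead of the dplyr pipeline, and the values are invented for illustration only.

```r
# Made-up labelled measurements: volunteer 1 walks twice and sits once.
table <- data.frame(
  Volunteer    = c(1L, 1L, 1L, 2L),
  ActivityName = c("WALKING", "WALKING", "SITTING", "WALKING"),
  meanX        = c(0.2, 0.4, 0.1, 0.3),
  stdX         = c(1.0, 3.0, 2.0, 4.0),
  stringsAsFactors = FALSE
)

# Average every measurement column within each (Volunteer, ActivityName) group.
tidyTable <- aggregate(
  table[, c("meanX", "stdX")],
  by  = list(Volunteer = table$Volunteer, ActivityName = table$ActivityName),
  FUN = mean
)
```

The result has one row per volunteer and activity pair (three rows here), with the two walking observations of volunteer 1 collapsed into their averages.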
