Dandada1993 / gcdc_assignment

Getting and Cleaning Data Course Assignment

Getting and Cleaning Data

Peer-graded Assignment Submission

Name: Ralph P. Goddard

Submission Date: February 17, 2017

What's contained in the GitHub repo

run_analysis.R - used to reproduce the transformations and analysis performed on the original dataset.
README.md
Codebook.md

Explanation of steps taken in run_analysis.R

1. Load the dplyr library.
2. Assign the names of the test files, training files, activity list, variable names and subject files to variables.
3. Load the x test data using read.table with header=FALSE.
4. Load the x train data using read.table with header=FALSE.
5. Load the y test data using read.table with header=FALSE.
6. Load the y train data using read.table with header=FALSE.
7. Load the variable names (features) using read.table with header=FALSE, stringsAsFactors=FALSE.
8. Load the activities using read.table with header=FALSE.
9. Load the test subjects using read.table with header=FALSE.
10. Load the train subjects using read.table with header=FALSE.
11. Add column names to the activities data frame.
12. Add the column names to the x test and x train data frames.
13. Name the column in the test subject and train subject data frames subjects.
14. Use rbind to merge the x train and x test data frames.
15. Use rbind to merge the y train and y test data frames.
16. Make the column names in the merged x data frame unique.
17. Select only the columns with mean and standard deviation data using the regular expression "\.(mean|std)\." with ignore.case=TRUE.
18. Name the column in the merged y data activity.
19. Use rbind to merge the train subjects and test subjects.
20. Use cbind to merge the subjects, y data and x data.
21. Use mutate to convert the variable activity to a factor: "activity = as.factor(activity)".
22. Set the levels for the activity factor.
23. Tidy up the variable names by replacing a duplicated Body, e.g. "BodyBody", with "Body".
24. Use a custom function, "tidy_columnnames", to tidy the variable names.
25. A regex is used to replace .. or ... with a single period.
26. A regex is used to remove a period at the end of a variable name.
27. Create a new data frame by grouping by subject and activity and then using summarise_all to calculate the mean of all variables.
28. Output the results of step 27 to a file called "tidy_ds.txt" with row.name=FALSE.
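
The steps above can be sketched on a toy data set. This is only an illustration, not the contents of run_analysis.R: the four feature labels, the numeric values, the two activity labels and the body of tidy_columnnames are all assumptions made up for the example, and make.names is assumed as the way the column names are made unique. The output file is written to a temporary directory so the sketch does not touch the working directory.

```r
library(dplyr)

# Hypothetical feature labels and toy values (not the real data set).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyBodyGyroMag-mean()", "tBodyAcc-max()-X")
x_train <- as.data.frame(matrix(c(1, 2, 5, 6, 9, 10, 13, 14), nrow = 2))
x_test  <- as.data.frame(matrix(c(3, 4, 7, 8, 11, 12, 15, 16), nrow = 2))
names(x_train) <- features
names(x_test)  <- features

# Merge train and test rows for x, y and subjects.
x_data   <- rbind(x_train, x_test)
y_data   <- rbind(data.frame(activity = c(1, 1)), data.frame(activity = c(2, 2)))
subjects <- rbind(data.frame(subjects = c(1, 1)), data.frame(subjects = c(2, 2)))

# Assumption: make.names() is one way to get unique, syntactic column names;
# it turns "-", "(" and ")" into periods.
names(x_data) <- make.names(names(x_data), unique = TRUE)

# Keep only the mean and standard deviation columns.
keep   <- grepl("\\.(mean|std)\\.", names(x_data), ignore.case = TRUE)
x_data <- x_data[, keep]

# Combine everything and turn activity into a labelled factor.
names(subjects) <- "subject"  # singular here so it matches the grouping step
merged <- cbind(subjects, y_data, x_data)
merged <- mutate(merged, activity = as.factor(activity))
levels(merged$activity) <- c("WALKING", "WALKING_UPSTAIRS")  # toy labels

# Collapse a duplicated "Body" in the variable names.
names(merged) <- gsub("BodyBody", "Body", names(merged))

# Hypothetical version of the custom tidy_columnnames function.
tidy_columnnames <- function(x) {
  x <- gsub("\\.{2,}", ".", x)  # replace ".." or "..." with a single period
  x <- gsub("\\.$", "", x)      # remove a period at the end of the name
  x
}
names(merged) <- tidy_columnnames(names(merged))

# Mean of every variable for each subject and activity.
tidy_ds <- merged %>%
  group_by(subject, activity) %>%
  summarise_all(mean)

# Write the result without row names.
out_file <- file.path(tempdir(), "tidy_ds.txt")
write.table(tidy_ds, out_file, row.name = FALSE)
```

With these toy inputs the kept columns end up named tBodyAcc.mean.X, tBodyAcc.std.X and fBodyGyroMag.mean, and tidy_ds has one row per subject and activity pair. In dplyr 1.0.0 and later, summarise(across(everything(), mean)) is the newer spelling of the summarise_all step.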
