-
Notifications
You must be signed in to change notification settings - Fork 0
amarchin/runanalysis
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
---
title: "README"
author: "Andrea Marchini"
date: "26/10/2014"
output: pdf_document
---
One of the most exciting areas in all of data science right now is wearable computing. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users.
The purpose of the run_analysis.R script is to prepare tidy data that can be used for later analysis starting from raw data collected from the accelerometers from the Samsung Galaxy S smartphone.
In order to use the script just type
```{r, eval=F}
run_analysis()
```
after the script sourcing.
**You need the *data.table* and *plyr* libraries to use run_analysis.R script.**
The run_analysis.R script does the following:
1- Downloads the data for the project in a temporary folder,
```{r, eval=F}
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
# creates a temporary directory
td = tempdir()
# creates the placeholder file
tf = tempfile(tmpdir=td, fileext=".zip")
# downloads into the placeholder file
download.file(fileUrl, tf, method = "curl")
```
2- Unzips and reads the files,
```{r, eval=F}
# gets the name of the files in the zip archive
fname = unzip(tf, list=TRUE)$Name
# unzips the files to the temporary directory
unzip(tf, files=fname, exdir=td, overwrite=TRUE)
# fpath is the full path to the extracted files
fpath = file.path(td, fname)
#reads the files
features <- fread(fpath[2])
train <- read.table(fpath[31])
subject_train <- fread(fpath[30])
activity_train <- fread(fpath[32])
test <- read.table(fpath[17])
subject_test <- fread(fpath[16])
activity_test <- fread(fpath[18])
activity_labels <- fread(fpath[1])
```
3- Merges the training set, the test set, the activity IDs and the subject IDs to create one data set with labels from the raw data set,
```{r, eval=F}
#creates the data frame with the files
data <- rbind(train,test)
subject <- rbind(subject_train,subject_test)
activity <- rbind(activity_train,activity_test)
DT <- cbind(data,subject,activity)
colnames(DT)[1:561]<-features$V2
colnames(DT)[562:563]<-c("Subject", "Activity")
```
4- Uses a clean version of descriptive activity names to name the activities in the data set,
```{r, eval=F}
#maps ID activities with a clean version of the activity names
for (i in 1:6) {
DT$Activity[DT$Activity == activity_labels$V1[i]] <-
sub("_", "", tolower(activity_labels$V2[i]))
}
```
5- Extracts only the measurements on the mean and standard deviation for each measurement,
```{r, eval=F}
#subset of DT for means and standard deviations
mean_index <- grep("mean\\(", colnames(DT))
std_index <- grep("std", colnames(DT))
index <- c(mean_index,std_index,562,563)
subDT <- DT[,index]
```
6- Appropriately labels the data set with descriptive variable names,
```{r, eval=F}
#loop on measurements columns
for (j in 1:66) {
#cleaning of the column names
name <- strsplit(colnames(subDT)[j], "-")
if (name[[1]][2] == "mean()") {
name[[1]][2] <- "Mean"
} else if (name[[1]][2] == "std()") {
name[[1]][2] <- "Std"
}
if (length(name[[1]]) == 3) {
colnames(subDT)[j] <- paste(name[[1]][1],name[[1]][2],name[[1]][3], sep = "")
} else {
colnames(subDT)[j] <- paste(name[[1]][1],name[[1]][2], sep = "")
}
#loop is still open
```
6- Creates the tidy data set with the average of each variable for each activity and each subject,
```{r, eval=F}
#it is still the previous loop
#creation of the final tidy data frame
if (j == 1) {
FDF <- ddply(subDT, c("Subject","Activity"), function(d) data.frame(mean=mean(d[,colnames(d)[1]])))
} else {
df <- ddply(subDT, c("Subject","Activity"), function(d) data.frame(mean=mean(d[,colnames(d)[j]])))
FDF[,j+2] <- df[,3]
}
} #loop is now closed
#Appending the colnames to the final data frame
colnames(FDF)[3:68] <- colnames(subDT)[1:66]
#writing of tidy data in run_analysis.txt file
write.table(FDF, "run_analysis.txt", row.names = F, quote = F)
```About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published