Project for the 'Getting and Cleaning Data' Coursera course
The script "run_analysis.R" performs the following operations:
- Data import
  -- features.txt: variable names
  -- activity_labels.txt: names of the 6 activities
  -- X_train.txt: training data set
  -- y_train.txt: activity indices (1-6) corresponding to rows in the training set
  -- subject_train.txt: numerical identifiers of the study subjects
  -- X_test.txt: testing data set
  -- y_test.txt: activity indices (1-6) corresponding to rows in the testing set
- Merges the training and the test sets to create one data set
- Extracts only the measurements on the mean and standard deviation for each measurement -- these subsets were selected using grep()
- Uses descriptive activity names to name the activities in the data set -- the following names were used: 'walk', 'walkUp', 'walkDown', 'sit', 'stand', 'lay'
- Appropriately labels the data set with descriptive variable names -- the corresponding names from features.txt were used
- From the data set produced in the previous step, creates a second, independent tidy data set with the average of each variable for each activity and each subject -- activity and subject data were included -- the mean of each retained variable (the mean and standard deviation measurements) was computed for each subject and each of the 6 activities
- Writes the requested data to a text file, projectData.txt
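
The steps above can be sketched on toy data as follows. This is a minimal illustration, not the contents of run_analysis.R: the values, the object names (xTrain, tidy, averages, outFile) and the exact grep() pattern are assumptions made for the example, and the real script would read the listed files (for instance with read.table()) rather than build the data by hand.

```r
# Toy stand-ins for the real files (all values are invented for illustration).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activityNames <- c("walk", "walkUp", "walkDown", "sit", "stand", "lay")

xTrain <- matrix(c(0.1, 0.3, 0.5, 0.7, 0.9, 1.1), nrow = 2)  # 2 rows, 3 features
xTest  <- matrix(c(0.2, 0.4, 0.6, 0.8, 1.0, 1.2), nrow = 2)
yTrain <- c(1, 1)        # activity indices for the training rows
yTest  <- c(4, 4)        # activity indices for the test rows
subjectTrain <- c(1, 1)  # subject identifiers for the training rows
subjectTest  <- c(2, 2)  # subject identifiers for the test rows

# Merge: stack the training rows on top of the test rows.
x <- as.data.frame(rbind(xTrain, xTest))

# Label the columns with the names taken from features.txt.
names(x) <- features

# Keep only mean() and std() columns (assumed pattern; parentheses are escaped).
keep <- grep("mean\\(\\)|std\\(\\)", names(x))
x <- x[, keep, drop = FALSE]

# Replace activity indices with descriptive names and attach the subjects.
tidy <- data.frame(
  subject = c(subjectTrain, subjectTest),
  activity = factor(c(yTrain, yTest), levels = 1:6, labels = activityNames),
  x,
  check.names = FALSE  # keep names such as "tBodyAcc-mean()-X" unchanged
)

# Average every retained variable for each subject/activity combination.
averages <- aggregate(
  tidy[names(x)],
  by = list(subject = tidy$subject, activity = tidy$activity),
  FUN = mean
)

# Write the result to a text file (a temporary directory is used here).
outFile <- file.path(tempdir(), "projectData.txt")
write.table(averages, outFile, row.names = FALSE)
```

Only subject/activity combinations that actually occur in the data appear in the result, so the toy example yields two rows.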
Code book describing the variables -- the variables are described in the file features_info.txt; the contents of that file are pasted below:
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.
Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also, the magnitudes of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).
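
As a rough illustration of the two derived quantities, the sketch below computes a magnitude and a Jerk signal from a few invented 3-axial samples. The first-difference approximation of the time derivative is an assumption; features_info.txt does not say how the derivative was computed. The names fs, acc, accMag and accJerk are hypothetical.

```r
# Hypothetical 3-axial acceleration samples (invented values).
fs <- 50  # sampling rate in Hz, as stated in the description
acc <- cbind(X = c(3, 0, 1), Y = c(4, 0, 2), Z = c(0, 5, 2))

# Magnitude: Euclidean norm of each (X, Y, Z) sample.
accMag <- sqrt(rowSums(acc^2))

# Jerk: time derivative, approximated here by a first difference
# between consecutive samples multiplied by the sampling rate.
accJerk <- diff(acc) * fs
```

With these values accMag is 5, 5, 3, and accJerk has one row fewer than acc because each row is a difference between two consecutive samples.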
Finally a Fast Fourier Transform (FFT) was applied to some of these signals producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency domain signals).
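
To make the time-domain versus frequency-domain distinction concrete, here is a small sketch using base R's fft() on an invented 5 Hz test tone sampled at 50 Hz. The window length and signal are assumptions chosen for the example and are not taken from the data set.

```r
fs <- 50                            # sampling rate in Hz
n <- 50                             # illustrative window: one second of samples
seconds <- (0:(n - 1)) / fs         # sample times
signal <- sin(2 * pi * 5 * seconds) # invented 5 Hz test tone

spectrum <- Mod(fft(signal))        # magnitude of each frequency bin
half <- spectrum[1:(n / 2)]         # bins below the Nyquist frequency
peakBin <- which.max(half)          # 1-based index of the strongest component
peakHz <- (peakBin - 1) * fs / n    # convert the bin index to a frequency
```

Here peakBin is 6 and peakHz is 5, matching the frequency of the test tone.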
These signals were used to estimate variables of the feature vector for each pattern:
'-XYZ' is used to denote 3-axial signals in the X, Y and Z directions.
tBodyAcc-XYZ
tGravityAcc-XYZ
tBodyAccJerk-XYZ
tBodyGyro-XYZ
tBodyGyroJerk-XYZ
tBodyAccMag
tGravityAccMag
tBodyAccJerkMag
tBodyGyroMag
tBodyGyroJerkMag
fBodyAcc-XYZ
fBodyAccJerk-XYZ
fBodyGyro-XYZ
fBodyAccMag
fBodyAccJerkMag
fBodyGyroMag
fBodyGyroJerkMag
The set of variables that was estimated from these signals is:
mean(): Mean value
std(): Standard deviation
mad(): Median absolute deviation
max(): Largest value in array
min(): Smallest value in array
sma(): Signal magnitude area
energy(): Energy measure. Sum of the squares divided by the number of values.
iqr(): Interquartile range
entropy(): Signal entropy
arCoeff(): Autoregression coefficients with Burg order equal to 4
correlation(): Correlation coefficient between two signals
maxInds(): Index of the frequency component with largest magnitude
meanFreq(): Weighted average of the frequency components to obtain a mean frequency
skewness(): Skewness of the frequency domain signal
kurtosis(): Kurtosis of the frequency domain signal
bandsEnergy(): Energy of a frequency interval within the 64 bins of the FFT of each window.
angle(): Angle between two vectors.
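
A few of the simpler estimates can be reproduced in base R, as in the sketch below on an invented window of values. Two points are assumptions rather than facts about the data set: R's sd() divides by n - 1, and mad() is called with constant = 1 to get the unscaled median absolute deviation (R's default multiplies it by 1.4826). The source does not say which conventions the data set authors used.

```r
# Invented window of signal values.
window <- c(1, 2, 3, 4, 10)

stats <- c(
  mean   = mean(window),
  std    = sd(window),                      # sample standard deviation (n - 1)
  mad    = mad(window, constant = 1),       # unscaled median absolute deviation
  max    = max(window),
  min    = min(window),
  energy = sum(window^2) / length(window),  # sum of squares / number of values
  iqr    = IQR(window)                      # interquartile range (R's default quantile type)
)
```

For this window the mean is 4, the unscaled mad is 1, the energy is 26 and the iqr is 2.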
Additional vectors are obtained by averaging the signals in a signal window sample. These are used in the angle() variable:
gravityMean
tBodyAccMean
tBodyAccJerkMean
tBodyGyroMean
tBodyGyroJerkMean
The complete list of variables of each feature vector is available in 'features.txt'.