| title | README: Getting and Cleaning Data - Course Project |
|---|---|
| author | Hans Gonzalez |
| date | 14/12/2020 |
| output | html_document |
The purpose of this assignment is to demonstrate the ability to collect, work with, and clean a data set. The goal is to prepare a tidy data set that can be used for later analysis. The required submissions are:
- a tidy data set as described below
- a link to a GitHub repository with your script for performing the analysis
- a Code Book that describes the variables, the data, and any transformations or work that you performed.
- a README.md file in the repository that explains how all scripts work and how they are connected.
One of the most exciting areas in all of data science right now is wearable computing. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data used in this analysis was collected from the Samsung Galaxy S smartphone (see below for more information).
The experiments were performed with a group of 30 volunteers between the ages of 19 and 48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, the researchers captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz. The experiments were video-recorded to label the data manually. The obtained dataset was randomly partitioned into two sets, where 70% of the volunteers were selected for the training data and 30% for the test data.
The data set used in this project is the "Human Activity Recognition Using Smartphones Dataset", version 1.0, released in 2013 as a public domain dataset.
The credits and license are as follows:
Jorge L. Reyes-Ortiz, Davide Anguita, Alessandro Ghio, Luca Oneto. Smartlab - Non Linear Complex Systems Laboratory DITEN - Università degli Studi di Genova. Via Opera Pia 11A, I-16145, Genoa, Italy. activityrecognition@smartlab.ws www.smartlab.ws
| File | Description |
|---|---|
| README.md | A file in markdown format displaying the overview of this project. |
| CodeBook.md | A file in markdown format that describes the variables (columns) contained in the data set called "tidydata.txt". |
| run_analysis.R | An R script that contains the R code used to transform, clean and subset the raw data and produce the file "tidydata.txt". |
| tidydata.txt | A text file containing the results of the course project as required in the instructions. |
Participants must develop an R script called run_analysis.R that:
- Merges the separate training and test data files to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels all variables in the data set with descriptive variable names.
- From the data set in the previous step, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
The script run_analysis.R meets these requirements through the following steps:
- Download the data file from the source.
- Load the data into R.
- Inspect and analyze the data before merging the files.
- Create independent "Training" and "Test" data sets.
- Apply the column labels from features.txt to both data sets.
- Merge the "Training" and "Test" data sets.
- Extract only the mean and standard deviation measurements.
- Use descriptive names in the activity column.
- Rename the variables with descriptive names.
- Group the data by subject and activity, taking the average of each variable.
- Store the results of the previous steps in an independent output file.
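The merge, extract, label, and group steps above can be sketched on a toy data set in base R. This is a minimal illustration and not the contents of run_analysis.R: the three feature names, the measurement values, the helper `make_set`, and the renaming scheme are all assumptions made for the example.

```r
# Toy stand-ins for features.txt and activity_labels.txt (invented values).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(code = 1:2,
                              activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Hypothetical helper: build one labelled set from subject ids,
# activity codes, and a row-wise vector of measurements.
make_set <- function(subject, code, values) {
  x <- as.data.frame(matrix(values, ncol = length(features), byrow = TRUE))
  names(x) <- features
  data.frame(subject = subject, code = code, x, check.names = FALSE)
}

train <- make_set(c(1, 1), c(1, 1), c(1, 2, 3, 3, 4, 5))
test  <- make_set(c(2, 2), c(2, 2), c(5, 6, 7, 7, 8, 9))

# Merge the training and test sets by stacking their rows.
merged <- rbind(train, test)

# Keep only the mean() and std() measurements.
is_measure <- grepl("mean\\(\\)|std\\(\\)", names(merged))
selected <- merged[, c("subject", "code", names(merged)[is_measure])]

# Replace activity codes with descriptive names.
selected$activity <- activity_labels$activity[match(selected$code,
                                                    activity_labels$code)]
selected$code <- NULL

# Tidy up variable names (this naming scheme is illustrative only).
names(selected) <- gsub("\\(\\)", "", names(selected))
names(selected) <- gsub("-", "_", names(selected))
names(selected) <- sub("^tBody", "timeBody", names(selected))

# Average each variable for each subject and activity.
measure_cols <- setdiff(names(selected), c("subject", "activity"))
tidy <- aggregate(selected[measure_cols],
                  by = list(subject = selected$subject,
                            activity = selected$activity),
                  FUN = mean)
print(tidy)
```

The result has one row per subject and activity pair, with the `max()` column dropped and every remaining measurement replaced by its group average.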
The following code reads the output file into R:
```r
newfile <- read.table("tidydata.txt", header = TRUE)
View(newfile)
```
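To see why `header = TRUE` matters when reading the file, the round trip below writes a small invented table to a temporary file and reads it back both ways. It assumes the output was written with `write.table(..., row.names = FALSE)`, which this README does not state; the table contents are hypothetical.

```r
# Hypothetical miniature of the tidy table; values are invented.
tidy <- data.frame(subject = c(1, 2),
                   activity = c("WALKING", "SITTING"),
                   timeBodyAcc_mean_X = c(2, 6),
                   stringsAsFactors = FALSE)

# Assumed write step: no row names, default space separator.
path <- tempfile(fileext = ".txt")
write.table(tidy, path, row.names = FALSE)

# With header = TRUE the first line becomes the column names.
with_header <- read.table(path, header = TRUE)

# With header = FALSE the first line is read as an extra data row.
without_header <- read.table(path, header = FALSE)

print(names(with_header))
print(nrow(without_header))
```

Reading without the header leaves generic column names and turns every column into text, so downstream numeric summaries would fail.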