This repository has been archived by the owner on Jun 23, 2020. It is now read-only.
[WIP] Add script to clean and combine data, and add data (#29)
* Add script to clean and combine data, and add data
  - Update survey data dictionary with left out questions
  - Update survey data dictionary with variable/column names for questions
  - Add script `clean-data.R` to clean and combine the two survey datasets into one for ease of analysis
  - Create the combined survey dataset after running `clean-data.R`
  - Create README.md file to explain cleaned data and the script to produce it
  - Update root README.md file to briefly explain data
  - Change `data/` directory to `raw-data/`
* Move around functions and add more edits
  - Update date
  - Categorize functions into different categories
    - Utility functions
    - Sub-process functions
    - Main process functions
    - Main function
  - Update function descriptions
  - Add function to check survey data uses only one ID from each
* Move cleaning of code events to own function
* Create function to search and add col + formatting
  - Create function to search in a given column for search terms, then creates a new column labeling rows containing search terms
  - Reformat input data comments
  - Reformat NSE functions e.g. mutate_()
* Create temp helper function to look at columns
* Move reading data function to main processes
* Create draft full dataset
* Rename cleaning function and update joining key
  - The cleaning function `clean_part_1` was written for the first dataset. I've changed the function, along with the variables, to attend to the joined dataset.
  - Removing outliers for hours learning per week was simplified
  - Added usage case for `search_and_create()` function
* Add feedback to user on script actions
* Separate other job interests cleaning to function
* Fix inconsistent indenting in helper function
* Move cleaning other podcasts to separate function
* Reorganize sub-cleaning functions to own category
* Update helper function with flexible use
  Allow helper function to either default view the data, print data to console (printYes=1), or to print the number of instances
* Create new columns for significant other podcasts
  - Update description of `clean_podcasts` function
  - Add more variations to “None” response
  - Add feedback to user on start and finish of function
  - Add new columns for podcasts that were mentioned >15 times
* Separate a function for cleaning hours learned
* Add feedback in cleaning code events & exp earning
* Separate function for cleaning months programming
* Separate function cleaning post bootcamp salary
  - Retain previous cleaning
  - Add in same normalizations from expected income
* Separate function for cleaning money for learning
* Add description to entire script
* Floor values and remove outliers in money to learning
* Create function for cleaning age
* Initialize functions for columns needing cleaning
* Create new boolean column for PodcastOther
* Fix feedback message for cleaning hours learning
* Update draft of complete data
* Remove boolean Podcast Other column
* Finish cleaning income and remove extras
  - Finished cleaning income function
  - Removed changing ExpectedEarning to integer
  - Remove unnecessary cleaning
* Remove "Other" from new podcast cols
* Finish cleaning commute times
* Update code events cleaning to make new cols
* Clean other resources
* Update code events threshold to 1.5% frequency
* Update detail on cutoff for other podcasts is 1.5%
* Add Bootcamp Name into joining key
* Add back in podcast and events from 2nd dataset
* Make ages less than 10 to NA
* Convert resources to boolean
* Finish cleaning data with consistency check
  - Check for inconsistencies between job role interests
  - Remove unnecessary columns
* Remove "Other" from new Podcast columns
* Clean student debt owed
* Add CodeEvent column to columns removed
* Write final polish of data
* Fix small spelling mistakes
* Update final dataset
* Remove first dataset
* Update script date