Using Python, SQL, and other tools to acquire, prepare, clean, and automate dataset creation.
- Understand the lifecycle of a data analysis project.
- Acquire data from various sources, then cleanse, analyze, and automate the processing of that data for different needs.
- Combine multiple data sources together for analysis.
- Extract meaning from disparate and large datasets.
- Construct questions that lead to deeper analysis.
- Present data with purposeful visualizations to tell the story.
- GSS2018.sav - Raw downloaded GSS data file (SPSS format)
- GSSData.R - R script that converts the GSS data from SPSS format to CSV
- GSS2018_Original.csv - GSS data converted from SPSS format to CSV
- GSS_Codebook_index.pdf - Raw downloaded codebook PDF that describes the header/column information
- GSS_Codebook_index_original.csv - GSS header/column information converted from the codebook PDF to CSV
- GSSHeaders.csv - Trimmed and cleaned-up GSS header/column information from GSS_Codebook_index_original.csv (to limit the number of headers/columns and rows)
- GSS2018.csv - GSS data trimmed from GSS2018_Original.csv (to limit the number of headers/columns and rows)
- Exercise6_3.py - The primary Python file, containing all the code for this project's data cleaning tasks.
- DSC540 Mid-Term Summary.pdf - A brief summary of the project.
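
The trimming step that produces GSSHeaders.csv and GSS2018.csv is not shown in this README. As a minimal sketch of that kind of step, assuming a plain CSV input and using only Python's standard `csv` module, a column-and-row trim could look like the following. The function name `trim_csv`, the sample column names, and the row limit are hypothetical illustrations and are not taken from Exercise6_3.py.

```python
import csv
import io


def trim_csv(src, dst, keep_columns, max_rows):
    """Copy at most max_rows data rows from src to dst, keeping only keep_columns.

    Hypothetical helper for illustration; src and dst are open text streams.
    """
    reader = csv.DictReader(src)
    # Reading fieldnames consumes the header row; fail early on unknown columns.
    missing = [c for c in keep_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # extrasaction="ignore" drops every column not listed in keep_columns.
    writer = csv.DictWriter(dst, fieldnames=keep_columns, extrasaction="ignore")
    writer.writeheader()
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        writer.writerow(row)


if __name__ == "__main__":
    # In-memory demo with made-up sample rows (not real GSS data).
    sample = io.StringIO("year,id,age,sex\n2018,1,43,1\n2018,2,74,2\n2018,3,42,1\n")
    out = io.StringIO()
    trim_csv(sample, out, ["id", "age"], 2)
    print(out.getvalue())
```

To run it against real files, open both with `newline=""` (for example `open("GSS2018_Original.csv", newline="")` and `open("GSS2018.csv", "w", newline="")`) as the `csv` module recommends. Streaming row by row keeps memory use flat even when the original GSS export is large.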