chrisinoakland/data_preparation

Data Preparation

Repo Description

Using Python, SQL, and other tools to acquire, clean, and prepare data, and to automate dataset creation.

Objectives

  1. Understand the lifecycle of a data analysis project.
  2. Apply techniques to acquire data from various sources, then cleanse, analyze, and automate the processing of that data.
  3. Combine multiple data sources for analysis.
  4. Extract meaning from disparate and large datasets.
  5. Construct questions that lead to deeper analysis.
  6. Present data with purposeful visualizations to tell the story.

Files

  1. GSS2018.sav - Raw downloaded GSS datafile
  2. GSSData.R - Script created to convert GSS data from SPSS format to CSV
  3. GSS2018_Original.csv - GSS data converted from SPSS format to CSV
  4. GSS_Codebook_index.pdf - Raw downloaded PDF file that explains header/column information
  5. GSS_Codebook_index_original.csv - GSS header/column information converted from PDF format to CSV
  6. GSSHeaders.csv - Trimmed and cleaned-up GSS header/column information from GSS_Codebook_index_original.csv (to limit the number of headers/columns and rows)
  7. GSS2018.csv - Trimmed GSS data file from GSS2018_Original.csv (to limit the number of headers/columns and rows)
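
The trimming step described for GSS2018.csv (keeping only some headers/columns and rows) might look roughly like the sketch below. It is not the code from this repository: the function name `trim_csv`, the sample column names, and the row limit are assumptions made for illustration, and it uses only Python's standard `csv` module.

```python
import csv
import io


def trim_csv(source, keep_columns, max_rows=None):
    """Return (header, rows) limited to keep_columns and at most max_rows rows.

    source is any iterable of CSV lines, such as an open file.
    This helper is hypothetical and not part of the repository.
    """
    reader = csv.DictReader(source)
    available = reader.fieldnames or []
    missing = [name for name in keep_columns if name not in available]
    if missing:
        raise KeyError(f"columns not found in source: {missing}")

    rows = []
    for index, record in enumerate(reader):
        # Stop once the requested number of data rows has been collected.
        if max_rows is not None and index >= max_rows:
            break
        rows.append([record[name] for name in keep_columns])
    return list(keep_columns), rows


# Made-up sample standing in for the real data file.
sample = io.StringIO(
    "ID,AGE,SEX,HAPPY\n"
    "1,43,1,2\n"
    "2,74,2,1\n"
    "3,42,1,3\n"
)
header, rows = trim_csv(sample, keep_columns=["ID", "HAPPY"], max_rows=2)
```

For a file on disk, the same function could be given a handle from `open("GSS2018_Original.csv", newline="")`, with the result written back out through `csv.writer`. The list of columns to keep would come from whatever subset was chosen for GSSHeaders.csv.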

Notes

  1. Exercise6_3.py - The primary Python file, containing all of the code for this project's data cleaning tasks.

  2. DSC540 Mid-Term Summary.pdf - A brief summary of the project.
