Kaggle 2018 Survey Analysis - Exploring the Gender Pay Gap among Data Scientists in the US and in India
Installation
This project uses the Anaconda distribution of Python 3.7. The libraries used are pandas, Matplotlib, NumPy, and seaborn. PixieDebugger (https://pixiedust.github.io/pixiedust/install.html) was also used for debugging during development.
Project Motivation
I completed this project as part of the Udacity Data Science Nanodegree. The goal is to select a data set, analyze it, and present the analysis in a blog post.
Kaggle's second annual survey of platform users created a very rich data set on individual users' demographics and experience in the data science and data analysis space.
The goal of this analysis is to perform exploratory data analysis on the data set. Specifically, it examines the country, age, and pay distributions of the survey takers, with a focus on the gender pay gap among data scientists in the US and in India.
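As a rough illustration of the kind of comparison described above, the sketch below computes median pay by gender and country with pandas. It is not the notebook's actual code. The column names `gender`, `country`, and `pay_bucket` are hypothetical (the raw survey file labels its columns by question code), the helper names are made up for this example, and the pay bucket format `10-20,000` is an assumption about how compensation ranges are encoded.

```python
import pandas as pd


def parse_pay_midpoint(bucket):
    """Convert a pay bucket such as '10-20,000' into its midpoint in USD.

    Returns NaN for anything that does not match the assumed
    'low-high,000' pattern (missing values, open-ended buckets,
    or a declined-to-answer response).
    """
    if not isinstance(bucket, str):
        return float("nan")
    try:
        low, high = bucket.replace(",000", "").split("-")
        return (float(low) + float(high)) / 2 * 1000
    except ValueError:
        return float("nan")


def median_pay_by_gender(df, countries):
    """Median pay midpoint per country and gender.

    Assumes hypothetical columns 'country', 'gender', and 'pay_bucket'.
    Rows whose pay bucket cannot be parsed are dropped.
    """
    subset = df[df["country"].isin(countries)].copy()
    subset["pay_midpoint"] = subset["pay_bucket"].map(parse_pay_midpoint).astype(float)
    return (
        subset.dropna(subset=["pay_midpoint"])
        .groupby(["country", "gender"])["pay_midpoint"]
        .median()
        .unstack("gender")  # one row per country, one column per gender
    )


if __name__ == "__main__":
    # Tiny made-up sample, only to show the shape of the result.
    sample = pd.DataFrame(
        {
            "gender": ["Male", "Female", "Male", "Female"],
            "country": ["India", "India", "United States of America", "United States of America"],
            "pay_bucket": ["0-10,000", "10-20,000", "100-125,000", "90-100,000"],
        }
    )
    print(median_pay_by_gender(sample, ["India", "United States of America"]))
```

Using bucket midpoints is a simplification: the survey reports pay in ranges, so any median computed this way is only an approximation of the underlying salaries.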
File Descriptions
- data folder: source data for the analysis, downloaded from https://www.kaggle.com/kaggle/kaggle-survey-2018/
- images folder: charts for the blog post are saved in this folder
- analysis.ipynb: main Jupyter notebook containing the full analysis workflow
Results
A summary writeup of the results is published at https://flolytic.com/blog/gender-pay-gap-among-data-scientists-on-kaggle
Licensing
The source data is from Kaggle and can be found at https://www.kaggle.com/kaggle/kaggle-survey-2018/