Using Exploratory Data Analysis to understand the Gender Pay Gap among Data Scientists in the US and India

Kaggle 2018 Survey Analysis - Exploring the Gender Pay Gap among Data Scientists in the US and in India

Installation

This project uses the Anaconda distribution of Python 3.7. The libraries used are pandas, Matplotlib, NumPy, and seaborn. PixieDebugger (https://pixiedust.github.io/pixiedust/install.html) was also used for debugging during development.
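
Before opening the notebook, it can help to confirm that the libraries listed above are importable in the active environment. The following is a minimal sketch that uses only the standard library; the `REQUIRED` list and the `missing_packages` helper are illustrative names and are not part of this repository.

```python
import importlib.util
import sys

# Import names of the libraries listed in this README (illustrative list).
REQUIRED = ["pandas", "matplotlib", "numpy", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report the interpreter version and any libraries that still need installing.
print("Python", sys.version.split()[0])
missing = missing_packages(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

Anything reported as missing can be installed with `conda install` or `pip install` before running the notebook.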

Project Motivation

This project is part of my Udacity Data Science Nanodegree. The assignment is to select a data set, analyze it, and publish a writeup of the analysis as a blog post.

Kaggle's second annual survey of platform users produced a rich data set covering individual users' demographics and their experience in data science and data analysis.

The goal of this analysis is to perform exploratory data analysis on the data set. Specifically, the country, age, and pay distributions of the survey takers are analyzed.
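
To illustrate what a distribution of survey takers means here, the sketch below computes the share of respondents per answer for a single column. It is not the notebook's code: the notebook uses pandas, while this version sticks to the standard library so it runs without extra dependencies. The column names (`country`, `age`, `pay`), the helper names, and the sample rows are all made up for illustration and do not reflect the survey's actual headers or values.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dicts, one per respondent (hypothetical path)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def value_shares(rows, column):
    """Return {answer: share of respondents} for one column, skipping blank answers."""
    counts = Counter(row[column] for row in rows if row.get(column))
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# Made-up rows standing in for survey responses; not real survey data.
sample_rows = [
    {"country": "India", "age": "25-29", "pay": "0-10,000"},
    {"country": "India", "age": "25-29", "pay": "10-20,000"},
    {"country": "United States", "age": "30-34", "pay": "100-125,000"},
    {"country": "United States", "age": "25-29", "pay": "125-150,000"},
    {"country": "", "age": "30-34", "pay": ""},
]

for column in ("country", "age", "pay"):
    print(column, value_shares(sample_rows, column))
```

Blank answers are dropped before the shares are computed, so each column's shares sum to 1 over the respondents who answered that question.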

File Descriptions

  • data folder: source data for the analysis, downloaded from https://www.kaggle.com/kaggle/kaggle-survey-2018/
  • images folder: charts for the blog post are saved in this folder
  • analysis.ipynb: main Python notebook containing the full analysis workflow

Results

A summary writeup of the results is published at https://flolytic.com/blog/gender-pay-gap-among-data-scientists-on-kaggle

Licensing

The source data is from Kaggle and can be found at https://www.kaggle.com/kaggle/kaggle-survey-2018/
