adrianlievano/kaggle_data_science_2018_survey

A Notebook Building Visualizations for Kaggle's Annual Machine Learning & Data Science Survey

An exploratory analysis of Kaggle's 2018 Data Science Survey, which had over 20,000 respondents. To view it, navigate to the 2018_data folder and open data science notebook #1, which contains the refactored code and the helper functions used to parse the survey data.
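The notebook's own helper functions are not reproduced here. The following is a rough, hypothetical sketch of what such a parser might do. It assumes a CSV whose first row holds short column codes (such as `Q6`) and whose second row holds the full question text, which is our understanding of how Kaggle's 2018 multiple-choice export is laid out. `parse_survey` is an illustrative name, not a function from this repository.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_survey(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split survey CSV text into (column code -> question text, response rows).

    Assumes row 1 holds column codes and row 2 holds the full question text.
    """
    reader = csv.reader(io.StringIO(text))
    codes = next(reader)      # e.g. ["Q1", "Q6"]
    questions = next(reader)  # human-readable wording of each question
    schema = dict(zip(codes, questions))
    # Every remaining row is one respondent, keyed by column code.
    rows = [dict(zip(codes, values)) for values in reader]
    return schema, rows


# Toy input for illustration only; not taken from the real survey file.
sample = (
    "Q1,Q6\n"
    "What is your gender?,Select the title most similar to your current role\n"
    "Male,Data Scientist\n"
    "Female,Software Engineer\n"
)
schema, rows = parse_survey(sample)
print(schema["Q6"])   # Select the title most similar to your current role
print(rows[1]["Q6"])  # Software Engineer
```

Keeping the question text in a separate `schema` mapping lets plots use readable labels while the analysis code works with short column codes.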

Executive Summary:

Data is everywhere: in every industry, country, and organization, and in every user of a digital application. The way we store, process, analyze, and share its insights with others can be of great benefit. Executing a data strategy starts with building a stronger data culture. To build one, understand the differences between the individual contributors on a data team, build technical infrastructure that supports collaboration across company departments, and always consider the impact of data and organizational bias when pursuing data projects.

The Motivation

Data is everywhere: in every industry, country, and organization, and in every user of a digital application. The way we store, process, analyze, and share its insights with others can be of great benefit. Company leaders and prospective job seekers interested in information are on fertile ground: the cost of data storage is decreasing exponentially, the amount and velocity of data are increasing, and the algorithms that open the valve on this spigot of value are more accessible thanks to modern programming frameworks. To capture this value, however, companies face considerable challenges, such as hiring and retaining talent and making use of an organization's structured and unstructured datasets. The best way to tackle these problems is to have a data strategy: a strategy for organizing, governing, analyzing, and deploying an organization's information assets [11].

A data strategy has multiple parts: addressing compliance and security, creating new products and services, and developing organizational analytics capabilities, to name a few. A crucial element of an effective data strategy, however, is your data culture; it shapes the competitive advantage you gain when you bring talent, tools, and decision-making together. Multiple surveys of C-suite executives at various Fortune 500 companies each add a unique understanding of what makes a strong data culture [16]. This report adds to the conversation by providing insight into building technical teams and into how understanding the nuances of your data defines your data culture. My aim is to give executives the insights they need to build data-driven cultures.

The Data

The annual industry-wide Kaggle Data Science & Machine Learning survey received 16,000, 23,859, and 19,717 responses in 2017, 2018, and 2019, respectively [2, 3, 4]. A Kaggle data science notebook and a Jupyter notebook are used to analyze the survey fields. This report focuses on respondents who self-identified as Software Engineers, Data Engineers, or Data Scientists. I selected this audience because they are the key contributors on a data team: software engineers build the infrastructure that allows user actions to be logged, data engineers extract, transform, and load (ETL) these actions into structured tables, and data scientists use this data to analyze, predict, or communicate results to various stakeholders. It's important to understand their different needs so that organizations seeking to build a data-driven culture can invest in these key contributors and help them overcome their major obstacles. All code, visualizations, and supporting resources can be found in the reference section of this notebook. I also cite external studies, referenced below.
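To make the respondent selection concrete, here is a minimal sketch of narrowing parsed survey rows down to the three focus roles and counting them. It assumes each response is a dict keyed by column code and that the current-role question lives in column `Q6`. The column name and the helper names `filter_roles` and `role_counts` are assumptions for illustration, so check them against the actual survey schema before relying on this.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Job titles this report focuses on.
FOCUS_ROLES = ("Software Engineer", "Data Engineer", "Data Scientist")


def filter_roles(rows: Iterable[Dict[str, str]], role_column: str = "Q6") -> List[Dict[str, str]]:
    """Keep respondents whose self-reported title is one of FOCUS_ROLES.

    role_column="Q6" is an assumption about the 2018 current-role question.
    """
    # row.get() tolerates rows where the role question is missing.
    return [row for row in rows if row.get(role_column) in FOCUS_ROLES]


def role_counts(rows: Iterable[Dict[str, str]], role_column: str = "Q6") -> Counter:
    """Count the remaining respondents per focus role."""
    return Counter(row[role_column] for row in filter_roles(rows, role_column))


# Toy rows for illustration only; these are not real survey responses.
rows = [
    {"Q6": "Data Scientist"},
    {"Q6": "Student"},
    {"Q6": "Data Scientist"},
    {"Q6": "Data Engineer"},
    {},  # respondent who skipped the question
]
counts = role_counts(rows)
print(dict(counts))  # {'Data Scientist': 2, 'Data Engineer': 1}
```

Filtering before counting keeps the comparison between the three roles separate from the rest of the respondent pool (students, analysts, and so on). A `Counter` also returns 0 for a focus role with no respondents instead of raising an error.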

Blog Post on Medium

https://medium.com/p/d293fa5a26ff/edit#4c14

Acknowledgements:

Thank you to my friends and mentors for reading drafts of this post and providing feedback. I am grateful for their support. I’d also like to thank Kaggle for preparing such detailed datasets from its annual data science and machine learning survey. Cleaning, preparing, and presenting data for a community of over 1 million data enthusiasts to analyze is hard work.
