For my analysis, I used Python and the following library versions:
- pandas version: 0.25.1
- matplotlib version: 3.0.3
- seaborn version: 0.9.0
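To compare a local environment against the versions listed above, a small check like the following may help. It is a minimal sketch: the helper name `installed_versions` is hypothetical and not part of the notebook, and it assumes each library exposes a `__version__` attribute.

```python
import importlib


def installed_versions(names=("pandas", "matplotlib", "seaborn")):
    """Return {library name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Fall back to "unknown" if the module has no __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Newer releases of these libraries will probably run the notebook too, but that has not been verified here.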
I'm currently trying to break into the field of data science, and would like to make sure that I'm practicing the skills most frequently used by data scientists today.
To ensure that I am practicing the right skills and pursuing the correct credentials, I investigate how data scientists answered the following questions in Kaggle's 2020 Machine Learning and Data Science survey:
- What is the highest level of formal education that you have attained or plan to attain within the next 2 years?
- What programming languages do you use on a regular basis?
- On which platforms have you begun or completed data science courses?
- credentials_languages_learning.ipynb -- My analysis of the survey. Here I investigate my 3 questions of interest.
- kaggle_survey_2020_responses.csv -- The responses to the survey. To get access to this file, go to https://www.kaggle.com/c/kaggle-survey-2020, and accept the rules of the competition.
I wrote a blog post titled "What skills do you need to break into Data Science in 2021" that analyzes my findings in greater detail. Check it out!
- Having at least a bachelor's degree can really help you get into the field.
- The average number of programming languages used on a regular basis per data scientist is 2.6, with the top three being Python (92%), SQL (55%), and R (36%).
- Data Scientists are learning machines! The average data scientist has used at least 2 learning platforms to learn data science.
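The per-respondent average and per-language shares reported above can be computed from a multi-select question roughly as follows. This is a sketch under stated assumptions, not the notebook's actual code: it assumes each answer option is stored in its own column with a shared prefix (here `Q7_`, e.g. `Q7_Part_1`), that an unselected option is left empty, and that the rows have already been filtered to data scientists. The function name `language_usage` and the toy data are made up for illustration.

```python
import pandas as pd


def language_usage(responses, prefix="Q7_"):
    """Summarize a multi-select question stored as one column per option.

    Returns (average number of options selected per respondent,
             share of respondents selecting each option, sorted descending).
    """
    option_cols = [col for col in responses.columns if col.startswith(prefix)]
    # A non-missing cell means the respondent selected that option.
    selected = responses[option_cols].notna()
    avg_per_respondent = selected.sum(axis=1).mean()
    share_per_option = selected.mean().sort_values(ascending=False)
    return avg_per_respondent, share_per_option


# Toy data for illustration only; these are not survey results.
toy = pd.DataFrame({
    "Q7_Part_1": ["Python", "Python", "Python", None],
    "Q7_Part_2": ["R", None, None, "R"],
    "Q7_Part_3": ["SQL", "SQL", None, None],
})

average, shares = language_usage(toy)
print(f"Average languages per respondent: {average:.2f}")
print(shares)
```

The same function could be pointed at the learning-platform question by passing a different `prefix`, assuming that question follows the same one-column-per-option layout.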
Special thanks to Kaggle.com for conducting this survey and publishing the results.