# “How to recruit Data Scientists?” - Recruitment Analysis 

## **Description:**

You are working for a major tech company in Berlin as a Junior Data Analyst. The company wants to grow and is hiring for its new data science team. The HR department asks you to prepare an analysis of how to recruit for this team.

You use the Stack Overflow Developer Survey to analyze the data science market.

The company wants to recruit people who have Python and SQL skills. 

**Please answer the following questions:**
- How many Data Analysts / Scientists are in the survey?
- What profile do these Data Analysts / Scientists have? (Background, education, work experience)
- What is the salary range for such Data Analysts / Scientists?
- Please share any other insights you find that would be relevant for deciding how to recruit Data Analysts / Scientists.

## Resources:

* [About the Survey](https://insights.stackoverflow.com/survey)



In [None]:
import requests, zipfile, io
import pandas as pd

In [None]:
import matplotlib.pyplot as plt
from pandas_profiling import ProfileReport
%matplotlib inline
pd.set_option('display.max_colwidth', None)

In [None]:
# Link to Survey Data
zip_file_url = "https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2022.zip"

In [None]:
# Download Survey Data into Google Colab
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/content")
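
The download step above trusts that the request succeeded and that the response is a valid zip file. A slightly more defensive sketch is shown here; the helper names `extract_zip_bytes` and `download_and_extract` and the 60-second timeout are illustrative assumptions, not part of the original notebook.

```python
import io
import zipfile

import requests


def extract_zip_bytes(payload: bytes, target_dir: str) -> list[str]:
    """Extract an in-memory zip archive and return the names of its members."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


def download_and_extract(url: str, target_dir: str, timeout: float = 60.0) -> list[str]:
    """Download a zip file and extract it, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # Stop here on 4xx/5xx instead of trying to parse an error page as a zip file.
    response.raise_for_status()
    return extract_zip_bytes(response.content, target_dir)
```

Calling `download_and_extract(zip_file_url, "/content")` would then return the extracted file names, which makes it easy to confirm that `survey_results_public.csv` is present before loading it.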

In [None]:
# Load Survey Data
survey_results = pd.read_csv("/content/survey_results_public.csv")

In [None]:
survey_results.head(5)

In [None]:
survey_results.columns




Stack Overflow Usage + Community
Demographic Information
Professional Developer Series (Optional)
Final Questions

1. 'ResponseId', : Identifier of the saved survey response

Basic Information 

2. 'MainBranch', : Whether the respondent is a developer, hobby developer, student, or none of these; single choice, required

3. 'Employment', : Which of the following best describes your current employment status?

4. 'RemoteWork', : Which best describes your current work situation?

5. 'CodingActivities', : Which of the following best describes the code you write outside of work? Select all that apply.



Education, Work, and Career

6. 'EdLevel', : Which of the following best describes the highest level of formal education that you’ve completed? *

7. 'LearnCode', : How did you learn to code? Select all that apply.

8. 'LearnCodeOnline', : What online resources do you use to learn to code? Select all that apply.

9. 'LearnCodeCoursesCert', : What online courses or certifications do you use to learn to code? Select all that apply.

10. 'YearsCode', : Including any education, how many years have you been coding in total?

11. 'YearsCodePro', : NOT including education, how many years have you coded professionally (as a part of your work)?

12. **'DevType',** : Which of the following describes your current job? Please select all that apply.

13. 'OrgSize', : Approximately how many people are employed by the company or organization you currently work for?

14. 'PurchaseInfluence', : What level of influence do you, personally, have over new technology purchases at your organization?

15. 'BuyNewTool', : When buying a new tool or software, how do you discover and research available solutions? Select all that apply.

16. **'Country',** : Where do you live? *

17. 'Currency', : Which currency do you use day-to-day? If your answer is complicated, please pick the one you're most comfortable estimating in. *

18. 'CompTotal', : What is your current total compensation (salary, bonuses, and perks, before taxes and deductions)?

19. 'CompFreq', : Is that compensation weekly, monthly, or yearly?



Technology and Tech Culture


20. **'LanguageHaveWorkedWith',** : Which programming, scripting, and markup languages have you done extensive development work in over the past year?

21. **'LanguageWantToWorkWith',** : Which programming, scripting, and markup languages do you want to work in over the next year?

22. **'DatabaseHaveWorkedWith',** : Which database environments have you done extensive development work in over the past year?

23. **'DatabaseWantToWorkWith',** : Which database environments do you want to work in over the next year?


24. 'PlatformHaveWorkedWith', : Which cloud platforms have you worked with over the past year?

25. 'PlatformWantToWorkWith', : Which cloud platforms do you want to work with over the next year?

26. 'WebframeHaveWorkedWith', : Which web frameworks and web technologies have you worked with over the past year?

27. 'WebframeWantToWorkWith', : Which web frameworks and web technologies do you want to work with over the next year?

28. **'MiscTechHaveWorkedWith',** : Which other frameworks and libraries have you worked with over the past year?

29. **'MiscTechWantToWorkWith',** : Which other frameworks and libraries do you want to work with over the next year?

30. 'ToolsTechHaveWorkedWith', : Which developer tools have you worked with over the past year?

31. 'ToolsTechWantToWorkWith', : Which developer tools do you want to work with over the next year?

32. 'NEWCollabToolsHaveWorkedWith', : Which development environments have you worked with over the past year?

33. 'NEWCollabToolsWantToWorkWith', : Which development environments do you want to work with over the next year?



34. 'OpSysProfessional use', : What is your primary operating system for professional use?

35. 'OpSysPersonal use', : What is your primary operating system for personal use?

36. 'VersionControlSystem', : What are the primary version control systems you use? Select all that apply.

37. 'VCInteraction', : How do you interact with your version control system? Select all that apply.

38. 'VCHostingPersonal use', : What version control hosting service are you using? (Personal use)

39. 'VCHostingProfessional use', : What version control hosting service are you using? (Professional use)



40. 'OfficeStackAsyncHaveWorkedWith', : Which collaborative work management tools have you used over the past year?

41. 'OfficeStackAsyncWantToWorkWith', : Which collaborative work management tools do you want to use over the next year?

42. 'OfficeStackSyncHaveWorkedWith', : Which communication tools have you used over the past year?

43. 'OfficeStackSyncWantToWorkWith', : Which communication tools do you want to use over the next year?


44. 'Blockchain', : How favorable are you about blockchain, crypto, and decentralization?



Stack Community

45. 'NEWSOSites', : Which of the following Stack Overflow sites have you visited? Select all that apply.

46. 'SOVisitFreq', : How frequently would you say you visit Stack Overflow?

47. 'SOAccount', : Do you have a Stack Overflow account?

48. 'SOPartFreq', : How frequently would you say you participate in Q&A on Stack Overflow? By participate we mean ask, answer, vote for, or comment on questions.

49. 'SOComm', : Do you consider yourself a member of the Stack Overflow community?



Demographics

50. 'Age', : Age range

51. 'Gender', : Which of the following describe you, if any? Please check all that apply.

52. 'Trans', : Do you identify as transgender?

53. 'Sexuality', : Which of the following describe you, if any? Please check all that apply.

54. 'Ethnicity', : Which of the following describe you, if any? Please check all that apply.

55. 'Accessibility', : Which of the following describe you, if any? Please check all that apply.

56. 'MentalHealth', : Which of the following describe you, if any? Please check all that apply.



TeamsBranch
Professional Developer Series : This is an optional section

57. 'TBranch', : Would you like to participate in the Professional Developer Series?

TeamsQuestions

58. 'ICorPM', : Are you an independent contributor or people manager?

59. **'WorkExp',**: How many years of working experience do you have?

60. 'Knowledge_1',  : I have interactions with people outside of my immediate team.

61. 'Knowledge_2', : Knowledge silos prevent me from getting ideas across the organization (i.e., one individual or team has information that isn't shared with others)

62. 'Knowledge_3', : I can find up-to-date information within my organization to help me do my job.

63. 'Knowledge_4', : I am able to quickly find answers to my questions with existing tools and resources.

64. 'Knowledge_5', : I know which system or resource to use to find information and answers to questions I have.

65. 'Knowledge_6', : I often find myself answering questions that I’ve already answered before.

66. 'Knowledge_7', : Waiting on answers to questions often causes interruptions and disrupts my workflow.



67. 'Frequency_1', : Needing help from people outside of your immediate team?

68. 'Frequency_2', : Interacting with people outside of your immediate team?

69. 'Frequency_3', : Encountering knowledge silos (where one individual or team has information that's not shared or distributed with other individuals or teams) at work?


70. 'TimeSearching', : On an average day, how much time do you typically spend searching for answers or solutions to problems you encounter at work? (This includes time spent searching on your own, asking a colleague, and waiting for a response).

71. 'TimeAnswering', : On an average day, how much time do you typically spend answering questions you get asked at work?

72. 'Onboarding', : The time it takes to onboard new hires at my company is:

73. 'ProfessionalTech', : My company has:

Missing Q: Does your team use Stack Overflow for Teams?

74. 'TrueFalse_1', : Are you involved in supporting new hires during their onboarding?

75. 'TrueFalse_2', : Do you use learning resources provided by your employer?

76. 'TrueFalse_3', : Does your employer give you time to learn new skills?


FinalThoughtsSurveyReview

77. 'SurveyLength', : How do you feel about the length of the survey this year?

78. 'SurveyEase', : How easy or difficult was this survey to complete?

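79. 'ConvertedCompYearly', : Not part of the questionnaire; it appears to be a derived column holding the respondent's compensation converted to a yearly amount in a single currency (USD). Verify this before relying on it.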
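
'CompTotal' and 'CompFreq' together describe pay per period, so any salary range built from them needs a common period first. The following is a minimal sketch of that step. It assumes the 'CompFreq' answers are labeled `Weekly`, `Monthly`, and `Yearly` (check `df.CompFreq.unique()`), and the function names are illustrative. It does not convert currencies, so results are only comparable within one 'Currency' value.

```python
import pandas as pd

# Assumed CompFreq labels; confirm them against df["CompFreq"].unique().
PERIODS_PER_YEAR = {"Weekly": 52, "Monthly": 12, "Yearly": 1}


def yearly_compensation(frame: pd.DataFrame) -> pd.Series:
    """Scale CompTotal to a yearly amount, still in each respondent's own currency."""
    # Unknown or missing frequencies map to NaN, so those rows drop out later.
    factor = frame["CompFreq"].map(PERIODS_PER_YEAR)
    return frame["CompTotal"] * factor


def salary_range(values: pd.Series, lower: float = 0.1, upper: float = 0.9) -> pd.Series:
    """Return the lower quantile, median, and upper quantile of the non-missing values."""
    return values.dropna().quantile([lower, 0.5, upper])
```

Reporting quantiles rather than the minimum and maximum keeps a handful of implausible self-reported values from dominating the range.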

In [None]:
survey_results.DevType.unique()
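
Because 'DevType' and 'LanguageHaveWorkedWith' are multi-select columns, `unique()` returns combinations of answers rather than single roles. One possible way to pick out data roles with Python and SQL skills is sketched below. The role labels in `DATA_ROLES` and the language labels `Python` and `SQL` are assumptions about how the survey spells them and should be checked against the output above; the helper names are illustrative.

```python
import pandas as pd

# Assumed DevType labels; verify them against survey_results.DevType.unique().
DATA_ROLES = {
    "Data scientist or machine learning specialist",
    "Data or business analyst",
}


def has_any(cell, wanted) -> bool:
    """True if a ';'-separated cell contains at least one of the wanted labels."""
    if pd.isna(cell):
        return False
    return not set(cell.split(";")).isdisjoint(wanted)


def has_all(cell, wanted) -> bool:
    """True if a ';'-separated cell contains every wanted label."""
    if pd.isna(cell):
        return False
    return set(wanted).issubset(cell.split(";"))


def select_data_people(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep respondents in a data role who have worked with both Python and SQL."""
    is_data_role = frame["DevType"].apply(has_any, wanted=DATA_ROLES)
    knows_both = frame["LanguageHaveWorkedWith"].apply(has_all, wanted={"Python", "SQL"})
    return frame[is_data_role & knows_both]
```

With this in place, `len(select_data_people(survey_results))` would give the number of matching respondents.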

In [None]:
survey_results.Knowledge_1.unique()

In [None]:
survey_results.Frequency_2.unique()

In [None]:
df = survey_results.copy()

In [None]:
#unique_vals = {col:df[col].unique() for col in df}
#unique_vals

In [None]:
# ResponseId

df.ResponseId.describe()

In [None]:
df.MainBranch.unique()

In [None]:
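# Employment is a multi-select column: split the ";"-separated answers into one column per selected option
emplotment_array = df['Employment'].str.split(";", expand=True)
emplotment_array
# Note: a notebook cell only displays the result of its last expression
emplotment_array[0].unique()
emplotment_array[1].unique()
emplotment_array[3].unique()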
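
Inspecting the split columns one at a time shows which options occur in each position, but not how often each option was selected overall. A compact alternative, sketched here with the illustrative helper name `count_options`, is to explode the split lists into one row per selected option and count them.

```python
import pandas as pd


def count_options(column: pd.Series) -> pd.Series:
    """Count how often each option occurs in a ';'-separated multi-select column."""
    return (
        column.dropna()      # skip respondents who gave no answer
        .str.split(";")      # one list of options per respondent
        .explode()           # one row per selected option
        .str.strip()
        .value_counts()
    )
```

For example, `count_options(df["Employment"])` would list every employment status with its number of selections, and the same call works for other multi-select columns such as 'CodingActivities'.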

In [None]:
df.Employment.isna().sum()

In [None]:
df.RemoteWork.unique()

In [None]:
df.CodingActivities.unique()
# CodingActivities is also multi-select: split into one column per selected option
_CodingActivities = df['CodingActivities'].str.split(";", expand=True)
_CodingActivities
_CodingActivities[0].unique()
#_CodingActivities[1].unique()
#_CodingActivities[3].unique()
#_CodingActivities[4].unique()
#_CodingActivities[5].unique()


In [None]:
df.EdLevel.unique()
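
For the education part of the profile, the share of respondents per level is usually more informative than the list of unique values. A minimal sketch follows; the helper name `share_by_category` is illustrative.

```python
import pandas as pd


def share_by_category(column: pd.Series) -> pd.Series:
    """Percentage of respondents per category, with missing answers kept as their own row."""
    shares = column.value_counts(normalize=True, dropna=False)
    return (shares * 100).round(1)
```

Calling `share_by_category(df["EdLevel"])` on the full survey, or on a filtered subset of data roles, would give the education distribution in percent.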