# Table of Contents
1. Reading in the Survey Data
2. Cleaning the Schema Data
3. Analyzing Patterns and Correlations

## Reading in the Survey Data

In [38]:
import numpy as np
import pandas as pd
import seaborn as sns
import missingno as msno
import matplotlib.pyplot as plt

In [23]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
schema_df = pd.read_csv('./survey_2022/survey_results_schema.csv')
schema_df.head()

,qid,qname,question,force_resp,type,selector
0,QID16,S0,"<div><span style=""font-size:19px;""><strong>Hello world! </strong></span></div>\n\n<div> </div>\n\n<div>Thank you for taking the 2022 Stack Overflow Developer Survey, the longest running survey of software developers (and anyone else who codes!) on Earth. </div>\n\n<div> </div>\n\n<div>As in previous years, anonymized results of the survey will be made publicly available under the Open Database License, where anyone can download and analyze the data. On that note, throughout the survey, certain answers you and your peers give will be treated as personally identifiable information, and therefore kept out of the anonymized results file. We'll call out each of those in the survey with a note saying ""This information will be kept private."" </div>\n\n<div> </div>\n\n<div>There are seven sections in this survey. The 2nd, 3rd, and 4th sections will appear in a random order.</div><div><br></div>\n\n<div> 1. Basic Information</div>\n\n<div> 2. Education, Work, and Career</div>\n\n<div> 3. Technology and Tech Culture</div>\n\n<div> 4. Stack Overflow Usage + Community</div>\n\n<div> 5. Demographic Information </div>\n\n<div> 6. Professional Developer Series (Optional)</div><div> 7. Final Questions</div>\n\n<div> \n<div>Most questions in this survey are optional. Required questions are marked with *. This anonymous survey will take about 10 minutes to complete. We encourage you to complete it in one sitting.</div><div><br></div>\n</div>\n\n<div><strong>If you use security or ad-blocking plugins, you may see error messages</strong></div>\n\n<div>Our third-party software provider, Qualtrics, does not work well with certain ad blockers and security software. To avoid error messages that prevent you from taking the survey, please try specifically unblocking Qualtrics in your plugin or pausing the plugin while you take the survey. </div>\n\n<div> </div>\n\n<div>To begin, click <strong>Next.</strong></div>",False,DB,TB
1,QID12,MetaInfo,Browser Meta Info,False,Meta,Browser
2,QID1,S1,"<span style=""font-size:22px; font-family: arial,helvetica,sans-serif; font-weight: 700;"">Basic Information</span><br>\n<br>\n<p><span style=""font-size:16px; font-family:arial,helvetica,sans-serif;"">The first section will focus on some basic information about who you are.<br>\n<br>\nMost questions in this section are required. Required questions are noted with *.</span></p>",False,DB,TB
3,QID2,MainBranch,"Which of the following options best describes you today? Here, by ""developer"" we mean ""someone who writes code."" <b>*</b>",True,MC,SAVR
4,QID296,Employment,Which of the following best describes your current employment status?,False,MC,MAVR


In [22]:
survey_df = pd.read_csv('./survey_2022/survey_results_public.csv')
survey_df.describe()

,ResponseId,CompTotal,VCHostingPersonal use,VCHostingProfessional use,WorkExp,ConvertedCompYearly
count,73268.0,38422.0,0.0,0.0,36769.0,38071.0
mean,36634.5,2.342434e+52,,,10.242378,170761.3
std,21150.794099,4.591478e+54,,,8.70685,781413.2
min,1.0,0.0,,,0.0,1.0
25%,18317.75,30000.0,,,4.0,35832.0
50%,36634.5,77500.0,,,8.0,67845.0
75%,54951.25,154000.0,,,15.0,120000.0
max,73268.0,9e+56,,,50.0,50000000.0
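
The summary above shows two columns with a count of 0 and a `CompTotal` mean dominated by extreme entries. A minimal sketch of how such columns could be flagged, assuming a small toy frame in place of `survey_df` (the values and the names `toy_df`, `missing_share`, `empty_cols`, and `comp_median` are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for survey_df; the values are invented for illustration only.
toy_df = pd.DataFrame({
    "ResponseId": [1, 2, 3, 4],
    "CompTotal": [30000.0, np.nan, 77500.0, 9e56],
    "VCHostingPersonal use": [np.nan, np.nan, np.nan, np.nan],
    "WorkExp": [4.0, 8.0, np.nan, 15.0],
})

# Fraction of missing values per column, highest first.
missing_share = toy_df.isna().mean().sort_values(ascending=False)

# Columns with no data at all (these show count == 0 in describe()).
empty_cols = missing_share[missing_share == 1.0].index.tolist()

# The median is far less sensitive than the mean to extreme entries like 9e56.
comp_median = toy_df["CompTotal"].median()

print(missing_share)
print(empty_cols, comp_median)
```

On the real data, the same calls would apply to `survey_df` directly; whether to drop the empty columns is a separate decision.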


In [None]:
schema_df.set_index('qname', inplace=True)

In [26]:
schema_df.head()

qname,qid,question,force_resp,type,selector
S0,QID16,"<div><span style=""font-size:19px;""><strong>Hello world! </strong></span></div>\n\n<div> </div>\n\n<div>Thank you for taking the 2022 Stack Overflow Developer Survey, the longest running survey of software developers (and anyone else who codes!) on Earth. </div>\n\n<div> </div>\n\n<div>As in previous years, anonymized results of the survey will be made publicly available under the Open Database License, where anyone can download and analyze the data. On that note, throughout the survey, certain answers you and your peers give will be treated as personally identifiable information, and therefore kept out of the anonymized results file. We'll call out each of those in the survey with a note saying ""This information will be kept private."" </div>\n\n<div> </div>\n\n<div>There are seven sections in this survey. The 2nd, 3rd, and 4th sections will appear in a random order.</div><div><br></div>\n\n<div> 1. Basic Information</div>\n\n<div> 2. Education, Work, and Career</div>\n\n<div> 3. Technology and Tech Culture</div>\n\n<div> 4. Stack Overflow Usage + Community</div>\n\n<div> 5. Demographic Information </div>\n\n<div> 6. Professional Developer Series (Optional)</div><div> 7. Final Questions</div>\n\n<div> \n<div>Most questions in this survey are optional. Required questions are marked with *. This anonymous survey will take about 10 minutes to complete. We encourage you to complete it in one sitting.</div><div><br></div>\n</div>\n\n<div><strong>If you use security or ad-blocking plugins, you may see error messages</strong></div>\n\n<div>Our third-party software provider, Qualtrics, does not work well with certain ad blockers and security software. To avoid error messages that prevent you from taking the survey, please try specifically unblocking Qualtrics in your plugin or pausing the plugin while you take the survey. </div>\n\n<div> </div>\n\n<div>To begin, click <strong>Next.</strong></div>",False,DB,TB
MetaInfo,QID12,Browser Meta Info,False,Meta,Browser
S1,QID1,"<span style=""font-size:22px; font-family: arial,helvetica,sans-serif; font-weight: 700;"">Basic Information</span><br>\n<br>\n<p><span style=""font-size:16px; font-family:arial,helvetica,sans-serif;"">The first section will focus on some basic information about who you are.<br>\n<br>\nMost questions in this section are required. Required questions are noted with *.</span></p>",False,DB,TB
MainBranch,QID2,"Which of the following options best describes you today? Here, by ""developer"" we mean ""someone who writes code."" <b>*</b>",True,MC,SAVR
Employment,QID296,Which of the following best describes your current employment status?,False,MC,MAVR


In [29]:
schema_df[['question']]

qname,question
S0,"<div><span style=""font-size:19px;""><strong>Hello world! </strong></span></div>\n\n<div> </div>\n\n<div>Thank you for taking the 2022 Stack Overflow Developer Survey, the longest running survey of software developers (and anyone else who codes!) on Earth. </div>\n\n<div> </div>\n\n<div>As in previous years, anonymized results of the survey will be made publicly available under the Open Database License, where anyone can download and analyze the data. On that note, throughout the survey, certain answers you and your peers give will be treated as personally identifiable information, and therefore kept out of the anonymized results file. We'll call out each of those in the survey with a note saying ""This information will be kept private."" </div>\n\n<div> </div>\n\n<div>There are seven sections in this survey. The 2nd, 3rd, and 4th sections will appear in a random order.</div><div><br></div>\n\n<div> 1. Basic Information</div>\n\n<div> 2. Education, Work, and Career</div>\n\n<div> 3. Technology and Tech Culture</div>\n\n<div> 4. Stack Overflow Usage + Community</div>\n\n<div> 5. Demographic Information </div>\n\n<div> 6. Professional Developer Series (Optional)</div><div> 7. Final Questions</div>\n\n<div> \n<div>Most questions in this survey are optional. Required questions are marked with *. This anonymous survey will take about 10 minutes to complete. We encourage you to complete it in one sitting.</div><div><br></div>\n</div>\n\n<div><strong>If you use security or ad-blocking plugins, you may see error messages</strong></div>\n\n<div>Our third-party software provider, Qualtrics, does not work well with certain ad blockers and security software. To avoid error messages that prevent you from taking the survey, please try specifically unblocking Qualtrics in your plugin or pausing the plugin while you take the survey. </div>\n\n<div> </div>\n\n<div>To begin, click <strong>Next.</strong></div>"
MetaInfo,Browser Meta Info
S1,"<span style=""font-size:22px; font-family: arial,helvetica,sans-serif; font-weight: 700;"">Basic Information</span><br>\n<br>\n<p><span style=""font-size:16px; font-family:arial,helvetica,sans-serif;"">The first section will focus on some basic information about who you are.<br>\n<br>\nMost questions in this section are required. Required questions are noted with *.</span></p>"
MainBranch,"Which of the following options best describes you today? Here, by ""developer"" we mean ""someone who writes code."" <b>*</b>"
Employment,Which of the following best describes your current employment status?
...,...
Frequency_2,Interacting with people outside of your immediate team?
Frequency_3,Encountering knowledge silos (where one individual or team has information that's not shared or distributed with other individuals or teams) at work?
TrueFalse_1,Are you involved in supporting new hires during their onboarding?
TrueFalse_2,Do you use learning resources provided by your employer?
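
With `qname` as the index, the text of a question can be looked up by survey column name through `.loc`. A minimal sketch, assuming a two-row toy frame in place of `schema_df`; `toy_schema` and `question_text` are hypothetical names, not part of the notebook:

```python
import pandas as pd

# Toy stand-in for schema_df, using two rows copied from the output above.
toy_schema = pd.DataFrame({
    "qname": ["MainBranch", "Employment"],
    "qid": ["QID2", "QID296"],
    "question": [
        'Which of the following options best describes you today? '
        'Here, by "developer" we mean "someone who writes code." <b>*</b>',
        "Which of the following best describes your current employment status?",
    ],
}).set_index("qname")


def question_text(schema, qname):
    """Return the question text for a survey column name (hypothetical helper)."""
    return schema.loc[qname, "question"]


print(question_text(toy_schema, "Employment"))
```

A `KeyError` from `.loc` would indicate that the column name has no matching `qname` in the schema.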


## Cleaning the Schema Data

In [30]:
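import re

# Matches HTML tags (<...>) and character entities (&name;, &#123;, &#x1f;)
CLEANR = re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')

def cleanhtml(raw_html):
    """Return raw_html with HTML tags and character entities stripped."""
    return CLEANR.sub('', raw_html)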

In [41]:
schema_df['question'] = schema_df['question'].apply(cleanhtml)
# Remove the newline characters left in the text columns (the regex \n matches a newline)
schema_df.replace(r'\n', '', regex=True, inplace=True)
schema_df.head()

qname,qid,question,force_resp,type,selector
S0,QID16,"Hello world! Thank you for taking the 2022 Stack Overflow Developer Survey, the longest running survey of software developers (and anyone else who codes!) on Earth. As in previous years, anonymized results of the survey will be made publicly available under the Open Database License, where anyone can download and analyze the data. On that note, throughout the survey, certain answers you and your peers give will be treated as personally identifiable information, and therefore kept out of the anonymized results file. We'll call out each of those in the survey with a note saying ""This information will be kept private."" There are seven sections in this survey. The 2nd, 3rd, and 4th sections will appear in a random order. 1. Basic Information 2. Education, Work, and Career 3. Technology and Tech Culture 4. Stack Overflow Usage + Community 5. Demographic Information 6. Professional Developer Series (Optional) 7. Final Questions Most questions in this survey are optional. Required questions are marked with *. This anonymous survey will take about 10 minutes to complete. We encourage you to complete it in one sitting.If you use security or ad-blocking plugins, you may see error messagesOur third-party software provider, Qualtrics, does not work well with certain ad blockers and security software. To avoid error messages that prevent you from taking the survey, please try specifically unblocking Qualtrics in your plugin or pausing the plugin while you take the survey. To begin, click Next.",False,DB,TB
MetaInfo,QID12,Browser Meta Info,False,Meta,Browser
S1,QID1,Basic InformationThe first section will focus on some basic information about who you are.Most questions in this section are required. Required questions are noted with *.,False,DB,TB
MainBranch,QID2,"Which of the following options best describes you today? Here, by ""developer"" we mean ""someone who writes code."" *",True,MC,SAVR
Employment,QID296,Which of the following best describes your current employment status?,False,MC,MAVR
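
In the cleaned output above, text from adjacent HTML blocks runs together (for example `sitting.If you use`), because tags and newlines are replaced with an empty string. One possible variant, sketched here under the assumption that a single space between blocks is acceptable, substitutes a space and then collapses repeated whitespace. `TAG_OR_ENTITY` and `cleanhtml_spaced` are hypothetical names, and the sample string is a shortened excerpt, not the full survey text:

```python
import re

# Same pattern as CLEANR: HTML tags and character entities.
TAG_OR_ENTITY = re.compile(r'<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')


def cleanhtml_spaced(raw_html):
    """Strip tags/entities, leaving a single space where markup separated text."""
    text = TAG_OR_ENTITY.sub(' ', raw_html)
    # Collapse runs of spaces and newlines into one space, then trim the ends.
    return re.sub(r'\s+', ' ', text).strip()


sample = (
    "<div>complete it in one sitting.</div><div><br></div>\n"
    "<div><strong>If you use security plugins</strong></div>"
)
cleaned = cleanhtml_spaced(sample)
print(cleaned)
```

Since this variant already collapses newlines, a separate newline-removal pass would not be needed for the `question` column.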


In [43]:
schema_df.head(10)

qname,qid,question,force_resp,type,selector
S0,QID16,"Hello world! Thank you for taking the 2022 Stack Overflow Developer Survey, the longest running survey of software developers (and anyone else who codes!) on Earth. As in previous years, anonymized results of the survey will be made publicly available under the Open Database License, where anyone can download and analyze the data. On that note, throughout the survey, certain answers you and your peers give will be treated as personally identifiable information, and therefore kept out of the anonymized results file. We'll call out each of those in the survey with a note saying ""This information will be kept private."" There are seven sections in this survey. The 2nd, 3rd, and 4th sections will appear in a random order. 1. Basic Information 2. Education, Work, and Career 3. Technology and Tech Culture 4. Stack Overflow Usage + Community 5. Demographic Information 6. Professional Developer Series (Optional) 7. Final Questions Most questions in this survey are optional. Required questions are marked with *. This anonymous survey will take about 10 minutes to complete. We encourage you to complete it in one sitting.If you use security or ad-blocking plugins, you may see error messagesOur third-party software provider, Qualtrics, does not work well with certain ad blockers and security software. To avoid error messages that prevent you from taking the survey, please try specifically unblocking Qualtrics in your plugin or pausing the plugin while you take the survey. To begin, click Next.",False,DB,TB
MetaInfo,QID12,Browser Meta Info,False,Meta,Browser
S1,QID1,Basic InformationThe first section will focus on some basic information about who you are.Most questions in this section are required. Required questions are noted with *.,False,DB,TB
MainBranch,QID2,"Which of the following options best describes you today? Here, by ""developer"" we mean ""someone who writes code."" *",True,MC,SAVR
Employment,QID296,Which of the following best describes your current employment status?,False,MC,MAVR
RemoteWork,QID308,Which best describes your current work situation?,False,MC,SAVR
CodingActivities,QID297,Which of the following best describes the code you write outside of work? Select all that apply.,False,MC,MAVR
S2,QID190,"Education, work, and career This section will focus on your education, work, and career.Most questions in this section are optional. Required questions are noted with *.",False,DB,TB
EdLevel,QID25,Which of the following best describes the highest level of formal education that you’ve completed? *,False,MC,SAVR
LearnCode,QID276,How did you learn to code? Select all that apply.,False,MC,MAVR
