## Overview

### Context

We at HackerRank (https://www.hackerrank.com) are passionate about ensuring that developers and companies can find each other and that the best matches are made.  Our platforms, for the community and recruiting, are built to create the best experience for all involved.

We have over the years built a very strong global community of developers. In order to provide more transparency for ourselves and the world on the state of developers, we conducted a survey of our developers late in 2016. We got an astounding 25K responses! The survey asked developers many questions around their skills, educational background, current role, and more. We provided a high-level report of our findings from this survey earlier this year (see acknowledgements below).

We have since focused more on understanding trends about women pursuing careers as developers. On March 1 we released our high-level report on our findings. This report is based on survey responses from professional developers (14K developers, which includes hiring managers), and it is available here: [Women In Tech 2018][1]

The data set we are releasing here is the full dataset of 25K responses from our developer survey, which includes both students and professionals. The  [Women In Tech 2018][1] report uses only the 14K responses from professionals.

### Content

The data consists of five files:

 1. `HackerRank-Developer-Survey-2018-Codebook.csv`: a CSV file with survey schema. This schema includes the questions that correspond to each column name in `HackerRank-Developer-Survey-2018-Numeric.csv` and `HackerRank-Developer-Survey-2018-Values.csv`.   It also provides extra notes on questions if they were conditionally shown, or what the correct answer was to a coding question.
 2. `HackerRank-Developer-Survey-2018-Numeric-Mapping.csv`: This file provides the mapping from the numeric values in `HackerRank-Developer-Survey-2018-Numeric.csv` and what their textual representation in the survey was.  Each row represents one of the possible answers to a specific question, with a mapping of the numeric answer in the data file to the textual label in the survey.
 3. `Country-Code-Mapping.csv`: a CSV file that provides the mapping of the numeric country code in our raw data in `HackerRank-Developer-Survey-2018-Numeric.csv` to the associated country.
 4. `HackerRank-Developer-Survey-2018-Numeric.csv`: a CSV file with the raw survey responses. Each row is one respondent, including an anonymous respondent id, the timestamp of when the survey was started and ended, and the numeric responses to each question. This is the data file that we used for our analysis.
 5. `HackerRank-Developer-Survey-2018-Values.csv`: a CSV file with the text version of the survey responses. Each row is one respondent, including an anonymous respondent id, the timestamp of when the survey was started and ended, and the textual response to each question. This file was derived from `HackerRank-Developer-Survey-2018-Numeric.csv` using the mapping files that are included in this data set. We provide it for ease of use for those who prefer to work directly with the text values.
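
As a rough illustration of how the mapping file relates to the numeric file, the sketch below decodes one column of codes into labels. The two small frames are made-up stand-ins for the real CSVs, `decode_column` and `RespondentID` are hypothetical names, and the labels shown are placeholders rather than the survey's actual coding.

```python
import pandas as pd

# Toy stand-ins for the Numeric and Numeric-Mapping files (made-up rows)
numeric_toy = pd.DataFrame({'RespondentID': [1, 2, 3], 'q3Gender': [1, 2, 1]})
mapping_toy = pd.DataFrame({
    'Data Field': ['q3Gender', 'q3Gender'],
    'Value': [1, 2],
    'Label': ['Label A', 'Label B'],  # placeholder labels, not the real ones
})

def decode_column(data, mapping, field):
    """Replace the numeric codes of one column with their textual labels."""
    lookup = mapping.loc[mapping['Data Field'] == field].set_index('Value')['Label']
    return data[field].map(lookup)

decoded = decode_column(numeric_toy, mapping_toy, 'q3Gender')
print(decoded.tolist())
```

Codes that have no row in the mapping come back as `NaN`, which makes unmapped values easy to spot.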


### Methodology

 - A total of 25,090 professional and student developers completed our 10-minute online survey. 
 - The survey was live from October 16 through November 1, 2017.
 - The survey was hosted by SurveyMonkey and we recruited respondents via email from our community of over 3.4 million members and through social media sites.
 - We removed responses that were incomplete as well as obvious spam submissions.
 - Not every question was shown to every respondent, as some questions were specifically for those involved in hiring. The codebook (`HackerRank-Developer-Survey-2018-Codebook.csv`) highlights under what conditions some questions were shown.
 - The [Women In Tech 2018][1] report is based only on the 14K responses from professionals
    - Respondents who identified as students (`q8Student=1`; N=10351) were excluded from this report.
    - Respondents who identify as “non-binary” (`q3Gender=3`; N=76) were excluded from the male-female comparisons.
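
The two exclusions above could be sketched roughly as follows. The frame is toy data, and comparing with `!= 1` and `!= 3` assumes the codes quoted in the bullets (`q8Student=1`, `q3Gender=3`); how non-students are coded is not stated here, so the filter only tests for the student code.

```python
import pandas as pd

# Toy data: one student, one non-binary professional, two other professionals
toy = pd.DataFrame({'q8Student': [1, 0, 0, 0],
                    'q3Gender': [1, 2, 3, 1]})

# Keep professionals only (anything that is not flagged as a student)
professionals = toy[toy['q8Student'] != 1]

# For male-female comparisons, additionally drop the non-binary code
binary_comparison = professionals[professionals['q3Gender'] != 3]

print(len(professionals), len(binary_comparison))
```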

### Acknowledgements

The data set we are releasing is based on the [Developer Skills][2] survey and report we released earlier this year. We did not release the data set then, so here it is!

  [1]: https://research.hackerrank.com/women-in-tech/2018/
  [2]: https://research.hackerrank.com/developer-skills/2018/
  
#### I am going to focus mainly on the technical side of data visualisation: how to produce the plots we want, rather than how to interpret them. Once you know how, you can do the interpreting yourselves. Treat this as a tutorial/introduction

### So, hello world! I hope you enjoy my first kernel.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib
import matplotlib.lines
import matplotlib.patches
import squarify
%matplotlib inline

plt.style.use('seaborn')

After importing the libraries we need to:   
1. Read the data
2. Get an overview

In [None]:
codebook = pd.read_csv('../input/HackerRank-Developer-Survey-2018-Codebook.csv')
numeric_mapping = pd.read_csv('../input/HackerRank-Developer-Survey-2018-Numeric-Mapping.csv')
numeric = pd.read_csv('../input/HackerRank-Developer-Survey-2018-Numeric.csv', na_values=['#NULL!', 'nan'])
values = pd.read_csv('../input/HackerRank-Developer-Survey-2018-Values.csv', na_values=['#NULL!', 'nan'])

codebook.head()

In [None]:
numeric_mapping.head()

In [None]:
numeric.head()

In [None]:
values.head()

OK, quite untidy. We need to do something about it.   
First of all, we will change the column names and indices to something friendlier. Then we will get rid of the NaNs

In [None]:
codebook.columns = ['fieldname', 'question', 'notes']
codebook.set_index('fieldname', inplace=True);
numeric_mapping.set_index('Data Field', inplace=True)

numeric.q1AgeBeginCoding = numeric.q1AgeBeginCoding.astype(float)
numeric.q2Age = numeric.q2Age.astype(float)
numeric = numeric.fillna(-1)

values = values.fillna('Not provided')
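
To see what `na_values` and `fillna` do together, here is a small self-contained sketch. The CSV text is made up; only the `#NULL!` marker and the `-1` placeholder are taken from the cells above.

```python
import io
import pandas as pd

# A made-up three-row CSV that uses '#NULL!' for missing answers
raw = io.StringIO("q2Age,q3Gender\n3,1\n#NULL!,2\n4,#NULL!\n")

# '#NULL!' is parsed as NaN, which also turns the columns into floats
toy = pd.read_csv(raw, na_values=['#NULL!'])
missing_per_column = toy.isna().sum()

# Replace NaN with a sentinel that cannot clash with a real answer code
filled = toy.fillna(-1)
print(filled)
```

Using a sentinel such as `-1` keeps the columns numeric, at the cost of having to remember to exclude it from any averages.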

We also need to see what data we actually have available

In [None]:
print(values.columns.ravel())

Now that we have prepared our data, we can start on the visualisations. We will write a heatmap helper function to make a few interesting plots easy to draw

In [None]:
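def draw_heatmap(column1, column2, title=None, annot=True, ax=None, size=(10, 10), data=values):
    """Plot how often each pair of answers to two questions occurs together."""
    # Rows are answers to column1, columns are answers to column2, cells are respondent counts
    cross = pd.crosstab(data[column1], data[column2])
    
    if ax is None:
        f, ax = plt.subplots(figsize=size)
        
    # fmt='d' prints the counts as plain integers instead of scientific notation
    sns.heatmap(cross, cmap='Reds', annot=annot, fmt='d', ax=ax)
    ax.set_ylabel(codebook.loc[column1]['question'])
    ax.set_xlabel(codebook.loc[column2]['question'])
    
    if title is not None:
        ax.set_title(title)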
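
The table that the heatmap colours is just a cross-tabulation. A minimal sketch on made-up answers (the values below are illustrative, not taken from the survey) shows what `pd.crosstab` returns:

```python
import pandas as pd

# Five made-up respondents
toy = pd.DataFrame({
    'q2Age': ['18 - 24', '18 - 24', '25 - 34', '25 - 34', '25 - 34'],
    'q3Gender': ['Female', 'Male', 'Male', 'Male', 'Female'],
})

# One row per age group, one column per gender, cells hold respondent counts
cross = pd.crosstab(toy['q2Age'], toy['q3Gender'])
print(cross)
```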

Let's look at the most basic information about the respondents - their age by gender

In [None]:
# We need to shift NaN to 0, because data starts from value 1
numeric.loc[numeric['q1AgeBeginCoding'] == -1, 'q1AgeBeginCoding'] = 0
numeric.loc[numeric['q2Age'] == -1, 'q2Age'] = 0

# And to trim text so that it fits plots
numeric_mapping.loc['q2Age'] = [[i+1, j] for i, j in zip(range(9), ['Under 12', '12 - 18', '18 - 24', '25 - 34', '35 - 44', '45 - 54', '55 - 64', '65 - 74', '75+'])]
numeric_mapping.loc['q1AgeBeginCoding'] = numeric_mapping.loc['q1AgeBeginCoding'].applymap(lambda x: str(x).replace('years old', ''))

In [None]:
# I will be frequently using semicolons to suppress matplotlib output
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(20,15))

sns.set(font_scale=1)
count = sns.countplot(x='q3Gender', data=numeric, ax=ax[0][0])
count.set_xticklabels(np.append(['Not provided'], numeric_mapping.loc['q3Gender'].values[:, 1]));
count.set_xlabel('Gender')

bar = sns.barplot(x='q2Age', y='q1AgeBeginCoding', hue='q3Gender', data=numeric, ax=ax[0][1])
# One tick per label: code 0 ('Not provided') plus every code in the mapping
ax[0][1].yaxis.set_ticks(range(len(numeric_mapping.loc['q1AgeBeginCoding'].values[:, 1]) + 1))
bar.set_yticklabels(np.append(['Not provided'], numeric_mapping.loc['q1AgeBeginCoding'].values[:, 1]))
bar.set_xticklabels(np.append(['Not provided'], numeric_mapping.loc['q2Age'].values[:, 1]));
bar.set_xlabel(codebook.loc['q2Age']['question'])
bar.set_ylabel(codebook.loc['q1AgeBeginCoding']['question'])

bar.legend(loc=2)
for i, j in zip(bar.get_legend().texts, np.append(['Not provided'], numeric_mapping.loc['q3Gender'].values[:, 1])):
    i.set_text(j)
bar.get_legend().set_title('')

# fig.tight_layout()
sns.set()
draw_heatmap('q1AgeBeginCoding', 'q2Age', ax=ax[1][0], annot=False)
draw_heatmap('q2Age', 'q3Gender', ax=ax[1][1], annot=True, size=(5, 5))
fig.tight_layout()
plt.savefig('basic_info.jpg')

We are ready to make our first observations!   
The most obvious one is how many more men than women and non-binary respondents take part in HackerRank. Also, it appears a few jokers claim to have been coding for longer than they have been alive.   
The next ones are a little trickier: look at the heatmap. The only categories for which we have representative(-ish) data (and therefore the only ones we can use) are 12-44 year olds who learned to code between the ages of 11 and 30. With that in mind, let's take a closer look at the age barplot

In [None]:
# Keep only the 12-44 age groups (codes 2-5), the ones with enough respondents
trimmed_numeric = numeric.loc[(numeric['q2Age']<6) & (numeric['q2Age']>1)]

age_labels = numeric_mapping.loc['q2Age'].values[1:5, 1]
begin_labels = np.append(['Not provided'], numeric_mapping.loc['q1AgeBeginCoding'].values[:, 1])
gender_labels = np.append(['Not provided'], numeric_mapping.loc['q3Gender'].values[:, 1])

sns.set()
f, ax = plt.subplots(nrows=1, ncols=2, figsize=(16, 5));

# barplot and pointplot are axes-level functions, so they draw on the axes we pass in
# (the figure-level factorplot would also open an extra, empty figure)
sns.barplot(x='q2Age', y='q1AgeBeginCoding', hue='q3Gender', data=trimmed_numeric, ax=ax[0], alpha=1);
sns.pointplot(x='q2Age', y='q1AgeBeginCoding', data=trimmed_numeric, ax=ax[0], color='y')
ax[0].set_yticks(range(6))
ax[0].set_yticklabels(begin_labels[:6])
ax[0].set_ylabel(codebook.loc['q1AgeBeginCoding']['question'])
ax[0].set_xlabel(codebook.loc['q2Age']['question'])
ax[0].set_xticklabels(age_labels)
ax[0].set_title('Average age of starting to code by gender (bars) and overall (line)')

# Swap the numeric gender codes in the legend for readable labels
for i, j in zip(ax[0].get_legend().texts, gender_labels):
    i.set_text(j)
ax[0].get_legend().set_title('')

sns.countplot(x='q2Age', data=trimmed_numeric, ax=ax[1])
ax[1].set_title('Age distribution')
ax[1].set_xlabel(codebook.loc['q2Age']['question'])
ax[1].set_xticklabels(age_labels)
f.tight_layout();

f = plt.figure(figsize=(8,5));
factAx = plt.gca();
sns.violinplot(x='q2Age', y='q1AgeBeginCoding', hue='q3Gender', data=trimmed_numeric, ax=factAx);
factAx.set_yticks(range(len(begin_labels)))
factAx.set_yticklabels(begin_labels)
factAx.set_xlabel(codebook.loc['q2Age']['question'])
factAx.set_xticklabels(age_labels)
factAx.set_ylabel(codebook.loc['q1AgeBeginCoding']['question'])

for i, j in zip(factAx.get_legend().texts, gender_labels):
    i.set_text(j)
factAx.get_legend().set_title('')

f.tight_layout();
# Save through the figure object so that the figure we just drew is the one written to disk
f.savefig('age.jpg')

OK, we are ready to say something more. Unfortunately, the age intervals overlap, so even though the tendencies look very clear, we are unable to say much. The vast majority of 18-24 year olds (and of all users) started coding between 16 and 20. What does this mean? That this group holds both people who started coding last week and those who have been doing it for 8 years. The same goes for the other groups, so we will have to discard those results.   
However, it is worth noticing that the women of HackerRank tend to start coding a little later than their male colleagues, although this difference is very small in our most popular (18-24) group. 

Let's move on to something more interesting - how users learned to code

In [None]:
# Answers about how users learned to code are stored in 5 columns (kind of like one hot encoding), so we need to reverse it
# We are going to follow the three-table pattern in order to be able to use seaborn (it needs numeric data)
columns = ['q6LearnCodeDontKnowHowToYet', 'q6LearnCodeOther',
           'q6LearnCodeAccelTrain', 'q6LearnCodeSelfTaught', 'q6LearnCodeUni']

res = np.where(values[columns[0]]!='Not provided', 0, -1)
res_val = np.where(values[columns[0]]!='Not provided', "Didn't", 'Not provided')
# Codes start at 1 so that 'Other' does not share code 0 with "Didn't";
# if a respondent ticked several options, the last column in the list wins
for i, j in enumerate(columns[1:], start=1):
    res[values[j]!='Not provided'] = i
    res_val[values[j]!='Not provided'] = j.split('LearnCode')[-1]
    
numeric['q6LearnCode'] = res
values['q6LearnCode'] = res_val

# Labels follow the column order above: 1=Other, 2=AccelTrain, 3=SelfTaught, 4=Uni
numeric_mapping = pd.concat([numeric_mapping, pd.DataFrame(
    {'Data Field': 'q6LearnCode',
     'Value': [i-1 for i in range(6)], 
     'Label': ['Not provided', "Didn't", 'Other way', 'Accel Train', 'Self taught', 'University']}
    ).set_index('Data Field')])

codebook.loc['q6LearnCode'] = 'How did you learn how to code?'
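
The idea of collapsing several "ticked / not ticked" columns into a single code column can be tried on a toy frame first. This is only a sketch: the answer texts are invented, and letting the last ticked column win is one possible way to handle respondents who ticked more than one option.

```python
import numpy as np
import pandas as pd

# Made-up answers; 'Not provided' means the option was not ticked
toy = pd.DataFrame({
    'q6LearnCodeSelfTaught': ['Self-taught', 'Not provided', 'Not provided', 'Self-taught'],
    'q6LearnCodeUni': ['Not provided', 'University', 'Not provided', 'University'],
})
cols = ['q6LearnCodeSelfTaught', 'q6LearnCodeUni']

# Start with -1 (nothing ticked), then give every column its own code from 1 up
codes = np.full(len(toy), -1)
for code, col in enumerate(cols, start=1):
    ticked = (toy[col] != 'Not provided').values
    codes[ticked] = code  # later columns overwrite earlier ones

print(codes.tolist())
```

The last row ticks both options and ends up with the code of the later column, which is worth keeping in mind when reading plots built from such a column.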

In [None]:
# Clearing fonts settings
sns.set()
plt.figure(figsize=(16,5))
ax = plt.subplot(121)
draw_heatmap('q2Age', 'q6LearnCode', title='Learning means by age', annot=False, ax=ax)
ax = plt.subplot(122)
draw_heatmap('q1AgeBeginCoding', 'q6LearnCode', title='Learning means by age of starting coding', annot=True, ax=ax)
plt.tight_layout()
plt.savefig('learning_means.jpg')

The results are not surprising at all: regardless of age, the most popular way of learning to code is university, followed by self-teaching. Let's move on to language popularity

In [None]:
# First we will have to do a little preprocessing of columns

columns = ['q25LangC', 'q25LangCPlusPlus', 'q25LangJava', 'q25LangPython',
         'q25LangRuby', 'q25LangJavascript', 'q25LangCSharp', 'q25LangGo', 'q25Scala',
         'q25LangPerl', 'q25LangSwift', 'q25LangPascal', 'q25LangClojure', 'q25LangPHP',
         'q25LangHaskell', 'q25LangLua', 'q25LangR', 'q25LangRust', 'q25LangTypescript',
         'q25LangKotlin', 'q25LangJulia', 'q25LangErlang', 'q25LangOcaml']

for i in columns:
    # Two 0/1 indicator columns per language: plans to learn it / already knows it
    numeric[i+'WillLearn'] = np.where(values[i]=='Will Learn', 1, 0)
    numeric[i+'Know'] = np.where(values[i]=='Know', 1, 0)
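
For a quick numeric check of what such indicator columns add up to, the same counts can be computed directly from the text answers. The sketch below uses two made-up columns; the answer strings `'Know'` and `'Will Learn'` are the ones used in the cell above.

```python
import pandas as pd

# Four made-up respondents and two language columns
toy = pd.DataFrame({
    'q25LangPython': ['Know', 'Will Learn', 'Know', 'Not provided'],
    'q25LangC': ['Know', 'Not provided', 'Will Learn', 'Will Learn'],
})

# Comparing the whole frame to a string gives booleans; summing counts the matches per column
summary = pd.DataFrame({
    'Know': (toy == 'Know').sum(),
    'WillLearn': (toy == 'Will Learn').sum(),
})
summary['Total'] = summary['Know'] + summary['WillLearn']
print(summary)
```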

In [None]:
plt.figure(figsize=(8, 5))
  
for i, j in enumerate(columns):
    plt.barh(i, np.sum(numeric[j+'Know']) + np.sum(numeric[j+'WillLearn']), color='orange')

for i, j in enumerate(columns):
    plt.barh(i, np.sum(numeric[j+'Know']), color='#005aff')

plt.gca().set_yticks(range(len(columns)));
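# Strip both prefixes so that 'q25Scala' is labelled like the 'q25Lang...' columns
plt.gca().set_yticklabels([j.replace('q25', '').replace('Lang', '') for j in columns]);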
plt.title('Languages popularity on HackerRank');

custom_lines = [matplotlib.patches.Patch(color='#005aff', lw=1),
                matplotlib.patches.Patch(color='orange', lw=1)]
    
plt.legend(custom_lines, ['Know language', 'Want to learn language']);
plt.gca().get_legend().set_title('Number of developers that')

plt.tight_layout()

plt.savefig('language_popularity.jpg')

Looks like Python is the language HackerRank developers are most interested in, closely followed by C. The most niche ones are Julia and Clojure

In [None]:
plt.figure(figsize=(16, 10))

know_cols = [j+'Know' for j in columns]
will_learn_cols = [j+'WillLearn' for j in columns]

# The mean of a 0/1 column within an age group is the share of that group;
# groupby sorts the age codes, so row 0 is 'Not provided' (code 0)
res_know = numeric.groupby('q2Age')[know_cols].mean().values
res_will_learn = numeric.groupby('q2Age')[will_learn_cols].mean().values
res_everything = res_know + res_will_learn

age_group_labels = np.append(['Not provided'], numeric_mapping.loc['q2Age'].values[:, 1])
lang_labels = [j.replace('q25', '').replace('Lang', '') for j in columns]

ax1 = plt.subplot(221)
sns.heatmap(res_will_learn, ax=ax1);
ax1.set_yticklabels(age_group_labels, rotation='horizontal')
ax1.set_xticklabels(lang_labels, rotation='vertical');
ax1.set_title('Share of developers in each age category that want to learn the language');

ax2 = plt.subplot(222)
sns.heatmap(res_know, ax=ax2);
ax2.set_yticklabels(age_group_labels, rotation='horizontal')
ax2.set_xticklabels(lang_labels, rotation='vertical');
ax2.set_title('Share of developers in each age category that know the language');

ax3 = plt.subplot(223)
sns.heatmap(res_everything, ax=ax3);
ax3.set_yticklabels(age_group_labels, rotation='horizontal')
ax3.set_xticklabels(lang_labels, rotation='vertical');
ax3.set_title('Share of developers that know or want to learn the language')
plt.tight_layout()
plt.savefig('languages_heatmap.jpg')
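
Each cell of these heatmaps is the share of one age group that gave a particular answer. On toy data that share can be sketched as a grouped mean of a 0/1 column (the numbers below are made up):

```python
import pandas as pd

# Five made-up respondents: age code and a 0/1 "knows Python" indicator
toy = pd.DataFrame({'q2Age': [3, 3, 4, 4, 4],
                    'q25LangPythonKnow': [1, 0, 1, 1, 0]})

# Mean of a 0/1 column = fraction of the group with a 1
share = toy.groupby('q2Age')['q25LangPythonKnow'].mean()
print(share)
```

Because every group is divided by its own size, small groups (such as the youngest and oldest categories) can show extreme shares that rest on only a handful of respondents.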

Looks like only developers aged 35+ know Perl and Pascal, and hardly anyone older than 12 can write in Typescript. However, Typescript is the language with the highest level of interest in the 75+ category. Beyond that the results are quite uniform, so I will leave it here.

In [None]:
columns = [i for i in values.columns.ravel() if 'q28' in i]
langs_known = [i for i in values.columns.ravel() if 'q25' in i]
columns = columns[:-1]
langs_known = langs_known[:-1]

plt.figure(figsize=(16,5))

plt.subplot(121)
love_height = []
hate_height = []

for i, j in enumerate(zip(columns, langs_known)):
    love = len(numeric[(numeric[j[1]]>=1) & (values[j[0]]=='Love')])/(len(numeric[numeric[j[1]]>=1]))
    plt.bar(i, love, color='#4c72b0')
    plt.text(i, love-0.05, '%i' % int(love*100), horizontalalignment='center', size=10, color='white')
    
    hate = len(numeric[(numeric[j[1]]>=1) & (values[j[0]]=='Hate')])/(len(numeric[numeric[j[1]]>=1]))
    plt.bar(i, -hate, color='#55a868')
    plt.text(i, -hate+0.01, '%i' % int(hate*100), horizontalalignment='center', size=10, color='white')
    
    love_height.append(love)
    hate_height.append(hate)
    
    
custom_lines = [matplotlib.patches.Patch(color='#4c72b0', lw=1),
                matplotlib.patches.Patch(color='#55a868', lw=1),
                matplotlib.lines.Line2D([0], [0], color='orange')]
    
plt.legend(custom_lines, ['Love', 'Hate', 'Overall reputation'])
plt.plot([(i-j)/2 for i, j in zip(love_height, hate_height)], color='orange')

plt.gca().set_xticks(range(len(columns)))
plt.gca().set_xticklabels([j.split('Love')[-1] for j in columns], rotation='vertical');
plt.gca().set_title('Reputation of languages that developers know or will learn');
plt.gca().set_yticklabels(['%i%%' % abs(i*100) for i in plt.yticks()[0]]);
plt.ylabel('Percentage of users');

plt.subplot(122)
love_height = []
hate_height = []

for i, j in enumerate(zip(columns, langs_known)):
    # Divide by the same group we count in: respondents who neither know nor will learn the language
    love = len(numeric[(numeric[j[1]]==0) & (values[j[0]]=='Love')])/(len(numeric[numeric[j[1]]==0]))
    plt.bar(i, love, color='#4c72b0')
    plt.text(i, love-0.02, '%i' % int(love*100), horizontalalignment='center', size=10, color='white')
    
    hate = len(numeric[(numeric[j[1]]==0) & (values[j[0]]=='Hate')])/(len(numeric[numeric[j[1]]==0]))
    plt.bar(i, -hate, color='#55a868')
    plt.text(i, -hate+0.01, '%i' % int(hate*100), horizontalalignment='center', size=10, color='white')
    
    love_height.append(love)
    hate_height.append(hate)
    
    
plt.plot([(i-j)/2 for i, j in zip(love_height, hate_height)], color='orange')

plt.gca().set_xticks(range(len(columns)))
plt.gca().set_xticklabels([j.split('Love')[-1] for j in columns], rotation='vertical');
plt.gca().set_title('Reputation of languages that developers neither know nor will learn');
plt.gca().set_yticklabels(['%i%%' % abs(i*100) for i in plt.yticks()[0]]);
plt.ylabel('Percentage of users');
plt.savefig('languages_reputation.jpg')
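
The love and hate bars are shares within a group, so the count in the numerator and the group size in the denominator should come from the same set of respondents. A small sketch with a hypothetical helper `opinion_share` and made-up answers:

```python
import pandas as pd

# Made-up data: whether the respondent knows the language and what they think of it
toy = pd.DataFrame({
    'knows': [True, True, True, True, False],
    'opinion': ['Love', 'Love', 'Hate', 'Not provided', 'Hate'],
})

def opinion_share(data, mask, opinion):
    """Share of the respondents selected by `mask` that gave `opinion`."""
    group = data[mask]
    if len(group) == 0:
        return 0.0  # avoid dividing by zero for empty groups
    return (group['opinion'] == opinion).sum() / len(group)

love = opinion_share(toy, toy['knows'], 'Love')
hate = opinion_share(toy, toy['knows'], 'Hate')
reputation = (love - hate) / 2  # the "overall reputation" line used in the plots
print(love, hate, reputation)
```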

Apparently quite a few developers regret their language choice, but Python users are definitely not among them: they both love their language the most and hate it the least, leaving second-placed C++ way behind. The most hated one is OCaml.  
As for the second plot, the bad reviews are quite understandable: if you are not willing to learn a language, there is a high chance you don't like it

In [None]:
plt.figure(figsize=(16,5))

plt.subplot(121)
love_height = []
hate_height = []

for i, j in enumerate(zip(columns, langs_known)):
    love = len(numeric[(values[j[1]]=='Know') & (values[j[0]]=='Love')])/(len(numeric[values[j[1]]=='Know']))
    plt.bar(i, love, color='#4c72b0')
    plt.text(i, love-0.05, '%i' % int(love*100), horizontalalignment='center', size=10, color='white')
    
    hate = len(numeric[(values[j[1]]=='Know') & (values[j[0]]=='Hate')])/(len(numeric[values[j[1]]=='Know']))
    plt.bar(i, -hate, color='#55a868')
    plt.text(i, -hate+0.01, '%i' % int(hate*100), horizontalalignment='center', size=10, color='white')
    
    love_height.append(love)
    hate_height.append(hate)
    
    
custom_lines = [matplotlib.patches.Patch(color='#4c72b0', lw=1),
                matplotlib.patches.Patch(color='#55a868', lw=1),
                matplotlib.lines.Line2D([0], [0], color='orange')]
    
plt.legend(custom_lines, ['Love', 'Hate', 'Overall reputation'])
plt.plot([(i-j)/2 for i, j in zip(love_height, hate_height)], color='orange')

plt.gca().set_xticks(range(len(columns)))
plt.gca().set_xticklabels([j.split('Love')[-1] for j in columns], rotation='vertical');
plt.gca().set_title('Reputation of languages that developers already know');
plt.gca().set_yticklabels(['%i%%' % abs(i*100) for i in plt.yticks()[0]]);
plt.ylabel('Percentage of users');

plt.subplot(122)
love_height = []
hate_height = []

for i, j in enumerate(zip(columns, langs_known)):
    love = len(numeric[(values[j[1]]=='Will Learn') & (values[j[0]]=='Love')])/(len(numeric[values[j[1]]=='Will Learn']))
    plt.bar(i, love, color='#4c72b0')
    plt.text(i, love-0.05, '%i' % int(love*100), horizontalalignment='center', size=10, color='white')
    
    hate = len(numeric[(values[j[1]]=='Will Learn') & (values[j[0]]=='Hate')])/(len(numeric[values[j[1]]=='Will Learn']))
    plt.bar(i, -hate, color='#55a868')
    plt.text(i, -hate+0.01, '%i' % int(hate*100), horizontalalignment='center', size=10, color='white')
    
    love_height.append(love)
    hate_height.append(hate)
    
plt.plot([(i-j)/2 for i, j in zip(love_height, hate_height)], color='orange')

plt.gca().set_xticks(range(len(columns)))
plt.gca().set_xticklabels([j.split('Love')[-1] for j in columns], rotation='vertical');
plt.gca().set_title('Reputation of languages that developers are going to learn');
plt.gca().set_yticklabels(['%i%%' % abs(i*100) for i in plt.yticks()[0]]);
plt.ylabel('Percentage of users');
plt.savefig('languages_opinion.jpg')

What I understand from this plot is that if someone learned a language, they also learned to like it. I have a question for the second group though: if you hate the language, why do you even want to learn it?

In [None]:
plt.figure(figsize=(20, 10))
columns = [i for i in values.columns.ravel() if 'q28' in i]
langs_prof = [i for i in values.columns.ravel() if 'q22' in i]
langs_prof = langs_prof[1:-1]
columns = columns[:len(langs_prof)]

langs_sum = len(numeric[values['q16HiringManager']=='Yes'])
langs_name = ['%s: %.1f%%' % (i.split('Prof')[-1], (len(numeric[numeric[i]==1])/langs_sum)*100) for i in langs_prof]
langs_count = [len(numeric[numeric[i]==1]) for i in langs_prof]

# One random colour per language instead of a hard-coded count
squarify.plot(sizes=langs_count, label=langs_name, alpha=0.7, color=list(np.random.rand(len(langs_prof), 3)))
plt.axis('off')
plt.title(codebook.loc[langs_prof[0]]['question']);
plt.savefig('languages_desired.jpg')

So, if you want a good chance of finding a job, you should definitely learn Javascript and Java.

That concludes my kernel. I hope you found it at least a bit interesting or informative.