In [None]:
import math
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
pd.set_option('display.max_columns', None) 

import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline

import os

import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

In [None]:
survey_df = pd.read_csv('../input/survey_results_public.csv')
survey_schema = pd.read_csv('../input/survey_results_schema.csv', index_col='Column')

In [None]:
survey_df_india = survey_df.loc[survey_df['Country']=='India', :].copy(deep=True)
survey_df_germany = survey_df.loc[survey_df['Country']=='Germany', :].copy(deep=True)
survey_df_uk = survey_df.loc[survey_df['Country']=='United Kingdom', :].copy(deep=True)
survey_df_us = survey_df.loc[survey_df['Country']=='United States', :].copy(deep=True)

# How a developer from India differs from an average developer:

* [**What's the data?**](#data)
* [**My Motivation**](#motivation)
* [**Key Findings from my analysis**](#findings)
* [**Helper functions**](#helper)
* **Analysis**:
    * [**Salary**](#general)
        * [First Glance](#firstGlance)
        * [Unemployment rate](#unemployment)
        * [Students](#student)
        * [Summary of responders with non-zero salaries](#nonZero)
    * [**Education**](#education)
        * [Highest formal Education](#formal)
        * [Undergrad major](#major)
        * [Self-education](#selfedu)
        * [Other insights relating to education](#eduOthers)
    * [**Hobby and Open-source contribution**](#hobbyAndOSS)
        * [Hobby](#hobby)
        * [Open-source](#oss)
    * [**Age and experience**](#youth)
        * [Age](#age)
        * [Years coding (including studying)](#yearsCoding)
        * [Years coding professionally](#yearsCodingProf)
        * [Competitive nature](#competitive)
        * [Ambitions for career growth](#ambitious)
    * [**Development life**](#devlife)
        * [Developer Types](#devtypes)
        * [IDE](#ide)
        * [Platforms](#platforms), [Languages](#languages) and [Frameworks](#frameworks)
        * [Communication Tools](#commTools)
        * [Programming Methodology](#methodology)
        * [Ethics in coding](#ethics)
    * **Miscellaneous**
        * [**AI opinions**](#ai)
        * [**New hypothetical tools**](#hypoTools)
        * [**Use Ad Blockers?**](#adblocker)
        * [**Operating System**](#os)
        * [**Stack Overflow usage**](#so)
        * [**Exercise**](#exercise)
        * [**Sexual Orientation**](#sexOrient)
* [**_Conclusion and Final thoughts_**](#theEnd)
* [__Other areas that I want to explore (*and that you can try too!*)__](#curiousity)

# What's the data?<a class="anchor" id="data"></a>

Each year, Stack Overflow conducts a public survey where they ask the developer community about everything from their favorite technologies to their job preferences. This year marks the eighth year that they have published the Annual Developer Survey results.

This time they covered a few new topics ranging from artificial intelligence to ethics in coding. There were a total of 129 questions on the survey, and on average it would take a person 30 minutes to respond to the entire survey.

*Despite this length, __over 100,000 developers__ took the survey in January 2018! That's not all: out of those responders, __a whopping 67,441 developers__ completed the entire survey!*

Hence, this data presents a unique opportunity to learn about the trends in the opinions and the daily lifestyle of developers in 2018.  
This is the data that I have used for my analysis.

# My motivation:<a class="anchor" id="motivation"></a>

I knew very little about data visualisation (*couldn't even produce simple bar graphs if I wanted to*) and wished to learn it. And, I had recently come to the conclusion that [Kaggle is an amazing place to learn ML and Data Science concepts](https://towardsdatascience.com/use-kaggle-to-start-and-guide-your-ml-data-science-journey-f09154baba35). So, I decided to pick some interesting dataset and just dive into exploring it. 

I chose the [Stack Overflow Developer Survey 2018](https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey) for the reasons mentioned above. There are some excellent public analyses present for this data already. These are my favourites:
* [Stack Overflow 2018 survey: age, gender, sexuality](https://www.kaggle.com/heesoo37/stack-overflow-2018-survey-age-gender-sexuality)
* [Stack Overflow 2018 survey report](https://www.kaggle.com/pavanraj159/stack-overflow-2018-survey-report)

So, as I was exploring these kernels and trying to build my own analysis, I had this idea of introducing a new dimension to the analysis. I wanted to compare the various aspects of a developer's life based on the countries. Particularly, I wondered how the developers from my country, India, differed from the ones belonging to other countries and from the world average.

Given that India is a developing country with the world's second-largest population, and that it counts Bengaluru (the software outsourcing capital of the world) as one of its major cities, I expected to find some interesting differences in my analysis. It was also encouraging to see that India ranks 2nd by the number of respondents to this survey:

In [None]:
count = pd.DataFrame(survey_df['Country'].value_counts()[:10].copy(deep=True))
percentage = pd.DataFrame(survey_df['Country'].value_counts(normalize=True)[:10].copy(deep=True))

count.columns = ['Count']
percentage.columns = ['Percentage']
percentage['Percentage'] *= 100

top_responders_countries = pd.concat([count, percentage], axis=1)

top_responders_countries.columns.name = '#Responders'
top_responders_countries

In [None]:
from wordcloud import WordCloud

country = survey_df["Country"].value_counts()[:100].reset_index()
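# Select the country names by position: the name of this column after
# reset_index() differs across pandas versions ("index" vs "Country").
wrds = country.iloc[:, 0].str.replace(" ", "")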
wc = WordCloud(background_color='white', colormap=cm.viridis, scale=5).generate(" ".join(wrds))
plt.figure(figsize=(16,8))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.title("Word Cloud of countries based on the number of responders:", fontdict={'size':22, 'weight': 'bold'});

I am new to Data Science. As I said, my aim by doing this analysis was to learn some data visualisation techniques. So, I will really appreciate it if you could give me feedback on this report and tell me what I did right and what I did wrong.

# Key-findings from my analysis:<a class="anchor" id="findings"></a>

In this analysis, I tried to present a comparison of Indian developers with the entire world on average. I also tried to present the results of the other top 3 responding countries (viz. United States, United Kingdom and Germany) to make the comparison.

Here are the important areas where Indian developers differ from the developers in the other groups:

* **Low average salaries** [(link)](#general):
    * The average salary of Indian developers is much lower than the world average.
    * This is despite the fact that there is a larger proportion of responders with an undergrad major in a CS-related field in India.
    * One reason behind the lower salary might be that there is a smaller percentage of experienced and older developers in India.
    * Rate of unemployment is much higher in India.
    * 1 out of 3 responders from India is a full-time student.

* **Youth** [(link)](#youth):
    * India has a __much__ younger developer population as compared to the other groups.
    * An overwhelmingly large majority of developers have been coding (including any education) for 0-8 years
    * A large majority of developers in India have been coding professionally for 0-5 years

* **Competitive and ambitious** [(link)](#competitive) and [(link)](#ambitious):
    * A majority of the Indian responders feel like they are competing with their peers
    * Indian developers are very ambitious about their careers 
        * Only very few of them want to continue doing the same work
        * A lot of them want to work in a more specialised role or as product managers
        
* **Appeal of Mobile development** [(link)](#mobDev):
    * Mobile development has a unique appeal in India
        * India has the largest number of mobile developers in the world
        * Android and Firebase are among the top 3 most popular platforms in India, ranking even above Windows desktop or server
    

* **Opinions of Indian developers about AI differ a lot from the rest of the world** [(link)](#ai)
* **Indians are _extremely_ interested in all new hypothetical tools** [(link)](#hypoTools)
* **Not a lot of Indians consider ethics when writing code** [(link)](#ethics)
* **Indians contribute to open-source but don't think they are learning from doing so** [(link)](#oss)
* **Indian developers also differ from the other groups in their choice of communication tools, IDE and programming methodology** [(link)](#commTools), [(link)](#ide) and [(link)](#methodology)


# Helper functions:<a class="anchor" id="helper"></a>

In [None]:
country_df_dict = {'World': survey_df, 'India': survey_df_india, 'US': survey_df_us,
                   'UK': survey_df_uk, 'Germany': survey_df_germany}

In [None]:
def what_is(name_list):
    '''
    Gives a description of each item present in `name_list` based on
    `survey_schema`
    :param name_list: A list of the feature names whose description is required
    
    Returns: A list containing one description string per item in `name_list`
    '''
    what_is_list = [name+': '+str(survey_schema.loc[name, 'QuestionText']) for name in name_list]
    return what_is_list

In [None]:
def response_overall(feature, normalize=True):
    '''
    Gives the overall response stats for `feature` in different countries.
    :param feature: String storing the column name whose overall description is
        required
    
    Returns: A pandas.DataFrame object with unique feature values as the columns
        and countries as the index
    '''
    # Scale to percentages only when the counts are normalised;
    # with normalize=False the raw counts are returned unchanged.
    scale = 100 if normalize else 1
    df = pd.DataFrame(columns=survey_df[feature].value_counts().index)
    df.loc['World', :] = survey_df[feature].value_counts(normalize=normalize) * scale
    df.loc['India', :] = survey_df_india[feature].value_counts(normalize=normalize) * scale
    df.loc['US', :] = survey_df_us[feature].value_counts(normalize=normalize) * scale
    df.loc['UK', :] = survey_df_uk[feature].value_counts(normalize=normalize) * scale
    df.loc['Germany', :] = survey_df_germany[feature].value_counts(normalize=normalize) * scale
    return df
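
The same kind of table can also be built with a `groupby`. This is only a minimal sketch on made-up toy data (the `toy_df` frame and its values are my own illustration, not survey results), but it shows the idea behind `response_overall()`: one row per group, one column per answer, values in percent.

```python
import pandas as pd

# Toy data standing in for survey_df; the values are made up for illustration.
toy_df = pd.DataFrame({
    'Country': ['India', 'India', 'India', 'Germany', 'Germany'],
    'Student': ['No', 'Yes, full-time', 'No', 'No', 'Yes, part-time'],
})

# Percentage of each answer within each country (each row sums to 100).
toy_pct = (toy_df.groupby('Country')['Student']
                 .value_counts(normalize=True)
                 .unstack(fill_value=0) * 100)

# Add the overall distribution as a 'World' row.
toy_pct.loc['World', :] = toy_df['Student'].value_counts(normalize=True) * 100

print(toy_pct)
```

Answers that nobody in a country picked show up as 0 here because of `fill_value=0`, whereas the helper above leaves them as missing values.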

In [None]:
def get_trues(col):
    '''
    Helper function to store the frequency of True values for a 
    feature in its related col
    :param col: A Pandas DataFrame column generated upon calling
        `describe()` upon a boolean feature
    '''
    if col['top'] == False:
        col['top'] = True
        col['freq'] = col['count'] - col['freq']

In [None]:
def generate_expanded_features(feature, df):
    '''
    Helper function to generate a list of expanded feature names for 
    Multiple Options Correct type feature.
    
    :param feature: Parent feature name
    :param df: Pandas DataFrame object to which the parent feature belongs
    Returns: A list of generated feature names where each feature is of
        type -> parent+"_"+value
    '''
    values_set = set()
    # Read the options from the `df` that was passed in, as the docstring promises.
    values = [item for item in df[feature].unique() if isinstance(item, str)]
    for entry in values:
        for item in entry.split(';'):
            values_set.add(item)
    return [feature+'_'+value for value in values_set]

In [None]:
def response_overall_moc(feature, normalize=True):
    '''
    Gives the overall response stats for a Multiple Options Correct type
    feature for different countries.
    :param feature: String storing the column name whose overall description is
        required
    
    Returns: A pandas.DataFrame object with unique feature values as the columns
        and countries as the index
    '''
    features_expanded = generate_expanded_features(feature=feature, df=survey_df)

    spread_features_all(moc_parent=feature, printable=False)

    features_overall_df = pd.DataFrame(columns=features_expanded)

    for country, country_df in country_df_dict.items():
        features_df = country_df[features_expanded].describe(include='all')
        features_df.apply(get_trues, axis=0)
        features_df.loc['percentage', :] = (features_df.loc['freq', :] / features_df.loc['count', :]) * 100
        features_df.rename(index={'top': 'true_count'}, inplace=True)
        for tool in features_expanded:
            if normalize:
                features_overall_df.loc[country, tool] = features_df.loc['percentage', tool]
            else:
                features_overall_df.loc[country, tool] = features_df.loc['freq', tool]

    columns = list(features_overall_df.columns)
    # Split only on the first '_' so that option names containing '_' are kept whole.
    features_overall_df.columns = [cname.split('_', 1)[1] for cname in columns]
    
    return features_overall_df

In [None]:
def update_countries_dict():
    global country_df_dict
    country_df_dict = {'World': survey_df, 'India': survey_df_india, 'US': survey_df_us,
                       'UK': survey_df_uk, 'Germany': survey_df_germany}

In [None]:
def spread_features(df, moc_parent, printable=True):
    '''
    Handles the Multiple Options Correct type features by spreading out
    each possible entry into a different column.
    :param df: Pandas.DataFrame object upon which we need to perform the
        operation
    :param moc_parent: The Multiple Options Correct type feature
    :param printable: Boolean; True to print the running info, False otherwise
    
    Returns: List of newly generated column names
    '''
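    features_set = set()
    # Collect the options from `df` itself rather than the global `survey_df`.
    values = [item for item in df[moc_parent].unique() if isinstance(item, str)]
    for entry in values:
        for item in entry.split(';'):
            features_set.add(item)

    if printable:
        print(features_set)

    answered = ~df[moc_parent].isnull()
    for feature in features_set:
        # Split on ';' so that an option only matches itself and not a longer
        # option that happens to contain it (e.g. 'Java' inside 'JavaScript').
        df.loc[answered, moc_parent+'_'+feature] = \
            df.loc[answered, moc_parent] \
              .apply(lambda entry: feature in entry.split(';'))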

    if printable:
        for feature in features_set:
            print(df[moc_parent+'_'+feature].value_counts())
            
    return [moc_parent+'_'+feature for feature in features_set]
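
One detail worth checking in any "spread into columns" step is how an option is matched against the answer string. The sketch below uses a made-up toy Series (the values are my own illustration) to compare a plain substring test with an exact match after splitting on `;`. It also shows `Series.str.get_dummies`, which builds all indicator columns at once. I am only offering it as a possible alternative, and it behaves differently for missing answers: those rows become all zeros instead of staying empty.

```python
import pandas as pd

# Made-up multiple-choice answers, separated by ';' like in the survey.
toy = pd.Series(['Java;Python', 'JavaScript', None, 'Python'],
                name='LanguageWorkedWith')

# Substring test: 'Java' is wrongly found inside 'JavaScript'.
substring_hits = toy.dropna().apply(lambda entry: 'Java' in entry)

# Exact-option test: split on ';' first, then look for the option.
exact_hits = toy.dropna().apply(lambda entry: 'Java' in entry.split(';'))

# Build every indicator column in one call; missing answers become all zeros.
dummies = toy.str.get_dummies(sep=';').add_prefix('LanguageWorkedWith_')

print(substring_hits.sum(), exact_hits.sum())
print(dummies)
```

Because missing answers turn into zeros with `get_dummies`, any percentage computed from it would need the non-responders filtered out first to stay comparable with the helper above.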

In [None]:
def spread_features_all(moc_parent, printable=True):
    '''
    Spread the passed feature in all the countries' DataFrames.
    :param moc_parent: The Multiple Options Correct type feature
    :param printable: Boolean; True to print the running info, False otherwise
    '''
    spread_features(survey_df, moc_parent, printable)
    global survey_df_india, survey_df_us, survey_df_germany, survey_df_uk
    survey_df_india = survey_df.loc[survey_df['Country']=='India', :].copy(deep=True)
    survey_df_germany = survey_df.loc[survey_df['Country']=='Germany', :].copy(deep=True)
    survey_df_uk = survey_df.loc[survey_df['Country']=='United Kingdom', :].copy(deep=True)
    survey_df_us = survey_df.loc[survey_df['Country']=='United States', :].copy(deep=True)
    update_countries_dict()

In [None]:
def plot_sequential(df, feature, order=None, colormap=cm.viridis, horizontal=False):
    '''
    Function to plot feature with sequential feature values.
    :param df: Pandas.DataFrame object containing values to be plotted
        [likely one returned from response_overall()]
    :param feature: The feature name that is being plotted
    :param order: The order in which we want to plot the feature values
    :param colormap: matplotlib.cm object that provides the colormap for plotting
    :param horizontal: Boolean specifying the orientation of the barplot
    
    Returns the plotted axis
    '''
    if order is None:
        order = list(df.columns)
    country_order = ['World', 'India', 'US', 'UK', 'Germany']
    title = what_is([feature])[0]
    if horizontal:
        ax = df.loc[country_order[::-1], order] \
                .plot.barh(figsize=(16, 8), stacked=True, colormap=colormap)
        ax.set_xlabel("Percentage", fontdict={'size':16});
        ax.set_xlim(0, 100)
        ax.set_ylabel("Responses", fontdict={'size':16});
        ax.set_title(title, fontdict={'weight': 'bold'});
        sns.despine()
        return ax
    else:
        ax = df.loc[country_order, order] \
                .plot.bar(figsize=(16, 8), stacked=True, colormap=colormap)
        ax.set_ylabel("Percentage", fontdict={'size':16});
        ax.set_xlabel("Responses", fontdict={'size':16});
        ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
        ax.set_title(title, fontdict={'weight': 'bold'});
        sns.despine(bottom=True);
        return ax

In [None]:
def highlighter(row):
    '''
    Function to be passed to `DataFrame.style.apply()`. 
    Highlights the rows of the DataFrame: row corresponding to 'World' data with blue 
    and the one corresponding to 'India' data with orange
    
    :param row: A pandas Series representing the row
    
    Returns: A list storing the background colors for each cell in the row
    '''
    if row.name == 'India':
        return ['background: orange' for i in row]
    elif row.name == 'World':
        return ['background: lightblue' for i in row]
    else:
        return ['' for i in row]

# Salary:<a class="anchor" id="general"></a>
## First glance:<a class="anchor" id="firstGlance"></a>

In [None]:
salary_nan_df = survey_df.loc[survey_df['ConvertedSalary'].isnull(), :]
percentage = round((salary_nan_df.shape[0] / survey_df.shape[0]) * 100, 
                   2)
print(str(percentage)+"% ("+str(salary_nan_df.shape[0])+") of responders have not filled in their salary.")

Therefore, I tried to make some plausible imputations to the missing salaries based on the following assumptions:

* Responders who were full-time students at the time of filling in the survey have a salary of 0.
* People who selected "I've never had a job" were unemployed at the time of filling in the survey and therefore also have a salary of 0.

In [None]:
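# Build the mask once, before anything is filled in. If it were recomputed
# after 'ConvertedSalary' had been imputed, 'Salary' would never be updated.
missing_salary = survey_df['ConvertedSalary'].isnull()
assumed_no_income = missing_salary & \
                    ((survey_df['Student'] == 'Yes, full-time') | \
                     (survey_df['LastNewJob'] == "I've never had a job"))
survey_df.loc[assumed_no_income, ['ConvertedSalary', 'Salary']] = 0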
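
One thing to watch with this kind of imputation: if the "is the salary missing?" check is re-evaluated after one column has already been filled, the second column can be skipped silently. Here is a minimal sketch on made-up toy data (the `toy` frame and its values are my own illustration) that freezes the mask first and then fills both columns with it.

```python
import numpy as np
import pandas as pd

# Made-up responders; only the last one reported a salary.
toy = pd.DataFrame({
    'Student': ['Yes, full-time', 'No', 'No', 'Yes, full-time'],
    'LastNewJob': [np.nan, "I've never had a job", 'Less than a year ago', np.nan],
    'Salary': [np.nan, np.nan, np.nan, '1000'],
    'ConvertedSalary': [np.nan, np.nan, np.nan, 12000.0],
})

# Build the mask once, from the data as it was before any imputation.
missing = toy['ConvertedSalary'].isnull()
assumed_no_income = missing & (
    (toy['Student'] == 'Yes, full-time') |
    (toy['LastNewJob'] == "I've never had a job")
)

# Fill both salary columns using the same frozen mask.
toy.loc[assumed_no_income, ['ConvertedSalary', 'Salary']] = 0

print(toy[['Salary', 'ConvertedSalary']])
```

The third responder matches neither assumption and stays missing, and the student who did report a salary keeps it.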

In [None]:
salary_nan_df = survey_df.loc[survey_df['ConvertedSalary'].isnull(), :]
percentage = round((salary_nan_df.shape[0] / survey_df.shape[0]) * 100, 
                   2)
print(str(percentage)+"% ("+str(salary_nan_df.shape[0])+") of responders have not filled in their salary.")

Now, let's see a density estimate of the yearly salary of the various groups, i.e., the World and the top 4 responding countries (India, US, Germany and UK).

I am not going to take the top 5% of salaries into consideration for the figure below because the spread of salaries is too large in that range and it would distort the figure too much.
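
To see why a few very large values matter so much, here is a small sketch with made-up salaries (the numbers are purely illustrative, not survey data). It computes the 95th percentile with `Series.quantile` and keeps only the values below it, which is the same kind of cut-off I apply before plotting.

```python
import pandas as pd

# Made-up salaries: most are modest, one is extreme.
salaries = pd.Series([0, 5_000, 8_000, 12_000, 20_000, 35_000,
                      60_000, 90_000, 120_000, 2_000_000])

cutoff = salaries.quantile(0.95)       # value below which 95% of the data falls
trimmed = salaries[salaries < cutoff]  # drop the top 5% before plotting

print("cutoff:", cutoff)
print("mean with outlier:", salaries.mean())
print("mean after trimming:", trimmed.mean())
```

A single extreme value is enough to pull the mean (and the x-axis of a density plot) far away from where most responders sit.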

In [None]:
salary_percentile = pd.DataFrame(survey_df.loc[:, 'ConvertedSalary'].quantile(list(np.linspace(0.9, 0.99, 10))))
salary_percentile.index.name = "Percentiles"
salary_percentile.index = salary_percentile.index * 100
salary_percentile

In [None]:
MAX_LIM = survey_df.loc[:, 'ConvertedSalary'].quantile(0.95)

fig, axes = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(35, 20));

sns.kdeplot(survey_df_india.loc[survey_df_india['ConvertedSalary']<MAX_LIM, 'ConvertedSalary'], ax=axes[0], shade=True);
axes[0].set_title("India", fontdict={'weight': 'bold', 'size': 24});

sns.kdeplot(survey_df.loc[survey_df['ConvertedSalary']<MAX_LIM, 'ConvertedSalary'], ax=axes[1], shade=True);
axes[1].set_title("World", fontdict={'weight': 'bold', 'size': 24});

for ax in axes:
    ax.set_xlabel("Yearly Salary (converted to USD assuming 50 working weeks)", 
                  fontdict={'weight': 'bold', 'size': 24});
    ax.tick_params(axis='both', labelsize=20);
    ax.set_xlim(left=0, right=200000);

fig, axes = plt.subplots(nrows=1, ncols=3, sharey=True, figsize=(30, 10))

sns.kdeplot(survey_df_us.loc[survey_df_us['ConvertedSalary']<MAX_LIM, 'ConvertedSalary'], ax=axes[0], shade=True);
axes[0].set_title("United States", fontdict={'weight': 'bold', 'size': 16});

sns.kdeplot(survey_df_germany.loc[survey_df_germany['ConvertedSalary']<MAX_LIM, 'ConvertedSalary'], ax=axes[1], shade=True);
axes[1].set_title("Germany", fontdict={'weight': 'bold', 'size': 16});

sns.kdeplot(survey_df_uk.loc[survey_df_uk['ConvertedSalary']<MAX_LIM, 'ConvertedSalary'], ax=axes[2], shade=True);
axes[2].set_title("United Kingdom", fontdict={'weight': 'bold', 'size': 16});
for ax in axes:
    ax.set_xlabel("Yearly Salary (converted to USD assuming 50 working weeks)", 
                  fontdict={'weight': 'bold', 'size': 16});
    ax.tick_params(axis='both', labelsize=16);
    ax.set_ylim(0, 0.00005);
    ax.set_xlim(left=0, right=200000);

- **Salaries of Indian developers are much lower than those of the world taken as a whole.**
- **Salaries of developers in the other 3 top responding countries (US, UK and Germany) are much higher than the world average.**

*I want to understand why the salaries of Indian developers are so low as compared to the rest of the world. I know Indian developers have historically been working at lower salaries as compared to their US counterparts but is it just that? Or is there some other reason behind this too?*

Maybe a little more description of the salaries might help..

In [None]:
salary_describe = (survey_df.loc[survey_df['Country'].isin(['India', 'United States', 'Germany', 'United Kingdom'])]
                   .groupby('Country')['ConvertedSalary'].describe())
salary_describe.loc['World', :] = survey_df['ConvertedSalary'].describe()
salary_describe.columns.name = 'Salary description'
salary_describe.loc[['World', 'India', 'United States', 'United Kingdom', 'Germany'], 
                    ['count', 'mean', '25%', '50%', '75%']].style.apply(highlighter, axis=1)

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(16, 8));
salary_describe['mean'].plot.bar(ax=ax);
ax.set_ylabel("Mean salary (converted to USD)", fontdict={'size':18});
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
ax.set_xlabel("Country", fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=18);

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(20, 5), sharey=True)
for ax_id, percentile in zip([0, 1, 2], ['25%', '50%', '75%']):
    salary_describe[percentile].plot.bar(ax=axes[ax_id])
    axes[ax_id].set_title(percentile)
axes[0].set_ylabel("Salary (converted to USD)", fontdict={'size': 16});
for ax in axes:
    ax.tick_params(axis='both', labelsize=18);

Here's an interesting observation: **the 25th percentile of yearly salaries of Indian developers is actually zero**. Surely no developer is working for free, right?

*So, is it because of the feared high unemployment rate in India? Or does India have a higher percentage of full-time student responders, who as a result have a salary of 0?*

Let's see..

## Unemployment rate:<a class="anchor" id="unemployment"></a>

In [None]:
unemployment_dict = {}

unemployment_df = survey_df.loc[(survey_df['JobSearchStatus'] == 'I am actively looking for a job'), :]
unemployment_dict['World'] = (unemployment_df.shape[0] / survey_df.shape[0]) * 100
unemployment_df_india = survey_df_india.loc[(survey_df_india['JobSearchStatus'] == 'I am actively looking for a job'), :]
unemployment_dict['India'] = (unemployment_df_india.shape[0] / survey_df_india.shape[0]) * 100
unemployment_df_us = survey_df_us.loc[(survey_df_us['JobSearchStatus'] == 'I am actively looking for a job'), :]
unemployment_dict['US'] = (unemployment_df_us.shape[0] / survey_df_us.shape[0]) * 100
unemployment_df_uk = survey_df_uk.loc[(survey_df_uk['JobSearchStatus'] == 'I am actively looking for a job'), :]
unemployment_dict['UK'] = (unemployment_df_uk.shape[0] / survey_df_uk.shape[0]) * 100
unemployment_df_germany = survey_df_germany.loc[(survey_df_germany['JobSearchStatus'] == 'I am actively looking for a job'), :]
unemployment_dict['Germany'] = (unemployment_df_germany.shape[0] / survey_df_germany.shape[0]) * 100

In [None]:
fig = pd.Series(unemployment_dict).plot.bar(figsize=(16, 8));
fig.set_ylabel("Percentage", fontdict={'size':16});
fig.set_title("Actively looking for a job", fontdict={'size':20});
fig.set_xticklabels(fig.get_xticklabels(), rotation=0);
fig.tick_params(axis='both', labelsize=18);

In [None]:
pd.Series(unemployment_dict)

> **India has a considerably higher percentage of developers who are actively looking for a job (my proxy for unemployment here) as compared to the other countries.**
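
A note on how such a percentage can be read: the share depends on what goes into the denominator. The sketch below uses made-up toy data (the `toy` frame is my own illustration) to compute the share of job seekers among all responders and among only those who answered the question. It is a hedged illustration of the calculation, not a re-run of the survey numbers.

```python
import numpy as np
import pandas as pd

# Made-up responders; NaN means the question was skipped.
toy = pd.DataFrame({
    'Country': ['India', 'India', 'India', 'India', 'Germany', 'Germany'],
    'JobSearchStatus': ['I am actively looking for a job', np.nan,
                        'I am not interested in new job opportunities',
                        'I am actively looking for a job',
                        'I am not interested in new job opportunities', np.nan],
})

looking = toy['JobSearchStatus'] == 'I am actively looking for a job'

# Share among all responders (skipped answers count in the denominator).
share_of_all = looking.groupby(toy['Country']).mean() * 100

# Share among only those who answered the question.
answered = toy['JobSearchStatus'].notnull()
share_of_answered = (looking[answered]
                     .groupby(toy.loc[answered, 'Country'])
                     .mean() * 100)

print(share_of_all)
print(share_of_answered)
```

The mean of a boolean Series is the fraction of `True` values, so no explicit counting is needed. If the proportion of skipped answers differs between countries, the two versions can rank them differently.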

## Students:<a class="anchor" id="student"></a>

In [None]:
student_df = response_overall('Student')
student_df.columns.name = 'Student?'
student_df.index.name = 'Country'

In [None]:
plot_sequential(df=student_df, feature='Student', order=['No', 'Yes, part-time', 'Yes, full-time']);

In [None]:
student_df.style.apply(highlighter, axis=1)

> **India also has a much higher percentage of student developers. Almost 1 in every 3 responders from India is a full-time student. Compare this with the entire world where only 1 in every 5 responders is a full-time student.**

**_Both of the above factors - higher unemployment rate and larger proportion of full-time students - must contribute to India having 0 as the 25th percentile salary._**
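
The mechanics behind this are simple: once clearly more than a quarter of the responses are 0, the 25th percentile itself is 0. A tiny sketch with made-up numbers (not survey data) shows the effect, and how the picture changes when only non-zero salaries are kept.

```python
import pandas as pd

# 10 made-up responders: 4 have a salary of 0 (students or job seekers).
toy_salaries = pd.Series([0, 0, 0, 0, 6_000, 8_000, 10_000,
                          15_000, 20_000, 40_000])

share_zero = (toy_salaries == 0).mean()  # fraction of zero salaries
q25_all = toy_salaries.quantile(0.25)    # pulled down to 0 by the zeros
q25_paid = toy_salaries[toy_salaries > 0].quantile(0.25)

print(share_zero, q25_all, q25_paid)
```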

## Summary of responders with non-zero salaries:<a class="anchor" id="nonZero"></a>

It will be insightful to see a summary of salaries of those responders who actually have a job.

In [None]:
top_countries = ['India', 'United States', 'Germany', 'United Kingdom']

# Filter on non-zero salaries first. Each comparison needs its own step or
# parentheses, because `&` binds more tightly than `>`.
paid_df = survey_df.loc[survey_df['ConvertedSalary'] > 0]
paid_top_df = paid_df.loc[paid_df['Country'].isin(top_countries)]

salary_describe = paid_top_df.groupby('Country')['ConvertedSalary'].describe()
# Restrict the 'World' row to non-zero salaries too, so that all rows are comparable.
salary_describe.loc['World', :] = paid_df['ConvertedSalary'].describe()
salary_describe.columns.name = 'Salary description'
salary_describe.loc[:, '90%'] = paid_top_df.groupby('Country')['ConvertedSalary'].quantile(0.9)
salary_describe.loc['World', '90%'] = paid_df['ConvertedSalary'].quantile(0.9)
salary_describe.loc[['World', 'India', 'United States', 'United Kingdom', 'Germany'], 
                    ['count', 'mean', '25%', '50%', '75%', '90%']].style.apply(highlighter, axis=1)
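
A small reminder about combining conditions in pandas: in Python, `&` binds more tightly than comparison operators such as `>`, so each comparison in a boolean mask needs its own parentheses. The sketch below uses a made-up toy frame (my own illustration) to show the intended form.

```python
import pandas as pd

# Made-up responders from three countries.
toy = pd.DataFrame({
    'Country': ['India', 'India', 'Germany', 'France'],
    'ConvertedSalary': [0.0, 9_000.0, 50_000.0, 40_000.0],
})

in_top = toy['Country'].isin(['India', 'Germany'])

# Without parentheses, `in_top & toy['ConvertedSalary'] > 0` is parsed as
# `(in_top & toy['ConvertedSalary']) > 0`, which is not the intended filter.
paid_in_top = in_top & (toy['ConvertedSalary'] > 0)

paid = toy.loc[paid_in_top]
print(paid)
```

Only responders who are both in the selected countries and have a salary above 0 remain.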

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(16, 8));
salary_describe.loc[['World', 'India', 'United States', 'United Kingdom', 'Germany'], 'mean'].plot.bar(ax=ax);
ax.set_ylabel("Salary (converted to USD)", fontdict={'size':18});
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
ax.set_xlabel("Country", fontdict={'size': 18});
ax.set_title("Mean", fontdict={'size': 20, 'weight': 'bold'});

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(20, 5), sharey=True)
for ax_id, percentile in zip([0, 1, 2], ['25%', '50%', '75%']):
    salary_describe.loc[['World', 'India', 'United States', 'United Kingdom', 'Germany'], percentile].plot.bar(ax=axes[ax_id])
    axes[ax_id].set_title(percentile, fontdict={'size': 18, 'weight': 'bold'});
    axes[ax_id].tick_params(axis='both', labelsize=18);
    axes[ax_id].set_xlabel("Country", fontdict={'size': 18})
axes[0].set_ylabel("Salary (converted to USD)", fontdict={'size': 16});

In [None]:
MAX_LIM = survey_df.loc[:, 'ConvertedSalary'].quantile(0.95)

fig, axes = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(35, 20));

sns.kdeplot(survey_df_india.loc[(survey_df_india['ConvertedSalary']<MAX_LIM) & \
                                (survey_df_india['ConvertedSalary']>0), 'ConvertedSalary'], 
            ax=axes[0], shade=True);
axes[0].set_title("India", fontdict={'weight': 'bold', 'size': 26});

sns.kdeplot(survey_df.loc[(survey_df['ConvertedSalary']<MAX_LIM) & \
                          (survey_df['ConvertedSalary']>0), 'ConvertedSalary'], 
            ax=axes[1], shade=True);
axes[1].set_title("World", fontdict={'weight': 'bold', 'size': 26});

for ax in axes:
    ax.set_xlabel("Yearly Salary (converted to USD assuming 50 working weeks)", 
                  fontdict={'weight': 'bold', 'size': 24});
    ax.tick_params(axis='both', labelsize=20);
    ax.set_xlim(left=0, right=200000);

fig, axes = plt.subplots(nrows=1, ncols=3, sharey=True, figsize=(30, 10))

sns.kdeplot(survey_df_us.loc[(survey_df_us['ConvertedSalary']<MAX_LIM) & \
                             (survey_df_us['ConvertedSalary']>0), 'ConvertedSalary'], 
            ax=axes[0], shade=True);
axes[0].set_title("United States", fontdict={'weight': 'bold', 'size': 18});

sns.kdeplot(survey_df_germany.loc[(survey_df_germany['ConvertedSalary']<MAX_LIM) & \
                                  (survey_df_germany['ConvertedSalary']>0), 'ConvertedSalary'], 
            ax=axes[1], shade=True);
axes[1].set_title("Germany", fontdict={'weight': 'bold', 'size': 18});

sns.kdeplot(survey_df_uk.loc[(survey_df_uk['ConvertedSalary']<MAX_LIM) & \
                             (survey_df_uk['ConvertedSalary']>0), 'ConvertedSalary'], 
            ax=axes[2], shade=True);
axes[2].set_title("United Kingdom", fontdict={'weight': 'bold', 'size': 18});

for ax in axes:
    ax.set_xlabel("Yearly Salary (converted to USD assuming 50 working weeks)", fontdict={'size': 16, 'weight': 'bold'});
    ax.tick_params(axis='both', labelsize=16);
    ax.set_ylim(0, 0.00006);
    ax.set_xlim(left=0, right=200000);

> - **The average annual salary of developers in India is almost 2.5 times lower than the average salary of developers worldwide, almost 5 times lower than the average in the US, almost 4 times lower than in the UK and almost 3 times lower than in Germany.**
> - **More than 50% of the Indian developers are working for a yearly salary of less than \$10,000.**
> - **The median annual salary for the entire developer population in the world is ~3.5 times the median in India. The median in the US is ~10 times, and in both the UK and Germany ~6 times, the Indian median.**
> - **More than 90% of the Indian developers have an annual salary of less than \$50,000.**

Clearly, there is a large gap between the salary of an Indian developer and that of an average developer of the world. This gap is even larger when compared to the other top 3 responding countries. Indian developers are working for significantly lower salaries than the other groups.

So, what is the reason behind this huge difference? Two possible reasons come to mind:
    
1. __Indian developers are less qualified__: 
    * One way to measure the qualification is by college degree (*I agree that it isn't a good way but a lot of the companies do use college education as a yardstick to measure a candidate's qualification*).
    * Another way is to use a candidate's interest in programming as a proxy for his/her qualification (*Again, I know that it might not be the best possible way to measure it*). There were questions in the survey asking the responder whether he/she codes as a hobby and his/her participation in open-source culture. These might be able to indicate a candidate's interest in programming.
    
1. __Indian developers are working in more junior positions__:

    Junior or entry-level positions are probably paid less than the more experienced roles at any company.
    
    We have a few questions on the survey that can be used to investigate this - age of responder, years of experience with coding and years of experience with coding professionally.
    
*So, now I am going to use this survey's data to find out how true the above 2 statements are..*

# Education:<a class="anchor" id="education"></a>
**_Now, let's discuss the education of these developers.._**

## Formal Education:<a class="anchor" id="formal"></a>

In [None]:
formal_education_df = response_overall('FormalEducation')
formal_education_df.columns.name = 'Highest level of formal education'
formal_education_df.index.name = 'Country'

In [None]:
formal_education_df.sort_values(by='India', axis=1, inplace=True)

In [None]:
ax = formal_education_df.T.plot.barh(figsize=(12, 20))
ax.set_title(what_is(['FormalEducation'])[0].split(':')[1], fontdict={'size': 18, 'weight': 'bold'});
ax.set_xlim(0, 100);
ax.set_xlabel("Percentage", fontdict={'size': 20});
ax.set_ylabel("Highest level of formal education", fontdict={'size': 20});
ax.tick_params(axis='both', labelsize=18);
# Configure the legend once: a later bare ax.legend(prop=...) call would reset the location
ax.legend(loc='upper right', prop={'size': 16});

In [None]:
formal_education_df.loc[:, list(reversed(formal_education_df.columns))].style.apply(highlighter, axis=1)

> - **The proportion of responders with a `Bachelor's degree` is much larger in India than in the other 3 countries and the world.**
> - **The proportion of responders with a `Master's degree` is almost the same for all groups except Germany, where it is considerably larger.**
> - **The proportion of Indian responders with an `Other doctoral degree (Ph.D, Ed.D, etc)` is very low compared to the other groups (almost 13 times lower than the world average, and lower still when compared to the other top responding countries).**

## Undergrad Major:<a class="anchor" id="major"></a>

In [None]:
undergrad_major_df = response_overall('UndergradMajor')
undergrad_major_df.columns.name = 'Undergrad major in college'
undergrad_major_df.index.name = 'Country'
undergrad_major_df.sort_values(by='India', axis=1, inplace=True)

In [None]:
ax = undergrad_major_df.T.plot.barh(figsize=(12, 20))
ax.set_xlim(0, 100);
ax.set_ylabel("Undergrad Majors", fontdict={'size': 20});
ax.set_xlabel("Percentage", fontdict={'size': 20});
ax.set_title("Undergrad Majors in college", fontdict={'size': 18, 'weight': 'bold'});
ax.tick_params(axis='both', labelsize=16);
# Configure the legend once: a later bare ax.legend(prop=...) call would reset the location
ax.legend(loc='upper left', prop={'size': 16});

In [None]:
undergrad_major_df.loc[:, list(reversed(undergrad_major_df.columns))].style.apply(highlighter, axis=1)

> - **A huge majority of the responders in India (~85%) belong to a CS-related major, i.e. `Computer science, computer engineering, or software engineering`, `Information systems, information technology, or system administration` or `Web development or web design`.**
> - **The same figure is ~75% for the world as a whole, ~66% for the US, ~65% for the UK and ~73% for Germany.**
> - **Another 12% of Indian responders belong to some other engineering discipline.**
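
A combined "CS-related" figure like the one above is just the sum of the three relevant columns of the percentage table. Here is a minimal sketch on a toy table; the major names come from the survey, but the numbers and the `Everything else` column are made up for illustration.

```python
import pandas as pd

cs_related = ['Computer science, computer engineering, or software engineering',
              'Information systems, information technology, or system administration',
              'Web development or web design']

# Toy percentages per group (rows) and major (columns); values are NOT the survey's.
toy_major_df = pd.DataFrame(
    [[50.0, 20.0, 10.0, 20.0],
     [40.0, 20.0, 5.0, 35.0]],
    index=['India', 'World'],
    columns=cs_related + ['Everything else'])

# Combined share of CS-related majors for each group
cs_share = toy_major_df[cs_related].sum(axis=1)
print(cs_share)
```

Adding the columns is valid here because each responder picks exactly one major, so the categories don't overlap.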

## Self-education:<a class="anchor" id="selfedu"></a>

In [None]:
what_is(['EducationTypes'])

In [None]:
education_types_df = response_overall_moc('EducationTypes')
education_types_df.columns.name = 'Non-degree education types'
education_types_df.index.name = 'Country'
education_types_df.sort_values(by='India', axis=1, inplace=True)

In [None]:
ax = education_types_df.T.plot.barh(figsize=(12, 20));
ax.set_title("Non-degree education", fontdict={'weight': 'bold'});
ax.set_xlim(0, 100);
ax.set_xlabel('Percentage', fontdict={'size': 18});
ax.set_ylabel('Non-degree education types', fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=18);
ax.legend(prop={'size': 16});

In [None]:
education_types_df.loc[:, list(reversed(education_types_df.columns))].style.apply(highlighter, axis=1)

> - **MOOCs are almost as popular in India as they are in the other groups.**
> - **Online coding competitions hosted on websites like Codechef, Hackerrank and TopCoder are particularly popular in India.**
> - **Part-time in-person courses are also more popular in India than in the world as a whole or the other 3 top responding countries.**
> - **Teaching oneself a new language, framework or tool without taking a formal course is much less popular in India than in the other groups.**

- *The proportion of responders with a traditional CS-related undergrad degree is much larger in India than in the other countries considered and in the world as a whole. So, in terms of a traditional college degree, a higher proportion of Indian developers are qualified for developer jobs than in the world as a whole and the other 3 top responding countries. This shouldn't be a reason for the low pay of Indian developers.*
- *India does have a much lower proportion of developers with a doctoral degree (like a PhD). So, jobs that require a PhD aren't occupied by many Indians. Such jobs probably pay more, but they are also probably not too common.*
- *In terms of enhancing one's knowledge by taking non-degree courses and MOOCs, Indian developers are at least as eager to do so as the other groups.*
- *Being able to teach oneself a new tool, language or framework without taking any formal course is also an important part of a developer's life. A large proportion of Indian developers have never done that. This is probably a bad signal to any employer.*

## Other insights relating to education:<a class="anchor" id="eduOthers"></a>

### Presence of other disciplines:

Programming doesn't have to be tied to any particular college major. All the resources that one needs to learn programming are available to everyone on the internet. Still, the developer community has a problem including people from other disciplines, and this problem is particularly severe in India.

*Let's try to zoom in on the proportion of responders from less technical disciplines:*

In [None]:
ax = (undergrad_major_df.loc[:, ['Fine arts or performing arts (ex. graphic design, music, studio art)',
                                 'A health science (ex. nursing, pharmacy, radiology)',
                                 'A social science (ex. anthropology, psychology, political science)',
                                 'A humanities discipline (ex. literature, history, philosophy)',
                                 'A natural science (ex. biology, chemistry, physics)',
                                 'A business discipline (ex. accounting, finance, marketing)']]
     .T.plot.barh(figsize=(12, 16)));
ax.set_ylabel("Undergrad Majors", fontdict={'size': 18});
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_title("Percentage of people from less technical fields using StackOverflow", fontdict={'size': 18, 'weight': 'bold'});
ax.tick_params(axis='both', labelsize=14);
ax.legend(prop={'size': 16});

> - **Natural science majors don't code in India as much as they do in the other countries. In India, there are more responders from `A business discipline (ex. accounting, finance, marketing)` than from `A natural science (ex. biology, chemistry, physics)`. The reverse is true for the other countries considered and the whole world.**
> - **The world average for the percentage of responders from `A business discipline (ex. accounting, finance, marketing)` is ~3 times the percentage seen in India.**
> - **The world average for the percentage of responders from `A natural science (ex. biology, chemistry, physics)` is ~5 times the percentage seen in India.**
> - **The world average for the percentage of responders from `A humanities discipline (ex. literature, history, philosophy)` is ~13 times the percentage seen in India.**
> - **The world average for the percentage of responders from `A social science (ex. anthropology, psychology, political science)` is ~19 times the percentage seen in India.**
> - **The world average for the percentage of responders from `A health science (ex. nursing, pharmacy, radiology)` is ~5 times the percentage seen in India.**
> - **The world average for the percentage of responders from `Fine arts or performing arts (ex. graphic design, music, studio art)` is ~41 times the percentage seen in India.**
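
The "N times" comparisons above come from dividing the world percentage by the Indian percentage for each major. Here is a minimal sketch of that calculation; the shortened column names and all the values are made up for illustration and are not the survey's numbers.

```python
import pandas as pd

# Toy percentage table: rows are groups, columns are (shortened, hypothetical) major names.
toy_share_df = pd.DataFrame(
    {'A natural science': [1.0, 5.0],
     'A business discipline': [2.0, 6.0]},
    index=['India', 'World'])

# Ratio of the world percentage to the Indian percentage, largest gap first
times_more = (toy_share_df.loc['World'] / toy_share_df.loc['India']).sort_values(ascending=False)
print(times_more)
```

Keep in mind that when the Indian percentage is a fraction of a percent, a small change in the number of responders can move such a ratio a lot.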

**_If inclusivity of other disciplines is a problem in the rest of the world, it is a disaster in India._**

### Education level of parents:<a class="anchor" id="eduParents"></a>

In [None]:
education_parents_df = response_overall("EducationParents")
education_parents_df.sort_values(by='India', axis=1, inplace=True)

In [None]:
ax = education_parents_df.T.plot.barh(figsize=(12, 20));
ax.set_title(what_is(['EducationParents'])[0], fontdict={'weight': 'bold'});
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=16);
ax.legend(prop={'size': 16});

In [None]:
education_parents_df.loc[:, list(reversed(education_parents_df.columns))].style.apply(highlighter, axis=1)

> - **The categories where the orange band representing India sticks out are - `Bachelor’s degree (BA, BS, B.Eng., etc.)`, `Primary/elementary school` and `They never completed any formal education`.**

- *The large percentage of Indian responders with a parent who holds a bachelor's degree suggests that getting a bachelor's degree has been relatively easy in India for quite some time now.*
- *It is difficult to explain why the percentage of developers with a bachelor's degree is so much larger in India. Maybe it's because getting a bachelor's degree in India isn't as expensive as it is in other countries. I often see people from the US (on Twitter, Medium, blogs, TV) complain about how expensive college is and how they end up repaying college loans for a long time. I haven't heard of it being such a big problem here in India, so maybe it isn't, but I am not sure about it.*  
  *What is a bigger problem here is the quality of college education. To see my point, look at this: no Indian colleges are in the top 150 of World University Rankings according to [this resource](https://www.topuniversities.com/university-rankings/world-university-rankings/2018).*
- *The orange bars representing `Primary/elementary school` and no formal education didn't stick out in the bar graph about [developers' own formal education](#formal) but they do over here. This, hopefully, represents the improving literacy levels in India.*

# Hobby and Open-source contribution:<a class="anchor" id="hobbyAndOSS"></a>

## Hobby:<a class="anchor" id="hobby"></a>

In [None]:
what_is(['Hobby'])

In [None]:
hobby_df = response_overall('Hobby')
hobby_df.columns.name = 'Code as a hobby?'
hobby_df.index.name = 'Country'
hobby_df.style.apply(highlighter, axis=1)

In [None]:
ax = hobby_df.plot.bar(figsize=(16, 8), color=['green', 'red'])
ax.set_title("Do you code as a hobby", fontdict={'weight': 'bold'});
ax.set_ylim(0, 100);
ax.set_ylabel("Percentage", fontdict={'size': 18});
ax.set_xlabel("");
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
ax.tick_params(axis='both', labelsize=16);

> - **Around 80% of the responders in all the groups responded "Yes" to the above question.**

- *Indians are just as interested in coding as any other group.*

## Open-source :<a class="anchor" id="oss"></a>
 "We contribute to Open-source software a lot but don't think that we are learning something by doing so"

In [None]:
ax = response_overall('OpenSource').loc[:, ['Yes']].plot.bar(figsize=(16, 8), legend=False);
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
ax.set_ylabel("Percentage", fontdict={'size': 18});
ax.set_xlabel("Country", fontdict={'size': 18});
ax.set_title(what_is(['OpenSource'])[0], fontdict={'size': 16, 'weight': 'bold'});

> **A larger proportion of developers from India contribute to open source than the world average or any of the other 3 countries.**

But they don't consider it a part of their education, as can be seen in the column about [EducationTypes](#selfedu):

In [None]:
what_is(['EducationTypes'])

In [None]:
ax = education_types_df.loc[:, ['Contributed to open source software']].plot.bar(figsize=(16, 8), legend=False);
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
ax.set_ylabel("Percentage", fontdict={'size': 18});
ax.set_xlabel("Country", fontdict={'size': 18});
ax.set_title("Percentage of people who marked contributing to open-source as non-degree education", 
             fontdict={'size': 16, 'weight': 'bold'});

In [None]:
os_df = pd.DataFrame(columns=education_types_df.index)
os_df.loc['Contribute to open-source', :] = response_overall('OpenSource').loc[:, 'Yes']
os_df.loc['Marked open-source contribution as non-degree education', :] = education_types_df.loc[:, 'Contributed to open source software']
ax = os_df.T.plot.bar(figsize=(16, 8));
ax.set_xlabel("");
ax.set_ylabel("Percentage", fontdict={'size': 18});
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
ax.tick_params(axis='both', labelsize=14);
ax.set_title("Contributing to open-source vs. learning from open-source contributions", fontdict={'weight': 'bold'});

> **While ~50% of responders from India contribute to open-source software, only ~30% of responders feel that they are learning something from doing so. This is in contrast with the world as a whole and the other countries considered, where the two percentages are almost the same.**

- *Contributing to open-source software doesn't just help the community, but it also helps the contributor in his/her personal growth. So, I am not too sure what to infer from this above finding. Maybe it means that a lot of Indians who do contribute to open-source projects aren't regular at it. Again, I am not sure. So, let me know what you think.*

# Age and experience:<a class="anchor" id="youth"></a>

## Age:<a class="anchor" id="age"></a>

In [None]:
age_df = response_overall('Age')
age_df.columns.name = 'Age'
age_df.index.name = 'Country'

In [None]:
plot_sequential(df=age_df, feature='Age',
                order=['Under 18 years old', '18 - 24 years old', '25 - 34 years old', '35 - 44 years old',
                       '45 - 54 years old', '55 - 64 years old', '65 years or older'],
                colormap=cm.inferno_r);

In [None]:
age_df.loc[:, ['Under 18 years old', '18 - 24 years old', '25 - 34 years old', '35 - 44 years old',
                       '45 - 54 years old', '55 - 64 years old', '65 years or older']].style.apply(highlighter, axis=1)

> - **~93% of the Indian responders are aged 18-34, whereas this number is 73% for the world as a whole and even lower for the other 3 top responding countries.**

## Years Coding:<a class="anchor" id="yearsCoding"></a>

In [None]:
years_coding_df = response_overall('YearsCoding')
years_coding_df.columns.name = 'Years Coding'
years_coding_df.index.name = 'Country'

In [None]:
plot_sequential(df=years_coding_df, feature='YearsCoding',
                order=['0-2 years', '3-5 years', '6-8 years', '9-11 years', '12-14 years', '15-17 years',
                       '18-20 years', '21-23 years', '24-26 years', '27-29 years', '30 or more years'],
                colormap=cm.inferno_r);

In [None]:
years_coding_df.loc[:, ['0-2 years', '3-5 years', '6-8 years', '9-11 years', '12-14 years', '15-17 years',
                       '18-20 years', '21-23 years', '24-26 years', '27-29 years', '30 or more years']].style.apply(highlighter, axis=1)

> - **~85% of the responders from India have been coding for 0-8 years. This proportion is ~56% for the world, ~46% for the US, ~43% for the UK and ~46% for Germany.**
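
Figures such as "coding for 0-8 years" are cumulative sums over the ordered experience bins. Here is a minimal sketch, assuming a percentage table shaped like `years_coding_df` (groups as rows, bins as columns). The toy values are made up, and only four bins are shown.

```python
import pandas as pd

order = ['0-2 years', '3-5 years', '6-8 years', '9-11 years']

# Toy percentages with the columns deliberately out of order; values are NOT the survey's.
toy_years_df = pd.DataFrame(
    [[40.0, 30.0, 15.0, 15.0],
     [20.0, 15.0, 40.0, 25.0]],
    index=['India', 'World'],
    columns=['3-5 years', '0-2 years', '9-11 years', '6-8 years'])

# Put the bins in their natural order first, then accumulate from left to right
cumulative = toy_years_df[order].cumsum(axis=1)
print(cumulative)
```

Reordering before `cumsum` matters: the bins are plain strings, so pandas has no way of knowing their natural order.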

## Years Coding Professionally:<a class="anchor" id="yearsCodingProf"></a>

In [None]:
years_coding_prof_df = response_overall('YearsCodingProf')
years_coding_prof_df.columns.name = 'Years Coding Professionally'
years_coding_prof_df.index.name = 'Country'

In [None]:
plot_sequential(df=years_coding_prof_df, feature='YearsCodingProf',
                order=['0-2 years', '3-5 years', '6-8 years', '9-11 years', '12-14 years', '15-17 years',
                       '18-20 years', '21-23 years', '24-26 years', '27-29 years', '30 or more years'],
                colormap=cm.inferno_r);

In [None]:
years_coding_prof_df.loc[:, ['0-2 years', '3-5 years', '6-8 years', '9-11 years', '12-14 years', '15-17 years',
                       '18-20 years', '21-23 years', '24-26 years', '27-29 years', '30 or more years']].style.apply(highlighter, axis=1)

> - **The percentage of developers who have been coding professionally for 0-5 years is more than 78% in India, whereas it is ~48% in the US, ~46% in the UK, ~55% in Germany and ~57% in the world as a whole.**

**Thus, we may conclude that India has an incredibly young developer population.**

- *__This probably results in Indians mainly making up the junior workforce of companies and occupying mostly entry-level jobs, which are most likely paid less than the more experienced roles.__*
- *It also means that there is a large pool of young developer talent in India.*
- *I believe that the major reason behind this stark variation in developer age and experience between India and the other groups taken into consideration here, is that India, being a developing country, was introduced to computing technology much later than the others. This resulted in India producing very few programmers in the 20th century or even in the early 2000s.*
- *It also means that the job market for junior developer roles is incredibly crowded here in India. This might explain why Indian developers are particularly competitive in nature.*


## Competitive nature:<a class="anchor" id="competitive"></a>

In [None]:
competitive_df = response_overall('AgreeDisagree2')
competitive_df.columns.name = 'Competing with peers'
competitive_df.index.name = 'Country'

In [None]:
plot_sequential(df=competitive_df, feature='AgreeDisagree2', 
                order=['Strongly agree', 'Agree', 'Neither Agree nor Disagree', 'Disagree', 'Strongly disagree'],
                colormap=cm.viridis_r);

In [None]:
competitive_df[['Strongly agree', 'Agree', 'Neither Agree nor Disagree', 'Disagree', 'Strongly disagree']].style.apply(highlighter, axis=1)

> **~52% of Indian developers agree that they are competing with their peers, and another 25% are neutral on the topic.**

- *Since a large majority of Indian developers are just getting started, we can see some differences in what they hope to be doing 5 years from now.*

## Ambitions for career growth:<a class="anchor" id="ambitious"></a>

In [None]:
hope_five_years_df = response_overall('HopeFiveYears')
hope_five_years_df.columns.name = 'Hope 5 years from now'
hope_five_years_df.index.name = 'Country'

In [None]:
order_hopes = ["Working in a different or more specialized technical role than the one I'm in now",
               "Doing the same work",
               "Working as a product manager or project manager",
               "Working as an engineering manager or other functional manager",
               "Retirement",
               "Working as a founder or co-founder of my own company",
               "Working in a career completely unrelated to software development"]
ax = hope_five_years_df.loc[:, order_hopes].plot.bar(figsize=(16, 8), 
                                                     stacked=True, 
                                                     legend=True,
                                                     colormap=cm.Set2)
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
ax.set_title(what_is(['HopeFiveYears'])[0], fontdict={'weight': 'bold'});

In [None]:
hope_five_years_df.style.apply(highlighter, axis=1)

> - **India has a higher percentage of responders who want to work in a more specialised technical role than they are in right now (_lowest band in the above figure_).**
> - **Unsurprisingly, it also has a lower percentage of responders who want to keep doing the same work (_orange band in the above figure_).**
> - **There is also a larger percentage of Indians who wish to be product/project managers in the next 5 years.**

# Development Life:<a class="anchor" id="devlife"></a>

## Developer types:<a class="anchor" id="devtypes"></a>

In [None]:
dev_type_count_df = response_overall_moc('DevType')
dev_type_count_df.columns.name = 'Developer Type'
dev_type_count_df.index.name = 'Country'
dev_type_count_df.sort_values(by='India', axis=1, inplace=True)

In [None]:
ax = dev_type_count_df.T.plot.barh(figsize=(16, 24));
ax.set_title("Developer Type", fontdict={'size': 20, 'weight': 'bold'});
ax.set_xlabel("Percentage", fontdict={'size': 20});
ax.set_ylabel("Developer Types", fontdict={'size': 20});
ax.tick_params(axis='both', labelsize=16);
ax.legend(prop={'size': 18});

In [None]:
dev_type_count_df.loc[:, list(reversed(dev_type_count_df.columns))].style.apply(highlighter, axis=1)

> - **Just like the world average and the other 3 countries in the analysis, more than half of the responders in India work as `Back-end developers`.**
> - **India's orange bar sticks out in the `Mobile Developer` category: it has a much larger percentage of `Mobile developers` as compared to the world average and the other 3 countries taken into consideration.**

In fact, India has the largest number of mobile developers among the survey's responders:

### Large number of mobile developers:<a class="anchor" id="mobdev"></a>

In [None]:
# Count responders per country who selected 'Mobile developer'; value_counts() sorts in descending order
mob_dev_count = (survey_df.loc[survey_df['DevType_Mobile developer'] == True, 'Country']
                 .value_counts()
                 .rename_axis('Country')
                 .reset_index(name='Count'))

In [None]:
import squarify
import random

plt.figure(figsize=(18, 9))
cmap = plt.get_cmap('Oranges')  # cm.get_cmap() was removed in newer Matplotlib releases
color = [cmap(random.random()) for _ in range(50)]

ax = squarify.plot(label=mob_dev_count.iloc[:50, :]['Country'], 
                   sizes=mob_dev_count.iloc[:50, :]['Count'], 
                   value=mob_dev_count.iloc[:50, :]['Count'], 
                   color=color);
ax.set_title("Top 50 countries with the max number of mobile developers", fontdict={'weight': 'bold'});
ax.set_axis_off()

### Age and experience of those Indian Mobile developers:<a class="anchor" id="expMobdev"></a>

In [None]:
fig, (ax_age, ax_exp) = plt.subplots(nrows=1, ncols=2, figsize=(24, 12))

# 'DevType' is a multiple-choice column, so match every responder who selected
# 'Mobile developer' rather than only those who selected nothing else
is_mobile_dev = survey_df_india['DevType'].str.contains('Mobile developer', na=False, regex=False)

survey_df_india.loc[is_mobile_dev, 'Age']\
.value_counts(normalize=True)\
.plot.pie(ax=ax_age, legend=True, labels=None, colormap=cm.Paired, autopct='%1.1f%%');
ax_age.set_title("Distribution of Indian mobile developers according to age", 
                 fontdict={'size': 18, 'weight': 'bold'});
#ax_age.legend(prop={'size': 16});

survey_df_india.loc[is_mobile_dev, 'YearsCoding']\
.value_counts(normalize=True)\
.plot.pie(ax=ax_exp, legend=True, labels=None, colormap=cm.Accent, autopct='%1.1f%%');
ax_exp.set_title("Distribution of Indian mobile developers according to coding experience", 
                 fontdict={'size': 18, 'weight': 'bold'});
#ax_exp.legend(prop={'size': 16});

*A large majority of the Indian population might have entirely skipped the PC revolution and leapfrogged to smartphones as their first computing devices. Hence there is much excitement around smartphones and the mobile apps. I think that might be a major reason behind the interest that we see in mobile app development in India.*
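
A note on filtering by developer type: responders can tick several options, and I am assuming here that the raw `DevType` column stores them as one semicolon-separated string. Under that assumption, an exact comparison only finds people who ticked a single option. The sketch below shows the difference on a toy column whose values are made up.

```python
import pandas as pd

# Toy multiple-choice column; the ';' separator is an assumption about the raw data.
toy_dev_df = pd.DataFrame({'DevType': ['Mobile developer',
                                       'Back-end developer;Mobile developer',
                                       'Back-end developer',
                                       None]})

# Exact match: only responders who ticked 'Mobile developer' and nothing else
exact_match = toy_dev_df['DevType'] == 'Mobile developer'

# Substring match: everyone who ticked 'Mobile developer', possibly among other options
selected = toy_dev_df['DevType'].str.contains('Mobile developer', na=False, regex=False)

# One indicator column per option, handy for counting every option at once
indicators = toy_dev_df['DevType'].str.get_dummies(sep=';')

print(exact_match.sum(), selected.sum())
print(indicators.sum())
```

The indicator table from `str.get_dummies` is usually the most convenient form when the percentage of every option is needed in one go.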

## IDE:<a class="anchor" id="ide"></a>

In [None]:
ide_df = response_overall_moc('IDE')
ide_df.columns.name = 'IDE'
ide_df.index.name = 'Country'
ide_df.sort_values(by='India', axis=1, inplace=True)

In [None]:
ax = ide_df.T.plot.barh(figsize=(16, 24))
ax.set_xlim(0, 100);
ax.set_title(what_is(['IDE'])[0], fontdict={'size': 14, 'weight': 'bold'})
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_ylabel("IDE", fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=14);

In [None]:
ide_df.loc[:, list(reversed(ide_df.columns))].style.apply(highlighter, axis=1)

> - **The IDEs which are particularly popular amongst Indian developers are - `Notepad++`, `Sublime Text`, `Eclipse`, `Android Studio` and `Netbeans`.**

- *The popularity of `Android Studio` as an IDE can be attributed to the large number of mobile developers in India, as [we saw earlier](#mobdev).*
- *`Eclipse` and `Netbeans` are mostly used for Java programming. Java is the primary language used in mobile development for the Android platform. The popularity of these IDEs amongst Indian developers reflects the popularity of Java as a programming language in India.*

Now, let's see how the excitement around mobile development affects the choice of platforms, languages and frameworks for Indian developers.

## Platforms:<a class="anchor" id="platforms"></a>

In [None]:
platform_worked_with_df = response_overall_moc('PlatformWorkedWith')
platform_worked_with_df.columns.name = 'Platforms Worked With'
platform_worked_with_df.index.name = 'Country'

In [None]:
platform_desire_df = response_overall_moc('PlatformDesireNextYear')
platform_desire_df.columns.name = 'Platforms desire next year'
platform_desire_df.index.name = 'Country'

In [None]:
def get_prefs(worked_with_df, desire_df, country):
    preferences = pd.DataFrame(columns=desire_df.columns)
    preferences.loc['Worked', :] = worked_with_df.loc[country, :]
    preferences.loc['Desire', :] = desire_df.loc[country, :]
    preferences.loc['Interest', :] = preferences.loc['Desire', :] - \
                                                     preferences.loc['Worked', :]
    preferences.sort_values(by='Worked', axis=1, ascending=False, inplace=True)
    
    return preferences

In [None]:
def plot_preferences(preferences_df, ax, title, ylim, tick_labelsize, xlabel):
    preferences_df.loc[['Worked', 'Desire'], :].T.plot.line(ax=ax, style='*-')
    ax.set_xticks(range(len(list(preferences_df.columns))));
    ax.set_xticklabels(list(preferences_df.columns), rotation=90);
    ax.set_ylim(0, ylim);
    ax.set_title(title, fontdict={'size': 24, 'weight': 'bold'});
    ax.set_ylabel("Percentage of responders", fontdict={'size': 22});
    ax.set_xlabel(xlabel, fontdict={'size': 22});
    ax.grid(True)
    ax.tick_params(axis='both', labelsize=tick_labelsize)
    ax.legend(prop={'size': 18});

In [None]:
def show_preferences(worked_with_df, desire_df, xlabel, ylim=100, tick_labelsizes=(14, 12)):
    preferences_df_dict = {}
    preferences_df_dict['India'] = get_prefs(worked_with_df, desire_df, 'India')
    preferences_df_dict['World'] = get_prefs(worked_with_df, desire_df, 'World')
    preferences_df_dict['US'] = get_prefs(worked_with_df, desire_df, 'US')
    preferences_df_dict['UK'] = get_prefs(worked_with_df, desire_df, 'UK')
    preferences_df_dict['Germany'] = get_prefs(worked_with_df, desire_df, 'Germany')
    
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(24, 8))
    plot_preferences(preferences_df_dict['India'], axes[0], title='India',
                     ylim=ylim, tick_labelsize=tick_labelsizes[0], xlabel=xlabel)
    plot_preferences(preferences_df_dict['World'], axes[1], title='World',
                     ylim=ylim, tick_labelsize=tick_labelsizes[0], xlabel=xlabel)

    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(28, 8))
    plot_preferences(preferences_df_dict['US'], axes[0], title='US',
                     ylim=ylim, tick_labelsize=tick_labelsizes[1], xlabel=xlabel)
    plot_preferences(preferences_df_dict['UK'], axes[1], title='UK',
                     ylim=ylim, tick_labelsize=tick_labelsizes[1], xlabel=xlabel)
    plot_preferences(preferences_df_dict['Germany'], axes[2], title='Germany',
                     ylim=ylim, tick_labelsize=tick_labelsizes[1], xlabel=xlabel)

In [None]:
show_preferences(worked_with_df=platform_worked_with_df, desire_df=platform_desire_df, 
                 xlabel='Platforms', ylim=60, tick_labelsizes=(16, 14))

In [None]:
platform_worked_with_df.sort_values(by='India', axis=1, ascending=False).style.apply(highlighter, axis=1)

In [None]:
platform_desire_df.sort_values(by='India', axis=1, ascending=False).style.apply(highlighter, axis=1)

- **Almost 2 out of every 5 Indian developers have worked on the Android platform.**
- **More Indian developers have developed for the `Android` and `Firebase` platforms than for `Windows Desktop or Server`. This is unlike the world average and the other 3 countries considered.**
- **Note that `Firebase` is nowhere near the top 3 spots in any group except India.**
- **This confirms the popularity of mobile development among Indian developers, as both `Android` and `Firebase`, platforms related to mobile development, are amongst the top 3 most popular platforms among Indians.**
- **And the interest in them is still growing, because even larger percentages of developers want to work on them next year!**
- **The top 3 platforms with the _largest increase in interest_ among Indian developers are:**
    - `Google Cloud Platform/App Engine`
    - `Raspberry Pi` 
    - `Amazon Echo`
- **The top 3 platforms that Indian developers _no longer want to work on_ are:**
    - `Windows Desktop or Server`
    - `Wordpress` 
    - `Linux`
    
    **Apart from these, every other platform has a net positive interest among Indian developers.**
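
The rankings above are based on the net interest, i.e. the percentage who want to work with something next year minus the percentage who have already worked with it. Here is a minimal sketch of that ranking step on toy numbers. The values are made up, and treating a negative difference as "don't want to work on it anymore" is my interpretation of the metric.

```python
import pandas as pd

# Toy percentages for one group; values are NOT the survey's.
worked = pd.Series({'Android': 40.0, 'Linux': 45.0, 'Raspberry Pi': 5.0, 'Firebase': 20.0})
desire = pd.Series({'Android': 45.0, 'Linux': 40.0, 'Raspberry Pi': 15.0, 'Firebase': 24.0})

# Net interest: positive means more people want it next year than have used it so far
interest = desire - worked

top_gainers = interest.nlargest(2)   # largest increase in interest
top_losers = interest.nsmallest(1)   # largest drop in interest

print(top_gainers)
print(top_losers)
```

A negative net interest doesn't have to mean dislike: very widely used platforms have little room to grow, so their difference tends to be small or negative.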

In [None]:
platform_preferences_df = get_prefs(platform_worked_with_df, platform_desire_df, 'India').sort_values(by='Interest', axis=1, ascending=False)
platform_preferences_df.columns.name = 'Platforms: Indian devs'
platform_preferences_df

## Languages:<a class="anchor" id="languages"></a>

In [None]:
languages_worked_with_df = response_overall_moc('LanguageWorkedWith')
languages_worked_with_df.columns.name = 'Languages Worked With'
languages_worked_with_df.index.name = 'Country'

In [None]:
languages_desire_df = response_overall_moc('LanguageDesireNextYear')
languages_desire_df.columns.name = 'Languages Desire Next Year'
languages_desire_df.index.name = 'Country'

In [None]:
show_preferences(worked_with_df=languages_worked_with_df, desire_df=languages_desire_df, 
                 xlabel='Languages', ylim=100, tick_labelsizes=(16, 14))

In [None]:
languages_worked_with_df.sort_values(by='India', axis=1, ascending=False).style.apply(highlighter, axis=1)

In [None]:
languages_desire_df.sort_values(by='India', axis=1, ascending=False).style.apply(highlighter, axis=1)

- **`Java` is the most popular language amongst Indian developers (_unlike the world average and the other countries considered, where `C` is more popular_). This is consistent with our earlier finding that India has an unusually large percentage of developers building for the `Android` and `Firebase` platforms (both having `Java` as their primary programming language).**
- **There is a clearly increasing interest in the `Python` programming language in India. Not a lot of Indian developers have worked with `Python` (29%; world average - 38%) yet but a lot of them are interested in using it next year (50%; world average - 44%).**
- **The top 3 languages with the _largest increase in interest_ among Indian developers are:**
    1. `Python`
    2. `Kotlin` 
    3. `R`
- **The top 3 languages that Indian developers _no longer want to work with_ are:**
    1. `HTML`
    2. `CSS` 
    3. `C`

In [None]:
language_preferences_df = get_prefs(languages_worked_with_df, languages_desire_df, 'India').sort_values(by='Interest', axis=1, ascending=False)
language_preferences_df.columns.name = 'Languages: Indian devs'
language_preferences_df

## Frameworks:<a class="anchor" id="frameworks"></a>

In [None]:
framework_worked_with_df = response_overall_moc('FrameworkWorkedWith')
framework_worked_with_df.columns.name = 'Framework Worked With'
framework_worked_with_df.index.name = 'Country'

In [None]:
framework_desire_df = response_overall_moc('FrameworkDesireNextYear')
framework_desire_df.columns.name = 'Framework Desire Next Year'
framework_desire_df.index.name = 'Country'

In [None]:
show_preferences(worked_with_df=framework_worked_with_df, desire_df=framework_desire_df, 
                 xlabel='Frameworks', ylim=60, tick_labelsizes=(16, 14))

In [None]:
framework_worked_with_df.sort_values(by='India', axis=1, ascending=False).style.apply(highlighter, axis=1)

In [None]:
framework_desire_df.sort_values(by='India', axis=1, ascending=False).style.apply(highlighter, axis=1)

- **JavaScript frameworks - `Node`, `Angular` and `React` - are the most popular frameworks, both by the percentage of developers already using them and by the percentage who want to use them next year.**
- **`Angular` has gained a net positive interest amongst Indian developers. This is in contrast with the world average and the other 3 countries taken into consideration, where the interest in `Angular` has decreased.**
- **Looking at the numbers, one can see that Microsoft's `.NET Core` framework is much less popular in India than in the rest of the world and the other 3 countries. Only ~18% of Indian developers have worked with it, compared to the world average of ~27%, and only ~16% of Indian devs want to work with it next year, compared to the world average of ~28%. The differences are even starker for the US and the UK.**
- **The top 3 frameworks with the _largest increase in interest_ among Indian developers are:**
    1. `React`
    2. `Tensorflow` 
    3. `Hadoop`
- **The top 3 frameworks which the Indian developers _don't want to use anymore_ are:**
    1. `Spring`
    2. `.NET Core` 
    3. `Cordova`
    
    **Apart from them, every other framework has a net positive interest among the Indian developers.**

In [None]:
framework_preferences_df = get_prefs(framework_worked_with_df, framework_desire_df, 'India').sort_values(by='Interest', axis=1, ascending=False)
framework_preferences_df.columns.name = 'Frameworks: Indian devs'
framework_preferences_df

# Communication Tools:<a class="anchor" id="commTools"></a>

Now, let's see which communication tools the developers use at work.

In [None]:
comm_tools = response_overall_moc('CommunicationTools')
comm_tools.columns.name = 'Communication Tools'
comm_tools.index.name = 'Country'
comm_tools.sort_values(by='India', axis=1, inplace=True)

In [None]:
ax = comm_tools.T.plot.barh(figsize=(14, 24))
ax.set_xlim(0, 100);
ax.set_title(what_is(['CommunicationTools'])[0].split(':')[0], fontdict={'size': 14, 'weight': 'bold'})
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_ylabel("Communication tools", fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=18);
ax.legend(prop={'size': 16});

In [None]:
comm_tools.loc[:, list(reversed(comm_tools.columns))].style.apply(highlighter, axis=1)

> - **`Stack Overflow Enterprise` is considerably more popular in India than in the other groups. The percentage of developers using `Stack Overflow Enterprise` in India is ~4 times the figure for the entire world, almost 8 times that of the US, ~12 times that of the UK and ~13 times that of Germany.**
> - **A larger proportion of Indian developers use `Google Hangouts/Chat` and `Facebook` as compared to the other groups.** 
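The "times" comparisons above come from dividing India's percentage by each other group's percentage. A minimal sketch of that calculation follows; the table and its numbers are made up for illustration and are not the survey's values, and `usage` merely stands in for a table shaped like `comm_tools` (countries as rows, tools as columns).

```python
import pandas as pd

# Made-up percentages, for illustration only (not the survey's values).
usage = pd.DataFrame(
    {'Tool A': [8.0, 2.0, 1.0], 'Tool B': [30.0, 40.0, 60.0]},
    index=['India', 'World', 'United States'],
)

# Divide India's row by every row, column by column.
# Values above 1 mean the tool is more common in India than in that group.
india_ratio = usage.rdiv(usage.loc['India'], axis='columns')
print(india_ratio.round(2))
```

Reading across a row then gives "India uses this tool N times as much as this group" for every tool at once.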

## Programming Methodology:<a class="anchor" id="methodology"></a>

Now, let's see if the Indian developers differ in their choice of programming methodologies.

In [None]:
methodology_df = response_overall_moc('Methodology')
methodology_df.columns.name = 'Programming methodology'
methodology_df.index.name = 'Country'
methodology_df.sort_values(by='India', axis=1, inplace=True)

In [None]:
ax = methodology_df.T.plot.barh(figsize=(16, 24))
ax.set_xlim(0, 100);
ax.set_title(what_is(['Methodology'])[0], fontdict={'size': 14, 'weight': 'bold'})
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_ylabel("Programming methodology", fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=18);
ax.legend(prop={'size': 16});

In [None]:
methodology_df.loc[:, list(reversed(methodology_df.columns))].style.apply(highlighter, axis=1)

> - **The orange bar of India is much shorter for `Scrum`, `Kanban` and `Pair Programming`. These methodologies aren't as popular in India as they are in the entire world and the other countries taken into consideration.**
> - **It is interesting to see that even though [Pair Programming](https://en.wikipedia.org/wiki/Pair_programming) is much less popular in India, [Mob Programming](https://en.wikipedia.org/wiki/Mob_programming) (a more extreme form of `Pair programming`) is more popular here than in the other groups.**

## "We are a little light on Ethics" :<a class="anchor" id="ethics"></a>

As noted earlier, the survey had a few questions about ethics in coding. Let's see how the Indian developers differ from the others on that front.

### Responsibility of unethical code:<a class="anchor" id="responsibilityEthics"></a>

In [None]:
ethics_responsible_df = response_overall('EthicsResponsible')
ethics_responsible_df.columns.name = 'Responsibility of unethical code'
ethics_responsible_df.index.name = 'Country'

In [None]:
ax = ethics_responsible_df.T.plot.bar(figsize=(16, 8))
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
ax.set_ylim(0, 100);
ax.set_title(what_is(['EthicsResponsible'])[0], fontdict={'size': 16, 'weight': 'bold'});
ax.set_ylabel("Percentage", fontdict={'size': 18});
ax.set_xlabel("");
ax.legend(prop={'size': 16});
ax.tick_params(axis='both', labelsize=13);

In [None]:
ethics_responsible_df.style.apply(highlighter, axis=1)

> - **While almost 60% of the developers in all the other groups would hold upper management responsible for unethical code, only 40% of the developers in India do so.**
> - **A much larger proportion of Indian devs as compared to the devs in the other groups would hold the person who came up with the idea of unethical code or the developer who wrote it as the one responsible.**

### Write Unethical code?<a class="anchor" id="write"></a>

In [None]:
ethics_choice_df = response_overall('EthicsChoice')
ethics_choice_df.columns.name = 'Write Unethical Code?'
ethics_choice_df.index.name = 'Country'

In [None]:
ax = plot_sequential(ethics_choice_df, feature='EthicsChoice', colormap=cm.magma_r);
ax.tick_params(axis='both', labelsize=18);

In [None]:
ethics_choice_df.style.apply(highlighter, axis=1)

> - **A majority of the Indian developers are ready to write code that they themselves consider unethical.**

### Consider ethical implications of your code?<a class="anchor" id="ethicalImplications"></a>

In [None]:
ethical_implications_df = response_overall('EthicalImplications')
ethical_implications_df.columns.name = 'Consider ethical implications of your code?'
ethical_implications_df.index.name = 'Country'

In [None]:
ax = plot_sequential(ethical_implications_df, 'EthicalImplications', colormap=cm.magma_r);
ax.tick_params(axis='both', labelsize=18);

In [None]:
ethical_implications_df.style.apply(highlighter, axis=1)

> - **Only ~63% of Indian developers believe that they need to consider the ethical implications of the code that they write.**

*Perhaps it's a bit difficult to consider ethics when you have a low-paying job or, well, sometimes even no job.*

*Imagine being in a country with a high unemployment rate, low salaries and a highly competitive job market. And then, imagine being asked to write unethical code at the job that you have, or risk making your boss unhappy and losing the next big promotion to a co-worker who agrees to do it, or maybe even risk losing the job itself! So, I believe that if a developer agrees to write unethical code, it does not necessarily mean that he/she is an unethical person.* 

*It's the same dilemma as, "Would you steal bread to feed your hungry family?" The correct choice isn't always black and white.*

## "The most popular reason for attending hackathons is different for Indians" :<a class="anchor" id="hackathon"></a>

In [None]:
hackathon_reasons_df = response_overall_moc('HackathonReasons')
hackathon_reasons_df.columns.name = 'Hackathon Reasons'
hackathon_reasons_df.index.name = 'Country'
hackathon_reasons_df.style.apply(highlighter, axis=1)

- **The most popular reason why Indian devs attend hackathons is _"To improve my general technical skills or programming ability"_, whereas for the whole world the most popular reason is _"Because I find it enjoyable"_.**
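The top reason per group can also be read off programmatically instead of by eye. A small sketch, assuming a table shaped like `hackathon_reasons_df` (groups as rows, reasons as columns); the shortened column labels and the numbers below are made up for illustration.

```python
import pandas as pd

# Made-up percentages for illustration; the real table is `hackathon_reasons_df`.
reasons = pd.DataFrame(
    {
        'Improve skills': [30.0, 24.0],
        'Enjoyable': [22.0, 27.0],
        'Win prizes': [10.0, 6.0],
    },
    index=['India', 'World'],
)

# For each row (group), return the column label holding the largest percentage.
top_reason = reasons.idxmax(axis=1)
print(top_reason)
```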

# AI opinions:<a class="anchor" id="ai"></a>

There were a few questions in the survey relating to Artificial Intelligence. 

2 of them were:

In [None]:
what_is(['AIDangerous', 'AIInteresting'])

Both of them had the same set of options, viz.

In [None]:
list(survey_df['AIDangerous'].value_counts().index)

I have broken these options down below to compare where each country's majority lies: 

Are they interested and hopeful for that aspect of advancing AI or are they more afraid of it and consider it a danger?

In [None]:
dangerous_ai_df = response_overall('AIDangerous')
interesting_ai_df = response_overall('AIInteresting')

## Automation of jobs:<a class="anchor" id="automation"></a>

In [None]:
automation = pd.DataFrame(columns=['Interesting', 'Dangerous'], index=list(dangerous_ai_df.index))
automation.loc[:, 'Interesting'] = interesting_ai_df \
    .loc[:, 'Increasing automation of jobs']
automation.loc[:, 'Dangerous'] = dangerous_ai_df \
    .loc[:, 'Increasing automation of jobs']

automation.columns.name = 'Automation of Jobs'
automation.index.name = 'Country'
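The same "pick one topic from both tables" pattern is used for every AI topic in this section, so it could be wrapped in a small helper. The sketch below is one possible way to do that, not the notebook's own code: `interesting_vs_dangerous` is a hypothetical name, and the two toy frames (with made-up numbers) stand in for `interesting_ai_df` and `dangerous_ai_df`.

```python
import pandas as pd


def interesting_vs_dangerous(interesting_df, dangerous_df, topic):
    """Hypothetical helper: one row per country, two columns for a single AI topic."""
    out = pd.concat(
        {'Interesting': interesting_df[topic], 'Dangerous': dangerous_df[topic]},
        axis=1,
    )
    out.index.name = 'Country'
    out.columns.name = topic
    return out


# Toy stand-ins with made-up percentages (not the survey's values).
topic = 'Increasing automation of jobs'
interesting_toy = pd.DataFrame({topic: [20.0, 40.0]}, index=['India', 'World'])
dangerous_toy = pd.DataFrame({topic: [35.0, 18.0]}, index=['India', 'World'])

automation_toy = interesting_vs_dangerous(interesting_toy, dangerous_toy, topic)
print(automation_toy)
```

Building the frame with `pd.concat` keeps the columns numeric from the start, which is what `.plot.bar` expects.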

In [None]:
ax = automation.plot.bar(figsize=(16, 6))
ax.set_title('Increasing automation of jobs', fontdict={'weight': 'bold'});
ax.set_ylabel('Percentage', fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=12);
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);

In [None]:
automation.style.apply(highlighter, axis=1)

> - **Potential automation of jobs by AI is not so interesting to us Indians. We are more likely to put it in the "Dangerous" bucket.**  
> **This is unlike the rest of the world taken as a whole as well as the other 3 top responding countries, where a larger proportion of the population is interested in it.**

*This suggests that Indians are much more afraid of losing their jobs to AI than people in the other groups. This might be a reflection of the high unemployment rates and the highly competitive job market that young Indian developers face.* 

## Algorithms making important decisions:<a class="anchor" id="decisions"></a>

In [None]:
imp_decisions = pd.DataFrame(columns=['Interesting', 'Dangerous'], index=list(dangerous_ai_df.index))
imp_decisions.loc[:, 'Interesting'] = interesting_ai_df \
    .loc[:, 'Algorithms making important decisions']
imp_decisions.loc[:, 'Dangerous'] = dangerous_ai_df \
    .loc[:, 'Algorithms making important decisions']

imp_decisions.columns.name = 'Algorithms making important decisions'
imp_decisions.index.name = 'Country'

In [None]:
ax = imp_decisions.plot.bar(figsize=(16, 6))
ax.set_title('Algorithms making important decisions', fontdict={'weight': 'bold'});
ax.set_ylabel('Percentage', fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=12);
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);

In [None]:
imp_decisions.style.apply(highlighter, axis=1)

> - **Indians are much more comfortable with a future in which AI is making important decisions for them than any of the other 3 countries taken into consideration or the world taken as a whole.**

## Singularity:<a class="anchor" id="singularity"></a>

In [None]:
singularity = pd.DataFrame(columns=['Interesting', 'Dangerous'], index=list(dangerous_ai_df.index))
singularity.loc[:, 'Interesting'] = interesting_ai_df \
    .loc[:, 'Artificial intelligence surpassing human intelligence ("the singularity")']
singularity.loc[:, 'Dangerous'] = dangerous_ai_df \
    .loc[:, 'Artificial intelligence surpassing human intelligence ("the singularity")']

singularity.columns.name = 'Singularity'
singularity.index.name = 'Country'

In [None]:
ax = singularity.plot.bar(figsize=(16, 6))
ax.set_title('Artificial intelligence surpassing human intelligence ("the singularity")', fontdict={'weight': 'bold'});
ax.set_ylabel('Percentage', fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=12);
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);

In [None]:
singularity.style.apply(highlighter, axis=1)

> - **Indians are more worried about a "singularity"-type situation than the other groups are.**

## Evolving definitions of "fairness" in algorithmic vs. human decisions:<a class="anchor" id="fairness"></a>

In [None]:
fairness = pd.DataFrame(columns=['Interesting', 'Dangerous'], index=list(dangerous_ai_df.index))
fairness.loc[:, 'Interesting'] = interesting_ai_df \
    .loc[:, 'Evolving definitions of "fairness" in algorithmic versus human decisions']
fairness.loc[:, 'Dangerous'] = dangerous_ai_df \
    .loc[:, 'Evolving definitions of "fairness" in algorithmic versus human decisions']

fairness.columns.name = 'Evolving definitions of fairness'
fairness.index.name = 'Country'

In [None]:
ax = fairness.plot.bar(figsize=(16, 6))
ax.set_title('Evolving definitions of "fairness" in algorithmic versus human decisions', fontdict={'weight': 'bold'});
ax.set_ylabel('Percentage', fontdict={'size': 18});
ax.tick_params(axis='both', labelsize=12);
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);

In [None]:
fairness.style.apply(highlighter, axis=1)

> - **Indians are not as worried about the evolving definitions of "fairness" in algorithmic vs. human decision as the world or the other 3 countries taken into consideration are.**

## Responsibility:<a class="anchor" id="responsibility"></a>

In [None]:
responsibility_ai = response_overall("AIResponsible")
responsibility_ai.columns.name = 'Responsibility for AI'
responsibility_ai.index.name = 'Country'

ax = responsibility_ai.plot.bar(figsize=(16, 6), stacked=True, colormap=cm.Set3);
ax.set_title(what_is(['AIResponsible'])[0], fontdict={'weight': 'bold'});
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);

In [None]:
responsibility_ai.style.apply(highlighter, axis=1)

> - **Indians have less faith in `A governmental or other regulatory body` to consider the ramifications of AI as compared to the world average and the other top 3 responding countries.**

# We are pretty excited about all new hypothetical tools:<a class="anchor" id="hypoTools"></a>

## A peer mentoring system:<a class="anchor" id="peer"></a>

In [None]:
peer_mentoring_sys = response_overall("HypotheticalTools1")
peer_mentoring_sys.columns.name = 'Peer mentoring system'
peer_mentoring_sys.index.name = 'Country'
peer_mentoring_sys.style.apply(highlighter, axis=1)

In [None]:
ax = plot_sequential(df=peer_mentoring_sys, feature='HypotheticalTools1', 
                    order=['Not at all interested', 'A little bit interested', 'Somewhat interested', 'Very interested',
                           'Extremely interested'],
                    colormap=cm.magma, horizontal=True)
ax.tick_params(axis='both', labelsize=16);
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_ylabel("Responses", fontdict={'size': 18});
ax.legend(prop={'size': 14});
ax.set_title("Interest in a peer mentoring system", fontdict={'size': 18, 'weight': 'bold'});

## A private area for people new to programming:<a class="anchor" id="newbie"></a>

In [None]:
newbie_private_area = response_overall("HypotheticalTools2")
newbie_private_area.columns.name = 'Newbie private area'
newbie_private_area.index.name = 'Country'
newbie_private_area.style.apply(highlighter, axis=1)

In [None]:
ax = plot_sequential(df=newbie_private_area, feature='HypotheticalTools2', 
                order=['Not at all interested', 'A little bit interested', 'Somewhat interested', 'Very interested',
                       'Extremely interested'],
                colormap=cm.magma, horizontal=True)
ax.tick_params(axis='both', labelsize=16);
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_ylabel("Responses", fontdict={'size': 18});
ax.legend(prop={'size': 14});
ax.set_title("Interest in a private area for people new to programming", fontdict={'size': 18, 'weight': 'bold'});

## A programming oriented blog platform:<a class="anchor" id="progBlog"></a>

In [None]:
prog_blog = response_overall("HypotheticalTools3")
prog_blog.columns.name = 'Programming blog platform'
prog_blog.index.name = 'Country'
prog_blog.style.apply(highlighter, axis=1)

In [None]:
ax = plot_sequential(df=prog_blog, feature='HypotheticalTools3', 
                    order=['Not at all interested', 'A little bit interested', 'Somewhat interested', 'Very interested',
                           'Extremely interested'],
                    colormap=cm.magma, horizontal=True)
ax.tick_params(axis='both', labelsize=16);
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_ylabel("Responses", fontdict={'size': 18});
ax.legend(prop={'size': 14});
ax.set_title("Interest in a programming oriented blog platform", fontdict={'size': 18, 'weight': 'bold'});

## An employer or job review system:<a class="anchor" id="reviewJob"></a>

In [None]:
job_review = response_overall("HypotheticalTools4")
job_review.columns.name = 'Job review system'
job_review.index.name = 'Country'
job_review.style.apply(highlighter, axis=1)

In [None]:
ax = plot_sequential(df=job_review, feature='HypotheticalTools4', 
                order=['Not at all interested', 'A little bit interested', 'Somewhat interested', 'Very interested',
                       'Extremely interested'],
                colormap=cm.magma, horizontal=True)
ax.tick_params(axis='both', labelsize=16);
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_ylabel("Responses", fontdict={'size': 18});
ax.legend(prop={'size': 14});
ax.set_title("Interest in an employer or job review system", fontdict={'size': 18, 'weight': 'bold'});

## An area for Q&A related to career growth:<a class="anchor" id="careerGrowth"></a>

In [None]:
career_growth_qa = response_overall("HypotheticalTools5")
career_growth_qa.columns.name = 'Career growth Q&A'
career_growth_qa.index.name = 'Country'
career_growth_qa.style.apply(highlighter, axis=1)

In [None]:
ax = plot_sequential(df=career_growth_qa, feature='HypotheticalTools5', 
                order=['Not at all interested', 'A little bit interested', 'Somewhat interested', 'Very interested',
                       'Extremely interested'],
                colormap=cm.magma, horizontal=True)
ax.tick_params(axis='both', labelsize=16);
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_ylabel("Responses", fontdict={'size': 18});
ax.legend(prop={'size': 14});
ax.set_title("Interest in an area for Q&A related to career growth", fontdict={'size': 18, 'weight': 'bold'});

## \*\* *An Aside* \*\*<a class="anchor" id="aside"></a>

### Relation between interest in hypothetical tools and coding experience:

*Maybe all this excitement has something to do with the youth of the Indian developers that we saw [earlier in this analysis](#youth). Let's see if there is a correlation between coding experience and the interest in new tools.* 

*I have grouped all the developers according to their years of coding experience (`YearsCoding`) and calculated the percentage of developers at each interest level for every experience group. Here is a heatmap representing those percentages:*

In [None]:
def get_tools_vs_years_coding(tool):
    df = survey_df.groupby('YearsCoding')[tool].value_counts(normalize=True).unstack()
    df.columns = df.columns.get_level_values(0)
    df *= 100
    return df.loc[['0-2 years', '3-5 years', '6-8 years', '9-11 years', '12-14 years', '15-17 years',
                           '18-20 years', '21-23 years', '24-26 years', '27-29 years', '30 or more years'], 
                  ['Not at all interested', 'A little bit interested', 'Somewhat interested', 'Very interested',
                           'Extremely interested']]
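To see what the grouping step produces, here is a toy run on six made-up respondents. It also shows `pd.crosstab` with `normalize='index'` as an alternative way to get the same row percentages; the toy data and the variable names `toy`, `pct` and `via_groupby` are illustrative assumptions, not part of the survey or the notebook.

```python
import pandas as pd

# Six made-up respondents, for illustration only.
toy = pd.DataFrame({
    'YearsCoding': ['0-2 years', '0-2 years', '0-2 years', '0-2 years',
                    '30 or more years', '30 or more years'],
    'HypotheticalTools1': ['Very interested', 'Very interested', 'Very interested',
                           'Not at all interested',
                           'Not at all interested', 'Very interested'],
})

# Row percentages: each experience group sums to 100.
pct = pd.crosstab(toy['YearsCoding'], toy['HypotheticalTools1'], normalize='index') * 100

# The groupby / value_counts / unstack route used in the function above.
via_groupby = (
    toy.groupby('YearsCoding')['HypotheticalTools1']
    .value_counts(normalize=True)
    .unstack()
    * 100
)

print(pct)
```

Each row is one experience group and each cell is the share of that group choosing a given interest level, which is exactly what the heatmaps colour-code.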

In [None]:
hypo1_df = get_tools_vs_years_coding('HypotheticalTools1')
hypo2_df = get_tools_vs_years_coding('HypotheticalTools2')
hypo3_df = get_tools_vs_years_coding('HypotheticalTools3')
hypo4_df = get_tools_vs_years_coding('HypotheticalTools4')
hypo5_df = get_tools_vs_years_coding('HypotheticalTools5')

In [None]:
titles = [desc.split('.')[-1] for desc in what_is(["HypotheticalTools"+str(i) for i in range(1, 6)])]

fig, axes = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(16, 6))
sns.heatmap(hypo1_df, ax=axes[0], annot=True, vmin=0, vmax=35, cmap=cm.viridis)
sns.heatmap(hypo2_df, ax=axes[1], annot=True, vmin=0, vmax=35, cmap=cm.viridis)
axes[0].set_title(titles[0], fontdict={'weight': 'bold'});
axes[1].set_title(titles[1], fontdict={'weight': 'bold'});
for ax in axes:
    ax.tick_params(axis='both', labelsize=16);
    ax.set_ylabel("YearsCoding", fontdict={'size': 18});
    ax.set_xlabel("", fontdict={'size': 18});

fig, axes = plt.subplots(nrows=1, ncols=3, sharey=True, figsize=(20, 6))
sns.heatmap(hypo3_df, ax=axes[0], annot=True, vmin=0, vmax=35, cmap=cm.viridis)
sns.heatmap(hypo4_df, ax=axes[1], annot=True, vmin=0, vmax=35, cmap=cm.viridis)
sns.heatmap(hypo5_df, ax=axes[2], annot=True, vmin=0, vmax=35, cmap=cm.viridis)
axes[0].set_title(titles[2], fontdict={'weight': 'bold'});
axes[1].set_title(titles[3], fontdict={'weight': 'bold'});
axes[2].set_title(titles[4], fontdict={'weight': 'bold'});
for ax in axes:
    ax.tick_params(axis='both', labelsize=14);
    ax.set_ylabel("YearsCoding", fontdict={'size': 16});
    ax.set_xlabel("", fontdict={'size': 18});

The heatmaps show a clear association between the number of years one has been coding and his/her interest in the hypothetical tools.

- **Less experienced developers tend to be much more interested in the new hypothetical tools than more experienced developers.**
- **This contrast is most striking for the tool `"A private area for people new to programming"` (_1st row, 2nd figure_). It's clear that young developers who are new to programming want a private area for newbies. Maybe it's because the existing forums [like StackOverflow](https://meta.stackexchange.com/questions/9953/could-we-please-be-a-bit-nicer-to-new-users) can be [quite harsh](https://meta.stackoverflow.com/questions/269416/stack-overflow-seems-harsh-compared-to-programming-forums-ive-been-participatin) and [unwelcoming](https://stackoverflow.blog/2018/04/26/stack-overflow-isnt-very-welcoming-its-time-for-that-to-change/) for newcomers. The more experienced developers, however, show very little interest in this tool.**

*Maybe it's the [curse of knowledge](https://en.wikipedia.org/wiki/Curse_of_knowledge) or maybe the more experienced devs believe that the existing tools are fine and sufficient because, well, they have made a career out of the existing ones.*
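The relationship between experience and interest noted above is read off the heatmaps by eye. It could also be quantified with a rank correlation, since both variables are ordered categories. The sketch below is one possible approach, using made-up respondents and only three experience buckets; it computes Spearman's rho as the Pearson correlation of the ranks.

```python
import pandas as pd

# Ordered categories (experience list truncated for the toy example).
experience_order = ['0-2 years', '3-5 years', '6-8 years']
interest_order = ['Not at all interested', 'A little bit interested',
                  'Somewhat interested', 'Very interested', 'Extremely interested']

# Six made-up respondents, for illustration only.
toy = pd.DataFrame({
    'YearsCoding': ['0-2 years', '0-2 years', '3-5 years',
                    '3-5 years', '6-8 years', '6-8 years'],
    'HypotheticalTools2': ['Extremely interested', 'Very interested', 'Very interested',
                           'Somewhat interested', 'A little bit interested',
                           'Not at all interested'],
})

# Map each label to its position in the ordered list.
exp_code = toy['YearsCoding'].map({label: i for i, label in enumerate(experience_order)})
int_code = toy['HypotheticalTools2'].map({label: i for i, label in enumerate(interest_order)})

# Spearman's rho = Pearson correlation of the (tie-averaged) ranks.
spearman_rho = exp_code.rank().corr(int_code.rank())
print(round(spearman_rho, 2))
```

A value close to -1 would mean interest falls steadily as experience grows; a value near 0 would mean no monotonic relationship.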

# Not many Indian developers use / know about Ad Blockers:<a class="anchor" id="adblocker"></a>

In [None]:
ad_blocker_df = response_overall('AdBlocker')

In [None]:
ax = ad_blocker_df.T.plot.barh(figsize=(16, 6));
ax.set_xlim(0, 100);
ax.set_xlabel("Percentage", fontdict={'size': 18});
ax.set_title(what_is(['AdBlocker'])[0], fontdict={'weight': 'bold'});
ax.tick_params(axis='both', labelsize=16);
ax.legend(prop={'size': 16});
ax.set_ylabel("");

In [None]:
ad_blocker_df.columns.name = "Use Ad Blocker?"
ad_blocker_df.index.name = 'Country'
ad_blocker_df.style.apply(highlighter, axis=1)

- **_Okay, that's just sad._**

# Operating System:<a class="anchor" id="os"></a>

In [None]:
os_df = response_overall('OperatingSystem')  # named os_df so the imported `os` module is not shadowed
os_df.sort_values(by='India', axis=1, inplace=True)
os_df.columns.name = 'Operating System'
os_df.index.name = 'Country'

In [None]:
ax = os_df.T.plot.barh(figsize=(16, 8));
ax.set_title("Operating System", fontdict={'weight': 'bold'});
ax.set_xlabel("Percentage", fontdict={'size': 18});  # barh draws the values along the x-axis
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc='lower right');

In [None]:
os_df.style.apply(highlighter, axis=1)

> - **Linux-based Operating Systems are more popular in India as compared to the other groups.**
> - **MacOS isn't as popular amongst developers in India as it is in other groups.**

*__I suspect that a major reason behind these trends is, again, the low incomes of Indian developers.__*

- *Open-source, Linux-based OSes are free to use. I think that this is a major reason for their appeal to the Indian developers.*
- *Using macOS means owning a Mac, and Macs are expensive. Perhaps that is why it isn't as popular in India.*

# StackOverflow usage:<a class="anchor" id="so"></a>

In [None]:
so_visit_df = response_overall("StackOverflowVisit")
so_visit_df.columns.name = 'StackOverflow visits frequency'
so_visit_df.index.name = 'Country'

In [None]:
plot_sequential(so_visit_df, 'StackOverflowVisit',
                order=['Multiple times per day', 'Daily or almost daily', 'A few times per week',
                       'A few times per month or weekly', 'Less than once per month or monthly',
                       'I have never visited Stack Overflow (before today)'],
                colormap=cm.plasma_r)

In [None]:
so_visit_df.loc[:, ['Multiple times per day', 'Daily or almost daily', 'A few times per week',
                    'A few times per month or weekly', 'Less than once per month or monthly',
                    'I have never visited Stack Overflow (before today)']].style.apply(highlighter, axis=1)

> - **Indian developers use Stack Overflow a lot. Almost 44% of Indian responders visit it multiple times a day.**

*StackOverflow seems to have a strong appeal in India. Along with the large percentage of daily active users that we see here, it also has a large number of Enterprise level users as we [saw earlier](#commTools).*

# Exercise:<a class="anchor" id="exercise"></a>

In [None]:
exercise_df = response_overall("Exercise")
exercise_df.columns.name = '#Exercise / week'
exercise_df.index.name = 'Country'

In [None]:
plot_sequential(exercise_df, 'Exercise', colormap=cm.Oranges,
                order=['Daily or almost every day', '3 - 4 times per week', '1 - 2 times per week',
                       "I don't typically exercise"])

In [None]:
exercise_df.loc[:, ['Daily or almost every day', '3 - 4 times per week', '1 - 2 times per week',
                    "I don't typically exercise"]].style.apply(highlighter, axis=1)

> - **India has a larger percentage of developers who exercise daily as compared to the other groups.**
> - **India also has a slightly larger percentage of developers who typically don't exercise.**

*I can't think of any reason why India has a larger percentage in both the extremes over here. It is possible that there is no reason and that this is just an anomaly. I am not sure. Let me know what you think.*

# Sexual Orientation:<a class="anchor" id="sexOrient"></a>

In [None]:
sexual_orientation_df = response_overall_moc('SexualOrientation')
sexual_orientation_df.columns.name = 'Sexual Orientation'
sexual_orientation_df.index.name = 'Country'

In [None]:
ax = sexual_orientation_df.T.plot.bar(figsize=(16, 8));
ax.set_ylim(0, 100);
ax.set_ylabel("Percentage", fontdict={'size': 18});
ax.set_title("Sexual Orientation", fontdict={'weight': 'bold'});

In [None]:
sexual_orientation_df.style.apply(highlighter, axis=1)

> - **The percentage of developers identifying as `Gay or Lesbian` is much lower in India (less than a quarter) than the world average or the figure for any of the other countries taken into consideration.**
> - **The proportion of developers who identify themselves as `Asexual` is almost twice as large in India as compared to all other groups.**

- *The Indian society is still very unaccepting of homosexuality. I suspect that is a reason why not a lot of people are able to come out of the closet in India.*
- *I am unable to think of any reason for the larger proportion of people identifying as `Asexual` in India. There may be no reason at all. Anyway, let me know what you think.*

# Conclusions and final thoughts:<a class="anchor" id="theEnd"></a>

- Indian developers have historically been paid far less than their counterparts in developed countries, and this analysis shows the same pattern. We also saw that this is __not__ because the Indian developers lack the necessary formal education. In fact, developers who have a traditional undergrad degree in a CS-related field are more common in India. It is also __not__ because the Indian developers are less interested in programming. 
- India had a late start in the world of computing but, as we have seen in this analysis, its developers realised the potential of the last major revolution in tech - mobile apps. This makes me hopeful that we are catching up fast. Maybe we will even play a pivotal role in defining the next major tech revolution (*probably Artificial Intelligence*).
- Talking about Artificial Intelligence, Indian developers are afraid of losing their jobs to it. Is that just paranoia or do they believe that they are doing a job which can be automated using some software in the future?
   They are also more comfortable with letting software take part in the decision-making process. Is that a reflection of what they think about the current decision-makers in their company?
- Job losses or not, in this fast-evolving tech scene, everyone needs to have a spirit of self-learning. We saw in this analysis that self-education without taking any course isn't popular in India. That must change.
- The developer community needs to make a conscious effort to on-board people from other disciplines. Software is getting more and more embedded in our lives each day and it is important that people from many different backgrounds are involved in building it. This will become especially important as we move towards more capable AI systems, so that they aren't biased.
- Finally, it seems like India might be an important market for Stack Overflow. Indian developers use the website very frequently, they are the largest group of users of its enterprise-level communication tool, and they are extremely interested in whatever new hypothetical tool Stack Overflow might build.

*We saw in this analysis that there are very few experienced developers in India. This reflects how India missed the opportunity to play a defining role in the computing revolution that happened in the last century. But with the overwhelmingly large proportions of young developer talent, the current developer scene in the country looks very promising.*

# Other areas that I want to explore (*and that you can try too!*)<a class="anchor" id="curiousity"></a>

Doing this lengthy analysis has made me curious about a few other questions too. *__The exciting part is that it is possible to arrive at some satisfactory answers by analysing this very dataset!__* 

So, here I post all the questions that I wish to ask this dataset --

- **Do people learn by contributing to open-source projects?**
    I assumed that people do. But it turns out that not all of them do (like the 20% that we [saw in this kernel](#oss)). So, it would be interesting to study this.
- **How do the opinions about AI differ?**
    In this kernel, we saw how the Indians have much different opinions about the various aspects of advancing AI tech. It would be interesting to draw comparisons in other groups based on age, developer jobs, education, frameworks, languages, salaries, etc.
- **What do the students think?**
    1 out of every 5 responders to this survey is a full-time student. I think that this provides a unique opportunity to explore what's popular among the next generation of software developers (*myself included* :D). 
- **How do Vim users differ from non-Vim users?**
    This is just because I use Vim. I love Vim. It would be fun to make something like [this excellent R vs. Python analysis](https://www.kaggle.com/nanomathias/predicting-r-vs-python).
- **A salary predictor**
    Stack Overflow recently launched a [salary calculator](https://stackoverflow.com/jobs/salary) using this very dataset. It might be fun to try and build a similar thing.
    
I plan on exploring some of these questions myself but feel free to go ahead and do your own analysis if you find any of the above questions interesting. If you found this analysis interesting, you can do a similar one for your own country.

It would be awesome if you could tag me in the comments or shoot me a tweet on [Twitter](https://twitter.com/nityeshaga) or a text on [LinkedIn](https://www.linkedin.com/in/nityeshaga/) when (or if) you decide to make your work public.