## Preface

Finally got an opportunity to look at what is going on in the lives of developers and learners around the world. <br/>
Please **suggest improvements, correct me, follow along, or upvote this kernel if you like it or want to motivate me.** <br/>
This will be an ongoing kernel, **stay tuned!** : ) <br/>
-**Mahbub** <br/>
-31, May, 2018

## Introduction

StackOverflow:
> Each year, we at Stack Overflow ask the developer community about everything from their favorite technologies to their job preferences. This year marks the eighth year we've published our Annual Developer Survey results, with the largest number of respondents yet. Over 100,000 developers took the 30-minute survey in January 2018.
> 
> This year, we covered a few new topics ranging from artificial intelligence to ethics in coding. We also found that underrepresented groups in tech responded to our survey at even lower rates than we would expect from their participation in the workforce. Want to dive into the results yourself and see what you can learn about salaries or machine learning or diversity in tech? We look forward to seeing what you find!
> 

## Content

The 2018 Developer Survey results are organized on Kaggle in two tables:

survey_results_public contains the main survey results, one respondent per row and one column per question

survey_results_schema contains each column name from the main results along with the question text corresponding to that column

There are 98,855 responses in this public data release. These responses are what we consider "qualified" for analytical purposes based on completion and time spent on the survey and included at least one non-PII question. Approximately 20,000 responses were started but not included here because respondents did not answer enough questions, or only answered questions with personally identifying information. Of the qualified responses, 67,441 completed the entire survey.


## Import Libraries

In [None]:
import os
import time
import warnings

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
from numpy import array
from scipy.stats import norm

import seaborn as sns
color = sns.color_palette()

# plotly in offline notebook mode (initialised once)
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly import tools
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# cufflinks adds DataFrame.iplot on top of plotly
import cufflinks as cf
cf.go_offline()

from mpl_toolkits.basemap import Basemap
from sklearn import preprocessing
from sklearn.decomposition import PCA, KernelPCA
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import pairwise_distances_argmin_min, pairwise_distances
import missingno as msno # to view missing values
from wordcloud import WordCloud, STOPWORDS
from nltk.corpus import stopwords
from PIL import Image
import squarify

warnings.filterwarnings('ignore')

# list the files available in the input directory
print(os.listdir("../input"))

## Load Data

In [4]:
%%time
survey_results_schema = pd.read_csv('../input/survey_results_schema.csv')
survey_results_public = pd.read_csv('../input/survey_results_public.csv')

## View Survey Results Schema Data

In [5]:
survey_results_schema.head(3)
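
The schema table is what maps a column name back to the question respondents saw. A minimal sketch of that lookup is below. It assumes the schema has two columns named `Column` and `QuestionText`; the helper `question_for` and the stand-in frame are hypothetical and only illustrate the idea.

```python
import pandas as pd

def question_for(schema, column):
    """Return the question text for one survey column, or None if absent."""
    match = schema.loc[schema['Column'] == column, 'QuestionText']
    return match.iloc[0] if not match.empty else None

# Stand-in for survey_results_schema (assumed layout, illustrative rows)
demo_schema = pd.DataFrame({
    'Column': ['Hobby', 'OpenSource'],
    'QuestionText': ['Do you code as a hobby?',
                     'Do you contribute to open source projects?'],
})

print(question_for(demo_schema, 'Hobby'))
```

With the real data loaded, the same call would be `question_for(survey_results_schema, 'Hobby')`, provided the column names match the assumption above.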

## View Survey Results Public

In [35]:
survey_results_public.head(2)

## Missing Data

In [14]:
print(survey_results_public.isnull().sum())
msno.matrix(survey_results_public)
plt.show()

## Total Percentage of Missing Data (Survey Results Public)

In [15]:
# how many total missing values do we have?
missing_values_count = survey_results_public.isnull().sum()
total_cells = np.prod(survey_results_public.shape)
total_missing = missing_values_count.sum()

# percent of data that is missing
print((total_missing/total_cells) * 100, '% of Missing Values in Survey Results Public')

## % of Missing Data in Columns (Survey Results Public)

In [26]:
# checking missing data in each survey results public column
total_missing = survey_results_public.isnull().sum().sort_values(ascending = False)
percentage = (survey_results_public.isnull().sum()/survey_results_public.isnull().count()*100).sort_values(ascending = False)
missing_survey_results_public = pd.concat([total_missing, percentage], axis=1, keys=['Total Missing (Column-wise)', 'Percentage (%)'])
missing_survey_results_public.head()
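
The two missing-data cells above can be folded into one small helper. The sketch below runs on a toy frame rather than the survey data; the function name `missing_summary` is hypothetical. It relies on the fact that the mean of a boolean mask is the fraction of `True` values.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Per-column missing counts and percentages, worst columns first."""
    total = df.isnull().sum()
    percent = df.isnull().mean() * 100  # mean of booleans = fraction missing
    summary = pd.DataFrame({'Total Missing': total, 'Percentage (%)': percent})
    return summary.sort_values('Total Missing', ascending=False)

# Toy data: column 'a' has 2 gaps, 'b' has 1, 'c' has none
demo = pd.DataFrame({
    'a': [1.0, np.nan, np.nan, 4.0],
    'b': ['x', 'y', None, 'z'],
    'c': [1, 2, 3, 4],
})

summary = missing_summary(demo)
overall = demo.isnull().values.mean() * 100  # share of all cells that are missing

print(summary)
print(overall, '% of all cells are missing')
```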

## Dendrogram of Survey Data Public

In [24]:
msno.dendrogram(survey_results_public)
plt.savefig('survey_results_public.png')
plt.show()

## Developers Country

In [7]:
# Step 1
# chart data: the five countries with the most respondents
temp = survey_results_public['Country'].value_counts().head(5).sort_values(ascending=False)
values = temp.values
phases = temp.index

# color of each funnel section
colors = ['rgb(32,155,160)', 'rgb(253,93,124)', 'rgb(28,119,139)', 'rgb(182,231,235)', 'rgb(35,154,160)']

# Step 2
# shaping
n_phase = len(phases)
plot_width = 400

# height of a section and difference between sections 
section_h = 100
section_d = 10

# multiplication factor to calculate the width of other sections
unit_width = plot_width / max(values)

# width of each funnel section relative to the plot width
phase_w = [int(value * unit_width) for value in values]

# plot height based on the number of sections and the gap in between them
height = section_h * n_phase + section_d * (n_phase - 1)

# Step 3
# list containing all the plot shapes
shapes = []

# list containing the Y-axis location for each section's name and value text
label_y = []

for i in range(n_phase):
        if (i == n_phase-1):
                points = [phase_w[i] / 2, height, phase_w[i] / 2, height - section_h]
        else:
                points = [phase_w[i] / 2, height, phase_w[i+1] / 2, height - section_h]

        path = 'M {0} {1} L {2} {3} L -{2} {3} L -{0} {1} Z'.format(*points)

        shape = {
                'type': 'path',
                'path': path,
                'fillcolor': colors[i],
                'line': {
                    'width': 1,
                    'color': colors[i]
                }
        }
        shapes.append(shape)
        
        # Y-axis location for this section's details (text)
        label_y.append(height - (section_h / 2))

        height = height - (section_h + section_d)
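
The geometry in the loop above is easier to check in isolation. Below is a rough sketch that wraps the same arithmetic in a function and runs it on made-up values; `funnel_paths` is a hypothetical name, and the defaults simply mirror the numbers used in the cell above.

```python
def funnel_paths(values, plot_width=400, section_h=100, section_d=10):
    """Return (paths, label_y) for a funnel with one trapezoid per value."""
    unit_width = plot_width / max(values)
    widths = [int(v * unit_width) for v in values]
    # total height: all sections plus the gaps between them
    height = section_h * len(values) + section_d * (len(values) - 1)
    paths, label_y = [], []
    for i, top in enumerate(widths):
        # the last section has no successor, so it is drawn as a rectangle
        bottom = widths[i + 1] if i + 1 < len(widths) else top
        paths.append('M {0} {1} L {2} {3} L -{2} {3} L -{0} {1} Z'.format(
            top / 2, height, bottom / 2, height - section_h))
        # vertical midpoint of this section, used to place its text
        label_y.append(height - section_h / 2)
        height -= section_h + section_d
    return paths, label_y

# Made-up values, largest first
paths, label_y = funnel_paths([400, 200, 100])
print(paths[0])
print(label_y)
```

Each path is an SVG string mirrored around x = 0: the top edge is as wide as the current value and the bottom edge as wide as the next one, which is what gives the funnel its taper.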


In [13]:
# For phase names
label_trace = go.Scatter(
    x=[-350]*n_phase,
    y=label_y,
    mode='text',
    text=phases,
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)
 
# For phase values
value_trace = go.Scatter(
    x=[350]*n_phase,
    y=label_y,
    mode='text',
    text=values,
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)

data = [label_trace, value_trace]
 
layout = go.Layout(
    title="<b>Top Countries on Stack Overflow</b>",
    titlefont=dict(
        size=20,
        color='rgb(203,203,203)'
    ),
    shapes=shapes,
    height=560,
    width=800,
    showlegend=False,
    paper_bgcolor='rgba(44,58,71,1)',
    plot_bgcolor='rgba(44,58,71,1)',
    xaxis=dict(
        showticklabels=False,
        zeroline=False,
    ),
    yaxis=dict(
        showticklabels=False,
        zeroline=False
    )
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='a-simple-plot')

## Most Dangerous Aspect of Increasingly Advanced AI Technology?
Survey Question:
> What do you think is the most dangerous aspect of increasingly advanced AI technology?

In [19]:
temp = survey_results_public['AIDangerous'].value_counts().head(10)

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values })
df.iplot(kind='pie',
         labels='labels',
         values='values', 
         title='Most Dangerous Aspect of Increasingly Advanced AI Technology?', 
         pull=.01,hole=.1,
         textposition='inside', 
         color = ['#B0CBE6', 'orange', 'blue', '#CCCCCC'],
         textinfo='percent')
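
The pie-chart cells in this kernel all start by turning `value_counts()` into a `labels`/`values` frame. A small helper could do that once. This is only a sketch on toy data: the name `top_answers` and the extra `percent` column are my additions, not part of the survey data or of cufflinks.

```python
import pandas as pd

def top_answers(df, column, n=10):
    """Top-n answers to one question as a labels/values/percent frame."""
    counts = df[column].value_counts().head(n)  # NaN answers are ignored
    share = counts.values / counts.values.sum() * 100
    return pd.DataFrame({
        'labels': counts.index,
        'values': counts.values,
        'percent': share.round(1),  # share among the answers shown
    })

# Toy stand-in for one survey column
demo = pd.DataFrame({'Answer': ['A', 'A', 'B', None]})
top = top_answers(demo, 'Answer')
print(top)
```

The resulting frame has the same `labels` and `values` columns that the `df.iplot(kind='pie', ...)` calls in this kernel expect.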

## Most Exciting Aspect of Increasingly Advanced AI Technology
Survey Question:
> What do you think is the most exciting aspect of increasingly advanced AI technology?

In [21]:
temp = survey_results_public['AIInteresting'].value_counts().head(10)

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values })
df.iplot(kind='pie',
         labels='labels',
         values='values', 
         title='Most Exciting Aspect of Increasingly Advanced AI Technology?', 
         pull=.02,hole=.75,
         textposition='inside', 
         color = ['#B0CBE6', 'orange', 'blue', '#CCCCCC'],
         textinfo='percent')

## Developers Ethics
Survey Question:
> Imagine that you were asked to write code for a purpose or product that you consider extremely unethical. Do you write the code anyway?

In [22]:
temp = survey_results_public['EthicsChoice'].value_counts().head(10)

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values })
df.iplot(kind='pie',
         labels='labels',
         values='values', 
         title='Would you write unethical code?', 
         pull=.05,hole=.2,
         textposition='outside', 
         color = ['#B0CBE6', 'orange', '#CCCCCC'],
         textinfo='percent')

## Ethics Report - Do you report or otherwise call out the unethical code in question?
Survey Question:
> Do you report or otherwise call out the unethical code in question?

In [23]:
temp = survey_results_public['EthicsReport'].value_counts().head(10)
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values })
df.iplot(kind='pie',
         labels='labels',
         values='values', 
         title='Do you report or otherwise call out the unethical code in question?', 
         pull=.03,hole=.2,
         textposition='outside', 
         color = ['#B0CBE6', 'orange', 'blue', '#CCCCCC'],
         textinfo='percent')

## Ethics Responsible
Survey Question:
> Who do you believe is ultimately most responsible for code that accomplishes something unethical?

In [26]:
temp = survey_results_public['EthicsResponsible'].value_counts().head(10)
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values })
df.iplot(kind='pie',
         labels='labels',
         values='values', 
         title='Who is most responsible for code that accomplishes something unethical?', 
         pull=.03,hole=.1,
         textposition='inside', 
         color = ['#B0CBE6', 'orange', 'blue', '#CCCCCC'],
         textinfo='percent')

## What Are the Top Languages?
Survey Question:
> Which of the following programming, scripting, and markup languages have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the language and want to continue to do so, please check both boxes in that row.)

In [15]:
survey_results_public_language = survey_results_public['LanguageWorkedWith'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=survey_results_public_language.values, y=survey_results_public_language.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Languages', 
       title = "Top Languages")
plt.show()
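
One caveat on the chart above: this question allows several answers, so `value_counts()` on the raw column counts whole combinations of languages rather than single languages. The sketch below shows one way to count individual choices. It assumes the answers are stored as one string separated by `;` (worth checking against the data first); `count_choices` is a hypothetical helper and the series is toy data.

```python
import pandas as pd

def count_choices(series, sep=';'):
    """Count individual options in a delimited multi-select column."""
    return (series.dropna()
                  .str.split(sep)   # 'Python;SQL' -> ['Python', 'SQL']
                  .explode()        # one row per selected option
                  .str.strip()
                  .value_counts())

# Toy stand-in for a multi-select column such as LanguageWorkedWith
demo = pd.Series(['Python;SQL', 'Python;JavaScript', None, 'SQL;Python'])
counts = count_choices(demo)
print(counts)
```

If the assumption holds, the same helper would apply to the other "worked with" and "desire next year" columns plotted below.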

## Language Desire Next Year

In [63]:
temp = survey_results_public['LanguageDesireNextYear'].value_counts().head(12).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Language Desire Next Year', 
       title = "Top Language Desire Next Year")
plt.show()

## Top Database

In [65]:
temp = survey_results_public['DatabaseWorkedWith'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Database Worked With', 
       title = "Top Database Worked With")
plt.show()

## Top Database Desire Next Year

In [66]:
temp = survey_results_public['DatabaseDesireNextYear'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Database Desire Next Year', 
       title = "Top Database Desire Next Year")
plt.show()

## Top Platform Worked With

In [67]:
temp = survey_results_public['PlatformWorkedWith'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Platform Worked With', 
       title = "Top Platform Worked With")
plt.show()

## Top Platform Desire Next Year

In [None]:
temp = survey_results_public['PlatformDesireNextYear'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Platform Desire Next Year', 
       title = "Top Platform Desire Next Year")
plt.show()

## Top Framework Worked With

In [None]:
temp = survey_results_public['FrameworkWorkedWith'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Framework Worked With', 
       title = "Top Framework Worked With")
plt.show()

## Framework Desired Next Year

In [None]:
temp = survey_results_public['FrameworkDesireNextYear'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Framework Desire Next Year', 
       title = "Top Framework Desire Next Year")
plt.show()

## Top IDE Used by Developers

In [None]:
temp = survey_results_public['IDE'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'IDE', 
       title = "Top IDEs Used by Developers")
plt.show()

## Top Operating System

In [None]:
temp = survey_results_public['OperatingSystem'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Operating System', 
       title = "Top Operating Systems Used by Developers")
plt.show()

## Top Methodology Used by Developers

In [None]:
temp = survey_results_public['Methodology'].value_counts().head(10).sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(19,9))
sns.barplot(x=temp.values, y=temp.index, ax=ax)
ax.set(xlabel= 'Count', 
       ylabel = 'Methodology', 
       title = "Top Methodologies Used by Developers")
plt.show()

## Open-source Project Contribution

In [31]:
temp = survey_results_public['OpenSource'].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values })
df.iplot(kind='pie',
         labels='labels',
         values='values', 
         title='Open-source Project Contribution', 
         pull=.05,hole=.2,
         textposition='inside', 
         color = ['#CCCCCC', 'orange'],
         textinfo='percent+label')

## Code as a Hobby?

In [38]:
temp = survey_results_public['Hobby'].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values })
df.iplot(kind='pie',
         labels='labels',
         values='values', 
         title='Do you code as a hobby?', 
         pull=.001,
         hole=.7,
         textposition='outside', 
         color = ['blue', 'orange'],
         textinfo='percent+label')

## Student
> Are you currently enrolled in a formal, degree-granting college or university program?

In [46]:
temp = survey_results_public['Student'].value_counts().head(10)

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values })
df.iplot(kind='pie',
         labels='labels',
         values='values', 
         title='Currently enrolled in a formal college or university program?', 
         pull=.01,
         hole=.05,
         textposition='outside', 
         color = ['#CCCCCC', 'orange', 'blue'],
         textinfo='label+percent')

In [9]:
#survey_results_public['Salary'].info()

In [63]:
#temp = survey_results_public[['Country','Salary','SalaryType']]
#temp[temp['SalaryType'] == 'Yearly']
#survey_results_public['ConvertedSalary']

## Stay Tuned!!!