Courtesy - https://www.kaggle.com/amr009/kaggle-vs-morocco-state-of-dev-surveys

In [None]:
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.express as px
from plotly.subplots import make_subplots
from IPython.display import Markdown, display
import plotly.graph_objects as go
import warnings

warnings.filterwarnings("ignore")
pio.templates.default = "presentation"

def content(text):
    """Display `text` inside a box with a common styling."""
    # The style attributes are single-quoted, so inner font names use escaped double quotes
    my_response = (
        "<div style='background-color:rgb(247, 247, 247); border:1px solid rgb(107, 107, 107); padding: 10px'>"
        "<span style='color: black; font-family: medium-content-serif-font, Georgia, Cambria, \"Times New Roman\", Times, serif; "
        "font-weight: 400; letter-spacing: -0.004em; line-height: 1.58;'>" + text + "</span></div>"
    )
    display(Markdown(my_response))

__About the State of Dev survey:__<br>
The State of Dev survey is an initiative by the Developer Circles Morocco community to collect feedback from software developers in Morocco about how to move the IT community forward and better respond to developers' evolving needs.

This survey was composed of 49 questions grouped into 4 main areas: Education, Work, Tech and Community. It was launched in October 2020 at BlaBlaConf 1.0, an online conference that gathered thousands of viewers over 5 days, and remained open for a full month.

A total of 2,287 submissions were received. That is from Morocco alone, and it is close to the 2,532 submissions to Stack Overflow's first survey back in 2011. It's important to note that not all fields were mandatory, so the results and graphics may not reflect the total number of respondents for every question.

As part of the Developer Circles Morocco community's core principles, all collected data is anonymized. Raw results are also available under the BY-NC-SA 2.0 license. The same goes for the website code: everything is published on the DevC-Casa GitHub organization.

__Notebook Highlights:__<br>
This notebook provides a beginner-friendly EDA that focuses on comparing the answers Moroccan members gave to the Kaggle 2020 survey with those of State of Dev respondents.

Both surveys cover many areas in common, but with different question designs and answer options.

- The State of Dev survey collected ~17x more answers from Moroccan members than the Kaggle survey.
- The participants are young: ~50% are under 25.
- Students are the common driving force in both surveys, at ~30%.
- The salary range reported by Data profiles is slightly better than that of the general Moroccan IT field.

### Read SOD and Kaggle Data

In [None]:
SOD_Morocco_Full = pd.read_csv(r"https://raw.githubusercontent.com/PreethikaShankar/ML_Algorithms/main/kaggle-vs-morocco-state-of-dev-surveys/results_preprocessed.csv")
SOD_Morocco_Qst = pd.read_csv(r"https://raw.githubusercontent.com/PreethikaShankar/ML_Algorithms/main/kaggle-vs-morocco-state-of-dev-surveys/questions_preprocessed.csv").set_index('Keys')
Kaggle_Full = pd.read_csv(r"https://raw.githubusercontent.com/PreethikaShankar/ML_Algorithms/main/kaggle-vs-morocco-state-of-dev-surveys/kaggle_survey_2020_responses.csv")

In [None]:
Kaggle_Full.head()

Rename some Kaggle columns and filter for Moroccan participants:

In [None]:
Kaggle_Full = Kaggle_Full.rename(columns={
    "Time from Start to Finish (seconds)":"Time",
    "Q1":"Age",
    "Q2":"Gender",
    "Q2_OTHER_TEXT":"Gender_Text",
    "Q3":"Country",
    "Q4":"Education",
    "Q5":"Job_Title",
    "Q6":"Coding_Experience"})

Kaggle_Full_Morocco = Kaggle_Full[Kaggle_Full.Country == "Morocco"]
Kaggle_Full_Morocco.head()
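
As far as we can tell, the Kaggle 2020 responses file stores the full question text in the first row below the header, which is why `Kaggle_Full.head()` shows it and why pandas reads the numeric columns as text. Filtering on `Country == "Morocco"` happens to drop that row, but it can also be skipped at read time with `skiprows=[1]`. A minimal sketch on a toy in-memory CSV (the values are illustrative, not taken from the real file):

```python
import io

import pandas as pd

# Toy CSV with the assumed layout: a header of question codes,
# one row of full question text, then the actual answers.
raw_csv = io.StringIO(
    "Q1,Q3\n"
    "What is your age (# years)?,In which country do you currently reside?\n"
    "22-24,Morocco\n"
    "30-34,India\n"
    "18-21,Morocco\n"
)

# Line 0 is the header; skiprows=[1] drops the question-text line.
answers = pd.read_csv(raw_csv, skiprows=[1])
answers = answers.rename(columns={"Q1": "Age", "Q3": "Country"})

# Keep Moroccan respondents only
morocco = answers[answers["Country"] == "Morocco"]
print(morocco)
```

Against the real file, the same idea would be `pd.read_csv(url, skiprows=[1])`.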

__Number of Participants:__<br>
Let's start by comparing the sample size of each survey.

In [None]:
fig = px.bar(y = [Kaggle_Full_Morocco.shape[0], SOD_Morocco_Full.shape[0]],
             x = ["Kaggle", "SOD_Full"])

fig.update_layout(
    title = "Count of Survey Participants",
    yaxis_title = "Participants",
    xaxis_title = "Community")

The State of Dev survey collected ~17x more answers from Moroccan members than the Kaggle survey. Two factors help explain this:

- The SOD survey was designed to include more disciplines than just data-related ones.
- The SOD survey had a better reach across the Moroccan community and benefited from its local focus, whereas the Kaggle survey was a worldwide campaign.
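
The ~17x figure is simply the ratio of the two row counts shown in the bar chart. A small sketch, where `participant_ratio` is a hypothetical helper and the frame sizes are made up to illustrate the arithmetic:

```python
import pandas as pd


def participant_ratio(larger: pd.DataFrame, smaller: pd.DataFrame) -> float:
    """Return how many times more rows `larger` has than `smaller`."""
    if len(smaller) == 0:
        raise ValueError("cannot compare against an empty survey")
    return len(larger) / len(smaller)


# Hypothetical sizes, chosen only to illustrate the arithmetic
sod = pd.DataFrame({"userId": range(170)})
kaggle = pd.DataFrame({"Time": range(10)})
print(f"SOD / Kaggle = {participant_ratio(sod, kaggle):.1f}x")
```

In the notebook this would be called as `participant_ratio(SOD_Morocco_Full, Kaggle_Full_Morocco)`.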

__Let's apply a filter that keeps only the SOD participants whose job titles are closely related to Data Science and ML__

In [None]:
SOD_Morocco_Qst.Job_Title["choices"]

In [None]:
SOD_Morocco_Data = SOD_Morocco_Full[SOD_Morocco_Full.Job_Title.isin(['Data or business analyst',
                                 'Data scientist or machine learning specialist',
                                 'Engineer, data'])]
SOD_Morocco_Data.head()

In [None]:
fig = px.bar(y = [Kaggle_Full_Morocco.shape[0], SOD_Morocco_Data.shape[0]],
             x = ["Kaggle", "SOD"])

fig.update_layout(
    title = "Count of Survey Participants",
    yaxis_title = "Participants",
    xaxis_title = "Community")

The sample sizes are now more comparable, with the State of Dev survey having about 100 more answers than the Kaggle survey from Moroccan members. This suggests that **the core data community contributing to both surveys may have a very similar profile**; we explore this further below.

__The Age profile:__<br>
Harmonizing the age brackets between both surveys

In [None]:
SOD_Morocco_Qst.Age["choices"], Kaggle_Full_Morocco.Age.unique()

In [None]:
SOD_Morocco_Data_Age = SOD_Morocco_Data.groupby(["Age"], as_index=False).count()
SOD_Morocco_Full_Age = SOD_Morocco_Full.groupby(["Age"], as_index=False).count()
Kaggle_Full_Morocco_Age = Kaggle_Full_Morocco.groupby(["Age"], as_index=False).count()

In [None]:
df_Age = pd.DataFrame({'Less than 20': 
                       [SOD_Morocco_Full_Age["userId"][0], SOD_Morocco_Data_Age["userId"][0], Kaggle_Full_Morocco_Age["Time"][0]],
                       '20 - 24 years':
                       [SOD_Morocco_Full_Age["userId"][1], SOD_Morocco_Data_Age["userId"][1], Kaggle_Full_Morocco_Age["Time"][1]],
                       '25 - 29 years':
                       [SOD_Morocco_Full_Age["userId"][2], SOD_Morocco_Data_Age["userId"][2], Kaggle_Full_Morocco_Age["Time"][2]],
                       '30 - 34 years' :
                       [SOD_Morocco_Full_Age["userId"][3], SOD_Morocco_Data_Age["userId"][3], Kaggle_Full_Morocco_Age["Time"][3]],
                       '35 - 39 years':
                       [SOD_Morocco_Full_Age["userId"][4], SOD_Morocco_Data_Age["userId"][4], Kaggle_Full_Morocco_Age["Time"][4]],
                       '40 years and older':
                       [SOD_Morocco_Full_Age["userId"][5], SOD_Morocco_Data_Age["userId"][5], Kaggle_Full_Morocco_Age["Time"][5]],
         }, index=['SOD_Full', 'SOD_Data', 'Kaggle'])
df_Age = 100*df_Age.transpose()/df_Age.transpose().sum() 
display(df_Age)
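
The cell above selects brackets by position (`[0]` to `[5]`), which assumes both surveys sort their age labels into the same six brackets in the same order. Kaggle 2020 uses finer labels (`18-21`, `22-24`, ... `70+`) that do not line up one-to-one with SOD's, so one more robust alternative is a label-based mapping into coarser brackets that both surveys nest into exactly. A sketch, assuming the label spellings below (the SOD ones are taken from the `df_Age` column names and may differ from the raw data; `AGE_MAP` and `age_shares` are hypothetical names):

```python
import pandas as pd

# Hypothetical coarse brackets into which both surveys' labels nest exactly
AGE_MAP = {
    # Kaggle 2020 labels (assumed spelling)
    "18-21": "Under 25", "22-24": "Under 25",
    "25-29": "25 - 34", "30-34": "25 - 34",
    "35-39": "35 and older", "40-44": "35 and older", "45-49": "35 and older",
    "50-54": "35 and older", "55-59": "35 and older", "60-69": "35 and older",
    "70+": "35 and older",
    # SOD labels (assumed, taken from the df_Age column names)
    "Less than 20": "Under 25", "20 - 24 years": "Under 25",
    "25 - 29 years": "25 - 34", "30 - 34 years": "25 - 34",
    "35 - 39 years": "35 and older", "40 years and older": "35 and older",
}
ORDER = ["Under 25", "25 - 34", "35 and older"]


def age_shares(ages: pd.Series) -> pd.Series:
    """Map raw age labels to shared brackets and return percentages."""
    # Labels missing from AGE_MAP become NaN and are ignored by value_counts
    counts = ages.map(AGE_MAP).value_counts()
    # Reindex by label (not position) so an empty bracket shows up as 0
    counts = counts.reindex(ORDER, fill_value=0)
    return 100 * counts / counts.sum()


kaggle_ages = pd.Series(["18-21", "22-24", "30-34", "40-44"])
sod_ages = pd.Series(["Less than 20", "20 - 24 years", "20 - 24 years", "25 - 29 years"])
print(pd.DataFrame({"Kaggle": age_shares(kaggle_ages), "SOD": age_shares(sod_ages)}))
```

With real data, `age_shares(Kaggle_Full_Morocco.Age)` and `age_shares(SOD_Morocco_Full.Age)` would replace the positional lookups, at the cost of coarser brackets.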

In [None]:
fig = go.Figure()

for i in df_Age.columns:
    data = df_Age[i]
    name = i
    fig.add_trace(go.Scatter(x = df_Age.index , y=data,
                             opacity=0.8,
                             name = name)) 

fig.update_layout( title='Age Percentages (%)')
fig.show()

In [None]:
fig = go.Figure()
fig.add_pie(values=df_Age['Kaggle'],
            labels = df_Age.index,
            name = "Kaggle", title = "Kaggle", 
            hole = 0.6, direction ='clockwise')

fig.update_layout(title_text="Pie Chart Age brackets repartition" )
fig.update_traces(textinfo='percent+label')
fig.show()

In [None]:
fig = go.Figure()
fig.add_pie(values=df_Age["SOD_Full"],
            labels = df_Age.index,
            name = "SOD_Full", title = "SOD_Full",
            hole = 0.6, direction ='clockwise')
fig.update_layout(title_text="Pie Chart Age brackets repartition" )

fig.show()  

In [None]:
fig = go.Figure()
fig.add_pie(values=df_Age["SOD_Data"],
            labels = df_Age.index,
            name = "SOD_Data", title = "SOD_Data",
            hole = 0.6, direction ='clockwise')
fig.update_layout(title_text="Pie Chart Age brackets repartition" )

fig.show()

Two main observations emerge:

- The Kaggle participants are slightly older: those aged 35 and older represent more than 20%.
- The core data community in the State of Dev survey is younger than the general population: those aged 24 and younger represent 70%, vs 50% in the full SOD.

**Hint:** you can click on each age bracket in the legend to show/hide the individual group.

In [None]:
del SOD_Morocco_Data_Age, SOD_Morocco_Full_Age, Kaggle_Full_Morocco_Age

__Gender Profile__

In [None]:
SOD_Morocco_Qst.Gender["choices"], Kaggle_Full_Morocco.Gender.unique()

In [None]:
SOD_Morocco_Data_Gender = SOD_Morocco_Data.groupby(["Gender"], as_index=False).count()
SOD_Morocco_Full_Gender = SOD_Morocco_Full.groupby(["Gender"], as_index=False).count()
Kaggle_Full_Morocco_Gender = Kaggle_Full_Morocco.groupby(["Gender"], as_index=False).count()

In [None]:
df_Gender = pd.DataFrame({'Male':
                          [SOD_Morocco_Data_Gender["userId"][1],
                           SOD_Morocco_Full_Gender["userId"][1], 
                           Kaggle_Full_Morocco_Gender["Time"][0]],
                          'Female': 
                          [SOD_Morocco_Data_Gender["userId"][0],
                           SOD_Morocco_Full_Gender["userId"][0], 
                           Kaggle_Full_Morocco_Gender["Time"][2]],
                         }, index=['SOD_Data', 'SOD_Full', 'Kaggle'])
df_Gender

In [None]:
fig = make_subplots(rows=1, cols=3, horizontal_spacing=0.15,
                    specs=[[{"type": "domain"}, {"type": "domain"}, {"type": "domain"}]])

# One donut per survey; each row of df_Gender holds that survey's counts
fig.add_pie(values=df_Gender.transpose()['Kaggle'],
            row=1, col=1, labels=df_Gender.columns,
            name="Kaggle", title="Kaggle",
            hole=0.6, direction='clockwise')
fig.add_pie(values=df_Gender.transpose()["SOD_Full"],
            row=1, col=2, labels=df_Gender.columns,
            name="SOD_Full", title="SOD_Full",
            hole=0.6, direction='clockwise')
fig.add_pie(values=df_Gender.transpose()["SOD_Data"],
            row=1, col=3, labels=df_Gender.columns,
            name="SOD_Data", title="SOD_Data",
            hole=0.6, direction='clockwise')

fig.update_layout(title_text="Pie Chart Gender repartition")
fig.show()

Men are predominant, as is generally observed in the tech field. Looking more closely, **the core data profiles in the State of Dev survey are slightly more diverse, with ~30% female members compared to 15% in the general survey**.
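
The pie charts normalize each survey separately, so the percentages compare shares, not raw counts. The same row-wise shares can be computed explicitly from a table laid out like `df_Gender`; a sketch with hypothetical counts:

```python
import pandas as pd

# Hypothetical counts laid out like df_Gender: one row per survey
counts = pd.DataFrame(
    {"Male": [70, 170, 85], "Female": [30, 30, 15]},
    index=["SOD_Data", "SOD_Full", "Kaggle"],
)

# Divide each row by its own total so that every survey sums to 100%
shares = counts.div(counts.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

Dividing with `axis=0` aligns the per-row totals with the index; dividing the whole frame by a single grand total would instead mix the surveys together.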

In [None]:
del Kaggle_Full_Morocco_Gender, SOD_Morocco_Data_Gender, SOD_Morocco_Full_Gender

__Job Titles Profile__<br>
The Kaggle and State of Dev surveys define job titles differently.

Kaggle includes "Not employed" and "Student" as full-time job titles, whereas SOD addresses those categories in a separate question about employment status, so we first filter SOD respondents on that question.

In [None]:
Roles_dict = {'Management Roles':
              ['Product/Project Manager', 'Engineering manager',
               'Marketing or sales professional', 'Product manager',
               'Senior executive/VP'],
              'Analysis Roles' : 
              ['Business Analyst', 'Data or business analyst', 'Data Analyst'],
              'Development Roles':
              ['Software Engineer', 'Developer, QA, or test',
               'Developer, back-end', 'Developer, desktop applications',
               'Developer, embedded applications or devices',
               'Developer, front-end', 'Developer, full-stack',
               'Developer, game or graphics', 'Developer, mobile'],
              'Statistician or Research Scientist':
              ['Research Scientist', 'Statistician', 'Academic researcher',
               'Educator', 'Scientist'],
              'Students': 
              ['Student'],
              'Unemployed':
              ['Looking for work', 'Currently not employed'],
              'Other':
              ['Other', 'Engineer, site reliability',  'Designer',
               'System administrator', 'DevOps specialist'],
              'Data Roles':
              ['Data Engineer', 'DBA/Database Engineer', 'Data Scientist',
               'Machine Learning Engineer', 'Data scientist or machine learning specialist',
               'Engineer, data', 'Database administrator']
}
Roles_dict_reverse = {k: oldk for oldk, oldv in Roles_dict.items() for k in oldv}
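
`Roles_dict_reverse` inverts the one-to-many dictionary (category to titles) into a title-to-category lookup, which the next cells apply with `index.map` before summing per category. One side effect worth checking: any title missing from `Roles_dict` maps to NaN, and `groupby` then drops it silently. A sketch on a hypothetical mini version of the dictionary:

```python
import pandas as pd

# Hypothetical mini version of Roles_dict: category -> raw job titles
roles = {
    "Data Roles": ["Data Scientist", "Data Engineer"],
    "Students": ["Student"],
}
# Invert it: raw job title -> category (each title should appear only once)
roles_reverse = {title: cat for cat, titles in roles.items() for title in titles}

# Per-title counts, as groupby("Job_Title").count() would produce for one column
per_title = pd.Series({"Data Engineer": 4, "Data Scientist": 6, "Student": 9, "Designer": 2})

categories = per_title.index.map(roles_reverse)     # unmapped titles become NaN
per_category = per_title.groupby(categories).sum()  # NaN keys are dropped here
dropped = per_title.index[categories.isna()]        # titles missing from the mapping

print(per_category)
print(list(dropped))
```

Printing `dropped` for the real job-title counts is a quick way to confirm that no category is lost in the harmonization.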

In [None]:
SOD_Morocco_Full_Status = SOD_Morocco_Full.groupby(['Employment_Status']).count()
Kaggle_Full_Morocco_Job = Kaggle_Full_Morocco.groupby("Job_Title").count()
SOD_Morocco_Full_Job = SOD_Morocco_Full[SOD_Morocco_Full.Employment_Status.isin(['Employed full-time', 'Freelancer, or self-employed', 'Employed part-time'])].groupby("Job_Title").count()

In [None]:
SOD_Morocco_Full_Status.index = SOD_Morocco_Full_Status.index.map(Roles_dict_reverse)
SOD_Morocco_Full_Job.index = SOD_Morocco_Full_Job.index.map(Roles_dict_reverse)
Kaggle_Full_Morocco_Job.index = Kaggle_Full_Morocco_Job.index.map(Roles_dict_reverse)
Kaggle_Full_Morocco_Job.head()

In [None]:
SOD_Morocco_Full_Status = SOD_Morocco_Full_Status.groupby(SOD_Morocco_Full_Status.index).sum()
SOD_Morocco_Full_Job = SOD_Morocco_Full_Job.groupby(SOD_Morocco_Full_Job.index).sum()
Kaggle_Full_Morocco_Job = Kaggle_Full_Morocco_Job.groupby(Kaggle_Full_Morocco_Job.index).sum()
Kaggle_Full_Morocco_Job.head()

In [None]:
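df_Job_Titles = pd.DataFrame()
df_Job_Titles["Kaggle"] = Kaggle_Full_Morocco_Job["Time"]
df_Job_Titles["SOD"] = SOD_Morocco_Full_Job["userId"]
# SOD students and unemployed respondents come from the employment-status question;
# assign them by label with .loc (chained assignment may silently fail under Copy-on-Write)
status_rows = ["Students", "Unemployed"]
df_Job_Titles.loc[status_rows, "SOD"] = SOD_Morocco_Full_Status.loc[status_rows, "userId"]
display(df_Job_Titles)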

In [None]:
figure = go.Figure()

axis = list(df_Job_Titles.index)
axis.append(axis[0])
for col in df_Job_Titles:
    plot_data = df_Job_Titles[col].tolist()
    plot_data = (np.array(plot_data) / sum(plot_data) * 100).tolist()
    plot_data.append(plot_data[0])  
    figure.add_trace(go.Scatterpolar(r=plot_data,
                                     theta=axis,
                                     mode='lines',
                                     showlegend=True,
                                     name=col,
                                     hovertemplate='%{r:0.0f}%',
                                     opacity=0.8,
                                     line_shape='spline',
                                     line_smoothing=0.9,
                                     line_width=2))
    
figure.update_layout(polar_radialaxis_layer='below traces',
                     polar_radialaxis_range=(0,50), 
                     polar_radialaxis_tickvals=[10, 20, 30],
                     polar_radialaxis_ticktext=['10%', '20%', '30%'],
                     title='Job Title'
)

figure.show()

In [None]:
axis = list(df_Job_Titles.index)
plot_data = (100* df_Job_Titles /df_Job_Titles.sum())

figure = make_subplots(rows=1, cols=2, horizontal_spacing=0.15,
                    specs=[[{"type": "domain"}, {"type": "domain"}]])

figure.add_pie(values=plot_data["SOD"],
            labels = plot_data.index,
            row = 1, col = 1,
            name = "SOD", title = "SOD", 
            hole = 0.6, direction ='clockwise'),
figure.add_pie(values=plot_data["Kaggle"],
            labels = plot_data.index,
            row = 1, col = 2,
            name = "Kaggle", title = "Kaggle", 
            hole = 0.6, direction ='clockwise'),    

figure.update_layout(title_text="Pie Chart Job Titles repartition" )
figure.show()

- The first visualization shows a **polarization effect in the State of Dev survey**: 46% of participants occupy development roles and the next 28% are still students. The second visualization confirms this.
- In the Kaggle survey, **60% of the Moroccan community is split between students and the 3 data roles** (Data Science, Data Engineering, ML). Statisticians and research scientists are also well represented in the Kaggle survey compared to SOD.
- Another interesting category is the **Unemployed**: ~13% of State of Dev participants self-reported as unemployed, whereas this number drops to ~6.5% among Kaggle participants.
- Keeping in mind that both surveys were conducted during the **Covid-19** pandemic, it will be very interesting to track these stats in future surveys as **an indicator not only of the pandemic's effect, but also of the evolution of the Moroccan IT job market toward becoming data-driven.**

__Salary Range__<br>
How much do participants earn?

In [None]:
# Share (%) of participants who answered the salary question (non-missing values),
# matching the chart title below; isna() would give the share who skipped it
data_x = [100 * SOD_Morocco_Data.Salary_Range_DH.notna().mean(),
          100 * SOD_Morocco_Full.Salary_Range_DH.notna().mean(),
          100 * Kaggle_Full_Morocco.Q24.notna().mean()]

In [None]:
# Keep data_x numeric for the axis; format only the bar labels
fig = px.bar(x=data_x,
             y=['SOD_Data', 'SOD_Full', 'Kaggle'],
             text=['%.2f' % elem for elem in data_x],
             orientation='h',
             color=['SOD_Data', 'SOD_Full', 'Kaggle'])

fig.update_layout(
    title="How many participants provided Salary range information?",
    yaxis_title="",
    xaxis_title="% of answers received")
fig.update_xaxes(range=[0, 100])
fig.show()

The topic of salary range is very informative about the state of the IT field, but it remains sensitive for most participants: only ~40% of the SOD population chose to report salary range information. The percentage improves slightly to ~55% in the core SOD data group, which is the same level as Moroccan respondents on Kaggle.
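
The answer rate is the share of non-missing values in the salary column. `isna().mean()` gives its complement (the share who skipped the question), so the two are easy to mix up when labelling a chart. A small sketch with hypothetical answers:

```python
import numpy as np
import pandas as pd

# Hypothetical salary answers; NaN means the question was skipped
salary = pd.Series(["< 6 000", np.nan, "8 000 - 10 000", np.nan, np.nan])

answered_pct = 100 * salary.notna().mean()  # share who answered
skipped_pct = 100 * salary.isna().mean()    # complement: share who skipped
print(f"answered: {answered_pct:.0f}%, skipped: {skipped_pct:.0f}%")
```

The mean of a boolean Series is the fraction of `True` values, which is why no explicit division by the row count is needed.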

In [None]:
# Map SOD (DH) and Kaggle (USD, assuming 1 USD = 10 DH) brackets to shared ranges.
# Labels missing from this dictionary become NaN after .map() and are dropped by groupby.
Salary_range_dict = {'10 000 - 20 000':
                     ['10 000 - 13 000', '13 000 - 15 000', '1,000-1,999', '15 000 - 20 000'],
                     '20 000 - 40 000':
                     ['20 000 - 25 000', '> 25 000', '2,000-2,999', '3,000-3,999'],
                     '< 10 000':
                     ['< 6 000', '8 000 - 10 000', '6 000 - 8 000', '$0-999'],
                     'Outliers':
                     ['30,000-39,999', '40,000-49,999', '5,000-7,499',
                      '7,500-9,999', '80,000-89,999']
                    }
Salary_range_dict_reverse = {k: oldk for oldk, oldv in Salary_range_dict.items() for k in oldv}

In [None]:
# groupby drops missing salary values (NaN keys) by default
SOD_Morocco_Data_Salary = SOD_Morocco_Data.groupby("Salary_Range_DH").count()
SOD_Morocco_Full_Salary = SOD_Morocco_Full.groupby("Salary_Range_DH").count()
Kaggle_Full_Morocco_Salary = Kaggle_Full_Morocco.groupby("Q24").count()

In [None]:
figure = make_subplots(rows=1, cols=2, horizontal_spacing=0.15,
                    specs=[[{"type": "domain"}, {"type": "domain"}]])

figure.add_pie(values=SOD_Morocco_Data_Salary["userId"],
            labels = SOD_Morocco_Data_Salary.index,
            row = 1, col = 1,
            name = "SOD_Data", title = "SOD_Data", 
            hole = 0.6, direction ='clockwise'), 
figure.add_pie(values=SOD_Morocco_Full_Salary["userId"],
            labels = SOD_Morocco_Full_Salary.index,
            row = 1, col = 2,
            name = "SOD_Full", title = "SOD_Full", 
            hole = 0.6, direction ='clockwise')

figure.update_layout(title_text="Pie Chart Salary(DH) range repartition: State of Dev data" )
figure.show()

In [None]:
SOD_Morocco_Full_Salary.index = SOD_Morocco_Full_Salary.index.map(Salary_range_dict_reverse)
SOD_Morocco_Data_Salary.index = SOD_Morocco_Data_Salary.index.map(Salary_range_dict_reverse)
Kaggle_Full_Morocco_Salary.index = Kaggle_Full_Morocco_Salary.index.map(Salary_range_dict_reverse)

In [None]:
SOD_Morocco_Full_Salary = SOD_Morocco_Full_Salary.groupby(SOD_Morocco_Full_Salary.index).sum()
SOD_Morocco_Data_Salary = SOD_Morocco_Data_Salary.groupby(SOD_Morocco_Data_Salary.index).sum()
Kaggle_Full_Morocco_Salary = Kaggle_Full_Morocco_Salary.groupby(Kaggle_Full_Morocco_Salary.index).sum()

In [None]:
df_Salary = pd.DataFrame()
df_Salary["Kaggle"] = Kaggle_Full_Morocco_Salary["Time"]
df_Salary["SOD_Full"] = SOD_Morocco_Full_Salary["userId"]
df_Salary["SOD_Data"] = SOD_Morocco_Data_Salary["userId"]
display(df_Salary)

In [None]:
figure = make_subplots(rows=1, cols=3, horizontal_spacing=0.15,
                    specs=[[{"type": "domain"}, {"type": "domain"}, {"type": "domain"}]])

figure.add_pie(values=df_Salary["SOD_Data"],
            labels = df_Salary.index,
            row = 1, col = 1,
            name = "SOD_Data", title = "SOD_Data", 
            hole = 0.6, direction ='clockwise'),
figure.add_pie(values=df_Salary["Kaggle"],
            labels = df_Salary.index,
            row = 1, col = 2,
            name = "Kaggle", title = "Kaggle", 
            hole = 0.6, direction ='clockwise'),    
figure.add_pie(values=df_Salary["SOD_Full"],
            labels = df_Salary.index,
            row = 1, col = 3,
            name = "SOD_Full", title = "SOD_Full", 
            hole = 0.6, direction ='clockwise')

figure.update_layout(title_text="Pie Chart Salary(DH) range repartition: Overview" )
figure.show()

**Note:** The Kaggle survey proposed salary ranges in USD, while State of Dev used Moroccan DH; we harmonize both with a scale of **1 USD = 10 DH**. Some values in the Kaggle survey are designated as **outliers**: the participants probably didn't account for the currency difference. Also note that the Kaggle 2020 salary question (Q24) asked for *yearly* compensation, which may also explain these high values; since the mapping above treats both surveys as reporting the same period, the Kaggle side of this comparison should be read with caution.

- The State of Dev salary data show more detail in the salary brackets. As an overview, **~50% of Moroccan IT talents earn up to 1K (USD) per month**.
- The core data field is **slightly more rewarding, with ~65% of its talent earning beyond the 1K (USD) threshold.**
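
Converting a Kaggle bracket to DH with the notebook's 1 USD = 10 DH rate is a rescaling of both bounds; the notebook does this by hand in `Salary_range_dict`. The helper below (`usd_bracket_to_dh`, a hypothetical name) is a sketch that parses closed brackets automatically, rejects open-ended labels, and deliberately leaves the reporting period (yearly vs monthly) untouched:

```python
USD_TO_DH = 10  # the notebook's rounded rate: 1 USD = 10 DH


def usd_bracket_to_dh(label: str, rate: int = USD_TO_DH) -> tuple[int, int]:
    """Convert a closed USD bracket such as '1,000-1,999' or '$0-999' to DH bounds.

    The period (yearly vs monthly) is left unchanged; open-ended labels are rejected.
    """
    cleaned = label.replace("$", "").replace(",", "")
    parts = cleaned.split("-")
    if len(parts) != 2:
        raise ValueError(f"not a closed bracket: {label!r}")
    low, high = (int(p.strip()) for p in parts)
    return low * rate, high * rate


for label in ["$0-999", "1,000-1,999", "3,000-3,999"]:
    print(label, "->", usd_bracket_to_dh(label))
```

Converting the bounds first and then assigning each bracket to a DH range makes the harmonization rule explicit, instead of relying on a hand-written list that can miss labels.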

__Coding Experience__<br>
Exploring the maturity of the IT field

In [None]:
SOD_Morocco_Qst.Coding_Experience["choices"], Kaggle_Full_Morocco.Coding_Experience.unique()

We start by harmonizing the coding experience brackets in both surveys

In [None]:
Exp_dict = {'0-1 years':
              ['Less than 1 year', 'I have never written code', '< 1 years'],
              '1-3 years' : 
              ['1-3 years', '1-2 years'],
              '3-5 years':
              ['3-5 years'],
              '5-10 years': 
              ['5-10 years'],
              'More than 10 years':
              ['More than 10', '10-20 years', '20+ years']
             }
Exp_dict_reverse = {k: oldk for oldk, oldv in Exp_dict.items() for k in oldv}

In [None]:
SOD_Morocco_Full_Exp = SOD_Morocco_Full.groupby("Coding_Experience").count()
Kaggle_Full_Morocco_Exp = Kaggle_Full_Morocco.groupby("Coding_Experience").count()
SOD_Morocco_Data_Exp = SOD_Morocco_Data.groupby("Coding_Experience").count()

# Map every survey's labels to the shared brackets, then merge rows that now share a label
Kaggle_Full_Morocco_Exp = Kaggle_Full_Morocco_Exp.groupby(Kaggle_Full_Morocco_Exp.index.map(Exp_dict_reverse)).sum()
SOD_Morocco_Data_Exp = SOD_Morocco_Data_Exp.groupby(SOD_Morocco_Data_Exp.index.map(Exp_dict_reverse)).sum()
SOD_Morocco_Full_Exp = SOD_Morocco_Full_Exp.groupby(SOD_Morocco_Full_Exp.index.map(Exp_dict_reverse)).sum()
Kaggle_Full_Morocco_Exp.head()

In [None]:
df_Coding_Exp = pd.DataFrame({"Kaggle": Kaggle_Full_Morocco_Exp["Time"],
                              "SOD_Data": SOD_Morocco_Data_Exp["userId"],
                              "SOD_Full": SOD_Morocco_Full_Exp["userId"]
                             })
df_Coding_Exp = 100*df_Coding_Exp/df_Coding_Exp.sum()
display(df_Coding_Exp)

In [None]:
fig = go.Figure(data=[
    go.Bar(name="Kaggle", x=df_Coding_Exp.index, y=df_Coding_Exp.Kaggle),
    go.Bar(name="SOD_Full", x=df_Coding_Exp.index, y=df_Coding_Exp.SOD_Full),
    go.Bar(name="SOD_Data", x=df_Coding_Exp.index, y=df_Coding_Exp.SOD_Data)
])

# Show the three surveys side by side within each experience bracket
fig.update_layout(
    barmode='group',
    title='Coding Experience of respondents',
    xaxis_title=None,
    yaxis_title='Percentage'
)
fig.update_yaxes(range=[0, 55])
fig.show()


In [None]:
figure = make_subplots(rows=1, cols=3, horizontal_spacing=0.15,
                    specs=[[{"type": "domain"}, {"type": "domain"}, {"type": "domain"}]])

figure.add_pie(values=df_Coding_Exp["SOD_Full"],
            labels = df_Coding_Exp.index,
            row = 1, col = 1,
            name = "SOD_Full", title = "SOD_Full", 
            hole = 0.6, direction ='clockwise'),
figure.add_pie(values=df_Coding_Exp["Kaggle"],
            labels = df_Coding_Exp.index,
            row = 1, col = 2,
            name = "Kaggle", title = "Kaggle", 
            hole = 0.6, direction ='clockwise'),
figure.add_pie(values=df_Coding_Exp["SOD_Data"],
            labels = df_Coding_Exp.index,
            row = 1, col = 3,
            name = "SOD_Data", title = "SOD_Data", 
            hole = 0.6, direction ='clockwise')

figure.update_layout(title_text="Pie Chart Coding Experience repartition" )
figure.show()

- As a general observation, **the IT field sampled in both surveys is young in terms of professional experience.**
- The State of Dev population is predominantly junior: 60% of participants in the general survey have less than 3 years of experience, and 75% in the selected core data group.
- The core data group in SOD is even more junior: **~50% of participants have less than 1 year of experience.** One potential reason is **the large number of students who answered the survey, combined with the fact that Moroccan universities have increased their Bachelor's and Master's offerings focused on Data Science and ML.**
- The Kaggle survey participants are **slightly more experienced**, with ~45% having 3-5 years of experience and ~65% having less than 5 years.
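
Statements such as "60% have less than 3 years of experience" are cumulative shares: the sum of every bracket up to that point. A sketch with hypothetical percentages, using the bracket names from `Exp_dict`; reindexing to an explicit order first guards against the brackets coming out in a different order:

```python
import pandas as pd

ORDER = ["0-1 years", "1-3 years", "3-5 years", "5-10 years", "More than 10 years"]

# Hypothetical percentages per bracket, deliberately listed out of order
exp_pct = pd.Series(
    {"3-5 years": 15.0, "0-1 years": 50.0, "More than 10 years": 3.0,
     "1-3 years": 25.0, "5-10 years": 7.0}
)

# Put brackets in increasing order, then take the running total:
# each value is the share with at most that much experience
cumulative = exp_pct.reindex(ORDER).cumsum()
print(cumulative)
```

Applied to a column of `df_Coding_Exp`, the same `reindex(...).cumsum()` gives the "less than N years" figures directly.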

In [None]:
del Kaggle_Full_Morocco_Exp, SOD_Morocco_Data_Exp, SOD_Morocco_Full_Exp