### Introduction

This Jupyter Notebook presents data that I collected from job boards by web scraping. The data consists of job offers, mainly from LinkedIn; in the future I plan to implement a scraper for Indeed as well, in order to diversify the data. For now, with the data I have, I have plotted the web technologies most commonly used in companies' web projects in EU countries and some Asian countries.

### Import libraries

In [1]:
from pandas import Series, DataFrame, read_csv
from json import loads
import plotly.express as px

### Loading and filtering data

In [2]:
# data dump from MySQL database at 17/13/2022
df = read_csv('jobs_offers.csv', encoding='utf-8')

# parse the JSON strings stored in the database into Python objects
df['technologies'] = df['technologies'].apply(loads)

# delete unnecessary columns
df.drop(['description', 'company_url', 'date_time',
        'criteria', 'job_offer_url'], axis=1, inplace=True)

# set job_offer_id as default index
df.set_index('job_offer_id', inplace=True)

# technologies filter: keep offers with at least one detected technology
technologies_filter = df.technologies.apply(lambda d: d is not None and len(d) > 0)

# apply technologies filter to df
df = df[technologies_filter]
df.shape

(1638, 5)
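
To make the parsing and filtering step concrete, here is a minimal sketch on a few made-up values for the `technologies` column. The sample strings are assumptions for illustration only; the real column layout is inferred from how the notebook uses it (a JSON object mapping a category name to a list of technologies, possibly empty or JSON `null`).

```python
from json import loads

# Hypothetical raw values as they might appear in the CSV (not real data).
raw_technologies = [
    '{"PHP Frameworks": ["Laravel"], "JavaScript Frameworks": ["React", "Vue.js"]}',
    '{}',    # offer with no detected technology
    'null',  # JSON null, which loads() turns into None
]

# Same conversion as in the cell above: JSON string -> Python object.
parsed = [loads(s) for s in raw_technologies]

# Same predicate as the technologies filter: keep non-empty dicts only.
kept = [d for d in parsed if d is not None and len(d) > 0]

print(len(parsed), len(kept))
```

Only the first row survives the filter, which is why `df.shape` after filtering can be much smaller than the raw dump.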

### Creating DataFrames

In [3]:
# 19 European countries selected (Iceland, Norway and Switzerland are not EU members)
union_european_countries = [
    'FRANCE', 'GERMANY', 'BELGIUM', 'DENMARK', 'ESTONIA',
    'FINLAND', 'GREECE', 'ICELAND', 'IRELAND', 'ITALY',
    'LUXEMBOURG', 'NETHERLANDS', 'NORWAY', 'POLAND',
    'PORTUGAL', 'SPAIN', 'SWEDEN', 'SWITZERLAND', 'AUSTRIA'
]
# create DataFrame by filtering only countries listed in union_european_countries list
df_eu = df.query('country in @union_european_countries')
df_eu.shape

(970, 5)

In [4]:
# 3 countries selected for Asia
asia_countries = ['SOUTH KOREA', 'CHINA', 'JAPAN']
df_asia = df.query('country in @asia_countries')
df_asia.shape

(82, 5)

In [5]:
# 3 countries selected for North America
n_america_countries = ['UNITED STATES', 'CANADA', 'MEXICO']
df_n_america = df.query('country in @n_america_countries')
df_n_america.shape

(177, 5)
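
The three regional frames above are all built with the same `query` pattern. The sketch below, on made-up rows, shows an equivalent boolean-mask form using `isin`, wrapped in a helper. The name `select_countries` and the toy data are hypothetical and not part of the notebook.

```python
import pandas as pd


def select_countries(df: pd.DataFrame, countries: list) -> pd.DataFrame:
    """Keep the rows whose `country` value is in `countries` (hypothetical helper)."""
    return df[df['country'].isin(countries)]


# Made-up rows, for illustration only.
toy = pd.DataFrame({
    'country': ['FRANCE', 'JAPAN', 'CANADA', 'GERMANY'],
    'title': ['a', 'b', 'c', 'd'],
})
europe = ['FRANCE', 'GERMANY']

by_mask = select_countries(toy, europe)
by_query = toy.query('country in @europe')

# Both approaches are expected to select the same rows.
print(by_mask.equals(by_query))
```

The `isin` form avoids the string expression and the `@` variable lookup, which can make it easier to reuse inside functions.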

### Utility functions

In [6]:
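def get_freq(S: Series, category: str, columns: list = None) -> DataFrame:
    """Relative frequency of each technology listed under `category`."""
    # avoid a mutable default argument
    columns = ['Frequencies'] if columns is None else columns
    techs = []
    for tech_dict in S:  # do not shadow the built-in `dict`
        # each dict maps a category name to a list of technologies
        techs.extend(tech_dict.get(category, []))
    freq = Series(techs, dtype='object').value_counts(normalize=True)
    # build the frame from raw values so the column name does not depend on
    # the name pandas gives to the value_counts() result
    return DataFrame(freq.to_numpy(), index=freq.index, columns=columns)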
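
As a sanity check on what `get_freq` computes, the same relative frequencies can be reproduced with the standard library alone. This is a sketch on made-up dicts, and `category_frequencies` is a hypothetical name that does not exist in the notebook. Note that the frequencies are shares of technology mentions within a category, not shares of job offers, since one offer can list several technologies.

```python
from collections import Counter


def category_frequencies(tech_dicts, category):
    """Share of each technology among all mentions found under `category`."""
    counts = Counter()
    for tech_dict in tech_dicts:
        counts.update(tech_dict.get(category, []))
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders by descending count, similar to value_counts()
    return {tech: n / total for tech, n in counts.most_common()}


# Hypothetical sample (not real data).
sample = [
    {"PHP Frameworks": ["Laravel", "Symfony"]},
    {"PHP Frameworks": ["Laravel"], "JavaScript Frameworks": ["React"]},
    {"JavaScript Frameworks": ["React", "Angular"]},
]

php_freq = category_frequencies(sample, "PHP Frameworks")
print(php_freq)
```

In this sample Laravel is mentioned twice and Symfony once, so their shares are two thirds and one third of the PHP framework mentions.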

### Analysis of EU data

In [7]:
freq_php_frameworks_eu = get_freq(df_eu.technologies, 'PHP Frameworks')
# print top PHP Frameworks for df_eu
freq_php_frameworks_eu

| Technology | Frequencies |
|---|---|
| Laravel | 0.39726 |
| Symfony | 0.356164 |
| Laminas | 0.054795 |
| Slim | 0.054795 |
| Flight | 0.041096 |
| CodeIgniter | 0.041096 |
| CakePHP | 0.027397 |
| Yii | 0.027397 |


In [8]:
fig_freq_php_frameworks_eu = px.bar(
    freq_php_frameworks_eu,
    title=f"Top PHP Frameworks used in EU web projects, number of job offers: {df_eu.shape[0]}",
    labels=dict(value='Frequency', index='Technology'),
    template='plotly_dark',
    color_discrete_sequence=['orange']
)
fig_freq_php_frameworks_eu.show()

In [9]:
freq_js_frameworks_eu = get_freq(df_eu.technologies, 'JavaScript Frameworks')
# print top JS Frameworks for df_eu
freq_js_frameworks_eu

| Technology | Frequencies |
|---|---|
| React | 0.454082 |
| Angular | 0.345663 |
| Node.js | 0.103316 |
| Vue.js | 0.076531 |
| Svelte | 0.008929 |
| Ext | 0.006378 |
| Ember.js | 0.002551 |
| Nuxt.js | 0.001276 |
| Aurelia | 0.001276 |


In [10]:
fig_freq_js_frameworks_eu = px.bar(
    freq_js_frameworks_eu,
    title=f"Top JS Frameworks used in EU web projects, number of job offers: {df_eu.shape[0]}",
    labels=dict(value='Frequency', index='Technology'),
    template='plotly_dark',
    color_discrete_sequence=['orange']
)
fig_freq_js_frameworks_eu.show()

### Analysis of Asia data

In [11]:
freq_php_frameworks_asia = get_freq(df_asia.technologies, 'PHP Frameworks')
# print top PHP Frameworks for df_asia
freq_php_frameworks_asia

| Technology | Frequencies |
|---|---|
| Flight | 0.666667 |
| Laravel | 0.333333 |


In [12]:
fig_freq_php_frameworks_asia = px.bar(
    freq_php_frameworks_asia,
    title=f"Top PHP Frameworks used in Asia web projects, number of job offers: {df_asia.shape[0]}",
    labels=dict(value='Frequency', index='Technology'),
    template='plotly_dark'
)
fig_freq_php_frameworks_asia.show()

In [13]:
freq_js_frameworks_asia = get_freq(
    df_asia.technologies, 'JavaScript Frameworks')
freq_js_frameworks_asia

| Technology | Frequencies |
|---|---|
| React | 0.471698 |
| Angular | 0.226415 |
| Node.js | 0.207547 |
| Vue.js | 0.075472 |
| Ext | 0.018868 |


In [14]:
fig_freq_js_frameworks_asia = px.bar(
    freq_js_frameworks_asia,
    title=f"Top JS Frameworks used in Asia web projects, number of job offers: {df_asia.shape[0]}",
    labels=dict(value='Frequency', index='Technology'),
    template='plotly_dark'
)
fig_freq_js_frameworks_asia.show()

### Data volume by country

In [15]:
freq_countries_dict = df.country.value_counts(normalize=True).to_dict()
fig_countries_freq = px.bar(
    x=list(freq_countries_dict.keys()),
    y=list(freq_countries_dict.values()),
    labels=dict(y='Frequency', x='Country'),
    template='plotly_dark',
    title='Data volume by country'
)
fig_countries_freq.show()