# Quick analysis of the VIE/VIA contracts available on *civiweb.fr*

This Jupyter notebook works with data from the civiweb website, which you can collect with the following scraper: WWW

VIE and VIA are specific kinds of international internship, carried out in a company or in an administration, and managed in France by an organization called Business France. More information can be found at the following address: https://www.civiweb.com/EN/le-volontariat-international.aspx.

The purpose of this notebook is to describe the extracted data briefly, and to draw some useful information from it, from the perspective of both Business France and applicants.

In [184]:
# Importing the data
import pandas as pd

df = pd.read_json('data/offers.json')
df = df[df['Reference'] != 'N/A']
print(df.columns)
print('Number of offers :', df.shape[0])

Index(['URL', 'Reference', 'Title', 'Country', 'City', 'StartDate', 'EndDate',
       'NumberOfMonths', 'Organization', 'Salary', 'NumberOfJobs',
       'DesiredExperience', 'EducationLevel', 'Languages', 'Competence',
       'Diploma', 'PublicationDate', 'PublisherCity', 'TypeOfContract'],
      dtype='object')
Number of offers : 1848


A useful first analysis is to look at the geographical distribution of the offers. To do so, we need to convert the French country names into ISO 3166-1 alpha-3 country codes.

In [185]:
import pycountry
import gettext
from unidecode import unidecode

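# Load the French translation of the ISO 3166 country names once, not on every call
fr = gettext.translation('iso3166', pycountry.LOCALES_DIR, languages=['FR'])

def get_country_code(french_name):
    # Return the alpha-3 code of the country whose upper-cased, unaccented
    # French name equals french_name, or '' when no country matches
    for country in pycountry.countries:
        if unidecode(fr.gettext(country.name).upper()) == french_name:
            return country.alpha_3
    return ''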

df['CountryAlpha3'] = df['Country'].apply(get_country_code)
print('Unrecognized countries :', df[df['CountryAlpha3'] == ''].Country.unique())

Unrecognized countries : ['VIETNAM' 'REPUBLIQUE TCHEQUE' 'RUSSIE' 'MYANMAR (EX BIRMANIE)'
 'COREE DU SUD' 'TAIWAN' 'GUINEE-BISSAO' 'LAOS']
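
The lookup above relies on an exact string comparison after accents are removed and the name is upper-cased, so any country whose civiweb spelling differs from the translated ISO name falls through. Below is a minimal sketch of that normalization step, using the standard library's `unicodedata` as a stand-in for `unidecode` (a different technique that behaves the same way on accented Latin letters). The helper name `normalize_name` and the sample strings are assumptions made for illustration; they are not part of the dataset.

```python
import unicodedata

def normalize_name(name):
    """Strip accents and upper-case a name (stdlib stand-in for unidecode)."""
    # NFKD splits an accented letter into its base letter plus combining marks
    decomposed = unicodedata.normalize('NFKD', name)
    # Drop the combining marks, keeping only the base characters
    without_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.upper()

# Illustrative strings only
print(normalize_name('Brésil'))
print(normalize_name("Côte d'Ivoire"))
```

Names that still fail after this step have to be mapped by hand.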


In [186]:
# Mapping the unrecognized countries manually
manual_codes = {
    'VIETNAM': 'VNM', 'REPUBLIQUE TCHEQUE': 'CZE', 'RUSSIE': 'RUS',
    'MYANMAR (EX BIRMANIE)': 'MMR', 'COREE DU SUD': 'KOR', 'TAIWAN': 'TWN',
    'GUINEE-BISSAO': 'GNB', 'LAOS': 'LAO',
}
# Use the manual code where one exists, otherwise keep the code found above
df['CountryAlpha3'] = df['Country'].map(manual_codes).fillna(df['CountryAlpha3'])

Now, I would like to answer the following questions:

- How are VIE offers distributed around the world?
- Where is the VIE salary highest compared to the local median income?

To do so, I'm going to create a dataframe containing the required data grouped by country code.

*NB: It would be much more accurate to run this analysis by city rather than by country (in the USA or China, for example, salaries are different in the big cities). However, that requires a lot of data cleansing and aggregation work, which is not the purpose of this notebook.*

In [187]:
# Net income per country, indexed by ISO 3166-1 alpha-3 code
df_country = pd.read_csv('data/net_income.csv')
temp = df_country['Country'].apply(pycountry.countries.search_fuzzy)
df_country['code'] = temp.apply(lambda x : x[0].alpha_3)  # keep the first fuzzy match
df_country.set_index('code', inplace=True)

# Number of offers and average VIE salary per country
df['Salary'] = pd.to_numeric(df['Salary'])
vie_salary = df.groupby('CountryAlpha3').Salary.mean()
vie_number = df.groupby('CountryAlpha3').Salary.count()
df_country['NumberOfVIE'] = vie_number
df_country['AverageSalaryVIE'] = vie_salary

# Keep only the countries for which a VIE salary is available
df_country = df_country[df_country['AverageSalaryVIE'].notnull()]
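
With one row per country, the second question can be approached by dividing the average VIE salary by the local income. Below is a minimal sketch on invented numbers. It assumes the income column of `net_income.csv` is called `MedianIncome` (a hypothetical name, the real column may differ) and that it uses the same currency and period as `AverageSalaryVIE`.

```python
import pandas as pd

# Toy data for illustration only: these figures are invented, not taken from civiweb
toy = pd.DataFrame(
    {
        'MedianIncome': [1000.0, 2500.0, 500.0],      # hypothetical column name
        'AverageSalaryVIE': [2000.0, 2500.0, 1500.0],
        'NumberOfVIE': [10, 40, 3],
    },
    index=pd.Index(['AAA', 'BBB', 'CCC'], name='code'),  # placeholder country codes
)

# A ratio above 1 means the average VIE salary exceeds the local income
toy['SalaryToIncome'] = toy['AverageSalaryVIE'] / toy['MedianIncome']

# Countries ranked from the highest to the lowest ratio
ranking = toy.sort_values('SalaryToIncome', ascending=False)
print(ranking[['NumberOfVIE', 'SalaryToIncome']])
```

Keeping `NumberOfVIE` next to the ratio helps to spot countries where the ranking rests on only a handful of offers.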