# Pandas case study - EURES job offers

## [Download exercises and solution](../_static/generated/pandas.zip)

### What to do

1. If you haven't already, install Pandas:

    Anaconda:

    `conda install pandas`

    Without Anaconda (`--user` installs in your home):

    `python3 -m pip install --user pandas`


2. Unzip the exercises in a folder. You should get something like this:

```
 pandas
     eures-jobs.ipynb
     eures-jobs-sol.ipynb     
     italian-poets-chal.ipynb
     pandas.ipynb     
     pandas-sol.ipynb     
     jupman.py
```

<div class="alert alert-warning">

**WARNING 1**: to correctly visualize the notebook, it MUST be in an unzipped folder !
</div>


3. Open Jupyter Notebook from that folder. Two things should open: first a console, then a browser.
4. The browser should show a file list: navigate the list and open the notebook `pandas/eures-jobs.ipynb`

<div class="alert alert-warning">

**WARNING 2**: DO NOT use the _Upload_ button in Jupyter, instead navigate in Jupyter browser to the unzipped folder !
</div>

5. Keep reading that notebook and follow the instructions inside.


Shortcut keys:

- to execute Python code inside a Jupyter cell, press `Control + Enter`
- to execute Python code inside a Jupyter cell AND select next cell, press `Shift + Enter`
- to execute Python code inside a Jupyter cell AND create a new cell afterwards, press `Alt + Enter`
- If the notebooks look stuck, try to select `Kernel -> Restart`

After exiting your school prison, when looking for a job in Europe you will be shocked to discover that a great variety of languages is spoken. Many job listings are provided by the [Eures](https://ec.europa.eu/eures/public/homepage) portal, which is easily searchable with many fields on which you can filter. For this exercise we will use a test dataset which was generated just for a hackathon: it is a crude Italian version of the job offers data, with many fields expressed in natural language. We will try to convert it to a dataset with more columns and translate some terms to English.

Data provider: [Autonomous Province of Trento](https://dati.trentino.it/dataset/offerte-di-lavoro-eures-test-odhb2019)

License: [Creative Commons Zero 1.0](http://creativecommons.org/publicdomain/zero/1.0/deed.it)

<div class="alert alert-warning">

**WARNING**: avoid constants in function bodies !!

In the exercise data you will find many names such as `'Austria'`, `'Giugno'`, etc. **DO NOT** put such constant names inside the body of functions !! You have to write generic code which works with any input.
</div>

## offerte dataset

We will load the dataset [offerte-lavoro.csv](offerte-lavoro.csv) into Pandas:

In [1]:
import pandas as pd   # we import pandas and for ease we rename it to 'pd'
import numpy as np    # we import numpy and for ease we rename it to 'np'

# remember the encoding !
offerte = pd.read_csv('offerte-lavoro.csv', encoding='UTF-8')  
offerte.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 8 columns):
RIFER.                 53 non-null object
SEDE LAVORO            53 non-null object
POSTI                  53 non-null int64
IMPIEGO RICHIESTO      53 non-null object
TIPO CONTRATTO         53 non-null object
LINGUA RICHIESTA       51 non-null object
RET. LORDA             53 non-null object
DESCRIZIONE OFFERTA    53 non-null object
dtypes: int64(1), object(7)
memory usage: 3.4+ KB
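
The summary above reports 51 non-null values out of 53 entries for `LINGUA RICHIESTA`, so two cells in that column are missing. As a minimal sketch on a small made-up dataframe (the `toy` table and its values are invented for illustration and are not the real dataset), missing cells can be counted per column like this:

```python
import numpy as np
import pandas as pd

# hypothetical miniature table, NOT the real offerte-lavoro.csv
toy = pd.DataFrame({
    'SEDE LAVORO': ['Norvegia', 'Francia', 'Svezia'],
    'LINGUA RICHIESTA': ['Inglese fluente', np.nan, 'Inglese; italiano fluente'],
})

# isna() marks missing cells with True, sum() counts the True values per column
missing = toy.isna().sum()
print(missing)
```

Missing cells are read by Pandas as NaN, which is why the functions in the later exercises must be ready to receive a NaN instead of a string.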


It contains Italian column names, and many string fields:

In [2]:
offerte.head()

Unnamed: 0,RIFER.,SEDE LAVORO,POSTI,IMPIEGO RICHIESTO,TIPO CONTRATTO,LINGUA RICHIESTA,RET. LORDA,DESCRIZIONE OFFERTA
0,18331901000024,Norvegia,6,Restaurant staff,Tempo determinato da maggio ad agosto,Inglese fluente + Vedi testo,Da 3500\nFr/\nmese,"We will be working together with sales, prepar..."
1,083PZMM,Francia,1,Assistant export trilingue italien et anglais ...,Non specificato,Inglese; italiano; francese fluente,Da definire,Vos missions principales sont les suivantes : ...
2,4954752,Danimarca,1,Italian Sales Representative,Non specificato,Inglese; Italiano fluente,Da definire,"Minimum 2 + years sales experience, preferably..."
3,-,Berlino\nTrento,1,Apprendista perito elettronico; Elettrotecnico,Inizialmente contratto di apprendistato con po...,Inglese Buono (B1-B2); Tedesco base,Min 1000\nMax\n1170\n€/mese,Ti stai diplomando e/o stai cercando un primo ...
4,10531631,Svezia,1,Italian speaking purchase,Non specificato,Inglese; italiano fluente,Da definire,"This is a varied Purchasing role, where your m..."


## rename columns

First, we create a new dataframe `offers` with the columns renamed in English:

In [3]:
replacements = ['Reference','Workplace','Positions','Qualification','Contract type',
                'Required languages','Gross retribution','Offer description']
diz = {}
i = 0
for col in offerte:
    diz[col] = replacements[i]
    i += 1
offers = offerte.rename(columns = diz)
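
The loop above pairs each old column name with its replacement by position. The same pairing can probably be expressed more compactly with `zip`. This is only a sketch: the `toy` dataframe and the `new_names` list are made up for illustration.

```python
import pandas as pd

# hypothetical two-column table standing in for the real dataset
toy = pd.DataFrame({'SEDE LAVORO': ['Norvegia'], 'POSTI': [6]})
new_names = ['Workplace', 'Positions']

# zip pairs the i-th old name with the i-th new name
mapping = dict(zip(toy.columns, new_names))

# rename returns a NEW dataframe, the original one is left unchanged
renamed = toy.rename(columns=mapping)
print(list(renamed.columns))
print(list(toy.columns))
```

Both approaches rely on the replacement names being listed in the same order as the original columns.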

In [4]:
offers

Unnamed: 0,Reference,Workplace,Positions,Qualification,Contract type,Required languages,Gross retribution,Offer description
0,18331901000024,Norvegia,6,Restaurant staff,Tempo determinato da maggio ad agosto,Inglese fluente + Vedi testo,Da 3500\nFr/\nmese,"We will be working together with sales, prepar..."
1,083PZMM,Francia,1,Assistant export trilingue italien et anglais ...,Non specificato,Inglese; italiano; francese fluente,Da definire,Vos missions principales sont les suivantes : ...
2,4954752,Danimarca,1,Italian Sales Representative,Non specificato,Inglese; Italiano fluente,Da definire,"Minimum 2 + years sales experience, preferably..."
3,-,Berlino\nTrento,1,Apprendista perito elettronico; Elettrotecnico,Inizialmente contratto di apprendistato con po...,Inglese Buono (B1-B2); Tedesco base,Min 1000\nMax\n1170\n€/mese,Ti stai diplomando e/o stai cercando un primo ...
4,10531631,Svezia,1,Italian speaking purchase,Non specificato,Inglese; italiano fluente,Da definire,"This is a varied Purchasing role, where your m..."
5,51485,Islanda,1,Pizza chef,Tempo determinato,Inglese Buono,Da definire,Job details/requirements: Experience in making...
6,4956299,Danimarca,1,Regional Key account manager - Italy,Non specificato,Inglese; italiano fluente,Da definire,Requirements: possess good business acumen; ar...
7,-,Italia\nLazise,1,Receptionist,Non specificato,Inglese; Tedesco fluente + Vedi testo,Min 1500€\nMax\n1800€\nnetto\nmese,"Camping Village Du Parc, Lazise,Italy is looki..."
8,2099681,Irlanda,11,Customer Service Representative in Athens,Non specificato,Italiano fluente; Inglese buono,Da definire,Responsibilities: Solving customers queries by...
9,12091902000474,Norvegia,1,Dispatch personnel,Maggio – agosto 2019,Inglese fluente + Vedi testo,Da definire,The Dispatch Team works outside in all weather...


## 1. Rename countries

We would like to create a new column holding a list of countries where the job is to be done. You will also have to translate countries to their English name. 

To allow for text processing, you are provided with some data as Python data structures (you do not need to edit them further):

In [5]:

connectives = ['e', 'ed']
punctuation = ['.',';',',']

countries = {
    'Austria':'Austria',
    'Belgio': 'Belgium',
    'Cipro':'Cyprus',    
    'Danimarca': 'Denmark',
    'Irlanda':'Ireland',
    'Italia':'Italy',
    'Grecia':'Greece',
    'Finlandia' : 'Finland',
    'Francia' : 'France',
    'Norvegia': 'Norway',    
    'Paesi Bassi':'Netherlands',
    'Regno Unito': 'United Kingdom',
    'Spagna': 'Spain',
    'Svezia':'Sweden', 
    'Islanda':'Iceland',
    'Svizzera':'Switzerland',
    'estero': 'abroad'        # special case
}

cities = {
    'Pfenninger Alm': 'Pfenninger Alm',
    'Berlino': 'Berlin',
    'Trento': 'Trento',
    'Klagenfurt': 'Klagenfurt',
    'Lazise': 'Lazise',
    'Lund':'Lund',
    'Møre e Romsdal': 'Møre og Romsdal',
    'Sogn og Fjordane': 'Sogn og Fjordane',
    'Hesla Gaard':'Hesla Gaard'
}


### 1.1 countries_to_list

✪✪ Implement function `countries_to_list` which, given a string from the `Workplace` column, RETURNS a list holding country names in English **in the exact order they appear in the string**. The function will have to remove city names as well as punctuation, connectives and newlines using the data defined in the previous cell. There are various ways to solve the exercise: if you try the most straightforward one, you will most probably get countries which are not in the same order as in the string.

**NOTE**: this function only takes a single string as input!

Example:

```python
>>> countries_to_list("Regno Unito, Italia ed estero")
['United Kingdom', 'Italy', 'abroad']
```
For other examples, see asserts.
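
To see why the ordering can go wrong, here is a small sketch which is not the official solution. It uses a made-up dictionary `sample` and two hypothetical helpers: `naive` scans the dictionary and so returns countries in dictionary order, while `by_position` records where each name occurs in the string and sorts by that position.

```python
# hypothetical mini-dictionary, just for this illustration
sample = {'Italia': 'Italy', 'Grecia': 'Greece', 'Cipro': 'Cyprus'}

def naive(s, names):
    """Scans the dictionary: the result follows the dictionary order."""
    return [names[ita] for ita in names if ita in s]

def by_position(s, names):
    """Records where each name first occurs, then sorts by that position."""
    found = [(s.find(ita), names[ita]) for ita in names if ita in s]
    return [eng for pos, eng in sorted(found)]

text = 'Cipro Grecia Italia'
print(naive(text, sample))        # ['Italy', 'Greece', 'Cyprus']
print(by_position(text, sample))  # ['Cyprus', 'Greece', 'Italy']
```

Note the sketch only considers the first occurrence of each name, and it does not deal with punctuation, connectives or city names.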

In [6]:

def countries_to_list(s):
    
    ret = []
    i = 0
    ns = s.replace('\n',' ')
    for connective in connectives:
        ns = ns.replace(' ' + connective + ' ',' ')
    for p in punctuation:
        ns = ns.replace(p,'')
        
    while i < len(ns):
        for country in countries:
            if ns[i:].startswith(country):
                ret.append(countries[country])
                i += len(country)
        i += 1  # crude but works for this dataset ;-)
    return ret
    

# single country
assert countries_to_list("Francia") == ['France']
# country with a city
assert countries_to_list("Austria Klagenfurt") == ['Austria']
# country with a space
assert countries_to_list("Paesi Bassi") == ['Netherlands']
# one country, newline, one city
assert countries_to_list("Italia\nLazise") == ['Italy']
# newline, multiple cities
assert countries_to_list("Norvegia\nMøre e Romsdal e Sogn og Fjordane.") == ['Norway']
# multiple countries - order *must* be preserved !
assert countries_to_list('Cipro Grecia Spagna') == ['Cyprus', 'Greece', 'Spain']
# punctuation and connectives, multiple countries - order *must* be preserved !
assert countries_to_list('Regno Unito, Italia ed estero') == ['United Kingdom', 'Italy', 'abroad']

In [6]:

def countries_to_list(s):
    raise Exception('TODO IMPLEMENT ME !')

# single country
assert countries_to_list("Francia") == ['France']
# country with a city
assert countries_to_list("Austria Klagenfurt") == ['Austria']
# country with a space
assert countries_to_list("Paesi Bassi") == ['Netherlands']
# one country, newline, one city
assert countries_to_list("Italia\nLazise") == ['Italy']
# newline, multiple cities
assert countries_to_list("Norvegia\nMøre e Romsdal e Sogn og Fjordane.") == ['Norway']
# multiple countries - order *must* be preserved !
assert countries_to_list('Cipro Grecia Spagna') == ['Cyprus', 'Greece', 'Spain']
# punctuation and connectives, multiple countries - order *must* be preserved !
assert countries_to_list('Regno Unito, Italia ed estero') == ['United Kingdom', 'Italy', 'abroad']

### 1.2 Filling column Workplace Country

✪ Now create a new column `Workplace Country` with data calculated using the function you just defined.

To do it, see the [transform method in the Pandas worksheet](https://sciprog.davidleoni.it/pandas/pandas-sol.html#7.-Transforming)
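
As a reminder of how `transform` behaves, here is a minimal sketch on a made-up Series. The helper `first_word` is hypothetical and is not part of the exercise: like your function, it takes a **single** string, and `transform` calls it once for each cell.

```python
import pandas as pd

def first_word(s):
    """Hypothetical helper: takes a single string, returns its first word."""
    return s.split()[0]

# made-up data standing in for a dataframe column
places = pd.Series(['Italia Lazise', 'Francia'])

# transform applies the function to each element and returns a NEW Series
result = places.transform(first_word)
print(result.tolist())   # ['Italia', 'Francia']
```

The original Series is not modified, so the result has to be assigned somewhere, for example to a new dataframe column.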

In [7]:
# write here



In [8]:
# SOLUTION

offers['Workplace Country'] = offers['Workplace'].transform(countries_to_list)

In [9]:
print()
print("            *****************     SOLUTION OUTPUT     ********************")
offers 


            *****************     SOLUTION OUTPUT     ********************


Unnamed: 0,Reference,Workplace,Positions,Qualification,Contract type,Required languages,Gross retribution,Offer description,Workplace Country
0,18331901000024,Norvegia,6,Restaurant staff,Tempo determinato da maggio ad agosto,Inglese fluente + Vedi testo,Da 3500\nFr/\nmese,"We will be working together with sales, prepar...",[Norway]
1,083PZMM,Francia,1,Assistant export trilingue italien et anglais ...,Non specificato,Inglese; italiano; francese fluente,Da definire,Vos missions principales sont les suivantes : ...,[France]
2,4954752,Danimarca,1,Italian Sales Representative,Non specificato,Inglese; Italiano fluente,Da definire,"Minimum 2 + years sales experience, preferably...",[Denmark]
3,-,Berlino\nTrento,1,Apprendista perito elettronico; Elettrotecnico,Inizialmente contratto di apprendistato con po...,Inglese Buono (B1-B2); Tedesco base,Min 1000\nMax\n1170\n€/mese,Ti stai diplomando e/o stai cercando un primo ...,[]
4,10531631,Svezia,1,Italian speaking purchase,Non specificato,Inglese; italiano fluente,Da definire,"This is a varied Purchasing role, where your m...",[Sweden]
5,51485,Islanda,1,Pizza chef,Tempo determinato,Inglese Buono,Da definire,Job details/requirements: Experience in making...,[Iceland]
6,4956299,Danimarca,1,Regional Key account manager - Italy,Non specificato,Inglese; italiano fluente,Da definire,Requirements: possess good business acumen; ar...,[Denmark]
7,-,Italia\nLazise,1,Receptionist,Non specificato,Inglese; Tedesco fluente + Vedi testo,Min 1500€\nMax\n1800€\nnetto\nmese,"Camping Village Du Parc, Lazise,Italy is looki...",[Italy]
8,2099681,Irlanda,11,Customer Service Representative in Athens,Non specificato,Italiano fluente; Inglese buono,Da definire,Responsibilities: Solving customers queries by...,[Ireland]
9,12091902000474,Norvegia,1,Dispatch personnel,Maggio – agosto 2019,Inglese fluente + Vedi testo,Da definire,The Dispatch Team works outside in all weather...,[Norway]


## 2. Work dates

You will add columns holding the months when a job starts and when it ends.

### 2.1 from_to function

✪✪ First define the `from_to` function, which takes some text from the `"Contract type"` column and RETURNS a tuple holding the extracted month numbers (starting from ONE, not zero!)

Example: 

In this case the result is `(5, 8)` because May is the fifth month and August is the eighth:

```python
>>> from_to("Tempo determinato da maggio ad agosto")
(5,8)
```

If it is not possible to extract the months, the function should return a tuple holding NaNs:

```python
>>> from_to('Non specificato')
(np.nan, np.nan)
```

Beware that NaNs can lead to puzzling results: make sure you have read the NaNs and infinities section in the [Numpy Matrices notebook](https://en.softpython.org/matrices-numpy/matrices-numpy-sol.html#NaNs-and-infinities)

For other patterns to check, see asserts.
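
One of those puzzling results concerns tuples holding NaNs. The following sketch uses only the standard `math` module; as far as comparisons go, `np.nan` should behave like `math.nan` since both are plain Python floats. A NaN is never equal to itself, yet two tuples holding the very same NaN object compare as equal, because Python containers check identity before equality.

```python
import math

print(math.nan == math.nan)            # False: NaN is never equal to itself

# tuples compare their elements by identity first, and only then with ==
same = (math.nan, math.nan)
print(same == (math.nan, math.nan))    # True: the same NaN object on both sides

a = float('nan')
b = float('nan')
print((a,) == (b,))                    # False: two distinct NaN objects

# math.isnan is the reliable way to check
print(math.isnan(a))                   # True
```

This is why comparing a result against `(np.nan, np.nan)` only works as long as the function returns the `np.nan` object itself.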

In [10]:
months = ['gennaio', 'febbraio', 'marzo'    , 'aprile' , 'maggio'  , 'giugno',
          'luglio' , 'agosto'  , 'settembre', 'ottobre', 'novembre', 'dicembre' ]


def from_to(text):
    
    ntext = text.lower().replace('ad ', 'a ')
    
    found = False
    
    if 'da ' in ntext:
        from_pos = ntext.find('da ') + 3
        # search in the normalized text, so positions and letter case match
        from_month = ntext[from_pos:].split(' ')[0]
        if ' a ' in ntext:
            to_pos = ntext.find(' a ') + 3
            to_month = ntext[to_pos:].split(' ')[0]
            found = True
    if '–' in ntext:
        # left of the dash is the starting month, right of it the ending month
        from_month = ntext.split(' – ')[0]
        to_month = ntext.split(' – ')[1].split(' ')[0]
        found = True
        
    if found:
        from_number = months.index(from_month) + 1
        to_number = months.index(to_month) + 1   
        return (from_number,to_number)
    else:
        return (np.nan, np.nan)
    
    
assert from_to('Da maggio a settembre') == (5,9)
assert from_to('Da maggio ad ottobre') == (5, 10)
assert from_to('Tempo determinato da maggio ad agosto') == (5,8)
# Unspecified
assert from_to('Non specificato') == (np.nan, np.nan)
# WARNING: BE SUPERCAREFUL ABOUT THIS ONE: SYMBOL  –  IS *NOT* A MINUS !!
# COPY AND PASTE IT EXACTLY AS YOU FIND IT HERE 
# (BUT OF COURSE *DO NOT COPY* THE MONTH NAMES !)
assert from_to('Maggio – agosto 2019') == (5, 8)
# special case 'or', we just consider first interval and ignore the following one.
assert from_to('Da maggio a settembre o da giugno ad agosto')  == (5,9)
# special case only right side, we ignore all of it
assert from_to('Contratto stagionale fino a novembre 2019') == (np.nan, np.nan)

In [10]:
months = ['gennaio', 'febbraio', 'marzo'    , 'aprile' , 'maggio'  , 'giugno',
          'luglio' , 'agosto'  , 'settembre', 'ottobre', 'novembre', 'dicembre' ]


def from_to(text):
    raise Exception('TODO IMPLEMENT ME !')
    
assert from_to('Da maggio a settembre') == (5,9)
assert from_to('Da maggio ad ottobre') == (5, 10)
assert from_to('Tempo determinato da maggio ad agosto') == (5,8)
# Unspecified
assert from_to('Non specificato') == (np.nan, np.nan)
# WARNING: BE SUPERCAREFUL ABOUT THIS ONE: SYMBOL  –  IS *NOT* A MINUS !!
# COPY AND PASTE IT EXACTLY AS YOU FIND IT HERE 
# (BUT OF COURSE *DO NOT COPY* THE MONTH NAMES !)
assert from_to('Maggio – agosto 2019') == (5, 8)
# special case 'or', we just consider first interval and ignore the following one.
assert from_to('Da maggio a settembre o da giugno ad agosto')  == (5,9)
# special case only right side, we ignore all of it
assert from_to('Contratto stagionale fino a novembre 2019') == (np.nan, np.nan)

### 2.2. From To columns

✪ Change the `offers` dataframe so as to add `From` and `To` columns.

- **HINT 1**: You can call transform, see Transforming section in [Pandas worksheet](https://sciprog.davidleoni.it/pandas/pandas-sol.html#7.-Transforming)
- **HINT 2** : to extract the element you want from the tuple, you can pass to the transform a function on the fly with `lambda`.  See lambdas section in [Functions worksheet](https://sciprog.davidleoni.it/functions/functions-sol.html#Lambda-functions)
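
The idea in the hints can be sketched on made-up data. The helper `first_last` is hypothetical and only stands in for any function returning a tuple: the `lambda` calls it on each cell and then picks one element of the returned tuple.

```python
import pandas as pd

def first_last(text):
    """Hypothetical helper: returns a tuple with the first and last character."""
    return (text[0], text[-1])

# made-up data standing in for a dataframe column
words = pd.Series(['maggio', 'agosto'])

# each lambda calls the helper on a single cell and extracts one tuple element
firsts = words.transform(lambda w: first_last(w)[0])
lasts = words.transform(lambda w: first_last(w)[1])

print(firsts.tolist())   # ['m', 'a']
print(lasts.tolist())    # ['o', 'o']
```

The helper is called twice per cell, which is wasteful but perfectly acceptable for a dataset of this size.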

In [11]:
# write here

offers['From'] = offers['Contract type'].transform(lambda t: from_to(t)[0])
offers['To'] =  offers['Contract type'].transform(lambda t: from_to(t)[1])

In [11]:
# write here



In [13]:
print()
print(" ****************   SOLUTION OUTPUT  ****************")
offers


 ****************   SOLUTION OUTPUT  ****************


Unnamed: 0,Reference,Workplace,Positions,Qualification,Contract type,Required languages,Gross retribution,Offer description,Workplace Country,From,To
0,18331901000024,Norvegia,6,Restaurant staff,Tempo determinato da maggio ad agosto,Inglese fluente + Vedi testo,Da 3500\nFr/\nmese,"We will be working together with sales, prepar...",[Norway],5.0,8.0
1,083PZMM,Francia,1,Assistant export trilingue italien et anglais ...,Non specificato,Inglese; italiano; francese fluente,Da definire,Vos missions principales sont les suivantes : ...,[France],,
2,4954752,Danimarca,1,Italian Sales Representative,Non specificato,Inglese; Italiano fluente,Da definire,"Minimum 2 + years sales experience, preferably...",[Denmark],,
3,-,Berlino\nTrento,1,Apprendista perito elettronico; Elettrotecnico,Inizialmente contratto di apprendistato con po...,Inglese Buono (B1-B2); Tedesco base,Min 1000\nMax\n1170\n€/mese,Ti stai diplomando e/o stai cercando un primo ...,[],,
4,10531631,Svezia,1,Italian speaking purchase,Non specificato,Inglese; italiano fluente,Da definire,"This is a varied Purchasing role, where your m...",[Sweden],,
5,51485,Islanda,1,Pizza chef,Tempo determinato,Inglese Buono,Da definire,Job details/requirements: Experience in making...,[Iceland],,
6,4956299,Danimarca,1,Regional Key account manager - Italy,Non specificato,Inglese; italiano fluente,Da definire,Requirements: possess good business acumen; ar...,[Denmark],,
7,-,Italia\nLazise,1,Receptionist,Non specificato,Inglese; Tedesco fluente + Vedi testo,Min 1500€\nMax\n1800€\nnetto\nmese,"Camping Village Du Parc, Lazise,Italy is looki...",[Italy],,
8,2099681,Irlanda,11,Customer Service Representative in Athens,Non specificato,Italiano fluente; Inglese buono,Da definire,Responsibilities: Solving customers queries by...,[Ireland],,
9,12091902000474,Norvegia,1,Dispatch personnel,Maggio – agosto 2019,Inglese fluente + Vedi testo,Da definire,The Dispatch Team works outside in all weather...,[Norway],5.0,8.0


## 3. Required languages

Now we will try to extract required languages. 


### 3.1 function reqlan

✪✪✪ First implement function `reqlan` which, given a string from the `'Required languages'` column, produces a dictionary with the extracted languages and the associated level code in the CEFR standard (Common European Framework of Reference for Languages).

Example:

```python
>>> reqlan("Italiano; Francese fluente; Spagnolo buono")
{'italian': 'C1', 'french': 'C1', 'spanish': 'B2'}
```

To know what the Italian words are to be translated to, use the dictionaries provided in the following cell.

See tests for more cases to handle.

<div class="alert alert-warning">

**WARNING 1**: function takes a **single** string !!

</div>

<div class="alert alert-warning">

**WARNING 2: BE VERY CAREFUL WITH NaN input !**
</div>

The function might also take a NaN value (`math.nan` and `np.nan` behave the same way), in which case it should RETURN an empty dictionary:


```python
>>> reqlan(np.nan)
{}
```

If you are checking for a NaN, **DO NOT** write 

```python
if text == np.nan:   # WRONG !
```

To see why, do read [NaNs and Infinities section in Numpy Matrices worksheet](https://sciprog.davidleoni.it/matrices-numpy/matrices-numpy-sol.html#NaNs-and-infinities) !
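
Since the input can be either a string or a NaN, the check needs some care. The sketch below shows one possible way to do it, with a hypothetical helper named `is_missing`: calling `math.isnan` directly on a string raises a `TypeError`, so the type is tested first.

```python
import math

def is_missing(value):
    """Hypothetical helper: True only when value is a float NaN."""
    # math.isnan only accepts numbers, so check the type first
    return isinstance(value, float) and math.isnan(value)

print(is_missing(math.nan))            # True
print(is_missing('Inglese fluente'))   # False

# without the type check, a string input makes math.isnan fail
try:
    math.isnan('Inglese fluente')
    failed = False
except TypeError:
    failed = True
print(failed)                          # True
```

The `isinstance` test works because `and` stops at the first false condition, so `math.isnan` is never reached for strings.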
 


In [14]:

languages = {
 'italiano':'italian',
 'tedesco':'german',
 'francese':'french',
 'inglese':'english',
 'spagnolo':'spanish',
}

lang_levels = {
    'discreto':'B1',
    'buono':'B2',
    'fluente':'C1',
}

import math

def reqlan(text):
    
    # missing cells arrive as float NaN, not as strings
    if type(text) != str and math.isnan(text):
        return {}
    
    ret = {}
    ntext = text.lower().replace('+ vedi testo', '')
    ntext = ntext.replace('e/o','; ')
    ntext = ntext.replace(' e ','; ')
    words = ntext.replace(';','').split(' ')
    
    found_langs = []   # languages seen since the last level word
    for w in words:
        if w in languages:
            found_langs.append(w)
        # a level word may be singular ('fluente') or plural ('fluenti'):
        # we assume plurals replace the last character of the singular with 'i'
        label = None
        for singular in lang_levels:
            if w == singular or w == singular[:-1] + 'i':
                label = lang_levels[singular]
                break
        if label is not None:
            for lang in found_langs:                
                ret[languages[lang]] = label
            found_langs = []  # reset 

    return ret
    

# different languages may have different skills
assert reqlan("Italiano fluente; Inglese buono") == {'italian': 'C1',
                                                     'english': 'B2'}


# a sequence of languages terminating with a level is assumed to have that same level
assert reqlan("Inglese; italiano; francese fluente") == {'english': 'C1',
                                                         'italian':'C1',
                                                         'french' : 'C1'}

#  semicolon absence shouldn't be a problem
assert reqlan("Tedesco italiano discreto") == {
                                                'german':'B1',
                                                'italian': 'B1'
                                              }


# we can have multiple sequences
assert reqlan("Italiano; Francese fluente; Spagnolo buono") == {'italian': 'C1',
                                                                'french': 'C1',
                                                                'spanish': 'B2'}
# text after plus needs to be removed
assert reqlan("Inglese fluente + Vedi testo") == {'english': 'C1'}

# plural.
# NOTE: to do this, assume all plurals in the world 
# are constructed by substituting 'i' for the last character of singular words
assert reqlan("Tedesco e italiano fluenti") == {'german':'C1',
                                                'italian':'C1'}

# special case: we ignore codes in parentheses and just put B2
assert reqlan("Inglese Buono (B1-B2); Tedesco base") == {'english': 'B2'}

# e/o:   and / or case. We simplify and just list them as others

assert reqlan("Tedesco fluente; francese e/o italiano buono") == { 'german':'C1',
                                                                   'french':'B2',
                                                                   'italian':'B2'
                                                                  }
# of course there is a cell which is NaN  :P
assert reqlan(np.nan) == {}

In [14]:

languages = {
 'italiano':'italian',
 'tedesco':'german',
 'francese':'french',
 'inglese':'english',
 'spagnolo':'spanish',
}

lang_levels = {
    'discreto':'B1',
    'buono':'B2',
    'fluente':'C1',
}

def reqlan(text):
    raise Exception('TODO IMPLEMENT ME !')

# different languages may have different skills
assert reqlan("Italiano fluente; Inglese buono") == {'italian': 'C1',
                                                     'english': 'B2'}


# a sequence of languages terminating with a level is assumed to have that same level
assert reqlan("Inglese; italiano; francese fluente") == {'english': 'C1',
                                                         'italian':'C1',
                                                         'french' : 'C1'}

#  semicolon absence shouldn't be a problem
assert reqlan("Tedesco italiano discreto") == {
                                                'german':'B1',
                                                'italian': 'B1'
                                              }


# we can have multiple sequences
assert reqlan("Italiano; Francese fluente; Spagnolo buono") == {'italian': 'C1',
                                                                'french': 'C1',
                                                                'spanish': 'B2'}
# text after plus needs to be removed
assert reqlan("Inglese fluente + Vedi testo") == {'english': 'C1'}

# plural.
# NOTE: to do this, assume all plurals in the world 
# are constructed by substituting 'i' for the last character of singular words
assert reqlan("Tedesco e italiano fluenti") == {'german':'C1',
                                                'italian':'C1'}

# special case: we ignore codes in parentheses and just put B2
assert reqlan("Inglese Buono (B1-B2); Tedesco base") == {'english': 'B2'}

# e/o:   and / or case. We simplify and just list them as others

assert reqlan("Tedesco fluente; francese e/o italiano buono") == { 'german':'C1',
                                                                   'french':'B2',
                                                                   'italian':'B2'
                                                                  }
# of course there is a cell which is NaN  :P
assert reqlan(np.nan) == {}

### 3.2 Languages column

✪ Now add the `Languages` column using the previously defined `reqlan` function:

In [15]:
# write here

offers['Languages'] = offers['Required languages'].transform(reqlan)

In [15]:
# write here



In [16]:
print()
print("         *******************    SOLUTION OUTPUT   ***********************")
offers


         *******************    SOLUTION OUTPUT   ***********************


Unnamed: 0,Reference,Workplace,Positions,Qualification,Contract type,Required languages,Gross retribution,Offer description,Workplace Country,From,To,Languages
0,18331901000024,Norvegia,6,Restaurant staff,Tempo determinato da maggio ad agosto,Inglese fluente + Vedi testo,Da 3500\nFr/\nmese,"We will be working together with sales, prepar...",[Norway],5.0,8.0,{'english': 'C1'}
1,083PZMM,Francia,1,Assistant export trilingue italien et anglais ...,Non specificato,Inglese; italiano; francese fluente,Da definire,Vos missions principales sont les suivantes : ...,[France],,,"{'french': 'C1', 'english': 'C1', 'italian': '..."
2,4954752,Danimarca,1,Italian Sales Representative,Non specificato,Inglese; Italiano fluente,Da definire,"Minimum 2 + years sales experience, preferably...",[Denmark],,,"{'english': 'C1', 'italian': 'C1'}"
3,-,Berlino\nTrento,1,Apprendista perito elettronico; Elettrotecnico,Inizialmente contratto di apprendistato con po...,Inglese Buono (B1-B2); Tedesco base,Min 1000\nMax\n1170\n€/mese,Ti stai diplomando e/o stai cercando un primo ...,[],,,{'english': 'B2'}
4,10531631,Svezia,1,Italian speaking purchase,Non specificato,Inglese; italiano fluente,Da definire,"This is a varied Purchasing role, where your m...",[Sweden],,,"{'english': 'C1', 'italian': 'C1'}"
5,51485,Islanda,1,Pizza chef,Tempo determinato,Inglese Buono,Da definire,Job details/requirements: Experience in making...,[Iceland],,,{'english': 'B2'}
6,4956299,Danimarca,1,Regional Key account manager - Italy,Non specificato,Inglese; italiano fluente,Da definire,Requirements: possess good business acumen; ar...,[Denmark],,,"{'english': 'C1', 'italian': 'C1'}"
7,-,Italia\nLazise,1,Receptionist,Non specificato,Inglese; Tedesco fluente + Vedi testo,Min 1500€\nMax\n1800€\nnetto\nmese,"Camping Village Du Parc, Lazise,Italy is looki...",[Italy],,,"{'english': 'C1', 'german': 'C1'}"
8,2099681,Irlanda,11,Customer Service Representative in Athens,Non specificato,Italiano fluente; Inglese buono,Da definire,Responsibilities: Solving customers queries by...,[Ireland],,,"{'english': 'B2', 'italian': 'C1'}"
9,12091902000474,Norvegia,1,Dispatch personnel,Maggio – agosto 2019,Inglese fluente + Vedi testo,Da definire,The Dispatch Team works outside in all weather...,[Norway],5.0,8.0,{'english': 'C1'}
