<font size="3">

## Auto-completion system

The project implements an auto-completion system for words that represent technical skills.

The system suggests the skill most closely related to those the user has already entered. It can also suggest the completion of a skill while the user is typing it, based on the same principle.

### Workflow

1. If the user has not entered any skill yet (context = []), the system suggests the skills most similar to the typed input.


2. For subsequent strings (context $\neq$ []), the skills entered so far are taken into account: 
    - the similarity between the user's input $i$ and each skill not in the context is computed, where $S$ is the set of all skills and $C$ is the context:
    
    $$A = \{(s, sim(i, s)) \: | \: s \in S \: \& \: s \notin C\}$$<br>
    - for each skill not in the context, the average similarity between that skill and the skills in the context is computed:
    
    $$B = \{(s, sim_{avg}(s, C)) \: | \: s \in S \: \& \: s \notin C\}$$
    
    where $sim_{avg}(s, C) = \frac{\sum_{c \in C}sim(s, c)}{|C|}$
    <br>
    - for each skill not in the context, the overall similarity is computed as:
    
    $$sim_s = \alpha \cdot sim_A + (1-\alpha) \cdot sim_B \qquad \forall s \in S \:\&\: s \notin C$$<br>
    where $(s, sim_A) \in A$ and $(s, sim_B) \in B$


3. Skills are also suggested without taking the typed input into account, considering only the context; in this case the similarities are computed using only $B$.
</font>
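
The scoring in step 2 of the workflow can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_sim` is a hypothetical stand-in (character-set overlap) for the FastText similarity used in the notebook, and `rank_skills` and the sample skills are made up for the example.

```python
def toy_sim(a, b):
    """Hypothetical stand-in for sim(., .): Jaccard overlap of the character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def rank_skills(user_input, context, skills, alpha=0.8, top_k=2, sim=toy_sim):
    """Rank the skills not in the context by alpha * sim_A + (1 - alpha) * sim_B.

    Assumes a non-empty context (the empty case is step 1 of the workflow).
    """
    scores = {}
    for s in skills:
        if s in context:
            continue
        sim_a = sim(user_input, s)                               # entry of A
        sim_b = sum(sim(s, c) for c in context) / len(context)   # entry of B
        scores[s] = alpha * sim_a + (1 - alpha) * sim_b
    # Highest overall similarity first
    return sorted(scores.items(), key=lambda item: -item[1])[:top_k]


skills = ["python", "pandas", "java", "sql"]
ranking = rank_skills("pan", context=["python"], skills=skills)
print(ranking)
```

With `alpha` close to 1 the typed input dominates the ranking; with `alpha` close to 0 the context dominates, which approaches the context-only suggestions of step 3.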

In [1]:
import numpy as np                                                 # Array operations
import pandas as pd                                                # Similarity matrix as a DataFrame
from gensim.models import FastText                                 # Load FastText vector model
from sklearn.metrics.pairwise import cosine_similarity             # Compute similarity matrix
from IPython.display import display                                # Display UI components
from ipywidgets import Layout, Textarea, HBox, VBox, Label, Button # Widgets and components for UI

In [2]:
class FormUI:
    """
    Class for the UI: manages and shows the UI components, such as the text area and the buttons.

    ...
    Modules
    -------
    IPython.core.display
    ipywidgets.Layout
    ipywidgets.Textarea
    ipywidgets.HBox
    ipywidgets.VBox
    ipywidgets.Label
    ipywidgets.Button
    

    Attributes
    ----------
    layout_button : ipywidgets.Layout
        The buttons layout
    layout_buttonBox : ipywidgets.Layout
        The buttons box layout
    layout_textArea : ipywidgets.Layout
        The text area layout
    layout_textAreaBox : ipywidgets.Layout
        The text area box layout
    suggest_buttons : list
        List of buttons for suggestions based on context and input
    suggest_buttons_context : list
        List of buttons for suggestions based only on context
    textArea : ipywidgets.Textarea
        The text area widget
    suggests : ipywidgets.HBox
        Horizontal box for suggest_buttons
    suggests_context : ipywidgets.HBox
        Horizontal box for suggest_buttons_context
    form : ipywidgets.VBox
        Vertical box for the components

    Methods
    -------
    __init__()
        Constructor of the class, it initializes the various components
    show_form()
        Display the form
    init_button(description, button_style, tooltip)
        Initialize a button with description and text for tooltip
    close_suggest_buttons()
        Closes the buttons on the suggest_buttons list
    close_suggest_buttons_context()
        Closes the buttons on the suggest_buttons_context list
    """
    
    def __init__(self):
        """
        Constructor of the class, it initializes the various components.
        
        """
        
        # Layouts
        self.layout_button = Layout(width='25%', height='50px')
        self.layout_buttonBox = Layout(flex='0 1 auto', height='100px', width='90%')
        self.layout_textArea = Layout(height='100px', width='auto')
        self.layout_textAreaBox = Layout(margin='0 0 20px 0')
        
        # Lists button
        self.suggest_buttons = list()
        self.suggest_buttons_context = list()
        
        # Text area
        self.textArea = Textarea(
            value='',
            placeholder='Type something',
            disabled=False,
            tooltip='Enter the name of the Text field',
            layout=self.layout_textArea  # The height is set through the layout
        )
        
        # Box for suggest_buttons
        self.suggests = HBox(self.suggest_buttons, layout=self.layout_buttonBox)
        
        # Box for suggest_buttons_context
        self.suggests_context = HBox(self.suggest_buttons_context, layout=self.layout_buttonBox)
        
        
        # Box for all components
        self.form = VBox([VBox([Label(value='Skills:'), self.textArea], layout=self.layout_textAreaBox), 
                          HBox([Label(value='Suggests:'), self.suggests]),
                          HBox([Label(value='Similar skills:'), self.suggests_context])])
        
    def show_form(self):
        """
        Display the UI form.
        
        """
        display(self.form)
        
        
    
    def init_button(self, description, button_style, tooltip):
        '''
        Initialize a button with description and text for tooltip.

        Parameters
        ----------
        description : str
            The button's description text
        button_style : str
            The button's style, can be 'success', 'info', 'warning', 'danger' or ''
        tooltip : str
            The button's tooltip text
        
        Returns
        -------
        ipywidgets.Button
            The new button
        
        '''
        
        # Create button
        b = Button(
            description=description,
            disabled=False,
            button_style=button_style,  # 'success', 'info', 'warning', 'danger' or ''
            tooltip=tooltip,
            layout=self.layout_button
        )

        return b
    
    
    def close_suggest_buttons(self):
        '''
        Closes the buttons on the suggest_buttons list.
        
        '''
        for w in self.suggest_buttons:
            w.close()
        self.suggest_buttons = list()
    
    
    def close_suggest_buttons_context(self):
        '''
        Closes the buttons on the suggest_buttons_context list.
        
        '''
        for w in self.suggest_buttons_context:
            w.close()
        self.suggest_buttons_context = list()
   
    

In [14]:
class AutoCompleteManager:
    
    """
    Class for the auto-completion logic: computes the similarities between the user's input,
    the context and the candidate skills, and updates the suggestions shown in the UI.

    ...
    Modules
    -------
    numpy
    pandas
    gensim.models.FastText
    sklearn.metrics.pairwise.cosine_similarity
    

    Attributes
    ----------
    skills_list : list
        The list of skills
    model : gensim.models.FastText
        The gensim FastText model 
    cos_sim_matrix : pandas.DataFrame
        Similarity matrix as dataframe
    alpha : float
        The alpha parameter for similarity weighting
    num_suggests : int
        The number of suggestions to show
    FormUI : FormUI
        The UI form

    Methods
    -------
    __init__(skills_list, model, cos_sim_matrix, alpha=0.8, num_suggests=4)
        Constructor of the class, it initializes the various components
    get_skills_input_similarity(word, context)
        Compute the similarity between the user's input and the skills that are not in context
    get_skills_context_similarity(context)
        Compute the similarity between the context skills and other skills
    get_best_similarity_skill(new_input, context)
        Compute the overall similarity and return the most similar skills
    suggest_interest_skill(context)
        Suggests the skills most similar to the context
    add_skill(btn_object)
        Add the skill chosen by the user to the text area
    suggests_manager(textArea)
        Listener function for text area user's interaction
    show_form()
        Show the UI form and bind the text area to its listener suggests_manager
    """
    
    def __init__(self, skills_list, model, cos_sim_matrix, alpha=0.8, num_suggests=4):
        '''
        Constructor of the class, it initializes the various components.
        
        Parameters
        ----------
        skills_list : list
            The list of skills
        model : gensim.models.FastText
            The gensim FastText model
        cos_sim_matrix : pandas.DataFrame
            Similarity matrix as dataframe
        alpha : float, optional
            The alpha parameter for similarity weighting (default is 0.8)
        num_suggests : int, optional
            The number of suggestions to show (default is 4)

        Raises
        ------
        AssertionError
            If the alpha parameter is not between 0 and 1 (inclusive).
        AssertionError
            If the model parameter is not a FastText instance.
        AssertionError
            If the num_suggests parameter is not an int.
        AssertionError
            If the cos_sim_matrix parameter is not a pandas DataFrame.
        
        '''
        # Input assert check
        assert 0 <= alpha <= 1, "The alpha parameter must be between 0 and 1 (inclusive)"
        assert isinstance(model, FastText), "The model parameter must be a FastText instance"
        assert isinstance(num_suggests, int), "The num_suggests parameter must be a int"
        assert isinstance(cos_sim_matrix, pd.DataFrame), "The cos_sim_matrix parameter must be a pandas Dataframe"
        
        # Set the parameters
        self.alpha = alpha
        self.num_suggests = num_suggests
        self.skills_list = skills_list
        self.model = model
        self.cos_sim_matrix = cos_sim_matrix
        self.FormUI = FormUI()
        
    
    def get_skills_input_similarity(self, word, context):
        '''
        Compute the similarity between the user's input and the skills that are not in context.
        
        Parameters
        ----------
        word : str
            A single word
        context : list
            The skills context
        
        Returns
        -------
        dict
            A dictionary with keys the skills and values their similarity with input word        
        
        '''
        return {k: self.model.wv.similarity(word, k) for k in self.skills_list if k not in context}
        
        
    def get_skills_context_similarity(self, context):
        '''
        Compute the similarity between the context skills and other skills.
        
        Parameters
        ----------
        context : list
            The skills context
        
        Returns
        -------
        dict
            A dictionary with keys the skills and values the mean of similarity between the key
            and the skills in the context        
        
        '''
        res = dict()
        for s in self.skills_list:
            if s not in context:
                res[s] = np.array([self.model.wv.similarity(s, c) for c in context]).mean()
        return res
    
    
    def get_best_similarity_skill(self, new_input, context):
        '''
        Compute the overall similarity and return the most similar skills.
        
        Parameters
        ----------
        new_input : str
            The user's input
        context : list
            The skills context
        
        Returns
        -------
        dict
            A dictionary with the first num_suggests skills more similar to both the context and the 
            user's input       
        
        '''
        # Similarity between the input and every skill not in the context
        similarity_with_input = self.get_skills_input_similarity(new_input, context)
        res_sim = dict()
        
        # Average similarity between every skill and the skills in the context
        similarity_with_context = self.get_skills_context_similarity(context)
        
        # Overall similarity
        for skill in similarity_with_context.keys():
            res_sim[skill] = self.alpha*similarity_with_input[skill]+(1-self.alpha)*similarity_with_context[skill]
        
        return {k: v for k, v in sorted(res_sim.items(), key=lambda item: -item[1])[:self.num_suggests]}
        
        
    def suggest_interest_skill(self, context):
        '''
        Suggests the skills most similar to the context, create and show their buttons.
        
        Parameters
        ----------
        context : list
            The skills context
             
        '''
        #ll = list()
        #for word in [w for w in self.cos_sim_matrix.columns if w not in context]:
         #   ll.append((word, self.cos_sim_matrix.loc[word, context].mean()))
        
        # Get similarity
        similarity_context = self.get_skills_context_similarity(context)
        similarity_context = {k: v for k, v in sorted(similarity_context.items(), 
                                                      key=lambda item: -item[1])[:self.num_suggests]}
        
        # Create buttons
        for index, value in similarity_context.items():
            b = self.FormUI.init_button(description=f'{index} - \n{round(value * 100, 2)}%',
                                        button_style='info',
                                        tooltip=f'{index} - {round(value * 100, 2)}%')
            
            b.on_click(self.add_skill)
            self.FormUI.suggest_buttons_context.append(b)
        
        # Add buttons to form
        self.FormUI.suggests_context.children=tuple(self.FormUI.suggest_buttons_context)
    
    
    def add_skill(self, btn_object):
        '''
        Add the skill chosen by the user to the text area.
        
        Parameters
        ----------
        btn_object : ipywidgets.Button
            The pressed button    
        
        '''
        # Get the text area values
        skill_list = self.FormUI.textArea.value.split(', ')
        
        # Replace the partially typed skill with the chosen one
        skill_list[-1] = btn_object.description.split(" -")[0]
        new_context = ', '.join(skill_list) + ', '
        self.FormUI.textArea.value = new_context
        
        # Remove the buttons with old suggest
        self.FormUI.close_suggest_buttons()
        self.FormUI.close_suggest_buttons_context()
        
        # Compute the similarity with only the context
        self.suggest_interest_skill(skill_list)
        
        
    def suggests_manager(self, textArea):
        '''
        Listener function for text area user's interaction.
        
        Parameters
        ----------
        textArea : dict
            The dict-like change notification passed by observe(), holding the 'old'
            and 'new' values of the text area
        
        '''
        # Remove old buttons
        self.FormUI.close_suggest_buttons()
        
        # Split the text into the context (completed skills) and the skill being typed
        tokens = textArea['new'].split(', ')
        context = tokens[:-1]
        new_input = tokens[-1]
        
        if len(new_input)>3:
            if context == []:
                # If context is empty, compute similarity only with the new input
                similarity_input = self.get_skills_input_similarity(new_input, context)
                similarity_input = dict(sorted(similarity_input.items(), 
                                                    key=lambda x: -x[1])[:self.num_suggests])

                for index, value in similarity_input.items():
                    b = self.FormUI.init_button(description=f'{index} - \n{round(value * 100, 2)}%',
                                                button_style='success',
                                                tooltip=f'{index} - {round(value * 100, 2)}%')

                    b.on_click(self.add_skill)
                    self.FormUI.suggest_buttons.append(b)
            else:
                # If context is not empty, compute similarity with the new input and the context
                best_similarity = self.get_best_similarity_skill(new_input, context)
                
                for index, value in best_similarity.items():
                    b = self.FormUI.init_button(description=f'{index} - \n{round(value * 100, 2)}%',
                                                button_style='success',
                                                tooltip=f'{index} - {round(value * 100, 2)}%')
                    b.on_click(self.add_skill)
                    self.FormUI.suggest_buttons.append(b)
        
        # Show buttons with new suggest
        self.FormUI.suggests.children=tuple(self.FormUI.suggest_buttons)
        
    
    def show_form(self):
        '''
        Show the UI form and bind the text area to its listener suggests_manager.
        
        '''
        self.FormUI.show_form()
        self.FormUI.textArea.observe(self.suggests_manager, names='value')
    

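
The text area holds the skills as a single comma-separated string, so the listener has to separate the completed skills (the context) from the skill still being typed. A minimal sketch of that split, assuming the `', '` separator used by `add_skill`; the helper name `split_text_area` is hypothetical and not part of the class:

```python
def split_text_area(value, separator=", "):
    """Split the text area value into (context, current_input)."""
    tokens = value.split(separator)
    # Everything before the last separator is a completed skill;
    # the last token is what the user is typing right now.
    return tokens[:-1], tokens[-1]


empty = split_text_area("")
first_word = split_text_area("pyth")
second_word = split_text_area("python, jav")
after_click = split_text_area("python, java, ")
print(empty, first_word, second_word, after_click)
```

After a suggestion button is clicked the value ends with the separator, so the current input is the empty string and no input-based suggestion is triggered until the user types again.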
In [4]:
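from gensim.models.fasttext import load_facebook_model

skills_list = pd.read_excel("data/2020_06_09 Allocation to ONET.xlsx")['escoskill_level_3']
# load_facebook_model replaces the deprecated FastText.load_fasttext_format
vectors = load_facebook_model("data/ft_vectors_cbow_50_10_0_05.bin")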
skills_vectors = {k: vectors.wv[k] for k in skills_list}
cos_sim_matrix = pd.DataFrame(cosine_similarity(np.array(list(skills_vectors.values()))),
                                   columns=list(skills_vectors.keys()),
                                   index=list(skills_vectors.keys()))
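
The similarity matrix built above is passed to `AutoCompleteManager`, while the class methods query `model.wv.similarity` directly. As a hedged sketch of how the average context similarity could instead be read from such a matrix (in the spirit of the commented-out lines in `suggest_interest_skill`), the example below uses made-up 2-dimensional vectors in place of the FastText ones and computes the cosine similarity with plain NumPy, which is the same quantity `cosine_similarity` returns:

```python
import numpy as np
import pandas as pd

# Hypothetical 2-dimensional vectors standing in for the FastText skill vectors
toy_vectors = {
    "python": np.array([1.0, 0.0]),
    "pandas": np.array([1.0, 1.0]),
    "welding": np.array([0.0, 1.0]),
}
names = list(toy_vectors)

# Cosine similarity: dot products of the row-normalized vectors
stacked = np.array(list(toy_vectors.values()))
unit = stacked / np.linalg.norm(stacked, axis=1, keepdims=True)
toy_matrix = pd.DataFrame(unit @ unit.T, columns=names, index=names)

# Average similarity between each skill outside the context and the context skills
context = ["python"]
candidates = [s for s in names if s not in context]
context_similarity = toy_matrix.loc[candidates, context].mean(axis=1)
print(context_similarity.sort_values(ascending=False))
```

Looking the values up in a precomputed matrix avoids recomputing pairwise similarities on every keystroke, at the cost of holding a matrix whose size grows with the square of the number of skills.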

  


In [13]:
"big data" in skills_list

False
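
Note that `skills_list` is a pandas Series (a single column returned by `read_excel`), and the `in` operator on a Series tests the index labels, not the stored values. The check above would therefore return `False` whether or not the skill is present in the column. A small sketch with made-up data:

```python
import pandas as pd

toy_skills = pd.Series(["big data", "python"])  # made-up values, default index 0 and 1

in_index = "big data" in toy_skills             # checks the index labels
in_values = "big data" in toy_skills.values     # checks the stored values
label_found = 0 in toy_skills                   # 0 is an index label
print(in_index, in_values, label_found)
```

To test membership among the values, check against `.values` or convert the Series to a list first.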

In [12]:
vectors.wv.most_similar("big data")

[('dataproc', 0.5866081118583679),
 ('dataand', 0.577002763748169),
 ('petabyte_scale_data', 0.5674172043800354),
 ('petabytes_data', 0.5668173432350159),
 ('spark_amazon', 0.5335162281990051),
 ('ofdata', 0.5292026400566101),
 ('bench_scale', 0.5282772183418274),
 ('akta', 0.5262991189956665),
 ('cell_culture_scientist', 0.5238085985183716),
 ('gda', 0.5196707248687744)]

In [15]:
t = AutoCompleteManager(skills_list, vectors, cos_sim_matrix, alpha=0.8, num_suggests=4)
t.show_form()

VBox(children=(VBox(children=(Label(value='Skills:'), Textarea(value='', layout=Layout(height='100px', width='…

In [29]:

import asyncio
def wait_for_change(widget, value):
    future = asyncio.Future()
    def getvalue(change):
        # make the new value available
        future.set_result(change.new)
        widget.unobserve(getvalue, value)
    widget.observe(getvalue, value)
    return future

from ipywidgets import IntSlider, Output
slider = IntSlider()
out = Output()

async def f():
    for i in range(10):
        out.append_stdout('did work ' + str(i) + '\n')
        x = await wait_for_change(slider, 'value')
        out.append_stdout('async function continued with value ' + str(x) + '\n')
asyncio.ensure_future(f())

slider

IntSlider(value=0)

In [30]:

out

Output(outputs=({'output_type': 'stream', 'name': 'stdout', 'text': 'did work 0\n'},))