## **3. Pondération statistique** (TF-IDF / OKapiBM25)  

https://stackoverflow.com/questions/46580932/calculate-tf-idf-using-sklearn-for-n-grams-in-python  
http://scikit-learn.sourceforge.net/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn-feature-extraction-text-tfidfvectorizer  
https://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction

https://pypi.org/project/rank-bm25/

In [34]:
path = '../04-filtrage/output/'
acteur = 'msss'
tag = ''

if tag:
    csv_file = acteur + '_' + tag + '_significant-collocations.csv'

else:
    csv_file = acteur + '_significant-collocations.csv'

### **Lire le vocabulaire** (termes retenus au prétraitement)

In [35]:
from pandas import *

with open(path+csv_file, encoding='utf-8') as f:
    csv = read_csv(f)[["Collocation", "Structure syntaxique", "Fréquence"]] # "LLR", "p-value"]]

csv

Unnamed: 0,Collocation,Structure syntaxique,Fréquence
0,services sociaux,NOM ADJ,10268
1,ministère de la santé,NOM PRP DET:ART NOM,3698
2,santé publique,NOM ADJ,3559
3,indicateurs de gestion,NOM PRP NOM,2876
4,répertoire des indicateurs,NOM PRP:det ADJ,2542
...,...,...,...
37594,igg adapté,NOM VER:pper,4
37595,sens du piq directive,NOM PRP:det NOM ADJ,4
37596,jeunes ayant un tsa,NOM VER:ppre DET:ART NOM,4
37597,priorités nationales pour l'année,NOM ADJ PRP DET:ART NOM,4


In [36]:
vocabulaire = [t.lower() for t in list(csv['Collocation'])]

In [37]:
print('On a un vocabulaire de {} formes.'.format(len(vocabulaire)))
vocabulaire

On a un vocabulaire de 37599 formes.


['services sociaux',
 'ministère de la santé',
 'santé publique',
 'indicateurs de gestion',
 'répertoire des indicateurs',
 'répertoire des indicateurs de gestion',
 'pandémie de la covid',
 'santé et services sociaux',
 'santé et services',
 'gestion en santé',
 'gestion en santé et services sociaux',
 'gestion en santé et services',
 'indicateurs de gestion en santé',
 'réseau de la santé',
 'soutien à domicile',
 'méthode de calcul',
 'bilan de la dernière journée',
 'bilan de la dernière',
 'santé mentale',
 'prélèvements réalisés',
 'santé physique',
 'disponibilité des données',
 'soins intensifs',
 'décision de la demande',
 'ministre de la santé',
 'institut national',
 'nombre total',
 'santé publique du québec',
 'année financière',
 'gestionnaire principal',
 'institut national de santé',
 'institut national de santé publique',
 'doses administrées',
 'jeunes en difficulté',
 'services sociaux msss',
 'renseignements administratifs',
 'objectif cible',
 "statut de l'indicat

### **Lire le corpus**

In [38]:
import os, shutil, re
from pathlib import Path
from os import path
from pandas import *

base_path = '../03-corpus/2-data/1-fr/'
if tag:
    base_path = path.join(base_path, acteur, acteur + '_' + tag + '.csv')

else:
    base_path = path.join(base_path, acteur +  '.csv')
        
with open(base_path, "r", encoding = "UTF-8") as f:
    data = read_csv(base_path, sep=',')
    text = data['text'].tolist()

In [39]:
text = text[:round(len(text))]

nb_docs = len(text)

print("On a donc un corpus de {} documents.".format(nb_docs))

On a donc un corpus de 4859 documents.


### **Nettoyage**

In [40]:
corpus = [str(t).strip('\n').lower().replace('’', '\'') for t in text]
    
punct = '[!#$%&\(\)•►*+,-\/:;<=>?@[\]^_{|}~©«»—“”–—]'
spaces = '\s+'
postals = '([a-zA-Z]+\d+|\d+[a-zA-Z]+)+'
phones = '\d{3}\s\d{3}-\d{4}' #très simple (trop)

corpus = [re.sub(punct, ' ', t).replace("' ", "'" ).replace("'", "'") for t in corpus]
corpus = [re.sub(spaces, ' ', t) for t in corpus]
corpus = [str(t).strip('\n').lower().replace('’', '\'') for t in corpus]
corpus = [re.sub(spaces, ' ', t) for t in corpus]
corpus = [re.sub(phones, ' ', t) for t in corpus]
corpus = [re.sub(postals, ' ', t) for t in corpus]
corpus = [re.sub(punct, ' ', t) for t in corpus]
corpus = [t.replace("  ", " " ) for t in corpus]

### **Appliquer le prétraitement**
Si les termes passées comme vocabulaire sont lemmatisés, changer le paramètre lem pour True au moment d'appliquer la fonction nlp(corpus)  
Le TfIdfVectorizer de sklearn va extraire lui-mêmes les ngrammes, faire le filtrage des mots fonctionnels et calculer le tf-idf pour nos termes d'intérêt ;  
Or, si les termes qu'on lui donne comme vocabulaire ont été lemmatisés, on veut donc aussi lui passer un corpus lemmatisé.

In [41]:
import nltk
from nltk.tokenize import RegexpTokenizer
from french_lefff_lemmatizer.french_lefff_lemmatizer import FrenchLefffLemmatizer

def nlp(corpus, lem=False): 
    if not lem:
        # Tokenisation
        tokenizer = RegexpTokenizer(r"\w\'|\w+")

        tokens = [tokenizer.tokenize(doc) for  doc in corpus]
        len_corpus = len(nltk.flatten(tokens))
        print("Avec le RegExpTokenizer, notre corpus contient {} tokens.".format(len_corpus))

        return tokens

    else:
        # POS tagging
        input = [" ".join(nltk.flatten(doc)).replace("' ", "'") for doc in tokens]
        import treetaggerwrapper
        tagger = treetaggerwrapper.TreeTagger(TAGLANG='fr')

        path = '../04-filtrage/mapping_treeTagger_lefff.csv'

        with open(path) as f:
            csv = read_csv(f)

        treeTag = [term for term in csv['TreeTagger'].tolist()] 
        lefff = [term for term in csv['Lefff'].tolist()]

        mapping = {term : lefff[treeTag.index(term)] for term in treeTag}

        tagged= [tagger.tag_text(doc) for doc in input]

        tuples_doc = []
        for doc in tagged:
            tuples = []
            for t in doc:
                token = t.split('\t')[0]
                pos = mapping[t.split('\t')[1]]

                tuples.append([token, pos])
            tuples_doc.append(tuples)

        #Lemmatisation
        lemmatizer = FrenchLefffLemmatizer()
        docs_lemmas = []

        for doc in tuples_doc:
            doc_lemma = []
            for t in doc:
                term_lemmatized = ""
                if(lemmatizer.lemmatize(t[0], t[1]) == []):
                    term_lemmatized = lemmatizer.lemmatize(t[0])
                else:
                    term_lemmatized = lemmatizer.lemmatize(t[0], t[1])[0][0] # [0][0] pour avoir le lemme seul et non (lemme, pos)
            
                if len(term_lemmatized) >1 :
                    doc_lemma.append(term_lemmatized)
            docs_lemmas.append(doc_lemma)

        docs_lemmas = [" ".join(doc) for doc in docs_lemmas]

        return docs_lemmas

In [42]:
# corpus = nlp(corpus)

In [43]:
file_path = '../04-filtrage/mwe_stopwords.txt'

with open (file_path, 'r', encoding='utf-8') as f:
    mwe_sw = [t.lower().strip('\n') for t in f.readlines()]

In [44]:
from sklearn.feature_extraction.text import TfidfVectorizer

# max_df : ignore words that appear in 85% of documents, 
# min df:  ignore words that appear in less than 1% of documents 
# vocabulary = vocabulaire

# Sans utiliser le vocabulaire
# tfidf = TfidfVectorizer(min_df=0.1, stop_words=None, ngram_range=(2,4), max_df=0.85, use_idf=True)

# def identity_tokenizer(text):
#     return text
# tokenizer=identity_tokenizer

# vocabulary = vocabulaire
tfidf = TfidfVectorizer(vocabulary = set(vocabulaire),ngram_range=(1,12), use_idf=True, lowercase=False)
tfs = tfidf.fit_transform(corpus)

KeyboardInterrupt: 

In [None]:
features_names = tfidf.get_feature_names_out()
corpus_index = [corpus.index(n) for n in corpus]

import pandas as pd
df = pd.DataFrame(tfs.T.todense(), index=features_names, columns=corpus_index).transpose()

In [None]:
df

Unnamed: 0,abitibi témiscamingue,accès aux services,accès aux services de santé,accès à un service,accès à un service en clsc,accès à un service en clsc dans les délais,aide et soutien,aide et soutien à l'entourage,aide et soutien à la désintoxication,aide et soutien à la récupération,...,établissements du réseau,établissements du réseau de la santé,établissements du réseau de la santé et des services,établissements du réseau de la santé et des services sociaux,évaluation de la candidature,évaluation des avantages,évaluation des avantages d'ensemble,évaluation des conducteurs,évaluation des conducteurs saaq,évaluation des conducteurs saaq dernière
0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
160,0.0,0.080817,0.080817,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.080817,0.080817,0.080817,0.080817,0.0,0.0,0.0,0.0,0.0,0.0
159,0.0,0.080817,0.080817,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.080817,0.080817,0.080817,0.080817,0.0,0.0,0.0,0.0,0.0,0.0
162,0.0,0.080817,0.080817,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.080817,0.080817,0.080817,0.080817,0.0,0.0,0.0,0.0,0.0,0.0
163,0.0,0.080817,0.080817,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.080817,0.080817,0.080817,0.080817,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# from pathlib import Path

# base_path = '../05-transformation/' + acteur + '/'
# Path(base_path).mkdir(parents=True, exist_ok=True)

# if sous_corpus:
#     path = base_path + tag + '/'
#     titre = tag

# else:
#     titre = acteur

# df.to_csv(base_path + titre + '_matrice-TFIDF.csv')

In [None]:
terms_weighted = []
rows, cols = tfs.nonzero()
for row, col in zip(rows,cols):
    terms_weighted.append([features_names[col], tfs[row,col]])

terms_weighted = DataFrame(terms_weighted, columns=['Collocation', 'TF-IDF'])
terms_weighted.sort_values(["TF-IDF"], 
                    axis=0,
                    ascending=[False], 
                    inplace=True)

In [None]:
terms_weighted = terms_weighted.drop_duplicates(keep='first')

terms_weighted

Unnamed: 0,Collocation,TF-IDF
142,jeu pathologique,0.950191
1222,abitibi témiscamingue,0.908013
707,centre de santé,0.859981
697,côte nord,0.833842
688,abitibi témiscamingue,0.799067
...,...,...
392,ministère de la santé et des services,0.028359
403,alcool drogues jeu,0.028359
170,alcool drogues,0.027397
394,jeu pathologique,0.024043


In [None]:
terms_weighted = pd.merge(csv, terms_weighted, on='Collocation').drop_duplicates(
  subset = ['Collocation', 'Fréquence'],
  keep = 'first').reset_index(drop = True)

In [None]:
terms_weighted

Unnamed: 0,Collocation,Structure syntaxique,Fréquence,TF-IDF
0,répertoire des ressources en dépendances,NOM PRP:det NOM PRP NOM,140,0.403545
1,fiche de la ressource,NOM PRP DET:ART NOM,118,0.267657
2,services sociaux,NOM ADJ,88,0.479329
3,durée du programme,NOM PRP:det NOM,85,0.391911
4,ressource certifiée en dépendances,NOM VER:pper PRP NOM,80,0.344905
...,...,...,...,...
259,intervention précoce en dépendance,NOM ADJ PRP NOM,6,0.080817
260,prévention de la rechute,NOM PRP DET:ART NOM,6,0.482743
261,réadaptation dans un délai,NOM PRP DET:ART NOM,6,0.080817
262,indicateurs associés aux egi,NOM VER:pper PRP:det NOM,6,0.080817


**Test : clustering (documents)**

In [None]:
def cluster_text(corpus):
    # vocabulary = vocabulaire
    vectorizer = TfidfVectorizer(vocabulary = set(vocabulaire), tokenizer=identity_tokenizer, ngram_range=(2,5), use_idf=True, lowercase=False, stop_words= mwe_sw)
    X = vectorizer.fit_transform(corpus)

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    Sum_of_squared_distances = []
    K = range(2,10)
    for k in K:
       km = KMeans(n_clusters=k, max_iter=200, n_init=10)
       km = km.fit(X)
       Sum_of_squared_distances.append(km.inertia_)
    plt.plot(K, Sum_of_squared_distances, 'bx-')
    plt.xlabel('k')
    plt.ylabel('Sum_of_squared_distances')
    plt.title('Elbow Method For Optimal k')
    plt.show()

    print('How many clusters do you want to use?')
    true_k = int(input())
    model = KMeans(n_clusters=true_k, init='k-means++', max_iter=200, n_init=10)
    model.fit(X)

    labels=model.labels_
    clusters=pd.DataFrame(list(zip(text,labels)),columns=['title','cluster'])
    #print(clusters.sort_values(by=['cluster']))

    for i in range(true_k):
        print(clusters[clusters['cluster'] == i])
        clusters.to_csv('../06-clustering/' + acteur + '_clusters.csv')
        
    return

In [None]:
#cluster_text(corpus)

## **OKapi BM25**
https://hal.archives-ouvertes.fr/hal-00760158 

In [None]:
from rank_bm25 import BM25Okapi

In [None]:
corpus = nlp(corpus)
bm25 = BM25Okapi(corpus)

Avec le RegExpTokenizer, notre corpus contient 29706 tokens.


In [None]:
#tokenizer = RegexpTokenizer(r"\w\'|\w+")
tokenized_queries = [t.split() for t in set(vocabulaire)]

features_names = [t for t in set(vocabulaire)]
corpus_index = [corpus.index(n) for n in corpus]

tab = [bm25.get_scores(query) for query in tokenized_queries]
df = pd.DataFrame(tab, index=features_names, columns=corpus_index).transpose()

In [None]:
tokenized_queries

[['soutien', 'à', 'la', 'désintoxication', 'en', 'externe', 'réinsertion'],
 ['nord', 'du', 'québec'],
 ['service', 'en', 'clsc'],
 ['problèmes',
  "d'alcoolisme",
  'et',
  'de',
  'toxicomanie',
  'pour',
  'les',
  'problèmes'],
 ['services', 'sociaux', 'ententes', 'de', 'gestion'],
 ['attente', "d'un", 'service'],
 ['gestion', 'le', 'répertoire'],
 ['jeunes', 'à', 'faire'],
 ['matière', 'de', 'consommation', 'et', 'de', 'pratique', 'des', 'jha'],
 ['télécopieur', 'ressource', 'publique'],
 ['gestion', 'en', 'santé'],
 ['ministres', 'des', 'provinces', 'et', 'territoires'],
 ['matière', 'de', 'santé', 'mentale', 'et', 'de', 'traitement'],
 ['évaluation', 'des', 'avantages'],
 ['dépendance', 'en', 'centre', 'de', 'réadaptation', 'dans', 'un', 'délai'],
 ['québec', 'fiche'],
 ['humaniste', 'dernière'],
 ['estrie', 'fiche'],
 ['fédération', 'pour', "l'innovation", 'en', 'matière', 'de', 'santé'],
 ['capitale', 'nationale'],
 ['toxicomanie', 'pour', 'les', 'problèmes', 'de', 'jeu'],
 ['

In [None]:
df

Unnamed: 0,soutien à la désintoxication en externe réinsertion,nord du québec,service en clsc,problèmes d'alcoolisme et de toxicomanie pour les problèmes,services sociaux ententes de gestion,attente d'un service,gestion le répertoire,jeunes à faire,matière de consommation et de pratique des jha,télécopieur ressource publique,...,engagement social,âge pour découvrir,gouvernement fédéral,ministère de la santé et des services,entretien motivationnel,rencontre estivale,innovation en matière,numéro sans frais,campagne d'information et de sensibilisation,réseau de la santé
0,4.477143,0.898129,0.553512,8.050746,2.102319,0.000000,1.123891,5.434189,20.192904,0.000000,...,6.894985,5.025365,0.000000,9.766396,0.0,0.0,2.120745,0.0,11.958490,4.831060
1,6.020985,3.318077,1.895358,9.277304,4.637206,0.000000,1.308684,1.803576,10.950599,0.772407,...,0.000000,1.261930,4.013301,12.150063,0.0,0.0,1.895358,0.0,4.072666,8.074890
2,0.000000,1.606623,0.000000,2.814782,0.000000,0.000000,0.000000,0.000000,1.606623,0.000000,...,0.000000,0.000000,0.000000,1.606623,0.0,0.0,0.000000,0.0,1.606623,0.000000
0,4.477143,0.898129,0.553512,8.050746,2.102319,0.000000,1.123891,5.434189,20.192904,0.000000,...,6.894985,5.025365,0.000000,9.766396,0.0,0.0,2.120745,0.0,11.958490,4.831060
4,5.476973,1.166901,1.565952,7.186745,2.148735,0.000000,1.142209,5.682340,20.523067,0.000000,...,0.000000,5.041086,0.000000,8.921313,0.0,0.0,3.202148,0.0,11.406755,4.068124
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
160,4.747790,2.655602,7.094467,6.282793,15.766458,4.135697,7.949222,1.063688,8.147650,0.000000,...,0.000000,0.924442,1.804517,12.974884,0.0,0.0,1.999220,0.0,4.091342,7.946272
159,4.747790,2.655602,7.094467,6.282793,15.766458,4.135697,7.949222,1.063688,8.147650,0.000000,...,0.000000,0.924442,1.804517,12.974884,0.0,0.0,1.999220,0.0,4.091342,7.946272
162,4.747790,2.655602,7.094467,6.282793,15.766458,4.135697,7.949222,1.063688,8.147650,0.000000,...,0.000000,0.924442,1.804517,12.974884,0.0,0.0,1.999220,0.0,4.091342,7.946272
163,4.747790,2.655602,7.094467,6.282793,15.766458,4.135697,7.949222,1.063688,8.147650,0.000000,...,0.000000,0.924442,1.804517,12.974884,0.0,0.0,1.999220,0.0,4.091342,7.946272


In [None]:
#df.to_csv(base_path + titre + '_matrice-OkapiBM25.csv') # Si on veut avoir la matrice (mais le fichier peut être très volumineux)

In [None]:
terms_okapi = {term: df[term].max() for term in df}

In [None]:
terms_okapi

{'soutien à la désintoxication en externe réinsertion': 11.831030612265158,
 'nord du québec': 7.784250628570667,
 'service en clsc': 7.0944673420293975,
 "problèmes d'alcoolisme et de toxicomanie pour les problèmes": 14.666534162118145,
 'services sociaux ententes de gestion': 15.7664580483457,
 "attente d'un service": 4.135697077133087,
 'gestion le répertoire': 7.94922152746782,
 'jeunes à faire': 5.682339584698672,
 'matière de consommation et de pratique des jha': 20.52306693613287,
 'télécopieur ressource publique': 4.364461737961852,
 'gestion en santé': 10.031992769937602,
 'ministres des provinces et territoires': 10.57737579618191,
 'matière de santé mentale et de traitement': 15.450186487643254,
 'évaluation des avantages': 4.93309930239116,
 'dépendance en centre de réadaptation dans un délai': 14.554826038285725,
 'québec fiche': 3.271841348059141,
 'humaniste dernière': 4.168647152367172,
 'estrie fiche': 5.187118142360289,
 "fédération pour l'innovation en matière de san

In [None]:
tab = DataFrame(terms_okapi.items(), columns=['Collocation', 'OkapiBM25'])
tab

Unnamed: 0,Collocation,OkapiBM25
0,soutien à la désintoxication en externe réinse...,11.831031
1,nord du québec,7.784251
2,service en clsc,7.094467
3,problèmes d'alcoolisme et de toxicomanie pour ...,14.666534
4,services sociaux ententes de gestion,15.766458
...,...,...
364,rencontre estivale,4.405745
365,innovation en matière,7.201582
366,numéro sans frais,9.250429
367,campagne d'information et de sensibilisation,11.958490


In [None]:
tab.sort_values(["OkapiBM25"], 
                    axis=0,
                    ascending=[False], 
                    inplace=True)

tab = pd.merge(terms_weighted, tab, on="Collocation")
tab.sort_values(["OkapiBM25"], 
                    axis=0,
                    ascending=[False], 
                    inplace=True)

tab

Unnamed: 0,Collocation,Structure syntaxique,Fréquence,TF-IDF,OkapiBM25
170,indicateurs de gestion utilisés par le ministè...,NOM PRP NOM VER:pper PRP DET:ART NOM PRP DET:A...,6,0.080817,27.872834
135,règlement sur la certification des ressources ...,NOM PRP DET:ART NOM PRP:det NOM ADJ,8,0.173450,21.886570
183,indicateurs de gestion utilisés par le ministère,NOM PRP NOM VER:pper PRP DET:ART NOM,6,0.080817,21.671002
92,répertoire des indicateurs de gestion en santé,NOM PRP:det NOM PRP NOM PRP NOM,12,0.161635,21.502882
105,indicateurs de gestion en santé et services,NOM PRP NOM PRP NOM KON NOM,12,0.161635,21.142464
...,...,...,...,...,...
232,jours ouvrables,NOM ADJ,6,0.080817,2.311282
6,réinsertion sociale,NOM ADJ,72,0.554374,2.007392
5,ressource certifiée,NOM VER:pper,80,0.344905,1.860217
10,jours approche,NOM VER:pres,64,0.389097,1.708479


In [None]:
base_path = '../05-transformation/'

In [None]:
if tag:
    file_path = base_path + acteur + '_' + tag + '_weighting_OKapiBM25.csv'

else: 
    file_path = base_path + acteur  + '_weighting_OKapiBM25.csv'
tab.to_csv(file_path)

In [None]:
tab

Unnamed: 0,Collocation,Structure syntaxique,Fréquence,TF-IDF,OkapiBM25
170,indicateurs de gestion utilisés par le ministè...,NOM PRP NOM VER:pper PRP DET:ART NOM PRP DET:A...,6,0.080817,27.872834
135,règlement sur la certification des ressources ...,NOM PRP DET:ART NOM PRP:det NOM ADJ,8,0.173450,21.886570
183,indicateurs de gestion utilisés par le ministère,NOM PRP NOM VER:pper PRP DET:ART NOM,6,0.080817,21.671002
92,répertoire des indicateurs de gestion en santé,NOM PRP:det NOM PRP NOM PRP NOM,12,0.161635,21.502882
105,indicateurs de gestion en santé et services,NOM PRP NOM PRP NOM KON NOM,12,0.161635,21.142464
...,...,...,...,...,...
232,jours ouvrables,NOM ADJ,6,0.080817,2.311282
6,réinsertion sociale,NOM ADJ,72,0.554374,2.007392
5,ressource certifiée,NOM VER:pper,80,0.344905,1.860217
10,jours approche,NOM VER:pres,64,0.389097,1.708479
