# **Automatically categorize questions**

## Part 3/8: Tag prediction, unsupervised approach

### Keyword suggestion with an LDA-type model, with 2D visualization of the topics



### Library imports and settings


In [1]:
import os, sys, random
# from zipfile import ZipFile
import numpy as np
import pandas as pd
from pandarallel import pandarallel

# Visualisation
import matplotlib.pyplot as plt
# import seaborn as sns
import plotly.express as px

# Feature engineering
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

# The number of workers is hard-coded below; adjust nb_workers to your machine if necessary
num_cores = os.cpu_count()
print(f"\nNumber of CPU cores: {num_cores}")
pandarallel.initialize(progress_bar=False, nb_workers=6)



Number of CPU cores: 8
INFO: Pandarallel will run on 6 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


### Functions


In [2]:
def get_missing_values(df):
    """Generates a DataFrame containing the count and proportion of missing values for each feature.

    Args:
        df (pandas.DataFrame): The input DataFrame to analyze.

    Returns:
        pandas.DataFrame: A DataFrame with columns for the feature name, count of missing values,
        count of non-missing values, proportion of missing values, and data type for each feature.
    """
    # Count the missing values for each column
    missing = df.isna().sum()

    # Calculate the percentage of missing values
    percent_missing = df.isna().mean() * 100

    # Create a DataFrame to store the results
    missings_df = pd.DataFrame({
        'column_name': df.columns,
        'missing': missing,
        'present': df.shape[0] - missing,  # Count of non-missing values
        'percent_missing': percent_missing.round(2),  # Rounded to 2 decimal places
        'type': df.dtypes
    })

    # Sort the DataFrame by the count of missing values
    missings_df.sort_values('missing', inplace=True)

    return missings_df

# with pd.option_context('display.max_rows', 1000):
#   display(get_missing_values(df))


def quick_look(df, miss=True):
    """
    Display a quick overview of a DataFrame, including shape, head, tail, unique values, and duplicates.

    Args:
        df (pandas.DataFrame): The input DataFrame to inspect.
        miss (bool, optional): Whether to display the missing-value summary (default is True).

    The function prints the shape of the DataFrame, then displays its first and last rows,
    the count of unique values per column, and the number of duplicated rows.
    If `miss` is True, it also displays the output of `get_missing_values`.
    """
    print(f'shape : {df.shape}')

    display(df.head())
    display(df.tail())

    print('uniques :')
    display(df.nunique())

    print('Doublons ? ', df.duplicated(keep='first').sum(), '\n')

    if miss:
        display(get_missing_values(df))
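
To make the summary returned by `get_missing_values` concrete, here is a minimal sketch that rebuilds the same kind of table on a toy DataFrame. The data is hypothetical (not taken from the dataset), and `notna().sum()` is used in place of `df.shape[0] - missing`, which should be equivalent.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data) with one missing title
toy = pd.DataFrame({
    "title_nltk": ["sort array", np.nan, "use blur image"],
    "body_nltk": ["sort array data", "null try get location", "use blur image"],
})

# Same columns as get_missing_values, minus 'column_name',
# which would only repeat the index
summary = pd.DataFrame({
    "missing": toy.isna().sum(),
    "present": toy.notna().sum(),
    "percent_missing": (toy.isna().mean() * 100).round(2),
    "type": toy.dtypes,
}).sort_values("missing")

print(summary)
```

Because the column names already form the index of the result, the extra `column_name` column in `get_missing_values` shows the same labels a second time in the displayed tables.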



### End of preprocessing


In [3]:
# import

train_bow_classic = pd.read_csv('./../../data/cleaned_data/0_preprocessed_text/train_bow_classic.csv', sep=',')
test_bow_classic = pd.read_csv('./../../data/cleaned_data/0_preprocessed_text/test_bow_classic.csv', sep=',')

quick_look(train_bow_classic)


shape : (43016, 8)


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way error s...
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make packag...,package c,look way package folder studio express way pac...
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply blur...,use uivisualeffectview,example apply blur image try figure code uivis...
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference question sort array ...


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override way force child class...,way method,way force child class override method class ne...
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert web method push b...
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like use col...,use attribute annotation doctrine,like use string error error annotate support miss
43015,2016-03-19 18:27:38,Localizing string resources added via build.gr...,This is in continuation to an answer which hel...,"['android', 'android-studio', 'android-gradle-...",localize string resource add build gradle use,localize string resource add build gradle use ...,localize string resource add build.gradle,continuation answer help post add string resou...


uniques :


CreationDate    43012
title           43015
body            43016
all_tags        41627
title_nltk      42561
body_nltk       43016
title_spacy     37767
body_spacy      43011
dtype: int64

Doublons ?  0 



Unnamed: 0,column_name,missing,present,percent_missing,type
CreationDate,CreationDate,0,43016,0.0,object
title,title,0,43016,0.0,object
body,body,0,43016,0.0,object
all_tags,all_tags,0,43016,0.0,object
body_nltk,body_nltk,0,43016,0.0,object
body_spacy,body_spacy,0,43016,0.0,object
title_nltk,title_nltk,1,43015,0.0,object
title_spacy,title_spacy,7,43009,0.02,object


In [4]:
missing = train_bow_classic.loc[(train_bow_classic['title_nltk'].isna()) |
                                (train_bow_classic['title_spacy'].isna()), :]

print(missing.index)
display(missing)


Index([4532, 8280, 12992, 14957, 22934, 24964, 25950], dtype='int64')


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
4532,2014-10-13 16:31:47,Laravel Eloquent OR WHERE IS NOT NULL,I am using the Laravel Administrator package f...,"['php', 'sql', 'laravel', 'eloquent', 'adminis...",eloquent,eloquent use laravel administrator package sto...,,package story run issue display result delete ...
8280,2015-04-22 11:41:34,Why is FusedLocationApi.getLastLocation null,I am trying to get location by using FusedLoca...,"['android', 'android-4.4-kitkat', 'android-loc...",,null try get location use get location permiss...,,try location location permission file use perm...
12992,2013-08-09 14:16:44,Using IS NULL and COALESCE in OrderBy Doctrine...,I basically have the following (My)SQL-Query\n...,"['mysql', 'symfony', 'doctrine-orm', 'doctrine...",use coalesce doctrine,use coalesce doctrine query select order compa...,,follow address order company job target follow...
14957,2016-08-17 23:26:55,Spring Boot multipartfile always null,I am using Spring Boot version = '1.4.0.RC1' w...,"['java', 'spring-mvc', 'spring-boot', 'retrofi...",spring boot multipartfile,spring boot multipartfile use spring boot vers...,,version try use file upload controller use inf...
22934,2014-03-27 21:18:08,Sqlite NULL and unique?,I noticed that I can have NULL values in colum...,"['sql', 'sqlite', 'null', 'unique', 'unique-co...",null unique,null notice value column constraint col genera...,,notice value column constraint generate issue ...
24964,2014-02-24 20:47:00,MVC HttpPostedFileBase always null,I have this controller and what I am trying to...,"['c#', 'asp.net', 'asp.net-mvc', 'asp.net-mvc-...",httppostedfilebase,httppostedfilebase controller try send image c...,,controller try send image controller byte cont...
25950,2012-11-28 01:42:30,Android Notification PendingIntent Extras null,I am trying to send information from notificat...,"['android', 'android-intent', 'bundle', 'andro...",notification pendingintent extra null,notification pendingintent try send informatio...,,try send information notification activity act...


In [5]:
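def fix_false_null_values(df):
    """Replace NaN in the cleaned title columns with the literal token 'null' (in place).

    Every affected title contains the word 'null', which pandas.read_csv
    treats as a missing value by default.
    """
    df.loc[df['title_nltk'].isna(), 'title_nltk'] = 'null'
    df.loc[df['title_spacy'].isna(), 'title_spacy'] = 'null'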


fix_false_null_values(train_bow_classic)
fix_false_null_values(test_bow_classic)

# Check for null values in the entire DataFrame
null_values = train_bow_classic[train_bow_classic.isnull().any(axis=1)]

# Print the rows with null values
print(null_values)


Empty DataFrame
Columns: [CreationDate, title, body, all_tags, title_nltk, body_nltk, title_spacy, body_spacy]
Index: []
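
The false missing values come from the import step: by default `pd.read_csv` parses the strings `null` and `NULL` as NaN. As an alternative to patching the columns afterwards, the sketch below reads the data with `keep_default_na=False`. The two-row CSV is a hypothetical stand-in for the preprocessed file, not the real data.

```python
import io
import pandas as pd

# Hypothetical CSV mimicking the preprocessed file: the second title
# was reduced to the single token "null" by the cleaning step.
csv_text = (
    "title_nltk,body_nltk\n"
    "sort array data php,sort array data\n"
    "null,null try get location\n"
)

# Default behaviour: a cell equal to "null" becomes NaN
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False: the token is kept as an ordinary string
literal = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["title_nltk"].isna().sum())
print(literal["title_nltk"].tolist())
```

Only cells whose entire content is `null` are affected, which is why `body_nltk` never shows missing values here. One caveat of this option is that genuinely empty cells are then read as empty strings instead of NaN, so it only suits files where no value is truly missing.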


In [6]:
quick_look(train_bow_classic)


shape : (43016, 8)


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way error s...
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make packag...,package c,look way package folder studio express way pac...
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply blur...,use uivisualeffectview,example apply blur image try figure code uivis...
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference question sort array ...


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override way force child class...,way method,way force child class override method class ne...
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert web method push b...
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like use col...,use attribute annotation doctrine,like use string error error annotate support miss
43015,2016-03-19 18:27:38,Localizing string resources added via build.gr...,This is in continuation to an answer which hel...,"['android', 'android-studio', 'android-gradle-...",localize string resource add build gradle use,localize string resource add build gradle use ...,localize string resource add build.gradle,continuation answer help post add string resou...


uniques :


CreationDate    43012
title           43015
body            43016
all_tags        41627
title_nltk      42562
body_nltk       43016
title_spacy     37768
body_spacy      43011
dtype: int64

Doublons ?  0 



Unnamed: 0,column_name,missing,present,percent_missing,type
CreationDate,CreationDate,0,43016,0.0,object
title,title,0,43016,0.0,object
body,body,0,43016,0.0,object
all_tags,all_tags,0,43016,0.0,object
title_nltk,title_nltk,0,43016,0.0,object
body_nltk,body_nltk,0,43016,0.0,object
title_spacy,title_spacy,0,43016,0.0,object
body_spacy,body_spacy,0,43016,0.0,object


In [7]:
index = [4532, 8280, 12992, 14957, 22934, 24964, 25950]

display(train_bow_classic.loc[train_bow_classic.index.isin(index), :])

# OK


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
4532,2014-10-13 16:31:47,Laravel Eloquent OR WHERE IS NOT NULL,I am using the Laravel Administrator package f...,"['php', 'sql', 'laravel', 'eloquent', 'adminis...",eloquent,eloquent use laravel administrator package sto...,null,package story run issue display result delete ...
8280,2015-04-22 11:41:34,Why is FusedLocationApi.getLastLocation null,I am trying to get location by using FusedLoca...,"['android', 'android-4.4-kitkat', 'android-loc...",null,null try get location use get location permiss...,null,try location location permission file use perm...
12992,2013-08-09 14:16:44,Using IS NULL and COALESCE in OrderBy Doctrine...,I basically have the following (My)SQL-Query\n...,"['mysql', 'symfony', 'doctrine-orm', 'doctrine...",use coalesce doctrine,use coalesce doctrine query select order compa...,null,follow address order company job target follow...
14957,2016-08-17 23:26:55,Spring Boot multipartfile always null,I am using Spring Boot version = '1.4.0.RC1' w...,"['java', 'spring-mvc', 'spring-boot', 'retrofi...",spring boot multipartfile,spring boot multipartfile use spring boot vers...,null,version try use file upload controller use inf...
22934,2014-03-27 21:18:08,Sqlite NULL and unique?,I noticed that I can have NULL values in colum...,"['sql', 'sqlite', 'null', 'unique', 'unique-co...",null unique,null notice value column constraint col genera...,null,notice value column constraint generate issue ...
24964,2014-02-24 20:47:00,MVC HttpPostedFileBase always null,I have this controller and what I am trying to...,"['c#', 'asp.net', 'asp.net-mvc', 'asp.net-mvc-...",httppostedfilebase,httppostedfilebase controller try send image c...,null,controller try send image controller byte cont...
25950,2012-11-28 01:42:30,Android Notification PendingIntent Extras null,I am trying to send information from notificat...,"['android', 'android-intent', 'bundle', 'andro...",notification pendingintent extra null,notification pendingintent try send informatio...,null,try send information notification activity act...


In [8]:
quick_look(test_bow_classic)


shape : (4780, 8)


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
0,2015-04-27 13:42:58,Switch in Laravel 5 - Blade,How can I use switch in blade templates? When ...,"['php', 'laravel', 'switch-statement', 'larave...",switch blade,switch use switch blade template use switch ca...,switch blade,use switch blade template e mail input input r...
1,2012-04-11 11:32:58,How to play a notification sound on websites?,"When a certain event occurs, I want my website...","['javascript', 'html', 'audio', 'notifications...",play notification sound website,play notification sound website event occur wa...,play notification sound website,event occur want website play notification sou...
2,2019-12-29 09:42:48,IApplicationBuilder does not contain a definit...,I'm following an example to configure AspNet C...,"['c#', 'asp.net-core', '.net-core', 'asp.net-c...",iapplicationbuilder contain definition,iapplicationbuilder contain definition follow ...,contain definition,follow example configure code startup.cs file ...
3,2012-11-15 09:39:14,How to use join with multiple conditions in li...,I have two classes (Request & RequestDetail). ...,"['c#', 'linq', 'nhibernate', 'join', 'linq-to-...",use join condition linq nhibernate,use join condition linq nhibernate class reque...,use join condition,class need class join var var purpose line err...
4,2012-01-19 16:02:38,Why is the first element always blank in my Ra...,"I'm using Rails 3.2.0.rc2. I've got a Model, i...","['html', 'ruby-on-rails', 'forms', 'serializat...",element blank multi use array,element blank multi use array use rc2 get mode...,element rail multi embed array,array offer form user select subset save selec...


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
4775,2012-11-02 22:45:46,How to create an edge list from a matrix in R?,The relationship is expressed as a matrix x li...,"['r', 'list', 'matrix', 'social-networking', '...",create list matrix r,create list matrix r relationship express matr...,create edge list matrix r,relationship express matrix b c b c d entry re...
4776,2017-01-14 03:33:52,How do you compare just the time of a Date in ...,I have two Date Objects:\n\n2017-01-13 11:40:1...,"['swift', 'sorting', 'date', 'datetime', 'time']",compare time date swift,compare time date swift date object need compa...,compare time,object need compare time value ignore date exa...
4777,2013-09-10 03:07:07,The authenticity of host 'github.com (192.30.2...,I am trying to use\nsudo npm install\n\nto ins...,"['node.js', 'git', 'ubuntu', 'github', 'ssh-ke...",host github com n establish,host github com n establish try use npm instal...,authenticity host establish,try use sudo install dependency application wr...
4778,2012-12-28 21:03:32,Xcode archive not creating DSYM file,"For most of my projects, I setup an ""archive"" ...","['ios', 'xcode', 'archive', 'debug-symbols', '...",create dsym file,create dsym file project setup scheme archive ...,archive create file,project setup scheme archive project create up...
4779,2013-04-24 15:11:48,Multiple UITableview in Single Viewcontroller,I have a viewcontroller in that i want to show...,"['ios', 'objective-c', 'uitableview', 'uiview'...",uitableview viewcontroller,uitableview viewcontroller viewcontroller want...,uitableview viewcontroller,viewcontroller want content table property add...


uniques :


CreationDate    4780
title           4780
body            4780
all_tags        4739
title_nltk      4769
body_nltk       4780
title_spacy     4485
body_spacy      4780
dtype: int64

Doublons ?  0 



Unnamed: 0,column_name,missing,present,percent_missing,type
CreationDate,CreationDate,0,4780,0.0,object
title,title,0,4780,0.0,object
body,body,0,4780,0.0,object
all_tags,all_tags,0,4780,0.0,object
title_nltk,title_nltk,0,4780,0.0,object
body_nltk,body_nltk,0,4780,0.0,object
title_spacy,title_spacy,0,4780,0.0,object
body_spacy,body_spacy,0,4780,0.0,object


In [9]:
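def add_bow_representation(train_df, test_df, feature, display_on=True):
    """Append bag-of-words count columns for `feature` to the train and test DataFrames.

    The vocabulary is learned on the training set only and reused for the test set.
    Returns the tuple (result_train, result_test) as new DataFrames.
    """
    # Create a CountVectorizer instance
    vectorizer = CountVectorizer()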

    # Fit and transform the training data
    X_train = vectorizer.fit_transform(train_df[feature])

    # Convert the result to a DataFrame
    bow_train_df = pd.DataFrame(X_train.toarray(), columns=vectorizer.get_feature_names_out())

    # Reset index to align with the original DataFrame
    bow_train_df.reset_index(drop=True, inplace=True)

    # Concatenate the bag of words DataFrame with the original DataFrame
    result_train = pd.concat([train_df.reset_index(drop=True), bow_train_df], axis=1)

    if display_on:
        display(result_train)

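    # Transform the test data with the vocabulary learned on the training set
    # (no refit, so nothing from the test set leaks into the features)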
    X_test = vectorizer.transform(test_df[feature])

    # Convert the result to a DataFrame
    bow_test_df = pd.DataFrame(X_test.toarray(), columns=vectorizer.get_feature_names_out())

    # Reset index to align with the original DataFrame
    bow_test_df.reset_index(drop=True, inplace=True)

    # Concatenate the bag of words DataFrame with the original DataFrame
    result_test = pd.concat([test_df.reset_index(drop=True), bow_test_df], axis=1)


    return result_train, result_test


train_bow_classic_title_nltk, test_bow_classic_title_nltk = add_bow_representation(train_bow_classic,
                                                                                   test_bow_classic,
                                                                                   'title_nltk')


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy,__attribute__,__bridge,...,zone,zoneddatetime,zoneid,zookeeper,zoom,zooming,zsh,zshrc,zuul,zxing
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way error s...,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make packag...,package c,look way package folder studio express way pac...,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply blur...,use uivisualeffectview,example apply blur image try figure code uivis...,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference question sort array ...,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...,0,0,...,0,0,0,0,0,0,0,0,0,0
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override way force child class...,way method,way force child class override method class ne...,0,0,...,0,0,0,0,0,0,0,0,0,0
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert web method push b...,0,0,...,0,0,0,0,0,0,0,0,0,0
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like use col...,use attribute annotation doctrine,like use string error error annotate support miss,0,0,...,0,0,0,0,0,0,0,0,0,0


In [10]:
train_bow_classic_title_nltk


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy,__attribute__,__bridge,...,zone,zoneddatetime,zoneid,zookeeper,zoom,zooming,zsh,zshrc,zuul,zxing
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way error s...,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make packag...,package c,look way package folder studio express way pac...,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply blur...,use uivisualeffectview,example apply blur image try figure code uivis...,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference question sort array ...,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...,0,0,...,0,0,0,0,0,0,0,0,0,0
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override way force child class...,way method,way force child class override method class ne...,0,0,...,0,0,0,0,0,0,0,0,0,0
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert web method push b...,0,0,...,0,0,0,0,0,0,0,0,0,0
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like use col...,use attribute annotation doctrine,like use string error error annotate support miss,0,0,...,0,0,0,0,0,0,0,0,0,0


In [11]:
test_bow_classic_title_nltk

Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy,__attribute__,__bridge,...,zone,zoneddatetime,zoneid,zookeeper,zoom,zooming,zsh,zshrc,zuul,zxing
0,2015-04-27 13:42:58,Switch in Laravel 5 - Blade,How can I use switch in blade templates? When ...,"['php', 'laravel', 'switch-statement', 'larave...",switch blade,switch use switch blade template use switch ca...,switch blade,use switch blade template e mail input input r...,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2012-04-11 11:32:58,How to play a notification sound on websites?,"When a certain event occurs, I want my website...","['javascript', 'html', 'audio', 'notifications...",play notification sound website,play notification sound website event occur wa...,play notification sound website,event occur want website play notification sou...,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2019-12-29 09:42:48,IApplicationBuilder does not contain a definit...,I'm following an example to configure AspNet C...,"['c#', 'asp.net-core', '.net-core', 'asp.net-c...",iapplicationbuilder contain definition,iapplicationbuilder contain definition follow ...,contain definition,follow example configure code startup.cs file ...,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2012-11-15 09:39:14,How to use join with multiple conditions in li...,I have two classes (Request & RequestDetail). ...,"['c#', 'linq', 'nhibernate', 'join', 'linq-to-...",use join condition linq nhibernate,use join condition linq nhibernate class reque...,use join condition,class need class join var var purpose line err...,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2012-01-19 16:02:38,Why is the first element always blank in my Ra...,"I'm using Rails 3.2.0.rc2. I've got a Model, i...","['html', 'ruby-on-rails', 'forms', 'serializat...",element blank multi use array,element blank multi use array use rc2 get mode...,element rail multi embed array,array offer form user select subset save selec...,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4775,2012-11-02 22:45:46,How to create an edge list from a matrix in R?,The relationship is expressed as a matrix x li...,"['r', 'list', 'matrix', 'social-networking', '...",create list matrix r,create list matrix r relationship express matr...,create edge list matrix r,relationship express matrix b c b c d entry re...,0,0,...,0,0,0,0,0,0,0,0,0,0
4776,2017-01-14 03:33:52,How do you compare just the time of a Date in ...,I have two Date Objects:\n\n2017-01-13 11:40:1...,"['swift', 'sorting', 'date', 'datetime', 'time']",compare time date swift,compare time date swift date object need compa...,compare time,object need compare time value ignore date exa...,0,0,...,0,0,0,0,0,0,0,0,0,0
4777,2013-09-10 03:07:07,The authenticity of host 'github.com (192.30.2...,I am trying to use\nsudo npm install\n\nto ins...,"['node.js', 'git', 'ubuntu', 'github', 'ssh-ke...",host github com n establish,host github com n establish try use npm instal...,authenticity host establish,try use sudo install dependency application wr...,0,0,...,0,0,0,0,0,0,0,0,0,0
4778,2012-12-28 21:03:32,Xcode archive not creating DSYM file,"For most of my projects, I setup an ""archive"" ...","['ios', 'xcode', 'archive', 'debug-symbols', '...",create dsym file,create dsym file project setup scheme archive ...,archive create file,project setup scheme archive project create up...,0,0,...,0,0,0,0,0,0,0,0,0,0


In [13]:
# Export the NLTK title bag-of-words frames to CSV
train_bow_classic_title_nltk.to_csv('./../../data/cleaned_data/1_bow_classic/train_bow_classic_title_nltk.csv', sep=',', index=False)
test_bow_classic_title_nltk.to_csv('./../../data/cleaned_data/1_bow_classic/test_bow_classic_title_nltk.csv', sep=',', index=False)


In [12]:
# Bag-of-words on the spaCy-preprocessed titles
train_bow_classic_title_spacy, test_bow_classic_title_spacy = add_bow_representation(train_bow_classic,
                                                                                     test_bow_classic,
                                                                                     'title_spacy')


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy,11,12,...,zero,zip,zlib,zombie,zone,zoneddatetime,zookeeper,zoom,zooming,zsh
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way error s...,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make packag...,package c,look way package folder studio express way pac...,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply blur...,use uivisualeffectview,example apply blur image try figure code uivis...,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference question sort array ...,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...,0,0,...,0,0,0,0,0,0,0,0,0,0
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override way force child class...,way method,way force child class override method class ne...,0,0,...,0,0,0,0,0,0,0,0,0,0
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert web method push b...,0,0,...,0,0,0,0,0,0,0,0,0,0
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like use col...,use attribute annotation doctrine,like use string error error annotate support miss,0,0,...,0,0,0,0,0,0,0,0,0,0


In [13]:
# Check: train and test must have the same number of columns
print(train_bow_classic_title_spacy.shape)
print(test_bow_classic_title_spacy.shape)


(43016, 4823)
(4780, 4823)


In [15]:
# Export the spaCy title bag-of-words frames to CSV
train_bow_classic_title_spacy.to_csv('./../../data/cleaned_data/1_bow_classic/train_bow_classic_title_spacy.csv', sep=',', index=False)
test_bow_classic_title_spacy.to_csv('./../../data/cleaned_data/1_bow_classic/test_bow_classic_title_spacy.csv', sep=',', index=False)


In [15]:
# Bag-of-words on the NLTK-preprocessed bodies
train_bow_classic_body_nltk, test_bow_classic_body_nltk = add_bow_representation(
    train_bow_classic, test_bow_classic, 'body_nltk'
)


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy,01,02,...,zsh,zshrc,zuul,zxing,zygoteinit,zza,zzz,µs,ôö,ôöçôöç
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way error s...,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make packag...,package c,look way package folder studio express way pac...,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply blur...,use uivisualeffectview,example apply blur image try figure code uivis...,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference question sort array ...,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...,0,0,...,0,0,0,0,0,0,0,0,0,0
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override way force child class...,way method,way force child class override method class ne...,0,0,...,0,0,0,0,0,0,0,0,0,0
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert web method push b...,0,0,...,0,0,0,0,0,0,0,0,0,0
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like use col...,use attribute annotation doctrine,like use string error error annotate support miss,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
# Check: train and test must have the same number of columns
print(train_bow_classic_body_nltk.shape)
print(test_bow_classic_body_nltk.shape)


(43016, 11850)
(4780, 11850)


In [17]:
# Export the NLTK body bag-of-words frames to CSV
train_bow_classic_body_nltk.to_csv('./../../data/cleaned_data/1_bow_classic/train_bow_classic_body_nltk.csv', sep=',', index=False)
test_bow_classic_body_nltk.to_csv('./../../data/cleaned_data/1_bow_classic/test_bow_classic_body_nltk.csv', sep=',', index=False)


In [17]:
# Bag-of-words on the spaCy-preprocessed bodies
train_bow_classic_body_spacy, test_bow_classic_body_spacy = add_bow_representation(train_bow_classic,
                                                                                   test_bow_classic,
                                                                                   'body_spacy')


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy,00,0000021c4a995ba0,...,zsh,zw,µs,ôö,ôöçôöç,быть,может,не,поле,пустым
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way error s...,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make packag...,package c,look way package folder studio express way pac...,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply blur...,use uivisualeffectview,example apply blur image try figure code uivis...,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference question sort array ...,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...,0,0,...,0,0,0,0,0,0,0,0,0,0
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override way force child class...,way method,way force child class override method class ne...,0,0,...,0,0,0,0,0,0,0,0,0,0
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert web method push b...,0,0,...,0,0,0,0,0,0,0,0,0,0
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like use col...,use attribute annotation doctrine,like use string error error annotate support miss,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
# Check: train and test must have the same number of columns
print(train_bow_classic_body_spacy.shape)
print(test_bow_classic_body_spacy.shape)


(43016, 7699)
(4780, 7699)


In [19]:
# Export the spaCy body bag-of-words frames to CSV
train_bow_classic_body_spacy.to_csv('./../../data/cleaned_data/1_bow_classic/train_bow_classic_body_spacy.csv', sep=',', index=False)
test_bow_classic_body_spacy.to_csv('./../../data/cleaned_data/1_bow_classic/test_bow_classic_body_spacy.csv', sep=',', index=False)
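
The export cells all follow the same pattern: one CSV per frame, named after the variable. As a possible simplification, the sketch below groups the frames in a dictionary and writes them in a loop. `export_frames` is a hypothetical helper, and the toy frames and temporary directory are assumptions made so the example runs on its own; in the notebook the target would be the `1_bow_classic` folder used above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_frames(frames, out_dir):
    """Write each DataFrame to <out_dir>/<name>.csv (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in frames.items():
        path = out_dir / f"{name}.csv"
        # Same options as the export cells: comma separator, no index column
        df.to_csv(path, sep=",", index=False)
        paths.append(path)
    return paths


# Toy frames standing in for the real bag-of-words DataFrames
toy_frames = {
    "train_bow_classic_body_spacy": pd.DataFrame({"zsh": [0, 1], "zoom": [1, 0]}),
    "test_bow_classic_body_spacy": pd.DataFrame({"zsh": [0], "zoom": [0]}),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    written = export_frames(toy_frames, tmp_dir)
    written_names = [p.name for p in written]
    # Read one file back to confirm that the columns survive the round trip
    reloaded = pd.read_csv(written[0])

print(written_names)
```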
