# **Catégorisez automatiquement des questions**

## partie 3/8 : Prédiction de tags, approche non-supervisée

### <br> Proposition de mots clés, de type LDA avec visualisation en 2D des topics

<br>


### Importation des librairies, réglages


In [11]:
import os, sys, random
# from zipfile import ZipFile
import numpy as np
import pandas as pd
from pandarallel import pandarallel

# Visualisation
import matplotlib.pyplot as plt
# import seaborn as sns
import plotly.express as px

# Feature engineering
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

# Modify if necessary
num_cores = os.cpu_count()
print(f"\nNumber of CPU cores: {num_cores}")
pandarallel.initialize(progress_bar=False, nb_workers=6)



Number of CPU cores: 8
INFO: Pandarallel will run on 6 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


### Fonctions


In [12]:
def get_missing_values(df):
    """Generates a DataFrame containing the count and proportion of missing values for each feature.

    Args:
        df (pandas.DataFrame): The input DataFrame to analyze.

    Returns:
        pandas.DataFrame: A DataFrame with columns for the feature name, count of missing values,
        count of non-missing values, proportion of missing values, and data type for each feature.
    """
    # Count the missing values for each column
    missing = df.isna().sum()

    # Calculate the percentage of missing values
    percent_missing = df.isna().mean() * 100

    # Create a DataFrame to store the results
    missings_df = pd.DataFrame({
        'column_name': df.columns,
        'missing': missing,
        'present': df.shape[0] - missing,  # Count of non-missing values
        'percent_missing': percent_missing.round(2),  # Rounded to 2 decimal places
        'type': df.dtypes
    })

    # Sort the DataFrame by the count of missing values
    missings_df.sort_values('missing', inplace=True)

    return missings_df

# with pd.option_context('display.max_rows', 1000):
#   display(get_missing_values(df))


def quick_look(df, miss=True):
    """
    Display a quick overview of a DataFrame, including shape, head, tail, unique values, and duplicates.

    Args:
        df (pandas.DataFrame): The input DataFrame to inspect.
        check_missing (bool, optional): Whether to check and display missing values (default is True).

    The function provides a summary of the DataFrame, including its shape, the first and last rows, the count of unique values per column, and the number of duplicates.
    If `check_missing` is set to True, it also displays missing value information.
    """
    print(f'shape : {df.shape}')

    display(df.head())
    display(df.tail())

    print('uniques :')
    display(df.nunique())

    print('Doublons ? ', df.duplicated(keep='first').sum(), '\n')

    if miss:
        display(get_missing_values(df))



### fin du preprocessing


In [13]:
# import

train_bow_uniques = pd.read_csv('./../../data/cleaned_data/0_preprocessed_text/train_bow_uniques.csv', sep=',')
test_bow_uniques = pd.read_csv('./../../data/cleaned_data/0_preprocessed_text/test_bow_uniques.csv', sep=',')

quick_look(train_bow_uniques)


shape : (43016, 8)


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way subscri...
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make folder...,package c,look way package folder studio express know pr...
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply try ...,use uivisualeffectview,example apply blur image try figure code uivis...
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference sort array think cas...


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override force child class nee...,way method,way force child class override method need cre...
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert method push base ...
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like column ...,use attribute annotation doctrine,like use string error annotate support miss
43015,2016-03-19 18:27:38,Localizing string resources added via build.gr...,This is in continuation to an answer which hel...,"['android', 'android-studio', 'android-gradle-...",localize string resource add build gradle use,localize string resource add build gradle use ...,localize string resource add build.gradle,continuation answer help post add string resou...


uniques :


CreationDate    43012
title           43015
body            43016
all_tags        41627
title_nltk      42537
body_nltk       43016
title_spacy     37644
body_spacy      43010
dtype: int64

Doublons ?  0 



Unnamed: 0,column_name,missing,present,percent_missing,type
CreationDate,CreationDate,0,43016,0.0,object
title,title,0,43016,0.0,object
body,body,0,43016,0.0,object
all_tags,all_tags,0,43016,0.0,object
body_nltk,body_nltk,0,43016,0.0,object
body_spacy,body_spacy,0,43016,0.0,object
title_nltk,title_nltk,1,43015,0.0,object
title_spacy,title_spacy,7,43009,0.02,object


In [14]:
missing = train_bow_uniques.loc[(train_bow_uniques['title_nltk'].isna()) |
                                (train_bow_uniques['title_spacy'].isna()), :]

print (missing.index)
display(missing)


Index([4532, 8280, 12992, 14957, 22934, 24964, 25950], dtype='int64')


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
4532,2014-10-13 16:31:47,Laravel Eloquent OR WHERE IS NOT NULL,I am using the Laravel Administrator package f...,"['php', 'sql', 'laravel', 'eloquent', 'adminis...",eloquent,eloquent use laravel administrator package sto...,,package story run issue display result delete ...
8280,2015-04-22 11:41:34,Why is FusedLocationApi.getLastLocation null,I am trying to get location by using FusedLoca...,"['android', 'android-4.4-kitkat', 'android-loc...",,null try get location use permission file andr...,,try location permission file use android reque...
12992,2013-08-09 14:16:44,Using IS NULL and COALESCE in OrderBy Doctrine...,I basically have the following (My)SQL-Query\n...,"['mysql', 'symfony', 'doctrine-orm', 'doctrine...",use coalesce doctrine,use coalesce doctrine query select order compa...,,follow address order company job target | doct...
14957,2016-08-17 23:26:55,Spring Boot multipartfile always null,I am using Spring Boot version = '1.4.0.RC1' w...,"['java', 'spring-mvc', 'spring-boot', 'retrofi...",spring boot multipartfile,spring boot multipartfile use version rc1 try ...,,version try use file upload controller info re...
22934,2014-03-27 21:18:08,Sqlite NULL and unique?,I noticed that I can have NULL values in colum...,"['sql', 'sqlite', 'null', 'unique', 'unique-co...",null unique,null notice value column constraint col genera...,,notice value column constraint generate issue ...
24964,2014-02-24 20:47:00,MVC HttpPostedFileBase always null,I have this controller and what I am trying to...,"['c#', 'asp.net', 'asp.net-mvc', 'asp.net-mvc-...",httppostedfilebase,httppostedfilebase controller try send image b...,,controller try send image byte product content...
25950,2012-11-28 01:42:30,Android Notification PendingIntent Extras null,I am trying to send information from notificat...,"['android', 'android-intent', 'bundle', 'andro...",notification pendingintent extra null,notification pendingintent try send informatio...,,try send information notification activity cod...


In [15]:
quick_look(train_bow_uniques)


shape : (43016, 8)


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way subscri...
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make folder...,package c,look way package folder studio express know pr...
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply try ...,use uivisualeffectview,example apply blur image try figure code uivis...
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference sort array think cas...


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override force child class nee...,way method,way force child class override method need cre...
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert method push base ...
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like column ...,use attribute annotation doctrine,like use string error annotate support miss
43015,2016-03-19 18:27:38,Localizing string resources added via build.gr...,This is in continuation to an answer which hel...,"['android', 'android-studio', 'android-gradle-...",localize string resource add build gradle use,localize string resource add build gradle use ...,localize string resource add build.gradle,continuation answer help post add string resou...


uniques :


CreationDate    43012
title           43015
body            43016
all_tags        41627
title_nltk      42537
body_nltk       43016
title_spacy     37644
body_spacy      43010
dtype: int64

Doublons ?  0 



Unnamed: 0,column_name,missing,present,percent_missing,type
CreationDate,CreationDate,0,43016,0.0,object
title,title,0,43016,0.0,object
body,body,0,43016,0.0,object
all_tags,all_tags,0,43016,0.0,object
body_nltk,body_nltk,0,43016,0.0,object
body_spacy,body_spacy,0,43016,0.0,object
title_nltk,title_nltk,1,43015,0.0,object
title_spacy,title_spacy,7,43009,0.02,object


In [16]:
def fix_false_null_values(df):
    df.loc[(df['title_nltk'].isna()), 'title_nltk'] = 'null'
    df.loc[(df['title_spacy'].isna()), 'title_spacy'] = 'null'

fix_false_null_values(train_bow_uniques)
fix_false_null_values(test_bow_uniques)

# Check for null values in the entire DataFrame
null_values = train_bow_uniques[train_bow_uniques.isnull().any(axis=1)]

# Print the rows with null values
print(null_values)


Empty DataFrame
Columns: [CreationDate, title, body, all_tags, title_nltk, body_nltk, title_spacy, body_spacy]
Index: []


In [17]:
quick_look(train_bow_uniques)


shape : (43016, 8)


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way subscri...
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make folder...,package c,look way package folder studio express know pr...
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply try ...,use uivisualeffectview,example apply blur image try figure code uivis...
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference sort array think cas...


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override force child class nee...,way method,way force child class override method need cre...
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert method push base ...
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like column ...,use attribute annotation doctrine,like use string error annotate support miss
43015,2016-03-19 18:27:38,Localizing string resources added via build.gr...,This is in continuation to an answer which hel...,"['android', 'android-studio', 'android-gradle-...",localize string resource add build gradle use,localize string resource add build gradle use ...,localize string resource add build.gradle,continuation answer help post add string resou...


uniques :


CreationDate    43012
title           43015
body            43016
all_tags        41627
title_nltk      42538
body_nltk       43016
title_spacy     37645
body_spacy      43010
dtype: int64

Doublons ?  0 



Unnamed: 0,column_name,missing,present,percent_missing,type
CreationDate,CreationDate,0,43016,0.0,object
title,title,0,43016,0.0,object
body,body,0,43016,0.0,object
all_tags,all_tags,0,43016,0.0,object
title_nltk,title_nltk,0,43016,0.0,object
body_nltk,body_nltk,0,43016,0.0,object
title_spacy,title_spacy,0,43016,0.0,object
body_spacy,body_spacy,0,43016,0.0,object


In [18]:
index = [4532, 8280, 12992, 14957, 22934, 24964, 25950]

display(train_bow_uniques.loc[train_bow_uniques.index.isin(index), :])

# OK


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy
4532,2014-10-13 16:31:47,Laravel Eloquent OR WHERE IS NOT NULL,I am using the Laravel Administrator package f...,"['php', 'sql', 'laravel', 'eloquent', 'adminis...",eloquent,eloquent use laravel administrator package sto...,,package story run issue display result delete ...
8280,2015-04-22 11:41:34,Why is FusedLocationApi.getLastLocation null,I am trying to get location by using FusedLoca...,"['android', 'android-4.4-kitkat', 'android-loc...",,null try get location use permission file andr...,,try location permission file use android reque...
12992,2013-08-09 14:16:44,Using IS NULL and COALESCE in OrderBy Doctrine...,I basically have the following (My)SQL-Query\n...,"['mysql', 'symfony', 'doctrine-orm', 'doctrine...",use coalesce doctrine,use coalesce doctrine query select order compa...,,follow address order company job target | doct...
14957,2016-08-17 23:26:55,Spring Boot multipartfile always null,I am using Spring Boot version = '1.4.0.RC1' w...,"['java', 'spring-mvc', 'spring-boot', 'retrofi...",spring boot multipartfile,spring boot multipartfile use version rc1 try ...,,version try use file upload controller info re...
22934,2014-03-27 21:18:08,Sqlite NULL and unique?,I noticed that I can have NULL values in colum...,"['sql', 'sqlite', 'null', 'unique', 'unique-co...",null unique,null notice value column constraint col genera...,,notice value column constraint generate issue ...
24964,2014-02-24 20:47:00,MVC HttpPostedFileBase always null,I have this controller and what I am trying to...,"['c#', 'asp.net', 'asp.net-mvc', 'asp.net-mvc-...",httppostedfilebase,httppostedfilebase controller try send image b...,,controller try send image byte product content...
25950,2012-11-28 01:42:30,Android Notification PendingIntent Extras null,I am trying to send information from notificat...,"['android', 'android-intent', 'bundle', 'andro...",notification pendingintent extra null,notification pendingintent try send informatio...,,try send information notification activity cod...


In [19]:
def add_bow_representation(train_df, test_df, feature, display_on=True):
    # Create a CountVectorizer instance
    vectorizer = CountVectorizer()

    # Fit and transform the training data
    X_train = vectorizer.fit_transform(train_df[feature])

    # Convert the result to a DataFrame
    bow_train_df = pd.DataFrame(X_train.toarray(), columns=vectorizer.get_feature_names_out())

    # Reset index to align with the original DataFrame
    bow_train_df.reset_index(drop=True, inplace=True)

    # Concatenate the bag of words DataFrame with the original DataFrame
    result_train = pd.concat([train_df.reset_index(drop=True), bow_train_df], axis=1)

    if display_on:
        display(result_train)

    # transform the testing data
    # NO FIT
    X_test = vectorizer.transform(test_df[feature])

    # Convert the result to a DataFrame
    bow_test_df = pd.DataFrame(X_test.toarray(), columns=vectorizer.get_feature_names_out())

    # Reset index to align with the original DataFrame
    bow_test_df.reset_index(drop=True, inplace=True)

    # Concatenate the bag of words DataFrame with the original DataFrame
    result_test = pd.concat([test_df.reset_index(drop=True), bow_test_df], axis=1)


    return result_train, result_test


train_bow_uniques_title_nltk, test_bow_uniques_title_nltk = add_bow_representation(train_bow_uniques,
                                                                                   test_bow_uniques,
                                                                                   'title_nltk')


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy,__attribute__,__bridge,...,zone,zoneddatetime,zoneid,zookeeper,zoom,zooming,zsh,zshrc,zuul,zxing
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way subscri...,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make folder...,package c,look way package folder studio express know pr...,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply try ...,use uivisualeffectview,example apply blur image try figure code uivis...,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference sort array think cas...,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...,0,0,...,0,0,0,0,0,0,0,0,0,0
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override force child class nee...,way method,way force child class override method need cre...,0,0,...,0,0,0,0,0,0,0,0,0,0
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert method push base ...,0,0,...,0,0,0,0,0,0,0,0,0,0
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like column ...,use attribute annotation doctrine,like use string error annotate support miss,0,0,...,0,0,0,0,0,0,0,0,0,0


In [20]:
# verif
print(train_bow_uniques_title_nltk.shape)
print(test_bow_uniques_title_nltk.shape)


(43016, 6753)
(4780, 6753)


In [21]:
train_bow_uniques_title_nltk


Unnamed: 0,CreationDate,title,body,all_tags,title_nltk,body_nltk,title_spacy,body_spacy,__attribute__,__bridge,...,zone,zoneddatetime,zoneid,zookeeper,zoom,zooming,zsh,zshrc,zuul,zxing
0,2013-08-23 23:28:22,How to implement a ViewPager with different Fr...,When I start an activity which implements view...,"['android', 'android-layout', 'android-fragmen...",implement viewpager fragment layout,implement viewpager fragment layouts start act...,implement,start activity implement viewpager create frag...,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2015-04-26 06:13:36,Cannot subscript a value of [AnyObject]? with ...,This is in a class extending PFQueryTableViewC...,"['ios', 'xcode', 'swift', 'parse-platform', 'x...",subscript value anyobject index type int,subscript value anyobject index type int class...,subscript value index type,class extend follow error row cast way subscri...,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2014-08-06 12:33:53,Equivalent to java packages in C#,"I have been looking for a way to make a ""packa...","['java', 'c#', 'eclipse', 'visual-studio-2013'...",equivalent java package c,equivalent java package c look way make folder...,package c,look way package folder studio express know pr...,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2014-06-05 18:35:37,How to use UIVisualEffectView to Blur Image?,Could someone give a small example of applying...,"['ios', 'objective-c', 'uiview', 'uikit', 'uiv...",use blur image,use blur image someone give example apply try ...,use uivisualeffectview,example apply blur image try figure code uivis...,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2013-06-28 11:53:56,How can I sort arrays and data in PHP?,\nThis question is intended as a reference for...,"['php', 'arrays', 'sorting', 'object', 'spl']",sort array data php,sort array data php question intend reference ...,sort array datum,question intend reference sort array think cas...,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43011,2017-02-24 13:38:36,How to fully dump / print variable to console ...,Hey there I am searching for a function which ...,"['javascript', 'dart', 'debugging', 'console',...",dump print console dart language,dump print console dart language hey search fu...,dump print variable console language,search function print variable console languag...,0,0,...,0,0,0,0,0,0,0,0,0,0
43012,2011-10-20 07:21:34,Is there a way to make a method which is not a...,Is there any way of forcing child classes to o...,"['java', 'inheritance', 'overriding', 'abstrac...",way make method,way make method override force child class nee...,way method,way force child class override method need cre...,0,0,...,0,0,0,0,0,0,0,0,0,0
43013,2012-09-11 11:34:25,Can I incorporate both SignalR and a RESTful API?,I have a single page web app developed using A...,"['asp.net', 'rest', 'web-applications', 'asp.n...",incorporate signalr api,incorporate signalr api page web app develop u...,incorporate signalr api,page web app develop convert method push base ...,0,0,...,0,0,0,0,0,0,0,0,0,0
43014,2021-03-23 19:24:04,How can i use php8 attributes instead of annot...,This is what I would like to use:\n#[ORM\Colum...,"['php', 'symfony', 'doctrine-orm', 'doctrine',...",use attribute annotation doctrine,use attribute annotation doctrine like column ...,use attribute annotation doctrine,like use string error annotate support miss,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
# export
train_bow_uniques_title_nltk.to_csv('./../../data/cleaned_data/2_bow_uniques/train_bow_uniques_title_nltk.csv', sep=',', index=False)
test_bow_uniques_title_nltk.to_csv('./../../data/cleaned_data/2_bow_uniques/test_bow_uniques_title_nltk.csv', sep=',', index=False)


In [23]:
# title spacy
train_bow_uniques_title_spacy, test_bow_uniques_title_spacy = add_bow_representation(train_bow_uniques,
                                                                                   test_bow_uniques,
                                                                                   'title_nltk',
                                                                                   display_on=False)


In [29]:
# verif
print(train_bow_uniques_title_spacy.shape)
print(test_bow_uniques_title_spacy.shape)


(43016, 6753)
(4780, 6753)


In [24]:
# export
train_bow_uniques_title_spacy.to_csv('./../../data/cleaned_data/2_bow_uniques/train_bow_uniques_title_spacy.csv', sep=',', index=False)
test_bow_uniques_title_spacy.to_csv('./../../data/cleaned_data/2_bow_uniques/test_bow_uniques_title_spacy.csv', sep=',', index=False)


In [25]:
# body nltk
train_bow_uniques_body_nltk, test_bow_uniques_body_nltk = add_bow_representation(train_bow_uniques,
                                                                                   test_bow_uniques,
                                                                                   'title_nltk',
                                                                                   display_on=False)


In [30]:
# verif
print(train_bow_uniques_body_nltk.shape)
print(test_bow_uniques_body_nltk.shape)


(43016, 6753)
(4780, 6753)


In [26]:
# export
train_bow_uniques_body_nltk.to_csv('./../../data/cleaned_data/2_bow_uniques/train_bow_uniques_body_nltk.csv', sep=',', index=False)
test_bow_uniques_body_nltk.to_csv('./../../data/cleaned_data/2_bow_uniques/test_bow_uniques_body_nltk.csv', sep=',', index=False)


In [27]:
# body spacy
train_bow_uniques_body_spacy, test_bow_uniques_body_spacy = add_bow_representation(train_bow_uniques,
                                                                                   test_bow_uniques,
                                                                                   'title_nltk',
                                                                                   display_on=False)


In [31]:
# verif
print(train_bow_uniques_body_spacy.shape)
print(test_bow_uniques_body_spacy.shape)


(43016, 6753)
(4780, 6753)


In [28]:
# export
train_bow_uniques_body_spacy.to_csv('./../../data/cleaned_data/2_bow_uniques/train_bow_uniques_body_spacy.csv', sep=',', index=False)
test_bow_uniques_body_spacy.to_csv('./../../data/cleaned_data/2_bow_uniques/test_bow_uniques_body_spacy.csv', sep=',', index=False)
