In [1]:
import pandas as pd
import os
import pickle
import bs4
import spacy
import re
import numpy as np
import random

from bs4 import BeautifulSoup as soupe
from collections import Counter
from spacy.matcher import PhraseMatcher

## **Data retrieval**

Source:

In [None]:
https://data.stackexchange.com/stackoverflow/query/new

Information about the data we are going to retrieve:

In [None]:
https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede

The table of interest is **Posts**. The fields we need are:<br>- Title<br>- Body<br>- Tags

This table holds **68 million** entries, from which we have to make a selection. The selection is based on relevance, which can be measured in several ways.<br><br>We measure relevance by the **number of views** of each question.<br><br>**In addition...**<br>We also apply a **time filter** and only retrieve data from **2015** onwards.

Fetching the beginning of the table to see the dates of the first entries...

The first entry in the database is dated 31/07/2008.

To work with data that **better reflects** current usage (discussions have evolved since 2008), we build our corpus from posts created no earlier than 2015.

Finding the Id of the first entry dated 2015:

In [None]:
SELECT TOP 10 Id, CreationDate, ViewCount, Score, AnswerCount, CommentCount, FavoriteCount, Tags, Title, Body
FROM Posts
WHERE YEAR(CreationDate) >= 2015
ORDER BY CreationDate ASC

The Id of the first 2015 post is **27727381**, so retrieval starts from this identifier.

First query:

In [None]:
SELECT Id, CreationDate, ViewCount, Score, AnswerCount, CommentCount, FavoriteCount, Tags, Title, Body
FROM Posts
WHERE
    (Id >= 27727381 AND Id < 30000000)  -- upper bound assumed: the original value 3000000 is below the lower bound
    AND PostTypeId = 1
    AND Score > 0
    AND AnswerCount > 0
    AND CommentCount > 0
    AND ViewCount > 40000

After that, the Id window is shifted in steps of 5M, while lowering the view-count threshold (to account for recent posts having had less time to accumulate views).
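The windowing described above can be sketched as a small query generator. This is only an illustration: the helper names `build_query` and `windows`, the 5000-view decrement and the 10000-view floor are assumptions, not values taken from the extraction actually performed.

```python
FIELDS = ("Id, CreationDate, ViewCount, Score, AnswerCount, "
          "CommentCount, FavoriteCount, Tags, Title, Body")


def build_query(id_min, id_max, min_views):
    """Return the query text for questions whose Id lies in [id_min, id_max)."""
    return (
        f"SELECT {FIELDS}\n"
        "FROM Posts\n"
        "WHERE\n"
        f"    (Id >= {id_min} AND Id < {id_max})\n"
        "    AND PostTypeId = 1\n"
        "    AND Score > 0\n"
        "    AND AnswerCount > 0\n"
        "    AND CommentCount > 0\n"
        f"    AND ViewCount > {min_views}"
    )


def windows(first_id, step, n, first_views, views_step, views_floor):
    """Yield (id_min, id_max, min_views) for n consecutive Id windows."""
    for i in range(n):
        id_min = first_id + i * step
        # Lower the threshold for more recent windows, never below the floor
        min_views = max(first_views - i * views_step, views_floor)
        yield id_min, id_min + step, min_views


# Hypothetical schedule: 5M-wide windows, 5000 fewer views required per window
plan = list(windows(27727381, 5_000_000, 3, 40000, 5000, 10000))
queries = [build_query(*w) for w in plan]
print(queries[1])
```

Each generated string can be pasted into the query editor; the results still have to be downloaded by hand as CSV files.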

List of the files produced by the extraction:

In [3]:
cwd = os.getcwd()
path = cwd + "/Original_Data_1/"
files = os.listdir(path)
files

['QueryResults(1).csv',
 'QueryResults(2).csv',
 'QueryResults(3).csv',
 'QueryResults(4).csv',
 'QueryResults(5).csv',
 'QueryResults(6).csv',
 'QueryResults(7).csv',
 'QueryResults.csv']

In [5]:
nb_posts = 0

# Sum the number of rows of every extracted CSV file
for f in files:
    fich = path + f
    df = pd.read_csv(fich)
    nb_posts = nb_posts + len(df)

print(f"Total number of downloaded questions: {nb_posts}")

Total number of downloaded questions: 25555


Building a single DataFrame and checking the number of downloaded posts:

In [235]:
cwd = os.getcwd()
path = cwd + "/Original_Data_1/"
files = os.listdir(path)

# Read every CSV once and concatenate them in a single call
df = pd.concat([pd.read_csv(path + f) for f in files], ignore_index=True)

print(f"Total number of downloaded questions: {len(df)}")

Total number of downloaded questions: 25555


This is a suitable size with respect to the recommended corpus length.

In [236]:
df = df[["Tags", "Title", "Body"]]
df.head()

Unnamed: 0,Tags,Title,Body
0,<cmd>,Windows equivalent of 'touch' (i.e. the node.j...,<p>On a windows machine I get this error</p>\n...
1,<ios><swift><uiviewcontroller>,How can I pop specific View Controller in Swift,<p>I used the <code>Objective-C</code> code be...
2,<javascript><ajax>,Uncaught TypeError: Cannot read property 'appe...,<p>I'm getting the following error</p>\n\n<blo...
3,<javascript><filter><boolean>,javascript .filter() true booleans,<pre><code>function bouncer(arr) {\n // Don't...
4,<swift><uiimageview><uipinchgesturerecognizer>,UIImageView pinch zoom swift,<p>I was hoping someone could help me out. I a...


In [237]:
# Save the raw corpus
with open("Data/corpus_raw.pickle", "wb") as pickle_out:
    pickle.dump(df, pickle_out)

# **Text cleaning**

In [240]:
df = pickle.load(open("Data/corpus_raw.pickle", "rb"))
df.head(3)

Unnamed: 0,Tags,Title,Body
0,<cmd>,Windows equivalent of 'touch' (i.e. the node.j...,<p>On a windows machine I get this error</p>\n...
1,<ios><swift><uiviewcontroller>,How can I pop specific View Controller in Swift,<p>I used the <code>Objective-C</code> code be...
2,<javascript><ajax>,Uncaught TypeError: Cannot read property 'appe...,<p>I'm getting the following error</p>\n\n<blo...


Displaying one question.

In [239]:
df.Body[1]

'<p>I used the <code>Objective-C</code> code below to pop a specific <code>ViewController</code>.</p>\n\n<pre><code>for (UIViewController *controller in self.navigationController.viewControllers) {\n    if ([controller isKindOfClass:[AnOldViewController class]]) { \n        //Do not forget to import AnOldViewController.h\n        [self.navigationController popToViewController:controller\n                                              animated:YES];\n        break;\n    }\n}\n</code></pre>\n\n<p>How can I do that in Swift?</p>\n'

## **Beautiful Soup**

**Constraint**:<br>The texts are wrapped in many HTML **tags**. We want to retrieve all of the text, except what sits between **< PRE >** tags, which contain **source code**.
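As an illustration of this constraint, here is a minimal sketch that relies only on the standard library's `html.parser` rather than Beautiful Soup. It is an alternative shown for clarity, not the approach used in this notebook; the class name `TextOutsidePre` and the sample string are made up for the example. It also skips `<pre>` blocks nested inside other tags.

```python
from html.parser import HTMLParser


class TextOutsidePre(HTMLParser):
    """Collect text nodes, ignoring everything inside <pre> blocks."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0   # > 0 while the parser is inside a <pre> block
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1

    def handle_data(self, data):
        if self.pre_depth == 0:
            text = " ".join(data.split())   # collapse newlines and repeated spaces
            if text:
                self.chunks.append(text)


def text_outside_pre(html):
    """Return the text of an HTML fragment without its <pre> blocks."""
    parser = TextOutsidePre()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = "<p>The code is</p>\n<pre><code>select 1\n</code></pre>\n<p>It fails.</p>"
print(text_outside_pre(sample))
```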

In [220]:
soup = soupe(df.Body[20000], "html.parser")  # explicit parser: top-level children are the post's own tags
soup

<p>I am running a process on Spark which uses SQL for the most part. In one of the workflows I am getting the following error:</p>
<blockquote>
<p>mismatched input 'from' expecting </p>
</blockquote>
<p>The code is</p>
<pre><code> select a.ACCOUNT_IDENTIFIER,a.LAN_CD, a.BEST_CARD_NUMBER,  
 decision_id, 
 case when a.BEST_CARD_NUMBER = 1 then 'Y' else 'N' end as best_card_excl_flag 
 from (select a.ACCOUNT_IDENTIFIER,a.LAN_CD, a. decision_id row_number()
 over (partition by CUST_GRP_MBRP_ID 
    order by coalesce(BEST_CARD_RANK,999)) as BEST_CARD_NUMBER 
 from Accounts_Inclusions_Exclusions_Flagged a) a 
</code></pre>
<p>I cannot figure out what the error is for the life of me</p>
<p>I've tried checking for comma errors or unexpected brackets but that doesn't seem to be the issue.</p>

**Filtering out the PRE blocks**

In [221]:
liste_sans_pre = []

for cont in soup :
    
    if cont.name != "pre":
        
        liste_sans_pre.append(cont)

Building a list of the extracted texts.

In [224]:
liste_textes = []

for c in liste_sans_pre :
    if isinstance(c, bs4.element.Tag):
        #print("tag", c.text)
        liste_textes.append(c.text.replace("\n", ""))
liste_textes

['I am running a process on Spark which uses SQL for the most part. In one of the workflows I am getting the following error:',
 "mismatched input 'from' expecting ",
 'The code is',
 'I cannot figure out what the error is for the life of me',
 "I've tried checking for comma errors or unexpected brackets but that doesn't seem to be the issue."]

Joining the texts into a single string (more convenient than a list).

In [229]:
texte = ""

for c in liste_sans_pre :
    
    if isinstance(c, bs4.element.Tag):
        
        texte = texte + c.text.replace("\n", "") + " "

texte = texte.rstrip()
texte

"I am running a process on Spark which uses SQL for the most part. In one of the workflows I am getting the following error: mismatched input 'from' expecting  The code is I cannot figure out what the error is for the life of me I've tried checking for comma errors or unexpected brackets but that doesn't seem to be the issue."

From the previous cells, we build a mapping function to create a new column.

In [265]:
df = pickle.load(open("Data/corpus_raw.pickle", "rb"))

In [266]:
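def map_text(x):
    """Return the lower-cased text of a post body, skipping top-level <pre> blocks."""
    texte = ""

    # Explicit parser so that the top-level children are the post's own tags
    soup = soupe(x, "html.parser")

    for cont in soup:
        # Keep tags only (not bare strings) and drop the code blocks
        if isinstance(cont, bs4.element.Tag) and cont.name != "pre":
            texte = texte + cont.text.replace("\n", "") + " "

    texte = texte.rstrip()

    return texte.lower()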

In [267]:
df["Body_texte"] = df["Body"].map(map_text)
df.head()

Unnamed: 0,Tags,Title,Body,Body_texte
0,<cmd>,Windows equivalent of 'touch' (i.e. the node.j...,<p>On a windows machine I get this error</p>\n...,on a windows machine i get this error 'touch' ...
1,<ios><swift><uiviewcontroller>,How can I pop specific View Controller in Swift,<p>I used the <code>Objective-C</code> code be...,i used the objective-c code below to pop a spe...
2,<javascript><ajax>,Uncaught TypeError: Cannot read property 'appe...,<p>I'm getting the following error</p>\n\n<blo...,i'm getting the following error uncaught typee...
3,<javascript><filter><boolean>,javascript .filter() true booleans,<pre><code>function bouncer(arr) {\n // Don't...,"i have to return true boolean statements only,..."
4,<swift><uiimageview><uipinchgesturerecognizer>,UIImageView pinch zoom swift,<p>I was hoping someone could help me out. I a...,i was hoping someone could help me out. i am t...


In [268]:
df["Body_texte"][0]

"on a windows machine i get this error 'touch' is not recognized as an internal or external command, operable program or batch file. when i follow the instructions to do: is there a windows equivalent of using 'touch'? do i need to create these files by hand (and modify them to change the timestamp) in order to implement this sort of command? that doesn't seem very ... node-ish..."

**Title**

Titles look like ordinary lines of text, so we can concatenate the title and the body text into a new column.

In [270]:
df["texte"] = df["Title"].str.lower() + " " + df["Body_texte"]

In [271]:
df.texte[5]

"django import error: no module named apps i just checked out a project with git. the project structure is  there are other directories and files, but i think those are the important ones.  when i run the server  i get when running manage.py check i get importerror: no module named apps. so i guess the problem has nothing to do with my setting module but with my apps directory. i'm not sure why it can't find my module apps, because project is on my sys.path and the direcory apps obviously exists. as i'm not very experienced as a python developer i don't find a solution myself."

**Tags**

In [252]:
df.Tags[1]

'<ios><swift><uiviewcontroller>'

In [254]:
texte = df.Tags[1]
texte

'<ios><swift><uiviewcontroller>'

In [257]:
texte = texte.replace("<", "")
texte = texte.replace(">", " ")
texte = texte.rstrip()
texte

'ios swift uiviewcontroller'

In [258]:
texte2 = df.Tags[2]

In [259]:
texte2 = texte2.replace("<", "")
texte2 = texte2.replace(">", " ")
texte2 = texte2.rstrip()
texte2

'javascript ajax'

In [272]:
def map_tags(x):
    
    x = x.replace("<", "")
    x = x.replace(">", " ")
    x = x.rstrip()
    
    return x   

In [260]:
def map_tags2(x):
    
    x = x.replace("<", "")
    x = x.replace(">", ", ")
    x = x.rstrip(", ")
    
    return x 
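The two mapping functions above work by replacing the angle brackets. As a hedged alternative, the tag names can first be extracted as a list with a regular expression and then joined with whatever separator is needed; `parse_tags` is a hypothetical helper introduced for this sketch.

```python
import re


def parse_tags(raw):
    """Return the list of tag names found in a '<a><b>' style string."""
    return re.findall(r"<([^<>]+)>", raw)


tags = parse_tags("<ios><swift><uiviewcontroller>")
print(tags)
print(" ".join(tags))    # space-separated form
print(", ".join(tags))   # comma-separated form
```

Keeping the list form also makes it easy to count or filter whole tags later without splitting multi-word tags.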

In [273]:
df["tags"] = df["Tags"].map(map_tags)

In [274]:
df = df[["tags", "texte"]]

In [275]:
df

Unnamed: 0,tags,texte
0,cmd,windows equivalent of 'touch' (i.e. the node.j...
1,ios swift uiviewcontroller,how can i pop specific view controller in swif...
2,javascript ajax,uncaught typeerror: cannot read property 'appe...
3,javascript filter boolean,javascript .filter() true booleans i have to r...
4,swift uiimageview uipinchgesturerecognizer,uiimageview pinch zoom swift i was hoping some...
...,...,...
25550,jquery ajax backbone.js reactjs,handling ajax with react how should i handle a...
25551,android android-keystore,android studio: cannot recover key i have sear...
25552,code-snippets visual-studio-code,how to add custom code snippets in vscode? is ...
25553,php laravel localhost mcrypt,use of undefined constant mcrypt_rijndael_128 ...


In [276]:
# Save the corpus
with open("Data/corpus.pickle", "wb") as pickle_out:
    pickle.dump(df, pickle_out)

## **Tag counts and DataFrame adaptation: raw tags**

In [2]:
df = pickle.load(open("Data/corpus.pickle", "rb"))
df.head(3)

Unnamed: 0,tags,texte
0,cmd,windows equivalent of 'touch' (i.e. the node.j...
1,ios swift uiviewcontroller,how can i pop specific view controller in swif...
2,javascript ajax,uncaught typeerror: cannot read property 'appe...


## **Tag counts and DataFrame adaptation: cleaned tags**

We remove the "-" characters and the digits (...) with a mapping function.

In [3]:
nlp = spacy.blank("en")

In [4]:
def clean_tags(x):

    # replace "-" with a space
    x = x.replace("-", " ")
    # keep only letters, whitespace and ".", plus "#" and "+" (e.g. for c#)
    x = re.sub(r'[^a-z#+\s.]+', '', x)
    # remove standalone "x" (note: this strips every " x" sequence,
    # including one that starts a longer word)
    x = x.replace(" x", "")

    out = ""
    seen = set()
    doc = nlp(x)

    # keep only the first occurrence of each token, preserving order
    for word in doc:
        if word.text not in seen:
            out = out + " " + word.text
            out = out.lstrip()
        seen.add(word.text)

    # manual fixes for the way spaCy split some tokens
    out = out.replace(" .x", "")
    out = out.replace(" . ", " ")
    out = out.replace("c #", "c#")

    return out

In [5]:
df["tags_c"] = df["tags"].map(clean_tags)
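The order-preserving de-duplication performed inside `clean_tags` can be illustrated without spaCy. The sketch below splits on whitespace only, so it is a simplification of the tokenizer-based loop; `dedupe_words` is a hypothetical name.

```python
def dedupe_words(text):
    """Drop repeated words while keeping the first occurrence of each, in order."""
    # dict.fromkeys keeps insertion order, so the first occurrence wins
    return " ".join(dict.fromkeys(text.split()))


print(dedupe_words("java spring web services soap spring boot"))
```

This explains why a question tagged both `spring` and `spring-boot` ends up with a single `spring` followed later by `boot`.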

## **Extraction du top des tags**

In [6]:
# Join all cleaned tag strings into one space-separated text
tag_c_text = " ".join(df.tags_c.values).lstrip()

In [7]:
tag_c_text



In [8]:
# suffixes = nlp.Defaults.suffixes + ("#",)
# suffix_regex = spacy.util.compile_suffix_regex(suffixes)
# nlp.tokenizer.suffix_search = suffix_regex.search

suffixes = list(nlp.Defaults.suffixes)
suffixes.remove("#")
suffix_regex = spacy.util.compile_suffix_regex(suffixes)
nlp.tokenizer.suffix_search = suffix_regex.search

doc2 = nlp(tag_c_text)
#tags_c = [token.text for token in doc2 if token.is_space == False]
tags_c = [token.text for token in doc2 if token.is_punct == False and token.is_space == False]
#print(f"Nombre total de tags utilisés : {len(tags_c)}")

## Top

In [13]:
tag_c_freq = Counter(tags_c)
top_c = tag_c_freq.most_common(27)
top_c

[('python', 3689),
 ('javascript', 3288),
 ('android', 2476),
 ('java', 1996),
 ('angular', 1962),
 ('reactjs', 1439),
 ('studio', 1259),
 ('html', 1030),
 ('typescript', 1006),
 ('php', 1004),
 ('c#', 951),
 ('node.js', 932),
 ('ios', 914),
 ('react', 902),
 ('css', 798),
 ('spring', 755),
 ('visual', 722),
 ('google', 708),
 ('swift', 706),
 ('asp.net', 638),
 ('laravel', 637),
 ('pandas', 618),
 ('flutter', 604),
 ('docker', 595),
 ('jquery', 568),
 ('core', 521),
 ('sql', 518)]

Reviewing the relevance of each tag in this top reveals some irregularities.<br><br>- **Noise terms**: such as "studio" or "core"; these will be removed from the list.<br>- **Duplicates**: reactjs and react; we keep them for now and merge them later.

**Four terms** in the list are problematic. They are handled later; in anticipation, so that the final list still holds 25 tags once the noise terms are removed, the top is extracted with extra entries (27 instead of 25).
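Fragments such as "studio" appear in the top because hyphenated tags are split into separate words before counting. The sketch below contrasts counting whole tags with counting word fragments; the sample rows are made up for illustration and are not taken from the corpus.

```python
from collections import Counter

# Hypothetical sample: one space-separated tag string per question
rows = [
    "android android-studio",
    "python pandas",
    "visual-studio-code code-snippets",
    "python",
]

# Counting whole tags keeps multi-word tags intact
whole = Counter(tag for row in rows for tag in row.split())

# Counting after replacing "-" with a space produces fragments
fragments = Counter(
    word for row in rows for word in row.replace("-", " ").split()
)

print(whole.most_common(3))
print(fragments["studio"], whole["studio"])
```

Counting whole tags avoids the noise terms, at the cost of treating closely related tags as distinct labels.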

In [315]:
df[df.tags_c.str.contains("boot")]

Unnamed: 0,tags,texte,tags_c
37,html css twitter-bootstrap responsive-design n...,make logo image responsive using bootstrap i a...,html css twitter bootstrap responsive design n...
59,html twitter-bootstrap,can i give the col-md-1.5 in bootstrap? i want...,html twitter bootstrap
100,java spring-boot monitoring jmx,remote monitoring with visualvm and jmx i woul...,java spring boot monitoring jmx
158,html css twitter-bootstrap,bootstrap and z-index i'm having an issue with...,html css twitter bootstrap
215,java spring web-services soap spring-boot,how to disable errorpagefilter in spring boot?...,java spring web services soap boot
...,...,...,...
25449,java spring-boot,how to get local server host and port in sprin...,java spring boot
25455,javascript twitter-bootstrap classiejs,"""cannot read property 'classlist' of null"" whe...",javascript twitter bootstrap classiejs
25458,html css twitter-bootstrap button whitespace,keeping a bit of vertical space between elemen...,html css twitter bootstrap button whitespace
25481,java spring spring-mvc gradle spring-boot,configure viewresolver with spring boot and an...,java spring mvc gradle boot


## **Restricting the dataset**

To carry out the study and compare methods, we should only use questions for which at least one tag belongs to the top.

We build the list of the top cleaned tags, then use **PhraseMatcher**.<br>A new column (created further down as "new_t") will hold only those tags of a question that belong to the top. This is the column we will work with.

In [14]:
top = [t[0] for t in top_c]

We remove the noise terms from this list. Only the react/reactjs duplicate remains to be handled later.

In [15]:
liste = ["studio", "core"]

for m in liste :
    
    top.remove(m)
    
top

['python',
 'javascript',
 'android',
 'java',
 'angular',
 'reactjs',
 'html',
 'typescript',
 'php',
 'c#',
 'node.js',
 'ios',
 'react',
 'css',
 'spring',
 'visual',
 'google',
 'swift',
 'asp.net',
 'laravel',
 'pandas',
 'flutter',
 'docker',
 'jquery',
 'sql']

In [16]:
len(top)

25

Creating a boolean column indicating whether at least one of a question's tags is in the top. It is initialised to 0.

In [17]:
df["tag_in_top"] = 0

In [18]:
df.head(3)

Unnamed: 0,tags,texte,tags_c,tag_in_top
0,cmd,windows equivalent of 'touch' (i.e. the node.j...,cmd,0
1,ios swift uiviewcontroller,how can i pop specific view controller in swif...,ios swift uiviewcontroller,0
2,javascript ajax,uncaught typeerror: cannot read property 'appe...,javascript ajax,0


Let us count the questions for which no tag is in the top.

In [19]:
matcher = PhraseMatcher(nlp.vocab)
patterns = [nlp(tag) for tag in top]
matcher.add("TOP", patterns)

for idx, quest in df.iterrows():

    doc = nlp(quest.tags_c)
    matches = matcher(doc)
    if len(matches) != 0:
        # .loc writes to df itself and avoids the chained-assignment warning
        df.loc[idx, "tag_in_top"] = 1


In [20]:
df.head(3)

Unnamed: 0,tags,texte,tags_c,tag_in_top
0,cmd,windows equivalent of 'touch' (i.e. the node.j...,cmd,0
1,ios swift uiviewcontroller,how can i pop specific view controller in swif...,ios swift uiviewcontroller,1
2,javascript ajax,uncaught typeerror: cannot read property 'appe...,javascript ajax,1


In [21]:
print(f"Number of questions with no tag in the top: {len(df[df.tag_in_top == 0])}")

Number of questions with no tag in the top: 4954


In [22]:
res = (len(df[df.tag_in_top == 0]) / len(df)) * 100
print(f"Proportion of questions with no tag in the top: {res:.2f} %")

Proportion of questions with no tag in the top: 19.39 %


**19.4%** of the questions have none of their tags in the top. The following approach therefore seems reasonable:<br>- We create a new tag column, "new_t".<br>- It only contains those tags of a question that belong to the **top**.<br>- If a question has no tag in the top, it is tagged **misc**.
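These three rules can be sketched with plain string operations, as a simplified stand-in for a PhraseMatcher-based implementation. The helper `keep_top_tags` and the shortened `top_sample` list are illustrative assumptions; matching here is done on whitespace-separated words only.

```python
def keep_top_tags(tags_c, top):
    """Keep only the words of tags_c that belong to top, or return 'misc'."""
    top_set = set(top)
    kept = [word for word in tags_c.split() if word in top_set]
    return " ".join(kept) if kept else "misc"


# Shortened, hypothetical version of the top list
top_sample = ["python", "javascript", "ios", "swift", "c#", "node.js"]

print(keep_top_tags("ios swift uiviewcontroller", top_sample))
print(keep_top_tags("javascript ajax", top_sample))
print(keep_top_tags("cmd", top_sample))
```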

In [23]:
df["new_t"] = ""

In [24]:
matcher = PhraseMatcher(nlp.vocab)
patterns = [nlp(tag) for tag in top]
matcher.add("TOP", patterns)

for idx, quest in df.iterrows():

    doc = nlp(quest.tags_c)
    matches = matcher(doc)

    if len(matches) == 0:
        df.loc[idx, "new_t"] = "misc"

    else:
        # Concatenate the text of every matched top tag
        tag_str = ""
        for m_id, m_start, m_end in matches:
            tag_str = tag_str + " " + doc[m_start:m_end].text
            tag_str = tag_str.lstrip()

        # .loc writes to df itself and avoids the chained-assignment warning
        df.loc[idx, "new_t"] = tag_str


In [25]:
df = df[["texte", "new_t"]]
df = df.rename(columns={"new_t" : "tags"})

In [26]:
df[df.tags.str.contains("reactjs")].head()

Unnamed: 0,texte,tags
169,reactjs giving error uncaught typeerror: super...,reactjs
189,react js - uncaught typeerror: this.props.data...,javascript reactjs
209,"""this"" is undefined inside map function reactj...",javascript reactjs
236,how to disable a button when an input is empty...,reactjs
255,how to use `react.createelement` children para...,reactjs


**Handling the react / reactjs duplicates**

In [27]:
def map_react(x):
    
    x = x.replace("reactjs", "react")
    
    out = ""
    seen = set()
    doc = nlp(x)
    
    # de-duplication: handles the case where the replacement leaves "react" twice in the tags
    for word in doc :
        
        if word.text not in seen:
            out = out + " " + word.text
            out = out.lstrip()
            #out.append(word)
        seen.add(word.text)
    
    return out

In [28]:
df.tags = df.tags.map(map_react)

In [34]:
# Save the final dataset
with open("Data/data.pickle", "wb") as pickle_out:
    pickle.dump(df, pickle_out)

In [35]:
df = pickle.load(open("Data/data.pickle", "rb"))

In [36]:
df[df.tags.str.contains("js")]

Unnamed: 0,texte,tags
74,docker-compose: node_modules not present in a ...,node.js docker
268,express - return binary data from webservice i...,node.js
325,node.js mysql - error: connect econnrefused i ...,node.js
339,change working directory for npm scripts q: is...,node.js
343,node forever /usr/bin/env: node: no such file ...,node.js
...,...,...
25279,sequelize where statement with date i am using...,javascript node.js
25385,"resolve ""uncaught referenceerror: require is n...",javascript node.js
25456,"`npm build` doesn't run the script named ""buil...",javascript node.js
25468,express error - typeerror: router.use() requir...,javascript node.js


In [37]:
print(df.tags.value_counts().index)

Index(['misc', 'python', 'android', 'java', 'javascript', 'angular', 'react',
       'javascript react', 'python pandas', 'docker',
       ...
       'spring angular', 'java html spring', 'javascript python pandas',
       'react typescript visual', 'android visual flutter', 'php python',
       'google php', 'css jquery', 'android ios google', 'spring google'],
      dtype='object', length=440)


## **Features/labels sampling**

In [38]:
df = pickle.load(open("Data/data.pickle", "rb"))

In [39]:
# Pair each text with its tags
liste_data = list(zip(df.texte, df.tags))

In [40]:
liste_data[0][0]

"windows equivalent of 'touch' (i.e. the node.js way to create an index.html) on a windows machine i get this error 'touch' is not recognized as an internal or external command, operable program or batch file. when i follow the instructions to do: is there a windows equivalent of using 'touch'? do i need to create these files by hand (and modify them to change the timestamp) in order to implement this sort of command? that doesn't seem very ... node-ish..."

In [41]:
random.seed(47)
random.shuffle(liste_data)
random.shuffle(liste_data)

In [42]:
liste_data[0][0]

'can bash script be written inside a aws lambda function can i write a bash script inside a lambda function? i read in the aws docs that it can execute code written in python, nodejs and java 8. it is mentioned in some documents that it might be possible to use bash but there is no concrete evidence supporting it or any example'

In [43]:
X = [t[0] for t in liste_data]
y = [t[1] for t in liste_data]
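Shuffling the (text, tags) pairs together, rather than shuffling two separate lists, is what keeps each label attached to its text. A minimal sketch of the same idea with a local random generator follows; `shuffled_pairs` and the toy data are hypothetical, and a dedicated `random.Random` instance is used here instead of the global seed so that the shuffle does not depend on other calls to the `random` module.

```python
import random


def shuffled_pairs(pairs, seed):
    """Return a shuffled copy of pairs, reproducible for a given seed."""
    rng = random.Random(seed)   # local generator, independent of the global state
    out = list(pairs)
    rng.shuffle(out)
    return out


# Toy data: each text ends with its own label so alignment is easy to check
pairs = [(f"question about {tag}", tag) for tag in ["python", "java", "misc", "react"]]

mixed = shuffled_pairs(pairs, seed=47)
X_demo = [text for text, _ in mixed]
y_demo = [tags for _, tags in mixed]

print(all(text.endswith(tags) for text, tags in zip(X_demo, y_demo)))
```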

In [44]:
# Save features and labels
with open("Data/X.pickle", "wb") as pickle_out:
    pickle.dump(X, pickle_out)

with open("Data/y.pickle", "wb") as pickle_out:
    pickle.dump(y, pickle_out)