**Date :** Created on Tuesday January 12 2021  

**Group 8 - Innovation**

**queries_simulation_v0** 

**@author :** Damien Sonneville. 

**Description :** Simulation of requests to improve our models, built from the scraped `art_tag` data and from the management and innovation keywords provided by the client.

- On `art_tag`, requests are simulated in 3 different ways:

> - `Naïve`: the list of all tags, without duplicates
> - `Random`: random draw without replacement among all the words present in the tags
> - `By weight`: draw weighted by the frequency of each word in the tags, with each word repeated at most 10 times across the queries

- On the `keywords`:

> - Random selection of N words, with a maximum number of repetitions of a word across the queries
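
The three `art_tag` strategies can be illustrated on a toy tag list. This is only a minimal sketch using the standard library: the tags, the seed and the query size are made up for illustration, and `random.sample` stands in for the `np.random.choice` calls used later in the notebook.

```python
import random
from collections import Counter

# Hypothetical toy tags (not taken from the datasets).
tags = ["start up", "realite augmentee", "start up", "jeux video"]
rng = random.Random(0)  # fixed seed, for reproducibility only

# Naive: every distinct tag is used as a query, in order of first appearance.
naive = list(dict.fromkeys(tags))

# Random: shuffle the distinct words, then cut them into groups of nb_word,
# so each word is used exactly once (a draw without replacement).
words = list(dict.fromkeys(" ".join(tags).split()))
nb_word = 2
shuffled = rng.sample(words, len(words))
random_queries = [" ".join(shuffled[i:i + nb_word])
                  for i in range(0, len(shuffled), nb_word)]

# By weight: a word's weight is its share of all word occurrences in the tags.
counts = Counter(" ".join(tags).split())
total = sum(counts.values())
weights = {word: n / total for word, n in counts.items()}

print(naive)
print(random_queries)
print(weights)
```

Here `start` and `up` each account for 2 of the 8 word occurrences, so their weight is 0.25, while every other word has a weight of 0.125.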

# Part 1 : Install / Download / Import Library

## Download Library

In [1]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Boulanger\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Boulanger\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

## Import library

### - Useful library :

In [2]:
import pandas as pd
import numpy as np
import spacy
import pickle
from tqdm import trange

### - Text library :

In [3]:
import re
import string
import unicodedata
from nltk.corpus import stopwords
from nltk import word_tokenize

# Part 2 : Data Loading

In [4]:
def Load_data(helper_path : str) -> pd.DataFrame:
    """Load a JSON file of articles and drop the articles with no content.
    
    Parameters :
        - helper_path : the file path

    Output (if exists) :
        - df : My Dataframe cleaned and reindexed

    """
    
    # Load the data with the pandas library
    df = pd.read_json(helper_path)

    # Drop articles with no content
    df = df[df['art_content'] != '']

    # Reset my dataframe index
    df = df.reset_index(drop = True)
    
    # Returns my clean dataframe
    return df

In [5]:
def Load_Pickle(helper_path: str) -> list:
    """Load the object stored in a pickle file (here, a list of keywords).
    
    Parameters :
        - helper_path : the file path

    Output (if exists) :
        - pick_file : the content of my pickle file

    """

    # Open My file path
    with open(helper_path, 'rb') as f1:

        # Load Pickle file
        pick_file = pickle.load(f1)

        # Return Pickle file
        return pick_file

- **Phase 1 :** Load json Data file

In [6]:
# My file path for the function
Helper_path_F : str = 'C:/Users/Boulanger/Documents/interpromo_2021/df_deduplicated_v4.json'

# My DataFrame variable
First_data : pd.DataFrame = Load_data(Helper_path_F)

# To show my DataFrame
First_data.head(10)

Unnamed: 0,art_id,art_content,art_content_html,art_extract_datetime,art_lang,art_title,art_url,src_name,src_type,src_url,src_img,art_auth,art_tag
0,1,La FNCDG et l’ANDCDG ont publié en septembre l...,"<p style=""text-align: justify;"">La FNCDG et l’...",22 septembre 2020,fr,9ème édition du Panorama de l’emploi territorial,http://fncdg.com/9eme-edition-du-panorama-de-l...,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2020/09/im...,,
1,2,Malgré la levée des mesures de confinement le ...,"<p style=""text-align: justify;"">Malgré la levé...",17 mars 2020,fr,ACTUALITÉS FNCDG / COVID19,http://fncdg.com/actualites-covid19/,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2020/03/co...,,
2,25,Quels étaient les objectifs poursuivis par le ...,"<p style=""text-align: justify;""><strong>Quels ...",24 octobre 2019,fr,"Interview de M. Olivier DUSSOPT, Secretaire d’...",http://fncdg.com/interview-de-m-olivier-dussop...,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2019/10/in...,,
3,27,"La journée thématique, qui aura lieu durant le...","<p style=""text-align: justify;""><strong>La jo...",31 mai 2017,fr,Journée Thématique FNCDG « Les services de san...,http://fncdg.com/journee-thematique-fncdg-les-...,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2017/05/pu...,,
4,28,La 1ère journée thématique en région sur le th...,"<p style=""text-align: justify;"">La 1<sup>ère</...",13 mars 2017,fr,Journée Thématique FNCDG « Vers de nouveaux mo...,http://fncdg.com/journee-thematique-fncdg-vers...,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2017/03/Sa...,,
5,30,L’une des innovations de la loi n°2019-828 du ...,"<p style=""text-align: justify;"">L’une des inno...",22 octobre 2020,fr,La publication d’un guide d’accompagnement à l...,http://fncdg.com/la-publication-dun-guide-dacc...,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2020/10/LG...,,
6,31,"La FNCDG mène, en collaboration avec d’autres ...","<p style=""text-align: justify;"">La FNCDG mène,...",10 décembre 2020,fr,La publication d’un guide de sensibilisation a...,http://fncdg.com/la-publication-dun-guide-de-s...,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2020/12/im...,,
7,32,"Créé pour et par les décideurs territoriaux, É...","<p style=""text-align: justify;"">Créé pour et p...",24 février 2017,fr,Lancement du réseau Étoile,http://fncdg.com/lancement-du-reseau-etoile/,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2017/02/re...,,
8,34,Les décrets n°2017-397 et n°2017-318 du 24 mar...,"<p style=""text-align: justify;"">Les décrets n°...",5 avril 2017,fr,Le cadre d’emplois des agents de police munici...,http://fncdg.com/le-cadre-demplois-des-agents-...,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2017/04/po...,,
9,35,Une candidate à un examen professionnel organi...,"<p style=""text-align: justify;"">Une candidate ...",6 juillet 2017,fr,Le Conseil d’Etat confirme la souveraineté des...,http://fncdg.com/le-conseil-detat-confirme-la-...,FNCDG,xpath_source,http://fncdg.com/actualites/,http://fncdg.com/wp-content/uploads/2017/07/Co...,,


In [7]:
# My file path for the function
Helper_path_S : str = 'C:/Users/Boulanger/Documents/interpromo_2021/usines-digitale.json'

# My DataFrame variable
Second_data : pd.DataFrame = Load_data(Helper_path_S)

# Convert the tag column to strings
Second_data['art_tag'] = Second_data['art_tag'].astype(str)
    
# To show my DataFrame
Second_data.head(10)

Unnamed: 0,art_content_html,art_content,art_published_datetime,art_lang,art_title,art_url,src_name,src_type,src_url,art_img,art_auth,art_tag,art_id
0,"<article class=""contenuArticle"" itemscope="""" i...","\n\n\nNiantic acquiert la start-up 6D.ai, spéc...",1585612800000,fr_FR,"Niantic acquiert la start-up 6D.ai, spécialist...",https://www.usine-digitale.fr/article/niantic-...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/6/1/...,[Arthur Le Denn],"['Acquisition', 'Réalité augmentée', 'Start-up']",g2_1_0
1,"<article class=""contenuArticle"" itemscope="""" i...",\n\n\nNiantic lève 245 millions de dollars pou...,1548028800000,fr_FR,Niantic lève 245 millions de dollars pour déve...,https://www.usine-digitale.fr/article/niantic-...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/5/5/...,[Julien Bergounhoux],"['Réalité augmentée', 'Start-up', 'Innovation']",g2_1_1
2,"<article class=""contenuArticle"" itemscope="""" i...","\n\n\nTwitch, Mixer, YouTube... La guerre du s...",1574294400000,fr_FR,"Twitch, Mixer, YouTube... La guerre du streami...",https://www.usine-digitale.fr/article/twitch-m...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/9/6/...,[Fabrice Deblock],"['Jeux Video', 'Streaming', 'Twitch']",g2_1_2
3,"<article class=""contenuArticle"" itemscope="""" i...",\n\n\nL'Armée française utilisera les drones d...,1610409600000,fr_FR,L'Armée française utilisera les drones de Parr...,https://www.usine-digitale.fr/article/l-armee-...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/4/4/...,[Alice Vitard],"['Drone', 'Aéronautique - Spatial', 'Parrot']",g2_1_3
4,"<article class=""contenuArticle"" itemscope="""" i...",\n\n\nAlibaba et Ant Group pourraient être nat...,1610409600000,fr_FR,Alibaba et Ant Group pourraient être nationali...,https://www.usine-digitale.fr/article/alibaba-...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/4/5/...,[Aude Chardenon],"['Politique', 'Digital Retail', 'e-commerce']",g2_1_4
5,"<article class=""contenuArticle"" itemscope="""" i...",\n\n\n[CES 2021] Sony a débuté les tests sur r...,1610409600000,fr_FR,[CES 2021] Sony a débuté les tests sur route d...,https://www.usine-digitale.fr/article/ces-2021...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/4/5/...,[Léna Corot],"['Automobile', 'CES 2021', 'Mobilité']",g2_1_5
6,"<article class=""contenuArticle"" itemscope="""" i...",\n\n\nGetfluence lève 5 millions d'euros pour ...,1610409600000,fr_FR,Getfluence lève 5 millions d'euros pour étendr...,https://www.usine-digitale.fr/article/getfluen...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/9/0/...,[Aude Chardenon],"['Start-up', 'Marketing', 'French Tech']",g2_1_6
7,"<article class=""contenuArticle"" itemscope="""" i...","\n\n\nRoblox lève 520 millions de dollars, sa ...",1609977600000,fr_FR,"Roblox lève 520 millions de dollars, sa valori...",https://www.usine-digitale.fr/article/roblox-l...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/4/8/...,[Aude Chardenon],"['Jeux Video', 'Start-up', 'Financement']",g2_1_7
8,"<article class=""contenuArticle"" itemscope="""" i...",\n\n\nNintendo va acquérir le studio canadien ...,1609804800000,fr_FR,Nintendo va acquérir le studio canadien Next L...,https://www.usine-digitale.fr/article/nintendo...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/4/5/...,[Aude Chardenon],"['Acquisition', 'Nintendo', 'Jeux Video']",g2_1_8
9,"<article class=""contenuArticle"" itemscope="""" i...",\n\n\nGoogle et Disney animent les personnages...,1606176000000,fr_FR,Google et Disney animent les personnages de Th...,https://www.usine-digitale.fr/article/google-l...,L'Usine Digitale,xpath_source,usine-digitale.fr,https://www.usine-digitale.fr/mediatheque/4/8/...,[Aude Chardenon],"['Réalité augmentée', 'Entertainment', 'Loisir...",g2_1_9


- **Phase 2 :** Load Pickle Data file

In [9]:
# My file path for the function (innovation keywords)
Helper_path_FP : str = 'C:/Users/Boulanger/Documents/interpromo_2021/pickle_files/liste_mots_cles_innovation.p'

# My file path for the function (management keywords)
Helper_path_SP : str = 'C:/Users/Boulanger/Documents/interpromo_2021/pickle_files/liste_mots_cles_gestion.p'
    
# Load Keywords Pickle
List_keywords : list = Load_Pickle(Helper_path_FP)

# Extend Keywords Pickle
List_keywords.extend(Load_Pickle(Helper_path_SP))

# Part 3 : Data Preprocessing

In [12]:
def Cleanning(item: str, special_character : list) -> str :
    """Clean a text: lowercase it, then remove e-mail addresses, handles,
    sites, hashtags, punctuation, accents, special characters and stopwords.
    
    Parameters :
        - item: the raw text to clean (for example an article of the
        column ["art_content"] or a tag of the column ["art_tag"])
        - special_character: a list of special characters to 
        remove from the text

    Output (if exists) :
        - result: the text without unnecessary words 
        and characters 
        
    """

    # Convert text to lowercase
    item : str = item.lower()

    # Remove e-mail addresses
    item : str = re.sub(r"\w+@\w+\.\w+", "", item)

    # Remove twitter names
    item : str = re.sub(r"@\w+", "", item)

    # Remove ".com" sites (the dot is escaped so that it only matches a literal ".")
    item : str = re.sub(r"\S+\.com\S*", "", item)
        
    # Remove ".fr" sites
    item : str = re.sub(r"\S+\.fr\S*", "", item)

    # Remove numbers
    # item : str = re.sub(r'\d+', '', item)

    # Remove hashtags
    item : str = re.sub(r"#\w+", "", item)

    # Remove years
    # item : str = re.sub("en (\d+)","",item)
    
    # Remove punctuation
    item : str = item.translate(str.maketrans("", "", string.punctuation))
    
    # Step 1 : Remove French accents (decompose the accented characters,
    # then drop the combining marks that cannot be encoded in ASCII).
    # A single pass is enough: the operation does not need to be repeated.
    item = unicodedata.normalize('NFD', item) \
                      .encode('ascii', 'ignore') \
                      .decode("utf-8")
    
    # Step 2 : Remove Special character 
    for i in special_character:
        
        item = item.replace(i, "")
    
    # Remove stopwords
    stop_words = set(stopwords.words('french'))
    tokens = word_tokenize(item)
    result = [i for i in tokens if i not in stop_words]
    result = " ".join(result)
    
    # Remove leading and trailing whitespace
    result = result.strip()
    
    # Return my cleaned article
    return result
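
Two of the cleaning steps above are worth checking in isolation: the effect of an unescaped dot in the site patterns, and the accent removal. The sketch below uses only `re` and `unicodedata`; the helper names `strip_accents` and `remove_sites` are hypothetical and are not part of the notebook.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def remove_sites(text: str) -> str:
    """Remove tokens that contain a literal '.com' or '.fr'."""
    return re.sub(r"\S+\.(?:com|fr)\S*", "", text)


print(strip_accents("réalité augmentée"))  # realite augmentee

# With an unescaped dot, "." matches any character, so an ordinary word
# such as "recommandation" is matched as "r" + "e" + "com" + "mandation"
# and removed entirely.
print(repr(re.sub(r"(\S+).com(\S+)", "", "une recommandation utile")))

# With the escaped dot, only the actual site is removed.
print(repr(remove_sites("une recommandation utile sur fncdg.com/actualites")))
```

Escaping the dot keeps ordinary words that merely contain the letters "com" or "fr", which matters for French text.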

In [17]:
def Clean_column(data: pd.DataFrame, name_column: str, spe_charact : list) -> pd.DataFrame:
    """Clean every row of a column and store the result in a "clean" column.
    
    Parameters :
        - data: DataFrame containing the column we want to clean
        - name_column: name of the column to clean
        - spe_charact: list of special characters to remove

    Output (if exists) :
        - df: My Dataframe cleaned and reindexed
        
    """
    
    # Select the column to clean (copy it to avoid SettingWithCopyWarning)
    df: pd.DataFrame = data[[name_column]].copy()
    
    # My initialized column
    df['clean'] = ''
    
    # Remove missing data
    df: pd.DataFrame = df.dropna()
    
    # Reset index DataFrame
    df: pd.DataFrame = df.reset_index(drop = True)
    
    # Step 1 : Cleaning each row
    for i in trange(len(df), position=0, leave=True):
        
        # Call the Cleanning function with the list received as parameter
        df.loc[i, 'clean'] = Cleanning(df.loc[i, name_column], \
                                       spe_charact)

    # Replace empty strings with NaN so that dropna() removes them
    df["clean"] = df["clean"].replace("", np.nan)
    
    # Remove missing data
    df: pd.DataFrame = df.dropna()
    
    # Reset index DataFrame
    df: pd.DataFrame = df.reset_index(drop=True)
    
    # Return my clean column
    return df

In [18]:
# Create my list of special characters
Special_character: list = ["!", "\"", "#", "$", "%", "&", "\\", "(", \
                           ")", "*", "+", ",", "-", ".", "/", ":", ";", \
                           "<", "=", ">", "?", "@", "[", "]", "^", "_", \
                           "{", "|", "}", "~", "«", "»", "’", "•", "…", \
                           "â", "€", "™", "—", "�", "–", "“", "”"]
    

# Cleaning the tag column of the first dataset
First_df_tag : pd.DataFrame = Clean_column(First_data, 'art_tag', Special_character)
    
# Cleaning the tag column of the second dataset
Second_df_tag : pd.DataFrame = Clean_column(Second_data, 'art_tag', Special_character)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  from ipykernel import kernelapp as app
100%|████████████████████████████████████████████████████████████████████████████| 6185/6185 [00:06<00:00, 1000.84it/s]
100%|████████████████████████████████████████████████████████████████████████████| 1602/1602 [00:01<00:00, 1074.29it/s]


# Part 4 : Data Creation 

- **Phase 1 :** Words list creation

In [24]:
def Create_list_of_words(list_tag: list) -> list:
    """List the distinct words (longer than one character) found in the tags.
    
    Parameters :
        - list_tag: list of all cleaned tags (column ["clean"])

    Output (if exists) :
        - list_word: list of all the words in the tags 
        
    """
    
    # My initialized words list
    list_word : list = []
    
    # Step 1 : go through each tag
    for tag in list_tag:
        
        # Step 2 : go through each word of the tag
        for word in word_tokenize(tag):
            
            # Keep words longer than one character that are not saved yet
            if (len(word) > 1) and \
                (word not in list_word):
                
                # Save the word in my list
                list_word.append(word)
    
    # Return my list word
    return list_word

- **Phase 2 :** Dictionary creation

In [34]:
def Create_dictionnary(list_tag: list, list_to_del: list = None) -> dict:
    """Compute the weight (relative frequency) of each word in the tags.
    
    Parameters :
        - list_tag: list of all cleaned tags (column ["clean"])
        - list_to_del: list of words that have already reached the maximum
          number of repetitions (10) in the queries; they are excluded
          and get no weight

    Output (if exists) :
        - dict_word: dictionary of all words with their weight in the tags
    """
    
    # My initialized sum
    total : int = 0
    
    # Call function for my list
    list_word_unique : list = Create_list_of_words(list_tag)
    
    # Check the contents of the list
    if list_to_del:
        
        # Step 1 : go through each word to exclude
        for elmt in list_to_del:
            
            # Remove the excluded word
            del list_word_unique[list_word_unique.index(elmt)]
    
    # Create my word dictionary
    dict_word : dict = {el: 0 for el in list_word_unique}
    
    # Tokenize all words
    list_all_word : list = word_tokenize(' '.join(list_tag))
    
    # Step 2 : count the occurrences of each word
    for word in list_all_word:
        
        # Check word in my dictionary
        if word in dict_word:
            
            # Add to my dictionary
            dict_word[word] += 1
    
    # Step 3 : total number of occurrences
    total = sum(dict_word.values())
    
    # Step 4 : convert the counts into weights
    for key, value in dict_word.items():
        
        # Divide each count by the total
        dict_word[key] = value/total
    
    # Return my dictionnary
    return dict_word
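
The weights returned above are relative frequencies, which a small worked example makes concrete. The sketch below is a hypothetical equivalent built on `collections.Counter`; `word_weights` is not a function of the notebook, and `str.split()` stands in for `nltk.word_tokenize` so that the example runs without NLTK data.

```python
from collections import Counter


def word_weights(list_tag, list_to_del=()):
    """Relative frequency of each word in the tags, ignoring excluded words.

    Hypothetical helper: str.split() stands in for nltk.word_tokenize.
    """
    excluded = set(list_to_del)
    counts = Counter(
        word
        for tag in list_tag
        for word in tag.split()
        if len(word) > 1 and word not in excluded
    )
    total = sum(counts.values())
    # An empty input gives an empty dictionary instead of a division by zero.
    return {word: n / total for word, n in counts.items()} if total else {}


tags = ["start up", "start up innovation", "jeux video"]
print(word_weights(tags))
print(word_weights(tags, list_to_del=["start"]))
```

With these toy tags, `start` has a weight of 2/7. Once `start` is excluded, the remaining weights are recomputed over 5 occurrences, so `up` goes from 2/7 to 2/5.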

# Part 5 : Requests Creation

- **Phase 1 :** Initial requests list creation

**Function Description (To help understanding) :**  
- With `random`, we create a list of requests by drawing words from the tags at random, without replacement.

- With `weight`, we create requests by drawing words according to their weight in the tags.

In [35]:
def Create_requests(data: pd.DataFrame,
                    nb_word: int,
                    tag_only: bool = False,
                    random: bool = False,
                    weight: bool = False,
                    nb_requests: int = 500) -> list:
    
    """Create a list of requests from the cleaned tags.
    
    Parameters :
        - data: DataFrame containing the cleaned tags (column ["clean"])
        - nb_word: number of words we want in each request
        - tag_only: True if we want to use the unique tags as requests
        - random: True if we want to create random requests based on the words 
        in the tags, drawn without replacement
        - weight: True if we want to create requests based on 
        the weight of the words in the tags
        - nb_requests: maximum number of requests we want for requests based on 
        the weight of the words in the tags

    Output (if exists) :
        - list_requests : list of requests
        
    """
    
    # List of unique cleaned tags
    list_tag : list = data['clean'].unique().tolist()
    
    # Default result if no option is selected
    list_requests : list = []
    
    # Check condition
    if tag_only:
        
        # My request variable
        list_requests = list_tag

    # create list of all the words in the tags
    list_word_unique : list = Create_list_of_words(list_tag)

    # Check condition
    if random:
        
        # My initialized list
        list_requests : list = []
        
        # Step 1 : loop while enough words remain to build a request
        while len(list_word_unique) >= nb_word:
            
            # Draw nb_word distinct words at random
            temp : list = np.random.choice(list_word_unique, \
                                           nb_word, \
                                           replace=False)
            
            # Step 2 : go through the drawn words
            for elmt in temp:
                
                # Remove the word so that it cannot be drawn again
                del list_word_unique[list_word_unique.index(elmt)]
            
            # Add the request to the list
            temp = ' '.join(temp)
            list_requests.append(temp)
        
        # Check condition
        if list_word_unique != []:
            
            # Adding request list
            list_requests.append(' '.join(list_word_unique))

    # Check condition
    if weight:
        
        # My initialized list
        list_requests : list = []
        
        # My initialized list
        words_to_del : list = []
        
        # Apply the function to create the dictionary of weights
        dict_word : dict = Create_dictionnary(list_tag)
        
        # Number of times each word has been used in the requests
        dict_word_in_requests : dict = {el: 0 for el in list_word_unique}
        
        # Step 3 : loop while enough words remain and requests are missing
        while (len(dict_word) >= nb_word) and \
                len(list_requests) < nb_requests:
            
            # Draw nb_word distinct words according to their weight
            temp = np.random.choice(list(dict_word.keys()), 
                                    nb_word, 
                                    replace = False, 
                                    p = list(dict_word.values()))
            
            # Step 4 : go through the drawn words
            for elmt in temp:
                
                # Count the use of the word
                dict_word_in_requests[elmt] += 1
                
                # Check whether the word has reached 10 repetitions
                if dict_word_in_requests[elmt] == 10:
                    
                    # Add the word to my removal list
                    words_to_del.append(elmt)
                    
                    # Recompute the weights without the removed words
                    dict_word = Create_dictionnary(list_tag, words_to_del)
            
            # Add the request to the list
            temp = ' '.join(temp)
            list_requests.append(temp)
    
    # Return my completed requests list
    return list_requests
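
The weighted branch combines two ideas: drawing distinct words in proportion to their weight, and retiring a word once it has been used a maximum number of times. The sketch below approximates that logic with the standard library only. `weighted_requests`, its `max_repeat` and `seed` parameters and the toy weights are assumptions made for illustration, and `random.choices` replaces `np.random.choice`, so the draws will not match the notebook's.

```python
import random
from collections import Counter


def weighted_requests(weights, nb_word, nb_requests, max_repeat=10, seed=0):
    """Draw requests of nb_word distinct words, proportionally to weights.

    Hypothetical stand-in for the weight branch of Create_requests: a word
    that has been used max_repeat times is removed from the pool.
    """
    rng = random.Random(seed)
    pool = dict(weights)  # remaining words and their weights
    used = Counter()
    requests = []
    while len(pool) >= nb_word and len(requests) < nb_requests:
        candidates = dict(pool)
        request = []
        for _ in range(nb_word):
            # Weighted draw of one word, then remove it from the candidates
            # so that a request never contains the same word twice.
            words = list(candidates)
            word = rng.choices(words, weights=[candidates[w] for w in words])[0]
            request.append(word)
            del candidates[word]
        for word in request:
            used[word] += 1
            if used[word] == max_repeat:
                del pool[word]  # cap reached: the word is retired
        requests.append(" ".join(request))
    return requests


weights = {"start": 0.4, "up": 0.3, "innovation": 0.2, "drone": 0.1}
requests = weighted_requests(weights, nb_word=2, nb_requests=50, max_repeat=3)
print(requests)
```

With 4 words capped at 3 uses each, at most 12 word slots are available, so no more than 6 two-word requests can be produced even though 50 were asked for. The same effect explains why `nb_requests` is only an upper bound in `Create_requests`.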

- **Phase 2 :** Keywords requests list creation

In [41]:
def Create_requests_keywords(list_keywords: list,
                             nb_word: int,
                             nb_words_repetition: int,
                             Special_character: list) -> list:
    
    """Create a list of requests from the cleaned keywords.
    
    Parameters :
        - list_keywords: list of keywords
        - nb_word: number of words we want in each request
        - nb_words_repetition: maximum number of repetitions of a word in queries
        - Special_character: list of special characters to remove

    Output (if exists) :
        - list_requests: my list of requests based on the keywords
    """

    # My Keywords DataFrame
    df_keywords : pd.DataFrame = pd.DataFrame(list_keywords, \
                                              columns=['keywords'])
    
    # My cleaned DataFrame
    df_clean : pd.DataFrame = Clean_column(df_keywords, \
                                           'keywords', \
                                           Special_character)
        
    # My words list
    list_word : list = df_clean['clean'].unique().tolist()
    
    # Initialized my requests list
    list_requests : list = []
    
    # Create words dictionnary
    dict_word_in_requests : dict = {el: 0 for el in list_word}
    
    # Step 1 : loop while enough keywords remain to build a request
    while len(list_word) >= nb_word:
        
        # Draw nb_word distinct keywords at random
        temp = np.random.choice(list_word, nb_word, replace=False)
        
        # Step 2 : go through the drawn keywords
        for elmt in temp:
            
            # Count the use of the keyword
            dict_word_in_requests[elmt] += 1
            
            # Check whether the keyword has reached the maximum repetitions
            if dict_word_in_requests[elmt] == nb_words_repetition:
                
                # Remove the keyword so that it cannot be drawn again
                del list_word[list_word.index(elmt)]
        
        # Add the request to the list
        temp = ' '.join(temp)
        list_requests.append(temp)
    
    # Check condition
    if list_word != []:
        
        # Adding request list
        list_requests.append(' '.join(list_word))
    
    # Return my completed requests list
    return list_requests
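
The keyword variant draws uniformly rather than by weight, and its cap applies to whole keywords, which may contain several words. The sketch below mirrors that logic with the standard library. `keyword_requests` and the toy keywords are hypothetical, `random.sample` replaces `np.random.choice`, and the requests are kept as lists so that multi-word keywords stay countable.

```python
import random
from collections import Counter


def keyword_requests(keywords, nb_word, max_repeat, seed=0):
    """Uniform draws of nb_word distinct keywords, each used at most max_repeat times."""
    rng = random.Random(seed)
    pool = list(dict.fromkeys(keywords))  # distinct keywords, order kept
    used = Counter()
    requests = []
    while len(pool) >= nb_word:
        request = rng.sample(pool, nb_word)  # uniform, without replacement
        for keyword in request:
            used[keyword] += 1
            if used[keyword] == max_repeat:
                pool.remove(keyword)  # cap reached: retire the keyword
        requests.append(request)
    if pool:
        # Fewer than nb_word keywords remain: they form one last request.
        used.update(pool)
        requests.append(list(pool))
    return requests, used


# Hypothetical cleaned keywords; "gestion projet" is a multi-word keyword.
keywords = ["innovation", "gestion projet", "agilite", "management", "innovation"]
requests, used = keyword_requests(keywords, nb_word=2, max_repeat=2)
print(requests)
print(used)
```

Every keyword ends up in at least one request: it is either retired after reaching the cap or placed in the final leftover request.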

# Part 6 : Requests Application

In [43]:
# Create my basic requests
First_req_tag = Create_requests(First_df_tag, 7, tag_only = True)
Second_req_tag = Create_requests(Second_df_tag, 7, tag_only = True)
First_req_rand = Create_requests(First_df_tag, 7, random = True)
Second_req_rand = Create_requests(Second_df_tag, 7, random = True)
First_req_weight = Create_requests(First_df_tag, 7, weight = True, nb_requests = 1000)
Second_req_weight = Create_requests(Second_df_tag, 7, weight = True)

# Create my keywords request
Request_keywords = Create_requests_keywords(List_keywords, 5, 10, Special_character)

100%|██████████████████████████████████████████████████████████████████████████████| 249/249 [00:00<00:00, 1323.64it/s]


- **Optional Application :** print all the requests to inspect their contents

In [46]:
# Basic requests
# print(First_req_tag[:10])
# print(Second_req_tag[:10])
# print(First_req_rand[:10])
# print(Second_req_rand[:10])
# print(First_req_weight[:10])
# print(Second_req_weight[:10])

# Keywords requests
# print(Request_keywords[:10])

# Part 7 : Requests list creation

In [47]:
List_request = Create_requests_keywords(List_keywords, 5, 10, Special_character)
List_request.extend(Create_requests(Second_df_tag, 7, weight=True))
List_request.extend(Create_requests(First_df_tag, 7, weight=True, nb_requests=1000))

100%|██████████████████████████████████████████████████████████████████████████████| 249/249 [00:00<00:00, 1422.82it/s]


# Part 8 : Export Data

In [49]:
with open('list_queries_simulation', 'wb') as f1:
    pickle.dump(List_request, f1)
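
A quick way to check the export is to reload the file and compare it with the list in memory. The sketch below does this in a temporary directory, with a made-up `queries` list standing in for `List_request`.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for List_request.
queries = ["innovation management", "start up drone"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "list_queries_simulation"

    # Export, as in the cell above.
    with open(path, "wb") as f1:
        pickle.dump(queries, f1)

    # Reload and compare with the original list.
    with open(path, "rb") as f1:
        reloaded = pickle.load(f1)

print(reloaded == queries)
```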