# INF8111 - Data Mining


## Lab 1 (TP1), Summer 2020 - Recommender System

##### Team members:

    - Kacem Khaled
    - Oumayma Messoussi
    - Semah Aissaoui


## Summary

*Stack Exchange* is a network of sites, each hosting discussions on a given topic. A discussion, or *thread*, contains one question and possibly several answers and comments. In this lab, you will implement a recommender system that returns the threads related to a given question. Before a question is submitted, the site will use this system to suggest the most similar threads, in order to limit the number of duplicate threads.


## 2 - Installation

For this lab, you will need the `numpy`, `sklearn` and `scipy` libraries (which you probably already have), as well as `nltk`, a library for Natural Language Processing (NLP). Note that `sklearn` is installed from the `scikit-learn` package.

Install these libraries and run the code below:

In [14]:
# If you prefer, you can use Anaconda

# pip install --user numpy
# pip install --user scikit-learn
# pip install --user scipy
# pip install --user nltk


#python
import numpy as np
import scipy as sp
import nltk
nltk.download("punkt")

[nltk_data] Downloading package punkt to C:\Users\Sameh
[nltk_data]     Aissaoui\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True
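
Before moving on, it can help to confirm that every library is importable from the current environment. The sketch below is a minimal check and not part of the assignment; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]

# scikit-learn is imported under the name "sklearn"
print(missing_modules(["numpy", "scipy", "sklearn", "nltk"]))
```

An empty list means that all four libraries were found.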

## 3 - Dataset

Download the archive from the following address: https://drive.google.com/file/d/1032N1oZkytHlHs20AXE9jPMBQCTyhb6H/view?usp=sharing

The archive contains:
1. test.json: this file contains the new questions together with their associated relevant threads.
2. threads: this folder contains the HTML code of the threads. Each HTML file is named following the pattern **thread_id.html**.

The image below shows an example thread:

![thread_img](thread_example.png)

- A: the question subject
- B: the question body
- C: the question comments
- D: the answer body
- E: the answer comments



In [15]:
import os

# Path of the folder that contains the data (the threads folder and test.json)
# FOLDER_PATH = "dataset/"
FOLDER_PATH = "C:/Users/Sameh Aissaoui/Desktop/Nouveau PC/Maitrise/Cours/Été 2020/INF8111/Labs/dataset"
THREAD_FOLDER = os.path.join(FOLDER_PATH, 'threads')


# Load the evaluation dataset
import json


# Use a context manager so that the file is closed after loading
with open(os.path.join(FOLDER_PATH, "test.json")) as test_file:
    test = json.load(test_file)
relevant_threads_by_query = dict()


for (query_id, cand_id, label) in test: 
    if label == 'Irrelevant':
        continue
        
    l = relevant_threads_by_query.setdefault(query_id, [])
    l.append(cand_id)
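
The loop above groups the candidate threads by query and drops the pairs labelled `Irrelevant`. The sketch below reproduces that grouping on a few made-up triples, using `collections.defaultdict` as an alternative to `setdefault`. The ids, the `Relevant` label and the helper name `group_relevant` are illustrative assumptions, not values taken from `test.json`.

```python
from collections import defaultdict

def group_relevant(triples):
    """Map each query id to the list of its candidate ids not labelled 'Irrelevant'."""
    relevant = defaultdict(list)
    for query_id, cand_id, label in triples:
        if label != 'Irrelevant':
            relevant[query_id].append(cand_id)
    return dict(relevant)

# Hypothetical triples, for illustration only
sample = [
    ("q1", "t10", "Relevant"),
    ("q1", "t11", "Irrelevant"),
    ("q2", "t12", "Relevant"),
    ("q1", "t13", "Relevant"),
]
print(group_relevant(sample))  # {'q1': ['t10', 't13'], 'q2': ['t12']}
```

As in the original loop, a query whose candidates are all irrelevant does not appear in the result.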

# 4 - Web scraping

"*Web scraping* (sometimes called harvesting) is a technique for extracting content from websites, via a script or a program, in order to transform it for use in another context, such as search engine indexing." (translated from [Wikipedia](https://fr.wikipedia.org/wiki/Web_scraping))

## 4.1 - Question 1 (0.5 point)

Special and non-ASCII characters can be encoded with their HTML representation. For example, the apostrophe (') is encoded as `&apos;`. The encoding of the web pages is not uniform: only special and non-ASCII characters are encoded with their HTML representation. This will be fixed by converting the HTML representations into UTF-8 characters. For example, `&apos;` will be replaced by `'`.

Implement the function fix_encoding, which converts HTML representations into UTF-8 characters.


In [16]:
import html

def fix_encoding(text):
    """
    Decodes the HTML entities in a text into their Unicode characters. For instance, "I&apos;m ..." => "I'm ..."
    
    :param text: string.
    :return: fixed text (string)
    """
    # html.unescape handles named entities (&apos;, &amp;, ...) as well as numeric ones (&#39;),
    # whereas a single str.replace would only cover &apos;
    return html.unescape(text)
        
    

In [17]:
print(fix_encoding(" I&apos;m Semah Aissaoui and I&apos;m 25 years old, I&apos;m working with oumayma and kacem."))
print(fix_encoding(" "))
print(fix_encoding(" I&apos;m &"))
print(fix_encoding(" &apos; "))


 I'm Semah Aissaoui and I'm 25 years old, I'm working with oumayma and kacem.
 
 I'm &
 ' 


## 4.2 - Question 2 (3 points)

Implement the function extract_data_from_page. This function extracts the question subject, its body, its comments, the body of the answers and their comments from the HTML pages. The function returns a dictionary with the following structure: *{"thread_id": int, "question": {"subject": string, "body": string, "comments": [string]}, "answers": [{"body": string, "comments": [string]}]}*

**Use the function fix_encoding to convert the text. You can use the [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) library for this question. You must remove all HTML tags from the text of the question, the comments and the answers.**


In [27]:
import numpy as np
from bs4 import BeautifulSoup
def extract_data_from_page(pagepath):
    """
    Scrape the question, answers and comments from a thread page.
    
    :param pagepath: the path of the thread HTML file.
    :return: 
        {
            "thread_id": thread id,
            "question":{
                "subject": question subject text (Area A in the figure), 
                "body": question body text (Area B in the figure), 
                "comments": list of question comment texts (Area C in the figure)
                }, 
            "answers": [
                {
                    "body": answer body text (Area D in the figure),
                    "comments": list of answer comment texts (Area E in the figure)
                }
                ]
            }
    """
    # Parse the page once; "html.parser" ships with Python ("lxml" is faster if installed)
    with open(pagepath, encoding='utf8') as page:
        soup = BeautifulSoup(page, 'html.parser')

    def clean(tag):
        # Drop the HTML tags, decode any remaining entities and trim whitespace
        return fix_encoding(tag.get_text()).strip()

    question = soup.find("div", class_="question")
    data = {
        # File name without its ".html" extension, whatever the path separator;
        # kept as a string, since JSON object keys are strings anyway
        'thread_id': os.path.splitext(os.path.basename(pagepath))[0],
        'question': {
            'subject': clean(soup.find("a", class_="question-hyperlink")),
            'body': clean(question.find("div", class_="postcell post-layout--right").find("div", class_="post-text")),
            'comments': [clean(s) for s in question.find_all("span", class_="comment-copy")],
        },
        'answers': [],
    }
    # Build a fresh dictionary for each answer block
    for ans in soup.find_all("div", class_="answer"):
        data['answers'].append({
            'body': clean(ans.find("div", class_="post-text")),
            'comments': [clean(s) for s in ans.find_all("span", class_="comment-copy")],
        })
    return data
    

In [28]:
# pagepath=THREAD_FOLDER+'/100724500520.html'
# data_web=extract_data_from_page(pagepath)
# print(data_web["question"]["subject"])
# print(data_web["question"]["body"])
# print(data_web["question"]["comments"])
# print(data_web["answers"][0]["body"])
# print(data_web["answers"][0]["comments"])
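
One way to sanity-check the output of extract_data_from_page is to verify that it follows the dictionary structure described in Question 2. The sketch below is a minimal checker, not part of the assignment; the helper names `is_str_list` and `check_thread_schema` and the sample dictionary are hypothetical.

```python
def is_str_list(value):
    """True if value is a list whose items are all strings (an empty list is valid)."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)

def check_thread_schema(thread):
    """Return a list of problems found in a thread dictionary (empty if none)."""
    problems = []
    if "thread_id" not in thread:
        problems.append("missing thread_id")
    question = thread.get("question", {})
    for key in ("subject", "body"):
        if not isinstance(question.get(key), str):
            problems.append(f"question.{key} should be a string")
    if not is_str_list(question.get("comments")):
        problems.append("question.comments should be a list of strings")
    if not isinstance(thread.get("answers"), list):
        problems.append("answers should be a list")
        return problems
    for i, answer in enumerate(thread["answers"]):
        if not isinstance(answer.get("body"), str):
            problems.append(f"answers[{i}].body should be a string")
        if not is_str_list(answer.get("comments")):
            problems.append(f"answers[{i}].comments should be a list of strings")
    return problems

# Hypothetical thread, for illustration only
sample_thread = {
    "thread_id": "1",
    "question": {"subject": "s", "body": "b", "comments": []},
    "answers": [{"body": "a", "comments": ["c"]}],
}
print(check_thread_schema(sample_thread))  # []
```

Running the checker on a handful of extracted pages before launching the full extraction can catch selector mistakes early.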

## 4.3 - Extract the text from the HTML code

In [29]:
import os
import json
from tqdm import tqdm

# Index each thread by its id
index_path = os.path.join(THREAD_FOLDER, 'threads.json')
if os.path.isfile(index_path):
    # Load the threads whose webpage content was already extracted.
    with open(index_path) as index_file:
        thread_index = json.load(index_file)
else:
    # Extract webpage content
    # This can be slow (around 30 minutes). Test your code with a small sample. The lxml parser is faster than html.parser
    files = (os.path.join(THREAD_FOLDER, filename)
             for filename in os.listdir(THREAD_FOLDER) if filename.endswith('.html'))
    # map is lazy: the pages are only parsed when tqdm iterates over them below
    threads = map(extract_data_from_page, files)
    print('Ok')
    thread_index = dict((thread['thread_id'], thread) for thread in tqdm(threads))
    # Save preprocessed threads
    with open(index_path, 'w') as index_file:
        json.dump(thread_index, index_file)
    

Ok



15265it [42:46,  6.59it/s]
15266it [42:46,  6.86it/s]
15267it [42:46,  6.51it/s]
15268it [42:46,  6.68it/s]
15269it [42:46,  7.40it/s]
15270it [42:47,  4.02it/s]
15271it [42:47,  4.54it/s]
15272it [42:47,  5.01it/s]
15273it [42:47,  5.38it/s]
15275it [42:47,  5.98it/s]
15276it [42:48,  5.98it/s]
15277it [42:48,  6.51it/s]
15278it [42:48,  4.50it/s]
15279it [42:48,  4.93it/s]
15280it [42:48,  5.59it/s]
15281it [42:49,  5.96it/s]
15282it [42:49,  6.27it/s]
15283it [42:49,  6.43it/s]
15284it [42:49,  6.82it/s]
15285it [42:49,  7.10it/s]
15286it [42:49,  6.92it/s]
15287it [42:50,  4.44it/s]
15288it [42:50,  5.19it/s]
1

15881it [44:28,  5.76it/s]
15882it [44:28,  5.81it/s]
15883it [44:29,  6.35it/s]
15884it [44:29,  6.64it/s]
15885it [44:29,  6.78it/s]
15886it [44:29,  6.66it/s]
15887it [44:29,  7.18it/s]
15888it [44:30,  4.59it/s]
15890it [44:30,  5.18it/s]
15891it [44:30,  5.81it/s]
15892it [44:30,  6.04it/s]
15893it [44:30,  6.56it/s]
15894it [44:30,  6.24it/s]
15895it [44:31,  6.19it/s]
15896it [44:31,  6.51it/s]
15897it [44:31,  4.47it/s]
15898it [44:31,  5.11it/s]
15899it [44:31,  5.68it/s]
15900it [44:31,  5.99it/s]
15901it [44:32,  5.78it/s]
15902it [44:32,  6.16it/s]
15903it [44:32,  6.54it/s]
15904it [44:32,  6.55it/s]
15905it [44:32,  6.92it/s]
15906it [44:32,  7.35it/s]
15907it [44:32,  7.51it/s]
15908it [44:33,  4.87it/s]
15909it [44:33,  5.15it/s]
15910it [44:33,  5.48it/s]
15911it [44:33,  5.86it/s]
15912it [44:33,  6.39it/s]
15913it [44:33,  6.96it/s]
15914it [44:34,  6.89it/s]
15915it [44:34,  6.94it/s]
15916it [44:34,  7.63it/s]
15917it [44:34,  7.44it/s]
15918it [44:34,  5.00it/s]
1

16509it [46:13,  6.91it/s]
16510it [46:13,  7.05it/s]
16511it [46:13,  6.84it/s]
16512it [46:14,  6.70it/s]
16513it [46:14,  4.69it/s]
16514it [46:14,  4.87it/s]
16515it [46:14,  5.40it/s]
16516it [46:14,  5.81it/s]
16517it [46:14,  6.23it/s]
16518it [46:15,  6.62it/s]
16519it [46:15,  7.13it/s]
16520it [46:15,  6.97it/s]
16521it [46:15,  7.39it/s]
16522it [46:15,  6.96it/s]
16523it [46:15,  6.94it/s]
16524it [46:16,  4.68it/s]
16525it [46:16,  5.41it/s]
16526it [46:16,  5.03it/s]
16527it [46:16,  5.75it/s]
16528it [46:16,  6.50it/s]
16529it [46:16,  6.87it/s]
16530it [46:17,  7.10it/s]
16531it [46:17,  6.91it/s]
16532it [46:17,  7.11it/s]
16533it [46:17,  6.93it/s]
16534it [46:17,  4.70it/s]
16535it [46:17,  4.99it/s]
16536it [46:18,  5.52it/s]
16537it [46:18,  5.94it/s]
16538it [46:18,  6.60it/s]
16539it [46:18,  6.79it/s]
16540it [46:18,  6.79it/s]
16541it [46:18,  6.68it/s]
16542it [46:18,  6.69it/s]
16543it [46:19,  4.39it/s]
16544it [46:19,  5.28it/s]
16545it [46:19,  5.80it/s]
1

17133it [47:58,  6.60it/s]
17134it [47:58,  4.27it/s]
17135it [47:58,  4.87it/s]
17136it [47:58,  5.36it/s]
17137it [47:59,  5.84it/s]
17138it [47:59,  6.09it/s]
17139it [47:59,  6.45it/s]
17140it [47:59,  6.36it/s]
17141it [47:59,  6.78it/s]
17142it [47:59,  7.00it/s]
17143it [47:59,  6.80it/s]
17144it [48:00,  4.61it/s]
17145it [48:00,  5.02it/s]
17146it [48:00,  5.58it/s]
17147it [48:00,  6.14it/s]
17148it [48:00,  6.30it/s]
17149it [48:01,  6.12it/s]
17150it [48:01,  6.86it/s]
17151it [48:01,  7.15it/s]
17152it [48:01,  7.06it/s]
17153it [48:01,  4.51it/s]
17154it [48:01,  5.06it/s]
17155it [48:02,  5.13it/s]
17156it [48:02,  5.84it/s]
17157it [48:02,  5.41it/s]
17158it [48:02,  5.57it/s]
17159it [48:02,  6.35it/s]
17160it [48:02,  6.01it/s]
17161it [48:03,  6.30it/s]
17162it [48:03,  6.63it/s]
17163it [48:03,  4.38it/s]
17164it [48:03,  4.88it/s]
17165it [48:03,  5.65it/s]
17166it [48:04,  5.94it/s]
17167it [48:04,  6.14it/s]
17168it [48:04,  6.25it/s]
17169it [48:04,  6.68it/s]
1

17756it [49:45,  5.06it/s]
17757it [49:45,  5.35it/s]
17758it [49:46,  5.62it/s]
17759it [49:46,  5.82it/s]
17760it [49:46,  6.12it/s]
17761it [49:46,  5.99it/s]
17762it [49:46,  6.60it/s]
17764it [49:46,  6.85it/s]
17765it [49:47,  4.72it/s]
17766it [49:47,  5.29it/s]
17767it [49:47,  5.97it/s]
17768it [49:47,  6.04it/s]
17769it [49:47,  6.59it/s]
17770it [49:47,  6.45it/s]
17771it [49:48,  5.90it/s]
17772it [49:48,  6.40it/s]
17773it [49:48,  5.77it/s]
17774it [49:48,  3.99it/s]
17775it [49:49,  4.58it/s]
17776it [49:49,  4.84it/s]
17777it [49:49,  5.57it/s]
17778it [49:49,  6.02it/s]
17779it [49:49,  6.04it/s]
17780it [49:49,  6.44it/s]
17781it [49:49,  7.02it/s]
17782it [49:50,  7.25it/s]
17783it [49:50,  6.81it/s]
17784it [49:50,  6.91it/s]
17785it [49:50,  4.71it/s]
17786it [49:50,  4.83it/s]
17787it [49:51,  5.29it/s]
17788it [49:51,  5.35it/s]
17789it [49:51,  5.88it/s]
17790it [49:51,  6.57it/s]
17791it [49:51,  6.65it/s]
17792it [49:51,  6.59it/s]
17793it [49:51,  6.55it/s]
1

18379it [51:32,  6.03it/s]
18380it [51:32,  6.24it/s]
18381it [51:32,  6.08it/s]
18382it [51:32,  6.60it/s]
18383it [51:32,  6.80it/s]
18384it [51:33,  6.82it/s]
18385it [51:33,  7.14it/s]
18386it [51:33,  4.70it/s]
18387it [51:33,  5.24it/s]
18388it [51:33,  5.35it/s]
18389it [51:34,  5.75it/s]
18390it [51:34,  6.08it/s]
18391it [51:34,  6.25it/s]
18392it [51:34,  6.76it/s]
18393it [51:34,  6.55it/s]
18394it [51:34,  6.59it/s]
18395it [51:35,  4.36it/s]
18396it [51:35,  4.94it/s]
18397it [51:35,  5.50it/s]
18398it [51:35,  5.48it/s]
18399it [51:35,  5.65it/s]
18400it [51:35,  5.35it/s]
18401it [51:36,  5.68it/s]
18402it [51:36,  6.12it/s]
18403it [51:36,  6.51it/s]
18404it [51:36,  4.36it/s]
18405it [51:36,  4.91it/s]
18406it [51:37,  5.70it/s]
18407it [51:37,  5.97it/s]
18408it [51:37,  6.07it/s]
18409it [51:37,  6.47it/s]
18410it [51:37,  7.05it/s]
18411it [51:37,  6.10it/s]
18412it [51:37,  6.25it/s]
18413it [51:38,  6.50it/s]
18414it [51:38,  4.31it/s]
18416it [51:38,  5.19it/s]
1

19004it [53:20,  5.24it/s]
19005it [53:20,  5.57it/s]
19006it [53:20,  6.08it/s]
19007it [53:21,  6.24it/s]
19008it [53:21,  6.35it/s]
19009it [53:21,  6.14it/s]
19010it [53:21,  6.08it/s]
19011it [53:21,  4.30it/s]
19012it [53:22,  4.81it/s]
19013it [53:22,  5.09it/s]
19014it [53:22,  5.29it/s]
19015it [53:22,  5.50it/s]
19016it [53:22,  5.80it/s]
19017it [53:22,  6.07it/s]
19018it [53:23,  6.36it/s]
19019it [53:23,  6.59it/s]
19020it [53:23,  4.40it/s]
19021it [53:23,  4.28it/s]
19022it [53:23,  4.99it/s]
19023it [53:24,  5.56it/s]
19024it [53:24,  5.72it/s]
19025it [53:24,  5.90it/s]
19026it [53:24,  6.13it/s]
19027it [53:24,  6.26it/s]
19028it [53:24,  6.43it/s]
19029it [53:25,  4.26it/s]
19030it [53:25,  4.78it/s]
19031it [53:25,  5.19it/s]
19032it [53:25,  5.86it/s]
19033it [53:25,  5.92it/s]
19034it [53:26,  5.73it/s]
19035it [53:26,  6.21it/s]
19036it [53:26,  6.73it/s]
19037it [53:26,  6.33it/s]
19038it [53:26,  4.06it/s]
19039it [53:27,  4.51it/s]
19040it [53:27,  4.97it/s]
1

19622it [55:07,  6.41it/s]
19623it [55:07,  4.37it/s]
19624it [55:07,  4.86it/s]
19625it [55:07,  5.65it/s]
19626it [55:08,  6.31it/s]
19627it [55:08,  6.55it/s]
19628it [55:08,  6.79it/s]
19629it [55:08,  6.78it/s]
19630it [55:08,  6.77it/s]
19631it [55:08,  6.90it/s]
19632it [55:08,  6.52it/s]
19633it [55:09,  6.85it/s]
19634it [55:09,  4.39it/s]
19635it [55:09,  5.05it/s]
19636it [55:09,  4.86it/s]
19637it [55:09,  5.37it/s]
19638it [55:10,  5.72it/s]
19639it [55:10,  5.74it/s]
19640it [55:10,  6.02it/s]
19641it [55:10,  6.40it/s]
19642it [55:10,  4.25it/s]
19643it [55:11,  4.93it/s]
19644it [55:11,  5.33it/s]
19645it [55:11,  5.77it/s]
19646it [55:11,  6.01it/s]
19647it [55:11,  6.39it/s]
19648it [55:11,  6.69it/s]
19649it [55:11,  6.89it/s]
19650it [55:12,  6.76it/s]
19651it [55:12,  6.39it/s]
19652it [55:12,  6.99it/s]
19653it [55:12,  4.54it/s]
19654it [55:12,  5.25it/s]
19655it [55:13,  5.72it/s]
19656it [55:13,  5.87it/s]
19657it [55:13,  6.37it/s]
19658it [55:13,  6.44it/s]
1

20240it [56:55,  5.94it/s]
20241it [56:55,  6.06it/s]
20242it [56:56,  5.87it/s]
20243it [56:56,  5.93it/s]
20244it [56:56,  5.97it/s]
20245it [56:56,  6.24it/s]
20246it [56:56,  6.91it/s]
20247it [56:57,  4.44it/s]
20248it [56:57,  5.24it/s]
20249it [56:57,  6.02it/s]
20250it [56:57,  6.27it/s]
20251it [56:57,  6.29it/s]
20252it [56:57,  6.39it/s]
20253it [56:57,  6.85it/s]
20254it [56:58,  7.32it/s]
20255it [56:58,  7.54it/s]
20256it [56:58,  7.73it/s]
20257it [56:58,  8.16it/s]
20258it [56:58,  7.70it/s]
20259it [56:58,  4.86it/s]
20260it [56:59,  5.48it/s]
20261it [56:59,  5.74it/s]
20262it [56:59,  5.77it/s]
20263it [56:59,  6.21it/s]
20264it [56:59,  6.35it/s]
20265it [56:59,  6.30it/s]
20266it [56:59,  7.08it/s]
20267it [57:00,  7.02it/s]
20268it [57:00,  6.44it/s]
20269it [57:00,  6.61it/s]
20270it [57:00,  3.97it/s]
20271it [57:01,  4.71it/s]
20272it [57:01,  5.37it/s]
20273it [57:01,  5.63it/s]
20274it [57:01,  5.96it/s]
20275it [57:01,  6.54it/s]
20276it [57:01,  6.62it/s]
2

20872it [58:42,  5.98it/s]
20873it [58:42,  6.26it/s]
20874it [58:42,  6.66it/s]
20875it [58:42,  6.65it/s]
20876it [58:43,  6.14it/s]
20877it [58:43,  6.89it/s]
20878it [58:43,  5.97it/s]
20879it [58:43,  4.22it/s]
20880it [58:44,  4.69it/s]
20881it [58:44,  4.79it/s]
20882it [58:44,  5.54it/s]
20883it [58:44,  6.34it/s]
20884it [58:44,  6.71it/s]
20885it [58:44,  6.67it/s]
20886it [58:44,  6.61it/s]
20887it [58:45,  5.88it/s]
20888it [58:45,  6.34it/s]
20889it [58:45,  4.26it/s]
20890it [58:45,  4.39it/s]
20891it [58:46,  4.84it/s]
20892it [58:46,  5.25it/s]
20893it [58:46,  5.70it/s]
20894it [58:46,  5.65it/s]
20895it [58:46,  6.12it/s]
20896it [58:46,  5.61it/s]
20897it [58:46,  6.08it/s]
20898it [58:47,  4.19it/s]
20899it [58:47,  4.63it/s]
20900it [58:47,  5.01it/s]
20901it [58:47,  5.47it/s]
20902it [58:47,  6.01it/s]
20903it [58:48,  5.90it/s]
20904it [58:48,  6.34it/s]
20905it [58:48,  6.12it/s]
20906it [58:48,  6.47it/s]
20907it [58:48,  4.44it/s]
20908it [58:49,  4.90it/s]
2

21479it [1:00:28,  4.44it/s]
21480it [1:00:28,  4.91it/s]
21481it [1:00:28,  5.63it/s]
21482it [1:00:28,  5.95it/s]
21483it [1:00:28,  6.51it/s]
21484it [1:00:28,  5.91it/s]
21486it [1:00:29,  6.40it/s]
21487it [1:00:29,  6.10it/s]
21488it [1:00:29,  6.47it/s]
21489it [1:00:29,  4.20it/s]
21490it [1:00:30,  4.74it/s]
21491it [1:00:30,  5.53it/s]
21492it [1:00:30,  5.63it/s]
21493it [1:00:30,  6.06it/s]
21494it [1:00:30,  6.53it/s]
21495it [1:00:30,  6.14it/s]
21496it [1:00:30,  6.59it/s]
21497it [1:00:31,  6.55it/s]
21498it [1:00:31,  6.76it/s]
21499it [1:00:31,  4.35it/s]
21500it [1:00:31,  4.99it/s]
21501it [1:00:31,  4.87it/s]
21502it [1:00:32,  5.42it/s]
21503it [1:00:32,  5.94it/s]
21504it [1:00:32,  6.30it/s]
21505it [1:00:32,  6.63it/s]
21506it [1:00:32,  6.61it/s]
21507it [1:00:32,  6.08it/s]
21508it [1:00:32,  6.43it/s]
21509it [1:00:33,  6.44it/s]
21510it [1:00:33,  3.87it/s]
21511it [1:00:33,  4.46it/s]
21512it [1:00:33,  5.08it/s]
21513it [1:00:34,  5.37it/s]
21514it [1:00:

22059it [1:02:08,  5.00it/s]
22060it [1:02:08,  5.84it/s]
22061it [1:02:08,  6.33it/s]
22062it [1:02:08,  5.99it/s]
22063it [1:02:09,  6.19it/s]
22064it [1:02:09,  6.83it/s]
22065it [1:02:09,  6.81it/s]
22066it [1:02:09,  6.77it/s]
22067it [1:02:09,  7.31it/s]
22068it [1:02:09,  7.11it/s]
22069it [1:02:10,  4.42it/s]
22070it [1:02:10,  5.03it/s]
22071it [1:02:10,  5.42it/s]
22072it [1:02:10,  5.82it/s]
22073it [1:02:10,  6.49it/s]
22074it [1:02:10,  7.01it/s]
22075it [1:02:10,  6.60it/s]
22076it [1:02:11,  6.31it/s]
22077it [1:02:11,  6.67it/s]
22078it [1:02:11,  7.19it/s]
22079it [1:02:11,  6.22it/s]
22080it [1:02:12,  4.00it/s]
22081it [1:02:12,  4.64it/s]
22082it [1:02:12,  5.29it/s]
22083it [1:02:12,  5.72it/s]
22084it [1:02:12,  6.13it/s]
22085it [1:02:12,  6.08it/s]
22086it [1:02:12,  6.34it/s]
22087it [1:02:13,  6.50it/s]
22088it [1:02:13,  6.78it/s]
22089it [1:02:13,  7.06it/s]
22090it [1:02:13,  7.35it/s]
22091it [1:02:13,  4.22it/s]
22092it [1:02:14,  4.84it/s]
22093it [1:02:

22636it [1:03:48,  5.04it/s]
22637it [1:03:48,  5.38it/s]
22638it [1:03:48,  6.08it/s]
22639it [1:03:48,  6.70it/s]
22640it [1:03:49,  6.87it/s]
22641it [1:03:49,  6.40it/s]
22642it [1:03:49,  6.77it/s]
22643it [1:03:49,  6.80it/s]
22644it [1:03:49,  7.23it/s]
22645it [1:03:49,  7.02it/s]
22646it [1:03:49,  7.38it/s]
22647it [1:03:50,  4.20it/s]
22648it [1:03:50,  4.61it/s]
22649it [1:03:50,  4.81it/s]
22650it [1:03:50,  5.31it/s]
22651it [1:03:51,  5.87it/s]
22652it [1:03:51,  6.05it/s]
22653it [1:03:51,  6.19it/s]
22654it [1:03:51,  6.65it/s]
22655it [1:03:51,  7.15it/s]
22656it [1:03:51,  6.37it/s]
22657it [1:03:52,  4.40it/s]
22658it [1:03:52,  4.99it/s]
22659it [1:03:52,  5.16it/s]
22660it [1:03:52,  5.78it/s]
22661it [1:03:52,  5.86it/s]
22662it [1:03:52,  6.20it/s]
22663it [1:03:53,  5.88it/s]
22664it [1:03:53,  5.91it/s]
22665it [1:03:53,  6.15it/s]
22666it [1:03:53,  6.28it/s]
22668it [1:03:53,  6.89it/s]
22669it [1:03:54,  4.28it/s]
22670it [1:03:54,  4.96it/s]
22671it [1:03:

23220it [1:05:28,  6.22it/s]
23221it [1:05:29,  5.88it/s]
23222it [1:05:29,  5.68it/s]
23223it [1:05:29,  3.87it/s]
23224it [1:05:29,  4.37it/s]
23225it [1:05:30,  4.65it/s]
23227it [1:05:30,  5.42it/s]
23228it [1:05:30,  5.67it/s]
23229it [1:05:30,  6.13it/s]
23230it [1:05:30,  6.76it/s]
23231it [1:05:30,  7.32it/s]
23232it [1:05:30,  6.94it/s]
23233it [1:05:31,  4.04it/s]
23234it [1:05:31,  4.90it/s]
23235it [1:05:31,  5.29it/s]
23236it [1:05:31,  5.44it/s]
23237it [1:05:31,  5.90it/s]
23238it [1:05:32,  6.18it/s]
23239it [1:05:32,  6.64it/s]
23240it [1:05:32,  5.85it/s]
23241it [1:05:32,  6.01it/s]
23242it [1:05:32,  6.48it/s]
23243it [1:05:33,  4.30it/s]
23244it [1:05:33,  4.83it/s]
23245it [1:05:33,  5.40it/s]
23246it [1:05:33,  5.67it/s]
23247it [1:05:33,  5.91it/s]
23248it [1:05:33,  6.37it/s]
23249it [1:05:34,  6.52it/s]
23250it [1:05:34,  6.61it/s]
23251it [1:05:34,  6.13it/s]
23252it [1:05:34,  6.87it/s]
23253it [1:05:34,  7.12it/s]
23254it [1:05:35,  4.11it/s]
23255it [1:05:

23803it [1:07:09,  6.12it/s]
23804it [1:07:09,  6.57it/s]
23805it [1:07:09,  6.69it/s]
23806it [1:07:10,  7.03it/s]
23807it [1:07:10,  6.51it/s]
23808it [1:07:10,  7.27it/s]
23809it [1:07:10,  7.17it/s]
23810it [1:07:10,  4.44it/s]
23811it [1:07:11,  4.69it/s]
23812it [1:07:11,  5.29it/s]
23813it [1:07:11,  5.65it/s]
23815it [1:07:11,  6.35it/s]
23816it [1:07:11,  5.87it/s]
23817it [1:07:11,  6.24it/s]
23818it [1:07:12,  6.51it/s]
23819it [1:07:12,  6.41it/s]
23820it [1:07:12,  6.46it/s]
23821it [1:07:12,  4.26it/s]
23822it [1:07:12,  4.57it/s]
23823it [1:07:13,  4.96it/s]
23824it [1:07:13,  5.17it/s]
23825it [1:07:13,  5.70it/s]
23826it [1:07:13,  6.25it/s]
23827it [1:07:13,  5.56it/s]
23828it [1:07:13,  5.91it/s]
23829it [1:07:14,  6.47it/s]
23830it [1:07:14,  6.35it/s]
23831it [1:07:14,  4.20it/s]
23832it [1:07:14,  4.87it/s]
23833it [1:07:14,  5.17it/s]
23834it [1:07:15,  5.54it/s]
23835it [1:07:15,  5.86it/s]
23836it [1:07:15,  6.02it/s]
23837it [1:07:15,  5.69it/s]
23838it [1:07:

24388it [1:08:49,  7.52it/s]
24389it [1:08:49,  7.92it/s]
24390it [1:08:49,  7.44it/s]
24391it [1:08:50,  7.83it/s]
24392it [1:08:50,  7.45it/s]
24393it [1:08:50,  7.59it/s]
24394it [1:08:50,  4.67it/s]
24396it [1:08:50,  5.44it/s]
24397it [1:08:51,  6.13it/s]
24398it [1:08:51,  6.85it/s]
24399it [1:08:51,  7.01it/s]
24400it [1:08:51,  6.43it/s]
24401it [1:08:51,  7.13it/s]
24402it [1:08:51,  7.62it/s]
24403it [1:08:51,  7.70it/s]
24404it [1:08:51,  8.13it/s]
24405it [1:08:52,  8.48it/s]
24406it [1:08:52,  4.67it/s]
24407it [1:08:52,  5.25it/s]
24408it [1:08:52,  5.84it/s]
24409it [1:08:52,  6.57it/s]
24411it [1:08:53,  7.43it/s]
24412it [1:08:53,  7.43it/s]
24413it [1:08:53,  7.40it/s]
24414it [1:08:53,  7.71it/s]
24415it [1:08:53,  7.83it/s]
24416it [1:08:53,  7.19it/s]
24417it [1:08:54,  4.75it/s]
24418it [1:08:54,  4.70it/s]
24419it [1:08:54,  5.24it/s]
24420it [1:08:54,  5.42it/s]
24421it [1:08:54,  6.07it/s]
24422it [1:08:54,  5.87it/s]
24423it [1:08:55,  6.47it/s]
24425it [1:08:

24974it [1:10:27,  7.48it/s]
24975it [1:10:27,  7.54it/s]
24976it [1:10:27,  7.22it/s]
24977it [1:10:28,  7.78it/s]
24978it [1:10:28,  7.73it/s]
24979it [1:10:28,  7.49it/s]
24980it [1:10:28,  6.78it/s]
24981it [1:10:28,  4.24it/s]
24982it [1:10:29,  4.71it/s]
24983it [1:10:29,  5.47it/s]
24984it [1:10:29,  5.73it/s]
24985it [1:10:29,  5.29it/s]
24986it [1:10:29,  5.73it/s]
24987it [1:10:29,  6.14it/s]
24988it [1:10:30,  6.32it/s]
24990it [1:10:30,  7.23it/s]
24991it [1:10:30,  4.27it/s]
24993it [1:10:30,  5.04it/s]
24994it [1:10:31,  5.57it/s]
24995it [1:10:31,  5.65it/s]
24996it [1:10:31,  5.98it/s]
24997it [1:10:31,  6.38it/s]
24998it [1:10:31,  6.34it/s]
24999it [1:10:31,  7.06it/s]
25000it [1:10:31,  7.52it/s]
25001it [1:10:32,  7.50it/s]
25002it [1:10:32,  4.44it/s]
25003it [1:10:32,  4.58it/s]
25004it [1:10:32,  5.21it/s]
25005it [1:10:32,  5.54it/s]
25006it [1:10:33,  5.72it/s]
25007it [1:10:33,  5.90it/s]
25008it [1:10:33,  6.35it/s]
25009it [1:10:33,  6.97it/s]
25010it [1:10:

25561it [1:12:07,  7.53it/s]
25562it [1:12:08,  4.43it/s]
25563it [1:12:08,  4.95it/s]
25564it [1:12:08,  5.19it/s]
25566it [1:12:08,  5.94it/s]
25567it [1:12:09,  6.18it/s]
25568it [1:12:09,  6.01it/s]
25569it [1:12:09,  6.10it/s]
25570it [1:12:09,  6.27it/s]
25571it [1:12:09,  5.92it/s]
25572it [1:12:09,  6.68it/s]
25573it [1:12:10,  4.23it/s]
25575it [1:12:10,  5.03it/s]
25576it [1:12:10,  5.50it/s]
25577it [1:12:10,  5.50it/s]
25578it [1:12:10,  6.14it/s]
25579it [1:12:11,  6.19it/s]
25580it [1:12:11,  6.43it/s]
25581it [1:12:11,  7.02it/s]
25583it [1:12:11,  5.59it/s]
25584it [1:12:12,  6.21it/s]
25585it [1:12:12,  6.15it/s]
25586it [1:12:12,  6.41it/s]
25587it [1:12:12,  6.32it/s]
25588it [1:12:12,  6.41it/s]
25589it [1:12:12,  6.06it/s]
25590it [1:12:12,  6.22it/s]
25591it [1:12:13,  6.35it/s]
25592it [1:12:13,  6.58it/s]
25593it [1:12:13,  6.65it/s]
25594it [1:12:13,  4.09it/s]
25595it [1:12:14,  4.74it/s]
25596it [1:12:14,  5.30it/s]
25597it [1:12:14,  5.73it/s]
25598it [1:12:

26140it [1:13:49,  5.90it/s]
26141it [1:13:49,  6.15it/s]
26142it [1:13:49,  6.21it/s]
26143it [1:13:49,  6.80it/s]
26144it [1:13:49,  7.19it/s]
26145it [1:13:49,  6.86it/s]
26146it [1:13:50,  6.80it/s]
26148it [1:13:50,  6.89it/s]
26149it [1:13:50,  6.83it/s]
26150it [1:13:50,  4.02it/s]
26151it [1:13:51,  4.15it/s]
26153it [1:13:51,  4.88it/s]
26154it [1:13:51,  5.49it/s]
26155it [1:13:51,  5.99it/s]
26156it [1:13:51,  6.80it/s]
26157it [1:13:51,  6.35it/s]
26158it [1:13:52,  6.71it/s]
26159it [1:13:52,  6.95it/s]
26160it [1:13:52,  7.55it/s]
26161it [1:13:52,  3.85it/s]
26162it [1:13:53,  4.58it/s]
26163it [1:13:53,  5.04it/s]
26164it [1:13:53,  5.22it/s]
26165it [1:13:53,  5.75it/s]
26166it [1:13:53,  5.81it/s]
26167it [1:13:53,  5.70it/s]
26169it [1:13:54,  6.29it/s]
26170it [1:13:54,  6.03it/s]
26171it [1:13:54,  6.43it/s]
26172it [1:13:54,  4.08it/s]
26173it [1:13:54,  4.93it/s]
26174it [1:13:55,  5.49it/s]
26175it [1:13:55,  5.02it/s]
26176it [1:13:55,  5.47it/s]
26177it [1:13:

26721it [1:15:31,  6.26it/s]
26722it [1:15:31,  6.28it/s]
26723it [1:15:31,  6.72it/s]
26724it [1:15:31,  6.57it/s]
26725it [1:15:31,  6.99it/s]
26726it [1:15:32,  7.32it/s]
26727it [1:15:32,  7.23it/s]
26728it [1:15:32,  7.44it/s]
26729it [1:15:32,  4.67it/s]
26731it [1:15:32,  5.60it/s]
26732it [1:15:33,  6.41it/s]
26733it [1:15:33,  7.02it/s]
26734it [1:15:33,  7.23it/s]
26735it [1:15:33,  7.77it/s]
26737it [1:15:33,  8.59it/s]
26739it [1:15:33,  8.63it/s]
26740it [1:15:33,  8.08it/s]
26741it [1:15:34,  4.45it/s]
26742it [1:15:34,  4.45it/s]
26744it [1:15:34,  5.21it/s]
26745it [1:15:34,  5.84it/s]
26746it [1:15:35,  6.27it/s]
26747it [1:15:35,  6.46it/s]
26748it [1:15:35,  6.62it/s]
26749it [1:15:35,  6.20it/s]
26750it [1:15:35,  5.99it/s]
26751it [1:15:35,  6.04it/s]
26752it [1:15:36,  3.84it/s]
26753it [1:15:36,  4.32it/s]
26754it [1:15:36,  4.94it/s]
26755it [1:15:36,  5.60it/s]
26756it [1:15:36,  5.92it/s]
26757it [1:15:37,  6.20it/s]
26758it [1:15:37,  6.12it/s]
26759it [1:15:

27302it [1:17:11,  4.75it/s]
27303it [1:17:11,  5.29it/s]
27304it [1:17:11,  5.91it/s]
27305it [1:17:11,  6.00it/s]
27306it [1:17:12,  6.09it/s]
27307it [1:17:12,  6.58it/s]
27308it [1:17:12,  6.35it/s]
27309it [1:17:12,  6.44it/s]
27310it [1:17:12,  6.59it/s]
27311it [1:17:12,  6.80it/s]
27312it [1:17:12,  6.61it/s]
27313it [1:17:13,  3.94it/s]
27314it [1:17:13,  4.23it/s]
27315it [1:17:13,  4.89it/s]
27316it [1:17:13,  5.61it/s]
27317it [1:17:14,  6.10it/s]
27318it [1:17:14,  6.06it/s]
27319it [1:17:14,  6.42it/s]
27321it [1:17:14,  6.94it/s]
27322it [1:17:14,  6.80it/s]
27323it [1:17:14,  6.83it/s]
27324it [1:17:15,  6.92it/s]
27325it [1:17:15,  4.31it/s]
27326it [1:17:15,  4.94it/s]
27327it [1:17:15,  5.39it/s]
27328it [1:17:15,  5.74it/s]
27329it [1:17:15,  6.33it/s]
27330it [1:17:16,  6.22it/s]
27331it [1:17:16,  6.13it/s]
27332it [1:17:16,  6.75it/s]
27333it [1:17:16,  6.86it/s]
27334it [1:17:16,  6.85it/s]
27335it [1:17:16,  7.11it/s]
27336it [1:17:16,  7.44it/s]
27337it [1:17:

27877it [1:18:51,  6.73it/s]
27878it [1:18:52,  4.05it/s]
27880it [1:18:52,  4.72it/s]
27881it [1:18:52,  5.18it/s]
27882it [1:18:52,  5.54it/s]
27883it [1:18:53,  6.00it/s]
27884it [1:18:53,  5.89it/s]
27885it [1:18:53,  6.32it/s]
27886it [1:18:53,  6.92it/s]
27887it [1:18:53,  6.71it/s]
27888it [1:18:54,  3.95it/s]
27889it [1:18:54,  4.42it/s]
27890it [1:18:54,  4.91it/s]
27891it [1:18:54,  5.75it/s]
27892it [1:18:54,  6.50it/s]
27893it [1:18:54,  6.70it/s]
27894it [1:18:55,  6.79it/s]
27895it [1:18:55,  6.11it/s]
27896it [1:18:55,  6.45it/s]
27897it [1:18:55,  6.38it/s]
27898it [1:18:55,  6.43it/s]
27899it [1:18:55,  6.46it/s]
27900it [1:18:55,  6.44it/s]
27901it [1:18:56,  3.95it/s]
27902it [1:18:56,  4.30it/s]
27903it [1:18:56,  4.68it/s]
27904it [1:18:56,  4.96it/s]
27905it [1:18:57,  5.14it/s]
27906it [1:18:57,  4.68it/s]
27907it [1:18:57,  4.97it/s]
27908it [1:18:57,  5.57it/s]
27909it [1:18:57,  5.97it/s]
27910it [1:18:58,  3.83it/s]
27911it [1:18:58,  4.65it/s]
27912it [1:18:

## 5 - Data preprocessing

Data preprocessing is a crucial task in data mining. This step cleans the raw data and transforms it into a format suitable for analysis and for use with *machine learning* algorithms. In natural language processing (NLP), *tokenization* and *stemming* are two essential steps. You will also implement an additional step that filters out unimportant words.

### 5.1 - Tokenization

This step splits a text into a sequence of *tokens* (here words, symbols, or punctuation marks).

For example, the sentence *"It's the student's notebook."* can be split into the following list of tokens: ["it", "'s", "the", "student", "'s", "notebook", "."].

**In addition, every tokenizer is also responsible for lowercasing the text.**

#### 5.1.1 - Question 3 (0.5 point) 

Implement the following functions:

- **tokenize_space**, which tokenizes the text on whitespace (space, tab, newline). This tokenizer is naive.
- **tokenize_nltk**, which uses the default tokenizer of the nltk library (https://www.nltk.org/api/nltk.html).



In [33]:
from nltk.tokenize import word_tokenize

def tokenize_space(text):
    """
    Tokenize the text by splitting on whitespace (space, tab, newline).
    We assume that no tokenization has been applied to the text before this tokenizer is used.

    For example: "hello\tworld of\nNLP" is split into ['hello', 'world', 'of', 'NLP']
    """
    # str.split() with no argument splits on any run of whitespace.
    # NOTE: the text is not lowercased here; the original case is kept.
    return text.split()

def tokenize_nltk(text):
    """
    This tokenizer uses the default function of the nltk package (https://www.nltk.org/api/nltk.html) to tokenize the text.
    """
    # word_tokenize also separates punctuation and contractions from words.
    # As in tokenize_space, the original case is kept.
    return word_tokenize(text)
    
    
        

In [34]:
text="hello\tworld of\nNLP"
print(tokenize_space(text))
print(tokenize_nltk(text))

['hello', 'world', 'of', 'NLP']
['hello', 'world', 'of', 'NLP']
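
The tokenizers above keep the original case, as the `'NLP'` token in the output shows, whereas the statement asks every tokenizer to lowercase the text. A minimal sketch of lowercasing variants follows, using only the standard library. The names `tokenize_space_lower` and `tokenize_regex_lower` are hypothetical, and the regular expression is only a rough stand-in for the nltk tokenizer, not a reproduction of its rules.

```python
import re


def tokenize_space_lower(text):
    # Lowercase first, then split on any run of whitespace.
    return text.lower().split()


# Hypothetical pattern: a clitic such as "'s", a run of word characters,
# or a single punctuation mark. nltk's word_tokenize applies richer rules.
_TOKEN_PATTERN = re.compile(r"'\w+|\w+|[^\w\s]")


def tokenize_regex_lower(text):
    # Lowercase first, then extract the tokens matched by the pattern.
    return _TOKEN_PATTERN.findall(text.lower())


print(tokenize_space_lower("hello\tworld of\nNLP"))
print(tokenize_regex_lower("It's the student's notebook."))
```

Lowercasing before tokenizing means that "NLP" and "nlp" end up as the same token, so they will later share a single dimension in the vector representation.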


### 5.2 - Filtering unimportant tokens

#### 5.2.1 - Question 4 (1 point)

Some tokens are unimportant for the comparison because they appear in most discussions. Removing them reduces the dimension of the vector and speeds up the computations.

Explain which tokens are unimportant for comparing discussions. Implement the function filter_tokens, which removes these words from the list of tokens.


In [38]:
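from nltk.corpus import stopwords
nltk.download('stopwords')
# Build the English stop-word set once: stopwords.words() with no argument
# returns the lists of every language, and rebuilding it for each token is very slow.
ENGLISH_STOPWORDS = set(stopwords.words('english'))

def filter_tokens(tokens):
    # Keep only the tokens that are not English stop words.
    return [word for word in tokens if word not in ENGLISH_STOPWORDS]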


[nltk_data] Downloading package stopwords to C:\Users\Sameh
[nltk_data]     Aissaoui\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [40]:
print(stopwords.words())

['إذ', 'إذا', 'إذما', 'إذن', 'أف', 'أقل', 'أكثر', 'ألا', 'إلا', 'التي', 'الذي', 'الذين', 'اللاتي', 'اللائي', 'اللتان', 'اللتيا', 'اللتين', 'اللذان', 'اللذين', 'اللواتي', 'إلى', 'إليك', 'إليكم', 'إليكما', 'إليكن', 'أم', 'أما', 'أما', 'إما', 'أن', 'إن', 'إنا', 'أنا', 'أنت', 'أنتم', 'أنتما', 'أنتن', 'إنما', 'إنه', 'أنى', 'أنى', 'آه', 'آها', 'أو', 'أولاء', 'أولئك', 'أوه', 'آي', 'أي', 'أيها', 'إي', 'أين', 'أين', 'أينما', 'إيه', 'بخ', 'بس', 'بعد', 'بعض', 'بك', 'بكم', 'بكم', 'بكما', 'بكن', 'بل', 'بلى', 'بما', 'بماذا', 'بمن', 'بنا', 'به', 'بها', 'بهم', 'بهما', 'بهن', 'بي', 'بين', 'بيد', 'تلك', 'تلكم', 'تلكما', 'ته', 'تي', 'تين', 'تينك', 'ثم', 'ثمة', 'حاشا', 'حبذا', 'حتى', 'حيث', 'حيثما', 'حين', 'خلا', 'دون', 'ذا', 'ذات', 'ذاك', 'ذان', 'ذانك', 'ذلك', 'ذلكم', 'ذلكما', 'ذلكن', 'ذه', 'ذو', 'ذوا', 'ذواتا', 'ذواتي', 'ذي', 'ذين', 'ذينك', 'ريث', 'سوف', 'سوى', 'شتان', 'عدا', 'عسى', 'عل', 'على', 'عليك', 'عليه', 'عما', 'عن', 'عند', 'غير', 'فإذا', 'فإن', 'فلا', 'فمن', 'في', 'فيم', 'فيما', 'فيه', 'فيها', '




In [41]:
tx="Nick likes to play football, however he is not too fond of tennis."
tok_tx=tokenize_nltk(tx)
print(tok_tx)
print(filter_tokens(tok_tx))


['Nick', 'likes', 'to', 'play', 'football', ',', 'however', 'he', 'is', 'not', 'too', 'fond', 'of', 'tennis', '.']
['Nick', 'likes', 'play', 'football', ',', 'however', 'fond', 'tennis', '.']
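
The statement motivates filtering by how often a token appears across discussions. Besides a fixed stop-word list, one possible approach is to drop the tokens whose document frequency exceeds a threshold. The sketch below illustrates this idea on a toy corpus; the function name `frequent_tokens` and the `max_df_ratio` threshold are assumptions and are not part of the assignment.

```python
from collections import Counter


def frequent_tokens(documents, max_df_ratio=0.5):
    """Return the tokens that occur in more than max_df_ratio of the documents."""
    document_frequency = Counter()
    for tokens in documents:
        # set() ensures a token is counted once per document.
        document_frequency.update(set(tokens))
    limit = max_df_ratio * len(documents)
    return {token for token, df in document_frequency.items() if df > limit}


documents = [
    ["how", "to", "play", "football"],
    ["how", "to", "learn", "tennis"],
    ["rules", "of", "tennis"],
    ["how", "to", "cook", "rice"],
]
too_common = frequent_tokens(documents, max_df_ratio=0.5)
filtered = [[t for t in tokens if t not in too_common] for tokens in documents]
print(sorted(too_common))
print(filtered)
```

Here "how" and "to" appear in three of the four documents and are removed, while "tennis" (two documents) is kept because it does not exceed the threshold.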


### 5.3 - Stemming

Stemming reduces the inflected forms of a word to their stem, or root. For example, in English, stemming "fishing", "fished", and "fish" yields the stem "fish".



In [42]:
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")



word1 = ["Visitors", "from", "all", "over", "the", "world", "fishes", "during", "the", "summer","."]

print([ stemmer.stem(w) for w in word1])

word2 = ['I', 'was', 'fishing',]
print([ stemmer.stem(w) for w in word2])

['visitor', 'from', 'all', 'over', 'the', 'world', 'fish', 'dure', 'the', 'summer', '.']
['i', 'was', 'fish']


#### 5.3.1 - Question 5 (1 point) 

*Explain how and why stemming is useful to our comparison system.*


Stemming is useful for our comparison because two discussions can contain words that are written differently but share the same root, such as "fish" and "fishing". Compared as raw tokens, these words would be treated as unrelated. Applying stemming first maps them to the same root, so the comparison is made directly on the roots.
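
To illustrate this explanation, the sketch below compares the token overlap of two short questions before and after stemming. To stay self-contained it uses a deliberately naive suffix stripper, `naive_stem`, which is a hypothetical stand-in and not the Snowball algorithm used in the notebook.

```python
def naive_stem(word):
    # Strip one common English suffix; far cruder than a real stemmer.
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


question_a = ["he", "fished", "yesterday"]
question_b = ["fishing", "tips"]

# Overlap on raw tokens versus overlap on stems.
raw_overlap = set(question_a) & set(question_b)
stem_overlap = {naive_stem(w) for w in question_a} & {naive_stem(w) for w in question_b}

print(raw_overlap)
print(stem_overlap)
```

Without stemming the two questions share no token; after stemming they share the stem "fish", which is what lets the comparison detect that they are related.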

## 6 - Data representation

### 6.1 - Bag of Words

Many algorithms require inputs that all have the same size, which is not necessarily the case for data such as text, where the number of words varies from one document to another.

For example, consider sentence 1, "Board games are much better than video games", and sentence 2, "Monopoly is an awesome game!". The table below shows one way to represent these two sentences with a fixed-size representation:

|<i></i>     | an | are | ! | Monopoly | awesome | better | games | than | video | much | board | is | game |
|------------|----|-----|---|----------|---------|--------|-------|------|-------|------|-------|----|------|
| Sentence 1 | 0  | 1   | 0 | 0        | 0       | 1      | 2     | 1    | 1     | 1    | 1     | 0  | 0    |
| Sentence 2 | 1  | 0   | 1 | 1        | 1       | 0      | 0     | 0    | 0     | 0    | 0     | 1  | 1    |

Each column represents a word of the vocabulary (of size 13), while each row holds the number of occurrences of each word in a sentence. For instance, the value 2 at position (1,7) comes from the fact that the word *"games"* appears twice in sentence 1.

Since each row has length 13, the rows can be used as vectors representing sentences 1 and 2. This method is called *Bag-of-Words*: documents are represented by vectors whose dimension equals the size of the vocabulary, built by counting the number of occurrences of each word. Each token is thus associated with one dimension.
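
The table above can be reproduced with a few lines of standard-library Python. This is a sketch for illustration only: the tokens are written by hand to match the column labels of the table, and the helper name `count_vector` is hypothetical.

```python
from collections import Counter

# Vocabulary in the column order of the table.
vocabulary = ["an", "are", "!", "Monopoly", "awesome", "better", "games",
              "than", "video", "much", "board", "is", "game"]

sentence_1 = ["board", "games", "are", "much", "better", "than", "video", "games"]
sentence_2 = ["Monopoly", "is", "an", "awesome", "game", "!"]


def count_vector(tokens, vocabulary):
    # Count every token once, then read the counts in vocabulary order.
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


print(count_vector(sentence_1, vocabulary))
print(count_vector(sentence_2, vocabulary))
```

Both vectors have length 13 regardless of the sentence length, and the seventh entry of the first vector is 2 because "games" occurs twice in sentence 1.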

#### 6.1.2 - Question 6 (2.5 points)


*Implement the Bag-of-Words*

For this question, you may not use an external Python library such as scikit-learn. The one exception is that, if you run into memory problems, you may use the sparse.csr_matrix class from scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html).

In [1]:
import numpy as np
from scipy.sparse import csc_matrix

In [54]:
def transform_count_bow(X):
    """
    This method preprocesses the data using the pipeline object, relates each token to a specific integer and
    transforms the text into a vector. Vectors are weighted using the token frequencies in the sentence.

    X: document tokens. e.g: [['I','will', 'be', 'back', '.'], ['Helllo', 'world', '!'], ['If', 'you', 'insist', 'on', 'using', 'a', 'damp', 'cloth']]

    :return: vector representation of each document
    """
    # Build the vocabulary: map each distinct token to a column index,
    # in order of first appearance.
    vocab = {}
    for sentence in X:
        for word in sentence:
            if word not in vocab:
                vocab[word] = len(vocab)

    # Count the occurrences of each token in each document (one row per document).
    data = np.zeros((len(X), len(vocab)), dtype=int)
    for row, sentence in enumerate(X):
        for word in sentence:
            data[row, vocab[word]] += 1

    return data

In [53]:
a=[['I','will', 'be', 'back', '!'], ['Helllo', 'world', '!'], ['If', 'you', 'insist', 'on', 'using', 'a', 'damp', 'cloth']]

# Columns follow the order of first appearance of each token:
# 'I', 'will', 'be', 'back', '!', 'Helllo', 'world', 'If', 'you', 'insist', 'on', 'using', 'a', 'damp', 'cloth'
bow = transform_count_bow(a)
print(bow.shape)
print(bow)



(3, 15)
[[1 1 1 1 1 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 1 1 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]]


## 6.2 - TF-IDF

Using raw word frequencies, as bag-of-words does, can be problematic. A few tokens will have a very high frequency in a document, so their weights will be much larger than the others, which tends to bias the whole set of weights. Moreover, words that appear in most documents do not help to tell documents apart. For example, the word "*of*" appears in many documents of the dataset, yet sharing this word does not let us conclude that two documents are similar. By contrast, the word "*awesome*" is rarer, but documents that contain it are more likely to be positive. TF-IDF is a method that addresses these problems.

TF-IDF weights the vector using a term frequency (TF) and an inverse document frequency (IDF).

TF is the local information about how important a word is within a given document, while IDF measures how well a word discriminates between documents across the dataset.

The IDF of a word is computed as follows:

\begin{equation}
	\text{idf}_i = \log\left( \frac{N}{\text{df}_i} \right),
\end{equation}

where $N$ is the number of documents in the dataset and $\text{df}_i$ is the number of documents that contain word $i$.

The new weight $w_{ij}$ of a word $i$ in a document $j$ can then be computed as follows:

\begin{equation}
	w_{ij} = \text{tf}_{ij} \times \text{idf}_i,
\end{equation}

where $\text{tf}_{ij}$ is the frequency of word $i$ in document $j$.
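
To make the two formulas concrete, here is a small worked sketch on a toy corpus. The corpus, the variable names, and the choice of a base-10 logarithm are assumptions made for illustration only; the formula above does not fix the base of the logarithm.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): three already-tokenized documents
docs = [["good", "game", "good"], ["bad", "game"], ["good", "movie"]]
N = len(docs)

# df_i: number of documents that contain token i at least once
df = Counter(token for doc in docs for token in set(doc))

# idf_i = log(N / df_i); base 10 is an assumption
idf = {token: math.log10(N / count) for token, count in df.items()}

# w_ij = tf_ij * idf_i, computed for every token of every document j
weights = [
    {token: tf * idf[token] for token, tf in Counter(doc).items()}
    for doc in docs
]

print(df["good"], df["movie"])   # document frequencies of two tokens
print(weights[0])                # TF-IDF weights of the first document
```

In this toy corpus, "good" occurs in two of the three documents, so its IDF is $\log_{10}(3/2)$, while "movie" occurs in only one and gets the larger IDF $\log_{10}(3)$. A token present in every document would get an IDF of 0 and therefore a weight of 0.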



### 6.2.1 - Question 7 (3.5 points)

Implement the bag-of-words with TF-IDF weighting

For this question, you may not use an external Python library such as scikit-learn. The one exception: if you run into memory problems, you may use the sparse.csr_matrix class from scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html).

In [86]:
import math


def transform_tf_idf_bow(X):
    """
    This method preprocesses the data using the pipeline object, calculates the IDF and TF and
    transforms the text into vectors. Vectors are weighted using the TF-IDF method.

    X: document tokens. e.g: [['I','will', 'be', 'back', '.'], ['Helllo', 'world', '!'], ['If', 'you', 'insist', 'on', 'using', 'a', 'damp', 'cloth']]

    :return: vector representation of each document
    """
    # Raw counts: data[j, i] is tf_ij, the frequency of token i in document j.
    data = transform_count_bow(X)
    N = data.shape[0]  # number of documents
    W = np.zeros(shape=(N, data.shape[1]))
    for i in range(data.shape[1]):
        # df_i: number of documents that contain token i
        df = np.count_nonzero(data[:, i])
        # idf_i = log(N / df_i), computed here in base 10
        IDF = math.log(N / df, 10)
        W[:, i] = data[:, i] * IDF
    return W

In [89]:
print(transform_tf_idf_bow(a))

[[0.47712125 0.47712125 0.47712125 0.47712125 0.17609126 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.17609126 0.47712125
  0.47712125 0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.47712125 0.47712125 0.47712125 0.47712125 0.47712125
  0.47712125 0.47712125 0.47712125]]


# 7 - Recommendation system

## 7.1 - Question 8 (1.5 points)


The *pipeline* is the sequence of preprocessing steps that transforms raw data into a format suitable for analysis. For the recommendation system, implement a pipeline made up of the following steps:

1. Concatenate the text of the question, answers, and comments for each thread $t$ in the dictionary thread_dict.
2. Tokenize the text.
3. Filter out insignificant tokens.
4. Stem the tokens.
5. Generate the vector representation with transform_tf_idf_bow or transform_count_bow.
6. Return the thread identifiers and the vector representations of the threads.



In [None]:
def nlp_pipeline(thread_dict, tokenization_type, vectorizer_type, enable_filter_tokens, enable_stemming):
    """
    Preprocess and vectorize the threads.
    
    thread_dict: dictionary whose keys and values are thread ids and thread objects, respectively.
    tokenization_type: two possible values "space_tokenization" and "nltk_tokenization".
                            - space_tokenization: tokenize_space function is used to tokenize.
                            - nltk_tokenization: tokenize_nltk function is used to tokenize.
                            
    vectorizer_type: two possible values "count" and "tf_idf".
                            - count: use transform_count_bow to vectorize the text
                            - tf_idf: use transform_tf_idf_bow to vectorize the text
                            
    enable_filter_tokens: enable the insignificant token removal;
    
    enable_stemming: enable stemming
    
    return: a list L with thread ids and matrix B that contains the vector of each thread. B[idx] is the fixed-length representation of L[idx].
    """
    
    
    raise NotImplementedError("")
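
The six steps listed above could be wired together roughly as follows. This is only a sketch under stated assumptions, not the expected solution: the name `nlp_pipeline_sketch`, the injected callables (`tokenize`, `token_filter`, `stem`, `vectorize`), and the dictionary layout of a thread (`"question"`, `"answers"`, `"comments"` keys) are all hypothetical, since the actual thread objects and helper functions are defined elsewhere in the notebook.

```python
def nlp_pipeline_sketch(thread_dict, tokenize, vectorize, token_filter=None, stem=None):
    """Hypothetical pipeline: every processing step is passed in as a callable."""
    thread_ids, documents = [], []
    for thread_id, thread in thread_dict.items():
        # Assumption: a thread is a dict holding its text fields.
        parts = [thread.get("question", "")]
        parts += thread.get("answers", [])
        parts += thread.get("comments", [])
        tokens = tokenize(" ".join(parts))          # steps 1 and 2
        if token_filter is not None:
            tokens = token_filter(tokens)           # step 3
        if stem is not None:
            tokens = [stem(tok) for tok in tokens]  # step 4
        thread_ids.append(thread_id)
        documents.append(tokens)
    return thread_ids, vectorize(documents)         # steps 5 and 6


# Toy usage with stand-in callables (not the notebook's own functions)
toy_threads = {
    1: {"question": "How to sort a list", "answers": ["Use sorted"], "comments": []},
    2: {"question": "Sort a dict by value", "answers": [], "comments": ["Duplicate?"]},
}
ids, docs = nlp_pipeline_sketch(
    toy_threads,
    tokenize=str.split,
    vectorize=lambda documents: documents,  # identity: returns the token lists unchanged
    token_filter=lambda tokens: [t for t in tokens if len(t) > 1],
    stem=str.lower,
)
print(ids)
print(docs)
```

In the notebook itself, `vectorize` would presumably be transform_count_bow or transform_tf_idf_bow, chosen from `vectorizer_type`, and the optional steps would be switched on by `enable_filter_tokens` and `enable_stemming`.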
    

## 7.2 - Question 9 (1.5 points)

Implement the rank function, which returns the list of thread identifiers sorted by their similarity to the new question (query). Use the [cosine similarity function](https://en.wikipedia.org/wiki/Cosine_similarity) to compare two threads. Treat the new question as a thread with no answers or comments.

**Remove the new question from the list of recommendations**




In [None]:
def rank(query_id, all_thread_ids, X):
    """
    Return a list of thread ids sorted by thread and query similarity. Cosine similarity is used to compare threads. 
    
    query_id: thread id 
    all_thread_ids: list of thread ids
    X: thread data representations
    
    return: ranked list of thread ids. 
    """
    
    # Compute the similarity of thread representations(vectors) using cosine similarity function
    # Sort the thread ids by the similarity
    
    raise NotImplementedError("")
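
As an illustration of the ranking idea, here is a minimal sketch, not the expected solution. It assumes that the query is itself one of the entries of `all_thread_ids` (so its vector is a row of `X`), and the names `cosine_similarity` and `rank_sketch` are hypothetical. It uses plain Python for readability; a vectorized NumPy version would be faster on a large matrix.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sketch(query_id, all_thread_ids, X):
    """Return thread ids sorted by decreasing similarity to the query, query excluded."""
    # Assumption: the query has its own row in X.
    query_vec = X[all_thread_ids.index(query_id)]
    scored = [
        (cosine_similarity(query_vec, X[row]), thread_id)
        for row, thread_id in enumerate(all_thread_ids)
        if thread_id != query_id  # the query must not recommend itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [thread_id for _, thread_id in scored]


toy_ids = [10, 20, 30]
toy_X = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
print(rank_sketch(10, toy_ids, toy_X))
```

Thread 20 points in almost the same direction as the query, while thread 30 is orthogonal to it, so thread 20 is ranked first.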
    

## 7.3 - Evaluation

You will test different configurations of the recommendation system. These configurations will be compared using the [mean average precision (MAP) metric](https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision). The earlier the relevant threads are recommended (i.e., the closer to the top of the list), the higher the MAP score.

Additional resources for understanding MAP: [recall and precision over ranks](https://youtu.be/H7oAofuZjjE) and [MAP](https://youtu.be/pM6DJ0ZZee0).
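
A small worked example may help show why ranking relevant threads earlier raises the score. The function name `average_precision` and the toy data are assumptions made for illustration; the sketch uses the textbook definition, which divides by the number of relevant items.

```python
def average_precision(relevant, ranked):
    """Average of the precision values measured at each rank where a relevant item appears."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)  # precision at rank k
    return sum(precisions) / len(relevant) if relevant else 0.0


relevant = {"a", "b"}
ap_early = average_precision(relevant, ["a", "b", "c", "d"])  # hits at ranks 1 and 2
ap_late = average_precision(relevant, ["c", "a", "d", "b"])   # hits at ranks 2 and 4
mean_ap = (ap_early + ap_late) / 2                            # MAP over the two queries
print(ap_early, ap_late, mean_ap)
```

With both relevant items at the top, the precisions are 1/1 and 2/2, giving an average precision of 1.0. When they appear at ranks 2 and 4, the precisions are 1/2 and 2/4, giving 0.5. The MAP over these two queries is therefore 0.75.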


The *eval* function evaluates a specific configuration of the recommendation system.

In [24]:
from functools import partial
from multiprocessing import Pool
from statistics import mean


def calculate_map(x):
    res = 0.0
    n = 0.0

    for relevant_threads, ranked_list in x:
        precisions = []

        for k, thread_id in enumerate(ranked_list):
            if thread_id in relevant_threads:
                # Precision at rank k+1, counting the hit just found
                prec_at_k = (len(precisions) + 1)/(k+1)
                precisions.append(prec_at_k)

            # Stop once every relevant thread has been found
            if len(precisions) == len(relevant_threads):
                break
        # A query with no relevant thread retrieved contributes 0 (mean([]) would raise)
        res += mean(precisions) if precisions else 0.0
        n += 1

    return res/n


def eval(tokenization_type, vectorizer, enable_filter_tokens, enable_stemming):
    all_thread_ids, X = nlp_pipeline(thread_index, tokenization_type, vectorizer, enable_filter_tokens, enable_stemming)
    all_thread_ids = [int(t_id) for t_id in all_thread_ids]
    queries, relevant_threads = zip(*relevant_threads_by_query.items())

    # Pool.map pickles the function it calls; a function nested inside eval cannot be
    # pickled, whereas a partial of the module-level rank function can.
    generate_rank_list = partial(rank, all_thread_ids=all_thread_ids, X=X)

    with Pool(processes=2) as pool:
        ranked_list = pool.map(generate_rank_list, queries)

    return calculate_map(zip(relevant_threads, ranked_list))
        

## 7.4 - Question 10 (5 points)

Evaluate the precision (MAP) of the recommendation system with each of the following configurations:
1. count(BoW) + space_tokenization (without a tokenizer)
2. count(BoW) + nltk_tokenization
3. count(BoW) + nltk_tokenization + filtering of insignificant tokens
4. count(BoW) + nltk_tokenization + filtering of insignificant tokens + stemming
5. tf_idf + nltk_tokenization
6. tf_idf + nltk_tokenization + filtering of insignificant tokens
7. tf_idf + nltk_tokenization + filtering of insignificant tokens + stemming

Finally, comment on your results by answering the following questions:
- Was our recommendation system affected negatively or positively by the data preprocessing steps?
- Does TF-IDF perform better than BoW? If so, why, in your opinion?
