**INITIALIZATION:**
- I put these three lines at the top of every notebook. The first two enable autoreload, so edited modules are reloaded automatically and I avoid problems when rerunning the same project. The third line renders Matplotlib visualizations inline in the notebook.

In [1]:
#@ INITIALIZATION:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**DOWNLOADING LIBRARIES AND DEPENDENCIES:**
- I install and import all the libraries and dependencies required for the project in a single cell.

In [3]:
#@ DOWNLOADING THE LIBRARIES AND DEPENDENCIES:
# !pip install -U d2l
from d2l import torch as d2l

import os, re
import torch     
from torch import nn                                
from IPython import display

**GETTING THE DATASET:**
- I used Google Colab for this notebook, so the process of downloading and reading the data might differ on other platforms. I will use the **Stanford Natural Language Inference (SNLI) Corpus** for this notebook. The SNLI Corpus is a collection of over 500,000 labeled English sentence pairs.

In [5]:
#@ GETTING THE DATASET: 
d2l.DATA_HUB["SNLI"] = ('https://nlp.stanford.edu/projects/snli/snli_1.0.zip',
                        '9fcde07509c7e87ec61c640c1b2753d9041758e4')               # Registering the URL and SHA-1 Hash. 
data_dir = d2l.download_extract("SNLI")                                           # Downloading and Extracting the Dataset. 
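
The second element of the `DATA_HUB` tuple is a SHA-1 checksum. As far as I understand, d2l compares it with the cached archive to decide whether the file needs to be downloaded again. The sketch below shows how such a checksum can be computed with the standard library. The helper name `sha1_of_file` and the temporary file are my own assumptions for illustration and are not part of d2l.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks.

    Hypothetical helper for illustration; not a d2l function.
    """
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives do not fill memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()


# A small temporary file stands in for the downloaded zip archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name

digest = sha1_of_file(tmp_path)
os.remove(tmp_path)
print(digest)  # 40 hexadecimal characters
```

Running the same helper on the downloaded archive and comparing the result with the hash registered above would be one way to confirm that the download is intact.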

**READING THE DATASET:**
- I will define a function that extracts only part of the dataset and returns lists of premises, hypotheses, and their labels.

In [8]:
#@ READING THE DATASET: 
def read_snli(data_dir, is_train):                                # Reading Dataset into Premises, Hypothesis and Labels. 
  def extract_text(s):                                            # Cleaning the Parsed Sentence. 
    s = re.sub("\\(", "", s)                                      # Removing Opening Parentheses. 
    s = re.sub("\\)", "", s)                                      # Removing Closing Parentheses. 
    s = re.sub("\\s{2,}", " ", s)                                 # Collapsing Repeated Whitespace into One Space. 
    return s.strip()
  
  label_set = {"entailment": 0, "contradiction": 1, 
               "neutral": 2}                                      # Mapping Labels to Indices. 
  file_name = os.path.join(data_dir, "snli_1.0_train.txt" if \
                           is_train else "snli_1.0_test.txt")
  with open(file_name, "r") as f: 
    rows = [row.split("\t") for row in f.readlines()[1:]]         # Skipping the Header Row. 
  premises = [extract_text(row[1]) for row in rows if row[0] in 
              label_set]                                          # Extracting Premises with Valid Labels. 
  hypothesis = [extract_text(row[2]) for row in rows if row[0] \
                in label_set]                                     # Extracting Hypotheses with Valid Labels. 
  labels = [label_set[row[0]] for row in rows if row[0] in 
            label_set]                                            # Converting Labels to Indices. 
  return premises, hypothesis, labels
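
To make the cleaning and filtering steps concrete, here is a minimal sketch that applies the same regular expressions to a few made-up rows. The rows are hypothetical and only mimic the column order that `read_snli` relies on (label, premise parse, hypothesis parse). I also assume that a row whose label is not one of the three classes, such as `-`, should be skipped, which is what the `row[0] in label_set` filter does.

```python
import re

LABELS = {"entailment": 0, "contradiction": 1, "neutral": 2}


def extract_text(s):
    s = re.sub(r"\(", "", s)        # drop opening parentheses of the parse
    s = re.sub(r"\)", "", s)        # drop closing parentheses of the parse
    s = re.sub(r"\s{2,}", " ", s)   # collapse runs of whitespace
    return s.strip()


# Made-up rows: label, premise parse, hypothesis parse.
rows = [
    ["neutral", "( ( A dog ) ( runs . ) )", "( ( An animal ) ( plays . ) )"],
    ["-", "( ( A dog ) ( runs . ) )", "( ( A cat ) ( sleeps . ) )"],
    ["entailment", "( ( A dog ) ( runs . ) )", "( ( A dog ) ( moves . ) )"],
]

# Keep only rows with a known label and clean both sentences.
kept = [
    (extract_text(premise), extract_text(hyp), LABELS[label])
    for label, premise, hyp in rows
    if label in LABELS
]

for item in kept:
    print(item)
```

Filtering in a single pass like this keeps premises, hypotheses, and labels aligned by construction, whereas the function above reaches the same result with three separate comprehensions that share one condition.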

In [9]:
#@ IMPLEMENTATION: 
train_data = read_snli(data_dir, is_train=True)                   # Implementation of Function. 
for x0, x1, y in zip(train_data[0][:3], train_data[1][:3], 
                     train_data[2][:3]):
  print("premise:", x0)                                           # Inspecting Premises. 
  print("hypothesis:", x1)                                        # Inspecting Hypothesis. 
  print("label:", y)                                              # Inspecting Labels. 

premise: A person on a horse jumps over a broken down airplane .
hypothesis: A person is training his horse for a competition .
label: 2
premise: A person on a horse jumps over a broken down airplane .
hypothesis: A person is at a diner , ordering an omelette .
label: 1
premise: A person on a horse jumps over a broken down airplane .
hypothesis: A person is outdoors , on a horse .
label: 0


In [10]:
#@ READING THE DATASET: 
test_data = read_snli(data_dir, is_train=False)                   # Implementation of Function. 
for data in [train_data, test_data]:
  print([data[2].count(i) for i in range(3)])                     # Counting Examples per Label. 

[183416, 183187, 182764]
[3368, 3237, 3219]
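
The counts printed above suggest that the three classes are nearly balanced. As a quick check, this sketch converts those counts into class shares. The numbers are copied from the output above, the label order follows `label_set`, and the helper name `class_shares` is my own choice for illustration.

```python
label_names = ["entailment", "contradiction", "neutral"]  # indices 0, 1, 2
train_counts = [183416, 183187, 182764]  # copied from the printed output
test_counts = [3368, 3237, 3219]         # copied from the printed output


def class_shares(counts):
    """Return each class count as a fraction of the total."""
    total = sum(counts)
    return [c / total for c in counts]


for split, counts in [("train", train_counts), ("test", test_counts)]:
    shares = class_shares(counts)
    print(split, sum(counts),
          {name: round(s, 4) for name, s in zip(label_names, shares)})
```

With shares this close to one third each, plain accuracy is a reasonable first metric for this dataset.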
