<h1 style="text-align:center;"> Hugging Face Datasets</h1>

<div style="width:100%;text-align: center;"> <img align=middle src="https://huggingface.co/docs/datasets/_images/datasets_logo.png" alt="hf dataset logo" style="height:300px;margin:3rem auto;"> </div>


#### Recently, `datasets` version 1.12.1 became a default package in Kaggle notebooks. This is a big relief because there were previously issues with `pyarrow` and `fsspec` (depending on which version of `datasets` you used) that made it annoying to use.

#### I thought I would do a quick tour of `datasets` by showing what is possible and how to use it for this competition (full example at the very end).

#### A lot of this is adapted from the [official documentation](https://huggingface.co/docs/datasets/), with some parts tailored to the chaii-qa competition. I can't possibly cover every feature, so please explore the documentation to see the full extent of what is possible or [check out their course](https://huggingface.co/course/chapter1). I'm sure the developers will continue to add great new features in the future. 

#### Also, please explore the publicly available datasets (more than 1,500 as of Sep 30, 2021) at [huggingface.co/datasets](https://huggingface.co/datasets) (or [hf.co/datasets](https://hf.co/datasets) if you don't want to type as much 😉). Feel free to add one!

#### Lastly, the team at Hugging Face also has a [paper in EMNLP 2021](https://arxiv.org/abs/2109.02846) that gives a formal overview of the package.

# 🔥 The BIGGEST advantage is
####  the fact that the datasets are memory-mapped using Apache Arrow and cached locally. This means that only the necessary data is loaded into memory, making it possible to work with a dataset that is larger than the system memory (e.g. c4 is hundreds of GB, mc4 is several TB). `datasets` can work on local files, data in memory (e.g. a pandas dataframe or a dict), or easily pull from over 1,500 datasets on the Hugging Face Hub. For large datasets, a streaming mode is also available (details later), so that you don't have to download a ton of data. Many functions mirror their analogs in sklearn or pandas, and you can even stick the dataset straight into a PyTorch DataLoader!

In [None]:
# package details
!pip show datasets

# Loading local files

### I'm putting this first because chaii gives the training and test data in csv format. Here is how you could read it.

In [None]:
from datasets import load_dataset

dataset = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv')
dataset # DatasetDict

## The code above loads a DatasetDict, which is basically a dict with split names (train, validation, test) as keys and datasets as values. Since only one file was given, everything ends up in a single "train" split.

## If you pass `data_files` a dict that maps split names to files, you can load multiple files at once, one per split, as seen below.

#### All of the files need to have the same column names and column types.

In [None]:
dataset = load_dataset("csv", data_files={"train": "../input/mlqa-hindi-processed/mlqa_hindi.csv",  "validation": "../input/mlqa-hindi-processed/xquad.csv"})
dataset

# If you just want to load a file into a Dataset without it turning into a DatasetDict, use the `split` argument


In [None]:
dataset = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split="train")
dataset # This will be a Dataset and not DatasetDict

# Split dataset when loading

### Splitting by percentage

In [None]:
dataset_20pct = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split="train[:20%]")
dataset_80pct = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split="train[:80%]")
dataset_20pct, dataset_80pct

# Splitting by slice

In [None]:
dataset_first100 = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split="train[:100]")
dataset_last50 = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split="train[-50:]")
dataset_first100, dataset_last50

# 10-fold split

In [None]:
# 10-fold cross-validation (percent boundaries are rounded to whole rows; see the `datasets` docs on slicing splits):
# The validation datasets are each going to be 10%:
# [0%:10%], [10%:20%], ..., [90%:100%].
# And the training datasets are each going to be the complementary 90%:
# [10%:100%] (for a corresponding validation set of [0%:10%]),
# [0%:10%] + [20%:100%] (for a validation set of [10%:20%]), ...,
# [0%:90%] (for a validation set of [90%:100%]).

# For fold0, use val_ds_folds[0] and train_ds_folds[0]
val_ds_folds = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split=[f'train[{k}%:{k+10}%]' for k in range(0, 100, 10)])
train_ds_folds = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split=[f'train[:{k}%]+train[{k+10}%:]' for k in range(0, 100, 10)])
val_ds_folds, train_ds_folds
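To check which rows each fold gets, here is a plain-Python sketch (no `datasets` call) that rebuilds the split strings above and computes the matching row indices. It assumes each percent boundary is rounded to the nearest row; the library's own rounding rules may differ slightly, and `fold_split_strings` / `fold_indices` are hypothetical helper names.

```python
def fold_split_strings(n_folds=10):
    """Rebuild the validation/training split strings used in the cell above."""
    if 100 % n_folds != 0:
        raise ValueError("n_folds must divide 100 for whole-percent folds")
    step = 100 // n_folds
    val = [f"train[{k}%:{k + step}%]" for k in range(0, 100, step)]
    train = [f"train[:{k}%]+train[{k + step}%:]" for k in range(0, 100, step)]
    return val, train


def fold_indices(n_rows, n_folds=10):
    """Return (train_rows, val_rows) for each fold.

    Assumption: a percent boundary p maps to row round(p * n_rows / 100).
    """
    if 100 % n_folds != 0:
        raise ValueError("n_folds must divide 100 for whole-percent folds")
    step = 100 // n_folds

    def boundary(pct):
        return round(pct * n_rows / 100)

    folds = []
    for k in range(0, 100, step):
        lo, hi = boundary(k), boundary(k + step)
        val_rows = list(range(lo, hi))
        train_rows = list(range(0, lo)) + list(range(hi, n_rows))
        folds.append((train_rows, val_rows))
    return folds


val_splits, train_splits = fold_split_strings()
print(val_splits[0], "|", train_splits[0])  # train[0%:10%] | train[:0%]+train[10%:]

folds = fold_indices(n_rows=95)
for train_rows, val_rows in folds:
    # each training set is exactly the complement of its validation set
    assert sorted(train_rows + val_rows) == list(range(95))
```

Because consecutive folds share their boundaries, the ten validation sets together cover every row exactly once.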

# Funkier splits

### This takes the first 20% and the last 20%

In [None]:
dataset = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split="train[:20%]+train[-20%:]")
dataset

# Split after loading

In [None]:
dataset = load_dataset('csv', data_files='../input/chaii-hindi-and-tamil-question-answering/train.csv', split="train")

splits = dataset.train_test_split(test_size=0.2, seed=2021) # sklearn syntax
splits

# Selecting data

### For when you don't want all of it.

In [None]:
selected  = dataset.select(range(100))
print(selected.shape)
selected = dataset.select(range(1,1000, 100))
print(selected.shape)

# Dataset behaves both like a dict and a list
### Can select by index first or by column name

In [None]:
dataset[42]["question"], dataset["question"][42]

In [None]:
dataset[42]

In [None]:
dataset["question"][:5]

# pandas behavior that does NOT work: selecting multiple columns at once.

####  To get the same effect, remove all the columns that are not needed (if you have columns A,B,C,D and want A&B, you would have to remove C&D)

In [None]:
try:
    dataset[["question", "answer_text"]]
except ValueError as e:
    print("Value Error!", e)

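# `datasets` does not let you assign values

### `dataset["question"]` builds a new Python list, so the assignment below only changes that temporary list. No error is raised, but the dataset is unchanged.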

In [None]:
dataset["question"][42] = "Why does this not work?"
dataset["question"][42]

# nor does it allow a column to be added like pandas

In [None]:
try:
    dataset["new column"] = "single value"
except TypeError as e:
    print("TypeError!", e)
try:
    dataset["new column2"] = [x[0] for x in dataset["question"]]
except TypeError as e:
    print("TypeError!", e)
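Why does the assignment earlier fail silently, while adding a column raises `TypeError`? A toy class makes both behaviours visible. `ToyColumnarTable` is hypothetical and is not how `Dataset` is implemented; it only assumes the two properties described above: reading a column builds a new list, and there is no `__setitem__`.

```python
class ToyColumnarTable:
    """Hypothetical stand-in (not datasets.Dataset) with two assumed behaviours:
    reading a column returns a brand-new list, and there is no __setitem__."""

    def __init__(self, columns):
        self._columns = {name: list(values) for name, values in columns.items()}

    def __getitem__(self, key):
        if isinstance(key, str):
            return list(self._columns[key])  # a fresh copy on every access
        return {name: values[key] for name, values in self._columns.items()}


table = ToyColumnarTable({"question": ["q0", "q1"]})
table["question"][1] = "changed"  # modifies a temporary list, which is then discarded
print(table["question"][1])        # q1

try:
    table["new column"] = "single value"  # no __setitem__ defined
except TypeError as e:
    print("TypeError!", e)
```

In the real library, the supported way to add a column is to return it from `map`, e.g. `dataset.map(lambda x: {"new column": "single value"})`.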

# If the dataset has splits, access each split like a dict

### `splits` is a `DatasetDict`

In [None]:
print(splits["train"])
print(splits["test"])

# Shuffling


In [None]:
%%time
shuffled = dataset.shuffle(seed=2021)

# Shuffling is cached, so if you run it again with the same seed, it doesn't have to redo all the work.

### This is much more important on big datasets than this one.

In [None]:
%%time
shuffled = dataset.shuffle(seed=2021)

# Loading is also cached

### The first call downloads the dataset from the HF Hub; the second call loads it from the local cache. The Hub has many benchmarks (squad, glue) and community-added datasets.

In [None]:
%time load_dataset('squad', split="validation")

In [None]:
%time load_dataset('squad', split="validation")

# Most datasets on the Hub include metadata (description, citation, features, etc.), available through `.info`

In [None]:
dataset = load_dataset('squad', split="validation")
dataset.info

# Processing

### `map` is your go-to function to make changes to the data.


### `map` gets passed a single example as a `dict` when not using batches. When using batches, it gets multiple examples as one `dict` where the keys are the column names and the values are lists of length `batch_size` (the last batch may be shorter).  


### `map` only keeps what the function returns (a dict of columns). If the function returns nothing, modifying the values in place will not change the dataset.

```python
def map_fn(example):
    """
    example looks like this when not batched:
    {"col a": 1, "col b": "string_val1", "col c": [1,2,3]}
    
    example looks like this when batched (batch_size=4):
    {"col a": [1,2,3,4], "col b": ["string_val1", "val2", "val3", "val4"], "col c": [[1,2,3], [2,4,6], [-1,-2,-3], [0,0,0]]}
    """
    
    # this is where you would modify the data
    # (non-batched; when batched, example["y"] is a list, so use [y*10 for y in example["y"]])
    example["x"] = example["y"]*10
    
    return example
```
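To make the single-example vs. batched shapes concrete, here is a plain-Python imitation of what the mapped function receives. It does not call `datasets`; `iter_map_inputs` and `times_ten` are hypothetical names, and the column names come from the docstring above.

```python
def iter_map_inputs(columns, batched=False, batch_size=1000):
    """Yield what a mapped function would receive from `columns` (a dict of
    equal-length lists): one row as a dict of scalars, or, when batched,
    a dict of lists holding up to `batch_size` rows (the last batch may be shorter)."""
    n_rows = len(next(iter(columns.values())))
    step = batch_size if batched else 1
    for start in range(0, n_rows, step):
        if batched:
            yield {name: values[start:start + step] for name, values in columns.items()}
        else:
            yield {name: values[start] for name, values in columns.items()}


def times_ten(batch):
    # Batched counterpart of a per-example `x = y*10`: values are lists,
    # so multiply element by element and return the new column in a dict.
    return {"x": [y * 10 for y in batch["col a"]]}


columns = {"col a": [1, 2, 3, 4, 5], "col b": ["v1", "v2", "v3", "v4", "v5"]}
print(next(iter_map_inputs(columns)))  # {'col a': 1, 'col b': 'v1'}
batches = list(iter_map_inputs(columns, batched=True, batch_size=2))
print([len(b["col a"]) for b in batches])  # [2, 2, 1]
print([x for b in batches for x in times_ten(b)["x"]])  # [10, 20, 30, 40, 50]
```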

In [None]:
def map_fn(example):
    example["banana"] = "monkey"
    example["id"] = example["id"].upper()
    
    # since nothing is returned, this map_fn does nothing
    
x = dataset.select(range(1)).map(map_fn)
x[0]

# A common step with these QA datasets

#### Getting the answers into the SQuAD-style `answers` format used by the QA preprocessing

In [None]:
mlqa = load_dataset("csv", data_files="../input/mlqa-hindi-processed/mlqa_hindi.csv", split="train")

def form_answers(example):
    example["answers"] = {
        "text": [example["answer_text"]],
        "answer_start": [example["answer_start"]]
    }
    return example

new_mlqa = mlqa.map(form_answers, remove_columns=["answer_text", "answer_start"])
new_mlqa[0]

# `map` is good for tokenizing

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

def tokenize_function(example):
    return tokenizer(example["context"])

%time tokenized = dataset.map(tokenize_function)
[tokenized[0][x][:10] for x in ["input_ids", "attention_mask"]]

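# Using batches usually results in a speedup

### The default batch size is 1000. A fast tokenizer (what `AutoTokenizer` loads by default when one is available) can process a whole batch of texts in one call.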
## Time taken to tokenize the dataset
without batching: ~12s  
with batching: ~4s

In [None]:
%time tokenized = dataset.map(tokenize_function, batched=True, batch_size=1000)

# The number of rows and columns after `map` does not have to equal the number of rows and columns before `map`

### Especially useful when tokenizing.   
### NOTE: Changing the number of rows requires `batched=True`, and every column in the output must have the same number of rows. Input columns that the function doesn't return keep their original row count, so they need to be removed. For instance, when tokenizing with overflowing tokens returned, one example (a long string and its `id`) could produce 3 `input_ids` and 3 `attention_mask`. You do not, however, get 3 `id`, so `id` must either be duplicated 3 times or dropped.

In [None]:
length_before = len(dataset)

def qa_tokenize_function(examples):
    return tokenizer(
            examples["question"],
            examples["context"],
            truncation="only_second",
            max_length=128,
            stride=64,
            return_overflowing_tokens=True,
            return_offsets_mapping=True,
            padding="max_length",
        )

# this will error because it doesn't remove the other columns like id
# tokenized = dataset.map(qa_tokenize_function, batched=True, batch_size=1000) 

%time tokenized = dataset.map(qa_tokenize_function, batched=True, batch_size=1000, remove_columns=dataset.column_names) 
length_before, len(tokenized)
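Here is the row-count change in miniature. This plain-Python sketch imitates overflow-with-stride on whitespace "tokens" and shows why `id` has to be duplicated through a feature-to-example index. It is a simplification: the real tokenizer works on subword tokens and reserves room for the question and special tokens, and `chunk_with_stride` / `batched_chunk` are hypothetical names.

```python
def chunk_with_stride(tokens, window, stride):
    """Split `tokens` into windows of at most `window` items; consecutive
    windows share `stride` items (a simplified stand-in for
    return_overflowing_tokens=True with `stride`)."""
    if not 0 <= stride < window:
        raise ValueError("need 0 <= stride < window")
    step = window - stride
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            return chunks
        start += step


def batched_chunk(examples, window, stride):
    """Batched-map-style function: a dict of lists in, a dict of lists out.
    One input row can become several output rows, so `id` is repeated once
    per chunk using a feature -> example index (like overflow_to_sample_mapping)."""
    out = {"chunk": [], "sample_index": [], "id": []}
    for i, (ex_id, text) in enumerate(zip(examples["id"], examples["context"])):
        for chunk in chunk_with_stride(text.split(), window, stride):
            out["chunk"].append(chunk)
            out["sample_index"].append(i)
            out["id"].append(ex_id)
    return out


batch = {"id": ["a", "b"], "context": ["w1 w2 w3 w4 w5 w6", "v1 v2"]}
out = batched_chunk(batch, window=4, stride=2)
print(out["chunk"])  # [['w1', 'w2', 'w3', 'w4'], ['w3', 'w4', 'w5', 'w6'], ['v1', 'v2']]
print(out["id"])     # ['a', 'a', 'b']
```

Two input rows become three output rows; without the repeated `id`, the columns would have different lengths.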

# Other `map` features: lambda functions and task descriptions

### `remove_columns` does not remove a column that the function returns. In the example below, "context" is in `dataset.column_names`, but the output still has a "context" column because the function returns it

### Use `desc` to add a description next to the progress bar. Useful when running a script with multiple mapping steps

In [None]:
capitalized = dataset.map(lambda x: {"context": x["context"].upper()}, desc="Capitalizing", remove_columns=dataset.column_names)
capitalized[0] # should be capitalized

# Format the dataset

### Can turn lists into numpy, torch, or tensorflow tensors

In [None]:
# stop annoying tf warnings
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

In [None]:
print(tokenized[0], "\n")

torch_tensors = tokenized.with_format("torch", columns=['attention_mask', "input_ids"],)
print(torch_tensors[0], "\n")
# print(torch_tensors[0]["input_ids"][:10], torch_tensors[0]["attention_mask"][:10])

np_tensors = tokenized.with_format("numpy", columns=['attention_mask', "input_ids"])
print(np_tensors[0], "\n")

tf_tensors = tokenized.with_format("tf", columns=['attention_mask', "input_ids"])
print(tf_tensors[0], "\n")

# Cast data in the dataset

### This is a trivial example where int64 gets cast to int32

In [None]:
xquad = load_dataset("csv", data_files="../input/mlqa-hindi-processed/xquad.csv", split="train")
print(xquad.features)

from datasets import Value
new_features = xquad.features.copy()
new_features["answer_start"] = Value('int32')
xquad = xquad.cast(new_features)
xquad.features

# Filter the dataset

### Use a function that returns `True` if the example should be kept, `False` if ignored.

In [None]:
# Let's keep only the examples whose context has fewer than 200 words when splitting on whitespace
def filter_by_num_words(example):
    return len(example["context"].split()) < 200


length_before = len(dataset)
filtered = dataset.filter(filter_by_num_words)
length_after = len(filtered)
length_before, length_after, filtered[-1]["context"]

# lambda functions also possible for `filter`

In [None]:
length_before = len(dataset)
filtered = dataset.filter(lambda x: "football" in x["context"].lower()) # keep examples that have the word football in the context
length_after = len(filtered)
length_before, length_after, filtered[-1]["context"]

# Remove columns

#### Since there isn't an easy way to select a subset of columns, remove all the others instead (here, everything except `id`)

In [None]:
original_columns = dataset.column_names
new_ds = dataset.remove_columns([x for x in original_columns if x != "id"])
print(original_columns)
print(new_ds.column_names)

# Streaming

For when the datasets are many GBs and you don't want to save them to your disk first. The data is read on the fly as you iterate.

In [None]:
stream_dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', streaming=True)
print(next(iter(stream_dataset)))

# Shuffle a streamed dataset

It won't shuffle the entire dataset (doing so would require looping through all of it). Instead, it keeps a buffer of `buffer_size` examples (say, 10,000), yields a random example from the buffer, and replaces it with the next example from the stream.

In [None]:
shuffled_dataset = stream_dataset.shuffle(buffer_size=10_000, seed=42)
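A minimal plain-Python sketch of the buffer-based shuffle described above (an illustration, not the library's code; `buffered_shuffle` is a hypothetical name). With a buffer of size `buffer_size`, an example can move fewer than `buffer_size` positions earlier than where it appeared in the stream, which is why a larger buffer gives a better shuffle.

```python
import random
from itertools import islice


def buffered_shuffle(stream, buffer_size, seed=None):
    """Approximately shuffle an iterable using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for example in stream:
        if len(buffer) < buffer_size:
            buffer.append(example)  # fill the buffer first
            continue
        i = rng.randrange(buffer_size)
        yield buffer[i]             # emit a random buffered example...
        buffer[i] = example         # ...and put the new one in its slot
    rng.shuffle(buffer)             # stream exhausted: flush what's left
    yield from buffer


# Works lazily: only buffer_size items are held in memory at a time.
print(list(islice(buffered_shuffle(range(1_000_000), buffer_size=10, seed=42), 5)))
```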

# Map a streamed dataset

In [None]:
mapped_stream = stream_dataset.map(lambda x: {"text": x["text"].upper()})
print(next(iter(mapped_stream)))

# Interleave streamed datasets
### Mix multiple streams

In [None]:
from datasets import interleave_datasets
from itertools import islice
en_dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', streaming=True)
fr_dataset = load_dataset('oscar', "unshuffled_deduplicated_fr", split='train', streaming=True)

multilingual_dataset = interleave_datasets([en_dataset, fr_dataset])
print(list(islice(multilingual_dataset, 2)))
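Without a `probabilities` argument, `interleave_datasets` alternates between the sources, one example from each in turn. The sketch below imitates that in plain Python, assuming the interleaved stream stops as soon as one source runs out (check the docs for your version, since later releases added other stopping strategies). `round_robin` is a hypothetical name.

```python
def round_robin(*streams):
    """Yield one item from each stream in turn; stop when any stream is exhausted."""
    iterators = [iter(s) for s in streams]
    if not iterators:
        return
    while True:
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                return


en = (f"en{i}" for i in range(3))
fr = (f"fr{i}" for i in range(2))
print(list(round_robin(en, fr)))  # ['en0', 'fr0', 'en1', 'fr1', 'en2']
```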

# Save dataset

In [None]:
dataset.to_json("dataset.json")
dataset.to_csv("dataset.csv", index=False) #  pandas syntax

# Save processed dataset

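If you want to keep the exact features, including list-valued columns like `input_ids` and `offset_mapping` that don't round-trip through csv, use `save_to_disk`, which writes the Arrow data along with its metadata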

In [None]:
tokenized.save_to_disk("tokenized_dataset")
%ls tokenized_dataset

# Load processed dataset

In [None]:
from datasets import Dataset

loaded = Dataset.load_from_disk("tokenized_dataset")
{key:loaded[0][key][:10] for key in ["input_ids", "attention_mask", "offset_mapping"]}

# Concatenate Datasets

### All datasets need to have the exact same features.

In [None]:
from datasets import concatenate_datasets

mlqa = load_dataset("csv", data_files="../input/mlqa-hindi-processed/mlqa_hindi.csv", split="train")
xquad = load_dataset("csv", data_files="../input/mlqa-hindi-processed/xquad.csv", split="train")

concat = concatenate_datasets([mlqa, xquad])

len(mlqa), len(xquad), len(concat)

# Load dataset from memory (pandas or dict)

In [None]:
from datasets import Dataset
import pandas as pd

df = pd.read_csv("../input/chaii-hindi-and-tamil-question-answering/train.csv")
dataset = Dataset.from_pandas(df)

my_dict = {"col_a": ["apple", "orange", "banana"], "col_b": list(range(3)), "col_c": [True, False, True], "col_d": [["x", "y", "z"]]*3}
dataset = Dataset.from_dict(my_dict)

# Example for chaii-qa

### This is slightly modified from Abhishek's notebook here: https://www.kaggle.com/abhishek/hello-friends-tez-se-chaii-train-kar-lo

### My version doesn't use pandas at all 😉

In [None]:
from datasets import DatasetDict
from functools import partial

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

PAD_ON_RIGHT = tokenizer.padding_side == "right"
FOLD = 0
MAX_LEN = 384
DOC_STRIDE = 128

chaii_data = load_dataset("csv", data_files="../input/chaii-extra/train_folds.csv", split="train")
external_data = load_dataset("csv", data_files=["../input/chaii-extra/mlqa_hindi.csv", "../input/chaii-extra/xquad.csv"], split="train")

train_data = chaii_data.filter(lambda x: x["kfold"]!=FOLD)
valid_data = chaii_data.filter(lambda x: x["kfold"]==FOLD)

def form_answers(example):
    example["answers"] = {
        "text": [example["answer_text"]],
        "answer_start": [example["answer_start"]]
    }
    return example

def add_id(example, idx):
    # validation features need a unique id
    example["id"] = "id" + str(idx)
    return example


cols = ["context", "question", "answer_text", "answer_start"]

train_data = train_data.remove_columns([x for x in train_data.column_names if x not in cols])
valid_data = valid_data.remove_columns([x for x in valid_data.column_names if x not in cols])
external_data = external_data.remove_columns([x for x in external_data.column_names if x not in cols])

raw_dataset = DatasetDict()
raw_dataset["train"] = concatenate_datasets([train_data, external_data], axis=0)
raw_dataset["validation"] = valid_data

# When using map on a DatasetDict, all splits will get mapped
raw_dataset = raw_dataset.map(form_answers, desc="Formatting answers")
raw_dataset = raw_dataset.map(add_id, desc="Adding id column", with_indices=True)

### The `prepare_train_features` and `prepare_validation_features` functions (adapted from the Hugging Face question-answering notebook) are defined in the next cell

In [None]:
def prepare_train_features(examples, tokenizer, pad_on_right, max_length, doc_stride):
    # ref: https://github.com/huggingface/notebooks/blob/master/examples/question_answering.ipynb
    # Some of the questions have lots of whitespace on the left, which is not useful and will make the
    # truncation of the context fail (the tokenized question will take a lot of space). So we remove that
    # left whitespace
    examples["question"] = [q.lstrip() for q in examples["question"]]

    # Tokenize our examples with truncation and padding, but keep the overflows using a stride. This results
    # in one example possibly giving several features when a context is long, each of those features having a
    # context that overlaps a bit with the context of the previous feature.
    tokenized_examples = tokenizer(
        examples["question" if pad_on_right else "context"],
        examples["context" if pad_on_right else "question"],
        truncation="only_second" if pad_on_right else "only_first",
        max_length=max_length,
        stride=doc_stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    # Since one example might give us several features if it has a long context, we need a map from a feature to
    # its corresponding example. This key gives us just that.
    sample_mapping = tokenized_examples.pop("overflow_to_sample_mapping")
    # The offset mappings will give us a map from token to character position in the original context. This will
    # help us compute the start_positions and end_positions.
    offset_mapping = tokenized_examples.pop("offset_mapping")

    # Let's label those examples!
    tokenized_examples["start_positions"] = []
    tokenized_examples["end_positions"] = []

    for i, offsets in enumerate(offset_mapping):
        # We will label impossible answers with the index of the CLS token.
        input_ids = tokenized_examples["input_ids"][i]
        cls_index = input_ids.index(tokenizer.cls_token_id)

        # Grab the sequence corresponding to that example (to know what is the context and what is the question).
        sequence_ids = tokenized_examples.sequence_ids(i)

        # One example can give several spans, this is the index of the example containing this span of text.
        sample_index = sample_mapping[i]
        answers = examples["answers"][sample_index]
        # If no answers are given, set the cls_index as answer.
        if len(answers["answer_start"]) == 0:
            tokenized_examples["start_positions"].append(cls_index)
            tokenized_examples["end_positions"].append(cls_index)
        else:
            # Start/end character index of the answer in the text.
            start_char = answers["answer_start"][0]
            end_char = start_char + len(answers["text"][0])

            # Start token index of the current span in the text.
            token_start_index = 0
            while sequence_ids[token_start_index] != (1 if pad_on_right else 0):
                token_start_index += 1

            # End token index of the current span in the text.
            token_end_index = len(input_ids) - 1
            while sequence_ids[token_end_index] != (1 if pad_on_right else 0):
                token_end_index -= 1

            # Detect if the answer is out of the span (in which case this feature is labeled with the CLS index).
            if not (offsets[token_start_index][0] <= start_char and offsets[token_end_index][1] >= end_char):
                tokenized_examples["start_positions"].append(cls_index)
                tokenized_examples["end_positions"].append(cls_index)
            else:
                # Otherwise move the token_start_index and token_end_index to the two ends of the answer.
                # Note: we could go after the last offset if the answer is the last word (edge case).
                while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char:
                    token_start_index += 1
                tokenized_examples["start_positions"].append(token_start_index - 1)
                while offsets[token_end_index][1] >= end_char:
                    token_end_index -= 1
                tokenized_examples["end_positions"].append(token_end_index + 1)

    return tokenized_examples


def prepare_validation_features(examples, tokenizer, pad_on_right, max_length, doc_stride):
    # ref: https://github.com/huggingface/notebooks/blob/master/examples/question_answering.ipynb
    # Some of the questions have lots of whitespace on the left, which is not useful and will make the
    # truncation of the context fail (the tokenized question will take a lot of space). So we remove that
    # left whitespace
    examples["question"] = [q.lstrip() for q in examples["question"]]

    # Tokenize our examples with truncation and maybe padding, but keep the overflows using a stride. This results
    # in one example possibly giving several features when a context is long, each of those features having a
    # context that overlaps a bit with the context of the previous feature.
    tokenized_examples = tokenizer(
        examples["question" if pad_on_right else "context"],
        examples["context" if pad_on_right else "question"],
        truncation="only_second" if pad_on_right else "only_first",
        max_length=max_length,
        stride=doc_stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    # Since one example might give us several features if it has a long context, we need a map from a feature to
    # its corresponding example. This key gives us just that.
    sample_mapping = tokenized_examples.pop("overflow_to_sample_mapping")

    # We keep the example_id that gave us this feature and we will store the offset mappings.
    tokenized_examples["example_id"] = []

    for i in range(len(tokenized_examples["input_ids"])):
        # Grab the sequence corresponding to that example (to know what is the context and what is the question).
        sequence_ids = tokenized_examples.sequence_ids(i)
        context_index = 1 if pad_on_right else 0

        # One example can give several spans, this is the index of the example containing this span of text.
        sample_index = sample_mapping[i]
        tokenized_examples["example_id"].append(examples["id"][sample_index])

        # Set to None the offset_mapping that are not part of the context so it's easy to determine if a token
        # position is part of the context or not.
        tokenized_examples["offset_mapping"][i] = [
            (o if sequence_ids[k] == context_index else None)
            for k, o in enumerate(tokenized_examples["offset_mapping"][i])
        ]

    return tokenized_examples
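The labelling loop in `prepare_train_features` is the trickiest part. Here is a simplified, plain-Python restatement of the same idea on a hand-made example, using the `None`-for-non-context offsets convention from `prepare_validation_features`. It is not a line-for-line equivalent (it can pick a different token when an answer boundary falls on whitespace between tokens), and `char_span_to_token_span` is a hypothetical name.

```python
def char_span_to_token_span(offsets, start_char, end_char, cls_index=0):
    """Map an answer's character span to (start_token, end_token).

    offsets[k] is the (start, end) character span of token k in the context,
    or None for tokens outside the context (question, special tokens, padding).
    Returns (cls_index, cls_index) when the answer is not fully inside the window.
    """
    ctx_tokens = [k for k, o in enumerate(offsets) if o is not None]
    if not ctx_tokens or end_char <= start_char:
        return cls_index, cls_index
    first, last = ctx_tokens[0], ctx_tokens[-1]
    if not (offsets[first][0] <= start_char and offsets[last][1] >= end_char):
        return cls_index, cls_index
    # first context token that ends after the answer starts
    start_token = next(k for k in ctx_tokens if offsets[k][1] > start_char)
    # last context token that starts before the answer ends
    end_token = next(k for k in reversed(ctx_tokens) if offsets[k][0] < end_char)
    return start_token, end_token


# [CLS] question [SEP] "the" "cat" "sat" [SEP]   (context = "the cat sat")
offsets = [None, None, None, (0, 3), (4, 7), (8, 11), None]
context_text = "the cat sat"
answer = "cat sat"
start_char = context_text.index(answer)       # 4
end_char = start_char + len(answer)           # 11
print(char_span_to_token_span(offsets, start_char, end_char))  # (4, 5)
```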

# This will tokenize the examples, adding padding, allowing for overflow, and using a stride

The number of rows and columns after mapping differs from the number before mapping.

In [None]:
train_features = raw_dataset["train"].map(
    partial(
        prepare_train_features,
        tokenizer=tokenizer,
        pad_on_right=PAD_ON_RIGHT,
        max_length=MAX_LEN,
        doc_stride=DOC_STRIDE,
    ),
    batched=True,
    remove_columns=raw_dataset["train"].column_names,
    desc="Creating train features"
)

valid_features = raw_dataset["validation"].map(
    partial(
        prepare_validation_features,
        tokenizer=tokenizer,
        pad_on_right=PAD_ON_RIGHT,
        max_length=MAX_LEN,
        doc_stride=DOC_STRIDE,
    ),
    batched=True,
    remove_columns=raw_dataset["validation"].column_names,
    desc="Creating validation features"
)

In [None]:
train_features, valid_features

# Use with PyTorch DataLoader

In [None]:
from torch.utils.data import DataLoader

train_dataloader = DataLoader(train_features.with_format("torch"), batch_size=16, num_workers=4, shuffle=True)
valid_dataloader = DataLoader(valid_features.with_format("torch"), batch_size=32, num_workers=4, shuffle=False)

In [None]:
# this just prevents annoying warnings from popping up
%env TOKENIZERS_PARALLELISM=true

for x in train_dataloader:
    break
x

# Use with Tensorflow

In [None]:
import tensorflow as tf


tf_train_features = train_features.with_format('tensorflow')

tf_train_x = {x: tf_train_features[x].to_tensor(default_value=0, shape=[None, MAX_LEN]) for x in ['input_ids', 'attention_mask']}
tf_train_y = {x: tf_train_features[x] for x in ['start_positions', 'end_positions']}

tf_train_dataset = tf.data.Dataset.from_tensor_slices((tf_train_x, tf_train_y)).batch(32)
next(iter(tf_train_dataset))

# That's it for now. Please check out the documentation at https://huggingface.co/docs/datasets/ or the course at https://huggingface.co/course/chapter1 for more details.

# Thanks for reading!

I hope it was useful 😊

If you spot any errors, please let me know! I'm a human, after all. Feedback welcome!

<div style="width:100%;text-align: center;"> 
    <iframe src="https://giphy.com/embed/xULW8v7LtZrgcaGvC0" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe>
</div>