# CORRECTNESS NOTE:
The `sort_values().head()` and `nsmallest()` versions may not produce identical results when multiple rows share the same value. For example, if the lowest value appears in 3 rows and `n` is at least 3, these 3 rows appear in both versions, but possibly in a different order. The value can even appear more times than the `n` argument to `head()`, in which case some rows may appear in one version and not in the other.

This happens even though the [pandas docs state that the two are equivalent](https://github.com/pandas-dev/pandas/blob/478d340667831908b5b4bf09a2787a11a14560c9/pandas/core/frame.py#L7217). We assume that users consider both versions correct, so we don't disable the pattern. In these tests, however, we must compare the values without the indexes: the values _must_ be the same, whereas the indexes can differ because one version may select different rows than the other.
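
As a rough illustration of the tie behavior, the sketch below uses a small made-up Series (the data and the names `s`, `by_sort`, and `by_nsmallest` are hypothetical and not part of this test). Whether the indexes actually differ depends on the pandas version and the sort algorithm in use, so the sketch only relies on the values matching.

```python
import pandas as pd

# Hypothetical data: the lowest value (1) appears three times,
# which is more often than the n=2 rows we ask for.
s = pd.Series([3, 1, 2, 1, 1, 5], index=list("abcdef"))

by_sort = s.sort_values().head(2)
by_nsmallest = s.nsmallest(2)

# The selected values agree regardless of how ties are broken.
values_match = by_sort.tolist() == by_nsmallest.tolist()

# The selected rows (the index labels) are not guaranteed to agree,
# because each version may pick different rows among the ties.
index_match = by_sort.index.tolist() == by_nsmallest.index.tolist()

print(values_match, index_match)
```

Comparing `tolist()` output, as the final cell of this notebook does, sidesteps the index entirely.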

In [None]:
import pandas as pd
import numpy as np
import dias.rewriter

In [None]:
train_df = pd.read_csv('./datasets/lextoumbourou__feedback3-eda-hf-custom-trainer-sift__train.csv')

In [None]:
LABEL_COLUMNS = ['cohesion', 'syntax', 'vocabulary', 'phraseology', 'grammar', 'conventions']
train_df['total_score'] = train_df[LABEL_COLUMNS].sum(axis=1)

In [None]:
# DIAS_DISABLE
defa = train_df['total_score'].sort_values().head(4)

In [None]:
our = train_df['total_score'].sort_values().head(4)

In [None]:
# Convert to lists so that the index is ignored. See the correctness note above.
comp = [x == y for x, y in zip(our.tolist(), defa.tolist())]
assert all(comp)