
DataFrame.OrderBy methods incorrect behavior with null values #7102

@asmirnov82


The DataFrame OrderBy and OrderByDescending methods should always place null values at the end of the result (after all non-null values), regardless of the sort direction (ascending or descending). This matches the behavior in Python and of the DataFrameColumn.Sort method.

To Reproduce:

using System;
using Microsoft.Data.Analysis;

var col1 = new Int32DataFrameColumn("Index", new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 });
var col2 = new StringDataFrameColumn("Country", new[] { "USA", "France", "UK", "Brazil", "Russia", "India", null, "China", null });
var col3 = new StringDataFrameColumn("Capital", new[] { "Washington", "Paris", "London", "Brasilia", "Moscow", "New Dehli", null, "Beijing", null });

var df = new DataFrame(col1, col2, col3);
// Sort the whole frame by "Capital" in descending order
Console.WriteLine(df.OrderByDescending("Capital"));

Actual behavior:

Index Country Capital
9 null null
7 null null
1 USA Washington
2 France Paris
6 India New Dehli
5 Russia Moscow
3 UK London
4 Brazil Brasilia
8 China Beijing

Expected behavior:

Index Country Capital
1 USA Washington
2 France Paris
6 India New Dehli
5 Russia Moscow
3 UK London
4 Brazil Brasilia
8 China Beijing
9 null null
7 null null
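
For illustration only, the expected "nulls last in either direction" ordering can be sketched with plain LINQ over arrays. This is not the DataFrame implementation and not a proposed fix; the names index, capital and rows are made up for the example, and ordinal string comparison is an assumption.

```csharp
using System;
using System.Linq;

// Same sample data as in the repro, held in plain arrays (no DataFrame involved).
int[] index = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
string?[] capital = { "Washington", "Paris", "London", "Brasilia", "Moscow", "New Dehli", null, "Beijing", null };

// Primary key: "is null" (false sorts before true), so nulls always go last.
// Secondary key: the value itself, descending here; use ThenBy for ascending.
var rows = index.Zip(capital, (i, c) => (Index: i, Capital: c))
    .OrderBy(r => r.Capital is null)
    .ThenByDescending(r => r.Capital, StringComparer.Ordinal)
    .ToList();

foreach (var r in rows)
{
    string text = r.Capital ?? "null";
    Console.WriteLine(r.Index + " " + text);
}
```

Because the null check is a separate leading sort key, switching the second key between ThenBy and ThenByDescending never moves the nulls. LINQ ordering is stable, so in this sketch the two null rows keep their original relative order (7 before 9).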

Notes:

Sorting the column directly with 'Console.WriteLine(new DataFrame([col3.Sort(ascending: false)]));' works correctly and places the null values last:

Capital
Washington
Paris
New Dehli
Moscow
London
Brasilia
Beijing
null
null

This issue was already mentioned in https://github.com/dotnet/machinelearning/pull/5776/files#r624316355
