
@sevenzees
Contributor

@sevenzees sevenzees commented Feb 8, 2026

Add DataFrame.Melt() method for transforming wide to long format

Description

This PR implements a Melt() method for the DataFrame class that transforms data from wide format to long format, similar to Pandas' pandas.melt() function. This is a fundamental data reshaping operation that "unpivots" multiple value columns into a pair of variable-value columns.

Fixes #7577

What does this change do?

The Melt() method:

  • Accepts identifier columns that remain fixed in the output
  • Unpivots specified value columns (or auto-detects them if not specified) into two new columns: one containing the original column names and one containing the values
  • Supports customizable column names for the variable and value columns
  • Handles mixed data types across value columns by converting the values to strings when the value columns do not all share the same type
  • Optionally filters out null or empty values from the result
  • Validates its inputs and reports problems with clear error messages
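The unpivot described above can be sketched independently of Microsoft.Data.Analysis. This is a minimal illustration under stated assumptions, not the PR's code. Columns are modeled as hypothetical `(name, values)` pairs of equal length rather than `DataFrameColumn` objects. "Empty" is assumed to mean an empty string. Rows are emitted input-row by input-row, which matches the PR's example output. Note that `pandas.melt` orders its output differently: it groups all rows for the first value column, then the second, and so on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

#nullable enable

// Hypothetical sketch (not the PR's implementation): columns are (name, values)
// pairs of equal length instead of DataFrameColumn objects.
public static class MeltSketch
{
    public static List<(object?[] Ids, string Variable, object? Value)> Melt(
        IReadOnlyList<(string Name, object?[] Values)> columns,
        IReadOnlyList<string> idColumns,
        IReadOnlyList<string>? valueColumns = null,
        bool dropNulls = false)
    {
        // Look up a column by name, failing with a clear message if it is missing.
        object?[] Find(string name)
        {
            foreach (var (colName, colValues) in columns)
                if (colName == name) return colValues;
            throw new ArgumentException($"Column '{name}' does not exist.");
        }

        // Default: every non-ID column becomes a value column, in column order.
        var valueNames = valueColumns
            ?? columns.Select(c => c.Name).Where(n => !idColumns.Contains(n)).ToList();

        // Resolve each column once up front instead of once per row.
        var ids = idColumns.Select(Find).ToArray();
        var vals = valueNames.Select(Find).ToArray();
        int rowCount = columns.Count == 0 ? 0 : columns[0].Values.Length;

        var result = new List<(object?[] Ids, string Variable, object? Value)>();
        for (int row = 0; row < rowCount; row++)
        {
            for (int v = 0; v < vals.Length; v++)
            {
                object? value = vals[v][row];
                // Assumption: "empty" means an empty string.
                if (dropNulls && (value is null || value is string { Length: 0 }))
                    continue;
                var idValues = ids.Select(col => col[row]).ToArray();
                result.Add((idValues, valueNames[v], value));
            }
        }
        return result;
    }
}
```

With a `Region` ID column and value columns `Q1` and `Q2`, each input row expands into one output row per value column. When `dropNulls` is true, null or empty cells produce no row.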

Why this approach?

Performance optimizations:

  • Pre-calculates the total output size to allocate columns once upfront, eliminating expensive incremental resize operations
  • Uses direct iteration instead of creating intermediate index arrays
  • Caches column references to reduce repeated dictionary lookups
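The pre-sizing optimization is a two-pass pattern. The first pass counts how many output rows will survive, including the effect of `dropNulls`. The second pass allocates the output storage once and fills it by index. The sketch below illustrates that pattern on plain arrays. The class and method names are hypothetical and only loosely mirror the roles of `CalculateTotalOutputRows()` and `FillMeltedData()`. Treating empty strings as "empty" is again an assumption.

```csharp
using System;
using System.Collections.Generic;

#nullable enable

// Hypothetical sketch of two-pass sizing: count surviving cells first, then
// allocate the output arrays once and fill them by index (no resizing).
public static class PresizedMelt
{
    // Assumption: a cell is dropped if it is null or an empty string.
    static bool IsNullOrEmpty(object? cell) => cell is null || cell is string { Length: 0 };

    public static long CountOutputRows(IReadOnlyList<object?[]> valueColumns, bool dropNulls)
    {
        long total = 0;
        foreach (var column in valueColumns)
        {
            if (!dropNulls) { total += column.Length; continue; }
            foreach (var cell in column)
                if (!IsNullOrEmpty(cell)) total++;
        }
        return total;
    }

    public static (string[] Variables, object?[] Values) Fill(
        IReadOnlyList<string> names, IReadOnlyList<object?[]> valueColumns, bool dropNulls)
    {
        // First pass: exact output size, so each array is allocated exactly once.
        long total = CountOutputRows(valueColumns, dropNulls);
        var variables = new string[total];
        var values = new object?[total];

        // Second pass: write each surviving cell into the next free slot.
        int rowCount = valueColumns.Count == 0 ? 0 : valueColumns[0].Length;
        long next = 0;
        for (int row = 0; row < rowCount; row++)
        {
            for (int v = 0; v < valueColumns.Count; v++)
            {
                object? cell = valueColumns[v][row];
                if (dropNulls && IsNullOrEmpty(cell)) continue;
                variables[next] = names[v];
                values[next] = cell;
                next++;
            }
        }
        return (variables, values);
    }
}
```

The extra counting pass costs one more scan when `dropNulls` is set. In exchange, the output never has to be grown incrementally.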

Design decisions:

  • Separated data processing into focused helper methods for maintainability
  • Follows the Pandas API design for familiarity to users coming from Python
  • Preserves the value column's type when all value columns share the same type, falling back to string only when the types are mixed
  • Defaults to using all non-ID columns as value columns for convenience (matches Pandas behavior)
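The type rule for the melted value column can be stated compactly. If every value column has the same element type, that type is kept. Otherwise, the column falls back to strings. The helper below is a hypothetical illustration of that rule, not code from the PR. Formatting with `CultureInfo.InvariantCulture` is an assumption, because the PR does not say which culture its string conversion uses.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

#nullable enable

// Hypothetical sketch of choosing the melted value column's element type.
public static class ValueTypeSketch
{
    // Keep the shared element type if all value columns agree; otherwise use string.
    public static Type ChooseValueType(IReadOnlyList<Type> valueColumnTypes) =>
        valueColumnTypes.Distinct().Count() == 1 ? valueColumnTypes[0] : typeof(string);

    // When falling back to string, convert each cell; nulls stay null.
    // Assumption: invariant-culture formatting.
    public static string? ToCell(object? value) =>
        value is null ? null : Convert.ToString(value, CultureInfo.InvariantCulture);
}
```

For example, three `Int32` quarter columns would yield an `Int32` value column. Mixing `Int32` and `Double` columns would yield strings such as `"1.5"`.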

API signature:

public DataFrame Melt(
    IEnumerable<string> idColumns, 
    IEnumerable<string> valueColumns = null, 
    string variableName = "variable", 
    string valueName = "value", 
    bool dropNulls = false)

Changes included

  • DataFrame.cs: Added Melt() method and supporting helper methods
    • CalculateTotalOutputRows(): Pre-calculates output size for efficient allocation
    • InitializeIdColumns(): Creates the output ID columns at the pre-calculated output size
    • CreateValueColumn(): Creates appropriately typed value column
    • FillMeltedData(): Performs the actual unpivoting operation

Example usage

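using Microsoft.Data.Analysis;

// Transform quarterly sales data from wide to long format.
// DataFrameColumn[] is required: new[] cannot infer a common type for mixed column types.
var df = new DataFrame(new DataFrameColumn[]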
{
    new StringDataFrameColumn("Region", new[] { "North", "South" }),
    new Int32DataFrameColumn("Q1", new[] { 1000, 800 }),
    new Int32DataFrameColumn("Q2", new[] { 1200, 900 }),
    new Int32DataFrameColumn("Q3", new[] { 1100, 950 })
});

var melted = df.Melt(
    idColumns: new[] { "Region" },
    valueColumns: new[] { "Q1", "Q2", "Q3" },
    variableName: "Quarter",
    valueName: "Sales"
);

// Result:
// | Region | Quarter | Sales |
// |--------|---------|-------|
// | North  | Q1      | 1000  |
// | North  | Q2      | 1200  |
// | North  | Q3      | 1100  |
// | South  | Q1      | 800   |
// | South  | Q2      | 900   |
// | South  | Q3      | 950   |

Additional notes

This implementation brings the .NET DataFrame API closer to feature parity with Pandas and supports common data transformation workflows needed for analysis and visualization. The method is optimized for performance while maintaining code readability and maintainability.

@codecov

codecov bot commented Feb 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.10%. Comparing base (3604580) to head (a3c5002).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7578      +/-   ##
==========================================
+ Coverage   69.05%   69.10%   +0.04%     
==========================================
  Files        1483     1483              
  Lines      274362   274693     +331     
  Branches    28270    28294      +24     
==========================================
+ Hits       189466   189824     +358     
+ Misses      77510    77484      -26     
+ Partials     7386     7385       -1     
Flag Coverage Δ
Debug 69.10% <100.00%> (+0.04%) ⬆️
production 63.34% <100.00%> (+0.03%) ⬆️
test 89.55% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
src/Microsoft.Data.Analysis/DataFrame.cs 93.73% <100.00%> (+1.88%) ⬆️
...st/Microsoft.Data.Analysis.Tests/DataFrameTests.cs 99.91% <100.00%> (+0.01%) ⬆️

... and 9 files with indirect coverage changes


@sevenzees sevenzees marked this pull request as ready for review February 9, 2026 15:30
@sevenzees
Contributor Author

Not sure who all would want to look at this, but I have another PR here. @tarekgh @ericstj @jeffhandley

@tarekgh
Member

tarekgh commented Feb 9, 2026

CC @rokonec @ManickaP who may help with that.

Contributor

Copilot AI left a comment


Pull request overview

Adds a DataFrame.Melt() API to reshape data from wide to long format (similar to pandas.melt), enabling a common “unpivot” transformation within Microsoft.Data.Analysis.

Changes:

  • Introduces DataFrame.Melt(...) plus helper methods for validation, sizing, column initialization, and filling.
  • Implements optional null/empty filtering (dropNulls) and mixed-type handling (stringifying values when needed).
  • Adds new unit tests covering core melt scenarios and some invalid-input cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
src/Microsoft.Data.Analysis/DataFrame.cs Adds the Melt() API and its helper methods to produce a long-format DataFrame.
test/Microsoft.Data.Analysis.Tests/DataFrameTests.cs Adds theory data + tests validating melt output and a few invalid-input cases.


@sevenzees
Contributor Author

I have implemented all the Copilot suggestions!


Development

Successfully merging this pull request may close these issues.

Add Melt method

2 participants