Add Melt method to DataFrame#7578

Merged
rokonec merged 30 commits into dotnet:main from sevenzees:main
Mar 20, 2026
Conversation

@sevenzees
Contributor

@sevenzees sevenzees commented Feb 8, 2026

Add DataFrame.Melt() method for transforming wide to long format

Description

This PR implements a Melt() method for the DataFrame class that transforms data from wide format to long format, similar to Pandas' pandas.melt() function. This is a fundamental data reshaping operation that "unpivots" multiple value columns into a pair of variable-value columns.

Fixes #7577

What does this change do?

The Melt() method:

  • Accepts identifier columns that remain fixed in the output
  • Unpivots specified value columns (or auto-detects them if not specified) into two new columns: one containing the original column names and one containing the values
  • Supports customizable column names for the variable and value columns
  • Handles mixed data types across value columns by converting to string when necessary
  • Optionally filters out null or empty values from the result
  • Includes comprehensive input validation with clear error messages
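The behavior listed above can be illustrated with a minimal sketch over plain arrays. This is not the PR's implementation: `MeltSketch` and its parameters are hypothetical names chosen for illustration, and the row-by-row output order is assumed from the example table later in this description.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper (not the PR's code): unpivots value columns held in
// plain arrays into (id, variable, value) triples, one input row at a time.
static List<(object Id, string Variable, object? Value)> MeltSketch(
    object[] idColumn,
    IReadOnlyList<(string Name, object?[] Values)> valueColumns,
    bool dropNulls = false)
{
    var result = new List<(object Id, string Variable, object? Value)>();
    for (int row = 0; row < idColumn.Length; row++)
    {
        foreach (var (name, values) in valueColumns)
        {
            object? value = values[row];
            // Assumed reading of dropNulls: skip null and empty-string values.
            bool isEmpty = value is null || (value is string s && s.Length == 0);
            if (dropNulls && isEmpty)
            {
                continue;
            }
            result.Add((idColumn[row], name, value));
        }
    }
    return result;
}

var regions = new object[] { "North", "South" };
var quarters = new List<(string Name, object?[] Values)>
{
    ("Q1", new object?[] { 1000, 800 }),
    ("Q2", new object?[] { 1200, null }),
};

// South/Q2 is null, so it is omitted when dropNulls is true.
var melted = MeltSketch(regions, quarters, dropNulls: true);
foreach (var (id, variable, cell) in melted)
{
    Console.WriteLine($"{id} {variable} {cell}");
}
```

With `dropNulls: false` the same input would yield one triple per row and value column, including the null cell.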

Why this approach?

Performance optimizations:

  • Pre-calculates the total output size to allocate columns once upfront, eliminating expensive incremental resize operations
  • Uses direct iteration instead of creating intermediate index arrays
  • Caches column references to reduce repeated dictionary lookups
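As a rough illustration of the first point, the output row count can be computed before any output buffer is created. The sketch below is an assumption about the general technique, not the body of the PR's `CalculateTotalOutputRows()`; `CountOutputRows` and the buffer names are hypothetical, and only null values (not empty strings) are filtered here for brevity.

```csharp
#nullable enable
using System;

// Hypothetical helper: counts how many rows the melted output will contain.
static int CountOutputRows(object?[][] valueColumns, int rowCount, bool dropNulls)
{
    if (!dropNulls)
    {
        // Every input row contributes one output row per value column.
        return checked(rowCount * valueColumns.Length);
    }

    int count = 0;
    foreach (object?[] column in valueColumns)
    {
        for (int i = 0; i < rowCount; i++)
        {
            if (column[i] is not null)
            {
                count++;
            }
        }
    }
    return count;
}

string[] names = { "Q1", "Q2", "Q3" };
object?[][] columns =
{
    new object?[] { 1000, 800 },
    new object?[] { 1200, null },
    new object?[] { 1100, 950 },
};

int totalRows = CountOutputRows(columns, rowCount: 2, dropNulls: true);

// Allocate the output buffers once at their final size, then fill in place.
var variableBuffer = new string[totalRows];
var valueBuffer = new object?[totalRows];
int cursor = 0;
for (int r = 0; r < 2; r++)
{
    for (int c = 0; c < columns.Length; c++)
    {
        if (columns[c][r] is null)
        {
            continue;
        }
        variableBuffer[cursor] = names[c];
        valueBuffer[cursor] = columns[c][r];
        cursor++;
    }
}

Console.WriteLine($"{cursor} of {totalRows} slots filled");
```

The counting pass costs one extra scan when `dropNulls` is set, in exchange for avoiding repeated reallocation while filling.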

Design decisions:

  • Separated data processing into focused helper methods for maintainability
  • Follows the Pandas API design for familiarity to users coming from Python
  • Maintains type safety by preserving column types when all value columns share the same type
  • Defaults to using all non-ID columns as value columns for convenience (matches Pandas behavior)
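The type-preservation rule can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the PR's `CreateValueColumn()`: `ResolveValueType` is a hypothetical name, and the real method may compare column types differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: picks the element type of the melted value column.
static Type ResolveValueType(IEnumerable<Type> columnTypes)
{
    List<Type> distinct = columnTypes.Distinct().ToList();
    if (distinct.Count == 0)
    {
        throw new ArgumentException("At least one value column is required.", nameof(columnTypes));
    }

    // A single shared type is preserved; mixed types fall back to string.
    return distinct.Count == 1 ? distinct[0] : typeof(string);
}

Type sameType = ResolveValueType(new[] { typeof(int), typeof(int), typeof(int) });
Type mixedType = ResolveValueType(new[] { typeof(int), typeof(double) });

Console.WriteLine(sameType);  // System.Int32
Console.WriteLine(mixedType); // System.String
```

Falling back to string keeps the output to a single value column at the cost of losing numeric typing for mixed inputs.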

API signature:

public DataFrame Melt(
    IEnumerable<string> idColumns, 
    IEnumerable<string> valueColumns = null, 
    string variableName = "variable", 
    string valueName = "value", 
    bool dropNulls = false)

Changes included

  • DataFrame.cs: Added Melt() method and supporting helper methods
    • CalculateTotalOutputRows(): Pre-calculates output size for efficient allocation
    • InitializeIdColumns(): Sets up ID columns with correct size
    • CreateValueColumn(): Creates appropriately typed value column
    • FillMeltedData(): Performs the actual unpivoting operation

Example usage

// Transform quarterly sales data from wide to long format
var df = new DataFrame(new[]
{
    new StringDataFrameColumn("Region", new[] { "North", "South" }),
    new Int32DataFrameColumn("Q1", new[] { 1000, 800 }),
    new Int32DataFrameColumn("Q2", new[] { 1200, 900 }),
    new Int32DataFrameColumn("Q3", new[] { 1100, 950 })
});

var melted = df.Melt(
    idColumns: new[] { "Region" },
    valueColumns: new[] { "Q1", "Q2", "Q3" },
    variableName: "Quarter",
    valueName: "Sales"
);

// Result:
// | Region | Quarter | Sales |
// |--------|---------|-------|
// | North  | Q1      | 1000  |
// | North  | Q2      | 1200  |
// | North  | Q3      | 1100  |
// | South  | Q1      | 800   |
// | South  | Q2      | 900   |
// | South  | Q3      | 950   |

Additional notes

This implementation brings the .NET DataFrame API closer to feature parity with Pandas and supports common data transformation workflows needed for analysis and visualization. The method is optimized for performance while maintaining code readability and maintainability.

@codecov

codecov bot commented Feb 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.58%. Comparing base (a564b13) to head (46e6924).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7578      +/-   ##
==========================================
+ Coverage   69.55%   69.58%   +0.03%     
==========================================
  Files        1484     1484              
  Lines      273209   273604     +395     
  Branches    27919    27948      +29     
==========================================
+ Hits       190029   190394     +365     
- Misses      75817    75850      +33     
+ Partials     7363     7360       -3     
| Flag | Coverage Δ |
|------|------------|
| Debug | 69.58% <100.00%> (+0.03%) ⬆️ |
| production | 63.83% <100.00%> (+0.01%) ⬆️ |
| test | 89.63% <100.00%> (+0.03%) ⬆️ |

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| src/Microsoft.Data.Analysis/DataFrame.cs | 93.93% <100.00%> (+2.08%) ⬆️ |
| src/Microsoft.Data.Analysis/Strings.Designer.cs | 51.46% <100.00%> (+7.92%) ⬆️ |
| ...st/Microsoft.Data.Analysis.Tests/DataFrameTests.cs | 99.92% <100.00%> (+0.01%) ⬆️ |

... and 8 files with indirect coverage changes

@sevenzees sevenzees marked this pull request as ready for review February 9, 2026 15:30
@sevenzees
Contributor Author

Not sure who all would want to look at this, but I have another PR here. @tarekgh @ericstj @jeffhandley

@tarekgh
Member

tarekgh commented Feb 9, 2026

CC @rokonec @ManickaP who may help with that.

Contributor

Copilot AI left a comment

Pull request overview

Adds a DataFrame.Melt() API to reshape data from wide to long format (similar to pandas.melt), enabling a common “unpivot” transformation within Microsoft.Data.Analysis.

Changes:

  • Introduces DataFrame.Melt(...) plus helper methods for validation, sizing, column initialization, and filling.
  • Implements optional null/empty filtering (dropNulls) and mixed-type handling (stringifying values when needed).
  • Adds new unit tests covering core melt scenarios and some invalid-input cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| src/Microsoft.Data.Analysis/DataFrame.cs | Adds the Melt() API and its helper methods to produce a long-format DataFrame. |
| test/Microsoft.Data.Analysis.Tests/DataFrameTests.cs | Adds theory data + tests validating melt output and a few invalid-input cases. |

Comment thread src/Microsoft.Data.Analysis/DataFrame.cs
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread test/Microsoft.Data.Analysis.Tests/DataFrameTests.cs
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs
@rokonec
Member

rokonec commented Mar 12, 2026

I was snowed under. I will look at it and will reply within three working days.

Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs
Comment thread test/Microsoft.Data.Analysis.Tests/DataFrameTests.cs
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
@sevenzees
Contributor Author

@rokonec Thank you for your review! I have implemented all your suggestions.

Member

@rokonec rokonec left a comment

Nice work addressing the review comments! Three remaining nits before this is ready to merge.

Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrame.cs Outdated
@sevenzees
Contributor Author

@rokonec Good catches! Thanks again for your reviews! I pushed those changes.

Member

@rokonec rokonec left a comment

Thanks for the great work.

@rokonec rokonec merged commit d25ef12 into dotnet:main Mar 20, 2026
25 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 20, 2026