[SYSTEMDS-3857] set get names on dataframes #2495

Open
t99-i wants to merge 6 commits into apache:main from t99-i:SYSTEMDS-3857-set-get-names-on-dataframes

Conversation

t99-i commented Jun 21, 2026

This PR adds the frame operations getNames and setNames.

Changes:

  • Added getNames support in compiler and CP runtime
  • Added setNames support in compiler and CP runtime
  • Added tests for getNames and setNames
  • Added column name propagation tests for cbind, rbind, and slice
  • Updated DML language reference

Tests:

  • FrameColumnNamesTest
  • FrameColNamesPropagationTest

The propagation tests verify that column names are preserved correctly during frame operations. The covered operations are cbind, rbind, and slice.

The tests use frame dimensions ranging from 10 to 1000 columns.

Note:
I observed unusual column-name propagation behavior for cbind on wider frames around 3000 columns. I left this case out of the regression tests to keep the PR focused on getNames/setNames and the CP runtime scope.
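To make the expected behavior concrete, here is a minimal sketch of the propagation rules the tests check, modeled on plain string arrays. The class and method names are hypothetical and no SystemDS classes are used. The rbind rule (the result keeps the first operand's names) is an assumption for illustration, not something confirmed by this PR.

```java
import java.util.Arrays;

// Hypothetical stand-in for frame metadata: only column names are modeled,
// no SystemDS classes are used.
final class ColNamePropagationSketch {

	// cbind: names of the left operand followed by names of the right operand.
	static String[] cbind(String[] left, String[] right) {
		String[] out = Arrays.copyOf(left, left.length + right.length);
		System.arraycopy(right, 0, out, left.length, right.length);
		return out;
	}

	// rbind: ASSUMPTION - the result keeps the first operand's names.
	static String[] rbind(String[] top, String[] bottom) {
		if(top.length != bottom.length)
			throw new IllegalArgumentException("rbind needs equal column counts");
		return top.clone();
	}

	// slice: keeps the names of columns from..to (0-based, inclusive).
	static String[] slice(String[] names, int from, int to) {
		return Arrays.copyOfRange(names, from, to + 1);
	}

	public static void main(String[] args) {
		String[] a = {"id", "name"};
		String[] b = {"age"};
		System.out.println(Arrays.toString(cbind(a, b)));
		System.out.println(Arrays.toString(rbind(a, new String[] {"x", "y"})));
		System.out.println(Arrays.toString(slice(cbind(a, b), 1, 2)));
	}
}
```

A test for wider frames could generate the name arrays programmatically and compare the result of each operation against these reference functions.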

t99-i added 5 commits June 14, 2026 17:36
- fix dim for SetNames
- implemented tests for SetName and GetName
- add a test for propagation of column names during cbind operations
- test for other operations following
This patch adds the language references for the newly implemented getName and setName function.
The order in Builtins.java was fixed to be alphabetical again
t99-i changed the title from "Systemds 3857 set get names on dataframes" to "[SYSTEMDS-3857] set get names on dataframes" on Jun 21, 2026

janniklinde (Contributor) left a comment

Thanks for the PR @t99-i. I left some comments in the code that should be addressed.

In general, please configure your IDE to use tabs instead of spaces. Also, please remove the unnecessary TODO comments.

As a next step, please implement support for the Spark backend and add the corresponding tests. In general, you should take a more systematic approach to verifying metadata propagation through tests: you currently only cover cbind/rbind/slice on the CP backend. This should be extended to the Spark backend and should systematically test the other frame-related functions as well.

Comment on lines +2765 to +2772
case SET_NAMES:
	currBuiltinOp = new BinaryOp(
		target.getName(),
		target.getDataType(),
		target.getValueType(),
		OpOp2.SET_COLNAMES, expr, expr2
	);
	break;

All of the above cases will now be mapped to a BinaryOp of type SET_COLNAMES. This is incorrect and the reason why the tests fail (see for example: https://github.com/apache/systemds/actions/runs/27905840894/job/82663839671?pr=2495#step:3:3857).
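One way this kind of mapping can come about is switch fall-through: if the new case body is placed directly under a stack of existing case labels, those labels share its body. The sketch below only illustrates that mechanism. The enum constants and return strings are placeholders, and whether this is the actual cause here is an assumption.

```java
// Illustrative only: placeholder builtins, not the real translator code.
final class SwitchFallThroughSketch {

	enum BuiltinSketch { NROW, NCOL, SET_NAMES }

	// Problematic shape: NROW and NCOL are stacked directly above the
	// SET_NAMES body, so all three map to the same operation.
	static String buggy(BuiltinSketch b) {
		switch(b) {
			case NROW:
			case NCOL:
			case SET_NAMES:
				return "SET_COLNAMES";
			default:
				return "UNKNOWN";
		}
	}

	// Intended shape: the earlier labels keep their own body, and the
	// new case gets a separate one.
	static String fixed(BuiltinSketch b) {
		switch(b) {
			case NROW:
			case NCOL:
				return "DIM_OP";
			case SET_NAMES:
				return "SET_COLNAMES";
			default:
				return "UNKNOWN";
		}
	}

	public static void main(String[] args) {
		for(BuiltinSketch b : BuiltinSketch.values())
			System.out.println(b + ": buggy=" + buggy(b) + ", fixed=" + fixed(b));
	}
}
```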

else if( opcode.equalsIgnoreCase(Opcodes.FREPLICATE.toString()))
	return new BinaryOperator(Builtin.getBuiltinFnObject("freplicate"));
else if( opcode.equalsIgnoreCase(Opcodes.SET_COLNAMES.toString()))
	return new BinaryOperator(Builtin.getBuiltinFnObject("set_colnames"));

Does this return a proper function object currently?

Comment on lines +71 to +74
String[] colNames = new String[(int) names.getNumColumns()];
for(int i = 0; i < colNames.length; i++){
	colNames[i] = names.get(0, i).toString();
}

There is no size or data validation here.
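A minimal sketch of the kind of checks meant, using a plain `Object[][]` as a hypothetical stand-in for the names frame (rows by columns). The class name, method name, and error messages are illustrative assumptions, not existing SystemDS code.

```java
// Hypothetical helper: validates the names input before it is applied.
final class ColumnNameValidationSketch {

	// names: stand-in for the names frame, as rows x columns of cells.
	// numTargetCols: column count of the frame whose names are being set.
	static String[] extractNames(Object[][] names, int numTargetCols) {
		if(names == null || names.length != 1 || names[0] == null)
			throw new IllegalArgumentException("names must have exactly one row");
		if(names[0].length != numTargetCols)
			throw new IllegalArgumentException(
				"expected " + numTargetCols + " names but got " + names[0].length);
		String[] colNames = new String[numTargetCols];
		for(int i = 0; i < numTargetCols; i++) {
			Object cell = names[0][i];
			// Without this check, a null cell would fail later in toString().
			if(cell == null || cell.toString().isEmpty())
				throw new IllegalArgumentException("missing name for column " + (i + 1));
			colNames[i] = cell.toString();
		}
		return colNames;
	}

	public static void main(String[] args) {
		Object[][] ok = {{"a", "b", "c"}};
		System.out.println(String.join(",", extractNames(ok, 3)));
	}
}
```

Whether duplicate names should also be rejected is a separate design decision that this sketch leaves open.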

output.setDataType(DataType.FRAME);
output.setDimensions(id.getDim1(), id.getDim2());
output.setBlocksize (id.getBlocksize());
output.setValueType(ValueType.STRING);

setNames does not have STRING as its return type.

Comment on lines +55 to +62
//TODO: Check if new OPcode handling has to be implemented
else if(getOpcode().equals(Opcodes.COLNAMES.toString())) {
	FrameBlock inBlock = ec.getFrameInput(input1.getName());
	FrameBlock retBlock = inBlock.getColumnNamesAsFrame();
	ec.releaseFrameInput(input1.getName());
	ec.setFrameOutput(output.getName(), retBlock);
}

Duplicate code, not needed.

Comment on lines +65 to +66
addTestConfiguration(TEST_NAME_GET, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME_SET, new String[] {"B"}));
addTestConfiguration(TEST_NAME_SET, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME_GET, new String[] {"B"}));

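Get and set are swapped: TEST_NAME_GET is registered with a configuration built from TEST_NAME_SET, and vice versa.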

names.set(0, i, columnNames[i]);
FrameWriter nameWriter = FrameWriterFactory.createFrameWriter(FileFormat.CSV,
new FileFormatPropertiesCSV(false, ",", false));
System.out.println("N path = " + input("N"));

Please avoid these debug prints.

private final static String TEST_NAME_RBIND = "ColNameRbindPropagation";
private final static String TEST_NAME_SLICE = "ColNameSlicePropagation";
private final static String TEST_DIR = "functions/frame/";
private static final String TEST_CLASS_DIR = TEST_DIR + FrameColumnNamesTest.class.getSimpleName() + "/";

Wrong class: TEST_CLASS_DIR is built from FrameColumnNamesTest.

else if ( opcode.equalsIgnoreCase(Opcodes.VALUESWAP.toString()) || opcode.equalsIgnoreCase("mapValueSwap") )
return new BinaryOperator(Builtin.getBuiltinFnObject("valueSwap"));
//TODO: Check what "|| opcode.equalsIgnoreCase("mapValueSwap"))" does
else if (opcode.equalsIgnoreCase(Opcodes.SET_COLNAMES.toString()) || opcode.equalsIgnoreCase("mapValueSwap"))

Should not be mapValueSwap
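For illustration, a small sketch of why the copied alias is misleading: in an if/else-if chain the first matching branch wins, so the alias in the second branch can never match and only obscures what the branch is for. The opcode strings below are placeholders and not necessarily the real values of the `Opcodes` constants.

```java
// Illustrative dispatch on placeholder opcode strings.
final class OpcodeDispatchSketch {

	// Copy-pasted alias: "mapValueSwap" is already consumed by the first branch.
	static String buggy(String opcode) {
		if(opcode.equalsIgnoreCase("valueSwap") || opcode.equalsIgnoreCase("mapValueSwap"))
			return "valueSwap";
		else if(opcode.equalsIgnoreCase("set_colnames") || opcode.equalsIgnoreCase("mapValueSwap"))
			return "set_colnames";
		return null;
	}

	// Cleaned up: each branch only lists the opcodes that belong to it.
	static String fixed(String opcode) {
		if(opcode.equalsIgnoreCase("valueSwap") || opcode.equalsIgnoreCase("mapValueSwap"))
			return "valueSwap";
		else if(opcode.equalsIgnoreCase("set_colnames"))
			return "set_colnames";
		return null;
	}

	public static void main(String[] args) {
		System.out.println(buggy("mapValueSwap"));
		System.out.println(fixed("set_colnames"));
	}
}
```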

github-project-automation (bot) moved this from In Progress to In Review in SystemDS PR Queue on Jun 24, 2026
Labels: None yet
Project status: In Review
Participants: 2