Add LogisticRegressionOperator with API training support and integration test #570

Merged: zkaoudi merged 9 commits into apache:main from xristlamp:main on May 20, 2025

@xristlamp (Contributor)

This pull request introduces support for training logistic regression models in Apache Wayang via the public API.

Summary of changes:

  • Implemented LogisticRegressionOperator (logical operator).

  • Added SparkLogisticRegressionOperator for Spark execution.

  • Defined LogisticRegressionModel interface.

  • Created LogisticRegressionMapping to map the logical operator to Spark.

  • Registered the mapping in Spark's ML plugin.

  • Extended the public API:

    • Added .trainLogisticRegression(...) to DataQuanta and DataQuantaBuilder.
    • Introduced LogisticRegressionDataQuantaBuilder class (consistent with dlTraining).
  • Added an integration test, testLogisticRegressionWithAPI in SparkIntegrationIT.java, that verifies:

    • .loadCollection(...)
    • .trainLogisticRegression(...)
    • .predict(...)
    • .collect()
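
The first bullets above describe a split between a platform-agnostic logical operator, a platform-specific execution operator, and a mapping registered in a plugin that connects the two. As a rough illustration only, that relationship can be sketched with toy types. Every class name below (LogicalLogisticRegression, SparkLikeLogisticRegression, ToyPlugin) is a hypothetical stand-in invented for this sketch; none of it is Wayang's actual class hierarchy or mapping API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MappingSketch {

    /** Platform-agnostic operator: only records what should be trained (hypothetical stand-in). */
    static final class LogicalLogisticRegression {
        final boolean fitIntercept;
        LogicalLogisticRegression(boolean fitIntercept) { this.fitIntercept = fitIntercept; }
    }

    /** Platform-specific operator that would carry out the training (hypothetical stand-in). */
    static final class SparkLikeLogisticRegression {
        final boolean fitIntercept;
        SparkLikeLogisticRegression(boolean fitIntercept) { this.fitIntercept = fitIntercept; }
        String describe() { return "spark-like(fitIntercept=" + fitIntercept + ")"; }
    }

    /** Toy plugin: a registry from logical operator class to an execution-operator factory. */
    static final class ToyPlugin {
        private final Map<Class<?>, Function<Object, Object>> mappings = new HashMap<>();

        /** Registering a mapping makes the logical operator translatable on this "platform". */
        <L> void register(Class<L> logicalType, Function<L, Object> factory) {
            mappings.put(logicalType, op -> factory.apply(logicalType.cast(op)));
        }

        /** Looks up the mapping for the operator's class and builds the execution operator. */
        Object translate(Object logicalOperator) {
            Function<Object, Object> factory = mappings.get(logicalOperator.getClass());
            if (factory == null) {
                throw new IllegalArgumentException(
                        "no mapping for " + logicalOperator.getClass().getSimpleName());
            }
            return factory.apply(logicalOperator);
        }
    }

    static String demo() {
        ToyPlugin plugin = new ToyPlugin();
        // The mapping copies the logical operator's configuration to the execution operator.
        plugin.register(LogicalLogisticRegression.class,
                logical -> new SparkLikeLogisticRegression(logical.fitIntercept));
        Object exec = plugin.translate(new LogicalLogisticRegression(true));
        return ((SparkLikeLogisticRegression) exec).describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that a logical operator without a registered mapping cannot be translated, which is why registering the mapping in the plugin is listed as its own step.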

Testing:

  • Ran mvn clean install; all tests pass.
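
To make the shape of the tested pipeline (load, train, predict, collect) concrete, here is a self-contained stand-in that does not depend on Wayang or Spark. It trains a tiny logistic regression with plain batch gradient descent, which is an assumption chosen for brevity and not how the Spark operator trains. The class and method names, the learning rate, the epoch count, and the data are all made up for this sketch; only the fitIntercept flag comes from the pull request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LogisticRegressionSketch {

    /** Fitted model: one weight per feature plus an intercept (0 if not fitted). */
    static final class Model {
        final double[] weights;
        final double intercept;
        Model(double[] weights, double intercept) {
            this.weights = weights;
            this.intercept = intercept;
        }
        /** Returns the predicted class label, 0.0 or 1.0, using a 0.5 threshold. */
        double predict(double[] x) {
            return sigmoid(score(weights, intercept, x)) >= 0.5 ? 1.0 : 0.0;
        }
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    static double score(double[] w, double b, double[] x) {
        double z = b;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return z;
    }

    /** Batch gradient descent on the mean logistic loss. */
    static Model train(List<double[]> features, List<Double> labels, boolean fitIntercept) {
        int d = features.get(0).length;
        int n = features.size();
        double[] w = new double[d];
        double b = 0.0;
        double learningRate = 0.5;   // hypothetical value, small enough to be stable here
        for (int epoch = 0; epoch < 2000; epoch++) {
            double[] gradW = new double[d];
            double gradB = 0.0;
            for (int k = 0; k < n; k++) {
                double[] x = features.get(k);
                double err = sigmoid(score(w, b, x)) - labels.get(k);
                for (int i = 0; i < d; i++) gradW[i] += err * x[i];
                gradB += err;
            }
            for (int i = 0; i < d; i++) w[i] -= learningRate * gradW[i] / n;
            if (fitIntercept) b -= learningRate * gradB / n;   // intercept stays 0 otherwise
        }
        return new Model(w, b);
    }

    static List<Double> demo() {
        // "loadCollection": in-memory features and labels (one feature per point).
        List<double[]> features = Arrays.asList(
                new double[]{0.0}, new double[]{1.0}, new double[]{3.0}, new double[]{4.0});
        List<Double> labels = Arrays.asList(0.0, 0.0, 1.0, 1.0);

        // "trainLogisticRegression": fit with an intercept.
        Model model = train(features, labels, true);

        // "predict" then "collect": score unseen points and gather the labels.
        List<double[]> unseen = Arrays.asList(new double[]{0.5}, new double[]{3.5});
        return unseen.stream().map(model::predict).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With fitIntercept set to false on this data, the decision boundary would be forced through the origin, which is the practical difference the flag controls.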

val operator = new LogisticRegressionOperator(fitIntercept)
this.connectTo(operator, 0)
labels.connectTo(operator, 1)
new DataQuanta[LogisticRegressionModel](operator)

Review comment from a Contributor on the lines above:

Why create a new DataQuanta and not just output the operator as it's done for the other cases (see dlTrainingJava)?

@zkaoudi merged commit e7426e1 into apache:main on May 20, 2025 (4 checks passed).