Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: retrieve and include extra field in GET /execution-events #1339

Merged

Conversation

sethjones348
Copy link
Contributor

Overview

Adding this feature as per discussion with @wajda on the spark-spline-agent repo. See discussion for more details.

Changes

  • Update the WriteEventInfo model to include the extra field which is already available within the ArangoDB
  • Update the ExecutionEventRepositoryImpl to read the extra field from the execution event table in the DB

@@ -192,5 +193,6 @@ object ExecutionEventRepositoryImpl {
"extra.appId",
"execPlanDetails.dataSourceUri",
"execPlanDetails.dataSourceType",
"extra"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this field is not analysed, we don't search on it.

Copy link
Contributor Author

@sethjones348 sethjones348 Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wajda I added that because I would like to be able to search on the field I am adding in the extraMetaRule. I understand if thats not supported. Alternatively could I add this right after the analyze function in the arango query?

OR CONTAINS(LOWER(ee.extra), @searchTerm)

E.g.,

         |        AND (@searchTerm == null
      ${
        SearchFields.map({
          fld =>
            s"""      OR ANALYZER(LIKE(ee.$fld, CONCAT("%", TOKENS(@searchTerm, "norm_en")[0], "%")), "norm_en")"""
        }).mkString("\n")
      }
         |             OR CONTAINS(LOWER(ee.extra), @searchTerm)
         |        )

Let me know. Otherwise I will just remove the field and make do without the search capability!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the field extra is nothing but a blackbox payload for Spline. It's an extension point in the data model that enables you to cary some information through Spline, while Spline itself has no idea what it is and how to handle it. That's why we specifically don't want to look into that field - we can barely carry it over, but not to look inside it.
If you want a custom property that you could search on, then I would suggest you to use labels. It basically just other place to store a custom data in the Spline model, but while extra is a wild card, the labels is a set of key-value pairs which you can filter on (not search but filter).
Look at the example here - https://github.com/AbsaOSS/spline-spark-agent/blob/e558e2a92a72b00c3e45c0dd5dda274be837262f/examples/src/main/resources/spline.yaml#L38
And then on the UI / Consumer API you can do this - https://github.com/AbsaOSS/spline/blob/develop/consumer-rest-core/src/main/scala/za/co/absa/spline/consumer/rest/controller/LabelsController.scala and also this - https://github.com/AbsaOSS/spline/blob/develop/consumer-rest-core/src/main/scala/za/co/absa/spline/consumer/rest/controller/ExecutionEventsController.scala#L64.

Basically, the API allows you to list available label names, as well as the distinct values for the given label, and then you can filter your events by combination of label key:value pairs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. Yes looks like the use of labels could fill the gap that we were trying to fill with the extra field. I think I missed this because we are on the latest release which doesn't appear to support either the labels controller or the labels field in the request/response of the GET /execution-events.

With that in mind, I'll pare this PR down to simply adding the extra field into the arango query and into the response object. However, when I look at WriteEventInfo, I am not seeing the labels field here. Is this something I should add as part of this PR as well similar to how I have done for extra?

case class WriteEventInfo
(
  @ApiModelProperty(value = "Id of the execution event")
  executionEventId: WriteEventInfo.Id,
  @ApiModelProperty(value = "Id of the execution plan")
  executionPlanId: ExecutionPlanInfo.Id,
  @ApiModelProperty(value = "Name of the framework that triggered this execution event")
  frameworkName: String,
  @ApiModelProperty(value = "Name of the application/job")
  applicationName: String,
  @ApiModelProperty(value = "Id of the application/job")
  applicationId: String,
  @ApiModelProperty(value = "When the execution was triggered")
  timestamp: WriteEventInfo.Timestamp,
  @ApiModelProperty(value = "Duration of execution in nanoseconds (for successful executions)")
  durationNs: Option[WriteEventInfo.DurationNs],
  @ApiModelProperty(value = "Error (for failed executions)")
  error: Option[WriteEventInfo.Error],
  @ApiModelProperty(value = "Output data source name")
  dataSourceName: String,
  @ApiModelProperty(value = "Output data source URI")
  dataSourceUri: String,
  @ApiModelProperty(value = "Output data source (or data) type")
  dataSourceType: String,
  @ApiModelProperty(value = "Write mode - (true=Append; false=Override)")
  append: WriteEventInfo.Append,
  @ApiModelProperty(value = "Other extra info")
  extra: Map[String, Any]
) {
  def this() = this(null, null, null, null, null, null, null, null, null, null, null, null, null)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I don't mind, please go ahead. But please test it thoroughly on your side first.

Copy link
Contributor Author

@sethjones348 sethjones348 Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see the labels filtering happening in the query:

      ${
        lblNames.zipWithIndex.map({
          case (lblName, i) =>
            s"""
               AND (
                   @lblValues[$i] ANY == ee.labels['${escapeJavaScript(lblName)}']
                   OR @lblValues[$i] ANY == ee.execPlanDetails.labels['${escapeJavaScript(lblName)}']
               )
             """.stripMargin
        }).mkString("\n")
      }

However, it would be nice to see those coming back in the API response as well. Is that something you'd be comfortable with me adding @wajda ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wajda I took a stab at wiring up the labels field. I won't be able to test this right away, so I think I'll just move forward with my forked spline repo with these changes. If everything is working as I get to testing I will let you know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wajda The latest commit has the working code for including labels and extra in the response for GET /execution-events. I was able to get this tested. This is the new responses below and their associated userExtraMeta rule.

Let me know if there was anything else you wanted to see here.

Labels

userExtraMeta rule:

{
  "executionEvent": {
    "labels": {
      "sparkContextId": {
        "$js": "session.conf().get('spark.databricks.sparkContextId')"
      }
    }
  }
}

Observed Response of GET /execution-events

{
    "items": [
        {
            "executionEventId": "6d454497-338b-529e-9c9c-7363abad856b:lxtce2er",
            "executionPlanId": "6d454497-338b-529e-9c9c-7363abad856b",
            "frameworkName": "spark 3.3.2",
            "applicationName": "Databricks Shell",
            "applicationId": "...",
            "timestamp": 1719255593907,
            "durationNs": 4.4308673125E10,
            "error": null,
            "dataSourceName": "newcountries",
            "dataSourceUri": "...",
            "dataSourceType": "delta",
            "append": true,
            "extra": {
                "appId": "...",
                "user": "root",
                "readMetrics": {},
                "writeMetrics": {}
            },
            "labels": {
                "sparkContextId": [
                    "8900849439166745669"
                ]
            }
        }
    ],
    "totalCount": 9,
    "pageNum": 1,
    "pageSize": 10,
    "totalDateRange": []
}

Extra

userExtraMeta rule:

{
  "executionEvent": {
    "extra": {
      "sparkContextId": {
        "$js": "session.conf().get('spark.databricks.sparkContextId')"
      }
    }
  }
}

Observed Response of GET /execution-events

{
    "items": [
        {
            "executionEventId": "3da8af99-80b9-5622-847f-209d6f9b10a0:lxtduewe",
            "executionPlanId": "3da8af99-80b9-5622-847f-209d6f9b10a0",
            "frameworkName": "spark 3.3.2",
            "applicationName": "Databricks Shell",
            "applicationId": "...",
            "timestamp": 1719258036206,
            "durationNs": 2.2877156976E10,
            "error": null,
            "dataSourceName": "...",
            "dataSourceUri": ...,
            "dataSourceType": "delta",
            "append": true,
            "extra": {
                "readMetrics": {},
                "sparkContextId": "6677040947717205628",
                "appId": "...",
                "user": "root",
                "writeMetrics": {}
            },
            "labels": {}
        }
    ],
    "totalCount": 9,
    "pageNum": 1,
    "pageSize": 10,
    "totalDateRange": []
}

@sethjones348 sethjones348 force-pushed the feature/add-extra-to-execution-events branch 2 times, most recently from 8be707d to 311465e Compare June 20, 2024 15:57
@sethjones348 sethjones348 force-pushed the feature/add-extra-to-execution-events branch from 311465e to e57d07b Compare June 20, 2024 15:59
@sethjones348 sethjones348 requested a review from wajda June 20, 2024 16:12
@sethjones348 sethjones348 force-pushed the feature/add-extra-to-execution-events branch 2 times, most recently from 9471179 to 61495db Compare June 20, 2024 16:13
Seth Jones and others added 2 commits June 20, 2024 14:43
…execution-events

Add fields extra and labels to response of GET /execution-events
@sethjones348 sethjones348 force-pushed the feature/add-extra-to-execution-events branch from 61495db to 85db72e Compare June 21, 2024 18:50
@sethjones348 sethjones348 force-pushed the feature/add-extra-to-execution-events branch from 85db72e to ae88544 Compare June 21, 2024 18:54
wajda
wajda previously approved these changes Jun 25, 2024
Copy link
Contributor

@wajda wajda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@wajda wajda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a minor change in typing: ArrayList -> List

@wajda wajda merged commit de82e68 into AbsaOSS:develop Jun 25, 2024
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants