Cannot send lineage data #522

Closed
galizzio opened this issue Dec 12, 2019 · 5 comments
@galizzio

I have a very simple test that creates and materializes one DataFrame using the following SQL query: select 123 as id

I get the following error:

15:25:49 [ScalaTest-run-running-LineageTest] WARN  org.apache.spark.sql.util.ExecutionListenerManager Error executing query execution listener
java.lang.RuntimeException: Cannot send lineage data to http://localhost:8080/producer/execution-plans
	at za.co.absa.spline.harvester.dispatcher.HttpLineageDispatcher.sendJson(HttpLineageDispatcher.scala:57)

The REST gateway returns:

{
    "error": "JSON parse error: ; nested exception is com.twitter.finatra.json.internal.caseclass.exceptions.CaseClassMappingException: \nErrors:\t\tcom.twitter.finatra.json.internal.caseclass.exceptions.CaseClassValidationException: operations.other.childIds: field is required\n\n"
}

The execution plan sent to the REST gateway by the Spline agent is indeed missing the childIds field for the operations.other entry with id=3:

{
    "id": "070e41dd-7d70-422e-a8f7-305ab4fcdd92",
    "operations": {
        "write": {
            "outputSource": "file:/c:/tmp/instrument",
            "append": false,
            "id": 0,
            "childIds": [
                1
            ],
            "params": {
                "path": "c:\\tmp\\instrument"
            },
            "extra": {
                "name": "InsertIntoHadoopFsRelationCommand",
                "destinationType": "Parquet"
            }
        },
        "other": [
            {
                "id": 3,
                "extra": {
                    "name": "OneRowRelation"
                }
            },
            {
                "id": 2,
                "childIds": [
                    3
                ],
                "schema": [
                    "ae29dfe3-39fe-4271-8482-f6826b2c00b5"
                ],
                "params": {
                    "projectList": [
                        {
                            "_typeHint": "expr.Alias",
                            "alias": "id",
                            "child": {
                                "_typeHint": "expr.Literal",
                                "value": 123,
                                "dataTypeId": "129f2969-214f-43dd-8f13-ebf285c6cb5f"
                            }
                        }
                    ]
                },
                "extra": {
                    "name": "Project"
                }
            },
            {
                "id": 1,
                "childIds": [
                    2
                ],
                "schema": [
                    "1e8bf5b0-1863-4674-b802-f6c238a5cf90"
                ],
                "params": {
                    "name": "`instrument`"
                },
                "extra": {
                    "name": "SubqueryAlias"
                }
            }
        ]
    },
    "systemInfo": {
        "name": "spark",
        "version": "2.4.1"
    },
    "agentInfo": {
        "name": "spline",
        "version": "0.4.0"
    },
    "extraInfo": {
        "appName": "runner_test",
        "dataTypes": [
            {
                "_typeHint": "dt.Simple",
                "id": "129f2969-214f-43dd-8f13-ebf285c6cb5f",
                "name": "integer",
                "nullable": false
            }
        ],
        "attributes": [
            {
                "id": "ae29dfe3-39fe-4271-8482-f6826b2c00b5",
                "name": "id",
                "dataTypeId": "129f2969-214f-43dd-8f13-ebf285c6cb5f"
            },
            {
                "id": "1e8bf5b0-1863-4674-b802-f6c238a5cf90",
                "name": "id",
                "dataTypeId": "129f2969-214f-43dd-8f13-ebf285c6cb5f"
            }
        ]
    }
}

Adding an empty childIds array to that operation and sending the same JSON with an API testing tool works fine.
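That manual workaround can be sketched as a small script (illustrative Python, not part of Spline; the helper name is mine) that fills in the missing field in an execution plan before re-posting it:

```python
import json

def add_missing_child_ids(plan: dict) -> dict:
    """Add an empty childIds array to any 'other' operation that lacks one.

    Mirrors the manual fix described above: the 0.4.0 agent omits childIds
    for leaf operations such as OneRowRelation, while the REST gateway
    treats the field as required.
    """
    for op in plan.get("operations", {}).get("other", []):
        op.setdefault("childIds", [])
    return plan

# Trimmed version of the execution plan from this report.
plan = json.loads("""
{
  "operations": {
    "other": [
      {"id": 3, "extra": {"name": "OneRowRelation"}},
      {"id": 2, "childIds": [3]}
    ]
  }
}
""")

patched = add_missing_child_ids(plan)
print(patched["operations"]["other"][0]["childIds"])  # []
```

Existing childIds arrays are left untouched; only operations missing the field get an empty one, which is exactly what made the gateway accept the payload.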

@wajda wajda added the bug label Dec 12, 2019
@wajda wajda added this to the 0.4.1 milestone Dec 12, 2019
@wajda wajda self-assigned this Dec 12, 2019

shubhluck commented Dec 13, 2019

@wajda I am getting a similar error while running the Spark Scala test "Example1 Job" on branch develop or 0.4-rc1. Here are the steps I performed:

  1. Pulled and ran ArangoDB from the Docker repo.
  2. Initialised the Spline DB successfully from a Maven-built admin jar (the bundled admin jar was throwing the error below):
    .net.MalformedURLException: arrangodb://localhost/spline (of class java.lang.String)
  3. Started the Spline server with the Docker command stated in the 0.4 documentation (also, a suggestion: change the IP used for connecting to ArangoDB based on the command below):
    docker inspect --format '{{ .NetworkSettings.IPAddress }}' <image_name>
  4. Started the Spline UI with the Docker command mentioned in the documentation (hit the error below while opening the client URL in a browser):
    missing argument exception spline.server.rest_endpoint
    Fixed it by adding this environment variable to the docker command:
    -e spline.server.rest_endpoint=http://localhost:8080
  5. Ran a sample Spark example, providing spline.producer.url as a system property before initialising Spline tracking. Initialisation succeeded, but after saving the DataFrame it threw the error below:
19/12/13 18:01:21 WARN ExecutionListenerManager: Error executing query execution listener
java.lang.RuntimeException: Cannot send lineage data to http://localhost:8080/producer/execution-plans
	at za.co.absa.spline.harvester.dispatcher.HttpLineageDispatcher.sendJson(HttpLineageDispatcher.scala:57)
	at za.co.absa.spline.harvester.dispatcher.HttpLineageDispatcher.send(HttpLineageDispatcher.scala:40)
	at za.co.absa.spline.harvester.QueryExecutionEventHandler$$anonfun$onSuccess$1.apply(QueryExecutionEventHandler.scala:45)
	at za.co.absa.spline.harvester.QueryExecutionEventHandler$$anonfun$onSuccess$1.apply(QueryExecutionEventHandler.scala:43)
	at scala.Option.foreach(Option.scala:257)
	at za.co.absa.spline.harvester.QueryExecutionEventHandler.onSuccess(QueryExecutionEventHandler.scala:43)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$onSuccess$1.apply(SplineQueryExecutionListener.scala:37)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$onSuccess$1.apply(SplineQueryExecutionListener.scala:37)
	at scala.Option.foreach(Option.scala:257)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:37)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:114)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:113)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:135)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:133)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
	at org.apache.spark.sql.util.ExecutionListenerManager.org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling(QueryExecutionListener.scala:133)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply$mcV$sp(QueryExecutionListener.scala:113)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:113)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:113)
	at org.apache.spark.sql.util.ExecutionListenerManager.readLock(QueryExecutionListener.scala:146)
	at org.apache.spark.sql.util.ExecutionListenerManager.onSuccess(QueryExecutionListener.scala:112)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:611)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
	at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:508)
	at za.co.absa.spline.example.batch.Example1Job$.delayedEndpoint$za$co$absa$spline$example$batch$Example1Job$1(Example1Job.scala:49)
	at za.co.absa.spline.example.batch.Example1Job$delayedInit$body.apply(Example1Job.scala:23)

The Spline server log also showed the error below:
12:31:21.511 [http-nio-8080-exec-1] WARN o.s.web.servlet.PageNotFound - No mapping for POST /producer/execution-plans

Please help me out if I am missing a step while building the services, or let me know if there is a workaround for this.


wajda commented Dec 13, 2019

@shubhluck, from your logs it looks like you are using an old snapshot Spline version. The config parameter spline.server.rest_endpoint doesn't exist in Spline 0.4.0. Please make sure you are running the correct version.

@shubhluck

Thanks a lot @wajda, I was running the docker run command directly without pulling the latest tag for the server and UI. Please mention in the docs to pull the latest images and then start the services.
Thanks again 👍


wajda commented Dec 16, 2019

@galizzio, the issue should be fixed in 0.4.1. Can you please confirm it?


wajda commented Dec 16, 2019

Our tests passed. Closing the issue. Please open another one if the error persists.
