Skip to content

Comments

[SPARK-38484][PYTHON] Move usage logging instrumentation util functions from pandas module to pyspark.util module#35790

Closed
heyihong wants to merge 5 commits intoapache:masterfrom
heyihong:SPARK-38484
Closed

[SPARK-38484][PYTHON] Move usage logging instrumentation util functions from pandas module to pyspark.util module#35790
heyihong wants to merge 5 commits intoapache:masterfrom
heyihong:SPARK-38484

Conversation

@heyihong
Copy link
Contributor

@heyihong heyihong commented Mar 9, 2022

What changes were proposed in this pull request?

Move usage logging instrumentation util functions from pandas module to pyspark.util module

Why are the changes needed?

It will be helpful to attach the usage logger to other modules (e.g. sql) besides Pandas but other modules should not depend on Pandas modules to use the instrumentation utils (e.g. _wrap_function, _wrap_property ...).

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Existing unit tests
  • Manual test by running ./bin/pyspark and verified the output:
>>> sc.setLogLevel("info")
>>> import pyspark.pandas as ps
22/03/15 17:17:16 INFO Log4jUsageLogger: pandasOnSparkImported=1.0, tags=List(), blob=
22/03/15 17:18:21 INFO Log4jUsageLogger: initialConfigLogging=1.0, tags=List(sparkApplicationId=local-1647360920525, sparkExecutionId=null, sparkJobGroupId=null), blob={"spark.sql.warehouse.dir":"file:/Users/yihong.he/spark/spark-warehouse","spark.executor.extraJavaOptions":"-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED","spark.driver.host":"10.120.131.148","spark.serializer.objectStreamReset":"100","spark.driver.port":"61338","spark.rdd.compress":"True","spark.app.name":"PySparkShell","spark.submit.pyFiles":"","spark.ui.showConsoleProgress":"true","spark.app.startTime":"1647360919721","spark.executor.id":"driver","spark.driver.extraJavaOptions":"-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED","spark.submit.deployMode":"client","spark.master":"local[*]","spark.sql.catalogImplementation":"hive","spark.app.id":"local-1647360920525"}
22/03/15 17:18:25 INFO Log4jUsageLogger: pandasOnSparkFunctionCalled=1.0, tags=List(pandasOnSparkFunction=__init__(self, data=None, index=None, columns=None, dtype=None, copy=False), className=DataFrame, status=success), blob={"duration": 3781.477633999998}
>>> psdf.columns
22/03/15 17:18:49 INFO Log4jUsageLogger: pandasOnSparkFunctionCalled=1.0, tags=List(pandasOnSparkProperty=columns, className=DataFrame, status=success), blob={"duration": 0.24742499999774736}
Index(['a', 'b'], dtype='object')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be _local?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@heyihong heyihong requested a review from hvanhovell March 9, 2022 21:05
@HyukjinKwon
Copy link
Member

cc @ueshin FYI

@HyukjinKwon HyukjinKwon changed the title [SPARK-38353][PYTHON] Move usage logging instrumentation util functions from pandas module to pyspark.util module [SPARK-38484][PYTHON] Move usage logging instrumentation util functions from pandas module to pyspark.util module Mar 10, 2022
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon
Copy link
Member

@heyihong https://github.com/heyihong/spark/runs/5488300464?check_suite_focus=true got fixed in 54abb85. Would you mind rebasing your PR to fix the build up?

@heyihong
Copy link
Contributor Author

@heyihong https://github.com/heyihong/spark/runs/5488300464?check_suite_focus=true got fixed in 54abb85. Would you mind rebasing your PR to fix the build up?

Sure, done

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heyihong just to double check, it's just moving the codes around right? LGTM

@HyukjinKwon
Copy link
Member

cc @ueshin mind taking a look too when you find some time?

for name, prop in inspect.getmembers(missing, lambda o: isinstance(o, property)):
setattr(missing, name, _wrap_missing_property(original.__name__, name, prop, logger))


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should have another file for these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, I moved the code to instrumentation_utils.py



def _attach(
logger_module: Union[str, ModuleType], modules: Any, classes: Any, missings: Any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to avoid using Anys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Changed to stronger types

@heyihong
Copy link
Contributor Author

@heyihong just to double check, it's just moving the codes around right? LGTM

Yes, no logic changes

@heyihong heyihong requested a review from ueshin March 15, 2022 14:45
Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ueshin
Copy link
Member

ueshin commented Mar 15, 2022

Thanks! merging to master.

@ueshin ueshin closed this in 21db916 Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants