str.replace is not supported via built-in functions #14

Closed
Cheonsoon opened this issue Jan 14, 2017 · 4 comments

Comments

@Cheonsoon

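I want to use str.replace for regex replacement, but the compiled SQL contains pyodps_udf_xxxx. Tracing the source, I found that strings.Replace is indeed not implemented in the compiler.
So far I can see that strings.Contains is implemented with a built-in function, regex included, so I tried to implement strings.Replace along the same lines. However, after adding an elif branch and setting a breakpoint, the code is never executed (visit_string_op is never entered). I stepped through the code but still can't tell where the problem is. Could you give me some pointers? Thanks~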

@qinxuye
Contributor

qinxuye commented Jan 16, 2017

Before a DataFrame is compiled, it goes through an analyze step. The code is at: https://github.com/aliyun/aliyun-odps-python-sdk/blob/master/odps/df/backends/odpssql/analyzer.py#L607

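Python's regular expressions and MaxCompute's are not the same. To stay compatible with Python regex, the analyzer rewrites the expression into a user-defined function that calls Python's regex engine to do the replacement. As a result, the compiler never actually sees strings.Replace (it has already been rewritten away).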
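To make the rewrite concrete, here is a minimal sketch of the kind of per-value function such a UDF could wrap. This is not the actual PyODPS analyzer code: the helper name `make_replace_udf` and its parameters are hypothetical, and the only assumption carried over from the thread is that the replacement is done with Python's own `re` module.

```python
import re


def make_replace_udf(pattern, repl, n=-1, flags=0):
    """Hypothetical helper: build a per-value replace function backed by Python's re."""
    compiled = re.compile(pattern, flags)
    # A negative n means "replace all"; re.sub expresses that as count=0.
    count = max(n, 0)

    def replace(value):
        # Pass NULLs through untouched, as a SQL function typically would.
        if value is None:
            return None
        return compiled.sub(repl, value, count=count)

    return replace


replace_digits = make_replace_udf(r"\d+", "#")
print(replace_digits("a1b22c333"))  # a#b#c#
```

Because the pattern is evaluated by Python rather than by the SQL engine, the result matches what the same pattern would produce locally, which is the compatibility the rewrite is after.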

Is the error you're getting now that Python UDFs are not supported?

@qinxuye
Contributor

qinxuye commented Jan 16, 2017

contains is actually problematic here: it can run into the same regex inconsistency, and it should be handled the same way as replace.

For now, the only way to call a MaxCompute built-in function on a column is:

df.field.map('regexp_replace', '**your pattern**', '**your replace string**')

@Cheonsoon
Author

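Oh, OK, got it. I'll dig into the analyzer code~
Sigh, yes, it's probably because I'm using the ODPS public service, where Python UDFs aren't available. map seems to need a UDF as well, so we might as well just rewrite replace ourselves 😂😂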

@qinxuye
Contributor

qinxuye commented Jan 25, 2017

This shouldn't be a bug. Closing for now.

@qinxuye qinxuye closed this as completed Jan 25, 2017