New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python client v3 (UASTv2) #128

Open
wants to merge 53 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@dennwc
Copy link
Member

dennwc commented Oct 4, 2018

Work-in-progress implementation of Python client v3 for the new libuast.

TODO:

  • Proper memory management
  • Fix install scripts
  • Update gRPC client to use v1 and v2 protocols
  • Test that the static linking works in Windows and macos and include static files in the libuast package.
  • Tests
  • Fix some TO-DOs in the code about potential memory problems.
  • Update README.md
  • Enable failing test testFilterToken once the SDK problem has been fixed.
  • Add Node compatibility layer.

Signed-off-by: Denys Smirnov denys@sourced.tech

use the new libuast; rewrite bindings using cpp layer of libuast
Signed-off-by: Denys Smirnov <denys@sourced.tech>

@dennwc dennwc self-assigned this Oct 4, 2018

@dennwc dennwc requested a review from juanjux Oct 4, 2018

@dennwc dennwc changed the title Client v3 (UASTv2) Python client v3 (UASTv2) Oct 4, 2018

dennwc added some commits Oct 11, 2018

working bindings prototype
Signed-off-by: Denys Smirnov <denys@sourced.tech>
fix string memory management
Signed-off-by: Denys Smirnov <denys@sourced.tech>

@bzz bzz referenced this pull request Oct 17, 2018

Merged

V3 actual consistent naming #1

2 of 2 tasks complete

bzz and others added some commits Oct 16, 2018

refactoring: NodeExtType->PyNodeExtType for consistency
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
refactoring: NodeExt->PyNodeExt for consistency
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
refactoring: PyUastType->PyContextType for consistency
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
refactoring: PyUast->PyContext for consistency
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
refactoring: fix comments + fmt after rebase
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
fix replace
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Build fixes, comment out v1 things, some other adjustements
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
Recover grpc sdk v1 protocol for some grpc objects
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
@juanjux

This comment has been minimized.

Copy link
Member

juanjux commented Oct 22, 2018

Install script should be fixed.

juanjux added some commits Oct 22, 2018

Forward port the aliases refactor by Vadim
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
Forward port travis changes
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
@bzz

This comment has been minimized.

Copy link
Contributor

bzz commented Oct 28, 2018

Was trying to test bblfshd/client-python.v3 locally (have libuast.3-rc2 globally installed)

virtualenv -p python3 .venv3
source .venv3/bin/activate
pip3 install -r requirements.txt
 edit setup.py // GET_LIBUAST = False
python3 setup.py install --getdeps --log
python3 setup.py install //build seems to finish fine

but then

python3 -c "import bblfsh"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "~/src-d/bblfsh/client-python/bblfsh/__init__.py", line 1, in <module>
    from bblfsh.client import BblfshClient
  File "~/src-d/bblfsh/client-python/bblfsh/client.py", line 6, in <module>
    from bblfsh.aliases import ParseRequest, DriverStub
  File "~/src-d/bblfsh/client-python/bblfsh/aliases.py", line 15, in <module>
    Node = uast_module.Node
AttributeError: module 'bblfsh.gopkg.in.bblfsh.sdk.v2.uast.generated_pb2' has no attribute 'Node'

@juanjo I belive it has to do with 24fd7b6 as bblfsh.gopkg.in.bblfsh.sdk.v2.uast.generated_pb2 indeed does not have Node, only v1 has.

When I manually roll it back to aliases.py it seems to work.

dennwc and others added some commits Oct 30, 2018

fix pip install
Signed-off-by: Denys Smirnov <denys@sourced.tech>
update the client to use both protocols
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Remove unused and broken import
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
Compile the ext module from an static libuast object
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
@juanjux

This comment has been minimized.

Copy link
Member

juanjux commented Oct 31, 2018

@dennwc PTAL at the last commit. Should work on Linux (works on my machine and my Docker image) but as we talked on Slack it'll require the libuast package to include static libraries and it's untested on Windows and Darwin. It also have GET_LIBUAST to False to ease testing this, should be changed before merging.

dennwc added some commits Nov 1, 2018

enable building the client with static libuast
Signed-off-by: Denys Smirnov <denys@sourced.tech>
do not free the query string in filter, it seems to be borrowed
Signed-off-by: Denys Smirnov <denys@sourced.tech>
improve the native Python wrappers and update the readme
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Show resolved Hide resolved README.md

juanjux added some commits Dec 12, 2018

Use range for grpcio and grpciotools
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
Fixed some of @bzz feedback from review
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
@juanjux

This comment has been minimized.

Copy link
Member

juanjux commented Dec 12, 2018

@bzz fixed some of the feedback, thanks! I'll leave some for @dennwc since that commented stuff seems to be probably related with the memory debugging that he did.

@bzz

This comment has been minimized.

Copy link
Contributor

bzz commented Dec 12, 2018

@juanjux awesome, thank you! A bit confused - is a2752b7 intentionally included in this PR?

Two more questions:

@mcuadros already approved of breaking changes in the API on a leads meeting

Is there a chance we could link in relevant minutes where it has happened?
Could not find any with quick search of breaking.

So if everything else works here and @vmarkovtsev is happy with Python API,

Did anybody hear back from Machine Learning team on these API changes ? Would appreciate if anyone can point me to the right place with the discussion.

A while ago has asked to be included in the conversation but did not get any feedback so far src-d/minutes#372 (comment) - sorry if I have missed it.

@juanjux

This comment has been minimized.

Copy link
Member

juanjux commented Dec 12, 2018

No idea what meeting was, one of the latest I was in, surely @mcuadros remembers it, I proposed to discuss if we had to support backward breaking changes and creating a compatibility layer and he said it wouldn't be needed since it was a major version increase. Maybe it wasn't recorded.

We asked for ML feedback on the leads meeting and quoting several times here, but we had to go forward. Anyway the Python layer I added over the native one while not backwards compatible should be pretty easy to use.

add error checks for iterators and clarify comments
Signed-off-by: Denys Smirnov <denys@sourced.tech>
@juanjux

This comment has been minimized.

Copy link
Member

juanjux commented Dec 13, 2018

@bzz everything in your review should be solved.

@bzz

This comment has been minimized.

Copy link
Contributor

bzz commented Dec 17, 2018

@vmarkovtsev would you be so kind to please review the API changes? Thanks!

@zurk
Copy link

zurk left a comment

My minor comments and questions. Main part looks good 👍 .

Show resolved Hide resolved .travis.yml
Show resolved Hide resolved bblfsh/aliases.py Outdated
Show resolved Hide resolved bblfsh/client.py
Show resolved Hide resolved bblfsh/client.py Outdated
Show resolved Hide resolved bblfsh/client.py Outdated
Show resolved Hide resolved bblfsh/fixtures/test.py
Show resolved Hide resolved bblfsh/result_context.py
Show resolved Hide resolved bblfsh/tree_order.py Outdated
Show resolved Hide resolved setup.py

juanjux added some commits Dec 18, 2018

Fixed from @zurk review (thanks!)
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
Show resolved Hide resolved README.md Outdated
# are alterative typed versions:
x = next(ctx.filter("boolean(//*[@strtOffset or @endOffset])").get_bool()
y = next(ctx.filter("name(//*[1])")).get_str()
z = next(ctx.filter("count(//*)").get_int() # or get_float()

This comment has been minimized.

@vmarkovtsev

vmarkovtsev Dec 18, 2018

Contributor

This is not very Pythonic. IDK if it is hard to guess the type, but the following is more Pythonic:

x = next(ctx.filter("boolean(//*[@strtOffset or @endOffset])").iterate())
for name in ctx.filter("name(//*[1])").iterate():
    print(name)

Where iterate() would be an unordered iterator over the resulting values. Or name it results(). Or remove it and iterate directly, anyway.

This comment has been minimized.

@juanjux

juanjux Dec 18, 2018

Member

Those are typed functions for queries returning boolean/string/integer/float values instead of nodes so the "typed get" is needed when the user doesn't have to do isinstance of the results to check that they're right every time (but if he wants, he can use the normal get()).

for i in newiter: ...
# You can also get the non semantic UAST or native AST:
ctx = client.parse("file.py", mode=bblfsh.ModeDict["NATIVE"])

This comment has been minimized.

@vmarkovtsev

vmarkovtsev Dec 18, 2018

Contributor

We should use an enum class or define separate constants instead of raw strings.

This comment has been minimized.

@juanjux

juanjux Dec 18, 2018

Member

It's mapping the protocol_v2_module.Mode.DESCRIPTOR.values_by_name field. Maybe we could generate variable names from these strings using eval or exec but that's almost worse.

Show resolved Hide resolved bblfsh/client.py Outdated
Show resolved Hide resolved bblfsh/client.py
Show resolved Hide resolved bblfsh/client.py Outdated
Show resolved Hide resolved bblfsh/client.py Outdated
@vmarkovtsev

This comment has been minimized.

Copy link
Contributor

vmarkovtsev commented Dec 18, 2018

So given that we no longer have a Node, we have to rewrite literally everything on our side, because Node is used everywhere in ML code, in nearly half of the files. Alternatively, we will write our own emulation wrapper because we heavily rely on that abstraction anyway and don't want to spend months on debugging very very tricky stuff bound to nodes :)

@vmarkovtsev

This comment has been minimized.

Copy link
Contributor

vmarkovtsev commented Dec 18, 2018

Daaaamn, we have to rewrite bloody months of work in the style-analyzer's feature extraction without this emulation. Very painful.

@juanjux

This comment has been minimized.

Copy link
Member

juanjux commented Dec 18, 2018

@vmarkovtsev That was my point of view, when I disliked the breaking changes. It doesn't make sense for you to write a Node compatibility layer because if there is any it should be here if it's needed (from bblfsh.past import Node or something like that).

juanjux added some commits Dec 18, 2018

Fixes and improvements from @vmarkovtsev review
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
PEP8 fix
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
Show resolved Hide resolved bblfsh/aliases.py Outdated

juanjux added some commits Dec 18, 2018

Changed ModeDict to a Modes enum-like class
Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
Allow to create Clients with an instanced grpc channel as suggested b…
…y Vadim

Signed-off-by: Juanjo Alvarez <juanjo@sourced.tech>
@dennwc

This comment has been minimized.

Copy link
Member Author

dennwc commented Jan 3, 2019

Re "typed get" issue mentioned above, I agree with @vmarkovtsev - I don't see the reason for typed get helpers. Python is a dynamic language, so it should be okay to iterate over nodes directly (that already are strings/ints/whatever). Old client had those helpers because the way libxml was wrapped. It's not necessary right now.

@juanjux

This comment has been minimized.

Copy link
Member

juanjux commented Jan 14, 2019

Python is dynamically typed but not weakly typed so it's not considered good practice on Python to have variant return types from the same function other than None. Also, having typed functions make easier to use mypy and avoid doing "if isinstance(...)" on every value, which would also be slower (or risking having buggy code using types without checking them). Plus is basically free to have them on the API.

@dennwc

This comment has been minimized.

Copy link
Member Author

dennwc commented Jan 14, 2019

@juanjux OK, thanks for clarifying it. I trust your decision here.

@bzz

bzz approved these changes Jan 24, 2019

@bzz bzz referenced this pull request Jan 24, 2019

Closed

failed to install on macos #140

@juanjux

This comment has been minimized.

Copy link
Member

juanjux commented Jan 24, 2019

Update: I started to work on a compatiblity layer with v2 before the holidays, but it's stopped to work on the very prioritary C# driver, I'll continue with it ASAP once the driver is finished so we can finally publish this version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment