Skip to content
This repository has been archived by the owner on Dec 10, 2018. It is now read-only.

thriftpy requires a file path to the '.thrift' file, rather than supporting a file-like object or string #153

Closed
amontalenti opened this issue Aug 23, 2015 · 6 comments

Comments

@amontalenti
Copy link

If you look at the core code in thriftpy's parser and entrypoint function, thriftpy.load (which calls thriftpy.parser.parse), you'll notice that it requires a string file path (expected to be a ".thrift" file) rather than supporting a file-like object.

Although this is OK for convenience, this is bad for a number of deployment scenarios.

a) if a thriftpy-dependent codebase is deployed as part of a Python egg, then it is likely the '.thrift' file will be part of the code egg; in this case, it should be accessed via the pkg_resources API, and in this case, the ideal functions to use are resource_string or resource_stream. As the pkg_resources docs say, "Instead of manipulating __file__, you simply pass a module name or package name to resource_string, resource_stream, or resource_filename, along with the name of the resource. Normally, you should try to use resource_string or resource_stream, unless you are interfacing with code you don't control (especially C code) that absolutely must have a filename. The reason is that if you ask for a filename, and your package is packed into a zipfile, then the resource must be extracted to a temporary directory, which is a more costly operation than just returning a string or file-like object."

b) if a thriftpy-dependent codebase is deployed as part of a Python wheel, then it is likely that the '.thrift' file will be part of the wheel. In this case, the pkgutil.get_data API is the right one to use. Again, we run into a problem with requiring file paths. As the docs state, "The function returns a binary string that is the contents of the specified resource." No file paths available. Note that the Wheel PEP also indicates that Wheels may optionally have a .data directory for storing files, but it isn't clear how to access them via an API.

Thankfully, the code that needs to change here is relatively straightforward. The problematic code is in the parser here, where open() is called directly.

https://github.com/eleme/thriftpy/blob/develop/thriftpy/parser/parser.py#L436-L451

If we let thriftpy.load() accept a file-like object by inspecting for .read() via hasattr, and, in that case, allowing us to simply call it instead of trying to resolve a '.thrift' file on-disk, we will satisfy the requirements. If you agree with this change, I'll go ahead and draft a PR for it. cc @lxyu

@hit9
Copy link
Contributor

hit9 commented Aug 24, 2015

One reason why I use a string but not a file object is the include statement: include "child-thrift".

Another one is to keep the same with apache thrift usage.

@lxyu
Copy link
Contributor

lxyu commented Aug 24, 2015

I think we may support both, it would be nice to be able to load file-like object, even though apache thrift don't have it.

As @hit9 mentioned, the main concern is the include keyword in the thrift file, which needs a path. May be we can give a warning in this case?

@hit9
Copy link
Contributor

hit9 commented Aug 24, 2015

Hi @amontalenti , we are ready to add support for file-like object to parser, only for non-include-keyword thrifts.

@amontalenti
Copy link
Author

@lxyu @hit9 Cool, that makes sense. I see your point about the include keyword, but I do think many .thrift files don't make use of it.

Do you think I should simply raise an exception if I detect an include in the .thrift file when loading using the file-like object method? If so, can you give me a pointer as to the part of the code to hook in this validation? If you're short on time, don't worry, I can probably figure it out.

@hit9
Copy link
Contributor

hit9 commented Aug 24, 2015

@amontalenti

Yeah, just raise an exception.

I suggest to create a new function, for instance, named parse_fp (similar to parse, but no file reading, nor path handling, nor __thrift_file__ attribute etc.. ). You can detect if we are loading using the object or string path by hasattr(thrift_stack[-1], '__thrift_file__'), the location for include handling is at function p_include.

@amontalenti
Copy link
Author

Looks like pull request #154 is addressing this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants