Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path to image in url #16

Merged
merged 7 commits into from
Aug 20, 2014
Merged

path to image in url #16

merged 7 commits into from
Aug 20, 2014

Conversation

maryokhin
Copy link
Contributor

There is a setting --implicit_base_url that allows the following url format:

http://localhost:8888/?url=image.jpg&w=300&h=300&mode=crop

Which is great, but what I want is the following:

http://localhost:8888/image.jpg?w=300&h=300&mode=crop

Is it possible to add an optional flag to pass the image path in the url? Would you accept a pull request and how would you see it done?

P.S. I understand that I can rewrite the urls internally with something like nginx, but more of interested of why it was chosen to set the image path only as a url param.

@agschwender
Copy link
Owner

I've generally tried to take the approach of just do one thing and that's it to avoid all the complexity that inherently goes along with having a bunch of different ways to call/use an application. And from my perspective, passing options/data to an application naturally lends itself to passing via query parameters.

I do recognize though that callers of pilbox might have very different requirements for how they would need to call it. But just as the application doesn't have caching or storage, which are also needed, I did not build in any url rewriting. I did not do this for a couple of reasons: I think the job can be done in the web server (e.g. nginx or apache) and I doubted I would be able to cover everything that everyone would need.

So I typically recommend, as you mentioned, doing the url rewriting in nginx (or apache, or whatever) to pass the appropriate parameters to pilbox when pilbox's url format is inappropriate for your needs. That said, you don't need to run pilbox directly as an application; you can use it as a library and extend from PilboxApplication to override get_handlers and ImageHandler to override the get method. In which case, you pretty much get what you're asking for: pilbox understanding your particular url format. If you need me to refactor the internals of ImageHandler.get to make it easier to reuse, just let me know and I'd be happy to do it.

Anyway, that's the short of my thinking: do as a little as possible to avoid complexity and promote clarity and extension.

@maryokhin
Copy link
Contributor Author

What I am thinking of, to make the app more extensible (and a bit more decoupled):

  1. Remove get_handlers and create a urls.py with url_patterns (something like tornado-boilerplate project structure), and add a setting urls to compliment the config that you already have, that will allow to point to urls.py. That way the urls can be changed without having to touch PilboxApplication.
  2. Move settings from app.py to settings.py for readability and have config point to it by default.
  3. Split ImageHandler to ImageHandler and BaseImageHandler, where BaseImageHandler will have all the logic, and ImageHandler will extend it and process all the url params for the logic of BaseImageHandler to work. That will make Pilbox work out of the box as it did and allow clients to write their own handler wrappers for url handling logic.
  4. Allow clients to extend BaseImageHandler and point to their urls.py file to get custom url behavior.
  5. Improve docs on how to use Pilbox as a stand-alone app or a library.

@agschwender
Copy link
Owner

I'd have to see how it turned out, but gut reaction would be not super thrilled about introducing two more files and an extra class. Also, and I'm not sure your changes imply this, but I definitely would not want to require that the application run with a config file to work. I think there's probably happy medium where we can avoid the additional assets and still achieve the flexibility you need.

@maryokhin
Copy link
Contributor Author

The application would use it's internal settings.py/urls.pyfile by default, the settings are just to let the client switch them out if needed. It can be just a settings.py and a url_patterns dict in the settings.py to not introduce another config file/flag and since there's an option to set path with config flag anyway, or even the same without a separate settings.py (to me it just seems to make sense to seperate that stuff), it doesn't really matter.
In any case, the url handling has to be separated out of the ImageHandler, it's too hardcoded I think. Plus having clients extend PilboxApplication just to change the urls I think is bad practice, because any changes you make in the future, not even related to urls, to PilboxApplication will be propagated to all the guys already extending it.

@agschwender
Copy link
Owner

I'm just not entirely sure what that would look like. I could very well be misunderstanding what you're describing, but if I'm importing settings.py in app.py, how would you be able to override the url patterns without either forking the project to use your own settings.py or by having url_patterns as a variable defined in the config. If the former, I'd prefer not require that people fork the project in order to customize it and if the latter, not sure why we would need settings.py unless you believe the settings have become sufficiently complex that they warrant their own file? If you do believe that, then I think that should be discussed as a separate issue because it's not necessarily related to the handling of url patterns.

Right now, ImageHandler is performing 3 tasks: parameter validation, image retrieval and image processing. I agree that should definitely be broken up so that any of those can be called as independently. Once that's done though, I don't think it's a huge lift to:

  • Create an application that extends PilboxApplication and overrides get_handlers to use your own url patterns/handlers
  • Extend ImageHandler to perform your own url/parameter validation and call what would be the new image retrieval and image processing methods.

That said, if you want to implement what you're suggesting and send me a pull request, I'm definitely willing to be proven wrong.

@maryokhin
Copy link
Contributor Author

by having url_patterns as a variable defined in the config

yep, this is what I meant, otherwise the app will use the default url_patterns in app.py if it is not defined in the config

I think it is easy to avoid extending PilboxApplication and therefore should be avoided, because:

  • Changes to PilboxApplication would not cause changes to everyone who extends it and would allow for more freedom of modification without having to think if it would break something or not
  • Urls are kind of a configuration step, not a runtime setting, and belong more with the settings
  • Instructions on how to use the app as a library would be more concise: set the urls in the config and implement all the handlers that you defined there by extending ImageHandler; no need to think about extending another thing, and the flow feels more natural to me as a dev from Django land :).

But that is all a matter of taste I guess.

I can start with making a pull request for granulating the get method of ImageHandler, since that is enough to get the flexibility needed anyway.

@agschwender
Copy link
Owner

Yeah, I agree the url_patterns seem more of a configuration, however because they reference handlers which must exist in the code in order for the url_patterns to function, it makes it all seem like code to me. That's what worries me about defining it as a configuration, but I guess we'll see how it turns out.

I'm not too worried about locking PilboxApplication into a particular interface. Aside from adding settings, I don't think I've changed it since the start and I don't really anticipate having to change it in the future beyond adding more settings. It's really ImageHandler that contains the bulk of the pilbox application logic.

ImageHandler though probably needs to be given some thought on its public interface to ensure that the internals can be changed in the future without impacting existing callers. I'm interested in seeing how you decide to break it up.

@maryokhin
Copy link
Contributor Author

I have only 3 days worth experience with Tornado, so tips would be appreciated.
I was thinking of making get() really small, and move all the logic to a method that would accept a dictionary with everything mixed together (url captures and arguments) and would therefore be agnostic to the way these values are obtained.
Why I am thinking like that is because in the docs it says that url captures work by overloading the method. So the default behavior of Pilbox will call the get(self) method, but i.e. a client's (r"/image/([0-9]+)", CustomImageHandler) would call get(self, image_id). Therefore I would say it makes sense for them to gather all the info in their overloaded methods and make a call to a method with a uniform public interface.

Btw, is there a reason for overriding?

def get_argument(self, name, default=None):
        return super(ImageHandler, self).get_argument(name, default)

@agschwender
Copy link
Owner

Yeah, you'd want to make the internals of get as small as possible, where it just deals with translating args into some generalized format and calling the public methods that you want to make available in the ImageHandler. The application already translates most of the options into a generalized format when they're passed to the image library for processing, so you may want to use that as a guide.

Are you asking me why I overrode get_argument? If so, I overrode it because by default get_argument will throw an exception if the argument is not defined in the request and a default is not provided. Basically, it's just a convenience to not have to specify get_argument("param", None) on every call.

@agschwender
Copy link
Owner

Also, when you push your changes it will get sent to travis-ci, where it will get run against pep8, pyflakes and coverage. You can run all those things locally if you are running via the provided vagrant setup. I'd recommend running them yourself as relying on travis-ci to do it will be kind of a slow, tedious check and fix cycle.

@maryokhin
Copy link
Contributor Author

I wanted you to look at a first try and hear your comments on the commit.
Tried to make the least possible amount of changes.
It passes the old tests, but having issues in implementing new tests.

@agschwender
Copy link
Owner

OK, will run it and look at it more closely tonight, but here's some quick comments. First, take a look at this fork. Basically, he's doing as a fork what you'd want to make possible without forking. So with that in mind:

  • I think you want to break up process_image to expose 3 methods which do: validate input, fetch image, render image. In this way, the caller can do validation (or not as in the fork above). Those you would call in the get method.
  • I don't know what the prepare and get_argument modifications buy you, since the class that extends this could easily do the same. In the common instance (where this isn't extended), you now have an extra, unnecessary dict copy call. Though I do think it'd be worthwhile to point that implementation out in the documentation.

Re: testing. Yeah, I don't think I made it super clear on how to write new tests. The case logic is pretty clunky. If there's something you'd want me to add or look at, let me know and I can help with that.

@maryokhin
Copy link
Contributor Author

Yes, I am actually thinking about doing the same thing, as 90% of the features will not be used in our app it seems like an overhead, plus I have concerns with when I expose the API to other clients, a malicious client might go about jamming the cache with random images. That is why I am considering leaving just resizing and having only 10 image resize steps clients can use.

Breaking up even more makes sense. prepare() is convenience to allow the client to use yield process_image() without having to copy the arguments, which would probably need to be done most of the time anyway, but is not important. The get_argument() modification is more important, because we have a dict of key : [list] from the arguments, and by default Tornado fetches the last arg from the list and decodes it (if it is encoded), therefore it was chosen to use Tornado's understood mechanism instead of implementing a crappy version of my own. Another thing is that the common args dict does not have to be passed around as a param to each method, which would clutter them all, and plus kind of leaves things looking as it was.

@agschwender
Copy link
Owner

Yeah, I like that you're using get_argument to get the args that would be available. I just don't think it needs to be in pilbox, I think that's something the caller could easily do. And I think documentation would make it very clear that that's how you're supposed to do it.

I think the overhead of including those additional operations is minor, but exposing operations you don't want is actually an issue. I think the code can be modified rather simply to make it so the extending class can override the OPERATIONS property to be a subset. Or maybe that's even worth making a configuration setting. Or maybe a combination of both. I'll want to think on that one.

I agree with you about a malicious caller of the service, that's why I provided allowed_hosts and signature functionality. But you can further lock this down by having very specific url patterns or rewrite rules that either hardcode dimensions or translates size names to dimensions, e.g.

/i/small/foo.jpg --> ?op=resize&mode=crop&w=40&h=40&url=http%3A%2F%2Fexample.org%2Fimages%2Ffoo.jpg

Don't know if you're doing it, but I'd encourage you to extend from your fork and see if you can do the things you want to do with it. Because that will be the real test to know if you're making the right changes. And when we review it, we can just determine if the changes to pilbox are too specific to your app and go from there.

@maryokhin
Copy link
Contributor Author

I would say allowed_operations is probably worth having as a setting, as it is sometimes as important as allowed_hosts.

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.43%) when pulling f07175f on NextHub:master into d59adfe on agschwender:master.

@@ -19,3 +19,6 @@ build
/bin
/include
/lib

# PyCharm
.idea
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like editor specific ignores should be in the developer's global git ignore file.

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.43%) when pulling 4d8261f on NextHub:master into d59adfe on agschwender:master.

@agschwender agschwender merged commit 4d8261f into agschwender:master Aug 20, 2014
@agschwender
Copy link
Owner

This is all merged in, thanks for contributing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants