-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
**Blocking SVCS-699** [SVCS-678] Refactor and optimize handlers/extensions #333
**Blocking SVCS-699** [SVCS-678] Refactor and optimize handlers/extensions #333
Conversation
4a700e2
to
2de5396
Compare
self.output_wb_path = await self.local_cache_provider.validate_path( | ||
'/export/{}'.format(self.output_file_id) | ||
) | ||
self.output_file_path = self.output_wb_path.full_path | ||
self.exporter_metrics.merge({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to know more about what these metrics are trying to cover so I make sure this does'nt break data we've already collected.
mfr/core/extension.py
Outdated
# Execute the extension's export method asynchronously and wait for it | ||
# to finish | ||
loop = asyncio.get_event_loop() | ||
await loop.run_in_executor(None, self.export) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the export object handle its own async issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to renderer, are there any side effects for let renderer/exporter handle asyncio
? Is it possible to move this back to the handler but still keep your code move. I feel more comfortable for the handlers to take care of the asynchronous business all in one place stead of delegating it to the exporter instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would probably be ok.
mfr/core/extension.py
Outdated
return await self.write_to_stream() | ||
|
||
def __del__(self): | ||
self.output_fp.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I need a better solution for - this only works because the exporter goes out of scope when the request finishes. It works and will be reliable, but I'd like something a little more pretty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mentioned this somewhere above, is this really necessary? Why context manager doesn't work in your case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think context manager is a better solution
mfr/core/extension.py
Outdated
self.exporter_metrics = MetricsRecord('exporter') | ||
if self._get_module_name(): | ||
self.metrics = self.exporter_metrics.new_subrecord(self._get_module_name()) | ||
|
||
async def __call__(self): | ||
|
||
self.source_wb_path = await self.local_cache_provider.validate_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe I'll move this path/file/stream setup stuff to its own method
mfr/core/extension.py
Outdated
@@ -72,6 +138,27 @@ def __init__(self, metadata, file_path, url, assets_url, export_url): | |||
except AttributeError: | |||
pass | |||
|
|||
async def __call__(self): | |||
|
|||
self.renderer_metrics.add('class', self._get_module_name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again I need some input on where metrics need to go.
mfr/core/extension.py
Outdated
self.metrics.add('source_file.upload.required', False) | ||
|
||
loop = asyncio.get_event_loop() | ||
rendition = await loop.run_in_executor(None, self.render) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the renderer handle its own business
mfr/core/utils.py
Outdated
@@ -63,6 +63,51 @@ def make_exporter(name, source_file_path, output_file_path, format): | |||
} | |||
) | |||
|
|||
def bind_render(metadata, file_stream, url, assets_url, export_url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: need to remove the old make_renderer/make_exporter utils in favor of the ones that treat export and render as functions.
self._cache_provider = None | ||
|
||
@property | ||
def cache_provider(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the cache provider lazy so its only instatiated if used.
mfr/server/handlers/export.py
Outdated
Next, if caching is enabled, try to use a cached version. | ||
|
||
Finally, do the actual conversion and export the converted file. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus the logic in the handler so its clearer what this endpoint does.
mfr/server/handlers/export.py
Outdated
) | ||
|
||
self.output_file_id = '{}.{}'.format(self.source_file_path.name, self.format) | ||
self.output_file_path = await self.local_cache_provider.validate_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to make sure all unnecessary path stiff is removed. Only remote cache and osf provider should remain.
mfr/server/handlers/export.py
Outdated
await self.write_stream(self.cache_provider.download(self.cache_file_path)) | ||
logger.info('Cached file found; Sending downstream [{}]'.format(self.cache_file_path)) | ||
self.metrics.add('cache_file.result', 'hit') | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean up the cache try.
test.py
Outdated
raise "ERR" | ||
|
||
def test(): | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops need to remove.
mfr/server/handlers/render.py
Outdated
# Spin off upload into non-blocking operation | ||
if renderer.cache_result and settings.CACHE_ENABLED: | ||
# Spin off upload of cached render into non-blocking operation | ||
if render.cache_result and settings.CACHE_ENABLED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't line up with what the exporter does to cache. Should decide on a similar approach and use in both handlers.
mfr/core/extension.py
Outdated
|
||
class BaseExporter(metaclass=abc.ABCMeta): | ||
|
||
def __init__(self, ext, source_file_path, output_file_path, format): | ||
def __init__(self, metadata, input_stream, format): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass the stream in to the exporter so the subcalsses' export method can choose to use the stream rather than writing to disk if it is feasable and appropriate.
This probably breaks a bunch of tests. I want to make sure we like the implementation before spending time fixing tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR 🎆 and for helping me go through the code. Please refer to the comments for details. Here I'd like to raise/provide two questions/suggestions.
One Big Ticket VS Two Separate Ones
Personally, I really love your change that moves code from the handler to the exporter. However, it is not necessary for the fix. I would recommend having only the the fix with one PR (this ticket) and having the refactor in another PR (another ticket). Here is my arguments:
- It is easy for us to get the quick fix in develop and probably to prod for next release.
- For the code as it stands, the main logic is good. However, it still needs efforts for verification, optimization, rewriting tests, local regression test and documentation. To me, this is more of an improvement than bug fix.
All About Metrics
We probably should have a discussion with the team on how the metrics work and if there is a general rules to follow (I bet there is a documentation somewhere).
mfr/core/extension.py
Outdated
from mfr.server import settings | ||
|
||
|
||
class Cacheable(metaclass=abc.ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there is a good reason, there is no need for an extra layer. Remove this class and put cache_result
back.
mfr/server/handlers/export.py
Outdated
@@ -37,61 +37,49 @@ class ExportHandler(core.BaseHandler): | |||
cache_file_path_str = '/export/{}.{}'.format(self.cache_file_id, self.exporter_name) | |||
else: | |||
cache_file_path_str = '/export/{}'.format(self.cache_file_id) | |||
self.cache_file_path = await self.cache_provider.validate_path(cache_file_path_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One side effect of this PR is that . cache_file_id
and cache_file_path_str
are calculated both in the handler and the exporter. I think we should move this to the get()
when cache is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not be in the exporter - the exporter doesn't do caching it should only be in the handler.
mfr/server/handlers/export.py
Outdated
|
||
async def get(self): | ||
"""Export a file to the format specified via the associated extension library""" | ||
"""Export a file to the format specified via the associated extension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think line length is 100 characters for MFR. Please confirm and update the DocStr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm this seems odd.
Finally, do the actual conversion and export the converted file. | ||
""" | ||
|
||
self._set_headers() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 ._set_headers()
should be called in the first place and in one place.
mfr/server/handlers/export.py
Outdated
# File is already in the requested format | ||
if self.metadata.ext == ".{}".format(self.format): | ||
await self.write_stream(self.provider.download()) | ||
logger.info('Exported with no conversion.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.debug()
is preferred with more detail (e.g. what's the extension and what's the URL)
mfr/core/utils.py
Outdated
} | ||
) | ||
|
||
def bind_convert(metadata, file_stream, format): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bind_exporter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Exporter constructor is basically binding the arguments to a function. kind of like functools.partial?:
>>> def hello(msg):
... print(msg)
...
>>> bound_fn = bind(hello, "Hello")
>>> bound_fn()
Hello
The export objects is a function bound to the context passes to the class constructor
mfr/core/extension.py
Outdated
@abc.abstractmethod | ||
def export(self): | ||
pass | ||
|
||
async def write_to_stream(self): | ||
self.output_fp = open(self.output_file_path, 'rb') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using the context manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to keep the fp alive even after the fn returns. the context manager would need to be inside the handler, and this class will need to define __enter__
and __leave__
with render() as rendered:
self.write_stream(rendered)
mfr/server/handlers/render.py
Outdated
# Spin off upload into non-blocking operation | ||
if renderer.cache_result and settings.CACHE_ENABLED: | ||
# Spin off upload of cached render into non-blocking operation | ||
if render.cache_result and settings.CACHE_ENABLED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using the name renderer? I mean renderer = utils.bind_render()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the flow is: render -> future cache -> write. Now it is render -> write -> future cache? But the rendition
is not longer available here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
render - the action of rendering, renderer - a thing that renders, render is a callable that performs the action of rendering. Can't use renderer.render
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this order matters. I placed the ensure_future after to help visualize that this is something that temporally happens after writing the response. I do need a tweak here to store the stream in a local variable for the response and the cache.
mfr/core/extension.py
Outdated
# Execute the extension's export method asynchronously and wait for it | ||
# to finish | ||
loop = asyncio.get_event_loop() | ||
await loop.run_in_executor(None, self.export) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to renderer, are there any side effects for let renderer/exporter handle asyncio
? Is it possible to move this back to the handler but still keep your code move. I feel more comfortable for the handlers to take care of the asynchronous business all in one place stead of delegating it to the exporter instance.
mfr/core/extension.py
Outdated
return await self.write_to_stream() | ||
|
||
def __del__(self): | ||
self.output_fp.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mentioned this somewhere above, is this really necessary? Why context manager doesn't work in your case?
- Move responsibility for setting up local filesystem provider into the exporter - Modifies exporters to take a file stream as an input - Make exporter instances a callable that returns a converted file stream - Makes remote cache provider lazily instantiated - Modify handlers to only prepare whats explicitly neccesary to hanle the request.
f56eabe
to
cb52cd2
Compare
Changes the jsc3d exporter to run async without needing to create another thread
- Plugin name determinations combined - Cache path construction moved to core handler
The unoconv renderer is obsolete because handling of the conversion if the file cannot be rendererd using the given rederer occurs in the render handler. This allows any filetype to be rendered by any renderer, provided there is a converter that is capable of converting the filetype to a filtype that the given renderer can render.
- Two utile, make_renderer and make_exporter are replaced by new utils to construct their respective plugins - Make sure the get_plugin_name uses the group to resolve the correct name for the plugin.
mfr/core/extension.py
Outdated
self._local_cache_provider = waterbutler.core.utils.make_provider( | ||
'filesystem', {}, {}, settings.LOCAL_CACHE_PROVIDER_SETTINGS | ||
) | ||
return self._local_cache_provider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be try/except
mfr/core/extension.py
Outdated
self._source_file_path = await self.local_cache_provider.validate_path( | ||
'/render/{}'.format(self.source_file_id) | ||
) | ||
return self._source_file_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be try/except
mfr/core/extension.py
Outdated
await self.local_cache_provider.upload( | ||
await self.file_stream, | ||
await self.source_file_path | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be irrelevant
try: | ||
remove(self._source_file_path.full_path) | ||
except FileNotFoundError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be irrelevant
:rtype: :class:`mfr.core.extension.BaseExporter` | ||
""" | ||
normalized_name = (name and name.lower()) or 'none' | ||
def bind_render(metadata, file_stream, url, assets_url, export_url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs docstring
# 'ppt': {'renderer': '.pdf', 'format': 'pdf'}, | ||
# 'pptx': {'renderer': '.pdf', 'format': 'pdf'}, | ||
} | ||
|
||
|
||
class RenderHandler(core.BaseHandler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a docstring
return self._source_stream | ||
|
||
@property | ||
def export_url(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a docstring
map = RENDER_MAP[self.metadata.ext] | ||
except KeyError: | ||
map = DEFAULT_RENDER | ||
self._source_stream = utils.bind_convert( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this a lambda(?) and handle the case in the property so it's not called unless needed
os.remove(self.source_file_path.full_path) | ||
except FileNotFoundError: | ||
pass | ||
def cache_result(self, rendition): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a docstring
setup.py
Outdated
'.fodp = mfr.extensions.pdf:PdfRenderer', | ||
'.fods = mfr.extensions.pdf:PdfRenderer', | ||
'.fodt = mfr.extensions.pdf:PdfRenderer', | ||
# '.gif = mfr.extensions.pdf:UnoconvRenderer', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix all of them
What
Improve architecture of handlers and plugins.
@property
, so we don't waste cpu time calculating values we won't use. (This should make cache hits faster)#Why
A no-op is a simple operation to support, why not do it?
Acceptance criteria
A PR to support
QA notes
Devs will test this. Dev, please add a list of affected and unaffected file types.
Build an MFR export url requesting to export a file of type foo as type foo. This should start a download of the raw file without modification. Exception: image files should respect scaling parameters.