Adding write-time aggregation functions #35

Closed
JohnAD opened this issue Jan 22, 2018 · 10 comments

Comments

@JohnAD
Contributor

JohnAD commented Jan 22, 2018

Typically, on a web site, usage data is written as lightly and as quickly as possible to a log. Then a secondary process (possibly on a secondary server) does statistical analysis on that data. This totally makes sense for storage in a traditional database such as SQL, and certainly for an Apache-style text log.

Philosophically, however, MongoDB (and related NoSQL databases) can take a different approach. They are designed for scalability using a variety of techniques, including an emphasis on read optimization at the expense of write optimization. The lack of normalization, for example, makes it expensive to update (write) certain data, because such updates might have to occur across many documents. In exchange, a read of any one document in a collection need never reference another document, because all the important information is already gathered there.

Sorry to be so windy, but I want to justify my crazy idea. :) And that is this:

Rather than just write a single document to a collection on each page response, also allow aggregate updates on other documents at the same time.

For example, it is common in web log analysis to record the number of visits to each page over certain periods of time. Say hourly, daily, monthly. So, when flask-track-usage records a response, it could also upsert the url/datetime/period documents corresponding to it with incremented totals. In this example, it would update 3 additional documents.
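As a rough sketch of that idea (not code from this project), the bucketing could look something like the following. The bucket formats, the function names, and the in-memory `Counter` standing in for database upserts are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical bucket formats; the real summary layout is not specified here.
PERIOD_FORMATS = {
    "hourly": "%Y-%m-%dT%H",
    "daily": "%Y-%m-%d",
    "monthly": "%Y-%m",
}


def period_keys(url, when):
    """Return one (period, url, bucket) key per summary period."""
    return [
        (period, url, when.strftime(fmt))
        for period, fmt in PERIOD_FORMATS.items()
    ]


def record_visit(totals, url, when):
    """Increment every period bucket for one visit.

    The Counter stands in for a database; a real storage would issue
    one upsert-with-increment per key instead.
    """
    for key in period_keys(url, when):
        totals[key] += 1


totals = Counter()
record_visit(totals, "/index", datetime(2018, 1, 22, 9, 15))
record_visit(totals, "/index", datetime(2018, 1, 22, 17, 40))
```

Each visit touches three summary keys, which matches the "3 additional documents" mentioned above.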

This would be implemented as an option of course. There would be scenarios where such aggregate work would be a bad idea or pointless.

One possibility is to have it done as a post-storage function call. For example:

def myCrazySummationUtility(data):
    # here is where I do all my extra stuff with the dictionary
    # contained in 'data'
    ...

# 'post' is the proposed (not yet existing) parameter
t = TrackUsage(app, PrintStorage(post=myCrazySummationUtility))

Thoughts?

@JohnAD
Contributor Author

JohnAD commented Jan 22, 2018

If it were done simply as a TrackUsage parameter, then other people could write different 'post' libraries. I could write a flask-track-usage-summary library that does the math I'm describing. Making it a separate project would keep this project from becoming overly specific. Something like:

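# flask.ext.* imports were removed in Flask 1.0; import the packages directly
from flask_track_usage import TrackUsage
from flask_track_usage.storage.printer import PrintStorage

# hypothetical separate library; 'post' is the proposed parameter
from flask_track_usage_summary import summarizer

t = TrackUsage(app, PrintStorage(post=summarizer))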

But I don't mind placing such code here either. Your call.

@ashcrow
Owner

ashcrow commented Jan 23, 2018

I like the idea, and I think having summarizer as part of the main codebase makes sense. If we make it part of TrackUsage and bump the release to 2.x, we could do:

t = TrackUsage(app, [PrintStorage(hooks=[summarizer]), AnotherStore()])

We could then support multiple storages for different things and allow them to have their own hooks. The hooks would use their parent storage's methods to do their work. It wouldn't be a hard change to make, but I feel strongly about keeping that out of the 1.x release 😄. WDYT?
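To make that shape concrete, here is a minimal sketch of multiple storages that each run their own hooks. `SketchStorage`, `SketchTracker`, and `increment` are hypothetical names invented for this illustration; they are not the classes or methods of flask-track-usage.

```python
class SketchStorage:
    """Hypothetical storage: keeps each record, then runs its own hooks."""

    def __init__(self, name, hooks=None):
        self.name = name
        self.records = []
        self.totals = {}
        self.hooks = list(hooks or [])

    def store(self, data):
        self.records.append(data)
        # Hooks receive their parent storage so they can call its methods.
        for hook in self.hooks:
            hook(self, data)

    def increment(self, key):
        """Storage-specific way of bumping a counter (a dict here)."""
        self.totals[key] = self.totals.get(key, 0) + 1


class SketchTracker:
    """Hypothetical tracker that fans each record out to every storage."""

    def __init__(self, storages):
        self.storages = list(storages)

    def track(self, data):
        for storage in self.storages:
            storage.store(data)


def summarizer(storage, data):
    # The hook does its work through the parent storage's own method.
    storage.increment(data["url"])


with_summary = SketchStorage("printer", hooks=[summarizer])
plain = SketchStorage("another")
tracker = SketchTracker([with_summary, plain])
tracker.track({"url": "/index"})
tracker.track({"url": "/index"})
```

Only the storage that was given the hook accumulates totals; the other one just stores the raw records.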

@ashcrow
Owner

ashcrow commented Jan 23, 2018

From #34:

I also have a second expansion idea, but I'm not sure it is best placed into this library. I'll post a second issue to discuss that.

The only time I'd say the summarizer code should live outside of this codebase is if it would also be loaded and used in other Flask plugins.

@ashcrow
Owner

ashcrow commented Jan 23, 2018

Multiple storage merged in https://github.com/ashcrow/flask-track-usage/tree/2.0.dev0. Feel free to do work on top of this and I'll merge it in there.

@JohnAD
Contributor Author

JohnAD commented Jan 23, 2018

Sounds like a great plan!

@JohnAD
Contributor Author

JohnAD commented Feb 2, 2018

Prior to writing any code, I've started writing a rough draft of the docs. I do that sometimes to set up an overall feel for how it could work. Here is the first draft (on my fork):

https://github.com/JohnAD/flask-track-usage/blob/2.0.dev0/docs/hooks.rst

Sound like a good overall direction?

I'm thinking that when "outside" hooks are written by an end user, a simple set of standard **kwargs is passed to the function. But when "internal" hooks are referenced (as documented on that hooks.rst page), a reference to the parent storage class is passed to the function so that the function can call the corresponding method in the storage class itself.
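One way that calling convention could be sketched is shown below. The `internal` marker decorator, `run_hooks`, and the hook names are assumptions for illustration only, not the mechanism described in hooks.rst.

```python
def internal(func):
    """Hypothetical marker: flag a hook as internal to the library."""
    func.is_internal = True
    return func


class CountingStorage:
    """Stand-in storage exposing one method an internal hook can call."""

    def __init__(self):
        self.totals = {}

    def increment(self, key):
        self.totals[key] = self.totals.get(key, 0) + 1


@internal
def sum_url(storage, **kwargs):
    # Internal hook: receives the parent storage plus the standard kwargs.
    storage.increment(kwargs["url"])


seen_statuses = []


def log_status(**kwargs):
    # Outside hook: receives only the standard kwargs.
    seen_statuses.append(kwargs["status"])


def run_hooks(storage, hooks, record):
    """Call each hook using the convention that matches its kind."""
    for hook in hooks:
        if getattr(hook, "is_internal", False):
            hook(storage, **record)
        else:
            hook(**record)


storage = CountingStorage()
run_hooks(storage, [sum_url, log_status], {"url": "/index", "status": 200})
```

The outside hook never sees the storage object, so end-user code stays decoupled from any particular storage class.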

@ashcrow
Owner

ashcrow commented Feb 2, 2018

👍 to the direction.

@JohnAD
Contributor Author

JohnAD commented Feb 12, 2018

I'm about 30% done.

It will not be possible, at least at first, to have summaries supported on all storage classes. I'm designing it to handle that gracefully. But for the first release of 2.0, the MongoEngineStorage class will definitely work (due to my own self-interest). Which other storage class would you like to see fully supporting all seven summaries?

@ashcrow
Owner

ashcrow commented Feb 12, 2018

The most used storage classes seem to be MongoStorage and SQLStorage. If I had to pick one, I'd say SQLStorage, as that still gives folks who want to use MongoDB the ability to switch from MongoStorage to MongoEngineStorage.

ashcrow pushed a commit that referenced this issue Apr 18, 2018
Adds time write time aggregate functions and hooks.

Closes #35
@ashcrow
Owner

ashcrow commented Apr 18, 2018

Merged this work into the 2.0.0 branch!

@ashcrow ashcrow closed this as completed Apr 18, 2018
ashcrow pushed a commit that referenced this issue May 21, 2018
Adds time write time aggregate functions and hooks.

Closes #35
ashcrow pushed a commit that referenced this issue May 30, 2018
Adds time write time aggregate functions and hooks.

Closes #35