Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce basic approach to concurrent log writing. #45

Merged
merged 1 commit into from Oct 4, 2018
Merged

Introduce basic approach to concurrent log writing. #45

merged 1 commit into from Oct 4, 2018

Conversation

qnikst
Copy link
Contributor

@qnikst qnikst commented Oct 1, 2018

Introduce two approaches for concurrent log writing
and basic utilities for working with concurrent logs.
Two approaches are needed in order to allow user to
chose between generality and speed, though approaches
were not benchmarked to check the hypothesys.

@qnikst
Copy link
Contributor Author

qnikst commented Oct 1, 2018

I'm not sure about the best approach here, so I came with two, both seem to make sense and can be further extended. It's really hard to get into the trap of overgeneralization here :/

@chshersh chshersh added enhancement New feature or request log-action Everything related to LogAction data type labels Oct 2, 2018
@chshersh chshersh added this to In progress in #2: Hacktoberfest (October, 2018) via automation Oct 2, 2018
Copy link
Contributor

@chshersh chshersh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much, @qnikst! This looks sooo great! I highly appreciate your effort on this issue! I left some comments, but mostly they are questions. I'm totally okay with having different strategies for concurrency. The co-log library is very young, so it's okay to have different ways for doing things and figure out the best one later.

My another concern is about function like fileLogAction:

https://github.com/kowainik/co-log/blob/695a27fe0111918d2e810c9170a2388015595fd9/co-log/src/Colog/Actions.hs#L70-L72

I'm not sure it will work properly if multiple threads try to write concurrently to the same file using the same Handler. Though, I assume we could add one bracket-style combinator on top of the forkBackgroundLogger:

withBackgroundLogger :: Natural -> LogAction IO msg -> (LogAction IO msg -> IO r) -> IO r

@@ -0,0 +1,129 @@
-- |
-- Sometimes an application may need to dump logs asynchonously.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for this module is super cool!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

-- @
--
-- This code will run a background thread and process different
-- loggers in that thread.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add some note like This can be useful if logging a single message requires some computation? Or general guideline when people might want to use concurrent logging. I personally aware only of the cases when people might not want to use concurrent logging due to some subtle concurrency issues. But it's still very useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will add a note and more comments about use-cases and warnings.

-- $worker-thread
-- Worker thread approach allows to create a single worker
-- thread that will perform all logging. This approach allows
-- user to use different loggers and change them on the flight.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and change them on the flight

That's a really cool feature — to be able to replace logging function with different one. Am I right that it's possible to do something like this: implement simple program that prints messages every second in some format A and listens input and after user gives some input to this program it instantly starts to print messages in format B?

Also, it's not that clear from existing documentation and code, how to replace current logging function with different one on the flight...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see a problem, will discuss.
But I think I'll discuss the direct passing of the log action, and leave Colog.Monad approach aside.

-- @
-- bracket ('mkBackgroundThread' 4096)
-- ('killBackgroundThread')
-- $ \thread ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like thread variable in the documentation should be named as worker according to the code below

Copy link
Contributor Author

@qnikst qnikst Oct 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+

-- write logs will be blocked.
mkBackgroundThread :: Int -> IO (BackgroundWorker (IO ()))
mkBackgroundThread capacity = do
queue <- newTBQueueIO (fromIntegral capacity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that newTBQueueIO function takes Natural in stm-2.5.0.0. Maybe it's a good idea to have stm ^>= 2.5.0.0 in build-depends in that case and pass Natural to the mkBackgroundThread function as well? Also, probably it's not enough to have stm >= 2.4 because stm-2.4.5.1 contains very important bugfix for TBQueue:

I even expect from GHC to give you warning about fromIntegral usage after you change type from Int to Natural because we have -Widentities warning enabled for co-log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's up to you. As far as I understand stm-2.5.0.0 will cut many older GHC's, in the case if you want to support 3 major versions then it's not an option. And I find it's easer to keep the same interface for all versions then doing CPP magic to provide the best option for each version.

-- | Create a background worker with a given capacity.
-- If capacity is reached, then the thread that tries to
-- write logs will be blocked.
mkBackgroundThread :: Int -> IO (BackgroundWorker (IO ()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Input message as IO () looks interesting 🤔 I wonder how useful such simple LogAction could be:

idLogAction :: LogAction m (m ())
idLogAction = LogAction id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very good at CT to find what that object will be and derive a use-case for that based on that knowledge. Right now I don't see how it can be used :/

-- Logger is executed in the other thread entirely, so if
-- logger takes any thread related context it will be
-- read from the other thread.
runInBackgroundThread :: BackgroundWorker (IO ()) -> LogAction IO msg -> LogAction IO msg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to have the result type of runInBackgroundThread function to be MonadIO m => LogAction m msg but I wonder how useful it can be. Whole current API is built on top of the plain IO and the only sensible generalisation that I know involves MonadUnliftIO but I'm not sure it's a good idea to bring MonadUnliftIO now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be useful, because that way you may provide useful capabilities there. But I'm uneasy with that change, currently, there are a number of different approaches that allow such generalizations:

  1. MonadUnliftIO and unliftio ecosystem in general
  2. monad-control that is still widely used
  3. MTL-heavy libraries that are using MTL as final tagless system
  4. a family of the effects systems.
  5. direct handle approach

Making a preference to the one system other the others may be a bad thing, as it increases dependencies set and makes people used to other systems uneasy when using your library.

And IO on it's own is very powerful and composable type, in general with any of the approaches mentioned above you can write LogAction IO msg and "unlift" your system to those. So basically you need unlift :: SomeConstraints m => LogAction m msg -> LogAction IO msg that will be called in the parent thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can just add liftLogIO :: MonadIO m => LogAction IO msg -> LogAction m msg to co-log-core to make life of users easier. So they have some way accessible without extra dependencies introduced.

-- takes a 'LogAction' that should describe how to write
-- logs.
--
-- @capability@ - parameter tells how many in flight messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable called capacity in multiple places, so probably it's a good idea to name it capacity here as well instead of capability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

-- @capability@ - parameter tells how many in flight messages
-- are allowed, if that value is reached then user's thread
-- that emits logs will be blocked until any message will be
-- written.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think alternative strategy is to drop all log messages after reaching the limit. But I agree with making current behavior the default one. Probably in production it's more important to lose some performance and hang for a while rather than losing important logs about what is actually going on within the running system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll add some docs and expose more internal so users will be able to define such strategies on their own. I don't like dropping messages, it should be some very special use-case. If you have too many logs so you can't dump them, it means that your environment has some severe problems that should be solved anyway.

queue <- newTBQueueIO (fromIntegral capacity)
tid <- forkIO $ forever $ do
msg <- atomically $ readTBQueue queue
unLogAction logAction msg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, just general question. Would you appreciate some fancy operator for unwrapping LogAction newtype? So instead of

unLogAction logAction msg

you could write something like:

logAction <| msg

Or probably have shorter alias for unLogAction to partially apply logAction in more convenient way?

toLogFun logAction :: msg -> m ()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that may be useful, but in internals, I like to use the full name. So it was not difficult for me to write unLogAction

@chshersh chshersh added the package:co-log For: co-log label Oct 2, 2018
@qnikst
Copy link
Contributor Author

qnikst commented Oct 2, 2018

With any of those approaches withLogByteStringFile is fine as long as you reuse the same worker because logging is happening in the single thread. But, it's possible to craft bad examples, for example, if you use withLogByteStringFile directly and using background thread or two different workers, then you may have a problem. I think I'll have to add more documentation and rationale about that.

Just a bit of discussion while I'm trying to prepare myself for addressing the comments :). In general, we have two types of concurrency possible, I'll call them depth and width concurrency.

  1. depth concurrency, is a way to provide a concurrency between the application thread and logger thread. With this kind of concurrency, the application thread is doing a bit of job (usually CPU bound) and then send prepared messages to the logger, then logger is doing a heavier job of serializing a message and sending that to the sink (usually I/O bound). With this kind of concurrency we increase a throughput of the application and balance our bounds, this if very relevant if you log your data to some external application over the network. This is what my PR was about. There are some tradeoffs on that path:
    a. memory usage (if the worker is very slow and the communication channel is unbound then we can get too many messages on the channel). (This is solved in the PR by providing configurable capacity)
    b. persistence - some actions may care that the logs they emit are actually persisted before they continue. With the concurrent approach, you may lose all the messages that are kept on the channel. This is not solved though can be solved by the user as we provide all required internals. (I'd tell that this use-case is bad and the user should use other techniques in that case, but this use-case exists).
  2. width concurrency - in case if you mappend many actions you may want to run them concurrently, something that parallel strategies may shine at if logActions were pure. This is completely unsolved in my PR and left to the user.

@qnikst
Copy link
Contributor Author

qnikst commented Oct 2, 2018

@chshersh, please take another look, I've reworked my branch, fixed few concurrency issues and improved documentation.

Copy link
Contributor

@chshersh chshersh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example documentation is super cool! I really liked the ASCII schema ❤️ I have some comments to the Haddock, but mostly looks great 👍


-- | A wrapper type that carries capacity. The internal
-- type may be differrent for the different GHC versions.
#if MIN_VERSION_base_noprelude(4,12,0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably here you should specify MIN_VERSION_stm_(2,5,0). Currently Travis CI is failing for cabal-install and GHC 8.4.3.

-- affected or not.
--
-- 1. Unbounded memory usage - if there is no backpressure mechanism
-- the the user threads, they may generate more logs that can be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double the. Probably a typo

--
-- 1. Unbounded memory usage - if there is no backpressure mechanism
-- the the user threads, they may generate more logs that can be
-- written in the same about of time. In those cases messages will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe here you wanted to say in the same amount of time?

-- $ 'Colog.Monad.usingLoggerT' $ __do__
-- 'Colog.Monad.logMsg' "Starting application..."
-- where
-- logger = 'Colog.Action.withLogByteStringFile' "/var/log/myapp/log"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this line wasn't rendered properly with Haddock (because of weird Haddock markup syntax). Probably replacing every / with \/ should help.

screen shot 2018-10-03 at 3 43 58 pm

-- main =
-- 'withBackgroundLogger'
-- 'defCapacity'
-- (Aeson.encode `'cmap'` logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow cmap doesn't render properly with Haddock 🤔

screen shot 2018-10-03 at 3 53 43 pm

I found this blog post with common suggestions for Haddock extremely useful!

But there's no such case for this situation...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\``Core.Action.cmap`\`

seems to work

(killBackgroundLogger)
(action . convertToLogAction)

-- | Default capacity size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe say in the documentation that it's equal to 4096? Because on Hackage you can't see the source code without clicking Source link. This might be useful piece of information. We only need to not forget to update it after the value changes.

-- $background-logger
--
-- Background logger is specialized version of the 'BackgroundWorker' process.
-- Instead of running any jobu it will accept @msg@ type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jobu probably a typo

Introduce two approaches for concurrent log writing
and basic utilities for working with concurrent logs.
Two approaches are needed in order to allow user to
chose between generality and speed, though approaches
were not benchmarked to check the hypothesys.
@qnikst
Copy link
Contributor Author

qnikst commented Oct 3, 2018

I hope I've fixed CI and typos.

Copy link
Contributor

@chshersh chshersh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks super cool!

#2: Hacktoberfest (October, 2018) automation moved this from In progress to Reviewer approved Oct 4, 2018
@chshersh chshersh added the Hacktoberfest https://hacktoberfest.digitalocean.com/ label Oct 4, 2018
@chshersh chshersh merged commit f19aa28 into co-log:master Oct 4, 2018
#2: Hacktoberfest (October, 2018) automation moved this from Reviewer approved to Done: PR's Oct 4, 2018
@chshersh chshersh removed this from Done: PR's in #2: Hacktoberfest (October, 2018) Sep 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Hacktoberfest https://hacktoberfest.digitalocean.com/ log-action Everything related to LogAction data type package:co-log For: co-log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants