
Line delimited JSON for logs #927

Open
bartvm opened this issue Nov 29, 2015 · 31 comments

Comments

@bartvm
Member

bartvm commented Nov 29, 2015

So I've been frustrated with the current state of logging in Blocks for a while. Pickled Python dictionaries have problematic memory requirements and aren't ideal for analysis given all the issues that come with pickling. The SQLite backend is not very user-friendly and doesn't do well with non-tabular logs, and is not straightforward to analyze either if you don't know SQL.

I've been trying to find an alternative that addresses these issues and meets the following requirements:

  • Command-line friendly, i.e. I want to be able to do things like rm !("$(paste <(ls *.log | get_validation_error) <(ls *.model) | sort -g | cut -f 2)") in order to delete all but the best model.
  • Ability to easily stream logs locally and over a network to e.g. do live plotting
  • Possibility to stream logs to disk (preferably relatively space-efficient)
  • Possibility to not keep the log in memory, so that we don't run out of memory for long-running experiments
  • Non-tabular e.g. some values that are logged on irregular intervals, or nested arrays/dictionaries
  • Minimal assumptions about e.g. the existence of batches, epochs, etc. and hence maximal flexibility
  • Widely supported formats and standards

Given these requirements, I want to propose storing logs in newline delimited JSON from now on, and to write a small toolbox to deal with this format. The advantages I imagine are as follows:

  • Line delimited JSON files are very easy to process using some great tools like jq, enabling things like jq -s 'max_by(.validation_error).training_error' < log, and everything is streaming (yay)
  • We could stream the log to disk using GZIP, which should reduce space requirements significantly regardless of the repeated keys.
  • Using e.g. ZeroMQ we can stream the JSON entries to clients easily for plotting (depending on whether memory usage is a problem we can use a memory-less pub-sub pattern or something with a persistent state). Either way, sending JSON objects is supported by pretty much every messaging framework.
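As a rough sketch of what this looks like in practice (standard library only; the filename log.jsonl.gz is just an example), writing and streaming back a gzipped line-delimited log takes only a few lines:

```python
import gzip
import json

# Append log entries as newline-delimited JSON to a gzipped file.
entries = [
    {"iteration": 0, "training_error": 1.2},
    {"iteration": 1, "training_error": 0.9, "validation_error": 1.1},
]
with gzip.open("log.jsonl.gz", "wt") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")

# Stream the entries back one line at a time; memory use stays constant
# regardless of the log's length.
with gzip.open("log.jsonl.gz", "rt") as f:
    read_back = [json.loads(line) for line in f]

assert read_back == entries
```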

My proposal would be to write a mini-framework using this logging approach that is separate from Blocks, and then write Blocks extensions that use it. Basic functionality would be:

  • A logging object that defines a log method (which just takes a Python dict) with several handlers:
    • Append the log entry as a JSON object to a raw file
    • Append the log entry as a JSON object to a Gzipped file
    • Format the JSON object and write to stdout or stderr
    • Publish the JSON object over a TCP socket (non-persistent)
    • Publish the JSON object over a TCP socket (persistent)
  • The ability to pass custom serializers (in case there are custom Python objects, like NumPy arrays, in the log)
  • Lots of documentation and tutorials on how to parse these logs using e.g. gunzip and jq
  • Helper functions to subscribe to the TCP streams and live-plot data using Bokeh/matplotlib/IPython Notebooks
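A minimal sketch of such a logging object (the Logger class and handler signatures here are illustrative placeholders, not a proposed API; the serializer handles a hypothetical array-like object through its tolist method, the way NumPy arrays could be handled):

```python
import gzip
import json
import sys


def default_serializer(obj):
    # Fallback for objects json can't handle, e.g. NumPy arrays.
    if hasattr(obj, "tolist"):  # NumPy arrays expose tolist()
        return obj.tolist()
    raise TypeError("Cannot serialize %r" % type(obj))


class Logger:
    """Dispatch each log entry (a plain dict) to a list of handlers."""

    def __init__(self, handlers, serializer=default_serializer):
        self.handlers = handlers
        self.serializer = serializer

    def log(self, entry):
        line = json.dumps(entry, default=self.serializer)
        for handler in self.handlers:
            handler(line)


# Two example handlers: append to a gzipped file and echo to stderr.
log_file = gzip.open("sketch_log.jsonl.gz", "wt")
logger = Logger([
    lambda line: log_file.write(line + "\n"),
    lambda line: print(line, file=sys.stderr),
])
logger.log({"iteration": 0, "loss": 0.5})
log_file.close()
```

Handlers for the TCP publishing cases would slot in the same way, as callables receiving the serialized line.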

I'd love to know if people think this would meet all their logging needs. Logging is something that everyone ends up having to do, regardless of the framework they use, and I think it would be nice to have a clean and simple solution for it that is widely supported and used in the lab.

The main shortcoming of the format I can think of is that it doesn't allow to store any metadata about the model e.g. the learning rate, momentum velocity, etc. This makes it harder to do things like "plot minimum validation error vs. learning rate". Perhaps we could store this information in a separate JSON file? Or maybe it shouldn't be part of this mini-framework at all?

@rizar
Contributor

rizar commented Dec 5, 2015

Sorry for the late reply!

Before giving any meaningful comments, I would like to ask the following question: do you plan to keep random access possible for this JSON-based log? I do not immediately see how this can be done, whereas the in-memory and SQLite backends that we currently support make it trivial.

@bartvm
Member Author

bartvm commented Dec 5, 2015

Although I put this as a Blocks issue, because I can't think of a better place, this isn't really a drop-in replacement for either of the logging backends, and the proposal is very much supposed to be framework-agnostic. Blocks' use for logging is quite specific (since extensions use it to communicate), whereas here I'm aiming more for a minimalist "standard" that harmonizes the hacky ways people currently save and analyze logs. Needing to unpickle a Python object and then load it into a Pandas dataframe just to do rudimentary analysis is overkill. Some people instead resort to grepping through the printed log statements, which is equally unsatisfactory. Having a standard logging format that is used throughout the lab would allow us to share simple data analysis and plotting scripts, live plotting servers, etc. regardless of the framework people use.

So, the question here is more about what format to store data in to disk and send it over the network so that it can easily be consumed and analyzed. Random access isn't really my concern then. To make this work in the context of Blocks, you'd probably have to combine an in-memory or SQLite backend with an extension that adheres to this specification of saving the logs to disk/sending them over the network.

I understand that it isn't entirely satisfactory from Blocks' perspective, because we now have a backend which is separate from the serialization, but I personally can't think of a solution that can easily support random access, streaming to disk, and pushing over the network, while also producing files that are universally easy to analyze and interpret.

@rizar
Contributor

rizar commented Dec 5, 2015

Okay, I guess we could support access only to the last few iterations in the respective Blocks wrapper.

More questions:

  • how do you plan to store the iteration number? As a JSON attribute?
  • I think it is pretty important to have something similar to the current status attribute to store the configuration of the experiment, such as the learning rate in your example. Working with two files is always quite a bit less handy than with one. So I would rather have this data stored in the same file.

@bartvm
Member Author

bartvm commented Dec 5, 2015

Yes, iteration number would just be a key in the JSON log. That makes the format entirely agnostic to things like iterations or epochs. Tools built on top of this format would just use the default convention that one entry is one datapoint, with the option of using any key in the log as a time measure instead.
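To illustrate that convention (plain Python; series is a hypothetical helper, not part of any proposed API):

```python
import json

log_lines = [
    '{"iteration": 0, "loss": 1.0}',
    '{"loss": 0.8}',                      # no iteration key: irregular entry
    '{"iteration": 10, "loss": 0.5}',
]
entries = [json.loads(line) for line in log_lines]


def series(entries, y_key, x_key=None):
    """One entry = one datapoint; x is any key, or just the entry index."""
    points = []
    for index, entry in enumerate(entries):
        if y_key in entry:
            x = entry.get(x_key, index) if x_key else index
            points.append((x, entry[y_key]))
    return points


print(series(entries, "loss", x_key="iteration"))
```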

Although I understand the desire to store the state, I would not want to store it in the same file.

  • It means that parsing suddenly becomes more complicated e.g. if we store the status as a JSON object, and it contains some of the same keys as in the log entries, parsing the file with jq or something would return strange things. All tools and scripts would have to be aware of this custom "header" in the file.
  • The entire UNIX philosophy goes out the window, because commands like tail, cat, grep etc. would result in non-conforming log files (the status part might be missing)
  • Logs can be assumed to be immutable, status usually isn't

I don't see a problem with having two files e.g. model.log and model.config. Then you can still do things like

for model in *.config
do
  learning_rate=$(jq .learning_rate < "$model")
  loss=$(jq -s 'min_by(.loss).loss' < "${model%.*}.log")
  echo "Learning rate $learning_rate had loss $loss"
done

And do all the processing you want that way.

@rizar
Contributor

rizar commented Dec 6, 2015

All right, let's assume for now that you convinced me that the status
should go into a separate file.

My next question is whether the creation of a new standalone repository
for this functionality is justified. Yes, UNIX philosophy prescribes that a
tool should do one thing and do it well. But there are also pragmatic
considerations. The full software stack of MILA for now includes Blocks,
Theano, Fuel, and in practice also often Blocks-extras. Now you also want
to add Platoon and a logging framework to it. Considering the fact that it
is the cutting edge version that is always used for research, this means
that one will have to occasionally do git pull for 6 (!) repositories to
keep his dependencies up-to-date. And then there is a change in Platoon
which makes it incompatible with the cutting edge Blocks, and the user has
to revert to a known compatible version, and so on and so forth.

That being said, my opinion is that creating a new repository which
provides tools on top of Theano should be well justified. The scale of the
project might be the reason. For instance Fuel is indeed quite a big
project, with potentially huge codebase since it has to contain code for a
lot of datasets. But even considering that, I am really not sure that we
won by moving it out of Blocks. It costs us so much to maintain two
documentations, two testing configurations, review two PRs simultaneously,
etc. And at the end of the day PyLearn2 never switched to it...

So I think it would be very practical to develop these new mini-frameworks
in Blocks repository. We should keep an eye on the dependency graph of
course.


@dwf
Contributor

dwf commented Dec 7, 2015

I don't think pylearn2 not switching to Fuel should be taken as evidence of
the spinoff's failure. I've definitely seen people using Fuel without
Blocks.


@bartvm
Member Author

bartvm commented Dec 7, 2015

I think that consideration shouldn't just be the size of the codebase. I think a bigger consideration is actually how large and changeable the public API is. Take picklable-itertools as an example, which has worked just fine as a separate package because by its very nature it has a completely stable API; function signatures and return values simply won't change, so there's little chance things will break. For this potential logging framework (and Platoon) the API would be very minimal, i.e. on the order of 2-3 methods (preferably as simple as Log.__init__ and Log.log).

In the long term, if we end up switching from Theano to TensorFlow for example, having Fuel separate will be a big win. Similarly, this logging mini-framework would have no dependencies on Theano (or Blocks) whatsoever and could be more future-proof as well as more widely applicable as a separate package.

@rizar
Contributor

rizar commented Dec 8, 2015

Picklable-itertools has an extremely clear purpose. I don't think that a
general logging framework and a general parallelization framework will
converge nearly as quickly.

A large part of Fuel will not be really useful for TensorFlow, since data
processing is part of the computation graph there. Also, if Fuel as it is
now were in the Blocks repository, but did not depend on Theano, we would have
all the pros and none of the cons that we have now.

In short, I am really tired of this DJ experience. Btw, look at Tensorflow:
they have logging, plotting and data preprocessing in the same framework as
the computation graph and stuff. I assume they just found it more
practical.

@bartvm
Member Author

bartvm commented Dec 8, 2015

Within a company it's easier to create a monolithic system, or an intricate web of dependencies, because you have full-time software engineers responsible for resolving dependency issues, and your software is never used as part of a wider software stack that it needs to play nice with.

For open source software I think that the ideal should be different, and separate codebases for separate projects are important, to enforce modularity, avoid coupling, and to simplify use. For example, I personally don't like having to install two dozen dependencies for Blocks if I just want one particular, tiny functionality.

At Google everything is part of a single, supported software stack that each employee is expected to run, so our approaches aren't very comparable. As an alternative comparison, have a look at all the different repositories under the Torch umbrella.

I think the issues you highlight are basically issues with Python's package management, the way Blocks and Fuel are managed at the lab, our lack of a unified testing infrastructure, and GitHub's inability to handle pull requests that span multiple repositories. I'd be much happier trying to find practical solutions to those problems than throwing everything into a single, monolithic codebase, which is likely to rot eventually: interdependencies are easily sneaked in, the list of external dependencies grows with each new functionality, it becomes harder to swap out or replace particular parts of the stack as time goes on, and the number of issues/PRs per day becomes overwhelming because they are not separated by functionality.

If the management of multiple repositories really becomes too much of a burden, I'd go down the route of writing a few helper scripts instead, which allow us to make pull requests to Blocks and Fuel simultaneously and ensure that their test suites are then run against each other. Likewise, we should maybe find a way of not needing to duplicate .travis.yml and everything that comes with it for each repo.

@dmitriy-serdyuk
Contributor

Concerning practical things. I took a big log from our speech experiment and converted it into the JSON Lines format; here is a gist. Surprisingly, JSON takes a bit less space than cPickle, and gzipped JSON is much smaller:

-rw-r--r--  1 dima  staff   109M Dec  9 15:05 main_log.json
-rw-r--r--  1 dima  staff    17M Dec  9 15:03 main_log.json.gz
-rwxr-xr-x  1 dima  staff   122M Dec  9 14:08 main_log.zip

@dmitriy-serdyuk
Contributor

It might be misleading: the log is not actually zipped despite the .zip extension, it's just a cPickle file.

@bartvm
Member Author

bartvm commented Dec 9, 2015

Thanks, the size of the compressed JSON files is very encouraging!

@rizar
Contributor

rizar commented Dec 9, 2015

Indeed! Dima, can you please check whether compression was used for this
particular pickle.

Concerning one repo vs many: I think we should have a meeting next week and
discuss it in person. It is a crucial question. Also, I remember Bart
expressing dissatisfaction that we only have administrative CCW meetings
and no design meeting: this might just as well be a reason to have the
first one. If you agree, propose a time and date.


@dmitriy-serdyuk
Contributor

You are right, it was saved with protocol 0, here are sizes with protocol 1, 2 and gzipped:

-rw-r--r--  1 dima  staff    77M Dec  9 16:55 main_log_1.zip
-rw-r--r--  1 dima  staff    73M Dec  9 16:57 main_log_2.zip
-rwxr-xr-x  1 dima  staff    26M Dec  9 14:08 main_log.zip.gz

@bartvm
Member Author

bartvm commented Dec 9, 2015

I guess that seems more realistic. I'm slightly curious, though, about the role of NumPy arrays here; if 95% of the log data is array data, the actual numbers don't tell you much about the differences between the data formats. That said, a 35% decrease in size is still pretty good.

Also, I guess this is another argument for dropping pickling: it's clear that even experienced users can forget to specify the right protocol and end up with unnecessarily large files.
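That pitfall is easy to demonstrate with a toy log (standard library only; the exact sizes will vary, but protocol 0 is always the text-based and largest one):

```python
import pickle

# A toy log: 1000 dict entries, no arrays.
log = [{"iteration": i, "training_error": 1.0 / (i + 1)} for i in range(1000)]

# Serialized size under pickle protocols 0, 1 and 2.
sizes = {proto: len(pickle.dumps(log, protocol=proto)) for proto in (0, 1, 2)}
print(sizes)

# Protocol 0 is ASCII-based and much larger than the binary protocols.
assert sizes[0] > sizes[2]
```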

I'll send an e-mail about a design meeting in a bit.

@dmitriy-serdyuk
Contributor

There are no large NumPy arrays in this log. I think it is mostly field
names and other meta-information that consume space.


@bartvm
Member Author

bartvm commented Jan 19, 2016

First code here: https://github.com/bartvm/mimir

Once it's more stable I can merge it into Blocks, but a separate repo was easier to start with (since it contains C and Cython code).

@bartvm
Member Author

bartvm commented Feb 11, 2016

I implemented all the important functionality, and am quite happy with it. In the end it does quite a bit of nice magic (like using threads in order to respond to clients who want data for live plotting or analysis, and using two buffers to make sure that writing the JSON data to disk is both efficient and well-compressed while still being relatively fail-safe). Integrating it into Blocks might not be entirely straightforward because of the nightmares of pickling file objects (and in this case, text wrappers over buffered streams over Cython extension types that hold file handles...).

@dmitriy-serdyuk
Contributor

You are using bokeh 0.11 inside a notebook for live plotting, right? It seems like a very nice solution to mila-iqia/blocks-extras#35.

@dwf
Contributor

dwf commented Feb 12, 2016

I think IPython notebook is a pretty heavy dependency for that, and if I
understand correctly it'd be very much an "assembly required" affair. I think a simple
Flask app is probably the solution to mila-iqia/blocks-extras#35, but if we
can use gzlog for the storage backend that'd be great.


@bartvm
Member Author

bartvm commented Feb 12, 2016

Yes, I think I use the latest Bokeh. IPython Notebook is not a dependency though, there's also the option of using the Bokeh server.

That said, sending/receiving data is entirely done through ZMQ, so you can use whatever plotting framework that you want.

@dwf
Contributor

dwf commented Feb 12, 2016

Err, the "Bokeh server" (as in bokeh-server executable) doesn't exist
anymore so probably you're using 0.10.


@bartvm
Member Author

bartvm commented Feb 12, 2016

There's the bokeh serve command. No dashboard, but it still serves live plots if you go to http://localhost:5006/?bokeh-session-id=<session-id>

http://bokeh.pydata.org/en/latest/docs/user_guide/server.html#specifying-output-server

@rizar
Contributor

rizar commented Feb 12, 2016

Looks nice!

As for Blocks integration, can't we create a new Logger object upon unpickling?

I am also curious whether there is support for multiple processes writing to the same file. I recently had a use case where it was necessary.

@dmitriy-serdyuk
Contributor

@rizar , I think we can do it: the logger appends to the file, so it shouldn't be a problem. A small trick is needed to fetch the last row from the log (it's needed by some extensions), though. I'm trying to implement it now and I'll see if it works fine.

@dwf
Contributor

dwf commented Feb 12, 2016

Atomicity of the appends is a concern, isn't it? It sounded from what Bart
said that mimir is already in a place where it's brokering multiple client
consumers; I don't know if it can do the same for producers.


@bartvm
Member Author

bartvm commented Feb 12, 2016

Writing to the same file from multiple processes isn't trivial. You'd need to lock the files to make appends atomic, as @dwf said, and I think that would be way too slow for many workers. Since ZMQ is already a dependency we could consider implementing something like this, but it would mean designating a single master process to collect the log entries from the workers.

I think this is overkill though. Since the log is just line-delimited JSON, why not have each process write to a separate log and then do zcat *.jsonl.gz | gzip -c > full_log.jsonl.gz? Your log entries are going to be out of order, but you could just add a timestamp and pass them through jq -s -c 'sort_by(.timestamp)[]' if you need to.
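For what it's worth, the same merge can be sketched in pure Python when jq isn't available (filenames here are just examples):

```python
import gzip
import json

# Two workers each wrote their own gzipped line-delimited log.
logs = {
    "worker_0.jsonl.gz": [{"timestamp": 2, "loss": 0.5}],
    "worker_1.jsonl.gz": [{"timestamp": 1, "loss": 0.9}],
}
for path, entries in logs.items():
    with gzip.open(path, "wt") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")

# Merge the per-worker logs and restore time order, like
# zcat *.jsonl.gz | jq -s -c 'sort_by(.timestamp)[]'.
merged = []
for path in logs:
    with gzip.open(path, "rt") as f:
        merged.extend(json.loads(line) for line in f)
merged.sort(key=lambda entry: entry["timestamp"])

with gzip.open("full_log.jsonl.gz", "wt") as f:
    for entry in merged:
        f.write(json.dumps(entry) + "\n")
```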

@dwf
Contributor

dwf commented Feb 12, 2016

👍

@rizar
Contributor

rizar commented Feb 12, 2016

All right, that sounds reasonable.

Sorry for being too lazy to read the code, but can you please explain how
the consumers are updated when a new record has been appended to the log?


@bartvm
Member Author

bartvm commented Feb 12, 2016

It's simply a ZMQ publish-subscribe pair of sockets (PUB-SUB). The log simply pushes the log entries as JSON strings over the socket, and all the clients that are subscribed will receive them.

There's also a "persistent" client-server interface which is slightly more complex. In this case the server starts a separate thread. Each log entry is pushed over the PUB socket to the clients, but also to the other thread. This separate thread has a ROUTER socket. Now, if a client wants to join this happens:

  1. Client asks server thread for data (DEALER-ROUTER)
  2. The separate server thread replies to this client by sending each log entry that it has stored in memory (ROUTER-DEALER)
  3. Client starts listening for new entries on the SUB socket.

Meanwhile, the main thread sends each log entry to the subscribed clients (PUB) as well as to the second separate thread where it will be stored for potential future clients to receive.
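The persistent case boils down to a replay-then-subscribe pattern. A standard-library sketch of just that pattern (using in-process lists as stand-ins for the actual ZMQ sockets and threads, so this is an illustration of the idea, not mimir's implementation):

```python
import threading


class PersistentPublisher:
    """Store every entry; late subscribers first get a replay of history."""

    def __init__(self):
        self.history = []      # entries kept for future clients (ROUTER side)
        self.subscribers = []  # live client queues (stand-ins for SUB sockets)
        self.lock = threading.Lock()

    def publish(self, entry):
        with self.lock:
            self.history.append(entry)      # stored for future clients
            for queue in self.subscribers:  # pushed to current clients (PUB)
                queue.append(entry)

    def subscribe(self):
        with self.lock:
            queue = list(self.history)      # steps 1-2: replay stored entries
            self.subscribers.append(queue)  # step 3: receive new entries
        return queue


publisher = PersistentPublisher()
publisher.publish({"iteration": 0})
client = publisher.subscribe()  # joins late, still sees iteration 0
publisher.publish({"iteration": 1})
print(client)
```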

@bartvm
Member Author

bartvm commented Feb 15, 2016

So although catting a bunch of log files together is fine for analysis, it's too bad that you lose the ability to live plot, print to a single output, etc., so I ended up writing a simple log for multi-GPU training anyway: bartvm/mimir@39b9401

Still need to write docstrings, but the usage is really simple:

# Main process
from mimir import ServerLogger

# Accepts the same arguments as a normal Logger object;
# blocks until all workers are done
logger = ServerLogger(filename='combined_log.jsonl.gz', stream=True)

# A worker
from mimir import RemoteLogger

# A remote logger only needs connection details, localhost:5555 by default
logger = RemoteLogger()

# Log away
logger.log({'iteration': 0})

The server logger receives the log entries, adds a unique ID that identifies each remote logger, and then handles them as it usually would (streams them, writes them to a log file, etc.).

Some current shortcomings:

  • No robust error handling when the server or client dies
  • Would be nice if the worker could give an ID that identifies it so that you can match the log entries to e.g. the GPU used
  • No utility to start the server logger non-blocking (just need to launch it in a separate thread)
