Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: print the source event, output data or event in JSON instead of plain text format #92

Open
opk12 opened this issue Aug 31, 2022 · 28 comments
Labels
enhancement New feature or request

Comments

@opk12
Copy link

opk12 commented Aug 31, 2022

--listen and --tail print events in a human-readable format. I want the json, to parse it easily in my script. Could matrix-commander print the json by default or with a command-line flag?

@opk12
Copy link
Author

opk12 commented Aug 31, 2022

Local patch

On my test machine, I replaced line 2053

    print(complete_msg, flush=True)

with

    print(json.dumps(event.source))

This prints one line for each event. My script reads each line and parses with json.loads().

Does event.source always esist in all event types received from the callback function? I'm asking because the change works on my machine for my limited, interactive usage, but I'd like to be sure for automated usage and for sharing the patch with other people.

Note on newlines: json.dumps() defaults to indent=None which selects the most compact representation (in reference to newlines) according to the docs

@opk12
Copy link
Author

opk12 commented Aug 31, 2022

To parse the lines from matrix-commander I am using:

def get_latest_events(*, room_id: str, tail: int=100) -> list[dict]:
    """
    Requires this patch to matrix-commander::  
        https://github.com/8go/matrix-commander/issues/92#issuecomment-1232843897

    :return: Events from the latest to the oldest.
    """

    import collections, json, subprocess

    args = ['matrix-commander', '--tail', str(tail), '--listen-self', '--room', room_id]
    mc_stdout = subprocess.run(args, capture_output=True).stdout
    mc_lines = mc_stdout.decode('utf-8').splitlines()
    return [json.loads(line) for line in mc_lines]

@HoboDev
Copy link

HoboDev commented Sep 16, 2022

I've made some very basic implementation which uses the flag --output-json
It's not yet incomplete but I was driven by the same idea wanting to have JSON output.

Maybe with a bit of work this can be turned into a PR

https://github.com/HoboDev/matrix-commander/tree/feat-output-json

@opk12
Copy link
Author

opk12 commented Sep 17, 2022

@HoboDev That's unrelated. This issue is for the data structure returned verbatim from the server. A Matrix room is a list of json objects that follow the Client-Server API specification and matrix-commander has the json inside event.source.

A client should use the Client-Server specification to parse matrix-commander output into anything.

Instead your code parses the standard json into a non-standard json with limited information. This is a maintainance nightmare in the long term, because it ties the downloader with the parser, the Client-Server spec is changing, and anyone will request their fields of interest.

@HoboDev Could you write a library to parse the Client-Server json (= wanted matrix-commander output) into a high-level data structure (list[Message] where each Message has text_revisions, reactions/annotations, in reply to, ...)? I don't know one for Python.

@HoboDev
Copy link

HoboDev commented Sep 17, 2022

I've replaced the very same line you have changed to get json output and added some more code.
Looking at the keywords "json output" I fail to see how this isn't related. Aren't we talking about json output as the common denominator?

What I did is more like a rough draft or proof of concept since I didn't know which specification to follow and didn't have time for going into details. I don't recall that the code I've added parses anything that must be the existing code. I'm just using some variables to get the info and put it into a structure.
Of course this can always be changed to a standard. Which obviously is more sensible.

Unfortunately I've seen your issue after I was done with my draft. Would have been easier to start like this.

I guess I could write a library but neither am I well versed in python nor do I have the enough free time for digging into this.
Maybe as I tag along and more knowledge builds up the path will shorten.
For this endeavor we'd need the desired output from matrix-commander first anyway. Also it would make things a lot easier.
But currently I just need the json output and am looking for a sensible solution to implement that.

So what is your recommendation? Just outputting all json data as it arrives?
That shouldn't take too long and would be quite an easy pull request if it's accompanied by the right flag and documentation.
What about matrix-commanders info messages? I've rerouted the first line to stdErr. The same could be done for all info messages (if there are any, e.g. --debug messages).

@HoboDev
Copy link

HoboDev commented Sep 17, 2022

Another thing just came through my mind (because I don't really know the spec and didn't have time to look at the resulting data):

Wouldn't it be more sensible to output the whole event?

Let's talk flags...
E.g.

--print-raw-event
--print-raw-event-source

Then we could have both. The spec would remain unchanged and the next program could sort it out.

@8go 8go changed the title print the source event Enhancement: print the source event, output data or event in JSON instead of plain text format Oct 2, 2022
@8go
Copy link
Owner

8go commented Oct 2, 2022

Hi @opk12 and @HoboDev

Thank you for your valuable input.

You make valid points. I also see that there are several issues, all somewhat related.

Let me think a bit about it, maybe the best is an incremental approach. Get the JSON event data published for listening, thereafter step-by-step add other event/JSON related output issues.

8go pushed a commit that referenced this issue Oct 4, 2022
- bug fix in --joined-members
- new feature: --output

- --output is currently only implemented for 2 functions: --joined-members and --joined-rooms

- see Issue #94
- see Issue #95
- see Issue #92
- see Issue #89
@8go
Copy link
Owner

8go commented Oct 4, 2022

I want to centralize all JSON-related, event-data, raw-data, output-format etc. discussion in one place, here in Issue #92.

Hence I closed Issue #95.

@8go
Copy link
Owner

8go commented Oct 4, 2022

@HoboDev , @opk12

Please check out the new release and the --output feature, currently implemented for --joined-members and --joined-rooms.

Test it and give feedback.

Is that what you expected?
Can it be improved?

@8go
Copy link
Owner

8go commented Oct 4, 2022

@HoboDev wrote this in Issue #94

I have started a basic implementation of --output-json to get JSON on stdout, so I don't have to deconstruct the message lines. You can check it out here: https://github.com/HoboDev/matrix-commander/tree/feat-output-json

It is lacking a bit on documentation and maybe does not adhere too much to existing variable names. But at least it's a start. So far I've covered basic messages and media files.

I moved it over here because I want all the discussion to be in the same place, here in Issue #92.

@8go
Copy link
Owner

8go commented Oct 4, 2022

A question I have: --joined-members "*" in text mode produces multiple lines as output, one line for each room.

In JSON, should that be

  • 1 line with 1 JSON array (with N objects in it) or
  • N lines each one with 1 JSON object?

If you run m-c now with --joined-members "*" you will see that it returns 1 line with 1 JSON array (with N objects).

Is that the best?

Same question will come up once received messages are done. Should it output 1 line with 1 JSON object for each message read, or just a total of 1 line with a JSON array of all N messages?

What do you think? I was leaning towards just returning 1 single array object. The user can then use jq or something else to go thru the array.

@opk12
Copy link
Author

opk12 commented Oct 4, 2022

I'd follow the standard Unix practice of printing one object on one line.

  • For --joined-members, the object of interest is a member, and there is no pro in emitting JSON instead of @user:homeserver, unless you plan to add fields in the future and document that the returned JSON may have fields added. Edit: then a separate call can get the user info from the ID.
  • For messages, the object of interest is the event as returned from the homeserver.

The purpose of the Unix way is that a client can read an object as soon as it's downloaded. Otherwise

  • a client that only needs some objects, or decides to terminate sooner (error etc), must wait to download everything, and fails for a network failure while downloading unnecessary events
  • it is inefficient to parse a very huge string (many MB) then navigate through a very huge array. It can also be painful when the system is low on memory or on a low end device (think Mobian).
  • The standard Unix tools can't be applied, you must be a programmer (not just a power user) and learn jq, and not all systems have jq available. For example, with the patch above that prints one json per line, I am filtering the jsons with grep and head (no parsing) to get all the mxc:// URLs for a specific room.

The Unix way also has the nice property of SIGPIPE: matrix-commander | client is optimized by the OS, by terminating matrix-commander as soon as client terminates. So the download is stopped automatically as soon as client gets the necessary material. This requires that messages are sent to client as soon as they are available.

@8go
Copy link
Owner

8go commented Oct 4, 2022

@opk12 thanks for the input and feedback.

Anyone else, any more feedback?

@8go
Copy link
Owner

8go commented Oct 4, 2022

Option 1 , currently implemented, output is 1 line with 1 JSON array (with N objects in it)

matrix-commander --joined-members "\!rrr:xxx.sytes.net" "\!sss:xxx.xxx.xxx"  --output rAW | jq 
[
 {
   "members": [
     {
       "user_id": "@u1:xxx.xxx.xxx",
       "display_name": "u1",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     }
   ],
   "room_id": "!rrr:xxx.xxx.xxx"
 },
 {
   "members": [
     {
       "user_id": "@u1:xxx.xxx.xxx",
       "display_name": "u1",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     },
     {
       "user_id": "@u2:xxx.xxx.xxx",
       "display_name": "u2",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     }
   ],
   "room_id": "!sss:xxx.xxx.xxx"
 }
]

Option 2 , currently not implemented, output is N lines each one with 1 JSON object; preferred by @opk12 as explained above

matrix-commander --joined-members "\!rrr:xxx.sytes.net" "\!sss:xxx.xxx.xxx"  --output rAW | jq 
 {
   "members": [
     {
       "user_id": "@u1:xxx.xxx.xxx",
       "display_name": "u1",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     }
   ],
   "room_id": "!rrr:xxx.xxx.xxx"
 }
 {
   "members": [
     {
       "user_id": "@u1:xxx.xxx.xxx",
       "display_name": "u1",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     },
     {
       "user_id": "@u2:xxx.xxx.xxx",
       "display_name": "u2",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     }
   ],
   "room_id": "!sss:xxx.xxx.xxx"
 }

8go pushed a commit that referenced this issue Oct 5, 2022
- new action workflow for Docker
- changed `--output raw` to print 1 line PER data-item, i.e. N lines with 1 JSON object each (instead of 1 line in TOTAL with a JSON array on N JSON objects)
- added code to cover --output for actions --listen and --tail
- see Issue #92
@8go
Copy link
Owner

8go commented Oct 5, 2022

I changed the code to follow @opk12 advice. Now the Option 2 is implemented.

Now a separate line and a separate JSON object is printed for each data-set (e.g. message received, room-member found, etc.)

Test it with --listen, --tail or --joined-members and --joined-rooms.

E.g. the new code delivers this:

matrix-commander --joined-members "\!rrr:xxx.sytes.net" "\!sss:xxx.xxx.xxx"  --output raw | jq 
 {
   "members": [
     {
       "user_id": "@u1:xxx.xxx.xxx",
       "display_name": "u1",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     }
   ],
   "room_id": "!rrr:xxx.xxx.xxx"
 }
 {
   "members": [
     {
       "user_id": "@u1:xxx.xxx.xxx",
       "display_name": "u1",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     },
     {
       "user_id": "@u2:xxx.xxx.xxx",
       "display_name": "u2",
       "avatar_url": "mxc://xxx.xxx.xxx/zzz"
     }
   ],
   "room_id": "!sss:xxx.xxx.xxx"
 }

8go pushed a commit that referenced this issue Oct 6, 2022
- upgrade code from matrix-nio v0.19 to 0.20
- added --output to more functions
- renamed --output options from `human` to `text`
- renamed --output options from `raw` to `json`
- renamed --output options from `raw-details` to `json-max`
- might be useful for Issue #89
- see Issue #92
@8go
Copy link
Owner

8go commented Oct 6, 2022

Have a look at the new release. --output now covers --listen and --tail.

Try something like --tail 1 --listen-self --output json-max | jq or --tail 1 --listen-self --output json | jq

@opk12
Your 20-line Python program (posted above) to read messages should work. I have not tested it, but please report back.

@longregen
Copy link

Wow, I was literally looking for how to do this and saw your release 4 hours ago. Thank you!

@opk12
Copy link
Author

opk12 commented Oct 7, 2022

@8go My use case is specifically about having spec-defined events:

  • when the spec changes, I should not wait for my program to break, report on GitHub, wait for m-c to implement, test the m-c update (pray without bugs), re-deploy in production
  • programmers should not try the option to learn what is output, or guess about what 'close to matrix-nio data' can mean
  • programmers need the guarantee that a future implementation does not change the output format

In other words, my use case is specifically about printing the source event unaltered, as returned verbatim from the Matrix homeserver without any kind of manipulation. It's not about collecting the fields and making a new dict, it's about treating the matrix-nio event object as opaque and passing it on as-is to the client.

@opk12
Copy link
Author

opk12 commented Oct 7, 2022

Another point is guaranteed interoperability with other libraries in the Matrix ecosystem, i.e.

  • it's not about having the fields in the output, or my personal trust in the m-c implementation quality, it's being able to prove the correctness of my client code to the future myself / dev team lead / ... by quoting the m-c manpage that says that the output follows the spec
  • if I copy a function from another FLOSS library into my software, it will manipulate spec events, and passing m-c output to it should work de jure (not as a de facto "coincidence")

To be clear, experience shows that, if one needs to touch the Matrix events because an easy interface is not sufficient, learning the spec is the best favor they can do to themselves in the long term.

@8go
Copy link
Owner

8go commented Oct 7, 2022

OK, @opk12 , your point is valid. But ...

Let's take an example: --get-room-info . --get-room-info
a) gets the room object from matrix-nio and
b) adds display_name obtained via matrix-nio API call display_name and
c) outputs the combined Json object (which includes the display_name)

Observation:
a) Does the matrix-nio give me a room object the follows the official Matrix spec? Hopefully? But I do not know for sure, I have not compared the matrix-nio code with the spec.
b) display_name is not in the matrix-nio object. There is name. There is group_name, etc. Calling the display_name method in the nio API, it computes the display_name based on name, groupname, etc. It seems convenient to have the display_name readily available without having to making another m-c call.

In summary,

  • there are a few instances where one or two fields are added (like display_name in the example above).
  • nothing is ever removed, except transport_response for json because transport_response is not not the data, it is about how the data was obtained and transported.

How would you improve that?

8go pushed a commit that referenced this issue Oct 7, 2022
- added new option `--get-client-info`
- `--output` has been added to all functions, all functions that have output can do the output in JSON now
- see Issue #92
@8go
Copy link
Owner

8go commented Oct 7, 2022

The description now reads:

  --output OUTPUT       This option decides on how the output is presented. Currently offered choices are: 'text', 'json' and 'json-max'.
                        Provide one of these choices. The default is 'text'. If you want to use the default, then there is no need to use
                        this option. If you have chosen 'text', the output will be formatted with the intention to be consumed by humans,
                        i.e. readable text. If you have chosen 'json', the output will be formatted as JSON. The content of the JSON
                        object matches the data provided by the matrix-nio API. In some occassions the output is enhanced by having added
                        a few data items for convenience. These convenient data items are added to the data from matrix-nio. In most cases
                        the output will be processed by other programs rather than read by humans. Option 'json-max' is practically the
                        same as 'json', but yet another additional field has been added. The data item 'transport_response' which
                        gives information on how the data was obtained and transported is being added.

@opk12
Copy link
Author

opk12 commented Oct 7, 2022

Props for the big work and effort. I am following the releases and I see the pervasive code changes.

I personally need a room's events (--listen and --tail) for piping into a spec-compliant parser. This was lost in the thread above, sorry for not clarifying that I referred to the room events today.

This is just a 2-line change, Callbacks.message_callback() doing if gs.pa.output == '--speced-raw-event-from-homeserver': print(json.dumps(original event)) (without obj_to_dict and custom fields, to spare me a compatibility layer around m-c that depends on m-c implementation details).

@8go
Copy link
Owner

8go commented Oct 8, 2022

How about a --output json-spec that is equal to json (for the time being) except for the message callback.

?

@8go
Copy link
Owner

8go commented Oct 8, 2022

I kind of foresaw that some day something like this would happen (I didnt think it would happen so soon, hahaha), that is why it is not a binary on-off --json but a --output many-sub-options.

8go pushed a commit that referenced this issue Oct 8, 2022
- implemented `--output json-spec` to sprint messages etc as spec, as provided by matrix-nio, without modifications
- see Issue #92
@8go
Copy link
Owner

8go commented Oct 8, 2022

@opk12 try out --output json-spec

@opk12
Copy link
Author

opk12 commented Oct 8, 2022

Thank you! Here are some points.

  1. Is there a reason now for json-max and json to be separate?

  2. Shouldn't json-spec be defined in the help text in terms of the C-S spec? Non-programmer power users, or programmers who know Matrix but not Python, won't even try it if they read "matrix-nio".

  3. json-spec does not really make sense for --get-room-info and other queries that are not for room events. It feels weird to write -spec and get stuff from matrix-nio (one of many libraries).

About -spec: Reading the commit, it feels like having an umbrella "matrix-nio stuff" option for conceptually different things (room events and matrix-nio objects) could be caused by my other feature request, of calling the m-c functions to avoid reimplementing the same snippets around matrix-nio. But it's really as simple as that (= function call) because serialization (= json.dump of matrix-nio objects) is contrary to the feature:

  • deserializing requires me more effort than calling or copy/pasting a snippet from matrix-commander
  • while the output is understandable by a human, it is unusable programmatically, because __dict__ shows the implementation-dependent private fields (so my app cannot rely on it) and because __dict__ won't output properties (= fields that are not stored in the object, but computed on every call)
  • proper de/serialization needs dedicated methods in matrix-nio (merely exchanging fields won't make it)

Maybe there is some use case that I am missing, but why printing matrix-nio objects at all in the -spec case.

  1. Some find it convenient to shorten if a == A1 or a == A2 to the idiomatic if a in (A1, A2, A3)

@8go
Copy link
Owner

8go commented Oct 10, 2022

@opk12 Regarding:

  1. Is there a reason now for json-max and json to be separate?

Only reason is that json-max also gives transport-response but fwe people will care about that

  1. Shouldn't json-spec be defined in the help text in terms of the C-S spec? Non-programmer power users, or programmers who know Matrix but not Python, won't even try it if they read "matrix-nio".

Yes, It would make sense to define/document json-spec as "according to specification....".

  1. json-spec does not really make sense for --get-room-info and other queries that are not for room events. It feels weird to write -spec and get stuff from matrix-nio (one of many libraries).

Are you suggesting that json-spec should only print the message events in --listen and --tail but be silent (print nothing) in all other calls (e.g. --get-room-info) ? The message event when receiving messages is the only "as-spec" event, correct? Or is there another one ?

4> Some find it convenient to shorten if a == A1 or a == A2 to the idiomatic if a in (A1, A2, A3)

Thanks for pointing that out, easy to fix.

Question: --tail 10 -y --output json-spec : This is working for you? Yes/No? Working as you expected?

8go pushed a commit that referenced this issue Oct 10, 2022
@opk12
Copy link
Author

opk12 commented Oct 11, 2022

  1. json-spec does not really make sense for --get-room-info and other queries that are not for room events. It feels weird to write -spec and get stuff from matrix-nio (one of many libraries).

Are you suggesting that json-spec should only print the message events in --listen and --tail but be silent (print nothing) in all other calls (e.g. --get-room-info) ?

I suggest that it raises an error: another option was meant (it is a typo), or, to silence output, the usual shell redirection > /dev/null would be more appropriate.

The message event when receiving messages is the only "as-spec" event, correct? Or is there another one ?

As far as I know (I studied room events), there are no other kinds of events (the spec says "event" and "room event" interchangeably) or other cases for --json-spec. Taking your example of --get-room-info: it merges information from different HTTP calls and I don't know a single Matrix object that represents a room. If the spec adds such an object in the future, it will be the perfect match for --json-spec, so giving an error now gives flexibility in the future (no backward compatibility to constrain the output).

Question: --tail 10 -y --output json-spec : This is working for you? Yes/No? Working as you expected?

Sure, it works and I migrated my room dump script!

Another code simplification is possible. Rather than making the output inside an if, all outputs can be made in advance. Adding other --outputs in the future will not make a lot of unmanageable nested ifs, but just one new line another_output = json_ | ...

For example, this

                if gs.pa.output in ...:
                    dic = resp.__dict__
                    if gs.pa.output == OUTPUT_JSON:
                        ...

can become (json_ has the underscore to avoid a name clash with the module json)

    json_ = {...}
    json_max = json_ | {'new key': ...}
    json_other = json_.copy()
    json_other.pop('key to remove')
    json_spec = ...
    print_event_depending_on_option(gs.pa.output, text=..., json_=json_, json_max=json_max, json_spec=json_spec)

    def print_event_depending_on_option(option: Literal['text', 'json', 'json-max', 'json-spec'], *, text: str, json_: dict=None, json_max: dict=None, json_spec: dict=None) -> None:
        results = {
            OUTPUT_TEXT: text,
            OUTPUT_JSON: json_,
            OUTPUT_JSON_MAX: json_max,
            OUTPUT_JSON_SPEC: json_spec,
        }
        print(results[gs.pa.output])

8go pushed a commit that referenced this issue Oct 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants