Feature Request: download room history #89

Open
mcnesium opened this issue Aug 5, 2022 · 19 comments

@mcnesium

mcnesium commented Aug 5, 2022

The "Migrate your existing Matrix account" tool does quite a good job of transferring your account to a new home server, including setting roles and parting the old account, which is quite cool. It does not transfer the history though, which is totally plausible from a technical point of view.

Though when migrating to a new home server, leaving years of chat history behind might be a show stopper for finally moving away from matrix.org (which is the case for myself by now). The built-in history download in Element Web is fine for one or two iterations, but it gets tedious when you have a three-digit number of rooms and DMs in your account.

So I was looking for a scriptable solution and found matrix-commander having candidate potential. Unfortunately, it does not seem to be able to download a room's history yet, does it? So this is a request for a "download room history" feature.

@8go
Owner

8go commented Oct 2, 2022

Hi @mcnesium

Have a close look at the --tail feature. This should be able to get your old messages (history).

Maybe using the --joined-rooms first to get a list of your 100+ rooms, then looping over each room with --tail to get the old messages?

It most likely is not the full solution for you, but it might get you a big step further. Have a look at these 2 options in particular.

I don't know how you would import an old history into the new account on the new server. Does anyone know if this is possible?

@mcnesium
Author

mcnesium commented Oct 3, 2022

Thanks for pointing me to the --tail option, I guess it will work to download each room's history. And in conjunction with e.g. tac I think I can even manage to store it in the right order.
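A minimal sketch of the tac idea, assuming --tail prints the newest message first (as the comment above implies) and that each message occupies exactly one line; multi-line messages would have their lines reversed too. The function name `to_chronological` is made up for illustration.

```shell
# Rewrite a newest-first log file ($1) in chronological order ($2).
# tac is part of GNU coreutils and prints the lines of a file last to first.
to_chronological() {
    tac -- "$1" >"$2"
}

# Usage (not run here):
# to_chronological room-newest-first.log room.log
```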

One more question though: how do I get the actual room addresses and/or a clue whether this is a DM ? The --joined-rooms option only returns the internal room IDs. For storing the room history a real readable and searchable name would be better.

A good start would be to figure out whether there is only one other person in the room, but I am struggling with using --joined-members correctly.

@8go
Owner

8go commented Oct 3, 2022

Today (for testing some downloading attachments), I did a --listen all and it downloaded 5000+ messages (for 1 given room name) AND all the attached files (images, audio, etc.) See Issue #90 for the exact command as an example. It took about 5 minutes.

@8go
Owner

8go commented Oct 3, 2022

You want to convert a room ID into a possible room alias. That does not exist in matrix-commander.
The opposite exists, converting an existing alias into an ID: --room-resolve-alias

@8go
Owner

8go commented Oct 3, 2022

This should work: matrix-commander --joined-members "*"

The quotes are very important! Just did it, works like a charm.

8go pushed a commit that referenced this issue Oct 4, 2022
- bug fix in --joined-members
- new feature: --output

- --output is currently only implemented for 2 functions: --joined-members and --joined-rooms

- see Issue #94
- see Issue #95
- see Issue #92
- see Issue #89
@8go
Owner

8go commented Oct 4, 2022

@mcnesium Get the new version from today.

@opk12

opk12 commented Oct 4, 2022

Hello, just a heads up to mention --listen-self in addition to the --listen all.

@8go
Owner

8go commented Oct 5, 2022

Yes, of course, in total you get a command like:

matrix-commander --tail XXX --download-media --listen-self --output raw --room XXX or
matrix-commander --listen all --download-media --listen-self --output raw --room XXX or similar

@mcnesium
Author

mcnesium commented Oct 6, 2022

Thanks for all the input. I started downloading using this bash snippet:

# iterate over all room ids
for room_id in $(

    # get all room ids, remove ^M at the end of each
    matrix-commander --joined-rooms --output human | tr -d '\r'

); do

    # some verbosity for this script
    date "+%H:%M:%S $room_id"

    # file name is the room_id without the '!' in the name
    room_log="/home/mcnesium/matrix-history/${room_id#\!*}.log"

    # download the room history into a logfile
    matrix-commander \
        --room "$room_id" \
        --listen all \
        --listen-self >"$room_log"

done

It ran well for about 8,5h but then stopped at 95 out of 230 rooms with the message

INFO: matrix-commander: 1 error and 0 warnings occurred.
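One way to avoid starting from scratch after such a stop is to record which rooms finished and skip them on the next run. The following is only a sketch: `for_each_pending_room` and the `.done` marker files are hypothetical, and a room counts as finished only if the download command exits with status 0.

```shell
# for_each_pending_room LOG_DIR CMD [ARGS...]
# Reads room ids from stdin, one per line. For every room without a
# "<name>.done" marker in LOG_DIR it runs CMD [ARGS...] ROOM_ID, writes the
# output to "<name>.log" and creates the marker if CMD succeeded.
for_each_pending_room() {
    local log_dir="$1" room_id name
    shift
    while IFS= read -r room_id; do
        [ -n "$room_id" ] || continue
        name=${room_id#!}                  # file name without the leading '!'
        [ -e "$log_dir/$name.done" ] && continue
        # </dev/null keeps CMD from swallowing the remaining room ids on stdin
        if "$@" "$room_id" </dev/null >"$log_dir/$name.log"; then
            touch "$log_dir/$name.done"
        fi
    done
}

# Usage (not run here), with the same flags as the snippet above:
# fetch_room() { matrix-commander --room "$1" --listen all --listen-self; }
# matrix-commander --joined-rooms --output human | tr -d '\r' |
#     for_each_pending_room /home/mcnesium/matrix-history fetch_room
```

A failed room keeps its partial log but gets no marker, so the next run downloads it again and overwrites the file.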

The whole procedure turns out to be sort of tedious as well in the end, because my account also idles in chats like #python:matrix.org or #matrix:matrix.org, whose history I am not really interested in saving.

So I think I am going to change my strategy to first get a handy list of rooms and select those that I want and then only download those. The problem here is that you said it is not possible (yet?) to get the name or alias from the internal ID of a room. So I thought of changing the script above from --listen all to --tail 1 to get a string like this:

Message received for room Fun Shell Scripting [!esyCFwsvHGqLXuAxSC:matrix.org] | sender …

and then get the room name with some substring extraction magic, e.g. with sed.
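Such an extraction could look like the sketch below. It assumes every line has exactly the shape of the sample above ("Message received for room NAME [!id:server] | sender …"); the function name `room_name_from_line` is made up, and a room name that itself contains " [!" would likely confuse it.

```shell
# Read log lines on stdin and print the room name of every line that matches
# "Message received for room NAME [!ROOM_ID] | sender ...".
room_name_from_line() {
    sed -n 's/^Message received for room \(.*\) \[![^]]*\] | sender.*$/\1/p'
}

# Usage (not run here):
# matrix-commander --room "$room_id" --tail 1 | room_name_from_line
```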

First experiments showed though that --tail 1 takes even longer for each room than --listen all so I am wondering if that is a promising approach. Do you have any other suggestions?

@8go
Owner

8go commented Oct 6, 2022

a) thank you for reporting your feedback. It is appreciated because it helps others to learn as well. Thank you @mcnesium

b) matrix-commander cannot map room_id to alias but maybe the REST API of Matrix can. I have not looked at the REST API enough to know. If a REST API call exists for this, then matrix-commander --rest .... can issue the call and get the response. Ideas: read the Matrix REST API. Ask in the Matrix chat how to do this. Open an issue asking people for ideas how this could be done, which Matrix REST calls or which matrix-nio API calls could achieve this. I did look into the matrix-nio API 2 days ago for this, but I did not find anything. But 4 eyes or 40 eyes see more than 2.

c)

INFO: matrix-commander: 1 error and 0 warnings occurred.

This could have been a time-out. The only way to know is to run it with --debug. Also, in many cases mc keeps running even after an error (it depends), meaning the error might just mean one lost message or similar.

d)

First experiments showed though that --tail 1 takes even longer for each room than --listen all

Mmmh, strange. It takes hours? Run it on 1 room only and in --debug mode.

@8go
Owner

8go commented Oct 6, 2022

Thinking about it, most likely you are NOT interested in the room_alias; you want the room display name, the room title.

So, the above about the alias is actually irrelevant for you.

You want roomid-to-displayname mapping. That is different.

8go pushed a commit that referenced this issue Oct 6, 2022
- upgrade code from matrix-nio v0.19 to 0.20
- added --output to more functions
- renamed --output options from `human` to `text`
- renamed --output options from `raw` to `json`
- renamed --output options from `raw-details` to `json-max`
- might be useful for Issue #89
- see Issue #92
@8go
Owner

8go commented Oct 6, 2022

@mcnesium

Just did a release, especially for you :)

For any message received with --listen you can now get output in JSON format, and I included the room display_name.

Try something like --tail 1 -y --output json and have a look. It MIGHT help you. Maybe, maybe not. But it gives you more info and hence more options.

@8go
Owner

8go commented Oct 6, 2022

Someone, maybe not you, could read messages from the server with --listen in several stages and several runs. All output is placed into a database. Since the output is JSON, it is easy to pick the fields of interest. The event_id is unique.

Later, one goes through the database entry by entry, and for each entry (msg received) there is the display_name (e.g. to decide which directory to write it to), the msg content, etc.

In short, one could create a temporary storage (e.g. a db) for import and export.
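A rough sketch of that idea, with a plain directory tree standing in for the database. Assumptions, none of them confirmed in this thread: --output json prints one JSON object per line, and the unique id sits in a top-level `event_id` field next to `room_display_name` (adjust the jq paths if it is nested). `store_events` is a made-up name, and jq must be installed.

```shell
# store_events STORE_DIR
# Reads one JSON event per line on stdin and writes each event to
# STORE_DIR/<room display name>/<event id>.json. An event that is already
# stored is skipped, so overlapping runs do not create duplicates.
store_events() {
    local store="$1" event id room
    while IFS= read -r event; do
        [ -n "$event" ] || continue
        # sanitize both values so they are safe to use as file names
        id=$(printf '%s\n' "$event" | jq -r '.event_id // empty' |
            tr -c 'A-Za-z0-9._\n-' '_')
        room=$(printf '%s\n' "$event" | jq -r '.room_display_name // "unknown"' |
            tr -c 'A-Za-z0-9._\n-' '_')
        [ -n "$id" ] || continue           # no id found: skip the line
        mkdir -p "$store/$room"
        [ -e "$store/$room/$id.json" ] ||
            printf '%s\n' "$event" >"$store/$room/$id.json"
    done
}

# Usage (not run here):
# matrix-commander --room "$room_id" --listen all --listen-self --output json |
#     store_events ./matrix-store
```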

@8go
Owner

8go commented Oct 6, 2022

FYI, I frequently see execution times of 5 to 15sec for a trivial --tail 1. Mileage varies ...

@mcnesium
Author

mcnesium commented Oct 6, 2022

Thank you very much for that personalized release 😍

For reproducibility, here is exactly what I did:

# build the latest release from github in docker
docker build -t matrix-commander -f docker/Dockerfile github.com/8go/matrix-commander

# shortcut function for the docker command
matrix-commander() { docker run --rm -ti -v /home/mcnesium/.config/matrix-commander:/data:z matrix-commander "$@"; }

# run it
matrix-commander --version
… version 3.5.5 2022-10-06

So I seem to be running my very own personal version. So let's try to --tail 1 a public unencrypted room with little traffic on the same server, pick #matrix-berlin:matrix.org and take the time:

time matrix-commander --room '!baZIOHUMMbxQRsrphy:matrix.org' --tail 1 -y --output json
{…}
real    4m45.854s
user    0m0.075s
sys     0m0.034s

It took almost 5 minutes for the return of that JSON string. Here is another one, #freifunk:matrix.org:

time matrix-commander --room '!GnMdRpCkJZlWZljMrA:hackint.org' --tail 1 -y --output json
{…}
real    4m57.170s
user    0m0.078s
sys     0m0.049s

Again, no response for almost 5 minutes, then the JSON appears and the command finishes.
This happens on a 2GHz CPU 16GB RAM Debian machine connected to an IPv4 Vodafone cable 1000 connection.

The long response time aside, I can now pipe that --tail 1 to jq '.room_display_name' and get a decent name for a log file. That feels way better than digging the name out of an actual log line. Again, thank you very much for implementing this!
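Put together, deriving a log file name from the display name might look like this sketch. `safe_log_name` and `room_log_path` are hypothetical helpers; the second one needs matrix-commander and jq, and inherits the long --tail 1 response time measured above.

```shell
# Turn a room display name into a file-system-safe file name:
# every byte outside A-Z a-z 0-9 . _ - is replaced by "_".
safe_log_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# room_log_path ROOM_ID LOG_DIR
# Looks up the display name of ROOM_ID via --tail 1 and prints the path of
# the log file to use for that room below LOG_DIR.
room_log_path() {
    local name
    name=$(matrix-commander --room "$1" --tail 1 -y --output json |
        jq -r '.room_display_name' | head -n 1)
    printf '%s/%s.log\n' "$2" "$(safe_log_name "$name")"
}

# Usage (not run here):
# room_log=$(room_log_path '!baZIOHUMMbxQRsrphy:matrix.org' ~/matrix-history)
```

Two rooms with the same display name would map to the same file, so appending the room id to the name may be worth considering.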

I wonder whether matrix-commander returns a non-zero exit code in case of errors, so that I could extend my script with a repeat while not 0… loop and then let it run for … a week 😬 🤷

edit: I tried the same on another 2,3G machine on a DFN connection. Turns out, running v3.5.3 natively from AUR and v3.5.5 in Docker both take a bit more than one minute for the return of a --tail 1 message.

…
real	1m12,510s

@8go
Owner

8go commented Oct 6, 2022

Can you take your comment (90% of it), the performance related part, and post it on Issue #91 which is a more appropriate place. Thanks.

@8go
Owner

8go commented Oct 6, 2022

I wonder whether matrix-commander returns a non-zero exit code in case of errors, so that I could extend my script with a repeat while not 0… loop and then let it run for … a week 😬 🤷

Look at the nearly last line of the source code:
return gs.err_count # 0 for success

So, yes it tries to return 0 for no error, or the # of errors.

But remember mc is NOT atomic. What I want to say is that in one run it might get msg 1, fail on msg 2 and then get msg 3. It returns 1. Then you run it again, and it might fail on msg 1 and on msg 2 but get msg 3, returning 2. So, without further study, if you have an error you do not know what is missing (maybe even nothing is missing and the error was not consequential, depending on what you are doing).
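With the exit status being the error count, a retry wrapper is a few lines of shell. This is only a sketch: `retry` and `RETRY_DELAY` are made-up names, and because a run is not atomic, each attempt should overwrite its output file rather than append to it.

```shell
# retry MAX CMD [ARGS...]
# Runs CMD until it exits with status 0, at most MAX times.
# Returns 1 if every attempt failed.
retry() {
    local max="$1" attempt=1
    shift
    until "$@"; do
        [ "$attempt" -lt "$max" ] || return 1
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-30}"   # seconds to wait between attempts
    done
}

# Usage (not run here): ">" truncates the log on every attempt.
# fetch() { matrix-commander --room "$1" --listen all --listen-self >"$2"; }
# retry 5 fetch '!roomid:matrix.org' room.log
```

Even a run that ends with status 0 only says that no error was counted; it does not prove the history is complete.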

8go pushed a commit that referenced this issue Oct 6, 2022
@8go
Owner

8go commented Oct 6, 2022

@mcnesium

Another feature for you in today's second commit.

Try --get-room-info --output json | jq or similar

Give it a list of room ids and get a list of room objects all having a display name and a topic, etc.
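Once room ids are mapped to display names, picking the rooms worth downloading can be a simple filter. The sketch below assumes a two-column list "room id, TAB, display name" on stdin; how to produce it from the --get-room-info JSON depends on its field names, which are not shown in this thread. `select_rooms` is a hypothetical helper.

```shell
# select_rooms WANTED_FILE
# WANTED_FILE holds one wanted display name per line.
# stdin:  lines of "ROOM_ID<TAB>DISPLAY NAME"
# stdout: the room ids whose display name is listed in WANTED_FILE
select_rooms() {
    awk -F '\t' '
        NR == FNR    { want[$0] = 1; next }   # first input: the wanted names
        ($2 in want) { print $1 }             # second input: the room list
    ' "$1" -
}

# Usage (not run here):
# select_rooms wanted-rooms.txt <rooms.tsv >room-ids-to-download.txt
```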

@8go
Owner

8go commented Oct 7, 2022

Just FYI: The new release also allows resolving room aliases to room ids. I mention it because we spoke about it in the past.

There are 2 ways of doing it, one being: --get-room-info '#someAlias' --output json | jq ... (quote the alias, otherwise the shell treats everything from the # on as a comment)

@ps245

ps245 commented Apr 27, 2023

Hi,

I'm finding that matrix-commander is stopping after about 200 images. We use the room for my photography group. Unfortunately none of us have saved all the photos.

Here is what I tried:

podman run -v /home/user/.config/matrix-commander:/data:z docker.io/matrixcommander/matrix-commander \
  --login password \
  --homeserver https://matrix.org \
  --user-login "<USERNAME>" \
  --password "<PASSWORD>" \
  --device "<DEVICE NAME>" \
  --room-default \#myroom:matrix.org
podman run --rm -ti -v /home/user/.config/matrix-commander:/data:z docker.io/matrixcommander/matrix-commander \
  --room '!roomid:matrix.org' -t 1000000 \
  --download-media /data/media \
  --output json > log.json

I don't know how far to tail back (how many events there might be); I think the room has been around since last year. Ideally I'd like to go back as far as matrix.org will allow.

My understanding is the --listen option only works for new messages received.
