Support ChatHistory messages #1679

Closed
jijiechen opened this issue Dec 27, 2018 · 10 comments

Comments

@jijiechen

I didn't use the provided issue template because this is an enhancement request rather than a bug report.

Wechaty works well for many types of messages, including Text, Image, and Video. I'd like to request a new feature: reading and extracting the sub-messages of a ChatHistory message forwarded from other conversations.

A ChatHistory message is generated by the following steps:

  1. Go to an existing conversation and long-press one of the messages
  2. Tap Multiselect (called Select in newer versions of WeChat)
  3. Pick the messages you'd like to forward
  4. Tap the Forward icon at the bottom-left of the toolbar at the bottom of the screen
  5. Select Combine and forward, then choose a contact to forward to

A ChatHistory message is an efficient way to archive information, so it would be nice if Wechaty could extract it as structured data out of the box.

Note that I can already get an XML payload from the message.text() method (attached at the end), so it is already possible to parse forwarded text messages on top of the current APIs. But there are two major problems:

  1. I CANNOT extract multimedia messages: their content is not provided inline, it is encrypted, and it has to be downloaded in separate calls that the current APIs do not support
  2. Extracting these messages manually is costly for developers, and it is hard to keep supporting as many message types as Wechaty does as new puppet releases come out

So here are two possible implementation paths:

  1. Provide APIs for downloading the encrypted content of multimedia sub-messages in ChatHistory messages, so that people like me can build their own support
  2. Provide out-of-the-box support for the ChatHistory message type

Here is an example of a ChatHistory message:

<?xml version="1.0" encoding="UTF-8"?>
<recordinfo>
   <fromscene>3</fromscene>
   <favusername>chan</favusername>
   <title>Group Chat History</title>
   <desc>屿松: [Video]
林森: [File] XPLANE_VT101_SketchBook.pdf</desc>
   <info>屿松: [Video]
林森: [File] XPLANE_VT101_SketchBook.pdf</info>
   <datalist count="3">
      <dataitem datatype="4" subtype="0" dataid="0" htmlid="0">
         <datafmt>mp4</datafmt>
         <sourcename>屿松</sourcename>
         <sourcetime>2018-12-21 22:42</sourcetime>
         <thumbsourcepath>/var/mobile/Containers/Data/Application/182D457C4EA8-3EAA-4433-A143-B6B33D8F/Documents/53f13aef56722cdb6de4634d5087f474/Video/fae600e744c7bc69ba65f1993a23045d/3877.video_thum</thumbsourcepath>
         <thumbsize>3978</thumbsize>
         <datasourcepath>/var/mobile/Containers/Data/Application/182D457C4EA8-3EAA-4433-A143-B6B33D8F/Documents/aef56722cdd500e744c7bc69087f53f13b6de4634474/Video/fae65f1996ba3a23045d/3877.mp4</datasourcepath>
         <cdndataurl>3054020100044d304b020100020487b93d3802032f4f5602040f7ac2dc02045c21c24a0426777875706c6f61645f6368616e6578743537385f313534353731363239365f305f6e6f4165730204010400040201000400</cdndataurl>
         <cdndatakey>8af3a52db7543318b720dc48ff45dacd</cdndatakey>
         <cdnthumburl>3054020100044d304b020100020487b93d3802032f4f5602040f7ac2dc02045c21c2480426777875706c6f61645f6368616e6578743537385f313534353731363239365f305f7468756d620204010400010201000400</cdnthumburl>
         <cdnthumbkey>bce2f1f6b8a7e20bd6f76bfc694ca1ec</cdnthumbkey>
         <fullmd5>5095f0fd32bfb44bca18e52ffc3a6a51</fullmd5>
         <thumbfullmd5>6846e1284d0d7c9870fd547cde357905</thumbfullmd5>
         <datasize>1586921</datasize>
         <cdnencryver>1</cdnencryver>
         <duration>10</duration>
         <srcChatname>1203541653@chatroom</srcChatname>
         <srcMsgLocalid>3877</srcMsgLocalid>
         <srcMsgCreateTime>1545403335</srcMsgCreateTime>
         <dataitemsource>
            <realchatname>heart_nd</realchatname>
         </dataitemsource>
         <streamvideo>
            <streamvideourl />
            <streamvideototaltime>0</streamvideototaltime>
            <streamvideotitle />
            <streamvideowording />
            <streamvideoweburl />
            <streamvideothumburl />
            <streamvideopublishid />
            <streamvideoaduxinfo />
         </streamvideo>
         <illegalType>1</illegalType>
      </dataitem>
      <dataitem datatype="8" subtype="0" dataid="1" htmlid="1">
         <datafmt>pdf</datafmt>
         <sourcename>林森</sourcename>
         <sourcetime>2018-12-24 23:53</sourcetime>
         <datatitle>XPLANE_VT101_SketchBook.pdf</datatitle>
         <datasourcepath>/var/mobile/Containers/Data/Application/3D8FB6B3-4433-3EAA-A143-2D457C418EA8/Documents/aef5087f53f13b6de46356722cdd4474/OpenData/fa230199600e744c7ae65f69ba3bc45d/4052.pdf</datasourcepath>
         <cdndataurl>304c24d0420777875706c6f61645f6368616e6578743537385f313534353731363239365f310204010400050273045020100020487b93d3802032f4f560100e02010004402040f7ac2dc02045c210400</cdndataurl>
         <cdndatakey>fc5baeaeafdb2559d31700be06c235ea</cdndatakey>
         <fullmd5>1c3da50949018f12afbeb1e7b81eddea</fullmd5>
         <datasize>2442269</datasize>
         <cdnencryver>1</cdnencryver>
         <srcChatname>1203541653@chatroom</srcChatname>
         <srcMsgLocalid>4052</srcMsgLocalid>
         <srcMsgCreateTime>1545666826</srcMsgCreateTime>
         <dataitemsource>
            <realchatname>wxid_a011ldx307rq5u</realchatname>
         </dataitemsource>
      </dataitem>
      <dataitem datatype="1" subtype="0" dataid="1" htmlid="1">
         <sourcename>Nick</sourcename>
         <sourcetime>2018-12-24 23:53</sourcetime>
         <datadesc>我不知道啊,谁说的?</datadesc>
         <cdnencryver>1</cdnencryver>
         <srcChatname>1203541653@chatroom</srcChatname>
         <srcMsgLocalid>21163</srcMsgLocalid>
         <srcMsgCreateTime>1545666826</srcMsgCreateTime>
         <dataitemsource>
            <realchatname>loopigloo</realchatname>
         </dataitemsource>
      </dataitem>
   </datalist>
</recordinfo>

I hope this helps, and I look forward to your response.
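
As a rough illustration of the structured extraction requested above, here is a minimal sketch that reads a few fields from each `<dataitem>` of the sample payload. It is not Wechaty code: `HistoryItem`, `tagText`, and `parseChatHistory` are hypothetical names, and it uses regular expressions tuned to this sample rather than a real XML parser, so it does not decode XML entities or handle nested histories.

```typescript
// Hypothetical shape for one forwarded sub-message; fields mirror tags in the sample payload.
interface HistoryItem {
  dataType: number                 // datatype attribute: 1, 4 and 8 appear in the sample
  sourceName: string | undefined   // <sourcename>
  sourceTime: string | undefined   // <sourcetime>
  text: string | undefined         // <datadesc>
  title: string | undefined        // <datatitle>
  cdnDataUrl: string | undefined   // <cdndataurl>
  cdnDataKey: string | undefined   // <cdndatakey>
}

// Inner text of the first <tag>...</tag> found in `xml`, or undefined if absent.
function tagText(xml: string, tag: string): string | undefined {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(xml)
  return match ? match[1].trim() : undefined
}

// Split the payload into <dataitem> elements and read a few fields from each.
// The pattern requires whitespace after the tag name, so <dataitemsource> is not matched.
function parseChatHistory(xml: string): HistoryItem[] {
  const items: HistoryItem[] = []
  const itemPattern = /<dataitem\s([^>]*)>([\s\S]*?)<\/dataitem>/g
  let match: RegExpExecArray | null
  while ((match = itemPattern.exec(xml)) !== null) {
    const attributes = match[1]
    const body = match[2]
    const typeMatch = /datatype="(\d+)"/.exec(attributes)
    items.push({
      dataType: typeMatch ? Number(typeMatch[1]) : NaN,
      sourceName: tagText(body, 'sourcename'),
      sourceTime: tagText(body, 'sourcetime'),
      text: tagText(body, 'datadesc'),
      title: tagText(body, 'datatitle'),
      cdnDataUrl: tagText(body, 'cdndataurl'),
      cdnDataKey: tagText(body, 'cdndatakey'),
    })
  }
  return items
}
```

Text items can be consumed directly from `text`; for the video and file items, only `cdnDataUrl` and `cdnDataKey` are available, which is exactly the part that still needs a download API.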

@jijiechen
Author

This issue was originally published at wechaty/wechaty-puppet-padchat#223

@jijiechen
Author

I'm working on some projects extracting ChatHistory messages.
But, as stated in the initial post, one blocking problem keeps me from supporting multimedia messages: Wechaty and the underlying puppets DO NOT provide any standalone APIs for downloading multimedia content based on cdndataurl and cdndatakey.

My solution can only move forward once this is solved. If it is, I'll be honored to file a PR.

I can confirm the newly added CdnManager in padpro could be used to download attachment files, but not images.

@huan
Member

huan commented Jan 11, 2019

@jijiechen thank you very much for this feature proposal. We will definitely welcome this new feature.

Some details still need to be discussed, such as how to separate the chat history into messages.

However, I agree that ChatHistory is a good name for Wechaty to use for this. I'd like to add this new design to the Wechaty framework and the underlying Puppet base class.

A more detailed design would be welcome, and we can discuss further based on that.

@jijiechen
Author

Thanks for the reply.
As mentioned before, I worked out a prototype to support this and published it at https://github.com/jijiechen/dotnetclub-chaty

If interested, you can try it now with a padchat or padpro token placed in config/config.json. (I'm honored to be one of the beta users of padpro, BTW.)
Indeed, separating and storing messages are the core design concerns.

@huan
Member

huan commented Jan 11, 2019

Your project based on wechaty is awesome!

I will look into it when I have time and get back to you.

@LukeDev2K

Does Wechaty support ChatHistory now?

@archywillhe
Member

archywillhe commented Jul 10, 2020

@jijiechen really cool project mate! I happened to be working on a similar feature that relies on WeChat's ChatHistory (created via Multiselect) for my Anki flashcard-generating bot. I just went through club-chaty's parser code (in history-message-text-parser.ts) and conversion code (in conversion-session.ts and converting/), and I gotta say they are pretty great!

I think for now it would be good to have a ChatHistory (or CombinedMessages) class in Wechaty that just contains the list of Messages parsed from the XML. Url and Text types are quite easy to deal with. The only problem seems to be the local file paths and encrypted strings for Attachment, Image, etc., as @jijiechen has described. I think we can ignore them for now and treat them as something like the Intermediate type in club-chaty's intermediate-message.ts.

That way we would at least have something built into Wechaty that lets devs easily work with Url and Text messages inside a ChatHistory, rather than parsing the XML themselves.

I highly recommend that @jijiechen do the honors of filing a PR, since from what I see 95% of the ChatHistory class described above is already implemented inside club-chaty! The code is great and just needs some decoupling, and then it is good to go imo.
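
To make the proposed shape concrete, here is a minimal sketch of such a container. Nothing here exists in Wechaty or club-chaty under these names: `SubMessage`, `ChatHistory`, `resolved`, and `pending` are hypothetical, and the `intermediate` variant only stands in for the Attachment and Image items whose content cannot be downloaded yet.

```typescript
// Hypothetical sub-message variants; none of these types exist in Wechaty today.
type SubMessage =
  | { kind: 'text'; sender: string; text: string }
  | { kind: 'url'; sender: string; url: string }
  | { kind: 'intermediate'; sender: string; dataType: number; rawXml: string }

class ChatHistory {
  readonly title: string
  readonly messages: readonly SubMessage[]

  constructor(title: string, messages: readonly SubMessage[]) {
    this.title = title
    this.messages = messages
  }

  // Sub-messages that are usable right away (Text and Url).
  resolved(): SubMessage[] {
    return this.messages.filter(m => m.kind !== 'intermediate')
  }

  // Sub-messages whose content still has to be downloaded and decrypted.
  pending(): SubMessage[] {
    return this.messages.filter(m => m.kind === 'intermediate')
  }
}
```

Keeping the unresolved items in the list, instead of dropping them, preserves message order and leaves room to resolve them later once a download API is available.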

@jijiechen
Author

I'll try to spend some time on this in the next week or two.

@archywillhe
Member

archywillhe commented Jul 13, 2020

@jijiechen looking forward to it! I did some testing and noticed the following two things:

  1. It looks like history-message-text-parser's .extractMessagesFromXML will not return an array if the history message contains just one message (probably due to the weird way the XML is constructed in WeChat). Since in my use case I may receive a history message containing just one message from the user, I'm using a conditional for now to catch the special case:
const obj = parser.readXMLPayload(xml)
const resourceUrls = parser.extractResourcesFromXML(obj.resourceText)
const msgItems = parser.extractMessagesFromXML(obj.messageText)

// `_` is lodash; a single-message history yields one item instead of an array, so wrap it
const strings: string[] = _.isArray(msgItems) ? msgItems.map(extractTextFromXML) : [extractTextFromXML(msgItems)]
  2. A history message can contain other history messages. The WeChat implementation currently allows a maximum of 1 layer of nesting though, e.g.

[Screenshot: WechatIMG1]

I haven't tried it out, so I'm not sure how well the current implementation of the parser handles it.

These two aside I don't think there are any other edge cases :)

and once again thanks for the parser! 👍
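
Both edge cases above can be handled in one place by normalizing "one item or many" into an array and recursing into nested histories. This is only a sketch under assumed shapes: `ParsedItem`, its `text` and `nested` fields, `toArray`, and `collectTexts` are hypothetical and do not reflect club-chaty's actual parser output.

```typescript
// Hypothetical parsed item: the parser may hand back one object or an array of them,
// and an item may itself wrap a nested history.
interface ParsedItem {
  text?: string
  nested?: ParsedItem | ParsedItem[]
}

// Normalize "one value, many values, or nothing" into an array.
function toArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return []
  return Array.isArray(value) ? value : [value]
}

// Collect texts in order, descending into nested histories.
function collectTexts(items: ParsedItem | ParsedItem[] | undefined): string[] {
  const texts: string[] = []
  for (const item of toArray<ParsedItem>(items)) {
    if (item.text !== undefined) texts.push(item.text)
    texts.push(...collectTexts(item.nested))
  }
  return texts
}
```

With this helper the call site no longer needs a special case for single-message histories, and the recursion would keep working if WeChat ever allowed more than one layer of nesting.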


dosubot bot commented Nov 17, 2023

Hi, @jijiechen! I'm Dosu, and I'm helping the Wechaty team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you requested a new feature in Wechaty to support reading and extracting sub-messages from a ChatHistory message. There have been some discussions and suggestions from other users and maintainers on how to implement this feature. You mentioned that you plan to work on it in the next one or two weeks.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the Wechaty repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution, and we look forward to hearing from you soon!

Best regards,
Dosu

dosubot bot added the stale label on Nov 17, 2023
dosubot bot closed this as not planned on Nov 24, 2023
dosubot bot removed the stale label on Nov 24, 2023

4 participants