This app lets you run Splunk against messages from Telegram groups and generate graphs based on the activity in them.
Splunk Telegram includes a Natural Language Processing (NLP) module that lets you extract information such as sentiment and named entities.
This app is based on my other app, Splunk Lab, which is a generic Splunk platform build for ingesting data on an ad-hoc basis. You should check it out!
- Docker
- HTML exports from a Telegram conversation, channel, or group.
  - Exporting is explained further below.
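Before running the steps below, a quick pre-flight check can confirm that the prerequisites are in place. This is a minimal sketch: `require_cmd` and `check_prereqs` are hypothetical helper names, not part of the splunk-telegram scripts. It checks for `curl` as well as Docker, because the quick-start commands fetch the helper scripts with `curl`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the splunk-telegram scripts.

# Succeed if the named command is on the PATH, otherwise print an error.
require_cmd() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "Missing required command: $1" >&2
        return 1
    fi
}

# Check for Docker and curl, then confirm the Docker daemon responds.
check_prereqs() {
    require_cmd docker || return 1
    require_cmd curl || return 1
    # 'docker info' exits non-zero if the daemon is not running
    # or the current user cannot talk to it.
    if ! docker info > /dev/null 2>&1; then
        echo "Docker is installed but the daemon is not reachable" >&2
        return 1
    fi
}
```

For example, `check_prereqs && echo "Ready to go"` prints nothing but the final message when everything is in place.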
- The first step is to convert Telegram's HTML export into JSON that Splunk can understand:
bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-telegram/master/1-telegram-html-to-json.sh) path/to/telegram-export/messages*.html > logs/Group-Name.json
- Then, run Splunk:
SPLUNK_START_ARGS=--accept-license bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-telegram/master/2-start-splunk.sh)
- You'll be presented with a list of options to confirm. If you want to change any of them, set the corresponding environment variables and re-run the command; otherwise, press ENTER to launch Splunk.
By default, Splunk will be listening at https://localhost:8000/.
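Splunk may take a little while to finish starting inside the container, so the web interface might not answer immediately. The sketch below polls a URL until it responds; `wait_for_url` is a hypothetical helper, and the use of `-k` assumes the local instance serves a self-signed certificate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers or attempts run out.
# Arguments: URL, number of attempts (default 60), seconds between tries (default 5).
wait_for_url() {
    local url="$1"
    local attempts="${2:-60}"
    local delay="${3:-5}"
    local i
    for (( i = 1; i <= attempts; i++ )); do
        # -k: skip certificate checks (assumes a self-signed cert)
        # -s: silent, -o /dev/null: discard the body
        # --max-time: cap each attempt so a hung connection cannot stall the loop
        if curl -k -s -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        sleep "$delay"
    done
    echo "No response from $url after $attempts attempts" >&2
    return 1
}
```

For example, from another terminal: `wait_for_url https://localhost:8000/ && echo "Splunk is up"`. Without `-f`, curl treats any HTTP response as success, which is enough to tell that the web server is accepting connections.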
Telegram has a blog post that explains how to export data. However, if you follow those instructions, everything will be exported, a process that can take hours. Instead, we recommend exporting a single channel, group, or conversation at a time. You can do this in the Telegram Desktop app by opening the conversation or group and exporting it manually.
This saves the conversation in Telegram's own HTML format, which we can then parse to extract messages.
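Telegram Desktop typically splits a long chat export into several pages (`messages.html`, `messages2.html`, and so on), which is why the conversion command uses a `messages*.html` glob. The sketch below is a hypothetical pre-flight helper, `count_export_pages`, that confirms an export directory actually contains those pages before you run the conversion. The directory argument is wherever you saved the export.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the messages*.html pages in a Telegram export.
# Prints the count, or fails with a message if no pages are found.
count_export_pages() {
    local dir="$1"
    local -a pages=()
    local page
    for page in "$dir"/messages*.html; do
        # With no match, bash leaves the literal pattern in place; skip it.
        if [[ -e "$page" ]]; then
            pages+=("$page")
        fi
    done
    if (( ${#pages[@]} == 0 )); then
        echo "No messages*.html files found in $dir" >&2
        return 1
    fi
    echo "${#pages[@]}"
}
```

For example, `count_export_pages path/to/telegram-export` should print the number of pages, and fails if the path points somewhere other than an export directory.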
Splunk has its own license. Please abide by it.
The Docker image ships with the NLP Text Analytics app, which is licensed under the MIT License.
- Only regular messages are supported at this time. If a photo or sticker is found, it is recorded only as a photo of the specified size; stickers are not otherwise supported, and no other media types are supported at this time.
- Forwarded messages are not counted/supported at this time.
- Messages that are imported must be in the current directory because of how Docker mounts directories.
  - I may revisit this in the future and instead take a directory as the value of Docker's `-v` argument.
- I need to add development instructions, and possibly revisit that as well.
My email is doug.muth@gmail.com. I am also @dmuth on Twitter and Facebook!