What is this?
This is a program that takes your Twitter archive's
tweets.csv on stdin and then prints some stats about it.
It doesn't have much functionality and could use a little lazy parsing but is fine otherwise. Well, there are missing comments but this is just temporary. It's also probably not best practice, that's why I'm going to rewrite it a little.
Currently prints (might be incomplete):
- tweet count
- number of retweets
- number of links
- mean number of characters
- mean number of accounts mentioned
- mean number of links
- mean number of words
- mean word length
- most used sources
- most used characters
- most used words
- most often replied to
- most often mentioned
I want <some stat>!
Tell me! Open an issue!
I want to add <some stat> myself!
Just send a PR. This whole thing is licensed under MIT so you're probably fine using it.
How do I actually use this?
Best is you install stack.
# if you are using stack stack setup stack build stack exec twitter-archive-parser < ../tweets.csv # if you use cabal (assuming you have appropriate dependencies) cabal run < ../tweets.csv # if you happen to have ghc (8.0), parsec, regex-pcre and containers installed # you probably know how to compile and run this yourself
I stripped the list output to only two per stat.
tweet count: 112594 number of retweets: 50606 number of plain tweets (no mentions or retweets): 21948 mean number of accounts mentioned: 0 mean character count: 54 number of characters: 3408388 most used sources: - <a href="https://github.com/veskuh/Tweetian" rel="nofollow">Tweetian for Sailfish OS</a> - <a href="http://hotot.org" rel="nofollow">Hotot</a> most used characters: - ' ' - 'e' most often replied to: - 170420365 - 1201898934 most often mentioned: - @_xxyy - @Nordwind25
In case you want numbers instead of just the top 5 or something similar, at the
moment most output things use
mostOccurring, just change that to
sortedOccurrences and you should get some nice numbers.
There will be an arguments to make the program print numbers by itself soon.
--format you can set the output format to e.g. plain or
json which then prints in the choosen format.
Currently the ChangeLog and versioning are based on the functions that are not directly stats but make up the API that is used by the statistics. This will change with the 18.104.22.168 release. From then on the versioning will be about the statistics and not about the internal API, mainly because the API will remain frozen then, except for major releases.
There might be a 22.214.171.124 release which transforms this entire program into a library which can be used easier by third party programs and a small command line program to call the library.