parses a twitter archive and prints stats
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
src
.gitignore
.travis.yml
ChangeLog.md
LICENSE
README.md
Setup.hs
stack.yaml
twitter-archive-parser.cabal

README.md

Build Status

What is this?

This is a program that takes your Twitter archive's tweets.csv on stdin and then prints some stats about it.

It doesn't have much functionality and could use a little lazy parsing but is fine otherwise. Well, there are missing comments but this is just temporary. It's also probably not best practice, that's why I'm going to rewrite it a little.

Currently prints (might be incomplete):

  • tweet count
  • number of retweets
  • number of links
  • mean number of characters
  • mean number of accounts mentioned
  • mean number of links
  • mean number of words
  • mean word length
  • most used sources
  • most used characters
  • most used words
  • most often replied to
  • most often mentioned

I want <some stat>!

Tell me! Open an issue!

I want to add <some stat> myself!

Just send a PR. This whole thing is licensed under MIT so you're probably fine using it.

How do I actually use this?

Best is you install stack.

# if you are using stack
stack setup
stack build
stack exec twitter-archive-parser < ../tweets.csv

# if you use cabal (assuming you have appropriate dependencies)
cabal run < ../tweets.csv

# if you happen to have ghc (8.0), parsec, regex-pcre and containers installed
# you probably know how to compile and run this yourself

Example Output

I stripped the list output to only two per stat.

tweet count: 112594

number of retweets: 50606

number of plain tweets (no mentions or retweets): 21948

mean number of accounts mentioned: 0

mean character count: 54

number of characters: 3408388

most used sources:
- <a href="https://github.com/veskuh/Tweetian" rel="nofollow">Tweetian for Sailfish OS</a>
- <a href="http://hotot.org" rel="nofollow">Hotot</a>

most used characters:
- ' '
- 'e'

most often replied to:
- 170420365
- 1201898934

most often mentioned:
- @_xxyy
- @Nordwind25

Getting Numbers

In case you want numbers instead of just the top 5 or something similar, at the moment most output things use mostOccurring, just change that to sortedOccurrences and you should get some nice numbers.

There will be an arguments to make the program print numbers by itself soon.

Output Format

Using -f or --format you can set the output format to e.g. plain or json which then prints in the choosen format.

Version Information

Currently the ChangeLog and versioning are based on the functions that are not directly stats but make up the API that is used by the statistics. This will change with the 1.0.0.0 release. From then on the versioning will be about the statistics and not about the internal API, mainly because the API will remain frozen then, except for major releases.

There might be a 2.0.0.0 release which transforms this entire program into a library which can be used easier by third party programs and a small command line program to call the library.