# Unpack Twitter ids to get timestamps etc.

Twitter ids are interesting here: there are roughly 2000 10-digit ids, then a jump to 18-digit ids (roughly 1000 of these).

* Cool: ids can break JavaScript. They are larger than 2^53, so a JavaScript Number (a double) can't hold them exactly, which is why the API also includes a string version of each id: https://developer.twitter.com/en/docs/basics/twitter-ids
* Twitter open-sourced the code: https://github.com/twitter/snowflake
* ... and now I'm reading Scala (function `nextId` in file `IdWorker.scala`)
* Slides on it: https://www.slideshare.net/davegardnerisme/unique-id-generation-in-distributed-systems
* 64 bits = 1 unused sign bit + 41-bit timestamp (ms) + 10-bit machine id + 12-bit sequence (0-4095). The timestamp is stored relative to a custom epoch: timestamp = unix_ms - twepoch, where twepoch = 1288834974657, given as 2010/11/04 10:42:54 in https://www.slideshare.net/moaikids/20130901-snowflake
* `id = time * 2^22 + node * 2^12 + seq`, i.e. `(time << 22) | (node << 12) | seq`, where `time` is the timestamp minus twepoch
* So we should be able to work out when the Twitter ids were created, at least for the ~1000 newer (18-digit) ones, plus which machine generated each one.
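
The formula `id = time * 2^22 + node * 2^12 + seq` can equally be written with shifts and masks. Below is a minimal sketch assuming the 41/10/12 layout described above; `compose_id` and `split_id` are hypothetical helper names, not part of the Snowflake code.

```python
# Bit widths from the 41 + 10 + 12 layout described above (top sign bit stays 0).
SEQUENCE_BITS = 12
MACHINE_BITS = 10
TWITTER_EPOCH_MS = 1288834974657  # twepoch


def compose_id(timestamp_ms, machine_id, sequence):
    """Build an id as (time - twepoch) * 2^22 + machine * 2^12 + seq.

    Assumes machine_id < 2**10 and sequence < 2**12, so the fields don't overlap
    and OR-ing them is the same as adding them.
    """
    time_part = timestamp_ms - TWITTER_EPOCH_MS
    return ((time_part << (MACHINE_BITS + SEQUENCE_BITS))
            | (machine_id << SEQUENCE_BITS)
            | sequence)


def split_id(snowflake_id):
    """Inverse of compose_id: return (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + TWITTER_EPOCH_MS
    return timestamp_ms, machine_id, sequence


print(split_id(701386398629306368))                               # (1456058523219, 370, 0)
print(compose_id(1456058523219, 370, 0) == 701386398629306368)   # True
```

Shifts and masks avoid the float rounding that `math.pow` style arithmetic would introduce for 64-bit values.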

Notes:
* `datetime.datetime.utcfromtimestamp(twitter_epoch/1e3).strftime('%Y-%m-%d %H:%M:%S')` gives '2010-11-04 01:42:54' (UTC). I'd presumed the date in the slides was PST, but their 10:42:54 is 9 hours *ahead* of UTC, which matches a UTC+9 timezone (e.g. JST); PST is 8 hours behind UTC.
* I'm using this to decode the Congress tweeter list.
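
To check which timezone the slides' 10:42:54 is in, one can render twepoch at a few fixed UTC offsets. This sketch uses only the standard library (`datetime.timezone`); the candidate offsets are hand-picked assumptions. Note that on 2010-11-04 US Pacific time was still on daylight time (UTC-7), since DST ended on 2010-11-07.

```python
import datetime

TWITTER_EPOCH_MS = 1288834974657  # twepoch

# Interpret twepoch as milliseconds since the Unix epoch, in UTC.
epoch_utc = datetime.datetime.fromtimestamp(TWITTER_EPOCH_MS / 1000, tz=datetime.timezone.utc)
print('UTC', epoch_utc.strftime('%Y-%m-%d %H:%M:%S'))

# Re-render the same instant at a few fixed UTC offsets.
for label, hours in [('UTC+9', 9), ('UTC-7 (PDT)', -7), ('UTC-8 (PST)', -8)]:
    tz = datetime.timezone(datetime.timedelta(hours=hours))
    print(label, epoch_utc.astimezone(tz).strftime('%Y-%m-%d %H:%M:%S'))
```

This prints 2010-11-04 01:42:54 for UTC, 2010-11-04 10:42:54 for UTC+9, 2010-11-03 18:42:54 for PDT and 2010-11-03 17:42:54 for PST, so only UTC+9 reproduces the time quoted in the slides.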

In [6]:
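import datetime


def decode_twitter_id(twitterid, verbose=False):
    """Split a Snowflake-style id into (seq, machine id, ms timestamp, UTC string, ms part)."""

    # Weed out old (pre-Snowflake) sequential ids: put the boundary at 10**12
    if twitterid < 10**12:
        return (twitterid, 0, 0, 0, 0)

    twitter_epoch = 1288834974657  # twepoch, in ms since the Unix epoch
    bins = '{0:b}'.format(twitterid)  # binary string; the last 22 bits are machine id + sequence
    if verbose:
        print('{0}\n{1}, {2}, {3}'.format(bins, bins[:-22], bins[-22:-12], bins[-12:]))
        print('{0}, {1}, {2}'.format(int(bins[:-22], base=2), int(bins[-22:-12], base=2),
                                     int(bins[-12:], base=2)))
    seq = int(bins[-12:], base=2)       # 12-bit sequence number
    mid = int(bins[-22:-12], base=2)    # 10-bit machine id
    tepoch = twitter_epoch + int(bins[:-22], base=2)  # creation time, ms since the Unix epoch
    tmillis = tepoch % 1e3              # millisecond part (a float, e.g. 219.0)
    # utcfromtimestamp() is deprecated since Python 3.12; use an aware UTC datetime instead
    created = datetime.datetime.fromtimestamp(tepoch / 1e3, tz=datetime.timezone.utc)
    tstamp = created.strftime('%Y-%m-%d %H:%M:%S')
    return (seq, mid, tepoch, tstamp, tmillis)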
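
The same decoding can be done with shifts and masks instead of binary-string slicing. This sketch also splits the 10-bit machine id into a 5-bit datacenter id and a 5-bit worker id; that split is my reading of `datacenterIdBits` / `workerIdBits` in `IdWorker.scala` and should be treated as an assumption. `decode_snowflake` is a hypothetical name.

```python
import datetime

TWITTER_EPOCH_MS = 1288834974657  # twepoch


def decode_snowflake(snowflake_id):
    """Return the fields packed into a Snowflake id as a dict.

    Assumed layout: 41-bit time | 5-bit datacenter | 5-bit worker | 12-bit sequence.
    """
    sequence = snowflake_id & 0xFFF                # low 12 bits
    worker_id = (snowflake_id >> 12) & 0x1F        # next 5 bits
    datacenter_id = (snowflake_id >> 17) & 0x1F    # next 5 bits
    timestamp_ms = (snowflake_id >> 22) + TWITTER_EPOCH_MS
    created = datetime.datetime.fromtimestamp(timestamp_ms / 1000, tz=datetime.timezone.utc)
    return {
        'timestamp_ms': timestamp_ms,
        'created_utc': created,
        'machine_id': (datacenter_id << 5) | worker_id,  # same 10-bit value as `mid` above
        'datacenter_id': datacenter_id,
        'worker_id': worker_id,
        'sequence': sequence,
    }


info = decode_snowflake(701386398629306368)
print(info['created_utc'].strftime('%Y-%m-%d %H:%M:%S'), info['machine_id'],
      info['datacenter_id'], info['worker_id'], info['sequence'])
# 2016-02-21 12:42:03 370 11 18 0
```

Unlike the string version, this works for any non-negative integer, since it never slices a binary string that might be too short.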

In [7]:
decode_twitter_id(701386398629306368)

(0, 370, 1456058523219, '2016-02-21 12:42:03', 219.0)

In [8]:
decode_twitter_id(1688370956)

(1688370956, 0, 0, 0, 0)
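
Back to the JavaScript point in the notes at the top: a JavaScript Number is an IEEE-754 double, like a Python float, so the 18-digit ids lose precision once converted. This sketch uses a hypothetical neighbouring id (same time and machine, sequence 1) to show two distinct ids collapsing to the same double; the JSON payload is made up but mirrors the API's `id` / `id_str` pair.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript Number holds exactly

real_id = 701386398629306368   # the example id decoded above
neighbour_id = real_id + 1     # hypothetical id: same time/machine, sequence 1

print(real_id > MAX_SAFE_INTEGER)               # True
# Converting to a double (what a JS Number stores) maps both ids to one value.
print(float(real_id) == float(neighbour_id))    # True

# Python's json module keeps integers exact; a JS client would not,
# which is why the string field exists.
payload = json.loads('{"id": %d, "id_str": "%d"}' % (neighbour_id, neighbour_id))
print(payload['id'] == int(payload['id_str']))  # True
```

In this range doubles are spaced 128 apart, so any id whose low 7 bits are non-zero (for example, a non-zero sequence number) cannot round-trip through a JavaScript Number.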