# Confluence Tiny URLs and the magic of Base64

#confluence #base64 #encodings #urls

## What are tiny URLs in Confluence?

If you have ever shared a Confluence link – be it an internal wiki page, or a public Atlassian blog post – chances are you have used one of these URL formats:

1. **Pretty URL**, which includes the page title: https://wiki.softwareplant.com/display/DOCUMENTATION/About+BigPicture
2. **'Ugly' URL**, which includes the numeric page ID: https://wiki.softwareplant.com/pages/viewpage.action?pageId=201819180
3. **Tiny URL**, which is generated when you click on the 'Share' button: https://wiki.softwareplant.com/x/LIQHD

It is convenient to have a choice between pretty URLs and 'ugly' URLs. Pretty URLs convey the content of the page, but they break if the page title or space changes, can become very long, and are not always available (for example, when the title contains special characters). 'Ugly' URLs are bounded in length and always point to the page regardless of renaming, but they are noisy (the 'viewpage.action' bit carries no information).

Tiny URLs behave just like 'ugly' URLs, but are much easier on the eye. Besides being 'tweetable', they are easier to remember and type out.

And yet, they contain exactly the same information as an 'ugly' link, which uses the much longer numeric page ID. How is this achieved?

To answer this, we need to take a look at how an encoding can represent the same data in fewer characters and adapt it to a text-only medium like a URL.
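To make the idea concrete, here is a small sketch that counts how many characters are needed to write any value of a given bit width in a given base. The helper name `digits_needed` is made up for this illustration, and the 32-bit width is an assumption about the size of a page ID.

```python
def digits_needed(bits, base):
    """Number of base-`base` digits needed to write the largest `bits`-bit value."""
    largest = 2 ** bits - 1
    count = 0
    while largest > 0:
        largest //= base
        count += 1
    return count


# Compare decimal, hexadecimal and Base64 for a 32-bit value (assumed width)
for base in (10, 16, 64):
    print(f'base {base}: {digits_needed(32, base)} characters')
```

Each Base64 character carries 6 bits, against roughly 3.3 bits for a decimal digit, which is where the saving comes from.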

## A tiny URL string _is_ the page ID

Given that a tiny URL uniquely identifies a piece of Confluence content, it is (unsurprisingly) derived from the numeric page ID.

It is actually _exactly the same_ value as the page ID! But instead of a decimal (Base10) representation (like the number 84803642), it uses a Base64 representation: the exact same bytes, just rendered in a different encoding.
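As a quick illustration, the snippet below prints the page ID 84803642 in three notations. It assumes the ID is stored as a 4-byte little-endian integer, which is an assumption rather than something Confluence documents here.

```python
import base64

page_id = 84803642

# Assumption: the ID is a 32-bit unsigned integer in little-endian byte order
raw = page_id.to_bytes(4, 'little')

print(page_id)                                   # decimal notation
print(raw.hex())                                 # the same bytes in hexadecimal
print(base64.urlsafe_b64encode(raw).decode())    # the same bytes in Base64
```

The Base64 line still carries its `==` padding; dropping it is the only remaining step towards the tiny string.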

Below is a Python function to calculate the tiny URL string of a page, given the page ID, together with its inverse. Note that this is a best-effort adaptation of the Perl code shown in [this Atlassian KB article](https://confluence.atlassian.com/confkb/how-to-programmatically-generate-the-tiny-link-of-a-confluence-page-956713432.html).

```python
import base64


def pageid_to_tinystring(pageid):
    # 1. Turn the page ID into its bytestring representation.
    #    The page ID is treated as an unsigned 32-bit integer.
    #    Little-endianness is assumed (although not explicitly
    #    specified in the corresponding Perl function).
    pageid_bytes = int(pageid).to_bytes(4, 'little')

    # 2. Encode the bytes into base64 (with URL-safe characters)
    #    and decode the result into a str
    tinystring_raw = base64.urlsafe_b64encode(pageid_bytes).decode()

    # 3. Strip out any padding ('=' characters) and leading 'zero' bits
    #    (these become trailing 'A' characters in little-endian base64)
    tinystring = tinystring_raw.rstrip('=').rstrip('A')

    # 4. We have the tiny string!
    return tinystring


def tinystring_to_pageid(tinystring):
    # 1. Restore the stripped zero bits by appending 'A' characters until
    #    the length is a multiple of 4, which the decoder requires.
    #    (A fixed 'AA===' suffix fails for 3-character tiny strings,
    #    e.g. the one for page ID 256.)
    tinystring_padded = tinystring + 'A' * (-len(tinystring) % 4)

    # 2. Convert the base64 string into raw bytes
    pageid_bytes = base64.urlsafe_b64decode(tinystring_padded)

    # 3. Interpret those bytes as a little-endian unsigned integer
    #    (any extra high-order bytes are zero)
    return int.from_bytes(pageid_bytes, 'little')


# Example
pageid = 84803642
assert pageid == tinystring_to_pageid(pageid_to_tinystring(pageid))
```
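To go from a tiny string to a shareable link, the string is appended to the site's base URL after `/x/`, as in the tiny URL example at the top of this post. The helper below is a self-contained sketch of that step; `tiny_url` is a hypothetical name, and the 32-bit little-endian layout is the same assumption as above.

```python
import base64


def tiny_url(base_url, page_id):
    """Build <base_url>/x/<tiny string> for a page ID (hypothetical helper)."""
    raw = int(page_id).to_bytes(4, 'little')         # assumed 32-bit, little-endian
    encoded = base64.urlsafe_b64encode(raw).decode()
    tiny_string = encoded.rstrip('=').rstrip('A')    # drop padding, then zero bits
    return f"{base_url.rstrip('/')}/x/{tiny_string}"


print(tiny_url('https://wiki.softwareplant.com', 201819180))
# → https://wiki.softwareplant.com/x/LIQHD
```

The page ID and the resulting link are the 'ugly' and tiny URL examples from the first section, which shows that they point to the same page.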

Let's run our algorithm on a set of publicly available Confluence pages, and check that it computes the correct tiny URL string for each page ID and converts that string back to the same page ID:

```python
import requests

baseurl = 'https://tempo-io.atlassian.net'


def gen_page_tinyurls(baseurl):
    resp = requests.get(baseurl + '/wiki/rest/api/content?type=page&limit=20')
    resp.raise_for_status()
    pages = resp.json()['results']
    for page in pages:
        pageid = page['id']
        # The tiny link reported by the REST API ends with the tiny string
        tinystring = page['_links']['tinyui'].split('/')[-1]
        # Check both directions against the values reported by Confluence
        assert pageid_to_tinystring(pageid) == tinystring, \
            f'TinyURL calculation is incorrect for pageID {pageid}, tinyURL {tinystring}'
        assert tinystring_to_pageid(tinystring) == int(pageid), \
            f'PageID calculation is incorrect for tinyURL {tinystring}, pageID {pageid}'
        yield (pageid, tinystring)


print('PageID\t\tTinyURL string\n')
for vals in gen_page_tinyurls(baseurl):
    print('\t'.join(vals))
```

```
PageID		TinyURL string

84803642	OgAOBQ
135987203	AwAbC
136085540	JIAcC
136609846	NoAkC
142901377	gYCEC
143360025	GYCLC
143491152	UICNC
144867453	fYCiC
144932914	MoCjC
144932960	YICjC
149618744	OADrC
154763266	AoA5CQ
164561016	eADPCQ
164593691	G4DPCQ
164593760	YIDPCQ
164593767	Z4DPCQ
164659244	LIDQCQ
164659251	M4DQCQ
164692019	MwDRCQ
164692065	YQDRCQ
```
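The listing can also be spot-checked offline, without calling the REST API. The sketch below decodes three rows copied from the output above; `decode_tiny_string` is a hypothetical helper that assumes the same little-endian layout as the rest of this post.

```python
import base64

# A few (page ID, tiny string) rows copied from the output above
rows = [
    (84803642, 'OgAOBQ'),
    (135987203, 'AwAbC'),
    (164659244, 'LIDQCQ'),
]


def decode_tiny_string(tiny_string):
    """Decode a tiny string back to a page ID (assumes little-endian bytes)."""
    # Re-append the stripped zero bits as 'A' until the length is a multiple of 4
    padded = tiny_string + 'A' * (-len(tiny_string) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), 'little')


for page_id, tiny_string in rows:
    assert decode_tiny_string(tiny_string) == page_id
print('all sampled rows decode to their page IDs')
```

The five-character strings in the listing are those whose sixth Base64 character would have been `A`, which is stripped along with the padding.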


## References

- [How to programmatically generate the tiny link of a Confluence page](https://confluence.atlassian.com/confkb/how-to-programmatically-generate-the-tiny-link-of-a-confluence-page-956713432.html)
- [Will YouTube Ever Run Out Of Video IDs?](https://www.youtube.com/watch?v=gocwRvLhDf8)
- [URL Shortening (Wikipedia)](https://en.wikipedia.org/wiki/URL_shortening)