# Design a TinyURL application

- How many unique identifiers are possible? Will you run out of unique URLs?
- Should the identifier be incremental or not? Which is easier to design? Pros and cons?
- Mapping an identifier to a URL and its reversal - Does this problem ring a bell to you?
- How do you store the URLs? Does a simple flat file database work?
- What is the bottleneck of the system? Is it read-heavy or write-heavy?
- Estimate the maximum number of URLs a single machine can store.
- Estimate the maximum number of queries per second (QPS) for decoding a shortened URL in a single machine.
- How would you scale the service? For example, a viral link which is shared in social media could result in a peak QPS at a moment's notice.
- How could you handle redundancy? i.e., if a server is down, how could you ensure the service is still operational?
- Keep URLs forever or prune them? Pros and cons? How would we do the pruning?
- What API would you provide to a third-party developer?
- If you can enable caching, what would you cache and what's the expiry time?

## Solution

We're essentially creating a massive hash map from a real URL to a unique string alias that will be served at `http://www.tinyURL.com/<alias>`.

### How many can we support?


We'll make things easy on ourselves by making the aliases fixed-length strings using the character set [A-Z, a-z, 0-9], which, with 62 unique characters, can produce 62^L unique aliases, where L is the length of the string.

There are currently >1 billion unique URLs, so a 6-character string would give us ~56.8 billion aliases, and 7 characters would give us ~3.5 trillion.

To be on the safe side, we can choose an alias length of 7.
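
The capacity figures above can be checked with a few lines of Python. This is only a quick sanity check; the names `ALPHABET` and `alias_capacity` are made up for illustration.

```python
import string

# Character set assumed from the text: A-Z, a-z, 0-9 (62 characters).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def alias_capacity(length: int) -> int:
    """Number of distinct fixed-length aliases of the given length (62^L)."""
    return len(ALPHABET) ** length


for length in (6, 7):
    # 6 -> 56,800,235,584 and 7 -> 3,521,614,606,208
    print(length, f"{alias_capacity(length):,}")
```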

### Cost

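Per-row size, at 4 bytes per character and up to 2084 characters per URL:

- alias: 7 * 4 bytes = 28 bytes
- URL: 2084 * 4 bytes = 8336 bytes
- timestamp (long): 8 bytes
- counter (long): 8 bytes

Total: 28 + 8336 + 8 + 8 = 8380 bytes, or about 8.4 KB per row.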

At about 8.4 KB per row:

- 1M rows: 8.4 GB
- 1B rows: 8.4 TB
- 1T rows: 8.4 PB
- 5T rows: 42 PB (more rows than our ~3.5 trillion alias capacity)
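
The same estimate as a small script, in case you want to try other field sizes. It is a back-of-the-envelope sketch that uses decimal units (1 KB = 1000 bytes) and ignores index and storage-engine overhead; the constant and function names are illustrative. Because it works from the exact 8380 bytes rather than the rounded 8.4 KB, the largest figure may differ slightly from the list above.

```python
# Field sizes taken from the estimate above.
ALIAS_CHARS = 7
MAX_URL_CHARS = 2084
BYTES_PER_CHAR = 4
TIMESTAMP_BYTES = 8  # long
COUNTER_BYTES = 8    # long

ROW_BYTES = (
    ALIAS_CHARS * BYTES_PER_CHAR
    + MAX_URL_CHARS * BYTES_PER_CHAR
    + TIMESTAMP_BYTES
    + COUNTER_BYTES
)  # 8380 bytes


def total_bytes(rows: int) -> int:
    """Raw storage needed for the given number of rows."""
    return rows * ROW_BYTES


def human(num_bytes: int) -> str:
    """Format a byte count using decimal units (1 KB = 1000 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(units) - 1:
        value /= 1000
        index += 1
    return f"{value:.1f} {units[index]}"


for rows in (10**6, 10**9, 10**12, 5 * 10**12):
    print(f"{rows:>16,} rows -> {human(total_bytes(rows))}")
```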

### How to create the aliases?

Should we increment them?

We can build an incrementer directly into the hash, or we can keep track of the creation timestamp. If it's built into the hash, sorting by creation order is done without a join.
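
One possible reading of "build an incrementer into the hash" is to base-62 encode a global counter into the 7-character alias. The sketch below assumes that reading; `encode`, `decode`, and the alphabet ordering are illustrative choices rather than a prescribed design, and how the counter itself is generated and kept unique across machines is left out.

```python
import string

# Digits, then uppercase, then lowercase: this matches ASCII order, so
# zero-padded aliases sort as strings in the same order as their counters.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
ALIAS_LENGTH = 7


def encode(counter: int) -> str:
    """Turn a counter value into a fixed-length base-62 alias."""
    if not 0 <= counter < BASE ** ALIAS_LENGTH:
        raise ValueError("counter does not fit in the alias length")
    chars = []
    for _ in range(ALIAS_LENGTH):
        counter, remainder = divmod(counter, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode(alias: str) -> int:
    """Recover the counter value from an alias produced by encode()."""
    value = 0
    for ch in alias:
        value = value * BASE + ALPHABET.index(ch)
    return value


print(encode(0))           # 0000000
print(encode(62))          # 0000010
print(decode(encode(12345)))  # 12345
```

With this layout, sorting aliases as plain strings yields creation order, so no extra timestamp lookup is needed. The trade-off is that sequential aliases are easy to guess and enumerate.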