acksponies edited this page Aug 21, 2012 · 5 revisions

I created this project because I was not satisfied with the way existing shortening scripts generate short URLs. There are three basic problems:

* One, codes are typically sequential: they reveal how many target URLs your system is currently storing.
  * You may not mind this outright. But there are settings where you don't want to reveal the rate at which new URLs are being added to your system, or how many URLs are stored in total.
  * This may be considered proprietary information in some settings.
* Two, codes can end up containing weird, confusing, or unwanted character combinations, such as:
  * 'bad words' (racist, profane, vulgar, scatological). Maybe that possibility does not bother you personally, but what about professional settings?
  * confusing letter/number combinations and repeating characters (l1l, lll, 111). This is not necessarily unacceptable; it depends on your setting and on who the end users are.
* Three, what if you want to reserve certain character combinations ahead of time?
  * This is not particularly easy to do with existing algo-based scripts.

I considered creating an algo that generates codes non-sequentially. But even if a non-sequential id-to-code/code-to-id algo were feasible, I would still be left with my second problem.
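
To make that concrete, here is a minimal Python sketch of one possible non-sequential id-to-code scheme. It is not part of shrturl; the alphabet, the code length, the multiplier, and the function names `id_to_code` and `code_to_id` are all assumptions picked for illustration. It scrambles the id with a modular multiplication before base-36 encoding it, so consecutive ids do not produce consecutive codes.

```python
import string

# Assumed alphabet and code length; both are illustrative choices.
ALPHABET = string.digits + string.ascii_lowercase   # 36 symbols
CODE_LEN = 4
SPACE = len(ALPHABET) ** CODE_LEN                   # number of possible codes

# Any multiplier coprime with SPACE gives a reversible mapping.
# 1_000_003 is odd and not divisible by 3, so it shares no factor with 36**4.
MULTIPLIER = 1_000_003
INVERSE = pow(MULTIPLIER, -1, SPACE)                # modular inverse (Python 3.8+)


def id_to_code(n: int) -> str:
    """Map a sequential id to a scrambled fixed-length code."""
    if not 0 <= n < SPACE:
        raise ValueError("id out of range for this code length")
    x = (n * MULTIPLIER) % SPACE
    chars = []
    for _ in range(CODE_LEN):
        x, r = divmod(x, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def code_to_id(code: str) -> int:
    """Invert id_to_code: decode base 36, then undo the multiplication."""
    x = 0
    for ch in code:
        x = x * len(ALPHABET) + ALPHABET.index(ch)
    return (x * INVERSE) % SPACE
```

Because the mapping covers every possible code, every unwanted combination still belongs to some id: `id_to_code(code_to_id("dumb"))` gives back `"dumb"`. A scheme like this hides the sequence only superficially and does nothing about the second problem.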

Also, no algo that creates reasonably short URLs eliminates the need for persistent storage of the target URLs.

My solution (the shrturl project: http://github.com/katmore/shrturl) is to eliminate the algo, at the expense of storing the available short codes ahead of time.
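
The general idea of a pre-stored code pool can be sketched in a few lines of Python with SQLite. This is only an illustration under stated assumptions, not how shrturl itself is implemented: the table layout (a `code` column plus a nullable `target_url` column), the alphabet, and the function names `build_pool`, `shorten`, and `resolve` are all hypothetical.

```python
import itertools
import sqlite3
import string

# Assumed alphabet; a real pool would be filtered before being stored.
ALPHABET = string.ascii_lowercase + string.digits


def build_pool(conn: sqlite3.Connection, length: int = 2) -> None:
    """Store every code of the given length ahead of time, all unassigned."""
    conn.execute(
        "CREATE TABLE url_code (code TEXT PRIMARY KEY, target_url TEXT)"
    )
    codes = ("".join(p) for p in itertools.product(ALPHABET, repeat=length))
    conn.executemany(
        "INSERT INTO url_code (code) VALUES (?)", ((c,) for c in codes)
    )
    conn.commit()


def shorten(conn: sqlite3.Connection, target_url: str) -> str:
    """Claim a random unassigned code for target_url and return it."""
    row = conn.execute(
        "SELECT code FROM url_code WHERE target_url IS NULL "
        "ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    if row is None:
        raise RuntimeError("no unassigned codes left in the pool")
    conn.execute(
        "UPDATE url_code SET target_url = ? WHERE code = ?",
        (target_url, row[0]),
    )
    conn.commit()
    return row[0]


def resolve(conn: sqlite3.Connection, code: str):
    """Return the target URL for a code, or None if unknown or unassigned."""
    row = conn.execute(
        "SELECT target_url FROM url_code WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None
```

With this layout, codes are handed out in no particular order, and reserving a combination ahead of time amounts to assigning or deleting its row before anyone else can claim it.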

The url_code.sql provided with this project has had all codes removed that equate to ANY English dictionary word, as well as any that equate to vulgar, racist, or scatological words (or at least that was the intention). Feel free to submit an issue if you find any problematic codes.

As mentioned above, it might also be desirable to remove 'confusing' character combinations (such as repeating characters, all-numeric codes, etc.), but I have not done so in url_code.sql.
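
If you want to filter a code list yourself, a check along these lines could be a starting point. It is a rough Python sketch, not the procedure used to build url_code.sql: the tiny `BLOCKLIST`, the set of look-alike characters, the "three in a row" rule, and the function names are all assumptions you would tune for your own setting.

```python
import re

# Placeholder word list; a real one would come from a dictionary file
# plus a list of unwanted words.
BLOCKLIST = {"bad", "ugly", "word"}

# Characters assumed to be easy to mistake for one another.
LOOKALIKES = set("l1i0o")

# The same character three times in a row, e.g. "lll" or "111".
_TRIPLE = re.compile(r"(.)\1\1")


def is_confusing(code: str) -> bool:
    """True for all-numeric codes, triple repeats, or look-alike-only codes."""
    lowered = code.lower()
    if lowered.isdigit():
        return True
    if _TRIPLE.search(lowered):
        return True
    return bool(lowered) and set(lowered) <= LOOKALIKES


def is_acceptable(code: str, blocklist=BLOCKLIST) -> bool:
    """True if the code is not a blocked word and is not confusing."""
    lowered = code.lower()
    return lowered not in blocklist and not is_confusing(lowered)


def filter_codes(codes, blocklist=BLOCKLIST):
    """Keep only the acceptable codes, preserving their order."""
    return [c for c in codes if is_acceptable(c, blocklist)]
```

For example, `filter_codes(["k7x", "bad", "l1l", "lll", "111"])` keeps only `"k7x"`. The word check here is an exact match, mirroring the "equate to" wording above; matching words embedded inside longer codes would need a substring test instead.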

-Doug (http://www.linkedin.com/in/pauldouglasbird)
