DataDepreciates

Ben Christel edited this page Nov 13, 2021 · 1 revision

Stored data depreciates in value over its lifetime until it becomes a liability.

Consider the case of a NoSQL document store. When the app is young, you just throw any old thing in there. You don't worry about migrations or consistency, because the data is small and simple and there's not much that could get out of sync. Any unexpected null values or whatever can be patched over when the data is read.
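To make "patched over when the data is read" concrete, here is a minimal sketch of a read-time fixup. The document shape and the field names (`name`, `first_name`, `last_name`, `email`, `tags`) are hypothetical, invented only for illustration.

```python
def read_profile(doc: dict) -> dict:
    """Patch a stored profile document into the shape current code expects.

    All field names here are hypothetical, chosen only to illustrate the idea.
    """
    profile = dict(doc)  # copy, so the stored document is left untouched

    # Suppose the earliest version of the app stored a single "name" string.
    if "name" in profile and "first_name" not in profile:
        first, _, last = profile.pop("name").partition(" ")
        profile["first_name"] = first
        profile["last_name"] = last

    # Some writers stored null for a missing email; others omitted the key.
    if profile.get("email") is None:
        profile["email"] = ""

    # Suppose "tags" was added later, so older documents lack it.
    profile.setdefault("tags", [])

    return profile
```

Each fixup is cheap on its own. The trouble is that one accumulates for every shape the data has ever had, and every reader of the data has to apply all of them.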

As the application grows, so does the complexity of the data that you store. Data becomes interrelated and denormalized, and now you have bugs in your application as a result. Many of the bugs only affect users who have been around for a long time and have data that was stored by older versions of the code.

At this point, you'd like to migrate all your data to a new, better, normalized format. But there's so much of it, and it's in such a convoluted state, that you can't: the migration would be too complex and too expensive in CPU and I/O.

The need to maintain backwards compatibility with all this crufty data slowly drags your application into the muck. Every new feature you add has to contend with its complexity. You can't just get rid of it, because to do so would be to get rid of your users.

The data has become a liability. What was once a valuable asset, something you could charge rent on, has now become something that you are paying through the nose to maintain. The only thing more expensive than maintaining it (you think) would be deleting it.

Yes, data is an asset. But it's an asset that depreciates. At some point, its value becomes negative.

What should we do about it? Use a frigging schema! Normalize it. If you can, use a relational database (or several, sharded, perhaps). Establish an understanding with your users that old data will not stick around forever.
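As a rough sketch of what that advice might look like in practice, the example below uses Python's built-in `sqlite3` module. The tables (`users`, `notes`), their columns, and the helper names `open_db` and `purge_notes_older_than` are assumptions made up for illustration; a real application would pick its own schema and retention policy.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Open an in-memory database with a small, normalized, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE users (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE notes (
            id         INTEGER PRIMARY KEY,
            user_id    INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            body       TEXT NOT NULL,
            created_at TEXT NOT NULL  -- ISO 8601 date, e.g. '2021-11-13'
        );
        """
    )
    return conn


def purge_notes_older_than(conn: sqlite3.Connection, cutoff_date: str) -> int:
    """Delete notes created before cutoff_date; return how many were removed."""
    cur = conn.execute("DELETE FROM notes WHERE created_at < ?", (cutoff_date,))
    conn.commit()
    return cur.rowcount
```

The schema rejects malformed rows at write time, so readers never have to patch them up later, and the purge function turns "old data will not stick around forever" into something that can run on a schedule.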
