
Features that will change

Adam Hooper edited this page Jun 30, 2021 · 18 revisions

At Workbench, we do our utmost to preserve your data, your settings and your sanity.

We also learn as we go. We make mistakes. And when we do, we often find that every path forward is painful.

Here is our comprehensive list of Workbench's current features that we're likely to change.

There are no dates on this list! This isn't our roadmap. It's a heads-up for concerned users. If a feature isn't on this list, rest easy: it won't frustrate you in the foreseeable future.

Reducing automatic update rates for unpaid users

The feature: You tell Workbench to update your workflow periodically when something external (Twitter; an API; a Google Doc) changes.

What will change: We'll reduce the number of automated updates you're allowed, if you're unpaid.

Why: This feature is expensive! It's really, really expensive. Here's why.

Suppose you log in to Workbench every day and edit your workflow 50 times. We consider you a "power user". You're using Workbench far more than most active users.

Now, suppose that someone else -- let's call her Amy -- builds a workflow and schedules 50 automatic updates per day ... and then she drifts out of contact with us.

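Who do you expect will cost us more to serve: Amy, or you? Surprisingly, it's Amy. Automated updates happen at the beginning of a workflow, so each one makes Workbench re-process every step after it. That makes Amy's updates more expensive than your edits, and they keep running even though Amy has drifted away.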
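To make the comparison concrete, here is a minimal sketch. It assumes a toy cost model that this page doesn't spell out: changing a step makes Workbench re-run that step and every step after it. The function `steps_rerun`, the 10-step workflow and the even spread of edits are all hypothetical.

```python
def steps_rerun(n_steps: int, changed_index: int) -> int:
    """Count the steps re-run when the step at `changed_index` (0-based) changes.

    Assumption: the changed step and every step after it are re-run.
    """
    if not 0 <= changed_index < n_steps:
        raise ValueError("changed_index must point at a step in the workflow")
    return n_steps - changed_index


N_STEPS = 10  # hypothetical workflow length

# Hypothetical power user: 50 manual edits spread evenly across the steps.
edit_cost = sum(steps_rerun(N_STEPS, i % N_STEPS) for i in range(50))

# Amy: 50 automated updates, each changing the first step.
update_cost = sum(steps_rerun(N_STEPS, 0) for _ in range(50))

print(edit_cost, update_cost)
```

Under these assumptions, the 50 edits re-run 275 steps in total while the 50 automated updates re-run 500. Real costs depend on what each step does, so treat the numbers as an illustration only.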

What if I'm a paid user? When you subscribe to Workbench, we commit to providing you with the number of updates you pay for. We won't reduce that number.

Limiting how long a workflow can process

The feature: You give Workbench huge tables and lots of joins, and it takes two minutes to process every step. Your workflow runs for 30 minutes.

What will change: We'll reduce the allowed processing time -- especially if you're unpaid.

Why: We want Workbench to handle lots of workflows for lots of people.

Right now, Workbench's design has a small loophole: if a user updates a workflow every 15 minutes, and Workbench takes 20 minutes to process that workflow, then Workbench will end up processing that workflow forever.
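The arithmetic behind this loophole can be sketched in a few lines. This is a simplified model, not a description of Workbench's scheduler: it assumes a single worker, and that every update queues one full run of the workflow. The function name `simulate` and its parameters are made up for illustration.

```python
def simulate(update_every: int, processing_time: int, n_updates: int):
    """Return (idle_minutes, lag_minutes) for one worker and one workflow.

    Updates arrive every `update_every` minutes; each run takes
    `processing_time` minutes. Assumption: runs are queued, never dropped.
    """
    finish = 0  # minute at which the worker finishes its current run
    idle = 0    # total minutes the worker spent with nothing to do
    for k in range(n_updates):
        arrival = k * update_every
        start = max(arrival, finish)
        idle += max(0, arrival - finish)
        finish = start + processing_time
    # How far behind the last run finished, compared with an empty queue.
    on_time_finish = (n_updates - 1) * update_every + processing_time
    return idle, finish - on_time_finish


# One day of 15-minute updates is 96 updates.
print(simulate(update_every=15, processing_time=20, n_updates=96))
print(simulate(update_every=15, processing_time=10, n_updates=96))
```

With 20-minute runs, the worker is never idle and ends the day 475 minutes behind; the backlog grows by five minutes with every update. With 10-minute runs, the worker idles for 475 minutes and never falls behind.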

As of June 2021, Workbench hasn't had this problem. When it does, we'll close the loophole by stopping workflows that run too long.

What if I'm a paid user? When we add this restriction, we'll scan our logs to determine whether you'll be affected. If so, we'll contact you well ahead of time.

Replacing the "formula" module

The feature: When Workbench doesn't have the perfect step for you, you can fall back to the "Formula" step. It lets you write Excel formulas like A1 + C1.

What will change: We're going to build a new step that isn't exactly like Excel.

Why: Because heck, the existing step isn't exactly like Excel either, and it's not as pleasant as we want.

Workbench's model isn't the same as Excel's. Excel stores dates and timestamps as numbers; Excel allows "error" values; Excel lets you highlight arbitrary cells. Workbench does none of these.

Plus, writing columns in Excel format ("A1", etc.) has pain points. Excel itself auto-rewrites formulas when you reorder columns. Workbench never auto-rewrites anything. So in Workbench, it's painful to edit the steps that come before a Formula step: change the column order, and you need to fix the formula by hand.
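Here is a small sketch of why positional references are fragile. It is not Workbench's formula parser: the helper `resolve` and the column names are hypothetical, and it only handles single letters A to Z.

```python
from string import ascii_uppercase
from typing import List


def resolve(letter: str, column_names: List[str]) -> str:
    """Map an Excel-style column letter to the column at that position."""
    return column_names[ascii_uppercase.index(letter)]


before = ["price", "region", "tax"]
after = ["region", "price", "tax"]  # an earlier step reordered the columns

# The formula text "A1 + C1" stays the same in both cases...
print([resolve(letter, before) for letter in "AC"])  # price and tax
# ...but after the reorder, "A" points at a different column.
print([resolve(letter, after) for letter in "AC"])   # region and tax
```

Nothing in the formula text changed, yet it now adds a different pair of columns. Excel avoids this by rewriting the formula for you; a step that refers to columns by name would avoid it too.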

All told, we aren't proud of the "formula" step. We'll do better.

What if I'm a paid user? Don't worry -- whether you're paid or not. We're going to be extra-considerate with all our users when we release this enhancement.

Revamping the API

The feature: Workbench lets you expose any step's output as an "API", with CSV and JSON formats.

What will change: Everything. That means authentication, URLs, pricing, and which steps can be exported.

Why: We never built an "on" switch! Without one, the API conflicts with Workbench's model.

Workbench's model is: "source data" + "steps". Given the same source data and steps, Workbench produces the same tables.

This lends itself to a very common data structure: a "cache". We "cache" each step's table, to serve you more quickly. Those cached tables are temporary. We delete them whenever source data changes. And we delete every cached table occasionally, during some Workbench upgrades. Deletion shouldn't impact you because the whole point of Workbench is that "source data" + "steps" gives the same result every time.
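The idea can be sketched as follows. This is a toy illustration, not Workbench's cache: the step names `add` and `keep_over`, the class `TableCache` and the hashing scheme are all assumptions. The point is only that a deleted table can always be rebuilt from "source data" + "steps".

```python
import hashlib
import json


def cache_key(source_data, steps) -> str:
    """Derive a key from the inputs alone: same inputs, same key."""
    payload = json.dumps({"source": source_data, "steps": steps}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_steps(source_data, steps):
    """Apply toy steps in order. Each step is a hypothetical [name, argument] pair."""
    table = list(source_data)
    for name, arg in steps:
        if name == "add":
            table = [value + arg for value in table]
        elif name == "keep_over":
            table = [value for value in table if value > arg]
        else:
            raise ValueError(f"unknown step: {name}")
    return table


class TableCache:
    def __init__(self):
        self._tables = {}
        self.computations = 0  # how many times the real work had to be done

    def get(self, source_data, steps):
        key = cache_key(source_data, steps)
        if key not in self._tables:
            self._tables[key] = run_steps(source_data, steps)
            self.computations += 1
        return self._tables[key]

    def clear(self):
        """Simulate an upgrade that deletes every cached table."""
        self._tables.clear()


cache = TableCache()
source = [1, 5, 10]
steps = [["add", 2], ["keep_over", 5]]

first = cache.get(source, steps)
cache.clear()
second = cache.get(source, steps)  # recomputed, with an identical result
print(first == second, cache.computations)
```

Because the key depends on the source data, new source data never matches an old entry. That is one way to get the "delete whenever source data changes" behavior; Workbench may do it differently.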

Unfortunately, today's "API" feature is, well, a hack. It simply serves the cached table. If we've deleted the cached table, Workbench responds with an error message (and tries to cache a table again in the near future). These API errors are common, and they affect most users. We need to fix this oversight. Our solution is to stop serving from the cache.

And we can't fix today's feature because there's no "on" switch. We want to store API-ready tables for our users ... and we don't know which tables users are using!

We're going to build a user interface that lets you "turn on" APIs. This new API will cost us more, and so it'll be paid. We're not sure about the pricing yet or what we'll offer our free tier.
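The difference between today's behavior and an "on" switch can be sketched like this. Every name here (`ApiSketch`, `turn_on_api`, `serve_today`, `serve_with_switch`) is hypothetical, and `None` stands in for whatever error the real API returns. It shows one possible design, not the one we'll necessarily ship.

```python
from typing import Dict, List, Optional, Set

Table = List[dict]


class ApiSketch:
    def __init__(self) -> None:
        self.cache: Dict[str, Table] = {}      # temporary; may be wiped at any time
        self.api_enabled: Set[str] = set()     # steps whose API is "turned on"
        self.published: Dict[str, Table] = {}  # stored tables; survive cache wipes

    def turn_on_api(self, step_id: str) -> None:
        self.api_enabled.add(step_id)
        if step_id in self.cache:
            self.published[step_id] = self.cache[step_id]

    def render(self, step_id: str, table: Table) -> None:
        """Record a step's freshly computed table."""
        self.cache[step_id] = table
        if step_id in self.api_enabled:
            self.published[step_id] = table

    def wipe_cache(self) -> None:
        self.cache.clear()

    def serve_today(self, step_id: str) -> Optional[Table]:
        """Serve straight from the cache; None stands in for an API error."""
        return self.cache.get(step_id)

    def serve_with_switch(self, step_id: str) -> Optional[Table]:
        """Serve only tables that were explicitly stored for the API."""
        return self.published.get(step_id)


api = ApiSketch()
api.turn_on_api("step-1")
api.render("step-1", [{"city": "Montreal", "count": 3}])
api.wipe_cache()

print(api.serve_today("step-1"))        # None: the cached table is gone
print(api.serve_with_switch("step-1"))  # the stored table is still there
```

The switch matters because storing a table has a cost. Without it, there's no way to tell which of the many cached tables anyone actually reads through the API.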

All told, here's what you can expect:

  • You will be able to build an API from any workflow and choose which tables that API serves.
  • Your new APIs will come with a new URL scheme ... and we will phase out the old URLs.
  • If you're a paid user, you can use the "secret link" feature as a (weak) authentication scheme.
  • Some API features will require payment.

There will be a transition period in which the old URLs continue to work as they do today.

What if I'm a paid user? When we introduce the new API feature, we'll contact you personally to make sure we keep you happy.

Storing one "Version" per update

The feature: Workbench stores old "versions" of data, so you can revisit them. The version list omits "no-change" versions.

What will change: Workbench will no longer omit "no-change" versions. Every update will produce a version.

(When you're out of space, Workbench deletes old versions to make way for new ones. With Workbench storing a new version with every update, old versions will be deleted sooner in many workflows.)

Why: We never defined "version". On the Web, every request tends to produce slightly different data. Workbench fibs when it declares an update to be "no-change".

For instance: if a Twitter search produces the same tweets but with different retweet counts, should Workbench consider the new table to be "no-change"? If a second request to the same HTTP URL produces the exact same data with a more up-to-date timestamp, should Workbench consider the new timestamp a "change"? Answers to these questions will satisfy some users and frustrate others.

Workbench already has a feature that satisfies all users and frustrates none: workflows! Turn on alerts to find out when data changes. If you don't care about retweet counts, delete that column.
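The retweet-count question can be sketched in plain Python. The tweet data and the helper `without_columns` are made up for illustration; in Workbench you would delete the column with a step rather than with code.

```python
def without_columns(rows, columns):
    """Return a copy of `rows` with the named columns removed."""
    return [
        {key: value for key, value in row.items() if key not in columns}
        for row in rows
    ]


first_fetch = [
    {"id": 1, "text": "hello", "retweets": 3},
    {"id": 2, "text": "world", "retweets": 0},
]
second_fetch = [
    {"id": 1, "text": "hello", "retweets": 5},  # same tweet, more retweets
    {"id": 2, "text": "world", "retweets": 0},
]

# Compared as fetched, the two versions differ.
raw_changed = first_fetch != second_fetch

# With the retweet column deleted, they are identical.
trimmed_changed = (
    without_columns(first_fetch, {"retweets"})
    != without_columns(second_fetch, {"retweets"})
)

print(raw_changed, trimmed_changed)
```

Whether the second fetch counts as a "change" depends entirely on which columns the workflow keeps. That decision belongs in your steps, where you can see it, rather than in a hidden rule.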

We think Workbench will feel simpler and more transparent with this change. And we haven't spoken to anybody who seems to mind.

What if I'm a paid user? We're pretty sure you won't mind this change. Please contact us if you're worried.

Frequent breaking changes to the "Python" step

The feature: Workbench lets you write arbitrary code in a "Python" step (with Pandas and Numpy support).

What will change: We don't support Python code. On any day, any Python step may start producing different results. Steps that use the Internet are most at risk.

Why: Two reasons: A) Python is hard to support; and B) Workbench is designed to replace Python.

It's hard to support Python. Whenever we deliver a new feature, we may break Python code. For example, we want to upgrade to Pandas 1 to help users take advantage of its nullable integer columns; but as of June 2021, Workbench is still stuck on 2-year-old Pandas 0.25 because an upgrade will certainly break some users' Python steps. Someday, we're going to drop networking support from the Python step entirely, to strengthen Workbench's model of "source data" + "steps"; again, this will certainly break some users' Python steps. (Python steps that access the Internet can behave differently every time they run.)

Remember Workbench's mission: we make data easy to understand and share. For most people, Python code doesn't fit that definition. Workbench's big bet is that you can do all the analysis you need without Python. We compete with Python, so we're not going to invest endless effort in supporting it.

We consider Python steps temporary. Python gives great value, even if your code may break in the future. Python lets users write code where a Workbench step doesn't exist yet; Python lets educators teach Pandas 101, for a class-wide "a-ha" moment; Python lets coders find a quick-and-dirty answer to see whether it's worth fleshing out a workflow that can be understood and shared.

You're free to use Python. It will be around forever. But be aware: every Python step that works today may break tomorrow. That may mean an error message; it could also mean a different result. Consider your Python code -- and any workflow with a Python step -- temporary. You'll need to edit it every time Workbench changes.

We won't notify you of these changes, because that would slow us down too much. Workbench changes hundreds of times each year.

What if I'm a paid user? If you're using Python and you want your workflow to last forever, please tell us what Python is doing for you. We build solutions that last. We'd love to tackle your problem.