
Conversation

@jseldess (Contributor) commented May 17, 2019

  • Overview
  • Single-region
    • Development
    • Basic Production
  • Multi-region
    • Geo-Partitioned Replicas
    • Geo-Partitioned Leaseholders
    • Pinned Secondary Indexes
    • Follower Reads
    • Follow-the-Workload

Fixes #4574.

Future improvements/considerations tracked in #4991.

@cockroach-teamcity (Member) commented:
This change is Reviewable

@jseldess jseldess changed the title Topology patterns, v2 [WIP] Topology patterns, v2 May 17, 2019
@jseldess jseldess force-pushed the cluster-patterns-v2 branch 3 times, most recently from 67a446c to dfcb17f on May 17, 2019 18:21
@jseldess jseldess force-pushed the cluster-patterns-v2 branch 4 times, most recently from 3fbf703 to 797086e on May 31, 2019 18:40
@jseldess jseldess force-pushed the cluster-patterns-v2 branch 4 times, most recently from b87ff43 to 23a6bfd on June 13, 2019 02:08
@jseldess jseldess force-pushed the cluster-patterns-v2 branch 2 times, most recently from fc7682a to 3469000 on June 13, 2019 04:35
@jseldess (Contributor, Author) commented Jul 3, 2019


v19.1/follower-reads.md, line 27 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This change looks like it belongs in another PR.

Removed.

@rkruze (Contributor) commented Jul 3, 2019


_includes/v19.1/topology-patterns/multi-region-cluster-setup.md, line 14 at r3 (raw file):

- Region-specific app instances and load balancers
    - Each load balancer redirects to CockroachDB nodes in its region.
    - When CockroachDB nodes are unavailable in a region, the load balancer redirects to nodes in other regions.

If we mention this we should have a section someplace in the docs about how to setup backup servers in haproxy.

@rkruze (Contributor) commented Jul 3, 2019


v19.1/topology-basic-production.md, line 50 at r3 (raw file):

#### Reads

Since all ranges, including leaseholder replicas, are in a single region, reads are very fast.

I think it would be good to talk about things in terms of latency. So here we could say that "reads will have low latency."

@rkruze (Contributor) commented Jul 3, 2019


v19.1/topology-basic-production.md, line 64 at r3 (raw file):

#### Writes

Just like for reads, since all ranges are in a single region, writes can achieve consensus very fast.

"Writes will also have low latency."

@rkruze (Contributor) commented Jul 3, 2019


v19.1/topology-basic-production.md, line 98 at r3 (raw file):

- Other system ranges contain information needed to allocate new table IDs and track the status of a cluster's nodes.

For the cluster as a whole to remain available, these system ranges must retain a majority of their replicas, so CockroachDB comes with [pre-configured replication zones](configure-replication-zones.html#create-a-replication-zone-for-a-system-range) for them with the replication factor increased to `5` by default. Based on the [cluster setup](#configuration) described above, with each node started with the `--locality` flag specifying its AZ, each system range's replicas are spread across 3 AZs. This means that one entire AZ can fail without causing any system range to lose consensus:

One caveat here is that I don't believe it goes to '5' replicas if there are only 3 machines in the cluster.
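
For reference, a sketch of inspecting and adjusting one of those pre-configured system zones (v19.1 syntax; as noted above, on a 3-node cluster the extra replicas have nowhere to be placed even if the zone asks for 5):

    SHOW ZONE CONFIGURATION FOR RANGE meta;
    ALTER RANGE meta CONFIGURE ZONE USING num_replicas = 5;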

@rkruze (Contributor) commented Jul 3, 2019


v19.1/topology-follower-reads.md, line 9 at r2 (raw file):

v19.1/topology-follower-reads.md

I often talk about follower reads being good for queries against tables which have fewer updates.

@rkruze (Contributor) commented Jul 3, 2019


v19.1/topology-follower-reads.md, line 10 at r3 (raw file):

- The table is read frequently but updated infrequently.
- Reads must be fast, but writes can be slower.

I'm not sure this should be here, there is nothing about follower reads which slows down writes from any other topology.

@rkruze (Contributor) commented Jul 3, 2019


v19.1/topology-follower-reads.md, line 13 at r3 (raw file):

- Reads can be historical (48 seconds or more in the past).
- Rows in the table **cannot** be tied to specific geographies (e.g., a reference table).
- The table must remain available during a region failure.

How is this related to follower reads?

@jseldess jseldess force-pushed the cluster-patterns-v2 branch from cfc5777 to e446dc3 on July 8, 2019 05:59
@jseldess (Contributor, Author) left a comment

Another round of edits. @bdarnell, @nvanbenschoten, please take another look. If you can accept my punting some issues to the next iteration (tracked here), I'd love to get this merged.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @bdarnell, @nvanbenschoten, @rkruze, @robert-s-lee, and @timveil)


_includes/v19.1/topology-patterns/multi-region-cluster-setup.md, line 10 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'd drop this sentence. One VM per AZ is fine, and is expensive enough.

Done.


_includes/v19.1/topology-patterns/multi-region-cluster-setup.md, line 14 at r3 (raw file):

Previously, rkruze (Roko) wrote…

If we mention this we should have a section someplace in the docs about how to setup backup servers in haproxy.

I agree. We have an issue for that (#4388), though I'm not sure when we'll get to it.


v19.1/multi-region-topologies.md, line 32 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

3 VMs per region are only required for the geo-partitioning cases. Pinned index leaseholders, follower reads and follow the workload are all fine with one VM per region.

This current iteration takes the simplistic approach of assuming there are 3 AZs per region and so at least 1 VM per AZ. Clearly, this will be excessive in some cases. I've added this as something to revisit in the next iteration: #4991.


v19.1/multi-region-topologies.md, line 240 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Now that I've split the patterns into separate pages, I might add anti-patterns on each page.

For now, just added an "Anti-patterns" list to the topology overview page.


v19.1/multi-region-topologies.md, line 283 at r1 (raw file):
In our current docs, we explicitly say:

Currently, follower reads are available for any read operation at least 48 seconds in the past, though there is active work to reduce that window.

Given the thread above, I think we need to figure out the exact messaging and guidance for follower reads more generally. It's broader than this PR. I've created this issue: #5024.
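
For context, a minimal follower-read sketch (assumes v19.1 and an enterprise license; table and column names are hypothetical): the built-in picks a timestamp far enough in the past, roughly that 48-second window, for the read to be served by the nearest replica:

    SELECT code FROM postal_codes
        AS OF SYSTEM TIME experimental_follower_read_timestamp()
        WHERE id = 5;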


v19.1/multi-region-topologies.md, line 317 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It might also be worth mentioning that (like the other patterns), this can be combined with leaseholder preferences or follow the workload to avoid the gateway -> leaseholder hop during writes. This pattern works well when applications in multiple regions may be reading but only applications in a single region are performing writes.

I'd like to hold off on further detail about using these patterns in combination for the next iteration. Added to this issue: #4991.


v19.1/multi-region-topologies.md, line 335 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's the default and used for the system tables, and it's the only pattern available without an enterprise license. It's important to discuss it.

OK, keeping it, with some tweaks.


v19.1/multi-region-topologies.md, line 341 at r1 (raw file):

Previously, awoods187 (Andy Woods) wrote…

that's why i like calling it the "default" pattern

Done.


v19.1/topology-basic-production.md, line 50 at r3 (raw file):

Previously, rkruze (Roko) wrote…

I think it would be good to talk about things in terms of latency. So here we could say that "reads will have low latency."

Done. Changed fast/slow to "latency"-focused language throughout.


v19.1/topology-basic-production.md, line 64 at r3 (raw file):

Previously, rkruze (Roko) wrote…

"Writes will also have low latency."

Done.


v19.1/topology-basic-production.md, line 98 at r3 (raw file):

Previously, rkruze (Roko) wrote…

One caveat here is that I don't believe it goes to '5' replicas if there are only 3 machines in the cluster.

Done.


v19.1/topology-follower-reads.md, line 9 at r2 (raw file):

Previously, rkruze (Roko) wrote…
v19.1/topology-follower-reads.md

I often talk about follower reads being good for queries against tables which have fewer updates.

These bullets are listing characteristics of tables that make them good candidates for follower reads. The first bullet is saying, if the table is read frequently but updated infrequently, that's one sign that follower reads might be a good pattern for it. But this and some other comments below make me think the first bullets of all of these patterns, where I say something like "The table is read xxx and updated xxx" are not helping. I'll remove them and start with the more direct latency statements.


v19.1/topology-follower-reads.md, line 41 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider mentioning that this pattern is compatible with all of the other patterns except geo-partitioned replicas.

Added to tip in intro.


v19.1/topology-follower-reads.md, line 10 at r3 (raw file):

Previously, rkruze (Roko) wrote…

I'm not sure this should be here, there is nothing about follower reads which slows down writes from any other topology.

Well, the idea is that reads are in-region and fast while writes are cross-region and slow. Even though reads are the focus in this and some of the other patterns, I think it's important to signal the impact on writes. In each case, the impact is spelled out in more detail in the Latency section.


v19.1/topology-follower-reads.md, line 13 at r3 (raw file):
It's described in the resiliency section:

Because this pattern balances the replicas for the table across regions, one entire region can fail without interrupting access to the table.


v19.1/topology-geo-partitioned-leaseholders.md, line 9 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
updated infrequently

That's not really true. The table can be frequently updated and the pattern is fine as long as the application can tolerate the additional replication latency.

As mentioned above, I removed this first bullet, and the corresponding first bullets in other patterns, in favor of focusing on latency requirements.


v19.1/topology-geo-partitioned-leaseholders.md, line 20 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

"If rows and all latency-sensitive queries cannot be tied to specific geographies" (wherever messages like this appear). It's often easy to tie the rows to a region but much harder to ensure that the region key appears in all queries.

Done, though I think we're bumping up against the limits of these docs in terms of clarity vs. completion.


v19.1/topology-geo-partitioned-leaseholders.md, line 99 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

By itself, I don't think lease preferences are enough to actually ensure that a replica of a Range will always be placed in the region that we want to give a lease preference. This won't have an effect on a 3 region cluster, but will for a cluster with more than 3 regions. To address this, I think we need to specify the zones like:

ALTER PARTITION la OF TABLE users CONFIGURE ZONE USING constraints = '{"+us-west":1}', lease_preferences = '[[+region=us-west]]'; 

@bdarnell could you confirm this?

Done.


v19.1/topology-geo-partitioned-leaseholders.md, line 154 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This visualization gets a little crazy because three writes are happening at once. We should consider doing only a single write at once so that a reader can track the path of a single write.

Good call. Done, also for the write animation for geo-partitioned replicas.


v19.1/topology-geo-partitioned-leaseholders.md, line 160 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should mention that because the replicas for each range are spread across regions, the leaseholder can failover to another region if the region with its preference goes down.

Done. Also updated the image to show this.


v19.1/topology-geo-partitioned-replicas.md, line 9 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm not sure we should have the "frequently" language here - what matters is whether the application can tolerate slowness or if it needs low latency. Usually the frequency will be correlated with the performance requirements but not necessarily.

Removed this first bullet and the corresponding first bullets in the other patterns.


v19.1/topology-geo-partitioned-replicas.md, line 12 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Add "It's OK for regional data to become unavailable during a region-wide failure." (to emphasize the contrast with geo-partitioned leaseholders)

Done.


v19.1/topology-geo-partitioned-replicas.md, line 22 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Maybe it's just because I'm reviewing this all at once, but I find the amount of repetition in this presentation annoying. It's hard to pick out what's actually different between the patterns.

I'm using includes for prerequisites that are common across patterns. I know this isn't a perfect solution, but it's the best I can come up with in this iteration, I think.


v19.1/topology-geo-partitioned-replicas.md, line 50 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'd recommend a region column instead of a city one. There's too much bookkeeping with any realistic list of cities.

There's some value in using region names here that don't necessarily correspond to your locality regions, so you have a level of indirection (store records with region=eu-central now, even though they're mapped to locality us-east, so that you have less to change when you do start to have servers in europe), but going all the way down to cities is impractical.

I hear your point, but I'd prefer to stick with cities for now, as I'd have to update a ton of images, and this approach is already modeled by MovR and a few of our other tutorials.
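
For reference, the city-based list partitioning modeled by MovR looks roughly like this (a sketch only; assumes an enterprise license and that city leads the primary key, and the city-to-partition mapping is hypothetical):

    ALTER TABLE users PARTITION BY LIST (city) (
        PARTITION la VALUES IN ('los angeles'),
        PARTITION ny VALUES IN ('new york'),
        PARTITION DEFAULT VALUES IN (DEFAULT)
    );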


v19.1/topology-geo-partitioned-replicas.md, line 60 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We should be clear that the secondary index is not required for this pattern; it's fine to have a geo-partitioned table with no secondary indexes. Conversely, if you have multiple secondary indexes, you must partition all of them (which means that they must all start with your city/region column, and this therefore impacts which queries they'll be useful for) to ensure the fast writes promised by this pattern. If you can't partition all your secondary indexes, you're likely better off with geo-partitioned leaseholders instead.

Done.
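
For concreteness, a sketch of what "partition all of them" looks like (index and column names are hypothetical): each secondary index must lead with the city column and be partitioned the same way as the table, then given matching zone constraints:

    CREATE INDEX users_name_idx ON users (city, name);
    ALTER INDEX users@users_name_idx PARTITION BY LIST (city) (
        PARTITION la_idx VALUES IN ('los angeles'),
        PARTITION ny_idx VALUES IN ('new york')
    );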


v19.1/topology-geo-partitioned-replicas.md, line 89 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/a distinct range/distinct ranges/ (each partition may have multiple ranges if the table is big enough)

Done.


v19.1/topology-geo-partitioned-replicas.md, line 125 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should clarify that reads and writes that span multiple partitions are still possible and transactionally consistent, they just won't have local latencies.

Done.
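
For example (using the hypothetical city-partitioned users table sketched above), a query that omits the city filter still works and remains transactionally consistent; it simply reads from every partition and pays cross-region latency:

    SELECT city, count(*) FROM users GROUP BY city;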


v19.1/topology-geo-partitioned-replicas.md, line 139 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Add "or writes that do not specify a region key".

Done.


v19.1/topology-geo-partitioned-replicas.md, line 167 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The system tables cannot be configured to use partitioning, so I don't think it makes sense to duplicate the cluster resiliency info here or on the other patterns.

Are you suggesting to remove the cluster resiliency details entirely? I only added them in response to feedback/discussion in a core meeting. It's true that these cluster resiliency details don't connect to the particular patterns, but I'm not sure where else to put them...


v19.1/topology-geo-partitioned-replicas.md, line 169 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove this empty heading

Sorry. Meant to add a link to the geo-partitioning tutorial and video. I'm hoping to add tutorials for the other patterns over time as well.


v19.1/topology-pinned-index-leaseholders.md, line 41 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Mention that this pattern is suited well for immutable/reference tables that are rarely or never updated.

Added to intro.


v19.1/topology-pinned-index-leaseholders.md, line 67 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Mention that if you have more than three regions, you'd create an index for all but one of them (the one you specified in the previous step).

Reworded.
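
Roughly, and with hypothetical table and column names, the idea is to keep the table's leaseholder in one region and add a duplicate index pinned to each of the others (repeated per additional region):

    ALTER TABLE postal_codes CONFIGURE ZONE USING
        constraints = '{"+region=us-east": 1}',
        lease_preferences = '[[+region=us-east]]';

    CREATE INDEX postal_codes_idx_west ON postal_codes (id) STORING (code);
    ALTER INDEX postal_codes@postal_codes_idx_west CONFIGURE ZONE USING
        constraints = '{"+region=us-west": 1}',
        lease_preferences = '[[+region=us-west]]';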


images/v19.1/topology_basic_production1.png, line at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

For all the graphics: There is a convention (from UML?) to represent storage devices like databases as cylinders and to use other shapes for applications and other components. Maybe the application should be a box instead of a cylinder. (I hope these are automated in some way; if they're not it's not worth changing)

Updating these images/animations takes me a long time, unfortunately, so I'm going to punt this to the next iteration: #4991.

@bdarnell (Contributor) left a comment

Reviewed 18 of 22 files at r4, 3 of 14 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @bdarnell, @jseldess, @nvanbenschoten, @rkruze, @robert-s-lee, and @timveil)


v19.1/topology-geo-partitioned-leaseholders.md, line 20 at r2 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Done, though I think we're bumping up against the limits of these docs in terms of clarity vs. completion.

True, but this is really important - if you can't incorporate the region key into all your queries, everything will turn into full table scans and destroy your performance. This is the most important downside to the geo-partitioned patterns. They give the best performance but require this very intrusive change to your application.
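
To make this concrete (hypothetical values, using the city-partitioned users sketch from earlier):

    -- Filter includes the partition key: the optimizer can read a single partition.
    SELECT * FROM users WHERE city = 'los angeles' AND id = $1;

    -- Filter omits the partition key: every partition (and region) must be scanned.
    SELECT * FROM users WHERE id = $1;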


v19.1/topology-geo-partitioned-replicas.md, line 167 at r2 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Are you suggesting to remove the cluster resiliency details entirely? I only added them in response to feedback/discussion in a core meeting. It's true that these cluster resiliency details don't connect to the particular patterns, but I'm not sure where else to put them...

Yes, I think I'd get rid of the cluster resiliency section completely. Instead, on geo-partitioned-replicas (which is the only one where there's something more complex to discuss), drop the discussion of simultaneous "one AZ per region" failures and only talk about one AZ failure at a time.

(If you really want to go into more detail here you could talk about how with default configurations you can survive two simultaneous AZ failures in different regions, and with increased replication factor you could survive more)
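
As a sketch, the increased-replication-factor option mentioned here is just a zone change (table name hypothetical):

    ALTER TABLE users CONFIGURE ZONE USING num_replicas = 5;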


v19.1/topology-pinned-index-leaseholders.md, line 24 at r4 (raw file):

{{site.data.alerts.end}}

If rows and all latency-sensitive queries **cannot** be tied to specific geographies

This sentence looks incomplete.

@jseldess jseldess force-pushed the cluster-patterns-v2 branch from 6c87b15 to dc4b027 on July 10, 2019 05:23
@jseldess (Contributor, Author) left a comment

One final round of revisions to simplify diagrams, remove cluster-wide resiliency details, rename pinned index leaseholders to duplicate indexes, and make various other small corrections and improvements.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @bdarnell, @jseldess, @nvanbenschoten, @rkruze, @robert-s-lee, and @timveil)


v19.1/topology-geo-partitioned-leaseholders.md, line 20 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

True, but this is really important - if you can't incorporate the region key into all your queries, everything will turn into full table scans and destroy your performance. This is the most important downside to the geo-partitioned patterns. They give the best performance but require this very intrusive change to your application.

Good point. I liked your idea of calling out pros/cons for each pattern. When we do that, this will definitely need to be called out somehow. Not sure if it's really a "con", but it is intrusive, as you say.


v19.1/topology-geo-partitioned-replicas.md, line 167 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Yes, I think I'd get rid of the cluster resiliency section completely. Instead, on geo-partitioned-replicas (which is the only one where there's something more complex to discuss), drop the discussion of simultaneous "one AZ per region" failures and only talk about one AZ failure at a time.

(If you really want to go into more detail here you could talk about how with default configurations you can survive two simultaneous AZ failures in different regions, and with increased replication factor you could survive more)

Done.


images/v19.1/topology_basic_production1.png, line at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Updating these images/animations takes me a long time, unfortunately, so I'm going to punt this to the next iteration: #4991.

Actually managed to update the graphics in this PR.

@jseldess jseldess force-pushed the cluster-patterns-v2 branch from dc4b027 to bafd537 on July 10, 2019 05:52
@jseldess jseldess force-pushed the cluster-patterns-v2 branch from bafd537 to 2f79b37 on July 10, 2019 06:07
@jseldess jseldess force-pushed the cluster-patterns-v2 branch from b716f9d to 4589dcc on July 10, 2019 06:35
@cockroach-teamcity (Member) commented:
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/4589dcced1bcf51642e53a7d852a3162bc178734/

Edited pages:

@jseldess jseldess force-pushed the cluster-patterns-v2 branch from 4589dcc to 8be4bae on July 10, 2019 06:40
@cockroach-teamcity (Member) commented:
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/8be4baeebff8baa8bd42ed568b768cefc3ac0b52/

Edited pages:

@jseldess jseldess merged commit 7216d22 into master Jul 10, 2019
@jseldess jseldess deleted the cluster-patterns-v2 branch July 10, 2019 06:46