-
Notifications
You must be signed in to change notification settings - Fork 6
/
02-02_-_Data_Modeling_Cassandra-Land_Project_PART_1.studio-nb.tar
249 lines (218 loc) · 50 KB
/
02-02_-_Data_Modeling_Cassandra-Land_Project_PART_1.studio-nb.tar
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
notebook.bin 0100644 0000000 0000000 00000075100 13577206476 012143 0 ustar 00 0000000 0000000 json_notebook_v1 {"1":"00000000-0000-0000-0000-000000000004","10":"418ed742-cd61-4df8-abd8-07bc56a62e8d","11":"02-02 - Data Modeling: Cassandra-Land Project PART 1","12":{"1":1576863788,"2":981000000},"13":{"1":1576865084,"2":80000000},"14":false,"15":[{"1":"0c627dc1-384c-4179-a768-32639a7cedbe","10":4,"11":"<center><img src=\"//datastaxtraining.s3.amazonaws.com/developer-day/developer-day-studio-header.png\" width=\"800\"></center>\n\n![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n# <center><span style=\"color:navy\">Data Modeling: Cassandra-Land Project Part 1</span></center>\n\nWelcome to the Data Modeling: Cassandra-Land notebooks. In these notebooks we are going to build a data model for a hypothetical application.\n\nThese notebooks are a step-by-step tutorial. \nEach step has a brief discussion section in a cell that describes the step, followed by an exercise in a second cell, where you get to try something out.\n\nIf you get stuck, there are solutions below each of the exercise sections. Just click the dropdown.\n\nLet's get started!\n","12":"markdown","13":{"1":"af9c435f-cfaa-4821-9c03-5d0937845991","10":{"9":"<p><center><img src=\" //datastaxtraining.s3.amazonaws.com/developer-day/developer-day-studio-header.png\" width=\"800\"></center></p>\n<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h1><center><span style=\"color:navy\">Data Modeling: Cassandra-Land Project Part 1</span></center></h1>\n<p>Welcome to the Data Modeling: Cassandra-Land notebooks. In these notebooks we are going to build a data model for a hypothetical application.</p>\n<p>These notebooks are a step-by-step tutorial.\n<br />Each step has a brief discussion section in a cell that describes the step, followed by an exercise in a second cell, where you get to try something out.</p>\n<p>If you get stuck, there are solutions below each of the exercise sections. Just click the dropdown.</p>\n<p>Let's get started!</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"3810b196-0663-413d-b1b0-bf652d91c351","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 1: Create cassandra_land Keyspace</span></center>\n<center>![Registration use case](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg)</center>\n![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\nThe first thing we need to do is to create a keyspace to hold our tables.\nRemember, a Cassandra keyspace is like a \"database\" in relational database terms.\n\nThe form of the CQL command to create a keyspace is as follows:\n```\nCREATE KEYSPACE <keyspace name> WITH REPLICATION = {\n 'class' : <replication strategy>,\n <datacenter name> : <replication factor>,\n // Specify addition datacenters/replication factors here...\n}\n```\nWhere `<keyspace name>` is, uh, the name of the keyspace you are creating,\n`<replication strategy>` lets you set different strategies for locating replicas, but is usually set to `'NetworkTopologyStrategy'` (including quotes),\n`<datacenter name>` in this example is `'DC1'` (with the quotes),\n`<replication factor>` is the number of data replicas you want in the datacenter - we'll use `'3'` in this exercise (including quotes).\n\nIn the next cell, go ahead and create a keyspace named `cassandra_land`.","12":"markdown","13":{"1":"1e0e806a-89f2-4356-8acc-efae0533edb6","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 1: Create cassandra_land Keyspace</span></center></h3>\n<p><center><img src=\"https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg\" alt=\"Registration use case\" /></center>\n<br /><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" />\n<br />The first thing we need to do is to create a keyspace to hold our tables.\n<br />Remember, a Cassandra keyspace is like a “database” in relational database terms.</p>\n<p>The form of the CQL command to create a keyspace is as follows:</p>\n<pre><code>CREATE KEYSPACE <keyspace name> WITH REPLICATION = {\n 'class' : <replication strategy>,\n <datacenter name> : <replication factor>,\n // Specify addition datacenters/replication factors here...\n}\n</code></pre>\n<p>Where <code><keyspace name></code> is, uh, the name of the keyspace you are creating,\n<br /><code><replication strategy></code> lets you set different strategies for locating replicas, but is usually set to <code>'NetworkTopologyStrategy'</code> (including quotes),\n<br /><code><datacenter name></code> in this example is <code>'DC1'</code> (with the quotes),\n<br /><code><replication factor></code> is the number of data replicas you want in the datacenter - we'll use <code>'3'</code> in this exercise (including quotes).</p>\n<p>In the next cell, go ahead and create a keyspace named <code>cassandra_land</code>.</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"bfa6e777-32bf-46a4-b864-9f7bdde00f1f","11":"// Create the cassandra_land keyspace here\n// (execute it by pressing shift-return or pressing the CL.ONE button in the top right corner of this cell):\n","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"9d474bda-84a5-407e-bc95-e96d6dae45c6","10":4,"11":"<details>\n<summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary>\n```\nCREATE KEYSPACE cassandra_land WITH REPLICATION = { 'class' : 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'DC1': '3' };\n```\n</details>","12":"markdown","13":{"1":"66a326f9-7a22-4d0b-8233-89c92585ba74","10":{"9":"<p><details>\n<br /><summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary></p>\n<pre><code>CREATE KEYSPACE cassandra_land WITH REPLICATION = { 'class' : 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'DC1': '3' };\n</code></pre>\n<p></details></p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"a83e8519-794f-4e1c-802f-68cdcc695158","10":4,"11":"Let's verify that you have created the `cassandra_land` keyspace. Use the **SCHEMA** viewer in the upper right of this Studio window.\n\n<span style=\"color:blue\">ProTip:</span> If you made a mistake when creating the keyspace, you can delete the keyspace and recreate it. Here's the command to delete the keyspace:\n```\nDROP KEYSPACE cassandra_land;\n```","12":"markdown","13":{"1":"67c8e1a9-8034-4fea-b03b-2d2fb725ddce","10":{"9":"<p>Let's verify that you have created the <code>cassandra_land</code> keyspace. Use the <strong>SCHEMA</strong> viewer in the upper right of this Studio window.</p>\n<p><span style=\"color:blue\">ProTip:</span> If you made a mistake when creating the keyspace, you can delete the keyspace and recreate it. Here's the command to delete the keyspace:</p>\n<pre><code>DROP KEYSPACE cassandra_land;\n</code></pre>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"dbfb282b-0fd3-48e8-a7eb-e8d41c02c329","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 2: Create users_by_phone_number Table</span></center>\n<center>![Registration use case](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg)</center>\n![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n\nNow we are ready to create the `users_by_phone_number` table.\nRemember from the discussion that this table has three fields:\n`user_id` of type `UUID`\n`phone_number` of type `text`\n`password` of type `text`\n\nStart by considering these questions:\n* What is the partition key?\n* Does the table have clustering columns?\n\nThe form of this command is:\n```\nCREATE TABLE <keyspace name>.<table name> (\n <field name> <field type>,\n // Add additional field descriptions here\n PRIMARY KEY(<primary key descriptor>)\n)\n```\nHere, the section that may need some explanation is the primary key.\nIf the primary key consists only of a single partition key column and clustering columns, you merely list these columns (separated by commas).\nHowever, if you have several partition key columns, first list the partition key coluns inside parens, and then list the cluster columns.","12":"markdown","13":{"1":"5386172c-5156-4c9b-b7aa-73c51c2085f4","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 2: Create users_by_phone_number Table</span></center></h3>\n<p><center><img src=\"https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg\" alt=\"Registration use case\" /></center>\n<br /><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<p>Now we are ready to create the <code>users_by_phone_number</code> table.\n<br />Remember from the discussion that this table has three fields:\n<br /><code>user_id</code> of type <code>UUID</code>\n<br /><code>phone_number</code> of type <code>text</code>\n<br /><code>password</code> of type <code>text</code></p>\n<p>Start by considering these questions:</p>\n<ul>\n<li>What is the partition key?</li>\n<li>Does the table have clustering columns?</li>\n</ul>\n<p>The form of this command is:</p>\n<pre><code>CREATE TABLE <keyspace name>.<table name> (\n <field name> <field type>,\n // Add additional field descriptions here\n PRIMARY KEY(<primary key descriptor>)\n)\n</code></pre>\n<p>Here, the section that may need some explanation is the primary key.\n<br />If the primary key consists only of a single partition key column and clustering columns, you merely list these columns (separated by commas).\n<br />However, if you have several partition key columns, first list the partition key coluns inside parens, and then list the cluster columns.</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"76602bda-68ee-4c9d-9c04-b9c43f0cf5ce","11":"// Create the users_by_phone_number table here:\n","12":"cql","16":true,"17":false,"18":{},"25":"CL.ONE"},{"1":"a17f5584-4a15-456e-9da2-d33577287c0b","10":4,"11":"<details>\n<summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary>\n```\nCREATE TABLE cassandra_land.users_by_phone_number (\n phone_number text,\n \"password\" text,\n user_id uuid,\n PRIMARY KEY (phone_number)\n);\n```\n</details>","12":"markdown","13":{"1":"b798ba97-3fc1-41b8-9579-1bb53b2846b1","10":{"9":"<p><details>\n<br /><summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary></p>\n<pre><code>CREATE TABLE cassandra_land.users_by_phone_number (\n phone_number text,\n \"password\" text,\n user_id uuid,\n PRIMARY KEY (phone_number)\n);\n</code></pre>\n<p></details></p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"e02d4fae-a652-43e4-9390-acfdcabb8b62","10":4,"11":"Once again, use the **SCHEMA** viewer to review the table you created and make sure it has the correct settings:\n(You may have noticed we used double quotes around the \"password\" field name when creating the table. This is because, in earlier versions of CQL, _password_ was a CQL keyword word. The double quotes allow us to use it anyway, but you don't have to use double quotes when you use the field name in INSERT statements and queries, etc.).","12":"markdown","13":{"1":"cb5953b0-3cc8-432d-9bdc-dd6f2aeac357","10":{"9":"<p>Once again, use the <strong>SCHEMA</strong> viewer to review the table you created and make sure it has the correct settings:\n<br />(You may have noticed we used double quotes around the “password” field name when creating the table. This is because, in earlier versions of CQL, <em>password</em> was a CQL keyword word. The double quotes allow us to use it anyway, but you don't have to use double quotes when you use the field name in INSERT statements and queries, etc.).</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"78dde3bf-b55a-4f85-9575-7e7f416dd457","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 3: Add Rows to the users_by_phone_number Table</span></center>\n<center>![Registration use case](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg)</center>\n![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n\nOK, so we have created the keyspace and the table.\nNow, we want to insert some data into the table.\nThe form of the insert command is:\n```\nINSERT INTO <keyspace name>.<table name> (<column list>) VALUES(<column values>)\n```\nWhere `<keyspace name>` and `<table name>` are just like you used when you created the table.\n`<column list>` is the list of column names separated by commas.\n`<column values>` are the values of the row you want to insert (in the same order as the column names).\n\nThis is straight forward, except for the `user_id` value.\nWhere does the UUID come from?\nWe can let Cassandra generate it for us by using the `UUID()` function.\n\nExecute the next cell:","12":"markdown","13":{"1":"75eae654-6b36-456f-9580-2e4f8366a313","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 3: Add Rows to the users_by_phone_number Table</span></center></h3>\n<p><center><img src=\"https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg\" alt=\"Registration use case\" /></center>\n<br /><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<p>OK, so we have created the keyspace and the table.\n<br />Now, we want to insert some data into the table.\n<br />The form of the insert command is:</p>\n<pre><code>INSERT INTO <keyspace name>.<table name> (<column list>) VALUES(<column values>)\n</code></pre>\n<p>Where <code><keyspace name></code> and <code><table name></code> are just like you used when you created the table.\n<br /><code><column list></code> is the list of column names separated by commas.\n<br /><code><column values></code> are the values of the row you want to insert (in the same order as the column names).</p>\n<p>This is straight forward, except for the <code>user_id</code> value.\n<br />Where does the UUID come from?\n<br />We can let Cassandra generate it for us by using the <code>UUID()</code> function.</p>\n<p>Execute the next cell:</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"3314ea77-8432-4889-89cd-b8dbf2a1a9d4","11":"// Try inserting the row here with phone_number = 4088675309 and password = C@55@ndr@:\nINSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'C@55@ndr@', UUID());","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"49825512-bbe0-4bac-a3df-d668ad3f06e9","10":4,"11":"To view the contents of the table and verify what the data looks like, you can select all the rows:\n```\nSELECT * FROM cassandra_land.users_by_phone_number;\n```\nTry it in the next cell:","12":"markdown","13":{"1":"6f5681bf-3350-48c2-aac7-b5606b179519","10":{"9":"<p>To view the contents of the table and verify what the data looks like, you can select all the rows:</p>\n<pre><code>SELECT * FROM cassandra_land.users_by_phone_number;\n</code></pre>\n<p>Try it in the next cell:</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"c69ebd11-39d2-4702-85e9-d367036cabc3","11":"// Select all the rows of the users_by_phone_number table here:\n","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"d05666e9-cc32-440d-969b-a8673c9b85c1","10":4,"11":"Well, that seems simple enough!\n\nBut wait - there's a bit of a complication here.\nRemember that the phone number is the partition key.\nWhat happens if we add a second row with the same phone number?\nExecute the next cell:","12":"markdown","13":{"1":"4df9f189-ff1e-4260-960d-313e92efd7c8","10":{"9":"<p>Well, that seems simple enough!</p>\n<p>But wait - there's a bit of a complication here.\n<br />Remember that the phone number is the partition key.\n<br />What happens if we add a second row with the same phone number?\n<br />Execute the next cell:</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"6605cee8-09e9-4c59-9ede-617ff97f57f7","11":"// NOTE: We are using pre-generated UUIDs in some of the exercises to make them more streamlined\n// and easier to troubleshoot. This will definitely come into play as you get to Cassandra-Land PART 2.\n// In a real-world scenario we would use UUID() as in the following example:\n// INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'P@55w0rd', UUID());\n\n// Insert the second row with the same phone number here\nINSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'P@55w0rd', 24fa0298-ccbb-41f7-8137-7c2d91cc4e09);","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"e745498b-dd8c-4416-a8c8-b761919c6a88","10":4,"11":"That seemed to work, so let's look at the contents of the table (select all the rows of the table again):","12":"markdown","13":{"1":"6d0a9fcc-1840-4b73-a631-c9dcaeb1422e","10":{"9":"<p>That seemed to work, so let's look at the contents of the table (select all the rows of the table again):</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"7bb76040-482e-47d0-8de5-1a89e34e1f29","11":"// Select all the rows of the users_by_phone_number table here:\nSELECT * FROM cassandra_land.users_by_phone_number;","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"d6fa8209-2e7d-4db4-9787-58e1b5d5670a","10":4,"11":"Wait a minute! We only have one row in the table.\nBut we did two inserts. What gives?\n\nIf we look carefully at the row in the table we see that it is the second row we inserted.\nWe know this because it has the second password value.\nNotice also that the UUID has changed.\nThis happened because Cassandra does not do a read-before-write.\nSo the insert was effectively an update - called an \"upsert\".\n\nIf we think about this, the whole upsert thing is a bit of a problem for our registration use-case.\nIf a user tries to register twice, we will overwrite the previous entry and we will lose the previous entry with its user ID.\n\nNormally, we want to avoid doing read-before-write because it slows things down too much.\nBut in this case, we _must_ do the read before we write.\nHow can we do this?\n\nThe CQL `INSERT` statement provides for this case with the `IF NOT EXISTS` clause.\nWe place this clause at the end of the `INSERT` statement.\nGive this a try in the next cell:\n```\nINSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'third_time_is_a_charm', UUID()) IF NOT EXISTS;\n```","12":"markdown","13":{"1":"c1dd5b45-8cea-48ee-ab3f-399181de211a","10":{"9":"<p>Wait a minute! We only have one row in the table.\n<br />But we did two inserts. What gives?</p>\n<p>If we look carefully at the row in the table we see that it is the second row we inserted.\n<br />We know this because it has the second password value.\n<br />Notice also that the UUID has changed.\n<br />This happened because Cassandra does not do a read-before-write.\n<br />So the insert was effectively an update - called an “upsert”.</p>\n<p>If we think about this, the whole upsert thing is a bit of a problem for our registration use-case.\n<br />If a user tries to register twice, we will overwrite the previous entry and we will lose the previous entry with its user ID.</p>\n<p>Normally, we want to avoid doing read-before-write because it slows things down too much.\n<br />But in this case, we <em>must</em> do the read before we write.\n<br />How can we do this?</p>\n<p>The CQL <code>INSERT</code> statement provides for this case with the <code>IF NOT EXISTS</code> clause.\n<br />We place this clause at the end of the <code>INSERT</code> statement.\n<br />Give this a try in the next cell:</p>\n<pre><code>INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'third_time_is_a_charm', UUID()) IF NOT EXISTS;\n</code></pre>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"aa3d129c-3027-4331-aa0d-8ac862de71d5","11":"// Try the INSERT statement with the IF NOT EXISTS clause here:\n","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"486c78f1-ca08-4cae-a3f5-c21335b39d9e","10":4,"11":"You see in the results of the insert statement (in the `[applied]` column) that the insert failed.\nThis is what we had hoped for since a row with that phone number already exists in the table.\n\nLet's select the contents of the table just to be sure things worked like we expected.\nDo a `SELECT *` in the following cell:","12":"markdown","13":{"1":"f3a956a4-5bdd-4ddd-b13d-c9f8de2aca1f","10":{"9":"<p>You see in the results of the insert statement (in the <code>[applied]</code> column) that the insert failed.\n<br />This is what we had hoped for since a row with that phone number already exists in the table.</p>\n<p>Let's select the contents of the table just to be sure things worked like we expected.\n<br />Do a <code>SELECT *</code> in the following cell:</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"d9ec79c8-adbb-42b7-a5fb-a9e76cb13e3f","11":"// Select the contents of the users_by_phone_number table here:\nSELECT * FROM cassandra_land.users_by_phone_number;","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"5c431dc0-28e8-468b-8bff-1a15c0848cc7","10":4,"11":"We see that the only row in the table is the row we had previously inserted, so the latest insert did not cause an upsert - great!","12":"markdown","13":{"1":"abcb1981-20c0-4afe-8878-4da43d19e79a","10":{"9":"<p>We see that the only row in the table is the row we had previously inserted, so the latest insert did not cause an upsert - great!</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"2af371d9-40ac-4903-a524-ef1b413fdd84","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 4: Query the users_by_phone_number Table</span></center>\n<center>![Login use case](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseLogin.jpg)</center>\n![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n\nFor the second use-case, we need to get the password and user ID, based on the phone number.\nThe app can use the password to authenticate the user, and then retain the user ID for further activites.\nIt turns out that we already created the table we need for this use-case in the previous use-case, so all we need to do here is write the query.\nIn CQL, queries have the following basic form:\n```\nSELECT <column list> FROM <keyspace name>.<table name> WHERE <query constraints>\n```\nActually, queries can be more elaborate than the form shown here, but this is a good starting point.\nWe've already explained the variable parts of this form except for the `<query constraints>`, so let's elaborate on that here.\nQuery constraints _must_ start with a specification of the partition key column value.\nFor example, to find the user record with a telephone number of `4088675309` we would specify the primary key with the expression `phone_number = '4088675309'`.\n\nThe query constraints could also include specifications for clustering column values.\nIn our current example of the `users_by_phone_number` table, we have no clustering columns.\nBut if we did, we could specify their values in the same way.\nOne final caveat - not all clustering column values must be specified, but if you constrain a clustering column, you must also specify all preceding cluster columns' values as well.\n\nIn the next cell, see if you can create a query to return the user record with the phone number `4088675309`:","12":"markdown","13":{"1":"daabe288-ba1c-4f46-9ced-e1f5c0d0e08d","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 4: Query the users_by_phone_number Table</span></center></h3>\n<p><center><img src=\"https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseLogin.jpg\" alt=\"Login use case\" /></center>\n<br /><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<p>For the second use-case, we need to get the password and user ID, based on the phone number.\n<br />The app can use the password to authenticate the user, and then retain the user ID for further activites.\n<br />It turns out that we already created the table we need for this use-case in the previous use-case, so all we need to do here is write the query.\n<br />In CQL, queries have the following basic form:</p>\n<pre><code>SELECT <column list> FROM <keyspace name>.<table name> WHERE <query constraints>\n</code></pre>\n<p>Actually, queries can be more elaborate than the form shown here, but this is a good starting point.\n<br />We've already explained the variable parts of this form except for the <code><query constraints></code>, so let's elaborate on that here.\n<br />Query constraints <em>must</em> start with a specification of the partition key column value.\n<br />For example, to find the user record with a telephone number of <code>4088675309</code> we would specify the primary key with the expression <code>phone_number = '4088675309'</code>.</p>\n<p>The query constraints could also include specifications for clustering column values.\n<br />In our current example of the <code>users_by_phone_number</code> table, we have no clustering columns.\n<br />But if we did, we could specify their values in the same way.\n<br />One final caveat - not all clustering column values must be specified, but if you constrain a clustering column, you must also specify all preceding cluster columns' values as well.</p>\n<p>In the next cell, see if you can create a query to return the user record with the phone number <code>4088675309</code>:</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"fdd791cb-cec2-4b60-b83a-796fa68c1501","11":"// Write the select statement for the users_by_phone_number table to get the row associated with the phone number = '4088675309':\n","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"01668f74-8de4-41f9-a16c-3fa3e73cf236","10":4,"11":"<details>\n<summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary>\n```\nSELECT * FROM cassandra_land.users_by_phone_number WHERE phone_number = '4088675309';\n```\n</details>","12":"markdown","13":{"1":"010f3614-cafd-49bc-9862-3d8621e79aec","10":{"9":"<p><details>\n<br /><summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary></p>\n<pre><code>SELECT * FROM cassandra_land.users_by_phone_number WHERE phone_number = '4088675309';\n</code></pre>\n<p></details></p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"f128757e-edfa-4235-866d-537484010980","10":4,"11":"Because we only have one row in the table, it's a bit difficult to know if our query really worked.\nLet's add some more rows to make this step a little more interesting.\n\nAdd users with the following phone numbers and passwords:\n![Password Table](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/PhonePassTable.jpg)","12":"markdown","13":{"1":"54a7b609-408f-458d-a80d-59b871a7f33a","10":{"9":"<p>Because we only have one row in the table, it's a bit difficult to know if our query really worked.\n<br />Let's add some more rows to make this step a little more interesting.</p>\n<p>Add users with the following phone numbers and passwords:\n<br /><img src=\"https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/PhonePassTable.jpg\" alt=\"Password Table\" /></p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"89984ed3-35b0-4f80-8f9b-f3d57855802c","11":"// Execute this cell to add more users:\n// Note again, we are using pre-generated UUIDs in some of the exercises to make them more streamlined\n// and easier to troubleshoot. This will come into play as you get to Cassandra-Land PART 2.\nINSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('8017124927', 'JamiesPhone-NotReally', 7558d1e9-544b-4e81-960f-8957e372760d);\nINSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('9999999999', 'NinesAreWild', 154a857e-4b40-4d8d-9646-c4bf7632999f);\nINSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('1234567890', 'SeeQuential', d0acbb79-64c7-43a2-aaf2-0ef8fceb94d4);","12":"cql","16":true,"17":false,"18":{},"23":110,"25":"CL.ONE"},{"1":"5bb9121f-6a46-467b-91c7-498d7d5424d0","10":4,"11":"Let's do a query to see all the rows in the table:","12":"markdown","13":{"1":"85640712-ac10-4f11-83dc-1714dd38431d","10":{"9":"<p>Let's do a query to see all the rows in the table:</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"08f8018b-c98f-42ac-a15b-2869dc026b8e","11":"// Select all the rows in the users_by_phone_number table here:\nSELECT * FROM cassandra_land.users_by_phone_number;","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"3fe8da15-179d-4407-a06b-32e52acb9893","10":4,"11":"Finally, let's re-run the query to get the row with phone number = '4088675309' (which is what we need for this use-case):","12":"markdown","13":{"1":"730b1829-c6ce-4fb2-ab23-d1082c2a5623","10":{"9":"<p>Finally, let's re-run the query to get the row with phone number = '4088675309' (which is what we need for this use-case):</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"ffa28ce6-9c98-4bcd-b987-d4529556d14e","11":"SELECT * FROM cassandra_land.users_by_phone_number WHERE phone_number = '4088675309';","12":"cql","16":true,"17":false,"25":"CL.ONE"}],"16":{"1":{}},"17":""} code.txt 0100644 0000000 0000000 00000034732 13577206476 011312 0 ustar 00 0000000 0000000 --------------------NOTEBOOK_02-02 - Data Modeling: Cassandra-Land Project PART 1--------------------
--------------------CELL_MARKDOWN_1--------------------
<center><img src="//datastaxtraining.s3.amazonaws.com/developer-day/developer-day-studio-header.png" width="800"></center>
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
# <center><span style="color:navy">Data Modeling: Cassandra-Land Project Part 1</span></center>
Welcome to the Data Modeling: Cassandra-Land notebooks. In these notebooks we are going to build a data model for a hypothetical application.
These notebooks are a step-by-step tutorial.
Each step has a brief discussion section in a cell that describes the step, followed by an exercise in a second cell, where you get to try something out.
If you get stuck, there are solutions below each of the exercise sections. Just click the dropdown.
Let's get started!
--------------------CELL_MARKDOWN_2--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 1: Create cassandra_land Keyspace</span></center>
<center>![Registration use case](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg)</center>
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
The first thing we need to do is to create a keyspace to hold our tables.
Remember, a Cassandra keyspace is like a "database" in relational database terms.
The form of the CQL command to create a keyspace is as follows:
```
CREATE KEYSPACE <keyspace name> WITH REPLICATION = {
'class' : <replication strategy>,
<datacenter name> : <replication factor>,
// Specify addition datacenters/replication factors here...
}
```
Where `<keyspace name>` is, uh, the name of the keyspace you are creating,
`<replication strategy>` lets you set different strategies for locating replicas, but is usually set to `'NetworkTopologyStrategy'` (including quotes),
`<datacenter name>` in this example is `'DC1'` (with the quotes),
`<replication factor>` is the number of data replicas you want in the datacenter - we'll use `'3'` in this exercise (including quotes).
In the next cell, go ahead and create a keyspace named `cassandra_land`.
--------------------CELL_CQL_3--------------------
// Create the cassandra_land keyspace here
// (execute it by pressing shift-return or pressing the CL.ONE button in the top right corner of this cell):
--------------------CELL_MARKDOWN_4--------------------
<details>
<summary><i><span style="color:blue">Click here for the solution.</span></i></summary>
```
CREATE KEYSPACE cassandra_land WITH REPLICATION = { 'class' : 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'DC1': '3' };
```
</details>
--------------------CELL_MARKDOWN_5--------------------
Let's verify that you have created the `cassandra_land` keyspace. Use the **SCHEMA** viewer in the upper right of this Studio window.
<span style="color:blue">ProTip:</span> If you made a mistake when creating the keyspace, you can delete the keyspace and recreate it. Here's the command to delete the keyspace:
```
DROP KEYSPACE cassandra_land;
```
--------------------CELL_MARKDOWN_6--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 2: Create users_by_phone_number Table</span></center>
<center>![Registration use case](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg)</center>
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
Now we are ready to create the `users_by_phone_number` table.
Remember from the discussion that this table has three fields:
`user_id` of type `UUID`
`phone_number` of type `text`
`password` of type `text`
Start by considering these questions:
* What is the partition key?
* Does the table have clustering columns?
The form of this command is:
```
CREATE TABLE <keyspace name>.<table name> (
<field name> <field type>,
// Add additional field descriptions here
PRIMARY KEY(<primary key descriptor>)
)
```
Here, the section that may need some explanation is the primary key.
If the primary key consists only of a single partition key column and clustering columns, you merely list these columns (separated by commas).
However, if you have several partition key columns, first list the partition key coluns inside parens, and then list the cluster columns.
--------------------CELL_CQL_7--------------------
// Create the users_by_phone_number table here:
--------------------CELL_MARKDOWN_8--------------------
<details>
<summary><i><span style="color:blue">Click here for the solution.</span></i></summary>
```
CREATE TABLE cassandra_land.users_by_phone_number (
phone_number text,
"password" text,
user_id uuid,
PRIMARY KEY (phone_number)
);
```
</details>
--------------------CELL_MARKDOWN_9--------------------
Once again, use the **SCHEMA** viewer to review the table you created and make sure it has the correct settings:
(You may have noticed we used double quotes around the "password" field name when creating the table. This is because, in earlier versions of CQL, _password_ was a CQL keyword word. The double quotes allow us to use it anyway, but you don't have to use double quotes when you use the field name in INSERT statements and queries, etc.).
--------------------CELL_MARKDOWN_10--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 3: Add Rows to the users_by_phone_number Table</span></center>
<center>![Registration use case](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseRegistration.jpg)</center>
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
OK, so we have created the keyspace and the table.
Now, we want to insert some data into the table.
The form of the insert command is:
```
INSERT INTO <keyspace name>.<table name> (<column list>) VALUES(<column values>)
```
Where `<keyspace name>` and `<table name>` are just like you used when you created the table.
`<column list>` is the list of column names separated by commas.
`<column values>` are the values of the row you want to insert (in the same order as the column names).
This is straight forward, except for the `user_id` value.
Where does the UUID come from?
We can let Cassandra generate it for us by using the `UUID()` function.
Execute the next cell:
--------------------CELL_CQL_11--------------------
// Try inserting the row here with phone_number = 4088675309 and password = C@55@ndr@:
INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'C@55@ndr@', UUID());
--------------------CELL_MARKDOWN_12--------------------
To view the contents of the table and verify what the data looks like, you can select all the rows:
```
SELECT * FROM cassandra_land.users_by_phone_number;
```
Try it in the next cell:
--------------------CELL_CQL_13--------------------
// Select all the rows of the users_by_phone_number table here:
--------------------CELL_MARKDOWN_14--------------------
Well, that seems simple enough!
But wait - there's a bit of a complication here.
Remember that the phone number is the partition key.
What happens if we add a second row with the same phone number?
Execute the next cell:
--------------------CELL_CQL_15--------------------
// NOTE: We are using pre-generated UUIDs in some of the exercises to make them more streamlined
// and easier to troubleshoot. This will definitely come into play as you get to Cassandra-Land PART 2.
// In a real-world scenario we would use UUID() as in the following example:
// INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'P@55w0rd', UUID());
// Insert the second row with the same phone number here
INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'P@55w0rd', 24fa0298-ccbb-41f7-8137-7c2d91cc4e09);
--------------------CELL_MARKDOWN_16--------------------
That seemed to work, so let's look at the contents of the table (select all the rows of the table again):
--------------------CELL_CQL_17--------------------
// Select all the rows of the users_by_phone_number table here:
SELECT * FROM cassandra_land.users_by_phone_number;
--------------------CELL_MARKDOWN_18--------------------
Wait a minute! We only have one row in the table.
But we did two inserts. What gives?
If we look carefully at the row in the table we see that it is the second row we inserted.
We know this because it has the second password value.
Notice also that the UUID has changed.
This happened because Cassandra does not do a read-before-write.
So the insert was effectively an update - called an "upsert".
If we think about this, the whole upsert thing is a bit of a problem for our registration use-case.
If a user tries to register twice, we will overwrite the previous entry and we will lose the previous entry with its user ID.
Normally, we want to avoid doing read-before-write because it slows things down too much.
But in this case, we _must_ do the read before we write.
How can we do this?
The CQL `INSERT` statement provides for this case with the `IF NOT EXISTS` clause.
We place this clause at the end of the `INSERT` statement.
Give this a try in the next cell:
```
INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('4088675309', 'third_time_is_a_charm', UUID()) IF NOT EXISTS;
```
--------------------CELL_CQL_19--------------------
// Try the INSERT statement with the IF NOT EXISTS clause here:
--------------------CELL_MARKDOWN_20--------------------
You see in the results of the insert statement (in the `[applied]` column) that the insert failed.
This is what we had hoped for since a row with that phone number already exists in the table.
Let's select the contents of the table just to be sure things worked like we expected.
Do a `SELECT *` in the following cell:
--------------------CELL_CQL_21--------------------
// Select the contents of the users_by_phone_number table here:
SELECT * FROM cassandra_land.users_by_phone_number;
--------------------CELL_MARKDOWN_22--------------------
We see that the only row in the table is the row we had previously inserted, so the latest insert did not cause an upsert - great!
--------------------CELL_MARKDOWN_23--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 4: Query the users_by_phone_number Table</span></center>
<center>![Login use case](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/use_caseLogin.jpg)</center>
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
For the second use-case, we need to get the password and user ID, based on the phone number.
The app can use the password to authenticate the user, and then retain the user ID for further activites.
It turns out that we already created the table we need for this use-case in the previous use-case, so all we need to do here is write the query.
In CQL, queries have the following basic form:
```
SELECT <column list> FROM <keyspace name>.<table name> WHERE <query constraints>
```
Actually, queries can be more elaborate than the form shown here, but this is a good starting point.
We've already explained the variable parts of this form except for the `<query constraints>`, so let's elaborate on that here.
Query constraints _must_ start with a specification of the partition key column value.
For example, to find the user record with a telephone number of `4088675309` we would specify the primary key with the expression `phone_number = '4088675309'`.
The query constraints could also include specifications for clustering column values.
In our current example of the `users_by_phone_number` table, we have no clustering columns.
But if we did, we could specify their values in the same way.
One final caveat - not all clustering column values must be specified, but if you constrain a clustering column, you must also specify all preceding cluster columns' values as well.
In the next cell, see if you can create a query to return the user record with the phone number `4088675309`:
--------------------CELL_CQL_24--------------------
// Write the select statement for the users_by_phone_number table to get the row associated with the phone number = '4088675309':
--------------------CELL_MARKDOWN_25--------------------
<details>
<summary><i><span style="color:blue">Click here for the solution.</span></i></summary>
```
SELECT * FROM cassandra_land.users_by_phone_number WHERE phone_number = '4088675309';
```
</details>
--------------------CELL_MARKDOWN_26--------------------
Because we only have one row in the table, it's a bit difficult to know if our query really worked.
Let's add some more rows to make this step a little more interesting.
Add users with the following phone numbers and passwords:
![Password Table](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/PhonePassTable.jpg)
--------------------CELL_CQL_27--------------------
// Execute this cell to add more users:
// Note again, we are using pre-generated UUIDs in some of the exercises to make them more streamlined
// and easier to troubleshoot. This will come into play as you get to Cassandra-Land PART 2.
INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('8017124927', 'JamiesPhone-NotReally', 7558d1e9-544b-4e81-960f-8957e372760d);
INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('9999999999', 'NinesAreWild', 154a857e-4b40-4d8d-9646-c4bf7632999f);
INSERT INTO cassandra_land.users_by_phone_number (phone_number, password, user_id) VALUES('1234567890', 'SeeQuential', d0acbb79-64c7-43a2-aaf2-0ef8fceb94d4);
--------------------CELL_MARKDOWN_28--------------------
Let's do a query to see all the rows in the table:
--------------------CELL_CQL_29--------------------
// Select all the rows in the users_by_phone_number table here:
SELECT * FROM cassandra_land.users_by_phone_number;
--------------------CELL_MARKDOWN_30--------------------
Finally, let's re-run the query to get the row with phone number = '4088675309' (which is what we need for this use-case):
--------------------CELL_CQL_31--------------------
SELECT * FROM cassandra_land.users_by_phone_number WHERE phone_number = '4088675309';