.. highlight:: sh

.. _blob_support:

=====
Blobs
=====

CrateDB supports storing `binary large objects`_ (blobs). Blobs build on
CrateDB's cluster features, so they can be replicated and sharded just like
regular data.

.. rubric:: Table of contents

.. contents::
   :local:

Creating a table for blobs
==========================
Before adding blobs, a ``blob table`` must be created. Blob tables can be
sharded, which makes it possible to distribute binaries over multiple nodes.

Let's use the CrateDB shell ``crash`` to issue the SQL statement::

    sh$ crash -c "create blob table myblobs clustered into 3 shards with (number_of_replicas=0)"
    CREATE OK, 1 row affected (... sec)

Now CrateDB is configured to allow blobs to be managed under the
``/_blobs/myblobs`` endpoint.

Custom location for storing blob data
=====================================
It is possible to define a custom directory path for storing blob data, which
can be completely different from the normal data path. A typical use case is
storing normal data on a fast SSD and blob data on a large, cheap spinning
disk.

The custom blob data path can be set either globally in the configuration or
when creating a blob table. The path can be either absolute or relative, and
the user that CrateDB runs as must be able to create it and write to it. A
relative path is resolved against :ref:`CRATE_HOME <conf-env-crate-home>`.

Blob data will be stored under this path with the following layout::

    /<blobs.path>/nodes/<NODE_NO>/indices/<INDEX_UUID>/<SHARD_ID>/blobs

Global by configuration
-----------------------
Uncomment or add the following entry in the CrateDB configuration to define a
custom path globally for all blob tables::

    blobs.path: /path/to/blob/data

Also see :ref:`config`.
Per blob table setting
----------------------
It is also possible to define a custom blob data path per table instead of
globally in the configuration. A per-table setting takes precedence over the
configuration setting.

See :ref:`sql-create-blob-table` for details.

Creating a blob table with a custom blob data path::

    sh$ crash -c "create blob table myblobs clustered into 3 shards with (blobs_path='/tmp/crate_blob_data')" # doctest: +SKIP
    CREATE OK, 1 row affected (... sec)

List
====
.. Hidden: Add a blob entry to list it afterwards::

    sh$ curl -isSX PUT '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7' -d 'contents'
    HTTP/1.1 201 Created
    content-length: 0

To list all blobs inside a blob table, use a ``SELECT`` statement::

    sh$ crash -c "select digest, last_modified from blob.myblobs"
    +------------------------------------------+---------------+
    | digest                                   | last_modified |
    +------------------------------------------+---------------+
    | 4a756ca07e9487f482465a99e8286abc86ba4dc7 | ...           |
    +------------------------------------------+---------------+
    SELECT 1 row in set (... sec)

.. NOTE::

    To query blob tables it is necessary to always specify the schema name
    ``blob``.

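The same listing can also be fetched from a program. The following is a minimal
sketch rather than a definitive client. It assumes that the cluster is
reachable at ``http://127.0.0.1:4200`` and that statements are sent to
CrateDB's HTTP ``_sql`` endpoint, which is not covered in this document. The
helper names ``build_list_request`` and ``list_blobs`` are hypothetical.

```python
import json
import urllib.request


def build_list_request(base_url, table):
    """Build a POST request for the HTTP ``_sql`` endpoint (an assumption here)."""
    # Blob tables must always be qualified with the ``blob`` schema.
    # ``table`` is interpolated as-is, so it must come from a trusted source.
    stmt = f"select digest, last_modified from blob.{table}"
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_blobs(base_url, table):
    """Return the result rows as [digest, last_modified] lists.

    Requires a running cluster; the "rows" key is assumed from the
    ``_sql`` response format.
    """
    with urllib.request.urlopen(build_list_request(base_url, table)) as resp:
        return json.load(resp)["rows"]
```

Building the request separately from sending it keeps the statement easy to
inspect without a running cluster.
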
.. Hidden: Delete the blob entry::

    sh$ curl -isS -XDELETE '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
    HTTP/1.1 204 No Content

Altering a blob table
=====================
The number of replicas of a blob table can be changed using the ``ALTER BLOB
TABLE`` statement::

    sh$ crash -c "alter blob table myblobs set (number_of_replicas=0)"
    ALTER OK, -1 rows affected (... sec)

Deleting a blob table
=====================
Blob tables can be dropped in the same way as normal tables::

    sh$ crash -c "drop blob table myblobs"
    DROP OK, 1 row affected (... sec)

.. Hidden: Re-create the blob table so information_schema will show it::

    sh$ crash -c "create blob table myblobs clustered into 3 shards with (number_of_replicas=0)"
    CREATE OK, 1 row affected (... sec)

Using blob tables
=================
Blob tables can only be used over the HTTP or HTTPS protocol. This section
describes how binaries can be stored, fetched, and deleted.

.. NOTE::

    As an internal optimization, any successful request may result in a
    ``307 Temporary Redirect`` response.

Uploading
---------
To upload a blob, the SHA-1 hash of its contents must be known up front,
because the hash is used as the ID of the new blob. This example uses a Python
one-liner to compute the SHA-1 hash::

    sh$ python3 -c 'import hashlib;print(hashlib.sha1("contents".encode("utf-8")).hexdigest())'
    4a756ca07e9487f482465a99e8286abc86ba4dc7

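For real files, the one-liner above would load the whole payload into memory.
The following is a small sketch that computes the same SHA-1 digest by reading
a file in chunks. The function name ``blob_digest`` and the default chunk size
are illustrative choices, not part of CrateDB.

```python
import hashlib


def blob_digest(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of the file at ``path``.

    The file is read in chunks so that large blobs do not have to fit
    into memory. ``chunk_size`` is an arbitrary illustrative default.
    """
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()
```

The returned string is what goes into the last path segment of the blob URL.
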
The blob can now be uploaded by issuing a PUT request::

    sh$ curl -isSX PUT '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7' -d 'contents'
    HTTP/1.1 201 Created
    content-length: 0

If a blob with the given hash already exists, a 409 Conflict is returned::

    sh$ curl -isSX PUT '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7' -d 'contents'
    HTTP/1.1 409 Conflict
    content-length: 0

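The upload flow above can be sketched in Python with only the standard
library. This is a hedged illustration, not an official client. The helper
names ``build_upload_request`` and ``upload_blob`` are hypothetical, and the
base URL is assumed to look like ``http://127.0.0.1:4200``.

```python
import hashlib
import urllib.error
import urllib.request


def build_upload_request(base_url, table, data):
    """Build the PUT request for ``data`` (bytes); the SHA-1 digest is the blob ID."""
    digest = hashlib.sha1(data).hexdigest()
    url = f"{base_url}/_blobs/{table}/{digest}"
    return urllib.request.Request(url, data=data, method="PUT")


def upload_blob(base_url, table, data):
    """Upload ``data``; return True if created, False if it already existed.

    Requires a running cluster. Any other HTTP error status is re-raised.
    Note that urllib does not follow a 307 redirect for PUT requests, so
    such a response would also surface as an HTTPError here.
    """
    request = build_upload_request(base_url, table, data)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 201
    except urllib.error.HTTPError as err:
        if err.code == 409:  # a blob with this digest already exists
            return False
        raise
```

Treating 409 as a non-error makes repeated uploads of the same content
idempotent from the caller's point of view.
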
Downloading
-----------
To download a blob, use a GET request::

    sh$ curl -sS '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
    contents

If the blob doesn't exist, a 404 Not Found error is returned::

    sh$ curl -isS '127.0.0.1:4200/_blobs/myblobs/e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e'
    HTTP/1.1 404 Not Found
    content-length: 0

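A download with 404 handling might look like the following sketch. The names
``blob_url`` and ``download_blob`` are hypothetical helpers, and the whole blob
is read into memory, which is only reasonable for small blobs.

```python
import urllib.error
import urllib.request


def blob_url(base_url, table, digest):
    """Return the URL of a blob, following the /_blobs/<table>/<digest> layout."""
    return f"{base_url}/_blobs/{table}/{digest}"


def download_blob(base_url, table, digest):
    """Return the blob contents as bytes, or None if the blob does not exist.

    Requires a running cluster. urllib follows redirects for GET requests
    on its own, so a 307 response needs no extra handling here.
    """
    try:
        with urllib.request.urlopen(blob_url(base_url, table, digest)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no blob with this digest
            return None
        raise
```
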
To determine whether a blob exists without downloading it, use a HEAD
request::

    sh$ curl -sS -I '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
    HTTP/1.1 200 OK
    content-length: 8
    accept-ranges: bytes
    expires: Thu, 31 Dec 2037 23:59:59 GMT
    cache-control: max-age=315360000

.. NOTE::

    The cache headers for blobs are static and allow clients to cache the
    response indefinitely, since a blob is immutable.

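The existence check can be sketched in the same way. ``build_head_request``
and ``blob_exists`` are hypothetical names, and the base URL is an assumption.

```python
import urllib.error
import urllib.request


def build_head_request(base_url, table, digest):
    """Build a HEAD request, which returns headers only and no blob body."""
    url = f"{base_url}/_blobs/{table}/{digest}"
    return urllib.request.Request(url, method="HEAD")


def blob_exists(base_url, table, digest):
    """Return True if the blob exists, False on 404 (requires a running cluster)."""
    request = build_head_request(base_url, table, digest)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```
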
Deleting
--------
To delete a blob, use a DELETE request::

    sh$ curl -isS -XDELETE '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
    HTTP/1.1 204 No Content

If the blob doesn't exist, a 404 Not Found error is returned::

    sh$ curl -isS -XDELETE '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
    HTTP/1.1 404 Not Found
    content-length: 0

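Deletion follows the same pattern. In this sketch, ``build_delete_request``
and ``delete_blob`` are hypothetical helper names, and a missing blob is
reported as ``False`` instead of raising an error.

```python
import urllib.error
import urllib.request


def build_delete_request(base_url, table, digest):
    """Build the DELETE request for a single blob."""
    url = f"{base_url}/_blobs/{table}/{digest}"
    return urllib.request.Request(url, method="DELETE")


def delete_blob(base_url, table, digest):
    """Delete a blob; return True on 204, False if it did not exist (404).

    Requires a running cluster. Other HTTP error statuses are re-raised.
    """
    request = build_delete_request(base_url, table, digest)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 204
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```
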
.. hide:

    sh$ crash -c "drop blob table myblobs"
    DROP OK, 1 row affected (... sec)

.. _binary large objects: https://en.wikipedia.org/wiki/Binary_large_object