Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store attachments as BLOB column in database #114

Merged
merged 1 commit into from Jan 9, 2019

Conversation

Projects
None yet
2 participants
@c-w
Copy link
Member

commented Jan 7, 2019

See #112

@c-w c-w requested a review from aidan-fitz Jan 7, 2019

for attachment in email['attachments']:
content = attachment.get('content', b'')
if content:
attachment['content'] = b64encode(content).decode('ascii')

This comment has been minimized.

Copy link
@aidan-fitz

aidan-fitz Jan 7, 2019

Collaborator

Why are you adding b64encode and b64decode here? Aren't we removing base64?

This comment has been minimized.

Copy link
@c-w

c-w Jan 7, 2019

Author Member

The attachments on the wire are base64 encoded since JSON doesn't have a binary type and anyways jsonl + zstandard compression performed better than msgpack + zstandard in the benchmarks, so we must deserialize when we receive attachments from the sync and serialize before sending attachments into the sync. In any case, this means that we (de-)serialize once per email instead of once per render which is much better than before.

This comment has been minimized.

Copy link
@aidan-fitz

aidan-fitz Jan 7, 2019

Collaborator

Ah, that's good.

This comment has been minimized.

Copy link
@aidan-fitz

aidan-fitz Jan 8, 2019

Collaborator

Does this benchmark include network transmission time? I think using a format that supports binary types (protocol buffers, BSON, MessagePack) would be more beneficial for the end users since they are paying for network usage rather than compute time and base64 uses up more bandwidth.

This comment has been minimized.

Copy link
@c-w

c-w Jan 8, 2019

Author Member

The benchmark includes file-size, i.e. bytes-on-the-write, which is what really matters. MessagePack was part of the benchmark but it turns out that zstandard compression deals better with base64 encoded data than raw binary data so overall jsonl is the best choice for us. This was surprising to me too.

This comment has been minimized.

Copy link
@aidan-fitz

aidan-fitz Jan 9, 2019

Collaborator

Huh, that's an interesting finding! Great, then.

@c-w c-w force-pushed the blob-attachments branch from 2b494ff to bf522ae Jan 9, 2019

@c-w c-w merged commit 20d5404 into master Jan 9, 2019

3 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
pyup.io/safety-ci No dependencies with known security vulnerabilities.
Details

@c-w c-w deleted the blob-attachments branch Jan 9, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.