bigtable: read and filter snippets #2707
Merged
Commits (35 total; the diff below shows changes from 31 of them)
87c68ae  Bigtable write samples (billyjacobson)
971e1b7  Cleaning up test (billyjacobson)
1176825  Fixing lint issues (billyjacobson)
a87606a  Fixing imports in test (billyjacobson)
222fd2c  Cleaning up samples and showing error handling (billyjacobson)
e12b69f  removing note about the row commit bug (billyjacobson)
3d63d6a  Add fixture to write test (billyjacobson)
23f9fdc  Merge branch 'master' into write-samples (billyjacobson)
4a488a4  Read snippets WIP (billyjacobson)
ffda58e  Cleanup bigtable python: (billyjacobson)
a60656f  Change bigtable cluster variable to bigtable instance for consistency (billyjacobson)
0ddcb22  Fixing step size for metric scaler (billyjacobson)
440b32c  Merge branch 'master' into bigtable-cleanup (leahecole)
8f67553  Creating fixtures for quickstart tests (billyjacobson)
659efb1  Fix quickstart extra delete table (billyjacobson)
5c05ec5  Use clearer instance names for tests (billyjacobson)
cb042e7  Merge branch 'master' into bigtable-cleanup (billyjacobson)
151b72e  Linting (billyjacobson)
d3371e4  Merge branch 'bigtable-cleanup' into write-samples (billyjacobson)
821a97a  get session issue in test sorted out (billyjacobson)
8f672f1  Read snippets with tests working (billyjacobson)
bf06f87  Filter snippets with tests working (billyjacobson)
74b7fe5  Lint (billyjacobson)
3c250d8  Merge branch 'master' into bigtable-reads (billyjacobson)
d2f5d58  Update module import (billyjacobson)
02de5a3  Fix bigtable instance env var (billyjacobson)
c0c4a78  Change scope to module (billyjacobson)
0ab8b33  Don't print empty parens (billyjacobson)
d7a5ef0  sort cols (billyjacobson)
93eb9e2  sort by cfs too (billyjacobson)
e4da2b3  Merge branch 'master' into bigtable-reads (leahecole)
da7ad4a  Merge branch 'master' into bigtable-reads (crwilcox)
31dfa4c  Make requirements more specific to samples. (billyjacobson)
781155c  Merge branch 'master' into bigtable-reads (billyjacobson)
a14f419  Merge branch 'master' into bigtable-reads (crwilcox)
@@ -0,0 +1,357 @@
#!/usr/bin/env python

# Copyright 2020, Google LLC
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# [START bigtable_filters_limit_row_sample]
# [START bigtable_filters_limit_row_regex]
# [START bigtable_filters_limit_cells_per_col]
# [START bigtable_filters_limit_cells_per_row]
# [START bigtable_filters_limit_cells_per_row_offset]
# [START bigtable_filters_limit_col_family_regex]
# [START bigtable_filters_limit_col_qualifier_regex]
# [START bigtable_filters_limit_col_range]
# [START bigtable_filters_limit_value_range]
# [START bigtable_filters_limit_value_regex]
# [START bigtable_filters_limit_timestamp_range]
# [START bigtable_filters_limit_block_all]
# [START bigtable_filters_limit_pass_all]
# [START bigtable_filters_modify_strip_value]
# [START bigtable_filters_modify_apply_label]
# [START bigtable_filters_composing_chain]
# [START bigtable_filters_composing_interleave]
# [START bigtable_filters_composing_condition]
import datetime
from google.cloud import bigtable

# [END bigtable_filters_limit_row_sample]
# [END bigtable_filters_limit_row_regex]
# [END bigtable_filters_limit_cells_per_col]
# [END bigtable_filters_limit_cells_per_row]
# [END bigtable_filters_limit_cells_per_row_offset]
# [END bigtable_filters_limit_col_family_regex]
# [END bigtable_filters_limit_col_qualifier_regex]
# [END bigtable_filters_limit_col_range]
# [END bigtable_filters_limit_value_range]
# [END bigtable_filters_limit_value_regex]
# [END bigtable_filters_limit_timestamp_range]
# [END bigtable_filters_limit_block_all]
# [END bigtable_filters_limit_pass_all]
# [END bigtable_filters_modify_strip_value]
# [END bigtable_filters_modify_apply_label]
# [END bigtable_filters_composing_chain]
# [END bigtable_filters_composing_interleave]
# [END bigtable_filters_composing_condition]
from google.cloud.bigtable.row_filters import ApplyLabelFilter, \
    BlockAllFilter, CellsColumnLimitFilter, CellsRowLimitFilter, \
    CellsRowOffsetFilter, ColumnQualifierRegexFilter, ColumnRangeFilter, \
    ConditionalRowFilter, FamilyNameRegexFilter, PassAllFilter, \
    RowFilterChain, RowFilterUnion, RowKeyRegexFilter, RowSampleFilter, \
    StripValueTransformerFilter, TimestampRange, TimestampRangeFilter, \
    ValueRangeFilter, ValueRegexFilter


# [START bigtable_filters_limit_row_sample]
def filter_limit_row_sample(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=RowSampleFilter(.75))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_row_sample]
# [START bigtable_filters_limit_row_regex]
def filter_limit_row_regex(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=RowKeyRegexFilter(".*#20190501$".encode("utf-8")))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_row_regex]
# [START bigtable_filters_limit_cells_per_col]
def filter_limit_cells_per_col(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=CellsColumnLimitFilter(2))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_cells_per_col]
# [START bigtable_filters_limit_cells_per_row]
def filter_limit_cells_per_row(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=CellsRowLimitFilter(2))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_cells_per_row]
# [START bigtable_filters_limit_cells_per_row_offset]
def filter_limit_cells_per_row_offset(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=CellsRowOffsetFilter(2))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_cells_per_row_offset]
# [START bigtable_filters_limit_col_family_regex]
def filter_limit_col_family_regex(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=FamilyNameRegexFilter("stats_.*$".encode("utf-8")))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_col_family_regex]
# [START bigtable_filters_limit_col_qualifier_regex]
def filter_limit_col_qualifier_regex(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=ColumnQualifierRegexFilter("connected_.*$".encode("utf-8")))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_col_qualifier_regex]
# [START bigtable_filters_limit_col_range]
def filter_limit_col_range(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=ColumnRangeFilter("cell_plan",
                                  b"data_plan_01gb",
                                  b"data_plan_10gb",
                                  inclusive_end=False))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_col_range]
# [START bigtable_filters_limit_value_range]
def filter_limit_value_range(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=ValueRangeFilter(b"PQ2A.190405", b"PQ2A.190406"))

    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_value_range]
# [START bigtable_filters_limit_value_regex]


def filter_limit_value_regex(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=ValueRegexFilter("PQ2A.*$".encode("utf-8")))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_value_regex]
# [START bigtable_filters_limit_timestamp_range]
def filter_limit_timestamp_range(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    end = datetime.datetime(2019, 5, 1)

    rows = table.read_rows(
        filter_=TimestampRangeFilter(TimestampRange(end=end)))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_timestamp_range]
# [START bigtable_filters_limit_block_all]
def filter_limit_block_all(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=BlockAllFilter(True))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_block_all]
# [START bigtable_filters_limit_pass_all]
def filter_limit_pass_all(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=PassAllFilter(True))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_pass_all]
# [START bigtable_filters_modify_strip_value]
def filter_modify_strip_value(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=StripValueTransformerFilter(True))
    for row in rows:
        print_row(row)


# [END bigtable_filters_modify_strip_value]
# [START bigtable_filters_modify_apply_label]
def filter_modify_apply_label(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=ApplyLabelFilter(label="labelled"))
    for row in rows:
        print_row(row)


# [END bigtable_filters_modify_apply_label]
# [START bigtable_filters_composing_chain]
def filter_composing_chain(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=RowFilterChain(
        filters=[CellsColumnLimitFilter(1),
                 FamilyNameRegexFilter("cell_plan")]))
    for row in rows:
        print_row(row)


# [END bigtable_filters_composing_chain]
# [START bigtable_filters_composing_interleave]
def filter_composing_interleave(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=RowFilterUnion(
        filters=[ValueRegexFilter("true"),
                 ColumnQualifierRegexFilter("os_build")]))
    for row in rows:
        print_row(row)


# [END bigtable_filters_composing_interleave]
# [START bigtable_filters_composing_condition]
def filter_composing_condition(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=ConditionalRowFilter(
        base_filter=RowFilterChain(filters=[
            ColumnQualifierRegexFilter(
                "data_plan_10gb"),
            ValueRegexFilter(
                "true")]),
        true_filter=ApplyLabelFilter(label="passed-filter"),
        false_filter=ApplyLabelFilter(label="filtered-out")

    ))
    for row in rows:
        print_row(row)


# [END bigtable_filters_composing_condition]


# [START bigtable_filters_limit_row_sample]
# [START bigtable_filters_limit_row_regex]
# [START bigtable_filters_limit_cells_per_col]
# [START bigtable_filters_limit_cells_per_row]
# [START bigtable_filters_limit_cells_per_row_offset]
# [START bigtable_filters_limit_col_family_regex]
# [START bigtable_filters_limit_col_qualifier_regex]
# [START bigtable_filters_limit_col_range]
# [START bigtable_filters_limit_value_range]
# [START bigtable_filters_limit_value_regex]
# [START bigtable_filters_limit_timestamp_range]
# [START bigtable_filters_limit_block_all]
# [START bigtable_filters_limit_pass_all]
# [START bigtable_filters_modify_strip_value]
# [START bigtable_filters_modify_apply_label]
# [START bigtable_filters_composing_chain]
# [START bigtable_filters_composing_interleave]
# [START bigtable_filters_composing_condition]
def print_row(row):
    print("Reading data for {}:".format(row.row_key.decode('utf-8')))
    for cf, cols in sorted(row.cells.items()):
        print("Column Family {}".format(cf))
        for col, cells in sorted(cols.items()):
            for cell in cells:
                labels = " [{}]".format(",".join(cell.labels)) \
                    if len(cell.labels) else ""
                print(
                    "\t{}: {} @{}{}".format(col.decode('utf-8'),
                                            cell.value.decode('utf-8'),
                                            cell.timestamp, labels))
    print("")
# [END bigtable_filters_limit_row_sample]
# [END bigtable_filters_limit_row_regex]
# [END bigtable_filters_limit_cells_per_col]
# [END bigtable_filters_limit_cells_per_row]
# [END bigtable_filters_limit_cells_per_row_offset]
# [END bigtable_filters_limit_col_family_regex]
# [END bigtable_filters_limit_col_qualifier_regex]
# [END bigtable_filters_limit_col_range]
# [END bigtable_filters_limit_value_range]
# [END bigtable_filters_limit_value_regex]
# [END bigtable_filters_limit_timestamp_range]
# [END bigtable_filters_limit_block_all]
# [END bigtable_filters_limit_pass_all]
# [END bigtable_filters_modify_strip_value]
# [END bigtable_filters_modify_apply_label]
# [END bigtable_filters_composing_chain]
# [END bigtable_filters_composing_interleave]
# [END bigtable_filters_composing_condition]
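
Every snippet in this file ends by handing rows to print_row, so it may help to see what that helper emits without a live Bigtable instance. The sketch below is an illustration only: format_row is a hypothetical variant that returns the lines instead of printing them, and the row, its key, and its cell values are hand-built stand-ins (SimpleNamespace objects), not data from the PR or objects from the client library.

```python
import datetime
from types import SimpleNamespace


def format_row(row):
    """Return the lines print_row would print (hypothetical helper)."""
    lines = ["Reading data for {}:".format(row.row_key.decode("utf-8"))]
    # Families and qualifiers are sorted so the output is deterministic.
    for cf, cols in sorted(row.cells.items()):
        lines.append("Column Family {}".format(cf))
        for col, cells in sorted(cols.items()):
            for cell in cells:
                # Labels only appear when a filter such as ApplyLabelFilter set them.
                labels = " [{}]".format(",".join(cell.labels)) if cell.labels else ""
                lines.append("\t{}: {} @{}{}".format(
                    col.decode("utf-8"), cell.value.decode("utf-8"),
                    cell.timestamp, labels))
    lines.append("")
    return lines


# Assumed stand-in for one row yielded by table.read_rows(); all values invented.
ts = datetime.datetime(2019, 5, 1, tzinfo=datetime.timezone.utc)
fake_row = SimpleNamespace(
    row_key=b"phone#4c410523#20190501",
    cells={
        "stats_summary": {
            b"os_build": [
                SimpleNamespace(value=b"PQ2A.190405.003", timestamp=ts, labels=[])],
            b"connected_wifi": [
                SimpleNamespace(value=b"1", timestamp=ts, labels=["labelled"])],
        }
    },
)

for line in format_row(fake_row):
    print(line)
```

Returning the lines rather than printing them is only a convenience for checking the format; the PR's print_row prints directly.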
Are all of these imports needed by each sample? Is there a way to separate them so that each sample gets a smaller subset?
Also, I think the plan is still to split the snippets into separate files. Perhaps that would make this more straightforward?
Ok, so the issue I was having is that the reading and printing functions are exactly the same, so I didn't want to have that code duplicated throughout in case any of it needed to change. If there is a way to split the imports, I can do that so each snippet only gets the necessary ones.
If you want to punt this for a bit we can, but the end goal is to have 1 sample: 1 file. At that point the imports will split. For an example of what I mean, you can look at the storage samples.
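
One possible way to reconcile the duplication concern raised in this thread with the one-sample-per-file goal is to keep the shared read-and-print loop in a single helper that each sample calls. The sketch below is not what the PR does. read_with_filter and FakeTable are hypothetical names, and the fake table takes a plain Python predicate where a real table would take a RowFilter object applied on the server.

```python
def read_with_filter(table, row_filter, emit=print):
    """Read rows through row_filter, pass each to emit, return the row count."""
    count = 0
    for row in table.read_rows(filter_=row_filter):
        emit(row)
        count += 1
    return count


class FakeTable:
    """Hypothetical stand-in exposing only read_rows(filter_=...)."""

    def __init__(self, rows):
        self._rows = rows

    def read_rows(self, filter_=None):
        # Assumption for the demo: the filter is a local predicate.
        # A real Bigtable table evaluates RowFilter objects server-side.
        return [r for r in self._rows if filter_ is None or filter_(r)]


table = FakeTable([b"phone#1#20190501", b"phone#1#20190502"])
seen = []
n = read_with_filter(
    table, lambda key: key.endswith(b"#20190501"), emit=seen.append)
print(n, seen)
```

With a helper like this, each per-sample file would import only its own filter class plus the helper, so the import list shrinks without copying the loop into every file.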