bigtable: read and filter snippets #2707

Merged
merged 35 commits into from
Mar 12, 2020
87c68ae
Bigtable write samples
billyjacobson Jun 7, 2019
971e1b7
Cleaning up test
billyjacobson Jun 7, 2019
1176825
Fixing lint issues
billyjacobson Jun 7, 2019
a87606a
Fixing imports in test
billyjacobson Jun 7, 2019
222fd2c
Cleaning up samples and showing error handling
billyjacobson Jun 18, 2019
e12b69f
removing note about the row commit bug
billyjacobson Jun 20, 2019
3d63d6a
Add fixture to write test
billyjacobson Jun 24, 2019
23f9fdc
Merge branch 'master' into write-samples
billyjacobson Jan 7, 2020
4a488a4
Read snippets WIP
billyjacobson Jan 8, 2020
ffda58e
Cleanup bigtable python:
billyjacobson Jan 8, 2020
a60656f
Change bigtable cluster variable to bigtable instance for consistency
billyjacobson Jan 8, 2020
0ddcb22
Fixing step size for metric scaler
billyjacobson Jan 8, 2020
440b32c
Merge branch 'master' into bigtable-cleanup
leahecole Jan 8, 2020
8f67553
Creating fixtures for quickstart tests
billyjacobson Jan 8, 2020
659efb1
Fix quickstart extra delete table
billyjacobson Jan 8, 2020
5c05ec5
Use clearer instance names for tests
billyjacobson Jan 8, 2020
cb042e7
Merge branch 'master' into bigtable-cleanup
billyjacobson Jan 8, 2020
151b72e
Linting
billyjacobson Jan 8, 2020
d3371e4
Merge branch 'bigtable-cleanup' into write-samples
billyjacobson Jan 9, 2020
821a97a
get session issue in test sorted out
billyjacobson Jan 9, 2020
8f672f1
Read snippets with tests working
billyjacobson Jan 10, 2020
bf06f87
Filter snippets with tests working
billyjacobson Jan 10, 2020
74b7fe5
Lint
billyjacobson Jan 10, 2020
3c250d8
Merge branch 'master' into bigtable-reads
billyjacobson Jan 10, 2020
d2f5d58
Update module import
billyjacobson Jan 10, 2020
02de5a3
Fix bigtable instance env var
billyjacobson Jan 10, 2020
c0c4a78
Change scope to module
billyjacobson Jan 10, 2020
0ab8b33
Don't print empty parens
billyjacobson Jan 10, 2020
d7a5ef0
sort cols
billyjacobson Jan 10, 2020
93eb9e2
sort by cfs too
billyjacobson Jan 10, 2020
e4da2b3
Merge branch 'master' into bigtable-reads
leahecole Feb 4, 2020
da7ad4a
Merge branch 'master' into bigtable-reads
crwilcox Feb 5, 2020
31dfa4c
Make requirements more specific to samples.
billyjacobson Feb 11, 2020
781155c
Merge branch 'master' into bigtable-reads
billyjacobson Feb 11, 2020
a14f419
Merge branch 'master' into bigtable-reads
crwilcox Mar 12, 2020
357 changes: 357 additions & 0 deletions bigtable/snippets/filters/filter_snippets.py
@@ -0,0 +1,357 @@
#!/usr/bin/env python

# Copyright 2020, Google LLC
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# [START bigtable_filters_limit_row_sample]
# [START bigtable_filters_limit_row_regex]
# [START bigtable_filters_limit_cells_per_col]
# [START bigtable_filters_limit_cells_per_row]
# [START bigtable_filters_limit_cells_per_row_offset]
# [START bigtable_filters_limit_col_family_regex]
# [START bigtable_filters_limit_col_qualifier_regex]
# [START bigtable_filters_limit_col_range]
# [START bigtable_filters_limit_value_range]
# [START bigtable_filters_limit_value_regex]
# [START bigtable_filters_limit_timestamp_range]
# [START bigtable_filters_limit_block_all]
# [START bigtable_filters_limit_pass_all]
# [START bigtable_filters_modify_strip_value]
# [START bigtable_filters_modify_apply_label]
# [START bigtable_filters_composing_chain]
# [START bigtable_filters_composing_interleave]
# [START bigtable_filters_composing_condition]
import datetime
from google.cloud import bigtable

# [END bigtable_filters_limit_row_sample]
# [END bigtable_filters_limit_row_regex]
# [END bigtable_filters_limit_cells_per_col]
# [END bigtable_filters_limit_cells_per_row]
# [END bigtable_filters_limit_cells_per_row_offset]
# [END bigtable_filters_limit_col_family_regex]
# [END bigtable_filters_limit_col_qualifier_regex]
# [END bigtable_filters_limit_col_range]
# [END bigtable_filters_limit_value_range]
# [END bigtable_filters_limit_value_regex]
# [END bigtable_filters_limit_timestamp_range]
# [END bigtable_filters_limit_block_all]
# [END bigtable_filters_limit_pass_all]
# [END bigtable_filters_modify_strip_value]
# [END bigtable_filters_modify_apply_label]
# [END bigtable_filters_composing_chain]
# [END bigtable_filters_composing_interleave]
# [END bigtable_filters_composing_condition]
from google.cloud.bigtable.row_filters import ApplyLabelFilter, \
Contributor

Are all of these imports needed by each sample? Is there a way to separate them so that each sample gets a smaller subset?

Also, I think it is still the plan to split snippets to be separate files. Perhaps that would make this more straightforward?

Member Author

Ok, so the issue I was having is that the reading and printing functions are exactly the same, so I didn't want to have that code duplicated throughout in case any of it needed to change. If there is a way to split the imports, I can do that so each snippet only gets the necessary ones.

Contributor

If you want to punt on this for a bit we can, but the end goal is to have one sample per file. At that point the imports will split. For an example of what I mean, you can look at the storage samples.
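
One way to reconcile the two concerns in this thread (no duplicated read-and-print code, but only the needed imports per sample) might be a small shared helper that each single-sample file imports. The sketch below is only an illustration under that assumption: `format_row`, `read_and_print`, `FakeCell`, `FakeRow`, and `FakeTable` are hypothetical names that do not appear in this PR, the demo row is made up, and the fakes exist only so the sketch runs without a Bigtable instance. The formatting mirrors the PR's `print_row`.

```python
from collections import namedtuple

# Hypothetical stand-ins so the sketch runs without a Bigtable instance.
# They mimic only the attributes that the PR's print_row reads.
FakeCell = namedtuple("FakeCell", ["value", "timestamp", "labels"])
FakeRow = namedtuple("FakeRow", ["row_key", "cells"])


class FakeTable:
    """Returns canned rows and records the filter it was given."""

    def __init__(self, rows):
        self._rows = rows
        self.last_filter = None

    def read_rows(self, filter_=None):
        self.last_filter = filter_
        return iter(self._rows)


def format_row(row):
    """Build the lines that the PR's print_row prints for one row."""
    lines = ["Reading data for {}:".format(row.row_key.decode("utf-8"))]
    for family, columns in sorted(row.cells.items()):
        lines.append("Column Family {}".format(family))
        for qualifier, cells in sorted(columns.items()):
            for cell in cells:
                labels = (" [{}]".format(",".join(cell.labels))
                          if cell.labels else "")
                lines.append("\t{}: {} @{}{}".format(
                    qualifier.decode("utf-8"),
                    cell.value.decode("utf-8"),
                    cell.timestamp,
                    labels))
    return lines


def read_and_print(table, row_filter):
    """Read rows through row_filter, print each one, and return the count."""
    count = 0
    for row in table.read_rows(filter_=row_filter):
        print("\n".join(format_row(row)))
        print("")
        count += 1
    return count


if __name__ == "__main__":
    demo_row = FakeRow(
        row_key=b"phone#4c410523#20190501",
        cells={"stats_summary": {b"os_build": [
            FakeCell(b"PQ2A.190405.003", "2019-05-01", ["labelled"])]}})
    # A real per-sample file would build its table from bigtable.Client and
    # pass its one filter (for example CellsColumnLimitFilter(2)) here.
    read_and_print(FakeTable([demo_row]), row_filter=None)
```

With a helper like this, a per-sample file would import the helper plus the single filter class it demonstrates, so the long combined import statement is no longer needed.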

    BlockAllFilter, CellsColumnLimitFilter, CellsRowLimitFilter, \
    CellsRowOffsetFilter, ColumnQualifierRegexFilter, ColumnRangeFilter, \
    ConditionalRowFilter, FamilyNameRegexFilter, PassAllFilter, \
    RowFilterChain, RowFilterUnion, RowKeyRegexFilter, RowSampleFilter, \
    StripValueTransformerFilter, TimestampRange, TimestampRangeFilter, \
    ValueRangeFilter, ValueRegexFilter


# [START bigtable_filters_limit_row_sample]
def filter_limit_row_sample(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=RowSampleFilter(.75))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_row_sample]
# [START bigtable_filters_limit_row_regex]
def filter_limit_row_regex(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=RowKeyRegexFilter(".*#20190501$".encode("utf-8")))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_row_regex]
# [START bigtable_filters_limit_cells_per_col]
def filter_limit_cells_per_col(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=CellsColumnLimitFilter(2))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_cells_per_col]
# [START bigtable_filters_limit_cells_per_row]
def filter_limit_cells_per_row(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=CellsRowLimitFilter(2))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_cells_per_row]
# [START bigtable_filters_limit_cells_per_row_offset]
def filter_limit_cells_per_row_offset(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=CellsRowOffsetFilter(2))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_cells_per_row_offset]
# [START bigtable_filters_limit_col_family_regex]
def filter_limit_col_family_regex(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=FamilyNameRegexFilter("stats_.*$".encode("utf-8")))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_col_family_regex]
# [START bigtable_filters_limit_col_qualifier_regex]
def filter_limit_col_qualifier_regex(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=ColumnQualifierRegexFilter("connected_.*$".encode("utf-8")))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_col_qualifier_regex]
# [START bigtable_filters_limit_col_range]
def filter_limit_col_range(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=ColumnRangeFilter("cell_plan",
                                  b"data_plan_01gb",
                                  b"data_plan_10gb",
                                  inclusive_end=False))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_col_range]
# [START bigtable_filters_limit_value_range]
def filter_limit_value_range(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(
        filter_=ValueRangeFilter(b"PQ2A.190405", b"PQ2A.190406"))

    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_value_range]
# [START bigtable_filters_limit_value_regex]


def filter_limit_value_regex(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=ValueRegexFilter("PQ2A.*$".encode("utf-8")))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_value_regex]
# [START bigtable_filters_limit_timestamp_range]
def filter_limit_timestamp_range(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    end = datetime.datetime(2019, 5, 1)

    rows = table.read_rows(
        filter_=TimestampRangeFilter(TimestampRange(end=end)))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_timestamp_range]
# [START bigtable_filters_limit_block_all]
def filter_limit_block_all(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=BlockAllFilter(True))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_block_all]
# [START bigtable_filters_limit_pass_all]
def filter_limit_pass_all(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=PassAllFilter(True))
    for row in rows:
        print_row(row)


# [END bigtable_filters_limit_pass_all]
# [START bigtable_filters_modify_strip_value]
def filter_modify_strip_value(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=StripValueTransformerFilter(True))
    for row in rows:
        print_row(row)


# [END bigtable_filters_modify_strip_value]
# [START bigtable_filters_modify_apply_label]
def filter_modify_apply_label(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=ApplyLabelFilter(label="labelled"))
    for row in rows:
        print_row(row)


# [END bigtable_filters_modify_apply_label]
# [START bigtable_filters_composing_chain]
def filter_composing_chain(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=RowFilterChain(
        filters=[CellsColumnLimitFilter(1),
                 FamilyNameRegexFilter("cell_plan")]))
    for row in rows:
        print_row(row)


# [END bigtable_filters_composing_chain]
# [START bigtable_filters_composing_interleave]
def filter_composing_interleave(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=RowFilterUnion(
        filters=[ValueRegexFilter("true"),
                 ColumnQualifierRegexFilter("os_build")]))
    for row in rows:
        print_row(row)


# [END bigtable_filters_composing_interleave]
# [START bigtable_filters_composing_condition]
def filter_composing_condition(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    rows = table.read_rows(filter_=ConditionalRowFilter(
        base_filter=RowFilterChain(filters=[
            ColumnQualifierRegexFilter(
                "data_plan_10gb"),
            ValueRegexFilter(
                "true")]),
        true_filter=ApplyLabelFilter(label="passed-filter"),
        false_filter=ApplyLabelFilter(label="filtered-out")

    ))
    for row in rows:
        print_row(row)


# [END bigtable_filters_composing_condition]


# [START bigtable_filters_limit_row_sample]
# [START bigtable_filters_limit_row_regex]
# [START bigtable_filters_limit_cells_per_col]
# [START bigtable_filters_limit_cells_per_row]
# [START bigtable_filters_limit_cells_per_row_offset]
# [START bigtable_filters_limit_col_family_regex]
# [START bigtable_filters_limit_col_qualifier_regex]
# [START bigtable_filters_limit_col_range]
# [START bigtable_filters_limit_value_range]
# [START bigtable_filters_limit_value_regex]
# [START bigtable_filters_limit_timestamp_range]
# [START bigtable_filters_limit_block_all]
# [START bigtable_filters_limit_pass_all]
# [START bigtable_filters_modify_strip_value]
# [START bigtable_filters_modify_apply_label]
# [START bigtable_filters_composing_chain]
# [START bigtable_filters_composing_interleave]
# [START bigtable_filters_composing_condition]
def print_row(row):
    print("Reading data for {}:".format(row.row_key.decode('utf-8')))
    for cf, cols in sorted(row.cells.items()):
        print("Column Family {}".format(cf))
        for col, cells in sorted(cols.items()):
            for cell in cells:
                labels = " [{}]".format(",".join(cell.labels)) \
                    if len(cell.labels) else ""
                print(
                    "\t{}: {} @{}{}".format(col.decode('utf-8'),
                                            cell.value.decode('utf-8'),
                                            cell.timestamp, labels))
    print("")
# [END bigtable_filters_limit_row_sample]
# [END bigtable_filters_limit_row_regex]
# [END bigtable_filters_limit_cells_per_col]
# [END bigtable_filters_limit_cells_per_row]
# [END bigtable_filters_limit_cells_per_row_offset]
# [END bigtable_filters_limit_col_family_regex]
# [END bigtable_filters_limit_col_qualifier_regex]
# [END bigtable_filters_limit_col_range]
# [END bigtable_filters_limit_value_range]
# [END bigtable_filters_limit_value_regex]
# [END bigtable_filters_limit_timestamp_range]
# [END bigtable_filters_limit_block_all]
# [END bigtable_filters_limit_pass_all]
# [END bigtable_filters_modify_strip_value]
# [END bigtable_filters_modify_apply_label]
# [END bigtable_filters_composing_chain]
# [END bigtable_filters_composing_interleave]
# [END bigtable_filters_composing_condition]
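
The three composing samples in this file (`filter_composing_chain`, `filter_composing_interleave`, `filter_composing_condition`) differ only in how they combine filters. The block below is a toy, pure-Python model of those semantics, not the client library's implementation: a chain feeds each filter's output into the next, an interleave runs every filter on the same input and merges the results, and a condition picks one of two filters depending on whether the predicate lets any cell through. `Cell`, `chain`, `interleave`, `condition`, and the small regex helpers are hypothetical names introduced for illustration, and the model ignores timestamps, column families, and the ordering of merged output. It assumes the regex filters match the whole qualifier or value.

```python
import re
from collections import namedtuple

# Toy stand-in for one cell of a row; not the client library's Cell class.
Cell = namedtuple("Cell", ["qualifier", "value", "labels"])


def qualifier_regex(pattern):
    """Keep cells whose whole column qualifier matches pattern."""
    return lambda cells: [
        c for c in cells if re.fullmatch(pattern, c.qualifier)]


def value_regex(pattern):
    """Keep cells whose whole value matches pattern."""
    return lambda cells: [c for c in cells if re.fullmatch(pattern, c.value)]


def apply_label(label):
    """Pass every cell through, tagging it with label."""
    return lambda cells: [
        c._replace(labels=c.labels + (label,)) for c in cells]


def chain(*filters):
    """Sequential composition: each filter sees the previous one's output."""
    def run(cells):
        for row_filter in filters:
            cells = row_filter(cells)
        return cells
    return run


def interleave(*filters):
    """Parallel composition: every filter sees the input; outputs are merged."""
    def run(cells):
        merged = []
        for row_filter in filters:
            merged.extend(row_filter(cells))
        return merged
    return run


def condition(predicate, true_filter, false_filter):
    """Apply true_filter if predicate outputs any cell, else false_filter."""
    def run(cells):
        branch = true_filter if predicate(cells) else false_filter
        return branch(cells)
    return run


if __name__ == "__main__":
    # Made-up row loosely shaped like the data the samples read.
    row = [
        Cell("connected_wifi", "1", ()),
        Cell("os_build", "PQ2A.190405.003", ()),
        Cell("data_plan_10gb", "true", ()),
    ]
    has_10gb = chain(qualifier_regex("data_plan_10gb"), value_regex("true"))
    print(has_10gb(row))
    print(interleave(value_regex("true"), qualifier_regex("os_build"))(row))
    print(condition(has_10gb,
                    apply_label("passed-filter"),
                    apply_label("filtered-out"))(row))
```

In this model, as in `filter_composing_condition`, the predicate only selects the branch: the chosen filter is applied to the original row, so every cell of a matching row carries the `passed-filter` label, not only the cell that satisfied the predicate.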