## Find similar storm reports

This notebook builds a TFX pipeline that uses semantic search to find duplicate storm reports.
It is an example of a workflow pipeline that does no training; the TFX pipeline is set up for inference only.

The data come from preliminary storm reports filed by storm spotters to National Weather Service offices. Because this dataset has already been manually cleaned, we'll ignore the year and location when doing the search, purely for illustration.
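To make "semantic search" concrete, here is a minimal sketch of the ranking step, assuming each report comment has already been turned into an embedding vector. The three-dimensional vectors below are made up for illustration, and `cosine_similarity` and `most_similar` are hypothetical helper names, not part of TFX.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query_vec, corpus, top_k=2):
    """Return the top_k (score, text) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings; a real pipeline would get these from a text model.
corpus = {
    "trees down in beulah": [0.9, 0.1, 0.0],
    "multiple trees down in smiths station": [0.8, 0.2, 0.1],
    "quarter size hail near walmart": [0.1, 0.9, 0.2],
}

matches = most_similar([0.85, 0.15, 0.05], corpus)
for score, text in matches:
    print(round(score, 3), text)
```

Reports whose vectors point in nearly the same direction score close to 1 and are candidate duplicates; unrelated reports score much lower.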

In [1]:
## CHANGE AS NEEDED
BEAM_RUNNER = 'DirectRunner'  # or DataflowRunner
PROJECT='ai-analytics-solutions'
BUCKET='ai-analytics-solutions-kfpdemo'
REGION='us-west1'

## Explore data in BigQuery

Preview a few rows from each of the three public tables: wind, hail, and tornado reports.

In [6]:
%%bigquery
SELECT 
  EXTRACT(YEAR from timestamp) AS year,
  EXTRACT(DAYOFYEAR from timestamp) AS julian_day,
  latitude, longitude,
  REGEXP_EXTRACT(comments, r"\([A-Z]+\)$") AS office,
  'wind' as type,
  LOWER(comments) AS comments,
FROM `bigquery-public-data.noaa_preliminary_severe_storms.wind_reports`
LIMIT 10

Unnamed: 0,year,julian_day,latitude,longitude,office,type,comments
0,2019,19,32.49,-85.18,(BMX),wind,trees down near the intersection of lee rd 440...
1,2019,43,32.49,-85.13,(BMX),wind,reports of trees down in various locations in ...
2,2019,62,32.6,-85.24,(BMX),wind,corrects previous tornado report from salem. u...
3,2019,85,32.55,-85.1,(BMX),wind,a tree was downed onto a home. (bmx)
4,2019,158,32.6,-85.42,(BMX),wind,tree down on a home. time estimated from radar...
5,2019,158,32.51,-85.42,(BMX),wind,multiple trees down on lee road 29. time estim...
6,2019,158,32.71,-85.18,(BMX),wind,trees down in beulah. time estimated from rada...
7,2019,158,32.54,-85.09,(BMX),wind,multiple trees down in smiths station. time es...
8,2019,190,32.67,-85.49,(BMX),wind,trees down near the intersection of lee rd 147...
9,2019,217,32.52,-85.07,(BMX),wind,corrects previous tstm wnd dmg report from 1 e...
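The `year` and `julian_day` columns come from `EXTRACT(YEAR ...)` and `EXTRACT(DAYOFYEAR ...)`, so `julian_day` here means the ordinal day within the year. A rough Python equivalent, using made-up timestamps and a hypothetical helper name `year_and_day`, might look like this:

```python
from datetime import datetime

def year_and_day(ts):
    """Return (year, day-of-year), mirroring the two EXTRACT calls."""
    return ts.year, ts.timetuple().tm_yday

# Illustrative timestamps only; these are not taken from the dataset.
print(year_and_day(datetime(2019, 1, 19, 14, 30)))
print(year_and_day(datetime(2020, 12, 31, 23, 59)))
```

Day numbers run from 1 to 365, or 366 in a leap year.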


In [7]:
%%bigquery
SELECT 
  EXTRACT(YEAR from timestamp) AS year,
  EXTRACT(DAYOFYEAR from timestamp) AS julian_day,
  latitude, longitude,
  LOWER(comments) AS comments,
  REGEXP_EXTRACT(comments, r"\([A-Z]+\)$") AS office,
  size,
  'hail' as type
FROM `bigquery-public-data.noaa_preliminary_severe_storms.hail_reports`
LIMIT 10

Unnamed: 0,year,julian_day,latitude,longitude,comments,office,size,type
0,2019,177,0.0,-40.4,9134,,100,hail
1,2019,222,0.0,-5.91,246 reported at 235 meadow view dr... butte mt...,,175,hail
2,2019,84,32.77,-87.77,lots of hail fell. not quite the size of a gol...,(BMX),125,hail
3,2019,84,32.77,-87.59,quarter size hail... may have been larger. sev...,(BMX),100,hail
4,2019,73,33.66,-88.11,(bmx),(BMX),150,hail
5,2019,136,33.89,-86.75,several reports of dime to quarter size hail i...,(BMX),100,hail
6,2019,85,31.7,-87.8,trained spotter reports quarter sized hail. re...,(MOB),100,hail
7,2019,74,34.29,-85.84,quarter size hail was reported in the collinsv...,(HUN),100,hail
8,2019,84,32.73,-86.33,quarter-sized hail measured in the titus area....,(BMX),100,hail
9,2019,121,32.51,-86.21,quarter size hail near walmart on highway 231....,(BMX),100,hail
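The `office` column is produced by `REGEXP_EXTRACT(comments, r"\([A-Z]+\)$")`, which keeps a trailing parenthesised, upper-case office code and yields NULL when there is none (as in the first two hail rows above). The sketch below approximates that behaviour with Python's `re` module; BigQuery uses RE2, so edge cases such as trailing newlines may differ, and `extract_office` is a hypothetical name.

```python
import re

# Same pattern as in the query: "(" + capital letters + ")" at the end.
OFFICE_PATTERN = re.compile(r"\([A-Z]+\)$")

def extract_office(comment):
    """Return the trailing office code, or None if the comment has none."""
    match = OFFICE_PATTERN.search(comment)
    return match.group(0) if match else None

print(extract_office("A TREE WAS DOWNED ONTO A HOME. (BMX)"))
print(extract_office("9134"))
print(extract_office("a tree was downed onto a home. (bmx)"))
```

The last call shows why the query applies the pattern to the original `comments` rather than to `LOWER(comments)`: the pattern only accepts capital letters.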


In [8]:
%%bigquery
SELECT 
  EXTRACT(YEAR from timestamp) AS year,
  EXTRACT(DAYOFYEAR from timestamp) AS julian_day,
  latitude, longitude,
  LOWER(comments) AS comments,
  REGEXP_EXTRACT(comments, r"\([A-Z]+\)$") AS office,
  'tornado' as type
FROM `bigquery-public-data.noaa_preliminary_severe_storms.tornado_reports`
LIMIT 10

Unnamed: 0,year,julian_day,latitude,longitude,comments,office,type
0,2019,4,31.62,-85.28,tree damage from a brief tornado about 1.5 mi ...,(TAE),tornado
1,2019,19,32.49,-86.73,*** 2 inj *** corrects previous tornado report...,(BMX),tornado
2,2019,19,32.52,-86.24,*** 4 inj *** a nws storm survey team confirme...,(BMX),tornado
3,2019,19,32.85,-86.15,a nws storm survey team confirmed a tornado of...,(BMX),tornado
4,2019,54,33.06,-88.31,corrects previous tornado report from 3 wsw ga...,(BMX),tornado
5,2019,54,33.66,-88.06,the tornado touched down just south of crawfor...,(BMX),tornado
6,2019,54,33.7,-87.98,report of trees down along cody rd. near kingv...,(BMX),tornado
7,2019,55,33.83,-87.68,an ef-0 tornado touched down in a wooded area ...,(BMX),tornado
8,2019,55,32.86,-86.18,the tornado touched down just east of cr 40 an...,(BMX),tornado
9,2019,62,32.47,-86.75,a tornado touched 6 miles northwest of autauga...,(BMX),tornado


In [2]:
query = """
WITH wind AS (
SELECT 
  EXTRACT(YEAR from timestamp) AS year,
  EXTRACT(DAYOFYEAR from timestamp) AS julian_day,
  latitude, longitude,
  LOWER(comments) AS comments,
  REGEXP_EXTRACT(comments, r"\([A-Z]+\)$") AS office,
  'wind' as type
FROM `bigquery-public-data.noaa_preliminary_severe_storms.wind_reports`
),

hail AS (
SELECT 
  EXTRACT(YEAR from timestamp) AS year,
  EXTRACT(DAYOFYEAR from timestamp) AS julian_day,
  latitude, longitude,
  LOWER(comments) AS comments,
  REGEXP_EXTRACT(comments, r"\([A-Z]+\)$") AS office,
  'hail' as type
FROM `bigquery-public-data.noaa_preliminary_severe_storms.hail_reports`
),

tornadoes AS (
SELECT 
  EXTRACT(YEAR from timestamp) AS year,
  EXTRACT(DAYOFYEAR from timestamp) AS julian_day,
  latitude, longitude,
  LOWER(comments) AS comments,
  REGEXP_EXTRACT(comments, r"\([A-Z]+\)$") AS office,
  'tornado' as type
FROM `bigquery-public-data.noaa_preliminary_severe_storms.tornado_reports`
)

SELECT * FROM (
   SELECT * FROM wind
   UNION ALL
   SELECT * FROM hail
   UNION ALL
   SELECT * FROM tornadoes
)
"""

In [3]:
## skip_for_export
import google.cloud.bigquery as bq
df = bq.Client().query(query).result().to_dataframe()
df.groupby('type').count()

Unnamed: 0_level_0,year,julian_day,latitude,longitude,comments,office
type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
hail,5397,5397,5397,5397,5397,5394
tornado,1677,1677,1677,1677,1677,1677
wind,16064,16064,16064,16064,16064,16064


## Ingest data

We'll use the TFX component BigQueryExampleGen to read in the data.

In [4]:
import tensorflow as tf
print('tensorflow ' + tf.__version__)
import tfx
print('tfx ' + tfx.__version__)
import apache_beam as beam
print('beam ' + beam.__version__)

tensorflow 2.2.0-dlenv
tfx 0.22.0
beam 2.22.0


In [5]:
from tfx.components import BigQueryExampleGen
example_gen = BigQueryExampleGen(query=query)

Error importing tfx_bsl_extension.coders. Some tfx_bsl functionalities are not available
Error importing tfx_bsl_extension.arrow.array_util. Some tfx_bsl functionalities are not available
Error importing tfx_bsl_extension.arrow.table_util. Some tfx_bsl functionalities are not available: libarrow.so.16: cannot open shared object file: No such file or directory

In [6]:
import os
beam_pipeline_args = [
    '--runner={}'.format(BEAM_RUNNER),
    '--project={}'.format(PROJECT),
    '--temp_location=' + os.path.join('gs://{}/noaa_similar_reports/'.format(BUCKET), 'tmp'),
    '--region=' + REGION,

    # Temporary overrides of defaults.
    '--disk_size_gb=50',
    '--experiments=shuffle_mode=auto',
    '--machine_type=n1-standard-8',
]

In [7]:
## skip_for_export
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
context = InteractiveContext()



In [8]:
ingest_result = context.run(example_gen, beam_pipeline_args=beam_pipeline_args)





In [9]:
context.show(ingest_result)

0,1
.execution_id,1
.component,BigQueryExampleGen at 0x7f96b08cd990
.component.inputs,{}
.component.outputs,['examples'] Channel of type 'Examples' (1 artifact) at 0x7f96550a8890

0,1
.type_name,Examples
._artifacts,[0] Artifact of type 'Examples' at 0x7f96597a82d0

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,/tmp/tfx-interactive-2020-08-10T17_40_26.761302-aa0vd3pg/BigQueryExampleGen/examples/1
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['input_config'],"{  ""splits"": [  {  ""name"": ""single_split"",  ""pattern"": (the SQL text of `query`, as defined above)  }  ] }"
['output_config'],"{  ""split_config"": {  ""splits"": [  {  ""hash_buckets"": 2,  ""name"": ""train""  },  {  ""hash_buckets"": 1,  ""name"": ""eval""  }  ]  } }"
['custom_config'],None

0,1
[0],--runner=DirectRunner
[1],--project=ai-analytics-solutions
[2],--temp_location=gs://ai-analytics-solutions-kfpdemo/noaa_similar_reports/tmp
[3],--region=us-west1
[4],--disk_size_gb=50
[5],--experiments=shuffle_mode=auto
[6],--machine_type=n1-standard-8
[7],--extra_package=/tmp/tmpvs_lpbdn/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz
[8],--labels
[9],tfx_executor=-components-example_gen-big_query_example_gen-executor-executor
[10],--labels
[11],tfx_py_version=3-7
[12],--labels
[13],tfx_version=0-22-0

In [10]:
print(ingest_result)

ExecutionResult(
    component_id: BigQueryExampleGen
    execution_id: 1
    outputs:
        examples: Channel(
            type_name: Examples
            artifacts: [Artifact(type_name: Examples, uri: /tmp/tfx-interactive-2020-08-10T17_40_26.761302-aa0vd3pg/BigQueryExampleGen/examples/1, id: 1)]
        ))


## Validate the data

Let's generate statistics from the data with the TFX StatisticsGen component.

In [11]:
from tfx.components import StatisticsGen
stats_gen = StatisticsGen(examples=example_gen.outputs['examples'])

In [12]:
context.run(stats_gen) #, beam_pipeline_args=beam_pipeline_args)

AttributeError: module 'tfx_bsl.coders.example_coder' has no attribute 'ExamplesToRecordBatchDecoder' [while running 'TFXIORead[train]/RawRecordToRecordBatch/RawRecordToRecordBatch/Decode']
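As a rough, library-free illustration of the kind of per-feature summary that data validation relies on, the sketch below computes counts, missing values, and simple ranges for the ten hail rows previewed earlier. It is not what StatisticsGen computes internally, and `summarize_numeric` and `summarize_categorical` are hypothetical helpers.

```python
# (latitude, office, size) for the ten hail rows shown in the preview.
rows = [
    (0.0, None, 100), (0.0, None, 175),
    (32.77, "(BMX)", 125), (32.77, "(BMX)", 100),
    (33.66, "(BMX)", 150), (33.89, "(BMX)", 100),
    (31.7, "(MOB)", 100), (34.29, "(HUN)", 100),
    (32.73, "(BMX)", 100), (32.51, "(BMX)", 100),
]

def summarize_numeric(values):
    """Count, missing count, min, max, and mean of a numeric column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": sum(present) / len(present),
    }

def summarize_categorical(values):
    """Count, missing count, and number of distinct non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
    }

stats = {
    "latitude": summarize_numeric([r[0] for r in rows]),
    "office": summarize_categorical([r[1] for r in rows]),
    "size": summarize_numeric([r[2] for r in rows]),
}
for feature, summary in stats.items():
    print(feature, summary)
```

Even on this tiny sample, the summary surfaces the two rows with a latitude of 0.0 and no office code, which is the sort of anomaly a validation step is meant to flag.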

Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License