Not using an ingested column as a raw_column results in error #69

deliahu · 2019-04-19T16:54:48Z

Description

It should be possible to have a column defined in env.data.schema that isn't used as a raw_column (for both CSV and Parquet).

Application Configuration

Start from the Iris example, then completely remove one of the features (e.g. sepal_width).

To Reproduce

cx deploy

Stack Trace

Starting

INFO:cortex:Starting
INFO:cortex:Ingesting
INFO:cortex:Ingesting iris-1 data from s3a://cortex-examples/iris.csv
ERROR:cortex:An error occurred, see `cx logs raw_column sepal_width` for more details.
Traceback (most recent call last):
  File "/src/spark_job/spark_job.py", line 307, in run_job
    raw_df = ingest_raw_dataset(spark, ctx, cols_to_validate, should_ingest)
  File "/src/spark_job/spark_job.py", line 151, in ingest_raw_dataset
    ingest_df = spark_util.ingest(ctx, spark)
  File "/src/spark_job/spark_util.py", line 223, in ingest
    expected_schema = expected_schema_from_context(ctx)
  File "/src/spark_job/spark_util.py", line 117, in expected_schema_from_context
    for fname in expected_field_names
  File "/src/spark_job/spark_util.py", line 117, in <listcomp>
    for fname in expected_field_names
KeyError: 'petal_width'
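The traceback shows the KeyError being raised inside a list comprehension in `expected_schema_from_context` that iterates over `expected_field_names`. One plausible reading is that every field name declared in the schema is looked up in a mapping that only contains raw columns. The sketch below reproduces that pattern and one possible fix. It assumes hypothetical plain-Python stand-ins (`expected_schema_buggy`, `expected_schema`, `FIELDS`, `RAW`, and plain string type names) and is not the actual code in `spark_util.py` or the fix that was merged.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for what the real context provides:
#   schema_fields: every column declared in env.data.schema, in file order
#   raw_columns:   columns actually configured as raw_columns -> type name
Schema = List[Tuple[str, str]]


def expected_schema_buggy(schema_fields: List[str], raw_columns: Dict[str, str]) -> Schema:
    # Mirrors the failing pattern: raises KeyError for any declared field
    # that is not also a raw_column.
    return [(fname, raw_columns[fname]) for fname in schema_fields]


def expected_schema(
    schema_fields: List[str], raw_columns: Dict[str, str], fallback_type: str = "string"
) -> Tuple[Schema, List[str]]:
    # Keep every declared field so positional formats such as CSV still line up,
    # but give unused fields a fallback type and report them so they can be dropped.
    schema: Schema = []
    unused: List[str] = []
    for fname in schema_fields:
        if fname in raw_columns:
            schema.append((fname, raw_columns[fname]))
        else:
            schema.append((fname, fallback_type))
            unused.append(fname)
    return schema, unused


# Iris-style example: petal_width is declared in the schema but not used as a raw_column.
FIELDS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
RAW = {"sepal_length": "float", "sepal_width": "float", "petal_length": "float", "class": "string"}

schema, unused = expected_schema(FIELDS, RAW)
print(unused)  # ['petal_width']
```

Keeping the unused field in the schema (rather than filtering it out) matters for CSV, where columns are matched by position; the reported `unused` names could then be dropped after reading.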

Version

master

Additional Context

It should also be possible to ingest only a subset of the columns in the dataset (Parquet only, for now); we should test this.
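For the subset case, a test could check that only the requested columns are kept, in the dataset's own order, and that requesting a column the file doesn't contain fails loudly. A minimal sketch in plain Python, assuming a hypothetical helper `select_ingest_columns` (not part of Cortex); in PySpark the returned list could then be passed to `DataFrame.select`.

```python
from typing import List


def select_ingest_columns(dataset_columns: List[str], requested: List[str]) -> List[str]:
    # Hypothetical helper: fail early on names the dataset does not contain.
    missing = [c for c in requested if c not in dataset_columns]
    if missing:
        raise ValueError(f"columns not found in dataset: {missing}")
    # Return the requested subset in the dataset's column order.
    wanted = set(requested)
    return [c for c in dataset_columns if c in wanted]


DATASET_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
print(select_ingest_columns(DATASET_COLUMNS, ["class", "sepal_length"]))  # ['sepal_length', 'class']
```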


deliahu added the bug Something isn't working label Apr 19, 2019

deliahu added this to the v0.4 milestone Apr 19, 2019

deliahu changed the title ~~Not using an ingested column as a raw_column results in error~~ Not using an ingested column as a raw_column results in error Apr 19, 2019

deliahu added the v0.4 label Apr 19, 2019

deliahu assigned vishalbollu Apr 22, 2019

vishalbollu mentioned this issue May 1, 2019 in "Allow users to ingest a subset of input columns #92" (merged)

deliahu closed this as completed May 17, 2019

