Description
It should be possible to have a column defined in env.data.schema that isn't used as a raw_column (CSV and Parquet).
Application Configuration
Start from the Iris example, then completely remove one of the features (e.g. sepal_width).
To Reproduce
cx deploy
Stack Trace
INFO:cortex:Starting
INFO:cortex:Ingesting
INFO:cortex:Ingesting iris-1 data from s3a://cortex-examples/iris.csv
ERROR:cortex:An error occurred, see `cx logs raw_column sepal_width` for more details.
Traceback (most recent call last):
  File "/src/spark_job/spark_job.py", line 307, in run_job
    raw_df = ingest_raw_dataset(spark, ctx, cols_to_validate, should_ingest)
  File "/src/spark_job/spark_job.py", line 151, in ingest_raw_dataset
    ingest_df = spark_util.ingest(ctx, spark)
  File "/src/spark_job/spark_util.py", line 223, in ingest
    expected_schema = expected_schema_from_context(ctx)
  File "/src/spark_job/spark_util.py", line 117, in expected_schema_from_context
    for fname in expected_field_names
  File "/src/spark_job/spark_util.py", line 117, in <listcomp>
    for fname in expected_field_names
KeyError: 'petal_width'
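Judging from the trace, expected_schema_from_context builds the expected schema with a list comprehension that indexes the raw_column definitions by every field name in the ingestion schema, so any schema field without a matching raw_column raises KeyError. A minimal sketch of that pattern and a tolerant alternative (all names hypothetical, not Cortex's actual code):

```python
# Raw_column definitions from the app config (hypothetical data).
raw_columns = {
    "petal_length": "FLOAT_COLUMN",
    "petal_width": "FLOAT_COLUMN",
}

# Every field listed in env.data.schema; sepal_width is in the dataset
# but was removed as a raw_column.
expected_field_names = ["petal_length", "petal_width", "sepal_width"]

# The failing pattern: assumes every schema field has a raw_column entry.
try:
    types = [raw_columns[fname] for fname in expected_field_names]
except KeyError as e:
    print("KeyError:", e)  # raises on 'sepal_width'

# A tolerant alternative: only build entries for defined raw_columns.
types = [raw_columns[fname] for fname in expected_field_names if fname in raw_columns]
print(types)
```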
Version
master
Additional Context
It should also be possible to not ingest all of the columns in the dataset (just Parquet for now); we should test this.
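As an illustration of the desired subset-ingestion behavior, a minimal stdlib sketch that reads a dataset and keeps only the columns named in the schema, silently ignoring the rest (hypothetical helper, not Cortex code; uses CSV for brevity even though the context above targets Parquet first):

```python
import csv
import io

# Hypothetical in-memory stand-in for the Iris dataset.
dataset = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width\n"
    "5.1,3.5,1.4,0.2\n"
)

# The schema omits sepal_width and petal_width; they should simply be skipped.
wanted = ["sepal_length", "petal_length"]

rows = [{k: row[k] for k in wanted} for row in csv.DictReader(dataset)]
print(rows)  # only the wanted columns survive ingestion
```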
deliahu changed the title to "Not using an ingested column as a raw_column results in error" on Apr 19, 2019.