Not using an ingested column as a raw_column results in error #69

Closed
deliahu opened this issue Apr 19, 2019 · 0 comments

deliahu commented Apr 19, 2019

Description

It should be possible to define a column in env.data.schema without using it as a raw_column. This applies to both CSV and Parquet.

Application Configuration

Start from the Iris example, then completely remove one of the features (e.g. sepal_width).

To Reproduce

  1. cx deploy

Stack Trace

Starting

INFO:cortex:Starting
INFO:cortex:Ingesting
INFO:cortex:Ingesting iris-1 data from s3a://cortex-examples/iris.csv
ERROR:cortex:An error occurred, see `cx logs raw_column sepal_width` for more details.
Traceback (most recent call last):
  File "/src/spark_job/spark_job.py", line 307, in run_job
    raw_df = ingest_raw_dataset(spark, ctx, cols_to_validate, should_ingest)
  File "/src/spark_job/spark_job.py", line 151, in ingest_raw_dataset
    ingest_df = spark_util.ingest(ctx, spark)
  File "/src/spark_job/spark_util.py", line 223, in ingest
    expected_schema = expected_schema_from_context(ctx)
  File "/src/spark_job/spark_util.py", line 117, in expected_schema_from_context
    for fname in expected_field_names
  File "/src/spark_job/spark_util.py", line 117, in <listcomp>
    for fname in expected_field_names
KeyError: 'petal_width'
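
The trace ends in a list comprehension over expected_field_names that raises KeyError, which is what a plain dict lookup does when a schema field has no matching raw_column. A minimal sketch of that failure mode and one tolerant alternative follows. Every name here (schema_fields, raw_columns, expected_types_strict, expected_types_tolerant) and the type strings are hypothetical stand-ins for illustration, not the actual code in spark_util.py.

```python
# Hypothetical stand-ins; not the real Cortex data structures.
# Field names as they would be listed in env.data.schema.
schema_fields = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# raw_columns defined by the app; "sepal_width" has been removed.
# The type strings are illustrative placeholders.
raw_columns = {
    "sepal_length": "FLOAT_COLUMN",
    "petal_length": "FLOAT_COLUMN",
    "petal_width": "FLOAT_COLUMN",
    "class": "STRING_COLUMN",
}


def expected_types_strict(field_names, columns):
    """Look up every schema field directly; raises KeyError for an unused field."""
    return [(fname, columns[fname]) for fname in field_names]


def expected_types_tolerant(field_names, columns):
    """Keep every schema field, marking fields without a raw_column as None."""
    return [(fname, columns.get(fname)) for fname in field_names]


try:
    expected_types_strict(schema_fields, raw_columns)
except KeyError as err:
    print(f"strict lookup failed on {err}")

for name, col_type in expected_types_tolerant(schema_fields, raw_columns):
    status = col_type if col_type is not None else "ingested but unused"
    print(f"{name}: {status}")
```

Keeping the unused field in the result (rather than filtering it out) preserves the positional order of the schema, which matters for a headerless CSV. Whether the real fix should read such a column with a default type and drop it afterwards is a design choice this sketch does not settle.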

Version

master

Additional Context

It should also be possible to ingest only a subset of the dataset's columns (Parquet only for now). We should test this.
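
For the subset case, here is a rough sketch of the check a test could exercise: given the columns present in the dataset and the columns named in the schema, return only the requested ones and fail clearly if one is missing. The function name select_ingest_columns and the example column lists are assumptions made for illustration; nothing here is taken from the Cortex codebase.

```python
def select_ingest_columns(dataset_columns, schema_columns):
    """Return the schema columns in dataset order, ignoring extra dataset columns.

    Hypothetical helper: raises ValueError if the schema names a column
    that the dataset does not contain.
    """
    available = set(dataset_columns)
    missing = [c for c in schema_columns if c not in available]
    if missing:
        raise ValueError(f"columns missing from dataset: {missing}")
    wanted = set(schema_columns)
    return [c for c in dataset_columns if c in wanted]


# Example: the dataset has five columns, the schema asks for three of them.
dataset_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
schema_columns = ["class", "sepal_length", "petal_length"]
print(select_ingest_columns(dataset_columns, schema_columns))
```

Returning the columns in dataset order is an arbitrary choice in this sketch; a column-oriented format such as Parquet is addressed by name, so schema order would work just as well.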

@deliahu deliahu added the bug Something isn't working label Apr 19, 2019
@deliahu deliahu added this to the v0.4 milestone Apr 19, 2019
@deliahu deliahu changed the title Not using an ingested column as a raw_column results in error Not using an ingested column as a raw_column results in error Apr 19, 2019
@deliahu deliahu added the v0.4 label Apr 19, 2019
@deliahu deliahu closed this as completed May 17, 2019