[BUG] Fixed problems with generating parquet files #93
Conversation
I like the idea of generating data on the fly. Could you explain in more detail why you disabled the tests?
Specifically, I turned off these tests by mistake: I tested locally on one machine and (which was not very obvious to me) missed the data generation phase. For example, a comment below says that the data is generated only by running:

# generate test data for test_io
python hpat/tests/gen_test_data.py
schema = StructType([StructField('DT64', DateType(), True),
                     StructField('DATE', TimestampType(), True)])
sdf = spark.createDataFrame(df, schema)
sdf.write.parquet('sdf_dt.pq', 'overwrite')
Is it possible to move this directory (and other generated data) under the "build" directory tree? I don't think we need it at the top source level.
The location of the test folder is quite good; as far as I can see, the same layout is used in many large projects (for example, pandas).
The generated data, however, is really out of place.
I suggest leaving the test folder where it is and moving the generated data into it.
I am ready to do it, but probably better in a separate PR.
@shssf Can this PR be merged? Is there any blocker?
@anmyachev yes, but let's wait for @fschlimb's approval.
@shssf ok, thanks
Yes, it is standard practice to have a test folder next to each source folder.
I am not sure I follow the discussion about "generated data". We should not generate any data or sources into source directories at build or test time. Build-time and run-time files should always be kept separate.
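One lightweight way to honor that separation (a sketch only; `HPAT_TEST_DATA_DIR` and `data_path` are hypothetical names, not existing HPAT settings) is to resolve every generated file against a scratch directory instead of the repository:

```python
import os
import tempfile

# Hypothetical convention: generated test data lives under a scratch
# directory, overridable via an environment variable, never the repo.
GEN_DATA_DIR = os.environ.get(
    'HPAT_TEST_DATA_DIR', tempfile.mkdtemp(prefix='hpat_test_data_'))


def data_path(name):
    # Resolve a generated file, e.g. data_path('sdf_dt.pq'),
    # relative to the scratch directory.
    return os.path.join(GEN_DATA_DIR, name)
```

Both the generation script and the tests would then go through `data_path`, so nothing ever lands in the source tree.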
[BUG] Fixed problems with generating parquet files (IntelPython#93)
Some tests were skipped by me because I did not realize that the data needed to be generated separately.
Besides, I think unit tests should run without an additional run of the data generation script.