This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

[BUG] Fixed problems with generation parquet files #93

Merged 6 commits on Jul 18, 2019


anmyachev
Contributor

I had skipped some tests because I did not realize that the test data had to be generated separately.

Besides, I think that unit tests should run without a separate run of the data generation script.
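
To illustrate the idea of tests that produce their own data, here is a minimal sketch using only the standard library. It is not the code from this PR: `generate_test_data` and the CSV contents are hypothetical stand-ins for whatever the real generation script writes.

```python
import csv
import os
import shutil
import tempfile
import unittest


def generate_test_data(directory):
    """Hypothetical stand-in for a data generation script: writes one small CSV file."""
    path = os.path.join(directory, "example.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["A", "B"])
        writer.writerows([[1, 2.0], [3, 4.0]])
    return path


class TestWithGeneratedData(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Generate the data once per test class, so no separate script run is needed.
        cls.data_dir = tempfile.mkdtemp()
        cls.csv_path = generate_test_data(cls.data_dir)

    @classmethod
    def tearDownClass(cls):
        # Remove the generated files after the tests finish.
        shutil.rmtree(cls.data_dir)

    def test_row_count(self):
        with open(self.csv_path, newline="") as f:
            rows = list(csv.reader(f))
        # One header row plus two data rows.
        self.assertEqual(len(rows), 3)
```

With this layout, running the test class on a fresh checkout works without any preparatory step, and a forgotten generation phase cannot cause failures or skipped tests.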

@fschlimb
Contributor

I like the idea of generating data on the fly.

Could you explain in more detail why you disabled tests?

@anmyachev
Contributor Author

Specifically, I turned off these tests by mistake: I was testing locally and (which was not obvious to me) missed the data generation phase (python hpat/tests/gen_test_data.py), which runs when the tests are executed in Travis CI.

For example, the comment below says that data is generated only for test_io.py, which isn't true.
(file - hpat/buildscripts/test.sh)

# generate test data for test_io
python hpat/tests/gen_test_data.py

hpat/tests/gen_test_data.py
shssf added the bug label on Jul 17, 2019
schema = StructType([StructField('DT64', DateType(), True),
                     StructField('DATE', TimestampType(), True)])
sdf = spark.createDataFrame(df, schema)
sdf.write.parquet('sdf_dt.pq', 'overwrite')
Contributor

Is it possible to move this directory (and other generated data) under the "build" directory tree? I don't think we need it at the top level of the sources.

Contributor Author

The location of the test folder is fine; as far as I can see, the same layout is used in many large projects (for example, pandas).

The generated data, however, really is out of place.

I suggest leaving the test folder where it is and moving the generated data into it.

Contributor Author

I am ready to do it, but it is probably better done in a separate PR.

Contributor Author

@shssf Can this PR be merged? Are there any blockers?

Contributor

@anmyachev yes, but let's wait for @fschlimb's approval.

Contributor Author

@shssf ok, thanks

Contributor

Yes, it is standard practice to have a test folder for each source folder.
I am not sure I follow the discussion about "generated data". We should not generate any data or source files into the source directories at build or test time. Build-time and run-time files should always be kept separate.
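
One way to keep generated files out of the source tree is sketched below. It is an illustration only, not something this repository does: the `TEST_DATA_DIR` environment variable, the fallback directory name, and both helper functions are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def generated_data_dir(env_var="TEST_DATA_DIR"):
    """Return a directory for generated test data that lies outside the source tree.

    env_var is a hypothetical setting; when it is unset, a subdirectory of the
    system temporary directory is used instead.
    """
    base = os.environ.get(env_var) or os.path.join(
        tempfile.gettempdir(), "generated_test_data"
    )
    path = Path(base)
    path.mkdir(parents=True, exist_ok=True)
    return path


def write_sample(directory, name="sample.txt"):
    """Write a small sample file into the given directory and return its path."""
    target = Path(directory) / name
    target.write_text("1,2\n3,4\n")
    return target
```

A generation script could call `generated_data_dir()` once and write every file beneath the returned path, so that a build system can point `TEST_DATA_DIR` at its own build directory.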

fschlimb merged commit 37a5ba8 into IntelPython:master on Jul 18, 2019
kozlov-alexey added a commit to kozlov-alexey/sdc that referenced this pull request Jul 18, 2019
[BUG] Fixed problems with generation parquet files (IntelPython#93)