[BUG] Fixed problems with generating parquet files #93
Conversation
I like the idea of generating data on the fly. Could you explain in more detail why you disabled the tests?
Specifically, I turned off these tests by mistake: I tested locally on one machine and (which was not very obvious to me) missed the data generation phase. For example, a comment below says that the data is generated only by running:

# generate test data for test_io
python hpat/tests/gen_test_data.py
schema = StructType([StructField('DT64', DateType(), True),
                     StructField('DATE', TimestampType(), True)])
sdf = spark.createDataFrame(df, schema)
sdf.write.parquet('sdf_dt.pq', 'overwrite')
Is it possible to move this directory (and other generated data) under the "build" directory tree? I don't think we need it at the top source level.
The location of the test folder is quite good; as far as I can see, the same layout is used in many large projects (for example, pandas).
The generated data, however, is really out of place.
I suggest leaving the test folder where it is and moving the generated data into it.
I am ready to do it, but probably better in a separate PR.
@shssf Can this PR be merged? Is there any blocker?
@anmyachev yes, but let's wait for @fschlimb's approval.
@shssf ok, thanks
Yes, it is standard practice to have a test folder next to each source folder.
I am not sure I follow the discussion about "generated data". We should not generate any data or sources into source directories at build or test time. Build-time and run-time files should always be kept separate.
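One lightweight way to honor that separation (a sketch only; `HPAT_TEST_DATA_DIR` and `data_path` are hypothetical names, not existing HPAT settings) is to resolve every generated file against a scratch directory instead of the repository:

```python
import os
import tempfile

# Hypothetical convention: generated test data lives under a scratch
# directory, overridable via an environment variable, never the repo.
GEN_DATA_DIR = os.environ.get(
    'HPAT_TEST_DATA_DIR', tempfile.mkdtemp(prefix='hpat_test_data_'))


def data_path(name):
    # Resolve a generated file, e.g. data_path('sdf_dt.pq'),
    # relative to the scratch directory.
    return os.path.join(GEN_DATA_DIR, name)
```

Both the generation script and the tests would then go through `data_path`, so nothing ever lands in the source tree.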
[BUG] Fixed problems with generating parquet files (IntelPython#93)
Some tests were skipped by me because I did not realize that the data needed to be generated separately.
Besides, I think unit tests should run without an additional run of the data generation script.