Skip to content
This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

PyArrow porting 0.14.1 -> 0.15.0 #249

Merged
merged 9 commits into from
Oct 25, 2019
Merged

Conversation

PokhodenkoSA
Copy link
Contributor

@@ -14,11 +14,11 @@ requirements:
- {{ compiler('cxx') }}
- cmake >=3.2
- python 3.6.*
- pyarrow ==0.14.1
- arrow-cpp ==0.14.1
- pyarrow ==0.15.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have this version of the required package in one place for whole project? It might be some macro variable or something like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example with eliminating duplication inside one file #252.
I think use one version for whole project is not possible because all places plays different roles. Some places points range of available versions for use in customer environment, some points to concrete version we use for building and development.

@Vyacheslav-Smirnov have you some ideas about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This article explains differences between versions in Python infrastructure files.

return 0;

auto arr = chunked_array->chunk(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it meant that we always have one chunk after reading?

Copy link
Contributor Author

@PokhodenkoSA PokhodenkoSA Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It corresponds to original implementation of ReadColumn(... <Array> ...). It gets ChunkedArray end returns only the first Array in it.
I think this code should be improved in the future, e.g. for using all arrays in ChunkedArray. But for porting reasons it is ok because it works as previously.

parquet::arrow::FromParquetSchema(descr, column_indices, parquet_key_value_metadata, &col_schema);
// auto descr = arrow_reader->parquet_reader()->metadata()->schema();
// auto parquet_key_value_metadata = arrow_reader->parquet_reader()->metadata()->key_value_metadata();
// parquet::arrow::FromParquetSchema(descr, column_indices, parquet_key_value_metadata, &col_schema);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these comments helpful for some reason?

Copy link
Contributor Author

@PokhodenkoSA PokhodenkoSA Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I will remove it.

@@ -14,11 +14,11 @@ requirements:
- {{ compiler('cxx') }}
- cmake >=3.2
- python 3.6.*
- pyarrow ==0.14.1
- arrow-cpp ==0.14.1
- pyarrow ==0.15.0
Copy link
Contributor Author

@PokhodenkoSA PokhodenkoSA Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to use Jinja2 template.
Define in the beginning of the file:

{% set PYARROW_VERSION = "==0.15.0" %}
Suggested change
- pyarrow ==0.15.0
- pyarrow {{ PYARROW_VERSION }}

Is it ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run:
- pyarrow ==0.14.1
- arrow-cpp ==0.14.1
- pyarrow ==0.15.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- pyarrow ==0.15.0
- pyarrow {{ PYARROW_VERSION }}

@shssf shssf merged commit db43e32 into IntelPython:master Oct 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants