@thangnd197

Lead-authored-by: dgd-contributor <dgd_contributor@viettel.com.vn>
Co-authored-by: thangnd197 <thangnd2210@gmail.com>

What changes were proposed in this pull request?

Support multi-index in the new syntax for specifying index data types.

Why are the changes needed?

So that the new syntax for specifying index data types also works with a multi-index.

https://issues.apache.org/jira/browse/SPARK-36707

Does this PR introduce any user-facing change?

After this PR, users can specify a multi-index in the new syntax, for example:

>>> ps.DataFrame[[int, int],[int, int]]
typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.NameType, pyspark.pandas.typedef.typehints.NameType]


>>> arrays = [[1, 1, 2], ['red', 'blue', 'red']]
>>> idx = pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
>>> pdf = pd.DataFrame([[1,2,3],[2,3,4],[4,5,6]], index=idx, columns=["a", "b", "c"])
>>> ps.DataFrame[pdf.index.dtypes, pdf.dtypes]
typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.NameType, pyspark.pandas.typedef.typehints.NameType, pyspark.pandas.typedef.typehints.NameType]


>>> ps.DataFrame[[("index", int), ("index-2", int)], [("id", int), ("A", int)]]
typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.NameType, pyspark.pandas.typedef.typehints.NameType]


>>> ps.DataFrame[zip(pdf.index.names, pdf.index.dtypes), zip(pdf.columns, pdf.dtypes)]
typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.NameType, pyspark.pandas.typedef.typehints.NameType, pyspark.pandas.typedef.typehints.NameType]
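
The examples above pass each side of the subscript either as bare types or as (name, type) pairs, including pairs produced by zip. As a rough illustration only, the sketch below shows one way such inputs could be normalized into (name, type) pairs. The helper `normalize_fields` is hypothetical and is not the pyspark.pandas implementation; it uses plain Python so it runs without Spark.

```python
from typing import Any, Iterable, List, Optional, Tuple


def normalize_fields(fields: Iterable[Any]) -> List[Tuple[Optional[Any], Any]]:
    """Hypothetical helper: turn bare types or (name, type) pairs into
    a uniform list of (name, type) pairs. Bare types get the name None."""
    normalized = []
    for field in fields:
        if isinstance(field, tuple) and len(field) == 2:
            # Named form, e.g. ("index", int)
            name, tpe = field
        else:
            # Unnamed form, e.g. int
            name, tpe = None, field
        normalized.append((name, tpe))
    return normalized


# Named multi-index levels, as in ps.DataFrame[[("index", int), ("index-2", int)], ...]
index_fields = normalize_fields([("index", int), ("index-2", int)])

# Unnamed data columns, as in ps.DataFrame[..., [int, int]]
data_fields = normalize_fields([int, int])

# zip(...) input is consumed like any other iterable of pairs
zipped_fields = normalize_fields(zip(["a", "b"], [int, float]))
```

Under these assumptions, every input form ends up as the same list-of-pairs shape, with one entry per index level or data column.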

How was this patch tested?

Existing tests.

@thangnd197 changed the title from "[SPARK-36711] Support multi-index in new syntax" to "[SPARK-36711][PYTHON] Support multi-index in new syntax" on Sep 27, 2021
@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

Lead-authored-by should be an individual instead of a group. If this is committed, the commit will show @dgd-contributor as its main author, which is ambiguous. Let's discuss and wait until we reach a conclusion.

@thangnd197 thangnd197 closed this Oct 4, 2021