[WIP][SPARK-46620][PS][CONNECT] Implement Frame.asfreq #44621

Closed
zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:ps_df_asfreq

@zhengruifeng (Contributor)

What changes were proposed in this pull request?

Implement `Frame.asfreq` in pandas API on Spark.

Why are the changes needed?

For feature parity with pandas.

Does this PR introduce any user-facing change?

Yes. `asfreq` becomes available on pandas-on-Spark DataFrames, for example:

In [1]: import pyspark.pandas as ps

In [2]: import pandas as pd

In [3]: index = pd.date_range('1/1/2000', periods=4, freq='min')

In [4]: series = pd.Series([0.0, None, 2.0, 3.0], index=index)

In [5]: pdf = pd.DataFrame({'s': series})

In [6]: psdf = ps.from_pandas(pdf)
24/01/08 17:25:19 WARN CheckAllocator: More than one DefaultAllocationManager on classpath. Choosing first found

In [7]: psdf.asfreq(freq='30s')
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1015: PandasAPIOnSparkAdviceWarning: `frame.asfreq` loads partial data into the driver's memory to infer the schema, and loads all data into one executor's memory to compute. It should only be used if the pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1015: PandasAPIOnSparkAdviceWarning: If the type hints is not specified for `groupby.apply`, it is expensive to infer the data type internally.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
Out[7]: 
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  NaN
2000-01-01 00:03:00  3.0

In [8]: psdf.asfreq(freq='30s', fill_value=9.0)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1015: PandasAPIOnSparkAdviceWarning: `frame.asfreq` loads partial data into the driver's memory to infer the schema, and loads all data into one executor's memory to compute. It should only be used if the pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1015: PandasAPIOnSparkAdviceWarning: If the type hints is not specified for `groupby.apply`, it is expensive to infer the data type internally.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
Out[8]: 
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  9.0
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  9.0
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  9.0
2000-01-01 00:03:00  3.0

In [9]: psdf.asfreq(freq='30s', method='bfill')
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1015: PandasAPIOnSparkAdviceWarning: `frame.asfreq` loads partial data into the driver's memory to infer the schema, and loads all data into one executor's memory to compute. It should only be used if the pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1015: PandasAPIOnSparkAdviceWarning: If the type hints is not specified for `groupby.apply`, it is expensive to infer the data type internally.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
Out[9]: 
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  2.0
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  3.0
2000-01-01 00:03:00  3.0
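
For comparison, here is a minimal sketch of the same three calls in plain pandas, which is the behavior this PR aims to match. It uses only pandas, not the code in this patch, and the variable names `plain`, `filled`, and `backfilled` are illustrative. Note that `fill_value` only fills rows introduced by upsampling, so the NaN already present at 00:01:00 is preserved, and `bfill` copies the next original observation even when that observation is NaN.

```python
import pandas as pd

# Same input as the session above: one value per minute, with a missing value.
index = pd.date_range("1/1/2000", periods=4, freq="min")
series = pd.Series([0.0, None, 2.0, 3.0], index=index)
pdf = pd.DataFrame({"s": series})

# Upsample to 30-second frequency; rows that did not exist before become NaN.
plain = pdf.asfreq(freq="30s")

# fill_value applies only to the newly created rows, not to pre-existing NaN.
filled = pdf.asfreq(freq="30s", fill_value=9.0)

# bfill takes the next original observation, which may itself be NaN.
backfilled = pdf.asfreq(freq="30s", method="bfill")

print(plain)
print(filled)
print(backfilled)
```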

How was this patch tested?

Doctests and unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 21, 2024
@github-actions github-actions bot closed this Apr 22, 2024