
Implements DataFrame.persist() with additional tests for DataFrame.cache() #1381

Merged
merged 4 commits into databricks:master from f_persist on Mar 31, 2020

Conversation

itholic
Contributor

@itholic itholic commented Mar 31, 2020

Resolves #1373

Here, we have a `DataFrame` named `df`:

```python
>>> import pyspark
>>> import databricks.koalas as ks
>>> df = ks.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df
   dogs  cats
0   0.2   0.3
1   0.0   0.6
2   0.6   0.0
3   0.2   0.1
```

Set the StorageLevel to `MEMORY_ONLY`.

```python
>>> with df.persist(pyspark.StorageLevel.MEMORY_ONLY) as cached_df:
...     print(cached_df.count())
...
dogs    4
cats    4
Name: 0, dtype: int64
```

Set the StorageLevel to `DISK_ONLY`.

```python
>>> with df.persist(pyspark.StorageLevel.DISK_ONLY) as cached_df:
...     print(cached_df.count())
...
dogs    4
cats    4
Name: 0, dtype: int64
```

If no StorageLevel is given, `MEMORY_AND_DISK` is used by default.

```python
>>> with df.persist() as cached_df:
...     print(cached_df.count())
...
dogs    4
cats    4
Name: 0, dtype: int64
```
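
The `with` block is expected to release the cache on exit. For a longer-lived cache, here is a minimal sketch of managing the lifetime explicitly, assuming the object returned by `persist()` exposes an `unpersist()` method like the cached frame returned by `DataFrame.cache()`:

```python
>>> cached_df = df.persist(pyspark.StorageLevel.DISK_ONLY)  # cached until unpersist()
>>> print(cached_df.count())
dogs    4
cats    4
Name: 0, dtype: int64
>>> cached_df.unpersist()  # release the cached data explicitly when done
```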

@itholic itholic changed the title Implements DataFrame.persist() Implements DataFrame.persist() & Adding test for DataFrame.cache() Mar 31, 2020
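
As the retitle notes, the change also adds tests around `DataFrame.cache()`. Below is a hypothetical sketch of the kind of check such tests might perform (not the PR's actual test code, which lives in the Koalas test suite): caching, with any storage level, should not change results.

```python
import unittest

import pyspark
import databricks.koalas as ks


class PersistCacheSketch(unittest.TestCase):
    def test_results_unchanged_by_caching(self):
        df = ks.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
                          columns=['dogs', 'cats'])
        expected = df.to_pandas().sort_index()

        # persist() with an explicit storage level should not change results.
        with df.persist(pyspark.StorageLevel.DISK_ONLY) as cached_df:
            self.assertTrue(cached_df.to_pandas().sort_index().equals(expected))

        # cache() (default MEMORY_AND_DISK level) should behave the same way.
        with df.cache() as cached_df:
            self.assertTrue(cached_df.to_pandas().sort_index().equals(expected))


if __name__ == "__main__":
    unittest.main()
```
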
@codecov-io

codecov-io commented Mar 31, 2020

Codecov Report

Merging #1381 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1381   +/-   ##
=======================================
  Coverage   95.23%   95.24%           
=======================================
  Files          34       34           
  Lines        7792     7799    +7     
=======================================
+ Hits         7421     7428    +7     
  Misses        371      371           
| Impacted Files | Coverage Δ |
| --- | --- |
| databricks/koalas/frame.py | 96.80% <100.00%> (+0.01%) ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5130b99...52b5539. Read the comment docs.

@HyukjinKwon HyukjinKwon changed the title Implements DataFrame.persist() & Adding test for DataFrame.cache() Implements DataFrame.persist() with additional tests for DataFrame.cache() Mar 31, 2020
Collaborator

@ueshin ueshin left a comment


LGTM.

@ueshin
Collaborator

ueshin commented Mar 31, 2020

Thanks! merging.

@ueshin ueshin merged commit 1e3e093 into databricks:master Mar 31, 2020
HyukjinKwon pushed a commit that referenced this pull request Apr 1, 2020
Implements DataFrame.persist() with additional tests for DataFrame.cache() (#1381)

@itholic itholic deleted the f_persist branch April 1, 2020 11:45
Development

Successfully merging this pull request may close these issues.

df.cache() question (#1373)
3 participants