
sync fork #1

Merged: austin-chou merged 99 commits into austin-chou:master on Jan 8, 2019


austin-chou
Owner

What do these changes do?

Related issue number

  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • passes black --check modin/

osalpekar and others added 30 commits October 27, 2018 17:24
* printing aws file path

* corrected pathname

* public perf tests

* remove s3 ls statements
#238)

* Dropping git revision because it causes problems for some Ubuntu users

* Removing from pandas also

* lint
* Update README.md

* Typo fix for README

* Revert "Typo fix for README"

This reverts commit dc2023f.
* Fixing issue iterating over groupby with multiple columns

* formatting

* performance improvement
…246)

* modin.pandas.io.read_parquet to partition columns according to CPU cores, reading multiple columns at once using Ray

* blank line removed from EOF

* column splits recalculated according to @devin-petersohn review

* Formatting for consistency and flake8
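The column-splitting idea in these commits can be sketched as follows, assuming one chunk of columns per CPU core as the commit message describes. `split_columns` is a hypothetical helper, not Modin's code; in the actual change each chunk would be read by a separate Ray task, which is left out here.

```python
import math
import os


def split_columns(columns, num_splits=None):
    """Split column names into contiguous chunks, one per CPU core by default.

    Hypothetical helper: each chunk could then be read by its own worker,
    e.g. with pandas.read_parquet(path, columns=chunk).
    """
    columns = list(columns)
    if not columns:
        return []
    if num_splits is None:
        num_splits = os.cpu_count() or 1
    # Never create more chunks than there are columns
    num_splits = max(1, min(num_splits, len(columns)))
    chunk_size = math.ceil(len(columns) / num_splits)
    return [columns[i:i + chunk_size] for i in range(0, len(columns), chunk_size)]


print(split_columns(list("abcdefg"), num_splits=3))
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

Contiguous chunks keep the original column order, so the partial results can be concatenated side by side without reordering.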
* Revert some changes in behavior for sum/prod

* Fix default value for numeric_only

* Fix mean in the context of the new sum changes

* Add comments
* Adding advanced usage docs

* Updating docs

* Updating phrasing

* Addressing comments
* Refactor dataframe

* Fix bug

* Fix lint
* Bump version 0.2.3

* Fix issue in init

* Removing typing as a dependency
* Pin redis until Ray release

* Moving requirement order

* Add requirements.txt

* Pin pytest version on travis

* Add
* Jenkins perf builds for Master Commits

* formatting change

* code for adding commit ordering info

* formatting

* updated deps on Dockerfile

* removed extra dependency from Dockerfile
* squeeze fixes

* updated squeeze

* fixed rebasing issue

* updated squeeze

* updating to fix travis fails

* black and flake

* Update test_dataframe.py

* removed unnecessary comments

* addressing comments

* fixed formatting

* changed an ndim parameter

Co-Authored-By: adits31 <adityasheth@berkeley.edu>
* modin.pandas.io.read_parquet to partition columns according to CPU cores, reading multiple columns at once using Ray

* blank line removed from EOF

* column splits recalculated according to @devin-petersohn review

* Formatting for consistency and flake8

* issue #204
Use Ray to parallelize `read_hdf`
similar to the way `read_parquet` works

* reformatted by black

* merged from master
* Update documentation

* Update other supported methods

* Resolve comments

* minor typo fix
* Adding read_table

* Removing unneeded variable

* Removing default

* Fix python2 bug

* Removing unneeded code

* Add read_table to __init__

* Adding read_table to documentation
* Adding skeleton for a Dask runtime implementation

* Lint

* Revert extra newline

* Fix error message

* Fix typo
* fixed empty rows error in to_pandas

* formatting and code style
* Fixing filter for read_csv args

* Remove hardcode

* Fix bug

* Lint
* Update using_modin.rst

The blog post link leads to a non-existent page. I suggested the post that was probably mentioned.

* Update using_modin.rst
williamma12 and others added 29 commits December 31, 2018 13:17
* Fix mode when axis=1 and remove a reindex

* formatting
* Fix rfloordiv error

* formatting
* Scenario that caused the bug:
  * `query` or a similar filter operation run
  * Some partitions completely removed from the dataframe
  * During `get_indices`, the filter optimization on BlockPartitions
    caused the empty partitions to not count toward the index
    computation
* Fix: Use _partitions_cache, which is not modified when accessed.
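To see why skipping empty partitions breaks the index computation, consider the per-partition row offsets used to map global row positions onto partitions. This is a simplified, hypothetical illustration, not Modin's `get_indices`.

```python
def partition_offsets(lengths):
    """Starting row offset of each partition, counting empty partitions too."""
    offsets, total = [], 0
    for n in lengths:
        offsets.append(total)
        total += n
    return offsets


# Row counts per partition after a `query` emptied the middle partition
lengths = [3, 0, 2]
print(partition_offsets(lengths))  # [0, 3, 3]: one offset per partition

# Skipping empty partitions yields fewer offsets than partitions,
# so offsets no longer line up with the partition list
print(partition_offsets([n for n in lengths if n > 0]))  # [0, 3]
```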
* Fixed sort_index

* Fixed formatting

* Fixed when ascending is None

* Update modin/data_management/query_compiler/pandas_query_compiler.py

Co-Authored-By: williamma12 <12377941+williamma12@users.noreply.github.com>

* Changed insert to keep old index explicitly
)

* Change the way we test whether something is callable in SeriesView

* This will make sure that all things that are callable are evaluated

* Fix bug

* Revert test change
* Removed fixtures for checking equality

Parameterized the different dataframes into a parameterize mark

added comments and used parameterization for test_functions

Moved test_empty_df to be alphabetical

Grouped inter df operations

Consolidated the equality functions into a single function in utils.py for test_dataframe.py

Alphabetized the tests

Continued alphabetizing the tests

Changed quantiles and query to new testing structure

Changed some fillna tests, median, mean, and max to use new structure

Updated tests up to test_get

Updated test_get

Updated test_get_dummies and equality checker

updated bool arguments parameterization

updated testing of aggregate functions and created name_contains for cleaner code

Updated all and any tests

Updated count test

Updated cumulative min/max/sum/prod functions

Added more apply tests

updated test_diff class of functions

updated a lot of functions

Updated query and rank

Updated more tests

Updated test_sort_index

Updated more tests and gave equals a tolerance of 0.01

Finished test_dataframe.py

Updated tests

Fixed test_sum in groupby and changed all names of ray_ to modin_

changed ray_df to modin_df

Rebased and updated test after clip was implemented

Ran black on test_dataframe and test_groupby

Fixed ray to modin errors

updated test for dropna

Fix test_dropna_subset

Fix test_groupby errors from rebasing

Resolve flake errors

Fixed remaining flake8 errors

Fixed more remaining flake8 errors

Fixed up to but not including clip functions

Fixed parts of clip and all of count

Fixed tests up to but not including mean

Fixed test_mean

Removed numeric_only requirement for count

Changed abs to use _validate_dtypes function

cleaned up count

Fixed median and reorganized full_axis_reduce functions

Fixed numeric_only = None tests.

Fixed numeric functions

Added larger test data

Fixed mode test

Fixed test_prod expecting too many TypeErrors, but it is still broken

fixing test_clip

Added None as an option to arguments

Added none argument testing of axis for all and any

some fixes for empty df and series testing

Fixed sum test and removed print statement for sample

Removed default value for min_count in sum

Fixed sort_values

Forgot to remove debugging print statements

Fixed sort_index inplace updates and added more rigorous testing of NaNs

fix plot tests

formatting

fixed tests for min and max

fix

Updated some tests

Refactored tests to better test for errors

updated test_mean

fixed test_equals and made transpose more comprehensive

rebased and reset groupby to master

Update inter_df_math_helper

updated remaining inter df operations

Updated inter_df_operations tests

Updated error testing

Updated testing for inter_df_operations

Update test_rank

Changed parameterization to create dataframes within tests instead

fixed test_copy

Fixed tests for all and any

fixed overeager find and replace

misspelled modin

Updated jenkins

Updated test___bool__

fixed test_agg

fixed test_aggregate

Fixed tests all and any

Updated equality testing

Undo travis changes

Skip test_apply for now until we are able to properly test UDFs

Skip applymap for now except for numeric ones

Testing applymap for numeric dataframes

Update test for at

Updated clip tests

Updated clip tests

Update cumsum test

Updated test duplicate, empty_df, fillna_dataframe

Fix jenkins

Update jenkins

Update jenkins

Update fillna test functions

Skip tests that otherwise default to pandas

Updated test insert

skip is_copy because defaulting to pandas

Test transform only on numeric functions

Update travis to run on python backend

Skip rename nocopy

update jenkins

Reset dataframe and querycompiler to master

Fixed merge conflicts in test_dataframe

Updated test clip and squeeze

Update tests to take out axis and integer none testing

Updated testing apply_numeric

Minor changes

Update test_inplace

Fixed typo in dropna_inplace

* Update test_merge and the all integer test dataframe

* Update travis build syntax error

* Update install-dependencies.sh

* Update how series results are tested for equality

* Update test_eval_df_use_case syntax error

* Update new tests to new test suite format

* Removing the running of mixed dtype dataframes

* Update equality testing

* Fix set_axis assertion error

* Fix set_index assertion error

* Update testing dataframes to be positive

* Fixed index checking in df_equals

* Lint

* Expand tests and fix bugs

* Fix issue with travis

* Fixing nan failure

* Update pytest

* Run only test_dataframe.py in parallel on travis

* Run only test_dataframe.py in parallel on jenkins

* Revert version change
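The idea of comparing results within a small tolerance (0.01, per the commit message above) can be sketched with NumPy. `df_equals_approx` is a hypothetical helper; the actual `df_equals` in `utils.py` may handle more cases, such as dtypes, Series, and non-numeric data.

```python
import numpy as np
import pandas as pd


def df_equals_approx(df1, df2, atol=0.01):
    """Hypothetical check: identical labels and numeric values close within `atol`.

    np.allclose also applies its default relative tolerance (rtol=1e-05).
    """
    if not (df1.index.equals(df2.index) and df1.columns.equals(df2.columns)):
        return False
    # Treat NaNs in the same position as equal
    return bool(
        np.allclose(
            df1.to_numpy(dtype=float),
            df2.to_numpy(dtype=float),
            atol=atol,
            equal_nan=True,
        )
    )


print(df_equals_approx(pd.DataFrame({"x": [1.0, 2.0]}), pd.DataFrame({"x": [1.005, 2.0]})))
```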
* Checking for proper types (Series and SeriesView) on insert

* Lint
…ssed in (#363)

* Filtering default exclude values based on the include passed in

* Fix error checking
* Fixing partitioning issue when doing a reindex/concat

* Adds a new parameter to apply for axis_partitions
* Allows partitioning to be either recomputed and rebalanced or
  maintained between runs

* Adding parameter to Python and Dask backends
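The choice the new `apply` parameter exposes (recompute and rebalance the partitioning, or keep it between runs) can be illustrated with a small, hypothetical helper. The parameter name `keep_partitioning` is an assumption, not necessarily the name used in Modin.

```python
def partition_lengths(total_rows, num_partitions, existing=None, keep_partitioning=False):
    """Choose row counts for each partition after an operation.

    keep_partitioning=True reuses the existing layout; otherwise rows are
    rebalanced as evenly as possible across `num_partitions`.
    """
    if keep_partitioning and existing is not None:
        if sum(existing) != total_rows:
            raise ValueError("existing partition lengths do not match the row count")
        return list(existing)
    base, extra = divmod(total_rows, num_partitions)
    # The first `extra` partitions take one additional row each
    return [base + 1 if i < extra else base for i in range(num_partitions)]


print(partition_lengths(10, 3))  # [4, 3, 3]
print(partition_lengths(10, 3, existing=[8, 1, 1], keep_partitioning=True))  # [8, 1, 1]
```

Rebalancing evens out skew left by operations like `reindex` or `concat`, while keeping the layout avoids moving data when the partitions are already aligned with another frame.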
#365)

* This allows the partition to return a valid DataFrame and not an
  Exception
* In the future, additional post-processing may be needed to throw the
  pandas error that gets thrown if no data of that type exists.
* Adding a condition for selective apply operations that will simply
  return an empty 2D numpy array for partitions if there are no
  partitions.
* This is added to both `apply_func_to_select_indices` and
  `apply_func_to_select_indices_along_full_axis`
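The empty-partition guard described above can be sketched on a 2D NumPy grid of blocks. `apply_to_partitions` is a hypothetical helper, and returning an empty array of shape `(0, 0)` is an assumption about what "an empty 2D numpy array" means here.

```python
import numpy as np


def apply_to_partitions(partitions, func):
    """Apply `func` to every block in a 2D grid of partitions.

    Hypothetical helper: an empty grid returns an empty 2D array instead of
    raising, mirroring the guard described in the commit message.
    """
    grid = np.asarray(partitions, dtype=object)
    if grid.size == 0:
        return np.empty((0, 0), dtype=object)
    result = np.empty(grid.shape, dtype=object)
    for position, block in np.ndenumerate(grid):
        result[position] = func(block)
    return result


print(apply_to_partitions([[1, 2], [3, 4]], lambda x: x * 10).tolist())  # [[10, 20], [30, 40]]
print(apply_to_partitions([], len).shape)  # (0, 0)
```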
Adjust logo size and text size.
* Adding groupby columns and index name when necessary

* Lint

* Making sure we only change columns when we group by index values
* Making drop faster for drop operations

* Drops the same way that pandas Index gets dropped with duplicates
* Also clean up some inefficient and duplicated `dropna` code

* Lint

* Reverting dropna cleanup to fix in future PR
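For reference, this is how pandas itself treats duplicate labels in `Index.drop`: every occurrence is removed. The positional variant at the end is an illustration, not Modin's implementation.

```python
import pandas as pd

index = pd.Index(["a", "b", "a", "c"])

# pandas removes every occurrence of a duplicated label
print(index.drop(["a"]).tolist())  # ['b', 'c']

# The same result expressed as positions to keep, which a partitioned
# frame could use to select rows without repeated label lookups
keep = (~index.isin(["a"])).nonzero()[0]
print(keep.tolist())  # [1, 3]
```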
#383)

* Checking for type before we check the length to avoid spurious errors.

* Resolves #382
* Prevents errors from DataFrames with 2 rows

* Lint
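The commit message does not show the affected code. As a generic, hypothetical illustration of why the type check should come before the length check: a key that merely has length 2 (a two-element list, or a two-row object) must not be mistaken for a `(rows, columns)` pair.

```python
def split_indexer_key(key):
    """Split an indexing key into (row_key, col_key).

    Only a tuple is treated as a (rows, columns) pair. Because `and`
    short-circuits, len() is never called on non-tuple keys, so keys
    without a length (such as integers) are handled too.
    """
    if isinstance(key, tuple) and len(key) == 2:
        return key
    return key, slice(None)


print(split_indexer_key((0, "a")))  # (0, 'a')
print(split_indexer_key([0, 1]))    # ([0, 1], slice(None, None, None))
```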
#386)

* Converting PandasQueryCompiler.getitem_array to accept numeric indices

* Resolves #384
* Make changes to the functions that use getitem_array to now use
  indices instead

* Converting some ranges to RangeIndex to support indexing

* Lint
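Once `getitem_array` works on numeric indices, callers need to translate a boolean mask or a list of labels into positions first. A hypothetical helper for that step, using pandas' `Index.get_indexer` (which assumes a unique index):

```python
import numpy as np
import pandas as pd


def to_positions(index, key):
    """Convert a boolean mask or a list of labels into integer positions."""
    key = np.asarray(key)
    if key.dtype == bool:
        # Boolean mask: positions where the mask is True
        return np.flatnonzero(key)
    # Labels: look up each label's position (-1 marks a missing label)
    positions = index.get_indexer(key)
    if (positions == -1).any():
        raise KeyError("some labels are not in the index")
    return positions


idx = pd.Index(["x", "y", "z"])
print(to_positions(idx, [True, False, True]).tolist())  # [0, 2]
print(to_positions(idx, ["z", "x"]).tolist())  # [2, 0]
```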
* Resolves #388
* For now, we have a class that overrides all function calls and
  converts the arguments to pandas.
* The class created is a metaclass with the sole purpose of converting
  the arguments to pandas compatible args.
* This approach allows us to avoid copying a bunch of code and
  hard-coding the entire module.
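The metaclass approach described above can be sketched as follows: any attribute not defined on the class is looked up on a wrapped namespace, and arguments are converted before the call. The `_to_pandas` hook, the `_wrapped` attribute, and the stand-in namespace are assumptions for illustration; in Modin the wrapped target would be a pandas module and the conversion would produce pandas objects.

```python
import functools
import types


def _convert(arg):
    # Hypothetical hook: objects exposing `_to_pandas()` are converted first
    to_pandas = getattr(arg, "_to_pandas", None)
    return to_pandas() if callable(to_pandas) else arg


class ConvertArgsMeta(type):
    """Forward unknown class attributes to `cls._wrapped`, converting call arguments."""

    def __getattr__(cls, name):
        target = getattr(cls._wrapped, name)
        if not callable(target):
            return target

        @functools.wraps(target)
        def wrapper(*args, **kwargs):
            args = [_convert(a) for a in args]
            kwargs = {k: _convert(v) for k, v in kwargs.items()}
            return target(*args, **kwargs)

        return wrapper


class Box:
    """Stand-in for a Modin object that knows how to convert itself."""

    def __init__(self, data):
        self.data = data

    def _to_pandas(self):
        return list(self.data)


class Wrapped(metaclass=ConvertArgsMeta):
    _wrapped = types.SimpleNamespace(
        total=sum, scale=lambda values, factor=1: [v * factor for v in values]
    )


print(Wrapped.total(Box((1, 2, 3))))  # 6
```

Because the lookup happens lazily in `__getattr__`, no function of the wrapped module has to be copied or listed by hand.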
…it (#393)

* Resolves #392
* This creates a view on all objects, even PandasQueryCompilerView
  objects, when using an Indexer (e.g. LocIndexer)
* Allows `df.iloc[...].iloc[...]` to be supported now
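Chained positional indexing works when each view maps its positions back to the original data. A hypothetical sketch of that composition (using an `iloc` method rather than an indexer object, and supporting only slices and lists of positions):

```python
import numpy as np


class PositionalView:
    """Hypothetical view that records row positions relative to the original data."""

    def __init__(self, data, rows=None):
        self._data = data
        self._rows = np.arange(len(data)) if rows is None else np.asarray(rows)

    def iloc(self, key):
        # `key` (a slice or list of positions) is relative to this view, so map it
        # through the stored positions; the new view still points at the original
        # data, which is what makes chaining compose correctly.
        return PositionalView(self._data, self._rows[key])

    def to_list(self):
        return [self._data[i] for i in self._rows]


view = PositionalView(list("abcdef"))
print(view.iloc(slice(1, 5)).iloc([0, 2]).to_list())  # ['b', 'd']
```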
* Resolves #145
* Performs a `reindex` on the `dtypes_cache` if there is already
  something there.
* Adds a `dtype` property to `PandasQueryCompilerView` objects to be
  handled differently than parent.
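At the pandas level, reusing a cached dtypes Series for a column subset is a single `reindex`. This is a minimal illustration of the idea only; Modin's `dtypes_cache` internals may differ. Labels missing from the cache would come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})
dtypes_cache = df.dtypes  # one dtype per column, indexed by column name

# A view that selects and reorders columns can reuse the cached dtypes
view_dtypes = dtypes_cache.reindex(["c", "a"])
print(view_dtypes)
```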
* Resolves #394
* Aliasing isnull to isna, which was already implemented.
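A class-level alias is a one-line assignment: both names then refer to the same function. The `Frame` class below is a hypothetical minimal stand-in, not Modin's DataFrame.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame-like class."""

    def __init__(self, values):
        self.values = values

    def isna(self):
        return [v is None for v in self.values]

    # Alias: both names refer to the same function object
    isnull = isna


print(Frame([1, None]).isnull())  # [False, True]
```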
austin-chou merged commit 0c8eddc into austin-chou:master on Jan 8, 2019
9 participants