forked from modin-project/modin
sync fork #1
Merged
* printing aws file path
* corrected pathname
* public perf tests
* remove s3 ls statements

#238)
* Dropping git revision because it causes problems for some Ubuntu users
* Removing from pandas also
* lint

* Update README.md
* Typo fix for README
* Revert "Typo fix for README" This reverts commit dc2023f.

* Fixing issue iterating over groupby with multiple columns
* formatting
* performance improvement

…246)
* modin.pandas.io.read_parquet to partition columns according to CPU cores, reading multiple columns at once using Ray
* blank line removed from EOF
* column splits recalculated according to @devin-petersohn review
* Formatting for consistency and flake8
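The column-partitioning idea in this commit, splitting the Parquet columns into roughly as many groups as there are CPU cores so that each group can be read at once, might look something like the sketch below. This is an illustration only, not the code from the commit: `split_columns` is a hypothetical helper, `os.cpu_count()` stands in for however the number of workers is really determined, and the Ray-based reading itself is left out.

```python
import os


def split_columns(columns, num_splits):
    """Group `columns` into at most `num_splits` contiguous chunks.

    Hypothetical helper for illustration; not Modin's implementation.
    """
    if not columns:
        return []
    # Never create more groups than there are columns.
    num_splits = max(1, min(len(columns), num_splits))
    # Ceiling division: how many columns each group holds at most.
    per_split = -(-len(columns) // num_splits)
    return [columns[i:i + per_split] for i in range(0, len(columns), per_split)]


columns = ["a", "b", "c", "d", "e", "f", "g"]

# Seven columns over three workers: groups of at most three columns.
groups = split_columns(columns, 3)

# Assumption: one group per CPU core reported by the operating system.
cpu_groups = split_columns(columns, os.cpu_count() or 1)
```

Each group in `groups` could then be handed to one reader task, so that a task loads several columns in one call instead of one column per task.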
* Revert some changes in behavior for sum/prod
* Fix default value for numeric_only
* Fix mean in the context of the new sum changes
* Add comments

* Adding advanced usage docs
* Updating docs
* Updating phrasing
* Addressing comments

* Refactor dataframe
* Fix bug
* Fix lint

* Bump version 0.2.3
* Fix issue in init
* Removing typing as a dependency

* Pin redis until Ray release
* Moving requirement order
* Add requirements.txt
* Pin pytest version on travis
* Add

* Jenkins perf builds for Master Commits
* formatting change
* code for adding commit ordering info
* formatting
* updated deps on Dockerfile
* removed extra dependency from Dockerfile

* squeeze fixes
* updated squeeze
* fixed rebasing issue
* updated squeeze
* updating to fix travis fails
* black and flake
* Update test_dataframe.py
* removed unnecessary comments
* addressing comments
* fixed formatting
* changed an ndim parameter

Co-Authored-By: adits31 <adityasheth@berkeley.edu>

* modin.pandas.io.read_parquet to partition columns according to CPU cores, reading multiple columns at once using Ray
* blank line removed from EOF
* column splits recalculated according to @devin-petersohn review
* Formatting for consistency and flake8
* issue #204 Use Ray to parallelize `read_hdf` similar to the way `read_parquet` works
* reformatted by black
* merged from master

type(self) in each method

* Update documentation
* Update other supported methods
* Resolve comments
* minor typo fix

* Adding read_table
* Removing unneeded variable
* Removing default
* Fix python2 bug
* Removing unneeded code
* Add read_table to __init__
* Adding read_table to documentation

* Adding skeleton for a Dask runtime implementation
* Lint
* Revert extra newline
* Fix error message
* Fix typo

* fixed empty rows error in to_pandas
* formatting and code style

* Fixing filter for read_csv args
* Remove hardcode
* Fix bug
* Lint

* Update using_modin.rst. The blog post link leads to a non-existent page. I suggested the post that was probably mentioned.
* Update using_modin.rst

* Fix mode when axis=1 and remove a reindex
* formatting

* Fix rfloordiv error
* formatting

* Scenario that caused the bug:
  * `query` or a similar filter operation run
  * Some partitions completely removed from the dataframe
  * During `get_indices`, the filter optimization on BlockPartitions caused the empty partitions to not count toward the index computation
* Fix: Use _partitions_cache, which is not modified when accessed.
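The scenario above can be pictured with a toy model. The class below is a hypothetical stand-in, not Modin's `BlockPartitions`: it assumes that reading a `partitions` property applies a filter that drops empty blocks, while `_partitions_cache` keeps every block, including the ones a `query` emptied. Only the `_partitions_cache` name comes from the commit message; everything else is invented for illustration.

```python
class ToyBlockPartitions:
    """Toy model of row blocks; names other than `_partitions_cache` are hypothetical."""

    def __init__(self, row_blocks):
        # The cache keeps every block, including empty ones.
        self._partitions_cache = [list(block) for block in row_blocks]

    @property
    def partitions(self):
        # Assumed "filter optimization": empty blocks are dropped on access.
        return [block for block in self._partitions_cache if block]

    def lengths_via_property(self):
        # Empty blocks are invisible here, so block positions shift.
        return [len(block) for block in self.partitions]

    def lengths_via_cache(self):
        # Every block is counted, so positions stay aligned.
        return [len(block) for block in self._partitions_cache]


# The middle block lost all of its rows to a filter such as `query`.
blocks = ToyBlockPartitions([["r0", "r1"], [], ["r2", "r3", "r4"]])

shifted = blocks.lengths_via_property()
aligned = blocks.lengths_via_cache()
```

In this model, any index computation that pairs block positions with lengths sees two blocks through the property but three through the cache, which is the kind of mismatch the fix avoids.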
* Fixed sort_index
* Fixed formatting
* Fixed when ascending is None
* Update modin/data_management/query_compiler/pandas_query_compiler.py

Co-Authored-By: williamma12 <12377941+williamma12@users.noreply.github.com>

* Changed insert to keep old index explicitly

* Removed fixtures for checking equality Parameterized the different dataframes into a parameterize mark added comments and used parameterization for test_functions Moved test_empty_df to be alphabetical Grouped inter df operations Updated equals functions to one function now in utils.py for test_dataframe.py Alphabetized the tests Continued alphabetized the tests Changed quantiles and query to new testing structure Changed some fillna tests, median, mean, and max to use new structure Updated tests upto test_get Updated test_get Updated test_get_dummies and equality checker updated bool arguments parameterization' updated testing of aggregate functions and created name_contains for cleaner code Updated all and any tests Updated count test Updated cumulative min/max/sum/prod functions Added more apply tests updated test_diff class of functions updated a lot of functions Updated query and rank Updated more tests Updated test_sort_index Updated more tests and give equals a possible error 0.01 Finished test_dataframe.py Updated tests Fixed test_sum in groupby and changed all names of ray_ to modin_ changed ray_df to modin_df Rebased and updated test after clip was implemented Ran black on test_dataframe and test_groupby Fixed ray to modin errors updated test for dropna Fix test_dropna_subset Fix test_groupby errors from rebasing Resolve flake errors Fixed remaining flake8 errors Fixed more remaining flake8 errors Fixed upto but not including clip functions Fixed parts of clip and all of count Fixed tests upto but not including mean Fixed test_mean Removed numeric_only requirement for count Changed abs to use _validate_dtypes function cleaned up count Fixed median and reorganized full_axis_reduce functions Fixed numeric_only = None tests. 
Fixed numeric functions Added larger test data Fixed mode test Fixed test_prod wanting too much TypeErrors but still broken fixing test_clip Added None as an option to arguments Added none argument testing of axis for all and any some fixes for empty df and series testing Fixed sum test and removed print statement for sample Removed default value for min_count in sum Fixed sort_values Forgot to remove debugging print statements Fixed sort_index update inplace and more rigorous testing of nans fix plot tests formatting fixed tests for min and max fix Updated some tests Refactored tests to better test for errors updated test_mean fixed test_equals and made transpose more comprehensive rebased and reset groupby to master Update inter_df_math_helper updated remaining inter df operations Updated inter_df_operations tests Updated error testing Updated testing for inter_df_operations Update test_rank Changed parameterization to create dataframe within tests insteadd fixed test_copy Fixed tests for all and any fixed overeager find and replace misspelled modin Updated jenkins Updated test___bool__ fixed test_agg fixed test_aggregate Fixed tests all and any Updated equality testint Undo travis changes Skip test_apply for now until we are able to properly test UDFs Skip applymap for now execpt for numeric ones Testing applymap for numeric dataframes Update test for at Updated clip tests Updated clip tests Update cumsum test Updated test duplicate, empty_df, fillna_dataframe Fix jenkins Update jenkins Update jenkins Update fillna test functions Skip tests that otherwise default to pandas Updated test insert skip is_copy because defaulting to pandas Test transform only on numeric functions Update travis to run on python backend Skip rename nocopy update jenkins Reset dataframe and querycompiler to master Fixed merge conflicts in test_dataframe Updated test clip and squeeze Update tests to take out axis and integer none testing Updated testing apply_numeric Minor changes Update 
test_inplace" Fixed typo in dropna_inplace * Update test_merge and the all integer test dataframe * Update travis build syntax error * Update install-dependencies.sh * Update how series results are tested for equality * Update test_eval_df_use_case syntax error * Update new tests to new test suite format * Removing the running of mixed dtype dataframes * Update equality testing * Fix set_axis assertion error * Fix set_index assertion error * Update testing dataframes to be positive * Fixed index checking in df_equals * Lint * Expand tests and fix bugs * Fix issue with travis * Fixing nan failure * Update pytest * Run only test_dataframe.py in parallel on travis * Run only test_dataframe.py in parallel on jenkins * Revert version change

* Checking for proper types (Series and SeriesView) on insert
* Lint

…ssed in (#363)
* Filtering default exclude values based on the include passed in
* Fix error checking

* Fixing partitioning issue when doing a reindex/concat
* Adds a new parameter to apply for axis_partitions
* Allows partitioning to be either recomputed and rebalanced or maintained between runs
* Adding parameter to Python and Dask backends
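The choice between recomputing a balanced split and maintaining the previous partition sizes can be sketched in plain Python. The function and parameter names below (`split_rows`, `lengths`) are hypothetical and chosen for illustration; the commit only states that such a parameter was added for axis partitions.

```python
def split_rows(rows, num_splits, lengths=None):
    """Split `rows` into chunks.

    lengths=None  -> recompute the split so chunks are roughly equal (rebalanced).
    lengths=[...] -> keep the given chunk sizes (partitioning maintained).
    """
    if lengths is not None:
        if sum(lengths) != len(rows):
            raise ValueError("lengths must add up to the number of rows")
        chunks, start = [], 0
        for size in lengths:
            chunks.append(rows[start:start + size])
            start += size
        return chunks
    # Ceiling division gives the maximum chunk size for a balanced split.
    size = max(1, -(-len(rows) // num_splits))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


rows = list(range(10))

# Rebalanced: sizes are recomputed from the row count.
rebalanced = split_rows(rows, 3)

# Maintained: the sizes from an earlier run are reused as-is.
maintained = split_rows(rows, 3, lengths=[1, 1, 8])
```

Maintaining the sizes keeps the result aligned with other data that was partitioned the same way, while rebalancing evens out skewed partitions.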

#365)
* This allows the partition to return a valid DataFrame and not an Exception
* In the future, additional post-processing may be needed to throw the pandas error that gets thrown if no data of that type exists.

* Adding a condition for selective apply operations that will simply return an empty 2D numpy array for partitions if there are no partitions.
* This is added to both `apply_func_to_select_indices` and `apply_func_to_select_indices_along_full_axis`

Adjust logo size and text size.

* Adding groupby columns and index name when necessary
* Lint
* Making sure we only change columns when we group by index values

* Making drop faster for drop operations
* Drops the same way that pandas Index gets dropped with duplicates
* Also clean up some inefficient and duplicated `dropna` code
* Lint
* Reverting dropna cleanup to fix in future PR

* Resolves #388
* For now, we have a class that overrides all function calls and converts the arguments to pandas.
* The class created is a metaclass with the sole purpose of converting the arguments to pandas compatible args.
* This approach allows us to avoid copying a bunch of code and hard-coding the entire module.
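The metaclass approach described in this commit can be illustrated with a small, self-contained sketch. It is not the code from the commit: the standard-library `statistics` module stands in for the pandas module being wrapped, and `ToyFrame`, `to_native`, `ForwardingMeta`, and `stats` are hypothetical names. The point is only that one metaclass can forward every function lookup and convert arguments on the way through, so no function has to be copied by hand.

```python
import statistics


class ToyFrame:
    """Hypothetical stand-in for a distributed frame object."""

    def __init__(self, values):
        self._values = list(values)

    def to_native(self):
        # Hypothetical hook; plays the role of "convert to pandas".
        return list(self._values)


def _convert(arg):
    # Convert wrapped objects; pass everything else through unchanged.
    return arg.to_native() if isinstance(arg, ToyFrame) else arg


class ForwardingMeta(type):
    """Metaclass whose classes forward unknown attributes to `_target`."""

    def __getattr__(cls, name):
        # Only called when normal class attribute lookup fails.
        target = getattr(cls._target, name)
        if not callable(target):
            return target

        def wrapper(*args, **kwargs):
            converted_args = [_convert(a) for a in args]
            converted_kwargs = {k: _convert(v) for k, v in kwargs.items()}
            return target(*converted_args, **converted_kwargs)

        return wrapper


class stats(metaclass=ForwardingMeta):
    # Assumption: `statistics` stands in for the wrapped pandas module.
    _target = statistics


# `mean` is never defined on `stats`; the metaclass forwards the call.
result = stats.mean(ToyFrame([1, 2, 3, 6]))
```

Because the forwarding happens in `__getattr__` on the metaclass, functions added to the wrapped module later are picked up without any change to the wrapper.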

* Resolves #145
* Performs a `reindex` on the `dtypes_cache` if there is already something there.
* Adds a `dtype` property to `PandasQueryCompilerView` objects to be handled differently than parent.

* Resolves #394
* Aliasing isnull to isna, which was already implemented.
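In Python, aliasing one method to another is a single class-level assignment. A minimal sketch with a hypothetical `ToyFrame` class (not Modin's DataFrame) shows the pattern:

```python
class ToyFrame:
    """Hypothetical container used only to show method aliasing."""

    def __init__(self, values):
        self._values = list(values)

    def isna(self):
        # Toy definition of "missing": the value is None.
        return [value is None for value in self._values]

    # Alias: both names refer to the same function object.
    isnull = isna


mask = ToyFrame([1, None, 3]).isnull()
```

Since both names point at the same function, any later fix to `isna` applies to `isnull` automatically.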
What do these changes do?
Related issue number
git diff upstream/master -u -- "*.py" | flake8 --diff
black --check modin/