This repository was archived by the owner on Feb 2, 2024. It is now read-only.

PokhodenkoSA (Contributor) commented Apr 24, 2020

In this PR:

  • Added support for the pandas types CategoricalDtype and Categorical, and for Series(dtype='category').
  • Added support for categorical columns in read_csv(). Eliminating astype() for categorical columns is deferred to future PRs.
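
For reference, the plain (uncompiled) pandas behaviour that these new SDC types model is sketched below: an ordered CategoricalDtype, a Categorical built from it, a Series with dtype='category', and read_csv() with a categorical column dtype. This is standard pandas usage only; which of these calls SDC compiles, and with what limitations, is covered by the docs added in this PR and is not assumed here. The column name "level" and the category values are illustrative.

```python
import io

import pandas as pd
from pandas.api.types import CategoricalDtype

# An ordered categorical dtype with a fixed set of categories (illustrative values).
dtype = CategoricalDtype(categories=["low", "mid", "high"], ordered=True)

# A Categorical stores integer codes that index into the categories.
cat = pd.Categorical(["mid", "low", "high", "mid"], dtype=dtype)
print(cat.codes.tolist())  # [1, 0, 2, 1]

# Series(dtype='category') infers the categories from the data.
s = pd.Series(["a", "b", "a"], dtype="category")
print(s.cat.categories.tolist())  # ['a', 'b']

# read_csv() can parse a column directly into the given categorical dtype.
csv = io.StringIO("level\nlow\nhigh\nmid\n")
df = pd.read_csv(csv, dtype={"level": dtype})
print(df["level"].cat.codes.tolist())  # [0, 2, 1]
```

In every case the categories are fixed when the dtype is created, which is why a compiler needs to know them at compile time.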

Also introduced:

  • Added the TuplifyArgs rewrite (sdc/datatypes/common/rewriteutils.py), which replaces arguments passed as lists with the same data represented as tuples. This lets the argument types be inferred at compile time.
    The rewrite is reusable: the categorical types rely on it heavily to infer categories at compile time.
  • Improved RewriteReadCsv using the same approach as TuplifyArgs. TuplifyArgs could be extended with map support and then reused in RewriteReadCsv, but this is deferred to future work.
  • Added sdc.types (sdc/sdc/types.py), a collection of SDC types analogous to numba.types.
  • SDC types such as CategoricalDtype and Categorical implement repr() so that it returns a string which eval() can use to recreate the object. This simplifies objmode usage: objmode requires the user to provide a string that eval() turns into a Numba type, and because objmode evaluates that string with numba.types in scope, numba.types must also be extended with the SDC types for this approach to work.
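
The idea behind TuplifyArgs can be illustrated at the Python AST level. SDC performs its rewrite on Numba IR, not on source code, so the sketch below is only an analogy: the class TuplifyListArgs and the helper tuplify_source are hypothetical names. The motivation is that in Numba a tuple's length and element types are part of its type, whereas a list's length is not.

```python
import ast


class TuplifyListArgs(ast.NodeTransformer):
    """Hypothetical AST analogue of TuplifyArgs: in calls to the selected
    functions, replace list-literal arguments with tuple literals."""

    def __init__(self, func_names):
        self.func_names = set(func_names)

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Attribute):
            name = node.func.attr  # e.g. pd.CategoricalDtype
        elif isinstance(node.func, ast.Name):
            name = node.func.id  # e.g. CategoricalDtype
        else:
            return node
        if name in self.func_names:
            node.args = [self._tuplify(arg) for arg in node.args]
            for kw in node.keywords:
                kw.value = self._tuplify(kw.value)
        return node

    @staticmethod
    def _tuplify(arg):
        # Same elements, but as a tuple literal instead of a list literal.
        if isinstance(arg, ast.List):
            return ast.copy_location(ast.Tuple(elts=arg.elts, ctx=ast.Load()), arg)
        return arg


def tuplify_source(src, func_names):
    tree = TuplifyListArgs(func_names).visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


print(tuplify_source("dtype = pd.CategoricalDtype(categories=['a', 'b', 'c'])",
                     {"CategoricalDtype"}))
# dtype = pd.CategoricalDtype(categories=('a', 'b', 'c'))
print(tuplify_source("x = sorted(['b', 'a'])", {"CategoricalDtype"}))
# x = sorted(['b', 'a'])  (calls not in func_names are left unchanged)
```

Restricting the rewrite to selected callees keeps ordinary list arguments, which may legitimately be mutated, untouched.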
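
The repr()/eval() round trip described in the last bullet can be sketched in plain Python. The class CategoricalDtypeType and the object fake_numba_types are hypothetical stand-ins: the real SDC types derive from Numba types, and Numba's objmode evaluates type strings against numba.types itself. The sketch shows why the SDC type must be registered in that namespace before eval() can resolve its repr() string.

```python
import types


class CategoricalDtypeType:
    """Hypothetical stand-in for an SDC type such as CategoricalDtype."""

    def __init__(self, categories, ordered=False):
        self.categories = tuple(categories)
        self.ordered = ordered

    def __repr__(self):
        # Return source text that eval() can turn back into an equal object.
        return (f"CategoricalDtypeType(categories={self.categories!r}, "
                f"ordered={self.ordered!r})")

    def __eq__(self, other):
        return (isinstance(other, CategoricalDtypeType)
                and self.categories == other.categories
                and self.ordered == other.ordered)

    def __hash__(self):
        return hash((self.categories, self.ordered))


# Stand-in for numba.types; "extending" it means adding the SDC type as an attribute.
fake_numba_types = types.SimpleNamespace()
fake_numba_types.CategoricalDtypeType = CategoricalDtypeType

original = CategoricalDtypeType(["a", "b"], ordered=True)
text = repr(original)
print(text)  # CategoricalDtypeType(categories=('a', 'b'), ordered=True)

# Evaluate the string with only the (fake) types namespace in scope.
restored = eval(text, {"__builtins__": {}}, vars(fake_numba_types))
print(restored == original)  # True
```

Without the registration step the same eval() call raises NameError, which is why the bullet notes that numba.types itself has to be extended.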

I have rearranged the commits to make them easier to review.

PokhodenkoSA (Contributor, Author) commented May 15, 2020

Addressed review remarks:

  • Moved imports to one place, sdc.__init__
  • Added documentation describing limitations
  • Added a small example of overloads for categorical types

AlexanderKalistratov merged commit e061965 into IntelPython:master May 26, 2020