DataFrame.dtypes #52

Closed
AbdealiLoKo opened this issue Mar 30, 2019 · 3 comments · Fixed by #148

@AbdealiLoKo (Contributor)

Raising this to discuss how .dtypes should be handled.
Current implementations (compared in the sketch after this list):

  • Pandas: Returns a pandas Series with the field names as the index and the dtypes as the values
  • Spark: Returns a list of (fieldname, dtype) tuples
  • Koalas: Same as Spark
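
A minimal sketch of the return-type difference, assuming a local Spark session; the example data is illustrative, not from the issue:

```python
import pandas as pd
from pyspark.sql import SparkSession

pdf = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(pdf.dtypes)
# a     int64
# b    object
# dtype: object        <- a pandas Series indexed by field name

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(pdf)
print(sdf.dtypes)
# [('a', 'bigint'), ('b', 'string')]   <- a list of (fieldname, dtype) tuples
```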

Other differences (illustrated in the sketch after this list):

  • Whether to show indexes in .dtypes:
    • Pandas: Shows only columns, not indexes
    • Spark: Shows all columns; there is no index concept
    • Koalas: Shows columns and indexes
  • Type values:
    • Pandas: Shows strings, arrays, maps, etc. as object
    • Spark: Shows strings, arrays, maps, etc. in the appropriate simpleString notation
    • Koalas: Same as Spark
  • Dtype notation:
    • Pandas: Shows the numpy type for the dtype
    • Spark: Shows simpleString notation
    • Koalas: Same as Spark
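
A sketch of the notation gap for complex types, using PySpark's DDL-string schema support (the column names here are made up for illustration). Spark's simpleString keeps the structure, while pandas collapses it to object:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(
    [([1, 2], {"k": "v"})],
    schema="arr array<int>, m map<string,string>",
)
print(sdf.dtypes)
# [('arr', 'array<int>'), ('m', 'map<string,string>')]

pdf = sdf.toPandas()
print(pdf.dtypes)
# arr    object
# m      object
# dtype: object   <- numpy has no array/map dtype, so both collapse to object
```
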
@rxin (Contributor) commented Apr 18, 2019

It sort of depends on what users use dtypes for. If they use it to programmatically get the column names and data types, then it'd make sense to match Pandas as closely as possible. If they just use it to inspect the dataset's schema, then the existing one should be fine.

@AbdealiLoKo (Contributor, Author)

When you say inspect, do you mean manually inspecting it?
Wouldn't manual inspection be OK with the pandas behaviour too?

@thunterdb (Contributor)

Unless there are objections, I think that the behaviour of dtypes should be to return the same as pandas. Spark users will still have .schema if they want the underlying Spark types.
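
For reference, a sketch of that split using the standard PySpark API that Koalas wraps: .dtypes would follow pandas, while the Spark types stay reachable through .schema (example data is assumed, not from the issue):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([(1, "x")], schema="a long, b string")

# Underlying Spark types, independent of whatever .dtypes returns
print(sdf.schema)            # a StructType listing each field's Spark type
print(sdf.schema.simpleString())
# struct<a:bigint,b:string>
```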
