DataFrame.dtypes #52

Closed
AbdealiLoKo opened this issue Mar 30, 2019 · 3 comments · Fixed by #148

@AbdealiLoKo (Contributor)

Raising this to discuss how .dtypes should be handled.
Current implementations (compared in the sketch after this list):

  • Pandas: Returns a pandas Series with the field names as the index and the dtypes as the values
  • Spark: Returns a list of (fieldname, dtype) tuples
  • Koalas: Same as Spark
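
A minimal sketch of the return-type difference, assuming a local Spark session; the example data is illustrative, not from the issue:

```python
import pandas as pd
from pyspark.sql import SparkSession

pdf = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(pdf.dtypes)
# a     int64
# b    object
# dtype: object        <- a pandas Series indexed by field name

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(pdf)
print(sdf.dtypes)
# [('a', 'bigint'), ('b', 'string')]   <- a list of (fieldname, dtype) tuples
```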

Other differences (illustrated in the sketch after this list):

  • Whether to show indexes in .dtypes:
    • Pandas: Shows only columns, not indexes
    • Spark: Shows all columns; there is no index concept
    • Koalas: Shows columns and indexes
  • Type values:
    • Pandas: Shows strings, arrays, maps, etc. as object
    • Spark: Shows strings, arrays, maps, etc. in the appropriate simpleString notation
    • Koalas: Same as Spark
  • Dtype notation:
    • Pandas: Shows the numpy type for the dtype
    • Spark: Shows simpleString notation
    • Koalas: Same as Spark
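
A sketch of the notation gap for complex types, using PySpark's DDL-string schema support (the column names here are made up for illustration). Spark's simpleString keeps the structure, while pandas collapses it to object:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(
    [([1, 2], {"k": "v"})],
    schema="arr array<int>, m map<string,string>",
)
print(sdf.dtypes)
# [('arr', 'array<int>'), ('m', 'map<string,string>')]

pdf = sdf.toPandas()
print(pdf.dtypes)
# arr    object
# m      object
# dtype: object   <- numpy has no array/map dtype, so both collapse to object
```
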
@rxin (Contributor) commented Apr 18, 2019

It sort of depends on what users use dtypes for. If they use it to programmatically get the column names and data types, then it'd make sense to match Pandas as closely as possible. If they just use it to inspect the dataset's schema, then the existing one should be fine.

@AbdealiLoKo (Contributor, Author)

When you say inspect, do you mean manually inspecting it?
Wouldn't manual inspection be OK with the pandas behaviour too?

@thunterdb (Contributor)

Unless there are objections, I think that the behaviour of dtypes should be to return the same as pandas. Spark users will still have .schema if they want the underlying Spark types.
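
For reference, a sketch of that split using the standard PySpark API that Koalas wraps: .dtypes would follow pandas, while the Spark types stay reachable through .schema (example data is assumed, not from the issue):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([(1, "x")], schema="a long, b string")

# Underlying Spark types, independent of whatever .dtypes returns
print(sdf.schema)            # a StructType listing each field's Spark type
print(sdf.schema.simpleString())
# struct<a:bigint,b:string>
```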
