Similar to the pandas select_dtypes function, DataTable should have a function that lets users select columns by Logical Type, named select_logical_types or select_ltypes.
Users can pass in a single Logical Type or a list of Logical Types. Each one can be given either as a string (the class name, e.g. 'NaturalLanguage', or the snake_case type string, e.g. 'natural_language') or as the Logical Type object itself.
In the future, we can look into adding exclude to select_ltypes
df = pd.read_csv(...)
dt = DataTable(df, name='data')

# support single string
dt.select_ltypes('bool')

# support type name as string
dt.select_ltypes('zip_code')

# support list of type_string of Logical Type
dt.select_ltypes(['categorical', 'natural_language'])

# support list of string of class name
dt.select_ltypes(['Categorical', 'NaturalLanguage'])

from data_table.logical import Categorical, NaturalLanguage

# support actual Logical Types objects
dt.select_ltypes(Categorical)

# support actual Logical Types list of objects
dt.select_ltypes([Categorical, NaturalLanguage])
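To make the accepted input forms concrete, here is a minimal sketch of how a single value or a list of strings and classes might be normalized into a set of Logical Type classes. Everything in it is an assumption for illustration: the `LogicalType` base class, the `KNOWN_TYPES` registry, and the `resolve_ltypes` helper are hypothetical stand-ins, not the library's actual API. String matching is case-insensitive, which is the behavior settled on later in this thread.

```python
import re


class LogicalType:
    """Hypothetical stand-in base class for logical types."""

    @classmethod
    def type_string(cls):
        # Derive the snake_case type string from the class name,
        # e.g. NaturalLanguage -> natural_language.
        return re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()


class Categorical(LogicalType):
    pass


class NaturalLanguage(LogicalType):
    pass


class ZipCode(LogicalType):
    pass


# Hypothetical registry of the types this sketch knows about.
KNOWN_TYPES = [Categorical, NaturalLanguage, ZipCode]


def resolve_ltypes(spec):
    """Normalize one item or a list of strings/classes into a set of classes."""
    items = spec if isinstance(spec, (list, tuple, set)) else [spec]

    # Map both the lowercased class name and the type string to the class.
    lookup = {}
    for ltype in KNOWN_TYPES:
        lookup[ltype.__name__.lower()] = ltype
        lookup[ltype.type_string()] = ltype

    resolved = set()
    for item in items:
        if isinstance(item, type) and issubclass(item, LogicalType):
            resolved.add(item)
        elif isinstance(item, str):
            key = item.lower()
            if key not in lookup:
                raise ValueError(f"Unknown logical type string: {item!r}")
            resolved.add(lookup[key])
        else:
            raise TypeError(f"Unsupported logical type specifier: {item!r}")
    return resolved
```

With this helper, `resolve_ltypes('zip_code')`, `resolve_ltypes(['Categorical', 'natural_language'])`, and `resolve_ltypes(Categorical)` all produce sets of classes, so the selection logic only has to deal with one representation.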
Is the selection happening from the original dataframe always or whatever the current state of the DataTable is?
My guess would be we always apply to the DataTable's columns, so if we've already changed some aspects of the logical or semantic types, those changes get propagated.
Can input lists be of mixed types - so something like ['boolean', Categorical]?
Do we want to be case blind in logical type strings?
Just thinking that we do this with primitives in dfs
Should the underlying dataframe also change when we select certain DataTable columns?
I might expect the underlying df to never change, but I haven't seen that explicitly stated anywhere
Do we want any warnings if no columns fall under any of the ltypes specified (empty DataTable) or if all of them apply (no change from the original)?
Should this remove the index and time_index columns even if their ltypes aren't included?
The selection happens on the current state of the DataTable. We would always apply it to the DataColumns on the DataTable.
Yes, the input lists can be mixed types.
Yes, let's be case blind (upper/lowercase). If we are doing that with primitives and DFS, let's do that in this case.
The underlying dataframe should not change. This is a helper function that returns a DataTable based on the Logical Types passed in.
If no columns match the specified logical types, let's return an empty DataTable with an empty dataframe.
If all columns match the specified logical types, return the full DataTable with all columns.
Yes, for now, we can remove the index and time_index if the logical types are not included. We can revisit this behavior in the future. But for now let's be specific.
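The answers above can be sketched as a small, self-contained model of the selection step. This is only an illustration under stated assumptions: `MiniTable` is a hypothetical stand-in for DataTable that stores plain column lists instead of a dataframe, logical types are represented by their type strings, and `whole_number` is an illustrative type name for the index column.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MiniTable:
    """Hypothetical stand-in for DataTable: column data plus one ltype per column."""

    data: Dict[str, list]
    ltypes: Dict[str, str]
    index: Optional[str] = None

    def select_ltypes(self, wanted):
        # Case-insensitive comparison of type strings.
        wanted = {w.lower() for w in wanted}
        keep = [col for col, lt in self.ltypes.items() if lt.lower() in wanted]
        # Build a new table; the original table and its data are not modified.
        return MiniTable(
            data={col: self.data[col] for col in keep},
            ltypes={col: self.ltypes[col] for col in keep},
            # The index is dropped unless its own ltype was selected.
            index=self.index if self.index in keep else None,
        )


dt = MiniTable(
    data={"id": [1, 2], "state": ["MA", "NY"], "review": ["good", "bad"]},
    ltypes={
        "id": "whole_number",  # illustrative type name
        "state": "categorical",
        "review": "natural_language",
    },
    index="id",
)

subset = dt.select_ltypes(["Categorical"])   # only "state"; index removed
nothing = dt.select_ltypes(["zip_code"])     # no match: empty table
everything = dt.select_ltypes(
    ["whole_number", "categorical", "natural_language"]
)                                            # all columns, index kept
```

Returning a new object in every case, including the empty and full results, keeps the behavior uniform: callers never need to check whether they received the original table back.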