This issue was moved to a discussion.

Cannot convert strings to category/factor/enum #4054

Closed
david-cortes opened this issue Jul 7, 2022 · 4 comments
@david-cortes
Contributor

I want query results to encode categorical columns as the pandas 'category' dtype or the R 'factor' type for better memory efficiency. Casting a string column (Python 'object' or R 'character' type) to an enum inside a query does not produce the desired result:

import pandas as pd
df = pd.DataFrame({"col1" : ["a","b","c"]})
import duckdb
duckdb.query("select col1::enum from df").to_df()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-81ef41169f6a> in <module>
      2 df = pd.DataFrame({"col1" : ["a","b","c"]})
      3 import duckdb
----> 4 duckdb.query("select col1::enum from df").to_df()

RuntimeError: Catalog Error: Type with name enum does not exist!
Did you mean "null"?

I would have to manually create an enum type listing every value and cast to it, which is extremely inconvenient and pollutes my DuckDB namespace:

import duckdb
duckdb.query("create type my_col_type as enum('a', 'b', 'c');")
import pandas as pd
df = pd.DataFrame({"col1" : ["a","b","c"]})
duckdb.query("select col1::my_col_type from df").to_df()
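
Until a direct cast is available, one client-side fallback is to let the query return plain strings and convert them in pandas afterwards. This is only a sketch: `strings_to_category` is a hypothetical helper, not part of DuckDB or pandas, and the small `df` below stands in for the frame returned by `.to_df()`. It saves memory only after the result has been materialized, so it does not replace an enum cast inside the query.

```python
import pandas as pd


def strings_to_category(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with string columns stored as 'category'."""
    out = frame.copy()
    for name in out.columns:
        # is_string_dtype covers object columns holding strings and the
        # dedicated string dtype; columns that are already categorical are skipped.
        if pd.api.types.is_string_dtype(out[name]):
            out[name] = out[name].astype("category")
    return out


# Stand-in for the frame a query would return via .to_df()
df = pd.DataFrame({"col1": ["a", "b", "c"]})
result = strings_to_category(df)
```

Here `result["col1"]` uses the pandas 'category' dtype, while `df` itself is left unchanged.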

@Alex-Monahan
Contributor

Today, this behavior is expected, but it is on the roadmap to enable the syntax you are interested in!

@github-actions

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Jul 30, 2023
@david-cortes
Contributor Author

Still relevant.

@github-actions github-actions bot removed the stale label Jul 31, 2023
@github-actions

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Oct 30, 2023
@Mause Mause converted this issue into discussion #9509 Oct 30, 2023
