io.read_frame to optionally convert df.index to pd.datetime #110

Open
jabrennem opened this issue Feb 10, 2019 · 11 comments

Comments

@jabrennem

I am enjoying django-pandas. When I define a query set and use qs.to_dataframe(index='custom_int_index_col'), my index column is always converted to a datetime object.

I could be missing some greater context, but could we wrap the line below, located in io.read_frame, in an if statement controlled by a new boolean argument (or something similar), and have QuerySet.to_dataframe default it to False? That way, the index can be set to an integer column without being converted to datetime.

if {new_boolean}:
    df.index = pd.to_datetime(df.index, errors="ignore")
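
A minimal sketch of the idea, assuming a standalone helper rather than the actual io.read_frame code. The names frame_with_index and datetime_index are hypothetical and are not part of django-pandas. A try/except stands in for errors="ignore", which recent pandas releases deprecate.

```python
import pandas as pd


def frame_with_index(df, index_col, datetime_index=False):
    """Set index_col as the index; convert it to datetime only on request.

    Hypothetical helper for illustration, not django-pandas API.
    """
    df = df.set_index(index_col)
    if datetime_index:
        try:
            # Integers are interpreted as nanoseconds since the epoch by default.
            df.index = pd.to_datetime(df.index)
        except (ValueError, TypeError):
            # Leave the index untouched if it cannot be parsed as datetimes.
            pass
    return df


frame = pd.DataFrame({"id": [10, 20, 30], "value": [1.5, 2.5, 3.5]})
as_int = frame_with_index(frame, "id")                      # index stays integer
as_dt = frame_with_index(frame, "id", datetime_index=True)  # index becomes DatetimeIndex
```

With the flag defaulting to False, an integer index column passes through unchanged, and callers who want a time series index opt in explicitly.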

@vtoupet
Contributor

vtoupet commented Feb 27, 2019

I totally agree. I've created a PR for this (#111). Would it fit your needs?

@jabrennem
Author

I looked at the PR. It's almost perfect for what I need. I left a comment on django_pandas/managers.py. Thank you so much!!

@vtoupet
Contributor

vtoupet commented Mar 22, 2019

@chrisdev Are you ok with this PR? If yes, could you accept it and merge?

Thanks

@jabrennem
Author

I am ok with the PR. I don't have write access to the repository to merge it.

@chrisdev
Owner

@jabrennem the PR is not passing Travis. Was waiting on you to address failures?

@jabrennem
Author

@chrisdev sorry, I didn't see that he included your name. I just opened the issue to expand on the use case.

@vtoupet
Contributor

vtoupet commented Mar 28, 2019

> @jabrennem the PR is not passing Travis. Was waiting on you to address failures?

@chrisdev Sorry I did not see that. I've just fixed the tests. It is passing now.

@jabrennem
Author

@chrisdev I see it is in master. When will this feature be available via pip?

@MikeSandford

MikeSandford commented Apr 12, 2019

Hi all, just getting started with this project to try and do a bit of DS for a side project. Very happy that it exists as it's so much cleaner than doing all the manipulations yourself. But I too have run into this problem and been confused! I've got a queryset that I want (or at least think I want) to index based on the object's ID but they're all coming out as datetimes in seconds after the epoch in 1970. Took me a while to figure out what was going on at first because I was also using dataframe.head wrong (no parens) but once I figured that out it became obvious.

A thought I had regarding a hopefully very general solution is that you could allow people to pass in the datatype for the index and default it to datetime.

my_frame = my_queryset.to_dataframe(index="my_index", index_dtype=int)

and then the code inside would need to use the Pandas astype in order to make that work, right?
http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.astype.html#pandas.Series.astype

df.index = df.index.astype(index_dtype)

Does that seem right? I'm willing to make a PR, but since I only just learned this project exists, I don't know whether this would be a welcome addition. Thanks!

EDIT: I just did this with the copy on my local machine and it's working fine; happy to make a PR if others think it would be useful.
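
A rough sketch of the index_dtype idea, assuming a standalone helper applied to a frame whose index is already set. The name apply_index_dtype is hypothetical; only Index.astype is real pandas API. Unlike the proposal above, this sketch leaves the index untouched when no dtype is given instead of defaulting to datetime.

```python
import pandas as pd


def apply_index_dtype(df, index_dtype=None):
    """Cast the index to index_dtype when one is given; otherwise return df as-is.

    Hypothetical helper for illustration, not django-pandas API.
    """
    if index_dtype is not None:
        df = df.copy()  # avoid mutating the caller's frame
        df.index = df.index.astype(index_dtype)
    return df


frame = pd.DataFrame(
    {"value": [1.5, 2.5]},
    index=pd.Index(["7", "8"], name="my_index"),
)
as_int = apply_index_dtype(frame, index_dtype=int)  # string labels cast to integers
unchanged = apply_index_dtype(frame)                # no dtype given, index left alone
```

Passing a dtype through to astype keeps the option general: the same argument could carry int, str, or "datetime64[ns]" without a separate flag for each case.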

@chrisdev
Owner

chrisdev commented Apr 12, 2019

@MikeSandford I just released @vtoupet's PR to PyPI (sorry for the delay @vtoupet!), which deals specifically with datetime indexes. I think your approach is cool, but it would be nice if we had sensible defaults. I'm hoping that in most cases the user would not have to specify the index type. But of course, when they need to do so, it's nice to give them the option.

@MikeSandford

@chrisdev agreed that sensible defaults are necessary! Here is what I did:

def read_frame(qs, fieldnames=(), index_col=None, coerce_float=False,
               verbose=True, index_dtype=None):

and then further down:

if index_dtype is None:
    # this is where fancy "guess the datatype" heuristics would go, so we can do whatever
    # we can automagically if the user doesn't specify "I want this datatype"
    # for now we just assume the default behavior of datetime
    index_dtype = datetime
df.index = df.index.astype(index_dtype)

I also had to add "from datetime import datetime" at the top of the file.
