Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create a timestamp column in a pandas dataframe #1

Closed
SophMC opened this issue May 2, 2016 · 1 comment
Closed

create a timestamp column in a pandas dataframe #1

SophMC opened this issue May 2, 2016 · 1 comment
Assignees

Comments

@SophMC
Copy link
Owner

SophMC commented May 2, 2016

@aaren
I am reading in a txt file with the format
year month day hour ws
and want to create an extra column which takes the first four columns and makes a timestamp out of them and makes a new column.
This is proving difficult for several reasons:

  • The hour column is in hundreds, i.e 600, 1200, 1800 and if I use read_csv to try and make a datetime object, it can do it with year, month and day, but not + hour.

date_spec = {'date_time': [0,1,2]} this works in wind below.
date_spec = {'date_time': [0,1,2,3]} this doesn't. It creates a plain object for date_time, and we want a datetime object.

wind = pd.read_csv('/home/sophie/projects/windspeed/data/61401BirMoghrein_allwinds.txt', sep=" ", names=column_names, parse_dates=date_spec, keep_date_col=True)

  • Trying to use datetime() to create a timestamp with year, month, day, hour doesn't work either, as it appears it doesn't take lists so:

d = datetime(2013, 12, 22, 11, 30, 59); print d works
2013-12-22 11:30:59

years = [2012, 2013, 2014]
months=[5, 4, 3]
days=[6, 5, 4]
e = datetime(years, months, days); print e doesn't work

As far as I can see you can only input for value at a time - it doesn't take lists.

  • I've also tried using to_datetime but I've no idea how to get that to work in pandas as it only takes strings and I can't seem to input the data as strings whatever I do.
  • These two following suggestions are in 0.18.1 pandas - so I just updated, but conda update pandas only took me up to 0.18.0 so they still don't work.
    https://pandas-docs.github.io/pandas-docs-travis/timeseries.html

pd.to_datetime(df[['year', 'month', 'day']]) doesn't work, substituting df=wind

This other suggestion from the documentation (I copied exactly)
df = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3],'day': [4, 5],'hour': [2, 3]})
pd.to_datetime(df) doesn't work

Maybe I can update to 0.18.1 some other way?
Any ideas most welcome. I'll continue the fight tomorrow.

@SophMC SophMC self-assigned this May 3, 2016
@SophMC
Copy link
Owner Author

SophMC commented May 3, 2016

Solution!
#specify the columns you want to group together. Can't include hour at this point as it is not in the right format.
date_spec = {'date_time': [0,1,2]}

#Read and parse date_time simultaneously.
#When you use keep_dat_col it keeps them as objects, not as the dtype you read them in as.
wind = pd.read_csv(datafile, sep=" ", names=column_names, parse_dates=date_spec, keep_date_col=True, index_col=False )

#Dealing with hour - going from 600, 1200 etc to 6,12, 18
wind["hour"]=(wind["hour"]/100).astype(int)

#combining date_time with hour, which is now in the correct format.
wind['date_time'] = pd.to_datetime(wind.date_time) + wind.hour.astype('timedelta64[h]')

print wind.date_time[0:5]

0 1984-03-01 06:00:00
1 1984-03-01 12:00:00
2 1984-03-01 18:00:00
3 1984-03-02 06:00:00
4 1984-03-02 12:00:00
Name: date_time, dtype: datetime64[ns]

The light bulb moment came from discovering the line wind.hour.astype('timedelta64[h]')

@SophMC SophMC closed this as completed May 4, 2016
SophMC added a commit that referenced this issue Jul 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant