create a timestamp column in a pandas dataframe #1

SophMC · 2016-05-02T22:00:36Z

@aaren
I am reading in a txt file with the format
year month day hour ws
and want to create an extra column which takes the first four columns and makes a timestamp out of them and makes a new column.
This is proving difficult for several reasons:

The hour column is in hundreds, i.e 600, 1200, 1800 and if I use read_csv to try and make a datetime object, it can do it with year, month and day, but not + hour.

date_spec = {'date_time': [0,1,2]} this works in wind below.
date_spec = {'date_time': [0,1,2,3]} this doesn't. It creates a plain object for date_time, and we want a datetime object.

wind = pd.read_csv('/home/sophie/projects/windspeed/data/61401BirMoghrein_allwinds.txt', sep=" ", names=column_names, parse_dates=date_spec, keep_date_col=True)

Trying to use datetime() to create a timestamp with year, month, day, hour doesn't work either, as it appears it doesn't take lists so:

d = datetime(2013, 12, 22, 11, 30, 59); print d works
2013-12-22 11:30:59

years = [2012, 2013, 2014]
months=[5, 4, 3]
days=[6, 5, 4]
e = datetime(years, months, days); print e doesn't work

As far as I can see you can only input for value at a time - it doesn't take lists.

I've also tried using to_datetime but I've no idea how to get that to work in pandas as it only takes strings and I can't seem to input the data as strings whatever I do.
These two following suggestions are in 0.18.1 pandas - so I just updated, but conda update pandas only took me up to 0.18.0 so they still don't work.
https://pandas-docs.github.io/pandas-docs-travis/timeseries.html

pd.to_datetime(df[['year', 'month', 'day']]) doesn't work, substituting df=wind

This other suggestion from the documentation (I copied exactly)
df = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3],'day': [4, 5],'hour': [2, 3]})
pd.to_datetime(df) doesn't work

Maybe I can update to 0.18.1 some other way?
Any ideas most welcome. I'll continue the fight tomorrow.

The text was updated successfully, but these errors were encountered:

SophMC · 2016-05-03T11:39:29Z

Solution!
#specify the columns you want to group together. Can't include hour at this point as it is not in the right format.
date_spec = {'date_time': [0,1,2]}

#Read and parse date_time simultaneously.
#When you use keep_dat_col it keeps them as objects, not as the dtype you read them in as.
wind = pd.read_csv(datafile, sep=" ", names=column_names, parse_dates=date_spec, keep_date_col=True, index_col=False )

#Dealing with hour - going from 600, 1200 etc to 6,12, 18
wind["hour"]=(wind["hour"]/100).astype(int)

#combining date_time with hour, which is now in the correct format.
wind['date_time'] = pd.to_datetime(wind.date_time) + wind.hour.astype('timedelta64[h]')

print wind.date_time[0:5]

0 1984-03-01 06:00:00
1 1984-03-01 12:00:00
2 1984-03-01 18:00:00
3 1984-03-02 06:00:00
4 1984-03-02 12:00:00
Name: date_time, dtype: datetime64[ns]

The light bulb moment came from discovering the line wind.hour.astype('timedelta64[h]')

SophMC self-assigned this May 3, 2016

SophMC closed this as completed May 4, 2016

SophMC added a commit that referenced this issue Jul 25, 2016

Day 72 Naive Bayes and Udacity NLP #1

4e17461

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

create a timestamp column in a pandas dataframe #1

create a timestamp column in a pandas dataframe #1

SophMC commented May 2, 2016 •

edited

Loading

SophMC commented May 3, 2016 •

edited

Loading

create a timestamp column in a pandas dataframe #1

create a timestamp column in a pandas dataframe #1

Comments

SophMC commented May 2, 2016 • edited Loading

SophMC commented May 3, 2016 • edited Loading

SophMC commented May 2, 2016 •

edited

Loading

SophMC commented May 3, 2016 •

edited

Loading