## Time-based Data

In this lesson, we'll look at some of the ways pandas can manipulate data based on a time index. But, as with everything we've done so far, the data doesn't always start out in the shape we want.

In [2]:
import pandas as pd
import numpy as np

The next two links point to the data and to a README file that describes the data format. To keep it close to home, the data in the first link is from Durham, NC.

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2018/CRNS0101-05-2018-NC_Durham_11_W.txt

If you click on that link, you'll see a bunch of columns but no headers. The file is laid out in fixed-width columns padded with spaces, not separated by commas or another single-character delimiter. Let's see what the pandas default does with this kind of text:

In [25]:
pd.read_csv(r'./CRNS0101-05-2018-NC_Durham_11_W.txt').head()

,03758 20180101 0005 20171231 1905 2 -79.09 35.97 -4.3 0.0 0 0 -3.8 C 0 22 0 0.265 4.4 1217 0 1.76 0
0,03758 20180101 0010 20171231 1910 2 -79....
1,03758 20180101 0015 20171231 1915 2 -79....
2,03758 20180101 0020 20171231 1920 2 -79....
3,03758 20180101 0025 20171231 1925 2 -79....
4,03758 20180101 0030 20171231 1930 2 -79....


That's not all that useful. Let's see what the README has to say about it.

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/README.txt

If you scroll down to section 5, you'll see where it describes each field in a row of data. The cell below loads the column names from a local file, weather_headers.csv.

In [3]:
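# Squeeze the one-column frame into a Series (read_csv's squeeze=True keyword was removed in pandas 2.0)
headers = pd.read_csv(r'./weather_headers.csv', header=None).squeeze('columns')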

In [4]:
headers

0                  WBANNO
1                UTC_DATE
2                UTC_TIME
3                LST_DATE
4                LST_TIME
5                  CRX_VN
6               LONGITUDE
7                LATITUDE
8         AIR_TEMPERATURE
9           PRECIPITATION
10        SOLAR_RADIATION
11                SR_FLAG
12    SURFACE_TEMPERATURE
13                ST_TYPE
14                ST_FLAG
15      RELATIVE_HUMIDITY
16                RH_FLAG
17        SOIL_MOISTURE_5
18     SOIL_TEMPERATURE_5
19                WETNESS
20               WET_FLAG
21               WIND_1_5
22              WIND_FLAG
Name: 0, dtype: object

In [None]:
# pd.read_csv?

In [5]:
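# r'\s+' splits on runs of whitespace; the nested parse_dates list merges LST_DATE and LST_TIME into one datetime column
noaa_data = pd.read_csv(r'./CRNS0101-05-2018-NC_Durham_11_W.txt', delimiter=r'\s+', header=None, names=headers.values, parse_dates=[['LST_DATE','LST_TIME']])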

In [6]:
noaa_data.head()

,LST_DATE_LST_TIME,WBANNO,UTC_DATE,UTC_TIME,CRX_VN,LONGITUDE,LATITUDE,AIR_TEMPERATURE,PRECIPITATION,SOLAR_RADIATION,...,ST_TYPE,ST_FLAG,RELATIVE_HUMIDITY,RH_FLAG,SOIL_MOISTURE_5,SOIL_TEMPERATURE_5,WETNESS,WET_FLAG,WIND_1_5,WIND_FLAG
0,2017-12-31 19:05:00,3758,20180101,5,2,-79.09,35.97,-4.3,0.0,0,...,C,0,22,0,0.265,4.4,1217,0,1.76,0
1,2017-12-31 19:10:00,3758,20180101,10,2,-79.09,35.97,-4.3,0.0,0,...,C,0,22,0,0.265,4.4,1217,0,1.51,0
2,2017-12-31 19:15:00,3758,20180101,15,2,-79.09,35.97,-4.3,0.0,0,...,C,0,22,0,0.265,4.3,1211,0,1.56,0
3,2017-12-31 19:20:00,3758,20180101,20,2,-79.09,35.97,-4.3,0.0,0,...,C,0,22,0,0.265,4.3,1217,0,2.07,0
4,2017-12-31 19:25:00,3758,20180101,25,2,-79.09,35.97,-4.3,0.0,0,...,C,0,22,0,0.265,4.3,1217,0,1.86,0


In [7]:
noaa_data.set_index('LST_DATE_LST_TIME',inplace=True)

In [8]:
noaa_data.head()

LST_DATE_LST_TIME,WBANNO,UTC_DATE,UTC_TIME,CRX_VN,LONGITUDE,LATITUDE,AIR_TEMPERATURE,PRECIPITATION,SOLAR_RADIATION,SR_FLAG,...,ST_TYPE,ST_FLAG,RELATIVE_HUMIDITY,RH_FLAG,SOIL_MOISTURE_5,SOIL_TEMPERATURE_5,WETNESS,WET_FLAG,WIND_1_5,WIND_FLAG
2017-12-31 19:05:00,3758,20180101,5,2,-79.09,35.97,-4.3,0.0,0,0,...,C,0,22,0,0.265,4.4,1217,0,1.76,0
2017-12-31 19:10:00,3758,20180101,10,2,-79.09,35.97,-4.3,0.0,0,0,...,C,0,22,0,0.265,4.4,1217,0,1.51,0
2017-12-31 19:15:00,3758,20180101,15,2,-79.09,35.97,-4.3,0.0,0,0,...,C,0,22,0,0.265,4.3,1211,0,1.56,0
2017-12-31 19:20:00,3758,20180101,20,2,-79.09,35.97,-4.3,0.0,0,0,...,C,0,22,0,0.265,4.3,1217,0,2.07,0
2017-12-31 19:25:00,3758,20180101,25,2,-79.09,35.97,-4.3,0.0,0,0,...,C,0,22,0,0.265,4.3,1217,0,1.86,0
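
Newer pandas versions warn about (and may no longer accept) the nested parse_dates list used above. As an alternative, here is a minimal sketch that reads the date and time fields as strings and combines them with pd.to_datetime. The three sample rows and the shortened six-column layout are made up for illustration; they are not the real file.

```python
import io
import pandas as pd

# Three rows in the same whitespace-delimited style, trimmed to six columns (illustrative only).
sample = io.StringIO(
    "03758 20180101 0005 20171231 1905 -4.3\n"
    "03758 20180101 0010 20171231 1910 -4.3\n"
    "03758 20180101 0015 20171231 1915 -4.4\n"
)
names = ["WBANNO", "UTC_DATE", "UTC_TIME", "LST_DATE", "LST_TIME", "AIR_TEMPERATURE"]

# Read the local date and time as strings so leading zeros (e.g. "0005") survive.
df = pd.read_csv(
    sample,
    sep=r"\s+",
    header=None,
    names=names,
    dtype={"LST_DATE": str, "LST_TIME": str},
)

# Concatenate date and time, then parse with an explicit format.
df.index = pd.to_datetime(df["LST_DATE"] + df["LST_TIME"], format="%Y%m%d%H%M")
df.index.name = "LST_DATE_LST_TIME"
df = df.drop(columns=["LST_DATE", "LST_TIME"])
print(df)
```

Spelling out the format string means pandas doesn't have to guess how to read a value like 201712311905.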


In [9]:
noaa_data.loc[:,'AIR_TEMPERATURE'].groupby(pd.Grouper(freq='W')).mean()

LST_DATE_LST_TIME
2017-12-31     -4.528814
2018-01-07     -7.237847
2018-01-14      6.990625
2018-01-21      0.747569
2018-01-28      8.395089
2018-02-04      2.817708
2018-02-11      8.598810
2018-02-18     11.144494
2018-02-25     16.084524
2018-03-04      8.591121
2018-03-11      4.532242
2018-03-18      6.572669
2018-03-25      5.911409
2018-04-01     12.809970
2018-04-08     12.107837
2018-04-15     14.936558
2018-04-22     12.446776
2018-04-29    -44.525942
2018-05-06     19.035268
2018-05-13     20.755060
2018-05-20     22.814236
2018-05-27     22.898760
2018-06-03   -214.974206
2018-06-10     22.816915
2018-06-17     23.292510
2018-06-24     26.877282
2018-07-01     24.928472
2018-07-08     24.993353
2018-07-15     24.147272
2018-07-22     24.144544
2018-07-29     24.514484
2018-08-05     19.234970
2018-08-12     26.266321
Freq: W-SUN, Name: AIR_TEMPERATURE, dtype: float64

In [10]:
noaa_data[(noaa_data.index >= '2018-05-27') & (noaa_data.index < '2018-06-03')].loc[:,'AIR_TEMPERATURE']

LST_DATE_LST_TIME
2018-05-27 00:00:00    21.6
2018-05-27 00:05:00    21.5
2018-05-27 00:10:00    21.4
2018-05-27 00:15:00    21.5
2018-05-27 00:20:00    21.5
2018-05-27 00:25:00    21.5
2018-05-27 00:30:00    21.5
2018-05-27 00:35:00    21.4
2018-05-27 00:40:00    21.5
2018-05-27 00:45:00    21.6
2018-05-27 00:50:00    21.6
2018-05-27 00:55:00    21.6
2018-05-27 01:00:00    21.6
2018-05-27 01:05:00    21.6
2018-05-27 01:10:00    21.7
2018-05-27 01:15:00    21.7
2018-05-27 01:20:00    21.6
2018-05-27 01:25:00    21.6
2018-05-27 01:30:00    21.6
2018-05-27 01:35:00    21.5
2018-05-27 01:40:00    21.5
2018-05-27 01:45:00    21.4
2018-05-27 01:50:00    21.4
2018-05-27 01:55:00    21.4
2018-05-27 02:00:00    21.3
2018-05-27 02:05:00    21.3
2018-05-27 02:10:00    21.4
2018-05-27 02:15:00    21.3
2018-05-27 02:20:00    21.3
2018-05-27 02:25:00    21.2
                       ... 
2018-06-02 21:30:00    20.9
2018-06-02 21:35:00    21.0
2018-06-02 21:40:00    20.9
2018-06-02 21:45:00    20.8

In [11]:
np.unique(noaa_data[noaa_data['AIR_TEMPERATURE'] < -273.15]['AIR_TEMPERATURE'].index.date)

array([datetime.date(2018, 4, 26), datetime.date(2018, 5, 29),
       datetime.date(2018, 8, 5)], dtype=object)

In [None]:
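# Show the rows whose temperature is below absolute zero (same filter as the previous cell)
noaa_data[noaa_data['AIR_TEMPERATURE'] < -273.15]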

In [13]:
noaa_data.replace(-9999,np.nan,inplace=True)
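
To see why a handful of -9999 readings can wreck a weekly average, here is a small sketch on made-up data: eleven readings of 20.0 and a single -9999. It assumes, as the replace call above does, that -9999 marks a missing value.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute readings: eleven at 20.0 degrees plus one -9999 sentinel.
idx = pd.date_range("2018-05-29 00:00", periods=12, freq="5min")
temps = pd.Series([20.0] * 11 + [-9999.0], index=idx, name="AIR_TEMPERATURE")

raw_mean = temps.mean()                # dragged far below absolute zero by one bad value
clean = temps.replace(-9999, np.nan)   # mark the sentinel as missing
clean_mean = clean.mean()              # NaN is skipped by default

print(raw_mean, clean_mean, clean.isna().sum())
```

One bad value out of twelve is enough to pull the mean below -273.15, which is the same kind of impossible number that showed up in the weekly table.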

In [14]:
noaa_data.mean?

In [15]:
pd.Series([1,2,3]).mean()

2.0

In [16]:
pd.Series([1,2,3,np.nan]).mean()

2.0

In [24]:
pd.Series([1,2,3,np.nan]).mean(skipna=False)

nan

In [19]:
noaa_data.loc[:,'AIR_TEMPERATURE'].groupby(pd.Grouper(freq='W')).mean()

LST_DATE_LST_TIME
2017-12-31    -4.528814
2018-01-07    -7.237847
2018-01-14     6.990625
2018-01-21     0.747569
2018-01-28     8.395089
2018-02-04     2.817708
2018-02-11     8.598810
2018-02-18    11.144494
2018-02-25    16.084524
2018-03-04     8.591121
2018-03-11     4.532242
2018-03-18     6.572669
2018-03-25     5.911409
2018-04-01    12.809970
2018-04-08    12.107837
2018-04-15    14.936558
2018-04-22    12.446776
2018-04-29    15.081687
2018-05-06    19.035268
2018-05-13    20.755060
2018-05-20    22.814236
2018-05-27    22.898760
2018-06-03    23.660569
2018-06-10    22.816915
2018-06-17    23.292510
2018-06-24    26.877282
2018-07-01    24.928472
2018-07-08    24.993353
2018-07-15    24.147272
2018-07-22    24.144544
2018-07-29    24.514484
2018-08-05    24.206799
2018-08-12    26.266321
Freq: W-SUN, Name: AIR_TEMPERATURE, dtype: float64
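
As a side note, grouping with pd.Grouper(freq='W') is one way to get weekly bins; resample('W') is another. The sketch below compares the two on made-up daily data (28 days starting on a Monday), so the numbers are not from the NOAA file.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: 28 days starting Monday 2018-01-01, values 0..27.
idx = pd.date_range("2018-01-01", periods=28, freq="D")
temps = pd.Series(np.arange(28, dtype=float), index=idx, name="AIR_TEMPERATURE")

# Weekly bins end on Sunday by default (W-SUN), as in the output above.
by_grouper = temps.groupby(pd.Grouper(freq="W")).mean()
by_resample = temps.resample("W").mean()

print(by_grouper)
print(by_grouper.equals(by_resample))
```

Each label is the Sunday that closes the week, which is why the first row of the real output is dated 2017-12-31 even though it only holds a few hours of data.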

In [22]:
noaa_data['AIR_TEMPERATURE'].groupby(noaa_data.index.hour).mean()

LST_DATE_LST_TIME
0     12.528899
1     12.109060
2     11.750956
3     11.448853
4     11.220757
5     11.050573
6     11.482722
7     12.618702
8     14.181797
9     15.808794
10    17.145315
11    18.265392
12    19.172476
13    19.780650
14    19.933677
15    19.973280
16    19.603800
17    18.786943
18    17.138287
19    15.429904
20    14.372095
21    13.772859
22    13.339717
23    12.936506
Name: AIR_TEMPERATURE, dtype: float64
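
Grouping on index.hour pools every day's readings for a given hour, unlike pd.Grouper, which keeps calendar order. Here is a small sketch on made-up data (two days of hourly values) that also asks for the min and max alongside the mean.

```python
import numpy as np
import pandas as pd

# Two synthetic days of hourly readings; the second day runs 2 degrees warmer.
idx = pd.date_range("2018-06-01", periods=48, freq="60min")
temps = pd.Series(np.tile(np.arange(24, dtype=float), 2), index=idx)
temps.iloc[24:] += 2.0

# One row per hour of the day (0-23), pooled across both days.
hourly = temps.groupby(temps.index.hour).agg(["mean", "min", "max"])
print(hourly.head())
```

The result has 24 rows no matter how many days go in, so it describes a typical day rather than a timeline.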

In [23]:
noaa_data.loc[:,'PRECIPITATION'].groupby(pd.Grouper(freq='W')).sum()

LST_DATE_LST_TIME
2017-12-31     0.0
2018-01-07     3.3
2018-01-14    23.5
2018-01-21     9.3
2018-01-28    59.7
2018-02-04    37.5
2018-02-11    13.3
2018-02-18     3.5
2018-02-25     6.1
2018-03-04    21.6
2018-03-11    20.4
2018-03-18    18.6
2018-03-25    53.9
2018-04-01     0.6
2018-04-08    28.3
2018-04-15    57.4
2018-04-22    11.1
2018-04-29    42.3
2018-05-06     0.0
2018-05-13     0.2
2018-05-20    68.2
2018-05-27     8.6
2018-06-03    27.6
2018-06-10    26.9
2018-06-17     3.2
2018-06-24     0.0
2018-07-01    70.8
2018-07-08    16.4
2018-07-15     0.0
2018-07-22    60.5
2018-07-29    23.5
2018-08-05    59.8
2018-08-12     0.0
Freq: W-SUN, Name: PRECIPITATION, dtype: float64
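
One thing to keep in mind with sum(): by default it skips NaN, so a week with no valid readings at all adds up to 0.0 and looks just like a dry week. The sketch below, on made-up data, shows how min_count=1 keeps such a week as NaN instead.

```python
import numpy as np
import pandas as pd

# Two synthetic weeks: 1.0 of rain per day, then a week of missing readings.
idx = pd.date_range("2018-01-01", periods=14, freq="D")
precip = pd.Series([1.0] * 7 + [np.nan] * 7, index=idx, name="PRECIPITATION")

weekly = precip.groupby(pd.Grouper(freq="W"))
default_sum = weekly.sum()             # an all-NaN week sums to 0.0
strict_sum = weekly.sum(min_count=1)   # require at least one valid value per week

print(default_sum)
print(strict_sum)
```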