# Wind Statistics

- [Four tricks you should know to parse date columns with Pandas read_csv()](https://towardsdatascience.com/4-tricks-you-should-know-to-parse-date-columns-with-pandas-read-csv-27355bb2ad0e)

- [Mastering Dates and Timestamps in Pandas (and Python in general)](https://towardsdatascience.com/mastering-dates-and-timestamps-in-pandas-and-python-in-general-5b8c6edcc50c)

- [Changing Years in a date column](https://stackoverflow.com/questions/53137642/changing-years-in-a-date-column)

### Introduction:

The data have been modified to contain some missing values, identified by NaN.
Using pandas should make this exercise easier, in particular for the bonus question.

You should be able to perform all of these operations without using
a for loop or other looping construct.


1. The data in 'wind.data' has the following format:

In [7]:
"""
Yr Mo Dy   RPT   VAL   ROS   KIL   SHA   BIR   DUB   CLA   MUL   CLO   BEL   MAL
61  1  1 15.04 14.96 13.17  9.29   NaN  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  1  2 14.71   NaN 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25   NaN  8.50  7.67 12.75 12.71
"""


   The first three columns are year, month and day.  The
   remaining 12 columns are average windspeeds in knots at 12
   locations in Ireland on that day.

   For more information about the dataset, see [here](wind.desc).

### Step 1. Import the necessary libraries

In [8]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data)

### Step 3. Assign it to a variable called data and replace the first 3 columns by a proper datetime index.

In [9]:
url='https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data'

data=pd.read_csv(url, header=0, sep=r'\s+', parse_dates={'date': ['Yr', 'Mo', 'Dy']}, index_col=0)

## sep='\s+'  >>> multi-space separator
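
The dict form of `parse_dates` used above leaves the century of the two-digit `Yr` column to the parser, which is where the 2061 dates in the output below come from. As an alternative, here is a minimal sketch that assembles the dates explicitly from the three columns. It assumes every year in the file belongs to the 1900s (true for this 1961-1978 record). The inline `sample` text and the names `raw`, `parts`, and `sample_df` are illustrative only.

```python
import io

import pandas as pd

# A few rows in the same layout as wind.data (illustrative subset of columns).
sample = """\
Yr Mo Dy   RPT   VAL   ROS
61  1  1 15.04 14.96 13.17
61  1  2 14.71   NaN 10.83
61  1  3 18.50 16.88 12.33
"""

raw = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Assumption: all two-digit years are 19xx, so add 1900 explicitly.
parts = pd.DataFrame(
    {"year": 1900 + raw["Yr"], "month": raw["Mo"], "day": raw["Dy"]}
)

sample_df = raw.drop(columns=["Yr", "Mo", "Dy"])
sample_df.index = pd.DatetimeIndex(pd.to_datetime(parts), name="date")
```

Because the century is stated in the code, no later correction of the dates is needed with this approach.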

In [10]:
data.shape

(6574, 12)

In [11]:
data.head(3)

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
2061-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
2061-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
2061-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71


In [12]:
data.tail(3)

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1978-12-29,14.0,10.29,14.42,8.71,9.71,10.54,19.17,12.46,14.5,16.42,18.88,29.58
1978-12-30,18.5,14.04,21.29,9.13,12.75,9.71,18.08,12.87,12.46,12.12,14.67,28.79
1978-12-31,20.33,17.41,27.29,9.59,12.08,10.13,19.25,11.63,11.58,11.38,12.08,22.08


In [13]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6574 entries, 2061-01-01 to 1978-12-31
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   RPT     6568 non-null   float64
 1   VAL     6571 non-null   float64
 2   ROS     6572 non-null   float64
 3   KIL     6569 non-null   float64
 4   SHA     6572 non-null   float64
 5   BIR     6574 non-null   float64
 6   DUB     6571 non-null   float64
 7   CLA     6572 non-null   float64
 8   MUL     6571 non-null   float64
 9   CLO     6573 non-null   float64
 10  BEL     6574 non-null   float64
 11  MAL     6570 non-null   float64
dtypes: float64(12)
memory usage: 667.7 KB


In [14]:
data=data.reset_index() # MOVING date OUT OF THE INDEX MAKES IT EASIER TO WORK WITH DATETIME OBJECTS
                        # REPEATING THIS CELL WILL RESULT IN AN ERROR

In [15]:
data.head(3)

,date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
0,2061-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1,2061-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
2,2061-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71


In [16]:
data.tail(3)

,date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
6571,1978-12-29,14.0,10.29,14.42,8.71,9.71,10.54,19.17,12.46,14.5,16.42,18.88,29.58
6572,1978-12-30,18.5,14.04,21.29,9.13,12.75,9.71,18.08,12.87,12.46,12.12,14.67,28.79
6573,1978-12-31,20.33,17.41,27.29,9.59,12.08,10.13,19.25,11.63,11.58,11.38,12.08,22.08


### Step 4. Year 2061? Do we really have data from this year? Create a function to fix it and apply it.

In [17]:
### THIS FIRST BIT IS AN INTERESTING AND INSTRUCTIVE MISTAKE.
### I HAD NOT RUN TAIL TO INSPECT THE date COLUMN: SOME YEARS WERE PREFIXED
### W/20 AND OTHERS W/19,
### SO THE NEXT FEW CELLS ARE INCORRECT FOR THIS PROBLEM. SHIFTING EVERY DATE BY 100
### YEARS WON'T WORK.

In [18]:
# data['date'] = data['date'].apply(lambda x: x.replace(year=x.year - 100))
data['date'] = data['date'] + pd.DateOffset(years=-100)

data.set_index('date', inplace=True)    # RESTORE DATETIME INDEX

In [19]:
data.head(3) # I THOUGHT I'D SUCCESSFULLY SHIFTED DATES

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1961-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1961-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
1961-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71


In [20]:
### BUT LOOK AT THE TAIL
data.tail(3)

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1878-12-29,14.0,10.29,14.42,8.71,9.71,10.54,19.17,12.46,14.5,16.42,18.88,29.58
1878-12-30,18.5,14.04,21.29,9.13,12.75,9.71,18.08,12.87,12.46,12.12,14.67,28.79
1878-12-31,20.33,17.41,27.29,9.59,12.08,10.13,19.25,11.63,11.58,11.38,12.08,22.08
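
The tail shows why a blanket 100-year shift fails. Two-digit years are parsed with a pivot that matches the POSIX `strptime` convention (values up to 68 become 20xx, 69 and above become 19xx), so only part of the record landed in the wrong century. Below is a minimal sketch of a fix that shifts only the affected rows. The function name `fix_century` and the cutoff year are assumptions made for illustration, chosen because this record cannot contain dates after 1999.

```python
import pandas as pd


def fix_century(dates: pd.Series, cutoff_year: int = 2000) -> pd.Series:
    """Move dates at or after cutoff_year back by 100 years; leave the rest alone."""
    shifted = dates - pd.DateOffset(years=100)
    # where() keeps the original value when the condition is True,
    # and takes the shifted value otherwise.
    return dates.where(dates.dt.year < cutoff_year, shifted)


# Toy input mimicking the mixed centuries seen in head() and tail().
demo = pd.Series(
    pd.to_datetime(["2061-01-01", "2068-12-31", "1969-01-01", "1978-12-31"])
)
fixed = fix_century(demo)
```

Applied to the real frame, this would be called on the `date` column before restoring it as the index.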


In [21]:
# I'LL REPEAT THE STEPS PRIOR TO THE MISTAKE
url='https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data'
data=pd.read_csv(url, header=0, sep=r'\s+', parse_dates={'date': ['Yr', 'Mo', 'Dy']}, index_col=0) ## sep=r'\s+'  >>> multi-space separator
data=data.reset_index()
data.head(3)

,date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
0,2061-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1,2061-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
2,2061-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71


In [24]:
data.date.dt.strftime('%y-%m-%d').head(3) # Just prepend '19' to it

0    61-01-01
1    61-01-02
2    61-01-03
Name: date, dtype: object

In [25]:
### GO WITH A STRING BASED APPROACH. REPLACE THE FIRST TWO DIGITS OF EVERY YEAR
### WITH '19' AND THEN TURN BACK INTO A DATETIME OBJECT

data['date'] = pd.to_datetime('19' + data.date.dt.strftime('%y-%m-%d'))
data.set_index('date', inplace=True)

In [27]:
data.head(3)

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1961-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1961-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
1961-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71


In [28]:
data.tail(3)

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1978-12-29,14.0,10.29,14.42,8.71,9.71,10.54,19.17,12.46,14.5,16.42,18.88,29.58
1978-12-30,18.5,14.04,21.29,9.13,12.75,9.71,18.08,12.87,12.46,12.12,14.67,28.79
1978-12-31,20.33,17.41,27.29,9.59,12.08,10.13,19.25,11.63,11.58,11.38,12.08,22.08


### Step 5. Set the right dates as the index. Pay attention to the data type; it should be datetime64[ns].

In [29]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6574 entries, 1961-01-01 to 1978-12-31
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   RPT     6568 non-null   float64
 1   VAL     6571 non-null   float64
 2   ROS     6572 non-null   float64
 3   KIL     6569 non-null   float64
 4   SHA     6572 non-null   float64
 5   BIR     6574 non-null   float64
 6   DUB     6571 non-null   float64
 7   CLA     6572 non-null   float64
 8   MUL     6571 non-null   float64
 9   CLO     6573 non-null   float64
 10  BEL     6574 non-null   float64
 11  MAL     6570 non-null   float64
dtypes: float64(12)
memory usage: 667.7 KB


In [30]:
data.index.dtype  # THIS APPEARS TO BE THE SAME THING AS datetime64; BUT A FEW CASES REQUIRE EXPLICIT CONVERSION

dtype('<M8[ns]')

In [31]:
# FROM PROVIDED SOLUTION. EASY.

data.index = pd.to_datetime(data.index)
data.index.dtype   ####hmmmm

dtype('<M8[ns]')
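
`<M8[ns]` is NumPy's short string for `datetime64[ns]`: `M8` is the type code for an 8-byte datetime and `<` marks little-endian byte order. The index therefore already had the requested type, and `pd.to_datetime` left it unchanged. A small sketch to confirm this on a toy index (`idx` is an illustrative name):

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["1961-01-01", "1961-01-02"])

# 'M8[ns]' and 'datetime64[ns]' are two spellings of the same NumPy dtype.
same_kind = np.dtype("M8[ns]") == np.dtype("datetime64[ns]")

# A dtype-agnostic check that works regardless of how the dtype is printed.
is_datetime = pd.api.types.is_datetime64_any_dtype(idx)
```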

### Step 6. Compute how many values are missing for each location over the entire record.
#### They should be ignored in all calculations below.

In [126]:
data.isna().sum().to_frame().T

,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
0,6,3,2,5,2,0,3,2,3,1,0,4


### Step 7. Compute how many non-missing values there are in total.

In [33]:
### data.isna().sum().sum() #### OOPS

data.notnull().sum()

RPT    6568
VAL    6571
ROS    6572
KIL    6569
SHA    6572
BIR    6574
DUB    6571
CLA    6572
MUL    6571
CLO    6573
BEL    6574
MAL    6570
dtype: int64
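
The output above is a per-location count. If the step is read as asking for one grand total, a second `.sum()` collapses the counts into a single number. A minimal sketch on a toy frame (the frame `toy` and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"RPT": [15.04, 14.71, np.nan], "VAL": [14.96, np.nan, 16.88]}
)

per_location = toy.notna().sum()     # non-missing count per column
total = int(per_location.sum())      # one number for the whole frame
same_total = int(toy.count().sum())  # count() skips NaN, so this agrees
```

Summing the twelve counts printed above gives 78857, which is 6574 × 12 minus the 31 missing values found in Step 6.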

### Step 8. Calculate the mean windspeed over all the locations and all the times.
#### A single number for the entire dataset.

In [34]:
data.head()

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1961-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1961-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
1961-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71
1961-01-04,10.58,6.63,11.75,4.58,4.54,2.88,8.63,1.79,5.83,5.88,5.46,10.88
1961-01-05,13.33,13.25,11.42,6.17,10.71,8.21,11.92,6.54,10.92,10.34,12.92,11.83


In [35]:
data.sum().sum()/data.count().sum()

10.227883764282181

### Step 9. Create a DataFrame called loc_stats and calculate the min, max and mean windspeeds and standard deviations of the windspeeds at each location over all the days

#### A different set of numbers for each location.

In [36]:
loc_stats = data.agg(['min', 'max', 'mean', 'std'])
loc_stats


,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
min,0.67,0.21,1.5,0.0,0.13,0.0,0.0,0.0,0.0,0.04,0.13,0.67
max,35.8,33.37,33.84,28.46,37.54,26.16,30.37,31.08,25.88,28.21,42.38,42.54
mean,12.362987,10.644314,11.660526,6.306468,10.455834,7.092254,9.797343,8.495053,8.49359,8.707332,13.121007,15.599079
std,5.618413,5.267356,5.00845,3.605811,4.936125,3.968683,4.977555,4.499449,4.166872,4.503954,5.835037,6.699794


In [131]:
loc_stats.mean(axis=1) # MEAN IS CLOSE ENOUGH TO MY UNIVERSAL MEAN WINDSPEED
                       # ABOVE TO MAKE ME FEEL GOOD ABOUT THAT VALUE

min      0.279167
max     32.969167
mean    10.227982
std      4.923958
dtype: float64
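
The two values agree only approximately (10.227884 in Step 8 versus 10.227982 here) because averaging the per-location means gives every location equal weight, whereas the pooled mean weights each location by its number of non-missing days. A minimal sketch with exaggerated toy numbers (all names and values are illustrative):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, np.nan, np.nan]})

# Pooled mean: every non-missing observation counts once.
pooled_mean = toy.sum().sum() / toy.count().sum()  # (1 + 2 + 3 + 10) / 4

# Mean of column means: every column counts once, however many values it has.
mean_of_means = toy.mean().mean()                  # (2 + 10) / 2
```

With only a handful of missing values per location, as in the wind data, the two numbers differ in the fourth decimal place rather than by a factor like this.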

### Step 10. Create a DataFrame called day_stats and calculate the min, max and mean windspeed and standard deviations of the windspeeds across all the locations at each day.

#### A different set of numbers for each day.

In [132]:
day_stats = data.agg(['min', 'max', 'mean', 'std'], axis=1)
day_stats.head(10)

date,min,max,mean,std
1961-01-01,9.29,18.5,13.018182,2.808875
1961-01-02,6.5,17.54,11.336364,3.188994
1961-01-03,6.17,18.5,11.641818,3.681912
1961-01-04,1.79,11.75,6.619167,3.198126
1961-01-05,6.17,13.33,10.63,2.445356
1961-01-06,4.42,13.21,8.24,2.998063
1961-01-07,4.96,14.29,10.385,3.072114
1961-01-08,5.91,16.62,10.4875,3.547237
1961-01-09,4.75,15.37,9.8975,2.905954
1961-01-10,6.54,19.5,10.4775,3.44261


### Step 11. Find the average windspeed in January for each location.
#### Treat January 1961 and January 1962 both as January.

In [133]:
data.loc[data.index.month == 1].mean().to_frame().T


,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
0,14.847325,12.91456,13.299624,7.199498,11.667734,8.054839,11.819355,9.512047,9.543208,10.053566,14.55052,18.028763
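
The boolean mask above selects one month. A related sketch, assuming the same kind of `DatetimeIndex`, groups on the month number so that every calendar month is averaged across years at once (the toy frame and names are illustrative):

```python
import pandas as pd

idx = pd.to_datetime(["1961-01-10", "1962-01-20", "1961-02-05", "1962-02-15"])
toy = pd.DataFrame({"RPT": [10.0, 20.0, 5.0, 7.0]}, index=idx)

# One row per calendar month (1-12), pooling the same month from all years.
monthly_climatology = toy.groupby(toy.index.month).mean()
january = monthly_climatology.loc[1, "RPT"]
```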


### Step 12. Downsample the record to a yearly frequency for each location.

In [134]:
(
    data
    .groupby(pd.Grouper(freq='1Y'))
    .mean()
)

# CHECK OUT THEIR SOLUTION USING to_period('A') WHICH
# RETURNS SAME RESULT BUT W/DATE PRESENTED AS 1961, 1962, ETC.

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1961-12-31,12.299583,10.351796,11.362369,6.958227,10.881763,7.729726,9.733923,8.858788,8.647652,9.835577,13.502795,13.680773
1962-12-31,12.246923,10.110438,11.732712,6.96044,10.657918,7.393068,11.020712,8.793753,8.316822,9.676247,12.930685,14.323956
1963-12-31,12.813452,10.836986,12.541151,7.330055,11.72411,8.434712,11.075699,10.336548,8.903589,10.224438,13.638877,14.999014
1964-12-31,12.363661,10.920164,12.104372,6.787787,11.454481,7.570874,10.259153,9.46735,7.789016,10.207951,13.740546,14.910301
1965-12-31,12.45137,11.075534,11.848767,6.858466,11.024795,7.47811,10.618712,8.879918,7.907425,9.918082,12.964247,15.591644
1966-12-31,13.461973,11.557205,12.02063,7.345726,11.805041,7.793671,10.579808,8.835096,8.514438,9.768959,14.265836,16.30726
1967-12-31,12.737151,10.990986,11.739397,7.143425,11.63074,7.368164,10.652027,9.325616,8.645014,9.547425,14.774548,17.135945
1968-12-31,11.835628,10.468197,11.409754,6.477678,10.760765,6.067322,8.85918,8.255519,7.224945,7.832978,12.808634,15.017486
1969-12-31,11.166356,9.723699,10.902,5.767973,9.873918,6.189973,8.564493,7.711397,7.924521,7.754384,12.621233,15.762904
1970-12-31,12.600329,10.726932,11.730247,6.217178,10.56737,7.609452,9.60989,8.33463,9.297616,8.289808,13.183644,16.456027


In [37]:
### TRY ANOTHER APPROACH

(
    data
    .resample('1Y')
    .mean()
).head()

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1961-12-31,12.299583,10.351796,11.362369,6.958227,10.881763,7.729726,9.733923,8.858788,8.647652,9.835577,13.502795,13.680773
1962-12-31,12.246923,10.110438,11.732712,6.96044,10.657918,7.393068,11.020712,8.793753,8.316822,9.676247,12.930685,14.323956
1963-12-31,12.813452,10.836986,12.541151,7.330055,11.72411,8.434712,11.075699,10.336548,8.903589,10.224438,13.638877,14.999014
1964-12-31,12.363661,10.920164,12.104372,6.787787,11.454481,7.570874,10.259153,9.46735,7.789016,10.207951,13.740546,14.910301
1965-12-31,12.45137,11.075534,11.848767,6.858466,11.024795,7.47811,10.618712,8.879918,7.907425,9.918082,12.964247,15.591644


In [42]:
### THEIR SOLUTION
(
    data
    .groupby(data.index.to_period('1A'))
    .mean()
).head(5)

# to_period HAS THE ADVANTAGE OF A BETTER DATE DISPLAY

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1961,12.299583,10.351796,11.362369,6.958227,10.881763,7.729726,9.733923,8.858788,8.647652,9.835577,13.502795,13.680773
1962,12.246923,10.110438,11.732712,6.96044,10.657918,7.393068,11.020712,8.793753,8.316822,9.676247,12.930685,14.323956
1963,12.813452,10.836986,12.541151,7.330055,11.72411,8.434712,11.075699,10.336548,8.903589,10.224438,13.638877,14.999014
1964,12.363661,10.920164,12.104372,6.787787,11.454481,7.570874,10.259153,9.46735,7.789016,10.207951,13.740546,14.910301
1965,12.45137,11.075534,11.848767,6.858466,11.024795,7.47811,10.618712,8.879918,7.907425,9.918082,12.964247,15.591644
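
All three approaches give the same yearly means; they differ only in how the year is labeled. Frequency aliases have shifted between pandas releases (as far as I know, pandas 2.2 and later prefer `'YE'` over `'Y'` for year-end resampling and `'Y'` over `'A'` for periods), so a sketch that avoids the resampling aliases may be useful. It groups on the integer year of the index; the toy frame is illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("1961-01-01", "1962-12-31", freq="D")
# Constant value per year so the yearly means are easy to check by eye.
toy = pd.DataFrame({"RPT": np.where(idx.year == 1961, 10.0, 20.0)}, index=idx)

by_year = toy.groupby(toy.index.year).mean()              # index: 1961, 1962
by_period = toy.groupby(toy.index.to_period("Y")).mean()  # index: yearly Periods
```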


### Step 13. Downsample the record to a monthly frequency for each location.

In [136]:
(
    data
    .resample('1M')
    .mean()
)

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1961-01-31,14.841333,11.988333,13.431613,7.736774,11.072759,8.588065,11.184839,9.245333,9.085806,10.107419,13.880968,14.703226
1961-02-28,16.269286,14.975357,14.441481,9.230741,13.852143,10.937500,11.890714,11.846071,11.821429,12.714286,18.583214,15.411786
1961-03-31,10.890000,11.296452,10.752903,7.284000,10.509355,8.866774,9.644194,9.829677,10.294138,11.251935,16.410968,15.720000
1961-04-30,10.722667,9.427667,9.998000,5.830667,8.435000,6.495000,6.925333,7.094667,7.342333,7.237000,11.147333,10.278333
1961-05-31,9.860968,8.850000,10.818065,5.905333,9.490323,6.574839,7.604000,8.177097,8.039355,8.499355,11.900323,12.011613
...,...,...,...,...,...,...,...,...,...,...,...,...
1978-08-31,9.645161,8.259355,9.032258,4.502903,7.368065,5.935161,5.650323,5.417742,7.241290,5.536774,10.466774,12.054194
1978-09-30,10.913667,10.895000,10.635000,5.725000,10.372000,9.278333,10.790333,9.583000,10.069333,8.939000,15.680333,19.391333
1978-10-31,9.897742,8.670968,9.295806,4.721290,8.525161,6.774194,8.115484,7.337742,8.297742,8.243871,13.776774,17.150000
1978-11-30,16.151667,14.802667,13.508000,7.317333,11.475000,8.743000,11.492333,9.657333,10.701333,10.676000,17.404667,20.723000


### Step 14. Downsample the record to a weekly frequency for each location.

In [137]:
(
    data
    .resample('1W')
    .mean()
)

date,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
1961-01-01,15.040000,14.960000,13.170000,9.290000,,9.870000,13.670000,10.250000,10.830000,12.580000,18.500000,15.040000
1961-01-08,13.541429,11.486667,10.487143,6.417143,9.474286,6.435714,11.061429,6.616667,8.434286,8.497143,12.481429,13.238571
1961-01-15,12.468571,8.967143,11.958571,4.630000,7.351429,5.072857,7.535714,6.820000,5.712857,7.571429,11.125714,11.024286
1961-01-22,13.204286,9.862857,12.982857,6.328571,8.966667,7.417143,9.257143,7.875714,7.145714,8.124286,9.821429,11.434286
1961-01-29,19.880000,16.141429,18.225714,12.720000,17.432857,14.828571,15.528571,15.160000,14.480000,15.640000,20.930000,22.530000
...,...,...,...,...,...,...,...,...,...,...,...,...
1978-12-03,14.934286,11.232857,13.941429,5.565714,10.215714,8.618571,9.642857,7.685714,9.011429,9.547143,11.835714,18.728571
1978-12-10,20.740000,19.190000,17.034286,9.777143,15.287143,12.774286,14.437143,12.488571,13.870000,14.082857,18.517143,23.061429
1978-12-17,16.758571,14.692857,14.987143,6.917143,11.397143,7.272857,10.208571,7.967143,9.168571,8.565714,11.102857,15.562857
1978-12-24,11.155714,8.008571,13.172857,4.004286,7.825714,6.290000,7.798571,8.667143,7.151429,8.072857,11.845714,18.977143


### Step 15. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds across all locations for each week (assume that the first week starts on January 2 1961) for the first 52 weeks.

In [139]:
#?pd.DataFrame.resample

In [205]:
### THEY HAD A DIFFERENT OBJECTIVE IN MIND. WORTH CHECKING OUT.
### THEY WANTED THESE STATS FOR EACH WEEK AND EACH LOCATION, NOT ACROSS ALL
### LOCATIONS.

from pandas.tseries.offsets import DateOffset

(
    data
    .resample('7D')
    .mean()
    .loc[pd.Timestamp("1961-01-08"): pd.Timestamp("1961-01-08") + DateOffset(weeks=51)]
    .agg(['min', 'max', 'mean', 'std'], axis=1)
)


date,min,max,mean,std
1961-01-08,5.135714,12.422857,8.895,2.688046
1961-01-15,6.357143,13.554286,9.242639,2.378738
1961-01-22,10.524286,20.607143,14.920476,2.97819
1961-01-29,9.091429,16.482857,12.655893,2.623683
1961-02-05,11.238571,20.964286,16.036548,2.969975
1961-02-12,9.84,20.564286,13.694881,2.940274
1961-02-19,7.41,17.275714,11.719524,2.708721
1961-02-26,8.572857,19.45,13.123929,2.729939
1961-03-05,5.221429,15.427143,10.075357,2.666391
1961-03-12,7.667143,20.665714,12.715873,3.473851
