# US - Baby Names
### Introduction:
##### We are going to use a subset of the US Baby Names dataset (https://www.kaggle.com/kaggle/us-baby-names) from Kaggle.
##### The file contains names from 2004 through 2014.

### Step 1. Import the necessary libraries

In [1]:
import numpy as np
import pandas as pd

### Step 2. Import the dataset from this address:
https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv
### Step 3. Assign it to a variable called baby_names.

In [2]:
url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv'
baby_names = pd.read_csv(url)
baby_names.head()

Unnamed: 0.1,Unnamed: 0,Id,Name,Year,Gender,State,Count
0,11349,11350,Emma,2004,F,AK,62
1,11350,11351,Madison,2004,F,AK,48
2,11351,11352,Hannah,2004,F,AK,46
3,11352,11353,Grace,2004,F,AK,44
4,11353,11354,Emily,2004,F,AK,41


### Step 4. See the first 10 entries

In [3]:
baby_names.head(10)

Unnamed: 0.1,Unnamed: 0,Id,Name,Year,Gender,State,Count
0,11349,11350,Emma,2004,F,AK,62
1,11350,11351,Madison,2004,F,AK,48
2,11351,11352,Hannah,2004,F,AK,46
3,11352,11353,Grace,2004,F,AK,44
4,11353,11354,Emily,2004,F,AK,41
5,11354,11355,Abigail,2004,F,AK,37
6,11355,11356,Olivia,2004,F,AK,33
7,11356,11357,Isabella,2004,F,AK,30
8,11357,11358,Alyssa,2004,F,AK,29
9,11358,11359,Sophia,2004,F,AK,28


### Step 5. Delete the columns 'Unnamed: 0' and 'Id'

In [4]:
baby_names.drop(['Unnamed: 0', 'Id'], axis=1, inplace=True)
baby_names.head()

Unnamed: 0,Name,Year,Gender,State,Count
0,Emma,2004,F,AK,62
1,Madison,2004,F,AK,48
2,Hannah,2004,F,AK,46
3,Grace,2004,F,AK,44
4,Emily,2004,F,AK,41


### Step 6. Are there more male or female names in the dataset?

In [5]:
baby_names['Gender'].value_counts()

F    558846
M    457549
Name: Gender, dtype: int64
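
Note that `value_counts()` counts rows (one per name, year, state and gender combination), not babies. A minimal sketch of the difference, using a small made-up frame (the `toy` data below is hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical miniature of the baby_names layout
toy = pd.DataFrame({
    "Name": ["Emma", "Emma", "Jacob"],
    "Gender": ["F", "F", "M"],
    "State": ["AK", "AL", "AK"],
    "Count": [62, 10, 100],
})

# Number of rows per gender (what value_counts reports)
rows_per_gender = toy["Gender"].value_counts()

# Number of births per gender (sum of the Count column)
births_per_gender = toy.groupby("Gender")["Count"].sum()

print(rows_per_gender)
print(births_per_gender)
```

In this toy frame there are more female rows but more male births, so the two readings of the question can disagree.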

### Step 7. Group the dataset by name and assign to names
##### group the data

##### print the first 5 observations

##### print the size of the dataset

##### sort it from the biggest value to the smallest one


In [6]:
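# Year would otherwise be summed along with Count, so drop it first
baby_names.drop('Year', axis=1, inplace=True)
# Sum only Count; Gender and State are text and should not be aggregated
names = baby_names.groupby('Name')[['Count']].sum()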

In [7]:
names.head(5)

Unnamed: 0_level_0,Count
Name,Unnamed: 1_level_1
Aaban,12
Aadan,23
Aadarsh,5
Aaden,3426
Aadhav,6


In [8]:
names.shape

(17632, 1)

In [9]:
names.sort_values('Count', ascending=False).head()

Unnamed: 0_level_0,Count
Name,Unnamed: 1_level_1
Jacob,242874
Emma,214852
Michael,214405
Ethan,209277
Isabella,204798


### Step 8. How many different names exist in the dataset?

In [54]:
names.count()

Count    17632
dtype: int64

### Step 9. What is the name with most occurrences?

In [11]:
names.sort_values('Count', ascending=False).head(1)

Unnamed: 0_level_0,Count
Name,Unnamed: 1_level_1
Jacob,242874


In [12]:
names['Count'].idxmax()

'Jacob'

### Step 10. How many different names have the least occurrences?

In [57]:
len(names[names['Count'] == names['Count'].min()])

2578

### Step 11. What is the median name occurrence?

In [59]:
names[names['Count'] == names['Count'].median()]

Unnamed: 0_level_0,Count
Name,Unnamed: 1_level_1
Aishani,49
Alara,49
Alysse,49
Ameir,49
Anely,49
Antonina,49
Aveline,49
Aziah,49
Baily,49
Caleah,49


### Step 12. What is the standard deviation of the name counts?

In [60]:
names.std()

Count    11006.069468
dtype: float64

### Step 13. Get a summary with the mean, min, max, std and quartiles.

In [62]:
names.describe()

Unnamed: 0,Count
count,17632.0
mean,2008.932169
std,11006.069468
min,5.0
25%,11.0
50%,49.0
75%,337.0
max,242874.0


In [72]:
# 'quantile' uses the default q=0.5, so that row is the median, not the quartiles
names.agg(['mean', 'min', 'max', 'std', 'quantile'])

Unnamed: 0,Count
mean,2008.932169
min,5.0
max,242874.0
std,11006.069468
quantile,49.0
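
Passing the string `'quantile'` to `agg` relies on the default `q=0.5`, so the last row above is the median only. One way to get all three quartiles is to call `quantile` directly. A sketch on a hypothetical stand-in for `names`:

```python
import pandas as pd

# Hypothetical stand-in for the names frame (one Count per name)
toy_names = pd.DataFrame(
    {"Count": [5, 11, 49, 337, 1000]},
    index=["a", "b", "c", "d", "e"],
)

# 25th, 50th and 75th percentiles of the Count column
quartiles = toy_names["Count"].quantile([0.25, 0.5, 0.75])
print(quartiles)
```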


# Wind Statistics
### Introduction:
##### The data have been modified to contain some missing values, identified by NaN.
##### Using pandas should make this exercise easier, in particular for the bonus question.

##### You should be able to perform all of these operations without using a for loop or other looping construct.

##### The data in 'wind.data' has the following format:
In [434]:
"""
Yr Mo Dy   RPT   VAL   ROS   KIL   SHA   BIR   DUB   CLA   MUL   CLO   BEL   MAL
61  1  1 15.04 14.96 13.17  9.29   NaN  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  1  2 14.71   NaN 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25   NaN  8.50  7.67 12.75 12.71
"""
##### The first three columns are year, month and day. The remaining 12 columns are average windspeeds in knots at 12 locations in Ireland on that day.

##### For more information about the dataset, see: https://render.githubusercontent.com/view/wind.desc

### Step 1. Import the necessary libraries

In [1]:
import numpy as np
import pandas as pd

In [2]:
import datetime

### Step 2. Import the dataset from this address: https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data
### Step 3. Assign it to a variable called data and replace the first 3 columns by a proper datetime index.

In [83]:
url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data'

In [84]:
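# Raw string avoids an invalid-escape warning; nested parse_dates merges Yr, Mo, Dy (deprecated in recent pandas)
data = pd.read_table(url, sep=r'\s+', parse_dates=[[0, 1, 2]])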

In [85]:
data.head()

Unnamed: 0,Yr_Mo_Dy,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
0,2061-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1,2061-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
2,2061-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71
3,2061-01-04,10.58,6.63,11.75,4.58,4.54,2.88,8.63,1.79,5.83,5.88,5.46,10.88
4,2061-01-05,13.33,13.25,11.42,6.17,10.71,8.21,11.92,6.54,10.92,10.34,12.92,11.83
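
The two-digit years are why the dates land in 2061: the parser has to guess the century. An alternative sketch, assuming every `Yr` value belongs to the 1900s (an assumption about the full file), is to read the three columns as integers and build the date explicitly. The inline `sample` text is a truncated stand-in for the remote file:

```python
import io
import pandas as pd

# First rows of the file, cut down to two locations for brevity
sample = """Yr Mo Dy   RPT   VAL
61  1  1 15.04 14.96
61  1  2 14.71   NaN
61  1  3 18.50 16.88
"""

raw = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Assemble a datetime from year/month/day columns, adding the century by hand
dates = pd.to_datetime(pd.DataFrame({
    "year": 1900 + raw["Yr"],
    "month": raw["Mo"],
    "day": raw["Dy"],
}))

# Drop the three source columns and use the assembled dates as the index
wind = raw.drop(columns=["Yr", "Mo", "Dy"]).set_index(dates.rename("Yr_Mo_Dy"))
print(wind)
```

With this approach no century correction is needed afterwards, at the cost of hard-coding the 1900 offset.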


### Step 4. Year 2061? Do we really have data from this year? Create a function to fix it and apply it.

In [86]:
data['Yr_Mo_Dy'][0].year

2061

In [87]:
data.dtypes
    

Yr_Mo_Dy    datetime64[ns]
RPT                float64
VAL                float64
ROS                float64
KIL                float64
SHA                float64
BIR                float64
DUB                float64
CLA                float64
MUL                float64
CLO                float64
BEL                float64
MAL                float64
dtype: object

In [88]:
data['Yr_Mo_Dy'].unique()

array(['2061-01-01T00:00:00.000000000', '2061-01-02T00:00:00.000000000',
       '2061-01-03T00:00:00.000000000', ...,
       '1978-12-29T00:00:00.000000000', '1978-12-30T00:00:00.000000000',
       '1978-12-31T00:00:00.000000000'], dtype='datetime64[ns]')

In [89]:
def fix_year(x):
    # First attempt: only dates parsed as 2061 are corrected here; other
    # two-digit years that landed in the 2000s are handled by fix_century below
    year = x.year - 100 if x.year == 2061 else x.year
    return datetime.date(year, x.month, x.day)

In [90]:
data['Yr_Mo_Dy'] = data['Yr_Mo_Dy'].apply(fix_year)
data.head()

Unnamed: 0,Yr_Mo_Dy,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
0,1961-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1,1961-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
2,1961-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71
3,1961-01-04,10.58,6.63,11.75,4.58,4.54,2.88,8.63,1.79,5.83,5.88,5.46,10.88
4,1961-01-05,13.33,13.25,11.42,6.17,10.71,8.21,11.92,6.54,10.92,10.34,12.92,11.83


In [91]:
# Shift any date that was parsed into the wrong century back by 100 years
def fix_century(x):
    year = x.year - 100 if x.year > 1989 else x.year
    return datetime.date(year, x.month, x.day)

# Apply fix_century to the column, replacing the values with the corrected ones
data['Yr_Mo_Dy'] = data['Yr_Mo_Dy'].apply(fix_century)

# data.info()
data.head()

Unnamed: 0,Yr_Mo_Dy,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
0,1961-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1,1961-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
2,1961-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71
3,1961-01-04,10.58,6.63,11.75,4.58,4.54,2.88,8.63,1.79,5.83,5.88,5.46,10.88
4,1961-01-05,13.33,13.25,11.42,6.17,10.71,8.21,11.92,6.54,10.92,10.34,12.92,11.83
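
The same correction can be done without `apply`, which keeps the column as a datetime dtype instead of turning it into Python `date` objects. A sketch on a hypothetical Series `parsed`, using the same cut-off as `fix_century` (years after 1989 are shifted back a century):

```python
import pandas as pd

# Hypothetical parsed dates: two landed in the wrong century, one did not
parsed = pd.Series(pd.to_datetime(["2061-01-01", "2064-02-29", "1978-12-31"]))

# Keep the year where it is plausible, otherwise subtract 100
year = parsed.dt.year.where(parsed.dt.year <= 1989, parsed.dt.year - 100)

# Rebuild the dates from their components
fixed = pd.to_datetime(pd.DataFrame({
    "year": year,
    "month": parsed.dt.month,
    "day": parsed.dt.day,
}))
print(fixed)
```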


### Step 5. Set the right dates as the index. Pay attention to the data type; it should be datetime64[ns].

In [92]:
data.dtypes

Yr_Mo_Dy     object
RPT         float64
VAL         float64
ROS         float64
KIL         float64
SHA         float64
BIR         float64
DUB         float64
CLA         float64
MUL         float64
CLO         float64
BEL         float64
MAL         float64
dtype: object

In [93]:
# Convert the Python date objects back to pandas datetimes
data['Yr_Mo_Dy'] = pd.to_datetime(data['Yr_Mo_Dy'])

In [94]:
data.dtypes

Yr_Mo_Dy    datetime64[ns]
RPT                float64
VAL                float64
ROS                float64
KIL                float64
SHA                float64
BIR                float64
DUB                float64
CLA                float64
MUL                float64
CLO                float64
BEL                float64
MAL                float64
dtype: object

In [95]:
data.set_index('Yr_Mo_Dy', drop=True, inplace=True)

In [96]:
data.head()

Unnamed: 0_level_0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
Yr_Mo_Dy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1961-01-01,15.04,14.96,13.17,9.29,,9.87,13.67,10.25,10.83,12.58,18.5,15.04
1961-01-02,14.71,,10.83,6.5,12.62,7.67,11.5,10.04,9.79,9.67,17.54,13.83
1961-01-03,18.5,16.88,12.33,10.13,11.17,6.17,11.25,,8.5,7.67,12.75,12.71
1961-01-04,10.58,6.63,11.75,4.58,4.54,2.88,8.63,1.79,5.83,5.88,5.46,10.88
1961-01-05,13.33,13.25,11.42,6.17,10.71,8.21,11.92,6.54,10.92,10.34,12.92,11.83


### Step 6. Compute how many values are missing for each location over the entire record.
They should be ignored in all calculations below.

In [97]:
data.isnull().head()

Unnamed: 0_level_0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
Yr_Mo_Dy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1961-01-01,False,False,False,False,True,False,False,False,False,False,False,False
1961-01-02,False,True,False,False,False,False,False,False,False,False,False,False
1961-01-03,False,False,False,False,False,False,False,True,False,False,False,False
1961-01-04,False,False,False,False,False,False,False,False,False,False,False,False
1961-01-05,False,False,False,False,False,False,False,False,False,False,False,False


In [98]:
data.isnull().sum().sum()

31

In [99]:
data.isnull().sum()

RPT    6
VAL    3
ROS    2
KIL    5
SHA    2
BIR    0
DUB    3
CLA    2
MUL    3
CLO    1
BEL    0
MAL    4
dtype: int64

In [100]:
data.notnull().sum()

RPT    6568
VAL    6571
ROS    6572
KIL    6569
SHA    6572
BIR    6574
DUB    6571
CLA    6572
MUL    6571
CLO    6573
BEL    6574
MAL    6570
dtype: int64

### Step 7. Compute how many non-missing values there are in total.

In [101]:
data.count()

RPT    6568
VAL    6571
ROS    6572
KIL    6569
SHA    6572
BIR    6574
DUB    6571
CLA    6572
MUL    6571
CLO    6573
BEL    6574
MAL    6570
dtype: int64

In [102]:
len(data.index)

6574

In [103]:
data.notnull().describe()

Unnamed: 0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
count,6574,6574,6574,6574,6574,6574,6574,6574,6574,6574,6574,6574
unique,2,2,2,2,2,1,2,2,2,2,1,2
top,True,True,True,True,True,True,True,True,True,True,True,True
freq,6568,6571,6572,6569,6572,6574,6571,6572,6571,6573,6574,6570
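
The cells above give per-location counts. To get the single total that the step asks for, the per-column counts can be summed once more. A sketch on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical three-day, three-location excerpt with two gaps
toy_wind = pd.DataFrame({
    "RPT": [15.04, 14.71, 18.50],
    "VAL": [14.96, np.nan, 16.88],
    "SHA": [np.nan, 12.62, 11.17],
})

# count() gives non-missing values per column; sum() collapses them to one number
total_non_missing = int(toy_wind.count().sum())

# Cross-check: all cells minus the missing ones
total_cells = toy_wind.shape[0] * toy_wind.shape[1]
total_missing = int(toy_wind.isnull().sum().sum())

print(total_non_missing, total_cells - total_missing)
```

Applied to `data`, the counts shown above imply 6574 × 12 − 31 = 78857 non-missing values.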


### Step 8. Calculate the mean windspeed over all the locations and all the times.

In [104]:
# Incorrect attempt: this subtracts each column's NaN count from every windspeed before averaging
(data - data.isnull().sum()).mean().sum() / len(data.columns)

7.644649027503598

In [105]:
# Correct: sum of all non-missing values divided by the number of non-missing values
data.sum().sum() / data.notnull().sum().sum()

10.227883764282167
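
Dividing the sum of all observed values by the number of observed values is what the question asks for. Averaging the per-location means is not the same thing once locations have different numbers of missing values. A sketch with hypothetical numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where column B is mostly missing
toy_wind = pd.DataFrame({
    "A": [10.0, 20.0, 30.0],
    "B": [40.0, np.nan, np.nan],
})

# Mean over every observed value, ignoring NaN
overall = toy_wind.sum().sum() / toy_wind.count().sum()

# Same result via NumPy's NaN-aware mean on the underlying array
overall_np = np.nanmean(toy_wind.to_numpy())

# Mean of the column means weights each column equally instead
mean_of_means = toy_wind.mean().mean()

print(overall, overall_np, mean_of_means)
```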

### Step 9. Create a DataFrame called loc_stats and calculate the min, max and mean windspeeds and standard deviations of the windspeeds at each location over all the days.
There should be a different set of numbers for each location.

In [106]:
data.describe()

Unnamed: 0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
count,6568.0,6571.0,6572.0,6569.0,6572.0,6574.0,6571.0,6572.0,6571.0,6573.0,6574.0,6570.0
mean,12.362987,10.644314,11.660526,6.306468,10.455834,7.092254,9.797343,8.495053,8.49359,8.707332,13.121007,15.599079
std,5.618413,5.267356,5.00845,3.605811,4.936125,3.968683,4.977555,4.499449,4.166872,4.503954,5.835037,6.699794
min,0.67,0.21,1.5,0.0,0.13,0.0,0.0,0.0,0.0,0.04,0.13,0.67
25%,8.12,6.67,8.0,3.58,6.75,4.0,6.0,5.09,5.37,5.33,8.71,10.71
50%,11.71,10.17,10.92,5.75,9.96,6.83,9.21,8.08,8.17,8.29,12.5,15.0
75%,15.92,14.04,14.67,8.42,13.54,9.67,12.96,11.42,11.19,11.63,16.88,19.83
max,35.8,33.37,33.84,28.46,37.54,26.16,30.37,31.08,25.88,28.21,42.38,42.54


In [107]:
loc_stats = data.agg(['min', 'max', 'mean', 'std'])

In [108]:
loc_stats

Unnamed: 0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
min,0.67,0.21,1.5,0.0,0.13,0.0,0.0,0.0,0.0,0.04,0.13,0.67
max,35.8,33.37,33.84,28.46,37.54,26.16,30.37,31.08,25.88,28.21,42.38,42.54
mean,12.362987,10.644314,11.660526,6.306468,10.455834,7.092254,9.797343,8.495053,8.49359,8.707332,13.121007,15.599079
std,5.618413,5.267356,5.00845,3.605811,4.936125,3.968683,4.977555,4.499449,4.166872,4.503954,5.835037,6.699794


### Step 10. Create a DataFrame called day_stats and calculate the min, max and mean windspeed and standard deviations of the windspeeds across all the locations for each day.
There should be a different set of numbers for each day.

In [109]:
# Attempt with agg along axis=1; it raised a TypeError in the pandas version used here
day_stats = data.agg(['min', 'max', 'mean', 'std'], axis = 1)

TypeError: ("'list' object is not callable", 'occurred at index 1961-01-01 00:00:00')

In [123]:
# Fall back to an empty DataFrame and fill in one column per statistic
day_stats = pd.DataFrame()
In [125]:
day_stats['min'] = data.min(axis = 1) # min
day_stats['max'] = data.max(axis = 1) # max 
day_stats['mean'] = data.mean(axis = 1) # mean
day_stats['std'] = data.std(axis = 1) # standard deviations

In [137]:
day_stats.head()

Unnamed: 0_level_0,min,max,mean,std
Yr_Mo_Dy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1961-01-01,9.29,18.5,13.018182,2.808875
1961-01-02,6.5,17.54,11.336364,3.188994
1961-01-03,6.17,18.5,11.641818,3.681912
1961-01-04,1.79,11.75,6.619167,3.198126
1961-01-05,6.17,13.33,10.63,2.445356
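
In more recent pandas releases the one-call form with `axis=1` appears to work, so the four assignments can be collapsed. A hedged sketch on a hypothetical frame; if `agg` with `axis=1` raises in your version, the column-by-column approach remains the fallback:

```python
import numpy as np
import pandas as pd

# Hypothetical two-day, three-location excerpt with one gap
toy_wind = pd.DataFrame(
    {"RPT": [15.04, 14.71], "VAL": [14.96, np.nan], "ROS": [13.17, 10.83]},
    index=pd.to_datetime(["1961-01-01", "1961-01-02"]),
)

# One row per day, one column per statistic; NaN values are skipped
day_stats_toy = toy_wind.agg(["min", "max", "mean", "std"], axis=1)
print(day_stats_toy)
```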


### Step 11. Find the average windspeed in January for each location.
Treat January 1961 and January 1962 both as January.

In [134]:
data[data.index.month == 1].mean()

RPT    14.847325
VAL    12.914560
ROS    13.299624
KIL     7.199498
SHA    11.667734
BIR     8.054839
DUB    11.819355
CLA     9.512047
MUL     9.543208
CLO    10.053566
BEL    14.550520
MAL    18.028763
dtype: float64

In [135]:
data.mean()

RPT    12.362987
VAL    10.644314
ROS    11.660526
KIL     6.306468
SHA    10.455834
BIR     7.092254
DUB     9.797343
CLA     8.495053
MUL     8.493590
CLO     8.707332
BEL    13.121007
MAL    15.599079
dtype: float64

### Step 12. Downsample the record to a yearly frequency for each location.

In [144]:
data.groupby(data.index.year).mean()

Unnamed: 0_level_0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
Yr_Mo_Dy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1961,12.299583,10.351796,11.362369,6.958227,10.881763,7.729726,9.733923,8.858788,8.647652,9.835577,13.502795,13.680773
1962,12.246923,10.110438,11.732712,6.96044,10.657918,7.393068,11.020712,8.793753,8.316822,9.676247,12.930685,14.323956
1963,12.813452,10.836986,12.541151,7.330055,11.72411,8.434712,11.075699,10.336548,8.903589,10.224438,13.638877,14.999014
1964,12.363661,10.920164,12.104372,6.787787,11.454481,7.570874,10.259153,9.46735,7.789016,10.207951,13.740546,14.910301
1965,12.45137,11.075534,11.848767,6.858466,11.024795,7.47811,10.618712,8.879918,7.907425,9.918082,12.964247,15.591644
1966,13.461973,11.557205,12.02063,7.345726,11.805041,7.793671,10.579808,8.835096,8.514438,9.768959,14.265836,16.30726
1967,12.737151,10.990986,11.739397,7.143425,11.63074,7.368164,10.652027,9.325616,8.645014,9.547425,14.774548,17.135945
1968,11.835628,10.468197,11.409754,6.477678,10.760765,6.067322,8.85918,8.255519,7.224945,7.832978,12.808634,15.017486
1969,11.166356,9.723699,10.902,5.767973,9.873918,6.189973,8.564493,7.711397,7.924521,7.754384,12.621233,15.762904
1970,12.600329,10.726932,11.730247,6.217178,10.56737,7.609452,9.60989,8.33463,9.297616,8.289808,13.183644,16.456027


### Step 13. Downsample the record to a monthly frequency for each location.

In [145]:
# Note: this pools each month of the year across all years; it is not a monthly time series
data.groupby(data.index.month).mean()

Unnamed: 0_level_0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
Yr_Mo_Dy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1,14.847325,12.91456,13.299624,7.199498,11.667734,8.054839,11.819355,9.512047,9.543208,10.053566,14.55052,18.028763
2,13.710906,12.111122,12.879132,6.942411,11.551772,7.633858,11.206024,9.341437,9.313169,9.518051,13.728898,17.156142
3,13.158687,11.505842,12.648118,7.265907,11.554516,7.959409,11.310179,9.635896,9.700324,10.096953,13.810609,16.909317
4,12.555648,10.429759,12.204815,6.898037,10.677667,7.441389,10.221315,8.909056,8.93087,9.158019,12.664759,14.937611
5,11.724032,10.145619,11.550394,6.307487,10.224301,6.942061,8.797738,8.452903,8.040806,8.524857,12.767258,13.736039
6,10.451317,8.949704,10.361315,5.652278,9.529926,6.410093,8.009556,7.920796,7.639796,7.729185,12.246407,12.861818
7,9.992007,8.357778,9.349642,5.416935,9.302634,5.972348,7.843501,7.26276,7.54448,7.321416,11.676505,12.800789
8,10.213411,8.415143,9.993441,5.270681,8.901559,5.891057,7.772312,6.842025,7.240573,7.002783,11.11009,12.565943
9,11.458519,9.981002,10.756883,5.615176,9.766315,6.566222,8.609722,7.745677,7.610556,7.689278,12.686389,14.761963
10,12.66061,11.010681,11.453943,6.065215,10.550251,7.15991,9.387778,8.726308,8.347181,8.850376,14.155323,16.697151


In [148]:
data.groupby(data.index.to_period('M')).mean()

Unnamed: 0_level_0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
Yr_Mo_Dy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1961-01,14.841333,11.988333,13.431613,7.736774,11.072759,8.588065,11.184839,9.245333,9.085806,10.107419,13.880968,14.703226
1961-02,16.269286,14.975357,14.441481,9.230741,13.852143,10.937500,11.890714,11.846071,11.821429,12.714286,18.583214,15.411786
1961-03,10.890000,11.296452,10.752903,7.284000,10.509355,8.866774,9.644194,9.829677,10.294138,11.251935,16.410968,15.720000
1961-04,10.722667,9.427667,9.998000,5.830667,8.435000,6.495000,6.925333,7.094667,7.342333,7.237000,11.147333,10.278333
1961-05,9.860968,8.850000,10.818065,5.905333,9.490323,6.574839,7.604000,8.177097,8.039355,8.499355,11.900323,12.011613
1961-06,9.904138,8.520333,8.867000,6.083000,10.824000,6.707333,9.095667,8.849333,9.086667,9.940333,13.995000,14.553793
1961-07,10.614194,8.221613,9.110323,6.340968,10.532581,6.198387,8.353333,8.284194,8.077097,8.891613,11.092581,12.312903
1961-08,12.035000,10.133871,10.335806,6.845806,12.715161,8.441935,10.093871,10.460968,9.111613,10.544667,14.410000,14.345333
1961-09,12.531000,9.656897,10.776897,7.155517,11.003333,7.234000,8.206000,8.936552,7.728333,9.931333,13.718333,12.921667
1961-10,14.289667,10.915806,12.236452,8.154839,11.865484,8.333871,11.194194,9.271935,8.942667,11.455806,14.229355,16.793226
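
The two cells answer different questions: grouping by `index.month` pools every January together (at most 12 rows), while `to_period('M')` keeps one row per calendar month, which is what downsampling means. `resample` offers another spelling of the latter. A sketch on hypothetical daily data spanning two years:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering 1961 and 1962
idx = pd.date_range("1961-01-01", "1962-12-31", freq="D")
toy_wind = pd.DataFrame({"RPT": np.arange(len(idx), dtype=float)}, index=idx)

# Month-of-year average: January 1961 and January 1962 share a row
by_month_of_year = toy_wind.groupby(toy_wind.index.month).mean()

# True monthly downsampling: one row per calendar month
by_calendar_month = toy_wind.groupby(toy_wind.index.to_period("M")).mean()

# Equivalent downsampling with resample, labelled by month start
by_resample = toy_wind.resample("MS").mean()

print(len(by_month_of_year), len(by_calendar_month), len(by_resample))
```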


### Step 14. Downsample the record to a weekly frequency for each location.

In [149]:
data.groupby(data.index.to_period('W')).mean()

Unnamed: 0_level_0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
Yr_Mo_Dy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1960-12-26/1961-01-01,15.040000,14.960000,13.170000,9.290000,,9.870000,13.670000,10.250000,10.830000,12.580000,18.500000,15.040000
1961-01-02/1961-01-08,13.541429,11.486667,10.487143,6.417143,9.474286,6.435714,11.061429,6.616667,8.434286,8.497143,12.481429,13.238571
1961-01-09/1961-01-15,12.468571,8.967143,11.958571,4.630000,7.351429,5.072857,7.535714,6.820000,5.712857,7.571429,11.125714,11.024286
1961-01-16/1961-01-22,13.204286,9.862857,12.982857,6.328571,8.966667,7.417143,9.257143,7.875714,7.145714,8.124286,9.821429,11.434286
1961-01-23/1961-01-29,19.880000,16.141429,18.225714,12.720000,17.432857,14.828571,15.528571,15.160000,14.480000,15.640000,20.930000,22.530000
1961-01-30/1961-02-05,16.827143,15.460000,12.618571,8.247143,13.361429,9.107143,12.204286,8.548571,9.821429,9.460000,14.012857,11.935714
1961-02-06/1961-02-12,19.684286,16.417143,17.304286,10.774286,14.718571,12.522857,14.934286,14.850000,14.064286,14.440000,21.832857,19.155714
1961-02-13/1961-02-19,15.130000,15.091429,13.797143,10.083333,13.410000,11.868571,9.542857,12.128571,12.375714,13.542857,21.167143,16.584286
1961-02-20/1961-02-26,15.221429,13.625714,14.334286,8.524286,13.655714,10.114286,11.150000,10.875714,10.392857,12.730000,16.304286,14.322857
1961-02-27/1961-03-05,12.101429,12.951429,11.063333,7.834286,12.101429,9.238571,10.232857,11.130000,10.383333,12.370000,17.842857,13.951667


### Step 15. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds across all locations for each week (assume that the first week starts on January 2 1961) for the first 52 weeks.

In [150]:
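# index.week was removed in pandas 2.0; isocalendar().week replaces it. This pools each ISO week number across all years
data.groupby(data.index.isocalendar().week).mean()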

Unnamed: 0_level_0,RPT,VAL,ROS,KIL,SHA,BIR,DUB,CLA,MUL,CLO,BEL,MAL
Yr_Mo_Dy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1,13.92,11.71088,12.853016,6.617302,10.473175,7.578492,11.623651,9.1236,9.272222,9.870635,14.241746,18.249841
2,15.549921,13.359524,14.016349,7.307698,12.18381,8.170238,11.934365,9.50246,9.428095,9.798175,14.494603,17.623413
3,14.950159,13.281349,13.166825,7.060794,11.9592,8.320238,11.856905,9.696825,9.597619,10.169444,14.578254,17.98119
4,15.24192,13.575317,13.554524,7.698413,12.110079,8.235159,11.732857,9.854048,9.822302,10.352063,14.892222,18.337381
5,14.559127,12.853413,13.070794,7.400952,11.82373,8.165397,11.995714,9.682381,9.854286,10.196111,14.667381,17.702222
6,13.20373,11.508492,12.142619,6.469127,11.025476,7.166746,11.115317,8.894206,9.323413,9.298175,13.903254,17.921587
7,14.51746,12.681349,13.219524,7.14816,12.141032,7.90619,11.390238,9.749524,9.417778,9.605476,13.776667,17.130159
8,13.422857,11.822937,13.367381,7.1424,11.480635,7.729683,11.329048,9.329762,9.247063,9.541984,13.132302,16.88254
9,12.76136,11.83746,12.62744,6.62,10.937143,7.457778,10.622778,9.387778,8.86416,9.611508,13.724048,16.60208
10,13.52672,11.758095,13.151667,7.317063,11.477143,7.965397,10.751111,9.151825,9.367302,9.761429,13.055,16.047381


In [151]:
# Resample to weekly bins ('W' ends each week on Sunday) and compute the statistics per location
weekly = data.resample('W').agg(['min','max','mean','std'])

# Skip the partial first bin (1 January 1961) and keep the next 52 weeks for all locations
weekly.loc[weekly.index[1:53], "RPT":"MAL"].head(10)

Unnamed: 0_level_0,RPT,RPT,RPT,RPT,VAL,VAL,VAL,VAL,ROS,ROS,...,CLO,CLO,BEL,BEL,BEL,BEL,MAL,MAL,MAL,MAL
Unnamed: 0_level_1,min,max,mean,std,min,max,mean,std,min,max,...,mean,std,min,max,mean,std,min,max,mean,std
Yr_Mo_Dy,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
1961-01-08,10.58,18.5,13.541429,2.631321,6.63,16.88,11.486667,3.949525,7.62,12.33,...,8.497143,1.704941,5.46,17.54,12.481429,4.349139,10.88,16.46,13.238571,1.773062
1961-01-15,9.04,19.75,12.468571,3.555392,3.54,12.08,8.967143,3.148945,7.08,19.5,...,7.571429,4.084293,5.25,20.71,11.125714,5.552215,5.17,16.92,11.024286,4.692355
1961-01-22,4.92,19.83,13.204286,5.337402,3.42,14.37,9.862857,3.837785,7.29,20.79,...,8.124286,4.783952,6.5,15.92,9.821429,3.626584,6.79,17.96,11.434286,4.237239
1961-01-29,13.62,25.04,19.88,4.619061,9.96,23.91,16.141429,5.170224,12.67,25.84,...,15.64,3.713368,14.04,27.71,20.93,5.210726,17.5,27.63,22.53,3.874721
1961-02-05,10.58,24.21,16.827143,5.251408,9.46,24.21,15.46,5.187395,9.04,19.7,...,9.46,2.839501,9.17,19.33,14.012857,4.210858,7.17,19.25,11.935714,4.336104
1961-02-12,16.0,24.54,19.684286,3.587677,11.54,21.42,16.417143,3.608373,13.67,21.34,...,14.44,1.746749,15.21,26.38,21.832857,4.063753,17.04,21.84,19.155714,1.828705
1961-02-19,6.04,22.5,15.13,5.064609,11.63,20.17,15.091429,3.575012,6.13,19.41,...,13.542857,2.531361,14.09,29.63,21.167143,5.910938,10.96,22.58,16.584286,4.685377
1961-02-26,7.79,25.8,15.221429,7.020716,7.08,21.5,13.625714,5.147348,6.08,22.42,...,12.73,4.920064,9.59,23.21,16.304286,5.091162,6.67,23.87,14.322857,6.182283
1961-03-05,10.96,13.33,12.101429,0.997721,8.83,17.0,12.951429,2.851955,8.17,13.67,...,12.37,1.593685,11.58,23.45,17.842857,4.332331,8.83,17.54,13.951667,3.021387
1961-03-12,4.88,14.79,9.376667,3.732263,8.08,16.96,11.578571,3.230167,7.54,16.38,...,10.458571,3.655113,10.21,22.71,16.701429,4.358759,5.54,22.54,14.42,5.76989
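
The table above holds statistics per location and week. If the step is read as asking for one set of numbers per week across all locations, the values can first be reshaped to long form so that every reading in a week falls into the same group. A sketch under that reading, on hypothetical data; `to_period('W')` uses Monday-to-Sunday weeks, and 2 January 1961 was a Monday:

```python
import numpy as np
import pandas as pd

# Hypothetical two weeks of daily readings at two locations
idx = pd.date_range("1961-01-02", periods=14, freq="D", name="Yr_Mo_Dy")
toy_wind = pd.DataFrame(
    {"RPT": np.arange(14, dtype=float), "VAL": np.arange(14, dtype=float) + 100},
    index=idx,
)
toy_wind.iloc[0, 1] = np.nan  # one missing reading, skipped by the aggregations

# Long form: one row per (date, location) reading
long = toy_wind.reset_index().melt(
    id_vars="Yr_Mo_Dy", var_name="location", value_name="speed"
)

# Group every reading by the week it falls in and summarise the first 52 weeks
weekly_all = (
    long.groupby(long["Yr_Mo_Dy"].dt.to_period("W"))["speed"]
    .agg(["min", "max", "mean", "std"])
    .head(52)
)
print(weekly_all)
```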
