<img src="images/Project_logos.png" width="500" height="300" align="center">

## Operations

In [None]:
import numpy as np
import pandas as pd

# Set up an example DataFrame

index_array = np.array(['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5', 'Row 6'])
column_list = ('A', 'B', 'C', 'D')
df = pd.DataFrame(np.random.randn(6, 4), index=index_array, columns=column_list)

df.loc['Row 1', 'D'] = np.nan
df.loc['Row 4', 'B'] = np.nan

### Statistics
You can calculate statistics along rows or columns. By default, NaN values are skipped in these calculations.

In [None]:
# Calculate the mean for each column
df.mean()

In [None]:
# Calculate the mean for each row
df.mean(axis=1)

Missing values are excluded from these calculations, so the means for `Row 1` and `Row 4` are each computed from the three remaining values.
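
As a minimal sketch of how missing values are handled, the cell below uses a small hypothetical DataFrame (`demo`, which is not part of the example above) together with the `skipna` keyword of `mean`. It assumes the default `skipna=True`; passing `skipna=False` should make any NaN propagate into the result.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value in column B
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, np.nan, 8.0]})

# Default behaviour: the NaN is skipped, so B is averaged over two values
mean_skip = demo.mean()

# skipna=False: any column containing a NaN gives NaN
mean_keep = demo.mean(skipna=False)

print(mean_skip)
print(mean_keep)
```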

### Exercise
Calculate the sum of each row.

In [None]:
# space to complete the exercise

### Applying functions
You can apply your own functions to individual values or to whole columns.

In [None]:
# Add 3 to every value in the DataFrame; the result has the same shape as df

df.transform(lambda x: x + 3)

In [None]:
# Apply function to each column (x) in the DataFrame

df.agg(lambda x: np.std(x) / 2.0)
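
To illustrate passing whole columns or whole rows to a function, here is a small sketch on a hypothetical DataFrame (`demo`). It uses `DataFrame.apply`, which is not used in the cells above: with the default `axis=0` each column is passed to the function, and with `axis=1` each row is passed instead.

```python
import pandas as pd

# Hypothetical example data
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 40.0]})

# Column-wise: the function receives each column as a Series
col_range = demo.apply(lambda col: col.max() - col.min())

# Row-wise: axis=1 passes each row as a Series
row_range = demo.apply(lambda row: row.max() - row.min(), axis=1)

print(col_range)
print(row_range)
```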

### Exercise
Multiply all values in `df` by 10.

In [None]:
# space to complete the exercise

### Rolling window operations
It is possible to perform operations on rolling windows of data, e.g. calculating moving averages. More detailed information can be found at https://pandas.pydata.org/docs/user_guide/window.html#rolling-window.

In [None]:
# Setting up example DataFrame

df2 = pd.DataFrame(
    {"Temperature at location A": np.array([30.0, 29.6, 28.0, 31.2, 32.1, 27.4, 27.3, 27.8, 29.5, 29.8]),
     "Temperature at location B": np.array([23.1, 22.4, 19.6, 20.5, 23.3, 23.2, 19.9, 20.2, 21.1, 20.8]),
     "Rainfall at location A": np.array([5.0, 6.0, 1.0, 0.0, 0.6, 7.0, 4.3, 1.3, 11.5, 1.9]),
     "Rainfall at location B": np.array([0.0, 0.0, 8.0, 11.2, 3.5, 0.0, 7.3, 2.8, 0.0, 0.2])
    }
)

The keyword argument `window` specifies the size of the window. The following example calculates the rolling sum for a window size of 2. By default, each result is assigned to the index at the end of its window, so the first value is NaN because no complete window ends there.

In [None]:
df2['Rainfall at location A'].rolling(window=2).sum()
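
As a worked sketch of the end-of-window labelling, the cell below applies a window of 2 to a short hypothetical Series (`rain`). The `min_periods` keyword is not used in the example above; it is shown here on the assumption that you may want a result from a partially filled window.

```python
import pandas as pd

# Hypothetical rainfall values
rain = pd.Series([5.0, 6.0, 1.0, 0.0])

# Each result sums the current value and the one before it.
# Index 0 has no complete window, so it is NaN; then 11.0, 7.0, 1.0
rolling_sum = rain.rolling(window=2).sum()

# min_periods=1 allows a result as soon as one value is in the window:
# 5.0, 11.0, 7.0, 1.0
rolling_sum_partial = rain.rolling(window=2, min_periods=1).sum()

print(rolling_sum)
print(rolling_sum_partial)
```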

The resulting value can instead be assigned to the index at the centre of the window by setting `center=True`. The following example calculates the rolling mean for a window size of 3.

In [None]:
df2['Temperature at location B'].rolling(window=3, center=True).mean()
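
To make the effect of `center` visible, the following sketch places the trailing and centred rolling means of a short hypothetical Series (`temps`) side by side. The window means are the same; only the index each one is assigned to changes.

```python
import pandas as pd

# Hypothetical temperature values
temps = pd.Series([23.1, 22.4, 19.6, 20.5, 23.3])

# Default: result labelled at the end of the window (first two are NaN)
trailing = temps.rolling(window=3).mean()

# center=True: result labelled at the middle of the window
# (first and last are NaN)
centred = temps.rolling(window=3, center=True).mean()

comparison = pd.DataFrame({"trailing": trailing, "centred": centred})
print(comparison)
```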

### Correlation
You can compute a matrix of pairwise correlations between a `DataFrame`'s columns using the `corr()` method (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html). The `method` argument selects the correlation method: `'pearson'` (the default), `'kendall'` or `'spearman'`.

In [None]:
df2.corr()
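
As a brief sketch of choosing a different correlation method, the cell below compares Pearson and Spearman correlation on a hypothetical DataFrame (`demo`) in which `y` increases with `x` but not linearly. The rank-based Spearman coefficient should be 1, while the Pearson coefficient should be slightly below 1.

```python
import pandas as pd

# Hypothetical data: y = x squared, so monotonic but not linear
demo = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                     "y": [1.0, 4.0, 9.0, 16.0, 25.0]})

# Default method is "pearson" (linear correlation)
pearson = demo.corr()

# Rank-based correlation
spearman = demo.corr(method="spearman")

print(pearson)
print(spearman)
```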

You can also calculate the correlation between two `Series`.

In [None]:
series1 = df2['Temperature at location B']
series2 = df2['Rainfall at location B']

series1.corr(series2)


### Joining DataFrames and Series

You can stack DataFrames on top of each other, extending the number of rows, using ``concat``.

In [None]:
df1 = pd.DataFrame(np.random.randn(3, 5))
df2 = pd.DataFrame(np.random.randn(4, 5))
df3 = pd.DataFrame(np.random.randn(2, 5))

df4 = pd.concat([df1, df2, df3])

df4

You can place DataFrames side by side, extending the number of columns, by calling ``concat`` with `axis=1`.

In [None]:
df1 = pd.DataFrame(np.random.randn(5, 2))
df2 = pd.DataFrame(np.random.randn(5, 3))
df3 = pd.DataFrame(np.random.randn(5, 4))

df4 = pd.concat([df1, df2, df3], axis=1)

df4

You will notice that the resulting DataFrame keeps the column labels from the original DataFrames, so labels such as 0 and 1 are repeated. Setting ``ignore_index=True`` replaces the labels along the concatenation axis (here the columns) with 0, 1, ..., n - 1.

In [None]:
df5 = pd.concat([df1, df2, df3], axis=1, ignore_index=True)

df5
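
The same keyword applies when stacking rows. The sketch below, which uses two small hypothetical DataFrames (`top` and `bottom`), shows the row index before and after setting `ignore_index=True`.

```python
import pandas as pd

# Hypothetical DataFrames, each with a default integer index
top = pd.DataFrame({"a": [1, 2]})
bottom = pd.DataFrame({"a": [3, 4, 5]})

# Original row labels are kept: 0, 1, 0, 1, 2
kept = pd.concat([top, bottom])

# Row labels are replaced with 0, 1, 2, 3, 4
renumbered = pd.concat([top, bottom], ignore_index=True)

print(kept)
print(renumbered)
```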

In [None]:
# Set up two example DataFrames that share the "Station ID" and "Time" columns
from datetime import time as dtt

temp_df = pd.DataFrame(
    {"Station ID": ['1', '1', '1', '2', '2', '3', '4', '4', '4', '4'],
     "Temperature": [10, 14, 16, 15, 21, 17, 9, 11, 15, 7],
     "Time": [dtt(3), dtt(6), dtt(9), dtt(6), dtt(12), dtt(9), dtt(3), dtt(6), dtt(9), dtt(12)]
    }
)

precip_df = pd.DataFrame(
    {"Station ID": ['1', '1', '2', '2', '3', '4', '4'],
     "Precipitation": [0, 25, 0, 0, 36, 50, 0],
     "Time": [dtt(3), dtt(6), dtt(6), dtt(9), dtt(6), dtt(3), dtt(9)]
    }
)

We can combine DataFrames using `merge`.

The default is an inner join on the columns that the two DataFrames have in common (here `Station ID` and `Time`). Only rows whose values match in both DataFrames are kept; rows without a match are discarded.

In [None]:
temp_df.merge(precip_df)
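
As a minimal sketch of an inner join, the cell below merges two small hypothetical DataFrames (`left` and `right`) and names the join column explicitly with `on`. Only the keys present in both DataFrames should appear in the result.

```python
import pandas as pd

# Hypothetical DataFrames sharing a "key" column
left = pd.DataFrame({"key": ["a", "b", "c"], "temp": [10, 14, 16]})
right = pd.DataFrame({"key": ["b", "c", "d"], "precip": [0, 25, 36]})

# how="inner" is the default; it is spelled out here for clarity
inner = left.merge(right, on="key", how="inner")

# Keys "a" and "d" have no partner, so only "b" and "c" remain
print(inner)
```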

We can instead merge using an outer join, which aligns the rows that match and also keeps the unmatched rows from both DataFrames, filling the columns that have no data with NaN.

In [None]:
temp_df.merge(precip_df, how="outer", on=["Station ID", "Time"])
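
To see where each row of an outer join came from, the sketch below uses the `indicator` keyword of `merge` on two small hypothetical DataFrames (`left` and `right`). This keyword is not used in the example above; it adds a `_merge` column with the values `left_only`, `right_only` or `both`.

```python
import pandas as pd

# Hypothetical DataFrames sharing a "key" column
left = pd.DataFrame({"key": ["a", "b", "c"], "temp": [10, 14, 16]})
right = pd.DataFrame({"key": ["b", "c", "d"], "precip": [0, 25, 36]})

# Keep every key from both sides and record where each row came from
outer = left.merge(right, on="key", how="outer", indicator=True)

# "a" has no precip value and "d" has no temp value, so those cells are NaN
print(outer)
```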