# Our Goal: An App for Smoothing Land-Ocean Temperature Data

For this tutorial, we will develop a small web app that provides a graphical user interface (GUI) for exploring a dataset of global land-ocean temperature data. The data is stored in a CSV file.

We will build the web app using a Jupyter notebook. The workflow will:
- **load a data file** with global temperature data,
- **smooth the data with several algorithms**,
- let you **select a particular range of years** in the GUI, and
- **create a new data file** containing the smoothed data for the desired time range.

Before building the web app, let's explore the data and the smoothing algorithms we will use.

In [None]:
# Load the required libraries
import pandas as pd
import os

## Load data from file

In [None]:
# Load data into memory from file
DATA_DIR = 'data'
DATA_FILE = 'land-ocean-temp-index.csv'

df = pd.read_csv(os.path.join(DATA_DIR, DATA_FILE), escapechar='#')
df

## Plot original data

In [None]:
from matplotlib import pyplot as plt

plt.xlabel('Year')
plt.ylabel('Temperature')
plt.title('Global Temperature versus Time')
plt.plot(df['Year'], df['Temperature'], label='Raw Data')
plt.show()

## Add column for Savitzky-Golay filter

Let's apply the [Savitzky-Golay filter](https://en.wikipedia.org/wiki/Savitzky–Golay_filter) and add a new column to the data frame with the smoothed data. This filter is a type of low-pass filter that smooths noisy data by fitting a low-order polynomial to each window of neighboring points. The "moving average" filter common in financial data analysis is a special case: a Savitzky-Golay filter with polynomial order 1 produces a centered moving average, apart from the points near the edges of the series, which the filter handles separately.
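
The relationship between the two filters can be checked numerically. The following is a minimal sketch on synthetic random data (not the temperature dataset); the variable names are illustrative. It compares `savgol_filter` with polynomial order 1 against a plain centered moving average computed with `np.convolve`, looking only at interior points where a full window is available.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy signal, used only for illustration
rng = np.random.default_rng(0)
signal = rng.normal(size=50)

window = 7
half = window // 2

# Savitzky-Golay filter with a first-order (linear) polynomial
sg_linear = savgol_filter(signal, window, 1)

# Centered moving average; 'valid' keeps only points with a full window
moving_avg = np.convolve(signal, np.ones(window) / window, mode='valid')

# Compare the interior points, skipping `half` samples at each edge
interior_matches = np.allclose(sg_linear[half:-half], moving_avg)
print(interior_matches)
```

The comparison skips the first and last `half` samples because `savgol_filter` fills those using a polynomial fit to the edge window (its default `mode='interp'`), which a plain moving average does not do.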

In [None]:
from scipy.signal import savgol_filter

# Set the window size and polynomial order for the Savitzky-Golay filter
# (the polynomial order must be smaller than the window size)
window_size = 7
poly_order = 5

moving_avg_col = f'Moving_Average_{window_size}'
SG_col = f'Savitzky_Golay{poly_order}_{window_size}'

# A polynomial order of 1 reproduces a centered moving average (away from the edges)
df[moving_avg_col] = savgol_filter(df['Temperature'], window_size, 1)
# Apply Savitzky-Golay smoothing with the chosen polynomial order
df[SG_col] = savgol_filter(df['Temperature'], window_size, poly_order)
df
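
To get a feel for how the polynomial order affects the amount of smoothing, one option is to inspect the filter weights directly. The sketch below uses `scipy.signal.savgol_coeffs` and, assuming the noise in the data is uncorrelated from point to point, treats the sum of the squared weights as the factor by which the filter scales the noise variance. The `noise_gain` dictionary is an illustrative name, not part of the tutorial's app.

```python
from scipy.signal import savgol_coeffs

window_size = 7
noise_gain = {}

for order in (1, 3, 5):
    # Convolution weights of the Savitzky-Golay filter for this order
    coeffs = savgol_coeffs(window_size, order)
    # Sum of squared weights: scaling of the variance of uncorrelated noise
    noise_gain[order] = float((coeffs ** 2).sum())
    print(order, round(noise_gain[order], 3))
```

A smaller value means stronger smoothing. With a window of 7, the factor grows with the polynomial order, so an order of 5 follows the raw data much more closely than the moving average does.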

In [None]:
plt.xlabel('Year')
plt.ylabel('Temperature')
plt.title('Global Temperature versus Time')
plt.plot(df['Year'], df['Temperature'], label='Raw Data')
plt.plot(df['Year'], df[SG_col], label='Savitzky-Golay Filtered')
plt.plot(df['Year'], df[moving_avg_col], label='Moving Average Filtered')
plt.legend()
plt.show()

## Select a range of data

In [None]:
# Create a new pandas DataFrame containing only the selected range of years
from_year = 1920
to_year = 1980
subset_df = df[(df['Year'] >= from_year) & (df['Year'] <= to_year)]
subset_df
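
As an aside, the same inclusive range selection can be written with `Series.between`, which some readers find easier to scan. This is a small sketch on a synthetic `demo_df` (a stand-in for the real data frame), showing that both forms select the same rows.

```python
import pandas as pd

# Synthetic stand-in for the real data frame
demo_df = pd.DataFrame({'Year': range(1900, 2000)})
from_year, to_year = 1920, 1980

# Explicit comparison, as used in the cell above
mask_explicit = (demo_df['Year'] >= from_year) & (demo_df['Year'] <= to_year)

# Equivalent form; `between` includes both endpoints by default
mask_between = demo_df['Year'].between(from_year, to_year)

same_rows = mask_explicit.equals(mask_between)
subset = demo_df[mask_between]
print(same_rows, len(subset))
```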

## Plot selected data with smoothed curve

In [None]:
plt.xlabel('Year')
plt.ylabel('Temperature')
plt.title('Global Temperature versus Time')
plt.plot(subset_df['Year'], subset_df['Temperature'], label='Raw Data')
plt.plot(subset_df['Year'], subset_df[SG_col], label='Savitzky-Golay Filtered')
plt.plot(subset_df['Year'], subset_df[moving_avg_col], label='Moving Average Filtered')
plt.legend()
plt.show()

## Save selected data to file

In [None]:
# Save the subset DataFrame to a new CSV file
OUTPUT_FILE = 'output.csv'

subset_df.to_csv(os.path.join(DATA_DIR, OUTPUT_FILE), index=False)
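
When moving from this exploration to the GUI, it may help to collect the smoothing and selection steps into a single function that a widget callback can call. The sketch below is one possible arrangement, not the app's actual code: `smooth_and_select` is a hypothetical helper, and `demo_df` is synthetic data standing in for the CSV file. The column names follow the ones used above.

```python
import pandas as pd
from scipy.signal import savgol_filter


def smooth_and_select(df, from_year, to_year, window_size=7, poly_order=5):
    """Hypothetical helper: add smoothed columns, then keep the chosen years."""
    result = df.copy()  # leave the caller's data frame untouched
    # Smooth the full series first so the subset edges still use neighboring years
    result[f'Moving_Average_{window_size}'] = savgol_filter(
        result['Temperature'], window_size, 1)
    result[f'Savitzky_Golay{poly_order}_{window_size}'] = savgol_filter(
        result['Temperature'], window_size, poly_order)
    in_range = (result['Year'] >= from_year) & (result['Year'] <= to_year)
    return result[in_range]


# Synthetic stand-in for the real data file: a steady linear trend
demo_df = pd.DataFrame({
    'Year': range(1900, 2000),
    'Temperature': [0.01 * i for i in range(100)],
})
demo_subset = smooth_and_select(demo_df, 1920, 1980)
print(demo_subset.shape)
```

The returned data frame can then be written out with `to_csv(..., index=False)` exactly as in the cell above.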