# "Taps aff"
"Taps aff" is a Scottish expression that literally means "tops off". It refers to the act of removing one's shirt, typically by men, in warm weather. This phrase is commonly used in Scotland, particularly in Glasgow, to describe good weather or good times being had. The expression is often used humorously, as it's a phenomenon rarely seen in Glasgow due to its typically cool climate. When someone declares "taps aff," it usually indicates that the weather is unusually warm or that a celebratory atmosphere is present.


Objective
*   Develop a deep learning system that predicts whether a day in Glasgow is a "taps aff" day based on weather data.
*   Glasgow weather dataset: https://drive.google.com/file/d/16O9Zoo8npYXQqniAB7K2K40Tlkr3ozSQ/view?usp=sharing (credit: https://open-meteo.com/)
*   Taps aff dataset: https://drive.google.com/file/d/1XVNe0XmS-_-umhNwQUVMi3xE04nGKx1R/view?usp=sharing


Top Tips

1.   Implement a regression deep learning model to estimate missing Glasgow weather parameters. Train the regression model on complete data points, then use the trained model to fill in missing values.
2.   Feature engineer the date into day and month to help predict these missing values.
3.   Merge the two datasets by changing dates in the taps aff dataset into the same format as the weather dataset.
4.   Implement a binary classification (use softmax and sparse_categorical_crossentropy) deep learning model to predict the taps aff days from 2023 to 2025.
5.   Use test sets on both the regression and classification to evaluate their performance.


## Load and explore Glasgow weather data


In [None]:
# Read in the Glasgow weather data and explore it
import pandas as pd
df_weather = pd.read_csv('/content/sample_data/glasgow_weather.csv')

In [None]:
df_weather.info()

In [None]:
df_weather.head()

In [None]:
df_weather.tail()

In [None]:
# Look at missing daylight_duration
import matplotlib.pyplot as plt
plt.plot(df_weather['daylight_duration'])

In [None]:
# Create two new features to help make a model to replace missing daylight_duration
# Feature engineering: extract *day* and *month* from the date string
# Why: daylight duration is strongly seasonal, so month/day help the regression model learn that pattern
df_weather['day'] = df_weather['date'].apply(lambda x: int(x[8:]))
df_weather['month'] = df_weather['date'].apply(lambda x: int(x[5:7]))
df_weather.head()
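
As a possible alternative to string slicing, the same two features can be read from a parsed datetime column. The sketch below assumes the dates are ISO-formatted strings (`YYYY-MM-DD`), which is what the slicing above implies. The toy frame `df_demo` and the sine/cosine columns are illustrative additions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df_weather (assumed ISO date strings)
df_demo = pd.DataFrame({'date': ['2019-01-05', '2019-06-21', '2019-12-31']})

# Parse once, then read day and month from the datetime accessor
parsed = pd.to_datetime(df_demo['date'], format='%Y-%m-%d')
df_demo['day'] = parsed.dt.day
df_demo['month'] = parsed.dt.month

# Optional cyclical encoding so that December and January end up close together
df_demo['month_sin'] = np.sin(2 * np.pi * df_demo['month'] / 12)
df_demo['month_cos'] = np.cos(2 * np.pi * df_demo['month'] / 12)

print(df_demo)
```

If extra columns such as `month_sin` and `month_cos` were added to the real dataframe, the input shape of the models would have to change to match the new number of features.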

In [None]:
# Split the dataset into:
# 1) rows WITH missing daylight_duration (used later for prediction)
# 2) rows with KNOWN daylight_duration (used to train the regression model)

# Create two new dataframes: the first 730 days (missing daylight_duration) and the remaining days (known values)
df_weather_with_nan = df_weather.iloc[:730]
df_weather_without_nan = df_weather.iloc[730:]

# Alternatively and more elegantly
df_weather_with_nan = df_weather[df_weather['daylight_duration'].isna()]
df_weather_without_nan = df_weather[df_weather['daylight_duration'].notna()]

In [None]:
# Check that worked
df_weather_with_nan.tail()

In [None]:
# Check that worked
df_weather_without_nan.head()

In [None]:
# Create training arrays for regression
# X = input features (all columns except the target and raw date string)
# y = target value to predict (daylight_duration)
x = df_weather_without_nan.drop(['daylight_duration','date'], axis=1).to_numpy()
y = df_weather_without_nan[['daylight_duration']].to_numpy()
print(x.shape)
print(y.shape)

In [None]:
# Training and testing sets
boundary = int(x.shape[0] * 0.8)
x_train = x[:boundary]
y_train = y[:boundary]
x_test = x[boundary:]
y_test = y[boundary:]
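
The split above is chronological: the first 80% of rows train the model and the last 20% test it. If the rows are in date order, the test set may cover only part of the seasonal cycle. One hedged alternative is to shuffle the row indices with a fixed seed before splitting. `shuffled_split` is a hypothetical helper and the arrays are toy data.

```python
import numpy as np

def shuffled_split(x, y, train_fraction=0.8, seed=0):
    """Hypothetical helper: shuffle rows with a fixed seed, then split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(x.shape[0])
    boundary = int(x.shape[0] * train_fraction)
    train_idx, test_idx = order[:boundary], order[boundary:]
    # The same indices are used for x and y so rows stay paired
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Toy arrays standing in for the real features and targets
x_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float).reshape(10, 1)

x_tr, y_tr, x_te, y_te = shuffled_split(x_demo, y_demo)
print(x_tr.shape, x_te.shape)
```

Whether shuffling is appropriate depends on the goal: a chronological split is the stricter test when the model is meant to predict future dates.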

In [None]:
# Standardise inputs using TRAINING statistics only (prevents data leakage)
# Standardisation helps neural networks train more smoothly when features are on similar scales
means = x_train.mean(axis=0)
stds = x_train.std(axis=0)
x_train = (x_train - means) / stds

# And test set with values from training set
x_test = (x_test - means) / stds
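
The standardisation step can be wrapped in two small functions so the same training statistics are reused for the test set and for inference. This is a minimal sketch: `fit_standardiser` and `apply_standardiser` are hypothetical names, and the guard against zero standard deviation is an added precaution for constant columns, not something the original code does.

```python
import numpy as np

def fit_standardiser(x_train, eps=1e-8):
    """Compute per-feature mean and std from the training set only."""
    means = x_train.mean(axis=0)
    stds = x_train.std(axis=0)
    # A constant column has std 0; use 1.0 instead to avoid dividing by zero
    stds = np.where(stds < eps, 1.0, stds)
    return means, stds

def apply_standardiser(x, means, stds):
    """Scale any array with previously fitted statistics."""
    return (x - means) / stds

# Toy data: the second column is constant on purpose
x_train_demo = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
x_test_demo = np.array([[4.0, 5.0]])

demo_means, demo_stds = fit_standardiser(x_train_demo)
x_train_scaled = apply_standardiser(x_train_demo, demo_means, demo_stds)
x_test_scaled = apply_standardiser(x_test_demo, demo_means, demo_stds)
print(x_train_scaled)
print(x_test_scaled)
```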

## Regression model to impute missing `daylight_duration`


In [None]:
from tensorflow import keras

# Regression model (deep network with residual blocks)
# Goal: learn a mapping from weather/date features -> daylight_duration
# Residual connections help deeper networks train by providing a "shortcut" path for gradients

# Input block
inputs = keras.layers.Input(shape=(6,))
z = inputs

# Projection layer
z = keras.layers.Dense(128)(z)

# Two blocks
for i in range(2):
  # Residual connection (ensemble like)
  res = z

  # Layer norm
  z = keras.layers.LayerNormalization()(z)

  # Inverted bottleneck (expand to 4x width, then project back to 128)
  z = keras.layers.Dense(4*128)(z)
  z = keras.activations.gelu(z) # probabilistic interpretation of dropout (ie ensemble like)
  z = keras.layers.Dense(128)(z)

  # Residual connection
  z = keras.layers.Add()([res,z])

# Layer norm
z = keras.layers.LayerNormalization()(z)

# Regression output block
z = keras.layers.Dense(1)(z)
outputs = keras.activations.gelu(z)

In [None]:
# Prepare model
model = keras.Model(inputs=[inputs], outputs=[outputs])
model.summary()
model.compile(
  loss=keras.losses.MeanSquaredError(),
  optimizer=keras.optimizers.Adam(learning_rate=keras.optimizers.schedules.CosineDecay(initial_learning_rate=0.01, decay_steps=30*165)) # cosine annealing to help converge
)
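
The hard-coded `decay_steps=30*165` presumably stands for 30 epochs times 165 batches per epoch. The sketch below shows how that number could be derived from the training set size instead, and how the learning rate follows a cosine curve over those steps. It assumes the Keras default `batch_size` of 32 and no warm-up. The row count of 5280 is made up for illustration, and both function names are hypothetical.

```python
import math

def decay_steps_for(n_train, epochs, batch_size=32):
    """Total optimiser steps: epochs times batches per epoch."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return epochs * steps_per_epoch

def cosine_lr(step, initial_lr, decay_steps):
    """Cosine decay from initial_lr down to zero (no warm-up, alpha = 0)."""
    step = min(step, decay_steps)
    return initial_lr * 0.5 * (1 + math.cos(math.pi * step / decay_steps))

total_steps = decay_steps_for(n_train=5280, epochs=30)  # 5280 is a made-up row count
print(total_steps)

# Learning rate at the start, the midpoint, and the end of training
for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step, initial_lr=0.01, decay_steps=total_steps))
```

In the notebook itself, `n_train` could be taken from `x_train.shape[0]` so the schedule stays correct if the dataset or the number of epochs changes.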

In [None]:
# Train model
model.fit(x_train, y_train, epochs=30)

In [None]:
# Performance on test set
model.evaluate(x_test, y_test)
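
`model.evaluate` reports the mean squared error, which is in squared units and hard to read on its own. A rough sketch of two more interpretable numbers is given below: the mean absolute error and the root mean squared error, next to a naive baseline that always predicts the training mean. `regression_report` is a hypothetical helper and the arrays are toy values, not results from this notebook.

```python
import numpy as np

def regression_report(y_true, y_pred, y_train_mean):
    """Hypothetical helper: MAE, RMSE, and RMSE of a predict-the-mean baseline."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_pred - y_true
    return {
        'mae': float(np.mean(np.abs(errors))),
        'rmse': float(np.sqrt(np.mean(errors ** 2))),
        'baseline_rmse': float(np.sqrt(np.mean((y_train_mean - y_true) ** 2))),
    }

# Toy values standing in for y_test and model.predict(x_test)
y_true_demo = np.array([[10.0], [12.0], [14.0]])
y_pred_demo = np.array([[11.0], [12.0], [13.0]])

report = regression_report(y_true_demo, y_pred_demo, y_train_mean=12.0)
print(report)
```

With the real data, the call would look like `regression_report(y_test, model.predict(x_test), y_train.mean())`. A model that does not beat the baseline has learned little.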

In [None]:
# Prepare for inference and replacing missing values
x = df_weather_with_nan.drop(['daylight_duration','date'], axis=1).to_numpy()
print(x.shape)

In [None]:
# Standardise the input with above values
x = (x - means) / stds

In [None]:
# Infer
y_pred = model.predict(x)
print(y_pred)

In [None]:
# Column 2 is where missing values are
df_weather.head()

In [None]:
# Replace the column 2 values (first 730) with the model predictions
# df_weather.iloc[:730, 2] = y_pred

# Alternatively and more elegantly
# y_pred has shape (n, 1), so flatten it to 1-D before assigning to a single column
df_weather.loc[df_weather['daylight_duration'].isna(), 'daylight_duration'] = y_pred.ravel()


df_weather.head()
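
The fill-in step can be tried on a toy frame first. The sketch below uses made-up values and a made-up column name `v`. It checks that the number of predictions matches the number of missing rows before writing them back, which is an added safety check rather than part of the original code.

```python
import numpy as np
import pandas as pd

# Toy column with two missing values, and toy predictions of shape (2, 1)
df_fill = pd.DataFrame({'v': [np.nan, np.nan, 3.0]})
pred_demo = np.array([[1.0], [2.0]])

mask = df_fill['v'].isna()

# One prediction per missing row, otherwise the assignment would misalign
assert mask.sum() == pred_demo.shape[0]

# Flatten the (n, 1) predictions to 1-D and write them into the missing slots
df_fill.loc[mask, 'v'] = pred_demo.ravel()
print(df_fill)
```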

## Load and explore taps aff data


In [None]:
df_taps_aff = pd.read_csv('/content/sample_data/taps_aff.csv')

In [None]:
df_taps_aff.info()

In [None]:
df_taps_aff.head()

In [None]:
df_taps_aff.tail()

In [None]:
# Change date format into same as weather dataset
df_taps_aff['date'] = df_taps_aff['date'].apply(lambda x: x[6:] + "-" + x[3:5] + "-" + x[:2])
df_taps_aff.head()

In [None]:
df_taps_aff.tail()
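
The slicing above reorders a day-first date into year-month-day. A possible alternative is to let pandas parse and reformat the dates, which fails loudly on malformed entries. The sketch assumes the raw format is `DD/MM/YYYY`. The slash separator is a guess, since the slicing only reveals the character positions.

```python
import pandas as pd

# Toy frame standing in for df_taps_aff (assumed DD/MM/YYYY strings)
df_dates = pd.DataFrame({'date': ['01/02/2019', '25/12/2020']})

# Parse with an explicit format, then write back as ISO strings
df_dates['date'] = (
    pd.to_datetime(df_dates['date'], format='%d/%m/%Y')
      .dt.strftime('%Y-%m-%d')
)
print(df_dates)
```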

## Join two dataframes together


In [None]:
df_joined = df_weather.merge(df_taps_aff, on='date', how='left')
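
A left merge silently keeps weather rows that have no matching label, and duplicate dates on either side would multiply rows. The sketch below shows two optional `merge` arguments that make such problems visible: `validate` raises an error on duplicate keys, and `indicator` adds a `_merge` column recording where each row came from. The two small frames are toy data.

```python
import pandas as pd

# Toy frames: the third weather day has no label
weather_demo = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-02', '2019-01-03'],
    'temp': [3.0, 4.0, 5.0],
})
labels_demo = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-02'],
    'tap_aff': [False, True],
})

joined_demo = weather_demo.merge(
    labels_demo,
    on='date',
    how='left',
    validate='one_to_one',  # raise if a date appears twice on either side
    indicator=True,         # add a _merge column: 'both' or 'left_only'
)

# Rows without a label are the ones the classifier will predict later
unmatched = joined_demo[joined_demo['_merge'] == 'left_only']
print(joined_demo)
print(len(unmatched), 'weather rows have no label')
```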


In [None]:
# Check that worked
df_joined.head()

In [None]:
# Check that worked
df_joined.tail()

In [None]:
# Create two new dataframes: the final 730 days (missing tap_aff labels) and the earlier days (known labels)
df_joined_with_nan = df_joined.iloc[-730:]
df_joined_without_nan = df_joined.iloc[:-730]

# Alternatively and more elegantly
df_joined_with_nan = df_joined[df_joined['tap_aff'].isna()]
df_joined_without_nan = df_joined[df_joined['tap_aff'].notna()]

In [None]:
# Check that worked
df_joined_without_nan.tail()

In [None]:
# Check that worked
df_joined_with_nan.head()

In [None]:
# Build classification arrays
# X = input features (weather + engineered date features)
# y = binary label (tap_aff). Convert boolean to int (0/1) for training
x = df_joined_without_nan.drop(['tap_aff','date'], axis=1).to_numpy()
y = df_joined_without_nan[['tap_aff']].to_numpy(dtype='int8') # convert from bool to int
print(x.shape)
print(y.shape)

In [None]:
# Training and testing sets
boundary = int(x.shape[0] * 0.8)
x_train = x[:boundary]
y_train = y[:boundary]
x_test = x[boundary:]
y_test = y[boundary:]

In [None]:
# Standardise training set
means = x_train.mean(axis=0)
stds = x_train.std(axis=0)
x_train = (x_train - means) / stds

# And test set with values from training set
x_test = (x_test - means) / stds

## Binary classification model: predict “taps aff” days


In [None]:
# Binary classification model (deep network with residual blocks)
# Goal: predict tap_aff (0/1) from weather features

# Input block
inputs = keras.layers.Input(shape=(7,))
z = inputs

# Projection layer
z = keras.layers.Dense(128)(z)

# Two blocks
for i in range(2):
  # Residual connection (ensemble like)
  res = z

  # Layer norm
  z = keras.layers.LayerNormalization()(z)

  # Inverted bottleneck (expand to 4x width, then project back to 128)
  z = keras.layers.Dense(4*128)(z)
  z = keras.activations.gelu(z) # probabilistic interpretation of dropout (ie ensemble like)
  z = keras.layers.Dense(128)(z)

  # Residual connection
  z = keras.layers.Add()([res,z])

# Layer norm
z = keras.layers.LayerNormalization()(z)

# Sigmoid output block
z = keras.layers.Dense(1)(z)
outputs = keras.activations.sigmoid(z)

In [None]:
# Prepare model
model = keras.Model(inputs=[inputs], outputs=[outputs])
model.summary()
model.compile(
  loss=keras.losses.BinaryCrossentropy(),
  optimizer=keras.optimizers.Adam(learning_rate=keras.optimizers.schedules.CosineDecay(initial_learning_rate=0.01, decay_steps=30*165)) # cosine annealing to help converge
)
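
The Top Tips suggest softmax with `sparse_categorical_crossentropy`, while this notebook uses a single sigmoid unit with `BinaryCrossentropy`. For two classes the two formulations are mathematically equivalent: the softmax probability of class 1 equals the sigmoid of the difference between the two logits. The NumPy sketch below checks this on made-up logits. It illustrates the relationship and is not the training code.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up two-class logits and integer labels
logits = np.array([[0.2, 1.5], [2.0, -1.0], [0.0, 0.0]])
labels = np.array([1, 0, 1])

# Probability of class 1 under each formulation
p_softmax = softmax(logits)[:, 1]
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])

# Per-example losses under each formulation
sparse_ce = -np.log(softmax(logits)[np.arange(len(labels)), labels])
binary_ce = -(labels * np.log(p_sigmoid) + (1 - labels) * np.log(1 - p_sigmoid))

print(np.allclose(p_softmax, p_sigmoid), np.allclose(sparse_ce, binary_ce))
```

To follow the tip literally, the output layer would be `Dense(2)` with a softmax activation, the loss would be `sparse_categorical_crossentropy`, and predictions would be read with `argmax` instead of rounding.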

In [None]:
# Train model
model.fit(x_train, y_train, epochs=30)

In [None]:
# Performance on test set
model.evaluate(x_test, y_test)
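
As compiled above, the model has no metrics, so `model.evaluate` returns only the cross-entropy loss. If taps aff days are rare, which seems likely given Glasgow's climate, accuracy alone can also mislead, because always predicting "no" would score well. A minimal sketch of accuracy, precision, and recall computed from thresholded probabilities follows. `binary_report` is a hypothetical helper and the numbers are toy values.

```python
import numpy as np

def binary_report(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: confusion counts and basic metrics."""
    y_true = np.asarray(y_true).ravel().astype(int)
    y_hat = (np.asarray(y_prob).ravel() >= threshold).astype(int)
    tp = int(np.sum((y_hat == 1) & (y_true == 1)))
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))
    tn = int(np.sum((y_hat == 0) & (y_true == 0)))
    return {
        'tp': tp, 'fp': fp, 'fn': fn, 'tn': tn,
        'accuracy': (tp + tn) / y_true.size,
        # Guard against division by zero when a class is never predicted or present
        'precision': tp / (tp + fp) if (tp + fp) else 0.0,
        'recall': tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy labels and probabilities standing in for y_test and model.predict(x_test)
y_true_cls = np.array([[0], [0], [0], [1], [1]])
y_prob_cls = np.array([[0.1], [0.6], [0.2], [0.9], [0.4]])

cls_report = binary_report(y_true_cls, y_prob_cls)
print(cls_report)
```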

In [None]:
# Prepare for inference and replacing missing values
x = df_joined_with_nan.drop(['tap_aff','date'], axis=1).to_numpy()
print(x.shape)

In [None]:
# Standardise the input with above values
x = (x - means) / stds

In [None]:
# Infer
y_pred = model.predict(x)

import numpy as np
print(np.round(y_pred)) # np.round converts sigmoid probabilities to 0/1 (False/True) predictions
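
One detail of `np.round` is that it rounds exact halves to the nearest even number, so a probability of exactly 0.5 becomes 0. An explicit threshold makes the decision rule visible and easy to change. The sketch below also pairs each prediction with its date. The probabilities, dates, and column names `tap_aff_prob` and `tap_aff_pred` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy probabilities of shape (n, 1), as returned by model.predict
probs_demo = np.array([[0.03], [0.5], [0.97]])
dates_demo = ['2023-01-01', '2023-01-02', '2023-01-03']

# Explicit decision rule: predict taps aff when the probability is at least 0.5
threshold = 0.5
pred_demo = probs_demo.ravel() >= threshold

df_pred = pd.DataFrame({
    'date': dates_demo,
    'tap_aff_prob': probs_demo.ravel(),
    'tap_aff_pred': pred_demo,
})
print(df_pred)
```

With the real data, the dates would come from `df_joined_with_nan['date']`, which has one row per prediction.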

## Conclusion

In this project, I built a deep learning system to predict whether a day in Glasgow is a “taps aff” day based on weather data.
I used a regression model to estimate the missing daylight_duration values and then trained a binary classification model to predict taps aff days.
The model learned meaningful seasonal patterns, such as longer daylight in summer and shorter daylight in winter.
This project strengthened my understanding of data preprocessing, feature engineering, missing value handling, and machine learning workflows.
Future improvements could include adding more weather features, tuning hyperparameters, and testing alternative model architectures.
