This Colab notebook predicts soil temperature in Avon, CT using machine learning on historical data. The model uses four independent variables:

1. Soil Moisture

2. Average Air Temperature

3. Average Dew Point

4. Average Air Humidity

Data is taken from:
https://www.greencastonline.com/tools/soil-temperature
https://www.wunderground.com/history/daily/us/ct/hartford

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
soil_df = pd.read_csv(r'/content/drive/MyDrive/SoilTempData1.csv')

# 1. Reviewing the dataset

In [4]:
soil_df.head()

Unnamed: 0,Date,Soil Temp (F),Soil Moisture (%),Max Air Temperature,Average Air Temperature (F),Min Air Temperature,Max Dew Point,Average Dew Point (F),Min Dew Point,Max Air Humidity,Average Air Humidity (%),Min Air Humidity
0,01/01/2024,35,27.92,,35.25,,28,22.5,14,78,60.8,40
1,01/02/2024,32,27.29,,29.63,,22,20.0,16,88,68.8,45
2,01/03/2024,31,26.83,,32.88,,29,25.0,20,88,73.9,60
3,01/04/2024,32,26.5,,35.18,,32,23.8,8,92,65.1,46
4,01/05/2024,30,26.08,,27.83,,18,13.6,6,78,56.0,44


In [5]:
soil_df.describe()

Unnamed: 0,Soil Temp (F),Soil Moisture (%),Max Air Temperature,Average Air Temperature (F),Min Air Temperature,Max Dew Point,Average Dew Point (F),Min Dew Point,Max Air Humidity,Average Air Humidity (%),Min Air Humidity
count,522.0,522.0,462.0,522.0,462.0,522.0,522.0,522.0,522.0,522.0,522.0
mean,50.752874,28.037663,63.597403,51.322548,44.52381,44.02682,38.416475,32.331418,83.482759,64.788697,43.733716
std,16.123749,4.564126,18.3826,17.025247,16.238678,17.517177,18.144819,19.042717,11.628638,14.603297,16.28878
min,19.0,19.58,21.0,9.6,-6.0,2.0,-4.3,-11.0,48.0,30.0,0.0
25%,36.0,25.4875,49.0,37.75,32.0,31.0,24.45,16.0,77.0,54.025,32.0
50%,50.0,27.475,65.5,50.85,44.0,44.0,37.2,31.0,86.0,63.45,41.0
75%,65.0,29.9125,79.0,65.375,57.0,58.0,53.1,49.0,92.0,76.9,52.0
max,79.0,44.09,97.0,85.0,78.0,77.0,74.3,73.0,100.0,97.3,93.0


# 2. Pre-processing the data

In [6]:
# Keep only the columns we need: soil moisture and the daily averages for air temperature, dew point, and humidity.
X = soil_df[["Soil Moisture (%)", "Average Air Temperature (F)", "Average Dew Point (F)", "Average Air Humidity (%)"]]
Y = soil_df["Soil Temp (F)"]

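Feature Scaling: some models give more weight to features with larger numeric values, so those features end up with a bigger impact in the model. Feature scaling with `StandardScaler` rescales every feature to zero mean and unit variance, which puts them all on a comparable scale. For plain linear regression this does not change the predictions, but it makes the fitted coefficients directly comparable across features.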
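
As a rough illustration of what standardization does, the sketch below applies the formula z = (x - mean) / std to a small made-up array. The values in `toy` are hypothetical and are not taken from the soil dataset; the computation is written out in NumPy so each step is visible, and it is meant to mirror the formula `StandardScaler` documents rather than replace it.

```python
import numpy as np

# Hypothetical toy values (not from the dataset): column 0 is a small-valued
# feature, column 1 is a large-valued one.
toy = np.array([[0.2, 30.0],
                [0.3, 50.0],
                [0.4, 70.0]])

# Standardize each column: subtract its mean, divide by its standard deviation.
col_means = toy.mean(axis=0)
col_stds = toy.std(axis=0)  # population standard deviation (ddof=0)
scaled = (toy - col_means) / col_stds

print(scaled.mean(axis=0))  # each column is now centered near 0
print(scaled.std(axis=0))   # each column now has standard deviation near 1
```

After scaling, both columns cover the same range even though their raw values differed by two orders of magnitude.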

In [7]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

In [8]:
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression())
])

# 3. Creating a Linear Regression Model

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

In [10]:
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# 4. Reviewing the accuracy of our model

In [11]:
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))

Mean Squared Error: 12.328549074346064
R^2 Score: 0.9398921448292865
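
Mean squared error is in squared degrees, which is hard to interpret. A minimal sketch, assuming only the MSE value printed above, converts it to a root mean squared error (RMSE) in degrees Fahrenheit:

```python
import math

# MSE value copied from the printed output above.
mse = 12.328549074346064

# RMSE has the same units as the target (degrees Fahrenheit).
rmse = math.sqrt(mse)
print(f"RMSE: {rmse:.2f} F")
```

This comes out to roughly 3.5 F, meaning the typical prediction error on the test set is on the order of a few degrees.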


# 5. Trying the model with your own data!

In [12]:
model = pipeline.named_steps['regressor']

In [13]:
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

Intercept: 51.01918465227818
Coefficients: [-0.35116805 16.38544397 -0.33853432  0.04892477]


In [14]:
scaler = pipeline.named_steps['scaler']

In [15]:
# Get original scaler parameters
means = scaler.mean_
scales = scaler.scale_

# Adjust coefficients and intercept to account for feature scaling
true_coefs = model.coef_ / scales
true_intercept = model.intercept_ - np.sum((means / scales) * model.coef_)
print("True Coefficients:", true_coefs)
print("True Intercept:", true_intercept)

True Coefficients: [-0.0744079   0.93154724 -0.01804851  0.00333347]
True Intercept: 5.5159887998798
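
The adjustment above can be justified with a short derivation. The notation here is my own, not from the notebook: $\beta_0$ is `model.intercept_`, $\beta_j$ are the entries of `model.coef_`, and $\mu_j$, $\sigma_j$ are the entries of `scaler.mean_` and `scaler.scale_`. The pipeline predicts from standardized features, and expanding that expression gives a formula in terms of the raw features $x_j$:

$$
\hat{y}
= \beta_0 + \sum_{j=1}^{4} \beta_j \, \frac{x_j - \mu_j}{\sigma_j}
= \underbrace{\left(\beta_0 - \sum_{j=1}^{4} \frac{\beta_j \, \mu_j}{\sigma_j}\right)}_{\text{true intercept}}
+ \sum_{j=1}^{4} \underbrace{\frac{\beta_j}{\sigma_j}}_{\text{true coefficient}} \, x_j
$$

The two underbraced terms correspond to `true_intercept` and `true_coefs` in the cell above.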


Input the data below:

In [16]:
# These values are for June 30th, 2025. Replace them with your own float values for a given day to see how accurate the soil temperature prediction is!
Soil_Moisture_percent = 25.04
Average_Air_Temp_F = 87.38
Average_Dew_Point_F = 70.46
Average_Air_Humidity_percent = 61.8

In [18]:
Predicted_Soil_temp = (
    Soil_Moisture_percent * true_coefs[0] +
    Average_Air_Temp_F * true_coefs[1] +
    Average_Dew_Point_F * true_coefs[2] +
    Average_Air_Humidity_percent * true_coefs[3] +
    true_intercept
)
print(Predicted_Soil_temp)
print("Actual Soil Temp is 83 degrees Fahrenheit")

83.98572343621689
Actual Soil Temp is 83 degrees Fahrenheit
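
If you want to try several days, the manual formula can be wrapped in a small helper. The sketch below is one possible way to do that: `predict_soil_temp` is a hypothetical name, and the coefficients and intercept are the rounded values printed earlier in this notebook, so the result can differ from the cell above in the last few decimal places.

```python
import numpy as np

# Rounded values copied from the "True Coefficients" / "True Intercept" output.
# Order: soil moisture (%), avg air temp (F), avg dew point (F), avg humidity (%).
TRUE_COEFS = np.array([-0.0744079, 0.93154724, -0.01804851, 0.00333347])
TRUE_INTERCEPT = 5.5159887998798


def predict_soil_temp(soil_moisture, air_temp, dew_point, humidity):
    """Return the predicted soil temperature (F) from unscaled inputs."""
    features = np.array([soil_moisture, air_temp, dew_point, humidity])
    return float(features @ TRUE_COEFS + TRUE_INTERCEPT)


# Same inputs as the June 30th, 2025 example above.
predicted = predict_soil_temp(25.04, 87.38, 70.46, 61.8)
print(round(predicted, 2))
```

Inside the notebook itself, passing a one-row DataFrame with the same four column names to `pipeline.predict` should give the same answer, since the pipeline applies the scaler internally.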
