### Problem Statement

You are a data scientist or AI engineer at a meteorological consulting firm. You have been given a dataset named **`weather_data.csv`**, which contains daily records of weather conditions. The dataset has the following columns:

- `hours_sunlight`: The total number of hours of sunlight received in a day.
- `humidity_level`: The humidity level as a percentage.
- `daily_temperature`: The temperature recorded at the end of the day in degrees Celsius.

Your task is to use this dataset to build a linear regression model to predict the daily temperature based on the hours of sunlight and humidity level. You will need to split the data into training and test sets, train the model, and evaluate its performance using appropriate metrics.

**Import Necessary Libraries**

In [12]:
# Import necessary libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

### Task 1: Data Preparation and Exploration
1. Import the data from the `weather_data.csv` file and store it in a variable `df`.
2. Display the number of rows and columns in the dataset.
3. Display the first few rows of the dataset to get an overview.
4. Check for any missing values in the dataset.

In [13]:
df = pd.read_csv('weather_data.csv')

In [14]:
df.shape

(49, 3)

In [15]:
df.head()

   hours_sunlight  humidity_level  daily_temperature
0            10.5              65               22.3
1             9.2              70               21.0
2             7.8              80               18.5
3             6.4              90               17.2
4             8.1              75               19.4


In [16]:
df.isnull().sum()

hours_sunlight       0
humidity_level       0
daily_temperature    0
dtype: int64

### Task 2: Train a Linear Regression Model

1. Select the features (hours_sunlight, humidity_level) and the target variable (daily_temperature) for modeling.
2. Split the data into training and test sets with a test size of 30%.
3. Create a Linear Regression model and fit it using the training data.
4. Print the model's coefficients and intercept.

In [17]:
# Step 1: Select the features and target variable for modeling
features = ['hours_sunlight', 'humidity_level']
X = df[features]
y = df['daily_temperature']

# Step 2: Split the data into training and test sets with a test size of 30%

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
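
As a quick sanity check on the split sizes: with 49 rows and `test_size=0.30`, scikit-learn rounds the test share up and gives the remaining rows to the training set. A minimal sketch of that arithmetic, assuming no `train_size` is passed (as in the call above); the variable names are illustrative and not part of the notebook:

```python
import math

n_samples = 49      # rows in df, from df.shape above
test_size = 0.30    # fraction passed to train_test_split

# The test row count is rounded up; the training set takes the remainder
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

So the model is trained on 34 rows and evaluated on 15, which is a small test set; the metrics in Task 3 may shift noticeably with a different `random_state`.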

In [18]:
# Step 3: Create a Linear Regression model and fit it using the training data
model = LinearRegression()
model.fit(X_train, y_train)

In [19]:
# Step 4: Print the model's coefficients and intercept
model.coef_, model.intercept_

(array([ 1.25083729, -0.02763612]), np.float64(11.511007935418261))
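
The coefficients follow the order of `features`: each extra hour of sunlight is associated with roughly 1.25 °C more, and each additional percentage point of humidity with roughly 0.03 °C less, holding the other feature fixed. To make the fitted equation concrete, here is a small sketch that applies it by hand to the first row shown by `df.head()`. The helper `predict_temperature` is hypothetical and only mirrors what `model.predict` does for a single row:

```python
# Values copied from the model.coef_ and model.intercept_ output above
coef_sunlight = 1.25083729
coef_humidity = -0.02763612
intercept = 11.511007935418261

def predict_temperature(hours_sunlight, humidity_level):
    """Apply the fitted linear equation by hand (illustrative helper)."""
    return (intercept
            + coef_sunlight * hours_sunlight
            + coef_humidity * humidity_level)

# First row of df.head(): 10.5 hours of sunlight, 65% humidity, recorded 22.3 °C
prediction = predict_temperature(10.5, 65)
print(round(prediction, 2))
```

This gives about 22.85 °C against a recorded 22.3 °C for that row.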

### Task 3: Model Evaluation

1. Make predictions on the test set using the trained model.
2. Evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics.
3. Print the MSE and R2 values.
4. Display the first few actual vs. predicted values for the daily temperature.

In [20]:
# Step 1: Make predictions on the test set using the trained model
y_pred = model.predict(X_test)

# Step 2: Evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

In [21]:
# Step 3: Print the MSE and R2 values
print(mse, r2)

0.11488330185581289 0.9833806480142233

In [22]:
# Step 4: Display the first few actual vs. predicted values for the daily temperature
print(y_test[:5])

13    18.7
45    17.0
47    21.3
44    23.9
17    19.6
Name: daily_temperature, dtype: float64


In [23]:
y_pred[:5]

array([18.73667036, 16.94476518, 21.33435016, 23.43169505, 19.84788467])
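
The two outputs above are easier to compare side by side. A minimal sketch, using the five values copied from those outputs rather than the live `y_test` and `y_pred` objects; the list names are illustrative:

```python
# Values copied from the y_test[:5] and y_pred[:5] outputs above
actual = [18.7, 17.0, 21.3, 23.9, 19.6]
predicted = [18.73667036, 16.94476518, 21.33435016, 23.43169505, 19.84788467]

# Residual = actual - predicted; a positive value means the model under-predicted
residuals = [a - p for a, p in zip(actual, predicted)]

print(f"{'actual':>8} {'predicted':>10} {'residual':>9}")
for a, p, r in zip(actual, predicted, residuals):
    print(f"{a:8.1f} {p:10.2f} {r:9.2f}")
```

Among these five rows, the largest miss is about 0.47 °C (the row with index 44); the others are within about 0.25 °C.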

In [24]:
model.score(X_test, y_test)

0.9833806480142233
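
For a regressor, `model.score` returns R², which is why it matches the `r2_score` value printed earlier. The MSE is harder to read on its own because it is expressed in squared degrees; taking its square root gives a typical error in °C. A small sketch, using the MSE value copied from the output above:

```python
import math

mse = 0.11488330185581289   # MSE printed in the evaluation cell above
rmse = math.sqrt(mse)       # root mean squared error, in °C like daily_temperature

print(round(rmse, 2))
```

An RMSE of roughly 0.34 °C is in line with the small residuals seen in the first few test rows.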