### Problem Statement

You are a data scientist / AI engineer at a meteorological consulting firm. You have been provided with a dataset named **`"weather_data.csv"`**, which includes detailed records of various weather conditions. The dataset comprises the following columns:

- `hours_sunlight`: The total number of hours of sunlight received in a day.
- `humidity_level`: The humidity level as a percentage.
- `daily_temperature`: The temperature recorded at the end of the day in degrees Celsius.

Your task is to use this dataset to build a linear regression model to predict the daily temperature based on the hours of sunlight and humidity level. You will need to split the data into training and test sets, train the model, and evaluate its performance using appropriate metrics.
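
Written out, the model described above has the following form. The symbols $\beta_0$, $\beta_1$, $\beta_2$ and $\varepsilon$ are notation assumed here for illustration; they do not appear in the dataset.

$$
\text{daily\_temperature} = \beta_0 + \beta_1 \cdot \text{hours\_sunlight} + \beta_2 \cdot \text{humidity\_level} + \varepsilon
$$

Here $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the coefficients the model learns from the training data, and $\varepsilon$ is the error the two features do not explain.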

**Import Necessary Libraries**

In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd

### Task 1: Data Preparation and Exploration
1. Import the data from the `"weather_data.csv"` file and store it in a variable `df`.
2. Display the number of rows and columns in the dataset.
3. Display the first few rows of the dataset to get an overview.
4. Check for any missing values in the dataset.

In [3]:

# Step 1: Import the data from the "weather_data.csv" file and store it in a variable 'df'
df = pd.read_csv("weather_data.csv")

# Step 2: Display the number of rows and columns in the dataset
print(df.shape)
df.info()  # info() prints its own summary and returns None, so it is not wrapped in print()

# Step 3: Display the first few rows of the dataset to get an overview
df.head()

(49, 3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   hours_sunlight     49 non-null     float64
 1   humidity_level     49 non-null     int64  
 2   daily_temperature  49 non-null     float64
dtypes: float64(2), int64(1)
memory usage: 1.3 KB


   hours_sunlight  humidity_level  daily_temperature
0            10.5              65               22.3
1             9.2              70               21.0
2             7.8              80               18.5
3             6.4              90               17.2
4             8.1              75               19.4


In [5]:
# Step 4: Check for any missing values in the dataset
df.isna().sum()

hours_sunlight       0
humidity_level       0
daily_temperature    0
dtype: int64
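
As an optional exploration step, a quick look at how each feature correlates with the target can help set expectations for the regression. The sketch below is only illustrative: it re-types the five rows shown by `df.head()` into a hypothetical `sample` frame so that it runs on its own. In the notebook the same call would be made on the full `df`.

```python
import pandas as pd

# The five rows displayed by df.head(), re-typed by hand for illustration.
sample = pd.DataFrame({
    "hours_sunlight": [10.5, 9.2, 7.8, 6.4, 8.1],
    "humidity_level": [65, 70, 80, 90, 75],
    "daily_temperature": [22.3, 21.0, 18.5, 17.2, 19.4],
})

# Pearson correlation of every column with the target column.
corr_with_target = sample.corr()["daily_temperature"]
print(corr_with_target.round(3))
```

On these five rows, sunlight correlates positively with temperature and humidity correlates negatively. Five rows are far too few to draw conclusions from, so treat this as a pattern to confirm on all 49 rows.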

### Task 2: Train a Linear Regression Model

1. Select the features (hours_sunlight, humidity_level) and the target variable (daily_temperature) for modeling.
2. Split the data into training and test sets with a test size of 30%.
3. Create a Linear Regression model and fit it using the training data.
4. Print the model's coefficients and intercept.

In [6]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Step 1: Select the features and target variable for modeling
X = df.drop("daily_temperature", axis=1)
y = df["daily_temperature"]

# Step 2: Split the data into training and test sets
# Note: the task asks for a 30% test size, but this run uses test_size=0.2,
# and every output below (10 test rows out of 49) reflects that 20% split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [7]:
X.shape,X_train.shape,X_test.shape

((49, 2), (39, 2), (10, 2))

In [8]:
y.shape,y_train.shape,y_test.shape

((49,), (39,), (10,))
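
The shapes above follow from simple arithmetic on the 49 rows. The sketch below assumes that scikit-learn rounds the test count up when `test_size` is a fraction and no `train_size` is given; `split_sizes` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
import math

n_samples = 49  # number of rows reported by df.info()


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size (hypothetical helper)."""
    n_test = math.ceil(test_size * n_samples)  # test count is rounded up
    return n_samples - n_test, n_test          # the remainder goes to training


print(split_sizes(n_samples, 0.2))  # (39, 10), matching the shapes above
print(split_sizes(n_samples, 0.3))  # (34, 15), what a 30% split would give
```

With only 49 rows, the gap between a 20% and a 30% split is just five test rows, but it is enough to change the fitted coefficients and the reported metrics.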

In [9]:
# Step 3: Create a Linear Regression model and fit it using the training data
model = LinearRegression()
model.fit(X_train,y_train)

# Step 4: Print the model's coefficients and intercept
model.coef_,model.intercept_

(array([ 1.10142876, -0.05136419]), np.float64(14.5577156052322))
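
To make the coefficients concrete, the sketch below reproduces one prediction by hand from the numbers printed above. The names `coef`, `intercept` and `row` are introduced only for this illustration. The input is row 0 from `df.head()`, which may well be part of the training set, so this checks the arithmetic rather than the model's accuracy.

```python
import numpy as np

# Coefficients and intercept copied from the output above.
coef = np.array([1.10142876, -0.05136419])  # order: [hours_sunlight, humidity_level]
intercept = 14.5577156052322

# Row 0 from df.head(): 10.5 hours of sunlight, 65% humidity (actual temperature 22.3).
row = np.array([10.5, 65])

# A linear model predicts intercept + sum(coefficient * feature value).
predicted = intercept + coef @ row
print(round(float(predicted), 2))  # 22.78
```

Read this way, each additional hour of sunlight raises the predicted temperature by about 1.10 °C at fixed humidity, and each additional percentage point of humidity lowers it by about 0.05 °C at fixed sunlight.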

### Task 3: Model Evaluation

1. Make predictions on the test set using the trained model.
2. Evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics.
3. Print the MSE and R2 values.
4. Display the first few actual vs. predicted values for the daily temperature.

In [10]:
# Step 1: Make predictions on the test set using the trained model
y_preds = model.predict(X_test)

# Step 2: Evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics
mse = mean_squared_error(y_test, y_preds)
print(f"MSE: {mse:.4f}")

# Mean Absolute Error (MAE) as an additional metric
mae = mean_absolute_error(y_test, y_preds)
print(f"MAE: {mae:.4f}")

r2 = r2_score(y_test, y_preds)
print(f"R2: {r2:.4f}")

MSE: 0.0775
MAE: 0.2265
R2: 0.9889
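
For reference, the metrics can also be computed directly from their definitions. This sketch copies the ten actual and predicted test values displayed at the end of this notebook into hypothetical arrays `y_true` and `y_pred`, so that it runs without the CSV file. Because the displayed predictions are rounded to eight decimals, the results agree with scikit-learn's only to about that precision.

```python
import numpy as np

# Actual and predicted test-set temperatures, copied from the notebook's final output.
y_true = np.array([18.7, 17.0, 21.3, 23.9, 19.6, 20.9, 23.8, 16.2, 16.7, 18.3])
y_pred = np.array([18.81202447, 17.07198194, 21.31565263, 23.37870928, 19.89862423,
                   20.76493825, 23.3199306, 16.08811056, 16.42595368, 18.05585333])

residuals = y_true - y_pred

mse_manual = np.mean(residuals ** 2)      # mean of squared errors
mae_manual = np.mean(np.abs(residuals))   # mean of absolute errors
ss_res = np.sum(residuals ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot           # share of variance explained

print(f"MSE={mse_manual:.4f}  MAE={mae_manual:.4f}  R2={r2_manual:.4f}")
# MSE=0.0775  MAE=0.2265  R2=0.9889
```

MSE is in squared degrees Celsius, while MAE is in degrees Celsius, so an MAE of roughly 0.23 °C is the easier of the two to interpret as a typical prediction error.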


In [11]:
# Step 3: Print the R2 value again via model.score, which returns R-squared for regressors
# (the MSE is printed in the previous cell)
model_score = model.score(X_test,y_test)
print(model_score)

# Step 4: Display the first few actual vs. predicted values for the daily temperature
y_test,y_preds

0.9889189281241384


(13    18.7
 45    17.0
 47    21.3
 44    23.9
 17    19.6
 27    20.9
 26    23.8
 25    16.2
 31    16.7
 19    18.3
 Name: daily_temperature, dtype: float64,
 array([18.81202447, 17.07198194, 21.31565263, 23.37870928, 19.89862423,
        20.76493825, 23.3199306 , 16.08811056, 16.42595368, 18.05585333]))
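
The raw tuple above is hard to scan. A side-by-side table is one possible way to present the same values. The sketch below re-types the displayed numbers into a hypothetical `comparison` frame so that it runs on its own; in the notebook, `pd.DataFrame({"actual": y_test, "predicted": y_preds})` should give the same table.

```python
import pandas as pd

# Values and row labels copied from the output above.
comparison = pd.DataFrame(
    {
        "actual": [18.7, 17.0, 21.3, 23.9, 19.6, 20.9, 23.8, 16.2, 16.7, 18.3],
        "predicted": [18.81202447, 17.07198194, 21.31565263, 23.37870928, 19.89862423,
                      20.76493825, 23.3199306, 16.08811056, 16.42595368, 18.05585333],
    },
    index=[13, 45, 47, 44, 17, 27, 26, 25, 31, 19],
)

# Residual = actual - predicted; positive means the model under-predicted.
comparison["residual"] = comparison["actual"] - comparison["predicted"]

# Show only the first few rows, as the task asks.
print(comparison.round(2).head())
```

The largest misses are rows 44 and 26, the two warmest days in the test set, where the model under-predicts by about half a degree.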