### Problem Statement

You are a data scientist / AI engineer at a meteorological consulting firm. You have been provided with a dataset named **`"weather_data.csv"`**, which includes detailed records of various weather conditions. The dataset comprises the following columns:

- `hours_sunlight`: The total number of hours of sunlight received in a day.
- `humidity_level`: The humidity level, as a percentage.
- `daily_temperature`: The temperature recorded at the end of the day, in degrees Celsius.

Your task is to use this dataset to build a linear regression model to predict the daily temperature based on the hours of sunlight and humidity level. You will need to split the data into training and test sets, train the model, and evaluate its performance using appropriate metrics.
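Written out, the model to be fitted is a linear equation in the two features. In the notation used here (an assumption for illustration), $x_1$ is `hours_sunlight`, $x_2$ is `humidity_level`, and $\hat{y}$ is the predicted `daily_temperature`. scikit-learn's `LinearRegression` estimates the coefficients by ordinary least squares, i.e. by minimizing the sum of squared differences between observed and predicted temperatures. After fitting, $\beta_0$ is stored in `intercept_` and $(\beta_1, \beta_2)$ in `coef_`.

$$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2,
\qquad
(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2) = \arg\min_{\beta_0, \beta_1, \beta_2} \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2}\right)^2
$$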

**Import Necessary Libraries**

```python
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
```

### Task 1: Data Preparation and Exploration
1. Import the data from the `"weather_data.csv"` file and store it in a variable `df`.
2. Display the number of rows and columns in the dataset.
3. Display the first few rows of the dataset to get an overview.
4. Check for any missing values in the dataset.
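The four steps above can be tried without the CSV on disk. This minimal sketch assumes a stand-in file built only from the three rows that appear in the notebook's sample output further down; the full `weather_data.csv` is not reproduced here. It also shows the difference between `df.head()`, which returns the first rows in file order, and `df.sample(n)`, which returns `n` random rows.

```python
import io

import pandas as pd

# Stand-in for weather_data.csv (assumption): only the three rows shown
# in the notebook's sample output, not the full 49-row dataset.
csv_text = """hours_sunlight,humidity_level,daily_temperature
6.4,90,17.2
11.1,63,24.1
10.3,64,22.6
"""

df = pd.read_csv(io.StringIO(csv_text))   # Step 1: load into `df`

n_rows, n_cols = df.shape                  # Step 2: rows and columns
print(f"{n_rows} rows, {n_cols} columns")

print(df.head())                           # Step 3: first rows in file order (default n=5)

missing = df.isna().sum()                  # Step 4: count of missing values per column
print(missing)
```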

```python
# Step 1: Import the data from the "weather_data.csv" file and store it in a variable 'df'
df = pd.read_csv("weather_data.csv")

# Step 2: Display the number of rows and columns in the dataset
print(df.shape)

# Step 3: Display a few rows of the dataset to get an overview
# (df.sample(3) shows three random rows; df.head() would show the first five)
print(df.sample(3))
```

```
(49, 3)
    hours_sunlight  humidity_level  daily_temperature
3              6.4              90               17.2
30            11.1              63               24.1
40            10.3              64               22.6
```
```python
# Step 4: Check for any missing values in the dataset
df.isna().sum()
```

```
hours_sunlight       0
humidity_level       0
daily_temperature    0
dtype: int64
```
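No column has missing values here, so no cleaning is needed. If the check had found gaps, two common options are dropping incomplete rows or filling each column with a summary statistic. The sketch below uses a hypothetical four-row frame with injected `NaN`s (not the real data) to show both. In a real pipeline the fill values should be computed on the training split only, so that test-set information does not leak into the model.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; the real weather_data.csv has none.
df = pd.DataFrame({
    "hours_sunlight": [6.4, np.nan, 10.3, 8.0],
    "humidity_level": [90, 63, np.nan, 70],
    "daily_temperature": [17.2, 24.1, 22.6, 20.0],
})

counts = df.isna().sum()          # NaN count per column
print(counts)

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill each column with its median, keeping all rows.
filled = df.fillna(df.median(numeric_only=True))
print(filled)
```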

### Task 2: Train a Linear Regression Model

1. Select the features (hours_sunlight, humidity_level) and the target variable (daily_temperature) for modeling.
2. Split the data into training and test sets with a test size of 30%.
3. Create a Linear Regression model and fit it using the training data.
4. Print the model's coefficients and intercept.
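To see what a 30% split means for this dataset, the sketch below runs `train_test_split` on placeholder arrays with the same 49 rows reported for `weather_data.csv` (the values themselves are assumptions; only the row count matters). scikit-learn rounds the test count up, `ceil(test_size * n_samples)`, so `test_size=0.3` gives 15 test rows, while `test_size=0.2` gives 10.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 49 rows, matching the reported (49, 3) shape.
X = np.arange(49 * 2).reshape(49, 2)
y = np.arange(49)

sizes = {}
for test_size in (0.3, 0.2):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    # The test count is rounded up; the training set gets the remainder.
    sizes[test_size] = (len(X_train), len(X_test))
    print(f"test_size={test_size}: {len(X_train)} train / {len(X_test)} test")
```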

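```python
# Step 1: Select the features and target variable for modeling
from sklearn.model_selection import train_test_split

X = df.drop("daily_temperature", axis=1)   # features: hours_sunlight, humidity_level
y = df["daily_temperature"]                # target

# Step 2: Split the data into training and test sets.
# The task specifies a 30% test set (test_size=0.3); this run used test_size=0.2,
# and the outputs below reflect that 20% split.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
```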

```python
# Step 3: Create a Linear Regression model and fit it using the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Step 4: Print the model's coefficients (hours_sunlight, humidity_level) and intercept
model.coef_, model.intercept_
```

```
(array([ 1.10142876, -0.05136419]), 14.557715605232207)
```
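The fitted equation can be applied by hand to check what the coefficients mean. This sketch plugs the reported `coef_` and `intercept_` into $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ for row 3 of the earlier sample output (6.4 hours of sunlight, 90% humidity, 17.2 °C observed). The helper name `predict_temperature` is hypothetical, and row 3 may well be part of the training set, so this is a sanity check rather than a test of generalization.

```python
import numpy as np

# Coefficients and intercept as reported by the fitted model.
coef = np.array([1.10142876, -0.05136419])   # [hours_sunlight, humidity_level]
intercept = 14.557715605232207

def predict_temperature(hours_sunlight, humidity_level):
    """Hypothetical helper: apply b0 + b1*x1 + b2*x2 by hand."""
    x = np.array([hours_sunlight, humidity_level])
    return intercept + coef @ x

# Row 3 of the sample output: 6.4 h sunlight, 90 % humidity, 17.2 °C observed.
pred = predict_temperature(6.4, 90)
print(round(pred, 2))   # about 16.98
```

Read with the usual "all else held fixed" caveat, each additional hour of sunlight raises the predicted temperature by about 1.10 °C, and each additional percentage point of humidity lowers it by about 0.05 °C.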

### Task 3: Model Evaluation

1. Make predictions on the test set using the trained model.
2. Evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics.
3. Print the MSE and R2 values.
4. Display the first few actual vs. predicted values for the daily temperature.
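For reference, the two metrics are defined as follows, with $y_i$ the observed temperatures, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean observed temperature over the $n$ test rows. MSE is in squared degrees Celsius; $R^2 = 1$ means perfect predictions, and $R^2 = 0$ means no better than always predicting $\bar{y}$.

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
$$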

```python
# Step 1: Make predictions on the test set using the trained model
from sklearn.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test)

# Steps 2-3: Evaluate the model with MSE and R-squared (R2) and print both values
print(f"MSE: {mean_squared_error(y_test, y_pred)}")
print(f"R2: {r2_score(y_test, y_pred)}")
```

```
MSE: 0.07748328698477461
R2: 0.9889189281241384
```
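Because MSE is in squared degrees, its square root (RMSE) is easier to read: the reported MSE of about 0.0775 corresponds to an RMSE of roughly 0.28 °C. With `test_size=0.2`, these figures rest on only 10 test rows, so they are noisy estimates. The helper below is a minimal sketch (the name `regression_report` is hypothetical) that gathers MSE, RMSE, MAE, and R² in one place. It computes RMSE as `np.sqrt` of the MSE rather than depending on a version-specific scikit-learn function, and it is checked on a tiny toy example instead of the real test set.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Hypothetical helper returning common regression metrics."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),   # same units as the target (°C here)
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }

# Toy check: one of three predictions is off by 1.
report = regression_report([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(report)
```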


```python
# Step 4: Display the first few actual vs. predicted values for the daily temperature
comparison = pd.DataFrame({
    "Actual_data": y_test.iloc[:5],   # first five test rows, keeping their original index
    "Predicted_data": y_pred[:5],
})
comparison
```

```
    Actual_data  Predicted_data
13         18.7       18.812024
45         17.0       17.071982
47         21.3       21.315653
44         23.9       23.378709
17         19.6       19.898624
```
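The metric formulas can be checked directly against the five rows in the comparison table. This sketch copies those actual/predicted values, computes residuals, MSE, and R² by hand, and confirms they match `mean_squared_error` and `r2_score`. Because these five rows are only a subset of the test set, the resulting numbers will not equal the full-test-set metrics printed earlier.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# The five actual/predicted pairs from the comparison table.
actual = np.array([18.7, 17.0, 21.3, 23.9, 19.6])
predicted = np.array([18.812024, 17.071982, 21.315653, 23.378709, 19.898624])

residuals = actual - predicted                # positive = model under-predicts
mse_manual = np.mean(residuals ** 2)
ss_res = np.sum(residuals ** 2)               # residual sum of squares
ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

# The hand-rolled formulas agree with scikit-learn's implementations.
assert np.isclose(mse_manual, mean_squared_error(actual, predicted))
assert np.isclose(r2_manual, r2_score(actual, predicted))
print(np.round(residuals, 3))
```

In these five rows the largest miss is index 44, where the model under-predicts by about 0.52 °C.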
