### Problem Statement

You are a data scientist / AI engineer at a meteorological consulting firm. You have been provided with a dataset named **`"weather_data.csv"`**, which includes detailed records of various weather conditions. The dataset comprises the following columns:

- `hours_sunlight`: The total number of hours of sunlight received in a day.
- `humidity_level`: The humidity level as a percentage.
- `daily_temperature`: The temperature recorded at the end of the day in degrees Celsius.

Your task is to use this dataset to build a linear regression model to predict the daily temperature based on the hours of sunlight and humidity level. You will need to split the data into training and test sets, train the model, and evaluate its performance using appropriate metrics.
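
Written out, the model this task calls for has the form below. The symbols $\beta_0$, $\beta_1$, $\beta_2$ and $\varepsilon$ are notation introduced here for illustration and are not part of the dataset: $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the coefficients learned for the two features, and $\varepsilon$ is the error the linear fit does not explain.

$$
\text{daily\_temperature} = \beta_0 + \beta_1 \cdot \text{hours\_sunlight} + \beta_2 \cdot \text{humidity\_level} + \varepsilon
$$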

**Import Necessary Libraries**

In [4]:
# Import necessary libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import warnings
warnings.filterwarnings("ignore")

### Task 1: Data Preparation and Exploration
1. Import the data from the `"weather_data.csv"` file and store it in a variable `df`.
2. Display the number of rows and columns in the dataset.
3. Display the first few rows of the dataset to get an overview.
4. Check for any missing values in the dataset.

In [5]:
# Step 1: Import the data from the "weather_data.csv" file and store it in a variable 'df'
df = pd.read_csv("weather_data.csv")

# Step 2: Display the number of rows and columns in the dataset
print("Number of rows and columns:", df.shape)

# Step 3: Display the first few rows of the dataset to get an overview
print("First few rows of the dataset:")
df.head()

Number of rows and columns: (49, 3)
First few rows of the dataset:


   hours_sunlight  humidity_level  daily_temperature
0            10.5              65               22.3
1             9.2              70               21.0
2             7.8              80               18.5
3             6.4              90               17.2
4             8.1              75               19.4


In [6]:
# Step 4: Check for any missing values in the dataset
df.isna().sum()

hours_sunlight       0
humidity_level       0
daily_temperature    0
dtype: int64

### Task 2: Train a Linear Regression Model

1. Select the features (hours_sunlight, humidity_level) and the target variable (daily_temperature) for modeling.
2. Split the data into training and test sets with a test size of 30%.
3. Create a Linear Regression model and fit it using the training data.
4. Print the model's coefficients and intercept.

In [14]:
# Step 1: Select the features and target variable for modeling
features = ['hours_sunlight', 'humidity_level']
X = df[features]
y = df['daily_temperature']


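# Step 2: Split the data into training and test sets.
# Note: the task asks for a 30% test size, but this run uses test_size=0.2
# (39 training rows, 10 test rows), and every output below comes from that split.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22)

# Display the training targets
y_train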


34    22.4
40    22.6
16    16.8
48    19.1
13    18.7
23    21.2
11    22.5
9     17.0
10    20.2
39    20.5
21    20.1
7     21.7
28    17.4
35    17.3
19    18.3
6     16.0
17    19.6
37    18.6
2     18.5
42    18.9
27    20.9
15    24.3
5     24.0
41    16.6
47    21.3
24    18.8
14    17.5
18    20.8
29    21.0
45    17.0
31    16.7
8     19.0
20    22.7
43    21.1
38    16.9
36    21.5
0     22.3
44    23.9
4     19.4
Name: daily_temperature, dtype: float64
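
The row counts above (39 training rows out of 49) follow from how a fractional `test_size` is turned into a whole number of rows. The sketch below assumes the test count is rounded up and the remainder goes to training, which is consistent with the 39/10 split seen here. `split_sizes` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Hypothetical helper: row counts for a fractional test_size.

    Assumes the test count is rounded up and the rest is used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(49, 0.2))  # (39, 10)
print(split_sizes(49, 0.3))  # (34, 15)
```

Under the same assumption, a 30% test size on this 49-row dataset would leave 34 rows for training and hold out 15 for testing.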

In [34]:
# Step 3: Create a Linear Regression model and fit it using the training data
model = LinearRegression()
model.fit(x_train, y_train)

# Step 4: Print the model's coefficients and intercept
model.coef_, model.intercept_

(array([ 1.08878094, -0.05630413]), np.float64(15.039522468372418))
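
To make these numbers concrete, the fitted equation can be applied by hand. The sketch below copies the coefficients and intercept from the output above and assumes they are listed in the same order as `features` (sunlight first, humidity second). `predict_temperature` is a hypothetical helper, not part of scikit-learn.

```python
# Coefficients and intercept copied from the model output above.
COEF_SUNLIGHT = 1.08878094
COEF_HUMIDITY = -0.05630413
INTERCEPT = 15.039522468372418


def predict_temperature(hours_sunlight, humidity_level):
    """Hypothetical helper: apply the fitted linear equation by hand."""
    return INTERCEPT + COEF_SUNLIGHT * hours_sunlight + COEF_HUMIDITY * humidity_level


# Row 1 of the dataset: 9.2 hours of sunlight, 70% humidity, 21.0 °C recorded.
prediction = predict_temperature(9.2, 70)
print(round(prediction, 3))  # 21.115
```

Read this way, each additional hour of sunlight adds about 1.09 °C to the predicted temperature, and each additional percentage point of humidity lowers it by about 0.06 °C, holding the other feature fixed.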

### Task 3: Model Evaluation

1. Make predictions on the test set using the trained model.
2. Evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics.
3. Print the MSE and R2 values.
4. Display the first few actual vs. predicted values for the daily temperature.

In [30]:
# Step 1: Make predictions on the test set using the trained model
y_predict = model.predict(x_test)
# Step 2: Evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics (computed in the next cell)



In [32]:
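# Steps 2 and 3: Compute and print the MSE and R2 values
mse = mean_squared_error(y_test, y_predict)
r2 = r2_score(y_test, y_predict)

print(mse, r2)

# Step 4: Display the actual vs. predicted values for the daily temperature
# (the test set has only 10 rows, so all of them are printed:
# the predictions array first, then the actual values)
print(y_predict, y_test)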

0.11189020412832011 0.9867359519028498
[20.01877706 14.54130093 16.07678472 19.74471674 18.20923295 17.21813963
 16.94034916 23.57783099 23.36380496 21.11501832] 32    19.7
12    15.0
25    16.2
46    19.3
33    18.4
22    17.1
3     17.2
30    24.1
26    23.8
1     21.0
Name: daily_temperature, dtype: float64
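
As a cross-check on the two metrics, they can be recomputed from the printed values using their usual definitions: MSE is the mean of the squared residuals, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below uses only the standard library. The two helper functions are hypothetical stand-ins written for illustration, and the inputs are the rounded numbers from the output above, so the results agree with scikit-learn's only to several decimal places.

```python
# Predictions and actual values copied from the printed output above.
y_pred = [20.01877706, 14.54130093, 16.07678472, 19.74471674, 18.20923295,
          17.21813963, 16.94034916, 23.57783099, 23.36380496, 21.11501832]
y_true = [19.7, 15.0, 16.2, 19.3, 18.4, 17.1, 17.2, 24.1, 23.8, 21.0]


def mean_squared_error_manual(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_score_manual(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


mse = mean_squared_error_manual(y_true, y_pred)
r2 = r2_score_manual(y_true, y_pred)
print(round(mse, 4), round(r2, 4))  # 0.1119 0.9867
```

Taking the square root of the MSE gives a typical error of roughly 0.33 °C on these 10 test rows, which is easier to interpret than the squared value.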
