# Linear Regression for Ice Cream Sales

This notebook demonstrates a simple linear regression model to predict ice cream sales based on temperature.

## 1. Import Libraries

First, we import the necessary Python libraries for data manipulation, model training, and evaluation.

In [None]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

## 2. Load Dataset

We load the dataset from a CSV file named `temperature_data.csv`. This file is expected to be in the same directory as the notebook. It contains two columns: 'Temperature_Celsius' and 'Ice_Cream_Sales'.

In [None]:
# Load the dataset
# The CSV file 'temperature_data.csv' is expected to be in the same directory as this notebook.
try:
    data = pd.read_csv('temperature_data.csv')
    print("Dataset loaded successfully.")
    # Display the first few rows of the dataframe to verify
    # print(data.head())
except FileNotFoundError as err:
    # Re-raise with a clearer message. This stops notebook execution if the file is missing.
    raise FileNotFoundError(
        "'temperature_data.csv' not found. Please ensure the file exists in the same directory as the notebook."
    ) from err
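
Because the rest of the notebook depends on the two column names described above, it can help to check for them right after loading. The following is a minimal sketch: `missing_columns` is a hypothetical helper, not part of pandas, and the tiny frame is made-up data standing in for the real CSV.

```python
import pandas as pd

# Column names taken from the dataset description above.
REQUIRED_COLUMNS = ["Temperature_Celsius", "Ice_Cream_Sales"]


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Made-up frame standing in for the real CSV; note the misnamed second column.
toy = pd.DataFrame({"Temperature_Celsius": [20.0, 25.0], "Sales": [110.0, 160.0]})
print(missing_columns(toy))
```

In the notebook itself, the same check could be run as `missing_columns(data)` and an error raised if the returned list is not empty.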

## 3. Define Features and Target

We define our feature (X) and target (y) variables. 
- 'Temperature_Celsius' is the independent variable (feature) we'll use to predict ice cream sales.
- 'Ice_Cream_Sales' is the dependent variable (target) we want to predict.

In [None]:
# Define features (X) and target (y)
# 'Temperature_Celsius' is the feature we'll use to predict 'Ice_Cream_Sales'.
X = data[['Temperature_Celsius']]
y = data['Ice_Cream_Sales']

# print("Features (X):")
# print(X.head())
# print("\nTarget (y):")
# print(y.head())
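
The double brackets in `data[['Temperature_Celsius']]` matter: scikit-learn estimators expect a 2D feature matrix, while the target can be 1D. A small sketch with made-up values illustrates the difference in shape between the two selections.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Temperature_Celsius": [18.0, 22.0, 30.0],
    "Ice_Cream_Sales": [90.0, 130.0, 210.0],
})

X_toy = toy[["Temperature_Celsius"]]  # list of columns -> DataFrame (2D)
y_toy = toy["Ice_Cream_Sales"]        # single column label -> Series (1D)

print(type(X_toy).__name__, X_toy.shape)
print(type(y_toy).__name__, y_toy.shape)
```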

## 4. Split Data into Training and Testing Sets

To evaluate our model's performance on unseen data, we split the dataset into two parts:
- A training set (80% of the data here) used to train the model.
- A testing set (the remaining 20%) used to evaluate the trained model.

`random_state` is set to ensure that the split is the same every time we run the code, making our results reproducible.

In [None]:
# Split the data into training and testing sets
# We use 80% of the data for training and 20% for testing.
# random_state is set for reproducibility of the split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# print(f"Training set size: {X_train.shape[0]} samples")
# print(f"Testing set size: {X_test.shape[0]} samples")
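
To see what `test_size` and `random_state` do in isolation, here is a minimal sketch on ten made-up rows (the arrays `X_toy` and `y_toy` are illustrative, not the notebook's data). It splits the same data twice with the same seed and checks that both splits agree.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples; each target equals its feature so pairing is easy to check.
X_toy = np.arange(10).reshape(-1, 1)
y_toy = np.arange(10)

# Two independent calls with the same random_state.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

print(len(X_tr), len(X_te))          # 80% / 20% of ten rows
print(np.array_equal(X_te, X_te2))   # same seed, same split
print(np.array_equal(X_tr[:, 0], y_tr))  # features and targets stay paired
```

Changing or removing `random_state` would generally produce a different split on each run, which makes metrics harder to compare across runs.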

## 5. Initialize and Train the Model

We initialize a `LinearRegression` model, which fits a straight line (sales = slope × temperature + intercept) using ordinary least squares.
Then, we train the model using our training data (`X_train`, `y_train`). Training finds the slope and intercept that minimize the sum of squared differences between the actual and predicted sales in the training set.

In [None]:
# Initialize the Linear Regression model
model = LinearRegression()

# Train the model using the training data
# The model learns the relationship between temperature and ice cream sales from the training data.
model.fit(X_train, y_train)

print("Linear Regression model trained successfully.")
# To see the learned parameters:
# print(f"Coefficient (slope): {model.coef_[0]}")
# print(f"Intercept: {model.intercept_}")
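
For a single feature, the least-squares line has a closed form: the slope is the covariance of temperature and sales divided by the variance of temperature, and the intercept follows from the two means. The sketch below computes both by hand on made-up numbers and compares them with what `LinearRegression` learns. The arrays `temps` and `sales` are illustrative assumptions, not the notebook's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up temperatures (Celsius) and sales, for illustration only.
temps = np.array([15.0, 20.0, 25.0, 30.0, 35.0])
sales = np.array([80.0, 125.0, 160.0, 215.0, 240.0])

# Closed-form simple linear regression.
x_mean, y_mean = temps.mean(), sales.mean()
slope = np.sum((temps - x_mean) * (sales - y_mean)) / np.sum((temps - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# The same fit via scikit-learn; the feature must be reshaped to 2D.
toy_model = LinearRegression().fit(temps.reshape(-1, 1), sales)

print(np.isclose(slope, toy_model.coef_[0]))
print(np.isclose(intercept, toy_model.intercept_))
```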

## 6. Make Predictions

Once the model is trained, we can use it to make predictions on new, unseen data. Here, we use the test set (`X_test`) to see how well our model generalizes.

In [None]:
# Make predictions on the test data
# The model uses the learned relationship to predict ice cream sales for the test set temperatures.
y_pred = model.predict(X_test)

# print("Predictions for the test set:")
# print(y_pred)

## 7. Evaluate the Model

We evaluate the model's performance using two common metrics for regression tasks:
- **Mean Squared Error (MSE)**: Measures the average of the squares of the errors (the difference between actual and predicted values). A lower MSE indicates a better fit.
- **R-squared (R²)**: Represents the proportion of the variance in the dependent variable (ice cream sales) that is predictable from the independent variable (temperature). A value of 1 indicates a perfect fit and 0 indicates that the model does no better than always predicting the mean. On a test set, R² can also be negative, which means the model performs worse than that mean baseline.

In [None]:
# Evaluate the model
# Mean Squared Error (MSE) measures the average squared difference between actual and predicted values.
# Lower MSE indicates a better fit.
mse = mean_squared_error(y_test, y_pred)

# R-squared (R²) is the proportion of the variance in the dependent variable that is predictable from the independent variable.
# 1 indicates a perfect fit, 0 matches always predicting the mean, and negative values are possible on test data.
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R²): {r2:.2f}")
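
Both metrics are short formulas, and computing them by hand can make the definitions above concrete. The sketch below uses made-up actual and predicted values (not output from the notebook's model), checks the manual results against scikit-learn, and shows a case where R² drops below zero.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted sales, for illustration only.
y_true = np.array([100.0, 150.0, 200.0])
y_hat = np.array([110.0, 140.0, 210.0])

# MSE: mean of squared errors. Its square root (RMSE) is in the same units as sales.
mse_manual = np.mean((y_true - y_hat) ** 2)
rmse_manual = np.sqrt(mse_manual)

# R²: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(mse_manual, mean_squared_error(y_true, y_hat)))
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))

# Predictions that run opposite to the data are worse than predicting the mean.
y_bad = np.array([200.0, 150.0, 100.0])
r2_bad = r2_score(y_true, y_bad)
print(r2_bad < 0)
```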

## 8. Model Demonstration

This section shows how to use the trained model to predict ice cream sales for a new, specific temperature value (e.g., 25 degrees Celsius).

In [None]:
# Demonstrate the model with a sample temperature
# This section shows how to use the trained model to predict sales for a new temperature value.
# The input must be 2D. Using a DataFrame with the training column name avoids the
# feature-name warning that recent scikit-learn versions emit for a bare list such as [[25]].
sample_temperature = pd.DataFrame({'Temperature_Celsius': [25]})  # Temperature in Celsius
predicted_sales_for_sample = model.predict(sample_temperature)

print("\n--- Model Demonstration ---")
print(f"Sample Temperature (Celsius): {sample_temperature['Temperature_Celsius'].iloc[0]}")
print(f"Predicted Ice Cream Sales: {predicted_sales_for_sample[0]:.2f}")
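
The same call works for several temperatures at once. The sketch below is self-contained, so it trains a throwaway model on made-up, perfectly linear data rather than reusing `model`. The names `toy_model` and `new_temps` and the `Predicted_Sales` column are illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up, perfectly linear data (sales = 8 * temperature - 40), for illustration only.
toy = pd.DataFrame({
    "Temperature_Celsius": [15.0, 20.0, 25.0, 30.0],
    "Ice_Cream_Sales": [80.0, 120.0, 160.0, 200.0],
})
toy_model = LinearRegression().fit(toy[["Temperature_Celsius"]], toy["Ice_Cream_Sales"])

# New temperatures go in a DataFrame with the same column name used in training.
new_temps = pd.DataFrame({"Temperature_Celsius": [18.0, 27.0, 33.0]})
new_temps_pred = toy_model.predict(new_temps)

# Attach the predictions as a new column for easy reading.
result = new_temps.assign(Predicted_Sales=new_temps_pred)
print(result)
```

Note that 33 °C lies outside the range of the toy training data. A linear model will extrapolate without complaint, so predictions far from the observed temperatures deserve extra caution.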