In [6]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load data from CSV file
sales_data = pd.read_csv('sales_data.csv')

# Preprocess the data
X = sales_data['Year'].values.reshape(-1, 1)
y = sales_data['Sales'].values.reshape(-1, 1)

# Split data into training and testing sets
split_index = int(len(X) * 0.8)
X_train, y_train = X[:split_index], y[:split_index]
X_test, y_test = X[split_index:], y[split_index:]

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Evaluate the performance of the model
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"Mean Absolute Error: {mae:.2f}")
print(f"R2 Score: {r2:.2f}")

# Predict sales for next year
next_year_sales = model.predict([[2023]])
print(f"Predicted sales for next year: {next_year_sales[0][0]:.2f}")

Mean Squared Error: 223345622.59
Mean Absolute Error: 14718.64
R2 Score: -0.07
Predicted sales for next year: 31449.11
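
The negative R2 Score above can look like a mistake, so here is a minimal sketch of how the metric is defined: one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual values. The helper `r2` and the three small lists are hypothetical and are not taken from sales_data.csv; they only illustrate that the score is 1 for perfect predictions, 0 for always predicting the mean, and negative when the predictions are worse than that.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical test-set values, not taken from sales_data.csv
actual = [1.0, 2.0, 3.0]

perfect = r2(actual, [1.0, 2.0, 3.0])        # predictions match exactly
mean_baseline = r2(actual, [2.0, 2.0, 2.0])  # always predict the mean
wrong_trend = r2(actual, [3.0, 2.0, 1.0])    # trend points the wrong way

print(perfect, mean_baseline, wrong_trend)   # → 1.0 0.0 -3.0
```

Read this way, a score of -0.07 suggests the fitted line does slightly worse on the test rows than a constant prediction equal to their mean.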


## Here is a breakdown of the code:

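- Import necessary libraries: The code imports pandas for loading the data, NumPy (imported but not used directly in this cell), LinearRegression from scikit-learn for the model, and three metric functions from sklearn.metrics (mean_absolute_error, mean_squared_error, and r2_score) for evaluation.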

- Load the data: The sales data is loaded from a CSV file using the pandas read_csv function and stored in a pandas dataframe called sales_data.

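- Prepare the features and target: The Year column is used as the single feature (X) and the Sales column as the target variable (y). Both are converted to NumPy arrays and reshaped with reshape(-1, 1) into two-dimensional column arrays, the shape scikit-learn expects for the features.

- Split the data: The first 80% of the rows become the training set and the remaining 20% the testing set. The split is positional, not random, so if the rows are ordered by year the model is tested on the most recent years.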

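- Create a linear regression model and fit the data: A linear regression model is created using LinearRegression() from scikit-learn. The fit() method is then used to fit the model to the training set, which estimates the slope and intercept of the line relating Year to Sales.

- Evaluate the model: predict() is called on the testing set, and the predictions are compared with the actual values using mean squared error, mean absolute error, and the R2 score. The R2 score of -0.07 means the fitted line does slightly worse on the test rows than simply predicting their mean, so the forecast in the next step should be treated with caution.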

- Predict sales for next year: The model is used to predict sales for the next year by calling the predict() method with the nested list [[2023]], which represents a single row with one feature. The year is hard-coded, so it must be updated when the data changes. The predicted sales value is stored in the variable next_year_sales.

- Print predicted sales: The predicted sales value is printed to the console using print(). Because y was reshaped into a column, predict() returns a two-dimensional array, so the value is read with next_year_sales[0][0] and formatted to two decimal places.
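
The prediction step can also be written so that the forecast year is derived from the data instead of being typed in. The following is a minimal sketch, assuming the years are available as integers; the `years` and `sales` arrays are hypothetical stand-ins for the Year and Sales columns, and the target is kept one-dimensional so that a single index is enough to read the result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical yearly data; the real values live in sales_data.csv
years = np.array([2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022])
sales = np.array([100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0])

X = years.reshape(-1, 1)  # scikit-learn expects two-dimensional features
y = sales                 # a one-dimensional target gives one-dimensional predictions

model = LinearRegression().fit(X, y)

# Derive the forecast year from the data instead of hard-coding it
next_year = int(years.max()) + 1
prediction = model.predict([[next_year]])[0]
print(f"Predicted sales for {next_year}: {prediction:.2f}")
```

This sketch fits on all rows for brevity; in the notebook the model is fitted on the training rows only, and the same two changes (the derived year and the one-dimensional target) would apply there.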