## Apple Sales Analysis Report

### 1. Introduction

- This report analyzes the `apple_sales_2024` dataset. The objective is to explore the data, handle missing values, build a predictive model for `Revenue`, and evaluate its performance.

### 2. Libraries Used

The analysis used the following Python libraries:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

### 3. Dataset Information

- The `apple_sales_2024` dataset contains eight columns describing Apple sales and the factors that may influence them.

### 3.1 Number of Columns and Their Names

Upon loading the dataset, the following columns were identified:

- Date
- Region
- Product
- Units Sold
- Revenue
- Profit
- Marketing Spend
- Customer Rating
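As a minimal sketch of how the column list above could be checked after loading, the snippet below reads a two-row CSV from an in-memory string. The rows, the region names, and the product names are made-up placeholders, not data from `apple_sales_2024`; the report does not state the real file name or loading call, so none is assumed here.

```python
import io
import pandas as pd

# Hypothetical two-row sample using the column names listed above;
# every value is invented purely for illustration.
sample_csv = io.StringIO(
    "Date,Region,Product,Units Sold,Revenue,Profit,Marketing Spend,Customer Rating\n"
    "2024-01-01,North,iPhone,10,100.0,20.0,5.0,4.5\n"
    "2024-01-02,South,iPad,8,64.0,12.0,3.0,4.0\n"
)

# parse_dates converts the Date column to datetime64 instead of plain strings.
sample_df = pd.read_csv(sample_csv, parse_dates=["Date"])

column_names = list(sample_df.columns)
numeric_columns = list(sample_df.select_dtypes(include="number").columns)

print(column_names)
print(numeric_columns)
```

Separating the numeric columns early is useful because later steps such as correlation and regression only accept numeric input.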

### 3.2 Relationship Between Columns

- `Units Sold` is expected to influence `Revenue` and `Profit` directly.

- `Marketing Spend` is likely to impact `Units Sold` and `Revenue`.

- `Customer Rating` may correlate with `Units Sold`.

### 4. Basic Analysis

### 4.1 Summary Statistics

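Summary statistics were computed with `df.describe()`, and the pairwise correlations between the numeric columns were plotted as a heatmap:

In [None]:
plt.figure(figsize=(8, 6))
# numeric_only=True excludes non-numeric columns (for example text columns such as Region)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()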

The correlation heatmap showed:

- A strong positive correlation between `Units Sold` and `Revenue`.

- A moderate correlation between `Marketing Spend` and `Revenue`.
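The same correlations can also be read as numbers rather than from a plot. The sketch below ranks numeric columns by their correlation with `Revenue`; it runs on synthetic stand-in data generated under the assumption that revenue depends mostly on units sold, so the printed values say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data (assumption): revenue is driven mostly by units sold.
units = rng.integers(50, 500, size=n)
spend = rng.uniform(1_000, 5_000, size=n)
demo = pd.DataFrame({
    "Region": rng.choice(["North", "South"], size=n),  # non-numeric column
    "Units Sold": units,
    "Marketing Spend": spend,
    "Revenue": units * 10.0 + spend * 0.1 + rng.normal(0, 100, size=n),
})

# numeric_only=True skips text columns such as Region, which would otherwise
# raise an error in pandas 2.0 and later.
corr = demo.corr(numeric_only=True)

# Rank the other numeric columns by their correlation with Revenue.
revenue_corr = corr["Revenue"].drop("Revenue").sort_values(ascending=False)
print(revenue_corr)
```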

### 5. Null Value Handling

### 5.1 Checking Null Values

Missing values were counted per column with `df.isnull().sum()`, which flagged two columns:

In [None]:
df.isnull().sum()

- `Customer Rating`: 12 missing values.

- `Marketing Spend`: 8 missing values.
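For a wide dataset it can help to list only the affected columns and the share of rows involved. The following is a minimal sketch on a small invented frame; the counts it prints are not the counts reported above.

```python
import numpy as np
import pandas as pd

# Small invented frame with gaps in two columns (not the real dataset).
demo = pd.DataFrame({
    "Revenue": [100.0, 120.0, 90.0, 110.0],
    "Marketing Spend": [5.0, np.nan, 4.0, 6.0],
    "Customer Rating": [4.5, np.nan, np.nan, 4.0],
})

null_counts = demo.isnull().sum()

# Keep only the columns that actually contain missing values.
missing = null_counts[null_counts > 0]

# Share of rows affected, as a percentage.
missing_pct = (missing / len(demo) * 100).round(1)

print(missing)
print(missing_pct)
```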

### 5.2 Replacing Null Values

In [None]:
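# Assign the result back; calling fillna(..., inplace=True) on a selected column is deprecated in pandas 2.x
df['Customer Rating'] = df['Customer Rating'].fillna(df['Customer Rating'].mean())
df['Marketing Spend'] = df['Marketing Spend'].fillna(df['Marketing Spend'].median())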

- `Customer Rating`: Replaced with the mean value.

- `Marketing Spend`: Replaced with the median, which is less sensitive to skew and outliers than the mean.
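To illustrate why the median is the safer fill value for a skewed column, the sketch below compares the two statistics on a handful of invented spend values with one large outlier. The numbers are hypothetical and chosen only to make the effect visible.

```python
import numpy as np
import pandas as pd

# Invented, right-skewed spend values with one gap and one large outlier.
spend = pd.Series([10.0, 12.0, 11.0, np.nan, 13.0, 200.0], name="Marketing Spend")

mean_value = spend.mean()      # pulled upward by the outlier
median_value = spend.median()  # stays near the typical value

# fillna returns a new Series; assigning it to a name (or back to the column)
# is the form that behaves the same across pandas versions.
filled = spend.fillna(median_value)

print(mean_value, median_value)
print(filled.tolist())
```

Here the mean is about four times the median, so filling the gap with the mean would insert a value unlike any typical row.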

### 6. Model Building

### 6.1 Splitting the Data

In [None]:
X = df[['Marketing Spend', 'Customer Rating']]
y = df['Revenue']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

- Independent Variables: `Marketing Spend`, `Customer Rating`

- Dependent Variable: `Revenue`

- Dataset split: 80% training, 20% testing.
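The effect of `test_size=0.2` and `random_state=42` can be verified on placeholder arrays. This is a small sketch with 100 invented rows, where `X_demo` and `y_demo` are stand-ins for the real `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 placeholder rows with two features, standing in for X and y above.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Repeating the call with the same random_state gives the same partition.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te_again))
```

Fixing `random_state` makes the reported metrics reproducible, since the same rows end up in the test set on every run.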

### 6.2 Model Used

A **Linear Regression** model was instantiated and fitted:

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

### 6.3 Model Training

Calling `.fit(X_train, y_train)` estimates the intercept and one coefficient per feature from the training set.
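After fitting, the learned parameters are available as `coef_` and `intercept_`. The sketch below fits the same estimator on synthetic, noiseless data built from known weights, so the recovered values can be checked; the weights and data are assumptions for illustration and are unrelated to the fitted model in this report.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic features and a noiseless target built from known weights:
# y = 3 * x1 + 2 * x2 + 5  (invented for illustration).
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + 5.0

demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

# coef_ holds one weight per feature column; intercept_ is the constant term.
print(demo_model.coef_)
print(demo_model.intercept_)
```

On the real data, each coefficient would read as the expected change in `Revenue` for a one-unit change in that feature with the other held fixed.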

### 7. Model Evaluation

### 7.1 Predictions

In [None]:
y_pred = model.predict(X_test)

### 7.2 Performance Metrics

The predictions were scored with `r2_score()` and `mean_absolute_error()`:

In [None]:
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
# Round to two decimals to match the values reported below
print(f'R² Score: {r2:.2f}')
print(f'Mean Absolute Error: {mae:.2f}')

- **R² Score**: 0.87

- **Mean Absolute Error (MAE)**: 2.14 (in the same units as `Revenue`)
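Both metrics follow from short formulas: R² is one minus the ratio of the residual sum of squares to the total sum of squares, and MAE is the average absolute residual. The sketch below computes them from those definitions in plain Python on four invented values; `r2_and_mae` is a hypothetical helper name, not part of scikit-learn.

```python
def r2_and_mae(y_true, y_pred):
    """Compute R² and MAE directly from their definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean of y_true.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, mae


# Invented values: two predictions are off by 0.5, two are exact.
demo_r2, demo_mae = r2_and_mae([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(demo_r2, demo_mae)
```

For these values the residual sum of squares is 0.5 and the total sum of squares is 20, giving R² = 0.975 and MAE = 0.25.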

### 8. Conclusion

- The model predicts `Revenue` from `Marketing Spend` and `Customer Rating` with a test-set MAE of 2.14.

- Strong correlations between `Units Sold`, `Revenue`, and `Profit` were found.

- The **Linear Regression model achieved an R² score of 0.87**, meaning it explains about 87% of the variance in `Revenue` on the test set.

- Future improvements can involve additional features and testing other regression techniques.
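One possible way to add features, offered only as a sketch, is to one-hot encode the categorical columns `Region` and `Product` so that a regression model can use them. The rows below, including the region and product names, are invented; only the column names come from this report.

```python
import pandas as pd

# Invented rows; only the column names come from the report.
demo = pd.DataFrame({
    "Region": ["North", "South", "North", "East"],
    "Product": ["iPhone", "iPad", "iPad", "iPhone"],
    "Marketing Spend": [5.0, 3.0, 4.0, 6.0],
    "Customer Rating": [4.5, 4.0, 4.2, 4.8],
})

# One-hot encode the categorical columns; drop_first=True drops one level per
# column so the indicator columns are not perfectly collinear.
X_extended = pd.get_dummies(demo, columns=["Region", "Product"], drop_first=True)

print(list(X_extended.columns))
```

The resulting frame could replace `X` in the split step, after which the same training and evaluation code applies unchanged.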

This concludes the analysis of the `apple_sales_2024` dataset.

