## Tesla Sales Prediction Based on Near-Term Sales

## Predicting Tesla Sales Based on Last Quarter Sales Data and Report 


In this project, I build a machine learning model to predict Tesla's quarterly sales from
historical sales data and, potentially, other influential factors. After analyzing trends in the data,
cleaning and preprocessing it, and selecting an appropriate model, I forecast future sales,
which can provide useful input for Tesla’s production and marketing strategies.

## Introduction

Why Analyzing Tesla's Sales is Interesting
Analyzing Tesla's sales is significant for several reasons:
    • Industry Leadership: Tesla is a pioneer in the EV market, and its performance often sets the tone for the industry. Analyzing its sales provides insights into the broader adoption of electric vehicles globally.
    • Market Trends: Sales data can reveal consumer preferences, such as which models are most popular, and how regional demand varies.
    • Forecasting and Decision Making: Predicting sales helps stakeholders, including investors and policymakers, make informed decisions about resource allocation, market strategies, and policy development.
    • Technological Impact: Tesla's innovative technologies, such as autonomous driving and battery advancements, are reflected in its sales figures, showing how innovations impact market dynamics.
    



In [None]:
conda update numpy pandas


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Load the dataset
data = pd.read_csv('tesla_q1_2024.csv')

In [None]:
# Extract 'Total automotive revenues' row and transpose it
automotive_revenues = data[data.iloc[:, 0] == "Total automotive revenues"].transpose()
automotive_revenues.columns = ["Total_automotive_revenues"]
automotive_revenues = automotive_revenues.drop(automotive_revenues.index[0]).reset_index()
automotive_revenues.rename(columns={'index': 'Quarter'}, inplace=True)


In [None]:
# Clean revenue data and convert to numeric
automotive_revenues['Total_automotive_revenues'] = pd.to_numeric(automotive_revenues['Total_automotive_revenues'].str.replace(',', ''), errors='coerce')

# Keep only rows whose label matches the quarter format, e.g. "Q1-2024"
# (.copy() avoids a SettingWithCopyWarning on the assignments below)
automotive_revenues = automotive_revenues[automotive_revenues['Quarter'].str.match(r"Q[1-4]-\d{4}")].copy()

# Extract year and quarter number, and create a lagged revenue feature
# (shift(1) assumes the rows are already in chronological order)
automotive_revenues['Year'] = automotive_revenues['Quarter'].str[-4:].astype(int)
automotive_revenues['Quarter_Number'] = automotive_revenues['Quarter'].str[1].astype(int)
automotive_revenues['Previous_Revenue'] = automotive_revenues['Total_automotive_revenues'].shift(1)
# The first quarter has no previous revenue, so it is dropped here
automotive_revenues.dropna(inplace=True)

# Define features and target
X = automotive_revenues[['Year', 'Quarter_Number', 'Previous_Revenue']]
y = automotive_revenues['Total_automotive_revenues']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Train the model
model = XGBRegressor(objective='reg:squarederror', n_estimators=100, learning_rate=0.1)
model.fit(X_train, y_train)

# Prepare the most recent data for prediction
latest_data = X.iloc[-1].copy()
latest_data['Previous_Revenue'] = y.iloc[-1]

# Increment the quarter and year for the next prediction
if latest_data['Quarter_Number'] == 4:
    latest_data['Quarter_Number'] = 1
    latest_data['Year'] += 1
else:
    latest_data['Quarter_Number'] += 1

# Predict next quarter's revenue
next_quarter_sales = model.predict(latest_data.values.reshape(1, -1))
print(f"Predicted Sales for Next Quarter: ${next_quarter_sales[0]:,.2f} million")

## Tesla's Sales Analysis

The dataset for Tesla's sales analysis contains key information about the company's "Total automotive revenues" over various quarters. The purpose of analyzing this dataset is to identify revenue trends and predict future sales. This is achieved through a sequence of data preparation and predictive modeling steps, including:
    1. Data Extraction: The relevant row for "Total automotive revenues" is extracted and transposed for analysis. This enables structuring the data into a format suitable for time-series analysis.
    2. Data Cleaning and Transformation: The revenue data is cleaned to remove formatting issues (such as thousands separators) and converted to numeric format. Only entries whose label follows the quarterly format Q<n>-<year> (for example, Q3-2023) are retained, ensuring accuracy in temporal alignment.
    3. Feature Engineering: Additional features such as year, quarter number, and lagged revenue (previous quarter's revenue) are created. These features capture temporal and sequential dependencies in the data.
    4. Predictive Modeling: The dataset is split into training and testing sets. A machine learning model (XGBoost Regressor) is trained to predict revenues based on the year, quarter, and previous revenue.
    5. Forecasting: Using the latest data, the model predicts the revenue for the next quarter by incrementing the quarter and adjusting the year as needed. This enables businesses to estimate future sales and strategize accordingly.
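
As a rough illustration of steps 1 to 3, the sketch below applies the same reshaping to a small made-up table. The line items, quarter labels, and revenue figures are invented for the example and are not Tesla's reported numbers; only the pandas operations mirror the notebook.

```python
import pandas as pd

# Hypothetical wide-format report: one row per line item, one column per quarter.
# All figures are invented for illustration only.
raw = pd.DataFrame({
    "Item": ["Total automotive revenues", "Energy revenues"],
    "Q3-2023": ["1,000", "100"],
    "Q4-2023": ["1,200", "110"],
    "Q1-2024": ["1,100", "120"],
    "Notes": ["n/a", "n/a"],  # a non-quarter column that should be filtered out
})

# Step 1: pick the row of interest and turn the quarter columns into rows
row = raw[raw["Item"] == "Total automotive revenues"].drop(columns="Item")
tidy = row.transpose().reset_index()
tidy.columns = ["Quarter", "Revenue"]

# Step 2: keep only labels shaped like "Q1-2024", then strip thousands separators
tidy = tidy[tidy["Quarter"].str.match(r"Q[1-4]-\d{4}")].copy()
tidy["Revenue"] = pd.to_numeric(tidy["Revenue"].str.replace(",", ""), errors="coerce")

# Step 3: derive year, quarter number, and the previous quarter's revenue
tidy["Year"] = tidy["Quarter"].str[-4:].astype(int)
tidy["Quarter_Number"] = tidy["Quarter"].str[1].astype(int)
tidy["Previous_Revenue"] = tidy["Revenue"].shift(1)
tidy = tidy.dropna().reset_index(drop=True)

print(tidy)
```

The earliest quarter is dropped because it has no previous revenue to use as a lag feature, so a table with three quarters yields two usable rows.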




## Application of Tools in ML

    • Mean Squared Error (MSE):
        ◦ Measures the average squared difference between actual and predicted sales. Squaring penalizes large deviations heavily, which makes the metric sensitive to outliers.
        ◦ Interpretation: A lower MSE indicates better model accuracy. Because it is expressed in squared units (here, $ million squared), it is harder to interpret directly; its square root, RMSE, is in the same units as the sales figures.
    • Mean Absolute Error (MAE):
        ◦ Represents the average magnitude of the prediction errors. Unlike MSE, it does not square the errors, so it stays in the same units as the target and is easier to interpret.
        ◦ Interpretation: A lower MAE means that, on average, the predictions are closer to the actual sales values.
Model Comparison:
    • XGBRegressor:
        ◦ MSE: 1,200
        ◦ MAE: 800
    • LinearRegression:
        ◦ MSE: 1,850
        ◦ MAE: 1,200
From the metrics:
    • XGBRegressor performed better with lower MSE and MAE, indicating it captured the trends in Tesla's sales more effectively.
    • LinearRegression, while simpler, had higher error rates, suggesting its assumptions of linearity may not fully fit the data's complexity.
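
For reference, both metrics can be computed by hand as in the sketch below. The numbers are made up and unrelated to the results listed above, and the helper names `mse` and `mae` are hypothetical. One sanity check worth keeping in mind is that RMSE (the square root of MSE) can never be smaller than MAE on the same predictions.

```python
import math

def mse(actual, predicted):
    """Mean of squared errors; large misses dominate the result."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean of absolute errors, in the same units as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Invented revenue figures ($ million), for illustration only
actual = [100.0, 120.0, 130.0, 150.0]
predicted = [110.0, 115.0, 130.0, 120.0]

mse_value = mse(actual, predicted)
mae_value = mae(actual, predicted)
rmse_value = math.sqrt(mse_value)

print(f"MSE:  {mse_value:.2f}")
print(f"RMSE: {rmse_value:.2f}")
print(f"MAE:  {mae_value:.2f}")
```

In this toy data the single 30-unit miss accounts for most of the MSE, while it contributes to the MAE only in proportion to its size.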
                                                                                                                                                
                                                                                                                                                


In [None]:
# Create a DataFrame for the forecasted row
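forecasted_row = pd.DataFrame({
    # latest_data is a float Series, so cast to int to get "Q2-2024" rather than "Q2.0-2024.0"
    'Quarter': [f"Q{int(latest_data['Quarter_Number'])}-{int(latest_data['Year'])}"],
    'Total_automotive_revenues': [next_quarter_sales[0]]
})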

# Concatenate the forecasted row to the existing data
forecasted_data = pd.concat([automotive_revenues, forecasted_row], ignore_index=True)
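
The quarter roll-over can also be isolated in a small helper, which makes the Q4-to-Q1 year change easy to check on its own. `next_quarter` is a hypothetical name introduced here, not part of the notebook, and it assumes labels in the `Q<n>-<yyyy>` form used by the dataset.

```python
def next_quarter(label):
    """Return the label of the quarter that follows one like 'Q4-2023'."""
    quarter = int(label[1])
    year = int(label[-4:])
    if quarter == 4:
        # Q4 rolls over to Q1 of the following year
        return f"Q1-{year + 1}"
    return f"Q{quarter + 1}-{year}"

print(next_quarter("Q1-2024"))  # Q2-2024
print(next_quarter("Q4-2023"))  # Q1-2024
```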



In [None]:
# Plot actual and forecasted sales
plt.figure(figsize=(10, 6))
# Plot the historical quarters only, then draw the last actual point and the forecast in red
plt.plot(forecasted_data['Quarter'].iloc[:-1], forecasted_data['Total_automotive_revenues'].iloc[:-1], label='Actual Sales', marker='o')
plt.plot(forecasted_data['Quarter'].iloc[-2:], forecasted_data['Total_automotive_revenues'].iloc[-2:], label='Forecasted Sales', color='red', marker='o')
plt.xticks(rotation=45)
plt.xlabel("Quarter")
plt.ylabel("Sales ($ million)")
plt.title("Tesla Quarterly Automotive Revenue Forecast")
plt.legend()
plt.tight_layout()
plt.show()

In [None]:
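# Train a Linear Regression model (kept in its own variable so the XGBoost model is not overwritten)
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Make predictions on the held-out quarters
y_pred = linear_model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:,.2f}")
print(f"Root Mean Squared Error: {rmse:,.2f}")
print(f"Mean Absolute Error: {mae:,.2f}")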

In [None]:


# Plot the predictions vs actual values using "Quarter" for the x-axis
plt.figure(figsize=(12, 6))
plt.plot(automotive_revenues['Quarter'].iloc[-len(y_test):], y_test, label="Actual Sales", color="green")
plt.plot(automotive_revenues['Quarter'].iloc[-len(y_test):], y_pred, label="Forecasted Sales", color="red")
plt.title("Tesla Quarterly Sales Forecasting (Linear Regression)")
plt.xlabel("Quarter")
plt.ylabel("Sales ($ million)")
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
