## Predicting Stock Prices of Ford Using SVM Models
This project explores the application of Support Vector Machines (SVMs) to predict stock prices of Ford Motor Company, employing both Support Vector Regression (SVR) and Support Vector Classification (SVC). The goal is to forecast future stock prices using regression techniques and to classify stock price movements (up or down) using classification techniques. Below is a step-by-step breakdown of the methodology and key differentiating factors between SVR and SVC.

## 1. Data Preparation
Summary: Historical stock price data was collected and preprocessed. Key features like moving averages, relative strength index (RSI), and volume data were extracted. Missing values were imputed, and data was scaled for consistency using z-score normalization.
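
As an illustration of the kind of features mentioned above, the following minimal sketch computes a simple moving average and an RSI value with NumPy. The function names and the simple-average RSI variant (rather than Wilder's smoothing) are assumptions made for this example, not necessarily the formulas used in the project.

```python
import numpy as np

def simple_moving_average(prices, window):
    """Mean of each run of `window` consecutive prices (len(prices) - window + 1 values)."""
    prices = np.asarray(prices, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(prices, kernel, mode="valid")

def rsi(prices, window=14):
    """RSI from simple averages of gains and losses over the last `window` price changes."""
    prices = np.asarray(prices, dtype=float)
    changes = np.diff(prices)[-window:]
    avg_gain = np.clip(changes, 0, None).mean()
    avg_loss = np.clip(-changes, 0, None).mean()
    if avg_loss == 0:
        return 100.0  # no losing days in the window (convention chosen for this sketch)
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

sma = simple_moving_average([1, 2, 3, 4, 5], window=3)  # three averaged values
balanced = rsi([1, 2, 1, 2, 1], window=4)               # equal gains and losses
```

An RSI of 50 corresponds to gains and losses of equal average size, while values near 100 indicate a window dominated by gains.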

SVR Use Case: The scaled features and target variable (actual stock prices) were prepared for regression tasks.

SVC Use Case: A new binary target variable (Price_Change) was created, indicating whether the stock price would go up (1) or stay the same/go down (0) the next day.

Analysis: Preprocessing is critical for both SVR and SVC since SVMs are sensitive to feature scaling. Creating a binary classification target for SVC allows the method to focus on directional movement instead of exact price values.
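
The binary target described above can be sketched as follows. The helper name `next_day_direction` and the sample prices are illustrative assumptions; only the labeling rule (1 for up, 0 for unchanged or down on the next day) comes from the text.

```python
import numpy as np

def next_day_direction(close):
    """Label each day 1 if the next day's close is higher, otherwise 0."""
    close = np.asarray(close, dtype=float)
    # Compare every close with the close of the following day.
    # The final day has no following day, so it receives no label.
    return (close[1:] > close[:-1]).astype(int)

close = [10.0, 10.5, 10.5, 10.2, 10.9]    # made-up closing prices
price_change = next_day_direction(close)  # one label fewer than there are prices
```

Because the label for a day depends on the following day's price, the last row of the dataset has to be dropped (or left unlabeled) before training.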

To predict Ford's stock prices, I implemented a Support Vector Regression (SVR) model on recent trading data. The dataset was filtered to include the last two months of historical data. Features such as Bollinger bands, RSI, MACD, and price change rates were included, while redundant or irrelevant features were excluded. The data was standardized using StandardScaler, and skewness was reduced using a PowerTransformer.
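
A minimal sketch of the standardization and skewness-reduction step, assuming scikit-learn and synthetic skewed data in place of the real features. Chaining the two transformers in a pipeline and fitting them on the training rows only is one common way to avoid leaking test-set statistics; the project may have arranged this differently.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

rng = np.random.default_rng(0)
X_train = rng.lognormal(size=(40, 3))  # synthetic, right-skewed stand-in features
X_test = rng.lognormal(size=(5, 3))

# StandardScaler applies z-score scaling; PowerTransformer (Yeo-Johnson by
# default) then reduces skewness and re-standardizes its output.
preprocess = make_pipeline(StandardScaler(), PowerTransformer())

X_train_t = preprocess.fit_transform(X_train)  # parameters learned from training rows only
X_test_t = preprocess.transform(X_test)        # same parameters reused on the test rows
```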

For model training and validation, I reserved the last week of data as an unseen test set, while the remaining data was used for training and cross-validation.
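
The chronological hold-out can be sketched as below. The helper `holdout_last_days` is hypothetical, and treating "the last week" as 5 trading days is an assumption made for the example.

```python
def holdout_last_days(rows, n_test=5):
    """Split time-ordered rows into (train, test) without shuffling."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    return rows[:-n_test], rows[-n_test:]

days = list(range(42))                  # stand-in for about two months of trading days
train_days, test_days = holdout_last_days(days)
```

Keeping the split in time order matters here: a random split would let the model train on days that come after the ones it is tested on.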

## 2. SVR: Predicting Stock Prices
Methodology:

The Support Vector Regression (SVR) model was trained to predict Ford's stock prices using a dataset of technical and financial indicators. Two kernel functions were tested: a linear kernel, which assumes a linear relationship between features and target, and an RBF (Radial Basis Function) kernel, which captures nonlinear relationships by implicitly mapping the data to a higher-dimensional space.

The hyperparameters C (regularization) and epsilon (tolerance) were tuned. The features were preprocessed using standardization and power transformation. A linear-kernel SVR model with tuned hyperparameters (C=1 and epsilon=0.01) was selected through time-series cross-validation and grid search, with the evaluation focusing on minimizing mean squared error (MSE). For testing, the last week of data was reserved as an unseen test set. The model achieved an MSE of 0.0081 on this test set, indicating that its predictions tracked the actual values closely over the short test horizon. Within that horizon, the SVR model captures the short-term movements in Ford's stock, which can help inform financial decision-making.
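
The tuning procedure described above might look roughly like the following sketch, which assumes scikit-learn and uses synthetic data. The grid values, the number of splits, and the pipeline step names are illustrative assumptions, not the project's exact configuration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))  # synthetic stand-in features
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=40)

pipeline = Pipeline([("scale", StandardScaler()), ("svr", SVR())])
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1],
}

# TimeSeriesSplit keeps every validation fold strictly after its training fold.
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_
best_cv_mse = -search.best_score_  # flip the sign back to a plain MSE
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the validation folds never influence the scaling parameters.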


SVR aims to minimize the prediction error while allowing some deviation (epsilon) from the true price.
SVR solves a regression problem and focuses on finding a hyperplane that fits most data points within a margin of tolerance. SVR provides precise price predictions, useful for portfolio planning. It is sensitive to outliers and requires fine-tuned hyperparameters for optimal performance.
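
The "margin of tolerance" corresponds to the epsilon-insensitive loss that SVR penalizes: errors smaller than epsilon cost nothing, and larger errors are penalized linearly. A minimal sketch for a single prediction follows; the function name and the sample prices are illustrative.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.01):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

inside_tube = epsilon_insensitive_loss(12.00, 12.005)  # error smaller than epsilon
outside_tube = epsilon_insensitive_loss(12.00, 12.05)  # error larger than epsilon
```

A smaller epsilon makes the tube narrower, so more training points contribute to the loss and become support vectors.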








Results:

Best Hyperparameters: {'C': 1, 'epsilon': 0.01, 'kernel': 'linear'}

Mean Squared Error (MSE): 0.0081

The low MSE indicates a strong performance, with predictions closely aligning with the actual values over the unseen test period.

The plot below demonstrates the performance of the SVR model in predicting Ford's closing stock prices over a given period. Here are the observations:

(1) The model aligns well with the actual trends, capturing the general movement and fluctuations in price, which indicates the effectiveness of the SVR model in short-term forecasting.

(2) The confidence interval reflects the model's uncertainty, which widens as the forecast progresses. This is expected because prediction confidence decreases as the forecast horizon lengthens.

(3) The SVR model is effective in forecasting immediate trends, especially for relatively stable or gradually changing price patterns.

(4) However, in periods of sharp volatility (e.g., around early November in the plot), the model may lag or deviate slightly from the actual trends, suggesting potential challenges in capturing extreme price movements.

In conclusion, the SVR model provides a strong foundation for predicting Ford's stock prices, particularly for short-term horizons. However, its predictive performance during volatile market conditions could benefit from further tuning or the inclusion of additional features to improve robustness.










<img src="figures/pic1.png" width="90%"></img>

## 3. Backtesting SVR Model for Ford's Stock Price Prediction

Methodology:

The Support Vector Regression (SVR) model was subjected to backtesting to evaluate its robustness and predictive capability over multiple time periods. The following steps were taken:

Data Preparation:

The dataset was divided into rolling windows for training and testing.
Each backtest used the previous 2 months (approximately 60 days) as the training window, and the following 5 days as the test period.
Features and target (F_Close_pred) were scaled and transformed to remove skewness using StandardScaler and PowerTransformer.
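
One way to generate the rolling train/test windows is sketched below. The function `rolling_windows` is hypothetical, and advancing the window by the length of the test period is an assumption, since the step size for the SVR backtest is not stated.

```python
def rolling_windows(n_samples, train_size=60, test_size=5, n_iterations=10, step=None):
    """Return (train_range, test_range) index pairs for a rolling backtest."""
    step = test_size if step is None else step  # assumed: roll forward by one test period
    windows = []
    for i in range(n_iterations):
        train_start = i * step
        train_end = train_start + train_size
        test_end = train_end + test_size
        if test_end > n_samples:
            break  # not enough data left for a full test period
        windows.append((range(train_start, train_end), range(train_end, test_end)))
    return windows

windows = rolling_windows(n_samples=110)
```

Each test range starts exactly where its training range ends, so no test day is ever seen during training for that window.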

Hyperparameter Tuning:

A grid search with TimeSeriesSplit cross-validation was performed to optimize the SVR hyperparameters (kernel, C, and epsilon).
The best parameters were identified for each training window and used to train the SVR model.

Backtesting Framework:

Rolling backtesting was conducted over 10 iterations.
After training the model on the 2-month training window, predictions were made for the 5-day test period.
Metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R² (coefficient of determination) were calculated for each backtest.
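
The three metrics can be computed as in the sketch below, using NumPy and made-up prices. The example also illustrates how a short test window with little price variation can give a small RMSE together with a strongly negative R², because R² compares the model against simply predicting the mean of the test window.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and R² for one backtest window."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum((y_true - y_pred) ** 2))         # model's squared error
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # squared error of the window mean
    r2 = 1.0 - ss_res / ss_tot  # undefined if y_true is constant
    return {"mse": mse, "rmse": rmse, "r2": r2}

actual = [11.0, 11.1, 11.2, 11.1, 11.0]     # made-up 5-day window
predicted = [11.3, 11.3, 11.3, 11.3, 11.3]  # flat forecast, offset from the window
metrics = regression_metrics(actual, predicted)
```

Here the forecast is off by at most 0.3, yet R² is well below zero because the actual prices barely move within the window.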

Visualization:

Predictions for each backtest were plotted alongside actual values, providing a visual representation of model performance.

Results:
Average Metrics Across 10 Backtests:

MSE: 0.0563

RMSE: 0.1936

R²: -1.1683

Backtesting Outcomes:

The MSE and RMSE indicate a moderate error in predicting the exact stock prices.
The negative R² value for most backtests means that, in those windows, the model's squared error was larger than that of simply predicting the mean of the test-period target, so the model did not explain the variance in stock price movement.

Interpretation of Results:

Model Performance:

While the SVR model produced predictions with relatively low RMSE, the negative R² scores highlight the challenges in modeling stock price dynamics, which are inherently noisy and influenced by external factors.

Backtest Visualization:

The plots demonstrated that the SVR model could track trends in certain periods but failed to capture abrupt changes or outliers, which are common in financial data.

Challenges in Predicting Stock Prices:

Stock prices often exhibit high volatility, influenced by market sentiment, external news, and macroeconomic factors. The linear SVR model, despite its predictive power, may have limitations in capturing such non-linear and unpredictable patterns.

Potential Improvements:

Incorporating more advanced models, such as non-linear kernels (e.g., RBF), ensemble methods, or deep learning models, might improve performance.
Including additional features, such as macroeconomic indicators or sentiment analysis, could provide the model with more context.

Conclusion:
The backtesting results show that the SVR model can provide reasonable predictions but struggles to fully capture the complex and dynamic nature of stock price movements. While the model performs well during periods of stability, its performance degrades during volatile phases. Future work could focus on enhancing feature engineering and leveraging more sophisticated models to achieve better predictive accuracy and robustness.


## 4. SVC: Classifying Stock Price Movements
Methodology:

SVC was used to classify whether the stock price would increase (1) or not (0). The Support Vector Classification (SVC) model was implemented to predict the class of Ford's stock closing price (F_Close_pred_class) over a two-month window. The dataset was preprocessed by standardizing and transforming features using the StandardScaler and PowerTransformer to remove skewness. The dataset was split, keeping the last 7 days as an unseen test set. A time-series cross-validation (TimeSeriesSplit) was used for hyperparameter tuning. The parameter grid for the grid search included different kernel types (linear, rbf), regularization parameter C, and gamma values.

The optimal hyperparameters found through grid search were:

kernel: linear

C: 1

gamma: 0.01

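The final SVC model was trained using these parameters, and predictions were generated for the unseen test set. Note that gamma has no effect with a linear kernel, so the selected value of 0.01 is incidental to the grid. Accuracy and Mean Squared Error (MSE) were used to evaluate performance; for 0/1 class labels, MSE equals the misclassification rate (1 - accuracy).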
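
The following sketch, on synthetic data with scikit-learn, checks how gamma interacts with the selected kernel: two linear-kernel SVC models that differ only in gamma produce the same decision scores, because gamma is used only by the rbf, poly, and sigmoid kernels. The data and variable names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))  # synthetic stand-in features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Same kernel and C, very different gamma values.
small_gamma = SVC(kernel="linear", C=1, gamma=0.01).fit(X, y)
large_gamma = SVC(kernel="linear", C=1, gamma=10.0).fit(X, y)

same_scores = np.allclose(
    small_gamma.decision_function(X),
    large_gamma.decision_function(X),
)
```

When a grid search selects a linear kernel, the gamma value reported alongside it is therefore not a meaningful tuning result.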


SVC is designed for classification tasks and finds a hyperplane that maximizes the margin between classes (price up vs. not up).
Unlike SVR, which predicts continuous values, SVC predicts categorical outcomes, focusing on decision boundaries. It simplifies the problem by focusing on directional movement, reducing the complexity of price forecasting. It does not provide exact price values, limiting its use in specific financial applications.

Results:

Best Hyperparameters: {'C': 1, 'gamma': 0.01, 'kernel': 'linear'}

Mean Squared Error (MSE): 0.1429

Unseen Test Accuracy: 85.7% (0.8571)

The results indicate that the model accurately predicted 6 out of 7 test samples, showcasing its reliability in classifying the stock price movement over short-term horizons.
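
The relationship between the two reported metrics can be checked with a small worked example. The labels below are made up to reproduce a 6-out-of-7 result and are not the project's actual test data.

```python
def accuracy_and_mse(y_true, y_pred):
    """Accuracy and MSE for 0/1 labels; each mistake contributes (1 - 0)^2 = 1 to the MSE sum."""
    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return correct / n, mse

y_true = [1, 0, 1, 1, 0, 1, 0]  # illustrative labels only
y_pred = [1, 0, 1, 0, 0, 1, 0]  # one mistake on the fourth day
accuracy, mse = accuracy_and_mse(y_true, y_pred)
```

With one error in seven predictions, accuracy is 6/7 and MSE is 1/7, which match the reported 0.8571 and 0.1429 after rounding.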

The plot below reflects that the SVC model captured the binary movements (class labels) of the stock price with high accuracy during the unseen test period. However, the transitions in the binary values (0 to 1 or vice versa) highlight a sharp fluctuation in stock class predictions. This volatility may represent the sensitivity of the SVC model to decision boundaries, which aligns with its role in classification tasks.

The SVC model demonstrated strong predictive capabilities for classifying Ford's stock price movements, achieving an accuracy of 85.7%. The MSE of 0.1429 is the misclassification rate for binary labels and corresponds to the single misclassified day out of seven. The overall performance is encouraging for short-term forecasting, although seven samples is a small test set. The binary nature of the classification task, as seen in the plot, emphasizes the model's focus on distinguishing between upward and downward movements.



<img src="figures/pic2.png" width="90%"></img>

## Backtesting SVC Model for Ford's Stock Price Classification

Methodology:

The Support Vector Classification (SVC) model was tested using a backtesting approach to evaluate its effectiveness in predicting Ford's stock price movement (classified as up or down). The following steps were implemented:

Data Preparation:

The dataset was divided into rolling windows, with 2 months (approximately 60 days) used for training and the subsequent 7 days reserved for testing.
The target variable (F_Close_pred_class) represented the stock price direction (up/down).
Features were scaled and transformed using StandardScaler and PowerTransformer to address skewness and standardize the data.

Hyperparameter Tuning:

A grid search was conducted using cross-validation (TimeSeriesSplit) to optimize the hyperparameters of the SVC model (kernel, C, and gamma).
The best combination of parameters was used for each training window to ensure optimal performance.

Backtesting Framework:

Five iterations of backtesting were performed, with the training window rolling forward by the test period (7 days) after each iteration.
Predictions were made for each 7-day test period, and performance metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R² were computed.
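
The walk-forward loop described above can be sketched as follows, assuming scikit-learn and synthetic data. For brevity the hyperparameters are fixed rather than re-tuned per window as in the project, and accuracy is used as the per-window score; both choices are simplifications made for this example.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(95, 4))  # synthetic stand-in features
y = (X[:, 0] + rng.normal(scale=0.5, size=95) > 0).astype(int)

train_size, test_size, n_iterations = 60, 7, 5
accuracies = []
for i in range(n_iterations):
    start = i * test_size  # roll forward by one test period
    train = slice(start, start + train_size)
    test = slice(start + train_size, start + train_size + test_size)

    # Refit the scaler and the classifier on each training window only.
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1))
    model.fit(X[train], y[train])
    accuracies.append(accuracy_score(y[test], model.predict(X[test])))

mean_accuracy = float(np.mean(accuracies))
```

Refitting the preprocessing inside the loop keeps each test window fully unseen by both the scaler and the classifier.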

Visualization:

Predictions for each backtest were plotted alongside actual data, providing a clear visual representation of the model's performance.

Results:

Average Metrics Across 5 Backtests:

MSE: 0.0000

RMSE: 0.0000

R²: 1.0000

Backtesting Outcomes:

The model achieved perfect classification accuracy (MSE = 0, R² = 1) across all test periods. This indicates that the SVC model accurately predicted the direction of Ford's stock price movement in every backtest.

Interpretation of Results:

Model Performance:

The SVC model performed exceptionally well, achieving zero classification errors across all five backtesting iterations.
The perfect R² value (1.0000), computed on the 0/1 class labels, follows directly from every prediction matching its label.

Backtest Visualization:

The plots (not displayed here) confirmed the alignment between predicted and actual classifications, showcasing the reliability of the SVC model.

Classification vs. Regression:

Compared to the SVR model used for regression, the SVC model achieved significantly better performance in its respective task. While the regression model struggled with stock price prediction due to the noisy nature of financial data, the classification model excelled at predicting directional movement.

Challenges and Considerations:

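While the SVC model performed perfectly in this setup, it is essential to validate its robustness on out-of-sample data or during periods of heightened volatility. Because perfect scores are unusual for noisy financial data, it is also worth verifying that no feature contains information from the day being predicted (look-ahead leakage).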
Additional features, such as sentiment analysis or macroeconomic indicators, could further enhance performance and applicability.

Conclusion:

The backtesting results demonstrate the SVC model's robustness and reliability in classifying Ford's stock price movement. The perfect performance metrics suggest that the classification task was well-suited for the available features and model. However, further testing on unseen data and different market conditions is recommended to confirm the model's generalizability. This analysis underscores the importance of selecting the appropriate model type (classification vs. regression) based on the specific prediction goal.

## 5. Comparison between SVR and SVC
The SVR model predicted the continuous closing prices, offering detailed predictions and smooth confidence intervals, but its performance was more sensitive to volatile trends.

In contrast, the SVC model focused on classifying movements (up/down), demonstrating higher categorical accuracy but lacking the granularity of continuous price predictions.

Both models are complementary, with SVR better suited for precise numerical forecasting and SVC excelling in directional or categorical classification.

Overall, the SVC model effectively supports decision-making for short-term trading strategies, especially when the primary interest lies in predicting the stock's directional movement rather than its exact price.



## Conclusion
This project demonstrates the versatility of SVMs in stock market prediction, with SVR providing continuous price forecasts and SVC effectively classifying price movements. While both approaches have their strengths, their combined use provides a comprehensive strategy for understanding and forecasting stock market behavior.