# Results and Analysis

## Analysis on Model Performance
Which model works best in time series forecasting for stock predictions?

That is, which model produces the most accurate stock price predictions?

### Summary Statistics

Comparison of RMSE

|                         | LSTM  | KNN   | Random Forest |
|-------------------------|-------|-------|---------------|
| Root Mean Squared Error | 7.14  | 11.85 | 12.24         |

Of the three regression models, LSTM (Long Short Term Memory) has the lowest Root Mean Squared Error (7.14, versus 11.85 for KNN and 12.24 for Random Forest), so in our tests it works best for time series forecasting of stock prices. Let's first explore why this is.
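
For reference, RMSE is the square root of the average squared difference between actual and predicted prices. The sketch below shows one way to compute it. The function name `rmse` and the sample prices are made up for illustration and are not taken from the project's code or data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical closing prices, for illustration only (not the project's data).
actual_close = [100.0, 102.0, 101.0, 105.0]
predicted_close = [98.0, 103.0, 104.0, 105.0]

print(f"RMSE: {rmse(actual_close, predicted_close):.2f}")
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is in the same units as the stock price.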

### Model Prediction Visualization

We can first look at a visualization of the predictions our models made on the test data, compared to the actual stock prices, to get a better understanding of how the models performed.

#### Long Short Term Memory

<img src="images/lstm_pred.png" alt="Long Short Term Memory" width="800"/>

Here the model did pretty well. While the test values and predictions are not exactly the same, we can see that the model was able to predict the general upward and downward trends of the stock close price. Considering how LSTM works, this makes sense. LSTM can capture trends and fluctuations in time-series data, making it well suited to predictions with complex time dependencies.

#### K-Nearest-Neighbors

<img src="images/knn_pred.png" alt="K Nearest Neighbors" width="800"/>

This model did not do very well. As visualized, KNN predicted a horizontal line, which does not match the actual stock close prices. The issue with KNN is that it is weak on high-dimensional data and complex patterns. In time-series forecasting, KNN is unable to account for the long-term trends needed to predict stock patterns.

#### Random Forest

<img src="images/rf_pred.png" alt="Random Forest" width="800"/>

Random forest looks better than KNN, as it is not just a horizontal line. However, we can see here that it does not accurately predict the trends (it predicted a price drop when the price actually increased, and predicted no change when the price actually increased). Random forest has some issues because it lacks long-term memory and the ability to see sequences. Each day is treated as independent, so it is very difficult to capture trends between days.

### Model Prediction Accuracy Visualization

To support the visualizations shown above, we can look at the plot of residuals between our test and predicted stock close prices. Residuals = actual values - predicted values. Thus, we are looking for a random distribution near or on the line y = 0. If values are very far from y = 0, then the model did not do a good job predicting. If the values are not randomly distributed, the model is inadequate.
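
As a minimal sketch of the residual calculation described above, the snippet below computes residuals and two simple summaries: the mean residual (how far the points sit from y = 0 on average) and the largest absolute residual. The helper name `residuals` and the sample values are hypothetical and do not come from the project.

```python
def residuals(actual, predicted):
    """Residual for each point: actual value minus predicted value."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


# Hypothetical closing prices, for illustration only (not the project's data).
actual_close = [120, 124, 121, 127]
predicted_close = [118, 125, 123, 126]

res = residuals(actual_close, predicted_close)
mean_residual = sum(res) / len(res)
max_abs_residual = max(abs(r) for r in res)

print("residuals:", res)
print("mean residual:", mean_residual)
print("largest absolute residual:", max_abs_residual)
```

A mean residual near zero with a mix of positive and negative values is what we hope to see. A mean far from zero would suggest the model is consistently over- or under-predicting.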

#### Long Short Term Memory

<img src="images/lstm_res.png" alt="Long Short Term Memory" width="500"/>

#### K-Nearest-Neighbors

As KNN predicts stock close prices, it flattens out and predicts the same price every time. Thus, a distribution of residuals does not tell us much, and we decided to leave out this graph.

#### Random Forest

<img src="images/rf_res.png" alt="Random Forest" width="500"/>

### Conclusion

To conclude, the best regression model for predicting NVIDIA stock close prices is Long Short Term Memory. As seen above, Long Short Term Memory had a noticeably smaller root mean squared error, an even distribution of residuals, and low residual values. This is due to a couple of reasons...

Long Short Term Memory
- LSTM is a recurrent neural network designed to handle sequential data. Thus, it is ideal for time-series forecasting and predicting stock prices. Since this model uses memory cells, it can remember relevant data, even over long periods of time.
- Since stock prices are highly sequential, LSTM can easily capture trends such as patterns and fluctuations. Thus, it would make sense that this model performs the best for this specific task.
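
To make "sequential data" concrete: sequence models such as LSTM are commonly trained on sliding windows, where each input is a run of consecutive prices and the target is the price that follows. The sketch below shows that windowing step only. The helper `make_windows`, the window size, and the prices are assumptions for illustration, and this document does not state how the project actually prepared its sequences.

```python
def make_windows(prices, window_size):
    """Split a price series into (input window, next value) training pairs."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    inputs, targets = [], []
    # Each window of `window_size` consecutive prices predicts the next price.
    for start in range(len(prices) - window_size):
        inputs.append(prices[start:start + window_size])
        targets.append(prices[start + window_size])
    return inputs, targets


# Hypothetical closing prices and window size, for illustration only.
close_prices = [101, 103, 102, 106, 108, 107]
windows, next_prices = make_windows(close_prices, window_size=3)

for window, target in zip(windows, next_prices):
    print(window, "->", target)
```

Because each input keeps the order of the preceding days, the model sees how the price moved and not just isolated daily values.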

K-Nearest-Neighbors
- KNN is a very simple algorithm that predicts the value of a new data point based on the training points closest to it. It calculates the Euclidean distance between the new point and the training points and uses the most similar instances.
- This model has significant limitations because it is unable to use memory as well as determine trends in previous data. With the complex patterns that come with stock price fluctuations, KNN is too simple to determine future stock prices.
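
One possible explanation for the flat KNN forecast can be sketched in a few lines. If the feature is something that keeps growing over time (here, a hypothetical day index), every future day is closest to the same last few training days, so the averaged prediction never changes. This is an assumption for illustration: the features used in the project are not listed here, and `knn_predict` is a hand-written stand-in, not the project's implementation.

```python
def knn_predict(train_x, train_y, query_x, k):
    """Average the targets of the k training points nearest to query_x."""
    # With a single feature, Euclidean distance reduces to absolute difference.
    by_distance = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query_x))
    nearest = by_distance[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical setup: the only feature is the day index; training covers days 0-9.
train_days = list(range(10))
train_close = [100.0 + 2.0 * day for day in train_days]  # steadily rising price

# Every future day has the same 3 nearest neighbors (days 9, 8, 7),
# so the forecast is identical for all of them.
future_days = [10, 11, 12, 13]
forecasts = [knn_predict(train_days, train_close, day, k=3) for day in future_days]

print(forecasts)
```

Even though the training prices rise steadily, the forecast stays at the average of the last three training prices, which is the kind of horizontal line seen in the KNN plot.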

Random Forest
- RF is a model based on many different decision trees. Each tree is trained on a subset of the data, which allows it to capture patterns in the data. The final prediction is typically the average of the predictions from each tree.
- This model also has limitations because there are not many features. It treats each day as an independent sample rather than as part of a sequence, which does not allow it to capture trends across a time period. Thus, this model simply does not have the predictive power of other models for time-series data.

### LSTM Experimentation

Expanding further on our best-performing model, we experimented with different amounts and types of layers, such as LSTM and Dropout layers, as well as varying units per layer, to see if certain parameter combinations would perform better than our initial model. We found that ...

|                         | Added LSTM layers | Increased Units per layer | Added Dropout layers |
|-------------------------|-------------------|---------------------------|----------------------|
| Root Mean Squared Error | 6.42              | 7.56                      | 9.20                 |
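
To put these numbers in context, the sketch below computes each configuration's RMSE change relative to a baseline. It assumes the baseline is the initial LSTM's RMSE of 7.14 from the model comparison table. That pairing is an assumption, since the two tables do not state that the experiments share the same baseline run.

```python
# RMSE values copied from the two tables in this section.
baseline_rmse = 7.14  # assumption: the initial LSTM is the baseline for these experiments
experiment_rmse = {
    "Added LSTM layers": 6.42,
    "Increased units per layer": 7.56,
    "Added Dropout layers": 9.20,
}


def percent_change(new, old):
    """Relative change from old to new, in percent (negative means lower RMSE)."""
    return (new - old) / old * 100.0


changes = {
    name: percent_change(value, baseline_rmse)
    for name, value in experiment_rmse.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}% RMSE vs. baseline")
```

Under that assumption, only the configuration with added LSTM layers lowers RMSE (by roughly 10%), while increasing units per layer and adding Dropout layers both raise it.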