### Task 2: Predict Future Stock Prices (Short-Term)

The objective of this task is to predict the next day's closing price of a selected stock
using historical price data fetched from Yahoo Finance.

Stock chosen: **Apple (AAPL)**

Skills Covered:
- Time-series data loading
- Data preprocessing
- Regression modeling (Linear Regression, Random Forest)
- Prediction and visualization


In [None]:
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

sns.set(style="whitegrid")


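stock = "AAPL"  # Apple stock

# yfinance treats `end` as exclusive, so data stops at the last trading day before 2024-12-31
data = yf.download(stock, start="2015-01-01", end="2024-12-31")

# Recent yfinance versions return MultiIndex columns (field, ticker) even for one ticker;
# keep only the field level so that data["Close"] is a plain Series
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)

data.head()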


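# Target: the following trading day's close, aligned to the current row
data["Next_Close"] = data["Close"].shift(-1)
# The last row has no following day, so its target is NaN; drop it
data.dropna(inplace=True)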
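To make the target construction concrete, here is a minimal sketch of what `shift(-1)` followed by `dropna` does. The prices are made-up toy values, not real AAPL data, and the names `toy` and `toy_clean` are introduced only for illustration.

```python
import pandas as pd

# Toy closing prices (invented values, for illustration only)
toy = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# shift(-1) moves each value up one row, so every row sees the next row's Close
toy["Next_Close"] = toy["Close"].shift(-1)

# The final row has no "tomorrow", so its target is NaN and is removed
toy_clean = toy.dropna()
print(toy_clean)
```

Each remaining row pairs one day's features with the following day's close, which is the supervised-learning setup used in the rest of the notebook.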


features = data[["Open", "High", "Low", "Close", "Volume"]]
labels = data["Next_Close"]


X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, shuffle=False
)
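The `shuffle=False` argument matters for time-series data: it keeps the split chronological, so the model is tested only on dates later than anything it was trained on. The sketch below checks this on a small synthetic frame; the names `X_toy`, `y_toy`, `X_tr`, and `X_te` are hypothetical and used only here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten consecutive days of synthetic data (not real prices)
dates = pd.date_range("2024-01-01", periods=10, freq="D")
X_toy = pd.DataFrame({"Close": np.arange(10, dtype=float)}, index=dates)
y_toy = pd.Series(np.arange(1, 11, dtype=float), index=dates)

# With shuffle=False the first 80% of rows become the training set
# and the last 20% become the test set, in their original order
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, shuffle=False
)

print("train ends: ", X_tr.index.max().date())
print("test starts:", X_te.index.min().date())
```

With the default `shuffle=True`, future rows would leak into the training set and the test score would look better than it should.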


LinearRegression

In [None]:
lr = LinearRegression()
lr.fit(X_train, y_train)

lr_pred = lr.predict(X_test)


RandomForestRegressor

In [None]:
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

rf_pred = rf.predict(X_test)
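One property of tree ensembles is worth keeping in mind with a chronological split: a random forest predicts by averaging training targets, so it cannot output a value above the largest target it saw during training. The sketch below illustrates this on a synthetic upward trend (not stock data); all `*_toy` names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic trend y = 2x: train on x in [0, 100), test on x in [100, 120)
x_train_toy = np.arange(0, 100, dtype=float).reshape(-1, 1)
y_train_toy = 2.0 * x_train_toy.ravel()
x_test_toy = np.arange(100, 120, dtype=float).reshape(-1, 1)

rf_toy = RandomForestRegressor(n_estimators=50, random_state=0)
rf_toy.fit(x_train_toy, y_train_toy)
lr_toy = LinearRegression().fit(x_train_toy, y_train_toy)

rf_out = rf_toy.predict(x_test_toy)
lr_out = lr_toy.predict(x_test_toy)

# The forest is capped by the training targets; the linear model extrapolates
print("max training target:     ", y_train_toy.max())
print("forest prediction range: ", rf_out.min(), "to", rf_out.max())
print("linear prediction range: ", lr_out.min(), "to", lr_out.max())
```

If the test-period prices rise above the training-period maximum, the forest's predictions on real data can flatten in the same way, which is one reason to compare both models on the held-out period rather than assume either is better.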


Model Evaluation

In [None]:
def evaluate_model(y_true, y_pred, model_name):
    """Print MSE and R² for one model's predictions."""
    print(f"\nModel: {model_name}")
    print("MSE:", mean_squared_error(y_true, y_pred))
    print("R²:", r2_score(y_true, y_pred))

evaluate_model(y_test, lr_pred, "Linear Regression")
evaluate_model(y_test, rf_pred, "Random Forest")
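MSE is expressed in squared price units, and a high R² is easy to reach on a trending price series, so it can help to add two reference points: RMSE, which is in the same units as the price, and a naive baseline that predicts tomorrow's close as today's close. The sketch below is one possible way to compute both. The `rmse` helper and the synthetic random walk are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Synthetic random walk standing in for a closing-price series (not real data)
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, size=500))

today = close[:-1]      # feature: today's close
tomorrow = close[1:]    # target: next day's close

# Naive baseline: tomorrow's close is predicted to equal today's close
naive_rmse = rmse(tomorrow, today)
print("Naive baseline RMSE:", round(naive_rmse, 3))
```

In the notebook itself, the equivalent baseline would be `rmse(y_test, X_test["Close"])`, assuming `X_test["Close"]` is a single column. A model is adding information only if its error is lower than this baseline.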


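Random Forest Plot

In [None]:
plt.figure(figsize=(12, 6))
# Plot against the date index so the x-axis shows trading days
plt.plot(y_test.index, y_test.values, label="Actual Closing Price")
plt.plot(y_test.index, rf_pred, label="Predicted (Random Forest)")
plt.title("Actual vs Predicted Closing Prices: Random Forest")
plt.xlabel("Date")
plt.ylabel("Next-day closing price (USD)")
plt.legend()
plt.show()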

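Linear Regression Plot

In [None]:
plt.figure(figsize=(12, 6))
# Plot against the date index so the x-axis shows trading days
plt.plot(y_test.index, y_test.values, label="Actual Closing Price")
plt.plot(y_test.index, lr_pred, label="Predicted (Linear Regression)")
plt.title("Actual vs Predicted Closing Prices: Linear Regression")
plt.xlabel("Date")
plt.ylabel("Next-day closing price (USD)")
plt.legend()
plt.show()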


### Final Insights

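- Random Forest can capture non-linear relationships that Linear Regression cannot,
  but whether it actually performs better here should be read from the MSE and R² values
  printed in the evaluation step. Because the split is chronological and tree ensembles
  cannot predict beyond the target range seen in training, Random Forest predictions
  flatten out whenever test-period prices exceed the training-period maximum.
- Linear Regression is the simpler model and may underfit non-linear patterns, but it still provides smooth predictions.
- Next-day predictions tend to follow the general trend but may miss sudden price jumps,
  since the features describe only the current day's trading.
- The effect of Volume and High/Low on the next-day close is not measured in this notebook;
  it can be checked with the Random Forest feature importances or the Linear Regression coefficients.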
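Claims about which features matter can be checked rather than assumed. The sketch below shows one way to read impurity-based importances from a fitted forest, using synthetic data in which `Volume` is pure noise by construction. The data and the `toy`, `rf_toy`, and `importances` names are assumptions for illustration, not results from the AAPL models.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic series: Close is a random walk, Volume is unrelated noise
rng = np.random.default_rng(42)
n = 400
toy = pd.DataFrame({
    "Close": 100 + np.cumsum(rng.normal(0, 1, size=n)),
    "Volume": rng.integers(1_000, 5_000, size=n).astype(float),
})
toy["Next_Close"] = toy["Close"].shift(-1)
toy = toy.dropna()

rf_toy = RandomForestRegressor(n_estimators=100, random_state=42)
rf_toy.fit(toy[["Close", "Volume"]], toy["Next_Close"])

# feature_importances_ sums to 1; larger values mean the feature drove more splits
importances = pd.Series(
    rf_toy.feature_importances_, index=["Close", "Volume"]
).sort_values(ascending=False)
print(importances)
```

For the notebook's own model, the same pattern would be `pd.Series(rf.feature_importances_, index=features.columns)`. Open, High, Low, and Close are strongly correlated with each other, so importance can be split among them somewhat arbitrarily and should be interpreted with caution.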