#### Stock Price Prediction
This Python script predicts stock prices from historical data using several regression algorithms. The dataset used is Stock-prediction-data.csv.

#### Problem Description
The task is regression: given historical stock data, predict the stock price. The dataset has 94 rows and two columns, a single input feature x and the target y. The script fits several regression algorithms, compares them with 5-fold cross-validation on the training set, and evaluates the selected model on a held-out test set using Root Mean Square Error (RMSE) and the R2 score.
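As a minimal sketch of the two metrics named above, the following computes RMSE and the R2 score by hand with NumPy. The helper names `rmse` and `r2` and the four demo values are illustrative assumptions and do not come from the dataset; the notebook itself relies on scikit-learn's `mean_squared_error` and `r2_score`.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root of the mean squared difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up values, chosen so the arithmetic is easy to check by hand.
y_true_demo = [160.0, 162.0, 164.0, 166.0]
y_pred_demo = [161.0, 161.0, 165.0, 165.0]

print(rmse(y_true_demo, y_pred_demo))  # 1.0 (every residual is +1 or -1)
print(r2(y_true_demo, y_pred_demo))    # 0.8 (ss_res = 4, ss_tot = 20)
```

RMSE is in the same units as y, so it reads as a typical prediction error; R2 compares the model against always predicting the mean of y.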

In [15]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [16]:
dataset = pd.read_csv('Stock-prediction-data.csv')

In [17]:
# Display the shape and first 5 rows of the dataset
print("Dataset Shape:", dataset.shape)
print("\nFirst 5 Rows of the Dataset:\n", dataset.head(5))

Dataset Shape: (94, 2)

First 5 Rows of the Dataset:
             x           y
0  168.181818  160.840244
1  187.878788  159.413657
2  207.575758  157.136809
3  227.272727  159.357847
4  246.969697  157.542862


In [27]:
dataset.describe()

|       | x           | y          |
|-------|-------------|------------|
| count | 94.0        | 94.0       |
| mean  | 1084.090909 | 166.576111 |
| std   | 537.321877  | 5.861601   |
| min   | 168.181818  | 155.234046 |
| 25%   | 626.136364  | 161.236377 |
| 50%   | 1084.090909 | 166.508064 |
| 75%   | 1542.045455 | 171.784967 |
| max   | 2000.0      | 176.361532 |


In [18]:
# Separate features (X) and target variable (Y)
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, -1].values
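The slicing above matters because scikit-learn estimators expect a 2-D feature array, even with a single feature. A small sketch with a made-up three-row frame (the values are placeholders, not the real file) shows the resulting shapes:

```python
import pandas as pd

# Hypothetical stand-in for the two-column dataset.
demo = pd.DataFrame({"x": [168.0, 188.0, 208.0], "y": [160.8, 159.4, 157.1]})

X_demo = demo.iloc[:, :-1].values  # slice of columns -> 2-D array, shape (3, 1)
Y_demo = demo.iloc[:, -1].values   # single column    -> 1-D array, shape (3,)
flat_x = demo.iloc[:, 0].values    # also 1-D; would need reshaping before fit()

print(X_demo.shape, Y_demo.shape, flat_x.shape)
print(flat_x.reshape(-1, 1).shape)  # reshape(-1, 1) restores the 2-D layout
```

Selecting the feature with `iloc[:, :-1]` rather than `iloc[:, 0]` avoids the reshape step.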

In [19]:
# Split the dataset into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=0)
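To make the effect of `test_size=0.33` and `random_state=0` concrete, here is a sketch on synthetic data with the same number of rows as the dataset (94). The arrays `X_demo` and `Y_demo` are assumptions made up for illustration, not the contents of the CSV.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 94 rows, one feature, a simple linear target.
X_demo = np.linspace(168.0, 2000.0, 94).reshape(-1, 1)
Y_demo = 155.0 + 0.01 * X_demo.ravel()

x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, Y_demo, test_size=0.33, random_state=0
)
print(x_tr.shape, x_te.shape)  # (62, 1) (32, 1)

# The same random_state reproduces the same shuffled split.
x_tr2, x_te2, _, _ = train_test_split(
    X_demo, Y_demo, test_size=0.33, random_state=0
)
print(np.array_equal(x_te, x_te2))  # True
```

The test set size is rounded up, so 33% of 94 rows gives 32 test rows and leaves 62 for training.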

In [20]:
# List of regressors to try
regressors = {
    "Linear Regression": LinearRegression(),
    "k-Nearest Neighbors": KNeighborsRegressor(n_neighbors=10, metric="minkowski"),
    "Support Vector Machine": SVR(),
    "Decision Tree": DecisionTreeRegressor(
        max_depth=None, min_samples_leaf=1, min_samples_split=10
    ),
    "Random Forest": RandomForestRegressor(
        max_depth=10, min_samples_leaf=4, min_samples_split=10, n_estimators=50
    ),
}
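The hyperparameters above are fixed by hand, and `GridSearchCV` is imported but not used. As a hedged sketch of how one of them could be tuned instead, the following searches over `n_neighbors` for the k-NN model. The synthetic data, the grid values, and the shuffled `KFold` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data: a noisy linear trend.
rng = np.random.default_rng(0)
X_demo = np.linspace(168.0, 2000.0, 60).reshape(-1, 1)
Y_demo = 155.0 + 0.01 * X_demo.ravel() + rng.normal(0.0, 1.0, size=60)

# Candidate values to try; this grid is an arbitrary example.
param_grid = {"n_neighbors": [3, 5, 10]}

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, Y_demo)

print("Best parameters:", search.best_params_)
print("Best CV RMSE:", np.sqrt(-search.best_score_))
```

After fitting, `search.best_estimator_` holds a model refit on all the data passed to `fit` with the winning parameters.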

In [21]:
# Evaluate each regressor using cross-validation
for reg_name, reg in regressors.items():
    scores = cross_val_score(reg, x_train, y_train, cv=5, scoring='neg_mean_squared_error')
    rmse_scores = np.sqrt(-scores)
    print(f"{reg_name} Cross-Validation RMSE: {rmse_scores.mean()}")

Linear Regression Cross-Validation RMSE: 1.823952138115829
k-Nearest Neighbors Cross-Validation RMSE: 1.981491794280443
Support Vector Machine Cross-Validation RMSE: 2.1235846450761655
Decision Tree Cross-Validation RMSE: 2.1062592957308994
Random Forest Cross-Validation RMSE: 2.1062625992379562
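The scores returned with `scoring='neg_mean_squared_error'` are negated so that higher is always better, which is why the loop flips the sign before taking the square root. A sketch on synthetic data (an assumption, not the CSV) shows the per-fold values and the equivalent `neg_root_mean_squared_error` scorer:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: a noisy linear trend.
rng = np.random.default_rng(0)
X_demo = np.linspace(168.0, 2000.0, 60).reshape(-1, 1)
Y_demo = 155.0 + 0.01 * X_demo.ravel() + rng.normal(0.0, 1.0, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One negated MSE per fold; every entry is <= 0.
neg_mse = cross_val_score(
    LinearRegression(), X_demo, Y_demo, cv=cv, scoring="neg_mean_squared_error"
)
fold_rmse = np.sqrt(-neg_mse)

# The RMSE scorer returns the same per-fold values, also negated.
neg_rmse = cross_val_score(
    LinearRegression(), X_demo, Y_demo, cv=cv, scoring="neg_root_mean_squared_error"
)

print("Per-fold RMSE:", fold_rmse)
print("Mean and std:", fold_rmse.mean(), fold_rmse.std())
```

Reporting the standard deviation alongside the mean helps judge whether differences between models, such as those in the output above, exceed the fold-to-fold variation.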


In [22]:
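# Choose the best-performing regressor.
# 'neg_mean_squared_error' is negated MSE (higher is better), so the best model
# has the maximum mean score; min() would pick the model with the largest error.
# Note: the output recorded below (Decision Tree) came from a run that used min() here.
best_regressor_name = max(regressors, key=lambda k: cross_val_score(regressors[k], x_train, y_train, cv=5, scoring='neg_mean_squared_error').mean())
best_regressor = regressors[best_regressor_name]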
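One way to make the selection step easier to audit is to compute each model's cross-validation RMSE once, store it, and pick the smallest value. The sketch below does this on noise-free synthetic data; the names `candidates` and `cv_rmse` and the data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Noise-free linear data, so a linear model should fit it almost exactly.
X_demo = np.linspace(168.0, 2000.0, 60).reshape(-1, 1)
Y_demo = 155.0 + 0.01 * X_demo.ravel()

candidates = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Score every candidate once and keep the mean RMSE (lower is better).
cv_rmse = {}
for name, model in candidates.items():
    neg_mse = cross_val_score(
        model, X_demo, Y_demo, cv=cv, scoring="neg_mean_squared_error"
    )
    cv_rmse[name] = float(np.sqrt(-neg_mse).mean())

# RMSE is an error, so the best model is the one with the minimum value.
best_name = min(cv_rmse, key=cv_rmse.get)
print(cv_rmse)
print("Selected:", best_name)
```

Storing the scores also avoids refitting randomized models a second time, which can otherwise yield slightly different numbers from the ones printed earlier.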

In [23]:
# Train the best regressor on the entire training set
best_regressor.fit(x_train, y_train)

In [24]:
# Make predictions on the test set
y_pred = best_regressor.predict(x_test)

In [25]:
# Evaluate the performance of the best regressor
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2score = r2_score(y_test, y_pred)
print(f"\nBest Regressor: {best_regressor_name}")
print(f"Root Mean Square Error: {rmse}")
print(f"R2 Score: {r2score}")


Best Regressor: Decision Tree
Root Mean Square Error: 2.097786265886907
R2 Score: 0.8787150407366524


Range of x and y in the dataset (rounded), for reference when choosing an input value:

|     | x    | y   |
|-----|------|-----|
| min | 168  | 155 |
| max | 2000 | 176 |


In [30]:
# Get user input for prediction
user_input = float(input("Enter a value for X to predict Y: "))

# Predict the value for the user input
user_pred = best_regressor.predict([[user_input]])
print(f"Predicted Y for input {user_input}: {user_pred[0]}")

Predicted Y for input 168.0: 158.77427238856527
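Tree-based models cannot extrapolate: for any input beyond the largest x seen in training, a decision tree returns the same leaf value. A possible guard, sketched below, rejects inputs outside the observed range. The helper `predict_within_range` is hypothetical, and the data is a synthetic stand-in that only borrows the rounded x range listed above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def predict_within_range(model, value, x_min, x_max):
    """Predict for one feature value, rejecting inputs outside [x_min, x_max]."""
    if not (x_min <= value <= x_max):
        raise ValueError(
            f"{value} is outside the training range [{x_min}, {x_max}]"
        )
    return float(model.predict([[value]])[0])


# Synthetic stand-in data covering roughly the same x range as the dataset.
x_demo = np.linspace(168.0, 2000.0, 94).reshape(-1, 1)
y_demo = 155.0 + 0.01 * x_demo.ravel()
tree_demo = DecisionTreeRegressor(random_state=0).fit(x_demo, y_demo)

# In-range input: handled normally.
inside = predict_within_range(tree_demo, 500.0, 168.0, 2000.0)
print("In-range prediction:", inside)

# Out-of-range inputs all land in the same leaf, so the predictions are identical.
beyond_a = tree_demo.predict([[2500.0]])[0]
beyond_b = tree_demo.predict([[5000.0]])[0]
print("Beyond range:", beyond_a, beyond_b)
```

Whether to raise an error or only warn is a design choice; the point is that predictions far outside the training range should not be read as forecasts.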
