### **Abstract**

Predicting stock market trends is a challenging yet critical task for investors and financial analysts. In this project, we explore how well two traditional machine learning algorithms, **XGBoost** and **Random Forest**, predict whether a given stock's price will rise or fall the next day. The task is framed as binary classification, where the target variable is the direction of the stock price movement (up or down).

We begin by preprocessing the dataset: handling missing values, one-hot encoding categorical features, and engineering features to improve the models' predictive power. Exploratory Data Analysis (EDA) is conducted to understand the distribution of the data and identify key trends. The dataset is then split into training and testing sets.
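
The preprocessing steps described above could look roughly like the following. This is a minimal sketch, not the project's actual pipeline: the toy values are made up, dropping rows is only one way to handle missing values, and the unshuffled split assumes the rows are in chronological order. Passing `columns=` to `pd.get_dummies` yields names of the form `<column>_<value>`, which matches the pattern of feature names such as `vol_-1` reported later in this abstract.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame using column names from the dataset description; values are made up.
df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "MSFT", "MSFT", "AAPL", "MSFT", "AAPL", "MSFT"],
    "direction": [1, -1, 1, 1, -1, -1, 1, -1],
    "vol": [1, 0, -1, 1, 0, -1, 1, 0],
    "ma5_pos": [1, -1, 1, 1, -1, 1, -1, 1],
    "next_day_actual": [1, -1, 1, -1, 1, -1, 1, -1],
})

# Handle missing values by dropping incomplete rows (imputation is an alternative).
df = df.dropna()

# Binary target: 1 if the price rose the next day, 0 otherwise (assumed mapping).
y = (df["next_day_actual"] == 1).astype(int)

# One-hot encode every predictor, treating each one as categorical.
features = df.drop(columns=["next_day_actual"])
X = pd.get_dummies(features, columns=list(features.columns))

# shuffle=False keeps row order, so the test rows are the last 25% of the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
```

Keeping the split unshuffled avoids training on rows that come after the test rows, which matters for time-ordered data; a random split is the other common choice.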

Two machine learning models are implemented and evaluated: **XGBoost** and **Random Forest**. Hyperparameter tuning is performed using **GridSearchCV** to optimize model performance. The models are evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. Additionally, **K-Fold Cross-Validation** is employed to check that the results are stable across different data splits.
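
As an illustration of the tuning and evaluation workflow described above, the sketch below runs `GridSearchCV` with a 5-fold `KFold` splitter on a Random Forest and then computes the listed metrics on a held-out set. The data is synthetic and unrelated to the project's dataset, and the parameter grid, scoring choice, and fold count are assumptions chosen for brevity. An `xgboost.XGBClassifier` could be substituted for the Random Forest, since it follows the scikit-learn estimator interface.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import (
    GridSearchCV, KFold, cross_val_score, train_test_split,
)

# Synthetic stand-in data: binary features and labels with no real signal.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 8))
y = rng.integers(0, 2, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative grid; the project's actual grid is not given in the text.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=kfold,
)
search.fit(X_train, y_train)
best = search.best_estimator_

# Evaluate the tuned model on the held-out test set.
pred = best.predict(X_test)
proba = best.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "f1": f1_score(y_test, pred, zero_division=0),
    "auc_roc": roc_auc_score(y_test, proba),
}

# Separate K-fold check of how stable accuracy is across training folds.
cv_scores = cross_val_score(best, X_train, y_train, cv=kfold, scoring="accuracy")
print(search.best_params_, metrics, cv_scores.mean())
```

Because the labels here are random, the printed scores should hover around chance, which is a useful baseline to compare real results against.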

Based on the evaluation metrics, **XGBoost** achieves a **training accuracy of 52.16%** and a **test accuracy of 51.97%**, with an **AUC-ROC score of 0.528**. In comparison, **Random Forest** achieves a **training accuracy of 59.43%** and a **test accuracy of 51.20%**, with an **AUC-ROC score of 0.500**. Random Forest scores about 7 percentage points higher on the training set, but its 8.23-point drop from training to test accuracy points to overfitting, and an AUC-ROC of 0.500 is no better than random guessing. XGBoost generalizes better, with a train-test gap of only 0.19 points and a higher AUC-ROC, although both models' test accuracies remain close to 50%.

Feature importance analysis shows that **`ma5_neg_0`** and **`vol_-1`** rank as the most important features for XGBoost and Random Forest, respectively. The project concludes with a discussion of the limitations of traditional machine learning models in stock market prediction and suggests potential improvements, such as incorporating external data sources (e.g., news sentiment) and exploring deep learning models.
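
For readers unfamiliar with feature importance analysis, here is a small sketch of how a ranking can be read from a fitted tree ensemble via its `feature_importances_` attribute. The data is synthetic, and apart from `vol_-1` the column names are hypothetical. The label is deliberately set equal to one feature so the expected ranking is obvious. XGBoost's scikit-learn wrapper exposes an attribute of the same name, though its scores are computed differently.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic one-hot-like features; only "vol_-1" is a name taken from the text.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.integers(0, 2, size=(300, 4)),
    columns=["ma5_pos_1", "vol_-1", "direction_1", "streak_len_2"],
)

# The label copies one feature, so that feature should dominate the ranking.
y = X["vol_-1"].to_numpy()

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, normalized to sum to 1, sorted high to low.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

On a model that is barely above chance, as reported here, such rankings are best treated as descriptive rather than as evidence that a feature carries real predictive signal.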

This project explores the potential of machine learning in financial forecasting and underscores the need for careful feature engineering and model evaluation before predictions can be considered reliable.

### **Dataset Column Descriptions**

1. **`symbol` (object)**:
   - The stock ticker symbol, which is a short code representing a specific stock (e.g., AAPL for Apple, MSFT for Microsoft).

2. **`streak_len` (object)**:
   - The length of a streak, which refers to the number of consecutive days the stock price has been increasing or decreasing.

3. **`direction` (int64)**:
   - The direction of the stock price movement on a given day:
     - `1`: The stock price increased.
     - `-1`: The stock price decreased.
     - `0`: The stock price did not change (if applicable).

4. **`occurrence` (object)**:
   - How often a specific event or pattern happens (e.g., how frequently the stock price increases or decreases).

5. **`performance` (int64)**:
   - A numeric score or metric that represents how well the stock is performing. Higher values usually mean better performance.

6. **`vol` (int64)**:
   - The trading volume, which is the total number of shares traded for the stock on a given day. Higher volume means more trading activity.

7. **`ma5_pos` (int64)**:
   - The position of the stock price relative to its 5-day moving average (average price over the last 5 days):
     - `1`: The stock price is above the 5-day moving average.
     - `-1`: The stock price is below the 5-day moving average.
     - `0`: The stock price is equal to the 5-day moving average (if applicable).

8. **`next_day_actual` (int64)**:
   - The actual direction of the stock price movement on the **next day**:
     - `1`: The stock price increased.
     - `-1`: The stock price decreased.
     - `0`: The stock price did not change (if applicable).
   - This is the **target variable** for the prediction task.

9. **`bin_name` (object)**:
   - A category or group that the stock belongs to, based on certain criteria like price ranges or performance levels.
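
To make the column definitions above more concrete, the following dependency-free sketch derives `direction`, `streak_len`, `ma5_pos`, and `next_day_actual` from a list of daily closing prices. How the dataset's columns were actually computed is not documented here, so everything in the sketch is an assumption: the function name `derive_columns` is hypothetical, the 5-day window is taken to include the current day, and a streak counts consecutive days moving in the same direction as the current day.

```python
def sign(x: float) -> int:
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def derive_columns(closes: list[float]) -> list[dict]:
    """Build direction, streak_len, ma5_pos and next_day_actual per day.

    Rows start at index 4 so a full 5-day window exists, and stop one day
    before the end so the next-day label is known.
    """
    rows = []
    for i in range(4, len(closes) - 1):
        direction = sign(closes[i] - closes[i - 1])

        # Count consecutive days that moved in the same direction as today.
        streak = 0
        j = i
        while j > 0 and direction != 0 and sign(closes[j] - closes[j - 1]) == direction:
            streak += 1
            j -= 1

        # 5-day moving average over today and the four previous days.
        ma5 = sum(closes[i - 4 : i + 1]) / 5

        rows.append({
            "direction": direction,
            "streak_len": streak,
            "ma5_pos": sign(closes[i] - ma5),
            "next_day_actual": sign(closes[i + 1] - closes[i]),
        })
    return rows


closes = [10.0, 11.0, 12.0, 11.0, 12.0, 13.0, 13.0, 12.0]
rows = derive_columns(closes)
for row in rows:
    print(row)
```

Since `next_day_actual` can take three values while the task is binary, one possible mapping is `int(row["next_day_actual"] == 1)`, which groups unchanged days with down days; that choice is also an assumption.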