# Algorithm Project Proposal  
**Team C.J.: Chris Wong and Joshua Meyer**  
**CPSC 322, Fall 2025**

---

## 1. Project Title
**Predicting Bitcoin Price Movement Using Treasury and Sentiment Data**

---

## 2. Dataset Description
- **Source:** [Kaggle — Bitcoin and US Treasury with Daily Sentiment](https://www.kaggle.com/datasets/jessearzate/bitcoin-and-us-treasury-with-daily-sentiment?select=bitcoin_sentiment_12012022_11082025.csv)  
- **Format:** CSV file  
- **Contents:**  
  - Daily Bitcoin trading data (open, close, high, low, volume)  
  - US Treasury data (bonds, bills, notes)  
  - Market sentiment scores (weighted sentiment, daily sentiment indicators)  
- **Time Range:** December 1, 2022 – November 8, 2025 (end date taken from the dataset filename)  
- **Size:** ~1,000 daily instances, 26 attributes  

---

## 3. Attributes and Target
We plan to use the following attributes as predictors:  
- **Bitcoin metrics:** open, high, low, volume  
- **Treasury data:** treasury_bonds, treasury_bills, treasury_notes  
- **Sentiment data:** market sentiment, weighted_sentiment  

### Target (Class Information)
- Instead of predicting the raw **close** value (regression), we will **discretize close into categories** for classification:  
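  - **Binary classification:** “Up” if close > previous day’s close, “Down” otherwise (ties count as “Down”). The first day has no previous close, so it cannot be labeled and is excluded.  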
  - **Alternative:** Discretize into three bins (Low / Medium / High) using quantiles.  
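
The two labeling options above can be sketched in plain Python. This is a minimal illustration, not the final implementation: the function names `label_up_down` and `quantile_bins` and the sample prices are hypothetical, and the quantile version assumes equal-frequency binning by rank.

```python
def label_up_down(closes):
    """Label each day (from the second onward) by comparing to the prior close.

    Hypothetical helper: "Up" if close > previous close, "Down" otherwise.
    The first day has no previous close, so the output is one item shorter.
    """
    return ["Up" if today > prev else "Down"
            for prev, today in zip(closes, closes[1:])]


def quantile_bins(values, labels=("Low", "Medium", "High")):
    """Assign each value to an equal-frequency bin based on its rank.

    Hypothetical helper: ties are broken by position, so equal values
    can land in different bins when they straddle a bin boundary.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [None] * n
    for rank, i in enumerate(order):
        result[i] = labels[rank * len(labels) // n]
    return result


# Made-up closing prices, for illustration only.
closes = [100.0, 102.5, 101.0, 101.0, 105.0, 99.0]
up_down = label_up_down(closes)
bins = quantile_bins(closes)
print(up_down)
print(bins)
```

Note that the binary labels describe day-to-day movement, while the quantile bins in this sketch describe the price level itself; which quantity to bin is still an open design choice.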


---

## 4. Implementation / Technical Merit
- **Preprocessing:**  
  - Handle missing sentiment values (drop or impute).  
  - Normalize/standardize numeric features.  
  - Discretize continuous target into categorical classes.  
- **Feature Selection:**  
  - Correlation pruning to remove redundant attributes.  
  - Embedded importance from decision trees and random forest.  
  - PCA (optional) for dimensionality reduction if collinearity is high.  
- **Classifiers:**  
  - **Decision Tree (baseline)**  
  - **MyRandomForestClassifier (custom implementation)**  
  - **XGBoost (advanced ensemble)**  
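
As a rough sketch of the preprocessing steps listed above, the following shows mean imputation and z-score standardization for a single numeric column. The helper names `impute_mean` and `standardize`, the use of `None` for missing entries, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Made-up sentiment scores with two missing days, for illustration only.
sentiment = [0.2, None, 0.4, 0.6, None]
filled = impute_mean(sentiment)
scaled = standardize(filled)
print(filled)
print(scaled)
```

To avoid leaking information from the test set, the fill value, mean, and standard deviation would normally be computed on the training split only and then reused on the test split.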

---

## 5. Anticipated Challenges
- **Missing values:** Some sentiment data may be incomplete.  
- **Class imbalance:** Bitcoin price may rise more often than fall, requiring stratified sampling or weighted metrics.  
- **Temporal dependency:** Daily data has sequential structure; as a simplifying assumption, we will treat each day as an independent instance for classification.  
- **Bias in feature selection:** Choosing only 10 of the 26 attributes may introduce selection bias; PCA and embedded methods will help mitigate this.  
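
One way to approach the stratified sampling mentioned above is to split each class separately so that the train and test sets keep roughly the same class proportions. The sketch below is illustrative only: `stratified_split`, its parameters, and the sample labels are hypothetical. It shuffles days at random, which matches the plan to treat each day independently but ignores chronological order.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved.

    Hypothetical helper: each class is shuffled and split on its own,
    so the test set gets about test_fraction of every class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up imbalanced labels: 12 "Up" days and 8 "Down" days.
labels = ["Up"] * 12 + ["Down"] * 8
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```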

---

## 6. Feature Selection Techniques
- **Filter methods:** Correlation heatmaps, variance thresholds.  
- **Embedded methods:** Feature importance from decision trees and XGBoost.  
- **Dimensionality reduction:** PCA to reduce noise and speed up training, with trade-off in interpretability.  
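
The correlation-based filter above could look something like the following sketch, which keeps an attribute only if it is not strongly correlated with any attribute already kept. The names `pearson` and `prune_correlated`, the 0.95 threshold, and the sample columns are assumptions for illustration, not values taken from the dataset.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column
    return cov / (var_x * var_y) ** 0.5


def prune_correlated(columns, threshold=0.95):
    """Greedily keep columns whose |r| with every kept column is below threshold.

    Hypothetical helper: columns are visited in dictionary order, so the
    first of two redundant attributes is the one that survives.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values: "high" tracks "open" closely, "volume" does not.
columns = {
    "open": [1.0, 2.0, 3.0, 4.0, 5.0],
    "high": [1.1, 2.1, 3.2, 4.1, 5.3],
    "volume": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = prune_correlated(columns)
print(kept)
```

Because the pruning is greedy, the result depends on column order; listing the attributes we most want to keep first is one simple way to control which of a redundant pair is dropped.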

---

## 7. Potential Impact of Results
- **Usefulness:**  
  - Predicting Bitcoin price movement categories can help investors and institutions anticipate market trends.  
  - Provides a framework for testing hypotheses about the relationship between treasury yields, sentiment, and crypto markets.  
- **Stakeholders:**  
  - Hedge funds, banks, and trading firms focusing on Bitcoin strategies.  
  - Retail investors and Bitcoin holders.  
  - Shareholders in financial institutions exposed to crypto markets.  

---

## 8. Citations
- **Dataset:** Kaggle — Bitcoin and US Treasury with Daily Sentiment  
- **Libraries:** scikit-learn, XGBoost, pandas, matplotlib, seaborn  
- **References:**  
  - Scikit-learn documentation (Decision Trees, PCA)  
  - XGBoost documentation  
  - Course-provided materials for Random Forest implementation  

---

## 9. Next Steps
- Finalize discretization strategy for the target variable (“close”).  
- Perform exploratory data analysis (EDA) to visualize distributions, correlations, and class balance.  
- Implement and evaluate classifiers, beginning with the custom `MyRandomForestClassifier`.  
- Compare performance across classifiers and report best results with confusion matrices and accuracy metrics.  
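
For the final reporting step, a confusion matrix and accuracy can be computed directly from the true and predicted labels. The sketch below is a minimal illustration; the function names and the sample predictions are hypothetical and do not reflect any actual results.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are actual classes and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up labels and predictions, for illustration only.
y_true = ["Up", "Up", "Down", "Down", "Up", "Down"]
y_pred = ["Up", "Down", "Down", "Up", "Up", "Down"]
matrix = confusion_matrix(y_true, y_pred, labels=["Up", "Down"])
acc = accuracy(matrix)
print(matrix)
print(acc)
```

If the classes turn out to be imbalanced, accuracy alone can be misleading, so the per-class rows of the matrix are worth reporting alongside it.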

---