## Aaryan Thapa (06)
## Aaditya Sapkota (01)


# Bitcoin Price Prediction Using Linear Regression

---

## Problem Definition

The objective of this project is to predict the daily closing price of Bitcoin using historical market data.

Bitcoin is a highly volatile digital asset, and understanding how different market indicators influence its closing price is important for financial analysis and decision-making.

This is a regression problem because the target variable (Close price) is a continuous numerical value.

---

### Objective

- Predict the Close price of Bitcoin.
- Use historical features such as:
  - Open price
  - High price
  - Low price
  - Trading Volume
  - 7-day Moving Average (MA_7)
  - 30-day Moving Average (MA_30)
- Train a model from scratch without using external machine learning libraries.
- Evaluate model performance using MSE, RMSE, and R² score.

---

### Why Linear Regression?

Linear Regression was selected because:

- The target variable is continuous.
- Same-day price indicators (Open, High, Low) and moving averages are strongly, and approximately linearly, correlated with the Close price.
- It is simple, interpretable, and mathematically well understood.
- It can be implemented from scratch using basic mathematics.
- It allows us to understand how each feature influences the final prediction.

---

## Methodology

The following steps were followed in this project:

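1. Load the dataset using Python's built-in `csv` module.
2. Clean the data by removing rows with missing or non-numeric values in the selected columns.
3. Shuffle the rows randomly and split them into 80% training and 20% testing sets.
4. Separate the features (X) from the target variable (y).
5. Normalize the features using Min-Max scaling, with the minimum and maximum computed on the training set only.
6. Implement Linear Regression trained with batch Gradient Descent.
7. Train the model for 1,000 epochs with a learning rate of 0.01.
8. Evaluate the model on the test set using MSE, RMSE, and R².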

---

### Model Formula

The Linear Regression model follows this equation:

Close = w₁(Open) + w₂(High) + w₃(Low) + w₄(Volume) + w₅(MA₇) + w₆(MA₃₀) + b

Where:

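- w₁ to w₆ are the weights learned during training.
- b is the bias term.
- Each input feature is Min-Max scaled (using the training-set minimum and maximum) before it is multiplied by its weight, so the weights apply to scaled values, not to raw dollar prices or volume.
- Gradient Descent is used to minimize the Mean Squared Error (MSE).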
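
The `train_linear_regression` function in this notebook implements the update rule below. Here x_{ij} is the Min-Max scaled value of feature j in training example i, n is the number of training examples, and α is the learning rate (0.01 in the code):

$$\hat{y}_i = \sum_{j=1}^{6} w_j x_{ij} + b, \qquad e_i = \hat{y}_i - y_i$$

$$J(\mathbf{w}, b) = \frac{1}{2n} \sum_{i=1}^{n} e_i^2 = \tfrac{1}{2}\,\text{MSE}$$

$$\frac{\partial J}{\partial w_j} = \frac{1}{n} \sum_{i=1}^{n} e_i x_{ij}, \qquad \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} e_i$$

$$w_j \leftarrow w_j - \alpha \frac{\partial J}{\partial w_j}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$

The code's gradient leaves out the factor of 2 that appears when the plain MSE is differentiated, so it is effectively minimizing half the MSE. This only rescales the effective learning rate. The optimal weights stay the same.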

---

## Model Evaluation

The model performance is measured using:

- Mean Squared Error (MSE)  
  The average of the squared differences between predicted and actual values.

- Root Mean Squared Error (RMSE)  
  The square root of the MSE, expressed in the same units as the price (US dollars).

- R² Score  
  The proportion of the variance in the actual Close prices that the model explains.  
  R² = 1 indicates perfect prediction; R² = 0 means the model does no better than always predicting the mean Close price.
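
Here is a quick worked example of the three metrics with made-up numbers. Take actual prices 100, 200, 300 and predictions 110, 190, 300. The squared errors are 100, 100 and 0. That gives MSE = 200/3 ≈ 66.67, RMSE ≈ 8.16, and R² = 1 - 200/20000 = 0.99, since the total sum of squares around the mean of 200 is 20,000. Below is a minimal pure-Python sketch of the same calculations. The function names are illustrative, not taken from the notebook.

```python
import math

def compute_mse(y_true, y_pred):
    """Mean Squared Error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def compute_rmse(y_true, y_pred):
    """Root Mean Squared Error: same units as the target (e.g. dollars)."""
    return math.sqrt(compute_mse(y_true, y_pred))

def compute_r2(y_true, y_pred):
    """R²: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_total = sum((t - mean_y) ** 2 for t in y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_residual / ss_total

# Made-up values for illustration only
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
print(round(compute_mse(y_true, y_pred), 2))   # 66.67
print(round(compute_rmse(y_true, y_pred), 2))  # 8.16
print(round(compute_r2(y_true, y_pred), 4))    # 0.99
```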

---

## Results

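After training on the training set and predicting on the unseen test set (673 rows):

- The test MSE is 2,209,456.21, the RMSE is about $1,486.42, and the R² score is 0.9911. The model therefore explains about 99% of the variance in the test Close prices.
- The training MSE was still decreasing at epoch 900 (from about 4.23 million at epoch 800 to about 2.95 million), so gradient descent had not fully converged after 1,000 epochs.
- In the five sample predictions, the absolute errors range from about $880 to $1,450. This is small relative to prices above $20,000 but large for low-priced days (e.g. $1,541.16 predicted vs. $247.53 actual).
- Because the split uses an unseeded random shuffle, these exact numbers will vary between runs.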

---

## Conclusion

This project demonstrates how Linear Regression can be implemented from scratch to solve a real-world financial prediction problem.

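The model fits the relationship between the market indicators and the Bitcoin closing price closely (test R² = 0.9911). Open, High, and Low come from the same trading day as the Close being predicted. The model therefore estimates the Close from same-day information rather than forecasting a future price. Although financial markets are complex and not perfectly linear, the model provides a useful baseline for prediction.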

Future improvements could include:
- Adding more technical indicators
- Using polynomial regression
- Implementing regularization
- Trying more advanced models
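
Of the improvements listed above, regularization needs the smallest change to the existing code. The following is a minimal sketch of L2 (ridge) regularization that keeps the same batch gradient-descent loop as `train_linear_regression`. The function `train_ridge_regression` and its `l2_lambda` parameter are hypothetical. The sketch adds `l2_lambda * w_j` to each weight gradient, which corresponds to adding (λ/2)·Σw_j² to the half-MSE cost. The bias is left unregularized, a common convention. The default `l2_lambda=0.1` is a placeholder, not a tuned value.

```python
def train_ridge_regression(X, y, learning_rate=0.01, epochs=1000, l2_lambda=0.1):
    """Hypothetical ridge variant of batch gradient descent.

    Minimizes (1/2n) * sum(error^2) + (l2_lambda/2) * sum(w_j^2).
    """
    n = len(X)
    num_features = len(X[0])
    weights = [0.0] * num_features
    bias = 0.0

    for _ in range(epochs):
        # Predictions and errors use the weights from the start of this epoch
        predictions = [bias + sum(w * x for w, x in zip(weights, row)) for row in X]
        errors = [p - t for p, t in zip(predictions, y)]

        for j in range(num_features):
            grad = sum(errors[i] * X[i][j] for i in range(n)) / n
            grad += l2_lambda * weights[j]  # L2 penalty pulls each weight toward 0
            weights[j] -= learning_rate * grad

        bias -= learning_rate * sum(errors) / n  # bias is not penalized

    return weights, bias
```

With `l2_lambda=0` the function reduces to plain batch gradient descent. Larger values shrink the weights more. The sketch would help most when the highly correlated price features make individual weights unstable.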

## Loading the Data

```python
import csv

# Full path to the CSV file (adjust to where the dataset is stored locally)
file_path = "/home/lenovo/Documents/FODS/Bitcoin_Price_Dataset_2014_2023.csv"

# Read every row of the CSV file into a list
rows = []
with open(file_path, 'r', newline='') as file:  # newline='' as recommended by the csv docs
    reader = csv.DictReader(file)  # DictReader reads each row as a dict keyed by column name
    for row in reader:
        rows.append(row)

print(f"Total rows loaded: {len(rows)}")

# Print the 11th row (index 10) as a sample
print("Sample row:", rows[10])
```

```
Total rows loaded: 3393
Sample row: {'Date': '2014-09-27', 'Open': '403.56', 'High': '406.62', 'Low': '397.37', 'Close': '399.52', 'Volume': '15029300', 'Daily_Return': '-1.21', 'Price_Range': '9.25', 'Price_Change': '-4.04', 'MA_7': '410.78', 'MA_30': '', 'MA_90': '', 'Volatility_30d': '', 'Day_of_Week': 'Saturday', 'Month': '9', 'Year': '2014', 'Quarter': '3'}
```


## Cleaning the Data

```python
# We'll use these columns as features (inputs) and Close as the target (output)
features_to_use = ['Open', 'High', 'Low', 'Volume', 'MA_7', 'MA_30']
target = 'Close'

clean_data = []

for row in rows:
    try:
        # Convert each needed column to a float
        entry = {}
        for col in features_to_use + [target]:
            # DictReader yields None for fields missing at the end of a short line
            val = (row.get(col) or '').strip()
            if val == '' or val == 'None':
                raise ValueError("Missing value")  # Skip rows with empty values
            entry[col] = float(val)
        clean_data.append(entry)
    except ValueError:
        pass  # Skip this row if any value is missing or not a number

print(f"Rows after cleaning: {len(clean_data)}")
```

```
Rows after cleaning: 3364
```
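
Cleaning dropped 29 rows (3393 - 3364). The sample row printed earlier already has an empty `MA_30`, so the moving-average warm-up period may explain many of them. That has not been checked here. A small diagnostic can show which column causes each drop. The helper `count_missing` below is hypothetical. It uses the same rule as the cleaning cell and counts a value as missing when `float()` cannot parse it, so empty strings, `None`, and `'None'` all count. The demo rows are made up.

```python
def count_missing(rows, columns):
    """Count, per column, how many rows have an empty or non-numeric value."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            val = (row.get(col) or '').strip()  # treat None like an empty string
            try:
                float(val)
            except ValueError:
                missing[col] += 1
    return missing

# Made-up rows for illustration only
rows_demo = [
    {'Open': '1.0', 'MA_30': ''},
    {'Open': '2.0', 'MA_30': '5.0'},
    {'Open': '', 'MA_30': None},
]
print(count_missing(rows_demo, ['Open', 'MA_30']))  # {'Open': 1, 'MA_30': 2}
```

In the notebook, `count_missing(rows, features_to_use + [target])` would report the per-column counts for the real dataset. A single row can be counted under several columns.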


## Split into Train and Test Sets

```python
import random

# Shuffle the rows before splitting. This is a random split, not a
# chronological one, so test days are interleaved with training days.
random.shuffle(clean_data)

split_index = int(len(clean_data) * 0.8)  # 80% for training, 20% for testing

train_data = clean_data[:split_index]
test_data = clean_data[split_index:]

print(f"Training rows: {len(train_data)}")
print(f"Testing rows: {len(test_data)}")
```

```
Training rows: 2691
Testing rows: 673
```
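
The cell above shuffles before splitting, so the model is tested on days that fall between training days. For price series, a common alternative is a chronological split. The model trains on the earliest 80% of days and is tested on the most recent 20%, which is closer to predicting genuinely unseen future data. Below is a minimal sketch. It assumes each cleaned row keeps its `'Date'` value, which the cleaning cell above currently drops, so `'Date'` would need to be copied into `entry`. The helper `chronological_split` is hypothetical. The dataset's dates use the ISO `YYYY-MM-DD` format, so they sort correctly as plain strings.

```python
def chronological_split(rows, train_fraction=0.8, date_key='Date'):
    """Split rows into (train, test): earliest rows train, latest rows test.

    Assumes each row is a dict with an ISO 'YYYY-MM-DD' string under date_key,
    so sorting the strings sorts the rows chronologically.
    """
    ordered = sorted(rows, key=lambda r: r[date_key])
    split_index = int(len(ordered) * train_fraction)
    return ordered[:split_index], ordered[split_index:]

# Made-up rows, deliberately out of order
demo_rows = [{'Date': f'2020-01-0{d}', 'Close': float(d)} for d in [3, 1, 5, 2, 4]]
train_demo, test_demo = chronological_split(demo_rows)
print([r['Date'] for r in test_demo])  # ['2020-01-05']
```

With a chronological split, the Min-Max scaling fitted on the training years may map later, higher prices above 1. Test scores are also often lower than with a random split.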


## Preparing Feature Matrices

```python
def get_X_y(data, feature_cols, target_col):
    """Extract features (X) and target (y) from the data"""
    X = []
    y = []
    for row in data:
        X.append([row[col] for col in feature_cols])  # List of feature values
        y.append(row[target_col])                      # Target value
    return X, y

X_train, y_train = get_X_y(train_data, features_to_use, target)
X_test, y_test = get_X_y(test_data, features_to_use, target)

print(f"X_train shape: {len(X_train)} rows, {len(X_train[0])} features")
```

```
X_train shape: 2691 rows, 6 features
```


## Feature Scaling

```python
def normalize(X):
    """Min-Max scale each feature to [0, 1] using that column's min and max"""
    num_features = len(X[0])
    mins = [min(row[i] for row in X) for i in range(num_features)]
    maxs = [max(row[i] for row in X) for i in range(num_features)]

    X_scaled = []
    for row in X:
        scaled_row = []
        for i in range(num_features):
            if maxs[i] - mins[i] == 0:
                scaled_row.append(0)  # Avoid division by zero for constant columns
            else:
                scaled_row.append((row[i] - mins[i]) / (maxs[i] - mins[i]))
        X_scaled.append(scaled_row)

    return X_scaled, mins, maxs

X_train_scaled, mins, maxs = normalize(X_train)
# Use the SAME mins/maxs from training to scale the test data (no test-set leakage)
X_test_scaled = [[(row[i] - mins[i]) / (maxs[i] - mins[i]) if maxs[i] - mins[i] != 0 else 0
                  for i in range(len(row))] for row in X_test]
```
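
The test-set scaling in the cell above repeats the Min-Max formula inline. A small helper makes the reuse explicit. It also shows a consequence worth knowing: a test value above the training maximum maps above 1, and one below the training minimum maps below 0. The name `apply_min_max` and the demo bounds are hypothetical.

```python
def apply_min_max(X, mins, maxs):
    """Scale rows of X with previously computed (training) mins and maxs.

    Constant columns (max == min) are mapped to 0.0, as in normalize().
    """
    scaled = []
    for row in X:
        scaled.append([
            (v - lo) / (hi - lo) if hi != lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled

# Hypothetical training bounds for two features
demo_mins, demo_maxs = [100.0, 0.0], [200.0, 50.0]
print(apply_min_max([[150.0, 25.0], [250.0, 50.0]], demo_mins, demo_maxs))
# [[0.5, 0.5], [1.5, 1.0]]
```

In the notebook, `apply_min_max(X_test, mins, maxs)` would produce the same values as the inline comprehension, apart from returning `0.0` instead of `0` for constant columns.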

## Implementation of Linear Regression with Gradient Descent

```python
def predict(X, weights, bias):
    """Make predictions: y = w1*x1 + w2*x2 + ... + bias"""
    results = []
    for row in X:
        pred = bias
        for i in range(len(row)):
            pred += weights[i] * row[i]
        results.append(pred)
    return results

def train_linear_regression(X, y, learning_rate=0.01, epochs=1000):
    """Train using batch gradient descent over all training rows"""
    n = len(X)                        # Number of training examples
    num_features = len(X[0])

    # Start with all weights = 0
    weights = [0.0] * num_features
    bias = 0.0

    for epoch in range(epochs):
        # Step 1: Make predictions with current weights
        predictions = predict(X, weights, bias)

        # Step 2: Calculate errors (prediction - actual)
        errors = [predictions[i] - y[i] for i in range(n)]

        # Step 3: Sum error * feature for each weight; after dividing by n
        # (Step 4) this is the gradient of half the MSE
        weight_gradients = [0.0] * num_features
        for i in range(n):
            for j in range(num_features):
                weight_gradients[j] += errors[i] * X[i][j]

        bias_gradient = sum(errors)

        # Step 4: Update weights (move against the gradient to reduce error)
        for j in range(num_features):
            weights[j] -= learning_rate * (weight_gradients[j] / n)
        bias -= learning_rate * (bias_gradient / n)

        # Print progress every 100 epochs (MSE of the predictions made before this update)
        if epoch % 100 == 0:
            mse = sum(e**2 for e in errors) / n
            print(f"Epoch {epoch}: MSE = {mse:.2f}")

    return weights, bias

weights, bias = train_linear_regression(X_train_scaled, y_train)
```

```
Epoch 0: MSE = 486110176.59
Epoch 100: MSE = 113315963.27
Epoch 200: MSE = 58117530.65
Epoch 300: MSE = 35915745.86
Epoch 400: MSE = 22821092.51
Epoch 500: MSE = 14642575.86
Epoch 600: MSE = 9500462.79
Epoch 700: MSE = 6264909.34
Epoch 800: MSE = 4228703.21
Epoch 900: MSE = 2947118.38
```
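
The training log shows the MSE still falling at epoch 900, down from about 4.23 million at epoch 800 to about 2.95 million. This suggests gradient descent had not converged after 1,000 epochs at a learning rate of 0.01. For a model with only six features, one way to check is to compute the exact least-squares solution from the normal equations, (XᵀX)θ = Xᵀy, and compare it with the gradient-descent result. Below is a pure-Python sketch that solves the system with Gaussian elimination and partial pivoting. The names `solve_linear_system` and `fit_normal_equation` are hypothetical. Open, High, Low and the moving averages are highly correlated, so XᵀX may be poorly conditioned and individual weights should be read with care. The predictions are usually less sensitive.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("Matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate entries below the pivot
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_normal_equation(X, y):
    """Least-squares weights and bias; a column of ones models the bias."""
    Xa = [list(row) + [1.0] for row in X]
    p = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(Xa, y)) for i in range(p)]
    theta = solve_linear_system(XtX, Xty)
    return theta[:-1], theta[-1]

# Check on synthetic data generated from y = 2*x1 + 3*x2 + 1
X_demo = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_demo = [2 * a + 3 * b + 1 for a, b in X_demo]
w_demo, b_demo = fit_normal_equation(X_demo, y_demo)
print([round(w, 6) for w in w_demo], round(b_demo, 6))  # weights near [2, 3], bias near 1
```

In the notebook, `w_exact, b_exact = fit_normal_equation(X_train_scaled, y_train)` followed by `predict(X_test_scaled, w_exact, b_exact)` would give converged predictions to compare against the gradient-descent ones.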


## Evaluating the Model

```python
def mean_squared_error(y_true, y_pred):
    n = len(y_true)
    return sum((y_true[i] - y_pred[i])**2 for i in range(n)) / n

def r_squared(y_true, y_pred):
    """R² tells us how well our model explains the variance in the data"""
    mean_y = sum(y_true) / len(y_true)
    ss_total = sum((y - mean_y)**2 for y in y_true)      # Total variance
    ss_residual = sum((y_true[i] - y_pred[i])**2 for i in range(len(y_true)))  # Unexplained variance
    return 1 - (ss_residual / ss_total)

# Make predictions on test data
test_predictions = predict(X_test_scaled, weights, bias)

mse = mean_squared_error(y_test, test_predictions)
r2 = r_squared(y_test, test_predictions)

print("\n--- Results ---")
print(f"Mean Squared Error: {mse:.2f}")
print(f"Root MSE: {mse**0.5:.2f}")  # In the same units as price (dollars)
print(f"R² Score: {r2:.4f}")        # Closer to 1.0 is better

# Show a few example predictions vs actual
print("\nSample predictions vs actual:")
for i in range(5):
    print(f"  Predicted: ${test_predictions[i]:.2f}  |  Actual: ${y_test[i]:.2f}")
```

```
--- Results ---
Mean Squared Error: 2209456.21
Root MSE: 1486.42
R² Score: 0.9911

Sample predictions vs actual:
  Predicted: $8255.20  |  Actual: $6853.84
  Predicted: $1541.16  |  Actual: $247.53
  Predicted: $7411.75  |  Actual: $6529.59
  Predicted: $20206.34  |  Actual: $21161.52
  Predicted: $21939.95  |  Actual: $23389.43
```
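
An R² of 0.9911 is easier to interpret next to a naive baseline. The features include the same day's Open, High and Low, so one simple reference point is to predict each day's Close as that day's Open. Below is a minimal sketch. The helper names `open_as_close_baseline`, `r2_score` and `rmse_score` are hypothetical, and the demo values are made up. This baseline has not been run on the dataset here, so no baseline numbers are claimed.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_total = sum((t - mean_y) ** 2 for t in y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_residual / ss_total

def rmse_score(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def open_as_close_baseline(X, open_index=0):
    """Naive baseline: predict each day's Close as the same day's Open.

    X must hold UNSCALED feature rows; Open is column 0 in features_to_use.
    """
    return [row[open_index] for row in X]

# Made-up example values, for illustration only
X_demo = [[100.0], [200.0], [300.0]]
y_demo = [110.0, 190.0, 300.0]
baseline = open_as_close_baseline(X_demo)
print(rmse_score(y_demo, baseline), r2_score(y_demo, baseline))
```

In the notebook, `open_as_close_baseline(X_test)` could be scored with the existing `mean_squared_error` and `r_squared` functions. If that baseline also reaches an R² near 0.99, most of the model's apparent accuracy comes from the same-day price features rather than from the learned weights.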
