# data_prep-checkpoint.ipynb

## Notebook Purpose
This notebook loads, cleans, and preprocesses historical cryptocurrency price data. It handles missing values, converts data types, and prepares the data for further analysis and model training.

## Instructions
1. **Import Necessary Libraries**:
   - Import `pandas` for data manipulation.
   - Import `numpy` for numerical operations.
   - Import `pathlib` for handling file paths.

2. **Load Data**:
   - Load the CSV file containing historical cryptocurrency data.

3. **Preprocess Data**:
   - Handle missing values.
   - Convert data types as necessary.
   - Ensure the data is properly indexed by date.

4. **Save Preprocessed Data**:
   - Save the cleaned and preprocessed data to a new CSV file for later use.

5. **Review Data**:
   - Display the first few rows of the preprocessed data to ensure it looks correct.
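
The preprocessing in step 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual implementation: the helper name `preprocess_prices` and the small in-memory sample are hypothetical, and the `Date` and `Close` column names are taken from the code in this notebook.

```python
import pandas as pd


def preprocess_prices(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a date-indexed frame with a numeric Close column."""
    df = raw.copy()
    # Convert data types: unparseable values become NaT/NaN instead of raising.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    df["Close"] = pd.to_numeric(df["Close"], errors="coerce")
    # Handle missing values: drop rows lacking a date or a closing price.
    df = df.dropna(subset=["Date", "Close"])
    # Index by date, remove duplicate dates, and sort chronologically.
    df = df.set_index("Date")
    df = df[~df.index.duplicated(keep="first")].sort_index()
    return df[["Close"]]


# Made-up sample rows for illustration only (not real market data).
sample = pd.DataFrame(
    {
        "Date": ["2021-01-03", "2021-01-01", "2021-01-02", "2021-01-02"],
        "Close": ["3.0", "1.0", None, "2.0"],
    }
)
cleaned = preprocess_prices(sample)
print(cleaned)
```

Restricting `dropna` to the columns that are actually used avoids discarding rows whose only gaps are in columns that get dropped anyway.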

## Example Code
```python
# Import necessary libraries
import pandas as pd
import numpy as np
from pathlib import Path

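# Load data
data_path = Path('data/BTC-USD.csv')  # Update this path based on the selected cryptocurrency
data = pd.read_csv(data_path, parse_dates=['Date'], index_col='Date')

# Preprocess data
data = data[['Close']]  # Select necessary columns first so gaps in unused columns don't drop valid rows
data = data.dropna()  # Handle missing values

# Save the preprocessed data (create the output directory if it doesn't exist)
output_path = Path('data/cleaned_data/BTC_cleaned.csv')
output_path.parent.mkdir(parents=True, exist_ok=True)
data.to_csv(output_path)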

# Display the first few rows of the preprocessed data
data.head()
```

In [6]:
# Cell 1: Import required libraries
import os
from pathlib import Path

import pandas as pd

# crypto_data is defined at module level in Cell 2, so no `global` declaration is needed

# Print the current working directory
print("Current working directory:", os.getcwd())

# Change to the root directory of the project if needed
root_dir = '/Users/alexandrclimenco/Documents/UM/homework/FinTech_AlgoTradingBot'
if os.getcwd() != root_dir:
    os.chdir(root_dir)
    print("New working directory:", os.getcwd())


Current working directory: /Users/alexandrclimenco/Documents/UM/homework/FinTech_AlgoTradingBot
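
The hard-coded absolute path above only works on one machine. One possible alternative, sketched here as an assumption rather than as the notebook's method, is to walk up from the current directory until a folder containing a marker directory is found. The helper name `find_project_root` and the choice of `data` as the marker are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "data") -> Optional[Path]:
    """Hypothetical helper: return the nearest ancestor of `start` (inclusive)
    that contains a directory named `marker`, or None if there is none."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


root = find_project_root(Path.cwd())
if root is None:
    print("Project root not found; staying in", Path.cwd())
else:
    print("Project root:", root)
```

If a root is found, passing it to `os.chdir` would have the same effect as the cell above without tying the notebook to a single user's home directory.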


In [7]:
# Cell 2: Load data for all cryptocurrencies
# Map each cryptocurrency name to its historical price file
crypto_paths = {
    "Bitcoin": "data/historical_data/BTC-USD.csv",
    "Ethereum": "data/historical_data/ETH-USD.csv",
    "Solana": "data/historical_data/SOL-USD.csv"
}

crypto_data = {}

# List and print files in the data/historical_data directory
data_dir = 'data/historical_data'
print(f"Files in {data_dir} directory: {os.listdir(data_dir)}")

for crypto, path in crypto_paths.items():
    try:
        prices_df = pd.read_csv(
            path,
            index_col="Date",
            parse_dates=True
        )
        # Keep only the Close column (Date is already the index)
        prices_df = prices_df[["Close"]]
        # Drop rows with a missing Close value
        prices_df = prices_df.dropna()
        # Save to the dictionary
        crypto_data[crypto] = prices_df
        print(f"{crypto} data loaded from {path}")
    except FileNotFoundError:
        print(f"File not found: {path}")

# Verify loaded data
for crypto, df in crypto_data.items():
    print(f"\n{crypto} data sample:")
    display(df.head())


Files in data/historical_data directory: ['ETH-USD.csv', 'BTC-USD.csv', 'coinbase', 'alpha_vantage', 'SOL-USD.csv', 'cryptocompare']
Bitcoin data loaded from data/historical_data/BTC-USD.csv
Ethereum data loaded from data/historical_data/ETH-USD.csv
Solana data loaded from data/historical_data/SOL-USD.csv

Bitcoin data sample:


| Date | Close |
|---|---|
| 2021-07-01 | 33572.117188 |
| 2021-07-02 | 33897.046875 |
| 2021-07-03 | 34668.546875 |
| 2021-07-04 | 35287.78125 |
| 2021-07-05 | 33746.003906 |



Ethereum data sample:


| Date | Close |
|---|---|
| 2021-07-01 | 2113.605469 |
| 2021-07-02 | 2150.040283 |
| 2021-07-03 | 2226.114258 |
| 2021-07-04 | 2321.724121 |
| 2021-07-05 | 2198.58252 |



Solana data sample:


| Date | Close |
|---|---|
| 2021-07-01 | 33.404034 |
| 2021-07-02 | 34.020481 |
| 2021-07-03 | 34.478817 |
| 2021-07-04 | 34.3106 |
| 2021-07-05 | 32.984589 |
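
Printing the first rows confirms that each file loaded, but says little about data quality. A few programmatic checks could complement it. The sketch below is illustrative only: the helper name `summarize_frame` is hypothetical, a made-up frame stands in for an entry of `crypto_data`, and the gap count assumes one row per calendar day, as the samples above suggest.

```python
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic sanity checks for a date-indexed Close frame."""
    # Calendar days between the first and last date that have no row.
    expected_days = pd.date_range(df.index.min(), df.index.max(), freq="D")
    return {
        "rows": len(df),
        "missing_close": int(df["Close"].isna().sum()),
        "duplicate_dates": int(df.index.duplicated().sum()),
        "sorted_by_date": bool(df.index.is_monotonic_increasing),
        "missing_days": len(expected_days.difference(df.index)),
    }


# Made-up values for illustration only; 2021-01-03 is deliberately absent.
sample = pd.DataFrame(
    {"Close": [1.0, 2.0, 4.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04"]),
)
sample.index.name = "Date"
summary = summarize_frame(sample)
print(summary)
```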


In [8]:
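# Cell 3: Save the preprocessed data
cleaned_dir = Path("data/cleaned_data")
cleaned_dir.mkdir(parents=True, exist_ok=True)  # Create the output directory if it is missing
for crypto, df in crypto_data.items():
    cleaned_data_path = cleaned_dir / f"{crypto.lower()}_cleaned.csv"
    df.to_csv(cleaned_data_path)
    print(f"Preprocessed data for {crypto} saved to {cleaned_data_path}")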


Preprocessed data for Bitcoin saved to data/cleaned_data/bitcoin_cleaned.csv
Preprocessed data for Ethereum saved to data/cleaned_data/ethereum_cleaned.csv
Preprocessed data for Solana saved to data/cleaned_data/solana_cleaned.csv
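
To confirm that a saved file can be read back in the same shape, a round-trip check along these lines may help. It is a sketch under assumptions: it writes a made-up frame to a temporary directory rather than touching `data/cleaned_data`, and the file name `example_cleaned.csv` is only an example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up values for illustration only.
original = pd.DataFrame(
    {"Close": [1.5, 2.5]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]),
)
original.index.name = "Date"

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example_cleaned.csv"
    original.to_csv(csv_path)
    # Read back with the same options used when loading the raw data.
    reloaded = pd.read_csv(csv_path, index_col="Date", parse_dates=True)

# Compare dates and prices element by element.
round_trip_ok = (
    list(reloaded.index) == list(original.index)
    and reloaded["Close"].tolist() == original["Close"].tolist()
)
print("Round trip preserved the data:", round_trip_ok)
```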
