# data_prep-checkpoint.ipynb

## Notebook Purpose
This notebook loads, cleans, and preprocesses historical cryptocurrency price data. It handles missing values, converts data types where needed, indexes the data by date, and saves the result for further analysis and model training.

## Instructions
1. **Import Necessary Libraries**:
   - Import `pandas` for data manipulation.
   - Import `numpy` for numerical operations.
   - Import `pathlib` for handling file paths.

2. **Load Data**:
   - Load the CSV file containing historical cryptocurrency data.

3. **Preprocess Data**:
   - Handle missing values.
   - Convert data types as necessary.
   - Ensure the data is properly indexed by date.

4. **Save Preprocessed Data**:
   - Save the cleaned and preprocessed data to a new CSV file for later use.

5. **Review Data**:
   - Display the first few rows of the preprocessed data to ensure it looks correct.
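
The preprocessing in step 3 can be sketched on a small in-memory table. This is a minimal illustration under stated assumptions, not the notebook's actual code: the sample rows and the `preprocess_prices` helper are hypothetical, and the `Date` and `Close` column names are assumed to match the CSV files used in this notebook.

```python
import pandas as pd

# Hypothetical sample rows standing in for a price CSV (values are made up).
raw = pd.DataFrame(
    {
        "Date": ["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-04"],
        "Close": ["42850.5", "44100.0", None, "bad"],
    }
)


def preprocess_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Return a date-indexed frame holding only numeric Close prices."""
    out = df.copy()
    # Convert data types; values that cannot be parsed become NaT/NaN.
    out["Date"] = pd.to_datetime(out["Date"], errors="coerce")
    out["Close"] = pd.to_numeric(out["Close"], errors="coerce")
    # Handle missing values by dropping incomplete rows.
    out = out.dropna(subset=["Date", "Close"])
    # Index by date and put the rows in chronological order.
    out = out.set_index("Date").sort_index()
    return out[["Close"]]


cleaned = preprocess_prices(raw)
print(cleaned)
```

Coercing unparseable values to `NaN` before calling `dropna` means malformed prices are removed together with the missing ones. Whether dropping is preferable to filling gaps depends on the downstream model.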

## Example Code
```python
# Import necessary libraries
import pandas as pd
import numpy as np
from pathlib import Path

# Load data, parsing the Date column and using it as the index
data_path = Path('data/BTC-USD.csv')  # Update this path based on the selected cryptocurrency
data = pd.read_csv(data_path, parse_dates=['Date'], index_col='Date')

# Preprocess data
data = data[['Close']]  # Select necessary columns
data = data.dropna()  # Drop rows with a missing Close value

# Save the preprocessed data, creating the output directory if needed
output_path = Path('data/cleaned_data/BTC_cleaned.csv')
output_path.parent.mkdir(parents=True, exist_ok=True)
data.to_csv(output_path)

# Display the first few rows of the preprocessed data
data.head()
```


In [None]:
# Cell 1: Import required libraries
import pandas as pd
from pathlib import Path

# `crypto` and `prices_df` are shared across cells as ordinary module-level
# variables, so no `global` declarations are needed at the top level.


In [None]:
# Cell 2: Determine which file to load; default to Bitcoin if invalid value passed
crypto = "Bitcoin"  # Example value, update this as necessary

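# Map each supported cryptocurrency to its CSV file
data_files = {
    "Bitcoin": Path("data/BTC-USD.csv"),
    "Ethereum": Path("data/ETH-USD.csv"),
    "Solana": Path("data/SOL-USD.csv"),
}

# Fall back to Bitcoin for unrecognized values so the output filename in
# Cell 4 matches the file that was actually loaded
if crypto not in data_files:
    crypto = "Bitcoin"
pathname = data_files[crypto]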


In [None]:
# Cell 3: Read file
prices_df = pd.read_csv(
    pathname,
    index_col="Date",
    parse_dates=True
)

# Keep only the Close column (Date remains as the index)
prices_df = prices_df[["Close"]]

# Drop rows with a missing Close value
prices_df = prices_df.dropna()

# Display sample data
display(prices_df.head())


In [None]:
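# Cell 4: Save the preprocessed data
cleaned_data_path = Path(f"data/cleaned_data/{crypto.lower()}_cleaned.csv")

# Create the output directory if it does not already exist
cleaned_data_path.parent.mkdir(parents=True, exist_ok=True)
prices_df.to_csv(cleaned_data_path)
print(f"Preprocessed data saved to {cleaned_data_path}")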
