# Data Wrangling Project: Cryptocurrency Analysis (BTC & ETH)

## Research Question
**How have the prices of Bitcoin (BTC) and Ethereum (ETH) correlated over the last 5 years, and what are their relative trading volumes?**

This project analyzes daily price and volume data for the two largest cryptocurrencies by market capitalization to understand how their markets move relative to each other.

## 1. Gather data

In this section, we will gather data from two sources:
1.  **Dataset 1**: Bitcoin (BTC) historical data manually downloaded from Yahoo Finance (CSV).
2.  **Dataset 2**: Ethereum (ETH) historical data downloaded programmatically using the CoinGecko API.

### Dataset 1: Bitcoin (BTC)

**Source**: Yahoo Finance
**Method**: Manual Download

**Instructions:**
1.  Go to the Yahoo Finance page for Bitcoin: [https://finance.yahoo.com/quote/BTC-USD/history](https://finance.yahoo.com/quote/BTC-USD/history)
2.  Set the **Time Period** to **5 Years** (or the max available if less).
3.  Set the **Frequency** to **Daily**.
4.  Click **Apply**.
5.  Click **Download** to save the CSV file.
6.  Rename the downloaded file to `btc_usd.csv`.
7.  Create a folder named `Dataset` in your project directory (if it doesn't exist) and move the file there: `Dataset/btc_usd.csv`.

In [None]:
import pandas as pd
import os

# Ensure the Dataset directory exists (no error if it is already there)
os.makedirs('Dataset', exist_ok=True)

# Load Dataset 1 (Bitcoin CSV)
btc_path = 'Dataset/btc_usd.csv'

if os.path.exists(btc_path):
    df_btc = pd.read_csv(btc_path)
    print(f"Dataset 1 loaded successfully. Shape: {df_btc.shape}")
    display(df_btc.head())
else:
    print(f"File not found: {btc_path}. Please download it manually as per instructions above.")

### Dataset 2: Ethereum (ETH)

**Source**: CoinGecko API
**Method**: Programmatic Download (API)

We will fetch the last 5 years (approx. 1825 days) of daily market data for Ethereum.

In [None]:
import requests
import json

# URL for CoinGecko API (Ethereum Market Chart)
url = "https://api.coingecko.com/api/v3/coins/ethereum/market_chart?vs_currency=usd&days=1825&interval=daily"

print("Attempting to download Ethereum data from CoinGecko API...")

try:
    # Add User-Agent to avoid 403 Forbidden errors
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers, timeout=15)

    if response.status_code == 200:
        data_eth = response.json()
        print("Successfully downloaded data from API.")

        # Save raw JSON for reference
        with open('Dataset/eth_data_raw.json', 'w') as f:
            json.dump(data_eth, f)

        # Process JSON into DataFrame
        # CoinGecko returns lists of [timestamp, value]
        prices = data_eth.get('prices', [])
        volumes = data_eth.get('total_volumes', [])

        # Create temporary dataframes
        df_prices = pd.DataFrame(prices, columns=['timestamp', 'price'])
        df_volumes = pd.DataFrame(volumes, columns=['timestamp', 'volume'])

        # Merge on timestamp
        df_eth = pd.merge(df_prices, df_volumes, on='timestamp')

        # Convert Unix timestamp (milliseconds since epoch) to datetime
        df_eth['Date'] = pd.to_datetime(df_eth['timestamp'], unit='ms')

        # Save to CSV
        csv_path_eth = 'Dataset/eth_usd.csv'
        df_eth.to_csv(csv_path_eth, index=False)
        print(f"Dataset 2 gathered and saved to {csv_path_eth}. Shape: {df_eth.shape}")
        display(df_eth.head())

    else:
        print(f"API request failed with status code {response.status_code}")
        print(response.text)

except Exception as e:
    print(f"API request failed or encountered an error: {e}")

## 2. Assess data

Assess the data according to data quality and tidiness metrics.

List **two** data quality issues and **two** tidiness issues. Assess each data issue visually **and** programmatically, then briefly describe the issue you find. **Make sure you include justifications for the methods you use for the assessment.**
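
As a starting point for the programmatic checks, here is a minimal sketch of a helper that reports each column's dtype and missing-value count, plus the number of duplicated rows. The `assess` function and the toy `sample` frame are hypothetical and only illustrate the idea; the column names are assumptions, not the actual contents of `df_btc` or `df_eth`.

```python
import pandas as pd

# Hypothetical toy frame standing in for a loaded dataset; values are made up.
sample = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "Close": [42000.0, None, 43500.0],
    "Volume": [1.0e10, 1.2e10, 1.1e10],
})

def assess(df):
    """Return (per-column dtype and missing count, number of duplicated rows)."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })
    return summary, int(df.duplicated().sum())

summary, n_duplicates = assess(sample)
print(summary)
print("duplicate rows:", n_duplicates)
```

A summary like this complements the visual check: `head()` or `sample()` shows what the values look like, while the counts confirm whether an issue spotted by eye is isolated or widespread.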

### Quality Issue 1: Missing Values or Inconsistencies

In [None]:
#FILL IN - Inspecting the dataframe visually

In [None]:
#FILL IN - Inspecting the dataframe programmatically

Issue and justification: *FILL IN*

### Quality Issue 2: Data Types (e.g., Date parsing)

In [None]:
#FILL IN - Inspecting the dataframe visually

In [None]:
#FILL IN - Inspecting the dataframe programmatically

Issue and justification: *FILL IN*

### Tidiness Issue 1: Redundant Columns

In [None]:
#FILL IN - Inspecting the dataframe visually

In [None]:
#FILL IN - Inspecting the dataframe programmatically

Issue and justification: *FILL IN*

### Tidiness Issue 2: Data Structure (Merge requirements)

In [None]:
#FILL IN - Inspecting the dataframe visually

In [None]:
#FILL IN - Inspecting the dataframe programmatically

Issue and justification: *FILL IN*

## 3. Clean data

Clean the data to solve the 4 issues corresponding to data quality and tidiness found in the assessing step. **Make sure you include justifications for your cleaning decisions.**

After cleaning each issue, use **either** the visual or the programmatic method to validate that the cleaning was successful.

At this stage, you are also expected to remove variables that are unnecessary for your analysis and combine your datasets. Depending on your datasets, you may choose to perform variable combination and elimination before or after the cleaning stage. Your dataset must have **at least** 4 variables after combining the data.
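
One possible way to combine the two datasets is sketched below. It assumes the Bitcoin CSV has `Date`, `Close`, and `Volume` columns (an assumption about the Yahoo Finance export) and that the Ethereum frame has the `timestamp`, `price`, and `volume` columns built in the gathering step. The toy values and the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy inputs; real data would come from df_btc and df_eth.
btc = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "Close": [42000.0, 43000.0, 43500.0],
    "Volume": [1.0e10, 1.2e10, 1.1e10],
})
eth = pd.DataFrame({
    "timestamp": [1704067200000, 1704153600000],  # 2024-01-01 and 2024-01-02 UTC, in ms
    "price": [2300.0, 2350.0],
    "volume": [5.0e9, 5.5e9],
})

# Give both frames a calendar-day key of the same dtype.
btc_dates = pd.to_datetime(btc["Date"]).dt.normalize().astype("datetime64[ns]")
eth_dates = pd.to_datetime(eth["timestamp"], unit="ms").dt.normalize().astype("datetime64[ns]")

btc_clean = pd.DataFrame({
    "date": btc_dates,
    "btc_close": btc["Close"],
    "btc_volume": btc["Volume"],
})
eth_clean = pd.DataFrame({
    "date": eth_dates,
    "eth_price": eth["price"],
    "eth_volume": eth["volume"],
})

# Inner join keeps only days present in both sources; validate guards against duplicate dates.
combined = btc_clean.merge(eth_clean, on="date", how="inner", validate="one_to_one")
print(combined)
```

The inner join drops days that appear in only one source, which keeps later comparisons aligned; an outer join would instead preserve those days and surface the gaps as missing values.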

In [None]:
# FILL IN - Clean Issue 1

In [None]:
# FILL IN - Clean Issue 2

In [None]:
# FILL IN - Remove unnecessary variables and combine datasets

## 4. Update your data store

Update your local database/data store with the cleaned data, following best practices for storing your cleaned data:

1.  Maintain separate instances of the data (raw and cleaned).
2.  Use informative file names (e.g. `combined_dataset.csv`).
3.  Compare the quantity of raw and cleaned data.
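
The three practices above can be sketched as follows. This is a minimal illustration on made-up data: the temporary directory stands in for the project's `Dataset` folder, and the file name `btc_usd_raw.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

# Hypothetical raw data containing one duplicated row.
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-02"],
    "btc_close": [42000.0, 43000.0, 43000.0],
})
cleaned = raw.drop_duplicates().reset_index(drop=True)

# A temporary directory stands in for the project's Dataset folder.
store_dir = tempfile.mkdtemp()
raw_path = os.path.join(store_dir, "btc_usd_raw.csv")
cleaned_path = os.path.join(store_dir, "combined_dataset.csv")

# Keep raw and cleaned data as separate files so the raw copy is never overwritten.
raw.to_csv(raw_path, index=False)
cleaned.to_csv(cleaned_path, index=False)

# Compare row counts by reading the files back from disk.
raw_rows = len(pd.read_csv(raw_path))
cleaned_rows = len(pd.read_csv(cleaned_path))
print(f"raw rows: {raw_rows}, cleaned rows: {cleaned_rows}, removed: {raw_rows - cleaned_rows}")
```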

In [None]:
# FILL IN - Save the final cleaned dataframe

## 5. Answer the research question

**5.1:** Define and answer the research question.
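
One way to approach the question is sketched below, under the assumption that the combined dataset has columns named `btc_close`, `eth_price`, `btc_volume`, and `eth_volume` (hypothetical names) and using made-up values. It correlates daily percentage returns rather than raw price levels, since two series that both trend upward can show a high level correlation even when their day-to-day moves are unrelated. Correlating the price columns directly is the more literal reading of the question and may be worth reporting alongside.

```python
import pandas as pd

# Hypothetical combined dataset; values are made up for illustration.
combined = pd.DataFrame({
    "btc_close": [100.0, 110.0, 99.0, 108.9],
    "eth_price": [10.0, 11.0, 9.9, 10.89],
    "btc_volume": [200.0, 220.0, 210.0, 230.0],
    "eth_volume": [100.0, 110.0, 105.0, 115.0],
})

# Daily percentage returns; the first row has no previous day and is dropped.
returns = combined[["btc_close", "eth_price"]].pct_change().dropna()

# Pearson correlation of daily returns.
price_corr = returns["btc_close"].corr(returns["eth_price"])

# Average ETH volume expressed as a fraction of BTC volume.
volume_ratio = (combined["eth_volume"] / combined["btc_volume"]).mean()

print(f"return correlation: {price_corr:.3f}")
print(f"mean ETH/BTC volume ratio: {volume_ratio:.3f}")
```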

In [None]:
# FILL IN - Analysis and Visualization

**5.2:** Reflection
If I had more time to complete the project, I would...