## Phase I Project Proposal
### Cryptocurrency Market Analysis

#### Name: Siddharth Patel, DS 3000



### Introduction

The cryptocurrency market has expanded rapidly over the past decade, a period marked by extreme price volatility and fluctuating investor sentiment. I aim to analyze which factors most strongly influence crypto price movements and to explore whether volatility can be predicted from fundamental metrics such as market capitalization, trading volume, and supply.

By studying data from a wide range of coins, I want to test whether larger, established coins (like Bitcoin and Ethereum) are more stable than smaller or emerging coins. The long-term goal is to identify statistical patterns that explain short-term volatility and to provide data-driven insights into how liquidity and market size relate to risk.

Citations:

https://coinmarketcap.com/

https://onlinelibrary.wiley.com/doi/10.1002/ijfe.2778


### Data Collection

I plan to use the CoinMarketCap API, a widely used data source for cryptocurrency research. It provides up-to-date and historical information on thousands of digital assets, including current price, market cap, volume, circulating supply, and percent change over different time frames.

The CoinMarketCap API requires a free personal API key. The code below is written so that you can insert your own key before running. It will pull data for the top 50 cryptocurrencies by market capitalization and save it locally as a CSV file.

```python
import requests
import pandas as pd

def collect_crypto_data(api_key):
    """
    Fetches data on the top 50 cryptocurrencies by market capitalization
    from the CoinMarketCap API and saves it locally as a CSV file.

    Args:
        api_key (str): Personal CoinMarketCap API key for authentication.

    Returns:
        pd.DataFrame: A DataFrame containing cryptocurrency information with the following columns:
            - name (str): Name of the cryptocurrency.
            - symbol (str): Ticker symbol.
            - market_cap (float): Market capitalization in USD.
            - price (float): Current price in USD.
            - volume_24h (float): 24-hour trading volume in USD.
            - percent_change_24h (float): 24-hour percent price change.
            - category (str): 'Large Cap' for top 10 coins by market cap, 'Altcoin' otherwise.
    """

    url = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest"
    headers = {"X-CMC_PRO_API_KEY": api_key}
    params = {
        "start": 1,
        "limit": 50,
        "convert": "USD"
    }

    # Time out instead of hanging, and fail early on HTTP errors
    # (for example an invalid key or an exceeded rate limit).
    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()["data"]

    # Flatten each coin's nested USD quote into one row.
    records = []
    for coin in data:
        quote = coin["quote"]["USD"]
        records.append({
            "name": coin["name"],
            "symbol": coin["symbol"],
            "market_cap": quote["market_cap"],
            "price": quote["price"],
            "volume_24h": quote["volume_24h"],
            "percent_change_24h": quote["percent_change_24h"],
            "category": "Large Cap" if coin["cmc_rank"] <= 10 else "Altcoin"
        })

    df = pd.DataFrame(records)
    df.to_csv("coinmarketcap_phase1.csv", index=False)
    return df

# Insert your own CoinMarketCap API key here before running.
crypto_df = collect_crypto_data("YOUR_API_KEY")
crypto_df.head()
```


| | name | symbol | market_cap | price | volume_24h | percent_change_24h | category |
|---|---|---|---|---|---|---|---|
| 0 | Bitcoin | BTC | 2464707000000.0 | 123679.326246 | 40537390000.0 | 1.416342 | Large Cap |
| 1 | Ethereum | ETH | 549111800000.0 | 4549.270906 | 26235430000.0 | 1.603889 | Large Cap |
| 2 | XRP | XRP | 180111600000.0 | 3.008293 | 4160625000.0 | -0.17659 | Large Cap |
| 3 | Tether USDt | USDT | 177078500000.0 | 1.00027 | 105109500000.0 | -0.016217 | Large Cap |
| 4 | BNB | BNB | 162053300000.0 | 1164.302785 | 3620805000.0 | -0.671598 | Large Cap |


### Data Usage and Remaining Issues

The dataset provides a balance of numeric and categorical information that is well suited to exploratory and predictive analysis. In future phases, I will create scatterplots of market cap vs. volatility and volume vs. price change, compare stability between the Large Cap and Altcoin categories, investigate correlations among volume, market cap, and 24-hour returns, and develop regression or clustering models to predict volatility.
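A minimal sketch of how this exploration could start, using only the five rows shown in the sample output (the full analysis would load `coinmarketcap_phase1.csv` instead). It assumes the absolute 24-hour percent change is an acceptable stand-in for volatility, which is a simplification of my own, and the helper name `add_volatility_features` and the log-scaled columns are illustrative choices rather than part of the collected data.

```python
import numpy as np
import pandas as pd

# The five rows from the sample output; in practice, replace this with
# pd.read_csv("coinmarketcap_phase1.csv") to use all 50 coins.
sample = pd.DataFrame({
    "name": ["Bitcoin", "Ethereum", "XRP", "Tether USDt", "BNB"],
    "market_cap": [2464707000000.0, 549111800000.0, 180111600000.0,
                   177078500000.0, 162053300000.0],
    "volume_24h": [40537390000.0, 26235430000.0, 4160625000.0,
                   105109500000.0, 3620805000.0],
    "percent_change_24h": [1.416342, 1.603889, -0.17659, -0.016217, -0.671598],
    "category": ["Large Cap"] * 5,
})

def add_volatility_features(df):
    """Add a volatility proxy and log-scaled size/liquidity columns (hypothetical helper)."""
    out = df.copy()
    # Assumption: |24h percent change| serves as a rough volatility proxy.
    out["abs_change_24h"] = out["percent_change_24h"].abs()
    # Market cap and volume span orders of magnitude, so compare them on a log scale.
    out["log_market_cap"] = np.log10(out["market_cap"])
    out["log_volume_24h"] = np.log10(out["volume_24h"])
    return out

features = add_volatility_features(sample)

# Pairwise Pearson correlations among size, liquidity, and the volatility proxy.
corr = features[["log_market_cap", "log_volume_24h", "abs_change_24h"]].corr()

# Summary of the volatility proxy per category (only Large Cap exists in this sample).
by_category = features.groupby("category")["abs_change_24h"].agg(["mean", "median", "count"])

print(corr.round(2))
print(by_category)
```

With five coins the correlations carry no statistical weight; the point is only the shape of the workflow. The same `features` frame could feed the planned scatterplots, for example through `features.plot.scatter(x="log_market_cap", y="abs_change_24h")` once matplotlib is installed.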

I will use machine learning approaches to characterize relationships between features and to predict numerical values. Regression techniques can quantify how market cap, volume, and category relate to price volatility. I can also develop models that predict the percent_change_24h value from the other features, or classify coins into high-volatility versus low-volatility categories. I plan to use scikit-learn for these models.
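As a rough illustration of the modeling step, the sketch below fits a scikit-learn `LinearRegression` to the five sample rows and derives a high/low volatility label. The log-scaled features, the median split, and the use of the absolute 24-hour change as the volatility measure are assumptions made for this example, not settled design decisions. The category column is left out because every coin in the sample is Large Cap; with the full dataset it could be one-hot encoded with `pd.get_dummies`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# The five rows from the sample output; the real model would use the full CSV.
sample = pd.DataFrame({
    "symbol": ["BTC", "ETH", "XRP", "USDT", "BNB"],
    "market_cap": [2464707000000.0, 549111800000.0, 180111600000.0,
                   177078500000.0, 162053300000.0],
    "volume_24h": [40537390000.0, 26235430000.0, 4160625000.0,
                   105109500000.0, 3620805000.0],
    "percent_change_24h": [1.416342, 1.603889, -0.17659, -0.016217, -0.671598],
})

# Features: log-scaled market cap and volume (an illustrative choice).
X = pd.DataFrame({
    "log_market_cap": np.log10(sample["market_cap"]),
    "log_volume_24h": np.log10(sample["volume_24h"]),
})
# Regression target: the 24-hour percent change itself.
y = sample["percent_change_24h"]

model = LinearRegression().fit(X, y)
predicted = model.predict(X)
print(dict(zip(X.columns, model.coef_.round(3))))

# Classification labels: coins whose absolute move exceeds the sample median
# are tagged high volatility (assumed threshold, for illustration only).
abs_change = y.abs()
high_volatility = abs_change > abs_change.median()
print(sample.loc[high_volatility, "symbol"].tolist())
```

A fit on five points will overfit and says nothing about real predictive power. With 50 coins, a held-out split via `sklearn.model_selection.train_test_split` would be the natural next step before reading anything into the coefficients.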

Potential issues include handling API rate limits and ensuring data consistency over time if multiple collection sessions are needed. However, these can be managed with short request delays or local caching.
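The delay-plus-cache idea could look roughly like the sketch below. The function name `cached_fetch`, the cache file name, and the default delay and maximum age are hypothetical values chosen for illustration; the actual limits depend on the API plan.

```python
import json
import time
from pathlib import Path

def cached_fetch(cache_path, fetch, max_age_seconds=3600, delay_seconds=1.0):
    """Return cached JSON if it is fresh; otherwise pause, call fetch(), and cache the result.

    cache_path: where the JSON snapshot is stored on disk.
    fetch: zero-argument callable that performs the real API request
           and returns JSON-serializable data.
    """
    path = Path(cache_path)
    # Reuse the local copy when it is younger than max_age_seconds.
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return json.loads(path.read_text())
    # Short pause before hitting the API to stay under the rate limit.
    time.sleep(delay_seconds)
    data = fetch()
    path.write_text(json.dumps(data))
    return data

# Example usage (hypothetical file name; fetch_listings would wrap the API call):
# listings = cached_fetch("listings_cache.json", fetch_listings)
```

Because each snapshot is stored with its own modification time, repeated runs within the same hour reuse identical data, which also helps keep results consistent across collection sessions.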