# DATA COMPREHENSION

## DATA MINING

The goal is to collect all the data related to each token, where every token is labeled with a
class attribute: Class (0: AI, 1: Gaming, 2: RWA, 3: Meme).
Data must be captured for the following periods (the 250 days following each Halving date):
<br/>
- Halving #1: 2/12/2012 + 250 days
- Halving #2: 2/07/2016 + 250 days
- Halving #3: 3/05/2020 + 250 days
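
The capture periods above can be sketched as date ranges. This is a minimal illustration that uses the halving dates exactly as listed in this notebook; the names `HALVINGS` and `capture_window` are hypothetical, and whether the final day is counted inclusively is an assumption.

```python
from datetime import date, timedelta

# Halving dates as listed in this notebook (dd/mm/yyyy in the text above).
HALVINGS = {
    1: date(2012, 12, 2),
    2: date(2016, 7, 2),
    3: date(2020, 5, 3),
}
WINDOW_DAYS = 250


def capture_window(halving_date, days=WINDOW_DAYS):
    """Return (start, end), where end falls `days` days after the halving date."""
    return halving_date, halving_date + timedelta(days=days)


windows = {n: capture_window(d) for n, d in HALVINGS.items()}
for n, (start, end) in windows.items():
    print(f"Halving #{n}: {start:%d/%m/%Y} -> {end:%d/%m/%Y}")
```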

Some of the important features to consider are the following:
- Capture date (in dd/mm/yyyy format)
- Token: short name of the token
- Name: name of the token/project
- Asset value (in US$)
- Total market capitalization (market cap)
- Position in the cryptocurrency ranking
- Volume (24h volume)
- Daily volume percentage (24h volume / market cap)
- Circulating supply
- Total supply
- Max supply
- X (achievable) Community (website, social networks, etc.)
- X (achievable) Percentage/factor of spread on social networks
- Rating
- X Indicator (whether or not it previously lived through a halving)
- Multichain indicator
- Indicator of whether or not it is listed on Centralized Exchanges (CEX)
- X Project contract
- Class: (0: AI, 1: Gaming, 2: RWA, 3: Meme)
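
Two of the features above are derived rather than fetched directly: the daily volume percentage and the formatted capture date. The sketch below shows one way to compute them; the helper names are hypothetical, and returning `None` for a missing or zero market cap is an assumed convention.

```python
from datetime import date

# Class codes as defined in the feature list.
CLASS_LABELS = {0: "AI", 1: "Gaming", 2: "RWA", 3: "Meme"}


def volume_to_market_cap(volume_24h, market_cap):
    """Return 24h volume / market cap, or None when the ratio is undefined."""
    if volume_24h is None or not market_cap:
        return None
    return volume_24h / market_cap


def format_capture_date(d):
    """Format a date as dd/mm/yyyy, the capture-date format requested above."""
    return d.strftime("%d/%m/%Y")


print(volume_to_market_cap(50.0, 200.0))
print(format_capture_date(date(2024, 4, 20)))
```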

**IMPORTANT: some tokens are new and therefore have not yet lived through a Halving. For these
cases, plan to collect the data for those tokens from their creation date up to the date
of Halving #4: 20/04/2024.**
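
The rule for new tokens can be sketched as follows. This assumes that "lived through a halving" means the token was created on or before one of the three past halving dates listed in this notebook; that reading, and the function names, are assumptions rather than part of the specification.

```python
from datetime import date

HALVING_4 = date(2024, 4, 20)
PAST_HALVINGS = [date(2012, 12, 2), date(2016, 7, 2), date(2020, 5, 3)]


def lived_through_halving(created_on, halvings=PAST_HALVINGS):
    """True if the token already existed on at least one past halving date."""
    return any(created_on <= h for h in halvings)


def new_token_window(created_on):
    """Capture window for a token with no past halving; None for older tokens."""
    if lived_through_halving(created_on):
        return None
    return created_on, HALVING_4


print(new_token_window(date(2021, 6, 1)))
print(new_token_window(date(2019, 1, 1)))
```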


In [1]:
import os
from dotenv import load_dotenv
import time
load_dotenv()
api_key = os.getenv("API_KEY")

In [2]:
from requests import Session
from requests.exceptions import ConnectionError, Timeout, TooManyRedirects
import json
import pandas as pd

In [3]:
# Load a JSON file from disk (filepath must end with a path separator)
def load_json(filepath,filename):
    with open(filepath+filename, 'r') as f:
        data = json.load(f)
    return data

In [75]:
class CoinMarketCapAPI:
    def __init__(self, api_key):
        self.api_key = api_key
        self.headers = {
            'Accepts': 'application/json',
            'X-CMC_PRO_API_KEY': api_key,
        }
        self.session = Session()
        self.session.headers.update(self.headers)
        self.url_base = 'https://pro-api.coinmarketcap.com/v1/'
    
    def get_categories(self):
        url = 'cryptocurrency/categories'
        return self.__catch_error(self.url_base+url)
    
    def save_to_csv(self, data, filename):
        df = pd.DataFrame(data)
        df.to_csv(filename, index=False)
    
    def load_csv(self, filename):
        return pd.read_csv(filename, index_col=False)

    def __catch_error(self, url):
        try:
            response = self.session.get(url)
            data = json.loads(response.text)
            return data["data"]
        except (ConnectionError, Timeout, TooManyRedirects) as e:
            print(e)
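
The `__catch_error` method above indexes `data["data"]` directly, which raises a `KeyError` when the API answers with an error body. A minimal sketch of a stricter unwrapping step is shown below; it assumes the response envelope carries a `status` object with `error_code` and `error_message` next to `data`, and the name `unwrap_cmc` is hypothetical.

```python
def unwrap_cmc(payload):
    """Return payload["data"], raising if the status block reports an error."""
    status = payload.get("status") or {}
    if status.get("error_code"):
        raise RuntimeError(
            f"CoinMarketCap error {status['error_code']}: {status.get('error_message')}"
        )
    if "data" not in payload:
        raise RuntimeError("response has no 'data' field")
    return payload["data"]
```

Inside the class, `return data["data"]` could be swapped for `return unwrap_cmc(data)` so that failures surface with the API's own message.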
        

In [12]:
class CoinGeckoClass:
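    def __init__(self, api_key=None):
        # Read the key from the environment rather than hardcoding it in the notebook.
        # "COINGECKO_API_KEY" is an assumed variable name for the .env file loaded above.
        self.api_key = api_key or os.getenv("COINGECKO_API_KEY")
        self.url_base="https://api.coingecko.com/api/v3/"
        self.headers={"accept": "application/json"}
        if self.api_key:
            # Header name for CoinGecko Demo keys on api.coingecko.com
            self.headers["x-cg-demo-api-key"] = self.api_key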
        self.session = Session()
        self.session.headers.update(self.headers)

        
    def __catch_error(self, url):
        try:
            response = self.session.get(url)
            data = json.loads(response.text)
            return data
        except (ConnectionError, Timeout, TooManyRedirects) as e:
            print(e)
    
    def get_categories(self):
        url = 'coins/categories'
        return self.__catch_error(self.url_base+url)
    
    def get_asset_platforms(self):
        url = 'asset_platforms'
        return self.__catch_error(self.url_base+url)
    
    def get_coin_list(self):
        url = 'coins/list?include_platform=true'
        return self.__catch_error(self.url_base+url)
    
    def get_coin_list_with_market_data(self,category):
        url = f"coins/markets?vs_currency=usd&category={category}&per_page=250&sparkline=true&price_change_percentage=1h%2C24h%2C7d&precision=full"
        return self.__catch_error(self.url_base+url)
    
    def get_exchange_list(self):
        url = 'exchanges/list'
        return self.__catch_error(self.url_base+url)

    def save_json(self,data,filepath,filename):
        with open(filepath+filename, 'w') as f:
            json.dump(data, f)

     

    def ping(self):
        url=self.url_base+"ping"
        response = self.session.get(url)
        data = json.loads(response.text)
        print(data)
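
`get_coin_list_with_market_data` requests a single page of up to 250 rows, so a category with more tokens than that would be truncated. The sketch below shows one way to walk the pages using the `page` query parameter of `coins/markets`. `fetch` is a hypothetical callable that takes a URL and returns the decoded JSON, and the `max_pages` cap is an arbitrary safety limit.

```python
from urllib.parse import urlencode

URL_BASE = "https://api.coingecko.com/api/v3/"


def markets_url(category, page=1, per_page=250):
    """Build the coins/markets URL for one page of a category."""
    params = {
        "vs_currency": "usd",
        "category": category,
        "per_page": per_page,
        "page": page,
    }
    return f"{URL_BASE}coins/markets?{urlencode(params)}"


def fetch_all_pages(fetch, category, per_page=250, max_pages=20):
    """Collect rows page by page until a short, empty, or non-list page appears."""
    rows = []
    for page in range(1, max_pages + 1):
        batch = fetch(markets_url(category, page, per_page))
        if not isinstance(batch, list) or not batch:
            break  # error payload or no more data
        rows.extend(batch)
        if len(batch) < per_page:
            break  # last page reached
    return rows
```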

In [13]:
api = CoinGeckoClass()
api.ping()

{'gecko_says': '(V3) To the Moon!'}


In [18]:
categories=api.get_categories()
api.save_json(categories,"data/raw/","categories.json")

In [22]:
asset_platforms=api.get_asset_platforms()
api.save_json(asset_platforms,"data/raw/","asset_platforms.json")

In [25]:
coin_list=api.get_coin_list()
api.save_json(coin_list,"data/raw/","coin_list.json")

### rwa

In [28]:
coin_list_with_market_data=api.get_coin_list_with_market_data("real-world-assets-rwa")
api.save_json(coin_list_with_market_data,"data/raw/","coin_list_with_market_data_real_world_assets_rwa.json")

### gaming

In [30]:
coin_list_with_market_data=api.get_coin_list_with_market_data("gaming")
api.save_json(coin_list_with_market_data,"data/raw/","coin_list_with_market_data_gaming.json")

In [31]:
coin_list_with_market_data=api.get_coin_list_with_market_data("play-to-earn")
api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/","play_to_earn.json")


In [32]:
coin_list_with_market_data=api.get_coin_list_with_market_data("gaming-blockchains")
api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/","gaming_blockchains.json")

In [33]:
coin_list_with_market_data=api.get_coin_list_with_market_data("gaming-utility-token")
api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/","gaming_utility_token.json")

In [34]:
coin_list_with_market_data=api.get_coin_list_with_market_data("gaming-governance-token")
api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/","gaming_governance_token.json")

In [35]:
coin_list_with_market_data=api.get_coin_list_with_market_data("gaming-platform")
api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/","gaming_platform.json")

In [36]:
coin_list_with_market_data=api.get_coin_list_with_market_data("on-chain-gaming")
api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/","on_chain_gaming.json")
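
The gaming cells above repeat the same two calls with a different category slug each time. As a sketch, the slugs can be listed once and the file name derived from each slug. The helper names are hypothetical, and the target folder is assumed to be the `gaming/` folder that the classification step reads from later in this notebook.

```python
# Category slugs used in the cells above.
GAMING_CATEGORIES = [
    "gaming",
    "play-to-earn",
    "gaming-blockchains",
    "gaming-utility-token",
    "gaming-governance-token",
    "gaming-platform",
    "on-chain-gaming",
]


def output_filename(category):
    """Turn a category slug into a file name, e.g. play-to-earn -> play_to_earn.json."""
    return category.replace("-", "_") + ".json"


def plan_downloads(categories, folder):
    """Return (category, folder, filename) triples, one per download."""
    return [(c, folder, output_filename(c)) for c in categories]


plan = plan_downloads(GAMING_CATEGORIES, "data/raw/coin_list_with_market_data/gaming/")
# With the `api` object from this notebook, each entry would be used as:
#   api.save_json(api.get_coin_list_with_market_data(category), folder, filename)
for category, folder, filename in plan:
    print(category, "->", folder + filename)
```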

### memes

In [4]:
categories = load_json("data/raw/","categories.json")

In [5]:
df_categories=pd.DataFrame(categories)

In [6]:
df_categories_meme = df_categories.loc[df_categories['content'].apply(lambda x : "meme" in str(x).lower())]

In [7]:
df_categories_meme["id"].to_list()

['meme-token',
 'dog-themed-coins',
 'elon-musk-inspired-coins',
 'solana-meme-coins',
 'cat-themed-coins',
 'base-meme-coins',
 'presale-meme-coins',
 'politifi',
 'ai-meme-coins',
 'parody-meme-coins',
 'ton-meme-coins',
 'anime-themed-coins',
 'duck-themed-coins']
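
The filter keys on the free-text `content` field rather than on the category id, which is why ids such as `politifi` appear in the list. A plain-Python equivalent of that rule is sketched below; the sample rows are made up for illustration and only mimic the shape of the entries in `categories.json`.

```python
def mentions_meme(category):
    """True when 'meme' appears in the category's content text (case-insensitive)."""
    return "meme" in str(category.get("content")).lower()


def meme_category_ids(categories):
    """Return the ids of all categories whose description mentions memes."""
    return [c["id"] for c in categories if mentions_meme(c)]


# Made-up sample rows (not real API output).
sample = [
    {"id": "meme-token", "name": "Meme", "content": "Meme coins are community tokens."},
    {"id": "gaming", "name": "Gaming", "content": None},
    {"id": "politifi", "name": "PolitiFi", "content": "Political MEME tokens."},
]
print(meme_category_ids(sample))
```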

In [8]:
df_categories_meme["id"].to_list()[7:]

['politifi',
 'ai-meme-coins',
 'parody-meme-coins',
 'ton-meme-coins',
 'anime-themed-coins',
 'duck-themed-coins']

In [None]:
['politifi',
 'anime-themed-coins',
 'duck-themed-coins']

In [15]:
for category in ['politifi','anime-themed-coins','duck-themed-coins']:
    coin_list_with_market_data=api.get_coin_list_with_market_data(category)
    api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/memes/",f"{category.replace('-','_')}v2.json")

### AI

In [107]:
coin_list_with_market_data=api.get_coin_list_with_market_data("artificial-intelligence")
api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/AI/","artificial_intelligence.json")

### CEX TOKENS

In [88]:
coin_list_with_market_data=api.get_coin_list_with_market_data("centralized-exchange-token-cex")
api.save_json(coin_list_with_market_data,"data/raw/coin_list_with_market_data/","centralized_exchange_token_cex.json")


In [93]:
data_cex=load_json("data/raw/coin_list_with_market_data/","centralized_exchange_token_cex.json")

In [95]:
len(data_cex)

42

### exchanges

In [74]:
exchange_list=api.get_exchange_list()
api.save_json(exchange_list,"data/raw/","exchange_list.json")

### classifying tokens by class

#### gaming

In [None]:
from pathlib import Path
# Collect the gaming JSON files
data_folder = Path("data/raw/coin_list_with_market_data/gaming/")
gaming_files = [file for file in data_folder.iterdir() if file.is_file()]

In [80]:
df_all_tokens_gaming = pd.DataFrame()
for file in gaming_files:
    json_file=load_json("data/raw/coin_list_with_market_data/gaming/",file.name)
    df = pd.DataFrame(json_file)
    df = df[['id','symbol','name']]
    df['class']=1  # 1 = Gaming in the class mapping (0: AI, 1: Gaming, 2: RWA, 3: Meme)
    df_all_tokens_gaming = pd.concat([df_all_tokens_gaming,df])
    

In [83]:
df_all_tokens_gaming.drop_duplicates(subset=['id'],inplace=True)

In [86]:
# number of gaming tokens
len(df_all_tokens_gaming)

329

#### ai

In [109]:
data_folder = Path("data/raw/coin_list_with_market_data/AI/")
files = [file for file in data_folder.iterdir() if file.is_file()]

In [111]:
df_all_tokens_ai = pd.DataFrame()
for file in files:
    json_file=load_json("data/raw/coin_list_with_market_data/AI/",file.name)
    df = pd.DataFrame(json_file)
    df = df[['id','symbol','name']]
    df['class']=0
    df_all_tokens_ai = pd.concat([df_all_tokens_ai,df])

#### memes

In [116]:
data_folder = Path("data/raw/coin_list_with_market_data/memes/")
files = [file for file in data_folder.iterdir() if file.is_file()]

In [118]:
df_all_tokens_memes = pd.DataFrame()
for file in files:
    json_file=load_json("data/raw/coin_list_with_market_data/memes/",file.name)
    df = pd.DataFrame(json_file)
    df = df[['id','symbol','name']]
    df['class']=3
    df_all_tokens_memes = pd.concat([df_all_tokens_memes,df])

KeyError: "None of [Index(['id', 'symbol', 'name'], dtype='object')] are in the [columns]"
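
One plausible cause of this `KeyError`, not confirmed from the saved files, is that at least one JSON file in the `memes/` folder holds a dict (for example an API error body saved by `save_json`) instead of a list of coins, so the resulting DataFrame has none of the expected columns. The sketch below skips such files and reports them; the function names are hypothetical and it uses plain Python rather than pandas.

```python
import json
from pathlib import Path

REQUIRED = ("id", "symbol", "name")


def rows_from_payload(payload):
    """Return [{id, symbol, name}, ...], or [] when the payload is not a coin list."""
    if not isinstance(payload, list):
        return []
    return [
        {k: item[k] for k in REQUIRED}
        for item in payload
        if isinstance(item, dict) and all(k in item for k in REQUIRED)
    ]


def load_class_rows(folder, class_id):
    """Load every *.json file in folder; return (rows, names of skipped files)."""
    rows, skipped = [], []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, "r") as f:
            found = rows_from_payload(json.load(f))
        if not found:
            skipped.append(path.name)  # candidate for re-download
        rows.extend({**row, "class": class_id} for row in found)
    return rows, skipped
```

Calling `load_class_rows("data/raw/coin_list_with_market_data/memes/", 3)` would return the usable rows together with the list of files to fetch again.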

In [None]:
df_all_tokens_memes.drop_duplicates(subset=['id'],inplace=True)
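
`drop_duplicates` removes repeats within one class, but a token can still show up in more than one class table (the meme list above includes `ai-meme-coins`, for instance). The following sketch merges per-class row lists and reports such overlaps. It works on lists of dicts with `id` and `class` keys; the function name and the keep-first policy are assumptions.

```python
from collections import defaultdict


def merge_classes(*tables):
    """Merge class tables by id; return (unique rows, ids seen in several classes)."""
    first_seen = {}
    classes_by_id = defaultdict(set)
    for rows in tables:
        for row in rows:
            classes_by_id[row["id"]].add(row["class"])
            first_seen.setdefault(row["id"], row)  # keep the first occurrence
    conflicts = {i: sorted(c) for i, c in classes_by_id.items() if len(c) > 1}
    return list(first_seen.values()), conflicts
```

With the notebook's DataFrames, each table could be passed in as `df.to_dict("records")`.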