# Step 1 / Exploring unstructured data

Collect two types of data through the Binance API.

* This API exposes price information for the various markets (BTC-USDT, BTC-ETH, …).
* The goal is to write a generic data-retrieval function so that the data for any market can be fetched.
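
One possible shape for such a generic function is sketched here. It is not the notebook's implementation: `build_url` and `fetch_market_data` are hypothetical names, and the sketch uses only the standard library rather than `requests`. The idea is that the endpoint and the market symbol are both parameters, so the same function serves any pair.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.binance.com"  # same host as the cells in this notebook


def build_url(endpoint, **params):
    """Build the URL of a public endpoint, dropping parameters set to None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{endpoint}?{query}" if query else f"{BASE_URL}{endpoint}"


def fetch_market_data(endpoint, symbol=None, **params):
    """Hypothetical generic fetcher: one function for any endpoint and market."""
    url = build_url(endpoint, symbol=symbol, **params)
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `fetch_market_data("/api/v3/ticker/price", symbol="BTCUSDT")` would then return the parsed JSON for that market, and changing the symbol is the only thing needed to target another pair.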

## General explanations

>**API Terminology**

These terms will be used throughout the documentation, so it is recommended that you read them to enhance your understanding of the API (especially for new users).
- **Base asset** refers to the asset that is the quantity of a symbol; for the symbol BTCUSDT, BTC would be the base asset.
- **Quote asset** refers to the asset that is the price of a symbol; for the symbol BTCUSDT, USDT would be the quote asset.  
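
To make the base/quote distinction concrete, here is a small worked example. The helper `quote_amount` is a hypothetical name and the numbers are purely illustrative, not market data: quantities are expressed in the base asset, prices in the quote asset, so their product is an amount of quote asset.

```python
from decimal import Decimal


def quote_amount(base_quantity, price):
    """Amount of quote asset exchanged for `base_quantity` of base asset."""
    return Decimal(base_quantity) * Decimal(price)


# Illustrative numbers only: 0.5 BTC (base) at 60000 USDT (quote) per BTC.
cost_in_usdt = quote_amount("0.5", "60000")
print(cost_in_usdt)
```

`Decimal` is used instead of `float` so that small prices and quantities are not distorted by binary rounding.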

>**Symbol Status**
- PRE_TRADING
- TRADING
- POST_TRADING
- END_OF_DAY
- HALT
- AUCTION_MATCH
- BREAK  
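
The status is typically used to keep only the pairs that can currently be traded. The sketch below assumes each entry is a dict with `symbol` and `status` keys, like the entries of the `symbols` list used later in this notebook; `tradable_symbols` is a hypothetical helper.

```python
def tradable_symbols(symbols, status="TRADING"):
    """Return the names of the pairs whose `status` field equals `status`."""
    return [s["symbol"] for s in symbols if s.get("status") == status]


# Hand-written sample shaped like entries of the `symbols` list.
sample = [
    {"symbol": "ETHBTC", "status": "TRADING"},
    {"symbol": "EOSETH", "status": "BREAK"},
    {"symbol": "GASBTC", "status": "TRADING"},
]
print(tradable_symbols(sample))
```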

>**Order Status**

| Status | Description |
|---|---|
| `NEW` | The order has been accepted by the engine |
| `PARTIALLY_FILLED` | Part of the order has been filled |
| `FILLED` | The order has been completed |
| `CANCELED` | The order has been canceled by the user |
| `PENDING_CANCEL` | This is currently unused |
| `REJECTED` | The order was not accepted by the engine and not processed |
| `EXPIRED` | The order was canceled according to the order type's rules (e.g., LIMIT FOK orders with no fill, LIMIT IOC, or MARKET orders that partially fill), or by the exchange (e.g., orders canceled during liquidation or orders canceled during maintenance) |
| `EXPIRED_IN_MATCH` | The order was canceled by the exchange due to STP (e.g., an order with EXPIRE_TAKER will match with existing orders on the book with the same account or same tradeGroupId) |
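
When processing orders in code, these statuses can be modeled as an enumeration. The sketch below is one way to do it; the split between "open" and "finished" statuses is an assumption drawn from the descriptions above, not something the API states in these terms, and `OrderStatus` / `is_open` are hypothetical names.

```python
from enum import Enum


class OrderStatus(str, Enum):
    NEW = "NEW"
    PARTIALLY_FILLED = "PARTIALLY_FILLED"
    FILLED = "FILLED"
    CANCELED = "CANCELED"
    PENDING_CANCEL = "PENDING_CANCEL"
    REJECTED = "REJECTED"
    EXPIRED = "EXPIRED"
    EXPIRED_IN_MATCH = "EXPIRED_IN_MATCH"


# Assumption: only these two statuses mean the order can still be filled.
OPEN_STATUSES = {OrderStatus.NEW, OrderStatus.PARTIALLY_FILLED}


def is_open(status):
    """True if the status string denotes an order that may still trade."""
    return OrderStatus(status) in OPEN_STATUSES
```

Converting the raw string with `OrderStatus(status)` raises `ValueError` on an unknown value, which surfaces unexpected statuses early instead of silently treating them as finished.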

## Basic pair data

In [None]:
import requests
import pandas as pd

# More readable display format for floats
#pd.options.display.float_format = '{:.8f}'.format
# Fixed-point with 10 decimals, then strip the trailing zeros and any dangling dot
pd.options.display.float_format = lambda x: f'{x:.10f}'.rstrip('0').rstrip('.')

def get_pairs_list():
    """Return the list of all pairs (the 'symbols' array of exchangeInfo)."""
    url = 'https://api.binance.com/api/v3/exchangeInfo'
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors
    return response.json()['symbols']

def get_pair_info(pair):
    # Turn the list of filters into a dict keyed by filterType for easier access
    filtres = { f['filterType']: f for f in pair['filters'] }

    return {
        'symbol': pair['symbol'],
        'baseAsset': pair['baseAsset'],
        'quoteAsset': pair['quoteAsset'],
        'status': pair['status'],
        'minPrice': filtres.get('PRICE_FILTER', {}).get('minPrice'),
        'maxPrice': filtres.get('PRICE_FILTER', {}).get('maxPrice'),
        'tickSize': filtres.get('PRICE_FILTER', {}).get('tickSize'),
        'minQty': filtres.get('LOT_SIZE', {}).get('minQty'),
        'maxQty': filtres.get('LOT_SIZE', {}).get('maxQty'),
        'stepSize': filtres.get('LOT_SIZE', {}).get('stepSize'),
        'minNotional': filtres.get('NOTIONAL', {}).get('minNotional'),
        'maxNotional': filtres.get('NOTIONAL', {}).get('maxNotional'),
    }

pairs_list = get_pairs_list()
pairs_general_infos = [get_pair_info(pair) for pair in pairs_list]

# Convert to a DataFrame
df_pairs_general_infos = pd.DataFrame(pairs_general_infos)

# Convert the numeric columns to float
numeric_cols = ['minPrice', 'maxPrice', 'tickSize', 'minQty', 'maxQty', 'stepSize', 'minNotional', 'maxNotional']
df_pairs_general_infos[numeric_cols] = df_pairs_general_infos[numeric_cols].astype(float)

# Display
display(df_pairs_general_infos.head())

Unnamed: 0,symbol,baseAsset,quoteAsset,status,minPrice,maxPrice,tickSize,minQty,maxQty,stepSize,minNotional,maxNotional
0,ETHBTC,ETH,BTC,TRADING,1e-05,922327,1e-05,0.0001,100000,0.0001,0.0001,9000000
1,LTCBTC,LTC,BTC,TRADING,1e-06,100000,1e-06,0.001,100000,0.001,0.0001,9000000
2,BNBBTC,BNB,BTC,TRADING,1e-06,100000,1e-06,0.001,100000,0.001,0.0001,9000000
3,NEOBTC,NEO,BTC,TRADING,1e-07,100000,1e-07,0.01,100000,0.01,0.0001,9000000
4,QTUMETH,QTUM,ETH,TRADING,1e-06,1000,1e-06,0.1,90000000,0.1,0.001,9000000
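
The filter columns above can be read as constraints on an order. The sketch below is a simplified interpretation, to be checked against the exchange's filter documentation: the price must lie in `[minPrice, maxPrice]` on a `tickSize` grid, the quantity in `[minQty, maxQty]` on a `stepSize` grid, and price times quantity within `[minNotional, maxNotional]`. `check_order` is a hypothetical helper, and the filter values are those of the ETHBTC row.

```python
from decimal import Decimal


def check_order(price, qty, f):
    """Return the list of rule violations for an order (empty means OK)."""
    price, qty = Decimal(price), Decimal(qty)
    problems = []
    if not Decimal(f["minPrice"]) <= price <= Decimal(f["maxPrice"]):
        problems.append("price out of range")
    if price % Decimal(f["tickSize"]) != 0:
        problems.append("price not a multiple of tickSize")
    if not Decimal(f["minQty"]) <= qty <= Decimal(f["maxQty"]):
        problems.append("quantity out of range")
    if (qty - Decimal(f["minQty"])) % Decimal(f["stepSize"]) != 0:
        problems.append("quantity not aligned on stepSize")
    if not Decimal(f["minNotional"]) <= price * qty <= Decimal(f["maxNotional"]):
        problems.append("notional out of range")
    return problems


# Filter values copied from the ETHBTC row, written as strings for Decimal.
ethbtc = {"minPrice": "0.00001", "maxPrice": "922327", "tickSize": "0.00001",
          "minQty": "0.0001", "maxQty": "100000", "stepSize": "0.0001",
          "minNotional": "0.0001", "maxNotional": "9000000"}

print(check_order("0.03", "0.5", ethbtc))       # candidate order
print(check_order("0.030001", "0.5", ethbtc))   # price off the tick grid
```

The values are kept as strings and converted with `Decimal` because the modulo checks are unreliable on binary floats.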


In [5]:
import requests
import pandas as pd

def get_pairs_list():
    url = 'https://api.binance.com/api/v3/exchangeInfo'
    data = requests.get(url).json()
    # List of all pairs
    pairs_list = data['symbols']
    return pairs_list

pairs_list = get_pairs_list()
pairs_list_df = pd.DataFrame(pairs_list)
print(pairs_list_df.head(10))

    symbol   status baseAsset  baseAssetPrecision quoteAsset  quotePrecision  \
0   ETHBTC  TRADING       ETH                   8        BTC               8   
1   LTCBTC  TRADING       LTC                   8        BTC               8   
2   BNBBTC  TRADING       BNB                   8        BTC               8   
3   NEOBTC  TRADING       NEO                   8        BTC               8   
4  QTUMETH  TRADING      QTUM                   8        ETH               8   
5   EOSETH    BREAK       EOS                   8        ETH               8   
6   SNTETH    BREAK       SNT                   8        ETH               8   
7   BNTETH    BREAK       BNT                   8        ETH               8   
8   BCCBTC    BREAK       BCC                   8        BTC               8   
9   GASBTC  TRADING       GAS                   8        BTC               8   

   quoteAssetPrecision  baseCommissionPrecision  quoteCommissionPrecision  \
0                    8                    

In [47]:
df_pairs_general_infos[['minPrice', 'maxPrice', 'tickSize', 'minQty', 'maxQty', 'stepSize', 'minNotional', 'maxNotional']] = df_pairs_general_infos[['minPrice', 'maxPrice', 'tickSize', 'minQty', 'maxQty', 'stepSize', 'minNotional', 'maxNotional']].astype(float)

In [48]:
df_pairs_general_infos.dtypes

symbol          object
baseAsset       object
quoteAsset      object
status          object
minPrice       float64
maxPrice       float64
tickSize       float64
minQty         float64
maxQty         float64
stepSize       float64
minNotional    float64
maxNotional    float64
dtype: object

In [40]:
display(df_pairs_general_infos.head())

Unnamed: 0,symbol,baseAsset,quoteAsset,status,orderTypes,minPrice,maxPrice,tickSize,minQty,maxQty,stepSize,minNotional,maxNotional
0,ETHBTC,ETH,BTC,TRADING,"[LIMIT, LIMIT_MAKER, MARKET, STOP_LOSS, STOP_L...",1e-05,922327.0,1e-05,0.0001,100000.0,0.0001,0.0001,9000000.0
1,LTCBTC,LTC,BTC,TRADING,"[LIMIT, LIMIT_MAKER, MARKET, STOP_LOSS, STOP_L...",1e-06,100000.0,1e-06,0.001,100000.0,0.001,0.0001,9000000.0
2,BNBBTC,BNB,BTC,TRADING,"[LIMIT, LIMIT_MAKER, MARKET, STOP_LOSS, STOP_L...",1e-06,100000.0,1e-06,0.001,100000.0,0.001,0.0001,9000000.0
3,NEOBTC,NEO,BTC,TRADING,"[LIMIT, LIMIT_MAKER, MARKET, STOP_LOSS, STOP_L...",1e-07,100000.0,1e-07,0.01,100000.0,0.01,0.0001,9000000.0
4,QTUMETH,QTUM,ETH,TRADING,"[LIMIT, LIMIT_MAKER, MARKET, STOP_LOSS, STOP_L...",1e-06,1000.0,1e-06,0.1,90000000.0,0.1,0.001,9000000.0


### Function taking a list of pairs

In [31]:
import requests
import pandas as pd
import json

def get_pair_info(pairs):
    url = 'https://api.binance.com/api/v3/exchangeInfo'
    params = { 'symbols': json.dumps(pairs) } 
    data = requests.get(url, params = params).json()

    if not data.get('symbols'):
        return None  

    pairs_general_info = []

    for symbol_data in data['symbols']:
        filtres = { f['filterType']: f for f in symbol_data['filters'] }

        pairs_general_info.append({
            'symbol': symbol_data['symbol'],
            'baseAsset': symbol_data['baseAsset'],
            'quoteAsset': symbol_data['quoteAsset'],
            'status': symbol_data['status'],
            'orderTypes': symbol_data['orderTypes'],
            'minPrice': filtres.get('PRICE_FILTER', {}).get('minPrice'),
            'maxPrice': filtres.get('PRICE_FILTER', {}).get('maxPrice'),
            'tickSize': filtres.get('PRICE_FILTER', {}).get('tickSize'),
            'minQty': filtres.get('LOT_SIZE', {}).get('minQty'),
            'maxQty': filtres.get('LOT_SIZE', {}).get('maxQty'),
            'stepSize': filtres.get('LOT_SIZE', {}).get('stepSize'),
            'minNotional': filtres.get('NOTIONAL', {}).get('minNotional'),
            'maxNotional': filtres.get('NOTIONAL', {}).get('maxNotional'),
        })

    return pairs_general_info

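lst_pairs = ['BTCUSDT', 'ETHUSDT', 'BNBUSDT']
# NOTE: a bare string is passed here instead of lst_pairs, so json.dumps yields
# '"BTCUSDT"' (a JSON string, not a JSON array). The response then carries no
# 'symbols' entry and the function returns None.
info = get_pair_info('BTCUSDT')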

print(info)


None
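
The `None` above most likely comes from the way the `symbols` parameter is encoded. As far as I know, the endpoint expects a JSON array without spaces, whereas `json.dumps` on a bare string produces a quoted string and, on a list, inserts a space after each comma. This is an assumption to verify against the API reference. A hypothetical helper that accepts either a single symbol or a list and emits a compact array could look like this:

```python
import json


def build_symbols_param(pairs):
    """Encode one symbol or a list of symbols as a compact JSON array string."""
    if isinstance(pairs, str):
        pairs = [pairs]  # accept a single symbol for convenience
    # separators=(",", ":") removes the space json.dumps adds after commas
    return json.dumps(list(pairs), separators=(",", ":"))


print(build_symbols_param("BTCUSDT"))
print(build_symbols_param(["BTCUSDT", "ETHUSDT", "BNBUSDT"]))
```

Inside `get_pair_info`, the request parameters would then become `params = {'symbols': build_symbols_param(pairs)}`.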


In [32]:
def display_pair_info(info):
    if info is None:
        print("Pair not found.")
        return

    print(f"Pair information: {info['symbol']}\n")
    print(f"{'Base asset':15} : {info['baseAsset']}")
    print(f"{'Quote asset':15} : {info['quoteAsset']}")
    print(f"{'Status':15} : {info['status']}")
    print(f"{'Order types':15} : {', '.join(info['orderTypes'])}")

    print("\nPair price information:")
    print(f"{'Min price':15} : {info['minPrice']}")
    print(f"{'Max price':15} : {info['maxPrice']}")
    print(f"{'Tick size':15} : {info['tickSize']}")
    print(f"{'Min notional':15} : {info['minNotional']}")
    print(f"{'Max notional':15} : {info['maxNotional']}")

    print("\nPair quantity information:")
    print(f"{'Min qty':15} : {info['minQty']}")
    print(f"{'Max qty':15} : {info['maxQty']}")
    print(f"{'Step size':15} : {info['stepSize']}")

display_pair_info(info)

Pair not found.


In [2]:
import requests
import pandas as pd
import time
import os

# Intermediate save file
SAVE_PATH = 'top50_tokens_enriched.csv'

def get_top_50_tokens():
    url = 'https://api.coingecko.com/api/v3/coins/markets'
    params = {
        'vs_currency': 'usd',
        'order': 'market_cap_desc',
        'per_page': 50,
        'page': 1,
        'sparkline': False
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

def get_token_metadata_with_retry(coin_id, max_retries=5, delay=2.5):
    url = f'https://api.coingecko.com/api/v3/coins/{coin_id}'
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=30)
            if response.status_code == 429:
                # Rate limited: wait a full minute before the next attempt
                print(f"⚠️ Rate limit reached for {coin_id}, waiting 60s...")
                time.sleep(60)
                continue
            response.raise_for_status()
            return response.json()
        except (requests.RequestException, ValueError) as e:
            print(f"❌ Error on attempt {attempt + 1} for {coin_id}: {e}")
            time.sleep(delay)
    return None

# Load previous progress if the file already exists
if os.path.exists(SAVE_PATH):
    df_existing = pd.read_csv(SAVE_PATH)
    already_done = set(df_existing['coingecko_id'])
    tokens_data = df_existing.to_dict(orient='records')
    print(f"🔄 Resuming: {len(already_done)} tokens already processed.")
else:
    already_done = set()
    tokens_data = []

# Step 1: fetch the top 50
top_50_raw = get_top_50_tokens()

# Step 2: enrich token by token
for token in top_50_raw:
    coin_id = token['id']
    if coin_id in already_done:
        continue

    metadata = get_token_metadata_with_retry(coin_id)
    if metadata is None:
        continue

    # Guard against missing or null fields in the response
    market_data = metadata.get('market_data') or {}
    homepages = (metadata.get('links') or {}).get('homepage') or ['']

    data = {
        'coingecko_id': coin_id,
        'name': metadata.get('name'),
        'symbol': (metadata.get('symbol') or '').upper(),
        'market_cap_rank': metadata.get('market_cap_rank'),
        'asset_platform': metadata.get('asset_platform_id'),
        'token_type': metadata.get('asset_platform_id') or 'native',
        'homepage': homepages[0],
        'categories': ', '.join(c for c in (metadata.get('categories') or []) if c),
        'supply_circulating': market_data.get('circulating_supply'),
        'supply_total': market_data.get('total_supply'),
        'supply_max': market_data.get('max_supply'),
    }

    tokens_data.append(data)
    already_done.add(coin_id)

    # Partial save after each token
    df_temp = pd.DataFrame(tokens_data)
    df_temp.to_csv(SAVE_PATH, index=False)
    print(f"✅ Token processed: {coin_id} (total: {len(tokens_data)})")

    time.sleep(2.5)

# Final display
df_final = pd.DataFrame(tokens_data)
pd.set_option('display.max_columns', None)
display(df_final.head())

✅ Token processed: bitcoin (total: 1)
✅ Token processed: ethereum (total: 2)
✅ Token processed: ripple (total: 3)
✅ Token processed: tether (total: 4)
⚠️ Rate limit reached for binancecoin, waiting 60s...
✅ Token processed: binancecoin (total: 5)
✅ Token processed: solana (total: 6)
✅ Token processed: usd-coin (total: 7)
✅ Token processed: dogecoin (total: 8)
✅ Token processed: staked-ether (total: 9)
✅ Token processed: tron (total: 10)
⚠️ Rate limit reached for cardano, waiting 60s...
✅ Token processed: cardano (total: 11)
✅ Token processed: wrapped-bitcoin (total: 12)
✅ Token processed: hyperliquid (total: 13)
⚠️ Rate limit reached for wrapped-steth, waiting 60s...
✅ Token processed: wrapped-steth (total: 14)
✅ Token processed: stellar (total: 15)
✅ Token processed: sui (total: 16)
✅ Token processed: chainlink (total: 17)
⚠️ Rate limit reached for wrapped-beacon-eth, waiting 60s...
✅ Token processed: wrapped-beacon-eth (total: 18)
✅ Token processed: bitcoin-cash (total: 19)
✅ Token processed: wrapped

Unnamed: 0,coingecko_id,name,symbol,market_cap_rank,asset_platform,token_type,homepage,categories,supply_circulating,supply_total,supply_max
0,bitcoin,Bitcoin,BTC,1,,native,http://www.bitcoin.org,"Smart Contract Platform, Layer 1 (L1), FTX Hol...",19897060.0,19897060.0,21000000.0
1,ethereum,Ethereum,ETH,2,,native,https://www.ethereum.org/,"Smart Contract Platform, Layer 1 (L1), Ethereu...",120711000.0,120711000.0,
2,ripple,XRP,XRP,3,,native,https://ripple.com/currency/,"FTX Holdings, Pantera Capital Portfolio, Andre...",59239650000.0,99985900000.0,100000000000.0
3,tether,Tether,USDT,4,ethereum,ethereum,https://tether.to/,"Stablecoins, USD Stablecoin, Solana Ecosystem,...",162513500000.0,162513500000.0,
4,binancecoin,BNB,BNB,5,,native,https://www.binance.com,"Smart Contract Platform, Exchange-based Tokens...",139288700.0,139288700.0,200000000.0
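
The log above shows the rate limit being hit roughly every four or five tokens, each time costing a fixed 60-second pause. A common alternative is exponential backoff, where the wait starts short and doubles after each failure. The sketch below is a generic illustration of that technique, not the notebook's method; `call_with_backoff` is a hypothetical helper and the delays are arbitrary.

```python
import time


def call_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `func()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

The `sleep` argument exists only so the waiting can be replaced in tests. In the loop above, `func` could be a small function that requests one coin and raises when the response status is 429.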


In [3]:
display(df_final['categories'].value_counts())

categories
Smart Contract Platform, Layer 1 (L1), FTX Holdings, Proof of Work (PoW), Bitcoin Ecosystem, GMCI 30 Index, GMCI Index, Coinbase 50 Index                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      1
Decentralized Finance (DeFi), Yield Farming, BNB Chain Ecosystem, Lending/Borrowing Protocols, Avalanche Ecosystem, Polygon Ecosystem, Near Protocol Ecosystem, Fantom Ecosystem, Harmony Ecosystem, Arbitrum Ecosystem, Ethereum Ecosystem, Optimism Ecosystem, Base Ecosystem, Index Coop Defi Index, Energi Ecosystem, Sora Ecosystem, Hu
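
Because `categories` holds one comma-joined string per token, `value_counts()` counts whole strings, and nearly every combination appears exactly once. To count individual categories, each string can be split first. The sketch below uses the standard library; `count_categories` is a hypothetical helper, the sample strings are shortened illustrations, and the approach assumes that no category name itself contains a comma.

```python
from collections import Counter


def count_categories(category_strings):
    """Count individual categories across comma-joined category strings."""
    counts = Counter()
    for joined in category_strings:
        if not isinstance(joined, str):
            continue  # skip missing values such as NaN
        counts.update(p.strip() for p in joined.split(",") if p.strip())
    return counts


# Shortened, illustrative strings in the same format as the column above.
sample = [
    "Smart Contract Platform, Layer 1 (L1)",
    "Layer 1 (L1), Stablecoins",
    float("nan"),
]
print(count_categories(sample).most_common())
```

With the DataFrame above, the call would be `count_categories(df_final['categories'])`.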

## Other data

## Loop over all pairs to build the basic-info DataFrame

# Keep for step 3

In [34]:
# Fetch all the data
url = 'https://api.binance.com/api/v3/exchangeInfo'
data = requests.get(url).json()
pair_list = data['symbols']

# Loop over each pair by name
# NOTE: this sends one HTTP request per pair, which is very slow over the full
# list; the run recorded here ended in a KeyboardInterrupt
pairs_base_infos = []
for pair in pair_list:
    info = get_pair_info(pair['symbol'])
    if info is not None:
        # get_pair_info returns a list of dicts, so extend rather than append
        pairs_base_infos.extend(info)

# Build the final DataFrame
df_pairs_base_infos = pd.DataFrame(pairs_base_infos)

# Display
print(df_pairs_base_infos.head())

KeyboardInterrupt: 

In [None]:
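# The symbol must be sent as a query parameter (the original URL ended in the literal text '?{BTCUSDT}')
resp = requests.get('https://api.binance.us/api/v3/trades', params={'symbol': 'BTCUSDT'})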

resp.json()
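
The recent-trades response is a list of records whose numeric fields arrive as strings and whose timestamp is in milliseconds. The sketch below converts one record into typed values. The field names (`id`, `price`, `qty`, `time`, `isBuyerMaker`) are given from memory of the public API and should be checked against an actual response; `parse_trade` is a hypothetical helper and the sample record is hand-written, not real market data.

```python
from datetime import datetime, timezone
from decimal import Decimal


def parse_trade(raw):
    """Convert one raw trade record into typed Python values."""
    return {
        "id": raw["id"],
        "price": Decimal(raw["price"]),
        "qty": Decimal(raw["qty"]),
        # The API timestamp is in milliseconds since the Unix epoch
        "time": datetime.fromtimestamp(raw["time"] / 1000, tz=timezone.utc),
        "buyer_is_maker": raw["isBuyerMaker"],
    }


# Hand-written sample record (assumed shape, illustrative values).
sample = {"id": 1, "price": "60000.10", "qty": "0.002",
          "time": 1700000000000, "isBuyerMaker": True}
trade = parse_trade(sample)
print(trade["time"].isoformat(), trade["price"] * trade["qty"])
```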

In [None]:
url = 'https://api.binance.com/api/v3/ticker/24hr'
data = requests.get(url).json()
data
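
Called without a symbol, this endpoint returns one record per pair, so the result is usually filtered or ranked before being inspected. As an illustration, the sketch below ranks pairs by 24-hour percentage change. It assumes each record has `symbol` and `priceChangePercent` keys with the percentage encoded as a string; `top_movers` is a hypothetical helper and the sample values are invented.

```python
def top_movers(tickers, n=3):
    """Return the n records with the largest priceChangePercent, highest first."""
    ranked = sorted(tickers, key=lambda t: float(t["priceChangePercent"]),
                    reverse=True)
    return ranked[:n]


# Invented sample records with the assumed keys.
sample = [
    {"symbol": "ETHBTC", "priceChangePercent": "-1.250"},
    {"symbol": "LTCBTC", "priceChangePercent": "4.100"},
    {"symbol": "BNBBTC", "priceChangePercent": "0.300"},
]
print([t["symbol"] for t in top_movers(sample, n=2)])
```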

In [None]:
url = 'https://api.binance.com/api/v3/ticker/price'
response = requests.get(url)
tickers = response.json()

tickers
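
The price ticker is a flat list of records, which is awkward for looking up a single market. A small sketch of turning it into a dictionary keyed by symbol follows. It assumes each record has `symbol` and `price` keys with the price encoded as a string; `price_lookup` is a hypothetical helper and the sample values are invented.

```python
def price_lookup(tickers):
    """Map each symbol to its last price as a float."""
    return {t["symbol"]: float(t["price"]) for t in tickers}


# Invented sample records with the assumed keys.
sample = [
    {"symbol": "ETHBTC", "price": "0.03000000"},
    {"symbol": "BTCUSDT", "price": "60000.00000000"},
]
prices = price_lookup(sample)
print(prices["BTCUSDT"])
```

With the cell above, `price_lookup(tickers)["BTCUSDT"]` would give the latest BTCUSDT price directly.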