# Initial Project Exploration


## Project Steps
- Step 1: Scope the Project and Gather Data
- Step 2: Explore and Assess the Data
- Step 3: Define the Data Model
- Step 4: Run ETL to Model the Data
- Step 5: Complete Project Write Up

## Project Requirements

### Write up
- Project scope outlines project steps and defines purpose of final data model
- Other scenarios are addressed
    - data increased by 100x
    - pipelines run daily by 7 am
    - database needs access by 100+ people
- Choice of tools and tech defended well

### Execution
- Code is clean and modular
- Includes **2 data quality checks**
- ETL results in the planned data model
- Data dictionary for the data model is included
- **2 data sources** (and data formats)
- more than **1 million rows**
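
The two data quality checks are not specified yet in this notebook. As a rough sketch of what they could look like, here are two common ones (a minimum row count and a no-missing-values check) written against plain records as returned by `response.json()`. The function names and the sample rows are illustrative assumptions, not part of the project code.

```python
def check_min_rows(records, min_rows):
    """Fail if the dataset has fewer rows than expected."""
    if len(records) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, found {len(records)}")
    return len(records)


def check_no_nulls(records, required_keys):
    """Fail if any required field is missing or None in any row."""
    bad = [
        (i, key)
        for i, row in enumerate(records)
        for key in required_keys
        if row.get(key) is None
    ]
    if bad:
        row_index, key = bad[0]
        raise ValueError(f"{len(bad)} missing values, first at row {row_index}, field {key!r}")
    return True


# Illustrative rows only; a real run would pass the loaded table.
sample = [
    {"time": 1624492800, "close": 2.97},
    {"time": 1624579200, "close": 3.85},
]
check_min_rows(sample, min_rows=2)
check_no_nulls(sample, required_keys=["time", "close"])
```

Raising an exception rather than returning `False` means a failed check stops the ETL run instead of being silently ignored.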

In [10]:
import pandas as pd
import requests
import time 

## Osmosis API To Ingest Base Tokens
- The initial project idea is to build a data lake that ingests historical market data from the Osmosis DEX
- Osmosis DEX is a decentralized cryptocurrency exchange with low trading fees
- A finite but growing number of currencies trade on this exchange; below we call an API provided by Osmosis to get that list of currencies
- There is also a historical price endpoint, which I've tested for a specific currency below

In [42]:
osmosis_api_base_url = 'https://api-osmosis.imperator.co'

osmo_token_list_response = requests.get(f'{osmosis_api_base_url}/tokens/v1/all')
akt_test_response = requests.get(f'{osmosis_api_base_url}/tokens/v1/historical/AKT/chart?range=1y')

### 33 Total Tokens On Osmosis
- this provides us with a smaller scope that will be better for this project (can expand later on)

In [44]:
osmo_token_list_df = pd.DataFrame(osmo_token_list_response.json())
print(osmo_token_list_df.shape)
osmo_token_list_df.head()

(33, 6)


Unnamed: 0,price,denom,symbol,liquidity,volume_24h,name
0,5.63,ibc/0954E1C28EB7AF5B72D24F3BC2B47BBB2FDF91BDDF...,SCRT,17479880.0,2936924.0,Secret Network
1,51.96,ibc/0EF15DF2F02480ADE0BB6E85D9EBB5DAEA2836D386...,LUNA,82429480.0,14303820.0,Luna
2,1.46,ibc/1480B8FD20AD5FCAE81EA87584D269547DD4D43684...,AKT,13478350.0,908667.4,Akash Network
3,0.944305,ibc/1DC495FCEFDA068A3820F903EDBD78B942FBD204D7...,NGM,2479584.0,94716.35,e-Money
4,1.13,ibc/1DCC8A6CB5689018431323953344A9F6CC4D0BFB26...,REGEN,5874160.0,93339.33,Regen Network


### Not enough info provided by Osmosis API for price data
- a one-year request (`range=1y`) returns only 221 daily rows, so we will need to look elsewhere for price data
- the API also only covers tokens on Osmosis; expanding the project beyond them would mean switching data sources

In [45]:
test_currency_df = pd.DataFrame(akt_test_response.json())
print(test_currency_df.shape)
test_currency_df.head()

(221, 5)


Unnamed: 0,time,close,high,low,open
0,1624492800,2.97,2.98,2.9,2.9
1,1624579200,3.85,3.88,2.96,2.97
2,1624665600,4.01,4.2,3.81,3.85
3,1624752000,3.74,4.07,3.54,4.01
4,1624838400,3.63,3.89,3.56,3.74
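
The `time` column is in Unix epoch seconds. A small sketch, using only the five values shown in the output above, converts them to dates and checks the spacing between rows. This helps explain the low row count: the series starts in late June 2021 and has one row per day.

```python
from datetime import datetime, timezone

# The five `time` values shown in the output above (Unix epoch seconds).
times = [1624492800, 1624579200, 1624665600, 1624752000, 1624838400]

dates = [datetime.fromtimestamp(t, tz=timezone.utc).date() for t in times]
steps = {later - earlier for earlier, later in zip(times, times[1:])}

print(dates[0])  # 2021-06-24
print(steps)     # {86400}, i.e. one row per day
```

On the DataFrame itself, `pd.to_datetime(test_currency_df['time'], unit='s')` should give the same conversion.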


## CoinGecko API for historical data
- CoinGecko has more robust documentation and more price data

In [147]:
osmosis_coins = ['osmosis','cosmos','terrausd','terra-luna','juno-network','stargaze','secret','comdex','crypto-com-chain','akash-network','ion','sentinel','chihuahua-token','e-money-eur','regen','persistence','lum-network','e-money','bitcanna','iris-network','desmos','ki','bitsong','likecoin','cheqd-network','ixo','starname','vidulum','microtick']
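# note: despite the name this list holds 12 ids, and 'crypto-com-chain' also appears in osmosis_coins
top20_coins = ['bitcoin','ethereum', 'binancecoin', 'cardano', 'solana', 'ripple','polkadot','dogecoin','avalanche-2','shiba-inu','matic-network','crypto-com-chain']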

In [156]:
def get_coin_metadata(coin_id):
    """Fetch one coin's CoinGecko record and flatten the fields of interest into a dict."""
    response = requests.get(f'https://api.coingecko.com/api/v3/coins/{coin_id}?tickers=false&market_data=false')
    print(coin_id,':',response.status_code)
    r_dict = response.json()
    new_dict = {}
    top_level_keys = ['id', 'symbol', 'name', 'block_time_in_minutes', 'hashing_algorithm','genesis_date']
    links_keys = ['twitter_screen_name', 'subreddit_url',]
    for key in top_level_keys:
        new_dict[key] = r_dict[key]
    for key in links_keys:
        new_dict[key] = r_dict['links'][key]
    new_dict['description'] = r_dict['description']['en']
    # some coins (e.g. stargaze) have an empty github list, so [0] raises IndexError
    try:
        new_dict['github_url'] = r_dict['links']['repos_url']['github'][0]
    except IndexError:
        new_dict['github_url'] = None
    return new_dict

all_meta_data = []

# collect one metadata dict per coin (append, rather than overwrite the list)
for coin in osmosis_coins+top20_coins:
    all_meta_data.append(get_coin_metadata(coin))

print('done')

osmosis : 200
cosmos : 200
terrausd : 200
terra-luna : 200
juno-network : 200
stargaze : 200
secret : 200
comdex : 200
crypto-com-chain : 200
akash-network : 200
ion : 200
sentinel : 200
chihuahua-token : 200
e-money-eur : 200
regen : 200
persistence : 200
lum-network : 200
e-money : 200
bitcanna : 200
iris-network : 200
desmos : 200
ki : 200
bitsong : 200
likecoin : 200
cheqd-network : 200
ixo : 200
starname : 200
vidulum : 200
microtick : 200
bitcoin : 200
ethereum : 200
binancecoin : 200
cardano : 200
solana : 200
ripple : 200
polkadot : 200
dogecoin : 200
avalanche-2 : 200
shiba-inu : 200
matic-network : 200
crypto-com-chain : 200
done
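
The loop above sends 41 requests back to back, and `time` is imported but never used. Every call returned 200 here, but a larger run may get throttled. Below is a minimal retry sketch, assuming the API signals throttling with HTTP 429. The helper name `get_with_backoff` and its parameters are hypothetical; `fetch` can be any callable that returns an object with a `status_code`, such as `requests.get`.

```python
import time


def get_with_backoff(fetch, url, max_tries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-429 response or tries run out.

    The wait doubles after every 429: base_delay, 2*base_delay, 4*base_delay, ...
    """
    response = None
    for attempt in range(max_tries):
        response = fetch(url)
        if response.status_code != 429:
            return response
        if attempt < max_tries - 1:
            sleep(base_delay * 2 ** attempt)
    return response


class FakeResponse:
    """Stand-in for a requests.Response so the demo needs no network."""

    def __init__(self, status_code):
        self.status_code = status_code


# Demo: two throttled responses, then a success. Waits are recorded, not slept.
statuses = iter([429, 429, 200])
waits = []
result = get_with_backoff(
    lambda url: FakeResponse(next(statuses)),
    "https://example.invalid",
    sleep=waits.append,
)
print(result.status_code, waits)  # 200 [1.0, 2.0]
```

In the notebook this would be called as `get_with_backoff(requests.get, url)` inside `get_coin_metadata`.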


In [153]:
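# fetch the raw (unflattened) CoinGecko record for one coin to inspect its structure
response = requests.get('https://api.coingecko.com/api/v3/coins/stargaze?tickers=false&market_data=false')
print('stargaze', ':', response.status_code)
r_dict = response.json()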


stargaze : 200


In [155]:
r_dict['links']['repos_url']['github']

[]
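
The empty list above is why indexing with `[0]` fails for some coins. As an alternative to a `try`/`except IndexError`, a small helper can return `None` for an empty or missing list. The name `first_or_none` is hypothetical, and the `links` dict below only mimics the shape of the response.

```python
def first_or_none(items):
    """Return the first element of a list, or None if it is empty or missing."""
    return items[0] if items else None


# Mimics the shape of r_dict['links'] for a coin with no GitHub repos.
links = {"repos_url": {"github": []}}

print(first_or_none(links["repos_url"]["github"]))         # None
print(first_or_none(links["repos_url"].get("bitbucket")))  # None (key absent)
print(first_or_none(["https://example.invalid/repo"]))
```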

In [146]:
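# scratch version of the flattening logic; the output below is from a run where r_dict held the bitcoin record
new_dict = {}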
top_level_keys = ['id', 'symbol', 'name', 'block_time_in_minutes', 'hashing_algorithm','genesis_date']
links_keys = ['twitter_screen_name', 'subreddit_url',]
for key in top_level_keys:
    new_dict[key] = r_dict[key]
for key in links_keys:
    new_dict[key] = r_dict['links'][key]
new_dict['description'] = r_dict['description']['en']  
new_dict['github_url'] = r_dict['links']['repos_url']['github'][0]
new_dict

{'id': 'bitcoin',
 'symbol': 'btc',
 'name': 'Bitcoin',
 'block_time_in_minutes': 10,
 'hashing_algorithm': 'SHA-256',
 'genesis_date': '2009-01-03',
 'twitter_screen_name': 'bitcoin',
 'subreddit_url': 'https://www.reddit.com/r/Bitcoin/',
 'description': 'Bitcoin is the first successful internet money based on peer-to-peer technology; whereby no central bank or authority is involved in the transaction and production of the Bitcoin currency. It was created by an anonymous individual/group under the name, Satoshi Nakamoto. The source code is available publicly as an open source project, anybody can look at it and be part of the developmental process.\r\n\r\nBitcoin is changing the way we see money as we speak. The idea was to produce a means of exchange, independent of any central authority, that could be transferred electronically in a secure, verifiable and immutable way. It is a decentralized peer-to-peer internet currency making mobile payment easy, very low transaction fees, protec

In [113]:
#test a request
coingecko_base_url = 'https://api.coingecko.com/api/v3'
gecko_token_list_response = requests.get(f'{coingecko_base_url}/coins/list')
gecko_token_list_response.status_code

200

In [47]:
# most requests rely on a coin id; list all coins to get it (there seems to be no lookup by ticker)
gecko_token_list_response = requests.get(f'{coingecko_base_url}/coins/list')
gecko_token_list_response.status_code

200

In [48]:
# create a df from the response
gecko_token_list_df = pd.DataFrame(gecko_token_list_response.json())

print(gecko_token_list_df.shape)
gecko_token_list_df.head()

(12387, 3)


Unnamed: 0,id,symbol,name
0,01coin,zoc,01coin
1,0-5x-long-algorand-token,algohalf,0.5X Long Algorand Token
2,0-5x-long-altcoin-index-token,althalf,0.5X Long Altcoin Index Token
3,0-5x-long-ascendex-token-token,asdhalf,0.5X Long AscendEx Token Token
4,0-5x-long-balancer-token,balhalf,0.5X Long Balancer Token


In [67]:
gecko_exchanges_response = requests.get(f'{coingecko_base_url}/exchanges')
gecko_exchanges_response.status_code

200

We could filter the CoinGecko coin list using the symbols from our Osmosis response, but CoinGecko also provides an exchanges endpoint, which should be more accurate.

In [75]:
gecko_exchanges_df=pd.DataFrame(gecko_exchanges_response.json())
osmosis_id = gecko_exchanges_df[gecko_exchanges_df['name'].str.contains('Osmosis')]['id'].iloc[0]

In [76]:
gecko_osmosis_tokens_response = requests.get(f'{coingecko_base_url}/exchanges/osmosis')
gecko_osmosis_tokens_response.status_code

200

In [111]:
# gecko_osmosis_tokens_response.json()['tickers']
ticker_df = pd.DataFrame(gecko_osmosis_tokens_response.json()['tickers'])
ticker_df[ticker_df['target']=='UOSMO'] 

Unnamed: 0,base,target,market,last,volume,converted_last,converted_volume,trust_score,bid_ask_spread_percentage,timestamp,last_traded_at,last_fetch_at,is_anomaly,is_stale,trade_url,token_info_url,coin_id,target_coin_id
0,IBC/27394FB092D2ECCD56123C74F36E4C1F926001CEAD...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",3.843822,542062.5,"{'btc': 0.00075989, 'eth': 0.01114524, 'usd': ...","{'btc': 411.906, 'eth': 6041, 'usd': 15696528}",green,0.602712,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,cosmos,osmosis
1,IBC/BE1BB42D4BE3C30D50B68D7C41DB4DFCE9678E8EF8...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",0.133158,17611430.0,"{'btc': 2.632e-05, 'eth': 0.0003861, 'usd': 1.0}","{'btc': 463.606, 'eth': 6800, 'usd': 17666658}",green,0.602713,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,terrausd,osmosis
2,IBC/0EF15DF2F02480ADE0BB6E85D9EBB5DAEA2836D386...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",6.785198,147446.0,"{'btc': 0.00134137, 'eth': 0.01967382, 'usd': ...","{'btc': 197.78, 'eth': 2901, 'usd': 7536795}",green,0.602716,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,terra-luna,osmosis
3,IBC/46B44899322F3CD854D2D46DEEF881958467CDD4B3...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",2.128208,115964.7,"{'btc': 0.00042073, 'eth': 0.00617078, 'usd': ...","{'btc': 48.789441, 'eth': 715.593, 'usd': 1859...",green,0.602718,2022-01-29T23:20:31+00:00,2022-01-29T23:20:31+00:00,2022-01-29T23:20:31+00:00,False,False,https://app.osmosis.zone/?,,juno-network,osmosis
4,IBC/987C17B11ABC2B20019178ACE62929FE9840202CE7...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",0.08142,3668150.0,"{'btc': 1.61e-05, 'eth': 0.00023608, 'usd': 0....","{'btc': 59.043, 'eth': 865.977, 'usd': 2249940}",green,0.602723,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,stargaze,osmosis
7,IBC/0954E1C28EB7AF5B72D24F3BC2B47BBB2FDF91BDDF...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",0.745995,417544.5,"{'btc': 0.00014748, 'eth': 0.00216303, 'usd': ...","{'btc': 61.578, 'eth': 903.161, 'usd': 2346551}",green,0.602729,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,secret,osmosis
8,IBC/EA3E1640F9B1532AB129A571203A0B9F789A7F14BB...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",0.359471,282322.7,"{'btc': 7.106e-05, 'eth': 0.00104229, 'usd': 2...","{'btc': 20.062985, 'eth': 294.263, 'usd': 764541}",green,0.60273,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,comdex,osmosis
9,IBC/E6931F78057F7CC5DA0FD6CEF82FF39373A6E0452B...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",0.057747,1597843.0,"{'btc': 1.142e-05, 'eth': 0.00016744, 'usd': 0...","{'btc': 18.241138, 'eth': 267.542, 'usd': 695115}",green,0.602732,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,crypto-com-chain,osmosis
13,IBC/1480B8FD20AD5FCAE81EA87584D269547DD4D43684...,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",0.191472,442243.9,"{'btc': 3.785e-05, 'eth': 0.00055518, 'usd': 1...","{'btc': 16.739909, 'eth': 245.524, 'usd': 637908}",green,0.602744,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,akash-network,osmosis
14,UION,UOSMO,"{'name': 'Osmosis', 'identifier': 'osmosis', '...",1149.111259,17.15031,"{'btc': 0.22716851, 'eth': 3.331872, 'usd': 86...","{'btc': 3.896009, 'eth': 57.143, 'usd': 148465}",green,0.602745,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,2022-01-29T23:20:30+00:00,False,False,https://app.osmosis.zone/?,,ion,osmosis


In [112]:
#get list of coins
list(pd.DataFrame(gecko_osmosis_tokens_response.json()['tickers'])['coin_id'].drop_duplicates().reset_index(drop=True))

['cosmos',
 'terrausd',
 'terra-luna',
 'juno-network',
 'stargaze',
 'secret',
 'comdex',
 'crypto-com-chain',
 'akash-network',
 'ion',
 'sentinel',
 'chihuahua-token',
 'e-money-eur',
 'regen',
 'persistence',
 'lum-network',
 'e-money',
 'bitcanna',
 'iris-network',
 'desmos',
 'ki',
 'bitsong',
 'likecoin',
 'cheqd-network',
 'ixo',
 'starname',
 'vidulum',
 'microtick']

In [97]:
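#test price feed (from/to are Unix seconds: 2021-01-27 to 2021-01-29 UTC, a two-day window despite the variable name)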
cosmos_one_day_prices = requests.get(f'{coingecko_base_url}/coins/cosmos/market_chart/range?vs_currency=usd&from=1611705600&to=1611878400')
cosmos_one_day_prices.status_code

200

In [98]:
len(cosmos_one_day_prices.json()['prices'])

49
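
49 points over a 48-hour window is consistent with hourly prices. To avoid hand-computing epoch values for other windows, the range URL can be built from dates. This is a sketch; the helper name `market_chart_range_url` is hypothetical, and the endpoint path and query parameters are the ones used in the request above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

coingecko_base_url = 'https://api.coingecko.com/api/v3'


def market_chart_range_url(coin_id, start, end, vs_currency='usd'):
    """Build a /market_chart/range URL from two timezone-aware datetimes."""
    query = urlencode({
        'vs_currency': vs_currency,
        'from': int(start.timestamp()),
        'to': int(end.timestamp()),
    })
    return f'{coingecko_base_url}/coins/{coin_id}/market_chart/range?{query}'


start = datetime(2021, 1, 27, tzinfo=timezone.utc)
end = datetime(2021, 1, 29, tzinfo=timezone.utc)
url = market_chart_range_url('cosmos', start, end)

print(url)
print((end - start).total_seconds() / 3600)  # 48.0 hours
```

The printed URL matches the one requested in the cell above, so the helper can be swapped in without changing the result.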

In [65]:
#filter the df for only relevant osmosis tokens (lower_iter is defined in cell In [53] below)
gecko_filtered_list = gecko_token_list_df[
    (gecko_token_list_df['symbol'].isin(lower_iter(osmo_token_list_df['symbol']))) &
    (~gecko_token_list_df['name'].str.contains('Wormhole')) &
    (~gecko_token_list_df['name'].str.contains('OLD')) &
    (~gecko_token_list_df['name'].str.contains('Wrapped'))
]

print(gecko_filtered_list.shape)
print(len(gecko_filtered_list['symbol'].drop_duplicates()))
gecko_filtered_list.sort_values(by='symbol')

(38, 3)
30


Unnamed: 0,id,symbol,name
548,akash-network,akt,Akash Network
2657,cosmos,atom,Cosmos
1541,bitcanna,bcna,BitCanna
1671,bitsong,btsg,BitSong
2343,cheqd-network,cheq,CHEQD Network
2559,comdex,cmdx,Comdex
2817,crypto-com-chain,cro,Crypto.com Coin
3278,desmos,dsm,Desmos
9528,sentinel,dvpn,Sentinel
3861,e-money-eur,eeur,e-Money EUR


In [53]:
def lower_iter(the_iter):
    return [x.lower() for x in the_iter]

lower_iter(osmo_token_list_df['symbol'])

['scrt',
 'luna',
 'akt',
 'ngm',
 'regen',
 'krt',
 'atom',
 'dig',
 'med',
 'juno',
 'btsg',
 'iov',
 'eeur',
 'tick',
 'cheq',
 'iris',
 'lum',
 'dvpn',
 'stars',
 'like',
 'xprt',
 'xki',
 'huahua',
 'ust',
 'bcna',
 'cro',
 'vdl',
 'cmdx',
 'dsm',
 'ixo',
 'boot',
 'ion',
 'osmo']