# Dune Scrape
#### Step 3

In previous steps, we collected token metrics for the beginning of their trading lifecycles.

To measure success, we need to know how well each token performed. We will use the token addresses in our current data to find each token's all-time high price and the date on which it occurred.
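
As a rough illustration of what "all-time high price and date" means, the sketch below finds both from a small made-up table of daily prices. The column names mirror the Dune output shown later in this notebook (`token_bought_address`, `block_date`, `price`), but the rows and addresses are invented for the example.

```python
import pandas as pd

# Made-up daily prices for two placeholder tokens (not real data).
prices = pd.DataFrame({
    "token_bought_address": ["0xaaa", "0xaaa", "0xaaa", "0xbbb", "0xbbb"],
    "block_date": pd.to_datetime(
        ["2022-11-05", "2022-11-06", "2022-11-07", "2022-11-05", "2022-11-06"]
    ),
    "price": [0.000002, 0.000050, 0.000010, 0.000300, 0.000100],
})

# idxmax returns the row label of the highest price within each token group;
# selecting those rows keeps the date on which the high occurred.
ath_rows = prices.loc[prices.groupby("token_bought_address")["price"].idxmax()]
ath_example = ath_rows.reset_index(drop=True)
print(ath_example)
```

The result has one row per token, holding its highest price and the matching date.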

In [52]:
import pandas as pd

Data sanity checks

In [68]:
df = pd.read_csv('../data/discord_data.csv')
df.head(3)

Unnamed: 0,address,created,verified,renounced,marketcap,buys,sells,buysell_rating,honeypot,buytax,...,taxrating,liquidity,owner,owner_rating,deployer_balance,deployer_tx,funding_source,funding_amount,max_wallet,max_tx
0,0xa190700f5ae95de4eabf29fa9469bd85ff5a7919,<t:1667604491:R>,True,Not Ownable,"$9,755",100,24,green,True,5%,...,green,7958,Unicrypt,green,0.06Ξ,18,:yellow_circle: [0x04af…65fd](https://ethersca...,0.03Ξ,2% | 20000000,1% | 10000000
1,0x9de736b02f3d09738ac42cdea046b014b0d54d60,<t:1667604059:R>,True,False,"$21,924",90,19,green,True,3%,...,green,8947,Unicrypt,green,0.61Ξ,9,:green_circle: [Binance 14](https://etherscan....,0.75Ξ,2% | 2000000,100% | 100000000000000000
2,0xaaf8a1aad53c9384be3aecb5a16af6121a5ad935,<t:1667603939:R>,True,Not Ownable,"$8,951",27,7,yellow,True,0%,...,green,7680,Team Finance,green,0.19Ξ,9,:yellow_circle: [Binance 15](https://etherscan...,1.34Ξ,-,-


In [173]:
len(df)

60237

Some links if interested in more information on Dune:
- app: https://dune.com/MatteoLeibowitz/uniswap-community
- api: https://dune.com/docs/api/quick-start/api-py/

Because of Dune API limits and the time and resource constraints of this project, we're going to add a little finesse to our approach.

While making raw requests with Dune's API can be tedious and resource heavy, we can create custom SQL queries within Dune, save those queries, and call on their results remotely.

Additionally, we can upload data. In this case, we will upload our token address list so our SQL query knows which tokens to filter its response for.

Due to upload size limitations, we've made 4 Dune accounts, split our data into 4 parts, and uploaded 4 files.

In [79]:
len(df) / 4

15059.25

In [70]:
token_addresses = df['address']
address_df = pd.DataFrame(token_addresses)

In [180]:
# give index column a name so it can be selected in Dune sql interface
address_df.index.name = 'col_id'

In [77]:
total_rows = len(address_df)
part_size = total_rows // 4

part1 = address_df.iloc[:part_size]
part2 = address_df.iloc[part_size: 2 * part_size]
part3 = address_df.iloc[2 * part_size: 3 * part_size]
part4 = address_df.iloc[3 * part_size:]

part1.to_csv('../data/token_address_dune/token_1.csv')
part2.to_csv('../data/token_address_dune/token_2.csv')  
part3.to_csv('../data/token_address_dune/token_3.csv') 
part4.to_csv('../data/token_address_dune/token_4.csv')

In [81]:
#sanity check
part_1 = pd.read_csv('../data/token_address_dune/token_1.csv')
part_1.tail()

Unnamed: 0,col_id,address
15054,15054,0x24E9274A44662D69db29EeACd3B488b19E6d8D62
15055,15055,0x2b4CD48A635603802dE98E76357f4C553497E865
15056,15056,0xBa3e91774cF19E35757B94Ea1D3a6e8e69F7B9C4
15057,15057,0xac5B8295E20989993dee2C3a0d8522Da387FCabE
15058,15058,0xD64eaFE2d1280f756cd5474285cd0273d7FAfb0c
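
A quick way to confirm that a four-way split loses no rows is to reassemble the parts and compare them against the original. The sketch below uses a small stand-in frame and a hypothetical helper, `split_into_parts`. It follows the same slicing idea as above, with the last part absorbing any remainder.

```python
import pandas as pd

def split_into_parts(frame: pd.DataFrame, n_parts: int) -> list:
    """Split frame into n_parts contiguous slices; the last takes the remainder."""
    part_size = len(frame) // n_parts
    parts = [
        frame.iloc[i * part_size:(i + 1) * part_size] for i in range(n_parts - 1)
    ]
    parts.append(frame.iloc[(n_parts - 1) * part_size:])
    return parts

# Stand-in data: 10 placeholder addresses, not the real token list.
demo = pd.DataFrame({"address": [f"0x{i:040x}" for i in range(10)]})
demo.index.name = "col_id"

parts = split_into_parts(demo, 4)
sizes = [len(p) for p in parts]
print(sizes)

# Reassembling the parts should reproduce the original frame exactly.
rebuilt = pd.concat(parts)
assert rebuilt.equals(demo)
```

The same check could be run on the real `address_df` before uploading the files.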


Now that we have everything set up in Dune, let's get our data:

In [None]:
pip install dune-client

In [59]:
ath_data = pd.DataFrame()

In [151]:
# Starting row index for this run. With 4 uploaded files, the saved query has to be
# updated roughly every 15,000 rows (len(df) / 4).
counter = 45177
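
To make the link between `counter` and the uploaded files concrete, the sketch below maps a row index to the file it would fall in, assuming the same `len(df) // 4` split used earlier. `part_for_index` is a hypothetical helper, not part of the notebook or of `dune-client`.

```python
TOTAL_ROWS = 60237                 # len(df) reported earlier in the notebook
N_PARTS = 4
PART_SIZE = TOTAL_ROWS // N_PARTS  # rows in each of the first three files

def part_for_index(row_index: int) -> int:
    """Return the 1-based upload file number that contains row_index."""
    # The last file absorbs the remainder, so cap the result at N_PARTS.
    return min(row_index // PART_SIZE, N_PARTS - 1) + 1

print(part_for_index(0))       # first row of token_1.csv
print(part_for_index(45177))   # first row of token_4.csv
```

Under this assumption, a `counter` of 45177 corresponds to the start of the fourth file.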

In [1]:
from dune_client.types import QueryParameter
from dune_client.client import DuneClient
from dune_client.query import QueryBase

dune = DuneClient(api_key='api key here', base_url='https://api.dune.com', request_timeout=10)

# Page through the saved query 500 rows at a time until every index is covered
while counter < 60500:
    query = QueryBase(
        name="token_ath_data",
        query_id=3322121,
        params=[
            QueryParameter.number_type(name="start_param", value=counter),
            QueryParameter.number_type(name="end_param", value=counter + 500)
        ]
    )
    results_df = dune.run_query_dataframe(query)

    # Append this batch to the running results
    ath_data = pd.concat([ath_data, results_df], ignore_index=True)
    counter += 500
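
The loop above grows `ath_data` with one `pd.concat` per batch. An alternative pattern, sketched here, is to collect each batch in a list and concatenate once at the end. To keep the example runnable without an API key, `fetch_batch` is a hypothetical stand-in for the Dune call and is not part of `dune-client`. Its half-open `[start, end)` window is also an assumption, since the saved query's handling of `end_param` is not shown.

```python
import pandas as pd

def fetch_batch(start: int, end: int) -> pd.DataFrame:
    """Hypothetical stand-in for a Dune query over rows [start, end)."""
    return pd.DataFrame({"col_id": range(start, end)})

def fetch_all(start: int, stop: int, step: int) -> pd.DataFrame:
    """Fetch [start, stop) in windows of `step` rows and combine the results."""
    frames = []
    for window_start in range(start, stop, step):
        window_end = min(window_start + step, stop)
        frames.append(fetch_batch(window_start, window_end))
    # A single concat avoids re-copying the accumulated frame on every batch.
    return pd.concat(frames, ignore_index=True)

demo = fetch_all(start=0, stop=1200, step=500)
print(len(demo))
```

In the real notebook, the body of `fetch_batch` would be replaced by the `QueryBase` and `run_query_dataframe` calls.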

Let's sanity check and take a quick look at our data:

In [174]:
ath_data

Unnamed: 0,token_bought_address,block_date,price
0,0xca9b6c538171a011f7da4918c253267c174da0b8,2022-11-06,0.000003
1,0xb61c900c8ed056e28b7c320af461725648741f17,2022-11-06,0.000050
2,0x28232b3396c34ba06cbe879a21f7d2299d597e82,2022-11-07,0.000008
3,0xa95b2d0897e93cc5bd84f062c15e3a4812a067f5,2022-11-05,0.000029
4,0xcd31e0359e1f611b3166144766a15babfbaced9c,2022-11-06,0.000008
...,...,...,...
60138,0xd3e5aeecb5f8b299ef4e63ac2a3144dfa4f9578d,2023-12-29,0.001295
60139,0x7a0db13ebc5ecf21a750934707f266b187562e05,2023-12-29,0.000485
60140,0x8079db2a4abbb7a24cedb636cc1128ab6a2719ee,2023-12-29,0.000002
60141,0x5b506482324124132488cde2f497a7b7279dde0b,2023-12-29,0.003028
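
One possible further sanity check is to confirm that each uploaded address came back exactly once. The sketch below uses tiny made-up frames. It lowercases addresses before comparing, since the addresses shown earlier appear in both lowercase and mixed-case form; whether the real Dune output needs this normalization is an assumption.

```python
import pandas as pd

# Made-up stand-ins for the uploaded addresses and the Dune results.
uploaded = pd.DataFrame({"address": ["0xAbC1", "0xdef2", "0x1233"]})
returned = pd.DataFrame({
    "token_bought_address": ["0xabc1", "0xdef2", "0xdef2"],
    "price": [0.000003, 0.000050, 0.000050],
})

# Normalize case so mixed-case and lowercase addresses compare equal.
uploaded_set = set(uploaded["address"].str.lower())
returned_addr = returned["token_bought_address"].str.lower()

missing = uploaded_set - set(returned_addr)               # uploaded but no result
duplicate_count = int(returned_addr.duplicated().sum())   # repeated results

print(f"missing: {sorted(missing)}, duplicates: {duplicate_count}")
```

Any missing or duplicated addresses found this way are worth resolving before merging with the rest of the data.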


In [175]:
ath_data.to_csv('../data/ath_price_data.csv', index=False)

We now have the all-time high prices and dates for our tokens. We've exported them as a CSV to merge with the rest of our data.

But first, we have to address a problem:

We want to measure growth in some way, but we have:
- starting marketcap
- all-time high price

A marketcap and a price measure different things, and tokens grow at different rates due to liquidity and price impact.
In short:

These are **NOT** comparable metrics

So how can we calculate the all-time high marketcap?

We can take the all-time high price, find the total supply of tokens (shares), and multiply the two to get the all-time high marketcap.
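
As a small worked example of that multiplication, the sketch below uses made-up numbers. The supply and price are illustrative and not taken from the data, `ath_marketcap` is a hypothetical helper, and the result is in whatever currency the price is quoted in.

```python
def ath_marketcap(ath_price: float, total_supply: float) -> float:
    """All-time high marketcap = all-time high price x total token supply."""
    return ath_price * total_supply

# Illustrative values only: 1 billion tokens and an all-time high price of 0.00005.
example_cap = ath_marketcap(ath_price=0.00005, total_supply=1_000_000_000)
print(f"{example_cap:,.0f}")
```

Once both values are expressed as marketcaps, the starting and all-time high figures can be compared directly.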

We will scrape this information from an app called Defined. The process can be seen here: **['defined_scrape_step_4'](./defined_scrape_step_4.ipynb)**