# Defined Scrape
#### Step 4

With our general data collected, we need a way to measure growth. As mentioned in step 3, we will get token supply data from Defined to make this calculation.

More information on the Defined API: https://docs.defined.fi/reference/overview

In [1]:
import pandas as pd
import requests
import json
import time

Sanity check of the data: check column types and row counts.

In [4]:
df = pd.read_csv('../data/ath_price_data.csv') # from dune_scrape

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60143 entries, 0 to 60142
Data columns (total 3 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   token_bought_address  60143 non-null  object 
 1   block_date            60143 non-null  object 
 2   price                 60143 non-null  float64
dtypes: float64(1), object(2)
memory usage: 1.4+ MB


Let's begin our scrape.

As with many crypto-related APIs, rate limits are quite tight.

We will use a multi-account workaround similar to the one we used with Dune.

All responses are stored in a list. The iteration count tells us when API keys need to be switched, and it also sets the starting point for the scrape, which has to be rerun after each key switch.
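The resume-and-switch idea described above can be sketched without any network calls. This is only an illustration: `scrape_with_keys` and the `fetch` callable are hypothetical names, and `fetch` is assumed to return a dict on success and `None` when a key is rejected or rate limited.

```python
def scrape_with_keys(addresses, api_keys, fetch, start=0):
    """Fetch addresses in order, moving to the next key when one fails.

    fetch(address, key) is a hypothetical callable: it returns a dict on
    success and None when the key is rejected or rate limited.
    """
    results = []
    position = start  # index of the next address to fetch
    key_index = 0
    while position < len(addresses) and key_index < len(api_keys):
        payload = fetch(addresses[position], api_keys[key_index])
        if payload is None:
            key_index += 1  # key exhausted: retry the same address with the next key
            continue
        results.append(payload)
        position += 1  # count successful requests only
    return results, position
```

The returned `position` plays the same role as the iteration count in this notebook: passing it back in as `start` resumes the scrape where the previous run stopped.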

In [11]:
defined_api_results = []  # same name the scrape loop appends to

In [None]:
iterations_acc_1 = 50000

In [2]:
url = "https://graph.defined.fi/graphql"

headers = {
  "Content-Type": "application/json",
  "Authorization": "ceca6928a7b744249c4f43af3867047f13e58710"
}

for index, row in df.iloc[iterations_acc_1:].iterrows():
    value = row['token_bought_address']

    getNetworks = f"""
    query {{
      token(input: {{ address: "{value}", networkId: 1 }}) {{
        address
        name
        symbol
        totalSupply
        info {{
          circulatingSupply
        }}
        explorerData {{
          blueCheckmark
        }}
      }}
    }}
    """
    
    response = requests.post(url, headers=headers, json={"query": getNetworks})

    if response.status_code == 200:
        defined_api_results.append(json.loads(response.text))
        # progress sanity check
        print(len(defined_api_results))
        # track successful iterations only
        iterations_acc_1 += 1
    else:
        print(f"Request failed for index {index}. Status code: {response.status_code}")
        break

    time.sleep(0.25)
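The loop above interpolates each address into the query text with an f-string. A possible alternative is to keep the query text fixed and pass the address through GraphQL variables. The sketch below only builds the request body and sends nothing. The operation name `Token`, the helper `build_token_payload`, and the variable types `String!` and `Int!` are assumptions that would need to be checked against the Defined schema.

```python
# Fixed query text; the values are supplied separately as GraphQL variables.
# The variable types below are assumed, not taken from the Defined docs.
TOKEN_QUERY = """
query Token($address: String!, $networkId: Int!) {
  token(input: { address: $address, networkId: $networkId }) {
    address
    name
    symbol
    totalSupply
  }
}
"""


def build_token_payload(address, network_id=1):
    """Return a JSON-serialisable body for a GraphQL POST request."""
    return {
        "query": TOKEN_QUERY,
        "variables": {"address": address, "networkId": network_id},
    }


payload = build_token_payload("0x6f6382f241e3c6ee0e9bee2390d91a73adc0afff")
```

A body like this could be passed as `json=payload` to `requests.post`, in place of the `{"query": getNetworks}` dictionary used in the loop.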

Sanity checks:

In [111]:
len(defined_api_results)

60144

In [113]:
#sanity check
defined_api_results[60143]

{'data': {'token': {'address': '0x6f6382f241e3c6ee0e9bee2390d91a73adc0afff',
   'name': 'Teenage Mutant Ninja Turtles',
   'symbol': 'TMNT',
   'totalSupply': '1000000000',
   'info': {'circulatingSupply': None},
   'explorerData': {'blueCheckmark': True}}}}

While testing the scrape, we encountered some unexpected data. Let's make sure these responses don't slip through:

In [127]:
count = 0

for item in defined_api_results:
    if item['data'] is None:
        count += 1

In [128]:
count

9

9 responses have an unexpected structure, meaning the Defined API was not able to return token information for them.

In [129]:
filtered_api_res = [item for item in defined_api_results if item['data'] is not None]

In [130]:
len(filtered_api_res) # compare before and after filter length

60135
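It can be useful to know which responses were dropped, not only how many. A minimal sketch, assuming the results were appended in the same order as the rows that were requested: `split_responses` is a hypothetical helper that returns the usable responses together with the list positions of the empty ones.

```python
def split_responses(results):
    """Separate usable responses from ones whose 'data' field is None.

    Returns (valid, failed_positions), where failed_positions are indices
    into the original results list.
    """
    valid = []
    failed_positions = []
    for position, item in enumerate(results):
        if item.get("data") is None:
            failed_positions.append(position)
        else:
            valid.append(item)
    return valid, failed_positions


# Tiny made-up sample, not real API output
sample_results = [
    {"data": {"token": {"symbol": "AAA"}}},
    {"data": None},
    {"data": {"token": {"symbol": "BBB"}}},
]
valid, failed_positions = split_responses(sample_results)
```

If the list order matches the row order of the input DataFrame, the failed positions can be mapped back to addresses, taking into account any starting offset used for the scrape.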

Let's also check for None values in circulatingSupply:

In [131]:
new_count = 0

for item in filtered_api_res:
    if item['data']['token']['info']['circulatingSupply'] is None:
        new_count += 1

In [132]:
new_count

59561

The vast majority of data points (59,561 of 60,135) have no value for circulatingSupply.

While performing regular checks on the scraper data, I could only recall ever seeing the blueCheckmark value as True.

Let's see how many data points have it as True to gauge whether it's worth keeping for potential insights:

In [133]:
blue_count = 0

for item in filtered_api_res:
    if item['data']['token']['explorerData']['blueCheckmark'] is True:
        blue_count += 1

In [134]:
blue_count

59863

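The vast majority (59,863 of 60,135) also have a blue checkmark.

Since both fields take a single value for almost every token, they carry little information. Therefore we are NOT including 'blueCheckmark' and 'circulatingSupply' in our DataFrame.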
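The two counts above can also be expressed as shares, which makes the "nearly constant" argument easier to compare across fields. A small sketch with a hypothetical helper `share_matching`, run here on made-up sample items and not on the scraped data:

```python
def share_matching(items, getter, target):
    """Return the fraction of items whose extracted value equals target."""
    if not items:
        return 0.0
    hits = sum(1 for item in items if getter(item) == target)
    return hits / len(items)


# Made-up items that follow the structure of the API responses
sample_items = [
    {"data": {"token": {"info": {"circulatingSupply": None},
                        "explorerData": {"blueCheckmark": True}}}},
    {"data": {"token": {"info": {"circulatingSupply": "500"},
                        "explorerData": {"blueCheckmark": True}}}},
]

none_share = share_matching(
    sample_items,
    lambda item: item["data"]["token"]["info"]["circulatingSupply"],
    None,
)
blue_share = share_matching(
    sample_items,
    lambda item: item["data"]["token"]["explorerData"]["blueCheckmark"],
    True,
)
```

A share close to 1.0 means the field is almost constant and adds little signal as a feature.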

In [136]:
values_list = []

# keep only the fields we decided to use
for item in filtered_api_res:
    value_item_list = [
        item['data']['token']['address'],
        item['data']['token']['name'],
        item['data']['token']['symbol'],
        item['data']['token']['totalSupply'],
    ]

    values_list.append(value_item_list)

Create the DataFrame:

In [138]:
columns = ['address', 'name', 'symbol', 'totalSupply']
df = pd.DataFrame(values_list, columns=columns)

Let's see how our data looks:

In [139]:
df

|  | address | name | symbol | totalSupply |
|---|---|---|---|---|
| 0 | 0xca9b6c538171a011f7da4918c253267c174da0b8 | Vine | Vine | 1000000000000000000 |
| 1 | 0xca9b6c538171a011f7da4918c253267c174da0b8 | Vine | Vine | 1000000000000000000 |
| 2 | 0xb61c900c8ed056e28b7c320af461725648741f17 | 8$ | 8$ | 100000000 |
| 3 | 0x28232b3396c34ba06cbe879a21f7d2299d597e82 | DOAGE COIN | DOAGE | 1000000000 |
| 4 | 0xa95b2d0897e93cc5bd84f062c15e3a4812a067f5 | Ishibashi Inu | $IshibInu | 100000000 |
| ... | ... | ... | ... | ... |
| 60130 | 0xd3e5aeecb5f8b299ef4e63ac2a3144dfa4f9578d | Open Exchange | OPNX | 46000000 |
| 60131 | 0x7a0db13ebc5ecf21a750934707f266b187562e05 | PillBot | PILL | 1000000000 |
| 60132 | 0x8079db2a4abbb7a24cedb636cc1128ab6a2719ee | Super Saiyan Coin | SSC | 7000000000 |
| 60133 | 0x5b506482324124132488cde2f497a7b7279dde0b | YUMMY | YUMMY | 10000000 |


In [140]:
df.to_csv('../data/defined_data.csv', index=False)

Now that we have the token supply information, we can get a measure of growth and work towards creating a target class for our dataset. 

That process can be found here: **['token_cleaning'](../Research/tokens_cleaning.ipynb)**
