# Pair Extraction

While working on defined_volume_scrape, we realized we needed pair addresses rather than regular token (contract) addresses.

We're going to use a link in our previously scraped Discord data to extract this pair address. Here's how it works.

This is an example token link:
- https://dexscreener.com/ethereum/0xba687c0617898c86d8923bad5cb68a98c3fd5b4c

Let's break down this link:
- 'https://dexscreener.com' == a charting website that shows trading activity for tokens
- '/ethereum' == the crypto network we are trading on
- '/0xba687c0617898c86d8923bad5cb68a98c3fd5b4c' == the pair address whose trades we want to view and analyze
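
The breakdown above can be reproduced in code. This is a minimal sketch using the standard library's `urllib.parse`; the helper name `split_dexscreener_link` is hypothetical, and it assumes the link has exactly the `/<network>/<pair address>` shape shown in the example.

```python
from urllib.parse import urlparse

def split_dexscreener_link(link):
    """Split a Dexscreener link into (site, network, pair_address).

    Assumes the path has exactly two segments: /<network>/<pair address>.
    """
    parsed = urlparse(link)
    # The path looks like '/ethereum/0xba68...'; strip the slashes, then split
    network, pair_address = parsed.path.strip('/').split('/')
    return parsed.netloc, network, pair_address

site, network, pair_address = split_dexscreener_link(
    'https://dexscreener.com/ethereum/0xba687c0617898c86d8923bad5cb68a98c3fd5b4c'
)
print(site, network, pair_address)
```

A link with extra path segments would raise a `ValueError` at the unpacking step, which is probably the behavior we want for unexpected input.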

Let's get started.

In [1]:
import pandas as pd
import json

Let's test with just the first line of the JSON file:

In [2]:
with open('../data/filtered_scrape_data.json', 'r') as file:
    first_line = file.readline()

json_data = json.loads(first_line)

print(json_data['dict24']['value'].split('·')[1].split('/')[-1][:-2])

0x989736bce931f4ecd9c39b0ba9aeaf058c3fe1f8
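
The chained `split` calls depend on the exact layout of the scraped field: a `·` separator before the link and two trailing characters after the address. As a hedged alternative, the sketch below pulls the pair address out with a regular expression instead. The helper `extract_pair_address` and the sample string are hypothetical (the real `dict24` value may look different), and the pattern assumes EVM-style addresses (`0x` followed by 40 hex characters).

```python
import re

# Matches a Dexscreener link and captures the network and the 0x-prefixed,
# 40-hex-character address at the end of it.
DEXSCREENER_RE = re.compile(r'dexscreener\.com/([a-z0-9]+)/(0x[0-9a-fA-F]{40})')

def extract_pair_address(text):
    """Return the pair address found in text, or None if no link is present."""
    match = DEXSCREENER_RE.search(text)
    return match.group(2) if match else None

# Hypothetical field value, made up for illustration only
sample = 'Chart · https://dexscreener.com/ethereum/0x989736bce931f4ecd9c39b0ba9aeaf058c3fe1f8)\n'
print(extract_pair_address(sample))
```

Returning `None` for a missing link makes it easy to count or drop bad rows later instead of failing with an `IndexError` partway through the file.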


Now extract the pair address from the end of the Dexscreener link on every line, along with the token address, so we can later merge the result with our main token data df:

In [3]:
values_list = []

with open('../data/filtered_scrape_data.json', 'r') as file:
    for line in file:
        # each line of the file is a separate JSON object
        json_data = json.loads(line.strip())

        value_item_list = [
            json_data['dict0'],  # contract address
            json_data['dict24']['value'].split('·')[1].split('/')[-1][:-2]  # pair address
        ]

        # add to values_list
        values_list.append(value_item_list)


# create dataframe with json data
columns = ['address', 'pair_address']
df = pd.DataFrame(values_list, columns=columns)

Let's take a look at our data:

In [4]:
df

| | address | pair_address |
|---|---|---|
| 0 | 0xa190700f5ae95de4eabf29fa9469bd85ff5a7919 | 0x989736bce931f4ecd9c39b0ba9aeaf058c3fe1f8 |
| 1 | 0x9de736b02f3d09738ac42cdea046b014b0d54d60 | 0x6343111c06d4bb6dde9c411de6f15c8ae8d0a41a |
| 2 | 0xaaf8a1aad53c9384be3aecb5a16af6121a5ad935 | 0x0f7e412fe32fbc1d8d77c143e6c309c978a4592b |
| 3 | 0xa17ae9a7174cdbc5294e3fad8afbafc1be1764a3 | 0x2f8d07b46aab40e8f7cfd7e4dececb3eeca3978e |
| 4 | 0x3b2d93677c433c191aa379c78b97e0685c3f4798 | 0x5232a7ef61fef9594b40ffb07bddd6df00aea621 |
| ... | ... | ... |
| 60232 | 0x7832eAFa8A3c90459F1574ccEd381a1F5C2C9435 | 0xeb9b1a55a9d55919f09651319d93dc12b352d880 |
| 60233 | 0xc7B40cdB7c8Acb28C2E2d63159a2D4133397f3D0 | 0x713e8b867fe1e44e1dc0c57c2c74386d6b5bfb34 |
| 60234 | 0x1750B71F31990F95A5d52534F50945289d888cb9 | 0x37580dfde64d6a0af46d425296536cc8238eb640 |
| 60235 | 0x281B7d6B3e98daf156A6Ed110Ff7b9E05D413677 | 0xac994cf17f1d2306dc78617d30eab9cef490af03 |
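
Note that the last rows of the output contain mixed-case token addresses, while the first rows are all lowercase. If the main token data stores addresses in a single case, a plain string merge could silently miss rows. Below is a minimal sketch of normalizing and sanity-checking the columns. It uses a small stand-in frame (`sample_df`, two rows copied from the output above) rather than the full `df`, and the lowercase convention is an assumption about what the main token data uses.

```python
import pandas as pd

# Two rows copied from the output above, standing in for the full df
sample_df = pd.DataFrame({
    'address': ['0xa190700f5ae95de4eabf29fa9469bd85ff5a7919',
                '0x7832eAFa8A3c90459F1574ccEd381a1F5C2C9435'],
    'pair_address': ['0x989736bce931f4ecd9c39b0ba9aeaf058c3fe1f8',
                     '0xeb9b1a55a9d55919f09651319d93dc12b352d880'],
})

# Lowercase both columns so string comparisons ignore checksum casing
for col in ['address', 'pair_address']:
    sample_df[col] = sample_df[col].str.lower()

# Flag rows where either value is not '0x' followed by 40 hex characters
pattern = r'0x[0-9a-f]{40}'
is_valid = (sample_df['address'].str.fullmatch(pattern)
            & sample_df['pair_address'].str.fullmatch(pattern))

# Show any rows that failed the check (none are expected for this sample)
print(sample_df[~is_valid])
```

The same two steps could be applied to `df` before saving, so that a badly sliced pair address is caught here rather than during the volume scrape.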


In [28]:
df.to_csv('../data/pair_addresses.csv', index=False)
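
For the later merge with the main token data, something like the following could work. This is only a sketch: `tokens_df`, its `name` column, and its second row are made up for illustration, and it assumes the main table keys tokens by an `address` column in the same case as ours.

```python
import pandas as pd

# One row from our pair address data
pairs_df = pd.DataFrame({
    'address': ['0xa190700f5ae95de4eabf29fa9469bd85ff5a7919'],
    'pair_address': ['0x989736bce931f4ecd9c39b0ba9aeaf058c3fe1f8'],
})

# Hypothetical main token table; the 'name' column and second row are made up
tokens_df = pd.DataFrame({
    'address': ['0xa190700f5ae95de4eabf29fa9469bd85ff5a7919',
                '0x0000000000000000000000000000000000000001'],
    'name': ['token_a', 'token_b'],
})

# Left merge keeps every token; tokens without a pair get NaN in pair_address
merged = tokens_df.merge(pairs_df, on='address', how='left')
print(merged)
```

A left merge is used here so that tokens with no matching pair address remain visible as `NaN` instead of disappearing from the result.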

Now that we have the necessary data to find our volumes, let's head back to: **['defined_volume_scrape'](./defined_volume_scrape.ipynb)**