# Chapter 31: Indexing and Querying Blockchain Data

---

Smart contracts emit events and store data, but retrieving that data efficiently is a challenge. Querying the blockchain directly—scanning all blocks, filtering logs, and aggregating results—is slow, expensive, and impractical for real-time applications. Indexing solutions solve this by extracting, processing, and serving blockchain data in a structured, queryable format. The Graph has emerged as the leading decentralized indexing protocol, enabling developers to build APIs (subgraphs) that are fast, reliable, and easy to query. In this chapter, we'll explore the need for indexing, dive deep into The Graph, and examine alternative solutions like Alchemy, Moralis, and QuickNode. You'll learn how to build and query subgraphs to power your DApps with efficient data access.

---

## 31.1 Challenges of Blockchain Data

### 31.1.1 Data Retrieval Limitations

Blockchains are designed for append-only, verifiable storage, not for efficient querying. Retrieving specific information often requires:

- **Scanning entire blocks**: To find all transfers of a token, you'd need to iterate through every block from genesis to present—millions of blocks.
- **Parsing transaction receipts**: Extracting event logs from each block and filtering by contract address and event signature.
- **Aggregating on-chain**: Computing things like "total value locked" or "user balances over time" would be prohibitively expensive if done in smart contracts.

**Example: Naive approach to get all Transfer events for an ERC-20**
```javascript
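// Naive scan: one RPC round trip per block, far too slow for mainnet history.
// Assumes ethers v5 and an RPC_URL environment variable.
const { ethers } = require("ethers");

async function scanForTokenTxs(tokenAddress) {
  const provider = new ethers.providers.JsonRpcProvider(process.env.RPC_URL);
  const latestBlock = await provider.getBlockNumber();
  const token = tokenAddress.toLowerCase();
  for (let i = 0; i <= latestBlock; i++) {
    const block = await provider.getBlockWithTransactions(i);
    for (const tx of block.transactions) {
      // tx.to is null for contract creations and is checksummed, so normalize case
      if (tx.to && tx.to.toLowerCase() === token) {
        // parse tx.data here; transfers triggered via other contracts are still missed
      }
    }
  }
}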
```

### 31.1.2 Event Logs and Filtering

Ethereum's event logs provide a more efficient way to access historical data. You can filter by address and topics (indexed parameters). However, even with logs, you face limitations:

- **RPC node limitations**: Many RPC providers cap the block range of a single `eth_getLogs` call (e.g., 10,000 blocks). To cover a larger range, you must split it into many requests.
- **Centralization**: Relying on a single RPC provider introduces a single point of failure.
- **No advanced queries**: You cannot ask "get me all transfers where amount > 100" directly; you must fetch all and filter client-side.
- **Real-time updates**: Polling for new logs is inefficient and slow.
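
The block-range limit is usually worked around by splitting a large range into provider-sized windows. The following is a minimal sketch under stated assumptions: the 10,000-block default is only the example figure from the list above (providers differ), and `getLogs` is a hypothetical callback that wraps whatever client you use (for instance ethers' `provider.getLogs`). `chunkBlockRanges` and `fetchLogsInChunks` are illustrative names.

```typescript
// keccak256("Transfer(address,address,uint256)"): topic0 of every ERC-20 Transfer log.
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

interface LogFilter {
  address: string;
  topics: string[];
  fromBlock: number;
  toBlock: number;
}

// Hypothetical callback type: wrap your RPC client's log query in this shape.
type GetLogs<T> = (filter: LogFilter) => Promise<T[]>;

// Split [fromBlock, toBlock] (inclusive) into windows of at most maxSpan blocks.
function chunkBlockRanges(
  fromBlock: number,
  toBlock: number,
  maxSpan: number
): Array<[number, number]> {
  if (maxSpan < 1) throw new Error("maxSpan must be at least 1");
  const ranges: Array<[number, number]> = [];
  for (let start = fromBlock; start <= toBlock; start += maxSpan) {
    ranges.push([start, Math.min(start + maxSpan - 1, toBlock)]);
  }
  return ranges;
}

// Fetch Transfer logs window by window; sequential to stay under rate limits.
async function fetchLogsInChunks<T>(
  getLogs: GetLogs<T>,
  tokenAddress: string,
  fromBlock: number,
  toBlock: number,
  maxSpan = 10000 // assumed limit; check your provider
): Promise<T[]> {
  const logs: T[] = [];
  for (const [start, end] of chunkBlockRanges(fromBlock, toBlock, maxSpan)) {
    const batch = await getLogs({
      address: tokenAddress,
      topics: [TRANSFER_TOPIC],
      fromBlock: start,
      toBlock: end,
    });
    logs.push(...batch);
  }
  return logs;
}
```

Even with chunking, a full-history scan costs one request per window: at this window size, a chain with 20 million blocks needs 2,000 sequential requests before any filtering or aggregation happens.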

Thus, specialized indexing solutions are needed.

---

## 31.2 The Graph Protocol

The Graph is a decentralized protocol for indexing and querying blockchain data. It allows developers to define **subgraphs** that specify which events and data to index, and then query them using a standard GraphQL API.

### 31.2.1 What is The Graph?

The Graph network consists of:

- **Indexers**: Node operators that process subgraphs and serve queries, earning GRT tokens.
- **Curators**: Signal GRT on the subgraphs they judge to be high-quality, which helps Indexers decide what to index.
- **Delegators**: Delegate GRT to Indexers, adding to their stake without running a node themselves.
- **Subgraphs**: Open APIs that define how to index and store data.

Developers create subgraphs and deploy them to the network. Indexers then index the data, and applications query the subgraph endpoints.

```
The Graph Architecture:
┌────────────┐      ┌────────────┐      ┌────────────┐
│  Ethereum  │─────▶│  Indexer   │─────▶│   GraphQL  │
│  (events)  │      │ (processes │      │    API     │
└────────────┘      │ subgraph)  │      └────────────┘
                    └────────────┘            │
                                              ▼
                                         ┌────────────┐
                                         │  DApp      │
                                         │  queries   │
                                         └────────────┘
```

### 31.2.2 Subgraphs

A subgraph is a project that defines:

- **Data sources**: Which contracts and events to index.
- **Entities**: The data models to store (like tables in a database).
- **Mappings**: Handlers that transform event data into entities.

A subgraph project combines three kinds of files: a manifest (`subgraph.yaml`), a GraphQL schema (`schema.graphql`), and AssemblyScript mapping files (for example `src/mapping.ts`).

**Example `schema.graphql` for a token:**
```graphql
type Token @entity {
  id: ID!
  name: String!
  symbol: String!
  totalSupply: BigInt!
  holders: [Holder!]! @derivedFrom(field: "token")
}

type Holder @entity {
  id: ID! # address
  balance: BigInt!
  token: Token!
}
```

**Example `subgraph.yaml` (manifest):**
```yaml
specVersion: 0.0.5
description: My Token Subgraph
repository: https://github.com/...
schema:
  file: ./schema.graphql
dataSources:
  - kind: ethereum
    name: Token
    network: mainnet
    source:
      address: "0x..."
      abi: Token
      startBlock: 1000000
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.6
      language: wasm/assemblyscript
      entities:
        - Token
        - Holder
      abis:
        - name: Token
          file: ./abis/Token.json
      eventHandlers:
        - event: Transfer(indexed address,indexed address,uint256)
          handler: handleTransfer
      file: ./src/mapping.ts
```

**Example mapping (`mapping.ts`):**
```typescript
import { BigInt } from "@graphprotocol/graph-ts"
import { Transfer as TransferEvent } from "../generated/Token/Token"
import { Token, Holder } from "../generated/schema"

export function handleTransfer(event: TransferEvent): void {
  let token = Token.load("1")
  if (!token) {
    token = new Token("1")
    token.name = "My Token"
    token.symbol = "MTK"
    token.totalSupply = BigInt.fromI32(0)
  }

  let fromHolder = Holder.load(event.params.from.toHex())
  if (!fromHolder) {
    fromHolder = new Holder(event.params.from.toHex())
    fromHolder.token = token.id
    fromHolder.balance = BigInt.fromI32(0)
  }

  let toHolder = Holder.load(event.params.to.toHex())
  if (!toHolder) {
    toHolder = new Holder(event.params.to.toHex())
    toHolder.token = token.id
    toHolder.balance = BigInt.fromI32(0)
  }

  // update balances (simplified)
  fromHolder.balance = fromHolder.balance.minus(event.params.value)
  toHolder.balance = toHolder.balance.plus(event.params.value)

  fromHolder.save()
  toHolder.save()
  token.save()
}
```
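
The handler above is marked as simplified: it treats the zero address as an ordinary holder and never changes `totalSupply`. The bookkeeping a fuller handler would need can be sketched in plain TypeScript, outside graph-node, assuming the common ERC-20 convention that mints are transfers from the zero address and burns are transfers to it. `applyTransfer`, `TokenState`, and `TransferLike` are illustrative names, not part of `graph-ts`.

```typescript
const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";

interface TransferLike {
  from: string;
  to: string;
  value: bigint;
}

interface TokenState {
  totalSupply: bigint;
  balances: Map<string, bigint>;
}

function emptyState(): TokenState {
  return { totalSupply: BigInt(0), balances: new Map<string, bigint>() };
}

// Fold one Transfer event into the state, mirroring what a mapping does to entities.
function applyTransfer(state: TokenState, t: TransferLike): TokenState {
  const from = t.from.toLowerCase();
  const to = t.to.toLowerCase();

  if (from === ZERO_ADDRESS) {
    // Mint: supply grows, no sender balance to debit
    state.totalSupply += t.value;
  } else {
    state.balances.set(from, (state.balances.get(from) ?? BigInt(0)) - t.value);
  }

  if (to === ZERO_ADDRESS) {
    // Burn: supply shrinks, no receiver balance to credit
    state.totalSupply -= t.value;
  } else {
    state.balances.set(to, (state.balances.get(to) ?? BigInt(0)) + t.value);
  }
  return state;
}
```

In a real mapping the same two branches would update `token.totalSupply` and skip creating a `Holder` entity for the zero address.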

### 31.2.3 GraphQL Queries

Once the subgraph is indexed, you can query it via GraphQL. GraphQL allows you to request exactly the data you need, in a single request.

**Example query: Get the 10 largest holders with balances > 1000**
```graphql
{
  holders(
    first: 10,
    where: { balance_gt: "1000" },
    orderBy: balance,
    orderDirection: desc
  ) {
    id
    balance
  }
}
```

**Example query: Get token info and its holders (first 100 by default)**
```graphql
{
  token(id: "1") {
    name
    symbol
    totalSupply
    holders {
      id
      balance
    }
  }
}
```

**Response:**
```json
{
  "data": {
    "token": {
      "name": "My Token",
      "symbol": "MTK",
      "totalSupply": "1000000",
      "holders": [
        { "id": "0x...", "balance": "500000" },
        { "id": "0x...", "balance": "300000" }
      ]
    }
  }
}
```
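
Collection fields return a limited page of results (100 entities by default, up to 1,000 with `first`), so reading every holder means paginating. Below is a sketch of cursor pagination on `id`, assuming a hypothetical `runQuery` function that sends a query plus variables to your subgraph endpoint and returns the `data` object. The `Holder` shape follows the schema above; `fetchAllHolders` is an illustrative name.

```typescript
interface Holder {
  id: string;
  balance: string; // BigInt fields arrive as decimal strings
}

// Hypothetical transport: send { query, variables } to the subgraph, return `data`.
type RunQuery = (
  query: string,
  variables: Record<string, unknown>
) => Promise<{ holders: Holder[] }>;

const HOLDERS_PAGE = `
  query HoldersPage($pageSize: Int!, $lastId: ID!) {
    holders(first: $pageSize, orderBy: id, where: { id_gt: $lastId }) {
      id
      balance
    }
  }
`;

// Walk the collection in id order; each page starts after the last id seen.
async function fetchAllHolders(
  runQuery: RunQuery,
  pageSize = 1000
): Promise<Holder[]> {
  const all: Holder[] = [];
  let lastId = ""; // the empty string sorts before every id
  while (true) {
    const { holders } = await runQuery(HOLDERS_PAGE, { pageSize, lastId });
    all.push(...holders);
    if (holders.length < pageSize) return all; // short page means we reached the end
    lastId = holders[holders.length - 1].id;
  }
}
```

Filtering on `id_gt` rather than using a growing `skip` keeps each page query cheap for the indexer, at the cost of forcing the result order to be by `id`.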

### 31.2.4 Building a Subgraph

**Step-by-step using The Graph CLI:**

1. **Install the CLI**:
```bash
npm install -g @graphprotocol/graph-cli
```

2. **Initialize a subgraph**:
```bash
graph init --from-contract 0x... --network mainnet --abi ./Token.json my-token-subgraph
```

3. **Define schema** in `schema.graphql`.
4. **Write mappings** in `src/mapping.ts`.
5. **Build**:
```bash
graph build
```

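6. **Deploy** (requires a Subgraph Studio deploy key or a local Graph node; the hosted service targeted by older tutorials was shut down in 2024):
```bash
# Recent graph-cli versions; older releases take a --studio flag on both commands
graph auth <DEPLOY_KEY>
graph deploy my-token-subgraph
```

**Testing locally:** Use `graph test`, which runs Matchstick unit tests against your mappings, or run a local Graph node with Docker.

**Querying:** After deployment, Subgraph Studio displays a query URL of the form `https://api.studio.thegraph.com/query/<ID>/my-token-subgraph/<VERSION>`. Use any GraphQL client to query it.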
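
Any HTTP client works here because a GraphQL query is an ordinary JSON POST. A minimal sketch using the standard `fetch` API (available in browsers and in Node 18+); `buildGraphQLRequest` and `querySubgraph` are illustrative helpers, and the endpoint is whatever URL your deployment gives you.

```typescript
interface GraphQLResponse<T> {
  data?: T;
  errors?: Array<{ message: string }>;
}

// Build the POST request for a GraphQL query; split out so it can be inspected or tested.
function buildGraphQLRequest(
  query: string,
  variables: Record<string, unknown> = {}
) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  };
}

// Send the query and return `data`, surfacing GraphQL-level errors as exceptions.
async function querySubgraph<T>(
  endpoint: string,
  query: string,
  variables: Record<string, unknown> = {}
): Promise<T> {
  const response = await fetch(endpoint, buildGraphQLRequest(query, variables));
  if (!response.ok) throw new Error(`HTTP ${response.status}`);

  const payload = (await response.json()) as GraphQLResponse<T>;
  if (payload.errors && payload.errors.length > 0) {
    throw new Error(payload.errors.map((e) => e.message).join("; "));
  }
  if (payload.data === undefined) throw new Error("response contained no data");
  return payload.data;
}
```

Checking `errors` separately matters because GraphQL servers commonly answer a malformed query with HTTP 200 and an `errors` array instead of a failing status code.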

---

## 31.3 Alternative Indexing Solutions

While The Graph is the most popular decentralized option, several centralized (but convenient) alternatives exist.

### 31.3.1 Alchemy

Alchemy provides a suite of blockchain developer tools, including **Alchemy SDK** and **Alchemy APIs** that offer enhanced query capabilities beyond standard RPC.

**Alchemy NFT API** allows querying NFTs by owner, contract, etc., without building your own indexer.

**Example: Get NFTs owned by an address**
```javascript
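const { Alchemy, Network } = require("alchemy-sdk");

const alchemy = new Alchemy({
  apiKey: "YOUR_API_KEY",
  network: Network.ETH_MAINNET
});

// CommonJS modules have no top-level await, so wrap the call in an async function
async function main() {
  const nfts = await alchemy.nft.getNftsForOwner("0x...");
  console.log(nfts);
}

main().catch(console.error);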
```

Alchemy also provides **Alchemy Transfers API**, **Alchemy Token API**, and **Alchemy Webhooks** for real-time notifications.

**Pros:**
- Easy to use, powerful APIs.
- Fast and reliable.
- Free tier available.

**Cons:**
- Centralized (dependency on Alchemy).
- Limited to supported chains and data types.

### 31.3.2 Moralis

Moralis is a Web3 development platform that offers a suite of APIs for blockchain data, user authentication, and more. It aggregates data from multiple chains and provides a unified interface.

**Moralis API example: Get token balances**
```javascript
import Moralis from 'moralis'

await Moralis.start({ apiKey: 'YOUR_API_KEY' })

const response = await Moralis.EvmApi.token.getWalletTokenBalances({
  address: '0x...',
  chain: '0x1'
})
console.log(response.raw)
```

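Moralis also offers real-time webhooks through its Streams API. The database sync and cloud functions bundled with earlier versions of the platform have since been deprecated.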

**Pros:**
- Multi-chain support.
- Real-time updates.
- Easy to integrate with frontend.

**Cons:**
- Centralized (dependency on Moralis).
- Rate and query limits, especially on the free tier.

### 31.3.3 QuickNode

QuickNode provides RPC endpoints and additional tools like **QuickNode GraphQL APIs** for certain chains (e.g., Solana). They offer enhanced query capabilities for some data.

**QuickNode's NFT API**, similar in scope to Alchemy's, is available as an add-on through their marketplace.

**Pros:**
- Reliable RPC infrastructure.
- Some indexing features.

**Cons:**
- Less comprehensive than dedicated indexing platforms.

---

## 31.4 Code Implementation

Let's build a practical subgraph for our marketplace contract from Chapter 14.

### 31.4.1 Creating a Subgraph

**1. Initialize subgraph**
```bash
graph init --from-contract <MARKETPLACE_ADDRESS> --network sepolia --abi ./Marketplace.json marketplace-subgraph
```

**2. Define schema (`schema.graphql`)**
```graphql
type Listing @entity {
  id: ID!
  seller: Bytes! # raw seller address
  sellerUser: User! # relation backing User.listingsCreated
  name: String!
  description: String!
  price: BigInt!
  imageHash: String!
  active: Boolean!
  buyer: Bytes # raw buyer address, null if not sold
  buyerUser: User # relation backing User.listingsBought, null if not sold
  createdAt: BigInt!
  soldAt: BigInt
}

type User @entity {
  id: ID! # lowercase hex address
  # @derivedFrom must point at a field whose type is this entity, not Bytes
  listingsCreated: [Listing!]! @derivedFrom(field: "sellerUser")
  listingsBought: [Listing!]! @derivedFrom(field: "buyerUser")
}
```

**3. Write mappings (`src/marketplace.ts`)**
```typescript
import {
  ListingCreated as ListingCreatedEvent,
  ListingPurchased as ListingPurchasedEvent,
  ListingCancelled as ListingCancelledEvent
} from "../generated/Marketplace/Marketplace"
import { Listing, User } from "../generated/schema"

export function handleListingCreated(event: ListingCreatedEvent): void {
  let listing = new Listing(event.params.id.toString())
  listing.seller = event.params.seller
  // Relations are stored as the referenced entity's ID
  listing.sellerUser = event.params.seller.toHex()
  listing.name = event.params.name
  listing.description = event.params.description
  listing.price = event.params.price
  listing.imageHash = event.params.imageHash
  listing.active = true
  listing.createdAt = event.block.timestamp
  listing.save()

  // Update or create user
  let user = User.load(event.params.seller.toHex())
  if (!user) {
    user = new User(event.params.seller.toHex())
    user.save()
  }
}

export function handleListingPurchased(event: ListingPurchasedEvent): void {
  let listing = Listing.load(event.params.id.toString())
  if (listing) {
    listing.active = false
    listing.buyer = event.params.buyer
    listing.buyerUser = event.params.buyer.toHex()
    listing.soldAt = event.block.timestamp
    listing.save()

    let user = User.load(event.params.buyer.toHex())
    if (!user) {
      user = new User(event.params.buyer.toHex())
      user.save()
    }
  }
}

export function handleListingCancelled(event: ListingCancelledEvent): void {
  let listing = Listing.load(event.params.id.toString())
  if (listing) {
    listing.active = false
    listing.save()
  }
}
```

**4. Update manifest (`subgraph.yaml`) to include events and entities.**

**5. Build and deploy**
```bash
graph build
# The hosted service was shut down in 2024; deploy through Subgraph Studio instead
# (run `graph auth <DEPLOY_KEY>` once beforehand)
graph deploy marketplace-subgraph
```

### 31.4.2 Querying Subgraph Data

In your DApp frontend, you can query the subgraph using a GraphQL client like Apollo or URQL.

**Example: Fetch all active listings**
```javascript
import { ApolloClient, InMemoryCache, gql } from '@apollo/client'

const client = new ApolloClient({
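  // Query URL shown in Subgraph Studio; hosted-service URLs under api.thegraph.com no longer resolve
  uri: 'https://api.studio.thegraph.com/query/<ID>/marketplace-subgraph/<VERSION>',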
  cache: new InMemoryCache()
})

const ACTIVE_LISTINGS = gql`
  query GetActiveListings {
    listings(where: { active: true }) {
      id
      name
      description
      price
      imageHash
      seller
      createdAt
    }
  }
`

const { data } = await client.query({ query: ACTIVE_LISTINGS })
console.log(data.listings)
```
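
`BigInt` fields such as `price` come back as decimal strings in the smallest unit, so the frontend has to convert them for display. A small sketch that does the conversion with string operations only, avoiding the precision loss of `Number`. `formatUnits` here is a hand-rolled illustrative helper (libraries such as ethers ship their own), and the default of 18 decimals assumes prices are denominated in wei.

```typescript
// Convert an integer string in base units to a decimal string, without floating point.
function formatUnits(value: string, decimals = 18): string {
  const negative = value.startsWith("-");
  const digits = negative ? value.slice(1) : value;
  if (!/^\d+$/.test(digits)) {
    throw new Error(`not an integer string: ${value}`);
  }

  // Left-pad so there is always at least one digit before the decimal point
  const padded = digits.padStart(decimals + 1, "0");
  const whole = padded.slice(0, padded.length - decimals);
  const fraction = padded.slice(padded.length - decimals).replace(/0+$/, "");

  // Drop leading zeros but keep a single "0" for values below one unit
  const wholeTrimmed = whole.replace(/^0+(?=\d)/, "");
  const result = fraction.length > 0 ? `${wholeTrimmed}.${fraction}` : wholeTrimmed;
  return negative ? `-${result}` : result;
}
```

For example, `formatUnits(listing.price)` turns `"1500000000000000000"` into `"1.5"`.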

**Example: Get user's created listings**
```graphql
query GetUserListings($userId: ID!) {
  user(id: $userId) {
    id
    listingsCreated {
      id
      name
      price
      active
    }
  }
}
```
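
Entity IDs in this subgraph are created with `toHex()`, which produces lowercase hex, while wallets usually hand the frontend a mixed-case checksummed address. ID lookups are exact string matches, so it helps to normalize the address before querying. A minimal sketch; `toUserId` and `userListingsVariables` are illustrative helpers.

```typescript
// Validate a 20-byte hex address and convert it to the lowercase form used as entity ID.
function toUserId(address: string): string {
  if (!/^0x[0-9a-fA-F]{40}$/.test(address)) {
    throw new Error(`invalid address: ${address}`);
  }
  return address.toLowerCase();
}

// Variables object for the GetUserListings query above.
function userListingsVariables(address: string): { userId: string } {
  return { userId: toUserId(address) };
}
```

Passing the result as the query's `variables` keeps a checksummed input from silently returning `user: null`.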

### 31.4.3 Real-Time Updates

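Graph Node has offered GraphQL subscriptions over WebSocket, which push updated results as new blocks are indexed. Support depends on the endpoint: the feature has been deprecated in Graph Node and is not served by The Graph Network's gateways, so confirm that your deployment supports it before relying on it.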

**Example subscription: Listen for new listings**
```graphql
subscription {
  listings(where: { active: true }) {
    id
    name
    price
  }
}
```

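In a frontend, you can consume a subscription with Apollo's `useSubscription` hook. Against endpoints without subscription support, `useQuery` with its `pollInterval` option gives a polling fallback.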
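
Where subscriptions are not available, a common fallback is polling with a cursor so that each round reports only listings that have not been seen before. The sketch below assumes the `Listing` schema from this section and a hypothetical `runQuery` transport; `createListingPoller` is an illustrative name. Because `createdAt` is a block timestamp, several listings can share one value, so the query uses `createdAt_gte` and the poller deduplicates by `id`.

```typescript
interface ListingSummary {
  id: string;
  name: string;
  price: string;
  createdAt: string;
}

// Hypothetical transport: send { query, variables } to the subgraph, return `data`.
type RunQuery = (
  query: string,
  variables: Record<string, unknown>
) => Promise<{ listings: ListingSummary[] }>;

const NEW_LISTINGS = `
  query NewListings($since: BigInt!) {
    listings(
      first: 1000
      where: { active: true, createdAt_gte: $since }
      orderBy: createdAt
      orderDirection: asc
    ) {
      id
      name
      price
      createdAt
    }
  }
`;

function createListingPoller(runQuery: RunQuery) {
  let since = "0"; // timestamp cursor, kept as a string like other BigInt values
  const seen = new Set<string>();

  // One polling round: returns only listings not returned by earlier rounds.
  async function pollOnce(): Promise<ListingSummary[]> {
    const { listings } = await runQuery(NEW_LISTINGS, { since });
    const fresh = listings.filter((l) => !seen.has(l.id));
    for (const l of fresh) seen.add(l.id);
    if (listings.length > 0) {
      since = listings[listings.length - 1].createdAt;
    }
    return fresh;
  }

  return { pollOnce };
}
```

In a browser this could run on a timer, for example `setInterval(() => poller.pollOnce().then(render), 15000)`, where `render` is your own function. The `seen` set grows for the lifetime of the poller, which is acceptable for a page session but not for a long-running service.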

---

## Chapter Summary

```
┌─────────────────────────────────────────────────────────────────┐
│                    CHAPTER 31 SUMMARY                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Indexing solves the challenge of efficiently querying          │
│  blockchain data.                                               │
│                                                                 │
│  The Graph Protocol:                                            │
│    • Decentralized indexing network                            │
│    • Subgraphs define data sources, entities, and mappings     │
│    • GraphQL API for flexible queries                          │
│    • Build with graph-cli, deploy to Studio or self-host       │
│                                                                 │
│  Alternatives:                                                  │
│    • Alchemy: Enhanced APIs (NFT, Transfers, etc.)             │
│    • Moralis: Multi-chain APIs, real-time webhooks             │
│    • QuickNode: RPC + some indexing features                   │
│                                                                 │
│  Code examples:                                                 │
│    • Creating a subgraph for a marketplace contract            │
│    • Querying with GraphQL in a frontend                       │
│    • Real-time subscriptions                                   │
│                                                                 │
│  Indexing is essential for building responsive, data-rich DApps.│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Next Chapter Preview:** Chapter 32 – Deploying to Production. We'll cover deployment strategies, infrastructure setup (RPC nodes, relayers), monitoring, and contract verification.

