[BUG] `offersByScrolling()` and `offersByScrollingByUrl()` not properly working #36
If anyone experiences this too and relies on this function, please comment below so I know it's urgent 📝 |
This is exactly the problem I'm running into right now. Should I close the other issue? |
Oh yeah you're right, somehow I didn't realize this is the same bug that you reported; I randomly noticed it during testing. Closing the other issue #34 as it's the same. |
@SKreutz do you need to scrape multiple pages or are the first 100 sufficient? Because there is a way of getting the top 100 elements without scrolling, just run this script:

```js
const nextDataStr = document.getElementById("__NEXT_DATA__").innerText;
const nextData = JSON.parse(nextDataStr);
const top100 = nextData.props.relayCache[0][1].json.data.rankings.edges.map(obj => obj.node);
```

This is way faster and more efficient than scrolling and scraping the data from the DOM. I will integrate this in the repository soon and add the following functions:

```js
OpenseaScraper.rankings("24h");   // https://opensea.io/rankings?sortBy=one_day_volume
OpenseaScraper.rankings("7d");    // https://opensea.io/rankings?sortBy=seven_day_volume
OpenseaScraper.rankings("30d");   // https://opensea.io/rankings?sortBy=thirty_day_volume
OpenseaScraper.rankings("total"); // https://opensea.io/rankings?sortBy=total_volume
// ❌ currently not working: scrape more than 100 items from rankings page
OpenseaScraper.rankingsByScrolling();
```
|
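The three-line extraction above can also be factored into a plain function that takes the parsed `__NEXT_DATA__` object, which makes it easy to reuse from Node via puppeteer's `page.evaluate`. A minimal sketch — the `relayCache` path is copied from the snippet above and may break whenever opensea changes their frontend:

```js
// Extract the top-100 ranking nodes from an already-parsed __NEXT_DATA__ object.
// The relayCache path is taken from the snippet above; it is tied to opensea's
// current page structure and may break on frontend updates.
function extractRankings(nextData) {
  const edges = nextData.props.relayCache[0][1].json.data.rankings.edges;
  return edges.map((edge) => edge.node);
}
```

In the browser (or inside `page.evaluate`) this could then be fed with `JSON.parse(document.getElementById("__NEXT_DATA__").innerText)`.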
@dcts I only want to scrape the first 100 slugs, yes. Where do I put the 3 lines of code you provided? Thank you for your help, I really appreciate it! |
@SKreutz I added this new method and updated the repository, just update to the latest version:

```js
// scrape all slugs, names and ranks from the top collections from the rankings page
// "type" is one of the following:
//   "24h":   ranking of last 24 hours: https://opensea.io/rankings?sortBy=one_day_volume
//   "7d":    ranking of last 7 days:   https://opensea.io/rankings?sortBy=seven_day_volume
//   "30d":   ranking of last 30 days:  https://opensea.io/rankings?sortBy=thirty_day_volume
//   "total": all-time ranking:         https://opensea.io/rankings?sortBy=total_volume
const type = "24h"; // possible values: "24h", "7d", "30d", "total"
const ranking = await OpenseaScraper.rankings(type, options);
```
|
@dcts your fix seems to work fine! Really appreciate your help. It's even a lot faster than before. This bug can be closed. |
How come the issue has been closed? It seems to me that the issue first expressed in this ticket is still happening, but you found a workaround for the other use case. |
Not sure it is the same issue, but when running our script we get incomplete results. Something is definitely wrong with this method... What can we do to help investigate the issue? |
@mlarcher I just checked and yes, you are absolutely right, the issue was never resolved. Thanks for reporting! I need to take a closer look at the code; something happened that broke it. |
I just tried to reproduce the issue. Scraping offers by scrolling (`OpenseaScraper.offersByScrolling(slug, 40)`) also works fine for me. ✅ I also tried different collections. Everything works fine for me. I am using Mac OS Monterey 12.0.1 and Node v16.13.1, and I also just downloaded the latest version of opensea scraper. Let me know if you need further information. |
Here's what I get:
I'm on MacOS Monterey 12.3 in a docker container running node:16.14.0-alpine3.14 |
@mlarcher I published a fix, can you test and let me know if it works now? Be sure to use the latest version. |
@SKreutz thanks for testing! I think it might have looked like everything works on your end, but in fact a lot of the offers were missing when using the scrolling methods. But now it should be fixed; at least the demo is working again (for me) with all relevant offers scraped. You can test it with `npm run demo`. |
@dcts it's @SKreutz who said "Scraping offers by scrolling also works fine for me" not me... |
second run got me |
Also, is there any chance it works on GCP with the current version, or is it an unrelated problem that I get empty results in production? |
@mlarcher can you post what collection you scraped that got you these results? |
here it is @dcts: |
When I run the following:

```js
const res = await OpenseaScraper.offersByScrolling("chumbivalleyofficial", 40, options);
```

I get correct results — in fact, they are identical to running the non-scrolling method. Can you try to run it locally (not on GCP)?

To answer your question: yes, it's an unrelated problem that has nothing to do with the scraper, but with the environment. Cloud setups for scraping are always difficult because you don't have full control over the environment, IPs etc. Also services like cloudflare can detect a cloud environment (through IP lists) and handle it differently (block it). See issues #40 #39. In case I find a solution for the cloud I will certainly share it, but as of now I don't plan to work on that. But I encourage everybody to share working cloud setups, because it is a common need that certainly a lot of people would benefit from. |
@dcts thanks for the information. GCP is not the issue here, as we get absolutely no results at all there (even though it used to work at some point before). I'll check if I can do anything to change the script's external IP. The results I posted are from a docker container on my machine. Your test got me thinking, and I tried directly on the host machine with no docker container involved and got the same issue. In your test you are limiting the results to 40, which is a way of avoiding the issue, but we want a much larger result set. There are about 420 items for sale, not 40... Maybe you could try on your machine with a limit set at 500? Please let me know what else we can do to help investigate the issue. |
@mlarcher I tried the same with 500 and could replicate the inconsistency. Here are my results:

```js
const res = await OpenseaScraper.offersByScrolling("chumbivalleyofficial", 500, options);
console.log(res.offers.length);     // => 420
console.log(res.stats.totalOffers); // => 428
```

So yes, there's still an issue. But can you confirm that you at least get the algorithm running and that you get most of the offers (even if it's not all of them)? You got 419 offers out of 422, is that right? 🤔 I think some offers don't get fetched because of how the scraping algorithm is designed: the DOM changes while scrolling and some items get skipped.

I am sure there is a better solution, and I agree it would be great to have one, but on the other hand I have not yet come up with an idea on how to better solve this problem. |
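One possible mitigation for the skipped items (an idea, not something the library currently does): take a snapshot of the rendered offers after every scroll step and merge the snapshots by a stable key such as `tokenId`, so an offer that gets recycled out of the virtualized list is still kept. The merge logic can be sketched as a pure function:

```js
// Merge offer snapshots taken after each scroll step.
// Offers are keyed by tokenId, so an offer that disappears from the DOM
// in a later snapshot (virtualized list) is not lost.
function mergeSnapshots(snapshots) {
  const byTokenId = new Map();
  for (const snapshot of snapshots) {
    for (const offer of snapshot) {
      if (!byTokenId.has(offer.tokenId)) {
        byTokenId.set(offer.tokenId, offer);
      }
    }
  }
  return [...byTokenId.values()];
}
```

The scroll loop would then call this once at the end over all collected snapshots instead of reading the DOM a single time.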
I also think it's not possible to fetch 100% because of the way opensea displays the items and, as you mentioned, the DOM changes. When scrolling manually and looking at the HTML, the DOM changes and adds the elements as they appear. Sometimes opensea is very slow, or the nfts are gifs instead of jpegs, which takes even longer, and I think that's why some items are skipped. The only way to "fix" this would in my opinion be to place a sleep of a few seconds after each "scroll" so the items have more time to load. |
I'll check if there is a better way to know when the DOM is "stabilized"... |
perhaps you could use something like https://developer.mozilla.org/fr/docs/Web/API/MutationObserver to monitor dom changes, scroll, and debounce an ending function until nothing moves anymore ? |
@mlarcher Yes this is a good idea, I tried this at some point but could not make it work, maybe worth a revisit. Also, what could be even more efficient is scrolling and simply monitoring puppeteer network activity, like this:

```js
// taken from => https://stackoverflow.com/a/55478226/6272061
const textRegex = /javascript|html/; // example test: check if content-type contains javascript or html
page.on('response', (response) => {
  const headers = response.headers();
  const contentType = headers['content-type'];
  if (textRegex.test(contentType)) {
    console.log(response.url());
  }
});
```

Once new data needs to be fetched the graphql API is called, and when we intercept that request we get the data in this format:

```json
{
  "node": {
    "assetCount": null,
    "imageUrl": "https://lh3.googleusercontent.com/seJEwLWJP3RAXrxboeG11qbc_MYrxwVrsxGH0s0qxvF68hefOjf5qrPSKkIknUTYzfvinOUPWbYBdM8VEtGEE980Qv2ti_GGd86OWQ=s120",
    "name": "DeadFellaz",
    "slug": "deadfellaz",
    "isVerified": true,
    "id": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "description": "10,000 undead NFTs on the Ethereum blockchain. Join the horde.\n\nAdditional official collections:\n\n[Halloween S1](https://opensea.io/collection/deadfellaz-infected-s1) | [Nifty Gateway Betty Pop Horror](https://opensea.io/collection/betty-pop-horror-by-deadfellaz) | [Deadfrenz Lab Access Pass](https://opensea.io/collection/deadfrenz-lab-access-pass) | [Deadfrenz Collection](https://opensea.io/collection/deadfrenz-collection)"
  }
}
```

I think that's a nice solution and should be fairly easy to develop 🎉 Added it to the roadmap 🚔! Side note: at that point it might be worth trying to use the opensea graphQL API directly, but I never could make it work and I heard from people that it's a pain to use. |
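To turn intercepted responses like the one above into usable items, the handler would need to pull the `node` objects out of the graphql payload. A small sketch of such a collector — the `edges`/`node` shape is assumed from the sample payload, and real responses may nest differently:

```js
// Recursively collect every `node` found under an `edges` array anywhere in
// a graphql response object. The edges/node shape is assumed from the sample
// payload above and is not guaranteed to match every opensea response.
function collectNodes(value, found = []) {
  if (Array.isArray(value)) {
    value.forEach((item) => collectNodes(item, found));
  } else if (value && typeof value === "object") {
    if (Array.isArray(value.edges)) {
      value.edges.forEach((edge) => {
        if (edge && edge.node) found.push(edge.node);
      });
    }
    Object.values(value).forEach((child) => collectNodes(child, found));
  }
  return found;
}
```

Inside the `page.on('response', ...)` handler this could be applied to `await response.json()` for the graphql URLs.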
Oops, just realized that I posted the collection information above; the information for every single item (offer) looks like this:

```json
{
  "assetContract": {
    "address": "0x2acab3dea77832c09420663b0e1cb386031ba17b",
    "chain": "ETHEREUM",
    "id": "QXNzZXRDb250cmFjdFR5cGU6MzAyOTQ1",
    "openseaVersion": null
  },
  "collection": {
    "isVerified": true,
    "relayId": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "id": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "displayData": {
      "cardDisplayStyle": "CONTAIN"
    },
    "imageUrl": "https://lh3.googleusercontent.com/seJEwLWJP3RAXrxboeG11qbc_MYrxwVrsxGH0s0qxvF68hefOjf5qrPSKkIknUTYzfvinOUPWbYBdM8VEtGEE980Qv2ti_GGd86OWQ=s120",
    "slug": "deadfellaz",
    "isAuthorizedEditor": false,
    "name": "DeadFellaz"
  },
  "relayId": "QXNzZXRUeXBlOjM2Nzg2ODY0",
  "tokenId": "3036",
  "backgroundColor": null,
  "imageUrl": "https://lh3.googleusercontent.com/RQlR9mw-oJyhrj_GtwRZfRJdqk-fjtbJK4tElqpas4R1XksLXqnklhvnbw40LHsVliYoDO3z9rWE7OczRKp_qhDqSS_ZNzyRa9kG",
  "name": "DeadFellaz #3036",
  "id": "QXNzZXRUeXBlOjM2Nzg2ODY0",
  "isDelisted": false,
  "animationUrl": null,
  "displayImageUrl": "https://lh3.googleusercontent.com/RQlR9mw-oJyhrj_GtwRZfRJdqk-fjtbJK4tElqpas4R1XksLXqnklhvnbw40LHsVliYoDO3z9rWE7OczRKp_qhDqSS_ZNzyRa9kG",
  "decimals": 0,
  "favoritesCount": 23,
  "isFavorite": false,
  "isFrozen": false,
  "hasUnlockableContent": false,
  "orderData": {
    "bestAsk": {
      "relayId": "T3JkZXJWMlR5cGU6MzUyMjU2ODkzMQ==",
      "orderType": "BASIC",
      "maker": {
        "address": "0x28705f64c07079822c7afd66e43975b7c6095ef6",
        "id": "QWNjb3VudFR5cGU6MTQ1NjA1MTQy"
      },
      "closedAt": "2022-04-05T05:44:18",
      "dutchAuctionFinalPrice": null,
      "openedAt": "2022-03-17T21:48:42",
      "priceFnEndedAt": null,
      "quantity": "1",
      "decimals": null,
      "paymentAssetQuantity": {
        "quantity": "2690000000000000000",
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/6f8e2979d428180222796ff4a33ab929.svg",
          "symbol": "ETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0x0000000000000000000000000000000000000000",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzMQ=="
          },
          "id": "QXNzZXRUeXBlOjEzNjg5MDc3"
        },
        "id": "QXNzZXRRdWFudGl0eVR5cGU6Mjg3MDE4NzA3OTcyNTgyMjM1NjM1NTg1MDc0MTcxNjgyNzE3ODc4",
        "quantityInEth": "2690000000000000000"
      }
    },
    "bestBid": {
      "orderType": "BASIC",
      "paymentAssetQuantity": {
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/accae6b6fb3888cbff27a013729c22dc.svg",
          "symbol": "WETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzOA=="
          },
          "id": "QXNzZXRUeXBlOjQ2NDU2ODE="
        },
        "quantity": "1502841336452599400",
        "id": "QXNzZXRRdWFudGl0eVR5cGU6MjEzNTc0NjA3Mzk2MzM3NzU2NjY4MTkxMzczOTUxNTUwMzAwMDE0"
      }
    }
  },
  "isEditable": {
    "value": false,
    "reason": "Unauthorized"
  },
  "isListable": true,
  "ownership": null,
  "creator": {
    "address": "0xe9d30eddd11dea8433cf6d2b2c22e9cce94113dc",
    "id": "QWNjb3VudFR5cGU6NjEyNTkxNTA="
  },
  "ownedQuantity": null,
  "assetEventData": {
    "lastSale": {
      "unitPriceQuantity": {
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/6f8e2979d428180222796ff4a33ab929.svg",
          "symbol": "ETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0x0000000000000000000000000000000000000000",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzMQ=="
          },
          "id": "QXNzZXRUeXBlOjEzNjg5MDc3"
        },
        "quantity": "1300000000000000000",
        "id": "QXNzZXRRdWFudGl0eVR5cGU6MjQxMDUyNDMxOTA1OTU2ODY0MDMxNjQ3MTYzMjQyMzYyNTQ4MTkw"
      }
    }
  }
}
```
|
@dcts hooking into the graphql API sounds like a wonderful idea. It could drastically improve the performance and avoid some DOM related pitfalls 👍 |
Using the API would be nice, but from what I heard they don't give API tokens very easily, and even if granted an API key you would be facing some limits/restrictions. Also it seems the query they use on the site is not documented (AssetSearchQuery), and it requires an API key and a CSRF token that changes on every call, so I can see why it could be a pain to use... |
@mlarcher I'm currently working on it, but I'm not sure how long it will take to implement; it could be today or maybe next weekend. But obviously no guarantees. ^^ |
great to read 👍 |
@mlarcher I just found out that opensea has a bug in their display of the number of offers. The number they display on the page does not match the actual nfts displayed. For example check this page: opensea says that there are 76 items for sale, but if you count the nfts by scrolling down the page you will find that there's only 75 (obviously this can change, but I'm pretty confident that it is a consistent bug). So I think the scraping currently is working as it should, even if the totals are slightly off. |
(side note: I'm still gonna publish a v7 very soon with more efficient scrolling, as I already built it and like the architecture way better) |
I'm looking forward to trying it out!! 🤩 About your other point, the collection currently says 78 items and effectively lists them all, but I believe there can be a bug on their side there. There was never a big offset, so I'm fine leaving it at that 👍🏻 |
Any ETA for the new version by any chance? I'm eager to try it 😊 |
I have a working implementation with the new algorithm but it's not stable, so I won't publish it. I can share my work in a separate dev branch if you like. |
I'd be interested in taking a look at it. Also, what's not stable ? Is there anything I can do to help ? |
@dcts Any news ? |
@mlarcher if you like, check out the dev branch and run:

```js
const result = await OpenseaScraper.offersByScrolling("deadfellaz", 100, options);
```

When you run the scraper on GCP, do the other functions work (for example the demo)? |
I tried it right after. |
I would argue that if
Before tackling 2 you need to figure out 1, otherwise there's no way to properly debug. The topic of this issue is 2, though. |
You can get the HTML content from puppeteer with `const html = await page.content();` |
Let's move this conversation to issue #40 (moved your content over there) |
Hello y'all,

I noticed that the functions `offersByScrolling()` and `offersByScrollingByUrl()` are not working properly. Most of the offers are not scraped (a lot of them are skipped for some reason; approximately 75% of the offers are not saved). This leads to the function being stuck for a long time, as it takes a lot longer to scrape the desired amount of offers when 75% of the offers are not scraped.