[BUG] offersByScrolling() and offersByScrollingByUrl() not properly working #36

Open
dcts opened this issue Feb 2, 2022 · 44 comments

@dcts
Owner

dcts commented Feb 2, 2022

I noticed that offersByScrolling() and offersByScrollingByUrl() are not working properly. Most offers are not scraped: roughly 75% are skipped for some reason and never saved. As a result the function appears stuck for a long time, because reaching the desired number of offers takes far longer when 75% of them are skipped.

@dcts
Owner Author

dcts commented Feb 2, 2022

If anyone else experiences this and relies on these functions, please comment below so I know it's urgent 📝

@dcts dcts self-assigned this Feb 2, 2022
@dcts dcts added the bug Something isn't working label Feb 2, 2022
@SKreutz

SKreutz commented Feb 3, 2022

This is exactly the problem I'm running into right now. Should I close the other issue?
I haven't found a fix yet, but I'll keep looking into this. Also, it still doesn't occur when choosing "total_volume" instead of the other options.

@dcts
Owner Author

dcts commented Feb 4, 2022

Oh yeah, you're right. Somehow I didn't realize this is the same bug that you reported; I randomly noticed it during testing. Closing the other issue #34 as it's the same.

@dcts
Owner Author

dcts commented Feb 11, 2022

@SKreutz do you need to scrape multiple pages or are the first 100 sufficient? Because there is a way of getting the top 100 elements without scrolling, just run this script:

const nextDataStr = document.getElementById("__NEXT_DATA__").innerText;
const nextData = JSON.parse(nextDataStr);
const top100 = nextData.props.relayCache[0][1].json.data.rankings.edges.map(obj => obj.node);

This is way faster and more efficient than scrolling and scraping the data from the DOM. I will integrate this in the repository soon and add the following functions:

OpenseaScraper.rankings("24h"); // https://opensea.io/rankings?sortBy=one_day_volume
OpenseaScraper.rankings("7d"); // https://opensea.io/rankings?sortBy=seven_day_volume
OpenseaScraper.rankings("30d"); // https://opensea.io/rankings?sortBy=thirty_day_volume
OpenseaScraper.rankings("total"); // https://opensea.io/rankings?sortBy=total_volume

// ❌ currently not working: scrape more than 100 items from rankings page
OpenseaScraper.rankingsByScrolling(); 
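
The three-line __NEXT_DATA__ snippet above throws as soon as any part of the path is missing. A more defensive variant might look like the sketch below. It assumes the same path into relayCache as the snippet, which can change whenever the page structure changes, and extractRankings is a hypothetical helper name, not part of the library.

```javascript
// Sketch only: the property path is copied from the snippet above and is an
// assumption about the current page structure.
function extractRankings(nextDataStr) {
  let nextData;
  try {
    nextData = JSON.parse(nextDataStr);
  } catch (err) {
    return []; // the text was not valid JSON (e.g. an unexpected page was served)
  }
  // Optional chaining yields undefined instead of throwing on a missing level.
  const edges = nextData?.props?.relayCache?.[0]?.[1]?.json?.data?.rankings?.edges;
  if (!Array.isArray(edges)) return [];
  return edges.map((edge) => edge.node);
}

// In the browser console on the rankings page:
// const top100 = extractRankings(document.getElementById("__NEXT_DATA__").innerText);
```

An empty array then signals "nothing found" without distinguishing a changed page structure from a blocked request, so callers may still want to log the raw text when it happens.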

@SKreutz

SKreutz commented Feb 12, 2022

@dcts I only want to scrape the first 100 slugs yes. Where do I put the 3 lines of code you provided? Thank you for your help I really appreciate it!

@dcts
Owner Author

dcts commented Feb 12, 2022

@SKreutz I added this new method and updated the repository, just update to the latest version 6.0.0 and then you can do:

// scrape all slugs, names and ranks from the top collections from the rankings page
// "type" is one of the following:
// "24h": ranking of last 24 hours: https://opensea.io/rankings?sortBy=one_day_volume
// "7d": ranking of last 7 days: https://opensea.io/rankings?sortBy=seven_day_volume
// "30d": ranking of last 30 days: https://opensea.io/rankings?sortBy=thirty_day_volume
// "total": scrapes all time ranking: https://opensea.io/rankings?sortBy=total_volume
const type = "24h"; // possible values: "24h", "7d", "30d", "total"
const ranking = await OpenseaScraper.rankings(type, options);
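
The mapping between the type argument and the rankings URL listed in the comments above can be written down as a lookup. A minimal sketch; rankingsUrl is a hypothetical helper name used for illustration and may not match how the library builds the URL internally.

```javascript
// Mapping taken from the comments above.
const SORT_BY = new Map([
  ["24h", "one_day_volume"],
  ["7d", "seven_day_volume"],
  ["30d", "thirty_day_volume"],
  ["total", "total_volume"],
]);

// Hypothetical helper: returns the rankings page URL for a given type.
function rankingsUrl(type) {
  const sortBy = SORT_BY.get(type);
  if (!sortBy) {
    const allowed = [...SORT_BY.keys()].join(", ");
    throw new Error(`unknown ranking type "${type}", expected one of: ${allowed}`);
  }
  return `https://opensea.io/rankings?sortBy=${sortBy}`;
}
```

Failing loudly on an unknown type avoids silently scraping the wrong ranking.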

@SKreutz

SKreutz commented Feb 15, 2022

@dcts your fix seems to work fine! Really appreciate your help. It's even a lot faster than before. This bug can be closed.

@dcts dcts closed this as completed Feb 15, 2022
@mlarcher

How come the issue has been closed ? Has the OpenseaScraper.offersByScrolling() method been fixed ?

It seems to me that the issue first expressed in this ticket is still happening, but you found a workaround for the rankings case. Is there something I am not interpreting correctly ?

@mlarcher

Not sure it is the same issue, but when running our script locally we get "stats":{"totalOffers":416} even though the offers field only contains 410 elements after calling scraper.offersByScrolling. In production on GCP, we get an empty result that looks like

offers: []
stats: {}

Something is definitely wrong with this method... What can we do to help investigate the issue?
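
To make the two failure modes above (a few offers missing locally, an empty result on GCP) easy to tell apart in logs, a small check on the returned object could help. This is a sketch that assumes the result shape shown above ({ offers, stats: { totalOffers } }); summarizeScrape is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: compares the number of scraped offers with the total
// reported in stats and classifies the outcome.
function summarizeScrape(result) {
  const found = Array.isArray(result?.offers) ? result.offers.length : 0;
  const total = result?.stats?.totalOffers;
  if (typeof total !== "number") {
    // stats is empty or missing, as in the GCP output above
    return { found, total: null, missing: null, status: found === 0 ? "empty" : "unknown-total" };
  }
  const missing = Math.max(total - found, 0);
  return { found, total, missing, status: missing === 0 ? "complete" : "partial" };
}
```

With the local numbers above this reports 6 missing offers and the status "partial", while the GCP output is reported as "empty".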

@dcts dcts reopened this Mar 17, 2022
@dcts
Owner Author

dcts commented Mar 17, 2022

@mlarcher I just checked and yes, you are absolutely right, the issue was never resolved. Thanks for reporting!

I need to take a closer look at the code; something happened that broke it.

@SKreutz

SKreutz commented Mar 17, 2022

I just tried to reproduce the issue.
When I check, for example, "slotienft", which currently has 390 items on "buy now", the offers method works fine:

=== actions ===
new page created
opening url https://opensea.io/collection/slotienft?search[sortAscending]=true&search[sortBy]=PRICE&search[toggles][0]=BUY_NOW
🚧 waiting for cloudflare to resolve...
extracting wired variable
closing browser...
extracting offers and stats from wired variable
total Offers: 390
top 3 Offers
[
{
name: 'Slotie #4606',
tokenId: '4606',
displayImageUrl: 'https://lh3.googleusercontent.com/6YxBtVI9cA4Y2kEMujrGodnXk55lEiJXRCdLDnGbwQRmpBI26Va7_BU7tmBvWYJz1YQz1lwGRuCZP_UtKHndL14Zj4qXwpy-Jfc8',
assetContract: '0x5fdb2b0c56afa73b8ca2228e6ab92be90325961d',
offerUrl: 'https://opensea.io/assets/0x5fdb2b0c56afa73b8ca2228e6ab92be90325961d/4606',
floorPrice: { amount: 0.685, currency: 'ETH' }
.
.
.

Scraping offers by scrolling also works fine for me.

✅ === OpenseaScraper.offersByScrolling(slug, 40) ===
=== scraping started ===
Scraping Opensea URL: https://opensea.io/collection/slotienft?search[sortAscending]=true&search[sortBy]=PRICE&search[toggles][0]=BUY_NOW

=== options ===
debug : false
logs : true
browserInstance: default

=== actions ===
new page created
🚧 waiting for cloudflare to resolve
expose all helper functions
scrape offers until target resultsize reached or bottom of page reached
closing browser...
total Offers: 390
all scraped offers (max 40):
[

I also tried different collections. Everything works fine for me. I am using macOS Monterey 12.0.1 and Node v16.13.1. I also just downloaded the latest version of opensea-scraper.

Let me know if you need further information

@mlarcher

Here's what I get:

server_1       | 2022-03-17T22:00:43.174Z debug: Start scraping prices
server_1       | === scraping started ===
server_1       | Scraping Opensea URL: https://opensea.io/collection/chumbivalleyofficial?search[sortAscending]=true&search[sortBy]=PRICE&search[toggles][0]=BUY_NOW
server_1       |
server_1       | === options ===
server_1       | debug          : false
server_1       | logs           : true
server_1       | browserInstance: default
server_1       |
server_1       | === actions ===
server_1       | new page created
server_1       | 🚧 waiting for cloudflare to resolve
server_1       | expose all helper functions
server_1       | scrape offers until target resultsize reached or bottom of page reached
server_1       | closing browser...
server_1       | 2022-03-17T22:11:17.853Z debug: Prices scraping done [{"foundOffersCount":408,"stats":{"totalOffers":412}}]

I'm on MacOS Monterey 12.3 in a docker container running node:16.14.0-alpine3.14

@dcts
Owner Author

dcts commented Mar 17, 2022

@mlarcher I published a fix, can you test and let me know if it works now, be sure to use version 6.0.2 :)

@dcts
Owner Author

dcts commented Mar 17, 2022

@SKreutz thanks for testing! I think it might have looked like everything works on your end, but in fact a lot of the offers were missing when using the offersByScrolling method. The bug was that 80% of the offers were skipped, only ~20% got scraped. This is particularly bad because sometimes it might seem that everything works, whereas it actually did not. And other times it just broke.

But now it should be fixed, at least the demo is working again (for me) with all relevant offers scraped. You can test it with

npm run demo

@mlarcher

@dcts it's @SKreutz who said "Scraping offers by scrolling also works fine for me" not me...
I just tested the 6.0.2 version and got [{"foundOffersCount":412,"stats":{"totalOffers":413}}], so one offer is still missing in the offers array. I'm running it a second time to be sure, but I see 413 on opensea right now, so there's probably still something going on.

@mlarcher

second run got me [{"foundOffersCount":405,"stats":{"totalOffers":413}}] so we're not good yet :/

@mlarcher

also, is there any chance it works on GCP with current version, or is it an unrelated problem that I get empty results in production ?

@dcts
Owner Author

dcts commented Mar 17, 2022

@mlarcher can you post what collection you scraped that got you these results?

@mlarcher

@dcts
Owner Author

dcts commented Mar 17, 2022

When I run the following:

const res = await OpenseaScraper.offersByScrolling("chumbivalleyofficial", 40, options);

I get correct results, in fact, they are identical to running OpenseaScraper.offers("chumbivalleyofficial",options).

Can you try to run it locally (not on GCP)?

> also, is there any chance it works on GCP with current version, or is it an unrelated problem that I get empty results in production ?

To answer your question: yes, it's an unrelated problem that has nothing to do with the scraper but with the environment. Cloud setups for scraping are always difficult because you don't have full control over the environment, IPs, etc. Also, services like Cloudflare can detect a cloud environment (through IP lists) and handle it differently (block it). See issues #40 and #39. If I find a solution for the cloud I will certainly share it, but as of now I don't plan to work on that. I do encourage everybody to share working cloud setups, though, because a lot of people would certainly like to have one.

@mlarcher

mlarcher commented Mar 18, 2022

@dcts thanks for the information.

GCP is not the issue here, as we get no result at all there (even though it used to work at some point). I'll check whether I can do anything to change the script's external IP.

What I was giving are results in a docker container on my machine.

Your test got me thinking, and I tried directly on the host machine with no docker container involved and got the same issue : [{"foundOffersCount":419,"stats":{"totalOffers":422}}]

In your test you are limiting the results to 40, which is a way of avoiding the issue, but we want a much larger result set. There are about 420 items for sale, not 40... Maybe you could try on your machine with the limit set to 500?

Please let me know what else we can do to help investigate the issue.

@dcts
Owner Author

dcts commented Mar 18, 2022

@mlarcher I tried the same with 500 and could replicate the inconsistency. Here are my results:

const res = await OpenseaScraper.offersByScrolling("chumbivalleyofficial", 500, options);
console.log(res.offers.length); // => 420
console.log(res.stats.totalOffers); // => 428

So yes, there's still an issue. But can you confirm that the algorithm at least runs and that you get most of the offers (even if not all of them)? You got 419 offers out of 422, is that right? 🤔

I think some offers don't get fetched because of how the scraping algorithm is designed:

  • the algorithm keeps scrolling as long as possible
  • scrolling triggers fetching of new data, which changes the DOM
  • then the algorithm gets the data from the DOM

This is obviously not a great design, as it's very error-prone. What if the DOM is checked before the data has been inserted? Or what if the fetching fails? In those cases the algorithm would simply skip those offers and continue.

I am sure there is a better solution, and I agree it would be great to have, but I have not yet come up with an idea for how to solve this problem better.
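
The scroll, fetch, read-DOM loop described above can be sketched as follows. This is an illustration of the idea, not the library's actual implementation: readVisible and scrollOnce are hypothetical callbacks standing in for the DOM read and the scroll step, and offers are assumed to be identifiable by tokenId. Instead of stopping at the first pass that yields nothing new, the loop tolerates a few empty passes, which covers the case where the DOM is read before the data has arrived.

```javascript
// Sketch: accumulate offers across scroll passes and only give up after
// several consecutive passes that add nothing new.
async function collectByScrolling({ readVisible, scrollOnce, target, maxIdlePasses = 3 }) {
  const seen = new Map(); // tokenId -> offer, so re-reading the same rows is harmless
  let idle = 0;
  while (seen.size < target && idle < maxIdlePasses) {
    const before = seen.size;
    for (const offer of await readVisible()) {
      if (!seen.has(offer.tokenId)) seen.set(offer.tokenId, offer);
    }
    idle = seen.size === before ? idle + 1 : 0;
    await scrollOnce();
  }
  return [...seen.values()].slice(0, target);
}
```

A single slow fetch then costs one extra pass instead of a skipped batch, at the price of a few wasted passes at the bottom of the page.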

@SKreutz

SKreutz commented Mar 18, 2022


I also think it's not possible to fetch 100%, because of the way OpenSea displays the items and, as you mentioned, because the DOM changes. When scrolling manually and looking at the HTML, you can see the DOM change and add elements as they appear. Sometimes OpenSea is very slow, or the NFTs are GIFs instead of JPEGs, which take even longer to load, and I think that's why some items are skipped.

In my opinion the only way to "fix" this would be to sleep for a few seconds after each scroll so the items have more time to display. But I don't know exactly how the code works, and even that would not be a nice solution, since it would make the code slow.
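
A bounded alternative to a fixed sleep after every scroll is to poll until the number of rendered items stops changing. The sketch below assumes a hypothetical getCount callback (for example, one that counts offer cards on the page); the interval and timeout values are placeholders, not tuned numbers.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch: resolve once getCount() has returned the same value `stableChecks`
// times in a row, or when the timeout is reached, whichever comes first.
async function waitForStableCount(getCount, { intervalMs = 250, stableChecks = 3, timeoutMs = 10000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let last = await getCount();
  let stable = 0;
  while (stable < stableChecks && Date.now() < deadline) {
    await sleep(intervalMs);
    const current = await getCount();
    stable = current === last ? stable + 1 : 0;
    last = current;
  }
  return last;
}
```

On a fast page this returns after a few short polls, so the slowdown is paid only when items are actually still loading.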

@mlarcher

> So yes theres still an issue. But can you confirm that you at least get the algorithm running and you get most of the offers? (even if its not all of them)? You could get 419 offers out of 422, is that right? 🤔

Yes, that's what I get when running locally or in the Docker container on my home machine. On GCP I get no result at all, but as we saw it's not the same issue.

> The only way to "fix" this would in my opinion be to place a sleep of a few seconds after each "scroll" so the items have more time to display.

Perhaps a timeout after the last scroll only, somehow?

I'll check if there is a better way to know when the DOM is "stabilized"...

@mlarcher

mlarcher commented Mar 18, 2022

perhaps you could use something like https://developer.mozilla.org/fr/docs/Web/API/MutationObserver to monitor dom changes, scroll, and debounce an ending function until nothing moves anymore ?
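
A rough sketch of that idea: restart a timer on every DOM mutation and resolve once no mutation has been seen for a while. The debounce logic is kept separate from MutationObserver so it can be exercised outside a browser; waitForQuiet and observeDom are hypothetical names, and the timings are placeholders.

```javascript
// Sketch: resolves with "quiet" once `quietMs` pass without a change
// notification, or with "timeout" after `maxWaitMs`.
// `subscribe(onChange)` must return a function that stops the subscription.
function waitForQuiet(subscribe, quietMs = 500, maxWaitMs = 10000) {
  return new Promise((resolve) => {
    let quietTimer;
    let unsubscribe = () => {};
    const finish = (reason) => {
      clearTimeout(quietTimer);
      clearTimeout(maxTimer);
      unsubscribe();
      resolve(reason);
    };
    const maxTimer = setTimeout(() => finish("timeout"), maxWaitMs);
    const restart = () => {
      clearTimeout(quietTimer);
      quietTimer = setTimeout(() => finish("quiet"), quietMs);
    };
    unsubscribe = subscribe(restart) || unsubscribe;
    restart();
  });
}

// Browser-side adapter (runs in the page, e.g. inside page.evaluate):
function observeDom(target) {
  return (onChange) => {
    const observer = new MutationObserver(onChange);
    observer.observe(target, { childList: true, subtree: true });
    return () => observer.disconnect();
  };
}

// In the page: await waitForQuiet(observeDom(document.body));
```

The upper bound matters because a page with animations or lazy-loaded images may never become fully quiet.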

@dcts
Owner Author

dcts commented Mar 19, 2022

> perhaps you could use something like https://developer.mozilla.org/fr/docs/Web/API/MutationObserver to monitor dom changes, scroll, and debounce an ending function until nothing moves anymore ?

@mlarcher Yes, this is a good idea. I tried this at some point but could not make it work; maybe it's worth a revisit.

What could be even more efficient is scrolling and simply monitoring Puppeteer's network activity, like this:

// taken from => https://stackoverflow.com/a/55478226/6272061
// matches content types that contain "javascript" or "html"
const textRegex = /(javascript|html)/;

page.on('response', (response) => {
    const headers = response.headers();

    // example test: check if content-type contains javascript or html
    const contentType = headers['content-type'];
    if (textRegex.test(contentType)) {
        console.log(response.url());
    }
});

Once new data needs to be fetched, the GraphQL API is called, and when we intercept that response we get the data in this format:

{
    "node": {
        "assetCount": null,
        "imageUrl": "https://lh3.googleusercontent.com/seJEwLWJP3RAXrxboeG11qbc_MYrxwVrsxGH0s0qxvF68hefOjf5qrPSKkIknUTYzfvinOUPWbYBdM8VEtGEE980Qv2ti_GGd86OWQ=s120",
        "name": "DeadFellaz",
        "slug": "deadfellaz",
        "isVerified": true,
        "id": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
        "description": "10,000 undead NFTs on the Ethereum blockchain. Join the horde.\n\nAdditional official collections:\n\n[Halloween S1](https://opensea.io/collection/deadfellaz-infected-s1) | [Nifty Gateway Betty Pop Horror](https://opensea.io/collection/betty-pop-horror-by-deadfellaz) | [Deadfrenz Lab Access Pass](https://opensea.io/collection/deadfrenz-lab-access-pass) | [Deadfrenz Collection](https://opensea.io/collection/deadfrenz-collection)"
    }
}

[Screenshot from 2022-03-19 11-39-36]

I think that's a nice solution and should be fairly easy to develop 🎉 Added it to the roadmap 🚔!
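
A minimal sketch of how the intercepted responses could be collected while scrolling. It assumes the relevant requests can be recognized by "graphql" appearing in the URL, which is an assumption about OpenSea's endpoints and not confirmed here; collectGraphqlJson is a hypothetical helper. Only Puppeteer calls that exist on Page and HTTPResponse (on, off, url, headers, json) are used.

```javascript
// Sketch: forward the parsed JSON body of matching responses to `onJson`.
// Returns a function that removes the listener again.
function collectGraphqlJson(page, onJson, urlFragment = "graphql") {
  const handler = async (response) => {
    if (!response.url().includes(urlFragment)) return; // assumption: URL identifies the API
    const contentType = response.headers()["content-type"] || "";
    if (!contentType.includes("json")) return;
    try {
      onJson(await response.json());
    } catch (err) {
      // body unavailable (e.g. after navigation) or not valid JSON: skip it
    }
  };
  page.on("response", handler);
  return () => page.off("response", handler);
}

// Usage (inside an async function, with an existing Puppeteer page):
// const batches = [];
// const stop = collectGraphqlJson(page, (json) => batches.push(json));
// ...scroll the page...
// stop();
```

Because the data comes from the network response rather than the rendered DOM, a slow render no longer causes skipped items; a failed request still does.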

Side note: at that point it might be worth trying to use the OpenSea GraphQL API, but I never could make it work and I've heard from people that it's a pain to use.

@dcts
Owner Author

dcts commented Mar 19, 2022

Oops, just realized that I posted the collection information above. The information for every single item (offer) looks like this:

{
  "assetContract": {
    "address": "0x2acab3dea77832c09420663b0e1cb386031ba17b",
    "chain": "ETHEREUM",
    "id": "QXNzZXRDb250cmFjdFR5cGU6MzAyOTQ1",
    "openseaVersion": null
  },
  "collection": {
    "isVerified": true,
    "relayId": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "id": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "displayData": {
        "cardDisplayStyle": "CONTAIN"
    },
    "imageUrl": "https://lh3.googleusercontent.com/seJEwLWJP3RAXrxboeG11qbc_MYrxwVrsxGH0s0qxvF68hefOjf5qrPSKkIknUTYzfvinOUPWbYBdM8VEtGEE980Qv2ti_GGd86OWQ=s120",
    "slug": "deadfellaz",
    "isAuthorizedEditor": false,
    "name": "DeadFellaz"
  },
  "relayId": "QXNzZXRUeXBlOjM2Nzg2ODY0",
  "tokenId": "3036",
  "backgroundColor": null,
  "imageUrl": "https://lh3.googleusercontent.com/RQlR9mw-oJyhrj_GtwRZfRJdqk-fjtbJK4tElqpas4R1XksLXqnklhvnbw40LHsVliYoDO3z9rWE7OczRKp_qhDqSS_ZNzyRa9kG",
  "name": "DeadFellaz #3036",
  "id": "QXNzZXRUeXBlOjM2Nzg2ODY0",
  "isDelisted": false,
  "animationUrl": null,
  "displayImageUrl": "https://lh3.googleusercontent.com/RQlR9mw-oJyhrj_GtwRZfRJdqk-fjtbJK4tElqpas4R1XksLXqnklhvnbw40LHsVliYoDO3z9rWE7OczRKp_qhDqSS_ZNzyRa9kG",
  "decimals": 0,
  "favoritesCount": 23,
  "isFavorite": false,
  "isFrozen": false,
  "hasUnlockableContent": false,
  "orderData": {
    "bestAsk": {
      "relayId": "T3JkZXJWMlR5cGU6MzUyMjU2ODkzMQ==",
      "orderType": "BASIC",
      "maker": {
        "address": "0x28705f64c07079822c7afd66e43975b7c6095ef6",
        "id": "QWNjb3VudFR5cGU6MTQ1NjA1MTQy"
      },
      "closedAt": "2022-04-05T05:44:18",
      "dutchAuctionFinalPrice": null,
      "openedAt": "2022-03-17T21:48:42",
      "priceFnEndedAt": null,
      "quantity": "1",
      "decimals": null,
      "paymentAssetQuantity": {
        "quantity": "2690000000000000000",
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/6f8e2979d428180222796ff4a33ab929.svg",
          "symbol": "ETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0x0000000000000000000000000000000000000000",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzMQ=="
          },
          "id": "QXNzZXRUeXBlOjEzNjg5MDc3"
        },
        "id": "QXNzZXRRdWFudGl0eVR5cGU6Mjg3MDE4NzA3OTcyNTgyMjM1NjM1NTg1MDc0MTcxNjgyNzE3ODc4",
        "quantityInEth": "2690000000000000000"
      }
    },
    "bestBid": {
      "orderType": "BASIC",
      "paymentAssetQuantity": {
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/accae6b6fb3888cbff27a013729c22dc.svg",
          "symbol": "WETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzOA=="
          },
          "id": "QXNzZXRUeXBlOjQ2NDU2ODE="
        },
        "quantity": "1502841336452599400",
        "id": "QXNzZXRRdWFudGl0eVR5cGU6MjEzNTc0NjA3Mzk2MzM3NzU2NjY4MTkxMzczOTUxNTUwMzAwMDE0"
      }
    }
  },
  "isEditable": {
    "value": false,
    "reason": "Unauthorized"
  },
  "isListable": true,
  "ownership": null,
  "creator": {
    "address": "0xe9d30eddd11dea8433cf6d2b2c22e9cce94113dc",
    "id": "QWNjb3VudFR5cGU6NjEyNTkxNTA="
  },
  "ownedQuantity": null,
  "assetEventData": {
    "lastSale": {
      "unitPriceQuantity": {
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/6f8e2979d428180222796ff4a33ab929.svg",
          "symbol": "ETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0x0000000000000000000000000000000000000000",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzMQ=="
          },
          "id": "QXNzZXRUeXBlOjEzNjg5MDc3"
        },
        "quantity": "1300000000000000000",
        "id": "QXNzZXRRdWFudGl0eVR5cGU6MjQxMDUyNDMxOTA1OTU2ODY0MDMxNjQ3MTYzMjQyMzYyNTQ4MTkw"
      }
    }
  }
}
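
In this payload the price is an integer string plus a decimals field (here "2690000000000000000" with 18 decimals). A sketch of converting it into the { amount, currency } shape that the scraper prints elsewhere in this thread; formatUnits and toOffer are hypothetical helpers, the field names are taken from the JSON above, and non-negative quantities are assumed.

```javascript
// Convert an integer string such as "2690000000000000000" with 18 decimals
// into a decimal string ("2.69") without floating-point rounding.
function formatUnits(quantity, decimals) {
  const value = BigInt(quantity);
  const base = 10n ** BigInt(decimals);
  const whole = value / base;
  const fraction = (value % base).toString().padStart(decimals, "0").replace(/0+$/, "");
  return fraction ? `${whole}.${fraction}` : whole.toString();
}

// Map one item node to a compact offer object.
function toOffer(node) {
  const ask = node.orderData?.bestAsk?.paymentAssetQuantity;
  return {
    name: node.name,
    tokenId: node.tokenId,
    displayImageUrl: node.displayImageUrl,
    assetContract: node.assetContract?.address,
    // Number() can lose precision for very long fractions; keep the string if that matters.
    floorPrice: ask
      ? { amount: Number(formatUnits(ask.quantity, ask.asset.decimals)), currency: ask.asset.symbol }
      : null,
  };
}
```

For the item above this gives a floorPrice of 2.69 ETH.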

@mlarcher

@dcts hooking into the graphql API sounds like a wonderful idea. It could drastically improve the performance and avoid some DOM related pitfalls 👍

@mlarcher

> Side note: At that point it might be worth trying to use the opensea graphQL api but I never could make it work and I heard from people that its a pain to use.

Using the API would be nice, but from what I heard they don't give API tokens very easily, and even if granted an API Key you would be facing some limits/restrictions.

Also it seems the query they use on the site is not documented (AssetSearchQuery) and it requires an API key and a CSRF token that changes on every call, so I can see why it could be a pain to use...

Using page.on('response', ...) sounds great though, as it would combine the best of both worlds. Any idea when you'll have time to give it a go?

@dcts
Owner Author

dcts commented Mar 20, 2022

@mlarcher I'm working on it currently, but I'm not sure; depending on how long it takes to implement, it could be today or maybe next weekend. But obviously no guarantees. ^^

@mlarcher

mlarcher commented Mar 20, 2022

great to read 👍
I'm looking forward to seeing it.
Let me know if I can do anything to help

@dcts
Owner Author

dcts commented Mar 23, 2022


@mlarcher I just found out that Opensea has a bug in their display of number of offers. The number they display on the page does not match the actual nfts displayed. For example check this page:
https://opensea.io/collection/deadfellaz?search[sortAscending]=true&search[sortBy]=PRICE&search[stringTraits][0][name]=Background&search[stringTraits][0][values][0]=Blue&search[stringTraits][1][name]=Body%20Grade&search[stringTraits][1][values][0]=Fresh&search[toggles][0]=BUY_NOW

OpenSea says that there are 76 items for sale, but if you count the NFTs by scrolling down the page you will find that there are only 75 (obviously this can change, but I'm pretty confident that it is a consistent bug).

So I think the scraping is currently working as it should, although scraping by scrolling is not very efficient.

@dcts
Owner Author

dcts commented Mar 23, 2022

(side note: I'm still gonna publish a v7 very soon with more efficient scrolling, as I already built it and like the architecture way better)

@mlarcher

> (side note: I'm still gonna publish a v7 very soon with more efficient scrolling, as I already built it and like the architecture way better)

I'm looking forward to trying it out!! 🤩

About your other point, the collection currently says 78 items and actually lists them all, but I believe there can be a bug on their side there. There was never a big offset, so I'm fine leaving it at that 👍🏻

@mlarcher

mlarcher commented Apr 1, 2022

Any ETA for the new version by any chance? I'm eager to try it 😊

@dcts
Owner Author

dcts commented Apr 2, 2022

I have a working implementation with the new algorithm, but it's not stable, so I won't publish it. I can share my work in a separate dev branch if you like.

@mlarcher

mlarcher commented Apr 2, 2022

I'd be interested in taking a look at it. Also, what's not stable ? Is there anything I can do to help ?

@mlarcher

@dcts Any news ?
FYI we now have our scraping job on GCP stuck on "scrape offers until target resultsize reached or bottom of page reached" and never ending...
I'd like to check if the new implementation works any better there

@dcts
Owner Author

dcts commented Apr 10, 2022

@mlarcher if you like, check out the branch dev-improve-offersByScrolling. The new implementation sometimes works, but it is not stable, as I mentioned. The autoscrolling part needs improvement. You can test the new version by running git fetch and then git checkout dev-improve-offersByScrolling on your local machine. Then use the new version with:

const result = await OpenseaScraper.offersByScrolling("deadfellaz", 100, options);

When you run the scraper on GCP, do the other functions work (for example, can you run OpenseaScraper.offers())? If yes, it would be awesome if you could share your setup; I think a lot of people would be interested in that :)

@mlarcher

I tried offers() in production on GCP and got

TypeError: Cannot read properties of undefined (reading 'split')
    at _parseWiredVariable (/app/node_modules/opensea-scraper/src/functions/offers.js:105:49)
    at offersByUrl (/app/node_modules/opensea-scraper/src/functions/offers.js:90:21)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Object.offers (/app/node_modules/opensea-scraper/src/functions/offers.js:37:10)

right after extracting __wired__ variable
Is there a way to dump the html content for debugging ?

@dcts
Owner Author

dcts commented Apr 11, 2022

I would argue that if OpenseaScraper.offers() does not work on GCP, there's no way that OpenseaScraper.offersByScrolling() will work (on GCP). So there are 2 problems here:

  1. making OpenseaScraper run on GCP
  2. making OpenseaScraper.offersByScrolling() work

Before tackling 2 you need to figure out 1, otherwise there's no way to properly debug. The topic of this issue is 2, though.

@dcts
Owner Author

dcts commented Apr 11, 2022

You can get the HTML content from Puppeteer with the content() method:

const html = await page.content();
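
To inspect what the page actually served in a headless or cloud environment, the HTML can be written to a file. A small sketch; dumpHtml and the file name pattern are hypothetical, and page is assumed to be a Puppeteer Page.

```javascript
const fs = require("fs/promises");
const path = require("path");

// Sketch: save the current page HTML to a timestamped file and return its path.
async function dumpHtml(page, dir = ".") {
  const html = await page.content();
  const file = path.join(dir, `opensea-debug-${Date.now()}.html`);
  await fs.writeFile(file, html, "utf8");
  return file;
}

// Usage (inside an async function): console.log("saved to", await dumpHtml(page, "/tmp"));
```

Opening the saved file in a browser shows whether the scraper received the collection page or something else, such as a challenge page.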

@dcts
Owner Author

dcts commented Apr 12, 2022

Let's move this conversation to issue #40 (I moved your content over there).

Repository owner deleted a comment from mlarcher Apr 12, 2022
Repository owner deleted a comment from mlarcher Apr 12, 2022
@dcts dcts changed the title from "OpenseaScraper.offersByScrolling() not properly working" to "offersByScrolling() and offersByScrollingByUrl() not properly working" Jul 2, 2022
@dcts dcts changed the title from "offersByScrolling() and offersByScrollingByUrl() not properly working" to "[BUG] offersByScrolling() and offersByScrollingByUrl() not properly working" Jul 2, 2022
@zolmine

zolmine commented Feb 28, 2023

Hello y'all,
here's an updated version of the offersByScrollingByUrl function:


4 participants