[BUG] `offersByScrolling()` and `offersByScrollingByUrl()` not properly working #36
If anyone experiences this too and relies on this function, please comment below so I know it's urgent 📝 |
This is exactly the problem I'm running into right now. Should I close the other issue? |
Oh yeah you're right, somehow I didn't realize this is the same bug that you reported; I randomly noticed it during testing. Closing the other issue #34 as it's the same. |
@SKreutz do you need to scrape multiple pages or are the first 100 sufficient? Because there is a way of getting the top 100 elements without scrolling, just run this script:

```js
const nextDataStr = document.getElementById("__NEXT_DATA__").innerText;
const nextData = JSON.parse(nextDataStr);
const top100 = nextData.props.relayCache[0][1].json.data.rankings.edges.map(obj => obj.node);
```

This is way faster and more efficient than scrolling and scraping the data from the DOM. I will integrate this in the repository soon and add the following functions:

```js
OpenseaScraper.rankings("24h");   // https://opensea.io/rankings?sortBy=one_day_volume
OpenseaScraper.rankings("7d");    // https://opensea.io/rankings?sortBy=seven_day_volume
OpenseaScraper.rankings("30d");   // https://opensea.io/rankings?sortBy=thirty_day_volume
OpenseaScraper.rankings("total"); // https://opensea.io/rankings?sortBy=total_volume
// ❌ currently not working: scrape more than 100 items from rankings page
OpenseaScraper.rankingsByScrolling();
```
|
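The three-line extraction above can also be factored into a plain function that takes the parsed `__NEXT_DATA__` object, which makes it easy to reuse from Node via puppeteer's `page.evaluate`. A minimal sketch — the `relayCache` path is copied from the snippet above and may break whenever opensea changes their frontend:

```js
// Extract the top-100 ranking nodes from an already-parsed __NEXT_DATA__ object.
// The relayCache path is taken from the snippet above; it is tied to opensea's
// current page structure and may break on frontend updates.
function extractRankings(nextData) {
  const edges = nextData.props.relayCache[0][1].json.data.rankings.edges;
  return edges.map((edge) => edge.node);
}
```

In the browser (or inside `page.evaluate`) this could then be fed with `JSON.parse(document.getElementById("__NEXT_DATA__").innerText)`.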
@dcts I only want to scrape the first 100 slugs, yes. Where do I put the 3 lines of code you provided? Thank you for your help, I really appreciate it! |
@SKreutz I added this new method and updated the repository, just update to the latest version:

```js
// scrape all slugs, names and ranks from the top collections from the rankings page
// "type" is one of the following:
//   "24h":   ranking of last 24 hours: https://opensea.io/rankings?sortBy=one_day_volume
//   "7d":    ranking of last 7 days:   https://opensea.io/rankings?sortBy=seven_day_volume
//   "30d":   ranking of last 30 days:  https://opensea.io/rankings?sortBy=thirty_day_volume
//   "total": all-time ranking:         https://opensea.io/rankings?sortBy=total_volume
const type = "24h"; // possible values: "24h", "7d", "30d", "total"
const ranking = await OpenseaScraper.rankings(type, options);
```
|
@dcts your fix seems to work fine! Really appreciate your help. It's even a lot faster than before. This bug can be closed. |
How come the issue has been closed? It seems to me that the issue first expressed in this ticket is still happening, but you found a workaround for the other use case. |
Not sure it is the same issue, but when running our script we get incomplete results. Something is definitely wrong with this method... What can we do to help investigate the issue? |
@mlarcher I just checked and yes, you are absolutely right, the issue was never resolved. Thanks for reporting! I need to take a closer look at the code; something happened that broke it. |
I just tried to reproduce the issue. Scraping offers by scrolling (`OpenseaScraper.offersByScrolling(slug, 40)`) also works fine for me. ✅ I also tried different collections. Everything works fine for me. I am using Mac OS Monterey 12.0.1 and Node v16.13.1, and I also just downloaded the latest version of opensea scraper. Let me know if you need further information. |
Here's what I get:
I'm on MacOS Monterey 12.3 in a docker container running node:16.14.0-alpine3.14 |
@mlarcher I published a fix, can you test and let me know if it works now? Be sure to use the latest version. |
@SKreutz thanks for testing! I think it might have looked like everything works on your end, but in fact a lot of the offers were missing when using the scrolling methods. But now it should be fixed; at least the demo is working again (for me) with all relevant offers scraped. You can test it with `npm run demo`. |
@dcts it's @SKreutz who said "Scraping offers by scrolling also works fine for me" not me... |
second run got me |
Also, is there any chance it works on GCP with the current version, or is it an unrelated problem that I get empty results in production? |
@mlarcher can you post what collection you scraped that got you these results? |
here it is @dcts: |
When I run the following:

```js
const res = await OpenseaScraper.offersByScrolling("chumbivalleyofficial", 40, options);
```

I get correct results — in fact, they are identical to running the non-scrolling method. Can you try to run it locally (not on GCP)?

To answer your question: yes, it's an unrelated problem that has nothing to do with the scraper, but with the environment. Cloud setups for scraping are always difficult because you don't have full control over the environment, IPs etc. Also services like cloudflare can detect a cloud environment (through IP lists) and handle it differently (block it). See issues #40 #39. In case I find a solution for the cloud I will certainly share it, but as of now I don't plan to work on that. But I encourage everybody to share working cloud setups, because it is a common need that certainly a lot of people would benefit from. |
@dcts thanks for the information. GCP is not the issue here, as we get absolutely no results at all there (even though it used to work at some point before). I'll check if I can do anything to change the script's external IP. The results I posted are from a docker container on my machine. Your test got me thinking, and I tried directly on the host machine with no docker container involved and got the same issue. In your test you are limiting the results to 40, which is a way of avoiding the issue, but we want a much larger result set. There are about 420 items for sale, not 40... Maybe you could try on your machine with a limit set at 500? Please let me know what else we can do to help investigate the issue. |
@mlarcher I tried the same with 500 and could replicate the inconsistency. Here are my results:

```js
const res = await OpenseaScraper.offersByScrolling("chumbivalleyofficial", 500, options);
console.log(res.offers.length);     // => 420
console.log(res.stats.totalOffers); // => 428
```

So yes, there's still an issue. But can you confirm that you at least get the algorithm running and that you get most of the offers (even if it's not all of them)? You got 419 offers out of 422, is that right? 🤔 I think some offers don't get fetched because of how the scraping algorithm is designed: the DOM changes while scrolling and some items get skipped.

I am sure there is a better solution, and I agree it would be great to have one, but on the other hand I have not yet come up with an idea on how to better solve this problem. |
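One possible mitigation for the skipped items (an idea, not something the library currently does): take a snapshot of the rendered offers after every scroll step and merge the snapshots by a stable key such as `tokenId`, so an offer that gets recycled out of the virtualized list is still kept. The merge logic can be sketched as a pure function:

```js
// Merge offer snapshots taken after each scroll step.
// Offers are keyed by tokenId, so an offer that disappears from the DOM
// in a later snapshot (virtualized list) is not lost.
function mergeSnapshots(snapshots) {
  const byTokenId = new Map();
  for (const snapshot of snapshots) {
    for (const offer of snapshot) {
      if (!byTokenId.has(offer.tokenId)) {
        byTokenId.set(offer.tokenId, offer);
      }
    }
  }
  return [...byTokenId.values()];
}
```

The scroll loop would then call this once at the end over all collected snapshots instead of reading the DOM a single time.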
I also think it's not possible to fetch 100% because of the way opensea displays the items and, as you mentioned, the DOM changes. When scrolling manually and looking at the HTML, the DOM changes and adds the elements as they appear. Sometimes opensea is very slow, or the nfts are gifs instead of jpegs, which takes even longer, and I think that's why some items are skipped. The only way to "fix" this would in my opinion be to place a sleep of a few seconds after each "scroll" so the items have more time to load. |
I'll check if there is a better way to know when the DOM is "stabilized"... |
perhaps you could use something like https://developer.mozilla.org/fr/docs/Web/API/MutationObserver to monitor dom changes, scroll, and debounce an ending function until nothing moves anymore ? |
@mlarcher Yes this is a good idea, I tried this at some point but could not make it work, maybe worth a revisit. Also, what could be even more efficient is scrolling and simply monitoring puppeteer network activity, like this:

```js
// taken from => https://stackoverflow.com/a/55478226/6272061
const textRegex = /javascript|html/; // example test: check if content-type contains javascript or html
page.on('response', (response) => {
  const headers = response.headers();
  const contentType = headers['content-type'];
  if (textRegex.test(contentType)) {
    console.log(response.url());
  }
});
```

Once new data needs to be fetched the graphql API is called, and when we intercept that request we get the data in this format:

```json
{
  "node": {
    "assetCount": null,
    "imageUrl": "https://lh3.googleusercontent.com/seJEwLWJP3RAXrxboeG11qbc_MYrxwVrsxGH0s0qxvF68hefOjf5qrPSKkIknUTYzfvinOUPWbYBdM8VEtGEE980Qv2ti_GGd86OWQ=s120",
    "name": "DeadFellaz",
    "slug": "deadfellaz",
    "isVerified": true,
    "id": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "description": "10,000 undead NFTs on the Ethereum blockchain. Join the horde.\n\nAdditional official collections:\n\n[Halloween S1](https://opensea.io/collection/deadfellaz-infected-s1) | [Nifty Gateway Betty Pop Horror](https://opensea.io/collection/betty-pop-horror-by-deadfellaz) | [Deadfrenz Lab Access Pass](https://opensea.io/collection/deadfrenz-lab-access-pass) | [Deadfrenz Collection](https://opensea.io/collection/deadfrenz-collection)"
  }
}
```

I think that's a nice solution and should be fairly easy to develop 🎉 Added it to the roadmap 🚔! Side note: at that point it might be worth trying to use the opensea graphQL API directly, but I never could make it work and I heard from people that it's a pain to use. |
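To turn intercepted responses like the one above into usable items, the handler would need to pull the `node` objects out of the graphql payload. A small sketch of such a collector — the `edges`/`node` shape is assumed from the sample payload, and real responses may nest differently:

```js
// Recursively collect every `node` found under an `edges` array anywhere in
// a graphql response object. The edges/node shape is assumed from the sample
// payload above and is not guaranteed to match every opensea response.
function collectNodes(value, found = []) {
  if (Array.isArray(value)) {
    value.forEach((item) => collectNodes(item, found));
  } else if (value && typeof value === "object") {
    if (Array.isArray(value.edges)) {
      value.edges.forEach((edge) => {
        if (edge && edge.node) found.push(edge.node);
      });
    }
    Object.values(value).forEach((child) => collectNodes(child, found));
  }
  return found;
}
```

Inside the `page.on('response', ...)` handler this could be applied to `await response.json()` for the graphql URLs.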
Oops, just realized that I posted the collection information above; the information for every single item (offer) looks like this:

```json
{
  "assetContract": {
    "address": "0x2acab3dea77832c09420663b0e1cb386031ba17b",
    "chain": "ETHEREUM",
    "id": "QXNzZXRDb250cmFjdFR5cGU6MzAyOTQ1",
    "openseaVersion": null
  },
  "collection": {
    "isVerified": true,
    "relayId": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "id": "Q29sbGVjdGlvblR5cGU6OTM2MTIx",
    "displayData": {
      "cardDisplayStyle": "CONTAIN"
    },
    "imageUrl": "https://lh3.googleusercontent.com/seJEwLWJP3RAXrxboeG11qbc_MYrxwVrsxGH0s0qxvF68hefOjf5qrPSKkIknUTYzfvinOUPWbYBdM8VEtGEE980Qv2ti_GGd86OWQ=s120",
    "slug": "deadfellaz",
    "isAuthorizedEditor": false,
    "name": "DeadFellaz"
  },
  "relayId": "QXNzZXRUeXBlOjM2Nzg2ODY0",
  "tokenId": "3036",
  "backgroundColor": null,
  "imageUrl": "https://lh3.googleusercontent.com/RQlR9mw-oJyhrj_GtwRZfRJdqk-fjtbJK4tElqpas4R1XksLXqnklhvnbw40LHsVliYoDO3z9rWE7OczRKp_qhDqSS_ZNzyRa9kG",
  "name": "DeadFellaz #3036",
  "id": "QXNzZXRUeXBlOjM2Nzg2ODY0",
  "isDelisted": false,
  "animationUrl": null,
  "displayImageUrl": "https://lh3.googleusercontent.com/RQlR9mw-oJyhrj_GtwRZfRJdqk-fjtbJK4tElqpas4R1XksLXqnklhvnbw40LHsVliYoDO3z9rWE7OczRKp_qhDqSS_ZNzyRa9kG",
  "decimals": 0,
  "favoritesCount": 23,
  "isFavorite": false,
  "isFrozen": false,
  "hasUnlockableContent": false,
  "orderData": {
    "bestAsk": {
      "relayId": "T3JkZXJWMlR5cGU6MzUyMjU2ODkzMQ==",
      "orderType": "BASIC",
      "maker": {
        "address": "0x28705f64c07079822c7afd66e43975b7c6095ef6",
        "id": "QWNjb3VudFR5cGU6MTQ1NjA1MTQy"
      },
      "closedAt": "2022-04-05T05:44:18",
      "dutchAuctionFinalPrice": null,
      "openedAt": "2022-03-17T21:48:42",
      "priceFnEndedAt": null,
      "quantity": "1",
      "decimals": null,
      "paymentAssetQuantity": {
        "quantity": "2690000000000000000",
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/6f8e2979d428180222796ff4a33ab929.svg",
          "symbol": "ETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0x0000000000000000000000000000000000000000",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzMQ=="
          },
          "id": "QXNzZXRUeXBlOjEzNjg5MDc3"
        },
        "id": "QXNzZXRRdWFudGl0eVR5cGU6Mjg3MDE4NzA3OTcyNTgyMjM1NjM1NTg1MDc0MTcxNjgyNzE3ODc4",
        "quantityInEth": "2690000000000000000"
      }
    },
    "bestBid": {
      "orderType": "BASIC",
      "paymentAssetQuantity": {
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/accae6b6fb3888cbff27a013729c22dc.svg",
          "symbol": "WETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzOA=="
          },
          "id": "QXNzZXRUeXBlOjQ2NDU2ODE="
        },
        "quantity": "1502841336452599400",
        "id": "QXNzZXRRdWFudGl0eVR5cGU6MjEzNTc0NjA3Mzk2MzM3NzU2NjY4MTkxMzczOTUxNTUwMzAwMDE0"
      }
    }
  },
  "isEditable": {
    "value": false,
    "reason": "Unauthorized"
  },
  "isListable": true,
  "ownership": null,
  "creator": {
    "address": "0xe9d30eddd11dea8433cf6d2b2c22e9cce94113dc",
    "id": "QWNjb3VudFR5cGU6NjEyNTkxNTA="
  },
  "ownedQuantity": null,
  "assetEventData": {
    "lastSale": {
      "unitPriceQuantity": {
        "asset": {
          "decimals": 18,
          "imageUrl": "https://openseauserdata.com/files/6f8e2979d428180222796ff4a33ab929.svg",
          "symbol": "ETH",
          "usdSpotPrice": 2946.32,
          "assetContract": {
            "blockExplorerLink": "https://etherscan.io/address/0x0000000000000000000000000000000000000000",
            "chain": "ETHEREUM",
            "id": "QXNzZXRDb250cmFjdFR5cGU6MjMzMQ=="
          },
          "id": "QXNzZXRUeXBlOjEzNjg5MDc3"
        },
        "quantity": "1300000000000000000",
        "id": "QXNzZXRRdWFudGl0eVR5cGU6MjQxMDUyNDMxOTA1OTU2ODY0MDMxNjQ3MTYzMjQyMzYyNTQ4MTkw"
      }
    }
  }
}
```
|
@dcts hooking into the graphql API sounds like a wonderful idea. It could drastically improve the performance and avoid some DOM related pitfalls 👍 |
Using the API would be nice, but from what I heard they don't give API tokens very easily, and even if granted an API key you would be facing some limits/restrictions. Also it seems the query they use on the site is not documented (AssetSearchQuery), and it requires an API key and a CSRF token that changes on every call, so I can see why it could be a pain to use... |
@mlarcher I'm currently working on it, but I'm not sure how long it will take to implement; it could be today or maybe next weekend. But obviously no guarantees. ^^ |
great to read 👍 |
@mlarcher I just found out that opensea has a bug in their display of the number of offers. The number they display on the page does not match the actual nfts displayed. For example check this page: opensea says that there are 76 items for sale, but if you count the nfts by scrolling down the page you will find that there's only 75 (obviously this can change, but I'm pretty confident that it is a consistent bug). So I think the scraping currently is working as it should, even if the totals are slightly off. |
(side note: I'm still gonna publish a v7 very soon with more efficient scrolling, as I already built it and like the architecture way better) |
I'm looking forward to trying it out!! 🤩 About your other point, the collection currently says 78 items and effectively lists them all, but I believe there can be a bug on their side there. There was never a big offset, so I'm fine leaving it at that 👍🏻 |
Any ETA for the new version by any chance? I'm eager to try it 😊 |
I have a working implementation with the new algorithm but it's not stable, so I won't publish it. I can share my work in a separate dev branch if you like. |
I'd be interested in taking a look at it. Also, what's not stable ? Is there anything I can do to help ? |
@dcts Any news ? |
@mlarcher if you like, check out the dev branch and run:

```js
const result = await OpenseaScraper.offersByScrolling("deadfellaz", 100, options);
```

When you run the scraper on GCP, do the other functions work (for example the demo)? |
I tried it right after. |
I would argue that if
Before tackling 2 you need to figure out 1, otherwise there's no way to properly debug. The topic of this issue is 2, though. |
You can get the HTML content from puppeteer with `const html = await page.content();` |
Let's move this conversation to issue #40 (moved your content over there) |
Hello y'all,

I noticed that the functions `offersByScrolling()` and `offersByScrollingByUrl()` are not working properly. Most of the offers are not scraped (a lot of them are skipped for some reason; approximately 75% of the offers are not saved). This leads to the function being stuck for a long time, as it takes a lot longer to scrape the desired amount of offers when 75% of the offers are not scraped.