[Question] Contributing to label / scraping method #11

Open
brianleect opened this issue Aug 10, 2022 · 6 comments

Comments

@brianleect

I came across this repo a while back when I needed address label data, and noticed that it only covered a specific subset of Etherscan's labels and that scraping required a separate Tampermonkey script for each label.

Since I needed label data that wasn't covered, I ended up writing a more general scraper for Etherscan over at https://github.com/brianleect/etherscan-labels

I'd love to know how I could contribute back to this repo, both to populate it with more label information and perhaps to share the more general scraping method I used.

@dawsbot
Owner

dawsbot commented Aug 10, 2022

Wow @brianleect, this is a significant contribution you've made to open source! Clearly, I'd love to join forces and have a single repo with all the labels. Whether that repo is maintained by you, me, or both of us, I have no preference, so long as the library is easy to consume from Node.js and JavaScript.

I saw at a quick glance that you implemented the scraper in Python. Are you familiar with the JS ecosystem too?

@brianleect
Author

Thanks for responding, @dawsbot!

I used Selenium for Python for the scraping because I had used it before. I just realized that Selenium is available for JS as well. I'm familiar enough with JS that rewriting it should be trivial.

Regarding labels:

  • Label bloat: some labels contain ~80-90k addresses, which might not be relevant to other users. Do we want to include these, or leave it to users to scrape them themselves?
  • Porting over labels: I think a script that loops through my full list of JSON and generates code in the same format as your current labels could handle this (~400 labels at the moment).
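A minimal sketch of the porting script described in the second bullet, assuming (hypothetically) one JSON file per label that maps address to name tag, and an output of one ES module per label. Neither the real layout of the etherscan-labels JSON nor the format of this repo's existing label files is shown in the thread, so `generateLabelModule`, `convertDirectory`, and the emitted `export default {...}` shape are all illustrative placeholders to adapt.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical input: one JSON file per label mapping address -> name tag,
// e.g. exchange.json = {"0xAbC...": "Hot Wallet 1"}.
// Hypothetical output: one ES module per label exporting that mapping.

// Build one module's source. Addresses are lower-cased (later duplicates
// win) and sorted so regenerated files give stable, reviewable diffs.
function generateLabelModule(label, entries) {
  const byAddress = new Map();
  for (const [address, nameTag] of Object.entries(entries)) {
    byAddress.set(address.toLowerCase(), nameTag);
  }
  const lines = [...byAddress.keys()]
    .sort()
    .map((a) => `  ${JSON.stringify(a)}: ${JSON.stringify(byAddress.get(a))},`);
  return [
    `// Label: ${label} (generated file, do not edit by hand)`,
    "export default {",
    ...lines,
    "};",
    "",
  ].join("\n");
}

// Loop over every *.json file in inDir and write a matching .js file to outDir.
function convertDirectory(inDir, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  const written = [];
  for (const file of fs.readdirSync(inDir).sort()) {
    if (path.extname(file) !== ".json") continue;
    const label = path.basename(file, ".json");
    const entries = JSON.parse(fs.readFileSync(path.join(inDir, file), "utf8"));
    const outFile = path.join(outDir, `${label}.js`);
    fs.writeFileSync(outFile, generateLabelModule(label, entries));
    written.push(outFile);
  }
  return written;
}
```

A single call such as `convertDirectory("labels-json", "src/labels")` (hypothetical paths) would regenerate every module at once. Keeping one module per label might also ease the bundle-size concern for the very large labels, if the consumer's bundler can drop modules that are never imported.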

I'd love to know what you think.

@dawsbot
Owner

dawsbot commented Aug 11, 2022

Rewriting in JS would be my goal here, but if that's a hassle, let's address that upfront.

I think the massive lists (80-90k addresses) are fine, so long as we optimize the bundle output for JS. I'm happy to tag-team on this, but given my current workload elsewhere (high), I've got ideas on how to collaborate. Discord me at daws.eth# TWO FIVE SIX TWO 🙏

@brianleect
Author

brianleect commented Aug 12, 2022

Sent you a friend request on Discord. I'm transfixed#0001.

@brianleect
Author

brianleect commented Aug 12, 2022

Rewrote the login and part of the scraping logic in Selenium JS:

https://github.com/brianleect/evm-labels/blob/master/scripts/scrape-all.js

I'm not sure what the JavaScript equivalent of pandas.read_html for retrieving tables is, though.
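One possible answer, as a minimal dependency-free sketch: a `pandas.read_html`-style helper for Node.js that returns one result per `<table>`, each an array of row objects keyed by the header cells. It assumes simple, well-formed tables (no nested tables, `colspan`, or `rowspan`); `readHtmlTables` and `cellText` are hypothetical names, and the regex-based parsing is a stand-in for a real HTML parser, not what the etherscan-labels scraper actually does.

```javascript
// Sketch of a pandas.read_html-style helper for Node.js (no dependencies).
// Assumption: simple, well-formed tables with no nested tables,
// colspan or rowspan. Regexes are NOT a general HTML parser.

// Strip inner tags, decode a few common entities, collapse whitespace.
function cellText(html) {
  return html
    .replace(/<[^>]*>/g, "")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&") // decode &amp; last to avoid double-decoding
    .replace(/\s+/g, " ")
    .trim();
}

// Return one array of row objects per <table>, keyed by the first row's
// cells, loosely mirroring the list of DataFrames from pandas.read_html.
function readHtmlTables(html) {
  const tables = [];
  const tableRe = /<table(?:\s[^>]*)?>([\s\S]*?)<\/table>/gi;
  const rowRe = /<tr(?:\s[^>]*)?>([\s\S]*?)<\/tr>/gi;
  const cellRe = /<t[hd](?:\s[^>]*)?>([\s\S]*?)<\/t[hd]>/gi;

  for (const [, tableHtml] of html.matchAll(tableRe)) {
    const rows = [];
    for (const [, rowHtml] of tableHtml.matchAll(rowRe)) {
      const cells = [...rowHtml.matchAll(cellRe)].map((m) => cellText(m[1]));
      if (cells.length > 0) rows.push(cells);
    }
    if (rows.length === 0) continue; // skip tables with no rows
    const [header, ...body] = rows;
    tables.push(
      body.map((cells) =>
        Object.fromEntries(header.map((name, i) => [name, cells[i] ?? ""]))
      )
    );
  }
  return tables;
}
```

In the Selenium JS script, the HTML string could come from `await driver.getPageSource()`. For messier markup, a proper HTML parser (cheerio, for example) would be the more robust choice than regexes.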

@dawsbot
Owner

dawsbot commented Aug 13, 2022

Nice, @brianleect! I'm excited to join forces 🙌
