Research Grant: Swarm Scraper for Verified Smart Contracts #1

Open
SiddigZeidan opened this issue May 29, 2019 · 30 comments

@SiddigZeidan

commented May 29, 2019

Swarm Scraper for Verified Smart Contracts

Background and context

When a smart contract is verified in Etherscan it is automatically added to Swarm, a platform designed to provide a sufficiently decentralized and redundant store of Ethereum’s public record.

Task Specifics

This grant will fund the creation and development of a Swarm specific scraper that retrieves the metadata of verified smart contracts published to Swarm by Etherscan. The scraper SHOULD NOT scrape the Etherscan website.

Example of a smart contract below:

POA20 verified smart contract in Etherscan: https://etherscan.io/address/0x6758b7d441a9739b98552b373703d8d3d14f9e62#contracts

Swarm Source (hash): bzzr://43fcf63231967257f94fd4ccac8d14567ecb2b601a4e04fcf78c45099bf157be

The scraper should retrieve the hashes (Swarm source) for all verified smart contracts. When these hashes are entered into either Etherscan’s Swarm explorer or the official Swarm hash explorer, they will return the verified smart contract code.
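
As an illustration of what a scraper would consume, a Swarm source reference like the one above can be normalized to its 64-character hex hash before any lookup. The sketch below is not part of the grant text: the function names, the default gateway base URL, and the `bzz-raw:` path layout are assumptions for illustration and would need to be checked against whichever Swarm gateway is actually used.

```python
import re

# A "Swarm source" as shown on Etherscan: the bzzr:// scheme plus 32 bytes of hex.
_SWARM_SOURCE = re.compile(r"bzzr://([0-9a-fA-F]{64})")


def parse_swarm_source(uri: str) -> str:
    """Return the lowercase 64-character hash from a bzzr:// reference."""
    match = _SWARM_SOURCE.fullmatch(uri.strip())
    if match is None:
        raise ValueError(f"not a bzzr:// Swarm source: {uri!r}")
    return match.group(1).lower()


def gateway_url(swarm_hash: str, gateway: str = "https://swarm-gateways.net") -> str:
    """Build a raw-content URL for a hash.

    Assumption: the gateway base URL and the bzz-raw path layout are
    illustrative only, not something this issue specifies.
    """
    return f"{gateway.rstrip('/')}/bzz-raw:/{swarm_hash}/"
```

For example, `parse_swarm_source("bzzr://43fcf63231967257f94fd4ccac8d14567ecb2b601a4e04fcf78c45099bf157be")` yields the bare hash, which can then be passed to `gateway_url`.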

@gitcoinbot

commented May 29, 2019

This issue now has a funding of 500.0 DAI (500.0 USD @ $1.0/DAI) attached to it as part of the poanetwork fund.

@StevenJNPearce

commented May 29, 2019

@SiddigZeidan Do you have any requirement on speed/run-time? My first idea for it would take a very long time to reach the current block. I don't see how to do it efficiently without having some event source to know when a contract is verified on Etherscan.

@gitcoinbot

commented May 29, 2019

Work has been started.

These users each claimed they can complete the work by 11 months, 2 weeks from now.
Please review their action plans below:

1) yolo2776 has started work.

I'm interested, I'm starting work on feasibility and the basics. @SiddigZeidan If possible, let's do a call or chat to help me understand this better.
2) vghorakavi has started work.

I am interested in starting on the swarm scraper. I am not sure what the difference is between a web scraper and a swarm scraper, but I am interested in collaborating.
3) sagarjethi has started work.

I am researching the Swarm Scraper for Verified Smart Contracts. I also know about Swarm storage.

@igorbarinov

commented May 29, 2019

Hi @StevenJNPearce
we don't have requirements on speed because

  • this is research
  • completeness is more important than speed

We know that Etherscan posted all verified contracts on Swarm.

Other explorers and services that are interested in the data should be able to get it in full without scraping the website, which

  • can be restricted by technical (WAF) and non-technical (robots.txt) measures
  • is unethical, as it abuses someone else's resources.
@janus

commented May 29, 2019

@igorbarinov Could one say that this issue is about getting verified contracts from Swarm, and not about scraping the Etherscan site to fetch the Swarm source hash? Is this correct?

@igorbarinov

commented May 29, 2019

@janus that is right. Getting verified smart contracts from Swarm into local storage suitable for working with it later, e.g. importing into other explorers.

This grant will fund the creation and development of a Swarm specific scraper that retrieves the metadata of verified smart contracts published to Swarm by Etherscan. 
@EnochMbaebie

commented May 29, 2019

I'm ready to join in this challenge, I was talking to @ninabreznik about this issue. I don't have much experience with Swarm or Etherscan at the moment but I would love to give it a try. What is the time frame for the project?
@janus would you maybe be up for splitting the task and working on it together? I can help and learn on the go and do the tasks with you, if that's ok.

@serapath

commented May 29, 2019

Hey, I work with @ninabreznik on the ethereum play project, which - i guess - is where the need for the above originated from :-)

I was wondering how to systematically approach the problem and think i have a suggestion.

  1. crawl all swarm data for entries which contain solidity source code and store the addresses somewhere
  2. work with @kolinko on retrieving all contract addresses from the blockchain (~11-18 million)
  3. retrieve the bytecode of each deployed contract address from the blockchain
  4. compile each solidity source code found on swarm to bytecode
  5. compare the compilation result with the bytecode from the blockchain to find matches

This will probably need a lot of computers running for a long time to collect all this information and match it, so that in the end we have pairs of matching contracts (address, source code). Eventually we can use existing APIs from Etherscan to speed up the process, and maybe even talk to the Swarm team or the Etherscan team and hope for support, because after all it would be beneficial for everyone. Using the above strategy we WILL get to what we want; it might just take longer.
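
Steps 3 to 5 of the list above can be sketched as a hash-indexed join between on-chain bytecode and compiled bytecode. This is only a rough illustration: the function names and the input dictionaries are hypothetical, fetching and compiling are left out, and an exact match assumes the source was compiled with the same compiler version and settings as the deployed contract.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(bytecode_hex: str) -> str:
    """Return a stable fingerprint for a hex-encoded bytecode string."""
    normalized = bytecode_hex.lower()
    if normalized.startswith("0x"):
        normalized = normalized[2:]
    return hashlib.sha256(bytes.fromhex(normalized)).hexdigest()


def match_sources(
    onchain: Dict[str, str], compiled: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair contract addresses with source IDs whose bytecode is identical.

    onchain maps address -> runtime bytecode read from the chain.
    compiled maps source ID -> runtime bytecode produced by compiling it.
    """
    # Index on-chain code by fingerprint so each source needs one lookup.
    index: Dict[str, List[str]] = {}
    for address, code in onchain.items():
        index.setdefault(fingerprint(code), []).append(address)

    pairs: List[Tuple[str, str]] = []
    for source_id, code in compiled.items():
        for address in index.get(fingerprint(code), []):
            pairs.append((address, source_id))
    return sorted(pairs)
```

Indexing by fingerprint keeps the comparison linear in the number of contracts plus sources, rather than comparing every source against every address.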

Does that make sense? :-)

@janus

commented May 29, 2019

@janus

commented May 29, 2019

@StevenJNPearce

commented May 29, 2019

@SiddigZeidan Are you sure Etherscan is actually uploading to Swarm? There isn't anything at the hash you put in the issue, but I checked it against the contract bytecode and the hash is correct. I don't see evidence of them uploading anything, or there would be something at the hash, right?

@ligi

commented May 29, 2019

@StevenJNPearce good observation! I also do not see any content at this hash. Which is a bit sad, as access to verified contracts is badly needed. Does anyone have a direct contact at Etherscan to ask whether they can actually do this? Would be nice.
Or perhaps the scope of this research should change to: how can we decentralize the process of verifying contracts so we do not depend on the centralized service etherscan. Recently had a meeting with a guy from truffle and @chriseth to perhaps get this in the truffle workflow for deployment.
I personally would also love to see the process of verifying contracts as a requirement for deploying contracts. Or perhaps (a bit more realistic) incentivize contract deployers to publish their metadata and source for verification. E.g. by adding badges to contracts that are verified.

@chriseth

commented May 29, 2019

I very much support a decentralized source code verification project! This tool here could be of interest (feel free to extend it): https://github.com/ethereum/source-verify It takes a solidity metadata json and reconstructs the compiler options from it so that the source can be re-compiled.

About Etherscan and Swarm: I'm pretty sure they do not upload anything to Swarm, for a simple reason: the Swarm hash they provide is the same as the one in the bytecode. This hash is generated by the Solidity compiler from the metadata JSON. The problem is that the hash still uses an old format that is no longer supported by the current Swarm network (we are currently updating the hash, though), and thus it would not really work to upload the data via that hash.
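
The point about the hash living in the bytecode can be checked locally. A minimal sketch, assuming the metadata trailer layout emitted by Solidity compilers of that period (a CBOR map with a single `bzzr0` key and a 32-byte value, followed by the two-byte length `0x0029`); other compiler versions append different trailers, and the function name is hypothetical:

```python
from typing import Optional

# CBOR: map(1), text(5) "bzzr0", bytes(32) -> a1 65 'bzzr0' 58 20
BZZR0_PREFIX = bytes.fromhex("a165627a7a72305820")
# Big-endian length of the CBOR payload: 9 prefix bytes + 32 hash bytes = 41.
BZZR0_LENGTH = bytes.fromhex("0029")


def extract_bzzr0_hash(runtime_bytecode_hex: str) -> Optional[str]:
    """Return the embedded Swarm hash as hex, or None if no trailer is found."""
    code = runtime_bytecode_hex.lower()
    if code.startswith("0x"):
        code = code[2:]
    raw = bytes.fromhex(code)

    trailer_size = len(BZZR0_PREFIX) + 32 + len(BZZR0_LENGTH)
    if len(raw) < trailer_size:
        return None
    trailer = raw[-trailer_size:]
    if not trailer.startswith(BZZR0_PREFIX) or not trailer.endswith(BZZR0_LENGTH):
        return None
    start = len(BZZR0_PREFIX)
    return trailer[start:start + 32].hex()
```

Comparing this value with the "Swarm Source" shown by an explorer is one way to reproduce the observation that both are the same hash.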

Just as @ligi says, I think it is more important to incentivize people to publish their source code. In the end it does not matter whether it is on swarm or somewhere else. It also does not matter whether solidity uses the correct swarm hash or not. It would be sufficient if it would be possible to retrieve the source code starting with just the "swarm hash" generated by the solidity compiler. Such a retrieval could just search the web for the hash, search a specialized github repository, etc.

@EnochMbaebie

commented May 31, 2019

> Sure ... let's work together. Let private chat the way forward.

great, thanks

@ligi

commented Jun 4, 2019

> Let private chat the way forward.

I think it would be better to do this publicly

@agutsal

commented Jun 6, 2019

Is anyone working on that currently, @EnochMbaebie @ligi @SiddigZeidan @StevenJNPearce ?

@ligi

commented Jun 6, 2019

I am not sure how anyone can actually work on "this", as the assumption of this task

> When a smart contract is verified in Etherscan it is automatically added to Swarm

was wrong, as it seems.

That said: we need to attack the problem! But I think the scope of the task should change to how to get verification data distributed and decoupled from Etherscan. Ideas:

  • let remix/truffle upload it automagically to swarm and perhaps some kind of open registry
  • have badges that contracts that are verified this way can use to stand out and encourage it

I think any scraper is not striking at the root but at the branches of the problem - so perhaps it is even good that there is nothing to scrape currently.

@agutsal

commented Jun 6, 2019

@ligi got you ;) thanks

@janus

commented Jun 7, 2019

@janus

commented Jun 10, 2019

@gitcoinbot

commented Jun 13, 2019

⚡️ A tip worth 25.00000 DAI (25.0 USD @ $1.0/DAI) has been granted to @ligi for this issue from @igorbarinov. ⚡️

Nice work @ligi! Your tip has automatically been deposited in the ETH address we have on file.

@gitcoinbot

commented Jun 13, 2019

⚡️ A tip worth 50.00000 DAI (50.0 USD @ $1.0/DAI) has been granted to @chriseth for this issue from @igorbarinov. ⚡️

Nice work @chriseth! To redeem your tip, login to Gitcoin at https://gitcoin.co/explorer and select 'Claim Tip' from dropdown menu in the top right, or check your email for a link to the tip redemption page.

@gitcoinbot

commented Jun 13, 2019

⚡️ A tip worth 25.00000 DAI (25.0 USD @ $1.0/DAI) has been granted to @StevenJNPearce for this issue from @igorbarinov. ⚡️

Nice work @StevenJNPearce! Your tip has automatically been deposited in the ETH address we have on file.

@gitcoinbot

commented Jun 13, 2019

⚡️ A tip worth 25.00000 DAI (25.0 USD @ $1.0/DAI) has been granted to @serapath for this issue from @igorbarinov. ⚡️

Nice work @serapath! To redeem your tip, login to Gitcoin at https://gitcoin.co/explorer and select 'Claim Tip' from dropdown menu in the top right, or check your email for a link to the tip redemption page.

@gitcoinbot

commented Jun 13, 2019

⚡️ A tip worth 25.00000 DAI (25.0 USD @ $1.0/DAI) has been granted to @janus for this issue from @igorbarinov. ⚡️

Nice work @janus! Your tip has automatically been deposited in the ETH address we have on file.

@gitcoinbot

commented Jun 13, 2019

This Bounty has been completed.

Additional Tips for this Bounty:

  • igorbarinov tipped 25.0000 DAI worth 25.0 USD to janus.
  • igorbarinov tipped 25.0000 DAI worth 25.0 USD to serapath.
  • igorbarinov tipped 25.0000 DAI worth 25.0 USD to stevenjnpearce.
  • igorbarinov tipped 50.0000 DAI worth 50.0 USD to chriseth.
  • igorbarinov tipped 25.0000 DAI worth 25.0 USD to ligi.

@loredanacirstea

commented Sep 3, 2019

For anyone reading & hoping for smart contracts data, I will post a link to a dump of 4000+ sources (Mainnet, 3 testnets): https://github.com/pipeos-one/pipeline/tree/master/data. If you have issues with the format, ping us on https://gitter.im/pipeos-one/pipeline.

As for incentivizing people to open source their data, I really suggest EthPM - they have a good specification for the package structure & are starting to be integrated in web3 tools (web3.py, Truffle already have support). I know Blockscout already expressed interest in supporting EthPM but I am leaving this here for others to see.

Resources:
http://ethpm.com
https://docs.ethpm.com/
https://gitter.im/ethpm/Lobby

@vbaranov

commented Sep 4, 2019

@loredanacirstea We would love to integrate EthPM into the Blockscout blockchain explorer. I have some questions to help find the best way to integrate. Would love to hear your opinion:

  1. How to generate a package manifest from a Blockscout verified smart contract?

For sure, we can generate it at our side using your specification but since you already provide Etherscan URI support https://docs.ethpm.com/uris#etherscan-uris, I suppose it would be very easy for you to add support of Blockscout URI because:

  • We also provide a getsourcecode API endpoint for each Blockscout instance. A description and a "Try it out" section can be found on the API docs page; for instance, for Ethereum Mainnet it is https://blockscout.com/eth/mainnet/api_docs. It is very similar to the one Etherscan supplies.
  • Moreover, Blockscout, for now, has no limits on API usage - no need to provide API keys.
  • Blockscout supports 5 Mainnets and 7 Testnets. All of them are EVM compatible and have the unified structure of pages/APIs.
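
A rough sketch of how tooling might call such a getsourcecode endpoint, using only the standard library. The API base URL in the usage note is inferred from the docs link above and is an assumption, as are the helper names and the Etherscan-style response fields (`status`, `result`, `ContractName`, `SourceCode`); the API docs page is the authority on the actual shape.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode


def getsourcecode_url(api_base: str, address: str) -> str:
    """Build an Etherscan-style getsourcecode request URL."""
    query = urlencode(
        {"module": "contract", "action": "getsourcecode", "address": address}
    )
    return f"{api_base}?{query}"


def parse_sources(response_text: str) -> List[Tuple[str, str]]:
    """Return (contract name, source code) pairs from a JSON response body.

    Assumption: the body follows the Etherscan-style envelope, where
    status "1" means success and result is a list of objects.
    """
    payload = json.loads(response_text)
    if payload.get("status") != "1":
        return []
    return [
        (item.get("ContractName", ""), item.get("SourceCode", ""))
        for item in payload.get("result", [])
    ]
```

With an assumed base of `https://blockscout.com/eth/mainnet/api`, `getsourcecode_url` produces a URL that can be fetched with any HTTP client and its body passed to `parse_sources`.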

Wouldn't it benefit EthPM users if they will have an ability to generate package manifest from Blockscout URI? In addition, we can use that Blockscout URI support to generate package and upload it to the on-chain registry. What do you think?

  2. What strategy do you keep in mind, who should pay for the gas in order to upload the package to the on-chain registry?

a) Blockscout user pays for gas: we can make it optional on the Blockscout verified contract page, where any user can deploy the contract to the EthPM registry with a single button. In this case, the user pays for the transaction.
b) Blockscout pays for gas?
c) EthPM pays for gas?

In case of b) c) we can automatically deploy metadata to the on-chain registry with any verification event in Blockscout

  3. For some reason, EthPM doesn't support every EVM chain. Support is restricted to these chain IDs: [1, 3, 4, 5, 42]. Is it possible to extend support to other chain IDs since Blockscout supports other chains too?
@loredanacirstea

commented Sep 4, 2019

@vbaranov,
Pinging @njgheorghita (I'll also let him know in the EthPM gitter), because he is in charge of the EthPM project. I am just using EthPM in my Pipeline project and think it is a good solution to decentralizing contract data + package management.

There are libraries for creating EthPM manifests: https://docs.ethpm.com/ethpm-developer-guide/ethpm-core-libraries, e.g. https://github.com/ethpm/ethpm.js. For Pipeline, I ended up creating my own tools initially, but I will probably move to also use ethpm.js soon.

I imagine users should pay for the gas. Ideally, this registration should happen at deployment time, but having the EthPM option when verifying a contract also helps with spreading the knowledge.

Verifying with Blockscout from EthPM data (e.g. even from the manifest swarm/ipfs uri) would also be useful.

@njgheorghita

commented Sep 5, 2019

Hey @vbaranov 👋

> I suppose it would be very easy for you to add support of Blockscout URI because..

Yup! I've added an issue here.

> Wouldn't it benefit EthPM users if they will have an ability to generate package manifest from Blockscout URI?

Definitely - however, there are some limitations. The packages generated by the ethPM-CLI are limited by what information is exposed via an explorer's API. For example, etherscan's verified contract API doesn't expose the deployment bytecode, which can be a very useful piece of information to include in a package. I'm not sure why this is the case, but if an explorer exposed fully-populated packages with all the available data, it would be seamless for any tooling to use that data.

Also, there are benefits to having packages easily available / multiple sources of package data. For example, a user wouldn't have to install/learn how to use the ethPM-CLI to generate the package - their tooling can simply call the blockscout API and have the package JSON blob returned and ready to go.

> In addition, we can use that Blockscout URI support to generate package and upload it to the on-chain registry. . . What strategy do you keep in mind, who should pay for the gas in order to upload the package to the on-chain registry?

Honestly, I don't think Blockscout should be responsible for uploading any ethPM packages to an on-chain registry. It's totally up to you if you want to for special/important contracts, but if a user wants the security of persisting the data on-chain, they should pay the gas fee. If Blockscout was simply a package index and made the manifest JSON easily available to users/tooling (via an API endpoint and/or in-browser option) - then that would be immensely useful.

> Is it possible to extend support to other chain IDs since Blockscout supports other chains too?

It's definitely possible - the ethPM spec supports any chain. However, extending support to all chains specifically in the ethPM CLI is not that high on the priority list.

Feel free to ping me anytime or drop by our gitter if you have any other questions!
