In [1]:
import _init_paths

In [None]:
from IPython.display import Image, display

# Step 1: Browse the Ocean Marketplace Using the Graphical User Interface

To work with Ocean Protocol, you need to set up a digital wallet. First, create a MetaMask account; there are many guides for doing this online (e.g. [here](https://docs.oceanprotocol.com/tutorials/metamask-setup/)). Then, in your MetaMask wallet, switch from the Ethereum Mainnet to the Rinkeby Test Network. Rinkeby lets you test without paying real transaction fees: instead, you use Rinkeby tokens, which can be requested from faucets. You need Ocean tokens to purchase datasets on the Ocean Marketplace, and you can request Rinkeby OCEAN from the Ocean faucet [here](https://faucet.rinkeby.oceanprotocol.com/). Making transactions on the Ocean Marketplace (e.g. purchasing a dataset) also costs gas, paid in ETH. You can request Rinkeby ETH from the faucet [here](https://faucet.rinkeby.io/) (you will need to post a tweet).

Now that we have some Rinkeby OCEAN and ETH in our wallet, we can browse and purchase datasets on the [Ocean Marketplace](https://market.oceanprotocol.com/). When you open a Web3 app in the browser, you may need to sign in with your wallet. Make sure that you are browsing datasets on the Rinkeby network (see the image below).

In [None]:
display(Image(filename='images/marketplace-network.png', width = 400))

Check out the datasets published by Algovera [here](https://market.oceanprotocol.com/search?sort=created&sortOrder=desc&text=0x2338e4e94AEe1817701F65f2c751f7c844b0e43b). For this tutorial, we will work with the CryptoPunks image dataset. Although the images for these NFTs are freely available online, we have uploaded them as a private dataset so that you can practice the workflow. In the future, we hope that using private datasets with generative art models will open up new use cases, such as collaborations between artists who don't want to lose control of their datasets and models. You can see the dataset on the Ocean Marketplace [here](https://market.oceanprotocol.com/asset/did:op:C9D0568838fa670baEe7195Ea443b32EfCAc2281).

In the traditional data science workflow, a data scientist downloads a dataset locally before running code on it: the data comes to the code. In contrast, private datasets on the marketplace cannot be downloaded. Instead, the data scientist sends code to the data, where the computation runs, and only the results are returned. This is called Compute-to-Data (C2D), and it is similar in spirit to Federated Learning. On the Ocean Marketplace, data providers should publish a sample of the data that shows both the quality of the data and the interface (format and structure) through which it can be accessed.
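To make the "code goes to the data" idea concrete, here is a toy sketch in plain Python. The `PrivateDataset` class and its `run` method are hypothetical illustrations, not part of ocean.py: the provider keeps the records, accepts a function, and hands back only that function's output. Python cannot actually enforce this boundary; in Ocean's C2D the isolation comes from running the job in a compute environment controlled by the data provider.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class PrivateDataset:
    """Hypothetical stand-in for a data provider holding private records."""

    def __init__(self, records: List[float]) -> None:
        self._records = list(records)  # stays with the provider in this toy model

    def run(self, algorithm: Callable[[List[float]], T]) -> T:
        """Run the submitted algorithm next to the data and return only its result."""
        # Pass a copy so the algorithm cannot mutate the provider's records.
        return algorithm(list(self._records))


def mean(values: List[float]) -> float:
    """Example 'algorithm' a data scientist might send to the data."""
    return sum(values) / len(values)


provider = PrivateDataset([1.0, 2.0, 3.0, 6.0])
print(provider.run(mean))  # the caller sees the mean, never the records themselves
```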

Download the sample data for CryptoPunks through the Marketplace GUI and inspect it (always make sure to only download samples from data providers that you trust!). 

In [None]:
display(Image(filename='images/download-sample.png', width = 400))

# Step 2: Browse the Ocean Marketplace Using the Ocean Python Library

Now let's do the same through the Ocean Python library (ocean.py). We have installed the library for you in the JupyterHub instance. If you need to install it yourself in the future, it's simple (see the README [here](https://github.com/oceanprotocol/ocean.py)).

We need to connect to the Ethereum network via an Ethereum node. We have set the config parameters for you in a config file (`config.ini`). We currently use [Infura](https://infura.io) for this, but we plan to migrate to a full Erigon Ethereum node as soon as possible for increased decentralization.

In [None]:
from ocean_lib.ocean.ocean import Ocean
from ocean_lib.config import Config

config = Config('config.ini')
ocean = Ocean(config)

print(f"config.network_url = '{config.network_url}'")
print(f"config.metadata_cache_uri = '{config.metadata_cache_uri}'")
print(f"config.provider_url = '{config.provider_url}'")

Next, export your private key from your MetaMask wallet. We strongly recommend doing this with a wallet that holds no real tokens (only Rinkeby tokens). For more information on private keys, see [this guide](https://github.com/oceanprotocol/ocean.py/blob/main/READMEs/wallets.md) from the ocean.py documentation:

*The whole point of crypto wallets is to store private keys. Wallets have various tradeoffs of cost, convenience, and security. For example, hardware wallets tend to be more secure but less convenient and not free. It can also be useful to store private keys locally on your machine, for testing, though only with a small amount of value at stake (keep the risk down). Do not store your private keys on anything public, unless you want your tokens to disappear. For example, don't store your private keys in GitHub or expose them on frontend webpage code.*

With this in mind, you can load your private key directly into the notebook. We read it from an environment variable rather than storing it in code that might be pushed to a repo. The variable has to be set in the shell that starts the notebook server, so you need to set it again for each new session (and you may need to restart the notebook server). Here's how to export an environment variable from your console, using an example key (replace it with your actual private key):

```console
export MY_TEST_KEY=0xaefd8bc8725c4b3d15fbe058d0f58f4d852e8caea2bf68e0f73acb1aeec19baa
```
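Before creating a wallet, it can help to sanity-check the variable without ever printing the key. An Ethereum private key is 32 bytes, i.e. 64 hex characters, usually written with a `0x` prefix. The helper `load_private_key` below is a hypothetical convenience, not part of ocean.py; it only checks the key's shape, not that it controls any funds.

```python
import os
import re

# A 32-byte key is 64 hex characters, optionally preceded by "0x".
_KEY_PATTERN = re.compile(r"^(0x)?[0-9a-fA-F]{64}$")


def load_private_key(var_name: str = "MY_TEST_KEY") -> str:
    """Read a private key from an environment variable and check its format.

    Raises instead of returning a malformed key, and never prints the key.
    Always returns the key with a "0x" prefix.
    """
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting the notebook server"
        )
    key = key.strip()
    if not _KEY_PATTERN.match(key):
        raise ValueError(f"{var_name} does not look like a 64-hex-character private key")
    return key if key.startswith("0x") else "0x" + key
```

With this helper, the wallet cell could pass `private_key=load_private_key()` instead of calling `os.getenv` directly.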

Now initialize your wallet:

In [None]:
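import os
from ocean_lib.web3_internal.wallet import Wallet

# Read the key from the environment so it never appears in the notebook itself
private_key = os.getenv('MY_TEST_KEY')
assert private_key, "MY_TEST_KEY is not set; export it before starting the notebook server"

wallet = Wallet(ocean.web3, private_key=private_key, transaction_timeout=20, block_confirmations=config.block_confirmations)

print(f"public address = '{wallet.address}'")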

This should print the public address of your MetaMask wallet; check that it matches the one displayed in MetaMask. Next, let's check the balances in our wallet. These should match the amounts you received from the faucets (minus anything you've since spent).
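If you want to do the address check in code, note that Ethereum addresses are case-insensitive: mixed case only encodes the EIP-55 checksum. The helper `same_address` below is a hypothetical sketch; it checks only the format (`0x` plus 40 hex characters), not the checksum, which would need Keccak-256 hashing that Python's standard library does not provide.

```python
import re

_ADDRESS_PATTERN = re.compile(r"^0x[0-9a-fA-F]{40}$")


def same_address(a: str, b: str) -> bool:
    """Return True if two hex Ethereum addresses refer to the same account.

    Case is ignored because mixed case only encodes the EIP-55 checksum;
    the checksum itself is not verified here.
    """
    a, b = a.strip(), b.strip()
    if not (_ADDRESS_PATTERN.match(a) and _ADDRESS_PATTERN.match(b)):
        raise ValueError("expected '0x' followed by 40 hex characters")
    return a.lower() == b.lower()
```

For example, `same_address(wallet.address, "<address copied from MetaMask>")` should return `True`.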

In [None]:
from ocean_lib.web3_internal.currency import from_wei  # wei is the smallest unit of ether (1 ETH = 10**18 wei), like cents to dollars
from ocean_lib.models.btoken import BToken  # BToken is an ERC20 token class
OCEAN_token = BToken(ocean.web3, ocean.OCEAN_address)

print(f"ETH balance = '{from_wei(ocean.web3.eth.get_balance(wallet.address))}'")
print(f"OCEAN balance = '{from_wei(OCEAN_token.balanceOf(wallet.address))}'")
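Balances come back from the chain as integers in wei, and `from_wei` turns them into ether units. The underlying arithmetic is just a scale of 10**18; OCEAN also uses 18 decimals, which is why the same conversion applies to the token balance. Here is a minimal stand-alone sketch using `Decimal` to avoid floating-point rounding (the names `wei_to_ether` and `ether_to_wei` are hypothetical, not the ocean.py implementation):

```python
from decimal import Decimal, localcontext

WEI_PER_ETHER = 10**18  # 1 ether = 10^18 wei


def wei_to_ether(amount_wei: int) -> Decimal:
    """Convert an integer amount in wei to ether without rounding."""
    with localcontext() as ctx:
        ctx.prec = 78  # enough significant digits for any uint256 value
        return Decimal(amount_wei) / WEI_PER_ETHER


def ether_to_wei(amount_ether: str) -> int:
    """Convert a decimal string such as '0.5' to wei (fractions of a wei are truncated)."""
    with localcontext() as ctx:
        ctx.prec = 78
        return int(Decimal(amount_ether) * WEI_PER_ETHER)


print(wei_to_ether(1_500_000_000_000_000_000))  # 1.5
print(ether_to_wei("0.25"))  # 250000000000000000
```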

Now let's look up the dataset from Python. On the CryptoPunks image dataset page [here](https://market.oceanprotocol.com/asset/did:op:C9D0568838fa670baEe7195Ea443b32EfCAc2281), copy the decentralized identifier (DID).

In [None]:
display(Image(filename='images/did.png', width = 400))
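Copying the DID by hand works; alternatively, a small helper can pull it out of the asset URL. This sketch assumes the marketplace keeps the `/asset/<did>` URL layout used in the links above; `did_from_asset_url` is a hypothetical helper, not part of ocean.py.

```python
from urllib.parse import urlparse


def did_from_asset_url(url: str) -> str:
    """Extract the DID from an Ocean Marketplace asset URL.

    Assumes a URL path of the form /asset/<did> with a did:op: identifier.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 2 or parts[0] != "asset" or not parts[1].startswith("did:op:"):
        raise ValueError(f"not an Ocean Marketplace asset URL: {url}")
    return parts[1]
```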

In [None]:
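# DID copied from the dataset page on the Ocean Marketplace (replace it if you use a different asset)
did = "did:op:e772c8585ad9916eD677320078748DD1cA827BB2"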
asset = ocean.assets.resolve(did)

print(f"Data token info = '{asset.values['dataTokenInfo']}'")
print(f"Dataset name = '{asset.metadata['main']['name']}'")

We can get the URL to the sample data from the associated metadata:

In [None]:
from pathlib import Path
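sample_link = asset.metadata['additionalInformation']['links'][0]['url']
# Assumes a link of the form https://drive.google.com/file/d/<ID>/view:
# Path collapses the '//', so parts[4] is the Google Drive file ID
ID = Path(sample_link).parts[4]
print(f"Sample link = '{sample_link}'")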
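`Path(sample_link).parts[4]` only works while the link keeps exactly the `https://drive.google.com/file/d/<ID>/...` layout. A slightly more defensive sketch parses the URL instead. `drive_file_id` is a hypothetical helper, and the two layouts it accepts (the `file/d/<ID>` form and the `open?id=<ID>` form seen in the folder link below) are assumptions about how Drive sharing links look, not an exhaustive list.

```python
from urllib.parse import parse_qs, urlparse


def drive_file_id(link: str) -> str:
    """Return the file ID from a Google Drive sharing link.

    Handles two common layouts (an assumption, not an exhaustive list):
      https://drive.google.com/file/d/<ID>/view?usp=sharing
      https://drive.google.com/open?id=<ID>
    """
    parsed = urlparse(link)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 3 and parts[0] == "file" and parts[1] == "d":
        return parts[2]
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"could not find a Drive file ID in {link!r}")
```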

Download the sample data from the command line. (TODO: fix and streamline this step. For now, the sample is downloaded manually in the JupyterHub instance.)

In [None]:
import gdrivefs
# use this the first time you run
token = 'browser'
# use this on subsequent attempts
#token = 'cache'

# shareable link to folder generated with
# https://drive.google.com/open?id=1FQzXM2E28WF6fV7vy1K7HdxNV-w6z_Wx
root_file_id = '1FQzXM2E28WF6fV7vy1K7HdxNV-w6z_Wx'

gdfs = gdrivefs.GoogleDriveFileSystem(token=token, root_file_id=root_file_id)
gdfs

In [None]:
import pkgutil

# Debugging helper: list the submodules that gdrivefs provides
package = gdrivefs
for importer, modname, ispkg in pkgutil.iter_modules(package.__path__):
    print("Found submodule %s (is a package: %s)" % (modname, ispkg))

In [None]:
download_dir = Path('data')
dataset_name = "punks-sample"
# Save the archive under its own file name so that extracting it can create data/punks-sample/
download_path = str(download_dir / f"{dataset_name}.tar.gz")
download_dir.mkdir(parents=True, exist_ok=True)

In [None]:
!pwd

In [None]:
!gdown --id {ID} -O {download_path}

Extract the downloaded archive:

In [None]:
!tar -xvzf {download_path} -C {str(download_dir)}
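If you prefer to stay in Python (or `tar` is unavailable), the standard library's `tarfile` module does the same job. This sketch adds a basic check that no archive member would land outside the destination directory, which matters when extracting archives from third parties, and uses the `"data"` extraction filter when the running Python supports it. `extract_tar_gz` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_tar_gz(archive: Path, dest: Path) -> List[str]:
    """Extract a .tar.gz archive into dest and return the member names.

    Refuses archives containing members whose path resolves outside dest.
    """
    dest = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # extraction filters exist in newer Pythons
        else:
            tar.extractall(dest)
    return [m.name for m in members]
```

In this notebook the call would be `extract_tar_gz(Path(download_path), download_dir)`.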

Now let's inspect the sample data. The data provider should supply it in the same format as the whole dataset, so that we as data scientists can write scripts that run on both the sample and the full dataset. We call this shared format the **interface** of the data.

In [None]:
sample_dir = download_dir / dataset_name
print(f"Sub-directories: {sorted(list(sample_dir.glob('*')))}")
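One lightweight way to pin down an interface is to summarize the directory layout: which subdirectories exist and what kinds of files they hold. The sketch below does that with the standard library; `describe_interface` is a hypothetical helper, and comparing its output on the sample with the output of the same code run against the full dataset is a suggestion, not something the marketplace does for you.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def describe_interface(root: Path) -> Dict[str, Counter]:
    """Count files by (lower-cased) extension in each immediate subdirectory of root."""
    summary: Dict[str, Counter] = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[sub.name] = Counter(
            f.suffix.lower() for f in sub.iterdir() if f.is_file()
        )
    return summary
```

Here, `describe_interface(sample_dir)` would report the file types found under each background folder.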

We have punks with clear backgrounds and punks with teal backgrounds.

In [None]:
clear_dir, teal_dir = sorted(list(sample_dir.glob('*')))

In [None]:
print(f"Images with clear backgrounds: {sorted(list(clear_dir.glob('*')))}")
print(f"Images with teal backgrounds: {sorted(list(teal_dir.glob('*')))}")

In [None]:
clear_images = sorted(list(clear_dir.glob('*')))
teal_images = sorted(list(teal_dir.glob('*')))

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
img0 = mpimg.imread(clear_images[0])
img1 = mpimg.imread(clear_images[1])
fig, ax = plt.subplots(1,2)
ax[0].imshow(img0)
ax[1].imshow(img1)
for a in ax:
    a.axis('off')  # hide ticks and frames around each image
plt.show()

In [None]:
img0 = mpimg.imread(teal_images[0])
img1 = mpimg.imread(teal_images[1])
fig, ax = plt.subplots(1,2)
ax[0].imshow(img0)
ax[1].imshow(img1)
for a in ax:
    a.axis('off')  # hide ticks and frames around each image
plt.show()

The next step is to write code that converts the raw data into a format that StyleGAN2 can train on. We can develop this code on the sample data before sending it to run on the full dataset.
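A useful first check on the sample is image size, since StyleGAN2 implementations generally expect square images whose side is a power of two (verify this against the training code you actually use). The sketch below reads width and height straight from each PNG header with the standard library, so it assumes the images are PNG files; `png_size` and `check_stylegan_ready` are hypothetical helper names.

```python
import struct
from pathlib import Path
from typing import List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> Tuple[int, int]:
    """Return (width, height) from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        # signature (8) + chunk length (4) + chunk type (4) + width and height (8)
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def check_stylegan_ready(paths: List[Path]) -> List[Path]:
    """Return the images that are not square with a power-of-two side length."""
    bad = []
    for p in paths:
        w, h = png_size(p)
        if w != h or not is_power_of_two(w):
            bad.append(p)
    return bad
```

Running `check_stylegan_ready(clear_images + teal_images)` and getting an empty list would mean the sample images need no resizing for this particular requirement.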