# Setting up External Access for Snowflake Notebooks

By default, Snowflake restricts network traffic to external endpoints. To reach an external endpoint, you first need to create an [external access integration](https://docs.snowflake.com/en/developer-guide/external-network-access/creating-using-external-network-access#label-creating-using-external-access-integration-access-integration). This example demonstrates how to access an external endpoint so that you can work with external services in your Snowflake Notebooks.

**Requirements:** 

- Running this notebook requires that you have the ACCOUNTADMIN role to create external access integrations.
- Add the `transformers` and `pytorch` packages from the package picker on the top right. Both packages are used in this notebook.

To access the external access configuration UI, click on the `⋮` button on the top right, then navigate to `Notebook settings` and open the `External access` pane. Here, you will see a list of the external access integrations that are available to you. You can toggle each integration on and off in the UI.

## Provisioning External Access Integrations

External access integrations, alongside their underlying network rules, need to be created and provisioned by an organization admin. 

There are two steps in creating an external access integration for notebooks.

1. Create a network rule to define a set of IP addresses or domains via [CREATE NETWORK RULE](https://docs.snowflake.com/en/sql-reference/sql/create-network-rule).

2. Create an external access integration to specify the allowed list of network rules via [CREATE EXTERNAL ACCESS INTEGRATION](https://docs.snowflake.com/en/sql-reference/sql/create-external-access-integration).

You can find a list of external network access examples [here](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-examples). Below we demonstrate how admins can create the external access integration for several common data science and machine learning use cases.
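
The two steps above can also be scripted. The sketch below is a minimal, hypothetical helper (`build_external_access_sql` is not part of any Snowflake library) that assembles both statements for a list of hosts, following the same statement shape used in the examples in this notebook. It only builds strings; an admin would still need to run the output in a SQL worksheet.

```python
def build_external_access_sql(prefix, hosts):
    """Return (network_rule_sql, integration_sql) for the given hosts.

    Hypothetical helper for illustration only. `prefix` names both objects:
    prefix="gh" yields gh_network_rule and gh_access_integration.
    """
    if not hosts:
        raise ValueError("at least one host is required")

    rule_name = f"{prefix}_network_rule"
    integration_name = f"{prefix}_access_integration"

    # Quote each host as a SQL string literal for VALUE_LIST.
    value_list = ", ".join("'{}'".format(h.replace("'", "''")) for h in hosts)

    # Step 1: the network rule that lists the allowed hosts.
    rule_sql = (
        f"CREATE OR REPLACE NETWORK RULE {rule_name}\n"
        "  MODE = EGRESS\n"
        "  TYPE = HOST_PORT\n"
        f"  VALUE_LIST = ({value_list});"
    )

    # Step 2: the integration that references the network rule.
    integration_sql = (
        f"CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION {integration_name}\n"
        f"  ALLOWED_NETWORK_RULES = ({rule_name})\n"
        "  ENABLED = true;"
    )
    return rule_sql, integration_sql


rule_sql, integration_sql = build_external_access_sql(
    "gh", ["raw.githubusercontent.com", "github.com"]
)
print(rule_sql)
print(integration_sql)
```

Generating the statements from a host list keeps the rule name and the integration's `ALLOWED_NETWORK_RULES` entry in sync, which is easy to get wrong when editing SQL by hand.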

## Example 1: Load sample dataset from GitHub

Let's say that you are trying to load a sample CSV file from a public GitHub repo.
```python
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/altair-viz/vega_datasets/master/vega_datasets/_data/stocks.csv")
```

Without the external access setup, you will get the following error:

```
URLError: <urlopen error [Errno 16] Device or resource busy>
```

To resolve this issue, toggle `GH_ACCESS_INTEGRATION` on if it is available in your external access setup UI.

If the GitHub access integration is not available in your UI, run the following SQL *in a separate SQL worksheet* to create the network rule and the external access integration.

```sql
-- Create the GitHub external access integration and the network rule it relies on.
CREATE OR REPLACE NETWORK RULE gh_network_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('raw.githubusercontent.com', 'github.com');

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION gh_access_integration
  ALLOWED_NETWORK_RULES = (gh_network_rule)
  ENABLED = true;
```
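
The `VALUE_LIST` of a network rule holds hostnames, not full URLs. As a small sketch of how you might work out which hosts your code needs, the hypothetical helper `hosts_for_urls` below extracts the unique hostnames from a list of URLs using the standard library. The second URL is only an illustrative input.

```python
from urllib.parse import urlparse


def hosts_for_urls(urls):
    """Return the unique hostnames used by `urls`, in first-seen order."""
    hosts = []
    for url in urls:
        host = urlparse(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


urls = [
    "https://raw.githubusercontent.com/altair-viz/vega_datasets/master/vega_datasets/_data/stocks.csv",
    "https://huggingface.co/Snowflake/snowflake-arctic-embed-xs",
]
print(hosts_for_urls(urls))
# → ['raw.githubusercontent.com', 'huggingface.co']
```

Note that this only covers the URLs you request directly. A library may contact additional hosts on its own, and those would also need to be allowed.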

Once you run the commands in the SQL worksheet, make sure to restart the notebook session. You should now see `GH_ACCESS_INTEGRATION` in the external access UI. Toggle it on, and the following code can run without error.

In [None]:
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/altair-viz/vega_datasets/master/vega_datasets/_data/stocks.csv")
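
The raw `URLError` shown earlier does not mention external access at all, which can make the cause hard to spot. The sketch below wraps a loader call and re-raises the failure with a hint about the integration. `load_with_hint` and `blocked_loader` are hypothetical names introduced here for illustration; `blocked_loader` simply simulates the blocked request so the example runs without network access.

```python
from urllib.error import URLError


def load_with_hint(loader, url, integration_name):
    """Call loader(url); on a network failure, re-raise with a hint.

    Hypothetical wrapper: `loader` is any callable that fetches a URL,
    for example pandas.read_csv.
    """
    try:
        return loader(url)
    except URLError as exc:
        raise RuntimeError(
            f"Could not reach {url} ({exc.reason}). "
            f"Check that {integration_name} is enabled under "
            "Notebook settings > External access, then restart the session."
        ) from exc


def blocked_loader(url):
    # Stand-in that simulates the error raised when egress is blocked.
    raise URLError("[Errno 16] Device or resource busy")


csv_url = "https://raw.githubusercontent.com/altair-viz/vega_datasets/master/vega_datasets/_data/stocks.csv"
try:
    load_with_hint(blocked_loader, csv_url, "GH_ACCESS_INTEGRATION")
except RuntimeError as err:
    message = str(err)
    print(message)
```

In the notebook you would pass `pd.read_csv` as the loader instead of the stand-in.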

## Example 2: Accessing a pre-trained model from Hugging Face

Let's say that you want to use a pre-trained model from Hugging Face to perform some NLP tasks.
```python
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('Snowflake/snowflake-arctic-embed-xs')
model = AutoModel.from_pretrained('Snowflake/snowflake-arctic-embed-xs', add_pooling_layer=False)
```

Without the external access setup, you will get the following error:

```
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like Snowflake/snowflake-arctic-embed-xs is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
```

To resolve this issue, toggle `HF_ACCESS_INTEGRATION` on if it is available in your external access setup UI.

If the Hugging Face access integration is not available in your UI, run the following SQL *in a separate SQL worksheet* to create the network rule and the external access integration, then attach the integration to the notebook.

```sql
-- Create the Hugging Face external access integration and the network rule it relies on.
CREATE OR REPLACE NETWORK RULE hf_network_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('huggingface.co');

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION hf_access_integration
  ALLOWED_NETWORK_RULES = (hf_network_rule)
  ENABLED = true;
```

Once you run the commands in the SQL worksheet, make sure to restart the notebook session. 

You should now see `HF_ACCESS_INTEGRATION` in the external access UI. Toggle it on, and the following code can run without error.

In [None]:
import os
os.environ['HF_HOME'] = '/tmp/'
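
The cell above points `HF_HOME` at `/tmp/`, presumably so that downloaded model files are cached somewhere the notebook can write to. That motivation is an assumption, as the notebook does not state it. A slightly more defensive sketch of the same idea creates a dedicated cache directory and confirms that it is writable. The `hf_cache_` prefix is arbitrary.

```python
import os
import tempfile

# Create a fresh, writable directory for the Hugging Face cache.
cache_dir = tempfile.mkdtemp(prefix="hf_cache_")

# Set HF_HOME before importing transformers so the library picks it up.
os.environ["HF_HOME"] = cache_dir

is_writable = os.access(cache_dir, os.W_OK)
print("HF_HOME is set:", os.environ["HF_HOME"] == cache_dir)
print("Cache directory writable:", is_writable)
```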

In [None]:
import torch
from transformers import AutoModel, AutoTokenizer
# Load the pre-trained model (https://huggingface.co/Snowflake/snowflake-arctic-embed-xs)
tokenizer = AutoTokenizer.from_pretrained('Snowflake/snowflake-arctic-embed-xs')
model = AutoModel.from_pretrained('Snowflake/snowflake-arctic-embed-xs', add_pooling_layer=False)

In [None]:
# Set the model to evaluation mode
model.eval()

Now that we have the pre-trained model, we can use it to score how relevant each document is to each query.

In [None]:
query_prefix = 'Represent this sentence for searching relevant passages: '
queries = ['What is Snowflake?', 'Where can I get the best tacos?']
queries_with_prefix = [f"{query_prefix}{q}" for q in queries]
query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=512)

documents = ['The Data Cloud!', 'Mexico City, of course!']
document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)

# Compute embeddings, taking the first token of the last hidden state
with torch.no_grad():
    query_embeddings = model(**query_tokens)[0][:, 0]
    document_embeddings = model(**document_tokens)[0][:, 0]

# Normalize embeddings to unit length
query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)

# Dot products of unit vectors: one row of scores per query
scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    # Output passages and scores, most relevant first
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
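
The scoring step above normalizes each embedding to unit length and then takes dot products, which is the same as computing cosine similarity. The sketch below shows that calculation in plain Python on made-up two-dimensional vectors, so it runs without `torch` or a model. The vectors and labels are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale `vec` to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy "embeddings": one query and two documents.
query_vec = [3.0, 4.0]
doc_vecs = {"same direction": [6.0, 8.0], "orthogonal": [-4.0, 3.0]}

q = l2_normalize(query_vec)
toy_scores = {name: dot(q, l2_normalize(vec)) for name, vec in doc_vecs.items()}

# Rank documents from most to least similar, as in the notebook cell.
ranked = sorted(toy_scores.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{score:.3f} {name}")
```

A document pointing in the same direction as the query scores close to 1, and an orthogonal one scores close to 0, regardless of vector length.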

## Conclusion

- In this example, we demonstrated how to set up an external access integration to access an external endpoint.
- To load data from an external endpoint, you can also create an external access integration and attach it to a UDF. Check out [this tutorial](https://github.com/Snowflake-Labs/snowflake-demo-notebooks/blob/main/Ingest%20Public%20JSON/Ingest%20Public%20JSON.ipynb) to learn how you can load semi-structured JSON data from a public endpoint into a Snowflake table.
- To learn more about external network access to Snowflake, refer to the documentation [here](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview).