# How to Ingest JSON Data from a Public Endpoint

This example demonstrates how to download JSON data from a public endpoint, save the results into a table in Snowflake, and load that table into a Snowpark DataFrame.

**Note:** Running this notebook requires the ACCOUNTADMIN or SECURITYADMIN role in order to create new network rules.

In [None]:
USE ROLE ACCOUNTADMIN

In [None]:
CREATE OR REPLACE TABLE bike_riders (
  timestamp STRING,
  northbound NUMBER,
  southbound NUMBER
);

By default, Snowflake blocks outbound requests from UDFs and procedures to public endpoints. To access external data, we first create a network rule that lists `data.seattle.gov` as an allowed host, and then create an [external access integration](https://docs.snowflake.com/en/developer-guide/external-network-access/creating-using-external-network-access#label-creating-using-external-access-integration-access-integration) that references that rule.

In [None]:
CREATE OR REPLACE NETWORK RULE SEATTLE_OPEN_DATA_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('DATA.SEATTLE.GOV');

In [None]:
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION SEATTLE_OPEN_DATA_INTEGRATION
ALLOWED_NETWORK_RULES = (SEATTLE_OPEN_DATA_RULE)
ENABLED=TRUE;
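
The network rule above lists a single host, so only URLs on that host can be reached through the integration. As a rough client-side illustration (this is not how Snowflake enforces the rule), the sketch below checks a URL's host against an allow list before the URL is handed to a UDF. The `ALLOWED_HOSTS` set and the `is_allowed_host` helper are hypothetical names introduced for this example.

```python
from urllib.parse import urlsplit

# Hypothetical allow list mirroring VALUE_LIST in the network rule.
ALLOWED_HOSTS = {"data.seattle.gov"}

def is_allowed_host(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the URL's host is in the allow list (case-insensitive)."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if no host
    return host is not None and host in allowed_hosts

print(is_allowed_host("https://data.seattle.gov/resource/65db-xm6k.json"))  # True
print(is_allowed_host("https://example.com/data.json"))                     # False
```

A check like this only gives an earlier, friendlier error; the network rule remains the actual control.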

Next, we create a user-defined function (UDF) that connects outside of Snowflake and fetches data from the remote endpoint. We attach the external access integration that we created earlier to the UDF so that it is permitted to reach the allowed host. Read more about using an external access integration in a UDF or procedure [here](https://docs.snowflake.com/en/developer-guide/external-network-access/creating-using-external-network-access#using-the-external-access-integration-in-a-function-or-procedure).

The UDF uses the `requests` library in Python to get the JSON response from the URL.

In [None]:
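CREATE OR REPLACE FUNCTION FETCH_ENDPOINT(URL STRING)
returns string
language python
runtime_version=3.8
handler = 'fetch_and_transform_data'
external_access_integrations=(SEATTLE_OPEN_DATA_INTEGRATION)
packages = ('requests')
as
$$
import json
import requests

# Reuse one HTTP session across calls to the handler.
session = requests.Session()

def fetch_and_transform_data(url):
    response = session.get(url, timeout=30)
    # Fail loudly on HTTP errors instead of parsing an error page.
    response.raise_for_status()
    data = response.json()
    # Perform data transformation here
    # Serialize explicitly so the UDF returns a JSON string.
    return json.dumps(data)
$$;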

Now we can call the UDF on [this URL](https://data.seattle.gov/resource/65db-xm6k.json) and see the JSON string returned as output:

In [None]:
SELECT FETCH_ENDPOINT('https://data.seattle.gov/resource/65db-xm6k.json')

Next, we want to insert the JSON into the `bike_riders` table. We use Snowflake's [`PARSE_JSON`](https://docs.snowflake.com/en/sql-reference/functions/parse_json) function to convert the returned string into a VARIANT value, and `FLATTEN` to produce one row per element of the JSON array.

We then use the `:` operator to extract each JSON field and the `::` operator to cast it to the desired data type (STRING, NUMBER). Read more about how to work with semi-structured data in Snowflake [here](https://docs.snowflake.com/en/sql-reference/data-types-semistructured#using-values-in-a-variant).

In [None]:
insert into bike_riders
with json_blob as 
(select parse_json(fetch_endpoint('https://data.seattle.gov/resource/65db-xm6k.json')) AS json_arr)
select 
   value:date::STRING AS timestamp,
   value:fremont_bridge_nb::NUMBER AS northbound,
   value:fremont_bridge_sb::NUMBER AS southbound
from json_blob, TABLE(FLATTEN(input => json_arr))
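
For readers more comfortable with Python, the query above does roughly what the following sketch does: parse the JSON text into an array, visit each element (the role of `FLATTEN`), pick fields by name (`:`), and convert them (`::`). The sample payload is made up for illustration and assumes the endpoint returns numeric values as strings; `to_int` and `to_rows` are hypothetical helpers, not part of Snowflake.

```python
import json

# Made-up payload shaped like the endpoint's response: an array of objects.
# Assumption: numeric values arrive as strings.
sample_json = """
[
  {"date": "2012-10-03T00:00:00.000", "fremont_bridge_nb": "4", "fremont_bridge_sb": "9"},
  {"date": "2012-10-03T01:00:00.000", "fremont_bridge_nb": "4"}
]
"""

def to_int(value):
    """Cast whole numbers like ::NUMBER; a missing field stays None (NULL)."""
    return None if value is None else int(value)

def to_rows(json_text):
    """Return (timestamp, northbound, southbound) tuples, one per array element."""
    rows = []
    for value in json.loads(json_text):  # PARSE_JSON, then FLATTEN
        rows.append((
            value.get("date"),                       # value:date::STRING
            to_int(value.get("fremont_bridge_nb")),  # value:fremont_bridge_nb::NUMBER
            to_int(value.get("fremont_bridge_sb")),  # value:fremont_bridge_sb::NUMBER
        ))
    return rows

for row in to_rows(sample_json):
    print(row)
```

The second sample object omits `fremont_bridge_sb` on purpose: a missing field becomes `None` here, just as a missing path yields NULL in the SQL version.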

Now that the table is loaded, we can use SQL to preview the data: 

In [None]:
select * from bike_riders

Alternatively, we can load this table into a Snowpark DataFrame to work with the data in Python.

In [None]:
from snowflake.snowpark.context import get_active_session
session = get_active_session()
# Add a query tag to the session. This helps with troubleshooting and performance monitoring.
session.query_tag = {"origin":"sf_sit-is", 
                     "name":"notebook_demo_pack", 
                     "version":{"major":1, "minor":0},
                     "attributes":{"is_quickstart":1, "source":"notebook", "vignette":"public_json"}}
df = session.table("bike_riders")
df

In [None]:
# Compute descriptive statistics for overview
df.describe()

We can also convert our Snowpark DataFrame to pandas and operate on it with pandas.

In [None]:
pandas_df = df.to_pandas()

In [None]:
import pandas as pd
pandas_df["TIMESTAMP"] = pd.to_datetime(pandas_df["TIMESTAMP"])

Now we can visualize the `TIMESTAMP` column by plotting a histogram of the hour of day.

In [None]:
import altair as alt 
hours = pd.DataFrame(pandas_df["TIMESTAMP"].dt.hour)
alt.Chart(hours).mark_bar().encode(
    alt.X("TIMESTAMP:Q",bin = True),
    y = 'count()',
)
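
As a quick sanity check on the chart, the per-hour counts can also be computed directly. The sketch below uses only the standard library and a few made-up timestamps, assuming the `date` values are ISO 8601 strings; the `sample_timestamps` list and the `count_by_hour` helper are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Made-up sample values; the real column comes from the bike_riders table.
sample_timestamps = [
    "2012-10-03T00:00:00.000",
    "2012-10-03T01:00:00.000",
    "2012-10-04T01:00:00.000",
    "2012-10-04T13:00:00.000",
]

def count_by_hour(timestamps):
    """Count how many timestamps fall in each hour of the day (0-23)."""
    hours = (datetime.fromisoformat(ts).hour for ts in timestamps)
    return dict(sorted(Counter(hours).items()))

print(count_by_hour(sample_timestamps))  # {0: 1, 1: 2, 13: 1}
```

Note that the chart bins the hours (`bin = True`), so a single bar may cover more than one hour, while this helper always counts each hour separately.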

### Conclusion

In this example, we demonstrated how to create an external access integration and attach it to a UDF that loads data from a public endpoint. We also showed how to load semi-structured JSON data into a Snowflake table and work with it using SQL or Python. To learn more about external network access in Snowflake, refer to the documentation [here](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview).