# Loading Northwinds Data

### Introduction

In this lesson, we'll load some of our Northwinds data from Amazon S3 into Snowflake.  Let's get started.

### Getting started

We have uploaded sample Northwinds data from HubSpot and Mixpanel to S3.

<img src="./northwinds.png" width="100%">

And we can access these files at the following URLs:

* `s3://jigsaw-labs-student/northwinds/northwinds_hubspot.csv`
* `s3://jigsaw-labs-student/northwinds/northwinds_mixpanel.csv`

And remember that our API keys to read these files are the following:

* KEY_ID: `'AKIARIMMA5YSLC62OGJ4'`
* SECRET: `'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'`

Now use that information to create a stage for our HubSpot data.
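
As a rough sketch, a stage pointing at the HubSpot file might look like the following. The stage name `hubspot_stage` is our own assumption, and the `<KEY_ID>` and `<SECRET>` placeholders stand in for the key pair listed above.

```sql
-- Run this with the database that DBT can access selected.
-- hubspot_stage is an assumed name; swap in the real key pair for the placeholders.
CREATE OR REPLACE STAGE hubspot_stage
  URL = 's3://jigsaw-labs-student/northwinds/northwinds_hubspot.csv'
  CREDENTIALS = (AWS_KEY_ID = '<KEY_ID>' AWS_SECRET_KEY = '<SECRET>');
```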

> Make sure to create the stage in the database that DBT has access to.

Then let's see how we did by selecting the first six columns from that stage.
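
One way to do that, assuming the stage was named `hubspot_stage` (a hypothetical name), is to reference the staged columns by position:

```sql
-- $1, $2, ... refer to the first, second, ... field of each staged row.
-- hubspot_stage is an assumed stage name.
SELECT
  c.$1,
  c.$2,
  c.$3,
  c.$4,
  c.$5,
  c.$6
FROM @hubspot_stage AS c;
```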

> We should see something like the following:

<img src="./select-data.png" width="100%">

If we look at the last column, we can see that this data is a little off.  The issue is that some of our business names have commas in them -- like "Fadel, Luilwitz and Nitzsche" -- and Snowflake is reading these as separate values.  We can tell Snowflake not to split on commas that fall inside quotation marks by adding a FILE_FORMAT option of `FIELD_OPTIONALLY_ENCLOSED_BY='"'` to our stage, like so.

```sql
CREATE or replace STAGE ...
FILE_FORMAT = (FIELD_OPTIONALLY_ENCLOSED_BY='"');
```

So add this, and then we should see that we only have five columns, which look like the following:

<img src="./hubspot-select.png" width="100%">

Ok, now we're ready to load our data into a table.  Create a schema called `hubspot` and a table called `contacts` with five columns, one for each of the columns above.
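
The shape of those statements might look something like the sketch below. The column names `column_1` through `column_5` are placeholders, not the real names, and the `varchar(100)` type is an assumption, so replace both with whatever fits the five columns shown above.

```sql
CREATE SCHEMA IF NOT EXISTS hubspot;

-- column_1 ... column_5 are placeholder names; use the real column names instead.
-- varchar(100) is an assumed type that accepts any short string value.
CREATE OR REPLACE TABLE hubspot.contacts (
  column_1 varchar(100),
  column_2 varchar(100),
  column_3 varchar(100),
  column_4 varchar(100),
  column_5 varchar(100)
);
```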

Now it's time to copy in the data.  We can accomplish this without allowing for any errors, so do not include an `ON_ERROR=CONTINUE` parameter.  Instead, in the file format we need to provide two parameters: one to skip the first (header) row, and `FIELD_OPTIONALLY_ENCLOSED_BY='"'`.
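
A minimal sketch of such a copy statement, assuming the stage is named `hubspot_stage` (a hypothetical name) and the table lives at `hubspot.contacts`:

```sql
-- No ON_ERROR option is given, so Snowflake falls back to its default
-- of aborting the load if any row fails.
COPY INTO hubspot.contacts
FROM @hubspot_stage
FILE_FORMAT = (
  TYPE = 'CSV'
  SKIP_HEADER = 1                      -- skip the header row
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'   -- keep quoted commas inside one field
);
```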

Then if you select the first five rows from the table, you should see something like the following:

<img src="./first-five-rows.png" width="100%">

When that is accomplished, we're ready to move on to loading our Mixpanel data.

### Mixpanel Data

Now let's create a stage for our Mixpanel data.
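
This follows the same pattern as the HubSpot stage.  Here is a sketch, where `<KEY_ID>` and `<SECRET>` are placeholders for the key pair listed earlier, and where including the quoting option for this file is an assumption on our part:

```sql
-- Swap in the real key pair for the placeholders.
-- The FILE_FORMAT option is assumed to be needed here too; drop it if it is not.
CREATE OR REPLACE STAGE mixpanel_stage
  URL = 's3://jigsaw-labs-student/northwinds/northwinds_mixpanel.csv'
  CREDENTIALS = (AWS_KEY_ID = '<KEY_ID>' AWS_SECRET_KEY = '<SECRET>')
  FILE_FORMAT = (FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```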

When it's properly loaded, we should be able to perform the following select statement to see our loaded data.

```sql
select 
  c.$1,
  c.$2,
  c.$3,
  c.$4,
  c.$5,
  c.$6,
  c.$7,
  c.$8,
  c.$9,
  c.$10,
  c.$11,
  c.$12,
  c.$13,
  c.$14,
  c.$15
from @mixpanel_stage as c;
```

Ok, next it's time to create the relevant table.  We did this for you as well.

```sql
CREATE TABLE "FIVETRAN_DATABASE"."MIXPANEL"."CONTACTS" (
  distinct_id varchar(100),
  created varchar(100),
  email varchar(100),
  first_name varchar(100),
  last_name varchar(100),
  abandon_cart_count float,
  account_created_count float,
  gender varchar(100),
  registration_date varchar(100),
  city varchar(100),
  region varchar(100),
  last_event timestamp,
  last_purchase timestamp,
  last_search timestamp,
  last_share timestamp
);
```

Next *try* to copy over the data into our newly created table.  You should get the following error message.

<img src="./error-load.png" width="100%">

The issue is that the timestamp values in the CSV file are not in a format that Snowflake can recognize.  There are a couple of ways to handle this.

1. Transform before load 

We could transform the data before we load it into our table.  Look at the following documentation to see how. 

<img src="./snowflake-doc.png" width="100%">

2. Transform after load

Or we can transform the data after loading it into our database.  That's what we'll do.  

So to easily load our data into the contacts table, recreate the table with the timestamp columns changed to type `varchar(100)`.  The `CREATE or replace TABLE` statement below drops the existing table and creates the new one in a single step.

```sql
CREATE or replace TABLE "FIVETRAN_DATABASE"."MIXPANEL"."CONTACTS" (
  distinct_id varchar(100),
  created varchar(100),
  email varchar(100),
  first_name varchar(100),
  last_name varchar(100),
  abandon_cart_count float,
  account_created_count float,
  gender varchar(100),
  registration_date varchar(100),
  city varchar(100),
  region varchar(100),
  last_event varchar(100),
  last_purchase varchar(100),
  last_search varchar(100),
  last_share varchar(100)
);
```

Now, let's load in the data again, and this time we should not see any errors.
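
A minimal sketch of the load, assuming the file has a header row to skip and that quoted fields need the same handling as in the HubSpot file:

```sql
-- Both file format options are assumptions about how this file is laid out.
COPY INTO "FIVETRAN_DATABASE"."MIXPANEL"."CONTACTS"
FROM @mixpanel_stage
FILE_FORMAT = (
  TYPE = 'CSV'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
);
```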

<img src="./errors-seen.png" width="80%">

And then let's select the first couple of rows from our table to take a look at our data.

<img src="./selected-mix.png" width="80%">
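
Once the values are loaded as strings, the "transform after load" step could be sketched with `TRY_TO_TIMESTAMP`, which returns `NULL` rather than raising an error when a value cannot be parsed.  The format string below is purely an assumption for illustration, so adjust it to match how the timestamps actually appear in the loaded data.

```sql
-- 'MM/DD/YYYY HH24:MI' is a hypothetical format, not taken from the file.
-- The *_ts aliases are illustrative names for the converted columns.
SELECT
  distinct_id,
  TRY_TO_TIMESTAMP(last_event, 'MM/DD/YYYY HH24:MI') AS last_event_ts,
  TRY_TO_TIMESTAMP(last_purchase, 'MM/DD/YYYY HH24:MI') AS last_purchase_ts
FROM "FIVETRAN_DATABASE"."MIXPANEL"."CONTACTS"
LIMIT 5;
```

Rows where the converted column comes back `NULL` while the original string is not empty point to values that do not match the chosen format.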

At this point, it's probably a good idea to confirm that we can access the newly created tables from DBT.

> <img src="./load-from-dbt.png" width="80%">

### Summary

In this lesson, we practiced loading data into our database using staging.  To initially load our timestamp data -- which Snowflake had a hard time interpreting -- we loosened the restrictions on that data by changing the column type to a string (`varchar`).

### Resources

[Transform timestamp](https://docs.snowflake.com/en/user-guide/data-load-transform.html#current-time-current-timestamp-default-column-values)

[Snowflake copy into](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html)