# Loading Data With Staging

### Introduction

In this lesson, we'll see how to load data into our Snowflake instance: first into a stage, and from there into a Snowflake table.

### Loading Data to Staging

We have already loaded a CSV file of movies into AWS S3.

> <img src="./aws-s3.png" width="100%">

We have also created a user in AWS that has access to this S3 data.  To load the data into our Snowflake instance, the first step is to create a *stage*.

### Viewing the stage UI

In Snowflake, a stage is where we can store raw data before it is loaded into a Snowflake table.  Storing the data in a stage first allows us to select only certain rows to move into a table, or to transform the data before loading it into a table.

Ok, so let's see how we can create a stage.

We can get to the user interface for staging by clicking on the databases dashboard.

> <img src="./db-dashboard.png" width="60%">

From there, we can click on the database that we want to create the stage for, and then click on the panel for stages.

> For this lesson, we can use the `labs_db` database.

> <img src="./stage-panel.png" width="100%">

Now click on the create button to the left under `stage`, and then choose the option for `aws s3`.

> <img src="./stage-s3.png" width="100%">

Then we can fill in the following information, and after the form is filled out, click on `Show SQL`.

> <img src="./imdb_stage.png" width="100%">

The information is the following:
    
* Name: `imdb_stage`
* Schema name: `IMDB`
* URL: `s3://jigsaw-labs-student/imdb_movies.csv`
* AWS Key ID: `AKIARIMMA5YSLC62OGJ4`
* AWS Secret ID: `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`

Once we click on Show SQL, we can copy that SQL into the worksheet.  

> Do not yet run that SQL -- we'll do so below.

### Creating the Stage in Snowflake

Before we create the stage, we first need to go to a Snowflake worksheet and specify that we are using the correct database.

```sql
Use database labs_db;
```
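
As an optional check, we can confirm which database and schema the worksheet is pointing at.  This is a small sketch that assumes the `IMDB` schema already exists in `labs_db`, as the stage form above suggests.

```sql
-- Optional: also set the schema, so unqualified names resolve to IMDB.
use schema imdb;

-- Confirm the database and schema the worksheet is currently using.
select current_database(), current_schema();
```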

And from there we can create the stage using the SQL we generated above.

> Notice that we have to fill in the AWS secret key.

```sql
CREATE STAGE "LABS_DB"."IMDB".imdb_stage URL = 's3://jigsaw-labs-student/imdb_movies.csv'
CREDENTIALS = (AWS_KEY_ID = 'AKIARIMMA5YSLC62OGJ4' AWS_SECRET_KEY = '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');
```

Once our stage is created, we can find some information about it. We reference a stage by preceding the stage name with an `@` symbol.  
The first thing we can do is list the files in the stage, along with some metadata about them, with the following:

```sql
list @imdb.imdb_stage;
```

> <img src="./stage-meta.png" width="100%">
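
Besides listing the files, we can also inspect the stage object itself.  The following is a sketch using Snowflake's `SHOW STAGES` and `DESCRIBE STAGE` commands; the exact columns returned may differ between Snowflake versions.

```sql
-- List the stages defined in the IMDB schema.
show stages in schema labs_db.imdb;

-- Show the properties of our stage, such as its URL and file format defaults.
describe stage imdb.imdb_stage;
```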

And from there, we can even select data from the stage.

```sql
select 
  c.$1 as title,
  c.$2 as genre,
  c.$3 as budget,
  c.$4 as runtime,
  c.$5 as year,
  c.$6 as month,
  c.$7 as revenue
from @imdb.imdb_stage as c;
```

<img src="./access-data.png" width="80%">

> So above, we reference the stage with the `@` symbol, alias it as `c`, and specify each column to select by its position.
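
Because the query above applies no file format options, we would expect the header row of the CSV file to come back as a row of data.  One way around this is to define a named file format and pass it when querying the stage.  This is a sketch: the name `imdb_csv_format` is our own choice and is not used elsewhere in this lesson.

```sql
-- Hypothetical named file format; imdb_csv_format is an assumed name.
create or replace file format imdb.imdb_csv_format
  type = 'CSV'
  skip_header = 1;

-- Query the stage with that file format, so the header line is skipped.
select
  c.$1 as title,
  c.$2 as genre,
  c.$3 as budget
from @imdb.imdb_stage (file_format => 'imdb.imdb_csv_format') c
limit 10;
```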

Now that we have successfully created a stage in Snowflake, the next step is to copy this data into a table.

### Copying data into a table

Before we can copy data into a table, we first need to create the table.  Now that we've had a peek at the data, we have a sense of what our table should look like.  We can create our table with the following:

```sql
create table "LABS_DB"."IMDB"."movies" (
  title varchar (100),  
  genre string,
  budget integer,
  runtime integer,
  year integer,
  month integer,
  revenue integer
);
```

> Notice that our table does not have an id column.  This is because we need the number of columns in our table to match the number of columns in our CSV file.
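
If the columns did not line up, Snowflake also lets us copy from a `select` over the stage, choosing which staged columns go into which table columns.  The following is only a sketch of that form, loading just the first three columns; we do not need to run it for this lesson.

```sql
-- Sketch only: load the first three CSV columns into the matching table columns.
-- The remaining table columns would be left as NULL.
copy into "LABS_DB"."IMDB"."movies" (title, genre, budget)
from (
  select c.$1, c.$2, c.$3
  from @imdb.imdb_stage c
)
file_format = (skip_header = 1);
```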

Next we can copy over the data with the following:

```sql
COPY INTO "LABS_DB"."IMDB"."movies" FROM @imdb.imdb_stage
file_format = (skip_header = 1);
```

In the second line, we specify a file format option that skips one line at the top of the file, as the first row in the CSV file holds our column names.

If we run the command, we'll see an error telling us that some of our data is not properly formatted.

<img src="./copy-into-error.png" width="100%">

At this point, we have the option of properly cleaning the data before loading it into our database.  But Snowflake also mentions an `ON_ERROR=CONTINUE` option, which will skip over our poorly formatted rows.  
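
If we want to see which rows are causing trouble before deciding, one approach is a dry run with the `VALIDATION_MODE` option of `COPY INTO`.  The sketch below should report the errors without loading any data.

```sql
-- Dry run: return the rows that would fail, without loading anything.
COPY INTO "LABS_DB"."IMDB"."movies" FROM @imdb.imdb_stage
file_format = (skip_header = 1)
VALIDATION_MODE = RETURN_ERRORS;
```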

Let's take advantage of simply skipping over our errors for now.  

We can specify that parameter with the following:

```sql
COPY INTO "LABS_DB"."IMDB"."movies" FROM @imdb.imdb_stage
ON_ERROR=CONTINUE
file_format = (skip_header = 1);
```

> <img src="./load-results.png" width="100%">
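
Since `ON_ERROR=CONTINUE` skips rows silently, it can be useful to look at what was rejected.  As a sketch, Snowflake's `VALIDATE` table function returns the errors from a previous `COPY INTO`; here `'_last'` refers to the most recent load in the current session.

```sql
-- Show the rows that were rejected by the most recent COPY INTO in this session.
select *
from table(validate("LABS_DB"."IMDB"."movies", job_id => '_last'));
```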

And from there, we can select our data directly from our table.

<img src="./select-table.png" width="80%">
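
A query along the following lines would do this.  Note that because the table was created as `"movies"` in double quotes, its name is case-sensitive, so we keep the quotes when selecting from it.

```sql
-- Preview a few loaded rows; the quoted, lowercase table name is required.
select * from "LABS_DB"."IMDB"."movies" limit 10;

-- Count how many rows made it into the table.
select count(*) as loaded_rows from "LABS_DB"."IMDB"."movies";
```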

### Summary

In this lesson, we saw how to create stages in Snowflake.  We did so by creating an S3 stage and generating the SQL through the UI.

Then we created the stage, and viewed some of the data in our stage.

```sql
CREATE STAGE "LABS_DB"."IMDB".imdb_stage URL = 's3://jigsaw-labs-student/imdb_movies.csv'
CREDENTIALS = (AWS_KEY_ID = 'AKIARIMMA5YSLC62OGJ4' AWS_SECRET_KEY = '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');
```

```sql
select 
  c.$1 as title,
  c.$2 as genre,
  c.$3 as budget,
  c.$4 as runtime,
  c.$5 as year,
  c.$6 as month,
  c.$7 as revenue
from @imdb.imdb_stage as c;
```

From there, we created the relevant table, and copied data from our stage into that table.

```sql
COPY INTO "LABS_DB"."IMDB"."movies" FROM @imdb.imdb_stage
ON_ERROR=CONTINUE
file_format = (skip_header = 1);
```