
Snowflake Guide: Auto-Ingest Twitter Data into Snowflake

➡️ Complete this end-to-end tutorial on guides.snowflake.com

This demo shows how to auto-ingest streaming and event-driven data from Twitter into Snowflake using Snowpipe. By completing this demo you will have built a Docker image containing a Python application that listens for and saves live tweets; those tweets are then loaded into Snowflake using an AWS S3 bucket as a file stage.

The lessons learned in this demo can be applied to any streaming or event-driven data source.
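
Under the hood, each raw tweet is saved as a JSON document and loaded into a single VARIANT column in Snowflake. A minimal sketch of such a landing table is shown below; the database, schema, and column names are illustrative, not necessarily those used by the included 0_setup_twitter_snowpipe.sql script:

-- Illustrative landing table: one VARIANT column per row holding the raw tweet JSON.
-- The real DDL is defined in 0_setup_twitter_snowpipe.sql and may use different names.
CREATE DATABASE IF NOT EXISTS twitter_db;
CREATE TABLE IF NOT EXISTS twitter_db.public.tweets (
  tweet VARIANT
);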

The core topics covered in this demo include:

  1. Data Loading: load Twitter streaming data into Snowflake in an event-driven, real-time fashion with Snowpipe
  2. Semi-structured data: query semi-structured data (JSON) without needing transformations (see the SQL sketch after this list)
  3. Secure Views: create a secure view that allows data analysts to query the data
  4. Snowpipe: overview and configuration
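
Once the raw JSON lands in a VARIANT column, you can query nested fields directly with Snowflake's dot-path notation and publish a curated subset to analysts through a secure view. A minimal sketch, assuming the illustrative twitter_db.public.tweets table above; the field paths follow the Twitter API tweet payload, and the actual view definition lives in 0_setup_twitter_snowpipe.sql:

-- Query nested JSON fields directly, casting to SQL types on the fly
SELECT
    tweet:created_at::string       AS created_at,
    tweet:user.screen_name::string AS screen_name,
    tweet:text::string             AS tweet_text
FROM twitter_db.public.tweets;

-- A secure view hides the underlying definition and exposes a curated subset to analysts
CREATE OR REPLACE SECURE VIEW twitter_db.public.tweets_v AS
SELECT
    tweet:created_at::string       AS created_at,
    tweet:user.screen_name::string AS screen_name,
    tweet:text::string             AS tweet_text
FROM twitter_db.public.tweets;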

Architecture:

[Figure: Twitter to Snowflake auto-ingest architecture]

INSTRUCTIONS

PREREQUISITES

You will need:

  • A Snowflake account
  • An AWS account with an S3 bucket
  • A Twitter developer account with API keys
  • Docker installed on your machine

SETUP SCRIPT

1. Download the repository

Clone this repository locally:

git clone https://github.com/Snowflake-Labs/demo-twitter-auto-ingest

Navigate to the repository you just cloned:

cd demo-twitter-auto-ingest

2. Add your AWS and Twitter keys

Use your text editor of choice to edit the following files:

  • Dockerfile (lines 9 to 16)
  • 0_setup_twitter_snowpipe.sql (lines 23 to 25)

As you will see in those files, you will also need to specify your AWS S3 bucket (where the data will be stored) and a default search keyword.
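
For orientation, the part of 0_setup_twitter_snowpipe.sql that takes your AWS details typically defines an external stage along these lines; the bucket name and key values below are placeholders, and the exact statement in the script may differ:

-- Illustrative external stage over your S3 bucket (all values are placeholders)
CREATE OR REPLACE STAGE twitter_db.public.tweets_stage
  URL = 's3://<your-bucket-name>/'
  CREDENTIALS = (AWS_KEY_ID = '<your_aws_key_id>' AWS_SECRET_KEY = '<your_aws_secret_key>')
  FILE_FORMAT = (TYPE = 'JSON');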

3. Build the image

While in your demo-twitter-auto-ingest directory, run:
docker build . -t snowflake-twitter

This command builds an image from the Dockerfile in the current directory and tags the resulting image as snowflake-twitter.

The last two lines of the output should look similar to the following:

Successfully built c1c0b7262436
Successfully tagged snowflake-twitter:latest

Note: in the example above, c1c0b7262436 is the image ID; yours will likely be different.

4. Run the image

$ docker run --name <YOUR_CONTAINER_NAME> snowflake-twitter:latest <YOUR_TWITTER_KEYWORD>

Example (searching for #wednesdaymotivation):

$ docker run --name twitter-wednesdaymotivation snowflake-twitter:latest wednesdaymotivation

At this point you should be able to see the tweets coming in... (every . represents two tweets)

5. Configure Snowpipe in Snowflake

  • Log into your Snowflake demo account and load the 0_setup_twitter_snowpipe.sql script (the one you edited in step 2).
  • Execute the script one statement at a time.
  • Make sure to configure event notifications on your AWS S3 bucket as described in the Snowflake Snowpipe documentation (see the sketch below for how to find the SQS queue to notify).
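
For reference, the auto-ingest part of the script boils down to a pipe with AUTO_INGEST enabled. The sketch below reuses the illustrative names from earlier (the real ones come from 0_setup_twitter_snowpipe.sql); the notification_channel column returned by SHOW PIPES is the SQS queue ARN that your S3 bucket's event notifications must target:

-- Illustrative auto-ingest pipe: copies new JSON files from the stage into the landing table
CREATE OR REPLACE PIPE twitter_db.public.tweets_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO twitter_db.public.tweets
  FROM @twitter_db.public.tweets_stage;

-- The notification_channel column in the SHOW PIPES output holds the SQS ARN
-- to use when configuring the S3 bucket's event notifications.
SHOW PIPES LIKE 'tweets_pipe' IN SCHEMA twitter_db.public;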

6. Stop your container

Once you have finished with the setup, it is important to stop your container so that you do not hit your Twitter API rate limits.

Go back to your terminal, open a new tab (⌘T on macOS), and execute the following command:

docker stop <YOUR_CONTAINER_NAME>

Note: the container has a "safety" timeout of 15 minutes.
