tweets-etl

ETL process for loading tweets from suspicious users into DynamoDB.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development purposes.

Prerequisites

Download the data:

https://s3.amazonaws.com/astscrmltestdata/RealUsers.tar.gz

Decompress it. You will find a single directory containing 25,000 files, each of which represents one Twitter user and all of that user's tweets. Each file is a single compressed text file. Each line of a decompressed file is one tweet, stored as a valid JSON object. The JSON fields are described in the Twitter API documentation for the Tweet and User objects:

https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object.html

Create a "Data" directory in the root of this repository and move the 25,000 .gz files there. The code will decompress them for you, no worries.
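As an illustration of the layout described above (a directory of gzip-compressed text files, one JSON tweet per line), the files can be read without decompressing them to disk first. This is a minimal sketch and not necessarily how load.py does it; the function names iter_tweets and iter_all_tweets are hypothetical, and the glob pattern assumes the files keep their .gz extension.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_tweets(path: Path) -> Iterator[dict]:
    """Yield one parsed tweet per non-empty line of a gzip-compressed file."""
    # "rt" opens the compressed stream in text mode, so the file is
    # decompressed on the fly while it is read line by line.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def iter_all_tweets(data_dir: Path) -> Iterator[dict]:
    """Yield every tweet from every .gz file in data_dir (assumed layout)."""
    for path in sorted(data_dir.glob("*.gz")):
        yield from iter_tweets(path)
```

For example, `sum(1 for _ in iter_all_tweets(Path("Data")))` would count the tweets across all user files while holding only one line in memory at a time.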

Installing

Install Docker Engine (https://www.docker.com/products/docker-engine).

Run the following commands from the Docker console.

Build the Docker image:

docker build -t tweets-etl .

Run a container from the image you just built:

docker run tweets-etl

Built With

Boto3 - The AWS SDK for Python
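For orientation, here is a rough sketch of how tweet JSON lines might be written to DynamoDB with Boto3. It is an assumption about the general approach, not a copy of load.py: the helper names to_item and write_items are hypothetical, and the table object is passed in so the snippet itself needs no AWS access. The float handling reflects the fact that Boto3's DynamoDB resource layer rejects Python float values and expects Decimal instead.

```python
import json
from decimal import Decimal
from typing import Any, Iterable


def to_item(line: str) -> dict:
    """Parse one JSON line into a dict suitable for a DynamoDB put."""
    # Parse JSON floats as Decimal, because Boto3's DynamoDB resource
    # layer does not accept Python floats.
    return json.loads(line, parse_float=Decimal)


def write_items(table: Any, items: Iterable[dict]) -> int:
    """Write items through the table's batch writer and return the count."""
    count = 0
    # batch_writer() buffers put requests and sends them in batches.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count
```

With Boto3 installed and AWS credentials configured, a table object can be obtained with `boto3.resource("dynamodb").Table("tweets")`, where the table name "tweets" is only a placeholder. Each item must contain the key attributes defined in that table's schema.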

Authors

Carlos Montenegro - Initial work - carlos2606

See also the list of contributors who participated in this project.
