Data Engineering Course: Building A Data Platform

What We Want To Do

  • Use Twitter data to predict the best time to post with the hashtag #datascience or #ai

  • Find the top tweets of the day

  • Find the top users

  • Analyze sentiment and keywords
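The four goals above come down to a few aggregations over tweet records, so it can help to see them in plain Python before any platform pieces exist. This is a minimal sketch under stated assumptions: the field names (user, created_at, likes, retweets, text), the ISO timestamp format, the likes-plus-retweets engagement score, and the tiny sentiment word lists are hypothetical choices for illustration. They are not the Twitter API schema or the course's actual method.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical, already-parsed tweets. Field names and the ISO timestamp
# format are assumptions for this sketch, not the Twitter API payload.
TWEETS = [
    {"user": "alice", "created_at": "2019-03-01T09:15:00", "likes": 10, "retweets": 2,
     "text": "Great intro to #datascience pipelines"},
    {"user": "bob", "created_at": "2019-03-01T09:40:00", "likes": 4, "retweets": 0,
     "text": "Slow training run again #ai"},
    {"user": "alice", "created_at": "2019-03-01T18:05:00", "likes": 30, "retweets": 10,
     "text": "I love this #ai paper"},
    {"user": "carol", "created_at": "2019-03-01T18:30:00", "likes": 8, "retweets": 1,
     "text": "Good #datascience pipelines reading list"},
    {"user": "dave", "created_at": "2019-03-01T12:00:00", "likes": 50, "retweets": 5,
     "text": "Lunch time"},
]

TOPICS = {"#datascience", "#ai"}
STOPWORDS = {"a", "again", "i", "the", "this", "to"}
# Toy sentiment lexicon (an assumption); a real pipeline would use a proper model.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "hate", "slow"}


def tokens(text):
    """Lowercased words with surrounding punctuation stripped; hashtags keep their '#'."""
    return [w.strip(".,!?").lower() for w in text.split()]


def on_topic(tweets):
    """Keep only tweets that use one of the tracked hashtags."""
    return [t for t in tweets if TOPICS & set(tokens(t["text"]))]


def engagement(tweet):
    """Hypothetical engagement score: likes plus retweets."""
    return tweet["likes"] + tweet["retweets"]


def best_hour(tweets):
    """Hour of day (0-23, in the timestamps' time zone) with the highest mean engagement."""
    total, count = defaultdict(int), defaultdict(int)
    for t in tweets:
        hour = datetime.fromisoformat(t["created_at"]).hour
        total[hour] += engagement(t)
        count[hour] += 1
    return max(total, key=lambda h: total[h] / count[h])


def top_tweets(tweets, n=3):
    """The n tweets with the highest engagement."""
    return sorted(tweets, key=engagement, reverse=True)[:n]


def top_users(tweets, n=3):
    """Users ranked by number of tweets, as (user, count) pairs."""
    return Counter(t["user"] for t in tweets).most_common(n)


def sentiment(text):
    """Positive minus negative lexicon hits; above 0 leans positive, below 0 negative."""
    words = tokens(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def keywords(tweets, n=5):
    """Most frequent words that are neither hashtags nor stopwords."""
    counts = Counter(
        w for t in tweets for w in tokens(t["text"])
        if not w.startswith("#") and w not in STOPWORDS
    )
    return counts.most_common(n)
```

On the sample data, `on_topic(TWEETS)` keeps four tweets, `best_hour` of those returns 18, and `keywords(selected, 1)` returns `[("pipelines", 2)]`. Each function is essentially a filter or a group-by, so the same logic maps fairly directly onto SparkSQL queries once the data flows through Kafka.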

Thoughts On Choosing A Development Environment

A local environment needs a good PC, so I thought a bit about a budget build of around 1,000 dollars or euros.

Podcast Episode: #068 How to Build a Budget Data Science PC
In this podcast we look into configuring a sub-1,000-dollar PC for data engineering and machine learning.
Watch on YouTube / Listen on Anchor

A Look Into the Twitter API

Podcast Episode: #081 Twitter API Research
In this podcast we look into how the Twitter API works and how to get access to it.
Watch on YouTube

Ingesting Tweets with Apache NiFi

Podcast Episode: #082 Reading Tweets With Apache Nifi & IaaS vs PaaS vs SaaS
In this podcast we try to read Twitter data with NiFi.
Watch on YouTube
Podcast Episode: #085 Trying to read Tweets with Nifi Part 2
We look into the Big Data landscape chart and try again to read Twitter data with NiFi.
Watch on YouTube

Writing from NiFi to Apache Kafka

Podcast Episode: #086 How to Write from Nifi to Kafka Part 1
I’ve been working a lot on the cookbook because it’s so much fun, and I have to tell you what I added. Then we try to write the tweets from Apache NiFi into Kafka and also talk about Kafka basics.
Watch on YouTube
Podcast Episode: #088 How to Write from Nifi to Kafka Part 2
In this podcast we finally figure out how to write to Kafka from NiFi. The problem was the network configuration of the Docker containers.
Watch on YouTube
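Since the fix in Part 2 came down to the Docker network configuration, a useful first check when a container cannot write to Kafka is whether the broker address accepts TCP connections at all from inside that container. On a user-defined Docker network, containers can reach each other by container name. This is a minimal standard-library sketch; the function name broker_reachable is hypothetical, and it only tests the TCP connection, not the Kafka protocol or the broker's listener settings.

```python
import socket


def broker_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened (hypothetical helper)."""
    try:
        # create_connection resolves the host name and tries each resolved address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers DNS failures (socket.gaierror), refused connections and timeouts
        return False
```

For example, call `broker_reachable("kafka", 9092)` from a container attached to the app-tier network; the container name kafka and port 9092 are assumptions about your setup. A False result points at the host name or the network; a True result combined with failing writes often points at the broker's advertised listeners instead.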

Apache Zeppelin

Install Zeppelin and Ingest a Kafka Topic

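Start the Zeppelin container. Adjust the /Users/xxxx/... paths to your own machine. The -p 8081:8080 mapping publishes Zeppelin's port 8080 on port 8081 of the host, and --network app-tier attaches the container to the app-tier Docker network: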

docker run -d -p 8081:8080 --rm \
-v /Users/xxxx/Documents/DockerFiles/logs:/logs \
-v /Users/xxxx/Documents/DockerFiles/Notebooks:/notebook \
-e ZEPPELIN_LOG_DIR='/logs' \
-e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
--network app-tier --name zeppelin apache/zeppelin:0.7.3

Processing Messages with Spark and SparkSQL

Visualizing Data

Switch Processing from Zeppelin to Spark

Install Spark

Ingest Messages from Kafka

Writing from Spark to Kafka

Move Zeppelin Code to Spark