Full-Text Search System to stream, collect, clean, store and filter data collected from different sources using Docker, Kafka, Elasticsearch, Kibana, Java, Spring Boot, Spring Kafka, Spring Data Elasticsearch, Twitter HBC, Jsoup and Angular
This project designs and implements a system that streams, captures, cleans and stores data, and lets users run full-text searches and filter the data collected from different sources (social media and the world wide web). It has the following services:
- Producer service: extracts data from different sources (social media and the web), cleans it and sends it to Kafka. It also provides APIs to configure and launch the social media (Twitter) streaming and the crawling of web sites. Built with Java, Spring Boot, Spring Kafka and Maven.
- Consumer service: a Kafka consumer that reads messages from Kafka, processes them and indexes them into Elasticsearch. It also provides APIs for full-text searching and filtering the data in Elasticsearch. Built with Java, Spring Boot, Spring Kafka, Spring Data Elasticsearch and Maven.
- FrontEnd app: a simple Angular UI application to configure and launch the real-time social media (Twitter) streamer and the web crawler. It also provides a simple full-text search on the collected data.
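The cleaning rules applied by the producer service are not described here. As a rough illustration only, the sketch below shows one way raw social media text could be normalized before it is sent to Kafka. The class name `TextCleaner` and the specific rules (dropping URLs, removing the `@` and `#` prefixes, collapsing whitespace) are assumptions, not the project's actual implementation.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: normalizes raw post text before it is published. */
public class TextCleaner {
    // Assumed rules; the real producer may clean data differently.
    private static final Pattern URL = Pattern.compile("https?://\\S+");
    private static final Pattern TAG_PREFIX = Pattern.compile("[@#](\\w+)");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    public static String clean(String raw) {
        if (raw == null) {
            return "";
        }
        // 1. Drop links, which carry no searchable text.
        String text = URL.matcher(raw).replaceAll(" ");
        // 2. Keep the word behind a mention or hashtag, drop the symbol.
        text = TAG_PREFIX.matcher(text).replaceAll("$1");
        // 3. Collapse runs of whitespace and trim the ends.
        return SPACES.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("Loving @ApacheKafka   streams! #BigData https://t.co/abc"));
    }
}
```

Keeping the cleaning logic in a small pure function like this makes it easy to unit test separately from the streaming and crawling code.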
The next diagram shows the system architecture.
- The admin of the system configures and launches the real-time social media streaming (Twitter for this case study) and the web crawling of any other web site using a simple UI app. The streamed data is sent to the Kafka producer.
- The Kafka consumer in the consumer service reads the data and converts it into documents for Elasticsearch.
- Elasticsearch receives the data that came through Kafka, then indexes and stores it.
- The admin can use Kibana to visualize, monitor and manage the data.
- The user can use a simple UI search app to run full-text searches and filter the collected data.
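To make the flow above concrete, here is a deliberately simplified, in-memory sketch. A `BlockingQueue` stands in for the Kafka topic and a tiny inverted index stands in for Elasticsearch. Neither reflects how the real services are implemented, and all class and method names are hypothetical.

```java
import java.util.*;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy model of the pipeline: producer -> queue ("topic") -> consumer -> index -> search. */
public class PipelineSketch {
    // Stand-in for the Kafka topic that decouples producer and consumer.
    private final BlockingQueue<String> topic = new LinkedBlockingQueue<>();
    // Stand-in for the Elasticsearch index: stored documents plus token -> document ids.
    private final List<String> documents = new ArrayList<>();
    private final Map<String, Set<Integer>> invertedIndex = new HashMap<>();

    /** Producer side: publish one cleaned message. */
    public void produce(String message) {
        topic.add(message);
    }

    /** Consumer side: drain the queue and index every message; returns the batch size. */
    public int consumeAndIndex() {
        List<String> batch = new ArrayList<>();
        topic.drainTo(batch);
        for (String doc : batch) {
            int id = documents.size();
            documents.add(doc);
            for (String token : tokenize(doc)) {
                invertedIndex.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return batch.size();
    }

    /** Search side: return the documents that contain every query term. */
    public List<String> search(String query) {
        Set<Integer> hits = null;
        for (String token : tokenize(query)) {
            Set<Integer> ids = invertedIndex.getOrDefault(token, Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        List<String> result = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                result.add(documents.get(id));
            }
        }
        return result;
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        PipelineSketch p = new PipelineSketch();
        p.produce("Kafka streams tweets in real time");
        p.produce("Elasticsearch indexes crawled web pages");
        p.consumeAndIndex();
        System.out.println(p.search("kafka tweets"));
    }
}
```

The point of the queue in the middle is that the producer never calls the index directly, so either side can be restarted or scaled without the other noticing.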
This project is created using the following technologies:
- Java 8
- Maven Dependency Management
- Spring Boot:
  - Spring Web
  - Spring Kafka
  - Spring Data Elasticsearch
  - Spring Actuator
- Twitter hbc
- Jsoup
- Docker
- Apache Kafka and Apache Zookeeper (required for Kafka)
- Elasticsearch
- Kibana
- Angular
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
You need to install the following software:
- Java JDK 1.8+
- Maven 3.0+
- Git client
- Docker Compose. For help: To install docker-compose
- Twitter API credentials: set up a Twitter application account and get the Twitter app credentials from https://apps.twitter.com/. For help: How to create a Twitter application account
- Angular 6+
To run this project, install it locally as follows:
- Clone the application

      git clone https://github.com/SayedBaladoh/Full-Text-Search-using-Docker-Kafka-Elasticsearch-Kibana-Java-Spring-Boot-HBC-Jsoup-Angular.git
- Change the Twitter configuration to use your API Key, API Secret Key, Access Token and Access Token Secret
  - Open the producer/src/main/resources/application.yml file
  - Change the twitter auth properties:

        social:
          twitter:
            auth:
              apiKey: API_KEY
              apiSecretKey: API_SECRET_KEY
              accessToken: ACCESS_TOKEN
              accessTokenSecret: ACCESS_TOKEN_SECRET
- Start Kafka, Zookeeper, Elasticsearch and Kibana using docker-compose
  The project includes a docker-compose.yml file, so you can use Docker Compose to start them all; no separate installation is needed.

      cd solution_directory
      docker-compose up -d
- Check that Kafka, Zookeeper, Elasticsearch and Kibana are running
  From the command prompt:

      docker ps -a

  The output should list the Kafka, Zookeeper, Elasticsearch and Kibana containers.
  You can also check that Elasticsearch and Kibana are running by opening the following URLs in your browser:
  - http://localhost:9200/ (Elasticsearch)
  - http://localhost:5601/ (Kibana)
- Run the Producer service application
  You can start the producer service by typing the following commands:

      cd producer
      mvn spring-boot:run

  The producer service will start on port 8081, so you'll be able to reach it at http://localhost:8081.
  - http://localhost:8081/producer/actuator/info (to view info about the producer service)
  - http://localhost:8081/producer/actuator/health (to check the health of the producer service)
- Run the Consumer service application
  You can start the consumer service by typing the following commands:

      cd consumer
      mvn spring-boot:run

  The consumer service will start on port 8082, so you'll be able to reach it at http://localhost:8082.
  - http://localhost:8082/consumer/actuator/info (to view info about the consumer service)
  - http://localhost:8082/consumer/actuator/health (to check the health of the consumer service)
- Start the Frontend application
  You can start the UI application by typing the following commands:

      cd front-end
      npm install
      ng serve

  The UI app will start on port 4200 by default, so once it has started successfully you'll be able to visit it at http://localhost:4200
- Package the applications
  You can also package each application as a jar file and then run it like so:

      cd service_directory
      mvn clean package
      java -jar target/service_name-0.0.1-SNAPSHOT.jar

  - service_directory: the directory of the service.
  - service_name: the name of the service.
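The actuator health endpoints listed for the producer and consumer services can also be checked from code instead of a browser. The sketch below assumes the default Spring Boot Actuator response, a JSON body that begins with `"status":"UP"` when the service is healthy. The class name `HealthProbe` is hypothetical, and the string check is a shortcut that avoids a JSON library, not a full parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

/** Hypothetical helper that polls the actuator health endpoints. */
public class HealthProbe {
    /** True when the body looks like {"status":"UP", ...} (assumed default response shape). */
    public static boolean reportsUp(String body) {
        return body != null
                && body.replaceAll("\\s+", "").startsWith("{\"status\":\"UP\"");
    }

    /** Fetches the endpoint and returns false on any error or non-200 response. */
    public static boolean isUp(String endpoint) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            if (conn.getResponseCode() != 200) {
                return false;
            }
            try (InputStream in = conn.getInputStream();
                 Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
                return reportsUp(s.hasNext() ? s.next() : "");
            }
        } catch (IOException e) {
            return false; // service not reachable
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String[] endpoints = {
            "http://localhost:8081/producer/actuator/health",
            "http://localhost:8082/consumer/actuator/health"
        };
        for (String url : endpoints) {
            System.out.println(url + " -> " + (isUp(url) ? "UP" : "DOWN"));
        }
    }
}
```

Running `main` after both services have started should report each endpoint as UP; a DOWN result usually means the service has not finished starting or is listening on a different port.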
To access the frontend application, open http://localhost:4200 in your browser.
Now take a look at the UI app: you will find two links (Settings | Search).
- Settings tab: use Settings to configure and launch the real-time social media (Twitter) streaming and the web crawling to collect the data you need.
- Search tab: the Search link provides full-text search on the collected data.
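The search request itself is not shown here. As one hedged possibility, a full-text search with a filter can be expressed in the Elasticsearch Query DSL as a `bool` query that combines a `multi_match` clause with a `term` filter. The field names `title`, `content` and `source` below are assumptions about the index mapping, and the helper class is hypothetical; the consumer service may build its queries differently, for example through Spring Data Elasticsearch.

```java
/** Hypothetical builder for a full-text search request body with one filter. */
public class SearchQuerySketch {
    /** Escapes characters that would break a JSON string literal. */
    public static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /**
     * Builds the request body: full-text match on the assumed "title" and "content"
     * fields, filtered by an exact value of the assumed keyword field "source".
     */
    public static String buildQuery(String text, String source) {
        return "{\"query\":{\"bool\":{"
             + "\"must\":{\"multi_match\":{\"query\":\"" + jsonEscape(text) + "\","
             + "\"fields\":[\"title\",\"content\"]}},"
             + "\"filter\":[{\"term\":{\"source\":\"" + jsonEscape(source) + "\"}}]"
             + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("kafka streaming", "twitter"));
    }
}
```

The `must` clause contributes to relevance scoring while the `filter` clause only narrows the result set, which is why the source restriction sits in `filter`. Note that a `term` filter matches exact values, so it only behaves as expected if `source` is mapped as a keyword field.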
I am Sayed Baladoh, PhD, Senior Software Engineer. I like software development. You can contact me via:
Any improvement or comment about the project is always welcome! Just as others have shared their code publicly, I want to share mine. Thanks!
Thanks for reading.
Did I help you?
- Share it with someone you think might find it helpful.
- Give a star to this project