Full-Text Search System to stream, collect, clean, store and filter data collected from different sources using Docker, Kafka, Elasticsearch, Kibana, Java, Spring Boot, Spring Kafka, Spring Data Elasticsearch, Twitter HBC, Jsoup and Angular

SayedBaladoh/Full-Text-Search-using-Docker-Kafka-Elasticsearch-Kibana-Java-Spring-Boot-HBC-Jsoup-Angular

This project designs and implements a system that streams, captures, cleans and stores data collected from different sources (social media and the World Wide Web), and lets users run full-text searches and filters over that data. It consists of the following services:

  • Producer service: extracts data from different sources (social media and the web), cleans it and sends it to Kafka. It also provides APIs to configure and launch streaming from social media (Twitter) and crawling of websites. Built with Java, Spring Boot, Spring Kafka and Maven.

  • Consumer service: a Kafka consumer that reads messages from Kafka, processes them and indexes them into Elasticsearch. It also provides APIs for full-text searching and filtering of the indexed data. Built with Java, Spring Boot, Spring Kafka, Spring Data Elasticsearch and Maven.

  • Front-end app: a simple Angular UI to configure and launch the real-time social media (Twitter) streamer and the web crawler. It also provides full-text search over the collected data.
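
As an illustration of the "clean" step in the producer service, here is a minimal sketch using only the JDK. The README does not say which cleaning rules the producer actually applies, so the class name TextCleaner and the rules below (strip HTML tags, strip URLs, collapse whitespace) are assumptions, not the project's real implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the cleaning rules here are assumptions for illustration only.
final class TextCleaner {
    private static final Pattern HTML_TAGS = Pattern.compile("<[^>]+>");
    private static final Pattern URLS = Pattern.compile("https?://\\S+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private TextCleaner() {}

    /** Removes HTML tags and URLs, then collapses runs of whitespace into one space. */
    static String clean(String raw) {
        if (raw == null) {
            return "";
        }
        // Tags go first so that URLs inside attributes disappear with the tag.
        String text = HTML_TAGS.matcher(raw).replaceAll(" ");
        text = URLS.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // prints: Kafka rocks! see now
        System.out.println(clean("<p>Kafka  rocks!</p> see https://example.com/x now"));
    }
}
```

In the real service, a step like this would presumably run before the message is handed to Kafka, so that only normalized text reaches the consumer.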

Architecture

The following diagram shows the system architecture:

Architecture Diagram

  • The admin of the system configures and launches real-time social media streaming (Twitter in this case study) and web crawling of any other website using a simple UI app. The streamed data is sent to Kafka by the producer service.
  • The Kafka consumer consumes the data, and the consumer service converts it into documents for Elasticsearch.
  • Elasticsearch receives the data coming from Kafka, then indexes and stores it.
  • The admin can use Kibana to visualize, monitor and manage the data.
  • The user can use a simple search UI to run full-text searches and filter the collected data.
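
The flow described above can be sketched in memory with the JDK alone. This is not the project's code: a BlockingQueue stands in for the Kafka topic, a plain list stands in for the Elasticsearch index, and the "processing" step (trim and lowercase) is an invented placeholder. The class name PipelineSketch and the END_OF_STREAM sentinel are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for the data flow; all names here are hypothetical.
final class PipelineSketch {
    // Sentinel that tells the consumer to stop. This is not a Kafka concept.
    private static final String END_OF_STREAM = "__END__";

    private PipelineSketch() {}

    /** Publishes the messages from a producer thread and indexes them on the calling thread. */
    static List<String> run(List<String> messages) {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>(); // plays the Kafka topic
        List<String> index = new ArrayList<>();                    // plays the Elasticsearch index

        Thread producer = new Thread(() -> {
            for (String message : messages) {
                topic.add(message);
            }
            topic.add(END_OF_STREAM);
        });
        producer.start();

        try {
            while (true) {
                String message = topic.take(); // blocks until a message is available
                if (END_OF_STREAM.equals(message)) {
                    break;
                }
                // Placeholder processing step before "indexing".
                index.add(message.trim().toLowerCase(Locale.ROOT));
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while consuming", e);
        }
        return index;
    }

    public static void main(String[] args) {
        // prints: [hello kafka, second tweet]
        System.out.println(run(Arrays.asList("  Hello Kafka ", "Second TWEET")));
    }
}
```

The point of the queue in the middle is the same as in the real system: the producer and the consumer run independently and only share the topic.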

Technologies

This project is created using the following technologies:

  1. Java 8

  2. Maven Dependency Management

  3. Spring Boot:

    • Spring Web
    • Spring Kafka
    • Spring Data Elasticsearch
    • Spring Boot Actuator
  4. Twitter hbc

  5. Jsoup

  6. Docker

  7. Apache Kafka and Apache Zookeeper (required for Kafka)

  8. Elasticsearch

  9. Kibana

  10. Angular

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You need to install the following software:

Setup

To run this project, install it locally as follows:

  1. Clone the application

    git clone https://github.com/SayedBaladoh/Full-Text-Search-using-Docker-Kafka-Elasticsearch-Kibana-Java-Spring-Boot-HBC-Jsoup-Angular.git
  2. Update the Twitter configuration with your API Key, API Secret Key, Access Token and Access Token Secret

    • Open the producer/src/main/resources/application.yml file.
    • Change the Twitter auth properties:
    social:
      twitter:
        auth:
          apiKey: API_KEY
          apiSecretKey: API_SECRET_KEY
          accessToken: ACCESS_TOKEN
          accessTokenSecret: ACCESS_TOKEN_SECRET
  3. Start Kafka, ZooKeeper, Elasticsearch and Kibana using docker-compose

    The project includes a docker-compose.yml file, so you can start them with Docker Compose without installing them separately.

    cd solution_directory
    docker-compose up -d
  4. Check that Kafka, ZooKeeper, Elasticsearch and Kibana are running

    From the command prompt:

    docker ps -a

    You should see that the Kafka, ZooKeeper, Elasticsearch and Kibana containers are up and running.

    You can also verify that Kibana and Elasticsearch are running by opening the following URLs in your browser:

  5. Run the Producer service application

    Start the producer service with the following commands:

    cd producer
    mvn spring-boot:run

    The producer service starts on port 8081, so you can reach it at http://localhost:8081.

  6. Run the Consumer service application

    Start the consumer service with the following commands:

    cd consumer
    mvn spring-boot:run

    The consumer service starts on port 8082, so you can reach it at http://localhost:8082.

  7. Start the Frontend application

    Start the UI application with the following commands:

    cd front-end
    npm install
    ng serve

    The UI app starts on port 4200 by default, so once it has started successfully you can reach it at http://localhost:4200.

  8. Package the applications

    You can also package each application as a jar file and then run it like so:

    cd service_directory
    mvn clean package
    java -jar target/service_name-0.0.1-SNAPSHOT.jar
    • service_directory: the directory of the service.
    • service_name: the name of the service.
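
Once the steps above are done, a quick way to confirm that the three applications are accepting connections is a plain TCP check. The sketch below uses only the JDK; the class name ServiceCheck is hypothetical, and the ports (8081, 8082, 4200) are the ones mentioned in the setup steps. It only tests that something is listening on the port, not that the service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the project.
final class ServiceCheck {
    private ServiceCheck() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports from the setup steps: producer 8081, consumer 8082, Angular UI 4200.
        int[] ports = {8081, 8082, 4200};
        for (int port : ports) {
            System.out.println("localhost:" + port + " listening: "
                    + isListening("localhost", port, 1000));
        }
    }
}
```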

Running

  • Full-Text Search Front-end application

    To access the front-end application, open http://localhost:4200 (the default address from the setup steps).

    Full-Text Search UI

    The UI app shows two links: Settings and Search.

  • Full-Text Search Front-end Settings Tab

    Use Settings to configure and launch the real-time social media (Twitter) streaming and the web crawling that collect the data you need.

    Full-Text Search UI Settings tab

  • Full-Text Search Front-end Search Tab

    The Search link provides full-text search over the collected data.

    Full-Text Search UI Search tab
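
To make the idea behind full-text search concrete, here is a minimal in-memory inverted index in plain Java. In this project the indexing and querying are handled by Elasticsearch, which is far more capable; the class MiniSearchIndex, its tokenizer and its "all terms must match" rule are simplifying assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index; hypothetical and much simpler than Elasticsearch.
final class MiniSearchIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // token -> document ids
    private final List<String> documents = new ArrayList<>();

    /** Splits text into lowercase tokens made of letters and digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Stores the document, records which tokens it contains, and returns its id. */
    int add(String document) {
        int id = documents.size();
        documents.add(document);
        for (String token : tokenize(document)) {
            postings.computeIfAbsent(token, key -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Returns the documents that contain every query term, in insertion order. */
    List<String> search(String query) {
        List<String> hits = new ArrayList<>();
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) {
            return hits;
        }
        Set<Integer> ids = null;
        for (String term : terms) {
            Set<Integer> matches = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (ids == null) {
                ids = new TreeSet<>(matches); // copy so the postings are not modified
            } else {
                ids.retainAll(matches);       // intersect: all terms must match
            }
        }
        for (int id : ids) {
            hits.add(documents.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        MiniSearchIndex index = new MiniSearchIndex();
        index.add("Kafka streams tweets in real time");
        index.add("Elasticsearch indexes the tweets");
        // prints: [Kafka streams tweets in real time]
        System.out.println(index.search("kafka TWEETS"));
    }
}
```

Lowercasing both the documents and the query is what makes the search case-insensitive here.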

About me

I am Sayed Baladoh, PhD, Senior Software Engineer. I like software development. You can contact me via:

Any improvement or comment about the project is always welcome! Just as others have shared their code publicly, I want to share mine. Thanks!

Acknowledgments

Thanks for reading.

Did I help you?

  • Share it with someone who might find it helpful.
  • Give this project a star.
