Elasticsearch Documents Synchronization

This repository provides a Python script that synchronizes data between two MVX Elasticsearch clusters. It retrieves documents from one cluster and indexes them into the other so that both hold the same data.

Prerequisites

Python (version 3.6 or higher)
elasticsearch library (install using pip install elasticsearch)
python-dotenv library (install using pip install python-dotenv)

Setup

Clone this repository.
Create a .env file at the root of the project and add the following lines, replacing the placeholders with your Elasticsearch connection information:
```
ES2_USERNAME=your_es2_username
ES2_PASSWORD=your_es2_password
```
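
As a rough illustration of how these two entries might be read at startup, the sketch below loads the .env file with python-dotenv and checks that both values are present. The helper names `credentials_from_env` and `load_credentials` are hypothetical and are not taken from data_sync.py.

```python
import os


def credentials_from_env(env):
    """Return (username, password) for the second cluster, or raise if unset."""
    missing = [key for key in ("ES2_USERNAME", "ES2_PASSWORD") if not env.get(key)]
    if missing:
        raise KeyError("missing .env entries: " + ", ".join(missing))
    return env["ES2_USERNAME"], env["ES2_PASSWORD"]


def load_credentials():
    """Load .env into the process environment, then read the credentials."""
    # Imported here so credentials_from_env stays usable without python-dotenv.
    from dotenv import load_dotenv

    load_dotenv()  # looks for a .env file and copies its entries into os.environ
    return credentials_from_env(os.environ)
```

Failing early with the names of the missing entries tends to be easier to debug than an authentication error raised later by the cluster.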

Configuration

The script uses the following configuration parameters:

indices_name: A list of index names to process.
batch_size: The number of documents to retrieve per request (maximum 10000).
request_timeout: Maximum timeout for a request in seconds.
max_request_retries: Maximum number of request retries in case of an error.
request_interval: Interval for batching bulk requests.
delay: Delay between each request interval in seconds.
from_interval: How far back from the current timestamp the document search starts.
offset: Offset in seconds to add to the search interval of the second Elasticsearch cluster.
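
The parameters above could be grouped and sanity-checked along these lines. This is only a sketch: every value shown is illustrative, the index name is made up, and `validate_config` is a hypothetical helper rather than part of the script. The only limit taken from this README is the 10000 maximum for batch_size.

```python
# Illustrative values only; the script's real defaults are not documented here.
CONFIG = {
    "indices_name": ["example-index"],  # hypothetical index name
    "batch_size": 5000,
    "request_timeout": 30,
    "max_request_retries": 3,
    "request_interval": 10,
    "delay": 1,
    "from_interval": 3600,
    "offset": 60,
}

MAX_BATCH_SIZE = 10000  # upper bound stated for batch_size


def validate_config(config):
    """Raise ValueError on obviously invalid settings, else return the config."""
    if not config["indices_name"]:
        raise ValueError("indices_name must list at least one index")
    if not 1 <= config["batch_size"] <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    numeric_keys = (
        "request_timeout",
        "max_request_retries",
        "request_interval",
        "delay",
        "from_interval",
        "offset",
    )
    for key in numeric_keys:
        if config[key] < 0:
            raise ValueError(f"{key} must not be negative")
    return config
```

Calling `validate_config(CONFIG)` before the first request catches an oversized batch_size without a round trip to either cluster.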

Usage

Install the required libraries: pip install elasticsearch python-dotenv.
Update the configuration parameters in the script to match your Elasticsearch clusters.
Run the script: python data_sync.py.
The script compares the document IDs in the two Elasticsearch clusters and indexes any missing documents from one cluster into the other.
For each index processed, the script creates a log file named <index_name>_missing_documents.txt that lists the missing document IDs.
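
The comparison and logging steps can be pictured as a set difference followed by a file write. The sketch below is an assumption about how this could look, not the script's actual code: the function names, the `source_ids`/`target_ids` roles, and the one-ID-per-line layout are all hypothetical. Only the file name pattern comes from this README.

```python
from pathlib import Path


def find_missing_ids(source_ids, target_ids):
    """Return the IDs present in the source cluster but absent from the target."""
    return sorted(set(source_ids) - set(target_ids))


def write_missing_ids(index_name, missing_ids, directory="."):
    """Write one missing ID per line to <index_name>_missing_documents.txt."""
    path = Path(directory) / f"{index_name}_missing_documents.txt"
    path.write_text(
        "".join(f"{doc_id}\n" for doc_id in missing_ids), encoding="utf-8"
    )
    return path
```

Comparing IDs as sets keeps the check linear in the number of documents, which matters when a batch holds thousands of IDs.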

Functionality

Query Elasticsearch to retrieve documents based on the given time range and page size.
Compare document IDs between two Elasticsearch clusters.
Fetch missing documents from one cluster and index them into the other cluster.
Write the missing document IDs to log files.

The script performs these steps for each index specified in the indices_name list.
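
To make the query and indexing steps more concrete, here is a minimal sketch of a time-range search body and of the action dictionaries accepted by `elasticsearch.helpers.bulk`. The field name `timestamp` and both function names are assumptions; the script may filter on a different field and build its requests differently.

```python
def build_range_query(start, end, batch_size, time_field="timestamp"):
    """Build a search body that returns up to batch_size documents in [start, end]."""
    return {
        "size": batch_size,
        "query": {"range": {time_field: {"gte": start, "lte": end}}},
    }


def hits_to_bulk_actions(hits, index_name):
    """Convert search hits into index actions that keep the original document IDs."""
    return [
        {"_index": index_name, "_id": hit["_id"], "_source": hit["_source"]}
        for hit in hits
    ]
```

Reusing each hit's `_id` in the bulk action means a document copied twice overwrites itself instead of being duplicated in the target index.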

Note: The script uses multithreading to query the Elasticsearch clusters in parallel for better performance.
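
One way to query both clusters at the same time is a two-worker thread pool, sketched below. `fetch_ids_in_parallel` and its two callables are hypothetical stand-ins for whatever functions the script uses to collect IDs from each cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_ids_in_parallel(fetch_first, fetch_second):
    """Run both ID fetchers concurrently and return their results as two sets."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(fetch_first)
        second = pool.submit(fetch_second)
        # result() blocks until the call finishes and re-raises any exception.
        return set(first.result()), set(second.result())
```

Threads suit this workload because each fetch spends most of its time waiting on the network rather than using the CPU.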

Please make sure to adjust the configuration parameters and review the code according to your specific requirements before running the script.

Feel free to contribute or provide feedback to improve the script.
