Skip to content

ashishtiwari1993/elasticsearch-csv-export

Repository files navigation

elasticsearch-csv-export

CLI script for CSV export from ES

A script which written in PHP, Will query in Elasticsearch and fetch the bulk data with help of the Scroll API. We can specify query in Query DSL syntax . Script does bulk write to the CSV file with the selected fields. It will export your data asynchronously. It will fetch data by forking child process and run them parallely. There is also class file Export.class.php, Which you can easily integrate with your custom script or application. It requires official Elasticsearch-PHP SDK to run.

Requirements

Installation

  • Install PHP-Elasticsearch SDK as mention here . You can find doc according to your ES versions Also check version compatibility from here
  • git clone https://github.com/ashishtiwari1993/elasticsearch-csv-export.git .
  • Include PHP-Elasticsearch sdk's /vendor/autoload.php in process.php. Like
    require '/change-with-PHP-Elasticsearch-SDK-directory-path/vendor/autoload.php';

Usage

php process.php [--host HOSTNAME:PORT] [--index INDEX] [--type TYPE]
		        [--query  QUERY] [--stm  TIMEOUT] [--fields FIELD1,FIELD2]
		        [--size SIZE] [--csvfile CSVPATH] [--logfile LOGPATH] [--async NUMBER_OF_SLICE]  
		        
Optional argument

--host      HOST:PORT       Elasticsearch hostname with port e.g. example.com:9200             [required]
--index     NAME            Provide multiple Index name with comma seperated.                  [required]
--type      TYPE            Specified type
--query     QUERY           Query string in Lucene syntax    
--stm       TIME            Value in seconds to open search context in Scroll API. Default
                            it set to 30.                                                      
--fields    field1,field2   Specify Field need to fetch with comma sperated.
			    It won't work for nested fields.				       [required] 
--size      SIZE            Per scroll api how much data should fetch in one call. By Default
                            it is set to 100.                                                     
--csvfile   CSVPATH         Path to csv file where to export.                                  [required]                         
--logfile   LOGFILEPATH     Path to log file where log will be write. 
--async	    2		    The maximum number of slices for scroll API. It will fork same
			    Number of child process.

NOTE

  • It is using Scroll API. You can find more here on Scroll APIs.
  • You can specify --query using Query string OR Query DSL .It should be same as when you define query param in POST call when query to elasticsearch via curl. Check here for example.
  • For nested fileds you need to specify nested field value like field.key in --field param. For example with this structure {info:[{name:php}]} , I can access my key name with this way info.name .
  • For --async request, We using pcntl_fork to forking child processes. It will fetch all data asynchronously irrespective of any order or sorting.

Example

php process.php --host 'localhost:9200' --index 'myindex1,myindex2' --type 'logs' --fields 'balance,firstname,gender,state,city' --stm 60 --size 500 --query '{"query":{"match":{"gender":"M"}}}' --csvfile '/home/ashish/records.csv' --logfile '/tmp/b.log' --async 2

License

This project is licensed under the MIT License - see the LICENSE file for details

Releases

No releases published

Packages

No packages published

Languages