This is a series of messy scripts that will gather the git history of a repository and convert it to the bulk import format for Elasticsearch.
- Export git history in json-ish format
This script is based off of the work in this gist.
git log --pretty=format:'{%n "commit": "%H",%n "abbreviated_commit": "%h",%n "tree": "%T",%n "abbreviated_tree": "%t",%n "parent": "%P",%n "abbreviated_parent": "%p",%n "refs": "%D",%n "encoding": "%e",%n "subject": "%s",%n "sanitized_subject_line": "%f",%n "body": "%b",%n "commit_notes": "%N",%n "verification_flag": "%G?",%n "signer": "%GS",%n "signer_key": "%GK",%n "author": {%n "name": "%aN",%n "email": "%aE",%n "date": "%aD"%n },%n "commiter": {%n "name": "%cN",%n "email": "%cE",%n "date": "%cD"%n }%n},' | sed "$ s/,$//" | sed ':a;N;$!ba;s/\r\n\([^{]\)/\\n\1/g'| awk 'BEGIN { print("[") } { print($0) } END { print("]") }' > history.txt
- Export the file list
git --no-pager log --name-status > files.txt
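To illustrate what files.txt contains, here is a minimal sketch of parsing `git log --name-status` output into a per-commit file list. It is not taken from git2json.js; the function name `parseNameStatus` and the `{ status, path }` shape are assumptions made for this example.

```javascript
// Hypothetical helper (not part of git2json.js): parse the text produced by
// `git --no-pager log --name-status` into a Map of commit hash -> file entries.
function parseNameStatus(text) {
  const filesByCommit = new Map();
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    // Each commit starts with an unindented "commit <hash>" line.
    const commitMatch = /^commit ([0-9a-f]+)/.exec(line);
    if (commitMatch) {
      current = [];
      filesByCommit.set(commitMatch[1], current);
      continue;
    }
    // Status lines look like "M<TAB>path" or "R100<TAB>old<TAB>new".
    const fileMatch = /^([ACDMRTUXB]\d*)\t(.+)$/.exec(line);
    if (fileMatch && current) {
      const paths = fileMatch[2].split('\t');
      // For renames and copies, keep the new path (the last field).
      current.push({ status: fileMatch[1], path: paths[paths.length - 1] });
    }
  }
  return filesByCommit;
}

// Usage with a small inline sample instead of reading files.txt from disk.
const sampleLog = [
  'commit 1a2b3c',
  'Author: Jane Doe <jane@example.com>',
  'Date:   Mon Jan 1 00:00:00 2024 +0000',
  '',
  '    Add parser',
  '',
  'A\tsrc/parser.js',
  'M\tREADME.md',
].join('\n');
console.log(parseNameStatus(sampleLog).get('1a2b3c'));
```

Commit message lines are indented by git, so they match neither the commit pattern nor the status pattern.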
- Run the formatter
This will join the history.txt and files.txt files and format them for import to Elasticsearch.
node git2json.js
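For readers unfamiliar with the bulk import format, the sketch below shows the general shape: newline-delimited JSON in which each document line is preceded by an action line, and the body ends with a newline. This is an illustration only, not the actual git2json.js code; the function name `toBulk`, the index name `git-history`, and the use of the commit hash as `_id` are all assumptions.

```javascript
// Hypothetical sketch of bulk formatting (not the real git2json.js logic).
// Each commit becomes two lines: an "index" action line, then the document.
function toBulk(commits, indexName) {
  const lines = [];
  for (const commit of commits) {
    // Assumption: the full commit hash is used as the document id.
    lines.push(JSON.stringify({ index: { _index: indexName, _id: commit.commit } }));
    lines.push(JSON.stringify(commit));
  }
  // Terminate every line, including the last one, with a newline.
  return lines.map((line) => line + '\n').join('');
}

// Usage with an inline sample commit.
const sampleCommits = [{ commit: '1a2b3c', subject: 'Add parser' }];
process.stdout.write(toBulk(sampleCommits, 'git-history'));
```

Because every commit contributes exactly two lines, a file in this shape always has an even number of lines.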
- Import to Elasticsearch
This script will upload your file to Elasticsearch:
./import.sh
A copy of the fully formatted finaljson.json file has been generated, but as long as everything has imported correctly you can delete the intermediate files with the rm command below.
Here's a link to documentation for quickly spinning up an Elastic Stack cluster: https://searchbetter.dev/blog/quickstart-guide-elastic-stack-for-devs/
rm history.txt files.txt finaljson.json
Having a hard time redirecting output on WSL? Try pasting the git commands into a .sh file first, then run that script and redirect its output to the target file.
Having trouble reading files on Windows or WSL? Try converting the file to ASCII.
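In DOS (cmd.exe), write the converted output to a temporary file and then replace the original, because redirecting into the file being read truncates it before it can be read:
cmd /a /c type history.txt > history.tmp
move /y history.tmp history.txt
cmd /a /c type files.txt > files.tmp
move /y files.tmp files.txt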
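Having a hard time with large repositories? By default Elasticsearch rejects HTTP request bodies larger than 100mb (the http.max_content_length setting). You can raise that limit, but you might be better off splitting the file into smaller pieces. Make sure each piece ends after an even-numbered line, so that an action line is never separated from the document line that follows it.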
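One possible way to split the file on even line boundaries is sketched below. It assumes every entry is a two-line pair (action line plus document line), which holds for index-style actions but not for delete actions. The function name `splitBulk` and the byte budget parameter are hypothetical.

```javascript
// Hypothetical helper: split a bulk NDJSON string into chunks that each stay
// under maxBytes where possible, never separating an action/document pair.
function splitBulk(ndjson, maxBytes) {
  const lines = ndjson.split('\n').filter((line) => line.length > 0);
  if (lines.length % 2 !== 0) {
    throw new Error('Expected an even number of lines (action/document pairs)');
  }
  const chunks = [];
  let current = '';
  let currentBytes = 0;
  for (let i = 0; i < lines.length; i += 2) {
    const pair = lines[i] + '\n' + lines[i + 1] + '\n';
    const pairBytes = Buffer.byteLength(pair, 'utf8');
    // Start a new chunk when adding this pair would exceed the budget.
    // A single pair larger than maxBytes still gets a chunk of its own.
    if (currentBytes > 0 && currentBytes + pairBytes > maxBytes) {
      chunks.push(current);
      current = '';
      currentBytes = 0;
    }
    current += pair;
    currentBytes += pairBytes;
  }
  if (currentBytes > 0) chunks.push(current);
  return chunks;
}

// Usage: three small pairs with a deliberately tiny budget for demonstration.
const samplePair = '{"index":{}}\n{"a":1}\n';
const sampleChunks = splitBulk(samplePair.repeat(3), 45);
console.log(sampleChunks.length);
```

Each chunk can then be written to its own file and uploaded separately. This version holds the whole file in memory as a string, so for very large exports a streaming line reader would be a better fit.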