Exploring Stack Overflow data with the Elastic Stack

A gentle introduction to exploring the Stack Overflow data set with the Elastic Stack. The data is indexed by an application built on .NET Core, a cross-platform, open source framework for building applications, using NEST, the official Elasticsearch client for .NET.

Prerequisites

Download Elasticsearch 7.4.2 or later
Download Kibana 7.4.2 or later (the version must match the Elasticsearch version)
Install .NET Core 3.0
Download the latest Stack Overflow data set
- Under 7Z files, choose stackoverflow.com-Posts.7z, stackoverflow.com-Users.7z and stackoverflow.com-Badges.7z
Extract the Stack Overflow data set to a directory. You'll need around 90GB of available space!
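
Before extracting the archives, it can help to confirm that the target drive really has the roughly 90GB mentioned above. The following is a minimal sketch using only the Python standard library; the function name and the default threshold are illustrative and not part of this repository.

```python
import shutil

REQUIRED_GB = 90  # figure quoted in the prerequisites; adjust if the data set has grown


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3
```

Calling `has_enough_space("/path/to/extract/dir")` returns `False` when the drive is too small, which is cheaper to discover now than halfway through extraction.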

Building

Restore the project's NuGet package dependencies. In the solution root directory:
```
dotnet restore
```
Build the solution in the Release configuration. In the solution root directory:
```
dotnet build -c Release
```

Setting up Elasticsearch

Set the JVM heap size to at least 8GB by adding the following to the jvm.options file in the config directory within the Elasticsearch home directory (replacing any existing -Xms and -Xmx lines), then save the file
```
-Xms8g
-Xmx8g
```
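
To double-check the edit, the heap flags can be read back from the file. The sketch below is an assumption-laden helper, not part of this repository: it only recognises plain `-Xms`/`-Xmx` lines with a `k`, `m` or `g` suffix, treats the last occurrence of each flag as the effective one, and requires both values to be equal and at least 8GB.

```python
import re

UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_settings(jvm_options_text: str) -> dict:
    """Extract -Xms/-Xmx sizes in bytes; a later line overrides an earlier one."""
    sizes = {}
    for line in jvm_options_text.splitlines():
        match = re.fullmatch(r"-(Xms|Xmx)(\d+)([kKmMgG])", line.strip())
        if match:
            flag, amount, unit = match.groups()
            sizes[flag] = int(amount) * UNITS[unit.lower()]
    return sizes


def heap_is_ok(sizes: dict, minimum_gb: int = 8) -> bool:
    """True when both flags are present, equal, and at least `minimum_gb` GiB."""
    required = minimum_gb * 1024 ** 3
    return sizes.get("Xms") == sizes.get("Xmx") and sizes.get("Xmx", 0) >= required
```

For example, `heap_is_ok(heap_settings(open("config/jvm.options").read()))` should be `True` after the change above, assuming the script is run from the Elasticsearch home directory.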
Start Elasticsearch using the elasticsearch script (elasticsearch.bat on Windows) in the bin directory within the Elasticsearch home directory
```
.\elasticsearch.bat
```
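
Once the node is up, a request to its root endpoint returns a JSON document whose `version.number` field holds the running version. The sketch below, which assumes an unsecured node at `http://localhost:9200` and uses hypothetical helper names, checks that version against the 7.4.2 minimum from the prerequisites.

```python
import json
from urllib.request import urlopen

MINIMUM = (7, 4, 2)  # minimum version listed in the prerequisites


def version_tuple(number: str) -> tuple:
    """Turn "7.4.2" (or "7.4.2-SNAPSHOT") into (7, 4, 2)."""
    core = number.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def node_version(endpoint: str = "http://localhost:9200") -> tuple:
    """Fetch the version reported by the node's root endpoint."""
    with urlopen(endpoint, timeout=5) as response:
        info = json.load(response)
    return version_tuple(info["version"]["number"])


def meets_minimum(version: tuple, minimum: tuple = MINIMUM) -> bool:
    """Compare version tuples element by element."""
    return version >= minimum
```

`meets_minimum(node_version())` raises a connection error if Elasticsearch has not finished starting, so it doubles as a simple readiness check.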

Indexing data

Navigate to the StackOverflow.Indexer/bin/Release/netcoreapp3.0 directory from the root of the solution. It should contain the StackOverflow.Indexer.dll file compiled in the previous steps.

Check the available options for the indexer and for each of its posts, users and tags commands using the --help argument

```
dotnet .\StackOverflow.Indexer.dll --help
dotnet .\StackOverflow.Indexer.dll posts --help
dotnet .\StackOverflow.Indexer.dll users --help
dotnet .\StackOverflow.Indexer.dll tags --help
```

Index posts data
```
dotnet .\StackOverflow.Indexer.dll posts -e "http://localhost:9200" -f "/path/to/Posts.xml"
```
Indexing all questions and answers takes around 90 minutes on a local single-node Elasticsearch cluster

Index users data

```
dotnet .\StackOverflow.Indexer.dll users -e "http://localhost:9200" -f "/path/to/Users.xml" -b "/path/to/Badges.xml"
```

Indexing all users and their badges takes around 15 minutes on a local single-node Elasticsearch cluster
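
While indexing runs, or after it finishes, the Elasticsearch count API reports how many documents an index holds. The sketch below shows one way to query it from Python. The index names are not stated in this README, so any name passed in (for example `posts` or `users`) is an assumption; `GET _cat/indices` lists the names the indexer actually created.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def count_url(endpoint: str, index: str) -> str:
    """Build the _count API URL for a single index."""
    return f"{endpoint.rstrip('/')}/{quote(index, safe='')}/_count"


def document_count(endpoint: str, index: str) -> int:
    """Ask Elasticsearch how many documents the index currently holds."""
    with urlopen(count_url(endpoint, index), timeout=10) as response:
        return json.load(response)["count"]
```

Calling `document_count("http://localhost:9200", "posts")` a few minutes apart gives a rough sense of indexing throughput, assuming `posts` is the right index name.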

(Optional) Update answers with tags

If you'd like to be able to filter both questions and answers using tags, it can be useful to denormalize question tags onto answers. This could be done by transforming the source data before ingesting it, but it can also be achieved with the update by query API, which is what this command does.
```
dotnet .\StackOverflow.Indexer.dll tags -e "http://localhost:9200" -f "/path/to/Posts.xml"
```
This can take a few hours. The -s argument can be used to change the number of concurrent updates, so depending on the performance of the cluster into which you're indexing, you may be able to increase this to speed up the process.
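
To illustrate the idea of denormalizing tags with update by query, the sketch below builds a request that copies a question's tags onto every answer that refers to it. It is not the indexer's actual implementation: the field names `parentId` and `tags`, the index name passed in, and the helper names are all assumptions made for illustration.

```python
import json
from urllib.request import Request, urlopen


def tag_update_body(question_id: int, tags: list) -> dict:
    """Request body: select the answers of one question, then overwrite their tags."""
    return {
        # Hypothetical field linking an answer to its question.
        "query": {"term": {"parentId": question_id}},
        "script": {
            "lang": "painless",
            # Hypothetical field holding the tags on each document.
            "source": "ctx._source.tags = params.tags",
            "params": {"tags": tags},
        },
    }


def update_answers(endpoint: str, index: str, question_id: int, tags: list) -> dict:
    """POST the body to the _update_by_query endpoint and return the parsed response."""
    request = Request(
        f"{endpoint.rstrip('/')}/{index}/_update_by_query?conflicts=proceed",
        data=json.dumps(tag_update_body(question_id, tags)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=60) as response:
        return json.load(response)
```

Passing the tags through `params` rather than concatenating them into the script source lets Elasticsearch compile the script once and reuse it across requests, which matters when one request is issued per question.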

Import Kibana Saved Objects

The kibana_saved_objects_742.ndjson file can be imported into Kibana to apply some preconfigured saved queries, visualizations and a dashboard:

Navigate to the Management menu item within Kibana
Under Kibana, select Saved Objects
Select Import and choose the kibana_saved_objects_742.ndjson file

There should now be

a Dashboard under the Dashboard menu item
a collection of Visualizations under the Visualize menu item
a collection of Saved Queries under the Discover menu item

License

The content of this repository is made available under the Apache 2.0 license.
Stack Overflow data is made available under the Creative Commons Attribution-ShareAlike 4.0 International license.
