This benchmark provides code to download Twitter data using the streaming APIs; the output is written to JSON files. The data is then post-processed and converted into JSON files that the Dgraph live loader accepts.
The downloader generates as many files as there are workers, and each line contains one tweet. Create a directory for this raw output first:

mkdir json

The post-processing step cleans up the data downloaded from Twitter. It also assigns an external ID to each tweet and to each user (the author of a tweet or a user mentioned in one). Create the output directory, then run the post-processor with the number of workers, the input directory, and the output directory:

mkdir pp
go run postprocess/pp.go 10 json pp
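For reference, a post-processed line might look like the record produced by the sketch below. This is only an illustration: the exact output format of pp.go is not shown here, so the blank-node naming and struct layout are assumptions, though the JSON keys follow the schema set up further down.

package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative shapes only; pp.go's real output may differ.
// The JSON keys mirror the Dgraph predicates defined below.
type User struct {
	UID        string `json:"uid"` // assumed blank node, e.g. "_:u<user_id>"
	UserID     string `json:"user_id,omitempty"`
	UserName   string `json:"user_name,omitempty"`
	ScreenName string `json:"screen_name,omitempty"`
}

type Tweet struct {
	UID       string   `json:"uid"` // assumed blank node, e.g. "_:t<id_str>"
	IDStr     string   `json:"id_str"`
	CreatedAt string   `json:"created_at"`
	Hashtags  []string `json:"hashtags,omitempty"`
	Author    *User    `json:"author,omitempty"`
	Mention   []User   `json:"mention,omitempty"`
}

func main() {
	t := Tweet{
		UID:       "_:t1234567890",
		IDStr:     "1234567890",
		CreatedAt: "2018-07-01T10:30:00Z",
		Hashtags:  []string{"dgraph", "graphql"},
		Author:    &User{UID: "_:u42", UserID: "42", UserName: "Example User", ScreenName: "example_user"},
		Mention:   []User{{UID: "_:u43", UserID: "43", ScreenName: "mentioned_user"}},
	}
	line, _ := json.Marshal(t) // one tweet per line in each output file
	fmt.Println(string(line))
}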
Next, run Dgraph locally with Docker. Start Zero, then start Alpha and Ratel in the same container:

docker run --rm -it -p 5080:5080 -p 6080:6080 -p 8080:8080 -p 9080:9080 -p 8000:8000 --name dgraph dgraph/dgraph dgraph zero
docker exec -it dgraph dgraph alpha --lru_mb 2048 --zero localhost:5080
docker exec -it dgraph dgraph-ratel
(Note: the container is started with --rm and no volume is mounted, so all data is lost when it stops.)
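Before setting up the schema, you can optionally confirm that Alpha is reachable. The sketch below assumes the port mappings from the docker run command above (Alpha's HTTP endpoint on localhost:8080); the body returned by /health varies between Dgraph versions.

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Alpha's HTTP port is mapped to localhost:8080 in the docker run command above.
	resp, err := http.Get("http://localhost:8080/health")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}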
Use Dgraph Ratel to set up the following schema:
user_id: string @index(exact) @upsert .
user_name: string @index(hash) .
screen_name: string @index(term) .
id_str: string @index(exact) @upsert .
created_at: dateTime @index(hour) .
hashtags: [string] @index(exact) .
author: uid @count @reverse .
mention: uid @reverse .
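If you prefer not to click through Ratel, the same schema can be applied programmatically. The sketch below uses the dgo client (assumed here at v2; pick the version matching your Dgraph release) against Alpha's gRPC endpoint, which the docker run command above maps to localhost:9080.

package main

import (
	"context"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

// Same schema as listed above.
const schema = `
user_id: string @index(exact) @upsert .
user_name: string @index(hash) .
screen_name: string @index(term) .
id_str: string @index(exact) @upsert .
created_at: dateTime @index(hour) .
hashtags: [string] @index(exact) .
author: uid @count @reverse .
mention: uid @reverse .
`

func main() {
	// Alpha's gRPC port is mapped to localhost:9080 in the docker run command above.
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
	if err := dg.Alter(context.Background(), &api.Operation{Schema: schema}); err != nil {
		log.Fatal(err)
	}
	log.Println("schema applied")
}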
Finally, load the post-processed files with the live loader, pointing it at the pp directory and storing the external-ID-to-uid mapping in the xidmap directory:

dgraph live -f pp -x xidmap --zero localhost:5080 -c 1
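Once the live load finishes, a quick query confirms that the data landed as expected. The sketch below reuses the dgo client and the indexes from the schema above; the hashtag value is just an example, and it assumes the author edge points from tweet to user as in the earlier sketch.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

// Fetch a few tweets carrying an example hashtag and follow the author edge.
// Relies on the exact index on hashtags defined in the schema above.
const q = `
{
  tweets(func: eq(hashtags, "dgraph"), first: 10) {
    id_str
    created_at
    hashtags
    author {
      screen_name
      user_name
    }
  }
}
`

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
	resp, err := dg.NewReadOnlyTxn().Query(context.Background(), q)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(resp.Json))
}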