Update ci bulk ingest docs (#98)
keith-turner authored and ctubbsii committed Jul 19, 2019
1 parent 5a0ac28 commit a37fb19c16701232d51527dbe913b212710282ff
Showing 1 changed file with 16 additions and 0 deletions.
@@ -8,6 +8,12 @@ in a loop like the following to continually bulk import data.
# create the ci table if necessary
./bin/cingest createtable
# Optionally, consider lowering the split threshold to make splits happen more
# frequently while the test runs. Choose a threshold based on the amount of data
# being imported and the desired number of splits.
#
# accumulo shell -u root -p secret -e 'config -t ci -s table.split.threshold=32M'
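#
# As a very rough, hypothetical sizing sketch (assuming the values below, which
# are not from this repo's defaults): the eventual tablet count is roughly the
# total size of the imported files on disk divided by the threshold, so
# importing ~10G of files with a 32M threshold ends up with on the order of
# 10240M / 32M ~= 320 tablets.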
for i in $(seq 1 10); do
# run map reduce job to generate data for bulk import
./bin/cingest bulk /tmp/bt/$i
@@ -47,3 +53,13 @@ scan -t accumulo.metadata -b ~blip -e ~blip~
scan -t accumulo.metadata -c loaded
```

The counts output by `cingest verify` (referenced and unreferenced added together) should equal:

```
test.ci.bulk.map.task * test.ci.bulk.map.nodes * num_bulk_generate_jobs
```
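
For example, with hypothetical settings of 4 map tasks, 1,000,000 nodes per task, and 10
bulk generate jobs run in the loop above (these numbers are illustrative, not defaults),
the expected total would be:

```
4 * 1000000 * 10 = 40,000,000
```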

It's possible the counts could be slightly smaller because of collisions. However, collisions
are unlikely with the default settings given that there are 63 bits of randomness in the row and
30 bits in the column. This gives a total of 93 bits of randomness per key.
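
As a rough birthday-bound sketch of why collisions are rare: with 93 bits of randomness per
key, generating n keys yields about n^2 / 2^94 expected collisions, so even a run that
generates a billion keys (a hypothetical figure) expects far fewer than one collision:

```
n = 1,000,000,000 keys
expected collisions ~= n^2 / 2^94 ~= 1e18 / 2e28 ~= 5e-11
```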
