arquivo / webarchive-indexing
forked from ikreymer/webarchive-indexing
Branch: master
-
updating job to index CDXs with python bin path
root committed Jun 19, 2019
-
Update RunhadoopJobs.sh to use python3.5
igobranco committed May 21, 2019 Verified
This commit was created on GitHub.com and signed with a verified signature using GitHub’s key. GPG key ID: 4AEE18F83AFDEB23
-
igobranco committed May 20, 2019 Verified
-
igobranco committed May 20, 2019 Verified
-
Now only indexing cdx requests that correspond to http status codes of 100s, 200s and 300s. Changed hadoop input type to lineinput to run faster in the new cluster
Fernando-Melo committed Jun 27, 2018
-
Utility to convert CDX to CDXJ format
danielbicho committed Dec 12, 2016
-
Fernando-Melo committed Dec 5, 2016
-
root committed Dec 5, 2016
-
indexwarcsjob.py: update hard-coded path to be correct (TODO: load from env)
reqs: update to correct python-hadoop git
ikreymer committed Feb 22, 2016
-
add newline after last line in zipnum block, fixes ikreymer#4
ikreymer committed Nov 24, 2015
-
build local zipnum: add line numbering to final index (summary), add option to specify cdx lines per block
ikreymer committed Oct 9, 2015
-
Merge pull request ikreymer#2 from machawk1/patch-1
Fixed various MD weirdnesses
ikreymer committed Jun 11, 2015
-
machawk1 committed Jun 11, 2015
-
minor fixes from latest CC index build:
- fix typos
- update reducer start for cluster job to be much later
- update reqs
ikreymer committed Mar 29, 2015
-
Merge branch 'master' of https://github.com/ikreymer/webarchive-indexing
ikreymer committed Mar 12, 2015
-
add local zip num cluster building support!
ikreymer committed Mar 12, 2015
-
sample job: add scaler to mapper also
ikreymer committed Mar 11, 2015
-
ikreymer committed Mar 11, 2015
-
Update README.md with more usage instructions.
ikreymer committed Mar 11, 2015
-
ikreymer committed Mar 11, 2015
-
ikreymer committed Mar 11, 2015
-
add sample index_env.sample.sh
ikreymer committed Mar 11, 2015
-
Merge branch 'master' of https://github.com/ikreymer/webarchive-indexing
ikreymer committed Mar 11, 2015
-
ikreymer committed Mar 11, 2015
-
ikreymer committed Mar 11, 2015
-
ikreymer committed Mar 11, 2015
-
more cleanup of run scripts, integrate seq upload into dosample job
ikreymer committed Mar 11, 2015
-
refactoring: use cmdline options instead of fixed constants!
rename job files to end in job
add integrated samplecdx script for running samplecdxjob and converting to sequencefile
add seqfileutils
ikreymer committed Mar 10, 2015