More information about the archiving project can be found on the ArchiveTeam wiki: [Google News Archive](http://archiveteam.org/index.php?title=Google_News_Archive)
Be sure to replace YOURNICKHERE with the nickname that you want to be shown as on the tracker. You don't need to register it; just pick a nickname you like.
In most of the cases below, there will be a web interface running at http://localhost:8001/. If you don't know or care what this is, you can just ignore it; otherwise, it gives you a fancy view of what's going on.
If anything goes wrong while running the commands below, please scroll down to the bottom of this page. There's troubleshooting information there.
Running with a warrior
Follow the instructions on the ArchiveTeam wiki for installing the Warrior, and select the "Google News Archive" project in the Warrior interface.
Running without a warrior
To run this outside the warrior, clone this repository, cd into its directory and run:
pip install --upgrade seesaw
./get-wget-lua.sh
then start downloading with:
run-pipeline pipeline.py --concurrent 2 YOURNICKHERE
For more options, run:
run-pipeline --help
If you don't have root access and/or your version of pip is very old, you can replace "pip install --upgrade seesaw" with:
wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py ; python get-pip.py --user ; ~/.local/bin/pip install --upgrade --user seesaw
so that pip and seesaw are installed in your home directory, then run:
~/.local/bin/run-pipeline pipeline.py --concurrent 2 YOURNICKHERE
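The two launch variants above can be combined in a small wrapper. The following is only a sketch: the function names `find_run_pipeline` and `pipeline_command` are made up for this example and are not part of seesaw. It prefers a `run-pipeline` found on your PATH, falls back to `~/.local/bin/run-pipeline`, and prints the resulting command instead of running it, so you can check it first.

```sh
# Hypothetical helper: print the path of run-pipeline, preferring PATH
# and falling back to the per-user install location described above.
find_run_pipeline() {
    if command -v run-pipeline >/dev/null 2>&1; then
        command -v run-pipeline
    elif [ -x "$HOME/.local/bin/run-pipeline" ]; then
        echo "$HOME/.local/bin/run-pipeline"
    else
        echo "run-pipeline not found; is seesaw installed?" >&2
        return 1
    fi
}

# Hypothetical helper: print the full command line for a nickname,
# refusing an empty nickname or the YOURNICKHERE placeholder.
pipeline_command() {
    nick=$1
    if [ -z "$nick" ] || [ "$nick" = "YOURNICKHERE" ]; then
        echo "please pick a real nickname" >&2
        return 1
    fi
    bin=$(find_run_pipeline) || return 1
    echo "$bin pipeline.py --concurrent 2 $nick"
}
```

Calling `pipeline_command yournick` from the repository directory prints the command to run; pipe it to `sh` once it looks right.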
Running multiple instances on different IPs
This feature requires seesaw version 0.0.16 or greater. Use pip install --upgrade seesaw to upgrade.
Use the --context-value argument to pass in bind_address=126.96.36.199 (replace the IP address with your own).
Example of running 2 threads, no web interface, and Wget binding of IP address:
run-pipeline pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server --context-value bind_address=188.8.131.52
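If you have several addresses to bind, one possible approach is to generate one command per address and start each in its own terminal or screen session. This is a rough sketch, not something shipped with this repository: the function name `bind_commands` is hypothetical, the addresses are documentation-only placeholders, and the flags are copied from the example above.

```sh
# Hypothetical helper: print one run-pipeline command per IP address.
# Usage: bind_commands NICKNAME IP [IP...]
bind_commands() {
    nick=$1
    shift
    for ip in "$@"; do
        echo "run-pipeline pipeline.py --concurrent 2 $nick --disable-web-server --context-value bind_address=$ip"
    done
}

# Example with placeholder addresses; replace them with your own.
bind_commands YOURNICKHERE 192.0.2.10 192.0.2.11
```

The function only prints the commands, so nothing starts until you run them yourself.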
For Debian/Ubuntu:
adduser --system --group --shell /bin/bash archiveteam
apt-get update && apt-get install -y git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev flex autoconf
pip install --upgrade seesaw
su -c "cd /home/archiveteam; git clone https://github.com/ArchiveTeam/google-newspapers.git; cd google-newspapers; ./get-wget-lua.sh" archiveteam
screen su -c "cd /home/archiveteam/google-newspapers/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam
[... ctrl+A D to detach ...]
In Debian Jessie, the libgnutls-dev package was renamed to libgnutls28-dev. So, you need to do the following instead:
adduser --system --group --shell /bin/bash archiveteam
apt-get update && apt-get install -y git-core libgnutls28-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev flex autoconf
[... pretty much the same as above ...]
Wget-lua is also available on ArchiveTeam's PPA for Ubuntu.
For CentOS:
Ensure that you have the CentOS equivalent of bzip2 installed as well. You will need the EPEL repository to be enabled.
yum -y install autoconf automake flex gnutls-devel lua-devel python-pip zlib-devel
pip install --upgrade seesaw
[... pretty much the same as above ...]
For openSUSE:
zypper install liblua5_1 lua51 lua51-devel screen python-pip libgnutls-devel bzip2 python-devel gcc make
pip install --upgrade seesaw
[... pretty much the same as above ...]
For OS X:
You need Homebrew. Ensure that you have the OS X equivalent of bzip2 installed as well.
brew install python lua gnutls
pip install --upgrade seesaw
[... pretty much the same as above ...]
There is a known issue with some packaged versions of rsync. If you get errors during the upload stage, google-newspapers will not work with your rsync version.
This supposedly fixes it:
For Arch Linux:
Ensure that you have the Arch equivalent of bzip2 installed as well.
- Make sure you have
- Install the wget-lua package from the AUR.
- Run pip2 install --upgrade seesaw.
- Modify the run-pipeline script in seesaw to point at
useradd --system --group users --shell /bin/bash --create-home archiveteam
screen su -c "cd /home/archiveteam/google-newspapers/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam
For FreeBSD:
Honestly, I have no idea. ./get-wget-lua.sh supposedly doesn't work due to differences in the tar that ships with FreeBSD. Another problem is the apparent absence of Lua 5.1 development headers. If you figure this out, please do let us know on IRC (irc.efnet.org #archiveteam).
Broken? These are some of the possible solutions:
wget-lua was not successfully built
If you get errors about wget.pod or something similar, the documentation failed to compile - wget-lua, however, compiled fine. Try this:
cd get-wget-lua.tmp
mv src/wget ../wget-lua
cd ..
The get-wget-lua.tmp name may be inaccurate. If you have a folder with a similar but different name, use that instead and please let us know on IRC what folder name you had!
Optionally, if you know what you're doing, you may want to use wgetpod.patch.
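Since the temporary folder name may vary, a small search can save some guessing. The sketch below assumes the build folder sits in the repository directory and that its name starts with get-wget-lua; the function name `find_built_wget` is hypothetical. It copies the binary rather than moving it, so the build folder is left untouched.

```sh
# Hypothetical helper: print the first executable src/wget found in a
# folder whose name starts with "get-wget-lua" under the given root.
find_built_wget() {
    root=${1:-.}
    for candidate in "$root"/get-wget-lua*/src/wget; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no built wget found under $root" >&2
    return 1
}

# Copy the binary into place only if one was found.
if built=$(find_built_wget .); then
    cp "$built" ./wget-lua
fi
```

If nothing is found, the build most likely failed before the binary was produced, and the other troubleshooting entries are the next place to look.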
Problem with gnutls or openssl during get-wget-lua
Please ensure that gnutls-dev(el) and openssl-dev(el) are installed.
ImportError: No module named seesaw
If you're sure that you followed the steps to install seesaw, permissions on your module directory may be set incorrectly. Try the following:
chmod o+rX -R /usr/local/lib/python2.7/dist-packages
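Before changing permissions recursively, it may help to see which directories are actually affected. The following is a minimal sketch: the function name `list_unreadable_dirs` is hypothetical, and the path is simply the one from the chmod example above. It lists directories that other users cannot both read and enter.

```sh
# Hypothetical helper: list directories under the given path that are
# missing the read or execute bit for "other" users (mode bits 005).
list_unreadable_dirs() {
    find "$1" -type d ! -perm -005
}

# Path taken from the chmod example above; adjust for your Python version.
dir=/usr/local/lib/python2.7/dist-packages
if [ -d "$dir" ]; then
    list_unreadable_dirs "$dir"
fi
```

An empty result suggests directory permissions are not the cause, although individual files could still be unreadable.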
run-pipeline: command not found
Install seesaw using pip2 instead of pip:
pip2 install seesaw
Issues in the code
If you notice a bug and want to file a bug report, please use the GitHub issues tracker.
Are you a developer? Help write code for us! Look at our developer documentation for details.
Have an issue not listed here? Join us on IRC and ask! We can be found at irc.efnet.org #archiveteam.