Roteiro2Arc is a tool that converts to Internet Archive ARC files the local archive of the Portuguese Web present in the CD-ROM featured with the book "Novo Roteiro Prático da Internet" by José Magalhães
arquivo/roteiro2arc
Roteiro2Arc
===========================================================

Roteiro2Arc is a tool that converts the local archive of the Portuguese Web
found on the CD-ROM featured with the book "Novo Roteiro Prático da Internet"
by José Magalhães into Internet Archive ARC files.

Roteiro2Arc was created by the SAW group at FCCN <sawfccn@google.com> to be
used by the Portuguese Web Archive for internal projects
(http://www.arquivo.pt/).

This software is distributed as is and is licensed under the GNU LGPL licence.

-----------------------------------------------------------
Running Roteiro2Arc
-----------------------------------------------------------

Roteiro2Arc conversion is a two-step process:

1) Search the files for an embedded original URL and write to a database
   the mapping from each file's local path to the URL found.

2) Read the files and export them to ARC files. This second step is further
   divided into sub-steps:

   2.1) Read the files whose original URL is known (it is in the DB),
        convert the local paths inside each file to URLs and write the
        file to an ARC file. During this step, every referenced file with
        no known original URL receives a generated URL (the URL of the
        file that called it, with the local path of the unknown file
        appended), and a reference to this file and its generated URL is
        stored in a special recovery DB.

   2.2) Read the recovery DB and export every file referenced in it to ARC
        files using its generated URL. This way, web pages keep their
        references to images, stylesheets and other resources.

The easiest way to run Roteiro2Arc is to use the "run" Ant task. This task
handles both steps of the conversion process:

  > ant run -Dsource=<source_dir> -Ddestination=<destination_dir> \
    -Ddb=<db_dir> -Drecovery-db=<recovery-db_dir>

The other way is to run each step separately:

1) URL extraction

  > ant run-url-extraction -Dsource=<source_dir> -Ddb=<db_dir>
    (with Ant)

  > java ExtractURL <source_dir> <db_dir>
    (calling the class)

2) Writing the ARC files

  > ant run-arc-conversion -Dsource=<source_dir> -Ddestination=<destination_dir> \
    -Ddb=<db_dir> -Drecovery-db=<recovery-db_dir>
    (with Ant)

  > java ArcConverter <source_dir> <destination_dir> <db_dir> \
    <recovery-db_dir>
    (calling the class)

Parameters:

  source_dir:       the directory with the local files.
  destination_dir:  the destination for the converted ARC files.
  db_dir:           the path where the Berkeley DB with the detected
                    path/URL pairs will be stored.
  recovery-db_dir:  the path where the Berkeley DB with the recovery
                    information for files without an original URL will be
                    stored. This DB holds path/generated-URL pairs.
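
The URL lookup and fallback described for the conversion step can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code used by ArcConverter: the class GeneratedUrlSketch and its methods are hypothetical, two in-memory maps stand in for the two Berkeley DBs, and the exact rule for joining the caller URL and the local path (a single "/" separator) is a guess, since the text above only says that the local path is appended.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not part of Roteiro2Arc.
final class GeneratedUrlSketch {
    // Stand-in for the URL DB: local path -> original URL found in the file.
    private final Map<String, String> db = new HashMap<>();
    // Stand-in for the recovery DB: local path -> generated URL.
    private final Map<String, String> recoveryDb = new HashMap<>();

    // Records a path/URL pair, as the URL extraction step would.
    void recordOriginalUrl(String localPath, String url) {
        db.put(localPath, url);
    }

    // Assumed join rule: caller URL + "/" + local path, without doubling slashes.
    static String generateUrl(String callerUrl, String localPath) {
        String base = callerUrl.endsWith("/")
                ? callerUrl.substring(0, callerUrl.length() - 1)
                : callerUrl;
        String path = localPath.startsWith("/") ? localPath.substring(1) : localPath;
        return base + "/" + path;
    }

    // Returns the original URL when it is known; otherwise generates one
    // from the calling file's URL and remembers it for the recovery pass.
    String resolve(String callerUrl, String localPath) {
        String known = db.get(localPath);
        if (known != null) {
            return known;
        }
        String generated = generateUrl(callerUrl, localPath);
        recoveryDb.put(localPath, generated);
        return generated;
    }

    // Entries that the recovery pass would later export to ARC files.
    Map<String, String> recoveryEntries() {
        return recoveryDb;
    }
}
```

With this sketch, resolving "img/logo.gif" from a page whose URL is "http://www.example.pt/index.html" (an invented address) yields "http://www.example.pt/index.html/img/logo.gif" and adds one entry to the recovery map, while a path previously recorded with recordOriginalUrl is returned unchanged and leaves the recovery map untouched.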