Skip to content
Software deliverables for the ArchAIDE project
Python HTML
Branch: master
Clone or download

Latest commit

Fetching latest commit…
Cannot retrieve the latest commit at this time.

Files

Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
c3d
.gitignore
README.md
terra-sigillata-complete-top-level.json

README.md

archaide-software

Software deliverables for the ArchAIDE project

Installation

The setup instructions were tested on Linux (Ubuntu 16.04) 64-bits. While they should work with minor modifications also on Windows/Mac, this was not tested.

Python

You will need to use Python >= 3.5.2

Required Python packages

To run the code, you will need to have at least the following packages installed in your environment:

  • numpy >= 1.15.1
  • tensorflow-gpu >= 1.10.0
    • It is possible also to use the CPU version (tensorflow), but it will be much slower, and so it's discouraged.
    • See notes about troubleshooting TensorFlow installation below.
  • scikit-image (aka skimage) >= 0.13.1
  • scikit-learn (aka sklearn) >= 0.19.1
  • Pillow (aka PIL) >= 5.0.0

We recommend using virtualenv (or conda) to install these packages in an isolated environment. Once inside your Python environment, all packages can simply be installed with pip:

# Optional: create and use a virtualenv
virtualenv venv/ -p python3
source venv/bin/activate

# And now install the packages
pip install numpy matplotlib scikit-learn scikit-image tensorflow-gpu Pillow

If you haven't installed TensorFlow with GPU support previously, you may encounter issues with the installation of TensorFlow. If indeed it does not install smoothly, please follow the official instructions.

Regardless of whether installation worked "smoothly" or not, make sure GPU support works properly (if you installed TensorFlow with GPU support) before proceeding to the next steps. You can do this by running the following command and making sure it finishes without an error.

python -c 'import tensorflow'

If you are getting errors, see the common installation problems of the guide.

ResNet dependency

In order to run the appearance classifier, we rely on a pre-trained version of the ResNet-101 network. This can be obtained for TensorFlow by following the instructions on the tensorflow-resnet project. We recommend using the already converted model they supply in the torrent link (and extracting the tensorflow-resnet-pretrained-20160509 folder here inside the root folder of the repository). If the torrent is not available for some reason, follow the conversion instructions on their documentation.

PointNet dependency

The current architecture of the shape-based classification, is based on PointNet. Our code uses a modified version of PointNet, which was modified for our own needs, and is available at PointNet-ArchAIDE. All we need for now, is just to clone this repository and then set it up to use revision @48ce752 by doing:

# Clone the repository and then enter it's directory
git checkout 48ce752

Usage

Appearance based similarity

Foreground extraction

Due to the reasons detailed in D6.3, to achieve improved accuracy and reduce the bias towards specific backgrounds for specific classes, we use a foreground extraction process - a process that automatically extracts sherds from their backgrounds, before using them for training the classification.

To do this, we first use the following tool on the folder containing the original data:

python -m c3d.appearance2.fg_extract \
  path/to/raw/data/folder \
  --alternate_base path/to/extracted/data/folder \
  --num_workers 10

This will create a new folder with the foreground extracted version of the images. This is the folder you should use for the actual training.

Note the parameter num_workers - use this to manage the number of workers that run in parallel to process the image. Increasing this can cause the processing to run faster, but it requires more CPUs from your computer. We recommend setting this to not more than <The number of CPUs on your computer> - 2.

Training

Train the model on images in the folder (leaving out some images for testing). The label of each image will be the name of the folder containing it:

# Create a directory for saving intermediate training results (used to continue
# training if it was interrupted)
mkdir path/to/cache

# Train the model
python -m c3d.appearance2 \
  path/to/model.pkl path/to/extracted/data/folder \
  --resnet-dir path/to/resnet-dir \
  train
  --cache_dir path/to/cache

Classifying

Classify all images in a given folder. For each image, list the top 3 guesses:

python -m c3d.appearance2 \
  path/to/model.pkl path/to/extracted/data/folder \
  --k 3
  classify

Testing

Evaluate our training - test the model on the images that were left out for testing, listing top 5 guesses:

python -m c3d.appearance2 \
  path/to/model.pkl path/to/extracted/data/folder \
  --k 5
  test

Note: The test should be carried out on the same folder with the source images.

Further help

Additional arguments can be found using the --help flag:

python -m c3d.appearance2 --help

Shape based similarity

Generating the synthetic sherds

To train the shape recognition model, we first need to generate the training data - lots of synthetic sherd fractures. To do this, we use the following tool on the folder containing the profiles to train on:

python -m c3d.pipeline.fracture2 \
  path/to/profile-svgs/data/folder \
  path/to/sherds/data/folder \
  2000

This will create a new folder with the specified number of synthetic sherds generated for each profile drawing (SVG file). This is the data we'll use in the training process.

Training

To train the model on the synthetic sherds in the folder, run

# Create a directory for saving intermediate training results (used to continue
# training if it was interrupted)
mkdir path/to/cache

# Train the model
PYTHONPATH=path/to/pointnet-archaide:$PYTHONPATH python -m c3d.shape2 \
  path/to/model.pkl path/to/sherds/data/folder \
  [ --label-mapping-file path/to/mapping/file.json \ ]
  train \
  --cache_dir path/to/cache \
  [ --eval_set path/to/sherd-svgs/data/folder ]

Note that we need to add the path to the pointnet-archaide repository to the PYTHONPATH environment variable. Furthermore, if we want to specify that multiple profiles are equivalent (for example, to specify that all subtypes of a specific top-level type are equivalent, for training a top-level classifier), we need to specify a path to a label mapping file. The label mapping is a JSON file with the following structure:

{
  "profile01_name": "new_name01",
  "profile02_name": "new_name02",
  ...
}

For convinience, we include an example such file mapping all subtypes of Terra Sigillata (from the Conspectus) to their top level profiles. This file is terra-sigillata-complete-top-level.json which is provided in this repository. Note that we excluded subtypes that only show a partial shape (and not a full profile); by excluding profiles from the label mapping file, we remove them from the training process.

Another argument we specified is an evaluation set - if specified, this should be the path to a folder containing SVGs extracted from real sherds. The SVGs should be organized under folders with the same names as in the training set, and they will be periodically used to evaluate the classifier accuracy on them, providing a measure of the accuracy on real data.

Classifying

To classify all sherd SVGs in a given folder, listing the top 3 guesses for each input, run the following command:

PYTHONPATH=path/to/pointnet-archaide:$PYTHONPATH python -m c3d.shape2 \
  path/to/model.pkl path/to/sherds/data/folder \
  [ --label-mapping-file path/to/mapping/file.json \ ]
  --svg-inputs \
  --k 3 \
  classify

This command will also print the accuracy of the classification (assuming the folder names match the names used during the training).

You can’t perform that action at this time.