BuSLR knows about various packages that are useful for machine learning (originally speech and language processing), and how to go get them and build them. BuSLR supports two build systems:
- One is built around cmake's ExternalProject package. It uses make dependencies to handle package dependencies.
- The other is built around conda. It functions as a repository for conda's meta.yaml and build.sh metadata.
Not all packages are supported in both systems.
There is a balance to be struck with the wider conda infrastructure and the
Linux distribution packaging systems. In general, if something is normally in
a Linux distribution (e.g., sox) then there's no point handling it here. If
it's in conda then the same argument applies, but more subjectively: pytorch
is better in conda, kaldi perhaps not.
Also, with conda, bear in mind that this is not a joke; the thing marked "another PIP?" does exist.
To build with cmake, clone the repo and do:
cd buslr/local
cp Configure.example configure.sh # Edit if necessary
./configure.sh
make <package name>
The package is built in local and installed to local unless the appropriate
line in configure.sh is changed. You can set:
export PATH=<path-to-buslr>/local/bin:$PATH
to access the builds, or do
source <path-to-buslr>/local/etc/buslrvars.sh
to set other appropriate variables too. Set the INHIBIT line in configure.sh
to inhibit building of packages for which you might have a system version
(typically cuda or mkl).
To build with conda, clone the repo and do:
cd buslr
conda build src/<package name>
As long as the conda-bld directory is on your channel list (it is indexed and
functions as a local channel), you can do this:
conda install <package name>
conda build purge
Many of the packages were initialised with this command:
conda-skeleton pypi <name-of-pip-package>
It allows conda versions of PIP packages to be built, thus avoiding the problem with multiple PIPs and conda being unaware of PIP.
- HTS requires the HTK sources to be downloaded manually.
- SRILM also requires a manual download.
- Some packages (festival, kaldi, SRILM) don't really support a make install. See the in-place build section below.
There is a directory for each package. Typically there are only CMakeLists.txt
and meta.yaml files, but there can also be patched or whole files to be copied
into the tree. In the case of HTS and SRILM, the manually downloaded files are
placed there too.
Following the man page for patch, patches can be generated by copying the
original file to <path-to>/<file>.org, modifying the file, then running
diff -Naur <path-to>/<old-file> <path-to>/<new-file>
This is typically run relative to a directory called
package/package-prefix/src/package. At patch time, cmake will cd to that
directory. The patch can be applied using
PATCH_COMMAND patch -p0 < ${CMAKE_CURRENT_SOURCE_DIR}/patch.txt
in the CMakeLists.txt file. A precedent for this is the sctk package, which
patches the installation directory of a deep makefile.
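The single-file workflow above can be tried end to end in a scratch directory. This is a minimal sketch rather than anything taken from BuSLR: the file src/makefile and its contents are made up for illustration, and patch.txt simply mirrors the name used in the PATCH_COMMAND line.

```shell
# Scratch directory so nothing real is touched (all names here are illustrative).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src
printf 'prefix=/old\n' > src/makefile

# Keep a pristine copy as <file>.org, then make the change.
cp src/makefile src/makefile.org
printf 'prefix=/new\n' > src/makefile

# diff exits with status 1 when the files differ, so that is not an error here.
diff -Naur src/makefile.org src/makefile > patch.txt || true

# Put the original back and apply the patch from the directory diff was run in.
mv src/makefile.org src/makefile
patch -p0 < patch.txt
cat src/makefile
```

Because diff was run from the same directory that patch is later run in, the paths in the patch need no stripping, hence -p0.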
If there are multiple patched files, it's better to run diff on a copy of the
whole directory. In this case, diff will prepend a directory to each path, so
we need patch -p1.
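The whole-directory case can be sketched the same way. The directory names pkg and pkg.org and the two text files are assumptions chosen for the example; the point is only that each path in the patch gains one leading component, which -p1 strips.

```shell
# Scratch tree with two files (names are illustrative).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p pkg/lib
printf 'one\n' > pkg/a.txt
printf 'two\n' > pkg/lib/b.txt

# Copy the whole tree, then edit several files in the working copy.
cp -r pkg pkg.org
printf 'one patched\n' > pkg/a.txt
printf 'two patched\n' > pkg/lib/b.txt

# Recursive diff; every path in the patch starts with pkg.org/ or pkg/.
diff -Naur pkg.org pkg > patch.txt || true

# Restore the pristine tree, then apply from inside it, stripping one component.
rm -rf pkg
mv pkg.org pkg
cd pkg || exit 1
patch -p1 < ../patch.txt
```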
If the package is git based then git can generate the patch using git diff.
It functions like the directory case, so use patch -p1. However, patching git
checkouts causes problems on updates; see irstlm.
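A quick way to see why git diff output needs -p1 is to generate one in a throwaway repository. This is a sketch under assumptions: the repository and file.txt are invented for the example, the diff is taken against the staged copy so no commit is needed, and git is assumed to use its default a/ and b/ path prefixes.

```shell
if command -v git >/dev/null 2>&1; then
  work=$(mktemp -d)
  cd "$work" || exit 1
  git init -q repo
  cd repo || exit 1
  printf 'original\n' > file.txt
  git add file.txt          # the staged copy is the reference for git diff

  printf 'modified\n' > file.txt
  git diff > ../patch.txt   # paths appear as a/file.txt and b/file.txt

  git checkout -- file.txt  # restore the staged version
  patch -p1 < ../patch.txt  # -p1 strips the a/ and b/ prefixes
  result=$(cat file.txt)
else
  result="skipped: git not found"
fi
echo "$result"
```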
Where a package doesn't even have a build system, a cmake file can be copied
directly into the tree. This approach is taken in sph2pipe.
Some packages don't have an install step. The native CMake install can work
well in these cases. CMake's install() command actually writes things to a
file called cmake_install.cmake. The trick is to use this file as the
INSTALL_COMMAND for these cases. The simplest precedent is libresample.
So, define this:
set(CMAKE_INSTALL_SCRIPT ${CMAKE_CURRENT_BINARY_DIR}/cmake_install.cmake)
add this command:
INSTALL_COMMAND ${CMAKE_COMMAND} -P ${CMAKE_INSTALL_SCRIPT}
and specify the files using install(FILES <files> DESTINATION <where>).
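The mechanism can be seen in isolation, outside ExternalProject. The sketch below is not BuSLR's own CMakeLists.txt: the project name demo, the file hello.txt and the scratch install prefix are assumptions for illustration. It shows that configuring a project is enough to generate cmake_install.cmake, and that running that script with cmake -P performs the install. It is skipped when cmake or make is not available.

```shell
if command -v cmake >/dev/null 2>&1 && command -v make >/dev/null 2>&1; then
  work=$(mktemp -d)
  mkdir -p "$work/src"
  printf 'hello\n' > "$work/src/hello.txt"

  # A project with no compiled languages that only declares an install rule.
  cat > "$work/src/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.13)
project(demo NONE)
install(FILES hello.txt DESTINATION share)
EOF

  # Configuring is enough to generate build/cmake_install.cmake.
  cmake -S "$work/src" -B "$work/build" \
        -DCMAKE_INSTALL_PREFIX="$work/prefix" > /dev/null

  # Run the generated script directly, the same kind of call as the INSTALL_COMMAND.
  cmake -P "$work/build/cmake_install.cmake"
  installed="$work/prefix/share/hello.txt"
else
  installed="skipped"
fi
echo "$installed"
```

No build step is needed here because install(FILES ...) only copies existing files.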
Some packages, notably kaldi and the festvox family, don't really support
being installed. For these, we set SOURCE_DIR to something at top level
(rather than buried in the src tree) and set INSTALL_COMMAND true to suppress
installation. true here is the unix command that does nothing and exits with
status 0 (success); empty strings don't survive the BuSLR_Add wrapper.
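Using true as the install command relies on the install step being judged by its exit status alone. The snippet below checks that status in a shell; the variable names are just for the example.

```shell
# true does nothing and reports success.
true
true_status=$?

# false, for contrast, reports failure with a non-zero status.
false_status=0
false || false_status=$?

echo "true exited with $true_status, false exited with $false_status"
```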