manuals/en/main/migration.tex


\chapter{Migration and Copy}
\label{MigrationChapter}
\index[general]{Migration}
\index[general]{Copy}

The term Migration, as used in the context of Bareos, means moving data from
one Volume to another.  In particular it refers to a Job (similar to a backup
job) that reads data that was previously backed up to a Volume and writes
it to another Volume.  As part of this process, the File catalog records
associated with the first backup job are purged.  In other words, Migration
moves Bareos Job data from one Volume to another by reading the Job data
from the Volume it is stored on, writing it to a different Volume in a
different Pool, and then purging the database records for the first Job.

The Copy process is essentially identical to the Migration feature with the
exception that the Job that is copied is left unchanged.  This essentially
creates two identical copies of the same backup. However, the copy is treated
as a copy rather than a backup job, and hence is not directly available for
restore. If Bareos finds a copy when a job record is purged (deleted) from the
catalog, it will promote the copy as \textsl{real} backup and will make it
available for automatic restore.

Copy and Migration jobs do not involve the File daemon.

Jobs can be selected for migration based on a number of criteria such as:
\begin{itemize}
\item a single previous Job
\item a Volume
\item a Client
\item a regular expression matching a Job, Volume, or Client name
\item the time a Job has been on a Volume
\item high and low water marks (usage or occupation) of a Pool
\item Volume size
\end{itemize}

The details of these selection criteria will be defined below.

To run a Migration job, you must first define a Job resource very similar
to a Backup Job but with \linkResourceDirective{Dir}{Job}{Type} = Migrate 
instead of \linkResourceDirective{Dir}{Job}{Type} = Backup.
One of the key points to remember is that the Pool that is
specified for the migration job is the only pool from which jobs will
be migrated, with one exception noted below. In addition, the Pool to
which the selected Job or Jobs will be migrated is defined by the
\linkResourceDirective{Dir}{Pool}{Next Pool} = ... in the Pool resource 
specified for the Migration Job.

Bareos permits Pools to contain Volumes of different Media Types.
However, when doing migration, this is a very undesirable condition.  For
migration to work properly, you should use Pools containing only Volumes of
the same Media Type for all migration jobs.

A migration job can be started manually or from a Schedule, like
a backup job. It searches for existing backup Jobs that match the
parameters specified in the migration Job resource, primarily a
\linkResourceDirective{Dir}{Job}{Selection Type}. If no match was
found, the Migration job terminates without further action. Otherwise,
for each Job found this way, the Migration Job will run a new Job
which copies the Job data to a new Volume in the Migration Pool.

Normally three jobs are involved during a migration:

\begin{itemize}
\item The Migration control Job which starts the migration child Jobs.
\item The previous Backup Job (already run). The File records
      of this Job are purged when the Migration job terminates
      successfully. The data remain on the Volume until it is recycled.

\item A new Migration Backup Job that moves the data from the
      previous Backup job to the new Volume.  If you subsequently
      do a restore, the data will be read from this Job.
\end{itemize}

If the Migration control Job finds more than one existing Job to
migrate, it creates one migration job for each of them. This may result
in a large number of Jobs. Please note that Migration doesn't scale too
well if you migrate data off of a large Volume because each job must
read the same Volume, hence the jobs will have to run consecutively
rather than simultaneously.

\section{Important Migration Considerations}
\index[general]{Important Migration Considerations}
\begin{itemize}
\item Each Pool into which you migrate Jobs or Volumes {\bf must}
      contain Volumes of only one \linkResourceDirective{Dir}{Storage}{Media Type}.

\item Migration takes place on a JobId by JobId basis. That is
      each JobId is migrated in its entirety and independently
      of other JobIds. Once the Job is migrated, it will be
      on the new medium in the new Pool, but for the most part,
      aside from having a new JobId, it will appear with all the
      same characteristics of the original job (start, end time, ...).
      The column RealEndTime in the catalog Job table will contain the
      time and date that the Migration terminated, and by comparing
      it with the EndTime column you can tell whether or not the
      job was migrated. Also, the Job table contains a PriorJobId
      column which is set to the original JobId for migration jobs. For
      non-migration jobs this column is zero.

\item After a Job has been migrated, the File records are purged from
      the original Job. Moreover, the Type of the original Job is
      changed from "B" (backup) to "M" (migrated), and another Type
      "B" job record is added which refers to the new location of the
      data. Since the original Job record stays in the bareos catalog,
      it is still possible to restore from the old media by specifying
      the original JobId for the restore. However, no file selection is
      possible in this case, so one can only restore {\bf all} files
      this way.

\item A Job will be migrated only if all Volumes on which the job
      is stored are marked Full, Used, or Error. In particular, Volumes
      marked Append will not be considered for migration which rules out
      the possibility that new files are appended to a migrated Volume.
      This policy also prevents deadlock situations, like attempting to
      read and write the same Volume from two jobs at the same time.

\item Migration works only if the Job resource of the original Job is
      still defined in the current director configuration. Otherwise
      you'll get a fatal error.

\item As noted above, for the Migration High Bytes, the calculation
      of the bytes to migrate is somewhat approximate.

\item If you keep Volumes of different Media Types in the same Pool,
      it is not clear how well migration will work.  We recommend only
      one \linkResourceDirective{Dir}{Storage}{Media Type} per pool.

\item It is possible to get into a resource deadlock where Bareos does
      not find enough drives to simultaneously read and write all the
      Volumes needed to do Migrations. For the moment, you must take
      care as all the resource deadlock algorithms are not yet implemented.

\item Setting the Migration High Bytes watermark is not sufficient
      for migration to take place. In addition, you must define
      and schedule a migration job which looks for jobs that can
      be migrated.

\item If you migrate a number of Volumes, a very large number of Migration
      jobs may start.

\item To figure out which jobs are going to be migrated by a given
      configuration, choose a debug level of 100 or more. This
      activates information about the migration selection process.

\item Bareos currently does only minimal Storage conflict resolution, so you
      must take care to ensure that you don't try to read and write to the
      same device or Bareos may block waiting to reserve a drive that it
      will never find. In general, ensure that all your migration
      pools contain only one \linkResourceDirective{Dir}{Storage}{Media Type},
      and that you always
      migrate to pools with different Media Types.

\item The \linkResourceDirective{Dir}{Pool}{Next Pool} = ... directive must be defined in the Pool
     referenced in the Migration Job to define the Pool into which the
     data will be migrated.

\item Migration has only be tested carefully for the "Job" and "Volume"
      selection types. All other selection types (time, occupancy,
      smallest, oldest, ...) are experimental features.

\end{itemize}

\section{Configure Copy or Migration Jobs}

The following directives can be used to define a Copy or Migration job:

\paragraph{Job Resource}

\begin{itemize}
    \item \linkResourceDirective{Dir}{Job}{Type} = Migrate{\textbar}Copy
    \item \linkResourceDirective{Dir}{Job}{Selection Type}
        \item \linkResourceDirective{Dir}{Job}{Selection Pattern}
    \item \linkResourceDirective{Dir}{Job}{Pool} \\
        For \linkResourceDirective{Dir}{Job}{Selection Type} other than SQLQuery, 
        this defines what Pool will be examined for finding JobIds to migrate
    \item \linkResourceDirective{Dir}{Job}{Purge Migration Job}
\end{itemize}

\paragraph{Pool Resource}

\begin{itemize}
    \item \linkResourceDirective{Dir}{Pool}{Next Pool} \\
        to what pool Jobs will be migrated
    \item \linkResourceDirective{Dir}{Pool}{Migration Time} \\
        if \linkResourceDirective{Dir}{Job}{Selection Type} = PoolTime
    \item \linkResourceDirective{Dir}{Pool}{Migration High Bytes} \\
        if \linkResourceDirective{Dir}{Job}{Selection Type} = PoolOccupancy
    \item \linkResourceDirective{Dir}{Pool}{Migration Low Bytes} \\
        optional if \linkResourceDirective{Dir}{Job}{Selection Type} = PoolOccupancy is used
    \item \linkResourceDirective{Dir}{Pool}{Storage} \\
        if Copy/Migration involves multiple Storage Daemon, see \nameref{sec:CopyMigrationJobsMultipleStorageDaemons}
\end{itemize}


\subsection{Example Migration Jobs}
\index[general]{Example!Migration Jobs}

Assume a simple configuration with a single backup job as described
below.

\begin{bconfig}{Backup Job}
# Define the backup Job
Job {
  Name = "NightlySave"
  Type = Backup
  Level = Incremental                 # default
  Client=rufus-fd
  FileSet="Full Set"
  Schedule = "WeeklyCycle"
  Messages = Standard
  Pool = Default
}

# Default pool definition
Pool {
  Name = Default
  Pool Type = Backup
  AutoPrune = yes
  Recycle = yes
  Next Pool = Tape
  Storage = File
  LabelFormat = "File"
}

# Tape pool definition
Pool {
  Name = Tape
  Pool Type = Backup
  AutoPrune = yes
  Recycle = yes
  Storage = DLTDrive
}

# Definition of File storage device
Storage {
  Name = File
  Address = rufus
  Password = "secret"
  Device = "File"          # same as Device in Storage daemon
  Media Type = File        # same as MediaType in Storage daemon
}

# Definition of DLT tape storage device
Storage {
  Name = DLTDrive
  Address = rufus
  Password = "secret"
  Device = "HP DLT 80"      # same as Device in Storage daemon
  Media Type = DLT8000      # same as MediaType in Storage daemon
}
\end{bconfig}

Note that the backup job writes to the \pool{Default} pool, which
corresponds to \resourcename{File} storage. There is no
\linkResourceDirective{Dir}{Pool}{Storage} directive
in the Job resource while the two \configresource{Pool} resources contain
different \linkResourceDirective{Dir}{Pool}{Storage} directives.
Moreover, the \pool{Default}
pool contains a \linkResourceDirective{Dir}{Pool}{Next Pool} directive
that refers to the \pool{Tape} pool.

In order to migrate jobs from the \pool{Default} pool to the \pool{Tape} pool
we add the following Job resource:

\begin{bconfig}{migrate all volumes of a pool}
Job {
  Name = "migrate-volume"
  Type = Migrate
  Messages = Standard
  Pool = Default
  Selection Type = Volume
  Selection Pattern = "."
}
\end{bconfig}

The \linkResourceDirective{Dir}{Job}{Selection Type} and
\linkResourceDirective{Dir}{Job}{Selection Pattern} directives
instruct Bareos to select all volumes of the given pool (\pool{Default})
whose volume names match the given regular expression (\argument{"."}),
i.e., all volumes. Hence those jobs which were backed up to any volume
in the \pool{Default} pool will be migrated. Because of the
\linkResourceDirective{Dir}{Pool}{Next Pool} directive
of the \pool{Default} pool resource, the jobs will be
migrated to tape storage.

Another way to accomplish the same is the following Job resource:

\begin{bconfig}{migrate all jobs named *Save}
Job {
  Name = "migrate"
  Type = Migrate
  Messages = Standard
  Pool = Default
  Selection Type = Job
  Selection Pattern = ".*Save"
}
\end{bconfig}

This migrates all jobs ending with \argument{Save} from the \pool{Default}
pool to the \pool{Tape} pool, i.e., from File storage to Tape storage.

\subsubsection{Multiple Storage Daemons}
    \label{sec:CopyMigrationJobsMultipleStorageDaemons}

Beginning from Bareos \sinceVersion{dir}{Copy and Migration Jobs between different Storage Daemons}{13.2.0}, 
Migration and Copy jobs are also possible from one Storage daemon to another Storage Daemon.

Please note:
\begin{itemize}
 \item the director must have two different storage resources configured (e.g. storage1 and storage2)
    \item each storage needs an own device and an individual pool (e.g. pool1, pool2)
    \item each pool is linked to its own storage via the storage directive in the pool resource
    \item to configure the migration from pool1 to pool2, the \linkResourceDirective{Dir}{Pool}{Next Pool} directive of pool1 has to point to pool2
    \item the copy job itself has to be of type copy/migrate (exactly as already known in copy- and migration jobs)
\end{itemize}

Example:

\begin{bconfig}{bareos-dir.conf: Copy Job between different Storage Daemons}
#bareos-dir.conf

# Fake fileset for copy jobs
Fileset {
  Name = None
  Include {
    Options {
      signature = MD5
    }
  }
}

# Fake client for copy jobs
Client {
  Name = None
  Address = localhost
  Password = "NoNe"
  Catalog = MyCatalog
}

# Source storage for migration
Storage {
   Name = storage1
   Address = sd1.example.com
   Password = "secret1"
   Device = File1
   Media Type = File
}

# Target storage for migration
Storage {
   Name = storage2
   Address = sd2.example.com
   Password = "secret2"
   Device = File2
   Media Type = File2   # Has to be different than in storage1
}

Pool {
   Name = pool1
   Storage = storage1
   Next Pool = pool2    # This points to the target storage
}

Pool {
   Name = pool2
   Storage = storage2
}

Job {
   Name = CopyToRemote
   Type = Copy
   Messages = Standard
   Selection Type = PoolUncopiedJobs
   Spool Data = Yes
   Pool = pool1
}
\end{bconfig}