This repository has been archived by the owner on Jan 8, 2019. It is now read-only.

Merge pull request #1226 from ronny-macmaster/backup
Backup
Andrew Jones committed Jul 5, 2018
2 parents 57395a2 + ebff19b commit ce48d45
Showing 34 changed files with 1,271 additions and 1 deletion.
2 changes: 2 additions & 0 deletions Berksfile
@@ -11,10 +11,12 @@ solver ENV.fetch('BERKS_SOLVER', :gecode)
#
# Local cookbooks, inside our repository.
#
cookbook 'bach_backup', path: './cookbooks/bach_backup'
cookbook 'bach_common', path: './cookbooks/bach_common'
cookbook 'bach_krb5', path: './cookbooks/bach_krb5'
cookbook 'bach_repository', path: './cookbooks/bach_repository'
cookbook 'bach_spark', path: './cookbooks/bach_spark'
cookbook 'backup', path: './cookbooks/backup'
cookbook 'bcpc', path: './cookbooks/bcpc'
cookbook 'bcpc-hadoop', path: './cookbooks/bcpc-hadoop'
cookbook 'bcpc_jmxtrans', path: './cookbooks/bcpc_jmxtrans'
9 changes: 9 additions & 0 deletions cookbooks/bach_backup/README.md
@@ -0,0 +1,9 @@
# bach_backup

`bach_backup` is a wrapper cookbook to configure HDFS backups for BACH clusters.
It overrides some attributes in the `backup` library cookbook for use on BACH clusters.

# Managing Backup Jobs
Backup job schedules can be managed through the jobs.yml files:

* [hdfs](files/default/hdfs/jobs.yml)
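
A minimal sketch of how such a jobs.yml document loads into the schedules structure the attributes expect. The YAML body here is a trimmed illustration, not the shipped file, and `YAML.safe_load` with `permitted_classes` is one way to allow the symbol keys on modern Psych:

```ruby
require 'yaml'

# Trimmed example of a jobs.yml document (illustrative content only).
doc = <<~YAML
  :hdfs:
    :hdfs: 'hdfs://Test-Laptop'
    :start: '2018-02-16T12:00Z'
    :end: '2018-06-16T06:00Z'
    :jobs:
      - :path: '/tmp'
        :period: 360
YAML

# Keys prefixed with ':' come back as Ruby symbols, matching how the
# attribute code indexes into node[:backup][:hdfs][:schedules].
schedules = YAML.safe_load(doc, permitted_classes: [Symbol])
schedules[:hdfs][:jobs].each do |job|
  puts "#{job[:path]} every #{job[:period]} minutes"
end
```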
62 changes: 62 additions & 0 deletions cookbooks/bach_backup/attributes/default.rb
@@ -0,0 +1,62 @@
# Cookbook Name:: bach_backup
# Override Attributes
#
# Copyright 2018, Bloomberg Finance L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## override global backup properties
force_default[:backup][:user] = "bach_backup"
force_default[:backup][:root] = "/archive"
force_default[:backup][:local][:root] = "/etc/archive"


# storage cluster
set_hosts # set bcpc hadoop hosts
force_default[:backup][:namenode] = node[:bcpc][:hadoop][:hdfs_url]
force_default[:backup][:jobtracker] = node[:bcpc][:hadoop][:rm_address]
force_default[:backup][:oozie] = node[:bcpc][:hadoop][:oozie_url]

# Mapreduce Queue
force_default[:backup][:queue] = "root.default.#{node[:backup][:user]}"

# hdfs backup jobs list
## NOTE: refer to files/default/hdfs/jobs.yml for the proper data scheme.
## force_default[:backup][:hdfs][:schedules] = YAML.load_file(File.join(
## Chef::Config[:file_cache_path],
## 'cookbooks',
## 'bach_backup',
## 'files/default/hdfs/jobs.yml'
## ))
### force_default[:backup][:hdfs][:groups] = node[:backup][:hdfs][:jobs].keys

force_default[:backup][:hdfs][:schedules] = {
hdfs: {
hdfs: 'hdfs://Test-Laptop',
start: '2018-02-16T12:00Z',
end: '2018-06-16T06:00Z',
jobs: [
{ path: '/tmp', period: 360, },
{ path: '/user', period: 480, },
]
},
ubuntu: {
hdfs: 'hdfs://Test-Laptop',
start: '2018-02-16T12:00Z',
end: '2018-06-16T06:00Z',
jobs: [
{ path: '/tmp', period: 1440, },
{ path: '/user', period: 720, },
]
},
}
34 changes: 34 additions & 0 deletions cookbooks/bach_backup/files/default/hdfs/jobs.yml
@@ -0,0 +1,34 @@
# jobs.yml
# YAML model of HDFS backup jobs.
# All keys should be prefixed with ':' so that they are read as Ruby symbols.
#
# :group_name:
# :hdfs: valid hdfs source uri (string)
# :start: ISO datetime start (string)
# :end: ISO datetime end (string)
# :jobs:
# - :path: hdfs path to backup target (string)
# :hdfs: (OPTIONAL) override top-level hdfs uri (string)
# :period: period in minutes between backup actions (integer)
---
:hdfs:
:hdfs: 'hdfs://Test-Laptop'
:start: '2018-02-16T12:00Z'
:end: '2018-06-16T06:00Z'
:jobs:
- :path: '/tmp'
:period: 360
- :path: '/user'
:hdfs: 'hdfs://Test-Laptop'
:period: 480

:ubuntu:
:hdfs: 'hdfs://Test-Laptop'
:start: '2018-02-16T08:00Z'
:end: '2018-06-16T08:00Z'
:jobs:
- :path: '/user'
:period: 720
- :path: '/tmp'
:period: 1440

28 changes: 28 additions & 0 deletions cookbooks/bach_backup/metadata.rb
@@ -0,0 +1,28 @@
# Cookbook Name:: bach_backup
# metadata.rb
#
# Copyright 2018, Bloomberg Finance L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# encoding: utf-8
name 'bach_backup'
maintainer 'Bloomberg Finance L.P.'
maintainer_email 'hadoop@bloomberg.net'
description 'Overrides the default attributes in the backup cookbook.'
license 'Apache 2.0'
long_description IO.read(File.join(File.dirname(__FILE__), 'README.md'))
version '0.1.0'

# dependencies
depends 'bcpc-hadoop' # needed for kerberos authentication
31 changes: 31 additions & 0 deletions cookbooks/bach_backup/recipes/default.rb
@@ -0,0 +1,31 @@
# Cookbook Name:: bach_backup
#
# Copyright 2018, Bloomberg Finance L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Resources here are run at compile time.
# This is necessary to avoid errors in bcpc-hadoop's resource search.

backup_user = node[:backup][:user]

# create hdfs home
execute 'hdfs home for backup service' do
command "hdfs dfs -mkdir -p /user/#{backup_user}"
user 'hdfs'
end

execute 'chown hdfs home for backup service' do
command "hdfs dfs -chown #{backup_user} /user/#{backup_user}"
user 'hdfs'
end
8 changes: 8 additions & 0 deletions cookbooks/backup/README.md
@@ -0,0 +1,8 @@
# backup

`backup` is a Chef cookbook that sets up periodic inter-cluster HDFS backups.

The backup service schedules HDFS DistCp actions from the source cluster to the backup cluster.
The DistCp jobs run periodically under Oozie coordinators and workflows.

## FUTURE: HBase, Hive, and Phoenix backups
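
As a rough sketch of what each scheduled action amounts to, a hypothetical helper can assemble the DistCp invocation an Oozie workflow action would wrap. The helper name and values are illustrative; `-update` and `-bandwidth` are standard `hadoop distcp` flags:

```ruby
# Hypothetical helper: build the DistCp command for one backup job.
# bandwidth_mb mirrors the per-mapper bandwidth attribute default.
def distcp_command(source_uri, path, backup_root, bandwidth_mb: 25)
  src  = "#{source_uri}#{path}"
  dest = "#{backup_root}#{path}"
  "hadoop distcp -update -bandwidth #{bandwidth_mb} #{src} #{dest}"
end

puts distcp_command('hdfs://localhost:9000', '/user', '/backup/hdfs')
```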
30 changes: 30 additions & 0 deletions cookbooks/backup/attributes/default.rb
@@ -0,0 +1,30 @@
# Cookbook Name:: backup
# Default Attributes
#
# Copyright 2018, Bloomberg Finance L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## global backup properties
default[:backup][:user] = "backup"
default[:backup][:root] = "/backup"
default[:backup][:local][:root] = "/etc/backup"

# list of enabled backup services
default[:backup][:services] = [:hdfs]

# storage cluster
default[:backup][:namenode] = "hdfs://localhost:9000"
default[:backup][:jobtracker] = "localhost:8032"
default[:backup][:oozie] = "http://localhost:11000/oozie"
default[:backup][:queue] = "default"
43 changes: 43 additions & 0 deletions cookbooks/backup/attributes/hdfs.rb
@@ -0,0 +1,43 @@
# Cookbook Name:: backup
# HDFS Backup Attributes
#
# Copyright 2018, Bloomberg Finance L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

### hdfs backups
default[:backup][:hdfs][:user] = node[:backup][:user]
default[:backup][:hdfs][:root] = "#{node[:backup][:root]}/hdfs"
default[:backup][:hdfs][:local][:root] = "#{node[:backup][:local][:root]}/hdfs"

# local oozie config dir
default[:backup][:hdfs][:local][:oozie] =
"#{node[:backup][:hdfs][:local][:root]}/oozie"

## hdfs backup tuning parameters
# timeout in minutes before aborting distcp request
default[:backup][:hdfs][:timeout] = -1

# bandlimit in MB/s per mapper
default[:backup][:hdfs][:mapper][:bandwidth] = 25

### hdfs backup requests
default[:backup][:hdfs][:schedules] = {}

## NOTE: refer to files/default/hdfs/jobs.yml for the proper data scheme.
# default[:backup][:hdfs][:schedules] = YAML.load_file(File.join(
# Chef::Config[:file_cache_path],
# 'cookbooks',
# 'backup',
# 'files/default/hdfs/jobs.yml'
# ))
24 changes: 24 additions & 0 deletions cookbooks/backup/files/default/hdfs/jobs.yml
@@ -0,0 +1,24 @@
# jobs.yml
# YAML model of backup jobs.
# Disclaimer: this is an example.
# All keys should be prefixed with ':' so that they are read as Ruby symbols.
#
# :group_name:
# :hdfs: valid hdfs source uri (string)
# :start: ISO datetime start (string)
# :end: ISO datetime end (string)
# :jobs:
# - :path: hdfs path to backup target (string)
# :hdfs: (OPTIONAL) override top-level hdfs uri (string)
# :period: period in minutes between backup actions (integer)
---
:group:
:hdfs: 'hdfs://localhost:9000'
:start: '2018-01-01T12:00Z'
:end: '2019-12-25T06:00Z'
:jobs:
- :path: '/tmp'
:period: 360
- :path: '/user'
:hdfs: 'hdfs://localhost:9000' # optional override
:period: 480
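
To make the `:period` semantics concrete, a small hypothetical helper (not part of the cookbook) can expand a job's period in minutes into the run times a coordinator would materialize between `:start` and `:end`:

```ruby
require 'time'

# Expand a job schedule into its first few materialized run times.
# Period is in minutes, matching the jobs.yml scheme above.
def run_times(start_iso, end_iso, period_minutes, limit: 3)
  t    = Time.parse(start_iso).utc
  stop = Time.parse(end_iso).utc
  runs = []
  while t <= stop && runs.size < limit
    runs << t.iso8601
    t += period_minutes * 60
  end
  runs
end

run_times('2018-01-01T12:00Z', '2019-12-25T06:00Z', 360)
```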
101 changes: 101 additions & 0 deletions cookbooks/backup/libraries/oozie_client.rb
@@ -0,0 +1,101 @@
# oozie_client.rb
# ruby client for managing oozie jobs
#
# Copyright 2018, Bloomberg Finance L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module Oozie
class ClientV1
attr_accessor :oozie, :user

def initialize(oozie_url='http://localhost:11000/oozie', user='oozie')
@oozie = oozie_url
@user = user
end

# list the oozie jobs on the server.
def jobs(filter={}, jobtype='workflow', len=10)
execute('jobs', user, {
oozie: @oozie,
jobtype: jobtype,
filter: "\"#{filter_string(filter)}\"",
len: len
})
end

# kill the oozie jobs that match the filter.
def kill_jobs(filter={}, jobtype='workflow', len=1000)
execute('jobs', user, {
oozie: @oozie,
jobtype: jobtype,
filter: "\"#{filter_string(filter)}\"",
len: len,
kill: nil
})
end

# run an oozie job.
def run(config, user=@user)
execute('job', user, {
oozie: @oozie,
config: config,
run: nil
})
end

# rerun an oozie action.
def rerun(action_id, config, user=@user)
execute('job', user, {
oozie: @oozie,
config: config,
rerun: action_id
})
end

# kill an oozie job with the given id.
def kill(job_id, user=@user)
execute('job', user, {
oozie: @oozie,
kill: job_id
})
end

# get the id of an oozie job with the given name (if it exists).
def get_id(job_name, jobtype='coordinator', status='RUNNING')
  jobs_cmd = jobs({ name: job_name, status: status }, jobtype, 1)
match = jobs_cmd.stdout.match(/(\S+)\s+#{job_name}/)
return match.nil? ? nil : match[1]
end

private ## private methods

def execute(subcommand, user=@user, options={})
require 'mixlib/shellout'
command = "oozie #{subcommand} #{options_string(options)}"
# puts command ## print debug command
return Mixlib::ShellOut.new(command, user: user, timeout: 90).run_command
end

def options_string(options)
  options.map { |key, value| "-#{key} #{value}" }.join(' ')
end

def filter_string(filter)
  filter.map { |key, value| "#{key}=#{value}" }.join(';')
end
end

class Client < ClientV1
end
end
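
The client shells out to the `oozie` CLI, so it only runs where that binary exists; usage would look roughly like `Oozie::Client.new('http://localhost:11000/oozie', 'backup').run('job.properties')` (paths hypothetical). The flag formatting it relies on can be shown standalone; these functions mirror the private helpers above:

```ruby
# Standalone mirrors of the client's private formatting helpers,
# for illustration: options become "-key value" flags, filters
# become "key=value" pairs joined by ';'.
def options_string(options)
  options.map { |key, value| "-#{key} #{value}" }.join(' ')
end

def filter_string(filter)
  filter.map { |key, value| "#{key}=#{value}" }.join(';')
end

puts options_string(oozie: 'http://localhost:11000/oozie', len: 10)
puts filter_string(name: 'backup-hdfs', status: 'RUNNING')
```

Note that a bare flag like `kill: nil` renders as `-kill` followed by an empty value, which is how the client passes value-less CLI switches.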
