Summit 2012 Agenda

Adam Spiers edited this page · 3 revisions

The agenda for the Design Summit is as follows:

  • 8:30 (45 mins): Crowbar 2.0 Challenges & Objectives
  • 9:15 (30 mins): Packaging and Delivery
  • 9:45 (15 mins): Break
  • 10:00 (30 mins): Networking
  • 10:30 (30 mins): Data Representation and Framework
  • 11:00 (30 mins): Next Steps, Future Meetings & Work Assignments
  • 11:30: official content ends & lunch
  • 12:00 (30 mins): breakout session based on the discussions above, TBD during the summit
  • 12:30 (30 mins): breakout session based on the discussions above, TBD during the summit
  • 1:00 (30 mins): breakout session based on the discussions above, TBD during the summit
  • 1:30 (30 mins): breakout session based on the discussions above, TBD during the summit
  • 2:00: meeting ends

Intro

  1. Legals
    • Friendly NDA – we have not published Summit attendees' names; you may disclose your own attendance (please do!), but do not disclose others'.
    • All other content from the Summit will be public
    • Contributor agreements are needed if you want to contribute
  2. Format
    • Discussion, not review
    • Limit time on the “big idea”; focus on changes that enable the use cases
    • Follow-up w/ open weekly sync meetings
  3. Design objectives
    • Get back to a stable, working state
    • Incremental architecture
    • Automation & testing
    • Open development
  4. Background
    • why refactor? early assumptions served us well
  5. Use Case Approach

Packaging & Delivery

  • Pull From Source vs Online Mode
    • online mode lets you reference the original repos
      • this is required for heterogeneous OS support
      • the beginning path is to have the admin node be the "center of the local universe"
      • the current implementation is through caching proxies
      • how is the proxy caching staged?
        • we could provide validated caches so that people would not have to "go to the bleeding-edge Internet" - not committed
        • customers usually have a validated dev environment, flowing to production
      • SUSE is using SMT (Subscription Management Tool) - allows users to pull packages from a local cache and build a dev testing cache
        • already integrated into their version of Crowbar
        • this is missing from other distributions, so we need to figure it out
    • design would be that there are multiple Crowbar instances so we could upgrade Crowbar
    • why install from source?
      • it's a dev concern if you are making changes that are not in packages (testing)
      • in production, for changes that are not yet gated or merged
      • could we use an RPM generator? this is something to consider - it matches the current model
      • packages do things that are more than just code - we need to consider that
    • delivery of Crowbar
      • big ISO is still needed for offline install, but proxy cache can be pre-populated
      • people want barclamps setup as packages
      • FPM could scale to more distributions (https://github.com/jordansissel/fpm/)
      • we don't want a HUGE RPM - we need to have a barclamp RPM & barclamp packages RPM (install or not)
      • RPM to populate the cache
    • SUSE has packaged crowbar along repo lines, e.g. there is a crowbar package and crowbar-barclamp-* packages
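The per-barclamp packaging idea above can be sketched with FPM. This is a hedged illustration, not the actual Crowbar build tooling: the `crowbar-barclamp-*` naming follows the SUSE convention mentioned above, but the paths and version are invented.

```python
# Hypothetical sketch of building one RPM per barclamp with FPM
# (https://github.com/jordansissel/fpm/).  Names and paths are
# illustrative, not the real Crowbar layout.

def fpm_command(barclamp, version, src_dir):
    """Return the fpm argv that would package one barclamp as an RPM."""
    return [
        "fpm",
        "-s", "dir",                      # source: a plain directory tree
        "-t", "rpm",                      # target: an RPM package
        "-n", f"crowbar-barclamp-{barclamp}",
        "-v", version,
        "-C", src_dir,                    # chdir here before packaging
        ".",
    ]

cmd = fpm_command("network", "2.0.0", "barclamps/network")
print(" ".join(cmd))
```

Because FPM can also emit deb and other formats by changing `-t`, this one build path could scale to more distributions, as noted above.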

Heterogeneous OS

  • this needs to work for "non-Crowbar" OSes like Solaris, ESXi, Windows, switches...
  • Opscode has a new tool in development called metarepo that supports apt and yum, builds versioned repos based on mirrors or files, and tracks changes over time
    • <https://github.com/adamhjk/metarepo>
  • how do we make bare metal as similar to cloud as possible?
  • why not use Cobbler? it does not do autodiscovery, and we would not use everything it offers
  • why Razor (GPL)? they wanted provisioning to be more decoupled than in Crowbar
    • <https://github.com/puppetlabs/razor>
    • auto-inventory: PXE boots a micro OS in RAM that runs Facter to populate an inventory database
    • tag rules get passed to config rules
    • config rules tell the system how to configure/model the system based on the tags
    • an important feature was to make sure it was not coupled to Puppet
  • (GPL)
    • not using Cobbler
    • needed to manage DHCP & kickstart
    • users can define burn-in tests and other things
    • the node gets commissioned and shuts off
  • Dreamhost is using Nephology (BSD) - a flat bare-metal provisioning system; many-to-many, very decoupled from everything, not monolithic
    • <https://github.com/edolnx/nephology-server-perl>
    • <https://github.com/edolnx/nephology-client>
    • boots into an iPXE image that sends inventory information
    • the configuration comes from the provisioning server
    • matches the DMI information that's needed
  • It looks like everyone is doing very similar activities (Rackspace & HP have similar tools too)
  • Crowbar uses Sledgehammer
    • we need to be able to run configuration tools from it
    • we have a version that runs both Ohai and Facter
    • it waits in a state for further action (could be expanded to power down instead of holding)
    • the deployer barclamp manages this process; it could have a rules engine
  • Late binding is very important to this process
  • Could pluggability be a design goal for Crowbar?
    • Crowbar does not distinguish between RAID configuration and app configuration
    • the control of the Sledgehammer pieces is integrated into the orchestration
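The Razor-style flow described above (facts in, tags out, tags select a config) can be sketched in a few lines. This is a hedged illustration under invented rule syntax and fact names; it is not Razor's actual matching engine.

```python
# Sketch of the tag-rule / config-rule flow discussed for Razor:
# discovered facts (e.g. from Facter/Ohai) are matched against tag
# rules, and the resulting tags select a configuration.

def apply_tag_rules(facts, tag_rules):
    """Return the set of tags whose rule matches the node's facts."""
    return {tag for tag, rule in tag_rules.items()
            if all(facts.get(k) == v for k, v in rule.items())}

def pick_config(tags, config_rules):
    """Return the first config whose required tags are all present."""
    for config, required in config_rules:
        if required <= tags:
            return config
    return None

facts = {"memory_gb": 64, "nics": 4}            # auto-inventory output
tag_rules = {"big-ram": {"memory_gb": 64}, "multi-nic": {"nics": 4}}
config_rules = [("compute-node", {"big-ram", "multi-nic"}),
                ("default", set())]

tags = apply_tag_rules(facts, tag_rules)
print(pick_config(tags, config_rules))          # -> compute-node
```

Keeping the rules as plain data is what makes this kind of engine decoupled from any one configuration tool, which matches the "not coupled to Puppet" goal noted above.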

Networking

  • Crowbar networking is confusing
    • the admin network is set at install time
    • fixing networking requires really looking at the network barclamp
      • we need better modeling of the data
      • this is carried along with the database refactoring
    • we need a 1st-class API around creating and removing networks
    • we are not trying to fix the fact that the admin network is high-stakes config; rather, we are trying to decouple the rest of the network from the admin config
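A first-class create/remove API with the admin network pinned at install time might look like the following. This is a minimal sketch under assumed names; it is not Crowbar's actual network barclamp API.

```python
# Minimal sketch of a first-class network create/remove API, with the
# admin network treated as fixed at install time (per the notes above).
# Class, method, and subnet values are assumptions, not Crowbar's API.

class NetworkRegistry:
    def __init__(self, admin_subnet):
        # the admin network is set at install time and cannot be removed
        self.networks = {"admin": admin_subnet}

    def create(self, name, subnet):
        if name in self.networks:
            raise ValueError(f"network {name!r} already exists")
        self.networks[name] = subnet

    def remove(self, name):
        if name == "admin":
            raise ValueError("the admin network cannot be removed")
        del self.networks[name]

nets = NetworkRegistry("192.168.124.0/24")
nets.create("storage", "10.0.1.0/24")
nets.remove("storage")
```

The design point is the decoupling: all non-admin networks go through one mutable registry, while the high-stakes admin config stays immutable behind it.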

Data Representation & Framework (CMDB agnostic)

  • how tightly coupled are the modules to each other?
    • are attributes injected, assumed, requested?
    • APIs are important - classes that search for dependencies
    • sometimes you have to export things to the database ("secondary class")
    • orchestration class is at the heart of coordinating between deploy scripts
  • crowbar is acting as the node DB - it's the wrapper that starts the chain
  • we get feedback from community that our recipes are not externally consumable because of Crowbarisms
    • we put stuff into the recipes that link cookbooks together
    • network configuration is the biggest offender - if you want a network, you have to poke into another barclamp
  • we expect that we'll inject attributes into the modules/recipes
    • e.g.: cassandra script that has the networking pushed in as attributes
  • this transcends search
    • search will still work, but...
    • crowbar takes an active role in configuration (we don't use convergence)
    • crowbar pushes the configuration, does not wait
  • orchestration is not convergence - order and operation matters
    • we don't rely on the convergence mechanisms; we push it and test it
  • modules/recipes would be 100% attribute driven
  • we want to use upstreams
    • we are responsible for pushing upstream where the parameterization was not done correctly
    • e.g.: you can't have "ip_address" as the way to code networking, need to be more generic
    • most of the CMDBs have a way to create reasonable defaults, so we can deal w/ the lack of attributes
    • we see that there is a community drive to do this
    • inbound and outbound attributes so that you know where to find the API end-points
  • we talk about "API" but that may just be that we're talking about an agreed namespace
    • we need to formalize how we give input/output
    • the CMDBs don't care how this data is shared
  • A barclamp should know the information it needs for its "immediate decisions"
    • we are trying to limit the amount of data that is pulled out and stored
    • we are starting from none, adding more
    • barclamp defines what data it needs (that's pretty straightforward)
    • this goes back and forth - barclamps need both an injector and an abstractor
  • the generic facts in Puppet - are they replicated into Crowbar?
    • most of the facts (especially the common ones) would be handled by the abstraction
    • "data of intent" would be the barclamp information, not the "data of reality"
    • the monitoring views in Crowbar are supposed to be tied to reality
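The injection model above, where the orchestrator pushes fully resolved attributes into a module instead of the recipe searching other barclamps, can be sketched as follows. All names are illustrative (the cassandra example follows the one in the notes), and the "intent"/"reality" split mirrors the distinction drawn above.

```python
# Sketch of the attribute-injection model: merge "data of intent"
# (what the barclamp wants) with discovered "data of reality", and
# hand the recipe a flat, fully resolved attribute set.  Key names
# are invented for illustration.

def render_attributes(intent, reality):
    """Produce the attribute set a 100%-attribute-driven recipe consumes."""
    attrs = dict(intent)                  # data of intent
    # inject the resolved network endpoint rather than a hard-coded
    # "ip_address" key, so the recipe stays generic and upstreamable
    attrs["listen_address"] = reality["storage_net"]["address"]
    return attrs

intent = {"cluster_name": "demo", "seed_count": 3}     # e.g. for cassandra
reality = {"storage_net": {"address": "10.0.1.15"}}    # discovered facts
print(render_attributes(intent, reality))
```

Because the recipe sees only injected attributes, it has no Crowbarisms and remains externally consumable, which is the community feedback this design answers.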

Concluding Action Items

  • Baremetal Bringup
    • create an abstraction layer for provisioning [Morphlabs]
    • API for machine provisioning (Nova API?)
    • take on provisioning of non-Linux operating systems [Morphlabs]
  • Networking Abstractions
  • Online / Staged Changes / Proxy [SUSE]
  • Pull from Source
    • consolidation of modules
  • CMDB abstraction [Morphlabs/Puppet Labs/enStratus] [SUSE]
  • Process related issues [SUSE]
  • API definition (places where & how to define)
  • Rails3 & DB migration
  • Security
  • Proxy nodes
  • Operational integration

Parking Lots

  • Pluggable Monitoring
    • Allowing changing over to Zenoss
  • Security
    • Access management
    • Authentication
    • Missing Security Audit
  • Stable API for Crowbar
    • Hardware API that looks like a cloud
    • Use of the Nova API
    • What should the API do? What are some of the use cases?
    • Event Queue based notification
  • Process
    • Dev Tool issues that prevent working in the SUSE org. This is a very important issue for SUSE and will certainly be an inhibitor for other orgs joining the project too.
    • built in-testing/CI
    • github & issue tracker
      • with mailing lists alone, issues cannot be tracked properly, e.g. which issues are resolved, which are still outstanding, who is working on the issues, etc.
    • Documentation not being in sources
      • Use Texinfo or DocBook?
    • FHS compliance - log files are not in a consistent place.
      • Logs are considered variable data and should not be under /opt
      • /updates is not a permitted top-level directory
      • files like bc-network.json should be under /etc/opt not /opt
    • Coding standards, not documented / not complied with
      • Coding standards doc should require all commits contain corresponding doc changes
  • Operational Tooling
    • no good way to tie Nagios into a warning system
    • change the log monitoring to move to external or roll into barclamp
    • really up the operational caliber of the running system
  • Event Queue