Units in Technology Section - Spring 2016
=========================================

The Projects on Big Data Software course organizes its lessons into two sections: Theory and Technology. The Technology units are listed on this page. For the Theory units, the syllabus, and discussions, please use the course site scholargrid.org.

Schedule for Units in Technology Section

.. list-table:: Schedule Section on Technologies
   :header-rows: 1

   * - Topic
     - Due
   * - Gaining Access to FutureSystems and Core Technologies
     - 01/25

.. comment::

   * - **The Basics of OpenStack**
     -
   * - **Cloudmesh - Cloud Management Software**
     -
   * - **IT Operations - Automation and Orchestration**
     -
   * - **Virtual Clusters I (First Appearance of Hadoop)**
     -
   * - **Virtual Clusters II (Composite Cluster with Sub-Clusters)**
     -
   * - **Other Technologies**
     -

:ref:`System Notice <ref-class-notice>`

In this unit, you will learn how to gain access to FutureSystems resources. It covers portal account creation, class project participation, SSH key generation, and login node access. Additional lessons for beginners cover the basics of the Linux operating system and collaboration tools such as GitHub, Google Hangout, and Remote Desktop. Please watch the video lessons and read through the web content.

.. list-table:: Collaboration Tools
   :widths: 30 10 10
   :header-rows: 1

   * - Topic
     - Video
     - Text
   * - Overview and Introduction
     - 16 mins
     - 10 mins
   * - Google (Google+, Hangout, Remote Desktop)
     - 4 mins
     - 15 mins
   * - GitHub
     - 18 mins
     - 30 mins

.. list-table:: System Access to FutureSystems
   :widths: 30 10 10
   :header-rows: 1

   * - Topic
     - Video
     - Text
   * - ssh-keygen
     - 4 mins
     - 10 mins
   * - Account Creation
     - 12 mins
     - 10 mins
   * - Remote Login
     - 6 mins
     - 10 mins
   * - PuTTY for Windows
     - 11 mins
     - 10 mins

.. list-table:: Linux Basics
   :widths: 30 10 10
   :header-rows: 1

   * - Topic
     - Video
     - Text
   * - Overview and Introduction
     - 4 mins
     - 5 mins
   * - Shell Scripting
     - 15 mins
     - 30 mins
   * - Editors (Emacs, vi, and nano)
     - 5 mins
     - 30 mins
   * - Python (virtualenv, PyPI)
     - 27 mins
     - 1 hour
   * - Package Managers (yum, apt-get, and brew)
     - 3 mins
     - 10 mins
   * - Advanced SSH (SSH Config and Tunnel)
     - 3 mins
     - 20 mins
   * - Modules
     - 3 mins
     - 10 mins

.. note::

   Find an editor that you will use for your programming. For advanced Python programming we recommend PyCharm, but you can use others, e.g. Enthought Canopy, on your local computer. A typical workflow is to edit Python code locally, push it to GitHub, and check it out on your VM or on the login node india.futuresystems.org. This is how many of us work.

* Total of video lessons: 2 hours
* Total of study materials: 4 hours and 30 minutes

.. list-table:: Get Ready for FutureSystems and Warm-Up
   :header-rows: 1

   * - Topic
     - Description
   * - Start with Account, GitHub and Python
     - 9 tasks

.. comment::

        Unit 2
        -------------------------------------------------------------------------------

        Introduction to OpenStack and Public Clouds
        *******************************************************************************

        OpenStack is an open-source cloud computing software platform and a
        community-driven project. You can use OpenStack to build a cloud
        infrastructure in your public or private network, or you can simply use
        cloud software for your services. The lessons this week are designed to let
        you try OpenStack and to give you confidence in, and an understanding of,
        IaaS cloud platforms. Tutorial lessons explore the OpenStack web dashboard
        (Horizon) and compute engine (Nova), as well as public clouds such as
        Amazon EC2 and Microsoft Azure.

        .. list-table:: Basics of OpenStack
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **Introduction and Overview**
             - `12 mins <https://mix.office.com/watch/u7uovy9i06jo>`_
             - `10 mins <lesson/iaas/overview_openstack.html>`_
             -
             - 03/30
             -
           * - **OpenStack for Beginners**
             - `27 mins <https://mix.office.com/watch/1r7zifdtjoa6j>`_
             -
             -
             - 03/30
             -
           * - -- Compute Engine (Nova)
             -
             - `1 hour <lesson/iaas/openstack.html>`_
             - `30 mins <lesson/iaas/openstack.html#exercises>`_
             - 03/30
             - 04/10
           * - -- Web Dashboard (Horizon)
             -
             - `15 mins <lesson/iaas/openstack_horizon.html>`_
             - `15 mins <lesson/iaas/openstack_horizon.html#exercises>`_
             - 03/30
             - 04/10
           * - **Storage (Swift)**
             - `3 mins <https://mix.office.com/watch/w3rko4itecgc>`_
             - `10 mins <lesson/iaas/openstack.html#swift-storage>`_
             -
             - 03/30
             -
           * - **Network (Neutron)**
             - `3 mins <https://mix.office.com/watch/1dt5hp0e2grov>`_
             - `10 mins <lesson/iaas/openstack.html#neutron-network>`_
             -
             - 03/30
             -
           * - **Introduction to OpenStack Juno Release**
             - `2 mins <https://mix.office.com/watch/cz6xehrs9xor>`_
             - `10 mins <lesson/iaas/openstack_juno.html>`_
             -
             - 03/30
             -

        .. list-table:: Other IaaS Platforms - Public Commercial Clouds
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **Amazon Web Services (AWS)**
             - `16 mins <https://mix.office.com/watch/1351hz8j187i7>`_
             - `30 mins <lesson/iaas/aws_tutorial.html>`_
             - `45 mins <lesson/iaas/aws_tutorial.html#exercises>`_
               (optional, not required)
             - 03/30
             -
           * - **Microsoft Azure**
             - `29 mins <https://mix.office.com/watch/kzh0nwvdw6tm>`_
             - `50 mins <lesson/iaas/azure_tutorial.html>`_
             - `10 mins <lesson/iaas/azure_tutorial.html#exercise1>`_
               (optional, not required)
             - 03/30
             -

        .. list-table:: Additional (optional) Further Study Materials
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **OpenStack for Beginners**
                 - Compute Engine (Nova)
             -
             - `2 hours <../../iaas/index.html>`_
             - `50 mins <../../iaas/openstack.html#exercises>`_
             - Not due
             - Not due
           * - **Other IaaS Platforms**
                - Public Commercial Clouds
                     - Microsoft Azure
             -
             -
             - `50 mins <lesson/iaas/azure_tutorial.html#exercise2>`_
             - Not due
             - Not due

        Length of the lessons in Unit 2
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        * Total of video lessons: 1 hour and 30 minutes
        * Total of study materials: 3 hours and 15 minutes
        * Total of lab sessions: 1 hour and 40 minutes


        Unit 3
        -------------------------------------------------------------------------------


        Cloudmesh - Cloud Management Software
        *******************************************************************************

        Cloudmesh is cloud resource management software written in Python. It
        automates launching multiple VM instances across different cloud platforms
        including Amazon EC2, Microsoft Azure Virtual Machine, HP Cloud, OpenStack, and
        Eucalyptus. The Cloudmesh web interface helps users and administrators
        manage their cloud resources with technologies such as Apache Libcloud,
        Celery, IPython, Flask, Fabric, Docopt, YAML, MongoDB, and Sphinx.
        Command-line tools and REST APIs are also supported.

        .. list-table:: Basics of Cloudmesh
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **Introduction and Overview**
             - `29 mins <http://www.youtube.com/watch?v=njHHjRMb7V8>`_
             - `30 mins <../../cloudmesh/overview.html>`_
             -
             - 04/06
             - Not due

        .. list-table:: Cloudmesh for Beginners
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **Installation on a local machine**
             - `18 mins <http://www.youtube.com/watch?v=lGiJifD0VgU>`_
             - `30 mins <../../cloudmesh/setup/quickstart.html>`_
             - (not required, only read the text and watch the video)
             - 04/06
             - N/A
           * - **Installation on an OpenStack virtual machine**
             - `33 mins <http://www.youtube.com/watch?v=rcecpgm-47g>`_
             - `30 mins <../../cloudmesh/setup/setup_openstack.html>`_
             - follow the text and video
             - 04/06
             - 04/17
           * - **Command Line Tools (CLI)**
             - `12 mins <http://www.youtube.com/watch?v=hdq-t-ggkXA>`_
             - `30 mins <../../cloudmesh/shell/index.html>`_
             - use the previously created VM and follow text and video
               use `cm help` and review man pages
             - 04/06
             - 04/17
           * - **Web Interface (GUI)**
             - `16 mins <http://www.youtube.com/watch?v=l_P4G85rysA>`_
             - `30 mins <../../cloudmesh/gui/index.html>`_
             - `Exercise 4: 20 mins <../../cloudmesh/api/exercises.html#exercise-4>`_ (optional)
             - 04/06
             - 04/17
           * - **Python APIs**
             - `15 mins <http://www.youtube.com/watch?v=xOL_-Sfh9MA>`_
             - `30 mins <../../cloudmesh/api/index.html>`_
             - `Exercise 1 (10 mins) <../../cloudmesh/api/exercises.html#exercise-1>`_, `Exercise 2 (10 mins) <../../cloudmesh/api/exercises.html#exercise-2>`_
             - 04/06
             - 04/17
           * - **IPython on Cloudmesh** (optional)
             - `15 mins <http://www.youtube.com/watch?v=1dn_av-zC00>`_
             - `20 mins <../../cloudmesh/ipython.html>`_
             -  (not required, only read text and watch video)
             - 04/06
             - N/A





        .. list-table:: Advanced Cloudmesh
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **Adding new Commands via a Python Package**
             - `5 mins <https://www.youtube.com/watch?v=UFLyCVpDhgI&feature=em-upload_owner>`_
             - `5 mins <http://cloudmesh.github.io/cmd3/manual.html#generating-independent-packages>`_
             - `1 hour <../../cloudmesh/cm/cmd3.html#exercise-1>`_
             - 04/06
             - 04/17
           * - **Virtual Clusters with Cloudmesh**
                - SSH Connections between nodes, Host Configuration
             - `5 mins <https://mix.office.com/watch/lk39mr08k0ox>`_
             - `20 mins <../../cloudmesh/cm/_cm-cluster.html>`_
             - see text and video
             - 04/06
             - 04/17

        ..   * - **Introduction and Overview**
             - Not yet available
             - Not yet available
             -
             - 04/06
             - 04/10
           * - **VM Management**
             - Not yet available
             - Not yet available
             - see text and video
             - 04/06
             - 04/10

        Length of the lessons in Unit 3
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        * Total of video lessons: 2 hours and 33 minutes
        * Total of study materials: 4 hours and 15 minutes
        * Total of lab sessions: 1 hour and 30 minutes

        Unit 4
        -------------------------------------------------------------------------------

        This week, you will learn about open-source configuration management (CM)
        software as part of IT automation and orchestration. We focus on Ansible and
        OpenStack Heat to review system configuration and management, and also
        introduce Salt, Puppet, Chef, and Juju so that you can explore other tools.
        By comparing the features of these tools, you will see which one best suits
        your system environment and learn basic CM techniques. A few lab sessions
        provide hands-on experience with deploying and configuring applications on
        IT infrastructure.

        IT Operations - Automation and Orchestration
        *******************************************************************************

        .. list-table:: DevOps Tools
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - Ansible
             - `17 mins <https://www.youtube.com/watch?v=JTv1QWjTWS8&index=1&list=PLLO4AVszo1SOkNPAv4E824AFScdduO9NF>`_
             - :ref:`1.5 hours <ref-class-lesson-devops-ansible>`
             - :ref:`30 mins <ref-class-lesson-devops-ansible-lab>`
             - 04/21
             - 04/24
           * - SaltStack
             -
             - :ref:`1.5 hours <ref-class-lesson-devops-saltstack>`
             - :ref:`10 mins <ref-class-lesson-devops-saltstack-exercises>` (optional)
             -
             -
           * - Puppet
             -
             - :ref:`1 hour <ref-class-lesson-devops-puppet>`
             - :ref:`20 mins <ref-class-lesson-devops-puppet-exercises>` (optional)
             -
             -
           * - Chef
             - `35 mins <https://mix.office.com/watch/1g90jbv8llv0j>`_
             - :ref:`1 hour <ref-class-lesson-devops-chef>`
             - :ref:`30 mins <ref-class-lesson-devops-chef-exercises>` (optional)
             - 04/21
             -
           * - OpenStack Heat
             - `20 mins <https://mix.office.com/watch/1ry7jrkuvkfwh>`_
             - :ref:`1 hour <ref-class-lesson-devops-openstack-heat>`
             - :ref:`1 hour <ref-class-lesson-devops-openstack-heat-exercises>`
             - 04/21
             - 04/24
           * - Ubuntu Juju
             -
             - :ref:`30 mins <ref-class-lesson-devops-juju>`
             - :ref:`10 mins <ref-class-lesson-devops-juju-exercises>` (optional)
             -
             -

        .. .. list-table:: Discussion
           :widths: 30 10 10 10 10 10
           :header-rows: 1

        ..   * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - Orchestration vs Collective DevOps
             -
             -
             -
             -
             -
           * - PaaS
             -
             -
             -
             -
             -
           * - Cloudmesh
             -
             -
             -
             -
             -

        Length of the lessons in Unit 4
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        * Total of video lessons: 1 hour and 12 minutes
        * Total of study materials: 2.5 hours
        * Total of lab sessions: 1 hour and 30 minutes

        Additional (optional) Lessons
        """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

        * Total of optional study materials: 4 hours
        * Total of optional lab sessions: 1 hour and 10 minutes

        Unit 5
        -------------------------------------------------------------------------------

        This week, you will learn the basics of virtual clusters. Analyzing large
        data sets that contain unstructured data typically requires distributed
        computing resources that offer high performance, scalability, and
        availability. With virtualization technology, cluster computing can be more
        flexible, effective, and cost-efficient in terms of resource utilization.
        Three basic tutorials cover deploying a virtual cluster, a Hadoop cluster, and
        a MongoDB sharded cluster. They give you experience in setting up virtual
        clusters manually and in configuring software with Cloudmesh. Advanced topics
        of virtual clusters are discussed in Unit 6.

        Virtual Clusters I
        *******************************************************************************

        **First Appearance of Hadoop**

        .. list-table:: Virtual Clusters I
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **Introduction and Overview**
             - `4 mins <https://mix.office.com/watch/eap9zdqfifgp>`_
             -
             - see video
             - 04/29
             -
           * - **Dynamic Deployment of Arbitrary X Software on Virtual Cluster**
             - `4 mins <https://mix.office.com/watch/zukoz9wswe7z>`_
             -
             - see video
             - 04/29
             -
           * - **Deploying Virtual Cluster with Cloudmesh**
             - `22 mins <https://www.youtube.com/watch?v=oSlq0287m1Q>`_
             - :ref:`30 mins <ref-class-lesson-deploying-virtual-cluster-with-cloudmesh>`
             - :ref:`10 mins <ref-class-lesson-deploying-virtual-cluster-with-cloudmesh-exercise>` (optional)
             - 04/29
             -
           * - **Deploying Hadoop Cluster**
             -
             - :ref:`45 mins <ref-class-lesson-deploying-hadoop-cluster-manual>`
             - :ref:`20 mins <ref-class-lesson-deploying-hadoop-cluster-manual-exercise>` (optional)
             - 04/29
             -
           * - **Deploying Hadoop Cluster with Cloudmesh**
             -
             - :ref:`30 mins <ref-class-lesson-deploying-hadoop-cluster-with-cloudmesh>`
             - see text
             - 04/29
             -
           * - **Hadoop Example: Word Count**
             - `33 mins <https://mix.office.com/watch/1on4q8t1vcjfh>`_
             - :ref:`1 hour <ref-class-lesson-hadoop-word-count>`
             - see video and text
             - 04/29
             -
           * - **Deploying MongoDB Sharded Cluster**
             - `4 mins <https://mix.office.com/watch/1rx90yz48fqpn>`_
             - :ref:`1 hour <ref-class-lesson-mongodb-sharded-cluster>`
             - see video and text
             - 04/29
             -
           * - **``cluster`` Cloudmesh Command for Virtual Clusters**
                - SSH Connections between nodes, Host Configuration
             - `5 mins <https://mix.office.com/watch/lk39mr08k0ox>`_
             - `20 mins <../../cloudmesh/cm/_cm-cluster.html>`_ (repeated practice)
             - `20 mins <../../cloudmesh/cm/_cm-cluster.html#exercise>`_
             - 04/29
             - 05/01

        ..
           * - **Hadoop Virtual Cluster**
                - Cloudmesh
                - Discussion
                - Advanced Topics with Hadoop
                     - ZooKeeper and HBase
                     - YARN
                     - OpenStack Sahara
             - Not yet available
             - Not yet available
             -
             - 04/20
             - 04/24

        Length of the lessons in Unit 5
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        * Total of video lessons: 1 hour and 12 minutes
        * Total of study materials: 4 hours and 5 minutes
        * Total of lab sessions: 50 minutes

        Unit 6
        -------------------------------------------------------------------------------


        Virtual Cluster II: Composite Cluster with Sub-Clusters
        *******************************************************************************

        .. list-table:: Virtual Cluster II
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **Composite Cluster with Sub-Clusters** (Not taught in this class)
                - Introduction and Overview
                - Creating a Cross Resource Virtual Cluster
             - Not taught in this class
             - Not taught in this class
             -
             -
             -
           * - **Apache Hadoop YARN**
             - `34 mins <https://mix.office.com/watch/1eopy3tfq6kim>`_
             - :ref:`1 hour <ref-class-lesson-hadoop-yarn>`
             -
             - 05/14
             -
           * - **Apache ZooKeeper**
             - `40 mins <https://mix.office.com/watch/1ptxm2uj2s7y3>`_
             - :ref:`1 hour <ref-class-lesson-zookeeper>`
             -
             - 05/14
             -
           * - **Open MPI Virtual Cluster**
                - Introduction and Overview
                - HPC Stack - MPI
                - Cloudmesh HPC (Not taught in this class)
             -
             - :ref:`1 hour <ref-class-lesson-openmpi-with-cloudmesh>`
             -
             - 05/14
             -
           * - **HPC Queuing System** (optional)
             - `8 mins <https://www.youtube.com/watch?v=6oUsMyDt7gU>`_ (optional)
             - :ref:`1 hour <s-hpc>` (optional)
             -
             - 05/14
             -
           * - **MongoDB Virtual Cluster** (repeated lesson)
                - Introduction and Overview
                - Sharded MongoDB
             - `4 mins <https://mix.office.com/watch/1rx90yz48fqpn>`_
             - :ref:`1 hour <ref-class-lesson-mongodb-sharded-cluster>`
             -
             - 05/14
             -

        Length of the lessons in Unit 6
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        * Total of video lessons: 1 hour and 26 minutes
        * Total of study materials: 5 hours

        Unit 7
        -------------------------------------------------------------------------------


        Other Technologies (under preparation)
        *******************************************************************************

        .. list-table:: Other Technologies
           :widths: 30 10 10 10 10 10
           :header-rows: 1

           * - Topic
             - Video
             - Text
             - Assignment
             - Study Material By
             - HW Due
           * - **Docker Basics**
             -
             - :ref:`1 hour <ref-class-lesson-docker>`
             -
             - 05/21
             -
           * - **VM Software - Vagrant**
             - Not yet available
             - :ref:`30 mins <ref-virtualization-tools>`
             -
             - 05/13
             - 05/15
           * - **Hadoop MRv2**
             -
             - :ref:`1 hour <ref-class-lesson-hadoop2>`
             -
             -
             -
           * - **Hadoop MRv2 with Cloudmesh ``launcher``**
             -
             - :ref:`30 mins <ref-class-lesson-hadoop2-launcher>`
             -
             -
             -
           * - **Apache ZooKeeper** (repeated lesson)
             - `40 mins <https://mix.office.com/watch/1ptxm2uj2s7y3>`_
             - :ref:`1 hour <ref-class-lesson-zookeeper>`
             -
             - 05/21
             -
           * - **Apache Big Data Stack (ABDS)**
                 - Apache ZooKeeper
                 - Apache Storm
                 - Apache Mesos
                 - Apache HBase
                 - Apache Spark
                 - Apache Pig
                 - Apache Hive
             - Not yet available
             - Not yet available
             -
             - 05/13
             - 05/15
           * - **Glossary**
             - Not yet available
             - Not yet available
             -
             - 05/13
             - 05/15

        .. comment::

             * - **Virtualization Technologies**
                 - Introduction and Overview
                 - Hypervisors
                     - KVM
                     - Containers (LXC)
                     - Docker
             - Not yet available
             - Not yet available
             -
             - 05/13
             - 05/15

               - Oracle VirtualBox
               - VMware

        .. comment::

                Unit 8
                -------------------------------------------------------------------------------


                Future (under preparation)
                *******************************************************************************

                .. list-table:: Future
                   :widths: 30 10 10 10 10 10
                   :header-rows: 1

                   * - Topic
                     - Video
                     - Text
                     - Assignment
                     - Study Material By
                     - HW Due
                   * - **What will the Future Bring**
                     - Not yet available
                     - Not yet available
                     -
                     - Not due
                     - Not due
                   * - **GE Industrial Internet of Things (IIoT)**
                     - Not yet available
                     - Not yet available
                     -
                     - Not due
                     - Not due




        .. comment::

           * - **Using India OpenStack on Cloudmesh**
             - `5 mins <https://mix.office.com/watch/irhlsfq220zh>`_
             - `30 mins <../../cloudmesh/setup/cloudmesh_yaml.html>`_
             - `10 mins <../../cloudmesh/api/exercises.html#exercise-3>`_
             - 04/06
             - 04/10