The Projects on Big Data Software course presents its lessons in two sections: Theory and Technology. The Technology units are listed on this page. To navigate the Theory units, the syllabus, and the discussions, please use the course site scholargrid.org.
Schedule for Units in Technology Section
Topic | Due |
---|---|
Gaining Access to FutureSystems and Core Technologies | 01/25 |
.. comment:: * - **The Basics of OpenStack** - * - **Cloudmesh - Cloud Management Software** - * - **IT Operations - Automation and Orchestration** - * - **Virtual Clusters I (First Appearance of Hadoop)** - * - **Virtual Clusters II (Composite Cluster with Sub-Clusters)** - * - **Other Technologies** -
:ref:`System Notice <ref-class-notice>`
In this unit, you will learn how to gain access to FutureSystems resources. It covers portal account creation, class project participation, SSH key generation, and login node access. Additional lessons help beginners understand the basics of the Linux operating system and the collaboration tools, namely GitHub, Google Hangout, and Remote Desktop. Please watch the video lessons and read through the web content.
Topic | Video | Text |
---|---|---|
Overview and Introduction | 16 mins | 10 mins |
| | 4 mins | 15 mins |
GitHub | 18 mins | 30 mins |
Topic | Video | Text |
---|---|---|
ssh-keygen | 4 mins | 10 mins |
Account Creation | 12 mins | 10 mins |
Remote Login | 6 mins | 10 mins |
Putty for Windows | 11 mins | 10 mins |
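The ssh-keygen and Remote Login lessons in the table above can be sketched roughly as follows. This is a minimal sketch, not the course's official procedure: the key type and size, the key path, the e-mail comment, and the username `albert` are assumptions chosen for illustration, and the host name is the login node mentioned in the note further down this page. The helpers only build the command lines and do not run them.

```python
import shlex
from pathlib import Path


def keygen_command(key_path, comment):
    """Return the ssh-keygen argument list for a new RSA key pair.

    The key type (RSA) and size (4096 bits) are assumptions, not course policy.
    """
    return ["ssh-keygen", "-t", "rsa", "-b", "4096",
            "-f", str(key_path), "-C", comment]


def login_command(username, host="india.futuresystems.org", key_path=None):
    """Return the ssh argument list for logging in to a login node."""
    command = ["ssh"]
    if key_path is not None:
        # -i selects the private key file to authenticate with
        command += ["-i", str(key_path)]
    command.append(f"{username}@{host}")
    return command


if __name__ == "__main__":
    key = Path.home() / ".ssh" / "id_rsa"
    # "albert" is a placeholder for your own portal username.
    print(shlex.join(keygen_command(key, "albert@example.org")))
    print(shlex.join(login_command("albert", key_path=key)))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; paste the printed lines into a terminal once the values match your own account.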
Topic | Video | Text |
---|---|---|
Overview and Introduction | 4 mins | 5 mins |
Shell Scripting | 15 mins | 30 mins |
| | 5 mins | 30 mins |
| | 27 mins | 1 hour |
| | 3 mins | 10 mins |
| | 3 mins | 20 mins |
Modules | 3 mins | 10 mins |
Note
Choose an editor for your programming work. For advanced Python programming we recommend PyCharm, but you can use others, e.g. Enthought Canopy, on your local computer. A typical workflow is to edit Python code locally, push it to GitHub, and check it out on your VM or on the login node india.futuresystems.org. This is how many of us work.
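The workflow in the note above (edit locally, push to GitHub, check out on the VM or login node) might look like the following. This is a sketch under stated assumptions: the repository URL, the file name, the commit message, and the branch name `master` are hypothetical placeholders. As in any such sketch, the helpers only assemble standard `git` command lines and leave running them to you.

```python
import shlex


def local_push_commands(files, message, branch="master"):
    """Commands for the local computer: stage, commit, and push edited files."""
    return [
        ["git", "add", *files],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],
    ]


def remote_update_commands(repo_url, directory, already_cloned=False):
    """Commands for the VM or login node: clone once, pull afterwards."""
    if already_cloned:
        # -C runs git as if it had been started inside `directory`
        return [["git", "-C", directory, "pull"]]
    return [["git", "clone", repo_url, directory]]


if __name__ == "__main__":
    # Hypothetical repository and file names, used only for illustration.
    repo = "https://github.com/example-user/example-project.git"
    for command in local_push_commands(["analysis.py"], "Add analysis script"):
        print("local :", shlex.join(command))
    for command in remote_update_commands(repo, "example-project"):
        print("remote:", shlex.join(command))
```

Keeping the local and remote steps in separate functions mirrors the two machines involved: the first list runs where you edit, the second where the code executes.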
- Total of video lessons: 2 hours
- Total of study materials: 4 hours and 30 minutes
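The totals above can be cross-checked by summing the per-lesson durations from the three tables in this unit. A minimal sketch, assuming every row of those tables counts toward the totals and that "1 hour" means 60 minutes; the sums come out close to the listed figures, which appear to be rounded.

```python
# Durations in minutes, copied row by row from the Video and Text columns
# of the three lesson tables in this unit.
VIDEO_MINUTES = [16, 4, 18, 4, 12, 6, 11, 4, 15, 5, 27, 3, 3, 3]
TEXT_MINUTES = [10, 15, 30, 10, 10, 10, 10, 5, 30, 30, 60, 10, 20, 10]


def format_minutes(total):
    """Render a minute count as 'H hours M minutes'."""
    hours, minutes = divmod(total, 60)
    return f"{hours} hours {minutes} minutes"


if __name__ == "__main__":
    print("Video lessons:  ", format_minutes(sum(VIDEO_MINUTES)))
    print("Study materials:", format_minutes(sum(TEXT_MINUTES)))
```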
Topic | Description |
---|---|
Start with Account, GitHub and Python | 9 tasks |
.. comment:: Unit 2 ------------------------------------------------------------------------------- Introduction to OpenStack and Public Clouds ******************************************************************************* OpenStack is an open-source cloud computing software platform and a community-driven project. You can use OpenStack to build a cloud infrastructure in your public or private network, or you can simply use cloud software for your services. This week's lessons are designed to let you try OpenStack software and to give you confidence in and an understanding of using IaaS cloud platforms. There are tutorial lessons to explore the OpenStack web dashboard (Horizon) and compute engine (Nova), as well as public clouds, e.g. Amazon EC2 or Microsoft Azure. .. list-table:: Basics of OpenStack :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **Introduction and Overview** - `12 mins <https://mix.office.com/watch/u7uovy9i06jo>`_ - `10 mins <lesson/iaas/overview_openstack.html>`_ - - 03/30 - * - **OpenStack for Beginners** - `27 mins <https://mix.office.com/watch/1r7zifdtjoa6j>`_ - - - 03/30 - * - -- Compute Engine (Nova) - - `1 hour <lesson/iaas/openstack.html>`_ - `30 mins <lesson/iaas/openstack.html#exercises>`_ - 03/30 - 04/10 * - -- Web Dashboard (Horizon) - - `15 mins <lesson/iaas/openstack_horizon.html>`_ - `15 mins <lesson/iaas/openstack_horizon.html#exercises>`_ - 03/30 - 04/10 * - **Storage (Swift)** - `3 mins <https://mix.office.com/watch/w3rko4itecgc>`_ - `10 mins <lesson/iaas/openstack.html#swift-storage>`_ - - 03/30 - * - **Network (Neutron)** - `3 mins <https://mix.office.com/watch/1dt5hp0e2grov>`_ - `10 mins <lesson/iaas/openstack.html#neutron-network>`_ - - 03/30 - * - **Introduction to OpenStack Juno Release** - `2 mins <https://mix.office.com/watch/cz6xehrs9xor>`_ - `10 mins <lesson/iaas/openstack_juno.html>`_ - - 03/30 - ..
list-table:: Other IaaS Platforms - Public Commercial Clouds :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **Amazon Web Services (AWS)** - `16 mins <https://mix.office.com/watch/1351hz8j187i7>`_ - `30 mins <lesson/iaas/aws_tutorial.html>`_ - `45 mins <lesson/iaas/aws_tutorial.html#exercises>`_ (optional, not required) - 03/30 - * - **Microsoft Azure** - `29 mins <https://mix.office.com/watch/kzh0nwvdw6tm>`_ - `50 mins <lesson/iaas/azure_tutorial.html>`_ - `10 mins <lesson/iaas/azure_tutorial.html#exercise1>`_ (optional, not required) - 03/30 - .. list-table:: Additional (optional) Further Study Materials :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **OpenStack for Beginners** - Compute Engine (Nova) - - `2 hours <../../iaas/index.html>`_ - `50 mins <../../iaas/openstack.html#exercises>`_ - Not due - Not due * - **Other IaaS Platforms** - Public Commercial Clouds - Microsoft Azure - - - `50 mins <lesson/iaas/azure_tutorial.html#exercise2>`_ - Not due - Not due Length of the lessons in Unit 2 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * Total of video lessons: 1 hour and 30 minutes * Total of study materials: 3 hours and 15 minutes * Total of lab sessions: 1 hour and 40 minutes Unit 3 ------------------------------------------------------------------------------- Cloudmesh - Cloud Management Software ******************************************************************************* Cloudmesh is cloud resource management software written in Python. It automates launching multiple VM instances across different cloud platforms including Amazon EC2, Microsoft Azure Virtual Machine, HP Cloud, OpenStack, and Eucalyptus.
The web interface of Cloudmesh helps users and administrators manage entire cloud resources with the most cutting-edge technologies such as Apache LibCloud, Celery, IPython, Flask, Fabric, Docopt, YAML, MongoDB, and Sphinx. Command Line Tools and REST APIs are also supported. .. list-table:: Basics of Cloudmesh :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **Introduction and Overview** - `29 mins <http://www.youtube.com/watch?v=njHHjRMb7V8>`_ - `30 mins <../../cloudmesh/overview.html>`_ - - 04/06 - Not due .. list-table:: Cloudmesh for Beginners :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **Installation on a local machine** - `18 mins <http://www.youtube.com/watch?v=lGiJifD0VgU>`_ - `30 mins <../../cloudmesh/setup/quickstart.html>`_ - (not required, only read the text and watch the video) - 04/06 - N/A * - **Installation on a virtual machine OpenStack** - `33 mins <http://www.youtube.com/watch?v=rcecpgm-47g>`_ - `30 mins <../../cloudmesh/setup/setup_openstack.html>`_ - follow the text and video - 04/06 - 04/17 * - **Command Line Tools (CLI)** - `12 mins <http://www.youtube.com/watch?v=hdq-t-ggkXA>`_ - `30 mins <../../cloudmesh/shell/index.html>`_ - use the previously created VM and follow text and video use `cm help` and review man pages - 04/06 - 04/17 * - **Web Interface (GUI)** - `16 mins <http://www.youtube.com/watch?v=l_P4G85rysA>`_ - `30 mins <../../cloudmesh/gui/index.html>`_ - `Exercise 4: 20 mins <../../cloudmesh/api/exercises.html#exercise-4>`_ (optional) - 04/06 - 04/17 * - **Python APIs** - `15 mins <http://www.youtube.com/watch?v=xOL_-Sfh9MA>`_ - `30 mins <../../cloudmesh/api/index.html>`_ - `Exercise 1 (10 mins) <../../cloudmesh/api/exercises.html#exercise-1>`_, `Exercise 2 (10 mins) <../../cloudmesh/api/exercises.html#exercise-2>`_ - 04/06 - 04/17 * - **IPython on Cloudmesh** (optional) - `15 mins
<http://www.youtube.com/watch?v=1dn_av-zC00>`_ - `20 mins <../../cloudmesh/ipython.html>`_ - (not required, only read text and watch video) - 04/06 - N/A .. list-table:: Advanced Cloudmesh :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **Adding new Commands via a Python Package** - `5 mins <https://www.youtube.com/watch?v=UFLyCVpDhgI&feature=em-upload_owner>`_ - `5 mins <http://cloudmesh.github.io/cmd3/manual.html#generating-independent-packages>`_ - `1 hour <../../cloudmesh/cm/cmd3.html#exercise-1>`_ - 04/06 - 04/17 * - **Virtual Clusters with Cloudmesh** - SSH Connections between nodes, Host Configuration - `5 mins <https://mix.office.com/watch/lk39mr08k0ox>`_ - `20 mins <../../cloudmesh/cm/_cm-cluster.html>`_ - see text and video - 04/06 - 04/17 .. * - **Introduction and Overview** - Not yet available - Not yet available - - 04/06 - 04/10 * - **VM Management** - Not yet available - Not yet available - see text and video - 04/06 - 04/10 Length of the lessons in Unit 3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * Total of video lessons: 2 hours and 33 minutes * Total of study materials: 4 hours and 15 minutes * Total of lab sessions: 1 hour and 30 minutes Unit 4 ------------------------------------------------------------------------------- In this week, you will learn open-source configuration management (CM) software as part of IT automation and orchestration. We focus on Ansible and OpenStack Heat to review the system configuration and management but Salt, Puppet, Chef, and Juju are introduced to explore other tools as well. With different features of these software, you will see which tool is ideal for your system environment and understand basic CM techniques. We have a few lab sessions to provide hands-on experience about deploying and configuring applications on IT infrastructure. 
IT Operations - Automation and Orchestration ******************************************************************************* .. list-table:: DevOps Tools :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - Ansible - `17 mins <https://www.youtube.com/watch?v=JTv1QWjTWS8&index=1&list=PLLO4AVszo1SOkNPAv4E824AFScdduO9NF>`_ - :ref:`1.5 hours <ref-class-lesson-devops-ansible>` - :ref:`30 mins <ref-class-lesson-devops-ansible-lab>` - 04/21 - 04/24 * - SaltStack - - :ref:`1.5 hours <ref-class-lesson-devops-saltstack>` - :ref:`10 mins <ref-class-lesson-devops-saltstack-exercises>` (optional) - - * - Puppet - - :ref:`1 hour <ref-class-lesson-devops-puppet>` - :ref:`20 mins <ref-class-lesson-devops-puppet-exercises>` (optional) - - * - Chef - `35 mins <https://mix.office.com/watch/1g90jbv8llv0j>`_ - :ref:`1 hour <ref-class-lesson-devops-chef>` - :ref:`30 mins <ref-class-lesson-devops-chef-exercises>` (optional) - 04/21 - * - OpenStack Heat - `20 mins <https://mix.office.com/watch/1ry7jrkuvkfwh>`_ - :ref:`1 hour <ref-class-lesson-devops-openstack-heat>` - :ref:`1 hour <ref-class-lesson-devops-openstack-heat-exercises>` - 04/21 - 04/24 * - Ubuntu Juju - - :ref:`30 mins <ref-class-lesson-devops-juju>` - :ref:`10 mins <ref-class-lesson-devops-juju-exercises>` (optional) - - .. .. list-table:: Discussion :widths: 30 10 10 10 10 10 :header-rows: 1 .. 
* - Topic - Video - Text - Assignment - Study Material By - HW Due * - Orchestration vs Collective DevOps - - - - - * - PaaS - - - - - * - Cloudmesh - - - - - Length of the lessons in Unit 4 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * Total of video lessons: 1 hour and 12 minutes * Total of study materials: 2.5 hours * Total of lab sessions: 1 hour and 30 minutes Additional (optional) Lessons """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" * Total of optional study materials: 4 hours * Total of optional lab sessions: 1 hour and 10 minutes Unit 5 ------------------------------------------------------------------------------- This week, you will learn basics of virtual clusters. Typically, analyzing large data sets containing unstructured data types requires distributed computing resources for data processing with high performance, scalability, and availability. With virtualization technology, cluster computing can be more flexible, effective and cost-efficient in terms of resource utilization. There are three basic tutorials about deploying a virtual cluster, Hadoop cluster and MongoDB Sharded cluster which give you a chance to gain some experience of how to setup virtual clusters manually and configure software with Cloudmesh. In Unit 6, advanced topics of virtual clusters will be discussed. Virtual Clusters I ******************************************************************************* **First Appearance of Hadoop** .. 
list-table:: Virtual Clusters I :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **Introduction and Overview** - `4 mins <https://mix.office.com/watch/eap9zdqfifgp>`_ - - see video - 04/29 - * - **Dynamic Deployment of Arbitrary X Software on Virtual Cluster** - `4 mins <https://mix.office.com/watch/zukoz9wswe7z>`_ - - see video - 04/29 - * - **Deploying Virtual Cluster with Cloudmesh** - `22 mins <https://www.youtube.com/watch?v=oSlq0287m1Q>`_ - :ref:`30 mins <ref-class-lesson-deploying-virtual-cluster-with-cloudmesh>` - :ref:`10 mins <ref-class-lesson-deploying-virtual-cluster-with-cloudmesh-exercise>` (optional) - 04/29 - * - **Deploying Hadoop Cluster** - - :ref:`45 mins <ref-class-lesson-deploying-hadoop-cluster-manual>` - :ref:`20 mins <ref-class-lesson-deploying-hadoop-cluster-manual-exercise>` (optional) - 04/29 - * - **Deploying Hadoop Cluster with Cloudmesh** - - :ref:`30 mins <ref-class-lesson-deploying-hadoop-cluster-with-cloudmesh>` - see text - 04/29 - * - **Hadoop Example: Word Count** - `33 mins <https://mix.office.com/watch/1on4q8t1vcjfh>`_ - :ref:`1 hour <ref-class-lesson-hadoop-word-count>` - see video and text - 04/29 - * - **Deploying MongoDB Sharded Cluster** - `4 mins <https://mix.office.com/watch/1rx90yz48fqpn>`_ - :ref:`1 hour <ref-class-lesson-mongodb-sharded-cluster>` - see video and text - 04/29 - * - **``cluster`` Cloudmesh Command for Virtual Clusters** - SSH Connections between nodes, Host Configuration - `5 mins <https://mix.office.com/watch/lk39mr08k0ox>`_ - `20 mins <../../cloudmesh/cm/_cm-cluster.html>`_ (repeated practice) - `20 mins <../../cloudmesh/cm/_cm-cluster.html#exercise>`_ - 04/29 - 05/01 .. 
* - **Hadoop Virtual Cluster** - Cloudmesh - Discussion - Advanced Topics with Hadoop - Zookeeper and HBase - Yarn - OpenStack Sahara - Not yet available - Not yet available - - 04/20 - 04/24 Length of the lessons in Unit 5 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * Total of video lessons: 1 hour and 12 minutes * Total of study materials: 4 hours and 05 minutes * Total of lab sessions: 50 minutes Unit 6 ------------------------------------------------------------------------------- Virtual Cluster II: Composite Cluster with Sub-Clusters ******************************************************************************* .. list-table:: Virtual Cluster II :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **Composite Cluster with Sub-Clusters** (Not taught in this class) - Introduction and Overview - Creating a Cross Resource Virtual Cluster - Not taught in this class - Not taught in this class - - - * - **Apache Hadoop YARN** - `34 mins <https://mix.office.com/watch/1eopy3tfq6kim>`_ - :ref:`1 hour <ref-class-lesson-hadoop-yarn>` - - 05/14 - * - **Apache ZooKeeper** - `40 mins <https://mix.office.com/watch/1ptxm2uj2s7y3>`_ - :ref:`1 hour <ref-class-lesson-zookeeper>` - - 05/14 - * - **Open MPI Virtual Cluster** - Introduction and Overview - HPC Stack - MPI - Cloudmesh HPC (Not taught in this class) - - :ref:`1 hour <ref-class-lesson-openmpi-with-cloudmesh>` - - 05/14 - * - **HPC Queuing System** (optional) - `8 mins <https://www.youtube.com/watch?v=6oUsMyDt7gU>`_ (optional) - :ref:`1 hour <s-hpc>` (optional) - - 05/14 - * - **MongoDB Virtual Cluster** (repeated lesson) - Introduction and Overview - Sharded MongoDB - `4 mins <https://mix.office.com/watch/1rx90yz48fqpn>`_ - :ref:`1 hour <ref-class-lesson-mongodb-sharded-cluster>` - - 05/14 - Length of the lessons in Unit 6 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * Total of video 
lessons: 1 hour and 26 minutes * Total of study materials: 5 hours Unit 7 ------------------------------------------------------------------------------- Other Technologies (under preparation) ******************************************************************************* .. list-table:: Other Technologies :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **Docker Basics** - - :ref:`1 hour <ref-class-lesson-docker>` - - 05/21 - * - **VM Software - Vagrant** - Not yet available - :ref:`30 min <ref-virtualization-tools>` - - 05/13 - 05/15 * - **Hadoop MRv2** - - :ref:`1 hour <ref-class-lesson-hadoop2>` - - - * - **Hadoop MRv2 with Cloudmesh ``launcher``** - - :ref:`30 mins <ref-class-lesson-hadoop2-launcher>` - - - * - **Apache ZooKeeper** (repeated lesson) - `40 mins <https://mix.office.com/watch/1ptxm2uj2s7y3>`_ - :ref:`1 hour <ref-class-lesson-zookeeper>` - - 05/21 - * - **Apache Big Data Stack (ABDS)** - Apache Zookeeper - Apache Storm - Apache Mesos - Apache HBase - Apache Spark - Apache Pig - Apache Hive - Not yet available - Not yet available - - 05/13 - 05/15 * - **Glossary** - Not yet available - Not yet available - - 05/13 - 05/15 .. comment:: * - **Virtualization Technologies** - Introduction and Overview - Hypervisors - KVM - Containers (LXC) - Docker - Not yet available - Not yet available - - 05/13 - 05/15 - Oracle VirtualBox - VMWare .. comment:: Unit 8 ------------------------------------------------------------------------------- Future (under preparation) ******************************************************************************* .. list-table:: Future :widths: 30 10 10 10 10 10 :header-rows: 1 * - Topic - Video - Text - Assignment - Study Material By - HW Due * - **What will the Future Bring** - Not yet available - Not yet available - - Not due - Not due * - **GE Industrial Internet of Things (IIoT)** - Not yet available - Not yet available - - Not due - Not due .. 
comment:: * - **Using India OpenStack on Cloudmesh** - `5 mins <https://mix.office.com/watch/irhlsfq220zh>`_ - `30 mins <../../cloudmesh/setup/cloudmesh_yaml.html>`_ - `10 mins <../../cloudmesh/api/exercises.html#exercise-3>`_ - 04/06 - 04/10