In [1]:
# preamble to be able to run notebooks in Jupyter and Colab
try:
    from google.colab import drive
    import sys
    
    drive.mount('/content/drive')
    notes_home = "/content/drive/Shared drives/CSC310/notes/"
    user_home = "/content/drive/My Drive/"
    
    sys.path.insert(1,notes_home) # let the notebook access the notes folder
    
except ModuleNotFoundError:
    notes_home = "" # running native Jupyter environment -- notes home is the same as the notebook
    user_home = ""  # under Jupyter we assume the user directory is the same as the notebook

# Cloud Computing

**Definition**: Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. Cloud computing relies on sharing of resources to achieve coherence and [economies of scale](https://en.wikipedia.org/wiki/Economies_of_scale).

[-Wikipedia](https://en.wikipedia.org/wiki/Cloud_computing)

## Architecture

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Cloud_computing.svg/1280px-Cloud_computing.svg.png" height="350" width="500">

**Cloud computing metaphor**: the group of networked elements providing services need not be individually addressed or managed by users; instead, the entire provider-managed suite of hardware and software can be thought of as an **amorphous cloud**.

Google's Colab Notebooks is an example of application-based cloud computing; for that matter, Google's whole suite of applications falls into that category.

## Service Models

* [Software as a service (SaaS)](https://en.wikipedia.org/wiki/Cloud_computing#Software_as_a_service_(SaaS))
    
    **Definition**: The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. 
    <!-- The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings. -->
    
    **Examples**: Google Docs and Colab Notebooks
 

   
* [Platform as a service (PaaS)](https://en.wikipedia.org/wiki/Cloud_computing#Platform_as_a_service_(PaaS))

    **Definition**: The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. 
    <!-- The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.-->
    
    **Examples**: AWS SageMaker and S3
 

   
* [Infrastructure as a service (IaaS)](https://en.wikipedia.org/wiki/Cloud_computing#Infrastructure_as_a_service_(IaaS))

    **Definition**: The consumer is able to deploy and run arbitrary software, which can include operating systems and applications, and has control over operating systems, storage, and deployed applications.
    
    **Examples**: AWS and Azure
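
The three definitions above differ mainly in how much of the stack the consumer controls. A minimal sketch of that split follows; the dictionary and function names are made up for illustration, the IaaS entry follows the definition above, and the SaaS and PaaS entries assume the usual reading that SaaS users control at most limited user-specific settings while PaaS users control only the applications they deploy.

```python
# Layers the consumer controls under each service model.
# Simplified illustration; the names below are hypothetical, not an official taxonomy.
CONSUMER_CONTROLS = {
    "SaaS": set(),  # assumption: at most limited user-specific settings
    "PaaS": {"deployed applications"},  # assumption: apps only, not the platform
    "IaaS": {"operating systems", "storage", "deployed applications"},
}


def models_with_control_over(layer):
    """Return the service models under which the consumer controls the given layer."""
    return sorted(
        model for model, layers in CONSUMER_CONTROLS.items() if layer in layers
    )


print(models_with_control_over("deployed applications"))
print(models_with_control_over("operating systems"))
```

Moving from SaaS to IaaS, the set of layers under the consumer's control only grows, which is why IaaS offers the most flexibility and also the most administrative work.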

## Data Science in the Cloud

<img src="https://dmhnzl5mp9mj6.cloudfront.net/bigdata_awsblog/images/White_paper_image1.PNG" width="600" height="200">

A cloud-based architecture of a data science processing pipeline that takes advantage of AWS's IaaS.  All the components can be provisioned and configured in the AWS console or through its DevOps API.  ([Source](https://aws.amazon.com/blogs/big-data/big-data-analytics-options-on-aws-updated-white-paper/))

We will take a look at two components in the above diagram:

* Cloud-based Storage: [S3](https://aws.amazon.com/s3/) (component #3)
    
    Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance.
  

  
* Cloud-based Machine Learning: [SageMaker](https://aws.amazon.com/sagemaker/) (component #5)
    
    Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. 
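
As a rough illustration of the S3 storage component described above, the sketch below stores a file as an object and lists the stored keys. It assumes a client object with the `upload_file` and `list_objects_v2` methods of the boto3 S3 client; the helper names `backup_file` and `list_keys`, the `backups/` prefix, and the bucket and file names in the usage comment are hypothetical.

```python
from pathlib import Path


def backup_file(s3_client, local_path, bucket, key_prefix="backups/"):
    """Upload one local file to a bucket and return the object key it was stored under."""
    path = Path(local_path)
    key = f"{key_prefix}{path.name}"
    # boto3's S3 client exposes upload_file(Filename, Bucket, Key)
    s3_client.upload_file(str(path), bucket, key)
    return key


def list_keys(s3_client, bucket, prefix=""):
    """Return the keys stored under a prefix (first page of results only)."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # The "Contents" entry is missing from the response when nothing matches
    return [obj["Key"] for obj in response.get("Contents", [])]


# Usage sketch (needs the third-party boto3 package and configured AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   backup_file(s3, "iris.csv", "my-csc310-bucket")   # hypothetical file and bucket
#   list_keys(s3, "my-csc310-bucket", prefix="backups/")
```

Passing the client in as an argument keeps the helpers independent of how credentials are configured, so the same code can run in Colab, in native Jupyter, or against a stand-in object when no AWS account is available.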

## Experiments


If you have an AWS account, here are two nice tutorials:

* [S3 tutorial](https://aws.amazon.com/getting-started/hands-on/backup-files-to-amazon-s3/)

* [Jupyter/SageMaker](https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/)

We will be doing work in AWS Classrooms.