# AWS Set-up Notes

### Part of *Signal's* Distributed Computing Assignment

We will be using https://aws.amazon.com/ to set up an [EC2](https://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud) (*Elastic Compute Cloud*) [instance](https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#).

## Steps
1. Basic set-up (for EC2 Instance)
1. Shut-down procedure
1. SSHing in
    - Command-line quick-ref
1. Installing packages
    - Virtual Environments (venv)
1. Elastic IP
1. Server
    - Rstudio and rstudio-server
    - http
    - Self-work: find analog to rstudio-server for jupyter? [link](http://jupyter-notebook.readthedocs.io/en/latest/public_server.html)

**Appendix**

1. Access permissions, basic security (cursory)
    - Fingerprints (SSH, not going to get into it)
    - Keypairs (downloaded but once)
    - Adding users
    - Within-application users/password (rstudio-server)
1. Parallelization (With R)

## Basic Set-Up (for EC2 Instance)

While on free account...

Region: Oregon

Amazon Linux AMI, t2.micro, subnet:us-west-2a (optional?), Root:10GiB, gp2 SSD, *skip tags for single instance*

Security (low): permit SSH(anywhere), HTTP(anywhere)

**IF rstudio-server**: Custom TCP Port:8787 (anywhere)

Launch (optional:new authentication key)

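The security settings above can also be applied from the command line. This is a minimal sketch, not the assignment's own method: it assumes the AWS CLI is installed and configured, and `ingress_commands` is a made-up helper name. It only prints the commands, so you can review them before running anything.

```shell
# Sketch: print the AWS CLI commands that open inbound TCP ports
# to any address (0.0.0.0/0), matching the low-security settings above.
ingress_commands() {
    # $1 = security group ID (placeholder you supply), rest = port numbers
    sg="$1"
    shift
    for port in "$@"; do
        echo "aws ec2 authorize-security-group-ingress --group-id $sg --protocol tcp --port $port --cidr 0.0.0.0/0"
    done
}

# Review the output first, then pipe it to sh to apply it:
#   ingress_commands <security_group_id> 22 80 8787
#   ingress_commands <security_group_id> 22 80 8787 | sh
```

Leave out 8787 if you are not running rstudio-server.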
## Shut-Down Procedure

Instances-Actions-InstanceState-Stop

ElasticIPs-Actions-DisassociateAddresses-ReleaseAddresses

Instances-Actions-InstanceState-Terminate

Volumes-Actions-DeleteVolume

    aws ec2 terminate-instances --instance-ids <instanceID>

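The checklist above can be sketched as a script. This is an illustration under assumptions: the four IDs are placeholders you look up yourself (for example with `aws ec2 describe-addresses`), and `run` and `teardown` are hypothetical helper names. By default it only prints the commands (`DRY_RUN=1`), because termination and volume deletion cannot be undone.

```shell
# Sketch: command-line version of the shut-down checklist.

run() {
    # Print the command while DRY_RUN=1 (the default); otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown() {
    # $1 = instance ID, $2 = Elastic IP association ID,
    # $3 = Elastic IP allocation ID, $4 = volume ID
    run aws ec2 stop-instances --instance-ids "$1" &&
    run aws ec2 disassociate-address --association-id "$2" &&
    run aws ec2 release-address --allocation-id "$3" &&
    run aws ec2 terminate-instances --instance-ids "$1" &&
    # A volume cannot be deleted while its instance is still shutting down.
    run aws ec2 wait instance-terminated --instance-ids "$1" &&
    # Root volumes are often deleted on termination already; in that case
    # this last step reports an error that is safe to ignore.
    run aws ec2 delete-volume --volume-id "$4"
}

# Usage:
#   teardown <instanceID> <associationID> <allocationID> <volumeID>              # preview
#   DRY_RUN=0; teardown <instanceID> <associationID> <allocationID> <volumeID>   # for real
```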

## SSHing In

[**Amazon Guide to SSHing into EC2**](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstances.html)

Get ID of EC2 instance from Amazon

[Instructions to configure AWS](http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html)

    #A couple aws ec2 utilities while not logged in to ec2
    aws ec2 describe-addresses
    aws ec2 describe-instances
    
    aws ec2 get-console-output --instance-id <instanceID>
    #Gets fingerprint
    
    ssh -i <keypath> ec2-user@<public_DNS>.amazonaws.com
        yes
        
    
    scp -i <keypath> <filepath> ec2-user@<instance_id_full>.amazonaws.com:~
    #Transfers a file from your computer to the home directory on EC2 (scp, not ssh)

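Typing the key path and DNS name every time gets tedious. One option, sketched here under assumptions, is to look up the DNS name with the CLI and write an entry into `~/.ssh/config`. The helper names and the alias `signal-ec2` are made up for illustration. `public_dns` needs a configured AWS CLI and returns the full name, including the `.amazonaws.com` suffix.

```shell
# Look up an instance's public DNS name (needs a configured AWS CLI).
public_dns() {
    # $1 = instance ID
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].PublicDnsName' --output text
}

# Print an ssh_config entry so that "ssh <alias>" picks up the user and key.
ssh_config_entry() {
    # $1 = alias (any name you like), $2 = public DNS name, $3 = path to .pem key
    printf 'Host %s\n    HostName %s\n    User ec2-user\n    IdentityFile %s\n' "$1" "$2" "$3"
}

# Usage (signal-ec2 is a made-up alias):
#   ssh_config_entry signal-ec2 "$(public_dns <instanceID>)" <keypath> >> ~/.ssh/config
#   ssh signal-ec2
#   scp <filepath> signal-ec2:~
```

Without an Elastic IP the public DNS name changes when the instance is stopped and started, so the entry would need regenerating.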

## Installing Packages

### Quick Install (but sometimes excessive)

    sudo yum groupinstall "Development Tools"
    
### Specific Install

    sudo yum install httpd
    sudo yum install python34
    sudo yum install R
    #(check name!) yum search python3
    sudo yum install gcc gcc-c++ gcc-gfortran
    sudo yum install blas-devel lapack-devel atlas-devel
    
### Virtual Env 

[**Python Virtual Environment Docs**](http://docs.python-guide.org/en/latest/dev/virtualenvs/)

    pip install virtualenv
    virtualenv venv
    which python3
    virtualenv -p <path> ~/venv
    source venv/bin/activate
    
    
    pip install <py libraries>
    
    
    #Libraries as prep for vec2pca
    pip install numpy
    pip install scipy
    pip install nltk
        (In Python):
        import nltk
        nltk.download()
            d
            punkt
    pip install gensim beautifulsoup4 plac pandas
    
    #Libraries as prep for parallel gbtrees (Install in R, not with pip!)
    #  install.packages(c("doMC", "foreach", "tictoc", "xgboost"))
    
    deactivate
    
(`which python3` prints the path to python3; use it as `<path>` in `virtualenv -p <path>`.)

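It is easy to forget `source venv/bin/activate` and install libraries system-wide by accident. A small guard can catch that. This sketch relies on the `VIRTUAL_ENV` variable that the activate script exports; `in_venv` and `venv_pip_install` are hypothetical helper names.

```shell
# Succeeds only when a virtualenv is active (activate exports VIRTUAL_ENV).
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Run pip install only inside an active virtualenv; otherwise explain and fail.
venv_pip_install() {
    if ! in_venv; then
        echo "no virtualenv active; run: source venv/bin/activate" >&2
        return 1
    fi
    pip install "$@"
}

# Usage:
#   source venv/bin/activate
#   venv_pip_install numpy scipy nltk
```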
### Command Line Quick-Ref

    #A couple aws ec2 utilities while not logged in to ec2
    aws ec2 describe-addresses
    aws ec2 describe-instances
    
    chmod 400 /path/my-key-pair.pem
    #Make the key readable only by you (ssh refuses keys with looser permissions)
    
    top -o cpu
    #Watch cpu usage; q to quit (macOS syntax; on Linux, plain top already sorts by CPU)
    
    ls -a
    #See hidden files (names starting with a dot)
    
    ls -lh <optional: file>
    #Get file size in a more readable format
    
    ls -d */
    #List folders in current directory
    
    rm <file>
    
    sudo rm -R <directory>
    #Delete entire directory (WARNING: Dangerous)
    
    head -n <#> <file>
    #Print out the first # lines of a file into commandline
    
    cp <filein> <path/fileout>
    #Make a copy of a file somewhere else
    
    yum search <x>
    #Search for packages containing "x" in name
    
    sudo yum install <pkg>
    #Install a package
    
    wc -l <file> 
    #counts lines in file
    
    wget <url_of_data>
    #Download data from internet
    
    nano <filename>
    #(Might want to eventually replace it with Vim or Emacs, though)
        ctrl-w 
        #search
        ctrl-o enter 
        #save
        ctrl-x 
        #exit
        
    ctrl-d
    #Leave EC2 instance
   
   
    

   
**Code for Allocating 1G Swap Space**

*for those times where you don't have enough RAM to install everything, stick some on the hard drive storage*

    sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
    sudo /sbin/mkswap /var/swap.1
    sudo chmod 0600 /var/swap.1
    sudo /sbin/swapon /var/swap.1

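After the swap commands above have run, it is worth confirming that the swap space is active. A minimal sketch, assuming a Linux instance where `/proc/meminfo` reports `SwapTotal` in kB (`swap_total_mb` is a made-up helper name):

```shell
# Report total swap in MiB by reading a meminfo-style file.
swap_total_mb() {
    # $1 = file to read (defaults to /proc/meminfo, which reports kB)
    awk '/^SwapTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Usage:
#   swap_total_mb
```

For the 1G file above the result should be close to 1024; a result of 0 means the swap is not on. `swapon -s` lists the active swap files as well. Swap enabled this way does not survive a reboot unless you rerun `swapon` or add the file to `/etc/fstab`.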
### Virtual Environments

venv

conda? (may be too large for a small EC2 instance)

## Servers

**httpd server**

    sudo /etc/init.d/httpd start
    #start the server, or swap start for stop to stop server
    
    sudo chkconfig --level 3 httpd on
    #Get the httpd server to start whenever the instance is rebooted
    
    sudo cp <webserver_file.html> /var/www/html/index.html
    #Copies an HTML file to where it will be served from
    
Go to {ElasticIP}
    
**rstudio server**

[Download instructions](https://www.rstudio.com/products/rstudio/download-server/)

**Current** download instructions

    wget https://download2.rstudio.org/rstudio-server-rhel-0.99.903-x86_64.rpm
    sudo yum install --nogpgcheck rstudio-server-rhel-0.99.903-x86_64.rpm

Starting the server

    sudo rstudio-server start
    sudo rstudio-server status
    sudo adduser <username>
    sudo passwd <username>
    #Create a login for the rstudio-server sign-in page
    
    sudo rstudio-server stop
    

Go to {ElasticIP}:8787



[xgboost](https://cran.r-project.org/web/packages/xgboost/vignettes/xgboost.pdf) parallelizes automatically, but its defaults are annoyingly different from gbm's.