#### LINK TO DOWNLOAD NEO4J

http://neo4j.com/artifact.php?name=neo4j-community-2.3.1-unix.tar.gz
 
#### AWS SAMPLE CODE

https://alestic.com/2013/11/aws-cli-query/

---
## W205 Final Project: Million Song Dataset (MSD)

Requirements:

- W205 AMI with Hadoop and Spark
- AWS CLI installed and configured by running the following:

`$aws configure`
 AWS Access Key ID [None]: <access key>  
 AWS Secret Access Key [None]: <secret access key>  
 Default region name [None]: us-east-1  
 Default output format [None]: json  

This configuration script is run from within the EC2 instance.

It assumes that the instance DOES NOT have any volume attached and that the mount point /data is available.
 
Python libraries: py2neo

# Attempt to automate configuration

In [None]:
%%bash
# === Installations ===
sudo yum install -y jq
pip install awscli

pip install pyspark

pip install cython
pip install numpy
pip install numexpr


wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.16.tar
tar -xvf hdf5-1.8.16.tar 
cd hdf5-1.8.16
./configure --prefix=/usr/local
make
sudo make install
pip install h5py


wget http://s3.amazonaws.com/ec2metadata/ec2-metadata
chmod a+x ec2-metadata
sudo mv ec2-metadata /usr/bin
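
The later cells depend on `aws`, `jq`, and `ec2-metadata` being on the `PATH`, so a quick check after the installation step can save a failed run. The following is a minimal sketch, not part of the original notebook; the helper name `missing_commands` is hypothetical.

```shell
# Hypothetical helper: print every required command that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
    local status=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

A possible use is `missing_commands aws jq ec2-metadata || echo "Install the tools listed above first"`, run before `config/create_volumes.sh`.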


## AWS Setup

Attach two volumes to this instance:

- **Graph Volume:** 200GB volume created to store the graph
- **MSD Volume:** 280GB volume created from snapshot *snap-5178cf30* with the entire Million Song Dataset (MSD). For details about the snapshot see [AWS Datasets](https://aws.amazon.com/datasets/million-song-dataset/)

In [2]:
!mkdir config

In [13]:
%%writefile config/create_volumes.sh
#!/usr/bin/env bash

# =============================================
# RUN SCRIPT AS ROOT USER
# Attaches 2 volumes to this instance: a Graph Volume of 200 GB mounted to /graph
# and a MSD Volume 280GB from snap-5178cf30 with the entire dataset mounted to /msong_dataset

cd ~

# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
# Save instance info in environment variables
# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

# Get instance id
INSTANCE_ID=$(ec2-metadata -i | cut -d:  -f2| cut -d' ' -f2)
export INSTANCE_ID
# Get instance public hostname
INSTANCE_PDNS=$(ec2-metadata -p | cut -d:  -f2| cut -d' ' -f2)
export INSTANCE_PDNS
# Get instance availability zone
INSTANCE_ZONE=$(ec2-metadata -z | cut -d:  -f2| cut -d' ' -f2)
export INSTANCE_ZONE

#echo 'export INSTANCE_ID='$INSTANCE_ID >> ~/.bashrc
#echo 'export INSTANCE_PDNS='$INSTANCE_PDNS >> ~/.bashrc
#echo 'export INSTANCE_ZONE='$INSTANCE_ZONE >> ~/.bashrc
#source ~/.bashrc

# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
# Create Volumes
# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

mkdir -p aws-info
 
### Create volume to store graph
echo LOG: Creating graph volume...
aws ec2 create-volume --size 200 --availability-zone "$INSTANCE_ZONE" --volume-type gp2 > aws-info/graph-volume.json
# jq -r prints the raw string, without the surrounding double quotes
GRAPH_VOL_ID=$(jq -r '.VolumeId' aws-info/graph-volume.json)
export GRAPH_VOL_ID

### Create volume from AWS snapshot of Million Song Dataset (full dataset)
echo LOG: Copying Million Song Dataset volume...
aws ec2 create-volume --availability-zone "$INSTANCE_ZONE" \
--snapshot-id snap-5178cf30 --volume-type gp2 > aws-info/msd-volume.json
MSD_VOL_ID=$(jq -r '.VolumeId' aws-info/msd-volume.json)
export MSD_VOL_ID

echo LOG: Wait for volumes to become available...

# Block until both volumes report the "available" state instead of sleeping for a fixed time
aws ec2 wait volume-available --volume-ids "$GRAPH_VOL_ID" "$MSD_VOL_ID"

# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
# Attach volumes to this instance
# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    
echo LOG: Attaching graph volume...
aws ec2 attach-volume --volume-id $GRAPH_VOL_ID --instance-id $INSTANCE_ID --device /dev/xvdh

echo LOG: Attaching Million Song Dataset volume...
aws ec2 attach-volume --volume-id $MSD_VOL_ID --instance-id $INSTANCE_ID --device /dev/xvdj

echo LOG: Wait for volumes to be attached...
    
sleep 30 

# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
# Mount volumes to instance
# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

mkdir -p /graph
sudo mkfs -t ext4 /dev/xvdh
sudo mount -t ext4 /dev/xvdh /graph
chmod g+rwx -R /graph/

mkdir -p /msong_dataset
sudo mount /dev/xvdj /msong_dataset


Overwriting config/create_volumes.sh


In [8]:
!chmod a+x config/create_volumes.sh
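
The script reads the instance ID, public hostname, and availability zone from `ec2-metadata`, which prints one `key: value` line per item, and extracts the value with two `cut` calls. As an alternative sketch, the same extraction can be done with shell parameter expansion alone. The function name `metadata_value` is hypothetical and not part of the original script.

```shell
# Hypothetical helper: extract the value from one "key: value" line
# such as "instance-id: i-df177369".
metadata_value() {
    local line="$1"
    local value="${line#*:}"        # drop everything up to the first colon
    value="${value# }"              # drop the single leading space, if present
    printf '%s\n' "${value%% *}"    # keep only the first space-separated token
}
```

Under these assumptions, `INSTANCE_ID=$(metadata_value "$(ec2-metadata -i)")` would replace the `cut` pipeline for the instance ID, and likewise for `-p` and `-z`.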

In [None]:
# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
# Unmount volumes from instance
# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
umount /graph
umount /msong_dataset

# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
# Detaching volumes from this instance
# ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    
echo LOG: Detaching graph volume...
aws ec2 detach-volume --volume-id $GRAPH_VOL_ID --instance-id $INSTANCE_ID

echo LOG: Detaching Million Song Dataset volume...
aws ec2 detach-volume --volume-id $MSD_VOL_ID --instance-id $INSTANCE_ID
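
The detach commands rely on `GRAPH_VOL_ID`, `MSD_VOL_ID`, and `INSTANCE_ID`. Variables exported inside `config/create_volumes.sh` are not visible to a shell started later, so they may be empty when this cell runs. A minimal guard, sketched here with the hypothetical helper name `require_vars`, could stop early in that case:

```shell
# Hypothetical helper: fail if any of the named variables is unset or empty.
# Pass variable NAMES (not values); names are assumed to be trusted input.
require_vars() {
    local name value status=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "ERROR: $name is not set" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `require_vars GRAPH_VOL_ID MSD_VOL_ID INSTANCE_ID || exit 1` placed before the `umount` lines would stop the cell before any `aws ec2 detach-volume` call is made with an empty ID.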

In [None]:
aws emr create-cluster --name "ProcessMSD" \
--release-label emr-4.2.0 --applications Name=Spark \
--instance-count 5 --use-default-roles --ec2-attributes KeyName=atjose_kpair \
--instance-type m3.2xlarge \
--configurations file:///data/asoto/projectW205/config/emrconfig.json | tee aws-info/spark

In [None]:
INSTANCE_ID=i-df177369
MSD_VOL_ID=vol-7e18339d
GRAPH_VOL_ID=vol-191833fa
INSTANCE_PDNS=ec2-54-227-33-108.compute-1.amazonaws.com

In [None]:
aws ec2 attach-volume --volume-id vol-191833fa --instance-id i-df177369 --device /dev/xvdh
aws ec2 attach-volume --volume-id vol-7e18339d --instance-id i-df177369 --device /dev/xvdj
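
After `attach-volume` returns, the block device can take a few seconds to appear on the instance, which is why the script sleeps before mounting. As a sketch of an alternative to a fixed sleep, the following polls until a path exists or a timeout expires. The function name `wait_for_path` and the 60-second default are assumptions, not part of the original notebook.

```shell
# Hypothetical helper: wait until a path exists, polling once per second.
# $1 = path to wait for, $2 = timeout in seconds (assumed default: 60).
wait_for_path() {
    local path="$1"
    local timeout="${2:-60}"
    local waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "ERROR: $path did not appear within ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

With this helper, `wait_for_path /dev/xvdh && wait_for_path /dev/xvdj` could precede the `mkfs` and `mount` steps, assuming the devices show up under the names passed to `--device`.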


## Install Neo4j in /graph directory

In [None]:
%%bash
# === Install Neo4j in /graph directory ===

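cd /graph
# Quote the URL and name the output file; otherwise wget saves it as "artifact.php?name=..."
wget -O neo4j-community-2.3.1-unix.tar.gz "http://neo4j.com/artifact.php?name=neo4j-community-2.3.1-unix.tar.gz"
tar -xf neo4j-community-2.3.1-unix.tar.gz
# The tarball unpacks into neo4j-community-2.3.1 under /graph
export NEO4J_HOME="/graph/neo4j-community-2.3.1"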

## Install Anaconda

In [None]:

cd /graph
wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda2-2.4.1-Linux-x86_64.sh
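# -b runs the installer non-interactively; -p sets the install prefix
# /graph/anaconda2 is an assumed install prefix; change it as needed
bash Anaconda2-2.4.1-Linux-x86_64.sh -b -p /graph/anaconda2
/graph/anaconda2/bin/conda install -y h5py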
    