# EC2 Instances

EC2 instances are the building blocks of cloud computing on AWS. In essence, an instance is a remote computer that you can use to run your code. You can run your code on a cluster of instances or on a single one; in this lesson we will focus on the latter.

You access an EC2 instance from your terminal by opening an SSH connection to it. Once connected, you can run commands on the instance just as you would on your own command line, so it is important to remember the basic terminal commands, e.g. `ls`.

# SSH key

The first thing you have to create is a key pair. The private half of the pair is a file that you will use to connect to the instance. To create a key pair, go to the AWS console and open the EC2 section. In the list of key pairs, click on the `Create Key Pair` button.

<p align="center">
    <img src="images/EC2_1.png" width="500"/>
</p>

<p align="center">
    <img src="images/EC2_2.png" width="500"/>
</p>

On the next page, select 'Create Key Pair'. You will be asked to give the key pair a name and to choose the file extension; this lesson uses `.pem`.

<p align="center">
    <img src="images/EC2_Create_KP.png" width="500"/>
</p>

A `.pem` file with the name of the key pair will be created and downloaded as soon as you click `'Create key pair'`. This file is the private key that you will use to connect to the instance. SSH refuses to use a key file that other users can read, so we need to make sure the `.pem` file can only be read by its owner. To do this, you can run the following command:

`chmod 400 <key-pair-name>.pem`

`chmod` stands for `change mode`, and `400` is the permission setting: read-only for the owner of the file and no access for anyone else. You can look up what each digit means on this [link](https://en.wikipedia.org/wiki/Chmod).

Of course, when you run the command, make sure you are in the same directory as the `.pem` file.
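
To make the three digits of a mode such as `400` more concrete, here is a minimal Python sketch that decodes an octal mode into the `rwx` notation printed by `ls -l`. The function name `describe_mode` is made up for this illustration; it is not part of `chmod` or of any library.

```python
def describe_mode(mode: str) -> str:
    """Turn a three-digit octal mode such as '400' into 'r--------'."""
    parts = []
    for digit in mode:  # one digit each for owner, group, and others
        bits = int(digit, 8)
        parts.append(
            ("r" if bits & 4 else "-")    # 4 = read
            + ("w" if bits & 2 else "-")  # 2 = write
            + ("x" if bits & 1 else "-")  # 1 = execute
        )
    return "".join(parts)


if __name__ == "__main__":
    print(describe_mode("400"))  # r--------
    print(describe_mode("644"))  # rw-r--r--
```

The first group of three characters applies to the owner, the second to the group, and the third to everyone else, so `400` leaves only the owner's read bit set.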

# Create EC2 Instance

In order to have access to the EC2 instance, you also need a security group. To create one, go to the AWS console and open the EC2 section. In the list of security groups, click on the `Create Security Group` button. Make sure you set the correct inbound rules:

- HTTP: Anywhere IPv4
- HTTPS: Anywhere IPv4
- SSH: My IP

Once you are done, give the security group a name you will remember.
After you have created the security group, you can click on the `Launch Instance` button on the EC2 main page.

<p align="center">
    <img src="images/EC2_Launch_1.png" width="500"/>
</p>

You will be asked to choose an AMI (Amazon Machine Image). For this lesson, we will use the `Amazon Linux 2 AMI`.

<p align="center">
    <img src="images/EC2_Launch_2.png" width="500"/>
</p>

Select the t2.micro instance type, and click on Review and Launch. In the next section we need to choose the security groups. Select the one you created before with the three inbound rules.

<p align="center">
    <img src="images/EC2_Launch_3.png" width="500"/>
</p>

Click on Launch and you will be prompted to select a key-pair. Select the key-pair you created before, and click Launch Instance. Your instance will take a few minutes to be launched. Once it is launched, you will need to look at its public DNS:

<p align="center">
    <img src="images/EC2_Launch_4.png" width="500"/>
</p>

You are all set! You can now connect to the instance through SSH. Make sure your `.pem` file has read permission, and that you specify the right path to the `.pem` file.

# Connect to the EC2 Instance

You will connect to your EC2 instance through the command line. To do this, you can run the following command:

`ssh -i <key-pair-name>.pem ec2-user@<public-dns>`

Remember that the key name must include the path to the `.pem` file (unless, of course, you are in the same directory). The first time you connect, you might be shown a fingerprint. This is the fingerprint of the instance's host key, and SSH asks you to confirm it because your computer has not connected to this host before. Type `yes` to continue. After that, you will see something like this:

<p align="center">
    <img src="images/EC2_Console.png" width="500"/>
</p>


Congratulations! You have successfully connected to your EC2 instance! Now, the instance will be empty, so let's populate it using secure copying.

From a terminal on your LOCAL machine (not inside the SSH session), you can copy files to the EC2 instance by typing this:

`scp -i </path/my-key-pair.pem> </path/my-file> ec2-user@<public-dns>:<path/>`
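
If you find yourself retyping these commands, you could assemble them in a small script. The sketch below only builds the `ssh` and `scp` argument lists described in this section, without running them. The helper names are hypothetical, and the key path, host name, and file name in the demo are placeholders, not real values.

```python
EC2_USER = "ec2-user"  # the user name used in the commands in this lesson


def ssh_command(key_path: str, public_dns: str) -> list:
    """Arguments for: ssh -i <key> ec2-user@<public-dns>"""
    return ["ssh", "-i", key_path, f"{EC2_USER}@{public_dns}"]


def scp_upload_command(key_path: str, local_file: str,
                       public_dns: str, remote_dir: str = "~/") -> list:
    """Arguments for: scp -i <key> <file> ec2-user@<public-dns>:<dir>"""
    target = f"{EC2_USER}@{public_dns}:{remote_dir}"
    return ["scp", "-i", key_path, local_file, target]


if __name__ == "__main__":
    key = "my-key-pair.pem"          # placeholder key file
    host = "<public-dns>"            # placeholder: use your instance's DNS
    print(" ".join(ssh_command(key, host)))
    print(" ".join(scp_upload_command(key, "test.py", host)))
```

To actually run one of the commands from Python, you could pass the list to `subprocess.run(command, check=True)` on your local machine.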


## Try it out

1. Create a simple script that creates a sample dataframe and that saves that dataframe into a csv file. In the `to_csv` method, set the keyword argument `mode` to `a`, so it appends the dataframe to the existing csv file
2. Move that script to your EC2 instance
3. SSH to your EC2 instance
4. Run the script using the python3 command
5. Oh no! pandas is not installed. Run `pip install pandas`
6. Oh no! pip is not installed. Run `curl -O https://bootstrap.pypa.io/get-pip.py` and then `python3 get-pip.py --user`
7. Install pandas and run the file again
8. Observe that, when running `ls` in the terminal, a new file has been created


# Schedule jobs: Cron

On many occasions, you will want to run the same script repeatedly: at a specific time of the day, every week, or once a month. In those cases, you can use cron jobs, which are scheduled jobs that run a specific task at a specific time or at a regular interval.

If you are on Mac or Linux, you can simply edit your crontab file. If you are on Windows, the OS offers a UI (Task Scheduler) to set the schedule for a given task. Check the following link to learn [how to do it](https://www.windowscentral.com/how-create-automated-task-using-task-scheduler-windows-10).

In this case, we are going to focus on crontabs, since we are going to be using Linux on our EC2 instance. If you are on Windows, connect to your EC2 instance; otherwise, you can schedule your jobs either on your local computer or on your EC2 instance.

## Crontab

You can manage the scheduled tasks using crontab. For this notebook, we are going to focus on three things you can do with crontab:

- `crontab -l`: List the tasks that have been scheduled
- `crontab -r`: Remove all the tasks that have been scheduled
- `crontab -e`: Edit the tasks you want to schedule

Before editing our crontab, let's see the structure of a crontab entry. First, you set a schedule using five fields:

`minutes hours day_of_month month day_of_week`

The number you give to each of these fields will set the frequency of the command. You also have the `*` wildcard to set `every`. Some examples:

- `* * * * *` Every minute, every hour, every day of the month, every month, and every day of the week
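- `30 4 * * *` Every day at 04:30
- `* * 5 * 6` Every minute on the fifth day of the month and also every minute on every Saturday (6). When both day fields are restricted, cron runs the job if either of them matches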
- `0 0 * 11 *` Every day at midnight during November (11)
- `0 0 1 1 *` Yearly on the 1st of January

Check out this webpage for creating your schedules: [https://crontab.guru](https://crontab.guru)

The second part of a crontab entry is the command you want to run. Remember that you are in the terminal, so if you want to run a Python script, you should write something along the lines of `python3 test.py`.

Thus, the following command:

- `0 0 * * 5 python3 test.py` Will run `test.py` weekly, at 00:00 every Friday
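
To check your reading of a schedule, the following Python sketch tests whether a given date and time matches a five-field cron schedule. It is a simplified illustration, not how cron itself is implemented: it only understands `*` and single numbers (no ranges, lists, steps, or `7` for Sunday), and the function names are made up for this example. It does follow the standard cron rule that, when both day fields are restricted, a match on either one is enough.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """A field matches if it is the `*` wildcard or equals the value."""
    return field == "*" or int(field) == value


def cron_matches(schedule: str, when: datetime) -> bool:
    minute, hour, dom, month, dow = schedule.split()
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    time_ok = (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(month, when.month)
    )
    dom_ok = field_matches(dom, when.day)
    dow_ok = field_matches(dow, cron_dow)
    if dom != "*" and dow != "*":
        # Both day fields are restricted: either one matching is enough.
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return time_ok and day_ok


if __name__ == "__main__":
    friday_midnight = datetime(2021, 9, 3, 0, 0)  # 3 September 2021 was a Friday
    print(cron_matches("0 0 * * 5", friday_midnight))   # True
    print(cron_matches("30 4 * * *", friday_midnight))  # False
```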


We are going to add some tasks to our crontab file using the last command in the list above, `crontab -e`. Running it opens Vim, a text editor that might look quite complicated if you are not used to it. But don't worry, we've got you covered!

There are only two commands you need to know for now. If you want to delve deeper into it, go to this [link](https://vim.rtorr.com).

- `i`: Enter Insert mode. After this, you will be able to write
- `:wq`: Write and Quit, so you are saving the changes and quitting the editor

Commands such as `:wq` are typed in Vim's Normal mode, not in Insert mode. If you are in Insert mode, simply press Esc to go back to Normal mode first.

## Try it out

1. SSH to your EC2 instance
2. Set a new cron job using `crontab -e`
3. The cron job will run, every minute, the file you created previously to append elements to a csv 
4. Make sure it was installed with `crontab -l`
5. Let's take a 10-minute break
6. Make some coffee
7. Check that the csv file now contains many more rows
8. Delete the cron job using `crontab -r`
9. Check that the deletion was effective using `crontab -l`

# EC2 and Selenium

The scraper you created might be intended to be run daily or weekly:

- Problem: Running it manually might be tedious
- Solution: We can use `crontab`. <br>
<br>
- Problem: `crontab` doesn't work if your computer is off. 
- Solution: We can run it on our EC2 instance and leave the instance running<br>
<br>
- Problem: We don't have Chrome or chromedriver to run Selenium
- Solution: Download it!<br>
<br>
- Eventual Problem: When logging out, the script will halt
- Eventual Solution: Use a terminal multiplexer that leaves a session running


_Note: This will only work if you are on an Amazon Linux 2 AMI instance_

Looks like we have work to do!

1. First of all, let's download Google Chrome and check its version. In your EC2 instance, run the following commands:

`sudo curl https://intoli.com/install-google-chrome.sh | bash`

`sudo mv /usr/bin/google-chrome-stable /usr/bin/google-chrome`

2. If no problem appeared, run the following to check the version: `google-chrome --version`

<details>
  <summary>If you are on Ubuntu 20.04</summary>

  If you are on Ubuntu, you can use the following commands to download the latest version of Chrome:

  `wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb`

  `sudo apt install ./google-chrome-stable_current_amd64.deb`

  Then, you can follow the same instructions to download the corresponding chromedriver

</details>

3. At the time of writing, the version is 93.x. Now, we have to download the corresponding chromedriver version. Remember you are on a Linux instance, so on the following webpage, look for the Linux build that matches your Chrome version: [https://sites.google.com/chromium.org/driver/](https://sites.google.com/chromium.org/driver/). You don't have to download the file to your own computer; just copy its link and pass it to `wget` on the instance:

`sudo wget https://chromedriver.storage.googleapis.com/93.0.4577.15/chromedriver_linux64.zip`

4. Unzip the downloaded file: `sudo unzip chromedriver_linux64.zip` 
5. Move that file to a directory on your PATH: `sudo mv chromedriver /usr/bin/chromedriver`
6. You are ready to use selenium! (Don't forget to install it with `pip install selenium`)
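
Because Chrome and chromedriver must belong to the same major release, it can be handy to compare the two version numbers before running your scraper. The sketch below is a minimal illustration: the helper names are hypothetical, and the Chrome version string in the demo is only an example of the kind of text `google-chrome --version` prints, not output captured from a real instance.

```python
import re


def major_version(version_text: str) -> int:
    """Extract the major version from text such as 'Google Chrome 93.0.4577.63'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_compatible(chrome_text: str, driver_version: str) -> bool:
    """Chrome and chromedriver are expected to share the same major version."""
    return major_version(chrome_text) == major_version(driver_version)


if __name__ == "__main__":
    chrome = "Google Chrome 93.0.4577.63"  # example text, assumed for illustration
    driver = "93.0.4577.15"                # the chromedriver version used above
    print(versions_compatible(chrome, driver))  # True
```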




# Multiplexing

As you start using selenium, you will see that your process halts once the SSH connection is terminated. To avoid this, you can use a terminal multiplexer such as `tmux`. In essence, it runs a new terminal (or set of terminals) on the instance, and if you disconnect from it, your process won't die.

To install it, run: `sudo yum install tmux`

You can play around with `tmux`, since it has plenty of commands. In this notebook, we are going to focus on creating and managing sessions:

- To create a new session and connect to it, run: `tmux new-session -s session_name`
- Once inside, you can run anything as if you were in the main terminal. To detach from the session press `Ctrl + B`, and then `D`
- You can create as many sessions as you want. To list the active sessions, run: `tmux list-sessions`
- To reconnect to an existing session, run: `tmux attach-session -t session_name`
- To end a session, you can run: `tmux kill-session -t session_name`

Once you leave it running, you can log out from the EC2 instance, and all the work will keep running.
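
As a rough sketch of how this could be scripted, the Python snippet below builds the `tmux` commands needed to start a script in a session that is detached from the beginning, so you never have to attach to it. It uses the `-d` option of `tmux new-session`, which starts the session without attaching. The helper names, the session name `scraper`, and the script name `scraper.py` are assumptions made for this example.

```python
def new_detached_session(name: str, command: str) -> list:
    """Arguments for: tmux new-session -d -s <name> <command>"""
    # -d starts the session without attaching to it; -s gives it a name.
    return ["tmux", "new-session", "-d", "-s", name, command]


def kill_session(name: str) -> list:
    """Arguments for: tmux kill-session -t <name>"""
    return ["tmux", "kill-session", "-t", name]


if __name__ == "__main__":
    # Hypothetical session and script names, for illustration only.
    print(new_detached_session("scraper", "python3 scraper.py"))
    print(kill_session("scraper"))
```

On the instance, you could pass either list to `subprocess.run(command, check=True)`, and later inspect the session with `tmux attach-session -t scraper`.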
