## AWS RDS

AWS RDS (Relational Database Service) is a managed service that allows you to create a relational database in the cloud. The service supports several database engines, such as PostgreSQL, MySQL, Oracle and Amazon Aurora, and instances can be scaled as your needs grow. For demonstration, we will create a PostgreSQL database.

### Creating a security group for the database

If you haven't already configured a security group on AWS, you will first need to create one for the database. A security group acts as a virtual firewall: it controls which inbound traffic can reach the resources it is attached to and which outbound traffic those resources can send. We create it before the database so that we can attach it to the database when we create it. This lets us control the traffic flowing into and out of the database.

From the AWS dashboard navigate to the VPC dashboard.

<img src="./images/select_VPC.png">

From the VPC dashboard select security groups.

<img src="./images/click_security.png">

Create a new security group by clicking the `Create security group` button.

<img src="./images/create_security.png">

Give the security group a memorable name and description so that you can identify it at a later date.

By default, the security group has no rules allowing inbound traffic but is configured to allow all outbound traffic. You can see this under `Outbound rules`, where the type is `All traffic`, meaning all ports and protocols. The destination is set to custom with the value `0.0.0.0/0`, a CIDR block that matches every possible `IPv4` address.

`IPv4` addresses are the most common form of IP address. They are written as `x.x.x.x`, where each `x` is a value between 0 and 255, so addresses range from `0.0.0.0` to `255.255.255.255`; an example is `88.23.101.192`. The `/0` at the end is the CIDR (Classless Inter-Domain Routing) prefix length, which ranges from 0 to 32 and determines how many addresses the block covers: from a single address (`/32`) up to all 4,294,967,296 possible IPv4 addresses (`/0`). A single machine is written with `/32`, whereas a larger company might own a block such as `/21`, which covers 2,048 addresses.
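
The block sizes quoted above follow from the prefix length: a `/n` block covers `2 ** (32 - n)` addresses. As a minimal sketch using Python's standard `ipaddress` module (the `10.0.0.0/21` block is a hypothetical example, not an address taken from this guide):

```python
import ipaddress


def cidr_size(cidr: str) -> int:
    """Return how many addresses a CIDR block covers."""
    return ipaddress.ip_network(cidr).num_addresses


# 10.0.0.0/21 is a hypothetical block used only for illustration
for block in ("0.0.0.0/0", "10.0.0.0/21", "88.23.101.192/32"):
    prefix = int(block.split("/")[1])
    # The size reported by ipaddress matches 2 ** (32 - prefix)
    print(block, cidr_size(block), 2 ** (32 - prefix))
```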

The newer form of IP address, IPv6, looks like `FE80:CD00:0000:0CDE:1257:0000:211E:729C` and is not yet as widely used as IPv4. IPv6 can represent about 340 trillion trillion trillion different addresses. It will become more common because, believe it or not, we are running out of IPv4 addresses.
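
To make the size of the IPv6 address space concrete, the following sketch parses the example address with the standard `ipaddress` module and counts the addresses in `::/0`, the IPv6 equivalent of `0.0.0.0/0`:

```python
import ipaddress

# The example IPv6 address from the text
addr = ipaddress.ip_address("FE80:CD00:0000:0CDE:1257:0000:211E:729C")

# ::/0 matches every IPv6 address, just as 0.0.0.0/0 matches every IPv4 address
all_ipv6 = ipaddress.ip_network("::/0")

print(addr.version)                       # 6
print(all_ipv6.num_addresses == 2**128)   # True
print(f"{all_ipv6.num_addresses:.2e}")    # 3.40e+38
```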

<img src="./images/security_group_init.png">

After you have finished entering the details, scroll down to the bottom of the page and select `Create security group`. The security group should now be created.

<img src="./images/security_group_created.png?modified=32423">

We don't need to configure the security group's rules yet; we can do that after it is attached to the RDS database. We can now continue on to create the database.

### Creating a PostgreSQL database

Go to the [AWS Console](https://console.aws.amazon.com/console/home) and select the Services tab. Next, click on the RDS tab, followed by `Create database`.

![](images/Create_RDS.png)

In the next window, select PostgreSQL as the type of database and choose the latest version of PostgreSQL, which is the one with the highest version number. In this case, the latest version, 13.7-R1, has been selected. Make sure to choose the free tier, unless you are willing to pay.



![](images/Create_RDS2.png)

Subsequently, provide an identifier for your instance:

![](images/Create_RDS3.png)

In the DB instance class, choose `db.t3.micro`, which is eligible for the free tier. In the connectivity options, select publicly accessible, and in the `Existing VPC Security Group` section, select the security group you previously created.

![](images/Create_RDS4.png)

Click on Create, and wait for the process to be completed. Note that this might take a while. Once completed, the status should change to `available`. Now, click on the instance ID, and you should see the details of the instance. Take note of the Endpoint, which is the hostname used to connect to the database. This is unique to your RDS database and will differ from the one in the image.

![](images/Create_RDS5.png)
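
The same console steps can also be scripted. The following is a minimal sketch using `boto3`, not the method this guide relies on. It assumes `boto3` is installed and AWS credentials are configured; the identifier, password, security group ID, region and the 20 GiB storage size are hypothetical placeholders.

```python
def db_instance_request(identifier: str, password: str, security_group_id: str) -> dict:
    """Build keyword arguments for the RDS `create_db_instance` call."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",
        "DBInstanceClass": "db.t3.micro",
        "AllocatedStorage": 20,  # GiB; assumed value for illustration
        "MasterUsername": "postgres",
        "MasterUserPassword": password,
        "PubliclyAccessible": True,
        "VpcSecurityGroupIds": [security_group_id],
    }


def create_and_get_endpoint(identifier, password, security_group_id, region="us-east-1"):
    """Create the instance, wait until it is available, and return its endpoint hostname."""
    import boto3  # imported here so the helper above can be used without boto3

    rds = boto3.client("rds", region_name=region)
    rds.create_db_instance(**db_instance_request(identifier, password, security_group_id))
    # Block until the instance status becomes `available`
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=identifier)
    instance = rds.describe_db_instances(DBInstanceIdentifier=identifier)["DBInstances"][0]
    return instance["Endpoint"]["Address"]
```

Calling `create_and_get_endpoint("demo-db", "a-strong-password", "sg-0123456789abcdef0")` would create a billable resource, so only run it if you intend to.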

We will now finally configure access to the database from outside AWS. By default, our security group only allows outbound traffic. A security group has two types of rules for controlling traffic to and from the resources inside it:

- __Inbound rules__ - Controls which IP addresses and protocols can connect to resources inside the security group from outside the security group.
- __Outbound rules__ - Controls which IP addresses and protocols resources inside the security group can connect to outside the security group. 

From the `Connectivity & security` tab select the security group you created in the security group section.

<img src="./images/open_security_group.png">



Make sure the `Inbound rules` tab is selected and click the `Edit inbound rules` button to begin configuring the inbound security rules. 

<img src="./images/edit_inbound.png">

Next, click the `Add rule` button to begin adding a new rule. Set the `Type` to `PostgreSQL`, since we want to allow connections to our PostgreSQL database.

For the `Source` we have the following options:

- __Custom__ - Define a custom IP address or range which will be allowed to connect.
- __Anywhere-IPv4__ - Allow all IPv4 addresses to connect to the database.
- __Anywhere-IPv6__ - Allow all IPv6 addresses to connect to the database.
- __My IP__ - Allow the IP address of the current machine to connect to the database. AWS will automatically retrieve the IP address for you if you select this option.
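
Each of these options boils down to a CIDR block that incoming addresses are matched against. The sketch below models that matching with the standard `ipaddress` module; it illustrates the idea and is not AWS's implementation. The helper names and the sample addresses are hypothetical.

```python
import ipaddress


def source_to_cidr(option: str, my_ip: str = "88.23.101.192") -> str:
    """Map a console `Source` option to the CIDR block it stands for."""
    sources = {
        "Anywhere-IPv4": "0.0.0.0/0",
        "Anywhere-IPv6": "::/0",
        "My IP": f"{my_ip}/32",  # a single address
    }
    return sources[option]


def is_allowed(client_ip: str, cidr: str) -> bool:
    """Return True if the client address falls inside the rule's CIDR block."""
    network = ipaddress.ip_network(cidr)
    address = ipaddress.ip_address(client_ip)
    # An IPv4 address never matches an IPv6 block, and vice versa
    return address.version == network.version and address in network


print(is_allowed("203.0.113.7", source_to_cidr("Anywhere-IPv4")))  # True
print(is_allowed("88.23.101.193", source_to_cidr("My IP")))        # False
```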

Normally, it is a good idea to allow only your own IP address, so that the database is reachable only from your machine. This does cause some issues: whenever your internet connection resets, your internet service provider (ISP) may assign you a new IP address, invalidating the security rule. If you select this option, be aware that you may lose access to the database any time your connection drops.

In industry, there are tight restrictions on which IP addresses can connect to the database and who it is available to. Since this is just a practice database that contains no sensitive information, we can allow all IP addresses to connect with the `Anywhere-IPv4` option. Just remember that this is not best practice, but it saves time resetting the security rule whenever your IP address changes.
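
If your IP address changes often, re-adding the rule from a script can be quicker than using the console. The following is a hedged sketch using `boto3`'s EC2 client. It assumes `boto3` is installed and credentials are configured; the security group ID and region are hypothetical placeholders.

```python
def postgres_ingress_rule(cidr: str) -> dict:
    """Describe an inbound rule that allows PostgreSQL traffic (TCP 5432) from `cidr`."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "IpRanges": [{"CidrIp": cidr, "Description": "PostgreSQL access"}],
    }


def allow_postgres(group_id: str, cidr: str, region: str = "us-east-1") -> None:
    """Add the inbound rule to an existing security group."""
    import boto3  # imported here so the helper above can be used without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[postgres_ingress_rule(cidr)],
    )
```

For example, `allow_postgres("sg-0123456789abcdef0", "88.23.101.192/32")` would allow a single address, while passing `"0.0.0.0/0"` corresponds to the `Anywhere-IPv4` option.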

Configure the security rule as follows and save the rule with the `Save rules` button.

<img src="./images/config_inbound_rule.png">
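
Once the rule is saved, you can check that the database port is reachable before involving any database driver. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and a `False` result may also mean that the instance is still starting or that the endpoint is wrong.

```python
import socket


def port_is_open(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures
        return False
```

Calling `port_is_open("<your-endpoint>")` with the Endpoint noted earlier should return `True` if the inbound rule admits your current IP address.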

Once all the steps above have been completed, you should be ready to connect to the database. The default user and database are `postgres` and `postgres`, respectively.

In [None]:
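from sqlalchemy import create_engine

# Connection details for the RDS instance
DATABASE_TYPE = 'postgresql'
DBAPI = 'psycopg2' # PostgreSQL driver; it must be installed alongside SQLAlchemy
ENDPOINT = 'aicoredb.c8k7he1p0ynz.us-east-1.rds.amazonaws.com' # Change it to your AWS endpoint
USER = 'postgres'
PASSWORD = 'Cosamona94' # Change it to the master password you set when creating the database
PORT = 5432 # Default PostgreSQL port
DATABASE = 'postgres'

# Build the connection URL and create the engine; no connection is opened yet
engine = create_engine(f"{DATABASE_TYPE}+{DBAPI}://{USER}:{PASSWORD}@{ENDPOINT}:{PORT}/{DATABASE}")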
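
Hard-coding the password in a notebook is convenient for a demo but easy to leak, and a password containing characters such as `@` or `/` would break the connection URL. As a hedged alternative, the sketch below reads the password from an environment variable and escapes it with `urllib.parse.quote_plus`. The variable name `RDS_PASSWORD`, the fallback password and the endpoint are hypothetical.

```python
import os
from urllib.parse import quote_plus


def build_database_url(user, password, endpoint, port=5432, database="postgres"):
    """Build a SQLAlchemy URL, escaping special characters in the credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{endpoint}:{port}/{database}"
    )


# RDS_PASSWORD is a hypothetical variable name; the fallback exists only for the demo
password = os.environ.get("RDS_PASSWORD", "p@ss/word")
url = build_database_url("postgres", password, "example.rds.amazonaws.com")
print(url)
```

The resulting string can be passed to `create_engine` in place of the f-string above.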



Run the following cell to confirm that everything works:

In [None]:
engine.connect()

<sqlalchemy.engine.base.Connection at 0x7fa9766ee7f0>

Now, we create a new table in our database by inserting the iris dataset. The iris dataset contains measurements of 150 flowers from three species of iris. It is a well-known dataset that is used in many machine-learning applications.

In [None]:
!pip install scikit-learn

In [None]:
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
iris = pd.DataFrame(data['data'], columns=data['feature_names'])
iris.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [None]:
iris.to_sql('iris_dataset', engine, if_exists='replace')

The AWS RDS console does not provide a way to view the tables you created; however, you can still access them using pgAdmin or SQLAlchemy.

In [None]:
df = pd.read_sql_table('iris_dataset', engine)

In [None]:
df.head()

|   | index | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|---|
| 0 | 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 4 | 5.0 | 3.6 | 1.4 | 0.2 |
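
The extra `index` column appears because `to_sql` writes the DataFrame index as a column by default. The sketch below reproduces this with an in-memory SQLite database, used here as a stand-in so that no RDS connection is needed, and shows that passing `index=False` omits the column. The table names and the two-row sample are hypothetical.

```python
import sqlite3

import pandas as pd

# Two sample rows standing in for the iris DataFrame
frame = pd.DataFrame({"sepal length (cm)": [5.1, 4.9], "sepal width (cm)": [3.5, 3.0]})

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the RDS database
frame.to_sql("with_index", conn)                  # default: index is written as a column
frame.to_sql("without_index", conn, index=False)  # index is left out

with_index = pd.read_sql_query("SELECT * FROM with_index", conn)
without_index = pd.read_sql_query("SELECT * FROM without_index", conn)
conn.close()

print(list(with_index.columns))     # ['index', 'sepal length (cm)', 'sepal width (cm)']
print(list(without_index.columns))  # ['sepal length (cm)', 'sepal width (cm)']
```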


If you open pgAdmin, you can also see the created table. First, establish a connection to the database:

![](images/pgAdmin4.png)

As can be observed, the `iris_dataset` table is contained in the `postgres` database.

![](images/pgAdmin4_2.png)

## Conclusion
At this point, you should have a good understanding of

- the basics of cloud computing.
- how to create security groups that limit the range of IP addresses that can access the service.
- how to create an RDS instance.
- how to connect to the RDS instance using SQLAlchemy.
- how to create a table in the database.
- how to connect to the database using pgAdmin.