# Access MySQL with R

This notebook shows how to access MySQL using R by following the steps below:
1. Load the RMySQL library
1. Identify and enter the database connection credentials
1. Create the database connection
1. Create a table and insert sample data
1. Query the data
1. Close the database connection



## What is MySQL?

MySQL is a widely used, open-source relational database management system (RDBMS) that follows a client-server model.

## Why MySQL?

When you work with large datasets (for example, 50 GB) that may exceed your machine's memory (RAM), it helps to keep the data in a database such as MySQL and query it in smaller, digestible chunks (for instance, 2 GB at a time). This way, only the chunk you are working on is loaded into memory, leaving resources free for computation.
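
The chunked approach described above can be sketched with DBI generics that RMySQL implements: send the query once, then call `dbFetch()` with a row limit until `dbHasCompleted()` reports that every row has been read. This is a minimal sketch, not the notebook's own method; the function name `process_in_chunks`, the `chunk_size` default, and the `process_chunk` callback are hypothetical, and a row count is only a rough proxy for how many gigabytes a chunk occupies.

```r
# Hypothetical helper: stream a query result in fixed-size chunks so the
# full result never has to fit in memory at once.
process_in_chunks <- function(conn, query, process_chunk, chunk_size = 100000) {
  rs <- DBI::dbSendQuery(conn, query)
  # Always release the result set, even if process_chunk() throws an error
  on.exit(DBI::dbClearResult(rs), add = TRUE)
  n_chunks <- 0
  while (!DBI::dbHasCompleted(rs)) {
    chunk <- DBI::dbFetch(rs, n = chunk_size)  # at most chunk_size rows
    if (nrow(chunk) == 0) break                # nothing left to read
    process_chunk(chunk)
    n_chunks <- n_chunks + 1
  }
  invisible(n_chunks)  # number of chunks processed
}
```

Once the __Cars__ table created later in this notebook exists, `process_in_chunks(conn, "SELECT * FROM Cars", function(chunk) print(summary(chunk$Price)), chunk_size = 3)` would summarize the prices three rows at a time.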


## Import the RMySQL library

__RMySQL__ is the R package that allows you to talk to MySQL (and MariaDB) databases. This package is pre-installed in your Workbench; in other environments, install it with `install.packages("RMySQL")`.


In [None]:
library(RMySQL)

## Identify the database connection credentials

Connecting to a MySQL database requires the following information:
* Database name 
* Host DNS name or IP address 
* Host port
* User ID
* User Password

You will pass all of this information to `dbConnect()` when you create the connection.

__Notice:__ To obtain credentials for a MySQL instance on Amazon RDS, follow this [user guide](http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.MySQL.html).

The following code stores these values in variables prefixed with `dsn_`.

In [None]:
# Enter the values for your database connection
dsn_database = "<your database name>"   # e.g. "BLUDB"
dsn_hostname = "<your hostname>"        # e.g. "mydbinstance.cz6pjylrdjko.us-east-1.rds.amazonaws.com"
dsn_port = 3306                         # MySQL's default port; replace with yours, as a number without quotation marks
dsn_uid = "<your username>"             # e.g. "user1"
dsn_pwd = "<your password>"             # e.g. "7dBZ3jWt9xN6$o0JiX!m"
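
Hard-coding a password in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you have set environment variables beforehand, is to read the credentials with base R's `Sys.getenv()`. The variable names `MYSQL_DBNAME`, `MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER` and `MYSQL_PASSWORD` and the helper `read_mysql_credentials()` are hypothetical; choose whatever names suit your setup.

```r
# Hypothetical helper: read MySQL credentials from environment variables
# instead of typing them into the notebook.
read_mysql_credentials <- function() {
  vars <- c(dbname = "MYSQL_DBNAME", host = "MYSQL_HOST", port = "MYSQL_PORT",
            username = "MYSQL_USER", password = "MYSQL_PASSWORD")
  values <- Sys.getenv(vars)          # "" for any variable that is not set
  unset <- vars[values == ""]
  if (length(unset) > 0) {
    stop("Missing environment variables: ", paste(unset, collapse = ", "))
  }
  creds <- as.list(values)
  names(creds) <- names(vars)         # dbname, host, port, username, password
  creds$port <- as.integer(creds$port)  # the port must be a number, not a string
  creds
}
```

You could then call `creds <- read_mysql_credentials()` and pass `creds$dbname`, `creds$host`, `creds$port`, `creds$username` and `creds$password` to `dbConnect()` in place of the `dsn_` variables.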

## Create the database connection

The following code snippet creates a connection object, `conn`.

In [None]:
# Pass every credential, including the port, to dbConnect()
conn = dbConnect(MySQL(), username = dsn_uid, password = dsn_pwd, dbname = dsn_database, host = dsn_hostname, port = dsn_port)
conn
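
Printing `conn` shows the connection object but does not exercise it. As an optional sanity check, sketched here with a hypothetical `check_connection()` helper, you can ask the server for something concrete: its version via the MySQL function `VERSION()` and its list of tables via the DBI generic `dbListTables()`.

```r
# Hypothetical helper: confirm the connection works by running two cheap calls
check_connection <- function(conn) {
  # A trivial query that fails loudly if the connection is not usable
  version <- DBI::dbGetQuery(conn, "SELECT VERSION() AS version")$version
  tables <- DBI::dbListTables(conn)
  message("Connected to MySQL server version ", version,
          "; tables found: ", length(tables))
  invisible(tables)
}
```

Calling `check_connection(conn)` right after `dbConnect()` surfaces credential or network problems before any table work begins.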

## Create a table
Next, create a test table named __Cars__. The code below drops the __Cars__ table if it already exists and then creates it.

In [None]:
# dbExecute() runs a statement that returns no rows and clears its result set
dbExecute(conn, 'DROP TABLE IF EXISTS Cars')
dbExecute(conn, 'CREATE TABLE Cars(Id INTEGER PRIMARY KEY, Name VARCHAR(20), Price INT)')

In [None]:
# Insert one row per statement; each call returns the number of rows affected
dbExecute(conn, "INSERT INTO Cars VALUES(1,'Audi',52642)")
dbExecute(conn, "INSERT INTO Cars VALUES(2,'Mercedes',57127)")
dbExecute(conn, "INSERT INTO Cars VALUES(3,'Skoda',9000)")
dbExecute(conn, "INSERT INTO Cars VALUES(4,'Volvo',29000)")
dbExecute(conn, "INSERT INTO Cars VALUES(5,'Bentley',350000)")
dbExecute(conn, "INSERT INTO Cars VALUES(6,'Citroen',21000)")
dbExecute(conn, "INSERT INTO Cars VALUES(7,'Hummer',41400)")
dbExecute(conn, "INSERT INTO Cars VALUES(8,'Volkswagen',21600)")
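
One `INSERT` per row is fine for eight rows. For larger data already held in an R data frame, the DBI generic `dbWriteTable()`, which RMySQL implements, can create and fill a table in one call. This is a minimal sketch assuming the same `conn`; `cars_df` and `write_cars()` are hypothetical names, and `overwrite = TRUE` replaces any existing __Cars__ table, much like the `DROP TABLE` above. Two caveats: `dbWriteTable()` infers the column types itself, so the result may differ from the explicit `CREATE TABLE` (for example, no primary key), and depending on the RMySQL version it may load the data through `LOAD DATA LOCAL INFILE`, which fails on servers where `local_infile` is disabled.

```r
# The same eight rows as the INSERT statements above, as a data frame
cars_df <- data.frame(
  Id    = 1:8,
  Name  = c("Audi", "Mercedes", "Skoda", "Volvo",
            "Bentley", "Citroen", "Hummer", "Volkswagen"),
  Price = c(52642L, 57127L, 9000L, 29000L, 350000L, 21000L, 41400L, 21600L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: write the data frame to MySQL, replacing any existing
# Cars table; row.names = FALSE avoids an extra row_names column
write_cars <- function(conn, df = cars_df) {
  DBI::dbWriteTable(conn, "Cars", df, overwrite = TRUE, row.names = FALSE)
}
```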

## Query the Data
You can now use the connection object `conn` to query the database.

In [None]:
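query = "SELECT * FROM Cars"
rs = dbSendQuery(conn, query)
df = dbFetch(rs, n = -1)   # n = -1 fetches all remaining rows (replaces the deprecated fetch())
dbClearResult(rs)          # free the result set on the server
df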
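
Beyond `SELECT *`, you will usually let MySQL do the filtering so that only the rows you need reach R. When part of the query comes from an R variable, the DBI function `sqlInterpolate()` quotes the value for you instead of pasting it into the SQL string by hand. This is a minimal sketch; `cars_above()` is a hypothetical helper, and `dbGetQuery()` sends the query, fetches all rows, and clears the result in a single call.

```r
# Hypothetical helper: return the cars priced above min_price, most expensive first
cars_above <- function(conn, min_price) {
  # ?min_price is a named placeholder that sqlInterpolate() fills in with a quoted value
  sql <- DBI::sqlInterpolate(
    conn,
    "SELECT Name, Price FROM Cars WHERE Price > ?min_price ORDER BY Price DESC",
    min_price = min_price
  )
  DBI::dbGetQuery(conn, sql)
}
```

With the rows inserted above, `cars_above(conn, 30000)` should return four rows: Bentley, Mercedes, Audi and Hummer.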

## Close the Connection
It is good practice to close your database connection after work is done.

In [None]:
dbDisconnect(conn)
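
If an error occurs between connecting and disconnecting, a notebook cell can leave the connection open. A common R idiom, sketched here with a hypothetical `with_mysql()` helper, registers `dbDisconnect()` with `on.exit()` so that the connection is closed however the function exits. Any named connection arguments are passed straight through to `dbConnect()`.

```r
# Hypothetical helper: open a connection, run fn(conn), and always disconnect
with_mysql <- function(fn, ...) {
  conn <- DBI::dbConnect(RMySQL::MySQL(), ...)
  # Runs when with_mysql() returns or fails, so the connection never leaks
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  fn(conn)
}
```

For example, `with_mysql(function(conn) DBI::dbListTables(conn), dbname = dsn_database, host = dsn_hostname, port = dsn_port, username = dsn_uid, password = dsn_pwd)` lists the tables and then disconnects.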

## Want to learn more?

### Free courses on [Big Data University](https://bigdatauniversity.com/courses/?utm_source=tutorial-mysql-r&utm_medium=dswb&utm_campaign=bdu):
<a href="https://bigdatauniversity.com/courses/?utm_source=tutorial-mysql-r&utm_medium=dswb&utm_campaign=bdu"><img src = "https://ibm.box.com/shared/static/xomeu7dacwufkoawbg3owc8wzuezltn6.png" width=600px> </a>


<h3>Authors:</h3>
<article class="teacher">
<div class="teacher-image" style="    float: left;
    width: 115px;
    height: 115px;
    margin-right: 10px;
    margin-bottom: 10px;
    border: 1px solid #CCC;
    padding: 3px;
    border-radius: 3px;
    text-align: center;"><img class="alignnone wp-image-2258 " src="https://ibm.box.com/shared/static/tyd41rlrnmfrrk78jx521eb73fljwvv0.jpg" alt="Saeed Aghabozorgi" width="178" height="178" /></div>
<h4>Saeed Aghabozorgi</h4>
<p><a href="https://ca.linkedin.com/in/saeedaghabozorgi">Saeed Aghabozorgi</a>, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients’ ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods, such as machine learning and statistical modelling, on large datasets.</p>
</article>
<article class="teacher">
<div class="teacher-image" style="    float: left;
    width: 115px;
    height: 115px;
    margin-right: 10px;
    margin-bottom: 10px;
    border: 1px solid #CCC;
    padding: 3px;
    border-radius: 3px;
    text-align: center;"><img class="alignnone size-medium wp-image-2177" src="https://ibm.box.com/shared/static/2ygdi03ahcr97df2ofrr6cf8knq4kodd.jpg" alt="Polong Lin" width="300" height="300" /></div>
<h4>Polong Lin</h4>
<p>
<a href="https://ca.linkedin.com/in/polonglin">Polong Lin</a> is a Data Scientist at IBM in Canada. In the Emerging Technologies division, Polong is responsible for educating the next generation of data scientists through Big Data University. Polong is a regular speaker at conferences and meetups, and holds an M.Sc. in Cognitive Psychology.</p>
</article>

<hr>
Copyright &copy; 2016 [Big Data University](https://bigdatauniversity.com/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).