# DATA SCIENCE SESSIONS VOL. 3
### A Foundational Python Data Science Course
## Session 10 - part 1: Relational Databases and Pandas 

[&larr; Back to course webpage](https://datakolektiv.com/)

Feedback should be sent to [goran.milovanovic@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com). 

These notebooks accompany the DATA SCIENCE SESSIONS VOL. 3 :: A Foundational Python Data Science Course.

![](../img/IntroRDataScience_NonTech-1.jpg)

### Lecturers

[Goran S. Milovanović, PhD, DataKolektiv, Chief Scientist & Owner](https://www.linkedin.com/in/gmilovanovic/)

[Aleksandar Cvetković, PhD, DataKolektiv, Consultant](https://www.linkedin.com/in/alegzndr/)

[Ilija Lazarević, MA, DataKolektiv, Consultant](https://www.linkedin.com/in/ilijalazarevic/)

![](../img/DK_Logo_100.png)

***

### 0. What do we want to do today?

Our goal in Session 10 - part 1 is to prepare ourselves technically for what follows.

Below you will find sections on how to install MariaDB on Windows (0.1), macOS (0.2), and Ubuntu/Debian (0.3). Section 1 sets up the database and its tables.

Section 2 has the code that loads the data into the tables; run it.

#### 0.1 Installations for Windows


* Install MariaDB on Windows:
  - Go to the official MariaDB [site](https://mariadb.org/download/?t=mariadb&p=mariadb&r=10.11.2&os=windows&cpu=x86_64&pkg=msi&m=bme), and select **MariaDB Server 10.11.2** for **Windows**, architecture **x86_64**, and package type **MSI Package**.
  - Click **Download** and run the installer once the file is downloaded.
  - Before you start the installation, keep in mind that the setup procedure will ask you for a *password*. This is the administrator's (root) password. For consistency, set it to **datakolektiv111**. Obviously, this is for the purpose of the course. In real life, you should keep the password secret and hard to guess.
  - Follow the installation steps from [here](https://www.mariadbtutorial.com/getting-started/install-mariadb/). If your installer does not show *Step 6*, simply skip it.
  - Once you are finished, you will have the **MariaDB** relational database server installed on your host.


#### 0.2 Installations for macOS:
- Install the [Homebrew](https://brew.sh/) package manager if it is not already installed:
  - go to [Homebrew](https://brew.sh/) and copy the command under the **Install Homebrew** section;
  - then paste it into your terminal and follow the instructions.
- When Homebrew is installed, run in your terminal: `brew install mariadb`.
- When the installation completes, start MariaDB Server: `mysql.server start`.
- Test: in the terminal, run `mysql`, then `show databases;` in the MariaDB prompt, then `exit` to leave MariaDB.
- You now have the **MariaDB** relational database server installed on your host.


#### 0.3 Installations for Ubuntu/Debian:
- Install MariaDB on Ubuntu/Debian:
  - Open the terminal.
  - Run `sudo apt-get update` and then `sudo apt-get install mariadb-server`. This will install MariaDB on your host.
  - Run `sudo mysql_secure_installation`. This should ask you a couple of questions regarding the initial setup of the database server; answer each of them as described below.
    - On question **Change the root password? [Y/n]** answer with **Y**. Type in **datakolektiv111** as the password. Obviously, this is for the purpose of the course. In real life, you should keep the password secret and hard to guess.
    - On question **Remove anonymous users? [Y/n]** answer with **Y**.
    - On question **Disallow root login remotely? [Y/n]** answer with **Y**.
    - On question **Remove test database and access to it? [Y/n]** answer with **Y**.
    - On question **Reload privilege tables now? [Y/n]** answer with **Y**.
  - Once you are finished, you will have the **MariaDB** relational database server installed on your host.
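
Whichever installation route you took, one quick sanity check is to see whether the command-line client can be found on your `PATH`. The sketch below is only an illustration and not part of the course material: the helper name `find_client` is made up, and it assumes the client binary is called either `mariadb` or `mysql` (which of the two is present depends on the MariaDB version and platform). On Windows the client may only be visible from the MariaDB command prompt, so a `None` result there does not necessarily mean the installation failed.

```python
import shutil


def find_client(candidates=("mariadb", "mysql")):
    """Return the full path of the first client binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


client = find_client()
if client is None:
    print("No MariaDB/MySQL client found on PATH.")
else:
    print(f"Client found at: {client}")
```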



### 1 Setting up the database and tables

Once you have your relational database management system (RDBMS) up and running, you will need to create a database and a user, grant the user permissions on the database, and create the tables.


Again, this will be slightly different for each operating system, so we will describe steps for each.



#### 1.1 Windows setup:
To prepare the database, go through the following steps:
- Click on the *Windows* button and type 'maria'. You should be presented with a couple of options related to the newly installed MariaDB.
- Run **Command Prompt (MariaDB 10.11 (x64))**.
- Once you have your command prompt open, navigate to this session's `_data` folder. Remember, the `cd` command switches directories, and `cd ..` moves to the parent directory.
- Now that you are in the `session10/_data` folder, run `dir` and make sure you have 4 files with *sql* extension.
- The following `.sql` files should be found there: `create_database.sql`, `create_tables.sql`, `drop_database.sql`, `drop_tables.sql`.
- We will first "set up" the database, user, and user privileges by running the following command: `mysql --user=root --password=datakolektiv111 < create_database.sql`.
- After we have our database and user in place, run the command: `mysql --user=datakolektiv --password=datakolektiv123 < create_tables.sql`.
- Now you are ready to go straight to Section 2.

#### 1.2 macOS setup:
- In your terminal, navigate to the `session10/_data` directory in the directory where the course repository is found.
- Now that you are in the `session10/_data` folder, run `ls` and make sure you have 4 files with *sql* extension.
- The following `.sql` files should be found there: `create_database.sql`, `create_tables.sql`, `drop_database.sql`, `drop_tables.sql`.
- We will first "set up" the database, user, and user privileges by running the command: `sudo mysql --user=root < create_database.sql`.
- After we have our database and user in place, run the command: `mysql --user=datakolektiv --password=datakolektiv123 < create_tables.sql`.
- Now you are ready to go straight to Section 2.

#### 1.3 Ubuntu/Debian setup:
To prepare the database, go through the following steps:
- Open the terminal.
- Once you have your terminal open, navigate to this session's `_data` folder. Remember, the `cd` command switches directories, and `cd ..` moves to the parent directory.
- Now that you are in the `session10/_data` folder, run `ls` and make sure you have 4 files with *sql* extension.
- The following `.sql` files should be found there: `create_database.sql`, `create_tables.sql`, `drop_database.sql`, `drop_tables.sql`.
- We will first "set up" the database, user, and user privileges by running the command: `sudo mysql --user=root < create_database.sql`.
- After we have our database and user in place, run the command: `mysql --user=datakolektiv --password=datakolektiv123 < create_tables.sql`.
- Now you are ready to go straight to Section 2.
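
The setup steps above follow the same pattern on every operating system: confirm that the four `.sql` files are present, then feed two of them to the `mysql` client. As a rough sketch of that pattern in Python (the helper names `missing_sql_files` and `mysql_command` are hypothetical, introduced only for illustration), the code below checks the folder and builds the argument lists that correspond to the commands shown above. It does not run anything against the database.

```python
from pathlib import Path

# The four scripts the steps above expect to find in session10/_data
EXPECTED_SQL_FILES = (
    "create_database.sql",
    "create_tables.sql",
    "drop_database.sql",
    "drop_tables.sql",
)


def missing_sql_files(data_dir):
    """Return the expected .sql files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_SQL_FILES if not (data_dir / name).is_file()]


def mysql_command(user, password=None, use_sudo=False):
    """Build the argument list for the mysql client; the script is fed via stdin."""
    cmd = ["sudo", "mysql"] if use_sudo else ["mysql"]
    cmd.append(f"--user={user}")
    if password is not None:
        cmd.append(f"--password={password}")
    return cmd


# An empty list means all four scripts were found (assumes the current
# directory is the session10 folder)
print(missing_sql_files("_data"))

# macOS / Ubuntu style: root via sudo, no password on the command line
print(mysql_command("root", use_sudo=True))

# The course user, same on all three systems
print(mysql_command("datakolektiv", "datakolektiv123"))
```

To actually execute a script from Python, such a list could be passed to `subprocess.run(cmd, stdin=f, check=True)`, where `f` is an open handle on the `.sql` file; that plays the role of the `<` redirection in the shell commands.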


### 2 Insert the data in tables

Now that we have the database, users, and tables in place, we want to insert our data into the tables.

We could do this with command-line tools, but let's do it the Pandas way instead.

Before we are able to use Pandas, we must install additional libraries in our Python virtual environment. To proceed with installing packages in a virtual environment, you have to activate it. By now, you should be able to do that by yourself.

After activating it, you will run:

`pip install mysql-connector==2.2.9 SQLAlchemy==2.0.7` 

This will install the necessary libraries so we can run the next cells for populating our tables with data.
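
Since version mismatches between pandas and SQLAlchemy are a common source of errors when writing to a database, it can help to confirm what is actually installed in the active environment before going further. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration, and the package names are the ones from the `pip install` line above plus `pandas`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["pandas", "SQLAlchemy", "mysql-connector"])
for name, ver in report.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If a package shows up as not installed, the most likely explanation is that the notebook kernel is not running in the virtual environment where `pip install` was executed.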

In [8]:
import os
import pandas as pd

from sqlalchemy import create_engine
from sqlalchemy import text as sql_text

In [9]:
engine = create_engine('mysql+mysqlconnector://datakolektiv:datakolektiv123@localhost/nycflights') # use this as is
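
The string passed to `create_engine` follows the pattern `dialect+driver://user:password@host/database`. The course credentials contain no special characters, so the literal string works as is. As a hedged sketch of how the same URL could be assembled from its parts (the helper `build_db_url` is hypothetical, not a SQLAlchemy function), the code below percent-encodes the user name and password with `urllib.parse.quote_plus`, which matters only when they contain characters such as `@` or `/`.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, database, driver="mysql+mysqlconnector"):
    """Assemble a SQLAlchemy-style URL, percent-encoding the credentials."""
    return f"{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


# Same values as in the cell above
url = build_db_url("datakolektiv", "datakolektiv123", "localhost", "nycflights")
print(url)

# A made-up password with special characters, to show the escaping
print(build_db_url("datakolektiv", "p@ss/word", "localhost", "nycflights"))
```

The resulting string can be handed to `create_engine(url)` in place of the literal.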


In [10]:
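for file in os.listdir('_data'):
    # split e.g. 'airlines.csv' into ('airlines', '.csv') and skip non-CSV files
    file_name, extension = os.path.splitext(file)
    if extension != '.csv':
        continue
    print(f'Inserting data from {file}')

    # use the first CSV column as the DataFrame index;
    # with index=False below, it is not written to the table
    df = pd.read_csv(f'_data/{file}', index_col=0)

    # append the rows to the table named after the file, 10,000 rows per batch
    df.to_sql(file_name,
              con=engine, 
              index=False, 
              if_exists='append',
              chunksize=10_000)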

Inserting data from airlines.csv

AttributeError: 'Engine' object has no attribute 'cursor'

In [11]:
# populate indexed flights table
df = pd.read_csv('_data/flights.csv', index_col=0)
df.to_sql('flights_i', con=engine, index=False, if_exists='append', chunksize=10_000)

FileNotFoundError: [Errno 2] No such file or directory: '_data/flights.csv'

In [None]:
# populate indexed airports table
df = pd.read_csv('_data/airports.csv', index_col=0)
df.to_sql('airports_i', con=engine, index=False, if_exists='append', chunksize=10_000)

Inserting the `flights.csv` data set takes a while; it has over 300k rows, after all.
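
The `chunksize=10_000` argument used in the cells above asks `to_sql` to write the rows in batches of that size rather than all at once. To make the idea concrete, here is a small standard-library sketch of batching; it illustrates the concept only and is not how pandas implements it internally. The helper `chunked` and the tiny row count are stand-ins chosen for the example.

```python
import math
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


n_rows = 25        # stand-in row count, kept small on purpose
chunk_size = 10

batches = list(chunked(range(n_rows), chunk_size))
print([len(batch) for batch in batches])   # [10, 10, 5]
print(math.ceil(n_rows / chunk_size))      # 3
```

By the same arithmetic, a table of 300,000 rows written with `chunksize=10_000` would go to the database in 30 batches.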

In [None]:
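# sql_text() wraps the raw SQL string in a SQLAlchemy text construct; the query runs on a new connection
pd.read_sql_query(sql=sql_text("SELECT * FROM flights"), con=engine.connect())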

You should see the rows of the `flights` DataFrame, similar to what we had in session07.

Now you are ready for Session 10 - part 2.

<hr>

DataKolektiv, 2022/23.

[hello@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com)

![](../img/DK_Logo_100.png)

<font size=1>License: [GPLv3](https://www.gnu.org/licenses/gpl-3.0.txt) This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.</font>