# Master SQL Fundamentals Effortlessly as a Pandas User
## There was a time it was the other way around
![](images/unsplash.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://unsplash.com/@itssammoqadam?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Sam Moqadam</a>
        on 
        <a href='https://unsplash.com/?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Unsplash</a>
    </strong>
</figcaption>

### Motivation

When the Pandas package became publicly available in 2009, SQL had been dominating the data world since 1974. Pandas came with an attractive set of features, such as built-in visualization and flexible data handling, and became the go-to data exploration tool. As its popularity grew, many courses and resources emerged that taught Pandas and compared it to SQL.

Fast-forward to 2021: people are now introduced to the Pandas package *first* rather than to the universal data language, SQL. Even though SQL is as popular as ever, the flexibility and versatility of Pandas make it the first choice for beginner data scientists.

Then, why do you need SQL if you know Pandas?

Even though Pandas may seem like the better choice, SQL still plays a crucial role in the day-to-day job of a data scientist. In fact, SQL is the second most in-demand and the third fastest-growing programming language for data science (see [here](https://towardsdatascience.com/the-most-in-demand-skills-for-data-scientists-in-2021-4b2a808f4005)). So adding SQL to your CV is a must if you want to get a job in the field. And if you already know Pandas, learning SQL should be a breeze, as you will see in this article.

### Connecting to a Database

Setting up an SQL workspace and connecting to a database can be a real pain in the neck. First, you need to install your favorite SQL flavor (PostgreSQL, MySQL, etc.) and download an SQL IDE as well. Doing all of that here would take us away from the purpose of the article, so we will use a shortcut.

Specifically, Pandas allows us to run SQL queries directly and store the results as a DataFrame. All we need is a connection to the relevant SQL database.

To illustrate how basic SQL statements work, we will be using the Chinook database, which can be downloaded [here](https://www.sqlitetutorial.net/sqlite-sample-database/). The database contains 11 tables.

To connect to this database and retrieve data stored in its tables, we will use SQLAlchemy and Pandas:

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite:///data/chinook.db")
```

SQLAlchemy creates an object that manages connections to the database. This object is called an engine and is created with the `create_engine` function, which takes a database URL. In the example, I am connecting to the downloaded Chinook database, which is stored in the *data* folder of the current directory. For a SQLite database, the URL always starts with the `sqlite:///` prefix, followed by the path to the file.
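To make the URL format concrete, here is a minimal sketch that points an engine at a throwaway SQLite file in a temporary directory. The file name `example.db` and the temporary folder are made up for illustration and are not part of the Chinook setup.

```python
import os
import tempfile

import sqlalchemy

# Hypothetical throwaway database file, used only for this illustration.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "example.db")

# "sqlite:///" is followed by the path to the file. A relative path such as
# "data/chinook.db" is resolved against the current working directory.
url = "sqlite:///" + db_path
engine = sqlalchemy.create_engine(url)

# The engine is lazy: nothing is opened until a connection is requested.
with engine.connect() as conn:
    result = conn.execute(sqlalchemy.text("SELECT 1")).scalar()

print(engine.dialect.name)  # sqlite
print(result)               # 1

# Release the pooled connection once we are done.
engine.dispose()
```

Running a trivial `SELECT 1` like this is a cheap way to confirm that the URL is well-formed before pointing the engine at a real database.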

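To see if the connection was successful, you can list the tables the engine can see. The `engine.table_names()` method used in older SQLAlchemy releases was deprecated in version 1.4 and removed in 2.0, so the inspector is the safer route:

```python
inspector = sqlalchemy.inspect(engine)
inspector.get_table_names()
```

```
['albums',
 'artists',
 'customers',
 'employees',
 'genres',
 'invoice_items',
 'invoices',
 'media_types',
 'playlist_track',
 'playlists',
 'sqlite_sequence',
 'sqlite_stat1',
 'tracks']
```

The two `sqlite_`-prefixed entries are internal SQLite tables rather than part of the Chinook data, and newer SQLAlchemy releases may omit them from this list.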
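With a working engine, the shortcut mentioned earlier (running SQL from Pandas and getting a DataFrame back) can be sketched as follows. So that the snippet runs on its own, it uses an in-memory SQLite database with a tiny made-up `artists` table instead of the real Chinook file; the rows and the query are assumptions for illustration only.

```python
import pandas as pd
import sqlalchemy

# In-memory SQLite database as a stand-in for chinook.db (assumption).
engine = sqlalchemy.create_engine("sqlite://")

query = sqlalchemy.text(
    "SELECT ArtistId, Name FROM artists WHERE ArtistId > 1 ORDER BY ArtistId"
)

with engine.begin() as conn:
    # Made-up table and rows so that there is something to query.
    conn.execute(sqlalchemy.text(
        "CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT)"
    ))
    conn.execute(sqlalchemy.text(
        "INSERT INTO artists (Name) VALUES ('Artist A'), ('Artist B'), ('Artist C')"
    ))

    # Pandas runs the query over the connection and returns a DataFrame.
    # Rough Pandas equivalent of the WHERE clause: artists[artists.ArtistId > 1]
    df = pd.read_sql(query, conn)

print(df)
```

Against the real database, the same `pd.read_sql` call would take a query on one of the Chinook tables and a connection from the `chinook.db` engine created earlier.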

> You can learn more about SQLAlchemy [here](https://docs.sqlalchemy.org/en/14/orm/tutorial.html).

If you wish to connect to a remote database, I have a separate article on how to do so:

https://towardsdev.com/storing-digital-files-in-remote-sql-databases-in-python-73494f09d39b
