# 1. Why SQL is Important to Learn

Welcome to the fundamentals of SQL and databases! In this course, we'll explore and analyze data in SQL through hands-on learning. Before we get started, let's quickly cover why SQL is essential for working with data.

Structured Query Language, or SQL, is more than forty years old, and it is one of the most popular technologies used by data professionals, including data analysts, data scientists, and data engineers. Understanding the fundamentals of a general-purpose language like Python or R is critical for working with data, but knowing SQL helps data professionals do more with their data. And if working with R or Python is one of your goals, SQL is often how you'll retrieve the data you want to analyze in those languages.

Here are a few key reasons why learning SQL will help anybody interested in working with data.

**SQL is everywhere.**

Almost all of the biggest names in tech use SQL, which is pronounced either “sequel” or “S.Q.L.” Companies like Facebook, Google, and Amazon have built their own high-performance database systems, but even their data teams use SQL to query data and perform data analysis. And it’s not just tech companies: companies big and small around the world use SQL.

**SQL enables us to pull data from many sources.**

In many real-world situations, data is distributed across many sources. SQL allows us to select specific data and transform it to fit our needs. For example, working with spreadsheets can be difficult if the data we need to answer our question is spread across many files. SQL allows us to structure our data in a way that makes it accessible from one place.

In a SQL database, data is structured into multiple, connected tables.


**SQL is here to stay.**

The Stack Overflow annual Developer Survey, which is the largest and most comprehensive survey of programmers around the world, consistently reveals that SQL is one of the most popular technologies used today.

Check out [this Dataquest blog post](https://www.dataquest.io/blog/why-sql-is-the-most-important-language-to-learn/) if you'd like to learn more about why it's important to learn SQL. And if you'd like to learn more about how to learn SQL online with Dataquest, check out [this blog post](https://www.dataquest.io/blog/learn-sql-online/) for some tips and tricks to learn SQL online.

So, let's learn about the language itself and how you can use it to query data.

# 2. Introduction to Databases

Before we get started, let's learn a little about databases. Don't worry: you'll start practicing in the next screen.

When we work with data stored on our computers, we load the data from files like spreadsheets (and text files in several different formats). Working with files solely on our computer is fine most of the time, but we run into problems when we consider a few questions:

* What if the data is too big to fit into a single spreadsheet file?
* What if you need to share the data with team members and keep it up to date?
* What if there's sensitive information in your data that needs protection?

Thankfully, these problems already have a solution: the database. Like a spreadsheet, a database organizes data into tables made up of rows and columns. In the example below, we see a table in a spreadsheet where each row represents a chess player and each column gives information about the player.

![](https://dq-content.s3.amazonaws.com/252/spreadsheet.png)

A database can store much more data more securely than a spreadsheet or a text file. Unlike simply opening a spreadsheet, we actually have to "ask" for data from the database.

We primarily interact with a database through a database management system (DBMS): a computer program that lets users store, retrieve, and manage data by sending it instructions, such as SQL queries.

We'll begin learning SQL with the DBMS SQLite. SQLite is a lightweight DBMS, and it is the most widely deployed database engine in the world. Move on to the next screen to start learning some SQL code!

# 3. Your First Query

In [3]:
%load_ext sql
%sql sqlite://


In [8]:
%sql sqlite:////home/mohammeds/datasets/jobs.db

In [38]:
%%sql

SELECT *
    FROM recent_grads
LIMIT 20;

Done.


index,Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed,Full_time,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
0,1,2419,PETROLEUM ENGINEERING,Engineering,2339,36,2057,282,0.120564344,1976,1849,270,1207,37,0.018380527,110000,95000,125000,1534,364,193
1,2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,7,679,77,0.1018518519999999,640,556,170,388,85,0.117241379,75000,55000,90000,350,257,50
2,3,2415,METALLURGICAL ENGINEERING,Engineering,856,3,725,131,0.153037383,648,558,133,340,16,0.024096386,73000,50000,105000,456,176,0
3,4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,1258,16,1123,135,0.107313196,758,1069,150,692,40,0.050125313,70000,43000,80000,529,102,0
4,5,2405,CHEMICAL ENGINEERING,Engineering,32260,289,21239,11021,0.341630502,25694,23170,5180,16697,1672,0.061097712,65000,50000,75000,18314,4440,972
5,6,2418,NUCLEAR ENGINEERING,Engineering,2573,17,2200,373,0.144966965,1857,2038,264,1449,400,0.177226407,65000,50000,102000,1142,657,244
6,7,6202,ACTUARIAL SCIENCE,Business,3777,51,832,960,0.535714286,2912,2924,296,2482,308,0.095652174,62000,53000,72000,1768,314,259
7,8,5001,ASTRONOMY AND ASTROPHYSICS,Physical Sciences,1792,10,2110,1667,0.4413555729999999,1526,1085,553,827,33,0.021167415,62000,31500,109000,972,500,220
8,9,2414,MECHANICAL ENGINEERING,Engineering,91227,1029,12953,2105,0.139792801,76442,71298,13101,54639,4650,0.0573422779999999,60000,48000,70000,52844,16384,3253
9,10,2408,ELECTRICAL ENGINEERING,Engineering,81527,631,8407,6548,0.437846874,61928,55450,12695,41413,3895,0.059173845,60000,45000,72000,45829,10874,3170


# 4. Understanding Your First Query

Congratulations on running your first SQL query in this course! The output shows the first rows of the recent_grads table, with all of its columns.

The process you used to display recent_grads breaks down into two steps:

* Write a SQL query that expresses the request "fetch all the data in the table."
* Ask the SQLite DBMS software to run the code and display the results.
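Outside the course environment, the same two steps can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, assuming an in-memory database that holds three rows (and only three columns) copied from the recent_grads output above, instead of the full jobs.db file.

```python
import sqlite3

# Setup for this sketch only: a tiny in-memory stand-in for recent_grads,
# using rows copied from the output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Major_category TEXT, Median INTEGER)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?)",
    [
        ("PETROLEUM ENGINEERING", "Engineering", 110000),
        ("ACTUARIAL SCIENCE", "Business", 62000),
        ("ASTRONOMY AND ASTROPHYSICS", "Physical Sciences", 62000),
    ],
)

# Step 1: write a query that expresses "fetch all the data in the table."
query = "SELECT * FROM recent_grads;"

# Step 2: ask the DBMS (SQLite) to run the query, then display the results.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```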

The query you ran is an example of computer code. Writing computer code is called programming.

SQL is one of many programming languages, and just like spoken languages, code in SQL has to follow a defined structure and vocabulary. To display recent_grads, you ran the following SQL query:


    SELECT *
      FROM recent_grads;

In the query above, we specified the following:

* The columns we wanted, using SELECT *. The symbol * selects all the columns.
* The table we wanted to query using FROM recent_grads.

The order of the words in this query and the spaces between SELECT, *, FROM, and recent_grads are crucial features of SQL syntax. If we don't follow the syntax, the database will probably return an error or not return the information we want.

The ; character signals the end of a query. It isn't mandatory when you run a single query, but it's needed to separate several queries that run together.
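To see what the semicolon does, here is a small sketch using Python's `sqlite3` module (an assumption: the course itself runs queries through a notebook, not through this module). A single query returns the same rows with or without the trailing `;`, while a script containing several statements relies on semicolons to mark where each one ends. The table and its two rows are illustrative values taken from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Several statements in one script: the semicolons separate them.
conn.executescript("""
    CREATE TABLE recent_grads (Major TEXT, Median INTEGER);
    INSERT INTO recent_grads VALUES ('PETROLEUM ENGINEERING', 110000);
    INSERT INTO recent_grads VALUES ('ACTUARIAL SCIENCE', 62000);
""")

# A single query runs the same way with or without the trailing semicolon.
with_semicolon = conn.execute("SELECT * FROM recent_grads;").fetchall()
without_semicolon = conn.execute("SELECT * FROM recent_grads").fetchall()
print(with_semicolon == without_semicolon)  # True
```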

Here's a visual breakdown of the different components of the query:

![](https://s3.amazonaws.com/dq-content/252/select_breakdown.svg)

You may have noticed that SELECT and FROM use uppercase letters. This isn't required, but it makes your code easier to read.

Since code is read more often than it is written, coders commonly follow certain conventions so that programs written by different people look alike, which reduces the effort it takes to read them.

A couple of other elements that aren't required are the line break and the indentation right before FROM. We changed lines and indented this query for the same reason as above: stylistic conventions.

You may also have noticed that some words are highlighted. These are reserved words: words that serve a particular purpose in SQL, so you can't (or shouldn't) use them for anything else.

In these courses, we'll be following this SQL Style Guide. We suggest that you explore it as you progress through the course.

Let's confirm that line breaks, capitalization, and indentation aren't crucial for the query to run.

In [39]:
%%sql

select *
    from recent_grads
limit 20;

Done.


index,Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed,Full_time,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
0,1,2419,PETROLEUM ENGINEERING,Engineering,2339,36,2057,282,0.120564344,1976,1849,270,1207,37,0.018380527,110000,95000,125000,1534,364,193
1,2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,7,679,77,0.1018518519999999,640,556,170,388,85,0.117241379,75000,55000,90000,350,257,50
2,3,2415,METALLURGICAL ENGINEERING,Engineering,856,3,725,131,0.153037383,648,558,133,340,16,0.024096386,73000,50000,105000,456,176,0
3,4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,1258,16,1123,135,0.107313196,758,1069,150,692,40,0.050125313,70000,43000,80000,529,102,0
4,5,2405,CHEMICAL ENGINEERING,Engineering,32260,289,21239,11021,0.341630502,25694,23170,5180,16697,1672,0.061097712,65000,50000,75000,18314,4440,972
5,6,2418,NUCLEAR ENGINEERING,Engineering,2573,17,2200,373,0.144966965,1857,2038,264,1449,400,0.177226407,65000,50000,102000,1142,657,244
6,7,6202,ACTUARIAL SCIENCE,Business,3777,51,832,960,0.535714286,2912,2924,296,2482,308,0.095652174,62000,53000,72000,1768,314,259
7,8,5001,ASTRONOMY AND ASTROPHYSICS,Physical Sciences,1792,10,2110,1667,0.4413555729999999,1526,1085,553,827,33,0.021167415,62000,31500,109000,972,500,220
8,9,2414,MECHANICAL ENGINEERING,Engineering,91227,1029,12953,2105,0.139792801,76442,71298,13101,54639,4650,0.0573422779999999,60000,48000,70000,52844,16384,3253
9,10,2408,ELECTRICAL ENGINEERING,Engineering,81527,631,8407,6548,0.437846874,61928,55450,12695,41413,3895,0.059173845,60000,45000,72000,45829,10874,3170
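The same comparison can be made programmatically. This sketch assumes Python's `sqlite3` module and a three-row in-memory copy of part of recent_grads; a query written with the usual conventions and a lowercase one-line version return exactly the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Median INTEGER)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?)",
    [
        ("PETROLEUM ENGINEERING", 110000),
        ("MINING AND MINERAL ENGINEERING", 75000),
        ("METALLURGICAL ENGINEERING", 73000),
    ],
)

# The query written following the style conventions...
styled = """
SELECT *
    FROM recent_grads
LIMIT 2;
"""
# ...and the same query in lowercase on a single line.
compact = "select * from recent_grads limit 2;"

# Keywords are case-insensitive and whitespace only separates words,
# so both versions return identical rows.
styled_rows = conn.execute(styled).fetchall()
compact_rows = conn.execute(compact).fetchall()
print(styled_rows == compact_rows)  # True
```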


# 5. Previewing a Table

You may have noticed that on the last screen, even though we wrote two queries, we only saw the result of the second one.

This isn't because the queries are the same. It's a quirk of the interface we use to run queries: only the result of the last query is displayed. If we want to see the results of multiple queries, we can run each query by itself.

You may also have noticed that this table has 173 rows and over 20 columns. For a computer, this isn't much information. For a human, however, it's difficult to make sense of this much data.

![](https://dq-content.s3.amazonaws.com/252/whole_table.png)

In the following lessons and in the next course, we'll learn how to make sense of large amounts of data using SQL. For now, we'll focus on how we can preview a table without displaying it completely.

In practice, you will often need to access a database without any documentation. In this situation, you'll have to rely on the surrounding context of the database and on your own exploration.

Some tables have millions and millions of rows, so a task as simple as displaying a table can take a very long time, and if you're just trying to explore the table, it isn't really useful to see all of it.

Fortunately, SQL allows us to limit the number of rows we see by using the LIMIT clause. Move on to the next screen to learn how.
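As a preview, this sketch (assuming Python's `sqlite3` module and an in-memory table holding only the first five majors from the output above) shows that LIMIT caps the number of rows returned, and that a limit larger than the table simply returns every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?)",
    [
        ("PETROLEUM ENGINEERING",),
        ("MINING AND MINERAL ENGINEERING",),
        ("METALLURGICAL ENGINEERING",),
        ("NAVAL ARCHITECTURE AND MARINE ENGINEERING",),
        ("CHEMICAL ENGINEERING",),
    ],
)

# LIMIT caps the number of rows returned...
limited = conn.execute("SELECT * FROM recent_grads LIMIT 2;").fetchall()
# ...and a limit larger than the table simply returns every row.
everything = conn.execute("SELECT * FROM recent_grads LIMIT 20;").fetchall()

print(len(limited))     # 2
print(len(everything))  # 5
```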

# 6. The LIMIT Clause

In [40]:
%%sql

SELECT *
    FROM recent_grads
LIMIT 5;

Done.


index,Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed,Full_time,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
0,1,2419,PETROLEUM ENGINEERING,Engineering,2339,36,2057,282,0.120564344,1976,1849,270,1207,37,0.018380527,110000,95000,125000,1534,364,193
1,2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,7,679,77,0.1018518519999999,640,556,170,388,85,0.117241379,75000,55000,90000,350,257,50
2,3,2415,METALLURGICAL ENGINEERING,Engineering,856,3,725,131,0.153037383,648,558,133,340,16,0.024096386,73000,50000,105000,456,176,0
3,4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,1258,16,1123,135,0.107313196,758,1069,150,692,40,0.050125313,70000,43000,80000,529,102,0
4,5,2405,CHEMICAL ENGINEERING,Engineering,32260,289,21239,11021,0.341630502,25694,23170,5180,16697,1672,0.061097712,65000,50000,75000,18314,4440,972


# 7. Selecting Specific Columns

In [41]:
%%sql

SELECT Major, ShareWomen
    FROM recent_grads
LIMIT 5;

Done.


Major,ShareWomen
PETROLEUM ENGINEERING,0.120564344
MINING AND MINERAL ENGINEERING,0.1018518519999999
METALLURGICAL ENGINEERING,0.153037383
NAVAL ARCHITECTURE AND MARINE ENGINEERING,0.107313196
CHEMICAL ENGINEERING,0.341630502


# 8. Filtering Rows Using WHERE

In [42]:
%%sql

SELECT Major, ShareWomen
    FROM recent_grads
WHERE ShareWomen<0.5
LIMIT 5;

Done.


Major,ShareWomen
PETROLEUM ENGINEERING,0.120564344
MINING AND MINERAL ENGINEERING,0.1018518519999999
METALLURGICAL ENGINEERING,0.153037383
NAVAL ARCHITECTURE AND MARINE ENGINEERING,0.107313196
CHEMICAL ENGINEERING,0.341630502


# 9. Expressing Multiple Filter Criteria Using 'AND'

In [43]:
%%sql

SELECT Major, Major_category, Median, ShareWomen
    FROM recent_grads
WHERE ShareWomen>0.5
    AND Median>50000;

Done.


Major,Major_category,Median,ShareWomen
ACTUARIAL SCIENCE,Business,62000,0.535714286
COMPUTER SCIENCE,Computers & Mathematics,53000,0.578766338


# 10. Returning One of Several Conditions With OR

In [44]:
%%sql

SELECT Major, Median, Unemployed
    FROM recent_grads
WHERE Median >= 10000
    OR Men>Women
LIMIT 20;

Done.


Major,Median,Unemployed
PETROLEUM ENGINEERING,110000,37
MINING AND MINERAL ENGINEERING,75000,85
METALLURGICAL ENGINEERING,73000,16
NAVAL ARCHITECTURE AND MARINE ENGINEERING,70000,40
CHEMICAL ENGINEERING,65000,1672
NUCLEAR ENGINEERING,65000,400
ACTUARIAL SCIENCE,62000,308
ASTRONOMY AND ASTROPHYSICS,62000,33
MECHANICAL ENGINEERING,60000,4650
ELECTRICAL ENGINEERING,60000,3895


# 11. Grouping Operators with Parentheses

In [46]:
%%sql

SELECT Major, Major_category, ShareWomen, Unemployment_rate
    FROM recent_grads
WHERE Major_category = 'Engineering'
    AND (ShareWomen>50
    OR Unemployment_rate<0.051)

Done.


Major,Major_category,ShareWomen,Unemployment_rate
PETROLEUM ENGINEERING,Engineering,0.120564344,0.018380527
METALLURGICAL ENGINEERING,Engineering,0.153037383,0.024096386
NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,0.107313196,0.050125313
MATERIALS SCIENCE,Engineering,0.310820285,0.023042836
ENGINEERING MECHANICS PHYSICS AND SCIENCE,Engineering,0.183985189,0.006334343
INDUSTRIAL AND MANUFACTURING ENGINEERING,Engineering,0.3434732179999999,0.042875544
MATERIALS ENGINEERING AND MATERIALS SCIENCE,Engineering,0.292607004,0.027788805
INDUSTRIAL PRODUCTION TECHNOLOGIES,Engineering,0.75047259,0.028308097
ENGINEERING AND INDUSTRIAL MANAGEMENT,Engineering,0.174122505,0.03365166
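Two details in this query are worth checking on a small sample. First, AND is evaluated before OR, so the parentheses change which rows match. Second, ShareWomen is stored as a proportion between 0 and 1, so "more than half women" is `ShareWomen > 0.5`; the condition `ShareWomen>50` used above can never be true, which means the query above effectively keeps engineering majors with an unemployment rate below 0.051. The sketch below assumes Python's `sqlite3` module and an in-memory table holding five rows copied from the outputs in this lesson; the helper `majors` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_grads "
    "(Major TEXT, Major_category TEXT, ShareWomen REAL, Unemployment_rate REAL)"
)
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?, ?)",
    [
        ("PETROLEUM ENGINEERING", "Engineering", 0.120564344, 0.018380527),
        ("MINING AND MINERAL ENGINEERING", "Engineering", 0.1018518519999999, 0.117241379),
        ("INDUSTRIAL PRODUCTION TECHNOLOGIES", "Engineering", 0.75047259, 0.028308097),
        ("ACTUARIAL SCIENCE", "Business", 0.535714286, 0.095652174),
        ("ASTRONOMY AND ASTROPHYSICS", "Physical Sciences", 0.4413555729999999, 0.021167415),
    ],
)

def majors(where):
    """Hypothetical helper: return the majors matching a WHERE condition.

    The conditions are fixed strings written in this sketch, never user input.
    """
    return [row[0] for row in conn.execute(f"SELECT Major FROM recent_grads WHERE {where};")]

# Parentheses make OR apply first: Engineering AND (either condition).
grouped = majors(
    "Major_category = 'Engineering' AND (ShareWomen > 0.5 OR Unemployment_rate < 0.051)"
)
# Without them, AND binds tighter: (Engineering AND ShareWomen > 0.5) OR low unemployment.
ungrouped = majors(
    "Major_category = 'Engineering' AND ShareWomen > 0.5 OR Unemployment_rate < 0.051"
)
# ShareWomen never exceeds 1, so this condition matches no rows.
over_fifty = majors("ShareWomen > 50")

print(grouped)     # ['PETROLEUM ENGINEERING', 'INDUSTRIAL PRODUCTION TECHNOLOGIES']
print(ungrouped)   # ['PETROLEUM ENGINEERING', 'INDUSTRIAL PRODUCTION TECHNOLOGIES', 'ASTRONOMY AND ASTROPHYSICS']
print(over_fifty)  # []
```

Without the parentheses, ASTRONOMY AND ASTROPHYSICS slips into the result even though it isn't an engineering major, because its low unemployment rate satisfies the OR on its own.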


# 12. Ordering Results Using ORDER BY

In [52]:
%%sql

SELECT Major, ShareWomen, Unemployment_rate
    FROM recent_grads
WHERE ShareWomen>0.3
    AND Unemployment_rate<0.1
ORDER BY ShareWomen DESC
LIMIT 10;

Done.


Major,ShareWomen,Unemployment_rate
EARLY CHILDHOOD EDUCATION,0.967998119,0.040104981
MATHEMATICS AND COMPUTER SCIENCE,0.927807246,0.0
ELEMENTARY EDUCATION,0.923745479,0.046585715
ANIMAL SCIENCES,0.91093257,0.050862499
PHYSIOLOGY,0.906677337,0.0691628
MISCELLANEOUS PSYCHOLOGY,0.90558993,0.05190783
HUMAN SERVICES AND COMMUNITY ORGANIZATION,0.904074544,0.037819026
NURSING,0.896018988,0.044862724
GEOSCIENCES,0.881293889,0.024373731
MASS MEDIA,0.8772275279999999,0.089836827


# 13. Practice Writing a Query

In [53]:
%%sql

SELECT Major_category, Major, Unemployment_rate
    FROM recent_grads
WHERE Major_category = 'Engineering'
    OR Major_category = 'Physical Sciences'
ORDER BY Unemployment_rate ASC;

Done.


Major_category,Major,Unemployment_rate
Engineering,ENGINEERING MECHANICS PHYSICS AND SCIENCE,0.006334343
Engineering,PETROLEUM ENGINEERING,0.018380527
Physical Sciences,ASTRONOMY AND ASTROPHYSICS,0.021167415
Physical Sciences,ATMOSPHERIC SCIENCES AND METEOROLOGY,0.022228555
Engineering,MATERIALS SCIENCE,0.023042836
Engineering,METALLURGICAL ENGINEERING,0.024096386
Physical Sciences,GEOSCIENCES,0.024373731
Engineering,MATERIALS ENGINEERING AND MATERIALS SCIENCE,0.027788805
Engineering,INDUSTRIAL PRODUCTION TECHNOLOGIES,0.028308097
Engineering,ENGINEERING AND INDUSTRIAL MANAGEMENT,0.03365166


# 14. Next Steps

In this lesson, we became familiar with a dataset stored in a SQLite table by writing basic SQL queries. The queries we wrote here, and the ones we'll keep studying over the next few courses, belong to one of four types of SQL commands:

* Data query language
* Data definition language
* Data control language
* Data manipulation language

In this course and the next, we'll focus on data query language (DQL). DQL is the part of SQL that allows users to extract data from databases. The remaining types are used to create, modify, and administer databases, tasks that often fall to data engineers.
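The four types can be told apart in a short sketch, again assuming Python's `sqlite3` module and an in-memory database with illustrative rows taken from this lesson's output. DDL creates the table, DML changes its rows, and DQL reads them. SQLite does not implement the DCL statements GRANT and REVOKE, so DCL appears only as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language (DDL): define the structure of the database.
conn.execute("CREATE TABLE recent_grads (Major TEXT, Median INTEGER)")

# Data manipulation language (DML): add, change, or remove rows.
conn.execute("INSERT INTO recent_grads VALUES ('PETROLEUM ENGINEERING', 110000)")
conn.execute("INSERT INTO recent_grads VALUES ('ACTUARIAL SCIENCE', 62000)")
conn.execute("DELETE FROM recent_grads WHERE Median < 70000")

# Data query language (DQL): read data without changing it.
rows = conn.execute("SELECT Major, Median FROM recent_grads;").fetchall()
print(rows)  # [('PETROLEUM ENGINEERING', 110000)]

# Data control language (DCL), such as GRANT and REVOKE, manages permissions.
# SQLite does not implement these statements, so none are run here.
```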

Here are a few things to note:

* We rarely linked to the SQLite documentation, because it's a bit challenging to understand when you're just beginning. Sites like W3Schools and SQLZoo are friendlier for looking up SQL commands.
* We learned about clauses, statements, keywords, and operators in SQL. Here's a diagram describing the difference between each term:

![](https://s3.amazonaws.com/dq-content/252/sql_components.svg)

The ability to quickly iterate on queries as you think of new questions is the appeal of SQL. The SQL workflow lets data professionals focus on asking and answering questions, instead of lower-level programming concepts. There's a clear separation of concerns between the engine that stores, organizes, and retrieves the data and the language that lets people interface with the data without worrying about the underlying mechanics.

As the scale of data has increased, engineers have kept the SQL interface while changing the database engine underneath. This allows people who need to ask and answer questions to easily transfer their SQL experience, even as database technologies change. For example, the Presto project lets you use SQL to query data stored in database systems like MySQL, in a distributed file system like HDFS, and more.

In the next lesson, we'll learn how to compute summary statistics and perform reductions on the same data in SQL.