# Course Description

The role of a data scientist is to turn raw data into actionable insights. Much of the world's raw data, from electronic medical records to customer transaction histories, lives in organized collections of tables called relational databases. Therefore, to be an effective data scientist, you must know how to wrangle and extract data from these databases using a language called SQL (pronounced ess-cue-ell, or sequel). This course teaches you what you need to know to begin working with databases today!

# 1) Selecting columns
This chapter provides a brief introduction to working with relational databases. You'll learn about their structure, how to talk about them using database lingo, and how to begin an analysis by using simple SQL commands to select and summarize columns from database tables.

## 1.1) Beginning your SQL journey
SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a relational database.

You can think of a relational database as a collection of tables. A table is just a set of rows and columns, like a spreadsheet, which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.

Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table. For example, in a table representing employees, we might have a column containing first and last names for all employees.

The table of employees might look something like this:

| id | name    | age | nationality |
|----|---------|-----|-------------|
| 1  | Jessica | 22  | Ireland     |
| 2  | Gabriel | 48  | France      |
| 3  | Laura   | 36  | USA         |

In this example, the table has four fields (columns): id, name, age, and nationality.
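To make the vocabulary concrete, here is a minimal sketch that builds the employees table above and inspects its fields and records. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for whatever database system you use; the column types are an assumption, since the table above does not state them.

```python
import sqlite3

# In-memory SQLite database standing in for a relational database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, age INTEGER, nationality TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Jessica", 22, "Ireland"),
        (2, "Gabriel", 48, "France"),
        (3, "Laura", 36, "USA"),
    ],
)

cursor = conn.execute("SELECT * FROM employees;")
fields = [column[0] for column in cursor.description]  # field (column) names
records = cursor.fetchall()                            # records (rows)

print(fields)        # ['id', 'name', 'age', 'nationality']
print(len(records))  # 3
```

Each tuple in `records` is one record, and each position in the tuple corresponds to one field.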

## 1.2) SELECTing single columns
While SQL can be used to create and modify databases, the focus of this course will be querying databases. A query is a request for data from a database table (or combination of tables). Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.

In SQL, you can select data from a table using a SELECT statement. For example, the following query selects the name column from the people table:

    SELECT name
    FROM people;
    
In this query, SELECT and FROM are called keywords. In SQL, keywords are not case-sensitive, which means you can write the same query as:

    select name
    from people;
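The two queries above can be checked against each other with a short sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the sample rows in the people table are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "1990-01-15"), ("Ben", "1985-07-02")],
)

# Same query, once with uppercase keywords and once with lowercase keywords.
upper = conn.execute("SELECT name FROM people;").fetchall()
lower = conn.execute("select name from people;").fetchall()

print(upper == lower)  # True
```

Each result row is a one-element tuple, because only the name column was selected.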
    
## 1.2) SELECTing multiple columns
Well done! Now you know how to select single columns.

In the real world, you will often want to select multiple columns. Luckily, SQL makes this really easy. To select multiple columns from a table, simply separate the column names with commas!

For example, this query selects two columns, name and birthdate, from the people table:

    SELECT name, birthdate
    FROM people;
    
Sometimes, you may want to select all columns from a table. Typing out every column name would be a pain, so there's a handy shortcut:

    SELECT *
    FROM people;

If you only want to return a certain number of results, you can use the LIMIT keyword to limit the number of rows returned:

    SELECT *
    FROM people
    LIMIT 10;
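The three queries in this section can be tried together in a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database holding 15 invented rows. LIMIT is accepted by SQLite, PostgreSQL, and MySQL, but some other systems use different syntax for the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
# 15 invented rows, so that LIMIT 10 actually cuts the result short.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(f"person_{i}", f"1990-01-{i:02d}") for i in range(1, 16)],
)

two_columns = conn.execute("SELECT name, birthdate FROM people;").fetchall()

all_columns = conn.execute("SELECT * FROM people;")
column_names = [column[0] for column in all_columns.description]

first_ten = conn.execute("SELECT * FROM people LIMIT 10;").fetchall()

print(len(two_columns[0]))  # 2
print(column_names)         # ['name', 'birthdate']
print(len(first_ten))       # 10
```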

## 1.3) SELECT DISTINCT
Often your results will include many duplicate values. If you want to select all the unique values from a column, you can use the DISTINCT keyword.

This might be useful if, for example, you're interested in knowing which languages are represented in the films table:

    SELECT DISTINCT language
    FROM films;
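As a rough illustration of what DISTINCT removes, the sketch below runs the query with and without it. It assumes Python's built-in sqlite3 module, an in-memory database, and four invented rows in a simplified films table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, language TEXT)")
# Invented sample rows; two films share the same language.
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [
        ("Film A", "English"),
        ("Film B", "French"),
        ("Film C", "English"),
        ("Film D", "German"),
    ],
)

all_languages = conn.execute("SELECT language FROM films;").fetchall()
unique_languages = conn.execute("SELECT DISTINCT language FROM films;").fetchall()

print(len(all_languages))     # 4
print(len(unique_languages))  # 3
```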
    
## 1.4) Learning to COUNT
What if you want to count the number of employees in your employees table? The COUNT function lets you do this by returning the number of rows in a table.

For example, this code gives the number of rows in the people table:

    SELECT COUNT(*)
    FROM people; 
    
## 1.5) Practice with COUNT
As you've seen, COUNT(*) tells you how many rows are in a table. However, if you want to count the number of non-missing values in a particular column, you can call COUNT on just that column.

For example, to count the number of birth dates present in the people table:

    SELECT COUNT(birthdate)
    FROM people;

It's also common to combine COUNT with DISTINCT to count the number of distinct values in a column.

For example, this query counts the number of distinct birth dates contained in the people table:

    SELECT COUNT(DISTINCT birthdate)
    FROM people;  
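The difference between the three forms of COUNT is easiest to see on a table with one missing value and one duplicate. The sketch below assumes Python's built-in sqlite3 module and an in-memory database; the four rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
# Invented rows: one missing birthdate (None becomes NULL) and one repeated date.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Ada", "1990-01-15"),
        ("Ben", "1985-07-02"),
        ("Cal", None),
        ("Dee", "1990-01-15"),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM people;").fetchone()[0]
with_birthdate = conn.execute("SELECT COUNT(birthdate) FROM people;").fetchone()[0]
distinct_birthdates = conn.execute(
    "SELECT COUNT(DISTINCT birthdate) FROM people;"
).fetchone()[0]

print(total_rows)           # 4
print(with_birthdate)       # 3
print(distinct_birthdates)  # 2
```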
    
    

# 2) Filtering rows
This chapter builds on the first by teaching you how to filter tables for rows satisfying some criteria of interest. You'll learn how to use basic comparison operators, combine multiple criteria, match patterns in text, and much more.

## 2.1) Filtering results
Congrats on finishing the first chapter! You now know how to select columns and perform basic counts. This chapter will focus on filtering your results.

In SQL, the WHERE keyword allows you to filter based on both text and numeric values in a table. There are a few different comparison operators you can use:

    = equal
    <> not equal
    < less than
    > greater than
    <= less than or equal to
    >= greater than or equal to

For example, you can filter text records such as title. The following code returns all films with the title 'Metropolis':

    SELECT title
    FROM films
    WHERE title = 'Metropolis';

Notice that the WHERE clause always comes after the FROM clause!

**Note that in this course we will use <> and not != for the not equal operator, as per the SQL standard.**
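A few of the comparison operators can be tried in a short sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the films table is reduced to two columns, and the rows are sample data for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
# Sample rows for illustration only.
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Metropolis", 1927), ("Film B", 1999), ("Film C", 2010)],
)

# Filter on a text value.
exact_title = conn.execute(
    "SELECT title FROM films WHERE title = 'Metropolis';"
).fetchall()

# Filter on a numeric value.
after_2000 = conn.execute(
    "SELECT title FROM films WHERE release_year > 2000;"
).fetchall()

# The not-equal operator, written as <>.
not_metropolis = conn.execute(
    "SELECT title FROM films WHERE title <> 'Metropolis';"
).fetchall()

print(exact_title)          # [('Metropolis',)]
print(after_2000)           # [('Film C',)]
print(len(not_metropolis))  # 2
```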

## 2.2) LIKE and NOT LIKE
As you've seen, the WHERE clause can be used to filter text data. However, so far you've only been able to filter by specifying the exact text you're interested in. In the real world, often you'll want to search for a pattern rather than a specific text string.

In SQL, the LIKE operator can be used in a WHERE clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE:

The % wildcard will match zero, one, or many characters in text. For example, the following query matches companies like 'Data', 'DataC', 'DataCamp', 'DataMind', and so on:

    SELECT name
    FROM companies
    WHERE name LIKE 'Data%';

The _ wildcard will match a single character. For example, the following query matches companies like 'DataCamp', 'DataComp', and so on:

    SELECT name
    FROM companies
    WHERE name LIKE 'DataC_mp';

You can also use the NOT LIKE operator to find records that don't match the pattern you specify.
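The two wildcards and NOT LIKE can be compared side by side in the sketch below. It assumes Python's built-in sqlite3 module and an in-memory database, with a companies table filled from the names mentioned above plus one invented non-matching name ('Other Inc'). Note that case handling for LIKE differs between database systems (SQLite ignores ASCII case by default, PostgreSQL does not), so the sample names here all use the same capitalization as the patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?)",
    [("Data",), ("DataC",), ("DataCamp",), ("DataComp",), ("DataMind",), ("Other Inc",)],
)

def names(query):
    """Run a query and return the first column of each row, sorted in Python."""
    return sorted(row[0] for row in conn.execute(query))

starts_with_data = names("SELECT name FROM companies WHERE name LIKE 'Data%';")
one_char_gap = names("SELECT name FROM companies WHERE name LIKE 'DataC_mp';")
not_data = names("SELECT name FROM companies WHERE name NOT LIKE 'Data%';")

print(starts_with_data)  # ['Data', 'DataC', 'DataCamp', 'DataComp', 'DataMind']
print(one_char_gap)      # ['DataCamp', 'DataComp']
print(not_data)          # ['Other Inc']
```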

# 3) Aggregate Functions
This chapter builds on the first two by teaching you how to use aggregate functions to summarize your data and gain useful insights. Additionally, you'll learn about arithmetic in SQL, and how to use aliases to make your results more readable!

Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this.

For example,

    SELECT AVG(budget)
    FROM films;

gives you the average value from the budget column of the films table. Similarly, the MAX function returns the highest budget:

    SELECT MAX(budget)
    FROM films;

The SUM function returns the result of adding up the numeric values in a column:

    SELECT SUM(budget)
    FROM films;

You can probably guess what the MIN function does! Now it's your turn to try out some SQL functions.
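All four aggregate functions can be run in a single query, as in the sketch below. It assumes Python's built-in sqlite3 module and an in-memory database; the budget values are invented round numbers chosen so the results are easy to verify by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, budget INTEGER)")
# Invented budgets, for illustration only.
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Film A", 100), ("Film B", 200), ("Film C", 300), ("Film D", 400)],
)

avg_budget, max_budget, min_budget, sum_budget = conn.execute(
    "SELECT AVG(budget), MAX(budget), MIN(budget), SUM(budget) FROM films;"
).fetchone()

print(avg_budget)  # 250.0
print(max_budget)  # 400
print(min_budget)  # 100
print(sum_budget)  # 1000
```

Each aggregate function collapses the whole column into one value, so the query returns a single row.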

## 3.1) Even more aliasing
Let's practice your newfound aliasing skills some more before moving on!

Recall: in many SQL dialects, including PostgreSQL, dividing an integer by an integer gives you an integer back.

This means that the following will unexpectedly result in 400.0:

    SELECT 45 / 10 * 100.0;

This is because 45 / 10 evaluates to an integer (4), and not a decimal number like we would expect.

So when you're dividing, make sure at least one of your numbers has a decimal place:

    SELECT 45 * 100.0 / 10;

The above now gives the correct answer of 450.0 since the numerator (45 * 100.0) of the division is now a decimal!
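Both expressions can be evaluated directly, as in the sketch below. It assumes Python's built-in sqlite3 module; SQLite, like PostgreSQL, truncates integer division, though some other systems do not. The alias name `percentage` is a hypothetical label added with AS purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer division happens first: 45 / 10 is 4, and 4 * 100.0 is 400.0.
truncated = conn.execute("SELECT 45 / 10 * 100.0;").fetchone()[0]

# Multiplying by 100.0 first makes the numerator a decimal: 4500.0 / 10 is 450.0.
cursor = conn.execute("SELECT 45 * 100.0 / 10 AS percentage;")
alias = cursor.description[0][0]  # column name supplied by the AS alias
correct = cursor.fetchone()[0]

print(truncated)  # 400.0
print(correct)    # 450.0
print(alias)      # percentage
```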

# 4) Sorting, grouping and joins
This chapter provides a brief introduction to sorting and grouping your results, and briefly touches on the concept of joins.

## 4.1) ORDER BY
Congratulations on making it this far! You now know how to select and filter your results.

In this chapter you'll learn how to sort and group your results to gain further insight. Let's go!

In SQL, the ORDER BY keyword is used to sort results in ascending or descending order according to the values of one or more columns.

By default ORDER BY will sort in ascending order. If you want to sort the results in descending order, you can use the DESC keyword. For example,

    SELECT title
    FROM films
    ORDER BY release_year DESC;
    
gives you the titles of films sorted by release year, from newest to oldest.
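The default ascending order and the DESC variant can be compared in a short sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the three films and their release years are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
# Invented rows, inserted out of chronological order on purpose.
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Film A", 1995), ("Film B", 2010), ("Film C", 1980)],
)

oldest_first = [
    row[0] for row in conn.execute("SELECT title FROM films ORDER BY release_year;")
]
newest_first = [
    row[0] for row in conn.execute("SELECT title FROM films ORDER BY release_year DESC;")
]

print(oldest_first)  # ['Film C', 'Film A', 'Film B']
print(newest_first)  # ['Film B', 'Film A', 'Film C']
```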

## 4.2) HAVING a great time
In SQL, aggregate functions can't be used in WHERE clauses. For example, the following query is invalid:

    SELECT release_year
    FROM films
    GROUP BY release_year
    WHERE COUNT(title) > 10;

This means that if you want to filter based on the result of an aggregate function, you need another way! That's where the HAVING clause comes in. For example,

    SELECT release_year
    FROM films
    GROUP BY release_year
    HAVING COUNT(title) > 10;

shows only those years in which more than 10 films were released.
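The sketch below runs both queries from this section to show that the database rejects the WHERE version and accepts the HAVING version. It assumes Python's built-in sqlite3 module and an in-memory database; the rows are invented so that one year has 12 films and another has 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
# Invented rows: 12 films in 2000 and 3 films in 2001.
sample = [(f"Film {i}", 2000) for i in range(12)]
sample += [(f"Other {i}", 2001) for i in range(3)]
conn.executemany("INSERT INTO films VALUES (?, ?)", sample)

# The invalid query from the text: the database refuses to run it.
try:
    conn.execute(
        "SELECT release_year FROM films GROUP BY release_year WHERE COUNT(title) > 10;"
    )
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# The HAVING version filters the groups after they have been counted.
busy_years = conn.execute(
    "SELECT release_year FROM films GROUP BY release_year HAVING COUNT(title) > 10;"
).fetchall()

print(where_rejected)  # True
print(busy_years)      # [(2000,)]
```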