# Beginning Data Exploration with SELECT

In SQL, viewing data starts with the `SELECT` keyword, which retrieves rows & columns from one or more of the tables in a database. A `SELECT` statement can be simple, retrieving everything in a single table, or it can be complex enough to link dozens of tables while handling multiple calculations & filtering by exact criteria. 

We'll start with simple `SELECT` statements & then look into the more powerful things `SELECT` can do.

---

# Basic SELECT Syntax

Here's a `SELECT` statement that fetches every row & column in a table called `my_table`:

```
SELECT * FROM my_table;
```

This single line of code shows the most basic form of a SQL query. The asterisk following the `SELECT` keyword is called a *wildcard*, a stand-in for a value: it doesn't represent anything in particular & instead represents everything that value could possibly be. Here, it's shorthand for "select all columns". If you had given a column name instead of the wildcard, the query would select only the values in that column. The `FROM` keyword indicates you want the query to return data from a particular table. The semicolon after the table name tells PostgreSQL it's the end of the query statement.

Let's use this `SELECT` statement with the asterisk wildcard on the `teachers` table we created in the previous lesson. Once again, open Postgres.app & pgAdmin, select the `analysis` database, & open the Query Tool. Then execute the following statement.

```
SELECT * FROM teachers;
```

Once you execute the query, the result set in the Query Tool's output pane contains all the rows & columns you inserted into the `teachers` table in the previous lesson.

<img src = "Query All Rows & Columns From teachers Table.png" width = "600" style = "margin:auto"/>

There are actually two other ways to view all rows in a table using pgAdmin. You can right-click the `teachers` table in the object tree & choose **View/Edit Data -> All Rows**. Or you can use a little-known bit of standard SQL:

```
TABLE teachers;
```

<img src = "View teachers Table Using Standard SQL.png" width = "600" style = "margin:auto"/>

Both provide the same result. Now, let's make this query a bit more specific.

## Querying a Subset of Columns

Often, it's more practical to limit the columns the query retrieves, especially with large databases, so you don't have to wade through excess information. You can do this by naming columns, separated by commas, right after the `SELECT` keyword. Here's an example:

```
SELECT some_column, another_column, amazing_column
FROM table_name;
```

With that syntax, the query will retrieve all rows from just those three columns. 

Let's apply this to the `teachers` table. Perhaps in your analysis you want to focus on teachers' names & salaries. In that case, you would select just the relevant columns. Notice that the order of the columns in the query is different than the order in the table: you're able to retrieve columns in any order you'd like.

```
SELECT last_name, first_name, salary
FROM teachers;
```

Now, in the result set, you've limited the columns to three:

<img src = "Querying Subset of Columns.png" width = "600" style = "margin:auto"/>

Although these examples are basic, they illustrate a good strategy for beginning your interview of a dataset. Generally, it's wise to start your analysis by checking whether your data is present & in the format you expect, which is a task well suited to `SELECT`. 

We're only working with a table of six rows, but when you're facing a table of thousands or even millions of rows, it's essential to get a quick read on your data quality & the range of values it contains.

Let's dig deeper & add several SQL keywords.

---

# Sorting Data with ORDER BY

Data can make more sense & may reveal patterns more readily when it's arranged in order rather than jumbled randomly.

In SQL, we order the results of a query using a clause containing the keywords `ORDER BY` followed by the name of the column or columns to sort. Applying this clause doesn't change the original table, only the result of the query. Below is an example using the `teachers` table.

```
SELECT first_name, last_name, salary
FROM teachers
ORDER BY salary DESC;
```

By default, `ORDER BY` sorts values in ascending order, but here we sort in descending order by adding the `DESC` keyword. (The optional `ASC` keyword specifies sorting in ascending order.) Now, by ordering the `salary` column from highest to lowest, we can determine which teachers earn the most:

<img src = "Sorting a Column with ORDER BY.png" width = "600" style = "margin:auto"/>

The `ORDER BY` clause also accepts numbers instead of column names, with the number identifying the sort column according to its position in the `SELECT` clause. Thus, you could rewrite the above SQL statement in this way, using `3` to refer to the third column in the `SELECT` clause, `salary`:

```
SELECT first_name, last_name, salary
FROM teachers
ORDER BY 3 DESC;
```

<img src = "Sorting a Column by Position Number.png" width = "600" style = "margin:auto"/>

The ability to sort in our queries gives us great flexibility in how we view & present data. For example, we're not limited to sorting on just one column.

```
SELECT last_name, school, hire_date
FROM teachers
ORDER BY school ASC, hire_date DESC;
```

In this case, we're retrieving the last names of teachers, their school, & the date they were hired. By sorting the `school` column in ascending order & `hire_date` in descending order, we create a listing of teachers grouped by school with the most recently hired teachers listed first. This shows us who the newest teachers are at each school. 

<img src = "Sorting Multiple Columns with ORDER BY.png" width = "600" style = "margin:auto"/>

You can use `ORDER BY` on more than two columns, but you'll soon reach a point of diminishing returns where the effect will be hardly noticeable. Therefore, a better strategy is to limit the number of columns in your query to only the most important, & then run several queries to answer each question you have.
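As a rough sketch of that strategy, the two queries below each answer one question with a short column list & a simple sort. The questions themselves are only illustrative assumptions; the table & column names are the ones used throughout this lesson.

```sql
-- Question 1: who earns the most at each school?
-- Schools are listed alphabetically, highest salary first within each school.
SELECT last_name, school, salary
FROM teachers
ORDER BY school, salary DESC;

-- Question 2: who has been employed the longest?
-- No ASC or DESC is given, so hire_date sorts ascending (earliest hire first).
SELECT last_name, hire_date
FROM teachers
ORDER BY hire_date;
```

Keeping each query narrow like this tends to make the output easier to scan than a single query sorted on many columns.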

---

# Using DISTINCT to Find Unique Values

In a table, it's not unusual for a column to contain rows with duplicate values. In the `teachers` table, for example, the `school` column lists the same school names multiple times because each school employs many teachers.

To understand the range of values in a column, we can use the `DISTINCT` keyword as part of a query that eliminates duplicates & shows only unique values. Use `DISTINCT` immediately after `SELECT`.

```
SELECT DISTINCT school
FROM teachers
ORDER BY school;
```

The result is as follows:

<img src = "Querying Distinct Values in the school Column.png" width = "600" style = "margin:auto"/>

Even though six rows are in the table, the output shows just the two unique school names in the `school` column. This is a helpful first step toward assessing data quality. For example, if a school name is spelled in more than one way, those spelling variations will be easy to spot & correct, especially if you sort the output.

The `DISTINCT` keyword also works on more than one column at a time. If we add a column, the query returns each unique pair of values.

```
SELECT DISTINCT school, salary
FROM teachers
ORDER BY school, salary;
```

Now the query returns each unique (or distinct) salary earned at each school. Because two teachers at Myers Middle School earn $43,500, that pair is listed in just one row, & the query returns five rows rather than all six in the table:

<img src = "Querying Distinct Pairs of Values in the school & salary Columns.png" width = "600" style = "margin:auto"/>

---

# Filtering Rows with WHERE

Sometimes, you'll want to limit the rows a query returns to only those in which one or more columns meet certain criteria. Using `teachers` as an example, you might want to find all teachers hired before a particular year or all teachers making more than $75,000 at elementary schools. For these tasks, we use the `WHERE` clause.

The `WHERE` clause allows you to find rows that match a specific value, a range of values, or multiple values based on criteria supplied via an *operator* -- a keyword that lets us perform math, comparison, & logical operations. You can also use criteria to exclude rows.

The below SQL statement shows a basic example. Note that in standard SQL syntax, the `WHERE` clause follows the `FROM` keyword & the name of the table or tables being queried.

```
SELECT last_name, school, hire_date
FROM teachers
WHERE school = 'Myers Middle School';
```

The result set shows just the teachers assigned to Myers Middle School:

<img src = "Filtering Rows with WHERE.png" width = "600" style = "margin:auto"/>

Here, we're using the equals comparison operator to find rows that exactly match a value, but of course, you can use other operators with `WHERE` to customise your filter criteria. This table summarises the most commonly used comparison operators. Depending on your database system, many more might be available.

|Operator|Function|Example|
|:---|:---|:---|
|=|Equal to|`WHERE school = 'Baker Middle'`|
|<> or !=|Not equal to|`WHERE school <> 'Baker Middle'`|
|>|Greater than|`WHERE salary > 20000`|
|<|Less than|`WHERE salary < 60500`|
|>=|Greater than or equal to|`WHERE salary >= 20000`|
|<=|Less than or equal to|`WHERE salary <= 60500`|
|BETWEEN|Within a range|`WHERE salary BETWEEN 20000 AND 40000`|
|IN|Match one of a set of values|`WHERE last_name IN ('Bush', 'Roush')`|
|LIKE|Match a pattern (case sensitive)|`WHERE first_name LIKE 'Sam%'`|
|ILIKE|Match a pattern (case insensitive)|`WHERE first_name ILIKE 'sam%'`|
|NOT|Negates a condition|`WHERE first_name NOT ILIKE 'sam%'`|

The following examples show comparison operators in action. First, we use the equal operator to find teachers whose first name is Janet:

```
SELECT first_name, last_name, school
FROM teachers
WHERE first_name = 'Janet';
```

<img src = "Filtering Using the Equal Operator.png" width = "600" style = "margin:auto"/>

Next, we list all the school names in the table, but exclude F.D. Roosevelt HS using the not-equal operator:

```
SELECT school
FROM teachers
WHERE school <> 'F.D. Roosevelt HS';
```

<img src = "Filtering Using Not-Equal Operator.png" width = "600" style = "margin:auto"/>

Here, we use the less-than operator to list teachers hired before January 1, 2000 (using the date format `YYYY-MM-DD`):

```
SELECT first_name, last_name, hire_date
FROM teachers
WHERE hire_date < '2000-01-01';
```

<img src = "Filtering Using the Less-Than Operator.png" width = "600" style = "margin:auto"/>

Then we find teachers who earn $43,500 or more using the `>=` operator:

```
SELECT first_name, last_name, salary
FROM teachers
WHERE salary >= 43500;
```

<img src = "Filtering Using Greater-Than-Or-Equal-To Operator.png" width = "600" style = "margin:auto"/>

The next query uses the `BETWEEN` operator to find teachers who earn from \$40,000 to \$65,000. Note that `BETWEEN` is *inclusive*, meaning the result will include values matching the start & end of the specified range.

```
SELECT first_name, last_name, school, salary
FROM teachers
WHERE salary BETWEEN 40000 AND 65000;
```

<img src = "Filtering Using BETWEEN Operator.png" width = "600" style = "margin:auto"/>

Use caution with `BETWEEN`, because its inclusive nature can lead to inadvertent double-counting of values. For example, if you filter for values with `BETWEEN 10 AND 20` & run a second query using `BETWEEN 20 AND 30`, a row with the value of 20 will appear in both query results. You can avoid this by using the more explicit greater-than & less-than operators to define ranges. For example, this query returns the same result as the previous one, but more obviously specifies the range:

```
SELECT first_name, last_name, school, salary
FROM teachers
WHERE salary >= 40000 AND salary <= 65000;
```
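To make the double-counting point concrete, here is a minimal sketch of two adjacent salary bands written so that they cannot overlap. The band boundaries are arbitrary values chosen for illustration; the idea is simply that each band includes its lower bound & excludes its upper bound.

```sql
-- First band: 40000 up to, but not including, 50000.
SELECT first_name, last_name, salary
FROM teachers
WHERE salary >= 40000 AND salary < 50000;

-- Second band: 50000 up to, but not including, 60000.
SELECT first_name, last_name, salary
FROM teachers
WHERE salary >= 50000 AND salary < 60000;
```

A row with a salary of exactly 50000 can satisfy only the second query, whereas `BETWEEN 40000 AND 50000` followed by `BETWEEN 50000 AND 60000` would return it twice.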

We'll return to these operators throughout the course, because they'll play an important role in helping us ferret out the data & answers we want to find.
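The operator table also lists `IN` & `NOT`, which the examples so far haven't exercised. The sketch below reuses the last names from the table's own `IN` example; combining `NOT` with `IN` is just one of several ways `NOT` can be applied.

```sql
-- IN: keep rows whose last_name matches any value in the list.
SELECT first_name, last_name, school
FROM teachers
WHERE last_name IN ('Bush', 'Roush');

-- NOT: negate the condition, keeping every row the first query left out.
SELECT first_name, last_name, school
FROM teachers
WHERE last_name NOT IN ('Bush', 'Roush');
```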

## Using LIKE & ILIKE with WHERE

Comparison operators are fairly straightforward, but the matching operators `LIKE` & `ILIKE` deserve additional explanation. Both let you find a variety of values that include characters matching a specified pattern, which is handy if you don't know exactly what you're searching for or if you're rooting out misspelled words. To use `LIKE` & `ILIKE`, you specify a pattern to match using one or both of these symbols:

**Percent sign (%)** - A wildcard matching zero or more characters

**Underscore (_)** - A wildcard matching just one character

For example, if you're trying to find the word `baker`, the following `LIKE` patterns will match it:

```
LIKE 'b%'
LIKE '%ak%'
LIKE '_aker'
LIKE 'ba_er'
```
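If you'd like to check patterns like these without querying a table, one option in PostgreSQL is to compare a literal string against each pattern directly; each comparison evaluates to true or false. The column aliases below are made-up labels for readability, & the last two patterns are extra cases added to show the edges of each wildcard.

```sql
-- Each expression compares the literal 'baker' with a pattern.
SELECT
    'baker' LIKE 'b%'     AS starts_with_b,        -- true
    'baker' LIKE '%ak%'   AS contains_ak,          -- true
    'baker' LIKE '_aker'  AS any_first_char,       -- true
    'baker' LIKE 'ba_er'  AS any_third_char,       -- true
    'baker' LIKE 'baker%' AS percent_matches_zero, -- true: % can match no characters at all
    'baker' LIKE 'baker_' AS underscore_needs_one; -- false: _ must match exactly one character
```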
The two operators differ in case sensitivity. The `LIKE` operator, which is part of the ANSI SQL standard, is case sensitive. The `ILIKE` operator, which is a PostgreSQL-only implementation, is case insensitive. The below SQL statements show how the two keywords give different results.

```
SELECT first_name
FROM teachers
WHERE first_name LIKE 'sam%';

SELECT first_name
FROM teachers
WHERE first_name ILIKE 'sam%';
```

The first `WHERE` clause uses `LIKE` to find names that start with the characters `sam`, & because it's case sensitive, it will return zero results. The second, using the case-insensitive `ILIKE`, will return `Samuel` & `Samantha` from the table.

<img src = "Filtering with LIKE.png" width = "600" style = "margin:auto"/>
<img src = "Filtering with ILIKE.png" width = "600" style = "margin:auto"/>

Because `LIKE` & `ILIKE` search for patterns, performance on large databases can be slow. We'll learn a way to speed up these queries in later lessons.

## Combining Operators with AND & OR

Comparison operators become even more useful when we combine them. To do this, we connect them using the logical operators `AND` & `OR` along with, if needed, parentheses.

The SQL statements below show three examples that combine operators this way.

```
SELECT *
FROM teachers
WHERE school = 'Myers Middle School'
    AND salary < 40000;

SELECT *
FROM teachers
WHERE last_name = 'Cole'
    OR last_name = 'Bush';

SELECT *
FROM teachers
WHERE school = 'F.D. Roosevelt HS'
    AND (salary < 38000 OR salary > 40000);
```

The first query uses `AND` in the `WHERE` clause to find teachers who work at Myers Middle School & have a salary less than \$40,000. Because we connect these two conditions using `AND`, both must be true for a row to meet the criteria in the `WHERE` clause & be returned in the query results.

<img src = "Filtering with AND Operator.png" width = "600" style = "margin:auto"/>

The second example uses `OR` to search for any teacher whose last name matches Cole or Bush. When we connect conditions using `OR`, only one of the conditions must be true for a row to meet the criteria of the `WHERE` clause.

<img src = "Filtering with OR Operator.png" width = "600" style = "margin:auto"/>

The final example looks for teachers at Roosevelt whose salaries are either less than \$38,000 or greater than \$40,000. When we place conditions inside parentheses, they are evaluated as a group before being combined with other criteria. In this case, the school name must be exactly `F.D. Roosevelt HS`, & the salary must be either below \$38,000 or above \$40,000 for a row to meet the criteria of the `WHERE` clause.

<img src = "Combining Operators Using AND & OR.png" width = "600" style = "margin:auto"/>

If we use both `AND` & `OR` in a clause but don't use any parentheses, the database will evaluate the `AND` condition first & then the `OR` condition. In the final example, that means we'd see a different result if we omitted the parentheses -- the database would look for rows where the school name is `F.D. Roosevelt HS` & the salary is less than \$38,000, or rows for any school where the salary is more than \$40,000.
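To see that evaluation order spelled out, here is the final example with its parentheses removed, followed by a version with parentheses placed where the database effectively puts them. This is a sketch for comparison only; both statements should return the same rows as each other, not the same rows as the original parenthesised query.

```sql
-- Without parentheses, AND is evaluated before OR...
SELECT *
FROM teachers
WHERE school = 'F.D. Roosevelt HS'
    AND salary < 38000 OR salary > 40000;

-- ...so the statement above behaves as if it were written like this:
SELECT *
FROM teachers
WHERE (school = 'F.D. Roosevelt HS' AND salary < 38000)
    OR salary > 40000;
```

When a `WHERE` clause mixes `AND` & `OR`, adding parentheses even where they aren't strictly required makes the intent easier to verify.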

---

# Putting It All Together

You can begin to see how even the previous simple queries allow us to delve into our data with flexibility & precision to find what we're looking for. You can combine comparison operator statements using the `AND` & `OR` keywords to provide multiple criteria for filtering, & you can include an `ORDER BY` clause to rank the results.

With the preceding information in mind, we can combine the concepts in this lesson into one statement to show how they fit together. SQL is particular about the order of keywords, so follow this convention.

```
SELECT column_names
FROM table_name
WHERE criteria
ORDER BY column_names;
```

The below SQL statement shows a query against the `teachers` table that includes all the aforementioned pieces.

```
SELECT first_name, last_name, school, hire_date, salary
FROM teachers
WHERE school LIKE '%Roos%'
ORDER BY hire_date DESC;
```

This listing returns teachers at Roosevelt High School, ordered from newest hire to earliest. We can see some connection between a teacher's hire date at the school & their current salary level:

<img src = "SELECT Statement Including WHERE & ORDER BY.png" width = "600" style = "margin:auto"/>
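As one more variation on the same template, the sketch below adds a second condition with `AND` & sorts on two columns. The salary threshold & the choice of sort columns are assumptions made for illustration, not part of the lesson's dataset description.

```sql
-- Roosevelt teachers earning at least 38000,
-- highest salary first, ties broken by earliest hire date.
SELECT first_name, last_name, school, hire_date, salary
FROM teachers
WHERE school ILIKE '%roos%'
    AND salary >= 38000
ORDER BY salary DESC, hire_date;
```

The clauses still appear in the required order: `SELECT`, `FROM`, `WHERE`, then `ORDER BY`.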

---

# Wrapping Up

Now that you've learned the basic structure of a few different SQL queries, you have the foundation for many additional skills we'll cover in later lessons. Sorting, filtering, & choosing only the most important columns from a table can yield a surprising amount of information from your data & help you find the story it tells.