<img src = "https://images2.imgbox.com/60/09/VFwl5LOq_o.jpg" width="400">

# 1. Selecting Columns
---

This chapter provides a brief introduction to working with relational databases. You'll learn about their structure, how to talk about them using database lingo, and how to begin an analysis using simple SQL commands to select and summarize columns from database tables.

In [1]:
# %pip install ipython-sql

In [2]:
%load_ext sql

In [3]:
%sql sqlite:///data/database.db

'Connected: @data/database.db'

## Onboarding | Tables
---

For this course, you'll be using a database containing information on almost 5000 films. 

Who is the first person listed in the `people` table?

In [4]:
%%sql 

SELECT *
FROM   people
LIMIT  10 

 * sqlite:///data/database.db
Done.


id,name,birthdate,deathdate
1,50 Cent,1975-07-06,
2,A. Michael Baldwin,1963-04-04,
3,A. Raven Cruz,,
4,A.J. Buckley,1978-02-09,
5,A.J. DeLucia,,
6,A.J. Langer,1974-05-22,
7,Aaliyah,1979-01-16,2001-08-25
8,Aaron Ashmore,1979-10-07,
9,Aaron Hann,,
10,Aaron Hill,1983-04-23,


## Onboarding | Query Result
---

Run this query and check out the resulting table:

`SELECT name FROM people`

Who is the second person listed in the query result?

In [5]:
%%sql 

SELECT name
FROM   people
LIMIT  10 

 * sqlite:///data/database.db
Done.


name
50 Cent
A. Michael Baldwin
A. Raven Cruz
A.J. Buckley
A.J. DeLucia
A.J. Langer
Aaliyah
Aaron Ashmore
Aaron Hann
Aaron Hill


## Beginning your SQL journey
---

SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a relational database.

You can think of a relational database as a collection of tables. A table is just a set of rows and columns, like a spreadsheet, which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.

Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table. For example, in a table representing employees, we might have a column containing first and last names for all employees.

In [6]:
%%sql 

SELECT *
FROM   employees  

 * sqlite:///data/database.db
Done.


id,name,age,nationality
1,Jessica,22,Ireland
2,Gabriel,48,France
3,Laura,36,USA
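
To make the vocabulary concrete, here is a minimal sketch that rebuilds the small `employees` table above with Python's built-in `sqlite3` module. The in-memory database and the column types are assumptions made for illustration; the notebook does not show the real schema.

```python
import sqlite3

# In-memory database (an assumption for this sketch), so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One table represents one type of entity (here: employees).
# The column types are guesses; the course database may define them differently.
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, age INTEGER, nationality TEXT)"
)

# Each tuple is one record (row): a single employee.
conn.executemany(
    "INSERT INTO employees (id, name, age, nationality) VALUES (?, ?, ?, ?)",
    [
        (1, "Jessica", 22, "Ireland"),
        (2, "Gabriel", 48, "France"),
        (3, "Laura", 36, "USA"),
    ],
)

cursor = conn.execute("SELECT * FROM employees ORDER BY id")

# Field (column) names are the first item of each cursor.description entry.
fields = [column[0] for column in cursor.description]
records = cursor.fetchall()

print(fields)
for record in records:
    print(record)

conn.close()
```

Here `fields` holds the four column names and `records` holds one tuple per employee, which mirrors the row and column structure described above.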


## SELECTing single columns
---

While SQL can be used to create and modify databases, the focus of this course will be querying databases. A query is a request for data from a database table (or combination of tables). Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.

In SQL, you can select data from a table using a `SELECT` statement. For example, the following query selects the `name` column from the `people` table:

`SELECT name FROM people`

In this query, `SELECT` and `FROM` are called keywords. In SQL, keywords are not case-sensitive, which means you can write the same query as:

`select name from people`

That said, it's good practice to make SQL keywords uppercase to distinguish them from other parts of your query, like column and table names.
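
The case-insensitivity of keywords can be checked with a short sketch. It uses Python's built-in `sqlite3` module and a tiny stand-in `people` table (three names taken from the output above, in an in-memory database assumed for illustration) and runs the same query in both spellings.

```python
import sqlite3

# Stand-in for the course database: an in-memory table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    [(1, "50 Cent"), (2, "A. Michael Baldwin"), (3, "A. Raven Cruz")],
)

# Same query, keywords in uppercase and in lowercase.
upper = conn.execute("SELECT name FROM people ORDER BY id").fetchall()
lower = conn.execute("select name from people order by id").fetchall()

# Both spellings return identical rows.
same_result = upper == lower
print(same_result)

conn.close()
```

The `ORDER BY id` clause is only there to make the row order deterministic for the comparison.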

### Instructions

Select the `title` column from the `films` table.

In [7]:
%%sql 

SELECT title
FROM   films
LIMIT  10

 * sqlite:///data/database.db
Done.


title
Intolerance: Love's Struggle Throughout the Ages
Over the Hill to the Poorhouse
The Big Parade
Metropolis
Pandora's Box
The Broadway Melody
Hell's Angels
A Farewell to Arms
42nd Street
She Done Him Wrong


Select the `release_year` column from the `films` table.

In [8]:
%%sql

SELECT release_year
FROM   films
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year
1916
1920
1925
1927
1929
1929
1930
1932
1933
1933


Select the `name` of each person in the `people` table.

In [9]:
%%sql

SELECT name
FROM   people
LIMIT  10 

 * sqlite:///data/database.db
Done.


name
50 Cent
A. Michael Baldwin
A. Raven Cruz
A.J. Buckley
A.J. DeLucia
A.J. Langer
Aaliyah
Aaron Ashmore
Aaron Hann
Aaron Hill


## SELECTing multiple columns
---

In the real world, you will often want to select multiple columns. Luckily, SQL makes this really easy. To select multiple columns from a table, simply separate the column names with commas!

For example, this query selects two columns, `name` and `birthdate`, from the `people` table:

`SELECT name, birthdate FROM people`

Sometimes, you may want to select all columns from a table. Typing out every column name would be a pain, so there's a handy shortcut:

`SELECT * FROM people`

If you only want to return a certain number of results, you can use the `LIMIT` keyword to limit the number of rows returned:

`SELECT * FROM people LIMIT 10`
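
The three query shapes above can be compared side by side. The sketch below is a minimal illustration using Python's built-in `sqlite3` module; the in-memory database and the `TEXT` column types are assumptions, and the four rows are copied from the `people` output shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the course database may store dates differently.
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, name TEXT, birthdate TEXT, deathdate TEXT)"
)
conn.executemany(
    "INSERT INTO people (id, name, birthdate, deathdate) VALUES (?, ?, ?, ?)",
    [
        (1, "50 Cent", "1975-07-06", None),
        (2, "A. Michael Baldwin", "1963-04-04", None),
        (3, "A. Raven Cruz", None, None),
        (4, "A.J. Buckley", "1978-02-09", None),
    ],
)

# Two named columns, separated by a comma.
two_columns = conn.execute(
    "SELECT name, birthdate FROM people ORDER BY id"
).fetchall()

# Every column, using the * shortcut.
all_columns = conn.execute("SELECT * FROM people ORDER BY id").fetchall()

# Every column, but only the first two rows.
first_two = conn.execute("SELECT * FROM people ORDER BY id LIMIT 2").fetchall()

print(two_columns[0])
print(all_columns[0])
print(len(first_two))

conn.close()
```

Without an `ORDER BY` clause, SQL does not guarantee which rows `LIMIT` returns, so the sketch sorts by `id` to keep the result predictable.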

### Instructions

Get the title of every film from the `films` table.

In [10]:
%%sql

SELECT title
FROM   films
LIMIT  10 

 * sqlite:///data/database.db
Done.


title
Intolerance: Love's Struggle Throughout the Ages
Over the Hill to the Poorhouse
The Big Parade
Metropolis
Pandora's Box
The Broadway Melody
Hell's Angels
A Farewell to Arms
42nd Street
She Done Him Wrong


Get the title and release year for every film.

In [11]:
%%sql

SELECT title,
       release_year
FROM   films
LIMIT  10

 * sqlite:///data/database.db
Done.


title,release_year
Intolerance: Love's Struggle Throughout the Ages,1916
Over the Hill to the Poorhouse,1920
The Big Parade,1925
Metropolis,1927
Pandora's Box,1929
The Broadway Melody,1929
Hell's Angels,1930
A Farewell to Arms,1932
42nd Street,1933
She Done Him Wrong,1933


Get the title, release year and country for every film.

In [12]:
%%sql

SELECT title,
       release_year,
       country
FROM   films
LIMIT  10 

 * sqlite:///data/database.db
Done.


title,release_year,country
Intolerance: Love's Struggle Throughout the Ages,1916,USA
Over the Hill to the Poorhouse,1920,USA
The Big Parade,1925,USA
Metropolis,1927,Germany
Pandora's Box,1929,Germany
The Broadway Melody,1929,USA
Hell's Angels,1930,USA
A Farewell to Arms,1932,USA
42nd Street,1933,USA
She Done Him Wrong,1933,USA


Get all columns from the `films` table.

In [13]:
%%sql

SELECT *
FROM   films
LIMIT  10 

 * sqlite:///data/database.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
1,Intolerance: Love's Struggle Throughout the Ages,1916,USA,123,,Not Rated,,385907.0
2,Over the Hill to the Poorhouse,1920,USA,110,,,3000000.0,100000.0
3,The Big Parade,1925,USA,151,,Not Rated,,245000.0
4,Metropolis,1927,Germany,145,German,Not Rated,26435.0,6000000.0
5,Pandora's Box,1929,Germany,110,German,Not Rated,9950.0,
6,The Broadway Melody,1929,USA,100,English,Passed,2808000.0,379000.0
7,Hell's Angels,1930,USA,96,English,Passed,,3950000.0
8,A Farewell to Arms,1932,USA,79,English,Unrated,,800000.0
9,42nd Street,1933,USA,89,English,Unrated,2300000.0,439000.0
10,She Done Him Wrong,1933,USA,66,English,Approved,,200000.0


## SELECT DISTINCT
---

Often your results will include many duplicate values. If you want to select all the unique values from a column, you can use the `DISTINCT` keyword.

This might be useful if, for example, you're interested in knowing which languages are represented in the `films` table:

`SELECT DISTINCT language FROM films`
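
As a small illustration of the difference `DISTINCT` makes, the sketch below loads five films (titles and languages taken from the `films` output above) into an in-memory SQLite table and compares the two queries. The reduced three-column schema is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, assumed schema: only the columns this example needs.
conn.execute("CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO films (id, title, language) VALUES (?, ?, ?)",
    [
        (4, "Metropolis", "German"),
        (5, "Pandora's Box", "German"),
        (6, "The Broadway Melody", "English"),
        (7, "Hell's Angels", "English"),
        (8, "A Farewell to Arms", "English"),
    ],
)

# One value per row, duplicates included.
all_languages = [
    row[0] for row in conn.execute("SELECT language FROM films ORDER BY id")
]

# Each language appears once.
unique_languages = [
    row[0] for row in conn.execute("SELECT DISTINCT language FROM films")
]

print(all_languages)
print(sorted(unique_languages))

conn.close()
```

The plain query returns five values, while the `DISTINCT` query collapses them to the two languages present.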

### Instructions
Get all the unique countries represented in the `films` table.

In [14]:
%%sql

SELECT DISTINCT country
FROM   films

 * sqlite:///data/database.db
Done.


country
USA
Germany
Japan
Denmark
UK
Italy
France
West Germany
Sweden
Soviet Union


Get all the different film certifications from the `films` table.

In [15]:
%%sql

SELECT DISTINCT certification
FROM   films 

 * sqlite:///data/database.db
Done.


certification
Not Rated
""
Passed
Unrated
Approved
G
PG
R
PG-13
M


Get the different types of film roles from the `roles` table.

In [16]:
%%sql

SELECT DISTINCT role
FROM   roles 

 * sqlite:///data/database.db
Done.


role
director
actor


## Learning to COUNT
---

What if you want to count the number of employees in your employees table? The `COUNT()` function lets you do this by returning the number of rows that a query matches.

For example, this query returns the number of rows in the `people` table:

`SELECT COUNT(*) FROM people`
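
A quick way to see what `COUNT(*)` returns is to run it against a table whose size is known. The sketch below assumes an in-memory SQLite database with a cut-down `people` table; the four rows come from the output earlier in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, assumed schema for illustration.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birthdate TEXT)")

sample_rows = [
    (1, "50 Cent", "1975-07-06"),
    (2, "A. Michael Baldwin", "1963-04-04"),
    (3, "A. Raven Cruz", None),  # missing birthdate, still a row
    (4, "A.J. Buckley", "1978-02-09"),
]
conn.executemany(
    "INSERT INTO people (id, name, birthdate) VALUES (?, ?, ?)", sample_rows
)

# COUNT(*) returns a single row with a single value: the number of rows.
row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(row_count)

conn.close()
```

The result equals `len(sample_rows)`: rows with missing values in some column are still counted by `COUNT(*)`.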

How many records are contained in the `reviews` table?

In [17]:
%%sql

SELECT COUNT(*)
FROM   reviews 

 * sqlite:///data/database.db
Done.


COUNT(*)
4968


## Practice with COUNT
---

As you've seen, `COUNT(*)` tells you how many rows are in a table. However, if you want to count the number of non-missing values in a particular column, you can call `COUNT()` on just that column.

For example, to count the number of birth dates present in the `people` table:

`SELECT COUNT(birthdate) FROM people`

It's also common to combine `COUNT()` with `DISTINCT` to count the number of distinct values in a column.

For example, this query counts the number of distinct birth dates contained in the `people` table:

`SELECT COUNT(DISTINCT birthdate) FROM people`
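
The three variants of `COUNT()` can give three different answers for the same table. The sketch below is an illustration only: it uses an in-memory SQLite table, and the last row ("Example Person") is a hypothetical entry added so that one birth date occurs twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, assumed schema for illustration.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people (id, name, birthdate) VALUES (?, ?, ?)",
    [
        (1, "50 Cent", "1975-07-06"),
        (3, "A. Raven Cruz", None),  # missing birthdate
        (7, "Aaliyah", "1979-01-16"),
        (99, "Example Person", "1979-01-16"),  # hypothetical duplicate birthdate
    ],
)

# All rows, regardless of missing values.
total_rows = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]

# Only rows where birthdate is not missing (NULL).
birthdates = conn.execute("SELECT COUNT(birthdate) FROM people").fetchone()[0]

# Only the different non-missing birthdates.
unique_birthdates = conn.execute(
    "SELECT COUNT(DISTINCT birthdate) FROM people"
).fetchone()[0]

print(total_rows, birthdates, unique_birthdates)

conn.close()
```

With these sample rows the counts shrink at each step: four rows, three non-missing birth dates, and two distinct birth dates.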

### Instructions
Count the number of rows in the `people` table.

In [18]:
%%sql

SELECT COUNT(*)
FROM   people 

 * sqlite:///data/database.db
Done.


Count(*)
8397


Count the number of (non-missing) birth dates in the `people` table.

In [19]:
%%sql

SELECT COUNT( birthdate )
FROM   people 

 * sqlite:///data/database.db
Done.


Count(birthdate)
6152


Count the number of unique birth dates in the `people` table.

In [20]:
%%sql

SELECT COUNT( DISTINCT birthdate )
FROM   people 

 * sqlite:///data/database.db
Done.


Count(DISTINCT birthdate)
5398


Count the number of unique languages in the `films` table.

In [21]:
%%sql

SELECT COUNT( DISTINCT language )
FROM   films 

 * sqlite:///data/database.db
Done.


Count(DISTINCT language)
47


Count the number of unique countries in the `films` table.

In [22]:
%%sql

SELECT COUNT( DISTINCT country )
FROM   films 

 * sqlite:///data/database.db
Done.


Count(DISTINCT country)
64
