# Translating Relational Algebra to SQL

## Anatomy of a Select Statement

A SQL select statement is made up of several clauses, each of which corresponds to one or more of the relational algebra operators. When present, the clauses appear in this order:

* SELECT
* FROM
* WHERE
* GROUP BY
* HAVING
* ORDER BY
* LIMIT
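
As a rough illustration of how these clauses fit together, the sketch below uses every one of them in a single statement. It rests on two assumptions: Python's built-in `sqlite3` module stands in for the PostgreSQL connection used on this page, and the `city` table holds only the ten sample rows that appear in the outputs further down.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the PostgreSQL
# "world" database; the rows are the ten cities shown in this page's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table city (name text, countrycode text, population integer)")
conn.executemany(
    "insert into city values (?, ?, ?)",
    [
        ("Kabul", "AFG", 1780000), ("Qandahar", "AFG", 237500),
        ("Herat", "AFG", 186800), ("Mazar-e-Sharif", "AFG", 127800),
        ("Amsterdam", "NLD", 731200), ("Rotterdam", "NLD", 593321),
        ("Haag", "NLD", 440900), ("Utrecht", "NLD", 234323),
        ("Eindhoven", "NLD", 201843), ("Tilburg", "NLD", 193238),
    ],
)

query = """
select countrycode, count(*) as big_cities  -- columns and aggregates to output
from city                                   -- source relation
where population > 200000                   -- filter rows
group by countrycode                        -- form groups
having count(*) > 1                         -- filter groups
order by big_cities desc                    -- sort the result
limit 5                                     -- cap the number of rows
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('NLD', 5), ('AFG', 2)]
```

Each clause is covered one at a time in the sections that follow.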


In [26]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [27]:
%sql postgresql://millbr02:@localhost/world

'Connected: millbr02@world'

### Translating a project

```sql
select column1, column2, ...
from relation
```

In [28]:
%%sql

select name, population, surfacearea
from country
limit 10

10 rows affected.


name,population,surfacearea
Afghanistan,22720000,652090.0
Netherlands,15864000,41526.0
Netherlands Antilles,217000,800.0
Albania,3401200,28748.0
Algeria,31471000,2381740.0
American Samoa,68000,199.0
Andorra,78000,468.0
Angola,12878000,1246700.0
Anguilla,8000,96.0
Antigua and Barbuda,68000,442.0


### Renaming a column

You can rename a column directly in the select clause with the `as` keyword.


In [36]:
%%sql

select name as countryname
from country
limit 10

10 rows affected.


countryname
Afghanistan
Netherlands
Netherlands Antilles
Albania
Algeria
American Samoa
Andorra
Angola
Anguilla
Antigua and Barbuda


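### Eliminating duplicate rows

The `distinct` keyword eliminates duplicate rows from the result. When several columns are selected, two rows count as duplicates only if they match in every selected column.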


In [38]:
%%sql

select distinct continent
from country


7 rows affected.


continent
Europe
Oceania
Asia
North America
Africa
Antarctica
South America


In [41]:
%%sql

select distinct continent, region
from country
order by continent

25 rows affected.


continent,region
Africa,Northern Africa
Africa,Southern Africa
Africa,Central Africa
Africa,Eastern Africa
Africa,Western Africa
Antarctica,Antarctica
Asia,Eastern Asia
Asia,Southern and Central Asia
Asia,Southeast Asia
Asia,Middle East
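
To see the effect of `distinct` on a table small enough to check by hand, here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for the PostgreSQL connection, and a `city` table holding only the ten sample cities (with their districts) that appear in the outputs on this page.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; the rows are the
# ten sample cities and districts shown in this page's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table city (name text, countrycode text, district text)")
conn.executemany(
    "insert into city values (?, ?, ?)",
    [
        ("Kabul", "AFG", "Kabol"), ("Qandahar", "AFG", "Qandahar"),
        ("Herat", "AFG", "Herat"), ("Mazar-e-Sharif", "AFG", "Balkh"),
        ("Amsterdam", "NLD", "Noord-Holland"), ("Rotterdam", "NLD", "Zuid-Holland"),
        ("Haag", "NLD", "Zuid-Holland"), ("Utrecht", "NLD", "Utrecht"),
        ("Eindhoven", "NLD", "Noord-Brabant"), ("Tilburg", "NLD", "Noord-Brabant"),
    ],
)

# Without distinct, one output row per input row.
all_codes = conn.execute("select countrycode from city").fetchall()

# With distinct, duplicates of the single selected column collapse.
codes = conn.execute(
    "select distinct countrycode from city order by countrycode"
).fetchall()

# With two columns, rows are duplicates only if both columns match.
pairs = conn.execute("select distinct countrycode, district from city").fetchall()

print(len(all_codes))  # 10
print(codes)           # [('AFG',), ('NLD',)]
print(len(pairs))      # 8
```

The ten rows collapse to two country codes, but to eight (countrycode, district) pairs, because only Zuid-Holland and Noord-Brabant each occur twice.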


### Translating a query

When translating a query, the conditions from the query become part of the SQL where clause:

```sql
select column1, column2
from relation
where condition1 and condition2 or condition3
```

Let's translate a specific example:

```python
city.query("population > 1000000").project(['name','population'])
```

Translating this into SQL becomes:

```sql
select name, population
from city
where population > 1000000
```

The cell below applies the same condition to the `country` relation rather than to `city`.


In [29]:
%%sql

select name, population
from country
where population > 1000000
limit 10


10 rows affected.


name,population
Afghanistan,22720000
Netherlands,15864000
Albania,3401200
Algeria,31471000
Angola,12878000
United Arab Emirates,2441000
Argentina,37032000
Armenia,3520000
Australia,18886000
Azerbaijan,7734000
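
The template above combines conditions with both `and` and `or`. In SQL, `and` binds more tightly than `or`, so parentheses can change the result. The sketch below tries both forms. It assumes Python's built-in `sqlite3` module in place of the PostgreSQL connection, and a `country` table containing only the ten rows from the first output on this page.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; the rows are the
# ten countries from the first output on this page.
conn = sqlite3.connect(":memory:")
conn.execute("create table country (name text, population integer, surfacearea real)")
conn.executemany(
    "insert into country values (?, ?, ?)",
    [
        ("Afghanistan", 22720000, 652090.0), ("Netherlands", 15864000, 41526.0),
        ("Netherlands Antilles", 217000, 800.0), ("Albania", 3401200, 28748.0),
        ("Algeria", 31471000, 2381740.0), ("American Samoa", 68000, 199.0),
        ("Andorra", 78000, 468.0), ("Angola", 12878000, 1246700.0),
        ("Anguilla", 8000, 96.0), ("Antigua and Barbuda", 68000, 442.0),
    ],
)

# Read as: (population > 1000000 and surfacearea < 100000) or name = 'Andorra'
default_precedence = [r[0] for r in conn.execute("""
    select name from country
    where population > 1000000 and surfacearea < 100000 or name = 'Andorra'
    order by name
""")]

# Parentheses force the "or" to be evaluated first.
with_parentheses = [r[0] for r in conn.execute("""
    select name from country
    where population > 1000000 and (surfacearea < 100000 or name = 'Andorra')
    order by name
""")]

print(default_precedence)  # ['Albania', 'Andorra', 'Netherlands']
print(with_parentheses)    # ['Albania', 'Netherlands']
```

Andorra appears only in the first result: its population is below the threshold, so it survives only when the `or` is applied last.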


### Translating a sort

The sort operator translates directly to the ``order by`` clause of the select statement.

```sql
select column1, column2
from relation
where condition1
order by column1, column2 [optional desc]
```

The optional `desc` keyword reverses the sort order for the column it follows; without it, each column is sorted in ascending order.

In [30]:
%%sql

select name, population
from country
where population > 1000000 
order by population desc
limit 10

10 rows affected.


name,population
China,1277558000
India,1013662000
United States,278357000
Indonesia,212107000
Brazil,170115000
Pakistan,156483000
Russian Federation,146934000
Bangladesh,129155000
Japan,126714000
Nigeria,111506000
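
When sorting on more than one column, `desc` applies only to the column it follows. The sketch below shows this. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and a `city` table with only the ten sample cities that appear in the outputs on this page.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; the rows are the
# ten sample cities shown in this page's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table city (name text, countrycode text, population integer)")
conn.executemany(
    "insert into city values (?, ?, ?)",
    [
        ("Kabul", "AFG", 1780000), ("Qandahar", "AFG", 237500),
        ("Herat", "AFG", 186800), ("Mazar-e-Sharif", "AFG", 127800),
        ("Amsterdam", "NLD", 731200), ("Rotterdam", "NLD", 593321),
        ("Haag", "NLD", 440900), ("Utrecht", "NLD", 234323),
        ("Eindhoven", "NLD", 201843), ("Tilburg", "NLD", 193238),
    ],
)

# countrycode descending, then population ascending within each country.
mixed = [r[0] for r in conn.execute(
    "select name from city order by countrycode desc, population"
)]

# Both columns descending: desc has to be repeated for each column.
both_desc = [r[0] for r in conn.execute(
    "select name from city order by countrycode desc, population desc"
)]

print(mixed[:3])      # ['Tilburg', 'Eindhoven', 'Utrecht']
print(both_desc[:3])  # ['Amsterdam', 'Rotterdam', 'Haag']
```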


### Translating a groupby

The groupby is the first operator that is a bit trickier to translate:

1.  Every column in the select that is not inside an aggregate must be a grouping column, listed in the group by clause. Other columns may not be part of the select.
2.  Aggregate operators may also be part of the select, or they may be used in the order by clause.

```python
country.groupby(['region']).count('name')
```

```sql
select region, count(*)
from country
group by region
```

We can restrict the count to the countries of a single continent by filtering before grouping:

```python
country.query("continent == 'Asia'").groupby(['region']).count('name')
```

In SQL:

```sql
select region, count(*)
from country
where continent = 'Asia'
group by region
```


In [31]:
%%sql

select region, count(*)
from country
where continent = 'Asia'
group by region

4 rows affected.


region,count
Middle East,18
Southern and Central Asia,14
Southeast Asia,11
Eastern Asia,8
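
The two groupby rules can be checked on a small table. The sketch below groups by one column, selects only that column plus aggregates, and sorts by an aggregate. It assumes Python's built-in `sqlite3` module in place of PostgreSQL, and a `city` table with only the ten sample cities from the outputs on this page, so the counts are for the sample and not for the full database.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; the rows are the
# ten sample cities shown in this page's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table city (name text, countrycode text, population integer)")
conn.executemany(
    "insert into city values (?, ?, ?)",
    [
        ("Kabul", "AFG", 1780000), ("Qandahar", "AFG", 237500),
        ("Herat", "AFG", 186800), ("Mazar-e-Sharif", "AFG", 127800),
        ("Amsterdam", "NLD", 731200), ("Rotterdam", "NLD", 593321),
        ("Haag", "NLD", 440900), ("Utrecht", "NLD", 234323),
        ("Eindhoven", "NLD", 201843), ("Tilburg", "NLD", 193238),
    ],
)

# The select list holds the grouping column plus aggregates only;
# the order by clause sorts on an aggregate.
rows = conn.execute("""
    select countrycode, count(*) as cities, max(population) as largest
    from city
    group by countrycode
    order by count(*) desc
""").fetchall()

print(rows)  # [('NLD', 6, 731200), ('AFG', 4, 1780000)]
```

One caveat about the stand-in: SQLite tolerates a select column that is neither grouped nor aggregated, while PostgreSQL reports an error for it, so this sketch cannot demonstrate the failure case.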


### Translating a groupby followed by a query

Sometimes you want to make a further query after you have the results of a groupby. For example, calculate the total surface area of each region in Asia, and report the regions whose total surface area is more than 12000000.

```python
country.query("continent == 'Asia'").groupby(['region']).sum('surfacearea').query("sum_surfacearea > 12000000")
```

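Translating this to SQL requires both a where clause and a having clause. The where clause filters individual rows before they are grouped, and the having clause filters the groups after the aggregates have been computed.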

```sql
select region, sum(surfacearea)
from country
where continent = 'Asia'
group by region
having sum(surfacearea) > 12000000
```

The cell below uses a lower threshold of 10791100. Its output shows that no region in Asia has a total above 12000000, so the statement above would return no rows.


In [32]:
%%sql

select region, sum(surfacearea)
from country
where continent = 'Asia'
group by region
having sum(surfacearea) > 10791100


2 rows affected.


region,sum
Southern and Central Asia,10791100.0
Eastern Asia,11774500.0
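
The difference between filtering rows with where and filtering groups with having is easiest to see when the same having condition is run with and without a where clause. This is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL and a `city` table that holds only the ten sample cities from the outputs on this page; the thresholds are chosen for this sample only.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; the rows are the
# ten sample cities shown in this page's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table city (name text, countrycode text, population integer)")
conn.executemany(
    "insert into city values (?, ?, ?)",
    [
        ("Kabul", "AFG", 1780000), ("Qandahar", "AFG", 237500),
        ("Herat", "AFG", 186800), ("Mazar-e-Sharif", "AFG", 127800),
        ("Amsterdam", "NLD", 731200), ("Rotterdam", "NLD", 593321),
        ("Haag", "NLD", 440900), ("Utrecht", "NLD", 234323),
        ("Eindhoven", "NLD", 201843), ("Tilburg", "NLD", 193238),
    ],
)

# where drops small cities first, so they never reach the sums.
where_then_having = conn.execute("""
    select countrycode, sum(population)
    from city
    where population > 200000
    group by countrycode
    having sum(population) > 2100000
    order by countrycode
""").fetchall()

# Without where, every city contributes to its group's sum.
having_only = conn.execute("""
    select countrycode, sum(population)
    from city
    group by countrycode
    having sum(population) > 2100000
    order by countrycode
""").fetchall()

print(where_then_having)  # [('NLD', 2201587)]
print(having_only)        # [('AFG', 2332100), ('NLD', 2394825)]
```

With the where clause, the AFG group sums to only 2017500 and is removed by having; without it, both groups pass.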


### Translating a cartesian product

Translating a cartesian product is easy, but remember that a bare cartesian product is not usually what you want.

```python
country.cartesian_product(city).query("countrycode == code")
```

This translates into:

```sql
select *
from city, country
where countrycode = code
```

Let's first try a query without a where clause:

In [33]:
%%sql

select city.name, countrycode, country.name, code
from country, city
limit 10

10 rows affected.


name,countrycode,name_1,code
Kabul,AFG,Afghanistan,AFG
Kabul,AFG,Netherlands,NLD
Kabul,AFG,Netherlands Antilles,ANT
Kabul,AFG,Albania,ALB
Kabul,AFG,Algeria,DZA
Kabul,AFG,American Samoa,ASM
Kabul,AFG,Andorra,AND
Kabul,AFG,Angola,AGO
Kabul,AFG,Anguilla,AIA
Kabul,AFG,Antigua and Barbuda,ATG


You would probably agree that this is a strange-looking result. Why is Kabul on the same row as Antigua and Barbuda? Without a where clause, every city is paired with every country. The where clause keeps only the rows in which the city is paired with its own country, that is, where code = countrycode.

In [34]:
%%sql

select city.name, countrycode, country.name, code
from country, city
where code = countrycode
limit 10


10 rows affected.


name,countrycode,name_1,code
Kabul,AFG,Afghanistan,AFG
Qandahar,AFG,Afghanistan,AFG
Herat,AFG,Afghanistan,AFG
Mazar-e-Sharif,AFG,Afghanistan,AFG
Amsterdam,NLD,Netherlands,NLD
Rotterdam,NLD,Netherlands,NLD
Haag,NLD,Netherlands,NLD
Utrecht,NLD,Netherlands,NLD
Eindhoven,NLD,Netherlands,NLD
Tilburg,NLD,Netherlands,NLD
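
Counting rows makes the size of a cartesian product concrete. The sketch below compares the row count with and without the where clause. It assumes Python's built-in `sqlite3` module in place of PostgreSQL, with a three-row `country` table and a ten-row `city` table built from values that appear in the outputs on this page.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; both tables hold
# only a few sample rows taken from this page's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table country (code text, name text)")
conn.execute("create table city (name text, countrycode text)")
conn.executemany(
    "insert into country values (?, ?)",
    [("AFG", "Afghanistan"), ("NLD", "Netherlands"), ("DZA", "Algeria")],
)
conn.executemany(
    "insert into city values (?, ?)",
    [
        ("Kabul", "AFG"), ("Qandahar", "AFG"), ("Herat", "AFG"),
        ("Mazar-e-Sharif", "AFG"), ("Amsterdam", "NLD"), ("Rotterdam", "NLD"),
        ("Haag", "NLD"), ("Utrecht", "NLD"), ("Eindhoven", "NLD"),
        ("Tilburg", "NLD"),
    ],
)

# Every country paired with every city: 3 * 10 rows.
product_rows = conn.execute("select count(*) from country, city").fetchone()[0]

# Only the pairs where the city belongs to the country survive.
# Algeria has no cities in this sample, so it contributes no rows.
matched_rows = conn.execute(
    "select count(*) from country, city where code = countrycode"
).fetchone()[0]

print(product_rows)  # 30
print(matched_rows)  # 10
```

The unfiltered product grows with the product of the two table sizes, which is why the where clause matters even more on the full tables.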


In [35]:
%%sql

select *
from city, country
where countrycode = code
limit 10

10 rows affected.


id,name,countrycode,district,population,code,name_1,continent,region,surfacearea,indepyear,population_1,lifeexpectancy,gnp,gnpold,localname,governmentform,headofstate,capital,code2
1,Kabul,AFG,Kabol,1780000,AFG,Afghanistan,Asia,Southern and Central Asia,652090.0,1919,22720000,45.9,5976.0,,Afganistan/Afqanestan,Islamic Emirate,Mohammad Omar,1,AF
2,Qandahar,AFG,Qandahar,237500,AFG,Afghanistan,Asia,Southern and Central Asia,652090.0,1919,22720000,45.9,5976.0,,Afganistan/Afqanestan,Islamic Emirate,Mohammad Omar,1,AF
3,Herat,AFG,Herat,186800,AFG,Afghanistan,Asia,Southern and Central Asia,652090.0,1919,22720000,45.9,5976.0,,Afganistan/Afqanestan,Islamic Emirate,Mohammad Omar,1,AF
4,Mazar-e-Sharif,AFG,Balkh,127800,AFG,Afghanistan,Asia,Southern and Central Asia,652090.0,1919,22720000,45.9,5976.0,,Afganistan/Afqanestan,Islamic Emirate,Mohammad Omar,1,AF
5,Amsterdam,NLD,Noord-Holland,731200,NLD,Netherlands,Europe,Western Europe,41526.0,1581,15864000,78.3,371362.0,360478.0,Nederland,Constitutional Monarchy,Beatrix,5,NL
6,Rotterdam,NLD,Zuid-Holland,593321,NLD,Netherlands,Europe,Western Europe,41526.0,1581,15864000,78.3,371362.0,360478.0,Nederland,Constitutional Monarchy,Beatrix,5,NL
7,Haag,NLD,Zuid-Holland,440900,NLD,Netherlands,Europe,Western Europe,41526.0,1581,15864000,78.3,371362.0,360478.0,Nederland,Constitutional Monarchy,Beatrix,5,NL
8,Utrecht,NLD,Utrecht,234323,NLD,Netherlands,Europe,Western Europe,41526.0,1581,15864000,78.3,371362.0,360478.0,Nederland,Constitutional Monarchy,Beatrix,5,NL
9,Eindhoven,NLD,Noord-Brabant,201843,NLD,Netherlands,Europe,Western Europe,41526.0,1581,15864000,78.3,371362.0,360478.0,Nederland,Constitutional Monarchy,Beatrix,5,NL
10,Tilburg,NLD,Noord-Brabant,193238,NLD,Netherlands,Europe,Western Europe,41526.0,1581,15864000,78.3,371362.0,360478.0,Nederland,Constitutional Monarchy,Beatrix,5,NL
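
The `name_1` and `population_1` headers in the output above appear to arise because both relations have a column called `name` and a column called `population`. Renaming columns with `as` gives each one an unambiguous header. This is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL and small sample tables built from values on this page; the aliases `cityname` and `citypopulation` are illustrative choices, not names from the database.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; both tables hold
# only a few sample rows taken from this page's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table country (code text, name text, population integer)")
conn.execute("create table city (name text, countrycode text, population integer)")
conn.executemany(
    "insert into country values (?, ?, ?)",
    [("AFG", "Afghanistan", 22720000), ("NLD", "Netherlands", 15864000)],
)
conn.executemany(
    "insert into city values (?, ?, ?)",
    [
        ("Kabul", "AFG", 1780000), ("Qandahar", "AFG", 237500),
        ("Amsterdam", "NLD", 731200), ("Rotterdam", "NLD", 593321),
    ],
)

# Qualify each column with its relation and rename it with "as".
cur = conn.execute("""
    select city.name as cityname,
           country.name as countryname,
           city.population as citypopulation
    from city, country
    where countrycode = code
    order by city.population desc
    limit 3
""")
columns = [d[0] for d in cur.description]  # result column headers
rows = cur.fetchall()

print(columns)  # ['cityname', 'countryname', 'citypopulation']
print(rows[0])  # ('Kabul', 'Afghanistan', 1780000)
```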
