Merge pull request #53 from PhilReedData/patch-4
Fixed many typos in episode 5
tobyhodges committed Apr 26, 2023
2 parents 6ee3062 + 980c612 commit bc6be6b
Showing 1 changed file with 22 additions and 21 deletions.
43 changes: 22 additions & 21 deletions _episodes/05-creating-new-columns.md
@@ -14,8 +14,8 @@ objectives:
- "Use SQL syntax to conditionally create new values"
- "Use SQL syntax to create a new column of ‘binned’ values"
keypoints:
- "New result columns can be created using arithmetic operators or builtin functions"
- "New columns have to be given names or Alias'"
- "New result columns can be created using arithmetic operators or built-in functions"
- "New columns have to be given names or aliases"
- "The `CASE` coding structure can be used to create new columns"
- "The new columns are only in the query results. The original table is not changed"
---
@@ -45,9 +45,9 @@ It is always the case that if you create a column in the results of the query it
SQL will create one for it. Other relational databases take different approaches to the problem and will pseudo-randomly name the new columns for you with such things as '_c0'.
SQLite uses the expression you used to create the column name.

## Renaming columns using alias'
## Renaming columns using aliases

Given that creating new columns is so commonly done, SQL does provide a mechansim for giving them names of your choice. This is done using the **AS** clause
Given that creating new columns is so commonly done, SQL does provide a mechanism for giving them names of your choice. This is done using the **AS** clause

~~~
SELECT D02_total_plot * 2.4701 AS D02_total_plot_converted
@@ -64,13 +64,13 @@ This may seem a bit strange for columns which had no real name in the first plac

## Using built-in functions to create new values

In addition to using simple arithmetic operations to create new columns, you can also use some of the SQLite builtin functions.
Full details of the available builtin functions are available from the SQLite.org website [here](https://sqlite.org/lang_corefunc.html#instr).
In addition to using simple arithmetic operations to create new columns, you can also use some of the SQLite built-in functions.
Full details of the available built-in functions are available from the SQLite.org website [here](https://sqlite.org/lang_corefunc.html#instr).

We will look at some of the arithmetic and statistical functions when we deal with aggregations in a later lesson.

You may have noticed in the output from are last query that the number of decimal places can change from one row to another. In order to make the output
more tidy, we may wish to always produce the same number of decimal places , e.g. 2. We can do this using the `ROUND` function.
more tidy, we may wish to always produce the same number of decimal places, e.g. 2. We can do this using the `ROUND` function.

The `ROUND` function works in a similar way as its spreadsheet equivalent, you specify the value you wish to round and the required number of decimal places.

@@ -115,10 +115,11 @@ sometimes with different names.


`instr` can be used to check a character or string of characters occurs within another string.
`substr` can be used to extract a portion of a string based on a startinfg position and the number of characters required.
`substr` can be used to extract a portion of a string based on a starting position and the number of characters required.


In the Farms table, the three columns A01_interview_date, A04_start and A05_end are all recognisable as a dates with the A04_srtart and A05_end also including times.
These last twoo are automatically generated by the eSurvey software when the data is collected. I.e. they are automatically entered. The A01_interview_date however is manually input.
In the Farms table, the three columns A01_interview_date, A04_start and A05_end are all recognisable as a dates with the A04_start and A05_end also including times.
These last two are automatically generated by the eSurvey software when the data is collected, i.e. they are automatically entered. The A01_interview_date however is manually input.
In all three cases however SQLite thinks that they are all just strings of characters.
We can confirm this by selecting the `Database Structure` tab and expanding the `Farms` entry and notice that the data type for all three columns is listed as 'TEXT'

@@ -144,9 +145,9 @@ ORDER BY A01_interview_date
~~~
{: .sql}

NB. we are using the UK and European representation of dates in this dicussion. The same issue will occur if you were using US date formats
NB. we are using the UK and European representation of dates in this discussion. The same issue will occur if you were using US date formats.

It is unlikely that the results of the above query is what you wanted. '01/07/2017' has been ordered before '01/12/2016'. This is because the sorting process treats the dates as simple strings
It is unlikely that the result of the above query is what you wanted. '01/07/2017' has been ordered before '01/12/2016'. This is because the sorting process treats the dates as simple strings
and a '0' in the month position is less than a '1' in the months position.

In order to sort the A01_interview_date column into date order we need to make SQLite see it as a date.
@@ -166,7 +167,7 @@ Although it doesn't produce an error, the attempted conversion of A01_interview_

![Conversion failure](../fig/SQL_05_dates_01.png)

On the otherhand the A04_start conversion did work. The problem is that the date function expects the string to be converted to be in a certain format.; like ISO-8601.
On the other hand the A04_start conversion did work. The problem is that the date function expects the string to be converted to be in a certain format like ISO-8601.

We need to change the way A01_interview_date looks. Instead of dd/mm/yyyy we need yyyy-mm-dd. To do this we can use the `substr` function along with the `||`
operator which is used to concatenate strings together.
@@ -183,7 +184,7 @@ FROM Farms
~~~
{: .sql}

But in order to convert it into a date we need all three parts concatenated together along with '-' to seperate the parts
But in order to convert it into a date we need all three parts concatenated together along with '-' to separate the parts.

~~~
SELECT A01_interview_date,
@@ -246,8 +247,8 @@ ORDER BY converted_date
> {: .solution}
{: .challenge}

In the Spreadsheets lesson we discussed that splitting dates into year month and day components was a good way of making
the meaning of the date parts un-ambiguous. Our first SQL query for the date conversion did this;
In the Spreadsheets lesson we discussed that splitting dates into year, month and day components was a good way of making
the meaning of the date parts unambiguous. Our first SQL query for the date conversion did this;

~~~
SELECT A01_interview_date,
@@ -259,8 +260,8 @@ FROM Farms
~~~
{: .sql}

Having the date components split in this way does not prevent us from sorting them. We just need to specify all of the columns we want to sort byin the order
in which we want them sorted
Having the date components split in this way does not prevent us from sorting them. We just need to specify all of the columns we want to sort by in the order
in which we want them sorted.

~~~
SELECT A01_interview_date,
@@ -274,7 +275,7 @@ ORDER BY year, month, day
{: .sql}

By default the `ORDER BY` clause will sort in ascending order, smallest to
biggist, we can make this explicit by usingthe `ASC` keyword. Or if we want to
biggest; we can make this explicit by usingthe `ASC` keyword. Or if we want to
sort in descending order we can use the `DESC` keyword.

~~~
@@ -312,11 +313,11 @@ There is a more general form which allows to to perform any kind of test.

## Using SQL syntax to create ‘binned’ values

It is often the case that we wish to convert a continous variable into a discrete factor type variable.
It is often the case that we wish to convert a continuous variable into a discrete factor type variable.

We can use a `CASE` statement to create this type of effect.

The column `A11_years_farm` in the Farms table is an indication of how many years the respondent has been on the farm. The values are in years and range from 1 tp 60.
The column `A11_years_farm` in the Farms table is an indication of how many years the respondent has been on the farm. The values are in years and range from 1 to 60.
Instead of using individual years we may want to group these values into ranges like 1-10, 11-20 etc. We can do this using a `CASE`
statement as part of the `SELECT` clause
