# Understanding Data Types

In a SQL database, each column in a table can hold only one data type, which you define in the `CREATE TABLE` statement by declaring the data type after the column name. In the following simple example table, you will find columns with three different data types: a date, an integer, & text.

```
CREATE TABLE eagle_watch (
    observation_date date,
    eagles_seen integer,
    notes text
);
```

In this table named `eagle_watch` (for a hypothetical inventory of bald eagles), we declare the `observation_date` column to hold date values by adding the `date` type declaration after its name. Similarly, we set `eagles_seen` to hold whole numbers with the `integer` type declaration & declare `notes` to hold characters via the `text` type.
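To see those declarations at work, here's a minimal sketch that adds one row to `eagle_watch`. The observation itself is made up for illustration. `CREATE TABLE IF NOT EXISTS` is there only so the snippet runs on its own and does nothing if you already created the table. The commented-out statement shows the database refusing a value that doesn't match the column's type.

```sql
-- Creates eagle_watch only if it doesn't already exist
CREATE TABLE IF NOT EXISTS eagle_watch (
    observation_date date,
    eagles_seen integer,
    notes text
);

-- Hypothetical observation: each value must fit its column's declared type
INSERT INTO eagle_watch (observation_date, eagles_seen, notes)
VALUES ('2023-02-15', 3, 'Two adults and one juvenile near the river');

-- Uncommenting the next statement fails with an "invalid input syntax"
-- error, because 'three' can't be read as an integer:
-- INSERT INTO eagle_watch (observation_date, eagles_seen, notes)
-- VALUES ('2023-02-16', 'three', 'Counted by sound only');
```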

These data types fall into the three categories you'll encounter most:

1. **Characters**: any character or symbol
2. **Numbers**: includes whole numbers & fractions
3. **Dates & Times**: temporal information

We'll look at each data type in depth, noting along the way whether it's part of the ANSI SQL standard or specific to PostgreSQL. You can find an in-depth look at where PostgreSQL deviates from the SQL standard [here](https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard).

---

# Understanding Characters

*Character string types* are general-purpose types suitable for any combination of text, numbers, & symbols. Character types include the following:

1. **char(n)**
   - A fixed-length column where the length is specified by `n`. A column set at `char(20)` stores 20 characters per row regardless of how many characters you insert. If you insert fewer than 20 characters in any row, PostgreSQL pads the rest of the column with spaces. This type is part of standard SQL & can also be specified with the longer name `character(n)`.
2. **varchar(n)**
   - A variable-length column where the *maximum* length is specified by `n`. If you insert fewer characters than the maximum, PostgreSQL will not store extra spaces. For example, the string `blue` will take four characters of storage, whereas the string `123` will take three. In large databases, this practice saves considerable space. This type is included in standard SQL & can also be specified using its longer name `character varying(n)`.
3. **text**
   - A variable-length column of unlimited length. The `text` type is not part of the SQL standard, but there are similar implementations in other database systems.

There is no substantial difference in performance among these three types, but the flexibility & potential space savings of `varchar` & `text` seem to give them an advantage. However, if you search discussions online, some users suggest that defining a column that will always have the same number of characters with `char` is a good way to signal what data it should contain. For example, you might see `char(2)` used for US state postal abbreviations.
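Here is a minimal sketch of that `char(2)` convention, using a hypothetical `state_codes` table. The `char(2)` column signals that every abbreviation is exactly two characters. Both `char(n)` & `varchar(n)` reject an inserted value longer than `n`, unless the extra characters are all spaces.

```sql
-- Hypothetical lookup table for US state postal abbreviations
CREATE TABLE state_codes (
    state_abbrev char(2),     -- always exactly two characters
    state_name varchar(50)    -- varies in length, up to 50 characters
);

INSERT INTO state_codes (state_abbrev, state_name)
VALUES ('NY', 'New York'),
       ('CA', 'California');

-- Uncommenting this fails with
-- ERROR:  value too long for type character(2)
-- INSERT INTO state_codes (state_abbrev, state_name)
-- VALUES ('NYC', 'New York City');
```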

To see these three character types in action, run the following SQL script. It will build & load a simple table, then export it to a text file on your computer.

```
CREATE TABLE char_data_types (
    char_column char(10),
    varchar_column varchar(10),
    text_column text
);

INSERT INTO char_data_types
VALUES ('abc', 'abc', 'abc'),
       ('defghi', 'defghi', 'defghi');

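-- Change C:/YourDirectory/ to a folder that already exists on the
-- machine where the PostgreSQL server runs
COPY char_data_types TO 'C:/YourDirectory/typetest.txt'
WITH (FORMAT CSV, HEADER, DELIMITER '|');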
```

We define three character columns of different types & insert two rows, each repeating the same string across all three columns. Unlike the `INSERT INTO` statements from previous lessons, this one doesn't specify the column names. If each `VALUES` row supplies one value per column in the table, the database assumes you're inserting values in the order in which the columns were defined.

Next, we use the PostgreSQL `COPY` keyword to export the data to a text file named *typetest.txt* in a directory you specify. For example, the path to the SQL folder where we downloaded the course materials for this lesson is */Users/jiehengyu/Desktop/SQL/Chapter_04*. The directory must already exist, because PostgreSQL won't create it for you.

<img src = "Character Data Types in Action.png" width = "600" style = "margin:auto"/>

In PostgreSQL, `COPY table_name FROM` is the import function, & `COPY table_name TO` is the export function. We'll cover these in more detail in the next lesson. All you need to know now is that the `WITH` keyword options will format the data in the file with each column separated by a *pipe* (|) character. That way, you can easily see where spaces fill out the unused portions of the `char` column.

To see the output, open *typetest.txt* using a text editor. The contents should look like this:

```
char_column|varchar_column|text_column
abc       |abc|abc
defghi    |defghi|defghi
```

Even though we specified 10 characters for both the `char` & `varchar` columns, only the `char` column outputs 10 characters in both rows, padding unused characters with spaces. The `varchar` & `text` columns store only the characters you inserted.
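You can also confirm the padding from inside PostgreSQL without opening the file. `octet_length()` reports how many bytes a value holds. The PostgreSQL documentation notes that the version of this function that accepts `char` directly does not strip trailing spaces, so the padding shows up in the result. This sketch casts literal strings instead of reading `char_data_types`, so it runs on its own. The numbers are byte counts of the values themselves, not the full on-disk row size.

```sql
SELECT octet_length('abc'::char(10))    AS char_bytes,     -- 10: padded with spaces
       octet_length('abc'::varchar(10)) AS varchar_bytes,  -- 3
       octet_length('abc'::text)        AS text_bytes;     -- 3
```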

Again, there's no real performance difference among the three types, although this example shows that `char` can consume more storage space than needed. A few unused spaces in each column might seem negligible, but multiply that over millions of rows in dozens of tables & you'll soon wish you had been more economical.

---

# Understanding Numbers

Number columns hold various types of (you guessed it) numbers, but that's not all: they also allow you to perform calculations on those numbers. That's an important distinction from numbers you store as strings in a character column, which can't be added, multiplied, divided, or used in any other math operation. Also, numbers stored as characters sort differently than numbers stored as numbers, so if you're doing math or the numeric order is important, use number types.
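The sorting difference is easy to see with the same three values stored once as text & once as integers. This sketch uses inline `VALUES` lists rather than a real table, so it creates nothing. Text sorts character by character, which puts `'10'` before `'2'`.

```sql
-- Same three values, sorted as text
SELECT string_agg(v, ', ' ORDER BY v) AS sorted_as_text
FROM (VALUES ('1'), ('2'), ('10')) AS t(v);          -- 1, 10, 2

-- ...and sorted as integers
SELECT string_agg(v::text, ', ' ORDER BY v) AS sorted_as_integer
FROM (VALUES (1), (2), (10)) AS t(v);                -- 1, 2, 10

-- Math fails on text: uncommenting this raises
-- ERROR:  operator does not exist: text + text
-- SELECT '1'::text + '2'::text;
```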

The SQL number types include the following:

1. **Integers**: whole numbers, both positive & negative
2. **Fixed-point & floating-point**: two formats of fractions of whole numbers

We'll look at each type separately.

## Using Integers

The integer data types are the most common number types you'll find when exploring a SQL database. These are *whole numbers*, both positive & negative, including zero. Think of all the places integers appear in life: your street or apartment number, the serial number on your refrigerator, the number on a raffle ticket.

The SQL standard provides three integer types: `smallint`, `integer`, & `bigint`. The difference between the three types is the maximum size of the numbers they can hold. The table below shows the upper & lower limits of each, as well as how much storage each requires in bytes.

|Data type|Storage size|Range|
|:---|:---|:---|
|`smallint`|2 bytes|-32768 to +32767|
|`integer`|4 bytes|-2147483648 to +2147483647|
|`bigint`|8 bytes|-9223372036854775808 to +9223372036854775807|

The `bigint` type will cover just about any requirement you'll ever have with a number column, though it eats up the most storage. Its use is a must if you're working with numbers larger than about 2.1 billion, but you also can easily make it your go-to default & never worry about not being able to fit a number in the column. However, if you're confident numbers will remain within the `integer` limit, that type is a good choice because it doesn't consume as much space as `bigint` (a concern when dealing with millions of data rows).

When you know that values will remain constrained, `smallint` makes sense: days of the month or years are good examples. The `smallint` type uses half the storage of `integer`, so it's a smart design choice if the column values will always fit within its range.

If you try to insert a number into any of these columns that is outside its range, the database will stop the operation & return an `out of range` error.
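To watch those limits in action, this minimal sketch uses a hypothetical `int_limits` table. It stores each type's maximum value successfully, then shows (commented out) the insert that goes one past the `smallint` limit.

```sql
-- Hypothetical table with one column of each integer type
CREATE TABLE int_limits (
    small_col smallint,
    int_col integer,
    big_col bigint
);

-- Each value is exactly its column type's maximum, so this succeeds
INSERT INTO int_limits (small_col, int_col, big_col)
VALUES (32767, 2147483647, 9223372036854775807);

-- One past the smallint maximum; uncommenting this fails with
-- ERROR:  smallint out of range
-- INSERT INTO int_limits (small_col) VALUES (32768);
```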

## Auto-Incrementing Integers

Sometimes, it's helpful to create a column that holds integers that *auto-increment* each time you add a row to the table. For example, you might use an auto-incrementing column to create a unique ID number for each row, often used as the table's *primary key*. Each row then has its own ID that other tables in the database can reference.

With PostgreSQL, you have two ways to auto-increment an integer column. One is the *serial* data type, a PostgreSQL-specific implementation of the ANSI SQL standard for auto-numbered *identity columns*. The other is the ANSI SQL standard `IDENTITY` keyword. Let's start with serial.

### Auto-Incrementing with Serial

In the lesson when we made the `teachers` table, we created an `id` column with the declaration of `bigserial`. The types `smallserial`, `serial`, & `bigserial` are not so much true data types as special *implementations* of the corresponding `smallint`, `integer`, & `bigint` types. When you add a column with a serial type, PostgreSQL will auto-increment the value each time you insert a row, starting with 1, up to the maximum of the corresponding integer type.
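The PostgreSQL documentation describes the serial types as a notational convenience. A serial column is built from a sequence plus a column default that pulls the sequence's next value. The sketch below spells that out by hand for a hypothetical `people_expanded` table. It behaves roughly like `id serial`, though the objects PostgreSQL generates for a real serial column may differ in details.

```sql
-- A sequence that hands out 1, 2, 3, ... as integer values
CREATE SEQUENCE people_expanded_id_seq AS integer;

CREATE TABLE people_expanded (
    -- Each new row takes the sequence's next value unless you supply one
    id integer NOT NULL DEFAULT nextval('people_expanded_id_seq'),
    person_name varchar(100)
);

-- Tie the sequence to the column so dropping the column drops the sequence
ALTER SEQUENCE people_expanded_id_seq OWNED BY people_expanded.id;

INSERT INTO people_expanded (person_name)
VALUES ('Ada Lovelace'), ('Grace Hopper');
```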

The table below shows the serial types & the ranges they cover.

|Data type|Storage size|Range|
|:---|:---|:---|
|`smallserial`|2 bytes|1 to 32767|
|`serial`|4 bytes|1 to 2147483647|
|`bigserial`|8 bytes|1 to 9223372036854775807|

To use a serial type on a column, declare it in the `CREATE TABLE` statement as you would an integer type. For example, you could create a table called `people` that has an `id` column equivalent in size to the `integer` data type:

```
CREATE TABLE people (
    id serial,
    person_name varchar(100)
);
```
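This sketch shows serial numbering in practice. It uses a hypothetical `people_serial` table with the same definition as `people` so it can run alongside it. It also demonstrates the permissiveness mentioned below: serial accepts an explicit `id` without complaint.

```sql
-- Same columns as people, under a hypothetical name
CREATE TABLE people_serial (
    id serial,
    person_name varchar(100)
);

-- Leave id out and PostgreSQL numbers the rows 1 and 2
INSERT INTO people_serial (person_name)
VALUES ('Ada Lovelace'), ('Grace Hopper');

-- serial also accepts an explicit id; the sequence isn't advanced,
-- so the next automatic id would still be 3
INSERT INTO people_serial (id, person_name) VALUES (100, 'Alan Turing');

SELECT id, person_name FROM people_serial ORDER BY id;
```

Because the manual insert doesn't move the sequence forward, automatic ids will eventually reach 100. At that point you get a duplicate, or a failed insert if the column is a primary key.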

### Auto-Incrementing with IDENTITY

As of version 10, PostgreSQL includes support for `IDENTITY`, the standard SQL implementation for auto-incrementing integers. The `IDENTITY` syntax is more verbose, but some database users prefer it for its cross-compatibility with other database systems (such as Oracle) & also because it has an option to prevent users from accidentally inserting values in the auto-incrementing column (which serial types will permit).

You can specify `IDENTITY` in two ways:

`GENERATED ALWAYS AS IDENTITY` tells the database to always fill the column with an auto-incremented value. A user cannot insert a value into the `id` column without manually overriding that setting. See the `OVERRIDING SYSTEM VALUE` section of the PostgreSQL `INSERT` documentation [here](https://www.postgresql.org/docs/current/sql-insert.html) for details.

`GENERATED BY DEFAULT AS IDENTITY` tells the database to fill the column with an auto-incremented value by default if the user does not supply one.
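Here is a minimal sketch of the `BY DEFAULT` variant, using a hypothetical `people_by_default` table. The database generates an `id` when you leave it out and accepts yours when you supply one, with no override keyword required.

```sql
-- Hypothetical table using the BY DEFAULT variant
CREATE TABLE people_by_default (
    id integer GENERATED BY DEFAULT AS IDENTITY,
    person_name varchar(100)
);

-- No id supplied, so the database generates 1
INSERT INTO people_by_default (person_name) VALUES ('Ada Lovelace');

-- An explicit id is accepted as-is
INSERT INTO people_by_default (id, person_name) VALUES (50, 'Grace Hopper');
```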

For now, we'll stick with the first option, using `ALWAYS`. To create a table called `people` that has an `id` column populated via `IDENTITY`, you would use this syntax:

```
CREATE TABLE people (
    id integer GENERATED ALWAYS AS IDENTITY,
    person_name varchar(100)
);
```

For the `id` data type, we use `integer` followed by the keywords `GENERATED ALWAYS AS IDENTITY`. Now, every time we insert a `person_name` value into the table, the database will fill the `id` column with an auto-incremented value.
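The sketch below exercises `ALWAYS` on a hypothetical `people_always` table that has the same definition as `people`. The rejected insert is commented out because its error stops the script. In recent PostgreSQL versions the error's hint points to `OVERRIDING SYSTEM VALUE`, though the exact message wording has varied across versions. The last insert uses that clause, as described in the `INSERT` documentation linked above.

```sql
-- Same definition as people, under a hypothetical name
CREATE TABLE people_always (
    id integer GENERATED ALWAYS AS IDENTITY,
    person_name varchar(100)
);

-- Omitting id lets the database generate 1 and 2
INSERT INTO people_always (person_name)
VALUES ('Ada Lovelace'), ('Grace Hopper');

-- Supplying an id is rejected; uncommenting this raises an error
-- INSERT INTO people_always (id, person_name) VALUES (100, 'Alan Turing');

-- The explicit override
INSERT INTO people_always (id, person_name)
OVERRIDING SYSTEM VALUE
VALUES (100, 'Alan Turing');

SELECT id, person_name FROM people_always ORDER BY id;
```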

Given its compatibility with the ANSI SQL standard, we'll use `IDENTITY` for the remainder of the course.

<img src = "Auto-Incrementing with IDENTITY.png" width = "600" style = "margin:auto"/>

## Using Decimal Numbers

*Decimals* represent a whole number plus a fraction of a whole number; the fraction is represented by digits following a *decimal point*. In a SQL database, they're handled by *fixed-point* & *floating-point* data types. For example, if the distance from my house to the nearest grocer is 6.7 miles, I could insert 6.7 into either a fixed-point or floating-point column with no complaint from PostgreSQL. The only difference is how the computer stores the data.

### Understanding Fixed-Point Numbers

The fixed-point type, also called the *arbitrary precision* type, is `numeric(precision, scale)`. You give the argument `precision` as the maximum number of digits to the left & right of the decimal point combined, & the argument `scale` as the number of digits allowable on the right of the decimal point. Alternatively, you can specify this type using `decimal(precision, scale)`. Both are part of the ANSI SQL standard. If you omit the scale value, the scale will be set to zero; in effect, that creates an integer.

For example, let's say we're working for the US National Weather Service, & we measure rainfall up to two decimal places. To record rainfall in a database using five digits total (the precision) & two digits maximum to the right of the decimal (the scale), we'd specify it as `numeric(5, 2)`. The database will always return two digits to the right of the decimal point, even if the number you enter has fewer, so you'll see values such as 1.47, 1.00, & 121.50.
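To see `numeric(5, 2)` pad, round, & reject values, cast a few illustrative rainfall figures. These are made-up values, not real weather data. A precision of 5 with a scale of 2 leaves room for three digits left of the decimal point.

```sql
-- Casting to numeric(5, 2): five digits total, two after the decimal point
SELECT 1.47::numeric(5, 2)    AS two_places,   -- 1.47
       1::numeric(5, 2)       AS padded,       -- 1.00
       121.5::numeric(5, 2)   AS padded_too,   -- 121.50
       3.14159::numeric(5, 2) AS rounded;      -- 3.14

-- Uncommenting this fails with "numeric field overflow": 1234.5 needs
-- four digits left of the decimal, but numeric(5, 2) leaves room for three
-- SELECT 1234.5::numeric(5, 2);
```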

### Understanding Floating-Point Types

The two floating-point types are `real` & `double precision`, both part of the SQL standard. The difference between the two is how much data they store. The `real` type allows precision to six decimal digits, & `double precision` to 15 decimal digits of precision, both of which include the number of digits on both sides of the point. These floating-point types are also called *variable-precision* types. The database stores the number in parts representing the digits & an exponent -- the location where the decimal point belongs. So, unlike `numeric`, where we specify fixed precision & scale, the decimal point in a given column can "float" depending on the number.
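One way to see that `real` keeps far fewer digits than `double precision` is to compare two integers that differ only in their eighth significant digit. On platforms where these types follow IEEE 754, which the PostgreSQL documentation says covers all currently supported ones, `real` holds 24 binary digits. That means it can't tell 16777217 (2^24 + 1) apart from 16777216.

```sql
-- real rounds 16777217 to 16777216; double precision keeps them distinct
SELECT 16777217::real = 16777216::real                         AS real_equal,    -- true
       16777217::double precision = 16777216::double precision AS double_equal;  -- false
```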

### Using Fixed- & Floating-Point Types

Each type has differing limits on the number of total digits, or precision, it can hold.

|Data type|Storage size|Storage type|Range|
|:---|:---|:---|:---|
|`numeric`, `decimal`|Variable|Fixed-point|Up to 131,072 digits before the decimal point; up to 16,383 digits after the decimal point|
|`real`|4 bytes|Floating-point|6 decimal digits precision|
|`double precision`|8 bytes|Floating-point|15 decimal digits precision|

To see how each of the three data types handles the same numbers, create a small table & insert a variety of test cases.

```
CREATE TABLE number_data_types (
    numeric_column numeric(20, 5),
    real_column real,
    double_column double precision
);

INSERT INTO number_data_types
VALUES (.7, .7, .7),
       (2.13579, 2.13579, 2.13579),
       (2.1357987654, 2.1357987654, 2.1357987654);

SELECT * FROM number_data_types;
```

We create a table with one column for each of the fractional data types & load three rows into the table. Each row repeats the same number across all three columns. When the last line of the script runs & we select everything from the table, we get the following:

<img src = "Number Data Types in Action.png" width = "600" style = "margin:auto"/>

Notice what happened. The `numeric` column, set with a scale of five, stores five digits after the decimal point whether or not you inserted that many. If fewer than five, it pads the rest with zeros. If more than five, it rounds them -- as with the third-row number with 10 digits after the decimal.

The `real` & `double precision` columns add no padding. On the third row, you see PostgreSQL's default behavior in those two columns, which is to output floating-point numbers using their shortest precise decimal representation rather than show the entire value.

### Running into Trouble with Floating-Point Math

If you're thinking, "Well, numbers stored as floating-point look just like numbers stored as fixed-point," tread cautiously. The way computers store floating-point numbers can lead to unintended mathematical errors. Look at what happens when we do some calculations on these numbers.

```
SELECT numeric_column * 10000000 AS fixed,
       real_column * 10000000 AS floating
FROM number_data_types
WHERE numeric_column = .7;
```

Here, we multiply the `numeric_column` & the `real_column` by 10 million & use a `WHERE` clause to filter for just the first row. We should get the same result for both calculations, right? Here's what the query returns:

<img src = "Rounding Issues with Float Columns.png" width = "600" style = "margin:auto"/>

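No wonder floating-point types are referred to as "inexact." The reason floating-point math produces such errors is that the computer attempts to squeeze lots of information into a finite number of bits. A value like 0.7 has no exact binary representation, so the `real` column stores only a close approximation, & multiplying by 10 million magnifies the tiny difference enough to see.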
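A classic way to trip over this is testing floating-point results for exact equality. The sketch below adds 0.1 & 0.2 in both families of types. The binary approximations in `double precision` don't sum to exactly the same value as the approximation of 0.3, while `numeric` gets it right.

```sql
SELECT 0.1::double precision + 0.2::double precision
           = 0.3::double precision                   AS float_sum_equals,    -- false
       0.1::numeric + 0.2::numeric = 0.3::numeric    AS numeric_sum_equals;  -- true
```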

The storage required by the `numeric` data type is variable, & depending on the precision & scale specified, `numeric` can consume considerably more space than the floating-point types. If you're working with millions of rows, it's worth considering whether you can live with relatively inexact floating-point math.
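To compare storage directly, `pg_column_size()` reports the number of bytes used to store an individual value. In this sketch, `real` & `double precision` always take 4 & 8 bytes. The `numeric` figure depends on how many digits the value has, so no fixed number is shown for it. Note that `real` also can't hold every digit of this particular value.

```sql
-- Bytes needed to store the same value as each type
SELECT pg_column_size(1234567.891::numeric(20, 5))    AS numeric_bytes,
       pg_column_size(1234567.891::real)              AS real_bytes,     -- 4
       pg_column_size(1234567.891::double precision)  AS double_bytes;   -- 8
```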

## Choosing Your Number Data Type

For now, consider the following guidelines when you're dealing with number data types:

1. **Use exact types for exact math**: If you're working with decimal data & need calculations to be exact (dealing with money, for example), choose `numeric` or its equivalent, `decimal`. Float types will save space, but use them only when exactness is not as important.
2. **Choose a big enough number type**: Unless you're designing a database to hold millions of rows, err on the side of bigger. When using `numeric` or `decimal`, set the precision large enough to accommodate the number of digits on both sides of the decimal point. With whole numbers, use `bigint` unless you're absolutely sure column values will be constrained to fit into the smaller `integer` or `smallint` type.
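As one illustration of these guidelines, here is a hypothetical `order_summary` table. The table, its columns, & the sample amounts are invented for this sketch. The identifier uses `bigint` because its ceiling is unknown, and the money column uses `numeric` so that sums stay exact.

```sql
-- Hypothetical table applying the number-type guidelines
CREATE TABLE order_summary (
    order_id bigint GENERATED ALWAYS AS IDENTITY,  -- no clear ceiling, so bigint
    item_count integer,                            -- assumes counts stay under ~2.1 billion
    order_total numeric(12, 2)                     -- money: exact, up to 9999999999.99
);

INSERT INTO order_summary (item_count, order_total)
VALUES (3, 19.99), (1, 0.10), (2, 0.20);

-- numeric keeps the sum exact: 20.29
SELECT sum(order_total) AS total FROM order_summary;
```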

---

# Understanding Dates & Times
