# Times and places

**Dénes Csala**  
University of Bristol, 2021  

Based on *Elements of Data Science* ([Allen B. Downey](https://allendowney.com), 2021) and *Python Data Science Handbook* ([Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/), 2018)

License: [MIT](https://mit-license.org/)

In the previous workbook, you learned about variables and two kinds of values: integers and floating-point numbers.

In this workbook, you'll see some additional types:

* Strings, which represent text.

* Time stamps, which represent dates and times.

* And several ways to represent and display geographical locations.

Not every data science project uses all of these types, but many projects use at least one.

## Strings

A **string** is a sequence of letters, numbers, and punctuation marks.
In Python you can create a string by typing letters between single or double quotation marks.

In [7]:
'Hello'

'Hello'

In [8]:
"World"

'World'

And you can assign string values to variables.

In [3]:
first = 'Data'

In [4]:
last = "Science"

Some arithmetic operators work with strings, but they might not do what you expect.  For example, the `+` operator "concatenates" two strings; that is, it creates a new string that contains the first string followed by the second string:

In [9]:
first + last

'DataScience'

If you want to put a space between the words, you can use a string that contains a space:

In [10]:
first + ' ' + last

'Data Science'
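As a small aside, formatted string literals (f-strings, available since Python 3.6) are another way to build the same text. The sketch below redefines `first` and `last` so that it runs on its own; the names `with_plus` and `full_name` are made up for this illustration.

```python
first = 'Data'
last = "Science"

# Concatenation with an explicit space, as above
with_plus = first + ' ' + last

# An f-string substitutes the variables named inside the braces
full_name = f'{first} {last}'

print(full_name)               # Data Science
print(full_name == with_plus)  # True
```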

Strings are used to store text data like names, addresses, titles, etc.

When you read data from a file, you might see values that look like numbers, but they are actually strings, like this:

In [11]:
not_actually_a_number = '123'

If you try to do math with these strings, you *might* get an error.
For example, the following expression causes a `TypeError` with the message "can only concatenate `str` (not `int`) to `str`".

```
not_actually_a_number + 1
```

But you don't always get an error; instead, you might get a surprising result.  For example:

In [12]:
not_actually_a_number * 3

'123123123'

If you multiply a string by an integer, Python repeats the string the given number of times.

If you have a string that contains only digits, you can convert it to an integer using the `int` function:

In [13]:
int('123')

123

Or you can convert it to a floating-point number using `float`:

In [14]:
float('123')

123.0

But if the string contains a decimal point, `int` cannot convert it and raises a `ValueError`.
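Here is a minimal sketch of that behaviour. One possible workaround, assuming you are happy to discard the fractional part, is to convert to `float` first and then to `int`. The names `looks_like_a_float` and `truncated` are hypothetical and used only for this example.

```python
looks_like_a_float = '12.3'

try:
    int(looks_like_a_float)
except ValueError as error:
    # int() only accepts strings that look like whole numbers
    print('ValueError:', error)

# Converting in two steps works; int() drops the fractional part
truncated = int(float(looks_like_a_float))
print(truncated)  # 12
```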

Going in the other direction, you can convert any type of value to a string using `str`:

In [15]:
str(123)

'123'

In [16]:
str(12.3)

'12.3'
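Converting to a string is useful when you want to combine a number with text, because `+` cannot join a string and an integer directly. A short sketch, with `count` and `message` as made-up names:

```python
count = 123

# 'Total: ' + count would raise a TypeError, so convert first
message = 'Total: ' + str(count)
print(message)  # Total: 123

# Converting back with int() recovers the original integer
print(int(str(count)) == count)  # True
```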

## Dates and times

If you read data from a file, you might also find that dates and times are represented with strings.

In [17]:
not_really_a_date = 'June 4, 1989'

To confirm that this value is a string, we can use the `type` function, which takes a value and reports its type.

In [18]:
type(not_really_a_date)

str

`str` indicates that the value of `not_really_a_date` is a string.

We get the same result with `not_really_a_time`, below:

In [19]:
not_really_a_time = '6:30:00'
type(not_really_a_time)

str

Strings that represent dates and times are readable for people, but they are not useful for computation.

Fortunately, Python provides libraries for working with date and time data; the one we'll use is called Pandas.
As always, we have to import a library before we use it; it is conventional to import Pandas with the abbreviated name `pd`:

In [20]:
import pandas as pd

Pandas provides a type called `Timestamp`, which represents a date and time.

It also provides a function called `Timestamp`, which we can use to convert a string to a `Timestamp`:

In [21]:
pd.Timestamp('6:30:00')

Timestamp('2021-11-22 06:30:00')

Or we can do the same thing using the variable defined above.

In [23]:
pd.Timestamp(not_really_a_time)

Timestamp('2021-11-22 06:30:00')

You can do the same with the `to_datetime()` function.

In [24]:
pd.to_datetime(not_really_a_time)

Timestamp('2021-11-22 06:30:00')

In this example, the string specifies a time but no date, so Pandas fills in today's date, which was 22 November 2021 when this workbook was run.

A `Timestamp` is a value, so you can assign it to a variable.

In [25]:
date_of_birth = pd.Timestamp('June 4, 1989')
date_of_birth

Timestamp('1989-06-04 00:00:00')

If the string specifies a date but no time, Pandas fills in midnight as the default time.
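If you have both pieces, you can avoid the defaults by joining the date and time strings before converting them. The sketch below also shows `to_datetime()` with an explicit `format` argument, which states how the string is laid out instead of leaving Pandas to infer it. The names `combined` and `explicit` are introduced here for illustration, and the month name is assumed to be in English.

```python
import pandas as pd

not_really_a_date = 'June 4, 1989'
not_really_a_time = '6:30:00'

# Joining the two strings gives Pandas both a date and a time,
# so nothing has to be filled in with a default
combined = pd.Timestamp(not_really_a_date + ' ' + not_really_a_time)
print(combined)  # 1989-06-04 06:30:00

# %B is the full month name, %d the day, %Y the four-digit year
explicit = pd.to_datetime(not_really_a_date, format='%B %d, %Y')
print(explicit)  # 1989-06-04 00:00:00
```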

If you assign the `Timestamp` to a variable, you can use the variable name to get the year, month, and day, like this:

In [26]:
date_of_birth.year, date_of_birth.month, date_of_birth.day

(1989, 6, 4)

You can also get the name of the month and the day of the week.

In [27]:
date_of_birth.day_name(), date_of_birth.month_name()

('Sunday', 'June')
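The time of day is available in the same way. The sketch below uses a made-up variable, `moment`, and also shows `dayofweek`, which numbers the days from Monday as 0, and `isoformat()`, which turns the `Timestamp` back into a string.

```python
import pandas as pd

moment = pd.Timestamp('1989-06-04 06:30:00')

# Time-of-day attributes work like year, month, and day
print(moment.hour, moment.minute, moment.second)  # 6 30 0

# dayofweek counts from Monday=0, so Sunday is 6
print(moment.dayofweek)  # 6

# isoformat() converts the Timestamp back to a string
print(moment.isoformat())  # 1989-06-04T06:30:00
```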

`Timestamp` provides a function called `now` that returns the current date and time.

In [35]:
now = pd.Timestamp.now()
now

Timestamp('2021-11-22 09:11:51.839086')

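The `to_datetime()` function does much the same when given the string `'now'`, although depending on the Pandas version the result may be expressed in UTC rather than local time: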

In [34]:
pd.to_datetime('now')

Timestamp('2021-11-22 09:11:44.248622')

## Timedelta

`Timestamp` values support some arithmetic operations.  For example, you can compute the difference between two `Timestamps`:

In [36]:
age = now - date_of_birth
age

Timedelta('11859 days 09:11:51.839086')

The result is a `Timedelta` that represents the current age of someone born on `date_of_birth`.
The `Timedelta` contains `components` that store the number of days, hours, etc. between the two `Timestamp` values.

In [37]:
age.components

Components(days=11859, hours=9, minutes=11, seconds=51, milliseconds=839, microseconds=86, nanoseconds=0)

You can get one of the components like this:

In [38]:
age.days

11859
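A `Timedelta` does not have to come from a subtraction. As a sketch, you can also create one directly and add it to a `Timestamp`; the names `ten_thousand_days` and `milestone` are hypothetical.

```python
import pandas as pd

date_of_birth = pd.Timestamp('June 4, 1989')

# A Timedelta can be created directly from a number of days
ten_thousand_days = pd.Timedelta(days=10000)

# Adding a Timedelta to a Timestamp gives another Timestamp
milestone = date_of_birth + ten_thousand_days
print(milestone)

# Subtracting the Timestamps recovers the original interval
print((milestone - date_of_birth).days)  # 10000

# total_seconds() expresses the whole interval in a single unit
print(ten_thousand_days.total_seconds())  # 864000000.0
```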

The biggest component of `Timedelta` is days, not years, because a day has a fixed length, whereas years vary in length.

Most years are 365 days, but some are 366.  The average calendar year is about 365.24 days, which is a very good approximation of a solar year, but it is not exact (see <https://pumas.jpl.nasa.gov/files/04_21_97_1.pdf>).
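To see where that average comes from, recall that the Gregorian calendar repeats every 400 years. The sketch below counts the leap years in one such cycle using the standard library's `calendar.isleap` function; the choice of the years 2000 to 2399 is arbitrary, and the variable names are made up.

```python
import calendar

# Count the leap years in one full 400-year cycle
leap_years = sum(calendar.isleap(year) for year in range(2000, 2400))
print(leap_years)  # 97

# Average length of a calendar year over the cycle
average_year = (400 * 365 + leap_years) / 400
print(average_year)  # 365.2425
```

Rounded to two decimal places, this is the 365.24 figure.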

One way to compute age in years is to divide age in days by 365.24:

In [39]:
age.days / 365.24

32.469061439053775

But people usually report their ages in integer years.  We can use the NumPy `floor` function to round down:

In [40]:
import numpy as np

np.floor(age.days / 365.24)

32.0

Or the `ceil` function (which stands for "ceiling") to round up:

In [41]:
np.ceil(age.days / 365.24)

33.0
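Note that both functions return floating-point numbers. If you want an actual integer, one option is to pass the result to `int`. The sketch below uses a fixed `Timedelta` of 11859 days, matching the example, so that it runs on its own; `age_in_years` is a made-up name.

```python
import numpy as np
import pandas as pd

# A fixed interval with the same number of days as the example
age = pd.Timedelta(days=11859)

years = age.days / 365.24

# np.floor returns a float; int() turns it into an integer
age_in_years = int(np.floor(years))
print(age_in_years)  # 32

# The floor-division operator rounds down in one step, but the
# result is still a float because 365.24 is a float
print(age.days // 365.24)  # 32.0
```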

We can also compare `Timestamp` values to see which comes first.
For example, let's see if a person with a given birthdate has already had a birthday this year.
Here's a new `Timestamp` with the year from `now` and the month and day from `date_of_birth`.

In [42]:
bday_this_year = pd.Timestamp(now.year, 
                              date_of_birth.month, 
                              date_of_birth.day)
bday_this_year

Timestamp('2021-06-04 00:00:00')

The result represents the person's birthday this year.  Now we can use the `>` operator to check whether `now` is later than the birthday:

In [43]:
now > bday_this_year

True

The result is either `True` or `False`.
These values belong to a type called `bool`, short for "Boolean", after Boolean algebra, a branch of algebra where all values are either true or false.

In [44]:
type(True)

bool

In [45]:
type(False)

bool
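Boolean values are useful for making decisions. As a sketch, the birthday comparison can be used to compute an age in whole years without dividing by 365.24. The example uses a fixed value for `now` so the result is reproducible, and `>=` rather than `>` so that the birthday itself counts; `had_birthday` and `age_in_years` are made-up names.

```python
import pandas as pd

date_of_birth = pd.Timestamp('June 4, 1989')
# A fixed "now" so that the result is reproducible
now = pd.Timestamp('2021-11-22 09:11:51')

bday_this_year = pd.Timestamp(now.year,
                              date_of_birth.month,
                              date_of_birth.day)

# True if the birthday has already happened this year
had_birthday = now >= bday_this_year

# Subtract one year if the birthday is still to come
age_in_years = now.year - date_of_birth.year - (0 if had_birthday else 1)
print(had_birthday, age_in_years)  # True 32
```

This sketch does not handle a birthday on 29 February, because `pd.Timestamp` raises an error for that date in a year that is not a leap year.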