***
# Python Data Types and Methods: Numeric Types, Strings
***

In this session, we explore Python's basic data types and the operators, functions, and methods that work on each of them. In this and subsequent notebooks, we draw on material from various sources, including Jean Mark Gawron's book [Python for Social Science](http://gawron.sdsu.edu/python_for_ss/course_core/book_draft/) and the [Python 3 documentation](https://docs.python.org/3/).

## [Numeric Types](https://docs.python.org/3/library/stdtypes.html#typesnumeric)

We have already seen some of the basic interactions with numbers in Python. The two main numeric types are [`int`](https://docs.python.org/3/library/functions.html#int) and [`float`](https://docs.python.org/3/library/functions.html#float).

In [1]:
type(12)

int

In [2]:
type(12.0)

float

In [3]:
# We can assign a value to a variable to reuse it.
x = 12.0
print(x)

12.0


In [4]:
# 0.1 cannot be precisely represented in binary.
print(1.2 - 1.1)

0.09999999999999987


You can cast the type of a number to convert it to a specified type, like converting from `float` to `int`:

In [5]:
print(x)
y = int(x)
print(y)

12.0
12


In [6]:
float(y)

12.0
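
Note that `int` does not round a `float`: it discards the fractional part. A short sketch contrasting it with the built-in `round` (the variable names are illustrative, not part of the original notebook):

```python
# int() truncates toward zero; round() goes to the nearest integer.
price = 12.7
truncated = int(price)        # 12
rounded = round(price)        # 13
negative = int(-12.7)         # -12, not -13

print(truncated, rounded, negative)
```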

### Why not use floats all the time?

Floating-point numbers are [internally represented](https://docs.python.org/3/tutorial/floatingpoint.html) in scientific notation (with **base 2** instead of 10) and susceptible to rounding error. While 0.5 can be easily represented as $2^{-1}$ or "0.1" in **binary**, 0.1 in **decimal** has no precise binary representation, and would be approximated as `0.0001100110011001100110011001100110011001100110011...`. **Rounding error** can accumulate across calculations.
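
A minimal sketch of how this rounding error builds up, using only the standard library. The names `total` and `exact_total` are illustrative; `math.fsum` and `decimal.Decimal` are shown as two possible ways to inspect or reduce the error, not as a general fix.

```python
import math
from decimal import Decimal

# Adding 0.1 ten times does not give exactly 1.0.
total = 0.0
for _ in range(10):
    total += 0.1

print(total)                 # 0.9999999999999999
print(total == 1.0)          # False

# math.fsum keeps track of the low-order bits lost during summation.
exact_total = math.fsum([0.1] * 10)
print(exact_total == 1.0)    # True

# Decimal(float) reveals the value that is actually stored for 0.1.
print(Decimal(0.1))
```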

During the Persian Gulf War (1991), 28 soldiers were killed after a Patriot Missile battery [failed to track and intercept an incoming Iraqi Scud missile](https://www-users.cse.umn.edu/~arnold/disasters/patriot.html). The system had kept track of elapsed time in 1/10 second increments, which led to an accumulated 0.34 seconds of error after the system had been running for about 100 hours.

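Python's `float` is typically an IEEE 754 double-precision number. It carries roughly 15 to 17 significant decimal digits, so the absolute gap between neighboring representable values grows in proportion to the magnitude of the value.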

![IEEE 754 precision](https://upload.wikimedia.org/wikipedia/commons/3/3f/IEEE754.png)

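Adding a value much smaller than the gap between neighboring floats has no effect, because the result rounds back to the original number. The relative size of that gap is known as [machine epsilon](https://en.wikipedia.org/wiki/Machine_epsilon), about $2.2 \times 10^{-16}$ for double precision. In the cells below, `x` is still 12.0, where the gap is about $1.8 \times 10^{-15}$. `1e-15` is shorthand for the scientific notation $1 \times 10^{-15} = 10^{-15}$.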

In [7]:
x + 1e-15 == x

False

In [8]:
x + 1e-16 == x

True
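
The two results above can be reproduced by computing the gap between neighboring floats directly. This is a sketch that assumes IEEE 754 doubles; the names `value`, `eps`, and `gap` are illustrative.

```python
import math
import sys

value = 12.0

# Machine epsilon: the gap between 1.0 and the next larger float.
eps = sys.float_info.epsilon           # 2**-52, about 2.22e-16

# frexp returns (mantissa, exponent) with value == mantissa * 2**exponent
# and 0.5 <= mantissa < 1, so the gap at `value` is eps * 2**(exponent - 1).
_, exponent = math.frexp(value)
gap = eps * 2 ** (exponent - 1)

print(gap)                             # about 1.78e-15
print(value + gap == value)            # False: lands on the next float
print(value + gap / 4 == value)        # True: rounds back to value
```

On Python 3.9 and later, `math.ulp(value)` returns the same gap without the manual calculation.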

![xkcd floating point](https://imgs.xkcd.com/comics/e_to_the_pi_minus_pi.png)

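A second reason is storage. In fixed-width formats, such as those used by many array libraries and binary file formats, an integer is commonly stored in 32 **bits** (ones and zeros), while a `float` takes 64 **bits**. Note that Python's own `int` is not limited to 32 bits: it grows as needed to hold arbitrarily large values.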
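
The size difference is easiest to see in packed binary form. Here is a minimal sketch using the standard `struct` module, assuming the fixed-width layouts `<i` (32-bit integer) and `<d` (64-bit float). Plain Python objects carry extra overhead and are sized differently.

```python
import struct

# Standard sizes (in bytes) for fixed-width binary encodings.
int32_bytes = struct.calcsize("<i")      # 4 bytes = 32 bits
float64_bytes = struct.calcsize("<d")    # 8 bytes = 64 bits
print(int32_bytes * 8, float64_bytes * 8)

# A 32-bit integer cannot hold values above 2**31 - 1 ...
try:
    struct.pack("<i", 2**31)
except struct.error as err:
    print("does not fit:", err)

# ... whereas a plain Python int simply grows to fit.
big = 2**100
print(big.bit_length())                  # 101
```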

### Built-in Operations for Numeric Data Types

Let's review some of the built-in operators in Python that apply to numeric data types:

In [9]:
x = 200
y = 12

In [10]:
# Adding two values
x + y

212

In [11]:
# Subtracting
x - y

188

In [12]:
# Multiplying
x * y

2400

In [13]:
# Dividing
x / y

16.666666666666668

In [14]:
# Integer division -- floored quotient
x // y

16

In [15]:
# Modulo -- remainder of x / y
x % y

8

In [16]:
# Whether a number is even -- divisible by 2
79 % 2 == 0

False
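
Floor division and modulo are two halves of the same operation, and the built-in `divmod` returns both at once. The sketch below also shows how they behave with a negative number, which often surprises newcomers.

```python
# divmod returns the floored quotient and the remainder together.
quotient, remainder = divmod(200, 12)
print(quotient, remainder)             # 16 8

# The two results always satisfy: x == quotient * y + remainder
assert 200 == quotient * 12 + remainder

# Floor division rounds toward negative infinity, so with a negative
# numerator the quotient is -17 (not -16) and the remainder is positive.
neg_quotient, neg_remainder = divmod(-200, 12)
print(neg_quotient, neg_remainder)     # -17 4
```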

In [17]:
# Flipping the sign
y = -x
y

-200

In [18]:
# Works in the other direction as well
-y

200

In [19]:
# Raising x to the power of y
x = 10
y = 5
x**y

100000

### Importing Additional Methods from the [math](https://docs.python.org/3.5/library/math.html) Library

In addition to the built-in functions and operators above, many more are available in the `math` module. It is part of the standard library, so it is always installed, but you have to import it before you can use it. A few examples are below.

In [20]:
import math

In [21]:
round(math.pi, 4)

3.1416

In [22]:
math.sqrt(x)

3.1622776601683795

You can see the full list of functions available in the math library by using Tab after the name of the library and a dot.

In [23]:
# Uncomment the line below, then press Tab
# math.

And you can get more documentation on a specific function by asking for it:

In [24]:
?math.log

In [25]:
math.log(x)

2.302585092994046

In [26]:
# What happens if we take the log of 0?
x = 0
# Uncommenting the line below raises a ValueError
# math.log(x)

In [27]:
# We could add a 1 to x to avoid this problem
math.log(x + 1)

0.0

In [28]:
# Or we could use math.log1p, which computes log(1 + x) directly
# and stays accurate when x is very close to zero
math.log1p(x)

0.0

In [29]:
# A common problem is division where the denominator has a value of zero.
# Uncommenting the line below raises a ZeroDivisionError
# y / x

In [30]:
y / (x + 1)

5.0
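
Adding 1 to the denominator changes the result, so it is only appropriate when that shift is acceptable. An alternative is to catch the exception and decide what to return. This is a sketch under that assumption; `safe_divide` is a hypothetical helper, not a built-in.

```python
import math

def safe_divide(numerator, denominator, default=float("nan")):
    """Return numerator / denominator, or `default` if denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return default

print(safe_divide(5, 0))        # nan
print(safe_divide(5, 2))        # 2.5

# math.log raises ValueError for zero or negative input.
try:
    math.log(0)
except ValueError as err:
    print("cannot take the log:", err)
```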

In [31]:
# Compare two values to see if they are approximately equal, within some tolerance
x = 12.1
z = 12.2
math.isclose(x, z, rel_tol=0.01)

True
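
`math.isclose` is also the usual way to compare floats that differ only by rounding error. The sketch below uses its two documented tolerance parameters, `rel_tol` (default `1e-09`) and `abs_tol` (default `0.0`).

```python
import math

# The default relative tolerance absorbs ordinary rounding error.
print(0.1 + 0.2 == 0.3)                          # False
print(math.isclose(0.1 + 0.2, 0.3))              # True

# A relative tolerance alone is rarely enough when one value is zero:
# the allowed difference scales with the larger magnitude, here 1e-10.
print(math.isclose(1e-10, 0.0))                  # False
print(math.isclose(1e-10, 0.0, abs_tol=1e-9))    # True
```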

In [32]:
# Of course you can put several operations together to compute things,
# like evaluating a quadratic polynomial.
a = 2
b = 3
c = 4
y = a + b * x + c * x**2
y

623.9399999999999

Python follows the mathematical order of operations: Parentheses, Exponents, Multiplication/Division, Addition/Subtraction (PEMDAS).

In [33]:
a + b * x

38.3

In [34]:
(a + b) * x

60.5
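
Two details of operator precedence are easy to get wrong: how exponentiation interacts with a leading minus sign, and the order in which chained exponents are evaluated. A short sketch:

```python
# Exponentiation binds tighter than a leading minus sign.
print(-2**2)         # -4, read as -(2**2)
print((-2)**2)       # 4

# Chained exponents are evaluated right to left.
print(2**3**2)       # 512, read as 2**(3**2)
print((2**3)**2)     # 64

# Operators at the same level (* and /) are evaluated left to right.
print(8 / 4 * 2)     # 4.0
```

When in doubt, add parentheses: they cost nothing and make the intent explicit.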

## [Strings](https://docs.python.org/3/library/stdtypes.html#str)

Strings are just text, like in the introductory "Hello World!" example.

Let's explore some methods that operate on them, and explain an important distinction between data types. Let's review quickly what we already know about strings. We can assign any string to a variable like we would assign an `int` or a `float` to a variable.

The string needs to be in quotes for variable assignment to work. The quotes can be single or double, but have to match, or you will get an error as Python can't find the end of the string.

In [35]:
a = "Uranium"
type(a)

str

What if you need to create a string that has multiple lines? There are two ways to create such a string. The first uses triple quotes.

In [36]:
X = """Dr. Strangelove
or: How I Learned to Stop Worrying and Love the Bomb"""

print(X)

Dr. Strangelove
or: How I Learned to Stop Worrying and Love the Bomb


In [37]:
# The second way uses \n to insert the line endings
X = "Dr. Strangelove\nor: How I Learned to Stop Worrying and Love the Bomb"
print(X)

Dr. Strangelove
or: How I Learned to Stop Worrying and Love the Bomb


The string object `X` actually has `\n` (newline) as part of it. The `print` function does not display this control character; it just starts a new line. If you evaluate `X` on its own, the notebook shows the string's `repr`, which displays the escape sequence as written:

In [38]:
X

'Dr. Strangelove\nor: How I Learned to Stop Worrying and Love the Bomb'

### Indexing and Slicing Strings

We can get individual elements of a string (characters) by using indexes, which give us positions within a string.

Counting in Python starts from zero. Essentially all counters are offsets from the first position. Think of it as the way building floors in Europe generally start with zero. The first floor in Europe would be a second floor in the U.S. Alternatively, think of it as analogous to the seconds on a stopwatch, which start elapsed time from 0 seconds.

In [39]:
a[0]

'U'

![donald knuth](https://imgs.xkcd.com/comics/donald_knuth.png)

We can use string indexing to extract a range, or a specific section of a string, beginning from any position and ending in any position.

Python uses a syntax that separates the starting index from the ending index with a colon. The slice includes the character at the starting index and stops just before the ending index. If we leave out the starting index, the slice begins at the start of the string; if we leave out the ending index, it runs to the end of the string. Some examples should make this clearer:

In [40]:
a[1:5]

'rani'

In [41]:
a[:5]

'Urani'

In [42]:
a[92:]

''
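
Slices also accept negative indexes and an optional third value, the step. The sketch below uses a separate variable, `word`, so that it does not disturb `a`.

```python
word = "Uranium"

print(word[-1])       # m: negative indexes count from the end
print(word[-3:])      # ium: the last three characters
print(word[::2])      # Uaim: every second character
print(word[::-1])     # muinarU: a step of -1 reverses the string

# Slices past the end quietly return what exists (possibly nothing),
# but a single out-of-range index raises IndexError.
print(word[4:92])     # ium
try:
    word[92]
except IndexError as err:
    print("IndexError:", err)
```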

Let's see how this is going... How would we get a slice of the string `a` that contains the first two elements?

### Working with Strings

In [43]:
# A variable containing a string is still an object
a

'Uranium'

`print` works with strings the same way as with numbers, suppressing the quotes.

In [44]:
print(a)

Uranium


In [45]:
a = "I am not a zombie!"

We can find the length of a string using the built-in `len` function:

In [46]:
len(a)

18


Related to indexing, here is a string function to look up a specific substring within a string, and return its index, or position:

In [47]:
str.find(a, "z")

11

Let's see what other string functions are available, using tab completion after `str.`:

In [48]:
# str.

Some of these function names are pretty self-explanatory, like `capitalize`, but others are less so. As usual, you can look up some quick help on any of those functions:

In [49]:
?str.expandtabs

Since the variable `a` refers to a string object, we can call string methods directly on it using dot notation. Calling methods on objects this way is a hallmark of [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming), or OOP.

In [50]:
print(a)

# Do this instead of str.find(a, "z")
a.find("z")

I am not a zombie!


11

We can check whether a string contains a character or substring:

In [51]:
print(a)
"R" in a

I am not a zombie!


False
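
Both `in` and `find` are case-sensitive, and they report a missing substring differently. Here is a brief sketch, using a separate variable `phrase` with the same text as `a`:

```python
phrase = "I am not a zombie!"

# Matching is case-sensitive.
print("Zombie" in phrase)                     # False
print("zombie" in phrase)                     # True
print("Zombie".lower() in phrase.lower())     # True

# find returns -1 when the substring is absent; index raises ValueError.
print(phrase.find("R"))                       # -1
try:
    phrase.index("R")
except ValueError as err:
    print("ValueError:", err)
```

Because `-1` is also a valid index (the last character), check the result of `find` before using it to slice.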

We can remove characters at the beginning or end of a string with the `strip` method:

In [52]:
a.strip("!")

'I am not a zombie'

To remove any leading and trailing spaces from a string, just use the `strip` method with no argument:

In [53]:
b = " " + a
print(b)
print(b.strip())

 I am not a zombie!
I am not a zombie!
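
The related methods `lstrip` and `rstrip` trim one side only. The argument to all three is treated as a set of characters to remove, not as a literal prefix or suffix. A short sketch with an illustrative string `padded`:

```python
padded = "  **I am not a zombie!**  "

print(repr(padded.lstrip()))      # leading spaces removed only
print(repr(padded.rstrip()))      # trailing spaces removed only

# Every leading or trailing character found in the set is removed,
# in any order, until a character outside the set is reached.
print(padded.strip(" *!"))                   # I am not a zombie
print("www.example.com".strip("cmowz."))     # example
```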


It is often helpful to put several operations together on one line by chaining them. Going from left to right, we first take the slice from index 11 to the end of the string, then we strip the '!' from that result, and then we convert the result to uppercase:

In [54]:
a[11:].strip("!").upper()

'ZOMBIE'

Another handy function lets you capitalize each word:

In [55]:
a.title()

'I Am Not A Zombie!'

We cannot assign a new letter to part of the string by its index location. This is because in Python, strings are an **immutable** data type. As we will see shortly, other data types like lists are **mutable**.

In [56]:
# This will raise a TypeError
# a[0] = 't'

There is a function that will let you replace string values by returning a new string:

In [57]:
print(a)
print(a.replace("!", "?"))

I am not a zombie!
I am not a zombie?
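
Immutability means that string methods never modify the original; they return a new string that you can assign to a name. A sketch using illustrative variable names:

```python
original = "I am not a zombie!"

# Item assignment is rejected because str is immutable.
try:
    original[0] = "t"
except TypeError as err:
    print("TypeError:", err)

# Methods such as replace return a new string; the original is unchanged.
changed = original.replace("zombie", "robot")
print(original)        # I am not a zombie!
print(changed)         # I am not a robot!

# To "change" a character, build a new string from slices.
lowered_first = "i" + original[1:]
print(lowered_first)   # i am not a zombie!
```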


## Converting between string and numeric types

![rent is too damn high](https://pbs.twimg.com/profile_images/636007295634112512/L7cM-VMh.png)

In [58]:
rent = "2500"
type(rent)

str

Let's say we have a string object that contains numeric values and we want to do mathematical operations on it. What happens?

In [59]:
rent * 2

'25002500'

In [60]:
# This will raise a TypeError
# rent*1.5

If we need to do mathematical operations, we really need to convert this string object to a numeric type -- either an `int` or a `float`.

In [61]:
rent_int = int(rent)
type(rent_int)

int

In [62]:
rent_int * 2

5000

In [63]:
rent_float = float(rent)
rent_float

2500.0

You can also convert an `int` to a `float` by a mathematical operation that involves a floating-point component so that the result is coerced to type `float`:

In [64]:
rent_flt = rent_int * 1.5
rent_flt

3750.0

The `int` function won't convert a string that looks like a floating-point number. How can we fix this?

In [65]:
# This will raise a ValueError
# rent_i = int('2500.0')

You can do this if you first convert to `float` and then convert to `int`:

In [66]:
rent_i = int(float("2500.0"))
print(rent_i)
type(rent_i)

2500


int
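
Keep in mind that this two-step conversion truncates any fractional part, and that text which is not numeric at all still raises a `ValueError`. The sketch below illustrates both; `to_number` is a hypothetical helper written for this example, not a built-in.

```python
text = "2500.75"

truncated = int(float(text))      # 2500: the .75 is dropped
nearest = round(float(text))      # 2501: rounded to the nearest integer
print(truncated, nearest)

def to_number(raw, default=None):
    """Hypothetical helper: parse text as a float, or return `default`."""
    try:
        return float(raw)
    except ValueError:
        return default

print(to_number("2500.0"))        # 2500.0
print(to_number("too high"))      # None
```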

Of course, you sometimes may need to convert data from numeric to string type. It works the same way:

In [67]:
rent_str = str(rent_int)
rent_str

'2500'
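
Converting to `str` is most often needed when building messages. The sketch below compares explicit conversion with f-strings, which convert and format in one step; the variable names are illustrative.

```python
rent_amount = 2500

# Concatenation needs both sides to be strings.
message = "Rent is $" + str(rent_amount)
print(message)                                  # Rent is $2500

# f-strings convert the value and apply a format specification.
with_commas = f"Rent is ${rent_amount:,}"
with_cents = f"Rent is ${rent_amount * 1.5:.2f}"
print(with_commas)                              # Rent is $2,500
print(with_cents)                               # Rent is $3750.00
```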

## What You Learned


In this session, you learned how Python uses numeric data types like `int` and `float`, and how string data types work. You learned that Python indexing starts at 0 and learned how to do some string manipulations and convert data between types. Now would be a good time to review and experiment with these data types and methods to get comfortable with them.