# Python Data Types and Methods: Numeric Types, Strings and Lists

In this session, we explore basic data types and the Python methods associated with each type.  In this and subsequent notebooks, we draw on material from various sources, including Jean Mark Gawron's book [Python for Social Science](http://gawron.sdsu.edu/python_for_ss/course_core/book_draft/) and the [Python 3 documentation](https://docs.python.org/3/).

## [Numeric Types](https://docs.python.org/3/library/stdtypes.html#typesnumeric)

We have already seen some of the basic interactions with numbers in Python.  The two main numeric types are [`int`](https://docs.python.org/3/library/functions.html#int) and [`float`](https://docs.python.org/3/library/functions.html#float).  In Python 2 there were two kinds of integers (`int` and `long`), but these have been unified in Python 3.

In [1]:
type(12)

int

In [2]:
type(12.0000000000001)

float

In [3]:
# We can assign values to a variable to reuse them.
x = 12.
print(x)

12.0


Why not use floats all the time?  [Several reasons](https://docs.python.org/3/tutorial/floatingpoint.html). 

Floating point numbers are stored internally in a form of scientific notation (base 2 instead of base 10) and are susceptible to rounding error. While 0.5 can be represented exactly as $2^{-1}$, written "0.1" in binary, the decimal 0.1 has no exact binary representation: its binary expansion "0.0001100110011001100110011001100..." repeats forever, so it has to be cut off and rounded. Rounding error can accumulate across calculations with [catastrophic results](http://www-users.math.umn.edu/~arnold/disasters/patriot.html).

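Python typically uses IEEE 754 double precision (64-bit) floats. Their relative precision is roughly constant, so the absolute precision, meaning the gap between adjacent representable values, grows roughly in proportion to the magnitude of the value.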

![IEEE 754 precision](https://upload.wikimedia.org/wikipedia/commons/3/3f/IEEE754.png)
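To see the growing gap concretely, here is a small sketch. It assumes Python 3.9 or later, where `math.ulp(v)` was added: for a positive float `v`, it returns the spacing between `v` and the next larger representable float.

```python
import math

# math.ulp(v) is the distance from v to the next larger float (Python 3.9+).
for v in [1.0, 1_000.0, 1_000_000.0, 1e15]:
    print(v, math.ulp(v))

# At 1.0 the spacing is machine epsilon for 64-bit floats, 2**-52.
gap_at_one = math.ulp(1.0)
# At one million the spacing is much larger (2**-33), though still tiny in relative terms.
gap_at_million = math.ulp(1_000_000.0)
```

Each step up in magnitude by a factor of 1000 widens the gap by roughly the same factor, which is what the figure shows.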

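A second consideration is exactness. In Python 3 an `int` has arbitrary precision: it uses as many bits as the value needs, so integer arithmetic never rounds. A `float` is always a fixed-size 64-bit value with roughly 15 to 17 significant decimal digits. (Integers can also save space when data are stored in fixed-width formats, such as 32-bit integer columns versus 64-bit float columns, but Python's own `int` objects are not limited to 32 bits.)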
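A sketch of why exactness matters: Python 3 integers have arbitrary precision, while a 64-bit float cannot represent every integer beyond 2**53. Adding 1 just past that point shows the difference.

```python
big = 2**53            # 9007199254740992

# Integer arithmetic is exact, however large the numbers get.
int_sum = big + 1      # 9007199254740993

# The same sum in floating point rounds back down to 2**53,
# because 2**53 + 1 has no exact 64-bit float representation.
float_sum = float(big) + 1

print(int_sum)                   # 9007199254740993
print(float_sum == float(big))   # True
```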

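If the difference between two numbers is smaller than the precision available at their magnitude, Python cannot tell them apart. For numbers near 1, that precision is described by [machine epsilon](https://en.wikipedia.org/wiki/Machine_epsilon), about 2.2e-16, the gap between 1.0 and the next larger float. The gap grows with magnitude, so for `x = 12.0` it is 8 times larger. That is why adding `10**-15` to `x` below produces a different number, while adding `10**-16` does not.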

In [4]:
x + 10**-15 == x

False

In [5]:
x + 10**-16 == x

True
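The same rounding shows up in everyday sums. Here is a short sketch using the standard `sys` module: because 0.1 and 0.2 are both stored approximately, their sum is off from 0.3 by one float spacing.

```python
import sys

# Machine epsilon for 64-bit floats: the gap between 1.0 and the next float.
eps = sys.float_info.epsilon

# 0.1 and 0.2 are stored approximately, so their sum is not exactly 0.3.
total = 0.1 + 0.2
print(total)           # 0.30000000000000004
print(total == 0.3)    # False

# The error is tiny: one float spacing near 0.3, which is smaller than eps.
print(abs(total - 0.3) < eps)   # True
```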

![xkcd floating point](https://imgs.xkcd.com/comics/e_to_the_pi_minus_pi.png)

### Built-in Operations for Numeric Data Types

Here are some of Python's built-in arithmetic operators for numeric data types:

In [6]:
x = 200
y = 12

# Adding two values
x + y

212

In [7]:
# Subtracting
x - y

188

In [8]:
# Multiplying
x * y

2400

In [9]:
# Dividing
x / y

16.666666666666668

In [10]:
# Integer division -- floored quotient
x // y

16

In [11]:
# Remainder of x / y
x % y

8
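"Floored" matters for negative numbers: `//` rounds toward negative infinity rather than toward zero, and `%` takes the sign of the divisor, so that `(x // y) * y + x % y == x` always holds. The built-in `divmod` returns both results at once. A quick check, using new variable names so the notebook's `x` and `y` are left alone:

```python
dividend, divisor = 200, 12

q, r = divmod(dividend, divisor)     # same as (dividend // divisor, dividend % divisor)
print(q, r)                          # 16 8

# With a negative dividend, // rounds down (toward negative infinity)...
print(-dividend // divisor)          # -17, not -16
# ...and % is chosen so that the identity below still holds.
print(-dividend % divisor)           # 4
print((-dividend // divisor) * divisor + (-dividend % divisor) == -dividend)   # True
```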

In [12]:
# Flipping the sign
y = -x
y

-200

In [13]:
# Works in the other direction as well
-y

200

In [14]:
# Raising x to the power of y
x = 10
y = 5
x**y

100000

### Importing Additional Methods from the [math](https://docs.python.org/3.5/library/math.html) Library

In addition to the built-in operators above, many more functions are available in the `math` module. It is part of Python's standard library, so it is always available, but you have to import it to access its functions.  A few examples follow.

In [15]:
import math
math.sqrt(x)

3.1622776601683795

You can see the full list of functions in the math library by typing the name of the library and a dot, then pressing Tab:

In [16]:
#math.

And you can get more documentation on a specific function by adding a question mark after its name (a Jupyter/IPython feature; in plain Python, use `help(math.log)`):

In [17]:
math.log?

In [18]:
math.log(x)

2.302585092994046

In [19]:
# What happens if we take the log of a number with a value of 0?
x = 0
math.log(x)

ValueError: math domain error

In [20]:
# We could add a 1 to x to avoid this problem
math.log(x + 1)

0.0

In [21]:
# Or we could use math.log1p, which computes log(1 + x) for us and is accurate even for very small x
math.log1p(x)

0.0
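The advantage of `log1p` shows up for very small values. Adding a tiny number to 1 first can lose it entirely to rounding, while `log1p` is documented as accurate for inputs near zero. A sketch (for small `x`, log(1 + x) is approximately `x`):

```python
import math

tiny = 1e-20

# 1 + 1e-20 rounds to exactly 1.0, so the naive version returns 0.0.
naive = math.log(1 + tiny)

# log1p(x) computes log(1 + x) accurately for x near zero.
accurate = math.log1p(tiny)

print(naive, accurate)
```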

In [22]:
# A common problem is division where the denominator has a value of zero
y / x

ZeroDivisionError: division by zero

In [23]:
y / (x + 1)

5.0

In [24]:
# Comparing two values to see if they are approximately the same, within some tolerance:
x = 12.001
z = 12.002
math.isclose(x, z, rel_tol=0.001)

True
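By default `math.isclose` uses a relative tolerance, `rel_tol=1e-09`, which scales with the size of the inputs. That breaks down when one of the values is zero, because a tolerance relative to zero is itself zero. For comparisons near zero, pass an absolute tolerance with `abs_tol`:

```python
import math

# Relative tolerance scales with the magnitude of the inputs.
print(math.isclose(1_000_000.0, 1_000_000.0001))   # True

# Near zero, a relative tolerance is useless: 1e-10 is not "close" to 0.0 by default.
print(math.isclose(1e-10, 0.0))                     # False

# An absolute tolerance fixes that.
print(math.isclose(1e-10, 0.0, abs_tol=1e-9))       # True
```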

In [25]:
# Of course you can combine several operations to compute things, like evaluating a quadratic function of x.
# We will see how to do this on a set of numbers a bit later.
a = 2
b = 3
c = 4
y = a + b * x + c * x**2 
y

614.099004

## [Strings](https://docs.python.org/3/library/stdtypes.html#str)

Strings are just text, like in the introductory "Hello World!" example.  

Let's explore some methods that operate on them, and see an important distinction between data types.  First, let's quickly review what we already know about strings.  We can assign any string to a variable, just as we would assign an `int` or a `float` to a variable:

In [26]:
# First, try assigning some text to a variable without quotes
a = pierogi

NameError: name 'pierogi' is not defined

In [27]:
# The string needs to be in quotes for this variable assignment to work
a = 'pierogi'
type(a)

str

In [28]:
# The quotes can be single or double, but have to match, 
# or you will get an error as Python can't find the end of the string.
a = 'pierogi"

SyntaxError: EOL while scanning string literal (<ipython-input-28-78865934fc9e>, line 3)

What if you need to create a string that has multiple lines?  There are two ways to create such a string.  The first uses triple quotes.

In [29]:
X = """
  The Zen of Python:
  
  Beautiful is better than ugly.
  Explicit is better than implicit.
  Simple is better than complex.
  Complex is better than complicated.
"""

print(X)


  The Zen of Python:
  
  Beautiful is better than ugly.
  Explicit is better than implicit.
  Simple is better than complex.
  Complex is better than complicated.



In [30]:
# The second way uses \n to insert the line endings
X = "\n   Beautiful is better than ugly.\n   Explicit is better than implicit.\n   Simple is better than complex.\n   Complex is better than complicated."
print(X)


   Beautiful is better than ugly.
   Explicit is better than implicit.
   Simple is better than complex.
   Complex is better than complicated.


In [31]:
# Notice that the string object X actually contains the \n characters.
# print does not display them; it starts a new line at each one instead.
# But if you just type X, Jupyter displays its representation, which shows the \n escapes:
X

'\n   Beautiful is better than ugly.\n   Explicit is better than implicit.\n   Simple is better than complex.\n   Complex is better than complicated.'
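Although it is written with two characters, `\n` is a single newline character. The built-in `repr` function returns the same escaped form that Jupyter displays. A small sketch:

```python
s = "one\ntwo"

print(len(s))     # 7: three letters, one newline, three letters
print(s)          # prints "one" and "two" on separate lines
print(repr(s))    # 'one\ntwo', with the escape sequence visible
```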

### Indexing and Slicing Strings

We can get individual elements of a string (characters) by using indices, which are integers giving positions within the string.  

![xkcd knuth](https://imgs.xkcd.com/comics/donald_knuth.png)

Notice that counting in Python starts from zero: an index is an offset from the first position, so the first character is at offset 0. This can take a bit of getting used to. Think of it like building floors in Europe, which generally start with a ground floor numbered zero: the first floor in Europe would be the second floor in the U.S.

In [32]:
a[0]

'p'

We can also use indexing to extract a slice, a contiguous section of a string, beginning at any position and ending at any position.  

A slice separates the starting and ending index positions with a colon, as in `a[start:end]`; it includes the character at `start` but stops just before `end`.  If we leave out the start, the slice begins at the beginning of the string; if we leave out the end, it runs to the end of the string.  Some examples should make this clearer: 

In [33]:
a[1:5]

'iero'

In [34]:
a[:5]

'piero'

In [35]:
a[8:]

''
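A few more slicing patterns, sketched with the same `'pierogi'` string: negative indices count from the end, and an optional third number sets a step. Note the contrast with single indices: a slice past the end quietly returns an empty string, but a single index past the end (such as `word[8]`) raises an `IndexError`.

```python
word = 'pierogi'

print(word[-1])      # -> i        (last character)
print(word[-3:])     # -> ogi      (last three characters)
print(word[::2])     # -> peoi     (every second character)
print(word[::-1])    # -> igoreip  (reversed)
print(word[8:])      # -> empty string: a slice past the end is simply empty
# word[8] would raise IndexError: string index out of range
```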

### Working with Strings

In [36]:
# Typing the name of a variable that holds a string displays the string, with quotes
a

'pierogi'

`print` works with strings the same way as with numbers, displaying the text without the quotes.

In [37]:
print(a)

pierogi


In [38]:
a = 'This is cool!'

We can find the length of a string using the built-in `len` function.

In [39]:
len(a)

13


Related to indexing, the `find` function looks up a substring within a string and returns the index (position) where it first occurs:

In [40]:
str.find(a, 'c')

8
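`find` returns `-1` when the substring is missing. The closely related `index` function returns the same position when the substring is present, but raises a `ValueError` when it is not, which can be preferable when a missing substring means something has gone wrong. A sketch:

```python
a = 'This is cool!'

print(str.find(a, 'cool'))   # 8: the position where the substring starts
print(str.find(a, 'z'))      # -1: not found
print(str.index(a, 'c'))     # 8
# str.index(a, 'z') would raise ValueError: substring not found
```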

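Let's see what other string functions are available, using tab completion after `str.` (in plain Python, `dir(str)` lists the same names, along with special names that start and end with double underscores):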

In [41]:
#str.

Some of these function names are pretty self-explanatory, like 'capitalize', but others are less so.  As usual, you can look up some quick help on any of those functions:

In [42]:
str.expandtabs?

Note that since we assigned a string to the variable `a`, `a` now refers to an object of type `str`, and we can call the `str` methods on it directly with dot notation: `a.find('T')` does the same thing as `str.find(a, 'T')`. Bundling data together with the methods that operate on it is a central idea of [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming), or OOP.

In [43]:
print(a)
a.find('T')

This is cool!


0

We can check whether a string contains a character or substring:

In [44]:
'R' in a

False
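Membership tests are case-sensitive and work for whole substrings, not just single characters. Converting to lowercase first is a common way to ignore case:

```python
a = 'This is cool!'

print('cool' in a)           # True: substrings work too
print('this' in a)           # False: 'T' and 't' are different characters
print('this' in a.lower())   # True after converting to lowercase
```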

We can remove specific characters from the beginning and end of a string with the `strip` method:

In [45]:
a.strip('!')

'This is cool'
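`strip` only removes characters from the two ends of a string; characters in the middle are left alone. Its argument is treated as a set of characters to remove, not as a substring, and the related `lstrip` and `rstrip` methods work on just one end. A sketch:

```python
s = '!!Wow! This is cool!!'

print(s.strip('!'))    # Wow! This is cool     (the inner '!' stays)
print(s.lstrip('!'))   # Wow! This is cool!!
print(s.rstrip('!'))   # !!Wow! This is cool

# The argument is a set of characters: any mix of 'x' and 'y' is removed from both ends.
print('xxhixy'.strip('xy'))   # hi
```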

To remove leading and trailing whitespace (spaces, tabs, and newlines) from a string, call `strip` with no argument:

In [46]:
b = ' ' + a
print(b)
print(b.strip())

 This is cool!
This is cool!


It is often helpful to chain several operations together on one line.  Reading from left to right, we first take the slice from index 8 to the end of the string, then strip the '!' from that result, and then convert it to uppercase with `upper`:

In [47]:
a[8:].strip('!').upper()

'COOL'

Another handy method, `title`, capitalizes the first letter of each word:

In [48]:
a.title()

'This Is Cool!'
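The case-changing methods differ in small but important ways: `capitalize` uppercases only the first character and lowercases the rest, `title` uppercases the first letter of every word and lowercases the others, and `upper` and `lower` change every letter. A sketch with deliberately mixed case:

```python
s = 'tHIS is COOL!'

print(s.capitalize())  # This is cool!
print(s.title())       # This Is Cool!
print(s.upper())       # THIS IS COOL!
print(s.lower())       # this is cool!
```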

Note that we cannot assign a new letter to part of the string by its index location.  This is because in Python, strings are an **immutable** data type.  As we will see shortly, other data types like lists are **mutable**.

In [49]:
a[0] = 't'

TypeError: 'str' object does not support item assignment
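Because strings are immutable, the way to "change" one is to build a new string, for example from a slice and concatenation, and assign it to a variable. The original string is untouched:

```python
a = 'This is cool!'

# Build a new string: a lowercase 't' plus everything after index 0.
new_a = 't' + a[1:]

print(a)       # This is cool!   (unchanged)
print(new_a)   # this is cool!
```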

There is, however, a `replace` method that returns a new string with the substitution made (the original string is unchanged):

In [50]:
print(a)
print(a.replace('!', '?'))

This is cool!
This is cool?


## Converting Between String and Numeric Types

In [51]:
rent = '2500'
type(rent)

str

Let's say we have a string object that contains a number, and we want to do arithmetic with it.  What happens?

In [52]:
rent * 2

'25002500'

In [53]:
rent * 1.5

TypeError: can't multiply sequence by non-int of type 'float'

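Multiplying a string by an integer repeats the string, and multiplying it by a float is an error, because Python treats `rent` as text rather than as a number.  To do arithmetic, we need to convert the string to a numeric type, either an `int` or a `float`.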

In [54]:
rent_int = int(rent)
type(rent_int)

int

In [55]:
rent_int * 2

5000

In [56]:
rent_float = float(rent)
rent_float

2500.0

Recall that an arithmetic operation that mixes an `int` with a `float` produces a `float`, so this is another way to convert an `int` to a `float`:

In [57]:
rent_flt = rent_int * 1.5
rent_flt

3750.0

But notice that the `int` function won't convert a string that looks like a floating point number:

In [58]:
rent_i = int('2500.0')

ValueError: invalid literal for int() with base 10: '2500.0'

But you can do this if you first convert to float and then convert to int:

In [59]:
rent_i = int(float('2500.0'))
print(rent_i)
type(rent_i)

2500


int

Of course, you sometimes may need to convert data from numeric to string type.  It works the same way:

In [60]:
rent_str = str(rent_int)
rent_str

'2500'
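Converting to `str` is what lets you combine a number with other text, since `+` will not join a string and an `int` directly. Going the other way, `int` and `float` ignore surrounding whitespace, and `float` also accepts scientific notation. A sketch:

```python
rent_int = 2500

# '+' needs two strings; 'Rent: $' + rent_int would raise a TypeError.
label = 'Rent: $' + str(rent_int)
print(label)               # Rent: $2500

# int() and float() tolerate leading and trailing whitespace...
print(int(' 2500 '))       # 2500
# ...and float() understands scientific notation.
print(float('2.5e3'))      # 2500.0
```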

## [Lists](https://docs.python.org/3/library/stdtypes.html#list)

You can think of a string as an ordered sequence of characters.  In Python, **lists** are another basic data type: an ordered sequence that can contain any kind of object (strings, integers, floats, and others) in any combination.  To write a list, separate the items with commas and enclose them in square brackets.  
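For example, a single list can mix strings, integers, and floats, and `list` turns a string into a list of its characters, which makes the comparison between strings and lists concrete:

```python
mixed = ['pierogi', 12, 3.5]
print(mixed, len(mixed))     # ['pierogi', 12, 3.5] 3

letters = list('cool')
print(letters)               # ['c', 'o', 'o', 'l']
```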

### Creating Lists

We can create an empty list, and add elements to it with the `append` method:

In [61]:
mylist = []
mylist.append('this')

In [62]:
mylist

['this']

Notice that we can add lists, like we can add strings, to concatenate them:

In [63]:
# Besides using append as above, we can use + to add a list to a list, in this case we are adding a list with 1 item
mylist = mylist + ['that']

# We can also insert items in a specified location in a list
mylist.insert(1, 'and')

In [64]:
mylist

['this', 'and', 'that']

We can also convert a string that might be a sentence, or a line of data, to a list, so we can work with its elements more easily:

In [65]:
print('a = ', a)
b = str.split(a)
print('b = ', b)

a =  This is cool!
b =  ['This', 'is', 'cool!']


In [66]:
# And since a is a string object, we can call the split method directly on a
a.split()

['This', 'is', 'cool!']
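With no argument, `split` splits on any run of whitespace. For a line of data that uses a delimiter such as a comma, pass the delimiter as an argument. Note that an explicit `' '` separator treats every single space as a boundary, so repeated spaces produce empty strings. A sketch with a made-up data line:

```python
line = 'Sarah,96500,Philadelphia'   # hypothetical comma-separated record
print(line.split(','))              # ['Sarah', '96500', 'Philadelphia']

spaced = 'This  is cool!'           # two spaces after 'This'
print(spaced.split())               # ['This', 'is', 'cool!']
print(spaced.split(' '))            # ['This', '', 'is', 'cool!']
```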

### Indexing Lists

Indexing works for lists the same way it does for strings.  And if you have a list of strings, you can index into the list and then into the string, in a nested way.

In [67]:
# What is the content of the first item in the list?
mylist[0]

'this'

In [68]:
# What is the content of the last item in the list? We can use the index value -1 to get the last item
mylist[-1]

'that'

To get a range of values from a list, use a slice: `[0:2]` gets the first and second items (indices 0 and 1), since a slice goes up to, but does not include, the index after the colon.

In [69]:
mylist[0:2]

['this', 'and']

In [70]:
# How would we find the first character of the second word in our list?
mylist[1][0]

'a'

### Working with Lists

What functions are available for `list` objects?

In [71]:
#list.

In [72]:
# Find out the length of a list using len
len(mylist)

3

In [73]:
# count works on strings as well as lists. Here a is still the string 'This is cool!',
# so this counts how many times the letter 'o' appears in it
a.count('o')

2
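`count` is also a list method. On a list it counts the items that are equal to its argument, so it matches whole elements rather than characters inside them:

```python
sample = ['this', 'and', 'that', 'and', 'this']

print(sample.count('this'))   # 2
print(sample.count('th'))     # 0: no item equals 'th' exactly
```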

You can check whether a list contains an item, just as we did with strings.

In [74]:
'this' in mylist

True

In [75]:
# Delete the 3rd item in the list (remember it is indexed from 0). Let's make a copy of the list first,
# since del deletes in place
shortlist = mylist.copy()
del shortlist[2]
shortlist

['this', 'and']
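Besides `del`, which removes an item by position, lists have `remove`, which deletes the first item equal to a given value, and `pop`, which removes an item by position (the last one by default) and returns it:

```python
items = ['this', 'and', 'that']

items.remove('and')      # delete by value
print(items)             # ['this', 'that']

last = items.pop()       # remove and return the last item
print(last, items)       # that ['this']
```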

Remember that strings are immutable and we were unable to directly substitute a value of a character based on its index position?  Well, **lists are mutable**, and it does work to replace a value directly by its index value:

In [76]:
b[2] = 'mutable!'
b

['This', 'is', 'mutable!']

We can also put the list of strings back together into a single string, inserting a space between the elements:

In [77]:
c = str.join(' ', b)
c

'This is mutable!'
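`str.join(' ', b)` is more commonly written by calling `join` on the separator string itself. Any string can serve as the separator, but every item in the list must already be a string:

```python
parts = ['This', 'is', 'mutable!']

print(' '.join(parts))                # This is mutable!
print('-'.join(parts))                # This-is-mutable!
print(','.join(['10', '20', '30']))   # 10,20,30
# ','.join([10, 20, 30]) would raise a TypeError, because the items are ints.
```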

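We can reverse the order of the items in a list. Notice that this is an in-place operation: it changes `b` itself rather than returning a new list.  Try running the cell twice to see the original order come back.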

In [78]:
b.reverse()
b

['mutable!', 'is', 'This']

We can use the `sort` method to order the list.  Let's try it with a list of numbers first.

In [79]:
nums = [1, 3, 4, 5, 8, 6]
nums.sort()
nums

[1, 3, 4, 5, 6, 8]

In [80]:
nums.reverse()
nums

[8, 6, 5, 4, 3, 1]
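`sort` and `reverse` change the list in place and return `None`. When you want to keep the original list, the built-in `sorted` function returns a new sorted list instead. Both `sort` and `sorted` accept `reverse=True` to sort in descending order:

```python
values = [1, 3, 4, 5, 8, 6]

descending = sorted(values, reverse=True)   # new list; values is unchanged
print(descending)    # [8, 6, 5, 4, 3, 1]
print(values)        # [1, 3, 4, 5, 8, 6]

result = values.sort()   # sorts values in place...
print(result)            # None
print(values)            # [1, 3, 4, 5, 6, 8]
```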

And now with a list of words.

In [81]:
words = ['A', 'big', 'apple', 'pie']
words.sort()
print(words)

['A', 'apple', 'big', 'pie']
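Strings sort by the numeric code of each character, and every uppercase letter comes before every lowercase letter. To sort alphabetically regardless of case, pass `key=str.lower`, which compares lowercase versions of the words while keeping the original words in the result:

```python
fruits = ['apple', 'Banana', 'cherry']

print(sorted(fruits))                  # ['Banana', 'apple', 'cherry']
print(sorted(fruits, key=str.lower))   # ['apple', 'Banana', 'cherry']
```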


Note that `-1` indexes the last item in a list

In [82]:
words[-1]

'pie'

and that the `[:-1]` slice below is applied to the string stored in that item, dropping its last character:

In [83]:
words[-1][:-1]

'pi'

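There is a `range` function that is often helpful for creating a list of integers.  With one argument, `range(stop)` counts from 0 up to, but not including, `stop`; with two or three arguments, `range(start, stop, step)` also sets the starting value and the step size (which defaults to 1). As with slices, the stop value is never included. `range` produces a range object rather than a list, so we wrap it in `list` to see all the values.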

In [84]:
a = list(range(10))
print(a)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [85]:
b = list(range(1, 5))
print(b)

[1, 2, 3, 4]


In [86]:
c = list(range(10, 100, 5))
print(c)

[10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
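The step can also be negative, which counts down. The stop value is still excluded, so to include 0 when counting down, the stop has to go one step past it:

```python
countdown = list(range(10, -1, -2))
print(countdown)                # [10, 8, 6, 4, 2, 0]

# range objects also support len() without building a list
print(len(range(10, 100, 5)))   # 18
```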


### Practice: Creating and Sorting Lists
Write code that creates a list of the even numbers from 0 to 100 (including 100) and prints it in reverse order. 


In [87]:
a = list(range(0, 101, 2))
a.reverse()
print(a)

[100, 98, 96, 94, 92, 90, 88, 86, 84, 82, 80, 78, 76, 74, 72, 70, 68, 66, 64, 62, 60, 58, 56, 54, 52, 50, 48, 46, 44, 42, 40, 38, 36, 34, 32, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0]


### Practice: List Indexing

Let's say we have a list called `thing` containing the integers from 1 to 7 and we have variables `low = 2` and `high = 5`:

For each operation below, first predict what the answer will be, then write it as code and run it to confirm that it does what you expected. For readability, add and run one cell at a time, starting with a cell that creates a list called `thing` with the integer values 1 through 7, then answer each question below.

1- What does `thing[low:high]` do?

2- What does `thing[low:]` (without a value after the colon) do?

3- What does `thing[:high]` (without a value before the colon) do?

4- What does `thing[-1]` do?

5- What does `thing[:-1]` (no value before the colon, -1 after it) do?

6- What does `thing[:]` (just a colon) do?

7- How long is the list `thing[low:high]`?

In [88]:
#start by creating a list called 'thing' with the integers from 1 to 7, without typing them into a list
thing = list(range(1, 8))
low = 2
high = 5
print('thing:', thing)
print('thing[low:high]:', thing[low:high])
print('thing[low:]:', thing[low:])
print('thing[:high]:', thing[:high])
print('thing[-1]:', thing[-1])
print('thing[:-1]:', thing[:-1])
print('thing[:]:', thing[:])
print('len(thing[low:high]):', len(thing[low:high]))

thing: [1, 2, 3, 4, 5, 6, 7]
thing[low:high]: [3, 4, 5]
thing[low:]: [3, 4, 5, 6, 7]
thing[:high]: [1, 2, 3, 4, 5]
thing[-1]: 7
thing[:-1]: [1, 2, 3, 4, 5, 6]
thing[:]: [1, 2, 3, 4, 5, 6, 7]
len(thing[low:high]): 3


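`thing[:]` is a shallow copy of `thing`: a new list object containing the same items, not another reference to the same list. So `is`, which checks whether two names refer to the same object, returns `False`: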

In [89]:
thing is thing[:]

False
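The difference matters once you modify a list. Assigning with `=` creates a second name for the same list, while `[:]` creates a new outer list. Because the copy is shallow, though, any lists nested inside it are still shared with the original. A sketch:

```python
original = [1, 2, 3]
alias = original        # same list, two names
copied = original[:]    # new list with the same items

alias.append(4)
print(original)         # [1, 2, 3, 4]: changed through the alias
print(copied)           # [1, 2, 3]: the copy is independent

nested = [[1, 2], [3, 4]]
shallow = nested[:]
shallow[0].append(99)   # modifies the inner list that both share
print(nested)           # [[1, 2, 99], [3, 4]]
```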

### Practice: Inserting Elements in a List

Write code that adds the name `Norah` to the following list, right after the name `Michael`:

In [90]:
names = ['Akshara','Anna','Aqshems','Chester','Echo','James','Jessica','Matthew','Michael','Philip','Sarah'] 

In [91]:
names.insert(names.index('Michael') + 1, 'Norah')
print(names)

['Akshara', 'Anna', 'Aqshems', 'Chester', 'Echo', 'James', 'Jessica', 'Matthew', 'Michael', 'Norah', 'Philip', 'Sarah']


### Practice: List Manipulation

We have two lists, `a = [10, 20, 30]` and `b = [30, 60, 90]`. Write code that gives us each of the following lists:

    [[10, 20, 30], [30, 60, 90]]
    [10, 20, 30, 30, 60, 90]
    [10, 20, 60, 90]

In [92]:
a = [10, 20, 30]
b = [30, 60, 90]

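A = [a, b]          # a list containing the two lists
B = a + b           # the two lists concatenated into one
C = a[:2] + b[1:]   # the first two items of a, then the last two of b

print(A)
print(B)
print(C)

[[10, 20, 30], [30, 60, 90]]
[10, 20, 30, 30, 60, 90]
[10, 20, 60, 90]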


### Practice: String Manipulation

Turn the string below into `all good countrymen` using the minimum amount of code, using the methods covered so far. A couple of lines of code should do the trick.  Note: this requires `str` and `list` methods.

In [93]:
s = "Now is the time for all good men to come to the aid of their country!"

In [94]:
tokens = s.split(" ")
tokens[5] + " " + tokens[6] + " " + tokens[-1][:-1] + tokens[7]

'all good countrymen'

### Practice: String Manipulation and Type Conversion

Using variable `a`, print how much Sarah earns monthly.

In [95]:
a = "Sarah earns $96500 in a year"

In [96]:
yearly_salary = int(a[a.index('$') + 1:].split()[0])
monthly_salary = round(yearly_salary / 12, 2)
# Convert the number to a string so there is no extra space after the '$'
print("Sarah earns $" + str(monthly_salary) + " monthly.")

Sarah earns $8041.67 monthly.


## What You Learned


In this session, you learned how Python uses numeric data types like `int` and `float`, 
how it uses `str` to store text,
and how it uses `list` to handle multiple items. You saw
that the items in lists can be changed, and that you can join one
list to another list.  And you got some practice using these data types and their methods.