# Data Types
The basic data types in Python include types that hold numbers, *text*, **sequences**, and Boolean values. Many of these data types have *methods* associated with them. Methods are functions attached to a class, and every data type is a class. Methods are usually called with the syntax
```python
varname.method(<input arguments>)
```
where `method` returns a result based on `varname`. Depending on the method, the original `varname` may or may not be modified in place.
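
A minimal sketch of that distinction, using only built-in types (all variable names here are illustrative): string methods return a new object and leave the original alone, while some list methods change the list in place and return `None`.

```python
# Strings are immutable: upper() returns a new string.
name = "photon"
shouted = name.upper()
print(name)       # the original is unchanged
print(shouted)

# Lists are mutable: sort() reorders the list in place and returns None.
masses = [3, 1, 2]
result = masses.sort()
print(masses)
print(result)

# sorted() is the non-mutating alternative: it returns a new list.
original = [3, 1, 2]
ordered = sorted(original)
print(original, ordered)
```

When in doubt, print both the return value and the original variable after calling a method.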
## Numeric
Python has three data types for numbers.

In [1]:
x = 137
y = 1.618
z = 3+5j
print(type(x))
print(type(y))
print(type(z))

<class 'int'>
<class 'float'>
<class 'complex'>


Notice that Python automatically recognizes the data type for each type of number. There is no need to explicitly specify it.
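
The same automatic typing applies to the results of arithmetic. The sketch below reuses the values from the cell above and shows how mixing numeric types promotes the result to the more general type.

```python
x = 137        # int
y = 1.618      # float
z = 3 + 5j     # complex

print(type(x + y))    # int + float gives a float
print(type(x + z))    # int + complex gives a complex
print(type(x / 2))    # true division always gives a float
print(type(x // 2))   # floor division of two ints stays an int
print(x // 2, x % 2)  # quotient and remainder: 68 1
```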

### Arrays
Arrays are a data type provided by `numpy`; therefore, you must import `numpy` **before** using them. NumPy arrays:
1. are fixed in size
1. hold elements that are all of the same type
1. can be multi-dimensional

The *array* command is stored in the package _numpy_. Numeric arrays support element-wise arithmetic, and they store data more efficiently than lists, so when storing lots of numeric data, use arrays.
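
As a rough illustration of how arithmetic on arrays differs from arithmetic on lists, consider the sketch below. The variable names are illustrative, and `numpy` is assumed to be installed.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

# Multiplying a list by an int repeats the list.
print(values_list * 2)

# Multiplying an array by an int scales every element.
print(values_array * 2)

# Arrays of the same shape combine element by element.
print(values_array + values_array ** 2)

# All elements share one dtype; mixing ints and floats promotes to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```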

In [3]:
from numpy import array
a = array([[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25],[26,27,28,29,30],[31,32,33,34,35]])
print(a)

[[11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]
 [26 27 28 29 30]
 [31 32 33 34 35]]


![python indexing](figures/pythonarray.png)

You can slice lists and arrays with a colon **:**

- a[m:n] = subset of *a* starting at index *m* up to but not including *n*
- a[m:n,p:q] = subset of a 2-D array *a* starting in row *m* and column *p* up to but not including row *n* and column *q* 
- a[m:n:step,p:q:step]  = subset of array *a* taking every *step*-th element along each axis

Omitting the number before a colon means start at the beginning; omitting the number after it means go to the end.

In [4]:
a

array([[11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25],
       [26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35]])

In [7]:
a[2,3]

24

In [9]:
a[::2,::2]

array([[11, 13, 15],
       [21, 23, 25],
       [31, 33, 35]])
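
A few more slices of the same 5x5 array may help make the rules above concrete. This is a sketch that rebuilds the array with `np.arange` and `reshape` rather than typing it out; the results are described in the comments.

```python
import numpy as np

# Rebuild the 5x5 array used above: 11..35 laid out row by row.
a = np.arange(11, 36).reshape(5, 5)

print(a[0, 1:4])     # row 0, columns 1 to 3: 12, 13, 14
print(a[:, 1])       # every row, column 1: 12, 17, 22, 27, 32
print(a[1:3, :2])    # rows 1 and 2, first two columns
print(a[-1, ::2])    # last row, every second column: 31, 33, 35
```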

### Useful Functions
Some functions in *numpy* help with creating and importing arrays. You can also turn lists into arrays.

In [10]:
import numpy as np
A=np.zeros([3,3],int)
B=np.ones([3,3],float)
print(A)
print(B)

[[0 0 0]
 [0 0 0]
 [0 0 0]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [11]:
list1 = [3,5,1,3,5,6]
print(list1)

[3, 5, 1, 3, 5, 6]


In [12]:
np.array(list1,float)

array([3., 5., 1., 3., 5., 6.])

Wait a minute! Are you saying I can only get 1D arrays from lists? Well, no. Let's use **lists of lists**.

In [18]:
list2 = [[1,4,"crap"],[3,4,9],[-1,4,2]]
list2[1][2]

9

In [20]:
newarray = np.array(list2,str)
print(newarray)

[['1' '4' 'crap']
 ['3' '4' '9']
 ['-1' '4' '2']]
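
When every entry in a list of lists is a number, the same conversion gives a numeric 2-D array. The sketch below uses made-up values (`rows` is an illustrative name) and checks the resulting shape.

```python
import numpy as np

rows = [[1, 4, 7], [3, 4, 9], [-1, 4, 2]]   # illustrative numeric data
grid = np.array(rows, float)

print(grid.shape)    # (3, 3): three rows, three columns
print(grid.ndim)     # 2
print(grid[1, 2])    # same element as rows[1][2], now stored as a float
```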


You can load data from files **fairly** easily with *loadtxt*. Documentation on *dtype* and the notation used below is available at https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.dtype.html

In [22]:
test=np.loadtxt("data/globaltemps.txt", dtype={'names': ('year','temp'),
...                      'formats': ('i4', 'f4')})
type(test)

numpy.ndarray

In [23]:
mask = test['temp']>0.5   # Boolean mask; the name avoids shadowing the built-in id()
tp=test['year'][mask]
print(tp)

[1998 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
 2014 2015 2016 2017 2018 2019 2020]
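
To try the same pattern without the data file, `loadtxt` can also read from a file-like object. The sketch below uses a handful of made-up rows (they are not the contents of `data/globaltemps.txt`) to show how the named fields and the Boolean mask work together.

```python
from io import StringIO
import numpy as np

# Made-up sample rows (year, temperature anomaly); not the real data file.
sample = StringIO("1997 0.46\n1998 0.61\n1999 0.38\n2001 0.54\n")

data = np.loadtxt(sample, dtype={'names': ('year', 'temp'),
                                 'formats': ('i4', 'f4')})

warm = data['temp'] > 0.5       # Boolean array, one entry per row
print(warm)
print(data['year'][warm])       # years whose anomaly exceeds 0.5
```

Indexing one field with a Boolean array built from another field keeps only the rows where the condition holds.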


In [24]:
test['temp']

array([-0.15, -0.07, -0.1 , -0.16, -0.27, -0.32, -0.3 , -0.35, -0.16,
       -0.1 , -0.34, -0.22, -0.27, -0.31, -0.3 , -0.22, -0.1 , -0.1 ,
       -0.26, -0.17, -0.07, -0.15, -0.27, -0.36, -0.46, -0.25, -0.21,
       -0.37, -0.42, -0.47, -0.43, -0.43, -0.35, -0.33, -0.14, -0.13,
       -0.35, -0.45, -0.28, -0.27, -0.26, -0.18, -0.28, -0.26, -0.26,
       -0.21, -0.1 , -0.21, -0.19, -0.35, -0.15, -0.09, -0.16, -0.28,
       -0.12, -0.2 , -0.15, -0.03,  0.  , -0.02,  0.13,  0.19,  0.07,
        0.09,  0.2 ,  0.09, -0.07, -0.03, -0.11, -0.11, -0.17, -0.07,
        0.01,  0.08, -0.13, -0.14, -0.19,  0.05,  0.06,  0.03, -0.02,
        0.06,  0.03,  0.05, -0.2 , -0.11, -0.06, -0.02, -0.08,  0.05,
        0.03, -0.08,  0.01,  0.16, -0.07, -0.01, -0.1 ,  0.18,  0.07,
        0.16,  0.26,  0.32,  0.14,  0.31,  0.16,  0.12,  0.18,  0.32,
        0.39,  0.27,  0.45,  0.4 ,  0.22,  0.23,  0.31,  0.44,  0.33,
        0.46,  0.61,  0.38,  0.39,  0.54,  0.63,  0.62,  0.54,  0.68,
        0.64,  0.67,

In [25]:
print(test.tolist()) # convert array to list
print(test)

[(1880, -0.15000000596046448), (1881, -0.07000000029802322), (1882, -0.10000000149011612), (1883, -0.1599999964237213), (1884, -0.27000001072883606), (1885, -0.3199999928474426), (1886, -0.30000001192092896), (1887, -0.3499999940395355), (1888, -0.1599999964237213), (1889, -0.10000000149011612), (1890, -0.3400000035762787), (1891, -0.2199999988079071), (1892, -0.27000001072883606), (1893, -0.3100000023841858), (1894, -0.30000001192092896), (1895, -0.2199999988079071), (1896, -0.10000000149011612), (1897, -0.10000000149011612), (1898, -0.25999999046325684), (1899, -0.17000000178813934), (1900, -0.07000000029802322), (1901, -0.15000000596046448), (1902, -0.27000001072883606), (1903, -0.36000001430511475), (1904, -0.46000000834465027), (1905, -0.25), (1906, -0.20999999344348907), (1907, -0.3700000047683716), (1908, -0.41999998688697815), (1909, -0.4699999988079071), (1910, -0.4300000071525574), (1911, -0.4300000071525574), (1912, -0.3499999940395355), (1913, -0.33000001311302185), (1914, 

## Sequences
Sequences in Python include `lists`, `tuples`, `ranges`, and `strings`. Lists are a simple way of storing related data, much like arrays or tables in MATLAB. Tuples act much like lists, but are immutable, meaning once created they *cannot* be modified. Ranges are created with the built-in `range`, which produces a sequence of evenly spaced integers (`numpy` offers a similar function, `arange`, and Python 2 had `xrange`). `strings` allow for text to be stored and manipulated.
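
As a quick illustration of `range` (a sketch; the variable name `steps` is just for this example), note that a range stores only its start, stop, and step. Converting it to a list shows the actual values.

```python
steps = range(1, 44, 4)

print(list(steps))   # [1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41]
print(len(steps))    # 11, computed without building a list
print(steps[2])      # ranges support indexing: 9
print(41 in steps)   # and membership tests: True
print(list(range(5, 0, -1)))   # a negative step counts down: [5, 4, 3, 2, 1]
```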
### Lists

In [32]:
list1=['photon', 'W+ boson', 'W- boson', 'Z boson', 'gluon', 'graviton']

In [33]:
list1[4]

'gluon'

In [34]:
notreal = list1.pop()
print(notreal)

graviton


In [35]:
list1

['photon', 'W+ boson', 'W- boson', 'Z boson', 'gluon']

In [36]:
x = list1.append(notreal)
print(x)

None


In [76]:
list1[0:4]
'photon' in list1

True
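
The cells above use `pop` and `append`. A few other common list operations are sketched below; the list name `bosons` and the extra entries are illustrative.

```python
bosons = ['photon', 'W+ boson', 'W- boson', 'Z boson', 'gluon']

bosons.append('graviton')          # add one element at the end
bosons.extend(['Higgs boson'])     # add every element of another list
del bosons[5]                      # delete by index (removes 'graviton')
bosons.insert(0, 'placeholder')    # insert before index 0
bosons.remove('placeholder')       # delete the first matching value

print(bosons)
print(len(bosons))
print(tuple(bosons))               # an immutable copy of the same data
```

All of these methods modify the list in place and return `None`, which is why `print(x)` printed `None` after `x = list1.append(notreal)`.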

### Tuples

In [38]:
tuple1 = ('photon', 'W+ boson', 'W- boson', 'Z boson', 'gluon', 'graviton')

In [39]:
tuple1[3:5]

('Z boson', 'gluon')

In [40]:
tuple1[3]='gamma'

TypeError: 'tuple' object does not support item assignment

In [41]:
(em,weak1,weak2,weak3,strong,gravity)=tuple1
print(weak2)
print(strong)

W- boson
gluon


You often need to switch the values of two variables; tuples give you an easy way:

In [42]:
a='switch'
b=3.1415
(a,b)=(b,a)
print(a)
print(b)

3.1415
switch
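
The same unpacking syntax is what lets a function hand back several values at once. In the sketch below, `min_max` is a hypothetical helper written for this example, while `divmod` is a Python built-in.

```python
def min_max(values):
    """Return the smallest and largest entries as a tuple."""
    return min(values), max(values)

low, high = min_max([3, 5, 1, 3, 5, 6])
print(low, high)                # 1 6

# divmod returns a (quotient, remainder) tuple.
quotient, remainder = divmod(137, 10)
print(quotient, remainder)      # 13 7
```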


In [44]:
r1=range(6)
r2=range(1,19)
r3=range(1,44,4)
print(r1)
print(r2)
print(r3)
print(type(r3))

range(0, 6)
range(1, 19)
range(1, 44, 4)
<class 'range'>


In [49]:
for k in tuple1:
    print(k)

photon
W+ boson
W- boson
Z boson
gluon
graviton


### Strings
`str` is the data type used to contain text in Python.

In [46]:
text1 = "There is no difference between characters and strings. "
text2 = 'Single or double quotes give you the same thing'
print(type(text1))
print(type(text2))
print(text1[9])
print(text2[0:6])
print(text1+text2)

<class 'str'>
<class 'str'>
n
Single
There is no difference between characters and strings. Single or double quotes give you the same thing


`str` is quite a flexible data type in Python. Below are some examples of ways to use strings. Many more methods can be found at [string methods](https://www.programiz.com/python-programming/methods/string/join).

In [52]:
gettysburg = "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth."
substr= " we "
print(gettysburg.count(substr)) # count 'we's in the address
print(gettysburg.casefold().count(substr)) # counts we's regardless of capitalization
print(gettysburg.find(substr))
print(gettysburg[181:183])
gettysburg.replace(substr,"THE FAMILy")

8
10
180
we


'Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. NowTHE FAMILyare engaged in a great civil war, testing whether that nation, or any nation so conceived and dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper thatTHE FAMILyshould do this. But, in a larger sense,THE FAMILycan not dedicate --THE FAMILycan not consecrate --THE FAMILycan not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember whatTHE FAMILysay here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work w
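
Note that in the output above the replacement text runs into its neighbors (`NowTHE FAMILyare`) because the search string `" we "` includes the surrounding spaces. The sketch below, on a single sentence from the address, shows one way to keep the spacing, along with `split` and `join`, which work on whole words.

```python
sentence = "Now we are engaged in a great civil war"

words = sentence.split()            # split on whitespace into a list
print(len(words))                   # 9
print(words.count("we"))            # counts whole words, not substrings: 1

# Padding the replacement with spaces keeps the words separated.
print(sentence.replace(" we ", " they "))

print("-".join(words[:3]))          # join list items with a separator
print(sentence.upper().startswith("NOW"))
```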

## Dictionaries
The basic data container in Python is a list. A `list` can hold a variety of data types. Once a `list` exists, it can be appended to, have elements deleted, and be transformed into `arrays` or `tuples`. Of course, there are more sophisticated ways of storing, accessing, and processing data in Python. Let's look at two: `dictionaries` and `dataframes`. Content below is generously sampled from http://openbookproject.net/thinkcs/python/english3e/dictionaries.html.
Dictionaries are collections of values that are indexed by `keys`. In other languages, dictionaries are often referred to as associative arrays because they associate `keys` with values. Examples of using dictionaries:
* English to Spanish dictionary

In [56]:
entosp = {"one":"uno","two":"dos","three":"tres"} # `key:value` elements
entosp["one"]

'uno'

* Inventory of supplies

In [57]:
office = {"highlighters":200,"pencils":300,"pens":500}
office['pens']

500

You could construct a dictionary for an encryption code:

In [59]:
codex = { "a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 8, "i": 9, "j": 10,"k": 11,"l": 12,"m": 13,"n": 14,
         "o": 15,"p": 16,"q": 17,"r": 18,"s": 19,"t": 20,"u": 21,"v": 22,"w": 23,"x": 24,"y": 25}
print(codex)
print(codex["g"])

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25}
7


You can add new keys after the initial definition:

In [60]:
codex["z"]=26
print(codex)

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
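
One way such a code could be put to use is sketched below. The mapping is rebuilt with a dictionary comprehension so the example stands alone, and `encode` is a hypothetical helper, not part of the original material.

```python
import string

# Build the same letter-to-number mapping without typing every pair.
codex = {letter: number
         for number, letter in enumerate(string.ascii_lowercase, start=1)}

def encode(word):
    """Hypothetical helper: map each lowercase letter to its number."""
    return [codex[letter] for letter in word]

print(encode("hello"))   # [8, 5, 12, 12, 15]
```

This sketch assumes the input contains only lowercase letters; any other character would raise a `KeyError`.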


You can update the values in the dictionary:

In [61]:
office["pens"]+=200
print(office)

{'highlighters': 200, 'pencils': 300, 'pens': 700}


You can find out how many `key:value` pairs a dictionary has:

In [62]:
len(codex)

26
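
A few more everyday dictionary operations are sketched below, using the `office` inventory from above. The `"staplers"` key is made up to show what happens with a missing key.

```python
office = {"highlighters": 200, "pencils": 300, "pens": 700}

print("pens" in office)            # membership tests look at keys: True
print(office.get("staplers", 0))   # get() returns a default instead of raising KeyError: 0

removed = office.pop("pencils")    # remove a key and return its value
print(removed)                     # 300
print(office)

del office["highlighters"]         # delete a key without returning anything
print(len(office))                 # 1
```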

### Dictionary Methods
Dictionaries are a class in Python, and classes have functions associated with them called `methods`. Methods allow you to perform operations on and with the dictionary.
You can always get the keys and the values from any dictionary using the methods `keys` and `values`.

In [63]:
print(office.values())
print(office.keys())
print(list(office.values()))

dict_values([200, 300, 700])
dict_keys(['highlighters', 'pencils', 'pens'])
[200, 300, 700]


You can run over all the keys to process the values or run through the values explicitly.

In [64]:
print("Let's count in Spanish")
for k in entosp.keys():   # Since Python 3.7, keys come back in insertion order
   print(entosp[k])


Let's count in Spanish
uno
dos
tres


In [65]:
print("Let's look at the code")
for k in codex:   # The order of the k's is defined
   print(k,"oops")

Let's look at the code
a oops
b oops
c oops
d oops
e oops
f oops
g oops
h oops
i oops
j oops
k oops
l oops
m oops
n oops
o oops
p oops
q oops
r oops
s oops
t oops
u oops
v oops
w oops
x oops
y oops
z oops


The method `items` can be used to iterate over the keys and values or to grab both the keys and the values.

In [66]:
list(office.items())

[('highlighters', 200), ('pencils', 300), ('pens', 700)]

In [67]:
for (k,v) in codex.items():
    print(k, "=", v)

a = 1
b = 2
c = 3
d = 4
e = 5
f = 6
g = 7
h = 8
i = 9
j = 10
k = 11
l = 12
m = 13
n = 14
o = 15
p = 16
q = 17
r = 18
s = 19
t = 20
u = 21
v = 22
w = 23
x = 24
y = 25
z = 26
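
Because `items` yields `(key, value)` pairs, it is also a convenient starting point for building new dictionaries or sorting by value. The following is a sketch; `sptoen` is an illustrative name.

```python
entosp = {"one": "uno", "two": "dos", "three": "tres"}

# Swap each key with its value; this assumes the values are unique.
sptoen = {spanish: english for english, spanish in entosp.items()}
print(sptoen["dos"])     # two

# sorted() can order the (key, value) pairs, here by value, largest first.
office = {"highlighters": 200, "pencils": 300, "pens": 700}
by_count = sorted(office.items(), key=lambda pair: pair[1], reverse=True)
for item, count in by_count:
    print(item, count)
```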


## Booleans
Booleans are essential for `if` statements and `while` loops. A Boolean takes one of two values: `True` or `False`.

In [69]:
print((5 == 3) + 2)   # (5 == 3) is False, which counts as 0, so this prints 0 + 2
int(5 == 6)

2


0

In [70]:
j = "hel"
j + "lo" == "hello"

True

In [71]:
type(5==4)

bool
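
The sketch below shows Booleans doing their usual job of steering control flow. The list `temps` holds illustrative values, not real data.

```python
temps = [-0.15, 0.13, 0.61, 0.54, 0.38]   # illustrative values

# A comparison produces a bool that an if statement can test.
if temps[2] > 0.5:
    print("warm year")

# True counts as 1 and False as 0, so sum() counts the matches.
n_warm = sum(t > 0.5 for t in temps)
print(n_warm)    # 2

# A while loop repeats until its condition becomes False.
k = 0
while temps[k] <= 0.5:
    k += 1
print(k)         # index of the first value above 0.5: 2

print(isinstance(True, int))          # bool is a subclass of int: True
print(bool(0), bool(""), bool([1]))   # zero and empty strings are falsy: False False True
```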

## Sets
Sets are collections, much like `lists` and `tuples`, but a `set` is unordered and unindexed and holds each value only once. In Python, sets are written with curly brackets. Once a set is created, its items cannot be accessed by index or changed in place, but you can add or remove items and run through them in a `for` statement. A set is useful because Python has methods that allow you to perform set theory operations on this data type.

In [72]:
thisset = {"apple", "banana", "cherry"}
print(thisset)

{'apple', 'banana', 'cherry'}


In [73]:
thisset = {"apple", "banana", "cherry"}
for x in thisset:
  print(x)

apple
banana
cherry


In [74]:
thisset = {"apple", "banana", "cherry"}
print("banana" in thisset)

True


In [77]:
thisset = {"apple", "banana", "cherry"}
thisset.add("orange")
# thisset.append("kiwi")  # would raise AttributeError: sets have no append method
print(thisset)

{'orange', 'apple', 'banana', 'cherry'}


In [81]:
thisset2 = {"apple", "banana", "cherry"}
thisset2.update({"orange", "mango", "grapes"})
print(thisset2)

{'orange', 'apple', 'banana', 'grapes', 'cherry', 'mango'}


Let's try some of the set operations, such as intersection, difference, and union.

In [82]:
thisset.intersection(thisset2)

{'apple', 'banana', 'cherry', 'orange'}

In [84]:
thisset.difference(thisset2)

set()

In [85]:
thisset.union(thisset2)

{'apple', 'banana', 'cherry', 'grapes', 'mango', 'orange'}
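
The same operations are also available as operators. The sketch below rebuilds the two sets under illustrative names (`set1`, `set2`) and adds a common use of sets: removing duplicates from a list.

```python
set1 = {"apple", "banana", "cherry", "orange"}
set2 = {"apple", "banana", "cherry", "orange", "mango", "grapes"}

print(set1 & set2 == set1.intersection(set2))   # & is intersection: True
print(set1 | set2 == set1.union(set2))          # | is union: True
print(set2 - set1)                              # difference: items in set2 but not in set1
print(set1 <= set2)                             # subset test: True

# Converting a list to a set removes duplicates.
list1 = [3, 5, 1, 3, 5, 6]
print(sorted(set(list1)))                       # [1, 3, 5, 6]
```

Because sets are unordered, the order in which items are printed may differ from run to run; `sorted` gives a predictable list when order matters.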