# Introduction to Python

## Data types

Python has four basic data types that we will use:

+ `float`, for numerical or _floating point_ data
+ `int`, for integer data
+ `str`, for strings or character data
+ `bool`, for the Boolean values `True` and `False`, which are used for testing conditions

Python can do basic calculations:

In [1]:
1+1

2

In [2]:
4 * 5

20

In [3]:
6/4

1.5

> This is a welcome change in Python 3. In Python 2, `6/4` would give you `1`, because `/` between two integers performed _integer division_. Integer division is still available in Python 3 through the `//` operator, but it is no longer the default behavior of `/`.
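If you do want integer division in Python 3, the `//` (floor division) operator provides it, and `%` gives the matching remainder. A minimal sketch:

```python
# Floor division and remainder in Python 3
quotient = 6 // 4    # floor division
remainder = 6 % 4    # remainder after floor division
true_div = 6 / 4     # ordinary ("true") division

print(quotient, remainder, true_div)  # 1 2 1.5

# The two pieces reassemble the original number
print(quotient * 4 + remainder == 6)  # True

# Floor division rounds toward negative infinity, not toward zero
print(-6 // 4)                        # -2
```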

In Python, we can assign quantities to __variables__, that is, named objects. A variable is a name that refers to a value (an object) stored at a particular address in your computer's memory.

In [25]:
a = 123
b = 123.0
c = '123'

In Excel, these three quantities might look the same, but in Python (or any other programming language) you have to be careful, because each one has a different type:

In [26]:
type(a)

int

In [27]:
type(b)

float

In [28]:
type(c)

str

In [9]:
a == b

True

In [10]:
a == c

False
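Because `c` is a string, it has to be converted before it can be compared with a number. A minimal sketch using the built-in `int()`, `float()` and `str()` conversions:

```python
a = 123
b = 123.0
c = '123'

# Convert the string to a number before comparing
print(int(c) == a)     # True
print(float(c) == b)   # True

# Or convert the number to a string instead
print(str(a) == c)     # True
```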

In [11]:
d = 123.000000000001
a == d

False

In [12]:
b == d

False

Once again, `a` and `d` might look the same in a spreadsheet, but if the actual stored numbers are different, Python will treat them as different.
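Exact equality is often too strict for floats, since tiny rounding differences make numbers that "should" be equal compare unequal. As a sketch of one option, the standard library's `math.isclose` compares two numbers within a relative tolerance, `rel_tol`, which defaults to `1e-09`:

```python
import math

b = 123.0
d = 123.000000000001

print(b == d)              # False: the stored values differ
print(math.isclose(b, d))  # True: equal within the default relative tolerance

# A much tighter tolerance tells them apart again
print(math.isclose(b, d, rel_tol=1e-15))  # False

# The classic rounding surprise
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```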

## Lists, Tuples and Dictionaries

There are three kinds of bracketed collections in Python that we will use:

1. Lists (`[]`)
2. Tuples (`()`)
3. Dictionaries (`{}`)

Lists are baskets that can contain different kinds of things. They are ordered, so there is a first element, a second element, and so on up to a last element. They are also _mutable_, so individual elements can be changed after the list is created.

Tuples are basically like lists, except that they are _immutable_, i.e., once they are created, individual values can't be changed. They are also ordered.

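Dictionaries are collections of key-value pairs. They are implemented as hash tables, which makes looking up a value by its key very fast. Dictionaries will be very useful to us as we progress towards the PyData stack. Elements need to be referred to by _key_, not by position. (Since Python 3.7, dictionaries remember the order in which keys were inserted, but you still cannot index them by position.)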

In [6]:
test_list = ['apple', 3, True, 'Harvey', 48205]

In [8]:
test_list

['apple', 3, True, 'Harvey', 48205]

In [9]:
len(test_list)

5

In [10]:
test_list[0]

'apple'

In [11]:
test_list[:3]

['apple', 3, True]

In [12]:
test_list[2:]

[True, 'Harvey', 48205]

The important thing here is that if you provide a slice `a:b`, the element at index `a` is included but the element at index `b` __is not__.
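One consequence of this convention, sketched below: as long as both indices lie within the list, the slice `a:b` contains exactly `b - a` elements, and splitting at the same index with `[:k]` and `[k:]` gives two pieces that join back into the original list.

```python
test_list = ['apple', 3, True, 'Harvey', 48205]

middle = test_list[1:4]
print(middle)       # [3, True, 'Harvey']
print(len(middle))  # 3, i.e. 4 - 1

# Splitting at the same index never loses or duplicates an element
k = 2
print(test_list[:k] + test_list[k:] == test_list)  # True
```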

In [13]:
test_list[-1]

48205

In [18]:
test_list

['apple', 3, True, 'Harvey', 48205]

In [14]:
test_list[:-1]

['apple', 3, True, 'Harvey']

In [15]:
test_list[-3:]

[True, 'Harvey', 48205]

In [17]:
test_list[-3:-1]

[True, 'Harvey']

You can also make a list of lists, i.e. a nested list:

In [23]:
test_nested_list = [[1,'a',2,'b'],[3,'c',4,'d']]
test_nested_list

[[1, 'a', 2, 'b'], [3, 'c', 4, 'd']]

This will be useful when we talk about arrays and data frames.
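Elements of a nested list are reached by indexing twice: the first index picks an inner list (think of it as a row), and the second picks an element within that row. A short sketch:

```python
test_nested_list = [[1, 'a', 2, 'b'], [3, 'c', 4, 'd']]

first_row = test_nested_list[0]  # the whole first inner list
item = test_nested_list[1][2]    # second row, third element

print(first_row)              # [1, 'a', 2, 'b']
print(item)                   # 4
print(len(test_nested_list))  # 2: the outer list holds two rows
```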

You can also check whether something is in a list, i.e. whether it is a member, with the `in` operator:

In [41]:
'Harvey' in test_list

True
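`in` returns a boolean, and `not in` is its negation. The same operators also work on tuples and strings; for strings, `in` checks for a substring. A brief sketch, where `'Sally'` is just a hypothetical value that is not in the list:

```python
test_list = ['apple', 3, True, 'Harvey', 48205]

print('Harvey' in test_list)     # True
print('Sally' not in test_list)  # True
print(48205 in test_list)        # True

# For strings, `in` looks for a substring
print('arv' in 'Harvey')         # True
```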

### Tuples

Tuples are like lists, except that once you create them, you can't change them.

In [17]:
test_tuple = ('apple', 3, True, 'Harvey', 48205)

In [20]:
test_tuple[:3]

('apple', 3, True)

In [21]:
test_list[0] = 'pear'
test_list

['pear', 3, True, 'Harvey', 48205]

In [22]:
test_tuple[0] = 'pear'
test_tuple

TypeError: 'tuple' object does not support item assignment
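Since a tuple can't be changed in place, the usual workaround is to build a new tuple. A minimal sketch of two ways to do that: slicing plus `+` (which concatenates tuples just as it does lists), or converting to a list and back:

```python
test_tuple = ('apple', 3, True, 'Harvey', 48205)

# A one-element tuple needs a trailing comma
new_tuple = ('pear',) + test_tuple[1:]
print(new_tuple)   # ('pear', 3, True, 'Harvey', 48205)
print(test_tuple)  # the original is unchanged

# Alternatively: convert to a list, modify it, convert back
temp = list(test_tuple)
temp[0] = 'pear'
print(tuple(temp) == new_tuple)  # True
```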

### Dictionaries

In [3]:
test_dict = {1: 'value', 'a': 3245}

In [37]:
test_dict[1]

'value'

In [38]:
test_dict['a']

3245

In [39]:
type(test_dict)

dict

In [41]:
test_dict['a'] = 4524
test_dict

{1: 'value', 'a': 4524}

In a dictionary, the keys must be immutable (hashable) objects such as strings, numbers or tuples, but the values can be any Python object. You can see the keys and values using the `.keys()` and `.values()` methods:

In [4]:
test_dict.keys()

dict_keys([1, 'a'])

In [5]:
test_dict.values()

dict_values(['value', 4524])
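A few other everyday dictionary operations, in a minimal sketch: `.items()` returns the key-value pairs, `in` checks whether a *key* exists (values are not checked), and `.get()` returns a default instead of raising a `KeyError` when a key is missing. The key `'missing'` and the tuple key `(2, 'b')` are hypothetical, added only for illustration:

```python
test_dict = {1: 'value', 'a': 4524}

print(list(test_dict.items()))  # [(1, 'value'), ('a', 4524)]
print('a' in test_dict)         # True: `in` checks the keys
print(4524 in test_dict)        # False: values are not checked

# .get() returns a default instead of raising KeyError
print(test_dict.get('missing', 0))  # 0

# Tuples can be keys; lists cannot, because they are mutable
test_dict[(2, 'b')] = 'tuple key'
print(test_dict[(2, 'b')])      # tuple key
```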

## Loops and list comprehensions

Python has loops to iterate through a group of things. Usually these are lists, but you can loop through tuples and dictionaries too.

In [7]:
for i in range(len(test_list)):
    print(test_list[i])

pear
3
True
Harvey
48205


In [9]:
for u in test_list:
    print(u)

pear
3
True
Harvey
48205
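Indexing with `range(len(...))` is mainly useful when you need each element's position as well as the element itself; the built-in `enumerate()` gives you both more idiomatically. Looping over a dictionary directly gives you its keys, while `.items()` gives key-value pairs. A short sketch, using `test_list` as it stands after the change made in the Tuples section:

```python
test_list = ['pear', 3, True, 'Harvey', 48205]
test_dict = {1: 'value', 'a': 4524}

# enumerate() yields (position, element) pairs
for i, u in enumerate(test_list):
    print(i, u)

# Looping over a dict yields its keys
for key in test_dict:
    print(key, test_dict[key])

# .items() yields (key, value) pairs directly
for key, value in test_dict.items():
    print(key, value)
```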


The general structure for a `for` loop is:

```python
for (element) in (list):
    do some stuff
    do more stuff
```

In [14]:
test_list2 = [1,2,3,4,5,6,7,8,9,10]
mysum = 0
for u in test_list2:
    mysum += u
print(mysum)


55
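Adding up a list is common enough that Python has a built-in function for it. A minimal sketch comparing the loop above with `sum()`:

```python
test_list2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Explicit loop, as above
mysum = 0
for u in test_list2:
    mysum += u

# Built-in equivalent
print(mysum, sum(test_list2))  # 55 55

# range(1, 11) produces the same numbers without typing them out
print(sum(range(1, 11)))       # 55
```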


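### List comprehensions

A list comprehension builds a new list by applying an expression to each element of an existing sequence, optionally keeping only the elements that pass a test, all in a single line.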

In [15]:
squares = [u**2 for u in test_list2]

In [16]:
squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [20]:
[type(u) for u in test_tuple]

[str, int, bool, str, int]

In [31]:
even_numbers = [u for u in squares if u % 2 == 0]
even_numbers

[4, 16, 36, 64, 100]
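A list comprehension is shorthand for a loop that appends to a new list. As a sketch, the filtered comprehension above can be written out long-hand like this:

```python
test_list2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squares = [u**2 for u in test_list2]

# Long-hand version of: [u for u in squares if u % 2 == 0]
even_squares = []
for u in squares:
    if u % 2 == 0:
        even_squares.append(u)

print(even_squares)  # [4, 16, 36, 64, 100]
print(even_squares == [u for u in squares if u % 2 == 0])  # True
```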

## String operations

String operations are commonly used for data cleaning and data munging, especially in text analytics or when dealing with categorical data.

Strings have a slightly different "arithmetic":

In [32]:
'a' + 'b'

'ab'

In [33]:
5 * 'a'

'aaaaa'

In [34]:
fname = 'Larry.txt'
newname = fname.replace('Larry','Henry')
newname

'Henry.txt'

> Notice that we attach a function to an object with a dot: `fname.replace()`. A function attached to an object like this is called a __method__.
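Strings are immutable, like tuples, so string methods such as `.replace()` return a *new* string and leave the original untouched. A short sketch with a few other common string methods:

```python
fname = 'Larry.txt'
newname = fname.replace('Larry', 'Henry')

print(fname)                   # Larry.txt (unchanged)
print(newname)                 # Henry.txt
print(fname.upper())           # LARRY.TXT
print(fname.endswith('.txt'))  # True
print(fname.startswith('H'))   # False
```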

Here's one we use quite often when reading or verifying files:

In [17]:
test_string = 'A quick brown fox leaps over the lazy rabbit'
test_string.split(' ')

['A', 'quick', 'brown', 'fox', 'leaps', 'over', 'the', 'lazy', 'rabbit']

In [35]:
test_string2 = 'Jack, Ryder, 301-357-2436, Ashburn, Virginia,56'
test_string2.split(',')

['Jack', ' Ryder', ' 301-357-2436', ' Ashburn', ' Virginia', '56']

Most items start with a space. Let's remove it with a list comprehension:

In [37]:
out = test_string2.split(',')
out = [u.strip() for u in out]
out

['Jack', 'Ryder', '301-357-2436', 'Ashburn', 'Virginia', '56']

Are any of the entries a number?

In [38]:
[u.isnumeric() for u in out]

[False, False, False, False, False, True]
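`isnumeric()` returns `True` only when every character is numeric, so strings with a sign or a decimal point, such as `'-5'` or `'3.5'`, are not counted as numbers. A sketch that converts the entries passing the check to integers, and shows those edge cases:

```python
out = ['Jack', 'Ryder', '301-357-2436', 'Ashburn', 'Virginia', '56']

# Convert the entries that pass the check into integers
numbers = [int(u) for u in out if u.isnumeric()]
print(numbers)                              # [56]

# isnumeric() rejects signs and decimal points
print('-5'.isnumeric(), '3.5'.isnumeric())  # False False
```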

More sophisticated pattern matching can be done with regular expressions. Here we check which entries look like a phone number:

In [39]:
import re
[re.match(r'[0-9]{3}-[0-9]{3}-[0-9]{4}', u) for u in out]

[None,
 None,
 <re.Match object; span=(0, 12), match='301-357-2436'>,
 None,
 None,
 None]
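A list of match objects and `None`s is hard to read. A sketch that keeps only the matching entries, using `re.fullmatch` so the whole string (not just its beginning) must fit the pattern. The string `'301-357-2436 ext. 5'` is a hypothetical example showing the difference:

```python
import re

out = ['Jack', 'Ryder', '301-357-2436', 'Ashburn', 'Virginia', '56']
phone_pattern = r'[0-9]{3}-[0-9]{3}-[0-9]{4}'

# Keep only entries that consist entirely of a phone number
phones = [u for u in out if re.fullmatch(phone_pattern, u)]
print(phones)  # ['301-357-2436']

# re.match only anchors at the start, so trailing text still matches
print(bool(re.match(phone_pattern, '301-357-2436 ext. 5')))      # True
print(bool(re.fullmatch(phone_pattern, '301-357-2436 ext. 5')))  # False
```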

Strings can be sliced just like lists. 

In [40]:
fname

'Larry.txt'

In [34]:
fname[2:5]

'rry'

In [35]:
len(fname)

9
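Negative indices work on strings too, which is handy for pulling apart file names. A minimal sketch:

```python
fname = 'Larry.txt'

print(fname[0])          # L
print(fname[-4:])        # .txt (the last four characters)
print(fname[:-4])        # Larry (everything except the extension)
print(fname.split('.'))  # ['Larry', 'txt']
```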