# Learning Python: The basics 

# Section 1. Primitive data types

Python is a computer language. Most computer languages have some resemblance to human language, but they are very fussy! Every punctuation mark must be correct, every space has to be in place, and every word has to have the correct capitalisation. 

Language tends to have a notion of __nouns__ and __verbs__. Python has a similar notion of __objects__ and __functions/methods__. Functions are the ways that we do things in python. They are like the verbs. For example: 

~~~ python 
print("Hello world!") 
~~~

In this case:
- ```print``` is the function. 
- ```()``` contains what the function acts on. 
- ```"Hello world!"``` is data to which we do something, in this case, we __print__ the characters between the quotes. 

There are two basic types of data in python, __primitive__ data types and __object__ data types. Primitives are the basic building blocks of more complex data structures, much like how letters are the building blocks of words and digits the building blocks of numbers. For example, each letter is a __character__, and an ordered list of characters is called a __string__ object. (Strictly speaking, python has no separate character type: a single character is simply a string of length one.) 

Most objects you do not want to type out in full every time. For this reason we use labels as shortcuts. These labels are called __variables__. You assign a value to a variable and then you can use that variable to 'represent' the value. See how this works below:

In [None]:
new_string = 'Hello world!'

print( new_string )

print( type(new_string) )

## Important notes on characters 

### Note. Characters are encased in quotes, but there are different kinds of quotes.

Quote 1. the single tick. 

~~~ python
print( 'Hello single tick world!') 
~~~

Quote 2. The double tick. This is not two ticks, but the 'double-quote' character. Be careful with this character. Some programs, like Word, like to replace the generic " with stylized quote characters that signal the beginning and end of a quote such as “these”. Python doesn't like these. 

~~~ python
print("Hello double tick world!")
~~~

Quote 3. The triple tick. This is indeed three ticks in a row. We have a special use for this. Python likes to evaluate everything line by line, but you can use multiple lines inside a triple tick. 

~~~ python 
print('''Hello triple 
tick world!''')
~~~

In [7]:
print('Hello single tick world!')

print() # this just prints an empty line to make output more readable.

print("Hello double tick world!")

print()

print('''Hello triple 
tick world!''')

Hello single tick world!

Hello double tick world!

Hello triple 
tick world!


### Note 2. Some types of characters can confuse python, we have to 'escape them'. 

So what happens if you want to use a quotation mark in your text? If you just insert a quote then python thinks it's the end of the string, gets confused by the extra text and throws an error. Here is what an error looks like: 

In [10]:
print("Hello "Brave new World" ")

SyntaxError: invalid syntax (<ipython-input-10-4db1689b3d03>, line 1)

Decoding errors is a bit of an art that will develop over time. Here, we can see indeed, the syntax is invalid. But python is not great at explaining why. This is where experience comes in. Over time you will get better at deducing errors and cleaning them up. 

In this case, we want to __escape__ the quote character so that we can print it rather than evaluate it. We use the backslash character to escape. And yes, to escape the backslash we would use...another backslash. Observe the correctly formatted print statements below: 

In [14]:
print("Hello there \"coder\", I'm sure you'll do great! Just remember to use the \\")

# Notice that I also had the single quote in there ("I'm").
# You can place single quotes unescaped in double quotes and vice versa. 

print('Hello there "coder", I\'m sure you\'ll do great! Just remember to use the \\')

Hello there "coder", I'm sure you'll do great! Just remember to use the \
Hello there "coder", I'm sure you'll do great! Just remember to use the \

### Note 3. Not all characters are visible. 

Sometimes we want to add some spaces to our text, maybe a tab or maybe a new line. Up until now we have just used ```print()``` to add an extra line. We can, however, do that right in the text. These sorts of characters are called 'whitespace' characters. There are a few to know about: 

- \n is the new line character
- \t prints a tab. This is nice when you're printing tables
- \r is the carriage return. You don't normally need this, but sometimes Windows adds \r\n to the end of a line rather than just \n. 

See newline and tab in action below: 

In [15]:
# A not-quite haiku
print("Hello characters")

print("\nDid you know that tab\tis sometimes used")

print("\t\t\tto move words around.")

Hello characters

Did you know that tab	is sometimes used
			to move words around.


Ta dah! Now we have made poetry. Bad poetry. But I hope it gets the point across. 
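
Because whitespace characters are invisible once printed, it can help to look at how python itself writes the string down. The sketch below uses the built-in ```repr()``` function for this. The variable name ```line``` and its contents are made up for illustration.

~~~ python
# 'line' is a made-up example string containing a tab and a newline.
line = "name\tscore\n"

# print() shows the effect of the whitespace characters...
print(line)

# ...while repr() shows the escape codes themselves.
visible = repr(line)
print(visible)

# Each escape sequence counts as a single character.
print(len(line))
~~~

This is a handy trick when text read from a file looks fine on screen but has a stray ```\r``` or ```\t``` hiding in it.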

## The two basic types of numbers 

For numbers, there are two basic primitive data types (though there are more to discover later):
- __Integers__, which refer to whole numbers such as 1, 42 or 400. 
- __Floating point numbers__, which refer to real numbers that can be approximated by digits using a decimal point, such as 0.5, 12.345 and 0.333333333.

In [18]:
# An integer
x = 7

# A floating point number. Still a whole number, but the .0 makes it a float rather than an integer.
y = 4.0

print( type(x) )
print( type(y) )

z = x + y 

# See how z inherits the floating point number even though the value could be an integer? 
print(type(z), z)

<class 'int'>
<class 'float'>
<class 'float'> 11.0
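
You can also convert between the two number types yourself. A minimal sketch, assuming the built-in ```float()```, ```int()``` and ```round()``` functions, with values chosen only for illustration:

~~~ python
# Converting between the two number types.
whole = 7
as_float = float(whole)    # 7 becomes 7.0
print(type(as_float), as_float)

decimal = 4.9
as_int = int(decimal)      # int() drops everything after the decimal point
print(type(as_int), as_int)

rounded = round(decimal)   # round() goes to the nearest whole number instead
print(rounded)
~~~

Note that ```int()``` does not round. It simply throws away the fractional part, which may or may not be what you want.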


## Basic number operations.

You will remember some basic number operations from arithmetic, such as addition and subtraction. Python implements these and a few others worth remembering. Let's have a look at several of these. You should pay attention to whether the result includes digits after the decimal point or not. We can do operations on both integers and floating points. When in doubt python uses the type of number that gives more precision. So 1 + 2.5 will not round up or down, it will return 3.5. 

Operations. 

- Addition: ```X + Y```
- Subtraction: ```X - Y```
- Multiplication: ```X * Y```
- Exponent (i.e., raising X to the power of Y): ```X ** Y``` 
- Floating point division: ```X / Y``` 
- Integer division: ```X // Y``` 
- Modulo (i.e., the 'remainder' from integer division): ```X % Y```

In [21]:
a = 10
print(a)
b = 3
print(b)
print( a // b, a % b )

10
3
3 1


In [22]:
x = 9
y = 4

print("x = ", x)
print("y = ", y)

print("x + y = ", x + y)
print("x - y = ", x - y)

print("x * y = ", x * y)
print("x ** y = ", x ** y)

print("x / y = ", x / y)
print("x // y = ", x // y)
print("x % y = ", x % y)

x =  9
y =  4
x + y =  13
x - y =  5
x * y =  36
x ** y =  6561
x / y =  2.25
x // y =  2
x % y =  1
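
Integer division and modulo are two halves of the same calculation: multiply the quotient back by the divisor, add the remainder, and you get the number you started with. The sketch below checks this for the same ```x``` and ```y``` as above, and also shows the built-in ```divmod()``` function, which returns both results at once.

~~~ python
x = 9
y = 4

quotient = x // y
remainder = x % y

# Putting the two pieces back together recovers the original number.
rebuilt = quotient * y + remainder
print(rebuilt == x)

# The built-in divmod() returns both results at once as a pair.
both = divmod(x, y)
print(both)
~~~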


### Note. Declaring the right kind of number 
Notice how even though we had two integers, when we did the floating point division it returned numbers _after_ a decimal place? In python 3 the ```/``` operator always returns a floating point number, so nothing is silently rounded away. More generally, python works out the type of a value for you, because it is a __dynamically typed__ language. 

You are likely to encounter languages later on, such as java, that are __statically typed__. In these languages you have to declare the type of a variable before you can use it. Python resolves a lot of that on the back end, but as a consequence it trusts the coder much more to pass around the right kind of variables. 
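
Here is a small sketch of what that trust looks like in practice. The variable name ```label``` is made up for illustration, and the ```try``` / ```except``` lines are only there to catch the error so that the rest of the cell still runs.

~~~ python
# The same name can point to values of different types over time.
label = 42
print(type(label))

label = "forty-two"
print(type(label))

# Python will not silently mix text and numbers, though.
try:
    result = "4" + 2
except TypeError as error:
    result = None
    print("TypeError:", error)

# Converting explicitly tells python which kind of value you mean.
converted = int("4") + 2
print(converted)
~~~

So python never asks you to declare a type, but it still cares which type a value has when you try to use it.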

### Note. What about irrational numbers like $\pi$ and repeating decimals like 1/3? 
This is why we say floating point numbers are approximations of real numbers. $\pi$ is a real number, but its digits go on forever without repeating. The computer cannot store every digit of $\pi$ (as it does not have infinite memory), so it stores an approximation. 

When we calculate things in python, we are accepting a certain loss of precision. It does not do fractional math. But it is still pretty clever. For the most part we won't be worrying too much about precision, but it can come up. 

See the example below:

In [24]:
x = 1/3
y = 1/3
z = 1/3

print(x)
print(y)
print(z)

# Shouldn't this add up to 0.99999999999?
print(x + y + z)

0.3333333333333333
0.3333333333333333
0.3333333333333333
1.0


In [26]:
# Python is very good at understanding the right level of precision. But it isn't perfect. 
# Fortunately, small errors rarely accumulate in this sort of work. See here:

# EXERCISE - Try deleting just the last 3 from each of these and see what happens
x = 0.3333333333333333
y = 0.333333333333333
z = 0.3333333333333333
print(x + y + z)

0.9999999999999996
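
One practical consequence is that testing floating point numbers with ```==``` can surprise you. A minimal sketch of the usual workaround, assuming the standard library's ```math.isclose()``` function:

~~~ python
import math

total = 0.1 + 0.2
print(total)               # not exactly 0.3
print(total == 0.3)

# math.isclose() asks "are these equal, allowing for tiny rounding error?"
close_enough = math.isclose(total, 0.3)
print(close_enough)

# round() is another option when a fixed number of decimals is enough.
print(round(total, 2) == 0.3)
~~~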


# Section 2. From characters to strings

Strings, as we have now seen, are collections of characters. So we can now tell that the following is a string:

```"This is a string"```

Formatting strings is one of the most common operations that is done in social data science. Whether it is text collected via communications, surveys, comments, or reviews, it will be a string and it will probably need formatting. This might involve changing everything to ALL CAPS, OR PERHAPS NOT BECAUSE YELLING IS NOT NICE. It might involve taking a collection of words and trimming the ends so that:

~~~ python 
"taking a bunch of words"
~~~

becomes

~~~ python
['tak','a','bunch','word']
~~~

which are the roots of these words, ready for text processing. 

It might involve detecting emojis in text or looking for regularly formatted strings such as mario@example.com without knowing the exact characters ahead of time. What about making sure you can process strings of Chinese characters, or even languages such as Arabic and Hebrew that read from right to left? 

It turns out that the humble string is actually really complex. 

## Important notes on strings
### Note 1. Strings have encodings. Use the wrong encoding and your characters will look funny. 

One of the earliest encodings in computing (in the West) was [ASCII](https://en.wikipedia.org/wiki/ASCII). This encoding had 128 characters including basic white space, punctuation, A-Z in upper and lower case, digits and a few system characters. This was later expanded to latin-1, which adds the accented letters and diacritics used by most Western European languages. But nowadays, many programs use Unicode. ASCII characters were stored in a single byte (this is a byte: 0000 0001; it has 8 bits, though ASCII only needed 7). By contrast, Unicode encodings can store a code point in anywhere from 8 to 32 bits. The most common encoding is UTF-8, which can represent every character in every language and then some (including emoji). To learn more, see the [description from Wikipedia](https://en.wikipedia.org/wiki/UTF-8#Description). 

Python 3 is Unicode by default, which is why we have no trouble doing something like: 
~~~ python 
print("Waves to class 👋!")
~~~ 
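
To see the difference between characters and the bytes used to store them, you can encode a string yourself. This is only a sketch: the example text is made up, and it uses the string method ```encode()```, the bytes method ```decode()``` and the built-ins ```ord()``` and ```chr()```.

~~~ python
text = "café 👋"

# len() counts characters (Unicode code points), not bytes.
print(len(text))

# Encoding turns the string into raw bytes. In UTF-8, plain ASCII letters
# take one byte each, 'é' takes two and the emoji takes four.
as_bytes = text.encode("utf-8")
print(len(as_bytes))

# Decoding with the same encoding gives the original string back.
round_trip = as_bytes.decode("utf-8")
print(round_trip == text)

# ord() and chr() convert between a character and its code point number.
print(ord("A"), chr(65))
~~~

If you decode bytes with a different encoding from the one used to write them, this is exactly where the 'funny looking' characters come from.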

### Note 2. Strings are really a special kind of a list. 

We will learn more about lists next. But strings behave like a special kind of list - one with only characters in the same encoding. This means we can do things with strings like we can with a list, like sort their characters, or ask for elements 3 through 10. But because strings are special, they have their own unique [methods](https://www.geeksforgeeks.org/difference-method-function-python/), like ```upper()``` which turns the lower-case letters to upper case.  
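
As a quick preview, here is a sketch of a string being treated like a list of characters. The variable ```word``` is made up for illustration. The last step shows one important difference: a string cannot be changed in place (the ```try``` / ```except``` lines just catch the error so the cell keeps running).

~~~ python
word = "python"

print(word[0])        # first character
print(word[-1])       # last character
print(word[1:4])      # characters 1 up to, but not including, 4
print(len(word))

# sorted() works on a string too, and returns a list of its characters.
letters = sorted(word)
print(letters)

# Unlike a list, a string cannot be changed in place.
try:
    word[0] = "P"
except TypeError as error:
    print("TypeError:", error)
~~~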



In [27]:
print("Waves to class 👋!")

Waves to class 👋!


## Common string methods
- ```upper()```
 - _change to upper case_
- ```lower()```
 - _change to lower case_
- ```title()``` or ```capitalize()```
 - _capitalize the first letter of every word, or of just the first word (note the z my Commonwealth friends)_
- ```find(input)```
 - _returns index of first instance of input_
- ```isalnum()```
 - _is this string alphanumeric?_
- ```isalpha()```
 - _is this string just letters?_
- ```separator.join(list_of_strings)```
 - _combine a collection of strings into a single string, with the separator in between_
- ```replace(old_string,new_string)```
 - _find all instances of something and change to something else_
- ```strip()```
 - _remove whitespace characters from a string (useful when reading in from a file)_
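
Here is a short sketch of several of the methods above working on one made-up string (```raw``` and ```words``` are names invented for this example):

~~~ python
raw = "  the quick brown fox \n"

clean = raw.strip()              # remove whitespace from both ends
print(clean)

print(clean.title())             # first letter of every word
print(clean.capitalize())        # first letter of the string only

print(clean.find("quick"))       # index where 'quick' starts
print(clean.find("zebra"))       # -1 means 'not found'

print(clean.replace("quick", "slow"))

# join() is called on the separator and given the pieces to combine.
words = ["the", "quick", "brown", "fox"]
joined = "-".join(words)
print(joined)

print(clean.isalpha())           # False, because of the spaces
~~~

None of these methods change ```raw``` or ```clean``` themselves. Each one returns a new string, which is why the result is either printed or assigned to a variable.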

## How to use a string method
Methods and functions are the 'verbs' of Python. A method is basically a function except you invoke a method on an object. For now, it's okay to use the terms interchangeably, but the difference will be clear (and important) when we start building our own functions later. 

Suppose we have a string object:

~~~ python
"This is an object"
~~~

And we attach the 'upper' method like so: 

~~~ python
"This is an object".upper()
~~~

But more commonly we first assign a string to a variable and then invoke our method on the variable.

~~~ python
new_string = "This is a new string"
print ( new_string.upper() )
~~~

Try it below:

In [32]:
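# Once a variable is deleted with del, its name no longer exists.
test_string = "This is a test string"
del test_string

print(test_string)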

NameError: name 'test_string' is not defined

In [30]:
example_string = "Nope"

print("The original string:")
print(example_string)

print("\nTo upper case:")
print(example_string.upper())

print("\nTo lower case:")
print(example_string.lower())

print("\nIs the string just alpha numeric characters?")
print(example_string.isalnum())

print("\nWhat happens when we replace each space with the word 'like' between two spaces?")
print(example_string.replace(" "," like "))

The original string:
Nope

To upper case:
NOPE

To lower case:
nope

Is the string just alpha numeric characters?
True

What happens when we replace each space with the word 'like' between two spaces?
Nope


# Section 3. The ordered collection - The List

There are many kinds of collections. We will only be covering two of the most important ones today, but over the term you will be introduced to others such as The Set, The Series, The Tuple. But first it's the humble List. 

Lists are ordered collections of objects. They, like most British buildings, start at 0. The next element is 1 and so forth. 

Lists use square brackets as ends. So a list looks like: 

~~~ python
list_example = ["apples","bananas","cucumbers","durians"]
~~~

And to return an element, we can use the index, like so:
~~~ python 
print( list_example[0] ) 
# > apples
~~~

We can also count backwards through the list using negative numbers, like so:

~~~ python 
print( list_example[-1] ) 
# > durians 
~~~

Try below:

In [38]:
list_example = ["apples","bananas","cucumbers","durians"]

print( list_example[0] ) 

print( list_example[1] + '\t' + list_example[1] ) 

print( list_example[-2] ) 

apples
bananas	bananas
cucumbers


Lists have a __range__ that goes from 0 to n-1 where n is the number of elements. If you try to index an element that's out of the range, python will throw an error. To get the range, we can use a function called ```len``` which is short for __length__. 

Notice what we do below. We get the length and store it in a variable. Using the length itself as an index will give us an error, but length minus one won't. 

In [39]:
print(list_example)

['apples', 'bananas', 'cucumbers', 'durians']


In [40]:
list_length = len(list_example)

# Store the number of elements in a variable, then use it in a sentence. 
print("The list has", list_length, "elements")

The list has 4 elements


In [42]:
# This is the line that I will not print
# print( list_example[list_length] )

print( list_example[list_length -1] )

durians
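
If you would like to see the error without stopping the notebook, one option is to catch it. This is only a sketch: the list is re-created under the made-up name ```fruits```, and the ```try``` / ```except``` lines are there purely to catch the error so that the rest of the cell still runs.

~~~ python
fruits = ["apples", "bananas", "cucumbers", "durians"]
n = len(fruits)

# Valid indexes run from 0 to n - 1, so n itself is out of range.
try:
    fruits[n]
except IndexError as error:
    print("IndexError:", error)

# The last element can be reached either way.
last_by_length = fruits[n - 1]
last_by_negative = fruits[-1]
print(last_by_length == last_by_negative)
~~~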


## Jupyter Tip: Comments

To quickly comment or uncomment a line you can press ```Ctrl + /``` (forward slash).

In [None]:
# comment / uncomment using ctrl + /  to make test_commenting worth 1 rather than 42.
# test_commenting = 42
# test_commenting = 1
print( test_commenting )

## List indexing and slicing

We are not limited to indexing one element of a list at a time. We can actually index entire chunks of elements at once using the ```:``` inside of the ```[]```. For example, with a four element list we can print elements 0 and 1 (everything from index 0 up to, but not including, index 2):

~~~ python 
list_example = ["apples","bananas","cucumbers","durians"]
print( list_example[0:2] ) 
~~~



In [44]:
list_example = ["apples","bananas","cucumbers","durians"]
list_2 = ["elderberry","fruity mcfruit"]

print(list_example + list_2)

list_3 = list_example + list_2

['apples', 'bananas', 'cucumbers', 'durians', 'elderberry', 'fruity mcfruit']


In [45]:
print( list_3[0:3] ) 
print( list_3[3:4] ) 
print( list_3 ) 

['apples', 'bananas', 'cucumbers']
['durians']
['apples', 'bananas', 'cucumbers', 'durians', 'elderberry', 'fruity mcfruit']


The first number is the list index of the element we start at. The second number is the index that we go up to, but do not include. So for example, ```list_example[0:2]``` and ```list_example[2:4]``` will not return overlapping lists. 

We can also slice lists __from the end__ rather than the front of the list by using negative numbers. -1 is the final element, and :-1 will give us everything up to the final element. 

In [47]:
list_example = ["apples","bananas","cucumbers","durians"]
#print( list_example[0:-1] ) 

print( list_3[1:-1] ) 

a_string = 'list_of_characters'
print( a_string[1:-1] ) 

['bananas', 'cucumbers', 'durians', 'elderberry']
ist_of_character
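
Either number in a slice can be left out, in which case python runs to the start or the end of the list. A short sketch of the common patterns, using a made-up list called ```fruits```:

~~~ python
fruits = ["apples", "bananas", "cucumbers", "durians"]

print(fruits[:2])     # from the start up to, but not including, index 2
print(fruits[2:])     # from index 2 to the end
print(fruits[:-1])    # everything except the final element
print(fruits[-2:])    # the last two elements

# The two halves of a split never overlap and never lose anything.
print(fruits[:2] + fruits[2:] == fruits)

# Slicing returns a new list, so the original is left untouched.
first_two = fruits[:2]
first_two[0] = "apricots"
print(fruits[0])
~~~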


In [48]:
crazy_list = [ 1, 'a', 'htselurrthser', 1.0, [1,2] ]

print(crazy_list)

[1, 'a', 'htselurrthser', 1.0, [1, 2]]


## Adding data to a list (and adding two lists together)

Lists are __mutable__ meaning that they can be changed by the program. Other data structures we will encounter later are __immutable__ which means that we can only create or destroy them but not change them. To change a list with methods, we can:

- Add things: 
    - One at a time: __Append__ an item to the end of the list with ```list.append(item)``` 
    - Adding any collection: __extend__ a list with the items from any __iterable__ collection with ```list1.extend(list2)```, more on iteration tomorrow. 
- Remove things: 
    - Remove everything with __clear__: Want to keep the name but empty the list? ```list1.clear()```
    - Removing one item by index: __pop__ a value from the list. By default it is the final element, but you can pass an index to pop an element anywhere in the list. ```list.pop()``` or ```list.pop(4)```. This is the same as __del__ before the list, like ```del list1[-1]``` or ```del list[4]```, except that deleting doesn't return the thing you deleted whereas pop does. 
    - Remove an item by its value: __remove__ the first instance of a value in the list with ```list.remove("item")```.
- Sort things
    - __sort__ the list in place with ```list.sort()```.
    
### Note. Some methods change things in place rather than return new things.
Most of the time when we call a method we expect it to return something. A method called upper() returns the string in upper case, for example. It might be confusing (it sure has been to me at times), but sometimes methods just alter the state of an object rather than returning a new object. The list methods mentioned above all change the list in place. So you would not say: 

~~~ py
list1 = list1.extend(list2) 
~~~

Because while ```extend()``` changes ```list1``` it does not _return_ ```list1```. You can test this below by printing what is returned from the list.   

In [52]:
# Take 1. Extending a list - it returns none. 
list1 = [1,2,3]
list2 = [4,5,6]

print( list1.extend(list2) )

None


In [61]:
#Take 2. Extending a list - print the old, freshly extended list. 
# Note that [list2] wraps list2 in another list, so it is added as one nested element.
list1 = [1,2,3]
list2 = [4,5,6]

list1.extend( [list2] )
print(list1)

[1, 2, 3, [4, 5, 6]]


In [66]:
list1 = [1,2,3]
list4 = [[4,5,6]]

print( list1 + list4 )

[1, 2, 3, [4, 5, 6]]


Here are the other list methods in action. 

Notice that we create a new list before each method so that we can see exactly how the lists interact with the method. 

In [59]:
# Append
list1 = [1,2,3]
new_val = [1,2,3,4,5]

print("Original:\t", list1)

list1.append(new_val)

#print("Appended:\t",list1)
print(list1)

Original:	 [1, 2, 3]
[1, 2, 3, [1, 2, 3, 4, 5]]


In [57]:
#a_very_long_name_list1 = a_very_long_name_list1 + [ 71 ]

a_very_long_name_list1 = [ 1, 2, 3, 4 ]

a_very_long_name_list1 += [ 71 ]

print(a_very_long_name_list1)

[1, 2, 3, 4, 71]


In [None]:
# Extend
list1 = [1,2,3]
list2 = [4,5,6]

print("Original:\t", list1)

list1.extend(list2)

print("Extended:\t", list1)

In [67]:
# Clear
list1 = [1,2,3]

print("Original:\t",list1)

list1.clear()

print("Cleared:\t", list1)

Original:	 [1, 2, 3]
Cleared:	 []


In [68]:
# Pop
list1 = [1,2,3]

print("Original:\t\t", list1)

x = list1.pop(2)

print("Popped (index 2):\t", list1)
print("Popped value:\t\t", x)

Original:		 [1, 2, 3]
Popped (index 2):	 [1, 2]
Popped value:		 3


In [69]:
# Del
list1 = [0,54,31,5,77,-3]

print("Original:\t\t", list1)

del list1[-2:]

print("Deleted last two:\t", list1)

Original:		 [0, 54, 31, 5, 77, -3]
Deleted last two:	 [0, 54, 31, 5]


In [75]:
# Remove
list1 = [10,20,30,[20,20,21]]

print("Intact:\t\t", list1)

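list1.remove(20)
# The second call fails: the remaining 20s are inside the nested list,
# and remove() only looks at the top level.
list1.remove(20)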

print("Removed '20':\t", list1)

Intact:		 [10, 20, 30, [20, 20, 21]]


ValueError: list.remove(x): x not in list

In [79]:
# Sort
list3 = [7,3,1,2,3,4]

print("Unsorted:\t", list3)

list4 = sorted(list3)

print("Sorted:\t\t", list4)

Unsorted:	 [7, 3, 1, 2, 3, 4]
Sorted:		 [1, 2, 3, 3, 4, 7]
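
The cell above uses the built-in ```sorted()``` function rather than the ```sort()``` method. The difference is the same 'in place versus returning something new' distinction from the note earlier. A minimal sketch, using a made-up list called ```numbers```:

~~~ python
numbers = [7, 3, 1, 2, 3, 4]

# sorted() builds and returns a new list; the original keeps its order.
ordered_copy = sorted(numbers)
print(numbers)
print(ordered_copy)

# list.sort() reorders the list in place and returns None.
returned = numbers.sort()
print(returned)
print(numbers)

# Both accept reverse=True for descending order.
descending = sorted(numbers, reverse=True)
print(descending)
~~~

Use ```sorted()``` when you still need the original order, and ```sort()``` when you do not.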


# Section 4. The unordered collection - The Dictionary

A dictionary is a data structure with key-value pairs. It works like a printed dictionary, where the word is the key and the definition is the value. 

__programming__ 

&nbsp;&nbsp;&nbsp;&nbsp;5 __b.__ _intr._ To write a computer program; to supply a computer or other device with a program.

In this case, programming would be the __key__ and "5. b. _intr._ To write..." would be the __value__. Dictionaries in python are written with curly braces, ```{}```. Inside, the key and the value are separated by a ```:``` and items are separated by a comma like so: 

~~~ python 
example_dict = { key1:value1, key2:value2, key3:value3 }

example_newline_dict = { 
    key1:value1,
    key2:value2,
    key3:value3
}
~~~

To add a new key-value pair you assign it like so:

~~~ python 
new_dict = {} 

new_dict['lunch'] = "omelette"
new_dict['dinner'] = "stir-fry" 
~~~

Now you can check it by querying. You can query by: 
- keys:  ```new_dict.keys()```
- values:  ```new_dict.values()```
- items (key-value pairs):  ```new_dict.items()```

Below are some exercises with dictionaries. 


In [80]:
new_dict = {"salmon":"fish",
            "shitake":"mushroom"
           }

new_dict["apple"] = "fruit"
new_dict["potato"] = "vegetable"
new_dict["tomato"] = "who knows"

In [81]:
print(new_dict)

{'salmon': 'fish', 'shitake': 'mushroom', 'apple': 'fruit', 'potato': 'vegetable', 'tomato': 'who knows'}


In [87]:
L = list(new_dict.keys())

print( L + [ 1, 2, 3 ] )

['salmon', 'shitake', 'apple', 'potato', 'tomato', 1, 2, 3]


In [83]:
print(new_dict.values())

dict_values(['fish', 'mushroom', 'fruit', 'vegetable', 'who knows'])
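
Beyond listing keys and values, a few other everyday operations are worth knowing. The sketch below uses a small made-up dictionary called ```pantry``` and shows the ```in``` operator, the ```get()``` method and ```del```.

~~~ python
pantry = {"salmon": "fish", "apple": "fruit"}

# 'in' checks whether a key exists.
print("apple" in pantry)
print("durian" in pantry)

# Square brackets raise a KeyError for a missing key;
# get() returns a default value instead (None unless you supply one).
missing = pantry.get("durian")
print(missing)
print(pantry.get("durian", "unknown"))

# del removes a key-value pair.
del pantry["apple"]
print(pantry)

# keys(), values() and items() can be turned into lists.
pairs = list(pantry.items())
print(pairs)
~~~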


## Important notes on dictionaries

### Note 1. Dictionaries are not indexed by position, even if it seems like it. 
Chances are the above example printed the items in the order that you entered them in the dictionary. Since python 3.7 that insertion order is guaranteed, but in earlier versions it was not, and in any version you cannot ask a dictionary for its 'third' item the way you can with a list. This is because dictionaries use a technique called __hashing__, whereby they map the keys directly on to locations in memory. So when you pass it a key it knows exactly the area of memory to look for the value. This is as fast as it gets for retrieval. It's different for, say, finding a value in a list, where python has to read through the values until it reaches the one you are looking for. This ability to directly access a value regardless of its location is the same principle as __Random Access Memory__. So we have lists, which are accessed by position, and dictionaries, which are fast to look up by key. 

Some languages, like ```javascript```, have the notion of an __associative array__ which has both an order and can be accessed by random access. Python dictionaries cannot be accessed by position, but as we will see later, we can use a __series__ to accomplish precisely this (ordered and random access).
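
To make the contrast concrete, here is a sketch of the same made-up data stored as a list of pairs and as a dictionary. It uses a ```for``` loop to search the list, and ```try``` / ```except``` only to catch the error so that the cell keeps running.

~~~ python
# The same (made-up) data stored two ways.
pairs_list = [("salmon", "fish"), ("apple", "fruit"), ("potato", "vegetable")]
pairs_dict = dict(pairs_list)

# With a list we have to walk through the items until we find a match.
found = None
for name, kind in pairs_list:
    if name == "potato":
        found = kind
        break
print(found)

# With a dictionary the key takes us straight to the value.
print(pairs_dict["potato"])

# A dictionary cannot be indexed by position; 0 is treated as a key.
try:
    pairs_dict[0]
except KeyError as error:
    print("KeyError:", error)
~~~

With three items the difference is invisible, but with millions of items the dictionary lookup stays fast while the list search keeps getting slower.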

### Note. Assigning a new value to a key replaces the old value. 

Python is, as we will discover again and again, very trusting of the coder. If you want to delete files or change variables that's all available. With dictionaries if you have a value already for a key and you assign a new value, the old one disappears, lost to the void, irretrievable. Let's banish some values: 

In [1]:
new_dict = {
    "Mario"   : ["itsamemario@email.com"],
    "Koopa"   : ["koopatroopa@email.com"]
}

print(new_dict)

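# Assigning to a key. 'Bowser' is not in the dictionary yet, so this adds an entry;
# assigning to an existing key such as 'Mario' would replace its value.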
new_dict["Bowser"] = [ "bowser@something.com", "evilbowser@email.com" ]

print()

print(new_dict['Bowser'])

{'Mario': ['itsamemario@email.com'], 'Koopa': ['koopatroopa@email.com']}

['bowser@something.com', 'evilbowser@email.com']
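
To watch an old value actually disappear, assign to a key that already exists. A minimal sketch, using a made-up dictionary called ```contacts``` and a made-up second address:

~~~ python
contacts = {"Mario": ["itsamemario@email.com"]}

# Assigning to an existing key throws away the old value.
contacts["Mario"] = ["mario@example.com"]
print(contacts["Mario"])

# To keep the old value, change the list stored under the key instead.
contacts["Mario"].append("itsamemario@email.com")
print(contacts["Mario"])
~~~

The first assignment replaces the whole list, while ```append()``` changes the existing list in place and so keeps what was already there.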


### Note. Dictionaries and lists can be nested inside each other. 

It turns out that a pretty substantial amount of data on the web is really just a series of nested lists and dictionaries. This is actually the basis of an extremely common file type called __'json'__ (JavaScript Object Notation). When Twitter, Facebook or Reddit sends data to the client it is typically formatted as json. So this means that if you understand how lists and dictionaries can be used together you are well on your way to discovering how to capture and wrangle data from the web. 

See the example for a dictionary nested in a list nested in a dictionary.

Here's the motivation. I have three coworkers and I have to order food for them every day. One order for lunch and one for dinner. We will create a dictionary with two items, a lunch order and a dinner order. Each order will have a list of coworkers. Each coworker will have an order with custom toppings.


In [95]:
food_order = {
    'lunch':[ 
            { 'name':'Mario',
              'order':{
                        'Main':'rice and beans',
                        'Side':'salad',
                        'Drink':'orange juice'
                      }
            },
            { 'name':'Koopa',
              'order':{
                        'Main':'pancakes',
                        'Side':'honey',
                        'Drink':'coffee'
                      }
            }      
      ],
    'dinner':[
             { 'name':'Mario',
               'order':{
                        'Main':'burger',
                        'Side':'onion rings',
                        'Drink':'beer'
                      }
             },
             { 'name':'Bowser',
               'order':{
                        'Main':'pizza',
                        'Side': None,
                        'Drink':'cola'
                      }
             }      
            ]
} 

In [108]:
# print(food_order.keys())
x = food_order['lunch']

print(x[0]['order']['Main'])

rice and beans


In [None]:
print()
print(food_order['lunch'][1]['name'])
print()
print(food_order['lunch'][1]['order']['Main'])
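
Since json was mentioned as the motivation for nesting, here is a sketch of the round trip between nested python objects and json text. It assumes the standard library's ```json``` module, and the small ```order``` dictionary is a made-up, cut-down version of the structure above.

~~~ python
import json

# A small made-up order in the same shape as the nested example above.
order = {
    "lunch": [
        {"name": "Mario", "order": {"Main": "rice and beans", "Side": None}}
    ]
}

# dumps() turns nested dictionaries and lists into a JSON string.
as_text = json.dumps(order, indent=2)
print(as_text)

# loads() turns a JSON string back into dictionaries and lists.
restored = json.loads(as_text)
print(restored["lunch"][0]["order"]["Main"])

# Python's None is written as null in JSON and comes back as None.
print(restored == order)
~~~

Once the text has been loaded, you index into it exactly as in the cells above: a key for each dictionary and a number for each list.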