#  Python for Sociologists: Lecture One

Welcome! This is the first of four lectures in the module. For introductory information outlining the class, please see the README.md file in the [GitHub repository](https://github.com/crahal/Teaching/tree/master/PythonForSociologists). In this introductory lecture we will cover primitive data types (characters and numbers) and introduce various types of object. Without getting ahead of ourselves, we can note at this early stage that in Python everything is an object, and that Python is an object-oriented language (although not a 'pure' one).

## Section 1. Primitive Data Types

Before we get to objects, which are the abstract building blocks of data, let's first introduce some primitive data types:

* Characters
* Numbers

### 1.1 Characters

A character is a single glyph that is included in a character set. Some characters are visible, such as the letter A (or 'a' - note case sensitivity) and some are invisible, such as the space in between two words. The latter is called a whitespace character. There are three important whitespace characters:

1. Space,
2. Tab (which is usually about four spaces long - more in Lecture Two when we talk about indenting),
3. Newline character (which tells the computer to move to the next line)

Let's look at some examples, and introduce the print function:

In [1]:
print('hello friends! how are you today?')
print('hello friends! \nhow are you today?')
print('hello friends! \thow are you today?')

hello friends! how are you today?
hello friends! 
how are you today?
hello friends! 	how are you today?


### 1.2 Numbers 

Numbers come in several _types_, but two are important to mention:
* Integers - whole numbers, such as 1 or 42,
* Floating point numbers - these allow for decimal points such as 2.17 or 3.33

Python converts between the two when needed: in Python 3, dividing two integers with / always returns a float, even when the result is a whole number.

In [2]:
12 / 5

2.4

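Floats can be distinguished from integers because they are written with a decimal point, even when the fractional part is zero (3.0 is a float, 3 is an integer). We can convert a number to either an integer or a float using int() and float(). Note that int() discards the fractional part rather than rounding: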

In [3]:
int(12/5)

2

In [4]:
type(3)

int

In [5]:
type(3.0)

float
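
The conversions above can be sketched in one place. This is a minimal illustration, and the variable names are made up for the example rather than taken from the lecture:

```python
# Converting between integers and floats (illustrative variable names)
quotient = 12 / 5               # true division always gives a float: 2.4
truncated = int(quotient)       # int() drops the fractional part: 2
negative_truncated = int(-2.4)  # truncation goes toward zero: -2
rounded = round(2.6)            # round() goes to the nearest whole number: 3
floored = 12 // 5               # floor division keeps integers as integers: 2
as_float = float(3)             # float() adds a fractional part: 3.0

print(quotient, truncated, negative_truncated, rounded, floored, as_float)
```

If you want the nearest whole number rather than the whole-number part, round() is usually the safer choice than int().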

## Section 2. Strings

A character isn't very useful on its own. Multiple characters together form a string. Strings are enclosed in quotation marks, and you can use several kinds (single, double or triple quotes). Two things to remember:

1. Always close the string with the same quotes used to open it,
2. Always escape a quotation character if you use it inside the string.

```python
'This is a string.'

"This is also a string."

'''This is yet another string!
It can span
several lines.'''

''This is not a string, but why?''
```

In Python 3.*, print is a function, so whatever you want to print goes inside parentheses, like the following:
```python
print("This is a string!")
```

How does this differ from Python 2.*? 

In [6]:
# Now let's assign a string to a variable, then print the variable
somevar = "This is a string. It has been assigned to the variable, somevar"
print(somevar)

This is a string. It has been assigned to the variable, somevar


A variable is a name given to an object; the same name can later be reassigned to a different object.
There are variable naming conventions:
* ALLCAPS means a variable that we intend to keep constant, like a secret key.
* A leading underscore (```_name```) or double underscore (```__name```) means a variable that is hidden and shouldn't be referenced directly.
* Variable names must start with a letter or an underscore, never a digit; sticking to ASCII characters is safest.
* Use a consistent style, such as camelCaseNames or underscore_variable_names.
* alllowercasenounderscorenames are hard to read.

If we want to join a float or int onto a string with + before printing it, we have to convert it to a string first, as shown below:

In [7]:
randomnumber=2834
print('When printing a number, we need to convert it into a string like this: '+str(randomnumber))

When printing a number, we need to convert it into a string like this: 2834
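
Converting with str() is only needed when gluing a number onto a string with +. As a hedged sketch of two common alternatives (the variable ```message``` is just an illustrative name), print() accepts several arguments, and f-strings (available from Python 3.6) insert values directly:

```python
randomnumber = 2834

print(randomnumber)                          # a number on its own prints fine
print('The number is', randomnumber)         # print() accepts several arguments

message = f'The number is {randomnumber}'    # f-string: the value is converted for us
print(message)
```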


### 2.1 📝 Character Sets 📝

#### 2.1.1. ASCII

Strings are drawn from character sets. Loosely, 'the alphabet' is a character set, but not a very useful one, because it's so limited. The basic Western character set is ASCII. It has 128 code points. The first 32 are control characters, like 'new line', and most of the remainder are the upper and lower case alphabet, ten digits and punctuation characters. ASCII is not really sufficient for most languages or most of our data intensive purposes.
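
To make 'code point' concrete, here is a small sketch using the built-in functions ord() and chr(), which convert between a character and its number. The variable names are illustrative only:

```python
# Each character corresponds to a numbered code point
code_upper_a = ord('A')      # 65
code_lower_a = ord('a')      # 97: upper and lower case are different characters
letter = chr(66)             # 'B'
code_newline = ord('\n')     # 10: one of the control characters

# A string is 'plain ASCII' if every code point is below 128
is_ascii = all(ord(ch) < 128 for ch in 'hello friends')

print(code_upper_a, code_lower_a, letter, code_newline, is_ascii)
```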

#### 2.1.2. UTF-8

Unicode is meant to be a very large character set. It can code for over a million code points. As such, Unicode includes most characters from most languages around the world, as well as the emergent emoji character set. UTF-8 is the most common way of encoding those code points as bytes. Python 2 allows you to use Unicode if you are masochistic. Python 3 makes it pretty straightforward.
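
The difference between characters and bytes can be sketched as follows. The example word is an arbitrary choice; the point is that a character outside ASCII takes more than one byte in UTF-8:

```python
word = 'caf\u00e9'                   # 'café', written with an escape for the é
encoded = word.encode('utf-8')       # turn the string into bytes
decoded = encoded.decode('utf-8')    # and back into a string

print(len(word))       # 4 characters
print(len(encoded))    # 5 bytes, because é needs two bytes in UTF-8
print(decoded == word) # True: nothing is lost in the round trip
```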

#### 2.1.3. Emoji

Emoji is an emerging Unicode standard for pictograms. You cannot rely on every computer displaying emoji, or the Apple skin tone emoji. If you want to reference the emoji in Python, you'll need to know the Unicode code points. You can print emoji, but not really do much else with it. This is because some emoji are actually made up of two or more code points. I mention emoji only to highlight that you don't need to strip your text right down to ASCII to work with it anymore, and this opens up new research questions. It's also great for languages with very large character sets such as Chinese, which are now far easier to deal with in Python 3.

In [8]:
print('\U0001f334') # This is the emoji code point. 
print(b'\U0001f334') # This is what happens when you print it as a 'bytestring'
print('🌴') # You can print emoji directly
print('🌴' in 'Yeah, great job! 🌴')

🌴
b'\\U0001f334'
🌴
True
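
As a rough illustration of why emoji can be awkward to work with, the sketch below compares a single code point emoji with a flag, which is built from two 'regional indicator' code points. The variable names are illustrative:

```python
palm = '\U0001f334'              # palm tree: one code point
flag = '\U0001F1EC\U0001F1E7'    # UK flag: regional indicators G + B

print(len(palm))        # 1
print(len(flag))        # 2, even though it displays as one picture
print(hex(ord(palm)))   # 0x1f334, the code point used above
```

So len() counts code points, not the pictures you see on screen.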


### 2.2 String manipulation

Strings are indexed starting from 0 and work sequentially forward. So in the string "python is the best", there is a 'p' as the 0th element and an 's' as the 8th:
```python
variable = "python is the best"
print(variable[0],variable[8])
```
This is because a string behaves like a sequence of characters (as in a series of characters that one would string together). Can you print out the 'b' from this variable?

Standard string methods:

* upper: change to upper case
* lower: change to lower case
* title: capitalize the first letter of every word
* capitalize: capitalize only the first letter of the string (note the z)
* find: return index of first instance of input
* isalnum: is this string alphanumeric?
* isalpha: is this string just letters?
* replace: find all instances of something and change to something else
* strip: remove whitespace characters from the start and end of a string (useful when reading in from a file)

A method is 'attached' to an object. The period [.] is used to link the object to the method. So if we have a string object ```"This is an object"``` and we attach the 'upper' method like so: ```"This is an object".upper()```, we get back a copy of the string in upper case. Try it below using ```somevar``` from above:

In [9]:
print(somevar)
print(somevar.upper())
print(somevar.lower())
print(somevar.title())
print(somevar.find('i'))
print(somevar.isalnum())
print(somevar.isalpha())
print(somevar.replace(' is ',' is not ').replace('a string', 'a banana')) #we can 'chain' methods together
print(somevar.strip(' '))

This is a string. It has been assigned to the variable, somevar
THIS IS A STRING. IT HAS BEEN ASSIGNED TO THE VARIABLE, SOMEVAR
this is a string. it has been assigned to the variable, somevar
This Is A String. It Has Been Assigned To The Variable, Somevar
2
False
False
This is not a banana. It has been assigned to the variable, somevar
This is a string. It has been assigned to the variable, somevar
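
Two of these methods are easy to confuse, so here is a small sketch contrasting title with capitalize, and showing what strip removes. The string ```phrase``` is an invented example:

```python
phrase = '  python for sociologists \n'   # note the stray spaces and newline

stripped = phrase.strip()                 # whitespace removed from both ends only
titled = stripped.title()                 # 'Python For Sociologists'
capitalized = stripped.capitalize()       # 'Python for sociologists'

print(stripped)
print(titled)
print(capitalized)
print(phrase == '  python for sociologists \n')   # True: the original is unchanged
```

String methods return a new string; they never modify the original, so assign the result to a variable if you want to keep it.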


We can also get help on specific methods using a syntax such as ```help(somevar.title)```. We can also get a list of all methods associated with an object using ```dir(object)```. To determine the *type* of an object, we can use ```type(object)```, and to get detailed help on any object or method, ```help(object)```. Let's try a couple out:

In [10]:
type(somevar)

str

In [11]:
dir(somevar)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [12]:
type(5)

int

In [13]:
type(5.0)

float

### 2.3 Special Characters

What if we need to print a quotation character inside a string that uses quotes? Introducing the escape character! The escape character is the backslash: in order to print a quotation rather than use it to end the string, you would type:
```python
"Escaping a \" in a string"
```
However, sometimes you can sidestep this by using a different quotation type within the string itself:
```python
"This will 'work'."
'This will also "work".'
'''This will work for both " and ' types.'''
```
The triple quote is used for block quotes, where you can just keep writing across lines. Let's try it out, and escape the escape character also (with a newline thrown in for good measure):

In [14]:
print('This will also "work".')
print("If you haven't inserted \\ characters\nThis will be \"totally\" broken")

This will also "work".
If you haven't inserted \ characters
This will be "totally" broken


### 2.4 Combining strings

Imagine you have two words that you wish to add together, such as 'Data' and 'Science'. There are several ways to do this.

#### 2.4.1. Concatenation

In Python the + symbol means concatenate when it appears between two strings. This is the simplest way to combine two strings. Here are some ways to concatenate. Try them out, but don't forget the whitespace if necessary:
```python
var1 = "Cheese"
var2 = "burger"
print(var1 + var2)
print("Cheese" + "burger")
var3 = var1 + var2
print(var3)
```
Note that the + symbol is used for both addition *and* concatenation. So be careful: if you mix strings and numbers, Python will throw an error (try ```print(1 + '2')```).

To make a number into a string, you can use the string function (```str()```):
```python
num = 123
strNum = str(num)
```
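
The error mentioned above, and the two ways around it, can be sketched like this. The 'Room' label and the variable names are invented for the example, and the error is caught with try/except (covered properly later) only so that the cell keeps running:

```python
num = 123

try:
    result = 'Room ' + num           # mixing a string and a number with +
except TypeError as error:
    result = None
    print('Python refused:', error)

fixed = 'Room ' + str(num)           # convert the number to a string to concatenate
total = int('2') + 1                 # or convert the string to a number to add

print(fixed, total)
```
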
#### 2.4.2. Insertion

Sometimes you want to insert something in the middle of a statement but don't want to merely concatenate. Maybe you have a collection of things and want to insert them in a lot of places. The bonus is that you can also print digits really nicely this way.

```python
print("Pi to two decimal places is %1.2f. Isn't that convenient?" % 3.14159)
```
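
The % operator is the oldest insertion style. As a hedged sketch, the same result can be had with the format method or with an f-string (Python 3.6+); the variable names here are illustrative:

```python
pi_estimate = 3.14159

old_style = "Pi is roughly %1.2f" % pi_estimate            # % insertion, as above
format_style = "Pi is roughly {:.2f}".format(pi_estimate)  # str.format
f_style = f"Pi is roughly {pi_estimate:.2f}"               # f-string

print(old_style)
print(format_style)
print(f_style)
```

All three produce the same text; which one you use is largely a matter of taste, but pick one and stay consistent.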

#### 2.4.3. Joining

Sometimes you want to join strings together with a specific separator:

```python
newStr = ";".join(["I want to","join this together"])
```

More commonly, you want to join a list of words on whitespace to make a sentence: ```' '.join(list)```.
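
A short sketch of joining in practice, using invented lists. Note that join only works on strings, so numbers have to be converted first:

```python
words = ['Python', 'for', 'Sociologists']
sentence = ' '.join(words)                           # join on whitespace

header = ','.join(['id', 'age', 'income'])           # join on commas, e.g. for a CSV row

numbers = [1, 2, 3]
joined_numbers = '-'.join(str(n) for n in numbers)   # convert each number first

print(sentence)
print(header)
print(joined_numbers)
```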


#### 2.4.4. Splitting

If you can join strings together, you can also split them! This is crucial for data cleaning. The default way to split the data is using the whitespace character, but we can also split on specific substrings:

```python
oldStr = "Let's split this into chunks"
newList = oldStr.split('this')
```
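
To see what the two kinds of split return, here is a minimal sketch built on the same sentence. The names ```by_whitespace```, ```by_substring``` and ```rejoined``` are illustrative:

```python
oldStr = "Let's split this into chunks"

by_whitespace = oldStr.split()        # default: split wherever there is whitespace
by_substring = oldStr.split('this')   # split on a specific substring instead

print(by_whitespace)   # ["Let's", 'split', 'this', 'into', 'chunks']
print(by_substring)    # ["Let's split ", ' into chunks']

rejoined = 'this'.join(by_substring)  # joining on the same substring undoes the split
print(rejoined == oldStr)             # True
```

The substring you split on is removed from the result, while any whitespace around it is kept.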

## Section 3. Collections

Virtually every programming language has a notion of a collection. A collection is a means for referring to one or more things at the same time, and Python has many collection types (note: a string can be thought of as a joined up list of characters). In general, collections are iterable, which means that you can ask for each item in the collection one-by-one. But beyond that they vary quite dramatically. Here are the major collection *types* that you will come across in Python:

### 3.1. Lists

A list is a sequential (the order is relevant), zero-indexed (first item is indexed at 0) and mutable (you can add or delete elements) collection signified by ```[... , ...]```. Let's make and play around with a list:

In [15]:
mylistoffavouritefruits = ["Strawberries","Blackberries","Blueberries"]
print('My first and third favourite fruits are ' + mylistoffavouritefruits[0] + ' and ' + mylistoffavouritefruits[2])

mylistoffavouritefruits.append("Gooseberries")
print(mylistoffavouritefruits)

My first and third favourite fruits are Strawberries and Blueberries
['Strawberries', 'Blackberries', 'Blueberries', 'Gooseberries']


### 3.2. Tuples

A tuple is a sequential, zero-indexed and *immutable* collection signified by ```(... , ...)```: in effect, a list you can't change, denoted by parentheses rather than square brackets. Tuples are used in lots of places where you don't want a collection to change, or where you want operations to be a little faster than with a list:

```python
mytupleoffavouritefruits = ("Strawberries","Blackberries","Blueberries")
```
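
What 'immutable' means in practice can be sketched as follows. The try/except is there only to catch the error so that the cell keeps running, and ```extended``` is an illustrative name:

```python
mytupleoffavouritefruits = ("Strawberries", "Blackberries", "Blueberries")

try:
    mytupleoffavouritefruits[0] = "Gooseberries"   # attempt to change an element
    changed = True
except TypeError as error:
    changed = False
    print('Tuples cannot be modified:', error)

# We can still build a *new* tuple from an old one
extended = mytupleoffavouritefruits + ("Gooseberries",)
print(extended)
```

Note the trailing comma in ```("Gooseberries",)```: without it, Python reads the parentheses as ordinary brackets around a string.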

#### 3.2.1. Querying and Slicing Lists/Tuples

You can index a list just like a string, and just like strings, you can ask for a range of values (a 'slice') using a colon. A slice that runs past the end simply stops at the last item, whereas asking for a single index that is out of range raises an error:

```python
mylistoffavouritefruits[0:2]
```

Note here that the return is a list. If we want a specific string, we can index the new list:

```python
mylistoffavouritefruits[0:2][0]
```

You can also index from the end of the list/tuple/string to walk backwards. This is done with negative numbers:

```python
mylistoffavouritefruits[-1]
```
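
The indexing and slicing rules above can be collected into one sketch. The list is a shortened copy of the one used earlier, and the try/except is only there to catch the out-of-range error:

```python
fruits = ["Strawberries", "Blackberries", "Blueberries"]

first_two = fruits[0:2]     # items 0 and 1; the end of a slice is not included
last_one = fruits[-1]       # negative numbers count back from the end
long_slice = fruits[1:10]   # a slice past the end just stops at the last item

try:
    fruits[10]              # a single index past the end is an error
    index_ok = True
except IndexError:
    index_ok = False

print(first_two, last_one, long_slice, index_ok)
```

The same syntax works on tuples and strings, for example ```"python"[0:2]```.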

### 3.3 Dictionaries

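A dictionary is a key-indexed and mutable collection signified by ```{... : ... , ... : ... }```. Since Python 3.7 a dictionary remembers the order in which its keys were inserted, but you look items up by key rather than by position. Like in English, where a dictionary defines a word, a dictionary in Python uses a key to fetch a value: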

```python
FruitScoresDict = {"Apples":"7/10","Bananas":"8/10", "Strawberries":"10/10"}
```

In [16]:
FruitScoresDict = {"Apples":"7/10","Bananas":"8/10", "Strawberries":"10/10"}
FruitScoresDict['Blueberries']="9/10" #add a new key:value pair 'on the fly'

print(FruitScoresDict.keys())
print(FruitScoresDict.values())
print(FruitScoresDict.items())
print(FruitScoresDict['Apples'])

dict_keys(['Apples', 'Bananas', 'Strawberries', 'Blueberries'])
dict_values(['7/10', '8/10', '10/10', '9/10'])
dict_items([('Apples', '7/10'), ('Bananas', '8/10'), ('Strawberries', '10/10'), ('Blueberries', '9/10')])
7/10
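
A few more everyday dictionary operations, sketched with the same fruit scores. The key "Kiwis" and the default value "not rated" are invented for the example:

```python
FruitScoresDict = {"Apples": "7/10", "Bananas": "8/10", "Strawberries": "10/10"}

# Loop over key:value pairs
for fruit, score in FruitScoresDict.items():
    print(fruit, 'scores', score)

has_apples = "Apples" in FruitScoresDict              # test whether a key exists
missing = FruitScoresDict.get("Kiwis", "not rated")   # .get avoids an error for missing keys
number_of_fruits = len(FruitScoresDict)               # number of key:value pairs

print(has_apples, missing, number_of_fruits)
```

Asking for ```FruitScoresDict["Kiwis"]``` directly would raise a KeyError, which is why ```.get``` with a default is handy when cleaning data.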


## Optional Homework:

What is a set? Is it ordered, mutable, etc? What benefits does it provide over a list, tuple or dictionary? What about if there are duplicates in the set? Sets will also feature at the start of (non-optional) Homework Two...

## Non-Optional Homework!

Be sure to submit the Week One Homework Questions via email by 5pm next Tuesday!