# NCRM April 2024 Intro to Python for data analysis
### Session 2: The Basics

![image.png](attachment:image.png)

## Lewys Brace
#### l.brace@exeter.ac.uk

## Learning Python

Python is designed to maximise human readability. This means that Python syntax is designed to resemble human language as much as possible and that Python's error messages are designed to be much less cryptic than those of other programming languages.

This means that you might look at the code we use in class and feel like you understand it. However, you will not understand it until you start working with the code yourself. For this reason, it is crucial that you resist the urge to copy and paste all of your code without working out for yourself what it actually does.

## Basic data types and expressions

Every value in Python has a single data type.

The key data types to know for the time being are:
- Integers (e.g. 5, 20, 68)
- Floats (e.g. 5.2, 20.4, 68.9)
- Strings (e.g. 'Best class ever', 'Dogs are better than cats')
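
As a quick illustration of the three data types listed above, the sketch below uses Python's built-in **type()** function to check the type of one example value of each kind. The variable names are made up for this example only.

```python
# One example value for each of the three data types
int_example = 5
float_example = 5.2
string_example = "Best class ever"

# type() reports the data type of whatever is passed to it
print(type(int_example))     # <class 'int'>
print(type(float_example))   # <class 'float'>
print(type(string_example))  # <class 'str'>
```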


## Strings
Strings are sequences of characters and are the data type used for text.

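Strings are wrapped in single or double quotes.

While single and double quote strings work in the same way, it is crucial to be consistent in your usage of the two quote types.

It is also important to note that you can't use a quote character inside a string that is wrapped in the same type of quote (unless you escape it with a backslash), because Python treats it as the end of the string. You can, however, use one type of quote inside a string wrapped in the other type.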

Thus, when working with text, it is best to use double quotes, as you can then include apostrophes within your strings.

In [None]:
'The apple is red'
"That's a dog."
'That's a dog
'My friend said "that is fine"'

## Operators
Python uses the usual arithmetic operators:
- Addition: +
- Subtraction: -
- Multiplication: *
- Division: /
- Exponentiation: **

Others include floor division (//) and modulus/remainder (%).
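
To make floor division and modulus concrete, here is a minimal sketch that compares them with ordinary division. The variable names are illustrative only.

```python
floor_result = 7 // 2   # floor division: divide, then round down to a whole number
remainder = 7 % 2       # modulus: the remainder left over after the division
true_division = 7 / 2   # ordinary division always returns a float

print(floor_result)   # 3
print(remainder)      # 1
print(true_division)  # 3.5
```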

## Expressions
An expression is two or more values joined by an operator.

Using Python as a calculator allows us to look at both of these concepts:

In [None]:
2 + 2

In [None]:
10 - 5

In [None]:
4 * 2

In [None]:
8/2

In [None]:
2**6

Some of these operators can be used on strings, although they mean different things.

Using the addition operator on two strings, for example, performs string concatenation.

Below, we use the **print()** function. We'll cover functions in more detail later, but for the time being, you just need to know that **print()** will print whatever is contained within the parentheses to the screen.

Because Jupyter is interactive, we don't actually need to use **print()** within Jupyter Notebooks, but you will need to use this function if you're using code in a different IDE or producing a script in a text editor to run in the terminal, so we'll use it here to get you in the habit of doing so.

In [None]:
print("Dogs are better than cats" + "but both are better than fish")

Note that Python will **not** automatically insert a whitespace between two strings when you concatenate them, so you will have to do this manually:

In [None]:
print("Dogs are better than cats" + " " + "but both are better than fish")

You can also insert other characters during string concatenation if you want:

In [None]:
print("Dogs are better than cats" + ", " + "but both are better than fish")

Python can determine which way to use + based on context. However, it will throw an error message if you try to use it on two incompatible types:

In [None]:
42 + 'I want a burger'

## Error messages in Python
Because Python is designed for maximum readability, Python error messages are much easier to read and interpret than those of other programming languages.

However, you do still have to make sure that you read them properly.

The 'Traceback' aspect to Python error messages will help you do this.

In the error message above, it shows us exactly which line is causing the issue.

The bottom line of the error message then provides some detail as to what the error is. Here, we see it is a 'TypeError'.

It is **crucial** that you read and learn to understand error messages as these will not only aid you in fixing any issues, but will also help you develop your skills as a programmer.
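
Once the error message has told you that the problem is a mismatch of types, one possible fix is to convert the integer to a string with the built-in **str()** function before concatenating. The sketch below assumes you want the number to appear as text; the variable names and the wording of the string are made up for illustration.

```python
number = 42

# str() converts the integer to a string, so + now performs concatenation
message = str(number) + " burgers, please"
print(message)  # 42 burgers, please
```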

## The increment operator

Python has a common idiom that is not necessary, but which is frequently used and useful:

x += 1 is the same as x = x + 1

This also works for other operators:

	x += y 		 adds y to the value of x 
	x *= y 		 multiplies x by the value y 
	x -= y 		 subtracts y from x 
	x /= y 		 divides x by y
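
The shorthand operators above can be tried out in sequence. This is a small sketch with an arbitrary starting value; the comments show the value of x after each line.

```python
x = 10
x += 5   # same as x = x + 5, so x is now 15
x -= 3   # same as x = x - 3, so x is now 12
x *= 2   # same as x = x * 2, so x is now 24
x /= 4   # same as x = x / 4, so x is now 6.0 (division returns a float)
print(x)  # 6.0
```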

## Exercise 1
Open up the Jupyter Notebook with exercises for this session and work through the first exercise.


## Variables and assignment
We can store data in variables by assigning them to variable names using the = operator.

Python is dynamically typed, meaning that the type of the variable is derived from the value it is assigned.

Variable names in Python can contain alphanumeric characters and underscores. This means we can call variables anything we want, provided:

1. We only use numbers, letters, and _
2. We do not start the variable name with a number
3. We do not use any of the special words that are reserved for Python itself (e.g. class, and, continue). Such keywords are reserved and cannot be used as variable names because they serve a built-in purpose in the Python language. Your IDE will usually let you know if you try to use one of these.
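
If you are unsure whether a name follows the rules above, Python can check for you. The sketch below uses the standard library's **keyword** module and the string method **.isidentifier()**; the example names are made up. Note that **.isidentifier()** only checks the characters used, so a reserved word still needs the separate keyword check.

```python
import keyword

# Is the word reserved by Python?
print(keyword.iskeyword("class"))      # True
print(keyword.iskeyword("last_name"))  # False

# Does the name use only permitted characters and not start with a number?
print("last_name".isidentifier())  # True
print("2nd_name".isidentifier())   # False

# A reserved word passes the character check, so test for keywords as well
print("class".isidentifier())      # True
```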

By convention, it is common to have variable names that start with lower case letters and have class names beginning with a capital letter.

It is also good practice to use descriptive names for your variables to make your code more readable and easier for you (and others) to understand later on. For example, the first line below is a good variable name for someone's last name; the second line is a bad example.

In [None]:
last_name = "Brace"
n = "Brace"
print(n)

You can think of a variable as a labelled container that stores specific information. Below, for example, the container has a 'label' called "int_variable" and stores an integer value of 5:

In [None]:
int_variable = 5

Once you have created a variable, you can then use it in expressions:

In [None]:
print(int_variable + int_variable)

In [None]:
city = "London"
country = "England"
print(city + ", " + country)

We can also combine variables into a single expression:

In [None]:
sentence =  city + " is in " + country
print(sentence)

You can always check the type of a variable using the **type()** function:

In [None]:
print(type(city))

The **len()** function is also a really useful way to get how many elements are in a sequence.

In the example below, we use it to get how many letters are in the sentence string variable (remember, this includes whitespace as whitespace is counted as a character):

In [None]:
print(len(sentence))

## Comments
When coding, you might want to leave comments to your future self/collaborators explaining what lines of code do. If you just write your comment in your Python script, as we do in the last line of the code cell below, Python will give you an error message.

Equally, you might want to temporarily not run a line(s) of code.

To do both these things, we can use the **#** character to "comment out" code. In Python, any line that begins with **#** will not be run, and anything that follows a **#** on a line of code is ignored. For example:

In [None]:
#Lines of code to show how the comment works in python
print("This line runs")
#print("This line does not run")
print("But this third line runs")
This line will give you an error message and is designed to show you why you need to use a # to comment out code

## Exercise 2
Do the second exercise in the Jupyter notebook containing the exercises for this session.

## Classes, objects and methods

Python is an object-orientated language. This means it performs computations by using 'objects'.

In the example below, we have a class (dog) and two specific instances of that class: Indiana Bones and Droolius Caesar. All instances of a class share features (paws, a love of tennis balls, etc.), but because Indiana and Droolius are specific instances of a class, we draw a distinction between them by assigning them a specific name.


![image.png](attachment:image.png)

In Python, almost everything is an object. The sentence object above is a specific instance of the string class.

Objects are capable of a variety of actions that they share with other instances of the same class; chasing a ball in the case of our dog class.

There are several ways to learn about a particular class and its methods. Going back to our sentence object as an example, we can usually check the online documentation (or just Google the object), use Jupyter's ? feature (i.e. sentence?), or use Python's **dir()** function (i.e. **dir(sentence)**).
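
As a rough sketch of the **dir()** approach, the code below stores the names that **dir()** returns for a string and checks whether two particular method names are among them. The value given to sentence and the method_names variable are for illustration only.

```python
sentence = "London is in England"

# dir() returns the names of the attributes and methods available on an object
method_names = dir(sentence)

print(len(method_names))        # how many names are available on a string
print("upper" in method_names)  # True
print("split" in method_names)  # True
```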

As computational social scientists/social data scientists, we'll spend a lot of our time working with text data, so exploring the relationship between objects and methods by using the string class is useful for us.

The Python string class has numerous methods for doing things such as switching case, removing white space, etc.

Any time we use a method, we provide the name of the object followed by a . and then the name of the method. For example:


In [None]:
city = "london"
print(city.upper()) #Convert whole string to uppercase
print(city.lower()) #Convert whole string to lowercase
print(city.title()) #Capitalise the first letter of each word in the string, as it is a name

If we print the city variable again, we'll see that these methods do not actually alter the string itself:

In [None]:
print(city)

This is because strings are immutable: string methods do not change the original string, but instead return a new string.

If we want to replace an instance with a new instance, we need to overwrite it:

In [None]:
print(city)
city = city.title()
print(city)

Python has a load of other methods for strings (as well as other data types!)

For example, we can use the .replace() method to replace a part of a string with another string:

In [None]:
print(city)
city = city.replace("Lon", "What")
print(city)

It is not possible to cover all of Python's in-built methods for all data types in this course.

Indeed, it is recommended that you don't spend large amounts of time trying to learn them all. Instead, as with many aspects of coding, you will learn the specific methods that you need to know to do the kind of tasks that you do in Python.

That said, given the significant role of text data in computational social science, there are some other string methods that are worth looking at here.


## Joining and splitting strings

When working with text data, you are likely to need to manipulate strings in certain ways. One useful tool for this is the .split() method.

By default, the .split() method separates a string on white space and returns each of the separated elements in a data container called a 'list' (more on those later in the course):

In [None]:
my_string = "Dogs are better than cats"
my_split_string = my_string.split()
print(my_string)
print(my_split_string)

We can also provide a more specific way of splitting a string by providing an argument into the .split() method.

An argument is an extra piece of information that is provided to a method that dictates how the method is executed.

Below, we tell the .split() method to split the string on the word "better":

In [None]:
my_split_string_2 = my_string.split("better")
print(my_string)
print(my_split_string_2)

We can also join a list of strings back into a single string using the .join() method.

To use this method, we first provide the separator that we want the method to place between the string elements in our list, and then provide the list we want to convert back into a single string.

In [None]:
print(my_split_string)
joined_1 = " ".join(my_split_string)
joined_2 = "-".join(my_split_string)
print(joined_1)
print(joined_2)

## Removing white space

Text data, such as that you're likely to work with a lot while doing computational social science, often contains a lot of white space, since it's natural language we're working with.

We'll look at how to deal with white space between words in a string later in the course.

For the time being, it is worth noting that text in your data might sometimes have white space before the first word or after the last word. We can remove this with the following methods:

In [None]:
my_string = "    This string has extra white spaces either side    "
removed_left = my_string.lstrip() #Removes white space to the left of the first word in the string
print(removed_left)
removed_right = my_string.rstrip() #Removes white space to the right of the last word in the string
print(removed_right)
removed_both = my_string.strip() #Removes white space on both sides of the string
print(removed_both)

A string can contain no characters, but still be a valid string:

In [None]:
my_string = ""
print(my_string)

Or be as long as your computer's memory allows.

This means that sometimes, you'll want to create and manipulate long strings; e.g. an entire text corpus of an online forum. In such cases, you can preserve the layout of a large text by using the newline character: \n

By putting the \n in every place within the string where there should be a line break, you can maintain the structure of your text.

In [None]:
multi_line_string = "This is my multi-line string.\nIt shows how easy it is to work with Python.\nWe're learning a lot in this class"
print(multi_line_string)

However, this can get very messy, very quickly. 

Luckily, Python has a built-in syntax for representing multiline strings: ''' or """

In [None]:
multi_line_string = """This is my multi-line string.
It shows how easy it is to work with Python.
We're learning a lot in this class"""

print(multi_line_string)

## Exercise 3
Do the third exercise in the Jupyter notebook containing the exercises for this session.

## Conditionals
Conditionals essentially say:
    
> "If this condition(s) is met, do this"

There are three main conditional statements in Python: **if**, **elif**, and **else**. In Python, a conditional consists of a conditional keyword (e.g. **if** or **elif**), an expression that can be evaluated as either being true or false, and a colon (**:**), followed by an indented piece of code on the lines below that states what action is to be taken if the expression is true. The **else** keyword takes no expression of its own.

In the code below, "**if**" is our conditional statement, and whether or not int_variable is greater than 4 is our expression. If it is greater than 4, the string "Condition is true" is printed to the screen. If it is not, nothing is done.

In [None]:
int_variable = 5
if int_variable > 4:
    print("Condition is true")
    
int_variable = 3
if int_variable > 4:
    print("Condition is true")

There is also the **else** conditional. This basically states that if the first statement is not true, then Python should do something else instead.

The code below has a variable that states whether or not I have to go to work tomorrow. This is a boolean variable; i.e. it is either true or false. The **if** then says "If I have work tomorrow, then print that I cannot have beer tonight". The **else** statement then says "If the **if** statement is not true, print that I can have beer tonight":

In [None]:
work_night = True
if work_night == True:
    print("No beer")
else:
    print("You may have beer")

By switching the value of the boolean variable to False, we get a happier result:

In [None]:
work_night = False
if work_night == True:
    print("No beer")
else:
    print("You may have beer")
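
Because work_night is already a boolean, the comparison with True is not strictly needed: writing **if work_night:** behaves in the same way. The sketch below assumes the same variable as the cell above and stores the result in a message variable, which is introduced here only for illustration.

```python
work_night = False

# A boolean can be used directly as the condition
if work_night:
    message = "No beer"
else:
    message = "You may have beer"

print(message)  # You may have beer
```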

So far, we have looked at conditions with only two options; if the condition is true, do one thing, if not, do another.

However, we can also use **elif** to implement an else-if statement in order to develop code that executes on different conditions.

It should be noted, however, that using **elif** is equivalent to nesting an **if** statement within an **else** statement, meaning it will only run if the previous condition was determined to be false.

In [None]:
variable_1 = "Test string"
variable_2 = 5
if type(variable_1) == int:
    print("variable 1 is an integer type")
elif variable_2 > 4:
    print("The elif part of this code was activated")
else:
    print("Variable_2 is not greater than 4")
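
To illustrate the point that **elif** is equivalent to an **if** nested within an **else**, the sketch below writes the same logic both ways and compares the outcomes. The result_elif and result_nested variables and the short strings stored in them are made up for this example.

```python
variable_1 = "Test string"
variable_2 = 5

# Version 1: using elif
if type(variable_1) == int:
    result_elif = "if branch"
elif variable_2 > 4:
    result_elif = "elif branch"
else:
    result_elif = "else branch"

# Version 2: the same logic with an if nested inside an else
if type(variable_1) == int:
    result_nested = "if branch"
else:
    if variable_2 > 4:
        result_nested = "elif branch"
    else:
        result_nested = "else branch"

print(result_elif)                   # elif branch
print(result_nested)                 # elif branch
print(result_elif == result_nested)  # True
```

The **elif** version is usually preferred because it avoids the extra level of indentation.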

## Comparison operators and control flow

In the examples above, we saw a series of operators being used to test conditions; i.e.:

> if variable_2 > 4

The **>** is an example of a comparison operator.

As we have seen above, we use these to tell Python to execute code depending on one or more conditions; we refer to this as control flow.

Control flow statements include a condition that can be evaluated as either true or false (these are also known as Boolean expressions), followed by a clause in the form of an indented block of code that executes depending on whether the condition is evaluated to be true or false.

In other words, if the condition is evaluated to be true, the indented code below will run. If the condition returns false, the indented code will not run.

In Python, the comparison operators are:

> Greater than: >

> Greater than or equal to: >=

> Less than: <

> Less than or equal to: <=

> Equal to: ==

> Does not equal: !=


In [None]:
int_var = 5
float_var = 3.2

if int_var > float_var:
    print("Int_var is larger than float_var")

if int_var == 5:
    print("Int_var is equal to 5")
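
A comparison does not have to sit inside an **if** statement. On its own, it is an expression that evaluates to either True or False, and the result can be stored in a variable like any other value. A minimal sketch, with variable names chosen for illustration:

```python
int_var = 5
float_var = 3.2

# Each comparison evaluates to a boolean value
is_larger = int_var > float_var
is_equal = int_var == float_var
is_different = int_var != float_var

print(is_larger)        # True
print(is_equal)         # False
print(is_different)     # True
print(type(is_larger))  # <class 'bool'>
```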

Do note the difference between the 'greater than' and 'greater than or equal to' operators, and their 'less than' counterparts:

In [None]:
int_var = 5
int_var2 = 5

if int_var > int_var2:
    print("Int_var is larger than int_var2")

if int_var >= int_var2:
    print("Int_var is greater than or equal to int_var2")

## Combining comparisons with connectives

Python has a number of connectives, such as:

> and

> or

> not

These connectives can be combined with comparison operators in a condition in order to exert more control over the code that is executed under specific conditions.

In the example below, Python will print the phrase "All conditions met 1" to the screen if the my_name_string variable is a string type and if "Lewys" is contained within the string. If either of these conditions is not met, as in the second and third examples, then the indented code will not run.

In [None]:
my_name_string = "My name is Lewys"
if type(my_name_string) == str and "Lewys" in my_name_string:
    print("All conditions met 1")

if type(my_name_string) == int and "Lewys" in my_name_string:
    print("All conditions met 2")

if type(my_name_string) == str and "Dave" in my_name_string:
    print("All conditions met 3")

Remember, when working with strings, Python looks for an exact match. That includes making sure that case and whitespace match up:

In [None]:
my_name_string = "My name is Lewys"
if type(my_name_string) == str and "lewys" in my_name_string:
    print("All conditions met 4")

if type(my_name_string) == str and "Lewys " in my_name_string:
    print("All conditions met 5")

if type(my_name_string) == str and " Lewys" in my_name_string:
    print("All conditions met 6")

The **or** connector allows us to run a piece of code if one of a number of conditions is true.

The code below is the same as the examples above, but notice how the **or** connective also allows the second example below to run its code because one of the two conditions is met:

In [None]:
my_name_string = "My name is Lewys"
if type(my_name_string) == str or "Lewys" in my_name_string:
    print("All conditions met 7")

if type(my_name_string) == str or "Dave " in my_name_string:
    print("All conditions met 8")
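
The **not** connective reverses a condition: it turns True into False and False into True. Here is a short sketch using the same string as above; the contains_dave and no_dave variables and the printed messages are made up for illustration.

```python
my_name_string = "My name is Lewys"

contains_dave = "Dave" in my_name_string  # False, as "Dave" is not in the string
no_dave = not contains_dave               # not flips False to True

if no_dave:
    print("Dave is not mentioned")

# When testing membership, the same check is more commonly written with "not in"
if "Dave" not in my_name_string:
    print("Dave is still not mentioned")
```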

In theory, you can combine together as many conditionals as you want using connectives.

However, if you use too many, your code will not only be hard to read, but also have a very complicated logic that would be hard to follow.

One way to keep such conditions readable is to wrap each group of comparisons in **()**:

In [None]:
my_name_string = "My name is Lewys"
if (len(my_name_string) > 5 and len(my_name_string) < 40) and (type(my_name_string) == str):
    print("All conditions met 9")
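
Python also allows comparisons to be chained, which can shorten a range check like the one above. The sketch below stores the length in a variable and shows that the chained form gives the same answer as the version joined with **and**. The length, separate_checks, and chained_check names are illustrative only.

```python
my_name_string = "My name is Lewys"
length = len(my_name_string)  # 16 characters, including spaces

# Two comparisons joined with "and"
separate_checks = (length > 5 and length < 40)

# The same test written as a single chained comparison
chained_check = 5 < length < 40

print(separate_checks)  # True
print(chained_check)    # True
```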

# End of session 2