# Introduction to Python 

This notebook provides a quick introduction to Python, as part of the University of Amsterdam course Computational Social Science Analysis.

It has been developed by Petter Törnberg. p.tornberg@uva.nl
Version 1.0. 2024-03-15 

### Notebook content overview 
1. Basics of Python Programming
   - Data types and variables: Introducing numbers, strings, and boolean values.
   - Basic operators: Arithmetic, assignment, comparison, and logical operators.
   - Input and output: Using input() and print() functions for data exchange with the user.
2. Control Structures
   - Conditional statements: if, elif, and else to make decisions in code.
   - Looping constructs: for loops and while loops for iteration.
3. Data Structures
   - Lists: Creation, indexing, slicing, and methods to manipulate lists.
   - Tuples: Understanding immutable sequences.
   - Dictionaries: Key-value pairs, accessing, and modifying data.
   - Sets: Unordered collections of unique elements.
4. Functions
   - Defining and calling functions: Parameters, return values.
   - Scope and lifetime of variables.
   - Lambda functions: Anonymous functions defined using the lambda keyword.
5. Pandas and dataframes
   - Creating a dataframe
   - Getting data from dataframe
   - Adding data
6. Basic Plotting
   - matplotlib
   - Plotting functions
7. Error and Exception Handling
   - Basics of error types in Python.
   - Using try, except, finally, and else blocks to handle exceptions.


## 1. Data types and Variables


In Python, a variable is like a box where you can store a piece of data. You can think of it as a name attached to a particular object. Python is dynamically typed, which means you don't have to declare a variable's type when you assign a value to it.

Let's explore some of the basic data types in Python:

#### Numbers
Python supports integers, floating-point numbers, and complex numbers. You do not need to declare the type of a variable. Python automatically detects it.

In [5]:
age = 30
age+age 

60

In [2]:
# Integer
age = 30
print(age)  # Output: 30
print("Hello world")

# Floating-point number
temperature = 98.6
print(temperature)  # Output: 98.6

# Complex number
complex_number = 3+5j
print(complex_number)  # Output: (3+5j)

30
Hello world
98.6
(3+5j)


In [3]:
print(age)

30
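To check which type Python has inferred for a value, the built-in `type()` function can be used. The short sketch below reuses the variable names from the cells above and also shows that a name can later be bound to a value of a different type, which is what dynamic typing means in practice.

```python
age = 30
temperature = 98.6
complex_number = 3 + 5j

print(type(age))             # <class 'int'>
print(type(temperature))     # <class 'float'>
print(type(complex_number))  # <class 'complex'>

# The same name can be rebound to a value of another type
age = "thirty"
print(type(age))             # <class 'str'>
```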


#### Strings
Strings in Python are used to store textual data. You can define them using single, double, or triple quotes.

In [9]:
# Single or double quotes for single-line strings
name = "John Doe"
greeting = 'Hello, Python!'
print(name)  # Output: John Doe
print(greeting)  # Output: Hello, Python!

# Triple quotes for multi-line strings
address = """123 Python Lane
Example City, EX 12345"""
print(address)

# Using the f"" syntax, you can easily combine text strings using curly brackets
print(f"Hello, {name+'banan'}!")



John Doe
Hello, Python!
123 Python Lane
Example City, EX 12345
Hello, John Doebanan!
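The curly brackets in an f-string can hold any expression, and a format specifier after a colon controls how a value is displayed. The sketch below uses illustrative values (the `height` variable is made up for this example) and shows a few common string methods as well.

```python
name = "John Doe"
age = 30
height = 1.7512

# Any expression can go inside the curly brackets
print(f"{name} will be {age + 1} next year")  # John Doe will be 31 next year

# A format specifier after a colon controls the number of decimals
print(f"Height: {height:.2f} m")              # Height: 1.75 m

# A few common string methods
print(name.upper())   # JOHN DOE
print(name.split())   # ['John', 'Doe']
print(len(name))      # 8
```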


#### Boolean Values
Boolean represents one of two values: True or False. It is useful when you are dealing with conditions.

In [10]:
is_student = True
print(is_student)  # Output: True

has_graduated = False
print(has_graduated)  # Output: False

True
False


### Basic Operators
Operators are special symbols in Python that carry out arithmetic or logical computation. The value that the operator operates on is called the operand.

#### Arithmetic Operators
Used to perform mathematical operations like addition, subtraction, multiplication, etc.

In [11]:
# Addition
print(5 + 3)  # Output: 8

# Subtraction
print(10 - 2)  # Output: 8

# Multiplication
print(4 * 2)  # Output: 8

# Division (always returns a float)
print(16 / 2)  # Output: 8.0

# Floor Division (discards the fractional part)
print(17 // 2)  # Output: 8

# Modulus (remainder of the division)
print(18 % 10)  # Output: 8

# Exponentiation (power of a number)
print(2 ** 3)  # Output: 8

8
8
8
8.0
8
8
8
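For positive numbers, floor division looks like it simply discards the fractional part. With negative numbers the difference becomes visible: `//` rounds down toward negative infinity, and `%` is defined so that the two results stay consistent. A small sketch:

```python
print(17 // 2)    # 8
print(-17 // 2)   # -9, because floor division rounds down, not toward zero
print(-17 % 2)    # 1, so that (-17 // 2) * 2 + (-17 % 2) == -17

# divmod returns the quotient and the remainder in one call
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```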


#### Assignment Operators
Used to assign values to variables.

In [12]:
x = 10  # Simple assignment
x += 5  # Equivalent to x = x + 5
print(x)  # Output: 15

15


#### Comparison Operators
Used to compare values. It either returns True or False according to the condition.

In [13]:
a = 5
b = 3

print(a > b)  # True, because 5 is greater than 3
print(a < b)  # False
print(a == b)  # False, equals
print(a != b)  # True, not equals

True
False
False
True


#### Logical Operators
Used to combine conditional statements: and, or, not.

In [None]:
x = True
y = False

print(x and y)  # False, because not both operands are True
print(x or y)  # True, because at least one of the operands is True
print(not x)  # False, because x is True, and not True is False
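Logical operators are most useful when they combine comparisons. The sketch below uses illustrative values and also shows that Python allows comparisons to be chained, which is often easier to read than two comparisons joined with `and`.

```python
age = 25
is_student = True

# Comparisons can be combined with and / or / not
print(age >= 18 and is_student)    # True
print(age < 18 or not is_student)  # False

# Chained comparison: equivalent to (18 <= age) and (age < 65)
print(18 <= age < 65)              # True
```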

### Exercises Part 1: Data Types and variables

#### Exercise 1.1: Variables and Data Types
Objective: Create variables to store personal information and display them using print statements.

- Create a variable named name and assign it your name as a string.
- Create a variable named age and assign it your age as an integer.
- Create a variable named height and assign it your height in meters as a float.
- Create a variable named is_student and assign it a boolean value representing whether you are a student (True or False).
- Use print() to display all these variables in a meaningful sentence. For example: "My name is John Doe, I am 20 years old, 1.75 meters tall, and it is True that I am a student."


In [14]:
# [Your solution here]
name = 'Petter Törnberg'
age = 36
height = 1.77
is_student = False
print(f"My name is {name}, I am {age} years old, {height} meters tall, and it is {is_student} that I am a student.")


My name is Petter Törnberg, I am 36 years old, 1.77 meters tall, and it is False that I am a student.


#### Exercise 1.2: Basic Arithmetic Operations
Objective: Perform and display the results of basic arithmetic operations.

- Create two variables, a and b, assigning them any integer values.
- Calculate and print the sum of a and b.
- Calculate and print the difference when b is subtracted from a.
- Calculate and print the product of a and b.
- Calculate and print the division of a by b.
- Calculate and print the remainder of a divided by b.
- Calculate and print the result of a raised to the power of b.


In [None]:
# [Your solution here]

#### Exercise 1.3: Using Basic Operators for a Simple Calculator
Objective: Create a simple calculator that can add, subtract, multiply, and divide two numbers entered by the user.

- Prompt the user to enter the first number and assign it to a variable num1. Ensure you convert the input to a float.
- Prompt the user to enter the second number and assign it to a variable num2. Ensure you convert the input to a float.
- Ask the user to choose an operation (+, -, *, /) and assign their choice to a variable named operation.
- Using if-elif-else statements, perform the operation based on the user's choice. For example, if the user chooses +, add num1 and num2.
- Print the result in a formatted message, such as "The result of 5.0 + 3.0 is 8.0".


In [26]:
for i in range(10):
    print(i)
    print("Done")
    if i == 5:
        print('Five!')
        print('Done')



0
Done
1
Done
2
Done
3
Done
4
Done
5
Done
Five!
Done
6
Done
7
Done
8
Done
9
Done


In [23]:
# [Your code here]
num1 = 5
num2 = 10
operation = '/'

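if operation == '/':
    print(f'The result of {num1} / {num2} is {num1/num2}')
elif operation == '*':
    print(f'The result of {num1} * {num2} is {num1*num2}')
elif operation == '+':
    print(f'The result of {num1} + {num2} is {num1+num2}')
elif operation == '-':
    print(f'The result of {num1} - {num2} is {num1-num2}')
else:
    print(f'Unknown operation: {operation}')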

The result of 5 / 10 is 0.5


#### Exercise 1.4: Working with Comparison and Logical Operators
Objective: Determine if a year entered by the user is a leap year.

- Take a variable that contains a year. Convert the input to an integer.
- A year is a leap year if it is divisible by 4 but not by 100, unless it is also divisible by 400. Use this information to check if the year is a leap year using comparison and logical operators.
- Print a message indicating whether the year is a leap year or not. For example, "The year 2000 is a leap year: True".


In [None]:
# [Your code here]

## 2. Control Structures
Control structures are fundamental to programming, allowing you to control the flow of execution based on conditions and perform repetitive tasks efficiently. Python provides several control structures, with if, elif, and else for conditional execution, and for and while loops for iteration.

#### Conditional Statements: if, elif, and else
Conditional statements let you execute different blocks of code based on certain conditions. These are the building blocks of decision-making in programming.

Syntax:

In [16]:
condition=False
another_condition = False 

if condition:
    print('code to execute if condition is True')
elif another_condition:
    print('code to execute if another_condition is True')
else:
    print('code to execute if all above conditions are False')

code to execute if all above conditions are False


Example

In [17]:
age = 18
if age >= 18:
    print("You are eligible to vote.")
elif age < 0:
    print("Age cannot be negative.")
else:
    print("You are not eligible to vote yet.")

You are eligible to vote.


#### Looping Constructs: for Loops and while Loops
Loops are used to execute a block of code repeatedly, as long as a condition is met. Python provides for loops for iterating over a sequence (like a list, tuple, dictionary, set, or string) and while loops to execute a set of statements as long as a condition is true.

for Loop Syntax:

In [None]:
sequence = [1, 2, 3]  # placeholder; any list, tuple, string, etc. works here
for element in sequence:
    pass  # code to execute for each element in sequence

for Loop Example:


In [18]:
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(f"I like {fruit}")

I like apple
I like banana
I like cherry


while Loop Syntax:


In [None]:
condition = False  # placeholder; the loop body runs for as long as this is True
while condition:
    pass  # code to execute as long as condition is True

while Loop Example:


In [19]:
count = 0
while count < 5:
    print(f"Count is {count}")
    count += 1  # This is equivalent to count = count + 1

Count is 0
Count is 1
Count is 2
Count is 3
Count is 4


In [21]:
list(range(5,10))

[5, 6, 7, 8, 9]
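The `range` function takes up to three arguments: a start (included), a stop (excluded), and an optional step. A few illustrative calls:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]        start defaults to 0
print(list(range(5, 10)))      # [5, 6, 7, 8, 9]        stop is not included
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]        step of 2
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]          negative step counts down
```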

#### Combining Loops and Conditional Statements
You can combine loops and conditional statements to perform more complex tasks.

Example:


In [22]:
# Report for each number from 1 to 10 whether it is even or odd
for number in range(1, 11):  # range(1, 11) generates numbers from 1 to 10
    if number % 2 == 0:
        print(f"{number} is even")
    else:
        print(f"{number} is odd")

1 is odd
2 is even
3 is odd
4 is even
5 is odd
6 is even
7 is odd
8 is even
9 is odd
10 is even


### Exercises Part 2: Control structures

#### Exercise 2.1: Grade Classifier
Objective: Write a program that takes a student's score and classifies the grade based on the score.

- Take a variable with a score between 0 and 100.
- Use conditional statements to classify the score into grades:
  - A: 90-100
  - B: 80-89
  - C: 70-79
  - D: 60-69
  - F: below 60
- Print the grade.



In [None]:
# [Your code here]
score = 55
if score >= 90:
    print('A')
elif score >= 80:
    print('B')
elif score >= 70:
    print('C')
elif score >= 60:
    print('D')
else:
    print('F :(')


#### Exercise 2.2: Sum of Natural Numbers
Objective: Use a loop to calculate the sum of all natural numbers up to a number n entered by the user.

- Take a variable containing a positive integer n.
- Use a for loop or a while loop to calculate the sum of all natural numbers up to and including n.
- Print the sum.



In [None]:
# [Your code here]

#### Exercise 2.3: Multiplication Table Printer
Objective: Write a program that prints the multiplication table of a number entered by the user.

- Take a variable containing a positive number n
- Use a for loop to iterate through numbers 1 to 10 and print the multiplication table for the entered number.

Example: 
- Number = 7
- 7 x 1 = 7
- 7 x 2 = 14
- ...
- 7 x 10 = 70


In [None]:
# [Your code here]


#### Exercise 2.4: Factorial Calculator
Objective: Write a program that calculates the factorial of a number entered by the user.

- Take a variable with a non-negative integer.
- Use a loop to calculate the factorial of the number. The factorial of 0 is 1; for any other number n, the factorial is n * (n-1) * (n-2) * ... * 1.
- Print the factorial of the number.




In [None]:
# [Your code here]

## 3. Data Structures
- Lists: Creation, indexing, slicing, and methods to manipulate lists.
- Tuples: Understanding immutable sequences.
- Dictionaries: Key-value pairs, accessing, and modifying data.
- Sets: Unordered collections of unique elements.

Data structures are fundamental concepts in programming, enabling efficient storage, access, and manipulation of data. Python provides several built-in data structures, such as lists, tuples, dictionaries, and sets, each with its own characteristics and use cases.

#### Lists
A list in Python is an ordered collection of items that can be of different types. Lists are mutable, meaning their elements can be changed.


In [27]:
my_list = [1, 2, 3, "Python", True]


#### Indexing
Access individual elements in a list using their index (starting from 0).



In [28]:
print(my_list[0])  # Output: 1
print(my_list[3])  # Output: Python



1
Python


#### Slicing
Slice a list to get a new list containing a subset of the original list's elements.


In [29]:
print(my_list[1:4])  # Output: [2, 3, "Python"]


[2, 3, 'Python']
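A slice has the form `start:stop:step`, where `start` is included, `stop` is excluded, and any part can be left out. Negative indices count from the end of the list. The sketch below uses a separate illustrative list, `letters`, so that `my_list` is left untouched.

```python
letters = ["a", "b", "c", "d", "e"]

print(letters[:2])    # ['a', 'b']                 from the start up to index 2
print(letters[2:])    # ['c', 'd', 'e']            from index 2 to the end
print(letters[-1])    # e                          last element
print(letters[-2:])   # ['d', 'e']                 last two elements
print(letters[::2])   # ['a', 'c', 'e']            every second element
print(letters[::-1])  # ['e', 'd', 'c', 'b', 'a']  reversed copy
```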


#### List Methods
You can add and remove items from a list.


In [30]:
my_list.append("New item")  # Add an item to the end
my_list.remove(True)        # Remove the first item equal to True; since 1 == True, this removes the 1
print(my_list)              # Output: [2, 3, 'Python', True, 'New item']



[2, 3, 'Python', True, 'New item']
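The output above may be surprising: `remove(True)` removed the `1`, not the `True`. The reason is that `remove` deletes the first element that compares equal to its argument, and in Python `True == 1`. The sketch below demonstrates this on a separate illustrative list, `items`, and shows two more list methods.

```python
print(True == 1)    # True
print(False == 0)   # True

items = [1, 2, 3, "Python", True]
items.remove(True)  # removes the first element equal to True, which is the 1
print(items)        # [2, 3, 'Python', True]

# Two other common list methods
items.insert(0, "first")  # insert at a given position
last = items.pop()        # remove and return the last element
print(items, last)        # ['first', 2, 3, 'Python'] True
```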


#### Tuples
Tuples are similar to lists but are immutable, meaning once a tuple is created, its elements cannot be changed.


In [None]:
my_tuple = (1, "Hello", 3.14)



#### Accessing Elements
Use indexing to access tuple elements.



In [None]:
print(my_tuple[1])  # Output: Hello
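Immutability can be seen directly by trying to assign to a tuple element, which raises a `TypeError`. The sketch below uses an illustrative tuple named `point` and also shows unpacking, a common way of reading all elements of a tuple at once.

```python
point = (1, "Hello", 3.14)

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# Unpacking assigns each element to its own name
number, text, value = point
print(text)  # Hello
```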


#### Dictionaries
Dictionaries store key-value pairs and are mutable. If you ask for the key, it will give you the value. Keys must be unique and immutable. 



In [31]:
my_dict = {"name": "Alice", "age": 25, "is_student": False}



#### Accessing and Modifying Data in a dict


In [32]:
print(my_dict["name"])          # Output: Alice
my_dict["age"] = 26             # Update value
my_dict["major"] = "Computer Science"  # Add new key-value pair
print(my_dict)                  # Output: {'name': 'Alice', 'age': 26, 'is_student': False, 'major': 'Computer Science'}


Alice
{'name': 'Alice', 'age': 26, 'is_student': False, 'major': 'Computer Science'}


#### Removing Items from dict


In [33]:
del my_dict["is_student"]
print(my_dict)  # Output: {'name': 'Alice', 'age': 26, 'major': 'Computer Science'}


{'name': 'Alice', 'age': 26, 'major': 'Computer Science'}
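Asking for a key that does not exist with square brackets raises a `KeyError`. A few other ways of working with dictionaries avoid that or make it easy to loop over the contents. The sketch below uses an illustrative dictionary named `person`.

```python
person = {"name": "Alice", "age": 26}

print("age" in person)                 # True: checks whether a key exists
print(person.get("major"))             # None: get() does not raise a KeyError
print(person.get("major", "unknown"))  # unknown: a default value can be supplied

# items() yields key-value pairs
for key, value in person.items():
    print(f"{key}: {value}")
```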


#### Sets
Sets are unordered collections of unique elements. Sets allow you to do the mathematical operations that you learned in set theory:


In [34]:
my_set = {1, 2, 3, 4, 5, 5, 5}

In [36]:
my_set

{1, 2, 3, 4, 5}

##### Adding and Removing Elements


In [37]:
my_set.add(6)    # Add an element
my_set.remove(2) # Remove an element
print(my_set)    # Output: {1, 3, 4, 5, 6}


{1, 3, 4, 5, 6}


##### Set Operations
Sets support mathematical operations like union, intersection, and difference.



In [38]:
another_set = {4, 5, 6, 7, 8}
print(my_set.union(another_set))          # Output: {1, 3, 4, 5, 6, 7, 8}
print(my_set.intersection(another_set))   # Output: {4, 5, 6}
print(my_set.difference(another_set))     # Output: {1, 3}


{1, 3, 4, 5, 6, 7, 8}
{4, 5, 6}
{1, 3}
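The same set operations are also available as operators, which is often shorter to write. The sketch below uses two illustrative sets, `a` and `b`, with the same contents as the sets above, and adds the symmetric difference (elements in exactly one of the two sets).

```python
a = {1, 3, 4, 5, 6}
b = {4, 5, 6, 7, 8}

print((a | b) == a.union(b))         # True: | is union
print((a & b) == a.intersection(b))  # True: & is intersection
print((a - b) == a.difference(b))    # True: - is difference
print(a ^ b)                         # symmetric difference: the elements 1, 3, 7, 8
print(4 in a)                        # True: membership test
```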


### Exercises Part 3: Data structures

#### Exercise 3.1: List Manipulation
Objective: Practice creating, accessing, and manipulating lists.

- Create a list named fruits containing "apple", "banana", "cherry", and "orange".
- Print the second item in the list.
- Change the value of the third item to "kiwi", and add "mango" to the end of the list.
- Remove "apple" from the list and print the final list.


In [39]:
# [Your solution here]
l = ["apple", "banana", "cherry", "orange"]
print(l[1])
l[2] = 'kiwi'
l.append('mango')
l.remove('apple')
l

banana


['banana', 'kiwi', 'orange', 'mango']

#### Exercise 3.2: Tuple Operations
Objective: Understand and work with tuples.

- Create a tuple named coordinates with values (4, 5, 6).
- Try changing the first value of coordinates to 10. What happens? Write a comment in your code explaining the outcome.
- Create a new tuple named updated_coordinates by adding a new value, 7, to the existing coordinates tuple. Print updated_coordinates.



In [47]:
# [Your solution here]
coordinates = (4,5,6)
# coordinates[0] = 10  # raises TypeError: tuples are immutable
# updated_coordinates = coordinates.append(7)  # raises AttributeError: tuples have no append method
updated_coordinates = coordinates + (7,)  # concatenation creates a new, longer tuple
print(updated_coordinates)


(4, 5, 6, 7)


#### Exercise 3.3: Dictionary Data Handling
Objective: Practice accessing and modifying dictionary data.

- Create a dictionary named student with keys "name", "age", "grade", and "subjects", where "subjects" is a list containing "Math", "Science", and "English".
- Print the age of the student.
- Add a new key-value pair "hobbies" with a list value containing at least two hobbies.
- Change the "grade" to the next higher grade (e.g., from 10 to 11) and add "Art" to the list of subjects.
- Remove the "age" key from the dictionary and print the final student dictionary.


In [None]:
# [Your solution here]

#### Exercise 3.4: Set Operations
Objective: Explore set operations for unique collections.

- Create two sets, set1 containing numbers 1 through 5 and set2 containing numbers 4 through 8.
- Print the union of set1 and set2.
- Print the intersection of set1 and set2.
- Print the difference between set1 and set2.
- Add the number 10 to set1 and remove the number 5. Print the updated set1.



In [55]:
# [Your solution here]
set1 = set(range(1,6))
set2 = set(range(4,9))
print(set1.union(set2))
print(set1.intersection(set2))
print(set1-set2)
set1.add(10)
set1.remove(5)
set1

{1, 2, 3, 4, 5, 6, 7, 8}
{4, 5}
{1, 2, 3}


{1, 2, 3, 4, 10}

#### Exercise 3.5: Comprehensive Data Structure Challenge
Objective: Use a combination of data structures to manage a more complex data set.

- Create a dictionary named library to represent a collection of books. Each key in the dictionary should be a book title, and its value should be another dictionary with keys "author", "year published", and "genre".
- Add at least three books to the library with appropriate information.
- Print the names of all books in the library of a specific genre (e.g., "Science Fiction").
- For one of the books, add a key "ratings" with a list of numerical ratings it has received. Then, calculate and print the average rating of that book.

In [None]:
# [Your solution here]

## 4. Functions
Functions are blocks of reusable code designed to perform a specific task. They improve code readability, make it easier to debug, and reduce the amount of code duplication. This section covers the basics of defining and calling functions, understanding the scope and lifetime of variables, and using lambda functions.

- Defining and calling functions: Parameters, return values.
- Scope and lifetime of variables.
- Lambda functions: Anonymous functions defined using the lambda keyword.

#### Defining and Calling Functions
##### Defining a Function
Use the def keyword followed by the function name and parentheses () to define a function. Parameters inside the parentheses are inputs to your function. A colon : follows the parentheses, and the indented block of code below the definition is the body of the function.



In [56]:
def greet(name):
    print(f"Hello, {name}!")


#### Calling a Function
Invoke the function by using its name followed by parentheses. If the function expects parameters, provide the values within the parentheses.



In [57]:
greet("Alice")  # Output: Hello, Alice!

Hello, Alice!


#### Parameters and Return Values
Functions can return values using the return statement. The function ends as soon as a return statement is executed.



In [58]:
def add_numbers(a, b):
    return a + b

result = add_numbers(5, 3)
print(result)  # Output: 8


8
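Because a function ends as soon as a `return` statement is executed, `return` can be used to leave a function early. A function that never reaches a `return` statement returns `None`. The function names below are illustrative.

```python
def describe_number(n):
    if n < 0:
        return "negative"  # the function stops here for negative input
    if n == 0:
        return "zero"
    return "positive"

print(describe_number(-5))  # negative
print(describe_number(0))   # zero
print(describe_number(7))   # positive

def no_return():
    pass  # no return statement

print(no_return())  # None
```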


#### Scope and Lifetime of Variables
- Local Scope: Variables declared within a function are local to that function and cannot be accessed outside of it. Their lifetime is as long as the function execution.
- Global Scope: Variables declared outside all functions have a global scope and can be accessed from any part of the code, including inside functions.



In [60]:
x = "global"

def func(x):
    y = "local"
    
    print(x)  # Prints the parameter x, which shadows the global x inside func
    print(y)  # Can access local variable y

func('horse')
# print(y)  # This will raise an error because y is not accessible here


horse
local
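The output `horse` shows that a parameter named `x` takes precedence over a global variable with the same name. The sketch below separates the two cases using two illustrative functions: one reads the global variable, the other hides it behind a parameter.

```python
x = "global"

def read_global():
    print(x)  # no local x exists, so the global x is used

def shadow_global(x):
    print(x)  # the parameter x hides the global x inside this function

read_global()            # global
shadow_global("horse")   # horse
print(x)                 # global (the global variable is unchanged)
```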


#### Lambda Functions
Lambda functions are anonymous functions defined using the lambda keyword. They are typically used for simple operations, especially when you need a function for a short period.



In [61]:
multiply = lambda a, b: a * b
print(multiply(5, 6))  # Output: 30


30


Lambda functions are often used with functions like filter(), map(), and reduce() for operations on lists or tuples.

Example with map()



In [62]:
numbers = [1, 2, 3, 4, 5]
squared = map(lambda x: x ** 2, numbers)
print(list(squared))  # Output: [1, 4, 9, 16, 25]


[1, 4, 9, 16, 25]
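The other two functions mentioned above work in a similar way. `filter()` keeps the elements for which the lambda returns True, and `reduce()` combines all elements into a single value. Note that in Python 3, `reduce` has to be imported from the `functools` module.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# filter keeps the elements for which the function returns True
evens = filter(lambda x: x % 2 == 0, numbers)
print(list(evens))  # [2, 4]

# reduce applies the function cumulatively: ((((1 + 2) + 3) + 4) + 5)
total = reduce(lambda a, b: a + b, numbers)
print(total)        # 15
```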


### Exercises Part 4: Functions

#### Exercise 4.1. Create a Function: 
Write a function named is_even that takes a number as a parameter and returns True if the number is even, False otherwise.



In [64]:
# [Your solution here]
def is_even(number):
    return (number % 2 == 0)
is_even(4)

True

#### Exercise 4.2. Function with Multiple Parameters
- Write a function named divide that takes two parameters and returns their division result. 
- If the second number is zero, return a string saying "Cannot divide by zero".



In [65]:
# [Your solution here]
def divide(a,b):
    if b == 0:
        return "Cannot divide by zero"
    return a/b
divide(5,2)


2.5

#### Exercise 4.3. Using Global and Local Variables: 
- Define a global variable, counter = 0. 
- Write a function that increments this counter by 1 every time it is called.



UnboundLocalError: cannot access local variable 'counter' where it is not associated with a value

#### Exercise 4.4. Experiment with Lambda: 
Use a lambda function with the filter() method to extract all odd numbers from a list.


In [None]:
# [Your solution here]

## 5. Pandas 
A powerful and widely used package for data handling and analysis in Python is `pandas`, which is commonly imported as `pd`.
Most of your work in this course will involve using `pandas dataframes` to load, store and process information.

[*This part of the course was adapted from the course by Mark Bakker*. See: https://mbakker7.github.io/exploratory_computing_with_python/ ]

### Loading data with Pandas
Data is often stored in CSV files (Comma Separated Values, although the values can be separated by other things than commas).


In [None]:

!pip install pandas


In [69]:
import pandas as pd

We will use only a few functions of the `pandas` package here. Full information on `pandas` can be found on the [pandas website](http://pandas.pydata.org/). 
Consider the following dataset, which is stored in the file `transport.csv`. It shows the percentage of transportation kilometers by car, bus or rail for four countries. The dataset has four columns. 

`country, car, bus, rail`  
`some more explanations, yada yada yada`  
`France, 86.1, 5.3, 8.6`  
`Germany, 85.2, 7.1, 7.7`  
`Netherlands, 86.4, 4.6, 9`  
`United Kingdom, 88.2, 6.5, 5.3` 

This data file can be loaded with the `read_csv` function of the `pandas` package. The `read_csv` function has many options. We will use three of them here. The rows that need to be skipped are defined with the `skiprows` keyword (in this case row 1 with the `yada yada` text). The `skipinitialspace` keyword is set to `True` so that the column name ' car' is loaded without the initial space that is in the data file. And the `index_col` keyword is set to indicate that the names in column 0 can be used as an index to select a row.

In [71]:
tran = pd.read_csv('transport.csv', skiprows=[1], 
                   skipinitialspace=True, index_col=0)
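If the file `transport.csv` is not at hand, the same call can be tried on text held in memory. The sketch below is an illustration, not part of the original data loading: it copies the listing shown above into a string, wraps it in `io.StringIO` so that `read_csv` can treat it like a file, and stores the result in a separate variable named `tran_demo`.

```python
import io
import pandas as pd

# Same content as the transport.csv listing above, held in a string for illustration
csv_text = """country, car, bus, rail
some more explanations, yada yada yada
France, 86.1, 5.3, 8.6
Germany, 85.2, 7.1, 7.7
Netherlands, 86.4, 4.6, 9
United Kingdom, 88.2, 6.5, 5.3
"""

tran_demo = pd.read_csv(io.StringIO(csv_text), skiprows=[1],
                        skipinitialspace=True, index_col=0)

print(tran_demo.columns.tolist())            # column names without leading spaces
print(tran_demo.loc['Netherlands', 'rail'])  # rail percentage for the Netherlands
```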

`pandas` loads data into a `DataFrame`. A `DataFrame` is like an array, but has many additional features for data analysis. For starters, once you have loaded the data, you can print it to the screen

In [None]:
tran.loc[(tran['car']=='')]

In [74]:
tran

                 car  bus  rail
country                        
France          86.1  5.3   8.6
Germany         85.2  7.1   7.7
Netherlands     86.4  4.6   9.0
United Kingdom  88.2  6.5   5.3


When the `DataFrame` is large, you can still print it to the screen (`pandas` is smart enough not to show the entire `DataFrame` when it is very large), or you can print only the first 5 rows with the `.head()` function.

In a notebook, the `display` function shows a nicely formatted version of the `DataFrame`.

In [None]:
display(tran)

### Basic `DataFrame` manipulation
The rows and columns of a `DataFrame` may have names, as for the `tran` `DataFrame` shown above. To find out which names are used for the columns, use the `keys` function, which is accessible with the dot syntax. You can loop through the names of the columns.

In [75]:
print('Names of columns:')
print(tran.keys())
for key in tran.keys():
    print(key)

Names of columns:
Index(['car', 'bus', 'rail'], dtype='object')
car
bus
rail


Each `DataFrame` may be indexed just like an array, by specifying the row and column number using the `.iloc` syntax (which stands for *index location*), where column 0 is the column labeled `car` (the column labeled as `country` was stored as an index when reading the csv file).

In [76]:
print(tran.iloc[0, 1])  # gives the bus data for France
print(tran.iloc[1, 0])  # gives the car data for Germany
print(tran.iloc[2, 2])  # gives the rail data for Netherlands
print(tran.iloc[3])     # all data for United Kingdom
print(tran.iloc[:, 1])  # all data for bus

5.3
85.2
9.0
car     88.2
bus      6.5
rail     5.3
Name: United Kingdom, dtype: float64
country
France            5.3
Germany           7.1
Netherlands       4.6
United Kingdom    6.5
Name: bus, dtype: float64


Alternatively, and often more explicitly, values in a `DataFrame` may be selected by specifying the indices by name, using the `.loc` syntax. This is a bit more typing, but it makes it *much* clearer what you are doing. The equivalent of the code cell above, but using indices by name, is

In [77]:
print(tran.loc['France', 'bus'])
print(tran.loc['Germany', 'car'])
print(tran.loc['Netherlands', 'rail'])
print(tran.loc['United Kingdom'])
print(tran.loc[:, 'bus'])

5.3
85.2
9.0
car     88.2
bus      6.5
rail     5.3
Name: United Kingdom, dtype: float64
country
France            5.3
Germany           7.1
Netherlands       4.6
United Kingdom    6.5
Name: bus, dtype: float64


There are two alternative ways to access all the data in a column. First, you can simply specify the column name as an index, without having to use the `.loc` syntax. Second, the dot syntax may be used by typing `.column_name`, where `column_name` is the name of the column. Hence, the following three are equivalent

In [None]:
print(tran.loc[:, 'car'])  # all rows of 'car' column
print(tran['car'])        # 'car' column 
print(tran.car)

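If you want to access the data in a row by its name, you need the `.loc` notation; plain square brackets such as `tran['France']` look for a column with that name and raise a `KeyError`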

In [78]:
tran.loc['France']

car     86.1
bus      5.3
rail     8.6
Name: France, dtype: float64

### `numpy` functions for DataFrames
`DataFrame` objects can often be treated as arrays, especially when they contain numerical data. Most `numpy` functions work on `DataFrame` objects, but they can also be accessed with the *dot* syntax, like `dataframe_name.function()`. Simply type 

`tran.` 

in a code cell and then hit the [tab] key to see all the functions that are available (there are many). In the code cell below, we compute the maximum value of transportation by car, the country corresponding to the maximum value of transportation by car (in `pandas` this is `idxmax` rather than the `argmax` used in `numpy`), and the mean value of all transportation by car. 

In [None]:
print('maximum car travel percentage:', tran.car.max())
print('country with maximum car travel percentage:', tran.car.idxmax())
print('mean car travel percentage:', tran.car.mean())

You can also find all values larger than a specified value, just like for arrays.

In [None]:
print('all rail travel above 8 percent:')
print(tran.rail[tran.rail > 8])

The code above identified France and Netherlands as the countries with more than 8% transport by rail, but the code returned a series with the country names and the value in the rail column. If you only want the names of the countries, you need to ask for the values of the index column

In [None]:
print(tran.index[tran.rail > 8].values)
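Several conditions can be combined in one selection. In `pandas`, this is done with `&` (and) and `|` (or) rather than the keywords `and` and `or`, and each condition needs its own parentheses. The sketch below builds a small illustrative copy of the transport data, named `tran_demo`, so that it can be run on its own.

```python
import pandas as pd

# Illustrative copy of the transport data used above
tran_demo = pd.DataFrame(
    {'car': [86.1, 85.2, 86.4, 88.2],
     'bus': [5.3, 7.1, 4.6, 6.5],
     'rail': [8.6, 7.7, 9.0, 5.3]},
    index=['France', 'Germany', 'Netherlands', 'United Kingdom'])

# Rail above 8 percent AND bus above 5 percent
both = tran_demo[(tran_demo.rail > 8) & (tran_demo.bus > 5)]
print(both.index.tolist())    # ['France']

# Rail above 8 percent OR bus above 7 percent
either = tran_demo[(tran_demo.rail > 8) | (tran_demo.bus > 7)]
print(either.index.tolist())  # ['France', 'Germany', 'Netherlands']
```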

### Adding a column to a `DataFrame`
A column may be added to a `DataFrame` by simply specifying the name and values of the new column using the syntax `DataFrame['newcolumn']=something`. For example, let's add a column named `public_transport`, which is the sum of the `bus` and `rail` columns, and then find the country with the largest percentage of public transport

In [None]:
tran['public_transport'] = tran.bus + tran.rail
print('Country with largest percentage public transport:', tran.public_transport.idxmax())

### Plotting DataFrames
You can plot the column or row of a DataFrame with `matplotlib` functions, but `pandas` has also implemented its own, much more convenient, plotting functions (still based on `matplotlib` in the background, of course). The plotting capabilities of `pandas` use the *dot* syntax, like `dataframe.plot()`. All columns can be plotted simultaneously (note that the names appear on the axes and the legend is added automatically!).

In [None]:
tran.plot();  # plot all columns

You can also plot one column at a time. The style of the plot may be specified with the `kind` keyword (the default is `'line'`). Check out `tran.plot?` for more options. 

In [None]:
tran['bus'].plot(kind='bar')

### Sorting DataFrames
DataFrames may be sorted with the `.sort_values` function. The keyword `inplace=True` replaces the values in the DataFrame with the new sorted values (when `inplace=False` a new DataFrame is returned, which you can store in a separate variable so that you have two datasets, one sorted and one unsorted). The `sort_values` function has several keyword arguments, including `by` which is either the name of one column to sort by or a list of columns so that data is sorted by the first column in the list and when values are equal they are sorted by the next column in the list. Another keyword is `ascending`, which you can use to specify whether to sort in ascending order (`ascending=True`, which is the default), or descending order (`ascending=False`)

In [None]:
print('Data sorted by car use:')
display(tran.sort_values(by='car'))
print('Data sorted by bus use:')
display(tran.sort_values(by='bus'))
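The `ascending` and `inplace` keywords described above can be seen in a short example. The sketch below again uses a small illustrative copy of the transport data, named `tran_demo`, and sorts it by car use in descending order. Because `inplace` is left at its default of `False`, the sorted result is a new DataFrame and the original stays in its original order.

```python
import pandas as pd

# Illustrative copy of the transport data used above
tran_demo = pd.DataFrame(
    {'car': [86.1, 85.2, 86.4, 88.2],
     'bus': [5.3, 7.1, 4.6, 6.5],
     'rail': [8.6, 7.7, 9.0, 5.3]},
    index=['France', 'Germany', 'Netherlands', 'United Kingdom'])

# Largest car percentage first; the result is stored in a new variable
by_car_desc = tran_demo.sort_values(by='car', ascending=False)
print(by_car_desc.index.tolist())  # ['United Kingdom', 'Netherlands', 'France', 'Germany']

# The original DataFrame keeps its order
print(tran_demo.index.tolist())    # ['France', 'Germany', 'Netherlands', 'United Kingdom']
```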

### Renaming columns
Sometimes (quite often, really), the names of columns in a dataset are not very convenient (long, including spaces, etc.). For the example of the transportation data, the columns have convenient names, but let's change them for demonstration purposes. You can rename columns in place, and you can change as many columns as you want. The old and new names are specified with a Python dictionary. A dictionary is a very useful data type. It is specified between braces `{}`, and links a word in the dictionary to a value. The value can be anything. You can then use the word in the dictionary as the index, just like you would look up a word in a paper dictionary.

In [None]:
firstdictionary = {'goals': 20, 'city': 'Delft'}
print(firstdictionary['goals'])
print(firstdictionary['city'])

Much more on Python dictionaries can be found, for example, [here](https://www.w3schools.com/python/python_dictionaries.asp). Let's continue with renaming two of the columns of the `tran` `DataFrame`: 

In [None]:
tran.rename(columns={'bus': 'BUS', 
                     'rail': 'train'}, inplace=True)
display(tran)

The index column, with the countries, is now called `'country'`, but we can rename that too, for example to `'somewhere in Europe'`, with the following syntax

In [None]:
tran.index.names = ['somewhere in Europe']
display(tran)
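
If you prefer to keep the original DataFrame untouched, `rename` can also be used without `inplace=True`, in which case it returns a new DataFrame. The sketch below assumes a small made-up DataFrame `df`; the index name `'region'` is likewise invented for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'bus': [10, 15], 'rail': [5, 8]}, index=['A', 'B'])

# Without inplace=True, rename returns a new DataFrame
renamed = df.rename(columns={'rail': 'train'})
renamed.index.name = 'region'  # name the index of the new DataFrame

print(list(df.columns))        # the original columns are unchanged
print(list(renamed.columns))
print(renamed.index.name)
```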

### Setting values based on a condition
Values of a column may be changed based on a condition. For example, all values of the concentration above 0.2 may be set to 0.2 with the following syntax

In [None]:
data.loc[data.conc>0.2, 'conc'] = 0.2
display(data)
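
It can help to split the statement above into two steps: first build the condition, then assign to the selected rows. The sketch below does this on a made-up DataFrame `df` with a `'conc'` column (hypothetical values, not the `data` DataFrame used above).

```python
import pandas as pd

# Hypothetical concentration values, used only for illustration
df = pd.DataFrame({'conc': [0.05, 0.25, 0.10, 0.40]})

mask = df['conc'] > 0.2      # boolean Series: True where the condition holds
print(mask.tolist())

df.loc[mask, 'conc'] = 0.2   # assign only to the rows where mask is True
print(df['conc'].tolist())
```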

### Exercises Part 5: Pandas 

#### Exercise 5.1. Average annual rainfall by country
The file `annual_precip.csv` contains the average yearly rainfall and total land area for all the countries in the world (well, there are some missing values); the data is available on the website of the <a href="http://data.worldbank.org/">World Bank</a>. Open the data file to see what it looks like (just click on it in the Files tab on the Jupyter Dashboard). Load the data with the `read_csv` function of `pandas`, making sure that the names of the countries can be used to select a row, and perform the following tasks:

* Print the first 5 lines of the `DataFrame` to the screen with the `.head()` function.
* Print the average annual rainfall for Panama and make sure to include the units.
* Report the total land area of the Netherlands and make sure to include the units.
* Report all countries with an average annual rainfall less than 200 mm/year.
* Report all countries with an average annual rainfall more than 2500 mm/year.
* Report all countries with an average annual rainfall that is within 50 mm/year of the average annual rainfall in the Netherlands.


In [None]:
# [Your solution here]

## 6. Basic plotting 
Plotting is not part of standard Python, but a nice package exists to create pretty graphics (and ugly ones, if you want). A package is a library of functions for a specific set of tasks. There are many Python packages and we will use several of them. The graphics package we use is called `matplotlib`. To be able to use the plotting functions in `matplotlib`, we have to import it. We will learn several different ways of importing packages. For now, we import the plotting part of `matplotlib` and call it `plt`. Before we import `matplotlib`, we tell the Jupyter Notebook to show any graphs inside this Notebook and not in a separate window using the `%matplotlib inline` command (more on these commands later). 

[*This part of the course was adapted from the course by Mark Bakker*. See: https://mbakker7.github.io/exploratory_computing_with_python/ ]

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

Packages only have to be imported once in a Python session. After the above import statement, any plotting function may be called from any code cell as `plt.function`. For example

In [None]:
plt.plot([1, 2, 4, 2])

Let's try to plot $y$ vs $x$ for $x$ going from $-4$ to $+4$ for the polynomial
$y=ax^2+bx+c$ with $a=1$, $b=1$, $c=-6$.
To do that, we need to evaluate $y$ at a bunch of points. A sequence of values of the same type is called an array (for example an array of integers or floats). Array functionality is available in the package `numpy`. Let's import `numpy` and call it `np`, so that any function in the `numpy` package may be called as `np.function`. 

In [None]:
import numpy as np

To create an array `x` consisting of, for example, 5 equally spaced points between `-4` and `4`, use the `linspace` command

In [None]:
x = np.linspace(-4, 4, 5)
print(x)

In the above cell, `x` is an array of 5 floats (`-4.` is a float, `-4` is an integer).
If you type `np.linspace` and then an opening parenthesis like:

`np.linspace(` 

and then hit [shift-tab] a little help box pops up to explain the input arguments of the function. When you click on the + sign, you can scroll through all the documentation of the `linspace` function. Click on the x sign to remove the help box. Let's plot $y$ using 100 $x$ values from 
$-4$ to $+4$.

In [None]:
a = 1
b = 1
c = -6
x = np.linspace(-4, 4, 100)
y = a * x ** 2 + b * x + c  # Compute y for all x values
plt.plot(x, y)

Note that *one hundred* `y` values are computed in the single line `y = a * x ** 2 + b * x + c`. `numpy` arrays are treated in the same fashion as regular variables when you perform mathematical operations. The math is simply applied to every value in the array (and it runs much faster than doing every calculation separately).
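
To see that the array expression really does the same work as a value-by-value calculation, the sketch below evaluates the polynomial both ways for only 5 points. The names `y_array` and `y_loop` are chosen here for illustration.

```python
import numpy as np

a, b, c = 1, 1, -6
x = np.linspace(-4, 4, 5)

# Array arithmetic: one expression evaluates the polynomial for every x
y_array = a * x ** 2 + b * x + c

# The same computation, one value at a time
y_loop = [a * xi ** 2 + b * xi + c for xi in x]

print(y_array)
print(np.allclose(y_array, y_loop))  # True when both give the same values
```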

You may wonder what the output `[<matplotlib.lines.Line2D at 0x30990b0>]` is (the numbers on your machine may look different). This is actually a handle to the line that is created with the last command in the code block (in this case `plt.plot(x, y)`). Remember: the result of the last line in a code cell is printed to the screen, unless it is stored in a variable. You can tell the Notebook not to print this to the screen by putting a semicolon after the last command in the code block (so type `plt.plot(x, y);`). We will learn later on that it may also be useful to store this handle in a variable.

The `plot` function can take many arguments. Looking at the help box of the `plot` function, by typing `plt.plot(` and then shift-tab, gives you a lot of help. Typing `plt.plot?` gives a new scrollable subwindow at the bottom of the notebook, showing the documentation on `plot`. Click the x in the upper right hand corner to close the subwindow again.

In short, `plot` can be used with one argument as `plot(y)`, which plots `y` values along the vertical axis and enumerates the horizontal axis starting at 0. `plot(x, y)` plots `y` vs `x`, and `plot(x, y, formatstring)` plots `y` vs `x` using colors and markers defined in `formatstring`, which can be a lot of things. It can be used to define the color, for example `'b'` for blue, `'r'` for red, and `'g'` for green. Or it can be used to define the linetype `'-'` for line, `'--'` for dashed, `':'` for dots. Or you can define markers, for example `'o'` for circles and `'s'` for squares. You can even combine them: `'r--'` gives a red dashed line, while `'go'` gives green circular markers. 
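
As a small sketch of the format strings described above, the cell below draws the same made-up points twice: once as a red dashed line and once as green circles. The handles returned by `plt.plot` are stored in the variables `dashed` and `circles` (names chosen here), which also suppresses the printed handle.

```python
import matplotlib.pyplot as plt

# Made-up points, used only for illustration
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# plt.plot returns a list of line handles; the comma unpacks the single line
dashed, = plt.plot(x, y, 'r--')  # red dashed line
circles, = plt.plot(x, y, 'go')  # green circular markers, no connecting line
```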

If that isn't enough, `plot` takes a large number of keyword arguments. A keyword argument is an optional argument that may be added to a function. The syntax is `function(keyword1=value1, keyword2=value2)`, etc. For example, to plot a line with width 6 (the default is 1.5 in current versions of `matplotlib`), type

In [None]:
plt.plot([1, 2, 3], [2, 4, 3], linewidth=6);

Keyword arguments should come after regular arguments. `plot(linewidth=6, [1, 2, 3], [2, 4, 3])` gives an error.
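
Keyword arguments are a general Python feature, not something specific to `matplotlib`. The sketch below defines a hypothetical function `describe` (invented for this illustration) with one regular argument and two keyword arguments that have default values.

```python
def describe(value, unit='mm', digits=1):
    """Format a number with a unit; `unit` and `digits` are keyword arguments."""
    return f'{value:.{digits}f} {unit}'

print(describe(3.14159))                       # both defaults are used
print(describe(3.14159, digits=3))             # override one keyword
print(describe(3.14159, digits=2, unit='cm'))  # keywords may come in any order
```

The regular argument `value` always comes first; the keyword arguments may be left out or given in any order after it.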

Names may be added along the axes with the `xlabel` and `ylabel` functions, e.g., `plt.xlabel('this is the x-axis')`. Note that both functions take a string as argument. A title can be added to the figure with the `plt.title` command. Multiple curves can be added to the same figure by giving multiple plotting commands in the same code cell. They are automatically added to the same figure.
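
A minimal sketch of these commands, using made-up points: two curves are drawn in the same figure, after which axis labels and a title are added. The label and title texts are arbitrary.

```python
import matplotlib.pyplot as plt

# Made-up points, used only for illustration
x = [0, 1, 2, 3]

plt.plot(x, [0, 1, 2, 3], 'b')    # first curve
plt.plot(x, [0, 1, 4, 9], 'r--')  # second curve, added to the same figure
plt.xlabel('x')
plt.ylabel('y')
plt.title('two curves in one figure');
```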

### New figure and figure size

Whenever you give a plotting statement in a code cell, a figure with a default size is automatically created, and all subsequent plotting statements in the code cell are added to the same figure. If you want a different size of the figure, you can create a figure first with the desired figure size using the `plt.figure(figsize=(width, height))` syntax. Any subsequent plotting statement in the code cell is then added to the figure. You can even create a second figure (or third or fourth...).

In [None]:
plt.figure(figsize=(10, 3))
plt.plot([1, 2, 3], [2, 4, 3], linewidth=6)
plt.title('very wide figure')
plt.figure()  # new figure of default size
plt.plot([1, 2, 3], [1, 3, 1], 'r')
plt.title('second figure')

### Exercises Part 6: Basic Plotting

#### Exercise 6.1: Plot a function
Plot $y=(x+2)(x-1)(x-2)$ for $x$ going from $-3$ to $+3$ using a dashed red line. 

On the same figure, plot a blue circle for every point where $y$ equals zero. 


Label the axes as 'x-axis' and 'y-axis'. Add the title 'First nice Python figure of Your Name', where you enter your own name.

In [None]:
# [Your solution here]

#### Exercise 6.2: Simple scatter plot 
Provide a scatter plot of y as a function of x for the following data.

In [None]:
x = np.linspace(0, 10, 50)  # 50 linearly spaced numbers from 0 to 10
y = np.sin(x) + np.random.normal(0, 0.1, 50)  # y = sin(x) + some noise

# [Your solution here]

## 7. Error and Exception Handling
In Python, errors and exceptions are situations that disrupt the normal flow of a program's execution. Understanding how to handle these errors gracefully is crucial for building robust and user-friendly applications.

- Basics of error types in Python.
- Using try, except, finally, and else blocks to handle exceptions.


#### Basics of Error Types in Python
Errors in Python can be broadly classified into two categories:

- Syntax Errors: These occur when the parser detects incorrect syntax. They are often typos or incorrect use of Python's syntax rules.

Example:



In [None]:
print("Hello world"


This will result in a SyntaxError because the closing parenthesis is missing.

- Exceptions: These are errors detected during execution. Common exceptions include ValueError, TypeError, IndexError, etc.

Example:



In [None]:
numbers = [1, 2, 3]
print(numbers[3])


This will raise an IndexError because there is no item at index 3.
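
To get a feel for the common exception types named above, the sketch below triggers three of them on purpose and records which exception each statement raises. It already uses `try` and `except`, which are explained in the next subsection; the dictionary `examples` and the variable `raised` are names invented for this illustration.

```python
# Each entry pairs a label with a small function that raises an exception
examples = {
    'int("abc")': lambda: int('abc'),      # ValueError: right type, invalid value
    '"3" + 4': lambda: '3' + 4,            # TypeError: str and int cannot be added
    '[1, 2, 3][3]': lambda: [1, 2, 3][3],  # IndexError: index out of range
}

raised = {}
for label, action in examples.items():
    try:
        action()
    except Exception as error:
        raised[label] = type(error).__name__
        print(f'{label} raises {raised[label]}: {error}')
```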

#### Using try, except, finally, and else Blocks
To handle exceptions and errors, Python provides the try, except, finally, and else blocks.

##### try and except Blocks
Use these to catch and handle exceptions. You can specify the type of exception to catch or use a generic except to catch all exceptions.

Example:



In [None]:
try:
    # Code that may raise an exception
    result = 10 / 0
except ZeroDivisionError:
    # Code that runs if the exception occurs
    print("You can't divide by zero!")
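
A `try` block may be followed by several `except` clauses, one per exception type. The sketch below assumes a hypothetical helper function `lookup` (not part of Python) that handles two different problems when indexing a list.

```python
def lookup(items, index):
    """Return items[index], or None with a message if the lookup fails."""
    try:
        return items[index]
    except IndexError:
        print(f'There is no item at index {index}.')
    except TypeError as error:
        # `as error` gives access to the exception object and its message
        print(f'Invalid index: {error}')
    return None

numbers = [1, 2, 3]
print(lookup(numbers, 1))      # valid index
print(lookup(numbers, 3))      # handled by the IndexError clause
print(lookup(numbers, 'one'))  # handled by the TypeError clause
```

Catching specific exception types, rather than using a generic `except`, keeps unexpected errors visible instead of silently hiding them.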



##### else Block
This block runs if no exceptions were raised in the try block.

Example:



In [None]:
try:
    print("Trying to open the file...")
    file = open('file.txt', 'r')
except FileNotFoundError:
    print("File not found.")
else:
    print("File opened successfully.")
    file.close()


##### finally Block
Code within the finally block runs regardless of whether an exception was raised or not. It's typically used for clean-up actions.

Example:



In [None]:
file = None  # so that 'file' is defined even if open() fails
try:
    file = open('file.txt', 'r')
except FileNotFoundError:
    print("File not found.")
finally:
    print("This will run regardless of previous blocks.")
    # It's a good place to close resources
    if file is not None:
        file.close()
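
The four blocks can also be combined in one statement. The sketch below uses a hypothetical function `parse_age` (invented for this illustration) that records which blocks run, first for valid input and then for invalid input.

```python
def parse_age(text):
    """Convert text to an int and record which blocks ran (illustration only)."""
    steps = ['try']
    try:
        age = int(text)
    except ValueError:
        steps.append('except')   # runs only if int() fails
        age = None
    else:
        steps.append('else')     # runs only if no exception was raised
    finally:
        steps.append('finally')  # always runs
    return age, steps

print(parse_age('30'))
print(parse_age('thirty'))
```

Note that either `except` or `else` runs, never both, while `finally` runs in both cases.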


### Exercise Part 7: Errors and exceptions
- Write a function that takes two numbers and divides the first by the second.
- Use try, except, else, and finally blocks to handle possible exceptions such as division by zero or invalid inputs. 
- Ensure that all input is properly validated and that any error messages are user-friendly.


In [None]:
def division(a, b):
    # [Your code here]
    pass  # placeholder so that the cell runs; replace with your solution
