# Session 2 - Introduction to Python

## Summary

- data types
- variables
- functions
- open and read files
- import packages


## Data types

Data types tell the computer what a piece of data is and what can be done with it.

You do not do the same things with words as you do with numbers.

In [1]:
# compare
"20"+"20"

'2020'

In [2]:
20+20

40

In [4]:
# find the type of something
type("20")

str

**String (str)**:
A sequence of characters (letters, punctuation, whitespace...)

**Integer (int) and Float (float)**:  
correspond to whole numbers (42) and decimal numbers (42.5)

**Boolean (bool)**: 
either `True` or `False`
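
The four types above can be checked with `type()`. This is only a small sketch: the variable names `text_sum` and `number_sum` are made up for illustration.

```python
# type() reports the data type of each value
print(type("42"))    # a string
print(type(42))      # an integer
print(type(42.5))    # a float
print(type(True))    # a boolean

# the type decides what the + operator does
text_sum = "20" + "20"    # strings are glued together
number_sum = 20 + 20      # numbers are added
print(text_sum, number_sum)
```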

## Variables
Variables give a name to a piece of data (a number, a string, etc.).


In [3]:
# Here is how you assign a value to a variable
# variable_name = variable_value
a = 3.14159
b = "hello world!" # you can also use single quotes: 'hello'
c = b



- Variables are used to store information to be accessed and manipulated in a computer program.
- You can think of it as a **label** pointing at something stored in memory.
- The value of a variable can be changed.
- You need to assign a value to a variable before you can use it.

In [7]:
# 1. assign a value to variable x
x = 2+3

# 2. you can reuse and modify your variable
y = x*2
x = "+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++"

# check the value of x and y
print(x)
print(y)

+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++
10


## Variables - Names

- The name of a variable can **contain** letters, numbers and underscore characters `_`
- The name of a variable **starts** with a letter or `_`
- It is case sensitive (`Test` and `test` are different variables)
- Use meaningful variable names (not `a` or `b`, but better `wordCount` or `word_count`)

## Errors: Don't Panic!

In [None]:
a = "The answer is "
b = 42
a+b

<img src="../ancillary/python-error-1.png"  style="display:block;margin-right:auto;margin-left:auto;">

Python gives you the information that you need to correct the error:
1. The sort of error that was made: there is a problem with a data type.
2. The error message: something should be a string, but instead Python got an integer.
3. The line of code where the error happened: it is highlighted with an arrow.

**Conclusion**: the variable `b` should be converted to a string if we want to eliminate the error at line 3 (`a+b`).
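
One possible fix, assuming `b` holds the integer 42 as the error message describes. The variable `answer` is introduced only for this sketch.

```python
a = "The answer is "
b = 42    # an integer: a + b would raise a TypeError

# convert b to a string before gluing the two pieces together
answer = a + str(b)
print(answer)
```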

## Functions
This is how you give orders to the computer!

In [None]:
# assign your name to the variable
name = ""

# tell the computer to say hi
print("Hi "+name+"!")

## Functions - Examples

- general: `print()`, `help()`
- data types: `type()`, `int()`, `str()`
- counting the length: `len()`

What goes inside the parentheses is called the `argument`. Arguments are information passed to the function so that it can do its job! Arguments can be optional or mandatory.

- `print()`: displays a variable
- `help()`: shows help about Python. If you give a variable name or a function name, you will receive more information about the variable's data type, or how you can use the function (e.g. `help(a)` or `help(len)`).

- `type()`: gives you the type of a variable
- `int()`: converts a variable into an integer
- `str()`: converts a variable into a string

- `len()`: gives you the length of a variable (for a string, the number of characters it contains)
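
Here is a small sketch of these functions working together. The variable names are chosen for the example only.

```python
year_as_text = "2020"

# int() turns the string into a number we can calculate with
year = int(year_as_text)
next_year = year + 1

# str() turns the number back into a string
next_year_as_text = str(next_year)

print(type(year_as_text), type(year), type(next_year_as_text))
print(next_year_as_text)

# len() counts the characters of a string
print(len("hello world!"))
```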

## Functions - Exercises

The lines starting with # are comments: they are ignored by Python. The comments give you the instructions for each exercise.

Complete each cell by adding code below the comment, and then run the cell (click on 'Run' or 'Exécuter').

In [13]:
# create a variable that is a string


In [14]:
# create a new variable that transforms the first one into a number


In [None]:
# find the length of both variables - what happens? Why?


In [None]:
# ask the computer for help about the second variable!


Functions usually send you back some information that can be stored in a variable, for later use:

In [23]:
a = "a short string"
b = len(a)
print(b)

14


## Methods
Methods are functions that apply only to a certain data type:
`str.upper()`, `str.find(sub[, start[, end]])`

These methods apply to the data type *string* (str). What you see in the parentheses are the arguments: `str.upper` takes no argument, but `str.find` needs at least one argument (*sub*) and has two optional arguments (*start* and *end*).

Here is a list of methods available for strings: <https://docs.python.org/3/library/stdtypes.html#string-methods>
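
A minimal sketch of the two methods named above. The string `quote` and the other variable names are invented for the example.

```python
quote = "don't panic! don't panic!"

# upper() takes no argument
shouted = quote.upper()
print(shouted)

# find() returns the position of the first match (counting from 0)
first = quote.find("panic")

# the optional start argument tells find() where to begin searching
second = quote.find("panic", first + 1)

# find() returns -1 when the substring is not there
missing = quote.find("towel")

print(first, second, missing)
```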

## Examples - Working with Strings

In [17]:
message = "Le petit chien est sur la pente fatale!!!"

# replace characters
a = message.replace("petit chien", "grand chat")

# count occurrences of a substring
b = message.count("!")

# split a string at each space character
c = message.split(" ")

# join strings together
d = "-".join(c)

# print a variable to see the result

## Read and Write files

One of the most frequent tasks that programmers do is reading data from files and writing some of the output to other files.

Files are located most often on your own computer. You access a file with its path:

In [23]:
# full path on my own computer
path_absolute = "D:/Documents/academia/collation_workshops/data/discworld.txt"

# path relative to where I am now, in this notebook
path_relative = "../data/Pratchett/discworld.txt"

This is how to read a file:
- `with` statement
- `open()` function, for which the first argument is a path to a file
- `as` assigns a label (variable name) to the file
- `read()` method, which returns the content of the file as a string

In [18]:
with open("../data/Pratchett/discworld.txt", 'r') as file:
    text = file.read()
    
# once we are out of the with statement, the file is closed again
# but we can still access the variable we created
text

"'I somehow feel I need to ask, Mister Stibbons...what chance is there of this just blowin' up and destroyin' the entire university?'\nPonder's heart sank. He mentally scanned the sentence, and took refuge in the truth. 'None, sir.'\n'Now try honesty, Mister Stibbons.'\n[...]\n'Well...in the unlikely event of it going seriously wrong, it... wouldn't just blow up the university, sir.'\n'What would it blow up, pray?'\n'Er... everything, sir.'\n'Everything there is, you mean?'\n'Within a radius of about fifty thousand miles out into space, sir, yes. According to Hex, it'd happen instantanously. We wouldn't even know about it.'\n'And the odds of this are...?'\n'About fifty to one, sir.'\nThe wizards relaxed.\n'That's pretty safe. I wouldn't bet on a horse at those odds,' said the Senior Wrangler."

In [None]:
# exercise:
# open another .txt file from the data folder and check its contents

# bonus exercises:
# try to split the text into lines with the method split()

# count the number of times the letter 'i' appears in the text

## Writing Files

Writing to a file is similar to reading: use the `open()` function with `'w'` instead of `'r'`.

If the path leads to an existing file, the content of the file will be replaced.

If the file you are writing to does not exist, a new file will be created.

In [19]:
with open('test.txt', 'w') as file:
    file.write("This is a test!")

In [None]:
# we check that it worked as planned, by reading the content of our new file
with open('test.txt', 'r') as file:
    text = file.read()
text


In [None]:
# with r+, you can both read a file and then add more text at the end
with open('test.txt', 'r+') as file:
    text = file.read()
    print(text) 
    file.write(" I am adding more words to this test file.")

In [None]:
# we check that it worked as planned
with open('test.txt', 'r') as file:
    text = file.read()
text
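
The replacing behaviour of `'w'` can be checked safely in a temporary folder. This is a sketch under assumptions: the file name `demo.txt` is hypothetical, the `tempfile` module is used only so that no real file is touched, and the `'a'` (append) mode is an addition that is not covered above.

```python
import os
import tempfile

# work in a temporary folder so that no real file is overwritten
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.txt")  # hypothetical file name

    # the file does not exist yet: 'w' creates it
    with open(demo_path, "w") as file:
        file.write("first version")

    # the file exists now: 'w' replaces its content
    with open(demo_path, "w") as file:
        file.write("second version")

    # 'a' keeps the content and adds the new text at the end
    with open(demo_path, "a") as file:
        file.write(" plus more")

    with open(demo_path, "r") as file:
        result = file.read()

print(result)
```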

## Read Files

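Now that you know the proper way to do things, we can use a shortcut for reading files! Note that this one-liner does not close the file explicitly, so the `with` statement remains the safer choice in longer programs.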

It is also recommended to add an `encoding` argument to the `open()` function.
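
The `encoding` argument also works together with the `with` statement. A minimal sketch, assuming a hypothetical file `accents.txt` created in a temporary folder so that the example is self-contained:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "accents.txt")  # hypothetical file name

    # write some accented characters, stating the encoding explicitly
    with open(demo_path, "w", encoding="utf-8") as file:
        file.write("Exécuter: naïve café")

    # read them back with the same encoding
    with open(demo_path, "r", encoding="utf-8") as file:
        accented_text = file.read()

print(accented_text)
```

Using the same encoding for writing and reading avoids garbled accents when a file moves between computers with different default settings.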

In [33]:
text = open('../data/Pratchett/discworld.txt', 'r', encoding='utf-8').read()
text

"'I somehow feel I need to ask, Mister Stibbons...what chance is there of this just blowin' up and destroyin' the entire university?'\nPonder's heart sank. He mentally scanned the sentence, and took refuge in the truth. 'None, sir.'\n'Now try honesty, Mister Stibbons.'\n[...]\n'Well...in the unlikely event of it going seriously wrong, it... wouldn't just blow up the university, sir.'\n'What would it blow up, pray?'\n'Er... everything, sir.'\n'Everything there is, you mean?'\n'Within a radius of about fifty thousand miles out into space, sir, yes. According to Hex, it'd happen instantanously. We wouldn't even know about it.'\n'And the odds of this are...?'\n'About fifty to one, sir.'\nThe wizards relaxed.\n'That's pretty safe. I wouldn't bet on a horse at those odds,' said the Senior Wrangler."

## Import Modules
Modules contain functions you can reuse in your code. You access the functions by importing the module with the keyword `import`. It is best to import modules at the beginning of your code.

For instance, the `os` [module](https://docs.python.org/3/library/os.path.html) can be helpful when working with files and their paths on different operating systems (Mac, Windows, Linux). The `re` [module](https://docs.python.org/3/library/re.html) is useful for searching and transforming strings with regular expressions.

In [34]:
import re # module for working with regular expressions
from os import path # import a part of the os module 

# Example: find the file name from a path
filename = path.basename(path_absolute)

print(path_absolute)
print(filename)

D:/Documents/academia/collation_workshops/data/discworld.txt
discworld.txt
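
A few more calls from the same two modules, as a sketch. The variable names and the sample sentence are chosen for illustration.

```python
import re
from os import path

path_relative = "../data/Pratchett/discworld.txt"

# split a path into the folder part and the file name
folder, filename = path.split(path_relative)

# split a file name into its name and its extension
name, extension = path.splitext(filename)
print(folder, name, extension)

# re.findall() returns every match of a pattern as a list;
# the pattern \w+ matches runs of letters, digits and underscores
sentence = "'None, sir.' 'Now try honesty, Mister Stibbons.'"
words = re.findall(r"\w+", sentence)
print(words)
```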


## More Data Types - List
The items or elements of lists are **ordered** in a defined sequence. A string behaves in a similar way: it is an ordered sequence of characters.
The elements of a list can be accessed via a number that indicates their position inside the list (the **index**).

In [35]:
cities = ["Vienna", "London", "Paris", "Berlin", "Zurich"] # square brackets
world = "world"

# the index starts at 0
print(world[0])
print(cities[1])

w
London


In [36]:
# add item
cities.append("Lausanne")

# remove item
cities.remove("Zurich")

cities

['Vienna', 'London', 'Paris', 'Berlin', 'Lausanne']

In [37]:
# you can have a list of lists
lists = [[1, 2, 3],
        ["Rincewind", "Ridcully", "Hex"],
        [42, "don't panic!"]]
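
Elements of a nested list are reached by chaining indexes. A short sketch that reuses the list above; the name `wizards` is introduced for the example.

```python
lists = [[1, 2, 3],
         ["Rincewind", "Ridcully", "Hex"],
         [42, "don't panic!"]]

# the first index picks an inner list, the second an item inside it
wizards = lists[1]
print(wizards[0])
print(lists[1][2])

# a negative index counts from the end
print(lists[-1])

# len() gives the number of items; a slice gives a part of the list
print(len(lists))
print(wizards[0:2])
```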

## More Data Types - Dictionary
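A dictionary is a collection of key/value pairs. Values are looked up by their key, not by their position (since Python 3.7, a dictionary also remembers the order in which the pairs were added).

The **key** is often a string, but it can be any immutable value, such as a number.

The **value** can be anything!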


In [38]:
# a dictionary is in curly brackets
book = {"title": "Good Omens",
        "author": ["Terry Pratchett", "Neil Gaiman"],
        "year": 1990
       }
# you can access a value thanks to the key
book["author"]

['Terry Pratchett', 'Neil Gaiman']

In [39]:
# add a key/value pair
book["publisher"] = "Gollancz"

# remove a key/value pair
book.pop("year")

book

{'title': 'Good Omens',
 'author': ['Terry Pratchett', 'Neil Gaiman'],
 'publisher': 'Gollancz'}
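
Some other common operations on a dictionary, as a sketch. The variable names `has_year`, `year` and `keys` are invented for the example.

```python
book = {"title": "Good Omens",
        "author": ["Terry Pratchett", "Neil Gaiman"],
        "publisher": "Gollancz"}

# check whether a key exists before using it
has_year = "year" in book
print(has_year)

# get() returns a default value instead of an error for a missing key
year = book.get("year", "unknown")
print(year)

# loop over all key/value pairs
for key, value in book.items():
    print(key, "->", value)

# list all the keys
keys = list(book.keys())
```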

## Additional materials

- Python course: [data types and variables](https://www.python-course.eu/python3_variables.php)
- The Python Tutorial: [read and write files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files)
- Real Python: [understanding error messages](https://realpython.com/python-traceback/#what-are-some-common-tracebacks-in-python)
- W3Schools Python tutorial: [objects and classes](https://www.w3schools.com/python/python_classes.asp)

## Additional materials - Objects and Classes

Python is an object-oriented programming language. Almost everything in Python is an object, with its properties and methods, e.g. strings, integers, lists, etc.

We have seen that **1 variable = 1 data type** (more or less: there are some more complex data types like lists and dictionaries).

But sometimes a single data type cannot describe something properly. **Classes** let you create your own objects and write methods for your objects.

For example, you can imagine a Book object that has 3 or more properties: title, author, date...

Imagine that you have a lot of books. Now you can do different things:

- order them alphabetically by titles
- ask how many books were written by author X
- ask which book was published first


In [None]:
# example of a Book class definition

class Book:
    # when Book() is called, it creates a Book object (an "instance")
    def __init__(self, t, a, y):
        self.title = t # set title
        self.author = a # set author
        self.year = y # set year
    
# Book() needs three arguments, in the proper order:
# 1. title, 2. author, 3. year of publication
b = Book("Alice's adventures in Wonderland", "Lewis Carroll", 1865)

# check the author of the book you just created
b.author
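
The three tasks listed earlier (ordering by title, counting the books of an author, finding the earliest book) could be sketched as follows. This is one possible approach, not the only one: the helper functions `get_title` and `get_year` and the sample list `books` are assumptions made for this example.

```python
class Book:
    def __init__(self, title, author, year):
        self.title = title
        self.author = author
        self.year = year

books = [
    Book("Through the Looking-Glass", "Lewis Carroll", 1871),
    Book("The Colour of Magic", "Terry Pratchett", 1983),
    Book("Alice's adventures in Wonderland", "Lewis Carroll", 1865),
]

# helper functions tell sorted() and min() which property to compare
def get_title(book):
    return book.title

def get_year(book):
    return book.year

# 1. order the books alphabetically by title
titles = [book.title for book in sorted(books, key=get_title)]
print(titles)

# 2. count how many books were written by a given author
carroll_count = 0
for book in books:
    if book.author == "Lewis Carroll":
        carroll_count = carroll_count + 1
print(carroll_count)

# 3. find the book that was published first
earliest = min(books, key=get_year)
print(earliest.title, earliest.year)
```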

**Why does this matter?**  
We are not going to create objects ourselves, but it can be useful to understand this concept because that is how the CollateX module is organized.

**Exercise:**  
Can you imagine what properties a "Collation" class should have? What other classes would we need?

**If you are very bored...** Check the CollateX source code for its [core classes](https://github.com/interedition/collatex/blob/master/collatex-pythonport/collatex/core_classes.py) and see how the Collation object has been implemented. You can also check the [core functions](https://github.com/interedition/collatex/blob/master/collatex-pythonport/collatex/core_functions.py).