Author : Nicolas Rousset  
Github : https://github.com/Aenori  
Mail : nrousset@gmail.com  
License : MIT  

# Python basics

<a id='variable'></a>
## Variable

Variables are one of the building blocks of any programming language: they store values that can change.  
You don't specify the type of a variable in Python, and a variable can even change type! *That is the kind of thing you should not do*; there is no sensible reason to do that.

In [None]:
# int
a = 4
# float
b = 4.0
# string
c = "Hello world"

You can get the type of a variable by simply using the function <code>type()</code>:

In [None]:
type(a)

NB: In this case, the type is displayed because you are in a Jupyter notebook, which is an interactive environment, and it is the last expression of the cell. Otherwise, you should use <code>print</code> to show the value of a variable.
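
As a small illustration of the two points above, the sketch below uses <code>print</code> to show types explicitly, and rebinds a variable to a value of another type. The variable names are only there for the example.

```python
# A variable can be rebound to a value of another type (legal, but best avoided)
value = 4
type_before = type(value)
print(type_before)   # <class 'int'>

value = "four"
type_after = type(value)
print(type_after)    # <class 'str'>
```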

<a id='string_variable'></a>
### String variable

You can use three different delimiters for Python strings: <code>"</code>, <code>'</code> or <code>"""</code>.  
All three create the same kind of string. The only difference is that when you use one delimiter, you can safely use the others inside the string.

Also, the last one, <code>"""</code>, can span multiple lines. It is often used for documentation purposes.
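
A quick way to convince yourself that the delimiters only change how the string is written, not what it is, is to compare the results. This is a minimal sketch with made-up variable names.

```python
double_quoted = "hello"
single_quoted = 'hello'
triple_quoted = """hello"""

# All three delimiters produce the same kind of object
print(double_quoted == single_quoted == triple_quoted)  # True
print(type(triple_quoted))                              # <class 'str'>

# A triple-quoted string keeps its line breaks as "\n" characters
two_lines = """first line
second line"""
print(two_lines.split("\n"))  # ['first line', 'second line']
```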

In [None]:
a = "A string with ' inside"
b = 'A string with " inside'
c = """A string
that spans multiple lines
"""

<a id='f_string'></a>
### F-String

Let's start with a nice feature of Python 3 that will be used throughout this notebook: f-strings.  
F-strings are a feature of recent versions of Python (3.6 and later) that lets you combine strings and variables. The syntax is:  

<code>f"show variable {var}"</code>

So you just put an **f** before the opening quote to be able to reference any variable inside the string with curly braces.

In [None]:
a = 25 + 12
print(f"The result of 25 + 12 is {a}")
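
The braces are not limited to variable names. The sketch below, with made-up values, shows an expression inside the braces and an optional format specifier to control how a number is displayed.

```python
price = 12.5
quantity = 3

# Any expression can go between the braces, not just a variable name
print(f"Total: {price * quantity}")   # Total: 37.5

# An optional format specifier after ':' controls the display (here, 2 decimals)
total_message = f"Total: {price * quantity:.2f} euros"
print(total_message)                  # Total: 37.50 euros
```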

<a id='syntax'></a>
## Syntax and indentation

Let's start with something very specific to Python: indentation. Indentation has a syntactic meaning in Python: it defines code blocks. The basic idea is that code should be well indented, and that a good way to ensure this is to make sure badly indented code doesn't work...

For example, these two functions are different:

In [None]:
def f1(i):
    if i % 2 == 0:
        print(f"{i} is even")
    else:
        print(f"{i} is odd")
        print(i)
        
def f2(i):
    if i % 2 == 0:
        print(f"{i} is even")
    else:
        print(f"{i} is odd")
    print(i)

The first function only prints i if the number is odd, while the second one always prints it:

In [None]:
f1(10)
f1(11)
print("=" * 6)
f2(12)
f2(13)

The syntactic symbol <code>:</code> is strongly linked to indentation; you will find it after each instruction that requires a code block:  

    if a == 3:
        pass
    for a in l:
        pass
    def my_function(i):
        pass

### One of the strange keywords

<code>pass</code> is a strange keyword. It means "do nothing". What is the use of this keyword? Python expects an indented block after every instruction that requires one, so if you don't want to provide one yet, you have to use the <code>pass</code> keyword.
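
A minimal sketch of where <code>pass</code> typically shows up; the function and class names are placeholders invented for the example.

```python
def not_implemented_yet():
    pass  # placeholder body: the function exists but does nothing

class EmptyClass:
    pass  # a class also needs an indented block

for i in range(3):
    pass  # the loop runs three times and does nothing each time

result = not_implemented_yet()
print(result)  # None: a function without a return statement returns None
```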

<a id='functions'></a>
## Defining functions

Functions in Python are defined using the keyword <code>def</code>, which is followed by the name of the function and its arguments:

In [None]:
def custom_sum(a, b):
    return a + b

You don't specify any types, so the same function can be used with different types:

In [None]:
custom_sum(1, 3)

In [None]:
custom_sum("1", "3")
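
The flip side is that nothing is checked before the code runs. As a hedged illustration, the sketch below redefines the same <code>custom_sum</code> so that it is self-contained, then calls it with lists and with a mix of types that <code>+</code> cannot combine.

```python
def custom_sum(a, b):
    return a + b

print(custom_sum([1, 2], [3]))   # [1, 2, 3]: + concatenates lists

try:
    custom_sum(1, "3")           # int + str is not defined
except TypeError as error:
    # The problem is only detected when the line is executed
    error_message = str(error)
    print(f"TypeError: {error_message}")
```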

<code>return</code> is the keyword that defines the result of the function. The function stops, exits, and its result is returned.

Be careful: once again, Python won't warn you if you write code after the return, even though it will never be called.

In [None]:
def custom_sum(a, b):
    return a + b
    print(f"The result is {a + b}") # This line will never be executed,
    # and would produce an error in some compiled languages
    
custom_sum(1, 2)

## Data structures

Data structures are a main tool in programming, as they are used to store and process data. 

Disclaimer: data analysis in Python uses some specific data structures, such as pandas DataFrames, NumPy arrays and others.

The four main data structures in Python are:

In [None]:
# List
a_list = [1, 2, 3]

# Tuple, which is an immutable list
a_tuple = (1, 2, 3)

# Set
a_set = {1, 2, 3}
# or
a_set_2 = set([1, 2, 3])

# Dict
a_dict = {1: 2, 3: 4}

Lists are ordered structures (from the first element to the last), but they don't have any kind of search structure attached, so finding an element by its value is slow: Python has to check the elements one by one.

Sets and dicts have a search structure (a hash table), so looking up an element or a key is fast.

### Examples

In [None]:
a_list = ['a', 'b', 'c']
print(a_list[1]) # fast
print('b' in a_list) # slow

a_set = {'a', 'b', 'c'}
try:
    # This will raise a TypeError
    a_set[1]
except TypeError:
    print("Sorry, you can't access set elements by index")
print('b' in a_set)  # fast

In [None]:
a_dict = {'a' : 2, 'b' : 3, 'c' : 5}
print(a_dict['c'])
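
A few more everyday operations on dicts and sets, as a minimal sketch with made-up data. Note that a membership test on a dict looks at the keys, not the values.

```python
a_dict = {'a': 2, 'b': 3, 'c': 5}

# Membership tests look at the keys, and use the fast search structure
print('b' in a_dict)   # True
print(3 in a_dict)     # False: 3 is a value, not a key

# Adding or updating an entry
a_dict['d'] = 7

# Looping over keys and values together
for key, value in a_dict.items():
    print(f"{key} -> {value}")

# A set silently drops duplicates
unique_letters = set(['a', 'b', 'a', 'c', 'b'])
print(len(unique_letters))  # 3
```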

Don't hesitate to ask if you have questions here. There is a lot to say about data structures, but since, once again, you won't use them to store your data, I don't want to dwell on them too much.

<a id="slice"></a>
### List and tuple slices

One specific piece of Python syntax for lists and tuples is the ability to extract a slice. The syntax is close to <code>range</code>: it goes from a lower bound (included) to an upper bound (excluded).

In [None]:
a_list = ['a', 'b', 'c', 'd', 'e']
print(a_list[2:4]) # print ['c', 'd']
print(a_list[:4]) # print ['a', 'b', 'c', 'd']
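
A few more slicing variations, as a short sketch: omitting the upper bound, counting from the end with a negative index, and slicing a tuple. It also shows that a slice is a new object, so modifying it leaves the original untouched.

```python
a_list = ['a', 'b', 'c', 'd', 'e']
a_tuple = ('a', 'b', 'c', 'd', 'e')

print(a_list[2:])     # ['c', 'd', 'e'] (no upper bound: up to the end)
print(a_list[-2:])    # ['d', 'e'] (negative index: counted from the end)
print(a_tuple[1:3])   # ('b', 'c') (slicing a tuple gives a tuple)

# A slice is a new list; the original is unchanged
first_two = a_list[:2]
first_two.append('z')
print(a_list)         # ['a', 'b', 'c', 'd', 'e']
```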

### list.append

One specific feature of lists is that you can modify them. The main method for that is <code>append</code>, which adds an element to the end of the list:

In [None]:
# Case 1: + creates a new list and leaves the original unchanged
list_1 = [1, 2, 3]
list_2 = list_1 + [4]
print(list_1) # shows [1, 2, 3]
print(list_2) # shows [1, 2, 3, 4]

# Case 2: append modifies the list in place
list_3 = [1, 2, 3]
list_4 = list_3.append(4)
print(list_3) # shows [1, 2, 3, 4] => list_3 has been modified
print(list_4) # shows None => append returns nothing
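
In practice, <code>append</code> is mostly used to build a list step by step inside a loop. A minimal sketch, with a made-up list name:

```python
squares = []
for n in range(5):
    squares.append(n * n)   # modifies squares in place, one element at a time

print(squares)              # [0, 1, 4, 9, 16]
print(len(squares))         # 5
```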

<a id='packing'></a>
## Packing and unpacking

One nice feature of Python (compared to Java and C++) is how easy it is to pack and unpack variables, for example if you want a function to return 2 values.

In [None]:
def f():
    return 1, 2

# We can get the result in one variable, which will be a tuple
x = f()
print("*" * 6)
print(type(x))
print(x)

# Or we can directly unpack the values
x, y = f()
print("*" * 6)
print(type(x))
print(x)
print(type(y))
print(y)
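
Unpacking also works outside of function calls. The sketch below uses a hypothetical helper, <code>min_and_max</code>, to show a function returning two values, a swap without a temporary variable, and unpacking inside a <code>for</code> loop.

```python
def min_and_max(values):
    # Packing: the two values are returned as a single tuple
    return min(values), max(values)

lowest, highest = min_and_max([4, 8, 15, 16, 23, 42])
print(lowest, highest)   # 4 42

# Swapping two variables without a temporary one
lowest, highest = highest, lowest
print(lowest, highest)   # 42 4

# Unpacking inside a for loop
for letter, number in [('a', 1), ('b', 2)]:
    print(f"{letter} = {number}")
```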

<a id='loop'></a>
## Loop and control flow

The three main instructions for control flow and loops are:  
- <code>if</code>  
- <code>for</code>  
- <code>while</code>  

Among the three, <code>while</code> is not used a lot. It is also worth mentioning that <code>for</code> corresponds to <code>foreach</code> in other languages: it iterates directly over the elements of a collection.

In [None]:
a_list = ['a', 'b', 'c', 'd']
for elt in a_list:
    print(elt)

For this reason, <code>for</code> is often used in conjunction with <code>range</code> when you need to loop over numbers. <code>range</code> takes up to 3 arguments and produces a sequence of integers that depends on the number of arguments.

In [None]:
print("One argument")
for it in range(3): # one argument => upper bound (excluded)
    print(it)
print("Two arguments")
for it in range(2, 7): # two arguments => lower bound (included), upper bound (excluded)
    print(it)
print("Three arguments")
# Three arguments are not used a lot, except for reverse looping
for it in range(2, 7, 2): # three arguments => lower bound (included), upper bound (excluded) and step
    print(it)
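
The reverse looping case mentioned in the comment above can be sketched as follows, together with the equivalent <code>while</code> loop for comparison. The built-in <code>enumerate</code> is shown as well, since it often removes the need for <code>range</code> when walking through a list.

```python
# Reverse looping: start at 4, stop before -1, step of -1
countdown = []
for it in range(4, -1, -1):
    countdown.append(it)
print(countdown)   # [4, 3, 2, 1, 0]

# The same loop written with while needs manual bookkeeping
it = 4
while it >= 0:
    print(it)
    it -= 1

# enumerate gives the index and the element at the same time
for index, elt in enumerate(['a', 'b', 'c']):
    print(index, elt)
```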

### Conditional structure: if / elif / else

You can write conditions using the classic if / elif / else:

In [None]:
x = 10

if x > 100:
    print("x is bigger than 100")
elif x < 0:
    print("x is smaller than 0")
else:
    print("x is between 0 and 100")

You can have as many <code>elif</code> branches as you want.
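
As a sketch with several <code>elif</code> branches, here is a hypothetical function (the name and the thresholds are arbitrary). The branches are tested from top to bottom and only the first one that matches is executed.

```python
def describe_temperature(celsius):
    # Branches are tested from top to bottom; only the first match runs
    if celsius < 0:
        return "freezing"
    elif celsius < 15:
        return "cold"
    elif celsius < 25:
        return "mild"
    else:
        return "hot"

for value in [-5, 10, 20, 30]:
    print(value, describe_temperature(value))
```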

<a id='class'></a>
## Class in python

### Class syntax in python

Classes define objects that combine data (attributes) and functions (methods).  
In Python, the two specificities to note are the use of <code>self</code> inside the code to refer to the object itself (it is a naming convention for the first argument of every method, not a keyword), and the naming convention for special methods: 

    __xxx__ 
    
As mentioned before, classes are not used a lot in Python, especially in data analysis, so if you don't get this part, it doesn't matter. 

In [None]:
class Animal:
    # Defining the constructor, this method is called when you create an object  
    def __init__(self, name):
        self.name = name
        
    def __str__(self):
        return f"I am an animal : {self.name}"

    def speak(self):
        print("I don't know how to")
    
# A class can inherit another class
class Dog(Animal):
    def __init__(self, name, breed):
        # Calling the constructor of the parent class
        super().__init__(name)
        self.breed = breed
        
    def speak(self):
        print("Wouaf wouaf")

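# Named max_the_dog rather than max, to avoid hiding the built-in function max
max_the_dog = Dog("Max", "Labrador")
# Will call Animal.__str__
print(max_the_dog)
# Will call Dog.speak()
max_the_dog.speak()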
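
To make inheritance a bit more concrete, here is a self-contained sketch with a hypothetical <code>Cat</code> class that does not define its own constructor. Unlike the cell above, <code>speak</code> returns its text instead of printing it, only to keep the example easy to check.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"I am an animal : {self.name}"

    def speak(self):
        return "I don't know how to"

class Cat(Animal):
    # No __init__ here: Animal.__init__ is inherited as is
    def speak(self):
        return "Miaou"

felix = Cat("Felix")
print(felix.name)                 # Felix
print(str(felix))                 # I am an animal : Felix
print(felix.speak())              # Miaou
print(isinstance(felix, Animal))  # True: a Cat is also an Animal

generic = Animal("Generic")
print(generic.speak())            # I don't know how to
```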

### Limits of classes: class vs pandas DataFrame

As mentioned, Python allows OOP (object-oriented programming), which basically means defining classes.  
Nevertheless, it is only a shadow of what it is in Java or C++.  
The reason is that the real power of OOP is to define the properties and capabilities of an object, and to make sure at compilation time that everything in the code is consistent. The strong point of OOP is really a specification checked by the compiler.

As you don't have this check in Python, OOP is still powerful for some generic behaviour, but overall, it is not a prominent paradigm in Python.

Also, regarding data analysis, OOP is pure Python, so it fits badly with the Python data analysis libraries.

Let's compare using an example based on the Titanic dataset:

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/Aenori/20221024_public/main/dataset/titanic_train.csv'
df = pd.read_csv(url, index_col=0)
print(df.head(5))

Each row of the dataset describes a passenger with several attributes:

In [None]:
df.columns

The matching pure Python code for a class would be:

In [None]:
class Passenger:
    def __init__(self, passenger_id, survived, pclass, name, sex, age, sibsp, parch, ticket, fare, cabin, embarked):
        self.passenger_id = passenger_id
        self.survived = survived
        self.pclass = pclass
        self.name = name
        self.sex = sex
        self.age = age
        self.sibsp = sibsp
        self.parch = parch
        self.ticket = ticket
        self.fare = fare
        self.cabin = cabin
        self.embarked = embarked
        
    def __str__(self):
        return f"Passenger {self.name}, age : {self.age}"

In [None]:
passenger_as_python_class = []

for passenger in df.itertuples():
    passenger_as_python_class.append(
        Passenger(passenger.Index, passenger.Survived, passenger.Pclass, passenger.Name, 
                  passenger.Sex, passenger.Age, passenger.SibSp, passenger.Parch, passenger.Ticket, 
                  passenger.Fare, passenger.Cabin, passenger.Embarked)
    )

In [None]:
len(passenger_as_python_class)

In [None]:
print(passenger_as_python_class[0])
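
To make the comparison concrete, here is a hedged sketch of a simple question, "what is the average age of the survivors?", answered in pure Python. The <code>MiniPassenger</code> class and its four records are made up for illustration and are not taken from the dataset. With the DataFrame loaded above, the same question would typically be a one-liner such as <code>df[df["Survived"] == 1]["Age"].mean()</code>, which also skips missing ages automatically.

```python
class MiniPassenger:
    # Hypothetical, reduced version of Passenger used only for this sketch
    def __init__(self, name, age, survived):
        self.name = name
        self.age = age
        self.survived = survived

# Made-up records; None stands for a missing age
passengers = [
    MiniPassenger("Passenger A", 30.0, 1),
    MiniPassenger("Passenger B", None, 1),
    MiniPassenger("Passenger C", 40.0, 1),
    MiniPassenger("Passenger D", 50.0, 0),
]

# In pure Python, filtering and missing values must be handled by hand
survivor_ages = []
for p in passengers:
    if p.survived == 1 and p.age is not None:
        survivor_ages.append(p.age)

average_age = sum(survivor_ages) / len(survivor_ages)
print(average_age)   # 35.0
```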