# Introduction to Python

In this notebook we will discuss the basics of **programming with Python**.

If you want a more in-depth tutorial, check out the [official Python tutorial](https://docs.python.org/3/tutorial/).

There is excellent, **extensive** documentation in the [Python documentation](https://docs.python.org/3/).

For specific questions, use [Google](https://www.google.com/) or, more specifically, [Stack Overflow](https://stackoverflow.com/).


## Preliminaries

### E4500 Directory Organization

We will always assume a relative organization like this:

* **root**
    * **lectures** (lecture notes and jupyter notebooks)
        * lecture1
        * lecture2
        * ...
    * **E4500_DS** (Python modules reused during the course)
        * module1.py
        * module2.py
        * ...
    * **raw** (raw, unmodified data we will use during the course)
        * dataset1 (directory for dataset1)
        * dataset2
        * ...
    * **data** (pre-processed data we have computed or modified ourselves)
        * project1
        * project2
        * ...
    * **projects** (standalone Python scripts)
        * project1/script1.py
        * project2/script2.py
        
If you save notebooks, data, etc. this way, all examples from this course should work out of the box.

In [1]:
%pwd

'/home/manel/E4500DS/lectures/01_Introduction'

My **root** directory is called `E4500DS`, and we are currently working in `lectures/01_Introduction`.

In [2]:
%ls

[0m[01;34mbuild[0m/   Introduction_DataScience.tex  ReadingFiles.ipynb  [01;34mtables[0m/
[01;34mimages[0m/  IntroductionPython.ipynb      read_options.py


Other materials for this class are saved there.

## Python expressions

An expression computes a value.

The following expression is just the value `5`:

In [3]:
5

5

And here we compute $1+1$:

In [4]:
1+1

2

We can have more complicated expressions:

In [5]:
5*(1-3)+ 24/3

-2.0

We use **boolean expressions** to test for conditions:

In [6]:
0 < 5*(1-3)+ 24/3

False

This is the same as:

In [7]:
0 < -2

False

The rules of [operator precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence) are well documented in [Python's manual](https://docs.python.org/3/).

But, as an easy-to-remember rule: when in doubt about the order of evaluation, **use parentheses**:

In [8]:
0 < ( 5*(1-3)+ 24/3 )

False
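One place where precedence can surprise you is exponentiation combined with a minus sign: `**` binds more tightly than unary `-`, so `-2**2` means `-(2**2)`. A minimal sketch contrasting the two readings:

```python
# ** has higher precedence than unary minus: -2**2 is parsed as -(2**2)
without_parens = -2**2
with_parens = (-2)**2

print(without_parens)  # -4
print(with_parens)     # 4

# Multiplication binds more tightly than addition
print(1 + 2 * 3)       # 7
print((1 + 2) * 3)     # 9
```

Writing the parentheses explicitly makes the intended order obvious to anyone reading the code.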

## Variables

We can give **names** to values. A named value is called a **variable**:

In [9]:
x=2

Variables have a **value**

In [10]:
x

2

and a **type**

In [11]:
type(x)

int

There are many different types; we will review the most common ones later.

We can use variables in **expressions**:

In [12]:
x+2

4

You can reassign a new value to a variable, and the old one is *forgotten*:

In [13]:
x=5.0
x

5.0

In [14]:
type(x)

float

In Python, only values (not variables) have types: the same variable can refer to an `int` at one point and a `float` later, as `x` did above.
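A short sketch of this idea: the name `v` (chosen arbitrarily here) is rebound to values of different types, and `type`/`isinstance` always report the type of the value it currently refers to.

```python
v = 42
print(type(v))             # <class 'int'>

v = "forty-two"            # the same name now refers to a str value
print(type(v))             # <class 'str'>

# isinstance checks the type of the value the name currently refers to
print(isinstance(v, str))  # True
```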

We can assign the results of expressions to new variables:

In [15]:
y=x+2
y

7.0

In [16]:
2*y+x

19.0

In [17]:
y>x

True

## Python scalar types

There are many kinds of types in Python. Scalar types contain just one value.

The most commonly used ones are:

### Integers

In [18]:
2

2

In [19]:
type(2)

int

#### Arithmetic Operators

Integers support addition, subtraction, and multiplication:

In [20]:
2+4,2-4,2*4

(6, -2, 8)

There are **two division operators for integers**:
1. Floor division (`//`): returns an integer result, rounded down
2. True division (`/`): returns a fractional (float) result

In [21]:
7//3 # Floor division

2

In [22]:
7/3 # true division

2.3333333333333335

The remainder of division is given by the **mod** operator `%`:

In [23]:
7 % 3

1
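Floor division and the mod operator are linked: for integers `a` and `b` (with `b` non-zero), `a == (a // b) * b + a % b`. The built-in `divmod` returns both results at once. Note that `//` rounds *down* (toward negative infinity), which matters for negative numbers:

```python
a, b = 7, 3
q, r = divmod(a, b)          # same as (a // b, a % b)
print(q, r)                  # 2 1
print(q * b + r == a)        # True

# With a negative dividend, // rounds toward negative infinity
print(-7 // 3, -7 % 3)       # -3 2
print((-7 // 3) * 3 + (-7 % 3) == -7)  # True
```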

Exponentiation is also supported with the `**` operator:

In [24]:
2**4

16

#### Mutating arithmetic operators

A common pattern is to increase (decrease, scale) a number by another. This is supported by the operators `+=`, `-=`, `*=`, `/=`, `//=`, `**=`.

They compute an operation (`+`, `-`, `*`, `/`, `//`, `**`) between two numbers and store the result in the variable on the left-hand side.

In [25]:
x=8

In [26]:
x+=1 # x = x +1
x

9

In [27]:
x*=2 # x = x * 2
x

18

In [28]:
x//=3 # x = x // 3
x

6

In [29]:
x**=2 # x = x ** 2
x

36

#### Integer Range

Integers in Python can carry a sign:

In [30]:
-100

-100

And they can be arbitrarily large in magnitude (positive or negative):

In [31]:
((7**10)**10)*10

32344765096247579913446477691002168108572031989046254009338953313916914596369280600010

We are only limited by how many digits fit into memory.
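Because integers are exact, adding 1 to a huge integer is never lost. For contrast, the sketch below also converts the same values to floats (covered in the next sections), which keep only about 15 to 16 significant digits, so the `+ 1` disappears:

```python
big = 2**100
print(big)                          # 1267650600228229401496703205376
print((big + 1) - big)              # 1   (integer arithmetic is exact)

# The same calculation with floats loses the +1 to rounding
print(float(big + 1) - float(big))  # 0.0
```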

### Booleans

The **Boolean** type has only two values:

In [33]:
True

True

In [34]:
False

False

In [35]:
type(True)

bool

We can have boolean variables:

In [36]:
p=True
q=False
p,q

(True, False)

#### Boolean Operators

Boolean values support the usual logical operators:
1. **and**: true if both values are true
2. **or**: true if *either* value is true
3. **not**: logical negation


In [37]:
p and q

False

In [38]:
p or q

True

In [39]:
not(q)

True
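The sketch below prints the full truth table for `and` and `or` (it uses a `for` loop, introduced in Control Flow below) and shows that both operators *short-circuit*: the right-hand side is evaluated only if it can change the result.

```python
# Print every combination of p and q with the results of and / or
for p in (True, False):
    for q in (True, False):
        print(p, q, p and q, p or q)

# False and ... is always False, so Python never evaluates 1 / 0 here
safe = False and (1 / 0 > 0)
print(safe)  # False
```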

#### Relational Operators

The relational operators compare values and return a boolean result:
1. `==` equal: true if both values are the same
2. `!=` not equal: true if the values are different
3. `<` less than: true if the left-hand side is strictly smaller than the right-hand side
4. `<=` less than or equal: true if the left-hand side is smaller than or equal to the right-hand side
5. `>` greater than: true if the left-hand side is strictly larger than the right-hand side
6. `>=` greater than or equal: true if the left-hand side is larger than or equal to the right-hand side

In [45]:
x=2
y=4

In [46]:
x==y,x!=y,x<y,x<=y,x>y,x>=y

(False, True, True, True, False, False)

In [47]:
x<x,x<=x

(False, True)
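Python also lets you chain relational operators, which reads like mathematical notation. A minimal sketch, reusing `x = 2` from above:

```python
x = 2
# 0 < x < 10 means (0 < x) and (x < 10)
print(0 < x < 10)          # True
print(0 < x and x < 10)    # True, the equivalent long form
print(5 < x <= 10)         # False
```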

### Floats

Floating point numbers (floats) *approximately* represent real numbers.

They have limited precision and range (they cannot be arbitrarily large or small).

In [48]:
3/11

0.2727272727272727

You can use scientific notation (like on calculators):

In [49]:
2.1e2

210.0

In [50]:
-3.4e-3

-0.0034

Precision is limited to approximately 15 significant digits:

In [51]:
1+1e-15

1.000000000000001

In [52]:
1+1e-16

1.0

The largest float is approximately $10^{308}$; beyond that, values overflow to `inf` (infinity):

In [53]:
1.0e308

1e+308

In [54]:
1.0e309

inf

#### Arithmetic Operators

Like integers, floating point numbers support the arithmetic operations `+`, `-`, `*`, `/`, `**` and their mutating counterparts (`+=`, `-=`, `*=`, `/=`, `**=`):



In [55]:
x=3.4
y=0.1
x+y,x-y,x*y,x/y,x**y

(3.5, 3.3, 0.34, 34.0, 1.1301807132434794)

In [56]:
x*=2
x

6.8

#### Relational operators

Relational operators are also supported:

In [57]:
x==y,x!=y,x<y,x<=y,x>y,x>=y

(False, True, False, False, True, True)

#### Precision and Rounding Error

Because calculations with floating point numbers are only approximate, we can see rounding error:

In [58]:
x=1.0
y=1e-15

In [59]:
x+y

1.000000000000001

In [60]:
x+y/10

1.0

In [61]:
10*(x+y/10)-10*x

0.0

Mathematically,

$$
    y = 2(x+y/2)-2x 
$$

Due to rounding error, this may not hold in floating point arithmetic:

In [64]:
2*(x+y/2)-2*x == y

False

In [65]:
2*(x+y/2)-2*x,y

(8.881784197001252e-16, 1e-15)

Comparing floating point numbers for equality is usually a mistake.
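Instead of `==`, floats are usually compared within a tolerance, for example with the standard library's `math.isclose`. Its default is a relative tolerance of `1e-09`; for results that should be close to zero (like the cancellation example above), a relative tolerance is too strict and an absolute tolerance `abs_tol` is needed. The value `1e-12` below is an arbitrary choice for illustration:

```python
import math

print(0.1 + 0.2 == 0.3)              # False
print(0.1 + 0.2)                     # 0.30000000000000004
print(math.isclose(0.1 + 0.2, 0.3))  # True

# The example above: the result is tiny, so compare with an absolute tolerance
x, y = 1.0, 1e-15
lhs = 2*(x + y/2) - 2*x
print(math.isclose(lhs, y))                 # False, relative difference is about 11%
print(math.isclose(lhs, y, abs_tol=1e-12))  # True
```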

### Integer/Floating Point conversions

You can convert a float to an integer with `int` (which discards the fractional part) and an integer to a float with `float`:

In [66]:
x=3.9
y=-3

In [67]:
int(x)

3

In [68]:
float(y)

-3.0

In arithmetic expressions, integers are promoted to floats if needed (note the rounding error in the result):

In [69]:
x+y

0.8999999999999999
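`int` truncates toward zero, which differs from rounding down for negative numbers. A short sketch comparing it with `math.floor`, `math.ceil`, and the built-in `round` (which, in Python 3, sends exact halves to the nearest even integer):

```python
import math

print(int(3.9), int(-3.9))                 # 3 -3   int truncates toward zero
print(math.floor(-3.9))                    # -4     floor rounds down
print(math.ceil(3.1))                      # 4      ceil rounds up
print(round(3.9), round(2.5), round(3.5))  # 4 2 4  ties go to the even integer
```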

### Strings

A string is a type that stores text (a sequence of characters):

In [74]:
x="Manuel"
y='Data Science'
z="100"

#### Relational Operators

Strings are ordered (lexicographically, like words in a dictionary) and support all the comparison operators:

In [71]:
x==y,x!=y,x<y,x<=y,x>y,x>=y

(False, True, False, False, True, True)

A string of digits is **different** from the number it represents:

In [72]:
z==100

False

In [73]:
z

'100'
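To move between the two, convert explicitly with `int`, `float`, and `str`. Note also that `+` concatenates strings but adds numbers:

```python
z = "100"
n = int(z)                # parse the digits into an integer
print(n == 100)           # True
print(str(100) == z)      # True: convert a number back to its text form
print(float("2.5") * 2)   # 5.0

print(z + z)              # 100100 (concatenation)
print(n + n)              # 200 (addition)
```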

#### String Operations

Strings support a rich set of operations, see the [reference manual](https://docs.python.org/3/library/stdtypes.html#string-methods).

Here are some of the most useful ones. `len` returns the number of characters:

In [75]:
len(x)

6

You can check whether a substring is contained in a larger string:

In [76]:
"nu" in "Manuel"

True

Matches are case-sensitive:

In [77]:
"Nu" in "Manuel"

False
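A common way to make the test case-insensitive is to lower-case both sides first. The sketch also shows a few other frequently used methods from the string-methods reference:

```python
name = "Manuel"
# Lower-case both sides so "Nu" matches "nu"
print("Nu".lower() in name.lower())  # True

print(name.upper())                  # MANUEL
print("Data Science".split())        # ['Data', 'Science']
print(name.replace("M", "S"))        # Sanuel
```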

You can **format** strings using **interpolation** with f-strings (note the `f` before the opening quote):

In [79]:
f"My name is {x} and I teach {y}."

'My name is Manuel and I teach Data Science.'
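Inside the braces, a format specification after a colon controls how a value is displayed, for example the number of decimal places, a thousands separator, or a percentage:

```python
ratio = 3 / 11
population = 1234567

print(f"{ratio:.3f}")     # 0.273      three decimal places
print(f"{population:,}")  # 1,234,567  thousands separator
print(f"{ratio:.1%}")     # 27.3%      multiply by 100, one decimal place
```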

## Collections

Collections contain multiple values in a single object.
There are three basic collections in Python:
1. tuple
2. list
3. dictionary

### Tuple
A tuple contains a sequence of values (of any type) and is **immutable**: once it is created, you cannot add or remove values.

In [80]:
t=(1,"hello",5.0)
t

(1, 'hello', 5.0)

In [81]:
type(t)

tuple

The total number of elements in the tuple is returned by the `len` function:

In [82]:
len(t)

3

We can access tuple elements by **index**.

Note that indexing starts at zero (the first element is at position zero).

In [83]:
t[0]

1

In [84]:
t[2]

5.0
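Immutability means that assigning to an element raises a `TypeError`. To "change" a tuple, you build a new one, for example by concatenating tuples:

```python
t = (1, "hello", 5.0)
try:
    t[0] = 2          # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Build a new tuple instead: (2,) is a one-element tuple, t[1:] drops the first element
t2 = (2,) + t[1:]
print(t2)             # (2, 'hello', 5.0)
```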

### List

A list, like a tuple, contains multiple elements of any type. Unlike a tuple, it is **mutable**: you can add or remove elements from a list.

#### Basics

In [94]:
l=[1,"hello",5.0,7.0,72,"last"]
l

[1, 'hello', 5.0, 7.0, 72, 'last']

In [95]:
type(l)

list

The total number of elements in the list is again returned by the `len` function:

In [96]:
len(l)

6

We can also access list elements by **index**:


In [97]:
l[1]

'hello'

Note that indexes start at **zero**:

In [98]:
l[0]

1

The last valid index is `len(l)-1`:

In [99]:
l[len(l)-1]

'last'

We can also test whether an element is contained in a list:

In [119]:
72 in l

True

#### List Mutation

You can add elements to a list:

In [100]:
l.append("New Element")
l

[1, 'hello', 5.0, 7.0, 72, 'last', 'New Element']

And remove elements by position:

In [101]:
del l[0]
l

['hello', 5.0, 7.0, 72, 'last', 'New Element']

Or by value:

In [102]:
l.remove(5.0)
l

['hello', 7.0, 72, 'last', 'New Element']
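A few details worth knowing about these list methods: `remove` deletes only the *first* matching element, `pop` removes an element and returns it (the last one by default), and `insert` adds an element at a given index. A short sketch on a fresh list:

```python
nums = [3, 1, 3, 2]
nums.remove(3)        # removes only the first 3
print(nums)           # [1, 3, 2]

last = nums.pop()     # removes and returns the last element
print(last, nums)     # 2 [1, 3]

nums.insert(0, 10)    # insert 10 at index 0
print(nums)           # [10, 1, 3]
```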

#### Fancy indexing

If we want only the first three elements, we can use a **slice**:

In [105]:
l[:3] # all elements up to, but not including index 3

['hello', 7.0, 72]

If we want the elements at indexes 1, 2, and 3, we can do that too:

In [107]:
l[1:4] #  from index 1 (inclusive) to index 4 (exclusive) i.e indexes 1,2,3

[7.0, 72, 'last']

Or the elements from index 2 (the third element) to the end:

In [108]:
l[2:]

[72, 'last', 'New Element']

Negative indexes count from the end. To get the last element:

In [109]:
l[-1]

'New Element'
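Slices accept an optional third number, the *step*, and negative values work in slices too. A short sketch on a new list of letters:

```python
letters = ["a", "b", "c", "d", "e", "f"]
print(letters[::2])    # ['a', 'c', 'e']  every second element
print(letters[1::2])   # ['b', 'd', 'f']  every second element, starting at index 1
print(letters[::-1])   # ['f', 'e', 'd', 'c', 'b', 'a']  a reversed copy
print(letters[-3:])    # ['d', 'e', 'f']  the last three elements
```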

### Dictionary
A dictionary is a collection in which, instead of accessing elements by index, you access them by key:

In [110]:
d={"A":24,"B":-5,"C":101}
d

{'A': 24, 'B': -5, 'C': 101}

In [111]:
type(d)

dict

In [112]:
len(d)

3

In [113]:
d["A"]

24

In [114]:
d["C"]

101

We can get all the keys (as a `dict_keys` view, which can be iterated like a list):

In [115]:
d.keys()

dict_keys(['A', 'B', 'C'])

or the values:

In [116]:
d.values()

dict_values([24, -5, 101])

or the (key, value) pairs as tuples:

In [117]:
d.items()

dict_items([('A', 24), ('B', -5), ('C', 101)])

You can also check whether a key is in the dictionary:

In [118]:
"B" in d

True
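Looking up a key that is not present raises a `KeyError`. The `get` method returns `None` (or a default you provide) instead, which avoids the error:

```python
d = {"A": 24, "B": -5, "C": 101}

try:
    d["Z"]
except KeyError:
    print("no key 'Z'")

print(d.get("Z"))       # None
print(d.get("Z", 0))    # 0, the default value
print(d.get("A", 0))    # 24, the key exists so the default is ignored
```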

#### Mutation

Dictionaries are **mutable**; we can add a new key/value pair:

In [120]:
d

{'A': 24, 'B': -5, 'C': 101}

In [122]:
d[4]="hello"
d

{'A': 24, 'B': -5, 'C': 101, 4: 'hello'}

And we can delete entries:

In [123]:
del d["C"]
d

{'A': 24, 'B': -5, 4: 'hello'}

## Control Flow

By default, computers execute instructions sequentially, line by line:

In [124]:
x=2
print("x =",x)
y=x+2
print("y =",y)
z=x*y
print("z =",z)

x = 2
y = 4
z = 8


### If Statement

We can use an `if` statement to perform a calculation based on a boolean condition (see Booleans above):

In [125]:
if (x>y):
    z=1
    print("x larger")
else:
    z=-1
    print("y larger")
z

y larger


-1

In [126]:
x=x+100
if (x>y):
    z=1
    print("x larger")
else:
    z=-1
    print("y larger")
z

x larger


1

### For loop
We can use a `for` loop to repeat a calculation for each member of a collection:

In [127]:
for item in t:
    print("Got item",item)

Got item 1
Got item hello
Got item 5.0


In [128]:
for item in l:
    print("Got item",item)

Got item hello
Got item 7.0
Got item 72
Got item last
Got item New Element


In [129]:
for key in d:
    print("Got key",key,"with value",d[key])

Got key A with value 24
Got key B with value -5
Got key 4 with value hello
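Looping over `d.items()` gives each (key, value) pair directly, so the value does not have to be looked up again. The sketch also shows a common pattern that combines loops and dictionaries: counting occurrences (the word list is made up for illustration):

```python
d = {"A": 24, "B": -5, 4: "hello"}
for key, value in d.items():        # unpack each (key, value) tuple
    print("Got key", key, "with value", value)

# Count how many times each word appears
counts = {}
for word in ["red", "blue", "red", "green", "red"]:
    counts[word] = counts.get(word, 0) + 1
print(counts)   # {'red': 3, 'blue': 1, 'green': 1}
```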


Strings are also collections of characters, so you can use them in a `for` loop:

In [130]:
name="Manuel"
for c in name:
    print(c)

M
a
n
u
e
l


If you want to repeat something a fixed number of times, the `range` function works nicely with a `for` loop:

In [131]:
N=10
for i1 in range(N):
    print(i1)

0
1
2
3
4
5
6
7
8
9


Remember that Python is **zero-indexed**: `range(N)` returns the numbers 0 to $N-1$.
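`range` also accepts a start and a step (the stop value is always excluded), and the built-in `enumerate` pairs each element of a collection with its index:

```python
print(list(range(2, 10, 3)))   # [2, 5, 8]        start, stop (exclusive), step
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]  counting down

for i, item in enumerate(["x", "y", "z"]):
    print(i, item)             # 0 x, then 1 y, then 2 z
```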

### While loop

A `while` loop repeats a block of code as long as a condition is true:

In [133]:
f=1
c=0
while f<100:
    print(c,f)
    f*=2
    c+=1
    

0 1
1 2
2 4
3 8
4 16
5 32
6 64
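A `while` loop is useful when the number of repetitions is not known in advance. For example, this sketch finds the smallest power of two that is at least 1000 (an arbitrary target) and counts how many doublings it takes:

```python
target = 1000
power = 1
steps = 0
while power < target:   # stop as soon as power reaches the target
    power *= 2
    steps += 1
print(steps, power)     # 10 1024
```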


## Function

Often, a particular calculation needs to be repeated many times with different inputs.

In those cases, it is better to define a function with `def`:

In [134]:
def scalar_product(x,y):
    """ Compute the scalar (dot) product of two three-dimensional vectors"""
    return x[0]*y[0]+x[1]*y[1]+x[2]*y[2]


In [135]:
a=(1.0,2.0,3.0)
b=[-1.0,-1.0,-2.0]

In [136]:
scalar_product(a,b)

-9.0

In [138]:
c=[1.0,1.0,1.0]
scalar_product(a,c),scalar_product(b,c)

(6.0, -4.0)

The docstring (the string right after the `def` line) is shown in the function's help message:

In [139]:
help(scalar_product)

Help on function scalar_product in module __main__:

scalar_product(x, y)
    Compute the scalar (dot) product of two three-dimensional vectors



Help also works on built-in functions:

In [141]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.
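`scalar_product` only works for three-dimensional vectors. A minimal sketch of a version for vectors of any length, using the built-ins `zip` and `sum`; the name `dot` is hypothetical and not part of the course modules:

```python
def dot(x, y):
    """Compute the scalar (dot) product of two vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # zip pairs up corresponding elements; sum adds up their products
    return sum(xi * yi for xi, yi in zip(x, y))

a = (1.0, 2.0, 3.0)
b = [-1.0, -1.0, -2.0]
print(dot(a, b))            # -9.0, same as scalar_product(a, b)
print(dot([1, 2], [3, 4]))  # 11
```

Checking the lengths explicitly matters because `zip` silently stops at the shorter input.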



## Objects

Objects allow us to keep a group of variables and functions together.

Often, when we alter (**mutate**) an object, we need to follow certain rules to keep all of its variables **consistent** with each other.

In Python, you create a new type by defining a **class**. Values of that type are called **objects**, and functions that operate on objects are called **methods**.
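A minimal sketch of this vocabulary, using a hypothetical `Point` class made up for illustration: `__init__` stores the object's variables on `self`, and a method is a function defined inside the class that receives the object as `self`.

```python
import math

class Point:
    def __init__(self, x, y):
        # instance variables, stored on each object
        self.x = x
        self.y = y

    def distance_to_origin(self):
        """A method: a function that operates on the object (self)."""
        return math.sqrt(self.x**2 + self.y**2)

p = Point(3.0, 4.0)             # p is an object of type Point
print(type(p).__name__)         # Point
print(p.distance_to_origin())   # 5.0
```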

Imagine a credit card **customer** described by:
1. `credit_card_number`
2. `name`
3. `address`
4. `credit_balance`
5. `credit_limit`
6. `cash_back`
7. `cash_back_rate`

We can create a Python **class** to:
- group together the variables that define a customer
- define methods that act on the customer as a whole.

We will use all sorts of objects, but we will not be creating many ourselves.

In [144]:
class Costumer:
    def __init__(self, credit_car_number, name, address,credit_limit=10_000,cash_back_rate=0.02):
        """Create a new costumer starting with a zero balance """
        # here we initialize the variables for the costumer
        self.credit_card_number=credit_car_number
        self.name=name
        self.address=address
        self.credit_balance=0.0 # start with no balance
        self.cash_back=0.0 # no cash back earned yet
        self.credit_limit=credit_limit
        self.cash_back_rate=cash_back_rate
    def __repr__(self):
        """friendly string representation of costumer"""
        return f"Costumer(id={self.credit_card_number}, name='{self.name}', address='{self.address}', balance={self.credit_balance}, cash_back={self.cash_back}, limit={self.credit_limit}, cash_back_rate={self.cash_back_rate})"
    def borrow(self,amount):
        """ Client borrows money.
        
            Returns True if the new balance stays below the credit limit, False otherwise.
            The cash back amount is updated with the rebate for the newly borrowed funds.
            
            """
        new_balance=self.credit_balance+amount
        if new_balance<self.credit_limit:
            self.credit_balance=new_balance
            cash_back=amount*self.cash_back_rate
            self.cash_back+=cash_back
            return True
        else:
            return False
    def pay(self,amount):
        """ The client pays part or all of the balance.
        
            If amount is larger than the balance, the excess funds are returned.
        """
        used_amount=min(self.credit_balance,amount) # only use the smallest of balance and amount 
        self.credit_balance-=used_amount
        return amount-used_amount # return unused funds


In [145]:
help(Costumer)

Help on class Costumer in module __main__:

class Costumer(builtins.object)
 |  Costumer(credit_car_number, name, address, credit_limit=10000, cash_back_rate=0.02)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, credit_car_number, name, address, credit_limit=10000, cash_back_rate=0.02)
 |      Create a new costumer starting with a zero balance
 |  
 |  __repr__(self)
 |      friendly string representation of costumer
 |  
 |  borrow(self, amount)
 |      Client borrows money.
 |      
 |      Returns True if the new balance stays below the credit limit, False otherwise.
 |      The cash back amount is updated with the rebate for the newly borrowed funds.
 |  
 |  pay(self, amount)
 |      The client pays part or all of the balance.
 |      
 |      If amount is larger than the balance, the excess funds are returned.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |

We can now define customers:

In [147]:
costumer1=Costumer(20457,"Mary Smith","New York City",20_000,0.015)
costumer1

Costumer(id=20457, name='Mary Smith', address='New York City', balance=0.0, cash_back=0.0, limit=20000, cash_back_rate=0.015)

In [None]:
type(costumer1)

We can also list its attributes and methods with `dir`:

In [148]:
dir(costumer1)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'address',
 'borrow',
 'cash_back',
 'cash_back_rate',
 'credit_balance',
 'credit_card_number',
 'credit_limit',
 'name',
 'pay']

In [149]:
costumer2=Costumer(51952,"John Doe","Hoboken") # defaults are used for credit_limit and cash_back_rate
costumer2

Costumer(id=51952, name='John Doe', address='Hoboken', balance=0.0, cash_back=0.0, limit=10000, cash_back_rate=0.02)

We can store them in collections:

In [150]:
costumers=[costumer1,costumer2]
costumers # a list of costumers

[Costumer(id=20457, name='Mary Smith', address='New York City', balance=0.0, cash_back=0.0, limit=20000, cash_back_rate=0.015),
 Costumer(id=51952, name='John Doe', address='Hoboken', balance=0.0, cash_back=0.0, limit=10000, cash_back_rate=0.02)]

Customers can now borrow money:

In [151]:
costumer1.borrow(1_000)

True

In [152]:
costumer1

Costumer(id=20457, name='Mary Smith', address='New York City', balance=1000.0, cash_back=15.0, limit=20000, cash_back_rate=0.015)

But we do not allow a customer to go over their credit limit:

In [153]:
costumer1.borrow(19_500)

False

In [154]:
costumer1

Costumer(id=20457, name='Mary Smith', address='New York City', balance=1000.0, cash_back=15.0, limit=20000, cash_back_rate=0.015)

And the customer can pay off the balance; any excess payment is returned:

In [155]:
costumer1.pay(3_000) # more than the balance, so the excess is returned

2000.0