## Files

Technically speaking, files are categorized as one of the core Python data types (like strings, lists, ...), although they are actually external objects saved on your machine that Python accesses through built-in functions like **open**. Using either built-in functions or external libraries, Python can read from and write to a wide range of file types. Here we're going to look at three commonly used ones: *.txt*, *.csv* and *.json* files.

### Text files

In [73]:
# creating a new file and writing to it
file = open('first_text_file.txt', 'w')
file.write('Navid Nobani')
file.close()

In [74]:
# opening an existing file and appending to it
file = open('first_text_file.txt', 'a')
file.write('\nanother line!')
file.close()

In [77]:
# opening an existing file and reading from it
file = open('first_text_file.txt', 'r')
line = file.readline()
print(line)
file.close()

Navid Nobani



As you have noticed, in order to read from/write to a file, we do these steps:
- open/create a file
- read from/write to it
- close the file

Closing the file is just as important as opening it and working with it: an open file keeps holding system resources, and data you have written may not actually reach the disk until the file is closed. There is a better way to manage opening and closing files, which is using context managers. Given our limited time we can't go into details here, but fortunately using them is easy. Look at the following example:

In [79]:
with open('first_text_file.txt', 'w') as file:
    file.write('Some funny text :D')

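This is equivalent to the following code (here writing to a second file), with one bonus: the **with** block closes the file even if an error occurs while writing: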

In [80]:
# creating a new file and writing to it
file = open('second_text_file.txt', 'w')
file.write('Some funny text :D')
file.close()
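
The same **with** pattern works for reading a file back. The following is a minimal sketch; the file name demo_text_file.txt and the temporary folder are assumptions, chosen only so that the example doesn't touch your own files.

```python
import os
import tempfile

# Work in a temporary folder so the example does not touch your own files.
path = os.path.join(tempfile.mkdtemp(), 'demo_text_file.txt')  # hypothetical file name

with open(path, 'w') as file:
    file.write('first line\nsecond line')

with open(path, 'r') as file:
    # iterating over a file yields one line at a time
    lines = [line.rstrip('\n') for line in file]

print(lines)        # ['first line', 'second line']
print(file.closed)  # True: the with block closed the file for us
```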

### CSV Files

There are two main ways to open a csv file:
- using the [csv module](https://docs.python.org/3/library/csv.html) (which comes with Python)
- or using the [pandas library](https://pandas.pydata.org/) (which you need to install first)

*[This site](https://realpython.com/python-csv/) has a rich explanation about csv files in Python.*

First, let's take a look at the csv module:

In [None]:
import csv # NEW STUFF !

with open('./files/data/camdenhousesales15.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader: # NEW STUFF !
        print(row)
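
When a csv file has a header line, the csv module can also return each row as a dictionary. This is a small sketch on in-memory data; the column names and values below are made up for illustration and are not taken from the house sales file.

```python
import csv
import io

# A small in-memory stand-in for a csv file (hypothetical columns and values).
csv_text = "street,price\nHigh Street,250000\nMill Lane,410000\n"

with io.StringIO(csv_text) as csv_file:
    rows = list(csv.DictReader(csv_file))  # the first line is used as the header

for row in rows:
    # every value is read as a string, so numbers must be converted explicitly
    print(row['street'], int(row['price']))
```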

For working with csv files (especially the big ones!) I personally prefer the pandas library over the csv module. Later in the course we'll briefly learn about pandas, but let's have a sneak peek now!

In [None]:
import pandas as pd

df = pd.read_csv('./files/data/camdenhousesales15.csv')
df

The table you see in the previous cell is a "DataFrame", one of the most important non-core data types you'll use in the data science field. We'll learn more about DataFrames later in the course.

# Functions

A Python function is a named bundle of code that performs a specific task. Packing code into a function pays off whenever that task is going to be reused later in another part of the code.

Suppose we want to perform some simple statistics on a list of numbers:

In [55]:
import numpy as np
np.random.seed(42)

data = np.random.rand(1,100_000).tolist()[0]
std = np.std(data)
mean = np.mean(data)
median = np.median(data)
min_val = min(data)
max_val = max(data)
for val in [std, mean, median, min_val, max_val]:
    print(val)

0.2883400052995804
0.49948825007353703
0.5006297820093312
5.536675737993768e-06
0.999992042302966


Ok, we did it! But what if we now want to repeat these steps for another list of numbers? We have two options:

- Updating the data variable with a new list
- Re-writing all the steps for another list

Both of these solutions are bad, or at least not Pythonic. The correct way to deal with such a task is to pack the steps into a function.

## Built-in functions

Until this point, we have used many functions without knowing the exact definition of a function! We've seen:

- print
- len
- max
- min

These are just a few examples of the dozens of functions that are built into the core Python language (meaning that you don't need to import any library/package to use them). A function always has the following structure:

<img src="Images/function.png" width="500"> 

Parameters are inputs that functions may or may not need in order to perform their tasks. For example, let's take a look at the **print** function, which takes one or more objects as input and prints them.

In [25]:
print('My shiny text!')
print() # prints an empty line
print('X', 'Y', 'Z', sep='+++', end='\n')
print('X', 'Y', 'Z', '+++', '°°°')

My shiny text!

X+++Y+++Z
X Y Z +++ °°°


You can see that print accepts zero, one or (technically) any number of arguments. Let's see another built-in function: **map**

**map** returns a special iterable with type "map":

In [42]:
my_list = ['-1.567', '0.2', '-65']
list(map(float, my_list))

[-1.567, 0.2, -65.0]

map needs at least two arguments: the function we want to apply and one or more iterables to apply it to. It won't work with no argument or with just one argument (only the function or only the iterable); in both cases we get a "TypeError" message.
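
A short sketch of both situations, using only built-in functions we have already met (the numbers are arbitrary):

```python
# One function and two iterables: the items are paired up position by position
# and max is called once per pair.
largest = list(map(max, [1, 5, 3], [4, 2, 6]))
print(largest)  # [4, 5, 6]

# Calling map with only a function raises a TypeError.
try:
    map(abs)
    error_raised = False
except TypeError:
    error_raised = True
print(error_raised)  # True
```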

<img src="Images/student.svg"   width="30" align="left">               

**YOUR TURN :** using **map** and **abs** functions, create a list with the absolute value of the list's items.

In [76]:
new_list = [-1, 5.5, -9, -123, 65]
list(map(abs, new_list))

[1, 5.5, 9, 123, 65]

## Other built-in functions

### isinstance

In [17]:
cities = ['CO', 'MI', 'BN', 22100, 'RO', 20147, 'LE']

In [21]:
for city in cities:
    if type(city) != str:
        print(f'Wrong format --> {city}')

Wrong format --> 22100
Wrong format --> 20147


In [28]:
for city in cities:
    if not isinstance(city, str):
        print(f'Wrong format --> {city}')

Wrong format --> 22100
Wrong format --> 20147
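
Both loops above give the same result here, but the two checks are not identical. A minimal sketch of the difference (the values are arbitrary examples):

```python
value = True  # bool is a subclass of int

exact_match = type(value) == int         # compares the exact type only
subclass_match = isinstance(value, int)  # also accepts subclasses
print(exact_match, subclass_match)       # False True

# isinstance accepts a tuple of types, handy for "any kind of number" checks
is_number = [isinstance(item, (int, float)) for item in ['CO', 22100, 3.5]]
print(is_number)  # [False, True, True]
```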


<img src="Images/wizard.svg"   width="30" align="left" >               

**YOUR TURN:**
    
What is happening in the following code?

In [30]:
print(*[f'Wrong format --> {city}' for city in cities if not isinstance(city, str)], sep='\n')

Wrong format --> 22100
Wrong format --> 20147


### filter

In [36]:
cities = ['CO', 'MI', 'BN', 22100, 'RO', 20147, 'LE']

def filter_cities(city):
    return not isinstance(city, str)  # True for items that are not strings

filtered_items = filter(filter_cities, cities)

for item in filtered_items:
    print(f'Wrong format --> {item}')

Wrong format --> 22100
Wrong format --> 20147


<img src="Images/wizard.svg"   width="30" align="left" >               

**YOUR TURN:**
    
What is happening in the following code?

In [37]:
print(*[f'Wrong format --> {city}' for city in list(filter(lambda x: not isinstance(x, str), cities))], sep='\n')

Wrong format --> 22100
Wrong format --> 20147


### all

In [44]:
cities = ['CO', 'MI', 'BN', 22100, 'RO', 20147, 'LE']

all([isinstance(city, str) for city in cities])

False

### any

In [45]:
any([isinstance(city, str) for city in cities])

True
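
Both functions also accept a generator expression (the same expression without the square brackets). A small sketch of that form, plus the behaviour on an empty list:

```python
cities = ['CO', 'MI', 'BN', 22100, 'RO', 20147, 'LE']

# No square brackets: the checks are produced one at a time, and all/any stop
# as soon as the answer is known (all stops at 22100, any stops at 'CO').
all_strings = all(isinstance(city, str) for city in cities)
any_strings = any(isinstance(city, str) for city in cities)
print(all_strings, any_strings)  # False True

# Edge case worth remembering: on an empty sequence all is True and any is False.
print(all([]), any([]))  # True False
```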

<img src="Images/wizard.svg"   width="30" align="left" >               

**YOUR TURN:**
    
Use the df below and keep only the rows in which every item is longer than one character:

In [4]:
import pandas as pd
df = pd.DataFrame({'C_1':'a 3ab ac a5d ce af ah a a7e ak a3l'.split(),
                   'C_2':'aa ytcb c a3d ce af agh at3i aj ank fa'.split()})

In [5]:
df

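| | C_1 | C_2 |
|---|---|---|
| 0 | a | aa |
| 1 | 3ab | ytcb |
| 2 | ac | c |
| 3 | a5d | a3d |
| 4 | ce | ce |
| 5 | af | af |
| 6 | ah | agh |
| 7 | a | at3i |
| 8 | a7e | aj |
| 9 | ak | ank |
| 10 | a3l | fa |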


In [7]:
res = []
for i in range(len(df)):
    if all([len(x) > 1 for x in df.iloc[i, :]]):
        res.append(pd.DataFrame(df.iloc[i, :].values).T)
pd.concat(res)#.reset_index(drop=True)

| | 0 | 1 |
|---|---|---|
| 0 | 3ab | ytcb |
| 0 | a5d | a3d |
| 0 | ce | ce |
| 0 | af | af |
| 0 | ah | agh |
| 0 | a7e | aj |
| 0 | ak | ank |
| 0 | a3l | fa |


In [93]:
res = []
for i in range(len(df)):
    if all([len(x) > 1 for x in df.iloc[i, :]]): # try to re-write the code using any
        res.append(i)
df[df.index.isin(res)]

| | C_1 | C_2 |
|---|---|---|
| 1 | 3ab | ytcb |
| 3 | a5d | a3d |
| 4 | ce | ce |
| 5 | af | af |
| 6 | ah | agh |
| 8 | a7e | aj |
| 9 | ak | ank |
| 10 | a3l | fa |


### Finding index of a list/string item/character

It often happens that we want to find the index of an item (or character) in a list (or a string). The easiest way to do so is to use the *index* method:

In [6]:
# finding the index of 'foo'
small_list = ['pipa', 'papa', 'foo', 'bar', 'ham']
small_list.index('foo')

2

The same thing exists for strings:

In [9]:
# getting index of 'want'
example_string = 'this is a test text I want to use here'
example_string.index('want')

22
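
One thing to keep in mind is that *index* raises a ValueError when the item is not there. A short sketch of two ways around this ('spam' and 'missing' are just example values that are absent):

```python
small_list = ['pipa', 'papa', 'foo', 'bar', 'ham']

# index raises a ValueError when the item is missing, so check with "in" first
position = small_list.index('spam') if 'spam' in small_list else None
print(position)  # None

# strings also offer find, which returns -1 instead of raising an error
example_string = 'this is a test text I want to use here'
print(example_string.find('want'))     # 22
print(example_string.find('missing'))  # -1
```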

## Range function

range is a simple and handy function that is usually (but not always) used within a for loop. It accepts up to three arguments:
- start (optional, defaults to 0)
- stop (the sequence ends just before this value)
- step (optional, defaults to 1)
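
A quick sketch of how the three arguments combine (the numbers are arbitrary; list is used only to make the values visible):

```python
only_stop = list(range(5))              # counts from 0 up to, but not including, 5
start_stop = list(range(2, 6))          # counts from 2 up to, but not including, 6
with_step = list(range(2, 11, 3))       # jumps by 3
counting_down = list(range(10, 0, -2))  # a negative step counts down

print(only_stop)      # [0, 1, 2, 3, 4]
print(start_stop)     # [2, 3, 4, 5]
print(with_step)      # [2, 5, 8]
print(counting_down)  # [10, 8, 6, 4, 2]
```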

In [94]:
for i in [0, 1, 2, 3]:
    print(i)

0
1
2
3


In [100]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [101]:
my_list = ['ABC', 'BVG', 'UHY', 'UQH', 'LLL']
for i in range(len(my_list)):
    print(my_list[i])

ABC
BVG
UHY
UQH
LLL


<img src="Images/wizard.svg"   width="30" align="left">               

**YOUR TURN**: using a for loop, go through the items stored under each key of the given dictionary and print the numbers that are less than 10. (Some numbers were recorded as strings by mistake, take care of them too!)

**Hint**: *You may want to use **.isnumeric( )** string method*

In [149]:
target = {'A': [1.0, 5 ,7, 'foo'], 
          'B': [4, [12], '3', 83]}

In [152]:
for key in target.keys():
    for item in target[key]:
        check = 0
        if type(item) in [int, float] :
            if item < 10 :
                check = 1
        elif type(item) == str:
            if item.isnumeric():
                if int(item) < 10 :
                    check = 1
        if check == 1:
            print(item)    

1.0
5
7
4
3


In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

## Custom functions

Although there are numerous built-in and third-party modules and packages ready for you to use, some tasks are so specific that you can't find any existing module/function that does them for you. In these cases (which happen a lot) we need to write our own functions.
Consider the example we saw at the beginning of the "Functions" section: calculating some simple statistics.

In [85]:
def simple_stat(data, log=True):
    """Calculates simple statistics of a given list.
    
    For the given list of numbers, calculates the standard deviation,
    mean, median, minimum and maximum.

    Args:
        data: A list of numerical data.
        log: If True, prints a simple report.
        
    Returns:
        A list with the standard deviation, mean, median, minimum
        and maximum of the data, in this order.
    """
    std = np.std(data)
    mean = np.mean(data)
    median = np.median(data)
    min_val = min(data)
    max_val = max(data)
    
    vals = [std, mean, median, min_val, max_val]
    names = ['Standard deviation', 'Mean', 'Median', 'Min', 'Max']
    
    if log:
        for name, val in zip(names, vals):
            print(f'{name} : {val}')
            #print(f'{name} : {round(val, 2)}')
    return vals

In [79]:
def simple_stat(data):
    
    std = np.std(data)
    mean = np.mean(data)
    median = np.median(data)
    min_val = min(data)
    max_val = max(data)
    
    vals = [std, mean, median, min_val, max_val]
    names = ['Standard deviation', 'Mean', 'Median', 'Min', 'Max']
    
    for name, val in zip(names, vals):
        print(f'{name} : {val}')
    return vals

In [81]:
import numpy as np

In [86]:
np.random.seed(42)

data = np.random.rand(1,100_000).tolist()[0]
results = simple_stat(data)

Standard deviation : 0.2883400052995804
Mean : 0.49948825007353703
Median : 0.5006297820093312
Min : 5.536675737993768e-06
Max : 0.999992042302966


The following figure shows the structure of a custom function:

<img src="Images/function_detail.png" width="900"> 

<img src="Images/student.svg"   width="30" align="left">               

**YOUR TURN :** 

- Using the **round** function, modify *simple_stat* so that it returns the values rounded to 2 decimal places.
- Add the *range* statistic to the results

In [84]:
round(134.456435453, 4)

134.4564

In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

## Lambda functions

Lambda functions are a smaller and simpler version of normal functions, and they come in handy when we use the **functional programming** features of Python. Providing a technical definition of functional programming is way beyond the scope of our mini-course, but in a nutshell (and for our purposes), it means using functions as arguments of other functions.
Let's first revisit the map example we saw earlier:

In [58]:
my_list = ['-1.567', '0.2', '-65']
list(map(float, my_list))

[-1.567, 0.2, -65.0]

Now suppose that on top of changing the data type from str to float, we want to add 100 to each item. Remember how the map function works? It applies the first argument you give it (in this case, float) to each item of the sequence you pass as the second argument (in our example: my_list). So it's clear that we need to somehow replace float with *something* that also adds 100 to each item. This *something* should be either a *type* or a *function*. Ok, based on what we have learned until this point, let's create a custom function which does what we want:

In [59]:
def float_plus_100(number):
    """Converts the given value to float and adds 100 to it
    
    args:
        number : str, int or float
        
    returns:
        float
    """
    return float(number) + 100

Ok, our function is ready. Let's test it:

In [60]:
float_plus_100('3.14')

103.14

It works! Now we can add it to our map function:

In [61]:
list(map(float_plus_100, my_list))

[98.433, 100.2, 35.0]

We did it! The only problem is that in order to perform such a simple task we had to write a function which we're most probably not going to use again. That's why lambda functions exist! Let's re-write our code using lambda:

In [62]:
list(map(lambda x : float(x)+100, my_list))

[98.433, 100.2, 35.0]

As you can see, we did the same thing without defining a named function. In the following figure you can see the comparison between a lambda function and its corresponding regular function:

<img src="Images/lambda.png" width="800"> 

Such a comparison may suggest that lambda functions are always the better choice, but in reality we use regular functions far more often than lambdas: when dealing with real data, we often have to perform tasks that are too complex to be written as a lambda function.
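
map is not the only place where a throwaway function is useful. As a sketch, here the built-in sorted function receives a lambda through its key argument, reusing my_list from above:

```python
my_list = ['-1.567', '0.2', '-65']

# Sort the strings by the absolute value of the number they contain:
# the lambda is used exactly once, as the sorting key.
by_size = sorted(my_list, key=lambda x: abs(float(x)))
print(by_size)  # ['0.2', '-1.567', '-65']

# A lambda is an ordinary function object, so it can be stored and called too
add_100 = lambda x: float(x) + 100
print(add_100('3.14'))  # 103.14
```

Storing a lambda in a variable, as in the last lines, works but defeats its purpose; if a function deserves a name, a regular def is usually clearer.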

# Control Flow

Until now, we've worked with single items: for example, we sliced a string or wrote a file. But what if we want to do something (perform an action) on a sequence of objects? The first type of loop we explore is the for loop.

## For statement

Suppose that we have a list of our colleagues' years of birth and we want to calculate their ages, print them and put them in a new list called ages:

In [88]:
years = [1987, 1990, 1980, 1973]
ages = [] # creating an empty list

We know how to get a list's items using indexes. So what if we take each item and subtract it from 2019?

In [89]:
age_1 = 2019 - years[0]
print(age_1)
ages.append(age_1)

age_2 = 2019 - years[1]
print(age_2)
ages.append(age_2)

age_3 = 2019 - years[2]
print(age_3)
ages.append(age_3)

age_4 = 2019 - years[3]
print(age_4)
ages.append(age_4)

32
29
39
46


In [90]:
print(ages)

[32, 29, 39, 46]


Ok, we did what we were asked to do! But I think you'll agree that it wasn't very efficient: we've written 12 lines of code to perform three simple operations on 4 items! Imagine you want to calculate the age of all Bicocca employees! Fortunately, for repetitive tasks like this (when we are dealing with sequence objects like lists, tuples, dictionaries or strings) we can use loops. In this case we're going to use a for loop:

In [98]:
ages = []
for year in years:
    age = 2019 - year
    print(age)
    ages.append(age)

32
29
39
46


This is the general structure of a for loop:

<img src="Images/for_loop.png" width="500"> 

*Notice: for loops can have an "else" clause but we're not going to talk about them here*

<img src="Images/baby.svg"   width="30" align="left">               

**YOUR TURN**:
In the previous example, why can't we define the empty list inside the loop instead of outside it?

In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

## While statement

While loops have a similar objective to for loops, but whereas a for loop, in theory, iterates through all the items in the provided sequence, a while loop keeps going as long as a certain condition holds.
Suppose we want to print the numbers between 0 and 10 and we want to do it using **while**.
Let's see the info we have from the question:

- we should print numbers
- numbers should be between 0 and 10

In [10]:
num = 0 # initial number is set to 0
while num <= 10: # while the number is less than or equal to 10, continue
    print(num, end='-')
    num = num + 1 # add one to the current value of num (same as num += 1)

0-1-2-3-4-5-6-7-8-9-10-

As with most things, we can achieve the same result using different methods and functions. For example, we can get the numbers we want by using a for loop:

In [12]:
for num in range(11): # what is range?
    print(num, end='\t')

0	1	2	3	4	5	6	7	8	9	10	

*Note: Just like for loops, while loops can also have an else clause*

Here is the general structure of a while loop:

<img src="Images/while_loop.png" width="500"> 

<img src="Images/student.svg"   width="30" align="left">               

**YOUR TURN**:
What if we use a condition which is always true?

In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

## If statement

Often, you need to execute some statements only if some condition holds (i.e. it's True), or choose which statements to execute depending on several mutually exclusive conditions. The Python compound statement if, which uses if, elif, and else clauses, lets you conditionally execute blocks of statements.

Let's go back to the simple example of printing numbers we've seen for the while and for statements, but this time suppose that you want to print the number only if it's odd:

In [3]:
num = 0 # initial number is set to 0
while num <= 20: # while the number is less than or equal to 20, continue
    if num%2 != 0: # if the remainder of dividing by 2 is not zero (odd number)
        print(num)
    num = num + 1 # add one to the current value of num (same as num += 1)

1
3
5
7
9
11
13
15
17
19


Ok, now let's also use the *else* and *elif* clauses. Imagine that you have a set of ages and you want to print one of the following labels based on the age:
- Kid
- Teen
- Adult
- Old

In [91]:
ages = [56, 2, 14, 8, 33, 23, 19, 80, 6, 9, 27]

for age in ages:
    if age <= 12:
        print(age, 'Kid', sep=' - ')
    elif age <= 19 and age >= 13:
        print(age, 'Teen', sep=' - ')
    elif age <= 60 and age >= 20:
        print(f'{age} - Adult') # Did you notice anything new here?
    else:
        print(age, 'Old', sep=' - ')

56 - Adult
2 - Kid
14 - Teen
8 - Kid
33 - Adult
23 - Adult
19 - Teen
80 - Old
6 - Kid
9 - Kid
27 - Adult
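
Because the clauses are tested from top to bottom and only the first matching one runs, the lower bounds in the example above are not strictly needed. A sketch of the same logic packed into a function (label_age is a hypothetical name, not something defined earlier):

```python
def label_age(age):  # hypothetical helper name
    """Return the label used in the example above for the given age."""
    if age <= 12:
        return 'Kid'
    elif age <= 19:   # ages up to 12 were already handled, so this means 13-19
        return 'Teen'
    elif age <= 60:   # likewise, this branch only sees ages 20-60
        return 'Adult'
    else:
        return 'Old'

print([label_age(age) for age in [56, 2, 14, 80]])  # ['Adult', 'Kid', 'Teen', 'Old']
print(13 <= 14 <= 19)  # comparisons can also be chained: True
```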


## Break, Continue and pass clauses

Let's have a quick review of statements we have learned until now:
- **For statement** : repeats a task for a given sequence
- **While statement** : repeats a task while a given condition is True
- **If statement** : Evaluates a condition and acts upon the result

Looking at these statements, it's clear that we can combine them to get a certain task done. For instance: for each item in this list, if condition 1 is True do action 1, otherwise do action 2:

In [106]:
salaries = [1000000, 5000, 11000]

for salary in salaries:
    if salary < 120_000:
        print(f'{salary:,} : Low')
    elif salary > 999_000:
        print(f'{salary:,} : High')
    else:
        print(f'{salary:,} : Normal')

1,000,000 : High
5,000 : Low
11,000 : Low


In the above example we used **for** and **if** together to examine each item and print a label based on the item's value. However, we don't always need to go through every item of a sequence. Sometimes we need to break out of the loop, or move on to the next item without spending more time on the current one. We can achieve these types of action by using these statements:

- continue
- break
- pass

Suppose you want to find the first number that is less than a specific number in any given list of numbers. You can create a function to do this using a for loop:

In [107]:
def find_number(data, limit):
    """Finds the first number in a given list which is less than the defined threshold
    
    Args:
        data : A list containing numerical values
        limit : Threshold defined by user
    
    Returns:
        The first number in the given list smaller than threshold
    """
    all_good = []
    for i in data:
        if i < limit:
            answer = i
            all_good.append(i)
    return all_good

Ok, let's test our function:

In [108]:
nums = [973, -1428, 315, -788, 1477, -428, -1187, 592, -108, 370]
limit = -500

print(find_number(nums, limit))

[-1428, -788, -1187]


Well, it kinda works... the problem is that it returns every number in the list that is less than -500 instead of only the first one (-1428)!
Of course it's not because Python couldn't find the first number (-1428); the problem is that, since we simply used a for loop, after finding -1428 Python continued to examine the other items in the list... something that we shouldn't have done. There are cases like this in which we want to stop (break) the loop as soon as a certain condition is satisfied. We can write our function like this:


In [22]:
def find_number(data, limit):
    """Finds the first number in a given list which is less than the defined threshold
    
    Args:
        data : A list containing numerical values
        limit : Threshold defined by user
    
    Returns:
        The first number in the given list smaller than threshold
    """
    answer = None # returned if no number is smaller than the limit
    for i in data:
        if i < limit:
            answer = i
            break
    return answer

print(find_number(nums, limit))

-1428


And as you can see, this time we get the right answer. In this simple case we could also have achieved this without using break, just by putting the return statement in a different place:

In [23]:
def find_number(data, limit):
    """Finds the first number in a given list which is less than the defined threshold
    
    Args:
        data : A list containing numerical values
        limit : Threshold defined by user
    
    Returns:
        The first number in the given list smaller than threshold
    """
    for i in data:
        if i < limit:
            answer = i
            return answer

print(find_number(nums, limit))

-1428


The reason that the function above gives us the answer even without using break is that when Python reaches a **return** in a function, it exits the function immediately, without running the rest of its code.
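
A compact sketch of this behaviour, including the case where no item satisfies the condition (find_first_below is a hypothetical name, and the shortened list is just for illustration):

```python
def find_first_below(data, limit):  # hypothetical name
    """Return the first item of data smaller than limit, or None if there is none."""
    for item in data:
        if item < limit:
            return item  # leaves the function (and therefore the loop) right away
    return None  # reached only when no item satisfied the condition

nums = [973, -1428, 315, -788]
print(find_first_below(nums, -500))   # -1428
print(find_first_below(nums, -5000))  # None
```

Without the final line the function would still return None in the second case, since a function that ends without reaching a return gives back None; writing it out simply makes that explicit.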
Let's do another example: suppose you received 4 orders for a can of beer from 4 of your customers. In order to know whether you can proceed with an order, you need to check 3 things: the client is at least 18 years old, has at least 2 Euros on their bank card, and is within a 3 kilometer radius of your shop. As you can see in the following cell, we can use the continue statement after the age check. This way, if the client is younger than 18, we skip the other (now useless) checks:

In [27]:
clients = {'2341': {'age': 15, 'name': 'Ale', 'balance': 658, 'distance': 11},
           '5682': {'age': 61, 'name': 'Ettore', 'balance': 890, 'distance': 1},
           '1873': {'age': 19, 'name': 'Tacca', 'balance': 121, 'distance': 5},
           '9950': {'age': 31, 'name': 'Navid', 'balance': 0, 'distance': 12}}

codes = list(clients.keys())

for code in codes:
    if clients[code]['age'] < 18:
        continue
    if clients[code]['balance'] >= 2 and clients[code]['distance'] < 3:
        print(f"{clients[code]['name']} can have a beer!")

Ettore can have a beer!


Now imagine that the shop manager tells you that he is thinking of doing something for the clients that satisfy the other conditions (age and credit) but live in more distant neighbourhoods. The problem is that it's Friday evening, you want to go home as soon as possible, but he can't make up his mind about how exactly he's going to handle the delivery situation. So you know you need to modify your code to reflect the new situation, but you still don't know what the solution will be. In this case you can use a pass statement as a placeholder in your code:

In [29]:
clients = {'2341': {'age': 15, 'name': 'Ale', 'balance': 658, 'distance': 11},
           '5682': {'age': 61, 'name': 'Ettore', 'balance': 890, 'distance': 1},
           '1873': {'age': 19, 'name': 'Tacca', 'balance': 121, 'distance': 5},
           '9950': {'age': 31, 'name': 'Navid', 'balance': 0, 'distance': 12}}

codes = list(clients.keys())

for code in codes:
    if clients[code]['age'] < 18:
        continue
    if clients[code]['balance'] >= 2:
        if  clients[code]['distance'] < 3:
            print(f"{clients[code]['name']} can have a beer!")
        else:
            pass

Ettore can have a beer!


<img src="Images/baby.svg"   width="30" align="left" >               

**YOUR TURN:**
    
Why are the two pieces of code below not equivalent?

In [None]:
vals = '43 65 342 78 89 4 32 54 687 23 675 87 43 675 -546 54 432 786 76 123 4 324'.split()

In [None]:
for val in vals:
    if int(val) < 0:
        break
    print(val, end=' ')

In [None]:
for val in vals:
    if int(val) > 0:
        print(val, end=' ')

<img src="Images/wizard.svg"   width="30" align="left">               

**YOUR TURN**: Use **break** to write a function that calculates the sum of the first n items of the given number list.

In [69]:
def sum_n(data, n):
    num = 1
    summ = 0
    for i in data:
        if num > n:
            break
        summ += i
        num += 1
    return summ

In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

## Comparison operators

Comparison operators are not technically considered a part of "Control flow", but since they are used so often with the statements mentioned above, I put them here. As the name suggests, these operators **compare** two objects and return either *True* or *False*. Let's see some examples:

In [4]:
a = 15.6
b = 79
print(a == b)
print(a != b)
print(a < b)
print(a > b)
print(a <= b)
print(a >= b)

False
True
True
False
True
False
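
Comparisons can also be chained or combined. The following sketch reuses a and b from above; the **and** / **or** keywords are Python's logical operators, and the extra numbers are arbitrary:

```python
a = 15.6
b = 79

chained = a < b <= 100       # same as (a < b) and (b <= 100)
both = a > 10 and b > 100    # and needs both sides to be True
either = a > 10 or b > 100   # or needs at least one side to be True
same_value = 1 == 1.0        # an int and a float are compared by value

print(chained, both, either, same_value)  # True False True True
```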


# Importing libraries

To read csv files, we used:

In [8]:
import csv
import pandas as pd

We did this because, like lots of other functionalities, reading csv files doesn't exist as a built-in function in the Python language. Instead, there are various libraries with modules that are capable of performing those functions. To use these modules you need to import them in your code. This is exactly what we did in the csv section.

The following figure shows the simplified structure of Python's modules:

<img src="Images/module.png" width="800">

There are different ways to import a package or a specific part of it:

Importing a library as a whole. For example, the following code imports the **re** module (regular expressions) as a whole, without importing any of its functions by name. It means that if I want to use one of its functions, let's say the **sub** function, I need to put **re.** before the function name:

In [103]:
import re
re.sub(r'\W', '', 'Hey..,Wait!') # raw string, so that \W reaches re unchanged

'HeyWait'

It's possible to import modules under an alias. We usually do this either when dealing with not-so-short library names or when there is a convention to do so:

In [108]:
import statsmodels.api as sm
import numpy as np

Another possibility is to import just the name we need (here a class) instead of importing the whole library:

In [105]:
from pandas import Timestamp

In some rare cases we may want to import every function/module/class from a library:

In [None]:
from pprint import *
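
The first three import styles side by side, sketched with the standard-library **math** module so that nothing needs to be installed (the alias m is an arbitrary choice, not a convention):

```python
import math                # whole module: names need the "math." prefix
import math as m           # the same module under an alias
from math import sqrt, pi  # only the names we need, no prefix required

print(math.sqrt(16))  # 4.0
print(m.floor(2.7))   # 2
print(round(pi, 2))   # 3.14
print(sqrt(9))        # 3.0
```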

## Installing libraries

There are 4 main ways to install a package in python:

- **conda**

conda is the package manager that is part of the Anaconda distribution (not of the official Python distribution). Compared to the native Python package managers, it manages the dependencies between different libraries in a better way. Its shortcoming is that it usually doesn't include lesser-known or very recent packages.

Installing a package using conda in terminal:  

        conda install package_name

Installing a package using conda in Jupyter notebook:    

        !conda install package_name


- **pip**

pip is the official Python package installer (introduced in 2008).

Installing a package using pip in terminal:     

        pip install package_name

Installing a package using pip in Jupyter notebook:     

        !pip install package_name

- **easy_install**

easy_install is the father of pip (born in 2004) and is inferior to pip in several respects (see their differences [here](https://packaging.python.org/discussions/pip-vs-easy-install/)). It is now deprecated, so prefer pip.

Installing a package using easy_install in terminal:   

        easy_install package_name

Installing a package using easy_install in Jupyter notebook:     

        !easy_install package_name

- **installation from source code**

I don't recommend it, but if for some reason you want to use this method, you can find a step by step guide [here](https://kb.iu.edu/d/acey).
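
Whichever method you use, you can check from Python whether a package is available before trying to import it. A minimal sketch using the standard-library importlib module (is_installed is a hypothetical helper name, and the second package name is deliberately made up):

```python
import importlib.util

def is_installed(package_name):  # hypothetical helper name
    """Return True if a top-level package or module with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None

print(is_installed('csv'))                            # True: part of the standard library
print(is_installed('a_package_that_does_not_exist'))  # False
```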

# Understanding errors

Regardless of your experience with Python, whether you're seeing Python for the first time or you have years of coding experience, you will have to deal with errors almost every time you write something! Some errors occur for simple reasons like writing *primt(...)* instead of *print(...)*, and some are more complicated, both to resolve and to trace back to their origin. There are way too many error types in Python for us to cover in this course, but you'll find a well-written summary of the important ones [here](https://www.tutorialsteacher.com/python/error-types-in-python).
Instead of going through each and every error and explaining it, I'll try to show you the general process of dealing with them.
Take a look at the following code:

In [112]:
my_dict = {'A':1, 'B': 2}
my_list = ['C', 'D']

for item in my_list:
    my_dict.append(my_dict)

AttributeError: 'dict' object has no attribute 'append'

**Error Translation into natural language**

>I started from line 1 in the main **module** and went down to line 4 without any problem. In line 5 I saw an error of type **AttributeError**. I see this type of error because in line 5 you tried to use the **append** method on a **dictionary**, which has no such method.

Now let's take a look at another example, which is a bit more complex than the previous one:

In [66]:
def add_them(a, b):
    """Add two input numbers
    args:
        a : first number
        b : second number
        
    returns:
        a number, sum of the inputs
    """
    return a + b

print(add_them('1', 2))

TypeError: can only concatenate str (not "int") to str

**Error Translation into natural language**

>I started from line 1 and went down to line 11 without any problem. In line 12 I saw something that wasn't correct, so I tried to find the reason. I found out that the error comes from a function called **add_them**. I put your inputs in this function and went forward. When I reached line 10 I saw an error of type **TypeError**. I see this type of error because, using your input data, in line 10 you tried to concatenate a **str** with an **int**, which is impossible.
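
One way to act on this translation, sketched under the assumption that the string '1' was meant to be a number: we first catch the error to look at it, then convert the input before calling the function.

```python
def add_them(a, b):
    return a + b

try:
    add_them('1', 2)
except TypeError as error:
    # the exception object carries the same message shown in the traceback
    error_name = type(error).__name__
    print(error_name, '->', error)

fixed = add_them(int('1'), 2)  # converting the string first avoids the error
print(fixed)  # 3
```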

**GENERAL TIPS**:

It's possible that ...

- After resolving an error, you get another error! Always pay attention to the details of the error.
- Even using all the info in the error message, you can't resolve the problem. In this case you can try to search for the error message online. Most likely someone else has run into the same problem before you and asked about it online! A very good source of help is the [stackoverflow](https://stackoverflow.com) site. 

<img src="Images/student.svg"   width="30" align="left">               

**YOUR TURN**:

Run the following line of code and try to locate and resolve the problem:

In [120]:
a, b, c, d = list(range(87, 110, 10))

ValueError: not enough values to unpack (expected 4, got 3)

In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

<img src="Images/wizard.svg"   width="30" align="left">               

**YOUR TURN**:

Run the following lines of code and try to locate and resolve the problems:

In [None]:
city_1 = {'country': 'Italy', 'city': 'como', 'population': 84_876}
city_2 = {'country': 'Germany', 'city': 'berlin', 'population': 3_575_000}

data = []
data['IC'] = city_1
data['GB'] = city_2

ids = data.keys()
countries = []
cities = []
populations = []
for idd in ids:
    temp = data[idd]
    countries.add(temp['country'])
    cities.append(temp['city'])

print(f'There are {len(countries)} countries:')
for num, country in enumerate(countries, start=1):
    print(num, country.title(), sep=' - ')

print(f'\nand {len(cities)} cities:')
for city in enumerate(cities, start=1):
    print(num, city.title(), sep=' - ')

print()
for id in ids:
    temp = data[idd]
    print(f"{temp['city'].title()} is in {temp['country'].title()} and has a population of {temp['population']:,}.")

In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

## Error Handling

In sections 8 and 12 we saw some examples of syntax errors and exceptions, and we also saw how to *translate* error messages into *human language*. In this part we will take a deeper look at errors and how we can __handle__ them.

First, let's have a recap of different types of errors:

__Syntax Error__

In [168]:
my_string = 'It's 6 AM and I'm already working!'
print(my_string)

SyntaxError: invalid syntax (<ipython-input-168-ad95df0c582c>, line 1)

__Exception Error__

In [169]:
some_list = [x[0] for x in [55, 'foo', 'bar', 42]]
print(some_list)

TypeError: 'int' object is not subscriptable

<img src="Images/student.svg"   width="30" align="left">               

**YOUR TURN:**
    
Rewrite the two code blocks above so that they run correctly. The second one should produce ['5', 'f', 'b', '4'].

In [170]:
my_string = "It's 6 AM and I'm already working!"
print(my_string)
some_list = [str(x)[0] for x in [55, 'foo', 'bar', 42]]
print(some_list)

It's 6 AM and I'm already working!
['5', 'f', 'b', '4']


### Raising your own Exceptions

We saw how Python uses __Exceptions__ to show that you are doing something wrong. In the same way, you can __raise__ your own exceptions to warn users and stop the code.

Let's make a function which gets a string as __"name surname"__ and prints out __"Your name is name and your surname is surname"__:

In [171]:
def simple_fun(text):
    """
    gets a string of name and surname separated by space and prints them in a template
    """
    
    # splitting the string by space
    name, surname = text.split(' ',1)
    print(f'Your name is {name} and your surname is {surname}')

In [172]:
simple_fun('Mimmo Paladino')

Your name is Mimmo and your surname is Paladino


But what happens if someone gives just her name or surname instead?

In [173]:
simple_fun('Oliver')

ValueError: not enough values to unpack (expected 2, got 1)

The error we get is a __ValueError__, indicating that Python expected two values to unpack (name and surname) but got only one.
While this may seem easy to understand, someone who doesn't know Python (or simply isn't familiar with your code) may not find it so easy to see what caused the error. We can solve this problem by raising a custom exception in our function.

In [174]:
def simple_fun(text):
    """gets a string of name and surname separated by space and prints them in a template"""
    
    if ' ' not in text :
        raise Exception("It seems you just entered your name or surname. I need something like this : 'Rickey Gervais'")
    
    # splitting the string by space
    name, surname = text.split(' ', 1)
    print(f'Your name is {name} and your surname is {surname}')

Let's check to see if it works:

In [175]:
simple_fun('Oliver')

Exception: It seems you just entered your name or surname. I need something like this : 'Rickey Gervais'
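
The example above raises the generic `Exception`. A common refinement, sketched here as a suggestion rather than as the notebook's own approach, is to raise a more specific built-in type such as `ValueError`, so that callers can tell this problem apart from others. The name `split_full_name` is hypothetical:

```python
def split_full_name(text):
    """Return (name, surname) from a string like 'name surname'."""
    if ' ' not in text:
        # ValueError: the argument has the right type but an unsuitable value
        raise ValueError(f"Expected 'name surname', got {text!r}")
    name, surname = text.split(' ', 1)
    return name, surname

print(split_full_name('Mimmo Paladino'))
```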

__General Form__

<img src="Images/exception.png" width="800"> 

### Assertion

What we did by raising an exception can also be achieved with an __assertion__, which is simply a *check on a condition*: if the condition is false, Python raises an `AssertionError`.

In [178]:
def simple_fun(text):
    """gets a string of name and surname separated by space and prints them in a template"""
    
    assert ' ' in text, "It seems you just entered your name or surname. I need something like this : 'Rickey Gervais'"
    
    # splitting the string by space
    name, surname = text.split(' ', 1)
    print(f'Your name is {name} and your surname is {surname}')

In [179]:
simple_fun('Oliver')

AssertionError: It seems you just entered your name or surname. I need something like this : 'Rickey Gervais'
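
One caveat worth knowing: assertions are skipped entirely when Python is started with the `-O` flag, so they suit internal sanity checks better than validation of user input. A small sketch of such a sanity check, using a hypothetical function `average`:

```python
def average(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # Sanity check: dividing by len(numbers) needs at least one element
    assert len(numbers) > 0, "numbers must not be empty"
    return sum(numbers) / len(numbers)

print(average([2, 4, 6]))
```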

<img src="Images/wizard.svg"   width="30" align="left">               

**YOUR TURN:**
    
Use exception raising to write a function that takes a length in centimeters and converts it to inches. The function should warn the user if the input is not a number.

- Hint 1 : a number in this case can be either an integer or a float
- Hint 2 : 1 cm is 0.393701 inch

In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

In [278]:
cm2inch('cane')

Exception: Please enter an integer or float not a <class 'str'>

<img src="Images/wizard.svg"   width="30" align="left">               

**YOUR TURN:**
    
Solve the same exercise using an assertion instead of raising an exception.

In [None]:
(\   .-.   .-.   .-.   .-.   .-.   .-.   .-.   .-.   /_")
 \\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//^\\_//
  `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`   `"`

In [185]:
cm2inch('cane')

AssertionError: Please enter an integer or float not a <class 'str'>

__General form__

<img src="Images/assertion.png" width="800"> 

### Try/Except

While raising exceptions and asserting are helpful ways to make sure the code behaves as intended, they don't actually _handle_ errors: the code still crashes whenever the exception or assertion is triggered.

That's where try and except blocks come in!

Let's see an example:

In [186]:
from pprint import pprint

info = {'id_1': {'name': 'Fabio M', 'debt': 67_000, 'balance': 10_000},
       'id_2': {'name': 'Mario M', 'debt': 0, 'balance': 350_000},
       'id_3': {'name': 'Ale V', 'debt': 100, 'balance': 60_000},
       'id_4': {'name': 'Mauro', 'debt': 130, 'balance': 100_000}}

ids = info.keys()
for iid in ids:
    info[iid]['new_val'] = round(info[iid]['balance'] / info[iid]['debt'], 1)

pprint(info)

ZeroDivisionError: division by zero

We're getting ZeroDivisionError and it's crashing our code. We can use try/except to catch the exception:

In [187]:
info = {'id_1': {'name': 'Fabio M', 'debt': 67_000, 'balance': 10_000},
       'id_2': {'name': 'Mario M', 'debt': 0, 'balance': 350_000},
       'id_3': {'name': 'Ale V', 'debt': 100, 'balance': 60_000},
       'id_4': {'name': 'Mauro ', 'debt': 130, 'balance': 100_000}}

ids = info.keys()
for iid in ids:
    try:
        info[iid]['new_val'] = round(info[iid]['balance'] / info[iid]['debt'], 1)
        
    except ZeroDivisionError:
        info[iid]['new_val'] = 0
        
pprint(info)

{'id_1': {'balance': 10000, 'debt': 67000, 'name': 'Fabio M', 'new_val': 0.1},
 'id_2': {'balance': 350000, 'debt': 0, 'name': 'Mario M', 'new_val': 0},
 'id_3': {'balance': 60000, 'debt': 100, 'name': 'Ale V', 'new_val': 600.0},
 'id_4': {'balance': 100000, 'debt': 130, 'name': 'Mauro ', 'new_val': 769.2}}


As you can see, unlike last time, our code finished its job. We just handled an error!
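
If you also want to know what went wrong, `except ... as` gives you the exception object itself. A minimal sketch; the helper name `safe_ratio` and the choice to return `None` are illustrative assumptions:

```python
def safe_ratio(balance, debt):
    """Return balance / debt rounded to 1 decimal, or None if debt is 0."""
    try:
        return round(balance / debt, 1)
    except ZeroDivisionError as error:
        # error holds the exception instance; str(error) is its message
        print(f'Could not compute ratio: {error}')
        return None

print(safe_ratio(60_000, 100))
print(safe_ratio(350_000, 0))
```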

We can also handle more than one exception type. First, let's add a step that can fail in a different way:

In [188]:
info = {'id_1': {'name': 'Fabio M', 'debt': 67_000, 'balance': 10_000},
       'id_2': {'name': 'Mario M', 'debt': 0, 'balance': 350_000},
       'id_3': {'name': 'Ale V', 'debt': 100, 'balance': 60_000},
       'id_4': {'name': 'Mauro ', 'debt': 130, 'balance': 100_000}}

ids = info.keys()
for iid in ids:
    try:
        info[iid]['new_val'] = round(info[iid]['balance'] / info[iid]['debt'], 1)
        
    except ZeroDivisionError:
        info[iid]['new_val'] = 0
    
    name, surname = info[iid]['name'].split()
    info[iid]['name'] = name
    info[iid]['surname'] = surname
        
pprint(info)

ValueError: not enough values to unpack (expected 2, got 1)

In [189]:
info = {'id_1': {'name': 'Fabio M', 'debt': 67_000, 'balance': 10_000},
       'id_2': {'name': 'Mario M', 'debt': 0, 'balance': 350_000},
       'id_3': {'name': 'Ale V', 'debt': 100, 'balance': 60_000},
       'id_4': {'name': 'Mauro ', 'debt': 130, 'balance': 100_000}}

ids = info.keys()
for iid in ids:
    try:
        info[iid]['new_val'] = round(info[iid]['balance'] / info[iid]['debt'], 1)
        
    except ZeroDivisionError:
        info[iid]['new_val'] = 0
    
    try:
        name, surname = info[iid]['name'].split()
        info[iid]['name'] = name
        info[iid]['surname'] = surname
    except ValueError:
        info[iid]['surname'] = ''
        
pprint(info)

{'id_1': {'balance': 10000,
          'debt': 67000,
          'name': 'Fabio',
          'new_val': 0.1,
          'surname': 'M'},
 'id_2': {'balance': 350000,
          'debt': 0,
          'name': 'Mario',
          'new_val': 0,
          'surname': 'M'},
 'id_3': {'balance': 60000,
          'debt': 100,
          'name': 'Ale',
          'new_val': 600.0,
          'surname': 'V'},
 'id_4': {'balance': 100000,
          'debt': 130,
          'name': 'Mauro ',
          'new_val': 769.2,
          'surname': ''}}


__General form__

<img src="Images/try_except.png" width="600"> 

You can use except without specifying any particular exception, but a bare except also hides errors you did not anticipate, such as a typo in a variable name. __It's a bad idea! Don't use such a thing unless you are desperate!__
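
A safer alternative is to list exactly the exception types you expect in a tuple. The sketch below uses a hypothetical helper `to_ratio` and made-up inputs:

```python
def to_ratio(balance, debt):
    """Return balance / debt, or 0 when the inputs cannot be divided."""
    try:
        return balance / debt
    except (ZeroDivisionError, TypeError) as error:
        # Only these two types are caught; anything else still propagates
        print(f'{type(error).__name__}: {error}')
        return 0

print(to_ratio(10, 4))
print(to_ratio(10, 0))
print(to_ratio(10, 'four'))
```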

### Else/Finally

There are two other statements that can be used with a try/except block: *else*, which runs only when the try block raised no exception, and *finally*, which always runs. Let's see an example:

In [190]:
data_1 = [('89768745', 2), ('87443665', -1), ('45789345', '1')]
data_2 = [('89768745', 2), ('87443665', -1), ('45789345', 1)]

for item in data_1:
    print(f'answer  : {item[0][item[1]]}')

answer  : 7
answer  : 5


TypeError: string indices must be integers

In [286]:
data_1 = [('89768745', 2), ('87443665', -1), ('45789345', '1')]
data_2 = [('89768745', 2), ('87443665', -1), ('45789345', 1)]

try:
    for item in data_1:
        print(f'answer  : {item[0][item[1]]}')
        
except TypeError:
    print(f'wrong index --> {item[0]}')
    

else:
    print('All pairs are valid')

finally:
    print('\nAll pairs are controlled')
    
print('-' * 50)

try:
    for item in data_2:
        print(f'answer  : {item[0][item[1]]}')
        
except TypeError:
    print(f'wrong index --> {item[0]}')
    
else:
    print('All pairs are valid')

finally:

    print('\nAll pairs are controlled')

answer  : 7
answer  : 5
wrong index --> 45789345

All pairs are controlled
--------------------------------------------------
answer  : 7
answer  : 5
answer  : 5
All pairs are valid

All pairs are controlled
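
The order in which the blocks run can be made visible by recording each step. This is only a sketch; the function name `char_at` and the `steps` list are assumptions made for illustration:

```python
def char_at(text, index):
    """Return (character or None, list of blocks that were executed)."""
    steps = []
    result = None
    try:
        steps.append('try')
        result = text[index]
    except TypeError:
        steps.append('except')   # index was not an integer
    else:
        steps.append('else')     # no exception was raised
    finally:
        steps.append('finally')  # always executed
    return result, steps

print(char_at('89768745', 2))
print(char_at('45789345', '1'))
```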


__General form__

<img src="Images/tryexceptelsefinally.png" width="500"> 


## .py files
- Creating a new notebook
- Converting a notebook to a .py file
- Running a .py file
- Reading from a .py file
- argparse
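
As a sketch of the last item, here is what the argument handling of a small script using `argparse` might look like. The argument names and the function `build_parser` are hypothetical. In a real script you would call `parse_args()` with no arguments so that it reads the command line; here an explicit list is passed so the example also runs inside a notebook:

```python
import argparse

def build_parser():
    """Create the command-line parser for a hypothetical greeting script."""
    parser = argparse.ArgumentParser(description='Print a greeting.')
    parser.add_argument('name', help='who to greet')
    parser.add_argument('--times', type=int, default=1,
                        help='how many times to repeat the greeting')
    return parser

# Equivalent to running: python some_script.py Ada --times 2
args = build_parser().parse_args(['Ada', '--times', '2'])

for _ in range(args.times):
    print(f'Hello, {args.name}!')
```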

In [7]:
import my_functions as mf

In [2]:
mf.text_cleaner("   I'm just a test text...")

"I'm just a test text"

In [6]:
mf.text_splitter("   I'm just a test text...")

AttributeError: module 'my_functions' has no attribute 'text_splitter'

<img src="Images/wizard.svg"   width="30" align="left">               

**YOUR TURN:**
    
It seems we forgot to add the **text_splitter** function!
Add this function, which takes a string, first runs it through text_cleaner, then splits the result on *space* and returns the result.

In [10]:
# if we add something to my_functions, we can't simply re-import it with `import my_functions as mf`
# instead we need to use importlib.reload (the other way is to restart the kernel)
# note: the older imp module was removed in Python 3.12
import importlib
importlib.reload(mf)

mf.text_splitter("   I'm just a test text...")

["I'm", 'just', 'a', 'test', 'text']
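
To see the reload mechanism in isolation, the sketch below creates a throwaway module in a temporary folder, imports it, adds a function to the file, and reloads it. The module name `demo_module` and its functions are hypothetical and exist only for this demonstration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a small module file in a scratch folder
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / 'demo_module.py'
module_file.write_text("def greet():\n    return 'hello'\n")

# Make the scratch folder importable, then import the module
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import demo_module

has_farewell_before = hasattr(demo_module, 'farewell')

# Add a second function to the file, as you would in an editor
module_file.write_text(
    "def greet():\n    return 'hello'\n\n"
    "def farewell():\n    return 'bye'\n"
)

# A plain `import demo_module` would do nothing here; reload re-reads the file
importlib.reload(demo_module)
has_farewell_after = hasattr(demo_module, 'farewell')

print(has_farewell_before, has_farewell_after)
```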

## Navigation

In [46]:
import os
os.listdir()

['etc_files',
 'Images',
 'PyStyle-master',
 '__pycache__',
 'my_functions.py',
 'README.md',
 'Results',
 'Py_intro.ipynb',
 'MedSys.py',
 '.gitignore',
 'Py_intro_temp.ipynb',
 'desc_csv.py',
 '.ipynb_checkpoints',
 'data_preparation.py',
 '.git',
 'example_dir',
 '.vscode',
 'Data',
 'TBD.txt']
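
`os.listdir()` returns files and folders mixed together, in arbitrary order. A small sketch of how they might be separated; the variable names are my own:

```python
import os

entries = sorted(os.listdir('.'))   # '.' is the current working directory
folders = [e for e in entries if os.path.isdir(e)]
files = [e for e in entries if os.path.isfile(e)]

print('Working directory:', os.getcwd())
print('Folders:', folders)
print('Files:', files)
```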

In [5]:
ls

Data/                README.md            data_preparation.py
Images/              Results/             desc_csv.py
MedSys.py            TBD.txt              etc_files/
Py_intro.ipynb       __pycache__/         my_functions.py


In [47]:
os.mkdir('example_directory/')

In [49]:
ls

Data/                README.md            etc_files/
Images/              Results/             example_dir/
MedSys.py            TBD.txt              example_directory/
PyStyle-master/      __pycache__/         my_functions.py
Py_intro.ipynb       data_preparation.py
Py_intro_temp.ipynb  desc_csv.py


In [50]:
mkdir example_directory

mkdir: example_directory: File exists
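
The `File exists` message above has a Python counterpart: `os.mkdir` raises `FileExistsError` when the folder is already there. A minimal sketch of two ways around it, run inside a temporary folder so that nothing in your project is touched (the folder names are made up):

```python
import os
import tempfile

base = tempfile.mkdtemp()                  # scratch folder for the demo
target = os.path.join(base, 'example_directory')

os.mkdir(target)                           # first call succeeds
try:
    os.mkdir(target)                       # second call fails
except FileExistsError:
    print('Folder already exists')

# makedirs with exist_ok=True does nothing if the folder is already there,
# and it also creates any missing parent folders
os.makedirs(target, exist_ok=True)
nested = os.path.join(base, 'a', 'b', 'c')
os.makedirs(nested, exist_ok=True)
print(os.path.isdir(nested))
```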


### Filename pattern matching

In [51]:
for f_name in os.listdir():
    if f_name.endswith('.py'):
        print(f_name)

my_functions.py
MedSys.py
desc_csv.py
data_preparation.py


In [1]:
import glob

In [62]:
for file in glob.glob('*.py'):
    print(file)

my_functions.py
MedSys.py
desc_csv.py
data_preparation.py


In [5]:
ls

Data/                Py_intro_temp.ipynb  desc_csv.py
Exercise.ipynb       README.md            etc_files/
Images/              Results/             example_dir/
MedSys.py            TBD.txt              example_directory/
PyStyle-master/      __pycache__/         lecture_notes/
Py_intro.ipynb       data_preparation.py  my_functions.py


In [6]:
for file in glob.iglob('Data/*', recursive=True):
    print(file)

Data/urg_1.csv
Data/review.txt
Data/urg_2.csv
Data/adult.csv
Data/Daily_Demand_Forecasting_Orders.csv
Data/stock_px.csv
Data/emp_data
Data/AirQualityUCI.csv
Data/type_1.csv
Data/AirQualityUCI.zip
Data/type_2.csv
Data/bank_2.csv
Data/adult_names.txt
Data/bank_1.csv
Data/AirQualityUCI.xlsx
Data/totals.csv
Data/description.txt
Data/data_types.names
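
Note that `recursive=True` only changes the result when the pattern contains `**`, which matches any number of nested folders. A self-contained sketch in a temporary folder (all file and folder names are invented for the demo):

```python
import glob
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'Data', 'emp_data'))
for rel_path in ['Data/a.csv', 'Data/notes.txt', 'Data/emp_data/b.csv']:
    with open(os.path.join(base, rel_path), 'w') as f:
        f.write('')

# '*' stays in one folder: only Data/a.csv matches
flat = glob.glob(os.path.join(base, 'Data', '*.csv'))

# '**' with recursive=True also descends into sub-folders
deep = glob.glob(os.path.join(base, 'Data', '**', '*.csv'), recursive=True)

print(len(flat), len(deep))
```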


In [24]:
from pathlib import Path

p = Path('.')
for name in p.glob('*.p*'):
    print(name)

my_functions.py
MedSys.py
desc_csv.py
data_preparation.py
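
The pattern `*.p*` would also match names such as `plot.png`. If only Python files are wanted, `*.py` or a check on `Path.suffix` is stricter. A sketch using a temporary folder with invented file names:

```python
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
for name in ['script.py', 'plot.png', 'notes.txt']:
    (folder / name).touch()

# Loose pattern: any extension starting with 'p'
loose = sorted(p.name for p in folder.glob('*.p*'))

# Strict check: the extension must be exactly '.py'
strict = sorted(p.name for p in folder.iterdir() if p.suffix == '.py')

print(loose)
print(strict)
```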


### Deleting

In [68]:
ls

Data/                README.md            etc_files/
Images/              Results/             example_dir/
MedSys.py            TBD.txt              example_directory/
PyStyle-master/      __pycache__/         my_functions.py
Py_intro.ipynb       data_preparation.py  pippo.xml
Py_intro_temp.ipynb  desc_csv.py


In [69]:
data_file = 'pippo.xml'
if os.path.isfile(data_file):
    os.remove(data_file)

In [70]:
rm topolino.py

In [72]:
rm -rf foo/
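
The two shell commands above have rough Python equivalents. A hedged sketch, again inside a temporary folder with made-up names: `Path.unlink` removes a file (its `missing_ok` argument needs Python 3.8 or newer) and `shutil.rmtree` removes a folder with everything in it. Both are irreversible, so double-check the path first.

```python
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
file_path = base / 'topolino.py'
dir_path = base / 'foo'

# Create a file and a non-empty folder to delete
file_path.touch()
(dir_path / 'inner').mkdir(parents=True)
(dir_path / 'inner' / 'data.txt').touch()

file_path.unlink(missing_ok=True)   # similar to: rm topolino.py
file_path.unlink(missing_ok=True)   # no error even though it is already gone
shutil.rmtree(dir_path)             # similar to: rm -rf foo/

print(file_path.exists(), dir_path.exists())
```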

*For more details, see [Working With Files in Python](https://realpython.com/working-with-files-in-python/) on Real Python.*