# Notebook 1: Python Overview

## Motivations

Spark provides multiple *Application Programming Interfaces* (APIs), i.e. interfaces that allow the user to interact with the application. The main APIs are the [Scala](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package) and [Java](http://spark.apache.org/docs/latest/api/java/index.html) APIs, as Spark is implemented in Scala and runs on the Java Virtual Machine (JVM).
A [Python API](http://spark.apache.org/docs/latest/api/python/index.html), also known as PySpark, has been available since version 0.7.0. An [R API](http://spark.apache.org/docs/latest/api/R/index.html) was released with version 1.4.0. During this course, you will be using Spark 2.4.4.

Throughout this course we will use the Python API for the following reasons:
- The R API is still too young and limited to be relied on. Besides, working in R quickly becomes painful when the libraries are immature.
- Many of you are aspiring data scientists, and Python is a must-know language in the data industry.
- The Scala and Java APIs would have been quite hard to learn given the length of the course and your current programming skills.
- Python is easy to learn, and even easier if you are already familiar with R.

The goal of this session is to teach (or remind) you the syntax of basic operations, control structures and declarations in Python that will be useful to you as a data scientist. Keep in mind that we do not have a lot of time, and that you should be able to create functions and classes and to manipulate them by the end of the lab. If you don't get that, the rest of the course will be hard to follow. Don't hesitate to ask for explanations and/or more exercises if you don't feel confident enough at the end of the lab.

**Note:** Python comes in two flavours, Python 2 and Python 3. Python 2 is now officially deprecated, so it is no longer supported. However, many companies still have a lot of Python 2 code (TensorFlow has supported only Python 3 since 2019, and Google is well known for its extensive use of Python 2). In this course, we will use Python 3 rather than Python 2. Note that if you know Python 2, learning Python 3 will be lightning-fast.
For those who are interested, you can quickly learn Python 3 syntax [over here](https://learnxinyminutes.com/docs/python3/) or find some [cheatsheets](http://ptgmedia.pearsoncmg.com/imprint_downloads/informit/promotions/python/python2python3.pdf) highlighting the differences between the two versions. The Spark Python API has been compatible with Python 3 since Spark 1.4.0.

When you look at Stack Overflow, be aware that the snippets shared there may be written for another Python version and may need translating. So no blind copy-pasting.

*This introduction relies on [Learn Python in Y minutes](https://learnxinyminutes.com/docs/python/)*

## Introduction

Python is a high-level, general-purpose, interpreted language. It is meant to be concise and readable, which makes it a very pleasant language to work with.

## 1. Primitive Datatypes and Operators
Read section 1 of [Learn python in Y Minutes](https://learnxinyminutes.com/docs/python/) (if you already know Python, you can skip this step). Then, replace `???` in the following cells with your code to answer the questions. To get started, please run the following cell.

Compute 4 + 8

In [96]:
4+8

12

Compute 4 * 8

In [97]:
4*8

32

Compute 4 / 8 (using the regular division operation, not integer division)

In [98]:
4/8

0.5

Compute $4^8$

In [99]:
4**8

65536
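
A common pitfall is worth illustrating here: in Python, exponentiation is written `**`, whereas `^` is the bitwise XOR operator. The short sketch below (the variable names are only illustrative) shows the difference.

```python
# ** raises a number to a power
power = 4 ** 8            # 65536
# pow() is the built-in function equivalent of **
same_power = pow(4, 8)    # 65536
# ^ is bitwise XOR: 0b0100 ^ 0b1000 == 0b1100
xor = 4 ^ 8               # 12

print(power, same_power, xor)
```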

Check if the variable `foo` is None:

In [100]:
foo = None
print(foo is None)

True


## 2. Variables and Collections
Same as before, read the corresponding section, and answer the questions below.

You are now asked to `print` your results instead of just outputting them.

Please always remember that there is a difference between the output of your code (which is the result of the last executed line) and the printed output.
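
To make this distinction concrete, here is a minimal sketch (the `double` function is made up for the illustration). A function call produces a value, which a notebook displays only when it is the last line of the cell, whereas `print` writes text to the screen and itself returns `None`.

```python
def double(n):
    return n * 2

value = double(21)      # the call returns 42; an assignment displays nothing
printed = print(value)  # print displays 42 on screen but itself returns None

print(printed)          # displays None
```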

In [152]:
txt = "This is a text"
print(txt)
new_txt = "This is another text" # assignment doesn't return anything

This is a text


In [None]:
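# Try uncommenting the following lines to check your understanding
# del txt         # deletes the name txt; del is a statement and returns nothing
# print(txt)      # raises NameError because txt no longer exists
# del new_txt     # deletes the name new_txt
# print(new_txt)  # raises NameError as well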

In [104]:
# Declare a variable containing a float of your choice and print it
# From now on, when you are asked to print something, please use the print function.
x = 0.77927
print(x)

0.77927

In [106]:
# Create a list containing strings and store it in a variable
l = ['Hello', 'Sun'] 
# Append a new string to this list
l.append('Winter')
# Append an integer to this list and print it
l.append(5)
print(l)

['Hello', 'Sun', 'Winter', 5]


Note that modifications to list objects are performed in place, e.g.

    li = [1, 2, 3]
    li.append(4)
    li  # => [1, 2, 3, 4]
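
A short sketch of what "in place" implies, using made-up variable names: methods such as `append` and `sort` change the existing list and return `None`, every name bound to that list sees the change, and a function such as `sorted` builds a new list instead.

```python
li = [3, 1, 2]
alias = li                # alias and li refer to the same list object

returned = li.append(4)   # modifies li in place and returns None
li.sort()                 # also in place

new_list = sorted(li, reverse=True)  # sorted() builds a new list instead

print(returned)   # None
print(alias)      # [1, 2, 3, 4]  (the alias sees the in-place changes)
print(new_list)   # [4, 3, 2, 1]
print(li)         # [1, 2, 3, 4]  (unchanged by sorted())
```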

In [107]:
# Mixing types inside a list object can be a bad idea depending on the situation.
# Remove the integer you just inserted in the list and print it
l.remove(5)
print(l)

['Hello', 'Sun', 'Winter']


In [108]:
# Print the second element of the list
print(l[1])

Sun


In [109]:
x = 42.567
# Print x with only two decimals (rounding)
print(round(x, 2))
# Print x with only two decimals (truncating)
import math

def truncate(number, decimals=0):
    """
    Returns a value truncated to a specific number of decimal places.
    """
    if not isinstance(decimals, int):
        raise TypeError("decimal places must be an integer.")
    elif decimals < 0:
        raise ValueError("decimal places has to be 0 or more.")
    elif decimals == 0:
        return math.trunc(number)

    factor = 10.0 ** decimals
    return math.trunc(number * factor) / factor
    
truncate(x,2)

42.57


42.56

You can access list elements in reverse order, e.g.

    li[-1]  # returns the last element of the list
    li[-2]  # returns the second last element of the list
    
and so on...
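
Negative indices also work inside slices. The following sketch, on a small illustrative list, shows a few combinations that come up often.

```python
li = ["a", "b", "c", "d", "e"]

last = li[-1]            # 'e'
second_last = li[-2]     # 'd'
last_two = li[-2:]       # ['d', 'e']
all_but_last = li[:-1]   # ['a', 'b', 'c', 'd']
reversed_li = li[::-1]   # ['e', 'd', 'c', 'b', 'a']

print(last, second_last, last_two, all_but_last, reversed_li)
```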

In [110]:
# Extend your list with new_list and print it
new_list = ["We", "are", "the", "knights", "who", "say", "Ni", "!"]
li = l + new_list
print(li)

['Hello', 'Sun', 'Winter', 'We', 'are', 'the', 'knights', 'who', 'say', 'Ni', '!']


In [111]:
# Replace "Ni" by "Ekke Ekke Ekke Ekke Ptang Zoo Boing" in the list and print it
li[9] = 'Ekke Ekke Ekke Ekke Ptang Zoo Boing'
print(li)

['Hello', 'Sun', 'Winter', 'We', 'are', 'the', 'knights', 'who', 'say', 'Ekke Ekke Ekke Ekke Ptang Zoo Boing', '!']


In [112]:
# Compute the length of the list and print it
print(len(li))

11

In [113]:
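# What is the difference between lists and tuples?
"""
A list is mutable (items can be added, removed or replaced in place) whereas a tuple is immutable. Both are ordered sequences that can contain items of different types (string, float)
"""

'\nA list is mutable (items can be added, removed or replaced in place) whereas a tuple is immutable. Both are ordered sequences that can contain items of different types (string, float)\n'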
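
The practical consequence of mutability can be sketched as follows (the variable names are illustrative): item assignment works on a list but raises `TypeError` on a tuple, and because tuples are immutable they can serve as dictionary keys.

```python
point_list = [1, 2]
point_tuple = (1, 2)

point_list[0] = 10          # fine: lists are mutable

try:
    point_tuple[0] = 10     # tuples do not support item assignment
except TypeError as error:
    message = str(error)

# Being immutable (and hashable when their items are), tuples can be dictionary keys
distances = {(0, 0): 0.0, (3, 4): 5.0}

print(point_list)           # [10, 2]
print(message)
print(distances[(3, 4)])    # 5.0
```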

In [114]:
# Create a dictionary containing the following mapping:
# "one" : 1
# "two" : 2
# etc. until you reach "five" : 5
baz = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}; type(baz)

dict

In [115]:
# Check if the key "four" is contained in the dict
"four" in baz

True

In [116]:
# If four is contained in the dict, print the associated value
if "four" in baz:
    print(baz["four"])

4
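
As a complement, here is a small sketch of the usual ways to test for a key and read a value safely. It rebuilds the same dictionary so that it runs on its own.

```python
baz = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

has_four = "four" in baz          # membership is tested on the keys
has_six = "six" in baz

four = baz["four"]                # raises KeyError if the key is missing
six = baz.get("six")              # returns None instead of raising
six_or_zero = baz.get("six", 0)   # or a default value of your choice

print(has_four, has_six, four, six, six_or_zero)   # True False 4 None 0
```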

In [117]:
gibberish = list("fqfgsrhrfeqluihjgrshioprqoqeionfvnorfiqeo")
# Find all the unique letters contained in gibberish. Your answer should fit in one line of code
unique_letters = list(set(gibberish))
print(unique_letters)

['g', 'j', 'o', 'e', 'u', 'l', 'r', 'i', 'f', 'v', 's', 'p', 'h', 'q', 'n']


You should now be able to answer the following problem using dictionaries, lists and sets. Imagine you owe money to your friends because you forgot your credit card last time you went out for drinks. You want to remember how much you owe to each of them in order to refund them later. Which data structure would be useful to store this information? Use this data structure and fill it in with some debt data in the cell below:

In [118]:
debts = {'Laura': 13, 'Paul': 22, 'Sophie': 17}

Another party night with more people, yet you forgot your credit card again... You meet new friends who buy you drinks. Create another data structure as above with different data, i.e. include friends that were not here during the first party and new friends.

In [119]:
debts_2 = {'Lili': 14, 'Pierre': 21}

Count the number of new friends you made that second night. Print the name of the friends who bought you drinks during the second party, but not during the first.

In [120]:
new_friends = sorted(set(debts_2) - set(debts)) # should fit in one line
nb_new_friends = len(new_friends) # should fit in one line
nb_new_friends

2

In [121]:
print(new_friends)

['Lili', 'Pierre']
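
Set operations on the dictionary keys are one way to compare the two nights. The sketch below uses hypothetical data in which one friend (Paul) attends both parties, so that the difference and the intersection are both non-empty.

```python
first_party = {"Laura": 13, "Paul": 22, "Sophie": 17}
second_party = {"Paul": 9, "Lili": 14, "Pierre": 21}   # hypothetical data

new_friends = set(second_party) - set(first_party)         # only at the second party
returning_friends = set(second_party) & set(first_party)   # at both parties
everyone = set(second_party) | set(first_party)            # at either party

print(sorted(new_friends))         # ['Lili', 'Pierre']
print(sorted(returning_friends))   # ['Paul']
print(len(everyone))               # 5
```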


## 3. Control flow
Same as before, read the corresponding section, and answer the questions below.
You can skip the paragraph on exceptions for now.

In [122]:
# Code the following:
# if you have made more than 5 friends that second night, 
# print "Yay! I'm super popular!", else, print "Duh..."
debts_2 = {'Lili': 14, 'Pierre': 21}
if len(debts_2) > 5:
    print("Yay! I'm super popular!")
else:
    print('Duh...')

Duh...


In [123]:
# Now, thank each new friend iteratively, i.e.
# print "Thanks <name of the friend>!" using loops and string formatting (cf. section 1)
debts_2 = {'Lili': 14, 'Pierre': 21}
for name in debts_2:
    print('Thanks {}!'.format(name))

Thanks Lili!
Thanks Pierre!


In [124]:
# Sum all the numbers from 0 to 15 (included) using what we've seen so far (i.e. without the function sum() )
result = 0
n = 15
for i in range(1, n + 1):
    result += i

print(result)

120


In [125]:
# Note: you can break a loop with the break statement
for i in range(136):
    print(i)
    if i >= 2:
        break

0
1
2


In [126]:
# enumerate function can be very useful when dealing with iterators:
for i, value in enumerate(["a", "b", "c"]):
    print(value, i)

a 0
b 1
c 2


## 4. Functions
Things are becoming more interesting. Read section 4. It's ok if you don't get the args/kwargs part. Be sure to understand basic function declaration and anonymous function declaration. Higher order functions, maps, and filters will be covered during the next lab.

Write a Python function that checks whether a given string is a palindrome. Note: a palindrome is a word, phrase, or sequence that reads the same backward and forward (ignoring spaces), e.g. "madam" or "nurses run". Hint: strings are sequences of characters and can be indexed like lists, e.g.

    a = "abcdef"
    a[2]  # => 'c'
    
If needed, here are [some tips about string manipulation](http://www.pythonforbeginners.com/basics/string-manipulation-in-python).

In [127]:
def is_palindrome(word):
    # Ignore spaces and case so that phrases such as "nurses run" qualify
    letters = word.replace(' ', '').lower()
    # A palindrome is equal to its own reverse
    return letters == letters[::-1]

print(is_palindrome('aza'))        # Simple palindrome
print(is_palindrome('nurses run')) # Palindrome containing a space
print(is_palindrome('palindrome')) # Not a palindrome

True
True
False


Write a Python function to check whether a string is a pangram or not. Note: pangrams are words or sentences containing every letter of the alphabet at least once. For example: "The quick brown fox jumps over the lazy dog".

[Hint](https://docs.python.org/2/library/stdtypes.html#set-types-set-frozenset)

In [128]:
import string

# In this function, the "alphabet" argument has a default value: string.ascii_lowercase
# string.ascii_lowercase contains all the letters in lowercase.
def ispangram(string_input, alphabet=string.ascii_lowercase):
    # The input is a pangram if every letter of the alphabet appears in it
    return set(alphabet) <= set(string_input.lower())
   
print(ispangram('The quick brown fox jumps over the lazy dog'))
print(ispangram('The quick red fox jumps over the lazy dog'))

True
False


### Python lambda expressions

When evaluated, lambda expressions return an anonymous function, i.e. a function that is not bound to any variable (hence the "anonymous"). However, it is possible to assign the function to a variable. Lambda expressions are particularly useful when you need to pass a simple function into another function. To create lambda functions, we use the following syntax

    lambda argument1, argument2, argument3, etc. : body_of_the_function

For example, a function which takes a number and returns its square would be

    lambda x: x**2
    
A function that takes two numbers and returns their sum:

    lambda x, y: x + y
    
`lambda` generates a function and returns it, while `def` generates a function and assigns it to a name.  The function returned by `lambda` also automatically returns the value of its expression statement, which reduces the amount of code that needs to be written.
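
As an illustration of passing a simple function into another function, the sketch below hands a lambda to the built-in `sorted` through its `key` parameter, and shows the `def` equivalent for comparison. The data and names are made up for the example.

```python
debts = {"Laura": 13, "Paul": 22, "Sophie": 17}

# sorted() accepts a function through its `key` parameter;
# a lambda avoids defining a one-off named function
by_amount = sorted(debts.items(), key=lambda pair: pair[1])

# The equivalent with def
def amount(pair):
    return pair[1]

by_amount_def = sorted(debts.items(), key=amount)

print(by_amount)   # [('Laura', 13), ('Sophie', 17), ('Paul', 22)]
```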

Here are some additional references that explain lambdas: [Lambda Functions](http://www.secnetix.de/olli/Python/lambda_functions.hawk), [Lambda Tutorial](https://pythonconquerstheuniverse.wordpress.com/2011/08/29/lambda_tutorial/), and [Python Functions](http://www.bogotobogo.com/python/python_functions_lambda.php).

Here is an example:

In [129]:
# Function declaration using def
def add_s(x):
    return x + 's'

print(type(add_s))
print(add_s)
print(add_s('dog'))

<class 'function'>
<function add_s at 0x7f9234453c20>
dogs


In [130]:
# Same function declared as a lambda
add_s_lambda = lambda x: x + 's'
print(type(add_s_lambda))
print(add_s_lambda)  # Note that the function shows its name as <lambda>
print(add_s_lambda('dog'))

<class 'function'>
<function <lambda> at 0x7f9234455200>
dogs


In [131]:
# Code a function using a lambda expression which takes
# a number and returns this number multiplied by two.
multiply_by_two = lambda x: x*2
print(multiply_by_two(5))

print(multiply_by_two(10) == 20)

10
True


Observe the behavior of the following code:

In [132]:
def add(x, y):
    """Add two values"""
    print("Result = ", x + y)

In [133]:
def add(x, y):
    """Add two values"""
    print("Result = ", x + y)

def sub(x, y):
    """Subtract y from x"""
    print("Result = ", x - y)

functions = [add, sub]
# Both functions print their result but return nothing, hence the None
print(functions[0](1, 2))
print(functions[1](3, 4))

Result =  3
None
Result =  -1
None


Code the same functionality, using lambda expressions:

In [134]:
lambda_functions = [lambda x, y: x + y, lambda x, y: x - y]

print(lambda_functions[0](1, 2) == 3)
print(lambda_functions[1](3, 4) == -1)

True
True


Lambda expressions can be used to generate functions that take in zero or more parameters. The syntax for `lambda` allows for multiple ways to define the same function. For example, we might want to create a function that adds two values. It can either take a single parameter, a tuple consisting of the two values:

    lambda x: x[0] + x[1]
    
or take the two values as separate parameters:
    
    lambda x, y: x + y

The Python 2 form `lambda (x, y): x + y`, which unpacked a tuple directly in the parameter list, is a syntax error in Python 3. Calling the first function on the tuple `(1, 2)`, or the second one on the arguments `1, 2`, returns `3`.
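
A note on Python 3, which this course uses: a parenthesised tuple in a lambda's parameter list, as in `lambda (x, y): ...`, is Python 2 syntax and is rejected by the Python 3 parser. The sketch below, with illustrative names, shows the two forms that do work: indexing into a single tuple parameter, or taking separate parameters and unpacking the tuple with `*` at the call site.

```python
add_from_tuple = lambda pair: pair[0] + pair[1]   # one parameter: a tuple
add_two_args = lambda x, y: x + y                 # two separate parameters

values = (1, 2)

print(add_from_tuple(values))   # 3
print(add_two_args(*values))    # 3, the * unpacks the tuple into two arguments

# The Python 2 form is rejected by the Python 3 parser
try:
    compile("lambda (x, y): x + y", "<example>", "eval")
    python2_form_accepted = True
except SyntaxError:
    python2_form_accepted = False

print(python2_form_accepted)    # False
```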

In [135]:
# Example:
add_two_1 = lambda x, y: (x[0] + y[0], x[1] + y[1])
add_two_2 = lambda x0, x1, y0, y1: (x0 + y0, x1 + y1)
print('add_two_1((1,2), (3,4)) = {0}'.format(add_two_1((1,2), (3,4))))
print('add_two_2(1,2,3,4) = {0}'.format(add_two_2(1,2,3,4)))

add_two_1((1,2), (3,4)) = (4, 6)
add_two_2(1,2,3,4) = (4, 6)


In [136]:
# Use both syntaxes to create a function that takes in a tuple of three values and reverses their order
# E.g. (1, 2, 3) => (3, 2, 1)
reverse1 = lambda x: x[::-1]
reverse2 = lambda x0, x1, x2: (x2, x1, x0)

print(reverse1((1, 2, 3)) == (3, 2, 1))
print(reverse2(1, 2, 3) == (3, 2, 1))

True
True


Lambda expressions allow you to reduce the size of your code, but they are limited to simple logic: the body must be a single expression. The following Python keywords introduce statements and therefore cannot be used in a lambda expression: `assert`, `pass`, `del`, `return`, `raise`, `break`, `continue`, `import`, and `global`. Assignment statements (`=`) and augmented assignment statements (e.g. `+=`) cannot be used either. Note that in Python 3 `print` and `exec` are ordinary functions, so they can be called inside a lambda. If more complex logic is necessary, use `def` in place of `lambda`.
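
A small sketch of where the limit lies (all names are made up): a conditional expression fits in a lambda, logic that needs a loop and an assignment calls for `def`, and a statement such as `return` inside a lambda is refused by the parser.

```python
# A conditional *expression* is allowed in a lambda
parity = lambda n: "even" if n % 2 == 0 else "odd"

# Anything needing statements (here a loop and an assignment) calls for def
def count_even(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += 1
    return total

# A statement inside a lambda is a syntax error
try:
    compile("lambda x: return x", "<example>", "eval")
    statement_accepted = True
except SyntaxError:
    statement_accepted = False

print(parity(3), count_even([1, 2, 4]), statement_accepted)   # odd 2 False
```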

## 5. Classes
Classes allow you to create objects. Object Oriented Programming (OOP) can be a very powerful paradigm. If done well, OOP  allows you to improve the modularity and reusability of your code, but that's the subject of an entire other course. 
Here is a *very* short introduction to it.

By convention, class names are written in camel case, e.g. `MyBeautifulClass`, while variable and function names are written in snake case, e.g. `my_variable`, `my_very_complex_function`

Classes contain methods (i.e. functions owned by the class) and attributes (i.e. variables owned by the class). 
When you define a class, the first thing to do is to define a specific method, the constructor. In Python, the constructor is called `__init__`. This method is used to initialize each new instance of the class. Example:

    class MyClass:
    
        def __init__(self, first_attribute, second_attribute):
            self.first_attribute = first_attribute
            self.second_attribute = second_attribute
            
This class has two attributes, and one (hidden) method, the constructor. To create an instance of this class, one simply does:

    instance_example = MyClass(1, "foo")
    
Then, the attributes can easily be accessed:

    instance_example.first_attribute  # => 1
    instance_example.second_attribute  # => "foo"

In [137]:
# Run this example
class MyClass:
    
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute
            
instance_example = MyClass(1, "foo") 
print(instance_example.first_attribute)
instance_example.__init__(3,4)  # In real life, it is rare to reinit an object.
print(instance_example.first_attribute)

1
3


`self` denotes the object itself. When you declare a method, you have to pass `self` as the first argument of the method:

    class MyClass:

        def __init__(self, first_attribute, second_attribute):
            self.first_attribute = first_attribute
            self.second_attribute = second_attribute

        def method_baz(self):
            print("Hello! I'm a method! I have two attributes, initialized with values %s, %s" % (self.first_attribute, self.second_attribute))

Indeed, when we call

    instance_example = MyClass(1, "foo")
    instance_example.method_baz()

the `self` object is implicitly passed to `method_baz` as an argument. Think of the method call as the following function call

    MyClass.method_baz(instance_example)
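
The equivalence can be checked directly. In this minimal sketch (the `Greeter` class is hypothetical), calling the method on the instance and calling it through the class with the instance passed by hand give the same result.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello, {}!".format(self.name)

g = Greeter("Brian")

bound_call = g.greet()             # self is filled in automatically
explicit_call = Greeter.greet(g)   # the same call with self passed by hand

print(bound_call)                    # Hello, Brian!
print(bound_call == explicit_call)   # True
```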

In [138]:
# Run this example
class MyClass:
    
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute
    
    def class_method(self):
        print("Hello! I'm a method! My class has two attributes, of value {0}, {1}".format(self.first_attribute, self.second_attribute))
            
instance_example = MyClass(1, "foo") 
# Call to a class method
instance_example.class_method()

Hello! I'm a method! My class has two attributes, of value 1, foo


Now, the tricky part. You can declare **static** methods, i.e. methods that don't need to access the data contained in `self` to work properly. Such methods do not require the `self` argument as they do not use any instance data. They are implemented in the following way:

In [139]:
# Run this example
class MyClass:
    
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute
    
    def class_method(self):
        print("Hello! I'm a method! My class has two attributes, of value {0}, {1}".format(self.first_attribute, self.second_attribute))
         
    @staticmethod
    def static_method():
        print("I'm a static method!")
            
instance_example = MyClass(1, "foo") 
# Call to a class method
instance_example.class_method()
# Call to a static method
instance_example.static_method()

Hello! I'm a method! My class has two attributes, of value 1, foo
I'm a static method!


In [140]:
# Call to a static method without class instantiation
MyClass.static_method()

I'm a static method!


In [141]:
# Call to a class method without class instantiation: raises an error
# MyClass.class_method()
# => TypeError: class_method() missing 1 required positional argument: 'self' (exact wording depends on the Python 3 version)
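
A side note on terminology, offered as a hedged complement: methods that receive an instance as `self` are usually called instance methods in the Python documentation, and Python additionally provides a `@classmethod` decorator whose methods receive the class itself as first argument. The sketch below (the `Temperature` class and its names are invented for the illustration) puts the three kinds side by side.

```python
class Temperature:
    unit = "C"

    def __init__(self, degrees):
        self.degrees = degrees

    def describe(self):              # instance method: needs an instance
        return "{} {}".format(self.degrees, self.unit)

    @classmethod
    def freezing(cls):               # class method: receives the class, can build instances
        return cls(0)

    @staticmethod
    def to_fahrenheit(celsius):      # static method: receives neither instance nor class
        return celsius * 9 / 5 + 32

print(Temperature(21).describe())          # 21 C
print(Temperature.freezing().describe())   # 0 C
print(Temperature.to_fahrenheit(100))      # 212.0

try:
    Temperature.describe()                 # no instance supplied for self
except TypeError as error:
    print("TypeError:", error)
```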

You can set attributes without passing them to the constructor:

In [142]:
# Run this example
class MyClass:
    
    default_attribute = 42
    
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute
    
    def method_baz(self):
        print("Hello! I'm a method! I have two attributes, initialized with values %s, %s"%(self.first_attribute, self.second_attribute))
        
    @staticmethod
    def static_method():
        print("I'm a static method!")
            
instance_example = MyClass(1, "foo") 
print(instance_example.default_attribute)

42
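
An attribute declared in the class body belongs to the class and is shared by its instances until an instance assigns its own value. The following sketch, with a hypothetical `Settings` class, shows how lookups behave.

```python
class Settings:
    default_attribute = 42      # class attribute, shared by all instances

a = Settings()
b = Settings()

a.default_attribute = 7         # creates an instance attribute on a only
first = (a.default_attribute, b.default_attribute)
print(first)                    # (7, 42)

Settings.default_attribute = 100   # changes the shared class attribute
second = (a.default_attribute, b.default_attribute)
print(second)                   # (7, 100)
```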


In [143]:
# Write a Python class named Rectangle which is 
# constructed by a length and width 
# and has two class methods
# - "rectangle_area", which computes the area of a rectangle.
# - "rectangle_perimeter", which computes the perimeter of a rectangle.
#
# The Rectangle class should have an attribute n_edges equal to 4
# which should not be initialized by the __init__ constructor.
#
# Declare a static method "talk" that returns "Do you like rectangles?" when called

class Rectangle: 

    n_edges = 4
    
    def __init__(self, length, width):
        self.length = length
        self.width  = width

    def rectangle_area(self):
        return self.length*self.width
    
    def rectangle_perimeter(self):
        return 2*(self.length+self.width)
    
    @staticmethod
    def talk():
        return "Do you like rectangles?"


new_rectangle = Rectangle(12, 10)
print(new_rectangle.rectangle_area() == 120) # rectangle_area method
print(new_rectangle.rectangle_perimeter() == 44) # rectangle_perimeter method
print(Rectangle.n_edges == 4) # constant attribute
print(Rectangle.talk() == "Do you like rectangles?") # Rectangle talk static method

True
True
True
True


In machine learning, when you're manipulating large images, you often need to break them down into smaller, more manageable images.
- Modify the previous Rectangle class to include a list of patches.
- Write a Python class named Patch which is a square of size 1x1. Each patch should be identified by its coordinates.
- Use the [\_\_iter\_\_](https://www.datacamp.com/community/tutorials/python-iterator-tutorial) method to create a get_next_patch() in the Rectangle class.
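
Before attempting the exercise, it may help to see the iterator protocol on a toy case. In the sketch below, `Countdown` is a made-up class that is not part of the exercise: `__iter__` is called once when a loop starts and returns the iterator, and `__next__` is called at every step until it raises `StopIteration`.

```python
class Countdown:
    """Iterates from `start` down to 1 (a toy class, not part of the exercise)."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Called once at the start of a for loop; must return an iterator
        self.current = self.start
        return self

    def __next__(self):
        # Called for every step; raising StopIteration ends the loop
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)                 # 3, then 2, then 1

print(list(Countdown(3)))    # [3, 2, 1]
```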

In [144]:
class Patch:
    """A 1x1 square identified by its coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "({}, {})".format(self.x, self.y)


class Rectangle:

    n_edges = 4

    def __init__(self, length, width):
        self.length = length
        self.width = width
        # One patch per unit square, listed row by row
        self.patches = [Patch(x, y) for y in range(length) for x in range(width)]

    def __iter__(self):
        # Lets "for patch in rectangle" walk through the patches
        return iter(self.patches)


rectangle = Rectangle(12, 10)

# Prints all the patches in the rectangle
for patch in rectangle:
    print(patch)
print("end", len(rectangle.patches))

(0, 0)
(1, 0)
(2, 0)
(3, 0)
(4, 0)
(5, 0)
(6, 0)
(7, 0)
(8, 0)
(9, 0)
(0, 1)
(1, 1)
(2, 1)
(3, 1)
(4, 1)
(5, 1)
(6, 1)
(7, 1)
(8, 1)
(9, 1)
(0, 2)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(5, 2)
(6, 2)
(7, 2)
(8, 2)
(9, 2)
(0, 3)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(5, 3)
(6, 3)
(7, 3)
(8, 3)
(9, 3)
(0, 4)
(1, 4)
(2, 4)
(3, 4)
(4, 4)
(5, 4)
(6, 4)
(7, 4)
(8, 4)
(9, 4)
(0, 5)
(1, 5)
(2, 5)
(3, 5)
(4, 5)
(5, 5)
(6, 5)
(7, 5)
(8, 5)
(9, 5)
(0, 6)
(1, 6)
(2, 6)
(3, 6)
(4, 6)
(5, 6)
(6, 6)
(7, 6)
(8, 6)
(9, 6)
(0, 7)
(1, 7)
(2, 7)
(3, 7)
(4, 7)
(5, 7)
(6, 7)
(7, 7)
(8, 7)
(9, 7)
(0, 8)
(1, 8)
(2, 8)
(3, 8)
(4, 8)
(5, 8)
(6, 8)
(7, 8)
(8, 8)
(9, 8)
(0, 9)
(1, 9)
(2, 9)
(3, 9)
(4, 9)
(5, 9)
(6, 9)
(7, 9)
(8, 9)
(9, 9)
(0, 10)
(1, 10)
(2, 10)
(3, 10)
(4, 10)
(5, 10)
(6, 10)
(7, 10)
(8, 10)
(9, 10)
(0, 11)
(1, 11)
(2, 11)
(3, 11)
(4, 11)
(5, 11)
(6, 11)
(7, 11)
(8, 11)
(9, 11)
end 120


In [None]:
Use the same approach to create two classes:
- Sentence, which is a list of words
- Word, which encapsulates a string together with its position within the sentence
Implement both [\_\_str\_\_](https://docs.python.org/3/reference/datamodel.html?highlight=__str__#object.__str__) and [\_\_repr\_\_](https://docs.python.org/3/reference/datamodel.html?highlight=__str__#object.__repr__) for both classes. `__str__` should be readable, while `__repr__` should give you information that helps you debug.
In Natural Language Processing you'll use a lot of n-grams (sequences of *n* consecutive words). So you should implement a get_next_bigrams() method in the Sentence class. For instance, given the sentence "My name is Brian", it should return an element of the list of bigrams ```["My name", "name is", "is Brian"]```
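
To illustrate the difference between the two special methods and the idea of a bigram, here is a hedged sketch on unrelated toy data. The `Point` class is hypothetical, and pairing `words` with `words[1:]` through `zip` is just one possible way to build bigrams.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous, aimed at the developer who debugs
        return "Point(x={!r}, y={!r})".format(self.x, self.y)

    def __str__(self):
        # Short and readable, aimed at the end user
        return "({}, {})".format(self.x, self.y)

p = Point(1, 2)
print(str(p))    # (1, 2)             used by print() and str.format()
print(repr(p))   # Point(x=1, y=2)
print([p])       # [Point(x=1, y=2)]  containers display their items with repr()

# Bigrams: each word paired with the word that follows it
words = "My name is Brian".split()
bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
print(bigrams)   # ['My name', 'name is', 'is Brian']
```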

In [147]:
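class Word:
    """A string together with its position (order) in the sentence."""

    def __init__(self, text, position):
        self.text = text
        self.position = position

    def __repr__(self):
        return "Word({!r}, {})".format(self.text, self.position)

    def __str__(self):
        return self.text


class Sentence:
    """A list of Word objects built from a raw string."""

    def __init__(self, txt):
        self.txt = txt
        self.words = [Word(w, i) for i, w in enumerate(txt.split())]

    def __repr__(self):
        return "Sentence({!r})".format(self.txt)

    def __str__(self):
        return " ".join(str(w) for w in self.words)

    def get_next_bigrams(self):
        """Yield the bigrams of the sentence one at a time."""
        for first, second in zip(self.words, self.words[1:]):
            yield "{} {}".format(first, second)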

In [None]:
## 6. Modules
Finally, one really important part of Python is the ability to create modules.
Read https://docs.python.org/fr/3/tutorial/modules.html and create a module named geo with the classes Rectangle & Patch.
Import the ```geo``` module and demonstrate the different calls.

Repeat the operation with an ```NLP``` module to encapsulate the classes Sentence and Word.
In Natural Language Processing you'll use a lot of n-grams (sequences of *n* consecutive words). So you should implement a ```get_bigrams(sentence)``` function that returns a list of bigrams.
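
The mechanics of a module can be sketched in a self-contained way. A module is simply a `.py` file located in a directory that Python searches (`sys.path`). In the example below, the module name `geo_demo` and its `area` function are placeholders, and the file is written to a temporary directory only so that the snippet runs anywhere. In practice you would save the file next to your notebook and write a plain `import` statement.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a tiny module; in practice this would be a file saved next to the notebook
module_source = '''
"""A placeholder module used only for this illustration."""

def area(length, width):
    return length * width
'''

with tempfile.TemporaryDirectory() as folder:
    # 1. A module is just a .py file...
    Path(folder, "geo_demo.py").write_text(module_source)
    # 2. ...located in a directory listed in sys.path
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        # 3. Equivalent to writing "import geo_demo"
        geo_demo = importlib.import_module("geo_demo")
        result = geo_demo.area(12, 10)
    finally:
        sys.path.remove(folder)

print(geo_demo.__name__, result)   # geo_demo 120
```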

In [154]:
pip install geopy



In [None]:
import ???

rectangle = ???
patch = ???

???

In [149]:
import nltk
nltk.download('punkt')

word_data = "My name is Brian"
nltk_tokens = nltk.word_tokenize(word_data)  	

print(list(nltk.bigrams(nltk_tokens)))


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[('My', 'name'), ('name', 'is'), ('is', 'Brian')]


In [150]:
def n_grams(s, n=2, i=0):
    while len(s[i:i+n]) == n:
        yield s[i:i+n]
        i += 1

txt = 'My name is Brian'

ugr = n_grams(txt.split(), n=1)
list(ugr)

[['My'], ['name'], ['is'], ['Brian']]

In [151]:
import nltk
from nltk.util import ngrams

def get_next_bigrams(sentence, n):
    n_grams = ngrams(nltk.word_tokenize(sentence), n)
    return [' '.join(grams) for grams in n_grams]

sentence = 'My name is Brian'

print(get_next_bigrams(sentence, 2))

['My name', 'name is', 'is Brian']


## 7. Coding style

You'll often wonder which letters should be capitalized, or whether you should use _ or other characters.

To help you decide, Python comes with PEP-8, which is a set of recommendations.

It is highly advised to read PEP-8 and to apply its rules whenever applicable.

You'll get points for style in your code (I mean, you will lose points if the code is not readable).

That said, [the rules of PEP-8 are not perfect and should not always be followed](https://www.youtube.com/watch?v=wf-BqAjZb8M).
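
As a quick reminder of the main PEP-8 naming conventions, here is a small sketch. All the names in it are invented for the illustration.

```python
MAX_PATCH_SIZE = 1                 # constants: UPPER_CASE_WITH_UNDERSCORES


class ImagePatch:                  # classes: CapWords
    def __init__(self, x, y):
        self.x = x                 # attributes and variables: lowercase_with_underscores
        self.y = y

    def is_origin(self):           # functions and methods: lowercase_with_underscores
        return self.x == 0 and self.y == 0


first_patch = ImagePatch(0, 0)
print(first_patch.is_origin())     # True
```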

In [None]:
# You should create a github account before the next session :
student_1_github_login =
student_2_github_login = 

Congratulations, you've reached the end of this notebook. =)