# Data Analytics 2022: lecture 1



## Notebook Basics


From the [Jupyter Notebook documentation](http://jupyter-notebook.readthedocs.io/en/latest/notebook.html):

> The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. 

The Jupyter notebook combines two components:
> **A web application**: a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output.

> **Notebook documents** : a representation of all content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects. The extension of a notebook file is `.ipynb`.

### Notebooks

>A **notebook** consists of a sequence of **cells**. A cell is a multiline text input field, and its contents can be *executed* by using `Shift-Enter`, or by clicking either the “Play” button in the toolbar, or *Cell / Run* in the menu bar. 

<div class="alert alert-block alert-info">
    The execution behavior of a cell is determined by the cell’s type.
</div>

> There are **four** types of cells: **code cells, markdown cells, raw cells** and **heading cells**. Every cell starts off being a code cell, but its type can be changed by using a drop-down on the toolbar (which will be “Code”, initially), or via keyboard shortcuts.

We mainly use **code cells** and **markdown cells**.

### Code cells

>A code cell allows you to edit and write new code, with full syntax highlighting and tab completion. By default, the language associated with a code cell is Python.
>
>When a code cell is executed, the code that it contains is sent to the kernel associated with the notebook. The results that are returned from this computation are then displayed in the notebook as the cell’s output. The output is not limited to text: many other forms of output are possible, including matplotlib figures and HTML tables (as used, for example, in the pandas data analysis package). This is known as IPython’s rich display capability.


Below you can find our first code cell:

In [1]:
#
# A comment in Python starts with # 
# This is our first example of python code
#

print('Hello Data Analytics class')

Hello Data Analytics class


In [2]:
a = 10
print(a)
a += 2
print(a)



10
12


**Observe that:**
1. The code contained in the cell is executed as in a classical Python interpreter
2. The output is written in the notebook below the cell 

Variables defined in a cell are __available__ to the cells below once you have executed that cell. 

In [3]:
print(a)
b = a + 3
print(b)

12
15


<div class="alert alert-block alert-info">
    The cell evaluation process is <b>strictly sequential</b>. If you change a cell above, you have to re-execute the cells above before the current cell sees the updated values (the Cell menu has a "Run All Above" entry for this).
</div>

### Markdown cells

You can document the computational process in a literate way, alternating descriptive text with code, using rich text. In IPython this is accomplished by marking up text with the Markdown language. The corresponding cells are called Markdown cells. The Markdown language provides a simple way to perform this text markup, that is, to specify which parts of the text should be emphasized (italics), bold, form lists, etc.

>When a Markdown cell is executed, the Markdown code is converted into the corresponding formatted rich text. Markdown allows arbitrary HTML code for formatting.

$$ \sum_{i=1}^n x_i $$

Within Markdown cells, you can also include mathematics in a straightforward way, using standard LaTeX notation: \\$...\\$ for inline mathematics and \\$\\$...\\$\\$ for displayed mathematics. When the Markdown cell is executed, the LaTeX portions are automatically rendered in the HTML output as equations with high-quality typography. This is made possible by MathJax, which supports a large subset of LaTeX functionality.

>Standard mathematics environments defined by LaTeX and AMS-LaTeX (the amsmath package) also work.

Markdown language [cheat sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)


1. First ordered list item
2. Another item
    * Unordered sub-list. 
1. Actual numbers don't matter, just that it's a number
    1. Ordered sub-list
    4. And another item.


### Interaction through the browser

The browser-based web application fully supports notebook development. 
You only need to know a few things:

### Cell modes

A cell has two different keyboard input modes. **Edit mode** allows you to type code/text into a cell and is indicated by a **green** cell border. **Command mode** binds the keyboard to notebook-level actions and is indicated by a grey cell border with a **blue** left margin.

You can switch between the two modes by pressing **`Enter`** to activate
edit mode and **`Esc`** to activate command mode.



### Shortcuts

Cell commands are available through the toolbar menu, 
but there are also simple shortcuts that you should know:

- **`Ctrl+Enter`** runs the current cell.  
- **`Shift+Enter`** runs the current cell and jumps to the next cell.   
- **`Alt+Enter`** runs the cell and adds a new one below it.

If a cell is in command mode one can use keyboard shortcuts.

**`h`** shows the help menu with all these commands. 


<div class="alert alert-block alert-info">
Markdown cells can also display images. The help list below is an image file (.png). 
</div>

![alt](figures/Screenhelp.png "Title")


Most useful shortcuts:

* **`m`** switches a cell to **markdown**.  
* **`y`** switches a cell to **code**.  
* **`a`** adds a cell before the current cell.  
* **`b`** adds a cell after the current cell.  
* **`dd`** deletes a cell.  




### Kernels

When you [run code](http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Running%20Code.html), the code is actually executed in a kernel. Sometimes a kernel crashes or gets stuck in an endless loop. In these cases, the kernel needs to be interrupted or even restarted. Both operations are available from the "Kernel" menu.

### File Output

Remember that a notebook file contains both the inputs to a computation and its outputs. If you run a notebook, all the outputs generated by the code cells are also stored in the notebook file. 

## Python introduction


There are plenty of tutorials on Python programming. The following are excellent examples:

- [Google's Python Class](https://developers.google.com/edu/python)
- [Learn Python Programming](https://www.programiz.com/python-programming)
- [A Whirlwind Tour of the Python Language](https://github.com/jakevdp/WhirlwindTourOfPython)

In what follows, we discuss the key language features that you should know and practice. 


Python is a **dynamic, interpreted language**. There are no type declarations of variables, parameters, functions, or methods in source code. 
If you need a variable ``a`` and you want to assign value 6 to ``a`` in Python you do:

``a = 6  # Python variable assignment``

In [4]:
a = 6

In [5]:
a

6

The same assignment in C looks like:

```C
/* C variable definition and assignment */
int a;
a = 6;
```
The difference is crucial. In C, the variable definition reserves a memory bucket, and the assignment then puts the value 6 into it. 
In Python you're defining a **pointer** (named ``a``) that points to a container holding the value 6. So you don't need to declare the variable type, and a variable can point to objects of different types (and may change type during code execution).

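The cells below first list the Python keywords (reserved words that cannot be used as variable names) and then demonstrate this behavior, which is what _dynamically typed_ means: the same variable ``valore`` is bound in turn to an integer, a string, and a list.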

In [6]:
help("keywords")


Here is a list of the Python keywords.  Enter any keyword to get more help.

False               break               for                 not
None                class               from                or
True                continue            global              pass
__peg_parser__      def                 if                  raise
and                 del                 import              return
as                  elif                in                  try
assert              else                is                  while
async               except              lambda              with
await               finally             nonlocal            yield



In [7]:
valore = 6
print (valore)
valore = 'Hello there'
print (valore)
valore = [0,3,5]
print (valore)

6
Hello there
[0, 3, 5]


**There is a major consequence of this approach:**

If two variables point to the same **mutable** object, then a change made through one 
is visible through the other as well.

Consider the following example:

In [8]:
a = [1,2,3]
b = a

Variables ``a`` and ``b`` **point to** the same object: the list [1,2,3].
Now, if you modify the list through either variable, the other will see the change too.
That is:

In [9]:
a.append(4) 
print(a)

[1, 2, 3, 4]


In [10]:
print(b)

[1, 2, 3, 4]


In [11]:
b.append(5)

print (a)
print (b)

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]


However, if you assign a different object to one variable, the other remains unchanged.

In [12]:
b = 'Hello there'

print (a)
print (b)

[1, 2, 3, 4, 5]
Hello there
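The aliasing behavior shown above can be checked directly. The following is a minimal sketch, with illustrative variable names (`original`, `alias`, `independent`) that do not appear in the lecture: the `is` operator tests whether two names refer to the same object, and `list.copy()` creates a separate (shallow) copy when an independent list is wanted.

```python
# Two names bound to the same list object (aliasing)
original = [1, 2, 3]
alias = original
print(alias is original)         # True: one object, two names

# A shallow copy is a different object with equal contents
independent = original.copy()
print(independent is original)   # False
print(independent == original)   # True

# Mutating through one name is visible through the alias, not the copy
original.append(4)
print(alias)                     # [1, 2, 3, 4]
print(independent)               # [1, 2, 3]
```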


Now, we have to point out a fundamental difference among Python objects: numbers and strings are **immutable** objects, while lists are **mutable**.

Basically, you cannot change the value of an immutable object, but you can change which object a variable points to. 

Example:

In [13]:
a = 100
b = a 

print (a)
print (b)

100
100


In [14]:
# Add 10 to a and assign it to a

a += 10 # a = a + 10

print(a)
print(b)

110
100


When the instruction ``a += 10`` (that is, ``a = a + 10``) is executed, the variable ``a`` is changed so that it points to a (new) integer with value 110. Variable ``b`` is therefore not affected.
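As a hedged illustration of immutability, the sketch below tries to modify a string in place and then builds a new string instead. The names `word` and `same_word` are illustrative and not part of the lecture; the slice `word[1:]` simply means "everything after the first character".

```python
word = 'Hello'
same_word = word          # both names refer to the same string object

try:
    word[0] = 'J'         # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# Building a new string rebinds `word`; `same_word` still refers to 'Hello'
word = 'J' + word[1:]
print(word)               # Jello
print(same_word)          # Hello
```

The in-place change fails, while the rebinding behaves exactly like the integer example: only the name on the left of `=` moves to a new object.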

## Python Strings

Strings are important **immutable** Python objects, and new
strings are constructed from existing ones much like the integer computation 
above.

In detail:

```Python
s = 'Hello'
t = 'Data' 
u = 'Analytics'
print (s + t + u)
```
will print out the string 'HelloDataAnalytics'. 

Note the ``+`` operator. It behaves differently depending on its operands. Between two integers it performs arithmetic addition; between two strings it concatenates them. In neither case does ``+`` automatically convert numbers (or other types) to strings. 
If you try:

In [15]:
s = 'Hi'
t = ' '
u = 'class'

print (s + t + u)

Hi class


In [17]:
s = 'There are '
a = "52"
p = ' people in this class'

print (s + a + p)

There are 52 people in this class


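Both concatenations above succeed because every operand is a string (in the second cell ``a`` is the string ``"52"``). If ``a`` were the integer ``52`` instead, ``s + a + p`` would raise a ``TypeError``. To concatenate a string and an integer you have to convert the integer into a string. This is done by the
**function** ``str()``: 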

In [18]:
s = 'There are '
a = 52
p = ' people in this class'

print (s + str(a) + p)



There are 52 people in this class
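A small sketch of the failure mode and its fix: adding an ``int`` to a ``str`` raises a ``TypeError``, which is caught here only so that the cell keeps running. The variable names `message` and `total` and the use of `int()` for the reverse conversion are additions for illustration, not part of the lecture cells.

```python
s = 'There are '
a = 52                      # an int, not a string
p = ' people in this class'

try:
    print(s + a + p)        # str + int is not defined
except TypeError as err:
    print('TypeError:', err)

# Explicit conversion with str() makes the concatenation valid
message = s + str(a) + p
print(message)              # There are 52 people in this class

# The reverse conversion: int() turns a numeric string into a number
total = int('52') + 1
print(total)                # 53
```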


In [19]:
a = 10
type(a)

int

In [20]:
a = 52.0
type(a)

float

If you want to check the type of a Python object, you can use the function ``type``. 

In [21]:
a = 3
type(a)

int

In [22]:
a = 'Hi'
type(a)

str

In [23]:
a = 1e-9
type(a)

float

Broadly speaking, in Python ``a`` is a name bound to an **object**, and the type belongs to the object itself, not to the variable name. 

Python is an **Object-oriented** language.

From [Wikipedia](https://en.wikipedia.org/wiki/Object-oriented_programming):

> Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which may contain data, in the form of fields, often known as **attributes**; and code, in the form of procedures, often known as **methods**.
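To make the distinction between attributes and methods concrete, here is a minimal sketch using two built-in types. The choice of a complex number and the variable names `z` and `text` are illustrative assumptions; any Python object would serve.

```python
z = 3 + 4j                  # a complex number object

# Attributes: data stored on the object (accessed without parentheses)
print(z.real, z.imag)       # 3.0 4.0

# Methods: functions attached to the object (called with parentheses)
print(z.conjugate())        # (3-4j)

text = 'data analytics'
print(text.title())         # Data Analytics
```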

## String methods

You can find all string methods in the official
[Python documentation](https://docs.python.org/3.6/library/stdtypes.html#string-methods).

Common and useful methods are described in the following.

* ``s.lower(), s.upper()``. Return a **copy** of the string with all the cased characters converted to lowercase (uppercase).

In [24]:
s = 'Hello'



In [25]:
s.upper()

'HELLO'

In [26]:
slower = s.lower() # slower is a new string
supper = s.upper() # supper is a new string

print(slower, supper)

slower.upper()

hello HELLO


'HELLO'

In [27]:
s

'Hello'

* ``s.strip([chars])``. Return a **copy** of the string with the leading and trailing characters removed. The ``chars`` argument is a string specifying the set of characters to be removed. If omitted or ``None``, the ``chars`` argument defaults to removing whitespace. The ``chars`` argument is not a prefix or suffix; rather, all combinations of its values are stripped.

In [28]:
s = '  Hello Data Analytics class '

st = s.strip()
st

'Hello Data Analytics class'

In [29]:
st = s.strip('Helas')
st

'  Hello Data Analytics class '

In [30]:
st = s.strip(' Helas')
st

'o Data Analytics c'
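The one-sided variants ``lstrip()`` and ``rstrip()`` follow the same rules as ``strip()``. The sketch below is an illustration with made-up sample strings (`raw`, `url`); ``repr()`` is used only to make the remaining whitespace visible.

```python
raw = '  Hello Data Analytics class \n'

print(repr(raw.strip()))    # both ends
print(repr(raw.lstrip()))   # leading whitespace only
print(repr(raw.rstrip()))   # trailing whitespace only

# chars is a set of characters, not a prefix: every leading or trailing
# 'w' or '.' is removed until a character outside the set is met
url = 'www.example.com'
print(url.strip('w.'))      # example.com
```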

* ``s.isalpha()/s.isdigit()/s.isspace()`` Tests if all the string chars are in the various character classes.

In [31]:
s = 'abc'
s.isalpha()

True

In [32]:
s = '123'
s.isdigit()

True

In [33]:
s = '12a'
s.isalpha()

False

In [34]:
s = '12a'
s.isdigit()

False

In [35]:
s = ' '
s.isspace()

True
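A typical use of these tests is to validate text before converting it. The helper `parse_count` below is hypothetical (it is not part of the lecture) and is a sketch that assumes plain ASCII input: ``isdigit()`` also accepts some non-ASCII digit characters that ``int()`` rejects.

```python
def parse_count(text):
    """Return int(text) if text contains only digits, otherwise None."""
    cleaned = text.strip()
    if cleaned.isdigit():
        return int(cleaned)
    return None

print(parse_count(' 52 '))  # 52
print(parse_count('12a'))   # None
print(parse_count(''))      # None: isdigit() is False for the empty string
```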

* ``str.startswith(prefix[, start[, end]])``

Return ``True`` if string starts with the ``prefix``, otherwise return ``False``. ``prefix`` can also be a tuple of prefixes to look for. With optional ``start``, test string beginning at that position. With optional ``end``, stop comparing string at that position.



In [36]:
s = 'hello there'

s.startswith(('hel','Hel'))

True

* ``str.endswith(suffix[, start[, end]])``

Return ``True`` if string ends with the ``suffix``, otherwise return ``False``. ``suffix`` can also be a tuple of suffixes to look for. With optional ``start``, test string beginning at that position. With optional ``end``, stop comparing string at that position.


In [37]:
s = 'Hello there'

s.endswith('here')

True
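As an illustration of both methods, the sketch below filters a list of file names by extension. The list `filenames` is invented for the example; the tuple form tests several suffixes in one call.

```python
filenames = ['Lecture-1.ipynb', 'Screenhelp.png', 'notes.txt', 'Index.ipynb']

# Keep only the notebook files
notebooks = [name for name in filenames if name.endswith('.ipynb')]
print(notebooks)            # ['Lecture-1.ipynb', 'Index.ipynb']

# A tuple tests several suffixes at once
text_or_image = [name for name in filenames if name.endswith(('.txt', '.png'))]
print(text_or_image)        # ['Screenhelp.png', 'notes.txt']

# The optional start argument restricts the test to s[start:]
starts_at_six = 'Hello there'.startswith('there', 6)
print(starts_at_six)        # True
```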

* ``s.find(sub[,start[,end]])``

Return the lowest index in the string where substring ``sub`` is found within the slice s[start:end]. Optional arguments ``start`` and ``end`` are interpreted as in [slice](#slicing) notation. Return -1 if sub is not found.


String positions are numbered 0, 1, 2, ..., n-1, where n is the length of the string.

In [38]:
s = 'Hello there there'
s.find('there')

6

In [39]:
s = 'Hello there there'
s.find('there',8)

12

In [40]:
s.find('hel')

-1

<div class="alert alert-block alert-info">
    <b>Note</b>: The <code>find()</code> method should be used only if you need to know the position of the substring. To check if <code>'sub'</code> is a substring or not, use the <code>in</code> <b>operator</b> as follows:
</div>

In [41]:
'Hel' in 'Hello' 

True
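When the positions do matter, the ``start`` argument of ``find()`` can be used to walk through every occurrence. The helper `find_all` below is a hypothetical sketch, not a built-in; it collects non-overlapping matches and rejects an empty substring, which would otherwise never advance.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError('sub must be a non-empty string')
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just after the match that was found
        start = text.find(sub, start + len(sub))
    return positions

print(find_all('Hello there there', 'there'))   # [6, 12]
print(find_all('Hello there there', 'hel'))     # []
```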

* ``str.replace(old, new[, count])``

Return a **copy** of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.


In [42]:
s = 'Hello there'
s.replace('Hello', 'Ciao')

'Ciao there'

In [43]:
print(s)

Hello there
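The optional ``count`` argument limits how many occurrences are replaced. A short sketch with an illustrative sample string:

```python
s = 'Hello there there'

first_only = s.replace('there', 'you', 1)   # replace the first match only
everywhere = s.replace('there', 'you')      # replace every match

print(first_only)   # Hello you there
print(everywhere)   # Hello you you
print(s)            # Hello there there (the original is unchanged)
```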


* ``s.split(sep=None,maxsplit=-1)``

Return a list of the words in the string, using ``sep`` as the delimiter string. If ``maxsplit`` is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If ``maxsplit`` is not specified or -1, then there is no limit on the number of splits (all possible splits are made).

If ``sep`` is given, **consecutive delimiters are not grouped together** and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). 

The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). 

Splitting an empty string with a specified separator returns [''].



In [44]:
'A B C'.split()

['A', 'B', 'C']

In [45]:
stringtest = 'a,b,c'

In [46]:
test = stringtest.split()

In [47]:
type(stringtest)

str

In [48]:
'a,b,c'.split(',')

['a', 'b', 'c']

In [49]:
'a,b,c'.split(',',1)

['a', 'b,c']

In [50]:
'a--b--c'.split('-') 

['a', '', 'b', '', 'c']

If ``sep`` is not specified or is ``None``, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].

In [51]:
' a   b   c  '.split()

['a', 'b', 'c']
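The difference between the two splitting algorithms is easiest to see side by side. The sketch below uses made-up sample strings; the `record` line and the `fields` list are illustrative and show a common pattern of combining ``split()`` with ``strip()``.

```python
# No separator: runs of whitespace count as one delimiter
print('a  b'.split())        # ['a', 'b']

# Explicit separator: each single space delimits, so an empty string appears
print('a  b'.split(' '))     # ['a', '', 'b']

# Empty input behaves differently in the two modes
print(''.split())            # []
print(''.split(','))         # ['']

# Splitting a record and cleaning each field
record = 'Fabrizio, Rossi , ID,95020'
fields = [field.strip() for field in record.split(',')]
print(fields)                # ['Fabrizio', 'Rossi', 'ID', '95020']
```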


* ``str.join(iterable)``

Return a string which is the concatenation of the strings in the iterable ``iterable``. A ``TypeError`` will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.


In [52]:
'**'.join(['What','a', 'string'])

'What**a**string'

In [53]:
':'.join(['Fabrizio','Rossi','ID','95020'])

'Fabrizio:Rossi:ID:95020'
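Because ``join()`` accepts only strings, numeric values have to be converted first. The following sketch uses an invented list `values`; the ``TypeError`` is caught only to show the message, and the final ``split()`` shows that the two methods undo each other.

```python
values = [95020, 52, 6]

try:
    ':'.join(values)                 # ints are not accepted
except TypeError as err:
    print('TypeError:', err)

# Convert each element to a string before joining
joined = ':'.join(str(v) for v in values)
print(joined)                        # 95020:52:6

# split() with the same separator recovers the (string) pieces
parts = joined.split(':')
print(parts)                         # ['95020', '52', '6']
```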

## Slicing
<a id='slicing'></a>

The slice syntax is a convenient Python feature for extracting part of a sequence. 

Suppose we have the string ``s = 'Alberto'``. 

The slice ``s[start:end]`` contains the elements **beginning** at ``start`` and **extending up to but not including** ``end``.

In [54]:
s = 'Alberto'

| ``s`` | A | l | b | e | r | t | o |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **index** | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| **negative index** | -7 | -6 | -5 | -4 | -3 | -2 | -1 |



In [55]:
s[0]

'A'

In [56]:
s[-4]

'e'

In [57]:
s[1:5]

'lber'

In [58]:
s[-1:-6]

''

In [59]:
s[5:3]

''

In [60]:
s[:5]

'Alber'

In [61]:
s[5:]

'to'

In [62]:
s[:-3]

'Albe'

In [63]:
s[-3:]

'rto'

In [64]:
s[:-3] + s[-3:]

'Alberto'

In [65]:
s[:4] + s[4:]

'Alberto'

In [66]:
s[-56:]

'Alberto'

In [67]:
s[0:100]

'Alberto'

In [68]:
print(s[1:4])
print(s[:])
print(s[-3:])
print(s[:-3])
print(s[:4]+s[4:])

lbe
Alberto
rto
Albe
Alberto
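Two properties seen in the cells above can be checked systematically: ``s[:n] + s[n:]`` always rebuilds ``s``, and slices silently clamp out-of-range bounds, whereas plain indexing does not. This is a minimal sketch; the range of ``n`` values and the name `halves_ok` are arbitrary choices for the illustration.

```python
s = 'Alberto'

# s[:n] + s[n:] == s for every n, including negative and out-of-range values
halves_ok = all(s[:n] + s[n:] == s for n in range(-10, 11))
print(halves_ok)            # True

# Slicing clamps bounds that fall outside the string
print(s[0:100])             # Alberto
print(repr(s[50:100]))      # ''

# Indexing a single position outside the string raises an error
try:
    s[100]
except IndexError as err:
    print('IndexError:', err)
```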
