# Table of Contents
- [**What is Python**](#What-is-Python?)
- [**Python environments**](#Python-environments)
- [**Setup Guide**](#Setup-Guide:-Installation-of-Python)

# What is Python?

Python is a modern, general-purpose, object-oriented, high-level programming language.

General characteristics of Python:

* *clean and simple language*: 
    * easy to read code
    * easy-to-learn, natural syntax
    * maintainability scales well with size of projects
* *expressive language*: fewer lines of code, fewer bugs, easier to maintain

Technical details:

* *dynamically typed*: no need to define the type of variables, function arguments or return types
* *automatic memory management*: no need to explicitly allocate and deallocate memory for variables and data arrays
* *interpreted*: source code is not compiled ahead of time to binary machine code; it is compiled to an intermediate representation (named bytecode), which the Python interpreter then executes
* *object-oriented*: effective in tackling complexity for large programs
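
The technical points above can be illustrated with a minimal sketch. The function `add` and the variable names are made up for this example; the `dis` module is part of the standard library, and the exact instruction names it reports vary between Python versions.

```python
import dis


def add(a, b):
    # No type declarations: the function works for any operands supporting "+".
    return a + b


# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__
print(first_type, second_type)

# The same function accepts numbers and strings alike.
print(add(1, 2), add("py", "thon"))

# The function body has already been compiled to bytecode; dis lists its instructions.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```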



![tiobe](images/tiobe.png)

More on the TIOBE index [here](https://it.wikipedia.org/wiki/TIOBE_Programming_Community_Index) and on the [official website](https://www.tiobe.com/tiobe-index/).

**Why Python**
- Ease of programming, minimizing the time required to develop, debug and maintain the code
- Overall strength for general-purpose software engineering, encouraging many good programming practices:
    - “Blocks by Indentation” forces proper code structuring & readability
    - Modular and object-oriented programming, good system for packaging and re-use of code
    - Documentation tightly integrated with the code
- A large standard library, and a large collection of add-on packages, constantly being improved
- Widely adopted for data mining, as a confluence of multiple disciplines:
    - Machine Learning (`scikit-learn`)
    - Database Technology (`pandas`)
    - High Performance Computing
    - Statistics (`scipy`, `numpy`)
    - Pattern Recognition
    - Visualization (`matplotlib`, `seaborn`)
    - Application
    - Algorithm



**Why *not* Python**
- Slower than code written in a compiled language (e.g., C++)
- Unsuitable for applications with low latency and low resource utilization requirements
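- Poorly suited to highly concurrent, CPU-bound multi-threaded applications (by default, the standard CPython interpreter lets only one thread execute Python bytecode at a time)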
- Somewhat decentralized, with different environments, packages and documentation spread out at different places


# Python environments

There are many different *environments* through which the Python *interpreter* can be used. Each environment has different advantages and is suitable for different workflows. One strength of Python is that it is versatile and can be used in complementary ways, but this can be confusing for beginners, so we will start with a brief survey of Python environments.



## Python interpreter

The standard way to use the Python programming language is to use the Python interpreter to run Python code. The **Python interpreter is a program that reads and executes the Python code in files passed to it as arguments**. At the command prompt, the command `python` is used to invoke the Python interpreter.
For example, to run a file `my-program.py` that contains Python code from the command prompt, use:

```shell
$ python my-program.py
```
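
As a rough illustration, such a file might look like the sketch below. Its contents are entirely hypothetical (the greeting and the `main` function are assumptions made for this example); the point is only that the interpreter runs the file from top to bottom and exposes the command-line arguments through `sys.argv`.

```python
# Hypothetical contents of my-program.py
import sys


def main(argv):
    """Return a greeting for the name given on the command line."""
    name = argv[1] if len(argv) > 1 else "world"
    return f"Hello, {name}!"


if __name__ == "__main__":
    # sys.argv[0] is the script path; any further items are user arguments.
    print(main(sys.argv))
```

Under these assumptions, `python my-program.py Ada` would pass `Ada` to the script as `sys.argv[1]`.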

We can also start the interpreter by simply typing `python` at the command line and then type Python code into it interactively (press `Ctrl+D`, or `Ctrl+Z` followed by `Enter` on Windows, or type `exit()` to exit). Conceptually, the interactive interpreter behaves like the following loop:

```python
while True:
    code = input(">>> ")  # read: prompt the user for some code
    exec(code)            # evaluate: execute it (the real interpreter also prints expression values)
```

This model is often called a REPL, or Read-Eval-Print Loop.

The `>>>` symbols represent the *prompt* where you type code. The interpreter awaits our instructions step by step. This is often how we want to work when quickly testing **small portions of code**, or when doing small calculations.
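
The simplified loop shown earlier only executes statements and never prints a result. A slightly closer approximation of one REPL step, written here as a sketch and not as the interpreter's actual implementation, first tries to treat the input as an expression and falls back to executing it as a statement. The helper name `repl_step` is an assumption made for this example.

```python
def repl_step(source, namespace):
    """Evaluate one line roughly as a REPL would and return its value.

    Expressions are evaluated and their value is returned; statements
    (assignments, imports, ...) are executed and yield None.
    """
    try:
        compiled = compile(source, "<repl>", "eval")
    except SyntaxError:
        # Not an expression: run it as a statement instead.
        exec(compile(source, "<repl>", "exec"), namespace)
        return None
    return eval(compiled, namespace)


namespace = {}
for line in ["x = 6", "x * 7", "import math", "math.sqrt(16)"]:
    result = repl_step(line, namespace)
    if result is not None:  # the "print" step of read-eval-print
        print(">>>", line, "->", repr(result))
```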


While some Python programmers execute all of their Python code in this way, those doing data analysis or scientific computing make use of IPython (an enhanced Python interpreter) or Jupyter notebooks (web-based code notebooks originally created within the IPython project).



## IPython
IPython (*Interactive Python*) is a **command shell for interactive computing in Python**.

IPython offers a fully compatible replacement for the standard Python interpreter, with convenient shell features, special commands, command history mechanism and output results caching.

We can start IPython by simply typing `ipython` at the command line (press `Ctrl+D` or type `exit()` to exit).

The original IPython interface runs an interactive shell built with Python. The name *shell* indicates that it is the outermost layer around the kernel (i.e. the engine capable of executing code) and allows the user to access it through a user interface.

The default IPython prompt uses the numbered `In [1]:` style instead of the standard `>>>` prompt. Just as with the standard interpreter, you can execute arbitrary Python statements by typing them in and pressing Return (or Enter).

## Jupyter Notebook

The "Project **Jupyter**" denotes an organization created with the aim of **supporting interactive data science and scientific computing** via the development of open-source software.

The term **Notebook** may refer both to the notebook document and to the web-based environment used to create it. Jupyter Notebook is another interface to the IPython kernel: it connects to the kernel to allow interactive programming in Python. It uses an internal library to convert the document to HTML, so that it can be viewed and edited in the browser. Although they use a web browser as a graphical interface, Jupyter notebooks are usually run locally, on the same computer that runs the browser.

![notebook image](https://jupyter.readthedocs.io/en/latest/_images/notebook_components.png)

The document you are reading, with extension `.ipynb`, is a notebook: it consists of an ordered list of cells, which can contain code, text, mathematics, output, and plots.

Jupyter notebooks are particularly useful as scientific lab books when you are doing lots of data analysis using computational tools. This is because, with Jupyter notebooks, you can:
* **Record the code you write in a notebook as you manipulate your data**. This is useful to remember what you've done, repeat it if necessary, etc.
* **See graphs and other figures rendered directly in the notebook**.
* You can **update the notebook (or parts thereof) with new data by re-running cells**. You could also copy the cell and re-run the copy only if you want to retain a record of the previous attempt.

There are two types of cells:
- **Markdown cells**: contain explanatory text.
- **Code cells**:  contain executable code.
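
On disk, an `.ipynb` file is a JSON document holding this ordered list of cells. The sketch below builds a hand-written, minimal notebook in the version 4 layout and counts its cell types; the cell contents are illustrative, and real notebooks usually carry more metadata.

```python
import json
from collections import Counter

# A minimal, hand-written notebook structure (illustrative content only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "execution_count": None,  # not executed yet
            "metadata": {},
            "outputs": [],
            "source": ["a = 2\n", "b = 4\n", "a * b"],
        },
    ],
}

text = json.dumps(notebook, indent=1)  # roughly what would be saved to disk
loaded = json.loads(text)
cell_types = Counter(cell["cell_type"] for cell in loaded["cells"])
print(dict(cell_types))
```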


#### Text cells 

Text cells (like this one) use *Markdown syntax*: a plain-text formatting syntax that enables the creation of rich text that can be converted to HTML.

You can include well-formatted text, formulas, and images too.

A comprehensive guide on markdown language is available [here](https://colab.research.google.com/notebooks/markdown_guide.ipynb).

A few examples follow:

Markdown | Preview
--- | ---
`**bold text**` | **bold text**
`*italicized text*` or `_italicized text_` | *italicized text*
`` `Monospace` `` | `Monospace`
`~~strikethrough~~` | ~~strikethrough~~
`[A link](https://www.google.com)` | [A link](https://www.google.com)
`![An image](https://www.google.com/images/rss.png)` | ![An image](https://www.google.com/images/rss.png)


#### Code cells
A code cell contains executable code and displays its output just below.
The following cells are examples of code cells: execute them by clicking the play button or pressing `Ctrl+Enter`.

In [1]:
# this is an executable code cell. 
a = 2
b = 4
a*b

8

In *code cells* you can append a question mark to a name to access its documentation:

In [2]:
a?

Type:        int
String form: 2
Docstring:
int([x]) -> integer
int(x, base=10) -> integer

Convert a number or string to an integer, or return 0 if no arguments
are given.  If x is a number, return x.__int__().  For floating point
numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in the
given base.  The literal can be preceded by '+' or '-' and be surrounded
by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
Base 0 means to interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
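
The `?` syntax is specific to IPython and Jupyter. In plain Python, similar information is available through standard tools such as `type()`, `inspect.getdoc()` and `help()`. The sketch below uses a hypothetical function `area`, defined only for this example, to show how a docstring written next to the code can be retrieved.

```python
import inspect


def area(width, height):
    """Return the area of a rectangle with the given sides."""
    return width * height


a = 2
# Roughly the "Type" field reported by `a?`.
type_name = type(a).__name__
print(type_name)

# The docstring attached to the function, cleaned of indentation.
doc = inspect.getdoc(area)
print(doc)

# help(area) would print a formatted version of the same docstring.
```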

In *code cells* you can also run system commands: prefix a command with an exclamation mark to execute it in the terminal:

In [3]:
!python --version

Python 3.11.9
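
The `!` prefix only works inside IPython and Jupyter. In a regular Python script, a comparable result can be obtained with the standard `subprocess` module, as in the sketch below; using `sys.executable` (the path of the running interpreter) instead of the bare `python` command is a choice made for this example.

```python
import subprocess
import sys

# Run "<current interpreter> --version" and capture what it prints.
completed = subprocess.run(
    [sys.executable, "--version"],
    capture_output=True,
    text=True,
    check=True,
)
version_text = (completed.stdout or completed.stderr).strip()
print(version_text)
```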


# Setup Guide: Installation of Python


## Versions of Python

From the [Python wiki](https://wiki.python.org/moin/Python2orPython3):

*Python 2.x is legacy, Python 3.x is the present and future of the language*

>*Python 3.0 was released in 2008. The final 2.x version 2.7 release came out in mid-2010, with a statement of extended support for this end-of-life release. The 2.x branch will see no new major releases after that.*

>*As of January 2020, Python 2 has reached End Of Life (EOL) status, meaning it will receive no further updates or bugfixes, including for security issues. Many frameworks and other add on projects are following a similar policy.*


>*As such, we can only recommend learning and teaching Python 3.*

To see which version of Python you have, run:
```shell
$ python --version
Python 3.11.9
```

Several versions of Python can be installed in parallel.

### IMPORTANT: In this course we will use Python 3

There are several differences between Python 2.7.x and Python 3.x; the most relevant ones are reported [here](https://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html).

New users are encouraged to download and install **Anaconda**.
It is a package manager, an environment manager, a Python distribution, and a collection of 1,000+ open source packages. Among the available packages:
- **jupyter**;
- **numpy**: fundamental package for scientific computing with Python;
- **matplotlib**: a Python plotting library;
- **pandas**: a Python library for data pre-processing and data analysis;
- **scikit-learn**: a Machine Learning library in Python;
- **NLTK** (Natural Language Toolkit): platform for building Python programs to work with human language data.

Download Anaconda from the [official website](https://www.anaconda.com/products/individual) and follow the [instructions](https://docs.anaconda.com/anaconda/install/) for your OS.

**conda** is the package and environment manager provided with Anaconda. From the [conda website](https://conda.io/en/latest/):
> Conda is an open source **package management system** and **environment management system** that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.



## Managing packages

In programming, a module is a piece of software that provides a specific functionality. In Java, the term *package* is often used as a synonym of module. In Python, a *module* is typically a single file of Python code, and a *package* is a collection of modules.
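
The distinction can be checked from Python itself. In the sketch below, `json` and `math` are standard-library examples chosen for illustration, and the helper `is_package` is an assumption of this example: it relies on the fact that packages expose a `__path__` attribute while plain modules do not.

```python
import json  # a package: a directory containing several modules
import math  # a single module


def is_package(module):
    # Packages expose a __path__ attribute listing where their submodules live.
    return hasattr(module, "__path__")


print(is_package(json), is_package(math))
```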

-  `conda` provides many commands for installing packages from the Anaconda repository and cloud.
-  `pip` is the Python Packaging Authority’s recommended tool for installing packages from the Python Package Index, PyPI.

For example, you can install a given package by typing the following commands in the terminal: 
> ```conda install [package name]```

or

> ```pip install [package name]``` 
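
After installing a package, you may want to check from Python which version is present. A minimal sketch using the standard `importlib.metadata` module (Python 3.8+) follows; the helper name `installed_version` and the package names queried are assumptions made for this example.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


for name in ["numpy", "pandas"]:
    # Prints None for packages that are not installed in this environment.
    print(name, installed_version(name))
```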


## Managing Environments

Python has its own unique way of downloading, storing, and locating packages (or modules). There are a few different locations where these packages can be installed on your system. 

Third-party packages, or **site packages**, installed using `pip` or `conda` are typically placed in one of the directories returned by `site.getsitepackages()`:

In [4]:
import site

print(site.getsitepackages())

['/Users/alessandrorenda/opt/anaconda3/envs/DMML24/lib/python3.11/site-packages']


By default, every project on your system will use these same directories to store and retrieve site packages.
Site packages are stored by name only: there is no differentiation between versions.

Consider the following scenario:
- You have two projects: *ProjectA* and *ProjectB*.
- Both projects have a dependency on the same package, *ProjectC*. 
- *ProjectA* needs *ProjectC* `v1.0.0`.
- *ProjectB* needs *ProjectC* `v2.0.0`.

This is a real problem for Python since it can’t differentiate between versions in the `site-packages` directory. So both `v1.0.0` and `v2.0.0` would reside in the same directory with the same name.

An **environment manager** lets you create lightweight "virtual environments" with their own site directories, optionally isolated from the system site directories. This means that each project can have its own dependencies, regardless of what dependencies every other project has.

In our example, we would just need to create a separate virtual environment for each of *ProjectA* and *ProjectB*.
Each environment, in turn, would be able to depend on whatever version of *ProjectC* it needs, independently of the other.

Practically, **virtual environments are just directories containing a few scripts**.
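
To see that an environment really is just a directory, the sketch below creates one with the standard-library `venv` module inside a temporary folder and lists its top-level entries. The environment name and the helper `create_environment` are hypothetical; `with_pip=False` is used only to keep the example fast and offline, whereas a real environment would normally include `pip`.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir, name):
    """Create a bare virtual environment and return its directory."""
    env_dir = Path(parent_dir) / name
    venv.create(env_dir, with_pip=False)  # no pip: quicker, needs no network
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_environment(tmp, "project-a-env")
    # Top-level entries differ by platform (e.g. bin/ on Unix, Scripts/ on Windows).
    entries = sorted(path.name for path in env_dir.iterdir())
    has_config = (env_dir / "pyvenv.cfg").is_file()
    print(entries, has_config)
```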

Available tools:
-  `conda` provides many commands for managing environments
-  `venv` is the Python standard-library tool for managing virtual environments


## Differences between `conda`, `pip`, `virtualenv`

See [the documentation](https://docs.conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands) for the full-size table, and [this blog post](https://www.anaconda.com/blog/understanding-conda-and-pip) for further discussion.
![conda vs pip vs virtualenv commands](images/conda-pip-venv.PNG)




## Creating a conda environment

1. Create the environment named "DMML" using the following command (Unix shell or Anaconda prompt)

```bash
$ conda create --name DMML python=3.11
```

2. Get the list of available environments

```bash
$ conda env list
```
3. Activate your newly created environment

```bash
$ conda activate DMML
```

4. Install `jupyter` (or any other needed package)

```bash
(DMML)$ pip install jupyter 
```
 
5. If you need to go back to the "system context":

```bash
(DMML)$ conda deactivate
```
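
To confirm from within Python which environment is active, one option is to inspect the `CONDA_DEFAULT_ENV` environment variable, which conda sets on activation, together with the interpreter path. This is a small sketch; the helper name `active_conda_env` is an assumption made for this example.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None outside conda."""
    return environ.get("CONDA_DEFAULT_ENV")


print("conda environment:", active_conda_env())
print("interpreter:", sys.executable)  # should point inside the environment's folder
```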



##  Jupyter Notebook Usage

After installing Jupyter with `conda` or `pip`, start a Jupyter notebook server by navigating to a suitable working directory and typing the following command (Unix shell or Anaconda prompt):
```bash
(DMML)$ jupyter notebook
```

This starts a Jupyter notebook server and automatically opens it in the browser. You should get something like this:

![jupyter-home.png](images/jupyter-home.png)

Here's a quick list of the main functionalities in Jupyter notebooks (see also `Help > Keyboard Shortcuts`):
* Start a new notebook by clicking `New` in the top-right corner.
* Type in some Python code in a **cell** and press `Shift` + `Enter` to execute.
* Change the cell type from Code to Markdown using the drop-down box (top-middle) to write explanatory text. Markdown guidance is available [here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet "Markdown cheat-sheet on GitHub").
* Add new cells using the `+` button (top-left).
* Save the notebook using the disk button (top-left).
* Run cells in various ways (run all, run selection, run all above, run all below, etc.) using the options in the `Cell` menu.
* Interrupt the kernel, restart it, clear all output, etc. using the options in the `Kernel` menu.
* Download as Python script, HTML, LaTeX, etc. using the options under `File > Download as`.

An open notebook has exactly one interactive session connected to a kernel which will execute code sent by the user and communicate back results. 

This kernel remains active if the web browser window is closed, and reopening the same notebook from the dashboard will reconnect the web application to the same kernel.

### Jupyter Notebook pitfall: the hidden state

Note that:
- All code cells share *state*: a variable can be defined in a cell, referenced in another, deleted in a third
- All code cells are mutable: they can be modified, rearranged, deleted (regardless of whether we executed them or not).

As a consequence, the code that has actually been executed by the kernel is naturally decoupled from what the notebook displays.

Consider the following cells:

In [5]:
my_var = 2 
my_var

2

In [6]:
my_var += 1  # What if I don't execute this cell? What if I execute it more than once?

In [7]:
my_var

3

How to address the hidden state challenge?
- be aware of it;
- look at the cell **execution number** (upper-left of each cell): it is useful for spotting out-of-order execution or deleted cells;
- work *linearly* (from top to bottom);
- frequently use the "**Restart & Run All**" utility ("Kernel" tab);
- avoid deleting work until you are sure it can be safely deleted.
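
The hidden-state effect described above can be reproduced outside a notebook. The sketch below is a simulation, not how Jupyter is implemented: each "cell" is a string of code, and a dictionary plays the role of the kernel's shared state. Running the same three cells in different orders gives different results.

```python
# Each "cell" is a piece of code; the numbers mimic positions in the notebook.
cells = {
    1: "my_var = 2",
    2: "my_var += 1",
    3: "result = my_var",
}


def run_cells(order):
    """Execute the cells in the given order in a fresh namespace (a restarted kernel)."""
    namespace = {}
    for number in order:
        exec(cells[number], namespace)
    return namespace["result"]


print(run_cells([1, 2, 3]))     # top to bottom: 3
print(run_cells([1, 3]))        # cell 2 skipped: 2
print(run_cells([1, 2, 2, 3]))  # cell 2 executed twice: 4
```

Only the first call corresponds to what "Restart & Run All" would produce, which is why that option is a reliable way to check that a notebook is reproducible.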


**To sum up**:

You may prefer Jupyter Notebook:
- for scripts and prototypes
- for interactive development
- for visualization purposes
- for documenting, explaining, and analyzing your work (e.g., homework assignment/ practical exam)

You may want to use an integrated development environment (IDE), e.g. Spyder (comes with Anaconda installation), PyCharm, Visual Studio, JupyterLab to name but a few, for the following reasons:
- more complex projects (many files)
- debug utilities
- powerful code editing tools
- furthermore, many IDEs support working with .ipynb Notebooks natively



# <font color='blue'><ins>TASK</ins></font>
- Please fill in the survey: [https://forms.office.com/e/NjPS9QLrYw](https://forms.office.com/e/H2Vy1jfpfb)
- Setup:
    - download and install Anaconda
    - create a dedicated `conda environment` for the course
    - activate the environment and install a bunch of useful packages by typing:
        - `pip install pandas openpyxl seaborn folium matplotlib imbalanced-learn scikit-learn plotly mlxtend`

- Perform the following steps:
    - Create a directory for the course exercises.
    - Inside, create a new jupyter notebook. 
    - Type `import this` in a code cell and execute it. 
    - Enjoy ***The Zen of Python***.
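
Once the setup steps above are done, a quick way to check that the packages can be imported is sketched below. Note that some import names differ from the names used with `pip`: `scikit-learn` is imported as `sklearn` and `imbalanced-learn` as `imblearn`. The helper `missing_modules` and the `COURSE_MODULES` list are assumptions made for this example.

```python
from importlib.util import find_spec

# Import names corresponding to the packages installed with pip in the setup step.
COURSE_MODULES = [
    "pandas", "openpyxl", "seaborn", "folium", "matplotlib",
    "imblearn", "sklearn", "plotly", "mlxtend",
]


def missing_modules(module_names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in module_names if find_spec(name) is None]


print("missing:", missing_modules(COURSE_MODULES))
```

An empty list means every package was found in the active environment.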