# Data Science and Machine Learning with Python

## What is Data Science?
> From Wikipedia: Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured.


## What is Machine Learning?

> The term *machine learning* was coined in 1959 by Arthur Samuel, who programmed a computer to play the game of checkers. The program remembered every position it had already seen, along with the terminal value of the reward function, and improved its play as it gained experience.
>
> Tom M. Mitchell, an American computer scientist, later provided a more formal definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." This definition of the tasks with which machine learning is concerned is fundamentally operational, rather than defining the field in cognitive terms.

> Other, more general definitions:
>
> - A field in which a machine can learn on its own, without being explicitly programmed.
> - Machine learning is the process of extracting knowledge from data automatically, usually with the goal of making predictions on new, unseen data.
>
> A classical example is a spam filter, for which the user keeps labeling incoming emails as either spam or not spam. A machine learning algorithm then "learns" a predictive model from this data that distinguishes spam from normal emails, and the model can then predict whether new emails are spam or not.
>
> Central to machine learning is the concept of **automating decision making** from data **without the user specifying explicit rules** for how this decision should be made. For the case of emails, the user doesn't provide a list of words or characteristics that make an email spam. Instead, the user provides examples of spam and non-spam emails that are labeled as such.
>
> The second central concept is **generalization**. The goal of a machine learning model is to predict on new, previously unseen data. In a real-world application, we are not interested in marking an already labeled email as spam or not. Instead, we want to make the user's life easier by automatically classifying new incoming mail.

Other examples: image recognition and natural language processing (NLP).
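To make the spam-filter example concrete, here is a minimal sketch using scikit-learn, assuming it is installed. The six training emails and their labels are made up for illustration, and `MultinomialNB` (naive Bayes) is just one simple classifier choice among many. Note that no rule such as "the word 'free' means spam" is ever written; the model infers it from the labeled examples and then generalizes to emails it has not seen.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical, hand-labeled training data (the "experience E")
train_emails = [
    "win a free prize now",
    "claim your free money today",
    "cheap pills limited offer win",
    "meeting moved to monday",
    "please review the attached report",
    "lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Turn each email into word counts, then fit a naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_emails, train_labels)

# Generalization: predict labels for new, unseen emails
new_emails = ["free prize offer", "report for the monday meeting"]
predictions = model.predict(new_emails)
print(predictions)  # ['spam' 'ham']
```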

## The Tools or ``libraries``

> First, you need to install the Python libraries and packages for data science and machine learning; at a minimum, they are:
- **NumPy** -  for numerical computing and linear algebra operations
- **Pandas** - for data cleaning and exploratory data analysis
- **Matplotlib** and **Seaborn** - for data visualization
- **Scikit-learn** - for machine learning algorithms
- **IPython** or **Jupyter** Notebook - the preferred IDE for data science in Python
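A quick way to confirm that the libraries listed above are installed is to import each one and print its version. This is a small sketch; `check_versions` is a hypothetical helper, and note that Scikit-learn is imported under the module name `sklearn`.

```python
import importlib


def check_versions(module_names):
    """Return {module name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Most scientific packages expose __version__; fall back if one does not
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


libraries = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]
for name, version in check_versions(libraries).items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```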



    
### How to Install Python libraries for data science and machine learning


> Install these libraries and their dependencies via the [``Anaconda distribution``](https://www.anaconda.com/download/#macos).

> Anaconda is a free and open-source distribution of Python for scientific computing that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**.

>License: BSD

>Developers: Anaconda, Inc.

<img src="images/anaconda_dist.png" width="70%" style="margin: 20px auto;">

## The ``IPython`` or ``Jupyter`` Notebook
<img src="images/jupyter.png" width="50%" style="margin: 20px auto;">

> The Jupyter Notebook:
> - is the preferred IDE (integrated development environment) for data science in **Python**.
> - is an open-source web application for creating and sharing documents that contain live code and narrative text.
> - is used for data cleaning and transformation, statistical modeling, data visualization, machine learning, etc.
> - is included, together with its dependencies, in the Anaconda distribution.




### To start, launch Jupyter Notebook
- from the Anaconda Navigator, click "Launch" under Jupyter Notebook, or
- from a terminal, run:

```bash
jupyter notebook
```

### Notebook operations

- There are two modes: **Edit** and **Command**.
- In **Command** mode (cell border is blue), press ``Enter`` to switch to **Edit** mode.
- In **Edit** mode (cell border is green), press ``Esc`` to return to **Command** mode.
- To execute a cell, press ``Shift + Enter``.
- To move down (in **Command** mode): ``j`` or ``Down Arrow``.
- To move up (in **Command** mode): ``k`` or ``Up Arrow``.


In [1]:
```python
a = [1, 2, 3]
print(a)
```
```
[1, 2, 3]
```


### Python version

In [2]:
```python
import sys
print(sys.version)
```
```
3.7.1 (default, Dec 14 2018, 13:28:58) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
```
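`sys.version` is a human-readable string. To check the version programmatically, for example to confirm that a notebook runs on a recent Python 3, compare `sys.version_info` against a tuple. The minimum version `(3, 6)` below is only an illustrative assumption, not a requirement of this tutorial.

```python
import platform
import sys

# Short version string, e.g. "3.7.1"
print(platform.python_version())

# version_info is a named tuple, so it compares element by element with plain tuples
MIN_VERSION = (3, 6)  # illustrative minimum only
if sys.version_info < MIN_VERSION:
    raise RuntimeError(
        f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ required, "
        f"found {platform.python_version()}"
    )
print("major:", sys.version_info.major, "minor:", sys.version_info.minor)
```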


### The magic ``%`` functions

The magic function system provides a series of functions that allow you to control the behavior of the notebook itself, plus many system-type features. There are two kinds of magics: line-oriented and cell-oriented.

Line magics are prefixed with the ``%`` character and work much like OS command-line calls: they get the rest of the line as an argument, where arguments are passed without parentheses or quotes. Cell magics are prefixed with a double ``%%``, and they get as arguments not only the rest of the line but also the lines below it in the same cell.

In [6]:
```python
# Uncomment the next line to list all available magic functions
# %lsmagic
```

In [8]:
```python
import numpy as np
```

In [9]:
```python
%timeit np.arange(1000)
```
```
882 ns ± 7.53 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

In [10]:
```python
%timeit np.linalg.eigvals(np.random.rand(100,100))
```
```
3.13 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
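``%timeit`` only exists inside IPython/Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. This sketch mirrors the "mean ± std. dev. of 7 runs" report; the loop count of 10,000 is an arbitrary choice (``%timeit`` picks its own count automatically), and the numbers will differ from machine to machine.

```python
import statistics
import timeit

import numpy as np

number = 10_000  # executions per run (arbitrary; %timeit chooses this itself)
timer = timeit.Timer("np.arange(1000)", globals={"np": np})
runs = timer.repeat(repeat=7, number=number)  # 7 total times, one per run

# Convert each run's total time into seconds per single execution
per_loop = [total / number for total in runs]
mean_ns = statistics.mean(per_loop) * 1e9
std_ns = statistics.stdev(per_loop) * 1e9
print(f"{mean_ns:.0f} ns ± {std_ns:.2f} ns per loop "
      f"(mean ± std. dev. of {len(runs)} runs, {number} loops each)")
```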


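### While the kernel is executing, you can wait, or ``interrupt`` it to stop it deliberately

To interrupt a running cell, use **Kernel > Interrupt** from the menu, or press ``I`` twice in **Command** mode.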

In [11]:
```python
import time, sys
```

In [12]:
```python
from IPython.display import clear_output

for i in range(50):
    time.sleep(0.25)
    clear_output(wait=True)  # replace the previous output once new output arrives
    print(i)
    sys.stdout.flush()
```
```
16

KeyboardInterrupt: 
```
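Interrupting the kernel raises a `KeyboardInterrupt` exception inside the running code, which is why the cell above ends with that error. If a long loop builds up a result, you can catch the exception to keep the partial result instead of losing it. A minimal sketch; `slow_sum` and its `delay` parameter are hypothetical names for illustration.

```python
import time


def slow_sum(n, delay=0.25):
    """Add 0..n-1 slowly; if interrupted, return the partial total so far."""
    total = 0
    completed = 0
    try:
        for i in range(n):
            time.sleep(delay)  # stand-in for real, slow work
            total += i
            completed += 1
    except KeyboardInterrupt:
        # Raised in the running code when the kernel is interrupted
        print(f"Interrupted after {completed} of {n} iterations")
    return total


print(slow_sum(8, delay=0.01))  # 28 if not interrupted
```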

In [13]:
```python
import time, sys
for i in range(10):
    print(i)
    time.sleep(0.5)
```
```
0
1
2
3
4
5
6
7
8
9
```


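### Handling large outputs

In the classic Notebook, long outputs can be collapsed or made scrollable from the **Cell > Current Outputs** menu.

In [None]:
```python
for i in range(50):
    print(i)
```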
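Another option is to keep the output small in the first place. The sketch below prints all 50 values on a single line, and shows a hypothetical `summarize` helper that keeps only the first and last few items of a long sequence.

```python
def summarize(items, n=3):
    """Return 'first n, ..., last n' for long sequences, or all items if short."""
    items = list(items)
    if len(items) <= 2 * n:
        return ", ".join(str(x) for x in items)
    head = ", ".join(str(x) for x in items[:n])
    tail = ", ".join(str(x) for x in items[-n:])
    return f"{head}, ..., {tail}"


# All 50 numbers on one line instead of 50 separate lines
print(*range(50), sep=", ")

# Only the beginning and the end of the sequence
print(summarize(range(50)))  # 0, 1, 2, ..., 47, 48, 49
```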


# The Markdown cell

# Header
## Sub-header
### $This$ is a **Markdown** cell with a stylised $\LaTeX$ example

When $a \ne 0$, the quadratic equation
$$ ax^2 + bx + c = 0 $$
has two roots, given by
$$x = {-b \pm \sqrt{b^2-4ac} \over 2a}.$$
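The quadratic formula translates directly into Python. This sketch uses the standard-library `cmath.sqrt`, so a negative discriminant $b^2 - 4ac$ yields complex roots instead of an error; `quadratic_roots` is a hypothetical helper name. The results are always complex numbers, even when the imaginary part is zero.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return both roots of ax^2 + bx + c = 0 (a != 0) via the quadratic formula."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    sqrt_disc = cmath.sqrt(b**2 - 4 * a * c)  # complex sqrt handles negative values
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


print(quadratic_roots(1, -3, 2))  # ((2+0j), (1+0j))
print(quadratic_roots(1, 0, 1))   # complex pair: 1j and -1j
```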

This is an example of *list* operations in Python:

```python
a = [2, 3, 4]
b = [0, 4, 5]
c = a + b   # '+' concatenates the two lists
print(c)    # [2, 3, 4, 0, 4, 5]
```

In [14]:
```python
a = [2,3,4]
b = [0,4,5]
c = a + b
print(c)
```
```
[2, 3, 4, 0, 4, 5]
```
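With plain Python lists, ``+`` concatenates. For element-wise arithmetic you either loop over the items or use NumPy arrays, which is where the next section picks up. A short comparison, assuming NumPy is installed:

```python
import numpy as np

a = [2, 3, 4]
b = [0, 4, 5]

# Lists: + joins the two lists end to end
print(a + b)  # [2, 3, 4, 0, 4, 5]

# Lists: an element-wise sum needs an explicit loop (here, a comprehension)
print([x + y for x, y in zip(a, b)])  # [2, 7, 9]

# NumPy arrays: + adds element by element
print(np.array(a) + np.array(b))  # [2 7 9]
```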


# Next: ``NumPy``