# Python in Data Science

***

# Part 1 - Introduction
## Python (1 of 2)

***

# https://github.com/MichalKorzycki/PythonDataScienceEN


***

# Python

Python is:
- a dynamically and strongly typed, general-purpose programming language
- used in production at YouTube, Dropbox, Netflix, Instagram, ...
- runnable as a stand-alone program called a "script"
- or interactively in a notebook (like the one you are reading)
- a serious programming language

![Guido van Rossum](img/rossum.jpg)

![James Gosling](img/gosling.jpg)

![Bjarne Stroustrup](img/stroustrup.jpg)

![Larry Wall](img/wall.jpg)

(Sources: Wikipedia)

In [None]:
print('Hello World')

In [None]:
import this

In [None]:
import sys
print(sys.version)

![title](img/python_growth.png)

Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/

![tiobe](img/tiobe.png)

Source: https://www.tiobe.com/tiobe-index/

***

---
# Jupyter Notebooks

... **Jupyter** is *simple* but _**powerful**_

$$\sum_{i=1}^\infty \frac{1}{2^i} = 1$$

This is _**Bayes' theorem**_: $P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$

In [None]:
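import pandas as pd
import numpy as np

# 110 random integers from 1 to 99 (the upper bound 100 is exclusive), as 10 rows x 11 columns
df = pd.DataFrame(np.random.randint(1, 100, 110).reshape(-1, 11))
# name the 11 columns 'A' .. 'K'
df.columns = [chr(i) for i in range(ord('A'), ord('K') + 1)]

# row-wise maximum over all columns
df["max"] = df.max(axis=1)
df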

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

plt.style.use("dark_background")

x = np.linspace(0, 10, 100)
fig = plt.figure(figsize=(20,4))
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(x), '.');

# https://www.anaconda.com/download/

# https://colab.research.google.com/
---

# Syntax

## Variables and their types

In [None]:
s = 'Hello world'

In [None]:
s

## Variable types
### Primitive
- integers (`int`)
- floating-point numbers (`float`)
- strings of characters (`str`)
- Booleans (`bool`: `True` and `False`)

### Non-primitive
- lists (`list`)
- tuples (`tuple`)
- dictionaries (`dict`)
- ...
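A quick way to check which type a value has is the built-in `type()` function. The values below are arbitrary examples, one per type from the lists above:

```python
# Print each example value next to the name of its type
examples = [7, 3.5, "Hello", True, [3, 5], (1, 2), {"a": 1}]
for value in examples:
    print(repr(value), "->", type(value).__name__)
```

This prints `int`, `float`, `str`, `bool`, `list`, `tuple` and `dict`, in that order.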


In [None]:
7+2

In [None]:
7/2

### Integer (floor) division

In [None]:
7//2

### The modulo operator (the remainder of dividing the left-hand operand by the right-hand operand)

In [None]:
7%2
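`//` and `%` are linked: for integers `a` and `b` (with `b != 0`), `a == (a // b) * b + a % b` always holds. `//` rounds toward negative infinity, which only becomes visible with negative operands, and the built-in `divmod()` returns both results at once. The operand pairs below are arbitrary examples:

```python
# Floor division and modulo always satisfy a == (a // b) * b + a % b
for a, b in [(7, 2), (-7, 2), (7, -2)]:
    q, r = divmod(a, b)          # same as (a // b, a % b)
    assert q == a // b and r == a % b
    assert a == q * b + r
    print(f"{a} // {b} = {q}, {a} % {b} = {r}")
```

Note that `-7 // 2` is `-4`, not `-3`, and the remainder takes the sign of the divisor.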

## Python is a dynamically typed language


In [None]:
s = 3
s

In [None]:
s = "Hello "

In [None]:
s = s + "world"
s

## Python is a strongly typed language

In [None]:
n = 3

In [None]:
# Deliberately fails with a TypeError: Python will not implicitly convert the int to str
s = "Hello " + n + " worlds"
s

In [None]:
s = "Hello " + str(n) + " worlds. "
s

In [None]:
(s+"|") * 4

In [None]:
"10"

In [None]:
int("10")

In [None]:
int("10", base=2)

In [None]:
int("20", base=6)

## String formatting

In [None]:
'%d is a number' % 3

In [None]:
'PI is more or less equal to %.2f' % 3.14159265

In [None]:
'%d is a number and %d is also a number' % (3,7)

In [None]:
str(3) + ' is a number and ' + str(7) + ' is also a number'

In [None]:
n = 3
m = 11117
f'{n} is a number and {m} is also a number'
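f-strings (Python 3.6+) accept format specifiers after a colon, much like the `%` operator, and `str.format()` is the older method-based equivalent. A few common specifiers, using `pi` from the standard `math` module and the same `n` and `m` values as above:

```python
import math

n = 3
m = 11117
print(f'PI is more or less equal to {math.pi:.2f}')   # two decimal places
print(f'{m:,}')          # thousands separator: 11,117
print(f'[{n:>5}]')       # right-aligned in a field 5 characters wide
print('{} is a number and {} is also a number'.format(n, m))
```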

## Boolean

In [None]:
n == 3

In [None]:
n != 3

In [None]:
not n == 3

In [None]:
n == 3 or n != 3

In [None]:
n == 3 and n != 3
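When operators are combined, `not` binds tighter than `and`, which binds tighter than `or`, and comparisons can be chained. Values that are not `True`/`False` also have a truth value, which matters in `if` conditions: `0`, empty strings and empty containers count as false. The values below are arbitrary examples:

```python
n = 3

# not > and > or, so this parses as (not (n == 4)) or ((n == 3) and False)
print(not n == 4 or n == 3 and False)    # True

# chained comparison: same as 1 < n and n < 5
print(1 < n < 5)                         # True

# truth value of non-Boolean values
for value in [0, 7, "", "text", [], [0]]:
    print(repr(value), bool(value))
```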

In [None]:
if n == 3:
    print("Three")

In [None]:
if n == 4:
    print("Four")
else:
    print("Not four")

In [None]:
if n == 4:
    print("Four")
elif n == 3:
    print("Three")
    print("3")
else:
    print("Neither three nor four")

In [None]:
n = 4
if n == 4:
    print("Four")
elif n == 3:
    print("Three")
    print("3")
else:
    print("Neither three nor four")

## Lists

In [None]:
a = [3,5,6,7]
a

In [None]:
a[1]

In [None]:
a[0]

In [None]:
a[0:2]

In [None]:
a[-1]

In [None]:
a[1:-1]

In [None]:
a[-4]

In [None]:
a[:-1]

In [None]:
a[1:-1]+a[0:-1]
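Slices also accept an optional third value, the step: `a[start:stop:step]`. A negative step walks the list backwards. Slicing returns a new list, so the original is left unchanged. The list is redefined here with the same values as above:

```python
a = [3, 5, 6, 7]

print(a[::2])    # every second element: [3, 6]
print(a[1::2])   # every second element, starting at index 1: [5, 7]
print(a[::-1])   # reversed copy: [7, 6, 5, 3]
print(a[10:])    # an out-of-range slice gives an empty list, not an error: []
print(a)         # unchanged: [3, 5, 6, 7]
```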

In [None]:
len(a)

In [None]:
for i in a:
    print(f'{i}.')
    print("!")

In [None]:
for i in range(len(a)):
    print(i)

In [None]:
for i in range(len(a)):
    print(f'{i}: {a[i]}')
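The `range(len(a))` pattern works, but the built-in `enumerate()` yields index/value pairs directly and is usually the more idiomatic choice:

```python
a = [3, 5, 6, 7]

# enumerate() yields (index, value) pairs
for i, value in enumerate(a):
    print(f'{i}: {value}')

# the optional start argument changes where counting begins
print(list(enumerate(a, start=1)))   # [(1, 3), (2, 5), (3, 6), (4, 7)]
```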

In [None]:
a.append(8)
a

In [None]:
a + a

In [None]:
a * 3

In [None]:
s

In [None]:
s[0]

In [None]:
" - ".join( ["This", "is", "Alice"] )

In [None]:
"".join( ["This", "is", "Alice"] )

In [None]:
s2 = '.|.'

In [None]:
s2.join(["This", "is", "Alice"] )
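`str.split()` is the inverse of `join()`: it cuts a string at every occurrence of the separator and returns a list. Called without an argument, it splits on runs of whitespace:

```python
words = ["This", "is", "Alice"]
joined = " - ".join(words)           # 'This - is - Alice'

print(joined.split(" - "))           # back to ['This', 'is', 'Alice']
print("  This   is Alice ".split())  # whitespace split: ['This', 'is', 'Alice']
```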

## Tuples

In [None]:
t = (1, 2, 3, 4)
t
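Tuples are indexed and sliced like lists, but they are immutable: once created, their elements cannot be replaced. They are also handy for unpacking several values at once. The variable names below are only examples:

```python
t = (1, 2, 3, 4)
print(t[0], t[-1], t[1:3])   # 1 4 (2, 3)

try:
    t[0] = 10                # tuples do not support item assignment
except TypeError as e:
    print("TypeError:", e)

# unpacking: *rest collects the remaining elements into a list
first, second, *rest = t
print(first, second, rest)   # 1 2 [3, 4]

# swapping two variables with tuple packing and unpacking
x, y = 1, 2
x, y = y, x
print(x, y)                  # 2 1
```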

## Functions

In [None]:
def add_2(x):
    ret = x + 2
    return ret

In [None]:
add_2(5)

In [None]:
def is_odd(x):
    print("*" * x)          # side effect: draws a row of x stars
    return (x % 2) == 1

In [None]:
is_odd(25)

In [None]:
is_odd(8)

In [None]:
def spell(n):
    units = {0: "zero", 1: "one", 2: "two", 3: "three", 4: "four",
             5: "five", 6: "six", 7: "seven", 8: "eight", 9: "nine"}
    return units[n]

In [None]:
spell(6)

In [None]:
def add_2_spell(n):
    result = add_2(n)
    return spell(result)

In [None]:
add_2_spell(4)
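Parameters can have default values, which makes them optional at the call site, and a docstring (a string on the first line of the body) documents what the function does; `help()` displays it. `add_n` is a hypothetical generalization of `add_2`, not part of the original notebook:

```python
def add_n(x, n=2):
    """Return x increased by n (2 by default)."""
    # hypothetical helper: add_2 is the special case n=2
    return x + n

print(add_n(5))          # 7, uses the default n=2
print(add_n(5, 10))      # 15
print(add_n(5, n=-1))    # 4, n passed by keyword
print(add_n.__doc__)
```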

## Dictionaries

In [None]:
m = { 'a': 1, 'b': 2 }

In [None]:
m.keys()

In [None]:
m.values()

In [None]:
m['a']

In [None]:
m['c']  # deliberately fails with a KeyError: 'c' is not a key of m

In [None]:
m.get('c', 0)
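The `in` operator is another way to avoid the `KeyError`: it tests whether a key exists before indexing. Note that `get()` without a default returns `None` for a missing key:

```python
m = {'a': 1, 'b': 2}

print('a' in m, 'c' in m)     # True False: `in` checks the keys

# look up 'c' only if it exists, otherwise fall back to a message
if 'c' in m:
    print(m['c'])
else:
    print("no key 'c'")

print(m.get('c'))             # None
```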

In [None]:
m = dict( [("a", 1), ("b", 2)] )

In [None]:
m

In [None]:
l = [ "a", "a", "a" ]
l

In [None]:
list(zip( range(len(l)),l ))

In [None]:
li = list(range(len(l)))
li

In [None]:
l

In [None]:
list(zip(li, l))

In [None]:
m = dict(zip(range(len(l)), l))
m

In [None]:
for k in m.keys():
    print(k, m[k])

In [None]:
for k in m:
    print(k, m[k])
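Looping over `m.items()` yields key/value pairs directly, so the extra `m[k]` lookup is not needed. Also, `dict(enumerate(l))` builds the same mapping as `dict(zip(range(len(l)), l))`:

```python
l = ["a", "a", "a"]
m = dict(enumerate(l))
assert m == dict(zip(range(len(l)), l))   # {0: 'a', 1: 'a', 2: 'a'}

# items() yields (key, value) tuples that are unpacked in the loop header
for k, v in m.items():
    print(k, v)
```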

In [None]:
{ (1,2): "a", (1,2,3): "b"}

In [None]:
{ [1,2]: "a", [1,2,3]: "b"}

In [None]:
list(range(12))

In [None]:
l = ["a"] * 7
l

In [None]:
li = list(range(7))
li

In [None]:
list(zip(li,l))

In [None]:
len(l)

---
## Exercise 1 - Easy

Write a function that counts the occurrences of each element in a list.

E.g.
`count([1, 2, 3, 4, 1, 2, 1])`

`{1: 3, 2: 2, 3: 1, 4: 1}`

## Exercise 2 - Intermediate


Write a function that prints a Christmas tree of a given height using "*" characters.

E.g.
`xmas_tree(7)`

           *
          ***
         *****
        *******
       *********
      ***********
     *************
           *
          ***

`xmas_tree(4)`

        *
       ***
      *****
     *******
        *
       ***



## Exercise 3 - Advanced


Write a function that spells out numbers ranging from 1 to 1 billion.

E.g.
`spell(1984)`

`'one thousand nine hundred eighty four'`

Tips:
- split the task into smaller parts using functions
- use integer division (`//`) and modulo (`%`)
- use `if`/`elif`/`else`


# Recommended reading (free online versions)

## Introductory
- [Python 101](http://python101.pythonlibrary.org/) – The first 10 chapters are enough to start coding!
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) – strongly recommended
- [Python Docs](https://docs.python.org/3/) – Python Standard Library docs (Python comes with "batteries included")

## Intermediate
- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)
- [Dive Into Python 3](https://diveintopython3.problemsolving.io/)
- [Think Bayes](https://greenteapress.com/wp/think-bayes/)
- [Think Stats](https://greenteapress.com/wp/think-stats-2e/) 
- [Natural Language Processing with Python](https://www.nltk.org/book/)

## Advanced
- [The Elements of Statistical Learning](https://web.stanford.edu/~hastie/Papers/ESLII.pdf) 
- [Foundations of Statistical NLP](https://nlp.stanford.edu/fsnlp/) 
- [Introduction to Information Retrieval](https://nlp.stanford.edu/IR-book/)  

## Misc
- [Peter Norvig - The Unreasonable Effectiveness of Data](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf)
- [Andrew Ng – Machine Learning Yearning](https://www.deeplearning.ai/machine-learning-yearning/)
- [Tidy Data](https://vita.had.co.nz/papers/tidy-data.pdf) 
