# Eighty20 Python Training 

## Basics

### What is it?
Python is an interpreted, object-oriented programming language that works well for both scripting and application development.

In [None]:
print("Hello World")

### 2 vs 3
- There are two major versions of Python: 2.x and 3.x
    - Python 3.0 was released in 2008 to fix some of the issues with Python 2 (mostly Unicode support)
    - But that broke some backwards compatibility
- Python 3 is the better choice (Python 2 reached end of life in January 2020), so use it if you can
- However, some older packages do not support Python 3, so you might have to use Python 2 for certain tasks
- Good modern Python 2 code should be compatible with Python 3
- Everything we currently use at Eighty20 is in Python 3
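
A quick way to confirm which version a script or notebook is actually running on is to inspect `sys.version_info`. This is only a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import sys

# sys.version_info holds (major, minor, micro, releaselevel, serial)
major, minor = sys.version_info.major, sys.version_info.minor
is_python3 = major >= 3

print("Running Python {}.{}".format(major, minor))
print("Python 3 or newer:", is_python3)
```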

### How to use it?
- Like this! In a Jupyter Notebook (more on that later)
- Linux
    - It should already be installed
    - Type `python3` (or `python` for Python 2) at a terminal to enter a Python shell
    - Install packages with `pip3 install`
- Windows and Mac
    - Download and install [Anaconda](https://www.continuum.io/)
    - Install packages with `conda install`
- The [Tutorial](https://docs.python.org/3/tutorial/index.html)
- The [Python Standard Library](https://docs.python.org/3/library/index.html)

### Let's write some code

### It's a fancy Calculator

In [167]:
print(3 + 2)

5


In [168]:
#Jupyter displays the value of the last expression in a cell
3 + 2

5

In [169]:
#Division to return a float 
10 / 3

3.3333333333333335

In [170]:
#Floor division rounds down and returns an int when both numbers are ints
10 // 3

3

In [171]:
#Return the remainder
10 % 3

1

In [172]:
#Multiplication
10 * 3

30

In [173]:
#Exponent: 10 to the power of 3 (note that ^ is bitwise XOR in Python)
10 ** 3

1000

In [174]:
x = 3

In [175]:
x

3

In [176]:
x + 5

8

In [177]:
x

3

In [178]:
x = x + 5
x

8

In [179]:
x = 3
x += 5
x

8

In [180]:
#Python follows operator precedence
x = 10 + 3 * 2
x

16

### Input and Output

In [181]:
print(x)

16


In [182]:
print("The number is", x)

The number is 16


In [183]:
print("{} is the number".format(x))

16 is the number


In [185]:
print(f'{x} is the number')

16 is the number


In [186]:
print("One line")
print("Another line")

One line
Another line


In [187]:
#end normally defaults to a newline character
print("Both on", end=" ")
print("the same line")

Both on the same line


In [None]:
import time
num_lines = 5
for i in range(num_lines):
    time.sleep(1)
    print(f"\rPrinting line {i+1} of {num_lines}", end="")

In [188]:
#input() always returns a string
print("Would you like to play a game?")
ans = input()
ans

Would you like to play a game?


 yes


'yes'

In [189]:
print("What year were you born?")
birth_year = input()
age = 2021 - birth_year
print(age)

What year were you born?


 1991


TypeError: unsupported operand type(s) for -: 'int' and 'str'

In [190]:
print("What year were you born?")
birth_year = input()
age = 2021 - int(birth_year)
print(age)

What year were you born?


 1991


30
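
Because `input()` hands back a string, the `int()` conversion above fails if the user types something that is not a whole number. The sketch below shows one way to guard against that. `age_from_birth_year` is a hypothetical helper (not part of the original notebook), and the current year is passed in as an assumed default rather than looked up.

```python
def age_from_birth_year(text, current_year=2021):
    """Turn the text typed by a user into an age, or None if it is not a whole number."""
    try:
        birth_year = int(text.strip())
    except ValueError:
        # int() raises ValueError for input such as "ninety" or "19.5"
        return None
    return current_year - birth_year


print(age_from_birth_year(" 1991"))
print(age_from_birth_year("ninety"))
```

Returning `None` for bad input is just one design choice; re-prompting the user in a `while` loop would work equally well.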


### Conditionals

#### Comparison operators

In [None]:
x = 3
x > 2

In [None]:
x < 2

In [None]:
x == 2

#### Logical Operators

In [None]:
# and 
x = 8
x > 1 and x < 5

In [None]:
# or 
x > 1 or x < 5

In [None]:
# not
not x > 10

#### If Statements

In [None]:
temperature = 35
if temperature > 30:
    print("It's hot") #Will be printed if true 
print(f"The temperature is {temperature} today") #Will always be printed - introduces string formatting

In [None]:
temperature = 10
if temperature > 30:
    print("It's hot")
elif temperature > 20:
    print("It's a nice day")
else:
    print("It's alright")
print(f"The temperature is {temperature} today")

In [None]:
#Indentation is really important in Python
x = 50
if x > 2:
    if x > 100:
        print("That's a really big number")
    else:
        print("That's not so big")
elif x > 0 and x <= 2:
    print("At least it's still positive")
else:
    print("You're on your own now")


### Loops

In [None]:
print(1)
print(2)
print(3)
print(4)
print(5)

In [None]:
#While loops - repeat a block of code for as long as a condition is true
i = 1
while i <= 5:
    print(i)
    i += 1

In [None]:
#Something cool but useless
i = 1
while i <= 5:
    print(i * '*')
    i += 1

In [None]:
range(5) #All integers from 0 to 4 

In [None]:
for i in range(5):
    print(i)

In [None]:
for i in range(1, 6): #range(<inclusive>, <exclusive>)
    print(i)
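
`range` does not build its numbers up front, so wrapping it in `list()` is a handy way to see exactly which values it will produce. The optional third argument (a step) is not used elsewhere in this notebook and is included here only as an illustration.

```python
# The start value is included, the stop value is excluded
first_five = list(range(5))
one_to_five = list(range(1, 6))

# An optional third argument sets the step size
evens = list(range(0, 10, 2))

print(first_five)
print(one_to_five)
print(evens)
```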

### Lists

In [None]:
a = [0, 1, 2, 3, "hello"]
for element in a:
    print(element, "of type", type(element)) 

In [None]:
a[0]

In [None]:
a[1:4]

In [None]:
a[:3]

In [None]:
a[1:]

In [None]:
#The last element in the list
a[-1]

In [None]:
#Second last element 
a[-2]

In [None]:
list((0,1,2))

In [None]:
#Replace values in a list 
a[-1] = 4
a

#### List methods

In [None]:
my_list = [1, 2, 3, 4]

In [None]:
my_list.append(6)
print(my_list)

In [None]:
my_list.insert(-1, 5)
print(my_list)

In [None]:
my_list.remove(3)
print(my_list)

In [None]:
len(my_list)

In [None]:
my_new_list = [3, 51, 2, 8, 6]

In [None]:
#Sorts the list in ascending order
my_new_list.sort()
print(my_new_list)

In [None]:
#Sorts in descending order
my_new_list.sort(reverse=True)
print(my_new_list)

### List Comprehensions

In [None]:
for b in range(5):
    b = b + 2
    print(b)

In [None]:
b_list = []
for b in range(5):
    b = b + 2
    b_list.append(b)
print(b_list)

In [None]:
[b + 2 for b in range(5)]
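
A comprehension can also filter items with a trailing `if`. The sketch below puts the loop version and the comprehension side by side; the variable names are made up for illustration.

```python
# Loop version: keep the even numbers and square them
squares_loop = []
for n in range(10):
    if n % 2 == 0:
        squares_loop.append(n ** 2)

# Equivalent comprehension: [<expression> for <item> in <iterable> if <condition>]
squares_comp = [n ** 2 for n in range(10) if n % 2 == 0]

print(squares_loop)
print(squares_comp)
```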

### Tuples

In [None]:
#Tuples are immutable - they cannot be changed once created 
#Only use these when you don't want anyone to change your 'list'
#A list is defined with [] and a tuple is defined with ()

In [None]:
numbers = (1,2,3,3)

In [None]:
#Raises a TypeError because tuples cannot be changed
numbers[0] = 0

In [None]:
#Returns the number of instances 
numbers.count(3)

In [None]:
#Returns the index of the first instance
numbers.index(2)
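
The immutability described above can be seen by catching the error that an assignment raises. This is a minimal sketch; building a new tuple by concatenation is just one possible workaround when a changed copy is needed.

```python
numbers = (1, 2, 3, 3)

try:
    numbers[0] = 0
except TypeError as error:
    # Tuples do not support item assignment
    message = str(error)
    print("Could not change the tuple:", message)

# To "change" a tuple, build a new one instead
new_numbers = (0,) + numbers[1:]
print(new_numbers)
```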

### Dictionaries

In [None]:
#Allow us to work with key-value pairs
dictionary = {'key': 'value'}

In [None]:
fast_food = {
    "Riona": "KFC",
    "Courtney" : "Butlers",
    "Matthew" : "Nandos"
}
fast_food

In [None]:
fast_food["Riona"]

In [None]:
#Raises a KeyError because the key does not exist
fast_food["Sebastian"]

In [None]:
"Sebastian" in fast_food

In [None]:
fast_food.get("Sebastian", "Not found")

In [None]:
fast_food.get("Riona", "Not found")

In [None]:
#Can add additional values
fast_food['Tarryn'] = 'McDonalds'
print(fast_food)

In [None]:
fast_food.items()

In [None]:
#sorted() vs sort() - dictionaries have no sort() method, but sorted() returns a sorted list of their keys
for person in sorted(fast_food):
    print(person, "likes", fast_food[person])

In [None]:
for person, restaurant in fast_food.items():
    print(f'{person} likes {restaurant}')

In [None]:
#Remove using del or pop 
del fast_food['Riona']
print(fast_food)

In [None]:
me = fast_food.pop('Tarryn')
print(fast_food)
print(me)

In [None]:
#update() changes existing values and adds any keys that are missing
fast_food.update({'Courtney': 'Pizza Hut'})
print(fast_food)

In [None]:
the_world = {
    "Africa": {
        "South Africa": [
            "Cape Town",
            "Johannesburg",
            "Durban"
        ],
        "Zimbabwe": [
            "Harare",
            "Bulawayo"
        ]
    },
    "Europe": {
        "Germany": [
            "Munich",
            "Berlin",
            "Stuttgart"
        ]
    },
    "Antarctica": []
}

In [None]:
the_world["Africa"]
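
Values in a nested structure are reached by chaining keys (and list indexes) one level at a time. The sketch below rebuilds the same dictionary in a more compact layout so that it runs on its own; `first_city` and `city_counts` are illustrative names.

```python
the_world = {
    "Africa": {
        "South Africa": ["Cape Town", "Johannesburg", "Durban"],
        "Zimbabwe": ["Harare", "Bulawayo"],
    },
    "Europe": {"Germany": ["Munich", "Berlin", "Stuttgart"]},
    "Antarctica": [],
}

# Chain the keys and a list index to drill down
first_city = the_world["Africa"]["South Africa"][0]
print(first_city)

# Count the cities listed for each country in Africa
city_counts = {}
for country, cities in the_world["Africa"].items():
    city_counts[country] = len(cities)
print(city_counts)
```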

### Strings

In [None]:
big_bang = "   The whole universe was in a hot dense state\r\n"
print(big_bang)
big_bang

In [None]:
#strip() removes leading and trailing whitespace
big_bang = big_bang.strip()
big_bang

In [None]:
#in tells us whether or not the substring occurs in the string
"universe" in big_bang

In [None]:
#find() gives us the character index where the match begins (10 here)
big_bang.find("universe")

In [None]:
#len() gives us the number of characters in the variable 
len(big_bang)

In [None]:
#Make it upper case with upper()
big_bang.upper()

In [None]:
#Or lower case with lower()
big_bang.lower()

In [None]:
big_bang.replace("universe", "office")

In [None]:
[word.strip() for word in big_bang.split("in")]

In [None]:
thest = "This is a Test"

In [None]:
[word for word in thest.split() if word[0].isupper()]

In [None]:
big_bang.startswith("The")

In [None]:
"23".isdecimal()

In [None]:
int("23")

In [None]:
pedestrian_crossing = "Fußgängerübergänge"

In [None]:
pedestrian_crossing.encode("ASCII", "ignore")
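
The `"ignore"` error handler silently drops every character that the target encoding cannot represent, so information is lost. The sketch below contrasts that with UTF-8, which can encode the whole string; the variable names are illustrative.

```python
pedestrian_crossing = "Fußgängerübergänge"

# "ignore" drops the characters that ASCII cannot represent
ascii_bytes = pedestrian_crossing.encode("ascii", "ignore")
print(ascii_bytes)

# UTF-8 can represent every character, so the round trip is lossless
utf8_bytes = pedestrian_crossing.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == pedestrian_crossing)
```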

In [None]:
"The 'end\n"

### Files

In [None]:
with open("test_file.txt", "w") as file:
    file.write("Hello World\n")
    file.write("This is me")

In [None]:
with open("test_file.txt", "r") as file:
    for line in file:
        print(line.strip())
    

#### JSON files

In [None]:
import json
with open("places.json") as file:
    places = json.load(file)

In [None]:
places

In [None]:
places['South America'] = {'Brazil': [], 'Chile': []}

In [None]:
with open('places_updated.json', 'w') as outfile:
    json.dump(places, outfile)
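
The cells above depend on a `places.json` file being present. As a self-contained sketch of the same load, modify and dump cycle, the example below uses made-up data and writes to a temporary directory so that no real files are touched. The `indent` argument is optional and only makes the output easier to read.

```python
import json
import os
import tempfile

# Made-up data standing in for the contents of places.json
places = {"Africa": {"South Africa": ["Cape Town", "Durban"]}}
places["South America"] = {"Brazil": [], "Chile": []}

# Write to a temporary directory so no real files are overwritten
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "places_updated.json")
    with open(path, "w") as outfile:
        json.dump(places, outfile, indent=2)
    with open(path) as infile:
        reloaded = json.load(infile)

print(reloaded == places)
```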

### Functions

In [None]:
def add_three(x):
    return x + 3

In [None]:
add_three(5)

In [None]:
def add(x, y=3):
    return x + y

In [None]:
add(5, 4)

In [None]:
add(y=2, x=5)
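
The three ways of calling `add` can be compared side by side. This sketch repeats the definition so that it runs on its own; the result variable names are illustrative.

```python
def add(x, y=3):
    return x + y


default_result = add(5)         # y falls back to its default of 3
positional_result = add(5, 4)   # y is given by position
keyword_result = add(y=2, x=5)  # keyword arguments can come in any order

print(default_result, positional_result, keyword_result)
```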

### Classes

In [None]:
class TestClass:
    pass

In [None]:
test = TestClass()

In [None]:
test.a = 2
test.a

In [None]:
test.b = lambda x: 3 + x
test.b(4)

In [None]:
class Animal():
    def __init__(self, name=None):
        self.name = name
        if name is None:
            self.name = "Nobody"
        
    def speak(self):
        print("Hello! My name is", self.name)

In [None]:
marlan = Animal("Marlan")
marlan.speak()

In [None]:
class Duck(Animal):
    def speak(self):
        print("Quack! My name is", self.name)

In [None]:
daffy = Duck()
daffy.speak()

In [None]:
daffy.name
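
`Duck` does not define `__init__`, so it inherits the one from `Animal`, which is why `daffy.name` falls back to the default. The sketch below is a variation on the classes above rather than a copy: `speak` returns its text instead of printing it, the call to `super()` is an addition, and `donald` is an illustrative instance.

```python
class Animal:
    def __init__(self, name=None):
        # Fall back to a placeholder when no name is supplied
        self.name = name if name is not None else "Nobody"

    def speak(self):
        return f"Hello! My name is {self.name}"


class Duck(Animal):
    # __init__ is inherited unchanged; only speak() is overridden
    def speak(self):
        return "Quack! " + super().speak()


daffy = Duck()
donald = Duck("Donald")
print(daffy.speak())
print(donald.speak())
print(isinstance(donald, Animal))
```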

### Modules and Packages

In [None]:
import os
os.getcwd()

In [None]:
import numpy as np
np.cos(np.pi)

In [None]:
from os import listdir
listdir()

### Virtual Environments

- There are an awful lot of Python versions, libraries, packages and modules out there
- Sometimes they conflict with each other
- A virtual environment lets you set up a custom garden filled only with exactly the versions you need
- [virtualenvwrapper](https://pypi.python.org/pypi/virtualenvwrapper) makes it a bit easier to manage
- Anaconda also provides virtual environments

```bash
which python
which python3
python3
```

```python
import openpyxl
quit()
```

```bash
mkvirtualenv --python=/usr/bin/python3 training
workon training
pip install openpyxl
which python
python
```

```python
import openpyxl
quit()
```

```bash
deactivate
```
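
From inside Python you can check whether the interpreter is running in a virtual environment. The sketch below assumes the environment was created with `venv` or a recent version of `virtualenv`; older `virtualenv` releases signalled this differently, so treat the result as a hint rather than a guarantee.

```python
import sys

# Inside a virtual environment sys.prefix points at the environment,
# while sys.base_prefix still points at the Python it was created from
in_virtualenv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv)
```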

## Jupyter Notebook

- Jupyter Notebooks give you an interactive environment like a Python shell with some nice IDE-like features
- In a browser!!!
- Most importantly, you can save the notebooks and intersperse commentary in markdown with the actual code, making it easier to do reproducible research
- You can set up a Jupyter server on your machine
- Or use the Eighty20 one and make it easy to share your notebooks
- `jupyter:8000`

- There are also some cool things called magics

In [None]:
%%time
time.sleep(5)
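
Magics only work inside Jupyter or IPython. In a plain Python script a similar measurement can be taken by hand, as in this minimal sketch using `time.perf_counter` (the shorter sleep is just to keep the example quick).

```python
import time

start = time.perf_counter()
time.sleep(0.2)
elapsed = time.perf_counter() - start

print(f"Took {elapsed:.2f} seconds")
```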

## Pandas

In [None]:
import pandas as pd
import numpy as np

### Series

In [None]:
s = pd.Series([1,3,5,np.nan,6,8])
s

### DataFrames

In [None]:
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
print(df)

### Reading from Files

In [None]:
people = pd.read_csv("people.csv")
people

In [None]:
fastfood = pd.read_csv("fastfood.csv")
fastfood

In [None]:
people_excel = pd.read_excel("training_data.xlsx", sheet_name="People")

### Joining tables

In [None]:
#Left join people to fastfood on the restaurant name, then drop the duplicated column
people.merge(fastfood, how="left",
             left_on="FastFood", 
             right_on="Restaurant").drop("Restaurant", axis=1)

### Indexing

In [None]:
people["Email"] = people["Name"].str.lower() \
    + "." + people["Surname"].str.lower() \
    + "@eighty20.co.za"
    
people

In [None]:
people.set_index("Email")

In [None]:
people.set_index(["FastFood"]).loc["Nandos"]

In [None]:
people.where(people.FastFood == "Nandos").dropna()

In [None]:
people.loc[people.FastFood == "Nandos"].dropna()

In [None]:
indexed_people = people.set_index(["Gender", "FastFood"])

In [None]:
indexed_people.loc["M", "Nandos"]

In [None]:
# we get this error because the dataframe is not sorted
indexed_people

In [None]:
indexed_people.sort_index(level=[0, 1], inplace=True) 

In [None]:
indexed_people

In [None]:
indexed_people.loc["M", "Nandos"]

### Selection

In [None]:
people["Name"]

In [None]:
people.Surname

In [None]:
people.loc[2:5, ["Name", "Surname"]]

In [None]:
people[people.FastFood == "Nandos"]

In [None]:
indexed_people.loc["M"]

### Assigning

In [None]:
people[people["HighScore"] > 80]['HighScore'] = 0

When we run ```people[people["HighScore"] > 80]```, pandas creates an entirely new copy of the data. So when we try to assign to the ```HighScore``` column, we are modifying this new copy and not the original. Thus the name SettingWithCopyWarning makes sense: pandas is warning you that you are setting (making an assignment) on a copy of a DataFrame. Use ```loc``` to make the selection and the assignment in a single step.


https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-part-4-c4216f84d388

In [None]:
people.loc[people["HighScore"]> 80, 'HighScore'] = 0
people
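
The difference between assigning to a copy and assigning through `loc` can be checked on a small frame. The data and names below (`scores`, `high`) are made up for illustration, and the explicit `.copy()` is there only to make the copy obvious and avoid the warning.

```python
import pandas as pd

# Made-up data purely for illustration
scores = pd.DataFrame({"Name": ["Ann", "Ben", "Cy"],
                       "HighScore": [95, 60, 85]})

# Boolean indexing returns a new object; changing it leaves `scores` untouched
high = scores[scores["HighScore"] > 80].copy()
high["HighScore"] = 0
before = scores["HighScore"].tolist()

# loc selects the rows and the column in one step and assigns into `scores` itself
scores.loc[scores["HighScore"] > 80, "HighScore"] = 0
after = scores["HighScore"].tolist()

print(before)
print(after)
```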

### Aggregation

In [None]:
fastfood_groups = people.groupby("FastFood")
fastfood_groups

In [None]:
fastfood_groups["HighScore"].sum()

In [None]:
people.groupby(["FastFood", "Gender"])["HighScore"].sum()

In [None]:
pd.crosstab(people.Gender, people.FastFood, margins=True)

### Interaction with Databases

### An example

## SQLAlchemy

### Setting up a connection

In [None]:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, scoped_session
from contextlib import contextmanager
engine = create_engine("sqlite:///training.db")
Base = declarative_base()
Session = sessionmaker(bind=engine)
session = Session()

### Object Relational Mapping (ORM)

In [None]:
from sqlalchemy import Column, Integer, String, Float

class People(Base):
    __tablename__ = "people"
    
    id = Column(Integer, primary_key=True)
    name = Column(String)
    surname = Column(String)
    fastfood = Column(String)
    gender = Column(String)
    highscore = Column(Float)
    
class FastFood(Base):
    __tablename__ = "fastfood"
    
    id = Column(Integer, primary_key=True)
    restaurant = Column(String)
    foodtype = Column(String)


In [None]:
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

In [None]:
cara = People(name="Cara", surname="Pienaar")
laurence = People(name="Laurence", surname="Sonnenberg", fastfood="KFC")
session.add(cara)
session.add(laurence)

### Querying

In [None]:
people = session.query(People).all()
for person in people:
    print(person.name, person.fastfood)

In [None]:
from sqlalchemy import text
result = session.execute(text("SELECT * FROM people")) #text() marks the string as raw SQL (required by SQLAlchemy 2.0)
result.fetchall()

In [None]:
people = session.query(People).filter_by(fastfood="KFC").first()
people.name

In [None]:
people = session.query(People).filter_by(fastfood="KFC").all()
for person in people:
    print(person.name)

In [None]:
# commit and close the session to save
session.commit()
session.close()

#### Working within a scoped_session

When a scoped_session is wrapped in a context manager such as `session_scope` below and used in a `with` block, you don't need to worry about commits, rollbacks and closing the session; the context manager handles them for you.

In [None]:
Session = scoped_session(sessionmaker(bind=engine))

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    session = Session()
    try:
        yield session
        session.commit()
    except Exception as error:
        print(error)
        session.rollback()
        raise
    finally:
        Session.remove()

In [None]:
with session_scope() as session:
    cara = People(name="Kara", surname="Pienaar")
    session.add(cara)

In [None]:
with session_scope() as session:
    people = session.query(People).all()
    for person in people:
        print(person.name)

## Putting it all together

## Useful Packages

- csv
- os
- savReaderWriter
- openpyxl

## Other Cool Stuff

### Numpy
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

### Matplotlib
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

### Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

### Geopandas
GeoPandas is an open source project to make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types.

### scikit-learn
Machine Learning in Python.

### Tornado
Tornado is a Python web framework and asynchronous networking library.

### Flask
Flask is a micro web framework written in Python. 

### xlwings
xlwings is a Python library that makes it easy to call Python from Excel and vice versa.

### Scrapy
Scrapy is a web-crawling framework written in Python.

## JupyterLab

JupyterLab is an interactive development environment for working with notebooks, code and data. JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

Some cool things you can do in JupyterLab:
* Drag-and-drop to reorder notebook cells and copy them between notebooks.
* Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).
* Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.
* Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, VegaLite, and more.


## JupyterHub
### Why use JupyterHub?
You can install Jupyter locally and work on your own laptop. There are two main problems with this:
1. Your work is not automatically backed up - you will have to remember to commit to a repo or copy your notebooks to the T-drive.
2. Your laptop might not be powerful enough for the operations you need to perform.

These problems can both be solved by working on Eighty20's JupyterHub instead.

JupyterHub is accessed at https://192.168.1.93.

### Creating a symlink
To access files on the T-drive (e.g. data files) more easily, you can create a symlink in your home directory that points to a folder on it.

In a terminal, execute:

    ln -s path_to_directory_to_link_to path_to_local

For example, to create a link to the whole of the T-drive, ```cd``` into your home directory:

    cd ~/

And then create the symlink:

    ln -s /t-drive/ t-drive
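
The same link can be created and checked from Python with `os.symlink`. The sketch below works in a temporary directory so that nothing real is touched; the folder names are stand-ins for the real paths, and it assumes a Linux or macOS system such as the JupyterHub server.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as home:
    target = os.path.join(home, "shared_data")  # stands in for /t-drive/
    link = os.path.join(home, "t-drive")        # stands in for ~/t-drive
    os.mkdir(target)

    # Same argument order as `ln -s`: the existing target first, the new link second
    os.symlink(target, link)

    is_link = os.path.islink(link)
    points_to = os.readlink(link)
    print(is_link, points_to == target)
```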

### Creating your environment

You can now create your virtual environment and pip install any necessary libraries.

```
python3 -m virtualenv venv --python=python3
```
Activate the venv with
```
source venv/bin/activate
```
You will also need to install the IPython kernel to use the environment within Jupyter notebooks:
```
# install the IPython kernel
pip install ipykernel
ipython kernel install --user --name=projectname
```