# Introduction to Python, the Intuitive Way
#### 29 September, 2018
#### Author: Jeanne Elizabeth Daniel

Please run me in a Colab environment!

In [1]:
print("Hello World!")

Hello World!


### What is Python?
Python is an open-source, modern, robust, high-level programming language. Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.

#### What is open-source? 
Open-source code is source code that is made freely available and may be redistributed and modified.

#### Why Modern?
Python was first released in 1991, making it one of the younger languages (and a millennial).

#### Why so robust? 
In Computer Science, robustness is the ability of a computer system to cope with errors during execution and with erroneous input. Python features a dynamic type system (types are checked while the program runs, rather than declared up front) and automatic memory management, making it ideal for quick prototyping as well as constructing large, complicated systems. 
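
To make the idea of a dynamic type system a little more concrete, here is a minimal sketch. The variable names are made up for illustration; the point is only that a name can hold values of different types, and that type errors show up when the code runs.

```python
# A name is not tied to one type: it can be rebound to a value of another type.
value = 42
first_type = type(value).__name__    # 'int'

value = "forty-two"
second_type = type(value).__name__   # 'str'

print(first_type, second_type)

# Types are still checked, just at run time:
# mixing incompatible types raises a TypeError when the line executes.
try:
    result = "3" + 4
except TypeError as error:
    result = None
    print("TypeError:", error)
```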

#### What does high-level language even mean? 
A high-level language (HLL) is a programming language that enables a programmer to write programs that are more or less independent of a particular type of computer. Such languages are considered high-level because they are closer to human languages and further from machine languages.

More general programming lingo can be found at https://hackernoon.com/i-finally-understand-static-vs-dynamic-typing-and-you-will-too-ad0c2bd0acc7

It is very easy to pick up Python even if you are completely new to programming. (I'll prove it to you!)

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

### Essential Libraries 
*See Notebooks 0.1, 0.2, 0.3 for optional tutorials on these three libraries, which you can work through whenever you have time.*

Numpy, Pandas and Matplotlib.pyplot are arguably the most useful tools available to any scientist, and just with these three libraries you can do incredible things. We import the libraries we'll be making use of at the top of our script, always. We don't import unnecessary libraries as they all take up memory. 

#### Numpy
NumPy is the fundamental package for scientific computing with Python. More information can be found at http://www.numpy.org
NumPy provides:
  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation (And Statistical Tools)
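
As a rough illustration of those three points, here is a small sketch. The numbers are arbitrary, and the seed is only there so that the random values are repeatable.

```python
import numpy as np

# 1. An array of homogeneous items (every element shares one dtype)
a = np.array([1.0, 2.0, 3.0])

# 2. Fast mathematical operations over the whole array at once
doubled = a * 2      # element-wise: [2., 4., 6.]
total = a.sum()      # 6.0

# 3. Linear algebra and random number generation
identity = np.eye(2)                          # 2x2 identity matrix
product = identity @ np.array([5.0, 7.0])     # matrix-vector product
rng = np.random.default_rng(seed=0)           # seeded random generator
sample = rng.random(3)                        # three floats in [0, 1)

print(doubled, total, product, sample)
```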
  
#### Pandas
Pandas is a powerful and flexible open-source data analysis and manipulation tool. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. More can be found at https://pandas.pydata.org

Here are just a few of the things that pandas does well:

  - Easy handling of missing data in floating point as well as non-floating point data
  - Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
  - Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let `Series`, `DataFrame`, etc. automatically align the data for you in computations
  - Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
  - Make it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
  - Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
  - Intuitive merging and joining data sets
  - Flexible reshaping and pivoting of data sets
  - Hierarchical labeling of axes (possible to have multiple labels per tick)
  - Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving/loading data from the ultrafast HDF5 format
  - Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
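
Two of the items above (missing data and group by) can be sketched in a few lines. The table below is made up purely for illustration; the column names `city` and `temperature` are assumptions, not part of any real data set.

```python
import numpy as np
import pandas as pd

# A tiny, made-up table with one missing value (NaN)
df = pd.DataFrame({
    "city": ["Cape Town", "Cape Town", "Durban", "Durban"],
    "temperature": [21.0, np.nan, 27.0, 29.0],
})

# Missing data: count the gaps, then fill them with a chosen value
missing_count = int(df["temperature"].isna().sum())
filled = df["temperature"].fillna(0.0)

# Split-apply-combine: mean temperature per city (NaN is skipped by default)
mean_per_city = df.groupby("city")["temperature"].mean()
print(mean_per_city)
```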

#### Matplotlib.pyplot
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code. More information and tutorials can be found at https://matplotlib.org

In [8]:
print("Let's get down to business")
if True:
    print("Notebook agrees")
else:
    print("Please free me")

Let's get down to business
Notebook agrees


## Back to Basics
We will be covering the following topics: datatypes, arrays, string functions, essential data structures in Python, boolean statements, and coding conventions.

There are many more components to programming, but for getting started, you'll only need to have a basic understanding of these. 

### Datatypes
There are a few primitive datatypes essential for programming: 
    - booleans (True, False)
    - integers (-1, 5, 30, -455, etc)
    - floats   (0.555555, 3.14, 6.89, etc)
    - chars    ('a', '5', '#', '!', etc)
    
These primitive types are so simple but so powerful. The whole world wide web, every database, every programming function makes use of and relies on these primitive datatypes. 

#### What are Booleans?
A boolean is a datatype that can only be one of two values, True or False. Booleans are used to make decisions in code, and they form the basis of `if` and `while` statements. 

With booleans come Boolean Algebra (don't panic, this is the quickest Algebra you will ever master). 
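The two boolean algebra operations we will use most are:
    - AND
    - OR
(A third one, NOT, simply flips True to False and vice versa.) In many other languages, AND is written as && and OR is written as ||. Python uses the plain words `and` and `or`.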

In [25]:
print("And => both must") 
print("True and True    =", True and True)
print("True and False   =", True and False)
print("False and True   =", False and True)
print("False and False  =", False and False)

And => both must
True and True    = True
True and False   = False
False and True   = False
False and False  = False


In [27]:
print("Or => either can") 
print("True or True     =", True or True)
print("True or False    =", True or False)
print("False or True    =", False or True)
print("False or False   =", False or False)

Or => either can
True or True     = True
True or False    = True
False or True    = True
False or False   = False


Congratulations! You just mastered boolean algebra operations!
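
For completeness, here is a small sketch of the `not` operator, and of how a comparison produces a boolean that can steer an `if` statement. The names `age`, `has_ticket` and `can_enter` are hypothetical, chosen only for this example.

```python
print("not True   =", not True)
print("not False  =", not False)

# Comparisons produce booleans, which can be combined with and / or / not
age = 20                  # hypothetical value
has_ticket = False        # hypothetical value

is_adult = age >= 18                  # True
can_enter = is_adult and has_ticket   # False, because both must be True

if not can_enter:
    print("Sorry, you need a ticket")
```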

#### What are integers?
Integers are just plain whole numbers, and can be considered countably infinite (why?). We use integers to index into lists and arrays. Computers do powerful and quick math, and they love numbers. Try doing 133*57 in your head!

In [20]:
133*57

7581

Don't worry, computers still can't think for themselves though. 

So what are the operations we can do with integers? Well, all the usual mathy stuff:
    - add
    - subtract
    - multiply
    - divide
    - modulo (the remainder after division)
    - yeah, that's about it.
    
To this day, it still blows my mind that the whole world's computer systems run using these simple operations. They are like the mitochondrial DNA of all things computers.

NOTE: dividing integers by one another with `/` always produces a float in Python 3, even when the result is a whole number.

In [24]:
print("3+1=", 3+1)
print("3-1=", 3-1)
print("3*1=", 3*1)
print("3/1=", 3/1, "(see here the resulting float)")
print("3%1=", 3%1)

3+1= 4
3-1= 2
3*1= 3
3/1= 3.0 (see here the resulting float)
3%1= 0
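
Dividing by 1 hides what division and modulo really do, so here is a slightly more telling sketch with 7 and 2. It also shows floor division (`//`), which is not in the list above but is handy when you want an integer result.

```python
quotient_float = 7 / 2     # true division always gives a float: 3.5
quotient_floor = 7 // 2    # floor division keeps the whole part: 3
remainder = 7 % 2          # modulo gives what is left over: 1

print("7/2  =", quotient_float)
print("7//2 =", quotient_floor)
print("7%2  =", remainder)

# The pieces fit back together: 2 * 3 + 1 == 7
print(2 * quotient_floor + remainder == 7)
```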


#### What are Floats?
Floats cover the numbers that lie between integers. For example, between the integers 0 and 1 there are no other integers. Nada. 

Between 0.0 and 1.0, however, there is a continuous and infinite range of numbers that floats can represent or approximate:

0.1, 0.2, 0.3, 0.4... But also 

0.11, 0.12, 0.13, 0.14... And even further:

0.111, 0.112, 0.113, 0.114...


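Because of this property, the real numbers that floats stand for are uncountably infinite. (A computer can only store finitely many of them, so a float is often a very close approximation.) 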

In [34]:
print("1/3           =", 1/3)
print("pi            =", np.pi)
print("random number =", np.random.random())

1/3           = 0.3333333333333333
pi            = 3.141592653589793
random number = 0.6569397419844438


Note: just because the computer prints that many digits does not mean the digits after the decimal point end there. For example, the digits of pi go on forever; the computer only stores a finite approximation. 
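
One practical consequence of floats being finite approximations is that exact comparisons can surprise you. The sketch below shows the classic example, and uses `math.isclose` from the standard library as one reasonable way to compare floats.

```python
import math

total = 0.1 + 0.2

print(total)                       # 0.30000000000000004
print(total == 0.3)                # False
print(math.isclose(total, 0.3))    # True
```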

#### What are chars?
Chars are short for characters: a data type that holds a single character (letter, digit, symbol, etc.) of data. Chars are written between quotation marks. Some operators, like + and *, can be used on chars. (Python has no separate char type: a char is simply a string of length one.)

For example, 'a', '4', '#', etc, are all chars. 

Fun fact: strings are char arrays.

In [41]:
'a' + 'b'

'ab'

In [42]:
'a' - 'b'

TypeError: unsupported operand type(s) for -: 'str' and 'str'

In [43]:
'a'*5

'aaaaa'
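
Here is a quick sketch of the "strings are char arrays" idea: a single character has length 1, and we can pick individual characters out of a longer string by index. The variable names are only for illustration.

```python
letter = 'a'
print(type(letter).__name__)   # str
print(len(letter))             # 1

word = "hello"
first = word[0]     # indexes start at 0, so this is 'h'
last = word[-1]     # negative indexes count from the end: 'o'
print(first, last)
```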

### Arrays
A one-dimensional array is an Nx1 grid-like data store. Don't overthink it. Think about it like this: 
1. We have a bookshelf. 
2. All the books are indexed, starting from 0 and ending at N-1, where N is the number of books (an arbitrarily large integer). 
3. Each book on the shelf is a single, discrete entity, representing a value. It can be any value: 20000, or 2, or 200. 
4. If we want to access the value at index i, we need only find the book labelled i, and then we know the value stored at bookshelf[i].

This is the essence of arrays. An array is a very simple but powerful data structure that has indexes i = 0...N-1 and a value stored at each index i. 

So how do we code arrays? With Numpy of course!

#### One-dimensional arrays

In [46]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

#### Two-dimensional arrays (matrices)

In [47]:
np.array([[1, 2], [3, 4]])

array([[1, 2],
       [3, 4]])
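
To tie this back to the bookshelf picture, here is a small sketch of reading values by index. The name `bookshelf` and its values are just the ones from the analogy, not a real data set.

```python
import numpy as np

bookshelf = np.array([20000, 2, 200])

print(bookshelf[0])      # first book: 20000
print(bookshelf[2])      # last book: 200
print(len(bookshelf))    # 3 books, so the valid indexes are 0, 1 and 2

# In a two-dimensional array we give a row index and a column index
matrix = np.array([[1, 2], [3, 4]])
print(matrix[1, 0])      # row 1, column 0: 3
```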

A lot more content on NumPy and arrays can be found in Notebook 0.1.

### String Functions
Before we can define string functions, we should probably define strings. So, what are strings?

##### Strings are sequences/arrays of chars.

Pay close attention here, because we will be using this information to build our chatbot later!

#### Char

In [50]:
'a'

'a'

#### String: a sequence of chars

In [51]:
'a' + 'b' + 'c' + 'd'

'abcd'

Strings enable us to build powerful tools, because they can store so much information! 

Let's take a look at some of the most commonly used built-in string functions:

    - str.find()
    - str.count()
    - str.lstrip()
    - str.rstrip()
    - str.join()
    - str.split()
    - str.replace()
    - str.upper()
    - str.lower()
    - str.capitalize()
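
A few of these in action, as a quick sketch (join, split and replace get their own section further on). The example string is made up.

```python
text = "  hello world  "

left = text.lstrip()               # removes leading spaces:  "hello world  "
right = text.rstrip()              # removes trailing spaces: "  hello world"
clean = text.lstrip().rstrip()     # "hello world"

print(clean.count("o"))       # 2, one in "hello" and one in "world"
print(clean.find("world"))    # 6, the index where "world" starts
print(clean.upper())          # HELLO WORLD
print(clean.capitalize())     # Hello world
print("HeLLo".lower())        # hello
```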
    
Boolean functions are functions that return True or False, and they usually start with "is":
    - str.isalnum()
    - str.isalpha()
    - str.islower()
    - str.isnumeric()
    - str.isdigit()
    - str.isspace()
    - str.istitle()
    - str.isupper()

    
There are some other tricks we can do, like determining the length of a string using len(). To call any of the functions above, we replace the *str* with our chosen string, for example: 

In [3]:
print("'hello' is all lower case: ", "hello".islower())

'hello' is all lower case:  True


In [5]:
print("'12345' are all numeric: ", "12345".isnumeric())

'12345' are all numeric:  True
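
Since len() was mentioned but not shown, here is a minimal sketch. Unlike the functions above, len() is not called on the string with a dot; the string is passed to it instead.

```python
word = "hello"

print(len(word))                     # 5
print(len(""))                       # 0, the empty string has no characters
print(len("the quick brown fox"))    # 19, spaces count as characters too
```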


The difference between str.isalpha() and str.isalnum() is that the first returns True only if every character is a letter of the alphabet, whereas the second returns True if every character is either a letter or a number.

In [6]:
"12alphabet".isalpha()

False

In [8]:
"12alphabet".isnumeric()

False

In [7]:
"12alphabet".isalnum()

True

Let's take a look at some of the other built-in functions. How might we find out whether a string contains a certain substring?

In [10]:
big_string = "the quick brown fox jumped over the hedge"
sub_string = "fox"

Say we want to know if the big string *"the quick brown fox jumped over the hedge"* contains the substring *"fox"*. Because strings are just sequences of chars, each char also has an index, starting at 0. 

The function str.find() returns the starting index of the first occurrence of the substring if it is found in the big string; otherwise it returns -1. Observe:

In [11]:
big_string.find(sub_string)

16

Looks like the substring "fox" starts at index 16 (the 17th character, since we count from 0) in our big string!

In [12]:
big_string.find("random")

-1

Obviously the substring "random" does not occur in our big string.
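
A few more things worth trying with str.find(), sketched below on the same sentence: it reports only the first match, it accepts an optional starting index, and the index it returns can be used to slice the substring back out.

```python
big_string = "the quick brown fox jumped over the hedge"

first_the = big_string.find("the")       # 0, the very first match
next_the = big_string.find("the", 1)     # 32, searching from index 1 onwards
how_many = big_string.count("the")       # 2

# Use the index from find() to slice out the three characters of "fox"
start = big_string.find("fox")
found = big_string[start:start + 3]

print(first_the, next_the, how_many, found)
```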

#### Quick Look at Functions
If we are only interested in knowing whether the substring is contained in the big string, we can write a boolean function for that:

In [20]:
def contains(big_string, sub_string):
    if big_string.find(sub_string) > -1:
        return True
    else:
        return False

Okay, whoa. So what happened here? We just wrote our first function! 

Functions in Python have the following skeleton:
    - the start of a function is indicated with the word **def**
    - followed by the name of the function (what you will use to call it), in our case **contains**
    - followed by all the parameters you need to pass to it, enclosed in parentheses, in our case **(big_string, sub_string)**
    - this first line is finished off with a compulsory **:**, after which your indented code begins
    - functions can return any datatype you like, indicated by the word **return**
    - if you don't specify a return value, the function returns **None** (whose type is **NoneType**)
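
Here is the same skeleton once more on a different, made-up function, so you can see which parts change and which stay the same. The name `area` and its parameters are hypothetical.

```python
def area(width, height):
    # def + name + (parameters) + colon, then the indented body
    return width * height    # return hands the result back to the caller

# Calling the function: pass one value for each parameter
result = area(3, 4)
print(result)    # 12
```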

In [24]:
def none_type_demo():
    print("")

In [25]:
print(none_type_demo())


None


#### Back to strings!
See? All good!
Now we will test our contains function on some other big strings and substrings; feel free to play around with them! Note how the current **contains** function is case-sensitive. 


In [21]:
contains(big_string, sub_string)

True

In [22]:
contains(big_string, sub_string.upper())

False

<div class="alert alert-success" data-title="Case Insensitive Contains Function">
  <h1><i class="fa fa-tasks" aria-hidden="true"></i> Exercise: Case Insensitive Contains Function</h1>
</div>

Machines don't really care (or know) about upper or lowercase. For them "a" and "A" are as far apart as "a" and "#". How do we make the machine care?

Try to change the **contains** function to be CASE-INSENSITIVE, i.e. it should match regardless of the word being uppercase or lowercase. 

**Hint**: take a look at the boolean functions for strings, as well as the line just above.

In [None]:
def case_insensitive_contains(big_string, sub_string):
    # your code goes here
    pass  # placeholder so the cell runs; replace it with your solution

#### Join, Split, Replace String Functions