## Jupyter & Python
### Based on the [DSA 2017](http://www.datascienceafrica.org/dsa2017/) lecture by [Ernest Mwebaze](http://air.ug/~emwebaze/)


Jupyter Notebook allows you to write, document and run your code in one place. For data science work it is a very convenient tool for quickly prototyping scripts and algorithms.

It also offers many plug-ins and export options: for example, you can create presentation slides on the fly, or export the notebook to a Python script, HTML, PDF, LaTeX, etc. It's a great tool.

In this lesson we will cover:
* Jupyter Notebook
* Basic Python: Basic data types and data structures (Lists, Dictionaries, Tuples, Sets), Functions, Classes
* Numpy: Arrays, Array indexing, Array functions


For a brief introduction to Markdown, see [this cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

For a list of Python tutorials, see [this guide](https://wiki.python.org/moin/BeginnersGuide/Programmers).

As you proceed in your coding journey, you should pay close attention to coding style. The [Google Python style guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md) is a good place to start.


## Jupyter Notebook

Jupyter Notebook is cell-based. A cell can contain Markdown like this one. 

In Python, the hello world program is a single line. Executing the cell below will print the sentence "Hello World!".

In [None]:
print("Hello World!")

## Python

Ensure you are using version 3 of Python. The cell below prints the installed version and lists the files in the current directory.

In [None]:
!python --version
!ls

### Basic data types

In [None]:
# The usual data types are available in Python: numbers, strings, booleans
print(23 + 45 - 5)
print(4 * 5)
print(2 ** 3)  # Exponentiation
print(1 // 2)  # Floor (integer) division

In [None]:
string_one = "mimi ni mkenya"
string_two = 'na mimi ni mwafrika'
print(string_one)
print(string_one.upper())
print(string_one.lower())
print(string_one.capitalize())
print(string_one + string_two)
print(string_one + ' ' + string_two)

In [None]:
type(string_one)

In [None]:
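# Boolean data type
print(23 > 4)
print(1 == 1)
print(2 != 2.0)  # False: the int 2 and the float 2.0 compare equal
print(3 == 5 and 4 < 1)
print(not True)
import numpy as np

# Casting booleans to integers gives True -> 1 and False -> 0
bool_array = np.array([True, False])
bool_array.astype(int)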

### Exercise 1: Data types

Qn1. Calculate $45^3$

In [None]:
#Answer:
print(45 ** 3)

### Basic data structures

Python includes several built-in data structures: lists, dictionaries, sets, and tuples.

**Lists**

A list is an ordered collection of objects. These may be of different types: integers, strings, other lists, etc.

In [None]:
list_of_stuff = ['Alex', 12, 'Peter', [6, 85], 50000]
names = ['Oduor', 'Kinaro', 'Oduor', 'Jane', 'Doe']

In [None]:
print(names.count('Oduor'))
type(list_of_stuff[3])

In [None]:
list_of_stuff

Slicing in lists. Slicing syntax: *names[start:stop:stride]*, where *stop* is exclusive.
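
The slicing syntax can be sketched on a small list first. The list `letters` below is a hypothetical example and is not part of the lesson's data.

```python
# Hypothetical example list, used only to illustrate start:stop:stride
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first_three = letters[0:3]   # indices 0, 1, 2 (the stop index is excluded)
every_other = letters[::2]   # every second item, starting from index 0
middle = letters[1:5:2]      # indices 1 and 3

print(first_three)
print(every_other)
print(middle)
```

Omitting *start* or *stop* means "from the beginning" or "to the end" respectively.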

In [None]:
print(names[0])  # Python is zero-indexed
print(names[::-1])  # reversed copy of the list
print(names[-1])  # get last item of list
print(names[-2])  # get second-last item of list
names.append('Maina')
print(names)
"Maina" in names

You can see all attributes and methods available on an object by using the built-in *dir* function.

In [None]:
dir(names)

In [None]:
names.index('Oduor')

**Dictionaries**

A dictionary is a key-value mapping data structure or container. 

In [None]:
student_ages = {'Alex': 24, 'Jane': 19, 'Doe': 27}

In [None]:
student_ages

In [None]:
student_ages.keys()

In [None]:
student_ages['Doe']

In [None]:
student_ages.items()
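
Dictionaries can also be updated and looped over. The sketch below uses a hypothetical dictionary `ages` with the same values as the one above, so the lesson's own `student_ages` is left untouched.

```python
# Hypothetical dictionary, mirroring the values used in the lesson
ages = {'Alex': 24, 'Jane': 19, 'Doe': 27}

ages['Peter'] = 31   # add a new key-value pair
ages['Jane'] = 20    # update the value of an existing key

# .get() returns a default value instead of raising KeyError for a missing key
missing = ages.get('Mary', 'unknown')
print(missing)

# Loop over the key-value pairs
for name, age in ages.items():
    print(name, age)
```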

**Tuples**

Tuples are immutable data structures or containers. They cannot be amended once created.

In [None]:
names = ('John', 'Doe', 'Jane')

In [None]:
names[2] = 'x'  # Raises TypeError: tuples do not support item assignment
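
Assigning to a tuple item raises an error. As a minimal sketch, the error can be caught with `try`/`except` (not covered in this lesson) to show that the tuple is unchanged and can still be read and unpacked. The name `people` is a hypothetical stand-in for the tuple above.

```python
# Hypothetical tuple with the same values as the one above
people = ('John', 'Doe', 'Jane')

try:
    people[2] = 'x'   # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# Reading and unpacking still work
first, second, third = people
print(first, third)
```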

**Sets**

A set is an unordered collection of unique items. Converting a list to a set removes its duplicates.

In [None]:
a = ['a', 'b', 'b', 'a', 'd', 'c', 'c']
set(a)
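
Sets also support the usual set operations. The sets `group_a` and `group_b` below are hypothetical examples.

```python
# Hypothetical example sets
group_a = {'a', 'b', 'c'}
group_b = {'b', 'c', 'd'}

union = group_a | group_b          # items in either set
intersection = group_a & group_b   # items in both sets
difference = group_a - group_b     # items in group_a but not in group_b

print(union)
print(intersection)
print(difference)
print('a' in group_a)              # membership test
```

Because sets are unordered, the printed order of the items may vary.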

## Control flow

for loop

In [None]:
names

In [None]:
for name in names:
    print(name)
    
# ugly alternative

for i in range(len(names)):
    print(names[i])

In [None]:
for index in range(len(names)):
    print(index, names[index])

In [None]:
for index, name in enumerate(names): # Inbuilt enumerate function to do the same thing
    print(index, name)

if/elif/else statements

In [None]:
age = 30

if age > 20:
    print('Adult identified')
elif age < 15:
    print('Child in sight')
else:
    print('Youth identified')

More loops

In [None]:
student_names = ['Peter', 'Alex', 'Jane', 'Doe']
for name in student_names:
    if name.startswith('A'):
        print(name)

While loop

In [None]:
feedback = "This is going extremely well - easy stuff"
index = 0
while index < len(feedback):
    print(feedback[index])
    index += 1
    

List comprehensions

In [None]:
names_lowercase = [name.lower() for name in student_names if name.startswith('A')]

In [None]:
[i**2 for i in range(10)]

In [None]:
names_lowercase
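
A list comprehension is a compact way of writing a loop that builds a list. As a sketch, the two versions below produce the same result. The names `names_loop` and `names_comp` are introduced here for illustration only.

```python
student_names = ['Peter', 'Alex', 'Jane', 'Doe']

# Loop version: build the list step by step
names_loop = []
for name in student_names:
    if name.startswith('A'):
        names_loop.append(name.lower())

# Comprehension version: same filter and transformation in one expression
names_comp = [name.lower() for name in student_names if name.startswith('A')]

print(names_loop == names_comp)
```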

## Functions

A function is defined with the keyword **def**. Instead of brackets, Python uses indentation to mark the body (scope) of the function. The key parts of a function definition are:

* the **def** keyword;
* the function’s name;
* the arguments of the function, given between parentheses and followed by a colon;
* the indented function body;
* an optional **return** statement for returning values.
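
The parts listed above can be seen together in one small function. The function `rectangle_area` is a hypothetical example, not part of the original lecture.

```python
def rectangle_area(width, height):      # def keyword, name, arguments, colon
    """Return the area of a rectangle."""
    area = width * height               # indented function body
    return area                         # hand the result back to the caller


print(rectangle_area(3, 4))
```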

In [None]:
def hello_world():
    print("Hello, World!")

In [None]:
# Call function
hello_world()

In [None]:
# Function to do stuff - add any two numbers
# Use key word "return" to *return* output from function
def add_numbers(a, b):
    answer = a + b
    return answer

In [None]:
print(add_numbers(2, 8))

# Try with string inputs
print(add_numbers('abc', 'def'))

In [None]:
# Q2: Write a function that returns the nth number in the Fibonacci sequence

### Modules and packages

A module is a sequence of instructions written in one file. It can also contain multiple functions. If you have code that you want to reuse, or that you are using in several places in your code, then it's best to write it out as a module.

Python comes with inbuilt modules that you can use in your own code without reimplementing the functionality.

A collection of many modules together forms a package. One strength of Python is that it has a huge number of packages that can let you do all sorts of things.
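
As a brief sketch, there are a few common ways to import from a module. The example uses the built-in `math` and `statistics` modules; the variable names are chosen here for illustration.

```python
import math                  # import the whole module
from math import sqrt        # import a single name from a module
import statistics as stats   # import a module under a shorter alias

circle_area = math.pi * 2 ** 2       # area of a circle with radius 2
root = sqrt(16)
average = stats.mean([2, 3, 4, 5])

print(circle_area, root, average)
```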

In [None]:
import sys
import os
import numpy as np

In [None]:
sys.version

In [None]:
os.path

In [None]:
np.mean(np.array([2, 3, 4, 5]))

In [None]:
# Find out what a function does
?np.mean

Some key packages for data science include scikit-learn for machine learning, pandas for data analysis, Matplotlib for plotting and NumPy for numerical functions.

## Numpy

NumPy is an extension package for Python that enables scientific computation. It supports array-oriented computing, which makes computations very efficient. It's a must-use package for data analytics with Python.
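
To illustrate what array-oriented computing means, the sketch below squares some numbers first with a plain Python loop and then with a single NumPy expression. The data in `values` is a hypothetical example. On large arrays the NumPy version is typically much faster, although no timing is shown here.

```python
import numpy as np

values = list(range(5))   # hypothetical sample data: [0, 1, 2, 3, 4]

# Plain Python: handle the elements one at a time
squares_loop = [v ** 2 for v in values]

# NumPy: apply the operation to the whole array at once
arr = np.array(values)
squares_np = arr ** 2

print(squares_loop)
print(squares_np)
```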

In [None]:
import numpy as np
?np.sum

In [None]:
alist = [1, 2, 3, 4]  # plain Python list
test_array = np.array(alist)  # 1D NumPy array
type(test_array)

Multidimensional arrays

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x = np.array([1, 1, 1])
print(np.dot(A, x))  # matrix-vector product
print(np.sum(A, axis=0))  # sum down each column
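
The second argument of `np.sum` is the axis along which to sum. As a short sketch on the same matrix, `axis=0` sums down the rows (one value per column) and `axis=1` sums across the columns (one value per row). The names `col_sums` and `row_sums` are introduced here for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_sums = np.sum(A, axis=0)   # one sum per column
row_sums = np.sum(A, axis=1)   # one sum per row
total = np.sum(A)              # no axis: sum of all elements

print(col_sums)
print(row_sums)
print(total)
```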


Some NumPy array attributes

In [None]:
print(A.shape)
print(A.ndim)

Creating arrays

In [None]:
c = np.arange(10)
d = np.linspace(0,10,15) # start, end, num points
e = np.ones(4)
f = np.zeros((4,5))

In [None]:
type(c)

In [None]:
print(len(d))
print(e)
print(f)

In [None]:
np.random.rand(5) 

### Indexing and slicing

In [None]:
a = np.arange(10)
print(a)

In [None]:
# Slicing is [start:stop:step]
print(a[0])
print(a[-1])
print(a[2:])
print(a[:8])
print(a[3:7])
print(a[::3])
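
The same syntax extends to arrays with more than one dimension, with one index or slice per axis separated by commas. The array `grid` below is a hypothetical example.

```python
import numpy as np

# Hypothetical 3x4 array holding the numbers 0 to 11
grid = np.arange(12).reshape(3, 4)
print(grid)

element = grid[1, 2]       # row 1, column 2
first_row = grid[0]        # the whole first row
second_col = grid[:, 1]    # the whole second column
block = grid[:2, ::2]      # first two rows, every other column

print(element)
print(first_row)
print(second_col)
print(block)
```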

In [None]:
children_data = np.random.randint(low=1, high=10, size=(4,2))
children_sex = np.array(['M','F','F','M'])

In [None]:
children_data

In [None]:
children_sex == 'M'

In [None]:
#Get male children
children_data[children_sex=='M',:]

If the columns of the data represent current age and weight at birth, we can compute some summaries.

In [None]:
# Sum of ages and weights
np.sum(children_data, axis=0)

In [None]:
# Get averages
np.sum(children_data, axis=0)/children_data.shape[0]

In [None]:
# Alternatively
np.mean(children_data, axis=0)

More information on Numpy arrays can be [found here](http://www.scipy-lectures.org/intro/numpy/index.html).

## Plotting

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
x=np.linspace(0,1,100)
plt.plot(x,np.sin(2*np.pi*x))

Q3. Plot $f(x)=\cos(2\pi x)$ on the same plot as above, using a different colour, and include a legend, title, x-axis label and y-axis label.

In [None]:
?plt.plot
