## Jupyter & Python
### Based on the [DSA 2017](http://www.datascienceafrica.org/dsa2017/) lecture by [Ernest Mwebaze](http://air.ug/~emwebaze/)


Jupyter Notebook allows you to write, document and run your code in one place. For data science work it presents a very convenient tool for quickly prototyping your scripts and algorithms.

It also comes with many plug-ins and export options: for example, you can create presentation slides on the fly, or export the notebook to a Python script, HTML, PDF, LaTeX and other formats. It's a great tool.

In this lesson we will cover:
* Jupyter Notebook
* Basic Python: Basic data types and data structures (Lists, Dictionaries, Tuples, Sets), Functions, Classes
* Numpy: Arrays, Array indexing, Array functions


For a brief introduction to Markdown see [this](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

For a list of Python Tutorials see [this](https://wiki.python.org/moin/BeginnersGuide/Programmers)

As you proceed in your coding journey, you should pay close attention to coding style. The [Google Python style guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md) is a good place to start.


## Jupyter Notebook

Jupyter Notebook is cell-based. A cell can contain Markdown like this one. 

In Python, the hello world program is a single line. Executing the cell below will print the sentence "Hello World!".

In [None]:
print("Hello World!")

## Python

Ensure you are using Python 3. The cell below prints the installed version.

In [None]:
!python --version

### Basic data types

In [1]:
# The usual data types are available in Python: numbers, strings, booleans
print(23 + 45 - 5)
print(4 * 5)
print(2 ** 3)  # Exponentiation
print(1/2)     # In Python 3, / always returns a float

63
20
8
0.5


In [2]:
string_one = "Mimi ni mkenya"
string_two = 'na mimi ni mwafrika'
print(string_one)
print(string_one.upper())
print(string_one.lower())
print(string_one.capitalize())
print(string_one + string_two)
print(string_one + ' ' + string_two)

Mimi ni mkenya
MIMI NI MKENYA
mimi ni mkenya
Mimi ni mkenya
Mimi ni mkenyana mimi ni mwafrika
Mimi ni mkenya na mimi ni mwafrika


In [3]:
type(string_one)

str

In [5]:
# Boolean data type
print(23 > 4)
print(1 == 1)
print(2 != 2.0)
print(3 == 5 and 4 < 1)
print(not True)
import numpy as np

bool_array = np.array([True, False])
bool_array.astype(int)

True
True
False
False
False


array([1, 0])

### Exercise 1: Data types

Qn1. Calculate $45^3$

In [6]:
#Answer:
print(45 ** 3)

91125


### Basic data structures

Python includes several built-in data structures: lists, dictionaries, sets, and tuples.

**Lists**

A list is an ordered collection of objects. The objects may be of different types: integers, strings, other lists, etc.

In [7]:
list_of_stuff = ['Alex', 12, 'Peter', [6, 85], 50000]
names = ['Alex', 'Peter', 'Jane', 'Doe']

In [8]:
print(names.count('Maina'))
type(list_of_stuff[3])

0


list

In [9]:
list_of_stuff

['Alex', 12, 'Peter', [6, 85], 50000]

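Lists support slicing with the syntax *names[start:stop:stride]*. The item at `start` is included, the item at `stop` is excluded, and any of the three parts may be omitted.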

In [10]:
print(names[0])     # Python is zero-indexed
print(names[::-1])  # reverse the order of the list
print(names[-1])    # get the last item of the list
print(names[-2])    # get the second-last item of the list
names.append('Maina')
names
"Maina" in names

Alex
['Doe', 'Jane', 'Peter', 'Alex']
Doe
Jane


True
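
The cell above only uses the stride part of the slice syntax. As a small sketch of how `start`, `stop` and `stride` combine, the example below uses a hypothetical list called `letters` (it is not part of the original lesson):

```python
# Hypothetical list used only for this illustration
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])     # items at indices 1, 2 and 3 (stop is excluded)
print(letters[:3])      # the first three items
print(letters[::2])     # every second item
print(letters[4:1:-1])  # indices 4, 3, 2: a negative stride walks backwards
```

A slice always returns a new list, so `letters` itself is left unchanged.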

You can list all the attributes and methods of an object with the built-in *dir* function.

In [11]:
dir(names)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

**Dictionaries**

A dictionary is a key-value mapping data structure or container. 

In [None]:
student_ages = {'Alex': 24, 'Jane':19, 'Doe':27}

In [None]:
student_ages

In [None]:
student_ages.keys()

In [None]:
student_ages['Doe']

In [None]:
student_ages.items()
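
Dictionaries can also be changed after they are created. The sketch below redefines `student_ages` so that it runs on its own; the new entry `'Maina': 22` and the looked-up name `'Peter'` are made up for illustration:

```python
# Same ages as in the cells above, redefined so this cell runs on its own
student_ages = {'Alex': 24, 'Jane': 19, 'Doe': 27}

student_ages['Maina'] = 22   # add a new key-value pair (made-up value)
student_ages['Alex'] += 1    # update the value stored under an existing key

print('Jane' in student_ages)                # membership is tested against the keys
print(student_ages.get('Peter', 'unknown'))  # .get returns a default instead of raising KeyError
print(len(student_ages))                     # number of key-value pairs
```

Using `student_ages['Peter']` directly would raise a `KeyError`, which is why `.get` with a default is often the safer lookup.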

**Tuples**

Tuples are immutable data structures or containers. They cannot be amended once created.

In [None]:
names = ('John', 'Doe', 'Jane')

In [None]:
names[2]
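
To see what immutability means in practice, here is a minimal sketch. The variable names `names_tuple` and `longer_tuple` are chosen for this example so that they do not overwrite anything defined above:

```python
names_tuple = ('John', 'Doe', 'Jane')

try:
    names_tuple[0] = 'Peter'   # item assignment is not allowed on a tuple
except TypeError as error:
    print('Cannot modify a tuple:', error)

# To "change" a tuple, build a new one instead
longer_tuple = names_tuple + ('Peter',)
print(longer_tuple)
```

Note the trailing comma in `('Peter',)`: without it, Python would treat the parentheses as plain grouping around a string.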

**Sets**

A set is an unordered collection of unique items. Converting a list to a set removes its duplicates.

In [None]:
a = ['a', 'b', 'b', 'a', 'd', 'c', 'c']
set(a)
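
Sets also support the usual mathematical set operations. In the sketch below, `other` is a hypothetical second set introduced only for illustration, and `sorted` is used so that the printed order is predictable:

```python
a = ['a', 'b', 'b', 'a', 'd', 'c', 'c']
unique_a = set(a)          # duplicates are dropped; order is not guaranteed
other = {'c', 'd', 'e'}    # hypothetical second set

print(sorted(unique_a))          # the unique items of a
print(sorted(unique_a & other))  # intersection: items in both sets
print(sorted(unique_a | other))  # union: items in either set
print(sorted(unique_a - other))  # difference: items only in unique_a
```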

## Control flow

for loop

In [12]:
names

['Alex', 'Peter', 'Jane', 'Doe', 'Maina']

In [14]:
for name in names:
    print(name)

Alex
Peter
Jane
Doe
Maina


In [15]:
for index in range(len(names)):
    print(index, names[index])

0 Alex
1 Peter
2 Jane
3 Doe
4 Maina


In [16]:
for index, name in enumerate(names): # Inbuilt enumerate function to do the same thing
    print(index, name)

0 Alex
1 Peter
2 Jane
3 Doe
4 Maina


if/elif/else statements

In [17]:
age = 30

if age > 20:
    print('Adult identified')
elif age < 15:
    print('Child in sight')
else:
    print('youth identified')

Adult identified


More loops

In [18]:
student_names = ['Peter', 'Alex', 'Jane', 'Doe']
for name in student_names:
    if name.startswith('A'):
        print(name)

Alex


While loop

In [19]:
feedback = "This is going extremely well - easy stuff"
index = 0
while index < len(feedback):
    print(feedback[index])
    index += 1
    

T
h
i
s
 
i
s
 
g
o
i
n
g
 
e
x
t
r
e
m
e
l
y
 
w
e
l
l
 
-
 
e
a
s
y
 
s
t
u
f
f


List comprehensions

In [20]:
names_lowercase = [name.lower() for name in student_names]

In [21]:
[i**2 for i in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [22]:
names_lowercase

['peter', 'alex', 'jane', 'doe']
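
A list comprehension can also filter items with a trailing `if`. As a sketch, the loop from the "More loops" cell can be written in one line; `names_with_a` and `short_upper` are names chosen for this example:

```python
student_names = ['Peter', 'Alex', 'Jane', 'Doe']

# Keep only the names that start with 'A'
names_with_a = [name for name in student_names if name.startswith('A')]

# Transform and filter at the same time: uppercase the names of at most 4 letters
short_upper = [name.upper() for name in student_names if len(name) <= 4]

print(names_with_a)
print(short_upper)
```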

## Functions

A function is defined with the keyword **def**. Instead of brackets, Python uses indentation to mark out the body of the function (its scope). The key parts of a function definition are:

* the **def** keyword;
* the function's name;
* the arguments of the function, given between parentheses and followed by a colon;
* the indented function body;
* an optional **return** statement for returning values.

In [23]:
def hello_world():
    print("Hello, World!")

In [24]:
# Call function
hello_world()

Hello, World!


In [25]:
# Function to do stuff - add any two numbers
# Use key word "return" to *return* output from function
def add_numbers(a, b):
    answer = a + b
    return answer

In [29]:
print(add_numbers(2, 8))

# Try with string inputs: for strings, + means concatenation
print(add_numbers('abc', 'def'))

10
abcdef


In [27]:
#Q2 write a function that returns the nth number in the Fibonacci sequence 

### Modules and packages

A module is a sequence of instructions written in one file. It can also contain multiple functions. If you have some code that you want to re-use, or that you are using in several places in your code, then it's best to write it out as a module.

Python comes with inbuilt modules that you can use in your own code without reimplementing the functionality.

A collection of many modules together forms a package. One strength of Python is that it has a huge number of packages that can let you do all sorts of things.
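
The `import` statement comes in a few forms. The sketch below uses two standard-library modules, `math` and `statistics`, chosen only for illustration (they are not used elsewhere in this lesson):

```python
import math                 # import the whole module; use names as math.<name>
from math import sqrt       # import a single name directly
import statistics as stats  # import a module under a shorter alias

print(math.pi)
print(sqrt(16))
print(stats.mean([2, 3, 4, 5]))
```

The alias form is the same convention that `import numpy as np` follows.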

In [30]:
import sys
import os
import numpy as np

In [31]:
sys.version

'3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]'

In [32]:
os.path

<module 'posixpath' from '/home/ciira/anaconda3/lib/python3.6/posixpath.py'>

In [33]:
np.mean(np.array([2,3,4,5]))

3.5

In [34]:
?np.mean #find out what a function does

Some key packages for data science include scikit-learn (imported as `sklearn`) for machine learning, pandas for data analysis, matplotlib for plotting and NumPy for numerical computing.

## Numpy

NumPy is an extension package for Python that enables scientific computation. It supports array-oriented computing, where an operation is applied to a whole array at once, which makes numerical computations very efficient. It's a must-use package for data analytics with Python.
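
As a small sketch of what array-oriented computing looks like, the example below squares four numbers first with a plain Python list and then with a NumPy array. The variable names are chosen for this illustration:

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: visit the list element by element
squares_list = [v ** 2 for v in values]

# NumPy: the operation is applied to the whole array at once
values_array = np.array(values)
squares_array = values_array ** 2

print(squares_list)
print(squares_array)
print(values_array * 10 + 1)  # other arithmetic is element-wise too
```

On arrays this small the difference is only in readability; the speed advantage of the array form shows up on large arrays.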

In [35]:
import numpy as np
?np.sum

In [36]:
alist = [1,2,3,4] #1D array
test_array = np.array(alist)
type(test_array)

numpy.ndarray

Multidimensional arrays

In [39]:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
x = np.array([1,1,1])
print(np.dot(A,x))
print(np.sum(A,1))


[ 6 15 24]
[ 6 15 24]


Some NumPy array attributes

In [40]:
print(A.shape)
print(A.ndim)

(3, 3)
2


Creating arrays

In [41]:
c = np.arange(10)
d = np.linspace(0,10,15) # start, end, num points
e = np.ones(4)
f = np.zeros((4,5))

In [42]:
type(c)

numpy.ndarray

In [43]:
print(d)
print(e)
print(f)

[  0.           0.71428571   1.42857143   2.14285714   2.85714286
   3.57142857   4.28571429   5.           5.71428571   6.42857143
   7.14285714   7.85714286   8.57142857   9.28571429  10.        ]
[ 1.  1.  1.  1.]
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]


In [44]:
np.random.rand(5) 

array([ 0.52257511,  0.97401439,  0.59523521,  0.7670781 ,  0.48843262])

### Indexing and slicing

In [45]:
a = np.arange(10)
print(a)

[0 1 2 3 4 5 6 7 8 9]


In [46]:
# Slicing is [start:stop:step]; stop is excluded
print(a[0])
print(a[-1])
print(a[2:])
print(a[:8])
print(a[3:7])
print(a[::3])

0
9
[2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7]
[3 4 5 6]
[0 3 6 9]


In [47]:
children_data = np.random.randint(low=1, high=10, size=(4,2))
children_sex = np.array(['M','F','F','M'])

In [None]:
children_data

In [None]:
np.array(children_sex) == 'M'

In [None]:
#Get male children
children_data[children_sex=='M',:]

If the columns of the data represent current age and weight at birth, we can do some summarisations

In [None]:
# Sum of ages and weights
np.sum(children_data, axis=0)

In [None]:
# Get averages
np.sum(children_data, axis=0)/children_data.shape[0]

In [None]:
# Alternatively
np.mean(children_data, axis=0)
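
Boolean indexing and the summaries above can be combined to compare groups. The sketch below uses fixed, made-up values instead of random ones so that the result is reproducible; the columns are assumed to be age and birth weight, as described above:

```python
import numpy as np

# Made-up values for illustration: columns are [age, birth weight]
children_data = np.array([[5, 3], [7, 4], [2, 3], [9, 2]])
children_sex = np.array(['M', 'F', 'F', 'M'])

is_male = children_sex == 'M'   # boolean mask, one entry per row

# Column averages for each group; ~ inverts the mask
male_means = np.mean(children_data[is_male, :], axis=0)
female_means = np.mean(children_data[~is_male, :], axis=0)

print(male_means)
print(female_means)
```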

More information on Numpy arrays can be [found here](http://www.scipy-lectures.org/intro/numpy/index.html).

## Plotting

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
x=np.linspace(0,1,100)
plt.plot(x,np.sin(2*np.pi*x))

Q3. Plot $f(x)=\cos(2\pi x)$ on the same plot as above using a different colour, and include a legend, a title, an x-axis label and a y-axis label.

In [None]:
?plt.plot
