 ## Data Analysis Using [Python](https://www.python.org)

![Guido van Rossum](gudio_van_rossum.jpg?raw=true)

# History of Python

* #### *Dec 1989* - Development begins, as a successor to the ABC language with support for exception handling
* #### *Feb 1991* - First public release, 0.9.0, with classes and inheritance, exception handling, functions and the core data types
* #### *Jan 1994* - Version 1.0, with functional programming tools (lambda, map, filter, reduce)
* #### *Oct 2000* - Version 2.0 brings a cycle-detecting garbage collector; Version 2.2 (2001) unifies types and classes, making the language fully object oriented
* #### *Dec 2008* - Version 3.0 reduces feature duplication by removing old ways of doing things

# Why Python?
* #### Easy to learn
* #### Has efficient high-level data structures
* #### Elegant syntax and dynamic typing
* #### Many third party libraries/modules
* #### Active community support


# Application
* #### Web and Internet Development
* #### Scientific and Numeric
* #### Software automation and testing

## Installation

* #### We will be using Anaconda distributed by [CONTINUUM](https://www.continuum.io/downloads) for this training
* #### You can download and install Python and Jupyter - here are the [instructions](https://github.com/sdonapar/python_training/blob/master/python_Installation_instructions.md)
* #### Anaconda is a completely free [Python](https://www.python.org) distribution (including for commercial use and redistribution). It includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis
* #### Check the Python version installed on your training desktop/laptop
* #### Set up the environment variables if they are not set already
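
One way to check the installed version from inside Python is the standard `sys` module. The following is a minimal sketch; the variable names are illustrative only.

```python
import sys

# Full version string of the running interpreter
version_text = sys.version
# Structured version numbers, e.g. major is 2 or 3
major, minor = sys.version_info[0], sys.version_info[1]
# Path of the interpreter executable currently in use
interpreter_path = sys.executable

print(version_text)
print("Python %d.%d" % (major, minor))
print(interpreter_path)
```

From a terminal, `python --version` reports the version as well.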

## Python Interpreter

* On Linux and macOS, the Python interpreter is usually installed as /usr/local/bin/python or /usr/bin/python
* On Windows machines, a Python 2.7 installation is usually placed in C:\Python27
* If you are using Anaconda, it is usually found in C:\anaconda2\bin

### REPL - read–eval–print loop

```python
>>> print("Let us start learning python")
Let us start learning python 
>>>
```

### Zen of Python

```python
>>> import this
>>>
```



## Introduction to [Jupyter](http://jupyter.org) notebooks

* #### Starting Jupyter notebook
* #### How to get Python help
* #### Walk through basic operations
* #### Line and Cell [Magic commands](https://damontallen.github.io/IPython-quick-ref-sheets)

In [None]:
# My first program
print("I am learning python")

In [None]:
%lsmagic

In [None]:
%run run.py

In [None]:
%timeit [a for a in range(0,100000)]
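
Magic commands such as `%timeit` work only inside IPython/Jupyter. In a plain Python script, a rough equivalent is the standard `timeit` module. The sketch below times the same list comprehension; the repeat count of 10 is an arbitrary choice.

```python
import timeit

# Execute the statement 10 times and report the average time per run
runs = 10
total_seconds = timeit.timeit("[a for a in range(0, 100000)]", number=runs)
print("average: %.6f seconds per run" % (total_seconds / runs))
```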

### How to get help?

In [None]:
%quickref

In [None]:
help(sum)

In [None]:
sum?

In [None]:
# Python is a dynamically typed language:
# type checking happens at run time rather than at compile time.
iam_integer = 100
iam_float = 3.14
iam_str = "Hello"
iam_bool = True
iam_complex = 3+4j

* Everything in Python is an object
* Everything in Python has a type
* **type** and **object** are special objects in Python


In [None]:
print(type(iam_integer))
print(type(iam_float))
print(type(iam_str))
print(type(iam_bool))
print(type(iam_complex))
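
Because names are bound to objects at run time, the same name can refer to objects of different types over its lifetime. A small sketch of this, using the illustrative name `value`:

```python
value = 100
first_type = type(value)       # int
value = "one hundred"          # rebind the same name to a str object
second_type = type(value)      # str

print(first_type.__name__)     # int
print(second_type.__name__)    # str

# Types are objects too: int is an instance of type, and of object
print(isinstance(int, type))   # True
print(isinstance(int, object)) # True
```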

### Introduction to objects and namespace

<img src="namespace.jpg" alt="Python Namespace" height="542" width="542" align="left">

In [1]:
a = 2
a = a +1
b = 2

<img src="namespace_example.png" alt="namespace" height="550" width="550" align="left">

* Every object has a unique identity
* After `a = 2`, the name **a** in the namespace points to object 2
* After `a = a + 1`, the name **a** moves to point to object 3
* A new name **b** is created in the namespace and points to object 2

In [2]:
print(id(a))
print(id(b))
print(id(2))

8106312
8106336
8106336
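
The identity values shown here are specific to one run and will differ on another machine. That `b` and the literal `2` share an identity is a CPython caching detail for small integers rather than a language guarantee. A sketch with lists shows more reliably when two names refer to the same object; the names are illustrative.

```python
first = [1, 2, 3]
second = first           # a second name for the same list object
third = [1, 2, 3]        # an equal but distinct list object

print(first is second)   # True: same identity
print(first is third)    # False: different objects
print(first == third)    # True: equal contents

second.append(4)         # the change is visible through both names
print(first)             # [1, 2, 3, 4]
```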


In [3]:
# There are many other names in this namespace which are brought in by Jupyter
print(dir())

['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__name__', '_dh', '_i', '_i1', '_i2', '_i3', '_ih', '_ii', '_iii', '_oh', '_sh', 'a', 'b', 'exit', 'get_ipython', 'quit']


In [4]:
# %load utilities.py
#!/usr/bin/env python

def my_dir(mylist):
    import re
    pattern = re.compile('_[0-9a-z]+')
    return [x for x in mylist if not pattern.match(x) and 
            x not in ('In','Out','_','__','___','exit','quit','get_ipython')] 

In [5]:
import utilities
print(utilities.my_dir(dir()))

['__builtin__', '__builtins__', '__doc__', '__name__', '__package__', 'a', 'b', 'my_dir', 'utilities']


## [Python Library Reference](https://docs.python.org/2/library/index.html)
#### [Built-in Functions](https://docs.python.org/2/library/functions.html) - Loaded when python is started
#### [Standard Library](https://docs.python.org/2/library/) - These are installed along with standard python installation, ex: sys, os, etc
#### [External modules](https://pypi.python.org/pypi) - Can be downloaded from the Python Package Index, ex: numpy, pandas, etc

## Numbers

In [None]:
2 + 2

In [None]:
5 * 3

In [None]:
100/21

In [None]:
100/21.0

In [None]:
import math
radius = 10 # 10 centimeters
area = math.pi * radius**2
print(area)
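
The two division cells above give different results under Python 2, which this training uses: `100/21` divides two integers and truncates to `4`, while `100/21.0` returns a float. The following sketch uses operators that behave the same in Python 2 and Python 3.

```python
# Floor division and remainder behave the same in Python 2 and 3
quotient = 100 // 21         # 4
remainder = 100 % 21         # 16
both = divmod(100, 21)       # (4, 16)

# Making one operand a float forces true division
true_division = 100 / 21.0   # roughly 4.7619

print(quotient)
print(remainder)
print(both)
print(round(true_division, 4))
```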

## Strings

* Strings in python are immutable

In [None]:
str_a = 'This string uses single quotes'

str_b = "This string uses double quotes"

str_c = """This is a multi line string
This is second line
This is third line
"""

str_d = "This doesn't contain escape characters"
str_e = 'There are some "SPECIAL" words in this sentence'
str_f = 'It is fine to use escape character\'s some times'

# Some special characters are written as escape sequences: \t (tab), \n (newline), etc.

str_g = "Everything in Python is an object\nEvery object in Python has type\nPython is dynamically typed language"

In [None]:
print(str_g)
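
Immutability means a string cannot be changed in place; operations that appear to modify a string return a new one. A minimal sketch, using an illustrative variable `greeting`:

```python
greeting = "hello"

try:
    greeting[0] = "H"          # item assignment is not allowed on str
except TypeError as error:
    assignment_failed = True
    print("TypeError: %s" % error)
else:
    assignment_failed = False

# Build a new string instead; the original is unchanged
capitalized = "H" + greeting[1:]
print(capitalized)   # Hello
print(greeting)      # hello
```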

### String methods

* Strings can be indexed
* startswith, endswith
* strip, split, replace, partition
* index,count, find
* upper, lower
* join, format
* string slicing


In [6]:
my_string = "Assets under administration : $5.2 trillion, including managed assets : $2.1 trillion"

In [7]:
len(my_string) # returns length of string

85

In [8]:
my_string.startswith("Assets") # Returns True or False

True

In [9]:
my_string.endswith("Fidelity") # Returns True or False

False

In [10]:
"   This is a test String ".strip() # removes the leading and trailing spaces

'This is a test String'

In [11]:
"This line has return line characters at the end\n\n\n".strip("\n")

'This line has return line characters at the end'

In [12]:
print(my_string.split()) # default delimiter is space

['Assets', 'under', 'administration', ':', '$5.2', 'trillion,', 'including', 'managed', 'assets', ':', '$2.1', 'trillion']


In [13]:
print(my_string.split(":")) # passing a delimiter
# What is the type of the output of split?
# What if the delimiter is not present in my_string? Would the split operation fail?

['Assets under administration ', ' $5.2 trillion, including managed assets ', ' $2.1 trillion']


In [14]:
my_string.find("$") # returns the index of the first occurrence of the character $

30

In [15]:
my_string.count("trillion") # returns the number of occurrences of the word/character

2

In [16]:
my_string.count("Fidelity") # returns 0 if the substring is not found

0

In [17]:
my_string.upper() # converts to uppercase

'ASSETS UNDER ADMINISTRATION : $5.2 TRILLION, INCLUDING MANAGED ASSETS : $2.1 TRILLION'

In [18]:
my_string.index("trillion") # returns the starting index of the first occurrence of the substring

35

In [19]:
# string slicing
print(my_string[0:15])  # characters from index 0 up to 15 (excluding 15)
print(my_string[10:25]) # characters from index 10 up to 25 (excluding 25)
print(my_string[25:])   # from index 25 to the end of the string
print(my_string[:25])   # from the beginning up to 25 (excluding 25)
print(my_string[:])     # the complete string

Assets under ad
er administrati
on : $5.2 trillion, including managed assets : $2.1 trillion
Assets under administrati
Assets under administration : $5.2 trillion, including managed assets : $2.1 trillion
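
Slices also accept negative indices, which count from the end, and an optional step. A short sketch on the same string:

```python
my_string = "Assets under administration : $5.2 trillion, including managed assets : $2.1 trillion"

last_word = my_string[-8:]       # the last 8 characters
first_word = my_string[:6]       # the first 6 characters
every_other = my_string[0:12:2]  # every second character of 'Assets under'
backwards = my_string[:6][::-1]  # a step of -1 reverses the slice

print(last_word)     # trillion
print(first_word)    # Assets
print(every_other)   # Ast ne
print(backwards)     # stessA
```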


In [20]:
# String concatenation

my_statement = "This" + " " + "is" + " a " + "test statement"
my_statement

'This is a test statement'

In [None]:
print("*"*3 + " Title " + "*"*3)
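
The methods `replace`, `partition`, `join` and `format` from the list of string methods are not demonstrated in the cells, so here is a minimal sketch of each. The sample values are illustrative.

```python
text = "managed assets : $2.1 trillion"

replaced = text.replace("trillion", "T")      # returns a new string
head, separator, tail = text.partition(":")   # splits at the first ':' only
joined = "-".join(["2015", "04", "30"])       # joins a list of strings
formatted = "{0} has {1} characters".format("Python", len("Python"))

print(replaced)       # managed assets : $2.1 T
print(head.strip())   # managed assets
print(tail.strip())   # $2.1 trillion
print(joined)         # 2015-04-30
print(formatted)      # Python has 6 characters
```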

In [None]:
# What happens if the string "Title" is divided by 3?
# What happens if the integer 5 is added to the string "Title"?

## Exercises

Explore string "Monty Python"

<img src="fig_list_index.png" alt="namespace" height="550" width="550" align="left">


* Find the length of the string "String in Python is an array of characters"
* How many occurrences of the word "people" are there in the sentence below?

Fidelity's goal is to make financial expertise broadly accessible and effective in helping people live the lives they want. With assets under administration of \$5.2 trillion, including managed assets of \$2.1 trillion as of April 30, 2015, we focus on meeting the unique needs of a diverse set of customers: helping more than 24 million people invest their own life savings, nearly 20,000 businesses manage employee benefit programs, as well as providing nearly 10,000 advisory firms with technology solutions to invest their own clients' money.

* Extract the substring "assets under administration of \$5.2 trillion" from the above sentence using indices
* Remove "." from the above sentence and split the sentence using "," as the delimiter



## Lists

* A list is the most versatile compound data type; it is written as comma-separated values (items) between square brackets
* Lists in Python are mutable
* Items of a list can be any Python object
* [List methods](https://docs.python.org/2/tutorial/datastructures.html#more-on-lists): append, extend, insert, remove, pop, index, count, sort, reverse
* The `in` operator checks for the presence of an element

In [21]:
# list can have different types of objects
my_list = ['Python','java',25,32,43.55,'C++']

In [22]:
len(my_list) # returns the length of the list

6

In [23]:
my_list[1] # returns second element of list

'java'

In [24]:
my_list[1] = 'Java' # lists are mutable

In [25]:
my_list

['Python', 'Java', 25, 32, 43.55, 'C++']

In [28]:
new_list = my_list[0:3] # list slice

In [29]:
new_list

['Python', 'Java', 25]
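
A slice returns a new list, so changing `new_list` leaves the original untouched, whereas plain assignment only creates another name for the same list. A sketch with a shortened version of the list; the name `alias` is illustrative.

```python
my_list = ['Python', 'Java', 25, 32]

new_list = my_list[0:3]   # slicing copies the selected items into a new list
alias = my_list           # assignment binds a second name to the same list

new_list[0] = 'Ruby'      # does not affect my_list
alias[1] = 'Scala'        # changes my_list as well

print(my_list)    # ['Python', 'Scala', 25, 32]
print(new_list)   # ['Ruby', 'Java', 25]
```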

In [30]:
my_list.append("DotNet") # appends the string at the end of the list

In [31]:
my_list

['Python', 'Java', 25, 32, 43.55, 'C++', 'DotNet']

In [32]:
my_list.extend(['R','SPSS','MATLAB']) # extending a list using another list

In [33]:
my_list

['Python', 'Java', 25, 32, 43.55, 'C++', 'DotNet', 'R', 'SPSS', 'MATLAB']

In [34]:
# list can contain duplicate items
my_list.append("Python")

In [35]:
print(my_list)

['Python', 'Java', 25, 32, 43.55, 'C++', 'DotNet', 'R', 'SPSS', 'MATLAB', 'Python']


In [36]:
my_list.count("Python")

2

In [37]:
# this modifies the original list, sort is in place
my_list.sort()

In [38]:
print(my_list)

[25, 32, 43.55, 'C++', 'DotNet', 'Java', 'MATLAB', 'Python', 'Python', 'R', 'SPSS']
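
`sort` changes the list in place and returns `None`, while the built-in `sorted` returns a new list and leaves the original alone. Both accept `key` and `reverse` arguments. Sorting a list that mixes numbers and strings, as in the cell above, works only in Python 2; Python 3 raises `TypeError`. The sketch below therefore uses strings only, in an illustrative list named `languages`.

```python
languages = ['Python', 'java', 'C++', 'R']

ordered = sorted(languages)                      # uppercase letters sort before lowercase
ignore_case = sorted(languages, key=str.lower)   # compare lowercased values instead
descending = sorted(languages, reverse=True)

print(ordered)       # ['C++', 'Python', 'R', 'java']
print(ignore_case)   # ['C++', 'java', 'Python', 'R']
print(descending)    # ['java', 'R', 'Python', 'C++']
print(languages)     # ['Python', 'java', 'C++', 'R'] (unchanged)
```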


In [39]:
# Please do not run this multiple times, pop removes an element each time
last_element = my_list.pop()
last_element

'SPSS'

In [40]:
my_list.index("Java")

5

In [None]:
'Python' in my_list # checking if object is present inside the list

In [None]:
# this modifies the original list; reverse is in place
my_list.reverse()

In [43]:
print(my_list)

['R', 'Python', 'Python', 'MATLAB', 'Java', 'DotNet', 'C++', 43.55, 32, 25]


In [44]:
# inserting at index 1, the second position (indexing starts at 0)
my_list.insert(1,'SPSS')

In [45]:
print(my_list)

['R', 'SPSS', 'Python', 'Python', 'MATLAB', 'Java', 'DotNet', 'C++', 43.55, 32, 25]


In [46]:
# List of Lists
list_of_lists = [['Python','C++','Java'],[2.7,4.2,8.0],['Object',2.5],'Main']
list_of_lists

[['Python', 'C++', 'Java'], [2.7, 4.2, 8.0], ['Object', 2.5], 'Main']

In [None]:
list_of_lists[0]

In [47]:
# Accessing list
for item in my_list: # iterates from the first element to the last element
    print "Programming Language : ", item

Programming Language :  R
Programming Language :  SPSS
Programming Language :  Python
Programming Language :  Python
Programming Language :  MATLAB
Programming Language :  Java
Programming Language :  DotNet
Programming Language :  C++
Programming Language :  43.55
Programming Language :  32
Programming Language :  25


## Tuples

* Tuples are very similar to Lists except that they are not mutable
* A tuple consists of a number of values separated by commas enclosed in round brackets
* Tuples can contain mutable objects like lists

In [48]:
my_tuple = 'Equity', # observe the comma at the end

In [49]:
my_tuple

('Equity',)

In [50]:
another_tuple = ('Equity Fund','1 Year',13.5)

In [51]:
another_tuple[1]

'1 Year'

In [52]:
sorted(another_tuple) # Please note the type of the output

[13.5, '1 Year', 'Equity Fund']

In [53]:
len(another_tuple)

3

In [54]:
tuple_list = ([1,2,3],['a','b','c'],'Another String')

In [55]:
tuple_list[0].append(4)

In [56]:
tuple_list

([1, 2, 3, 4], ['a', 'b', 'c'], 'Another String')
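
The tuple itself cannot be changed, even though a mutable object stored inside it can. The sketch below shows the failed item assignment and tuple unpacking; the names are illustrative.

```python
fund = ('Equity Fund', '1 Year', 13.5)

try:
    fund[2] = 14.0             # tuples do not support item assignment
except TypeError:
    assignment_failed = True
else:
    assignment_failed = False
print(assignment_failed)       # True

# Unpacking assigns each item to its own name
name, period, return_pct = fund
print(name)          # Equity Fund
print(return_pct)    # 13.5
```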

## Sets

* A set is an unordered collection with no duplicate elements
* Basic uses include membership testing and eliminating duplicate entries
* Sets support mathematical operations like union, intersection, difference, and symmetric difference.

In [57]:
instrument_types = ['Equity','Fixed Income','Equity','Money Market']
instrument_types_set = set(instrument_types)

In [58]:
instrument_types_set

{'Equity', 'Fixed Income', 'Money Market'}

In [59]:
another_set = {'Fixed Deposits','Equity'}

In [60]:
instrument_types_set.union(another_set)

{'Equity', 'Fixed Deposits', 'Fixed Income', 'Money Market'}

In [61]:
instrument_types_set.intersection(another_set)

{'Equity'}

In [62]:
instrument_types_set - another_set

{'Fixed Income', 'Money Market'}

In [63]:
instrument_types_set ^ another_set #items in instrument_types_set or another_set but not both

{'Fixed Deposits', 'Fixed Income', 'Money Market'}
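
Membership tests and the operator forms of union and intersection can be sketched as follows, reusing the values from the cells above.

```python
instrument_types_set = {'Equity', 'Fixed Income', 'Money Market'}
another_set = {'Fixed Deposits', 'Equity'}

has_equity = 'Equity' in instrument_types_set   # membership test
union = instrument_types_set | another_set      # same result as .union()
common = instrument_types_set & another_set     # same result as .intersection()

print(has_equity)       # True
print(len(union))       # 4
print(sorted(common))   # ['Equity']

instrument_types_set.add('Equity')              # adding a duplicate has no effect
print(len(instrument_types_set))                # 3
```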

## Dictionaries

* Also known as associative arrays or hash tables
* An unordered set of key: value pairs
* Keys must be of an immutable (hashable) type

In [64]:
my_dict = {'name':'Python','version':2.7,'objects':['List','Tuple','Set']}

In [65]:
my_dict_list = dict([('x',20),('y',40)])

In [66]:
my_dict_list

{'x': 20, 'y': 40}

In [67]:
person = dict(name='Mark',age=25,language='English')

In [68]:
person

{'age': 25, 'language': 'English', 'name': 'Mark'}

In [69]:
my_dict['name']

'Python'
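
Dictionaries are mutable: keys can be added, updated and deleted, and `get` avoids a `KeyError` when a key is missing. A minimal sketch; the `'author'` and `'license'` keys are illustrative additions, not part of the original example.

```python
my_dict = {'name': 'Python', 'version': 2.7}

my_dict['author'] = 'Guido van Rossum'        # add a new key
my_dict['version'] = 3.0                      # update an existing key
del my_dict['name']                           # remove a key

missing = my_dict.get('license', 'unknown')   # default value instead of KeyError
has_author = 'author' in my_dict              # membership test on keys

print(missing)       # unknown
print(has_author)    # True
for key in sorted(my_dict):                   # iterate over the keys in sorted order
    print("%s -> %s" % (key, my_dict[key]))
```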

## Exercises

* Explore [**range**](https://docs.python.org/2/tutorial/controlflow.html#the-range-function) function
* Explore [**del**](https://docs.python.org/2/tutorial/datastructures.html#the-del-statement) statement
* Explore [**format**](https://docs.python.org/2/tutorial/inputoutput.html#fancier-output-formatting) function

In [None]:
# Exercise - 1
# There are 5 items in a buffet - Roti, Mushroom Curry, Salads, Palak Paneer, Veg Pulao
# Create a list in the order these items are arranged for the buffet
# Iterate over the list and print each item's sequence number and name (hint: explore the enumerate function)

In [None]:
# Exercise - 2
a = [1,2,4,3,5]
# re-arrange the order of the elements to 1,2,3,4,5 using the del statement and the insert list method