In [None]:
%matplotlib notebook

import pandas as pd
import numpy as np
import matplotlib

from matplotlib import pyplot as plt
import seaborn as sns

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
df.plot(); plt.legend(loc='best')

# Python libraries and Data Structures

- Lists They are one of the most versatile data structures in Python. A list is defined by writing comma-separated values inside square brackets. Lists may contain items of different types, but usually the items all have the same type. Python lists are mutable, so individual elements can be changed.
Here is a quick example that defines a list and then accesses it:

In [1]:
squares_list = [1,2,4,9,16,25]

In [2]:
squares_list

[1, 2, 4, 9, 16, 25]

Individual elements can be accessed by zero-based indexing, just as with arrays in C or C++:

In [3]:
squares_list[0]

1

In [4]:
squares_list[2]

4

A range of elements can be accessed by slicing with [:]

In [5]:
squares_list[:] # prints all

[1, 2, 4, 9, 16, 25]

In [6]:
squares_list[:3]  # first three elements; the stop index 3 is excluded

[1, 2, 4]

Negative indexing is also possible.
It accesses the list from the end.

In [7]:
squares_list[-1]

25

In [8]:
squares_list[::-1]# reverse a list

[25, 16, 9, 4, 2, 1]
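
Since lists are mutable, they can also be changed in place. Here is a small sketch of that; the name `demo_list` is made up for this illustration, so the `squares_list` used above stays as it is:

```python
# Hypothetical name `demo_list`: a fresh list, so squares_list is unaffected.
demo_list = [1, 2, 4, 9, 16, 25]

demo_list[1] = 3          # replace a single element in place
demo_list.append(36)      # add one element to the end
demo_list[:2] = [0, 1]    # replace a slice with new values
del demo_list[-1]         # remove the last element

print(demo_list)          # [0, 1, 4, 9, 16, 25]
```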

- Strings They can be defined with single ( ' ), double ( " ) or triple ( ''' ) quotes. Strings enclosed in triple quotes ( ''' ) can span multiple lines and are frequently used in docstrings (Python's way of documenting functions). \ is used as an escape character. Please note that Python strings are immutable, so you cannot change part of a string.

In [9]:
greet = ' Hello'
print(greet[0])
print(greet)
print(greet+' World')

 
 Hello
 Hello World


A raw string, written by prefixing the literal with r, keeps backslashes as they are instead of treating them as escape characters.

In [10]:
stmt = r'\n is a newline character'
stmt

'\\n is a newline character'

In [11]:
# Python strings are immutable, so assigning to a slice raises an error
greet[1:] = 'h'

TypeError: 'str' object does not support item assignment
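
Because a string cannot be edited in place, the usual approach is to build a new string from pieces of the old one. A minimal sketch, where `changed` and `shouted` are names invented for this example:

```python
greet = ' Hello'

# Slicing and concatenation create a new string; `greet` itself is untouched.
changed = greet[:1] + 'J' + greet[2:]
print(changed)            # ' Jello' (the leading space is kept)

# str methods also return new strings rather than modifying in place.
shouted = greet.strip().upper()
print(shouted)            # 'HELLO'
print(greet)              # still ' Hello'
```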

- Tuples A tuple is represented by a number of values separated by commas. Tuples are immutable, and the output is surrounded by parentheses so that nested tuples are interpreted correctly. Even though tuples are immutable, they can hold mutable data if needed. Because tuples cannot change, they are typically somewhat cheaper to create and store than lists. Hence, if your sequence is unlikely to change, consider using a tuple instead of a list.

In [12]:
tuple_example = 0, 1, 4, 9, 16, 25

In [13]:
tuple_example # shown in parentheses

(0, 1, 4, 9, 16, 25)

In [14]:
tuple_example[2]

4

In [15]:
tuple_example[:]

(0, 1, 4, 9, 16, 25)

In [16]:
tuple_example[2] = 6 # immutable so error

TypeError: 'tuple' object does not support item assignment
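
The point about tuples holding mutable data is easier to see with an example. In this sketch (the name `record` and its contents are made up), the list inside the tuple can grow, but the tuple's own slots still cannot be reassigned:

```python
record = ('scores', [90, 85])   # a tuple holding a string and a list

record[1].append(77)            # allowed: the inner list is mutable
print(record)                   # ('scores', [90, 85, 77])

try:
    record[1] = [0]             # not allowed: rebinding a slot of the tuple
except TypeError as err:
    message = str(err)
    print(message)
```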

- Dictionary It is a collection of key: value pairs, with the requirement that the keys are unique (within one dictionary). Since Python 3.7, dictionaries preserve insertion order. A pair of braces creates an empty dictionary: {}.

In [17]:
extensions = {'a':65, 'b':66, 'c':67}

In [18]:
extensions

{'a': 65, 'b': 66, 'c': 67}

In [19]:
extensions['a']

65

In [20]:
extensions['d'] # error: key doesn't exist

KeyError: 'd'

In [21]:
extensions.keys()

dict_keys(['a', 'b', 'c'])

In [22]:
extensions.values()

dict_values([65, 66, 67])
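
A few more everyday dictionary operations, sketched on the same `extensions` data: looking up a key that may be missing without raising KeyError, adding a key, and looping over pairs. The names `missing`, `fallback` and `has_d` are just for illustration:

```python
extensions = {'a': 65, 'b': 66, 'c': 67}

missing = extensions.get('d')        # None instead of a KeyError
fallback = extensions.get('d', -1)   # supply a default value explicitly

extensions['d'] = 68                 # dictionaries are mutable: add a new pair
has_d = 'd' in extensions            # membership test checks the keys

for key, value in extensions.items():
    print(key, value)
```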

- Python Iteration and Conditional Constructs
Like most languages, Python also has a FOR-loop which is the most widely used method for iteration. It has a simple syntax:

In [23]:
'''
for i in [Python Iterable]:
  expression(i)

Here “Python Iterable” can be a list, tuple or other advanced data structures which we will explore in later sections. 
Let’s take a look at a simple example, determining the factorial of a number.
'''
fact=1
N = 5
for i in range(1,N+1): # range stops before N+1, so i runs from 1 to N
    fact *= i
fact

120

In [24]:
'''
Coming to conditional statements, these are used to execute code fragments based on a condition. 
The most commonly used construct is if-else, with following syntax:

if [condition]:
  __execution if true__
else:
  __execution if false__
For instance, if we want to print whether the number N is even or odd:
'''
if N%2 == 0:
    print('Even')
else:
    print('Odd')

Odd
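
The two constructs are usually combined. As a small sketch, the loop below sorts the numbers from 1 to N into even and odd; `evens` and `odds` are names introduced only for this example:

```python
N = 5
evens, odds = [], []

for i in range(1, N + 1):   # i runs from 1 to N
    if i % 2 == 0:
        evens.append(i)
    else:
        odds.append(i)

print(evens)   # [2, 4]
print(odds)    # [1, 3, 5]
```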


### Python Libraries
Let's take one step ahead in our journey to learn Python by getting acquainted with some useful libraries. The first step is obviously to learn to import them into our environment. There are several ways of doing so in Python:

In [26]:
#1st
import math as m 
# 2nd from math import *
'''
In the first manner, we have defined an alias m to library math. 
We can now use various functions from math library (e.g. factorial) by referencing it using the alias m.factorial().

In the second manner, you have imported the entire name space in math i.e. 
you can directly use factorial() without referring to math.
'''

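
To see the import styles in use, here is a short sketch. Importing specific names, as in the second line, is a third option not listed above; it is often preferred over `from math import *` because it keeps the namespace small and makes it clear where each name comes from:

```python
import math as m                   # alias style: call functions through m
from math import factorial, sqrt   # import only the names that are needed

print(m.factorial(5))   # 120
print(factorial(5))     # 120, no prefix required
print(sqrt(16))         # 4.0
```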

### The following libraries are the ones you will need for most scientific computation and data analysis:

- NumPy stands for Numerical Python. The most powerful feature of NumPy is the n-dimensional array. This library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with other low-level languages like Fortran, C and C++.
- SciPy stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules like discrete Fourier transform, linear algebra, optimization and sparse matrices.
- Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat plots. You can use the Pylab feature in ipython notebook (ipython notebook --pylab=inline) to use these plotting features inline; in newer Jupyter versions, the %matplotlib inline magic serves the same purpose. If you omit the inline option, then pylab converts the ipython environment to an environment very similar to Matlab. You can also use LaTeX commands to add math to your plot.
- Pandas for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to the Python ecosystem and has been instrumental in boosting Python's usage in the data science community.
- Scikit Learn for machine learning. Built on NumPy, SciPy and matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling including classification, regression, clustering and dimensionality reduction.
- Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics are available for different types of data and each estimator.
- Seaborn for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.
- Bokeh for creating interactive plots, dashboards and data applications on modern web-browsers. It empowers the user to generate elegant and concise graphics in the style of D3.js. Moreover, it has the capability of high-performance interactivity over very large or streaming datasets.
- Blaze for extending the capability of NumPy and Pandas to distributed and streaming datasets. It can be used to access data from a multitude of sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together with Bokeh, Blaze can act as a very powerful tool for creating effective visualizations and dashboards on huge chunks of data.
- Scrapy for web crawling. It is a very useful framework for getting specific patterns of data. It has the capability to start at a website's home URL and then dig through web pages within the website to gather information.
- SymPy for symbolic computation. It has wide-ranging capabilities from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. Another useful feature is the capability of formatting the result of the computations as LaTeX code.
- Requests for accessing the web. It works similarly to the standard Python library urllib2 (urllib.request in Python 3) but is much easier to code. You will find subtle differences with urllib2 but for beginners, Requests might be more convenient.

Additional libraries you might need:

- os for operating system and file operations
- networkx and igraph for graph-based data manipulations
- re (regular expressions) for finding patterns in text data
- BeautifulSoup for scraping the web. Unlike Scrapy, it does not crawl: it extracts information from just a single webpage in a run.
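
Two of these, os and re, ship with Python itself, so they are easy to try right away. A minimal sketch follows; the file name `train.csv` and the sample text are made up for illustration:

```python
import os
import re

# os: build and split file paths in a platform-independent way.
path = os.path.join('data', 'train.csv')        # hypothetical file name
folder, filename = os.path.split(path)
stem, extension = os.path.splitext(filename)
print(folder, filename, stem, extension)

# re: find every run of digits in a piece of text.
text = 'Order 66 shipped 3 items on day 12'     # made-up sample text
numbers = re.findall(r'\d+', text)
print(numbers)                                  # ['66', '3', '12']
```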

Now that we are familiar with Python fundamentals and additional libraries, let's take a deep dive into problem solving through Python. Yes, I mean making a predictive model! In the process, we use some powerful libraries and also come across the next level of data structures. We will take you through the 3 key phases:

- Data Exploration – finding out more about the data we have
- Data Munging – cleaning the data and playing with it to make it better suit statistical modeling
- Predictive Modeling – running the actual algorithms and having fun 🙂

In [None]:
# List Comprehensions
'''
List comprehension is a powerful, must-know concept in Python. 
Yet it remains one of the most challenging topics for beginners. 
I intend to help each one of you who is facing this trouble in Python. 
Mastering this concept would help you in two ways:
1. You would start writing shorter and more effective code
2. In many cases, your code will also execute faster
List comprehensions are reported to be around 35% faster than a FOR loop and 45% faster than the map function, 
although the exact margin depends on the workload and the Python version.
'''
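
The speed claim above is easy to check on your own machine. The sketch below times three equivalent ways of building a list of squares with the standard timeit module; the function names are invented for this example, and the numbers you see will vary with hardware and Python version:

```python
import timeit

def squares_for(n):
    # Plain FOR loop with append
    result = []
    for x in range(n):
        result.append(x ** 2)
    return result

def squares_lc(n):
    # List comprehension
    return [x ** 2 for x in range(n)]

def squares_map(n):
    # map with a lambda, converted to a list
    return list(map(lambda x: x ** 2, range(n)))

# Time 1000 calls of each version on 1000 elements.
for func in (squares_for, squares_lc, squares_map):
    seconds = timeit.timeit(lambda: func(1000), number=1000)
    print(f'{func.__name__}: {seconds:.3f} s')
```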

Let's look at some basic examples:

- { x^2: x is a natural number less than 10 }
- { x: x is a whole number less than 20, x is even }
- { x: x is a letter in the word ‘MATHEMATICS’, x is a vowel }

Now let’s look at the corresponding Python code implementing each one as a list comprehension (LC), in the same order:

- [x**2 for x in range(1,10)]

- [x for x in range(0,20) if x%2==0 ]

- [x for x in 'MATHEMATICS' if x in ['A','E','I','O','U']]

In [None]:
'''
In a general sense, a FOR loop works as:
for (set of values to iterate):
  if (conditional filtering): 
    output_expression()
'''
'''
The same gets implemented in a simple List Comprehension construct in a single line as:
 [ output_expression() for(set of values to iterate) if(conditional filtering) ]
'''

Consider another example: { x: x is a natural number less than or equal to 100, x is a perfect square }

This can be solved using a for-loop as:

In [27]:
for i in range(1,101):          # the iterator
    if int(i**0.5)==i**0.5:     # conditional filtering
        print(i, end=" ")       # output expression: print on a single line, separated by " "

1 4 9 16 25 36 49 64 81 100 

Now, to create the List Comprehension code, we need to just plug in the different parts:

In [28]:
[i for i in range(1,101) if int(i**0.5)==i**0.5]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
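
One caveat: the test `int(i**0.5)==i**0.5` relies on floating-point square roots, which is fine for small numbers but may misjudge very large integers. A hedged alternative, assuming Python 3.8 or newer, uses `math.isqrt`, which works purely on integers. The helper name `is_perfect_square` is made up for this sketch:

```python
import math

def is_perfect_square(n):
    """Return True if the non-negative integer n is a perfect square."""
    root = math.isqrt(n)       # integer square root, no floating point involved
    return root * root == n

perfect_squares = [i for i in range(1, 101) if is_perfect_square(i)]
print(perfect_squares)
```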

I hope it is making more sense now. Once you get the hang of it, List Comprehension is a simple but powerful technique and can help you accomplish a variety of tasks with ease. Things to keep in mind:
- List Comprehension will always return a result, whether you use the result or not.
- The iteration and conditional expressions can be nested with multiple instances.
- Even the overall List Comprehension can be nested inside another List Comprehension.
- Multiple variables can be iterated and manipulated at the same time.
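
The last three points can be sketched in a few lines. The data and the names `pairs`, `transposed` and `sums` are made up for this illustration:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Nested iteration: two for-clauses, read left to right like nested loops.
pairs = [(x, y) for x in range(2) for y in range(3) if x != y]

# A comprehension nested inside another one: transpose the matrix.
transposed = [[row[c] for row in matrix] for c in range(3)]

# Multiple variables iterated together with zip.
sums = [a + b for a, b in zip([1, 2, 3], [10, 20, 30])]

print(pairs)        # [(0, 1), (0, 2), (1, 0), (1, 2)]
print(transposed)   # [[1, 4], [2, 5], [3, 6]]
print(sums)         # [11, 22, 33]
```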

Example 1: Flatten a Matrix

Aim: Take a matrix as input and return a list with each row placed one after the other.

In [None]:
#Python codes with FOR-loop and LC implementations:

def eg1_for(matrix):
    flat = []
    for row in matrix:
        for x in row:
            flat.append(x)
    return flat

def eg1_lc(matrix):
    return [x for row in matrix for x in row ]

#Let’s define a matrix and test the results:

matrix = [ list(range(0,5)), list(range(5,10)), list(range(10,15)) ]  # list() so Python 3 prints the values, not range objects
print ("Original Matrix: " + str(matrix))
print ("FOR-loop result: " + str(eg1_for(matrix)))
print ("List Comprehension result      : " + str(eg1_lc(matrix)))

Example 2: Removing vowels from a sentence

Aim: Take a string as input and return a string with vowels removed.

In [None]:
#Python codes with FOR-loop and LC implementations:

def eg2_for(sentence):
    vowels = 'aeiouAEIOU'  # include upper case so the vowels in 'AI' are removed too
    filtered_list = []
    for l in sentence:
        if l not in vowels:
            filtered_list.append(l)
    return ''.join(filtered_list)

def eg2_lc(sentence):
    vowels = 'aeiouAEIOU'  # same vowel set as in eg2_for
    return ''.join([ l for l in sentence if l not in vowels])

#Let’s define a sentence and test the results:

sentence = 'My name is AI Saturdays!'
print ("FOR-loop result: " + eg2_for(sentence))
print ("List Comprehension result      : " + eg2_lc(sentence))

Example 3: Dictionary Comprehension

Aim: Take two lists of the same length as input and return a dictionary with one as keys and the other as values.

In [None]:
#Python codes with FOR-loop and LC implementations:

def eg3_for(keys, values):
    dic = {}
    for i in range(len(keys)):
        dic[keys[i]] = values[i]
    return dic

def eg3_lc(keys, values):
    return { keys[i] : values[i] for i in range(len(keys)) }

#Let’s define two lists and test the results:

country = ['India', 'Pakistan', 'Nepal', 'Bhutan', 'China', 'Bangladesh']
capital = ['New Delhi', 'Islamabad','Kathmandu', 'Thimphu', 'Beijing', 'Dhaka']
print ("FOR-loop result: " + str(eg3_for(country, capital)))
print ("Dictionary Comprehension result: " + str(eg3_lc(country, capital)))
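
As a side note, pairing two lists by index can also be written with the built-in zip, which avoids `range(len(...))`. This is only a sketch of an alternative; `eg3_zip` is a made-up name and the lists are shortened for brevity. Keep in mind that zip stops at the shorter input, so lists of unequal length are silently truncated:

```python
country = ['India', 'Nepal', 'Bhutan']            # shortened sample data
capital = ['New Delhi', 'Kathmandu', 'Thimphu']

def eg3_zip(keys, values):
    # Dictionary comprehension over pairs produced by zip
    return {key: value for key, value in zip(keys, values)}

print(eg3_zip(country, capital))
print(dict(zip(country, capital)))   # the dict constructor accepts pairs directly
```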

In [None]:
#I believe things are getting pretty much self-explanatory by now. 