# Python Ecosystem and Crash Course in Python and SciPy

From the book "Machine Learning Mastery With Python: Understand Your Data, Create Accurate Models and Work Projects End-To-End" by Jason Brownlee.

In [None]:
import sys
sys.version


## Python Ecosystem for Machine Learning
SciPy is an ecosystem of Python libraries for mathematics, science and engineering. It is an add-on to Python that you will need for machine learning. The SciPy ecosystem comprises the following core modules relevant to machine learning:
-  NumPy: A foundation for SciPy that allows you to efficiently work with data in arrays.
-  Matplotlib: Allows you to create 2D charts and plots from data.
-  Pandas: Tools and data structures to organize and analyze your data.

To be effective at machine learning in Python you must install and become familiar with SciPy. Specifically:
-  You will prepare your data as NumPy arrays for modeling in machine learning algorithms.
-  You will use Matplotlib (and wrappers of Matplotlib in other frameworks) to create plots and charts of your data.
-  You will use Pandas to load, explore, and better understand your data.
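
The division of labor between Pandas and NumPy described above can be sketched as follows. This is a minimal illustration, not a recipe from the book: the table is hand-made toy data, and the column names `height`, `weight`, and `label` are hypothetical.

```python
import numpy
import pandas

# Hypothetical toy data; a real project would load this from a file.
frame = pandas.DataFrame({
    'height': [1.5, 1.8, 1.7],
    'weight': [50.0, 80.0, 65.0],
    'label': [0, 1, 1],
})

# Explore the data with Pandas.
print(frame.describe())

# Convert to NumPy arrays for modeling: inputs X and output y.
X = frame[['height', 'weight']].to_numpy()
y = frame['label'].to_numpy()
print(X.shape, y.shape)
```

Keeping the labeled table in Pandas for exploration and converting only at the modeling step is one common arrangement; other workflows are possible.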


In [None]:
# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))

## Crash Course in Python and SciPy

When getting started in Python you need to know a few key details about the language syntax
to be able to read and understand Python code. This includes:
-  Assignment.
-  Flow Control.
-  Data Structures.
-  Functions.


### Assignment

In [None]:
# Strings
data = 'hello world'
print(data[0])
print(len(data))
print(data)

In [None]:
# Numbers
value = 123.1
print(value)
value = 10
print(value)

In [None]:
# Boolean
a = True
b = False
print(a, b)

In [None]:
# Multiple Assignment
a, b, c = 1, 2, 3
print(a, b, c)

### Flow Control
If-Then-Else Conditional

In [None]:
value = 99
if value == 99:
    print('That is fast')
elif value > 200:
    print('That is too fast')
else:
    print('That is safe')

For-Loop

In [None]:
# For-Loop
for i in range(10):
    print(i)

While-Loop

In [None]:
# While-Loop
i = 0
while i < 10:
    print(i)
    i += 1

### Data Structures

#### Tuple
Tuples are read-only collections of items.
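
A short sketch of what "read-only" means in practice. The variable name `mytuple` is chosen here for illustration only, mirroring the `mylist` naming used in the list example.

```python
# Define a tuple with parentheses and index it like a list.
mytuple = (1, 2, 3)
print("Zeroth Value: %d" % mytuple[0])
print("Tuple Length: %d" % len(mytuple))

# Tuples are read-only: assigning to an item raises a TypeError.
try:
    mytuple[0] = 99
except TypeError as error:
    print("Cannot modify a tuple: %s" % error)
```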

#### List
Lists use square bracket notation and can be indexed using array notation. Notice that we are using some simple printf-like functionality to combine strings and variables when printing.

In [None]:
mylist = [1, 2, 3]
# The % formatting must be applied to the string inside the print() call.
print("Zeroth Value: %d" % mylist[0])

In [None]:
mylist.append(4)
print("List Length: %d" % len(mylist))
for value in mylist:
    print(value)

#### Dictionary
Dictionaries are mappings of names to values, like key-value pairs. Note the use of the curly bracket and colon notations when defining the dictionary.

In [None]:
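mydict = {'a': 1, 'b': 2, 'c': 3}
print("A value: %d" % mydict['a'])
# Assigning to an existing key overwrites its value.
mydict['a'] = 11
print("A value: %d" % mydict['a'])
# Wrap the views in list() so they print as plain lists.
print("Keys: %s" % list(mydict.keys()))
print("Values: %s" % list(mydict.values()))
for key in mydict.keys():
    print(mydict[key])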


### Functions
The biggest gotcha with Python is whitespace: indentation determines which statements belong to a block. Ensure that you have an empty new line after indented code. The example below defines a new function to calculate the sum of two values and calls the function with two arguments.


In [None]:
# Sum function
def mysum(x, y):
    return x + y

# Test sum function
mysum(1, 3)
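
To illustrate the whitespace point, here is a small sketch showing how indentation alone decides which statements belong to a loop. The variable names `total` and `doubled` are chosen for illustration.

```python
total = 0
for i in range(3):
    total += i           # indented: runs on every iteration
    doubled = total * 2  # indented: also part of the loop body
print(total, doubled)    # not indented: runs once, after the loop
```

Moving the `print` call four spaces to the right would make it part of the loop body, so it would run three times instead of once.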


## NumPy Crash Course

NumPy provides the foundational data structures and operations for SciPy. These are arrays (ndarrays) that are efficient to define and manipulate.
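
As a rough sketch of what convenient manipulation looks like, the snippet below contrasts an element-wise operation on a plain list with the same operation on an ndarray. The names `doubled_list` and `doubled_array` are illustrative only.

```python
import numpy

values = [1, 2, 3]

# With a plain list, an element-wise operation needs an explicit loop.
doubled_list = [v * 2 for v in values]

# With an ndarray, the same operation applies to every element at once.
doubled_array = numpy.array(values) * 2

print(doubled_list)
print(doubled_array)
```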


In [None]:
# define an array
import numpy
mylist = [1, 2, 3]
myarray = numpy.array(mylist)
print(myarray)
print(myarray.shape)


In [None]:
# access values
import numpy
mylist = [[1, 2, 3], [3, 4, 5]]
myarray = numpy.array(mylist)
print(myarray)
print(myarray.shape)
# Index with [row, column]; a colon selects everything along that axis.
print("First row: %s" % myarray[0])
print("Last row: %s" % myarray[-1])
print("Specific row and col: %s" % myarray[0, 2])
print("Whole col: %s" % myarray[:, 2])


In [None]:
# arithmetic
import numpy
myarray1 = numpy.array([2, 2, 2])
myarray2 = numpy.array([3, 3, 3])
# Arithmetic on arrays is applied element by element.
print("Addition: %s" % (myarray1 + myarray2))
print("Multiplication: %s" % (myarray1 * myarray2))


## Matplotlib Crash Course
Matplotlib can be used for creating plots and charts. The library is generally used as follows:
-  Call a plotting function with some data (e.g. .plot()).
-  Call many functions to set up the properties of the plot (e.g. labels and colors).
-  Make the plot visible (e.g. .show()).

In [None]:
# basic line plot
import matplotlib.pyplot as plt
import numpy
myarray = numpy.array([1, 2, 3])
plt.plot(myarray)
plt.xlabel('some x axis')
plt.ylabel('some y axis')
plt.show()


In [None]:
# basic scatter plot
import matplotlib.pyplot as plt
import numpy
x = numpy.array([1, 2, 3])
y = numpy.array([2, 4, 6])
plt.scatter(x,y)
plt.xlabel('some x axis')
plt.ylabel('some y axis')
plt.show()

## Pandas Crash Course
Pandas provides data structures and functionality to quickly manipulate and analyze data. The key to understanding Pandas for machine learning is understanding the Series and DataFrame data structures.

#### Series
A series is a one-dimensional array of data where the rows are labeled by an index. The labels can be timestamps, but they do not have to be.

In [None]:
import numpy
import pandas
myarray = numpy.array([1, 2, 3])
rownames = ['a', 'b', 'c']
myseries = pandas.Series(myarray, index=rownames)
print(myseries)

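# Access by position with .iloc; plain integer indexing on a
# label-indexed Series is deprecated in recent pandas versions.
print(myseries.iloc[0])
# Access by label.
print(myseries['a'])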


#### DataFrame
A data frame is a two-dimensional array where the rows and the columns can be labeled.

In [None]:
# dataframe
import numpy
import pandas
myarray = numpy.array([[1, 2, 3], [4, 5, 6]])
rownames = ['a', 'b']
colnames = ['one', 'two', 'three']
mydataframe = pandas.DataFrame(myarray, index=rownames, columns=colnames)
print(mydataframe)


In [None]:
# Data can be indexed using column names.
print("method 1:")
print("one column:\n%s" % mydataframe['one'])

print("method 2:")
print("one column:\n%s" % mydataframe.one)