# 1.2 Machine Learning in Python

- Unlike statistics, where models are used to *understand* data, predictive modeling focuses on building models that make the *most accurate predictions*, even at the expense of explaining why those predictions are made.
- Predictive modeling is primarily concerned with tabular data, e.g. tables of numbers like those in a spreadsheet.
- The wider field of machine learning covers data in any format.

Steps in a predictive modeling machine learning project:
1. Define the problem
2. Analyze data
3. Prepare data
4. Evaluate algorithms
5. Improve results
6. Present results
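
The six steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the iris dataset bundled with scikit-learn, the two candidate algorithms, and the names `candidates`, `scores` and `best_model` are choices made for this sketch, not part of the notes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# 1. Define the problem: predict the iris species from four measurements
#    (the dataset is an assumption made for this sketch).
X, y = load_iris(return_X_y=True)

# 2. Analyze data: check the shape and per-feature summary statistics.
print(X.shape)
print(X.mean(axis=0), X.std(axis=0))

# 3. Prepare data: hold out a test set; scaling is done inside the pipeline.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y
)

# 4. Evaluate algorithms: compare candidates with 5-fold cross-validation.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=7),
}
scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# 5. Improve results: here we only pick the best candidate;
#    parameter tuning or ensembles would go at this step.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X_train, y_train)

# 6. Present results: report accuracy on the held-out test set.
test_accuracy = best_model.score(X_test, y_test)
print(best_name, scores[best_name], test_accuracy)
```

In a real project each step would take far more work than one or two lines, but the order of operations stays the same.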

- Python is a general-purpose language that is easy to learn and emphasizes readability.
- It is well suited to interactive development.
- It has excellent library support.
- Because it is general purpose, unlike R or MATLAB, you can research and develop your models and then move them into production in the same language.

**SciPy**
- An ecosystem of Python libraries for mathematics, science and engineering.
- It underpins machine learning work in Python and includes:
  - NumPy: efficient arrays
  - Matplotlib: 2D plots
  - pandas: data structures for loading, exploring and better understanding your data

**scikit-learn**
- Built on top of NumPy and SciPy
- A library focused on ML problems
- Open source under the BSD license
- With it you can learn, develop and put your ML work into operation
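
As a rough illustration of "develop and put into operation", the sketch below fits a scikit-learn model, serializes it, and predicts with the restored copy. The toy data (y = 2x), the choice of `LinearRegression`, and the use of in-memory `pickle` are assumptions made for this example, not something the notes prescribe.

```python
import pickle

import numpy
from sklearn.linear_model import LinearRegression

# Toy training data, invented for this sketch: y is exactly 2 * x.
X = numpy.array([[1.0], [2.0], [3.0], [4.0]])
y = numpy.array([2.0, 4.0, 6.0, 8.0])

# Develop: every scikit-learn estimator exposes fit() and predict().
model = LinearRegression().fit(X, y)

# Put into operation: serialize the fitted model, then restore it
# (in practice the bytes would be written to a file or model store).
saved = pickle.dumps(model)
restored = pickle.loads(saved)

# The restored model predicts a value close to 10 for x = 5.
prediction = restored.predict(numpy.array([[5.0]]))
print(prediction)
```

Only unpickle models from sources you trust, and load them with the same scikit-learn version that saved them.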

In [None]:
# basic line plot
import matplotlib.pyplot as plt
import numpy
myarray = numpy.array([1, 2, 3])
plt.plot(myarray)
plt.xlabel('some x axis')
plt.ylabel('some y axis')
plt.show()


In [None]:
# basic scatter plot
import matplotlib.pyplot as plt
import numpy
x = numpy.array([1, 2, 3])
y = numpy.array([2, 4, 6])
plt.scatter(x,y)
plt.xlabel('some x axis')
plt.ylabel('some y axis')
plt.show()

In [5]:
# series
import numpy
import pandas
myarray = numpy.array([1, 2, 3])
rownames = ['a', 'b', 'c']
pandas.Series(myarray, index=rownames)

a    1
b    2
c    3
dtype: int32

In [6]:
# dataframe
import numpy
import pandas
myarray = numpy.array([[1, 2, 3], [4, 5, 6]])
rownames = ['a', 'b']
colnames = ['one', 'two', 'three']
pandas.DataFrame(myarray, index=rownames, columns=colnames)

   one  two  three
a    1    2      3
b    4    5      6
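
Once data is in a DataFrame, a few calls are usually enough to start exploring it. The following sketch reuses the same small array as the cell above; the variable names `df` and `column_means` are introduced here purely for illustration.

```python
import numpy
import pandas

# Same 2x3 array as in the DataFrame cell above.
myarray = numpy.array([[1, 2, 3], [4, 5, 6]])
df = pandas.DataFrame(myarray, index=['a', 'b'], columns=['one', 'two', 'three'])

# Dimensions as (rows, columns).
print(df.shape)

# Select a single value by row label and column label.
print(df.loc['a', 'two'])

# Per-column mean, returned as a Series indexed by column name.
column_means = df.mean()
print(column_means)

# Summary statistics (count, mean, std, min, quartiles, max) per column.
print(df.describe())
```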
