# Python Machine Learning Overview

This notebook gives an overview of the main modules, functions, and concepts used for machine learning in Python.

# Pandas overview

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Basics: we can work with a Series or a DataFrame.

Series: One-dimensional ndarray with axis labels.

DataFrame: Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes. It can be thought of as a dict-like container for Series objects. A loaded DataFrame typically holds the features for your learning algorithm, and its columns can contain strings (such as 'True' or 'False') as well as numbers (such as 5). Most learning algorithms need numeric input, so string values like 'True' and 'False' have to be converted to 1 and 0 before that column can be used.
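
The conversion from 'True'/'False' strings to 1/0 can be sketched as follows. This is a minimal example on made-up data: the frame `raw` and its column names `age` and `is_member` are hypothetical, and `Series.map` with a dict is only one of several ways to do this.

```python
import pandas as pd

# Hypothetical toy data: 'is_member' holds the strings 'True' and 'False'.
raw = pd.DataFrame({
    "age": [25, 32, 47],
    "is_member": ["True", "False", "True"],
})

# Map each string to an integer. Any value missing from the dict becomes NaN,
# so unexpected strings show up as missing values instead of passing silently.
raw["is_member"] = raw["is_member"].map({"True": 1, "False": 0})

print(raw)
```

After the mapping, `is_member` is a numeric column and can be used as a feature alongside `age`.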

---

Let's create a Series by passing a list of values and letting pandas create a default integer index (the labels):

In [3]:
s = pd.Series([1,3,5,np.nan,6,8])

s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

The result is a one-dimensional array of values; the numbers on the left are the index (the labels). Because the list contains np.nan, pandas stores the values as float64.
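
The index does not have to be the default integers. As a small sketch, assuming we want letter labels (the name `s_labeled` and the labels `a` to `f` are made up for illustration), the same values can be looked up by label or by position:

```python
import numpy as np
import pandas as pd

# Same values as above, but with explicit string labels instead of 0..5.
s_labeled = pd.Series([1, 3, 5, np.nan, 6, 8], index=list("abcdef"))

first_by_label = s_labeled["a"]      # label-based lookup
first_by_pos = s_labeled.iloc[0]     # position-based lookup
n_missing = s_labeled.isna().sum()   # count of NaN entries

print(first_by_label, first_by_pos, n_missing)
```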

---

We can create a DataFrame by passing a dict of objects that can each be converted to something series-like.

In [4]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

df2

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo


Each column of this DataFrame has a different type. Comparing A and C also shows two ways of building a column that holds the same values: A is a scalar that pandas broadcasts to every row, while C is an explicit Series with a chosen dtype.

In [6]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
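
To make the A versus C comparison concrete, here is a small sketch that rebuilds just those two columns (the names `cols`, `same_values`, and `a_as_f32` are made up for this example). It checks that the values match even though the dtypes differ, and shows that `astype` can bring A to the same dtype as C.

```python
import pandas as pd

# Rebuild only columns A and C from the DataFrame above.
cols = pd.DataFrame({
    "A": 1.0,                                                  # scalar, broadcast to every row
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),  # explicit Series and dtype
})

same_values = (cols["A"] == cols["C"]).all()  # element-wise comparison of the values
a_as_f32 = cols["A"].astype("float32")        # cast A to the dtype used by C

print(cols.dtypes)
print(same_values, a_as_f32.dtype)
```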

It is important to look at your dataset before you start applying machine learning algorithms. A quick way to view part of a DataFrame is the head() or tail() method, which shows the first or last 5 rows by default. Passing a number n inside the parentheses returns n rows instead.
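
As a quick illustration of head() and tail(), here is a sketch on a made-up ten-row frame (the name `demo` and its columns `x` and `y` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical ten-row frame so that head() and tail() have something to trim.
demo = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) ** 2})

first_five = demo.head()    # default: first 5 rows
last_three = demo.tail(3)   # explicit n: last 3 rows

print(first_five)
print(last_three)
```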

# Scipy Overview

# Scikit-Learn Overview