see: https://www.quora.com/How-does-python-pandas-go-along-with-scikit-learn-library

In [1]:
!conda list

# packages in environment at /home/user/anaconda3/envs/sklearn-pandas:
#
libgfortran               3.0                           0    defaults
mkl                       11.3.1                        0    defaults
numpy                     1.11.0                   py27_0    defaults
openssl                   1.0.2h                        0    defaults
pandas                    0.18.0              np111py27_0    defaults
pip                       8.1.1                    py27_1    defaults
python                    2.7.11                        0    defaults
python-dateutil           2.5.2                    py27_0    defaults
pytz                      2016.3                   py27_0    defaults
readline                  6.2                           2    <unknown>
scikit-learn              0.17.1              np111py27_0    defaults
scipy                     0.17.0              np111py27_2    defaults
setuptools                20.7.0                   py27_0    defaults
six             

In [2]:
import pandas as pd
import numpy as np

In [3]:
df1 = pd.DataFrame(np.array([[1,2,3,4],[5,6,7,8],[9,8,10,11],[16,45,67,88]]))

In [4]:
df1

    0   1   2   3
0   1   2   3   4
1   5   6   7   8
2   9   8  10  11
3  16  45  67  88


In [5]:
df1.index= ["A1","A2","A3","A4"]
df1

     0   1   2   3
A1   1   2   3   4
A2   5   6   7   8
A3   9   8  10  11
A4  16  45  67  88


In [6]:
df1.columns= ["X1","X2","X3","X4"]
df1

    X1  X2  X3  X4
A1   1   2   3   4
A2   5   6   7   8
A3   9   8  10  11
A4  16  45  67  88
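The two label assignments above can also be done in one step, because the `pd.DataFrame` constructor accepts `index` and `columns` arguments. This is a minimal sketch that rebuilds the same `df1`:

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 8, 10, 11], [16, 45, 67, 88]])

# Pass the row labels and column headers directly to the constructor
df1 = pd.DataFrame(data,
                   index=["A1", "A2", "A3", "A4"],
                   columns=["X1", "X2", "X3", "X4"])
print(df1)
```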


In [7]:
df1.values

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9,  8, 10, 11],
       [16, 45, 67, 88]])

In [8]:
arr1 = np.array(df1)
arr1

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9,  8, 10, 11],
       [16, 45, 67, 88]])

In [9]:
arr1.dtype

dtype('int64')

In [10]:
list(df1)

['X1', 'X2', 'X3', 'X4']

In [11]:
df1.columns

Index(['X1', 'X2', 'X3', 'X4'], dtype='object')

In [12]:
list(df1.columns)

['X1', 'X2', 'X3', 'X4']
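`df1.values` and `np.array(df1)` drop the labels. If you keep `list(df1.columns)` and `list(df1.index)`, you can rebuild a labelled DataFrame from the array later, for example from model output. This is a small sketch. The names `headers`, `row_labels` and `arr` are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 8, 10, 11], [16, 45, 67, 88]],
                   index=["A1", "A2", "A3", "A4"],
                   columns=["X1", "X2", "X3", "X4"])

# Save the labels before dropping down to a plain NumPy array
headers = list(df1.columns)
row_labels = list(df1.index)
arr = df1.values  # plain ndarray, no labels attached

# ... arr could now be passed to a scikit-learn estimator ...

# Re-attach the saved labels to get a DataFrame back
restored = pd.DataFrame(arr, index=row_labels, columns=headers)
print(restored.equals(df1))  # True
```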

How to use scikit-learn and pandas together:
- Read the data into a pandas DataFrame.
- Use native pandas features to convert text (categorical) features to numerical ones:
  * Use `pd.get_dummies` to convert categorical features into binary indicator columns (one-hot encoding).
  * Scikit-learn has its own one-hot encoder (`OneHotEncoder`). In the version used here (0.17), it only accepts integer categories (1, 2, 3 rather than 'a', 'b', 'c'). String categories are supported only from scikit-learn 0.20 onward. `pd.get_dummies` works directly on strings.
- Finally, explicitly cast the DataFrame to a NumPy array, which the scikit-learn API accepts.
- At this point you lose the feature labels (headers). That makes per-feature output, such as the `feature_importances_` attribute of a fitted tree-based model, hard to match back to feature names.
- Therefore, save the headers before casting the DataFrame to a NumPy array:
> list(DataFrame1)     # returns the column headers as a list
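The steps above can be sketched end to end as follows. This is an illustrative example, not taken from the linked answer. The toy columns (`color`, `size`, `label`) and the choice of `RandomForestClassifier` are assumptions. The code is written for current pandas and scikit-learn, not the 0.18/0.17 versions listed at the top (`drop(columns=...)`, for instance, needs pandas 0.21+).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: one text (categorical) feature, one numeric feature, a target
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green"],
    "size": [1.0, 2.5, 3.0, 2.0, 1.5, 3.5],
    "label": [0, 1, 1, 1, 0, 1],
})

# One-hot encode the text column; get_dummies accepts strings directly
features = pd.get_dummies(df.drop(columns="label"), columns=["color"])

# Save the headers BEFORE casting to NumPy, since the array carries no labels
headers = list(features)

# Explicit cast to plain NumPy arrays for the scikit-learn API
X = features.values.astype(float)
y = df["label"].values

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# feature_importances_ follows the column order of X, so zip it with the saved headers
importances = sorted(zip(headers, model.feature_importances_),
                     key=lambda pair: pair[1], reverse=True)
for name, score in importances:
    print(name, round(score, 3))
```

`get_dummies` places the untouched columns first and then one column per category (`color_blue`, `color_green`, `color_red`). That is why the headers have to be read from `features` rather than from the original `df`.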

In [13]:
dfmixed0 = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,8,10,11]])
dfmixed0

   0  1   2   3
0  1  2   3   4
1  5  6   7   8
2  9  8  10  11


In [14]:
dfmixed0.values[:,:-1]

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9,  8, 10]])
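`dfmixed0.values[:,:-1]` keeps every column except the last. That is a common way to separate a feature matrix from a target column stored in the last position. The sketch below does the same split on the DataFrame with `iloc`, so the feature headers survive until the final cast. Treating column `3` as the target is an assumption for illustration:

```python
import pandas as pd

dfmixed0 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 8, 10, 11]])

# All columns except the last are features; the last column is the target
X_df = dfmixed0.iloc[:, :-1]
y_series = dfmixed0.iloc[:, -1]

feature_names = list(X_df)   # column labels 0, 1, 2, kept for later
X = X_df.values              # same array as dfmixed0.values[:, :-1]
y = y_series.values          # the last column: 4, 8, 11
```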

In [15]:
dfmixed1 = pd.DataFrame([[1,2,3,np.nan],[5,6,7,8],[9,8,10,11]])
dfmixed1

   0  1   2     3
0  1  2   3   NaN
1  5  6   7   8.0
2  9  8  10  11.0


In [16]:
dfmixed1.values[:,:-1]

array([[  1.,   2.,   3.],
       [  5.,   6.,   7.],
       [  9.,   8.,  10.]])

In [17]:
dfmixed1.values[:,:-1].astype(np.float32)

array([[  1.,   2.,   3.],
       [  5.,   6.,   7.],
       [  9.,   8.,  10.]], dtype=float32)
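The `np.nan` in column `3` forces pandas to store that column as `float64`. `.values` then upcasts the whole array to float, which is why columns `0` to `2` print as `1.`, `2.`, ... even though they hold integers. Many scikit-learn estimators reject NaN input, so missing values usually have to be dropped or filled first. This minimal sketch uses pandas alone. Filling with the column mean is just one example strategy:

```python
import numpy as np
import pandas as pd

dfmixed1 = pd.DataFrame([[1, 2, 3, np.nan], [5, 6, 7, 8], [9, 8, 10, 11]])
print(dfmixed1.dtypes)  # columns 0-2 are int64, column 3 is float64

# Option 1: drop rows that contain any missing value
dropped = dfmixed1.dropna()

# Option 2: fill each missing value with its column mean (column 3: (8 + 11) / 2)
filled = dfmixed1.fillna(dfmixed1.mean())

X = filled.values.astype(np.float32)
print(X)
```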

In [18]:
float

float

In [19]:
dfmixed2 = pd.DataFrame([[1,2,3,'NaN'],[5,6,7,8],[9,8,10,11],['a','b','c','d']])
dfmixed2

   0  1   2    3
0  1  2   3  NaN
1  5  6   7    8
2  9  8  10   11
3  a  b   c    d


In [20]:
dfmixed2.values[:,:-1]

array([[1, 2, 3],
       [5, 6, 7],
       [9, 8, 10],
       ['a', 'b', 'c']], dtype=object)

In [21]:
# dfmixed2.values[:,:-1].astype(float)   # raises ValueError: 'a', 'b', 'c' cannot be converted to float
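With a row of letters mixed in, every column of `dfmixed2` has `object` dtype. `.values` therefore returns an object array, and casting it with `.astype(float)` fails. One way to handle this is to apply `pd.to_numeric(..., errors="coerce")` column by column. The sketch below assumes that non-numeric cells should simply become missing values. The string `'NaN'` is parsed as a real NaN along the way.

```python
import pandas as pd

dfmixed2 = pd.DataFrame([[1, 2, 3, 'NaN'], [5, 6, 7, 8], [9, 8, 10, 11], ['a', 'b', 'c', 'd']])
print(dfmixed2.dtypes)  # every column is object

# Convert each column to numbers; unparseable cells ('a', 'b', ...) become NaN
numeric = dfmixed2.apply(pd.to_numeric, errors="coerce")

# Drop rows that could not be fully converted, then cast safely
clean = numeric.dropna()
X = clean.values[:, :-1].astype(float)
print(X)
```

Dropping rows is only one option. If the letters are real categories rather than noise, encoding them with `pd.get_dummies` keeps that information instead of discarding it.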