## DataFrame

* While the ***Series*** class is the building block of data structures in pandas, the DataFrame is the workhorse.
* DataFrames collect multiple Series in the same way that a spreadsheet collects multiple columns of data.
* In a simple sense, a DataFrame is like a 2-dimensional NumPy array.
    * When all data is numeric and of the same type (e.g., float64), the two are virtually indistinguishable.
* However, a DataFrame is composed of Series, and each Series has its own data type. Therefore, not all DataFrames can be represented as homogeneous NumPy arrays.
* A number of methods are available to initialize a DataFrame.
* The simplest is from a homogeneous NumPy array.
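To make the contrast with NumPy concrete, here is a small sketch (the variable and column names are illustrative, not from the original) comparing a homogeneous DataFrame with a mixed-type one. It assumes a pandas version that provides `DataFrame.to_numpy()`, which has to fall back to a common `object` dtype once the columns differ.

```python
import numpy as np
import pandas as pd

# Homogeneous case: every column is float64, so the DataFrame maps back onto the same array
a = np.array([[1.0, 2.0], [3.0, 4.0]])
homogeneous = pd.DataFrame(a)
print(homogeneous.dtypes)
print(np.array_equal(homogeneous.to_numpy(), a))  # True

# Heterogeneous case: each column (a Series) keeps its own dtype
mixed = pd.DataFrame({"count": [1, 2], "label": ["x", "y"]})
print(mixed.dtypes)

# Forcing the columns into a single NumPy array needs a common dtype, here object
print(mixed.to_numpy().dtype)  # object
```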

### Method 1: From a NumPy array

```python
import numpy as np
import pandas as pd

# A homogeneous 2x2 NumPy array (all float64)
a = np.array([[1.0, 2], [3, 4]])

df = pd.DataFrame(a)

df
```

|   | 0   | 1   |
|---|-----|-----|
| 0 | 1.0 | 2.0 |
| 1 | 3.0 | 4.0 |


* Like a Series, a DataFrame contains the input data as well as row labels (the index).
* However, since a DataFrame is a collection of columns, it also contains column labels (located along the top edge).
* When no labels are provided, the integer sequence 0, 1, 2, ... is used for both rows and columns.
* Column names can be passed with the `columns` keyword argument or set later by assigning to the `columns` attribute; row labels work the same way through `index`.
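A quick way to check the default labels described above is to inspect the `index` and `columns` attributes of a DataFrame built without labels. The sketch below also replaces the row labels after construction and shows that a label list of the wrong length is rejected; the names used are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2], [3, 4]]))

# With no labels supplied, rows and columns both default to 0, 1, ...
print(list(df.index))    # [0, 1]
print(list(df.columns))  # [0, 1]

# Row labels can be replaced after construction, just like column labels
df.index = ["Anne", "Barry"]

# The new labels must match the axis length; a mismatch raises ValueError
try:
    df.columns = ["only_one_name"]
except ValueError as err:
    print("ValueError:", err)
```

Because the failed assignment raises before anything is changed, the columns keep their default labels.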


```python
df = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])
df
```

|   | a | b |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 3 | 4 |


```python
df = pd.DataFrame(np.array([[1, 2], [3, 4]]))
df.columns = ["dogs", "cats"]
df
```

|   | dogs | cats |
|---|------|------|
| 0 | 1    | 2    |
| 1 | 3    | 4    |


```python
df = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["Dogs", "Cats"], index=["Anne", "Barry"])
df
```

|       | Dogs | Cats |
|-------|------|------|
| Anne  | 1    | 2    |
| Barry | 3    | 4    |


DataFrames can also be created from NumPy structured arrays, in which each named field becomes a column.



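```python
import datetime

# Structured dtype: a Python-object field for timestamps and a float32 field for values
t = np.dtype([("datetime", "O"), ("value", "f4")])
t
```

```
dtype([('datetime', 'O'), ('value', '<f4')])
```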

```python
datetime.datetime(2017, 1, 1)
```

```
datetime.datetime(2017, 1, 1, 0, 0)
```

```python
# One zero-initialized record with the structured dtype
x = np.zeros(1, dtype=t)
x
```

```
array([(0,  0.)],
      dtype=[('datetime', 'O'), ('value', '<f4')])
```

```python
x.shape
```

```
(1,)
```

```python
# Set the "datetime" field of the first record
x[0]["datetime"] = datetime.datetime(2013, 1, 1)
x
```

```
array([(datetime.datetime(2013, 1, 1, 0, 0),  0.)],
      dtype=[('datetime', 'O'), ('value', '<f4')])
```

```python
# Set the "value" field; float32 storage cannot represent 99.99 exactly
x[0]["value"] = 99.99
x
```

```
array([(datetime.datetime(2013, 1, 1, 0, 0),  99.98999786)],
      dtype=[('datetime', 'O'), ('value', '<f4')])
```

```python
df = pd.DataFrame(x)
df
```

|   | datetime   | value     |
|---|------------|-----------|
| 0 | 2013-01-01 | 99.989998 |
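The step-by-step filling above can be condensed: a structured array can be created directly from a list of tuples, one tuple per record, and its fields are clearer to address by name than by position. A minimal sketch, reusing the field names from the example above:

```python
import datetime

import numpy as np
import pandas as pd

# Same structured dtype as above: a Python-object field and a float32 field
t = np.dtype([("datetime", "O"), ("value", "f4")])

# Each tuple is one record; its entries fill the fields in order
x = np.array([(datetime.datetime(2013, 1, 1), 99.99)], dtype=t)

# Fields can be read (or written) by name, independent of their position
print(x["value"])

# Each field becomes a DataFrame column named after the field
df = pd.DataFrame(x)
print(df.columns.tolist())  # ['datetime', 'value']
print(df["value"].dtype)    # float32
```

Name-based access such as `x["value"]` keeps working if the field order in the dtype changes, which makes it less fragile than positional indexing like `x[0][1]`.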


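A DataFrame can also be built from a dictionary of Series: each key becomes a column name, and the Series are aligned on their index labels, with missing entries filled with NaN.

```python
s1 = pd.Series(np.arange(0.0, 5))
s2 = pd.Series(np.arange(1.0, 3))
```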


```python
pd.DataFrame({"one": s1, "two": s2})
```

|   | one | two |
|---|-----|-----|
| 0 | 0.0 | 1.0 |
| 1 | 1.0 | 2.0 |
| 2 | 2.0 | NaN |
| 3 | 3.0 | NaN |
| 4 | 4.0 | NaN |


```python
s3 = pd.Series(np.arange(0.0, 3))
pd.DataFrame({"one": s1, "two": s2, "three": s3})
```

|   | one | two | three |
|---|-----|-----|-------|
| 0 | 0.0 | 1.0 | 0.0   |
| 1 | 1.0 | 2.0 | 1.0   |
| 2 | 2.0 | NaN | 2.0   |
| 3 | 3.0 | NaN | NaN   |
| 4 | 4.0 | NaN | NaN   |