# DataFrame
*This notebook covers basic DataFrame manipulation*

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

You can think of it as a spreadsheet or SQL table,
 - or as a dict of Series objects
 - I'll say it again:
  - **you can think of a DataFrame as a dict of Series objects**
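
To make the dict analogy concrete, here is a minimal sketch (the frame and the variable names are just for illustration). It shows that column access, membership tests, and column assignment all behave much like operations on a dict whose values are Series:

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
})

# Looking up a column by key returns a Series, like a dict lookup.
col = df['one']
is_series = isinstance(col, pd.Series)

# Membership tests check the column labels, like keys in a dict.
has_one = 'one' in df

# Assigning to a new key adds a column, again like a dict.
# Row 'd' has no value in 'one', so the product there is NaN.
df['three'] = df['one'] * df['two']
```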

The DataFrame is the most commonly used pandas object.

Like Series, DataFrame accepts many different kinds of input:
 - Dict of 1D ndarrays, lists, dicts, or Series
 - 2-D numpy.ndarray
 - Structured or record ndarray
 - A ```Series```
 - Another DataFrame
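
The dict-based inputs are covered further down, so here is a small sketch of three of the other input kinds from the list: a 2-D ndarray, a single Series, and another DataFrame. The data and labels are made up for illustration:

```python
import numpy as np
import pandas as pd

# From a 2-D ndarray: rows and columns get default integer labels
# unless index / columns are supplied.
arr = np.arange(6).reshape(2, 3)
df_arr = pd.DataFrame(arr, columns=['x', 'y', 'z'])

# From a single Series: the result has one column, named after the Series.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='value')
df_s = pd.DataFrame(s)

# From another DataFrame: passing columns selects and reorders them.
df_sub = pd.DataFrame(df_arr, columns=['z', 'x'])
```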
 
Along with the data, you can optionally pass these arguments:
 - **index** (row labels)
 - **columns** (column labels)
 
If you pass an index and / or columns, you are guaranteeing the index and / or columns of the resulting DataFrame.
 - Thus, a dict of Series plus a specific index will discard all data not matching up to the passed index.

If axis labels are not passed, they will be constructed from the input data based on common-sense rules.

#### From dict of Series or dicts:

The result **index** will be the **union** of the indexes of the various Series.
 - If no columns are passed, the columns will be the ordered list of dict keys (current pandas keeps the dict's insertion order; older versions sorted the keys).

In [6]:
import pandas as pd

In [8]:
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

print(df)

   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0


In [9]:
pd.DataFrame(d, index=['d', 'b', 'a'])

   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0


In [10]:
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN


Note that when you make a DataFrame from a dict of Series objects, the two optional label arguments are:
 - ```index=[row1, row2, row3]```
 - ```columns=['column1', 'column2']```

Passing these arguments selects rows and columns from the dict of Series by label and arranges them into a DataFrame. A label with no matching data, such as the column ```'three'``` above, is filled with NaN.

The advantage is that the data stays tied to its labels, so operations between objects line up values by label rather than by position.
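
A small sketch of what being tied to labels means in practice. The two frames below are hypothetical; the point is that arithmetic matches rows by index label, and labels present on only one side give NaN:

```python
import pandas as pd

left = pd.DataFrame({'x': [1., 2., 3.]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'x': [10., 20., 30.]}, index=['c', 'b', 'd'])

# Rows are matched by label, not by position:
# 'b' pairs 2.0 with 20.0 and 'c' pairs 3.0 with 10.0.
# 'a' and 'd' each exist on one side only, so their result is NaN.
total = left + right
```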

Note: when a particular set of columns is passed along with a dict of data, the passed columns override the keys in the dict.

The row and column labels can be accessed through the ```index``` and ```columns``` attributes:

In [11]:
df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [12]:
df.columns

Index(['one', 'two'], dtype='object')

#### From dict of ndarrays / lists:

The ndarrays must all be the same length.

If an index is passed, it must also be the same length as the arrays.
 - If no index is passed, the index will be ```range(n)```, where ```n``` is the array length.
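
A quick sketch of both rules, reusing the same kind of dict of lists as the cells below. It checks the default index and then shows that an index of the wrong length is rejected with a ```ValueError```:

```python
import pandas as pd

d = {'one': [1., 2., 3., 4.],
     'two': [4., 3., 2., 1.]}

# With no index passed, the row labels are 0..n-1.
default_index = list(pd.DataFrame(d).index)

# An index whose length differs from the lists raises ValueError.
try:
    pd.DataFrame(d, index=['a', 'b', 'c'])
    error = None
except ValueError as exc:
    error = exc
```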

In [13]:
d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}

pd.DataFrame(d)

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0


In [14]:
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0


#### From structured or record array

(will check this out later when I learn numpy)
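
In the meantime, here is a minimal sketch of this case, assuming a small NumPy structured array with three named fields (the field names and values are made up). Each field becomes a column:

```python
import numpy as np
import pandas as pd

# A structured array: every element is a record with named, typed fields.
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'U10')])
data[:] = [(1, 2.0, 'Hello'), (2, 3.0, 'World')]

# The field names become the column labels.
df_rec = pd.DataFrame(data)

# index and columns work as they do for the other input kinds.
df_rec_labeled = pd.DataFrame(data, index=['first', 'second'],
                              columns=['C', 'A', 'B'])
```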

#### From a list of dicts:
Note that this is not the preferred way to do it. Each dict becomes one row, and any key missing from a dict shows up as NaN:

In [16]:
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

pd.DataFrame(data2)

   a   b     c
0  1   2   NaN
1  5  10  20.0


In [17]:
pd.DataFrame(data2, index=['first', 'second'])

        a   b     c
first   1   2   NaN
second  5  10  20.0


In [19]:
pd.DataFrame(data2, columns=['a', 'b'])

   a   b
0  1   2
1  5  10


To be continued...

I am about 1/3 of the way down this page [pandas dataframes](https://pandas.pydata.org/pandas-docs/stable/dsintro.html)