# DataFrame
A *DataFrame* represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).
- It has both a row index and a column index.
- It can be thought of as a dict of Series that all share the same index.
- It stores its data internally in a two-dimensional format.
- If no index is passed, a DataFrame is automatically assigned a default integer index starting at 0.
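
Here is a minimal sketch of the points above. It uses a toy frame whose column names `a` and `b` are made up for illustration. Each Series in the dict becomes a column, and the default index is a `RangeIndex`. Selecting a column returns a Series.

```python
import pandas as pd

# Each dict key becomes a column label; each Series becomes a column
frame = pd.DataFrame({
    "a": pd.Series([1, 2, 3]),
    "b": pd.Series(["x", "y", "z"]),
})

# No index was passed, so pandas assigns a default integer index
print(frame.index)      # RangeIndex(start=0, stop=3, step=1)

# Columns can hold different value types
print(frame.dtypes)

# Selecting one column gives back a Series
print(type(frame["a"]))
```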


## 1. Creating a DataFrame
One of the most common ways to create a DataFrame is from a Python *dictionary* of equal-length lists or NumPy arrays.
`DataFrame(DICTIONARY_NAME)` is the most commonly used form.

In [1]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame

In [2]:
# dict of data 
data = {'state' : ['ohio', 'ohio', 'ohio', 'nevada', 'nevada'],
       'year' : [2000, 2001, 2002, 2000, 2001],
       'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

In [3]:
# create dataframe
frame = DataFrame(data)

In [4]:
# get frame
frame

|   | state | year | pop |
|---|-------|------|-----|
| 0 | ohio | 2000 | 1.5 |
| 1 | ohio | 2001 | 1.7 |
| 2 | ohio | 2002 | 3.6 |
| 3 | nevada | 2000 | 2.4 |
| 4 | nevada | 2001 | 2.9 |
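
The lists in the dictionary have to be the same length because each position becomes one row. The sketch below uses shortened versions of the `data` lists above. Equal-length lists and arrays work, but unequal-length lists make pandas raise a `ValueError`. The exact message may vary by pandas version.

```python
import numpy as np
import pandas as pd

# Equal-length list and NumPy array: position i of each becomes row i
ok = pd.DataFrame({"state": ["ohio", "nevada"],
                   "pop": np.array([1.5, 2.4])})
print(ok.shape)  # (2, 2)

# A 2-item list and a 1-item list cannot be lined up into rows
raised = False
try:
    pd.DataFrame({"state": ["ohio", "nevada"], "year": [2000]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```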


## 2. Specify a sequence of columns 
The `columns=` argument can be used to specify the order of the columns when creating a DataFrame.

In [5]:
frame = DataFrame(data, columns=['year', 'state', 'pop']) 
frame

|   | year | state | pop |
|---|------|-------|-----|
| 0 | 2000 | ohio | 1.5 |
| 1 | 2001 | ohio | 1.7 |
| 2 | 2002 | ohio | 3.6 |
| 3 | 2000 | nevada | 2.4 |
| 4 | 2001 | nevada | 2.9 |
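
When the data is a dict, `columns=` also acts as a selection, so keys that are not listed are left out. Here is a short sketch that reuses the same `data` dictionary:

```python
import pandas as pd

data = {"state": ["ohio", "ohio", "ohio", "nevada", "nevada"],
        "year": [2000, 2001, 2002, 2000, 2001],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}

# Only the listed keys are used, in the listed order; "pop" is left out
subset = pd.DataFrame(data, columns=["year", "state"])
print(list(subset.columns))  # ['year', 'state']
```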


## 3. Specify Index
The `index=` argument can be used to specify the index (the row labels) of the DataFrame.

In [6]:
# specify user-defined index in dataframe
frame2 = DataFrame(data, columns=['state', 'pop', 'year'], index=['one', 'two', 'three', 'four', 'five'])
frame2

|   | state | pop | year |
|---|-------|-----|------|
| one | ohio | 1.5 | 2000 |
| two | ohio | 1.7 | 2001 |
| three | ohio | 3.6 | 2002 |
| four | nevada | 2.4 | 2000 |
| five | nevada | 2.9 | 2001 |
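
The `index=` list needs exactly one label per row. The sketch below uses the same `data` dictionary to show what happens when the lengths disagree. The exact error message may differ between pandas versions.

```python
import pandas as pd

data = {"state": ["ohio", "ohio", "ohio", "nevada", "nevada"],
        "year": [2000, 2001, 2002, 2000, 2001],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}

# Five rows of data, five index labels: this works
frame2 = pd.DataFrame(data, index=["one", "two", "three", "four", "five"])
print(frame2.index.tolist())

# Five rows of data but only two labels: pandas raises a ValueError
raised = False
try:
    pd.DataFrame(data, index=["one", "two"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```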


### NOTE: 
***If we pass a column that isn't in the data, it will appear in the result filled with NaN (missing) values.***

In [7]:
frame2 = DataFrame(data, columns=['state', 'pop', 'year', 'debt'], index=['one', 'two', 'three', 'four', 'five'])
frame2

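|   | state | pop | year | debt |
|---|-------|-----|------|------|
| one | ohio | 1.5 | 2000 | NaN |
| two | ohio | 1.7 | 2001 | NaN |
| three | ohio | 3.6 | 2002 | NaN |
| four | nevada | 2.4 | 2000 | NaN |
| five | nevada | 2.9 | 2001 | NaN |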
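
You can confirm that the extra column holds only missing values with `isna()`. Here is a sketch that rebuilds the same `frame2`:

```python
import pandas as pd

data = {"state": ["ohio", "ohio", "ohio", "nevada", "nevada"],
        "year": [2000, 2001, 2002, 2000, 2001],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9]}

frame2 = pd.DataFrame(data, columns=["state", "pop", "year", "debt"],
                      index=["one", "two", "three", "four", "five"])

# "debt" is not a key in data, so every entry in that column is missing
print(frame2["debt"].isna().all())  # True
```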


## 4. Retrieve Columns and Index
The *columns* and *index* of a DataFrame can be retrieved using `frame_name.columns` and `frame_name.index`, respectively.

In [8]:
frame2.columns

Index(['state', 'pop', 'year', 'debt'], dtype='object')

In [9]:
frame2.index

Index(['one', 'two', 'three', 'four', 'five'], dtype='object')
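
Both `columns` and `index` return pandas `Index` objects. These support membership tests and can be converted to plain lists. Here is a sketch on a smaller, made-up version of `frame2`:

```python
import pandas as pd

frame2 = pd.DataFrame({"state": ["ohio", "nevada"], "pop": [1.5, 2.4]},
                      index=["one", "two"])

# Both attributes are pandas Index objects
print(isinstance(frame2.columns, pd.Index), isinstance(frame2.index, pd.Index))

# Membership tests check the labels
print("state" in frame2.columns)  # True
print("one" in frame2.index)      # True

# Conversion to a plain Python list
print(frame2.index.tolist())      # ['one', 'two']
```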

## 5. Retrieve a Single Column
A single column can be retrieved from a DataFrame as a Series using either:
- dict-like notation, or
- attribute-like notation.

### a. Dict-like Notations

In [10]:
frame2['state']

one        ohio
two        ohio
three      ohio
four     nevada
five     nevada
Name: state, dtype: object

### b. Attribute-like Notations 

In [11]:
frame2.year

one      2000
two      2001
three    2002
four     2000
five     2001
Name: year, dtype: int64
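
Attribute-like notation only works when the column name is a valid Python identifier. The name also must not clash with an existing DataFrame attribute or method. The `pop` column in this notebook is such a clash, because `DataFrame.pop` is a method. Here is a sketch with a trimmed-down `frame2`:

```python
import pandas as pd

frame2 = pd.DataFrame({"state": ["ohio", "nevada"], "pop": [1.5, 2.4],
                       "year": [2000, 2001]})

# Dict-like notation always returns the column
print(frame2["pop"].tolist())  # [1.5, 2.4]

# Attribute-like notation works for "year"...
print(frame2.year.tolist())    # [2000, 2001]

# ...but frame2.pop is the DataFrame.pop method, not the "pop" column
print(callable(frame2.pop))    # True
```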

## 6. Retrieve Rows
Rows can be retrieved by label with the `loc[]` indexer or by integer position with the `iloc[]` indexer.

In [12]:
frame2.loc['one'] # access rows using labels

state    ohio
pop       1.5
year     2000
debt      NaN
Name: one, dtype: object

In [13]:
frame2.iloc[0:3] # access rows using integer indexing

|   | state | pop | year | debt |
|---|-------|-----|------|------|
| one | ohio | 1.5 | 2000 | NaN |
| two | ohio | 1.7 | 2001 | NaN |
| three | ohio | 3.6 | 2002 | NaN |
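
The two indexers also slice differently. Slicing with `loc[]` includes the end label, while slicing with `iloc[]` excludes the end position. Both accept a second argument that selects columns. Here is a sketch using a trimmed-down `frame2`:

```python
import pandas as pd

frame2 = pd.DataFrame({"state": ["ohio", "ohio", "ohio", "nevada", "nevada"],
                       "pop": [1.5, 1.7, 3.6, 2.4, 2.9]},
                      index=["one", "two", "three", "four", "five"])

# Label slices with loc include the end label
print(frame2.loc["one":"three"].index.tolist())  # ['one', 'two', 'three']

# Position slices with iloc exclude the end position, as with Python lists
print(frame2.iloc[0:2].index.tolist())           # ['one', 'two']

# A second argument selects columns (by label for loc, by position for iloc)
print(frame2.loc["two", "pop"])                  # 1.7
print(frame2.iloc[3, 0])                         # nevada
```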


## 7. Modify Columns
Columns can be modified by assignment.

### a. Assign a scalar value

In [14]:
# assign scalar values to the debt column
frame2['debt'] = 16.5
frame2

|   | state | pop | year | debt |
|---|-------|-----|------|------|
| one | ohio | 1.5 | 2000 | 16.5 |
| two | ohio | 1.7 | 2001 | 16.5 |
| three | ohio | 3.6 | 2002 | 16.5 |
| four | nevada | 2.4 | 2000 | 16.5 |
| five | nevada | 2.9 | 2001 | 16.5 |


### b. Assign an array of values to a column

In [15]:
frame2['debt'] = np.arange(5)
frame2

|   | state | pop | year | debt |
|---|-------|-----|------|------|
| one | ohio | 1.5 | 2000 | 0 |
| two | ohio | 1.7 | 2001 | 1 |
| three | ohio | 3.6 | 2002 | 2 |
| four | nevada | 2.4 | 2000 | 3 |
| five | nevada | 2.9 | 2001 | 4 |
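
Hard-coding the `5` in `np.arange(5)` only works for a five-row frame. Here is a sketch that sizes the array from `len()` instead, using a made-up three-row frame:

```python
import numpy as np
import pandas as pd

frame2 = pd.DataFrame({"state": ["ohio", "ohio", "nevada"]},
                      index=["one", "two", "three"])

# A scalar is broadcast to every row
frame2["debt"] = 16.5

# Sizing the array from len(frame2) keeps it in step with the number of rows
frame2["debt"] = np.arange(len(frame2))
print(frame2["debt"].tolist())  # [0, 1, 2]
```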


## 8. Assign array, list and series to a column

### 1. Assign lists or arrays to a column
When assigning a list or array to a column, its length must match the length of the *DataFrame*.

In [16]:
frame2['debt'] = [100, 200, 300, 400, 500]
frame2

|   | state | pop | year | debt |
|---|-------|-----|------|------|
| one | ohio | 1.5 | 2000 | 100 |
| two | ohio | 1.7 | 2001 | 200 |
| three | ohio | 3.6 | 2002 | 300 |
| four | nevada | 2.4 | 2000 | 400 |
| five | nevada | 2.9 | 2001 | 500 |
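
Here is a sketch of what happens when the lengths don't match, using a made-up three-row frame. pandas refuses the assignment with a `ValueError`, and the exact message may vary by version.

```python
import pandas as pd

frame2 = pd.DataFrame({"state": ["ohio", "ohio", "nevada"]},
                      index=["one", "two", "three"])

raised = False
try:
    # Two values for three rows
    frame2["debt"] = [100, 200]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```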


### 2. Assign a Series
When a *Series* is assigned, its values are aligned by label to the *DataFrame's* index. Missing values (NaN) are inserted for any index labels that the Series doesn't contain.

In [18]:
values = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = values
frame2

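|   | state | pop | year | debt |
|---|-------|-----|------|------|
| one | ohio | 1.5 | 2000 | NaN |
| two | ohio | 1.7 | 2001 | -1.2 |
| three | ohio | 3.6 | 2002 | NaN |
| four | nevada | 2.4 | 2000 | -1.5 |
| five | nevada | 2.9 | 2001 | -1.7 |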
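
Alignment also works in the other direction: labels in the Series that are not in the DataFrame's index are ignored. Here is a sketch with made-up labels, where `six` does not exist in the frame:

```python
import pandas as pd

frame2 = pd.DataFrame({"state": ["ohio", "ohio", "nevada"]},
                      index=["one", "two", "three"])

# "six" is not in frame2's index, so its value is dropped;
# "one" is not in the Series, so it gets NaN
values = pd.Series([-1.2, -1.5, 9.9], index=["two", "three", "six"])
frame2["debt"] = values
print(frame2["debt"].tolist())  # [nan, -1.2, -1.5]
```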


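## 9. Assign values to a column that doesn't exist
Assigning to a column label that doesn't exist yet creates a new column. Here, a boolean column is created that is `True` wherever `state` equals `'ohio'`.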

In [19]:
frame2['eastern'] = frame2['state'] == 'ohio'
frame2

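|   | state | pop | year | debt | eastern |
|---|-------|-----|------|------|---------|
| one | ohio | 1.5 | 2000 | NaN | True |
| two | ohio | 1.7 | 2001 | -1.2 | True |
| three | ohio | 3.6 | 2002 | NaN | True |
| four | nevada | 2.4 | 2000 | -1.5 | False |
| five | nevada | 2.9 | 2001 | -1.7 | False |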
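
Here is a sketch of two details, using a made-up two-row frame. First, the new column is appended after the existing ones. Second, the attribute-like syntax does not create a column at all; here that is a hypothetical `frame2.western`. It only sets a plain Python attribute on the object, and pandas typically warns about this.

```python
import warnings

import pandas as pd

frame2 = pd.DataFrame({"state": ["ohio", "nevada"], "pop": [1.5, 2.4]},
                      index=["one", "two"])

# Dict-like assignment to a new label appends a column at the end
frame2["eastern"] = frame2["state"] == "ohio"
print(list(frame2.columns))         # ['state', 'pop', 'eastern']

# Attribute-like assignment does NOT add a column
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence pandas' UserWarning about this
    frame2.western = frame2["state"] == "nevada"
print("western" in frame2.columns)  # False
```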


## 10. Delete Columns
Columns can be deleted using the `del` keyword.

In [20]:
del frame2['eastern']
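
`del` modifies the DataFrame in place. The sketch below contrasts it with `DataFrame.drop(columns=...)`, which returns a new DataFrame and leaves the original unchanged. It uses a made-up two-row frame:

```python
import pandas as pd

frame2 = pd.DataFrame({"state": ["ohio", "nevada"], "pop": [1.5, 2.4],
                       "eastern": [True, False]})

# del removes the column from frame2 itself
del frame2["eastern"]
print(list(frame2.columns))   # ['state', 'pop']

# drop(columns=...) returns a new DataFrame; frame2 keeps its columns
smaller = frame2.drop(columns=["pop"])
print(list(smaller.columns))  # ['state']
print(list(frame2.columns))   # ['state', 'pop']
```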