### Pandas Objects
At the most basic level, the rows and columns of a DataFrame are identified by **labels** rather than by simple integer positions.

In [1]:
# Imports
import numpy as np
import pandas as pd

### Pandas Series Object
A `Series` is a one-dimensional array of indexed data. It can be created from a list with the `pd.Series()` constructor.

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

A `Series` wraps both a sequence of values and a sequence of indices, which are exposed through the `values` and `index` attributes.

In [3]:
# Accessing using .values and .index attributes
print(data.values)
print(data.index)

[0.25 0.5  0.75 1.  ]
RangeIndex(start=0, stop=4, step=1)


In [4]:
# Accessing items by index using Python square brackets
print(data[1])
print(data[1:3])

0.5
1    0.50
2    0.75
dtype: float64


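### Series as a NumPy array
The key difference between a `Series` and a NumPy array is the index. A NumPy array has an implicitly defined integer index, while a `Series` has an **explicitly defined** index associated with its values. The index therefore need not be an integer; strings, for example, can be used as index labels.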

In [5]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], 
                 index = ['a','b','c','d'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [6]:
# Accessing items via string indices
print(data['a'])
print(data['c':])

0.25
c    0.75
d    1.00
dtype: float64
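
Slicing by label and slicing by position treat the end point differently, which is easy to miss in the output above. The following is a minimal sketch, assuming the same four-element `Series`; the names `by_label` and `by_position` are introduced here only for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Label-based slice: the stop label 'c' is included in the result
by_label = data.loc['a':'c']

# Position-based slice: the stop position 2 is excluded, as in plain Python
by_position = data.iloc[0:2]

print(by_label)
print(by_position)
```

Using `.loc` and `.iloc` makes the intent explicit, which helps when the index itself contains integers.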


In [7]:
# Using non-sequential indices and mixed datatypes as indices
data = pd.Series([0, 1, 2, 3],  
                 index = [2, 4, 'a', 5])
data

2    0
4    1
a    2
5    3
dtype: int64
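
When labels of different types are mixed, pandas stores the index with the generic `object` dtype. The sketch below reuses the same data under the hypothetical name `mixed` and looks items up by label with `.loc`.

```python
import pandas as pd

mixed = pd.Series([0, 1, 2, 3], index=[2, 4, 'a', 5])

# A mixed int/str index falls back to the generic object dtype
print(mixed.index.dtype)

# .loc always looks up by label, regardless of the label's type
by_int_label = mixed.loc[4]
by_str_label = mixed.loc['a']
print(by_int_label, by_str_label)
```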

### Series - Special Dictionary
The `Series` object can be considered a specialized dictionary that maps typed keys to typed values. This is clearest when a `Series` is constructed from a dictionary.

In [8]:
# Dictionary to series - Keys map to indices and values map to values
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

In [9]:
# Accessing the items via dictionary keys or indices
population['California']

38332521

In [10]:
# Slicing with keys - using position or index value
print(population[1:])
print(population['Texas':]) 

Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135
dtype: int64
Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135
dtype: int64
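
The dictionary analogy extends to a few familiar dictionary-style operations. This is a small sketch on a shortened copy of the population data; the variable names `has_texas`, `labels`, and `pairs` are illustrative only.

```python
import pandas as pd

population = pd.Series({'California': 38332521,
                        'Texas': 26448193,
                        'New York': 19651127})

# Membership tests look at the index labels, like dict keys
has_texas = 'Texas' in population

# keys() returns the index; items() yields (label, value) pairs
labels = list(population.keys())
pairs = list(population.items())

print(has_texas)
print(labels)
print(pairs[0])
```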


### Creating Series object

In [11]:
# Using a list
print(pd.Series([1, 2, 3]))

# Using a scalar, which repeats to the specified index
print(pd.Series(5, index=[1, 2, 3]))
      
# Using a dictionary, in which the index defaults to the dict keys in insertion order
print(pd.Series({2: 'a',1: 'b',3: 'c'}))

# Overriding the index in dict to Series
print(pd.Series({2: 'a',1: 'b',3: 'c'}, index=[3, 2, 1]))

0    1
1    2
2    3
dtype: int64
1    5
2    5
3    5
dtype: int64
2    a
1    b
3    c
dtype: object
3    c
2    a
1    b
dtype: object
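
Passing an explicit index together with a dictionary also filters the data: only the listed keys are kept, and a label that is absent from the dictionary is filled with `NaN`. A minimal sketch, with `source`, `subset`, and `with_missing` as illustrative names:

```python
import pandas as pd

source = {2: 'a', 1: 'b', 3: 'c'}

# Only keys 3 and 2 are kept; key 1 is dropped
subset = pd.Series(source, index=[3, 2])

# Label 4 is not a key of the dictionary, so its value is NaN
with_missing = pd.Series(source, index=[3, 4])

print(subset)
print(with_missing)
```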


### DataFrame Object
A `DataFrame` can be thought of either as a generalization of a NumPy array or as a specialized Python dictionary.

- **As a generalized NumPy array**
  - `Series`: a one-dimensional array with flexible indices.
  - `DataFrame`: a two-dimensional array with flexible row indices and column names. It can be thought of as a sequence of aligned `Series` objects.

In [12]:
# Recall the population Series
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)

# Create an area Series
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)

# Create the dataframe using the 2 series
states = pd.DataFrame({'population': population, 'area': area})
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995


In [13]:
# Index attribute of dataframe
print(states.index)
# Columns attribute of dataframe
print(states.columns)

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')
Index(['population', 'area'], dtype='object')
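
"Aligned" here means that the columns share one row index. When the input `Series` do not have identical labels, pandas builds the union of their indices and fills the gaps with `NaN`. The sketch below uses made-up toy values (`left`, `right`) rather than the state data.

```python
import pandas as pd

# Toy Series with only partially overlapping labels (illustrative values)
left = pd.Series({'a': 1, 'b': 2})
right = pd.Series({'b': 20, 'c': 30})

# Rows are matched by label; labels missing from one Series become NaN
frame = pd.DataFrame({'left': left, 'right': right})
print(frame)
```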


- **DataFrame as a special dictionary**
Here the `DataFrame` maps each column name to a `Series` of column data.

In [14]:
# Print the contents of area column
print(states['area'])

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64


In a two-dimensional NumPy array, `data[0]` returns the first **row**. In a `DataFrame`, `data['col0']` returns the **column** labeled `col0`.
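
The contrast between indexing an array and indexing a `DataFrame` can be sketched as follows. The array contents and the column names `col0` and `col1` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2],
                [3, 4],
                [5, 6]])
frame = pd.DataFrame(arr, columns=['col0', 'col1'])

# Square brackets on a 2-D array select along the first axis (a row)
first_row = arr[0]

# Square brackets on a DataFrame select a column by its label
first_col = frame['col0']

# To get a row from a DataFrame by position, use .iloc
first_row_df = frame.iloc[0]

print(first_row)
print(first_col)
print(first_row_df)
```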

### Constructing a DataFrame object

In [15]:
# Construct a single-column DataFrame from a Series
print(pd.DataFrame(population, columns=['population']))

            population
California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135


In [16]:
# From a list of dictionaries
data = [{'a': i, 'b': 2 * i}
        for i in range(3)]
pd.DataFrame(data)

   a  b
0  0  0
1  1  2
2  2  4


In [17]:
# Pandas fills NaN when the keys are missing
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0
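
The `1.0` and `4.0` in the output are a side effect of the missing values: `NaN` is a floating-point value, so a column that contains it is stored as `float64`, while the complete column `b` stays `int64`. A short check, assuming the same input:

```python
import pandas as pd

frame = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Columns with a missing entry are upcast to float64; 'b' stays int64
print(frame.dtypes)

# Count the missing entries per column
print(frame.isna().sum())
```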


In [18]:
# From a dictionary of series objects
pd.DataFrame({'population': population,
              'area': area})

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995


In [19]:
# From a 2-D NumPy array
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])

        foo       bar
a  0.561810  0.014435
b  0.355357  0.829394
c  0.628289  0.329939


### Pandas Index Object
An `Index` can be thought of either as an **immutable array** or as an **ordered set**.

In [20]:
# Index as immutable array
ind = pd.Index([2, 3, 4, 5, 6])
print(ind)
ind[1] = 0  # raises TypeError: an Index cannot be modified in place

Int64Index([2, 3, 4, 5, 6], dtype='int64')


TypeError: Index does not support mutable operations
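
Apart from rejecting assignment, an `Index` behaves much like a read-only array: it supports positional indexing and slicing and exposes array-style attributes. The sketch below also catches the `TypeError` so that the cell runs to completion; the flag `mutated` is an illustrative name.

```python
import pandas as pd

ind = pd.Index([2, 3, 4, 5, 6])

# Reading works like an array: single items and slices
print(ind[1])
print(ind[::2])

# Array-style attributes
print(ind.size, ind.shape, ind.ndim, ind.dtype)

# Writing is rejected with a TypeError
try:
    ind[1] = 0
    mutated = True
except TypeError as err:
    mutated = False
    print(err)
```

Immutability is what makes it safe to share one index between several `Series` and `DataFrame` objects.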

In [21]:
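# Index as an ordered set
# We can apply set operations: intersection, union, difference, and symmetric difference.
# The set methods are used here because the &, | and ^ operators
# no longer perform set operations on an Index in recent pandas versions.
ind1 = pd.Index([1, 3, 4, 5, 10])
ind2 = pd.Index([2, 3, 4, 5, 11])

# intersection
print(ind1.intersection(ind2))

# union
print(ind1.union(ind2))

# symmetric difference
print(ind1.symmetric_difference(ind2))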

Int64Index([3, 4, 5], dtype='int64')
Int64Index([1, 2, 3, 4, 5, 10, 11], dtype='int64')
Int64Index([1, 2, 10, 11], dtype='int64')
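
The comment in the cell above also mentions difference, which is not shown there. A minimal sketch with the same two indices, using the `difference` method (the result names are illustrative):

```python
import pandas as pd

ind1 = pd.Index([1, 3, 4, 5, 10])
ind2 = pd.Index([2, 3, 4, 5, 11])

# Labels that appear in ind1 but not in ind2
only_in_first = ind1.difference(ind2)

# The operation is not symmetric: swapping the operands changes the result
only_in_second = ind2.difference(ind1)

print(only_in_first)
print(only_in_second)
```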


---