#### Pandas Objects

Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices. 

In [1]:
import pandas as pd

### The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data. It wraps both a sequence of values and a sequence of indices, which we can access with the values and index attributes. The values are simply a NumPy array:

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [3]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

In [4]:
type(data.values)

numpy.ndarray

The index is an array-like object of type pd.Index.

In [5]:
data.index

RangeIndex(start=0, stop=4, step=1)

As with a NumPy array, data can be accessed by the associated index with the familiar Python square-bracket notation:

In [6]:
data[1]

0.5

In [7]:
data[1:3]

1    0.50
2    0.75
dtype: float64

The Pandas Series, however, is more general and flexible than the one-dimensional NumPy array.

At first glance, the Series object looks interchangeable with a one-dimensional NumPy array. The essential difference is the index: a NumPy array has an implicitly defined integer index used to access the values, whereas a Pandas Series has an explicitly defined index associated with the values.

This explicit index definition gives the Series object additional capabilities. For example, the index need not consist of integers; it can hold values of any type, such as strings:

In [23]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [9]:
data['c']

0.75

Series objects can be modified with a dictionary-like syntax. Just as you can extend a dictionary by assigning to a new key, you can extend a Series by assigning to a new index value:

In [10]:
data['e'] = 1.25
data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64
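The dictionary analogy extends beyond item assignment. As a minimal sketch (rebuilding the same `data` Series so the block runs on its own), membership tests and the `keys()` and `items()` methods behave much like their dictionary counterparts:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Membership is tested against the index labels, not the values.
has_a = 'a' in data
labels = list(data.keys())    # the index labels
pairs = list(data.items())    # (label, value) tuples

print(has_a)     # True
print(labels)    # ['a', 'b', 'c', 'd']
print(pairs[0])  # ('a', 0.25)
```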

Index values also need not be contiguous or sorted:

In [11]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 5, 3, 7])
data

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

In [12]:
data[3]

0.75
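The lookup above returns 0.75 because, with an integer index, `data[3]` is interpreted as the label 3 rather than the fourth position. This is easy to misread, so one option is to state the intent explicitly with the `loc` (label-based) and `iloc` (position-based) indexers. These indexers are not used elsewhere in this section; the sketch below only illustrates the distinction.

```python
import pandas as pd

# Same Series as above: integer labels that are neither sorted nor contiguous.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

by_label = data.loc[3]      # explicit index: the value labeled 3
by_position = data.iloc[3]  # implicit index: the fourth value

print(by_label)     # 0.75
print(by_position)  # 1.0
```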

Similar to a dictionary, a Series maps typed keys to a set of typed values.

This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas Series makes it more efficient than Python dictionaries for certain operations.
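To make the role of typing concrete, the following sketch compares the same computation on a plain dictionary and on a Series. The `prices` data is hypothetical and far too small to show any timing difference; the point is only that the Series stores its values in a single typed array, so the operation is expressed once for the whole array instead of once per item.

```python
import pandas as pd

prices = {'a': 0.25, 'b': 0.5, 'c': 0.75, 'd': 1.0}  # hypothetical data

# Plain dictionary: an explicit Python-level loop over the items.
doubled_dict = {key: value * 2 for key, value in prices.items()}

# Series: one vectorized operation on the underlying typed array.
series = pd.Series(prices)
doubled_series = series * 2

print(series.dtype)                              # float64
print(doubled_series.to_dict() == doubled_dict)  # True
```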

A Series object can be constructed directly from a Python dictionary, in which case the index is drawn from the dictionary's keys:

In [13]:
s1 = pd.Series({2: 'a', 1: 'b', 3: 'c'})
s1

2    a
1    b
3    c
dtype: object

In [14]:
s1.values

array(['a', 'b', 'c'], dtype=object)

In [15]:
s1.index

Int64Index([2, 1, 3], dtype='int64')

In [16]:
s1[2]


'a'
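When building a Series from a dictionary, the index can also be given explicitly. In that case only the listed keys are kept, in the order given. A short sketch (the name `s2` is introduced here for illustration):

```python
import pandas as pd

# Only the keys listed in `index` are kept, in the order given.
s2 = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2])

print(list(s2.index))   # [3, 2]
print(list(s2.values))  # ['c', 'a']
```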

In [17]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

In [18]:
population.values

array([38332521, 26448193, 19651127, 19552860, 12882135], dtype=int64)

In [19]:
type(population.values)

numpy.ndarray

In [20]:
population['Texas']

26448193

In [21]:
population['California':'New York']

California    38332521
Texas         26448193
New York      19651127
dtype: int64
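Note that the label-based slice above includes its endpoint, 'New York', unlike ordinary Python slicing. The sketch below contrasts the two behaviors using the `loc` and `iloc` indexers, which make the choice between label-based and position-based slicing explicit:

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127, 'Florida': 19552860,
                        'Illinois': 12882135})

by_label = population.loc['California':'New York']  # endpoint included
by_position = population.iloc[0:2]                   # endpoint excluded

print(len(by_label))     # 3
print(len(by_position))  # 2
```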

A Series builds on this dictionary-like interface and provides array-style item selection via the same basic mechanisms as NumPy arrays – that is, slices, masking, and fancy indexing.

In [22]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])

In [24]:
# slicing by implicit integer index
data[0:2]

a    0.25
b    0.50
dtype: float64

In [25]:
# masking
data[(data > 0.3) & (data < 0.8)]

b    0.50
c    0.75
dtype: float64

In [26]:
population[(population > 20000000)]

California    38332521
Texas         26448193
dtype: int64
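The cells above cover slicing and masking. Fancy indexing, the third mechanism mentioned, can be sketched as follows: passing a list of labels selects several entries at once. The variable name `selected` is introduced here for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Fancy indexing: pass a list of labels to select several entries at once.
selected = data[['a', 'd']]

print(list(selected.index))  # ['a', 'd']
print(list(selected))        # [0.25, 1.0]
```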

### The Pandas DataFrame Object

The next fundamental structure in Pandas is the DataFrame. 

Like the Series object, the DataFrame can be thought of as a generalization of a NumPy array, or as a specialization of a Python dictionary. 

A DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible column names. 

Just as you might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you can think of a DataFrame as a sequence of aligned Series objects. 

Similarly, we can also think of a DataFrame as a specialization of a dictionary. 

Where a dictionary maps a key to a value, a DataFrame maps a column name to a Series of column data. 

In [27]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
dtype: int64

Now that we have this along with the population Series from before, we can use a dictionary to construct a single two-dimensional object containing this information:

In [28]:
states = pd.DataFrame({'population': population,
                       'area': area})
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995


In [29]:
states.index

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')

In [30]:
states.columns

Index(['population', 'area'], dtype='object')

In [32]:
type(states)

pandas.core.frame.DataFrame

In [37]:
states.head(2)

            population    area
California    38332521  423967
Texas         26448193  695662


In [33]:
states.shape

(5, 2)

Like the Series object, the DataFrame has an index attribute that gives access to the index labels:

In [None]:
states.index

Additionally, the DataFrame has a columns attribute, which is an Index object holding the column labels:

In [None]:
states.columns
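The view of a DataFrame as a dictionary that maps column names to Series can be sketched as follows. The block rebuilds a two-row version of `states` so that it runs on its own; `area_column` is a name introduced here for illustration.

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193})
area = pd.Series({'California': 423967, 'Texas': 695662})
states = pd.DataFrame({'population': population, 'area': area})

# Look up a column by name, just like a dictionary key.
area_column = states['area']

print(type(area_column).__name__)  # Series
print(area_column['Texas'])        # 695662
print('area' in states)            # True: membership tests the column names
```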

A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series:

In [39]:
df3 = pd.DataFrame(population, columns=['population'])

In [40]:
type(df3)

pandas.core.frame.DataFrame
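Since a DataFrame can also be viewed as a two-dimensional array with labeled rows and columns, it can be built from a NumPy array as well. The following is a minimal sketch that reuses two rows of the figures above; the names `values` and `df` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([[38332521, 423967],
                   [26448193, 695662]])

# Row labels and column names are supplied separately from the values.
df = pd.DataFrame(values,
                  index=['California', 'Texas'],
                  columns=['population', 'area'])

print(df.shape)                 # (2, 2)
print(df.loc['Texas', 'area'])  # 695662
```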

### The Pandas Index Object

The Index object can be thought of either as an immutable array or as an ordered set:

In [None]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

In [None]:
type(ind)

Standard indexing can be used to retrieve values or slices of an Index:

In [None]:
ind[3]
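The immutability mentioned above can be checked directly. In the sketch below, reading and slicing work as they do for an array, while assigning to an element raises a `TypeError`. The flag `assignment_failed` is a name introduced here for illustration.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

print(ind[3])          # 7
print(list(ind[::2]))  # [2, 5, 11]

# Assigning to an element is rejected because an Index is immutable.
try:
    ind[1] = 0
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(assignment_failed)  # True
```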

Index objects have size, shape, ndim, and dtype attributes, similar to NumPy arrays.
To facilitate operations such as joins, Index objects also behave much like Python's built-in set: they support intersection, union, and difference.
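As a short sketch of both points, the example below inspects the array-like attributes of an Index and then applies the `intersection`, `union`, and `difference` methods. The two Index objects `ind_a` and `ind_b` are hypothetical.

```python
import pandas as pd

ind_a = pd.Index([1, 3, 5, 7, 9])
ind_b = pd.Index([2, 3, 5, 7, 11])

# Array-like attributes
print(ind_a.size, ind_a.shape, ind_a.ndim, ind_a.dtype)

# Set-like operations
common = ind_a.intersection(ind_b)  # labels present in both
combined = ind_a.union(ind_b)       # labels present in either
only_a = ind_a.difference(ind_b)    # labels in ind_a but not in ind_b

print(list(common))    # [3, 5, 7]
print(list(combined))  # [1, 2, 3, 5, 7, 9, 11]
print(list(only_a))    # [1, 9]
```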