# Introducing Pandas Objects

In [24]:
import numpy as np
import pandas as pd

## The Pandas Series Object

A Pandas ``Series`` is a one-dimensional array of indexed data.
It can be created from a list or array as follows:

In [25]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [26]:
data.values

array([ 0.25,  0.5 ,  0.75,  1.  ])

In [27]:
data.index

RangeIndex(start=0, stop=4, step=1)

In [28]:
data[1]

0.5

In [29]:
data[1:3]

1    0.50
2    0.75
dtype: float64

### ``Series`` as generalized NumPy array: explicit index

In [30]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [31]:
data['b']

0.5

The index need not be contiguous or sequential. For example, we can use non-contiguous or non-sequential integers:

In [32]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 5, 3, 7])
data

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

In [33]:
data[5]

0.5
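With an integer index like this one, ``data[5]`` is resolved by label rather than by position. As a minimal sketch, the explicit ``loc`` (label-based) and ``iloc`` (position-based) indexers make the distinction visible; the names ``by_label`` and ``by_position`` are introduced only for this illustration:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

by_label = data.loc[5]       # the entry whose index label is 5
by_position = data.iloc[0]   # the first entry, whatever its label is
print(by_label, by_position)
```

Spelling out the indexer avoids having to remember which rule applies to an integer index.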

### Series as specialized dictionary

In [34]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
dtype: int64

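In the pandas version that produced this output, the index is drawn from the **sorted** keys by default. Recent pandas releases instead keep the keys in the dictionary's insertion order.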

In [35]:
population['California']

38332521

``Series`` also supports array-style operations such as slicing:

In [36]:
population['California':'Illinois']

California    38332521
Florida       19552860
Illinois      12882135
dtype: int64
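The label slice above depends on the order of the index, which here is sorted. As a minimal sketch, assuming you want that ordering regardless of how the ``Series`` was built, ``sort_index()`` can be called explicitly (``population_sorted`` is a name introduced only for this illustration):

```python
import pandas as pd

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

# sort_index() returns a new Series ordered by label, whatever order
# the constructor used for the dictionary keys.
population_sorted = pd.Series(population_dict).sort_index()

# Label slices include both endpoints.
print(population_sorted['California':'Illinois'])
```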

### Constructing Series objects

```python
>>> pd.Series(data, index=index)
```
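Here ``index`` is an optional argument. When ``data`` is a list or NumPy array, ``index`` defaults to the integer sequence ``arange(len(data))``: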

In [37]:
pd.Series([2, 4, 6])

0    2
1    4
2    6
dtype: int64

``data`` can be a scalar, which is repeated to fill the specified index:

In [None]:
pd.Series(5, index=[100, 200, 300])

``data`` can be a dictionary, in which case ``index`` defaults to the dictionary keys (sorted in older pandas versions, in insertion order in recent ones):

In [None]:
pd.Series({2:'a', 1:'b', 3:'c'})

In each case, the index can be explicitly set if a different result is preferred:

In [None]:
pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])

Notice that in this case, the ``Series`` is populated only with the explicitly identified keys.
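A related case is an explicit index containing a label that is not a key of the dictionary. The sketch below, which uses the hypothetical names ``letters`` and ``partial``, suggests that such a label is kept and its value is marked as missing:

```python
import pandas as pd

letters = {2: 'a', 1: 'b', 3: 'c'}

# 4 is not a key of the dictionary, so there is no value to draw on
# for that label; pandas stores NaN there instead.
partial = pd.Series(letters, index=[3, 2, 4])
print(partial)
```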

## The Pandas DataFrame Object

### DataFrame as a generalized NumPy array

A ``DataFrame`` can be thought of as a sequence of aligned ``Series`` objects, where "aligned" means that they share the same index.

In [None]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area

In [None]:
states = pd.DataFrame({'population': population,
                       'area': area})
states

In [None]:
states.index

Additionally, the ``DataFrame`` has a ``columns`` attribute, which is an ``Index`` object holding the column labels:

In [None]:
states.columns
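To illustrate what alignment means for the columns of a ``DataFrame``, here is a small sketch in which the two ``Series`` only partly share their labels. The names ``pop_part``, ``area_part``, and ``aligned`` are hypothetical, and the subsets of states are chosen only for illustration:

```python
import pandas as pd

pop_part = pd.Series({'California': 38332521, 'Texas': 26448193})
area_part = pd.Series({'Texas': 695662, 'Florida': 170312})

# Each column is matched to the rows by label, not by position.
# Labels missing from one Series produce NaN in that column.
aligned = pd.DataFrame({'population': pop_part, 'area': area_part})
print(aligned)
```

The resulting index contains every label that appears in either ``Series``.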

### DataFrame as specialized dictionary

A ``DataFrame`` maps a column name to a ``Series`` of column data.

In [None]:
states['area']

### Constructing DataFrame objects

#### From a single Series object

In [None]:
pd.DataFrame(population, columns=['population'])

#### From a list of dicts

In [None]:
data = [{'a': i, 'b': 2 * i}
        for i in range(3)]
pd.DataFrame(data)

The dictionaries are aligned on their keys, which become the columns. Where a key is missing from a dictionary, Pandas fills in the entry with ``NaN`` (i.e., "not a number"):

In [None]:
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
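One way to check which entries were filled in is to count the missing values per column. This is a minimal sketch; ``filled`` and ``missing_per_column`` are names introduced only for this illustration:

```python
import pandas as pd

filled = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# isna() marks the NaN entries; summing counts them per column.
missing_per_column = filled.isna().sum()
print(missing_per_column)

# Columns that received a NaN are stored as floating point,
# while the complete column 'b' keeps its integer dtype.
print(filled.dtypes)
```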

#### From a dictionary of Series objects

As we saw before, a ``DataFrame`` can be constructed from a dictionary of ``Series`` objects as well:

In [None]:
pd.DataFrame({'population': population,
              'area': area})

#### From a two-dimensional NumPy array

Given a two-dimensional array of data, we can create a ``DataFrame`` with any specified column and index names. If either is omitted, an integer index is used in its place:

In [None]:
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])

#### From a NumPy structured array

We covered structured arrays in [Structured Data: NumPy's Structured Arrays](02.09-Structured-Data-NumPy.ipynb).
A Pandas ``DataFrame`` operates much like a structured array, and can be created directly from one:

In [None]:
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A

In [None]:
pd.DataFrame(A)

## The Pandas Index Object

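The ``Index`` object can be thought of either as an *immutable array* or as an *ordered multi-set* (a multi-set because ``Index`` objects may contain repeated values).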

In [None]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

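### Index as immutable array

The ``Index`` operates much like an array. For example, standard Python indexing notation retrieves values or slices:
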
In [None]:
ind[1]

In [None]:
ind[::2]

``Index`` objects also have many of the attributes familiar from NumPy arrays:

In [None]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)

However, ``Index`` objects are immutable: they cannot be modified in place, which makes them safe to share between multiple ``DataFrame``s and arrays:

In [38]:
ind[1] = 0

TypeError: Index does not support mutable operations
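Because assignment is rejected, a changed index has to be built as a new object. The sketch below, assuming ``ind`` is defined as above, catches the error and then uses the ``Index.delete`` and ``Index.insert`` methods, both of which return a new ``Index``; ``message`` and ``new_ind`` are names introduced only for this illustration:

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

message = None
try:
    ind[1] = 0                  # item assignment is not supported
except TypeError as err:
    message = str(err)
print(message)

# Build a modified copy instead: drop position 1, then put 0 there.
new_ind = ind.delete(1).insert(1, 0)
print(list(new_ind))            # the original ind is left unchanged
```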

### Index as ordered set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic.
The ``Index`` object follows many of the conventions used by Python's built-in ``set`` data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way:

In [42]:
indA = pd.Index([1, 3, 7, 5, 7, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [45]:
indB & indA # intersection

Int64Index([3, 5, 7, 7, 7], dtype='int64')

In [49]:
indB | indA  # union

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [48]:
indA ^ indB  # symmetric difference

Int64Index([1, 2, 9, 11], dtype='int64')

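These operations may also be accessed via object methods, for example ``indA.intersection(indB)``. In recent pandas versions the ``&``, ``|``, and ``^`` operators no longer perform set operations on ``Index`` objects, so the methods are the more portable choice.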
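As a sketch of the method-based form, the example below uses a duplicate-free ``indA`` (an assumption made here so that the results are easy to verify by hand; it differs from the ``indA`` defined above). The names ``inter``, ``union``, ``sym_diff``, and ``only_a`` are hypothetical:

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])     # duplicate-free variant, for illustration
indB = pd.Index([2, 3, 5, 7, 11])

inter = indA.intersection(indB)            # labels present in both
union = indA.union(indB)                   # labels present in either
sym_diff = indA.symmetric_difference(indB) # labels present in exactly one
only_a = indA.difference(indB)             # labels in indA but not in indB

print(list(inter), list(union), list(sym_diff), list(only_a))
```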
