# Introducing Pandas Objects

At a very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices.

Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are.

We will introduce these three fundamental Pandas data structures: the `Series`, `DataFrame`, and `Index`.

In [1]:
import numpy as np
import pandas as pd

## The Pandas Series Object

A Pandas `Series` is a one-dimensional array of indexed data. It can be created from a list or array as follows:

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

As we see in the output, the `Series` wraps both a sequence of values and a sequence of indices, which we can access with the `values` and `index` attributes. The `values` are simply a familiar NumPy array:

In [3]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

The `index` is an array-like object of type `pd.Index`, which we'll discuss in more detail momentarily.

In [4]:
data.index

RangeIndex(start=0, stop=4, step=1)

As with a NumPy array, data can be accessed by the associated index via the familiar Python square-bracket notation:

In [5]:
data[1]

0.5

In [6]:
data[1:]

1    0.50
2    0.75
3    1.00
dtype: float64

As we will see, though, the Pandas `Series` is much more general and flexible than the one-dimensional NumPy array that it emulates.
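
One way to see this flexibility is that the index need not be the implicit integer range. The following is a minimal sketch, assuming we pass the optional `index` argument to `pd.Series`; the variable name `labeled` is only illustrative:

```python
import pandas as pd

# Same values as before, but with an explicitly defined string index
labeled = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Items are now accessed by label rather than by integer position
print(labeled['b'])           # 0.5
print(list(labeled.index))    # ['a', 'b', 'c', 'd']
```

A NumPy array has only an implicitly defined integer index, while here the index is defined explicitly and associated with the values.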

### Series as specialized dictionary

Because a `Series` associates each value with an index label, you can think of it a bit like a specialization of a Python dictionary. A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a `Series` is a structure that maps typed keys to a set of typed values.

This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas `Series` makes it much more efficient than a Python dictionary for certain operations.
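
As a rough illustration of what the typing buys, the sketch below applies the same arithmetic to a `Series` and to a plain dictionary. The names `pop_dict`, `pop`, and `in_millions` are illustrative, and no timing claim is made here; the point is only that the `Series` operates on one typed array, while the dictionary needs an explicit loop:

```python
import pandas as pd

pop_dict = {'California': 38332521, 'Texas': 26448193}
pop = pd.Series(pop_dict)

# The Series stores its values in a single typed array
print(pop.dtype)

# Arithmetic is applied to the whole array at once...
in_millions = pop / 1e6

# ...whereas a plain dictionary needs an explicit Python-level loop
in_millions_dict = {state: n / 1e6 for state, n in pop_dict.items()}

print(in_millions['Texas'], in_millions_dict['Texas'])
```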

The `Series`-as-dictionary analogy can be made even clearer by constructing a `Series` object directly from a Python dictionary:

In [7]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 18561127,
                   'Florida': 19552860,
                   'Illinois': 12882135,}

population = pd.Series(population_dict)                   
population

California    38332521
Texas         26448193
New York      18561127
Florida       19552860
Illinois      12882135
dtype: int64

Unlike a dictionary, though, the `Series` also supports array-style operations such as slicing:

In [8]:
population['California': 'Florida']

California    38332521
Texas         26448193
New York      18561127
Florida       19552860
dtype: int64
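
Note that the slice above includes its endpoint, `'Florida'`. The sketch below contrasts this label-based behavior with position-based slicing through the `iloc` indexer, which excludes the stop position as ordinary Python slicing does (the names `by_label` and `by_position` are illustrative):

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 18561127, 'Florida': 19552860,
                        'Illinois': 12882135})

# Label-based slice: the stop label 'Florida' is included
by_label = population['California':'Florida']

# Position-based slice via iloc: the stop position 3 is excluded
by_position = population.iloc[0:3]

print(len(by_label))      # 4
print(len(by_position))   # 3
```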

### Constructing Series objects

We've already seen a few ways of constructing a Pandas `Series` from scratch; all of them are some version of the following:

In [9]:
# General form (schematic: `data` and `index` are placeholders,
# so this line is not meant to be run as-is)
pd.Series(data, index=index)

where `index` is an optional argument, and `data` can be one of many entities.
For example, `data` can be a list or NumPy array, in which case `index` defaults to an integer sequence:

In [27]:
pd.Series([2, 4, 6])

0    2
1    4
2    6
dtype: int64

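`data` can be a dictionary, in which case `index` defaults to the dictionary keys. As the output below shows, the keys keep the order in which they appear in the dictionary (older versions of Pandas sorted them instead):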

In [28]:
pd.Series({2: 'a', 1:'b', 3:'c'})

2    a
1    b
3    c
dtype: object

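In each case, the index can also be set explicitly if a different result is preferred. When an explicit `index` is passed together with a dictionary, the `Series` is populated only with the explicitly identified keys.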
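
A short sketch of an explicitly supplied index, using the same dictionary as above. The second half shows one more kind of `data`, a scalar, which is repeated to fill the given index. The names `subset` and `filled` are illustrative:

```python
import pandas as pd

# Only the keys listed in `index` are kept, in the order given
subset = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2])
print(subset.tolist())         # ['c', 'a']

# A scalar is repeated to fill the specified index
filled = pd.Series(5, index=[100, 200, 300])
print(filled.tolist())         # [5, 5, 5]
```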

## The Pandas DataFrame Object

The next fundamental structure in Pandas is the `DataFrame`. Like the `Series` object discussed in the previous section, the `DataFrame` can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary. We'll now take a look at each of these perspectives.

### DataFrame as a generalized NumPy array

If a `Series` is an analog of a one-dimensional array with flexible indices, a `DataFrame` is an analog of a two-dimensional array with both flexible row indices and flexible column names. Just as you might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you can think of a `DataFrame` as a sequence of aligned `Series` objects. Here, by "aligned" we mean that they share the same index.

To demonstrate this, let's first construct a new `Series` listing the area of each of the five states discussed in the previous section:

In [10]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141397, 'Florida': 170321, 'Illinois': 149995}
area = pd.Series(area_dict)
area

California    423967
Texas         695662
New York      141397
Florida       170321
Illinois      149995
dtype: int64

Now that we have this along with the `population` `Series` from before, we can use a dictionary to construct a single two-dimensional object containing this information:

In [11]:
states = pd.DataFrame({'population': population, 'area': area})
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      18561127  141397
Florida       19552860  170321
Illinois      12882135  149995
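
The role of alignment becomes visible when the input `Series` do not share exactly the same labels. The sketch below uses deliberately mismatched, abbreviated versions of the two `Series`; the names `area_part`, `pop_part`, and `partial` are illustrative:

```python
import pandas as pd

area_part = pd.Series({'California': 423967, 'Texas': 695662})
pop_part = pd.Series({'Texas': 26448193, 'New York': 18561127})

# Rows are matched by index label, not by position; the result covers
# the union of the labels, with NaN where a Series has no entry
partial = pd.DataFrame({'population': pop_part, 'area': area_part})
print(partial)
```

Only `'Texas'` appears in both inputs, so it is the only row with a value in both columns.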


Like the `Series` object, the `DataFrame` has an `index` attribute that gives access to the index labels:

In [12]:
states.index

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')

Additionally, the `DataFrame` has a `columns` attribute, which is an `Index` object holding the column labels:

In [18]:
states.columns

Index(['population', 'area'], dtype='object')

### DataFrame as specialized dictionary

Similarly, we can also think of a `DataFrame` as a specialization of a dictionary. Where a dictionary maps a key to a value, a `DataFrame` maps a column name to a `Series` of column data. For example, asking for the `'area'` key returns the `Series` object containing the areas we saw earlier:

In [None]:
states['area']

Notice the potential point of confusion here: in a two-dimensional NumPy array, `data[0]` will return the first *row*. For a `DataFrame`, `data['col0']` will return the first *column*. Because of this, it is probably better to think about `DataFrame`s as generalized dictionaries rather than generalized arrays, though both ways of looking at the situation can be useful.
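
The difference can be sketched with a small array and a `DataFrame` built from it. The column names `col0` and `col1` are illustrative, and `iloc` is shown as one way to retrieve a row by position:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2],
                [3, 4]])
df = pd.DataFrame(arr, columns=['col0', 'col1'])

print(arr[0])                 # first row of the array: [1 2]
print(df['col0'].tolist())    # first column of the DataFrame: [1, 3]
print(df.iloc[0].tolist())    # first row of the DataFrame, by position: [1, 2]
```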