# Introducing Pandas Objects

At a basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices.
The three fundamental Pandas data structures are:
  * Series
  * DataFrame
  * Index
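To make the "labels rather than integer indices" idea concrete, here is a small sketch contrasting a NumPy structured array, whose rows are reached only by position, with a Series built from the same data, whose rows are reached by label. The two-state area figures are borrowed from the area example later in this section; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A NumPy structured array: fields have names, but rows are
# addressed only by their implicit integer position.
arr = np.array([('California', 423967), ('Texas', 695662)],
               dtype=[('state', 'U10'), ('area', 'i8')])
print(arr['area'][0])        # row found by position 0

# A pandas Series over the same values: rows carry explicit labels.
area = pd.Series(arr['area'], index=arr['state'])
print(area['California'])    # row found by label
```

Both lines print the same value; the difference is only in how the row is addressed.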

In [3]:
# standard NumPy/pandas imports
import numpy as np
import pandas as pd

## The Pandas Series Object
A pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:

In [5]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

The output shows that the Series wraps both a sequence of values and a sequence of indices, which can be accessed with the `values` and `index` attributes, respectively.

In [7]:
data.values

array([ 0.25,  0.5 ,  0.75,  1.  ])

In [8]:
data.index

RangeIndex(start=0, stop=4, step=1)
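In recent versions of pandas (0.24 and later), `Series.to_numpy()` is the recommended way to get the underlying NumPy array instead of `.values`. A quick check, assuming such a version is installed:

```python
import numpy as np
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])

# to_numpy() returns the values as a plain NumPy array (pandas 0.24+)
vals = data.to_numpy()
print(type(vals))            # <class 'numpy.ndarray'>

# the default index is a RangeIndex over 0..n-1
print(data.index.tolist())   # [0, 1, 2, 3]
```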

Like a NumPy array, a Series can be accessed by its index using the familiar square-bracket notation, either for a single element or for a slice:

In [9]:
data[1]

0.5

In [10]:
data[1:3]

1    0.50
2    0.75
dtype: float64

### Series as a generalized NumPy array
The essential difference between a NumPy array and a Series is the index: a NumPy array has an implicitly defined integer index used to access its values, while a Series has an explicitly defined index associated with its values.

This explicit index gives the Series additional capabilities. For example, the index need not be an integer; in the following case we use strings as the index:

In [12]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [13]:
data['b']

0.5
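Because the index is explicit, it can also be made of non-contiguous or non-sequential integers, and label-based slicing behaves differently from positional slicing. A short sketch with illustrative values; `.iloc` (pandas' purely positional indexer) is used here only to make the positional case unambiguous:

```python
import pandas as pd

# Non-contiguous integer labels: 5 is looked up as a label, not a position
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])
print(data[5])                        # 0.5

# With string labels, slicing by label includes the end point,
# while positional slicing via .iloc excludes it, as in NumPy
letters = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(letters['a':'c'].tolist())      # [0.25, 0.5, 0.75]
print(letters.iloc[0:2].tolist())     # [0.25, 0.5]
```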

### Series as a specialized dictionary
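A Series can also be thought of as a specialization of Python's built-in dictionary. A dictionary maps arbitrary keys to a set of arbitrary values; a Series maps *typed* keys to *typed* values. Just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, this type information makes a Series much more efficient than a dict for certain operations.

A Series can be constructed directly from a Python dict.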

In [26]:
# made-up population values
population_dict = {'California': 7849781947,
                  'Texas': 72401374,
                  'New York': 8491041,
                  'Florida': 91348104,
                  'Illinois': 7381941}
population = pd.Series(population_dict)
population

California    7849781947
Florida         91348104
Illinois         7381941
New York         8491041
Texas           72401374
dtype: int64

In [19]:
population['California']


7849781947
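The dictionary analogy goes further than `[]` lookup. A minimal sketch with a small hypothetical Series, showing a few dict-style operations a Series supports: membership tests check the index (the "keys"), and `keys()` and `items()` behave much like their dict counterparts.

```python
import pandas as pd

# Hypothetical Series keyed by state name (values are illustrative only)
s = pd.Series({'Texas': 3, 'Florida': 2, 'Illinois': 1})

print('Texas' in s)       # True: membership tests the index, like dict keys
print(3 in s)             # False: values are not searched
print(list(s.keys()))     # keys() returns the index labels
print(list(s.items()))    # (label, value) pairs, like dict.items()
print(s.dtype)            # all values share one dtype: int64
```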

### Other ways to construct Series objects
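
Series objects follow the general construction pattern `pd.Series(data, index=index)`, where `index` is an optional argument and `data` can be one of many entities. For example, `data` can be a list or NumPy array, in which case `index` defaults to an integer sequence: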
In [20]:
pd.Series([2,5,1])

0    2
1    5
2    1
dtype: int64

`data` can be a scalar, which is repeated to fill the specified index:

In [21]:
pd.Series(5, index=[1,2,3])

1    5
2    5
3    5
dtype: int64

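`data` can be a dict, in which case `index` defaults to the dict keys. The output below reflects older versions of pandas, which sorted the keys; since pandas 0.23 on Python 3.6+, the keys keep the dict's insertion order, so this example would produce the index 2, 1, 3.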

In [23]:
pd.Series({2:'b', 1:'a', 3:'c'})

1    a
2    b
3    c
dtype: object
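A dict can also be combined with an explicit `index`, which selects (and orders) the keys to keep. A minimal sketch, assuming a reasonably recent pandas; the last lines check the insertion-order behavior of modern versions:

```python
import pandas as pd

# With an explicit index, only the listed keys are kept, in index order
s = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2])
print(s.tolist())          # ['c', 'a']

# Index labels that are missing from the dict become NaN
t = pd.Series({'a': 1, 'b': 2}, index=['a', 'z'])
print(t.isna().tolist())   # [False, True]

# Without an index, pandas 0.23+ on Python 3.6+ keeps insertion order
u = pd.Series({2: 'b', 1: 'a', 3: 'c'})
print(u.index.tolist())    # [2, 1, 3]
```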

## The Pandas DataFrame Object
The next fundamental data structure is the Pandas DataFrame. Like the Series, it can be thought of either as a generalization of a NumPy array or as a specialization of a Python dictionary.

### DataFrame as a generalized NumPy array
If a Series is an analog of a one-dimensional array with flexible indices, a DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible column names. Just as a two-dimensional array can be thought of as a sequence of aligned one-dimensional columns, a DataFrame can be thought of as a sequence of aligned Series objects. Here, "aligned" means that they share the same index.

In [24]:
# construct a new Series of state areas (in square kilometers)
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area

California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
dtype: int64

Now that we have a population Series and an area Series, we can combine them into a two-dimensional DataFrame.

In [25]:
states = pd.DataFrame({'population': population, 'area': area})
states

|            | area   | population |
|------------|--------|------------|
| California | 423967 | 7849781947 |
| Florida    | 170312 | 91348104   |
| Illinois   | 149995 | 7381941    |
| New York   | 141297 | 8491041    |
| Texas      | 695662 | 72401374   |
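The DataFrame above lines up neatly because `population` and `area` share the same index. A small sketch with two hypothetical Series whose indexes only partly overlap shows what alignment does otherwise: the row index becomes the union of the labels, and any label missing from a Series appears as NaN in that column.

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap
a = pd.Series([1, 2], index=['x', 'y'])
b = pd.Series([10, 20], index=['y', 'z'])

df = pd.DataFrame({'a': a, 'b': b})
print(df)

# 'z' is missing from a and 'x' is missing from b, so those cells are NaN;
# the NaNs force both columns to a floating-point dtype
print(df.loc['y', 'a'], df.loc['y', 'b'])   # 2.0 10.0
```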


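Like a Series, a DataFrame has an `index` attribute; it also has a `columns` attribute:
 * `index` holds the row labels
 * `columns` holds the column labels (also an `Index` object)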

In [28]:
states.index

Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')

In [29]:
states.columns

Index(['area', 'population'], dtype='object')

### DataFrames as specialized dictionaries
A DataFrame can also be thought of as a specialization of Python's built-in dict. Where a dict maps a key to a value, a DataFrame maps a column name to a Series of column data. For example, indexing the DataFrame with the column name 'area' returns the Series object holding the area values:

In [30]:
states['area']

California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
Name: area, dtype: int64

In [31]:
states['population']

California    7849781947
Florida         91348104
Illinois         7381941
New York         8491041
Texas           72401374
Name: population, dtype: int64
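One consequence of the dictionary view is worth spelling out: for a DataFrame, the "keys" are the column labels, not the row labels, so `in` and `keys()` refer to columns. A minimal sketch with a small hypothetical DataFrame, also showing that assigning to a new key adds a column much like adding a dict entry:

```python
import pandas as pd

# A small hypothetical DataFrame (values are illustrative only)
df = pd.DataFrame({'area': [10, 20], 'population': [100, 400]},
                  index=['A', 'B'])

print('area' in df)     # True: membership checks column labels
print('A' in df)        # False: row labels are not the dict "keys"
print(list(df.keys()))  # ['area', 'population']

# Assigning to a new key adds a new column
df['density'] = df['population'] / df['area']
print(df['density'].tolist())  # [10.0, 20.0]
```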

### Constructing DataFrame objects
DataFrames can be constructed in a variety of ways.

**From a single Series object**

A DataFrame is a collection of Series objects, so a single-column DataFrame can be constructed from a single Series:

In [32]:
pd.DataFrame(population, columns=['population'])

|            | population |
|------------|------------|
| California | 7849781947 |
| Florida    | 91348104   |
| Illinois   | 7381941    |
| New York   | 8491041    |
| Texas      | 72401374   |
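If the Series already carries a `name`, the `columns` argument is not needed: `Series.to_frame()` (or `pd.DataFrame(s)`) uses that name as the column label. A sketch with a hypothetical named Series:

```python
import pandas as pd

# A hypothetical named Series; its name becomes the column label
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='population')

df = s.to_frame()
print(df.columns.tolist())   # ['population']
print(df.shape)              # (3, 1)
```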
