In [41]:
import numpy as np 
import pandas as pd

## The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data.

In [42]:
data = np.linspace(0,1,5)
data = pd.Series(data[1:])  # equivalently pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

As the output shows, the Series wraps both a sequence of values and a sequence of indices, which we can access with the `values` and `index` attributes.

In [43]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

In [44]:
data.index

RangeIndex(start=0, stop=4, step=1)

Data can be accessed by the associated index via the familiar Python square-bracket notation.


In [45]:
data[1]

0.5

In [46]:
data[1:2]

1    0.5
dtype: float64

### Series as generalized NumPy array

A NumPy array has an **implicitly** defined integer index used to access the values.

A Pandas Series has an **explicitly** defined index associated with the values.

This explicit index definition gives the Series object additional capabilities.

In [47]:
# for example, the index labels can be strings instead of integers
data = pd.Series([.25, .5, .75, 1.0],
                index=list('abcd'))
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [48]:
data['b']


0.5
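Because the index is explicit, it does not have to be contiguous or sequential. The sketch below uses arbitrary integer labels chosen purely for illustration; `by_label` and `by_position` are names introduced here, not part of the notebook.

```python
import pandas as pd

# Non-contiguous integer labels: the index is whatever we declare it to be.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

by_label = data[5]          # looks up the label 5, not the sixth position
by_position = data.iloc[0]  # explicitly positional: the first element

print(by_label)     # value stored under label 5
print(by_position)  # value stored at position 0
```

Using `.iloc` makes the positional intent explicit, which avoids ambiguity whenever the labels are themselves integers.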

### Series as specialized dictionary
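A Series can be built directly from a Python dictionary. In recent versions of pandas, the index is drawn from the dictionary keys in insertion order (older versions sorted the keys).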

In [49]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

In [50]:
population = pd.Series(population_dict)
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

In [51]:
population['California']

38332521

Unlike a dictionary, though, the Series also supports array-style operations such as slicing.

In [52]:
population['Texas':'Florida']

Texas       26448193
New York    19651127
Florida     19552860
dtype: int64
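Note that the label-based slice above includes its end label (`'Florida'`), unlike ordinary Python slicing. A minimal sketch contrasting it with a positional slice, rebuilding the same `population` Series so the block runs on its own (`label_slice` and `position_slice` are illustrative names):

```python
import pandas as pd

population = pd.Series({'California': 38332521,
                        'Texas': 26448193,
                        'New York': 19651127,
                        'Florida': 19552860,
                        'Illinois': 12882135})

# Label-based slice: the end label 'Florida' is included.
label_slice = population['Texas':'Florida']

# Position-based slice: stops before position 3, as with a Python list.
position_slice = population.iloc[1:3]

print(list(label_slice.index))
print(list(position_slice.index))
```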

### Constructing Series Object
`pd.Series(data[, index=index])`

In [53]:
pd.Series([1,2,3])

0    1
1    2
2    3
dtype: int64

In [54]:
pd.Series([2,1,3],index=[100,200,300])

100    2
200    1
300    3
dtype: int64

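`data` can be a dictionary, in which case `index` defaults to the dictionary keys (in insertion order in recent versions of pandas; older versions sorted them). Passing `index` explicitly sets the order of the result: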

In [55]:
print(pd.Series({3:'c',1:'a', 2:'b'},index=[1,2,3]))


1    a
2    b
3    c
dtype: object
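Two further cases of the constructor are worth a quick sketch: `data` can be a scalar, which is repeated to fill the given index, and an explicit `index` passed alongside a dictionary selects only the listed keys. The variable names `filled` and `subset` are illustrative.

```python
import pandas as pd

# A scalar is repeated to fill the given index.
filled = pd.Series(5, index=[100, 200, 300])

# With a dict plus an explicit index, only the listed keys are kept,
# in the order given by the index.
subset = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2])

print(filled)
print(subset)
```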


## The Pandas DataFrame Object
The Pandas DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary.

### DataFrame as a generalized NumPy array

A DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible column names.

A DataFrame can also be thought of as a sequence of aligned `Series` objects. By 'aligned' we mean that they share the same index.

In [56]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
print(area)
print(population)

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
dtype: int64
California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64


In [57]:
states = pd.DataFrame({'population':population, 'area':area})
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995
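To see what alignment means in practice, the following sketch builds a DataFrame from two deliberately mismatched Series. These are truncated, hypothetical versions of `population` and `area`, used only for illustration. Rows are matched by index label, and labels missing from one Series are filled with `NaN`.

```python
import pandas as pd

# Hypothetical, shortened Series that share only the label 'Texas'.
population = pd.Series({'California': 38332521, 'Texas': 26448193})
area = pd.Series({'Texas': 695662, 'New York': 141297})

# Rows are matched by index label, not by position.
partial = pd.DataFrame({'population': population, 'area': area})
print(partial)
```

Only `'Texas'` has both values; the other rows contain `NaN` in the column whose Series lacked that label.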


Like the Series object, the DataFrame has an `index` attribute that gives access to the index labels.

In [58]:
states.index

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')

Additionally, the DataFrame has a `columns` attribute, which is an `Index` object holding the column labels.

In [59]:
states.columns

Index(['population', 'area'], dtype='object')

### DataFrame as specialized dictionary
Where a dictionary maps a key to a value, a `DataFrame` maps a column name to a `Series` of column data.

e.g. `{column_name: Series_data}`


In [60]:
states['area']

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64

Notice the potential point of confusion here: in a two-dimensional NumPy array, ``data[0]`` will return the first *row*. For a ``DataFrame``, ``data['col0']`` will return the first *column*.
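A small sketch of this difference, using a made-up 2x2 array and the hypothetical column names `col0` and `col1`:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2],
                [3, 4]])
df = pd.DataFrame(arr, columns=['col0', 'col1'])

first_row = arr[0]         # NumPy: indexing with 0 selects the first row
first_col = df['col0']     # DataFrame: indexing with a name selects a column
first_row_df = df.iloc[0]  # DataFrame: positional row access needs .iloc

print(first_row)
print(first_col.tolist())
print(first_row_df.tolist())
```

For this reason it is often more helpful to think of a DataFrame as a dictionary of columns than as a two-dimensional array.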