Pandas is a data manipulation library built on top of NumPy. It provides an efficient implementation of the DataFrame, which is essentially a multidimensional array with attached row and column labels. It also provides data operations similar to those of a spreadsheet or database.

In [2]:
import pandas as pd
pd.__version__

'1.3.4'

The Pandas Series object is a one-dimensional array of indexed data. It can be created from a list or an array.

In [7]:
data = pd.Series([1, 2, 3, 4])
data

0    1
1    2
2    3
3    4
dtype: int64

In [8]:
data.index

RangeIndex(start=0, stop=4, step=1)

In [9]:
data.values

array([1, 2, 3, 4])
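
As a small illustration of how the index relates to the values, the sketch below reads elements back out of the same Series. It assumes the default RangeIndex shown above, where labels and positions happen to coincide; the variable names `second` and `middle` are only for illustration.

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4])

# With the default RangeIndex, the label 1 is also position 1.
second = data[1]

# .iloc always selects by position; the end of the slice is excluded,
# just as with a NumPy array or a Python list.
middle = data.iloc[1:3]

print(second)
print(middle.tolist())
```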

The primary difference between a Series and a NumPy array is that a NumPy array has an implicitly defined integer index, while a Series has an explicitly defined index associated with its values. This gives a Series additional capabilities, such as allowing its index to hold values of any desired type (like strings), not just integers.

In [10]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

This makes a Series behave much like a specialized dictionary. The main difference is that a Series stores its values in a typed NumPy array, which makes it more efficient than a plain Python dictionary for many operations.
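
To make the dictionary comparison concrete, here is a minimal sketch of a few dictionary-style operations on the Series defined above. The new label `'e'` and the helper variable names are made up for this example.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])

# Membership tests look at the index labels, like dictionary keys.
has_b = 'b' in data

# keys() returns the index; items() yields (label, value) pairs.
keys = list(data.keys())
items = list(data.items())

# Assigning to a label that does not exist yet adds a new entry.
data['e'] = 1.25

print(has_b, keys)
print(items[0])
print(len(data))
```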

A Series can also be constructed directly from a dictionary, in which case the dictionary keys become the index.

In [11]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135
}
population = pd.Series(population_dict)
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

In [12]:
population['California']

38332521
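
When a Series is built from a dictionary, an explicit `index` argument can be passed to pick out and order only some of the keys. The following is a short sketch using the same population figures; the name `subset` is just for illustration.

```python
import pandas as pd

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

# Only the listed keys are kept, in the order given by `index`.
subset = pd.Series(population_dict, index=['Texas', 'California'])

print(subset)
```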

Unlike a dictionary, a Series also supports array-style operations such as slicing. When slicing by label, the end label is included in the result:

In [14]:
population['California':'New York']

California    38332521
Texas         26448193
New York      19651127
dtype: int64
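
The difference between slicing by label and slicing by position is easy to miss, so here is a brief sketch comparing the two on the same data. It uses `.iloc` for the positional slice; the names `by_label` and `by_position` are only for illustration.

```python
import pandas as pd

population = pd.Series({'California': 38332521,
                        'Texas': 26448193,
                        'New York': 19651127,
                        'Florida': 19552860,
                        'Illinois': 12882135})

# Label-based slice: the end label 'New York' is included.
by_label = population['California':'New York']

# Position-based slice: the end position 2 is excluded.
by_position = population.iloc[0:2]

print(list(by_label.index))
print(list(by_position.index))
```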

A DataFrame can be described as a two-dimensional array with flexible row indices and flexible column names. It can also be seen as a collection of aligned Pandas Series objects that share the same index.

In [16]:
area_dict = {
    'California': 423967, 
    'Texas': 695662, 
    'New York': 141297,
    'Florida': 170312, 
    'Illinois': 149995
}
area = pd.Series(area_dict)
area

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
dtype: int64

In [None]:
states = pd.DataFrame({
    'population': population,
    'area': area,
})
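
Once the DataFrame exists, its row labels, column labels, and individual columns can be inspected. The sketch below rebuilds `states` from the two Series above so that it runs on its own; the `density` column is an extra added here purely as an illustration and is not part of the original data.

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127, 'Florida': 19552860,
                        'Illinois': 12882135})
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})

# The two Series are aligned on their shared index (the state names).
states = pd.DataFrame({'population': population, 'area': area})

row_labels = list(states.index)       # state names
column_labels = list(states.columns)  # 'population' and 'area'

# Selecting a column by name returns a Series.
area_column = states['area']

# Illustrative derived column: population divided by area.
states['density'] = states['population'] / states['area']

print(row_labels)
print(column_labels)
print(states)
```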