A Pandas Series is a one-dimensional array-like object that can hold many data types, such as numbers or strings. One of the main differences between Pandas Series and NumPy ndarrays is that you can assign an index label to each element in a Pandas Series. In other words, you can name the indices of your Series anything you want. Another big difference is that a Pandas Series can hold data of different data types in the same object.
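The sketch below is a minimal illustration of both points, assuming pandas is installed. It shows two things. When no index is passed, pandas labels the elements 0, 1, 2, ... automatically. When numbers and strings are mixed, the Series falls back to the generic object dtype. The variable names numbers and mixed are illustrative only.

```python
import pandas as pd

# No index given: pandas labels the elements 0, 1, 2, ... automatically
numbers = pd.Series([30, 6, 12])
print(numbers.index.tolist())  # [0, 1, 2]

# Custom index labels, with a number and a string in the same Series
mixed = pd.Series([30, 'yes'], index=['eggs', 'milk'])
print(mixed['milk'])           # yes
print(mixed.dtype)             # object
```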

In [2]:
# pd is the standard alias
import pandas as pd

In [45]:
# A Pandas Series is a one-dimensional array-like object that can hold many data types.
# You can assign an index label to each element in the Series.
groceries = pd.Series(data=[30, 6, 'yes', 'no'], index=['eggs', 'apples', 'milk', 'bread'])
groceries

eggs       30
apples      6
milk      yes
bread      no
dtype: object

In [12]:
# View properties of the Series
groceries.shape

(4,)

In [11]:
groceries.ndim

1

In [13]:
groceries.size

4

In [15]:
groceries.index

Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')

In [16]:
groceries.values

array([30, 6, 'yes', 'no'], dtype=object)

In [18]:
'banana' in groceries

False

In [19]:
'bread' in groceries

True
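The in checks above look only at the index labels, not at the stored values. The following sketch assumes the same groceries Series. It shows that a value such as 'yes' is not found by in. To search the data, you can use Series.isin() instead.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'yes', 'no'], index=['eggs', 'apples', 'milk', 'bread'])

# `in` on a Series tests the index labels only
print('bread' in groceries)  # True  ('bread' is a label)
print('yes' in groceries)    # False ('yes' is a value, not a label)

# To test whether a value is present, search the data explicitly
print(groceries.isin(['yes']).any())  # True
```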

One great advantage of Pandas Series is that they allow us to access data in many different ways. Elements can be accessed using index labels or numerical indices inside square brackets, [ ], similar to how we access elements in NumPy ndarrays. Because we can use numerical indices, we can use positive integers to access data from the beginning of the Series and negative integers to access data from the end. Since elements can be accessed in several ways, Pandas Series provide two attributes, .loc and .iloc, to remove any ambiguity about whether we are referring to an index label or a numerical index. The attribute .loc stands for location and is used to explicitly state that we are using a labeled index. Similarly, the attribute .iloc stands for integer location and is used to explicitly state that we are using a numerical index.
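The ambiguity is easiest to see when the index labels are themselves integers that do not match the positions. In this minimal sketch, the Series named letters is a made-up example. For an integer index like this one, plain [ ] looks up labels, so .iloc is needed to reach an element by position.

```python
import pandas as pd

# Integer labels that deliberately do not match the positions
letters = pd.Series(['a', 'b', 'c'], index=[2, 1, 0])

print(letters.loc[0])   # c  (the element whose label is 0)
print(letters.iloc[0])  # a  (the element at position 0)
print(letters[0])       # c  (with an integer index, [] uses labels)
```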

In [29]:
groceries

eggs       30
apples      6
milk      yes
bread      no
dtype: object

In [22]:
groceries['eggs']

30

In [25]:
# Pass a list of index labels (note the inner square brackets) to select several elements
groceries[['milk','bread']]

milk     yes
bread     no
dtype: object

In [27]:
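# Positional access with [] on a labelled index; recent pandas versions emit a
# FutureWarning here and recommend groceries.iloc[0] instead
groceries[0]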

30

In [30]:
# Negative integers count from the end; groceries.iloc[-1] is the explicit form
groceries[-1]

'no'

In [32]:
# .loc for labelled index, .iloc for integer index
groceries.loc['eggs']

30

In [37]:
groceries.iloc[[0,3]]

eggs     30
bread    no
dtype: object

In [40]:
# Drop 'apples' out of place: returns a new Series and leaves groceries unchanged
groceries.drop('apples')

eggs      30
milk     yes
bread     no
dtype: object

In [46]:
# Drop 'apples' in place: modifies groceries itself (drop then returns None)
groceries.drop('apples', inplace=True)
groceries

eggs      30
milk     yes
bread     no
dtype: object

Just like with NumPy ndarrays, we can perform element-wise arithmetic operations on Pandas Series. In this lesson we will look at arithmetic operations between Pandas Series and single numbers. Let's create a new Pandas Series that will hold a grocery list of just fruits.

In [49]:
fruits = pd.Series([10, 6, 3],['apples','oranges','bananas'])
fruits

apples     10
oranges     6
bananas     3
dtype: int64

In [51]:
fruits * 2

apples     20
oranges    12
bananas     6
dtype: int64

In [53]:
fruits / 2

apples     5.0
oranges    3.0
bananas    1.5
dtype: float64
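The output above shows that the / operator always returns floats, even when every result is a whole number. The sketch below recreates the same fruits Series. It contrasts / with floor division //, which keeps integers and rounds down. It also applies + and - element-wise in the same way.

```python
import pandas as pd

fruits = pd.Series([10, 6, 3], ['apples', 'oranges', 'bananas'])

# True division always produces floats
print((fruits / 2).dtype)     # float64

# Floor division keeps integers (3 // 2 rounds down to 1)
halves = fruits // 2
print(halves.tolist())        # [5, 3, 1]

# Other scalar operators are applied element-wise in the same way
print((fruits + 2).tolist())  # [12, 8, 5]
print((fruits - 2).tolist())  # [8, 4, 1]
```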

You can also apply mathematical functions from NumPy, such as np.sqrt(x) or np.power(x, n), to all elements of a Pandas Series.
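Here is a minimal sketch with np.sqrt on the fruits Series, assuming NumPy and pandas are installed. NumPy functions applied this way work element by element. They return a new Series that keeps the original index labels.

```python
import numpy as np
import pandas as pd

fruits = pd.Series([10, 6, 3], ['apples', 'oranges', 'bananas'])

# np.sqrt is applied to every element and the labels are preserved
roots = np.sqrt(fruits)
print(roots)
print(roots.index.tolist())  # ['apples', 'oranges', 'bananas']
```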

In [57]:
import numpy as np
np.power(fruits, 2)

apples     100
oranges     36
bananas      9
dtype: int64

In [59]:
fruits['bananas'] + 2

5

In [62]:
fruits.iloc[0] - 2

8

In [64]:
fruits.loc[['bananas','apples']] * 2

bananas     6
apples     20
dtype: int64

You can also apply arithmetic operations to a Pandas Series of mixed data types, provided that the operation is defined for every data type in the Series; otherwise, you will get an error. Let's see what happens when we multiply a Series that mixes integers and strings by 2. In Python, multiplying a string by an integer repeats the string, so this operation works.

In [69]:
# A Series mixing integers and strings (a new name, so fruits is not overwritten)
mixed = pd.Series([1, 2, 3, 'Yes', 'No'])
print(mixed)
print(mixed * 2)

0      1
1      2
2      3
3    Yes
4     No
dtype: object
0         2
1         4
2         6
3    YesYes
4      NoNo
dtype: object
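Division shows the error case. Python does not define / between a string and an integer, so the operation fails for the whole Series. The sketch below catches the TypeError rather than letting it stop the notebook. The exact error message may differ between pandas versions.

```python
import pandas as pd

mixed = pd.Series([1, 2, 3, 'Yes', 'No'])

# Multiplication works: int * 2 and str * 2 are both defined
print((mixed * 2).tolist())  # [2, 4, 6, 'YesYes', 'NoNo']

# Division is not defined for strings, so the whole operation raises
division_failed = False
try:
    mixed / 2
except TypeError as err:
    division_failed = True
    print('TypeError:', err)
```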


Quiz

In [71]:
distance_from_sun = [149.6, 1433.5, 227.9, 108.2, 778.6]

planets = ['Earth','Saturn', 'Mars','Venus', 'Jupiter']


In [74]:
planet_distances = pd.Series(distance_from_sun,planets)
print(planet_distances)

Earth       149.6
Saturn     1433.5
Mars        227.9
Venus       108.2
Jupiter     778.6
dtype: float64


In [78]:
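# Distances are in millions of km; light travels roughly 18 million km per minute,
# so this gives the approximate time (in minutes) for sunlight to reach each planet
planet_times = planet_distances / 18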
print(planet_times)

Earth       8.311111
Saturn     79.638889
Mars       12.661111
Venus       6.011111
Jupiter    43.255556
dtype: float64


In [81]:
# Keep only the planets that sunlight reaches in under 40 minutes
planet_times[planet_times < 40]

Earth     8.311111
Mars     12.661111
Venus     6.011111
dtype: float64
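The filter above works in two steps. First, the comparison on its own produces a Boolean Series with the same labels. Second, indexing with that Boolean Series keeps only the rows marked True. The sketch below rebuilds the quiz data to show the intermediate mask, which is given the illustrative name mask.

```python
import pandas as pd

distance_from_sun = [149.6, 1433.5, 227.9, 108.2, 778.6]
planets = ['Earth', 'Saturn', 'Mars', 'Venus', 'Jupiter']
planet_times = pd.Series(distance_from_sun, planets) / 18

# Step 1: the comparison yields a Boolean Series with the same labels
mask = planet_times < 40
print(mask)

# Step 2: indexing with the mask keeps only the True rows
print(planet_times[mask].index.tolist())  # ['Earth', 'Mars', 'Venus']
```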

Pandas DataFrames are two-dimensional data structures with labeled rows and columns that can hold many data types. If you are familiar with Excel, you can think of a Pandas DataFrame as being similar to a spreadsheet. We can create Pandas DataFrames manually or by loading data from a file. In these lessons we will start by learning how to create Pandas DataFrames manually from dictionaries, and later we will see how to load data into a DataFrame from a data file.

We will start by creating a DataFrame manually from a dictionary of Pandas Series. In this case the first step is to create the dictionary of Pandas Series. After the dictionary is created we can then pass the dictionary to the pd.DataFrame() function.

We will create a dictionary that contains items purchased by two people, Alice and Bob, from an online store. Each Pandas Series will use the prices of the purchased items as its data and the names of the purchased items as its index labels.

In [91]:
items = {'Bob': pd.Series([123, 412, 34], ['bike', 'pants', 'watch']),
         'Alice': pd.Series([645, 23, 43], ['bike', 'pants', 'glasses'])}
print(items)

{'Bob': bike     123
pants    412
watch     34
dtype: int64, 'Alice': bike       645
pants       23
glasses     43
dtype: int64}


In [94]:
shopping_carts = pd.DataFrame(items)
shopping_carts

           Bob  Alice
bike     123.0  645.0
glasses    NaN   43.0
pants    412.0   23.0
watch     34.0    NaN
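The table above shows how pandas aligns the two Series. The row labels are the union of both Series' labels. A label that is missing from one Series becomes NaN in that column. Because NaN is a float, the integer prices are converted to floats. The sketch below rebuilds the same DataFrame and checks these points.

```python
import pandas as pd

items = {'Bob': pd.Series([123, 412, 34], ['bike', 'pants', 'watch']),
         'Alice': pd.Series([645, 23, 43], ['bike', 'pants', 'glasses'])}
shopping_carts = pd.DataFrame(items)

# Row labels are the union of the labels of both Series
print(sorted(shopping_carts.index))  # ['bike', 'glasses', 'pants', 'watch']

# Missing combinations are filled with NaN
print(shopping_carts.isna())

# NaN is a float, so the integer prices become floats
print(shopping_carts['Bob'].dtype)   # float64
```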


In [96]:
shopping_carts.values

array([[123., 645.],
       [ nan,  43.],
       [412.,  23.],
       [ 34.,  nan]])

In [98]:
shopping_carts.ndim

2

There might be cases when you are only interested in a subset of the data. Pandas allows us to select which data to put into our DataFrame by means of the columns and index keywords of the pd.DataFrame() function. Let's see some examples:

In [101]:
pd.DataFrame(shopping_carts, index=['bike','pants'])

         Bob  Alice
bike   123.0  645.0
pants  412.0   23.0


In [107]:
pd.DataFrame(shopping_carts, columns=['Bob'])

           Bob
bike     123.0
glasses    NaN
pants    412.0
watch     34.0
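The two keywords can also be combined in a single call to select rows and columns together. In the sketch below, alice_subset is an illustrative name. The second line shows that .loc gives the same subset when you already have a DataFrame.

```python
import pandas as pd

items = {'Bob': pd.Series([123, 412, 34], ['bike', 'pants', 'watch']),
         'Alice': pd.Series([645, 23, 43], ['bike', 'pants', 'glasses'])}
shopping_carts = pd.DataFrame(items)

# Select rows and columns at once with both keywords
alice_subset = pd.DataFrame(shopping_carts, index=['bike', 'glasses'], columns=['Alice'])
print(alice_subset)

# The same subset, taken from the existing DataFrame with .loc
print(shopping_carts.loc[['bike', 'glasses'], ['Alice']])
```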


You can also create DataFrames manually from a dictionary of lists (arrays). The procedure is the same as before: we start by creating the dictionary and then pass it to the pd.DataFrame() function. In this case, however, all the lists (arrays) in the dictionary must be of the same length. Since the dictionary we create below has no index labels, Pandas automatically uses numerical row indices when it creates the DataFrame. We can, however, assign labels to the rows by using the index keyword in the pd.DataFrame() function. Let's see an example:

In [111]:
data = {'ints':[1,2,3],'floats':[1.4,4.2,5.1]}
data

{'ints': [1, 2, 3], 'floats': [1.4, 4.2, 5.1]}

In [113]:
pd.DataFrame(data)

   ints  floats
0     1     1.4
1     2     4.2
2     3     5.1


In [115]:
pd.DataFrame(data, index=['row1','row2','row3'])

      ints  floats
row1     1     1.4
row2     2     4.2
row3     3     5.1
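The requirement that all lists have the same length is enforced when the DataFrame is built. The sketch below uses a hypothetical dictionary, bad_data, whose lists differ in length. Pandas raises a ValueError in that case, and the sketch catches it. The wording of the message may vary between versions.

```python
import pandas as pd

# Hypothetical dictionary: 'floats' has one element fewer than 'ints'
bad_data = {'ints': [1, 2, 3], 'floats': [1.4, 4.2]}

creation_failed = False
try:
    pd.DataFrame(bad_data)
except ValueError as err:
    creation_failed = True
    print('ValueError:', err)
```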
