In [1]:
%autosave 0

Autosave disabled


In [2]:
# Standard imports for working with pandas
import pandas as pd
import numpy as np
from pydataset import data

A Series can be created from a variety of sources:

- From a list.
- From an array.
- From a dataframe.

In [3]:
my_list = [1, 2, 3, 4, 5]
pd.Series(my_list)

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [4]:
my_array = np.array(my_list)
pd.Series(my_array)

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [5]:
mpg = data('mpg')
mpg

     manufacturer   model  displ  year  cyl       trans  drv  cty  hwy   fl    class
  1          audi      a4    1.8  1999    4    auto(l5)    f   18   29    p  compact
  2          audi      a4    1.8  1999    4  manual(m5)    f   21   29    p  compact
  3          audi      a4    2.0  2008    4  manual(m6)    f   20   31    p  compact
  4          audi      a4    2.0  2008    4    auto(av)    f   21   30    p  compact
  5          audi      a4    2.8  1999    6    auto(l5)    f   16   26    p  compact
...           ...     ...    ...   ...  ...         ...  ...  ...  ...  ...      ...
230    volkswagen  passat    2.0  2008    4    auto(s6)    f   19   28    p  midsize
231    volkswagen  passat    2.0  2008    4  manual(m6)    f   21   29    p  midsize
232    volkswagen  passat    2.8  1999    6    auto(l5)    f   16   26    p  midsize
233    volkswagen  passat    2.8  1999    6  manual(m5)    f   18   26    p  midsize


In [6]:
mpg.manufacturer

1            audi
2            audi
3            audi
4            audi
5            audi
          ...    
230    volkswagen
231    volkswagen
232    volkswagen
233    volkswagen
234    volkswagen
Name: manufacturer, Length: 234, dtype: object

In [7]:
mpg_man = mpg['manufacturer']
mpg_man

1            audi
2            audi
3            audi
4            audi
5            audi
          ...    
230    volkswagen
231    volkswagen
232    volkswagen
233    volkswagen
234    volkswagen
Name: manufacturer, Length: 234, dtype: object
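
Dot access and bracket access to a DataFrame column return the same Series. Here is a minimal sketch of that, using a small hand-built DataFrame so it runs without pydataset. The `cars` frame and its values are made up for illustration and are not part of the mpg dataset.

```python
import pandas as pd

# Hypothetical stand-in for the mpg dataset, small enough to read at a glance.
cars = pd.DataFrame({
    'manufacturer': ['audi', 'audi', 'volkswagen'],
    'cty': [18, 21, 19],
})

by_attribute = cars.manufacturer     # attribute (dot) access
by_bracket = cars['manufacturer']    # bracket access

# Both are Series objects holding the same values.
print(type(by_bracket))
print(by_attribute.equals(by_bracket))
```

Bracket access also works for column names that contain spaces or that are Python keywords (such as the `class` column in mpg), which dot access cannot handle.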

pandas supports common data types, including integers, floats, booleans, and strings.

We can use the .astype() method to cast the values in a Series to a different data type.

In [8]:
pd.Series(my_list).astype('str')

0    1
1    2
2    3
3    4
4    5
dtype: object
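
The printed output of a cast Series can look identical to the original, so it helps to see how the values behave afterwards. A small sketch, with variable names chosen here for illustration:

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5])

as_float = numbers.astype('float')   # 1 becomes 1.0
as_text = numbers.astype('str')      # 1 becomes '1'

# Integers add, while strings concatenate.
doubled_numbers = numbers + numbers
doubled_text = as_text + as_text

print(as_float.dtype)
print(doubled_numbers.tolist())
print(doubled_text.tolist())
```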

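We can perform vectorized operations with pandas, just like numpy! An operation or comparison is applied to every element at once, and a boolean result can be used as a mask to filter the Series.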

In [9]:
my_series = pd.Series(my_list)
my_series ** 2

0     1
1     4
2     9
3    16
4    25
dtype: int64

In [10]:
my_series <= 3

0     True
1     True
2     True
3    False
4    False
dtype: bool

In [11]:
my_series[my_series <= 3]

0    1
1    2
2    3
dtype: int64
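
Boolean masks can also be combined before filtering. The sketch below is one way to do it, assuming a fresh Series named `values` (a name introduced here, not taken from the cells above):

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# Arithmetic applies to every element at once.
squared = values ** 2

# Combine conditions with & (and) or | (or); each condition needs parentheses.
is_middle = (values >= 2) & (values <= 4)
middle_values = values[is_middle]

# ~ inverts a mask.
outer_values = values[~is_middle]

print(middle_values.tolist())
print(outer_values.tolist())
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.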

We can return information about our series using attributes.

Common attributes include:
- .index
- .values
- .dtype
- .name
- .size
- .shape

In [12]:
my_series.index

RangeIndex(start=0, stop=5, step=1)

In [13]:
my_series.values

array([1, 2, 3, 4, 5])

In [14]:
my_series.dtype

dtype('int64')

In [15]:
my_series.name = 'my_name'
my_series.name

'my_name'

In [16]:
my_series.size

5

In [17]:
my_series.shape

(5,)
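
The same attributes can be read from a Series built with an explicit index and name, which makes it easier to see what each one reports. The `scores` Series and its labels are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical Series with string labels instead of the default RangeIndex.
scores = pd.Series([90, 85, 70], index=['a', 'b', 'c'], name='scores')

print(scores.index.tolist())   # index labels
print(scores.values)           # underlying NumPy array
print(scores.dtype)            # data type of the values
print(scores.name)             # name given at construction
print(scores.size)             # number of elements
print(scores.shape)            # one-element tuple for a Series
```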

There are a number of useful series methods as well.

- .head() and .tail() return the first/last 5 rows by default; pass a number to change how many
- .sample() takes a random sample from the series
- .value_counts() counts the occurrences of each unique value, much like a group by; pass normalize=True to get proportions instead of counts

In [18]:
my_series.head(2)

0    1
1    2
Name: my_name, dtype: int64

In [19]:
my_series.tail(2)

3    4
4    5
Name: my_name, dtype: int64

In [20]:
my_series.sample(2)

1    2
3    4
Name: my_name, dtype: int64

In [21]:
mpg_man.value_counts(normalize=True)

dodge         0.158120
toyota        0.145299
volkswagen    0.115385
ford          0.106838
chevrolet     0.081197
audi          0.076923
hyundai       0.059829
subaru        0.059829
nissan        0.055556
honda         0.038462
jeep          0.034188
pontiac       0.021368
land rover    0.017094
mercury       0.017094
lincoln       0.012821
Name: manufacturer, dtype: float64
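
To compare raw counts with proportions, and to make a random sample repeatable, something like the following works. The `makers` Series is a made-up example, and `random_state=42` is an arbitrary seed.

```python
import pandas as pd

# Hypothetical data, not taken from the mpg dataset.
makers = pd.Series(['dodge', 'audi', 'dodge', 'ford', 'dodge', 'audi'])

counts = makers.value_counts()                 # raw counts, most frequent first
shares = makers.value_counts(normalize=True)   # proportions that sum to 1

# random_state makes the sample reproducible between runs.
first_draw = makers.sample(3, random_state=42)
second_draw = makers.sample(3, random_state=42)

print(counts)
print(shares)
print(first_draw.equals(second_draw))
```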

We can use methods to return descriptive statistics, much like numpy!

- Common methods include .mean(), .mode(), and others
- .describe() computes many statistics at once!
- .nlargest(n) and .nsmallest(n) return the n largest/smallest values
- .sort_values() sorts the values in ascending order by default, or in descending order with ascending=False (sound familiar?)

In [22]:
my_series.mean()

3.0

In [23]:
my_series.describe()

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
Name: my_name, dtype: float64

In [24]:
my_series.nlargest(2)

4    5
3    4
Name: my_name, dtype: int64

In [25]:
my_series.nsmallest(1)

0    1
Name: my_name, dtype: int64

In [26]:
my_series.sort_values(ascending=False)

4    5
3    4
2    3
1    2
0    1
Name: my_name, dtype: int64
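
As a rough sketch of how these methods relate to each other, the example below uses a small Series with a repeated value, so that `.mode()` has something to find. The name `values` and the numbers in it are chosen here for illustration.

```python
import pandas as pd

values = pd.Series([3, 1, 4, 1, 5])

print(values.mean())            # arithmetic mean
print(values.median())          # middle value
print(values.mode().tolist())   # mode() returns a Series, since ties are possible

top_two = values.nlargest(2)
ascending = values.sort_values()

# With no ties among the top values, nlargest(n) matches sorting in
# descending order and keeping the first n rows.
same_result = top_two.equals(values.sort_values(ascending=False).head(2))
print(same_result)
```

Each result keeps the original index labels, so a value can be traced back to its position in the unsorted Series.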