In [1]:
import pandas as pd

## Create A `Series` Object from A Python List
A Series is a one-dimensional object with an index that can be specified (a bit like the keys and values of a dictionary)
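A minimal sketch of the dictionary analogy. The Series `scores` and its labels are made up for illustration: the `index` argument supplies the labels, and looking up a label returns its value, much like a dictionary key.

```python
import pandas as pd

# Labels act like dictionary keys, values like dictionary values
scores = pd.Series([90, 75, 82], index=["alice", "bob", "carol"])

print(scores["bob"])          # look up a value by its label
print(scores.index.tolist())  # the labels
print(scores.tolist())        # the values as a plain Python list
```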

In [3]:
numbers = [1, 2, 5, 19, 27, 15]

pd.Series(numbers)

0     1
1     2
2     5
3    19
4    27
5    15
dtype: int64

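A Series can hold values of any type, but the whole Series has a single dtype. Strings, or values of mixed types, are stored with the generic `object` dtype, as the output below shows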
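A quick sketch of how pandas picks the dtype (the sample values are arbitrary): integers mixed with floats are upcast to `float64`, while values with no common numeric type end up as `object`.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])          # all integers -> an integer dtype
floats = pd.Series([1, 2.5, 3])      # ints mixed with floats -> upcast to float64
mixed = pd.Series([1, "two", 3.0])   # no common numeric type -> generic object dtype

print(ints.dtype, floats.dtype, mixed.dtype)
```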

In [4]:
fruits = ["Orange", "Banana", "Apple", "Water Melon", "Grapes"]

pd.Series(fruits)

0         Orange
1         Banana
2          Apple
3    Water Melon
4         Grapes
dtype: object

## Create A `Series` Object from a Dictionary

In [2]:
Animals = {"Cat" : "Mammal",
           "Spider" : "Insect",
           "Lizard" : "Reptile",
           "peacock": "Bird"}

Animals = pd.Series(Animals)
Animals

Cat         Mammal
Spider      Insect
Lizard     Reptile
peacock       Bird
dtype: object

## Series Attributes

In [7]:
Animals.values

array(['Mammal', 'Insect', 'Reptile', 'Bird'], dtype=object)

In [19]:
Animals.index

Index(['Cat', 'Spider', 'Lizard', 'peacock'], dtype='object')

In [14]:
Animals.dtype

dtype('O')

## Basic Statistics

In [13]:
prices = [2.99, 4.45, 1.36]
prices = pd.Series(prices)
prices

0    2.99
1    4.45
2    1.36
dtype: float64

In [14]:
prices.sum()

8.8

In [15]:
prices.product()

18.095480000000006

In [16]:
prices.mean()

2.9333333333333336

In [3]:
labour = pd.read_csv("Labour-Force-Monthly-Australia.csv", index_col='Date',squeeze = True)
labour.head()

Date
2014-04-30    12303.323
2014-03-31    12329.103
2014-02-28    12330.786
2014-01-31    12126.413
2013-12-31    12266.032
Name: Value, dtype: float64

The ***count*** method counts only non-missing values, so any empty (`NaN`) entries in the Series are left out, whereas `len` counts every entry. Here both return 160, so `labour` has no missing values
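A small sketch with made-up values showing the difference, and showing that `mean` also skips `NaN` (it divides by `count`, not by `len`):

```python
import numpy as np
import pandas as pd

values = pd.Series([4.0, np.nan, 6.0, np.nan, 8.0])

print(len(values))      # every entry, including NaN
print(values.count())   # only the non-missing entries
print(values.mean())    # NaN is skipped: (4 + 6 + 8) / 3
```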

In [8]:
labour.count()

160

In [6]:
len(labour)

160

In [4]:
labour.mean()

10937.191824999998

In [5]:
labour.sum() / labour.count()

10937.191824999998

In [7]:
labour.std()

854.5062601802221

In [8]:
labour.min()

9460.687

In [9]:
labour.max()

12330.786

In [10]:
labour.median()

10932.617999999999

In [12]:
labour.mode().head()

0    9460.687
1    9606.247
2    9610.032
3    9621.300
4    9660.729
dtype: float64
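`mode` returns every value that shares the highest frequency, in sorted order. Every value in `labour` occurs exactly once (its `is_unique` attribute is `True`), so all 160 values tie and the output above is just the first five of them. A sketch with made-up data that does contain repeats:

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2, 1, 3])
print(s.mode().tolist())      # 3 appears most often, so it is the only mode

tied = pd.Series([5, 7, 5, 7, 9])
print(tied.mode().tolist())   # 5 and 7 both appear twice, so both are returned
```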

### describe
***describe*** returns a "mini report" with a few key statistics for the Series: count, mean, standard deviation, minimum, the 25th / 50th / 75th percentiles and maximum

In [13]:
labour.describe()

count      160.000000
mean     10937.191825
std        854.506260
min       9460.687000
25%      10074.903500
50%      10932.618000
75%      11733.583500
max      12330.786000
Name: Value, dtype: float64

We can customize the report by passing a list of the percentiles we want to see, written as fractions between 0 and 1 (the median, 50%, is always included)
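A minimal sketch on a made-up Series of 1 to 10: only the 10th and 90th percentiles are requested, and pandas still adds the 50% row.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Percentiles are fractions between 0 and 1; the median (50%) is added automatically
report = s.describe(percentiles=[0.1, 0.9])
print(report)
```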

In [14]:
labour.describe(percentiles=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9])

count      160.000000
mean     10937.191825
std        854.506260
min       9460.687000
10%       9787.400000
20%       9991.945800
30%      10291.299600
40%      10591.726000
50%      10932.618000
60%      11334.239000
70%      11584.858100
80%      11822.131200
90%      12054.887200
max      12330.786000
Name: Value, dtype: float64

### idxmax / idxmin
We can use these methods to get the index label of the highest / lowest value (if several values tie, the first matching label is returned)
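A short sketch with made-up temperatures, including a tie for the maximum, to show that the first matching label wins and that the label can be fed straight back into `.loc`:

```python
import pandas as pd

temps = pd.Series([18.5, 24.1, 21.0, 24.1], index=["Mon", "Tue", "Wed", "Thu"])

print(temps.idxmax())               # first label holding the maximum
print(temps.idxmin())               # label of the minimum
print(temps.loc[temps.idxmax()])    # the maximum value itself
```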

In [15]:
labour.idxmax()

'2014-02-28'

We can verify the label by using it to look up the value, which should be the largest one

In [16]:
labour['2014-02-28']

12330.786

In [17]:
labour.idxmin()

'2001-01-31'

In [18]:
labour['2001-01-31']

9460.687

## Specifying the index of a series

In [21]:
Names = ["Tom", "Daniel", "John", "Mike", "Ben"]
Months = ["Jan", "Feb", "Mar", "Apr", "May"]

pd.Series(Names, Months)

Jan       Tom
Feb    Daniel
Mar      John
Apr      Mike
May       Ben
dtype: object

We can also pass the arguments by their parameter names, `data` and `index`, which makes the call easier to read
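Whichever way the arguments are passed, `data` and `index` must have the same length. A sketch with illustrative lists where one label is missing:

```python
import pandas as pd

names = ["Tom", "Daniel", "John"]
months = ["Jan", "Feb"]  # one label short

error_message = None
try:
    pd.Series(data=names, index=months)
except ValueError as err:
    # pandas refuses to build a Series whose index length doesn't match the data
    error_message = str(err)

print(error_message)
```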

In [22]:
Names = ["Tom", "Daniel", "John", "Mike", "Ben"]
Months = ["Jan", "Feb", "Mar", "Apr", "May"]

pd.Series(data = Names, index = Months)

Jan       Tom
Feb    Daniel
Mar      John
Apr      Mike
May       Ben
dtype: object

### head / tail
We can use the ***head*** and ***tail*** methods to display the first / last values of a Series (5 by default)
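A small sketch of the default and of a negative argument, which makes `head` return everything except the last rows:

```python
import pandas as pd

s = pd.Series(range(10))

print(s.head().tolist())     # default n=5: the first five values
print(s.tail(2).tolist())    # the last two values
print(s.head(-7).tolist())   # negative n: all values except the last 7
```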

In [3]:
numeric_series = pd.Series(range(100,999,10))
numeric_series.head(3)

0    100
1    110
2    120
dtype: int64

In [4]:
numeric_series.tail(3)

87    970
88    980
89    990
dtype: int64

## Import `Series` with the `read_csv` Method

In [16]:
labour = pd.read_csv("Labour-Force-Monthly-Australia.csv")
labour.head(3)

         Date      Value
0  2014-04-30  12303.323
1  2014-03-31  12329.103
2  2014-02-28  12330.786


### usecols, squeeze
***usecols*** specifies which column (or columns) to read from the data source. Column names should be given as a list, even when there is only one.

***squeeze*** "squeezes" the resulting DataFrame into a Series. It only works when the DataFrame contains a single column of values. The `squeeze` parameter of `read_csv` was deprecated in pandas 1.4 and removed in pandas 2.0; in newer versions, call `.squeeze("columns")` on the returned DataFrame instead.
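On pandas 2.0 or later, where the `squeeze` parameter no longer exists, a minimal equivalent looks like this. The CSV text is a made-up, in-memory stand-in for `Labour-Force-Monthly-Australia.csv`, assumed to have `Date` and `Value` columns like the real file.

```python
import io

import pandas as pd

# Made-up sample data standing in for the real CSV file
csv_text = "Date,Value\n2014-04-30,10.5\n2014-03-31,11.0\n"

# Read only the Value column, then collapse the one-column DataFrame into a Series
values = pd.read_csv(io.StringIO(csv_text), usecols=["Value"]).squeeze("columns")

print(type(values).__name__)
print(values.tolist())
```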

In [18]:
labour = pd.read_csv("Labour-Force-Monthly-Australia.csv", usecols=['Value'], squeeze=True)
labour.head(3)

0    12303.323
1    12329.103
2    12330.786
Name: Value, dtype: float64

### index_col
We can use the ***index_col*** parameter to choose a column from the data source to serve as the index of the resulting DataFrame (or Series, when combined with ***squeeze***)
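A sketch of the same call for pandas 2.0 or later, again using made-up in-memory CSV text in place of the real file. Note that without `parse_dates` the `Date` labels stay plain strings.

```python
import io

import pandas as pd

csv_text = "Date,Value\n2014-04-30,10.5\n2014-03-31,11.0\n"  # made-up sample data

# index_col moves Date into the index; squeeze("columns") turns the
# single remaining Value column into a Series
labour_sample = pd.read_csv(io.StringIO(csv_text), index_col="Date").squeeze("columns")

print(labour_sample)
print(labour_sample["2014-03-31"])
```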

In [2]:
labour = pd.read_csv("Labour-Force-Monthly-Australia.csv", index_col='Date', squeeze=True)
labour.head(3)

Date
2014-04-30    12303.323
2014-03-31    12329.103
2014-02-28    12330.786
Name: Value, dtype: float64

## Built-In Functions
Python's built-in functions (such as `len`, `type`, `dir`, `sorted`, `list`, `dict`, `max` and `min`) are general-purpose functions that work with most kinds of objects, including a pandas Series.

### len
Returns the number of values in a sequence

In [3]:
len(labour)

160

### type
Returns the type (class) of an object

In [4]:
type(labour)

pandas.core.series.Series

### dir
Returns a list of the names of all attributes and methods available on an object
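The full `dir` listing is dominated by internal names starting with an underscore. A small sketch that keeps only the public names, which is usually what we want to browse:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Keep only names that don't start with an underscore
public = [name for name in dir(s) if not name.startswith("_")]

print(len(public))
print(public[:10])
```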

In [5]:
dir(labour)

['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '_AXIS_SLICEMAP',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_prepare__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__bytes__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 ...]

### sorted
Returns a list of all values from a given sequence, sorted from small to large.
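`sorted` iterates over the Series values and returns a plain Python list, so the index labels are lost. `sort_values` keeps each value paired with its label. A sketch with made-up data:

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=["c", "a", "b"])

print(sorted(s))                        # a plain list of values; labels are dropped
print(s.sort_values().index.tolist())   # labels move together with their values
```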

In [6]:
sorted(labour)

[9460.687,
 9606.247,
 9610.032,
 9621.3,
 9660.729,
 9668.858,
 9673.37,
 9681.951,
 9687.27,
 9716.578000000001,
 9728.218,
 9745.45,
 9747.514000000001,
 9755.23,
 9755.525,
 9777.167,
 9788.537,
 9800.487,
 9812.184000000001,
 9815.319,
 9837.163,
 9852.662,
 9879.314,
 9921.543,
 9934.935,
 9944.725,
 9951.144,
 9957.997,
 9967.605,
 9974.440999999999,
 9985.96,
 9991.440999999999,
 9992.072,
 10004.964,
 10045.788,
 10047.115,
 10065.403,
 10069.894,
 10071.561,
 10073.879,
 10075.245,
 10080.618,
 10082.245,
 10137.024,
 10138.799,
 10152.36,
 10203.325,
 10219.886999999999,
 10321.905,
 10336.625,
 10356.914,
 10381.444,
 10385.884,
 10393.681,
 10395.733,
 10399.155999999999,
 10421.38,
 10434.185,
 10443.735999999999,
 10532.276000000002,
 10567.867,
 10574.008,
 10583.229,
 10591.431999999999,
 10591.921999999999,
 10600.534,
 10618.96,
 10625.403999999999,
 10648.306,
 10662.015,
 10690.668,
 10803.983,
 10844.149,
 10846.241000000002,
 10853.188,
 10861.198999999999,
 ...]

We can switch the order with a parameter

In [7]:
sorted(labour, reverse=True)

[12330.786,
 12329.103000000001,
 12303.323,
 12266.032,
 12239.493999999999,
 12179.187,
 12171.525,
 12170.788999999999,
 12164.611,
 12164.385,
 12161.747,
 12155.612,
 12132.96,
 12128.473,
 12126.413,
 12113.002,
 12048.43,
 12024.964,
 12009.204,
 11999.916000000001,
 11991.656,
 11963.315,
 11956.443000000001,
 11949.275,
 11946.315,
 11941.582,
 11935.123,
 11927.038999999999,
 11862.658000000001,
 11856.493999999999,
 11849.621000000001,
 11836.252,
 11818.601,
 11808.864,
 11800.973,
 11790.13,
 11786.017,
 11785.47,
 11774.753999999999,
 11751.426000000001,
 11727.636,
 11718.78,
 11702.778999999999,
 11694.714,
 11630.069,
 11602.524,
 11595.476,
 11589.553,
 11582.846000000001,
 11579.696000000002,
 11562.041000000001,
 11533.105,
 11529.3,
 11468.18,
 11464.02,
 11460.648000000001,
 11458.87,
 11454.317,
 11429.753,
 11425.658000000001,
 11422.68,
 11422.132,
 11410.502,
 11339.405,
 11330.795,
 11281.038999999999,
 11266.083,
 11223.216999999999,
 11207.684,
 11201.787,
 ...]


### list
creates a list from a given sequence

In [8]:
list(labour)

[12303.323,
 12329.103000000001,
 12330.786,
 12126.413,
 12266.032,
 12128.473,
 12179.187,
 12239.493999999999,
 12048.43,
 12132.96,
 12170.788999999999,
 12171.525,
 12164.611,
 12161.747,
 12164.385,
 11999.916000000001,
 12155.612,
 11963.315,
 12024.964,
 12113.002,
 11849.621000000001,
 11946.315,
 11949.275,
 12009.204,
 11927.038999999999,
 11991.656,
 11935.123,
 11800.973,
 11956.443000000001,
 11818.601,
 11862.658000000001,
 11941.582,
 11727.636,
 11790.13,
 11785.47,
 11786.017,
 11774.753999999999,
 11836.252,
 11808.864,
 11694.714,
 11856.493999999999,
 11702.778999999999,
 11718.78,
 11751.426000000001,
 11533.105,
 11602.524,
 11589.553,
 11562.041000000001,
 11582.846000000001,
 11595.476,
 11579.696000000002,
 11464.02,
 11630.069,
 11422.68,
 11458.87,
 11529.3,
 11330.795,
 11410.502,
 11425.658000000001,
 11454.317,
 11429.753,
 11468.18,
 11460.648000000001,
 11281.038999999999,
 11422.132,
 11223.216999999999,
 11266.083,
 11339.405,
 11139.805,
 11201.787,
 ...]


### dict
Creates a dictionary that maps each index label to its value

In [9]:
dict(labour)

{'2014-04-30': 12303.323,
 '2014-03-31': 12329.103000000001,
 '2014-02-28': 12330.786,
 '2014-01-31': 12126.413,
 '2013-12-31': 12266.032,
 '2013-11-30': 12128.473,
 '2013-10-31': 12179.187,
 '2013-09-30': 12239.493999999999,
 '2013-08-31': 12048.43,
 '2013-07-31': 12132.96,
 '2013-06-30': 12170.788999999999,
 '2013-05-31': 12171.525,
 '2013-04-30': 12164.611,
 '2013-03-31': 12161.747,
 '2013-02-28': 12164.385,
 '2013-01-31': 11999.916000000001,
 '2012-12-31': 12155.612,
 '2012-11-30': 11963.315,
 '2012-10-31': 12024.964,
 '2012-09-30': 12113.002,
 '2012-08-31': 11849.621000000001,
 '2012-07-31': 11946.315,
 '2012-06-30': 11949.275,
 '2012-05-31': 12009.204,
 '2012-04-30': 11927.038999999999,
 '2012-03-31': 11991.656,
 '2012-02-29': 11935.123,
 '2012-01-31': 11800.973,
 '2011-12-31': 11956.443000000001,
 '2011-11-30': 11818.601,
 '2011-10-31': 11862.658000000001,
 '2011-09-30': 11941.582,
 '2011-08-31': 11727.636,
 '2011-07-31': 11790.13,
 '2011-06-30': 11785.47,
 ...}

### max / min
Return the largest / smallest value in a sequence

In [10]:
max(labour)

12330.786

In [11]:
min(labour)

9460.687

## More `Series` Attributes

In [12]:
labour = pd.read_csv("Labour-Force-Monthly-Australia.csv", index_col='Date', squeeze=True)

### values
Get a numpy array containing all of the values in the Series
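The pandas documentation recommends `to_numpy()` (or `.array`) over `.values` because it states the intent explicitly. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

arr = s.to_numpy()   # explicit alternative to s.values; the index is not included
print(type(arr), arr)
print(np.array_equal(arr, s.values))
```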

In [13]:
labour.values

array([12303.323, 12329.103, 12330.786, 12126.413, 12266.032, 12128.473,
       12179.187, 12239.494, 12048.43 , 12132.96 , 12170.789, 12171.525,
       12164.611, 12161.747, 12164.385, 11999.916, 12155.612, 11963.315,
       12024.964, 12113.002, 11849.621, 11946.315, 11949.275, 12009.204,
       11927.039, 11991.656, 11935.123, 11800.973, 11956.443, 11818.601,
       11862.658, 11941.582, 11727.636, 11790.13 , 11785.47 , 11786.017,
       11774.754, 11836.252, 11808.864, 11694.714, 11856.494, 11702.779,
       11718.78 , 11751.426, 11533.105, 11602.524, 11589.553, 11562.041,
       11582.846, 11595.476, 11579.696, 11464.02 , 11630.069, 11422.68 ,
       11458.87 , 11529.3  , 11330.795, 11410.502, 11425.658, 11454.317,
       11429.753, 11468.18 , 11460.648, 11281.039, 11422.132, 11223.217,
       11266.083, 11339.405, 11139.805, 11201.787, 11207.684, 11168.599,
       11190.339, 11157.072, 11123.266, 11028.224, 11180.904, 10986.442,
       10969.932, 11054.325, 10853.188, 10882.889,
       ...])

### index
Get the index labels as an `Index` object (a `RangeIndex` if the default numeric index was kept)

In [14]:
labour.index

Index(['2014-04-30', '2014-03-31', '2014-02-28', '2014-01-31', '2013-12-31',
       '2013-11-30', '2013-10-31', '2013-09-30', '2013-08-31', '2013-07-31',
       ...
       '2001-10-31', '2001-09-30', '2001-08-31', '2001-07-31', '2001-06-30',
       '2001-05-31', '2001-04-30', '2001-03-31', '2001-02-28', '2001-01-31'],
      dtype='object', name='Date', length=160)

### dtype
Get the data type of the Series

In [15]:
labour.dtype

dtype('float64')

### is_unique
Returns `True` if the Series contains no duplicate values

In [16]:
labour.is_unique

True

### nunique()
Get the number of unique values. `NaN` values are not counted by default, so for a Series without missing values this number equals the size of the Series whenever `is_unique` is `True`
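A sketch with made-up values showing how `is_unique` and `nunique` treat duplicates and `NaN`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, np.nan])

print(s.is_unique)                # 2 appears twice, so this is False
print(s.nunique())                # NaN is ignored by default
print(s.nunique(dropna=False))    # NaN counted as a value of its own
```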

In [20]:
labour.nunique()

160

### ndim
Number of dimensions (always 1 for a Series)

In [17]:
labour.ndim

1

### shape
A tuple with the length of each dimension (a single element for a Series)

In [18]:
labour.shape

(160,)

### size
The total number of elements in the Series (including `NaN` values)

In [19]:
labour.size

160

In [25]:
labour.describe()

count      160.000000
mean     10937.191825
std        854.506260
min       9460.687000
25%      10074.903500
50%      10932.618000
75%      11733.583500
max      12330.786000
Name: Value, dtype: float64

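### name
The name of the Series (taken from the column name when the data is read from a file). Assigning to it renames the Series

In [20]: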
labour.name = "Labour Force Australia"

In [22]:
labour.head()

Date
2014-04-30    12303.323
2014-03-31    12329.103
2014-02-28    12330.786
2014-01-31    12126.413
2013-12-31    12266.032
Name: Labour Force Australia, dtype: float64

## sort_values()
The ***sort_values*** method sorts the Series by its values (ascending by default; pass `ascending=False` for descending order)

In [95]:
labour = pd.read_csv("Labour-Force-Monthly-Australia.csv", index_col='Date', squeeze=True)

In [23]:
labour.sort_values().head()

Date
2001-01-31    9460.687
2001-03-31    9606.247
2001-08-31    9610.032
2001-02-28    9621.300
2001-07-31    9660.729
Name: Value, dtype: float64

In [25]:
labour.sort_values(ascending = False).head()

Date
2014-02-28    12330.786
2014-03-31    12329.103
2014-04-30    12303.323
2013-12-31    12266.032
2013-09-30    12239.494
Name: Value, dtype: float64

### inplace
By default, methods such as ***sort_values*** return a new, modified Series and leave the original object unchanged. Passing ***inplace=True*** modifies the object itself instead (and the method then returns `None`)
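A minimal sketch with made-up data contrasting the two behaviours:

```python
import pandas as pd

s = pd.Series([3, 1, 2])

sorted_copy = s.sort_values()          # returns a new Series; s keeps its order
print(s.tolist(), sorted_copy.tolist())

result = s.sort_values(inplace=True)   # sorts s itself and returns None
print(s.tolist(), result)
```

An equivalent that many pandas users prefer is to reassign the result, e.g. `s = s.sort_values()`, which keeps every step an expression.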

In [35]:
labour = pd.read_csv("Labour-Force-Monthly-Australia.csv", index_col='Date', squeeze=True)

In [36]:
labour.sort_values().head()

Date
2001-01-31    9460.687
2001-03-31    9606.247
2001-08-31    9610.032
2001-02-28    9621.300
2001-07-31    9660.729
Name: Value, dtype: float64

Notice that the sorted order doesn't stick: `labour` itself is unchanged

In [37]:
labour.head()

Date
2014-04-30    12303.323
2014-03-31    12329.103
2014-02-28    12330.786
2014-01-31    12126.413
2013-12-31    12266.032
Name: Value, dtype: float64

Setting ***inplace=True*** makes the change permanent. No output is displayed this time, because the method modifies the object directly and returns `None` instead of a new Series

In [31]:
labour.sort_values(inplace=True)

In [34]:
labour.head()

Date
2001-01-31    9460.687
2001-03-31    9606.247
2001-08-31    9610.032
2001-02-28    9621.300
2001-07-31    9660.729
Name: Value, dtype: float64

### sort_index()
Sorts the Series by its index labels. Since the dates here are strings in `YYYY-MM-DD` format, alphabetical order is also chronological order

In [38]:
labour.sort_index(ascending = True, inplace = True)

In [39]:
labour.head()

Date
2001-01-31    9460.687
2001-02-28    9621.300
2001-03-31    9606.247
2001-04-30    9687.270
2001-05-31    9668.858
Name: Value, dtype: float64

### in
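The ***in*** keyword checks whether a label exists in the index of the Series, not among its values (much like `in` on a dictionary checks its keys). To check for a value, use `in` on `.values`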
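A compact sketch with made-up data of the label-versus-value distinction:

```python
import pandas as pd

s = pd.Series([100, 200], index=["a", "b"])

print("a" in s)          # True: `in` looks at the index labels
print(100 in s)          # False: 100 is a value, not a label
print(100 in s.values)   # True: search the values explicitly
```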

In [40]:
labour = pd.read_csv("Labour-Force-Monthly-Australia.csv", index_col='Date', squeeze=True)

In [43]:
labour.head()

Date
2014-04-30    12303.323
2014-03-31    12329.103
2014-02-28    12330.786
2014-01-31    12126.413
2013-12-31    12266.032
Name: Value, dtype: float64

In [44]:
'2014-04-30' in labour

True

In [45]:
'2014-04-30' in labour.index

True

In [46]:
12303.323 in labour

False

In [47]:
12303.323 in labour.values

True

## Extract Values by Index Position

In [2]:
regions = pd.read_csv("noc_regions.csv", usecols=['NOC', 'region'], index_col='NOC', squeeze=True)

In [54]:
regions.head(20)

NOC
AFG       Afghanistan
AHO           Curacao
ALB           Albania
ALG           Algeria
AND           Andorra
ANG            Angola
ANT           Antigua
ANZ         Australia
ARG         Argentina
ARM           Armenia
ARU             Aruba
ASA    American Samoa
AUS         Australia
AUT           Austria
AZE        Azerbaijan
BAH           Bahamas
BAN        Bangladesh
BAR          Barbados
BDI           Burundi
BEL           Belgium
Name: region, dtype: object

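We can select by integer position in square brackets, like with a regular NumPy array, even after replacing the index. Selecting a single position returns only the value, not a Series. Newer pandas versions deprecate this positional fallback of `[]` on a labelled index for single positions and lists of positions, and emit a `FutureWarning`; `.iloc[]` is the explicit way to select by position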
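A small sketch of the explicit positional form with `.iloc`, using a few rows that mirror the `regions` data:

```python
import pandas as pd

codes = pd.Series(["Afghanistan", "Curacao", "Albania"], index=["AFG", "AHO", "ALB"])

print(codes.iloc[0])        # first position: returns just the value
print(codes.iloc[1:3])      # positions 1 and 2; the end position is excluded
print(codes.iloc[[2, 0]])   # a list of positions, in the order given
```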

In [52]:
regions[0]

'Afghanistan'

Slicing more than one row returns that part of the Series (index labels and values)

In [56]:
regions[3:7]

NOC
ALG    Algeria
AND    Andorra
ANG     Angola
ANT    Antigua
Name: region, dtype: object

In [57]:
regions[:7]

NOC
AFG    Afghanistan
AHO        Curacao
ALB        Albania
ALG        Algeria
AND        Andorra
ANG         Angola
ANT        Antigua
Name: region, dtype: object

In [59]:
regions[-6:-2]

NOC
YAR     Yemen
YEM     Yemen
YMD     Yemen
YUG    Serbia
Name: region, dtype: object

In [58]:
regions[[10,20,30,10]]

NOC
ARU      Aruba
BEN      Benin
BRN    Bahrain
ARU      Aruba
Name: region, dtype: object

## Extract Values by Index Label
When the index has been replaced, we can select rows either by their original positions (as shown above) or by the new index labels. `.loc[]` performs the same label-based selection explicitly

In [60]:
regions['ARU']

'Aruba'

In [63]:
regions[['ARG', 'VNM', 'ZIM']]

NOC
ARG    Argentina
VNM      Vietnam
ZIM     Zimbabwe
Name: region, dtype: object

We can also use labels to define a range. Notice that with label slices the end label is included (unlike position-based slices, where the end position is excluded)
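A sketch contrasting the two kinds of slices on a few rows that mirror the `regions` data:

```python
import pandas as pd

codes = pd.Series(["Argentina", "Armenia", "Aruba", "American Samoa"],
                  index=["ARG", "ARM", "ARU", "ASA"])

print(codes.loc["ARG":"ARU"])   # label slice: the end label 'ARU' is included
print(codes.iloc[0:2])          # position slice: position 2 is excluded
```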

In [65]:
regions['ARG':'BER']

NOC
ARG         Argentina
ARM           Armenia
ARU             Aruba
ASA    American Samoa
AUS         Australia
AUT           Austria
AZE        Azerbaijan
BAH           Bahamas
BAN        Bangladesh
BAR          Barbados
BDI           Burundi
BEL           Belgium
BEN             Benin
BER           Bermuda
Name: region, dtype: object

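Selecting a single label that doesn't exist raises a `KeyError`. In the pandas version used here, a list of labels with some missing ones still returned a result, with ***NaN*** as the value for each missing label, together with the warning shown below. Since pandas 1.0 this raises a `KeyError` as well, and `.reindex()` is the way to get `NaN` for missing labels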
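A sketch of the current behaviour (pandas 1.0 or later) with a couple of rows mirroring `regions`; `ENG` is the missing label:

```python
import pandas as pd

codes = pd.Series(["USA", "Israel"], index=["USA", "ISR"])

# A list containing a missing label raises KeyError
try:
    codes.loc[["USA", "ENG"]]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# reindex keeps every requested label and fills the missing ones with NaN
print(codes.reindex(["USA", "ISR", "ENG"]))
```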

In [68]:
regions[['USA', 'ISR', 'ENG']]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self.loc[key]


NOC
USA       USA
ISR    Israel
ENG       NaN
Name: region, dtype: object

### .get()
Used to avoid a `KeyError` when an index label doesn't exist: ***get*** returns `None` (so nothing is displayed) or the value passed as `default`
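A minimal sketch with two rows mirroring `regions` (`TLV` is a code that doesn't exist):

```python
import pandas as pd

codes = pd.Series(["Afghanistan", "Andorra"], index=["AFG", "AND"])

print(codes.get("AFG"))                                   # existing label
print(codes.get("TLV"))                                   # missing label: None, no KeyError
print(codes.get("TLV", default="Invalid Location Code"))  # custom fallback value
```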

In [69]:
regions = pd.read_csv("noc_regions.csv", usecols=['NOC', 'region'], index_col='NOC', squeeze=True)
regions.head()

NOC
AFG    Afghanistan
AHO        Curacao
ALB        Albania
ALG        Algeria
AND        Andorra
Name: region, dtype: object

In [71]:
regions.get(key = "AFG")

'Afghanistan'

In [70]:
regions.get(key = ["AFG", "AND"])

NOC
AFG    Afghanistan
AND        Andorra
Name: region, dtype: object

In [72]:
regions.get(key = "TLV")

In [3]:
regions.get(key = "TLV", default='Invalid Location Code')

'Invalid Location Code'

### value_counts
Returns a summary showing how many times each value occurs in the Series. The results are sorted in descending order of frequency
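A small sketch with made-up regions; `normalize=True` turns the counts into proportions of the total:

```python
import pandas as pd

regions = pd.Series(["England", "Wales", "England", "Scotland", "England", "Wales"])

counts = regions.value_counts()                 # most frequent value first
shares = regions.value_counts(normalize=True)   # proportions that sum to 1

print(counts)
print(shares)
```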

In [7]:
customers = pd.read_csv('UK_Bank_Customers.csv', usecols=['Customer ID', 'Region'], index_col='Customer ID', squeeze=True)
customers.head()

Customer ID
100000001             England
400000002    Northern Ireland
100000003             England
300000004               Wales
100000005             England
Name: Region, dtype: object

In [10]:
customers.value_counts()

England             2159
Scotland            1124
Wales                520
Northern Ireland     211
Name: Region, dtype: int64

In [11]:
customers.value_counts().sum()

4014

In [12]:
customers.count()

4014

In [13]:
customers.value_counts(ascending = True)

Northern Ireland     211
Wales                520
Scotland            1124
England             2159
Name: Region, dtype: int64

### apply()
The ***apply*** method takes a function as its argument, calls it on each value in the Series, and returns a new Series with the results (the index is kept)
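A minimal sketch with made-up balances; the helper `is_large` is hypothetical:

```python
import pandas as pd

balances = pd.Series([500.0, 15000.0, 250000.0], index=["c1", "c2", "c3"])

def is_large(balance):
    # Called once per value; the return values form the new Series
    return balance > 100000

flags = balances.apply(is_large)
print(flags)
```

For a single comparison like this, the vectorized expression `balances > 100000` gives the same result without calling a Python function per value; `apply` is most useful when the logic is awkward to express with Series operations.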

In [18]:
customers = pd.read_csv('UK_Bank_Customers.csv', usecols=['Customer ID', 'Balance'], index_col='Customer ID', squeeze=True)
customers.head()

Customer ID
100000001    113810.15
400000002     36919.73
100000003    101536.83
300000004      1421.52
100000005     35639.79
Name: Balance, dtype: float64

In [19]:
def customer_priority(balance):
    if balance <= 1000:
        return "Low"
    elif 1000 < balance <= 20000:
        return "Medium"
    elif 20000 < balance <= 100000:
        return "High"
    else:
        return "VIP"

In [20]:
customers.apply(customer_priority).head(15)

Customer ID
100000001       VIP
400000002      High
100000003       VIP
300000004    Medium
100000005      High
300000006       VIP
100000007      High
200000008      High
300000009      High
100000010    Medium
100000011      High
100000012      High
100000013      High
200000014      High
300000015    Medium
Name: Balance, dtype: object
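A swapped-in alternative, not the notebook's approach: for fixed thresholds like these, `pd.cut` can assign the same tiers without calling a Python function per value. This sketch assumes the thresholds of `customer_priority` and uses a few made-up balances; right-closed bins reproduce its `<=` comparisons.

```python
import numpy as np
import pandas as pd

balances = pd.Series([500.0, 1000.0, 1421.52, 36919.73, 113810.15])

def customer_priority(balance):
    # Same thresholds as the function defined above
    if balance <= 1000:
        return "Low"
    elif balance <= 20000:
        return "Medium"
    elif balance <= 100000:
        return "High"
    else:
        return "VIP"

# Bins are right-closed by default: (-inf, 1000], (1000, 20000], (20000, 100000], (100000, inf]
tiers = pd.cut(balances,
               bins=[-np.inf, 1000, 20000, 100000, np.inf],
               labels=["Low", "Medium", "High", "VIP"])

print(tiers.astype(str).tolist())
print(balances.apply(customer_priority).tolist())
```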

In [22]:
customers.sort_values().apply(customer_priority).head()

Customer ID
100001320    Low
400000075    Low
200000775    Low
400002046    Low
300003468    Low
Name: Balance, dtype: object

Instead of defining a named function, we can pass a lambda expression. Here balances above 50,000 are increased by 10% and all other balances are reduced by 10%

In [25]:
customers.apply(lambda balance : balance * 1.1 if balance > 50000 else balance * 0.9).head()

Customer ID
100000001    125191.165
400000002     33227.757
100000003    111690.513
300000004      1279.368
100000005     32075.811
Name: Balance, dtype: float64

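### map()
The ***map*** method replaces each value in the Series using a lookup: a dictionary, a function, or another Series. When given a Series, each value is matched against that Series' index. Below, every `ProductKey` in `sales` is looked up in the index of `products` (which is indexed by `ProductKey`) to get the product name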

In [28]:
sales = pd.read_csv('AdventureWorks-Sales-2015.csv', usecols=['OrderNumber', 'ProductKey'], index_col='OrderNumber', squeeze=True)
sales.head()

OrderNumber
SO45080    332
SO45079    312
SO45082    350
SO45081    338
SO45083    312
Name: ProductKey, dtype: int64

In [30]:
products = pd.read_csv('AdventureWorks-Products.csv', usecols=['ProductKey', 'ProductName'], index_col='ProductKey', squeeze=True)
products.head()

ProductKey
214      Sport-100 Helmet, Red
215    Sport-100 Helmet, Black
218     Mountain Bike Socks, M
219     Mountain Bike Socks, L
220     Sport-100 Helmet, Blue
Name: ProductName, dtype: object

In [31]:
sales.map(products).head()

OrderNumber
SO45080        Road-650 Black, 58
SO45079          Road-150 Red, 48
SO45082    Mountain-100 Black, 44
SO45081        Road-650 Black, 44
SO45083          Road-150 Red, 48
Name: ProductKey, dtype: object
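A self-contained sketch of the same lookup with small, hypothetical `sales` and `products` Series, plus a dictionary lookup that shows what happens to values without a match:

```python
import pandas as pd

# Hypothetical order lines: order number -> product key
sales = pd.Series([312, 350, 312], index=["SO1", "SO2", "SO3"])

# Hypothetical lookup table: product key (index) -> product name (value)
products = pd.Series(["Road-150 Red, 48", "Mountain-100 Black, 44"], index=[312, 350])

names = sales.map(products)        # each value is matched against products.index
short = sales.map({312: "R150"})   # a dict works too; values without a match become NaN

print(names)
print(short)
```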