## Pandas Series

In [1]:
import pandas as pd

### Lambda functions

In [2]:
f = lambda x: x + 10
f(7)

17

In [3]:
f = lambda x,y: x+y
f(10, 3)

13

In [4]:
# Lambda function that returns "Even" if the number is even, otherwise "Odd"
check_even_odd = lambda x: "Even" if x % 2 == 0 else "Odd"
check_even_odd(5)

'Odd'

Exercise: Write a lambda function that sums three numbers.

In [5]:
sums_three = lambda x,y,z: x + y + z
sums_three(3,4,5)

12

### Map
The map function applies a given function to each item of an iterable (like a list, tuple, etc.) and returns a map object (which is an iterator) containing the results.

In [6]:
def square(x):
    return x * x

numbers = [1, 2, 3, 4]
result = map(square, numbers)

print(list(result))


[1, 4, 9, 16]
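
Because map returns an iterator, its results can be consumed only once. The sketch below illustrates this by reusing the square function from the cell above; the variable names squares, first_pass and second_pass are illustrative and not part of the original notebook.

```python
def square(x):
    return x * x

squares = map(square, [1, 2, 3, 4])

first_pass = list(squares)   # consumes the iterator
second_pass = list(squares)  # nothing is left to consume

print(first_pass)   # [1, 4, 9, 16]
print(second_pass)  # []
```

If the results are needed more than once, storing them in a list right away is usually the simplest option.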


Exercise: Using map, write a code snippet that takes the logarithm of every number in a list.

In [7]:
# multiple iterables
def add(x, y):
    return x + y

numbers1 = [1, 2, 3]
numbers2 = [4, 5, 6]
result = map(add, numbers1, numbers2)

print(list(result))

[5, 7, 9]


### Combining Map and Lambda
Exercise: Rewrite the previous examples using a lambda function.

In [8]:
numbers = [1, 2, 3, 4]
result = map(lambda x: x * x, numbers)

print(list(result))

[1, 4, 9, 16]


## Pandas Series
A Pandas Series is a one-dimensional labeled array that can hold data of any type, such as integers, floats, strings, or objects. Each element in a Series is associated with an index label, making it similar to both an array and a dictionary. The index labels allow for easy access and manipulation of data. Series are a fundamental building block in Pandas and are often used to represent columns in a DataFrame.
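
To make the "array plus dictionary" idea concrete, here is a minimal sketch that builds a Series from a dictionary. The data and the names toy and toy_values are hypothetical and unrelated to the country dataset used below.

```python
import pandas as pd

# Hypothetical toy data: dictionary keys become the index labels
toy = pd.Series({"a": 1.5, "b": 2.5, "c": 3.5}, name="toy_values")

print(toy.index.tolist())  # ['a', 'b', 'c']
print(toy["b"])            # dictionary-like access by label: 2.5
print(toy.iloc[1])         # array-like access by position: 2.5
```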

In [9]:
# gets a pandas dataframe
data_path = "../data/country-data.csv"
country = pd.read_csv(data_path)
country.head()

Unnamed: 0,country,child_mort,exports,health,imports,income,inflation,life_expec,total_fer,gdpp
0,Afghanistan,90.2,10.0,7.58,44.9,1610,9.44,56.2,5.82,553
1,Albania,16.6,28.0,6.55,48.6,9930,4.49,76.3,1.65,4090
2,Algeria,27.3,38.4,4.17,31.4,12900,16.1,76.5,2.89,4460
3,Angola,119.0,62.3,2.85,42.9,5900,22.4,60.1,6.16,3530
4,Antigua and Barbuda,10.3,45.5,6.03,58.9,19100,1.44,76.8,2.13,12200


In [16]:
# build a pandas Series with country as the index and life_expec as the values
life_exp = pd.read_csv(data_path, index_col="country", usecols=["life_expec", "country"])
life_exp = life_exp.squeeze()  # a one-column DataFrame squeezes to a Series

In [21]:
life_exp.head().index

Index(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Antigua and Barbuda'], dtype='object', name='country')

In [22]:
life_exp.head().values

array([56.2, 76.3, 76.5, 60.1, 76.8])

In [23]:
life_exp.shape

(167,)

### Subsetting and filtering by position: iloc

In [25]:
life_exp.iloc[0: 3]

country
Afghanistan    56.2
Albania        76.3
Algeria        76.5
Name: life_expec, dtype: float64

Exercise:
1. Get the last 10 elements of the series.
2. Get the element at position 33.

### Subsetting and filtering by label (index)

In [29]:
life_exp["Spain"], life_exp["Portugal"]

(81.9, 79.8)

In [32]:
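# "Servia" is not a label in the index; the slice still works because the index is sorted,
# so it starts at the first label after "Servia" and includes the end label "Spain"
life_exp["Servia" : "Spain"]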

country
Seychelles         73.4
Sierra Leone       55.0
Singapore          82.7
Slovak Republic    75.5
Slovenia           79.5
Solomon Islands    61.7
South Africa       54.3
South Korea        80.1
Spain              81.9
Name: life_expec, dtype: float64

In [33]:
# the same label-based slice, written explicitly with .loc
life_exp.loc["Servia" : "Spain"]

country
Seychelles         73.4
Sierra Leone       55.0
Singapore          82.7
Slovak Republic    75.5
Slovenia           79.5
Solomon Islands    61.7
South Africa       54.3
South Korea        80.1
Spain              81.9
Name: life_expec, dtype: float64
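
Label-based slices include the end label, whereas position-based slices with iloc exclude the end position. The sketch below shows the difference on a hypothetical toy Series (the name toy and its values are illustrative only). It also shows a slice that starts at a label missing from a sorted index.

```python
import pandas as pd

# Hypothetical toy Series with a sorted string index
toy = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = toy.loc["b":"c"]         # includes both "b" and "c"
by_position = toy.iloc[1:2]         # includes position 1 only
missing_start = toy.loc["bb":"d"]   # "bb" is absent; the slice starts at "c"

print(by_label.tolist())       # [20, 30]
print(by_position.tolist())    # [20]
print(missing_start.tolist())  # [30, 40]
```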

### Filtering with boolean masks

In [39]:
# creating a smaller series
ser_small = life_exp.head(10)
ser_small

country
Afghanistan            56.2
Albania                76.3
Algeria                76.5
Angola                 60.1
Antigua and Barbuda    76.8
Argentina              75.8
Armenia                73.3
Australia              82.0
Austria                80.5
Azerbaijan             69.1
Name: life_expec, dtype: float64

In [46]:
# creating a boolean mask
ser_small > 74

country
Afghanistan            False
Albania                 True
Algeria                 True
Angola                 False
Antigua and Barbuda     True
Argentina               True
Armenia                False
Australia               True
Austria                 True
Azerbaijan             False
Name: life_expec, dtype: bool

In [59]:
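# combining two conditions with &; each condition needs its own parentheses
# (here the result equals ser_small > 80, since every value above 80 is also above 74)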
(ser_small > 74) & (ser_small > 80)

country
Afghanistan            False
Albania                False
Algeria                False
Angola                 False
Antigua and Barbuda    False
Argentina              False
Armenia                False
Australia               True
Austria                 True
Azerbaijan             False
Name: life_expec, dtype: bool

In [58]:
# filtering
ser_small[(ser_small > 74) & (ser_small > 80)]

country
Australia    82.0
Austria      80.5
Name: life_expec, dtype: float64
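
Masks can also be combined with | (or) and inverted with ~ (not). A minimal sketch on hypothetical toy values, not taken from the dataset:

```python
import pandas as pd

# Hypothetical toy values
toy = pd.Series([56.2, 76.3, 82.0, 69.1], index=["w", "x", "y", "z"])

in_range = toy[(toy > 60) & (toy < 80)]  # AND: both conditions hold
extremes = toy[(toy < 60) | (toy > 80)]  # OR: at least one condition holds
not_high = toy[~(toy > 80)]              # NOT: invert the mask

print(in_range.index.tolist())  # ['x', 'z']
print(extremes.index.tolist())  # ['w', 'y']
print(not_high.index.tolist())  # ['w', 'x', 'z']
```

The Python keywords and, or and not do not work element-wise on a Series, which is why the operators &, | and ~ are used instead.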

### Filtering with .filter()

In [50]:
# Select a subset of the Series based on the specified labels or index values. 
life_exp.filter(["Spain", "Portugal", "France"])

Spain       81.9
Portugal    79.8
France      81.4
Name: life_expec, dtype: float64

In [53]:
# similar result with .loc (note that .loc keeps the index name "country")
life_exp.loc[["Spain", "Portugal", "France"]]

country
Spain       81.9
Portugal    79.8
France      81.4
Name: life_expec, dtype: float64

In [55]:
# more interesting filtering 
# countries starting with A
life_exp.filter(regex="^A")

country
Afghanistan            56.2
Albania                76.3
Algeria                76.5
Angola                 60.1
Antigua and Barbuda    76.8
Argentina              75.8
Armenia                73.3
Australia              82.0
Austria                80.5
Azerbaijan             69.1
Name: life_expec, dtype: float64

In [56]:
# countries whose name contains the substring "z"
life_exp.filter(like="z")

country
Azerbaijan                69.1
Belize                    71.4
Bosnia and Herzegovina    76.8
Brazil                    74.2
Czech Republic            77.5
Kazakhstan                68.4
Kyrgyz Republic           68.5
Mozambique                54.5
Switzerland               82.2
Tanzania                  59.3
Uzbekistan                68.8
Venezuela                 75.4
Name: life_expec, dtype: float64
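
Note that .filter() always looks at the index labels, never at the values. The sketch below contrasts it with a boolean mask on a hypothetical toy Series. It also shows that .filter() silently skips labels that are not in the index, whereas .loc with a list containing a missing label raises a KeyError in current pandas versions.

```python
import pandas as pd

# Hypothetical toy Series
toy = pd.Series([3, 1, 2], index=["apple", "avocado", "banana"])

by_label = toy.filter(regex="^a")               # labels starting with "a"
by_value = toy[toy > 1]                         # boolean mask on the values
skip_missing = toy.filter(["apple", "kiwi"])    # "kiwi" is silently skipped

print(by_label.index.tolist())      # ['apple', 'avocado']
print(by_value.index.tolist())      # ['apple', 'banana']
print(skip_missing.index.tolist())  # ['apple']
```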

### .where()
The where method keeps the values of a Series (or DataFrame) where a condition is True and replaces the remaining values with the value passed as other (NaN by default).

In [67]:
ser_small.where(lambda x: x > 70, other="LOW")

country
Afghanistan             LOW
Albania                76.3
Algeria                76.5
Angola                  LOW
Antigua and Barbuda    76.8
Argentina              75.8
Armenia                73.3
Australia              82.0
Austria                80.5
Azerbaijan              LOW
Name: life_expec, dtype: object

In [65]:
# without other, the replaced values become NaN
ser_small.where(lambda x: x > 70)

country
Afghanistan             NaN
Albania                76.3
Algeria                76.5
Angola                  NaN
Antigua and Barbuda    76.8
Argentina              75.8
Armenia                73.3
Australia              82.0
Austria                80.5
Azerbaijan              NaN
Name: life_expec, dtype: float64

In [66]:
ser_small.where(lambda x: x > 70).dropna()

country
Albania                76.3
Algeria                76.5
Antigua and Barbuda    76.8
Argentina              75.8
Armenia                73.3
Australia              82.0
Austria                80.5
Name: life_expec, dtype: float64

### .mask()
The mask method is the inverse of where: it replaces the values where the condition is True (with NaN by default) and keeps the rest.

In [72]:
ser_small.mask(lambda x: x > 75)

country
Afghanistan            56.2
Albania                 NaN
Algeria                 NaN
Angola                 60.1
Antigua and Barbuda     NaN
Argentina               NaN
Armenia                73.3
Australia               NaN
Austria                 NaN
Azerbaijan             69.1
Name: life_expec, dtype: float64
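
The relationship between the two methods can be checked directly: masking with a condition gives the same result as where with the negated condition. A minimal sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy values
toy = pd.Series([56.2, 76.3, 82.0], index=["x", "y", "z"])
cond = toy > 75

kept = toy.where(cond)    # keeps values where cond is True, NaN elsewhere
hidden = toy.mask(cond)   # NaN where cond is True, keeps the rest
same = toy.where(~cond)   # equivalent to toy.mask(cond)

print(hidden.equals(same))  # True
```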

### .map()
The map method applies an element-wise transformation to a Pandas Series. It is especially useful for replacing values according to a mapping (such as a dictionary) or for applying simple functions.

In [73]:
s = pd.Series(['cat', 'dog', 'rabbit'])

# Using map with a dictionary
mapping = {'cat': 'meow', 'dog': 'woof', 'rabbit': 'squeak'}
s.map(mapping)

0      meow
1      woof
2    squeak
dtype: object

In [74]:
# Using map with a lambda function
s.map(lambda x: x.upper())

0       CAT
1       DOG
2    RABBIT
dtype: object
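
One detail worth knowing when mapping with a dictionary: values without a matching key become NaN. The sketch below uses hypothetical data (animals, sounds) to show this, together with one possible way to supply a fallback value.

```python
import pandas as pd

# Hypothetical data: the dictionary has no entry for "fish"
animals = pd.Series(["cat", "dog", "fish"])
sounds = {"cat": "meow", "dog": "woof"}

mapped = animals.map(sounds)
print(mapped.isna().tolist())  # [False, False, True]

# One way to provide a fallback for unmapped values
with_default = animals.map(lambda a: sounds.get(a, "unknown"))
print(with_default.tolist())   # ['meow', 'woof', 'unknown']
```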

### .apply()
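The apply method is more general and flexible than map. On a Series it calls a function on each value and can forward extra positional and keyword arguments to it; on a DataFrame it works along rows or columns. Series.map is purely element-wise. Since pandas 2.1, DataFrame also has an element-wise map method (previously named applymap).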

In [77]:
s = pd.Series(['apple', 'banana', 'cherry'])

# Function that adds a prefix and a suffix
def add_prefix_suffix(word, prefix, suffix):
    return f"{prefix}{word}{suffix}"

# Using apply with a lambda to pass extra arguments
s.apply(lambda x: add_prefix_suffix(x, 'fruit_', '_pie'))

0     fruit_apple_pie
1    fruit_banana_pie
2    fruit_cherry_pie
dtype: object
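
As an alternative to wrapping the call in a lambda, Series.apply can forward extra arguments itself, either through args or as keyword arguments. A minimal sketch reusing add_prefix_suffix from the cell above; the names fruits, via_args and via_kwargs are illustrative.

```python
import pandas as pd

def add_prefix_suffix(word, prefix, suffix):
    return f"{prefix}{word}{suffix}"

fruits = pd.Series(["apple", "banana"])

# Extra positional arguments go in args, extra keyword arguments are passed through
via_args = fruits.apply(add_prefix_suffix, args=("fruit_", "_pie"))
via_kwargs = fruits.apply(add_prefix_suffix, prefix="fruit_", suffix="_pie")

print(via_args.tolist())            # ['fruit_apple_pie', 'fruit_banana_pie']
print(via_args.equals(via_kwargs))  # True
```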

### .value_counts()

In [83]:
s = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'banana'])

# Count occurrences of each unique value
s.value_counts()

banana    3
apple     2
orange    1
Name: count, dtype: int64
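
As the output shows, value_counts returns the counts sorted from most to least frequent. Passing normalize=True returns relative frequencies instead of counts. A short sketch on the same fruit data (the names fruits and shares are illustrative):

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "orange", "banana", "banana"])

shares = fruits.value_counts(normalize=True)  # relative frequencies

print(shares["banana"])  # 0.5
print(shares.index[0])   # banana (most frequent value comes first)
```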