# Pandas Tutorial - Part 43

This notebook covers the following pandas Series methods:
- Combining Series with `combine()` and `combine_first()`
- Converting data types with `convert_dtypes()`
- Dropping elements with `drop()` and `drop_duplicates()`

In [1]:
import pandas as pd
import numpy as np

## Combining Series

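Pandas offers two methods for combining Series: `combine()` merges two Series element by element with a function you supply, and `combine_first()` fills the missing values of one Series from another.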

### The `combine()` Method

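The `combine()` method combines two Series element by element: for each label in the union of the two indexes, it calls a function you supply on the pair of values. When a label is missing from one Series, `fill_value` is used in its place (NaN by default).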

In [2]:
# Create two Series with bird speeds
s1 = pd.Series({'falcon': 330.0, 'eagle': 160.0})
s2 = pd.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})

print("Series 1:")
print(s1)
print("\nSeries 2:")
print(s2)

Series 1:
falcon    330.0
eagle     160.0
dtype: float64

Series 2:
falcon    345.0
eagle     200.0
duck       30.0
dtype: float64


In [3]:
# Combine using max; 'duck' is missing from s1, so max(NaN, 30.0) yields NaN
result = s1.combine(s2, max)
print("Combined Series (max):")
print(result)

Combined Series (max):
duck        NaN
eagle     200.0
falcon    345.0
dtype: float64


In [4]:
# fill_value=0 stands in for the missing 'duck' entry in s1, so max(0, 30.0) = 30.0
result_filled = s1.combine(s2, max, fill_value=0)
print("Combined Series (max) with fill_value=0:")
print(result_filled)

Combined Series (max) with fill_value=0:
duck       30.0
eagle     200.0
falcon    345.0
dtype: float64


In [5]:
# Combine using min function
result_min = s1.combine(s2, min, fill_value=1000)
print("Combined Series (min) with fill_value=1000:")
print(result_min)

Combined Series (min) with fill_value=1000:
duck       30.0
eagle     160.0
falcon    330.0
dtype: float64


In [6]:
# Custom combining function
def average(x, y):
    return (x + y) / 2

result_avg = s1.combine(s2, average, fill_value=0)
print("Combined Series (average) with fill_value=0:")
print(result_avg)

Combined Series (average) with fill_value=0:
duck       15.0
eagle     180.0
falcon    337.5
dtype: float64
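
To make the element-wise behavior concrete, the sketch below reimplements the logic of `combine()` with an explicit loop. It assumes the method takes the union of both indexes and substitutes `fill_value` for a label missing on either side. The helper name `manual_combine` is hypothetical, and the loop only approximates what pandas does internally.

```python
import numpy as np
import pandas as pd

def manual_combine(left, right, func, fill_value=np.nan):
    """Hypothetical helper: approximate Series.combine() with an explicit loop."""
    # The result index is the (sorted) union of both indexes.
    labels = left.index.union(right.index)
    # Look up each label on both sides, falling back to fill_value when absent.
    values = [
        func(left.get(label, fill_value), right.get(label, fill_value))
        for label in labels
    ]
    return pd.Series(values, index=labels)

s1 = pd.Series({'falcon': 330.0, 'eagle': 160.0})
s2 = pd.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})

manual = manual_combine(s1, s2, max, fill_value=0)
builtin = s1.combine(s2, max, fill_value=0)
print(manual)
print("Matches s1.combine():", manual.equals(builtin))

# Without a fill_value, the missing 'duck' entry becomes NaN and max() keeps it.
manual_default = manual_combine(s1, s2, max)
print("duck is NaN:", np.isnan(manual_default['duck']))
```

The loop also shows why `combine()` can be slow on large Series: the function is called once per label rather than on whole arrays.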


### The `combine_first()` Method

The `combine_first()` method keeps each non-null value from the calling Series and fills the remaining positions with values from the other Series. The result's index is the union of both indexes.

In [7]:
# Create two Series with some missing values
s1 = pd.Series([1, np.nan, 3, np.nan])
s2 = pd.Series([np.nan, 4, 5, 6])

print("Series 1:")
print(s1)
print("\nSeries 2:")
print(s2)

Series 1:
0    1.0
1    NaN
2    3.0
3    NaN
dtype: float64

Series 2:
0    NaN
1    4.0
2    5.0
3    6.0
dtype: float64


In [8]:
# Combine first
result = s1.combine_first(s2)
print("Combined Series (s1 values prioritized):")
print(result)

Combined Series (s1 values prioritized):
0    1.0
1    4.0
2    3.0
3    6.0
dtype: float64


In [9]:
# Reverse the order
result_reversed = s2.combine_first(s1)
print("Combined Series (s2 values prioritized):")
print(result_reversed)

Combined Series (s2 values prioritized):
0    1.0
1    4.0
2    5.0
3    6.0
dtype: float64


In [10]:
# With different indexes: the result covers the union of both label sets
s3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s4 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])

print("Series 3:")
print(s3)
print("\nSeries 4:")
print(s4)

result_diff_idx = s3.combine_first(s4)
print("\nCombined Series (s3 values prioritized):")
print(result_diff_idx)

Series 3:
a    1
b    2
c    3
dtype: int64

Series 4:
b    4
c    5
d    6
dtype: int64

Combined Series (s3 values prioritized):
a    1
b    2
c    3
d    6
dtype: int64
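
One way to read `combine_first()` is as "align, then fill the gaps". The sketch below checks that reading against the built-in method. It uses float data throughout so that the intermediate NaN values do not change the dtype; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, np.nan, 3, np.nan])
s2 = pd.Series([np.nan, 4, 5, 6])

# With identical indexes, filling the gaps in s1 from s2 gives the same values.
via_fillna = s1.fillna(s2)
print("fillna matches:", via_fillna.equals(s1.combine_first(s2)))

# With different indexes, first align both Series on the union of labels,
# then keep the left value wherever it is non-null.
s3 = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
s4 = pd.Series([4.0, 5.0, 6.0], index=['b', 'c', 'd'])
left, right = s3.align(s4, join='outer')
via_where = left.where(left.notna(), right)
print(via_where)
print("where matches:", via_where.equals(s3.combine_first(s4)))
```

Label `'b'` is null in `s3`, so it takes the value from `s4`; label `'c'` is present in both, so the value from `s3` wins.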


## Converting Data Types

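Data often ends up with a less specific dtype than its contents warrant, for example `float64` for integers with a missing value. Pandas can infer a better-fitting dtype for existing data.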

### The `convert_dtypes()` Method

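The `convert_dtypes()` method converts a Series, or each column of a DataFrame, to the best matching dtype that supports `pd.NA`, such as `Int64`, `Float64`, `string`, or `boolean`. Data that cannot be represented by a single such dtype is left as `object`.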

In [11]:
# Create a Series with mixed types
s = pd.Series([1, 2, None, 'a', 'b', True, False])
print("Original Series:")
print(s)
print("\nData type:", s.dtype)

Original Series:
0        1
1        2
2     None
3        a
4        b
5     True
6    False
dtype: object

Data type: object


In [12]:
# Convert dtypes; the values mix integers, strings and booleans, so the dtype stays object
s_converted = s.convert_dtypes()
print("Converted Series:")
print(s_converted)
print("\nData type:", s_converted.dtype)

Converted Series:
0        1
1        2
2     None
3        a
4        b
5     True
6    False
dtype: object

Data type: object
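
The mixed Series above is left as `object` because no single nullable dtype covers integers, strings, and booleans at once. As a contrast, the sketch below converts Series that each hold one kind of value plus a missing entry. The variable names are illustrative, and the exact dtype labels printed may vary slightly between pandas versions.

```python
import pandas as pd

# Each Series holds one kind of value plus a missing entry.
ints = pd.Series([1, 2, None])           # inferred as float64 because of the missing value
floats = pd.Series([1.5, 2.0, None])     # float64 with a fractional value
bools = pd.Series([True, False, None])   # inferred as object

ints_converted = ints.convert_dtypes()
floats_converted = floats.convert_dtypes()
bools_converted = bools.convert_dtypes()

print(ints.dtype, "->", ints_converted.dtype)
print(floats.dtype, "->", floats_converted.dtype)
print(bools.dtype, "->", bools_converted.dtype)
```

Whole-number floats become `Int64`, while a column with a fractional value stays floating point as `Float64`.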


In [13]:
# Create a DataFrame whose columns hold different types, each with a missing value
df = pd.DataFrame({
    'A': [1, 2, None],
    'B': [1.0, 2.0, None],
    'C': ['a', 'b', None],
    'D': [True, False, None]
})
print("Original DataFrame:")
print(df)
print("\nData types:")
print(df.dtypes)

Original DataFrame:
     A    B     C      D
0  1.0  1.0     a   True
1  2.0  2.0     b  False
2  NaN  NaN  None   None

Data types:
A    float64
B    float64
C     object
D     object
dtype: object


In [14]:
# Convert dtypes
df_converted = df.convert_dtypes()
print("Converted DataFrame:")
print(df_converted)
print("\nData types:")
print(df_converted.dtypes)

Converted DataFrame:
      A     B     C      D
0     1     1     a   True
1     2     2     b  False
2  <NA>  <NA>  <NA>   <NA>

Data types:
A             Int64
B             Int64
C    string[python]
D           boolean
dtype: object


In [15]:
# Disable the integer and boolean conversions: A and B become Float64, D stays object
df_custom = df.convert_dtypes(convert_integer=False, convert_boolean=False)
print("Custom converted DataFrame:")
print(df_custom)
print("\nData types:")
print(df_custom.dtypes)

Custom converted DataFrame:
      A     B     C      D
0   1.0   1.0     a   True
1   2.0   2.0     b  False
2  <NA>  <NA>  <NA>   None

Data types:
A           Float64
B           Float64
C    string[python]
D            object
dtype: object
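
One practical reason to prefer the nullable dtypes is that integers remain integers even when values are missing. The following is a minimal sketch of that behavior, with illustrative variable names.

```python
import pandas as pd

raw = pd.Series([1, 2, None])        # float64: NaN forces a float representation
nullable = raw.convert_dtypes()      # Int64: the missing value is stored as pd.NA

# Arithmetic keeps the Int64 dtype and propagates the missing value.
incremented = nullable + 1
print(incremented)

# Reductions skip missing values by default.
print("Sum:", nullable.sum())

# With convert_integer=False the same data is kept as nullable floating point.
as_float = raw.convert_dtypes(convert_integer=False)
print("Without integer conversion:", as_float.dtype)
```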


## Dropping Elements

Elements can be removed from a Series either by index label with `drop()` or by repeated value with `drop_duplicates()`.

### The `drop()` Method

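The `drop()` method returns a new Series with the specified index labels removed. For a MultiIndex, the `level` parameter selects which level the labels refer to.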

In [16]:
# Create a Series
s = pd.Series(data=np.arange(3), index=['A', 'B', 'C'])
print("Original Series:")
print(s)

Original Series:
A    0
B    1
C    2
dtype: int64


In [17]:
# Drop labels
s_dropped = s.drop(labels=['B', 'C'])
print("Series after dropping labels 'B' and 'C':")
print(s_dropped)

Series after dropping labels 'B' and 'C':
A    0
dtype: int64


In [18]:
# Drop in-place: modifies s_inplace directly and returns None
s_inplace = s.copy()
s_inplace.drop(labels=['B'], inplace=True)
print("Series after in-place dropping label 'B':")
print(s_inplace)

Series after in-place dropping label 'B':
A    0
C    2
dtype: int64
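
By default `drop()` raises a `KeyError` when a requested label is not in the index. The sketch below shows that failure and the `errors='ignore'` option, which skips labels that are absent. The label `'Z'` is an invented example of a missing label.

```python
import numpy as np
import pandas as pd

s = pd.Series(data=np.arange(3), index=['A', 'B', 'C'])

# Dropping a label that is not in the index raises KeyError by default.
try:
    s.drop(labels=['Z'])
except KeyError as exc:
    print("KeyError:", exc)

# errors='ignore' skips missing labels and drops only those that exist.
s_safe = s.drop(labels=['B', 'Z'], errors='ignore')
print(s_safe)
```

This is useful when the list of labels comes from another source and may not match the index exactly.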


In [19]:
# Create a Series with MultiIndex
midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
                             ['speed', 'weight', 'length']],
                     codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
                            [0, 1, 2, 0, 1, 2, 0, 1, 2]])
s_multi = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
                    index=midx)
print("Series with MultiIndex:")
print(s_multi)

Series with MultiIndex:
lama    speed      45.0
        weight    200.0
        length      1.2
cow     speed      30.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64


In [20]:
# Drop by level
s_dropped_level = s_multi.drop(labels='weight', level=1)
print("Series after dropping 'weight' at level 1:")
print(s_dropped_level)

Series after dropping 'weight' at level 1:
lama    speed      45.0
        length      1.2
cow     speed      30.0
        length      1.5
falcon  speed     320.0
        length      0.3
dtype: float64


In [21]:
# Drop multiple labels at a specific level
s_dropped_multi = s_multi.drop(labels=['falcon', 'cow'], level=0)
print("Series after dropping 'falcon' and 'cow' at level 0:")
print(s_dropped_multi)

Series after dropping 'falcon' and 'cow' at level 0:
lama  speed      45.0
      weight    200.0
      length      1.2
dtype: float64
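
The `level` argument removes every entry that matches a label on one level. To remove a single entry instead, a full label tuple can be passed without `level`. The sketch below rebuilds the same data with `pd.MultiIndex.from_product` for brevity, which is a substitution for the `levels`/`codes` construction used above.

```python
import pandas as pd

# Same labels and order as the Series above, built from the product of two lists.
midx = pd.MultiIndex.from_product([['lama', 'cow', 'falcon'],
                                   ['speed', 'weight', 'length']])
s_multi = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)

# A tuple names one full (level 0, level 1) combination, so only that entry is removed.
s_one_removed = s_multi.drop(labels=[('lama', 'weight')])
print(s_one_removed)
```

The `weight` entries for `cow` and `falcon` remain, because only the exact `('lama', 'weight')` combination was named.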


### The `drop_duplicates()` Method

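The `drop_duplicates()` method returns a Series with repeated values removed. The `keep` parameter controls which occurrence survives: `'first'` (the default), `'last'`, or `False` to drop every value that appears more than once.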

In [22]:
# Create a Series with duplicates
s = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
              name='animal')
print("Series with duplicates:")
print(s)

Series with duplicates:
0      lama
1       cow
2      lama
3    beetle
4      lama
5     hippo
Name: animal, dtype: object


In [23]:
# Drop duplicates (keep first occurrence)
s_no_dups = s.drop_duplicates()
print("Series with duplicates dropped (keep='first'):")
print(s_no_dups)

Series with duplicates dropped (keep='first'):
0      lama
1       cow
3    beetle
5     hippo
Name: animal, dtype: object


In [24]:
# Drop duplicates (keep last occurrence)
s_keep_last = s.drop_duplicates(keep='last')
print("Series with duplicates dropped (keep='last'):")
print(s_keep_last)

Series with duplicates dropped (keep='last'):
1       cow
3    beetle
4      lama
5     hippo
Name: animal, dtype: object


In [25]:
# Drop all duplicates
s_drop_all = s.drop_duplicates(keep=False)
print("Series with all duplicates dropped (keep=False):")
print(s_drop_all)

Series with all duplicates dropped (keep=False):
1       cow
3    beetle
5     hippo
Name: animal, dtype: object


In [26]:
# Drop duplicates in-place: modifies s_inplace directly and returns None
s_inplace = s.copy()
s_inplace.drop_duplicates(inplace=True)
print("Series after in-place dropping duplicates:")
print(s_inplace)

Series after in-place dropping duplicates:
0      lama
1       cow
3    beetle
5     hippo
Name: animal, dtype: object
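
`drop_duplicates()` can also be understood through the boolean mask returned by `duplicated()`. The sketch below checks that filtering with the mask gives the same result, and shows how to renumber the index afterwards with `reset_index(drop=True)`.

```python
import pandas as pd

s = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'], name='animal')

# duplicated() marks every repeat of an earlier value as True.
mask = s.duplicated(keep='first')
print(mask.tolist())

# Filtering out the marked entries gives the same result as drop_duplicates().
via_mask = s[~mask]
print("Matches drop_duplicates():", via_mask.equals(s.drop_duplicates()))

# The original labels (0, 1, 3, 5) are kept; reset them for a clean 0..n-1 index.
renumbered = via_mask.reset_index(drop=True)
print(renumbered)
```

The mask is handy when the duplicates themselves need inspecting before they are discarded, for example with `s[mask]`.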


## Conclusion

In this notebook, we've explored various Series methods in pandas:

1. Combining Series with `combine()`, which applies a function to each pair of elements, and `combine_first()`, which fills missing values in one Series from another.
2. Converting data types with `convert_dtypes()`, which moves data to the best matching dtype that supports `pd.NA` and leaves mixed data as `object`.
3. Dropping elements with `drop()`, which removes entries by index label, and `drop_duplicates()`, which removes repeated values.

Together these methods cover common data cleaning steps: merging overlapping data, tightening dtypes, and removing unwanted entries.