## Series in Pandas

- A Pandas Series is a one-dimensional labeled array that can hold any data type, such as integers, floats, strings, or even Python objects. It is similar to a column in an Excel spreadsheet or a dictionary where each value has an associated label (index).
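
To make the dictionary comparison concrete, here is a minimal sketch that builds a Series directly from a Python dict. The fruit names and prices are made-up illustration data, not part of the examples that follow.

```python
import pandas as pd

# Hypothetical data: the dict keys become index labels, the values become the data
prices = pd.Series({"apple": 1.20, "banana": 0.50, "cherry": 3.00})

print(prices["banana"])     # look up a value by its label, like a dict
print(list(prices.index))   # the labels, in insertion order
```

Unlike a plain dict, the result also supports positional access and vectorized arithmetic, which later sections cover.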


In [1]:
import pandas as pd

# Creating a Pandas Series

# 1) From a List

In [2]:
series1 = pd.Series([1, 2, 3, 4, 5])  # Creating from a list
print(series1)


0    1
1    2
2    3
3    4
4    5
dtype: int64


# 2) Creating Pandas Series from a List with a Custom Index

In [3]:
# Creating a Series with custom index
series_custom_index = pd.Series([100, 200, 300], index=['A', 'B', 'C'])

# Display the Series
print(series_custom_index)


A    100
B    200
C    300
dtype: int64


# 3) Creating Pandas Series from a NumPy Array

In [4]:
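import numpy as np

# Creating a NumPy array. Its default integer dtype is platform-dependent
# (int32 on Windows with NumPy 1.x, int64 elsewhere), which is why the
# output below shows int32.
np_array = np.array([1, 2, 3, 4, 5])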

# Creating a Series from the NumPy array
series_numpy = pd.Series(np_array)

# Display the Series
print(series_numpy)


0    1
1    2
2    3
3    4
4    5
dtype: int32


# 4) Creating an Empty Pandas Series

In [5]:
# Creating an empty Series
empty_series = pd.Series(dtype='float64')

# Display the empty Series
print(empty_series)


Series([], dtype: float64)


# 5) Creating a Pandas Series with a Constant Value

In [6]:
# Creating a Series where all values are 5
constant_series = pd.Series(5, index=['A', 'B', 'C', 'D'])

# Display the Series
print(constant_series)


A    5
B    5
C    5
D    5
dtype: int64


# 6) Creating a Series Using range()


In [7]:
# Creating a Series from range()
series_range = pd.Series(range(1, 6))

# Display the Series
print(series_range)


0    1
1    2
2    3
3    4
4    5
dtype: int64


# Features of Pandas Series

# 1) Homogeneous Data Type

- A Series can technically hold values of different types, but it is optimized for a single (homogeneous) data type, much like a NumPy array.

In [8]:
import pandas as pd

# Creating a Series with integers
series_int = pd.Series([10, 20, 30, 40])

# Creating a Series with floats
series_float = pd.Series([10.5, 20.7, 30.9])

# Creating a Series with strings
series_str = pd.Series(["Apple", "Banana", "Cherry"])

# Display the Series
print(series_int)
print(series_float)
print(series_str)


0    10
1    20
2    30
3    40
dtype: int64
0    10.5
1    20.7
2    30.9
dtype: float64
0     Apple
1    Banana
2    Cherry
dtype: object
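
As a rough illustration of what "homogeneous" means in practice, the sketch below shows how pandas picks one dtype for the whole Series when the inputs differ. The variable names are made up for this example.

```python
import pandas as pd

ints_only = pd.Series([1, 2, 3])
ints_and_float = pd.Series([1, 2, 3.5])     # a single float upcasts every value to float64
ints_and_text = pd.Series([1, 2, "three"])  # no common numeric type: falls back to object

print(ints_only.dtype)
print(ints_and_float.dtype)
print(ints_and_text.dtype)
```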


# 2) Labeled Indexing (Custom Index Support)

In [9]:
# Creating a Series with a custom index
series_custom_index = pd.Series([100, 200, 300], index=['A', 'B', 'C'])

# Accessing data using index labels
print(series_custom_index['B'])  # Output: 200


200


# 3) Supports Mixed Data Types

- A Pandas Series can hold mixed data types, but the result is an object-dtype Series (the same fallback a NumPy array uses with dtype=object), which gives up most of the speed benefits of a homogeneous dtype.

In [10]:
# Creating a Series with mixed data types
series_mixed = pd.Series([10, "Hello", 3.14, True])

# Display the Series
print(series_mixed)


0       10
1    Hello
2     3.14
3     True
dtype: object


# 4) Supports Missing Data Handling (NaN Values)

In [11]:
import numpy as np

# Creating a Series with NaN values
series_nan = pd.Series([10, np.nan, 30, None, 50])

# Checking for missing values
print(series_nan.isnull())  # Returns True for missing values (NaN or None)


0    False
1     True
2    False
3     True
4    False
dtype: bool
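
A small sketch of what happens behind the scenes, assuming the same values as above: in a numeric Series, `None` is converted to `NaN`, so the dtype becomes float64 even though the other values are integers.

```python
import numpy as np
import pandas as pd

series_nan = pd.Series([10, np.nan, 30, None, 50])

print(series_nan.dtype)           # float64, because NaN is a float value
print(series_nan.isnull().sum())  # number of missing values
print(series_nan.notna())         # inverse of isnull()
```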


# 5) Supports Fast Vectorized Operations (Similar to NumPy)

- Pandas Series allows vectorized operations, meaning mathematical operations apply element-wise.

In [12]:
# Creating a numeric Series
series_numbers = pd.Series([10, 20, 30, 40])

# Performing vectorized operations
print(series_numbers * 2)  # Multiplication
print(series_numbers + 5)  # Addition


0    20
1    40
2    60
3    80
dtype: int64
0    15
1    25
2    35
3    45
dtype: int64
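
One difference from NumPy worth sketching: arithmetic between two Series aligns values by index label, not by position. The two Series below are made up for illustration.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Labels present in only one Series produce NaN
total = a + b
print(total)

# add() with fill_value treats a missing label as 0 instead
filled = a.add(b, fill_value=0)
print(filled)
```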


# 6) Supports Filtering and Conditional Selection

- We can filter values based on conditions.

In [13]:
# Creating a numeric Series
series_numbers = pd.Series([10, 20, 30, 40, 50])

# Filtering values greater than 25
filtered_series = series_numbers[series_numbers > 25]

# Display filtered values
print(filtered_series)


2    30
3    40
4    50
dtype: int64
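
Conditions can also be combined. The sketch below uses `&` (and), `|` (or), and `~` (not); each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

series_numbers = pd.Series([10, 20, 30, 40, 50])

in_range = series_numbers[(series_numbers > 15) & (series_numbers < 45)]
extremes = series_numbers[(series_numbers < 15) | (series_numbers > 45)]
not_thirty = series_numbers[~(series_numbers == 30)]

print(in_range.tolist())
print(extremes.tolist())
print(not_thirty.tolist())
```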


# 7) Supports Sorting and Ranking

- Pandas Series provides built-in sorting and ranking functions.



In [14]:
# Creating a numeric Series
series_unsorted = pd.Series([30, 10, 50, 20])

# Sorting values
sorted_series = series_unsorted.sort_values()

# Ranking elements
ranked_series = series_unsorted.rank()

print(sorted_series)
print(ranked_series)


1    10
3    20
0    30
2    50
dtype: int64
0    3.0
1    1.0
2    4.0
3    2.0
dtype: float64


# Accessing Elements in a Series

# Accessing by Index

In [15]:
series_1 = pd.Series([1,2,3,4,5])
print(series_1[2])   # Access the element with index label 2 (the default index is 0, 1, 2, ...)


3
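
With the default index, label and position coincide, which hides an important distinction. The following sketch uses a made-up Series whose integer labels are deliberately out of order to show that `[]` with an integer looks up a label, while `.iloc[]` looks up a position.

```python
import pandas as pd

# Integer labels that do not match the positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s[2])       # label 2
print(s.loc[2])   # label 2, stated explicitly
print(s.iloc[2])  # position 2 (the third element)
```

Using `.loc[]` and `.iloc[]` explicitly avoids this ambiguity.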


# Accessing by Custom Index

In [16]:
series4 = pd.Series([100, 200, 300], index=['A', 'B', 'C'])
print(series4['B'])  # Access value at custom index 'B' (Output: 200)


200


# Slicing in Pandas Series

- Slicing is a technique used to extract a subset of data from a Pandas Series based on index positions or labels.

- It allows us to retrieve a range of elements from the Series, similar to Python list slicing and NumPy Array slicing.

- Extracts specific portions of data for analysis.

- Helps in filtering data efficiently.

- Works on both numerical and labeled indices.

- Supports advanced indexing with step values (start:stop:step).


# 1) Position-Based Slicing Using .iloc[]

In [17]:
import pandas as pd

# Creating a Pandas Series
data = pd.Series([10, 20, 30, 40, 50, 60, 70])

# Slicing from index position 1 to 4 (excluding index 4)
subset = data.iloc[1:4]

# Display the sliced data
print(subset)


1    20
2    30
3    40
dtype: int64


# 2) Label-Based Slicing Using .loc[]

- This method selects data based on custom index labels.

- series.loc[start_label:end_label]

- start_label: Starting index label (inclusive).

- end_label: Ending index label (inclusive).


In [18]:
# Creating a Pandas Series with custom index labels
data = pd.Series([100, 200, 300, 400, 500], index=['A', 'B', 'C', 'D', 'E'])

# Slicing from label 'B' to 'D' (both inclusive)
subset = data.loc['B':'D']

# Display the sliced data
print(subset)


B    200
C    300
D    400
dtype: int64
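
The inclusive end of `.loc[]` is easy to mix up with the exclusive end of `.iloc[]`. A short side-by-side sketch on the same data:

```python
import pandas as pd

data = pd.Series([100, 200, 300, 400, 500], index=["A", "B", "C", "D", "E"])

by_label = data.loc["B":"D"]   # end label 'D' is included
by_position = data.iloc[1:3]   # end position 3 is excluded

print(by_label.tolist())
print(by_position.tolist())
```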


# 3) Slicing with Step

- series[start:stop:step]

- Both .iloc[] and .loc[] support step values.

In [19]:
# Creating a Pandas Series
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90])

# Slicing every second element from index 1 to 7
subset = data.iloc[1:8:2]

# Display the sliced data
print(subset)


1    20
3    40
5    60
7    80
dtype: int64
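
For completeness, a step also works with label-based slicing. This sketch assumes a made-up Series with letter labels; the end label stays inclusive.

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50, 60, 70], index=list("ABCDEFG"))

# Every second element from label 'B' through label 'F'
every_other = data.loc["B":"F":2]
print(every_other)
```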


# 4) Boolean Masking (Conditional Slicing)

- We can slice data based on conditions.

# Filtering Values Greater than 30

In [20]:
# Creating a Pandas Series
data = pd.Series([10, 20, 30, 40, 50, 60])

# Selecting values greater than 30
subset = data[data > 30]

# Display the filtered data
print(subset)


3    40
4    50
5    60
dtype: int64


# Key Features:

- Extracts data conditionally.

- Works without explicit slicing syntax.

# 5) Reverse Slicing (Negative Indexing)

- We can use negative indexing to slice from the end.

In [21]:
# Selecting last three elements
print(data.iloc[-3:])

# Reversing the Series
print(data.iloc[::-1])


3    40
4    50
5    60
dtype: int64
5    60
4    50
3    40
2    30
1    20
0    10
dtype: int64


# 6) Slicing with [start:stop:step]

- series[start:stop:step]

- start → The index where slicing begins (inclusive).

- stop → The index where slicing ends (exclusive).

- step → The interval or step size between elements.

In [22]:
import pandas as pd

# Creating a Pandas Series
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Slicing from position 2 up to (but not including) position 8, with step 2
# Note: plain [] slicing with integers is positional
subset = data[2:8:2]

# Display the sliced Series
print(subset)


2    30
4    50
6    70
dtype: int64


# 7) Reverse Slicing Using a Negative Step

- We can use negative step values to slice in reverse order.

# 1) Reverse the Entire Series.

In [23]:
# Reversing the entire Series
subset = data[::-1]

print(subset)


9    100
8     90
7     80
6     70
5     60
4     50
3     40
2     30
1     20
0     10
dtype: int64


# 2) Reverse Subset of Series

In [24]:
# Reverse slicing from position 8 down to (but not including) position 2, in steps of 2
subset = data[8:2:-2]

print(subset)


8    90
6    70
4    50
dtype: int64


# 3) Extracting the Last n Elements

In [25]:
# Selecting last 4 elements
subset = data[-4:]

print(subset)


6     70
7     80
8     90
9    100
dtype: int64


# 1) series.loc[] - Access by Label (Index)

- Useful for label-based indexing when working with named indexes.


In [26]:
series = pd.Series([10, 20, 30], index=['A', 'B', 'C'])

# Accessing element by index label
print(series.loc['B'])  # Output: 20


20


# 2) series.iloc[] - Access by Position

- Useful for position-based indexing, similar to NumPy arrays.

In [27]:
# Accessing element by position
print(series.iloc[1])  # Output: 20


20


# 3) series.head(n) - First n Elements

In [28]:
print(series.head(2))  # First 2 elements of the Series.


A    10
B    20
dtype: int64


# 4) series.tail(n) - Last n Elements

- Useful for checking data at the end of a Series.



In [29]:
print(series.tail(2))  # Last 2 elements of the Series.


B    20
C    30
dtype: int64


# Statistical and Summary Methods

- Methods for statistical analysis and data insights.

# 1) series.describe() - Summary Statistics of the Series

In [30]:
series = pd.Series([10, 20, 30, 40, 50])

# Summary statistics of numerical Series
print(series.describe())


count     5.000000
mean     30.000000
std      15.811388
min      10.000000
25%      20.000000
50%      30.000000
75%      40.000000
max      50.000000
dtype: float64
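
`describe()` adapts to the data it is given. As a sketch, calling it on text data (made-up fruit names here) reports counts and the most frequent value instead of numeric statistics.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "cherry"])

summary = fruits.describe()
print(summary)  # rows: count, unique, top, freq
```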


# 2) series.mean() - Mean (Average) of the Series

- Useful for finding the central tendency of the data.


In [31]:
print(series.mean())


30.0


# 3) series.median() - Median of the Series

- Useful for skewed datasets where the mean is misleading.


In [32]:
print(series.median())


30.0
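
In the symmetric Series above, the mean and median are identical. A quick sketch with one made-up outlier shows how they diverge on skewed data:

```python
import pandas as pd

# Hypothetical values with a single large outlier
skewed = pd.Series([10, 20, 30, 40, 1000])

print(skewed.mean())    # pulled upward by the outlier
print(skewed.median())  # unaffected by the outlier
```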


# 4) series.mode() - Most Frequent Value of the Series

- Useful for categorical data analysis. It returns a Series because more than one value can be tied for most frequent.

In [33]:
series_mode = pd.Series([10, 20, 20, 30, 40])
print(series_mode.mode())


0    20
dtype: int64
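
The output above is a Series, not a single number. A short sketch with made-up tied values shows why:

```python
import pandas as pd

tied = pd.Series([1, 1, 2, 2, 3])

modes = tied.mode()
print(modes.tolist())   # both tied values are returned, in sorted order

# To get one value, pick the first entry explicitly
first_mode = modes.iloc[0]
print(first_mode)
```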


# 5) series.min() & series.max() - Minimum and Maximum Elements of the Series

- Useful for identifying extreme values.


In [34]:
print(series.min())  # Minimum element of the Series
print(series.max())  # Maximum element of the Series


10
50


# 6) series.sum() - Sum of Elements of the Series

In [35]:
print(series.sum())  # Sum of the Series elements


150


# 7) series.count() - Number of Non-NaN Elements

In [36]:
series_with_nan = pd.Series([10, 20, None, 30])
print(series_with_nan.count())  # Count the number of non-NaN elements in the Series


3
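
`count()` is not the same as the length of the Series. A minimal comparison on the same data:

```python
import pandas as pd

series_with_nan = pd.Series([10, 20, None, 30])

print(len(series_with_nan))             # total number of elements, including NaN
print(series_with_nan.size)             # same as len() for a Series
print(series_with_nan.count())          # non-NaN elements only
print(series_with_nan.isnull().sum())   # number of missing elements
```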


# 8) series.value_counts() - Frequency Count of the Series Elements

In [37]:
series_category = pd.Series(['A', 'B', 'A', 'C', 'B', 'B'])
print(series_category.value_counts())


B    3
A    2
C    1
Name: count, dtype: int64
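
If proportions are more useful than raw counts, `value_counts()` accepts `normalize=True`. A brief sketch on the same categories:

```python
import pandas as pd

series_category = pd.Series(['A', 'B', 'A', 'C', 'B', 'B'])

# Relative frequencies instead of counts; the values sum to 1
shares = series_category.value_counts(normalize=True)
print(shares)
```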


# Data Transformation Methods

# 1) series.apply() - Apply Function to Each Element in the Series

In [38]:
print(series.apply(lambda x: x * 2))


0     20
1     40
2     60
3     80
4    100
dtype: int64
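
For simple arithmetic like this, the vectorized expression gives the same result and is generally the faster choice; `apply()` is more useful when the logic cannot be expressed with built-in operations. A sketch comparing the two:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

doubled_apply = series.apply(lambda x: x * 2)  # calls the function once per element
doubled_vectorized = series * 2                # one operation on the whole Series

print(doubled_apply.equals(doubled_vectorized))
```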


# 2) series.map() - Element-wise Mapping

- Useful for replacing values based on a mapping. Values with no matching key become NaN.

In [39]:
mapping_dict = {10: 'Low', 20: 'Medium', 30: 'High'}
print(series.map(mapping_dict))


0       Low
1    Medium
2      High
3       NaN
4       NaN
dtype: object
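
The NaN entries above come from values that have no key in the dictionary. One possible way to handle them, sketched here with a made-up "Unknown" label, is to chain `fillna()`; `map()` also accepts a function, which avoids unmapped values altogether.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])
mapping_dict = {10: 'Low', 20: 'Medium', 30: 'High'}

# Give unmapped values a default label (the label is an arbitrary choice)
labeled = series.map(mapping_dict).fillna('Unknown')
print(labeled.tolist())

# Alternative: map with a function so every value gets a result
by_rule = series.map(lambda x: 'High' if x >= 30 else 'Low')
print(by_rule.tolist())
```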


# 3) series.replace() - Replace Values in a Series

- Replacing outliers or incorrect values.


In [40]:
print(series.replace({10: 100, 20: 200}))


0    100
1    200
2     30
3     40
4     50
dtype: int64


# Handling Missing Data

- Methods to detect and handle NaN values.

# 1) series.isnull() - Check Missing Values in a Series

In [44]:
print(series_with_nan.isnull())  # Returns True for NaN values


0    False
1    False
2     True
3    False
dtype: bool


# 2) series.fillna(value) - Fill NaN with a Value

In [45]:
print(series_with_nan.fillna(0))  # Replace NaN with 0


0    10.0
1    20.0
2     0.0
3    30.0
dtype: float64
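
A constant is not the only fill strategy. The sketch below, using made-up readings, fills gaps with the mean of the known values and, alternatively, with the previous valid value via `ffill()`.

```python
import pandas as pd

# Hypothetical readings with two gaps
readings = pd.Series([10.0, None, 40.0, None])

filled_mean = readings.fillna(readings.mean())  # mean() skips NaN by default
filled_forward = readings.ffill()               # carry the last valid value forward

print(filled_mean.tolist())
print(filled_forward.tolist())
```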


# 3) series.dropna() - Remove NaN Values

In [46]:
print(series_with_nan.dropna())  # Removes NaN values


0    10.0
1    20.0
3    30.0
dtype: float64
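
Note the gap in the index above (0, 1, 3): `dropna()` keeps the original labels and returns a new Series rather than modifying the original. A sketch of renumbering the result afterwards:

```python
import pandas as pd

series_with_nan = pd.Series([10, 20, None, 30])

cleaned = series_with_nan.dropna()
renumbered = cleaned.reset_index(drop=True)  # discard old labels, start again from 0

print(list(cleaned.index))
print(list(renumbered.index))
print(series_with_nan.isnull().sum())  # the original still contains its NaN
```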


# Sorting and Ranking

# 1) series.sort_values() - Sort Values in a Series

In [47]:
print(series.sort_values(ascending=False))  # Sort in descending order


4    50
3    40
2    30
1    20
0    10
dtype: int64
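
Sorting can also be done by label instead of by value, using `sort_index()`. A minimal sketch with made-up letter labels:

```python
import pandas as pd

s = pd.Series([10, 30, 20], index=['c', 'a', 'b'])

by_value = s.sort_values()  # orders by the data
by_label = s.sort_index()   # orders by the index labels

print(list(by_value.index))
print(list(by_label.index))
```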


# 2) series.rank() - Rank Elements in a Series

In [48]:
print(series.rank())  # Assigns rank to each element


0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
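
Ranks are returned as floats because tied values share the average of their ranks by default. The sketch below, with made-up scores containing a tie, compares the default with two other `method` options.

```python
import pandas as pd

scores = pd.Series([50, 20, 50, 10])

average_rank = scores.rank()                                # ties share the average rank
min_rank = scores.rank(method='min')                        # ties share the lowest rank
dense_desc = scores.rank(method='dense', ascending=False)   # highest value ranks 1, no gaps

print(average_rank.tolist())
print(min_rank.tolist())
print(dense_desc.tolist())
```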


# String Methods (For Text Data)

# 1) series.str.upper() - Convert to Uppercase

In [49]:
series_str = pd.Series(["hello", "world"])
print(series_str.str.upper())


0    HELLO
1    WORLD
dtype: object


# 2) series.str.contains() - Check Substring

In [50]:
print(series_str.str.contains("o"))


0    True
1    True
dtype: bool
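
`str.contains()` treats its pattern as a regular expression and is case-sensitive by default. A short sketch of the `case`, `na`, and `regex` parameters, using made-up strings:

```python
import pandas as pd

names = pd.Series(["Hello", "world", None])

# Case-insensitive match; missing values count as False
has_h = names.str.contains("h", case=False, na=False)
print(has_h.tolist())

# regex=False matches the pattern literally, so "." means a dot, not "any character"
dotted = pd.Series(["a.b", "axb"])
print(dotted.str.contains(".", regex=False).tolist())
print(dotted.str.contains(".").tolist())
```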


In [51]:
##DSA in Python 