### What is Pandas?
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

### What is a Pandas Series?
A Pandas Series is a one-dimensional labeled array that can hold any data type (integers, floats, strings, Python objects, etc.). It is similar to a column in an Excel sheet or a list in Python, but with an associated index for each value.

In [1]:
# import pandas
import pandas as pd

# import numpy
import numpy as np

# Creating a Pandas Series

### Creating a Pandas Series from Python Lists

In [2]:
# Creating a Pandas Series with Integer Data

num = [1, 2, 3, 4]
pd.Series(num)

0    1
1    2
2    3
3    4
dtype: int64

In [3]:
# Creating a series with String Data

country = ["India", "Pakistan", "USA", "Nepal", "Sri Lanka"]
pd.Series(country)

0        India
1     Pakistan
2          USA
3        Nepal
4    Sri Lanka
dtype: object

In [4]:
# Creating a Series with Custom Index

num = [1, 2, 3, 4]
num_series = pd.Series(num, index=["a", "b", "c", "d"])

marks = [90, 100, 85]
subjects = ["Maths", "Science", "English"]
marks_series = pd.Series(marks, index=subjects)

print(num_series)
print("--------------")
print(marks_series)

a    1
b    2
c    3
d    4
dtype: int64
--------------
Maths       90
Science    100
English     85
dtype: int64


In [5]:
# Setting a name for the Series

marks = pd.Series(marks, index=subjects, name="Student Marks")
marks

Maths       90
Science    100
English     85
Name: Student Marks, dtype: int64

### Creating a Pandas Series from a Dictionary
Dictionary keys become the index of the Series, and values are stored as data.

In [6]:
marks = {"Maths": 90, "Science": 100, "English": 85}
pd.Series(marks)

Maths       90
Science    100
English     85
dtype: int64

In [7]:
teams = {"barcelona": 95, "madrid": 92, "man city": 90}
pd.Series(teams)

barcelona    95
madrid       92
man city     90
dtype: int64
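
Because the dictionary keys become the index, an explicit `index=` argument can also select and reorder them. The sketch below reuses the `marks` dictionary from above; the label `"Hindi"` is a hypothetical extra, added only to show what happens to a label that is not a key.

```python
import pandas as pd

marks = {"Maths": 90, "Science": 100, "English": 85}

# index= picks and orders the labels; "Hindi" is not a key in marks,
# so its value becomes NaN and the dtype is upcast to float64.
selected = pd.Series(marks, index=["English", "Maths", "Hindi"])
print(selected)
```

Labels left out of `index=` (here `"Science"`) are simply dropped from the result.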

### Creating a Pandas Series from a NumPy Array

In [8]:
array = np.array([1.5, 2.5, 3.5])

series_array = pd.Series(array)
series_array

0    1.5
1    2.5
2    3.5
dtype: float64

### Creating a Series from a Scalar Value

In [9]:
# If we provide a single scalar value, Pandas creates a Series by repeating it for the given index.

series_5 = pd.Series(5, index=["a", "b", "c", 2, 6, 88])
series_5

a     5
b     5
c     5
2     5
6     5
88    5
dtype: int64

# Series Attributes

In [10]:
# Let us consider this series for the following examples.
marks = {"maths": 67, "english": 57, "science": 89, "hindi": 100}
marks_series = pd.Series(marks, name="Student Marks")
marks_series

maths       67
english     57
science     89
hindi      100
Name: Student Marks, dtype: int64

In [11]:
# s.index -> Returns the index labels
print("Index Labels : ")
print(marks_series.index)
print("----------------")

# s.values -> Returns the Series values as a NumPy array
print("Series Values : ")
print(marks_series.values)
print("----------------")

# s.name -> Returns the Series name
print("Series Name : ")
print(marks_series.name)
print("----------------")

# s.dtype -> Returns the data type of the Series
print("Data Type :")
print(marks_series.dtype)
print("----------------")

# s.size -> Returns the number of elements in the Series, including null values
print("Number of elements :", marks_series.size)
print("----------------")

# s.is_unique -> Returns True if all the elements of the Series are unique, else False
print("Is Unique : ", marks_series.is_unique)

Index Labels : 
Index(['maths', 'english', 'science', 'hindi'], dtype='object')
----------------
Series Values : 
[ 67  57  89 100]
----------------
Series Name : 
Student Marks
----------------
Data Type :
int64
----------------
Number of elements : 4
----------------
Is Unique :  True
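
Since `size` counts null values too, it can help to compare it with `count()`, which skips them. This is a minimal sketch on a made-up Series (the `scores` values are illustrative and not taken from the data above).

```python
import numpy as np
import pandas as pd

scores = pd.Series([67, np.nan, 89, 100], name="Student Marks")

print(scores.size)     # 4: every element, including the NaN
print(scores.count())  # 3: only the non-null elements
print(scores.shape)    # (4,)
print(scores.hasnans)  # True
```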


## Creating Series from CSV files

In [13]:
# We will learn these methods in future lessons

# Step 1 : Load the csv file in a dataframe using the pd.read_csv function
df = pd.read_csv("covid_toy.csv")
print(df)
print("------------------------------------------------------------")

# Step 2 : Extract the required column into a Series
age_series = df.iloc[:, 0]  # This selects the first column (age) as a Series

print(age_series)

    age  gender  fever   cough       city has_covid
0    60    Male  103.0    Mild    Kolkata        No
1    27    Male  100.0    Mild      Delhi       Yes
2    42    Male  101.0    Mild      Delhi        No
3    31  Female   98.0    Mild    Kolkata        No
4    65  Female  101.0    Mild     Mumbai        No
..  ...     ...    ...     ...        ...       ...
95   12  Female  104.0    Mild  Bangalore        No
96   51  Female  101.0  Strong    Kolkata       Yes
97   20  Female  101.0    Mild  Bangalore        No
98    5  Female   98.0  Strong     Mumbai        No
99   10  Female   98.0  Strong    Kolkata       Yes

[100 rows x 6 columns]
------------------------------------------------------------
0     60
1     27
2     42
3     31
4     65
      ..
95    12
96    51
97    20
98     5
99    10
Name: age, Length: 100, dtype: int64


In [21]:
# Step 1 : Load the csv file in a dataframe using the pd.read_csv function
df = pd.read_csv("subs.csv")
print(df)
print("------------------------------------------------------------")

# Step 2 : Extract the required columns into a Series
subs_series = df.iloc[:, 0]  # This selects the first column as the series

print(subs_series)

     Subscribers gained
0                    48
1                    57
2                    40
3                    43
4                    44
..                  ...
360                 231
361                 226
362                 155
363                 144
364                 172

[365 rows x 1 columns]
------------------------------------------------------------
0       48
1       57
2       40
3       43
4       44
      ... 
360    231
361    226
362    155
363    144
364    172
Name: Subscribers gained, Length: 365, dtype: int64


In [14]:
# Let's load another dataset
kohli_dataframe = pd.read_csv("kohli_ipl.csv")
kohli_dataframe

     match_no  runs
0           1     1
1           2    23
2           3    13
3           4    12
4           5     1
..        ...   ...
210       211     0
211       212    20
212       213    73
213       214    25
214       215     7

[215 rows x 2 columns]


In [15]:
kohli_df_indices = kohli_dataframe.iloc[:, 0].tolist()
kohli_df_runs = kohli_dataframe.iloc[:, 1].tolist()

print(kohli_df_indices)
print("----------")
print(kohli_df_runs)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215]
----------
[1, 23, 13, 12, 1, 9,

In [16]:
kohli_runs_series = pd.Series(kohli_df_runs, index=kohli_df_indices)
kohli_runs_series

1       1
2      23
3      13
4      12
5       1
       ..
211     0
212    20
213    73
214    25
215     7
Length: 215, dtype: int64
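
The same Series can also be built without going through Python lists. The sketch below assumes the file has the two columns shown above (`match_no` and `runs`); the three-row inline CSV is a hypothetical stand-in so the example runs without the real file.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for kohli_ipl.csv (same column names).
csv_text = "match_no,runs\n1,1\n2,23\n3,13\n"

df = pd.read_csv(io.StringIO(csv_text))

# Use match_no as the index, then select the runs column as a Series.
runs_by_match = df.set_index("match_no")["runs"]
print(runs_by_match)
```

Unlike the list-based version, this keeps the column name as the Series name and `match_no` as the index name.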

In [17]:
# Similarly, let's create another Pandas Series using the bollywood.csv file
movies = pd.read_csv("bollywood.csv")
movies

                                     movie              lead
0                 Uri: The Surgical Strike     Vicky Kaushal
1                            Battalion 609       Vicky Ahuja
2     The Accidental Prime Minister (film)       Anupam Kher
3                          Why Cheat India     Emraan Hashmi
4                          Evening Shadows  Mona Ambegaonkar
...                                    ...               ...
1495                Hum Tumhare Hain Sanam    Shah Rukh Khan
1496                   Aankhen (2002 film)  Amitabh Bachchan
1497                       Saathiya (film)      Vivek Oberoi
1498                        Company (film)        Ajay Devgn
1499                  Awara Paagal Deewana      Akshay Kumar

[1500 rows x 2 columns]


In [20]:
movies_indices = movies.iloc[:,0].tolist()
movies_values = movies.iloc[:,1].tolist()

movies_series = pd.Series(movies_values, index = movies_indices)
movies_series

Uri: The Surgical Strike                   Vicky Kaushal
Battalion 609                                Vicky Ahuja
The Accidental Prime Minister (film)         Anupam Kher
Why Cheat India                            Emraan Hashmi
Evening Shadows                         Mona Ambegaonkar
                                              ...       
Hum Tumhare Hain Sanam                    Shah Rukh Khan
Aankhen (2002 film)                     Amitabh Bachchan
Saathiya (film)                             Vivek Oberoi
Company (film)                                Ajay Devgn
Awara Paagal Deewana                        Akshay Kumar
Length: 1500, dtype: object

# Series Methods

In [22]:
print(subs_series)

0       48
1       57
2       40
3       43
4       44
      ... 
360    231
361    226
362    155
363    144
364    172
Name: Subscribers gained, Length: 365, dtype: int64


In [23]:
subs_series.head()  # Returns the first 5 rows of the series (the default)
subs_series.head(10)  # Returns the first 10 rows; only this last expression is displayed

0    48
1    57
2    40
3    43
4    44
5    46
6    33
7    40
8    44
9    74
Name: Subscribers gained, dtype: int64

In [25]:
subs_series.tail()  # Return the last five rows of the series

360    231
361    226
362    155
363    144
364    172
Name: Subscribers gained, dtype: int64

In [26]:
subs_series.tail(3)  # Return the last 3 rows of the series

362    155
363    144
364    172
Name: Subscribers gained, dtype: int64

In [27]:
subs_series.sample(5)  # Return random 5 rows of the series

173    114
59      76
105     90
185    105
179    105
Name: Subscribers gained, dtype: int64
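
`sample` draws different rows on every call. If a repeatable draw is needed, the `random_state` parameter fixes the seed. This sketch uses a small made-up Series (`subs`) in place of the real subscriber data.

```python
import pandas as pd

subs = pd.Series(range(100, 110))

first = subs.sample(3, random_state=42)
second = subs.sample(3, random_state=42)

# The same seed gives the same rows in the same order.
print(first.equals(second))  # True
```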

## Statistical Methods

In [28]:
numpy_array = np.random.randint(1, 100, 10)
runs = pd.Series(numpy_array)
runs

0    72
1    33
2    78
3    59
4    63
5    83
6     4
7    75
8    81
9    55
dtype: int64

In [29]:
print("Sum of all the runs", runs.sum())
print("Mean of all the runs", runs.mean())
print("Median of all the runs", runs.median())
print("Minimum of all the runs", runs.min())
print("Maximum of all the runs", runs.max())
print("Variance of all the runs", runs.var())
print("Standard Deviation of all the runs", runs.std())

Sum of all the runs 603
Mean of all the runs 60.3
Median of all the runs 67.5
Minimum of all the runs 4
Maximum of all the runs 83
Variance of all the runs 618.0111111111112
Standard Deviation of all the runs 24.859829265526166


In [30]:
# We can get most of these statistics at once using the describe method
runs.describe()

count    10.000000
mean     60.300000
std      24.859829
min       4.000000
25%      56.000000
50%      67.500000
75%      77.250000
max      83.000000
dtype: float64

In [31]:
kohli_runs_series

1       1
2      23
3      13
4      12
5       1
       ..
211     0
212    20
213    73
214    25
215     7
Length: 215, dtype: int64

## Sorting and Ranking

In [32]:
# Sort all the values in ascending order
runs.sort_values()

6     4
1    33
9    55
3    59
4    63
0    72
7    75
2    78
8    81
5    83
dtype: int64

In [33]:
# Sort all the values in descending order
runs.sort_values(ascending=False)

5    83
8    81
2    78
7    75
0    72
4    63
3    59
9    55
1    33
6     4
dtype: int64

In [34]:
# Sort all the values by their indices
runs.sort_index()

0    72
1    33
2    78
3    59
4    63
5    83
6     4
7    75
8    81
9    55
dtype: int64

In [35]:
# s.rank() -> Returns the rank of each value
runs.rank()

0     6.0
1     2.0
2     8.0
3     4.0
4     5.0
5    10.0
6     1.0
7     7.0
8     9.0
9     3.0
dtype: float64
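
The ranks above come back as floats because tied values share the average of their ranks by default. The sketch below, on a made-up Series with one tie, shows the default next to two common variations of `rank`.

```python
import pandas as pd

scores = pd.Series([50, 80, 80, 30])

print(scores.rank().tolist())                 # [2.0, 3.5, 3.5, 1.0]
print(scores.rank(method="min").tolist())     # [2.0, 3.0, 3.0, 1.0]
print(scores.rank(ascending=False).tolist())  # [3.0, 1.5, 1.5, 4.0]
```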

## Modifying Data

In [36]:
runs

0    72
1    33
2    78
3    59
4    63
5    83
6     4
7    75
8    81
9    55
dtype: int64

In [37]:
# Replace 77 with 25 (77 does not occur in runs, so the result is unchanged)
runs.replace(77, 25)

0    72
1    33
2    78
3    59
4    63
5    83
6     4
7    75
8    81
9    55
dtype: int64

In [38]:
# s.add(value) - Adds value to all elements
runs.add(5)
# This returns a new Series and leaves runs unchanged; add() has no inplace
# parameter, so assign the result (runs = runs.add(5)) to keep it

0    77
1    38
2    83
3    64
4    68
5    88
6     9
7    80
8    86
9    60
dtype: int64
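
Arithmetic methods such as `add` return a new Series rather than changing the one they are called on. A minimal sketch with a shortened, illustrative `runs` Series:

```python
import pandas as pd

runs = pd.Series([72, 33, 78])

bumped = runs.add(5)    # new Series; runs itself is untouched
print(runs.tolist())    # [72, 33, 78]
print(bumped.tolist())  # [77, 38, 83]

runs = runs + 5         # the + operator is equivalent; reassign to keep the result
print(runs.tolist())    # [77, 38, 83]
```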

In [39]:
# s.subtract(value) - Subtracts value from all elements
runs.subtract(5)

0    67
1    28
2    73
3    54
4    58
5    78
6    -1
7    70
8    76
9    50
dtype: int64

In [40]:
# s.multiply(value) - Multiplies all elements by value
runs.multiply(2)

# s.divide(value) - Divides all elements by value
runs.divide(2)

0    36.0
1    16.5
2    39.0
3    29.5
4    31.5
5    41.5
6     2.0
7    37.5
8    40.5
9    27.5
dtype: float64

In [41]:
runs.replace(77, 25, inplace=True)

## String Operations

In [43]:
a = ["Bikash", "Sujata", "Nanu"]
s = pd.Series(a)
s

0    Bikash
1    Sujata
2      Nanu
dtype: object

In [44]:
s.str.upper()  # Converts all strings to uppercase

0    BIKASH
1    SUJATA
2      NANU
dtype: object

In [45]:
s.str.lower()  # Converts all strings to lowercase

0    bikash
1    sujata
2      nanu
dtype: object

In [46]:
s.str.contains("Bika")  # Checks whether each string contains the substring "Bika"

0     True
1    False
2    False
dtype: bool
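
`str.contains` is case-sensitive by default and returns a missing value for missing entries. The sketch below uses its `case` and `na` parameters on a made-up Series with one missing name.

```python
import pandas as pd

names = pd.Series(["Bikash", "Sujata", None])

# case=False ignores letter case; na=False treats missing entries as "no match".
matches = names.str.contains("bika", case=False, na=False)
print(matches.tolist())  # [True, False, False]
```

A result with no missing values can be used directly as a boolean filter, for example `names[matches]`.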

## Using lambda functions

In [48]:
a = np.random.randint(1, 100, 10)
s = pd.Series(a)
print(a)
print(s)

[27 92 80 71 36 98 76 65 80 87]
0    27
1    92
2    80
3    71
4    36
5    98
6    76
7    65
8    80
9    87
dtype: int64


In [49]:
def double(x):
    return x * 2


s = s.apply(double)
print(s)

0     54
1    184
2    160
3    142
4     72
5    196
6    152
7    130
8    160
9    174
dtype: int64


In [50]:
# Halve all the elements of the series using a lambda function

s = s.apply(lambda x: x // 2)
print(s)

0    27
1    92
2    80
3    71
4    36
5    98
6    76
7    65
8    80
9    87
dtype: int64
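
For simple arithmetic like this, `apply` is not strictly needed: the same result can be written as a vectorized expression on the whole Series, which is usually faster on large data. A small sketch with illustrative values:

```python
import pandas as pd

s = pd.Series([54, 184, 160])

halved_apply = s.apply(lambda x: x // 2)  # element by element
halved_vectorized = s // 2                # whole Series at once

print(halved_vectorized.tolist())               # [27, 92, 80]
print(halved_apply.equals(halved_vectorized))   # True
```

`apply` remains useful when the per-element logic cannot be expressed with built-in operators or methods.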


## Value Counts

In [51]:
# s.value_counts() -> Returns how many times each distinct value occurs, most frequent first

a = pd.Series([1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6])
a.value_counts()

1    3
4    3
2    2
3    2
5    2
6    1
Name: count, dtype: int64
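
When proportions are more useful than raw counts, `value_counts` accepts `normalize=True`. The sketch reuses the same values as the cell above.

```python
import pandas as pd

a = pd.Series([1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6])

counts = a.value_counts()
shares = a.value_counts(normalize=True)  # fractions that sum to 1

# sort_index() orders the result by value instead of by frequency.
print(shares.sort_index())
```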

## Series Indexing

In [52]:
# Accessing values with default index
x = pd.Series([12, 13, 14, 35, 46, 57, 58, 79, 9])
x

0    12
1    13
2    14
3    35
4    46
5    57
6    58
7    79
8     9
dtype: int64

In [53]:
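print(x[0])  # Gets the first element
print(x[2])  # Gets the third element
# With an integer index, x[-1] is treated as a label lookup and raises a KeyError;
# use x.iloc[-1] for position-based negative indexing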

12
14
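
The difference between label-based `[]` and position-based `iloc` shows up with negative numbers. A short sketch on the same values as `x`:

```python
import pandas as pd

x = pd.Series([12, 13, 14, 35, 46, 57, 58, 79, 9])

last = x.iloc[-1]  # position-based: the last element
print(last)        # 9

try:
    x[-1]          # label-based: there is no label -1 in the index
    raised = False
except KeyError:
    raised = True
print(raised)      # True
```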


In [54]:
x.iloc[3]

np.int64(35)

In [55]:
marks = {"Maths": 90, "Science": 100, "English": 85}
marks_series = pd.Series(marks)
marks_series

Maths       90
Science    100
English     85
dtype: int64

In [56]:
# Accessing values with a custom index
# Get the Maths and Science marks from the Series

print("Maths :", marks_series["Maths"])

print("Science: ", marks_series["Science"])

Maths : 90
Science:  100


## Slicing

In [57]:
a = np.random.randint(1, 100, 10)
s = pd.Series(a)
s

0    94
1    70
2    94
3    16
4    10
5     1
6    91
7    83
8    38
9    45
dtype: int64

In [58]:
s[2:6]  # Gets the elements from index 2 to index 5

2    94
3    16
4    10
5     1
dtype: int64

In [59]:
s[2:10:2]  # Gets every second element from index 2 up to index 9

2    94
4    10
6    91
8    38
dtype: int64

In [60]:
# Get all the elements
s[:]

0    94
1    70
2    94
3    16
4    10
5     1
6    91
7    83
8    38
9    45
dtype: int64

In [61]:
# Get the last 3 elements in reverse order
s[-1:-4:-1]

9    45
8    38
7    83
dtype: int64
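
Slicing behaves differently once the index holds labels instead of default integers: `iloc` slices by position and excludes the stop, while `loc` slices by label and includes it. A minimal sketch on a made-up labelled Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

print(s.iloc[1:3].tolist())     # positions 1 and 2 -> [20, 30]
print(s.loc["b":"d"].tolist())  # labels b through d, inclusive -> [20, 30, 40]
print(s.iloc[-3:].tolist())     # last three, original order -> [30, 40, 50]
```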

### Fancy Indexing

In [63]:
# Creating a Series
s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [64]:
# Selecting multiple elements using their index labels
selected = s[["a", "c", "e"]]

print(selected)

a    10
c    30
e    50
dtype: int64


In [65]:
selected = s.iloc[[0, 2, 4]]  # Selecting 1st, 3rd, and 5th elements

print(selected)

a    10
c    30
e    50
dtype: int64


### Boolean Indexing (Filtering Based on Conditions)

In [66]:
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [67]:
# Example: Selecting elements greater than 25

filtered = s[s > 25]
filtered

c    30
d    40
e    50
dtype: int64

In [68]:
# Example: Selecting even numbers

filtered = s[s % 2 == 0]
filtered

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [69]:
# Selecting values between 20 and 50
filtered = s[(s > 20) & (s < 50)]
filtered

c    30
d    40
dtype: int64
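
Conditions can also be combined with `|` (or) and negated with `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A sketch on the same Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# Values below 20 OR above 40
outer = s[(s < 20) | (s > 40)]
print(outer.tolist())  # [10, 50]

# NOT between 20 and 40 (between is inclusive at both ends by default)
outer_again = s[~s.between(20, 40)]
print(outer_again.tolist())  # [10, 50]
```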

### Editing Series Values

In [70]:
marks = [90, 100, 85]
subjects = ["Maths", "Science", "English"]
marks_series = pd.Series(marks, index=subjects)
marks_series

Maths       90
Science    100
English     85
dtype: int64

In [71]:
marks_series.iloc[0] = 88
marks_series

Maths       88
Science    100
English     85
dtype: int64

In [72]:
marks_series["Maths"] = 99
marks_series

Maths       99
Science    100
English     85
dtype: int64

In [73]:
runs

0    72
1    33
2    78
3    59
4    63
5    83
6     4
7    75
8    81
9    55
dtype: int64

In [74]:
runs[5:8] = [67, 78, 99]
runs

0    72
1    33
2    78
3    59
4    63
5    67
6    78
7    99
8    81
9    55
dtype: int64

In [75]:
runs[[0, 1, 2]] = [20, 20, 20]
runs

0    20
1    20
2    20
3    59
4    63
5    67
6    78
7    99
8    81
9    55
dtype: int64

## Difference Between View and Copy in Pandas Series

### 1. What is a View in Pandas Series?
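A view in Pandas Series means that the new object refers to the same memory as the original Series, so changes made through the new Series can also change the original. Whether an operation such as slicing returns a view depends on the pandas version and settings: with Copy-on-Write enabled (the default from pandas 3.0), modifying the sliced Series no longer changes the original.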

### 2. What is a Copy in Pandas Series?
A copy creates a separate object in memory. Changes made to the copy do not affect the original Series.

In [76]:
# Creating a Pandas Series
s = pd.Series([10, 20, 30, 40, 50])

# Creating a view using slicing
view_s = s[:3]  # This may return a view, depending on the pandas version and settings

# Modifying the view
view_s[0] = 100  # Without Copy-on-Write, this also modifies the original Series

print(s)  # In the run shown below, 10 was replaced by 100

0    100
1     20
2     30
3     40
4     50
dtype: int64


In [77]:
# Creating a Pandas Series
s = pd.Series([10, 20, 30, 40, 50])

# Creating a copy explicitly
copy_s = s.copy()

# Modifying the copy
copy_s[0] = 500  # This will NOT modify original s

print("Original series :\n", s)  # Original remains unchanged
print("New Series :\n", copy_s)  # Only copy is modified

Original series :
 0    10
1    20
2    30
3    40
4    50
dtype: int64
New Series :
 0    500
1     20
2     30
3     40
4     50
dtype: int64
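
One way to check that `copy()` really produces independent data is to ask NumPy whether the two underlying arrays overlap in memory. This is a small sketch using `np.shares_memory`, not a full account of how pandas manages memory.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])
copy_s = s.copy()  # deep copy by default

# A deep copy owns its own buffer, so the two arrays do not overlap.
print(np.shares_memory(s.to_numpy(), copy_s.to_numpy()))  # False

copy_s.iloc[0] = 500
print(s.iloc[0])       # 10
print(copy_s.iloc[0])  # 500
```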


## Type Conversion of Pandas Series to List, Dictionary, and NumPy Array

### 1. Convert Series to List (.tolist())

In [78]:
# Creating a Pandas Series
s = pd.Series([10, 20, 30, 40])

# Convert to list
s_list = s.tolist()

print(s_list)
print(type(s_list))

[10, 20, 30, 40]
<class 'list'>


### 2. Convert Series to Dictionary (.to_dict())

In [79]:
s_dict = s.to_dict()
print(s_dict)
print(type(s_dict))

{0: 10, 1: 20, 2: 30, 3: 40}
<class 'dict'>


In [80]:
s = pd.Series([100, 200, 300], index=["a", "b", "c"])
s

a    100
b    200
c    300
dtype: int64

In [81]:
s2_dict = s.to_dict()
print(s2_dict)

{'a': 100, 'b': 200, 'c': 300}


### 3. Convert Series to NumPy Array (.to_numpy())

In [82]:
s_array = s.to_numpy()
print(s_array)
print(type(s_array))

[100 200 300]
<class 'numpy.ndarray'>


## Membership Operator ( in )

In [83]:
# Creating a Pandas Series
s = pd.Series([10, 20, 30, 40])

# Checking if a value exists in the Series
print(20 in s.values)  # Output: True
print(50 in s.values)  # Output: False

True
False
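
The `.values` in the check above matters: applied directly to a Series, `in` tests the index labels, not the stored values. A short sketch of the difference:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

print(20 in s)             # False: 20 is not an index label
print(2 in s)              # True: 2 is an index label
print(20 in s.values)      # True: 20 is one of the values
print(s.isin([20]).any())  # True: another way to test the values
```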


# Looping in Series

In [84]:
# Creating a Pandas Series
s = pd.Series([10, 20, 30, 40])

In [85]:
# Method 1 :
for value in s:
    print(value)

10
20
30
40


In [86]:
# Method 2 :
for index, value in s.items():
    print(f"Index: {index}, Value: {value}")

Index: 0, Value: 10
Index: 1, Value: 20
Index: 2, Value: 30
Index: 3, Value: 40


In [87]:
# Method 3 : apply a function to every element (no explicit loop)
s_squared = s.apply(lambda x: x**2)
print(s_squared)

0     100
1     400
2     900
3    1600
dtype: int64


## Broadcasting in Series

In [88]:
# Broadcasting with arithmetic operations

# Creating a Pandas Series
s = pd.Series([10, 20, 30, 40])


s_add = s + 5  # Adding 5 to each element
s_sub = s - 2  # Subtract 2 from all elements
s_mul = s * 3  # Multiply all elements by 3
s_div = s / 2  # Divide all elements by 2
s_pow = s**2  # Square all elements

print(s_add, s_sub, s_mul, s_div, s_pow, sep="\n")

0    15
1    25
2    35
3    45
dtype: int64
0     8
1    18
2    28
3    38
dtype: int64
0     30
1     60
2     90
3    120
dtype: int64
0     5.0
1    10.0
2    15.0
3    20.0
dtype: float64
0     100
1     400
2     900
3    1600
dtype: int64


In [89]:
# Broadcasting with Another Series (Element-wise Operations)
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([10, 20, 30, 40])

# Element-wise addition
s_sum = s1 + s2

print(s_sum)

0    11
1    22
2    33
3    44
dtype: int64


In [90]:
# Broadcasting with Different Indexes (Automatic Alignment)

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

s_result = s1 + s2  # Element-wise addition with different indexes

print(s_result)

a     NaN
b    12.0
c    23.0
d     NaN
dtype: float64
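
The `NaN` results appear because labels `a` and `d` exist in only one of the two Series. If a missing label should count as 0 instead, the `add` method takes a `fill_value`. A sketch with the same two Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels missing from one side are treated as 0 before adding.
total = s1.add(s2, fill_value=0)
print(total.to_dict())  # {'a': 1.0, 'b': 12.0, 'c': 23.0, 'd': 30.0}
```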


# Some Questions

In [92]:
# Find the number of 50s and 100s scored by Kohli (the filters below count scores of exactly 50 and exactly 100)
runs = kohli_runs_series
runs

1       1
2      23
3      13
4      12
5       1
       ..
211     0
212    20
213    73
214    25
215     7
Length: 215, dtype: int64

In [93]:
runs_50 = runs[runs == 50]
runs_100 = runs[runs == 100]

print("Number of 50's scored by Kohli :", len(runs_50))
print("Number of 100's scored by Kohli :", len(runs_100))

Number of 50's scored by Kohli : 2
Number of 100's scored by Kohli : 2
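
The filters above match scores of exactly 50 and exactly 100. If the question is read in the usual cricket sense, where a fifty is any score from 50 to 99 and a hundred is any score of 100 or more, range conditions are needed instead. The sketch below uses a small hypothetical `sample_runs` Series, not the real Kohli data, so its counts say nothing about his actual record.

```python
import pandas as pd

# Hypothetical scores for illustration only.
sample_runs = pd.Series([1, 50, 73, 100, 113, 0, 49], index=range(1, 8))

fifties = sample_runs.between(50, 99).sum()  # scores from 50 to 99 inclusive
hundreds = (sample_runs >= 100).sum()        # scores of 100 or more

print("Fifties :", fifties)    # 2
print("Hundreds :", hundreds)  # 2
```

Summing a boolean Series counts its `True` entries, which avoids building the filtered Series just to take its length.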


In [94]:
# Find the number of ducks (scores of 0) by Kohli

runs_0 = runs[runs == 0]

print("Number of Ducks by Kohli", len(runs_0))

Number of Ducks by Kohli 9


In [95]:
# Find the actors who have done more than 20 movies
num_movies = movies_series.value_counts()
num_movies



Akshay Kumar        48
Amitabh Bachchan    45
Ajay Devgn          38
Salman Khan         31
Sanjay Dutt         26
                    ..
Diganth              1
Parveen Kaur         1
Seema Azmi           1
Akanksha Puri        1
Edwin Fernandes      1
Name: count, Length: 566, dtype: int64

In [96]:
actors_20 = num_movies[num_movies > 20]
actors_20

Akshay Kumar        48
Amitabh Bachchan    45
Ajay Devgn          38
Salman Khan         31
Sanjay Dutt         26
Shah Rukh Khan      22
Emraan Hashmi       21
Name: count, dtype: int64

### Some Important Pandas Series Methods

In [97]:
# 1. astype() – Change Data Type of Series


# Creating a Pandas Series
s = pd.Series([1, 2, 3, 4])

# Convert to float
s_float = s.astype(float)

print(s_float)
print(s_float.dtypes)  # Output: float64

0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64
float64


In [98]:
# 2. between() – Check If Values Fall Within a Range

s = pd.Series([10, 20, 30, 40, 50])

# Check if values are between 15 and 45
result = s.between(15, 45)

print(result)

0    False
1     True
2     True
3     True
4    False
dtype: bool


In [99]:
# 3. clip() – Restrict Values to a Given Range

s = pd.Series([5, 15, 25, 35, 45])

# Clip values below 10 and above 30
s_clipped = s.clip(lower=10, upper=30)

print(s_clipped)

0    10
1    15
2    25
3    30
4    30
dtype: int64


In [100]:
# 4. drop_duplicates() – Remove Duplicate Values

s = pd.Series([10, 20, 10, 30, 20, 40])

# Remove duplicates
s_unique = s.drop_duplicates()

print(s_unique)

0    10
1    20
3    30
5    40
dtype: int64


In [101]:
# 5. isnull() – Check for Missing Values

s = pd.Series([10, None, 20, None, 30])

# Check for missing values
print(s.isnull())

0    False
1     True
2    False
3     True
4    False
dtype: bool


In [102]:
# 6. dropna() – Remove Missing Values
s_cleaned = s.dropna()

print(s_cleaned)

0    10.0
2    20.0
4    30.0
dtype: float64


In [103]:
# 7. fillna() – Replace Missing Values

s = pd.Series([10, None, 20, None, 30])

s_filled = s.fillna(0)  # Replace NaN with 0

print(s_filled)

0    10.0
1     0.0
2    20.0
3     0.0
4    30.0
dtype: float64
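
A constant such as 0 is not always the right replacement. Two common alternatives, sketched on the same Series, are filling with the mean of the existing values and carrying the previous value forward with `ffill`.

```python
import pandas as pd

s = pd.Series([10, None, 20, None, 30])

mean_filled = s.fillna(s.mean())  # mean() skips NaN, so it is 20.0 here
forward_filled = s.ffill()        # repeat the last valid value

print(mean_filled.tolist())     # [10.0, 20.0, 20.0, 20.0, 30.0]
print(forward_filled.tolist())  # [10.0, 10.0, 20.0, 20.0, 30.0]
```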


In [104]:
# 8. isin() – Check if Values Exist in a List

s = pd.Series([10, 20, 30, 40, 50])

# Check if values exist in the given list
result = s.isin([20, 40, 60])

print(result)

0    False
1     True
2    False
3     True
4    False
dtype: bool
