# 📘 Content from `01-Series.ipynb`


# Series

The first main data type we will learn about in pandas is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a number location. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [None]:
%pip install pandas seaborn

In [None]:
import pandas as pd
import numpy as np

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [None]:
labels = ['a','b','c']
my_list = [10,20,30]
d = {'a':10,'b':20,'c':30}
my_list

**Using Lists**

In [None]:
ser1 = pd.Series(data=my_list)
ser1

0    10
1    20
2    30
dtype: int64

In [None]:
type(ser1)

pandas.core.series.Series

In [None]:
pd.Series(data=my_list,index=['a','b','c'])

a    10
b    20
c    30
dtype: int64

In [None]:
pd.Series(my_list,labels)

a    10
b    20
c    30
dtype: int64

**NumPy Arrays**

In [None]:
import numpy as np
arr = np.array(my_list)

pd.Series(arr)

0    10
1    20
2    30
dtype: int64

In [None]:
pd.Series(arr,labels)

a    10
b    20
c    30
dtype: int64

**Dictionary**

In [None]:
d = {'a':10,'b':20,'c':30}
pd.Series(d)

a    10
b    20
c    30
dtype: int64
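When a dictionary is passed together with an explicit `index`, pandas looks up each index label in the dictionary. A minimal sketch (the `'z'` label is just an illustrative key that is not in `d`): labels missing from the dictionary become `NaN`, which also upcasts the integers to `float64`.

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# The index picks (and orders) the keys to use; 'b' is left out,
# and 'z' is not a key of d, so its value becomes NaN.
ser = pd.Series(d, index=['c', 'a', 'z'])
print(ser)
```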

### Data in a Series

A pandas Series can hold a variety of object types:

In [None]:
pd.Series(data=labels)

0    a
1    b
2    c
dtype: object

In [None]:
# Even functions (although unlikely that you will use this)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object

## Using an Index

The key to using a Series is understanding its index. pandas uses these index labels (names or numbers) for fast lookups of information, much like a hash table or dictionary.
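Because lookups go through the index, the same element can be reached either by label or by integer position. A small sketch using the same country labels as the examples that follow: `.loc` is label-based, `.iloc` is position-based, and `in` checks the index labels (like dictionary keys), not the values.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

by_label = ser.loc['Japan']   # look up by index label
by_position = ser.iloc[3]     # look up by integer position (4th element)
has_usa = 'USA' in ser        # True: 'USA' is an index label
has_one = 1 in ser            # False: 1 is a value, not an index label

print(by_label, by_position, has_usa, has_one)
```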

Let's see some examples of how to grab information from a Series. Let's create two Series, ser1 and ser2:

In [None]:
ser1 = pd.Series(data=[1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])                                   

In [None]:
ser1

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

In [None]:
ser2 = pd.Series([1,2,5,4],index = ['USA', 'Germany','Italy', 'Japan'])                                   

In [None]:
ser2

USA        1
Germany    2
Italy      5
Japan      4
dtype: int64

In [None]:
ser1['USA']

np.int64(1)

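Operations are also aligned on the index: values are matched by label, and a label found in only one of the two Series (here USSR and Italy) produces NaN. The result is upcast to float64 because NaN is a float: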

In [None]:
ser1 + ser2

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64
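If you would rather treat a label missing from one side as zero instead of getting `NaN`, the method form `Series.add` accepts a `fill_value` argument. A minimal sketch with the same two Series (a label missing from both sides would still give `NaN`):

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# fill_value=0 stands in for a label that is missing from one side
# before adding, so USSR and Italy keep their own values instead of NaN.
total = ser1.add(ser2, fill_value=0)
print(total)
```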

In [None]:
ser1 = pd.Series([1,2,3,4,5],index = ['USA', 'Germany','USSR', 'Japan','India'])                                   

In [None]:
ser1

USA        1
Germany    2
USSR       3
Japan      4
India      5
dtype: int64

In [None]:
print(ser1.India)
print(ser1['India'])

5
5


In [None]:
# In Jupyter, type "ser1." and press Tab to list the available attributes and methods

In [None]:
print(ser1.cumsum())


USA         1
Germany     3
USSR        6
Japan      10
India      15
dtype: int64


In [None]:
print(ser1.shape)

(5,)


In [None]:
print(ser1.min())
print(ser1.max())
print(ser1.median())
print(ser1.mode())

1
5
3.0
0    1
1    2
2    3
3    4
4    5
dtype: int64
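`mode()` returns every value above because each value occurs exactly once, so they are all tied for most frequent. To get several summary statistics in one call, `describe()` can be used; a short sketch on the same five values:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5], index=['USA', 'Germany', 'USSR', 'Japan', 'India'])

# describe() returns count, mean, std, min, quartiles and max as a Series
stats = ser1.describe()
print(stats)
```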


In [None]:
ser1.dtype

dtype('int64')

In [None]:
ser2 = pd.Series(data = ['USA', 'Germany','USSR', 'Japan','India'],index = [1,2,3,4,5])
ser2

1        USA
2    Germany
3       USSR
4      Japan
5      India
dtype: object

In [None]:
ser2.dtype

dtype('O')

In [None]:
student_records = {'kiran':67,'kumar':89,'sandy':90,'sanjay':78, 'karthick':45}
stud_ser = pd.Series(student_records)

In [None]:
stud_ser

kiran       67
kumar       89
sandy       90
sanjay      78
karthick    45
dtype: int64

In [None]:
stud_ser['kiran']   # bracket access; only the last expression in a cell is displayed
stud_ser.sandy      # attribute access

np.int64(90)

In [None]:
stud_ser.mean()

np.float64(73.8)

In [None]:
stud_ser.argmax()

np.int64(2)
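Note that `argmax()` returns the integer *position* of the largest value (2, i.e. the third entry), not its label. To get the *label* instead, use `idxmax()`:

```python
import pandas as pd

student_records = {'kiran': 67, 'kumar': 89, 'sandy': 90, 'sanjay': 78, 'karthick': 45}
stud_ser = pd.Series(student_records)

pos = stud_ser.argmax()     # integer position of the maximum
label = stud_ser.idxmax()   # index label of the maximum

print(pos, label, stud_ser.iloc[pos])
```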

# ---
# 📘 Content from `03-DataFrames.ipynb`
# ---

# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. Let's use pandas to explore this topic!
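To see the "Series sharing an index" idea directly, you can build a DataFrame from a dictionary of Series. A sketch with illustrative scores: pandas aligns the Series on the union of their labels and fills the gaps with `NaN`.

```python
import pandas as pd

# Two Series with partly overlapping labels (illustrative data)
math = pd.Series({'Aarav': 87, 'Vihaan': 68})
physics = pd.Series({'Vihaan': 97, 'Arjun': 80})

# Each dict value becomes a column; rows are aligned on the union of the indexes
scores = pd.DataFrame({'Math': math, 'Physics': physics})
print(scores)

# Every column of the result is itself a Series
print(type(scores['Math']))
```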

In [None]:
import pandas as pd
import numpy as np

In [None]:
my_lst = [[1,2,3,4,5], [6,7,8,9,10]]

df = pd.DataFrame(my_lst, columns=['col1','col2','col3','col4', 'col5'])
df

,col1,col2,col3,col4,col5
0,1,2,3,4,5
1,6,7,8,9,10


In [None]:
my_dict = {"col1":[1,2,3,4,5], "col2":[6,7,8,9,10]}

pd.DataFrame(my_dict)

,col1,col2
0,1,6
1,2,7
2,3,8
3,4,9
4,5,10


In [None]:
from numpy.random import randn
np.random.seed(101)

In [None]:
'W X Y Z'.split()

['W', 'X', 'Y', 'Z']

In [None]:
randn(5,4)

array([[ 2.70684984,  0.62813271,  0.90796945,  0.50382575],
       [ 0.65111795, -0.31931804, -0.84807698,  0.60596535],
       [-2.01816824,  0.74012206,  0.52881349, -0.58900053],
       [ 0.18869531, -0.75887206, -0.93323722,  0.95505651],
       [ 0.19079432,  1.97875732,  2.60596728,  0.68350889]])

In [None]:
df = pd.DataFrame(randn(5,4),index=['A', 'B', 'C', 'D', 'E'],columns=['W', 'X', 'Y', 'Z'])

In [None]:
subjects = ['Math', 'Physics', 'Chemistry', 'Biology', 'English']
scores = np.random.randint(50, 101, size=(10, len(subjects)))
student_df = pd.DataFrame(data=scores, columns=subjects)

In [None]:
names = ["Aarav", "Vihaan", "Arjun", "Vivaan", "Aditya", "Rohan", "Karan", "Ishaan", "Sai", "Vikram"]
student_df.index = names
student_df

,Math,Physics,Chemistry,Biology,English
Aarav,87,72,59,95,52
Vihaan,68,97,78,61,60
Arjun,99,80,85,78,53
Vivaan,69,70,97,64,55
Aditya,55,56,74,89,87
Rohan,96,57,87,54,73
Karan,85,65,84,53,68
Ishaan,63,96,53,96,87
Sai,79,72,71,71,67
Vikram,73,93,80,86,57



In [None]:
df['W']
# df.W

A    0.302665
B   -0.134841
C    0.807706
D   -0.497104
E   -0.116773
Name: W, dtype: float64

## Selection and Indexing

Let's learn the various methods to grab data from a DataFrame.

In [None]:
df['W','Y']   # raises KeyError: the tuple is looked up as a single column label

KeyError: ('W', 'Y')

In [None]:
# Pass a list of column names
df[['W','Z']] 

,W,Z
A,0.302665,-1.159119
B,-0.134841,0.184502
C,0.807706,0.329646
D,-0.497104,0.484752
E,-0.116773,1.996652


In [None]:
# Attribute access (works, but NOT RECOMMENDED: fails for names with spaces or names that clash with DataFrame methods)
df.W
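Attribute access is discouraged because it only works when the column name is a valid Python identifier that does not collide with an existing DataFrame attribute. A small illustration with hypothetical column names `count` and `my col`:

```python
import pandas as pd

df = pd.DataFrame({'count': [1, 2], 'my col': [3, 4]})

col = df['count']        # bracket notation returns the column
method = df.count        # attribute access returns the DataFrame.count method instead
spaced = df['my col']    # names containing spaces only work with brackets

print(type(col), callable(method), spaced.tolist())
```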

DataFrame columns are just Series:

In [None]:
type(df['W'])

pandas.core.series.Series

In [None]:
df['W'].dtype

dtype('float64')

In [None]:
df['W'] + df['Y']

A   -1.403420
B    0.032064
C    1.446493
D   -1.440510
E    0.121354
dtype: float64

**Creating a new column:**

In [None]:
df['new'] = df['W'] + df['Y']


In [None]:
df

,W,X,Y,Z,new
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342
B,-0.134841,0.390528,0.166905,0.184502,0.032064
C,0.807706,0.07296,0.638787,0.329646,1.446493
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051
E,-0.116773,1.901755,0.238127,1.996652,0.121354


In [None]:
df['sub'] = 'python'
df

,W,X,Y,Z,new,sub
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,python
B,-0.134841,0.390528,0.166905,0.184502,0.032064,python
C,0.807706,0.07296,0.638787,0.329646,1.446493,python
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051,python
E,-0.116773,1.901755,0.238127,1.996652,0.121354,python
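Assigning with `df['new'] = ...` modifies `df` itself. If you prefer to keep the original untouched, `DataFrame.assign` returns a new DataFrame with the extra columns. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'W': [1.0, 2.0], 'Y': [0.5, -1.0]})

# assign() returns a copy with the new columns; df itself is unchanged.
# A scalar such as 'python' is broadcast to every row.
df2 = df.assign(new=df['W'] + df['Y'], sub='python')
print(df2)
```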


**Removing Columns**

In [None]:
# df.drop('new')            # same as axis=0
df.drop('new', axis=0)      # axis=0 (the default) searches the row labels, so this raises KeyError

KeyError: "['new'] not found in axis"

In [None]:
df.drop('new', axis=1)  # axis=1 (or axis='columns') drops a column; the default axis=0 drops rows

,W,X,Y,Z,sub
A,0.302665,1.693723,-1.706086,-1.159119,python
B,-0.134841,0.390528,0.166905,0.184502,python
C,0.807706,0.07296,0.638787,0.329646,python
D,-0.497104,-0.75407,-0.943406,0.484752,python
E,-0.116773,1.901755,0.238127,1.996652,python


In [None]:
# Not inplace unless specified!
df

,W,X,Y,Z,new,sub
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,python
B,-0.134841,0.390528,0.166905,0.184502,0.032064,python
C,0.807706,0.07296,0.638787,0.329646,1.446493,python
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051,python
E,-0.116773,1.901755,0.238127,1.996652,0.121354,python


In [None]:
df.drop('new',axis=1,inplace=True)  # permanently drop it

In [None]:
df

,W,X,Y,Z,sub
A,0.302665,1.693723,-1.706086,-1.159119,python
B,-0.134841,0.390528,0.166905,0.184502,python
C,0.807706,0.07296,0.638787,0.329646,python
D,-0.497104,-0.75407,-0.943406,0.484752,python
E,-0.116773,1.901755,0.238127,1.996652,python

In [None]:
# Put the 'new' column back at position 4 so the examples below can keep using it
df.insert(4, 'new', df['W'] + df['Y'])


You can also drop rows this way:

In [None]:
df.drop('E', axis=0)  # axis=0 means rows, axis=1 means columns; add inplace=True to modify df

,W,X,Y,Z,new,sub
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,python
B,-0.134841,0.390528,0.166905,0.184502,0.032064,python
C,0.807706,0.07296,0.638787,0.329646,1.446493,python
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051,python
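Instead of remembering which `axis` number means what, `drop` also accepts `columns=` and `index=` keywords. A sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2, 3], 'X': [4, 5, 6]}, index=['A', 'B', 'E'])

no_x = df.drop(columns='X')   # same as df.drop('X', axis=1)
no_e = df.drop(index='E')     # same as df.drop('E', axis=0)

print(no_x, no_e, sep='\n\n')
```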


In [None]:
df

,W,X,Y,Z,new,sub
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,python
B,-0.134841,0.390528,0.166905,0.184502,0.032064,python
C,0.807706,0.07296,0.638787,0.329646,1.446493,python
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051,python
E,-0.116773,1.901755,0.238127,1.996652,0.121354,python


In [None]:
df.loc[['B', 'C']][['X','Y']]

,X,Y
B,0.390528,0.166905
C,0.07296,0.638787


**Selecting Rows**

In [None]:
df['A']   # brackets select columns, so a row label raises KeyError

KeyError: 'A'

In [None]:
df.loc['A']

W      0.302665
X      1.693723
Y     -1.706086
Z     -1.159119
new    -1.40342
sub      python
Name: A, dtype: object

In [None]:
df.loc[['A','B']]

,W,X,Y,Z,new,sub
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,python
B,-0.134841,0.390528,0.166905,0.184502,0.032064,python


Or select based on position instead of label:

In [None]:
df

,W,X,Y,Z,new,sub
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,python
B,-0.134841,0.390528,0.166905,0.184502,0.032064,python
C,0.807706,0.07296,0.638787,0.329646,1.446493,python
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051,python
E,-0.116773,1.901755,0.238127,1.996652,0.121354,python


In [None]:
df.iloc[0]  # integer position 0, i.e. the first row

W      0.302665
X      1.693723
Y     -1.706086
Z     -1.159119
new    -1.40342
sub      python
Name: A, dtype: object

**Selecting a subset of rows and columns**

In [None]:
df.loc['B','X']

np.float64(0.39052784273374097)

In [None]:
df.loc[['B','C'],['X','Y']]

,X,Y
B,0.390528,0.166905
C,0.07296,0.638787


In [None]:
df.iloc[[1,2],[1,2]]

,X,Y
B,0.390528,0.166905
C,0.07296,0.638787
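One difference between the two indexers shows up with slices: a `.loc` slice includes its end label, whereas an `.iloc` slice excludes its end position, as in plain Python. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2, 3, 4, 5]}, index=list('ABCDE'))

by_label = df.loc['B':'D']   # labels B, C and D (end label included)
by_pos = df.iloc[1:3]        # positions 1 and 2, i.e. B and C (end excluded)

print(by_label, by_pos, sep='\n\n')
```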


### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to NumPy:

In [None]:
df

,W,X,Y,Z,new,sub
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,python
B,-0.134841,0.390528,0.166905,0.184502,0.032064,python
C,0.807706,0.07296,0.638787,0.329646,1.446493,python
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051,python
E,-0.116773,1.901755,0.238127,1.996652,0.121354,python


In [None]:
df.drop("sub", axis=1, inplace=True)

In [None]:
df>1

,W,X,Y,Z,new
A,False,True,False,False,False
B,False,False,False,False,False
C,False,False,False,False,True
D,False,False,False,False,False
E,False,True,False,True,False


In [None]:
df[df>1]

,W,X,Y,Z,new
A,,1.693723,,,
B,,,,,
C,,,,,1.446493
D,,,,,
E,,1.901755,,1.996652,


In [None]:
df['W']>0

A     True
B    False
C     True
D    False
E    False
Name: W, dtype: bool

In [None]:
df[df['W']>0]

,W,X,Y,Z,new
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342
C,0.807706,0.07296,0.638787,0.329646,1.446493


In [None]:
df[df['W']>0]['Y']

A   -1.706086
C    0.638787
Name: Y, dtype: float64

In [None]:
student_df[student_df["Math"] > 70]['Chemistry']

Aarav     59
Arjun     85
Rohan     87
Karan     84
Sai       71
Vikram    80
Name: Chemistry, dtype: int32

In [None]:
student_df.head(2)

,Math,Physics,Chemistry,Biology,English
Aarav,87,72,59,95,52
Vihaan,68,97,78,61,60


In [None]:
student_df.tail(2)

,Math,Physics,Chemistry,Biology,English
Sai,79,72,71,71,67
Vikram,73,93,80,86,57


In [None]:
student_df[student_df["Math"] > 70][['Chemistry', 'English']]

,Chemistry,English
Aarav,59,52
Arjun,85,53
Rohan,87,73
Karan,84,68
Sai,71,67
Vikram,80,57


To combine two conditions, use | (or) and & (and), wrapping each condition in parentheses:

In [None]:
# df[df['W']>0 & df['Y'] > 1]   # fails: & binds more tightly than >, so each condition needs its own parentheses
student_df[(student_df['Math'] > 70) & (student_df['Physics'] >70)][['Chemistry', 'English']]

,Chemistry,English
Aarav,59,52
Arjun,85,53
Sai,71,67
Vikram,80,57


In [None]:
df[(df['W']>0) & (df['Y'] > 0)]

,W,X,Y,Z,new
C,0.807706,0.07296,0.638787,0.329646,1.446493


In [None]:
df[(df['W'] < 0) & (df['Y'] < 0)]

,W,X,Y,Z,new
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051
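Python's `and`/`or` keywords cannot be used here because a boolean Series has no single truth value, so pandas raises a `ValueError`. The `|` and `~` (not) operators work element-wise. A sketch on a few illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'W': [0.3, -0.1, 0.8, -0.5], 'Y': [-1.7, 0.2, 0.6, -0.9]},
                  index=list('ABCD'))

either = df[(df['W'] > 0) | (df['Y'] > 0)]       # rows where W > 0 or Y > 0
neither = df[~((df['W'] > 0) | (df['Y'] > 0))]   # ~ negates the boolean mask

try:
    df[(df['W'] > 0) and (df['Y'] > 0)]
except ValueError as err:
    print('ValueError:', err)  # The truth value of a Series is ambiguous...

print(either, neither, sep='\n\n')
```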


## More Index Details

Let's discuss some more features of indexing, including resetting the index or setting it to something else. We'll also talk about index hierarchy!

In [None]:
df

,W,X,Y,Z,new
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342
B,-0.134841,0.390528,0.166905,0.184502,0.032064
C,0.807706,0.07296,0.638787,0.329646,1.446493
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051
E,-0.116773,1.901755,0.238127,1.996652,0.121354


In [None]:
# Reset to default 0,1...n index
df.reset_index()   # moves the old index into a column named 'index'
# df.reset_index(drop=True)   # discards the old index instead


,index,W,X,Y,Z,new
0,A,0.302665,1.693723,-1.706086,-1.159119,-1.40342
1,B,-0.134841,0.390528,0.166905,0.184502,0.032064
2,C,0.807706,0.07296,0.638787,0.329646,1.446493
3,D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051
4,E,-0.116773,1.901755,0.238127,1.996652,0.121354
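The `drop=True` option in the commented-out line discards the old index instead of turning it into a column. A small sketch comparing both calls:

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2]}, index=['A', 'B'])

with_col = df.reset_index()           # old labels kept in a new 'index' column
dropped = df.reset_index(drop=True)   # old labels discarded

print(with_col, dropped, sep='\n\n')
```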


In [None]:
df

,W,X,Y,Z,new
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342
B,-0.134841,0.390528,0.166905,0.184502,0.032064
C,0.807706,0.07296,0.638787,0.329646,1.446493
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051
E,-0.116773,1.901755,0.238127,1.996652,0.121354


In [None]:
newind = 'CA NY WY OR CO'.split()

In [None]:
df['States'] = newind

In [None]:
df

,W,X,Y,Z,new,States
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,CA
B,-0.134841,0.390528,0.166905,0.184502,0.032064,NY
C,0.807706,0.07296,0.638787,0.329646,1.446493,WY
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051,OR
E,-0.116773,1.901755,0.238127,1.996652,0.121354,CO


In [None]:
df.set_index('States')

States,W,X,Y,Z,new
CA,0.302665,1.693723,-1.706086,-1.159119,-1.40342
NY,-0.134841,0.390528,0.166905,0.184502,0.032064
WY,0.807706,0.07296,0.638787,0.329646,1.446493
OR,-0.497104,-0.75407,-0.943406,0.484752,-1.44051
CO,-0.116773,1.901755,0.238127,1.996652,0.121354


In [None]:
df

,W,X,Y,Z,new,States
A,0.302665,1.693723,-1.706086,-1.159119,-1.40342,CA
B,-0.134841,0.390528,0.166905,0.184502,0.032064,NY
C,0.807706,0.07296,0.638787,0.329646,1.446493,WY
D,-0.497104,-0.75407,-0.943406,0.484752,-1.44051,OR
E,-0.116773,1.901755,0.238127,1.996652,0.121354,CO


In [None]:
df.set_index('States',inplace=True)

In [None]:
df

## Multi-Index and Index Hierarchy

Let's go over how to work with a MultiIndex. First, we'll create a quick example of what a multi-indexed DataFrame looks like:

In [None]:
# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)

In [None]:
hier_index

MultiIndex([('G1', 1),
            ('G1', 2),
            ('G1', 3),
            ('G2', 1),
            ('G2', 2),
            ('G2', 3)],
           )
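The same index can also be built without `zip` using `pd.MultiIndex.from_product`, which forms every combination of the given levels. A sketch that checks the two approaches agree:

```python
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
from_tuples = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

# Cartesian product of the outer and inner levels: G1/G2 x 1/2/3
from_product = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])

print(from_product.equals(from_tuples))
```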

In [None]:
df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df

,,A,B
G1,1,-0.976076,1.414672
G1,2,-1.562469,-0.676467
G1,3,-1.617897,-1.818591
G2,1,0.171447,0.37775
G2,2,1.049934,-0.526008
G2,3,-0.304556,-0.484535


In [None]:
# df.groupby(['outside','inside'])['inside']

Now let's show how to index this! For a hierarchical row index we use df.loc[]; if the hierarchy were on the columns axis, you would use normal bracket notation df[]. Selecting one level of the index returns the sub-DataFrame:

In [None]:
df.loc['G1']

,A,B
1,-0.976076,1.414672
2,-1.562469,-0.676467
3,-1.617897,-1.818591


In [None]:
df.loc['G1'].loc[1]

A   -0.976076
B    1.414672
Name: 1, dtype: float64

In [None]:
df.loc['G1'].loc[1]['B']

np.float64(1.4146724903530414)
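Instead of chaining `.loc` calls, you can pass a tuple with one key per level, optionally followed by a column label. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

hier_index = pd.MultiIndex.from_tuples([('G1', 1), ('G1', 2), ('G2', 1)])
df = pd.DataFrame({'A': [10, 20, 30], 'B': [1, 2, 3]}, index=hier_index)

row = df.loc[('G1', 2)]          # one row, returned as a Series
value = df.loc[('G1', 2), 'B']   # a single value: row ('G1', 2), column 'B'

print(row, value, sep='\n\n')
```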

In [None]:
student_df

,Math,Physics,Chemistry,Biology,English
Aarav,87,72,59,95,52
Vihaan,68,97,78,61,60
Arjun,99,80,85,78,53
Vivaan,69,70,97,64,55
Aditya,55,56,74,89,87
Rohan,96,57,87,54,73
Karan,85,65,84,53,68
Ishaan,63,96,53,96,87
Sai,79,72,71,71,67
Vikram,73,93,80,86,57


In [None]:
student_df.index

Index(['Aarav', 'Vihaan', 'Arjun', 'Vivaan', 'Aditya', 'Rohan', 'Karan',
       'Ishaan', 'Sai', 'Vikram'],
      dtype='object')

In [None]:
df.index.names

FrozenList([None, None])

In [None]:
df.index.names = ['outer','inner']

In [None]:
df

outer,inner,A,B
G1,1,-0.976076,1.414672
G1,2,-1.562469,-0.676467
G1,3,-1.617897,-1.818591
G2,1,0.171447,0.37775
G2,2,1.049934,-0.526008
G2,3,-0.304556,-0.484535


In [None]:
df.xs('G1')

,A,B
1,-0.976076,1.414672
2,-1.562469,-0.676467
3,-1.617897,-1.818591


In [None]:
df.xs(('G1',1))

A   -0.976076
B    1.414672
Name: (G1, 1), dtype: float64

In [None]:
df.xs(1,level='inner')

outer,A,B
G1,-0.976076,1.414672
G2,0.171447,0.37775


# ---
# 📘 Content from `04-Missing Data.ipynb`
# ---

# Missing Data

Let's show a few convenient methods to deal with missing data in pandas:

In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.DataFrame({'A':[1,2,np.nan],
                  'B':[5,np.nan,np.nan],
                  'C':[1,2,3]})

In [None]:
df

,A,B,C
0,1.0,5.0,1
1,2.0,,2
2,,,3


In [None]:
df.isnull()

,A,B,C
0,False,False,False
1,False,True,False
2,True,True,False


In [None]:
df.isnull().sum()

A    1
B    2
C    0
dtype: int64

In [None]:
df.dropna()

,A,B,C
0,1.0,5.0,1


In [None]:
df.dropna(axis=1)

,C
0,1
1,2
2,3


In [None]:
df.dropna(thresh=2)

,A,B,C
0,1.0,5.0,1
1,2.0,,2
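`dropna` can also restrict the check to particular columns with `subset`, or drop only rows that are entirely missing with `how='all'`. A sketch on the same three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

only_a = df.dropna(subset=['A'])    # drop rows whose A is NaN -> keeps rows 0 and 1
all_missing = df.dropna(how='all')  # drop rows where every value is NaN -> none here

print(only_a, all_missing, sep='\n\n')
```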


In [None]:
df.fillna(value='FILL VALUE')

,A,B,C
0,1.0,5.0,1
1,2.0,FILL VALUE,2
2,FILL VALUE,FILL VALUE,3
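Filling with a string turns the numeric columns A and B into `object` columns. Passing a dictionary to `fillna` lets you choose a fill value per column instead; a sketch that uses the column mean for A and 0 for B (both choices are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# One fill value per column; columns not in the dict (C) are left as they are
filled = df.fillna({'A': df['A'].mean(), 'B': 0})
print(filled)
print(filled.dtypes)   # A and B stay float64
```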


In [None]:
data = {
    'product_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'product_name': ['Apples', 'Bread', np.nan, 'Milk', 'Eggs', 'Bananas', 'Rice', None, 'Chicken', 'Yogurt'],
    'category': ['Fruits', 'Bakery', 'Vegetables', 'Dairy', None, 'Fruits', 'Grains', 'Meat', 'Meat', 'Dairy'],
    'price': [2.99, 1.50, 3.25, None, 4.99, 1.25, 5.00, 8.99, None, 3.50],
    'quantity_sold': [45, None, 23, 67, 34, 89, 12, 56, 78, None],
    'supplier': ['Farm Fresh', 'Local Bakery', 'Green Garden', None, 'Happy Hens', 'Tropical Inc', 'Rice Co', 'Fresh Meat', 'Premium Poultry', None],
    'expiry_date': ['2024-10-15', '2024-09-25', None, '2024-09-28', '2024-09-30', '2024-10-01', '2025-06-15', '2024-10-05', None, '2024-09-27'],
    'in_stock': [True, True, False, True, None, True, True, False, True, True],
    'discount_percent': [10, 0, 15, 5, None, 20, 0, None, 12, 8]
}

grocery_df = pd.DataFrame(data)
grocery_df

,product_id,product_name,category,price,quantity_sold,supplier,expiry_date,in_stock,discount_percent
0,1,Apples,Fruits,2.99,45.0,Farm Fresh,2024-10-15,True,10.0
1,2,Bread,Bakery,1.5,,Local Bakery,2024-09-25,True,0.0
2,3,,Vegetables,3.25,23.0,Green Garden,,False,15.0
3,4,Milk,Dairy,,67.0,,2024-09-28,True,5.0
4,5,Eggs,,4.99,34.0,Happy Hens,2024-09-30,,
5,6,Bananas,Fruits,1.25,89.0,Tropical Inc,2024-10-01,True,20.0
6,7,Rice,Grains,5.0,12.0,Rice Co,2025-06-15,True,0.0
7,8,,Meat,8.99,56.0,Fresh Meat,2024-10-05,False,
8,9,Chicken,Meat,,78.0,Premium Poultry,,True,12.0
9,10,Yogurt,Dairy,3.5,,,2024-09-27,True,8.0


In [None]:
grocery_df.isnull().sum()

product_id          0
product_name        2
category            1
price               2
quantity_sold       2
supplier            2
expiry_date         2
in_stock            1
discount_percent    2
dtype: int64

In [None]:
avg_price = grocery_df['price'].mean()
avg_price

np.float64(3.93375)

In [None]:
grocery_df['price']

0    2.99
1    1.50
2    3.25
3     NaN
4    4.99
5    1.25
6    5.00
7    8.99
8     NaN
9    3.50
Name: price, dtype: float64

In [None]:
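# Assign the result back; inplace=True on a selected column is chained assignment and may not update grocery_df
grocery_df['price'] = grocery_df['price'].fillna(value=avg_price)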

In [None]:
grocery_df['price']

0    2.99000
1    1.50000
2    3.25000
3    3.93375
4    4.99000
5    1.25000
6    5.00000
7    8.99000
8    3.93375
9    3.50000
Name: price, dtype: float64

In [None]:
A_avg = df['A'].mean()

df['A'].fillna(value=A_avg)  # returns a new Series; assign it back to df['A'] to keep the result