# Pandas
<li>Pandas is an open-source Python package, built on top of NumPy, for working with datasets.</li> 
<li>The name "Pandas" comes from the econometrics term <b>"panel data"</b> and is also a play on <b>"Python Data Analysis".</b></li>
<li>Pandas is widely considered one of the best data-wrangling packages.</li>
<li>Pandas offers easy-to-use data structures and tools for analyzing, cleaning, exploring, and manipulating data.</li>
<li>It also works well with other Python data science libraries, such as NumPy, Matplotlib, and scikit-learn.</li>


# Difference Between NumPy & Pandas

![](images/pandas_vs_numpy.png)
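One commonly cited difference between the two libraries is how data is addressed. The sketch below (using a few values from the examples later in this notebook) contrasts a NumPy array, which has a single dtype and is indexed by integer position, with a pandas DataFrame, which has labelled rows and columns and a dtype per column.

```python
import numpy as np
import pandas as pd

# NumPy: a single dtype for the whole array; elements are addressed by integer position
arr = np.array([24, 34, 50])
print(arr[1])

# pandas: labelled rows and columns; each column can have its own dtype
people = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam'], 'Age': [24, 34, 50]})
print(people.loc[1, 'Age'])
print(people.dtypes)
```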

## Why Use Pandas?

<li>Pandas is known for its exceptional ability to represent and organize tabular data.</li>
<li>The Pandas library was designed to work with large datasets quickly and efficiently.</li>
<li>It excels at analyzing large amounts of data. Pandas allows us to analyze big data and draw conclusions based on statistical theory.</li>
<li>Pandas can clean messy datasets and make them readable and relevant.</li>
<li>Because it builds on NumPy and integrates with Matplotlib, Pandas gives users a powerful tool for <b>data analytics and visualization.</b></li>
<li>Data can be imported into Pandas from a variety of file formats, such as CSV, SQL, Excel, and JSON.</li>
<li>Pandas is a versatile and marketable skill for data analysts and data scientists that can attract the attention of employers.</li>


## Installation Of Pandas
<li>Open your terminal, activate your virtual environment, and then run the following command to install pandas.</li>

<code>
    pip install pandas
</code>
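To confirm that the installation worked, a quick check is to import pandas inside the activated environment and print its version string; if the import fails, pandas is not installed in the environment that Python is using.

```python
import pandas as pd

# If the installation succeeded, this import works and the installed version is printed
print(pd.__version__)
```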

## Importing Pandas
<li>We need to import pandas before we can create a pandas DataFrame and analyze it.</li>
<li>We can import the pandas package using the following command:</li>
<code>
    import pandas as pd
</code>

In [2]:
import pandas as pd

## How To Create A Pandas DataFrame
<li>A Pandas DataFrame is a two-dimensional data structure, like a 2D array, arranged as a table with rows and columns.</li>
<li>We can create a basic pandas DataFrame in several ways.</li>
<li>Let's look at some of the ways to create the DataFrame shown below:</li>

![](images/dataframe.png)

### 1. From Python Dictionary

In [2]:
df1 = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam',
                             'Sita', 'Mahima', 'Sunil', 'Bhawana'],
                   'Age': [24,34,50,32,18,23,22],
                   'Address': ['Manigram', 'Dhanewa', 'Bardaghat', 'Manglapur',
                              'Bharatpur', 'Kathmandu', 'Ramechap']})

In [3]:
df1

|   | Name | Age | Address |
|---|---|---|---|
| 0 | Prabhat | 24 | Manigram |
| 1 | Hari | 34 | Dhanewa |
| 2 | Shyam | 50 | Bardaghat |
| 3 | Sita | 32 | Manglapur |
| 4 | Mahima | 18 | Bharatpur |
| 5 | Sunil | 23 | Kathmandu |
| 6 | Bhawana | 22 | Ramechap |


### 2. From a list of dictionaries

In [5]:
df2 = pd.DataFrame([{'Name': 'Prabhat', 'Age': 24, 'Address' : 'Manigram'},
                    {'Name': 'Hari', 'Age': 34, 'Address' : 'Dhanewa'},
                    {'Name': 'Shyam', 'Age': 50, 'Address' : 'Bardaghat'},
                    {'Name': 'Sita', 'Age': 32, 'Address' : 'Manglapur'},
                    {'Name': 'Mahima', 'Age': 18, 'Address' : 'Bharatpur'},
                    {'Name': 'Sunil', 'Age': 23, 'Address' : 'Kathmandu'},
                    {'Name': 'Bhawana', 'Age': 22, 'Address' : 'Ramechap'}
                   ])


In [6]:
df2

|   | Name | Age | Address |
|---|---|---|---|
| 0 | Prabhat | 24 | Manigram |
| 1 | Hari | 34 | Dhanewa |
| 2 | Shyam | 50 | Bardaghat |
| 3 | Sita | 32 | Manglapur |
| 4 | Mahima | 18 | Bharatpur |
| 5 | Sunil | 23 | Kathmandu |
| 6 | Bhawana | 22 | Ramechap |


### 3. From a list of tuples

In [9]:
df3 = pd.DataFrame([('Prabhat', 24, 'Manigram'),
                    ('Hari', 34, 'Dhanewa'),
                    ('Shyam',50, 'Bardaghat'),
                    ('Sita', 32, 'Manglapur'),
                    ('Mahima', 18, 'Bharatpur'),
                    ('Sunil', 23, 'Kathmandu'),
                    ('Bhawana', 22, 'Ramechap')
                   ], columns = ['Name', 'Age', 'Address'])

In [10]:
df3

|   | Name | Age | Address |
|---|---|---|---|
| 0 | Prabhat | 24 | Manigram |
| 1 | Hari | 34 | Dhanewa |
| 2 | Shyam | 50 | Bardaghat |
| 3 | Sita | 32 | Manglapur |
| 4 | Mahima | 18 | Bharatpur |
| 5 | Sunil | 23 | Kathmandu |
| 6 | Bhawana | 22 | Ramechap |


### 4. From list of lists

In [11]:
df4 = pd.DataFrame([['Prabhat', 24, 'Manigram'],
                    ['Hari', 34, 'Dhanewa'],
                    ['Shyam',50, 'Bardaghat'],
                    ['Sita', 32, 'Manglapur'],
                    ['Mahima', 18, 'Bharatpur'],
                    ['Sunil', 23, 'Kathmandu'],
                    ['Bhawana', 22, 'Ramechap']
                   ], columns = ['Name', 'Age', 'Address'])

In [12]:
df4

|   | Name | Age | Address |
|---|---|---|---|
| 0 | Prabhat | 24 | Manigram |
| 1 | Hari | 34 | Dhanewa |
| 2 | Shyam | 50 | Bardaghat |
| 3 | Sita | 32 | Manglapur |
| 4 | Mahima | 18 | Bharatpur |
| 5 | Sunil | 23 | Kathmandu |
| 6 | Bhawana | 22 | Ramechap |


#### Question:
<li>Read the 'weather_data.csv' file using the csv module's reader.</li>
<li>Store the data inside the csv file in a list of lists.</li>
<li>Then create a pandas DataFrame from the list of lists.</li>

In [14]:
from csv import reader

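# a with-block closes the file automatically; newline='' is what the csv module expects
with open('weather_data.csv', newline = '') as file:
    file_reader = reader(file)
    data = list(file_reader)   # one list of strings per row
print(data)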

[['day', 'temperature', 'windspeed', 'event'], ['1/1/2017', '32', '6', 'Rain'], ['1/4/2017', 'not available', '9', 'Sunny'], ['1/5/2017', '-1', 'not measured', 'Snow'], ['1/6/2017', 'not available', '7', 'no event'], ['1/7/2017', '32', 'not measured', 'Rain'], ['1/8/2017', 'not available', 'not measured', 'Sunny'], ['1/9/2017', 'not available', 'not measured', 'no event'], ['1/10/2017', '34', '8', 'Cloudy'], ['1/11/2017', '-4', '-1', 'Snow'], ['1/12/2017', '26', '12', 'Sunny'], ['1/13/2017', '12', '12', 'Rainy'], ['1/11/2017', '-1', '12', 'Snow'], ['1/14/2017', '40', '-1', 'Sunny']]


In [15]:
weather_df = pd.DataFrame(data[1:], columns = data[0])
weather_df

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |


#### Question
<li>1. Read the 'imports-85.data' file using a file reader.</li>
<li>2. Store the data in the file as a list of lists.</li>
<li>3. Create a pandas DataFrame from the list of lists.</li>
<li>4. For the column names, use the columns variable given below.</li>

In [20]:
total_data = []
with open('imports-85.data', 'r') as file:
    for line in file:
        # drop the trailing newline, then split the comma-separated fields
        item_list = line.rstrip('\n').split(',')
        total_data.append(item_list)


In [21]:
# print(total_data)

In [None]:
columns = ['symboling', 'normalized_losses', 'make', 'fuel_type', 'aspiration', 'num_of_doors',
          'body_style', 'drive_wheels', 'engine_location', 'wheel_base', 'length', 'width', 
           'height', 'curb_weight', 'engine_type', 'num_of_cylinders', 'engine_size', 'fuel_system',
          'bore', 'stroke', 'compression', 'horsepower', 'peak_rpm', 'city_mpg', 'highway_mpg', 
           'price']
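Step 3 can be finished by passing the parsed rows and the `columns` list to `pd.DataFrame`. The sketch below wraps the parsing in a helper called `read_rows` (a hypothetical name, not part of pandas) and, so that it runs on its own, reads from an in-memory `io.StringIO` stand-in with made-up values and only three columns instead of the real 26-column `imports-85.data` file.

```python
import io
import pandas as pd

def read_rows(file_obj):
    """Return one list of string fields per non-empty line of comma-separated text."""
    return [line.rstrip('\n').split(',') for line in file_obj if line.strip()]

# Hypothetical stand-in for open('imports-85.data'): three made-up fields per line
sample = io.StringIO("3,100,make_a\n1,158,make_b\n")
rows = read_rows(sample)
imports_df = pd.DataFrame(rows, columns=['symboling', 'normalized_losses', 'make'])
print(imports_df)
```

With the real file, the same helper would be called on the open file object and combined with the full `columns` list, e.g. `pd.DataFrame(read_rows(f), columns=columns)`. Note that every value is still a string at this point.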

### 5. Pandas Dataframe From Csv files

<li>We can load a csv file with pandas and create a DataFrame from the data inside it.</li>
<li>The <b>pd.read_csv()</b> function reads a csv file and returns a pandas DataFrame built from the dataset.</li>
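`read_csv()` accepts a file path, but it also accepts any file-like object. The sketch below uses `io.StringIO` with a few rows copied from the weather data printed earlier, so it runs without the csv file; note that a single text placeholder such as `'not available'` keeps the whole `temperature` column as text.

```python
import io
import pandas as pd

# A small in-memory copy of the first rows of the weather data
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)    # (2, 4)
print(df.dtypes)
```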

In [16]:
weather_df = pd.read_csv('weather_data.csv')

In [17]:
weather_df

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |


### Reading a csv file using the skiprows and header parameters

In [26]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 1)

In [27]:
weather_df

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |


In [34]:
weather_df = pd.read_csv('weather_data.csv', header = 2)
weather_df

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |
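Later cells read this file with `skiprows = 2`, which suggests it has extra lines above the real header row. Under that assumption, `skiprows = 2` and `header = 2` give the same result: the first discards two lines before parsing, and the second uses the line at index 2 as the header and drops everything above it. The sketch uses a hypothetical two-line preamble in an in-memory file.

```python
import io
import pandas as pd

# Hypothetical file with two description lines above the real header row
text = "Weather readings\nSource: field station\nday,temperature\n1/1/2017,32\n1/4/2017,34\n"

by_skip = pd.read_csv(io.StringIO(text), skiprows=2)   # drop the first two lines
by_header = pd.read_csv(io.StringIO(text), header=2)   # use line index 2 as the header

print(list(by_skip.columns))      # ['day', 'temperature']
print(by_skip.equals(by_header))  # True
```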


#### Reading a csv file without a header and naming the columns

In [40]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 3, header = None,
                        names = ['dates', 'temp', 'ws', 'forecast'])
weather_df

|   | dates | temp | ws | forecast |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |


#### Reading a limited number of rows from a csv file using the nrows parameter


In [43]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 3,nrows = 5, header = None,
                        names = ['dates', 'temp', 'ws', 'forecast'])
weather_df

|   | dates | temp | ws | forecast |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |


#### Reading csv files with the na_values parameter ('weather_data.csv' file)


In [52]:
# Without na_values, placeholders such as 'not available' are kept as plain strings
weather_df = pd.read_csv('weather_data.csv', skiprows = 2)
weather_df

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |


In [24]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 2,
                        na_values = {'temperature': 'not available',
                                     'windspeed': ['not measured', -1],
                                    'event': 'no event'})
weather_df
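The cell above passes `na_values` as a dictionary, so each placeholder only counts as missing in its own column. The sketch below runs the same call on a few rows copied from the weather data: `-1` becomes NaN in `windspeed`, while the `-1` reading in `temperature` is kept because that column only treats `'not available'` as missing.

```python
import io
import pandas as pd

csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
1/5/2017,-1,not measured,Snow
1/6/2017,not available,7,no event
1/11/2017,-4,-1,Snow
"""
df = pd.read_csv(io.StringIO(csv_text),
                 na_values={'temperature': 'not available',
                            'windspeed': ['not measured', -1],
                            'event': 'no event'})
print(df)
print(df.isnull().sum())   # missing values per column
```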

#### Write a pandas dataframe to a csv file
<li>We can write a pandas DataFrame to a csv file using the .to_csv() method.</li>
<li>We can give the output csv file any name we like. Passing index = False stops pandas from writing the row index as an extra column.</li>

In [57]:
weather_df.to_csv('weather_data_nan.csv', index = False)
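The effect of `index = False` shows up when the file is read back. In the sketch below (a tiny made-up frame), `to_csv()` is called without a path, in which case it returns the CSV text as a string. If the index is written, `read_csv()` turns it into an extra column named `Unnamed: 0`.

```python
import io
import pandas as pd

df = pd.DataFrame({'day': ['1/1/2017', '1/4/2017'], 'temperature': [32.0, None]})

with_index = df.to_csv()               # no path given, so the CSV text is returned
without_index = df.to_csv(index=False)

back_with = pd.read_csv(io.StringIO(with_index))
back_without = pd.read_csv(io.StringIO(without_index))
print(list(back_with.columns))     # ['Unnamed: 0', 'day', 'temperature']
print(list(back_without.columns))  # ['day', 'temperature']
```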

### 6. Pandas Dataframe From Excel files

<li>We can load an Excel file with the <b>.xlsx</b> extension and create a DataFrame from the data inside it using pandas.</li>
<li>The <b>pd.read_excel()</b> function reads an Excel file and returns a pandas DataFrame built from the dataset.</li>
<li>We also need to install <b>openpyxl</b> (for example, with <code>pip install openpyxl</code>) so that pandas can read and write .xlsx files.</li>

In [4]:
weather_df = pd.read_excel('weather_data.xlsx',
                           na_values = {'temperature': 'not available',
                                     'windspeed': ['not measured', -1],
                                    'event': 'no event'})
weather_df

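|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/4/2017 | NaN | 9.0 | Sunny |
| 2 | 1/5/2017 | -1.0 | NaN | Snow |
| 3 | 1/6/2017 | NaN | 7.0 | NaN |
| 4 | 1/7/2017 | 32.0 | NaN | Rain |
| 5 | 1/8/2017 | NaN | NaN | Sunny |
| 6 | 1/9/2017 | NaN | NaN | NaN |
| 7 | 1/10/2017 | 34.0 | 8.0 | Cloudy |
| 8 | 1/11/2017 | -4.0 | NaN | Snow |
| 9 | 1/12/2017 | 26.0 | 12.0 | Sunny |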


#### Writing to an excel file
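<li>We can write a pandas DataFrame to an Excel file using the .to_excel() method.</li>

In [5]:
# note: this overwrites weather_data.xlsx; use a new file name to keep the original file
weather_df.to_excel('weather_data.xlsx', sheet_name = 'nans')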

#### Using the head() and tail() methods to view the first 5 and last 5 rows
<li>To view the first few rows of our DataFrame, we can use the DataFrame.head() method.</li>
<li>By default, it returns the first five rows of our DataFrame.</li>
<li>However, it also accepts an optional integer parameter that specifies the number of rows.</li>

<li>Similarly, to view the last few rows of our DataFrame, we can use the DataFrame.tail() method.</li>
<li>By default, it returns the last five rows of our DataFrame.</li>
<li>However, it also accepts an optional integer parameter that specifies the number of rows.</li>

In [28]:
weather_df = pd.read_csv('weather_data.csv', skiprows = 2)
weather_df.head(3)

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |


In [29]:
weather_df.tail(3)

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 10 | 1/13/2017 | 12 | 12 | Rainy |
| 11 | 1/11/2017 | -1 | 12 | Snow |
| 12 | 1/14/2017 | 40 | -1 | Sunny |
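A compact sketch of the behaviour described above, on a made-up 13-row frame the same length as `weather_df`: `head()` defaults to five rows, `tail(n)` keeps the original index labels, and a negative `n` passed to `head()` returns every row except the last `|n|`.

```python
import pandas as pd

df = pd.DataFrame({'n': range(13)})   # 13 rows, like weather_df

print(df.head()['n'].tolist())        # default: first 5 rows -> [0, 1, 2, 3, 4]
print(df.tail(3).index.tolist())      # last 3 rows keep their labels -> [10, 11, 12]
print(len(df.head(-3)))               # all rows except the last 3 -> 10
```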


#### Question:

<li>Use the head() method to select the first 6 rows.</li>
<li>Use the tail() method to select the last 8 rows.</li>

In [31]:
weather_df.head(6)

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |


In [32]:
weather_df.tail(8)

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |
| 10 | 1/13/2017 | 12 | 12 | Rainy |
| 11 | 1/11/2017 | -1 | 12 | Snow |
| 12 | 1/14/2017 | 40 | -1 | Sunny |


#### Finding the column names of a DataFrame
<li>The df.columns attribute holds the names of the columns in a pandas DataFrame.</li>
<li>Similarly, the df.values attribute holds the data in the DataFrame as a NumPy array.</li>

In [33]:
weather_df.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')

In [34]:
print(type(weather_df.columns))

<class 'pandas.core.indexes.base.Index'>


In [36]:
weather_df.columns[-2:]

Index(['windspeed', 'event'], dtype='object')

In [38]:
list(weather_df.columns)[:2]

['day', 'temperature']

In [39]:
weather_df.values

array([['1/1/2017', '32', '6', 'Rain'],
       ['1/4/2017', 'not available', '9', 'Sunny'],
       ['1/5/2017', '-1', 'not measured', 'Snow'],
       ['1/6/2017', 'not available', '7', 'no event'],
       ['1/7/2017', '32', 'not measured', 'Rain'],
       ['1/8/2017', 'not available', 'not measured', 'Sunny'],
       ['1/9/2017', 'not available', 'not measured', 'no event'],
       ['1/10/2017', '34', '8', 'Cloudy'],
       ['1/11/2017', '-4', '-1', 'Snow'],
       ['1/12/2017', '26', '12', 'Sunny'],
       ['1/13/2017', '12', '12', 'Rainy'],
       ['1/11/2017', '-1', '12', 'Snow'],
       ['1/14/2017', '40', '-1', 'Sunny']], dtype=object)

In [40]:
type(weather_df.values)

numpy.ndarray

In [42]:
weather_df.values.shape

(13, 4)

In [43]:
weather_df.values.ndim

2

In [44]:
weather_df.size

52

In [45]:
weather_df.values[-5:]

array([['1/11/2017', '-4', '-1', 'Snow'],
       ['1/12/2017', '26', '12', 'Sunny'],
       ['1/13/2017', '12', '12', 'Rainy'],
       ['1/11/2017', '-1', '12', 'Snow'],
       ['1/14/2017', '40', '-1', 'Sunny']], dtype=object)

In [51]:
weather_df.values[weather_df.values[:,-1] == 'Sunny']

array([['1/4/2017', 'not available', '9', 'Sunny'],
       ['1/8/2017', 'not available', 'not measured', 'Sunny'],
       ['1/12/2017', '26', '12', 'Sunny'],
       ['1/14/2017', '40', '-1', 'Sunny']], dtype=object)

In [53]:
weather_df.values[weather_df.values[:,1] == 'not available']

array([['1/4/2017', 'not available', '9', 'Sunny'],
       ['1/6/2017', 'not available', '7', 'no event'],
       ['1/8/2017', 'not available', 'not measured', 'Sunny'],
       ['1/9/2017', 'not available', 'not measured', 'no event']],
      dtype=object)

In [56]:
weather_df.values[weather_df.values[:,2] == 'not measured']

array([['1/5/2017', '-1', 'not measured', 'Snow'],
       ['1/7/2017', '32', 'not measured', 'Rain'],
       ['1/8/2017', 'not available', 'not measured', 'Sunny'],
       ['1/9/2017', 'not available', 'not measured', 'no event']],
      dtype=object)

In [58]:
weather_df.values[weather_df.values[:,-1] == 'no event']

array([['1/6/2017', 'not available', '7', 'no event'],
       ['1/9/2017', 'not available', 'not measured', 'no event']],
      dtype=object)
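The pandas documentation recommends `DataFrame.to_numpy()` over `.values`; both return the same array here. The sketch below (a made-up three-row frame) also contrasts the NumPy-style filter used above, which needs the column's position and returns a bare array, with the equivalent label-based pandas filter, which keeps the row labels and column names.

```python
import pandas as pd

df = pd.DataFrame({'day': ['1/1/2017', '1/4/2017', '1/5/2017'],
                   'event': ['Rain', 'Sunny', 'Snow']})

arr = df.to_numpy()                  # same data as df.values, as a NumPy array
print(arr.shape)                     # (3, 2)

# NumPy-style filter: needs the column position and returns a bare array
sunny_arr = arr[arr[:, -1] == 'Sunny']

# pandas-style filter: uses the column label and returns a labelled DataFrame
sunny_df = df[df['event'] == 'Sunny']
print(sunny_df)
```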

#### Checking the data types of your DataFrame's columns
<li>Another feature that makes pandas well suited to working with data is that a DataFrame can contain more than one data type.</li>
<li>Axis values can have string labels, not just numeric ones.</li>
<li>A DataFrame can hold columns of different data types, such as integer, float, and string.</li>
<li>We can use the DataFrame.dtypes attribute (similar to NumPy's dtype) to see the data type of each column.</li>
<li>When we import data, pandas attempts to infer the correct dtype for each column.</li>
<li>Generally, pandas does this well, so we don't need to specify dtypes every time we start working with data.</li>



In [60]:
weather_df.dtypes

day            object
temperature    object
windspeed      object
event          object
dtype: object

In [61]:
weather_df_nan = pd.read_csv('weather_data_nan.csv')
weather_df_nan.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/4/2017 | NaN | 9.0 | Sunny |
| 2 | 1/5/2017 | -1.0 | NaN | Snow |
| 3 | 1/6/2017 | NaN | 7.0 | NaN |
| 4 | 1/7/2017 | 32.0 | NaN | Rain |


In [62]:
weather_df_nan.dtypes

day             object
temperature    float64
windspeed      float64
event           object
dtype: object
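The two `dtypes` outputs above differ because of type inference: an empty field is read as NaN and still fits a float column, while a single text placeholder forces the whole column to text. A minimal sketch with two made-up in-memory files:

```python
import io
import pandas as pd

clean = "day,temperature\n1/1/2017,32\n1/4/2017,\n"
messy = "day,temperature\n1/1/2017,32\n1/4/2017,not available\n"

# An empty field becomes NaN, so the column is inferred as float64
clean_dtype = pd.read_csv(io.StringIO(clean)).dtypes['temperature']
# One text placeholder keeps the whole column as text (object in pandas 2.x)
messy_dtype = pd.read_csv(io.StringIO(messy)).dtypes['temperature']

print(clean_dtype)
print(messy_dtype)
```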

#### Shape and Data Type Information
<li>We can get the shape of the dataset using the <b>.shape</b> attribute; it is an attribute, not a method, so it is written without parentheses.</li>
<li><b>.shape</b> returns a tuple containing the number of rows and the number of columns in the dataset.</li>
<li>For an overview of all the dtypes and non-null counts in our DataFrame, we can use the <b>.info()</b> method.</li>
<li>Note that <b>DataFrame.info()</b> prints the information rather than returning it, so assigning its result to a variable only gives None.</li>


In [64]:
weather_df_nan.shape

(13, 4)

In [66]:
weather_df_nan.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   day          13 non-null     object 
 1   temperature  9 non-null      float64
 2   windspeed    7 non-null      float64
 3   event        11 non-null     object 
dtypes: float64(2), object(2)
memory usage: 544.0+ bytes
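If the summary is needed as a string rather than printed output, `info()` accepts a `buf` argument and writes into any text buffer. A sketch on a made-up two-column frame:

```python
import io
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, None, -1.0], 'event': ['Rain', 'Sunny', None]})
print(df.shape)            # (3, 2); an attribute, so no parentheses

result = df.info()         # prints the summary; the return value is None
buffer = io.StringIO()
df.info(buf=buffer)        # writes the same summary into the buffer instead
summary = buffer.getvalue()
```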


#### Checking the null values in the pandas dataframe

In [68]:
weather_df_nan.isnull().sum()

day            0
temperature    4
windspeed      6
event          2
dtype: int64
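`isnull()` (an alias of `isna()`) returns a Boolean DataFrame of the same shape, with True wherever a value is missing. Because True counts as 1, `.sum()` gives the number of missing values per column, and a second `.sum()` gives the total. A sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, np.nan, np.nan],
                   'event': ['Rain', None, 'Snow']})

mask = df.isnull()                 # True where a value is missing
print(mask.sum())                  # missing values per column: temperature 2, event 1
print(df.isnull().sum().sum())     # total missing cells: 3
```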

#### set_index() and reset_index() methods

In [32]:
weather_df_nan = pd.read_csv('weather_data_nan.csv',
                             parse_dates = ['day'])
weather_df_nan.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 2017-01-01 | 32.0 | 6.0 | Rain |
| 1 | 2017-01-04 | NaN | 9.0 | Sunny |
| 2 | 2017-01-05 | -1.0 | NaN | Snow |
| 3 | 2017-01-06 | NaN | 7.0 | NaN |
| 4 | 2017-01-07 | 32.0 | NaN | Rain |


In [75]:
weather_df_nan.set_index('day', inplace = True)
weather_df_nan

| day | temperature | windspeed | event |
|---|---|---|---|
| 2017-01-01 | 32.0 | 6.0 | Rain |
| 2017-01-04 | NaN | 9.0 | Sunny |
| 2017-01-05 | -1.0 | NaN | Snow |
| 2017-01-06 | NaN | 7.0 | NaN |
| 2017-01-07 | 32.0 | NaN | Rain |
| 2017-01-08 | NaN | NaN | Sunny |
| 2017-01-09 | NaN | NaN | NaN |
| 2017-01-10 | 34.0 | 8.0 | Cloudy |
| 2017-01-11 | -4.0 | NaN | Snow |
| 2017-01-12 | 26.0 | 12.0 | Sunny |


In [78]:
weather_df_nan.reset_index(inplace = True)
weather_df_nan

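|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 2017-01-01 | 32.0 | 6.0 | Rain |
| 1 | 2017-01-04 | NaN | 9.0 | Sunny |
| 2 | 2017-01-05 | -1.0 | NaN | Snow |
| 3 | 2017-01-06 | NaN | 7.0 | NaN |
| 4 | 2017-01-07 | 32.0 | NaN | Rain |
| 5 | 2017-01-08 | NaN | NaN | Sunny |
| 6 | 2017-01-09 | NaN | NaN | NaN |
| 7 | 2017-01-10 | 34.0 | 8.0 | Cloudy |
| 8 | 2017-01-11 | -4.0 | NaN | Snow |
| 9 | 2017-01-12 | 26.0 | 12.0 | Sunny |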


In [85]:
temperature_index_df = weather_df_nan.set_index('temperature')
temperature_index_df.head()

| temperature | day | windspeed | event |
|---|---|---|---|
| 32.0 | 2017-01-01 | 6.0 | Rain |
| NaN | 2017-01-04 | 9.0 | Sunny |
| -1.0 | 2017-01-05 | NaN | Snow |
| NaN | 2017-01-06 | 7.0 | NaN |
| 32.0 | 2017-01-07 | NaN | Rain |


In [86]:
temperature_reset_index_df = temperature_index_df.reset_index(drop = True)

In [88]:
temperature_reset_index_df.reset_index(inplace = True)
temperature_reset_index_df

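|   | index | day | windspeed | event |
|---|---|---|---|---|
| 0 | 0 | 2017-01-01 | 6.0 | Rain |
| 1 | 1 | 2017-01-04 | 9.0 | Sunny |
| 2 | 2 | 2017-01-05 | NaN | Snow |
| 3 | 3 | 2017-01-06 | 7.0 | NaN |
| 4 | 4 | 2017-01-07 | NaN | Rain |
| 5 | 5 | 2017-01-08 | NaN | Sunny |
| 6 | 6 | 2017-01-09 | NaN | NaN |
| 7 | 7 | 2017-01-10 | 8.0 | Cloudy |
| 8 | 8 | 2017-01-11 | NaN | Snow |
| 9 | 9 | 2017-01-12 | 12.0 | Sunny |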


In [90]:
# 'index' is already a column, so pandas names the new column 'level_0'
temperature_reset_index_df.reset_index(inplace = True)
temperature_reset_index_df

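|   | level_0 | index | day | windspeed | event |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 2017-01-01 | 6.0 | Rain |
| 1 | 1 | 1 | 2017-01-04 | 9.0 | Sunny |
| 2 | 2 | 2 | 2017-01-05 | NaN | Snow |
| 3 | 3 | 3 | 2017-01-06 | 7.0 | NaN |
| 4 | 4 | 4 | 2017-01-07 | NaN | Rain |
| 5 | 5 | 5 | 2017-01-08 | NaN | Sunny |
| 6 | 6 | 6 | 2017-01-09 | NaN | NaN |
| 7 | 7 | 7 | 2017-01-10 | 8.0 | Cloudy |
| 8 | 8 | 8 | 2017-01-11 | NaN | Snow |
| 9 | 9 | 9 | 2017-01-12 | 12.0 | Sunny |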
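The cells above can be summarised in one small sketch (made-up values): `set_index()` moves a column into the row index, `reset_index()` moves the index back into a column, and `reset_index(drop=True)` discards the index labels instead of keeping them as a column. Without `inplace = True`, each call returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({'day': ['2017-01-01', '2017-01-04'], 'temperature': [32.0, None]})

by_day = df.set_index('day')              # 'day' becomes the row index
restored = by_day.reset_index()           # 'day' moves back into a column
dropped = by_day.reset_index(drop=True)   # the 'day' labels are thrown away

print(list(by_day.columns))     # ['temperature']
print(list(restored.columns))   # ['day', 'temperature']
print(list(dropped.columns))    # ['temperature']
```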


#### Selecting a column from a pandas DataFrame

<li>Since pandas axes have labels, we can select data using those labels.</li> 
<li>Unlike in NumPy, we do not need to know the exact integer position of a row or column in a pandas DataFrame.</li>
<li>To do this, we can use the DataFrame.loc[] indexer. The syntax for DataFrame.loc[] is:</li>
<code>
df.loc[row_label, column_label]
</code>

<li>We can use the following shortcut to select a single column:</li>
<code>
df["column_name"]
</code>

<li>This style of selecting columns is very common.</li>


In [1]:
import pandas as pd
weather_df = pd.read_csv('weather_data_nan.csv')
weather_df.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/4/2017 | NaN | 9.0 | Sunny |
| 2 | 1/5/2017 | -1.0 | NaN | Snow |
| 3 | 1/6/2017 | NaN | 7.0 | NaN |
| 4 | 1/7/2017 | 32.0 | NaN | Rain |


In [2]:
weather_df.loc[:, 'event']

0       Rain
1      Sunny
2       Snow
3        NaN
4       Rain
5      Sunny
6        NaN
7     Cloudy
8       Snow
9      Sunny
10     Rainy
11      Snow
12     Sunny
Name: event, dtype: object

In [3]:
weather_df['temperature']

0     32.0
1      NaN
2     -1.0
3      NaN
4     32.0
5      NaN
6      NaN
7     34.0
8     -4.0
9     26.0
10    12.0
11    -1.0
12    40.0
Name: temperature, dtype: float64
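A sketch confirming that the two spellings used above select the same column, and that `loc` with a row label and a column label returns a single cell (made-up two-row frame):

```python
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, -1.0], 'event': ['Rain', 'Snow']})

via_loc = df.loc[:, 'event']         # all rows, the 'event' column
via_brackets = df['event']           # shortcut for the same selection
print(via_loc.equals(via_brackets))  # True
print(df.loc[1, 'temperature'])      # row label 1, column 'temperature' -> -1.0
```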

#### Questions

<li>Read the <b>'appointment_schedule.csv'</b> file using pandas.</li>
<li>Select the <b>'name'</b> column from the dataset and store it in the <b>'appointment_names'</b> variable.</li>
<li>Use Python's <b>type()</b> function to assign the type of the name column to <b>name_type</b>.</li>

In [4]:
appointment_df = pd.read_csv('appointment_schedule.csv')
appointment_df.tail()

|   | name | appointment_made_date | app_start_date | app_end_date | visitee_namelast | visitee_namefirst | meeting_room | description |
|---|---|---|---|---|---|---|---|---|
| 580 | Ryan J. Morgan | 2015-01-09T00:00:00 | 1/16/15 10:00 | 1/16/15 23:59 | NaN | potus | west wing | military honor guard |
| 581 | Alexander V. Nevsky | 2015-01-09T00:00:00 | 1/16/15 10:00 | 1/16/15 23:59 | NaN | potus | west wing | military honor guard |
| 582 | Montana J. Johnson | 2015-01-09T00:00:00 | 1/16/15 10:00 | 1/16/15 23:59 | NaN | potus | west wing | military honor guard |
| 583 | Joseph A. Pritchard | 2015-01-09T00:00:00 | 1/16/15 10:00 | 1/16/15 23:59 | NaN | potus | west wing | military honor guard |
| 584 | Martin O. Reina | 2015-01-09T00:00:00 | 1/16/15 10:00 | 1/16/15 23:59 | NaN | potus | west wing | military honor guard |


In [5]:
appointment_df.columns

Index(['name', 'appointment_made_date', 'app_start_date', 'app_end_date',
       'visitee_namelast', 'visitee_namefirst', 'meeting_room', 'description'],
      dtype='object')

In [6]:
appointment_names = appointment_df.loc[:,'name']
print(appointment_names)

0        Joshua T. Blanton
1          Jack T. Gutting
2        Bradley T. Guiles
3           Loryn F. Grieb
4         Travis D. Gordon
              ...         
580         Ryan J. Morgan
581    Alexander V. Nevsky
582     Montana J. Johnson
583    Joseph A. Pritchard
584        Martin O. Reina
Name: name, Length: 585, dtype: object


In [7]:
appointment_names = appointment_df['name']
print(appointment_names)

0        Joshua T. Blanton
1          Jack T. Gutting
2        Bradley T. Guiles
3           Loryn F. Grieb
4         Travis D. Gordon
              ...         
580         Ryan J. Morgan
581    Alexander V. Nevsky
582     Montana J. Johnson
583    Joseph A. Pritchard
584        Martin O. Reina
Name: name, Length: 585, dtype: object


In [8]:
print(type(appointment_names))

<class 'pandas.core.series.Series'>


In [9]:
appointment_names.shape

(585,)

#### Pandas Series
<li>A Series is the pandas type for one-dimensional objects.</li>
<li>Any 1D pandas object is a Series, and any 2D pandas object is a DataFrame.</li>
<li>Conceptually, a DataFrame is a collection of Series objects, one per column, that share the same row index.</li>
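The 1D/2D distinction above depends on how a column is selected: a single label returns a Series, while a list of labels (even a list with one label) returns a DataFrame. A sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Prabhat', 'Hari'], 'age': [24, 34]})

col = df['name']      # single label -> Series (1D)
sub = df[['name']]    # list of labels -> DataFrame (2D)

print(type(col).__name__, col.ndim)   # Series 1
print(type(sub).__name__, sub.ndim)   # DataFrame 2
```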

#### Adding a column in a pandas dataframe

In [12]:
import numpy as np
weather_df['is_play'] = np.nan
print(weather_df.shape)
weather_df.head()

(13, 5)


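|   | day | temperature | windspeed | event | is_play |
|---|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain | NaN |
| 1 | 1/4/2017 | NaN | 9.0 | Sunny | NaN |
| 2 | 1/5/2017 | -1.0 | NaN | Snow | NaN |
| 3 | 1/6/2017 | NaN | 7.0 | NaN | NaN |
| 4 | 1/7/2017 | 32.0 | NaN | Rain | NaN |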


In [13]:
is_play = weather_df['is_play']
print(type(is_play))

<class 'pandas.core.series.Series'>


In [14]:
weather_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   day          13 non-null     object 
 1   temperature  9 non-null      float64
 2   windspeed    7 non-null      float64
 3   event        11 non-null     object 
 4   is_play      0 non-null      float64
dtypes: float64(3), object(2)
memory usage: 648.0+ bytes
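Assigning to a new column label adds the column. A scalar such as `np.nan` is broadcast to every row, and an expression on another column fills the new column row by row. In the sketch below, `is_cold` is a hypothetical column; comparisons with NaN evaluate to False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, np.nan, -1.0]})

df['is_play'] = np.nan                    # the same value in every row
df['is_cold'] = df['temperature'] < 0     # computed from another column; NaN < 0 is False

print(df.shape)   # (3, 3)
print(df)
```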


### Selecting Multiple Columns From the DataFrame

![](images/selecting_columns.png)

<li>We can select multiple columns from a DataFrame with the following code:</li>
<code>
    df.loc[:, ["col1", "col2"]]
</code>

<li>We can also use the following shortcut syntax to select multiple columns:</li>
<code>
    df[["col1", "col2"]]
</code>

In [16]:
weather_df = pd.read_csv('weather_data_nan.csv', parse_dates = ['day'])
weather_df.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 2017-01-01 | 32.0 | 6.0 | Rain |
| 1 | 2017-01-04 | NaN | 9.0 | Sunny |
| 2 | 2017-01-05 | -1.0 | NaN | Snow |
| 3 | 2017-01-06 | NaN | 7.0 | NaN |
| 4 | 2017-01-07 | 32.0 | NaN | Rain |


In [17]:
weather_df.set_index('day', inplace = True)

In [18]:
weather_df

| day | temperature | windspeed | event |
|---|---|---|---|
| 2017-01-01 | 32.0 | 6.0 | Rain |
| 2017-01-04 | NaN | 9.0 | Sunny |
| 2017-01-05 | -1.0 | NaN | Snow |
| 2017-01-06 | NaN | 7.0 | NaN |
| 2017-01-07 | 32.0 | NaN | Rain |
| 2017-01-08 | NaN | NaN | Sunny |
| 2017-01-09 | NaN | NaN | NaN |
| 2017-01-10 | 34.0 | 8.0 | Cloudy |
| 2017-01-11 | -4.0 | NaN | Snow |
| 2017-01-12 | 26.0 | 12.0 | Sunny |


In [20]:
weather_df.loc[:, ["temperature", "event"]].head()

| day | temperature | event |
|---|---|---|
| 2017-01-01 | 32.0 | Rain |
| 2017-01-04 | NaN | Sunny |
| 2017-01-05 | -1.0 | Snow |
| 2017-01-06 | NaN | NaN |
| 2017-01-07 | 32.0 | Rain |


In [21]:
weather_df[['temperature', 'event']].head()

| day | temperature | event |
|---|---|---|
| 2017-01-01 | 32.0 | Rain |
| 2017-01-04 | NaN | Sunny |
| 2017-01-05 | -1.0 | Snow |
| 2017-01-06 | NaN | NaN |
| 2017-01-07 | 32.0 | Rain |


In [23]:
weather_df_no_windspeed = weather_df.drop('windspeed', axis = 1)
weather_df_no_windspeed.head()

| day | temperature | event |
|---|---|---|
| 2017-01-01 | 32.0 | Rain |
| 2017-01-04 | NaN | Sunny |
| 2017-01-05 | -1.0 | Snow |
| 2017-01-06 | NaN | NaN |
| 2017-01-07 | 32.0 | Rain |
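The three cells above reach the same two-column result by different routes. A sketch on a made-up frame checks that they agree, and adds `drop(columns=...)`, the keyword form of `drop(..., axis = 1)`:

```python
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, -1.0],
                   'windspeed': [6.0, 9.0],
                   'event': ['Rain', 'Snow']})

a = df.loc[:, ['temperature', 'event']]
b = df[['temperature', 'event']]
c = df.drop('windspeed', axis=1)       # everything except 'windspeed'
d = df.drop(columns=['windspeed'])     # equivalent keyword form

print(a.equals(b), b.equals(c), c.equals(d))   # True True True
```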


#### Question:
<li>Read the 'car_details.csv' file and create a pandas DataFrame from it.</li>
<li>Then select only the <b>'name'</b>, <b>'selling_price'</b> and <b>'km_driven'</b> columns from the DataFrame.</li>

![](images/selecting_3_cols.png)

In [24]:
car_details_df = pd.read_csv('car_details.csv')
car_details_df.head()

|   | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner |
|---|---|---|---|---|---|---|---|---|
| 0 | Maruti 800 AC | 2007 | 60000 | 70000 | Petrol | Individual | Manual | First Owner |
| 1 | Maruti Wagon R LXI Minor | 2007 | 135000 | 50000 | Petrol | Individual | Manual | First Owner |
| 2 | Hyundai Verna 1.6 SX | 2012 | 600000 | 100000 | Diesel | Individual | Manual | First Owner |
| 3 | Datsun RediGO T Option | 2017 | 250000 | 46000 | Petrol | Individual | Manual | First Owner |
| 4 | Honda Amaze VX i-DTEC | 2014 | 450000 | 141000 | Diesel | Individual | Manual | Second Owner |


In [26]:
car_details_df.loc[:, ['name', 'selling_price', 'km_driven']].head()

|   | name | selling_price | km_driven |
|---|---|---|---|
| 0 | Maruti 800 AC | 60000 | 70000 |
| 1 | Maruti Wagon R LXI Minor | 135000 | 50000 |
| 2 | Hyundai Verna 1.6 SX | 600000 | 100000 |
| 3 | Datsun RediGO T Option | 250000 | 46000 |
| 4 | Honda Amaze VX i-DTEC | 450000 | 141000 |


In [27]:
car_details_df[['name', 'selling_price', 'km_driven']].head()

|   | name | selling_price | km_driven |
|---|---|---|---|
| 0 | Maruti 800 AC | 60000 | 70000 |
| 1 | Maruti Wagon R LXI Minor | 135000 | 50000 |
| 2 | Hyundai Verna 1.6 SX | 600000 | 100000 |
| 3 | Datsun RediGO T Option | 250000 | 46000 |
| 4 | Honda Amaze VX i-DTEC | 450000 | 141000 |


In [30]:
car_details_limited = car_details_df.drop(['year', 'fuel', 'seller_type',
                                          'transmission', 'owner'],
                                          axis = 1)
car_details_limited.head()

|   | name | selling_price | km_driven |
|---|---|---|---|
| 0 | Maruti 800 AC | 60000 | 70000 |
| 1 | Maruti Wagon R LXI Minor | 135000 | 50000 |
| 2 | Hyundai Verna 1.6 SX | 600000 | 100000 |
| 3 | Datsun RediGO T Option | 250000 | 46000 |
| 4 | Honda Amaze VX i-DTEC | 450000 | 141000 |


#### Selecting Rows From A Pandas DataFrame

<li>Now that we've learned how to select columns by label, let's learn how to select rows using the labels of the index axis.</li>
<li>We can use the same syntax to select rows from a dataframe as we do for columns:</li>
<code>
    df.loc[row_label, column_label]
</code>

![](images/selecting_one_row.png)

In [32]:
weather_df.loc['2017-01-01']

temperature    32.0
windspeed       6.0
event          Rain
Name: 2017-01-01 00:00:00, dtype: object

In [34]:
weather_df.reset_index(inplace = True)
weather_df.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 2017-01-01 | 32.0 | 6.0 | Rain |
| 1 | 2017-01-04 | NaN | 9.0 | Sunny |
| 2 | 2017-01-05 | -1.0 | NaN | Snow |
| 3 | 2017-01-06 | NaN | 7.0 | NaN |
| 4 | 2017-01-07 | 32.0 | NaN | Rain |


In [35]:
weather_df.set_index('temperature', inplace = True)
weather_df.head()

| temperature | day | windspeed | event |
|---|---|---|---|
| 32.0 | 2017-01-01 | 6.0 | Rain |
| NaN | 2017-01-04 | 9.0 | Sunny |
| -1.0 | 2017-01-05 | NaN | Snow |
| NaN | 2017-01-06 | 7.0 | NaN |
| 32.0 | 2017-01-07 | NaN | Rain |


In [36]:
weather_df.loc[-1]

| temperature | day | windspeed | event |
|---|---|---|---|
| -1.0 | 2017-01-05 | NaN | Snow |
| -1.0 | 2017-01-11 | 12.0 | Snow |


In [37]:
weather_df.reset_index(inplace = True)
weather_df.head()

|   | temperature | day | windspeed | event |
|---|---|---|---|---|
| 0 | 32.0 | 2017-01-01 | 6.0 | Rain |
| 1 | NaN | 2017-01-04 | 9.0 | Sunny |
| 2 | -1.0 | 2017-01-05 | NaN | Snow |
| 3 | NaN | 2017-01-06 | 7.0 | NaN |
| 4 | 32.0 | 2017-01-07 | NaN | Rain |


In [38]:
weather_df.set_index('event', inplace = True)
weather_df.head()

| event | temperature | day | windspeed |
|---|---|---|---|
| Rain | 32.0 | 2017-01-01 | 6.0 |
| Sunny | NaN | 2017-01-04 | 9.0 |
| Snow | -1.0 | 2017-01-05 | NaN |
| NaN | NaN | 2017-01-06 | 7.0 |
| Rain | 32.0 | 2017-01-07 | NaN |


In [39]:
weather_df.loc["Sunny"]

| event | temperature | day | windspeed |
|---|---|---|---|
| Sunny | NaN | 2017-01-04 | 9.0 |
| Sunny | NaN | 2017-01-08 | NaN |
| Sunny | 26.0 | 2017-01-12 | 12.0 |
| Sunny | 40.0 | 2017-01-14 | NaN |
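The outputs above differ in shape because of the index labels: `loc['2017-01-01']` matched one row and returned a Series, while `loc[-1]` and `loc["Sunny"]` matched several rows and returned a DataFrame. A sketch with a made-up float index that repeats one label:

```python
import pandas as pd

df = pd.DataFrame({'day': ['2017-01-05', '2017-01-11', '2017-01-01']},
                  index=[-1.0, -1.0, 32.0])

single = df.loc[32.0]    # unique label -> Series
several = df.loc[-1.0]   # repeated label -> DataFrame with every matching row

print(type(single).__name__)    # Series
print(type(several).__name__)   # DataFrame
```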


### Selecting Multiple Rows From the DataFrame

![](images/selecting_multiple_rows.png)

In [43]:
weather_df = weather_df.reset_index().set_index('day')
weather_df.head()

| day | event | temperature | windspeed |
|---|---|---|---|
| 2017-01-01 | Rain | 32.0 | 6.0 |
| 2017-01-04 | Sunny | NaN | 9.0 |
| 2017-01-05 | Snow | -1.0 | NaN |
| 2017-01-06 | NaN | NaN | 7.0 |
| 2017-01-07 | Rain | 32.0 | NaN |


In [44]:
weather_df.loc[['2017-01-01', '2017-01-04']]

| day | event | temperature | windspeed |
|---|---|---|---|
| 2017-01-01 | Rain | 32.0 | 6.0 |
| 2017-01-04 | Sunny | NaN | 9.0 |


#### Indexing & Slicing In Pandas DataFrame

<li>We can slice a DataFrame by rows as well as by columns.</li>
<li>For example, if a DataFrame has shape (5, 5) and we want only the first three rows and the first three columns, we need to slice along both axes to get the desired shape.</li>
<li>The df.iloc[] indexer selects by integer position and can be used for both indexing and slicing in a DataFrame. It uses square brackets rather than parentheses, and the end position of a slice is excluded.</li>
<li>Let's practice using .iloc[].</li>
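Here is a minimal, self-contained sketch of positional selection with .iloc[] for the (5, 5) case described above. The DataFrame `df` and its column names are made up for illustration and are not the weather data.

```python
import pandas as pd

# Hypothetical 5x5 DataFrame: the value in row r, column c is r * 10 + c
df = pd.DataFrame(
    [[r * 10 + c for c in range(5)] for r in range(5)],
    columns=["a", "b", "c", "d", "e"],
)

# First three rows and first three columns (end positions are excluded)
top_left = df.iloc[:3, :3]
print(top_left.shape)  # (3, 3)

# A single cell by (row position, column position)
print(df.iloc[2, 4])  # 24

# Negative positions count from the end: the last row
print(df.iloc[-1].tolist())  # [40, 41, 42, 43, 44]
```

Because .iloc[] works purely on positions, it ignores the index labels; .loc[] is its label-based counterpart.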


In [46]:
weather_df.reset_index(inplace = True)

In [56]:
weather_df[3:6]

Unnamed: 0,day,event,temperature,windspeed
3,2017-01-06,,,7.0
4,2017-01-07,Rain,32.0,
5,2017-01-08,Sunny,,


In [58]:
weather_df.iloc[2:5, :2]

Unnamed: 0,day,event
2,2017-01-05,Snow
3,2017-01-06,
4,2017-01-07,Rain


In [60]:
weather_df.iloc[:,:3]

Unnamed: 0,day,event,temperature
0,2017-01-01,Rain,32.0
1,2017-01-04,Sunny,
2,2017-01-05,Snow,-1.0
3,2017-01-06,,
4,2017-01-07,Rain,32.0
5,2017-01-08,Sunny,
6,2017-01-09,,
7,2017-01-10,Cloudy,34.0
8,2017-01-11,Snow,-4.0
9,2017-01-12,Sunny,26.0


In [62]:
weather_df.iloc[11,1]

'Snow'

#### Datatype Conversion In Pandas

<li>astype() is one of the most important pandas methods. It converts a Series (or the columns of a DataFrame) to a specified data type.</li>
<li>When a DataFrame is created from a CSV file, pandas infers the data type of each column automatically.</li>
<li>At times the inferred type is not the one we need, and this is where astype() lets us convert to the desired type.</li>
<li>For example, a salary column could be imported as strings, but to do arithmetic on it we have to convert it to float.</li>
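A minimal sketch of the salary example, assuming a hypothetical `salaries` DataFrame whose salary column was read in as strings:

```python
import pandas as pd

# Hypothetical salary data stored as strings
salaries = pd.DataFrame({"name": ["Asha", "Ben", "Chen"],
                         "salary": ["52000", "61000.50", "48000"]})
print(salaries["salary"].dtype)  # a string/object dtype, so no arithmetic yet

# Convert the string column to floats so numeric operations work
salaries["salary"] = salaries["salary"].astype("float64")
print(salaries["salary"].dtype)  # float64
print(salaries["salary"].mean())
```

If the strings contain characters such as thousands separators or currency symbols, astype() raises a ValueError, so those characters must be removed first. pd.to_numeric(..., errors="coerce") is an alternative that turns unparseable values into NaN instead of raising.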

In [64]:
weather_df.dtypes

day            datetime64[ns]
event                  object
temperature           float64
windspeed             float64
dtype: object

In [69]:
weather_df['day'] = weather_df['day'].astype('str')

In [70]:
weather_df.dtypes

day             object
event           object
temperature    float64
windspeed      float64
dtype: object

In [72]:
car_details_df.head()

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner
0,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner
1,Maruti Wagon R LXI Minor,2007,135000,50000,Petrol,Individual,Manual,First Owner
2,Hyundai Verna 1.6 SX,2012,600000,100000,Diesel,Individual,Manual,First Owner
3,Datsun RediGO T Option,2017,250000,46000,Petrol,Individual,Manual,First Owner
4,Honda Amaze VX i-DTEC,2014,450000,141000,Diesel,Individual,Manual,Second Owner


In [74]:
car_details_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4340 entries, 0 to 4339
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   name           4340 non-null   object
 1   year           4340 non-null   int64 
 2   selling_price  4340 non-null   int64 
 3   km_driven      4340 non-null   int64 
 4   fuel           4340 non-null   object
 5   seller_type    4340 non-null   object
 6   transmission   4340 non-null   object
 7   owner          4340 non-null   object
dtypes: int64(3), object(5)
memory usage: 271.4+ KB


In [1]:
import pandas as pd
car_df = pd.read_csv('car_details.csv')
car_df.head()

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner
0,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner
1,Maruti Wagon R LXI Minor,2007,135000,50000,Petrol,Individual,Manual,First Owner
2,Hyundai Verna 1.6 SX,2012,600000,100000,Diesel,Individual,Manual,First Owner
3,Datsun RediGO T Option,2017,250000,46000,Petrol,Individual,Manual,First Owner
4,Honda Amaze VX i-DTEC,2014,450000,141000,Diesel,Individual,Manual,Second Owner


In [2]:
car_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4340 entries, 0 to 4339
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   name           4340 non-null   object
 1   year           4340 non-null   int64 
 2   selling_price  4340 non-null   int64 
 3   km_driven      4340 non-null   int64 
 4   fuel           4340 non-null   object
 5   seller_type    4340 non-null   object
 6   transmission   4340 non-null   object
 7   owner          4340 non-null   object
dtypes: int64(3), object(5)
memory usage: 271.4+ KB


In [3]:
car_df[['selling_price', 'km_driven']] = car_df[['selling_price', 'km_driven']].astype('float64')

In [4]:
car_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4340 entries, 0 to 4339
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   name           4340 non-null   object 
 1   year           4340 non-null   int64  
 2   selling_price  4340 non-null   float64
 3   km_driven      4340 non-null   float64
 4   fuel           4340 non-null   object 
 5   seller_type    4340 non-null   object 
 6   transmission   4340 non-null   object 
 7   owner          4340 non-null   object 
dtypes: float64(2), int64(1), object(5)
memory usage: 271.4+ KB


In [5]:
car_df.head()

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner
0,Maruti 800 AC,2007,60000.0,70000.0,Petrol,Individual,Manual,First Owner
1,Maruti Wagon R LXI Minor,2007,135000.0,50000.0,Petrol,Individual,Manual,First Owner
2,Hyundai Verna 1.6 SX,2012,600000.0,100000.0,Diesel,Individual,Manual,First Owner
3,Datsun RediGO T Option,2017,250000.0,46000.0,Petrol,Individual,Manual,First Owner
4,Honda Amaze VX i-DTEC,2014,450000.0,141000.0,Diesel,Individual,Manual,Second Owner


#### Value Counts Method

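<li>Since Series and DataFrames are distinct object types, each has some methods of its own.</li>

<li>Let's look at an example of a Series method: Series.value_counts().</li>

<li>This method returns each unique non-null value in a column together with its count, sorted from most to least frequent.</li>

<li>In older versions of pandas (before 1.1), value_counts() was a Series-only method, and calling it on a DataFrame raised the following error. Since pandas 1.1, DataFrame.value_counts() also exists, but it counts unique rows (combinations of values across columns) rather than the values of a single column:</li>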

<code>
    AttributeError: 'DataFrame' object has no attribute 'value_counts'
</code>
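A small sketch of value_counts() on a hypothetical Series of weather events; `normalize` and `dropna` are standard value_counts() parameters.

```python
import pandas as pd

# Hypothetical event column with one missing value
events = pd.Series(["Rain", "Sunny", "Snow", "Sunny", None, "Sunny", "Snow"])

# Count of each unique non-null value, most frequent first
print(events.value_counts())

# Proportions instead of raw counts (they sum to 1 over non-null values)
print(events.value_counts(normalize=True))

# Keep missing values as their own category
print(events.value_counts(dropna=False))
```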

In [60]:
weather_df = pd.read_csv('weather_data_nan.csv')
weather_df.head()

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain


In [8]:
weather_df['event'].value_counts()

Sunny     4
Snow      3
Rain      2
Cloudy    1
Rainy     1
Name: event, dtype: int64

In [10]:
weather_df['windspeed'].value_counts()

12.0    3
6.0     1
9.0     1
7.0     1
8.0     1
Name: windspeed, dtype: int64

In [11]:
car_df = pd.read_csv('car_details.csv')
car_df.head()

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner
0,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner
1,Maruti Wagon R LXI Minor,2007,135000,50000,Petrol,Individual,Manual,First Owner
2,Hyundai Verna 1.6 SX,2012,600000,100000,Diesel,Individual,Manual,First Owner
3,Datsun RediGO T Option,2017,250000,46000,Petrol,Individual,Manual,First Owner
4,Honda Amaze VX i-DTEC,2014,450000,141000,Diesel,Individual,Manual,Second Owner


In [12]:
car_df['name'].value_counts()

Maruti Swift Dzire VDI                     69
Maruti Alto 800 LXI                        59
Maruti Alto LXi                            47
Maruti Alto LX                             35
Hyundai EON Era Plus                       35
                                           ..
Hyundai Verna Transform CRDi VGT SX ABS     1
Maruti S-Presso VXI Plus                    1
Toyota Etios Liva 1.2 VX                    1
Toyota Yaris G                              1
Hyundai i20 Magna 1.4 CRDi                  1
Name: name, Length: 1491, dtype: int64

In [13]:
car_df['fuel'].value_counts()

Diesel      2153
Petrol      2123
CNG           40
LPG           23
Electric       1
Name: fuel, dtype: int64

#### Creating a frequency table from value_counts 

In [14]:
fuel_count_df = car_df['fuel'].value_counts().to_frame()
fuel_count_df.head()

Unnamed: 0,fuel
Diesel,2153
Petrol,2123
CNG,40
LPG,23
Electric,1


In [16]:
fuel_count_df.loc['Diesel']

fuel    2153
Name: Diesel, dtype: int64

In [17]:
fuel_count_df.reset_index(inplace = True)
fuel_count_df.head()

Unnamed: 0,index,fuel
0,Diesel,2153
1,Petrol,2123
2,CNG,40
3,LPG,23
4,Electric,1


#### Renaming the column names in a pandas dataframe

In [18]:
fuel_count_df.columns = ['fuel', 'frequency']
fuel_count_df.head()

Unnamed: 0,fuel,frequency
0,Diesel,2153
1,Petrol,2123
2,CNG,40
3,LPG,23
4,Electric,1


In [20]:
fuel_count_df.rename({'fuel': 'fuel_type', 'frequency': 'freq'}, 
                     inplace = True, axis = 1)
fuel_count_df.head()

Unnamed: 0,fuel_type,freq
0,Diesel,2153
1,Petrol,2123
2,CNG,40
3,LPG,23
4,Electric,1


In [23]:
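def freq_count_df(df, exist_col, renamed_cols):
    """
    Build a frequency table for one column of a DataFrame.

    df -> dataframe (DataFrame object)
    exist_col -> name of an existing column in the dataframe (string)
    renamed_cols -> new names for the two output columns, [value column, count column] (list)
    """
    # value_counts() gives the counts; to_frame() + reset_index() turn them into a two-column DataFrame
    counts_df = df[exist_col].value_counts().to_frame().reset_index()
    counts_df.columns = renamed_cols
    return counts_df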

In [None]:
# Columns to build frequency tables for next: seller_type, transmission, owner

In [24]:
seller_type_count_df = freq_count_df(df = car_df,
                                     exist_col = 'seller_type',
                                     renamed_cols = ['seller_type', 'freq']
                                    )
seller_type_count_df.head()

Unnamed: 0,seller_type,freq
0,Individual,3244
1,Dealer,994
2,Trustmark Dealer,102


In [25]:
transmission_count_df = freq_count_df(df = car_df,
                                     exist_col = 'transmission',
                                     renamed_cols = ['transmission', 'freq']
                                    )
transmission_count_df.head()

Unnamed: 0,transmission,freq
0,Manual,3892
1,Automatic,448


In [26]:
owner_count_df = freq_count_df(df = car_df,
                               exist_col = 'owner',
                               renamed_cols = ['owner', 'freq']
                              )
owner_count_df.head()

Unnamed: 0,owner,freq
0,First Owner,2832
1,Second Owner,1106
2,Third Owner,304
3,Fourth & Above Owner,81
4,Test Drive Car,17


#### Selecting Items From A Series

<li>As with dataframes, we can use Series.loc[] to select items from a series using single labels, a list, or a slice object.</li>
<li>We can also omit loc[] and use bracket shortcuts for all three:</li>

![](images/selecting_series.png)
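A minimal sketch of the three selection styles on a hypothetical `fuel_counts` Series. Note that a label slice includes its end label, while a positional slice with .iloc[] excludes its end position.

```python
import pandas as pd

# Hypothetical counts, ordered like a value_counts() result
fuel_counts = pd.Series([50, 40, 5], index=["Diesel", "Petrol", "CNG"])

print(fuel_counts["Diesel"])            # single label -> scalar: 50
print(fuel_counts[["Diesel", "CNG"]])   # list of labels -> Series
print(fuel_counts["Diesel":"Petrol"])   # label slice -> Series, 'Petrol' included

# Positional slice for comparison: position 1 ('Petrol') is excluded
print(fuel_counts.iloc[0:1])
```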

In [31]:
fuel_count_series = car_df['fuel'].value_counts()
fuel_count_series.head()

Diesel      2153
Petrol      2123
CNG           40
LPG           23
Electric       1
Name: fuel, dtype: int64

In [34]:
fuel_count_series.loc['Diesel']

2153

In [35]:
fuel_count_series.loc[['Diesel', 'Petrol']]

Diesel    2153
Petrol    2123
Name: fuel, dtype: int64

In [39]:
fuel_count_series.loc["Diesel": "CNG"]

Diesel    2153
Petrol    2123
CNG         40
Name: fuel, dtype: int64

#### Question

<li>Use the value_counts() method to count how many times each name appears in the 'appointment_schedule.csv' file.</li>
<li>Select only the first row from the resulting Series.</li>
<li>Select the first and the last rows from the Series.</li>
<li>Select the first five and the last five rows from the Series.</li>



In [45]:
appointment_df = pd.read_csv('appointment_schedule.csv')
appointment_df.tail()

Unnamed: 0,name,appointment_made_date,app_start_date,app_end_date,visitee_namelast,visitee_namefirst,meeting_room,description
580,Ryan J. Morgan,2015-01-09T00:00:00,1/16/15 10:00,1/16/15 23:59,,potus,west wing,military honor guard
581,Alexander V. Nevsky,2015-01-09T00:00:00,1/16/15 10:00,1/16/15 23:59,,potus,west wing,military honor guard
582,Montana J. Johnson,2015-01-09T00:00:00,1/16/15 10:00,1/16/15 23:59,,potus,west wing,military honor guard
583,Joseph A. Pritchard,2015-01-09T00:00:00,1/16/15 10:00,1/16/15 23:59,,potus,west wing,military honor guard
584,Martin O. Reina,2015-01-09T00:00:00,1/16/15 10:00,1/16/15 23:59,,potus,west wing,military honor guard


In [43]:
name_count_series = appointment_df['name'].value_counts()

In [44]:
name_count_series.loc["Joshua T. Blanton"]

1

In [46]:
name_count_series.loc[["Joshua T. Blanton", "Martin O. Reina"]]

Joshua T. Blanton    1
Martin O. Reina      2
Name: name, dtype: int64

#### DataFrame Vs Series

![](images/dataframe_vs_series.png)

#### Summary

![](images/pandas_selection_summary.png)

#### Vectorized Operations In Pandas

<li>We'll explore how pandas uses many of the concepts we learned with NumPy.</li>
<li>Because pandas is designed to operate like NumPy, it supports a lot of NumPy's concepts and methods.</li>
<li>Recall that one of the ways NumPy makes working with data easier is with vectorized operations.</li>
<li>Just like with NumPy, we can use any of the standard Python numeric operators with series, including:</li>
<code>
    series_a + series_b - Addition
    series_a - series_b - Subtraction
    series_a * series_b - Multiplication
    series_a / series_b - Division
</code>
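A minimal sketch with two hypothetical Series, `a` and `b`. One difference from NumPy is worth seeing in code: pandas aligns Series on their index labels before applying the operator, so labels that appear in only one operand produce NaN.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["x", "y", "z"])

print(a + b)  # element-wise: x 11, y 22, z 33
print(a * 2)  # a scalar is applied to every element: 2, 4, 6

# Operations align on index labels, not on position
c = pd.Series([100], index=["y"])
print(a + c)  # x NaN, y 102.0, z NaN
```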

In [53]:
car_df['sp_per_kmdriven'] = car_df['selling_price'] / car_df['km_driven']

In [55]:
car_df['metres_driven'] = car_df['km_driven'] * 1000

In [58]:
car_df['age_in_years'] = 2023 - car_df['year']

In [59]:
car_df.head()

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner,sp_per_kmdriven,metres_driven,age_in_years
0,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner,0.857143,70000000,16
1,Maruti Wagon R LXI Minor,2007,135000,50000,Petrol,Individual,Manual,First Owner,2.7,50000000,16
2,Hyundai Verna 1.6 SX,2012,600000,100000,Diesel,Individual,Manual,First Owner,6.0,100000000,11
3,Datsun RediGO T Option,2017,250000,46000,Petrol,Individual,Manual,First Owner,5.434783,46000000,6
4,Honda Amaze VX i-DTEC,2014,450000,141000,Diesel,Individual,Manual,Second Owner,3.191489,141000000,9


In [61]:
weather_df.head()

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain


In [62]:
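# Approximate Celsius-to-Kelvin conversion (the exact offset is 273.15); assumes 'temperature' is in Celsius
weather_df['temp_in_kelvin'] = 273 + weather_df['temperature']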
weather_df.head()

Unnamed: 0,day,temperature,windspeed,event,temp_in_kelvin
0,1/1/2017,32.0,6.0,Rain,305.0
1,1/4/2017,,9.0,Sunny,
2,1/5/2017,-1.0,,Snow,272.0
3,1/6/2017,,7.0,,
4,1/7/2017,32.0,,Rain,305.0


#### Some Statistical Functions In Pandas

<li>Like NumPy, pandas supports many descriptive statistics methods such as mean, median, mode, min, max and so on.</li>
<li>Here are a few of the most useful ones.</li>
<code>
Series.max()
Series.min()
Series.mean()
Series.median()
Series.mode()
Series.sum()
Series.std()
</code>
<li>We can calculate the average value of a particular column (Series) using df['column_name'].mean(), or df.column_name.mean() when the column name is a valid Python identifier.</li>
<li>To calculate the minimum value of a column, we can use df['column_name'].min().</li>
<li>Similarly, to calculate the maximum value of a column, we can use df['column_name'].max().</li>
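A quick sketch of these methods on a small, hypothetical Series whose results are easy to check by hand. Note that mode() returns a Series because several values can tie for most frequent, and std() computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5])

print(s.max())     # 5
print(s.min())     # 1
print(s.mean())    # 2.8
print(s.median())  # 3.0
print(s.sum())     # 14
print(s.mode())    # a Series containing the single most frequent value, 1
print(s.std())     # sample standard deviation, about 1.789
```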

In [64]:
car_df['selling_price'].min()

20000

In [65]:
car_df['selling_price'].max()

8900000

In [66]:
car_df['selling_price'].mean()

504127.3117511521

In [67]:
car_df['selling_price'].median()

350000.0

In [70]:
car_df['selling_price'].sum()

2187912533

In [73]:
car_df['selling_price'].std()

578548.7361388865

In [71]:
car_df['owner'].mode()

0    First Owner
Name: owner, dtype: object

In [72]:
car_df['owner'].value_counts()

First Owner             2832
Second Owner            1106
Third Owner              304
Fourth & Above Owner      81
Test Drive Car            17
Name: owner, dtype: int64

#### Finding the descriptive statistics of the dataframe using .describe() method

<li>Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.</li>
<li>By default, the describe() method computes descriptive statistics for all numeric columns.</li>
<li>It can analyze both numeric and object Series, as well as DataFrames whose columns have mixed data types.</li>
<li>The output varies depending on the data type of the input.</li>
<li>To see descriptive statistics for object (e.g. string) columns, we have to specify <b>df.describe(include = "O")</b>, which reports the count, the number of unique values, the most frequent value (top) and its frequency (freq).</li>
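A minimal sketch contrasting the two forms of describe() on a hypothetical mixed-type DataFrame. The string column is created with an explicit object dtype so that include="O" is guaranteed to select it.

```python
import pandas as pd

# Hypothetical DataFrame with one numeric and one object column
df = pd.DataFrame({
    "price": [100, 200, 300, 400],
    "fuel": pd.Series(["Diesel", "Petrol", "Diesel", "Diesel"], dtype="object"),
})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
num = df.describe()
print(num)

# Object columns: count, unique, top (most frequent value), freq
summary = df.describe(include="O")
print(summary)
```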

In [75]:
car_df.describe()

Unnamed: 0,year,selling_price,km_driven,sp_per_kmdriven,metres_driven,age_in_years
count,4340.0,4340.0,4340.0,4340.0,4340.0,4340.0
mean,2013.090783,504127.3,66215.777419,88.014104,66215780.0,9.909217
std,4.215344,578548.7,46644.102194,3801.519945,46644100.0,4.215344
min,1992.0,20000.0,1.0,0.25,1000.0,3.0
25%,2011.0,208749.8,35000.0,2.625,35000000.0,7.0
50%,2014.0,350000.0,60000.0,6.0,60000000.0,9.0
75%,2016.0,600000.0,90000.0,15.552441,90000000.0,12.0
max,2020.0,8900000.0,806599.0,250000.0,806599000.0,31.0


In [76]:
car_df.describe(include = "O")

Unnamed: 0,name,fuel,seller_type,transmission,owner
count,4340,4340,4340,4340,4340
unique,1491,5,3,2,5
top,Maruti Swift Dzire VDI,Diesel,Individual,Manual,First Owner
freq,69,2153,3244,3892,2832


#### Assigning Values With Pandas

<li>Just like in NumPy, the same techniques that we use to select data can be used for assignment.</li>

<li>When we select a whole column by label and assign a value to it, the value is assigned to every item in that column.</li>

<li>By providing labels for both axes, we can assign a value to a single cell within our DataFrame.</li>

<code>
    df.loc[row_label, col_label] = assignment_value
</code>
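A minimal sketch of label-based assignment on a hypothetical three-row DataFrame, covering a single cell, several columns of one row, and a whole column:

```python
import pandas as pd

# Hypothetical small weather table with a missing temperature
df = pd.DataFrame({"temperature": [32.0, None, -1.0],
                   "event": ["Rain", "Sunny", "Snow"]})

# Single cell: row label 1, column 'temperature'
df.loc[1, "temperature"] = 31

# Several columns of one row at once
df.loc[2, ["temperature", "event"]] = [0.0, "Sunny"]

# A whole column: the scalar is assigned to every row
df["checked"] = True
print(df)
```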

In [13]:
import pandas as pd

In [14]:
weather_df = pd.read_csv('weather_data_nan.csv')
weather_df.head()

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain


In [5]:
weather_df.loc[1,'temperature'] = 31

In [6]:
weather_df.head()

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,31.0,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain


In [8]:
weather_df.loc[3,["temperature", "event"]] = [30, "Sunny"]

In [9]:
weather_df.head()

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,31.0,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,30.0,7.0,Sunny
4,1/7/2017,32.0,,Rain


In [15]:
weather_df = weather_df.set_index('day')
weather_df.head()

day,temperature,windspeed,event
1/1/2017,32.0,6.0,Rain
1/4/2017,,9.0,Sunny
1/5/2017,-1.0,,Snow
1/6/2017,,7.0,
1/7/2017,32.0,,Rain


In [16]:
weather_df.loc['1/5/2017', 'windspeed'] = 8

In [17]:
weather_df.head()

day,temperature,windspeed,event
1/1/2017,32.0,6.0,Rain
1/4/2017,,9.0,Sunny
1/5/2017,-1.0,8.0,Snow
1/6/2017,,7.0,
1/7/2017,32.0,,Rain


#### Using Boolean Indexing With Pandas Objects (Selection With Condition In Pandas)
<li>We can assign a value to a single cell by using its row label and column label.</li>
<li>But what if we need to assign the same value to every row that meets some criterion?</li>
<li>For that, we can use boolean indexing to change all rows that meet the criterion at once, just like we did with NumPy.</li>


<ol>
    <li>Equals: df['series'] == value</li>
    <li>Not Equals: df['series'] != value</li>
    <li>Less than: df['series'] < value</li>
    <li>Less than or equal to: df['series'] <= value</li>
    <li>Greater than: df['series'] > value</li>
    <li>Greater than or equal to: df['series'] >= value</li>
</ol>
<li>These conditions can be used in several ways, most commonly inside .loc to select values with conditions.</li>
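A minimal sketch of boolean masks on hypothetical weather rows. Conditions are combined with & (and), | (or) and ~ (not); each condition must be wrapped in parentheses because these operators bind more tightly than comparisons such as > and ==.

```python
import pandas as pd

# Hypothetical weather rows, used only for illustration
df = pd.DataFrame({"temperature": [32, 26, -1, 40],
                   "event": ["Rain", "Sunny", "Snow", "Sunny"]})

# A comparison produces one True/False per row
mask = df["temperature"] > 30
print(mask.tolist())  # [True, False, False, True]

# Combine conditions: hot AND sunny, rain OR below zero, NOT snow
hot_and_sunny = df.loc[(df["temperature"] > 30) & (df["event"] == "Sunny")]
rain_or_freezing = df.loc[(df["event"] == "Rain") | (df["temperature"] < 0)]
not_snow = df.loc[~(df["event"] == "Snow")]

# Masks also work for assignment: flag every row that meets the condition
df["is_hot"] = False
df.loc[mask, "is_hot"] = True
```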

In [18]:
weather_df

day,temperature,windspeed,event
1/1/2017,32.0,6.0,Rain
1/4/2017,,9.0,Sunny
1/5/2017,-1.0,8.0,Snow
1/6/2017,,7.0,
1/7/2017,32.0,,Rain
1/8/2017,,,Sunny
1/9/2017,,,
1/10/2017,34.0,8.0,Cloudy
1/11/2017,-4.0,,Snow
1/12/2017,26.0,12.0,Sunny


In [20]:
weather_df[weather_df['temperature'] > 30]

day,temperature,windspeed,event
1/1/2017,32.0,6.0,Rain
1/7/2017,32.0,,Rain
1/10/2017,34.0,8.0,Cloudy
1/14/2017,40.0,,Sunny


In [22]:
weather_df[weather_df['event'] == "Rain"]

day,temperature,windspeed,event
1/1/2017,32.0,6.0,Rain
1/7/2017,32.0,,Rain


In [23]:
weather_df[weather_df['windspeed'] < 10]

day,temperature,windspeed,event
1/1/2017,32.0,6.0,Rain
1/4/2017,,9.0,Sunny
1/5/2017,-1.0,8.0,Snow
1/6/2017,,7.0,
1/10/2017,34.0,8.0,Cloudy


In [24]:
car_details_df = pd.read_csv('car_details.csv')
car_details_df.head()

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner
0,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner
1,Maruti Wagon R LXI Minor,2007,135000,50000,Petrol,Individual,Manual,First Owner
2,Hyundai Verna 1.6 SX,2012,600000,100000,Diesel,Individual,Manual,First Owner
3,Datsun RediGO T Option,2017,250000,46000,Petrol,Individual,Manual,First Owner
4,Honda Amaze VX i-DTEC,2014,450000,141000,Diesel,Individual,Manual,Second Owner


In [27]:
maruti_800_ac_df = car_details_df[car_details_df['name'] == "Maruti 800 AC"]
maruti_800_ac_df.head()

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner
0,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner
13,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner
175,Maruti 800 AC,2007,95000,100000,Petrol,Individual,Manual,Second Owner
259,Maruti 800 AC,2002,65000,100000,Petrol,Individual,Manual,Second Owner
372,Maruti 800 AC,2000,60000,40000,Petrol,Individual,Manual,Third Owner


In [29]:
maruti_800_ac_df['selling_price'].min()

40000

In [30]:
maruti_800_ac_df['selling_price'].max()

225000

In [31]:
maruti_800_ac_df['selling_price'].mean()

94347.82608695653

In [32]:
car_details_df[car_details_df['name'] == "Maruti 800 AC"]['selling_price'].mean()

94347.82608695653

In [33]:
car_details_df.loc[car_details_df['name'] == "Maruti 800 AC", "selling_price"].mean()

94347.82608695653

In [37]:
car_details_df.loc[car_details_df['year'] == 2012, 'selling_price'].min()

35000

In [38]:
car_details_df.loc[car_details_df['year'] == 2012, 'selling_price'].max()

2500000

In [39]:
car_details_df.loc[car_details_df['year'] == 2012, 'selling_price'].mean()

371628.8530120482

In [41]:
car_details_df.loc[car_details_df['selling_price'] > 100000, "name"].value_counts().head()

Maruti Swift Dzire VDI    69
Maruti Alto 800 LXI       55
Hyundai EON Era Plus      35
Maruti Swift VDI BSIV     29
Maruti Alto LXi           28
Name: name, dtype: int64

In [42]:
car_details_df.loc[car_details_df['selling_price'] < 100000, "name"].value_counts().head()

Maruti 800 AC            15
Tata Indica GLS BS IV    14
Maruti Alto LXi          12
Maruti Wagon R LXI       12
Hyundai Santro GS        12
Name: name, dtype: int64

In [43]:
car_details_df.loc[((car_details_df['selling_price'] < 80000) &
                   (car_details_df['owner'] == "Second Owner")),
                    "name"].value_counts().head()

Maruti Alto LXi          5
Tata Indica GLS BS IV    5
Maruti 800 AC            4
Maruti Wagon R LXI       4
Maruti Alto LXI          4
Name: name, dtype: int64

In [47]:
car_details_df.loc[((car_details_df['name'] == "Maruti Swift Dzire VDI") |
                   (car_details_df['name'] == "Maruti Alto 800 LXI"))].shape

(128, 8)

In [50]:
# The ten most frequent car names in the dataset
isinNamelist = ["Maruti Swift Dzire VDI", "Maruti Alto 800 LXI", "Maruti Alto LXi",
                "Maruti Alto LX", "Hyundai EON Era Plus", "Maruti Swift VDI BSIV",
                "Maruti Wagon R VXI BS IV", "Maruti Swift VDI", "Hyundai EON Magna Plus",
                "Maruti Wagon R LXI Minor"]

In [55]:
car_details_df['top10counts'] = 0

In [59]:
car_details_df.loc[car_details_df['name'].isin(isinNamelist),
                   "top10counts"] = 1

In [60]:
car_details_df['top10counts'].value_counts()

0    3962
1     378
Name: top10counts, dtype: int64

### Using Pandas Method To Create a Boolean Mask

<li>In the last couple of lessons, we used Python comparison operators to create boolean masks to select subsets of data.</li>

<li>There are also a number of pandas methods that return boolean masks useful for exploring data.</li>

<li>Two examples are the Series.isnull() and Series.notnull() methods.</li>
<li>Series.isnull() can be used to select the rows that contain null (NaN) values in a certain column.</li>
<li>Similarly, Series.notnull() is used to select the rows that do not contain null values in a certain column.</li>
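A minimal sketch on a hypothetical DataFrame with gaps. isna() and notna() are aliases of isnull() and notnull(), so either spelling works.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns
df = pd.DataFrame({"temperature": [32.0, np.nan, -1.0, np.nan],
                   "event": ["Rain", "Sunny", None, "Snow"]})

missing_temp = df[df["temperature"].isnull()]  # rows 1 and 3
has_temp = df[df["temperature"].notnull()]     # rows 0 and 2

# Rows with no missing value in any column
complete_rows = df[df.notnull().all(axis=1)]   # row 0 only
```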

In [63]:
weather_df = pd.read_csv('weather_data_nan.csv')
weather_df.head()

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain


In [69]:
weather_df[weather_df['temperature'].isnull()]

Unnamed: 0,day,temperature,windspeed,event
1,1/4/2017,,9.0,Sunny
3,1/6/2017,,7.0,
5,1/8/2017,,,Sunny
6,1/9/2017,,,


In [70]:
weather_df[weather_df['temperature'].notnull()]


In [71]:
weather_df[(weather_df['temperature'].notnull()) &
(weather_df['windspeed'].notnull()) &
(weather_df['event'].notnull())]

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
7,1/10/2017,34.0,8.0,Cloudy
9,1/12/2017,26.0,12.0,Sunny
10,1/13/2017,12.0,12.0,Rainy
11,1/11/2017,-1.0,12.0,Snow


#### Question 1

<b><li>Read 'Fortune_1000.csv' file using pandas read_csv() method and store it in a variable named f1000.</li></b>
<b><li>Select the rank, revenues, and revenue_change columns in f1000. Then, use the DataFrame.head() method to select the first five rows. Assign the result to f1000_selection.</li></b>
<b><li>Select just the fifth row of the f1000 dataframe. Assign the result to fifth_row using iloc.</li></b>
<b><li>Select the value in the first row of the company column. Assign the result to company_value.</li></b>
<b><li>Select the last three rows of the f1000 dataframe. Assign the result to last_three_rows.</li></b>
<b><li>Select the first to seventh rows and the first five columns of the f1000 dataframe. Assign the result to first_seventh_row_slice.</li></b>
<b><li>Use the Series.isnull() method to select all rows from f1000 that have a null value for the previous_rank column.</li></b>
<b><li>Select only the company, rank, and previous_rank columns where previous_rank column is null. Assign the result to null_previous_rank.</li></b>
<b><li>Use the Series.notnull() method to select all rows from f1000 that have a non-null value for the previous_rank column. Assign the result to previously_ranked.</li></b>
<b><li>From the previously_ranked dataframe, subtract the rank column from the previous_rank column. Assign the result to rank_change.</li></b>
<b><li>Assign the values in rank_change to a new column in the f1000 dataframe, "rank_change".</li></b>


#### Question 2
<b><li>Select all companies with revenues over 100 billion and negative profits from the f500 dataframe. The result should include all columns.</li></b>
<b><li>Create a boolean array that selects the companies with revenues greater than 100 billion. Assign the result to large_revenue.</li></b>
<b><li>Create a boolean array that selects the companies with profits less than 0. Assign the result to negative_profits.</li></b>
<b><li>Combine large_revenue and negative_profits. Assign the result to combined.</li></b>


#### Question 3
<b><li>Select all rows for companies whose country value is either Brazil or Venezuela. Assign the result to brazil_venezuela.</li></b>
<b><li>Select the first five companies in the Technology sector for which the country is not the USA from the f500 dataframe. Assign the result to tech_outside_usa.</li></b>

### String Manipulation In Pandas DataFrame

<li>String manipulation is the process of changing, parsing, splitting, 'cleaning' or analyzing strings.</li>
<li>Raw string data is often not in a form that is suitable for analysis or for describing the data.</li>
<li>Python is well known for its ability to manipulate strings.</li>
<li>Pandas provides built-in string methods, accessed through the .str accessor of a Series (for example, df['col'].str.lower()), to modify and process string data.</li>
<li>Some of the most useful pandas string processing functions are as follows:</li>
<ol>
    <li><b>lower()</b></li>
    <li><b>upper()</b></li>
    <li><b>islower()</b></li>
    <li><b>isupper()</b></li>
    <li><b>isnumeric()</b></li>
    <li><b>strip()</b></li>
    <li><b>split()</b></li>
    <li><b>len()</b></li>
    <li><b>get_dummies()</b></li>
    <li><b>startswith()</b></li>
    <li><b>endswith()</b></li>
    <li><b>replace()</b></li>
    <li><b>contains()</b></li>
</ol>


#### 1. lower(): 
<li>It converts all uppercase characters in each string of a Series to lowercase and returns the resulting strings.</li>


#### 2. upper():
<li>It converts all lowercase characters in each string of a Series to uppercase and returns the resulting strings.</li>


#### 3. islower(): 
<li>It checks whether all cased characters in each string of a Series are lowercase, and returns a Boolean value for each element.</li>


#### 4. isupper(): 
<li>It checks whether all cased characters in each string of a Series are uppercase, and returns a Boolean value for each element.</li>


#### 5. isnumeric():
<li>It checks whether all characters in each string of a Series are numeric, and returns a Boolean value for each element.</li>
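String methods are reached through the .str accessor of a Series. A minimal sketch of methods 1 to 5 on a hypothetical Series of names:

```python
import pandas as pd

names = pd.Series(["Maruti", "HONDA", "toyota", "2020"])

print(names.str.lower().tolist())      # ['maruti', 'honda', 'toyota', '2020']
print(names.str.upper().tolist())      # ['MARUTI', 'HONDA', 'TOYOTA', '2020']
print(names.str.islower().tolist())    # [False, False, True, False]
print(names.str.isupper().tolist())    # [False, True, False, False]
print(names.str.isnumeric().tolist())  # [False, False, False, True]
```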


#### 6. strip():
<li>If there are spaces at the beginning or end of a string, we should trim them using the strip() method.</li>
<li>It removes leading and trailing whitespace from each string in a Series; spaces in the middle of a string are kept.</li>
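A minimal sketch on hypothetical padded strings; lstrip() and rstrip() trim only the left or the right side:

```python
import pandas as pd

raw = pd.Series(["  Diesel", "Petrol  ", "  CNG  "])

print(raw.str.strip().tolist())   # ['Diesel', 'Petrol', 'CNG']
print(raw.str.lstrip().tolist())  # ['Diesel', 'Petrol  ', 'CNG  ']
print(raw.str.rstrip().tolist())  # ['  Diesel', 'Petrol', '  CNG']
```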


#### 7. split(' '):
<li>It splits each string on the given pattern (here, a single space).</li>
<li>The pieces produced by each split are stored in a list.</li>
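A minimal sketch using hypothetical car names. Passing expand=True returns a DataFrame instead of lists, and n limits the number of splits:

```python
import pandas as pd

names = pd.Series(["Maruti 800 AC", "Honda Amaze VX"])

# Each element becomes a list of pieces
parts = names.str.split(" ")
print(parts.tolist())  # [['Maruti', '800', 'AC'], ['Honda', 'Amaze', 'VX']]

# Split only on the first space and spread the pieces into columns 0 and 1
brand_model = names.str.split(" ", n=1, expand=True)
print(brand_model)
```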


#### 8. len():
<li>With the help of str.len() we can compute the length of each string in a Series.</li>
<li>For missing values (NaN) it returns NaN, while an empty string has length 0.</li>
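A minimal sketch showing how str.len() treats an empty string versus a missing value (hypothetical data):

```python
import numpy as np
import pandas as pd

s = pd.Series(["Rain", "", np.nan, "Sunny"])

lengths = s.str.len()
print(lengths)  # lengths 4, 0, NaN (missing), 5
```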


#### 9. get_dummies(): 
<li>str.get_dummies() splits each string on a separator ('|' by default) and returns a DataFrame of one-hot encoded values: each possible value gets its own column, containing 1 if that value is present in the row and 0 if it is not.</li>
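A minimal sketch, assuming a hypothetical Series of '|'-separated car features. For a column that holds exactly one category per row, pd.get_dummies() is the more common choice.

```python
import pandas as pd

features = pd.Series(["AC|ABS", "AC", "ABS|Sunroof"])

# One indicator column per distinct feature, columns sorted alphabetically
dummies = features.str.get_dummies(sep="|")
print(dummies)
```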


#### 10. startswith(pattern):
<li>It returns True for each string in the Series that starts with the given pattern.</li>
<li>If you want to keep only the rows whose values start with 'ind', you can use df[df[col].str.startswith('ind')]. The match is case-sensitive.</li>


#### 11. endswith(pattern):
<li>It returns True for each string in the Series that ends with the given pattern.</li>
<li>If you want to keep only the rows whose values end with 'es', you can use df[df[col].str.endswith('es')].</li>
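A minimal sketch on a hypothetical seller_type column, showing that the match is case-sensitive:

```python
import pandas as pd

df = pd.DataFrame({"seller_type": ["Individual", "Dealer", "Trustmark Dealer", "individual"]})

# Rows whose value starts with 'Ind' (the lowercase 'individual' is not matched)
starts = df[df["seller_type"].str.startswith("Ind")]
print(starts)

# Rows whose value ends with 'Dealer'
ends = df[df["seller_type"].str.endswith("Dealer")]
print(ends)
```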


#### 12. replace(a,b):
<li>It replaces occurrences of the pattern a with b in each string of a Series.</li>
<li>If you want to remove all space characters, you can use the replace() method as:</li>
<code>
df[col_name].str.replace(" ", "")
</code>
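A minimal sketch of a common cleaning step: removing thousands separators from hypothetical price strings before converting them to integers. regex=False makes the pattern a literal string.

```python
import pandas as pd

prices = pd.Series(["1,20,000", "85,000", "2,50,000"])

# Remove the commas, then convert the cleaned strings to integers
clean = prices.str.replace(",", "", regex=False).astype("int64")
print(clean.tolist())  # [120000, 85000, 250000]
```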


#### 13. contains():
<li>contains() checks whether each string in a Series contains a particular substring (by default, the pattern is treated as a regular expression).</li>
<li>Like replace(), it searches each string for a pattern, but instead of modifying the string it just returns a Boolean value.</li>
<li>If the substring is present in a string, it returns True; otherwise it returns False.</li>
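A minimal sketch on hypothetical car names. case=False makes the match case-insensitive and na=False treats missing values as non-matches; pass regex=False if the substring contains regex special characters such as '.' or '('.

```python
import pandas as pd

names = pd.Series(["Maruti Swift Dzire VDI", "Hyundai EON Era Plus", "Maruti Alto LXi"])

is_maruti = names.str.contains("Maruti")
print(is_maruti.tolist())  # [True, False, True]

# Case-insensitive search, treating missing values as "no match"
has_alto = names.str.contains("alto", case=False, na=False)
print(has_alto.tolist())  # [False, False, True]
```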



#### Handling Missing Values
<li>We can use the fillna() method in pandas to fill missing values in different ways.</li>
<li>We can use the interpolate() method to estimate missing values from the neighbouring values.</li>
<li>We can use the dropna() method to drop rows with missing values.</li>
<li>We can also fill missing values with the mean, median or mode of a column, depending on the kind of values it holds.</li>
<li>For continuous values, the mean is a common choice; the median is more robust when the column is skewed or has outliers.</li>
<li>For categorical data, filling missing values with the mode (the most frequent value) is usually a good idea.</li>
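A minimal sketch of these options on a hypothetical DataFrame `df`, filling the continuous column with its mean and the categorical column with its mode:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns
df = pd.DataFrame({
    "temperature": [32.0, np.nan, -1.0, np.nan, 30.0],
    "event": ["Rain", "Sunny", np.nan, "Sunny", "Snow"],
})

# Drop every row that has at least one missing value
print(df.dropna())

# Fill a continuous column with its mean and a categorical column with its mode
filled = df.copy()
filled["temperature"] = filled["temperature"].fillna(filled["temperature"].mean())
filled["event"] = filled["event"].fillna(filled["event"].mode()[0])

# Linear interpolation estimates each gap from its neighbours
print(df["temperature"].interpolate().tolist())  # [32.0, 15.5, -1.0, 14.5, 30.0]
```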

#### fillna(method = 'ffill')

#### fillna(method = 'bfill')