# DataFrame in Pandas

## Definition
A **DataFrame** in Pandas is a two-dimensional labeled data structure that resembles a table in a database or a spreadsheet. It consists of rows and columns, where:
- Each column can have a different data type (e.g., integers, floats, strings).
- Rows and columns are labeled, providing easy data access and manipulation.

---

## Features of a DataFrame

1. **Two-Dimensional Structure**  
   - Rows represent individual observations.  
   - Columns represent variables or features.


2. **Heterogeneous Data**  
   - Each column can store data of different types.


3. **Labeling**  
   - Rows have an index, and columns have labels, making it easy to locate and manipulate data.


4. **Flexible Operations**  
   - Supports filtering, grouping, merging, reshaping, and many more operations.
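
The operations named in the last feature can be sketched on a small table. This is a minimal illustration, not a complete tour: the `sales` and `regions` tables and all of their values are invented for the example.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    'state': ['CA', 'NY', 'CA', 'TX'],
    'month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'units': [95, 70, 11, 4],
})
regions = pd.DataFrame({
    'state': ['CA', 'NY', 'TX'],
    'region': ['West', 'East', 'South'],
})

# Filtering: keep only the rows where units exceed 10.
big_orders = sales[sales['units'] > 10]

# Grouping: total units per state.
units_by_state = sales.groupby('state')['units'].sum()

# Merging: attach the region of each state (left join keeps every sales row).
with_region = sales.merge(regions, on='state', how='left')

# Reshaping: one row per state, one column per month.
wide = sales.pivot(index='state', columns='month', values='units')
```

Each call returns a new object and leaves `sales` unchanged. Combinations that do not occur in the data, such as NY in Feb, appear as `NaN` in `wide`.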

---

## Why Use a DataFrame?

### 1. Data Organization
- Structures complex data in a tabular format for easier analysis.

### 2. Flexible Indexing
- Rows and columns are easily accessible by labels or positions.

### 3. Powerful Tools
- Built-in functions for data cleaning, analysis, and visualization.

### 4. Integration
- Seamlessly integrates with libraries like NumPy, Matplotlib, and Scikit-learn.
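
As a small sketch of the NumPy side of this integration, the example below applies a NumPy function to a DataFrame and converts the DataFrame to an array. The `df_demo` table is made up for the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric table, invented for illustration only.
df_demo = pd.DataFrame({'x': [1.0, 4.0, 9.0], 'y': [16.0, 25.0, 36.0]})

# NumPy universal functions work element-wise and keep the row/column labels.
roots = np.sqrt(df_demo)

# to_numpy() returns the values as a plain NumPy array without labels.
values = df_demo.to_numpy()
```

Here `roots` is still a DataFrame with columns `x` and `y`, while `values` is a 3 x 2 `ndarray`.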

---

## Practical Applications of DataFrames

### 1. Data Cleaning
- Handle missing or inconsistent data.
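
A minimal sketch of handling missing data, assuming a small made-up table (`raw`) with one missing age. Filling with the column mean is only one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value, invented for illustration only.
raw = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'John'],
    'Age': [40, np.nan, 30],
})

# Count the missing values in each column.
missing_per_column = raw.isna().sum()

# Option 1: drop every row that contains a missing value.
dropped = raw.dropna()

# Option 2: fill the missing ages with the mean of the known ages.
filled = raw.fillna({'Age': raw['Age'].mean()})
```

Both `dropna()` and `fillna()` return new DataFrames here, so `raw` keeps its missing value.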

### 2. Data Analysis
- Perform calculations, summaries, and visualizations.

### 3. Data Transformation
- Reshape, filter, and merge datasets.

### 4. Preprocessing for Machine Learning
- Prepare structured data for training ML models.

---


In [1]:
import numpy as np 
import pandas as pd 

# Creating a DataFrame


## 1. Creating a DataFrame Using a NumPy Array

In [2]:
np.random.seed(101)
mydata = np.random.randint(0,101,size=(4,3))

In [3]:
mydata

array([[95, 11, 81],
       [70, 63, 87],
       [75,  9, 77],
       [40,  4, 63]])

In [4]:
my_index = ['CA', 'NY', 'AZ', 'TX']


In [5]:
mycols = ['Jan', 'Feb', 'Mar']

In [6]:
df = pd.DataFrame(mydata)
df

,0,1,2
0,95,11,81
1,70,63,87
2,75,9,77
3,40,4,63


In [7]:
df = pd.DataFrame(data=mydata, index=my_index)
df

,0,1,2
CA,95,11,81
NY,70,63,87
AZ,75,9,77
TX,40,4,63


In [8]:
df = pd.DataFrame(mydata, index=my_index, columns=mycols)
df

,Jan,Feb,Mar
CA,95,11,81
NY,70,63,87
AZ,75,9,77
TX,40,4,63


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, CA to TX
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Jan     4 non-null      int32
 1   Feb     4 non-null      int32
 2   Mar     4 non-null      int32
dtypes: int32(3)
memory usage: 80.0+ bytes
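
The `int32` dtype reported above is not fixed by pandas. It is inherited from the NumPy array, whose default integer width can differ by platform and NumPy version, so the same code may report `int64` elsewhere. A minimal sketch, assuming a consistent dtype is wanted, passes `dtype` explicitly. The values are copied from the array shown earlier so the example does not depend on the random seed, and the name `df_fixed` is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Values copied from the array output shown earlier.
mydata = np.array([[95, 11, 81],
                   [70, 63, 87],
                   [75, 9, 77],
                   [40, 4, 63]])

# Passing dtype makes the column type independent of the platform default.
df_fixed = pd.DataFrame(
    mydata,
    index=['CA', 'NY', 'AZ', 'TX'],
    columns=['Jan', 'Feb', 'Mar'],
    dtype='int64',
)
```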


## 2. Creating a DataFrame Using a Dictionary

In [10]:
my_dict = {'Name': ['Alice', 'Bob', 'John'], 
           'Age': [40, 20, 30], 
           'Salary': [1000, 100, 200]}

my_dict

{'Name': ['Alice', 'Bob', 'John'],
 'Age': [40, 20, 30],
 'Salary': [1000, 100, 200]}

- The dictionary **keys** become the **column labels** of the DataFrame.
- Each key's list of **values** becomes the data in that column, with one list element per row.
- All lists must have the same length; since no index is given, the rows get a default integer index (0, 1, 2, …).

In [11]:
df = pd.DataFrame(my_dict)
df

,Name,Age,Salary
0,Alice,40,1000
1,Bob,20,100
2,John,30,200
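
The dictionary above is column-oriented. As a complementary sketch, the same table can also be built from row-oriented records, that is, a list with one dictionary per row. The names `records`, `df_records`, and `df_columns` are introduced only for this example.

```python
import pandas as pd

# One dictionary per row; the keys still become the column labels.
records = [
    {'Name': 'Alice', 'Age': 40, 'Salary': 1000},
    {'Name': 'Bob', 'Age': 20, 'Salary': 100},
    {'Name': 'John', 'Age': 30, 'Salary': 200},
]
df_records = pd.DataFrame(records)

# The column-oriented dictionary from the cell above, for comparison.
df_columns = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'John'],
    'Age': [40, 20, 30],
    'Salary': [1000, 100, 200],
})

# Both constructions produce the same table.
same_table = df_records.equals(df_columns)
```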


## 3. Creating a DataFrame Using a List

In [12]:
data = [['Alice', 20, 1000], ['Bob', 30, 200], ['John', 25, 2500]]
data

[['Alice', 20, 1000], ['Bob', 30, 200], ['John', 25, 2500]]

In [13]:
my_data = pd.DataFrame(data)
my_data

,0,1,2
0,Alice,20,1000
1,Bob,30,200
2,John,25,2500


In [14]:
my_data = pd.DataFrame(data, columns=['Name', 'Age', 'Salary'])
my_data

,Name,Age,Salary
0,Alice,20,1000
1,Bob,30,200
2,John,25,2500


# Reading Data from a CSV File

In [15]:
df = pd.read_csv('tips.csv')
df

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17
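
`pd.read_csv` also accepts options that control which columns are loaded and how they are typed. The sketch below is self-contained: `io.StringIO` stands in for `tips.csv`, and the three data rows are copied from the output above. Reading `CC Number` as text is an assumption about what is wanted, on the grounds that a card number is an identifier rather than a quantity.

```python
import io
import pandas as pd

# Three rows copied from the output above; StringIO stands in for the file on disk.
csv_text = """total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    # Load only the columns that are needed.
    usecols=['total_bill', 'tip', 'size', 'CC Number', 'Payment ID'],
    # Keep card numbers as text so they are not treated as integers.
    dtype={'CC Number': str},
)
```

With `CC Number` stored as text, numeric summaries such as `describe()` no longer compute a mean or standard deviation for it.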


`df.head()`
- Displays the first 5 rows of the DataFrame by default.
- Useful for quickly previewing the data at the beginning of the dataset.

In [16]:
df.head()

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


`df.tail()`
- Displays the last 5 rows of the DataFrame by default.
- Useful for quickly previewing the data at the end of the dataset.

In [17]:
df.tail()

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
240,27.18,2.0,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
241,22.67,2.0,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17
243,18.78,3.0,Female,No,Thur,Dinner,2,9.39,Michelle Hardin,3511451626698139,Thur672
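
Both `head()` and `tail()` take an optional row count, so the default of 5 can be changed. A minimal sketch on a made-up ten-row table (`numbers` is a name introduced for this example):

```python
import pandas as pd

# Hypothetical ten-row table with values 0 through 9.
numbers = pd.DataFrame({'value': range(10)})

first_three = numbers.head(3)        # first 3 rows
last_two = numbers.tail(2)           # last 2 rows
all_but_last_two = numbers.head(-2)  # a negative count skips rows from the end
```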


`df.columns`
- Returns the column labels of the DataFrame as an Index object.
- Useful for checking the names of all the columns in the DataFrame.

In [18]:
df.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size',
       'price_per_person', 'Payer Name', 'CC Number', 'Payment ID'],
      dtype='object')

`df.index`
- Returns the row labels (index) of the DataFrame as an Index object.
- Useful for understanding or manipulating the row indexing.


In [19]:
df.index

RangeIndex(start=0, stop=244, step=1)

`df.info()`
- Provides a concise summary of the DataFrame.
- Includes details like the number of rows, columns, data types, and memory usage.
- Useful for understanding the structure and composition of the dataset.

In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   total_bill        244 non-null    float64
 1   tip               244 non-null    float64
 2   sex               244 non-null    object 
 3   smoker            244 non-null    object 
 4   day               244 non-null    object 
 5   time              244 non-null    object 
 6   size              244 non-null    int64  
 7   price_per_person  244 non-null    float64
 8   Payer Name        244 non-null    object 
 9   CC Number         244 non-null    int64  
 10  Payment ID        244 non-null    object 
dtypes: float64(3), int64(2), object(6)
memory usage: 21.1+ KB


`df.describe()`
- Provides summary statistics for the numerical columns of the DataFrame (the default behavior).
- Includes the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.

In [21]:
df.describe()

,total_bill,tip,size,price_per_person,CC Number
count,244.0,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197,2563496000000000.0
std,8.902412,1.383638,0.9511,2.914234,2369340000000000.0
min,3.07,1.0,1.0,2.88,60406790000.0
25%,13.3475,2.0,2.0,5.8,30407310000000.0
50%,17.795,2.9,2.0,7.255,3525318000000000.0
75%,24.1275,3.5625,3.0,9.39,4553675000000000.0
max,50.81,10.0,6.0,20.27,6596454000000000.0


In [22]:
df.describe().transpose()

,count,mean,std,min,25%,50%,75%,max
total_bill,244.0,19.78594,8.902412,3.07,13.3475,17.795,24.1275,50.81
tip,244.0,2.998279,1.383638,1.0,2.0,2.9,3.5625,10.0
size,244.0,2.569672,0.9510998,1.0,2.0,2.0,3.0,6.0
price_per_person,244.0,7.888197,2.914234,2.88,5.8,7.255,9.39,20.27
CC Number,244.0,2563496000000000.0,2369340000000000.0,60406790000.0,30407310000000.0,3525318000000000.0,4553675000000000.0,6596454000000000.0
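
`describe()` can also summarize non-numeric columns and report other percentiles. The sketch below uses a small made-up table (`tips_sample`) rather than the full dataset, so the numbers do not match the output above.

```python
import pandas as pd

# Hypothetical four-row sample, invented for illustration only.
tips_sample = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68],
    'sex': ['Female', 'Male', 'Male', 'Male'],
})

# Default: numeric columns only.
numeric_summary = tips_sample.describe()

# include='all' adds count/unique/top/freq for the non-numeric columns.
summary_all = tips_sample.describe(include='all')

# percentiles= requests specific percentiles (given as fractions).
custom = tips_sample['total_bill'].describe(percentiles=[0.1, 0.9])
```

In `summary_all`, statistics that do not apply to a column, such as the mean of `sex`, are shown as `NaN`.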


In [23]:
cols = ['total_bill', 'tip', 'size', 'price_per_person']  # names of the columns to summarize
selected_df = df[cols]
selected_df.describe()

,total_bill,tip,size,price_per_person
count,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197
std,8.902412,1.383638,0.9511,2.914234
min,3.07,1.0,1.0,2.88
25%,13.3475,2.0,2.0,5.8
50%,17.795,2.9,2.0,7.255
75%,24.1275,3.5625,3.0,9.39
max,50.81,10.0,6.0,20.27


In [24]:
df[['total_bill','tip', 'size', 'price_per_person']].describe()

,total_bill,tip,size,price_per_person
count,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197
std,8.902412,1.383638,0.9511,2.914234
min,3.07,1.0,1.0,2.88
25%,13.3475,2.0,2.0,5.8
50%,17.795,2.9,2.0,7.255
75%,24.1275,3.5625,3.0,9.39
max,50.81,10.0,6.0,20.27


# Working With Columns

### 1.1 Adding a new column 

In [25]:
# Add a new column to the existing table
df['tip_percent'] = 100 * df['tip'] / df['total_bill']

In [26]:
df.head(3)

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percent
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,16.658734


In [27]:
# Add another new column to the existing table
df['price_per_each_person'] = (df['total_bill'] / df['size']).round(2)

In [28]:
df.head(2)

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percent,price_per_each_person
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673,8.49
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159,3.45
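
Assigning to `df['new_column']` modifies the DataFrame directly. As an alternative sketch, `assign()` returns a new DataFrame with the extra columns and leaves the original untouched. The `bills` table below holds the first two rows of the output above, and the names `bills` and `with_extras` are introduced only for this example.

```python
import pandas as pd

# First two rows of the relevant columns, copied from the output above.
bills = pd.DataFrame({
    'total_bill': [16.99, 10.34],
    'tip': [1.01, 1.66],
    'size': [2, 3],
})

# assign() adds the columns to a copy; bills itself is not modified.
with_extras = bills.assign(
    tip_percent=100 * bills['tip'] / bills['total_bill'],
    price_per_each_person=(bills['total_bill'] / bills['size']).round(2),
)
```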


### 1.2 Dropping existing columns from the table

In [29]:
df.drop(['tip_percent', 'price_per_each_person'], axis=1)  # Returns a new DataFrame; df is unchanged because inplace defaults to False

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17


In [30]:
df

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percent,price_per_each_person
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673,8.49
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159,3.45
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458,16.658734,7.00
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041,11.84
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765,6.15
...,...,...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657,20.392697,9.68
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766,7.358352,13.59
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880,8.822232,11.34
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17,9.820426,8.91


In [31]:
df.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size',
       'price_per_person', 'Payer Name', 'CC Number', 'Payment ID',
       'tip_percent', 'price_per_each_person'],
      dtype='object')

In [32]:
df.drop(['tip_percent', 'price_per_each_person'], axis=1, inplace=True)

In [33]:
df

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17
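
Instead of `axis=1` with `inplace=True`, the same result can be sketched with the `columns=` keyword and a reassignment. Which style to use is a matter of preference. The one-row `table` below is made up for the illustration.

```python
import pandas as pd

# Hypothetical one-row table, invented for illustration only.
table = pd.DataFrame({
    'total_bill': [16.99],
    'tip': [1.01],
    'tip_percent': [5.94],
})

# columns= names the axis explicitly, so axis=1 is not needed.
trimmed = table.drop(columns=['tip_percent'])
```

Here `table` still contains `tip_percent`; writing `table = table.drop(columns=['tip_percent'])` would replace it with the trimmed version.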


# Working With Rows

In [34]:
df.head(5)

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [35]:
df.index

RangeIndex(start=0, stop=244, step=1)

# Setting a Column as the Index
---

## `set_index()` in Pandas

The `set_index()` method in Pandas is used to set one or more columns of a DataFrame as its index.

`inplace`: **Default: `False`.** If `True`, the operation modifies the original DataFrame (and returns `None`) instead of creating a new one.

---

In [38]:
df.set_index('Payment ID', inplace=True)


In the table below, `Payment ID` is no longer a regular column; it is now the index.

In [39]:
df

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230
Sun4458,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221
...,...,...,...,...,...,...,...,...,...,...
Sat2657,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842
Sat1766,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404
Sat3880,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196
Sat17,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950
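
By default `set_index()` removes the chosen column from the regular columns. A minimal sketch, assuming the column should stay available as well, passes `drop=False`. The three-row `payments` table uses IDs and bills copied from the output above.

```python
import pandas as pd

# Three payment IDs and bills copied from the output above.
payments = pd.DataFrame({
    'Payment ID': ['Sun2959', 'Sun4608', 'Sun4458'],
    'total_bill': [16.99, 10.34, 21.01],
})

# Default: 'Payment ID' moves from the columns into the index.
indexed = payments.set_index('Payment ID')

# drop=False keeps 'Payment ID' as a column in addition to the index.
kept = payments.set_index('Payment ID', drop=False)

# Label-based lookups work best when the index labels are unique.
labels_are_unique = indexed.index.is_unique
```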


# Setting the index as a column

## `reset_index()` in Pandas

The `reset_index()` method in Pandas is used to reset the index of a DataFrame, converting the existing index into a column.

---

## **Key Features**
- The method **does not require any arguments** to work.
- It **converts the current index into a column** of the DataFrame.
- By default, it creates a new default index (0, 1, 2, …).

## `inplace`
- Default: `False`. If `True`, modifies the original DataFrame instead of returning a new one.

---

In [42]:
df.reset_index(inplace=True)

In [43]:
df.head()

,Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number
0,Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410
1,Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230
2,Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322
3,Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994
4,Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221
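
`reset_index()` also has a `drop` parameter. The sketch below contrasts the default, which turns the index back into a column, with `drop=True`, which discards the old index entirely. The small `indexed` table is built here so the example runs on its own.

```python
import pandas as pd

# Small table indexed by payment ID, using values from the output above.
indexed = pd.DataFrame(
    {'total_bill': [16.99, 10.34, 21.01]},
    index=pd.Index(['Sun2959', 'Sun4608', 'Sun4458'], name='Payment ID'),
)

# Default: the old index becomes a regular column again.
restored = indexed.reset_index()

# drop=True: the old index is thrown away, not kept as a column.
discarded = indexed.reset_index(drop=True)
```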


In [45]:
df = df.set_index('Payment ID')

In [46]:
df.head()

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221


# Comparison of `iloc` vs `loc` in Pandas

| Feature              | `iloc`                                              | `loc`                                         |
|-----------------------|-----------------------------------------------------|-----------------------------------------------|
| **Purpose**          | Selects data by **integer-based positions**.         | Selects data by **label-based indexing**.     |
| **Syntax**           | `DataFrame.iloc[row_index, col_index]`               | `DataFrame.loc[row_label, col_label]`         |
| **Input Type**       | Accepts integer indices (0, 1, 2, ...).              | Accepts labels (row/column names).            |
| **Slicing**          | Slicing is exclusive for the endpoint.               | Slicing is inclusive for the endpoint.        |
| **Use Case**         | When working with **numeric positions** in rows/cols.| When working with **explicit labels**.        |
| **Error Handling**   | Raises `IndexError` if an integer position is out of bounds. | Raises `KeyError` if the label is not found. |
| **Example**          |                                                     |                                               |
| **Single Row**       | `df.iloc[2]` (selects 3rd row by position)           | `df.loc['row3']` (selects row with label `row3`) |
| **Single Cell**      | `df.iloc[2, 1]` (3rd row, 2nd column by position)    | `df.loc['row3', 'col2']` (row `row3`, col `col2` by label) |
| **Row Slice**        | `df.iloc[0:3]` (first 3 rows, exclusive)             | `df.loc['row1':'row3']` (rows `row1` to `row3`, inclusive) |

---

In [47]:
df.iloc[0]  # Select the first row by its integer position

total_bill                       16.99
tip                               1.01
sex                             Female
smoker                              No
day                                Sun
time                            Dinner
size                                 2
price_per_person                  8.49
Payer Name          Christy Cunningham
CC Number             3560325168603410
Name: Sun2959, dtype: object

In [48]:
df.loc['Sun2959']  # Select the row by its index label

total_bill                       16.99
tip                               1.01
sex                             Female
smoker                              No
day                                Sun
time                            Dinner
size                                 2
price_per_person                  8.49
Payer Name          Christy Cunningham
CC Number             3560325168603410
Name: Sun2959, dtype: object

In [51]:
# Selecting multiple rows with loc requires passing the index labels as a list
df.loc[['Sun2959', 'Sun4608', 'Sun4458']]

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322


### Selecting multiple rows with `iloc` using slicing syntax

In [49]:
df.iloc[0:4]  # Select the first 4 rows (positions 0 to 3; the end position 4 is excluded)

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994


# Dropping Existing Rows

## `drop()` in Pandas

The `drop()` method in Pandas is used to remove rows or columns from a DataFrame based on labels.

## **Key Points**
- Use `axis=0` to drop rows and `axis=1` to drop columns.
- Specify `inplace=True` to directly modify the DataFrame.
- Use `errors='ignore'` to avoid errors when labels are not found.
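
The key points above can be sketched on a small table. The `orders` DataFrame reuses three payment IDs from the dataset, while `'NoSuchID'` is a made-up label that is deliberately absent.

```python
import pandas as pd

# Three rows indexed by payment ID, using values from the dataset.
orders = pd.DataFrame(
    {'total_bill': [16.99, 10.34, 21.01]},
    index=['Sun2959', 'Sun4608', 'Sun4458'],
)

# axis=0 drops rows by index label and returns a new DataFrame.
without_first = orders.drop(['Sun2959'], axis=0)

# errors='ignore' skips labels that do not exist instead of raising KeyError.
tolerant = orders.drop(['Sun2959', 'NoSuchID'], axis=0, errors='ignore')
```

Because `inplace` is left at its default, `orders` still has all three rows after both calls.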

---

In [53]:
df.drop(['Sun2959', 'Sun4608', 'Sun4458'], axis=0)

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221
Sun9679,25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,213140353657882
Sun5985,8.77,2.00,Male,No,Sun,Dinner,2,4.38,Kristopher Johnson,2223727524230344
Sun8157,26.88,3.12,Male,No,Sun,Dinner,4,6.72,Robert Buck,3514785077705092
...,...,...,...,...,...,...,...,...,...,...
Sat2657,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842
Sat1766,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404
Sat3880,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196
Sat17,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950


In [54]:
df.head()

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221
