### 🧠 What is pandas?
pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools.

It is mainly built around two key data structures:
- Series – 1D labeled array (like a list with an index)

- DataFrame – 2D labeled table (like an Excel spreadsheet or SQL table)

### Creating a Series

Start from a plain Python list, [10, 20, 30]:

In [1]:
import pandas as pd

data = [10, 20, 30]
data = pd.Series(data)
print(data)

0    10
1    20
2    30
dtype: int64


You can also create a Series with custom labels:


In [4]:
data = [10,20,30]
data = pd.Series(data, index = ["A", "B", "C"])

In [5]:
data

A    10
B    20
C    30
dtype: int64
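
Custom labels matter mostly because they change how values are looked up. The following is a minimal sketch (the variable names `labeled`, `by_label`, `by_position`, and `subset` are illustrative, not from the notebook) contrasting label-based access with position-based access:

```python
import pandas as pd

labeled = pd.Series([10, 20, 30], index=["A", "B", "C"])

by_label = labeled.loc["B"]        # look up by index label
by_position = labeled.iloc[0]      # look up by integer position
subset = labeled.loc[["A", "C"]]   # a list of labels returns a Series

print(by_label, by_position)
print(subset)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity about whether a key is meant as a label or a position.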

### Create a Pandas Series using the list [10, 20, 30, 40, 50] and print it.

In [6]:
data_1 = [10,20,30,40,50]

data_1 = pd.Series(data_1)

In [7]:
data_1

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [10]:
data_2 = [100,200,300]

data_2 = pd.Series(data_2, index = ["A", "B", "C"])
print (data_2)

A    100
B    200
C    300
dtype: int64


In [11]:
dict_1 = {'apple': 3, 'banana': 5, 'cherry': 2}

data = pd.Series(dict_1)

In [12]:
data

apple     3
banana    5
cherry    2
dtype: int64

### Given a Series of numbers from 1 to 5, square each element using .apply() or vectorized operations.


In [17]:
data = [1,2,3,4,5]

def square_1(number):
    return number**2

data = pd.Series(data)

In [18]:
data.apply(square_1)

0     1
1     4
2     9
3    16
4    25
dtype: int64
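
The exercise also mentions vectorized operations. As a sketch of that alternative (the names `numbers`, `squared`, and `same_as_apply` are illustrative), arithmetic operators apply element-wise to a Series, so no helper function is needed:

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5])

# The ** operator is applied to every element at once.
squared = numbers ** 2

# Check that the result matches the .apply() approach.
same_as_apply = squared.equals(numbers.apply(lambda n: n ** 2))

print(squared)
print(same_as_apply)
```

For simple arithmetic the vectorized form is usually both shorter and faster than `.apply()`.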

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv(r"C:\Users\kanha\Desktop\Python_work\Python_practice_d\tips.csv")

In [3]:
data.head()

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [4]:
## Show all rows where the bill was greater than ₹20.

data[data["total_bill"]>20].shape



(97, 11)

In [5]:
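### Filter all dinners where the tip was more than ₹3 and the party size was 2.
### Note: the filter below does not restrict time to 'Dinner'; add (data['time'] == 'Dinner') to do so.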

data[(data['tip']>3) & (data['size']==2)].shape

(44, 11)
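
The exercise wording mentions dinners, which calls for a third condition on the `time` column. A minimal sketch on a small made-up frame (`sample` and its values are hypothetical, not taken from tips.csv) shows how the three conditions could be combined:

```python
import pandas as pd

# Hypothetical rows, only for illustrating the filter.
sample = pd.DataFrame({
    "tip": [3.5, 4.0, 2.0, 5.0],
    "size": [2, 2, 2, 3],
    "time": ["Dinner", "Lunch", "Dinner", "Dinner"],
})

# Each condition needs its own parentheses because & binds tighter than > and ==.
mask = (sample["tip"] > 3) & (sample["size"] == 2) & (sample["time"] == "Dinner")
dinners = sample[mask]

print(dinners)
```

Only the first row satisfies all three conditions here; the same mask applied to the real data would give a count that may differ from the one shown above.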

In [6]:
### 3. Get all rows where the payer is female and not a smoker
data[(data["sex"] =="Female") & (data["smoker"] =="No") ].shape

(54, 11)

In [7]:
### Create a new column tip_percent = (tip / total_bill) * 100.

data["tip_percent"] = round((data['tip']/ data["total_bill"])*100,4)


In [8]:
data['tip_percent']

0       5.9447
1      16.0542
2      16.6587
3      13.9780
4      14.6808
        ...   
239    20.3927
240     7.3584
241     8.8222
242     9.8204
243    15.9744
Name: tip_percent, Length: 244, dtype: float64

In [9]:
## Round the price_per_person column to 2 decimal places
data["price_per_person"]= data["price_per_person"].round(2)

In [10]:
data['price_per_person']

0       8.49
1       3.45
2       7.00
3      11.84
4       6.15
       ...  
239     9.68
240    13.59
241    11.34
242     8.91
243     9.39
Name: price_per_person, Length: 244, dtype: float64

In [11]:
##  Extract the first name of each Payer Name into a new column.
sent = "Kashish Vaid"
sent.split()[0]


'Kashish'

In [12]:
data['first_name'] = data["Payer Name"].apply(lambda x : x.split()[0])

In [13]:
data['first_name']

0        Christy
1        Douglas
2         Travis
3      Nathaniel
4          Tonya
         ...    
239      Michael
240       Monica
241        Keith
242       Dennis
243     Michelle
Name: first_name, Length: 244, dtype: object
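
The same extraction can be written with the `.str` accessor instead of `.apply()`. This is a sketch on a short illustrative Series (`names` and `first` are assumed names; the `None` entry is added to show missing-value handling):

```python
import pandas as pd

names = pd.Series(["Christy Cunningham", "Douglas Tucker", None])

# .str.split() splits on whitespace; .str[0] takes the first piece of each row.
first = names.str.split().str[0]

print(first)
```

Unlike a lambda that calls `x.split()`, the `.str` methods pass missing values through as missing rather than raising an error.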

In [14]:
### 7. What is the average tip given by males vs. females?

data.groupby("sex")["tip"].mean()

sex
Female    2.833448
Male      3.089618
Name: tip, dtype: float64
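
A group mean is easier to interpret next to the group size. As a sketch with made-up numbers (`sample` and `summary` are hypothetical), `.agg()` can compute several statistics per group in one call:

```python
import pandas as pd

# Hypothetical values, only for illustrating the aggregation.
sample = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Male"],
    "tip": [1.0, 2.0, 3.0, 4.0],
})

# One row per group, one column per statistic.
summary = sample.groupby("sex")["tip"].agg(["mean", "count"])

print(summary)
```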

In [15]:
data.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size',
       'price_per_person', 'Payer Name', 'CC Number', 'Payment ID',
       'tip_percent', 'first_name'],
      dtype='object')

In [18]:
### Group by day and get the total bill collected on each day.
data.groupby("day")["total_bill"].sum().sort_values(ascending= False)

day
Sat     1778.40
Sun     1627.16
Thur    1096.33
Fri      325.88
Name: total_bill, dtype: float64

In [19]:
###  Group by time and calculate average price_per_person.

data.groupby("time")["price_per_person"].mean()

time
Dinner    8.109205
Lunch     7.316176
Name: price_per_person, dtype: float64

In [25]:
## Which day has the highest total tip amount?
data.groupby("day")["tip"].sum().sort_values(ascending= False)

day
Sat     260.40
Sun     247.39
Thur    171.83
Fri      51.96
Name: tip, dtype: float64
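
Sorting shows the full ranking; when only the top label is needed, `idxmax()` returns it directly. A minimal sketch on made-up data (`sample`, `totals`, and `best_day` are illustrative names):

```python
import pandas as pd

# Hypothetical values, only for illustrating idxmax().
sample = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Fri"],
    "tip": [3.0, 4.0, 5.0, 1.0],
})

totals = sample.groupby("day")["tip"].sum()
best_day = totals.idxmax()   # index label of the largest total

print(best_day)
```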

In [28]:
## 🔹 Sorting & Indexing
## Sort the dataframe by tip in descending order

data.sort_values(by="tip", ascending= False)

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percent,first_name
170,50.81,10.00,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,19.6812,Gregory
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,18.6220,Alex
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239,19.2288,Lance
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,13.9424,Brian
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025,19.5335,Steven
...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.9447,Christy
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,Sat5032,7.9365,Matthew
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801,13.7931,Terri
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455,32.5733,Tiffany


In [29]:
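## Sort by tip, then by total_bill, in ascending order.
## (The exercise asks for day, then total_bill; that would be data.sort_values(['day', 'total_bill']). The output below reflects the tip-based sort.)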

data.sort_values(['tip', 'total_bill'], ascending= True)

,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percent,first_name
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455,32.5733,Tiffany
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780,17.3913,Leah
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801,13.7931,Terri
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,Sat5032,7.9365,Matthew
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.9447,Christy
...,...,...,...,...,...,...,...,...,...,...,...,...,...
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025,19.5335,Steven
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,13.9424,Brian
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239,19.2288,Lance
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,18.6220,Alex
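
For a sort keyed on day first and total_bill second, a minimal sketch on a small made-up frame (`sample` and `ordered` are hypothetical names and values) could look like this:

```python
import pandas as pd

# Hypothetical values, only for illustrating a two-key sort.
sample = pd.DataFrame({
    "day": ["Sun", "Fri", "Sun", "Fri"],
    "total_bill": [20.0, 15.0, 10.0, 5.0],
})

# Rows are ordered by day first; ties within a day are ordered by total_bill.
ordered = sample.sort_values(["day", "total_bill"], ascending=[True, True])

print(ordered)
```

Note that string day names sort alphabetically (Fri, Sat, Sun, Thur), not in calendar order.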


In [31]:
## Set Payment ID as the index of the dataframe.

data.set_index("Payment ID", inplace= True)

In [32]:
data.head()

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percent,first_name
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,5.9447,Christy
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,16.0542,Douglas
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,16.6587,Travis
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,13.978,Nathaniel
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,14.6808,Tonya
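
Once a column is the index, rows can be fetched by its values with `.loc`. The sketch below uses two Payment IDs and tips from the output above in a small standalone frame (`sample`, `indexed`, and `restored` are illustrative names) and uses the non-inplace form of `set_index`:

```python
import pandas as pd

sample = pd.DataFrame({
    "Payment ID": ["Sun2959", "Sun4608"],
    "tip": [1.01, 1.66],
})

# set_index returns a new frame; the original is left unchanged.
indexed = sample.set_index("Payment ID")
tip = indexed.loc["Sun2959", "tip"]

# reset_index moves the index back into an ordinary column.
restored = indexed.reset_index()

print(tip)
print(restored.columns.tolist())
```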


In [36]:
## How many unique credit card numbers are there?

data["CC Number"].nunique()

244

In [37]:
## Count how many people dined for lunch vs. dinner.
data['time'].value_counts()

time
Dinner    176
Lunch      68
Name: count, dtype: int64

In [38]:
data.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size',
       'price_per_person', 'Payer Name', 'CC Number', 'tip_percent',
       'first_name'],
      dtype='object')

In [39]:
## Who paid the highest tip per person? Show their name, tip, size, and tip per person.
##(Hint: Create a tip_per_person column and sort)

data['tip_per_person'] = data['tip']/data['size']


In [44]:
data.sort_values("tip_per_person", ascending= False)[["Payer Name","tip", 'size', "tip_per_person"]].head(1)

Payment ID,Payer Name,tip,size,tip_per_person
Sat1954,Gregory Clark,10.0,3,3.333333


In [45]:
## For each day, what is the average tip_percent given by smokers vs non-smokers?
## (Hint: Use groupby on both day and smoker)

data.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size',
       'price_per_person', 'Payer Name', 'CC Number', 'tip_percent',
       'first_name', 'tip_per_person'],
      dtype='object')

In [48]:
data['tip_percent'] = (data['tip']/data['total_bill'])*100

data.groupby(["smoker", "day"])["tip_percent"].mean()

smoker  day 
No      Fri     15.165044
        Sat     15.804766
        Sun     16.011294
        Thur    16.029808
Yes     Fri     17.478305
        Sat     14.790607
        Sun     18.725032
        Thur    16.386327
Name: tip_percent, dtype: float64
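
A two-level groupby result can be easier to compare side by side after `unstack()`, which moves one index level into columns. A sketch with made-up numbers (`sample` and `table` are hypothetical):

```python
import pandas as pd

# Hypothetical values, only for illustrating unstack().
sample = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "Yes"],
    "tip_percent": [10.0, 20.0, 30.0, 40.0],
})

# Rows are days, columns are smoker status.
table = sample.groupby(["day", "smoker"])["tip_percent"].mean().unstack("smoker")

print(table)
```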

In [58]:
##  Find the top 3 payers who spent the most after including tip (i.e., total_bill + tip).
## (Hint: Add new column total_spent and sort)

data['total_spent'] = data["total_bill"] + data["tip"]

data[["Payer Name", "total_spent"]].sort_values(by ="total_spent", ascending= False).head(3)

Payment ID,Payer Name,total_spent
Sat1954,Gregory Clark,60.81
Sat4590,Alex Williamson,57.33
Sat8139,Brian Ortiz,55.0
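
As an alternative to sorting and taking `head(3)`, `nlargest()` expresses "top N by column" in one call. A minimal sketch on made-up data (`sample`, `top`, and the payer names are hypothetical):

```python
import pandas as pd

# Hypothetical values, only for illustrating nlargest().
sample = pd.DataFrame({
    "Payer Name": ["A", "B", "C", "D"],
    "total_bill": [10.0, 50.0, 30.0, 20.0],
    "tip": [1.0, 5.0, 3.0, 2.0],
})
sample["total_spent"] = sample["total_bill"] + sample["tip"]

# Keep the three rows with the largest total_spent, largest first.
top = sample.nlargest(3, "total_spent")[["Payer Name", "total_spent"]]

print(top)
```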


In [63]:
## 4. How many males and females paid more than ₹6 per person on average?
## (Hint: Use price_per_person and sex columns)

filtered_data = data[data['price_per_person']>6]

filtered_data['sex'].value_counts()

sex
Male      116
Female     61
Name: count, dtype: int64

In [70]:
#5. For each day, find the number of unique customers who paid with a Visa card.
#(Hint: Visa starts with '4', use string filtering on CC Number)

data['CC Number'] = data['CC Number'].astype(str)
visa_data = data[data['CC Number'].str.startswith("4")]

visa_data.groupby("day")["Payer Name"].nunique()

day
Fri      2
Sat     21
Sun     25
Thur    22
Name: Payer Name, dtype: int64

In [78]:
## Which day had the highest average tip per person?

data.groupby("day")["tip_per_person"].mean().idxmax()

'Fri'

In [80]:
###  Create a new column meal_label that labels rows as 'Light' if total_bill < 15, 'Medium' if between 15–25, and 'Heavy' if more than 25.
import numpy as np

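conditions = [
    data['total_bill'] < 15,
    (data['total_bill'] >= 15) & (data['total_bill'] <= 25),
    data['total_bill'] > 25,
]

choices = ['Light', 'Medium', 'Heavy']

# np.select's built-in default is 0, which recent NumPy versions may refuse to
# mix with string choices; an explicit string default avoids that.
data['meal_label'] = np.select(conditions, choices, default='Unknown')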
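
Binning a numeric column into labels can also be sketched with `pd.cut`. This is an alternative to the approach above, not a drop-in replacement: with `right=False` every bin is closed on the left, so a bill of exactly 25 lands in 'Heavy' rather than 'Medium'. The Series `bills` and its values are illustrative.

```python
import numpy as np
import pandas as pd

bills = pd.Series([3.07, 15.0, 25.0, 50.81])

# Bins are [-inf, 15), [15, 25), [25, inf) because right=False.
labels = pd.cut(
    bills,
    bins=[-np.inf, 15, 25, np.inf],
    labels=["Light", "Medium", "Heavy"],
    right=False,
)

print(labels)
```

The result is a categorical Series, which keeps the Light < Medium < Heavy ordering available for later sorting or grouping.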

In [82]:
### What’s the average price_per_person for each combination of day and time (i.e., Lunch/Dinner)?
## (Hint: Use multi-index groupby)

data.groupby(['day', 'time'])['price_per_person'].mean()




day   time  
Fri   Dinner    8.995000
      Lunch     6.655714
Sat   Dinner    8.186782
Sun   Dinner    7.863684
Thur  Dinner    9.390000
      Lunch     7.391967
Name: price_per_person, dtype: float64
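
The same day-by-time averages can be laid out as a grid with `pivot_table`. A sketch with made-up numbers (`sample` and `table` are hypothetical) also shows what happens to combinations that never occur, such as a Saturday lunch:

```python
import pandas as pd

# Hypothetical values, only for illustrating pivot_table().
sample = pd.DataFrame({
    "day": ["Fri", "Fri", "Fri", "Sat"],
    "time": ["Lunch", "Dinner", "Dinner", "Dinner"],
    "price_per_person": [6.0, 8.0, 10.0, 7.0],
})

# Rows are days, columns are meal times, cells are group means.
table = sample.pivot_table(
    index="day", columns="time", values="price_per_person", aggfunc="mean"
)

print(table)
```

Combinations with no rows appear as NaN in the grid, whereas the groupby output simply omits them.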