# Data Analysis with Pandas: Problems 01

This document contains a series of data analysis problems to be solved using the Pandas library in Python. The problems are based on a real-world use case involving daily operational data. A dataset has been provided for this assignment.

## The Use Case: Analyzing Daily Operations

A business owner, Ms. Kavita, wants to analyze her daily sales data to better understand customer behavior and product performance. The data is available in a CSV file named sales_data.csv. Your task is to use Pandas to answer her questions. Each record in the dataset describes one order, with details such as customer name, product, quantity, unit price, and order date.

## Instructions

For each problem, write and execute the Python code using Pandas. The problems are designed to be solved sequentially. You can load the data once and use the same DataFrame for all questions.

### Problem 1: Data Loading and Initial Inspection

Your first step is to load the provided CSV file into a Pandas DataFrame and perform an initial check to ensure the data is loaded correctly. This is a crucial first step in any data analysis workflow.

Write Python code to:

* Load the sales_data.csv file into a DataFrame.
* Display the first 5 rows to get a quick look at the data structure.
* Display the last 5 rows to see the end of the data.
* Print a concise summary of the DataFrame, including the data types of each column and the number of non-null values.
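
The inspection steps listed above can be tried on a tiny stand-in before touching the real file. The sketch below is illustrative only: `csv_text` and `df_demo` are hypothetical names, and the three rows are made-up values that merely borrow some column names from the assignment.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales_data.csv; the rows are invented for illustration.
csv_text = """order_id,customer_name,product_name,quantity,unit_price_inr,order_date,total_price_inr
1,Asha,Chocolate Chip,2,100.0,2025-07-01,200
2,Ravi,Vanilla Dream,1,150.0,2025-07-02,150
3,Asha,Mango Medley,3,120.0,2025-07-03,360
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head(2))  # first two rows
print(df_demo.tail(2))  # last two rows

# info() prints column dtypes and non-null counts itself and returns None,
# so it does not need to be wrapped in print().
df_demo.info()
```

Without `parse_dates`, `read_csv` leaves `order_date` as plain strings, which is why a later conversion with `pd.to_datetime` is needed before any date-based analysis.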

In [109]:
import os
os.getcwd()

'C:\\Users\\home\\Desktop\\FSD\\Assignments\\Fahimida Begum - 23_16_Aug_2025_Data_Analysis_with_Pandas_assignment_01'

In [110]:
os.listdir()

['.ipynb_checkpoints',
 'Data_Analysis_with_Pandas_Problems_01.ipynb',
 'Data_Analysis_with_Pandas_Problems_01.pdf',
 'sales_data.csv',
 'week_of_sales.csv']

In [3]:
import pandas as pd
# Load the sales_data.csv file into a DataFrame.
df_sales=pd.read_csv('sales_data.csv')
# Display the first 5 rows to get a quick look at the data structure.
print("Display the first 5 rows:")
print(df_sales.head())
# Display the last 5 rows to see the end of the data.
print("Display the last 5 rows:")
print(df_sales.tail())
# Print a concise summary of the DataFrame.
print("Concise summary of the DataFrame")
# info() prints the summary itself and returns None, so it is not wrapped in print().
df_sales.info()

Display the first 5 rows:
   order_id customer_name product_id       product_name  quantity  \
0       101         Aarav    PROD004  Pistachio Delight         2   
1       102          Siya    PROD004   Strawberry Swirl         3   
2       103         Kiran    PROD004   Strawberry Swirl         4   
3       104         Priya    PROD001     Chocolate Chip         1   
4       105         Mohan    PROD004   Strawberry Swirl         1   

   unit_price_inr  order_date  total_price_inr  
0           152.0  2025-07-01              304  
1           193.0  2025-07-02              579  
2           226.0  2025-07-03              904  
3           138.0  2025-07-04              138  
4           177.0  2025-07-05              177  
Display the last 5 rows:
    order_id customer_name product_id       product_name  quantity  \
95       196         Aarav    PROD002   Strawberry Swirl         2   
96       197          Siya    PROD001   Strawberry Swirl         1   
97       198         Kiran    

### Problem 2: Basic Descriptive Analysis

Ms. Kavita wants to get a general overview of the dataset. Use basic Pandas functions to get a high-level summary of the data.

Write Python code to:

* Calculate the total number of orders.
* Find the total quantity of products sold.
* Calculate the total revenue (sum of 'total_price_inr').
* Find the number of unique products sold.
* Determine how many times each unique product was sold.
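
As a rough illustration of the aggregations listed above, the sketch below runs them on a small hypothetical DataFrame named `demo`. The four rows are invented; only the column names follow the assignment.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
demo = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "product_name": ["Chocolate Chip", "Vanilla Dream", "Chocolate Chip", "Mango Medley"],
    "quantity": [2, 1, 3, 1],
    "total_price_inr": [200, 150, 300, 120],
})

n_orders = len(demo)                         # one row per order
total_qty = demo["quantity"].sum()           # total units sold
total_rev = demo["total_price_inr"].sum()    # total revenue
n_products = demo["product_name"].nunique()  # distinct product names
per_product = demo["product_name"].value_counts()  # orders per product, most frequent first

print(n_orders, total_qty, total_rev, n_products)
print(per_product)
```

`len(demo)` counts rows, whereas `demo["order_id"].count()` counts non-null values in one column; the two agree only when that column has no missing entries.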

In [4]:
# Calculate total number of orders
total_orders=df_sales['order_id'].count()
print(f"Total number of orders:{total_orders}")
#  Find total quantity of products sold
total_quantity=df_sales['quantity'].sum()
print(f"Total quantity of product sold:{total_quantity}")
# Calculate total revenue 
total_revenue=df_sales['total_price_inr'].sum()
print(f"Total Revenue is:{total_revenue}")
# Number of unique products sold
unique_products=df_sales['product_name'].nunique()
print(f"The number of unique products sold:{unique_products}")
#Determine how many times each unique product was sold
unique_product_sales_count=df_sales['product_name'].value_counts()
print("No.of times each unique product was sold")
print(unique_product_sales_count)

Total number of orders:100
Total quantity of product sold:273
Total Revenue is:43585
The number of unique products sold:5
No.of times each unique product was sold
product_name
Strawberry Swirl     27
Chocolate Chip       22
Vanilla Dream        20
Pistachio Delight    16
Mango Medley         15
Name: count, dtype: int64


### Problem 3: Answering Specific Business Questions with Filtering and Grouping

Ms. Kavita has some specific questions about her operations. Use filtering and grouping techniques to find the answers.

Write Python code to:

* Filter the DataFrame to show all orders made by 'Aarav'.
* Find the total revenue from Aarav's orders.
* Identify the product that generated the most revenue.
* Calculate the average order value for each unique customer.
* Sort the data to show the top 5 orders by revenue, from highest to lowest.
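
The filtering and grouping tasks above can be sketched on a small hypothetical DataFrame. The customer names and amounts below are invented, and `nlargest` is shown as one possible way to get the top orders; sorting with `sort_values` and taking `head()` would work as well.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
demo = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_name": ["Asha", "Ravi", "Asha", "Ravi", "Asha"],
    "product_name": ["Chocolate Chip", "Vanilla Dream", "Mango Medley",
                     "Chocolate Chip", "Vanilla Dream"],
    "total_price_inr": [200, 150, 360, 100, 300],
})

# Boolean-mask filter: keep only one customer's rows.
asha_orders = demo[demo["customer_name"] == "Asha"]
asha_revenue = asha_orders["total_price_inr"].sum()

# Revenue per product; idxmax() returns the label with the largest sum.
revenue_by_product = demo.groupby("product_name")["total_price_inr"].sum()
top_product = revenue_by_product.idxmax()

# Average order value per customer.
avg_order_value = demo.groupby("customer_name")["total_price_inr"].mean()

# Each row is already one order, so no groupby is needed for the top orders.
top_orders = demo.nlargest(3, "total_price_inr")

print(asha_revenue, top_product)
print(avg_order_value)
print(top_orders)
```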

In [5]:
#Filter the DataFrame to show all orders made by 'Aarav'
aarav_orders=df_sales[df_sales['customer_name']=='Aarav']
print("All orders made by Aarav:")
print(aarav_orders)
#Total-revenue from Aarav's orders
print("Total revenue from Aarav's orders")
aarav_total_revenue=aarav_orders['total_price_inr'].sum()
print(aarav_total_revenue)
# Identify the product that generated the most revenue
print("Revenue by product, highest first:")
product_revenue=df_sales.groupby('product_name')['total_price_inr'].sum().sort_values(ascending=False)
print(product_revenue)
print(f"Product that generated the most revenue:{product_revenue.idxmax()}")
#Calculate the average order value for each unique customer.
print("The average order value for each unique customer")
average_order_value=df_sales.groupby('customer_name')['total_price_inr'].mean()
print(average_order_value)
#Sort the data to show the top 5 orders by revenue, from highest to lowest.
#Each row is already one order, so sorting the rows is enough; no groupby is needed.
orders_by_revenue=df_sales.sort_values('total_price_inr',ascending=False)
print("The top 5 orders by revenue after sorting:")
print(orders_by_revenue.head())

All orders made by Aarav:
    order_id customer_name product_id       product_name  quantity  \
0        101         Aarav    PROD004  Pistachio Delight         2   
5        106         Aarav    PROD002      Vanilla Dream         1   
15       116         Aarav    PROD005     Chocolate Chip         2   
20       121         Aarav    PROD001  Pistachio Delight         3   
25       126         Aarav    PROD005     Chocolate Chip         3   
30       131         Aarav    PROD005   Strawberry Swirl         4   
35       136         Aarav    PROD001   Strawberry Swirl         2   
40       141         Aarav    PROD004  Pistachio Delight         4   
45       146         Aarav    PROD003       Mango Medley         4   
50       151         Aarav    PROD004     Chocolate Chip         4   
55       156         Aarav    PROD004     Chocolate Chip         2   
60       161         Aarav    PROD001   Strawberry Swirl         2   
65       166         Aarav    PROD002     Chocolate Chip        

### Problem 4: Combining DataFrames (Concatenation) and Time-Series Data

Ms. Kavita has a new dataset representing an additional week of sales data. She also wants to analyze sales trends over time.

Write Python code to:

* Create a new DataFrame for an additional week of sales. Make sure its columns match the original DataFrame.
  The data to be used is presented below :
   `'order_id': ['201', '202', '203'],
    'customer_name': ['Rahul', 'Ananya', 'Aarav'],
    'product_id': ['PROD001', 'PROD005', 'PROD002'],
    'product_name': ['Chocolate Chip', 'Vanilla Dream', 'Mango Medley'],
    'quantity': [2, 3, 1],
    'unit_price_inr': [120, 180, 150],
    'order_date': ['2025-08-01', '2025-08-02', '2025-08-03'],
    'total_price_inr': [240, 540, 150]`
* Concatenate the new DataFrame with the original one.
* Convert the 'order_date' column to a proper datetime format if not already done.
* Calculate the total daily revenue over the entire period.
* Find the day of the week with the highest sales on average.

Hint: The new DataFrame can be small, for example, 5 rows. You can create it manually using pd.DataFrame().
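
The steps above can be sketched end to end on two small hypothetical DataFrames. `base` and `extra` are invented stand-ins, not the assignment data. The sketch also assumes one particular reading of "highest sales on average": the mean of the daily revenue totals per weekday. Averaging individual order values per weekday is another reasonable reading and can give a different answer.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
base = pd.DataFrame({
    "order_id": [1, 2],
    "order_date": ["2025-07-01", "2025-07-01"],
    "total_price_inr": [200, 150],
})
extra = pd.DataFrame({
    "order_id": [3, 4],
    "order_date": ["2025-07-02", "2025-07-08"],
    "total_price_inr": [360, 100],
})

# Stack the rows; ignore_index=True renumbers the index 0..n-1.
combined = pd.concat([base, extra], ignore_index=True)

# Convert the date strings to datetime64 so the .dt accessor is available.
combined["order_date"] = pd.to_datetime(combined["order_date"])

# Total revenue per calendar day: group by the date only.
daily_revenue = combined.groupby("order_date")["total_price_inr"].sum()

# Average of the daily totals for each weekday (assumed interpretation).
daily = daily_revenue.reset_index()
daily["day_of_week"] = daily["order_date"].dt.day_name()
avg_by_day = daily.groupby("day_of_week")["total_price_inr"].mean()
best_day = avg_by_day.idxmax()

print(daily_revenue)
print(avg_by_day)
print(best_day)
```

Grouping by `order_date` alone matters here: adding a second key such as the customer name would produce per-customer subtotals rather than one total per day.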

In [2]:
import pandas as pd
#Data for the additional week of sales; columns match the original DataFrame
week_sales={
    'order_id':['201','202','203'],
    'customer_name':['Rahul','Ananya','Aarav'],
    'product_id':['PROD001','PROD005','PROD002'],
    'product_name':['Chocolate Chip','Vanilla Dream','Mango Medley'],
    'quantity':[2,3,1],
    'unit_price_inr':[120,180,150],
    'order_date':['2025-08-01','2025-08-02','2025-08-03'],
    'total_price_inr':[240,540,150]
}
df_week_sales=pd.DataFrame(week_sales)
#to_csv() returns None when given a path, so its result is not assigned
df_week_sales.to_csv("week_of_sales.csv",index=False)
#Reading the file back parses order_id as integers, matching sales_data.csv
week_of_sales=pd.read_csv("week_of_sales.csv")
week_of_sales

   order_id customer_name product_id    product_name  quantity  unit_price_inr  order_date  total_price_inr
0       201         Rahul    PROD001  Chocolate Chip         2             120  2025-08-01              240
1       202        Ananya    PROD005   Vanilla Dream         3             180  2025-08-02              540
2       203         Aarav    PROD002    Mango Medley         1             150  2025-08-03              150


In [3]:
#Concatenate the new DataFrame with the original one
df_sales=pd.read_csv('sales_data.csv')
df_week_sales=pd.read_csv('week_of_sales.csv')
#ignore_index=True renumbers the rows 0..n-1 in the combined DataFrame
combined_sales=pd.concat([df_sales,df_week_sales],ignore_index=True)
print(combined_sales)

     order_id customer_name product_id       product_name  quantity  \
0         101         Aarav    PROD004  Pistachio Delight         2   
1         102          Siya    PROD004   Strawberry Swirl         3   
2         103         Kiran    PROD004   Strawberry Swirl         4   
3         104         Priya    PROD001     Chocolate Chip         1   
4         105         Mohan    PROD004   Strawberry Swirl         1   
..        ...           ...        ...                ...       ...   
98        199         Priya    PROD003  Pistachio Delight         3   
99        200         Mohan    PROD005      Vanilla Dream         2   
100       201         Rahul    PROD001     Chocolate Chip         2   
101       202        Ananya    PROD005      Vanilla Dream         3   
102       203         Aarav    PROD002       Mango Medley         1   

     unit_price_inr  order_date  total_price_inr  
0             152.0  2025-07-01              304  
1             193.0  2025-07-02              

In [4]:
#Convert the 'order_date' column to a proper datetime format if not already done.
combined_sales["order_date"]=pd.to_datetime(combined_sales["order_date"])
combined_sales.head()

   order_id customer_name product_id       product_name  quantity  unit_price_inr  order_date  total_price_inr
0       101         Aarav    PROD004  Pistachio Delight         2           152.0  2025-07-01              304
1       102          Siya    PROD004   Strawberry Swirl         3           193.0  2025-07-02              579
2       103         Kiran    PROD004   Strawberry Swirl         4           226.0  2025-07-03              904
3       104         Priya    PROD001     Chocolate Chip         1           138.0  2025-07-04              138
4       105         Mohan    PROD004   Strawberry Swirl         1           177.0  2025-07-05              177


In [5]:
#Calculate the total daily revenue over the entire period
#Group by date only; adding customer_name would give per-customer subtotals instead
total_daily_revenue=combined_sales.groupby('order_date')['total_price_inr'].sum()
total_daily_revenue.head()

order_date
2025-07-01    304
2025-07-02    579
2025-07-03    904
2025-07-04    138
2025-07-05    177
Name: total_price_inr, dtype: int64

In [26]:
#Find the day of the week with the highest sales on average
combined_sales['day_of_week']=combined_sales['order_date'].dt.day_name()
print(combined_sales)
average_sales=combined_sales.groupby('day_of_week')['total_price_inr'].mean().sort_values(ascending=False)
day_of_highest_sales=average_sales.idxmax()
print(f"Day with highest average sales:{day_of_highest_sales}")
highest_sales=average_sales.max()
print(f"Highest sales on average:{highest_sales}")


     order_id customer_name product_id       product_name  quantity  \
0         101         Aarav    PROD004  Pistachio Delight         2   
1         102          Siya    PROD004   Strawberry Swirl         3   
2         103         Kiran    PROD004   Strawberry Swirl         4   
3         104         Priya    PROD001     Chocolate Chip         1   
4         105         Mohan    PROD004   Strawberry Swirl         1   
..        ...           ...        ...                ...       ...   
98        199         Priya    PROD003  Pistachio Delight         3   
99        200         Mohan    PROD005      Vanilla Dream         2   
100       201         Rahul    PROD001     Chocolate Chip         2   
101       202        Ananya    PROD005      Vanilla Dream         3   
102       203         Aarav    PROD002       Mango Medley         1   

     unit_price_inr order_date  total_price_inr day_of_week  
0             152.0 2025-07-01              304     Tuesday  
1             193.0 202