### PROJECT: SALES DATA ANALYSIS

## Project Objective
Analyze a simple sales dataset to:

- Find total sales (revenue)
- Identify the best-selling product
- Create a clean, readable report with insights

### DAY 1: Setup & Load Data

Load Dataset

In [10]:
import pandas as pd

df = pd.read_csv("sales_data.csv")
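The real `sales_data.csv` is not reproduced here. This sketch therefore reads the five rows printed by `df.head()` from an in-memory string (`io.StringIO`). It assumes you may want `Date` as a real datetime column. Passing `parse_dates=["Date"]` does that at load time, whereas the plain call above leaves `Date` as `object` (strings), as `df.info()` shows.

```python
import io

import pandas as pd

# Stand-in for sales_data.csv: the five rows printed by df.head() in this notebook.
csv_text = """Date,Product,Quantity,Price,Customer_ID,Region,Total_Sales
2024-01-01,Phone,7,37300,CUST001,East,261100
2024-01-02,Headphones,4,15406,CUST002,North,61624
2024-01-03,Phone,2,21746,CUST003,West,43492
2024-01-04,Headphones,1,30895,CUST004,East,30895
2024-01-05,Laptop,8,39835,CUST005,North,318680
"""

# parse_dates converts the Date column to datetime64 while reading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sample.dtypes)
```

With the real file, the equivalent call would be `pd.read_csv("sales_data.csv", parse_dates=["Date"])`.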

### DAY 2: Explore the Data

In [12]:
print(df.head())

         Date     Product  Quantity  Price Customer_ID Region  Total_Sales
0  2024-01-01       Phone         7  37300     CUST001   East       261100
1  2024-01-02  Headphones         4  15406     CUST002  North        61624
2  2024-01-03       Phone         2  21746     CUST003   West        43492
3  2024-01-04  Headphones         1  30895     CUST004   East        30895
4  2024-01-05      Laptop         8  39835     CUST005  North       318680


### Check Shape (Rows, Columns)

In [14]:
print(df.shape)

(100, 7)


### Check Column Names & Data Types

In [16]:
# info() prints its summary itself and returns None,
# so wrapping it in print() would add a stray "None" line.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Date         100 non-null    object
 1   Product      100 non-null    object
 2   Quantity     100 non-null    int64 
 3   Price        100 non-null    int64 
 4   Customer_ID  100 non-null    object
 5   Region       100 non-null    object
 6   Total_Sales  100 non-null    int64 
dtypes: int64(3), object(4)
memory usage: 5.6+ KB
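`df.info()` reports missing data only indirectly, through the Non-Null Count column, which reads 100 of 100 for every column here. A more direct check is `isna().sum()`. To show what a gap looks like, this sketch uses the first five rows with one `Quantity` value deliberately blanked out. That missing value is hypothetical and is not part of the dataset.

```python
import pandas as pd

# First five rows from df.head(); Quantity in row 2 is set to None on purpose
# (hypothetical gap) so the check has something to find.
sample = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
    "Product": ["Phone", "Headphones", "Phone", "Headphones", "Laptop"],
    "Quantity": [7, 4, None, 1, 8],
    "Price": [37300, 15406, 21746, 30895, 39835],
})

# Count missing values per column; a non-zero count flags a column to clean.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```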


### DAY 3: Clean the Data
Handle Missing Values

In [18]:
# df.info() shows no missing Quantity values, so this step is a safeguard.
# Assign the result back instead of calling fillna(..., inplace=True) on a single
# column: pandas warns that chained inplace calls stop modifying df in pandas 3.0.
df["Quantity"] = df["Quantity"].fillna(df["Quantity"].mean())
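pandas recommends two non-chained forms for this fix. You can assign the filled column back, or you can pass a `{column: value}` dict to `DataFrame.fillna`. This sketch checks that both give the same result on a small frame with one hypothetical missing `Quantity`. The mean of the remaining quantities (7, 4, 1, 8) is 5.0.

```python
import pandas as pd

# Hypothetical gap: row 2 has no Quantity.
sample = pd.DataFrame({
    "Product": ["Phone", "Headphones", "Phone", "Headphones", "Laptop"],
    "Quantity": [7, 4, None, 1, 8],
})
fill_value = sample["Quantity"].mean()  # mean of 7, 4, 1, 8 -> 5.0 (NaN is skipped)

# Form 1: assign the filled column back to a DataFrame.
by_assignment = sample.copy()
by_assignment["Quantity"] = by_assignment["Quantity"].fillna(fill_value)

# Form 2: DataFrame.fillna with a {column: value} dict returns a new DataFrame.
by_dict = sample.fillna({"Quantity": fill_value})

print(by_assignment.equals(by_dict))  # True
```

Because `Quantity` counts units, a rounded mean or the median may suit better than a fractional value. That choice is a judgment call for the analyst.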


### Remove Duplicate Rows

In [21]:
df.drop_duplicates(inplace=True)
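By default, `drop_duplicates` treats two rows as duplicates only when every column matches. Counting them first with `duplicated()` shows how many rows the step will remove. This sketch appends a hypothetical exact copy of the first row to three of the rows shown by `df.head()`.

```python
import pandas as pd

rows = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "Product": ["Phone", "Headphones", "Phone"],
    "Quantity": [7, 4, 2],
    "Price": [37300, 15406, 21746],
    "Customer_ID": ["CUST001", "CUST002", "CUST003"],
})
# Hypothetical duplicate: repeat the first row at the end.
with_dup = pd.concat([rows, rows.iloc[[0]]], ignore_index=True)

# duplicated() marks every repeat after the first occurrence as True.
n_duplicates = int(with_dup.duplicated().sum())
deduped = with_dup.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(with_dup), len(deduped))  # 1 4 3
```

Passing `subset=` would instead flag rows as duplicates when only the listed columns match, for example `subset=["Customer_ID", "Date"]`.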

### DAY 4: Analyze Sales Data
Total Sales

In [25]:
# Recompute each line total from Quantity x Price so it reflects the cleaned Quantity values
df["Total_Sales"] = df["Quantity"] * df["Price"]

In [27]:
df["Total_Sales"] 

0     261100
1      61624
2      43492
3      30895
4     318680
       ...  
95    166160
96      7647
97    135980
98     30717
99    116880
Name: Total_Sales, Length: 100, dtype: int64
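The file already ships a `Total_Sales` column, so you can check that the stored values agree with `Quantity * Price` before overwriting them. This is a minimal sketch on the five rows from `df.head()`. For those rows, every stored total matches.

```python
import pandas as pd

# First five rows from df.head().
sample = pd.DataFrame({
    "Product": ["Phone", "Headphones", "Phone", "Headphones", "Laptop"],
    "Quantity": [7, 4, 2, 1, 8],
    "Price": [37300, 15406, 21746, 30895, 39835],
    "Total_Sales": [261100, 61624, 43492, 30895, 318680],
})

# Rows where the stored total disagrees with Quantity * Price.
expected = sample["Quantity"] * sample["Price"]
mismatches = sample[sample["Total_Sales"] != expected]

print(f"{len(mismatches)} of {len(sample)} rows disagree")  # 0 of 5 rows disagree
```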

In [29]:
# Total revenue: sum of all line totals
total_revenue = df["Total_Sales"].sum()

In [31]:
# Best-selling product: the product with the highest summed Total_Sales (revenue, not units)
best_product = df.groupby("Product")["Total_Sales"].sum().idxmax()
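Calling `idxmax()` on summed `Total_Sales` picks the product with the highest revenue. That is not necessarily the product that sold the most units. This sketch computes both with named aggregation, but only on the five rows from `df.head()`. On those rows the two answers differ: Laptop leads on revenue and Phone leads on units. The sketch says nothing about the full 100-row dataset.

```python
import pandas as pd

# First five rows from df.head().
sample = pd.DataFrame({
    "Product": ["Phone", "Headphones", "Phone", "Headphones", "Laptop"],
    "Quantity": [7, 4, 2, 1, 8],
    "Total_Sales": [261100, 61624, 43492, 30895, 318680],
})

# One row per product: total revenue and total units sold.
summary = sample.groupby("Product").agg(
    revenue=("Total_Sales", "sum"),
    units=("Quantity", "sum"),
)

best_by_revenue = summary["revenue"].idxmax()  # "Laptop" (318,680)
best_by_units = summary["units"].idxmax()      # "Phone" (9 units)

print(summary.sort_values("revenue", ascending=False))
```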

In [33]:
# Highest single sale: the largest Total_Sales value in any one row
highest_sale = df["Total_Sales"].max()
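`max()` returns only the amount. To see which product, customer, and region produced that sale, use `idxmax()` to get the row label and pass it to `.loc`. This sketch runs on the five rows from `df.head()`, where the largest line total is in row 4.

```python
import pandas as pd

# First five rows from df.head(), relevant columns only.
sample = pd.DataFrame({
    "Product": ["Phone", "Headphones", "Phone", "Headphones", "Laptop"],
    "Customer_ID": ["CUST001", "CUST002", "CUST003", "CUST004", "CUST005"],
    "Region": ["East", "North", "West", "East", "North"],
    "Total_Sales": [261100, 61624, 43492, 30895, 318680],
})

# Label of the first row holding the maximum, then the whole row.
top_label = sample["Total_Sales"].idxmax()
top_sale = sample.loc[top_label]

print(top_sale)
```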

### DAY 5: Create Final Report

#### FINAL REPORT CELL

In [42]:
print("\n  SALES DATA ANALYSIS REPORT")
print("----------------------------")
print(f"Total Revenue: ₹{total_revenue:,.2f}")
print(f"Best-Selling Product: {best_product}")
print(f"Highest Single Sale: ₹{highest_sale:,.2f}")

print("\nInsights:")
# Use the computed result instead of a hard-coded product name.
print(f"1. {best_product} generates the highest revenue.")
print("2. Checked for missing values and duplicate rows before analysis.")


  SALES DATA ANALYSIS REPORT
----------------------------
Total Revenue: ₹12,365,048.00
Best-Selling Product: Laptop
Highest Single Sale: ₹373,932.00

Insights:
1. Laptop generates the highest revenue.
2. Checked for missing values and duplicate rows before analysis.
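The report depends on variables computed in several earlier cells. If it needs to run as one unit, you can wrap the calculations and text in a single function. `build_report` is a hypothetical name. The sketch assumes the cleaned DataFrame has `Product` and `Total_Sales` columns, and it runs on the five rows from `df.head()` rather than the full file.

```python
import pandas as pd


def build_report(df: pd.DataFrame) -> str:
    """Return the sales report text for a cleaned sales DataFrame (hypothetical helper)."""
    total_revenue = float(df["Total_Sales"].sum())
    best_product = df.groupby("Product")["Total_Sales"].sum().idxmax()
    highest_sale = float(df["Total_Sales"].max())

    lines = [
        "SALES DATA ANALYSIS REPORT",
        "----------------------------",
        f"Total Revenue: ₹{total_revenue:,.2f}",
        f"Best-Selling Product: {best_product}",
        f"Highest Single Sale: ₹{highest_sale:,.2f}",
        "",
        "Insights:",
        f"1. {best_product} generates the highest revenue.",
    ]
    return "\n".join(lines)


# First five rows from df.head(), relevant columns only.
sample = pd.DataFrame({
    "Product": ["Phone", "Headphones", "Phone", "Headphones", "Laptop"],
    "Total_Sales": [261100, 61624, 43492, 30895, 318680],
})
report = build_report(sample)
print(report)
```

On the notebook's full DataFrame, `print(build_report(df))` should produce the same figures as the report cell, because it performs the same sum, groupby, and max.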
