<center> 
        <h1>Walmart Sales Analysis Project</h1>
</center>

<h1> Project Objective </h1>
<h4> This end-to-end data analysis project focuses on uncovering key business insights from Walmart sales data. It combines Python for data cleaning and preprocessing, SQL for complex analytical querying, and Power BI for dynamic dashboarding and visualization. Through structured problem-solving, the project addresses real-world business questions related to sales trends, store performance, customer behavior, and operational efficiency.
Ideal for aspiring data analysts, it demonstrates hands-on expertise in data manipulation, SQL-driven analysis, and interactive reporting.
</h4>

In [1]:
#Importing Necessary Libraries
import pandas as pd

In [2]:
# Loading Dataset
df = pd.read_csv(r"C:\Users\Nishs\OneDrive\Desktop\Portfolio Projects Fresh\Walmart Project\Walmart.csv")

## Exploratory Data Analysis (EDA)

In [3]:
df.head() # First 5 rows

Unnamed: 0,invoice_id,Branch,City,category,unit_price,quantity,date,time,payment_method,rating,profit_margin
0,1,WALM003,San Antonio,Health and beauty,$74.69,7.0,05/01/19,13:08:00,Ewallet,9.1,0.48
1,2,WALM048,Harlingen,Electronic accessories,$15.28,5.0,08/03/19,10:29:00,Cash,9.6,0.48
2,3,WALM067,Haltom City,Home and lifestyle,$46.33,7.0,03/03/19,13:23:00,Credit card,7.4,0.33
3,4,WALM064,Bedford,Health and beauty,$58.22,8.0,27/01/19,20:33:00,Ewallet,8.4,0.33
4,5,WALM013,Irving,Sports and travel,$86.31,7.0,08/02/19,10:37:00,Ewallet,5.3,0.48


In [4]:
df.shape # Number of rows and columns

(10051, 11)

In [5]:
df.info() # Information about the dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10051 entries, 0 to 10050
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   invoice_id      10051 non-null  int64  
 1   Branch          10051 non-null  object 
 2   City            10051 non-null  object 
 3   category        10051 non-null  object 
 4   unit_price      10020 non-null  object 
 5   quantity        10020 non-null  float64
 6   date            10051 non-null  object 
 7   time            10051 non-null  object 
 8   payment_method  10051 non-null  object 
 9   rating          10051 non-null  float64
 10  profit_margin   10051 non-null  float64
dtypes: float64(3), int64(1), object(7)
memory usage: 863.9+ KB


In [6]:
#Checking for null values
df.isnull().sum()

invoice_id         0
Branch             0
City               0
category           0
unit_price        31
quantity          31
date               0
time               0
payment_method     0
rating             0
profit_margin      0
dtype: int64

#### There are 31 null values each in 'unit_price' and 'quantity'. We drop those rows because sales cannot be computed without both values.

In [7]:
# Remove null values
df.dropna(inplace=True)
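
Note that `dropna()` without arguments removes a row if any column is missing. As a minimal sketch on made-up data (the `toy` frame is hypothetical and not taken from Walmart.csv), restricting the drop to the two affected columns with `subset` keeps the intent explicit:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "invoice_id": [1, 2, 3, 4],
    "unit_price": ["$10.00", None, "$5.50", "$3.25"],
    "quantity": [2.0, 1.0, None, 4.0],
    "city": ["Irving", "Bedford", "Harlingen", None],
})

# Drop only rows where unit_price or quantity is missing;
# a missing city alone does not remove the row
cleaned = toy.dropna(subset=["unit_price", "quantity"])

print(len(toy), "->", len(cleaned))
```

In this dataset only those two columns contain nulls, so both forms should give the same result here; the `subset` form mainly guards against future changes in the data.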

In [8]:
df.duplicated().sum() # Checking for duplicates

np.int64(51)

In [9]:
# Remove Duplicates
df.drop_duplicates(inplace=True)
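
Before dropping duplicates it can help to look at them. The following is a small sketch on a hypothetical frame (the values are invented for illustration) showing how `duplicated(keep=False)` lists every copy of a repeated row, while `drop_duplicates()` keeps the first occurrence by default:

```python
import pandas as pd

# Hypothetical toy data with one fully repeated row
toy = pd.DataFrame({
    "invoice_id": [1, 2, 2, 3],
    "branch": ["WALM003", "WALM048", "WALM048", "WALM067"],
    "quantity": [7.0, 5.0, 5.0, 7.0],
})

# duplicated() flags only the later copies, so the sum counts extra rows
n_extra = int(toy.duplicated().sum())

# keep=False marks every copy of a duplicated row for inspection
dupes = toy[toy.duplicated(keep=False)]
print(dupes)

# drop_duplicates keeps the first occurrence of each row
deduped = toy.drop_duplicates()
```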

In [10]:
df.dtypes # Checking data types

invoice_id          int64
Branch             object
City               object
category           object
unit_price         object
quantity          float64
date               object
time               object
payment_method     object
rating            float64
profit_margin     float64
dtype: object

In [11]:
df.head() # First 5 rows

Unnamed: 0,invoice_id,Branch,City,category,unit_price,quantity,date,time,payment_method,rating,profit_margin
0,1,WALM003,San Antonio,Health and beauty,$74.69,7.0,05/01/19,13:08:00,Ewallet,9.1,0.48
1,2,WALM048,Harlingen,Electronic accessories,$15.28,5.0,08/03/19,10:29:00,Cash,9.6,0.48
2,3,WALM067,Haltom City,Home and lifestyle,$46.33,7.0,03/03/19,13:23:00,Credit card,7.4,0.33
3,4,WALM064,Bedford,Health and beauty,$58.22,8.0,27/01/19,20:33:00,Ewallet,8.4,0.33
4,5,WALM013,Irving,Sports and travel,$86.31,7.0,08/02/19,10:37:00,Ewallet,5.3,0.48


#### The unit_price column contains a '$' sign, which we have to remove before converting it to a number.

In [12]:
# Removing $ sign
df['unit_price'] = df['unit_price'].str.replace('$', '', regex=False)  # treat '$' as a literal character, not a regex anchor

In [13]:
# Converting to float
df['unit_price'] = df['unit_price'].astype(float)
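
`astype(float)` raises an error if any value is not a valid number once the `$` is removed. As an alternative sketch (the sample values, including the `"n/a"` entry, are invented for illustration), `pd.to_numeric` with `errors="coerce"` turns unparseable entries into `NaN` so they can be inspected instead of stopping the notebook:

```python
import pandas as pd

# Hypothetical price strings; the last one is deliberately invalid
prices = pd.Series(["$74.69", "$15.28", " $46.33 ", "n/a"])

# Strip the literal '$' and surrounding whitespace
stripped = prices.str.replace("$", "", regex=False).str.strip()

# Invalid entries become NaN instead of raising an error
numeric = pd.to_numeric(stripped, errors="coerce")

# Original strings that could not be converted
bad = prices[numeric.isna()]
print(bad)
```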

In [14]:
df.info() # checking data information

<class 'pandas.core.frame.DataFrame'>
Index: 9969 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   invoice_id      9969 non-null   int64  
 1   Branch          9969 non-null   object 
 2   City            9969 non-null   object 
 3   category        9969 non-null   object 
 4   unit_price      9969 non-null   float64
 5   quantity        9969 non-null   float64
 6   date            9969 non-null   object 
 7   time            9969 non-null   object 
 8   payment_method  9969 non-null   object 
 9   rating          9969 non-null   float64
 10  profit_margin   9969 non-null   float64
dtypes: float64(4), int64(1), object(6)
memory usage: 934.6+ KB
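
The output above shows `date` and `time` still stored as `object`. The notebook leaves them that way. If date arithmetic is needed later, a sketch like the following could parse them; the `%d/%m/%y` format is an assumption inferred from sample values such as `27/01/19`, and should be checked against the full column:

```python
import pandas as pd

# Sample values copied from the first rows of the dataset
dates = pd.Series(["05/01/19", "08/03/19", "27/01/19"])

# Assumed day/month/two-digit-year layout; errors="coerce" would
# flag rows that do not match, but is omitted here to fail loudly
parsed = pd.to_datetime(dates, format="%d/%m/%y")

print(parsed.dt.month.tolist())
```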


In [15]:
# Creating Total_Sales column
df['Total_Sales'] = df['unit_price'] * df['quantity']
df.head()

Unnamed: 0,invoice_id,Branch,City,category,unit_price,quantity,date,time,payment_method,rating,profit_margin,Total_Sales
0,1,WALM003,San Antonio,Health and beauty,74.69,7.0,05/01/19,13:08:00,Ewallet,9.1,0.48,522.83
1,2,WALM048,Harlingen,Electronic accessories,15.28,5.0,08/03/19,10:29:00,Cash,9.6,0.48,76.4
2,3,WALM067,Haltom City,Home and lifestyle,46.33,7.0,03/03/19,13:23:00,Credit card,7.4,0.33,324.31
3,4,WALM064,Bedford,Health and beauty,58.22,8.0,27/01/19,20:33:00,Ewallet,8.4,0.33,465.76
4,5,WALM013,Irving,Sports and travel,86.31,7.0,08/02/19,10:37:00,Ewallet,5.3,0.48,604.17


In [16]:
df.shape #After adding Total_Sales column

(9969, 12)

In [17]:
df.columns # Checking column names

Index(['invoice_id', 'Branch', 'City', 'category', 'unit_price', 'quantity',
       'date', 'time', 'payment_method', 'rating', 'profit_margin',
       'Total_Sales'],
      dtype='object')

In [18]:
# Making lowercase
df.columns = df.columns.str.lower()
df.columns

Index(['invoice_id', 'branch', 'city', 'category', 'unit_price', 'quantity',
       'date', 'time', 'payment_method', 'rating', 'profit_margin',
       'total_sales'],
      dtype='object')
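
Lowercasing is enough for this dataset. For files whose headers also carry stray spaces, a slightly more defensive normalization might look like the sketch below (the column names are hypothetical examples, not from Walmart.csv):

```python
import pandas as pd

# Hypothetical messy headers
toy = pd.DataFrame(columns=[" Branch", "City ", "Payment Method", "Total_Sales"])

# Trim whitespace, lowercase, and replace inner spaces with underscores
toy.columns = (
    toy.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
)

print(list(toy.columns))
```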

In [19]:
df.to_csv(r"C:\Users\Nishs\OneDrive\Desktop\Portfolio Projects Fresh\Walmart Project\Walmart_Clean_Data.csv", index=False) # Export the cleaned data to a CSV file
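
The absolute Windows path above ties the notebook to one machine. As a hedged sketch, `pathlib` can build the output path from a project folder, and reading the file back gives a quick check that the export worked. Here a temporary directory stands in for the project folder and the small `cleaned` frame is invented for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the cleaned DataFrame
cleaned = pd.DataFrame({
    "invoice_id": [1, 2],
    "unit_price": [74.69, 15.28],
    "quantity": [7.0, 5.0],
})
cleaned["total_sales"] = cleaned["unit_price"] * cleaned["quantity"]

# A temporary folder stands in for the project directory (assumption)
project_dir = Path(tempfile.mkdtemp())
out_path = project_dir / "Walmart_Clean_Data.csv"

# index=False keeps the pandas row index out of the file
cleaned.to_csv(out_path, index=False)

# Read the file back to confirm shape and columns survived the export
check = pd.read_csv(out_path)
print(check.shape)
```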