# Walmart Purchase Behavior Analysis

**By Akanksha Trivedi**

## 1. Business Problem

- Analyze customer purchase behavior
- Understand gender-based differences in Black Friday spending

## 2. Data Loading and Summary

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

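df = pd.read_csv('walmart_data.csv')

# A notebook cell renders only its last expression, so print each summary explicitly
print(df.head())
df.info()  # info() writes its report directly to stdout
print(df.describe())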
```

## 3. Data Cleaning and Profiling

```python
# Convert data types
df['Gender'] = df['Gender'].astype('category')
df['Age'] = df['Age'].astype('category')
# Convert 0/1 to labels
df['Marital_Status'] = df['Marital_Status'].map({0:'Unmarried', 1:'Married'})
```

### 🔍 Observations

- 550,068 transactions from 5,891 users
- Male customers account for ~75% of transactions
- The 26-35 age group is the largest
- No missing values or duplicate rows
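
The figures above can be reproduced with a few pandas calls. The following is a minimal sketch, not the notebook's original code: the helper name `profile_transactions` is hypothetical, and the `User_ID` column name is an assumption about the dataset (only `Gender`, `Age`, `Marital_Status`, and `Purchase` appear in the code above).

```python
import pandas as pd


def profile_transactions(df: pd.DataFrame, user_col: str = "User_ID") -> dict:
    """Summarize the quantities listed in the observations.

    `user_col` is an assumed column name; adjust it to match the dataset.
    """
    return {
        "n_transactions": len(df),
        "n_users": df[user_col].nunique(),
        # Share of transactions (rows) per gender, as fractions summing to 1
        "gender_share": df["Gender"].value_counts(normalize=True).to_dict(),
        # Age group with the most transactions
        "top_age_group": df["Age"].value_counts().idxmax(),
        # Total count of missing cells across all columns
        "n_missing": int(df.isna().sum().sum()),
        # Number of rows that exactly repeat an earlier row
        "n_duplicates": int(df.duplicated().sum()),
    }
```

Calling `profile_transactions(df)` on the loaded data should return the transaction and user counts, the gender split, and zero for both `n_missing` and `n_duplicates` if the observations hold.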

## 4. Univariate Analysis

```python
df['Purchase'].hist(bins=30)
plt.title('Purchase Amount Distribution')
plt.show()

sns.countplot(data=df, x='Gender')
plt.show()
```

## 5. Bivariate Analysis

```python
sns.boxplot(x='Gender', y='Purchase', data=df)
plt.title('Gender vs Purchase Amount')
plt.show()
```

### 🔍 Insights

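- Males account for roughly 3x the total purchase amount of females, mostly because they make ~75% of transactions rather than because they spend much more per transaction
- Males in City C spend the most
- Average purchase per transaction: males $9,438, females $8,735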
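
The box plot alone does not show totals or the city breakdown, so these insights rest on grouped aggregates. As a rough consistency check using the figures quoted here, 0.75 × 9438 divided by 0.25 × 8735 is about 3.2, which matches the "~3x" claim. The sketch below shows one way such aggregates could be computed; the function names are hypothetical and the `City_Category` column name is an assumption about the dataset.

```python
import pandas as pd


def spend_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Transaction count, total spend, and mean spend per gender."""
    summary = df.groupby("Gender", observed=True)["Purchase"].agg(
        transactions="count", total="sum", mean="mean"
    )
    # Each gender's fraction of the overall purchase amount
    summary["share_of_total"] = summary["total"] / summary["total"].sum()
    return summary


def spend_by_gender_and_city(
    df: pd.DataFrame, city_col: str = "City_Category"
) -> pd.DataFrame:
    """Total and mean spend per (gender, city) pair.

    `city_col` is an assumed column name; adjust it to match the dataset.
    """
    return df.groupby(["Gender", city_col], observed=True)["Purchase"].agg(
        total="sum", mean="mean"
    )
```

Sorting the result of `spend_by_gender_and_city(df)` by `total` or by `mean` makes explicit which sense of "spend the most" the City C insight refers to.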

## 6. Confidence Intervals

```python
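import scipy.stats as stats

# Draw a random sample of 1,000 purchases (fixed seed so the result is reproducible)
sample = df['Purchase'].sample(1000, random_state=42)
# 95% t-interval for the mean purchase amount
ci = stats.t.interval(0.95, len(sample)-1, loc=np.mean(sample), scale=stats.sem(sample))
print('95% Confidence Interval:', ci)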
```
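
The interval above covers all customers together. Since the business problem concerns gender-based differences, the same t-interval can be computed separately for each gender. This is a minimal sketch under stated assumptions rather than the notebook's original method: `mean_ci` and `ci_by_gender` are hypothetical helper names, and the sample size and seed are arbitrary choices.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats


def mean_ci(values, confidence: float = 0.95):
    """Return (mean, lower, upper) of a t-based confidence interval for the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sem = stats.sem(values)  # standard error of the mean
    low, high = stats.t.interval(confidence, len(values) - 1, loc=mean, scale=sem)
    return mean, low, high


def ci_by_gender(df: pd.DataFrame, n: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Sample up to `n` purchases per gender and compute a CI for each group."""
    rows = {}
    for gender, group in df.groupby("Gender", observed=True):
        sample = group["Purchase"].sample(min(n, len(group)), random_state=seed)
        rows[gender] = mean_ci(sample)
    return pd.DataFrame.from_dict(
        rows, orient="index", columns=["mean", "ci_low", "ci_high"]
    )
```

If the two intervals returned by `ci_by_gender(df)` do not overlap, that suggests the difference in average purchase between male and female customers is unlikely to be a sampling artifact. Overlapping intervals would call for a larger sample or a direct two-sample test before drawing a conclusion.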

## 7. Recommendations

- Prioritize male customers, who account for ~75% of transactions and roughly 3x the total spend of female customers
- Target high spenders (age 51–55)
- Create youth and senior-specific marketing campaigns
- Invest in City C for higher returns