# Xiaohongshu Sales Analysis Report

## 1. Data Overview

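The dataset has 29,452 rows and 8 columns (29,451 rows after the duplicate row is removed in Section 5). The columns are:
- revenue: the purchase amount of the user's order
- gender: 1 = male, 0 = female
- age: the user's age
- engaged_last_30: whether the user took part in key in-app activities (discussions, seller shows) in the last 30 days
- lifecycle: lifecycle stage A, B, or C (registered within 6 months, 1 year, or 2 years, respectively)
- days_since_last_order: days since the user's most recent order (a value below 1 means the user ordered that day)
- previous_order_amount: the user's cumulative past purchase amount
- 3rd_party_stores: number of past in-app purchases from third-party stores; 0 means the user has bought only from self-operated stores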
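The coded columns are easier to read in charts once decoded. The sketch below, a hypothetical helper rather than part of the original notebook, turns the code meanings listed above into label columns; the new column names `gender_label`, `lifecycle_label`, and `self_operated_only` are illustrative assumptions.

```python
import pandas as pd

# Code meanings taken from the column descriptions above.
# Float keys because gender is stored as float64 (it contains NaN).
GENDER_LABELS = {1.0: "male", 0.0: "female"}
LIFECYCLE_LABELS = {
    "A": "registered within 6 months",
    "B": "registered within 1 year",
    "C": "registered within 2 years",
}


def decode_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with human-readable label columns added (hypothetical names)."""
    out = df.copy()
    # map() leaves unmatched values, including NaN, as NaN
    out["gender_label"] = out["gender"].map(GENDER_LABELS)
    out["lifecycle_label"] = out["lifecycle"].map(LIFECYCLE_LABELS)
    # 0 third-party purchases means the user bought only from self-operated stores
    out["self_operated_only"] = out["3rd_party_stores"].eq(0)
    return out


# Tiny illustrative frame shaped like the first rows of the data
sample = pd.DataFrame({
    "gender": [1.0, 0.0, float("nan")],
    "lifecycle": ["A", "C", "B"],
    "3rd_party_stores": [0, 4, 1],
})
print(decode_codes(sample))
```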

## 2. Import Libraries

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

## 3. Load the Data

In [2]:
df=pd.read_csv(r'F:\Data\data_hongshu5427\31 l2_week2.csv')

## 4. Inspect the Data

In [3]:
df.head()

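|   | revenue | gender | age | engaged_last_30 | lifecycle | days_since_last_order | previous_order_amount | 3rd_party_stores |
|---|---|---|---|---|---|---|---|---|
| 0 | 72.98 | 1.0 | 59.0 | 0.0 | B | 4.26 | 2343.87 | 0 |
| 1 | 200.99 | 1.0 | 51.0 | 0.0 | A | 0.94 | 8539.872 | 0 |
| 2 | 69.98 | 1.0 | 79.0 | 0.0 | C | 4.29 | 1687.646 | 1 |
| 3 | 649.99 | NaN | NaN | NaN | C | 14.9 | 3498.846 | 0 |
| 4 | 83.59 | NaN | NaN | NaN | C | 21.13 | 3968.49 | 4 |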


In [4]:
df.shape

(29452, 8)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29452 entries, 0 to 29451
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   revenue                  29452 non-null  float64
 1   gender                   17723 non-null  float64
 2   age                      16716 non-null  float64
 3   engaged_last_30          17723 non-null  float64
 4   lifecycle                29452 non-null  object 
 5    days_since_last_order   29452 non-null  float64
 6   previous_order_amount    29452 non-null  float64
 7   3rd_party_stores         29452 non-null  int64  
dtypes: float64(6), int64(1), object(1)
memory usage: 1.8+ MB


In [6]:
df.dtypes

revenue                    float64
gender                     float64
age                        float64
engaged_last_30            float64
lifecycle                   object
 days_since_last_order     float64
previous_order_amount      float64
3rd_party_stores             int64
dtype: object
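The `info()` and `dtypes` outputs above show the column name ` days_since_last_order` with a leading space, so `df['days_since_last_order']` would raise a `KeyError`. A minimal fix, assuming no column name is meant to contain leading or trailing whitespace, is to strip all column names right after loading:

```python
import pandas as pd


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with leading/trailing whitespace removed from column names."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    return out


# Reproduce the issue on a tiny frame with the same stray space
raw = pd.DataFrame({"revenue": [72.98], " days_since_last_order": [4.26]})
clean = clean_column_names(raw)
print(list(clean.columns))  # ['revenue', 'days_since_last_order']
```

In the notebook itself this would be a single line, `df.columns = df.columns.str.strip()`, placed after `pd.read_csv`.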

## 5. Data Cleaning

### Remove Duplicates

In [9]:
# Count fully duplicated rows
df.duplicated().sum()

1

In [11]:
# Show every copy of the duplicated rows (keep=False)
df[df.duplicated(keep=False)]

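|   | revenue | gender | age | engaged_last_30 | lifecycle | days_since_last_order | previous_order_amount | 3rd_party_stores |
|---|---|---|---|---|---|---|---|---|
| 9254 | 99.99 | NaN | NaN | NaN | A | 0.13 | 0.0 | 0 |
| 13622 | 99.99 | NaN | NaN | NaN | A | 0.13 | 0.0 | 0 |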


In [12]:
# Drop duplicate rows (keeps the first occurrence)
df.drop_duplicates(inplace=True)

In [13]:
# Rebuild a contiguous index after dropping rows
df.reset_index(drop=True,inplace=True)

In [14]:
# Check for duplicates again
df.duplicated().sum()

0
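The two cleaning steps above (`drop_duplicates` then `reset_index(drop=True)`) can also be combined. The sketch below uses a made-up three-row frame that mirrors the duplicate pair found above; `ignore_index=True` is a `drop_duplicates` parameter available since pandas 1.0.

```python
import pandas as pd

# Two identical rows, mirroring the duplicate pair found above (rows 9254 and 13622)
frame = pd.DataFrame({
    "revenue": [99.99, 72.98, 99.99],
    "lifecycle": ["A", "B", "A"],
    "days_since_last_order": [0.13, 4.26, 0.13],
})

# keep=False marks every copy, which is useful for inspection
print(frame.duplicated(keep=False).tolist())  # [True, False, True]

# ignore_index=True drops duplicates and renumbers the index 0..n-1 in one step,
# equivalent to drop_duplicates() followed by reset_index(drop=True)
deduped = frame.drop_duplicates(ignore_index=True)
print(deduped.index.tolist())  # [0, 1]
```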

### Handle Missing Values

In [15]:
# Count missing values per column
df.isnull().sum()

revenue                        0
gender                     11728
age                        12735
engaged_last_30            11728
lifecycle                      0
 days_since_last_order         0
previous_order_amount          0
3rd_party_stores               0
dtype: int64
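Raw counts are easier to judge as shares: 11,728 of 29,451 rows (about 39.8%) lack `gender` and 12,735 (about 43.2%) lack `age`. Since `gender` and `engaged_last_30` have identical missing counts, it is also worth checking whether they are missing in the same rows. A minimal sketch, with hypothetical helper names and a toy frame:

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column."""
    counts = df.isna().sum()
    return pd.DataFrame({"missing": counts, "pct": (counts / len(df) * 100).round(2)})


def missing_together(df: pd.DataFrame, a: str, b: str) -> bool:
    """True if columns a and b are missing in exactly the same rows."""
    return bool((df[a].isna() == df[b].isna()).all())


sample = pd.DataFrame({
    "gender": [1.0, np.nan, 0.0, np.nan],
    "engaged_last_30": [0.0, np.nan, 1.0, np.nan],
    "age": [59.0, np.nan, np.nan, np.nan],
})
print(missing_report(sample))
print(missing_together(sample, "gender", "engaged_last_30"))  # True
```

On the real data, `missing_together(df, 'gender', 'engaged_last_30')` would show whether the two gaps come from the same users, which matters for any later comparison that uses both columns.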

#### Fill in Age

In [17]:
df.age.describe()

count    16716.000000
mean        60.397404
std         14.823026
min         18.000000
25%         50.000000
50%         60.000000
75%         70.000000
max         99.000000
Name: age, dtype: float64

In [18]:
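# Fill missing ages with the mean age, truncated to an integer (60);
# assigning back avoids chained inplace fillna, which newer pandas warns about
df['age'] = df['age'].fillna(int(df['age'].mean()))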

In [19]:
# Confirm that no ages are missing
df.age.isnull().sum()

0
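Filling 12,735 rows with the same value changes the age distribution: comparing the two `describe()` outputs, the standard deviation drops from 14.82 to 11.17 and the interquartile range narrows from 50-70 to 58-62. One common alternative, sketched below, is to fill with the median and keep a flag so imputed rows can be excluded from age-based charts. Here the median (60) equals the truncated mean, so the filled values would be identical; the flag column name `age_was_missing` is a hypothetical choice, not from the original notebook.

```python
import numpy as np
import pandas as pd


def impute_age(df: pd.DataFrame, col: str = "age") -> pd.DataFrame:
    """Fill missing ages with the median and record which rows were imputed.

    The flag column name 'age_was_missing' is illustrative only.
    """
    out = df.copy()
    out["age_was_missing"] = out[col].isna()  # record before filling
    out[col] = out[col].fillna(out[col].median())
    return out


sample = pd.DataFrame({"age": [59.0, 51.0, np.nan, 79.0]})
filled = impute_age(sample)
print(filled)
```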

`gender` and `engaged_last_30` are hard to impute reliably, so their missing values are left untouched for now to avoid introducing errors.

In [20]:
# Summary statistics after cleaning
df.describe()

|       | revenue | gender | age | engaged_last_30 | days_since_last_order | previous_order_amount | 3rd_party_stores |
|---|---|---|---|---|---|---|---|
| count | 29451.0 | 17723.0 | 29451.0 | 17723.0 | 29451.0 | 29451.0 | 29451.0 |
| mean | 398.298166 | 0.950742 | 60.225561 | 0.073069 | 7.711606 | 2348.984587 | 2.286136 |
| std | 960.266457 | 0.216412 | 11.169016 | 0.260257 | 6.489249 | 2379.775254 | 3.538254 |
| min | 0.02 | 0.0 | 18.0 | 0.0 | 0.13 | 0.0 | 0.0 |
| 25% | 74.97 | 1.0 | 58.0 | 0.0 | 2.19 | 773.5715 | 0.0 |
| 50% | 175.98 | 1.0 | 60.0 | 0.0 | 5.97 | 1655.99 | 0.0 |
| 75% | 499.99 | 1.0 | 62.0 | 0.0 | 11.74 | 3096.841 | 3.0 |
| max | 103466.1 | 1.0 | 99.0 | 1.0 | 23.71 | 11597.9 | 10.0 |


## 6. Save the Data

In [None]:
# index=False avoids writing the row index as an extra column
df.to_csv(r'F:\Data\data_hongshu5427\hongshu.csv', index=False)
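`to_csv` writes the row index by default, so reading the saved file back with `pd.read_csv` yields an extra `Unnamed: 0` column. The round trip below, on a made-up two-row frame held in memory with `io.StringIO`, shows the difference `index=False` makes:

```python
import io

import pandas as pd

frame = pd.DataFrame({"revenue": [72.98, 200.99], "lifecycle": ["B", "A"]})

# Default: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(frame.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'revenue', 'lifecycle']

# index=False keeps only the data columns
without_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)))
print(list(without_index.columns))  # ['revenue', 'lifecycle']
```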

## 7. Data Visualization

### Average Purchase Amount by Gender (Bar Chart)

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5tAQu1USIukTqr4O85fcwsGaL842mK4O4m.fy*cevWeEZ*U9mAKk.Fw6E9hEQ6dndleCDKMGjYeBhXMhkfrqLuA!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

Conclusion: on average, male customers spend more per order than female customers.
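The plotting code for these charts is not shown, so the following is only a sketch of how such a bar chart could be produced, assuming it plots the mean `revenue` per group. Two caveats from the `describe()` output apply: revenue is heavily skewed (median 175.98, mean 398.30, maximum 103,466.1), and about 95% of rows with a known gender are male (gender mean 0.95), so the female mean rests on a much smaller sample. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def mean_revenue_by(df: pd.DataFrame, col: str) -> pd.Series:
    """Average revenue per value of `col`; groupby drops rows where `col` is NaN."""
    return df.groupby(col)["revenue"].mean()


def plot_mean_revenue_by(df: pd.DataFrame, col: str):
    """Bar chart of mean revenue per group (needs matplotlib, which pandas imports on demand)."""
    ax = mean_revenue_by(df, col).plot.bar(rot=0)
    ax.set_ylabel("mean revenue")
    return ax


# Toy data: the NaN-gender row is excluded from both groups
sample = pd.DataFrame({
    "gender": [1.0, 1.0, 0.0, np.nan],
    "revenue": [100.0, 300.0, 50.0, 999.0],
})
print(mean_revenue_by(sample, "gender"))
```

Calling `plot_mean_revenue_by(df, 'gender')` on the cleaned data would draw one bar per gender code.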

### Purchase Amount Distribution by Gender

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5sG*NJWc*ygGOlcrbcgfWEyfRpzulJ4noAaB.u0iLOYgwCik.5vplZdMeQYnFgurS4MoTSvITz1kgWxuq*8wP2I!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

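Conclusion: male customers have a higher mean, lower quartile, median, and upper quartile purchase amount than female customers, which indicates stronger purchasing power.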
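The quartiles compared in this conclusion can be tabulated directly instead of read off a chart. A minimal sketch with a hypothetical helper name, using pandas' default linear interpolation for quantiles:

```python
import pandas as pd


def revenue_quartiles_by(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Lower quartile, median, and upper quartile of revenue for each value of `col`."""
    return df.groupby(col)["revenue"].quantile([0.25, 0.5, 0.75]).unstack()


sample = pd.DataFrame({
    "gender": [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "revenue": [100.0, 200.0, 300.0, 400.0, 10.0, 20.0, 30.0],
})
print(revenue_quartiles_by(sample, "gender"))
```

For the chart itself, pandas' `DataFrame.boxplot(column='revenue', by='gender')` draws one box per group; passing `showfliers=False` hides the extreme outliers that would otherwise flatten the boxes.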

### Purchase Amount by Age Group (Bar Chart)

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5nd.hhv6YZT3hzjYeoDLLM2DePCCj.kFGeV2SOBotFITdTlslTq4z9y06AmWo2gEmoD5MQ80hTauG0phhCXyVas!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

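Conclusion: customers under 20 have the strongest purchasing power. Since the minimum age in the data is 18, this group covers only 18- and 19-year-olds.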
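The notebook does not show how ages were grouped, so the bin edges below are an assumption for illustration. Reporting the row count next to the mean helps judge whether a group such as "<20" is large enough to support the conclusion; note also that mean imputation placed all 12,735 missing ages at 60, which inflates the 60-69 group.

```python
import pandas as pd

# Illustrative bin edges and labels; the original notebook's grouping is not shown.
AGE_BINS = [0, 20, 30, 40, 50, 60, 70, 80, 100]
AGE_LABELS = ["<20", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"]


def revenue_by_age_group(df: pd.DataFrame) -> pd.DataFrame:
    """Mean revenue and row count per age group."""
    out = df.copy()
    # right=False makes the intervals [a, b), so age 20 falls in "20-29"
    out["age_group"] = pd.cut(out["age"], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    # observed=True keeps only the age groups that actually occur
    return out.groupby("age_group", observed=True)["revenue"].agg(["mean", "count"])


sample = pd.DataFrame({
    "age": [18.0, 19.0, 25.0, 60.0, 99.0],
    "revenue": [500.0, 700.0, 100.0, 200.0, 50.0],
})
print(revenue_by_age_group(sample))
```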

### Purchase Amount Distribution by Age Group

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/45NBuzDIW489QBoVep5mcVYni4OhT*HMK96v171w36idFbl1YWIYB.w6ITuKh7hi*Q0NDLYXiC4rI2wi4gO0qt5gWacBO7EEbklDdh4V3nA!/b&bo=zAP7AcwD.wEDGTw!&rf=viewer_4)

### Purchase Amount by Key-Activity Participation (Bar Chart)

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5sG*NJWc*ygGOlcrbcgfWEzlibbvt6y4OSoQaAzz9Eb2UWZVhKdRBVsKzYxJtKgwToXzxCjXhIWFjKFNpjd1Io8!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

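Conclusion: users who took part in key activities spend more, which suggests the activities are effective. Because users choose whether to participate, this is an association rather than proof that the activities cause higher spending.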
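Only about 7.3% of users with a known `engaged_last_30` value participated (mean 0.073), so it is worth checking how stable the gap in mean revenue is. A percentile bootstrap confidence interval is one simple check; the function below is a hypothetical sketch, not the original analysis, and the two lists are toy numbers.

```python
import numpy as np


def bootstrap_mean_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Observed mean(a) - mean(b) and a percentile bootstrap confidence interval."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # resample each group independently, with replacement, at its own size
        diffs[i] = rng.choice(a, size=a.size).mean() - rng.choice(b, size=b.size).mean()
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return a.mean() - b.mean(), (float(lower), float(upper))


engaged = [150.0, 220.0, 180.0, 400.0]
not_engaged = [60.0, 75.0, 90.0, 40.0, 110.0]
observed, (lo, hi) = bootstrap_mean_diff(engaged, not_engaged)
print(observed, lo, hi)
```

On the cleaned data the inputs would be `df.loc[df['engaged_last_30'] == 1, 'revenue']` and `df.loc[df['engaged_last_30'] == 0, 'revenue']`; rows with a missing value fall into neither group. An interval that stays above zero supports a real difference in spending, but it still says nothing about causation.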

### Purchase Amount Distribution by Key-Activity Participation

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5sG*NJWc*ygGOlcrbcgfWExjnCgJ7pzmWhZfVTaPTO6ga4e4yA*xpE5bagSVCfBtsT.6YUt4tApTvzxUIjnS868!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

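Conclusion: customers who took part in key activities have a higher mean, lower quartile, median, and upper quartile purchase amount, indicating higher purchasing power.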

### Average Purchase Amount by Lifecycle Stage (Bar Chart)

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5tAQu1USIukTqr4O85fcwsG1eF8CgPTQBBjyhxW7sBVLs0NtTe4mYb3zVC5a52DBgH.6*bY5D3XPhdexwUq5H7A!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

Conclusion: group A, i.e. users registered within the last 6 months, has higher purchasing power than groups B and C.

### Purchase Amount Distribution by Lifecycle Stage

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5tAQu1USIukTqr4O85fcwsFoYzXsmEic91CXsfHeUuHsm9tgsV3ScA12P1*YCIZQ.Ove5uoNohVBRJSwc4gvD8I!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

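Conclusion: group A users have a higher mean, lower quartile, median, and upper quartile purchase amount, so users registered within 6 months have higher purchasing power.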

### Average Purchase Amount by Days Since Last Order

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5nd.hhv6YZT3hzjYeoDLLM0YiKtwumk1wgOdijxf4mNHEM6aLacWlg9XnSFhQ8UOX11FJuhdlPyCclk*TiinOKw!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

Conclusion: the fewer days since a user's last order, the higher the average purchase amount.
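`days_since_last_order` is continuous (0.13 to 23.71 days), so it has to be bucketed before averaging. The sketch below swaps in equal-count quartile buckets via `pd.qcut`, which is an assumption since the original bucketing is not shown; it also assumes the column name has been stripped of the leading space seen in `df.info()`.

```python
import pandas as pd


def mean_revenue_by_recency_quartile(df: pd.DataFrame, col: str = "days_since_last_order") -> pd.Series:
    """Mean revenue in four equal-count buckets of `col`; Q1 holds the most recent buyers."""
    out = df.copy()
    out["recency_bucket"] = pd.qcut(out[col], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
    return out.groupby("recency_bucket", observed=True)["revenue"].mean()


sample = pd.DataFrame({
    "days_since_last_order": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "revenue": [800.0, 700.0, 600.0, 500.0, 400.0, 300.0, 200.0, 100.0],
})
print(mean_revenue_by_recency_quartile(sample))
```

Equal-count buckets keep each bar based on roughly the same number of users, which makes the trend less sensitive to sparse bins at the long end of the range.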

### Purchase Amount Distribution by Days Since Last Order

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5nd.hhv6YZT3hzjYeoDLLM0fQ.DiNX039ulaS37wEUgrwykmAwrTlZOiEX2iLaPqZcu3vKmNunfqBXiHx5nvAhI!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

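Conclusion: the shorter the interval since the last order, the higher the lower quartile, median, and upper quartile of the purchase amount.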
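"The shorter the interval, the higher the amount" is a monotonic-trend claim, which Spearman's rank correlation summarizes in one number (negative means fewer days go with higher revenue). Spearman's rho is the Pearson correlation of the ranks, so it can be computed with pandas alone; this is a sketch with a hypothetical helper name, not the notebook's method.

```python
import pandas as pd


def spearman_rho(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of average ranks (ties share a rank)."""
    return x.rank().corr(y.rank())


days = pd.Series([0.5, 2.0, 6.0, 12.0, 20.0])
revenue = pd.Series([900.0, 400.0, 180.0, 90.0, 60.0])
print(spearman_rho(days, revenue))
```

On the cleaned data, assuming the column name has been stripped of its leading space, the call would be `spearman_rho(df['days_since_last_order'], df['revenue'])`.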

### Average Purchase Amount by Number of Third-Party Purchases

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5iZ8vBkSnprAHKwR9mW*Hh2Hp5Pt*zpUvvJi2.GphXdROFvh1LjM0D2lLtlfxLz5SUnTskM.RSqH7d0amRjGYbQ!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

Conclusion: the number of third-party purchases shows no clear relationship with the purchase amount.
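With 11 possible counts (0 to 10) and at least half the users at 0 (the median is 0), per-count averages can look noisy. The column description gives a natural coarser split: 0 means the user buys only from self-operated stores. A minimal sketch of that two-group comparison, with hypothetical group labels and toy data:

```python
import numpy as np
import pandas as pd


def self_operated_vs_third_party(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, and median revenue for self-operated-only buyers vs. the rest."""
    buyer_type = np.where(
        df["3rd_party_stores"].eq(0), "self-operated only", "has third-party purchases"
    )
    return df.groupby(buyer_type)["revenue"].agg(["count", "mean", "median"])


sample = pd.DataFrame({
    "3rd_party_stores": [0, 0, 3, 10],
    "revenue": [100.0, 300.0, 50.0, 70.0],
})
print(self_operated_vs_third_party(sample))
```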

### Purchase Amount Distribution by Number of Third-Party Purchases

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5sG*NJWc*ygGOlcrbcgfWExAvlashJliOXdspbXxkMt0LOuRFXAIzz9JTdb4IxXVgQanpQIPet93DHy.Z5PtDZs!/b&bo=zAP7AcwD.wEDCSw!&rf=viewer_4)

## 8. Conclusions

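- Male users spend more and show stronger purchasing power.
- Users under 20 spend more and show stronger purchasing power.
- Users in lifecycle stage A (registered within 6 months) spend more and show stronger purchasing power.
- Users who take part in key activities spend more and show stronger purchasing power.
- Users with fewer days since their last order spend more and show stronger purchasing power.
- The number of third-party purchases shows no clear relationship with the purchase amount.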