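**Assignment:**
1.	Compute April 2017 sales, customer traffic, and average transaction value.
2.	Compute the year-over-year (YoY) comparison of April 2017 sales, customer traffic, and average transaction value.
3.	Compute the month-over-month (MoM) comparison of April 2017 sales, customer traffic, and average transaction value.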

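**Notes:**
- Sales = price × quantity = ["Price"] × ["Qty"]
- Customer traffic = number of orders (order count is used as a proxy for traffic)
- Average transaction value = sales / customer traffic
- Year-over-year (同比) compares a period with the same period one year earlier: the YoY baseline for April 2017 is April 2016.
- Month-over-month (环比) compares a period with the immediately preceding period: the MoM baseline for April 2017 is March 2017. (There is no March 2017 data here, so January 2017 is used as a stand-in.)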
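The report at the end of the notebook (cell In [32]) expresses both comparisons as growth rates, i.e. the ratio minus one, so a negative value means the metric fell. For any metric $X$ (sales, traffic, or average transaction value):

$$
\text{MoM growth} = \frac{X_{\text{Apr 2017}}}{X_{\text{Mar 2017}}} - 1,
\qquad
\text{YoY growth} = \frac{X_{\text{Apr 2017}}}{X_{\text{Apr 2016}}} - 1
$$

with January 2017 standing in for March 2017. For example, April 2017 sales of 3,041,195 against 3,867,720 in April 2016 give $3041195 / 3867720 - 1 \approx -0.2137$, a year-over-year decline of about 21.4%.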

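**Business context:**
- You are a data analyst at a supermarket chain and need to know how the company performed this month. Performance is summarized by three core metrics, sales, customer traffic, and average transaction value, and these three metrics directly drive the company's profitability.
- How do you judge how these metrics are developing? By comparison, since a figure is only good or bad relative to something else. The first benchmark is the adjacent period, i.e. last month; the second is the same period in the previous cycle, i.e. the same month last year. Together these two comparisons give a rounded picture of how each metric developed this month.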

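**Calculation logic:**
- We need each metric for a specific period, so the first step is to select the detail rows that fall within that period; the metrics are then computed on that subset.
- The detail data has two time fields, SDate and STime. SDate is not a date type (it is stored as an integer such as 20170103), so it must be parsed into a date. STime is already a datetime, but at second-level granularity. Because the report works at month level, one of the two fields has to be reduced to a month; here SDate is used.
- Once a month field exists, a month is selected by filtering on it: for example, keeping rows where the month equals 4 selects April, and the metrics are then computed on the filtered rows.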
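A minimal sketch of the month filter described above, on a few made-up rows. It parses the integer SDate with `pd.to_datetime(..., format="%Y%m%d")`, and the `month` column name is illustrative. Filtering on the month alone is only safe when the frame holds a single year, as each CSV file in this notebook does; with several years, filter on the year as well.

```python
import pandas as pd

# Made-up rows; SDate is an integer in YYYYMMDD form, as in the real files
df = pd.DataFrame({
    "SDate": [20170103, 20170115, 20170405, 20170420],
    "SheetID": ["A1", "A2", "B1", "B2"],
})

# Parse the integer into a date, then reduce it to a month number (1-12)
df["date"] = pd.to_datetime(df["SDate"].astype(str), format="%Y%m%d")
df["month"] = df["date"].dt.month

# Select April; the metrics are then computed on this subset only
april = df[df["month"] == 4]
print(april)
```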

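- Sales = price × quantity = ["Price"] × ["Qty"]
- Customer traffic = number of orders (order count is used as a proxy for traffic) = count of distinct SheetID values
- Average transaction value = sales / customer traffic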
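The three formulas applied to a toy set of line items (values invented for illustration). `nunique()` counts distinct SheetID values and gives the same count as the `drop_duplicates().count()` used in the notebook's cells.

```python
import pandas as pd

# Hypothetical line items: two orders (SheetID S1 and S2) across three rows
items = pd.DataFrame({
    "SheetID": ["S1", "S1", "S2"],
    "Price": [2.0, 10.0, 5.0],
    "Qty": [3.0, 0.5, 2.0],
})

sales = (items["Price"] * items["Qty"]).sum()  # 6.0 + 5.0 + 10.0 = 21.0
traffic = items["SheetID"].nunique()           # distinct orders: 2
avg_ticket = sales / traffic                   # 21.0 / 2 = 10.5
print(sales, traffic, avg_ticket)
```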

# Import libraries

In [1]:
```python
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from dateutil.parser import parse
plt.rcParams["font.sans-serif"] = ["SimHei"]  # use a font with CJK glyphs so Chinese labels render
plt.rcParams["axes.unicode_minus"] = False    # keep minus signs rendering correctly with that font
```

# Load data

In [2]:
```python
# Detail data for January and April 2017; STime is parsed as a datetime on load
data_2017 = pd.read_csv(r"2017年1月&2017年4月数据.csv", parse_dates=["STime"])
data_2017.head()
```

|   | GoodsID | CateID | Cost | Price | Qty | shopID | SDate | STime | SheetID | CashValue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30006206 | 915000003 | 20.673643 | 25.23 | 0.328 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 |
| 1 | 30163281 | 914010000 | 1.72421 | 2.0 | 2.0 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 |
| 2 | 30200518 | 922000000 | 16.344857 | 19.62 | 0.23 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 |
| 3 | 29989105 | 922000000 | 1.971475 | 2.8 | 2.044 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 |
| 4 | 30179558 | 915000100 | 34.21974 | 47.41 | 0.226 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 |


In [3]:
```python
data_2017.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 897232 entries, 0 to 897231
Data columns (total 10 columns):
GoodsID      897232 non-null int64
CateID       897232 non-null int64
Cost         896550 non-null float64
Price        897232 non-null float64
Qty          897232 non-null float64
shopID       897232 non-null object
SDate        897232 non-null int64
STime        897232 non-null datetime64[ns]
SheetID      897232 non-null object
CashValue    897232 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(3), object(2)
memory usage: 68.5+ MB
```


In [4]:
```python
# Detail data for January and April 2016
data_2016 = pd.read_csv(r"2016年1月&20164月数据.csv", parse_dates=["STime"])
```

In [5]:
```python
data_2016.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1116447 entries, 0 to 1116446
Data columns (total 10 columns):
GoodsID      1116447 non-null int64
CateID       1116447 non-null int64
Cost         1114288 non-null float64
Price        1116447 non-null float64
Qty          1116447 non-null float64
shopID       1116447 non-null object
SDate        1116447 non-null int64
STime        1116447 non-null datetime64[ns]
SheetID      1116447 non-null object
CashValue    1116447 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(3), object(2)
memory usage: 85.2+ MB
```


In [6]:
```python
data_2016.drop_duplicates().shape
```

```
(1067302, 10)
```

In [7]:
```python
data_2017.drop_duplicates().shape
```

```
(855620, 10)
```

In [8]:
```python
# Drop rows that are identical in every column
data_2016 = data_2016.drop_duplicates()
data_2017 = data_2017.drop_duplicates()
```
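Before dropping rows, it can help to see how many exact duplicates exist. A minimal sketch on made-up rows: `duplicated()` flags every row that repeats an earlier row in all columns, which is exactly what `drop_duplicates()` removes (keeping the first occurrence by default). This treats such rows as export artefacts rather than two genuine identical line items in one order, an assumption the data itself cannot confirm.

```python
import pandas as pd

# Hypothetical detail rows: the second row is an exact copy of the first
df = pd.DataFrame({
    "SheetID": ["S1", "S1", "S2"],
    "GoodsID": [101, 101, 102],
    "Qty": [1.0, 1.0, 2.0],
})

n_dupes = df.duplicated().sum()   # rows identical to an earlier row: 1
deduped = df.drop_duplicates()    # keeps the first occurrence of each row
print(n_dupes, deduped.shape)
```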

# Data preprocessing

In [9]:
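```python
# SDate is an integer such as 20170103: convert it to a string and parse it as YYYYMMDD
data_2016["date"] = pd.to_datetime(data_2016["SDate"].astype(str), format="%Y%m%d")
data_2017["date"] = pd.to_datetime(data_2017["SDate"].astype(str), format="%Y%m%d")
```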

In [10]:
```python
# Month number (1-12) taken from the parsed date; "月份" means "month"
data_2016["月份"] = [time.month for time in data_2016["date"]]
data_2017["月份"] = [time.month for time in data_2017["date"]]
```

In [11]:
```python
data_2017["月份"].unique()
```

```
array([1, 4], dtype=int64)
```

In [12]:
```python
data_2016["月份"].unique()
```

```
array([1, 4], dtype=int64)
```
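With a month field in place, an alternative to filtering each month separately is a single groupby over (year, month), which yields all three metrics for every month at once. A sketch on made-up rows; the helper columns `year`, `month` and `sales` are illustrative names, not part of the original data, and named aggregation in `agg` assumes pandas 0.25 or later.

```python
import pandas as pd

# Made-up line items spanning January and April 2017
df = pd.DataFrame({
    "SDate": [20170103, 20170103, 20170405, 20170406],
    "SheetID": ["A1", "A1", "B1", "B2"],
    "Price": [2.0, 4.0, 10.0, 6.0],
    "Qty": [1.0, 2.0, 1.0, 1.0],
})

date = pd.to_datetime(df["SDate"].astype(str), format="%Y%m%d")
df["year"] = date.dt.year
df["month"] = date.dt.month
df["sales"] = df["Price"] * df["Qty"]

# One row per (year, month): total sales and number of distinct orders
monthly = df.groupby(["year", "month"]).agg(
    sales=("sales", "sum"),
    traffic=("SheetID", "nunique"),
)
monthly["avg_ticket"] = monthly["sales"] / monthly["traffic"]
print(monthly)
```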

# Building the report

## Current month totals (April 2017)

In [13]:
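```python
# The 2017 file holds only January and April, so every row dated on or after 1 April is April data
April_data = data_2017[data_2017["date"] >= parse("2017-04-01")]
April_data.head()
```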

|   | GoodsID | CateID | Cost | Price | Qty | shopID | SDate | STime | SheetID | CashValue | date | 月份 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 493644 | 30089347 | 910030000 | 2.37 | 2.62 | 7.0 | CDLG | 20170405 | 2017-04-05 18:20:35 | 20170405CDLG000310112461 | 18.34 | 2017-04-05 | 4 |
| 493645 | 30020269 | 923000101 | 7.636254 | 10.8 | 1.064 | CDLG | 20170405 | 2017-04-05 18:20:57 | 20170405CDLG000310112462 | 11.496 | 2017-04-05 | 4 |
| 493646 | 30006266 | 914100000 | 9.02 | 9.12 | 1.0 | CDLG | 20170405 | 2017-04-05 18:21:43 | 20170405CDLG000310112463 | 35.2 | 2017-04-05 | 4 |
| 493647 | 30166320 | 936010000 | 5.20875 | 6.52 | 1.0 | CDLG | 20170405 | 2017-04-05 18:21:43 | 20170405CDLG000310112463 | 35.2 | 2017-04-05 | 4 |
| 493651 | 29989058 | 922000001 | 0.525031 | 0.58 | 2.414 | CDLG | 20170405 | 2017-04-05 18:22:22 | 20170405CDLG000310112464 | 47.322 | 2017-04-05 | 4 |


In [14]:
```python
# April total sales
sale_sum = (April_data["Price"] * April_data["Qty"]).sum()
sale_sum
```

```
3041195.2679000003
```

In [15]:
```python
# April traffic, using the number of distinct orders (SheetID) as a proxy
traffic_sum = April_data["SheetID"].drop_duplicates().count()
traffic_sum
```

```
91831
```

In [16]:
```python
# April average transaction value
Customer_price = sale_sum / traffic_sum
Customer_price
```

```
33.11730535331207
```

## Previous month (MoM baseline)

In [17]:
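```python
# No March 2017 data exists, so January 2017 stands in as the previous month;
# the variable is named March_data to reflect its role in the report
March_data = data_2017[(data_2017["date"] >= parse("2017-01-01")) & (data_2017["date"] <= parse("2017-01-31"))]
March_data.head()
```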

|   | GoodsID | CateID | Cost | Price | Qty | shopID | SDate | STime | SheetID | CashValue | date | 月份 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30006206 | 915000003 | 20.673643 | 25.23 | 0.328 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 | 2017-01-03 | 1 |
| 1 | 30163281 | 914010000 | 1.72421 | 2.0 | 2.0 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 | 2017-01-03 | 1 |
| 2 | 30200518 | 922000000 | 16.344857 | 19.62 | 0.23 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 | 2017-01-03 | 1 |
| 3 | 29989105 | 922000000 | 1.971475 | 2.8 | 2.044 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 | 2017-01-03 | 1 |
| 4 | 30179558 | 915000100 | 34.21974 | 47.41 | 0.226 | CDLG | 20170103 | 2017-01-03 09:56:55 | 20170103CDLG000210052759 | 56.453 | 2017-01-03 | 1 |


In [18]:
```python
# January total sales (stand-in for March)
sale_sum_1 = (March_data["Price"] * March_data["Qty"]).sum()
sale_sum_1
```

```
4674895.95622
```

In [19]:
```python
# January traffic, using the number of distinct orders (SheetID) as a proxy
traffic_sum_1 = March_data["SheetID"].drop_duplicates().count()
traffic_sum_1
```

```
100648
```

In [20]:
```python
# January average transaction value
Customer_price_1 = sale_sum_1 / traffic_sum_1
Customer_price_1
```

```
46.44797667335665
```

## Same month last year (YoY baseline)

In [21]:
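```python
# The 2016 file likewise holds only January and April, so rows from 1 April 2016 onward are April 2016
last_April = data_2016[data_2016["date"] >= parse("2016-04-01")]
last_April.head()
```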

|   | GoodsID | CateID | Cost | Price | Qty | shopID | SDate | STime | SheetID | CashValue | date | 月份 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 470587 | 29989076 | 922000201 | 5.071292 | 5.61 | 0.551 | CDLG | 20160401 | 2016-04-01 07:27:53 | 20160401CDLGP00110000001 | 39.249 | 2016-04-01 | 4 |
| 470588 | 29989099 | 922000201 | 5.020101 | 6.21 | 0.334 | CDLG | 20160401 | 2016-04-01 07:27:53 | 20160401CDLGP00110000001 | 39.249 | 2016-04-01 | 4 |
| 470589 | 29989058 | 922000201 | 2.706399 | 3.6 | 1.256 | CDLG | 20160401 | 2016-04-01 07:27:53 | 20160401CDLGP00110000001 | 39.249 | 2016-04-01 | 4 |
| 470590 | 30023041 | 923000006 | 3.322512 | 3.6 | 1.306 | CDLG | 20160401 | 2016-04-01 07:27:53 | 20160401CDLGP00110000001 | 39.249 | 2016-04-01 | 4 |
| 470591 | 30151329 | 924000000 | 0.960027 | 1.0 | 14.6 | CDLG | 20160401 | 2016-04-01 07:27:53 | 20160401CDLGP00110000001 | 39.249 | 2016-04-01 | 4 |


In [22]:
```python
# April 2016 total sales
sale_sum_last = (last_April["Price"] * last_April["Qty"]).sum()
sale_sum_last
```

```
3867720.16205
```

In [23]:
```python
# April 2016 traffic, using the number of distinct orders (SheetID) as a proxy
traffic_sum_last = last_April["SheetID"].drop_duplicates().count()
traffic_sum_last
```

```
109648
```

In [24]:
```python
# April 2016 average transaction value
Customer_price_last = sale_sum_last / traffic_sum_last
Customer_price_last
```

```
35.27396908333941
```

## Wrapping the calculation in a function

In [25]:
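```python
def get_month_data(data):
    """Return (sales, traffic, average transaction value) for an already-filtered DataFrame."""
    sale = (data["Price"] * data["Qty"]).sum()            # sales = price × quantity
    traffic = data["SheetID"].drop_duplicates().count()   # traffic = number of distinct orders
    price = sale / traffic                                # average transaction value
    return (sale, traffic, price)
```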
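The date filters in cells In [13], In [17] and In [21] are still written by hand. One possible extension, sketched here with a hypothetical helper `get_period_metrics` (not part of the original notebook), folds the month selection into the function using the parsed `date` column, and returns NaN for the average explicitly when a month has no orders instead of relying on a 0/0 division.

```python
import math
import pandas as pd

def get_period_metrics(data, year, month):
    """Hypothetical helper: select one calendar month via the parsed `date` column,
    then return (sales, traffic, average transaction value)."""
    mask = (data["date"].dt.year == year) & (data["date"].dt.month == month)
    period = data[mask]
    sale = (period["Price"] * period["Qty"]).sum()
    traffic = period["SheetID"].nunique()
    # An empty month yields NaN explicitly rather than a 0/0 division
    price = sale / traffic if traffic else math.nan
    return sale, traffic, price

# Made-up rows with an already-parsed date column, as in the notebook
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-03", "2017-04-05", "2017-04-06"]),
    "SheetID": ["A1", "B1", "B2"],
    "Price": [2.0, 10.0, 6.0],
    "Qty": [1.0, 1.0, 1.0],
})

april = get_period_metrics(df, 2017, 4)  # sales 16.0, 2 orders, average 8.0
march = get_period_metrics(df, 2017, 3)  # no rows: sales 0.0, 0 orders, average NaN
```

Against the notebook's data this would be called as, for example, `get_period_metrics(data_2016, 2016, 4)`.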

In [26]:
```python
get_month_data(last_April)
```

```
(3867720.16205, 109648, 35.27396908333941)
```

In [27]:
```python
get_month_data(March_data)
```

```
(4674895.95622, 100648, 46.44797667335665)
```

## Summary report

In [31]:
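```python
# Columns: 本月累计 = current month, 环比上月 = previous month (MoM baseline), 去年同期 = same month last year (YoY baseline)
# Rows: 销售额 = sales, 客流量 = traffic, 客单价 = average transaction value
# Note: values are rounded here, before any growth rate is derived from them
data = {"本月累计": [round(sale_sum), round(traffic_sum), round(Customer_price)],
        "环比上月": [round(sale_sum_1), round(traffic_sum_1), round(Customer_price_1)],
        "去年同期": [round(sale_sum_last), round(traffic_sum_last), round(Customer_price_last)]}
columns = ["本月累计", "环比上月", "去年同期"]
index = ["销售额", "客流量", "客单价"]
month_report = pd.DataFrame(data, index=index, columns=columns)
month_report
```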

|   | 本月累计 | 环比上月 | 去年同期 |
|---|---|---|---|
| 销售额 | 3041195.0 | 4674896.0 | 3867720.0 |
| 客流量 | 91831.0 | 100648.0 | 109648.0 |
| 客单价 | 33.0 | 46.0 | 35.0 |


In [32]:
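```python
# Growth rates as current / baseline - 1 (环比 = month-over-month, 同比 = year-over-year)
month_report["环比"] = month_report["本月累计"] / month_report["环比上月"] - 1
month_report["同比"] = month_report["本月累计"] / month_report["去年同期"] - 1
month_report
```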

|   | 本月累计 | 环比上月 | 去年同期 | 环比 | 同比 |
|---|---|---|---|---|---|
| 销售额 | 3041195.0 | 4674896.0 | 3867720.0 | -0.349463 | -0.213698 |
| 客流量 | 91831.0 | 100648.0 | 109648.0 | -0.087602 | -0.162493 |
| 客单价 | 33.0 | 46.0 | 35.0 | -0.282609 | -0.057143 |


In [33]:
```python
# Reorder columns so each baseline sits next to its growth rate
month_report[["本月累计", "环比上月", "环比", "去年同期", "同比"]]
```

|   | 本月累计 | 环比上月 | 环比 | 去年同期 | 同比 |
|---|---|---|---|---|---|
| 销售额 | 3041195.0 | 4674896.0 | -0.349463 | 3867720.0 | -0.213698 |
| 客流量 | 91831.0 | 100648.0 | -0.087602 | 109648.0 | -0.162493 |
| 客单价 | 33.0 | 46.0 | -0.282609 | 35.0 | -0.057143 |
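One caveat on cells In [31] and In [32]: the report is built from values already passed through `round()`, so the growth rates are computed on rounded figures. This barely affects sales and traffic, but for average transaction value it shifts the result: the MoM figure becomes 33/46 - 1 ≈ -0.2826 instead of 33.117/46.448 - 1 ≈ -0.2870, and the YoY figure 33/35 - 1 ≈ -0.0571 instead of about -0.0611. A minimal sketch of the alternative, dividing first and rounding only for display, using the unrounded values printed in cells In [14] to In [24]; the English row and column names are illustrative.

```python
import pandas as pd

# Unrounded values as printed earlier in the notebook
report = pd.DataFrame(
    {
        "current": [3041195.2679000003, 91831, 33.11730535331207],
        "prev_month": [4674895.95622, 100648, 46.44797667335665],  # January 2017 stand-in
        "last_year": [3867720.16205, 109648, 35.27396908333941],
    },
    index=["sales", "traffic", "avg_ticket"],
)

# Compute growth rates on full-precision values first...
report["mom"] = report["current"] / report["prev_month"] - 1
report["yoy"] = report["current"] / report["last_year"] - 1

# ...and round only when displaying
print(report.round({"current": 2, "prev_month": 2, "last_year": 2, "mom": 4, "yoy": 4}))
```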
