Doc title: **Amazon Advertising Purchased Product Report**

Article notes: This notebook analyzes the Amazon advertising Purchased Product Report. The data comes from 'Reports/Advertising Reports/Sponsored Products/Purchased Product Report' in Amazon Seller Central. Columns of the file that contained product ASINs have been removed or replaced with SKUs.

Last modified date: 2019-09-12 17:29:01 

In [1]:
# Import the pandas data analysis library
import pandas as pd

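# Sample data: US marketplace, monthly data (August 2019).
# Columns read: 广告活动名称 = campaign name, 广告组名称 = ad group name,
# 广告SKU = advertised SKU, 投放 = targeting, 匹配类型 = match type,
# 已购买的SKU = purchased SKU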
workdf = pd.read_excel('data/amz_ads_pp_us_201908.xlsx', usecols=['广告活动名称', '广告组名称', '广告SKU', '投放', '匹配类型', '已购买的SKU'])

In [2]:
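# Derive the style code (first 6 characters of each SKU) and the hit flag.
# 广告款式 = advertised style, 购买款式 = purchased style,
# 命中 = hit (purchased style equals advertised style)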
workdf['广告款式'] = workdf['广告SKU'].str.slice(0,6)
workdf['购买款式'] = workdf['已购买的SKU'].str.slice(0,6)
workdf['命中'] = (workdf['广告款式'] == workdf['购买款式'])
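
The derivation above can be illustrated on a few rows. This is a minimal sketch that assumes the first six characters of a SKU identify the style; the SKU suffixes below are hypothetical, since the source does not show the full SKU format.

```python
import pandas as pd

# Hypothetical SKUs: only the 6-character style prefix (e.g. 'HM0701')
# is taken from the report; the suffixes are made up for illustration.
toy = pd.DataFrame({
    '广告SKU': ['HM0701-BK-M', 'HM0701-BK-M', 'HM0722-RD-S'],
    '已购买的SKU': ['HM0701-WH-L', 'HM0722-RD-S', 'HM0722-RD-S'],
})

# Same steps as the notebook cell: slice the style prefix, then compare.
toy['广告款式'] = toy['广告SKU'].str.slice(0, 6)
toy['购买款式'] = toy['已购买的SKU'].str.slice(0, 6)
toy['命中'] = toy['广告款式'] == toy['购买款式']

print(toy[['广告款式', '购买款式', '命中']])
```

A different color or size of the same style still counts as a hit, because only the style prefix is compared.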

# Analysis of Products Purchased via Ads

## Ad Product Hit Rate Analysis

In [3]:
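hit_df = workdf.groupby(['命中']).count()
# True = purchased style matches the advertised style (hit); False = miss
iyes = hit_df.at[True, '购买款式']
ino = hit_df.at[False, '购买款式']
icount = iyes + ino

print('Ad product hit rate analysis\n')
print('- Purchases of the same style as the advertised product (hit): {} items.'.format(iyes))
print('- Purchases of a different style from the advertised product (miss): {} items.'.format(ino))
print('\nConclusion: total ad-attributed sales: {0} items, hit rate: {1:.2f}%.'.format(icount, iyes / icount * 100))

Ad product hit rate analysis

- Purchases of the same style as the advertised product (hit): 507 items.
- Purchases of a different style from the advertised product (miss): 478 items.

Conclusion: total ad-attributed sales: 985 items, hit rate: 51.47%.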
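
As a cross-check, the overall hit rate can also be read directly from the boolean `命中` column, because the mean of a boolean column is the share of `True` values. The sketch below uses made-up toy data, not the report itself. Unlike `count()`, it includes every row, even rows with a missing purchased SKU.

```python
import pandas as pd

# Toy data (hypothetical): 3 hits out of 5 ad-attributed purchases.
toy = pd.DataFrame({'命中': [True, False, True, True, False]})

hits = int(toy['命中'].sum())       # True counts as 1
total = len(toy)                    # every row is one attributed purchase
misses = total - hits
hit_rate = toy['命中'].mean() * 100  # share of True values, in percent

print('hits={}, misses={}, hit rate={:.2f}%'.format(hits, misses, hit_rate))
```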


## Ad Product Hit Rate Analysis (by Advertised Style)

In [4]:
# Total ad-attributed sales per advertised style (销售总数)
hit_count_df = workdf.groupby('广告款式')['广告SKU'].count().to_frame('销售总数')

# Number of hits per advertised style (命中数)
hit_true_df = workdf[workdf['命中']].groupby('广告款式')['广告SKU'].count().to_frame('命中数')

# Left-merge on the index of hit_count_df so that styles without any hit get 0.
hit_df = hit_count_df.merge(hit_true_df, how='left', left_index=True, right_index=True).fillna(0)
# Hit rate in percent (命中率)
hit_df['命中率'] = round(hit_df['命中数'] / hit_df['销售总数'] * 100, 2)

hit_df.sort_values(by=['命中率'], ascending = False).head(10)

广告款式,销售总数,命中数,命中率
HM0505,9,9.0,100.0
HM0806,1,1.0,100.0
HM0903,22,18.0,81.82
HM0701,195,153.0,78.46
HM0709,3,2.0,66.67
HM0722,162,105.0,64.81
HM0725,7,4.0,57.14
HM0801,18,10.0,55.56
HM0715,2,1.0,50.0
HM0803,19,9.0,47.37


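This table answers the following questions:

- What is the hit rate (命中率) of each advertised style? A purchase is a "hit" when the purchased product is the same style as the advertised product. 销售总数 is the total number of ad-attributed sales, 命中数 is the number of hits, and rows are sorted by hit rate in descending order.

- Statistically, the higher the hit rate of an advertised product, the more likely it is that a customer who clicks the ad ends up buying that same style.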
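
The per-style hit rate can also be computed in a single aggregation over the boolean `命中` column. This is a sketch on made-up toy rows, not the notebook's actual method: `size` gives the total sales per style and `sum` counts the hits.

```python
import pandas as pd

# Toy data (hypothetical): one row per ad-attributed purchase.
toy = pd.DataFrame({
    '广告款式': ['HM0701', 'HM0701', 'HM0701', 'HM0722', 'HM0722', 'HM0803'],
    '命中': [True, True, False, True, False, False],
})

# size = total sales per style, sum = number of hits (True counts as 1).
per_style = (
    toy.groupby('广告款式')['命中']
       .agg(['size', 'sum'])
       .rename(columns={'size': '销售总数', 'sum': '命中数'})
)
per_style['命中率'] = (per_style['命中数'] / per_style['销售总数'] * 100).round(2)

print(per_style.sort_values('命中率', ascending=False))
```

Styles without any hit simply get a hit count of 0 here, so no merge or `fillna` step is needed.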

## Association Between Advertised Products and Purchases of Other Styles

In [5]:
# Total ad-attributed sales per advertised style (广告款式销售总数)
hit_count_df = workdf.groupby('广告款式')['广告SKU'].count().to_frame('广告款式销售总数')

# Sales per (advertised style, purchased style) pair (销售数)
hit_df = workdf.groupby(['广告款式', '购买款式'])['广告SKU'].count().to_frame('销售数')
hit_df.reset_index(level='购买款式', inplace=True)

# Merge on the index of hit_count_df (广告款式), then compute each pair's share (占比) in percent.
hit_df = hit_df.merge(hit_count_df, how='left', left_on='广告款式', right_on='广告款式').fillna(0)
hit_df['占比'] = round(hit_df['销售数'] / hit_df['广告款式销售总数'] * 100, 2)
hit_df.reset_index(inplace=True)

# Drop same-style pairs ('广告款式' == '购买款式') and pairs with '占比' == 100.
same_style = hit_df['广告款式'] == hit_df['购买款式']
full_share = hit_df['占比'] == 100
hit_df = hit_df[~(same_style | full_share)].drop(columns=['广告款式销售总数'])

hit_df.sort_values(by=['占比'], ascending=False).head(10)

,广告款式,购买款式,销售数,占比
131,HM0737,HM0722,17,68.0
88,HM0727,HM0741,1,50.0
47,HM0715,HM0740,1,50.0
45,HM0714,HM0737,1,50.0
44,HM0714,HM0701,1,50.0
87,HM0727,HM0719,1,50.0
121,HM0732,HM0731,1,50.0
122,HM0732,HM0739,1,50.0
204,HM0801,HM0803,8,44.44
207,HM0803,HM0801,7,36.84


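This table answers the following question:

- When customers did not buy the advertised style, which other styles did they buy instead? Pairs are sorted by their share (占比) of the advertised style's sales.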
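
The same cross-style shares can be sketched with `pd.crosstab`, which normalizes each advertised style's row to 100%. The toy rows below are made up for illustration. As in the notebook, same-style pairs and pairs with a 100% share are filtered out.

```python
import pandas as pd

# Toy data (hypothetical): one row per ad-attributed purchase.
toy = pd.DataFrame({
    '广告款式': ['HM0737', 'HM0737', 'HM0737', 'HM0737', 'HM0801', 'HM0801'],
    '购买款式': ['HM0722', 'HM0722', 'HM0737', 'HM0722', 'HM0803', 'HM0801'],
})

# Share of each purchased style within each advertised style, in percent.
share = pd.crosstab(toy['广告款式'], toy['购买款式'], normalize='index') * 100

# Long format: one row per (advertised style, purchased style) pair.
pairs = share.reset_index().melt(
    id_vars='广告款式', var_name='购买款式', value_name='占比'
)

# Keep cross-style pairs that actually occurred and are below 100%.
cross = pairs[
    (pairs['广告款式'] != pairs['购买款式'])
    & (pairs['占比'] > 0)
    & (pairs['占比'] < 100)
].sort_values('占比', ascending=False)

print(cross.round(2))
```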

**[Back to catalog](amz_ads_catalog.ipynb)**