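## Scenario description
We analyze the list of telecom products each user has used (each user is treated as one transaction) to find which products are correlated with one another, and which packages users in a given region are more likely to subscribe to.

Note: state in the operation manual that the codes are examples only and do not match real commercial codes; likewise, the package attributes do not match real commercial packages.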
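
For readers new to association rules, the two quantities this notebook relies on are the support count of an itemset (how many transactions contain it) and the confidence of a rule X => Y (the share of transactions containing X that also contain Y). The sketch below computes both by brute force on a made-up set of transactions; the item names and the helper functions `support_count` and `confidence` are illustrative assumptions and are not part of the notebook or of any library.

```python
# Toy transactions: each inner list is the set of packages one user has used.
# The item names are made up for illustration only.
transactions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B", "C"],
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items.issubset(t))

def confidence(antecedent, consequent, transactions):
    """confidence(X => Y) = count(X and Y together) / count(X)."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    base = support_count(antecedent, transactions)
    return both / base if base else 0.0

print(support_count(["A", "B"], transactions))      # 3 transactions contain both A and B
print(confidence(["A", "B"], ["C"], transactions))  # 2 of those 3 also contain C
```

Brute-force counting like this is only practical for tiny inputs; FP-growth, used in the rest of the notebook, finds the frequent itemsets without enumerating every combination.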

### Reading the data

In [1]:
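# pandas is not used here because the rows have different numbers of columns
# Import the libraries used in this notebook
import csv
import itertools
import pyfpgrowth as fp  # FP-growth algorithm library
import matplotlib.pyplot as plt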

In [2]:
# Load the data: filter rows by region and build that region's list of package baskets
def initData(destId):
    destDatalist = []  # package lists of the users in the target region
    # pandas targets structured data with a fixed set of columns; the rows here vary
    # in length, so plain Python keeps the parsing general
    with open("Correlation.CSV", "r", encoding='utf8') as csv_file:
        csv_file1 = csv.reader(csv_file)  # row reader
        # Read the file row by row and keep only the rows we need
        for i, rows in enumerate(csv_file1):
            if i == 0 or not rows:  # skip the header row and blank lines
                continue
            rowValue = rows[0].split(" ")  # each row arrives as one string; split it on spaces
            if int(rowValue[1]) == destId:  # keep only the target region
                if len(rowValue) > 2:  # skip users with no package
                    destDatalist.append(rowValue[2:])  # drop the user and region fields
    return destDatalist
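
The parsing logic can be tried without the real data file. The sketch below feeds an in-memory sample through the same steps; the sample rows, the header text, and the helper name `parse_region` are assumptions made for illustration, based on the layout `initData` appears to expect (a header line, then `user region code code ...` separated by spaces).

```python
import csv
import io

# Hypothetical sample in the layout initData appears to expect.
sample = io.StringIO(
    "user region packages\n"
    "u001 571 90109906\n"
    "u002 571 90109906 90215356 90157638\n"
    "u003 574 90138402 90046638\n"
    "u004 571\n"
)

def parse_region(lines, dest_id):
    """Collect the package lists of all users in region `dest_id`."""
    reader = csv.reader(lines)
    next(reader, None)                 # skip the header line
    baskets = []
    for row in reader:
        if not row:                    # ignore blank lines
            continue
        fields = row[0].split()
        # Users without any package (only two fields) are skipped.
        if len(fields) > 2 and int(fields[1]) == dest_id:
            baskets.append(fields[2:])
    return baskets

baskets = parse_region(sample, 571)
print(baskets)
```

User `u004` has no package and user `u003` belongs to another region, so only the first two rows end up in the result.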

### Printing the association rules

In [91]:
def fp_show_mining_results(rules, len_RuleKinds=(2, 3)):
    # rules maps an antecedent tuple to (consequent tuple, confidence);
    # only rules whose total item count is in len_RuleKinds are printed
    print("Rules:\n------")
    for key, val in rules.items():
        head = key
        tail = val[0]
        confidence = val[1]
        len_Rule = len(head) + len(tail)
        if len_Rule in len_RuleKinds:
            if len(tail) == 0:
                continue
            print('({}) ==> ({})  confidence = {}'.format(', '.join(head), ', '.join(tail), round(confidence, 3)))
    print()
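
The function above prints rules in dictionary order. Ranking them by confidence can make the strongest rules easier to spot. The following is a minimal sketch, assuming the same dictionary shape (antecedent tuple mapped to a `(consequent tuple, confidence)` pair); `sort_rules` is a hypothetical helper, not part of pyfpgrowth, and the three sample entries are taken from the Hangzhou output shown later in this notebook.

```python
# A few entries taken from the Hangzhou output shown later in this notebook.
rules = {
    ('90046637', '90129503'): (('90157638',), 0.38341968911917096),
    ('90046637', '90163763'): (('90157638',), 0.36613475177304966),
    ('90046638',): (('90157638',), 0.3680452522255193),
}

def sort_rules(rules):
    """Return (antecedent, consequent, confidence) triples, strongest first."""
    triples = [(head, tail, conf) for head, (tail, conf) in rules.items()]
    return sorted(triples, key=lambda t: t[2], reverse=True)

for head, tail, conf in sort_rules(rules):
    print('({}) ==> ({})  confidence = {:.3f}'.format(', '.join(head), ', '.join(tail), conf))
```

Returning a list of triples rather than a re-ordered dictionary keeps the ranking explicit and lets the caller slice off the top few rules.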

### Mapping package codes to their Chinese names

In [70]:
# Replace each package code in a rule with its display name
def getLabeledRules(rules):
    card_dict = {'90063345': '腾讯大王卡', '90046637': '腾讯视频小王卡',
                 '90046638': '腾讯呢音频小王卡', '90065147': '滴滴大王卡',
                 '90065148': '滴滴小王卡', '90151621': '滴滴大橙卡',
                 '90151624': '滴滴小橙卡', '90109916': '蚂蚁大宝卡',
                 '90109906': '蚂蚁小宝卡', '90127327': '百度大神卡',
                 '90157593': '百度女神卡', '90138402': '招行大招卡',
                 '90157638': '哔哩哔哩22卡', '90151622': '微博大V卡',
                 '90163763': '饿了么大饿卡', '90199605': '懂我卡',
                 '90129503': '京东小强卡', '90215356': '阿里YunOS-9元卡'}
    labeled_rules = {}
    for key, value in rules.items():
        head = key
        tail = value[0]
        confidence = value[1]

        # Fall back to the raw code when a package is missing from card_dict
        labeled_head = tuple(card_dict.get(x, x) for x in head)
        labeled_tail = tuple(card_dict.get(x, x) for x in tail)
        labeled_rules[labeled_head] = (labeled_tail, confidence)
    return labeled_rules
labeled_rules = getLabeledRules(rules)

### Counting rules by number of packages

In [83]:
def getlenRuleKinds(labeled_rules):
    # Count rules by size, where size = antecedent items + consequent items
    len_RuleKinds = {}
    for key, value in labeled_rules.items():
        len_rules = len(key) + len(value[0])
        len_RuleKinds[len_rules] = len_RuleKinds.get(len_rules, 0) + 1
    return len_RuleKinds
getlenRuleKinds(labeled_rules)

{2: 1, 3: 18}
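
The same tally can be written with `collections.Counter` from the standard library. This is only an alternative sketch; the three labelled rules and the name `count_rule_sizes` are made up for illustration and follow the shape `getLabeledRules` returns.

```python
from collections import Counter

# Three hypothetical labelled rules in the shape getLabeledRules returns.
labeled_rules = {
    ('A',): (('C',), 0.37),
    ('A', 'B'): (('C',), 0.36),
    ('B', 'D'): (('C',), 0.35),
}

def count_rule_sizes(labeled_rules):
    """Map rule size (antecedent + consequent items) to the number of such rules."""
    return Counter(len(head) + len(tail) for head, (tail, _conf) in labeled_rules.items())

sizes = count_rule_sizes(labeled_rules)
print(dict(sizes))
```

Here one rule has two items in total and two rules have three.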

### Example: Hangzhou

In [3]:
hangzhou_data = initData(int('0571'))
hangzhou_data[0:5]

[['90109906'],
 ['90109906', '90215356', '90157638'],
 ['90138402', '90046638'],
 ['90063345', '90163763', '90157638'],
 ['90151624', '90063345', '90065147', '90199605']]

In [7]:
result1 = fp.find_frequent_patterns(hangzhou_data, 100)  # frequent itemsets, support threshold 100
rules = fp.generate_association_rules(result1, 0.35)  # association rules, confidence threshold 0.35
rules

{('90046637', '90129503'): (('90157638',), 0.38341968911917096),
 ('90046637', '90163763'): (('90157638',), 0.36613475177304966),
 ('90046637', '90215356'): (('90157638',), 0.35606731620903453),
 ('90046638',): (('90157638',), 0.3680452522255193),
 ('90046638', '90063345'): (('90157638',), 0.3632286995515695),
 ('90046638', '90151621'): (('90157638',), 0.3687881429816914),
 ('90046638', '90151622'): (('90157638',), 0.3730703259005146),
 ('90046638', '90157593'): (('90157638',), 0.3704663212435233),
 ('90063345', '90065148'): (('90157638',), 0.3625461254612546),
 ('90063345', '90199605'): (('90157638',), 0.3562005277044855),
 ('90065147', '90151621'): (('90157638',), 0.35526315789473684),
 ('90065148', '90109916'): (('90157638',), 0.36528758829465185),
 ('90065148', '90129503'): (('90157638',), 0.3543859649122807),
 ('90109906', '90109916'): (('90157638',), 0.36434782608695654),
 ('90109906', '90151622'): (('90157638',), 0.36538461538461536),
 ('90109916', '90127327'): (('90157638',), 0

In [92]:
fp_show_mining_results(dict(rules))

Rules:
------
(90046637, 90163763) ==> (90157638)  confidence = 0.366
(90065147, 90151621) ==> (90157638)  confidence = 0.355
(90151622, 90151624) ==> (90157638)  confidence = 0.367
(90063345, 90065148) ==> (90157638)  confidence = 0.363
(90046638, 90063345) ==> (90157638)  confidence = 0.363
(90063345, 90199605) ==> (90157638)  confidence = 0.356
(90065148, 90109916) ==> (90157638)  confidence = 0.365
(90065148, 90129503) ==> (90157638)  confidence = 0.354
(90109916, 90127327) ==> (90157638)  confidence = 0.356
(90109906, 90109916) ==> (90157638)  confidence = 0.364
(90127327, 90157593) ==> (90157638)  confidence = 0.353
(90199605, 90215356) ==> (90157638)  confidence = 0.356
(90046637, 90129503) ==> (90157638)  confidence = 0.383
(90046638, 90151621) ==> (90157638)  confidence = 0.369
(90046638, 90157593) ==> (90157638)  confidence = 0.37
(90109906, 90151622) ==> (90157638)  confidence = 0.365
(90046637, 90215356) ==> (90157638)  confidence = 0.356
(90046638, 90151622) ==> (90157638)

In [71]:
fp_show_mining_results(dict(labeled_rules))  # default sizes (2, 3) cover every rule listed here

Rules:
------
(腾讯视频小王卡, 饿了么大饿卡) ==> (哔哩哔哩22卡)  confidence = 0.366
(滴滴大王卡, 滴滴大橙卡) ==> (哔哩哔哩22卡)  confidence = 0.355
(微博大V卡, 滴滴小橙卡) ==> (哔哩哔哩22卡)  confidence = 0.367
(腾讯大王卡, 滴滴小王卡) ==> (哔哩哔哩22卡)  confidence = 0.363
(腾讯呢音频小王卡, 腾讯大王卡) ==> (哔哩哔哩22卡)  confidence = 0.363
(腾讯大王卡, 懂我卡) ==> (哔哩哔哩22卡)  confidence = 0.356
(滴滴小王卡, 蚂蚁大宝卡) ==> (哔哩哔哩22卡)  confidence = 0.365
(滴滴小王卡, 京东小强卡) ==> (哔哩哔哩22卡)  confidence = 0.354
(蚂蚁大宝卡, 百度大神卡) ==> (哔哩哔哩22卡)  confidence = 0.356
(蚂蚁小宝卡, 蚂蚁大宝卡) ==> (哔哩哔哩22卡)  confidence = 0.364
(百度大神卡, 百度女神卡) ==> (哔哩哔哩22卡)  confidence = 0.353
(懂我卡, 阿里YunOS-9元卡) ==> (哔哩哔哩22卡)  confidence = 0.356
(腾讯视频小王卡, 京东小强卡) ==> (哔哩哔哩22卡)  confidence = 0.383
(腾讯呢音频小王卡, 滴滴大橙卡) ==> (哔哩哔哩22卡)  confidence = 0.369
(腾讯呢音频小王卡, 百度女神卡) ==> (哔哩哔哩22卡)  confidence = 0.37
(蚂蚁小宝卡, 微博大V卡) ==> (哔哩哔哩22卡)  confidence = 0.365
(腾讯视频小王卡, 阿里YunOS-9元卡) ==> (哔哩哔哩22卡)  confidence = 0.356
(腾讯呢音频小王卡, 微博大V卡) ==> (哔哩哔哩22卡)  confidence = 0.373
(腾讯呢音频小王卡) ==> (哔哩哔哩22卡)  confidence = 0.368
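
The per-region loop that follows calls `initData` once for every region, so the data file is read eleven times. One possible alternative is to read the file once and group the package lists by region. The sketch below assumes the same row layout as `initData`; the in-memory sample and the helper name `group_by_region` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
import io

# Hypothetical sample: a header line, then "user region code code ...".
sample = io.StringIO(
    "user region packages\n"
    "u001 571 90109906\n"
    "u002 574 90138402 90046638\n"
    "u003 571 90063345 90163763\n"
    "u004 580\n"
)

def group_by_region(lines):
    """Read the input once and return {region_id: [package list, ...]}."""
    baskets = defaultdict(list)
    next(lines, None)                      # skip the header line
    for line in lines:
        fields = line.split()
        if len(fields) > 2:                # keep only users with at least one package
            baskets[int(fields[1])].append(fields[2:])
    return dict(baskets)

by_region = group_by_region(sample)
print(by_region[571])
print(sorted(by_region))
```

With the data grouped this way, each region's list can be passed straight to the mining step without reopening the file.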



In [97]:
# if __name__ == '__main__':
# Map each region's area code to its name
destIddict = {571:'杭州', 574:'宁波', 577:'温州', 573:'嘉兴',
              572:'湖州', 575:'绍兴', 579:'金华', 570:'衢州',
              580:'舟山', 576:'台州', 578:'丽水'}
# Loop over the regions and mine each one in turn
for destid, destname in destIddict.items():
    print('%s地区的电话套餐列表' % destname)  # header: package list for this region
    dest_data = initData(destid)  # records for this region
    print(len(dest_data))  # number of records in this region

    # Mine the rules: 100 is the support threshold, 0.35 the confidence threshold
    result1 = fp.find_frequent_patterns(dest_data, 100)  # frequent itemsets
    rules = fp.generate_association_rules(result1, 0.35)  # association rules

    labeled_rules = getLabeledRules(rules)
    rule_kinds = getlenRuleKinds(labeled_rules)

    print("包含%d类套餐方案" % len(rule_kinds))  # number of distinct rule sizes

    for j, (kindkey, value) in enumerate(rule_kinds.items(), start=1):
        print("************第%d类套餐*****************" % j)  # banner for the j-th rule size
        print("包含%d个套餐组合的频繁套餐规则共 %d：" % (kindkey, value))  # rule size and rule count
        fp_show_mining_results(labeled_rules, [kindkey])
        print("***************************************")
    print('\n')

杭州地区的电话套餐列表
83251
包含2类套餐方案
************第1类套餐*****************
包含3个套餐组合的频繁套餐规则共 18：
Rules:
------
(腾讯视频小王卡, 饿了么大饿卡) ==> (哔哩哔哩22卡)  confidence = 0.366
(滴滴大王卡, 滴滴大橙卡) ==> (哔哩哔哩22卡)  confidence = 0.355
(微博大V卡, 滴滴小橙卡) ==> (哔哩哔哩22卡)  confidence = 0.367
(腾讯大王卡, 滴滴小王卡) ==> (哔哩哔哩22卡)  confidence = 0.363
(腾讯呢音频小王卡, 腾讯大王卡) ==> (哔哩哔哩22卡)  confidence = 0.363
(腾讯大王卡, 懂我卡) ==> (哔哩哔哩22卡)  confidence = 0.356
(滴滴小王卡, 蚂蚁大宝卡) ==> (哔哩哔哩22卡)  confidence = 0.365
(滴滴小王卡, 京东小强卡) ==> (哔哩哔哩22卡)  confidence = 0.354
(蚂蚁大宝卡, 百度大神卡) ==> (哔哩哔哩22卡)  confidence = 0.356
(蚂蚁小宝卡, 蚂蚁大宝卡) ==> (哔哩哔哩22卡)  confidence = 0.364
(百度大神卡, 百度女神卡) ==> (哔哩哔哩22卡)  confidence = 0.353
(懂我卡, 阿里YunOS-9元卡) ==> (哔哩哔哩22卡)  confidence = 0.356
(腾讯视频小王卡, 京东小强卡) ==> (哔哩哔哩22卡)  confidence = 0.383
(腾讯呢音频小王卡, 滴滴大橙卡) ==> (哔哩哔哩22卡)  confidence = 0.369
(腾讯呢音频小王卡, 百度女神卡) ==> (哔哩哔哩22卡)  confidence = 0.37
(蚂蚁小宝卡, 微博大V卡) ==> (哔哩哔哩22卡)  confidence = 0.365
(腾讯视频小王卡, 阿里YunOS-9元卡) ==> (哔哩哔哩22卡)  confidence = 0.356
(腾讯呢音频小王卡, 微博大V卡) ==> (哔哩哔哩22卡)  confidence = 0

83678
包含2类套餐方案
************第1类套餐*****************
包含3个套餐组合的频繁套餐规则共 19：
Rules:
------
(懂我卡, 阿里YunOS-9元卡) ==> (哔哩哔哩22卡)  confidence = 0.352
(蚂蚁大宝卡, 滴滴大橙卡) ==> (哔哩哔哩22卡)  confidence = 0.378
(滴滴大王卡, 蚂蚁小宝卡) ==> (哔哩哔哩22卡)  confidence = 0.361
(蚂蚁大宝卡, 滴滴小橙卡) ==> (哔哩哔哩22卡)  confidence = 0.382
(腾讯大王卡, 微博大V卡) ==> (哔哩哔哩22卡)  confidence = 0.376
(百度大神卡, 招行大招卡) ==> (哔哩哔哩22卡)  confidence = 0.373
(腾讯呢音频小王卡, 滴滴大王卡) ==> (哔哩哔哩22卡)  confidence = 0.358
(滴滴大王卡, 百度女神卡) ==> (哔哩哔哩22卡)  confidence = 0.37
(饿了么大饿卡, 懂我卡) ==> (哔哩哔哩22卡)  confidence = 0.362
(百度大神卡, 饿了么大饿卡) ==> (哔哩哔哩22卡)  confidence = 0.353
(滴滴小王卡, 京东小强卡) ==> (哔哩哔哩22卡)  confidence = 0.372
(蚂蚁大宝卡, 京东小强卡) ==> (哔哩哔哩22卡)  confidence = 0.377
(滴滴大王卡, 微博大V卡) ==> (哔哩哔哩22卡)  confidence = 0.386
(腾讯视频小王卡, 微博大V卡) ==> (哔哩哔哩22卡)  confidence = 0.354
(滴滴大王卡, 蚂蚁大宝卡) ==> (哔哩哔哩22卡)  confidence = 0.391
(百度大神卡, 懂我卡) ==> (哔哩哔哩22卡)  confidence = 0.395
(腾讯视频小王卡, 懂我卡) ==> (哔哩哔哩22卡)  confidence = 0.364
(滴滴大王卡, 懂我卡) ==> (哔哩哔哩22卡)  confidence = 0.368
(腾讯视频小王卡, 百度大神卡) ==> (哔哩哔哩22卡