# IMS Signaling Flow: Visual Analysis and Intelligent Prediction

By Marshall Zhang  
Date February 19, 2019  
From electric power IMS signaling data

## 1. Import libraries and load the data source

In [1]:
import pandas as pd
import seaborn as sea
import sklearn as sk

In [24]:
df_data = pd.read_excel("通话记录.xlsx")  # call records; the data source is not provided for now

### Sample of the raw data format

In [3]:
df_data.head(10)

Unnamed: 0,主叫号码,被叫号码,开始时间,应答时间,结束时间,通话时长,源IP,目标IP,消息类型
0,8841715,18036114567,2019-01-22 19:03:07.814495,,2019-01-22 19:03:08.611737,0,30.22.25.239,30.20.1.12,480
1,5337969597,8707239,2019-01-22 19:06:01.753423,2019-01-22 19:06:05.403681,2019-01-22 19:06:05.456206,1,30.20.1.12,30.20.1.80,BYE
2,8576061,8579561,2019-01-22 19:06:18.998325,,,0,30.22.202.4,30.20.1.80,
3,8782409,17806346788,2019-01-22 19:06:03.603935,,,0,30.23.226.69,30.20.1.140,
4,15253331888,8705270,2019-01-22 19:07:34.623272,,,0,30.20.1.12,30.20.1.8,
5,8602862,2652380,2019-01-22 19:11:46.787491,2019-01-22 19:11:54.706545,2019-01-22 19:12:04.725882,11,30.23.100.111,30.20.200.1,BYE
6,18353261700,8637024,2019-01-22 19:11:29.326516,2019-01-22 19:11:33.72154,2019-01-22 19:11:33.76709,1,30.20.1.140,30.20.1.208,BYE
7,8667313,15966390410,2019-01-22 19:03:08.974337,,,0,30.23.77.7,30.20.200.3,
8,8125354,8124562,2019-01-22 19:04:15.056258,,2019-01-22 19:04:35.996183,0,30.21.37.132,30.20.1.8,487
9,53182424947,8462631,2019-01-22 19:04:50.735256,,2019-01-22 19:05:15.517847,0,30.20.1.12,30.20.1.136,487


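### Attributes in the data

The column names stay in Chinese because the code refers to them: `主叫号码` (calling number), `被叫号码` (called number), `开始时间` (start time), `应答时间` (answer time), `结束时间` (end time), `通话时长` (call duration), `源IP` (source IP), `目标IP` (destination IP), `消息类型` (message type).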

In [6]:
df_data.columns

Index(['主叫号码', '被叫号码', '开始时间', '应答时间', '结束时间', '通话时长', '源IP', '目标IP', '消息类型'], dtype='object')

### Data size: (number of call records, number of attributes)

In [5]:
df_data.shape

(98400, 9)

## 2. Data cleaning (removing invalid, erroneous, and missing data)

In [25]:
df_data_lack_info = df_data[df_data["消息类型"].isnull()]
df_data = df_data[df_data["消息类型"].notnull()]

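Rows with an empty `消息类型` (message type) are treated as severely incomplete records and are dropped from `df_data`; they are kept separately in `df_data_lack_info`.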
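
The split can be sanity-checked on a small frame. The following is a minimal sketch, assuming a hypothetical toy DataFrame `toy` that only borrows the column names from the notebook; it derives both subsets from a single mask and checks that together they account for every row.

```python
import pandas as pd

# Hypothetical toy records; only the column names come from the notebook.
toy = pd.DataFrame({
    "主叫号码": [8841715, 8576061, 8602862],
    "消息类型": ["480", None, "BYE"],
})

# One boolean mask drives both subsets, so they cannot drift apart.
missing = toy["消息类型"].isnull()
toy_lack_info = toy[missing]
toy_clean = toy[~missing].reset_index(drop=True)

# Every row ends up in exactly one of the two subsets.
rows_accounted_for = len(toy_lack_info) + len(toy_clean) == len(toy)
```

Resetting the index on the cleaned frame is optional; it simply avoids gaps in the row labels after filtering.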

In [26]:
df_data_lack_info.head(5)

Unnamed: 0,主叫号码,被叫号码,开始时间,应答时间,结束时间,通话时长,源IP,目标IP,消息类型
2,8576061,8579561,2019-01-22 19:06:18.998325,,,0,30.22.202.4,30.20.1.80,
3,8782409,17806346788,2019-01-22 19:06:03.603935,,,0,30.23.226.69,30.20.1.140,
4,15253331888,8705270,2019-01-22 19:07:34.623272,,,0,30.20.1.12,30.20.1.8,
7,8667313,15966390410,2019-01-22 19:03:08.974337,,,0,30.23.77.7,30.20.200.3,
10,8566364,15205300438,2019-01-22 19:08:51.36639,,,0,30.23.216.21,30.20.200.1,


## 3. Feature engineering (deriving secondary data from the original data)

In [27]:
df_data["消息类型"].value_counts()

BYE    52976
487    29279
486     6580
480     1690
404      745
500      348
484       78
481       72
603       54
488       51
403       50
502        7
410        7
408        6
503        2
491        2
422        2
Name: 消息类型, dtype: int64
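
The numeric values in these counts are SIP response codes, while `BYE` is a SIP request method that ends an established session. In RFC 3261, 4xx codes are request failures, 5xx codes are server failures, and 6xx codes are global failures. The sketch below groups the counts above by that first digit; the helper name `sip_class` is hypothetical and not part of the notebook.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    "BYE": 52976, "487": 29279, "486": 6580, "480": 1690, "404": 745,
    "500": 348, "484": 78, "481": 72, "603": 54, "488": 51, "403": 50,
    "502": 7, "410": 7, "408": 6, "503": 2, "491": 2, "422": 2,
})

def sip_class(message_type):
    """Map a message type to a coarse class (hypothetical helper)."""
    text = str(message_type)
    if text.isdigit() and len(text) == 3:
        return text[0] + "xx"  # e.g. "487" becomes "4xx"
    return text                # non-numeric methods such as "BYE"

# groupby with a function applies it to the index labels of the Series.
by_class = counts.groupby(sip_class).sum()
```

A coarse grouping like this may be easier to chart than seventeen separate codes, at the cost of hiding which specific failure dominates each class.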

### Adding explanations for the message types

In [31]:
df_cor = pd.read_excel("响应消息类型与原因对照表.xlsx")
df_cor.head(3)

Unnamed: 0,消息类型,原因
0,100 Trying 正在处理中,
1,180 Ringing 振铃,
2,181 call being forwarder 呼叫正在前向,


In [32]:
df_cor["原因"] = df_cor["消息类型"].apply(lambda x:x[4:])
df_cor["消息类型"] = df_cor["消息类型"].apply(lambda x:x[0:3])
df_cor.to_excel("响应消息类型与原因对照表2.xlsx")
df_cor.head()

Unnamed: 0,消息类型,原因
0,100,Trying 正在处理中
1,180,Ringing 振铃
2,181,call being forwarder 呼叫正在前向
3,182,queue 排队
4,181,session progress 会话进行
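
The fixed slices `x[0:3]` and `x[4:]` assume a three-digit code followed by exactly one separator character. As a hedged alternative, the sketch below uses `Series.str.extract` with a regular expression that tolerates extra whitespace. The column names `code` and `reason` are illustrative only.

```python
import pandas as pd

# Sample entries copied from the lookup table shown above.
raw = pd.Series([
    "100 Trying 正在处理中",
    "180 Ringing 振铃",
    "182 queue 排队",
])

# Capture the leading three-digit code and the remaining text separately.
parts = raw.str.extract(r"^\s*(?P<code>\d{3})\s+(?P<reason>.*)$")
parts["reason"] = parts["reason"].str.strip()
```

Entries that do not match the pattern come back as missing values, which makes malformed lookup rows easy to spot.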


In [61]:
df_data = pd.merge(df_data,df_cor,on="消息类型",how="left")
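
A left join only attaches a reason when the key values are equal, including their type, and it repeats a call record once for every matching lookup row. The lookup table above lists code 181 twice, so the sketch below casts both keys to strings, keeps one row per code, and uses the `validate` and `indicator` options of `merge` to check the result. The toy frames `calls` and `lookup` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the column names follow the notebook.
calls = pd.DataFrame({"消息类型": [480, "BYE", 487, 487]})
lookup = pd.DataFrame({
    "消息类型": ["480", "487", "181", "181"],
    "原因": ["temporarily unavailable", "request terminated",
             "call is being forwarded", "session progress"],
})

# Compare keys as strings so that 487 (int) matches "487" (str).
calls["消息类型"] = calls["消息类型"].astype(str)
lookup["消息类型"] = lookup["消息类型"].astype(str)

# Keep one row per code so the left join cannot multiply call records.
lookup = lookup.drop_duplicates(subset="消息类型", keep="first")

merged = calls.merge(lookup, on="消息类型", how="left",
                     validate="many_to_one", indicator=True)

# Message types that found no explanation in the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "消息类型"].unique()
```

Here `BYE` stays unmatched because it is a request method and the lookup table holds response codes, which is consistent with the empty `原因` value for `BYE` rows in the notebook output.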

### Adding the caller region (province, city, organization/carrier)

In [None]:
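def numAnalyze(num):
    """Classify a phone number by its length (left unfinished in the original)."""
    num_str = str(num)
    num_len = len(num_str)
    if num_len == 9:  # full number, outside the province
        pass
    elif num_len == 11:  # mobile number / external-network landline
        pass
    else:  # the original breaks off at an unfinished `elif num_len ==` branch
        pass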

In [70]:
df_data["主叫长度"] = df_data["主叫号码"].apply(lambda x:len(str(x)))
df_data["被叫长度"] = df_data["被叫号码"].apply(lambda x:len(str(x)))
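
These length features are what the unfinished `numAnalyze` function branches on. The sketch below shows one possible shape for that dispatch, using only the two rules stated in the function's comments; every other length is labeled `unclassified` because the original does not say how to treat it. The names `LENGTH_LABELS` and `classify_by_length` are hypothetical.

```python
# Length rules taken from the comments in numAnalyze; nothing else is assumed.
LENGTH_LABELS = {
    9: "full number, outside the province",
    11: "mobile number / external-network landline",
}

def classify_by_length(num):
    """Return a coarse label for a number based only on its digit count."""
    num_str = str(num).strip()
    return LENGTH_LABELS.get(len(num_str), "unclassified")

# The last value is a made-up nine-character string used only for illustration.
labels = [classify_by_length(n) for n in (18353261700, 8637024, "053182424")]
```

A dictionary lookup keeps the rules in one place, so further lengths can be added once their meaning is known.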

In [67]:
df_data["主叫长度"].value_counts()

7     80886
11     8523
10     1126
8       708
17      301
4       243
16       45
14       34
9        30
12       18
13       12
6        10
18        5
19        4
15        3
2         1
Name: 主叫长度, dtype: int64

In [71]:
df_data["被叫长度"].value_counts()

7     61264
11    26836
12     1486
8       902
10      411
3       340
1       212
2       129
9       122
6       112
4        64
17       49
16       11
13        4
19        4
14        1
18        1
15        1
Name: 被叫长度, dtype: int64
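
The two distributions can also be looked at jointly. The sketch below, built on a hypothetical toy frame that reuses number pairs from the samples above, computes the lengths with vectorized string methods and cross-tabulates caller length against callee length.

```python
import pandas as pd

# Hypothetical toy rows using number pairs seen in the samples above.
toy = pd.DataFrame({
    "主叫号码": [18353261700, 13225340071, 8602862, 8125354],
    "被叫号码": [8637024, 8602222, 2652380, 8124562],
})

# Vectorized equivalent of apply(lambda x: len(str(x))).
toy["主叫长度"] = toy["主叫号码"].astype(str).str.len()
toy["被叫长度"] = toy["被叫号码"].astype(str).str.len()

# Rows: caller length; columns: callee length; cells: number of calls.
length_pairs = pd.crosstab(toy["主叫长度"], toy["被叫长度"])
```

One caveat: if a number column were loaded as floating point, `str` would append `.0` and inflate every length by two, so checking the column dtype first is advisable.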

In [72]:
df_data[df_data["主叫长度"]==11]

Unnamed: 0,主叫号码,被叫号码,开始时间,应答时间,结束时间,通话时长,源IP,目标IP,消息类型,原因,主叫长度,被叫长度
3,18353261700,8637024,2019-01-22 19:11:29.326516,2019-01-22 19:11:33.72154,2019-01-22 19:11:33.76709,1,30.20.1.140,30.20.1.208,BYE,,11,7
5,53182424947,8462631,2019-01-22 19:04:50.735256,,2019-01-22 19:05:15.517847,0,30.20.1.12,30.20.1.136,487,request terminated 请求终止,11,7
7,13225340071,8602222,2019-01-22 19:03:11.635766,,2019-01-22 19:03:11.888707,0,30.20.1.12,30.20.1.80,486,busy here 这里忙,11,7
8,13225340071,8602222,2019-01-22 19:04:55.187366,,2019-01-22 19:04:55.400922,0,30.20.1.12,30.20.1.80,486,busy here 这里忙,11,7
12,13225340071,8602222,2019-01-22 19:03:39.116694,,2019-01-22 19:03:39.333528,0,30.20.1.12,30.20.1.80,486,busy here 这里忙,11,7
13,17862980161,8843417,2019-01-22 19:03:52.245091,,2019-01-22 19:04:31.4648,0,30.20.1.12,30.20.1.136,487,request terminated 请求终止,11,7
15,18753476111,8603281,2019-01-22 19:11:24.300103,,2019-01-22 19:11:58.496059,0,30.20.1.12,30.20.1.8,487,request terminated 请求终止,11,7
18,17753048393,8565354,2019-01-22 19:03:38.234313,,2019-01-22 19:04:17.30807,0,30.20.1.140,30.20.1.136,487,request terminated 请求终止,11,7
26,13225340071,8602222,2019-01-22 19:03:56.205386,,2019-01-22 19:03:56.424524,0,30.20.1.12,30.20.1.80,486,busy here 这里忙,11,7
30,13225340071,8602222,2019-01-22 19:04:17.924758,,2019-01-22 19:04:18.111936,0,30.20.1.12,30.20.1.80,486,busy here 这里忙,11,7
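
In this listing the caller 13225340071 reaches 8602222 five times between 19:03:11 and 19:04:55, each attempt ending in 486. Repeated attempts like these can be counted per caller, callee, and outcome. The sketch below is a minimal illustration on rows copied from the listing; the column name `attempts` is hypothetical.

```python
import pandas as pd

# Rows copied from the listing above (caller, callee, message type).
toy = pd.DataFrame({
    "主叫号码": [13225340071] * 5 + [17862980161, 18753476111],
    "被叫号码": [8602222] * 5 + [8843417, 8603281],
    "消息类型": ["486"] * 5 + ["487", "487"],
})

# Count attempts per caller/callee/outcome and list the most frequent first.
attempts = (
    toy.groupby(["主叫号码", "被叫号码", "消息类型"])
       .size()
       .reset_index(name="attempts")
       .sort_values("attempts", ascending=False)
       .reset_index(drop=True)
)
```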


## 4. Feature 1: Visualizing the raw call data

## 5. Feature 2: Visualizing call direction and traffic volume
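
This section is only a heading in the source. As a hedged starting point, the sketch below shows one way the input for such a view could be prepared: counting calls per directed pair of `源IP` and `目标IP`, then pivoting the counts into a matrix. The rows are copied from the sample output in section 1, and the column name `calls` is hypothetical.

```python
import pandas as pd

# Rows copied from the sample output in section 1 (source IP, destination IP).
toy = pd.DataFrame({
    "源IP": ["30.20.1.12", "30.20.1.12", "30.20.1.140", "30.22.25.239"],
    "目标IP": ["30.20.1.80", "30.20.1.136", "30.20.1.208", "30.20.1.12"],
})

# One row per directed edge (source to destination) with its call count.
flows = (
    toy.groupby(["源IP", "目标IP"])
       .size()
       .reset_index(name="calls")
)

# Pivot into a source-by-destination matrix, e.g. as input for a heatmap.
flow_matrix = flows.pivot(index="源IP", columns="目标IP", values="calls").fillna(0)
```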

## 6. Feature 3: Analyzing call volume changes within a period
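
This section is also only a heading in the source. One possible first step, sketched under the assumption that `开始时间` marks when a call attempt begins, is to parse the timestamps and count calls per fixed interval. The one-minute interval is an arbitrary choice for illustration.

```python
import pandas as pd

# Start times copied from the sample output in section 1.
toy = pd.DataFrame({
    "开始时间": [
        "2019-01-22 19:03:07.814495",
        "2019-01-22 19:03:08.974337",
        "2019-01-22 19:04:15.056258",
        "2019-01-22 19:06:01.753423",
    ],
})

# Parse the strings into timestamps and index the frame by call start time.
toy["开始时间"] = pd.to_datetime(toy["开始时间"])

# Count calls in consecutive one-minute bins; empty bins are reported as 0.
per_minute = toy.set_index("开始时间").resample("1min").size()
```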

## 7. Feature 4: AI/machine-learning-based prediction of traffic volume and call direction (model design and analysis)

## 8. Feature 5: Monitoring load ratio and throughput of key IMS network nodes