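# Telecom Customer Churn Analysis

This analysis proceeds through problem definition, data understanding, data cleaning, visual analysis, churn prediction, and conclusions and recommendations.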

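## 1 Background

There is a widely cited claim about user retention: reducing the churn rate by 5% can raise company profit by 25% to 85%. Today, persistently high customer acquisition costs have pushed telecom operators up against a ceiling, and in some cases made acquiring customers genuinely difficult. As the market becomes more saturated, operators urgently need to increase user stickiness and extend the customer lifecycle. Churn analysis of telecom users is therefore essential.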

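## 2 Questions

- Analyze the relationship between user characteristics and churn.
- Overall, what characteristics do churned users typically share?
- Try to find a suitable model for predicting which users will churn.
- Give targeted recommendations for increasing user stickiness and preventing churn.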

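## 3 Understanding the data

The dataset has 21 fields and 7,043 records, each describing the attributes of one unique customer. The goal is to uncover the relationship between the first 20 columns (the features) and the last column, which records whether the customer churned.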

## 4 Data cleaning

### Import libraries

In [1]:
import numpy as np 
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import calendar

### Load the data

In [2]:
path=r'F:\Data\yidong\WA_Fn-UseC_-Telco-Customer-Churn.csv'
df=pd.read_csv(path)

### Inspect the data

In [3]:
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [4]:
df.shape

(7043, 21)

### Check for missing values

In [5]:
df.isnull().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

`isnull()` reports no missing values.

### Check data types

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
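 16  PaperlessBilling  7043 non-null   object 
 17  PaymentMethod     7043 non-null   object 
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   object 
 20  Churn             7043 non-null   object 
dtypes: float64(1), int64(2), object(18)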


The `TotalCharges` column should be a float, but it is stored as a string (`object`).

In [7]:
df['TotalCharges'].astype('float64')

ValueError: could not convert string to float: ''

The column contains blank strings (a single space), so it cannot be converted to float directly.

In [9]:
# Replace blank strings with NaN so the affected rows can be filtered later
df['TotalCharges']=df['TotalCharges'].replace(' ',np.nan)

In [10]:
df.isnull().sum()

customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies      0
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges        11
Churn                0
dtype: int64

The replacement worked: `TotalCharges` now shows 11 missing values, and the column can be converted to float.

In [11]:
df['TotalCharges']=df['TotalCharges'].astype('float64')
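
As an alternative to the replace-then-`astype` steps above, `pd.to_numeric` with `errors="coerce"` can do the conversion in one pass. The sketch below uses a made-up three-row sample rather than the real file, and the names `sample`, `blank_mask`, and `converted` are illustrative only.

```python
import pandas as pd

# Made-up sample that mimics the problem: one blank string among numeric strings
sample = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5"]})

# Locate whitespace-only entries before converting, so they can be inspected
blank_mask = sample["TotalCharges"].str.strip().eq("")
print(blank_mask.sum())  # number of blank entries

# errors="coerce" turns every value that cannot be parsed into NaN
converted = pd.to_numeric(sample["TotalCharges"], errors="coerce")
print(converted.dtype)
print(converted.isna().sum())  # number of values that failed to parse
```

One caveat with this approach is that `errors="coerce"` silently turns any unparseable value into NaN, not only blanks, so checking the mask first helps confirm that nothing else is being discarded.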

### Re-check that the data looks reasonable

In [12]:
df.describe()

| | SeniorCitizen | tenure | MonthlyCharges | TotalCharges |
|---|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 | 7032.0 |
| mean | 0.162147 | 32.371149 | 64.761692 | 2283.300441 |
| std | 0.368612 | 24.559481 | 30.090047 | 2266.771362 |
| min | 0.0 | 0.0 | 18.25 | 18.8 |
| 25% | 0.0 | 9.0 | 35.5 | 401.45 |
| 50% | 0.0 | 29.0 | 70.35 | 1397.475 |
| 75% | 0.0 | 55.0 | 89.85 | 3794.7375 |
| max | 1.0 | 72.0 | 118.75 | 8684.8 |


### Inspect the rows with missing values

In [13]:
df[df['TotalCharges'].isnull()]

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 488 | 4472-LVYGI | Female | 0 | Yes | Yes | 0 | No | No phone service | DSL | Yes | ... | Yes | Yes | Yes | No | Two year | Yes | Bank transfer (automatic) | 52.55 | NaN | No |
| 753 | 3115-CZMZD | Male | 0 | No | Yes | 0 | Yes | No | No | No internet service | ... | No internet service | No internet service | No internet service | No internet service | Two year | No | Mailed check | 20.25 | NaN | No |
| 936 | 5709-LVOEQ | Female | 0 | Yes | Yes | 0 | Yes | No | DSL | Yes | ... | Yes | No | Yes | Yes | Two year | No | Mailed check | 80.85 | NaN | No |
| 1082 | 4367-NUYAO | Male | 0 | Yes | Yes | 0 | Yes | Yes | No | No internet service | ... | No internet service | No internet service | No internet service | No internet service | Two year | No | Mailed check | 25.75 | NaN | No |
| 1340 | 1371-DWPAZ | Female | 0 | Yes | Yes | 0 | No | No phone service | DSL | Yes | ... | Yes | Yes | Yes | No | Two year | No | Credit card (automatic) | 56.05 | NaN | No |
| 3331 | 7644-OMVMY | Male | 0 | Yes | Yes | 0 | Yes | No | No | No internet service | ... | No internet service | No internet service | No internet service | No internet service | Two year | No | Mailed check | 19.85 | NaN | No |
| 3826 | 3213-VVOLG | Male | 0 | Yes | Yes | 0 | Yes | Yes | No | No internet service | ... | No internet service | No internet service | No internet service | No internet service | Two year | No | Mailed check | 25.35 | NaN | No |
| 4380 | 2520-SGTTA | Female | 0 | Yes | Yes | 0 | Yes | No | No | No internet service | ... | No internet service | No internet service | No internet service | No internet service | Two year | No | Mailed check | 20.0 | NaN | No |
| 5218 | 2923-ARZLG | Male | 0 | Yes | Yes | 0 | Yes | No | No | No internet service | ... | No internet service | No internet service | No internet service | No internet service | One year | Yes | Mailed check | 19.7 | NaN | No |
| 6670 | 4075-WKNIU | Female | 0 | Yes | Yes | 0 | Yes | Yes | DSL | No | ... | Yes | Yes | Yes | No | Two year | No | Mailed check | 73.35 | NaN | No |


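These 11 users have a `tenure` (time on the network) of 0 months, so they are presumably users who joined during the current month.
As a rule of thumb, a user who churns in the month of sign-up still has to pay that month's fee. We therefore set `tenure` to 1 for these 11 users and fill `TotalCharges` with `MonthlyCharges`, which matches what would happen in practice.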

In [14]:
# Compute the mask once, before either column is modified
missing = df['TotalCharges'].isnull()
df.loc[missing, 'tenure'] = 1
df.loc[missing, 'TotalCharges'] = df.loc[missing, 'MonthlyCharges']

### Review the full dataset

In [15]:
df.describe()

| | SeniorCitizen | tenure | MonthlyCharges | TotalCharges |
|---|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 | 7043.0 |
| mean | 0.162147 | 32.37271 | 64.761692 | 2279.798992 |
| std | 0.368612 | 24.557454 | 30.090047 | 2266.73017 |
| min | 0.0 | 1.0 | 18.25 | 18.8 |
| 25% | 0.0 | 9.0 | 35.5 | 398.55 |
| 50% | 0.0 | 29.0 | 70.35 | 1394.55 |
| 75% | 0.0 | 55.0 | 89.85 | 3786.6 |
| max | 1.0 | 72.0 | 118.75 | 8684.8 |


### Save the cleaned data

In [None]:
df.to_csv(r'F:\Data\yidong\yidong.csv')
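
One detail worth knowing when saving: by default `to_csv` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. The sketch below shows the difference on a made-up two-row frame, writing to in-memory buffers instead of disk; whether to pass `index=False` for the real file is a choice left to the reader.

```python
import io

import pandas as pd

# Made-up frame standing in for the cleaned data
toy = pd.DataFrame({"customerID": ["A", "B"], "tenure": [1, 34]})

# Default behaviour: the index is written as an unnamed first column
buf_with_index = io.StringIO()
toy.to_csv(buf_with_index)
buf_with_index.seek(0)
cols_with = pd.read_csv(buf_with_index).columns.tolist()

# index=False: only the data columns are written
buf_without_index = io.StringIO()
toy.to_csv(buf_without_index, index=False)
buf_without_index.seek(0)
cols_without = pd.read_csv(buf_without_index).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'customerID', 'tenure']
print(cols_without)  # ['customerID', 'tenure']
```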

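## 5 Visual analysis

User characteristics are divided into user attributes, service attributes, and contract attributes, and the visual analysis is carried out along these three dimensions.

### Number and share of churned users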

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5rRwH.*bnexjZICG18IabpL3i8yaSbWj7WIQMfe8*54MA7r34f9SYWFB8w2*BwwIblV3FUECUtkMGPBZ1bG.WeE!/b&bo=WAT7AQAAAAADB4Q!&rf=viewer_4)

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5rRwH.*bnexjZICG18IabpLHHmeWRUjuYSYwWz3sYB3UdYXDKDMu.qUCQuOWbqSI7nEVUJNhQfxYLEbh2LSNhSE!/b&bo=UAEkAgAAAAADB1U!&rf=viewer_4)

Churned users account for 26.54% of all users, which points to a retention problem.

### User attribute analysis
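
The charts above are images, and the tool used to produce them is not shown. As a rough sketch of how the count and share of churned users could be computed in pandas, the example below uses a made-up four-row `Churn` column; run against the real `df`, the same two calls would give the full-dataset figures.

```python
import pandas as pd

# Made-up sample: three retained users and one churned user
toy = pd.DataFrame({"Churn": ["No", "Yes", "No", "No"]})

# Absolute number of users in each class
counts = toy["Churn"].value_counts()

# Share of each class; normalize=True divides by the total row count
share = toy["Churn"].value_counts(normalize=True)

print(counts)
print(share)
```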

1. Churn by age group

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5tzY08M6IPj8jGN1GvYMsjATi8x.XSwe8VmuvLQ0GI2ArQje1CGSQQzzQZrgd3JiDTWt*t*6yi6fi*17bNjBVB4!/b&bo=xwH7AQAAAAADBx4!&rf=viewer_4)

2. Churn by gender

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/45NBuzDIW489QBoVep5mcY94tz20RvMEuL6kWrKs67FKJphq1QLOB059*XWlelILIhfvr3rHFvRaMWZ7Z*imEYB6ZM*pozJhtANwz*XhQGw!/b&bo=xwH7AQAAAAADFw4!&rf=viewer_4)

3. User analysis

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5qf6MZJEOLug7BJyCMKD2kLGx*B4k2PGlbQIWG1XPvEb5IAfahZkcR6OpK*cf0akJH8kUtEqrq1qXxOxFili9G0!/b&bo=xwH7AccB.wEDCSw!&rf=viewer_4)

4. Summary: churn is essentially unrelated to gender; senior users churn at a markedly higher rate than younger users.

5. Churn by partner status

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/45NBuzDIW489QBoVep5mcSb1OHuRnYEBiXmLa9aCZQCUSt8Km*W.YTa2LF6gasTQG*nboU3erWFvnLN2KUCcQFLDwaMyOUbZrCbmD8Hmwjs!/b&bo=ygH7AQAAAAADFwM!&rf=viewer_4)

6. Churn by dependents status

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/45NBuzDIW489QBoVep5mcaSgW4BsgwcsaqmzSZcbM0nNzN.YrIAUFevCTkr.z3doHVS6pnaKlllXPY3sbFaU7CbNEyybSPaV*I98eFvRJ7E!/b&bo=xwH7AQAAAAADFw4!&rf=viewer_4)

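#### Summary

- Users with a partner churn at a lower rate than users without one.
- Users with a partner are the smaller group.
- Users with dependents churn at a lower rate than users without dependents.
- The longer the tenure, the lower the churn rate, which matches general experience.
- Once tenure reaches three months, the churn rate falls below the retention rate, which suggests that users typically settle in after about three months.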
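
Comparisons like the ones summarized above can be reproduced with a `groupby`. The sketch below is an assumption about one way to do it, not the method behind the charts: the six rows are made up, and the tenure band edges (3 and 72 months) and labels are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up rows using the same column names as the dataset
toy = pd.DataFrame({
    "Partner": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "tenure":  [1, 40, 2, 3, 60, 24],
    "Churn":   ["No", "No", "Yes", "Yes", "No", "No"],
})

# A boolean flag makes the churn rate a simple mean
toy["churned"] = toy["Churn"].eq("Yes")

# Churn rate for users with and without a partner
rate_by_partner = toy.groupby("Partner")["churned"].mean()

# Bucket tenure into bands; intervals are right-inclusive: (0, 3] and (3, 72]
toy["tenure_band"] = pd.cut(
    toy["tenure"], bins=[0, 3, 72], labels=["1-3 months", "4-72 months"]
)
rate_by_band = toy.groupby("tenure_band", observed=False)["churned"].mean()

print(rate_by_partner)
print(rate_by_band)
```

Swapping `"Partner"` for `"Dependents"`, `"gender"`, or `"SeniorCitizen"` would give the corresponding breakdown for the other user attributes.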

### Service attribute analysis

1. Churn and devices

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5mDSXEokDWQyYsx8WMMTac4ImtYkyGOvrgewF*QaUy6dQRndfYc.4j1*zV0db0FPJXF8*vLgdmDEwrWPyS6Dzp4!/b&bo=wwP7AQAAAAADBxg!&rf=viewer_4)

2. Churn and services

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/45NBuzDIW489QBoVep5mcY94tz20RvMEuL6kWrKs67G6XiG6UHRKwlfrE*cgTODkuZSvLSlXPhc.wUX2CU9yPrggYsMtTi.SfgMjfvGH*TE!/b&bo=wwP7AQAAAAADFwg!&rf=viewer_4)

#### Summary

- Phone service overall has little effect on churn.
- Fiber-optic users have a comparatively high churn share.
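
For categorical service fields, a row-normalized cross-tabulation gives the churn share within each category directly. The sketch below uses six made-up rows, so its numbers say nothing about the real dataset; `table` is an illustrative name.

```python
import pandas as pd

# Made-up rows using the dataset's InternetService categories
toy = pd.DataFrame({
    "InternetService": ["DSL", "Fiber optic", "Fiber optic", "No", "DSL", "Fiber optic"],
    "Churn":           ["No",  "Yes",         "Yes",         "No", "No",  "No"],
})

# normalize="index" makes each row sum to 1, so the "Yes" column
# is the churn share within that service type
table = pd.crosstab(toy["InternetService"], toy["Churn"], normalize="index")

print(table)
```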

### Contract attribute analysis

1. Churn and payment method

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/45NBuzDIW489QBoVep5mcSb1OHuRnYEBiXmLa9aCZQDnrlx5Uh4Lj79YrpOr33AKds.TMy6gFNEx79bNeaV*vs.u1DqNL3KqrHHBycLMiLY!/b&bo=wwP7AQAAAAADFwg!&rf=viewer_4)

2. Paper billing versus paperless billing

![](http://m.qpic.cn/psc?/V509KgjP2rVc1x3xXZE72dVD4k46B5pi/ruAMsa53pVQWN7FLK88i5lrx5vwmPwemqF*C0p8A*RXyVctHhCrT7R2lxksdwBkHhbbWh*Ft4.vYeOfqygbz.DCgFEH8nuJ*1dcRZO.PSFQ!/b&bo=wwP7AQAAAAADBxg!&rf=viewer_4)

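#### Summary

- Users who pay by electronic check have the highest churn rate, presumably because the experience of this payment method is mediocre.
- Churn rate by contract type ranks month-to-month > one-year > two-year, which suggests that long-term contracts retain customers best.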