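# Wuhan Pneumonia Outbreak Map Visualization

author: EVA

**Note:** This experiment is intended for teaching and analysis only.

The data comes from [Kaggle](https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset/data#).

# Importing the Data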

In [3]:
import pandas as pd

data = pd.read_csv('./dataset/2019_nCoV_data.csv')

In [4]:
type(data)

pandas.core.frame.DataFrame

After importing the data, we inspect its structure.

In [6]:
data.head()

Unnamed: 0,Sno,Date,Province/State,Country,Last Update,Confirmed,Deaths,Recovered
0,1,1/22/2020 12:00,Anhui,China,1/22/2020 12:00,1.0,0.0,0.0
1,2,1/22/2020 12:00,Beijing,China,1/22/2020 12:00,14.0,0.0,0.0
2,3,1/22/2020 12:00,Chongqing,China,1/22/2020 12:00,6.0,0.0,0.0
3,4,1/22/2020 12:00,Fujian,China,1/22/2020 12:00,1.0,0.0,0.0
4,5,1/22/2020 12:00,Gansu,China,1/22/2020 12:00,0.0,0.0,0.0


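There are 700 records in total. Each record contains the observation time, a Chinese province or city (or a city in another country), the time of the last update, and that day's confirmed, death, and recovered counts. We will clean this data further in the following steps.

Because this analysis only considers the trend of the outbreak within China, we exclude cities in other countries.

# Consolidating the Data

### Regions in China

Extract the rows that belong to China: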

In [7]:
# Select the rows for China; the Country column uses several different labels
a = data[data['Country']=='China']
b = data[data['Country']=='Mainland China']
c = data[data['Country']=='Hong Kong']
d = data[data['Country']=='Macau']
e = data[data['Country']=='Taiwan']

In [8]:
# Combine the subsets and drop the columns we do not need
ncov = pd.concat([a,b,c,d,e],axis = 0).drop(columns = ['Last Update','Sno','Country'])
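
The five filters and the `concat` above can also be written as a single membership test. The following is a minimal sketch on a made-up toy table (the rows and the name `toy_china` are illustrative assumptions, not part of the dataset):

```python
import pandas as pd

# Toy stand-in for the Kaggle table; the values are made up for illustration.
toy = pd.DataFrame({
    'Province/State': ['Hubei', 'Hong Kong', 'Washington', 'Taiwan'],
    'Country': ['Mainland China', 'Hong Kong', 'US', 'Taiwan'],
    'Confirmed': [10.0, 2.0, 1.0, 1.0],
})

# Labels under which Chinese regions appear in the Country column
china_labels = ['China', 'Mainland China', 'Hong Kong', 'Macau', 'Taiwan']

# One boolean mask instead of five separate filters plus concat
toy_china = toy[toy['Country'].isin(china_labels)].drop(columns=['Country'])
print(toy_china)
```

One difference to keep in mind: `isin` keeps the rows in their original order, while `concat` groups them by label, so the row order of the two results can differ.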

### Cleaning the Date Column

In [9]:
# Tidy the date column by dropping the time-of-day part
ncov['Date'] = ncov['Date'].apply(lambda x: x.split(' ')[0])

In [10]:
# Fix the mistyped date, then keep only the month and day
ncov.replace('2020-01-02','2020-02-01',inplace = True)
ncov['Date'] = pd.to_datetime(ncov['Date'])
ncov['Date'] = ncov['Date'].apply(lambda x: str(x)[5:10])
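
The split, parse, and slice steps above can be condensed with the `.dt` accessor. This is a minimal sketch that assumes every timestamp follows the month/day/year layout shown in the first table; the real file may mix formats, in which case the explicit `format` argument would need adjusting:

```python
import pandas as pd

# Toy timestamps in the month/day/year layout shown in the table above
raw = pd.Series(['1/22/2020 12:00', '1/23/2020 12:00', '2/2/2020 21:00'])

# Parse with an explicit format, then keep only the month and day
parsed = pd.to_datetime(raw, format='%m/%d/%Y %H:%M')
month_day = parsed.dt.strftime('%m-%d')
print(month_day.tolist())
```

Stating the format explicitly avoids any ambiguity between day-first and month-first dates, and `strftime('%m-%d')` makes the intent clearer than slicing characters 5 to 10 of the string representation.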

In [12]:
ncov.head()

Unnamed: 0,Date,Province/State,Confirmed,Deaths,Recovered
0,01-22,Anhui,1.0,0.0,0.0
1,01-22,Beijing,14.0,0.0,0.0
2,01-22,Chongqing,6.0,0.0,0.0
3,01-22,Fujian,1.0,0.0,0.0
4,01-22,Gansu,0.0,0.0,0.0


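For the visualization that follows, we now convert the English region names to Chinese:

In [13]:
# Build a dictionary that maps each English name to its Chinese name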
province_names = {
    'Beijing': '北京',
    'Shanghai': '上海',
    'Tianjin': '天津',
    'Chongqing': '重庆',
    'Hong Kong': '香港',
    'Macau': '澳门',
    'Anhui': '安徽',
    'Fujian': '福建',
    'Guangdong': '广东',
    'Guangxi': '广西',
    'Guizhou': '贵州',
    'Gansu': '甘肃',
    'Hainan': '海南',
    'Hebei': '河北',
    'Henan': '河南',
    'Heilongjiang': '黑龙江',
    'Hubei': '湖北',
    'Hunan': '湖南',
    'Jilin': '吉林',
    'Jiangsu': '江苏',
    'Jiangxi': '江西',
    'Liaoning': '辽宁',
    'Inner Mongolia': '内蒙古',
    'Ningxia': '宁夏',
    'Qinghai': '青海',
    'Shaanxi': '陕西',
    'Shanxi': '山西',
    'Shandong': '山东',
    'Sichuan': '四川',
    'Taiwan': '台湾',
    'Tibet': '西藏',
    'Xinjiang': '新疆',
    'Yunnan': '云南',
    'Zhejiang': '浙江',
}

In [14]:
# Convert the English names to Chinese names
ncov['Province/State'] = ncov['Province/State'].apply(lambda x:  province_names[x])
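
The dictionary lookup above raises a `KeyError` as soon as a name is missing from `province_names`. As a hedged alternative, `Series.map` returns `NaN` for unmapped keys, which makes it easy to list them. The abbreviated dictionary and the name `'Atlantis'` below are illustrative assumptions:

```python
import pandas as pd

# Abbreviated copy of the dictionary, for illustration only
names = {'Hubei': '湖北', 'Beijing': '北京'}

# 'Atlantis' is a made-up name that has no dictionary entry
provinces = pd.Series(['Hubei', 'Beijing', 'Atlantis'])

# .map returns NaN for keys missing from the dictionary instead of raising
translated = provinces.map(names)

# List the names that still need a dictionary entry
unmapped = provinces[translated.isna()].unique().tolist()
print(unmapped)
```

Checking `unmapped` before overwriting the column shows which entries the dictionary still lacks, rather than stopping at the first one.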

In [16]:
ncov.head()

Unnamed: 0,Date,Province/State,Confirmed,Deaths,Recovered
0,01-22,安徽,1.0,0.0,0.0
1,01-22,北京,14.0,0.0,0.0
2,01-22,重庆,6.0,0.0,0.0
3,01-22,福建,1.0,0.0,0.0
4,01-22,甘肃,0.0,0.0,0.0


Extract the data for the most recent date:

In [17]:
# Extract the rows for the most recent date
last = ncov[ncov['Date'] =='02-02']

In [18]:
last.head()

Unnamed: 0,Date,Province/State,Confirmed,Deaths,Recovered
564,02-02,湖北,11177.0,350.0,295.0
565,02-02,浙江,724.0,0.0,36.0
566,02-02,广东,683.0,0.0,15.0
567,02-02,河南,566.0,2.0,14.0
568,02-02,湖南,521.0,0.0,16.0


In [19]:
last.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 34 entries, 564 to 600
Data columns (total 5 columns):
Date              34 non-null object
Province/State    34 non-null object
Confirmed         34 non-null float64
Deaths            34 non-null float64
Recovered         34 non-null float64
dtypes: float64(3), object(2)
memory usage: 1.6+ KB
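
The filter above hard-codes the date string `'02-02'`. A minimal sketch of deriving it from the data instead, assuming all dates fall within one calendar year so that zero-padded `MM-DD` strings sort chronologically (the toy values are made up):

```python
import pandas as pd

# Toy table with made-up counts; only the Date column matters here
toy = pd.DataFrame({
    'Date': ['01-31', '02-01', '02-02', '02-02'],
    'Province/State': ['湖北', '湖北', '湖北', '浙江'],
    'Confirmed': [10.0, 20.0, 30.0, 5.0],
})

# Zero-padded 'MM-DD' strings compare in chronological order within one year
latest_date = toy['Date'].max()
latest = toy[toy['Date'] == latest_date]
print(latest_date, len(latest))
```

If the data ever spans a year boundary, this string comparison no longer holds, and the full datetime should be kept for the comparison.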


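# Drawing the Map

We use pyecharts to draw the outbreak map. The `pieces` parameter defines the lower and upper bounds of each segment of the piecewise visual map component.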

In [20]:
from pyecharts import options as opts
from pyecharts.charts import Map,Geo
from pyecharts.globals import ChartType

def ncov_map():
    num = max(last.Confirmed)
    c = (
        Map()
        .add('Confirmed cases',
             [list(z) for z in zip(last['Province/State'],last.Confirmed.astype(int))]
             )
        .set_global_opts(
            title_opts = opts.TitleOpts(title = 'nCoV confirmed cases, 02-02'),
            visualmap_opts = opts.VisualMapOpts(is_piecewise = True,
                                                pieces=[
                                                    {'min':num -10000,'max':num},
                                                    {'min':500, 'max':num - 10001},
                                                    {'min':300, 'max':499},
                                                    {'min':100, 'max':299},
                                                    {'min':50,'max':99},
                                                    {'max':49}
                                                    ],
                                               range_color = ['pink', 'red'])
            )
        )
    
    return c

ncov_map().render_notebook()
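
To make the segment boundaries easier to check, the `pieces` list can be built and tested in plain Python before it is passed to pyecharts. This is a minimal sketch: `build_pieces` and `find_piece` are hypothetical helpers, not pyecharts functions, and the lookup assumes both bounds of a segment are inclusive.

```python
def build_pieces(top, gap=10000):
    """Return segments shaped like those in ncov_map for a maximum of `top`."""
    return [
        {'min': top - gap, 'max': top},
        {'min': 500, 'max': top - gap - 1},
        {'min': 300, 'max': 499},
        {'min': 100, 'max': 299},
        {'min': 50, 'max': 99},
        {'max': 49},
    ]


def find_piece(value, pieces):
    """Return the index of the first segment containing `value`, or None."""
    for i, piece in enumerate(pieces):
        low = piece.get('min', float('-inf'))
        high = piece.get('max', float('inf'))
        if low <= value <= high:
            return i
    return None


# 11177 is the confirmed count for Hubei in the table above
pieces = build_pieces(11177)
print(find_piece(11177, pieces), find_piece(724, pieces), find_piece(0, pieces))
```

With this layout the second segment runs from 500 to `top - gap - 1`, so it is only non-empty when the maximum exceeds roughly 10,500. For smaller maxima the boundaries would need to be chosen differently.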

# Saving the Data

Finally, we save the data used in this session with pickle.

In [None]:
import pickle
with open('./data/ncov.pkl','wb') as f:
    # The with block closes the file automatically, so no explicit close is needed
    pickle.dump(ncov, f)
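
To confirm that the saved file can be read back, a round trip can be sketched as follows. The one-row table and the temporary directory are assumptions made so the example runs on its own without touching `./data`:

```python
import os
import pickle
import tempfile

import pandas as pd

# One-row stand-in for the ncov table
toy = pd.DataFrame({'Date': ['02-02'], 'Province/State': ['湖北'], 'Confirmed': [11177.0]})

# A temporary directory stands in for ./data so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ncov.pkl')
    with open(path, 'wb') as f:
        pickle.dump(toy, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# True when the restored table matches the original in values and dtypes
same = restored.equals(toy)
print(same)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.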