# Uber Traffic Data Visualization

[![Licensed with MIT!](https://img.shields.io/github/license/Dragon1573/Python-Analysis?color=blue&label=License&style=flat-square)](https://github.com/Dragon1573/Python-Analysis/blob/master/LICENSE)
[![Datasets from Kaggle](https://img.shields.io/badge/Kaggle-118KB-blue?style=flat-square&logo=Kaggle)](https://www.kaggle.com/shobhit18th/uber-traffic-data-visualization)

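## Background

This dataset is republished from [MachineHack](https://www.machinehack.com/). It helps reveal traffic problems in large cities, and it also touches on how vehicle movement can be scheduled to keep those problems under control.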

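## Content

Modern cities change quickly, and the rise of motor traffic has reshaped how we design them. Understanding a city's traffic flow and how it varies between peak and off-peak hours is essential, so **analyzing traffic data and extracting key information from it** matters a great deal. We invite data scientists, analysts, and anyone with a research interest to analyze traffic data for the city of **Bangalore**. The data comes from [Uber Movement](https://movement.uber.com/), which provides anonymized data from over 2 billion trips to help urban planning around the world.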

## Problem and Tasks

### Step 1: Download the Dataset

In [None]:
%%bash
# Activate the Anaconda environment
activate

# Check whether NumPy is installed
python -m pip list | grep 'numpy' > /dev/null
if [ $? == 1 ]; then
    echo 'Installing NumPy ...'
    python -m pip install numpy
fi
echo 'NumPy is installed.'

# Check whether pandas is installed
python -m pip list | grep 'pandas' > /dev/null
if [ $? == 1 ]; then
    echo 'Installing pandas ...'
    python -m pip install pandas
fi
echo 'pandas is installed.'

# Check whether scikit-learn is installed
# (pip lists the package as "scikit-learn", not "sklearn")
python -m pip list | grep 'scikit-learn' > /dev/null
if [ $? == 1 ]; then
    echo 'Installing scikit-learn ...'
    python -m pip install scikit-learn
fi
echo 'scikit-learn is installed.'

# Check whether the Kaggle client is installed
python -m pip list | grep 'kaggle' > /dev/null
if [ $? == 1 ]; then
    echo 'Installing Kaggle ...'
    python -m pip install kaggle
fi
echo 'Kaggle is installed.'

# Check whether the dataset already exists
if [ -f 'Final_Majestic_to_AIM_jan-2016tomarch-2018.xlsx' ]; then
    echo 'Dataset already downloaded.'
else
    echo 'Downloading dataset ...'
    python -m kaggle datasets download -d shobhit18th/uber-traffic-data-visualization -q --unzip
    echo 'Download complete!'
fi
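
The package checks above can also be done from Python. The following is a minimal sketch, not part of the original notebook; the helper name `missing_packages` and the `REQUIRED` mapping are assumptions made for illustration. It relies on the standard-library `importlib.util.find_spec`, which looks a module up without importing it, and it makes explicit that the pip name `scikit-learn` corresponds to the module name `sklearn`.

```python
import importlib.util

# Map pip distribution names to the module names they install.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "kaggle": "kaggle",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Install with: python -m pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

Because nothing is imported during the check, it stays cheap even for large packages.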

### Step 2: Data Preprocessing

In [1]:
# Read the data from the Excel file
import pandas
import numpy
table = pandas.read_excel('Final_Majestic_to_AIM_jan-2016tomarch-2018.xlsx', index_col='Date')

In [2]:
# One-hot encode categorical columns (dummy variables)
table = pandas.get_dummies(table)

In [3]:
# Drop constant columns (zero standard deviation), which carry no information
for column in table.columns:
    if table[column].std() == 0:
        table.drop(labels=column, axis=1, inplace=True)
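
As an illustration of the column-dropping step, here is a small sketch on toy data. It swaps in a different test than the notebook uses: instead of checking `std() == 0`, it keeps columns with more than one distinct value via `nunique()`, which avoids floating-point rounding in the standard deviation. The frame `toy` and its column names are hypothetical, not taken from the dataset.

```python
import pandas

# Toy data (hypothetical): "city" is constant, so it carries no information.
toy = pandas.DataFrame({
    "travel_time": [1500, 1620, 1710],
    "city": [1, 1, 1],
})

# nunique() counts distinct values per column; keep columns with more than one.
filtered = toy.loc[:, toy.nunique() > 1]
print(list(filtered.columns))  # ['travel_time']
```

Note that `nunique()` ignores missing values by default, so the two tests can disagree on columns that contain NaN.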

In [4]:
# Convert the index to timestamps and the columns to durations
table.index = pandas.to_datetime(table.index)
table.sort_index(inplace=True)
for column in table.columns:
    table[column] = pandas.to_timedelta(table[column], unit='s')
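
To show what the duration conversion does, the sketch below applies `pandas.to_timedelta` to a few hypothetical second counts (illustrative values, not read from the dataset) and converts them back with `dt.total_seconds()`.

```python
import pandas

# Hypothetical travel times in seconds, used only for illustration.
seconds = pandas.Series([1641, 2762, 4283])

# unit="s" interprets each number as a count of seconds.
durations = pandas.to_timedelta(seconds, unit="s")
print(durations.iloc[0])  # 0 days 00:27:21

# total_seconds() recovers plain numbers, e.g. for plotting or arithmetic.
recovered = durations.dt.total_seconds()
```

Storing the columns as timedeltas is why `describe()` reports values such as `0 days 00:45:35` rather than raw second counts.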

In [5]:
print('Descriptive statistics of the dataset:', table.describe(), sep='\n')

Descriptive statistics of the dataset:
      Daily Mean Travel Time (Seconds)  \
count                              821   
mean            0 days 00:45:35.002436   
std             0 days 00:05:50.883762   
min                    0 days 00:27:21   
25%                    0 days 00:42:31   
50%                    0 days 00:46:02   
75%                    0 days 00:49:06   
max                    0 days 01:11:23   

      Daily Range - Lower Bound Travel Time (Seconds)  \
count                                             821   
mean                           0 days 00:31:05.090133   
std                            0 days 00:03:07.631584   
min                                   0 days 00:19:27   
25%                                   0 days 00:29:01   
50%                                   0 days 00:31:00   
75%                                   0 days 00:32:56   
max                                   0 days 00:45:12   

      Daily Range - Upper Bound Travel Time (Seconds)  \
count                                 

### Step 3: Data Visualization

## Acknowledgements

1. MachineHack: <https://www.machinehack.com/>
2. Uber Movement: <https://movement.uber.com/>