In [2]:
import pandas as pd

In [3]:
stations=pd.read_csv('/Users/mingyuan.xu/Desktop/archive-6/austin_bikeshare_stations.csv')
trips=pd.read_csv('/Users/mingyuan.xu/Desktop/archive-6/austin_bikeshare_trips.csv')

In [4]:
stations.head()

Unnamed: 0,latitude,location,longitude,name,station_id,status
0,30.27041,(30.27041 -97.75046),-97.75046,West & 6th St.,2537,active
1,30.26452,(30.26452 -97.7712),-97.7712,Barton Springs Pool,2572,active
2,30.27595,(30.27595 -97.74739),-97.74739,ACC - Rio Grande & 12th,2545,closed
3,30.2848,(30.2848 -97.72756),-97.72756,Red River & LBJ Library,1004,closed
4,30.26694,(30.26694 -97.74939),-97.74939,Nueces @ 3rd,1008,moved


This dataset contains the location of each bike share station and the corresponding station status.

**Uniqueness Check:**\
First, we check that the dataset contains no duplicate stations. Since each station_id should be unique, we use this column for the check.

In [5]:
len(stations['station_id'].unique())==stations.shape[0]

True

The number of unique station_id values equals the number of rows in this dataset. Therefore, no station appears more than once.
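
If the check above had returned False, the next step would be to find the offending rows. The following is a minimal sketch on a small made-up frame (the `toy` data is hypothetical and deliberately lists one station twice); it uses `Series.is_unique` and `Series.duplicated` rather than comparing lengths.

```python
import pandas as pd

# Hypothetical toy data: station 2537 is deliberately listed twice.
toy = pd.DataFrame({
    "station_id": [2537, 2572, 2537],
    "name": ["West & 6th St.", "Barton Springs Pool", "West & 6th St."],
})

# Same question as the length comparison above, asked directly.
ids_unique = toy["station_id"].is_unique

# keep=False marks every row whose station_id also occurs in another row.
repeated_rows = toy[toy["station_id"].duplicated(keep=False)]

print(ids_unique)
print(repeated_rows)
```

On the real `stations` frame, `repeated_rows` would be empty, which matches the result above.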

**Missing Data Check:**\
Next, we check whether any values are missing in this dataset.

In [14]:
pd.isnull(stations).any()

latitude      False
location      False
longitude     False
name          False
station_id    False
status        False
dtype: bool

It can be seen that there is **no** missing data in this dataset.
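
`any()` only answers yes or no per column. When a column does contain gaps, a count is usually more useful. A small sketch, again on hypothetical data (the `toy` frame and its missing values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data with gaps introduced on purpose.
toy = pd.DataFrame({
    "name": ["West & 6th St.", "Barton Springs Pool", None],
    "status": ["active", None, None],
})

# isna() gives a boolean frame; summing it counts missing values per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

Applied to `stations`, every count would be 0, consistent with the output above.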

In [15]:
stations['status'].value_counts()

active      56
closed      10
moved        5
ACL only     1
Name: status, dtype: int64

As for station status, there are 56 active stations, 10 closed stations, 5 moved stations, and 1 ACL only station (active only during the Austin City Limits Music Festival).

Now, since the dataset provides the latitude and longitude of stations, we can create a map.

In [16]:
import folium
from folium import plugins
from folium import Choropleth, Circle, Marker
import seaborn as sns
from folium.plugins import HeatMap

In [17]:
active_stations=stations[stations.status=='active']
closed_stations=stations[stations.status=='closed']
moved_stations=stations[stations.status=='moved']
acl_stations=stations[stations.status=='ACL only']   #divide the stations into 4 groups based on their status

In [18]:
stations.describe()

Unnamed: 0,latitude,longitude,station_id
count,72.0,72.0,72.0
mean,30.266822,-97.742937,2625.055556
std,0.007811,0.012681,708.966633
min,30.24891,-97.7712,1001.0
25%,30.262128,-97.749315,2503.75
50%,30.266955,-97.7432,2562.5
75%,30.270573,-97.73836,2940.0
max,30.28576,-97.71002,3687.0


From the table above, we can use the means of latitude and longitude as the center of our map.\
**latitude:30.266822, longitude:-97.742937**\
**Active:Blue\
Closed:Red\
Moved:Green\
ACL only:Pink**
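
The center and the color legend can also be derived in code instead of being typed in by hand. This is only a sketch: `sample` holds the five rows printed by `stations.head()` above, and `STATUS_COLORS`, `center` and `marker_colors` are names introduced here for illustration.

```python
import pandas as pd

# The five rows shown by stations.head() above.
sample = pd.DataFrame({
    "latitude": [30.27041, 30.26452, 30.27595, 30.2848, 30.26694],
    "longitude": [-97.75046, -97.7712, -97.74739, -97.72756, -97.74939],
    "status": ["active", "active", "closed", "closed", "moved"],
})

# Map center as [mean latitude, mean longitude], the order folium.Map expects.
center = [sample["latitude"].mean(), sample["longitude"].mean()]

# One lookup table for the legend: status -> marker color.
STATUS_COLORS = {"active": "blue", "closed": "red", "moved": "green", "ACL only": "pink"}
marker_colors = sample["status"].map(STATUS_COLORS)

print(center)
print(marker_colors.tolist())
```

With the full `stations` frame, `center` would reproduce the means reported by `stations.describe()`, and a single loop over latitude, longitude, name and `marker_colors` could replace one loop per status.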

In [19]:
lat=30.266822
lon=-97.742937

whole_map=folium.Map(location=[lat, lon],zoom_start=14,control_scale=True)


for lat,lon,label in zip(active_stations.latitude,active_stations.longitude,active_stations.name):
    folium.Marker(location=[lat,lon],icon=folium.Icon(color='blue', icon='cloud'),
                  popup=label).add_to(whole_map)

for lat,lon,label in zip(closed_stations.latitude,closed_stations.longitude,closed_stations.name):
    folium.Marker(location=[lat,lon],icon=folium.Icon(color='red', icon='cloud'),
                  popup=label).add_to(whole_map)

for lat,lon,label in zip(moved_stations.latitude,moved_stations.longitude,moved_stations.name):
    folium.Marker(location=[lat,lon],icon=folium.Icon(color='green', icon='cloud'),
                  popup=label).add_to(whole_map)

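# acl_stations is a DataFrame, so loop over its rows like the other groups.
# Passing the Series directly and reusing `label` from the previous loop gives the wrong popup.
for lat,lon,label in zip(acl_stations.latitude,acl_stations.longitude,acl_stations.name):
    folium.Marker(location=[lat,lon],icon=folium.Icon(color='pink', icon='cloud'),
                  popup=label).add_to(whole_map)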

whole_map

From this map, it can be seen that most of the active stations are concentrated downtown, which lies on the north bank of the Colorado River. Most of them are placed along streets and at street intersections. They avoid US 290, which is a highway; placing stations there would waste resources. In addition, some stations are near public facilities, such as schools and Capitol Square. **(The map cannot be rendered in the .ipynb view on GitHub, so I uploaded a screenshot: map of all the stations.jpeg)**

As for the moved stations, three of them are located near other stations in the dense downtown region, and the other two are located in a region with fewer stations. One possible reason for moving them is to avoid oversupply.
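
The claim that a moved station sits close to another station is read off the map by eye. One way to put a number on it is the great-circle (haversine) distance to the nearest active station. The sketch below is an illustration only, not part of the original analysis: it uses just the three stations whose coordinates appear in `stations.head()`, and `haversine_km` is a helper defined here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates taken from stations.head() above.
moved = ("Nueces @ 3rd", 30.26694, -97.74939)
active = [
    ("West & 6th St.", 30.27041, -97.75046),
    ("Barton Springs Pool", 30.26452, -97.7712),
]

# Pick the active station with the smallest distance to the moved one.
nearest_name, nearest_km = min(
    ((name, haversine_km(moved[1], moved[2], lat, lon)) for name, lat, lon in active),
    key=lambda pair: pair[1],
)
print(f"{moved[0]} -> {nearest_name}: {nearest_km:.2f} km")
```

For this pair the nearest active station is a few hundred metres away. Running the same calculation over all moved and active stations would show whether the pattern holds for the other moved stations.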

We can then focus on the active stations and use the **heat map** to reveal the distribution of active stations.

In [20]:
active_stations.describe() #still use the mean of latitude and longitude as the center of this map.

Unnamed: 0,latitude,longitude,station_id
count,56.0,56.0,56.0
mean,30.266038,-97.743419,2794.982143
std,0.007607,0.012476,567.016193
min,30.24891,-97.7712,1001.0
25%,30.260287,-97.74983,2538.75
50%,30.266655,-97.7432,2567.5
75%,30.26971,-97.738885,3293.25
max,30.28576,-97.71007,3687.0


In [21]:
active_map=folium.Map(location=[30.266038,-97.743419],zoom_start=13,tiles='Stamen Toner', no_touch=True)
HeatMap(data=active_stations[['latitude','longitude']],radius=15).add_to(active_map)
active_map

From this heat map, it can be seen that most of the active stations are concentrated downtown, which lies on the north bank of the river. In addition, there are two high-density regions on the south bank. **(The map cannot be rendered in the .ipynb view on GitHub, so I uploaded a screenshot: heat map of active stations.jpeg)**