# CityBikes

**Send a request to CityBikes for the city of your choice.**

In [1]:
import requests

Auckland, NZ has only one bike network. When a city has several networks, zooming in on it on the map shows a bubble for each network, and clicking a bubble shows the stations belonging to that network. For example, Tokyo has 2 and Kyoto has 3. (Most cities have only one.) The API is structured so that the first layer returns all of the networks alphabetically, and inside each network are its stations and their info. Because my city has only one network, I can query that network directly and get all of the information for the city, so I don't need to search by GPS coordinates in this case.
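
To make the two-layer structure concrete, here is a minimal sketch that filters a top-level payload by city. It assumes the top-level response is a dict with a `networks` list whose entries carry an `id` and a `location.city`; `networks_in_city` is a hypothetical helper, and the two `example-*` entries are made up for illustration rather than taken from the API.

```python
def networks_in_city(payload, city):
    """Return the ids of all networks whose location.city matches `city`."""
    wanted = city.casefold()
    return [
        net["id"]
        for net in payload["networks"]
        if net["location"]["city"].casefold() == wanted
    ]


# Hand-written sample in the assumed shape; the "example-*" entries are made up.
sample_payload = {
    "networks": [
        {"id": "belfastbikes-belfast", "location": {"city": "Belfast", "country": "GB"}},
        {"id": "example-net-a", "location": {"city": "Exampleville", "country": "XX"}},
        {"id": "example-net-b", "location": {"city": "Exampleville", "country": "XX"}},
    ]
}

single = networks_in_city(sample_payload, "belfast")        # one network
several = networks_in_city(sample_payload, "Exampleville")  # two networks
print(single)
print(several)
```

With the live API, `payload` would presumably come from calling `.json()` on a request to `https://api.citybik.es/v2/networks`; a city that returns more than one id is the case where a single-network query is not enough.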

'net_name' criteria:
<ul>
<li>based on bubble while zoomed out</li>
<li>all lowercase</li>
<li>connected by '-'</li>
<li>no other punctuation</li>
</ul>
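
The criteria above can be sketched as a small helper. This is only a rough heuristic under the rules listed: `make_net_name` is a hypothetical name, the label passed in is a guess at what the bubble shows, and the result should still be checked against the ids the API actually returns.

```python
import re


def make_net_name(label):
    """Lowercase the label, drop punctuation, and join the words with '-'."""
    # Keep only ASCII letters, digits, whitespace and hyphens (a simplification).
    cleaned = re.sub(r"[^a-z0-9\s-]", "", label.lower())
    # Treat existing hyphens as word breaks, then join every word with '-'.
    return "-".join(cleaned.replace("-", " ").split())


print(make_net_name("BelfastBikes Belfast"))
```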


In [2]:
net_name = "belfastbikes-belfast"
url = "https://api.citybik.es/v2/networks/" + net_name

In [3]:
response = requests.get(url)
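
A slightly more defensive version of this request might look like the sketch below. `fetch_network`, `BASE_URL`, the `get` parameter and the 10-second timeout are my own choices for illustration; the `get` parameter exists only so the function can be exercised without a network connection.

```python
import requests

BASE_URL = "https://api.citybik.es/v2/networks/"


def fetch_network(net_name, get=requests.get, timeout=10):
    """Fetch one network and return the dict stored under the 'network' key."""
    response = get(BASE_URL + net_name, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx reply
    return response.json()["network"]
```

Calling `fetch_network("belfastbikes-belfast")` would then return the inner dict directly, so a mistyped `net_name` fails loudly at the request instead of later during parsing.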

**Parse through the response to get the details you want for the bike stations in that city (latitude, longitude, number of bikes).**

In [4]:
import pandas as pd
import numpy as np 

In [5]:
response.json()

{'network': {'id': 'belfastbikes-belfast',
  'name': 'BelfastBikes',
  'location': {'latitude': 54.5969,
   'longitude': -5.92918,
   'city': 'Belfast',
   'country': 'GB'},
  'href': '/v2/networks/belfastbikes-belfast',
  'company': ['Nextbike GmbH'],
  'system': 'Nextbike',
  'stations': [{'id': '00a175b9694ce13c3b9bc9e1244e1612',
    'name': 'Carlisle Circus',
    'latitude': 54.607397,
    'longitude': -5.937086,
    'timestamp': '2024-10-24T00:31:22.985299Z',
    'free_bikes': 1,
    'empty_slots': 20,
    'extra': {'uid': '20754306',
     'number': '3939',
     'slots': 20,
     'bike_uids': ['830048']}},
   {'id': '036a91020ad7f40ea0d2bfda0fb14400',
    'name': 'City Hall',
    'latitude': 54.597177,
    'longitude': -5.930825,
    'timestamp': '2024-10-24T00:31:22.985383Z',
    'free_bikes': 0,
    'empty_slots': 20,
    'extra': {'uid': '20754767',
     'number': '3902',
     'slots': 20,
     'bike_uids': ['']}},
   {'id': '04764ca50c52190e0e6448114ad094d8',
    'name': 'Brad

In [6]:
pd.DataFrame(response)

Unnamed: 0,0
0,"b'{""network"":{""id"":""belfastbikes-belfast"",""nam..."
1,"b't"",""country"":""GB""},""href"":""/v2/networks/belf..."
2,"b'""00a175b9694ce13c3b9bc9e1244e1612"",""name"":""C..."
3,"b'0:31:22.985299Z"",""free_bikes"":1,""empty_slots..."
4,"b'""id"":""036a91020ad7f40ea0d2bfda0fb14400"",""nam..."
...,...
139,"b'.955373,""timestamp"":""2024-10-24T00:31:22.986..."
140,"b'ts"":6,""bike_uids"":[""830396"",""830257"",""830179..."
141,"b'""latitude"":54.576116,""longitude"":-5.942574,""..."
142,"b'id"":""244525787"",""number"":""3983"",""slots"":20,""..."


Get all of the basic area information on the network level into a dataframe:

In [7]:
location = pd.json_normalize(response.json())
location

Unnamed: 0,network.id,network.name,network.location.latitude,network.location.longitude,network.location.city,network.location.country,network.href,network.company,network.system,network.stations
0,belfastbikes-belfast,BelfastBikes,54.5969,-5.92918,Belfast,GB,/v2/networks/belfastbikes-belfast,[Nextbike GmbH],Nextbike,"[{'id': '00a175b9694ce13c3b9bc9e1244e1612', 'n..."


'network.company' is technically a list, but it holds only one value, so to pull that value out of the list and have it stand on its own, I'm just going to unravel it with .explode().

In [8]:
location['network.company'] = location['network.company'].explode()
location

Unnamed: 0,network.id,network.name,network.location.latitude,network.location.longitude,network.location.city,network.location.country,network.href,network.company,network.system,network.stations
0,belfastbikes-belfast,BelfastBikes,54.5969,-5.92918,Belfast,GB,/v2/networks/belfastbikes-belfast,Nextbike GmbH,Nextbike,"[{'id': '00a175b9694ce13c3b9bc9e1244e1612', 'n..."
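
`.explode()` is convenient here because the list holds exactly one element; with more than one, it would return extra rows that share an index label, and assigning the result back into the column would no longer line up one-to-one. A sketch of an alternative that behaves the same for any list length, using a made-up second row for contrast:

```python
import pandas as pd

# The second row is hypothetical, added only to show the multi-company case.
sample = pd.DataFrame({
    "network.id": ["belfastbikes-belfast", "example-net"],
    "network.company": [["Nextbike GmbH"], ["Example Co", "Example Partner"]],
})

# Join every company name into one string, however many the list holds.
sample["network.company"] = sample["network.company"].map(", ".join)
print(sample)
```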


Now I'm going to split off the stations into their own frame to un-nest, and make a frame that's just the overall info. Gonna do the first layer of normalize at the same time.

In [9]:
belfast = location.drop(columns=['network.stations'])
belfast

Unnamed: 0,network.id,network.name,network.location.latitude,network.location.longitude,network.location.city,network.location.country,network.href,network.company,network.system
0,belfastbikes-belfast,BelfastBikes,54.5969,-5.92918,Belfast,GB,/v2/networks/belfastbikes-belfast,Nextbike GmbH,Nextbike


Actually these columns need better names in a few places.

In [10]:
belfast = belfast.rename(columns={
    'network.location.latitude' : 'latitude',
    'network.location.longitude' : 'longitude',
    'network.location.city' : 'city',
    'network.location.country' : 'country'})
belfast

Unnamed: 0,network.id,network.name,latitude,longitude,city,country,network.href,network.company,network.system
0,belfastbikes-belfast,BelfastBikes,54.5969,-5.92918,Belfast,GB,/v2/networks/belfastbikes-belfast,Nextbike GmbH,Nextbike


In [11]:
station_set = pd.json_normalize(location['network.stations'])
station_set

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,48,49,50,51,52,53,54,55,56,57
0,"{'id': '00a175b9694ce13c3b9bc9e1244e1612', 'na...","{'id': '036a91020ad7f40ea0d2bfda0fb14400', 'na...","{'id': '04764ca50c52190e0e6448114ad094d8', 'na...","{'id': '0a705a961d040d272e9e72b27fd1239d', 'na...","{'id': '0bcde9e962503960bbf3c1940e63be23', 'na...","{'id': '1d32f54e87d456e46654efe7f4d3bb79', 'na...","{'id': '21736d82110ac1d36d2d08877b89d666', 'na...","{'id': '2dde63ecc4e7629be51b8869769355f8', 'na...","{'id': '37bdf55233e68976b8d6e48b05b4870f', 'na...","{'id': '38c115f757ddfb8932019674ceaae244', 'na...",...,"{'id': 'd412a3eb183b7113ff5ec28eb6b562e8', 'na...","{'id': 'd873badf682d23aaa171313c0c6973af', 'na...","{'id': 'dcd2485cbb0056140bab0700d1735d77', 'na...","{'id': 'dce1e20d7e09fcaff649fe373933af8a', 'na...","{'id': 'de687ed226fad2886d641ea708e7282e', 'na...","{'id': 'e50e9ac63d4b7e1ff5e3c6735fd4f6a1', 'na...","{'id': 'e6334864bee916cd07ed9eafb0ab5ec7', 'na...","{'id': 'e7bc242e7b81a9091ab625e9403680fa', 'na...","{'id': 'eb405bd827607e718ed7873e2a88f2f9', 'na...","{'id': 'fc6fb7a4026e37418578fdf54fdcb826', 'na..."


(This is where I realized I had a problem with my original city.)

Ok, so it looks like I only need to do one layer of normalizing for each column. I might eventually apply a count function to the extra.bike_uids column to get the number of bikes, but the actual IDs aren't the important thing, so the column can stay as a list. That way each station is a single row.

In [12]:
pd.json_normalize(station_set[25])

Unnamed: 0,id,name,latitude,longitude,timestamp,free_bikes,empty_slots,extra.uid,extra.number,extra.slots,extra.bike_uids
0,6bf6e588dec71ba0846a28656aa849f5,Falls Road,54.599399,-5.947595,2024-10-24T00:31:22.985464Z,2,5,20755478,3941,8,"[830107, 830707]"
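
The count mentioned above could be sketched like this. `count_bikes` and the `bike_count` column are hypothetical names; the helper skips empty strings because the raw JSON shows a station with `'bike_uids': ['']`, which a plain `len` would count as one bike.

```python
import pandas as pd

# Small sample in the shape of the normalized station rows.
sample = pd.DataFrame({
    "name": ["Falls Road", "City Hall", "Empty List Example"],
    "extra.bike_uids": [["830107", "830707"], [""], []],
})


def count_bikes(uids):
    """Count the non-empty IDs in one station's list."""
    return sum(1 for uid in uids if uid)


sample["bike_count"] = sample["extra.bike_uids"].map(count_bikes)
print(sample[["name", "bike_count"]])
```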


I cannot get normalize using record_path and meta to cooperate, so I have used a while-loop to generate the numbers that are the column names, and then concatenated each resulting single row dataframe onto a placeholder frame. 

In [13]:
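def get_stations(data):
    """Stack the one-row frame from each numbered column of `data` into one frame."""
    count = 0
    target = len(data.axes[1])  # number of columns, i.e. number of stations
    # Placeholder row of zeros so there is a frame to concatenate onto;
    # it ends up as a junk first row that has to be dropped afterwards.
    output = pd.DataFrame(np.array([[0,0,0,0,0,0,0,0,0,0,0]]),
        columns=['id','name','latitude','longitude',
        'timestamp','free_bikes','empty_slots','extra.uid',
        'extra.number','extra.slots','extra.bike_uids'])
    while count < target:
        # Each column holds one station dict; normalize it into a single row.
        convert = pd.json_normalize(data[count])
        count += 1
        output = pd.concat([output, convert])
    return output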
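
As an alternative to the loop, here is a sketch of two ways `pd.json_normalize` can be pointed at the station list directly. The payload is a trimmed, hand-copied sample in the shape of the response shown earlier, not the full data, and the `network.` prefix is my own choice to keep the network-level `id` and `name` from colliding with the station-level ones.

```python
import pandas as pd

# Trimmed sample shaped like response.json(); only two stations, fewer fields.
payload = {
    "network": {
        "id": "belfastbikes-belfast",
        "name": "BelfastBikes",
        "stations": [
            {"id": "00a175b9694ce13c3b9bc9e1244e1612", "name": "Carlisle Circus",
             "free_bikes": 1, "empty_slots": 20,
             "extra": {"uid": "20754306", "slots": 20, "bike_uids": ["830048"]}},
            {"id": "036a91020ad7f40ea0d2bfda0fb14400", "name": "City Hall",
             "free_bikes": 0, "empty_slots": 20,
             "extra": {"uid": "20754767", "slots": 20, "bike_uids": []}},
        ],
    }
}

# Option 1: pass the list of station dicts; nested 'extra' becomes 'extra.*' columns.
stations_only = pd.json_normalize(payload["network"]["stations"])

# Option 2: record_path/meta, carrying the network id and name onto every row.
with_meta = pd.json_normalize(
    payload["network"],
    record_path="stations",
    meta=["id", "name"],
    meta_prefix="network.",
)

print(stations_only.columns.tolist())
print(with_meta[["name", "network.id"]])
```

Either option gives one row per station with a clean 0..n-1 index, so no placeholder row is needed.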



In [14]:
stations = get_stations(station_set)
stations.head()

Unnamed: 0,id,name,latitude,longitude,timestamp,free_bikes,empty_slots,extra.uid,extra.number,extra.slots,extra.bike_uids
0,0,0,0.0,0.0,0,0,0,0,0,0,0
0,00a175b9694ce13c3b9bc9e1244e1612,Carlisle Circus,54.607397,-5.937086,2024-10-24T00:31:22.985299Z,1,20,20754306,3939,20,[830048]
0,036a91020ad7f40ea0d2bfda0fb14400,City Hall,54.597177,-5.930825,2024-10-24T00:31:22.985383Z,0,20,20754767,3902,20,[]
0,04764ca50c52190e0e6448114ad094d8,Bradbury Place,54.588123,-5.935525,2024-10-24T00:31:22.985262Z,2,14,20754036,3924,16,"[830382, 830488]"
0,0a705a961d040d272e9e72b27fd1239d,Botanic Avenue,54.589329,-5.933508,2024-10-24T00:31:22.985244Z,5,8,20753840,3909,16,"[830182, 830146, 830002, 830698, 830639]"


Because I had to have _something_ in the dataframe, there's a junk row at the top to get rid of. I also want the index to match the column numbers the rows came from, which is how each station was numbered inside the database. Because each item was added in order, I can use the reset_index function for that once the junk row is gone. <br>
-> To get a unique index to drop the junk row by, reset_index has to happen twice.

In [15]:
stations = stations.reset_index().drop([0]).reset_index()

In [16]:
stations = stations.drop(['level_0','index'],axis=1)
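
The same cleanup can also be sketched positionally, which avoids the helper columns that the double reset creates. The frame below is a small stand-in for the stacked output, with every row labelled 0 as in the real one.

```python
import pandas as pd

# Stand-in for the stacked frame: placeholder row first, all index labels 0.
stacked = pd.DataFrame(
    {"name": [0, "Carlisle Circus", "City Hall"], "free_bikes": [0, 1, 0]},
    index=[0, 0, 0],
)

# Skip the first row by position, then renumber without keeping the old index.
cleaned = stacked.iloc[1:].reset_index(drop=True)
print(cleaned)
```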

In [17]:
stations.head(2)

Unnamed: 0,id,name,latitude,longitude,timestamp,free_bikes,empty_slots,extra.uid,extra.number,extra.slots,extra.bike_uids
0,00a175b9694ce13c3b9bc9e1244e1612,Carlisle Circus,54.607397,-5.937086,2024-10-24T00:31:22.985299Z,1,20,20754306,3939,20,[830048]
1,036a91020ad7f40ea0d2bfda0fb14400,City Hall,54.597177,-5.930825,2024-10-24T00:31:22.985383Z,0,20,20754767,3902,20,[]


Remove the 'extra.' prefix; it's very confusing. 'extra' is the name of the nested object the values came from, not anything to do with the values themselves. Also gonna call 'id' 'station_id' for my own sanity. Check and handle NaNs.

In [18]:
stations = stations.rename(columns={
    'id' : 'station_id','extra.uid' : 'uid','extra.number' : 'number',
    'extra.slots' : 'slots','extra.bike_uids' : 'bike_uids'})
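
If more `extra.*` fields ever show up, the prefix could be stripped with a pattern instead of listing each column. A minimal sketch on an empty frame with the same column names:

```python
import pandas as pd

sample = pd.DataFrame(
    columns=["id", "name", "extra.uid", "extra.number", "extra.slots", "extra.bike_uids"]
)

# Strip a leading 'extra.' from every column name, then rename 'id' on its own.
sample.columns = sample.columns.str.replace(r"^extra\.", "", regex=True)
sample = sample.rename(columns={"id": "station_id"})
print(sample.columns.tolist())
```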

In [19]:
stations[['empty_slots', 'slots']].isnull().value_counts()

empty_slots  slots
False        False    58
Name: count, dtype: int64

In [20]:
stations.loc[49]

station_id     d873badf682d23aaa171313c0c6973af
name                          Cathedral Gardens
latitude                               54.60368
longitude                             -5.929345
timestamp           2024-10-24T00:31:22.985347Z
free_bikes                                    1
empty_slots                                  16
uid                                    20754597
number                                     3914
slots                                        20
bike_uids                              [830108]
Name: 49, dtype: object

Turns out both of those null values are in the same row, 49, and given that there are no free/available bikes but also no empty slots, I think this station is out of service, so I'm going to drop it entirely. 

In [21]:
stations = stations.drop(49)
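
Dropping by the label 49 depends on which snapshot was pulled (the null check above reports no nulls for this one), so a condition-based drop may be safer. A sketch on made-up data, where the third station is hypothetical and stands in for a row with missing slot counts:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "name": ["Carlisle Circus", "City Hall", "Hypothetical Closed Station"],
    "free_bikes": [1, 0, 0],
    "empty_slots": [20, 20, np.nan],
    "slots": [20, 20, np.nan],
})

# True for any row where either slot column is missing.
missing = sample[["empty_slots", "slots"]].isnull().any(axis=1)

# Keep only the rows with complete slot information.
cleaned = sample[~missing]
print(cleaned)
```

When nothing is missing, `cleaned` is simply the full frame, so the same cell can be rerun on a fresh pull without removing a healthy station.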

In [22]:
stations.head(2)

Unnamed: 0,station_id,name,latitude,longitude,timestamp,free_bikes,empty_slots,uid,number,slots,bike_uids
0,00a175b9694ce13c3b9bc9e1244e1612,Carlisle Circus,54.607397,-5.937086,2024-10-24T00:31:22.985299Z,1,20,20754306,3939,20,[830048]
1,036a91020ad7f40ea0d2bfda0fb14400,City Hall,54.597177,-5.930825,2024-10-24T00:31:22.985383Z,0,20,20754767,3902,20,[]


**Put your parsed results into a DataFrame.**

In [26]:
belfast #network meta; not needed for further analysis

Unnamed: 0,network.id,network.name,latitude,longitude,city,country,network.href,network.company,network.system
0,belfastbikes-belfast,BelfastBikes,54.5969,-5.92918,Belfast,GB,/v2/networks/belfastbikes-belfast,Nextbike GmbH,Nextbike


In [24]:
stations

Unnamed: 0,station_id,name,latitude,longitude,timestamp,free_bikes,empty_slots,uid,number,slots,bike_uids
0,00a175b9694ce13c3b9bc9e1244e1612,Carlisle Circus,54.607397,-5.937086,2024-10-24T00:31:22.985299Z,1,20,20754306,3939,20,[830048]
1,036a91020ad7f40ea0d2bfda0fb14400,City Hall,54.597177,-5.930825,2024-10-24T00:31:22.985383Z,0,20,20754767,3902,20,[]
2,04764ca50c52190e0e6448114ad094d8,Bradbury Place,54.588123,-5.935525,2024-10-24T00:31:22.985262Z,2,14,20754036,3924,16,"[830382, 830488]"
3,0a705a961d040d272e9e72b27fd1239d,Botanic Avenue,54.589329,-5.933508,2024-10-24T00:31:22.985244Z,5,8,20753840,3909,16,"[830182, 830146, 830002, 830698, 830639]"
4,0bcde9e962503960bbf3c1940e63be23,Europa Bus Station / Blackstaff Square,54.594821,-5.933116,2024-10-24T00:31:22.985446Z,5,6,20755396,3910,12,"[830389, 830073, 830690, 830634, 830625]"
5,1d32f54e87d456e46654efe7f4d3bb79,Gasworks (Cromac Street),54.592172,-5.925547,2024-10-24T00:31:22.985497Z,8,6,20755616,3907,12,"[83475, 830316, 830277, 830267, 830685, 830655..."
6,21736d82110ac1d36d2d08877b89d666,Titanic Train Halt/ Sydenham Road,54.60342,-5.906825,2024-10-24T00:31:22.986085Z,11,10,80624568,3949,20,"[830424, 830411, 830379, 830280, 830276, 83024..."
7,2dde63ecc4e7629be51b8869769355f8,Castlereagh Road,54.586112,-5.891912,2024-10-24T00:31:22.986399Z,13,1,254962866,3901,12,"[830440, 830407, 830403, 830395, 830191, 83014..."
8,37bdf55233e68976b8d6e48b05b4870f,Ormeau Road / Rosetta Roundabout,54.572534,-5.915076,2024-10-24T00:31:22.985800Z,14,2,20756242,3930,16,"[83471, 830386, 830171, 830131, 830007, 830004..."
9,38c115f757ddfb8932019674ceaae244,Millfield / Divis Street,54.599697,-5.936308,2024-10-24T00:31:22.985723Z,6,9,20756044,3920,18,"[830320, 830080, 830074, 830052, 830699, 830570]"


I think I might want to use bike usage in some way, so I am saving the data from the '8:10 pm on a Saturday' call, just in case.

In [25]:
# stations.to_csv('bike_stations.csv')
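
One thing to watch when saving: a CSV stores the `bike_uids` lists as plain text, so they come back as strings unless they are parsed on load. A sketch of a round trip, using an in-memory buffer in place of the file and a two-row sample in place of the full frame:

```python
import ast
import io

import pandas as pd

sample = pd.DataFrame({
    "name": ["Carlisle Circus", "City Hall"],
    "free_bikes": [1, 0],
    "bike_uids": [["830048"], []],
})

buffer = io.StringIO()  # stands in for a file such as 'bike_stations.csv'
sample.to_csv(buffer, index=False)  # index=False avoids an extra unnamed column
buffer.seek(0)

# Turn the text "['830048']" back into a real list while reading.
restored = pd.read_csv(buffer, converters={"bike_uids": ast.literal_eval})
print(restored)
```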