# CityBikes

Send a request to CityBikes for the city of your choice. 

In [5]:
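import pandas as pd
import requests
stationsurl = "http://api.citybik.es/v2/networks/broward"
# A timeout keeps the cell from hanging if the API does not respond
stations_r = requests.get(stationsurl, timeout=10)
# Fail early on an HTTP error instead of parsing an error page
stations_r.raise_for_status()
# Decode the JSON body into a dict without overwriting the Response.json method
stations_json = stations_r.json()

Parse through the response to get the details you want for the bike stations in that city (latitude, longitude, number of bikes).

In [6]:
stations_parsed_data = []
for info in stations_json['network']['stations']: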
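    stations_info_data = {
        "name": info['name'],
        "latitude": info['latitude'],
        "longitude": info['longitude'],
        "free_bikes": info['free_bikes'],
        "empty_slots": info['empty_slots'],
        # Total docks at the station: available bikes plus empty slots
        "total_bikes": info['empty_slots'] + info['free_bikes'],
        # Fraction of docks that are empty (0 to 1); None if the station reports no docks
        "usage_percentage": (
            info['empty_slots'] / (info['empty_slots'] + info['free_bikes'])
            if (info['empty_slots'] + info['free_bikes']) > 0
            else None
        ),
    }
    stations_parsed_data.append(stations_info_data)
# Display the parsed list once, after the loop has finished
stations_parsed_data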

Put your parsed results into a DataFrame.

In [8]:
stations_df = pd.DataFrame(stations_parsed_data)
stations_df

| | name | latitude | longitude | free_bikes | empty_slots | total_bikes | usage_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Hollywood North Beach | 26.03444 | -80.11455 | 4 | 7 | 11 | 0.636364 |
| 1 | Earl Lifshey 3.0 | 26.16739 | -80.10032 | 7 | 5 | 12 | 0.416667 |
| 2 | Atlantic & Briny | 26.23237 | -80.08965 | 4 | 5 | 9 | 0.555556 |
| 3 | Esplanade Park | 26.12026 | -80.14819 | 6 | 5 | 11 | 0.454545 |
| 4 | Bayshore & A1A 3.0 | 26.12807 | -80.10368 | 11 | 6 | 17 | 0.352941 |
| 5 | Holiday Park at Sunrise | 26.13707 | -80.12981 | 6 | 5 | 11 | 0.454545 |
| 6 | Broadwalk at Jefferson St. | 26.00656 | -80.116 | 11 | 6 | 17 | 0.352941 |
| 7 | Las Olas Beach Park | 26.11837 | -80.10492 | 7 | 7 | 14 | 0.5 |
| 8 | Las Olas & SE 9th Ave | 26.11954 | -80.13407 | 11 | 0 | 11 | 0.0 |
| 9 | George English Park | 26.13813 | -80.1157 | 6 | 5 | 11 | 0.454545 |
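
As a possible alternative to the explicit loop, the same columns can be built with `pd.json_normalize` and vectorized arithmetic. This is only a sketch: `sample_payload` and `sketch_df` are illustrative names, and the payload is a hand-written stand-in shaped like the response parsed above, reusing two stations from the output so it runs without a network call.

```python
import pandas as pd

# Hand-written stand-in for the API response (assumption: same nesting as
# the 'network' -> 'stations' structure parsed in the loop above).
sample_payload = {
    "network": {
        "stations": [
            {"name": "Hollywood North Beach", "latitude": 26.03444,
             "longitude": -80.11455, "free_bikes": 4, "empty_slots": 7},
            {"name": "Las Olas & SE 9th Ave", "latitude": 26.11954,
             "longitude": -80.13407, "free_bikes": 11, "empty_slots": 0},
        ]
    }
}

# Flatten the list of station dicts and keep only the fields of interest.
columns = ["name", "latitude", "longitude", "free_bikes", "empty_slots"]
sketch_df = pd.json_normalize(sample_payload["network"]["stations"])[columns]

# Derived columns computed for all rows at once.
sketch_df["total_bikes"] = sketch_df["empty_slots"] + sketch_df["free_bikes"]
sketch_df["usage_percentage"] = sketch_df["empty_slots"] / sketch_df["total_bikes"]

print(sketch_df)
```

With pandas, a station reporting zero docks would produce `NaN` in `usage_percentage` rather than raising an error, which may or may not be the behavior you want.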


In [13]:
# I'm saving the dataframe to CSV for the next section
stations_df.to_csv('stations.csv', index=False)
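
A short sketch of why `index=False` matters when the CSV is read back later. It uses an in-memory buffer instead of a file, and `demo_df` is a one-row illustrative frame, not the real station data.

```python
import io
import pandas as pd

# One-row illustrative frame (values taken from the station table above).
demo_df = pd.DataFrame(
    {"name": ["Esplanade Park"], "free_bikes": [6], "empty_slots": [5]}
)

# Writing the index adds an unnamed first column, which read_csv
# labels "Unnamed: 0" when the data is loaded again.
with_index = pd.read_csv(io.StringIO(demo_df.to_csv()))

# With index=False the column list round-trips unchanged.
without_index = pd.read_csv(io.StringIO(demo_df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```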

## Descriptive statistics

In [9]:
print(stations_df.describe())

        latitude  longitude  free_bikes  empty_slots  total_bikes  \
count  20.000000  20.000000   20.000000    20.000000    20.000000   
mean   26.125372 -80.112819    6.350000     5.600000    11.950000   
std     0.047343   0.015257    2.476734     2.087557     2.187885   
min    26.006560 -80.148190    3.000000     0.000000     9.000000   
25%    26.114793 -80.117942    4.750000     5.000000    11.000000   
50%    26.122285 -80.106615    6.000000     5.000000    11.000000   
75%    26.137867 -80.103572    7.250000     6.000000    13.000000   
max    26.232370 -80.089650   11.000000    10.000000    17.000000   

       usage_percentage  
count         20.000000  
mean           0.472156  
std            0.165164  
min            0.000000  
25%            0.403409  
50%            0.454545  
75%            0.547980  
max            0.769231  


## Checking types of data

In [10]:
stations_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              20 non-null     object 
 1   latitude          20 non-null     float64
 2   longitude         20 non-null     float64
 3   free_bikes        20 non-null     int64  
 4   empty_slots       20 non-null     int64  
 5   total_bikes       20 non-null     int64  
 6   usage_percentage  20 non-null     float64
dtypes: float64(3), int64(3), object(1)
memory usage: 1.2+ KB


## Data cleaning - Checking for duplicates

In [6]:
# For a DataFrame this small, duplicates could be spotted by eye, but here is the code anyway

In [11]:
print(stations_df.duplicated().sum())

0


In [8]:
# No duplicates found, so there are none to drop
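
Note that `duplicated()` with no arguments compares entire rows, so a station listed twice with different bike counts would not be flagged. The sketch below shows one way to check on identifying columns only. The data is hypothetical: the second row is a made-up repeat added purely for illustration, and the choice of key columns is an assumption.

```python
import pandas as pd

# Hypothetical data: the second row repeats the first station
# but with a different bike count.
demo_df = pd.DataFrame({
    "name": ["Esplanade Park", "Esplanade Park", "Las Olas Beach Park"],
    "latitude": [26.12026, 26.12026, 26.11837],
    "longitude": [-80.14819, -80.14819, -80.10492],
    "free_bikes": [6, 5, 7],
})

key_columns = ["name", "latitude", "longitude"]

# Full-row comparison misses the repeat because free_bikes differs.
full_row_dupes = int(demo_df.duplicated().sum())

# Comparing only the identifying columns catches it.
key_dupes = int(demo_df.duplicated(subset=key_columns).sum())

# Keep the first occurrence of each station.
deduped = demo_df.drop_duplicates(subset=key_columns, keep="first")

print(full_row_dupes, key_dupes, len(deduped))
```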

## Data cleaning - Checking for Null Values

In [12]:
print(stations_df.isnull().sum())

name                0
latitude            0
longitude           0
free_bikes          0
empty_slots         0
total_bikes         0
usage_percentage    0
dtype: int64


In [11]:
# No null values either, so there are none to drop
# We won't check for outliers, as that's not relevant for station data