# Dublinbikes Project
<img src="https://www.dublinbikes.ie/assets/img/home/accueil-une-bg.jpg" alt="A row of Dublin Bikes in front of Dublin's Custom House" width="800"/>

This project is undertaken for the [Programming for Data Analytics](https://www.atu.ie/courses/higher-diploma-in-science-data-analytics#:~:text=Programming%20for%20Data%20Analytics) module as part of the [Higher Diploma in Science in Data Analytics](https://www.atu.ie/courses/higher-diploma-in-science-data-analytics) at ATU.

The brief for this project was quite broad:

> Write a notebook that demonstrates what you have learned in the Module, if you can not think of an area you wish to explore, then create a project that analyses windspeed for windfarms.

As I have an interest in public transport, library-type sharing systems, and the movement towards better use of public service data, I have decided to interrogate some of the freely available data for the [Dublinbikes](https://www.dublinbikes.ie/) bike-sharing scheme.

```python
import pandas as pd  # for data manipulation
import matplotlib.pyplot as plt  # for plotting
import seaborn as sns  # for visualisations
import folium as fl  # for mapping
from datetime import datetime

# set the Seaborn visual theme for plots
sns.set_theme()
```

## Overview

```python
# read the CSV into a dataframe and get a quick overview
df = pd.read_csv("https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/6bad1ee7-2c2b-4a52-9567-db7445fc64ff/download/dublinbike-historical-data-2024-01.csv")
df.head()
```


```python
# standardise station names to title case, then summarise the dataframe
df['NAME'] = df['NAME'].str.title()
df.info()
```
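One caveat with `str.title()`: it treats an apostrophe as a word boundary, so a name containing a possessive comes out with a stray capital letter. A minimal sketch of the difference, using hypothetical upper-case names rather than rows read from the dataset, compares it with the standard library's `string.capwords()`, which splits on whitespace only:

```python
import string

import pandas as pd

# hypothetical upper-case station names (illustration only, not read from the CSV)
names = pd.Series(["ST. STEPHEN'S GREEN EAST", "SMITHFIELD NORTH"])

# str.title() capitalises the letter after the apostrophe
titled = names.str.title().tolist()
print(titled)  # ["St. Stephen'S Green East", 'Smithfield North']

# string.capwords() capitalises each whitespace-separated word instead
capworded = names.map(string.capwords).tolist()
print(capworded)  # ["St. Stephen's Green East", 'Smithfield North']
```

In the notebook this would become `df['NAME'] = df['NAME'].map(string.capwords)`. Note that `capwords` also collapses runs of spaces into a single space, which is usually harmless for station names.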

The data contains 11 columns:
1. **STATION ID**: a unique identifying number for each Dublinbikes station
2. **TIME**: a timestamp for the data
3. **LAST UPDATED**: when the data was last updated
4. **NAME**: the station name
5. **BIKE_STANDS**: the total number of bike stands at the station
6. **AVAILABLE_BIKE_STANDS**: the number of stands available to deposit a bike
7. **AVAILABLE_BIKES**: the number of bikes available to borrow
8. **STATUS**: the open/closed status of the station
9. **ADDRESS**: the address at which the station is located
10. **LATITUDE**: the station's latitude
11. **LONGITUDE**: the station's longitude

The <code>info()</code> overview shows that there are no null values in the dataset, which makes analysing the data more straightforward. I won't always be this lucky: if I come across null values in another analysis, I will need to choose an approach that suits the nature of the dataset, such as removing the affected rows entirely or interpolating each missing value from the values either side of it. Null values need to be considered in context each time.
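The two approaches mentioned above can be sketched on a small, made-up frame (the station IDs and bike counts below are invented for illustration, not taken from the Dublinbikes data). `isna().sum()` counts the nulls in each column, `dropna()` removes incomplete rows, and `interpolate()` fills each gap linearly from its neighbours. Grouping by station before interpolating is a design choice that stops one station's readings from being used to fill another's.

```python
import numpy as np
import pandas as pd

# made-up readings for two hypothetical stations, each with one missing value
readings = pd.DataFrame({
    "STATION ID": [1, 1, 1, 2, 2, 2],
    "AVAILABLE_BIKES": [10, np.nan, 14, 5, np.nan, 9],
})

# count the null values in each column
null_counts = readings.isna().sum()
print(null_counts)

# option 1: drop any row that contains a null
dropped = readings.dropna()

# option 2: fill each gap linearly from the values either side,
# interpolating within each station so values never leak across stations
interpolated = readings.assign(
    AVAILABLE_BIKES=readings.groupby("STATION ID")["AVAILABLE_BIKES"].transform(
        lambda s: s.interpolate()
    )
)
print(interpolated)
```

Interpolated bike counts can be fractional in general, so rounding may be needed before treating them as whole bikes.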

```python
# parse the TIME strings into datetime values and confirm the new dtype
df['TIME'] = pd.to_datetime(df['TIME'], format='%d-%b-%Y %H:%M')
df['TIME'].dtype
```
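The `format` string tells pandas exactly how each `TIME` value is laid out: `%d` is the day, `%b` the abbreviated month name (English names under the default locale), `%Y` the four-digit year and `%H:%M` the 24-hour time. A minimal sketch on hypothetical sample strings, assumed to follow that layout, shows the parsed result and how `errors='coerce'` turns any value that does not match into `NaT` instead of raising an error, which is a quick way to count malformed timestamps:

```python
import pandas as pd

FORMAT = "%d-%b-%Y %H:%M"  # day-abbreviated month-year hour:minute

# hypothetical TIME strings; the last one deliberately does not match FORMAT
sample = pd.Series(["01-Jan-2024 00:00", "31-Jan-2024 23:30", "2024-01-31 23:30"])

# non-matching values become NaT rather than raising a ValueError
parsed = pd.to_datetime(sample, format=FORMAT, errors="coerce")
print(parsed)
print("unparseable values:", parsed.isna().sum())

# parsed values expose their date parts through the .dt accessor
weekdays = parsed.dt.day_name()
print(weekdays.tolist())
```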

```python
# extract each station's details to their own dataframe,
# sorted by number of bike stands in ascending order
unique_df = df[['NAME', 'BIKE_STANDS', 'LATITUDE', 'LONGITUDE']].drop_duplicates()
unique_df = unique_df.sort_values(by='BIKE_STANDS', ascending=True)

# display the resulting dataframe
unique_df.head()
```
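`drop_duplicates()` removes a row only when all four selected columns match another row, so if a station's stand count or co-ordinates changed at any point during the month, that station would appear more than once in `unique_df`. A minimal sketch on a made-up frame shows a quick check for this and one way to force a single row per station. The station names and values are invented, and `keep='last'` only gives the latest details if the rows are in time order, which is an assumption about the source data.

```python
import pandas as pd

# made-up station details: "Station A" has its stand count change mid-month
stations = pd.DataFrame({
    "NAME": ["Station A", "Station B", "Station A"],
    "BIKE_STANDS": [30, 20, 40],
    "LATITUDE": [53.35, 53.34, 53.35],
    "LONGITUDE": [-6.26, -6.25, -6.26],
})

# drop_duplicates() across all columns keeps both "Station A" rows
unique = stations.drop_duplicates()
print("repeated names:", unique["NAME"].duplicated().any())

# keep only the last row seen for each station name
one_per_station = stations.drop_duplicates(subset="NAME", keep="last")
print(one_per_station)
```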

```python
# show the distribution of bike stands across the stations
plt.figure(figsize=(20, 6))
plt.bar(unique_df['NAME'], unique_df['BIKE_STANDS'], color='#70C1B3', edgecolor='black', linewidth=0.3)
plt.xlabel('Station Name', fontsize=12, weight='bold')
plt.ylabel('# of Bike Stands', fontsize=12, weight='bold')
plt.title('Bike Stands per Station', fontsize=14, weight='bold')
plt.xticks(rotation=45, ha='right', fontsize=6)

plt.show()
```

```python
# create a base map centred on the mean of the station co-ordinates
# (named station_map to avoid shadowing Python's built-in map())
map_centre = [unique_df['LATITUDE'].mean(), unique_df['LONGITUDE'].mean()]
station_map = fl.Map(location=map_centre, zoom_start=14)

# add a pin for each station, labelled with its name and co-ordinates
for _, row in unique_df.iterrows():
    fl.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=f"{row['NAME']}: {row['LATITUDE']}, {row['LONGITUDE']}"
    ).add_to(station_map)

station_map.save('map.html')
station_map
```