<a id="top"></a>

The purpose of this notebook is to create a dataset containing LINEID, ROUTEID, PROGRNUMBER & STOPPOINTID and to gain a better understanding of this data.

***

# Import Packages

In [1]:
import pandas as pd
import sqlite3

***

<a id="contents"></a>
# Contents

- [1. Connect to Database](#connect_to_db)
- [2. Query Database](#query_db)

***

<a id="connect_to_db"></a>
# 1. Connect to Database
[Back to contents](#contents)

In [2]:
# def function to create connection to db
def create_connection(db_file):
    """
    create a database connection to the SQLite database specified by db_file
    :param df_file: database file
    :return: Connection object or None
    """
    conn = None
    try: 
        conn = sqlite3.connect(db_file)
        return conn
    except Error as e:
        print(e)
        
    return conn

In [3]:
# create connection to database
db = '/home/faye/Data-Analytics-CityRoute/dublinbus.db'
conn = create_connection(db)

***

<a id="query_db"></a>
# 2. Query Database
[Back to contents](#contents)

In [4]:
# initialise query
query = """
SELECT DISTINCT T.LINEID, T.ROUTEID, T.DIRECTION, L.PROGRNUMBER, L.STOPPOINTID
FROM leavetimes L, trips T
WHERE L.TRIPID = T.TRIPID
"""

In [5]:
df_query = pd.read_sql(query, conn)

***

<a id="xxx"></a>
# 3. Data Overview
[Back to contents](#contents)

In [6]:
# print number of rows
rows = df_query.shape[0]
print(f"There are {rows} rows in this dataset.")

There are 30649 rows in this dataset.


In [7]:
# print first 5 rows
print("The first 5 rows:")
df_query.head(5)

The first 5 rows:


Unnamed: 0,LINEID,ROUTEID,DIRECTION,PROGRNUMBER,STOPPOINTID
0,14,14_15,1,1,248
1,14,14_15,1,2,249
2,14,14_15,1,3,250
3,14,14_15,1,4,251
4,14,14_15,1,5,252


In [8]:
# print last 5 rows
print("The last 5 rows:")
df_query.tail(5)

The last 5 rows:


Unnamed: 0,LINEID,ROUTEID,DIRECTION,PROGRNUMBER,STOPPOINTID
30644,41X,41X_131,1,6,772
30645,41X,41X_131,1,7,773
30646,46A,46A_65,2,16,7688
30647,41,41_22,1,28,7685
30648,41B,41B_58,1,29,3671


In [9]:
# print feature datatypes
print("Feature datatypes")
print("-"*25)
print(df_query.dtypes)

Feature datatypes
-------------------------
LINEID         object
ROUTEID        object
DIRECTION       int64
PROGRNUMBER    object
STOPPOINTID    object
dtype: object


In [10]:
# change datatype of PROGRNUMBER
df_query['PROGRNUMBER'] = df_query['PROGRNUMBER'].astype('int64')
print(f"The datatype of PROGRNUMBER is now {df_query['PROGRNUMBER'].dtype}")

The datatype of PROGRNUMBER is now int64


In [43]:
# print the number of routes for each line
lines = df_query['LINEID'].unique()
lines.sort()
print("The number of routes for each line")
print("-"*50)
routes_total = 0
for line in lines:
    routes = df_query[df_query['LINEID'] == line]['ROUTEID'].unique()
    print(f"Line {line:3} has {len(routes):2} routes.")
    routes_total += len(routes)

print()
print("-"*50)
print(f"There is an average of {routes_total // len(lines)} routes for each line.")

The number of routes for each line
--------------------------------------------------
Line 1   has  5 routes.
Line 102 has  3 routes.
Line 104 has  2 routes.
Line 11  has  4 routes.
Line 111 has  4 routes.
Line 114 has  2 routes.
Line 116 has  2 routes.
Line 118 has  2 routes.
Line 120 has  7 routes.
Line 122 has  8 routes.
Line 123 has  4 routes.
Line 13  has 18 routes.
Line 130 has  2 routes.
Line 14  has  4 routes.
Line 140 has  5 routes.
Line 142 has  5 routes.
Line 145 has 15 routes.
Line 14C has  3 routes.
Line 15  has  6 routes.
Line 150 has  3 routes.
Line 151 has  5 routes.
Line 15A has  3 routes.
Line 15B has  4 routes.
Line 15D has  3 routes.
Line 16  has  5 routes.
Line 161 has  3 routes.
Line 16C has  4 routes.
Line 16D has  1 routes.
Line 17  has 10 routes.
Line 17A has  8 routes.
Line 18  has  3 routes.
Line 184 has  2 routes.
Line 185 has 14 routes.
Line 220 has  4 routes.
Line 236 has  2 routes.
Line 238 has  3 routes.
Line 239 has  2 routes.
Line 25  has  2 routes.
Li

In [39]:
# print details for each line
lines = df_query.sort_values('LINEID')['LINEID'].unique()
for line in lines:
    print()
    print(f"Line {line:3}")
    
    routes = df_query[df_query['LINEID'] == line]['ROUTEID'].unique()
    num_routes = len(routes)
    print(f"\tThis line has {num_routes} routes.")
    
    for route in routes:
        #print()
        print(f"\tRoute: {route:6}")
        print(f"\t\tDirection: {df_query[df_query['ROUTEID'] == route]['DIRECTION'].unique()}")
        stops = df_query[df_query['ROUTEID'] == '14_15']['STOPPOINTID'].to_list()
        print(f"\t\tThis routes has {len(stops):2} stops")
        #print(f"\t\tStops: {stops}")
    
    print("~"*50)


Line 1  
	This line has 5 routes.
	Route: 1_40  
		Direction: [2]
		This routes has 76 stops
	Route: 1_37  
		Direction: [1]
		This routes has 76 stops
	Route: 1_39  
		Direction: [1]
		This routes has 76 stops
	Route: 1_41  
		Direction: [2]
		This routes has 76 stops
	Route: 1_38  
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 102
	This line has 3 routes.
	Route: 102_10
		Direction: [2]
		This routes has 76 stops
	Route: 102_8 
		Direction: [1]
		This routes has 76 stops
	Route: 102_9 
		Direction: [2]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 104
	This line has 2 routes.
	Route: 104_15
		Direction: [1]
		This routes has 76 stops
	Route: 104_16
		Direction: [2]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 11 
	This line has 4 routes.
	Route: 11_42 
		Direction: [2]
		This routes has 76 stops
	Route: 11_40 
		Direction: [1]
		This routes has 76 stops

		Direction: [1]
		This routes has 76 stops
	Route: 15B_61
		Direction: [2]
		This routes has 76 stops
	Route: 15B_56
		Direction: [1]
		This routes has 76 stops
	Route: 15B_64
		Direction: [2]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 15D
	This line has 3 routes.
	Route: 15D_62
		Direction: [1]
		This routes has 76 stops
	Route: 15D_63
		Direction: [2]
		This routes has 76 stops
	Route: 15D_65
		Direction: [2]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 16 
	This line has 5 routes.
	Route: 16_20 
		Direction: [1]
		This routes has 76 stops
	Route: 16_24 
		Direction: [2]
		This routes has 76 stops
	Route: 16_21 
		Direction: [1]
		This routes has 76 stops
	Route: 16_23 
		Direction: [1]
		This routes has 76 stops
	Route: 16_22 
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 161
	This line has 3 routes.
	Route: 161_51
		Direction: [2]
		This routes has

		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 31D
	This line has 2 routes.
	Route: 31D_51
		Direction: [2]
		This routes has 76 stops
	Route: 31D_50
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 32 
	This line has 2 routes.
	Route: 32_58 
		Direction: [2]
		This routes has 76 stops
	Route: 32_57 
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 32X
	This line has 3 routes.
	Route: 32X_74
		Direction: [1]
		This routes has 76 stops
	Route: 32X_76
		Direction: [2]
		This routes has 76 stops
	Route: 32X_77
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 33 
	This line has 7 routes.
	Route: 33_70 
		Direction: [2]
		This routes has 76 stops
	Route: 33_44 
		Direction: [1]
		This routes has 76 stops
	Route: 33_45 
		Direction: [1]
		This routes has 76 stops
	Route: 33_72 
		Direction: [2]
		This 

		Direction: [2]
		This routes has 76 stops
	Route: 41X_126
		Direction: [2]
		This routes has 76 stops
	Route: 41X_123
		Direction: [1]
		This routes has 76 stops
	Route: 41X_122
		Direction: [1]
		This routes has 76 stops
	Route: 41X_130
		Direction: [1]
		This routes has 76 stops
	Route: 41X_129
		Direction: [1]
		This routes has 76 stops
	Route: 41X_131
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 42 
	This line has 2 routes.
	Route: 42_44 
		Direction: [2]
		This routes has 76 stops
	Route: 42_42 
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 42D
	This line has 2 routes.
	Route: 42D_51
		Direction: [2]
		This routes has 76 stops
	Route: 42D_50
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 43 
	This line has 4 routes.
	Route: 43_85 
		Direction: [1]
		This routes has 76 stops
	Route: 43_89 
		Direction: [2]
		This rout

		Direction: [1]
		This routes has 76 stops
	Route: 68_66 
		Direction: [2]
		This routes has 76 stops
	Route: 68_82 
		Direction: [2]
		This routes has 76 stops
	Route: 68_78 
		Direction: [1]
		This routes has 76 stops
	Route: 68_84 
		Direction: [2]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 68A
	This line has 2 routes.
	Route: 68A_87
		Direction: [2]
		This routes has 76 stops
	Route: 68A_86
		Direction: [1]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 68X
	This line has 1 routes.
	Route: 68X_88
		Direction: [2]
		This routes has 76 stops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Line 69 
	This line has 6 routes.
	Route: 69_43 
		Direction: [1]
		This routes has 76 stops
	Route: 69_44 
		Direction: [2]
		This routes has 76 stops
	Route: 69_30 
		Direction: [2]
		This routes has 76 stops
	Route: 69_45 
		Direction: [1]
		This routes has 76 stops
	Route: 69_47 
		Direction: [2]
		This routes has

***

[Back to top](#top)