# Predicting Bike Rentals

The data we will use in this chapter is used with the permission of Capital Bikeshare. You can download the data from [their website](https://www.capitalbikeshare.com/system-data). We are using a prepared version of this data that has already been augmented with additional weather data.

Predicting bike rental trends is very important from both an operational and planning perspective. Bikeshare companies need to stay up to date on rental trends to know where they should add new facilities, and how to reposition bikes to get them to the locations with the highest demand. They do not want to wait until all of the bikes are rented at a particular location before moving additional bikes into position, as that is lost revenue for them.

A lot of the data that we interact with today is stored in databases. For example:

* Student records, including grades, at a school
* Posts and friends in your favorite social network
* News stories on a newspaper’s website
* Your contacts list on your mobile phone
* All images that make up Google Maps

All these bits of information are stored in various kinds of databases. Some are stored in relational databases, which include open source tools such as PostgreSQL, MySQL, and SQLite, as well as commercial products such as Google BigQuery, Oracle, Microsoft SQL Server, and Amazon Aurora.

Others are stored in proprietary systems like Google’s BigTable or Facebook’s Haystack Object Store.

# 1.1. Query Language

Whatever the database is, there needs to be a way to extract data from it and a lot of these systems have agreed on a shared language for accessing data. For relational databases, this language is called SQL (Structured Query Language, pronounced like “sequel”).

Before you stress out about learning a new language, let’s take a minute to review the things you have already learned how to do with pandas.

* You can change the shape of a DataFrame by selecting the columns you want or by computing new columns.
* You can filter a DataFrame by using conditions to select just the rows you want.
* You can reorder a DataFrame by sorting on one or more columns.
* You can group by one or more columns and compute aggregate summaries of other columns in the group.
* You can join two DataFrames together using the merge function.

The operations just described comprise a basic set of tools that any data manipulation language should support, and SQL supports all of them in a very natural way. You are not going to have to learn any new concepts in this chapter; you are just learning some new query syntax that will open up whole new worlds of data access for you. Most businesses run on a relational database of some kind, so a lot of real-world data analysis requires you to get data from one. In this section we will teach you how to get started.
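
As a rough illustration of how the DataFrame operations listed above map onto SQL clauses, the sketch below runs a single query against a small in-memory SQLite table using Python’s built-in sqlite3 module. The `trips` table, its column names, and its five rows are assumptions made for this example (it is not the chapter’s database), and the pandas expressions in the comments are loose analogies rather than exact equivalents.

```python
import sqlite3

# Toy in-memory table; the table name and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (member_type TEXT, start_station_id INTEGER, duration INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("Member", 31245, 475),
        ("Casual", 31400, 1162),
        ("Member", 31400, 1145),
        ("Member", 31101, 485),
        ("Member", 31102, 471),
    ],
)

query = """
SELECT member_type,                  -- pick columns: df[["member_type", ...]]
       COUNT(*) AS trip_count,       -- aggregate:    .size() / .agg(...)
       SUM(duration) AS total_seconds
FROM trips
WHERE duration >= 480                -- filter rows:  df[df["duration"] >= 480]
GROUP BY member_type                 -- group:        df.groupby("member_type")
ORDER BY trip_count DESC             -- sort:         df.sort_values(...)
"""
summary = conn.execute(query).fetchall()
print(summary)
conn.close()
```

Each clause is introduced one at a time in the sections that follow.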

# 1.2. Getting started with the bike data

In this lesson, we will get hands-on and try out SQL with the Capital Bikeshare dataset, stored in a SQLite database. In order to run queries on our tables, we must:

* install the SQL magic function
* load the SQL extension
* connect to the database with our data

In [1]:
# install SQL magic function
!pip install ipython-sql



In [2]:
# load extension sql
%load_ext sql

In [3]:
# connect to the database
%sql sqlite:////Users/kkoum/Documents/database.db

'Connected: @/Users/kkoum/Documents/database.db'
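
Outside a notebook, or if the `%sql` magic is unavailable, the same kind of query can be run with Python’s built-in sqlite3 module. The following is a minimal sketch rather than part of the original lesson: it uses a throwaway in-memory database holding a single row and a cut-down set of columns (both assumptions for the example). To work with the real data, you would pass the path of your own database file to `sqlite3.connect` instead of `":memory:"`.

```python
import sqlite3

# ":memory:" creates a temporary database that disappears when closed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trip_data (member_type TEXT, start_date TEXT, duration INTEGER)"
)
conn.execute(
    "INSERT INTO trip_data VALUES ('Member', '2012-01-01 00:04:00', 475)"
)

cursor = conn.execute(
    "SELECT member_type, start_date, duration FROM trip_data LIMIT 5"
)
# cursor.description holds one tuple per result column; item 0 is the name.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(column_names)
print(rows)
conn.close()
```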

## 1.2.1. Verify access to the dataset

Let’s verify that you have access to the dataset by running a simple SQL query.

The code snippet contains a few lines:

* The first line of that code block is just a magic invocation that lets Jupyter know that this cell contains SQL and not Python.
* The second line introduces SQL syntax for the first time.

To help you understand the SQL commands we are using, the SQL keywords are written in CAPITAL letters, while the lowercase words are the names of tables or columns. The SQL statement translates to: grab (SELECT) all the columns (*) from the table called trip_data, but only show me the first five rows (LIMIT 5).

In [4]:
%%sql
SELECT *
FROM trip_data
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


start_date,end_date,duration,start_station_id,start_station,end_station_id,end_station,bike_number,member_type
2012-01-01 00:04:00,2012-01-01 00:11:56,475,31245,7th & R St NW / Shaw Library,31109,7th & T St NW,W01412,Member
2012-01-01 00:10:05,2012-01-01 00:29:28,1162,31400,Georgia & New Hampshire Ave NW,31103,16th & Harvard St NW,W00524,Casual
2012-01-01 00:10:23,2012-01-01 00:29:28,1145,31400,Georgia & New Hampshire Ave NW,31103,16th & Harvard St NW,W00235,Member
2012-01-01 00:15:41,2012-01-01 00:23:46,485,31101,14th & V St NW,31602,Park Rd & Holmead Pl NW,W00864,Member
2012-01-01 00:15:42,2012-01-01 00:23:34,471,31102,11th & Kenyon St NW,31109,7th & T St NW,W00995,Member


The trip_data table is composed of several columns, but we don’t always want to read all the columns in a table. For example, if we just want the member type, start date, and duration (which is stored in seconds), we could select:

In [5]:
%%sql
SELECT member_type, start_date, duration
FROM trip_data
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,start_date,duration
Member,2012-01-01 00:04:00,475
Casual,2012-01-01 00:10:05,1162
Member,2012-01-01 00:10:23,1145
Member,2012-01-01 00:15:41,485
Member,2012-01-01 00:15:42,471


Tip: SQL doesn’t care about line breaks, so we can spread a SQL query over multiple lines to make it easier to read.
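
To illustrate that line breaks do not change the meaning of a query, the sketch below runs the same statement written on one line and spread over several lines. It uses Python’s built-in sqlite3 module and a two-row toy table, both of which are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip_data (member_type TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO trip_data VALUES (?, ?)", [("Member", 475), ("Casual", 1162)]
)

one_line = "SELECT member_type, duration FROM trip_data ORDER BY duration LIMIT 5"
multi_line = """
SELECT member_type, duration
FROM trip_data
ORDER BY duration
LIMIT 5
"""
# Both spellings are the same query, so the results should match.
same_result = conn.execute(one_line).fetchall() == conn.execute(multi_line).fetchall()
print(same_result)
conn.close()
```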

It’s also really easy to forget the exact names of all of the columns in a table, especially when you are just getting started with a new database. Here’s a handy query that will remind you of the names of your tables and all of their columns and types:

In [6]:
%%sql
SELECT name, sql
FROM sqlite_master

 * sqlite:////Users/kkoum/Documents/database.db
Done.


name,sql
stations,"CREATE TABLE ""stations"" ( 	""station_id""	INTEGER, 	""name""	TEXT, 	""capacity""	INTEGER, 	""status""	TEXT, 	""latitude""	REAL, 	""longitude""	REAL )"
trip_data,"CREATE TABLE ""trip_data"" ( 	""start_date""	TEXT, 	""end_date""	TEXT, 	""duration""	INTEGER, 	""start_station_id""	INTEGER, 	""start_station""	TEXT, 	""end_station_id""	INTEGER, 	""end_station""	TEXT, 	""bike_number""	TEXT, 	""member_type""	TEXT )"


Note that this works fine for SQLite but will not work for PostgreSQL, MySQL, or other databases. Each database has its own query for things like this, and once you get more experience you’ll be able to find them easily on the internet.
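
For SQLite specifically, `PRAGMA table_info` is another way to list a table’s columns and types. The sketch below uses Python’s built-in sqlite3 module and a hypothetical, cut-down version of `trip_data` with only three columns, so that it runs without the chapter’s database file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trip_data (start_date TEXT, duration INTEGER, member_type TEXT)"
)

# List the tables recorded in SQLite's catalog.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(trip_data)")]
print(tables)
print(columns)
conn.close()
```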

# 1.3. Filtering

We’ve seen how to look only at certain columns of the table, but it is often useful to look only at certain rows. For example, we could look only at the bike trips that last at least a certain amount of time. Let’s say you’re only interested in bike trips of 60 minutes or more. Because the duration column is stored in seconds, that means a duration of at least 3600:

In [7]:
%%sql
SELECT member_type, start_date, duration
FROM trip_data
WHERE duration >= 3600
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,start_date,duration
Casual,2012-01-01 01:16:16,41427
Casual,2012-01-01 03:08:08,33057
Casual,2012-01-01 08:53:48,4535
Casual,2012-01-01 09:02:07,5588
Casual,2012-01-01 09:22:24,4279


It’s also possible to filter by multiple criteria. For example, we can look at bike trips which took 60 minutes or more and were performed by members:

In [8]:
%%sql
SELECT member_type, start_date, duration
FROM trip_data
WHERE duration >= 3600 AND member_type = 'Member'
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,start_date,duration
Member,2012-01-01 13:23:50,6435
Member,2012-01-01 13:35:32,4690
Member,2012-01-01 14:08:29,4003
Member,2012-01-01 14:17:36,7135
Member,2012-01-01 14:48:25,5590
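
When the filter values come from Python variables, they can be passed as query parameters instead of being pasted into the SQL string. This is a hedged sketch using Python’s built-in sqlite3 module (not the `%%sql` magic used in this lesson) and a small in-memory table filled with a few rows taken from the outputs above; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trip_data (member_type TEXT, start_date TEXT, duration INTEGER)"
)
conn.executemany(
    "INSERT INTO trip_data VALUES (?, ?, ?)",
    [
        ("Member", "2012-01-01 00:04:00", 475),
        ("Casual", "2012-01-01 00:10:05", 1162),
        ("Casual", "2012-01-01 01:16:16", 41427),
        ("Casual", "2012-01-01 08:53:48", 4535),
        ("Member", "2012-01-01 13:23:50", 6435),
        ("Member", "2012-01-01 13:35:32", 4690),
    ],
)

min_seconds = 60 * 60      # 60 minutes expressed in seconds
wanted_type = "Member"

# Each "?" is filled, in order, from the tuple passed as the second argument.
long_member_trips = conn.execute(
    """
    SELECT member_type, start_date, duration
    FROM trip_data
    WHERE duration >= ? AND member_type = ?
    ORDER BY start_date
    """,
    (min_seconds, wanted_type),
).fetchall()
print(long_member_trips)
conn.close()
```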


## 1.3.1. Practice exercises

Figure out how to get all the trips of the bike with id W01274 and only include rides which are shorter than 15 minutes.

In [9]:
%%sql
SELECT *
FROM trip_data
WHERE bike_number = 'W01274' AND duration < 900
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


start_date,end_date,duration,start_station_id,start_station,end_station_id,end_station,bike_number,member_type
2012-01-01 16:26:03,2012-01-01 16:38:13,729,31202,14th & R St NW,31307,3000 Connecticut Ave NW / National Zoo,W01274,Member
2012-01-01 18:04:37,2012-01-01 18:06:41,124,31307,3000 Connecticut Ave NW / National Zoo,31307,3000 Connecticut Ave NW / National Zoo,W01274,Member
2012-01-01 20:45:40,2012-01-01 20:56:18,637,31307,3000 Connecticut Ave NW / National Zoo,31202,14th & R St NW,W01274,Member
2012-01-02 18:24:16,2012-01-02 18:28:35,258,31602,Park Rd & Holmead Pl NW,31401,14th St & Spring Rd NW,W01274,Member
2012-01-03 08:15:45,2012-01-03 08:27:02,677,31401,14th St & Spring Rd NW,31106,Calvert & Biltmore St NW,W01274,Member


Get the ending station and the duration of all the bike trips originating from station with id 31111 that lasted 8 hours or more.

In [10]:
%%sql
SELECT end_station, duration
FROM trip_data
WHERE start_station_id = 31111 AND duration >= 28800
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


end_station,duration
11th & F St NW,39144
14th & V St NW,30128
13th St & New York Ave NW,40131
11th & F St NW,49106
7th & F St NW / National Portrait Gallery,29189


How many trips longer than 5 hours, taken by casual riders, started and ended at the station with id 31111?

In [11]:
%%sql
SELECT COUNT(*) as trips
FROM trip_data
WHERE duration > 18000 AND start_station_id = 31111 AND end_station_id = 31111 AND member_type = 'Casual'

 * sqlite:////Users/kkoum/Documents/database.db
Done.


trips
3


# 1.4. Sorting

So far, we’ve only looked at rows of data in the order in which the query returns them. What if we want to see the rows in a particular order? We can use the ORDER BY clause to sort them by some other criterion.

For example, to see the bike trips ordered by their duration in seconds:

In [12]:
%%sql
SELECT member_type, start_date, duration
FROM trip_data
ORDER BY duration
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,start_date,duration
Member,2012-01-12 13:52:17,60
Member,2012-01-27 09:16:33,60
Member,2012-01-29 12:42:50,60
Member,2012-02-12 11:54:58,60
Member,2012-02-16 22:31:36,60


As it turns out, the default sort order is ascending. To sort the rows in descending order, add the keyword DESC.

In [13]:
%%sql
SELECT member_type, start_date, duration
FROM trip_data
ORDER BY duration DESC
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,start_date,duration
Casual,2012-08-05 10:27:21,86326
Casual,2012-08-05 10:27:59,86303
Casual,2012-06-02 19:24:06,85671
Casual,2012-06-04 18:34:46,85531
Casual,2012-09-01 17:48:38,85430


Of course, we can combine WHERE and ORDER BY to get only the bike trips with member type Casual, ordered by duration.

In [14]:
%%sql
SELECT member_type, start_date, duration
FROM trip_data
WHERE member_type = 'Casual'
ORDER BY duration
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,start_date,duration
Casual,2012-06-16 06:32:35,63
Casual,2012-09-13 14:27:47,63
Casual,2012-03-14 22:16:17,70
Casual,2012-02-03 18:09:48,73
Casual,2012-07-25 22:39:01,75
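
ORDER BY also accepts more than one column, each with its own direction. The sketch below uses Python’s built-in sqlite3 module and a toy in-memory table built from the first five trips shown earlier (an assumption made so the example is self-contained). It sorts by member type and, within each type, by duration from longest to shortest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip_data (member_type TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO trip_data VALUES (?, ?)",
    [("Member", 475), ("Casual", 1162), ("Member", 1145), ("Member", 485), ("Member", 471)],
)

# First key: member_type ascending (the default). Second key: duration descending.
sorted_rows = conn.execute(
    """
    SELECT member_type, duration
    FROM trip_data
    ORDER BY member_type, duration DESC
    """
).fetchall()
print(sorted_rows)
conn.close()
```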


## 1.4.1. Practice exercises

Get the start and end station IDs for bike trips that are 60 minutes or longer, in the order of largest number of seconds first and display the top 20 results.

In [15]:
%%sql
SELECT start_station_id, end_station_id
FROM trip_data
WHERE duration >= 3600
ORDER BY duration DESC
LIMIT 20

 * sqlite:////Users/kkoum/Documents/database.db
Done.


start_station_id,end_station_id
31202,31243
31202,31243
31306,31306
31628,31628
31502,31502
31011,31009
31011,31009
31623,31248
31204,31204
31213,31220


On which bike was the longest bike ride done? How many seconds long was that ride?

In [16]:
%%sql
SELECT bike_number, duration
FROM trip_data
ORDER BY duration DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


bike_number,duration
W00751,86326


What is the starting station and duration of the longest ride starting and ending at the same station?

In [17]:
%%sql
SELECT start_station, duration
FROM trip_data
WHERE start_station = end_station
ORDER BY duration DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


start_station,duration
39th & Calvert St NW / Stoddert,85671


# 1.5. Aggregation or Group By

One very powerful feature of SQL is that it allows us to create summary information by grouping rows together. For example, we could ask ourselves how many bike trips were taken for each member type, or which member type has the most bike trips.

In [18]:
%%sql
SELECT member_type, COUNT(*) AS bike_trips
FROM trip_data
GROUP BY member_type
ORDER BY bike_trips DESC

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,bike_trips
Member,1656252
Casual,372642
Unknown,17


GROUP BY member_type takes all the rows with a given member_type and produces a single row in the result. This means that we need to tell SQL how we want to combine the other columns’ values into a single row. The above example uses COUNT(*) which reports the number of rows that were combined.

Aggregating the values of member_type itself is not hard. Since they are all the same within a group, SQL just gives us a single copy of the member type. For the other columns, we need to either ignore them (causing them to be omitted from the output) or specify a way to aggregate them.

We must specify an aggregate function for any column that we SELECT in our query (except the columns that we’re grouping by) in order for the command to succeed. If we don’t specify a way to aggregate the value, most database servers will complain. SQLite, however, does not: it lets you do silly things without giving you an error. For example, the following query will work, but you have no idea what the results actually mean.

In [19]:
%%sql
SELECT duration, COUNT(*)
FROM trip_data
GROUP BY member_type
ORDER BY COUNT(*) DESC

 * sqlite:////Users/kkoum/Documents/database.db
Done.


duration,COUNT(*)
475,1656252
1162,372642
1181,17


Here you have grouped by member_type, but without member_type in the SELECT clause you have no idea which row corresponds to which member type. Furthermore, the duration value may be the first duration in the group, the last one, or something in between; it is not defined. That is why most databases will flag this query as an error. The best practices for writing GROUP BY queries that work well across database systems are as follows:

* Always include the GROUP BY column(s) in your SELECT clause.

* If your SELECT clause includes a column that is not in the GROUP BY clause, you must apply some form of aggregation to the values in that column, for example MIN, MAX, AVG, or COUNT.
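
A query that follows both guidelines might look like the sketch below. It uses Python’s built-in sqlite3 module and a toy in-memory table built from the first five trips shown earlier (an assumption for the example, not the full dataset): the grouping column appears in the SELECT clause and every other selected value is wrapped in an aggregate function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip_data (member_type TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO trip_data VALUES (?, ?)",
    [("Member", 475), ("Casual", 1162), ("Member", 1145), ("Member", 485), ("Member", 471)],
)

per_type = conn.execute(
    """
    SELECT member_type,                 -- the GROUP BY column is selected
           COUNT(*)      AS trips,      -- everything else is aggregated
           MIN(duration) AS shortest,
           MAX(duration) AS longest,
           AVG(duration) AS average
    FROM trip_data
    GROUP BY member_type
    ORDER BY member_type
    """
).fetchall()
print(per_type)
conn.close()
```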

Let’s go back briefly to the first query in the Aggregation section. The top result was the count of bike trips for member_type Member: 1,656,252.

If you want a more granular breakdown of this count, you can specify multiple columns in the GROUP BY clause. For example, we can break the count down further by start station ID:

In [20]:
%%sql
SELECT member_type, start_station_id, COUNT(*) AS bike_trips
FROM trip_data
WHERE member_type = 'Member'
GROUP BY member_type, start_station_id
ORDER BY COUNT(*) DESC
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,start_station_id,bike_trips
Member,31200,47052
Member,31623,40818
Member,31201,36392
Member,31214,32321
Member,31101,29770


Great! Now that you’re familiar with how to aggregate data using SQL queries with COUNT() as your aggregation function, let’s take a look at other aggregation functions. There are many such functions. Some common ones include:

* SUM: To add the values together
* AVG: To compute the mean of the values
* MIN or MAX: To compute the minimum and maximum respectively

So we could, for example, compute the total number of minutes of all bike trips for each member type:

In [21]:
%%sql
SELECT member_type, SUM(duration)/60 AS total_minutes
FROM trip_data
GROUP BY member_type

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,total_minutes
Casual,15090095
Member,19638594
Unknown,242
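
One detail worth noting about the query above: duration is stored as an INTEGER, so in SQLite `SUM(duration)/60` performs integer division and drops any fractional minutes. Dividing by `60.0` keeps the fraction. The sketch below shows the difference using Python’s built-in sqlite3 module and a toy in-memory table built from the first five trips shown earlier (an assumption for the example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip_data (member_type TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO trip_data VALUES (?, ?)",
    [("Member", 475), ("Casual", 1162), ("Member", 1145), ("Member", 485), ("Member", 471)],
)

minutes = conn.execute(
    """
    SELECT member_type,
           SUM(duration) / 60   AS whole_minutes,  -- integer / integer truncates
           SUM(duration) / 60.0 AS exact_minutes   -- a REAL divisor keeps the fraction
    FROM trip_data
    GROUP BY member_type
    ORDER BY member_type
    """
).fetchall()
for member_type, whole, exact in minutes:
    print(member_type, whole, round(exact, 2))
conn.close()
```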


## 1.5.1. Practice exercises

Compute the average duration of bike trips for each starting station id and list the results in order of highest average to lowest average for the 10 stations with the highest average. What is the highest average duration?

In [22]:
%%sql
SELECT start_station_id, AVG(duration) AS average_duration
FROM trip_data
GROUP BY start_station_id
ORDER BY average_duration DESC
LIMIT 10

 * sqlite:////Users/kkoum/Documents/database.db
Done.


start_station_id,average_duration
31262,5215.625
31248,2842.0650870517698
31247,2589.210461253187
31240,2401.782774390244
31258,2389.185922502666
31041,2330.345104333868
31706,2093.6473988439307
31703,1994.9260869565217
31235,1958.2764060027807
31243,1924.9379058659367


What is the bike_number and count of the bike with the most rides?

In [23]:
%%sql
SELECT bike_number, COUNT(*) AS rides
FROM trip_data
GROUP BY bike_number
ORDER BY rides DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


bike_number,rides
W01414,1949


How many rides in total were taken by Member users and by Casual users?

In [24]:
%%sql
SELECT member_type, COUNT(*) AS total_rides
FROM trip_data
WHERE member_type = 'Casual' OR member_type = 'Member'
GROUP BY member_type

 * sqlite:////Users/kkoum/Documents/database.db
Done.


member_type,total_rides
Casual,372642
Member,1656252


What is the station that has the most rides that start and end at the same station? How many rides started there?

In [25]:
%%sql
SELECT start_station, COUNT(*) AS rides
FROM trip_data
WHERE start_station = end_station
GROUP BY start_station
ORDER BY rides DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


start_station,rides
Jefferson Dr & 14th St SW,4388


# 1.6. Joining

It is frequently the case that the data we need is spread across multiple tables in our database. For example, we might want to store additional information about the starting and ending locations of a ride, besides their IDs, in a table called stations. Here is a look at that table:

In [26]:
%%sql
SELECT *
FROM stations
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


station_id,name,capacity,status,latitude,longitude
31000,Eads St & 15th St S,15,OPEN,38.858971,-77.05323
31001,18th St & S Eads St,11,OPEN,38.85725,-77.05332
31002,Crystal Dr & 20th St S,17,OPEN,38.856425,-77.049232
31003,Crystal Dr & 15th St S,10,OPEN,38.86017,-77.049593
31004,Aurora Hills Cmty Ctr / 18th St & S Hayes St,11,OPEN,38.857866,-77.05949


This means that we now have the data to answer questions like "How many bike trips originated from bike station that’s at Van Ness Metro / UDC?", but the data are spread across two tables.

We could imagine storing all of the station details, such as capacity and status, directly in our trip_data table alongside the start and end station IDs for each trip, but there are a few important reasons why that’s a bad idea:

* We would waste space by duplicating data (This is not a big deal for this example but a real concern for large systems)
* Updating data (for example status of station from OPEN to CLOSED) would require updating each row in trip_data that refers to that station ID. This is time-consuming and error-prone.

Instead we leave the data in two separate tables and need a way to "join" the values together. We can do that by just listing multiple table names but the result is a mess:

In [27]:
%%sql
SELECT *
FROM trip_data, stations
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


start_date,end_date,duration,start_station_id,start_station,end_station_id,end_station,bike_number,member_type,station_id,name,capacity,status,latitude,longitude
2012-01-01 00:04:00,2012-01-01 00:11:56,475,31245,7th & R St NW / Shaw Library,31109,7th & T St NW,W01412,Member,31000,Eads St & 15th St S,15,OPEN,38.858971,-77.05323
2012-01-01 00:04:00,2012-01-01 00:11:56,475,31245,7th & R St NW / Shaw Library,31109,7th & T St NW,W01412,Member,31001,18th St & S Eads St,11,OPEN,38.85725,-77.05332
2012-01-01 00:04:00,2012-01-01 00:11:56,475,31245,7th & R St NW / Shaw Library,31109,7th & T St NW,W01412,Member,31002,Crystal Dr & 20th St S,17,OPEN,38.856425,-77.049232
2012-01-01 00:04:00,2012-01-01 00:11:56,475,31245,7th & R St NW / Shaw Library,31109,7th & T St NW,W01412,Member,31003,Crystal Dr & 15th St S,10,OPEN,38.86017,-77.049593
2012-01-01 00:04:00,2012-01-01 00:11:56,475,31245,7th & R St NW / Shaw Library,31109,7th & T St NW,W01412,Member,31004,Aurora Hills Cmty Ctr / 18th St & S Hayes St,11,OPEN,38.857866,-77.05949


If you look carefully you might notice that the rows are identical for the first few columns and then start to differ. That’s because SQL joins each row in the first table with each row in the second table. With 579 rows in stations and 2,028,911 rows in trip_data, we end up with a table of more than a billion rows!

In [28]:
%%sql
SELECT COUNT(*) AS number_of_rows
FROM trip_data, stations

 * sqlite:////Users/kkoum/Documents/database.db
Done.


number_of_rows
1174739469
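
The row count of such an unfiltered join is simply the product of the two table sizes (here 579 × 2,028,911 = 1,174,739,469). The sketch below reproduces the effect at a small scale with Python’s built-in sqlite3 module; the three trips and two stations are a toy subset chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip_data (start_station_id INTEGER, duration INTEGER)")
conn.execute("CREATE TABLE stations (station_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO trip_data VALUES (?, ?)",
    [(31245, 475), (31400, 1162), (31400, 1145)],
)
conn.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [(31245, "7th & R St NW / Shaw Library"), (31400, "Georgia & New Hampshire Ave NW")],
)

n_trips = conn.execute("SELECT COUNT(*) FROM trip_data").fetchone()[0]
n_stations = conn.execute("SELECT COUNT(*) FROM stations").fetchone()[0]
# Listing both tables with no condition pairs every trip with every station.
n_pairs = conn.execute("SELECT COUNT(*) FROM trip_data, stations").fetchone()[0]
print(n_trips, n_stations, n_pairs)
conn.close()
```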


This is rarely, if ever, what we want. In most cases, we want to match up some aspect of the rows in the first table with some aspect of the rows in the second table, usually based on some column being equal.

In our bike sharing example, the station_id column of stations matches up with the start_station_id or end_station_id column of trip_data. To force this match, we filter out the ones that don’t have the same value for both of these columns.

In [29]:
%%sql
SELECT *
FROM trip_data, stations
WHERE start_station_id = station_id
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


start_date,end_date,duration,start_station_id,start_station,end_station_id,end_station,bike_number,member_type,station_id,name,capacity,status,latitude,longitude
2012-01-01 00:04:00,2012-01-01 00:11:56,475,31245,7th & R St NW / Shaw Library,31109,7th & T St NW,W01412,Member,31245,7th & R St NW / Shaw Library,15,OPEN,38.912719,-77.022155
2012-01-01 00:10:05,2012-01-01 00:29:28,1162,31400,Georgia & New Hampshire Ave NW,31103,16th & Harvard St NW,W00524,Casual,31400,Georgia & New Hampshire Ave NW,19,OPEN,38.93668393,-77.02418089
2012-01-01 00:10:23,2012-01-01 00:29:28,1145,31400,Georgia & New Hampshire Ave NW,31103,16th & Harvard St NW,W00235,Member,31400,Georgia & New Hampshire Ave NW,19,OPEN,38.93668393,-77.02418089
2012-01-01 00:15:41,2012-01-01 00:23:46,485,31101,14th & V St NW,31602,Park Rd & Holmead Pl NW,W00864,Member,31101,14th & V St NW,31,OPEN,38.917931,-77.032112
2012-01-01 00:15:42,2012-01-01 00:23:34,471,31102,11th & Kenyon St NW,31109,7th & T St NW,W00995,Member,31102,11th & Kenyon St NW,27,CLOSED,38.929464,-77.027822


Notice that the result looks more sensible: we end up with one row from trip_data and the corresponding row from stations.

We can check the size of the resulting table by running:

In [30]:
%%sql
SELECT COUNT(*) AS number_of_rows
FROM trip_data, stations
WHERE start_station_id = station_id

 * sqlite:////Users/kkoum/Documents/database.db
Done.


number_of_rows
2005443


You might also see some cases where the comma between the table names is replaced with the keyword JOIN and WHERE is replaced with ON. This is synonymous but sometimes preferred to make it clear that you are joining two tables and that your filters are there to specify how those tables are to be joined.

In [31]:
%%sql
SELECT COUNT(*) AS number_of_rows
FROM trip_data JOIN stations ON start_station_id = station_id

 * sqlite:////Users/kkoum/Documents/database.db
Done.


number_of_rows
2005443
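
Note that the joined count is smaller than the total number of rows in trip_data. One plausible explanation, not verified here, is that some trips have a start_station_id with no matching row in stations, because this kind of join keeps only the pairs that match. The sketch below uses Python’s built-in sqlite3 module and toy tables to show that the comma/WHERE and JOIN/ON forms return the same count, and how a LEFT JOIN can reveal the unmatched rows. The station ID 99999 is hypothetical and exists only to create a trip without a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip_data (start_station_id INTEGER, duration INTEGER)")
conn.execute("CREATE TABLE stations (station_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO trip_data VALUES (?, ?)",
    [(31245, 475), (31400, 1162), (99999, 600)],  # 99999 is a made-up ID
)
conn.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [(31245, "7th & R St NW / Shaw Library"), (31400, "Georgia & New Hampshire Ave NW")],
)

comma_count = conn.execute(
    "SELECT COUNT(*) FROM trip_data, stations WHERE start_station_id = station_id"
).fetchone()[0]
join_count = conn.execute(
    "SELECT COUNT(*) FROM trip_data JOIN stations ON start_station_id = station_id"
).fetchone()[0]

# LEFT JOIN keeps every trip; trips without a station get NULL station columns.
unmatched = conn.execute(
    """
    SELECT start_station_id
    FROM trip_data LEFT JOIN stations ON start_station_id = station_id
    WHERE station_id IS NULL
    """
).fetchall()
print(comma_count, join_count, unmatched)
conn.close()
```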


We can now use all the SQL tools we’ve learned on this combined table. For example, in order to find out which open bike station has the highest bike trip count, so that we can ensure there are always plenty of bikes available there, we can run:

In [32]:
%%sql
SELECT name
FROM trip_data JOIN stations ON start_station_id = station_id
WHERE status = 'OPEN'
GROUP BY name
ORDER BY COUNT(*) DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


name
Massachusetts Ave & Dupont Circle NW
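
Because trip_data refers to stations twice (once for the start and once for the end of a trip), it can be useful to join the stations table twice under different aliases. This goes slightly beyond the queries in this lesson, so treat it as an optional sketch: it uses Python’s built-in sqlite3 module, toy tables with two trips, and the aliases `t`, `s`, and `e`, which are names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trip_data (start_station_id INTEGER, end_station_id INTEGER, duration INTEGER)"
)
conn.execute("CREATE TABLE stations (station_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO trip_data VALUES (?, ?, ?)",
    [(31245, 31109, 475), (31400, 31103, 1162)],
)
conn.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [
        (31245, "7th & R St NW / Shaw Library"),
        (31109, "7th & T St NW"),
        (31400, "Georgia & New Hampshire Ave NW"),
        (31103, "16th & Harvard St NW"),
    ],
)

# "s" is the stations row for the start of the trip, "e" the row for the end.
named_trips = conn.execute(
    """
    SELECT s.name AS start_name, e.name AS end_name, t.duration
    FROM trip_data AS t
    JOIN stations AS s ON t.start_station_id = s.station_id
    JOIN stations AS e ON t.end_station_id = e.station_id
    ORDER BY t.duration
    """
).fetchall()
print(named_trips)
conn.close()
```

Prefixing each column with its alias also removes any ambiguity about which table a column comes from.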


## 1.6.1. Practice exercises

Use JOIN to show, for each open (active) station, the station ID and the average duration of bike trips that start and end at that station and were taken by riders with member type Member.

In [33]:
%%sql
SELECT station_id, AVG(duration) AS average_duration
FROM trip_data JOIN stations ON start_station_id = station_id
WHERE start_station = end_station AND member_type = 'Member' AND status = 'OPEN'
GROUP BY station_id
LIMIT 5

 * sqlite:////Users/kkoum/Documents/database.db
Done.


station_id,average_duration
31000,1295.6571428571428
31001,966.0491803278688
31002,1740.0762711864406
31003,1940.421052631579
31004,1434.1311475409836


What is the name of the station where the most rides start?

In [34]:
%%sql
SELECT name
FROM trip_data JOIN stations ON start_station_id = station_id
GROUP BY station_id, name
ORDER BY COUNT(*) DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


name
Massachusetts Ave & Dupont Circle NW


What is the name of the station where the most rides end?

In [35]:
%%sql
SELECT name
FROM trip_data JOIN stations ON end_station_id = station_id
GROUP BY station_id, name
ORDER BY COUNT(*) DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


name
Massachusetts Ave & Dupont Circle NW


What is the name of the station where most rides both start and end?

In [36]:
%%sql
SELECT name
FROM trip_data JOIN stations ON start_station_id = station_id
WHERE start_station = end_station
GROUP BY station_id, name
ORDER BY COUNT(*) DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


name
Jefferson Dr & 14th St SW


What is the name of the most popular ending station for rides that begin at Massachusetts Ave & Dupont Circle NW (Station: 31200)?

In [37]:
%%sql
SELECT name
FROM trip_data JOIN stations ON end_station_id = station_id
WHERE start_station_id = 31200
GROUP BY station_id, name
ORDER BY COUNT(*) DESC
LIMIT 1

 * sqlite:////Users/kkoum/Documents/database.db
Done.


name
15th & P St NW
