# Capital Bikeshare Summer 2018 Analysis
We use the data publicly available at https://s3.amazonaws.com/dmfa-2020/project-1/2018-capitalbikeshare-tripdata.zip for this analysis.

In this analysis, we find the most popular stations, station pairs, and bikes for July and August 2018, which can inform how the bikeshare system is optimized.

In [11]:
!wget https://s3.amazonaws.com/dmfa-2020/project-1/2018-capitalbikeshare-tripdata.zip

--2024-10-01 13:16:15--  https://s3.amazonaws.com/dmfa-2020/project-1/2018-capitalbikeshare-tripdata.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.217.90.214, 52.217.174.160, 52.216.245.86, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.217.90.214|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18365732 (18M) [application/zip]
Saving to: '2018-capitalbikeshare-tripdata.zip.1'


2024-10-01 13:16:16 (31.9 MB/s) - '2018-capitalbikeshare-tripdata.zip.1' saved [18365732/18365732]



The zip file can also be provided on GitHub if needed.

**1. We proceed by unzipping the data and combining the two extracted CSV files using csvstack. We name the combined file “trips.csv”.**

In [3]:
!unzip 2018-capitalbikeshare-tripdata.zip

Archive:  2018-capitalbikeshare-tripdata.zip
replace 201807-capitalbikeshare-tripdata.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: ^C


In [13]:
!csvstack 201807-capitalbikeshare-tripdata.csv 201808-capitalbikeshare-tripdata.csv > trips.csv

In [6]:
!wc -l 201807-capitalbikeshare-tripdata.csv

  404762 201807-capitalbikeshare-tripdata.csv


In [7]:
!wc -l 201808-capitalbikeshare-tripdata.csv

  403867 201808-capitalbikeshare-tripdata.csv


In [15]:
!wc -l trips.csv

  808628 trips.csv


The **first CSV** file has **404762 lines** and the **second CSV** file has **403867 lines**. After running csvstack, the combined file **trips.csv** has **808628 lines**. That is one fewer than the sum of the two (808629), because the header line is kept only once.
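
The line-count bookkeeping above can be illustrated with a short Python sketch that uses only the standard library. This is not csvstack's actual implementation: the function name `stack_csv`, the header check, and the tiny in-memory inputs are assumptions made for illustration.

```python
import csv
import io


def stack_csv(sources):
    """Concatenate CSV inputs that share a header, writing the header once."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for src in sources:
        reader = csv.reader(src)
        first = next(reader, None)
        if first is None:
            continue  # skip empty inputs
        if header is None:
            header = first
            writer.writerow(header)
        elif first != header:
            # Assumption of this sketch: refuse inputs with different headers.
            raise ValueError("inputs do not share the same header")
        writer.writerows(reader)  # the remaining rows are data
    return out.getvalue()


# Tiny made-up inputs standing in for the two monthly files.
july = io.StringIO("Duration,Bike number\n300,W00001\n450,W00002\n")
august = io.StringIO("Duration,Bike number\n120,W00003\n")

stacked = stack_csv([july, august])
print(stacked, end="")

# Same bookkeeping as the wc -l figures: the second header line is dropped.
expected_lines = 404762 + 403867 - 1
print(expected_lines)
```

To try this on the real files, the two CSVs could be opened with `open(path, newline="")` and passed in place of the in-memory inputs.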

**2. List the labels for the heading line.**

In [17]:
!head -n 1 trips.csv

Duration,Start date,End date,Start station number,Start station,End station number,End station,Bike number,Member type


In [19]:
!head -n 1 trips.csv | tr ',' '|'


Duration|Start date|End date|Start station number|Start station|End station number|End station|Bike number|Member type


Either command lists the nine column labels. The second one replaces the commas with pipes for readability.

**3. Which 12 Capital Bikeshare stations were the most popular departing stations in July and August 2018 in terms of number of rides? We provide the full name of the station along with the station number.**

In [21]:
!csvcut -c4,5 trips.csv | sort | uniq -c | sort -rn | head -12

15474 31258,Lincoln Memorial
13082 31623,Columbus Circle / Union Station
11999 31247,Jefferson Dr & 14th St SW
11671 31289,Henry Bacon Dr & Lincoln Memorial Circle NW
11650 31288,4th St & Madison Dr NW
10357 31248,Smithsonian-National Mall / Jefferson Dr & 12th St SW
9554 31249,Jefferson Memorial
8620 31290,17th St & Independence Ave SW
8583 31200,Massachusetts Ave & Dupont Circle NW
8569 31201,15th & P St NW
7711 31603,1st & M St NE
7578 31321,15th St & Constitution Ave NW


The output above lists the 12 most popular departing stations in July and August 2018. Each line shows the ride count, the station number, and the station name.

In [26]:
!awk -F, 'NR>1 {print $4 " | " $5}' trips.csv | sort | uniq -c | sort -nr | head -n 12 | awk '{print $1 " | " $2 " | " substr($0, index($0,$3))}'

15474 | 31258 | | Lincoln Memorial
13082 | 31623 | | Columbus Circle / Union Station
11999 | 31247 | | Jefferson Dr & 14th St SW
11671 | 31289 | | Henry Bacon Dr & Lincoln Memorial Circle NW
11650 | 31288 | | 4th St & Madison Dr NW
10357 | 31248 | | Smithsonian-National Mall / Jefferson Dr & 12th St SW
9554 | 31249 | | Jefferson Memorial
8620 | 31290 | | 17th St & Independence Ave SW
8583 | 31200 | | Massachusetts Ave & Dupont Circle NW
8569 | 31201 | | 15th & P St NW
7711 | 31603 | | 1st & M St NE
7578 | 31321 | | 15th St & Constitution Ave NW


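The same ranking can also be produced with awk, which processes each line field by field. The doubled pipe in the output arises because the substr call in the second awk starts at the third whitespace-separated field, which is the pipe separator printed by the first awk.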
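
The same count can be sketched in Python. This is an illustrative alternative, not the method used in this notebook. One reason to consider it: `awk -F,` splits on every comma, including any inside a quoted field, while Python's `csv` module respects the quoting. The function name `top_stations` and the sample rows are assumptions. In particular, "Example St, NW" is a hypothetical station added only to show a quoted name that contains a comma.

```python
import csv
import io
from collections import Counter


def top_stations(lines, number_col, name_col, n=12):
    """Return the n most frequent (station number, station name) pairs."""
    reader = csv.DictReader(lines)  # the header row supplies the column names
    counts = Counter((row[number_col], row[name_col]) for row in reader)
    return counts.most_common(n)


# Made-up rows for illustration; "Example St, NW" is a hypothetical station
# whose quoted name contains a comma.
SAMPLE = """\
Start station number,Start station,End station number,End station
31258,Lincoln Memorial,31249,Jefferson Memorial
31258,Lincoln Memorial,31258,Lincoln Memorial
31249,Jefferson Memorial,31258,Lincoln Memorial
39999,"Example St, NW",31258,Lincoln Memorial
"""

departures = top_stations(io.StringIO(SAMPLE), "Start station number", "Start station")
arrivals = top_stations(io.StringIO(SAMPLE), "End station number", "End station")

for (number, name), rides in departures:
    print(f"{rides} | {number} | {name}")
```

Passing an open handle on trips.csv instead of the sample would rank the real stations. Switching the two column names to the "End station" columns gives the destination ranking.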

**4. Which 12 Capital Bikeshare stations were the most popular destination stations in July and August 2018 in terms of number of rides? We provide the full name of the station along with the station number.**

In [37]:
!csvcut -c6,7 trips.csv | sort | uniq -c | sort -rn | head -12

15642 31258,Lincoln Memorial
13635 31623,Columbus Circle / Union Station
12135 31247,Jefferson Dr & 14th St SW
11722 31289,Henry Bacon Dr & Lincoln Memorial Circle NW
11555 31288,4th St & Madison Dr NW
10693 31248,Smithsonian-National Mall / Jefferson Dr & 12th St SW
9866 31249,Jefferson Memorial
9141 31200,Massachusetts Ave & Dupont Circle NW
8884 31201,15th & P St NW
8640 31290,17th St & Independence Ave SW
8041 31321,15th St & Constitution Ave NW
7813 31603,1st & M St NE


In [39]:
!awk -F, 'NR>1 {print $6 " | " $7}' trips.csv | sort | uniq -c | sort -nr | head -n 12 | awk '{print $1 " | " $2 " | " substr($0, index($0,$3))}'

15642 | 31258 | | Lincoln Memorial
13635 | 31623 | | Columbus Circle / Union Station
12135 | 31247 | | Jefferson Dr & 14th St SW
11722 | 31289 | | Henry Bacon Dr & Lincoln Memorial Circle NW
11555 | 31288 | | 4th St & Madison Dr NW
10693 | 31248 | | Smithsonian-National Mall / Jefferson Dr & 12th St SW
9866 | 31249 | | Jefferson Memorial
9141 | 31200 | | Massachusetts Ave & Dupont Circle NW
8884 | 31201 | | 15th & P St NW
8640 | 31290 | | 17th St & Independence Ave SW
8041 | 31321 | | 15th St & Constitution Ave NW
7813 | 31603 | | 1st & M St NE


**5. Which 12 station pairs (departing-destination) were most popular in July and August 2018 in terms of number of rides? We provide the full name of each station, not just the station number.**

In [45]:
!csvcut -c4,5,6,7 trips.csv | sort | uniq -c | sort -rn | head -12

1633 31248,Smithsonian-National Mall / Jefferson Dr & 12th St SW,31248,Smithsonian-National Mall / Jefferson Dr & 12th St SW
1553 31247,Jefferson Dr & 14th St SW,31247,Jefferson Dr & 14th St SW
1521 31258,Lincoln Memorial,31249,Jefferson Memorial
1441 31288,4th St & Madison Dr NW,31288,4th St & Madison Dr NW
1327 31258,Lincoln Memorial,31258,Lincoln Memorial
1231 31247,Jefferson Dr & 14th St SW,31258,Lincoln Memorial
1225 31290,17th St & Independence Ave SW,31258,Lincoln Memorial
1121 31240,Ohio Dr & West Basin Dr SW / MLK & FDR Memorials,31240,Ohio Dr & West Basin Dr SW / MLK & FDR Memorials
1115 31290,17th St & Independence Ave SW,31290,17th St & Independence Ave SW
1085 31289,Henry Bacon Dr & Lincoln Memorial Circle NW,31289,Henry Bacon Dr & Lincoln Memorial Circle NW
1068 31321,15th St & Constitution Ave NW,31321,15th St & Constitution Ave NW
1062 31248,Smithsonian-National Mall / Jefferson Dr & 12th St SW,31258,Lincoln Memorial
sort: Broken pipe


In [45]:
!awk -F',' 'NR > 1 {count[$4" | "$5" -> "$6" | "$7]++} END {for (pair in count) print count[pair] " | " pair}' trips.csv | sort -nr | head -12

1633 | 31248 | Smithsonian-National Mall / Jefferson Dr & 12th St SW -> 31248 | Smithsonian-National Mall / Jefferson Dr & 12th St SW
1553 | 31247 | Jefferson Dr & 14th St SW -> 31247 | Jefferson Dr & 14th St SW
1521 | 31258 | Lincoln Memorial -> 31249 | Jefferson Memorial
1441 | 31288 | 4th St & Madison Dr NW -> 31288 | 4th St & Madison Dr NW
1327 | 31258 | Lincoln Memorial -> 31258 | Lincoln Memorial
1231 | 31247 | Jefferson Dr & 14th St SW -> 31258 | Lincoln Memorial
1225 | 31290 | 17th St & Independence Ave SW -> 31258 | Lincoln Memorial
1121 | 31240 | Ohio Dr & West Basin Dr SW / MLK & FDR Memorials -> 31240 | Ohio Dr & West Basin Dr SW / MLK & FDR Memorials
1115 | 31290 | 17th St & Independence Ave SW -> 31290 | 17th St & Independence Ave SW
1085 | 31289 | Henry Bacon Dr & Lincoln Memorial Circle NW -> 31289 | Henry Bacon Dr & Lincoln Memorial Circle NW
1068 | 31321 | 15th St & Constitution Ave NW -> 31321 | 15th St & Constitution Ave NW
1062 | 31248 | Smithsonian-National Mall /

**6. Here are a few key findings for Q3, Q4 and Q5.**

1. Most Popular Departing Stations: Lincoln Memorial was the top departing station with 15,474 rides, followed by Columbus Circle / Union Station (13,082) and Jefferson Dr & 14th St SW (11,999). These stations, located near major landmarks, had the highest demand.

2. Most Popular Destination Stations: Lincoln Memorial also led as the top destination station with 15,642 rides, followed by Columbus Circle / Union Station (13,635). The same 12 stations appear in both top-12 lists, in slightly different order.

3. Most Popular Station Pairs: The most popular pair was a round trip, with 1,633 rides that both started and ended at Smithsonian-National Mall / Jefferson Dr & 12th St SW. Eight of the top 12 pairs are round trips of this kind, and the four one-way pairs all involve Lincoln Memorial.
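
The round-trip pattern among the top pairs can be checked directly from the 12 rows of output for question 5. The sketch below copies the station numbers and ride counts from that output. The variable names are our own.

```python
# (start station number, end station number, rides), copied from the
# 12 rows of output for question 5.
TOP_PAIRS = [
    ("31248", "31248", 1633),
    ("31247", "31247", 1553),
    ("31258", "31249", 1521),
    ("31288", "31288", 1441),
    ("31258", "31258", 1327),
    ("31247", "31258", 1231),
    ("31290", "31258", 1225),
    ("31240", "31240", 1121),
    ("31290", "31290", 1115),
    ("31289", "31289", 1085),
    ("31321", "31321", 1068),
    ("31248", "31258", 1062),
]

# A round trip starts and ends at the same station.
round_trips = [p for p in TOP_PAIRS if p[0] == p[1]]
one_way = [p for p in TOP_PAIRS if p[0] != p[1]]

round_trip_rides = sum(rides for _, _, rides in round_trips)
one_way_rides = sum(rides for _, _, rides in one_way)

print(f"round-trip pairs: {len(round_trips)} of {len(TOP_PAIRS)}")
print(f"rides on round-trip pairs: {round_trip_rides}")
print(f"rides on one-way pairs: {one_way_rides}")
```

Among these 12 pairs, 8 are round trips accounting for 10,343 rides, against 5,039 rides on the 4 one-way pairs. Each one-way pair has station 31258 (Lincoln Memorial) as its start or end.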

**7. For the most popular departure station, which 10 bikes were used most in trips departing from there? We provide the full name of the station, not just the station number.**

In [124]:
!cut -d',' -f5 trips.csv | sort | uniq -c | sort -nr | head -1

15474 Lincoln Memorial


This means that **15,474** trips started from the **“Lincoln Memorial”** station, making it the most popular departure station in the data.

**Lincoln Memorial is the most popular departure station**

In [60]:
!csvgrep -c5 -r '^Lincoln Memorial$' trips.csv | csvcut -c4,5,8 | sort | uniq -c | sort -rn | head -10

  18 31258,Lincoln Memorial,W01311
  17 31258,Lincoln Memorial,W22553
  16 31258,Lincoln Memorial,W00919
  15 31258,Lincoln Memorial,W23074
  15 31258,Lincoln Memorial,W22567
  15 31258,Lincoln Memorial,W21882
  15 31258,Lincoln Memorial,W20527
  14 31258,Lincoln Memorial,W23311
  14 31258,Lincoln Memorial,W23003
  14 31258,Lincoln Memorial,W22369
sort: Broken pipe


1. 18, 17, 16, ... are the trip counts
2. 31258 is the station number
3. Lincoln Memorial is the station name
4. W01311, W22553, ... are the bike numbers

For example, the first line shows that 18 trips departing from Lincoln Memorial were made using bike W01311.
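
The two steps above (find the busiest station, then count bikes on trips at that station) can be combined in a small Python sketch. It is an illustration under stated assumptions, not the notebook's method: the function name `top_bikes_at_busiest` is our own, and the sample rows are made up, so their counts do not come from trips.csv.

```python
import csv
import io
from collections import Counter


def top_bikes_at_busiest(lines, station_col="Start station", n=10):
    """Find the busiest station in station_col, then its n most-used bikes."""
    rows = list(csv.DictReader(lines))
    # Step 1: the station with the most trips in the chosen column.
    station, _ = Counter(row[station_col] for row in rows).most_common(1)[0]
    # Step 2: bike counts restricted to trips at exactly that station.
    bikes = Counter(
        row["Bike number"] for row in rows if row[station_col] == station
    )
    return station, bikes.most_common(n)


# Made-up rows for illustration; the counts are not taken from trips.csv.
SAMPLE = """\
Start station,End station,Bike number
Lincoln Memorial,Jefferson Memorial,W01311
Lincoln Memorial,Lincoln Memorial,W01311
Lincoln Memorial,Jefferson Memorial,W22553
Jefferson Memorial,Lincoln Memorial,W22553
"""

station, bikes = top_bikes_at_busiest(io.StringIO(SAMPLE))
print(station)
for bike, trips in bikes:
    print(f"{trips} | {bike}")
```

The exact string comparison plays the same role as the anchored pattern `^Lincoln Memorial$` in the csvgrep command. Passing `station_col="End station"` would answer the destination version of the question.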

**8. Which 10 bikes were used most in trips ending at the most popular destination station? We provide the full name of the station, not just the station number.**

In [137]:
!cut -d',' -f7 trips.csv | sort | uniq -c | sort -nr | head -1

15642 Lincoln Memorial


**Lincoln Memorial is again the most popular destination station**

In [69]:
!csvgrep -c7 -r '^Lincoln Memorial$' trips.csv | csvcut -c6,7,8 | sort | uniq -c | sort -rn | head -10

  17 31258,Lincoln Memorial,W22553
  17 31258,Lincoln Memorial,W01311
  16 31258,Lincoln Memorial,W21882
  16 31258,Lincoln Memorial,W20527
  16 31258,Lincoln Memorial,W00919
  15 31258,Lincoln Memorial,W22567
  15 31258,Lincoln Memorial,W22369
  14 31258,Lincoln Memorial,W23311
  14 31258,Lincoln Memorial,W23074
  14 31258,Lincoln Memorial,W23003
sort: Broken pipe


These are some of the insights gathered from the large Capital Bikeshare trip dataset. The same steps and commands can be used to analyze any other time period in the data.
