<center>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0321EN-SkillsNetwork/labs/module_1/images/SN_web_lightmode.png" width="300"> 
</center>


<h1>Web scrape a Global Bike-Sharing Systems Wiki Page</h1>

Estimated time needed: **20** minutes


## Lab Overview:

Before getting your hands dirty with the actual data analysis tasks, you first need some background and context information about well-known bike-sharing systems worldwide, such as their location, launch date, size of the rental fleet, and so on.

You can get such information from this Wiki page: 

https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems

<a href="https://cognitiveclass.ai/">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0321EN-SkillsNetwork/labs/module_1/images/l2-list-bike-sharing-systems.png" width="600" align="center">
</a>


In this lab, you will use the `rvest` library to obtain the bike-sharing systems table from the web page above, convert the table into a data frame, and write the data frame to a CSV file for future data wrangling and analysis tasks.


First, import the libraries needed for the web scraping task.


In [1]:
# Check whether the `rvest` library needs to be installed
require("rvest")

library(rvest)

Loading required package: rvest
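
Note that `require()` only returns `FALSE` when a package is missing; it does not install it. A minimal sketch of a load step that installs `rvest` first when needed is shown below. The variable name `pkg` and the choice of CRAN mirror are assumptions for illustration, not part of the original lab.

```r
# Hypothetical variable holding the package name (not from the original lab)
pkg <- "rvest"

# requireNamespace() checks for the package without attaching it
if (!requireNamespace(pkg, quietly = TRUE)) {
  # The mirror URL is an assumption; any CRAN mirror can be used
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Attach the package so that read_html(), html_nodes(), etc. are available
library(rvest)
```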



# TASK: Extract bike sharing systems HTML table from a Wiki page and convert it into a data frame


_TODO:_ Get the root HTML node


In [2]:
url <- "https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems"
# Get the root HTML node by calling the `read_html()` function with the URL
raw_url <- read_html(url)

# Select all <table> nodes in the document
derooted_url <- html_nodes(raw_url, "table")
derooted_url
print("")
# Convert each table to a data frame and preview its first rows
for (i in seq_along(derooted_url)) {
  df <- html_table(derooted_url[[i]], fill = TRUE)
  print(head(df))
}
    

{xml_nodeset (4)}
[1] <table class="wikitable sortable sticky-header" style="background:#f8f9fa ...
[2] <table class="nowraplinks mw-collapsible autocollapse navbox-inner" style ...
[3] <table class="nowraplinks navbox-subgroup" style="border-spacing:0"><tbod ...
[4] <table class="nowraplinks navbox-subgroup" style="border-spacing:0"><tbod ...

[1] ""
# A tibble: 6 × 8
  Country   Country  `City / Region` Name  System Operator Launched Discontinued
  <chr>     <chr>    <chr>           <chr> <chr>  <chr>    <chr>    <chr>       
1 Albania   Albania  Tirana[5]       Ecov… ""     ""       March 2… "Discontinu…
2 Argentina Argenti… Buenos Aires[6… Ecob… "Sert… "Bike I… 2010     ""          
3 Argentina Argenti… Mendoza[10]     Metr… ""     ""       2014     ""          
4 Argentina Argenti… Rosario         Mi B… ""     ""       2 Decem… ""          
5 Argentina Argenti… San Lorenzo, S… Bici… "Bici… "

Note that this HTML page contains at least three `<table>` nodes under the root HTML node. So, you need to use the `html_nodes(root_node, "table")` function to get all of its `<table>` nodes:

```
<html>
  <table>(table1)</table>
  <table>(table2)</table>
  <table>(table3)</table>
  ...
</html>
```


table_nodes <- html_nodes(root_node, "table")


You can use a `for` loop to print each table, and you will then see that the actual bike-sharing table is the first element, `table_nodes[[1]]`.
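
As a minimal offline sketch of this node-selection step, the following uses a small, made-up HTML snippet (parsed with `minimal_html()`) in place of the live Wikipedia page. The snippet, its contents, and the names `demo_page`, `demo_tables`, and `wiki_tables` are assumptions for illustration only.

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia page: one data table and one
# navigation table, mirroring the mix of tables found on the real page
demo_page <- minimal_html('
  <table class="wikitable sortable">
    <tr><th>City</th><th>Name</th></tr>
    <tr><td>Tirana</td><td>Ecovolis</td></tr>
    <tr><td>Rosario</td><td>Mi Bici Tu Bici</td></tr>
  </table>
  <table class="navbox-inner">
    <tr><th>Topic</th></tr>
    <tr><td>Cycling</td></tr>
  </table>
')

# Select every <table> element in the document, at any depth
demo_tables <- html_nodes(demo_page, "table")
length(demo_tables)

# Preview each table to find the one you want
for (i in seq_along(demo_tables)) {
  cat("Table", i, "\n")
  print(head(html_table(demo_tables[[i]])))
}

# Alternative to positional indexing: select tables by CSS class
wiki_tables <- html_nodes(demo_page, "table.wikitable")
length(wiki_tables)
```

Selecting by class, as in the last step, may be more robust than relying on `[[1]]` if the order of tables on a page changes, but it assumes the class name stays the same.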


Next, you need to convert this HTML table into a data frame using the `html_table()` function. You may choose to include the `fill = TRUE` argument to fill any empty table rows or columns.
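
To see what filling means in practice, here is a small sketch with a made-up table whose last row is missing a cell. The HTML snippet and the names `ragged_page`, `ragged_table`, and `ragged_df` are hypothetical. In recent `rvest` releases the `fill` argument is documented as deprecated because missing cells are always filled with `NA`, so passing `fill = TRUE` there should be harmless but unnecessary.

```r
library(rvest)

# Hypothetical table whose second data row has no "Launched" cell
ragged_page <- minimal_html('
  <table>
    <tr><th>City</th><th>Name</th><th>Launched</th></tr>
    <tr><td>Mendoza</td><td>Metrobici</td><td>2014</td></tr>
    <tr><td>Sydney</td><td>Reddy Go</td></tr>
  </table>
')

# Take the first (and only) table node and convert it to a data frame
ragged_table <- html_nodes(ragged_page, "table")[[1]]
ragged_df <- html_table(ragged_table, fill = TRUE)
ragged_df

# The missing cell is expected to come back as NA rather than raising an error
is.na(ragged_df$Launched)
```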


In [3]:
# Convert the bike-sharing system table into a dataframe
bikes_dataframe <- html_table(derooted_url[[1]], fill = TRUE)
bikes_dataframe

| Country | Country | City / Region | Name | System | Operator | Launched | Discontinued |
| --- | --- | --- | --- | --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| Albania | Albania | Tirana[5] | Ecovolis | | | March 2011 | Discontinued |
| Argentina | Argentina | Buenos Aires[6][7] | Ecobici | Serttel Brasil[8] | Bike In Baires Consortium[9] | 2010 | |
| Argentina | Argentina | Mendoza[10] | Metrobici | | | 2014 | |
| Argentina | Argentina | Rosario | Mi Bici Tu Bici[11] | | | 2 December 2015 | |
| Argentina | Argentina | San Lorenzo, Santa Fe | Biciudad | Biciudad | | 27 November 2016 | |
| Australia | Australia | Melbourne[12] | Melbourne Bike Share | PBSC & 8D | Motivate | June 2010 | 30 November 2019[13] |
| Australia | Australia | Melbourne[12] | oBike | 4 Gen. oBike | | July 2017 | July 2018 |
| Australia | Australia | Brisbane[14][15] | CityCycle | 3 Gen. Cyclocity | JCDecaux | September 2010 | July 2021[16] |
| Australia | Australia | Sydney | Reddy Go | Reddy Go | | July 2017 | |
| Australia | Australia | Sydney | oBike | 4 Gen. oBike | | July 2017 | July 2018 |


Summarize the bike-sharing system data frame.


In [4]:
# Summarize the dataframe
summary(bikes_dataframe)

   Country            Country          City / Region          Name          
 Length:896         Length:896         Length:896         Length:896        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
    System            Operator           Launched         Discontinued      
 Length:896         Length:896         Length:896         Length:896        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
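
Because every column is of type character, `summary()` reports only length, class, and mode. A hedged sketch of a few complementary checks is shown below, using a three-row stand-in data frame (`demo_df`, a hypothetical name) built from values in the table above rather than the full scraped data.

```r
# Hypothetical three-row stand-in for the scraped data frame
demo_df <- data.frame(
  City = c("Tirana", "Mendoza", "Sydney"),
  System = c("", "", "Reddy Go"),
  Launched = c("March 2011", "2014", "July 2017"),
  stringsAsFactors = FALSE
)

# Number of rows and columns
dim(demo_df)

# Column types plus a preview of the values
str(demo_df)

# Count empty strings per column, which summary() does not reveal
empty_counts <- sapply(demo_df, function(col) sum(col == "", na.rm = TRUE))
empty_counts
```

The same calls can be applied to `bikes_dataframe` to see how many cells in each column were scraped as empty strings.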

Export the data frame as a CSV file called `raw_bike_sharing_systems.csv`.


In [5]:
# Export the dataframe into a csv file
write.csv(bikes_dataframe, file = "raw_bike_sharing_systems.csv", row.names = FALSE)
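
To check that an export like this can be read back unchanged, a small round-trip sketch is shown below. It writes a two-row stand-in data frame to a temporary file instead of `raw_bike_sharing_systems.csv`; the names `demo_df`, `csv_path`, and `round_trip` are assumptions for illustration.

```r
# Hypothetical two-row stand-in; one value contains a comma on purpose
demo_df <- data.frame(
  City = c("Tirana", "San Lorenzo, Santa Fe"),
  Name = c("Ecovolis", "Biciudad"),
  stringsAsFactors = FALSE
)

# Write to a temporary file so that no lab files are overwritten
csv_path <- tempfile(fileext = ".csv")
write.csv(demo_df, file = csv_path, row.names = FALSE)

# Inspect the raw lines: with row.names = FALSE there is no index column,
# and character values are quoted, so the embedded comma is preserved
readLines(csv_path)

# Read the file back and compare it with the original data frame
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
all.equal(round_trip, demo_df)
```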

For more details about web scraping with `rvest`, please refer to the previous web scraping notebook here:

[Webscraping in R](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/M4_Working_With_Data/lab3_jupyter_webscraping.ipynb)


## Authors

<a href="https://www.linkedin.com/in/yan-luo-96288783/" target="_blank">Yan Luo</a>


### Other Contributors


<!-- ## Change Log

| Date (YYYY-MM-DD) | Version | Changed By | Change Description           |
| ----------------- | ------- | ---------- | ---------------------------- |
| 2021-04-05        | 0.1     | Yan        | Initial version created      |
|                   |         |            |                              |
|                   |         |            |                              | -->

![footer](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/zOMU1iwlZgwJXjWYzQAIgg/SNIBMfooter.png "footer")
