Web-Crawler

Basic web crawling and data processing in Python (using selenium and openpyxl) and R (using rvest, xml2 and xlsx).

Work done on the first day of an internship as a data analyst at AssetPro in Dec. 2019.

`WebCrawler.py`

This script gives a basic example of how to use Selenium's webdriver to crawl fund IDs from a funding company's website, www.ifund.com.hk. You can crawl other data from other webpages by following the same pattern of webdriver usage.
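The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `WebCrawler.py`: the function names `crawl_fund_ids` and `clean_ids` and the CSS selector passed by the caller are hypothetical, and the real page structure of www.ifund.com.hk would need to be inspected to choose a working selector.

```python
from typing import Iterable, List


def clean_ids(raw_texts: Iterable[str]) -> List[str]:
    """Strip whitespace, drop empty strings, and de-duplicate while keeping order."""
    seen = set()
    cleaned = []
    for text in raw_texts:
        value = text.strip()
        if value and value not in seen:
            seen.add(value)
            cleaned.append(value)
    return cleaned


def crawl_fund_ids(url: str, css_selector: str) -> List[str]:
    """Open `url` in Chrome and return the cleaned text of elements matching `css_selector`.

    Hypothetical helper: the selector is supplied by the caller and is an assumption,
    not something taken from the original script.
    """
    # Imported here so clean_ids() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # needs chromedriver to be reachable, e.g. on PATH
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return clean_ids(element.text for element in elements)
    finally:
        driver.quit()  # always close the browser, even if an error occurs
```

A call might look like `crawl_fund_ids("https://www.ifund.com.hk", "td.fund-id")`, where both the `https://` scheme and the selector `td.fund-id` are placeholders chosen for illustration. Keeping the text clean-up in a separate function makes it possible to reuse and check that step without launching a browser.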

Author: Changyuan Qiu

Contact: peterqiu@umich.edu

Latest Update: Nov. 12, 2020

Build:

Make sure that the latest versions of selenium and openpyxl are installed on your computer.

Apart from selenium and openpyxl, you also need to download ChromeDriver from

https://sites.google.com/a/chromium.org/chromedriver/downloads

and add it to your PATH before running this script. Choose the ChromeDriver release that matches your installed version of Chrome.
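Before running the script, it can help to confirm that these prerequisites are in place. The two Python packages can typically be installed or upgraded with `pip install --upgrade selenium openpyxl`. The snippet below is a small sketch of such a check; the function name `check_environment` is hypothetical, and it assumes the driver executable is named `chromedriver`.

```python
import importlib.util
import shutil
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether each prerequisite appears to be available.

    Hypothetical helper: it only checks that the packages can be located and
    that an executable named 'chromedriver' is on PATH, not their versions.
    """
    return {
        "selenium": importlib.util.find_spec("selenium") is not None,
        "openpyxl": importlib.util.find_spec("openpyxl") is not None,
        "chromedriver": shutil.which("chromedriver") is not None,
    }


# Print a short report, one line per prerequisite.
for name, available in check_environment().items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```

If any line reports `MISSING`, install the package or adjust PATH before launching the crawler.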

