Web scraping biotech company information

As a final-year bioinformatics PhD student, I decided to make the job hunt a little more enjoyable by automating and standardising the process of finding companies that I would be interested in working for. Here, I use Beautiful Soup and Selenium to find details of all UK companies in the field of biotechnology.

Disclaimer

I do not recommend web scraping in any way. If you do web scrape, please respect the Terms of Service and robots.txt of the site you scrape. For more information on the legality of web scraping, you may find this blog useful.
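As a minimal sketch of what respecting robots.txt can look like in practice, the snippet below uses Python's standard-library `urllib.robotparser` to check whether a path may be fetched before requesting it. The robots.txt content, the `my-scraper` user agent and the `example.com` URLs are placeholders for illustration, not taken from the site scraped in this project.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content, used here so the example runs offline.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(EXAMPLE_ROBOTS_TXT)

# "my-scraper" and the URLs are placeholder values.
allowed = parser.can_fetch("my-scraper", "https://example.com/companies")
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data")

print(allowed, blocked)
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real robots.txt instead of parsing a string. A passing check only covers robots.txt; the Terms of Service still need to be read separately.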

Contents

Script	Description
01a-get_biotech_companies.py	Scrape the names of all UK biotech companies.
01b-tidy_biotech_companies.py	Tidy the data from the previous step.
02a-scrape_company_info.py	Use Selenium to navigate, search and scrape the description, size, location, URL and domains/tags of each company.
02b-merge_tidy_company_info.py	Merge and tidy the scraped info. Find the exceptions that were not scraped successfully.
02c-scrape_company_info_2nd_pass.py	Second pass: re-run the scraping on the exceptions.
02d-merge_exceptions_2nd_pass.py	Merge all company info together.
utils.py	Utility functions that keep the project self-contained.
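The two-pass flow in the table (tidy the names, scrape, find the exceptions, re-scrape, merge) could be sketched roughly as below. This is not the code from the scripts: the function names, the `REQUIRED_FIELDS` tuple, the rule for what counts as an exception and the company records are all assumptions made for illustration, and the scraping step itself is replaced with hard-coded dictionaries.

```python
# Fields a record must contain to count as scraped successfully (an assumption).
REQUIRED_FIELDS = ("description", "size", "location", "url")


def tidy_names(raw_names):
    """Collapse whitespace, drop empty names and case-insensitive duplicates."""
    seen = set()
    tidy = []
    for name in raw_names:
        cleaned = " ".join(name.split())
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            tidy.append(cleaned)
    return tidy


def find_exceptions(names, scraped):
    """Return names with no record, or with any required field missing or empty."""
    exceptions = []
    for name in names:
        record = scraped.get(name)
        if record is None or any(not record.get(field) for field in REQUIRED_FIELDS):
            exceptions.append(name)
    return exceptions


def merge_passes(first_pass, second_pass):
    """Combine both passes; second-pass records replace first-pass ones."""
    merged = dict(first_pass)
    merged.update(second_pass)
    return merged


# Made-up company names and records standing in for real scraper output.
names = tidy_names(["  Example Bio Ltd ", "example bio ltd", "", "Sample  Genomics"])
first_pass = {
    "Example Bio Ltd": {
        "description": "Placeholder description",
        "size": "11-50",
        "location": "London",
        "url": "https://example.com",
    },
    "Sample Genomics": {"description": "", "size": "", "location": "", "url": ""},
}
exceptions = find_exceptions(names, first_pass)

second_pass = {
    "Sample Genomics": {
        "description": "Placeholder description",
        "size": "51-200",
        "location": "Cambridge",
        "url": "https://example.org",
    },
}
merged = merge_passes(first_pass, second_pass)
remaining = find_exceptions(names, merged)

print(exceptions, remaining)
```

Keeping the exception check as its own function means the same test can be re-applied after the second pass to see whether anything is still missing.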

Acknowledgements

This project was inspired by this blog post and accompanying YouTube video by Chris Lovejoy.


dzhang32/biotech_web_scrape
