Skip to content

A web scraping project that uses Scrapy, Pandas, PyYAML, Selenium Wire, and PyMongo to extract variations and mutations of COVID-19 from https://cov-lineages.org/lineage_list.html, store them in a MongoDB database, and provide a Python GUI for tracking and viewing the data.

Notifications You must be signed in to change notification settings

aymenssf/WebScrapingProject

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

WebScrapingProject

A web scraping project that uses Scrapy, Pandas, PyYAML, Selenium Wire, and PyMongo to extract variations and mutations of COVID-19 from https://cov-lineages.org/lineage_list.html, store them in a MongoDB database, and provide a Python GUI for tracking and viewing the data. First, we would use Scrapy, a web scraping framework in Python, to extract the HTML content from https://cov-lineages.org/lineage_list.html. We would then use XPath selectors to extract the relevant information, including the lineage, the mutation, the amino acid change, and the frequency.

Next, we would use Pandas, a data manipulation library in Python, to clean and organize the extracted data into a tabular format. This would make it easier to work with the data and perform further analysis.

We would then use PyYAML, a YAML parser and emitter for Python, to load and parse a configuration file that contains the database connection information. This would enable us to connect to a MongoDB database and store the scraped data.

To track the web scraping process, we would use Selenium and Selenium Wire, which are libraries in Python for automated web testing. We would set up a headless browser and use it to navigate to https://cov-lineages.org/lineage_list.html, simulate user behavior, and capture any errors or exceptions that occur during the scraping process.

Finally, we would use PyMongo, a Python library for working with MongoDB, to insert the scraped data into the database. We would also use a Python GUI, such as PyQt or Tkinter, to provide a user-friendly interface for querying and viewing the scraped data in the database.

Overall, this web scraping project would automate the process of extracting COVID-19 variation and mutation data from https://cov-lineages.org/lineage_list.html, clean and organize the data, store it in a MongoDB database, and provide a user-friendly interface for querying and viewing the data.

About

A web scraping project that uses Scrapy, Pandas, PyYAML, Selenium Wire, and PyMongo to extract variations and mutations of COVID-19 from https://cov-lineages.org/lineage_list.html, store them in a MongoDB database, and provide a Python GUI for tracking and viewing the data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages