
Wikipedia crawler:

I'm crawling the Wikipedia website and storing the pages in a database (PostgreSQL, most likely). My future plan is to use this database to build a full-stack app.

P.S.: Docker is required.

Installation:

  1. Clone the repo: git clone https://github.com/cs-fedy/wikipedia-crawler
  2. Run docker compose up -d to start the database (a quick connection check is sketched after this list).
  3. Install virtualenv using pip: sudo pip install virtualenv
  4. Create a new virtualenv: virtualenv venv
  5. Activate the virtualenv: source venv/bin/activate
  6. Install the requirements: pip install -r requirements.txt
  7. Run the script and enjoy: python scraper.py
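
Before running the scraper, a quick way to confirm the Dockerized database is up is a short psycopg2 connection check. This is a minimal sketch, assuming credentials are read from a .env file; the variable names (DB_HOST, DB_NAME, etc.) are placeholders and may not match what scraper.py actually expects:

    import os

    import psycopg2
    from dotenv import load_dotenv

    # Read connection settings from .env (variable names here are assumptions).
    load_dotenv()

    conn = psycopg2.connect(
        host=os.getenv("DB_HOST", "localhost"),
        port=os.getenv("DB_PORT", "5432"),
        dbname=os.getenv("DB_NAME", "wikipedia"),
        user=os.getenv("DB_USER", "postgres"),
        password=os.getenv("DB_PASSWORD", ""),
    )

    # A successful query here means docker compose brought the DB up correctly.
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        print(cur.fetchone()[0])

    conn.close()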

Used tools (a combined usage sketch follows this list):

  1. requests: Python HTTP for Humans.
  2. BeautifulSoup: a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
  3. python-dotenv: adds .env support to Django/Flask apps in development and deployment.
  4. psycopg2: Python-PostgreSQL database adapter.
  5. tabulate: pretty-print tabular data.
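
To show how these tools fit together, here is a minimal sketch (not the actual scraper.py logic): it fetches one Wikipedia article with requests, extracts the title and internal links with BeautifulSoup, and pretty-prints them with tabulate. The URL and User-Agent string are arbitrary examples:

    import requests
    from bs4 import BeautifulSoup
    from tabulate import tabulate

    # Fetch a single article (example URL; any article works).
    url = "https://en.wikipedia.org/wiki/Web_crawler"
    resp = requests.get(url, headers={"User-Agent": "wikipedia-crawler-demo"})
    resp.raise_for_status()

    # Parse the HTML and pull the title plus the first internal links.
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.find("h1").get_text(strip=True)
    rows = [
        [a.get_text(strip=True), a["href"]]
        for a in soup.select("a[href^='/wiki/']")[:10]
    ]

    print(title)
    print(tabulate(rows, headers=["link text", "href"]))

In the real crawler, links like these would feed the crawl queue, and psycopg2 would persist the rows to PostgreSQL instead of printing them.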

Author:

created at 🌙 with 💻 and ❤ by f0ody
