Python station backend

About

  • The backend behind python-station

  • Full data pipeline to scrape http://planetpython.org

  • Output: every GitHub (Python) project featured in the history of Planet Python.

  • Also includes data enrichment using the GitHub, Reddit, and Hacker News APIs.

How does it work?

  1. Download the pages from the planetpython.org clone

  2. Use BeautifulSoup to transform raw page into posts

  3. Use the GitHub API to get basic project data (and filter out non-Python projects)

  4. Use PRAW (Reddit) + the HN API + GitHub Trending to enrich the data

  5. Show the data using GitHub Pages + Vue.js
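
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra dependencies, and the names LinkCollector, extract_github_repos, is_python_project, and the URL pattern are all assumptions made for the example. It only recognizes links that point at a repository root, so deeper links (issues, files) are skipped.

```python
import re
from html.parser import HTMLParser

# Matches https://github.com/<owner>/<repo> (repository root only).
# The pattern is an illustrative assumption, not taken from the project.
GITHUB_REPO_RE = re.compile(
    r"^https?://github\.com/([\w.-]+)/([\w.-]+?)(?:\.git)?/?$"
)


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_github_repos(post_html):
    """Return the sorted, de-duplicated 'owner/repo' slugs linked from a post."""
    collector = LinkCollector()
    collector.feed(post_html)
    repos = set()
    for link in collector.links:
        match = GITHUB_REPO_RE.match(link)
        if match:
            repos.add(f"{match.group(1)}/{match.group(2)}")
    return sorted(repos)


def is_python_project(repo_json):
    """Check the 'language' field of a GitHub repository API response."""
    return repo_json.get("language") == "Python"
```

The language check relies on the "language" field that the GitHub REST API returns for a repository, which reports the repository's primary language.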

How to run?

  • Clone the project
  • python3 -m venv ./venv && source venv/bin/activate && pip install -r requirements.txt
  • venv/bin/python pipeline.py --pages-to-download 5
  • To download Reddit data, fill in your Reddit credentials in requests_utils.py
  • If you hit the rate limit on GitHub requests, fill in your GitHub credentials in requests_utils.py
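
For context on the GitHub limit: unauthenticated GitHub API requests get a much lower rate limit than authenticated ones. The sketch below shows one way to attach a token and detect a rate-limited response. It does not reflect what requests_utils.py actually contains; github_headers and is_rate_limited are hypothetical helper names, and the token is assumed to be a personal access token.

```python
def github_headers(token=None):
    """Build request headers; adds an Authorization header only if a token is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def is_rate_limited(status_code, response_headers):
    """Return True when a response looks like a GitHub rate-limit rejection.

    GitHub answers with 403 or 429 and sets x-ratelimit-remaining to "0"
    once the quota is used up. Header names are compared case-insensitively.
    """
    normalized = {key.lower(): value for key, value in response_headers.items()}
    return (
        status_code in (403, 429)
        and normalized.get("x-ratelimit-remaining") == "0"
    )
```

If is_rate_limited returns True for unauthenticated calls, that is the point at which filling in GitHub credentials should help.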

Pipeline Flow chart

+-------------------+
| Download Pages    |
+---------+---------+
          |
+---------v---------+
|Transform to Posts |
+---------+---------+
          |
+---------v---------+
|Extract projects   |
+---------+---------+
          |
+---------v---------+
|Enrich Using APIs  |
+---------+---------+
          |
+---------v----------+
|Deploy Using Github |
| Pages              |
+--------------------+
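
The flow above is a linear chain in which each stage consumes the previous stage's output. A minimal sketch of that shape follows. The stage bodies are placeholders and every function name here is hypothetical, not taken from pipeline.py. Only the --pages-to-download flag comes from the run instructions, and its default of 5 is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the one flag shown in the run instructions (default is assumed)."""
    parser = argparse.ArgumentParser(description="Pipeline sketch")
    parser.add_argument("--pages-to-download", type=int, default=5)
    return parser.parse_args(argv)


# Placeholder stages: each one takes the previous stage's output.
def download_pages(page_count):
    return [f"raw page {i}" for i in range(page_count)]


def transform_to_posts(pages):
    return [{"source": page} for page in pages]


def extract_projects(posts):
    return [{"post": post, "repo": None} for post in posts]


def enrich_using_apis(projects):
    return [dict(project, enriched=True) for project in projects]


def deploy(projects):
    # Stand-in for publishing; just reports how many projects were produced.
    return len(projects)


STAGES = [
    download_pages,
    transform_to_posts,
    extract_projects,
    enrich_using_apis,
    deploy,
]


def run_pipeline(page_count, stages=STAGES):
    """Feed the page count through every stage in order."""
    data = page_count
    for stage in stages:
        data = stage(data)
    return data
```

Calling run_pipeline(parse_args().pages_to_download) would tie the flag to the chain. Keeping the stages in a plain list makes it easy to run a prefix of the pipeline while debugging a single step.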

Development

Want to contribute? Great! Feel free to open a PR or an issue :)

License

MIT - Free Software, Hell Yeah!