This repo was built for educational purposes. Its aim is to provide a "scientific purpose harvester" (SPH) that dynamically crawls scientific webpages for relevant content.


scientific-purpose-harvester

Crawl scientific webpages for relevant papers. The easiest way to start your journey in the scientific jungle.

  • The SPH Team

You can try the SPH here.

This service / repo was built for educational purposes only!

The Scientific-Purpose-Harvester (SPH) aims to provide a dynamic way of crawling scientific webpages for content relevant to your questions.

Landing-Page

Vision

Search Google Scholar for the best scientific results for your question, with the help of an easy-to-use graphical user interface.
In the long run we might connect additional data sources like https://dblp.uni-trier.de/ (conference papers not yet peer reviewed...).

How to start

Video Introduction

Get started quickly with a video tutorial for the SPH! (Click the image to go to YouTube.)

Introduction

Online

The easiest way to access the SPH.

Simply open the SPH, hosted by an SPH team member. This website uses the Svelte version of the SPH.

Offline (Locally)

  1. Clone the repo:
git clone https://github.com/SimonScapan/scientific-purpose-harvester.git
  2. Navigate into the harvester directory:
cd harvester
  3. Uncomment lines 22-24 in api.py.
  4. Start api.py to launch the harvester:
python api.py
  5. Open the local website in your browser.
  6. Shut down the local website with CTRL+C in your terminal.
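The real api.py lives in the repository; as a rough, hypothetical sketch of what a minimal local search API could look like, the standard-library server below exposes a `/search?q=...` endpoint (the route, handler name, and response shape are illustrative assumptions, not the repo's actual code):

```python
# Hypothetical sketch of a minimal local API server, for illustration only.
# The actual api.py in the repository differs; names here are assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

class HarvesterHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse the question from a request like /search?q=my+question.
        query = parse_qs(urlparse(self.path).query)
        question = query.get("q", [""])[0]
        # In the real harvester this would trigger the Google Scholar
        # scraper; here we return a placeholder JSON result.
        body = json.dumps({"question": question, "results": []}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def run(port=8000):
    # Serve the local website until interrupted with CTRL+C.
    HTTPServer(("localhost", port), HarvesterHandler).serve_forever()
```

Pressing CTRL+C in the terminal raises KeyboardInterrupt and stops the server, matching step 6 above.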

How to use

  1. Enter your question.
  2. Hit the search button and wait for results.
  3. Get a quick overview of the best scientific papers for your question, and follow a link to go directly to the paper.

Used technology / Interesting Facts

  • ScraperAPI allows us to crawl Google Scholar (or other websites) without getting blacklisted.
    • A free plan of ScraperAPI is used; it allows 1000 free requests per month.
    • If there is a problem with the bundled API key:
      • Get your own free API key on the ScraperAPI website.
      • Replace the given API key with your personal API key in the harvester_scholar.py file (line 34).
  • Svelte allows us to use the Python file within the website.
  • The papers are ranked by citation count.
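The two points above can be sketched together: ScraperAPI works as a proxy that takes your key and a target URL as query parameters, and the parsed papers are then sorted by citation count. The helper names and the `papers` data shape below are illustrative assumptions, not the repo's exact code in harvester_scholar.py:

```python
# Sketch of proxying a Google Scholar request through ScraperAPI and
# ranking the parsed papers by citation count. Function names and the
# paper dict shape are assumptions for illustration.
import urllib.parse
import urllib.request

SCRAPERAPI_KEY = "YOUR-OWN-FREE-API-KEY"  # replace with your personal key

def build_scraperapi_url(target_url: str) -> str:
    # ScraperAPI forwards the request for us, so Google Scholar sees
    # the proxy rather than our machine and does not blacklist us.
    params = urllib.parse.urlencode({"api_key": SCRAPERAPI_KEY, "url": target_url})
    return f"http://api.scraperapi.com/?{params}"

def fetch_scholar_page(question: str) -> str:
    # Fetch the raw HTML of a Google Scholar results page (uses one of
    # the 1000 free monthly requests each time it is called).
    scholar_url = "https://scholar.google.com/scholar?q=" + urllib.parse.quote(question)
    with urllib.request.urlopen(build_scraperapi_url(scholar_url)) as resp:
        return resp.read().decode("utf-8", errors="replace")

def rank_by_citations(papers: list) -> list:
    # Papers with the highest citation count come first; papers with an
    # unknown count are treated as having zero citations.
    return sorted(papers, key=lambda p: p.get("citations", 0), reverse=True)
```

For example, `rank_by_citations([{"title": "A", "citations": 3}, {"title": "B", "citations": 10}])` puts paper "B" first.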

Future Extensions

Here are some ideas for future extensions. Feel free to fork this project and add some of these, or your own ideas!

  • Free-text-based NLP training --> Q&A pair generation --> feed fancy flash cards with content
  • Build a network of the cited articles. Who cited whom? Where are the connections?
  • Build an integration with some more scientific search engines, like IEEE, arXiv, ...
  • Generate abstracts for each paper with the help of NLP
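The citation-network idea above could start as simply as a directed graph of who-cites-whom. A minimal sketch using plain dictionaries (paper names and the edge format are made-up examples, not part of the project yet):

```python
# Sketch of the proposed citation network: a tiny directed graph where an
# edge citing -> cited means one article cites another. Illustrative only.
from collections import defaultdict

def build_citation_graph(citations):
    # Each (citing, cited) pair becomes a directed edge citing -> cited.
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[citing].add(cited)
    return graph

def cited_by(graph, paper):
    # Reverse lookup: which articles cite the given paper? These reverse
    # edges are where the interesting connections show up.
    return {citing for citing, cited in graph.items() if paper in cited}
```

From there, a real implementation might swap the dictionaries for a graph library and feed it the reference lists scraped per paper.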

Thank you

Thank you for using the SPH. If you have questions, feel free to reach out to the SPH team:
