Creepy Crawler

Creepy Crawler is a full-stack search engine wrapped in a fully customizable browser, inspired by popular search engine apps. Users can search the web, look back over their logged search history, and set custom themes.

Python · SQLAlchemy · Flask · JavaScript · React · Redux · Scrapy · HTML · CSS · AWS

Crawl the web 🕷

(demo GIF: search)

  • Queries from the frontend are received asynchronously by Flask with help from the Crochet library, where they are processed and passed to the Scrapy spiders (a sketch of the Flask route that ties these pieces together follows this list).
    import re

    import crochet
    from pydispatch import dispatcher
    from scrapy import signals

    crochet.setup()

    @crochet.wait_for(timeout=200.0)
    def scrape_with_crochet(raw_query):
        # Break the raw query into terms and build the regex the spiders match against.
        partitioned_query = ...
        query_regex = re.compile(...)
        # Collect items from every spider through Scrapy's item_scraped signal.
        dispatcher.connect(_crawler_result, signal=signals.item_scraped)
        spiders = [...]
        if len(partitioned_query):
            for spider in spiders:
                crawl_runner.crawl(spider, query_regex=query_regex)
            # Return the Deferred so @crochet.wait_for blocks until every crawl finishes.
            eventual = crawl_runner.join()
            return eventual
  • Settings are passed from the Flask backend to the Scrapy framework through a configuration object.
    import json

    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings

    # Load the Scrapy project settings, then apply the app's JSON overrides
    # before handing the merged configuration to the runner.
    settings = get_project_settings()
    with open('app/api/routes/settings.json') as settings_file:
        settings_dict = json.load(settings_file)
    settings.update(settings_dict)
    crawl_runner = CrawlerRunner(settings)
  • Each spider runs a broad crawl through the web, starting from a seed URL.
    import re

    import scrapy


    class BroadCrawler2(scrapy.Spider):
        """Broad crawling spider."""

        name = 'broad_crawler_2'
        start_urls = ['https://example.com/']

        def parse(self, response):
            """Yield text snippets that match the query, then follow every link."""
            try:
                # All visible text nodes, skipping <script> and <style> contents.
                all_text = response.css('*:not(script):not(style)::text')
                for text in all_text:
                    # query_regex is handed to the spider by crawl_runner.crawl().
                    if re.search(self.query_regex, text.get()):
                        yield {'url': response.request.url, 'text': text.get()}
            except Exception:
                self.logger.error(f'End of the line error for {self.name}.')

            # Keep the broad crawl going by following every anchor on the page.
            yield from response.follow_all(css='a::attr(href)', callback=self.parse)
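
For context, here is a minimal sketch of a Flask route that might tie these pieces together, building on scrape_with_crochet() above. The /api/search endpoint, the scraped_results list, and the _crawler_result callback shown here are illustrative assumptions, not code taken from the repository.

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    scraped_results = []

    def _crawler_result(item, response, spider):
        # Hypothetical item_scraped handler: collect items from every spider
        # into a single module-level list for the duration of one request.
        scraped_results.append(dict(item))

    @app.route('/api/search')
    def search():
        scraped_results.clear()
        # Blocks (up to the crochet timeout) until every crawl has finished.
        scrape_with_crochet(request.args.get('query', ''))
        return jsonify(scraped_results)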

Create custom themes 🎨

(demo GIF: custom themes)

  • AWS integration allows users to add backgrounds and profile images of their choice.
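
As an illustration of that flow, uploading a user-supplied image with boto3 might look like the sketch below; the bucket name, key layout, and helper function are assumptions, not code from the repository.

    import boto3

    s3 = boto3.client('s3')
    BUCKET = 'creepy-crawler-uploads'  # hypothetical bucket name

    def upload_user_image(file_obj, user_id, filename):
        """Store a user's background or profile image on S3 and return its URL."""
        key = f'{user_id}/{filename}'
        s3.upload_fileobj(file_obj, BUCKET, key)
        return f'https://{BUCKET}.s3.amazonaws.com/{key}'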

Look over your search history 🔍

(demo GIF: history)

  • The user can conveniently switch between 24-hour and 12-hour time.
  • NATO time-zone letter abbreviations are also parsed specially for users with non-default locale settings (a rough sketch of the idea follows).
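
A minimal Python sketch of that idea, purely for illustration: it assumes history timestamps arrive as strings with a single NATO time-zone letter appended (e.g. 'Z' for Zulu/UTC); the input format and helper name are assumptions, not code from the repository.

    from datetime import datetime, timedelta, timezone

    # NATO/military single-letter zones map to fixed UTC offsets ('J' is not used).
    NATO_ZONES = {'Z': 0, 'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6,
                  'G': 7, 'H': 8, 'I': 9, 'K': 10, 'L': 11, 'M': 12,
                  'N': -1, 'O': -2, 'P': -3, 'Q': -4, 'R': -5, 'S': -6,
                  'T': -7, 'U': -8, 'V': -9, 'W': -10, 'X': -11, 'Y': -12}

    def format_history_time(stamp, use_24_hour=True):
        """Render '2023-04-01 17:05 Q'-style stamps in 24- or 12-hour time."""
        raw, letter = stamp.rsplit(' ', 1)
        zone = timezone(timedelta(hours=NATO_ZONES[letter.upper()]))
        moment = datetime.strptime(raw, '%Y-%m-%d %H:%M').replace(tzinfo=zone)
        fmt = '%H:%M %Z' if use_24_hour else '%I:%M %p %Z'
        return moment.strftime(fmt)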

Enjoy advanced interactions with your themes 🧮

(demo GIF: theme interaction)

Contact

Errors I encountered and conquered:
