Baconbuilder/LeakHunter

LeakHunter

Overview

This platform is built using the ASP.NET Core MVC framework and is designed to detect potential leaks of personal data on websites. It employs web crawling techniques to extract data from the internet, which is then analyzed using named entity recognition (NER) and regular expression techniques. The backend is developed primarily in C# and Python.
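As a rough illustration of the regular expression side of the analysis, the sketch below scans extracted text for two kinds of personal data. The patterns, the category names, and the `find_pii` helper are assumptions made for this example and are not taken from the LeakHunter code base; real patterns would need tuning for the target locale.

```python
import re

# Hypothetical patterns for illustration only; not the project's actual rules.
PII_PATTERNS = {
    # A simple email pattern: local part, "@", domain, dot, TLD of 2+ letters.
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # A 10-digit number starting with "09", not embedded in a longer digit run.
    # Lookarounds are used instead of \b so matches adjacent to CJK text work.
    "phone": re.compile(r"(?<!\d)09\d{8}(?!\d)"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match of each pattern, keyed by category name."""
    return {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}


if __name__ == "__main__":
    sample = "Mail alice@example.com or call 0912345678"
    print(find_pii(sample))
```

Keeping the patterns in a single dictionary makes it easy to add or adjust categories without touching the scanning logic.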

Features

Web Crawling with Beautiful Soup

The platform uses Beautiful Soup, a Python library for parsing HTML and XML documents, for web crawling and data extraction. Beautiful Soup turns each page's markup into a navigable tree, which lets the platform systematically scrape the relevant content from web pages with differing structures.
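The extraction step might look roughly like the following sketch. It assumes the page HTML has already been fetched by some other component; the `extract_text_and_links` helper and the example URL are hypothetical and not part of the project.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_text_and_links(html: str, base_url: str) -> tuple[str, list[str]]:
    """Return the visible text of a page and the absolute URLs it links to.

    Hypothetical helper for illustration; not taken from LeakHunter.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Drop script and style elements so their contents are not treated as text.
    for tag in soup(["script", "style"]):
        tag.decompose()

    text = soup.get_text(separator=" ", strip=True)

    # Resolve relative hrefs against the page URL so a crawler can follow them.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    return text, links


if __name__ == "__main__":
    page = (
        "<html><body><p>Contact: a@b.com</p>"
        "<script>x=1</script><a href='/next'>next</a></body></html>"
    )
    print(extract_text_and_links(page, "https://example.com/"))
```

The returned text can then be passed to the analysis stage, while the links give a crawler its next pages to visit.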

Named Entity Recognition (NER) with Jieba

Named Entity Recognition (NER) is a core part of the platform's natural language processing, particularly for Chinese text analysis. Jieba, a widely used Chinese word segmentation library, is employed for NER tasks: identifying and extracting named entities such as names of people, organizations, and locations. Identifying and categorizing these entities in Chinese-language content improves the platform's ability to detect and analyze personal data leakage in web pages.

About

Web-Based Personal Data Leak Detection Platform
