A comprehensive web-based search engine solution built with ASP.NET Core (.NET 8), Entity Framework Core, SQL Server, and Python for data processing. This project demonstrates crawling, indexing, ranking, and searching web pages, with a modern frontend and robust backend API.
- Folder & File Explanations
- How to Clone and Run the Full Project
- API Usage & Examples
- Troubleshooting & FAQ
- Contribution Guidelines
- Credits & Acknowledgments
- Technologies Used
- Security & Performance Notes
- Future Improvements & Roadmap
- crawler.py: Crawls web pages and collects data for indexing.
- inverted_index.py: Builds an inverted index mapping words to URLs for fast search.
- pageRank.py: Calculates PageRank scores for each URL.
- Jsons/links_graph.json: The web graph structure.
- Jsons/pageRankResults.json: PageRank scores for URLs.
- Jsons/url_to_file_map.json: Maps URLs to local files.
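As a rough illustration of what `pageRank.py` computes, here is a minimal iterative PageRank over a small link graph. The graph shape (`{url: [outgoing links]}`), damping factor, and iteration count are assumptions for this sketch, not the project's actual code:

```python
# Minimal PageRank sketch (assumed graph shape: {url: [outgoing links]}).
# Illustrative only; the project's pageRank.py may differ, e.g. in how it
# handles dangling pages with no outgoing links.

def pagerank(graph, damping=0.85, iterations=50):
    """Iteratively compute PageRank scores for each URL in the graph."""
    pages = list(graph)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Sum rank contributions from every page linking to this one;
            # each source splits its rank evenly across its outgoing links.
            incoming = sum(
                ranks[src] / len(links)
                for src, links in graph.items()
                if page in links
            )
            new_ranks[page] = (1 - damping) / n + damping * incoming
        ranks = new_ranks
    return ranks

graph = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
}
scores = pagerank(graph)
```

Each iteration redistributes rank along outgoing links, and the scores converge toward a stationary distribution that sums to 1, which is the shape of the data stored in `Jsons/pageRankResults.json`.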
- index.html: Main HTML file for the search UI.
- script.js: Handles search requests, result rendering, modal logic, and UI interactivity.
- style.css: Styles the search UI, including dark mode and responsive design.
- json_to_csv.py: Converts JSON data to CSV for analysis or import.
- process_input.py: Processes and cleans input data for the engine.
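For a sense of what a `json_to_csv.py`-style conversion involves, the sketch below flattens a JSON object of PageRank scores into CSV rows. The input shape (`{url: score}`) and the column names are assumptions; the actual script may handle different files:

```python
# Illustrative JSON-to-CSV conversion in the spirit of json_to_csv.py.
# The input shape ({url: pagerank}) and column names are assumptions.
import csv
import io
import json

def json_scores_to_csv(json_text):
    """Flatten a {url: pagerank} JSON object into CSV with a header row."""
    scores = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "pagerank"])
    for url, score in scores.items():
        writer.writerow([url, score])
    return out.getvalue()

csv_text = json_scores_to_csv('{"http://example.com": 0.92}')
```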
- Controllers/SearchEngineController.cs: Exposes the API endpoint for searching words and retrieving URLs, supporting ordering by count or PageRank.
- Data/AppDbContext.cs: Entity Framework Core context for database access.
- Migrations/20250503214727_InitialCreate.cs: Initial database migration.
- Migrations/20250503214727_InitialCreate.Designer.cs: Designer file for migration.
- Migrations/AppDbContextModelSnapshot.cs: Current database model snapshot.
- Models/UrlInfo.cs: Data model for URLs.
- Models/WordInfo.cs: Data model for word occurrences.
- Repositories/ISearchEngineRepository.cs: Repository interface for data access abstraction.
- Repositories/SearchEngineRepository.cs: Repository implementation for data access.
- Program.cs: Main entry point and configuration for the ASP.NET Core app.
- appsettings.json: Configuration file for database connection and app settings.
- appsettings.Development.json: Development configuration.
- .NET 8 SDK
- Python 3.x (for engine scripts)
- SQL Server (or compatible connection string)
- Docker and Docker Compose
- Clone the repository:

  ```bash
  git clone https://github.com/omarovici/Search-Engine-Project.git
  cd Search-Engine-Project
  ```
- Build and start services:

  ```bash
  docker compose up --build
  ```

  This builds the backend image and starts both the backend and SQL Server containers.
- Access the backend API:
  - Swagger UI: http://localhost:5062/swagger
- The backend image is available at: https://hub.docker.com/r/omarovici/search-engine-project
- Database credentials and connection strings are managed via environment variables in `docker-compose.yml`.
- Stop the containers:

  ```bash
  docker compose down
  ```
- Navigate to `Engine Scripts/` and run the scripts in order:
  1. `crawler.py` to crawl and collect web data.
  2. `inverted_index.py` to build the index.
  3. `pageRank.py` to compute PageRank.
- Ensure the output JSON files are generated in `Engine Scripts/Jsons/`.
- (Optional) Use scripts in `Organize Scrapping Using Python/` for data cleaning or conversion.
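The indexing step above (step 2) can be sketched as a mapping from each word to the URLs it appears in, with occurrence counts. The whitespace tokenization and output shape here are assumptions; the project's `inverted_index.py` may differ:

```python
# Minimal inverted-index sketch: map each word to the URLs (and counts)
# it appears in. Tokenization and data shape are assumptions.
from collections import defaultdict

def build_inverted_index(pages):
    """pages: {url: page_text}  ->  {word: {url: count}}"""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index

index = build_inverted_index({
    "http://a.com": "python tutorial for python beginners",
    "http://b.com": "advanced python guide",
})
```

Looking up a word is then a single dictionary access, which is what makes serving search queries fast once the index JSON is loaded into the database.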
- Update the `DefaultConnection` string in `Search Engine/appsettings.Development.json` and/or `appsettings.json` to point to your SQL Server instance.
```bash
cd "Search Engine"
dotnet ef database update
dotnet run
```
- The API will be available at `https://localhost:<port>/api/SearchEngine`.
- Swagger UI is available at `https://localhost:<port>/swagger`.
- Open `Frontend/search-ui/index.html` in your browser.
- The frontend connects to the backend API to perform searches and display results.
- GET `/api/SearchEngine?word={word}&orderBy={orderBy}`
  - `word`: the word to search for (required).
  - `orderBy`: `pagerank` or `count` (optional, default: `pagerank`).
```
GET https://localhost:5001/api/SearchEngine?word=python&orderBy=pagerank
```

```json
[
  {
    "Url": "http://example.com/python-tutorial",
    "Count": 8,
    "PageRank": 0.92
  },
  {
    "Url": "http://another.com/python-guide",
    "Count": 5,
    "PageRank": 0.85
  }
]
```
- If no results are found, the API returns HTTP 404 Not Found.
- If parameters are missing or invalid, the API returns HTTP 400 Bad Request.
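From a client's perspective, the endpoint can be called with only the Python standard library. The sketch below assumes the base URL `http://localhost:5062` (the port the Docker setup exposes for Swagger); adjust it to match your launch profile:

```python
# Build and issue a request to GET /api/SearchEngine.
# The base URL is an assumption; change it to match your setup.
import json
import urllib.parse
import urllib.request

def build_search_url(word, order_by="pagerank", base="http://localhost:5062"):
    """Compose the query string for the search endpoint."""
    query = urllib.parse.urlencode({"word": word, "orderBy": order_by})
    return f"{base}/api/SearchEngine?{query}"

def search(word, order_by="pagerank"):
    """Call the API and return the decoded JSON result list."""
    with urllib.request.urlopen(build_search_url(word, order_by)) as resp:
        return json.load(resp)
```

`search("python")` returns a list shaped like the example response above; a 404 from the API surfaces as `urllib.error.HTTPError`.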
Q: The API is not responding or returns 500 errors.
- Check your database connection string in `appsettings.json`.
- Ensure SQL Server is running and accessible.
- Check for missing migrations, or run `dotnet ef database update`.
Q: The frontend does not display results.
- Make sure the backend API is running and accessible at the expected URL.
- Check browser console for CORS or network errors.
Q: Python scripts fail to run.
- Ensure you have Python 3.x installed and all required packages (see script headers for requirements).
Q: How do I reset the database?
- Delete the database and run the migrations again with `dotnet ef database update`.
- Fork the repository and create a new branch for your feature or bugfix.
- Write clear, concise commit messages.
- Ensure your code follows the existing style and conventions.
- Add or update documentation and tests as needed.
- Submit a pull request with a detailed description of your changes.
- Team Members:
- Abd El-Rahman Eldeeb (Frontend Developer)
- Abd El-Rahman Ehab (Frontend Developer)
- Omar Khalid (.NET Developer)
- Shehab Mohamed (.NET Developer)
- Shehab Yasser (Python Developer)
- Haneen Hassan (Python Developer)
- Special Thanks:
- Open source libraries and the .NET, Python, and SQL Server communities.
- ASP.NET Core (.NET 8)
- Entity Framework Core
- SQL Server
- Python 3.x
- JavaScript (ES6+)
- HTML5 & CSS3
- Swagger (Swashbuckle)
- Modern browser APIs
- Security:
- Always validate and sanitize user input.
- Use HTTPS in production.
- Restrict CORS as needed for your deployment.
- Performance:
- Use database indexes on frequently queried columns.
- Consider caching frequent queries.
- Use pagination in the backend for large result sets.
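As a language-agnostic illustration of the caching point (the backend itself is C#, where ASP.NET Core's `IMemoryCache` plays this role), a memoized lookup avoids repeated database hits for popular search words:

```python
# Concept sketch of caching frequent queries with an LRU cache.
# The function body is a stand-in for the real database query.
from functools import lru_cache

CALLS = {"db": 0}

@lru_cache(maxsize=1024)
def search_cached(word):
    """First call per word hits the 'database'; repeats are served from cache."""
    CALLS["db"] += 1                  # counts real database hits
    return f"results for {word}"      # stand-in for the real query

search_cached("python")
search_cached("python")               # cached; no second DB hit
```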
- Add user authentication and personalized search history.
- Implement backend pagination and lazy loading for even faster responses.
- Add more advanced ranking algorithms and machine learning integration.
- Improve crawler to handle JavaScript-heavy sites.
- Add automated tests and CI/CD pipeline.
- Deploy demo version online.
- Docker Hub: omarovici/search-engine-project
- GitHub: omarovici/Search-Engine-Project
You can pull the pre-built backend image from Docker Hub and run the full stack using Docker Compose.
- Pull the backend image:

  ```bash
  docker pull omarovici/search-engine-project:latest
  ```
- Update your `docker-compose.yml` (if needed): ensure the `backend` service uses the image:

  ```yaml
  services:
    backend:
      image: omarovici/search-engine-project:latest
      # ... other settings ...
  ```
- Start the application:

  ```bash
  docker compose up
  ```

  This starts both the backend and SQL Server containers using the pulled image.
You can view the image on Docker Hub: omarovici/search-engine-project