Ramonawestphal/Custom-Search-Engine

Information Retrieval and Web Analytics (IRWA) - Final Project template

This repository contains the template code for the IRWA Final Project - Search Engine with Web Analytics. The project is implemented using Python and the Flask web framework. It includes a simple web application that allows users to search through a collection of documents and view analytics about their searches.

Project Structure

/irwa-search-engine
├── myapp                # Contains the main application logic
├── templates            # Contains HTML templates for the Flask application
├── static               # Contains static assets (images, CSS, JavaScript)
├── data                 # Contains the dataset file (fashion_products_dataset.json)
├── project_progress     # Contains your solutions for Parts 1, 2, and 3 of the project
├── .env                 # Environment variables for configuration (e.g., API keys)
├── .gitignore           # Specifies files and directories to be ignored by Git
├── LICENSE              # License information for the project
├── requirements.txt     # Lists Python package dependencies
├── web_app.py           # Main Flask application
└── README.md            # Project documentation and usage instructions
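As a quick sanity check after cloning, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of the template: the helper name `missing_entries` is hypothetical, and `.env` and `LICENSE` are deliberately left out of the list because `.env` is git-ignored and may not exist in a fresh checkout.

```python
from pathlib import Path

# Entries taken from the project tree above; adjust if your checkout differs.
EXPECTED = [
    "myapp", "templates", "static", "data", "project_progress",
    "requirements.txt", "web_app.py", "README.md",
]


def missing_entries(root):
    """Return the expected files and folders that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

Run it from the project root; an empty result means the folders and files listed above are all in place.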

To download this repo locally

Open a terminal console and execute:

cd <your preferred projects root directory>
git clone https://github.com/trokhymovych/irwa-search-engine.git

Setting up the Python environment (only for the first time you run the project)

Install virtualenv

Setting up a virtualenv is recommended to isolate the project dependencies from other Python projects on your machine. It allows you to manage packages on a per-project basis, avoiding potential conflicts between different projects.

In the project root directory execute:

pip3 install virtualenv
virtualenv --version

Prepare virtualenv for the project

In the root of the project folder, run the following to create a virtualenv named irwa_venv:

virtualenv irwa_venv

If you list the contents of the project root directory, you will see that it has created a new folder named irwa_venv that contains the virtualenv:

ls -l

The next step is to activate your new virtualenv for the project:

source irwa_venv/bin/activate

or for Windows...

irwa_venv\Scripts\activate.bat

This will load the Python virtualenv for the project.
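If you are unsure whether activation worked, the interpreter itself can tell you. The sketch below is an illustration under stated assumptions: the helper names `activation_script` and `in_virtualenv` are hypothetical, and the check relies on `sys.prefix` differing from `sys.base_prefix` inside an environment (older virtualenv releases set `sys.real_prefix` instead, which is also handled).

```python
import os
import sys
from pathlib import Path


def activation_script(venv_dir="irwa_venv", os_name=os.name):
    """Return the activation script path for the given OS family."""
    venv = Path(venv_dir)
    if os_name == "nt":  # Windows
        return venv / "Scripts" / "activate.bat"
    return venv / "bin" / "activate"  # Linux and macOS


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    if getattr(sys, "real_prefix", None) is not None:  # legacy virtualenv
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Activation script:", activation_script())
    print("Inside a virtualenv:", in_virtualenv())
```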

Installing Flask and other packages in your virtualenv

Make sure you are in the root of the project folder and that your virtualenv is activated (you should see (irwa_venv) in your terminal prompt). Then install all the packages listed in requirements.txt with:

pip install -r requirements.txt

If you need to add more packages in the future, you can install them with pip and then update requirements.txt with:

pip freeze > requirements.txt
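To confirm that the install step covered everything in requirements.txt, one option is to compare the file against the packages visible to the active interpreter. The following is a rough sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, the parser handles only simple `name==version` style lines, and version constraints are not checked.

```python
import re
from importlib import metadata


def requirement_names(text):
    """Extract distribution names from the text of a requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(text):
    """Return the listed distributions that are not installed."""
    missing = []
    for name in requirement_names(text):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    try:
        with open("requirements.txt", encoding="utf-8") as handle:
            print("Not installed:", missing_packages(handle.read()) or "none")
    except FileNotFoundError:
        print("requirements.txt not found; run this from the project root")
```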

Enjoy!

Starting the Web App

python -V
# Make sure we use Python 3

cd irwa-search-engine
python web_app.py

The above will start a web server with the application:

 * Serving Flask app 'web-app' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8088/ (Press CTRL+C to quit)

Open Web app in your Browser:
http://127.0.0.1:8088/ or http://localhost:8088/
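If the page does not load, it helps to check whether anything is accepting connections on the port shown in the startup log. The snippet below is a small standard-library sketch; the function name `is_listening` is hypothetical, and the host and port defaults simply mirror the address printed above.

```python
import socket


def is_listening(host="127.0.0.1", port=8088, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unreachable host, ...
        return False


if __name__ == "__main__":
    state = "up" if is_listening() else "not reachable"
    print(f"Web app on 127.0.0.1:8088 is {state}")
```

A result of "not reachable" usually means the server is not running or is bound to a different port.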

Creating your own GitHub repo

After creating the project and code on your local computer:

  1. Log in to GitHub and create a new repo.
  2. Go to the root page of your new repo and note the URL from the browser.
  3. Execute the following locally:
cd <project root folder>
git init -b main
git add . && git commit -m "initial commit"
git remote add origin <your GitHub repo URL from the browser>
git push -u origin main

Usage:

  1. Put the data file fashion_products_dataset.json in the data folder. It will be provided to you by the instructor.
  2. For Parts 1, 2, and 3 of the project, use the project_progress folder to store your solutions. Each part should contain a .pdf file with your report, an .ipynb (Jupyter Notebook) file with the code for your solution, and a README.md explaining the content and giving instructions for reproducing the results.
  3. For Part 4 of the project, build a web application using Flask that allows users to search through a collection of documents and view analytics about their searches. You should work mainly in the web_app.py file and the myapp and templates folders. Feel free to change any code or add new files as needed. The provided code is just a starting point to help you get started quickly.
  4. Make sure to update the .env file with your Groq API key (can be found here; the free version is more than enough for our purposes) and any other necessary configuration. IMPORTANT: Do not share your .env file publicly, as it contains sensitive information. It is listed in .gitignore to prevent accidental commits. (It should never be committed to a repository; it appears here only for demonstration purposes.)
  5. Have fun and be creative!
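For step 4, the sketch below shows one way the values in .env could be read at startup using only the standard library. It is an illustration, not the template's actual loading code: the template may rely on a dedicated package instead, the helper names are hypothetical, and the variable name `GROQ_API_KEY` is an assumption, so use whatever key name the provided .env file defines.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop optional quotes
    return values


def load_env(path=".env"):
    """Copy entries from the file into os.environ without overwriting existing ones."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle.read()).items():
            os.environ.setdefault(key, value)


if __name__ == "__main__":
    if os.path.exists(".env"):
        load_env()
    # GROQ_API_KEY is an assumed name; never print the key itself.
    print("API key configured:", bool(os.environ.get("GROQ_API_KEY")))
```

Reading the key from the environment keeps it out of the source code, which fits the warning above about not sharing .env publicly.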

Attribution:

The project is adapted from the following sources:
