# SE Textbook Chatbot

The SE Textbook Chatbot is a CSE 6550 project that answers queries about the textbook "Software Engineering: A Practitioner's Approach." It is an educational tool that provides information, answers questions, and may retrieve content from the textbook.

## Documentation for app.py

The `app.py` file sets up the environment for a web application, initializing a database and launching both a Streamlit frontend and a Jupyter Notebook server. It ensures only one instance of Streamlit runs and configures the system to serve the app and notebook on specific ports and paths.

In [None]:
corpus_source = "" # If empty, app will default to using the textbook PDF

import os
import subprocess
from frontend import streamlit
from backend.statistics import init_db
os.environ['CORPUS_SOURCE'] = corpus_source

_Explanation_:

1. `corpus_source = ""`

- Initializes the corpus_source variable as an empty string.
- If no value is provided to this variable, the program will default to using a textbook PDF (or another predefined default corpus) as the source of content.

2. `import os`

- Imports the os module, which provides functions to interact with the operating system.
- This module is used for various operations like setting environment variables, managing paths, etc.

3. `import subprocess`

- Imports the subprocess module, which allows for running and managing system-level commands and subprocesses in the Python environment.
- This module is useful for executing shell commands, running other programs, or interacting with the OS at a lower level.

4. `from frontend import streamlit`

- Imports the `streamlit` module from the `frontend` package.
- Because the import goes through `frontend`, this most likely refers to the project's own frontend module rather than the third-party Streamlit library directly. Its `main()` function is called later to render the user interface, which is served by Streamlit.

5. `from backend.statistics import init_db`

- Imports the init_db function from the statistics module in the backend package.
- This function is likely responsible for initializing a database to store or manage data, particularly statistics in this context.

6. `os.environ['CORPUS_SOURCE'] = corpus_source`

- Sets the environment variable `CORPUS_SOURCE` to the value of `corpus_source`.
- The assignment affects the current Python process and any child processes it starts later (such as the Streamlit process), not the operating system as a whole. If `corpus_source` is an empty string, the application falls back to its default source (the textbook PDF).
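The fallback behavior described above can be sketched as follows. This is only an illustration of how a component might read `CORPUS_SOURCE`; the helper name `resolve_corpus_source` and the default path `data/textbook.pdf` are hypothetical and do not come from the project.

```python
import os

# Hypothetical default; the real location of the textbook PDF is not given here.
DEFAULT_CORPUS = "data/textbook.pdf"


def resolve_corpus_source(env=None):
    """Return the configured corpus source, or the default when unset or empty."""
    env = os.environ if env is None else env
    # `or` covers both a missing variable and the empty string set by app.py.
    return env.get("CORPUS_SOURCE", "").strip() or DEFAULT_CORPUS


print(resolve_corpus_source({"CORPUS_SOURCE": ""}))
print(resolve_corpus_source({"CORPUS_SOURCE": "https://example.com/notes"}))
```

Passing the environment as an optional argument keeps the helper easy to test without modifying `os.environ`.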

_Process Overview_:

1. Corpus Source Setup:
- The code starts by initializing a variable, `corpus_source`, that determines where the app pulls its text data from. If it is left blank, the app uses a default source (e.g., the textbook PDF).

2. System Environment Interaction:
- The program then sets the `CORPUS_SOURCE` environment variable for its own process, so that other components of the app, including child processes, can look up the source at run time.

3. Frontend & Backend Initialization:
- The frontend module (`streamlit`) and the backend function (`init_db`) are imported, indicating that this file is part of a larger application setup, likely involving user interaction through a web-based interface and some form of database management.
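The `init_db` function itself is not shown in this document. As a rough sketch only, assuming a SQLite-backed store, an initializer of this kind might look like the following. The table name `query_stats`, its columns, and the `path` parameter are hypothetical and chosen purely for illustration. The sketch uses `CREATE TABLE IF NOT EXISTS` so that repeated calls are harmless, which would matter here because Streamlit re-executes the script on each user interaction.

```python
import os
import sqlite3
import tempfile


def init_db(path):
    """Create the (hypothetical) statistics table if missing; return the connection."""
    conn = sqlite3.connect(path)
    # IF NOT EXISTS makes the call safe to repeat on every script run.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS query_stats (
            id INTEGER PRIMARY KEY,
            question TEXT NOT NULL,
            response_time_s REAL
        )
        """
    )
    conn.commit()
    return conn


# Call the initializer twice against the same file to show it is idempotent.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "stats.db")
    init_db(db_path).close()
    conn = init_db(db_path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    )]
    conn.close()

print(tables)
```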


In [None]:
if __name__ == "__main__":
    # True when this file is being executed by the Streamlit process started below
    if os.environ.get("STREAMLIT_RUNNING") == "1":
        # Initialize the database
        init_db()
        # Start the Streamlit frontend
        streamlit.main()
    else:
        # Set the environment variable to indicate Streamlit is running
        os.environ["STREAMLIT_RUNNING"] = "1"    
        # Start Streamlit as a background process
        subprocess.Popen(["streamlit", "run", __file__,"--server.port=5003", "--server.address=0.0.0.0", "--server.baseUrlPath=/team3"])
        # Start Jupyter Notebook
        subprocess.run(["jupyter", "notebook","--ip=0.0.0.0", "--port=6003","--no-browser", "--allow-root","--NotebookApp.base_url=/team3/jupyter","--NotebookApp.token=''", "--NotebookApp.password=''" ])

_Explanation_:

1. `if __name__ == "__main__":`

- This condition checks if the script is being run as the main module.
- Python executes the code inside this block only if the file is run directly (not imported as a module). This is a common Python idiom to ensure that certain initialization code runs when executing the script.

2. `if os.environ.get("STREAMLIT_RUNNING") == "1":`

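- Checks whether the environment variable `STREAMLIT_RUNNING` is set to "1".
- The launcher branch below sets this flag before starting Streamlit, and the Streamlit process inherits it. When Streamlit then executes this same file, the flag tells the script that it is already running inside Streamlit, so it serves the app instead of launching Streamlit again.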

3. `init_db()`

- Calls the function to initialize the database.
- Inside the Streamlit process, the script first calls `init_db`, which likely sets up or connects to the database needed for the app's operation.

4. `streamlit.main()`

- Starts the Streamlit frontend.
- This launches the main function of the Streamlit app, initializing the user interface and allowing the application to be displayed to users.

5. `else:`

- Executes when `STREAMLIT_RUNNING` is not set to "1", that is, when the script is run directly rather than by Streamlit.
- This branch acts as the launcher: it sets the flag, starts Streamlit, and then starts Jupyter Notebook.

6. `os.environ["STREAMLIT_RUNNING"] = "1"`

- Sets the `STREAMLIT_RUNNING` environment variable to "1".
- Child processes started afterwards inherit this variable, so the Streamlit process launched next takes the first branch instead of launching yet another instance.

7. `subprocess.Popen(["streamlit", "run", __file__, "--server.port=5003", "--server.address=0.0.0.0", "--server.baseUrlPath=/team3"])`

- Starts Streamlit as a background process.
- `subprocess.Popen` returns as soon as the child process has started. It runs `streamlit run` on this same file (`__file__`), specifying the port (`5003`), server address (`0.0.0.0`), and base URL path (`/team3`). Because the call does not wait for Streamlit to exit, the script continues on to launch Jupyter Notebook.

8. `subprocess.run(["jupyter", "notebook","--ip=0.0.0.0", "--port=6003","--no-browser", "--allow-root","--NotebookApp.base_url=/team3/jupyter","--NotebookApp.token=''", "--NotebookApp.password=''"])`
   
- Launches Jupyter Notebook in the foreground.
- `subprocess.run` blocks until the server exits. The command starts a Jupyter Notebook server, specifying the IP (`0.0.0.0`), port (`6003`), and base URL path (`/team3/jupyter`).
- It also disables browser auto-launch (`--no-browser`), allows the server to run as the root user (`--allow-root`), and removes authentication by setting empty token and password options. The notebook is therefore accessible without logging in to anyone who can reach the port.
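The difference between `subprocess.Popen` and `subprocess.run` described in items 7 and 8 can be seen in a small experiment. This sketch substitutes short-lived Python child processes for the Streamlit and Jupyter servers, so the commands and sleep durations are placeholders rather than anything taken from `app.py`.

```python
import subprocess
import sys

# Stand-in for the long-running Streamlit server: a child that sleeps briefly.
background = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(1)"]
)

# Popen returns immediately, so the child is still running at this point.
still_running = background.poll() is None

# Stand-in for the Jupyter server: run() blocks until the child exits.
foreground = subprocess.run(
    [sys.executable, "-c", "import time; time.sleep(0.2)"]
)

# Wait for the background child so it is not left behind.
background.wait()

print(still_running, foreground.returncode)
```

In `app.py` the same pattern keeps the launcher process alive for as long as the Jupyter server runs, while Streamlit runs alongside it.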

_Process Overview_:

- Main Execution:
When the script is run, it checks the `STREAMLIT_RUNNING` environment variable to determine whether it was started by Streamlit.

- Streamlit Frontend:
If the flag is set, the script is running inside Streamlit, so the database is initialized and the frontend is started. If the flag is not set, the script sets it and launches Streamlit on port 5003 as a background process.

- Jupyter Notebook:
After launching Streamlit, the script starts a Jupyter Notebook server on port 6003 with a custom base URL (`/team3/jupyter`). The notebook is accessible without any token or password authentication.
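The guard relies on child processes inheriting environment variables from the process that starts them. The following sketch demonstrates that inheritance with a plain Python child process standing in for Streamlit; the helper `flag_seen_by_child` is hypothetical and exists only for this demonstration.

```python
import os
import subprocess
import sys

# Code run by the child: report the flag value it inherited.
CHILD_CODE = "import os; print(os.environ.get('STREAMLIT_RUNNING'))"


def flag_seen_by_child():
    """Run a child Python process and return the STREAMLIT_RUNNING value it sees."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


os.environ.pop("STREAMLIT_RUNNING", None)  # start from a clean state
before = flag_seen_by_child()

os.environ["STREAMLIT_RUNNING"] = "1"      # what the launcher branch does
after = flag_seen_by_child()

os.environ.pop("STREAMLIT_RUNNING", None)  # restore the clean state
print(before, after)
```

The child sees no value before the flag is set and "1" afterwards, which is how the Streamlit process started by the launcher ends up in the first branch of the `if` statement.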