
Syncing with main on head repo #1

Merged
merged 11 commits into from
Mar 22, 2024
42 changes: 22 additions & 20 deletions README.md
@@ -5,7 +5,7 @@
</h2>

<p align="center">
<p align="center">Open Source Unified Search and Gen-AI Chat with your Docs.</p>
<p align="center">Open Source Gen-AI Chat + Unified Search.</p>

<p align="center">
<a href="https://docs.danswer.dev/" target="_blank">
@@ -22,16 +22,16 @@
</a>
</p>

<strong>[Danswer](https://www.danswer.ai/)</strong> lets you ask questions in natural language and get back
answers based on your team-specific documents. Think ChatGPT if it had access to your team's unique
knowledge. Connects to all common workplace tools such as Slack, Google Drive, Confluence, etc.
<strong>[Danswer](https://www.danswer.ai/)</strong> is the ChatGPT for teams. Danswer provides a Chat interface and plugs into any LLM of
your choice. Danswer can be deployed anywhere and at any scale: on a laptop, on-premises, or in the cloud. Since you own
the deployment, your user data and chats are fully under your own control. Danswer is MIT licensed and designed to be
modular and easily extensible. The system also comes fully ready for production usage with user authentication, role
management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts.

Teams have used Danswer to:
- Speed up customer support and escalation turnaround time.
- Improve Engineering efficiency by making documentation and code changelogs easy to find.
- Help sales teams get fuller context faster when preparing for calls.
- Track customer requests and priorities for Product teams.
- Help teams self-serve IT, Onboarding, HR, etc.
Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc.
By combining LLMs and team-specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if
it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already
supported?" or "Where's the pull request for feature Y?"

<h3>Usage</h3>

@@ -57,19 +57,27 @@ We also have built-in support for deployment on Kubernetes. Files for that can b


## 💃 Main Features
* Chat UI with the ability to select documents to chat with.
* Create custom AI Assistants with different prompts and backing knowledge sets.
* Connect Danswer with the LLM of your choice (self-host for a fully airgapped solution).
* Document Search + AI Answers for natural language queries.
* Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
* Chat support (think ChatGPT, but with access to your private knowledge sources).
* Slack integration to get answers and search results directly in Slack.


## 🚧 Roadmap
* Chat/Prompt sharing with specific teammates and user groups.
* Multi-modal model support: chat with images, video, etc.
* Choosing between LLMs and parameters during a chat session.
* Tool calling and agent configuration options.
* Organizational understanding and the ability to locate and suggest experts from your team.


## Other Notable Benefits of Danswer
* Best-in-class Hybrid Search across all sources (BM-25 + prefix-aware embedding models).
* User Authentication with document-level access management.
* Admin Dashboard to configure connectors, document-sets, access, etc.
* Custom deep learning models that learn from user feedback.
* Connect Danswer with the LLM of your choice for a fully airgapped solution.
* Easy deployment and ability to host Danswer anywhere of your choosing.


@@ -96,11 +104,5 @@ Efficiently pulls the latest changes from:
* Websites
* And more ...

## 🚧 Roadmap
* Organizational understanding.
* Ability to locate and suggest experts from your team.
* Code Search
* Structured Query Languages (SQL, Excel formulas, etc.)

## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.
29 changes: 29 additions & 0 deletions backend/alembic/versions/173cae5bba26_port_config_store.py
@@ -0,0 +1,29 @@
"""Port Config Store

Revision ID: 173cae5bba26
Revises: e50154680a5c
Create Date: 2024-03-19 15:30:44.425436

"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

# revision identifiers, used by Alembic.
revision = "173cae5bba26"
down_revision = "e50154680a5c"
branch_labels = None
depends_on = None


def upgrade() -> None:
op.create_table(
"key_value_store",
sa.Column("key", sa.String(), nullable=False),
sa.Column("value", postgresql.JSONB(astext_type=sa.Text()), nullable=False),
sa.PrimaryKeyConstraint("key"),
)


def downgrade() -> None:
op.drop_table("key_value_store")
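The `key_value_store` table created above is a plain string-keyed JSONB store. As a rough, hypothetical sketch of how a Postgres-backed dynamic config store could sit on top of it (the class and method names here are illustrative, not Danswer's actual implementation; a dict stands in for the table so the JSONB round-trip can be shown without a database):

```python
import json


class KeyValueStoreSketch:
    """Illustrative sketch of a key/value config store backed by a table
    with a string primary key and a JSONB value column. A dict stands in
    for the `key_value_store` table; a real implementation would issue
    INSERT ... ON CONFLICT / SELECT statements instead."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}  # key -> JSON text, as JSONB would hold it

    def store(self, key: str, value) -> None:
        # JSONB accepts any JSON-serializable value; json.dumps mimics that constraint
        self._rows[key] = json.dumps(value)

    def load(self, key: str):
        if key not in self._rows:
            raise KeyError(key)
        return json.loads(self._rows[key])


store = KeyValueStoreSketch()
store.store("feature_flags", {"chat": True, "search": True})
print(store.load("feature_flags"))
```

The single-table shape keeps the store schema-free: any JSON-serializable config value can be persisted without further migrations.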
28 changes: 28 additions & 0 deletions backend/alembic/versions/4738e4b3bae1_pg_file_store.py
@@ -0,0 +1,28 @@
"""PG File Store

Revision ID: 4738e4b3bae1
Revises: e91df4e935ef
Create Date: 2024-03-20 18:53:32.461518

"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic.
revision = "4738e4b3bae1"
down_revision = "e91df4e935ef"
branch_labels = None
depends_on = None


def upgrade() -> None:
op.create_table(
"file_store",
sa.Column("file_name", sa.String(), nullable=False),
sa.Column("lobj_oid", sa.Integer(), nullable=False),
sa.PrimaryKeyConstraint("file_name"),
)


def downgrade() -> None:
op.drop_table("file_store")
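The `file_store` table pairs a unique `file_name` with `lobj_oid`, the OID of a Postgres large object holding the file's bytes. A minimal sketch of that indirection pattern (purely illustrative, not Danswer's real code; dicts stand in for the large-object facility and the table):

```python
import itertools


class PGFileStoreSketch:
    """Sketch of the file_store pattern: contents live in large objects,
    and the table maps file_name (the primary key) to the object's OID."""

    def __init__(self) -> None:
        self._oids = itertools.count(16384)       # fake OID allocator
        self._large_objects: dict[int, bytes] = {}  # oid -> bytes
        self._file_store: dict[str, int] = {}       # file_name -> lobj_oid

    def save_file(self, file_name: str, content: bytes) -> int:
        oid = next(self._oids)
        self._large_objects[oid] = content
        # upsert keyed on file_name; drop the orphaned old object if overwriting
        old_oid = self._file_store.get(file_name)
        if old_oid is not None:
            del self._large_objects[old_oid]
        self._file_store[file_name] = oid
        return oid

    def read_file(self, file_name: str) -> bytes:
        return self._large_objects[self._file_store[file_name]]


fs = PGFileStoreSketch()
fs.save_file("notes.txt", b"v1")
fs.save_file("notes.txt", b"v2")
print(fs.read_file("notes.txt"))
```

Keeping only the OID in the row means overwriting a file is a cheap metadata update plus cleanup of the old object, rather than rewriting a wide row.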
@@ -0,0 +1,36 @@
"""Remove DocumentSource from Tag

Revision ID: 91fd3b470d1a
Revises: 173cae5bba26
Create Date: 2024-03-21 12:05:23.956734

"""
from alembic import op
import sqlalchemy as sa
from danswer.configs.constants import DocumentSource

# revision identifiers, used by Alembic.
revision = "91fd3b470d1a"
down_revision = "173cae5bba26"
branch_labels = None
depends_on = None


def upgrade() -> None:
op.alter_column(
"tag",
"source",
type_=sa.String(length=50),
existing_type=sa.Enum(DocumentSource, native_enum=False),
existing_nullable=False,
)


def downgrade() -> None:
op.alter_column(
"tag",
"source",
type_=sa.Enum(DocumentSource, native_enum=False),
existing_type=sa.String(length=50),
existing_nullable=False,
)
38 changes: 38 additions & 0 deletions backend/alembic/versions/e50154680a5c_no_source_enum.py
@@ -0,0 +1,38 @@
"""No Source Enum

Revision ID: e50154680a5c
Revises: fcd135795f21
Create Date: 2024-03-14 18:06:08.523106

"""
from alembic import op
import sqlalchemy as sa

from danswer.configs.constants import DocumentSource

# revision identifiers, used by Alembic.
revision = "e50154680a5c"
down_revision = "fcd135795f21"
branch_labels = None
depends_on = None


def upgrade() -> None:
op.alter_column(
"search_doc",
"source_type",
type_=sa.String(length=50),
existing_type=sa.Enum(DocumentSource, native_enum=False),
existing_nullable=False,
)
op.execute("DROP TYPE IF EXISTS documentsource")


def downgrade() -> None:
op.alter_column(
"search_doc",
"source_type",
type_=sa.Enum(DocumentSource, native_enum=False),
existing_type=sa.String(length=50),
existing_nullable=False,
)
118 changes: 118 additions & 0 deletions backend/alembic/versions/e91df4e935ef_private_personas_documentsets.py
@@ -0,0 +1,118 @@
"""Private Personas DocumentSets

Revision ID: e91df4e935ef
Revises: 91fd3b470d1a
Create Date: 2024-03-17 11:47:24.675881

"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic.
revision = "e91df4e935ef"
down_revision = "91fd3b470d1a"
branch_labels = None
depends_on = None


def upgrade() -> None:
op.create_table(
"document_set__user",
sa.Column("document_set_id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.ForeignKeyConstraint(
["document_set_id"],
["document_set.id"],
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("document_set_id", "user_id"),
)
op.create_table(
"persona__user",
sa.Column("persona_id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("persona_id", "user_id"),
)
op.create_table(
"document_set__user_group",
sa.Column("document_set_id", sa.Integer(), nullable=False),
sa.Column(
"user_group_id",
sa.Integer(),
nullable=False,
),
sa.ForeignKeyConstraint(
["document_set_id"],
["document_set.id"],
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.PrimaryKeyConstraint("document_set_id", "user_group_id"),
)
op.create_table(
"persona__user_group",
sa.Column("persona_id", sa.Integer(), nullable=False),
sa.Column(
"user_group_id",
sa.Integer(),
nullable=False,
),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.PrimaryKeyConstraint("persona_id", "user_group_id"),
)

op.add_column(
"document_set",
sa.Column("is_public", sa.Boolean(), nullable=True),
)
# fill in is_public for existing rows
op.execute("UPDATE document_set SET is_public = true WHERE is_public IS NULL")
op.alter_column("document_set", "is_public", nullable=False)

op.add_column(
"persona",
sa.Column("is_public", sa.Boolean(), nullable=True),
)
# fill in is_public for existing rows
op.execute("UPDATE persona SET is_public = true WHERE is_public IS NULL")
op.alter_column("persona", "is_public", nullable=False)


def downgrade() -> None:
op.drop_column("persona", "is_public")

op.drop_column("document_set", "is_public")

op.drop_table("persona__user")
op.drop_table("document_set__user")
op.drop_table("persona__user_group")
op.drop_table("document_set__user_group")
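The `is_public` additions in the upgrade above follow a common three-step backfill pattern: add the column as nullable so existing rows do not violate it, backfill those rows, then tighten the column to NOT NULL. A minimal stdlib sketch of the same ordering using sqlite3 (SQLite cannot alter a column to NOT NULL in place, so the sketch stops at verifying the backfill; in Alembic the final step is the `op.alter_column(..., nullable=False)` shown above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persona (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO persona (name) VALUES (?)", [("a",), ("b",)])

# Step 1: add the new column as nullable so existing rows remain valid
conn.execute("ALTER TABLE persona ADD COLUMN is_public BOOLEAN")
# Step 2: backfill existing rows
conn.execute("UPDATE persona SET is_public = 1 WHERE is_public IS NULL")
# Step 3 (Postgres/Alembic only): alter the column to NOT NULL

remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM persona WHERE is_public IS NULL"
).fetchone()[0]
print(remaining_nulls)  # 0
```

Doing the three steps in this order keeps the migration safe on populated tables; adding the column as NOT NULL directly would fail (or require a default) the moment any rows already exist.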
4 changes: 0 additions & 4 deletions backend/danswer/background/celery/celery.py
@@ -226,8 +226,4 @@ def clean_old_temp_files_task(
"task": "check_for_document_sets_sync_task",
"schedule": timedelta(seconds=5),
},
"clean-old-temp-files": {
"task": "clean_old_temp_files_task",
"schedule": timedelta(minutes=30),
},
}
4 changes: 2 additions & 2 deletions backend/danswer/configs/app_configs.py
@@ -224,8 +224,8 @@
#####
# Miscellaneous
#####
DYNAMIC_CONFIG_STORE = os.environ.get(
"DYNAMIC_CONFIG_STORE", "FileSystemBackedDynamicConfigStore"
DYNAMIC_CONFIG_STORE = (
os.environ.get("DYNAMIC_CONFIG_STORE") or "PostgresBackedDynamicConfigStore"
)
DYNAMIC_CONFIG_DIR_PATH = os.environ.get("DYNAMIC_CONFIG_DIR_PATH", "/home/storage")
JOB_TIMEOUT = 60 * 60 * 6 # 6 hours default
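The switch from `os.environ.get(key, default)` to `os.environ.get(key) or default` is not cosmetic: the `or` form also falls back when the variable is set but empty, as happens with a blank entry in an env file or compose file. A small illustration:

```python
import os

# Simulate a variable that is set but empty (e.g. a blank line in a .env file)
os.environ["DYNAMIC_CONFIG_STORE"] = ""

# .get() with a default only falls back when the variable is entirely unset:
via_get = os.environ.get("DYNAMIC_CONFIG_STORE", "PostgresBackedDynamicConfigStore")

# `or` additionally falls back when the variable is set to an empty string:
via_or = os.environ.get("DYNAMIC_CONFIG_STORE") or "PostgresBackedDynamicConfigStore"

print(via_get)  # "" (the empty string wins over the default)
print(via_or)   # "PostgresBackedDynamicConfigStore"
```

The trade-off is that the `or` form cannot express "deliberately empty", which is acceptable here since an empty backend name is never meaningful.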
8 changes: 7 additions & 1 deletion backend/danswer/connectors/confluence/rate_limit_handler.py
@@ -10,6 +10,9 @@
F = TypeVar("F", bound=Callable[..., Any])


RATE_LIMIT_MESSAGE_LOWERCASE = "Rate limit exceeded".lower()


class ConfluenceRateLimitError(Exception):
pass

@@ -27,7 +30,10 @@ def wrapped_call(*args: list[Any], **kwargs: Any) -> Any:
try:
return confluence_call(*args, **kwargs)
except HTTPError as e:
if e.response.status_code == 429:
if (
e.response.status_code == 429
or RATE_LIMIT_MESSAGE_LOWERCASE in e.response.text.lower()
):
raise ConfluenceRateLimitError()
raise
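The widened check above treats either a 429 status code or a "Rate limit exceeded" message in the body as a rate limit, since some Confluence responses reportedly signal throttling in the body rather than the status. A self-contained sketch of just that predicate (`FakeResponse` is a stand-in for the `requests` response object attached to the `HTTPError`):

```python
RATE_LIMIT_MESSAGE_LOWERCASE = "Rate limit exceeded".lower()


class FakeResponse:
    """Minimal stand-in for requests.Response: just status_code and text."""

    def __init__(self, status_code: int, text: str) -> None:
        self.status_code = status_code
        self.text = text


def is_rate_limited(response: FakeResponse) -> bool:
    # Mirrors the condition in the handler: explicit 429, or the
    # rate-limit message hidden in an otherwise different error body
    return (
        response.status_code == 429
        or RATE_LIMIT_MESSAGE_LOWERCASE in response.text.lower()
    )


print(is_rate_limited(FakeResponse(429, "")))                     # True
print(is_rate_limited(FakeResponse(500, "Rate limit exceeded")))  # True
print(is_rate_limited(FakeResponse(200, "ok")))                   # False
```

Lowercasing both sides keeps the substring match robust to casing differences across Confluence versions.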
