Add support for upserting multiple document types #96

Merged: 3 commits into master on May 8, 2023

@Ashesh3 (Member) commented on May 5, 2023

Fixes #62

This PR lets a Document be created from a URL or from plain text, in addition to an uploaded file.

Previously, the API required users to upload a file for every document. Now users can either upload a file or supply the content directly, as raw text or as a URL. A new document_type field on the Document model records which of the three types (file, URL, or text) a document is.

A second new field, text_content, stores the raw text when document_type is text, or the webpage URL when document_type is url.
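A minimal, framework-free sketch of how the two new fields relate. It assumes the stored type values are the strings "file", "url" and "text"; the PR does not show the actual Django field definitions, choice labels, or defaults, and the names DocumentType and DocumentSketch are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DocumentType(str, Enum):
    # Hypothetical choice values mirroring the three types named in this PR
    FILE = "file"
    URL = "url"
    TEXT = "text"


@dataclass
class DocumentSketch:
    # Hypothetical stand-in for the Django Document model (only the fields discussed here)
    document_type: DocumentType
    file: Optional[str] = None          # name/path of the uploaded file, if any
    text_content: Optional[str] = None  # raw text (TEXT) or webpage URL (URL)

    def source(self) -> Optional[str]:
        # Return whichever field holds the content for this document type
        if self.document_type is DocumentType.FILE:
            return self.file
        return self.text_content
```

For example, DocumentSketch(DocumentType.URL, text_content="https://react.dev/reference/react/useEffect").source() returns the URL, while a FILE document returns its file name.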

The DocumentSerializer now includes the new fields when serializing a Document. On creation, it also checks that at least one of file or text_content is provided and raises a validation error if both are missing.
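The creation rule can be sketched in plain Python as below. In Django REST Framework this check would typically live in the serializer's object-level validate(self, attrs) method and raise serializers.ValidationError; here a local ValidationError class stands in so the sketch runs without DRF. Treating empty strings as missing is an assumption, not something the PR states.

```python
from typing import Any, Mapping


class ValidationError(Exception):
    """Local stand-in for rest_framework.serializers.ValidationError."""


def validate_document(data: Mapping[str, Any]) -> Mapping[str, Any]:
    # At least one of `file` or `text_content` must be supplied on creation.
    # Empty values ("" or None) are treated as missing (an assumption).
    if not data.get("file") and not data.get("text_content"):
        raise ValidationError("Either file or text_content must be provided.")
    return data
```

A payload such as {"document_type": "text", "text_content": "..."} passes unchanged, while {"document_type": "text"} alone is rejected.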

The DocumentViewSet now reads document_type when a new document is uploaded and calls upsert() with the parameters appropriate to that type.
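A sketch of the per-type dispatch, assuming each document type maps to a hypothetical extractor that turns the document's source (file path, URL, or raw text) into the list of strings handed to upsert(). The real upsert() signature is not shown in this PR, so the sketch stops at producing that list.

```python
from typing import Callable, Dict, List

# Hypothetical: an extractor turns a source string into the strings to upsert
Extractor = Callable[[str], List[str]]


def extract_texts(document_type: str, source: str, extractors: Dict[str, Extractor]) -> List[str]:
    # Look up the extractor registered for this document_type
    # ("file", "url" or "text") and return the strings to upsert.
    try:
        extractor = extractors[document_type]
    except KeyError:
        raise ValueError(f"Unsupported document_type: {document_type!r}") from None
    return extractor(source)
```

A lookup table keeps the view free of if/elif chains: the view would register something like {"text": split_lines, "url": fetch_and_parse, "file": read_pdf} (all hypothetical names) and pass the result on to upsert().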

A new dependency, beautifulsoup4, has been added to fetch-and-parse support: when document_type is url, it is used to parse the webpage's HTML.
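To illustrate the HTML-to-text step, the sketch below uses the standard library's html.parser instead of beautifulsoup4 so it runs without extra dependencies. With beautifulsoup4 the rough equivalent is BeautifulSoup(html, "html.parser").get_text(...). Splitting per text node and skipping script/style contents are assumptions; the chunks in the PR's sample output for react.dev are coarser than this.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    # Collects non-empty text nodes, skipping <script> and <style> contents
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_texts(html: str) -> List[str]:
    # Return the visible text fragments of an HTML document, in document order
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.parts
```

Fetching the page (for example with urllib.request.urlopen) is left out so the parsing step can be tested offline.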

Finally, a SPECTACULAR_SETTINGS entry has been added so that FileField is rendered as a file-upload input in the Swagger UI.
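The PR does not show the exact value it added. In drf-spectacular, the setting commonly used for this is COMPONENT_SPLIT_REQUEST, which generates separate request and response components so that a FileField is described as a binary upload in request bodies. Treat the fragment below as an assumption about what was added, not a copy of it.

```python
# settings.py (sketch; assumes drf-spectacular generates the OpenAPI schema).
# COMPONENT_SPLIT_REQUEST=True splits request/response components, so a
# FileField appears as a binary upload in request schemas and Swagger UI
# renders a file picker for it.
SPECTACULAR_SETTINGS = {
    "COMPONENT_SPLIT_REQUEST": True,
}
```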

Tested with all three document types: file, url, and text.

The cases below show, for each document type, the list of strings upserted to the Pinecone index.

Upserting a PDF file (sample.pdf):

00: ' A Simple PDF File '
01: ' This is a small demonstration .pdf file - '
02: ' just for use in the Virtual Mechanics tutorials. More text. And more '
03: ' text. And more text. And more text. And more text. '
04: ' And more text. And more text. And more text. And more text. And more '
05: ' text. And more text. Boring, zzzzz. And more text. And more text. And '
06: ' more text. And more text. And more text. And more text. And more text. '
07: ' And more text. And more text. '
08: ' And more text. And more text. And more text. And more text. And more '
09: ' text. And more text. And more text. Even more. Continued on page 2 ... Simple PDF File 2 '
10: ' ...continued from page 1. Yet more text. And more text. And more text. '
11: ' And more text. And more text. And more text. And more text. And more '
12: ' text. Oh, how boring typing this stuff. But not as boring as watching '
13: ' paint dry. And more text. And more text. And more text. And more text. '
14: ' Boring.  More, a little more text. The end, and just as well. '
len(): 15

Upserting https://react.dev/reference/react/useEffect:

000: 'useEffect – ReactSupport Ukraine 🇺🇦🇺🇦 Help Provide Humanitarian Aid to Ukraine.ReactSearch⌘CtrlKLearnReferenceCommunityBlogreact@18.2.0HooksuseCallbackuseContextuseDebugValueuseDeferredValueuseEffectuseIduseImperativeHandleuseInsertionEffectuseLayoutEffectuseMemouseReduceruseRefuseStateuseSyncExternalStoreuseTransitionComponents<Fragment> (<>)<Profiler><StrictMode><Suspense>APIscreateContextforwardReflazymemostartTransitionreact-dom@18.2.0ComponentsCommon (e.g. <div>)<input><option><progress><select><textarea>APIscreatePortalflushSyncfindDOMNodehydraterenderunmountComponentAtNodeClient APIscreateRoothydrateRootServer APIsrenderToNodeStreamrenderToPipeableStreamrenderToReadableStreamrenderToStaticMarkuprenderToStaticNodeStreamrenderToStringLegacy APIsLegacy React APIsChildrencloneElementComponentcreateElementcreateFactorycreateRefisValidElementPureComponentIs this page useful?API ReferenceHooksuseEffectuseEffect is a React Hook that lets you synchronize a component with an external system.useEffect(setup, dependencies?)'
001: 'Reference useEffect(setup, dependencies?) Usage Connecting to an external system Wrapping Effects in custom Hooks Controlling a non-React widget Fetching data with Effects Specifying reactive dependencies Updating state based on previous state from an Effect Removing unnecessary object dependencies Removing unnecessary function dependencies Reading the latest props and state from an Effect Displaying different content on the server and the client Troubleshooting My Effect runs twice when the component mounts My Effect runs after every re-render My Effect keeps re-running in an infinite cycle My cleanup logic runs even though my component didn’t unmount My Effect does something visual, and I see a flicker before it runs '
002: ''
003: 'Reference '
004: 'useEffect(setup, dependencies?) '
005: 'Call useEffect at the top level of your component to declare an Effect:'
006: "import { useEffect } from 'react';import { createConnection } from './chat.js';function ChatRoom({ roomId }) {  const [serverUrl, setServerUrl] = useState('https://localhost:1234');  useEffect(() => {    const connection = createConnection(serverUrl, roomId);    connection.connect();    return () => {      connection.disconnect();    };  }, [serverUrl, roomId]);  // ...}"
007: 'See more examples below.'
008: 'Parameters '
009: ''
010: ''
...
044: 'Usage '
045: 'Connecting to an external system '
046: 'Some components need to stay connected to the network, some browser API, or a third-party library, while they are displayed on the page. These systems aren’t controlled by React, so they are called external.'
047: 'To connect your component to some external system, call useEffect at the top level of your component:'
048: "import { useEffect } from 'react';import { createConnection } from './chat.js';function ChatRoom({ roomId }) {  const [serverUrl, setServerUrl] = useState('https://localhost:1234');  useEffect(() => {  \tconst connection = createConnection(serverUrl, roomId);    connection.connect();  \treturn () => {      connection.disconnect();  \t};  }, [serverUrl, roomId]);  // ...}"
049: 'You need to pass two arguments to useEffect:'
050: ''
051: 'A setup function with setup code that connects to that system.'
052: ''
053: 'It should return a cleanup function with cleanup code that disconnects from that system.'
054: ''
...
more: [100:583]
len(): 583

Upserting raw text:

There was a programmer named Max who had to manage a database.
One day, he needed to add new records to the database but realized that some of the records already existed.
He needed to update the existing records and insert the new ones. This is called upserting data.
Max used a special command to accomplish this and was able to complete his task quickly and efficiently.

0: 'There was a programmer named Max who had to manage a database.'
1: 'One day, he needed to add new records to the database but realized that some of the records already existed.'
2: 'He needed to update the existing records and insert the new ones. This is called upserting data.'
3: 'Max used a special command to accomplish this and was able to complete his task quickly and efficiently.'
len(): 4
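The raw-text case yields one entry per input line. The sketch below reproduces that split and turns the chunks into (id, values, metadata) tuples, a shape the Pinecone Python client's index.upsert(vectors=...) accepts. The embedding function, the "<document_id>-<index>" ID scheme, and the metadata keys are all hypothetical; the PR does not show them.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (id, embedding values, metadata)
Vector = Tuple[str, List[float], Dict[str, str]]


def split_text(text: str) -> List[str]:
    # One entry per non-blank line, matching the raw-text listing above
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_vectors(
    document_id: str,
    chunks: Sequence[str],
    embed: Callable[[str], List[float]],  # hypothetical embedding function
) -> List[Vector]:
    # Hypothetical ID scheme "<document_id>-<chunk index>"; the chunk text is
    # kept in metadata so query results can be mapped back to the source.
    return [
        (f"{document_id}-{i}", embed(chunk), {"text": chunk, "document_id": document_id})
        for i, chunk in enumerate(chunks)
    ]
```

Applied to the four-sentence story above, split_text returns 4 strings, matching len(): 4, and build_vectors produces IDs doc-0 through doc-3 for a document_id of "doc".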

@Ashesh3 requested a review from @shivankacker on May 5, 2023 at 19:52
@shivankacker merged commit f785473 into master on May 8, 2023

Linked issue:

Allow users to upsert web links and direct text (through endpoint)