BUG - Index creation failed #4

Closed
toi500 opened this issue Mar 5, 2024 · 22 comments

@toi500

toi500 commented Mar 5, 2024

I am getting this error:

2024-03-05T23:11:23.573Z Index creation failed: HTTPSConnectionPool(host='controller.us-west-2.pinecone.io', port=443): Max retries exceeded with url: /databases (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7efcb138d710>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Can you please help me fix it?

Screenshot 2024-03-06 002437
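"[Errno -2] Name or service not known" in the log above is a DNS resolution failure, so a quick check is whether the host name resolves at all from the machine running the Actor. A minimal sketch using only the standard library; the helper name `can_resolve` is made up for illustration and is not part of the integration:

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if DNS resolution for `host` succeeds, False otherwise."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # "[Errno -2] Name or service not known" is raised as socket.gaierror.
        return False
```

Calling `can_resolve("controller.us-west-2.pinecone.io")` with the host from the log shows whether the name resolves from that environment. If it does not, retrying will not help until the host name itself is correct or DNS is working again.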
@HonzaTuron
Owner

Hello, would you try again? It seems there was a Pinecone outage: https://status.pinecone.io/

@toi500
Author

toi500 commented Mar 6, 2024

OK, I figured out what the issue is.

This integration is not ready for Pinecone Serverless indexes, which are the whole point of using Pinecone now, as they are 50x cheaper than a pod.

It only works with PODs.

Can you please update the integration at your earliest convenience?

Screenshot 2024-03-06 175116

@HonzaTuron
Owner

Thank you, I will take a look this week to support serverless indexes as well.

@HonzaTuron
Owner

@toi500 Hello, I just released a new version of the integration. Feel free to check it and let me know if you find anything that does not work well.

@toi500
Author

toi500 commented Mar 21, 2024

@HonzaTuron

I have been checking this for 2 hours now and it looks like there is some sort of error when the Actor tries to load data from the dataset.

Screenshot 2024-03-21 152911

I have tried testing different fields but I got the same result. I also checked that the dataset has the fields that I set in the integration.

Screenshot 2024-03-21 152634

@HonzaTuron
Owner

@toi500 I just released a new version, you can try again.

@toi500
Author

toi500 commented Mar 21, 2024

@HonzaTuron

It looks like there is now a problem with reading the OpenAI credentials. Also, according to the log, it is only loading data from the "url" field even though I set up both "url" and "text".

Screenshot 2024-03-21 155748

I have checked several times and the API key from OpenAI is correct.

Screenshot 2024-03-21 155925

@HonzaTuron
Owner

Yeah, it seems the OpenAI token wasn't passed correctly on the integration side in the new release. You can try again, as I have built a new version.

@toi500
Author

toi500 commented Mar 21, 2024

It is working now, BUT I am facing two unexpected problems:

  1. The Actor is running forever (still running)
Screenshot 2024-03-21 165502

and

  2. The data did not upsert correctly. The url field is missing:
Screenshot 2024-03-21 165625

I had to abort it.

Screenshot 2024-03-21 170758

@HonzaTuron
Owner

The issue with the Actor not exiting has been addressed. Regarding your 2nd issue, try adding the url field to metadata_fields instead.

@toi500
Author

toi500 commented Mar 22, 2024

Yes, I got 1 successful run under that strict setup, using TEXT as the field and URL as metadata.

--

I have also noticed that the TEXT field must be hard coded somewhere, since the Actor is upserting the data correctly but with the wrong tag.

Here, I put URL as the Field and, as you can see, it shows the correct url value but under the TEXT tag.

Screenshot 2024-03-21 212457

--

The entries "Field" and "Metadata" in the Actor are confusing, since Pinecone uses what they call key:value pairs, so whatever you upsert there is going to be "metadata" for them.

The best approach would be to rename the FIELD entry to METADATA, as that is the nomenclature Pinecone uses for its key:value pairs.

  1. If empty, the Actor upserts all the data from the dataset as metadata.

  2. The end user can filter it down by inputting the values they want.
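The two rules above could be sketched roughly like this. It is an illustration only, not the Actor's code: `select_metadata` and its parameters are hypothetical names, and a dataset item is assumed to be a flat dict.

```python
from typing import Any, Dict, List, Optional


def select_metadata(item: Dict[str, Any], wanted: Optional[List[str]] = None) -> Dict[str, Any]:
    """Pick the metadata to upsert for one dataset item.

    If `wanted` is empty or None, every key of the item becomes metadata.
    Otherwise only the listed keys that exist in the item are kept.
    """
    if not wanted:
        return dict(item)
    return {key: item[key] for key in wanted if key in item}
```

With an item such as `{"url": "...", "text": "...", "title": "..."}`, leaving the input empty keeps all three keys, while passing `["url"]` keeps only the url.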

@toi500
Author

toi500 commented Mar 22, 2024

@HonzaTuron

I have been checking the code (not my expertise) but I think that there is an issue in this loop that causes only 1 field to be loaded:

Screenshot 2024-03-22 171816

The loop creates a separate ApifyDatasetLoader for each field. Inside the loop, the datasetMappingFunction constructs a Document object, but the metadata section only processes one field from metadata_fields using getNestedValue.

So, maybe something like this?

Screenshot 2024-03-22 172931
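The code in the screenshots is not reproduced in this thread, so the following is only a sketch of the idea: a single mapping function that fills the metadata from every entry in metadata_fields instead of just one. `get_nested_value` is a hypothetical stand-in for getNestedValue, `map_item` is a made-up name, plain dicts stand in for the Document object, and metadata_fields is assumed to map a metadata key to a dot-separated path in the dataset item.

```python
from typing import Any, Dict


def get_nested_value(item: Dict[str, Any], path: str) -> Any:
    """Follow a dot-separated path such as "metadata.title"; return None if a key is missing."""
    current: Any = item
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def map_item(item: Dict[str, Any], text_field: str, metadata_fields: Dict[str, str]) -> Dict[str, Any]:
    """Build one document: the text comes from `text_field`, metadata from ALL requested fields."""
    metadata = {name: get_nested_value(item, path) for name, path in metadata_fields.items()}
    text = get_nested_value(item, text_field)
    return {"page_content": "" if text is None else str(text), "metadata": metadata}
```

Because every metadata entry is resolved inside one mapping call, there is no need to create one loader per field.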

@HonzaTuron
Owner

@toi500 Thanks for your feedback. Will take a look tomorrow :)

@HonzaTuron
Owner

@toi500 So I guess you can set up one field to be passed in fields and the rest of the fields as metadata_fields, then? Wouldn't that do the trick?

@toi500
Author

toi500 commented Mar 25, 2024

Well, not really, since you can only pass TEXT as a field. Whatever other field you put in there is going to be hard tagged as TEXT, as I explained before.

Screenshot 2024-03-21 212457

If you don't want to fix the loop, the best solution would be to make Fields optional or remove it for good and let us set up what metadata we want to upsert to Pinecone.

That would work.

@HonzaTuron
Owner

Okay, I just released a new version with optional fields. I'm not sure whether the framework I'm working with supports empty text, though. Please let me know once you try :)

Also, do I understand correctly that the 'text' field is messing up your data?

@toi500
Author

toi500 commented Mar 25, 2024

@HonzaTuron

It is spitting out this error with the latest version:

Screenshot 2024-03-25 193021

About the text field, your 0.0.57 version upserts only 1 field from the dataset (even if you enter 10) and for some reason it is tagged as text even if you put url in there.

But I am pretty sure that this text key would be a different one if the testing datasets were scraped by another Actor. We are using the Website Content Crawler one to test Apify. We just need a production-ready tool to upsert data to Pinecone to feed LLM apps for our company.

@HonzaTuron
Owner

HonzaTuron commented Mar 25, 2024

I see. I'm not that advanced a Pinecone user, so your feedback is very valuable. I just released a new version which should fix the errors you sent. I also tried to hide the default 'text' field.

@toi500
Author

toi500 commented Mar 26, 2024

@HonzaTuron
Don't worry my friend, I am happy to help. In fact, I am mastering Apify and Pinecone with all this testing :)

Please, bear with me 1 min:

The latest version 0.0.59 does not work. It gives you a successful run but no data is upserted to Pinecone. I have tried all the possible setups (using fields, metadata, both) and it does not work.

Screenshot 2024-03-26 020849

I propose the following:

  1. Let's give it one last try and see if we can fix the latest version 0.0.59 and find out why it is not upserting data. And, if that does not work:

  2. Let's roll back to version 0.0.57.

I have been testing this thing A LOT in order to know what the hell is going on and it turned out that the "hard tagging" of the "text" field is something that comes from Pinecone itself. So either it is mandatory or it is bugged.

So version 0.0.57 is the best we can accomplish here right now. You can even make the text field mandatory and explain in the integration that any other data needs to go in via metadata. I know the trick for version 0.0.57 but, you know, it would be good for other users too.
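A rough sketch of what such a check could look like before a run starts, based only on the behaviour reported in this thread (one field is upserted, everything else has to go through metadata). The function name `check_actor_input` is hypothetical, the input keys `fields` and `metadata_fields` are taken from the discussion above, and the dict shape of `metadata_fields` is an assumption.

```python
from typing import Any, Dict, List


def check_actor_input(actor_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the input; an empty list means it looks usable."""
    problems: List[str] = []
    fields = actor_input.get("fields") or []
    metadata_fields = actor_input.get("metadata_fields") or {}

    if not fields:
        problems.append("fields is empty: one text field is required")
    elif len(fields) > 1:
        # Reported behaviour in this thread: only one field ends up in Pinecone.
        extra = ", ".join(fields[1:])
        problems.append(f"move extra fields to metadata_fields: {extra}")

    if not isinstance(metadata_fields, dict):
        # Assumed shape: metadata key -> dataset field.
        problems.append("metadata_fields should map metadata keys to dataset fields")

    return problems
```

For the setup that worked here, `check_actor_input({"fields": ["text"], "metadata_fields": {"url": "url"}})` returns an empty list.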

@HonzaTuron
Owner

HonzaTuron commented Mar 26, 2024

@toi500 In that case I'm out of ideas, so I'm releasing 0.0.60 as a copy of 0.0.57, since text is a mandatory field. Closing this one; feel free to open a new issue if you come up with a bug or an idea :) You can also reach me on the Apify Discord.

@toi500
Author

toi500 commented Mar 26, 2024

Thank you for your help. I really appreciate your time here.

I am leaving 2 screenshots with the correct setup in the integration and the final result:

  1. Setup of the Pinecone Integration Actor at Apify
    --
    Screenshot 2024-03-26 130640

  2. Final result on Pinecone
    --
    Screenshot 2024-03-26 131751
